flashify_repo: hardcoded get_param_tmp causes cross-model contamination on repeated calls #9

@fm1320

Bug

get_param() at transformer_tricks.py:57-62 downloads safetensors into a hard-coded relative path ./get_param_tmp/, and never cleans it up. When flashify_repo is called more than once from the same working directory, safetensors files from previous runs accumulate in that dir, and the subsequent glob('./get_param_tmp/*.safetensors') + load_file() loop merges all of them into a single param dict. The resulting flashified checkpoint contains a mix of tensors from multiple source models.
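To make the mechanism concrete, here is a minimal stdlib-only sketch of the accumulation. It does not call transformer_tricks; `fake_download` and `shards_seen` are hypothetical stand-ins that only mimic "download into a shared directory, then glob it", and the shard file names are invented for illustration.

```python
import glob
import os
import tempfile


def fake_download(repo, target_dir):
    """Hypothetical stand-in for the real download: writes one dummy shard per repo."""
    os.makedirs(target_dir, exist_ok=True)
    name = repo.replace('/', '_') + '.safetensors'
    with open(os.path.join(target_dir, name), 'w') as f:
        f.write(repo)


def shards_seen(repo, shared_dir):
    """Download into a shared directory, then glob it, as the report describes."""
    fake_download(repo, shared_dir)
    pattern = os.path.join(shared_dir, '*.safetensors')
    return sorted(os.path.basename(p) for p in glob.glob(pattern))


# A scratch working directory keeps this demo from touching the real ./get_param_tmp.
with tempfile.TemporaryDirectory() as workdir:
    shared = os.path.join(workdir, 'get_param_tmp')
    first = shards_seen('Qwen/Qwen3-1.7B', shared)
    second = shards_seen('meta-llama/Llama-3.2-1B', shared)

print(first)   # only the Qwen shard
print(second)  # the Qwen shard is still there next to the Llama shard
```

The second glob returns both files, which is the point at which a load-everything loop would merge tensors from two models.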

Repro

import transformer_tricks as tt
tt.flashify_repo('Qwen/Qwen3-1.7B',     dir='a', strict=True)  # leaves Qwen shards in ./get_param_tmp
tt.flashify_repo('meta-llama/Llama-3.2-1B', dir='b', strict=True)  # merges Qwen shards + Llama weights -> broken checkpoint in b/

The second call produces a b/model.safetensors whose keys/shapes don't match either model consistently (confirmed by inspecting with safetensors.safe_open).
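A rough sketch of that kind of inspection follows. `read_key_shapes` and `diff_key_shapes` are hypothetical helpers, not part of transformer_tricks; `framework='pt'` assumes PyTorch is installed; and the reference mapping is whatever key/shape listing you trust for the expected output. The tensor names in the usage lines are invented.

```python
def read_key_shapes(path):
    """Map tensor name -> shape for one safetensors file, without loading tensor data."""
    from safetensors import safe_open  # third-party; imported lazily on purpose
    with safe_open(path, framework='pt') as f:
        return {k: tuple(f.get_slice(k).get_shape()) for k in f.keys()}


def diff_key_shapes(actual, expected):
    """Return (unexpected, missing, mismatched) key lists relative to `expected`."""
    unexpected = sorted(set(actual) - set(expected))
    missing = sorted(set(expected) - set(actual))
    mismatched = sorted(
        k for k in set(actual) & set(expected)
        if tuple(actual[k]) != tuple(expected[k])
    )
    return unexpected, missing, mismatched


# Toy dictionaries with made-up names, standing in for read_key_shapes(...) results.
expected = {'layer.weight': (4, 4), 'head.weight': (8, 4)}
actual = {'layer.weight': (4, 4), 'head.weight': (16, 4), 'stray.weight': (2, 2)}
report = diff_key_shapes(actual, expected)
print(report)
```

Any non-empty list in the result is a reason to stop before uploading.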

Impact

Silent — no error is raised; flashify_repo reports success and uploads a corrupted checkpoint. Hit during a batch upload of 6 FlashNorm variants; two of the uploads (open-machine/Llama-3.2-1B-FlashNorm, open-machine/Gemma-3-1B-FlashNorm) had to be re-run after catching the contamination via safetensors inspection.

Suggested fix

Use a unique tempdir per call (and clean up on exit) in get_param():

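import glob
import os
import tempfile

from huggingface_hub import snapshot_download

# Fresh directory per call; removed automatically when the block exits.
with tempfile.TemporaryDirectory(prefix='flashify_') as tmp_dir:
    snapshot_download(repo_id=repo, allow_patterns='*.safetensors',
                      local_dir=tmp_dir)
    files = glob.glob(os.path.join(tmp_dir, '*.safetensors'))
    ...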

Affects only get_param(); no API changes.
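As a sanity check on the idea, the following stdlib-only sketch shows that a per-call temporary directory keeps consecutive calls isolated. `fake_download` and `shards_seen_isolated` are hypothetical stand-ins, not the real get_param(); the dummy shard names are invented.

```python
import glob
import os
import tempfile


def fake_download(repo, target_dir):
    """Hypothetical stand-in for the real download: writes one dummy shard per repo."""
    name = repo.replace('/', '_') + '.safetensors'
    with open(os.path.join(target_dir, name), 'w') as f:
        f.write(repo)


def shards_seen_isolated(repo):
    """Download into a fresh temp dir and report what the glob sees there."""
    with tempfile.TemporaryDirectory(prefix='flashify_') as tmp_dir:
        fake_download(repo, tmp_dir)
        pattern = os.path.join(tmp_dir, '*.safetensors')
        names = sorted(os.path.basename(p) for p in glob.glob(pattern))
    # The directory is deleted as soon as the with-block exits.
    return names, os.path.exists(tmp_dir)


first, first_dir_left = shards_seen_isolated('Qwen/Qwen3-1.7B')
second, second_dir_left = shards_seen_isolated('meta-llama/Llama-3.2-1B')
print(first, first_dir_left)
print(second, second_dir_left)
```

Because the directory disappears when the `with` block exits, any real implementation has to finish loading the tensors inside the block.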
