Downloading the dataset for the puzzletron algorithm needs 140GB of disk space instead of 2.62GB #1658

@danielkorzekwa


See: https://github.com/NVIDIA/Model-Optimizer/blob/main/examples/puzzletron/README.md#compress-the-model (v.0.44.0)

The Nemotron-Post-Training-Dataset-v2 dataset is first downloaded to hf_home, where it takes up about 136GB:

.../experiments/6_5_qwen_35_moments_lab$ du -ms ./hf_home
136234  ./hf_home

Then the final dataset is created in a separate folder, which takes up only 2.6GB.
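
To compare the two folders in the same units, a small stdlib-only sketch like the one below could be used. The helper names `dir_size_bytes` and `mib_to_gb` are hypothetical and not part of Model-Optimizer. Note that `du -m` counts in MiB (1,048,576 bytes) and reports allocated disk blocks, while this sketch sums apparent file sizes, so the two numbers can differ slightly. Converted to decimal units, the 136234 MiB reported above is about 143 GB, which is in line with the roughly 140GB mentioned in the title.

```python
import os

MIB = 1024 * 1024  # bytes per MiB, the unit used by `du -m`


def dir_size_bytes(root: str) -> int:
    """Sum the apparent sizes of regular files under root.

    Symlinks are skipped so that files reachable through links
    (as in a cache that links snapshots to blobs) are counted once.
    """
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root, followlinks=False):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path):
                continue
            total += os.path.getsize(path)
    return total


def mib_to_gb(mib: float) -> float:
    """Convert MiB (2**20 bytes) to decimal GB (10**9 bytes)."""
    return mib * MIB / 1e9
```

For example, `dir_size_bytes("./hf_home") / MIB` gives a figure comparable to the `du -ms` output, and `mib_to_gb(136234)` converts that output to decimal GB.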

Could you please:

  • clarify this in the docs
  • ideally require downloading only 2.6GB instead of the full 136GB dataset

Thank you.
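
One possible direction for the second request is the streaming mode of the Hugging Face `datasets` library, which iterates over records without first materialising the whole dataset in the cache. The sketch below is only an illustration under assumptions: the Hub ID `nvidia/Nemotron-Post-Training-Dataset-v2`, the split name, the sample count, and the output file are all guesses, the helper names are hypothetical, and taking the first `n` records may not match how the puzzletron example actually selects its samples.

```python
import itertools
import json
from typing import Any, Iterable, Mapping


def write_first_n(records: Iterable[Mapping[str, Any]], out_path: str, n: int) -> int:
    """Write at most n records to out_path as JSON Lines; return the count written."""
    written = 0
    with open(out_path, "w", encoding="utf-8") as f:
        for record in itertools.islice(records, n):
            # default=str keeps the sketch robust to non-JSON field types.
            f.write(json.dumps(dict(record), ensure_ascii=False, default=str) + "\n")
            written += 1
    return written


def stream_subset(dataset_id: str, split: str, out_path: str, n: int) -> int:
    """Stream a split from the Hub and keep only the first n records on disk."""
    # Imported lazily so write_first_n works without `datasets` installed.
    from datasets import load_dataset

    stream = load_dataset(dataset_id, split=split, streaming=True)
    return write_first_n(stream, out_path, n)


# Example call (assumed Hub ID, split name, and sample count):
# stream_subset("nvidia/Nemotron-Post-Training-Dataset-v2", "chat", "subset.jsonl", 1000)
```

With streaming, only the data actually iterated over is fetched, so the disk footprint would be close to the size of the written subset and not the full dataset.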

Labels: bug
