Questions about data preprocessing

From run_detect_segment, it looks like the script requires an annotation file for each video, and that each entry in the file consists of a start time, an end time, and a text prompt. (As far as I can tell, the text prompt is not used in the code.)
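
To make my guess concrete, here is a minimal sketch of the layout I am assuming. The JSON format, the key names (start, end, text), the use of seconds, and the helper parse_annotations are all my own placeholders, not taken from the repository, so please correct me if the real format differs.

```python
import json
from dataclasses import dataclass
from typing import List


@dataclass
class SegmentAnnotation:
    # Hypothetical field names; the repository may use different keys.
    start: float  # segment start time, assumed to be in seconds
    end: float    # segment end time, assumed to be in seconds
    text: str     # text prompt, which the code does not appear to read


def parse_annotations(raw: str) -> List[SegmentAnnotation]:
    """Parse a JSON list of {"start", "end", "text"} records (assumed layout)."""
    segments = []
    for record in json.loads(raw):
        segment = SegmentAnnotation(
            start=float(record["start"]),
            end=float(record["end"]),
            text=record["text"],
        )
        # Reject segments whose end time is not after the start time.
        if segment.end <= segment.start:
            raise ValueError(f"segment ends before it starts: {record}")
        segments.append(segment)
    return segments


# Made-up example entry, only to illustrate the assumed structure.
example = '[{"start": 0.0, "end": 4.5, "text": "a person opens a door"}]'
segments = parse_annotations(example)
print(segments[0].start, segments[0].end)
```

If the actual file is a CSV or uses different keys, only the parsing step would change; my question about how the start and end times are produced stays the same.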

Are these annotations created manually, or can they be generated automatically?

Also, when extracting features with the CLIP encoder in run_clip_filtering, what text input is required?

Finally, when will the pre-training dataset be released?

Thank you

Questions about data preprocessing #52
