Please check the Competition Website for the set of Rules and Instructions on how to submit your exam.
- All submissions must follow the file and folder structure below:
  - `main.py`: The script must accept the following command-line arguments:
    `python main.py --test_path <path_to_test.json.gz> --train_path <optional_path_to_train.json.gz>`
    - Behavior:
      - If `--train_path` is provided, the script must train the model using the specified `train.json.gz` file.
      - If `--train_path` is not provided, the script should only generate predictions using the pre-trained model checkpoints provided.
      - The output must be a CSV file named `testset_<foldername>.csv`. Here, `<foldername>` corresponds to the dataset folder name (e.g., `A`, `B`, `C`, or `D`).
      - Ensure the correct mapping between test and training datasets:
        - Example: If `test.json.gz` is located in `./datasets/A/`, the script must use the pre-trained model that was trained on `./datasets/A/train.json.gz`.
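The command-line contract for `main.py` can be sketched with `argparse`. This is a minimal sketch, not the official baseline: the helper names `parse_args`, `folder_name`, and `output_csv_name` are hypothetical, and it assumes the dataset folder name is the name of the directory that directly contains `test.json.gz`.

```python
import argparse
from pathlib import Path


def parse_args(argv=None):
    """Parse the two arguments named in the submission rules."""
    parser = argparse.ArgumentParser(description="Train and/or predict.")
    parser.add_argument("--test_path", required=True,
                        help="Path to test.json.gz")
    parser.add_argument("--train_path", default=None,
                        help="Optional path to train.json.gz; omit for inference only")
    return parser.parse_args(argv)


def folder_name(test_path):
    """Assumption: the dataset folder (A, B, C, or D) is the parent directory of the test file."""
    return Path(test_path).parent.name


def output_csv_name(test_path):
    """Build the required output name testset_<foldername>.csv."""
    return f"testset_{folder_name(test_path)}.csv"


# Example with an explicit argument list (no --train_path, so inference only).
args = parse_args(["--test_path", "./datasets/A/test.json.gz"])
mode = "train + predict" if args.train_path else "predict only"
print(mode, output_csv_name(args.test_path))
```

In a real `main.py`, calling `parse_args()` with no argument would read the arguments from the command line instead of the explicit list used here.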
- Folder and File Naming Conventions
  - `checkpoints/`: Directory containing trained model checkpoints. Use filenames such as `model_<foldername>_epoch_<number>.pth` (example: `model_A_epoch_10.pth`). Save at least 5 checkpoints for each model.
  - `source/`: Directory for all implemented code (e.g., models, loss functions, data loaders).
  - `submission/`: Folder containing the predicted CSV files for the four test sets: `testset_A.csv`, `testset_B.csv`, `testset_C.csv`, `testset_D.csv`.
  - `logs/`: Log files for each training dataset. Include logs of accuracy and loss recorded every 10 epochs.
  - `requirements.txt`: A file listing all dependencies and the Python version. Example: `python==3.8.5`, `torch==1.10.0`, `numpy==1.21.0`.
  - `README.md`: A clear and concise description of the solution, including:
    - Image teaser explaining the procedure
    - Overview of the method
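The checkpoint naming and the every-10-epochs logging rule could be handled with two small helpers. This is only a sketch: `checkpoint_path`, `should_log`, and `make_logger` are hypothetical names, the log file name `logs/<foldername>.log` is an assumption (the rules only require a `logs/` directory), and epochs are assumed to be counted from 1.

```python
import logging
from pathlib import Path


def checkpoint_path(folder, epoch, root="checkpoints"):
    """Return checkpoints/model_<foldername>_epoch_<number>.pth."""
    return Path(root) / f"model_{folder}_epoch_{epoch}.pth"


def should_log(epoch, every=10):
    """True on epochs 10, 20, 30, ... when epochs are counted from 1."""
    return epoch % every == 0


def make_logger(folder, log_dir="logs"):
    """Create a logger writing to logs/<folder>.log (assumed file name)."""
    Path(log_dir).mkdir(parents=True, exist_ok=True)
    logger = logging.getLogger(f"train_{folder}")
    logger.setLevel(logging.INFO)
    if not logger.handlers:  # avoid attaching a second handler on repeated calls
        logger.addHandler(logging.FileHandler(Path(log_dir) / f"{folder}.log"))
    return logger


# Epochs at which accuracy and loss would be logged during a 50-epoch run.
logged_epochs = [e for e in range(1, 51) if should_log(e)]
print(logged_epochs, checkpoint_path("A", 10).as_posix())
```

Saving a checkpoint at each logged epoch of a 50-epoch run would yield five files per model, which meets the minimum stated above.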
- Ensure that your solution is fully reproducible. Include any random seeds or initialization details used to ensure consistent results (e.g., `torch.manual_seed()` or `np.random.seed()`). If using a pre-trained model, include instructions for downloading it or specifying the model path.
- Submission Limits:
  - Teams or individuals can submit up to 6 submissions per day.
  - Multiple submissions are allowed, but only the best-performing model will count toward the leaderboard.
- Note: Use `zipthefolder.py` to create `submission.gz` from the `submission` folder for submission to Hugging Face.
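For the reproducibility requirement in the rules above, one common pattern is a single function that seeds every random number generator in use. The sketch below is an illustration rather than a required implementation: `set_seed` is a hypothetical name, the default seed of 42 is arbitrary, and NumPy and PyTorch are seeded only if they are installed.

```python
import random


def set_seed(seed=42):
    """Seed Python's RNG, plus NumPy and PyTorch when they are available."""
    random.seed(seed)
    try:
        import numpy as np
        np.random.seed(seed)
    except ImportError:
        pass  # NumPy not installed; nothing to seed
    try:
        import torch
        torch.manual_seed(seed)
    except ImportError:
        pass  # PyTorch not installed; nothing to seed


set_seed(42)
```

Calling it once at the start of `main.py`, before building the model or the data loaders, and recording the seed value in the README makes it easier for others to repeat a run. Seeding alone may not remove every source of nondeterminism, for example on GPU.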
The dataset used in this competition is a subset of the publicly available Protein-Protein Association (PPA) dataset. We have selected 30% of the original dataset, focusing on 6 classes out of the 37 available in the full dataset. For more information about the PPA dataset, including its source and detailed description, please visit the Hugging Face competition space.
This code serves as an example of how to load a dataset and use it to train and test a GNN model:
- The dataset can be downloaded from https://drive.google.com/drive/folders/1Z-1JkPJ6q4C6jX4brvq1VRbJH5RPUCAk?usp=drive_link
- The `main` file contains the implementation of the GNN model.
- It uses the train dataset located in one of the data folders (A, B, C, or D) based on the `path_train` argument.
- The GNN model is trained on the specified train dataset from the folder corresponding to the `path_train` argument.
- After training, the code generates a CSV file for the test dataset, named based on the `test_path` argument.
- For example, if `test_path` points to folder B, the output file will be named `testset_B.csv`.
- If only the `test_path` argument is provided, the code should generate the respective test dataset's CSV file using the pre-trained model. (This functionality is for you to implement.)
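One piece of the inference-only path left to implement is choosing which saved checkpoint to load for a given test folder. The sketch below shows one possible approach under stated assumptions: `latest_checkpoint` is a hypothetical helper, checkpoints are assumed to follow the `model_<foldername>_epoch_<number>.pth` pattern, and picking the highest epoch is an assumption (a team might prefer the checkpoint with the best validation score instead).

```python
import re
from pathlib import Path


def latest_checkpoint(folder, root="checkpoints"):
    """Return the highest-epoch checkpoint for a dataset folder, or None if none exists."""
    pattern = re.compile(rf"model_{re.escape(folder)}_epoch_(\d+)\.pth")
    best_epoch, best_path = -1, None
    for path in Path(root).glob(f"model_{folder}_epoch_*.pth"):
        match = pattern.fullmatch(path.name)
        # Compare epochs numerically so that epoch 10 ranks above epoch 2.
        if match and int(match.group(1)) > best_epoch:
            best_epoch, best_path = int(match.group(1)), path
    return best_path
```

For a test file in folder B, `latest_checkpoint("B")` would return the path of the most recent `model_B_epoch_*.pth` file, which the script could then load before writing `testset_B.csv`.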