Final project in bioinformatics
Navigate to the script's directory and run the commands:
Linux/Mac:
cd src/generateTrainData
python3 orginize_df.py
python3 data_to_vec_save_to_h5py.py
Windows:
cd src\generateTrainData
python orginize_df.py
python data_to_vec_save_to_h5py.py

Navigate to the script's directory and run the command:
Linux/Mac:
cd src/LRtrainData
python3 LOO_trainLR.py
Windows:
cd src\LRtrainData
python LOO_trainLR.py

First, create and save the train/test split indices:
Linux/Mac:
cd src/LRtrainData
python3 splitDataIndices.py
Windows:
cd src\LRtrainData
python splitDataIndices.py

Then, run the 50-50 training script:
Linux/Mac:
cd src/LRtrainData
python3 50trainLR.py
Windows:
cd src\LRtrainData
python 50trainLR.py

You must have a text file containing a list of gRNAs separated by ',' and one or more models saved as .pkl files.
Linux/Mac:
cd src
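python3 inference.py path/to/guides_list.txt path/to/<output_file_name>.csv path/to/model1.pkl path/to/model2.pkl .....
Windows: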
cd src
python .\inference.py path\to\guides_list.txt path\to\<output_file_name>.csv path\to\model1.pkl path\to\model2.pkl .....

To train on different indices and guides from the data set, load the desired index files and set the model names at the end of the 50trainLR.py script. For example:
first_model_train = np.load('split_data/Train_guides_1_indices.npy')
first_model_test = np.load('split_data/Test_guides_1_indices.npy')
second_model_train = np.load('split_data/Train_guides_2_indices.npy')
second_model_test = np.load('split_data/Test_guides_2_indices.npy')

run_training_50_50(1, "without_prob_guide_1", hdf5_file, first_model_train, first_model_test)
run_training_50_50(2, "without_prob_guide_2", hdf5_file, second_model_train, second_model_test)
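
When swapping in different index files, it can be worth checking that each train/test pair does not overlap before calling run_training_50_50. The sketch below is only an illustration: it assumes the .npy files hold arrays of integer indices (the exact layout written by splitDataIndices.py is not shown here), and split_overlap is a hypothetical helper, not part of the project.

```python
import numpy as np


def split_overlap(train_idx, test_idx):
    """Return the sorted indices that appear in both arrays.

    Hypothetical helper: assumes both inputs are arrays of integer indices.
    An empty result means the train and test sets are disjoint.
    """
    train_idx = np.asarray(train_idx).ravel()
    test_idx = np.asarray(test_idx).ravel()
    return np.intersect1d(train_idx, test_idx)
```

For example, split_overlap(first_model_train, first_model_test).size should be 0 for a clean 50-50 split.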
Requirements:
- numpy
- h5py
- scikit-learn (imported as sklearn)
- torch
- pandas
- joblib
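
A quick way to see which of the listed packages are missing from the current environment is sketched below. It only checks that each module can be located for import, not which version is installed; the REQUIRED list and the missing_modules helper are illustrative names, not part of the project.

```python
import importlib.util

# Import names of the packages listed under Requirements.
REQUIRED = ["numpy", "h5py", "sklearn", "torch", "pandas", "joblib"]


def missing_modules(names):
    """Return the top-level module names that cannot be located for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


print("Missing modules:", missing_modules(REQUIRED) or "none")
```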