final_proj

Final project in bioinformatics.

How to Run

1. Generate Training Data

Navigate to the scripts' directory and run the following commands:

Linux/Mac:

cd src/generateTrainData
python3 orginize_df.py
python3 data_to_vec_save_to_h5py.py

Windows:

cd src\generateTrainData
python orginize_df.py
python data_to_vec_save_to_h5py.py

2. Leave-One-Out (LOO) Training

Navigate to the script's directory and run the command:

Linux/Mac:

cd src/LRtrainData
python3 LOO_trainLR.py

Windows:

cd src\LRtrainData
python LOO_trainLR.py
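
For orientation, leave-one-out evaluation holds out one unit at a time and trains on everything else. The sketch below shows the idea in plain Python, assuming the held-out unit is a guide and that each sample carries a guide label. The function name `leave_one_out_splits` and the labels are hypothetical and are not taken from `LOO_trainLR.py`, which may organize its data differently.

```python
from collections import defaultdict


def leave_one_out_splits(guide_per_sample):
    """Yield (held_out_guide, train_indices, test_indices) for each guide.

    guide_per_sample: list where item i is the guide label of sample i
    (a hypothetical input format, assumed for this illustration).
    """
    # Group sample indices by the guide they belong to.
    indices_by_guide = defaultdict(list)
    for index, guide in enumerate(guide_per_sample):
        indices_by_guide[guide].append(index)

    # Hold out one guide at a time; every other sample is training data.
    for held_out in sorted(indices_by_guide):
        test_indices = indices_by_guide[held_out]
        train_indices = [
            i for i, guide in enumerate(guide_per_sample) if guide != held_out
        ]
        yield held_out, train_indices, test_indices


if __name__ == "__main__":
    labels = ["g1", "g2", "g1", "g3"]  # made-up labels
    for guide, train, test in leave_one_out_splits(labels):
        print(guide, train, test)
```

Each guide appears in the test set exactly once, so the number of trained models equals the number of distinct guides.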

3. 50-50 Split Training

First, create and save the train/test split indices:

Linux/Mac:

cd src/LRtrainData
python3 splitDataIndices.py

Windows:

cd src\LRtrainData
python splitDataIndices.py

Then, run the 50-50 training script:

Linux/Mac:

cd src/LRtrainData
python3 50trainLR.py

Windows:

cd src\LRtrainData
python 50trainLR.py
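
The 50-50 scheme can be pictured as a two-fold swap: the items are divided into two halves, and each half serves once as the training set and once as the test set. Here is a minimal sketch in plain Python, assuming a seeded shuffle. The name `two_fold_split` and the seed are illustrative assumptions, and `splitDataIndices.py` may choose and store its indices differently.

```python
import random


def two_fold_split(items, seed=0):
    """Shuffle items and return two (train, test) folds with swapped halves."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is reproducible

    middle = len(shuffled) // 2
    first_half = sorted(shuffled[:middle])
    second_half = sorted(shuffled[middle:])

    # Fold 1 trains on the first half; fold 2 swaps the roles.
    return [(first_half, second_half), (second_half, first_half)]


if __name__ == "__main__":
    for number, (train, test) in enumerate(two_fold_split(range(10)), start=1):
        print(f"model {number}: train={train} test={test}")
```

Because the halves are disjoint, no item is ever in the training and test set of the same model.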

4. Run inference (for test)

You need a text file containing a list of gRNAs separated by commas (','), and one or more models saved as .pkl files.

Linux/Mac:

cd src
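python3 inference.py path/to/guides_list.txt path/to/<output_file_name>.csv path/to/model1.pkl path/to/model2.pkl .....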

Windows:

cd src
python .\inference.py path\to\guides_list.txt path\to\<output_file_name>.csv path\to\model1.pkl path\to\model2.pkl .....
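
The inference command takes three kinds of positional arguments: the guide list file, the output CSV path, and one or more model files. The sketch below shows one way such arguments and a comma-separated guide file could be parsed with the standard library. It is an illustration under assumptions, not the code in `inference.py`; the names `parse_guides` and `build_parser` and all example values are hypothetical.

```python
import argparse


def parse_guides(text):
    """Split comma-separated gRNA sequences, ignoring blanks and whitespace."""
    return [guide.strip() for guide in text.split(",") if guide.strip()]


def build_parser():
    """Mirror the documented argument order: guides file, output CSV, models."""
    parser = argparse.ArgumentParser(description="Hypothetical inference CLI")
    parser.add_argument("guides_file", help="text file of comma-separated gRNAs")
    parser.add_argument("output_csv", help="path of the CSV file to write")
    parser.add_argument("models", nargs="+", help="one or more .pkl model files")
    return parser


if __name__ == "__main__":
    # Placeholder file names and made-up sequences, for demonstration only.
    args = build_parser().parse_args(["guides.txt", "out.csv", "m1.pkl", "m2.pkl"])
    print(args.guides_file, args.output_csv, args.models)
    print(parse_guides("GACC, TTGA,\nCCAT"))
```

Using `nargs="+"` makes the parser reject a call with no model file, which matches the "1 or more models" requirement.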

To train on different indices and guides from the data set, set the index files to load and the model names at the end of the 50trainLR.py script. For example:

first_model_train = np.load('split_data/Train_guides_1_indices.npy')
first_model_test = np.load('split_data/Test_guides_1_indices.npy')
second_model_train = np.load('split_data/Train_guides_2_indices.npy')
second_model_test = np.load('split_data/Test_guides_2_indices.npy')
# train two models: switch between indices, one time as test and one time as train
run_training_50_50(1, "without_prob_guide_1", hdf5_file, first_model_train, first_model_test)
run_training_50_50(2, "without_prob_guide_2", hdf5_file, second_model_train, second_model_test)

Requirements:

numpy
h5py
scikit-learn (imported as sklearn)
torch
pandas
joblib
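
Before running the scripts, it can help to confirm that every listed package is importable. The sketch below uses only the standard library and checks import names rather than installing anything; the helper name `missing_packages` is an assumption made for illustration.

```python
from importlib.util import find_spec

# Import names of the packages listed under Requirements.
REQUIRED = ["numpy", "h5py", "sklearn", "torch", "pandas", "joblib"]


def missing_packages(names):
    """Return the names that cannot be found in the current environment."""
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages(REQUIRED)
    print("Missing:", ", ".join(missing) if missing else "none")
```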
