GCMSFormer

This is the code repository for the GCMSFormer method. GCMSFormer is a Transformer-based model for resolving overlapped peaks in complex GC-MS data. The model was trained, validated, and tested on 100,000 augmented simulated overlapped peaks split in an 8:1:1 ratio, and it reached a bilingual evaluation understudy (BLEU) score of 0.9988 on the test set. Combined with the orthogonal projection resolution (OPR) method, GCMSFormer predicts the pure mass spectra of all components in an overlapped peak (the mass spectral matrix S); the concentration distribution matrix C is then obtained by least squares. Together, these steps resolve overlapped peaks automatically.
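The least-squares step can be illustrated with a small, self-contained sketch. It assumes the common bilinear model X ≈ C Sᵀ, with X of shape (scans × m/z channels) and S of shape (m/z channels × components); this convention, the function name `estimate_concentrations`, and the synthetic data are illustrative assumptions rather than code from this repository. No non-negativity constraint is applied here.

```python
import numpy as np

def estimate_concentrations(X, S):
    """Solve X ≈ C @ S.T for C in the least-squares sense.

    X : (n_scans, n_mz) data matrix of the overlapped peak (assumed layout)
    S : (n_mz, n_components) pure mass spectra, one column per component
    """
    # Transposing the model gives S @ C.T ≈ X.T, which lstsq solves for C.T.
    C_T, *_ = np.linalg.lstsq(S, X.T, rcond=None)
    return C_T.T

# Synthetic two-component example (hypothetical data, noise-free).
rng = np.random.default_rng(0)
n_scans, n_mz, n_components = 40, 60, 2
S_true = rng.random((n_mz, n_components))
t = np.arange(n_scans)
C_true = np.stack(
    [np.exp(-0.5 * ((t - 15) / 3.0) ** 2),   # elution profile of component 1
     np.exp(-0.5 * ((t - 22) / 4.0) ** 2)],  # elution profile of component 2
    axis=1,
)
X = C_true @ S_true.T

C_est = estimate_concentrations(X, S_true)
print("max abs error:", np.abs(C_est - C_true).max())
```

With real data, S would come from the spectra predicted by the model instead of `S_true`.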

Required packages:

We recommend using conda.

The environment.yml file installs all the required packages:

git clone https://github.com/zxguocsu/GCMSFormer.git
cd GCMSFormer
conda env create -f environment.yml
conda activate GCMSFormer

Data augmentation

The overlapped-peak dataset used to train, validate, and test the GCMSFormer model is generated with the gen_datasets function.

TRAIN, VALID, TEST, tgt_vacob = gen_datasets(para)

Optional args

  • para : Data augmentation parameters
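For readers unfamiliar with the 8:1:1 split, the following sketch shows one way to partition a list of samples into training, validation, and test sets in that ratio. It is not the implementation of gen_datasets; the helper name `split_811` and the placeholder samples are hypothetical.

```python
import random

def split_811(samples, seed=0):
    """Shuffle samples and split them 8:1:1 into train/valid/test lists."""
    items = list(samples)
    random.Random(seed).shuffle(items)  # seeded for a reproducible split
    n_train = len(items) * 8 // 10
    n_valid = len(items) // 10
    train = items[:n_train]
    valid = items[n_train:n_train + n_valid]
    test = items[n_train + n_valid:]    # remainder goes to the test set
    return train, valid, test

# Placeholder integers stand in for simulated overlapped peaks.
train, valid, test = split_811(range(100_000))
print(len(train), len(valid), len(test))
```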

Model training

Train the model on your own training dataset with the train_model function.

model, Loss = train_model(para, TRAIN, VALID, tgt_vacob)

Optional args

  • para : Hyperparameters for model training
  • TRAIN : Training set
  • VALID : Validation set
  • tgt_vacob : Library

Resolution

Resolve GC-MS data files automatically with the Resolution function.

Resolution(path, filename, model, tgt_vacob, device)

Optional args

  • path : GC-MS data path
  • filename : GC-MS data filename
  • model : GCMSFormer model
  • tgt_vacob : Library
  • device : Device on which the computation runs (cuda or cpu)
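A value for the device argument can be chosen at run time. The sketch below assumes PyTorch is the underlying framework and falls back to the CPU when it is not installed or no GPU is visible; the helper name `pick_device` is hypothetical, and whether Resolution expects a string or a torch.device object is not stated here, so check the example notebook before relying on it.

```python
def pick_device():
    """Return "cuda" when a CUDA GPU is usable through PyTorch, else "cpu"."""
    try:
        import torch
    except ImportError:
        # PyTorch is not installed, so only the CPU can be used.
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"

device = pick_device()
print(device)

# Assumed usage, following the signature shown above:
# Resolution(path, filename, model, tgt_vacob, device)
```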

Clone the repository and run it directly

git clone

An example is provided in the test.ipynb notebook. The GC-MS file it uses is available in Essential Oil Data.

Contact

About

A fully automatic method based on Transformer for resolution of overlapping peaks in gas chromatography-mass spectrometry
