TrackBack is a custom Deep Learning pipeline built in PyTorch that maps text queries to their corresponding source documents. It uses a hybrid architecture, leveraging Transfer Learning for feature extraction and a custom neural network for classification.
Currently, the frontend is themed around an "ABBA Lyric Finder," but the backend is completely generalized and can be trained on any text dataset (legal documents, medical records, FAQs, etc.).
This project implements a multi-stage Machine Learning pipeline:
- Feature Extraction: Uses the Hugging Face all-MiniLM-L6-v2 SentenceTransformer to convert text into 384-dimensional dense semantic embeddings.
- Custom Classifier: A PyTorch Feed-Forward Neural Network (FFNN) containing:
  - Input layer with Gaussian Noise (0.05) for additional regularization during training.
  - Hidden Layer (128 neurons) with Batch Normalization and ReLU activation.
  - Dropout Layer (p=0.5) to reduce overfitting.
  - Output Layer dynamically sized to the number of classes.
- Optimization: Trained using the Adam optimizer with Weight Decay (1e-3) and a ReduceLROnPlateau Learning Rate Scheduler.
- Validation: Implements an 80/20 train/test split, tracks Top-5 Accuracy, and features Early Stopping to automatically save the weights of the best-performing epoch.
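The classifier and optimizer described above can be sketched roughly as follows. This is a minimal illustration, not the code in src/model.py: the class names `GaussianNoise` and `TrackBackClassifier`, the layer ordering, the reading of 0.05 as a noise standard deviation, the learning rate, the scheduler's `factor` and `patience`, and `num_classes=10` are all assumptions made for the example.

```python
import torch
from torch import nn


class GaussianNoise(nn.Module):
    """Adds zero-mean Gaussian noise to the input, only in training mode."""

    def __init__(self, std=0.05):  # assumption: 0.05 is the noise std
        super().__init__()
        self.std = std

    def forward(self, x):
        if self.training and self.std > 0:
            return x + torch.randn_like(x) * self.std
        return x


class TrackBackClassifier(nn.Module):  # hypothetical name
    """Feed-forward classifier over 384-dimensional sentence embeddings."""

    def __init__(self, num_classes, embedding_dim=384, hidden_dim=128):
        super().__init__()
        self.net = nn.Sequential(
            GaussianNoise(0.05),
            nn.Linear(embedding_dim, hidden_dim),
            nn.BatchNorm1d(hidden_dim),
            nn.ReLU(),
            nn.Dropout(p=0.5),
            nn.Linear(hidden_dim, num_classes),  # one logit per document
        )

    def forward(self, embeddings):
        return self.net(embeddings)


# num_classes would normally be the number of source documents in data/.
model = TrackBackClassifier(num_classes=10)

# Adam with weight decay 1e-3; lr, factor and patience are illustrative.
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3, weight_decay=1e-3)
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(
    optimizer, mode="min", factor=0.5, patience=3
)

# Inference on a dummy batch: noise and dropout are disabled in eval mode.
model.eval()
with torch.no_grad():
    logits = model(torch.zeros(4, 384))
    top5 = logits.topk(5, dim=1).indices  # five highest-scoring classes
```

In a training loop, `scheduler.step(val_loss)` would be called once per epoch with the validation loss, so the learning rate drops only when that loss stops improving.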
TrackBack/
│
├── scrapper/
│   ├── songs.txt # List of songs to scrape from Genius Lyrics
│   └── scraper.py # Script to scrape the data
├── data/ # Directory for storing raw .txt files (one file per song)
├── trackback_weights.pth # Saved model weights (Generated after training)
├── learning_curve.png # Plotted loss metrics (Generated after training)
├── genius_api.env # Environment file for Genius API credentials (Git ignored)
├── example_genius_api.env # Example template for setting up API credentials
├── requirements.txt # Python package dependencies
├── .gitignore # Ignored files configuration
├── README.md # Project documentation
│
└── src/
    ├── data_prep.py # Data loading, cleaning, tokenization, and splitting
    ├── model.py # PyTorch architecture and training loop definition
    ├── train.py # Offline execution script for training and plotting
    ├── UiConfig.py # UI configuration, styling, and YouTube links mapping
    └── app.py # Streamlit web application for deployment
Clone the repository and install the required dependencies using pip:
pip install -r requirements.txt
If you wish to augment the dataset, you must configure the Genius Lyrics API:
- Copy example_genius_api.env to a new file named genius_api.env.
- Add your Genius API credentials (GENIUS_CLIENT_ID, GENIUS_CLIENT_SECRET, GENIUS_ACCESS_TOKEN).
- Add any desired songs to scrapper/songs.txt.
- Run the scraper:
python scrapper/scraper.py
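Before running the scraper, it can help to confirm that all three credentials are actually set. The sketch below assumes genius_api.env contains plain KEY=VALUE lines; `load_env_file` and `missing_credentials` are hypothetical helpers written for this example, and the real scraper may load the file differently (for instance with a dotenv library).

```python
from pathlib import Path

# The three variable names listed in the setup steps above.
REQUIRED_KEYS = (
    "GENIUS_CLIENT_ID",
    "GENIUS_CLIENT_SECRET",
    "GENIUS_ACCESS_TOKEN",
)


def load_env_file(path):
    """Parse KEY=VALUE lines into a dict, skipping blanks and # comments."""
    values = {}
    for raw_line in Path(path).read_text(encoding="utf-8").splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_credentials(values):
    """Return the required keys that are absent or empty."""
    return [key for key in REQUIRED_KEYS if not values.get(key)]
```

Calling `missing_credentials(load_env_file("genius_api.env"))` returns an empty list when the file is complete, and otherwise names the variables that still need a value.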
Note
Artist Filter: By default, scrapper/scraper.py contains a strict condition that only downloads and saves lyrics explicitly matched to the artist "ABBA". If you are training TrackBack on a different artist or a generic dataset, open scrapper/scraper.py and adjust or remove the "ABBA" condition to suit your preference!
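One way to make the artist condition adjustable is to turn it into a small configurable check. The sketch below is illustrative only: `should_save` and `normalize` are hypothetical names, and the actual condition in scrapper/scraper.py may be written differently.

```python
def normalize(name):
    """Lowercase a name and collapse whitespace so comparisons are lenient."""
    return " ".join(name.casefold().split())


def should_save(song_artist, allowed_artists=("ABBA",)):
    """Decide whether lyrics for this artist should be kept.

    Passing allowed_artists=None disables the filter entirely, which suits
    a generic dataset that is not tied to a single artist.
    """
    if allowed_artists is None:
        return True
    allowed = {normalize(artist) for artist in allowed_artists}
    return normalize(song_artist) in allowed
```

With the default argument this keeps the ABBA-only behavior; passing a different tuple of names, or `None`, changes the filter without touching the rest of the scraper.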
Execute the offline training pipeline. This will process the data, extract embeddings, train the neural network, and generate the learning curve plot:
python src/train.py
Launch the interactive Streamlit dashboard:
streamlit run src/app.py