- Team Members
- Development Workflow
- Setup & Running
- Data Strategy & Model Training
- Frontend Application
- Myroslav Natalchenko
- Kiryl Sankouski
- Michał Zach
To ensure code stability and minimize merge conflicts, we will strictly follow a Fork & Branch workflow.
- Each team member must fork the main Emotify repository to their personal GitHub account
- Create a specific branch in your fork for your tasks
- Once the task is complete, open a Pull Request (PR) from your fork's branch to the upstream repository's main branch
- Python 3.10+
- Node.js 18+
# Create and activate a virtual environment
python -m venv .venv
source .venv/bin/activate # Windows: .venv\Scripts\activate
# Install dependencies
pip install -r requirements.txt
Create a .env file in the project root with your Spotify credentials:
SPOTIFY_CLIENT_ID=your_client_id
SPOTIFY_CLIENT_SECRET=your_client_secret
SPOTIFY_REDIRECT_URI=http://127.0.0.1:5000/callback
# Start the Flask server (http://127.0.0.1:5000)
python app.py
On first run the backend downloads the MERT model from Hugging Face (~375 MB).
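Before starting the backend, it can help to confirm that the three Spotify variables are actually present in .env. The following is a minimal standard-library sketch, not the project's own loader: the helper names `parse_env` and `missing_keys` are hypothetical, and the real application may read its configuration differently (for example through a dotenv package).

```python
from pathlib import Path

# Variable names taken from the .env template in this README.
REQUIRED_KEYS = (
    "SPOTIFY_CLIENT_ID",
    "SPOTIFY_CLIENT_SECRET",
    "SPOTIFY_REDIRECT_URI",
)


def parse_env(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blanks and # comments."""
    values = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_keys(values: dict[str, str]) -> list[str]:
    """Return the required keys that are absent or empty."""
    return [key for key in REQUIRED_KEYS if not values.get(key)]


if __name__ == "__main__":
    env_path = Path(".env")
    text = env_path.read_text(encoding="utf-8") if env_path.exists() else ""
    missing = missing_keys(parse_env(text))
    print("Missing keys:", ", ".join(missing) if missing else "none")
```

Run from the project root, the script prints any variable that still needs a value.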
cd frontend
npm install
npm run dev # http://localhost:3000
To achieve accurate and scalable emotion recognition in music, Emotify adopts a feature-based, two-step pipeline:
- High-level audio representation extraction using a large pretrained music model
- Supervised training of a lightweight emotion classifier on extracted embeddings
This approach decouples heavy audio processing from model training: embeddings are computed once per track and reused across training runs, which reduces training cost and speeds up experimentation.
As a foundation for emotion modeling, we use the MTG-Jamendo Dataset, specifically the subset annotated with mood/theme tags.
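Because a track can carry several mood/theme tags, the labels are naturally multi-label. The sketch below shows one way to turn tag annotations into multi-hot vectors. It assumes a tab-separated annotation file with a header row, the track id in the first column, and tags from the sixth column onward, each prefixed with `mood/theme---`; that layout, the prefix constant, and the function names are assumptions to verify against the dataset files you download.

```python
import csv
import io

# Assumed tag prefix in the annotation file; adjust if your copy differs.
TAG_PREFIX = "mood/theme---"


def read_track_tags(tsv_text: str) -> dict[str, list[str]]:
    """Map track id -> list of tags from tab-separated annotation text."""
    rows = csv.reader(io.StringIO(tsv_text), delimiter="\t")
    next(rows, None)  # skip the header row
    track_tags = {}
    for row in rows:
        if len(row) < 6:
            continue  # ignore malformed or empty lines
        track_id, tags = row[0], row[5:]
        track_tags[track_id] = [tag.removeprefix(TAG_PREFIX) for tag in tags]
    return track_tags


def multi_hot(track_tags: dict[str, list[str]]):
    """Build a sorted tag vocabulary and one 0/1 vector per track."""
    vocab = sorted({tag for tags in track_tags.values() for tag in tags})
    index = {tag: i for i, tag in enumerate(vocab)}
    labels = {}
    for track_id, tags in track_tags.items():
        vector = [0] * len(vocab)
        for tag in tags:
            vector[index[tag]] = 1
        labels[track_id] = vector
    return vocab, labels
```

Sorting the vocabulary keeps the column order stable between runs, so stored label vectors stay aligned with the classifier outputs.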
To transform raw audio into meaningful numerical representations, we use the pretrained MERT model (Acoustic Music Understanding Model with Large-Scale Self-supervised Training), published on Hugging Face as m-a-p/MERT-v1-95M.
Each track is converted into a fixed-size embedding tensor, which is stored as a .npy file.
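The extraction step can be sketched roughly as follows. The model-loading calls follow the usage shown on the MERT model card; everything else is an assumption for illustration: the function names, the choice to mean-pool over time to get a fixed size, and the output path layout are not taken from the Emotify code.

```python
from pathlib import Path

import numpy as np

MODEL_ID = "m-a-p/MERT-v1-95M"


def extract_hidden_states(audio_path):
    """Return MERT hidden states as an array of shape (layers, time, dim)."""
    # Heavy imports stay local so the pooling helpers below work without them.
    import librosa
    import torch
    from transformers import AutoModel, Wav2Vec2FeatureExtractor

    # In a real pipeline, load the processor and model once and reuse them.
    processor = Wav2Vec2FeatureExtractor.from_pretrained(MODEL_ID, trust_remote_code=True)
    model = AutoModel.from_pretrained(MODEL_ID, trust_remote_code=True).eval()

    # Resample to the rate the model expects.
    waveform, _ = librosa.load(audio_path, sr=processor.sampling_rate, mono=True)
    inputs = processor(waveform, sampling_rate=processor.sampling_rate, return_tensors="pt")
    with torch.no_grad():
        outputs = model(**inputs, output_hidden_states=True)
    # Stack per-layer outputs and drop the batch dimension.
    return torch.stack(outputs.hidden_states).squeeze(1).numpy()


def pool_over_time(hidden_states: np.ndarray) -> np.ndarray:
    """Average over the time axis (an assumed pooling choice) to get (layers, dim)."""
    return hidden_states.mean(axis=1).astype(np.float32)


def save_embedding(hidden_states: np.ndarray, out_path) -> tuple:
    """Pool the hidden states and store the result as a .npy file."""
    out_path = Path(out_path)
    out_path.parent.mkdir(parents=True, exist_ok=True)
    embedding = pool_over_time(hidden_states)
    np.save(out_path, embedding)
    return embedding.shape
```

For the 95M checkpoint the hidden states should come out as 13 layers of 768-dimensional frames, so the pooled embedding has the same shape for every track regardless of its duration.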
Our emotion prediction model is trained directly on the extracted MERT embeddings, rather than raw audio or spectrograms.
This design provides:
- Faster training cycles
- Lower hardware requirements
- Strong generalization thanks to MERT pretraining
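To illustrate why training on precomputed embeddings is cheap, here is a deliberately small stand-in: a multi-label logistic-regression tagger written in plain NumPy and fitted with full-batch gradient descent. It is not the Emotify classifier, whose architecture this README does not specify and which presumably uses PyTorch. The function names, hyperparameters, and synthetic data are illustrative assumptions.

```python
import numpy as np


def sigmoid(z: np.ndarray) -> np.ndarray:
    return 1.0 / (1.0 + np.exp(-z))


def train_linear_tagger(X, Y, lr=0.5, epochs=200):
    """Fit one logistic regressor per tag with full-batch gradient descent.

    X: (n_tracks, embedding_dim) float array of embeddings.
    Y: (n_tracks, n_tags) array of 0/1 labels.
    """
    n_tracks, dim = X.shape
    W = np.zeros((dim, Y.shape[1]))
    b = np.zeros(Y.shape[1])
    for _ in range(epochs):
        P = sigmoid(X @ W + b)
        # Gradient of the mean binary cross-entropy with respect to the logits.
        grad = (P - Y) / n_tracks
        W -= lr * (X.T @ grad)
        b -= lr * grad.sum(axis=0)
    return W, b


def predict(X, W, b, threshold=0.5):
    """Return a 0/1 matrix with one column per tag."""
    return (sigmoid(X @ W + b) >= threshold).astype(int)


if __name__ == "__main__":
    # Synthetic stand-in for embeddings loaded from .npy files.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(256, 32))
    Y = (X @ rng.normal(size=(32, 4)) > 0).astype(float)
    W, b = train_linear_tagger(X, Y)
    accuracy = (predict(X, W, b) == Y).mean()
    print(f"training accuracy: {accuracy:.2f}")
```

Because the inputs are small fixed-size vectors, a model like this trains in well under a second on a CPU, which is the practical benefit of keeping the heavy audio model out of the training loop.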
Datasets
- MTG-Jamendo Dataset (mood/theme subset)
Pretrained Models
- MERT (m-a-p/MERT-v1-95M, Hugging Face)
Core Stack
- Python 3.10+, PyTorch, NumPy, librosa
- Hugging Face Transformers (m-a-p/MERT-v1-95M)
- Flask + Flask-SQLAlchemy (REST API + SQLite analysis cache)
- Next.js 16 / React 19, Tailwind CSS, Recharts (frontend)
The Emotify frontend is a web application built with Next.js.