This project implements sequence-to-sequence encoder-decoder architectures for Grapheme-to-Phoneme (G2P) conversion using PyTorch.
The project compares three sequence modelling approaches:
- Bottleneck Encoder-Decoder
- Fixed Context Vector Decoder
- Attention-Based Decoder
The models are trained on the CMU Pronouncing Dictionary dataset (CMUdict).
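
As a rough illustration of how CMUdict entries could become training pairs, the sketch below parses one line into a grapheme sequence and a phoneme sequence. The function name `parse_cmudict_line` is hypothetical, and two choices are assumptions rather than anything confirmed by this project: words are lowercased, and stress digits are stripped from the phonemes.

```python
import re


def parse_cmudict_line(line):
    """Return (graphemes, phonemes) for one CMUdict-style line, or None to skip it."""
    line = line.strip()
    # Comment lines in the classic CMUdict file start with ";;;".
    if not line or line.startswith(";;;"):
        return None
    word, *phones = line.split()
    # Alternate pronunciations carry a marker such as "(1)"; drop it.
    word = re.sub(r"\(\d+\)$", "", word)
    # Assumption: stress digits (0, 1, 2) are removed from the target phonemes.
    phones = [re.sub(r"\d", "", p) for p in phones]
    return list(word.lower()), phones


# Illustrative lines in the CMUdict format.
primary = parse_cmudict_line("HELLO  HH AH0 L OW1")
variant = parse_cmudict_line("HELLO(1)  HH EH0 L OW1")
print(primary)
print(variant)
```

Keeping the stress digits instead would enlarge the phoneme vocabulary and make the task harder; which option fits depends on how PER is meant to be scored.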
Topics covered:
- Sequence-to-Sequence Learning
- LSTMs from Scratch
- Encoder-Decoder Architectures
- Attention Mechanisms
- Cross-Attention
- Speech and Language Processing
- Grapheme-to-Phoneme Conversion
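
To make "LSTMs from scratch" concrete, here is a minimal sketch of a single LSTM cell built from linear layers rather than `nn.LSTM`. The class name `LSTMCellScratch`, the layer names, and the sizes are illustrative assumptions; the project's own implementation may be organised differently.

```python
import torch
import torch.nn as nn


class LSTMCellScratch(nn.Module):
    """One LSTM step written out by hand (hypothetical name, not the project's class)."""

    def __init__(self, input_size, hidden_size):
        super().__init__()
        # Each projection produces all four gate pre-activations at once.
        self.x2h = nn.Linear(input_size, 4 * hidden_size)
        self.h2h = nn.Linear(hidden_size, 4 * hidden_size, bias=False)

    def forward(self, x, state):
        h_prev, c_prev = state
        gates = self.x2h(x) + self.h2h(h_prev)
        i, f, g, o = gates.chunk(4, dim=-1)
        i = torch.sigmoid(i)   # input gate
        f = torch.sigmoid(f)   # forget gate
        g = torch.tanh(g)      # candidate cell values
        o = torch.sigmoid(o)   # output gate
        c = f * c_prev + i * g
        h = o * torch.tanh(c)
        return h, c


# Usage: one step over a batch of 4 embedded characters (sizes are arbitrary).
cell = LSTMCellScratch(input_size=8, hidden_size=16)
x = torch.randn(4, 8)
h0 = torch.zeros(4, 16)
c0 = torch.zeros(4, 16)
h1, c1 = cell(x, (h0, c0))
print(h1.shape, c1.shape)
```

An encoder would apply this cell in a loop over the grapheme sequence, carrying `(h, c)` from one step to the next.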
Features:
- Custom LSTM implementation (without nn.LSTM)
- Dot-product attention implementation
- Greedy decoding
- Hyperparameter tuning
- Phoneme Error Rate (PER) evaluation
- Attention heatmap visualisation
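
The PER evaluation listed above can be sketched as an edit-distance computation over phoneme sequences. This version assumes a corpus-level definition (total edits divided by total reference phonemes); the project may instead average per word. The function names are hypothetical.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two phoneme sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def phoneme_error_rate(references, hypotheses):
    """Assumed definition: total edits / total reference phonemes."""
    total_edits = sum(edit_distance(r, h) for r, h in zip(references, hypotheses))
    total_phonemes = sum(len(r) for r in references)
    return total_edits / total_phonemes


# Toy data: one substitution plus one deletion over 7 reference phonemes.
refs = [["HH", "AH", "L", "OW"], ["K", "AE", "T"]]
hyps = [["HH", "EH", "L", "OW"], ["K", "AE"]]
per = phoneme_error_rate(refs, hyps)
print(f"PER = {per:.3f}")
```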
Technologies:
- Python
- PyTorch
- NumPy
- Matplotlib
- Pandas
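Of the three models, the attention-based encoder-decoder performed best. Attention lets the decoder consult every encoder hidden state at each output step instead of relying on a single fixed-size hidden state, which removes the hidden-state bottleneck.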
Key findings for the attention-based model:
- Faster convergence
- Lower Phoneme Error Rate (PER)
- Better handling of longer words
- Improved sequence alignment
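
The alignment behaviour comes from the attention weights themselves. Below is a minimal sketch of unscaled dot-product attention for one decoder step, assuming encoder and decoder share the same hidden size. The function name, tensor shapes, and padding mask are illustrative assumptions, not the project's exact code.

```python
import torch


def dot_product_attention(query, keys, values, mask=None):
    """query: (batch, hidden); keys and values: (batch, src_len, hidden)."""
    # Score each encoder position by its dot product with the decoder state.
    scores = torch.bmm(keys, query.unsqueeze(-1)).squeeze(-1)  # (batch, src_len)
    if mask is not None:
        # Padding positions (mask == False) get zero weight after the softmax.
        scores = scores.masked_fill(~mask, float("-inf"))
    weights = torch.softmax(scores, dim=-1)
    # The context vector is the attention-weighted sum of the encoder states.
    context = torch.bmm(weights.unsqueeze(1), values).squeeze(1)  # (batch, hidden)
    return context, weights


torch.manual_seed(0)
encoder_states = torch.randn(2, 5, 16)   # 2 words, 5 graphemes, hidden size 16
decoder_state = torch.randn(2, 16)
mask = torch.tensor([[True] * 5, [True] * 3 + [False] * 2])  # 2nd word has 3 letters
context, weights = dot_product_attention(decoder_state, encoder_states,
                                         encoder_states, mask)
print(context.shape, weights.shape)
```

Stacking `weights` across decoder steps gives a phoneme-by-grapheme matrix, which is the kind of data an attention heatmap would display.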
- notebooks/ → Main notebook
- report/ → Final report
Kamva