NLP Final Project — French to English Machine Translation

A complete machine translation pipeline that compares an LSTM Seq2Seq model trained from scratch with a fine-tuned Transformer (MarianMT) on the OPUS-100 French-English dataset.

Project Overview

Item	Details
Task	Machine Translation (FR → EN)
Dataset	OPUS-100 (fr-en) — HuggingFace
Model 1	LSTM Seq2Seq (trained from scratch)
Model 2	Transformer — Helsinki-NLP/opus-mt-fr-en (fine-tuned)

Results

Metric	LSTM	Transformer
BLEU Score	0.23	39.38
Test Loss	5.97	1.54
Best Epoch	6/10	3/3
Overfitting	Yes	No
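For readers unfamiliar with the metric, the following is a minimal sketch of corpus-level BLEU on a 0-100 scale. It is an illustration only: this README does not say which BLEU implementation or tokenization produced the scores above, so the whitespace tokenization, the absence of smoothing, and the function names `ngrams` and `corpus_bleu` are all assumptions.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count all n-grams of length n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(hypotheses, references, max_n=4):
    """Unsmoothed corpus BLEU (0-100) with one reference per hypothesis.

    Assumption: sentences are tokenized by whitespace, which may differ
    from the tokenization used in the notebooks.
    """
    matches = [0] * max_n   # clipped n-gram matches, per order
    totals = [0] * max_n    # hypothesis n-gram counts, per order
    hyp_len = ref_len = 0

    for hyp, ref in zip(hypotheses, references):
        h, r = hyp.split(), ref.split()
        hyp_len += len(h)
        ref_len += len(r)
        for n in range(1, max_n + 1):
            h_counts, r_counts = ngrams(h, n), ngrams(r, n)
            # Clip each hypothesis n-gram count by its count in the reference.
            matches[n - 1] += sum(min(c, r_counts[g]) for g, c in h_counts.items())
            totals[n - 1] += sum(h_counts.values())

    # Without smoothing, a single zero precision makes the score zero.
    if hyp_len == 0 or min(matches) == 0:
        return 0.0

    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    # Brevity penalty punishes hypotheses shorter than the references.
    brevity = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return 100 * brevity * math.exp(log_precision)
```

Calling `corpus_bleu(model_outputs, reference_translations)` on two equal-length lists of strings returns a single score for the whole test set. An established library would normally be preferable for reported numbers, since tokenization and smoothing choices change the result.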

Repository Structure

NLP-Machine-Translation/
├── Notebook1_EDA_Preprocessing.ipynb
├── Notebook2_LSTM.ipynb
├── Notebook3_Transformer.ipynb
├── Notebook4_Comparison_Demo.ipynb
├── requirements.txt
└── README.md

How to Run

1. Clone the repository:
   git clone https://github.com/medattia/NLP-Machine-Translation.git
   cd NLP-Machine-Translation
2. Install dependencies:
   pip install -r requirements.txt
3. Run the notebooks in order: Notebook 1 → Notebook 2 → Notebook 3 → Notebook 4

Note: The notebooks were built and trained on Google Colab with a T4 GPU. Mounting Google Drive is required to save and load data and models between notebooks.
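A minimal sketch of how a shared working directory might be set up so the notebooks can pass data and models to each other. The helper name `get_project_dir`, the Drive subfolder, and the local fallback folder are hypothetical; the notebooks may use different paths. Only `drive.mount` from `google.colab` is an existing Colab call.

```python
from pathlib import Path


def get_project_dir(local_fallback="nlp_mt_artifacts",
                    drive_subdir="NLP-Machine-Translation"):
    """Return a directory for saving data and models between notebooks.

    Inside Colab this mounts Google Drive and uses a subfolder of it
    (folder name is an assumption). Elsewhere it falls back to a local
    folder so the same code still runs.
    """
    try:
        from google.colab import drive  # only importable inside Colab
    except ImportError:
        base = Path(local_fallback)
    else:
        drive.mount("/content/drive")
        base = Path("/content/drive/MyDrive") / drive_subdir

    base.mkdir(parents=True, exist_ok=True)
    return base
```

Each notebook could call `get_project_dir()` once at the top and build every save and load path from the returned directory.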

Architecture

LSTM Seq2Seq

2-layer Encoder LSTM + 2-layer Decoder LSTM
Embedding dim: 256 | Hidden dim: 512
27.8M trainable parameters
Word-level tokenization (20k vocabulary)
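The parameter count can be roughly sanity-checked from the dimensions listed above. The sketch below assumes a plain encoder-decoder without attention, separate 20,000-word embedding tables for French and English, a single linear output layer over the English vocabulary, and the PyTorch `nn.LSTM` convention of two bias vectors per layer. None of these details are confirmed by this README, and the function names are illustrative.

```python
def lstm_params(input_dim, hidden_dim, num_layers):
    """Parameters of a stacked unidirectional LSTM (PyTorch convention).

    Each layer has four gates, each with input weights, recurrent
    weights, and two bias vectors.
    """
    total = 0
    for layer in range(num_layers):
        in_dim = input_dim if layer == 0 else hidden_dim
        total += 4 * hidden_dim * (in_dim + hidden_dim + 2)
    return total


def seq2seq_params(src_vocab, tgt_vocab, emb_dim=256, hidden_dim=512, num_layers=2):
    """Total parameters under the assumptions stated in the lead-in."""
    embeddings = src_vocab * emb_dim + tgt_vocab * emb_dim
    encoder = lstm_params(emb_dim, hidden_dim, num_layers)
    decoder = lstm_params(emb_dim, hidden_dim, num_layers)
    projection = hidden_dim * tgt_vocab + tgt_vocab  # weights + bias
    return embeddings + encoder + decoder + projection


total = seq2seq_params(src_vocab=20_000, tgt_vocab=20_000)
print(f"{total:,}")  # 27,856,416
```

Under these assumptions the total is about 27.86M, close to the 27.8M reported above. Roughly three quarters of it sits in the two embedding tables and the output projection, all of which scale with vocabulary size.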

Transformer (MarianMT)

Pretrained Helsinki-NLP/opus-mt-fr-en
Fine-tuned for 3 epochs on OPUS-100 subset
74M parameters
SentencePiece BPE tokenization (59k subwords)

Dataset

Source: OPUS-100 (fr-en) via HuggingFace
Training pairs: 75,218 (after filtering)
Validation pairs: 1,650
Test pairs: 1,638
Max sentence length: 50 words
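As an illustration of the length filter, the sketch below keeps only sentence pairs where both sides are non-empty and at most 50 words long. Counting words by whitespace, dropping empty sides, and the name `filter_pairs` are assumptions; the notebooks may apply additional or different criteria.

```python
def filter_pairs(pairs, max_words=50):
    """Keep (french, english) pairs whose sides are both 1..max_words words.

    Assumption: a "word" is a whitespace-separated token.
    """
    kept = []
    for fr, en in pairs:
        fr, en = fr.strip(), en.strip()
        n_fr, n_en = len(fr.split()), len(en.split())
        if 0 < n_fr <= max_words and 0 < n_en <= max_words:
            kept.append((fr, en))
    return kept
```

For example, `filter_pairs([("Bonjour le monde", "Hello world"), ("", "empty source")])` keeps only the first pair, because the second has an empty French side.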

Demo

The project includes an interactive Gradio demo (Notebook 4) where you can type any French sentence and see both models translate it simultaneously.

Author

Muhammed Attia
GitHub: https://github.com/medattia
