AdelAdool/news-topic-modeling


📰 BBC News Topic Modeling

Discover hidden topics and themes from BBC News articles using Unsupervised NLP techniques: LDA and NMF. Explore insights, visualize topics, and compare model performance! 🚀


📂 Dataset

We use the BBC News Dataset (Kaggle).
The dataset contains news articles categorized into topics such as sports, politics, business, technology, and entertainment.


🛠️ Tools & Libraries

  • Python 🐍
  • Gensim
  • Scikit-learn
  • NLTK / spaCy
  • pyLDAvis
  • WordCloud

📝 Preprocessing

  • Tokenization
  • Lowercasing
  • Stopword removal
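
The three steps above can be sketched with the standard library alone. This is a minimal illustration, not the project's actual pipeline: the `preprocess` helper, the regex tokenizer, and the tiny `STOPWORDS` set are assumptions made for the example, whereas the project lists NLTK / spaCy, whose tokenizers and stopword lists are more complete.

```python
import re

# Small illustrative stopword list (an assumption for this sketch);
# NLTK and spaCy ship much larger lists.
STOPWORDS = {"a", "an", "and", "the", "of", "in", "on", "to", "is", "for", "with"}

# Tokenize on runs of lowercase letters; digits and punctuation are dropped.
TOKEN_RE = re.compile(r"[a-z]+")


def preprocess(text):
    """Lowercase the text, tokenize it, and remove stopwords."""
    tokens = TOKEN_RE.findall(text.lower())
    return [tok for tok in tokens if tok not in STOPWORDS]


sample = "The Chancellor is to announce a cut in the interest rates"
print(preprocess(sample))
# → ['chancellor', 'announce', 'cut', 'interest', 'rates']
```

Lowercasing before tokenizing keeps the regex simple and ensures stopword lookup is case-insensitive.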

📊 Topic Modeling

We applied two unsupervised NLP techniques:

  1. Latent Dirichlet Allocation (LDA)
  2. Non-negative Matrix Factorization (NMF)

Both models extract dominant topics and display the most significant words per topic.


🖼️ Results / Visualizations

LDA Visualization

The interactive LDA visualization is saved as: lda_visualization.html

Open it in a browser to explore topic-term relationships. 🌐

WordClouds

LDA Topics: one word cloud image each for Topics 0 through 4.

NMF Topics: one word cloud image each for Topics 0 through 4.

Each image visually represents the most significant words for the topic. 🌟


📈 Model Performance

  • LDA Coherence Score: 0.3530
  • NMF Reconstruction Error: 45.8986

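These two metrics are not directly comparable: the coherence score measures how semantically related each LDA topic's top words are (higher is better), while the reconstruction error measures how closely the NMF factors approximate the document-term matrix (lower is better). To judge which approach captures topics more clearly, compare the top words per topic side by side. 🔍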
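
To make the NMF figure concrete: with scikit-learn's default Frobenius loss, the reconstruction error is the Frobenius norm of the difference between the input matrix and the product of the two factor matrices. The sketch below checks this on a toy corpus; the `docs` list is hypothetical, and it is an assumption that the project's reported value comes from scikit-learn's `reconstruction_err_` attribute.

```python
import numpy as np
from sklearn.decomposition import NMF
from sklearn.feature_extraction.text import TfidfVectorizer

# Toy corpus (hypothetical, not taken from the BBC dataset).
docs = [
    "the team won the football match with a late goal",
    "the striker scored a goal in the cup final match",
    "the bank raised interest rates as inflation grew",
    "shares fell after the bank reported weak profits",
]

# Dense TF-IDF matrix X of shape (n_documents, n_terms).
X = TfidfVectorizer(stop_words="english").fit_transform(docs).toarray()

nmf = NMF(n_components=2, init="nndsvda", random_state=0, max_iter=500)
W = nmf.fit_transform(X)  # document-topic weights
H = nmf.components_       # topic-term weights

# Frobenius norm of the residual X - WH (np.linalg.norm default for 2-D input).
manual_err = np.linalg.norm(X - W @ H)

print("manual:      ", manual_err)
print("scikit-learn:", nmf.reconstruction_err_)
```

Because this norm is not normalized, its magnitude grows with the size of the matrix, so a value is only meaningful relative to other NMF runs on the same data.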


🚀 Usage

  1. Clone the repo:

git clone https://github.com/your-username/bbc-news-topic-modeling.git
cd bbc-news-topic-modeling

  2. Install required packages

  3. Run the notebook or Python script:

python topic_modeling.py

WordClouds and LDA visualizations are saved automatically. ✅


⚖️ License

This project is licensed under the MIT License. 📝


Made with ❤️ and ☕ by Adel
