Natural Language Processing for Multi-Text Classification

Overview

This study investigates the effectiveness of various machine learning models for multi-class text classification of Urdu news articles from renowned Pakistani media organizations such as ARY, Geo, Jang, Express and Dunya News.

Machine Learning Models

After scraping 1,500 articles from the websites of these media outlets, four models were evaluated for their ability to classify Urdu content into distinct categories: Multinomial Naïve Bayes (MNB), a Neural Network, Logistic Regression, and Random Forest.
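To make the best-scoring approach concrete, here is a minimal, dependency-free sketch of Multinomial Naïve Bayes with Laplace smoothing. It is not the code from the notebooks, which may well rely on a library such as scikit-learn. The function names `train_mnb` and `predict_mnb`, the whitespace tokenization, the toy sentences, and the category names `sports` and `politics` are all assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict


def train_mnb(docs, labels, alpha=1.0):
    """Fit Multinomial Naive Bayes on whitespace-tokenized documents."""
    class_docs = Counter(labels)        # number of documents per class
    word_counts = defaultdict(Counter)  # token counts per class
    vocab = set()
    for text, label in zip(docs, labels):
        tokens = text.split()
        word_counts[label].update(tokens)
        vocab.update(tokens)

    model = {"alpha": alpha, "vocab": vocab, "classes": {}}
    for label, n_docs in class_docs.items():
        total_tokens = sum(word_counts[label].values())
        model["classes"][label] = {
            "log_prior": math.log(n_docs / len(docs)),
            "counts": word_counts[label],
            # Laplace smoothing adds alpha once per vocabulary entry.
            "denom": total_tokens + alpha * len(vocab),
        }
    return model


def predict_mnb(model, text):
    """Return the class with the highest log posterior for one document."""
    alpha, vocab = model["alpha"], model["vocab"]
    best_label, best_score = None, -math.inf
    for label, stats in model["classes"].items():
        score = stats["log_prior"]
        for token in text.split():
            if token in vocab:  # tokens never seen in training are skipped
                score += math.log((stats["counts"][token] + alpha) / stats["denom"])
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical toy corpus; the study itself uses the scraped news articles.
train_docs = [
    "کرکٹ میچ ٹیم",
    "ٹیم کرکٹ کھلاڑی",
    "حکومت وزیراعظم پارلیمان",
    "وزیراعظم حکومت اجلاس",
]
train_labels = ["sports", "sports", "politics", "politics"]

model = train_mnb(train_docs, train_labels)
print(predict_mnb(model, "کرکٹ ٹیم"))  # sports
```

Working in log space avoids numeric underflow when many token probabilities are multiplied, and the smoothing term keeps a single unseen word/class pair from zeroing out the whole score.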

Accuracies

MNB: 96.3% on internal test data and 98% on third-party test data.

Neural Networks: 95.6% on internal test data.

Logistic Regression: 94.6% on internal test data.

Random Forest: 84.2% on internal test data.
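The figures above are accuracies, that is, the share of test articles whose predicted category matches the true one. A minimal sketch of that calculation follows, assuming predictions and true labels are available as equal-length lists. The helper name `accuracy` and the sample labels are hypothetical and are not taken from the study's data.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that exactly match the true labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("cannot compute accuracy on an empty test set")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


# Hypothetical labels for five test articles (not results from the study).
y_true = ["sports", "politics", "business", "sports", "politics"]
y_pred = ["sports", "politics", "sports", "sports", "politics"]

print(accuracy(y_true, y_pred))  # 4 of 5 correct -> 0.8
```

The same function would apply unchanged to both the internal test split and a third-party test set; only the inputs differ.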

File Details:

Scraping_NewsArticles: Web scraping code for the specified media outlets.

Cleaning + EDA: Data cleaning, preprocessing, and exploratory data analysis (EDA).

Model1_MNB: Implementation of Multinomial Naïve Bayes.

Model2_NN: Implementation of Neural Network.

Model3_LogisticRegression: Implementation of Logistic Regression.

Model4_RandomForest: Implementation of Random Forest Classifier.

Research Paper: Comprehensive details of our study.
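The README does not describe what the Cleaning + EDA step does in detail. As a rough illustration only, cleaning Urdu text before classification might look like the sketch below, which assumes the goal is to keep Arabic-script words and drop diacritics, digits, punctuation, and Latin text. The function name `clean_urdu` and the exact character ranges are assumptions, not the notebook's actual rules.

```python
import re
import unicodedata

# Arabic-script diacritics (harakat), U+064B through U+0652.
DIACRITICS = re.compile(r"[\u064B-\u0652]")
# Anything outside the Arabic Unicode block that is not whitespace.
NON_ARABIC_SCRIPT = re.compile(r"[^\u0600-\u06FF\s]")
# Punctuation and digits that live inside the Arabic block.
ARABIC_PUNCT_DIGITS = re.compile(r"[\u060C\u061B\u061F\u06D4\u0660-\u0669\u06F0-\u06F9]")


def clean_urdu(text):
    """Keep only Arabic-script words, separated by single spaces."""
    text = unicodedata.normalize("NFC", text)   # canonical composed form
    text = DIACRITICS.sub("", text)             # drop vowel marks
    text = NON_ARABIC_SCRIPT.sub(" ", text)     # drop Latin, ASCII digits, symbols
    text = ARABIC_PUNCT_DIGITS.sub(" ", text)   # drop Urdu punctuation and digits
    return " ".join(text.split())               # collapse repeated whitespace


print(clean_urdu("کرکٹ میچ 2023! Cricket"))
```

Whether diacritics or digits should be removed depends on the task, so these choices are worth checking against the notebook and the research paper.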

