
NightFuryAnalytics/multi-text-classification


Natural Language Processing for Multi-Text Classification

Overview

This study investigates the effectiveness of various machine learning models for multi-class text classification of Urdu news articles from renowned Pakistani media organizations such as ARY, Geo, Jang, Express and Dunya News.

Machine Learning Models

After scraping 1500 articles from these outlets' websites, we evaluated four models on their ability to classify the Urdu articles into distinct categories: Multinomial Naïve Bayes (MNB), a Neural Network, Logistic Regression and Random Forest.
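MNB was the strongest model. The following is a minimal, dependency-free sketch of multinomial Naive Bayes with add-one (Laplace) smoothing, assuming each article has already been cleaned and split into tokens. It illustrates the technique and is not the code in Model1_MNB, which may well use a library such as scikit-learn. The training documents and the `sports`/`business` labels are hypothetical.

```python
import math
from collections import Counter


class MultinomialNB:
    """Minimal multinomial Naive Bayes over pre-tokenized documents."""

    def __init__(self, alpha: float = 1.0) -> None:
        self.alpha = alpha  # add-alpha smoothing strength (Laplace smoothing when 1.0)

    def fit(self, docs: list[list[str]], labels: list[str]) -> "MultinomialNB":
        self.classes = sorted(set(labels))
        doc_counts = Counter(labels)
        # Log prior: share of training documents in each class.
        self.log_prior = {c: math.log(doc_counts[c] / len(labels)) for c in self.classes}
        # Per-class token frequencies.
        self.token_counts = {c: Counter() for c in self.classes}
        for tokens, label in zip(docs, labels):
            self.token_counts[label].update(tokens)
        # Shared vocabulary and total token count per class, used in the smoothed estimate.
        self.vocab = set().union(*self.token_counts.values())
        self.total_tokens = {c: sum(self.token_counts[c].values()) for c in self.classes}
        return self

    def _log_prob(self, token: str, c: str) -> float:
        # Smoothed P(token | class), so an unseen (token, class) pair does not zero out the class.
        num = self.token_counts[c][token] + self.alpha
        den = self.total_tokens[c] + self.alpha * len(self.vocab)
        return math.log(num / den)

    def predict(self, tokens: list[str]) -> str:
        # Score = log prior + sum of log-likelihoods; repeated tokens count once per occurrence.
        scores = {}
        for c in self.classes:
            in_vocab = [t for t in tokens if t in self.vocab]  # skip out-of-vocabulary tokens
            scores[c] = self.log_prior[c] + sum(self._log_prob(t, c) for t in in_vocab)
        return max(scores, key=scores.get)


# Hypothetical toy training data: tokenized Urdu snippets with made-up category labels.
train_docs = [
    ["کرکٹ", "میچ", "ٹیم"],
    ["میچ", "کھلاڑی"],
    ["بجٹ", "معیشت"],
    ["معیشت", "بینک", "بجٹ"],
]
train_labels = ["sports", "sports", "business", "business"]

model = MultinomialNB().fit(train_docs, train_labels)
print(model.predict(["میچ", "ٹیم"]))  # sports
print(model.predict(["بجٹ"]))  # business
```

Working in log space avoids numerical underflow when an article contains hundreds of tokens, and the smoothing term keeps a single unseen word from ruling out an otherwise likely category.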

Accuracies

MNB: 96.3% on internal test data and 98% on third-party test data.

Neural Networks: 95.6% on internal test data.

Logistic Regression: 94.6% on internal test data.

Random Forest: 84.2% on internal test data.
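Each figure above is a classification accuracy: the share of test articles whose predicted category matches the true one. A minimal sketch of that computation follows, together with a count of (gold, predicted) pairs that shows which categories get confused with each other. The label lists are hypothetical, and the notebooks may instead use a library metric such as scikit-learn's `accuracy_score`.

```python
from collections import Counter


def accuracy(y_true: list[str], y_pred: list[str]) -> float:
    """Fraction of predictions that exactly match the gold label."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and of equal length")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def confusion_counts(y_true: list[str], y_pred: list[str]) -> Counter:
    """Count (gold, predicted) pairs; off-diagonal pairs are misclassifications."""
    return Counter(zip(y_true, y_pred))


# Hypothetical gold and predicted categories for four test articles.
gold = ["sports", "business", "sports", "politics"]
pred = ["sports", "business", "politics", "politics"]
print(accuracy(gold, pred))  # 0.75
print(confusion_counts(gold, pred)[("sports", "politics")])  # 1
```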

File Details

Scraping_NewsArticles: Web-scraping code for the specified media outlets.

Cleaning + EDA: Data cleaning, preprocessing and exploratory data analysis (EDA).

Model1_MNB: Implementation of Multinomial Naïve Bayes.

Model2_NN: Implementation of a Neural Network.

Model3_LogisticRegression: Implementation of Logistic Regression.

Model4_RandomForest: Implementation of Random Forest Classifier.

Research Paper: Comprehensive details of our study.
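The README does not spell out the steps in the Cleaning + EDA notebook. As a minimal sketch, assuming cleaning means stripping Urdu and ASCII punctuation, digits and non-Urdu characters, then dropping stopwords, the following uses only Python's `re` module. The `clean_urdu` helper and the tiny stopword list are hypothetical; a real pipeline would use a fuller Urdu stopword list.

```python
import re

# Hypothetical, deliberately tiny stopword list of common Urdu function words.
URDU_STOPWORDS = {"کے", "کی", "کا", "میں", "اور", "ہے", "سے", "نے", "کو"}

# Urdu full stop, Arabic comma, Arabic semicolon and Arabic question mark.
URDU_PUNCT = re.compile("[\u06D4\u060C\u061B\u061F]")
# Arabic-Indic and Extended Arabic-Indic (Urdu) digits.
URDU_DIGITS = re.compile("[\u0660-\u0669\u06F0-\u06F9]")
# Any non-whitespace character outside the Arabic Unicode block (U+0600 to U+06FF),
# e.g. Latin letters, ASCII digits, ASCII punctuation, emoji.
NON_ARABIC_SCRIPT = re.compile(r"[^\u0600-\u06FF\s]")


def clean_urdu(text: str) -> list[str]:
    """Replace punctuation, digits and non-Urdu characters with spaces, then tokenize."""
    for pattern in (URDU_PUNCT, URDU_DIGITS, NON_ARABIC_SCRIPT):
        text = pattern.sub(" ", text)
    # Whitespace tokenization, then stopword removal.
    return [tok for tok in text.split() if tok not in URDU_STOPWORDS]


print(clean_urdu("پاکستان کی ٹیم نے میچ جیت لیا۔ Score 250"))
```

Replacing removed characters with a space rather than deleting them keeps a word followed directly by punctuation from being glued to its neighbour.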

About

This study investigates the effectiveness of various machine learning models for multi-class text classification of 1500+ Urdu news articles.
