Skip to content

Laviesha/Random-Forest-Rangers

Folders and files

NameName
Last commit message
Last commit date

Latest commit

ย 

History

11 Commits
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 

Repository files navigation

Autonomous News Summarizer AI Agent

WhatsApp Image 2025-02-24 at 21 39 45_98c35896

WhatsApp Image 2025-02-24 at 21 42 30_b6ebd4f7

Overview

This project automates news scraping, summarization, SEO optimization, and publishing to a WordPress site. It consists of multiple modules for handling different functionalities, from web scraping to content optimization and automated publishing.

Features

  • Scrapes news articles from NDTV (Delhi News).
  • Extracts and summarizes articles using an NLP model.
  • Enhances content with SEO optimization.
  • Stores processed articles in MongoDB.
  • Publishes articles to a WordPress blog automatically.
  • Runs on a scheduled interval.

Modules

1. app.py (Main Application)

  • Schedules and runs the entire process every 30 minutes.
  • Calls process_articles() for scraping and summarizing.
  • Calls publish_articles() to publish content to WordPress.

Dependencies:
schedule, nltk, scraper.py, publisher.py, seo_optimizer.py

2. scraper.py (News Scraper)

  • Fetches NDTV Delhi news articles.
  • Extracts links and processes articles using newspaper3k.
  • Summarizes content using summarizer.py.
  • Optimizes content for SEO with seo_optimizer.py.
  • Stores articles in MongoDB if not previously stored.

Dependencies:
requests, BeautifulSoup, newspaper3k, pymongo, summarizer.py, seo_optimizer.py

3. publisher.py (WordPress Publisher)

  • Fetches unpublished articles from MongoDB.
  • Publishes content to WordPress using wordpress_xmlrpc.
  • Updates MongoDB to mark articles as published.

Dependencies:
pymongo, wordpress_xmlrpc

4. scheduler.py (Task Scheduler)

  • Runs publish_articles() at set intervals.
  • Stops execution after a predefined runtime.

Dependencies:
schedule, time, publisher.py

5. seo_optimizer.py (SEO Enhancer)

  • Extracts keywords using TF-IDF.
  • Generates SEO-friendly titles and meta descriptions.
  • Improves readability by simplifying text.

Dependencies:
nltk, TextBlob, sklearn.feature_extraction.text.TfidfVectorizer

6. summarizer.py (Text Summarization)

  • Uses a pre-trained BART model (facebook/bart-large-cnn).
  • Produces concise and informative summaries.

Dependencies:
transformers, torch

Installation

Step 1: Install Dependencies

pip install requests beautifulsoup4 pymongo newspaper3k wordpress-xmlrpc nltk transformers torch schedule

Step 2: Download NLTK Resources

import nltk
nltk.download("punkt")
nltk.download("stopwords")

Step 3: Configure Credentials

  • Update MongoDB and WordPress credentials in publisher.py.

Usage

Run the Main Application

python app.py

Run Scraper Separately

python scraper.py

Run Publisher Separately

python publisher.py

Notes

  • Ensure MongoDB is running and accessible.
  • Update WordPress credentials before running.
  • Modify scheduler.py for different scheduling intervals.

Team Members

Website

๐Ÿ”— Live Blog

About

AI-powered Autonomous News Scraper & Publisher ๐Ÿ”น Web Scraping | NLP Summarization | SEO Optimization | Automated Publishing

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors

Languages