This project automates news scraping, summarization, SEO optimization, and publishing to a WordPress site. It consists of multiple modules for handling different functionalities, from web scraping to content optimization and automated publishing.
- Scrapes news articles from NDTV (Delhi News).
- Extracts and summarizes articles using an NLP model.
- Enhances content with SEO optimization.
- Stores processed articles in MongoDB.
- Publishes articles to a WordPress blog automatically.
- Runs on a scheduled interval.
app.py:
- Schedules and runs the entire process every 30 minutes.
- Calls process_articles() for scraping and summarizing.
- Calls publish_articles() to publish content to WordPress.
Dependencies:
schedule, nltk, scraper.py, publisher.py, seo_optimizer.py
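The 30-minute cycle described above could be wired up roughly as follows. This is a minimal sketch rather than the project's actual code: it assumes process_articles() and publish_articles() take no arguments, and run_pipeline and start_scheduler are hypothetical names introduced for illustration.

```python
import time


def run_pipeline(process_articles, publish_articles):
    """Run one full cycle: scrape and summarize first, then publish."""
    process_articles()
    publish_articles()


def start_scheduler(process_articles, publish_articles, interval_minutes=30):
    """Repeat the cycle every `interval_minutes` until the process is stopped."""
    import schedule  # third-party: pip install schedule

    # do() forwards the extra arguments to run_pipeline on every run.
    schedule.every(interval_minutes).minutes.do(
        run_pipeline, process_articles, publish_articles
    )
    while True:
        schedule.run_pending()  # runs the job only when it is due
        time.sleep(1)
```

Note that the schedule library waits one full interval before the first run, so calling run_pipeline(...) once before start_scheduler(...) may be useful if an immediate first cycle is wanted.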
scraper.py:
- Fetches NDTV Delhi news articles.
- Extracts links and processes articles using newspaper3k.
- Summarizes content using summarizer.py.
- Optimizes content for SEO with seo_optimizer.py.
- Stores articles in MongoDB if not previously stored.
Dependencies:
requests, BeautifulSoup, newspaper3k, pymongo, summarizer.py, seo_optimizer.py
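The "store only if not previously stored" step can be sketched as below. The url and published field names and the store_if_new helper are assumptions made for illustration; the source does not say how duplicates are detected. The collection argument only needs find_one() and insert_one(), which a pymongo collection provides.

```python
def store_if_new(collection, article):
    """Insert `article` unless a document with the same URL already exists.

    Returns True if the article was inserted, False if it was skipped.
    """
    if collection.find_one({"url": article["url"]}) is not None:
        return False
    # New articles start out unpublished so the publisher can pick them up.
    collection.insert_one({**article, "published": False})
    return True
```

With pymongo, a unique index on the same field (collection.create_index("url", unique=True)) would additionally guard against two runs inserting the same article at once.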
publisher.py:
- Fetches unpublished articles from MongoDB.
- Publishes content to WordPress using wordpress_xmlrpc.
- Updates MongoDB to mark articles as published.
Dependencies:
pymongo, wordpress_xmlrpc
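The fetch, publish, and mark-as-published steps above might look like the following sketch. The document fields (title, content, published, wp_post_id) and both function names are assumptions, not taken from the project. The WordPress calls use the wordpress_xmlrpc Client, WordPressPost, and NewPost classes.

```python
def make_wordpress_sender(xmlrpc_url, username, password):
    """Return a function that publishes one article and returns the new post ID."""
    from wordpress_xmlrpc import Client, WordPressPost  # third-party
    from wordpress_xmlrpc.methods.posts import NewPost

    client = Client(xmlrpc_url, username, password)

    def send(article):
        post = WordPressPost()
        post.title = article["title"]
        post.content = article["content"]
        post.post_status = "publish"  # publish immediately instead of saving a draft
        return client.call(NewPost(post))

    return send


def publish_pending(collection, send):
    """Publish every unpublished article, marking each one only after it succeeds."""
    count = 0
    for article in list(collection.find({"published": False})):
        post_id = send(article)
        collection.update_one(
            {"_id": article["_id"]},
            {"$set": {"published": True, "wp_post_id": post_id}},
        )
        count += 1
    return count
```

Marking an article only after send() returns means a failed upload leaves it unpublished, so the next run retries it.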
scheduler.py:
- Runs publish_articles() at set intervals.
- Stops execution after a predefined runtime.
Dependencies:
schedule, time, publisher.py
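The "run at intervals, stop after a predefined runtime" behaviour can be illustrated with a deadline check. This sketch deliberately uses a plain loop instead of the schedule library so the stopping logic is visible. run_for and its parameters are hypothetical names, and unlike schedule it runs the job once immediately.

```python
import time


def run_for(job, interval_seconds, max_runtime_seconds,
            clock=time.monotonic, sleep=time.sleep):
    """Call `job` every `interval_seconds` until `max_runtime_seconds` has elapsed.

    Returns the number of times `job` ran.
    """
    deadline = clock() + max_runtime_seconds
    runs = 0
    while clock() < deadline:
        job()
        runs += 1
        remaining = deadline - clock()
        if remaining <= 0:
            break
        # Never sleep past the deadline.
        sleep(min(interval_seconds, remaining))
    return runs
```

For example, run_for(publish_articles, interval_seconds=1800, max_runtime_seconds=6 * 3600) would publish every 30 minutes for six hours. The clock and sleep parameters exist only so the loop can be exercised without real waiting.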
seo_optimizer.py:
- Extracts keywords using TF-IDF.
- Generates SEO-friendly titles and meta descriptions.
- Improves readability by simplifying text.
Dependencies:
nltk, TextBlob, sklearn.feature_extraction.text.TfidfVectorizer
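As an illustration of the keyword and meta-description steps, here is a minimal sketch under stated assumptions. Both function names are hypothetical, the sentence split on "." is a simplification (the project may well use nltk for this), and the 160-character limit is a common convention rather than something the source specifies.

```python
def extract_keywords(text, top_n=5):
    """Rank terms by summed TF-IDF weight, treating each sentence as a document."""
    import numpy as np  # third-party
    from sklearn.feature_extraction.text import TfidfVectorizer  # third-party

    sentences = [s.strip() for s in text.split(".") if s.strip()]
    if not sentences:
        return []
    vectorizer = TfidfVectorizer(stop_words="english")
    matrix = vectorizer.fit_transform(sentences)
    # Sum each term's weight over all sentences to get one score per term.
    scores = np.asarray(matrix.sum(axis=0)).ravel()
    terms = vectorizer.get_feature_names_out()
    ranked = sorted(zip(terms, scores), key=lambda pair: pair[1], reverse=True)
    return [term for term, _ in ranked[:top_n]]


def make_meta_description(text, max_len=160):
    """Collapse whitespace and trim to `max_len` characters at a word boundary."""
    text = " ".join(text.split())
    if len(text) <= max_len:
        return text
    # Reserve three characters for the ellipsis, then drop the partial last word.
    cut = text[:max_len - 3].rsplit(" ", 1)[0]
    return cut.rstrip(",;:. ") + "..."
```

Cutting at a word boundary avoids descriptions that end mid-word when search engines display them.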
summarizer.py:
- Uses a pre-trained BART model (facebook/bart-large-cnn).
- Produces concise and informative summaries.
Dependencies:
transformers, torch
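Summarizing with the BART model named above can be sketched with the transformers pipeline API. The helper names, the 700-word cap, and the length settings are assumptions chosen for illustration. The cap is a rough guard because facebook/bart-large-cnn accepts at most 1024 input tokens.

```python
from functools import lru_cache


def cap_words(text, max_words=700):
    """Keep at most `max_words` words so long articles stay near the model's limit."""
    words = text.split()
    return " ".join(words[:max_words])


@lru_cache(maxsize=1)
def get_summarizer():
    """Load the model once; the first call downloads the weights."""
    from transformers import pipeline  # third-party

    return pipeline("summarization", model="facebook/bart-large-cnn")


def summarize(text, max_length=130, min_length=30):
    """Return a summary of `text` produced by the cached BART pipeline."""
    result = get_summarizer()(
        cap_words(text),
        max_length=max_length,
        min_length=min_length,
        do_sample=False,   # deterministic output
        truncation=True,   # cut any tokens beyond the model's input limit
    )
    return result[0]["summary_text"]
```

Caching the pipeline matters in a scheduled job, since reloading the model for every article would dominate the runtime.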
Install dependencies:

    pip install requests beautifulsoup4 pymongo newspaper3k wordpress-xmlrpc nltk textblob scikit-learn transformers torch schedule

Download the required NLTK data (run once in Python):

    import nltk
    nltk.download("punkt")
    nltk.download("stopwords")

- Update MongoDB and WordPress credentials in publisher.py.

Run the full pipeline, or a single stage on its own:

    python app.py
    python scraper.py
    python publisher.py

- Ensure MongoDB is running and accessible.
- Update WordPress credentials before running.
- Modify scheduler.py for different scheduling intervals.

