ScrapeSmart ✨

ScrapeSmart is a web scraper and site analysis tool built with Next.js and TypeScript. Enter any URL and it analyzes the site's underlying technology stack and tallies the extractable entities, then lets you scrape the data and download it as JSON.

Features

Technology Stack Detection: Detects frontend frameworks (React, Next.js, Vue, Angular), CMS platforms (WordPress), backend technologies, server software, and CDNs (Cloudflare) using response headers and DOM markers.
Pre-Scrape Entity Discovery: Scans the target URL and provides a statistical breakdown of available entities (Links, Images, Headings, Paragraphs, Tables, and Metadata) before you commit to scraping.
Smart Extraction: Extracts rich, structured data using cheerio for lightning-fast parsing.
Interactive UI: Built with Next.js App Router, Tailwind CSS, and Framer Motion for a stunning, glassmorphism-inspired dark mode experience.
Export Capabilities: Instantly copy scraped JSON to your clipboard or download it directly as a .json file for further data processing.
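
To illustrate how detection from headers and DOM markers can work, here is a minimal sketch. It is not ScrapeSmart's actual implementation: the function name `detectTechStack`, the `DetectedTech` type, and the specific marker list are assumptions chosen for illustration, and the sketch uses plain regular expressions so it runs without any dependencies.

```typescript
// Hypothetical helper: names and markers below are illustrative assumptions.
type DetectedTech = { name: string; evidence: string };

function detectTechStack(
  headers: Record<string, string>,
  html: string
): DetectedTech[] {
  const found: DetectedTech[] = [];

  // HTTP header names are case-insensitive, so normalize them first.
  const h: Record<string, string> = {};
  for (const key of Object.keys(headers)) {
    h[key.toLowerCase()] = headers[key];
  }

  // Header-based signals.
  const server = h["server"] ?? "";
  if (h["cf-ray"] !== undefined || /cloudflare/i.test(server)) {
    found.push({ name: "Cloudflare", evidence: "cf-ray or server header" });
  }
  if (h["x-powered-by"] !== undefined) {
    found.push({ name: h["x-powered-by"], evidence: "x-powered-by header" });
  }

  // DOM-marker signals: strings these tools commonly leave in served HTML.
  const markers: Array<{ name: string; pattern: RegExp }> = [
    { name: "Next.js", pattern: /id="__NEXT_DATA__"|\/_next\/static\// },
    { name: "React", pattern: /data-reactroot|id="__NEXT_DATA__"/ },
    { name: "Vue", pattern: /data-v-[0-9a-f]{8}|id="__nuxt"/ },
    { name: "Angular", pattern: /ng-version="/ },
    { name: "WordPress", pattern: /\/wp-content\/|\/wp-includes\// },
  ];
  for (const { name, pattern } of markers) {
    const match = html.match(pattern);
    if (match) {
      found.push({ name, evidence: `HTML marker ${match[0]}` });
    }
  }

  return found;
}
```

Marker-based detection of this kind is a heuristic: sites can strip these headers or markers, so a missing entry does not prove a technology is absent.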

Tech Stack

Framework: Next.js 14+ (App Router)
Language: TypeScript
Styling: Tailwind CSS (v4)
Animations: Framer Motion
Icons: Lucide React
HTTP Client: Axios
HTML Parsing: Cheerio

Getting Started

Prerequisites

Make sure you have Node.js installed. Next.js 14 requires Node.js 18.17 or later.

Installation

Navigate to the project directory:
```
cd scraper-app
```
Install dependencies:
```
npm install
```
Run the development server:
```
npm run dev
```
Open http://localhost:3000 in your browser to see the result.

Usage

Open the application in your browser.
Enter a valid URL (e.g., https://example.com) in the search bar and click Analyze.
View the discovered Technology Stack and the tally of Extractable Entities.
If satisfied with the discovery, click Initiate Scrape.
Browse the scraped data through the interactive tabs (Links, Images, Headings, Metadata).
Click Download Full JSON to save the structured data to your local machine.
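
The entity tally shown after clicking Analyze can be sketched as follows. ScrapeSmart itself parses pages with cheerio; this dependency-free version swaps in regular expressions purely so the example is self-contained, and regex matching is less robust than a real HTML parser. The `EntityTally` shape and the `tallyEntities` and `countMatches` names are assumptions for illustration.

```typescript
// Hypothetical shape of the pre-scrape tally; field names are assumptions.
interface EntityTally {
  links: number;
  images: number;
  headings: number;
  paragraphs: number;
  tables: number;
  metadata: number;
}

// Count all non-overlapping matches of a global regular expression.
function countMatches(html: string, pattern: RegExp): number {
  const matches = html.match(pattern);
  return matches ? matches.length : 0;
}

// Regex stand-in for selector-based counting with an HTML parser.
function tallyEntities(html: string): EntityTally {
  return {
    links: countMatches(html, /<a\b[^>]*\bhref\s*=/gi), // anchors with href
    images: countMatches(html, /<img\b/gi),
    headings: countMatches(html, /<h[1-6]\b/gi),
    paragraphs: countMatches(html, /<p\b/gi), // \b keeps <pre> from matching
    tables: countMatches(html, /<table\b/gi),
    metadata: countMatches(html, /<meta\b/gi),
  };
}
```

A tally like this can be serialized with `JSON.stringify(tally, null, 2)` to produce the kind of readable JSON that the download step saves.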

Limitations

Client-Side Rendering: ScrapeSmart currently uses cheerio, which parses the HTML returned by the server and does not execute JavaScript. If a website relies heavily on client-side rendering (for example, a Single Page Application without SSR), content that is rendered in the browser may not be detected.
Anti-Bot Protection: Some websites actively block automated scrapers. ScrapeSmart attempts to bypass basic blocks using standard User-Agents, but sophisticated protections (like strict Cloudflare challenges or CAPTCHAs) may prevent analysis or scraping.
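
The client-side rendering limitation can be made concrete with a rough heuristic for spotting pages whose server response is mostly an empty shell. This is a sketch under stated assumptions, not part of ScrapeSmart: the `looksClientRendered` name, the mount-point ids (`root`, `app`, `__next`), and the 200-character threshold are all hypothetical choices.

```typescript
// Hypothetical heuristic; ids and threshold are illustrative assumptions.
function looksClientRendered(html: string, minTextLength = 200): boolean {
  // Drop script and style blocks, whose contents are not visible text.
  const withoutCode = html.replace(
    /<(script|style)\b[^>]*>[\s\S]*?<\/\1\s*>/gi,
    " "
  );

  // Strip remaining tags and collapse whitespace to approximate visible text.
  const visibleText = withoutCode
    .replace(/<[^>]+>/g, " ")
    .replace(/\s+/g, " ")
    .trim();

  // Common SPA mount points that arrive empty when there is no SSR.
  const hasEmptyMount =
    /<div\b[^>]*\bid=["'](root|app|__next)["'][^>]*>\s*<\/div>/i.test(html);

  // Flag only when both signals agree, to avoid flagging short static pages.
  return hasEmptyMount && visibleText.length < minTextLength;
}
```

When a check like this returns true, a static parser will see little of what a browser user sees, which is the situation the first limitation describes.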

License

This project is created for educational and utility purposes. Please respect the robots.txt and Terms of Service of the websites you scrape.
