Webcrawl

Extracting contents

This project extracts content from community sites and stores it in our system. It is written entirely in Python, using the Flask framework to create the server and the Beautiful Soup module to scrape content from the web.

The following modules are used:

request - to establish a connection to a website.
pymysql - to perform database operations.
multiprocessing - to extract content from several links concurrently.
Flask - to create the local server.
BeautifulSoup - to scrape content from web pages.

The flow of the project is as follows:

A user interface lets the user submit a query or question. When a query is submitted, the request goes to a local server created with the Flask framework.
The query is extracted and first searched on Google to find links where an answer is likely to be found.
From the search results, the Quora and Wikipedia links are selected for content extraction (search.py).
Content is extracted first from the Quora links, using a custom-built crawler, and then from Wikipedia, if available.
To speed up extraction and reduce the waiting period, the program extracts content from the different links simultaneously using multi-threading, with one thread per link.
After extraction, the questions and answers are stored in a MySQL database.
Finally, the extracted content is displayed to the user.
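The link selection and concurrent extraction steps can be sketched roughly as follows. This is a minimal illustration, not the code in search.py: the names `ALLOWED_SITES`, `select_links`, and `extract_all` are hypothetical, and the per-link extractor is passed in as a parameter because the README does not show how the crawler is implemented. It uses `ThreadPool` from the standard `multiprocessing.pool` module, which is one plausible way to get a thread per link.

```python
from multiprocessing.pool import ThreadPool
from urllib.parse import urlparse

# Hypothetical allow-list: the README only says Quora and Wikipedia links are kept.
ALLOWED_SITES = ("quora.com", "wikipedia.org")


def site_of(link):
    """Return the allowed site a link belongs to, or None if it is not allowed."""
    host = urlparse(link).hostname or ""
    for site in ALLOWED_SITES:
        if host == site or host.endswith("." + site):
            return site
    return None


def select_links(links):
    """Keep Quora links first, then Wikipedia links, and drop everything else."""
    kept = [link for link in links if site_of(link) is not None]
    # sorted() is stable, so links from the same site keep their search order.
    return sorted(kept, key=lambda link: ALLOWED_SITES.index(site_of(link)))


def extract_all(links, extract_one, workers=4):
    """Call extract_one(link) for every link concurrently; return {link: content}."""
    if not links:
        return {}
    with ThreadPool(min(workers, len(links))) as pool:
        contents = pool.map(extract_one, links)  # results come back in input order
    return dict(zip(links, contents))


if __name__ == "__main__":
    search_results = [
        "https://example.com/some-page",
        "https://en.wikipedia.org/wiki/Web_crawler",
        "https://www.quora.com/What-is-a-web-crawler",
    ]
    # Stand-in extractor; a real one would download and parse the page.
    print(extract_all(select_links(search_results), lambda link: "content of " + link))
```

Passing the extractor in as a function keeps the concurrency logic independent of how each page is downloaded and parsed, so the real crawler can be swapped in without changing `extract_all`.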

How to set up the project?

Download the zip folder and import it into the PyCharm IDE.

Install MySQL on your computer.
If required, change the connection settings in the connect method of Database.py.
Run the connect.py file (a local server will be created on port 3000).
Open the link in your browser: http://localhost:3000
Ask a question.
See the result.
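The README does not show the contents of Database.py, so the following is only a sketch of what an adjustable connect method might look like. The function `load_db_settings`, the `WEBCRAWL_DB_*` environment variable names, and the default values (including the database name `webcrawl`) are all assumptions made for illustration; the keyword arguments themselves are ones that `pymysql.connect` accepts.

```python
import os


def load_db_settings(env=os.environ):
    """Build keyword arguments for pymysql.connect(), with defaults for unset values."""
    # All variable names and defaults below are hypothetical placeholders.
    return {
        "host": env.get("WEBCRAWL_DB_HOST", "localhost"),
        "port": int(env.get("WEBCRAWL_DB_PORT", "3306")),
        "user": env.get("WEBCRAWL_DB_USER", "root"),
        "password": env.get("WEBCRAWL_DB_PASSWORD", ""),
        "database": env.get("WEBCRAWL_DB_NAME", "webcrawl"),
    }


def connect():
    """Open a MySQL connection using the settings above."""
    # Imported here so the settings helper can be used without the driver installed.
    import pymysql

    return pymysql.connect(**load_db_settings())
```

Reading the settings from the environment means the credentials can be changed per machine without editing the source file; editing the values directly in the connect method, as the setup steps describe, works just as well.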
