Extracting contents
This project extracts contents from community sites, brings them into our system and stores them. The whole project is written in Python, using the Flask framework to create the server and the Beautiful Soup module to scrape contents from the web. It uses the following modules:
- requests - To establish connections to websites and download pages.
- pymysql - To perform database operations.
- multiprocessing - To enable multithreading.
- Flask - To create the local server.
- BeautifulSoup - To parse HTML and scrape contents.
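Scraping comes down to two of the modules above: requests downloads a page and BeautifulSoup parses it. A minimal sketch of the parsing half follows, assuming we only want the text of the first few `<p>` elements. The function name `extract_paragraphs` and its `limit` parameter are illustrative, not taken from the project. In the project the HTML would come from something like `requests.get(url).text`; here it is passed in directly so the parsing can be tried offline.

```python
from bs4 import BeautifulSoup


def extract_paragraphs(html, limit=3):
    """Return the text of the first `limit` non-empty <p> elements in an HTML page.

    Hypothetical helper for illustration; it uses Python's built-in
    "html.parser" so no extra parser such as lxml is needed.
    """
    soup = BeautifulSoup(html, "html.parser")
    paragraphs = []
    for p in soup.find_all("p"):
        # Join nested strings with spaces and trim surrounding whitespace.
        text = p.get_text(" ", strip=True)
        if text:  # skip empty or whitespace-only paragraphs
            paragraphs.append(text)
        if len(paragraphs) == limit:
            break
    return paragraphs
```

Restricting the search to `<p>` tags keeps navigation menus and sidebars out of the result; a real crawler for Quora or Wikipedia would narrow the search further to the container that holds the answer or article body.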
A user interface is provided where the user can submit a query (a question). When a query is submitted, the request goes to the local server created with the Flask framework. The server extracts the query and first runs a Google search on it to get links to pages that may contain the answer. From the search results, the Quora and Wikipedia links are selected for extraction (see search.py).
The contents of the Quora links are extracted first, using a self-designed crawler, followed by the contents from Wikipedia if a Wikipedia link is available. To speed up extraction and reduce waiting time, the program extracts contents from different links simultaneously using multithreading, with each thread handling one link.
The extracted questions and answers are then stored in a MySQL database. Finally, the extracted contents are displayed to the user.
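The link selection and parallel extraction described above can be sketched with the standard library's `multiprocessing.pool.ThreadPool`, which fits the `multiprocessing` entry in the module list. This is a hedged illustration, not the code in search.py: `select_links`, `fetch_page` and `extract_all` are hypothetical names, the Google search step is not shown, and the fetch function is a parameter so the sketch can be tested without network access.

```python
from multiprocessing.pool import ThreadPool
from typing import Callable, Dict, List
from urllib.parse import urlparse


def select_links(links: List[str]) -> Dict[str, List[str]]:
    """Hypothetical helper: keep only search-result links hosted on Quora or Wikipedia."""
    selected: Dict[str, List[str]] = {"quora": [], "wikipedia": []}
    for link in links:
        host = urlparse(link).netloc.lower()
        if host == "quora.com" or host.endswith(".quora.com"):
            selected["quora"].append(link)
        elif host == "wikipedia.org" or host.endswith(".wikipedia.org"):
            selected["wikipedia"].append(link)
    return selected


def fetch_page(url: str) -> str:
    """Download one page; requests is imported lazily so tests can swap in a fake fetcher."""
    import requests
    response = requests.get(url, timeout=10)
    response.raise_for_status()
    return response.text


def extract_all(links: List[str], fetch: Callable[[str], str] = fetch_page,
                workers: int = 8) -> Dict[str, str]:
    """Fetch every link concurrently, one worker thread per link (up to `workers`)."""
    if not links:
        return {}  # ThreadPool(processes=0) would raise ValueError
    with ThreadPool(processes=min(workers, len(links))) as pool:
        pages = pool.map(fetch, links)  # blocks until every thread has finished
    return dict(zip(links, pages))
```

Usage, keeping Quora links ahead of Wikipedia as the description above does: `sel = select_links(results)` then `pages = extract_all(sel["quora"] + sel["wikipedia"])`. Threads suit this job because each worker spends most of its time waiting on the network, so Python's GIL is not the bottleneck.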
To run the project, download the zip folder directly and import it into the PyCharm IDE, then:
- Install MySQL on your computer.
- If required, change the connection settings in the connect method of Database.py.
- Run connect.py (a local server will start on port 3000).
- Open the link in your browser: http://localhost:3000
- Ask a question.
- See the result.
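The setup steps mention adjusting the connect method of Database.py. A minimal sketch of what such a module might look like with pymysql follows. The connection values, the database name `qa_db`, the table `qa(question, answer, source)` and the helper `save_answers` are all hypothetical; replace them with whatever your MySQL installation and the project actually use.

```python
# Hypothetical connection settings; change them to match your local MySQL install.
DB_SETTINGS = {
    "host": "localhost",
    "user": "root",
    "password": "",
    "database": "qa_db",
}

# Hypothetical schema: CREATE TABLE qa (question TEXT, answer TEXT, source VARCHAR(255))
INSERT_SQL = "INSERT INTO qa (question, answer, source) VALUES (%s, %s, %s)"


def connect(settings=DB_SETTINGS):
    """Open a MySQL connection; pymysql is imported lazily so save_answers can be tested alone."""
    import pymysql
    return pymysql.connect(charset="utf8mb4", **settings)


def save_answers(conn, question, answers):
    """Store (answer, source_url) pairs for one question and return the number of rows written."""
    rows = [(question, answer, source) for answer, source in answers]
    if not rows:
        return 0
    with conn.cursor() as cursor:
        # %s placeholders let pymysql escape the scraped text, avoiding SQL injection.
        cursor.executemany(INSERT_SQL, rows)
    conn.commit()
    return len(rows)
```

Usage: `conn = connect()` followed by `save_answers(conn, "What is Flask?", [(answer_text, url)])`. Passing the connection in, rather than opening one inside the helper, lets several threads' results be written over a single connection after extraction finishes.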
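Running connect.py starts the Flask server that receives the user's question. The sketch below shows the general shape of such a server, under stated assumptions: the route `/ask`, the parameter name `query` and the `find_answers` placeholder are hypothetical, and in the real project that placeholder would run the search, extraction and storage steps.

```python
from flask import Flask, jsonify, request

app = Flask(__name__)


def find_answers(query):
    """Hypothetical placeholder for the real pipeline (Google search, extraction, storage)."""
    return [f"No answers extracted yet for: {query}"]


@app.route("/ask", methods=["GET", "POST"])
def ask():
    # request.values merges query-string and form fields, so both GET and POST work.
    query = (request.values.get("query") or "").strip()
    if not query:
        return jsonify(error="empty query"), 400
    return jsonify(query=query, answers=find_answers(query))


def main():
    # Listen on port 3000, matching the setup steps (http://localhost:3000).
    app.run(host="127.0.0.1", port=3000)
```

Calling `main()` starts the server. Keeping the route logic separate from `app.run` also lets the endpoint be exercised with Flask's `app.test_client()` without opening a port.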