EntityLinkerPDF is a web application designed to recognize named entities in PDF documents and assist with entity linking. Users can upload PDFs, view them, and interact with named entities extracted from the documents.
- Upload and manage PDF documents.
- View PDF documents within the web interface.
- Extract named entities from PDFs using spaCy.
- Link entities to a SQLite database.
- Highlight occurrences of entities within the PDF.
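As a rough illustration of the linking and highlighting features listed above, the sketch below looks up entity mentions in a SQLite table and finds where an entity occurs in a piece of text. It is not the project's actual implementation: the `entities` table, its columns, and the helper names `link_entities` and `find_occurrences` are all hypothetical, and only the Python standard library is used.

```python
import re
import sqlite3


def link_entities(conn, mentions):
    """Map each mention to a row id in a hypothetical `entities` table, or None."""
    links = {}
    for mention in mentions:
        row = conn.execute(
            "SELECT id FROM entities WHERE name = ? COLLATE NOCASE",
            (mention,),
        ).fetchone()
        links[mention] = row[0] if row else None
    return links


def find_occurrences(text, entity):
    """Return (start, end) character spans of case-insensitive matches of `entity`."""
    return [m.span() for m in re.finditer(re.escape(entity), text, flags=re.IGNORECASE)]


# In-memory database with an assumed schema, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE entities (id INTEGER PRIMARY KEY, name TEXT UNIQUE)")
conn.executemany(
    "INSERT INTO entities (name) VALUES (?)",
    [("Ada Lovelace",), ("London",)],
)

links = link_entities(conn, ["ada lovelace", "Paris"])
spans = find_occurrences("Ada met Ada.", "Ada")
print(links)
print(spans)
```

A mention with no matching row maps to `None`, which a front end could use to prompt the user to create or pick an entity by hand.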
These instructions will get you a copy of the project up and running on your local machine for development and testing purposes.
- Conda
- Node.js and npm (for the React front end)
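Before installing, it can help to confirm that these tools are on your `PATH`. The following is a minimal sketch using only the standard library; the helper name `missing_tools` is hypothetical, and the executable names `conda`, `node`, and `npm` are assumed to match your installation.

```python
import shutil


def missing_tools(tools=("conda", "node", "npm")):
    """Return the names in `tools` that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


missing = missing_tools()
print("Missing prerequisites:", ", ".join(missing) if missing else "none")
```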
Follow these steps to get a development environment running.
- Clone the repository:
git clone https://github.com/rlnsanz/EntityLinkerPDF.git
cd EntityLinkerPDF
- Create a Conda environment and install Python dependencies:
- Create a new Conda environment:
conda create --name entitylinkerpdf python=3.10
- Activate the environment:
conda activate entitylinkerpdf
- Install the required Python packages:
pip install -r requirements.txt
- Download the spaCy English language model:
python -m spacy download en_core_web_sm
- Install Node modules for the React front end:
cd client
npm install
cd ..
- Run the application:
- Start the Flask server:
cd server
flask run
- In a new terminal, start the React front end:
cd client
npm start
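Once both processes are started, a quick way to confirm they are accepting connections is to probe their ports. The sketch below assumes the common defaults of port 5000 for `flask run` and port 3000 for `npm start`; this project may be configured differently, so adjust the numbers as needed. The helper name `is_listening` is hypothetical.

```python
import socket


def is_listening(host, port, timeout=1.0):
    """Return True if a TCP connection to (host, port) succeeds within `timeout`."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


# Ports below are assumed defaults, not values confirmed by this project.
for name, port in (("Flask server", 5000), ("React front end", 3000)):
    status = "up" if is_listening("127.0.0.1", port) else "not reachable"
    print(f"{name} on port {port}: {status}")
```

This only checks that something is listening on each port; it does not verify that the listener is this application.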