This project is designed to analyze sentiment from movie reviews using a BERT-based model. The pipeline includes data ingestion, preprocessing, model training, evaluation, and deployment, orchestrated with Apache Airflow. The project utilizes DVC for data versioning and model tracking, MLflow for experiment tracking, and FastAPI for serving the model through an API.
For more information on the product design and vision, refer to the Sentiment Analysis Product Design Template.
Below is a high-level architecture diagram of the Sentiment Analysis project. The diagram gives an overview of how the components are structured, including the web client, API server, ML core template, REST API template, and the deployment pipeline for hosting the application in a cloud environment.
- Python 3.9.16 (newer versions should also work, but may be incompatible with some libraries such as Apache Airflow)
- Virtualenv or Conda for environment management
- DVC for data version control
- MLflow for experiment tracking
- Apache Airflow for orchestrating workflows
- FastAPI for serving the model through an API
- AWS as our preferred cloud platform and MongoDB for data storage
- Other packages listed in requirements.txt
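As a quick sanity check of the prerequisites above, the following sketch reports whether the running interpreter is recent enough and which command-line tools cannot be found on PATH. The executable names (`dvc`, `mlflow`, `airflow`) are assumptions about what the listed packages install, so adjust the list to your setup and run the check after installing the requirements.

```python
import shutil
import sys

# Executable names assumed to be provided by the packages listed above.
REQUIRED_TOOLS = ["dvc", "mlflow", "airflow"]


def python_is_supported(minimum=(3, 9)):
    """Return True if the running interpreter is at least `minimum`."""
    return sys.version_info[:2] >= tuple(minimum)


def missing_tools(tools=REQUIRED_TOOLS):
    """Return the tools that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


print("Python", sys.version.split()[0], "supported:", python_is_supported())
print("Missing tools:", missing_tools() or "none")
```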
- Clone the repository:
  git clone https://github.com/leemaHmaid/Sentiment-Analysis.git
  cd Sentiment-Analysis
- Set up a virtual environment:
  python3 -m venv sent_venv
  source sent_venv/bin/activate
  - Using Conda
    conda create -n sent_venv python=3.9
    conda activate sent_venv
- Install the required packages:
  pip install -r requirements.txt
- MLflow Tracking:
  - During training, MLflow is used to track experiments automatically. Make sure the MLflow tracking server is running:
    mlflow server --backend-store-uri file://$(pwd)/mlruns --default-artifact-root file://$(pwd)/mlruns --host 0.0.0.0 -p 5050
    You can access the MLflow UI to see the experiment metrics at http://localhost:5050
- Model Training:
  - To initiate the entire workflow, run the main.py script. Ensure that the MLflow server is running so that all experiments are tracked:
    python main.py
    This executes the data ingestion, validation, transformation, model training, and evaluation stages in sequence. All generated logs are saved in the logs folder, and the entire workflow is tracked in the MLflow UI. Here is what it looks like (from our run): [screenshot of the MLflow UI]
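Because the training run depends on the tracking server being up, a small guard can check for it before launching the pipeline. This is a minimal sketch, assuming the server listens on the address used in the `mlflow server` command and exposes MLflow's `/health` endpoint; the helper names `mlflow_server_is_up` and `run_training` are hypothetical and not part of the repository.

```python
import subprocess
import sys
import urllib.request

# Address taken from the `mlflow server` command in this README.
TRACKING_URL = "http://localhost:5050"


def mlflow_server_is_up(base_url=TRACKING_URL, timeout=3.0):
    """Return True if the MLflow server answers its /health endpoint."""
    try:
        with urllib.request.urlopen(f"{base_url}/health", timeout=timeout) as resp:
            return resp.status == 200
    except OSError:  # connection refused, timeout, HTTP error, ...
        return False


def run_training(base_url=TRACKING_URL, script="main.py"):
    """Run the pipeline script only if the tracking server is reachable."""
    if not mlflow_server_is_up(base_url):
        print(f"MLflow server not reachable at {base_url}; start it first.")
        return 1
    # Use the current interpreter so the active virtual environment is kept.
    return subprocess.run([sys.executable, script]).returncode
```

Calling `run_training()` from the repository root returns the exit code of `main.py`, or 1 without starting it when the server is down.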
Alternatively, we can use Apache Airflow to schedule and orchestrate the entire workflow of the project, from data ingestion to model training and evaluation.
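To make the orchestration idea concrete, here is a plain-Python illustration (not Airflow code) of the kind of dependency graph such a workflow encodes, and of how a run order follows from it. The stage names are assumptions derived from the steps named in this README, not the task IDs used in the repository's DAG.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Each key maps a stage to the set of stages that must finish before it.
# Stage names are illustrative, not the project's actual task IDs.
DEPENDENCIES = {
    "data_ingestion": set(),
    "data_validation": {"data_ingestion"},
    "data_transformation": {"data_validation"},
    "model_training": {"data_transformation"},
    "model_evaluation": {"model_training"},
}


def execution_order(dependencies):
    """Return one valid run order; raises CycleError if the graph has a cycle."""
    return list(TopologicalSorter(dependencies).static_order())


for stage in execution_order(DEPENDENCIES):
    print(stage)
```

A scheduler such as Airflow adds retries, scheduling, and a UI on top of this ordering, but the "acyclic" requirement is the same: a cycle would leave no valid order to run the stages in.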
- Install Apache Airflow:
  - First, install Apache Airflow using the following script:
    chmod +x install.sh
    ./install.sh
    This downloads Airflow and initializes it. Make sure AIRFLOW_HOME is exported as an environment variable in the terminal you use. Inside airflow.cfg, set load_examples = False and run airflow db reset -y.
- Set Up Airflow:
  - Initialize the Airflow database to set up the necessary tables and users:
    chmod +x setup.sh
    ./setup.sh
    This initializes the Airflow database. Make sure the script contains the following:
    airflow users create --username admin --firstname Admin --lastname User --role Admin --email [email protected] --password admin
- Start the Airflow Scheduler and Webserver:
  - Start the scheduler to manage your tasks and the webserver to access the Airflow UI:
    chmod +x scheduler.sh
    ./scheduler.sh
    chmod +x webserver.sh
    ./webserver.sh
    You can access the Airflow UI at http://localhost:8080 to view and manage DAGs (Directed Acyclic Graphs) for the project. Here is what it looks like (from our run): [screenshot of the Airflow UI]
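Once both processes are started, it can help to confirm that the scheduler is actually alive rather than only that the UI loads. The sketch below assumes an Airflow 2.x webserver at the address above with its `/health` endpoint left at the default; the helper names are hypothetical.

```python
import json
import urllib.request

# Webserver address from this README; /health is the Airflow 2.x
# webserver health endpoint (assumed to be left at its default).
AIRFLOW_URL = "http://localhost:8080"


def component_status(health, component):
    """Extract a component's status from a parsed /health payload."""
    return health.get(component, {}).get("status", "unknown")


def fetch_health(base_url=AIRFLOW_URL, timeout=3.0):
    """Fetch and parse /health; return None if the webserver is unreachable."""
    try:
        with urllib.request.urlopen(f"{base_url}/health", timeout=timeout) as resp:
            return json.load(resp)
    except (OSError, ValueError):  # network failure or invalid JSON
        return None


def report(health):
    """Summarise scheduler and metadata database status as a string."""
    if health is None:
        return "webserver unreachable"
    return ", ".join(
        f"{name}: {component_status(health, name)}"
        for name in ("metadatabase", "scheduler")
    )
```

For example, `print(report(fetch_health()))` prints one line describing both components, or `webserver unreachable` if nothing is listening on port 8080.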
- Set Up Environment Variables: To run the application, you'll need to set up some environment variables. Create a .env file in the root of your project with the following placeholders:
  # .env file
  MONGO_DB_URL=mongodb://your-mongo-url
  TOKEN_SECRET_KEY=your-secret-key
  ACCESS_TOKEN_EXPIRE_MINUTES=30
  # Base URL for the API service
  API_BASE_URL=http://127.0.0.1:8000
  Note: Make sure to replace the placeholders with your actual values. The API_BASE_URL should match the URL you use when running the API service. For local development, it should be set to http://127.0.0.1:8000.

To run anything related to the API, first navigate to the backend directory:
cd backend
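Before starting the backend, it is worth verifying that the .env file defines every variable from the template. The following is a minimal standard-library sketch; the project itself may load the file differently (for example with a dotenv package, which is an assumption), and the helper names are hypothetical.

```python
from pathlib import Path

# Keys taken from the .env template in this README.
REQUIRED_KEYS = [
    "MONGO_DB_URL",
    "TOKEN_SECRET_KEY",
    "ACCESS_TOKEN_EXPIRE_MINUTES",
    "API_BASE_URL",
]


def parse_env(text):
    """Parse KEY=VALUE lines, skipping blanks and full-line # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip("'\"")
    return values


def missing_keys(values, required=REQUIRED_KEYS):
    """Return required keys that are absent or empty."""
    return [key for key in required if not values.get(key)]


def load_env_file(path=".env"):
    """Read and parse a .env file; a missing file yields an empty dict."""
    env_path = Path(path)
    return parse_env(env_path.read_text()) if env_path.is_file() else {}
```

Calling `missing_keys(load_env_file())` from the project root returns an empty list when the file is complete, and otherwise lists the variables that still need values.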
- Run the Setup Script: Before starting the server, run the setup.sh script in the scripts folder to create dummy users in the database:
  sh scripts/setup.sh
- Start the FastAPI Server: To run the API, execute the server.sh script in the scripts folder:
  sh scripts/server.sh
  The API should now be running at http://127.0.0.1:8001. Navigate to http://127.0.0.1:8001/docs and you'll see an interface like this: [screenshot of the API docs]
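The interactive docs page is generated from an OpenAPI schema, so the available endpoints can also be listed from a script. This sketch assumes the server runs at the address above and keeps FastAPI's default schema path `/openapi.json`; it does not assume any particular endpoint names, and the helper names are hypothetical.

```python
import json
import urllib.request

# Address from this README; /openapi.json is FastAPI's default schema
# path (assumed not to be overridden by this project).
API_URL = "http://127.0.0.1:8001"


def list_routes(schema):
    """Return sorted (METHOD, path) pairs from an OpenAPI schema dict."""
    methods = {"get", "post", "put", "patch", "delete", "head", "options"}
    return sorted(
        (method.upper(), path)
        for path, operations in schema.get("paths", {}).items()
        for method in operations
        if method in methods  # skip non-operation keys such as "parameters"
    )


def fetch_schema(base_url=API_URL, timeout=3.0):
    """Download the OpenAPI schema; return None if the API is unreachable."""
    try:
        with urllib.request.urlopen(f"{base_url}/openapi.json", timeout=timeout) as resp:
            return json.load(resp)
    except (OSError, ValueError):  # network failure or invalid JSON
        return None
```

With the server running, `for method, path in list_routes(fetch_schema() or {}): print(method, path)` prints one line per endpoint.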
- Run the Streamlit App: Navigate to the frontend directory and use the following command:
  cd frontend
  streamlit run app_ui.py
- Access the Application: Open your web browser and go to http://localhost:8501. You will see a UI like this when you click Register: [screenshot of the registration page]
If you have any comments, suggestions, or anything you'd like clarified, feel free to reach us via email (Prince, Leema, Asim) or connect with us on LinkedIn (Leema, Prince, Asim).






