This project integrates scholarly journal metadata coming from two heterogeneous sources:
- a graph database (Blazegraph + RDF triples)
- a relational database (SQLite)
The project exposes a unified Python query engine that returns results as Python objects (Journal, Category, Area).
The architecture is modular, extensible, and designed to support heterogeneous data ingestion and querying.
- Source: Directory of Open Access Journals (DOAJ)
- Contains: ISSN, title, languages, publisher, license, APC, DOAJ Seal
- Loaded via: JournalUploadHandler
- Source: Scimago Journal Rank (SJR)
- Contains: categories, areas, quartiles
- Loaded via: CategoryUploadHandler
- CategoryUploadHandler loads JSON into the relational database (SQLite).
- JournalUploadHandler loads CSV into the graph database (Blazegraph).
- CategoryQueryHandler and JournalQueryHandler retrieve data as Pandas DataFrames.
- FullQueryEngine integrates both sources and returns Python objects.
The schema includes:
- IdentifiableEntity
- HasCategory
- HasArea
This structure supports many-to-many relationships between journals, categories, and areas.
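As a rough illustration of how three such tables can express many-to-many links, here is a minimal sketch using Python's built-in `sqlite3` module. Only the table names come from the schema above; every column name (`internal_id`, `issn`, `category`, `quartile`, `area`) and all sample rows are hypothetical, and the project's actual schema may differ.

```python
import sqlite3

# Hypothetical columns: only the three table names come from the README.
SCHEMA = """
CREATE TABLE IdentifiableEntity (
    internal_id TEXT PRIMARY KEY,   -- hypothetical surrogate key
    issn        TEXT NOT NULL       -- journal identifier
);
CREATE TABLE HasCategory (
    internal_id TEXT NOT NULL REFERENCES IdentifiableEntity(internal_id),
    category    TEXT NOT NULL,
    quartile    TEXT,               -- e.g. 'Q1'; may be missing
    PRIMARY KEY (internal_id, category)
);
CREATE TABLE HasArea (
    internal_id TEXT NOT NULL REFERENCES IdentifiableEntity(internal_id),
    area        TEXT NOT NULL,
    PRIMARY KEY (internal_id, area)
);
"""


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory database with one made-up journal."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO IdentifiableEntity VALUES ('j1', '1234-5678')")
    # One journal linked to two categories and one area.
    conn.executemany(
        "INSERT INTO HasCategory VALUES (?, ?, ?)",
        [("j1", "Oncology", "Q1"), ("j1", "Immunology", "Q2")],
    )
    conn.executemany("INSERT INTO HasArea VALUES (?, ?)", [("j1", "Medicine")])
    conn.commit()
    return conn


conn = build_demo_db()
rows = conn.execute(
    "SELECT category, quartile FROM HasCategory "
    "WHERE internal_id = ? ORDER BY category",
    ("j1",),
).fetchall()
print(rows)
```

Keeping the links in separate tables means a journal can carry any number of categories and areas without duplicating its identifier row.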
IdentifiableEntity: base class for all entities with one or more identifiers.
Journal:
- title
- languages
- publisher
- seal
- license
- apc
- hasCategory
- hasArea
Category:
- id
- quartile
Area:
- id
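The attributes listed above could be modelled roughly as follows. This is a sketch with Python dataclasses, not the project's actual classes: the attribute names are taken from the lists, but the grouping into `Journal`, `Category`, and `Area`, the `ids` field, the types, and the default values are all assumptions.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class IdentifiableEntity:
    # Assumption: every entity carries one or more identifiers.
    ids: List[str]


@dataclass
class Area(IdentifiableEntity):
    pass


@dataclass
class Category(IdentifiableEntity):
    quartile: Optional[str] = None  # e.g. "Q1"; assumed optional


@dataclass
class Journal(IdentifiableEntity):
    title: str = ""
    languages: List[str] = field(default_factory=list)
    publisher: Optional[str] = None
    seal: bool = False      # DOAJ Seal (assumed boolean)
    license: str = ""
    apc: bool = False       # article processing charge (assumed boolean)
    hasCategory: List[Category] = field(default_factory=list)
    hasArea: List[Area] = field(default_factory=list)


# Made-up example values, for illustration only.
journal = Journal(
    ids=["1234-5678"],
    title="Example Journal",
    hasCategory=[Category(ids=["Oncology"], quartile="Q1")],
    hasArea=[Area(ids=["Medicine"])],
)
print(journal.hasCategory[0].quartile)
```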
Queries the relational database (SQLite):
- getAllCategories()
- getAllAreas()
- getCategoriesWithQuartile()
- getCategoriesAssignedToAreas()
- getAreasAssignedToCategories()
- getCategoryWithName(): returns all categories whose name partially matches the input string
- getAreaWithName(): returns all areas whose name partially matches the input string
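Partial name matching of the kind described for getCategoryWithName() can be sketched with a parameterized SQL `LIKE` query. The helper name `get_categories_matching`, the single-column `HasCategory` table, and the sample rows below are hypothetical; the project's real query and return type (a Pandas DataFrame) may differ.

```python
import sqlite3


def get_categories_matching(conn: sqlite3.Connection, fragment: str) -> list:
    """Return distinct category names containing `fragment`.

    Note: '%' and '_' inside `fragment` act as LIKE wildcards here.
    """
    pattern = f"%{fragment}%"
    cursor = conn.execute(
        "SELECT DISTINCT category FROM HasCategory "
        "WHERE category LIKE ? ORDER BY category",
        (pattern,),  # bound parameter, so no SQL injection via the fragment
    )
    return [row[0] for row in cursor]


# Hypothetical demo data in an in-memory database.
demo = sqlite3.connect(":memory:")
demo.execute("CREATE TABLE HasCategory (category TEXT)")
demo.executemany(
    "INSERT INTO HasCategory VALUES (?)",
    [("Oncology",), ("Radiology",), ("Law",), ("Oncology",)],
)

matches = get_categories_matching(demo, "olog")
print(matches)
```

SQLite's `LIKE` is case-insensitive for ASCII characters by default, so `"ONCO"` would also match `Oncology` in this sketch.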
Provides simple filters over the graph database:
- getAllJournals()
- getJournalsWithTitle()
- getJournalsPublishedBy()
- getJournalsWithLicense()
- getJournalsWithAPC()
- getJournalsWithDOAJSeal()
- getAllCategories()
- getAllAreas()
- getCategoriesWithQuartile()
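A title filter such as getJournalsWithTitle() might translate into a SPARQL query along these lines. This is only a sketch: the helper names are hypothetical, and the `schema:name` predicate is an assumption, since the README does not say which vocabulary the RDF triples use.

```python
def escape_sparql_literal(text: str) -> str:
    """Escape backslashes and double quotes for a SPARQL string literal.

    Other special characters (e.g. newlines) are not handled in this sketch.
    """
    return text.replace("\\", "\\\\").replace('"', '\\"')


def build_title_query(fragment: str) -> str:
    """Build a case-insensitive 'title contains fragment' SPARQL query."""
    safe = escape_sparql_literal(fragment)
    # Assumption: journal titles are stored with schema:name.
    return f"""
PREFIX schema: <https://schema.org/>
SELECT ?journal ?title
WHERE {{
    ?journal schema:name ?title .
    FILTER(CONTAINS(LCASE(STR(?title)), LCASE("{safe}")))
}}
"""


query = build_title_query('Open "Data"')
print(query)
```

The resulting string could then be sent to the Blazegraph endpoint, for example through SPARQLWrapper's `setQuery`; that step is left out here because it needs a running server.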
Provides composite queries combining graph + relational data:
- getJournalsInCategoriesWithQuartile()
- getJournalsInAreasWithLicense()
- getDiamondJournalsInAreasAndCategoriesWithQuartile()
- getJournalByName(): returns journals whose title contains the input string and that have at least one category or area whose name also contains it
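A composite query like getJournalsInCategoriesWithQuartile() essentially joins graph-side journal rows with relational-side category rows on a shared identifier. The sketch below uses plain dictionaries to stay dependency-free, whereas the handlers described above return Pandas DataFrames. The function name, the `issn` join key, the "empty set means no filter" rule, and the sample rows are all assumptions.

```python
def journals_in_categories_with_quartile(
    journals, categories, wanted_categories, wanted_quartiles
):
    """Return journals linked to a matching (category, quartile) pair.

    journals:   dicts with 'issn' and 'title' (graph side)
    categories: dicts with 'issn', 'category', 'quartile' (relational side)
    An empty filter set is treated as "accept everything" (assumption).
    """
    matching_issns = {
        row["issn"]
        for row in categories
        if (not wanted_categories or row["category"] in wanted_categories)
        and (not wanted_quartiles or row["quartile"] in wanted_quartiles)
    }
    # Keep the graph-side order of journals.
    return [j for j in journals if j["issn"] in matching_issns]


# Made-up rows standing in for the two query handlers' results.
graph_rows = [
    {"issn": "1111-1111", "title": "Journal A"},
    {"issn": "2222-2222", "title": "Journal B"},
]
relational_rows = [
    {"issn": "1111-1111", "category": "Oncology", "quartile": "Q1"},
    {"issn": "2222-2222", "category": "Oncology", "quartile": "Q3"},
]

result = journals_in_categories_with_quartile(
    graph_rows, relational_rows, {"Oncology"}, {"Q1"}
)
print([j["title"] for j in result])
```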
Brigata/
├── main.py
├── impl.py
├── laura.py
├── Yang.py
├── daniele.py
├── li.py
├── baseHandler.py
├── images/
│   ├── workflow.png
│   ├── relational_database_structure.png
│   └── datamodel.png
├── data/
│   ├── doaj.csv
│   └── scimago.json
└── relational.db # auto-generated
Install dependencies:
pip install pandas rdflib SPARQLWrapper sqlalchemy requests
Download Blazegraph:
wget https://github.com/blazegraph/database/releases/download/BLAZEGRAPH_2_1_6_RC/blazegraph.jar
Start the server:
java -server -Xmx1g -jar blazegraph.jar
Verify at http://localhost:9999/blazegraph/.
⚠️ Update the Blazegraph endpoint URL in main.py to match your local setup. The default is http://127.0.0.1:9999/blazegraph/namespace/kb/sparql.
python main.py
This will:
- upload CSV to Blazegraph
- upload JSON to SQLite
- initialize the query engine
- run example queries
| Name | Role | Email |
|---|---|---|
| Tianchi Yang | Graph Query Handler (SPARQL) | [email protected] |
| Yihua Li | Graph Upload Handler (RDF → Blazegraph) | [email protected] |
| Daniele Camagna | Relational DB + Upload Handler | [email protected] |
| Laura Bortoli | Query Engine (Basic & Full) | [email protected] |


