RefMerger is a robust and free Python tool for researchers, librarians, and academics working with large volumes of bibliographic references. It automates the tedious process of merging references from multiple sources and removing duplicates, saving hours of manual work.
Research projects commonly collect references from several sources, such as PubMed, Web of Science, Scopus, or local databases. Each source exports in a different format (.bib, .xml, .csv, .json), and duplicates inevitably appear when the exports are combined. RefMerger solves this by converting everything to RIS (the exchange format supported by reference managers such as Zotero, Mendeley, and EndNote), merging the files, and applying intelligent deduplication.
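To make the conversion step concrete, here is a minimal sketch of how one normalized reference could be written as an RIS record. The function name `to_ris`, the dictionary keys, and the fixed `JOUR` record type are assumptions made for illustration; they are not taken from refmerger.py.

```python
def to_ris(record):
    """Serialize one reference (a plain dict) as an RIS record.

    Assumption: every record is treated as a journal article (TY  - JOUR).
    """
    lines = ["TY  - JOUR"]
    if record.get("title"):
        lines.append(f"TI  - {record['title']}")
    # RIS repeats the AU tag once per author.
    for author in record.get("authors", []):
        lines.append(f"AU  - {author}")
    if record.get("year"):
        lines.append(f"PY  - {record['year']}")
    if record.get("doi"):
        lines.append(f"DO  - {record['doi']}")
    lines.append("ER  - ")  # end-of-record marker
    return "\n".join(lines)


example = {
    "title": "A study of things",
    "authors": ["Silva, A.", "Souza, B."],
    "year": "2020",
    "doi": "10.1000/xyz123",
}
ris_text = to_ris(example)
print(ris_text)
```

Each RIS line is a two-letter tag, two spaces, a hyphen, a space, and the value, which is why the tags above are written with that exact spacing.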
- Multi-Format Support: Automatically converts .bib (BibTeX), .xml (PubMed), .csv, .json, and .ris
- Advanced Deduplication: Uses similarity algorithms to detect duplicates even with title variations or formatting
- Flexibility: Configurable deduplication modes for different needs
- Robust: Handles various encodings and file errors gracefully
- Free and Open-Source: Pure Python code, no heavy dependencies
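As an illustration of graceful encoding handling, the sketch below tries a short list of encodings in order and returns the first one that decodes the file. The function name `read_text_with_fallback` and the order of the attempts are assumptions, not the script's actual logic. Latin-1 is tried last because it accepts any byte sequence and would otherwise hide the other candidates.

```python
import os
import tempfile


def read_text_with_fallback(path, encodings=("utf-8", "cp1252", "latin-1")):
    """Return (text, encoding) using the first encoding that decodes the file."""
    for enc in encodings:
        try:
            with open(path, encoding=enc) as fh:
                return fh.read(), enc
        except UnicodeDecodeError:
            continue  # try the next candidate
    raise ValueError(f"Could not decode {path} with any of {encodings}")


# Demo: a file saved as Windows-1252 is not valid UTF-8, so the fallback kicks in.
with tempfile.TemporaryDirectory() as tmp:
    sample = os.path.join(tmp, "sample.ris")
    with open(sample, "wb") as fh:
        fh.write("TI  - Café culture".encode("cp1252"))
    text, used = read_text_with_fallback(sample)
```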
- Format Conversion: Supports .ris, .bib, .xml, .csv, .json
- File Merging: Combines multiple files into a single RIS
- Robust Deduplication: Removes duplicates with priority DOI > PMID > Title+Year+Author > Hash
- Deduplication Modes:
  - strict: DOI only
  - balanced: DOI + Title/Year/Author
  - aggressive: Includes title similarity (90%+)
- Export: To CSV or JSON (optional)
- Encoding Detection: UTF-8, Latin-1, Windows-1252
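The deduplication priority and modes listed above can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the code in refmerger.py: the helper names, the dictionary keys (`doi`, `pmid`, `title`, `year`, `authors`), the use of the first author only, and the choice to check PMID in every mode except strict are all hypothetical. Title similarity uses `difflib.SequenceMatcher` from the standard library with a 0.90 threshold.

```python
import hashlib
import re
import unicodedata
from difflib import SequenceMatcher


def normalize(text):
    """Lowercase and strip accents/punctuation so near-identical strings compare equal."""
    text = unicodedata.normalize("NFKD", text or "")
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    return re.sub(r"[^a-z0-9]+", " ", text.lower()).strip()


def dedup_key(ref, mode="balanced"):
    """Return the highest-priority identity key available for a reference."""
    doi = (ref.get("doi") or "").strip().lower()
    if doi:
        return ("doi", doi)
    if mode != "strict":  # assumption: PMID and title keys are skipped in strict mode
        pmid = str(ref.get("pmid") or "").strip()
        if pmid:
            return ("pmid", pmid)
        title = normalize(ref.get("title"))
        year = str(ref.get("year") or "").strip()
        authors = ref.get("authors") or []
        first_author = normalize(authors[0]) if authors else ""
        if title and year and first_author:
            return ("title-year-author", title, year, first_author)
    # Last resort: hash of the whole record, so only identical records collapse.
    digest = hashlib.sha256(repr(sorted(ref.items())).encode("utf-8")).hexdigest()
    return ("hash", digest)


def deduplicate(refs, mode="balanced", threshold=0.90):
    """Keep the first occurrence of each reference; later duplicates are dropped."""
    seen, unique = set(), []
    for ref in refs:
        key = dedup_key(ref, mode)
        if key in seen:
            continue
        if mode == "aggressive":
            title = normalize(ref.get("title"))
            if title and any(
                SequenceMatcher(None, title, normalize(kept.get("title"))).ratio() >= threshold
                for kept in unique
            ):
                continue  # title is at least 90% similar to one already kept
        seen.add(key)
        unique.append(ref)
    return unique


refs = [
    {"title": "Deep Learning for Genomics", "authors": ["Silva, A."], "year": "2020", "doi": "10.1000/ABC"},
    {"title": "Deep learning for genomics", "authors": ["Silva, A."], "year": "2020", "doi": "10.1000/abc"},
    {"title": "Réview of Methods in Genomics", "authors": ["Souza, B."], "year": "2019"},
    {"title": "Review of methods in genomics", "authors": ["Souza, B."], "year": "2019"},
    {"title": "Review of methods in genomic", "authors": ["Souza, B."], "year": "2019"},
]
counts = {mode: len(deduplicate(refs, mode)) for mode in ("strict", "balanced", "aggressive")}
print(counts)
```

In this sketch the aggressive mode compares titles regardless of DOI, which can merge distinct works with nearly identical titles; that trade-off is the reason stricter modes are useful.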
- Clone the repository:
  git clone https://github.com/fdossi/refmerger.git
  cd refmerger
- Install dependencies (optional, only for .bib):
  pip install bibtexparser
- Run the script:
  python refmerger.py
- Place all files (.ris, .bib, .xml, .csv, .json) in the same folder.
- Edit the variables at the end of the script:
  - pasta_dos_arquivos: Folder path
  - modo_deduplicacao: 'strict', 'balanced', or 'aggressive'
  - formato_exportar: None (RIS), 'csv', or 'json'
- Run the Python script.
# Settings at the end of refmerger.py
pasta_dos_arquivos = r"C:\My\Files\References"
modo_deduplicacao = 'balanced'
formato_exportar = None  # Output in RIS

Expected output:
Success! 25 files were merged into 'todas_referencias_juntas.ris'.
707 duplicate reference(s) removed. Final total: 268 unique reference(s).
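For readers curious how a folder of mixed exports might be scanned, here is a small sketch using `glob` from the standard library. The function name `find_reference_files` and the `exclude` parameter are hypothetical. Skipping a previously generated `todas_referencias_juntas.ris` assumes the merged file may sit in the same folder as the inputs, which the script may or may not do.

```python
import glob
import os
import tempfile

SUPPORTED = (".ris", ".bib", ".xml", ".csv", ".json")


def find_reference_files(folder, exclude=("todas_referencias_juntas.ris",)):
    """List supported reference files in a folder, skipping earlier merged output."""
    paths = glob.glob(os.path.join(folder, "*"))
    return sorted(
        p
        for p in paths
        if os.path.splitext(p)[1].lower() in SUPPORTED
        and os.path.basename(p) not in exclude
    )


# Demo with a throwaway folder: only the two reference exports are picked up.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("a.ris", "b.BIB", "notes.txt", "todas_referencias_juntas.ris"):
        open(os.path.join(tmp, name), "w").close()
    found = [os.path.basename(p) for p in find_reference_files(tmp)]
```

Comparing extensions in lowercase keeps files such as `b.BIB` from being missed on case-sensitive file systems.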
- Python 3.x
- bibtexparser (for .bib): pip install bibtexparser
- Standard libraries: os, glob, re, json, csv, xml.etree, unicodedata, hashlib, difflib
.bib (BibTeX) fields: title, author, year, doi, journal
.xml (PubMed) expected structure: ArticleTitle, Author/LastName, Year, DOI
.csv columns: title, authors (separated by ;), year, doi
.json structure: [{"title": "...", "authors": [...], "year": "...", "doi": "..."}]
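The CSV columns and JSON structure listed above map naturally onto the same in-memory shape. The sketch below shows one way to load both with the standard `csv` and `json` modules; the function names and the normalized dictionary layout are assumptions for illustration, not the script's internals.

```python
import csv
import io
import json


def refs_from_csv(text):
    """Parse CSV text with columns title, authors (separated by ;), year, doi."""
    refs = []
    for row in csv.DictReader(io.StringIO(text)):
        authors = [a.strip() for a in (row.get("authors") or "").split(";") if a.strip()]
        refs.append({
            "title": (row.get("title") or "").strip(),
            "authors": authors,
            "year": (row.get("year") or "").strip(),
            "doi": (row.get("doi") or "").strip(),
        })
    return refs


def refs_from_json(text):
    """Parse a JSON array of objects with title, authors (list), year, doi."""
    return [
        {
            "title": str(item.get("title", "")).strip(),
            "authors": [str(a).strip() for a in item.get("authors", [])],
            "year": str(item.get("year", "")).strip(),
            "doi": str(item.get("doi", "")).strip(),
        }
        for item in json.loads(text)
    ]


csv_text = 'title,authors,year,doi\n"A study of things","Silva, A.; Souza, B.",2020,10.1000/xyz123\n'
json_text = '[{"title": "A study of things", "authors": ["Silva, A.", "Souza, B."], "year": "2020", "doi": "10.1000/xyz123"}]'
from_csv = refs_from_csv(csv_text)
from_json = refs_from_json(json_text)
```

Note that author names containing commas must be quoted in the CSV file, as in the sample row, so that the `;` separator can be applied to the whole authors cell.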
Contributions are welcome! To contribute:
- Fork the repository
- Create a branch for your feature (git checkout -b feature/new-feature)
- Commit your changes (git commit -am 'Add new feature')
- Push to the branch (git push origin feature/new-feature)
- Open a Pull Request
This project is licensed under the MIT License - see the LICENSE file for details.
Developed by Fábio Dossi.
Note: If this tool assists in your research, feel free to cite it in your paper. It’s entirely optional, but always appreciated!
APA (7th Edition): Dossi, F. C. A. (2026). RefMerger (Version 1.0.0) [Computer software]. GitHub. https://github.com/fdossi/refmerger
BibTeX (For LaTeX users):
@software{Dossi_RefMerger_2026,
  author = {Dossi, F. C. A.},
  title = {{RefMerger: Automating reference merging and deduplication}},
  url = {https://github.com/fdossi/refmerger},
  version = {1.0.0},
  year = {2026}
}
ABNT (Brazil): DOSSI, Fabio F. C. A. RefMerger: a tool for automating the merging and deduplication of references. Version 1.0.0. Aracaju, 2026. Available at: https://github.com/fdossi/refmerger. Accessed on: Apr. 23, 2026.
MLA (9th Edition): Dossi, F. C. A. RefMerger. Version 1.0.0, 2026, github.com/fdossi/refmerger.