A robust, privacy-focused tool that converts technical PDF documents into high-quality Anki flashcards with proper LaTeX formatting. It runs on local Large Language Models (LLMs), so your data stays private and there are no usage costs.
Flashcards-Generator is designed for students who need reliable study materials. Unlike generic summarizers, this tool uses a structured extraction pipeline to:
- Read PDF documents.
- Extract key concepts using a local LLM (Ollama).
- Enforce strict JSON output for stability.
- Generate flashcards in Portuguese (pt-BR), even if the source text is in English.
- Format mathematical formulas with rigorous LaTeX syntax (e.g., `\frac{a}{b}`).
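
As an illustration of the formula-formatting step, here is a minimal sketch of how math could be wrapped for Anki's MathJax renderer, which recognizes `\( ... \)` for inline math and `\[ ... \]` for display math. The helper name `to_anki_mathjax` and the assumption that the LLM marks formulas with dollar signs are hypothetical; the project's own formatter may work differently.

```python
import re

# Assumption: the LLM marks formulas as $...$ (inline) or $$...$$ (display).
_DISPLAY = re.compile(r"\$\$(.+?)\$\$", re.DOTALL)
_INLINE = re.compile(r"\$(.+?)\$")


def to_anki_mathjax(text: str) -> str:
    """Rewrite dollar-delimited math into Anki's MathJax delimiters."""
    # Handle display math first so "$$" is not mistaken for two inline markers.
    # Lambdas are used so backslashes in the formula are copied literally.
    text = _DISPLAY.sub(lambda m: r"\[" + m.group(1) + r"\]", text)
    return _INLINE.sub(lambda m: r"\(" + m.group(1) + r"\)", text)
```

For example, `to_anki_mathjax(r"Slope: $\frac{a}{b}$")` yields `Slope: \(\frac{a}{b}\)`, which Anki renders as a fraction.
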
- 100% Local & Free: Runs entirely on your machine using Ollama. No API keys or credit cards required.
- Strict Math Formatting: Specifically engineered to handle complex formulas, so fractions and variables render correctly in Anki.
- Structured Data Validation: Uses `Instructor` and `Pydantic` to validate every output from the LLM, preventing broken import files.
- Portuguese Output: The system prompt is hardcoded to translate and synthesize concepts into Portuguese, making it ideal for Portuguese speakers studying English-language technical docs.
- Batch Processing: Capable of processing entire directories of PDFs in one go.
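
To make the validation idea concrete, here is a rough sketch that uses only the standard library as a stand-in for Instructor and Pydantic. It is not the project's code: the `Flashcard` class, the `parse_cards` helper, and the JSON keys `front`, `back`, and `tags` are assumptions chosen to mirror the Front/Back/Tags fields used at import time.

```python
import json
from dataclasses import dataclass, field


@dataclass
class Flashcard:
    front: str
    back: str
    tags: list = field(default_factory=list)


def parse_cards(raw: str) -> list:
    """Parse an LLM response and keep only well-formed cards.

    Raises ValueError if the response is not a JSON list at all.
    """
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"LLM returned invalid JSON: {exc}") from exc
    if not isinstance(data, list):
        raise ValueError("expected a JSON list of cards")

    cards = []
    for item in data:
        # Skip malformed entries instead of writing a broken row to the deck.
        if not isinstance(item, dict):
            continue
        front, back = item.get("front"), item.get("back")
        if not (isinstance(front, str) and front.strip()):
            continue
        if not (isinstance(back, str) and back.strip()):
            continue
        raw_tags = item.get("tags")
        tags = [t for t in raw_tags if isinstance(t, str)] if isinstance(raw_tags, list) else []
        cards.append(Flashcard(front.strip(), back.strip(), tags))
    return cards
```

With Pydantic, the same checks would be declared on a model class rather than written by hand; the point here is only that every card is checked before it reaches the import file.
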
While functional, this project relies on local hardware resources. Consider the following:
- Hardware Dependency (Speed): Generation speed is directly tied to your CPU/GPU.
  - High-End (M1/M2/M3 Mac, RTX GPU): Very fast generation.
  - Mid-Range (Modern i5/i7): Acceptable speeds (~5-15 seconds per chunk).
  - Low-End / Older PCs: Generation may be slow. Since the model runs locally, older CPUs may take significant time to process large PDFs.
- Model Intelligence: We currently recommend `llama3.2` for speed. Larger models (like `llama3.3` or `mistral`) may produce better summaries but require more RAM and processing power.
- PDF Complexity: Scanned PDFs (images) are not currently supported because OCR is not implemented yet. The tool works best with text-selectable PDFs.
- Python 3.10+
- Ollama: You must have Ollama installed and running.
- Hardware: A decent CPU (modern i5/i7) or any discrete GPU is recommended for reasonable generation speeds.
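
One way to confirm that Ollama is running before starting a long job is to query its local HTTP API. The sketch below assumes Ollama's default address `http://localhost:11434` and its `/api/tags` endpoint, which lists locally pulled models. The helper name `ollama_models` is hypothetical and not part of this project.

```python
import json
import urllib.error
import urllib.request
from typing import Optional


def ollama_models(base_url: str = "http://localhost:11434",
                  timeout: float = 2.0) -> Optional[list]:
    """Return the names of locally pulled models, or None if Ollama is unreachable."""
    try:
        with urllib.request.urlopen(f"{base_url}/api/tags", timeout=timeout) as resp:
            payload = json.load(resp)
    except (urllib.error.URLError, OSError, ValueError):
        # Connection refused, timeout, malformed URL, or a non-JSON reply.
        return None
    return [m.get("name", "") for m in payload.get("models", [])]
```

If the function returns `None`, start Ollama first. If it returns a list without a `llama3.2` entry, the model still needs to be pulled.
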
- Install Ollama: Follow the instructions at ollama.com. Once installed, pull the required model (Llama 3.2 is recommended for its balance of speed and quality):

      ollama pull llama3.2

- Clone the Repository:

      git clone https://github.com/yourusername/Flashcards-Generator.git
      cd Flashcards-Generator

- Set Up Virtual Environment:

      python3 -m venv venv
      source venv/bin/activate

- Install Dependencies:

      pip install -r requirements.txt

- Prepare your Data: Place your PDF files into the `data/input/` directory.
- Run the Generator: Execute the main script. You can specify the input directory and the output file path:

      python main.py data/input data/output/deck.tsv

- Import to Anki:
  - Open Anki.
  - Select File > Import.
  - Choose the generated `.tsv` file.
  - Ensure "Allow HTML in fields" is checked.
  - Field mapping should be: `Field 1 -> Front`, `Field 2 -> Back`, `Field 3 -> Tags`.
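
The three-field mapping implies one tab-separated row per card. As a hedged sketch of what writing such a file could look like (the function name `write_deck` and the tuple layout are assumptions, not the project's actual code):

```python
import csv
import io


def write_deck(cards, stream) -> None:
    """Write (front, back, tags) tuples as one tab-separated row per card."""
    writer = csv.writer(stream, delimiter="\t", lineterminator="\n")
    for front, back, tags in cards:
        # Keep each card on a single row: tabs become spaces, newlines become <br>.
        clean = [s.replace("\t", " ").replace("\n", "<br>") for s in (front, back)]
        # Anki reads tags as a space-separated list in a single field.
        writer.writerow(clean + [" ".join(tags)])


buf = io.StringIO()
write_deck([("What is 1/2?", r"\(\frac{1}{2}\)", ["math", "fractions"])], buf)
```

Replacing newlines with `<br>` relies on "Allow HTML in fields" being checked during import.
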
- `src/core/`: Contains the core logic modules.
  - `extractor.py`: Handles PDF text extraction.
  - `generator.py`: Interfaces with the local Ollama instance; contains the prompts and validation logic.
  - `formatter.py`: Cleans and formats text into Anki-compatible HTML/LaTeX.
  - `models.py`: Pydantic data structures.
- `data/`: Directory for input files and output artifacts (ignored by git).
- `main.py`: CLI entry point.
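
For orientation, a CLI entry point that takes an input directory and an output path and collects every PDF for batch processing might look roughly like the sketch below. The function names and argument names are illustrative assumptions; the real `main.py` may be organized differently.

```python
import argparse
from pathlib import Path


def parse_args(argv=None) -> argparse.Namespace:
    """Parse the two positional arguments: input directory and output file."""
    parser = argparse.ArgumentParser(description="Generate Anki flashcards from PDFs.")
    parser.add_argument("input_dir", type=Path, help="directory containing PDF files")
    parser.add_argument("output_file", type=Path, help="path of the .tsv deck to write")
    return parser.parse_args(argv)


def find_pdfs(input_dir: Path) -> list:
    """Return every PDF directly inside input_dir, sorted for reproducible runs."""
    return sorted(
        p for p in input_dir.iterdir()
        if p.is_file() and p.suffix.lower() == ".pdf"
    )


def main(argv=None) -> int:
    args = parse_args(argv)
    pdfs = find_pdfs(args.input_dir)
    print(f"Found {len(pdfs)} PDF(s); deck will be written to {args.output_file}")
    return 0
```

Calling `main(["data/input", "data/output/deck.tsv"])` mirrors the command-line invocation `python main.py data/input data/output/deck.tsv`.
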
This tool processes data locally. No text from your PDFs is sent to external cloud servers (like OpenAI or Anthropic). Your documents remain private on your machine.