Flask-based web interface for matching annotations to dataset entries with visual verification and progress tracking.
This tool helps match PAD annotations to the correct dataset entries by:
- Displaying PAD images from the dataset
- Showing candidate matches based on PAD# (sample_id)
- Allowing selection of correct matches with visual verification
- Adding notes and marking unmatched annotations
- Tracking progress across annotations
- Exporting merged CSV with all matched data and notes
- 3,609 rows (images)
- 624 unique sample_ids
- Fields:
id,sample_id,sample_name,quantity,camera_type_1,url,hashlib_md5,image_name
- 4,253 rows (annotations)
- 739 unique PAD#s
- Fields:
PAD#,Camera,Lighting,black/white background,API,Sample,mg concentration,% Conc
- sample_id (dataset) ↔ PAD# (annotation)
- sample_name (dataset) ↔ API (annotation)
- camera_type_1 (dataset) ↔ Camera (annotation)
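The field correspondence can be written down as a small lookup table. This is a minimal sketch, assuming rows are plain dicts (for example from `csv.DictReader`); `FIELD_MAP` and `to_annotation_keys` are hypothetical names, not taken from the app.

```python
# Hypothetical lookup table: dataset column name -> annotation column name,
# copied from the correspondence listed above.
FIELD_MAP = {
    "sample_id": "PAD#",
    "sample_name": "API",
    "camera_type_1": "Camera",
}


def to_annotation_keys(dataset_row):
    """Return the mapped dataset fields re-keyed with annotation column names."""
    return {
        annotation_key: dataset_row[dataset_key]
        for dataset_key, annotation_key in FIELD_MAP.items()
        if dataset_key in dataset_row
    }
```

Re-keying a dataset row this way makes it directly comparable with an annotation row, field by field.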
# Install dependencies
cd flask-app
uv add flask pandas pillow requests
cd flask-app
uv run python -c "import app; app.app.run(host='127.0.0.1', port=5001)"
The app will open at http://localhost:5001
- Open the dashboard
- Select an API (e.g., "Hydroxyurea (oral)")
- Click "Enter" to view all annotations for that API
- See the progress bar showing completion percentage
For each annotation:
- Click "Edit" on a row to open the matching interface
- Review Annotation Data (left panel):
- See the annotation's PAD#, Camera, API, Sample, and concentrations
- Review Candidates (right panel):
- Scroll through matching candidates from the dataset
- Look at images to verify matches
- Select a Match:
- Click the "Select" button on the correct candidate
- The card will turn green with a "✓ Selected" badge
- Add Notes (optional):
- Type in the Notes field to record observations
- Click "Save Note" to persist
- Mark as No Match (if applicable):
- Click "Mark as No Match" if no suitable match exists
- Button turns orange with "✓ No Match"
- Save Your Work:
- Click "Save Match" to record and move to next annotation
- Or "Skip" to skip without saving
When ready to export:
- Go back to the PAD# List view
- Click the "Export" button at the bottom
- Download the CSV file
The exported CSV includes:
- All original annotation fields
- matched_id: The dataset entry ID (or "no_match" if marked as no match)
- matched_sample_id: The dataset sample ID
- Matched dataset information (sample_name, quantity, camera_type, etc.)
- notes: Any notes you added
- missing_card: Flag for missing dataset entries
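The shape of the exported rows can be illustrated with the standard `csv` module. This is a sketch under stated assumptions, not the app's export code: the function names, the use of the annotation's row index as a key, and the exact meaning of `missing_card` (here: a dataset ID was recorded but no longer exists in the dataset) are all hypothetical. Only a subset of the matched dataset columns is copied, to keep the example short.

```python
import csv
import io


def build_export_rows(annotations, matches, notes, dataset_by_id):
    """Merge annotation rows with their match decisions.

    annotations:   list of dicts, one per annotation row
    matches:       {annotation index: dataset id, or "no_match"}
    notes:         {annotation index: note text}
    dataset_by_id: {dataset id: dataset row dict}
    """
    merged = []
    for index, annotation in enumerate(annotations):
        matched_id = matches.get(index, "")
        entry = dataset_by_id.get(matched_id, {})
        row = dict(annotation)  # keep all original annotation fields
        row["matched_id"] = matched_id
        row["matched_sample_id"] = entry.get("sample_id", "")
        row["notes"] = notes.get(index, "")
        # Assumed meaning: an id was recorded but is absent from the dataset.
        row["missing_card"] = (
            bool(matched_id) and matched_id != "no_match" and not entry
        )
        merged.append(row)
    return merged


def to_csv(rows):
    """Serialize merged rows to CSV text, using the first row's keys as header."""
    if not rows:
        return ""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()
```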
- Filters dataset rows by sample_id = PAD#
- Candidates sorted by relevance
- Visual preview of matching dataset images
- Displays PAD images directly from URLs
- Shows candidate cards with dataset information
- Easy-to-read layout with candidate details
- Select Matches: Click button to choose correct dataset entry
- Mark as No Match: For annotations with no suitable match
- Add Notes: Record observations and special cases
- Session Persistence: Progress automatically saved
- Dashboard shows completion percentage per API
- Progress bar during annotation matching
- Visual status indicators (Complete, Partial, Not Started)
- Filter annotations by status
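The status indicators and completion percentage could be derived from two counts per group. The sketch below is illustrative only; the function names and the exact thresholds are assumptions, not the app's logic.

```python
def completion_status(matched, total):
    """Classify a group of annotations as Complete, Partial, or Not Started."""
    if total > 0 and matched >= total:
        return "Complete"
    if matched > 0:
        return "Partial"
    return "Not Started"


def completion_percent(matched, total):
    """Return progress as a whole-number percentage (0 when the group is empty)."""
    if total == 0:
        return 0
    return round(100 * matched / total)
```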
- Purpose: Quality review of PAD annotations organized by lighting conditions
- Organization: Groups images by lighting (lightbox, benchtop, no light)
- Filters: Lighting, camera type, background, API, match status
- Features: Lazy loading, full-size image preview, statistics
- Use Case: Review annotation quality across different lighting conditions
- Purpose: Browse and manage all project cards in the laboratory database
- Organization: Groups cards by API/drug name
- Features:
- Shows both matched and unmatched cards
- Hover displays Database ID and PAD ID (sample_id)
- Quick Match button for direct navigation to matching interface
- Mark cards with issues and provide descriptions
- Filter by camera type, match status, and issue status
- Export filtered results to CSV
- Use Case: Manage complete inventory of lab cards and track problematic cards
- SQLite Database: Reliable concurrent access with Write-Ahead Logging
- Automatic Backups: Created when completing PAD matching and on export
- Manual Backups: On-demand backup creation with retention policy
- Data Integrity: All operations are atomic and persistent
- Issue Tracking: Separate table for tracking cards with problems
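To show what Write-Ahead Logging and atomic writes look like with Python's built-in `sqlite3` module, here is a minimal sketch. The table names follow the ones listed under Data Storage; the column layouts and the `open_database` and `save_match` helpers are assumptions made for illustration, not the app's actual schema.

```python
import sqlite3


def open_database(path):
    """Open the SQLite file in WAL mode and create the four tables if missing."""
    conn = sqlite3.connect(path)
    # WAL lets readers proceed while a single writer is active.
    conn.execute("PRAGMA journal_mode=WAL")
    # Column layouts below are illustrative assumptions, not the app's schema.
    conn.executescript(
        """
        CREATE TABLE IF NOT EXISTS matches (
            annotation_row INTEGER PRIMARY KEY,
            dataset_id     TEXT NOT NULL
        );
        CREATE TABLE IF NOT EXISTS notes (
            annotation_row INTEGER PRIMARY KEY,
            note           TEXT NOT NULL
        );
        CREATE TABLE IF NOT EXISTS invalid_cards (
            dataset_id  TEXT PRIMARY KEY,
            description TEXT
        );
        CREATE TABLE IF NOT EXISTS backups (
            id         INTEGER PRIMARY KEY AUTOINCREMENT,
            path       TEXT NOT NULL,
            created_at TEXT NOT NULL
        );
        """
    )
    return conn


def save_match(conn, annotation_row, dataset_id):
    """Insert or overwrite one match atomically."""
    # The connection context manager commits on success, rolls back on error.
    with conn:
        conn.execute(
            "INSERT OR REPLACE INTO matches (annotation_row, dataset_id) "
            "VALUES (?, ?)",
            (annotation_row, dataset_id),
        )
```

Note that WAL mode only takes effect for file-backed databases; an in-memory database keeps its own journal mode.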
- Frontend: HTML/CSS/JavaScript with responsive design
- Backend: Flask web framework (Python)
- Data Storage:
  - Database: /database/chemopad.db (SQLite)
    - matches table: Maps annotation rows to dataset entries
    - notes table: Stores annotation notes
    - invalid_cards table: Tracks cards with issues
    - backups table: Records backup history
  - Backup files: /database/backups/ folder (auto and manual backups)
  - Generated exports: /exports/ folder (timestamped CSV files)
  - Source data: /data/ folder (original CSV files)
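A backup with a retention policy could be built on `sqlite3.Connection.backup`, which copies a consistent snapshot of a live database. This is a sketch, not the app's backup routine: the `backup_database` name, the file naming scheme, and the default of keeping five files are assumptions, since the actual retention policy is not specified here.

```python
import sqlite3
import time
from pathlib import Path


def backup_database(db_path, backup_dir, keep=5):
    """Copy the database into backup_dir and keep only the newest `keep` files."""
    backup_dir = Path(backup_dir)
    backup_dir.mkdir(parents=True, exist_ok=True)
    # Zero-padded nanosecond timestamp: names are unique and sort by age.
    target = backup_dir / f"backup_{time.time_ns():020d}.db"

    source = sqlite3.connect(db_path)
    destination = sqlite3.connect(target)
    try:
        # Copies a consistent snapshot even while the source is in use.
        source.backup(destination)
    finally:
        destination.close()
        source.close()

    # Retention: delete everything except the newest `keep` backups.
    backups = sorted(backup_dir.glob("backup_*.db"))
    for old in backups[: max(len(backups) - keep, 0)]:
        old.unlink()
    return target
```

Sorting by file name rather than modification time keeps the retention step independent of filesystem timestamps.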
- Filter by PAD#: Find all dataset rows where sample_id equals the annotation's PAD#
- Sort by Relevance: Display best matches first
- Visual Verification: Users review images and information
- Manual Selection: Users click "Select" to confirm match
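The first two steps (filter, then sort) can be sketched in a few lines of Python. How the app scores relevance is not stated, so the scoring rule here is an assumption: one point for each secondary field (API against sample_name, Camera against camera_type_1) that agrees after case-insensitive comparison. `find_candidates` is a hypothetical name.

```python
def find_candidates(annotation, dataset_rows):
    """Return dataset rows whose sample_id equals the annotation's PAD#,
    best-scoring first."""

    def norm(value):
        return str(value).strip().lower()

    def score(row):
        # Assumed relevance: one point per secondary field that agrees.
        points = 0
        if norm(row.get("sample_name", "")) == norm(annotation.get("API", "")):
            points += 1
        if norm(row.get("camera_type_1", "")) == norm(annotation.get("Camera", "")):
            points += 1
        return points

    pad = norm(annotation["PAD#"])
    matches = [row for row in dataset_rows if norm(row["sample_id"]) == pad]
    # sorted() is stable, so ties keep their original dataset order.
    return sorted(matches, key=score, reverse=True)
```

The function only ranks candidates; the final choice still belongs to the person reviewing the images.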
- Multiple images exist per sample_id (different cameras, lighting conditions)
- Field values may be incorrect or differ in naming conventions
- Users need visual verification against actual images
- Allows documentation of edge cases and notes
- Matches and notes are saved immediately to JSON files
- Progress is preserved across browser sessions
- No data is lost if the browser is closed
- Session folder is excluded from git (.gitignore)
- Annotations (4,253 rows) exceed dataset entries (3,609 rows)
- Some PAD#s in the annotations CSV may not exist in dataset
- Some dataset rows may have multiple annotations
- Images are loaded from remote URLs (requires internet connection)
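With 739 unique PAD#s in the annotations against 624 unique sample_ids in the dataset, at least 115 PAD#s cannot have a dataset counterpart. A short set difference lists exactly which ones. This is a sketch assuming both CSVs have been read into lists of dicts; `unmatched_pad_numbers` is a hypothetical helper.

```python
def unmatched_pad_numbers(annotations, dataset_rows):
    """Return the sorted PAD#s that appear in annotations but not in the dataset."""
    dataset_ids = {str(row["sample_id"]).strip() for row in dataset_rows}
    annotation_ids = {str(row["PAD#"]).strip() for row in annotations}
    return sorted(annotation_ids - dataset_ids)
```

Values are compared as strings, so the result is in lexicographic rather than numeric order.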
Server not starting:
- Make sure you're in the flask-app directory
- Check that port 5001 is not in use: lsof -ti:5001
- Ensure dependencies are installed: uv add flask pandas pillow requests
Images not loading:
- Check internet connection
- Verify URL accessibility at https://pad.crc.nd.edu/
- Some remote images may be unavailable
Missing PAD#s:
- Some PAD#s may not exist in dataset
- Use "Mark as No Match" to document these cases
- Notes are helpful for tracking issues
Progress lost:
- Session data is automatically saved to the /session/ folder
- If data is lost, check that the session folder is not deleted
- Matches are saved as soon as "Save Match" is clicked
Export issues:
- Make sure the /exports/ folder has write permissions
- Check available disk space
- CSV will include all data from matches.json
Access the app at: http://pad-annotation.crc.nd.edu:8080/
Quick start guide with visual references: quick-start.md
This guide covers:
- Step-by-step interface walkthrough
- How to access the app
- Login and authentication
- Quick matching workflow
- Exporting results
Detailed user guide: user-guide.md
This guide covers:
- How to navigate the interface
- Step-by-step matching process
- Tips for better accuracy
- Common workflows
- Troubleshooting