Summary
When testing serialization changes (like the stripped-fields work in the serialization sync), it's useful to quickly see the JSON sizes of transcriptions in the database without fetching the full content.
Suggested Implementation
Add a method to SwaraClient that lists transcriptions with their approximate sizes, or fetches a batch of transcriptions and reports sizes. Something like:
```python
client.get_transcription_sizes()
# Returns: [{'_id': '...', 'title': '...', 'size_bytes': 6941692, 'pitch_count': 6770}, ...]
```
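As a rough sketch of the client-side variant, the helper below builds that return shape. It makes several assumptions. `fetch` is a hypothetical stand-in for whatever per-transcription getter `SwaraClient` already exposes. Documents are assumed to be dicts with `_id` and `title` keys. `pitch_count` is left out because the transcription schema isn't specified here. Sorting largest-first is an illustrative choice, not part of the proposal.

```python
import json
from typing import Any, Callable, Iterable


def get_transcription_sizes(
    ids: Iterable[str],
    fetch: Callable[[str], dict[str, Any]],  # hypothetical per-id getter
) -> list[dict[str, Any]]:
    """Client-side: fetch each transcription and measure its serialized JSON size."""
    results = []
    for tid in ids:
        data = fetch(tid)  # one full round trip per document (the slow part)
        # default=str stringifies values json can't encode natively (e.g. ObjectId, datetime)
        encoded = json.dumps(data, default=str).encode("utf-8")
        results.append({
            "_id": str(data.get("_id", tid)),
            "title": data.get("title"),
            "size_bytes": len(encoded),
        })
    # Largest first, so the biggest transcription is easy to spot
    results.sort(key=lambda r: r["size_bytes"], reverse=True)
    return results
```

Injecting `fetch` keeps the measurement logic independent of the HTTP layer. That also makes it easy to run against an in-memory dict when testing serialization changes.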
This could either:
- Client-side: fetch each transcription and measure `len(json.dumps(data))` (slow but accurate)
- Server-side: add an API endpoint that uses MongoDB's `$bsonSize` aggregation operator to report document sizes without transferring the full content (fast, preferred)
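For the server-side option, a minimal sketch of the aggregation pipeline such an endpoint might run is shown below. `$bsonSize` requires MongoDB 4.4 or later. The endpoint itself, its framework, and the collection name are not specified in this issue and are left out.

```python
from typing import Any, Optional


def build_size_pipeline(limit: Optional[int] = None) -> list[dict[str, Any]]:
    """Aggregation pipeline reporting each document's BSON size without returning its body."""
    pipeline: list[dict[str, Any]] = [
        # $bsonSize on $$ROOT yields the whole document's size in bytes as stored in BSON;
        # only _id (included by default), title, and the computed size are returned.
        {"$project": {"title": 1, "size_bytes": {"$bsonSize": "$$ROOT"}}},
        # Largest documents first
        {"$sort": {"size_bytes": -1}},
    ]
    if limit is not None:
        pipeline.append({"$limit": limit})
    return pipeline
```

With pymongo this could be run as `list(collection.aggregate(build_size_pipeline(10)))`, where `collection` is the transcriptions collection (name assumed). BSON size is not the same as the length of the JSON the API serves. It works well for ranking documents, but percentage comparisons for serialization changes should compare like with like.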
Context
During serialization sync testing, we needed to find the largest transcription to verify size reduction percentages. Currently this requires fetching each transcription individually and measuring, which is slow for 145+ documents.
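Once sizes are available, finding the largest transcription and checking a reduction percentage is simple arithmetic. The sketch below assumes size records shaped like the `get_transcription_sizes()` example above. The helper names are hypothetical.

```python
from typing import Any


def largest(sizes: list[dict[str, Any]]) -> dict[str, Any]:
    """Return the size record with the most bytes."""
    return max(sizes, key=lambda r: r["size_bytes"])


def size_reduction_percent(before_bytes: int, after_bytes: int) -> float:
    """Percentage by which a serialized payload shrank (negative if it grew)."""
    if before_bytes <= 0:
        raise ValueError("before_bytes must be positive")
    return (before_bytes - after_bytes) / before_bytes * 100.0
```

For example, a payload that goes from 200 bytes to 150 bytes shrank by 25%.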