- Version: 1.0
- Status: In progress
- Minimum Server Version: N/A
This document describes a standard benchmarking suite for MongoDB ODMs (Object Document Mappers). Much of this document's structure and content is taken from the existing MongoDB driver benchmarking suite for consistency.
ODM performance will be measured by the MongoDB ODM Performance Benchmark. It will provide both "horizontal" insights into how individual ODM performance evolves over time and two types of "vertical" insights: the relative performance of different ODMs, and the relative performance of ODMs and their associated language drivers.
We expect substantial performance differences between ODMs based on both their language families (e.g. static vs. dynamic, or compiled vs. virtual-machine-based) and their inherent design (e.g. ODMs integrated into web frameworks, such as Django's, vs. application-agnostic ODMs such as Mongoose). Within families of ODMs that share similar designs and language characteristics, these comparisons could be used to identify potential areas of performance improvement.
The benchmark suite consists of two groups of small, independent benchmarks. This allows us to better isolate areas within ODMs that are faster or slower.
- Flat models -- reading and writing flat models of various sizes, to explore basic operation efficiency
- Nested models -- reading and writing nested models of various sizes, to explore basic operation efficiency for complex data
The suite is intentionally kept small for several reasons:
- ODM feature sets vary significantly across libraries, limiting the number of benchmarks that can be run across the entire collection of extant ODMs.
- Several popular MongoDB ODMs are actively maintained by third-parties, such as Mongoose. By limiting the benchmarking suite to a minimal set of representative tests that are easy to implement, we encourage adoption of the suite by these third-party maintainers.
In addition to latency data, all benchmark tasks will be measured in terms of "megabytes/second" (MB/s) of documents processed, with higher scores being better. (In this document, "megabyte" refers to the SI decimal unit, i.e. 1,000,000 bytes.) This makes cross-benchmark comparisons easier.
To avoid various types of measurement skew, tasks will be measured over several iterations. The number of operations performed per iteration depends on the task being benchmarked. The final score for a task will be the median score of the iterations. A range of percentiles will also be recorded for diagnostic analysis.
Data sets will vary by task. In most cases, data sets will be synthetic line-delimited JSON files to be constructed by the ODM being benchmarked into the appropriate model. Some tasks will require additional modifications to these constructed models, such as adding generated ObjectIds. See the Benchmark task definitions section for details.
The MongoDB ODM Performance Benchmark will have vX.Y versioning. Minor updates and clarifications will increment "Y" and MUST have little impact on score comparison. Major changes, such as task modifications, MongoDB version tested against, or hardware used, MUST increment "X" to indicate that older version scores are unlikely to be comparable.
All benchmark tasks will be conducted via a number of iterations. Each iteration will be timed and will generally include a large number of individual ODM operations.
The measurement is broken up this way to better isolate the benchmark from external volatility. If we consider the problem of benchmarking an operation over many iterations, such as 100,000 model insertions, we want to avoid two extreme forms of measurement:
- measuring a single insertion 100,000 times -- in this case, the timing code is likely to be a greater proportion of executed code, which could routinely evict the insertion code from CPU caches or mislead a JIT optimizer and throw off results
- measuring 100,000 insertions one time -- in this case, the longer the timer runs, the higher the likelihood that an external event occurs that affects the time of the run
Therefore, we choose a middle ground:
- measuring the same 10,000 insertions over 10 iterations -- each timing run includes enough operations that insertion code dominates timing code; unusual system events are likely to affect only a fraction of the 10 timing measurements
With 10 timings of inserting the same 10,000 models, we build up a statistical distribution of the operation timing, allowing a more robust estimate of performance than a single measurement. (In practice, the number of iterations could exceed 10, but 10 is a reasonable minimum goal.)
Because a timing distribution is bounded by zero on one side, taking the mean would allow large positive outlier measurements to skew the result substantially. Therefore, for the benchmark score, we use the median timing measurement, which is robust in the face of outliers.
Each benchmark is structured into discrete setup/execute/teardown phases. Phases are as follows, with specific details given in a subsequent section:
- setup -- (ONCE PER TASK) something to do once before any benchmarking, e.g. construct a model object, load test data, insert data into a collection, etc.
- before operation -- (ONCE PER ITERATION) something to do before every task iteration, e.g. drop a collection, or reload test data (if the test run modifies it), etc.
- do operation -- (ONCE PER ITERATION) smallest amount of code necessary to execute the task; e.g. insert 10,000 models one by one into the database, or retrieve 10,000 models of test data from the database, etc.
- after operation -- (ONCE PER ITERATION) something to do after every task iteration (if necessary)
- teardown -- (ONCE PER TASK) something done once after all benchmarking is complete (if necessary); e.g. drop the test database
The wall-clock execution time of each "do operation" phase will be recorded. We use wall clock time to model user experience and as a lowest-common denominator across ODMs. Iteration timing SHOULD be done with a high-resolution monotonic timer (or best language approximation).
Unless otherwise specified, the number of iterations to measure per task is variable:
- iterations MUST loop for at least 30 seconds of cumulative execution time
- once this 30-second minimum is reached, iteration SHOULD stop once at least 10 iterations have completed or 1 minute of cumulative execution time has elapsed, whichever comes first
This balances measurement stability with a timing cap to ensure all tasks can complete in a reasonable time.
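For illustration, the following Python sketch shows one way a benchmark runner could implement the phase lifecycle and iteration rules above. The `task` object, its phase methods, and the constant names are illustrative, not normative:

```python
import time

MIN_EXECUTION_SECS = 30  # MUST: at least 30 seconds of cumulative execution
MAX_EXECUTION_SECS = 60  # SHOULD: stop at 1 minute of cumulative execution
MIN_ITERATIONS = 10      # SHOULD: stop once at least 10 iterations complete


def run_benchmark(task):
    """Run one task's phases and return the per-iteration wall clock times."""
    task.setup()                         # ONCE PER TASK
    timings, cumulative = [], 0.0
    while True:
        task.before_operation()          # ONCE PER ITERATION
        start = time.perf_counter()      # high-resolution monotonic timer
        task.do_operation()              # the only phase that is timed
        elapsed = time.perf_counter() - start
        task.after_operation()           # ONCE PER ITERATION
        timings.append(elapsed)
        cumulative += elapsed
        if cumulative < MIN_EXECUTION_SECS:
            continue                     # keep looping until the 30s minimum
        if len(timings) >= MIN_ITERATIONS or cumulative >= MAX_EXECUTION_SECS:
            break                        # whichever comes first
    task.teardown()                      # ONCE PER TASK
    return timings
```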
For each task, the 10th, 25th, 50th, 75th, 90th, 95th, 98th and 99th percentiles will be recorded using the following algorithm:
- Given a 0-indexed array `A` of `N` iteration wall clock times
- Sort the array into ascending order (i.e. shortest time first)
- Let the index `i` for percentile `p` in the range [1, 100] be defined as: `i = int(N * p / 100) - 1`
N.B. This is the Nearest Rank algorithm, chosen for its utter simplicity given that it needs to be implemented identically across a wide variety of ODMs and languages.
The 50th percentile (i.e. the median) will be used for score composition. Other percentiles will be stored for visualizations and analysis.
Each task will have defined for it an associated size in megabytes (MB). This size will be calculated using the task's dataset size and the number of documents processed per iteration. The benchmarking score for each task will be the task size in MB divided by the median wall clock time.
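A minimal sketch of the Nearest Rank percentile calculation and the MB/s score composition, following the definitions above (the `task_size_mb` argument is the per-task size defined in each benchmark task):

```python
def percentile(times, p):
    """Nearest Rank percentile of a list of iteration wall clock times.

    times: iteration timings in seconds; p: a percentile in [1, 100].
    With the 10-iteration minimum, the index is non-negative for every
    percentile recorded by this benchmark (10th and above).
    """
    a = sorted(times)              # ascending order: shortest time first
    i = int(len(a) * p / 100) - 1  # 0-indexed Nearest Rank
    return a[i]


def score(times, task_size_mb):
    """Task score in MB/s: task size divided by the median wall clock time."""
    return task_size_mb / percentile(times, 50)
```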
Datasets are available in the odm-data directory adjacent to this spec.
Note: The term "LDJSON" means "line-delimited JSON", which should be understood to mean a collection of UTF-8 encoded JSON documents (without embedded CR or LF characters), separated by a single LF character. (Some Internet definitions of line-delimited JSON use CRLF delimiters, but this benchmark uses only LF.)
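For example, an LDJSON dataset could be loaded in Python as follows (a sketch; the file name is illustrative):

```python
import json

# One UTF-8 encoded JSON document per line, separated by a single LF.
with open("small_doc.json", encoding="utf-8") as f:
    documents = [json.loads(line) for line in f if line.strip()]
```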
Datasets are in the flat_models.tgz tarball.
Flat model tests focus on flatly-structured model reads and writes across data sizes. They are designed to give insights into the efficiency of the ODM's implementation of basic data operations.
The data will be stored as strict JSON with no extended types. These JSON representations MUST be converted into equivalent models as part of each benchmark task.
Flat model benchmark tasks include:
- Small model creation
- Small model update
- Small model find by filter
- Small model find foreign key by filter (if joins are supported)
- Large model creation
- Large model update
Summary: This benchmark tests ODM performance creating a single small model.
Dataset: The dataset (SMALL_DOC) is contained within small_doc.json and consists of a sample document stored as strict
JSON with an encoded length of approximately 250 bytes.
Dataset size: For scoring purposes, the dataset size is the size of the small_doc source file (250 bytes) times 10,000
operations, which equals 2,500,000 bytes or 2.5 MB.
This benchmark uses a dataset comparable to that of the driver small doc insertOne benchmark, allowing for direct comparison.
| Phase | Description |
|---|---|
| Setup | Load the SMALL_DOC dataset into memory. |
| Before task | n/a. |
| Do task | Create an ODM-appropriate model instance for the SMALL_DOC document and save it to the database. Repeat this 10,000 times. |
| After task | Drop the collection associated with the SMALL_DOC model. |
| Teardown | n/a. |
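As an illustration, the model definition and do task phase might resemble the following sketch, using MongoEngine as an example Python ODM. The `SmallDoc` class, its fields, and the database name are assumptions; real definitions must mirror the SMALL_DOC dataset:

```python
from mongoengine import Document, StringField, connect

connect("odm_benchmark")  # database name is illustrative


class SmallDoc(Document):
    # Assumed schema: field definitions must mirror the SMALL_DOC dataset.
    field1 = StringField()


def do_task(small_doc_data):
    """Create an ODM model instance from the dataset and save it, 10,000 times."""
    for _ in range(10_000):
        SmallDoc(**small_doc_data).save()


def after_task():
    SmallDoc.drop_collection()
```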
Summary: This benchmark tests ODM performance updating fields on a single small model.
Dataset: The dataset (SMALL_DOC) is contained within small_doc.json and consists of a sample document stored as strict
JSON with an encoded length of approximately 250 bytes.
Dataset size: For scoring purposes, the dataset size is the size of the updated_value string (13 bytes) times 10,000
operations, which equals 130,000 bytes or 130 KB.
| Phase | Description |
|---|---|
| Setup | Load the SMALL_DOC dataset into memory as an ODM-appropriate model object. Save 10,000 instances into the database. |
| Before task | n/a. |
| Do task | Update the field1 field for each instance of the model to equal updated_value in an ODM-appropriate manner. |
| After task | n/a. |
| Teardown | Drop the collection associated with the SMALL_DOC model. |
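Continuing the MongoEngine `SmallDoc` sketch from the previous task, the do task phase might look like the following; the queryset-style alternative noted in the comment is equally ODM-appropriate:

```python
def do_task():
    """Update field1 on every stored instance in an ODM-appropriate manner."""
    for doc in SmallDoc.objects:
        doc.field1 = "updated_value"
        doc.save()
    # A bulk, queryset-style update would also be ODM-appropriate, e.g.:
    #     SmallDoc.objects.update(set__field1="updated_value")
```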
Summary: This benchmark tests ODM performance finding documents using a basic filter.
Dataset: The dataset (SMALL_DOC) is contained within small_doc.json and consists of a sample document stored as strict
JSON with an encoded length of approximately 250 bytes.
Dataset size: For scoring purposes, the dataset size is the size of the small_doc source file (250 bytes) times 10,000
operations, which equals 2,500,000 bytes or 2.5 MB.
| Phase | Description |
|---|---|
| Setup | Load the SMALL_DOC dataset into memory as an ODM-appropriate model object. Insert 10,000 instances into the database, saving the _id field for each into a list. |
| Before task | n/a. |
| Do task | For each of the 10,000 _id values, perform a filter operation to find the corresponding SMALL_DOC model. |
| After task | n/a. |
| Teardown | Drop the collection associated with the SMALL_DOC model. |
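A sketch of the do task phase, again reusing the MongoEngine `SmallDoc` example (the `ids` list is the one saved during setup):

```python
def do_task(ids):
    """Find each previously inserted model by its _id via a filter."""
    for oid in ids:
        SmallDoc.objects(id=oid).first()
```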
Summary: This benchmark tests ODM performance finding documents by foreign keys. This benchmark MUST only be run by ODMs that support join ($lookup) operations.
Dataset: The dataset (SMALL_DOC) is contained within small_doc.json and consists of a sample document stored as strict
JSON with an encoded length of approximately 250 bytes. An additional model (FOREIGN_KEY) representing the foreign key,
consisting of only a string field called name, MUST also be created.
Dataset size: For scoring purposes, the dataset size is the size of the small_doc source file (250 bytes) times 10,000
operations, which equals 2,500,000 bytes or 2.5 MB.
| Phase | Description |
|---|---|
| Setup | Load the SMALL_DOC dataset into memory as an ODM-appropriate model object. For each SMALL_DOC model, create and assign a FOREIGN_KEY instance to the field_fk field. Insert 10,000 instances of both models into the database, saving the inserted _id field for each FOREIGN_KEY into a list. |
| Before task | n/a. |
| Do task | For each of the 10,000 FOREIGN_KEY _id values, perform a filter operation in an ODM-appropriate manner to find the corresponding SMALL_DOC model. |
| After task | n/a. |
| Teardown | Drop the collections associated with the SMALL_DOC and FOREIGN_KEY models. |
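For ODMs that support references, the models and do task phase might be sketched as follows with MongoEngine; the `ForeignKeyDoc` and `SmallDocFK` names are illustrative:

```python
from mongoengine import Document, ReferenceField, StringField


class ForeignKeyDoc(Document):
    name = StringField()


class SmallDocFK(Document):
    # Remaining SMALL_DOC fields omitted for brevity.
    field_fk = ReferenceField(ForeignKeyDoc)


def do_task(fk_ids):
    """Find each SMALL_DOC model through its foreign key reference."""
    for fk_id in fk_ids:
        SmallDocFK.objects(field_fk=fk_id).first()
```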
Summary: This benchmark tests ODM performance creating a single large model.
Dataset: The dataset (LARGE_DOC) is contained within large_doc.json and consists of a sample document stored as strict
JSON with an encoded length of approximately 3.2 MB.
Dataset size: For scoring purposes, the dataset size is the size of the large_doc source file (3,226,020 bytes) times
10,000 operations, which equals 32,260,200,000 bytes or approximately 32.26 GB.
| Phase | Description |
|---|---|
| Setup | Load the LARGE_DOC dataset into memory. |
| Before task | n/a. |
| Do task | Create an ODM-appropriate model instance for the LARGE_DOC document and save it to the database. Repeat this 10,000 times. |
| After task | Drop the collection associated with the LARGE_DOC model. |
| Teardown | n/a. |
Summary: This benchmark tests ODM performance updating fields on a single large model.
Dataset: The dataset (LARGE_DOC) is contained within large_doc.json and consists of a sample document stored as strict
JSON with an encoded length of approximately 3.2 MB.
Dataset size: For scoring purposes, the dataset size is the size of the updated_value string (13 bytes) times 10,000
operations, which equals 130,000 bytes or 130 KB.
| Phase | Description |
|---|---|
| Setup | Load the LARGE_DOC dataset into memory as an ODM-appropriate model object. Save 10,000 instances into the database. |
| Before task | n/a. |
| Do task | Update the field1 field for each instance of the model to updated_value in an ODM-appropriate manner. |
| After task | n/a. |
| Teardown | Drop the collection associated with the LARGE_DOC model. |
Datasets are in the nested_models.tgz tarball.
Nested model tests focus on reads and writes of models containing nested (embedded) documents. They are designed to give insights into the efficiency of operations on the more complex data structures enabled by the document model.
The data will be stored as strict JSON with no extended types. These JSON representations MUST be converted into equivalent ODM models as part of each benchmark task.
Nested model benchmark tasks include:
- Large model creation
- Large model update nested field
- Large model find nested field by filter
- Large model find nested array field by filter
Summary: This benchmark tests ODM performance creating a single large nested model.
Dataset: The dataset (LARGE_DOC_NESTED) is contained within large_doc_nested.json and consists of a sample document
stored as strict JSON with an encoded length of approximately 8,000 bytes.
Dataset size: For scoring purposes, the dataset size is the size of the large_doc_nested source file (8,000 bytes)
times 10,000 operations, which equals 80,000,000 bytes or 80 MB.
| Phase | Description |
|---|---|
| Setup | Load the LARGE_DOC_NESTED dataset into memory. |
| Before task | n/a. |
| Do task | Create an ODM-appropriate model instance for the LARGE_DOC_NESTED document and save it to the database. Repeat this 10,000 times. |
| After task | Drop the collection associated with the LARGE_DOC_NESTED model. |
| Teardown | n/a. |
Summary: This benchmark tests ODM performance updating nested fields on a single large model.
Dataset: The dataset (LARGE_DOC_NESTED) is contained within large_doc_nested.json and consists of a sample document
stored as strict JSON with an encoded length of approximately 8,000 bytes.
Dataset size: For scoring purposes, the dataset size is the size of the updated_value string (13 bytes) times 10,000
operations, which equals 130,000 bytes or 130 KB.
| Phase | Description |
|---|---|
| Setup | Load the LARGE_DOC_NESTED dataset into memory as an ODM-appropriate model object. Save 10,000 instances into the database. |
| Before task | n/a. |
| Do task | Update the value of the embedded_str_doc_1.field1 field to updated_value in an ODM-appropriate manner for each instance of the model. |
| After task | n/a. |
| Teardown | Drop the collection associated with the LARGE_DOC_NESTED model. |
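A sketch of the nested model and update, again using MongoEngine for illustration (class and field definitions are assumptions that must mirror the LARGE_DOC_NESTED dataset):

```python
from mongoengine import (Document, EmbeddedDocument,
                         EmbeddedDocumentField, StringField)


class EmbeddedStrDoc(EmbeddedDocument):
    field1 = StringField()
    unique_id = StringField()


class LargeDocNested(Document):
    # Only the fields relevant to the nested tasks are sketched here.
    embedded_str_doc_1 = EmbeddedDocumentField(EmbeddedStrDoc)


def do_task():
    """Update a nested field on every stored instance."""
    for doc in LargeDocNested.objects:
        doc.embedded_str_doc_1.field1 = "updated_value"
        doc.save()
```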
Summary: This benchmark tests ODM performance finding nested documents using a basic filter.
Dataset: The dataset (LARGE_DOC_NESTED) is contained within large_doc_nested.json and consists of a sample document
stored as strict JSON with an encoded length of approximately 8,000 bytes.
Dataset size: For scoring purposes, the dataset size is the size of the large_doc_nested source file (8,000 bytes)
times 10,000 operations, which equals 80,000,000 bytes or 80 MB.
| Phase | Description |
|---|---|
| Setup | Load the LARGE_DOC_NESTED dataset into memory as an ODM-appropriate model object. Insert 10,000 instances into the database, saving the value of the unique_id field for each model's embedded_str_doc_1 nested model into a list. |
| Before task | n/a. |
| Do task | For each of the 10,000 embedded_str_doc_1.unique_id values, perform a filter operation to search for the parent LARGE_DOC_NESTED model. |
| After task | n/a. |
| Teardown | Drop the collection associated with the LARGE_DOC_NESTED model. |
Summary: This benchmark tests ODM performance finding nested document arrays using a basic filter.
Dataset: The dataset (LARGE_DOC_NESTED) is contained within large_doc_nested.json and consists of a sample document
stored as strict JSON with an encoded length of approximately 8,000 bytes.
Dataset size: For scoring purposes, the dataset size is the size of the large_doc_nested source file (8,000 bytes)
times 10,000 operations, which equals 80,000,000 bytes or 80 MB.
| Phase | Description |
|---|---|
| Setup | Load the LARGE_DOC_NESTED dataset into memory as an ODM-appropriate model object. Insert 10,000 instances into the database, saving the value of the unique_id field for the first item in each model's embedded_str_doc_array nested model into a list. |
| Before task | n/a. |
| Do task | For each of the 10,000 unique_id values, perform a filter operation to search for the parent LARGE_DOC_NESTED model. |
| After task | n/a. |
| Teardown | Drop the collection associated with the LARGE_DOC_NESTED model. |
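The do task phases for the two nested find tasks might be sketched as follows, continuing the `LargeDocNested` example and assuming it also declares `embedded_str_doc_array = ListField(EmbeddedDocumentField(EmbeddedStrDoc))`:

```python
def do_task_nested_field(unique_ids):
    """Find each parent model by a field on its nested document."""
    for uid in unique_ids:
        LargeDocNested.objects(embedded_str_doc_1__unique_id=uid).first()


def do_task_nested_array(unique_ids):
    # Per MongoDB dot-notation semantics, this matches any array element
    # whose unique_id equals the value, which includes the first element
    # saved during setup.
    for uid in unique_ids:
        LargeDocNested.objects(embedded_str_doc_array__unique_id=uid).first()
```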
The benchmarks SHOULD be run with the most recent stable version of the ODM and the newest version of the driver it supports.
The MongoDB ODM Performance Benchmark MUST be run against a single-node MongoDB replica set patch-pinned to the latest stable database version, without authentication or SSL enabled. This database version SHOULD be updated as needed to better differentiate between ODM performance changes and server performance changes. The Benchmark MUST be run on the established internal performance distro for the sake of consistency.
The MongoDB ODM Performance Benchmark SHOULD be placed in one of two locations. For first-party ODMs, the Benchmark SHOULD be placed within the ODM's test directory as an independent test suite. For third-party ODMs, if the external maintainers do not wish to have the Benchmark included as part of the in-repo test suite, it SHOULD be included in the ODM performance testing repository created explicitly for this purpose.
Due to the relatively long runtime of the benchmarks, including them as part of an automated suite that runs against every PR is not recommended. Instead, scheduling benchmark runs on a regular cadence is the recommended method of automating this suite of tests.
As discussed earlier in this document, ODM feature sets vary significantly across libraries. Many ODMs have features unique to them or their niche in the wider ecosystem, which makes specifying concrete benchmark test cases for every possible API unfeasible. Instead, ODM authors SHOULD determine what mainline use cases of their library are not covered by the benchmarks specified above and expand this testing suite with additional benchmarks to cover those areas.
- 2025-11-14: Release initial 1.0 version