Microscopy-guided crystal structure reconstruction: recover a simulation-ready crystal structure (CIF) from a single noisy STEM image when the composition is known.
Electron microscopy resolves individual atoms, yet turning an image into a usable material description (a structure, and from it properties) still relies on slow, manual reconstruction. The longer-term aim is a direct bridge from microscopy to materials: generate the structure straight from the image, then analyze it end to end. STEM2Crystal-Bench targets the first and hardest link in that bridge: recovering a simulation-ready crystal structure from a single noisy STEM image when the composition is known.
STEM2Crystal-Bench supplies paired data and one scoring protocol for this task. A method takes the STEM image and the known composition and returns one or more candidate CIFs, which are scored against the ground truth.
- a synthetic set of 1021 structures rendered at three controlled noise levels (low / mid / high), and
- a small set of real STEM images of 2D monolayers with single-layer ground truth.
Predictions are scored by reconstruction success (Hit@1, Hit@5), geometric error (MinRMS@K, RMSD), absolute-scale agreement (Match_abs, W₁), symmetry accuracy (SG-Acc), and a failure-mode breakdown (see Metrics). The dataset, its noise model, and the splits are documented in docs/benchmark.md.
stem2crystal/data: download the benchmark and resolve its file paths.stem2crystal/eval: the metrics and a function that scores a directory of predictions. It contains no model code, so it scores any method's output the same way.stem2crystal/methods: theMethodinterface and a registry. Several methods are included as reference implementations (SCCD, DiffCSP, MicroscopyGPT, MatterGen); they are starting points, not the scope; any model that outputs CIFs can be run and scored._sccd_impl/holds SCCD's training and sampling scripts.scripts/:download_benchmark.py,download_models.py,generate.py,evaluate.py.
Downloaded data, weights, and generated predictions are git-ignored and live in local data/,
outputs/, and predictions/ directories.
pip install -e .
python scripts/download_benchmark.py --config all # benchmark -> ./dataScore a directory of predictions (any method) against the benchmark:
python scripts/evaluate.py --pred predictions/sccd/low --benchmark synthetic --name SCCDMethod Hit1 Hit5 MinRMS_median MinRMS_mean Match_abs RMSD_A SG_Acc W1_b W1_gamma
SCCD 0.5867 0.8217 0.0536 0.1206 0.7855 0.2063 0.2693 0.1331 2.0072
failure modes: {'formula': 0.0, 'wrong_lattice': 0.135, 'lattice_ok_wrong_SG': 0.034, 'coord_other': 0.009}
Run a built-in method end to end:
python scripts/download_models.py --model diffcsp # weights -> ./outputs
python scripts/generate.py --method diffcsp --benchmark synthetic --noise low
python scripts/evaluate.py --method diffcsp --benchmark synthetic --noise lowThe built-in methods are sccd, diffcsp, microscopygpt, and mattergen. Each wraps its upstream
model and downloaded weights; see docs/methods.md.
Implement one method, register it, and the evaluator scores it like the others:
from stem2crystal.methods import Method, register_method
@register_method("my_model")
class MyModel(Method):
def setup(self):
self.net = load_ckpt(self.config["ckpt"])
def generate(self, image, composition, k=5):
return my_sampler(self.net, image, composition, k) # -> list[pymatgen.Structure]python scripts/generate.py --method my_model --method-module my_model.py --benchmark synthetic
python scripts/evaluate.py --method my_model --benchmark syntheticSee docs/add_a_method.md for the full contract.
Each method's rank-1 reconstruction next to the ground truth, labelled with the RMS to GT (two samples from the synthetic set; figure from the paper):
Regenerate it from your own predictions:
pip install -e ".[viz]" # adds ase + matplotlib
python scripts/plot_comparison.py --benchmark synthetic --noise low \
--methods mattergen microscopygpt diffcsp sccd \
--samples <sid1> <sid2> --out comparison.pngThe suite reports the metrics from the paper (§5.2):
| Metric | Definition |
|---|---|
| Hit@1 / Hit@5 | top-1 / best-of-K reconstruction success (scale-normalized matcher) |
| MinRMS@K (median, mean) | smallest scale-normalized RMS over formula-matched candidates |
| Match_abs | match rate with cell rescaling disabled (exposes absolute-scale errors) |
| RMSD (Å) | absolute-Ångström structural RMS |
| SG-Acc | top-1 space-group accuracy (spglib 3D number, not the 2D layer group) |
| W₁(b), W₁(γ) | 1-D Wasserstein-1 distance between predicted and reference lattice marginals |
| failure modes | formula / wrong-lattice / lattice-ok-wrong-SG / coord-other |
Each metric is a pure function in stem2crystal/eval/metrics.py,
aggregated by stem2crystal/eval/evaluate.py. This list is a starting
point, not a fixed set: a new metric is just another function over the ground-truth and candidate
structures (for example property-level error or thermodynamic stability), and additions are welcome.
benchmark · installation · user guide · methods · add a method
The benchmark is small on purpose, and it should not stay that way. Three ways to help:
- Methods. Wrap your model in the
Methodinterface (docs/add_a_method.md) and open a pull request that adds your row toleaderboard.json, with a link to predictions that reproduce it. - Data. New material classes (beyond 2D slabs), more real STEM images, and harder cases make the benchmark more useful. Propose additions on the dataset page or open an issue.
- Metrics. Add a metric as a pure function in
stem2crystal/eval/metrics.py(property-level error, stability, a different matcher) and open a pull request.
@inproceedings{chen2026stem2crystal,
title = {From Noisy STEM to Crystal Structure: Evidence-Structure CoDiffusion under Composition Constraints},
author = {Chen, Guangyao and You, Fengqi},
booktitle = {Proceedings of the 32nd ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD)},
year = {2026},
note = {Oral presentation}
}If you use the Real-5 images or a baseline method, please also cite the corresponding source papers.
Code is MIT (LICENSE). The benchmark is CC BY 4.0 / CC BY 3.0; see the dataset card.
