MicroscopyGPT: Generating 3D Atomic Structure Captions from Microscopy Images Using Vision-Language Transformers
Author(s)
Kamal Choudhary
Abstract
Determining complete atomic structures directly from microscopy images remains a longstanding challenge in materials science. MicroscopyGPT is a vision-language model (VLM) that leverages multimodal generative pre-trained transformers to predict full atomic configurations, including lattice parameters, element types, and atomic coordinates, from Scanning Transmission Electron Microscopy (STEM) images. The model is trained on a chemically and structurally diverse dataset of simulated STEM images generated with the AtomVision tool from the JARVIS-DFT and C2DB two-dimensional (2D) materials databases. The fine-tuning set comprises approximately 5,000 2D materials, enabling the model to learn complex mappings from image features to crystallographic representations. I fine-tune an 11-billion-parameter LLaMA model, with training that remains efficient on resource-constrained hardware. The rise of VLMs and the growth of materials datasets offer a major opportunity for microscopy-based analysis. This work highlights the potential of automated structure reconstruction from microscopy, with broad implications for materials discovery, nanotechnology, and catalysis.
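The abstract frames structure prediction as text generation: the model emits a "caption" listing lattice parameters, element types, and atomic coordinates. As a minimal sketch of what such a caption could look like, the code assumes a hypothetical line-based format (three lattice lengths, three lattice angles, then one `element x y z` line per atom in fractional coordinates). This is not the format used by MicroscopyGPT, and the names `Structure`, `to_caption`, and `from_caption` are illustrative only. The MoS2-like values in the demo are approximate and chosen for illustration, not taken from the paper.

```python
from dataclasses import dataclass, field


@dataclass
class Structure:
    """Minimal crystal description: lattice parameters plus fractional sites."""
    lengths: tuple   # (a, b, c) in angstrom
    angles: tuple    # (alpha, beta, gamma) in degrees
    sites: list = field(default_factory=list)  # [(element, (x, y, z)), ...]


def _fmt(values, ndigits):
    """Format a sequence of floats with a fixed number of decimals."""
    return " ".join(f"{v:.{ndigits}f}" for v in values)


def to_caption(s: Structure, ndigits: int = 3) -> str:
    """Serialize a structure into the hypothetical plain-text caption format."""
    lines = [_fmt(s.lengths, ndigits), _fmt(s.angles, ndigits)]
    lines += [f"{el} {_fmt(xyz, ndigits)}" for el, xyz in s.sites]
    return "\n".join(lines)


def from_caption(text: str) -> Structure:
    """Parse a caption (e.g. generated text) back into a Structure.

    Raises ValueError on malformed input, so a caller can reject or
    resample an invalid generation.
    """
    rows = [r.split() for r in text.strip().splitlines() if r.strip()]
    if len(rows) < 2:
        raise ValueError("caption must contain lattice lengths and angles")
    lengths = tuple(float(v) for v in rows[0])
    angles = tuple(float(v) for v in rows[1])
    if len(lengths) != 3 or len(angles) != 3:
        raise ValueError("expected three lattice lengths and three angles")
    sites = []
    for row in rows[2:]:
        el, *xyz = row
        if len(xyz) != 3:
            raise ValueError(f"bad site line: {' '.join(row)}")
        sites.append((el, tuple(float(v) for v in xyz)))
    return Structure(lengths, angles, sites)


if __name__ == "__main__":
    # Approximate MoS2-like monolayer, for illustration only.
    mos2 = Structure(
        (3.19, 3.19, 20.0),
        (90.0, 90.0, 120.0),
        [("Mo", (0.333, 0.667, 0.5)),
         ("S", (0.667, 0.333, 0.578)),
         ("S", (0.667, 0.333, 0.422))],
    )
    caption = to_caption(mos2)
    print(caption)
    print(from_caption(caption) == mos2)  # prints True
```

Validating the parsed text, rather than trusting it, is a deliberate choice in this sketch: a generative model can emit captions with missing or extra numbers, and raising `ValueError` makes such outputs easy to detect downstream.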
Choudhary, K. (2025), MicroscopyGPT: Generating 3D Atomic Structure Captions from Microscopy Images Using Vision-Language Transformers, Journal of Physical Chemistry Letters, [online], https://doi.org/10.1021/acs.jpclett.5c01257, https://tsapps.nist.gov/publication/get_pdf.cfm?pub_id=960025 (Accessed August 7, 2025)