MicroscopyGPT: Generating 3D Atomic Structure Captions from Microscopy Images Using Vision-Language Transformers
Published
Author(s)
Kamal Choudhary
Abstract
Determining complete atomic structures directly from microscopy images remains a longstanding challenge in materials science. MicroscopyGPT is a vision-language model (VLM) that leverages multimodal generative pre-trained transformers to predict full atomic configurations, including lattice parameters, element types, and atomic coordinates, from Scanning Transmission Electron Microscopy (STEM) images. The model is trained on a chemically and structurally diverse dataset of simulated STEM images generated using the AtomVision tool and the JARVIS-DFT as well as the C2DB two-dimensional (2D) materials databases. The training set for finetuning comprises approximately 5000 2D materials, enabling the model to learn complex mappings from image features to crystallographic representations. I fine-tune the 11-billion-parameter LLaMA model, allowing efficient training on resource-constrained hardware. The rise of VLMs and the growth of materials datasets offer a major opportunity for microscopy-based analysis. This work highlights the potential of automated structure reconstruction from microscopy, with broad implications for materials discovery, nanotechnology, and catalysis.
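To make the idea of a "structure caption" concrete, the following is a minimal sketch of how lattice parameters, element types, and atomic coordinates could be serialized to text and parsed back. The `Structure` class, the `to_caption` and `from_caption` helpers, the line layout, and the numeric values are all hypothetical illustrations; the abstract does not specify the caption format the model actually uses.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Structure:
    """Hypothetical container for a crystal structure (not the paper's data model)."""
    lattice_lengths: Tuple[float, float, float]  # a, b, c in angstroms
    lattice_angles: Tuple[float, float, float]   # alpha, beta, gamma in degrees
    elements: List[str]                          # one symbol per atom
    frac_coords: List[Tuple[float, float, float]]  # fractional coordinates


def to_caption(s: Structure) -> str:
    """Serialize a structure as plain text: lengths, angles, then one atom per line."""
    lines = [
        " ".join(f"{x:.2f}" for x in s.lattice_lengths),
        " ".join(f"{x:.0f}" for x in s.lattice_angles),
    ]
    for element, xyz in zip(s.elements, s.frac_coords):
        lines.append(element + " " + " ".join(f"{x:.3f}" for x in xyz))
    return "\n".join(lines)


def from_caption(text: str) -> Structure:
    """Parse text in the layout produced by to_caption back into a Structure."""
    rows = [row.split() for row in text.strip().splitlines()]
    lengths = tuple(float(x) for x in rows[0])
    angles = tuple(float(x) for x in rows[1])
    elements = [row[0] for row in rows[2:]]
    coords = [tuple(float(x) for x in row[1:4]) for row in rows[2:]]
    return Structure(lengths, angles, elements, coords)


# Illustrative values only, chosen for the example; not taken from the paper
# or from any database entry.
example = Structure(
    lattice_lengths=(3.19, 3.19, 20.0),
    lattice_angles=(90.0, 90.0, 120.0),
    elements=["Mo", "S", "S"],
    frac_coords=[(0.0, 0.0, 0.5), (0.333, 0.667, 0.578), (0.333, 0.667, 0.422)],
)
caption = to_caption(example)
```

A text layout of this kind gives a language model a compact, fixed target for each image, and a generated caption can be parsed back into a structure for comparison against the reference.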
Choudhary, K. (2025), MicroscopyGPT: Generating 3D Atomic Structure Captions from Microscopy Images Using Vision-Language Transformers, Journal of Physical Chemistry Letters, [online], https://doi.org/10.1021/acs.jpclett.5c01257, https://tsapps.nist.gov/publication/get_pdf.cfm?pub_id=960025 (Accessed October 20, 2025)