Author(s)
Kamal Choudhary
Abstract
Determining complete atomic structures directly from microscopy images remains a longstanding challenge in materials science. MicroscopyGPT is a vision-language model (VLM) that leverages multimodal generative pre-trained transformers to predict full atomic configurations, including lattice parameters, element types, and atomic coordinates, from Scanning Transmission Electron Microscopy (STEM) images. The model is trained on a chemically and structurally diverse dataset of simulated STEM images generated using the AtomVision tool and the JARVIS-DFT as well as the C2DB two-dimensional (2D) materials databases. The fine-tuning set comprises approximately 5,000 2D materials, enabling the model to learn complex mappings from image features to crystallographic representations. I fine-tune the 11-billion-parameter LLaMA model, allowing efficient training on resource-constrained hardware. The rise of VLMs and the growth of materials datasets offer a major opportunity for microscopy-based analysis. This work highlights the potential of automated structure reconstruction from microscopy, with broad implications for materials discovery, nanotechnology, and catalysis.
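The paper's own code is not reproduced on this page. As an illustration only, the sketch below shows how an image-to-structure "caption" could be generated with such a fine-tuned VLM, assuming the 11-billion-parameter model is Llama 3.2 11B Vision served through the Hugging Face transformers library; the checkpoint id, prompt wording, and image file name are hypothetical assumptions, not the paper's setup.

    # Hypothetical sketch only, not the paper's code. Assumes the 11B model is
    # Llama 3.2 11B Vision via Hugging Face transformers; checkpoint id,
    # prompt text, and image path are illustrative assumptions.
    import torch
    from PIL import Image
    from transformers import AutoProcessor, MllamaForConditionalGeneration

    model_id = "meta-llama/Llama-3.2-11B-Vision-Instruct"  # assumed base checkpoint
    model = MllamaForConditionalGeneration.from_pretrained(
        model_id, torch_dtype=torch.bfloat16, device_map="auto"
    )
    processor = AutoProcessor.from_pretrained(model_id)

    # A simulated STEM image, e.g. one rendered with AtomVision.
    image = Image.open("stem_image.png")
    messages = [
        {"role": "user", "content": [
            {"type": "image"},
            {"type": "text", "text": (
                "Predict the atomic structure for this STEM image: "
                "lattice parameters, element types, and fractional coordinates."
            )},
        ]}
    ]
    prompt = processor.apply_chat_template(messages, add_generation_prompt=True)
    inputs = processor(image, prompt, add_special_tokens=False,
                       return_tensors="pt").to(model.device)

    # The generated text is the "structure caption": a textual serialization
    # of the lattice and the atomic coordinates.
    output = model.generate(**inputs, max_new_tokens=512)
    print(processor.decode(output[0], skip_special_tokens=True))

In this framing, structure prediction is ordinary conditional text generation: the model emits a serialized crystal description token by token, conditioned on the image features.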
Citation
Journal of Physical Chemistry Letters

Choudhary, K. (2025), MicroscopyGPT: Generating 3D Atomic Structure Captions from Microscopy Images Using Vision-Language Transformers, Journal of Physical Chemistry Letters, [online], https://doi.org/10.1021/acs.jpclett.5c01257, https://tsapps.nist.gov/publication/get_pdf.cfm?pub_id=960025 (Accessed April 25, 2026)
Issues
If you have any questions about this publication or are having problems accessing it, please contact [email protected].