2024 NIST GenAI (Pilot Study): Text-to-Text Evaluation Overview and Results

Author(s)

Hariharan Iyer, Seungmin Seo, Lukas Diduch, Kay Peterson, George Awad, Yooyoung Lee

Abstract

The 2024 NIST Generative AI (GenAI) Pilot Study evaluates text-to-text (T2T) generation and discrimination tasks to assess the capabilities and limitations of generative AI models and AI detectors. The study aims to measure how effectively AI-generated text mimics human writing and how well AI-based discriminators distinguish human-generated from AI-generated content. A curated dataset of article groups with associated human- and machine-generated summaries served as the benchmark, and performance was assessed using statistical and machine learning-based metrics, including AUC (Area Under the Curve) and Brier scores. The results indicate that while AI-generated summaries increasingly resemble human writing, detection models remain reasonably effective at distinguishing between the two. Performance varies significantly across systems: some generators could deceive most discriminators, while some discriminators could detect AI-generated content from almost all generators. There is clear room for improvement in both generator and discriminator systems. We also found that discriminator systems improved over multiple rounds of testing. Future work will focus on refining evaluation methodologies, expanding multi-modal assessments across text, image, and audio domains, and developing standardized benchmarking protocols. These efforts aim to provide a robust test and evaluation framework for assessing generative AI and AI detector technologies, guiding both researchers and policymakers in understanding their evolving impact.
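The two metrics named in the abstract can be illustrated with a small sketch. This is not the NIST scoring code. It assumes a discriminator that outputs, for each summary, a probability that the summary is AI-generated, with label 1 meaning AI-generated and 0 meaning human-written (both conventions are assumptions). AUC is computed here through its rank interpretation (the Mann-Whitney formulation), and the Brier score is the mean squared error of the predicted probabilities. The labels and probabilities below are made-up illustrative values, not study data.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC as the probability that a randomly chosen AI-generated item (label 1)
    receives a higher score than a randomly chosen human-written item (label 0).
    Ties count as one half (Mann-Whitney formulation)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def brier_score(labels: Sequence[int], probs: Sequence[float]) -> float:
    """Mean squared difference between the predicted probability of 'AI-generated'
    and the 0/1 ground truth. Lower is better; 0.0 is a perfect, fully confident detector."""
    if not labels or len(labels) != len(probs):
        raise ValueError("labels and probs must be non-empty and of equal length")
    return sum((p - y) ** 2 for y, p in zip(labels, probs)) / len(labels)


if __name__ == "__main__":
    # Hypothetical detector outputs (illustrative only): 1 = AI-generated, 0 = human-written.
    labels = [1, 1, 1, 0, 0, 0]
    probs = [0.9, 0.7, 0.4, 0.6, 0.2, 0.1]
    print(f"AUC   = {roc_auc(labels, probs):.3f}")      # AUC   = 0.889
    print(f"Brier = {brier_score(labels, probs):.3f}")  # Brier = 0.145
```

The two metrics capture different things: AUC depends only on how the detector ranks items, while the Brier score also penalizes poorly calibrated probabilities. The pairwise loop is quadratic and meant only for clarity. For larger evaluations, `sklearn.metrics.roc_auc_score` and `sklearn.metrics.brier_score_loss` compute the same quantities.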
Citation
NIST Trustworthy and Responsible AI - NIST AI 700-1
Report Number
NIST AI 700-1

Keywords

Artificial Intelligence (AI), Generative AI, Discriminative AI, Deepfakes, Large Language Model (LLM), Forensics, Evaluation, Measurement, Provenance, Authenticity, Detection, Accuracy, and Robustness

Citation

Iyer, H., Seo, S., Diduch, L., Peterson, K., Awad, G. and Lee, Y. (2025), 2024 NIST GenAI (Pilot Study): Text-to-Text Evaluation Overview and Results, NIST Trustworthy and Responsible AI, National Institute of Standards and Technology, Gaithersburg, MD, [online], https://doi.org/10.6028/NIST.AI.700-1, https://tsapps.nist.gov/publication/get_pdf.cfm?pub_id=959809 (Accessed June 18, 2025)

Issues

If you have any questions about this publication or are having problems accessing it, please contact [email protected].

Created June 17, 2025