NIST GenAI (Pilot): an Overview of Text-to-Text Evaluation Results

Author(s)

Yooyoung Lee, Hariharan Iyer

Abstract

The 2024 NIST Generative AI (GenAI) Pilot Study focuses on evaluating text-to-text (T2T) generation and discrimination tasks to assess the capabilities and limitations of generative AI models. The study aims to measure the effectiveness of AI-generated text in mimicking human writing and the ability of AI-based discriminators to distinguish between human- and AI-generated content. A curated dataset of human-authored and machine-generated summaries served as the benchmark, with performance assessed using statistical and machine-learning-based metrics, including AUC (Area Under the Curve) and Brier scores. The presentation includes the evaluation submissions, data analyses, results, challenges, and future work.
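
The two metrics named in the abstract can be illustrated with a minimal sketch. It assumes a discriminator that outputs, for each summary, a probability that the text is AI-generated, with label 1 meaning AI-generated and 0 meaning human-authored. That labeling convention, the function names, and the toy data are assumptions made for illustration, and AUC is taken here to mean area under the ROC curve; none of this is the pilot's actual scoring code.

```python
from typing import Sequence


def brier_score(labels: Sequence[int], probs: Sequence[float]) -> float:
    """Mean squared difference between predicted probability and the 0/1 label.

    Lower is better; 0.0 means every prediction was exactly right.
    """
    if len(labels) != len(probs) or len(labels) == 0:
        raise ValueError("labels and probs must be non-empty and the same length")
    return sum((p - y) ** 2 for y, p in zip(labels, probs)) / len(labels)


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve, computed as a pairwise ranking probability.

    This is the fraction of (positive, negative) pairs in which the positive
    item gets the higher score; ties count as half. It is quadratic in the
    number of items, which is fine for a small illustration.
    """
    if len(labels) != len(scores):
        raise ValueError("labels and scores must be the same length")
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = AI-generated summary, 0 = human-authored summary.
sample_labels = [1, 1, 0, 0, 1, 0]
sample_probs = [0.9, 0.6, 0.4, 0.2, 0.3, 0.7]

print("AUC:", roc_auc(sample_labels, sample_probs))
print("Brier:", brier_score(sample_labels, sample_probs))
```

The two metrics answer different questions. AUC depends only on how the discriminator ranks items, so it is unchanged by any monotonic rescaling of the scores, while the Brier score also penalizes poorly calibrated probabilities.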

Citation

NIST GenAI

Keywords

AI, Generative AI, evaluation, performance metrics

Citation

Lee, Y. and Iyer, H. (2025), NIST GenAI (Pilot): an Overview of Text-to-Text Evaluation Results, NIST GenAI, [online], https://tsapps.nist.gov/publication/get_pdf.cfm?pub_id=959907, https://ai-challenges.nist.gov/genai (Accessed May 7, 2025)

Created May 5, 2025