NIST GenAI (Pilot): an Overview of Text-to-Text Evaluation Results

Yooyoung Lee; Hariharan Iyer

Published

May 5, 2025

Author(s)

Yooyoung Lee, Hariharan Iyer

Abstract

The 2024 NIST Generative AI (GenAI) Pilot Study focuses on evaluating text-to-text (T2T) generation and discrimination tasks to assess the capabilities and limitations of generative AI models. The study aims to measure the effectiveness of AI-generated text in mimicking human writing and the ability of AI-based discriminators to distinguish between human- and AI-generated content. A curated dataset of human-authored and machine-generated summaries served as the benchmark, with performance assessed using statistical and machine-learning-based metrics, including AUC (Area Under the Curve) and Brier scores. The presentation includes the evaluation submissions, data analyses, results, challenges, and future work.
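The abstract names AUC and the Brier score as the discriminator metrics. The sketch below illustrates what each measures. It is not NIST's scoring code. It assumes, hypothetically, that label 1 marks an AI-generated summary, label 0 marks a human-written one, and each discriminator output is a probability in [0, 1] that the text is AI-generated. The function names `roc_auc` and `brier_score` are illustrative. AUC is computed through its rank interpretation: the probability that a randomly chosen AI-generated item scores higher than a randomly chosen human-written item, with ties counting as one half. On that scale 0.5 is chance level. The Brier score is the mean squared error between predicted probabilities and the 0/1 labels, so lower is better.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC via the Mann-Whitney rank interpretation.

    Assumed convention (hypothetical): label 1 = AI-generated, 0 = human-written.
    Returns the fraction of (AI, human) pairs in which the AI item scores higher;
    tied pairs count as half.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one item of each class")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def brier_score(labels: Sequence[int], probs: Sequence[float]) -> float:
    """Mean squared difference between predicted probability and 0/1 label."""
    if len(labels) != len(probs) or len(labels) == 0:
        raise ValueError("labels and probs must be non-empty and equal length")
    return sum((p - y) ** 2 for y, p in zip(labels, probs)) / len(labels)


if __name__ == "__main__":
    # Toy discriminator outputs for four summaries (illustrative data only).
    labels = [1, 1, 0, 0]
    probs = [0.9, 0.6, 0.4, 0.7]
    print(roc_auc(labels, probs))               # 0.75 (3 of 4 pairs ranked correctly)
    print(round(brier_score(labels, probs), 3))  # 0.205
```

The two metrics capture different properties. AUC depends only on how the scores rank the items, so it measures separability. The Brier score also penalizes miscalibrated probabilities, such as a confident 0.7 assigned to a human-written text.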

Citation

NIST GenAI

Pub Weblink

https://ai-challenges.nist.gov/genai

Pub Type

Websites

Keywords

AI, Generative AI, evaluation, performance metrics

Artificial intelligence

Citation

Lee, Y. and Iyer, H. (2025), NIST GenAI (Pilot): an Overview of Text-to-Text Evaluation Results, NIST GenAI, [online], https://tsapps.nist.gov/publication/get_pdf.cfm?pub_id=959907, https://ai-challenges.nist.gov/genai (Accessed July 16, 2026)

Issues

If you have any questions about this publication or are having problems accessing it, please contact [email protected].

Created May 5, 2025