
AI Metrology Colloquia Series

As a follow-on to the National Academies of Sciences, Engineering, and Medicine workshop on Assessing and Improving AI Trustworthiness and the National Institute of Standards and Technology (NIST) workshop on AI Measurement and Evaluation, NIST hosts a bi-weekly AI metrology colloquia series in which leading researchers share current and recent work in AI measurement and evaluation.

This series provides a dedicated venue for presenting and discussing AI metrology research and aims to spur collaboration among AI metrology researchers, helping advance the state of the art in AI measurement and evaluation. The series is open to the public. Presentation formats are flexible but generally consist of a 50-minute talk followed by 10 minutes of questions and discussion. All talks start at 12:00 p.m. ET.

Information on viewing the series can be found here.

Please contact aime [at] nist [dot] gov with any questions, or to join the AIME mailing list.


2024 Schedule

Jan 18 | Elham Tabassi / NIST | NIST Risk Management Framework (RMF) (Postponed)
Feb 01 | Anupam Datta / Truera | Evaluating and Monitoring LLM applications
Feb 15 | Marta Kwiatkowska / University of Oxford | Safety and robustness for deep learning with provable guarantees
Feb 29 | Vera Liao / Microsoft | Bridging the Socio-Technical Gap: Towards Explainable and Responsible AI
March 14 | Bo Li / University of Chicago | Risk Assessment, Safety Alignment, and Guardrails for Generative Models
March 28 | Phil Koopman / Carnegie Mellon University | Safety Performance Indicators and Continuous Improvement Feedback
April 11 | Josh Tobin / Gantry | Evaluating LLM-based applications
April 25 | Peter Kairouz / Google | Navigating Privacy Risks in (Large) Language Models
May 9 | Asma Ben Abacha / Microsoft | Challenges and Opportunities in Clinical Note Generation and Evaluation (Cancelled)
May 23 | Henning Müller / HES-SO | Scientific challenges in medical imaging, benchmarks, how to assure fair comparisons

2023 Schedule

Jan 19 | Robert R. Hoffman / Institute for Human & Machine Cognition | Psychometrics for AI Measurement Science
Feb 02 | Ludwig Schmidt / University of Washington | A Data-Centric View on Reliable Generalization
Feb 16 | Rich Caruana / Microsoft Research | High Accuracy Is Not Enough: Not Everything That Is Important Can Be Measured
March 02 | Nazneen Rajani / Hugging Face | The Wild West of NLP Modeling, Documentation, and Evaluation
March 16 | Ben Shneiderman / University of Maryland | Human-Centered AI: Ensuring Human Control while Increasing Automation
March 30 | Isabelle Guyon / Google Brain | Datasets and benchmarks for reproducible ML research: are we there yet?
April 13 | Peter Fontana / National Institute of Standards and Technology | Towards a Structured Evaluation Methodology for Artificial Intelligence Technology
April 27 | Jutta Treviranus | Statistical Discrimination
May 11 | Juho Kim / KAIST | Interaction-Centric AI
May 25 | Sina Fazelpour / Northeastern University | ML Trade-offs and Values in Sociotechnical Systems
June 8 | Rishi Bommasani / Stanford CRFM | Making Foundation Models Transparent
June 22 | Visvanathan Ramesh / Goethe University | Transdisciplinary Systems perspective for AI
July 20 | Pin-Yu Chen / IBM | Foundational Robustness of Foundation Models
Aug 3 | James Zou / Stanford University | Data-centric AI: what is it good for and why do we need it?
Aug 17 | Olivia Wiles / Google DeepMind | Rigorous Evaluation of Machine Learning Models
Aug 31 | Patrick Hall | Machine Learning for High-Risk Applications
Sep 14 | Pradeep Natarajan / Amazon | Recent advances in building Responsible LM technologies at Alexa: Privacy, Inclusivity, and Disambiguation
Sep 28 | Jason Yik / Harvard University, neurobench.ai | NeuroBench: Advancing Neuromorphic Computing through Collaborative, Fair and Representative Benchmarking
Oct 26 | Chris Welty / Google | p-Value: A statistically rigorous approach to machine learning model comparison
Nov 09 | Joon Sung Park / Stanford University | Generative Agents: Interactive Simulacra of Human Behavior
Dec 07 | Joaquin Vanschoren / Eindhoven University of Technology (TU/e) | Systematic benchmarking for AI safety and Machine Learning research

2022 Schedule

December 8 | Sharon Yixuan Li / University of Wisconsin-Madison | How to Handle Data Shifts? Challenges, Research Progress and Path Forward
November 17 | Emiliano De Cristofaro / University College London | Privacy and Machine Learning: The Good, The Bad, and The Ugly
November 3 | Peter Bajcsy / Software and Systems Division, ITL, NIST | Explainable AI Models via Utilization Measurements
October 20 | Soheil Feizi / University of Maryland | Symptoms or Diseases: Understanding Reliability Issues in Deep Learning and Potential Ways to Fix Them
October 6 | Thomas Dietterich / Oregon State University | Methodological Issues in Anomaly Detection Research
September 22 | Douwe Kiela / Hugging Face | Rethinking benchmarking in AI: Evaluation-as-a-Service and Dynamic Adversarial Data Collection
September 8 | Been Kim / Google Brain | Bridging the representation gap between humans and machines: first steps
August 25 | Aylin Caliskan and Robert Wolfe / University of Washington | Quantifying Biases and Societal Defaults in Word Embeddings and Language-Vision AI
August 11 | Chunyuan Li / Microsoft Research | A Vision-Language Approach to Computer Vision in the Wild: Modeling and Benchmarking
July 28 | Nicholas Carlini / Google Brain | Lessons Learned from Evaluating the Robustness of Defenses to Adversarial Examples
July 14 | Andrew Trask / OpenMined, University of Oxford, Centre for the Governance of AI | Privacy-preserving AI
June 16 | Theodore Jensen / National Institute of Standards and Technology | User Trust Appropriateness in Human-AI Interaction
June 2 | Reva Schwartz, Apostol Vassilev, Kristen Greene & Lori A. Perine / National Institute of Standards and Technology | Towards a Standard for Identifying and Managing Bias in Artificial Intelligence (NIST Special Publication 1270)
May 19 | Jonathan Fiscus / NIST | The Activities in Extended Video Evaluations: A Case Study in AI Metrology
May 5 | Judy Hoffman / Georgia Tech | Measuring and Mitigating Bias in Vision Systems
April 21 | Yuekai Sun / University of Michigan, Department of Statistics | Statistical Perspectives on Federated Learning
April 7 | Rayid Ghani / Carnegie Mellon University | Practical Lessons and Challenges in Building Fair and Equitable AI/ML Systems
March 24 | Haiying Guan / NIST | Open Media Forensic Challenge (OpenMFC) Evaluation Program
March 10 | Dan Weld / Allen Institute for AI (AI2) | Optimizing Human-AI Teams
February 24 | Peter Hase / UNC | Evaluating Explainable AI: Which Algorithmic Explanations Help Users Predict Model Behavior?
February 10 | Timnit Gebru / Distributed AI Research Institute (DAIR) | DAIR & AI and Its Consequences
January 27 | Brian Stanton / NIST | Trust and Artificial Intelligence

2021 Schedule

December 16 | Rich Kuhn / NIST | How Can We Provide Assured Autonomy?
December 2 | Andreas Holzinger / Medical University Graz, Graz University of Technology, University of Alberta | Assessing and Improving AI Trustworthiness with the Systems Causability Scale
November 18 | David Kanter / MLCommons | Introduction to MLCommons and MLPerf
November 4 | Michael Sharp / NIST | Risk Management in Industrial Artificial Intelligence
October 21 | Finale Doshi-Velez / Harvard | The Promises, Pitfalls, and Validation of Explainable AI
October 7 | -----
September 23 | Jonathon Phillips / NIST | Face Recognition: from Evaluations to Experiment
September 9 | José Hernández-Orallo / Universitat Politècnica de València, Leverhulme Centre for the Future of Intelligence (Cambridge) | Measuring Capabilities and Generality in Artificial Intelligence
August 26 | Rachael Sexton / NIST | Understanding & Evaluating Informed NLP Systems: The Road to Technical Language Processing
August 12 | Michael Majurski / NIST | Trojan Detection Evaluation: Finding Hidden Behavior in AI Models
July 29 | Ellen Voorhees / NIST | Operationalizing Trustworthy AI

NOTE: Portions of the events may be recorded, and audience Q&A or comments may be captured. The recorded event may be edited and rebroadcast or otherwise made publicly available by NIST. By registering for or attending this event, you acknowledge and consent to being recorded.


Created March 15, 2022, Updated May 8, 2024