
AI Metrology Colloquia Series

As a follow-on to the National Academies of Sciences, Engineering, and Medicine workshop on Assessing and Improving AI Trustworthiness (link) and the National Institute of Standards and Technology (NIST) workshop on AI Measurement and Evaluation (link), NIST hosts a bi-weekly AI metrology colloquia series, where leading researchers share current and recent work in AI measurement and evaluation.

This series provides a dedicated venue for presenting and discussing AI metrology research and aims to spur collaboration among AI metrology researchers, helping to advance the state of the art in AI measurement and evaluation. The series is open to the public. Presentation formats are flexible, though talks generally consist of a 50-minute presentation followed by 10 minutes of questions and discussion. All talks start at 12:00 p.m. ET.

Information on viewing the series can be found here.

Please contact aime [at] with any questions, or to join the AIME mailing list.

2023 Schedule

Jan 19 | Robert R. Hoffman / Institute for Human & Machine Cognition | Psychometrics for AI Measurement Science
Feb 02 | Ludwig Schmidt / University of Washington | A Data-Centric View on Reliable Generalization
Feb 16 | Rich Caruana / Microsoft Research | High Accuracy Is Not Enough: Not Everything That Is Important Can Be Measured
March 02 | Nazneen Rajani / Hugging Face | The Wild West of NLP Modeling, Documentation, and Evaluation
March 16 | Ben Shneiderman / University of Maryland | Human-Centered AI: Ensuring Human Control while Increasing Automation
March 30 | Isabelle Guyon / Google Brain | Datasets and benchmarks for reproducible ML research: are we there yet?
April 13 | Peter Fontana / National Institute of Standards and Technology | Towards a Structured Evaluation Methodology for Artificial Intelligence Technology
April 27 | Jutta Treviranus | Statistical Discrimination
May 11 | Juho Kim / KAIST | Interaction-Centric AI
May 25 | Sina Fazelpour / Northeastern University | ML Trade-offs and Values in Sociotechnical Systems
June 8 | Rishi Bommasani / Stanford CRFM | Making Foundation Models Transparent
June 22 | Visvanathan Ramesh / Goethe University | Transdisciplinary Systems Perspective for AI
July 20 | Pin-Yu Chen / IBM | Foundational Robustness of Foundation Models
Aug 3 | James Zou / Stanford University | Data-centric AI: what is it good for and why do we need it?
Aug 17 | Olivia Wiles / Google DeepMind | Rigorous Evaluation of Machine Learning Models
Aug 31 | Patrick Hall | Machine Learning for High-Risk Applications
Sep 14 | Pradeep Natarajan / Amazon | Recent advances in building Responsible LM technologies at Alexa: Privacy, Inclusivity, and Disambiguation
Sep 28 | Jason Yik / Harvard University | NeuroBench: Advancing Neuromorphic Computing through Collaborative, Fair and Representative Benchmarking
Oct 12 | Elham Tabassi / NIST | NIST AI RMF
Oct 26 | Marta Kwiatkowska / University of Oxford | Safety and robustness for deep learning with provable guarantees
Dec 07 | Joaquin Vanschoren / Eindhoven University of Technology (TU/e) | Reproducible AI evaluation with OpenML

2022 Schedule

December 8 | Sharon Yixuan Li / University of Wisconsin-Madison | How to Handle Data Shifts? Challenges, Research Progress and Path Forward
November 17 | Emiliano De Cristofaro / University College London | Privacy and Machine Learning: The Good, The Bad, and The Ugly
November 3 | Peter Bajcsy / Software and Systems Division, ITL, NIST | Explainable AI Models via Utilization Measurements
October 20 | Soheil Feizi / University of Maryland | Symptoms or Diseases: Understanding Reliability Issues in Deep Learning and Potential Ways to Fix Them
October 6 | Thomas Dietterich / Oregon State University | Methodological Issues in Anomaly Detection Research
September 22 | Douwe Kiela / Hugging Face | Rethinking benchmarking in AI: Evaluation-as-a-Service and Dynamic Adversarial Data Collection
September 8 | Been Kim / Google Brain | Bridging the representation gap between humans and machines: first steps
August 25 | Aylin Caliskan and Robert Wolfe / University of Washington | Quantifying Biases and Societal Defaults in Word Embeddings and Language-Vision AI
August 11 | Chunyuan Li / Microsoft Research | A Vision-Language Approach to Computer Vision in the Wild: Modeling and Benchmarking
July 28 | Nicholas Carlini / Google Brain | Lessons Learned from Evaluating the Robustness of Defenses to Adversarial Examples
July 14 | Andrew Trask / OpenMined, University of Oxford, Centre for the Governance of AI | Privacy-preserving AI
June 16 | Theodore Jensen / National Institute of Standards and Technology | User Trust Appropriateness in Human-AI Interaction
June 2 | Reva Schwartz, Apostol Vassilev, Kristen Greene & Lori A. Perine / National Institute of Standards and Technology | Towards a Standard for Identifying and Managing Bias in Artificial Intelligence (NIST Special Publication 1270)
May 19 | Jonathan Fiscus / NIST | The Activities in Extended Video Evaluations: A Case Study in AI Metrology
May 5 | Judy Hoffman / Georgia Tech | Measuring and Mitigating Bias in Vision Systems
April 21 | Yuekai Sun / Statistics Department, University of Michigan | Statistical Perspectives on Federated Learning
April 7 | Rayid Ghani / Carnegie Mellon University | Practical Lessons and Challenges in Building Fair and Equitable AI/ML Systems
March 24 | Haiying Guan / NIST | Open Media Forensic Challenge (OpenMFC) Evaluation Program
March 10 | Dan Weld / Allen Institute for AI (AI2) | Optimizing Human-AI Teams
February 24 | Peter Hase / UNC | Evaluating Explainable AI: Which Algorithmic Explanations Help Users Predict Model Behavior?
February 10 | Timnit Gebru / Distributed AI Research Institute (DAIR) | DAIR & AI and Its Consequences
January 27 | Brian Stanton / NIST | Trust and Artificial Intelligence

2021 Schedule

December 16 | Rich Kuhn / NIST | How Can We Provide Assured Autonomy?
December 2 | Andreas Holzinger / Medical University Graz, Graz University of Technology, University of Alberta | Assessing and Improving AI Trustworthiness with the Systems Causability Scale
November 18 | David Kanter / MLCommons | Introduction to MLCommons and MLPerf
November 4 | Michael Sharp / NIST | Risk Management in Industrial Artificial Intelligence
October 21 | Finale Doshi-Velez / Harvard | The Promises, Pitfalls, and Validation of Explainable AI
October 7 | -----
September 23 | Jonathon Phillips / NIST | Face Recognition: from Evaluations to Experiment
September 9 | José Hernández-Orallo / Universitat Politècnica de València, Leverhulme Centre for the Future of Intelligence (Cambridge) | Measuring Capabilities and Generality in Artificial Intelligence
August 26 | Rachael Sexton / NIST | Understanding & Evaluating Informed NLP Systems: The Road to Technical Language Processing
August 12 | Michael Majurski / NIST | Trojan Detection Evaluation: Finding Hidden Behavior in AI Models
July 29 | Ellen Voorhees / NIST | Operationalizing Trustworthy AI

NOTE: Portions of the events may be recorded, and audience Q&A or comments may be captured. The recorded event may be edited and rebroadcast or otherwise made publicly available by NIST. By registering for or attending this event, you acknowledge and consent to being recorded.


Created March 15, 2022, Updated August 15, 2023