

AI Measurement and Evaluation Workshop

Name: AI Measurement and Evaluation Workshop
Start: 2021-06-15T00:00:00-04:00
End: 2021-06-17T23:59:00-04:00
Location: Virtual Only - Eastern Time

Virtual Event

June 15 - 17, 2021


Registration for this event will close on June 14, 2021, or when maximum capacity is reached.

The link and instructions will be sent to registered attendees on Monday, June 14, 2021.

There is no fee to attend the workshop.

Registration Contact

Rachel Trello

[email protected]

(301) 975-2002

Technical Contact

Harold Booth

[email protected]

(301) 975-8441

Craig Greenberg

[email protected]

(301) 975-3605

Workshop Description

NIST will hold a virtual workshop on Artificial Intelligence Measurement and Evaluation on June 15-17, 2021. The three-day workshop aims to bring together stakeholders and experts to identify the most pressing needs for AI measurement and evaluation and to advance the state of the art and practice.

NIST is assigned responsibility by statute to advance underlying research for measuring and assessing AI technologies. This includes developing AI data standards and best practices, as well as AI evaluation and testing methodologies and standards. NIST is working collaboratively with the private and public sectors to help prioritize and carry out its AI activities.

Workshop Summary Report

Workshop Goals:

Identify:
1. the needs for and intended uses of AI measurement and evaluation
2. the gaps in knowledge/practice preventing current AI measurement and evaluation activities from effectively meeting these needs/uses
Solicit guidance on the specific areas where NIST should focus its efforts
Identify specific users and applications in need of measurement and evaluation
Collate best practices for AI measurement and evaluation
Build community around AI measurement and evaluation: provide tools, resources, pointers to other sources, and continued engagement through periodic talks and seminars

Panels and discussions will be organized to provide feedback on topics related to AI measurement and evaluation and to influence the future direction of NIST efforts in this area.

This workshop will be ideal for:

Researchers who are interested in AI measurement and evaluation.
Developers of AI technologies who need to perform evaluation and testing of AI systems.
Policymakers and decision makers who need to use the outputs of AI measurements and evaluations.

Workshop Materials

Stay connected with the latest NIST AIME updates by signing up for the mailing list in either of two ways:

If you already have a Google account associated with your email address, go to https://groups.google.com/a/list.nist.gov/g/aime and click the “Join group” button.
Or send an email to aime+subscribe@list.nist.gov.

A new project called Dioptra has been released on GitHub at https://github.com/usnistgov/dioptra!

Dioptra is a software test bed currently focused on adversarial machine learning and defensive mitigations. It is in pre-release status, but we would like to start collecting community feedback.

Workshop Read-Ahead: Artificial Intelligence Measurement and Evaluation at the National Institute of Standards and Technology (Draft)

Fact Sheet: NIST AI Program

Workshop Agenda

Download the detailed agenda (PDF)

Day 1: Tuesday, June 15, 2021

All times EDT (UTC-4)

Start Time	End Time	Topic
11:00 AM	11:20 AM	Welcome, Workshop Goals & Logistics, Overview. Elham Tabassi (Chief of Staff, Information Technology Laboratory, NIST)
11:20 AM	11:50 AM	Keynote: A National Security Perspective on AI Measurement and Evaluation. Jason Matheny (Deputy Assistant to the President for Technology and National Security; Deputy Director for National Security in the White House Office of Science and Technology Policy; and Coordinator for Technology and National Security at the National Security Council)
11:50 AM	12:00 PM	Break
12:00 PM	1:30 PM	Panel 1: Measuring with Purpose. Discussion of the needs for and uses of AI evaluation outputs and their role in driving downstream processes, including the requirements and properties an AI evaluation must possess to be fit for its intended uses. Identification of areas where current measurement and evaluation approaches are insufficient or do not exist, and where further AI metrology research would be beneficial. Moderator: Tess DeBlanc-Knowles (White House Office of Science and Technology Policy). Panelists: Jack Clark (Anthropic); Michael Hind (IBM Research) (slides); Chuck Howell (MITRE); Jane Pinelis (Test and Evaluation of AI/ML at DoD Joint Artificial Intelligence Center); Salvatore Scalzo (European Commission); Bill Scherlis (DARPA).
1:30 PM	1:45 PM	Break and Discussion Time
1:45 PM	2:15 PM	Panel 2: Overview of Past & Current Evaluations. Overview of the evaluation-driven research paradigm that has been used at NIST to evaluate AI systems, with a description of the various styles of evaluations, as well as examples of some of the AI measurement and evaluation activities conducted at NIST. Moderator: Mark Przybocki (NIST). Panelists: Peter Bajcsy (NIST); Jonathan Fiscus (NIST); Jonathon Phillips (NIST); Michael Sharp (NIST); Ellen Voorhees (NIST); Megan Zimmerman (NIST).
2:15 PM	2:45 PM	Panel 3: Discussion of NIST/Community Future Work (slides). Discussion of the limitations of current AI measurement and evaluation activities that prevent them from addressing all the needs for AI measurement and evaluation, and future plans for NIST to address these limitations together with the research community.
2:45 PM	3:00 PM	Break
3:00 PM	4:00 PM	Panel 4: Evaluating AI during Operation. Discussion of AI evaluation in production/operational environments, including topics drawn from: MLOps; operational evaluation metrics/business metrics; model quality/data drift with online data; latency, throughput, and scalability issues; adversarial attacks and robustness to corruptions/perturbations; governance and regulatory compliance. Moderator: Antonio Moretti (Walmart). Panelists: Clarence Agbi (Brex); Sergey Karayev (Turnitin); Josh Tobin (Gantry).
4:00 PM	4:15 PM	Closing Remarks. NIST Workshop Organizing Committee
4:15 PM	5:00 PM	After Hours: Slack with NIST staff

Day 2: Wednesday, June 16, 2021

All times EDT (UTC-4)

Start Time	End Time	Topic
11:00 AM	11:30 AM	Keynote: Fei-Fei Li (Sequoia Professor, Stanford University; Co-Director of Stanford’s Human-Centered AI Institute)
11:30 AM	12:30 PM	Panel 5: Evaluation Design Process. Discussion of the processes and procedures for designing evaluations of AI systems, including: the high-level considerations and decisions that must be made in order to design and implement effective evaluations; the components of and relationships between the various evaluation design elements; and the role of the applications and overall evaluation goals in evaluation design. Moderator: Nicholas Carlini (Google Brain). Panelists: Matthias Hein (University of Tübingen); Deborah Raji (Mozilla Foundation); Shibani Santurkar (MIT / Stanford); Ludwig Schmidt (Toyota Research / UW).
12:30 PM	12:45 PM	Break
12:45 PM	1:45 PM	Panel 6: Metrics and Measurement Methods. Discussion of: the properties of an AI system that can or should be measured, and which properties have or lack metrics and measurement methods; the different methods used to measure AI and their strengths and limitations; the different types and uses of metrics, and the various properties that a metric can possess; the impact that the chosen metrics and measurement methods have on an evaluation; when it is important to have glass-box access to AI systems for evaluation; and when the design or approach taken by an AI system influences the choice of metrics and measurement methods. Moderator: Craig Greenberg (NIST). Panelists: José Hernández-Orallo (Universitat Politècnica de València) (slides); Douglas Reynolds (NSA / MIT Lincoln Laboratory); Sameer Singh (UCI).
1:45 PM	2:00 PM	Break
2:00 PM	3:00 PM	Panel 7: Data and Data Sets. Data collection methods and dataset design for AI system measurement and evaluation, along with discussions drawing on topics such as: approaches to data annotation/labeling; uncertain, missing, or nonexistent ground truth; how much data is necessary; needs for and uses of simulated/generated data; the roles of common datasets in research; repurposing of data; and ethical and privacy considerations. Moderator: Aleksander Mądry (MIT). Panelists: Marzyeh Ghassemi (U Toronto/MIT); Tom Goldstein (UMD); Emre Kiciman (MSR); Nicolas Papernot (U Toronto).
3:00 PM	3:15 PM	Break
3:15 PM	4:15 PM	Panel 8: Limitations, Challenges, and Future Directions of Evaluation. Discussion of the limitations, challenges, shortcomings, and future directions for the evaluation and measurement of AI, including new or emerging evaluation paradigms and the extent to which evaluation results can be generalized, along with the policy implications. Needs and plans for improvements to existing measurement and evaluation activities, as well as the creation of new AI evaluation challenge problems and measurement research. Moderator: Soheil Feizi (UMD). Panelists: Kamalika Chaudhuri (UCSD); Eric Horvitz (MSR); Percy Liang (Stanford); Chris Meserole (Brookings); Daniela Rus (MIT).
4:15 PM	4:30 PM	Break and Slack Discussion Time
4:30 PM	5:00 PM	Closing Remarks. NIST Workshop Organizing Committee
5:00 PM	5:30 PM	After Hours: Slack with NIST staff

Day 3: Thursday, June 17, 2021

All times EDT (UTC-4)

Start Time	End Time	Topic
11:00 AM	11:30 AM	Keynote: AI Test and Evaluation from the National AI Initiative Perspective. Lynne Parker (Director, National AI Initiative Office, White House Office of Science and Technology Policy)
11:30 AM	12:30 PM	Panel 9: Measuring Concepts that are Complex, Contextual, and Abstract. Discussion of the challenges and approaches for measuring AI system characteristics that are complex, contextual, and/or abstract, or otherwise difficult to quantify (such as explainability, bias, trustworthiness, and safety), including the role that descriptive and/or qualitative measurements should play in these cases. Moderator: Ellen Voorhees (NIST). Panelists: Lora Aroyo (Google); Ben Carterette (Spotify); David Ferrucci (Elemental Cognition).
12:30 PM	12:45 PM	Break
12:45 PM	1:45 PM	Panel 10: Measuring with Humans in the Mix. Discussion of the measurement and evaluation of AI systems that work in cooperation with humans, including the roles and relationships between the AI systems and the humans, and the challenges of and approaches to measurement and evaluation when humans and AI systems are involved. Moderator: Margaret Burnett (OSU). Panelists: Rachel Bellamy (IBM); Madeleine Clare Elish (Google); Robert Hoffman (IHMC).
1:45 PM	2:00 PM	Break
2:00 PM	3:00 PM	Panel 11: Software Infrastructure Overview, Existing Tools and Future Desires. Discussion of the landscape, challenges, and needs of developing tools and infrastructure for the particular purpose of measuring, testing, and evaluating AI systems. Moderator: Harold Booth (NIST). Panelists: Pin-Yu Chen (IBM) (slides); Harsha Nori (Microsoft); David Pitman (Google).
3:00 PM	3:15 PM	Break and Discussion Time
3:15 PM	4:15 PM	Panel 12: Practical Considerations and Best Practices for Measurement and Evaluation. Discussion of the practical considerations and concrete best practices for the measurement and evaluation of AI-based systems, including the testing and evaluation strategies that can be used to mitigate privacy loss or intellectual property exposure in AI testing. Moderator: William Streilein (MIT Lincoln Laboratory). Panelists: Matt Gaston (SEI Emerging Technology Center, CMU); Sven Krasser (CrowdStrike); Sanjeev Mohindra (MIT Lincoln Laboratory); Jane Pinelis (Test and Evaluation of AI/ML at DoD Joint Artificial Intelligence Center); Richard Tatum (CIV USN NAVSURFWARCEN PNC FL).
4:15 PM	4:30 PM	Break
4:30 PM	5:00 PM	NIST: Workshop Debrief and Next Steps. NIST Workshop Organizing Committee