Comments Received on A Proposal for Identifying and Managing Bias in Artificial Intelligence (SP 1270)

We are seeking your feedback on our recently released first draft document of "A Proposal for Identifying and Managing Bias in Artificial Intelligence" (Special Publication 1270).

 

Comment number

Commenter organization

Commenter name

Document line or section referenced (if available)

Comment

1

President of The Foundation for Sustainable Communities & Sr. Adjunct Professor

Deborah Hagar, MBA

319

A key link to identifying and establishing accountability for bias is in the decision-making process, i.e., human interventions. The accountability link for results is the key. For example, Sarbanes-Oxley identified responsibility.

 

President of The Foundation for Sustainable Communities & Sr. Adjunct Professor

Deborah Hagar, MBA

397

Ensuring identification of, and full engagement across, domains is critical to both accountability and the rationale for proposed decisions (i.e., transparency).

 

President of The Foundation for Sustainable Communities & Sr. Adjunct Professor

Deborah Hagar, MBA

410

This is the key!  The full cycle of implementation is the true test that the outputs result in the intended AND desired outcomes for the identified Stakeholders.

 

President of The Foundation for Sustainable Communities & Sr. Adjunct Professor

Deborah Hagar, MBA

595

Again, clear accountability for ensuring that the initial problems are resolved, without a zero-sum game of negative impact on select stakeholders, will result in desired outcomes AND effective Stakeholder Capitalism.

2

Bayana Corporation

Joseph S. Bayana

198

to cultivate trust --- There is an assumption that artificial intelligence has big data, and that this big data has equal representation. AI can be "trusted" only if the big data is actual and real-world based.

 

Bayana Corporation

Joseph S. Bayana

199

accuracy, "reliability" -- once again, there is an assumption that big data for AI has already been accumulated and substantiated. There are no actual, real-world sources for much of the data currently used for AI.

 

Bayana Corporation

Joseph S. Bayana

203

to understand and reduce harmful forms of AI bias -- there must be a minimum amount of big data with ample or sufficient representation from various sources (e.g. male, female, young, old, race, etc.) before AI can be used or professed as AI.

 

Bayana Corporation

Joseph S. Bayana

222

The presumption is that bias is present throughout AI systems, the challenge is identifying, measuring, and managing it.  --- once again, there is an assumption that AI has reliable big data.

 

Bayana Corporation

Joseph S. Bayana

223

There is also the assumption that methodology, "type," and "industrial sector," to name a few, are sufficient to proceed with AI, when in reality no big data is available.

 

Bayana Corporation

Joseph S. Bayana

234

proliferation of modeling and predictive approaches based on data-driven and machine learning techniques has helped to expose various social biases baked into real-world systems --- there are no reliable big data sources for "modeling," "predictive approaches," "techniques."

 

Bayana Corporation

Joseph S. Bayana

233

Just like a minimum viable product or prototype in manufacturing and/or industry, there must be a minimum amount of viable big data that can substantiate AI models, approaches, techniques, et al.

 

Bayana Corporation

Joseph S. Bayana

247

higher ethical standard -- an AI testing standard or measure should be done to check the minimum amount of viable big data, and to determine if the AI model, approach, and/or technique is based on factual, real-world data rather than a model or simulation.

 

Bayana Corporation

Joseph S. Bayana

251

Our group's research has data to substantiate that healthcare currently does not have enough big data to substantiate healthcare as we know it, much less healthcare AI. The healthcare industry refers to evidence-based practices, but healthcare AI has neither real-world data nor person-centered big data.

 

Bayana Corporation

Joseph S. Bayana

251

Healthcare big data, as it exists today, is not fact-based and does not come from direct, actual, real-world sources. Healthcare big data's methodology is plain and simple word search, thereafter wrapped up and presented as big data.

 

Bayana Corporation

Joseph S. Bayana

267

specific conditional traits associated with automation that exacerbate distrust in AI tools --- there are 1.5 million words in the English language, with limitless combinations and permutations of nouns, pronouns, verbs, adjectives, adverbs, prepositions, articles, determiners, etc. Buttressed by the academic prestige of an author or writer, a write-up can be passed off as an AI article.

 

Bayana Corporation

Joseph S. Bayana

280

What's stopping NIST and other groups, agencies, etc. from setting up "direct measures" that gauge, rate, and/or measure AI and its big data sources?

 

Bayana Corporation

Joseph S. Bayana

324

describe technology that is based on questionable concepts, deceptive or unproven practices, or lacking theoretical underpinnings --- Our group has the same findings but focuses more on healthcare big data and AI in healthcare.

 

Bayana Corporation

Joseph S. Bayana

335

Bias in healthcare AI can come from lack of primary sources, and real-world, fact- and evidence-based data, including the absence of actual sources from physical or chemical data.

 

Bayana Corporation

Joseph S. Bayana

410

1. Pre-design --- Assumes data is reliable, substantiated, and without bias.

 

Bayana Corporation

Joseph S. Bayana

411

2. DESIGN AND DEVELOPMENT: where the technology is constructed --- Data is already biased.

 

Bayana Corporation

Joseph S. Bayana

412

3. DEPLOYMENT: where technology is used by, or applied to, various individuals or groups. --- Data is already biased.

 

Bayana Corporation

Joseph S. Bayana

424

Pre-design stage --- there should be a minimum viable amount of data for all AI at the pre-design stage to ensure that the data is neither biased nor simply made up.

 

Bayana Corporation

Joseph S. Bayana

440

If at the pre-design stage there are no reliable big data sources, AI bias is already present, and all subsequent stages will carry that bias.

 

Bayana Corporation

Joseph S. Bayana

506

Improving pre-design practices --- once again, there should be a minimum amount of viable data from actual, real-world, fact-based primary sources.

 

Bayana Corporation

Joseph S. Bayana

615

small number of participants --- most, if not all, AI sources rely on synonymous descriptions to substantiate their claims. In other words, there are really no standards or measurements for the minimum viable data used in most, if not all, AI literature and/or research.

 

Bayana Corporation

Joseph S. Bayana

683

development of standards and a risk-based framework. --- once again, there should be a minimum viable data amount as the "standard" in order to create the lowest benchmark for any and all "frameworks."

 

Bayana Corporation

Joseph S. Bayana

707

NIST will collaboratively develop additional guidance --- our group would like to work with NIST to develop a "standard" and/or "framework" to avoid AI bias in healthcare.

 

Bayana Corporation

Joseph S. Bayana

730

The biases we've identified in healthcare can be categorized as content production bias, data generation bias, exclusion bias, historical bias, inherited bias, institutional bias, error propagation, systemic bias, and loss of situational awareness bias, among others.

3

 

 

 

Comment out of scope

4

The University of British Columbia

Mike Zajko

731

Appendix A defines "societal bias" (a term that does a lot of work in this document) from a social psychology textbook as "an adaptable byproduct of human cognition". This does not reflect the understanding of these notions in domains such as sociology (my field) and notable works in the critical AI literature. These biases aren't just byproducts of cognition, but products of identifiable, hierarchical social systems (sexism is not a byproduct). For some of my thoughts on societal & historical bias in AI see: https://link.springer.com/article/10.1007/s00146-021-01153-9

5

IEEE Life Fellow

Doug Verret PhD

106

This describes the desired attributes of an AI system, but in your reductionist, life-cycle approach you have no way to detect biases that arise at the interstices between stages or that cannot be localized. You need some form of traceability or schema that comprehends the interfaces.

 

IEEE Life Fellow

Doug Verret PhD

424

In software engineering circles the terminology "V and V" is used routinely. You use "validation" terminology in your deployment stage, but you do not use "verification" with its specific technical meaning in the Pre-Design stage. The verification process targets software architecture, design, databases, input translation, etc., and uses reviews, walk-throughs, inspections, and desk-checking, but not coding. Why not use the vocabulary that professional engineers recognize?

 

IEEE Life Fellow

Doug Verret PhD

462

A major flaw in the way problems are conceptualized here is the lack of recognition that in machine learning approaches there is no theoretical foundation for how neural nets work. That is, machine-learning AI tools produce "solutions" or "decisions" or "recommendations" that are unpredictable in many cases and not obvious or explainable even by the developer. On the one hand, if it were obvious you wouldn't need the tool; on the other hand, it is not uncommon for the developer to be unable to explain how an output was arrived at other than mechanistically. How then do you propose to discover bias in the pre-design stage, when the code is not even written and any available theory is inscrutable?

 

IEEE Life Fellow

Doug Verret PhD

638

This seems to suggest that systems should be designed with unintended uses in mind. Engineers design systems to a spec and in a sense guarantee the spec. It is utter folly to design a system for "off-road" use. The possibilities are infinite and such a design is impossible. Engineers can design robust systems whereby the system is protected from misuse, but they cannot design systems that protect users from abuse and from harm. This may not be what is intended by this sentence, but it should be clarified.

 

IEEE Life Fellow

Doug Verret PhD

702

As you have aptly pointed out, determining the qualities of and biases in AI tools is often context-dependent. If you include "off-road" use cases the possibilities are endless.  What I am missing in this document is a clear set of principles that teaches me how to generalize without having to examine each use case one at a time.  You want to be flexible and context-aware and at the same time generalize above all of these cases.  Other than "be aware of this" and "be aware of that" I am not getting how you want me to do that.

6

Rapid7

STUART MILLAR

 

Hello, thank you for reading my comments. I appreciate that the first draft is broad. I feel that, as a community, we have the chance to get into the low-level detail of how to mitigate bias, with a framework, detailed processes, and, where needed, technical algorithmic solutions to deal with bias.

 

Rapid7

STUART MILLAR

424

This pre-design stage also needs to consider the dataset itself being used for the use case. Is it being made from scratch or inherited? Is it labelled, and if so, who labelled it? How accurate are those labels? What does the class balance look like? The lineage of the data should be tracked, consistently captured, and given an audit trail. A broader question, for the given dataset in question, is how we define, measure, and deal with bias in that data.
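
To make those questions concrete, here is a minimal sketch of a pre-design dataset audit. It is only an illustration: the pandas column names ("label", "feature") and the lineage fields are hypothetical placeholders, not part of the draft or of any commenter's process.

```python
import pandas as pd

def audit_dataset(df: pd.DataFrame, label_col: str = "label") -> dict:
    """Collect basic facts a pre-design review might record:
    class balance, missing labels, and duplicate rows."""
    counts = df[label_col].value_counts(dropna=False)
    return {
        "n_rows": len(df),
        "class_balance": (counts / len(df)).round(3).to_dict(),
        "missing_labels": int(df[label_col].isna().sum()),
        "duplicate_rows": int(df.duplicated().sum()),
    }

# An accompanying lineage record (fields are illustrative placeholders).
lineage = {
    "source": "inherited from an external vendor",  # made from scratch vs. inherited
    "labelled_by": "three crowd workers per item",
    "label_agreement": None,  # fill in measured inter-annotator agreement
    "collection_window": "2020-01 to 2020-12",
}

if __name__ == "__main__":
    df = pd.DataFrame({"label": ["pos", "neg", "neg", "neg", None],
                       "feature": [1, 2, 2, 3, 4]})
    print(audit_dataset(df))
    print(lineage)
```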

 

Rapid7

STUART MILLAR

424

At this stage one can also consider any legal requirements that already exist to mitigate against bias, depending on the use case. If there is a legal requirement for the algorithm to be explainable/interpretable (in case of a court case triggered by bias), that law is likely there for a reason. If this is the scenario, consider very carefully whether this is the right use case. Be aware early on of any and all legal requirements across geographical regions that will affect your algorithm.

 

Rapid7

STUART MILLAR

426

Have a detailed and documented (in case of audit) discussion with stakeholders on:

What is the main intended objective?
What are any other intended objectives?
What possible learning outcomes do we need to mitigate against?
What is the specification of the algorithm?

 

Rapid7

STUART MILLAR

426

Decide exactly what a human review would entail to check for bias (at any point in the project) if artefacts (such as the data, or a trained model) were available for the closest of inspections.

 

Rapid7

STUART MILLAR

510

I feel like we need to recognise that algorithms are software. We are building models and in essence writing software. Best practice for software is that it needs to go through rigorous, documented QA, meaning:

1. The algorithm does what it is meant to.
2. The algorithm learns what it is meant to.
3. The algorithm does not do what it is not meant to do.
4. The algorithm does not learn what it is not meant to learn.

I think point 4 is particularly poignant and we should be cognizant of that.
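
One way to spot-check point 4, offered here only as a hedged illustration (a simple representation probe, not something prescribed by the draft or the commenter; the arrays below are synthetic stand-ins): train a small classifier on the model's learned representations and see whether a protected attribute can be recovered from them. Probe accuracy far above the majority-class rate is evidence the model has learned something it was not meant to learn.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

def probe_for_unintended_learning(embeddings: np.ndarray,
                                  protected_attr: np.ndarray) -> float:
    """Cross-validated accuracy of predicting a protected attribute from the
    model's internal representations; compare against the majority-class rate."""
    probe = LogisticRegression(max_iter=1000)
    return float(cross_val_score(probe, embeddings, protected_attr, cv=5).mean())

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    emb = rng.normal(size=(500, 32))        # stand-in for learned features
    attr = rng.integers(0, 2, size=500)     # stand-in protected attribute
    print("probe accuracy:", probe_for_unintended_learning(emb, attr))
```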

 

Rapid7

STUART MILLAR

574

In the deployment stage, most of what I have written in the previous comments still holds true, and extra tooling will be needed in dashboards, etc., to measure selected metrics that are identified as being indicators of bias, from class balance in datasets through to the distribution of predictions, data/concept drift, and everything in between.
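
One concrete metric such dashboards often track is the population stability index between training-time and live score distributions. The sketch below is only an illustration; the rule-of-thumb thresholds of roughly 0.1 (watch) and 0.25 (investigate) are industry conventions, not values from the draft.

```python
import numpy as np

def population_stability_index(expected: np.ndarray,
                               actual: np.ndarray,
                               bins: int = 10) -> float:
    """PSI between a reference sample (e.g., training scores) and a live sample."""
    edges = np.unique(np.quantile(expected, np.linspace(0, 1, bins + 1)))
    # Clip both samples into the reference range so the bins cover everything.
    expected = np.clip(expected, edges[0], edges[-1])
    actual = np.clip(actual, edges[0], edges[-1])
    expected_frac = np.histogram(expected, bins=edges)[0] / len(expected)
    actual_frac = np.histogram(actual, bins=edges)[0] / len(actual)
    expected_frac = np.clip(expected_frac, 1e-6, None)
    actual_frac = np.clip(actual_frac, 1e-6, None)
    return float(np.sum((actual_frac - expected_frac)
                        * np.log(actual_frac / expected_frac)))

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    train_scores = rng.beta(2, 5, 10_000)
    live_scores = rng.beta(2.5, 4.5, 10_000)   # mildly shifted distribution
    print("PSI:", round(population_stability_index(train_scores, live_scores), 4))
```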

 

Rapid7

STUART MILLAR

 

I feel a future framework should have technical algorithmic suggestions for dealing with bias. For example, if an image classifier were learning from pictures of people, and we only wanted to predict their age, there are other learnings we may want to negate, for example gender and race. One approach could be to label the training data with this information and actively try to negate the learning of these traits during training, for example with multi-task learning negating the appropriate gradients.
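
A minimal sketch of that gradient-negation idea, using a gradient reversal layer on an auxiliary head. PyTorch is assumed, and every module, dimension, and variable name below is illustrative rather than taken from the draft; this is one possible realisation, not a recommended implementation.

```python
import torch
import torch.nn as nn

class GradReverse(torch.autograd.Function):
    """Identity in the forward pass; negates gradients in the backward pass,
    so the shared encoder is pushed away from features that help predict
    the protected attribute."""
    @staticmethod
    def forward(ctx, x, lam):
        ctx.lam = lam
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.lam * grad_output, None

class AgeModelWithNegatedTraits(nn.Module):
    def __init__(self, in_dim=128, hidden=64):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU())
        self.age_head = nn.Linear(hidden, 1)    # main task: age regression
        self.attr_head = nn.Linear(hidden, 2)   # auxiliary: protected attribute

    def forward(self, x, lam=1.0):
        z = self.encoder(x)
        age = self.age_head(z)
        attr = self.attr_head(GradReverse.apply(z, lam))
        return age, attr

# Training step (sketch): minimise age error while the reversed gradient
# discourages the encoder from representing the protected attribute.
model = AgeModelWithNegatedTraits()
x = torch.randn(8, 128)
age_target = torch.randn(8, 1)
attr_target = torch.randint(0, 2, (8,))
age_pred, attr_pred = model(x)
loss = nn.functional.mse_loss(age_pred, age_target) \
     + nn.functional.cross_entropy(attr_pred, attr_target)
loss.backward()
```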

 

Rapid7

STUART MILLAR

 

Two interesting papers that are relevant:

Achieving Fairness through Adversarial Learning: an Application to Recidivism Prediction
Christina Wadsworth, Francesca Vera, Chris Piech

Does Object Recognition Work for Everyone?
Terrance DeVries, Ishan Misra, Changhan Wang, Laurens van der Maaten

7

 

John F. Raffensperger

 

I congratulate the authors on an interesting and important draft. You have outlined some important ideas and called out the long list of biases possible in AI. I hope this work expands and strengthens.
I am an operations researcher with experience in public policy, including problems of the commons. I know much about AI but am not a recognized expert in it. Further, I have some ignorance about your role and responsibilities. Feel free to ignore (or forward to some more appropriate party) any of my comments here, which are my own.
Apologies for this format. I discovered your comments template after I had nearly finished writing!

   

John F. Raffensperger

1 Private versus public risk.

Your stated main audience is researchers and practitioners. These two groups probably don’t belong together in this way. Researchers don’t injure others very much through their work and they usually have little incentive to allow bias in their code; they even have incentive to call it out. But practitioners can injure a great many people through the enormous scale of their deployments, and practitioners have profit-seeking incentives to allow bias in code.
“Practitioners will benefit by gaining an understanding about bias in the use of AI systems.” I believe practitioners already have a very good understanding of the bias, as observed by the energetic discussion in the news media and scientific literature! But practitioners are not well motivated to eliminate bias. So I am skeptical (even cynical) of the benefit in explaining AI bias to practitioners.
“Risk-based framework…” Your paper describes the risk of bias in AI. Please expand on the distinct risks to each stakeholder group. (As I will explain below, your paper should name the various stakeholders clearly and consistently.) Most importantly, please distinguish private business risk from public risk.
Private firms engage in profit-seeking behavior that injures consumers. Thus, bias in AI is a problem of the commons, analogous to a polluting firm or a restaurant selling bad meat. In doing so, the firm raises a small risk to its own reputation while enjoying higher profits. But the injury to consumers can be large. Further, this injury is insidious, hidden, and hard to measure, like a toxin released to groundwater or like late-night stomach upset. Your paper mentions risks to firms and risks to users ("consumers" here), but the difference between them is much more important than your paper conveys.
Specifically, how the presence of bias in automated systems can contribute to harmful outcomes and a public lack of trust.
…How bias and trust interrelate is a key societal question…
The regulator need not care whether the public trusts an AI run by a business; the business should care because the business wants the consumers’ trust. The regulator should care only whether the automated business system harms the public; interaction of trust and bias is irrelevant to the regulator. In fact, your efforts to improve trust in AI could backfire by getting consumers to believe the AI is fine when in fact the AI is bad. Rather than encouraging the consumer to accept the validity of AI uncritically (“building trust”), perhaps the regulator’s job should include teaching the consumer to be skeptical of AI, to help them question the business about how their software works.
“Bias reduction techniques are needed that are flexible…” Again, the private company is responsible for finding their own bias reduction techniques. You are kind to try to give them techniques, but helping developers write software is not the regulator’s main job. The regulator should penalize bias. Knowles and Richards (your reference) have it exactly right, but your paper doesn’t follow through on their point. You need not suggest “resilience,” but rather please talk about regulating.
Here’s an interesting edge case: “interviews with admissions officials suggest that ‘they didn’t believe in the validity of the risk scores…’” So the school knew their AI was biased and should have fixed it. If they fixed it quickly, the regulator probably could stand idle. If they didn’t fix it, the regulator should hammer them for knowingly making biased admissions decisions.
“Contextual gaps lead to performance gaps… intention…” The regulator should not have to worry about intention until the court proceedings; the regulator should strenuously worry about performance.
“One challenge rests on the reality that decisions about which data to use for these indices are often made based on what is available or accessible,…” Again, you are generous to help the developers think about their data inputs. This is like telling Big Boy’s Burgers to monitor their refrigerator temperature. It’s helpful, but not the regulator’s main concern. More on Big Boy’s shortly.
You list core building blocks of accuracy, explainability and interpretability, privacy, reliability, robustness, safety, and security (resilience), and mitigation of harmful bias. But the costs in these blocks to business differ from their costs to the public.
• Accuracy, reliability, and robustness can cost the company and the public. The business will have incentive to improve these. That is, they will make sure the system works the way they want it to. These blocks could benefit or injure the public, but they are essentially a private problem.
• Explainability and interpretability cost the company but benefit the public. A business may have huge incentive to avoid explaining their system to the public, and even their managers and board members! The regulator could choose to require a firm to explain and interpret their AI to the firm’s customers as some kind of formal disclosure. I expect the firms would find such disclosure to be onerous, fraught with complications, and expensive. Can you write an example disclosure, say, from a hiring firm to its applicants? How could the regulator assess these disclosures?
• Privacy protects mainly the public. It is a high cost to the business and not in their interests. Easily avoided with a long EULA. The firm’s loss of privacy can benefit the public!
• Whose security and safety do you mean? These sound positive but seem vague. “Security” sounds like mainly a private benefit while “safety” sounds like a public benefit.
These "blocks" look like a great list, but they become scattershot once their costs and benefits are labeled as private or public.
“The intention is to leverage key locations within stages of the AI lifecycle for optimally identifying and managing bias.” Your AI lifecycle omits the most important part: when the AI is deployed. And this is why your paper doesn’t seem to really dig into the regulatory problem.
I have in mind a simple Big Boy’s Bad Burgers model. When a restaurant wants to save some money, the manager can be tempted to sell meat they know is bad. If the city health inspectors discover this behavior, the inspectors can warn, fine, or shut down the store. That is regulation. Please work through some simple scenarios of bias. What should the regulator do on finding bias in mortgage lending or job hiring?
Your paper recognizes bad beef in only a couple of places — "In extreme cases, with tools or apps that are fraudulent, pseudoscientific, prey on the user, or generally exaggerate claims, the goal should not be to ensure tools are bias-free, but to reject the development outright in order to prevent disappointment or harm to the user as well as to the reputation of the provider." These folk won't reject the development! They want to prey on the user. For firms who give a bit more than lip service to avoiding fraud, "[e]xpertise matters…", but it's also expensive when the developers are on a short deadline.
“Improving pre-design practices to ensure more inclusive representation” is akin to telling restaurants to keep their refrigerators sufficiently chilled, to train managers to identify bad meat, and to discard meat they believe to be bad. All good ideas and helpful in mitigating bias. But what should the regulator do when the restaurant keeps selling bad burgers? Your paper doesn’t address this at all.

   

John F. Raffensperger

2 Please declare the actors. 

I began my review as though I were refereeing a journal manuscript. In the first read, I simply look for clarity, improvable sentences, etc. In your generally well-written paper, I found some passive voice. At first, the passive voice seemed like issues of minor copy-editing. But as I went on, it became worrisome as I will explain. Bear with me.
The writing tics “There is” and “There are” get tiresome. Search & destroy as follows:
• “There are specific conditional traits associated with automation that exacerbate distrust in AI tools” would be better as “Specific conditional traits associated with automation exacerbate distrust in AI tools.”
• “There are many challenges that come with this common practice” would be better as “Many challenges come with this common practice”.
• “There are also examples from the literature which describe technology that is based on ….” would be better as “Examples from the literature describe technology that is based on …”.
• “…there is increasing evidence that…” would be better as “…evidence is increasing that…”.
Passivizing perfectly good verbs:
• “bias is often connected” would be better as “bias often connects”.
• “… what is required for building systems…” would be better as “what building systems require” or “what system builders require” or “what developers require”.
• “Even if datasets are reflective of the real world…” would be better as “Even if datasets reflect the real world…”.
• “Improving trust in AI systems can be advanced…” would be better as “Improving trust in AI systems can advance…”.
Backwards passive sentences:
• “…many people are affected or used as inputs by AI technologies and systems” would be better as “… AI technologies and systems affect or use as inputs many people”, which now reveals the silliness of the sentence.
• “Historical, training data, and measurement biases are ‘baked-in’ to the data [sic]…” would be better as “… the data ‘bakes in’ historical training data and measurement biases…”, revealing the silly “data bakes in data”.
• “Data representing certain societal groups may be excluded in the training datasets used by machine learning applications” would be better as “Machine learning applications may use training datasets which exclude data representing certain societal groups”.
• “Indeed, there are many instances in which the deployment of AI technologies have been accompanied by concerns of whether and how societal biases are being perpetuated or amplified” would be better as “Indeed, concerns of whether and how societal biases are being perpetuated or amplified have accompanied many deployments of AI technologies”. Better yet: “Indeed, concerns of whether and how AI technologies perpetuate or amplify societal biases have often accompanied deployments”.
Omitting an important actor. Like the suggestions above, this suggestion may seem mere copy-editing, yet it takes us down a rabbit hole to major policy implications: please distinguish the various actors explicitly, e.g., owners, users, programmers, and the general public. Doing so may lend an accusatory tone in some places, but at least get each sentence to a point of clarity before toning it down. Below, I address a few phrases with passive voice and then I discuss some implications of choosing the actor.
• “… that harmful biases are mitigated.” Perhaps this example of passive voice is most important. Please state who is responsible for mitigating bias. Surely it’s not the job applicant, the Uber rider, nor the general public. Name all the actors and activate them consistently in now-passive sentences. Search for and clarify “user” everywhere, because your paper in different places uses that word to mean different folk. Here is a list of possible actors.
The Uber driver is not a “user” of the AI; Uber management is the user. Maybe you could call people like the Uber rider “consumers” in the market economy sense.
Some consumers are actually victims, not merely disadvantaged people. "Certain social groups" – many of us – have little ability to fight a large corporation imposing its EULAs and AIs on us. "This kind of systemic discriminatory pricing is perpetuated on the citizens…."? The regulator should be busting heads! We need the regulator to shut down Big Boy, not simply suggest a "3 stage approach" to the chefs.
As your paper implies, developers have responsibility in this problem. You are talking to and about developers, but you do not emphasize their heavy responsibility. For example, NIST could certify AI programmers just as government agencies certify airline pilots and civil engineers. But this makes sense only if you identify them as having responsibility, like chefs who might cook bad meat. (Are programmers the same as developers? I think of a “programmer” as the person with hands on keyboard, while “developer” could refer to a company.)
Business executives, who manage AI implementation and control their features, should ultimately bear great responsibility. They can tolerate or perpetrate bias in AI for profit; they are often perpetrators.
“In such cases the technology can be taken out of production.” I’m sure you mean “In such cases, the business managers should take the technology out of production,” just like saying “the restaurant should stop selling bad meat.” Put this way, it’s simply naive, even with the rest of that paragraph.
You don’t discuss liability, but liability is a great motivator to executives. Liability enters the discussion as soon as we mention the business actor explicitly. Is liability sufficient to regulate bias in AI? That may be the laissez faire approach. I beg you to propose more energetic regulation! I thought about liability only as I thought about this actor, because I tried to re-word a sentence in active voice. Switching to active voice has policy implications.
What role has government? Surely regulators have a role! “Federal laws and regulations have been established to prohibit discrimination based on grounds such as gender, age, and religion.” Yet your paper omits “regulator” as well. Who regulates bias in AI? Who should regulate it? Should the regulator be NIST, another federal agency, a state agency, perhaps county or city governments?
What if the AI results in business action which violates the law? I hope someone would warn, fine, or shut down the business. Your paper has none of this.
“Federal laws and regulations have been established to prohibit discrimination based on grounds such as gender, age, and religion.” You might reword as “Federal law prohibits discrimination based on gender, age, and religion.” Why regulate bias? Because it breaks the law! Doesn’t matter whether they used “AI”.
Explain this to the developers in the introduction: “If your AI is biased, you violate Federal law. That’s a felony. We’re trying to help you stay out of jail.” They may take your “3 stage approach” more seriously. Bring the regulator and the law into your paper.
• Your paper mentions a few examples of government AI deployments. Please separate the discussion of private AI from government AI. These are different problems with different actors, costs, and risks. (Also, “expansion of AI into many aspects of public life” should probably be “expansion of AI into many aspects of commerce” or “expansion of AI into many functions of society”, because “public life” could mean only government.)
In contrast to for-profit AI, public trust matters for public AI. Even if the crime sentence calculator is unbiased and correct, our democracy requires public trust in that calculator.
Society must address bias in government AI with different tools than we use to address bias in private AI systems. I expect the tools will be court proceedings, inspectors general, and whistle-blowing researchers. How can society address intentional bias in government-operated AI systems? Please consider a full workup on the problem, a separate major section at a minimum. How can we regulate military AI?
• “Another cause for distrust may be due to an entire class of untested and/or unreliable algorithms deployed in decision-based settings. Often a technology is not tested – or not tested extensively – before deployment, and instead deployment may be used as testing for the technology.”
I’ll try to add actors: “Consumer distrust may rise from an entire class of untested and/or unreliable algorithms which businesses deploy in decision-based settings. Often the programmer did not test a technology – or did not test it extensively – before management deployed it, and instead management used deployment as testing for the technology.” This new wording raises questions of responsibility which you may want to avoid but should face.
Throughout this note, I’ve made a case for a stronger discussion about regulation, but I don’t know your role or responsibilities. If you’re going to ignore regulation, your paper should explain why you’re talking about regulation but ignoring regulation. You might change the title to “A Proposal for Developers to Identify and Manage Bias within Artificial Intelligence”. But developers share responsibility for AI bias with their handlers, bosses, and investors, so restricting your Proposal to developers won’t be the last word on this problem. At some point, perhaps in a year or so, you’ll have to address all the actors.
• “As these tools proliferate across our social systems, there has been increased interest in identifying and mitigating their harmful impacts.” The first clause has active voice, but makes AI seem to grow autonomously. Not yet, Skynet! The second clause is passive; can you document or reference this interest? Perhaps “As businesses and government agencies have deployed these tools across our social systems, the media [refs] and lawmakers [refs] have increased interest in identifying and mitigating the harmful impacts of these tools.”
• “Improving trust in AI systems can be advanced by putting mechanisms in place to reduce harmful bias in both deployed systems and in-production technology.” Who is in charge of this!?
Here is a possible rewording: “Businesses can advance consumers’ trust in AI systems by installing mechanisms to reduce harmful bias in both deployed systems and in-production technology.”
But you could instead mean “Developers can advance regulators’ trust by installing mechanisms to reduce harmful bias in their in-production technology and deployed systems.” This version has a stronger bite with this selection of actors, but this version may imply regulatory policy that you haven’t thought through or that you wish to avoid.
• “PRE-DESIGN: where the technology is devised, defined and elaborated DESIGN AND DEVELOPMENT: where the technology is constructed DEPLOYMENT: where technology is used by, or applied to, various individuals or groups.” Who does all this? Only developers?
Figure 1 vaguely mentions “stakeholders”. The section “Practices” mentions expert stakeholders. “DESIGN AND DEVELOPMENT STAGE” mentions software designers, engineers, and data scientists. “DEPLOYMENT STAGE” mentions end users, operators, subject matter experts, humans-in-the-loop, and decision-makers. Where do investors and Congress fit in?
Build a complete list of actors and include them all in your Proposal, even if your audience is only developers. Developers and their businesses have incentive to bia$ their decisions; they will be the perpetrators. Don’t be shy to say so.

   

John F. Raffensperger

3.1 Computational law and constrained AI

Consider the paperclip-making AI that took over the world. It had a fine objective function – to make paperclips. But it had too few constraints in its optimization. Most importantly, it had no constraint to follow the law.
Society has to require that AIs follow the law, just as society requires its members to follow the law. Computational law will prevent world domination by paperclip AIs and Skynets. We should start our implementation of computational law ASAP! Teslas should stop at stop signs. Military drones shouldn't shoot civilians. And banking AIs shouldn't discriminate on the basis of gender, age, or religion.
Once we think about bias in AI in terms of the law, as constraining the AI to follow the law, a lot of the process falls naturally into place.
(Technical note: as an operations researcher, I view AI as heuristic optimization over huge datasets. Among other shortcomings, modern AI as I understand it does not include the rich constraints of classical linear integer programming which can calculate propositional logic. I expect the need for full binary propositional logic in AI to implement computational law.)

   

John F. Raffensperger

3.2 Certify developers

Regulation of AI could eventually include certification of developers, as the FAA certifies pilots and state governments certify civil engineers. The certification could test developers for their knowledge of AI, bias, legal definitions of discrimination, and regulations. The regulator could require developers to sign their names on software specs to certify that the spec follows the law, just as a professional engineer must sign the design for a bridge. The developer could lose their coding license if the regulator finds bias in the software. This would make the developer directly accountable for bias. Businesses could then announce “Our developers are NIST-certified for bias.”

   

John F. Raffensperger

3.3 Certify code 

Similarly, the regulator could certify some AIs as the FAA certifies airplanes and as inspectors certify finished bridges. Because AI is deployed so widely, the regulator could require only selected “Bad Burger” businesses submit to certification should they fail to respond to warnings.
To get code certified, the business would submit their software to the regulator’s testing. The regulator could test an AI and inform the company of the results, or the regulator could provide test cases to the company and the company could return authenticated results.
The regulator could obtain test cases in several ways. The regulator could write tests. The regulator could require businesses to submit their tests. And the regulator could hire researchers to create the tests.
The testing could be voluntary, at least at the start for many applications. For some deployments, e.g., job hiring at previously naughty firms, the regulator could require validation.
Companies could then announce their success in passing these tests: “Our credit authorization software is certified free of bias.”
To implement this testing regime, your “blocks” need simplification.
(1) A business should be ready to show the specifications of their software.
(2) Those specifications must show respect for the law.
(3) The business must prepare for the regulator’s testing their code.
(4) The regulator’s tests must be able to check authoritatively whether their code follows the law.
This specification-and-test approach fits naturally in developer workflows. Developers just need the test cases from the regulator and an authentication mechanism to return the results. Further, this approach works for all code, not just “AI”, whatever that is. And then business could announce “Our hiring software is NIST certified for bias.”
The regulator could invite firms to submit edge cases automatically to the regulator’s databases for discussion and review (e.g., the Tesla may creep through the stop sign to allow passage of an ambulance). With this submission process, the business is transparent with the regulator about a case which the regulations address only vaguely.
A business’ willingness to submit these edge cases would play a role in court cases. For example, if the regulator finds bias in a company’s code, and the company had never submitted an edge case, the regulator could argue that the company was doing business in bad faith. If the company had submitted a great many edge cases, the company may be able to argue that the regulator’s tests were incomplete.

   

John F. Raffensperger

3.4 This testing mechanism enables the regulator to measure and manage consumer risk.

Using the test mechanism, the regulator can manage risk explicitly, e.g., setting probability limits on bias and distinguishing risk to consumers from risk to business. Using a statistical approach, the regulator’s test cases could assess the probability of benefit to the public versus probability of harm to the public.
The regulator should carefully distinguish Type 1 error from Type 2 error, because they have asymmetric costs. For example, if a business loans too much money (false positive), the consumer benefits while the business loses. The regulator need do nothing in this case. If a business loans too little money (false negative), the consumer probably loses more, especially with bias. Require the probability of Type 2 error (in this example) to be lower than the probability of Type 1. That is, the regulator should ignore Type 1 error and put all effort into preventing the Type 2 biased error.
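
To make that suggestion concrete, here is a minimal sketch of how a test harness might compute per-group Type 1 and Type 2 error rates and apply asymmetric limits. The thresholds and group labels are placeholders for illustration, not proposed regulatory values.

```python
import numpy as np

def error_rates_by_group(y_true, y_pred, groups):
    """Per-group false positive rate (Type 1) and false negative rate (Type 2)."""
    rates = {}
    for g in np.unique(groups):
        m = groups == g
        yt, yp = y_true[m], y_pred[m]
        fpr = float(np.mean(yp[yt == 0] == 1)) if np.any(yt == 0) else float("nan")
        fnr = float(np.mean(yp[yt == 1] == 0)) if np.any(yt == 1) else float("nan")
        rates[g] = {"type1_fpr": fpr, "type2_fnr": fnr}
    return rates

def check_asymmetric_limits(rates, max_type2=0.05, max_type1=0.20):
    """Illustrative rule: tolerate more Type 1 error than Type 2 error, since in
    the lending example false negatives hurt consumers more."""
    return {g: r["type2_fnr"] <= max_type2 and r["type1_fpr"] <= max_type1
            for g, r in rates.items()}

if __name__ == "__main__":
    rng = np.random.default_rng(2)
    y_true = rng.integers(0, 2, 1000)
    y_pred = np.where(rng.random(1000) < 0.9, y_true, 1 - y_true)  # ~10% errors
    groups = rng.choice(["A", "B"], 1000)
    rates = error_rates_by_group(y_true, y_pred, groups)
    print(rates)
    print(check_asymmetric_limits(rates))
```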

   

John F. Raffensperger

3.5 The regulator must choose their battles. 

Because “AI” is vague and widely deployed, the regulator must choose business sectors (e.g., mortgage lending) to regulate in this way.
The regulator will need a process to call out specific businesses for scrutiny, the way the Justice Department calls out jurisdictions for scrutiny of voting procedures.
"This proliferation of AI bias into an ever-increasing list of settings makes it especially difficult to develop overarching guidance or mitigation techniques." Given the regulator's budget, the regulator has to choose their battles. Maybe the regulator could develop an AI to figure out which business sectors and applications will do the greatest public damage. The risk assessment described above improves if the regulator can use explicit costs to the consumer. The regulator could prioritize prosecution of bias with high expected cost to consumers over bias with low expected cost to consumers.
But “settings already known to be discriminatory” are easy to find. Follow the money.

   

John F. Raffensperger

4 Appendix A is excellent, a great contribution. 

Appendix A is a great contribution to the discussion. I urge you to tend it and build on it, so it can help set standards.
For each type of bias, is it likely to cost mostly business or mostly consumers? Which types cost consumers the most? How would you test a developer to be able to certify their knowledge of that type of bias? How would you test their code for that type of bias? I imagine a database of tests for each type of bias. How would a developer identify an edge case worthy of discussion?

   

John F. Raffensperger

5 Conclusion 

How would the regulator identify bias? How would the regulator assign responsibility for bias? What consequences should the regulator impose on the developer and the business for bias? How would the regulator decide the degree of penalty for a given instance of bias? I hope you would spend some time thinking about this problem.
“Participants also referred to the long-term nature of this challenge.” Yes. We’re in the first generation. Plan for the third generation. What should the regulator do with fabulous bias-detection software and an enormous database of bias-checking tests?
Congratulations again on an important paper! Thanks very much for attending to this important problem of AI bias. I hope you find some use in my comments here. Given my effort in writing these comments, I would be grateful for an acknowledgement from one of the authors. I submit this respectfully.

8

American Statistical Association

Steve Pierson, Ph.D.

 

I write to pass along the recommendation of a member of the American Statistical Association. He recommends you reach out to the professional associations with experience in psychometrics and even the broader testing and measurement expertise community. These associations would include the following:
Association for Psychological Science (APS)
American Educational Research Association (AERA)
National Council on Measurement in Education (NCME)
University of Maryland Education and Measurement
His recommendation is based on the fact that, because AI measures and evaluates a variety of people on a large scale, one needs to establish reliability and validity and minimize adverse impacts against different demographic groups. Psychometricians do this by testing measurement invariance, i.e., evaluating the equality of item and test parameters via structural equation modeling / confirmatory factor analysis techniques and differential item functioning parameters. Older techniques include those used by the Guilty Knowledge Test.
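
For readers less familiar with these psychometric techniques, a minimal sketch of one of them, a logistic-regression differential item functioning check, appears below. The simulated data, column names, and effect sizes are purely illustrative assumptions, and statsmodels is assumed to be available; this is only one of the techniques the commenter lists.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Simulated item-response data: 'correct' is one item's score, 'total' is the
# respondent's overall score, 'group' is a demographic indicator.
rng = np.random.default_rng(3)
n = 2000
total = rng.normal(0, 1, n)
group = rng.integers(0, 2, n)
# Inject uniform DIF for illustration: the item is harder for group 1.
logit_p = 0.8 * total - 0.6 * group
correct = (rng.random(n) < 1 / (1 + np.exp(-logit_p))).astype(int)
df = pd.DataFrame({"correct": correct, "total": total, "group": group})

# Logistic-regression DIF: a significant 'group' term (uniform DIF) or
# 'total:group' term (non-uniform DIF) flags the item for expert review.
model = smf.logit("correct ~ total + group + total:group", data=df).fit(disp=0)
print(model.summary())
```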

   

Mark Y. Czarnolewski, Ph.D., LLC

 

The American Statistical Association suggested that I contact you to add the American Psychological Association, particularly Division 5, to the list of organizations that have psychometric expertise.
Thank you for considering this list to contact in your important efforts to calibrate the use of AI for identification.

9

Salinger Privacy

Anna Johnston

 

 Dear NIST,

I wish to offer some brief feedback on your recently released first draft document of "A Proposal for Identifying and Managing Bias in Artificial Intelligence" (Special Publication 1270).
My viewpoint is that of a specialist advisor to clients – both government and private sector organisations – developing or procuring AI systems. I am the founder and Principal of Salinger Privacy, which is a specialist privacy consultancy firm based in Sydney and Melbourne, Australia.
In order to assist our client base, which is primarily the privacy officers within an organisation, or their legal / risk / compliance advisors, we recently published our own guide, “Algorithms, AI, And Automated Decisions - A guide for privacy professionals”.

Interestingly, your paper has framed the issues in quite a similar ‘lifecycle’ way to where we landed. In our guide we propose a “4 D’s framework”, in which both legal and ethical issues must be examined across four stages: Design, Data, Development and Deployment.
I support the need for a framework for trustworthy and responsible AI that is flexible enough to apply across different sectors and applications.

My feedback on your proposal is that I do not believe that the issue of managing bias can or should be divorced from other legal and ethical issues in AI and other algorithmic systems. In other words, I believe a holistic approach is needed. Further, such a framework must include guidance for non-technical advisors (such as those who will need to conduct Algorithmic Impact Assessments), as well as technical standards for system developers.

In my view, for an assessment of an algorithmic system to be robust, it should encompass:
Legal compliance – ensure the algorithmic system is lawful, with particular focus on privacy, anti-discrimination, and consumer protection laws;
Social impacts – consider the social, political, and economic context for a deeper appreciation of potential privacy-related harms; and
Technical considerations – integrate testing for accuracy, performance, fairness and bias.
I have attached a copy of our guide as an example of what we have developed as guidance for a non-technical audience (specifically, privacy officers). In particular, in addition to identifying legal and ethical risks and challenges, we have attempted to articulate what 'good' could look like when developing or procuring AI systems. Please note that the attached guide is a commercial publication, and as such is not for further distribution or publication without permission. However, the text of this submission is not confidential.

Please do not hesitate to contact me if you have any questions arising.

10

OneSpan, Inc.

Michael Magrath

199

The stability of the system over a long period of time should also be considered an important characteristic of a trusted AI system. High testing accuracy is not enough to ensure system performance over time, because assumptions made at the time of training/creation may no longer hold after the system is deployed.

 

OneSpan, Inc.

Michael Magrath

213

The ISO definition, taken out of its context, is not helpful: the term “truth” is highly loaded philosophically and would need an explanation in the context of this document. Will data scientists interpret “truth” the same way as regulators? Besides, we were unable to find this definition in the cited ISO standard. Instead, we found “Bias: expectation of error of estimation” which is much less ambiguous but is probably too technical for the current document.   https://www.iso.org/obp/ui/#iso:std:iso:3534:-1:ed-2:v2:en:term:1.33 

 

OneSpan, Inc.

Michael Magrath

219-231

It would be beneficial if NIST were to include additional information on what is considered harmful bias. Is there a categorization distinguishing non-negative from harmful biases, or are they all considered equally harmful in the context of this report (link to terminology in the last section)?

 

OneSpan, Inc.

Michael Magrath

267

The specific conditional traits that exacerbate distrust in AI could be analysed in more detail. It would be beneficial if NIST provided some examples.

 

OneSpan, Inc.

Michael Magrath

389

The quoted formulation is too generic. Can "expert auditors" really provide the sufficient "sanction of authority" that "AI can be trusted," or only that a given AI system, or class of AI systems for some given applications, can be trusted?

 

OneSpan, Inc.

Michael Magrath

445

In several cases, the practitioners who need to use the AI tool are also often those whose jobs might be at risk because of the AI tool itself. Their incentive is therefore often lower. This makes the point of lines 441-442 even stronger.

 

OneSpan, Inc.

Michael Magrath

465

Be a bit more specific about rejecting flawed designs. How is such rejection process envisioned and who would be responsible for it (i.e., developers of AI, providers of AI, expert auditors, governing bodies)? 

 

OneSpan, Inc.

Michael Magrath

490

As written, “This is also a place where innovation in approaching bias can significantly contribute to positive outcomes.”. We suggest replacing the word “can” with “could”. 

 

OneSpan, Inc.

Michael Magrath

513

Subject matter experts are also crucial at that stage because of their deep knowledge of the domain in which data is collected. For instance, it is hard to believe that a medical AI tool could be designed and developed solely by "designers, engineers, and data scientists" without an ongoing close collaboration with medical doctors.

 

OneSpan, Inc.

Michael Magrath

558

Include more information on what is considered a "modest approach" and how it would translate into an AI bias risk context. The cited report [111] does not provide a lot of information on it.

 

OneSpan, Inc.

Michael Magrath

633

The ability of the user to provide feedback on the recommendation/decision of the AI system is another factor that impacts this gap.

 

OneSpan, Inc.

Michael Magrath

703

We found the glossary of bias types and their definitions very useful for setting a common language in this field. However, this glossary may benefit from some grouping of the biases, by similarity or by relationship with the three stages defined in the document.

11

 

Kathy Rondon

 

As a data governance professional focusing on the public sector, I would like to submit comments in response to the request for public comments on SP 1270: Proposal for Identifying and Managing Bias in Artificial Intelligence. I believe this is a critical topic for our society at this juncture in the development of AI and machine learning technology. My comments are from the perspective of a data professional, rather than an AI developer, and are attached to this email, using the suggested template.

   

Kathy Rondon

212

In data ethics terms, in seeking to minimize harmful impacts of bias in AI, the ISO definition that casts bias in terms of "deviation from the truth" may not be optimal. If the truth is actively discriminatory, having AI perpetuate that "truth" may not be an ethical approach. Suggest using an alternate bias definition that reflects the need to identify more than just the data's deviation from the as-is state. Data can be accurate and precise but still not be appropriate for use in a specific or general use case. The appropriateness of data for the specific AI utility should also be considered in any definition of bias.

   

Kathy Rondon

292

Using what data is available, rather than allowing the research design to dictate data collection and use, is indeed a problem. Data science that does not follow rigorous research design is not science--it's alchemy. NIST standards should make it clear that letting available data drive the development and training of AI technology yields products that cannot be deemed reliable under NIST standards. Other scientific standards would not allow this "backing into" results.

   

Kathy Rondon

335

Suggest consideration of another "bullet point" in the list of reasons for distrust: the application of technology or data that were developed or collected for a specific purpose to a different use case (particularly without public transparency regarding this "re-application"), which may decrease the reliability of results. You mention this later in the publication, but emphasizing it with a bullet here would seem to be appropriate.

   

Kathy Rondon

521

Accuracy as the sole or primary characteristic of quality is a concept that has been challenged in academic literature as it applies specifically to data quality. Suggest the incorporation of different quality frameworks, such as the one based upon the work of MIT professors Richard Wang and Diane Strong in their article "Beyond Accuracy: What Data Quality Means to Data Consumers."

   

Kathy Rondon

650

The risk of individuals "offloading" decisions to an automated tool suggests a workforce or general public without sufficient education or training in data and AI literacy to do otherwise. Suggest NIST consider a standard that new AI applications include documentation and "user guides" that specifically address how the AI should be incorporated, used, and communicated to user groups. Also suggest more rigorous data ethics and data quality training for AI developers, whose academic work and training are typically more programming-based and not sufficiently focused on data quality or ethics.

   

Kathy Rondon

707

As NIST works collaboratively to develop guidance on this topic, suggest an effort to recast the AI bias issue entirely to focus less on technology development and more on the identification, preparation, and validation of data used to train AI. It's not that data bias isn't discussed in this publication, but rather that it appears to be subsidiary to the AI technology development cycle, rather than the primary issue. Data--articulating what the right data is, then collecting it and making it transparent to the public--should be the primary focus of standards seeking to minimize the harmful effects of bias in AI.

12

University of South Alabama

Aviv Segev

729

Add a new definition in Table 1: "Bias bias - importance of the bias component of prediction error is inflated, and the variance component of prediction error, which reflects an oversensitivity of a model to different samples from the same population, is neglected." References: Seagroatt V, Stratton I. Bias in meta-analysis detected by a simple, graphical test. Test had 10% false positive rate. BMJ. 1998;316(7129):470-471; Brighton H, Gigerenzer G. The bias bias. Journal of Business Research. 2015;68(8):1772-1784. https://doi.org/10.1016/j.jbusres.2015.01.061

13

Arthur AI

Lizzie Kumar

 

Arthur appreciates the opportunity to provide feedback on NIST’s proposal for identifying and managing bias in artificial intelligence (AI). As a company building bias detection solutions, we commend NIST's efforts to further research on this important issue. We are especially happy to see that the document draws attention to two important conclusions from their literature review. First, technology must be considered within the social system in which it is developed and deployed. Second, and relatedly, bias detection and mitigation efforts need to span the full development cycle of the technology at hand. These ideas lead the authors to point to "contextual gaps" between the initial design stages and the eventual deployment stage as a source of unexpected problems. The authors suggest that this makes it necessary to closely monitor how AI performs in the wild, a problem which Arthur's team has been tackling.

In addition to our more specific feedback (submitted through the provided form), we would like to point out that an AI project lifecycle does not "end" at the deployment stage, as is suggested by the framework presented in this proposal: changes within the deployment setting also need to be monitored. In practice, AI models developed with machine learning are rarely static, and continuous improvement efforts are often ongoing. A model may be manually retrained as more data is collected, or may be updated in real time as it encounters new scenarios, as in the case of reinforcement learning; bias can be introduced through these changes even if it did not exist at the initial point of deployment [1]. The context of the deployment setting itself may also change over time, even if the model remains static, due either to a feedback loop from the model itself [2] or to processes external to it, causing a problem commonly known as data drift. For any of these reasons, at some point during the deployment stage, a developer may need to circle back to the beginning of the design process and craft a whole new solution to the problem at hand. We therefore caution NIST against unintentionally framing deployment as a static stage. Instead, as the document's Figure 1 indicates, the deployment phase can evolve in a way that leads the product back into design or pre-design. If standards are developed with the aim to identify and manage bias in this phase, they should be implemented in a way that is mindful of this dynamic. Sincerely, Arthur AI
1 https://www.theverge.com/2016/3/24/11297050/tay-microsoft-chatbot-racist
2 http://proceedings.mlr.press/v81/ensign18a.html
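For illustration, a minimal sketch of the kind of post-deployment monitoring described above: comparing a recent sample of one input feature against its training-time distribution with a two-sample Kolmogorov-Smirnov test. The data, feature, and 0.05 threshold are hypothetical assumptions, not a prescribed method.

# Minimal sketch: flagging possible data drift in one feature after deployment.
# The reference/production samples and the 0.05 threshold are illustrative.
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(1)
reference = rng.normal(loc=0.0, scale=1.0, size=5000)   # feature at training time
production = rng.normal(loc=0.4, scale=1.0, size=1000)  # feature after a shift in deployment

statistic, p_value = ks_2samp(reference, production)
if p_value < 0.05:
    print(f"Possible drift (KS={statistic:.3f}, p={p_value:.4f}); "
          "consider revisiting the design or retraining the model.")
else:
    print("No significant drift detected in this feature.")

In practice such checks would run per feature (and on model outputs) on a schedule, feeding the loop back into design or pre-design that the comment describes.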

 

Arthur AI

Lizzie Kumar

525

Not sure the ecological fallacy is the best way to describe the way machine learning uses group membership to make inferences about individuals; many such inferences are not fallacies.

 

Arthur AI

Lizzie Kumar

655

Unclear why counterfactual fairness is highlighted here as a natural way to address "contextual gaps." While it is true that counterfactual fairness addresses weaknesses of other fairness metrics, it is difficult to implement in practice and requires very specific models of the world to identify bias.

 

Arthur AI

Lizzie Kumar

673

This figure suggests a project can move from Deployment back to Pre-design. However, we did not see this notion addressed in the text.

14       Comment not available for posting
15 DataRobot Haniyeh Mahmoudian 197-200 (Section 1) The characteristics mentioned in this sentence to cultivate trust in AI do not include data quality or a governance framework, which includes accountability, monitoring, traceability, and auditability. In addition, AI systems should be humble, meaning that they notify the user/stakeholder when they are uncertain. Lastly, compliance with regulation and organizational policies requires the process to be fully documented.
  DataRobot Haniyeh Mahmoudian 262-265 (Section 2) This sentence undermines the uniqueness of addressing AI bias at the use case level. Providing a framework can help with managing and mitigating the bias that may occur in the system, but how fairness should be defined, how bias should be detected, the margin of tolerance for bias, and other considerations should be viewed at the use case level and cannot be addressed by a broad framework.
  DataRobot Haniyeh Mahmoudian 288-299 (Section 2) I agree with the statements and examples mentioned. I think it's worthwhile mentioning that in some cases these systems are designed without the input of subject matter experts, which can lead to inappropriate problem definition and result in unintended discrimination.
  DataRobot Haniyeh Mahmoudian 502-508 (Pre-design stage) I agree with the statements and examples mentioned; however, this assessment should not be limited to the pre-design stage but should continue throughout the whole process.
  DataRobot Haniyeh Mahmoudian 566-568 (Design and Development stage) Target leakage occurs when one feature, or a combination of features, used by the model leaks information about the target. For example, consider a hospital readmission use case. The target is whether a patient is readmitted to the hospital. One of the features in the dataset is discharge type, which can be discharged to home, caregiving facility, nursing home, or expired. The discharge type "expired" already leaks the information that the patient will not be readmitted.
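For illustration, a minimal sketch of how the leakage pattern described above can be surfaced before modeling; the column names and toy data are hypothetical, mirroring the readmission example.

# Minimal sketch: a per-category check that surfaces target leakage from a
# categorical feature. Column names and values are hypothetical.
import pandas as pd

df = pd.DataFrame({
    "discharge_type": ["home", "home", "nursing_home", "nursing_home",
                       "caregiving", "caregiving", "expired", "expired"],
    "readmitted":     [1, 0, 1, 0, 1, 0, 0, 0],
})

rates = df.groupby("discharge_type")["readmitted"].agg(["mean", "count"])
print(rates)
# A category whose label rate is exactly 0 or 1 (here, "expired") deterministically
# reveals the target and is a leakage candidate; small counts warrant a closer look.
print(rates[(rates["mean"] == 0.0) | (rates["mean"] == 1.0)])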
  DataRobot Haniyeh Mahmoudian 410-412 (Section 4) At DataRobot we use a framework with three components: people, process, and technology. People reflects the stakeholders, what trust means for them, and how they interact with the system or might be impacted by it. Process focuses on planning based on the impact and risk assessments conducted during the ideation, design, and evaluation of the system; these assessments should include management and mitigation protocols for various situations to address concerns such as bias. Lastly, technology includes all the technical aspects of building an AI system: data quality and integrity, model performance, system evaluation, and thorough testing of the system. Additionally, the technology should be designed with a governance framework in place and include full documentation for auditability, traceability, reproducibility, repeatability, and accountability.
16 ACCESS of WNY
American University SOC
Arab American Association of New York
Aspiration
Citizens Privacy Coalition of Santa Clara County
Council on American-Islamic Relations, New York (CAIR-NY)
Defending Rights & Dissent
Electronic Privacy Information Center (EPIC)
Emonyo Yefwe
Ethics In Technology
European Center for Not-for-Profit Law
Fight for the Future
Hamai Consulting
Occupy Bergen County (New Jersey)
Privacy Watch STL
Restore The Fourth
RootsAction.org
S.T.O.P. – Surveillance Technology Oversight Project
The Legal Aid Society of NYC
X-Lab
    Re: Comment of 20 Civil Rights and Community-Based Organizations In Response To First Public Draft of NIST SP 1270 "A Proposal for Identifying and Managing Bias in Artificial Intelligence".

We, the undersigned civil rights and community-based organizations, write to express our serious concerns with NIST’s draft proposal for identifying and managing bias in Artificial Intelligence (“AI”). Contrary to NIST’s proposal, mere technical safeguards in development and deployment of AI are incapable of fully mitigating the technology’s risk of bias. NIST’s suggestion – focus on fixing the algorithm – is unhelpful and dangerously idealistic.

We raise concerns regarding the proposal’s focus on narrow technical definitions of bias. Technical drivers of AI bias, such as design failure and poor training data selection, only account for a small fraction of the harms these systems create. While the report includes a long list of biases in its appendix and acknowledges external influences and contextual issues, the framework does not solve for the systemic and institutional biases and dynamics of power that compound the technical bias of AI systems. Additionally, such a framework ignores how probabilistic risk scores deprive individuals of the opportunity to be evaluated on their own merits and the tremendous power of law enforcement.

This risk is most acute for law enforcement AI, including forecasting and surveillance tools. Even on the rare occasions when such AI systems are technically unbiased, their broader community impact is deeply distorted by the over-deployment of such systems in low-income and BIPOC communities. Such inequity is further compounded by officer discretion in how to respond to system outputs, as well as the endemic discrimination that impacts every decision point for those who are arrested as a result. Even worse, the constant stream of revelations about police bias is a reminder that no AI system can remain unbiased in the hands of a biased user.

NIST fails to account for how AI systems are routinely misused by public and private sector entities. For example, in many cities that use facial recognition, officers routinely edit images prior to running a search. Such alterations add multiple layers of bias that would never be identified during NIST’s pre-design, design, and deployment review. Using “unbiased” algorithms only addresses a minute aspect of AI bias.

Even if every technical driver of bias in such facial recognition software is addressed, doing so does not address the bias in the decision-making process that determines when to use the algorithm, how to interpret its results, and its broader social impact.

Turning to NIST’s evaluation of data set integrity, the proposal is once again far too narrow. In many of the most sensitive areas of AI deployment, the underlying data can never be effectively sampled because the data itself is the distorted product of biased human decision making. A powerful example of this is crime rate data, which is frequently incorporated into predictive AI systems.

If we simply focus on the algorithm, as NIST suggests, we see sampling bias: overpoliced communities are overrepresented in algorithms’ training data, and disproportionately targeted by algorithms. The technical solution to sampling bias is to seek out a more balanced training dataset. But the truth is that there simply is no unbiased dataset for policing in America, and sampling techniques will only automate the human-made inequality that has defined policing in America since its inception.

NIST’s laboratory evaluations of AI efficacy are often received by the public without the understanding that the reliability of such systems was demonstrated under testing conditions and not real-world conditions. In fact, there is no available data on how AI systems perform in the hands of law enforcement users in real life settings. Contrary to the proposal, “deployment monitoring and auditing” is woefully inadequate to address such a performance gap. This is particularly true as there are few standards on how users can deploy such technologies, leading to near-infinite permutations of AI
deployment that would need to be compared to lab benchmarks. This type of data needs to be evaluated before the technology is tested on vulnerable communities when its results can impact life and liberty.

Lastly, NIST fails to answer the most important question of all: Should we build this technology in the first place? This is a threshold inquiry that must be satisfied in the pre-design phase before moving on to questions about how to measure and optimize such technology. The report does not give sufficient weight to the importance of the pre-design phase and should more forcefully emphasize the need to stop development of harmful technologies or technologies whose just and equitable use cannot be assured at that phase.

In other areas of scientific development, we’ve long recognized that some advances simply come at too high a price. Consider the breakthroughs we could have made in chemical and biological warfare over the past 50 years if permitted. Such agents would be potent weapons in our military arsenal, but they would pose an intolerable risk to all of humanity. Many of the AI systems under development today also impose an intolerable cost, and not merely in the
“wrong hands,” but in any hands.

Technical fixes remain important, but for the foregoing reasons, they are insufficient, as framed in this proposal, for assuring the just and equitable development of AI technologies.


Signed,
ACCESS of WNY
American University SOC
Arab American Association of New York
Aspiration
Citizens Privacy Coalition of Santa Clara County
Council on American-Islamic Relations, New York (CAIR-NY)
Defending Rights & Dissent
Electronic Privacy Information Center (EPIC)
Emonyo Yefwe
Ethics In Technology
European Center for Not-for-Profit Law
Fight for the Future
Hamai Consulting
Occupy Bergen County (New Jersey)
Privacy Watch STL
Restore The Fourth
RootsAction.org
S.T.O.P. – Surveillance Technology Oversight Project
The Legal Aid Society of NYC
X-Lab
17 International Society for Pharmacoepidemiology   1 In view of the document draft, there are a few points we believe are relevant to consider:
1) Achieve comprehensiveness of results: the results generated by the AI system should be comprehensive for the corresponding sector in order to prevent bias.
2) The deployment phase should also include reporting details of the design and development phase, ideally including the iterative process used to develop the algorithm.
3) Develop best practices in pre-designing, designing, developing, and deploying AI systems.
4) Increase transparency and replicability.
5) If possible, involve multiple data sources throughout the process, especially in the validation process.
  International Society for Pharmacoepidemiology   4 From the title, the expectations for this proposal were that it would provide a solution, first, to identify biases in AI and, second, to manage biases in AI. However, we struggled to identify constructive and scientific recommendations on how to identify and manage bias across the different types of biases mentioned in the document. Instead, the document includes extensive, though somewhat unstructured, descriptions of different types of bias with some examples. It would be very helpful to readers if the document could include some sensible approaches to both identifying and managing bias. Otherwise, it may be sensible to consider changing the title of the document.
  International Society for Pharmacoepidemiology   101 The abstract sets the tone for this proposal, where the focus is directed towards establishing the credibility of AI without mention of its actual validity for use. Biases not only affect AI trustworthiness but, more importantly, misinform stakeholders, with potentially serious consequences for users and the broader public. Perhaps the authors wish to convey that, when properly identified and managed, biases will have diminished effects on misinforming the audience, and output can be interpreted sensibly and productively. This link between AI biases and their effects on actual AI validity, and consequently on AI trustworthiness, is currently missing. Overall, NIST should be commended for providing recommendations to significantly advance the science of bias identification and minimization, particularly the principles, practices, and implementations. This document should add a statement that evaluation frameworks should be used to guide the design of bias identification and minimization approaches and cite sources where different frameworks can be found.
  International Society for Pharmacoepidemiology   207 Starting from the fourth paragraph, there are instances where the concepts of bias and trustworthiness are mixed together, e.g. line 209: “Managing bias is a critical but still insufficiently developed building block of trustworthiness.” The authors are advised to rethink the differences between the challenges brought by distrust in AI and those brought by biases in AI, as they are very different matters. Distrust in AI when existing biases are not addressed nor declared should not be discouraged, as it reflects the critical thinking of its audience. This is exactly what is needed when using AI: human intelligence and leadership to ensure the output the AI system provides is informative and ethical (as also stated in Section 3 – Approach, where the authors quote Knowles and Richards). AI bias per se does not directly cause public distrust; invalid output and harmful consequences do.
  International Society for Pharmacoepidemiology   213 “This deviation from the truth can be either positive or negative, it can contribute to harmful or discriminatory outcomes or it can even be beneficial.” – the first and second halves of the sentence refer to the same idea. Moreover, it is not immediately clear how biases can be beneficial. Examples are needed of how (1) biases in AI could be beneficial and (2) AI will be a valid and useful tool when biases are identified and managed. Alternatively, leave out sentences that refer to "beneficial bias".
  International Society for Pharmacoepidemiology   219 "Not all types of bias are negative" may need some elaboration, and it seems this statement is only true in some sectors. For example, when building an AI prediction model for health care, efforts should be made to minimize all types of bias.
  International Society for Pharmacoepidemiology   231 Perhaps a succinct sentence describing this sequential relationship between the inherent introduction of biases, the mitigation of harms from AI bias, and the productive use of AI is needed to conclude the introduction section: “By recognizing the different types of biases and mitigating the harms from AI bias, we can better utilize the applications of AI in modern society.”
  International Society for Pharmacoepidemiology   233 Section 2 provides the problem statement that this paper is trying to address.
The first paragraph would benefit greatly from digging deeper into the reasons behind the public’s concern. The current narrative may unintentionally polarise the public (the distrusting) and AI (the distrusted). Concerns from the public are not solely directed towards AI technology per se, but more so towards those who misuse or even exploit AI for personal gain. This has to do with regulatory monitoring, quality assessment, and an open feedback system.
  International Society for Pharmacoepidemiology   237 "...that biases may be automated within these technologies, and can perpetuate harms more quickly, extensively, and systematically than human and societal biases on their own." The belief that “biases may be automated within these technologies” is not in itself untrue, considering that engineers and users have the ability to influence these systems; rather, it underscores the importance of preventing biases.
Although this premise may not be what the paper is trying to address, it should not be overlooked. Overall, there is a lack of mention of the public's current understanding of AI technology, including how it operates, its benefits, its potential dangers, and the precautions needed for its appropriate use, in understandable layman's terms. This is perhaps one of the key reasons for public distrust, and it potentiates further biases in AI. Left unaddressed, these gaps will remain huge obstacles to building reliable AI systems, even when the development process is scrutinized.
  International Society for Pharmacoepidemiology   279 "The difficulty in characterizing and managing AI bias is exemplified by systems built to model concepts that are only partially observable or capturable by data."
This difficulty points to an important concept: transparency. Some AI systems or algorithms are criticized as "black boxes" because they lack transparency, which creates challenges in building trust. It is recommended to increase the transparency of AI technology. Similar to the previous comment on public understanding of AI technology, this outreach perhaps needs to be extended to the scientific community as well.
  International Society for Pharmacoepidemiology   356 It is unclear whether Section 3 aims to illustrate the approach that NIST is taking to mitigate AI bias or to provide direction to the broader audience on how they can manage these biases. It is advised that this section be broken down to highlight the roles of the various parties in this “collaborative approach”, covering regulatory authorities, decision-makers, developers, and users.
  International Society for Pharmacoepidemiology   367 “...accompanying definitions are presented in an alphabetical glossary in Appendix A.”
Appendix A provides a useful compilation of prominent biases and associated definitions. Where feasible, it may be beneficial to illustrate where and how these biases can be introduced and to map them to the three phases of the proposed framework stated in the following section.
  International Society for Pharmacoepidemiology   397 Section 4 may benefit from rearranging the order of Figures 1 and 2, and the summary should be reframed to focus on the specific actions to be taken at each of the three stages rather than on the presentation of bias, which should be part of the problem statement. The comprehensive list of bias types included at the end will be extremely useful with examples of how they can be managed and when they arise during the AI lifecycle.
  International Society for Pharmacoepidemiology   435 It is important to emphasize that investigators and developers need to be clear about whether the data are appropriate to address the study question/problem in the "pre-design" phase. In fact, it is essential to understand the data source in advance before carrying out any studies. Just as transparency is a critical concept for AI systems and algorithms, as noted in our comment on line 279, similar transparency in the characterization and assessment of the "fitness" and limitations/gaps of the source data/datasets being considered for use is essential during the "pre-design" phase.
  International Society for Pharmacoepidemiology   441 This paragraph highlighted the need for practice guidelines or recommendations for best practices. We understand there could be unknown impacts, but investigators should have contingency approaches for the unexpected. This also reinforces the need for guidance/recommendations for best practices regarding broad stakeholder engagement and representation in the pre-design phase to elucidate potential unintended or unexpected uses and impacts in order to inform paths to mitigate potential biases and harms, or in certain cases, to determine that development of the proposed AI system or tool should not proceed.
  International Society for Pharmacoepidemiology   462 In healthcare studies that involve human subjects, the study protocol needs to be reviewed and approved by a corresponding ethics or independent review board, which acts as a gatekeeper of the study. To a certain extent, this may also be applicable to other sectors in addressing problems that can occur in the pre-design phase.
  International Society for Pharmacoepidemiology   512 "The stakeholders in this stage tend to include software designers, engineers, and data scientists who carry out risk management techniques..." As with the potential biases and unintended effects that can arise from restricting engagement during the predesign phase to a narrow set of stakeholder perspectives, it seems that similar considerations would be relevant for the Design and Development Phase. While the importance of a tighter connection between AI development teams and subject matter experts is noted later in the Design and Deployment Phase section of this document, it may be beneficial to highlight its importance in the introduction paragraph for this Phase. 
  International Society for Pharmacoepidemiology   521 "....always select the most accurate models"
Selecting solely for accuracy may lead to overfitting, so that the AI system cannot generalize to an unseen dataset, reducing its reliability in deployment.
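For illustration, a minimal sketch of the point above: ranking models by training accuracy alone favors the overfit model, while a held-out set reveals the loss of generalization. The dataset and model choices are illustrative assumptions.

# Minimal sketch: training accuracy rewards overfitting; held-out data does not.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, n_features=20, n_informative=5, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

for depth in (None, 3):   # None = fully grown tree, prone to overfitting
    model = DecisionTreeClassifier(max_depth=depth, random_state=0).fit(X_train, y_train)
    print(f"max_depth={depth}: train={model.score(X_train, y_train):.2f}, "
          f"held-out={model.score(X_test, y_test):.2f}")

The unconstrained tree scores perfectly on its own training data while its held-out score is noticeably lower, which is the generalization gap the comment warns about.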
  International Society for Pharmacoepidemiology   540 "...It is also notable that, depending on the industry or use case, AI is typically marketed as an easy solution that does not necessarily require extensive support."
It is quite clear that one advertised advantage of deploying an AI system is that it does not require extensive human effort. However, the converse holds when designing and developing the system. Although most AI systems are data-driven, human effort is essential to make sure (1) the algorithm is set up properly and (2) its evolving uses and intended/unintended impacts are monitored over time.
  International Society for Pharmacoepidemiology   690 Preventing bias in the pre-design phase, applying best practices in the design and development phase, and deploying sensibly with proper recognition of the strengths and weaknesses of the system are all essential.
  International Society for Pharmacoepidemiology   711 In all studies involving big data, transparency and replicability are also important. We recommend encouraging researchers to increase transparency when developing AI systems and supporting "open science" to increase replicability.
  International Society for Pharmacoepidemiology   730 Table 1 lists some common terminology for biases, but some terms are quite similar to each other; it may be easier for readers to understand if the list were organized into a hierarchical table.
18 University of Rostock Lilian Do Khac, Michael Leyer 204 f. + 494-500 + 654-655 + 401-402 + 597 + 537-538 The proposal targets the identification and management of bias in AI. However, a large portion of the content is dedicated to identification across the three-stage approach and not to the management of bias in AI, which, to our understanding, also includes the allocation of responsibility and the implementation of controls in the organizational setting.
  University of Rostock Lilian Do Khac, Michael Leyer 219-231 It should be clarified what harmful societal outcomes and unjust outcomes are. In the end, the judgement of whether a bias is harmful or unjust is a normative one. This has been prominently shown by the Moral Machine experiment.
  University of Rostock Lilian Do Khac, Michael Leyer 2 Biases are part of reality, and the paper is quite technical in describing them as baked into the data. It would be great to highlight that they exist and that AI is a chance to discover them, with the possibility of replacing biased human decision makers with a less biased AI decision maker.
19 IEEE SA IEEE SA 350-351
662-663
701
NIST notes that "Improving trust in AI systems can be advanced by putting mechanisms in place to reduce harmful bias in both deployed systems and in-production technology. Such mechanisms will require features such as a common vocabulary, clear and specific principles and governance approaches, and strategies for assurance. For the most part, the standards for these mechanisms and associated performance measurements still need to be created or adapted."

IEEE SA would offer that the IEEE Ethics Certification Program for Autonomous and Intelligent Systems (ECPAIS) represents one such mechanism. ECPAIS AI ethics-oriented Certification Criteria, which focus on transparency, accountability, and algorithmic bias, were developed by a diverse group of AI experts to guide the responsible innovation and delivery of autonomous and intelligent systems. They include Bias Certification Requirements that are intended as top level, non-sector specific requirements that can be used by different stakeholders, industry sectors, jurisdictions, private or public organisations engaged in the development and use of A/IS technologies.

We would offer a similar suggestion when noting lines 662-663 "Identifying standards of practice for implementing these types of risk management tools and techniques will be a focus of future activities," and line 701: "• Standards and guides are needed for terminology, measurement, and evaluation of bias."
  IEEE SA IEEE SA 387-389 IEEE SA suggests that opting only for a sanctioned authority that can audit AI systems and declare them trustworthy would not be enough. Providing users with a voice-assisted feedback tool or the ability to call a live operator to state concerns could help catch issues not covered by standards and help increase trust. Waiting for a standard or legislation does not necessarily help a user feel that their specific needs or issues are being addressed.
  IEEE SA IEEE SA 653 IEEE SA suggests that ensuring transparency and offering knowledge regarding automated decisions would also help to empower impacted stakeholders to take mitigating actions against A/IS bias, including providing feedback that users can share to anticipate/prevent future bias.
  IEEE SA IEEE SA Section 5 A creative identification of various classes/types of bias and their relevance to specific facets of the AI lifecycle is a great start to a comprehensive treatment of bias. The next task will be contextual mapping and selection of sub-sets relevant to a given application and customization of the risk management framework. It is inadvisable to develop one solution for all contexts and system scales/threat profiles.
  IEEE SA IEEE SA Section 5 Further to the previous comment, while the intended development of standards and a framework for bias is definitely an excellent contribution to combating bias, IEEE SA suggests that it needs to be part of an overall risk assessment that suppliers need to carry out when making use of AI. In AI, there is no 'one size fits all' approach possible. In addition, bias is just one area that deserves attention.

The IEEE Ethics Certification Program for Autonomous and Intelligent Systems (ECPAIS) provides for an overall holistic approach to the use of AI, ranging from purely technical to individual to societal challenges. ECPAIS offers a process and defines a series of marks by which organizations can seek certifications for the processes around the AIS products, systems, and services they provide. ECPAIS Certification will be able to help instill trust around individual products, systems and services using AI. More information about the Program and how to participate can be found at https://standards.ieee.org/industry-connections/ecpais.html.
20 TruEra Anupam Datta, Shameek Kundu, Divya Gopinath 424 Especially in AI/ML contexts, data provenance is critical. Critically thinking through where data comes from and what data is being piped into a model prior to its initial development is essential to understanding potential sources of bias.
  TruEra Anupam Datta, Shameek Kundu, Divya Gopinath 517 Part of the challenge in over-optimizing models for accuracy and other performance metrics is the inability to measure bias in a salient way. "Fairness" is an abstract concept, and there are many ways to mathematically define, measure, and even optimize for fair outcomes. But this requires a nuanced analysis of the problem at hand to pick the correct fairness metric. Model builders and validators can benefit from guidance on the appropriate notions of fairness to use for different application contexts.
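As one concrete illustration of the point above, a minimal sketch computing two of the many possible group-fairness metrics for binary predictions; which metric is appropriate still depends on the application context, which is exactly the guidance gap the comment identifies. All data and group labels are hypothetical.

# Minimal sketch: two common group-fairness metrics for binary predictions.
import numpy as np

y_true = np.array([1, 0, 1, 1, 0, 1, 0, 0, 1, 0])
y_pred = np.array([1, 0, 1, 0, 0, 1, 1, 0, 0, 0])
group  = np.array(["A", "A", "A", "A", "A", "B", "B", "B", "B", "B"])

a, b = (group == "A"), (group == "B")
# Demographic parity difference: gap in rates of positive predictions between groups.
dp_diff = y_pred[a].mean() - y_pred[b].mean()
# Equal opportunity difference: gap in true positive rates between groups.
eo_diff = y_pred[a][y_true[a] == 1].mean() - y_pred[b][y_true[b] == 1].mean()
print(f"Demographic parity difference: {dp_diff:+.2f}")
print(f"Equal opportunity difference:  {eo_diff:+.2f}")

The two metrics can disagree on the same predictions, which is one reason choosing among them requires the nuanced, use-case-level analysis described above.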
  TruEra Anupam Datta, Shameek Kundu, Divya Gopinath 595 Lines 620-630 aptly point out that developers or domain experts often assume that some validation of bias is inherent to the AI model itself, or may even ignore red flags because of their own perceived biases. This issue is only exacerbated by the fact that when a group bias is measured, there is little to no guidance about where or how the bias arises. In this sense, root cause analysis to understand the context and source of the bias is imperative to truly understanding and diagnosing it.
  TruEra Anupam Datta, Shameek Kundu, Divya Gopinath N/A The report overall does not mention any attempts to remedy or mitigate bias. Specifically, more clarity would be helpful on (a) how to determine when a bias is unjustified and needs remedying; and (b) which bias mitigation techniques are suitable to the application context. Model builders need not be blind when applying bias mitigation techniques, and should use mitigation techniques that are suited to the application at hand.
21 IBM Ryan Hageman   Dear Acting Director Olthoff:
On behalf of International Business Machines Corporation (IBM), we welcome the opportunity to comment on the National Institute of Standards and Technology’s (NIST) recent draft report on “A Proposal for Identifying and Managing Bias within Artificial Intelligence” (hereafter, “draft report”).

At IBM, we recognize that with great innovative power comes even greater responsibility. That is why we recently pointed to explicit steps that should be part of any mandatory requirements for companies addressing issues related to AI bias. These include conducting impact assessments prior to the deployment of a high-risk AI system, transparent
documentation and auditability of those assessment processes, and supporting developer training around bias to better equip the workforce with the tools necessary to recognize how bias may be introduced within the AI development pipeline.1

IBM supports NIST’s goal of using this draft report “to advance methods to understand and reduce harmful forms of AI bias … in the pursuit of a framework for trustworthy and responsible AI.” To that end, we would draw NIST’s attention to BSA | The Software Alliance’s (BSA) framework on AI bias, which proposes similar approaches based on a whole-of-lifecycle approach to addressing bias concerns.2 To the extent this draft report is intended to fold into the existing work NIST is undertaking in pursuit of an AI Risk Management Framework, we would also draw the agency’s attention to separate comments IBM has filed in response to the Artificial Intelligence Risk Management Framework, Docket #210726-0151.

We thank you for your efforts in drafting this report and look forward to both your consideration of these comments and future opportunities to contribute to the work ahead.

1 See Dr. Stacy Hobson and Anjelica Dortch, “Mitigating Bias in Artificial Intelligence,” IBM Policy
Lab, 26 May 2021, available at https://www.ibm.com/policy/mitigating-ai-bias/.
2 See “Confronting Bias: BSA’s Framework to Build Trust in AI,” available at https://ai.bsa.org/wp-content/uploads/2021/06/2021bsaaibias.pdf.

Respectfully,
Christina Montgomery
Vice President and Chief Privacy Officer
Co-Chair, IBM AI Ethics Board
IBM Corporation

Francesca Rossi
IBM Fellow and AI Ethics Global Leader
Co-Chair, IBM AI Ethics Board
IBM Research
22 USC Information Sciences Institute Kristina Lerman   Algorithms, even simple ranking, create mechanisms of cumulative advantage. Besides creating inequality (e.g., of popularity in cultural markets, or of influence on social media platforms where a few users acquire an inordinate number of followers), algorithms also amplify small biases to create disparities. In this paper, we show that algorithms amplify the advantage that some content gets due to cognitive biases. https://dl.acm.org/doi/abs/10.1145/3415237
  USC Information Sciences Institute Kristina Lerman   In this paper, we show that mechanisms like cumulative advantage amplify subtle gender biases (due to a preference to cite researchers of similar gender) to create disparities in the research impact of women scientists. https://arxiv.org/abs/2103.10944
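For illustration, a minimal sketch of the cumulative-advantage mechanism these comments describe: when attention is allocated by rank, a small initial edge compounds into a large disparity. This toy simulation is not the model used in the cited papers; all parameters are illustrative assumptions.

# Minimal sketch: rank-based attention turns a small initial edge into a large share.
import numpy as np

rng = np.random.default_rng(2)
counts = np.array([1.05] + [1.0] * 9)                # item 0 starts slightly ahead
for _ in range(5_000):
    order = np.argsort(-counts)                       # rank items by current count
    attention = 1.0 / (np.arange(len(counts)) + 1)    # attention decays with rank
    probs = np.empty(len(counts))
    probs[order] = attention / attention.sum()
    counts[rng.choice(len(counts), p=probs)] += 1     # the chosen item gains one unit

shares = counts / counts.sum()
print(f"Item 0 share: {shares[0]:.2%} vs. uniform baseline {1 / len(counts):.2%}")

Because the top-ranked item receives the most attention each round, the initially advantaged item tends to stay on top and accumulate a share far above the uniform baseline, while otherwise identical items fall behind.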
23 Medical Imaging & Technology Alliance (MITA) Zack Hornberger   As the leading trade association representing the manufacturers of medical imaging equipment, radiopharmaceuticals, contrast media, and focused ultrasound therapeutic devices, the Medical Imaging & Technology Alliance (MITA) and its Members have a long history of innovating and adopting cutting-edge technologies to provide improved care to millions of patients nationwide. MITA companies acknowledge the potential risk that bias can pose to the effective deployment of artificial intelligence (AI)-based solutions. However, draft NIST Special Publication 1270, “A Proposal for Identifying and Managing Bias in Artificial Intelligence,” oversimplifies the current state of the technology across all sectors and provides some conclusions which are demonstrably false when applied to the healthcare industry.

The NIST document shows that any attempt to address bias concerns requires targeted, sector-specific, and risk-based approaches. We urge NIST to consider the serious risks to innovation which the generalizations contained within the draft document pose and to reevaluate the conclusions drawn from those generalizations.

The draft document repeatedly asserts that AI products across all sectors and within all industries are released without quality assurance or oversight. This claim may be true for some companies, in some industries, for some applications, but is not true with respect to the broad perspective the document purports to represent. On line 355, the draft document claims:
“…there are many reasons for potential public distrust of AI related to bias in systems. These include:
• The use of datasets and/or practices that are inherently biased and historically contribute to negative impacts
• Automation based on these biases placed in settings that can affect people’s lives, with little to no testing or gatekeeping
• Deployment of technology that is either not fully tested, potentially oversold, or based on questionable or non-existent science causing harmful and biased outcomes”.

The document also offers no qualifications or clarifications. Each bulleted claim relies upon a narrow view of existing solutions and practices, often identified by media outlets, and does not represent the full solution spectrum. For instance, manufacturers within the medical imaging device industry that develop AI solutions subject their products to stringent internal Quality Management System (QMS) processes and undergo intensive review by the Food and Drug Administration, which has reviewed hundreds of AI applications. This perspective is conspicuously absent from the draft document.

Moreover, the healthcare-related materials cited by the draft document (line 251) are one-sided and present an incomplete sector view. This issue is underscored by the literature review statement, starting on line 739, which suggests the referenced materials represent a full, complete, and unbiased perspective on the issue—a conclusion MITA rejects. We ask NIST to acknowledge the limited scope and perspective provided by the materials reviewed.
Conflation between AI solutions applied to subjective tasks, such as employment decisions and mortgage qualification, and objective tasks, such as tumor identification, introduces further concerns. Managing bias in these disparate situations requires specialized techniques which this draft does not acknowledge.

Although the paper states in the introduction (line 203) that its primary goal is “to advance methods to understand and reduce harmful forms of AI bias”, the current draft does not deliver many recommendations to that end. Similarly, a reference to managing bias (line 209) as a “critical but still insufficiently developed building block of trustworthiness” raises more questions than answers. What is trustworthiness, and how does managing bias build it? How is bias managed without a way to measure, quantify, or classify its existence or impact?

Figure 1 (line 415) also raises many questions. It is unclear what the arrows represent, what each arrow label refers to, or how the labeled elements interact with one another. The figure should be removed because of this ambiguity.
MITA urges NIST to consider the drawbacks to a broad, cross-sector attempt to establish bias management principles and acknowledge both current regulatory oversight and the specific concerns and capabilities of AI solutions in individual sectors. NIST must ensure that any activities pursued engage participants from all sectors, and MITA stands prepared to assist NIST in its continued work to advance AI in healthcare and medical technology.

Further, MITA strongly recommends NIST provide additional comment periods on further drafts to allow engagement on this topic.
24 Information Technology Industry Council (ITI) Courtney Lang   The Information Technology Industry Council (ITI) appreciates the opportunity to submit
comments in response to the publication of Draft NIST-SP 1270: A Framework for
Identifying and Managing Bias in Artificial Intelligence (the draft).

ITI represents the world’s leading information and communications technology (ICT) companies. We promote innovation worldwide, serving as the ICT industry’s premier advocate and thought leader in the United States and around the globe. ITI’s membership comprises leading innovative companies from all corners of the technology sector, including hardware, software, digital services, semiconductor, network equipment, and other internet and technology-enabled companies that rely on ICT to evolve their businesses. Artificial Intelligence (AI) is a priority technology area for many of our members, who develop and use AI systems to improve technology, facilitate business, and solve problems big and small. ITI and its member companies believe that effective government approaches to AI clear barriers to innovation, provide predictable and sustainable environments for business, protect public safety, and build public trust in the
technology.

ITI is actively engaged on AI policy around the world and issued a set of Global AI Policy Recommendations earlier this year, aimed at helping governments facilitate an environment that supports AI while simultaneously recognizing that there are challenges that need to be addressed as the uptake of AI grows around the world.1 We have also actively engaged with NIST as it has considered various aspects important to fostering trust in the technology, most recently on explainability.

We share the firm belief that building trust in the era of digital transformation is essential and agree that there are important questions that need to be addressed with regard to the responsible development and use of AI technology. As AI technology evolves, the tech industry is aware of and is already taking steps to understand, identify and mitigate the
potential for negative outcomes that may be associated with the use of AI systems, including biased outcomes.

We appreciate that NIST is exploring many areas that will be of importance to fostering trustworthy and responsible AI. We agree that determining an effective approach to addressing bias is vital, especially given the frequency with which it comes up as a very specific risk that policymakers around the world are concerned about. Addressing bias will require collaboration across the public and private sectors in order to foster a practical understanding of how AI tools are designed, developed, and deployed and create state-of-the-art approaches to address identified challenges. It is also necessary to develop data-driven techniques, metrics, and tools that industry can operationalize to properly measure and mitigate bias in concrete terms. On the whole, we agree with the way that NIST has broken down the AI lifecycle, as well as the stages in which bias can be introduced and also managed.

Below, we offer several recommendations and perspectives that we encourage NIST to
consider as it revises its draft, which we believe will help to strengthen it.

Our complete Global AI Policy Recommendations are available here:
https://www.itic.org/documents/artificial-intelligence/ITI_GlobalAIPrinciples_032321_v3.pdf
 
  Information Technology Industry Council (ITI) Courtney Lang   At the outset, it is necessary to note that while a comprehensive approach to detecting and mitigating bias is important, generally accepted approaches for doing so in all circumstances do not yet exist. We recognize that NIST’s draft attempts to contribute to the effort to develop standards and a risk framework for building and using trustworthy AI by focusing on the challenge of bias in AI. However, to appropriately build a framework that does so, consensus methods for assessing, measuring, and comparing data and AI systems, as well as standards for reasonable mitigations, are needed. This will require the development of new frameworks, standards, and best practices.

That being said, we believe the draft is a solid first step in contributing to the conversation around addressing AI bias, though would exercise caution in describing it as a “framework” given it is still at a preliminary level. Given the success of NIST’s Cybersecurity and Privacy Frameworks and the interest of global policymakers in adopting those frameworks, we are somewhat concerned that policymakers may misinterpret the draft as a definitive guide to addressing and managing bias without recognizing that additional standards and practices need to be developed to do so effectively. As such, we recommend NIST include language up front to provide additional context including to explain that this is a preliminary document and is not intended to solve for every bias-related challenge in all circumstances.
  Information Technology Industry Council (ITI) Courtney Lang   We encourage NIST to more clearly identify up front the different types of bias (e.g., by using definitions and labels for the bullets and scenarios outlined on p. 4, line 335) and provide more detail on how the approach outlined in the proposed framework aims to specifically address each type. While there is some discussion of conscious or unintentional bias in the paper (e.g., line 453, p. 7), we recommend NIST include more discussion around these concepts earlier in the paper. It may also be helpful to provide the definitions of different types of bias as they appear in the paper, as opposed to listing them all in a glossary, as doing so will provide helpful context for the definitions. Finally, we encourage NIST to define bias more clearly, particularly given the divergent definitions of statistical, legal, and other types of bias.
  Information Technology Industry Council (ITI) Courtney Lang   We recognize that this is a preliminary document. However, in future iterations of the draft, it would be helpful for NIST to offer concrete technical guidance as to how to address bias in specific instances.
For example, NIST notes that pre-deployment testing can help address public distrust. However, for certain classes of AI technologies, particularly novel ones, large testing datasets do not exist, making it difficult to test them to the same extent as in areas where large test sets do already exist. In cases such as these, articulating standards, a framework or set of guidelines that reflect these differences in capabilities would be helpful.

Additionally, for a certain important subset of bias testing, labelled data about protected or marginalized identities is often required2; however, such data is not always available, and it often is not easy to collect and store such information. In its efforts to develop a framework to identify and manage bias, it would be helpful for NIST to develop additional guidance about what reasonable expectations are in such circumstances.

NIST also references “disparate impact” throughout the draft. Providing additional technical guidance around how a developer could test for disparate impact in machine learning contexts would be helpful (a brief illustrative sketch follows this comment), as the notion of disparate impact has a particular, established interpretation that can be challenging to map into novel contexts. The draft also references “valid” performance; guidance here would be similarly helpful, as notions of validity in predictive tools are well-established yet still contested in certain domains such as employment (e.g., the American Psychological Association’s Principles for the Validation and Use of Personnel Selection Procedures3 differ from those outlined in the Uniform Guidelines on Employee Selection Procedures).

Finally, we recognize that NIST is working to develop an AI Risk Management Framework. In revising and updating this draft, we encourage NIST to consider how this Framework and the AI RMF complement or otherwise interact with each other. We agree that a risk-based approach to identifying and managing bias is appropriate, but as the concept is used frequently in the abstract throughout the draft, we recommend including cross-references to the RMF or otherwise defining more concrete guidelines around what constitutes a risk-based approach.
2 https://arxiv.org/pdf/1912.06171.pdf
3 https://www.apa.org/ed/accreditation/about/policies/personnel-selection-procedures.pdf
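On the disparate impact point raised in the comment above, a minimal sketch of one common screening approach: the "four-fifths rule" ratio of selection rates between groups, applied to hypothetical model decisions. Whether this heuristic maps cleanly onto a given machine learning context is precisely the open question the comment raises; the data and threshold here are illustrative assumptions.

# Minimal sketch: a four-fifths-rule style disparate impact ratio for model decisions.
# The decisions, groups, and 0.8 threshold are illustrative.
import numpy as np

decisions = np.array([1, 1, 0, 1, 0, 1, 0, 0, 0, 1, 0, 0])   # 1 = favorable outcome
group     = np.array(["A"] * 6 + ["B"] * 6)

rate_a = decisions[group == "A"].mean()
rate_b = decisions[group == "B"].mean()
ratio = min(rate_a, rate_b) / max(rate_a, rate_b)
print(f"Selection rates: A={rate_a:.2f}, B={rate_b:.2f}, impact ratio={ratio:.2f}")
if ratio < 0.8:
    print("Ratio below 0.8: flag for further review under the four-fifths heuristic.")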
  Information Technology Industry Council (ITI) Courtney Lang   While NIST puts forth many considerations around where bias may emerge and how or where it might be managed in the pre-design phase, a recommendation to boost education and awareness of the different types of bias and their impacts is missing. Training sessions, in particular, are subject to inherent biases as humans who are not properly trained will take their biases with them to the development process. While bias cannot be removed completely, it is vital that bias is both disclosed and assessed. We thus believe that adding education and awareness of bias as a category would be helpful, as this is one way in which developers can begin to understand and manage bias. Indeed, awareness that bias exists is the first step to addressing it.

Specific ideas that NIST could consider to address the above are:
• Practices, standards, curricula, and modules around educating developers about identifying risks from bias and approaches for mitigating those risks.
• Practices and standards around how AI providers can educate downstream users about bias, so that developers and system operators who incorporate AI systems or components into larger systems can address bias in their design phases, with a particular focus on preventing the deployment of AI systems in unexpected ways or on unforeseen populations, which is a frequent source of bias.
• Practices and standards around educating end users about the intended use of AI systems, so that systems are used for their intended purposes.
  Information Technology Industry Council (ITI) Courtney Lang   NIST should consider how and if it can reference ongoing international standards development activities in the draft. To be sure, ISO/IEC JTC 1 SC 42 is exploring themes that may be relevant to reference and/or integrate into the draft. For example, that committee is developing a standard on bias in AI systems and AI-aided decision-making. We encourage NIST to consider if or how this standard can be integrated into future guidance to facilitate interoperability.
  Information Technology Industry Council (ITI) Courtney Lang   Recently the FTC has announced its intent to begin investigating AI bias. At the same time, the DOJ, HUD, CFPB, and other government agencies are also increasingly concerned about AI. There is, however, no concrete consensus understanding regarding what the terms fairness and bias mean in the context of AI and what reasonable efforts companies should be expected to take to mitigate bias.

The proliferation across the federal government of varied definitions of fairness and bias, and differing expectations regarding what constitutes reasonable efforts to mitigate bias, creates a real challenge for entities seeking guidance on how to develop and measure bias in their AI systems.

NIST played a pivotal role in driving adoption of cybersecurity standards across the federal government. We therefore encourage NIST to articulate a clearer plan for how its work on measuring and mitigating AI bias can translate into harmonized standards across federal agencies.
  Information Technology Industry Council (ITI) Courtney Lang   Once again, we appreciate the opportunity to provide feedback on Draft NIST-SP 1270. Developing a practical approach to identify and manage AI bias is a key component to ensuring trustworthy AI systems. While we believe the draft acts as a solid foundation, there are several areas we believe can be strengthened in future iterations of the document, including outlining specific ways in which bias can be actively managed. We look forward to continuing to engage with NIST as it refines its recommendations around addressing AI bias.
26 Accenture Sean Sweeney   Accenture commends and supports NIST’s consultative process, and we appreciate that the current framework approaches the mitigation of bias by stage. In addition to our line-specific comments, we would like to share several general comments. In addition to addressing when to mitigate bias:
  • Where bias stems from is also an important consideration. Bias can be introduced through experimental bias or inadvertent bias. The existing framework provides numerous examples of these types, and would benefit from calling out these categories so that readers can separate process from type.
    • Experimental bias: Can stem from data selection or sampling that may not be representative, or data that is sensitive in nature (e.g., yelp effect).
    • Inadvertent bias: Can occur in the construction of algorithms (e.g., making assumptions in heuristics based on world experience)
  Accenture Sean Sweeney Introduction 203: NIST should more clearly explain what constitutes “harmful” bias.
208: Treats the presence of bias as a given; recommend also referencing here how bias is introduced.
  Accenture Sean Sweeney Section 2 235: Replace “has helped to expose various social biases baked into real-world systems, and” with “have made existing social biases much more visible. Such biases have long been baked into real-world systems, but the novelty of AI has brought new attention to its applications and, therefore, to the existing biases it inadvertently propagates. Now there is increasing evidence...”
239: Limited to Human decisions based on AI system, seems to underrepresent the bias that could be transferred into the system and process
241: delete “or, at a minimum, perceptions of inequalities.”
275: As it is currently structured, the paper makes it sound like the potentially significant downside is the increased interest in mitigating harmful impacts (the sentence that immediately follows). Assuming that is not what was intended, replace with “…the convenience of automated classification and discovery within large datasets can amplify existing biases. As these tools proliferate across our social systems, it becomes even more important to identify and mitigate harmful impacts.”
296: replace “leaves out” with “leave out”
310: The draft states, “AI is not necessarily something with which [much of the public] directly interact.” Given the incredible proliferation of AI systems over recent years, this statement is not factual. The authors might be trying to make the point that the public is unaware that they are interacting with AI or that the inner workings of AI are not something the public has a lot of insight into. Consider changing this to: “Additionally, for much of the public, the existence of the AI they are interacting with is not always evident and the AI systems’ algorithmic assumptions are not transparent or easily accessible. Yet they are affected by, and used as inputs for, AI technologies and systems.”
337: add “Using Biased Data:”
337: replace “contribute” with “contributed”
339: add “Inadvertently Propagating Bias:”
339-342: Consider combining the second and third bullets. Automation requires deployment, meaning that both the second and third bullets effectively reference technology being deployed that is not fully tested.
341: add “AI Systems Requiring More Rigor:”
336: After “This includes” add “but are not limited to”.
New line after 342: “Changing Results: Another reason for potential public distrust of AI due to bias includes lack of governance in development and when deployed. Without it, data drift could happen and go undetected, and outputs can change over time.”
  Accenture Sean Sweeney Section 4 435: (An example where these early decisions can lead to biased outcomes) “For example, during the initial collection and/or annotation stage, if the data does not reflect a representative sample, then one can have a training dataset that reflects a narrow condition.”
464: Should read: “…or approaches that are generally technically flawed.”
468: The draft currently states, “Technology designed for use in high-stakes settings requires extensive testing to demonstrate valid and reliable performance.” This is not currently the case. Numerous examples of unregulated AI systems that significantly impact individuals’ lives exist. If NIST is saying this is the goal, who decides what “high-stakes settings” are? How are those “high-stakes settings” defined?
475-486: NIST appears to be referring to two different types of stakeholder diversity in this section interchangeably: social diversity (e.g., age, race, gender, physical ability) and professional diversity (e.g., law, business subject matter experts, practitioners). Recommend providing more clarity in this section.
530: The draft mentions “unintentional weightings” that may have positive side effects for the research community. However, the article by T. Feathers that is referenced (Reference #48) seems to describe intentional weighting. Perhaps a different reference was intended to be included.
534: Recommend adding a reference to “intended bias” that can be introduced by bad actors attacking data sets used to train models.
673: Figure 2. Recommend including in the framework the high-level mitigation approaches described in the management of bias section preceding it.
27 Anthem Stephanie Fiore   In response to NIST SP 1270, we offer the following support and recommendations below.
Recommendations for Ongoing NIST Activities on AI
Anthem values NIST’s comprehensive research to date, as well as the ongoing commitment for future workshops and activities to bring together diverse stakeholders to contribute to the important conversations around AI principles. In lines 165-178 of the Proposal, NIST notes to commenters that a variety of activities will be conducted in 2021 and 2022 to continue development across core building blocks of AI, which includes this work on mitigation of harmful bias. NIST also asks commenters to submit recommended activities and events. Anthem looks forward to participating in these events, and in addition to cross-industry considerations, we recommend specific conversations to discuss AI considerations and guideline impacts to specific sectors, including industry-focused discussions such as healthcare. In alignment with NIST’s recommendations to engage a diverse set of stakeholders, we recommend ongoing sessions to encourage perspectives across operations, business, technical, and societal impact discussions. In addition to NIST’s stakeholder and consensus-building efforts, we also encourage NIST to collaborate with other federal entities on consumer-focused AI educational opportunities and recommended guidelines, including the U.S. Department of Health and Human Services (HHS) and the Federal Trade Commission (FTC).
  Anthem Stephanie Fiore   Support for NIST SP 1270
Anthem applauds NIST’s draft document and work to provide understanding of the challenges of harmful bias and ways to manage it in AI systems. We appreciate NIST’s ongoing work and inclusion of bias as part of NIST’s larger trustworthy AI framework, and look forward to continued discussions of the interconnectivity of these areas. In appreciation of and alignment with NIST, we support NIST’s statement in line 219 that “not all types of bias are negative,” and we also agree that it is possible to manage bias that may have a harmful impact to specific groups or individuals. We also appreciate NIST’s work to further broaden the understanding of how harmful bias may be introduced, including the statement in line 304 that excluding certain attributes [such as gender, age, and religion] will not remedy the issue of harmful bias, as they can be inferred in other ways and still potentially produce negative outcomes. Anthem also supports the statement in line 352 that “the goal is not ‘zero risk,’ but to manage and reduce bias in a way that supports equitable outcomes and engender public trust.”

Anthem recommends a comprehensive approach to evaluating and recommending guidelines for the development and use of AI in healthcare, including managing adverse bias. We are encouraged by NIST’s continued work with stakeholders to discuss real-world examples and application of AI and data science principles to ensure robust algorithm development, deployment, and monitoring.
  Anthem Stephanie Fiore   Recommendations to Expand the Draft Guidance
We offer the following recommendations to further develop NIST’s work in this area:
In addition to the challenges of bias in AI noted in SP 1270, Anthem recommends NIST include that data proxies do not equally replace direct data and may not accurately reflect individuals or groups. Furthermore, we ask NIST to consider including that we may be unaware of harmful biases that exist in manual processes that we are working to automate. Better understanding of how to identify and manage potential biases in current non-AI processes is worthy of consideration as NIST continues research and recommendations.
  Across the examples and recommendations shared for the pre-design, design and development, and deployment stages, we ask NIST to further enrich the draft guidance with use cases and examples.
  Under the pre-design stage, we recommend additional NIST activities include discussion of the human factor when incorporating AI recommendations for managing bias. For example, AI end users may become complacent with AI decisions or unknowingly introduce their own biases with adverse impacts when leveraging AI in practice.
  We also recommend NIST work with stakeholders to consider defining bias, or potential harmful bias outcomes, at the pre-design stage of algorithm development. While we recognize that the AI teams may not be able to predict every potentially misinformed outcome, this consideration may help focus the pre-development stage.
  Under the deployment stage, we ask for NIST to leverage the upcoming stakeholder discussion to further build out considerations and examples of techniques that may assist with the monitoring and auditing recommended.
  We also recommend NIST and stakeholders discuss opportunities for a time-appropriate and continuous approach to AI monitoring that includes applicable standards of documentation and is adjusted accordingly across AI functions. In the medical community, for instance, collecting data on off-label drug use sometimes expands the benefits.
  We ask NIST to consider inclusion of ethics to inform risk, which may further help to achieve the goal of consensus standards for trustworthy AI.
  As NIST advances the AI Risk Management Framework development, we recommend revisiting this draft guidance to align updates as appropriate.
  We offer additional specific text edits and recommendations for SP 1270 in the attached spreadsheet.
  Anthem Stephanie Fiore 296-297 We agree that historically, data representing certain groups may be excluded from training datasets used by machine learning applications. We also wish to include that data may be erroneous or misrepresentative of certain populations, as well.
  Anthem Stephanie Fiore 311-312 This statement highlights the challenge of users’ awareness when they interact with and are impacted by AI, but does not address the significance of the level of impact.
  Anthem Stephanie Fiore 337-338 We also wish to highlight that data can have inaccuracies, which can lead to public distrust of AI related to bias in systems.
  Anthem Stephanie Fiore 435-438 We agree with the importance of planning, developing guidance and governance processes. It is also important at this stage to define bias and potential outcomes and impacts specific to the AI component.
  Anthem Stephanie Fiore 554-556 When discussing algorithmic decision-making tools, we recommend NIST consider inclusion of labeling, unsupervised learning, and reinforcement learning as part of the design and development stage.
  Anthem Stephanie Fiore 653 NIST only mentions counterfactual fairness as a technique. However, that may not be sufficient or appropriate across all AI use cases.
28 CalypsoAI Steven C. Howell, PhD 248-254 This section points to examples of AI bias causing harm in the lives of individuals. Many such situations occur where the priorities of the marginalized individual conflict with the business objectives of a larger corporation. Hiring is one potential example, but so are lending and insurance. For lending and insurance, companies maximize profits by categorizing risk using every piece of data they can, much of which is already regulated and evaluated using established metrics. The existing metrics and regulations should serve as a starting point for evaluating the AI technology used to replace human processes. Beyond self-advocacy, what metrics and reporting are in place?
  CalypsoAI Steven C. Howell, PhD 262-265 I strongly agree with this sentiment. A solution fit to a specific problem will not generalize, making the broader approach necessary.
  CalypsoAI Steven C. Howell, PhD 457-460 The statement about refusing to develop AI for applications that are "fraudulent, pseudoscientific, prey on the user, or generally exaggerate claims" seems optimistic and unlikely. If an organization, company, or group has such negative intentions, it likely has the means to override or replace those who refuse to take part. A higher authority and governance would be needed to significantly curtail such practices.
  CalypsoAI Steven C. Howell, PhD 462-466 This statement reflects an organization that prioritizes rapid solution delivery over thoughtful problem framing and solution evaluation. In this environment, a developer or group of developers should certainly push back and advocate a more thoughtful approach, but an organization may decide to override and even replace those who stand in the way. In such situations, a higher authority or governance would need to be established to support best practices and protect these values.
  CalypsoAI Steven C. Howell, PhD 556-557 I strongly agree that AI development teams need to work more closely with subject matter experts and end users. This surfaces misconceptions for all parties and leads to significantly improved overall results. We need to find ways to break down barriers to understanding AI to facilitate this type of collaboration. This document, including the definitions, appendix, and references, is an excellent resource for furthering this effort.
  CalypsoAI Brendan Quinlivan   It's worth discussing the potential bias-performance trade-off in this section. If the data being employed to train a model contains harmful biases, then the models with the best performance are likely to mirror these patterns in order to minimize their loss. This means that reducing the bias of a model will lead to models with lower performance metrics. This apparent trade-off may make reducing bias in systems an unattractive proposal for some practitioners.
  CalypsoAI Brendan Quinlivan 382 All bias is statistical; it just happens that some bias has a societal impact while other bias does not.
  CalypsoAI Brendan Quinlivan 426-438 As the authors correctly point out in section 2, in many cases the AI product is designed after data has already been collected. How should practitioners handle this situation, where adequate pre-design cannot be performed to the specification of this framework? I would like to see this common issue addressed as part of the framework.
  CalypsoAI Brendan Quinlivan 535-543 All learning algorithms attempt to minimize some loss with respect to the training data. If the training data contains biases they will be leveraged by the learning process. If bias is to be removed from models then these learning algorithms need to be modified to minimize some combination of loss and unfairness. I would like to see some potential solutions mentioned in this section. Again, this relates back to the bias-performance trade-off.
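A minimal sketch of the kind of modification described above, on synthetic data: a logistic model trained on log loss plus a demographic-parity penalty. The dataset, feature construction, and penalty weight (lam) are illustrative assumptions, not anything prescribed in SP 1270; raising the weight shrinks the group gap at some cost in accuracy, which is the bias-performance trade-off noted earlier.

    # Sketch: log loss plus a demographic-parity penalty (synthetic data, illustrative weights).
    import numpy as np

    rng = np.random.default_rng(0)
    n = 2000
    a = rng.integers(0, 2, size=n)                    # protected group indicator (0/1)
    X = rng.normal(size=(n, 3)) + a[:, None] * 0.5    # features correlated with the group
    y = (X[:, 0] + 0.5 * a + rng.normal(scale=0.5, size=n) > 0).astype(float)

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-np.clip(z, -30, 30)))

    def loss_and_grad(w, lam):
        """Log loss plus lam * (difference in mean predicted score between groups)^2."""
        p = sigmoid(X @ w)
        log_loss = -np.mean(y * np.log(p + 1e-9) + (1 - y) * np.log(1 - p + 1e-9))
        gap = p[a == 1].mean() - p[a == 0].mean()
        g_log = X.T @ (p - y) / n
        dp = p * (1 - p)
        g_gap = (X[a == 1] * dp[a == 1, None]).mean(axis=0) - (X[a == 0] * dp[a == 0, None]).mean(axis=0)
        return log_loss + lam * gap ** 2, g_log + lam * 2 * gap * g_gap

    def train(lam, steps=2000, lr=0.5):
        w = np.zeros(X.shape[1])
        for _ in range(steps):
            _, g = loss_and_grad(w, lam)
            w -= lr * g
        return w

    for lam in (0.0, 5.0):
        w = train(lam)
        p = sigmoid(X @ w)
        print(f"lam={lam}: accuracy={((p > 0.5) == y).mean():.3f}, "
              f"group gap in mean score={p[a == 1].mean() - p[a == 0].mean():.3f}")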
  CalypsoAI Brendan Quinlivan 574 During the deployment stage I believe it is critical to monitor the bias of an ML system. The example given of the ride-hailing app illustrates this need well, but I think the framework should suggest that monitoring disparate impact in a production setting be included in modern ML systems if bias is a concern.
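A minimal sketch of the kind of production monitoring suggested above: a sliding window of recent decisions per group and a disparate impact ratio checked against the four-fifths rule. The window size, group labels, and threshold are assumed policy choices for illustration only.

    # Sketch: monitoring disparate impact of a deployed model's decisions (illustrative only).
    from collections import deque

    class DisparateImpactMonitor:
        """Tracks favorable-outcome rates by group over a sliding window of decisions."""

        def __init__(self, window_size=1000, threshold=0.8):
            self.window = deque(maxlen=window_size)
            self.threshold = threshold  # four-fifths rule, an assumed policy choice

        def record(self, group, decision):
            """group: e.g. 'A' or 'B'; decision: 1 for a favorable outcome, 0 otherwise."""
            self.window.append((group, decision))

        def disparate_impact_ratio(self):
            rates = {}
            for g in set(grp for grp, _ in self.window):
                outcomes = [d for grp, d in self.window if grp == g]
                rates[g] = sum(outcomes) / len(outcomes)
            if len(rates) < 2 or max(rates.values()) == 0:
                return None
            return min(rates.values()) / max(rates.values())

        def alert(self):
            ratio = self.disparate_impact_ratio()
            return ratio is not None and ratio < self.threshold

    # Example usage with made-up decisions:
    monitor = DisparateImpactMonitor(window_size=500)
    for group, decision in [("A", 1), ("A", 1), ("A", 0), ("B", 1), ("B", 0), ("B", 0)]:
        monitor.record(group, decision)
    print(monitor.disparate_impact_ratio(), monitor.alert())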
  CalypsoAI Mitchell Sutika-Sipus PhD 212 To utilize the ISO definition of bias as a statistical deviation from truth necessitates a working definition of truth. I advise avoiding philosophical speculation about truth in favor of a definition tied to visibility of evidence and accountability. In this respect, "truth" falls more under a legal definition, based on the state of the case as presented by observable fact.
  CalypsoAI Mitchell Sutika-Sipus PhD 291 This is among the perils of current education and training within the field of data science. The problem is less about the data scientists and more about leadership failing to understand that data is a granular abstraction pointing toward a more complex whole. A constellation of data points may create a representation of truth, and thus inform positive decision making, but it does not stand alone.
  CalypsoAI Mitchell Sutika-Sipus PhD 394 NIST should look to the work of Horst Rittel and his development of IBIS in the 1970s, an "issue-based information system" that used participatory dialogue mapping to build consensus around complex social and group decision making. This work sets a precedent for modern collaborative development of algorithms, reducing the voice of the 'expert' and elevating stakeholder needs within "wicked problems," i.e., dynamic, slippery problems that cannot be solved but only tamed.
  CalypsoAI Mitchell Sutika-Sipus PhD 395 This approach still reads largely as a top-down institutional approach to technological planning and implementation. We should consider other methods drawn from other planning disciplines, such as "advocacy planning" or "decentralized planning." NIST is well positioned to provide grants, resources, and experts to support these alternative planning models, and thus avoid the technocratic failures introduced within major infrastructure initiatives in the 1950s-1970s during the Cybernetic I and Cybernetic II eras of human-technology relations.
  CalypsoAI Mitchell Sutika-Sipus PhD 400 We should ask ourselves, "what is an AI designer?" Is this today's highly educated and specialized data scientist? Is it the product designer? The entrepreneur? More importantly, who could be the AI designer going forward? If we recast our expectations and encourage new directions in the role of the AI designer, this will very much transform the state of the problem of bias (since it is natively a social problem defined by the shared mental model of AI).
  CalypsoAI Mitchell Sutika-Sipus PhD 415 The three-stage approach should be a circle, enabling iteration and transformation. It should also have a phase of post-deployment algorithmic transformation, followed by a phase of algorithmic response and reconsideration (by stakeholders). The justification for this argument: 1) the retraining of machine learning algorithms creates a redirection of logic distinct from the initial design; 2) tools and processes are constantly needed for model monitoring to understand and work with that redirection of logic (sometimes known as drift when it is not desired, but considered improved learning when it is desired); 3) depending upon how the model changes over time, it will be necessary for the organization to review, reorganize, and redefine how to move into the redeployment of the model. In this manner, we should see a continual feedback loop, not a single linear A-to-Z deployment lifespan.
  CalypsoAI Mitchell Sutika-Sipus PhD 415 The guidance provided to translate the current three-phase approach into a circular five-phase approach is fundamental to reducing bias in algorithm design, as it is more than a 'human in the loop'; it is an organizational intermediation with algorithm maturity. Working at organizational scale enables a broader distribution of equities to be realized in the model life cycle, offsetting emergent biases.
  CalypsoAI Mitchell Sutika-Sipus PhD 640 These statements reinforce the necessity of my two statements above.
  CalypsoAI Mitchell Sutika-Sipus PhD 704 The broad framework is only one approach. See comment 13 above.
  CalypsoAI Mitchell Sutika-Sipus PhD 710 Absent from this list is a consideration of how interface design shapes human interpretation and cognition of machine learning algorithms. An algorithm may be biased, but a particular workflow may prevent the user from seeing that bias if the interface redirects the user in another direction. For example, scrolling news feeds, GIFs, and granular text-based interaction over time may force the user to engage with biased content, but the user will not see it on account of the native neurological response. UI/UX is a fundamental component of model interaction, but it has zero attention within the AI/ML community.
29 Neo4j Kara Doriani O'Shee
Leena Bengani
233-243 To our fellow citizens, leaders, and to whom it may concern:
The National Institute of Standards and Technology has requested feedback regarding its approach to managing bias in artificial intelligence (AI) in a broader effort to develop a risk management framework for trustworthy and responsible AI.

Artificial intelligence has the capacity to transform our world in ways we cannot yet imagine. Building public trust in AI requires transparency and a capacity to manage the human biases that inevitably find their way into these systems. Unlike human beings, however, AI can and should be probed and tested for bias – providing an opportunity to detect influences that previously were impossible to ascertain. While the human mind remains a black box in the truest sense of the word, the right technology provides a window into the inner workings of an automated decision.

At Neo4j, we enable organizations to unlock the value of the connections, influences, and relationships in their data. Graph technology naturally stores and analyzes data connections, making it ideal not only for a wide range of use cases but also for training machine learning (ML) models. Connected data provides the rich context necessary for complex decision-making, which is why graphs underpin the most effective AI in use today. Our understanding of connected data puts us in a position to suggest a path forward to mitigating bias in AI. We argue that the ability to interrogate our data is critical for fruitful collaboration between humans and machines.

Our expertise leads us to comment on the pre-design stage referred to in Draft NIST Special Publication 1270. Specifically, we would like to offer suggestions regarding 1) data documentation, which we consider fundamental to transparency; and 2) the use of connected data in detecting bias, a yet uncharted territory we invite the community to explore with us.

In our work across government and the private sector, we’ve learned that data challenges are widespread and no single solution addresses all of them. We are offering our ideas and suggestions in support of this initiative to develop standards around mitigating bias in AI.
  Neo4j Kara Doriani O'Shee
Leena Bengani
233-243 The Importance of Knowing Data Provenance
Understanding the provenance of our data is essential to explainability, and therefore to trustworthy AI. Data provenance involves documenting how a piece of data is sourced and the processes by which it was produced. It is impossible to mitigate bias in our data without knowing where and how it was created.

Knowledge graphs are well suited to documenting the history of data over time. That history might include who, how, when, and where the data was collected, how the data was cleaned, and who was involved. Data may be extracted from another source or purchased from third-party providers. By documenting these processes and results in a knowledge graph, we form a complete picture of our data lineage.

When it comes time to ask questions of the data, whether for suspected errors or harmful biases, we can examine any aspect of it within the graph. A knowledge graph provides a contextualized view of our data so anyone can see where it came from. Since its structure is flexible, what the graph looks like could vary greatly depending on the kind of information that needs to be tracked.
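A minimal sketch of such a lineage record, using the Python networkx library as a stand-in for a knowledge graph; the node names, properties, and relationship labels below are illustrative assumptions, not a prescribed schema.

    # Sketch: recording data lineage as a small directed graph (networkx stands in for a knowledge graph).
    import networkx as nx

    lineage = nx.DiGraph()

    # Nodes for datasets, people, and processing steps (names are invented for illustration).
    lineage.add_node("survey_raw_2021", kind="dataset", collected="2021-03")
    lineage.add_node("survey_clean_v2", kind="dataset")
    lineage.add_node("dedupe_and_impute", kind="process", tool="pandas")
    lineage.add_node("field_team_east", kind="collector", region="US-East")
    lineage.add_node("analyst_jlee", kind="person", role="data engineer")

    # Edges describe how each artifact came to be.
    lineage.add_edge("field_team_east", "survey_raw_2021", relation="COLLECTED")
    lineage.add_edge("survey_raw_2021", "dedupe_and_impute", relation="INPUT_TO")
    lineage.add_edge("analyst_jlee", "dedupe_and_impute", relation="PERFORMED")
    lineage.add_edge("dedupe_and_impute", "survey_clean_v2", relation="PRODUCED")

    # Question we can now ask: what is upstream of the training dataset?
    upstream = nx.ancestors(lineage, "survey_clean_v2")
    print("Provenance of survey_clean_v2:", sorted(upstream))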

Once we create even a simple model of data sourcing, it becomes possible to assess data quality and suitability. For example, if our data came from four researchers who all know each other, we may want to seek out a different perspective. In healthcare, AI is increasingly used to determine risk for certain conditions. For an AI technology to be useful, the dataset used to train the models should include data from all relevant populations. Far too often, our models fail because we lack an understanding of the data used to train them. Data collected from one population may cause the model to fail when it is applied to another population – whether they represent a different age, race, gender, or nationality group.

Transparent data sourcing should be our starting point to achieve trustworthy AI. Even as AI norms and regulations continue to evolve, we believe most organizations want to do right by the people they serve. We also expect that as compliance requirements catch up to the capacities of our technology, organizations will be expected to demonstrate that their AI practices are non-discriminatory – starting with the appropriateness of their datasets. Verifying that AI is behaving the way we claim requires knowing our data lineage.
  Neo4j Kara Doriani O'Shee
Leena Bengani
233-243 Investigating the Data for Bias
In addition to supporting data documentation, graphs allow us to interrogate our data for otherwise unattainable insights. Identifying bias in our data is difficult, because we may not know exactly what to look for or what it is when we do see it. But when we put raw data into a graph data structure, that data takes on a shape – a network structure to be explored.
Bias could appear as an unexpected pattern in a graph. With a normal table featuring columns and rows, our data would remain flat. Using a graph visualization tool such as Neo4j Bloom, however, we might observe unusual groupings that bear further investigation. If bias is systemic, we may see such groupings repeated throughout the dataset.

Tertiary connections may be the key to discovering bias in graph data. Our data reflects our human systems, where often what matters is not a direct connection but rather a third- or fourth-level connection. This concept is expressed in the social sciences as "correlation does not mean causation." Two variables may be correlated, but that does not necessarily imply that one caused the other. Rather, a third (or fourth or fifth!) factor may be the real reason behind the correlation. Yet statistical analysis alone fails to adequately describe behaviors within connected systems.

Our clients’ success in detecting fraud and money laundering with graph technology demonstrates its utility in exposing unusual patterns of behavior. Could the same logic apply when we look for bias in our data?

While this is still an untested concept, we believe that exploring data connected in a graph holds promise for identifying bias. Graph algorithms are helpful here, especially when we don’t already know or suspect a potential source of bias. A graph algorithm is a mathematical operation based on graph theory that analyzes the relationships between data points.

Graph algorithms have different purposes, but they all reveal hidden structures in connected data. Graph community detection is based on the idea that community structures may be a defining characteristic of complex systems. Community detection evaluates how nodes are clustered or segmented. In the researcher example from the previous section, community detection would allow us to discern that all the researchers know each other – presenting a possible source of bias in our data.
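A minimal sketch of that community-detection check, using networkx over invented "knows" relationships among data contributors; the algorithm choice and edge list are assumptions for illustration.

    # Sketch: spotting a tight cluster of data sources with community detection (invented data).
    import networkx as nx
    from networkx.algorithms.community import greedy_modularity_communities

    g = nx.Graph()
    # Researchers who contributed data, with "knows" edges (made up).
    g.add_edges_from([
        ("r1", "r2"), ("r1", "r3"), ("r2", "r3"), ("r3", "r4"), ("r1", "r4"),  # tight cluster
        ("r5", "r6"),                                                          # independent pair
    ])

    communities = greedy_modularity_communities(g)
    for i, members in enumerate(communities):
        print(f"community {i}: {sorted(members)}")
    # A single dominant community among the data contributors may signal a narrow,
    # potentially biased set of perspectives in the collected data.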

Graph centrality algorithms are used to find the most influential nodes in a graph. More influential nodes tend to score higher, giving them more weight in a predictive algorithm. For example, if several press releases made reference to 2-4 companies in a short period of time, those companies would score higher. The ML model trained on that information would tend to favor companies with similar characteristics.
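A minimal sketch of that effect using a centrality score (PageRank here, as one assumed choice) over invented press-release mentions; company names and edges are made up for illustration.

    # Sketch: a centrality score concentrating influence on frequently mentioned companies.
    import networkx as nx

    mentions = nx.DiGraph()
    # Press releases pointing at the companies they mention (all names invented).
    press_releases = {
        "pr1": ["AcmeCo", "BetaCorp"],
        "pr2": ["AcmeCo", "BetaCorp"],
        "pr3": ["AcmeCo", "GammaLLC"],
        "pr4": ["DeltaInc"],
    }
    for pr, companies in press_releases.items():
        for company in companies:
            mentions.add_edge(pr, company)

    scores = nx.pagerank(mentions)
    companies_only = {node: s for node, s in scores.items() if not node.startswith("pr")}
    for name, score in sorted(companies_only.items(), key=lambda kv: -kv[1]):
        print(f"{name}: {score:.3f}")
    # Features derived from such scores give the most frequently mentioned company (AcmeCo)
    # the most weight, nudging a downstream model toward companies with more coverage.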

Another example involving a centrality algorithm is credit risk scoring. Predictive features are employed by ML models to assess whether to grant or deny credit to a person. If zip code and county data were both used to predict risk, then we would be double-counting geography – thereby amplifying its effect.

In another variation of this example, say a person was denied credit, but not because of their income or delinquency. Maybe their zip code happened to have a high crime rate within a certain time frame, which the algorithm weighted more heavily. The only way we would discover this is by finding the hidden pattern in the graph.
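A minimal sketch of how the double-counted geography in the credit examples above could be flagged before training, using a simple pairwise correlation check on made-up risk features; the 0.9 cutoff is an arbitrary illustrative threshold.

    # Sketch: detecting near-duplicate geographic features that would double-count in a model.
    import numpy as np

    rng = np.random.default_rng(1)
    n = 1000

    # Made-up risk features: county risk is essentially zip-code risk plus noise.
    zip_risk = rng.normal(size=n)
    county_risk = zip_risk + rng.normal(scale=0.1, size=n)
    income = rng.normal(size=n)

    features = {"zip_risk": zip_risk, "county_risk": county_risk, "income": income}
    names = list(features)
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            r = np.corrcoef(features[a], features[b])[0, 1]
            if abs(r) > 0.9:  # assumed threshold for flagging redundant features
                print(f"{a} and {b} are highly correlated (r={r:.2f}); geography may be double-counted")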
  Neo4j Kara Doriani O'Shee
Leena Bengani
233-243 Conclusion
Knowing our data is key to guiding AI to make appropriate decisions. Although any new standards endeavor will be difficult, NIST is taking the necessary steps to protect against human harm through its three-stage approach: pre-design, design and development, and deployment.
We have focused on the pre-design stage, which involves documenting data lineage and examining data for bias before use. As this is an emergent and evolving area, widely applicable guidelines around transparency should be a priority in the field of AI.

Ultimately, AI quality depends upon the integrity and appropriateness of the data it has been trained upon. Graph technology is undeniably the state-of-the-art tool for leveraging context, making it a key player in present and future AI development. We welcome ideas from the community as we continue to explore the potential of graphs in detecting harmful bias – in the pursuit of greater transparency, accountability, and fairness in our AI outcomes.
31 RTI International Emily Hadley 106-107, 200 "explainability and interpretability" are listed separately while "security (resilience)" seems to imply that security and resilience have the same meaning. Security and resilience are often considered separate concepts (see Presidential Policy Directive/PPD-21 Critical Infrastructure Security and Resilience for example usage)
  RTI International Emily Hadley 219 "there many ways to categorize" seems to be missing "are"
  RTI International Emily Hadley 528-529 The sentence including the phrase "basing college admissions decisions on an individual's race" is an inaccurate interpretation of the referenced source for this sentence, which explicitly states, "None of the universities The Markup contacted for this article are currently using EAB’s risk scoring algorithms in their admission process."
  RTI International Emily Hadley 532 "enabling the research community to discover them" implies that all researchers do not know about these inequities - often these inequities are known, including by the communities experiencing these inequities, but the individuals or institutions developing AI tools may not be familiar with this research as it may have been completed by other sectors of research
  RTI International Emily Hadley 564-565 The GRADE example would be stronger if there was a peer-reviewed publication providing more detail on the biases in the model, rather than quotes in a news article that are about models generally rather than GRADE itself
  RTI International Emily Hadley 614 "university admissions" is not an accurate description as this source (79) for this example has nothing to do with admissions and instead is focused on predictive advising which happens after a student is admitted and impacts what courses or majors they are advised to take
  RTI International Emily Hadley 616 This quote is not from "admissions officials" - the article (source 48) explicitly says it is from advisers and faculty and it is inaccurate to assume these are also admissions officials since the article later says, "None of the universities The Markup contacted for this article are currently using EAB’s risk scoring algorithms in their admission process,"
  RTI International Emily Hadley 619-621 This is not a "college admissions" example. This is a "college advising" example
  RTI International Emily Hadley 500-625 The multiple cases of misuse of "college admissions" when "college advising" is the correct term are very concerning and suggest a lack of subject matter expertise (which is also a larger concern in AI generally).
  RTI International Thom Miano 102 The use of the word "the" in "the research, standards, evaluation, and data..." makes the nouns that follow it specific since "the" is a definite article. Use of this article is confusing since each of these nouns has multiple sources, not some single definitive source.
  RTI International Thom Miano 106-107 The list of technical characteristics "needed to cultivate trust in AI systems" is missing "accountability" and "appropriateness". Human systems (hopefully) have a defined logic of accountability when things go wrong. Automation of previously human systems with AI systems risks eliminating this process. As a simple example, imagine an automated metro train fails to brake appropriately and crashes, causing human fatalities. Who should be held accountable? How do factors of the situation create additional nuances to this question? Additionally, just because an AI system could be created to solve some problem doesn't necessarily mean that it should be created. Infrastructure, especially public infrastructure created through the tax dollars of its citizens, should arguably reflect some of the underlying sets of values of that society. Whether or not an AI system should be created is a value judgement based on the set of costs and expected and unexpected outcomes associated with its inception.
  RTI International Thom Miano 104-107 The enumeration of characteristics needed to cultivate trust in AI systems seems out of place. The following sentences in the paragraph then completely switch to the issue of bias specifically, with no other relevant follow-up on the other characteristics mentioned. Additionally, based on the title of this document, this document is about bias, not the other concerns related to trusting AI systems.
  RTI International Thom Miano 120-149 Formatting is not consistent
  RTI International Thom Miano 169-171 [See comment #11] "Accountability" and "appropriateness" should be added to the list of core building blocks of trustworthy AI.
  RTI International Thom Miano   While the focus of this document is bias in AI systems, it would be worthwhile for this document to point out that any type of decision making or analytics (not having to amount to an "AI system") that relies on data suffers from many of the same vulnerabilities/risks discussed in this document. A separate but related (and arguably underlying) question is how data-driven technologies in general should manage bias and mitigate risks.
  RTI International Thom Miano 437 This sentence references how well-developed guidance can assist "business units" and "data scientists". It is overly specific to point out these groups. Across industry "data scientist" has different meanings and sets of responsibilities.
  RTI International Thom Miano 452-460 This issue is visible within the U.S. government itself. See the variety of Requests for Proposal that define requirements or make requests for overly specific yet simultaneously vague descriptions of AI solutions for problems that may not warrant AI solutions in the first place.
  RTI International Thom Miano 540-541 "It is also notable that, depending on the industry or use case, AI is typically marketed as an easy solution that does not necessarily require extensive support." Two issues with this statement: (1) the conditional of "depending on..." followed up by the characterization of "typically" doesn't make sense; more importantly, (2) a strong statement like this warrants citations/references, otherwise it lacks credibility. In my experience this statement is not true.
  RTI International Thom Miano 583-585 This statement isn't clear. I can't tell what it's trying to say. For example, I don't know what "AI-based tools can skip deployment to a specific expert end user" means. There seems to be some comparison being drawn to other technologies/products but that isn't clear to me either. The statement "the intended uses for a given tool are often quickly overcome by reality" is also puzzling. Does it mean to say something like "while a technology may be designed with a specific usage in mind, end-users may identify alternative uses for it"?
32 BSA Heidi Obermeyer   BSA | The Software Alliance (BSA)(1) appreciates the opportunity to provide comments to the National Institute of Standards and Technology (NIST) on the draft document “Proposal for Identifying and Managing Bias in Artificial Intelligence” (Draft Proposal). BSA is an association of the world’s leading enterprise software companies that provide businesses in every sector of the economy with tools to operate more competitively and innovate more responsibly. As companies at the forefront of AI innovation, BSA members are acutely aware of both the incredible potential that AI has to improve the world and its unique risks.

Bias has emerged as a top concern for policymakers, developers, deployers and the public as AI tools are used more and more frequently to make consequential decisions, like approval of loan applications or predicting healthcare outcomes. BSA recently released Confronting Bias: A Framework to Build Trust in AI,(2) a first-of-its-kind framework that organizations can use to perform impact assessments to identify and mitigate risks of bias that may emerge throughout an AI system’s lifecycle. We hope that the BSA Framework will be a valuable resource as NIST works to create guidance around mitigating AI bias, as BSA sees significant overlap in the approach outlined in the BSA Framework and the Draft Proposal.

In particular, we would like to applaud several elements of the Draft Proposal that coincide with best practices laid out in our Framework. They include:
• Emphasis on a lifecycle approach. The BSA Framework identifies steps that can be taken in the design, development, and deployment stages of the AI lifecycle to mitigate the risk of bias. Bias can arise in a system at multiple points of its lifecycle and through many different channels, such as in the data used to train a model, in the formulation of the problem the system seeks to solve, or if a model is used in a scenario other than its intended purpose. Efforts to identify and mitigate the risk of bias in AI systems should therefore span throughout a system’s lifecycle. We view a lifecycle approach as a fundamental starting point from which bias should be evaluated, and strongly agree with NIST’s use of the approach in the Draft Proposal.
• A comprehensive taxonomy of bias. The inclusion of a comprehensive taxonomy of the types of bias that can emerge in an AI system further emphasizes the need for a lifecycle approach to risk management. We outline many of these sources of bias in our Framework, and agree that the inclusion of a bias taxonomy will be useful for developers and deployers seeking to better understand the many ways bias can emerge in an AI system.
1 BSA’s members include: Adobe, Atlassian, Autodesk, Bentley Systems, Box, CNC/Mastercam, DocuSign, IBM, Informatica, MathWorks, Microsoft, Okta, Oracle, PTC, Salesforce, ServiceNow, Siemens Industry Software Inc., Slack, Splunk, Trend Micro, Trimble Solutions Corporation, Twilio, Workday, and Zoom.
2 Available at https://ai.bsa.org/confronting-bias-bsas-framework-to-build-trust-in-ai
  BSA Heidi Obermeyer   Overall, the Draft Proposal includes many of the key elements that BSA has identified as effective approaches for managing and addressing the risk of bias in AI, and we are in agreement with NIST on much of what is covered in the document. We offer below a few suggestions to further clarify aspects of the Draft Proposal to better convey the nuanced nature of managing the risk of bias, particularly in cases where it may implicate multiple stakeholders. NIST should consider:
• Emphasizing that bias identification and mitigation may involve multiple stakeholders. The Draft Proposal should account for scenarios in which the deployer and developer of a system are different entities and explain how responsibility for addressing bias may vary in those cases. Many BSA members provide B2B AI services to corporate customers that may retrain and/or customize the underlying AI model using their own data.(3) The Draft Proposal should account for such scenarios and explain how risk management responsibilities may vary depending on the nature of the AI system and the role that the deploying entity may have had in customizing the underlying model.
• Clearer articulation of why bias must be treated as a risk management priority. The underlying rationale of the Draft Proposal is that the risk of AI bias must be managed because it cannot be fully eliminated. We concur. Given the fundamental importance of this principle, we would recommend providing additional detail and context to explain why a risk management approach is necessary. For instance, the Draft Proposal could include a section to explain that effectively guarding against the harms that might arise from AI bias requires a risk management approach because: (1) the concept of bias – and corresponding definitions of “fairness” – are contextual and contested; (2) efforts to mitigate bias can involve tradeoffs that need to be evaluated on a case-by-case basis; and, (3) bias can emerge post-deployment, including in circumstances where a deploying entity uses the systems in an unforeseen or unintended manner.(4)
• Recognizing the role of governance in AI risk management. The Draft Proposal should highlight the critical role that governance practices play in AI risk management. To that end, the Draft Proposal should emphasize that effective AI risk management should be underpinned by a governance framework that establishes the policies, processes, and personnel that will be used to identify, mitigate, and document risks throughout a system’s lifecycle. In addition, a governance framework should promote understanding across organizational units—including product development, compliance, marketing, sales, and senior management—about each entity’s role and responsibilities for promoting effective risk management during the design, development, and deployment of AI systems.
• Evaluation of “fairness” metrics. The concept of fairness is an inherently contested ideal that is subject to multiple potential definitions. To manage the risk of AI bias it is nonetheless critical to select appropriate metrics for evaluating whether the system is performing in an acceptable manner.(5) As Professor Arvind Narayanan famously noted, fairness can be mathematically represented using more than 21 different definitions that are impossible to satisfy simultaneously.(6) Given the critical importance that fairness metrics play in evaluating whether an AI system is performing in a manner that is unfairly biased, the Draft Proposal would benefit from the addition of a section that surveys the existing range of fairness metrics and discusses the factors stakeholders can consider in determining whether particular metrics are relevant and/or appropriate for their use case.
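A minimal sketch of why such a survey matters: two widely used metrics, demographic parity and equal opportunity, computed on the same invented predictions, can give very different readings. All numbers below are illustrative assumptions.

    # Sketch: two fairness metrics on the same invented predictions can disagree.
    import numpy as np

    group = np.array([0] * 10 + [1] * 10)                      # protected attribute
    y     = np.array([1, 1, 1, 1, 0, 0, 0, 0, 0, 0] * 2)       # true outcomes (same base rate)
    pred  = np.array([1, 1, 1, 0, 0, 0, 0, 0, 0, 0] +          # group 0 decisions
                     [1, 1, 1, 0, 1, 1, 1, 1, 0, 0])           # group 1 decisions

    def selection_rate(mask):
        return pred[mask].mean()

    def true_positive_rate(mask):
        return pred[mask & (y == 1)].mean()

    dp_gap = selection_rate(group == 1) - selection_rate(group == 0)          # demographic parity
    eo_gap = true_positive_rate(group == 1) - true_positive_rate(group == 0)  # equal opportunity

    print(f"demographic parity gap: {dp_gap:.2f}")  # 0.40 -- looks unfair by this metric
    print(f"equal opportunity gap:  {eo_gap:.2f}")  # 0.00 -- looks fair by this metric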
• Emphasizing documentation as a useful tool throughout the AI lifecycle. Documentation can play a useful role in both the identification of AI bias risks and communication about how those risks have been mitigated. The Draft Proposal would be strengthened if it highlighted the types of documentation that can serve as useful artifacts for risk management activities, such as records of data provenance, documentation of stakeholder impacts, and risk mitigations.
• Mapping the bias taxonomy to the relevant phases of the AI lifecycle. NIST should add a section under each phase of the AI lifecycle that identifies the types of bias that may occur during that phase. Such a mapping would help stakeholders better understand how bias can emerge in a system and what tangible steps can be taken to mitigate those specific risks during each phase of the AI lifecycle.
BSA appreciates the opportunity to provide feedback on the Draft Proposal and looks forward to continued collaboration with NIST on this important topic.
3 See “Spectrum of AI Development and Deployment Models” on page 18 of Confronting Bias: BSA’s Framework to Build Trust in AI, available at https://ai.bsa.org/confronting-bias-bsas-framework-to-build-trust-in-ai
4 See “Managing the Risk of Bias” on page 9 of Confronting Bias: BSA’s Framework to Build Trust in AI, available at https://ai.bsa.org/confronting-bias-bsas-framework-to-build-trust-in-ai
 
33 New America’s Open Technology Institute Spandana Singh Introduction While NIST has outlined numerous technical factors such as explainability and privacy that are integral for promoting trustworthy AI and mitigating bias, we believe transparency is also critical. As the report points out, most Americans are unaware when they are interacting with algorithmic systems. Transparency measures that are designed with users in mind can help address this issue.
  New America’s Open Technology Institute Spandana Singh   As the report outlines, some technologies are not tested extensively or at all before deployment. Rather, developers use deployment scenarios to test their technologies. This can result in harmful and concerning outcomes. We believe NIST should encourage the creation of guidelines or legislation that requires developers of high-risk algorithms to test their systems before they can be deployed. NIST should also help create clear indicators around when a system can be green-lighted for deployment and when it cannot, within the context of a risk-based framework. Such a risk-based evaluation may also be helpful during the pre-design and design phases. If a system poses too significant a risk to society and fundamental rights, it shouldn’t be deployed at all. Any efforts to promote FAT around this kind of system in the development, deployment, and post-deployment phases will be meaningless if the system is inherently high-risk. We define high-risk algorithms as systems that pose “high risks” to the fundamental rights and freedoms of citizens and society and medium-risk algorithms as systems that pose a moderate risk to the fundamental rights and freedoms of citizens and society.
  New America’s Open Technology Institute Spandana Singh   We believe that NIST should recommend that developers of AI systems, particularly medium and high-risk AI systems, should outline the intended use cases of their tools as well as cases in which the use of their systems could generate harmful or unreliable results. This is similar to the information encompassed in Model Cards.

Additionally, we suggest that NIST recommend that developers and deployers of medium and high-risk AI systems provide users with a basic, public outline of how their algorithmic system functions. Providers of high-risk AI systems that have consequential impacts (e.g. credit algorithms) should also give users the ability to appeal decisions made by the system, or if this is not scalable, at the least an understanding of what factors went into informing the decision.

We believe these approaches can help promote transparency and accountability around harms that can result from certain AI systems. In some cases, this information could also help mitigate such harms. As previously noted, however, certain systems may be too high-risk. The design and deployment of such systems should not be permitted.


Providing users with access to an appeals process also helps expand user control and agency over systems that are often responsible for making critical life decisions.
  New America’s Open Technology Institute Spandana Singh   NIST proposes addressing bias in the design, development, and deployment stages of AI systems. However, there is not as much emphasis on continuing these practices post-deployment. A pre-deployment evaluation of a system may indicate that the system is low-risk. But, AI and ML-based systems are constantly changing, learning, and adapting. Additionally, a system may be deployed in a new context. Both of these factors can change the risk potential of the system.
  New America’s Open Technology Institute Spandana Singh   As NIST points out, many companies use proxy data to inform their algorithmic systems. However, as research has indicated, proxy-based inferences are not always accurate and can result in biased and discriminatory outcomes. The proposed alternative to using proxy data is to have companies collect demographic data such as gender and race from users. However, there is little trust in algorithmic systems and companies in certain industries who are deploying them (e.g. internet platforms). There are also few safeguards to protect the collection and use of this data (e.g. the U.S. does not have comprehensive privacy legislation). As a result, we do not believe collecting sensitive demographic data is an appropriate solution, as it could create new harms and exacerbate existing harms caused by algorithmic systems. However, some researchers and civil rights groups have pushed for the collection of racial data as it could enable audits of racial discrimination to take place. It would be helpful for NIST to provide guidance on how to strike a balance between using proxy data and the collection of demographic data.
  New America’s Open Technology Institute Spandana Singh   The report discusses how multistakeholder and interdisciplinary experts can help AI developers identify and mitigate harmful outcomes. While NIST recognizes that setting these kinds of processes up requires deliberate planning and guidance, it does not consider that larger companies and deployers may have greater access to these resources than smaller ones. Because of this, smaller entities may be at a disadvantage.
  New America’s Open Technology Institute Spandana Singh   OTI has done extensive research on how internet platforms use AI and ML-based tools and on the risks posed by facial recognition systems. We have also made recommendations on how companies and governments can promote greater FAT around the use of these algorithmic systems, particularly high-risk AI. Please find links to this work below: 1) https://www.newamerica.org/oti/reports/report-series-content-shaping-mo…, 2) https://www.newamerica.org/oti/briefs/civil-rights-concerns-regarding-l… 3) https://www.newamerica.org/oti/reports/cracking-open-the-black-box/
  New America’s Open Technology Institute Spandana Singh   See PDF here
34 Japan Mirror Committee of ISO/IEC JTC1/SC42     In ISO/IEC JTC 1/SC42, a technical report, TR 24027, is under development and will be published soon. In this document, about 30 biases are listed and classified into 4 categories: human cognitive bias, data bias, machine learning model architecture bias, and bias in rule-based system design and requirements. This report also classifies biases based on the AI system life cycle. We also understand the intention is to leverage key locations within stages of the AI lifecycle for optimally identifying and managing bias.
  Japan Mirror Committee of ISO/IEC JTC1/SC42     At SC42-WG3, there is a contribution on bias terminologies, written by a Japanese expert, that lists more than 170 terminologies depicting biases.
We agree with the policy of SP 1270 to provide a broader perspective by which we can strike the problem of AI bias where it might be easiest to manage.
One of the most promising ways to establish a simple scheme for classifying biases may be to utilize a cognitive model of humans and AI, as shown in the attachment. This could be a good opportunity to create a de facto scheme for classifying the tremendous number of bias-related terminologies.
  Japan Mirror Committee of ISO/IEC JTC1/SC42   212, 900 The ISO definition of bias is not properly quoted. [67] seems to be ISO 3534-1:2006, but it does not define bias as "the degree to which a reference value deviates from the truth".
  Japan Mirror Committee of ISO/IEC JTC1/SC42   397 We understand this document does not intend to be exhaustive. The description is simple and concise, which causes some important issues to be skipped, in particular in the pre-design stage, e.g., stakeholder identification, risk/impact analysis, and documentation.
If so, an even clearer problem statement is desirable. A description of which problems are (or are not) in focus within the huge problem space of bias may help to clarify the objective of this deliverable, which will enlighten practitioners. Future issues should also be clarified.
  Japan Mirror Committee of ISO/IEC JTC1/SC42   415, Figure 1 The relationship between stakeholder groups, risk management, and standards development is not clear. They do not seem to be in the same category.
  Japan Mirror Committee of ISO/IEC JTC1/SC42   673 The intention of the loop is not explained in the text.
  Japan Mirror Committee of ISO/IEC JTC1/SC42   729 The relationship between the biases in Table 1 and the main body should be clarified. This would help clarify the relationship between each bias, the development stages it may creep into, the resulting problems, and the issues we can or should solve.
  Japan Mirror Committee of ISO/IEC JTC1/SC42   Table 1 In ISO/IEC TR 24027, which will be published soon, cognitive bias is understood not as an error, but as a bias in conscious or unconscious human behaviour.
  Japan Mirror Committee of ISO/IEC JTC1/SC42     See PDF Here
35 Surveillance Technology Oversight Project     We thank NIST for inviting comments on its draft report, "A Proposal for Identifying and Managing Bias in Artificial Intelligence." Our comments align with the authors’ three stages of managing AI bias.
To summarize:
• During the pre-design phase, NIST should categorically oppose many of the AI tools under consideration. NIST underestimates the degree to which tools’ harms can be anticipated or prevented, particularly in areas like policing, where both errors and the accurate use of AI can have devastating impacts.
• During the design and development phase, we recommend greater humility regarding the degree to which algorithmic tools can be debiased. Algorithms trained on police administrative data incorporate historical patterns of police abuse and bias. “Debiasing” may indemnify developers against liability for bias they fail to meaningfully curb.
• During the deployment stage, we recommend that NIST acknowledge the rights violations that occur when police AI is misused (regardless of whether tools are “debiased”). NIST’s authors suggest that “[i]nstead of viewing the challenge of AI bias within a given context or use case… [we] strike the problem of AI bias where it might be easiest to manage – within the design, development, and use of AI systems.” NIST’s generalized approach glosses over the unique harms AI threatens in sectors like policing and criminal justice, underemphasizing our obligation to protect the public from systems that not only can increase injustice, but threaten Americans’ lives. NIST states that “[t]he goal is not zero risk but rather, identifying, understanding, measuring, managing and reducing bias.” To the contrary, police AI is inherently incompatible with the public’s safety, liberty, and fundamental rights. We must not merely mitigate policing tools’ risk, but instead truly protect the public by banning tools that impose too great a cost on society.
  Surveillance Technology Oversight Project     I. The Pre-Design Phase: Focus on whether a tool should be built at all
Police AI exacts such a predictable toll on civil rights that these tools should have been blocked during NIST’s “pre-design” phase. And such systems would have been blocked if developers seriously evaluated their social impact. But given vendors’ eagerness to sell policing technology, NIST is dangerously dismissive of the key pre-design question: should we build a tool at all? Or rather, because police AI typically is used first in other fields, should an existing technology be imported into policing?

The question of whether to adapt algorithmic tools for law enforcement use could not be more momentous. Individuals’ freedom from wrongful imprisonment, safety, privacy, freedom of association and other fundamental rights have been jeopardized by the importation of AI systems into policing. Predictive policing algorithms have justified the continued, dangerous over-policing of BIPOC neighborhoods by police precincts with a history of racial bias.1 Facial recognition errors have already been documented as causing the wrongful arrest of several Black men, though many more individuals have likely been impacted by the technology without knowing.2 ShotSpotter errors bring armed police into Black and Latinx communities under conditions primed for deadly mistakes.3 Criminal justice algorithms routinely mete out biased recommendations for pretrial detention and imprisonment of Black and Latinx individuals.4

Any adequate pre-design phase would require developers to demonstrate that they are not replicating the same sort of deadly errors showcased by this technology to date. The tools described above could have been and should have been abandoned during development. And those of us who work in the police technology space have seen enough such tools to anticipate the civil rights violations that future tools will introduce.

Consider the devastating and foreseeable effects of the New York City Police Department’s (“NYPD’s”) use of PredPol beginning in 2013. PredPol is an algorithmic tool that claims to predict where and when crimes will occur and who will commit them.5 PredPol predictably focused police on the low-income BIPOC communities targeted by NYPD officers for years, with devastating results.6

Had PredPol’s developers considered the probable effects of the technology during the pre-design phase, the following facts would have stood out: I. Police discrimination against BIPOC communities has distorted historical policing data.7 II. Police encounters are disproportionately dangerous for these same communities. Black men are two and a half times more likely than white men to be fatally shot by police,8 and they are the victims in one of three fatal traffic stops.9
III. Users use products to “tech-wash” biased behavior.10 Developers could anticipate that officers would use PredPol to justify biased policing and avoid scrutiny.11

In short, developers could have foreseen the very reasons not to move forward with building predictive policing tools: they perpetuate racist policing, subjecting BIPOC communities to dangerous, continual police harassment and ensnaring individuals in the criminal justice system.

NIST’s authors do acknowledge the kinds of reasons that could lead to a tool’s abandonment in the pre-design phase: “It is an obvious risk to build algorithmic-based decision tools for settings already known to be discriminatory.”
“[P]re-design is often where decisions are made that can inadvertently lead to harmful impact, or be employed to extremely negative societal ends.”

But NIST plays the apologist, downplaying developers’ ability to foresee civil rights concerns: “[A]wareness of which conditions will lead to disparate impact or other negative outcomes is not always apparent in pre-design, and can be easily overlooked once in production.” Instead, NIST focuses on managing risk while moving risky projects forward: “[W]ell-developed guidance, assurance, and governance processes can assist business units and data scientists to collaboratively integrate processes that reduce bias without being cumbersome or blocking progress.” And NIST casts the decision to stop tools’ development as a rare, “extreme” measure rather than a reasonable and common solution:
“In extreme cases, with tools or apps that are fraudulent, pseudoscientific, prey on the user, or generally exaggerate claims, the goal should not be to ensure tools are bias-free, but to reject the development outright.” Contrary to NIST’s contention, pseudoscience and exaggerated claims are not “extreme cases”; they are typical for policing AI. It is not enough, having anticipated “extremely negative societal ends,” to ensure that “risk management processes… set reasonable limits related to mitigating such potential harms.” Contrary to NIST’s contention, effective limits frequently cannot “reduce bias without being cumbersome or blocking progress.” The correct response to biased and invasive tools is to simply stop their development and sale completely.
This recommendation is not extreme. Other scientific disciplines have long recognized that some advances simply come at too high a price. Consider the breakthroughs that scientists could have made in chemical and biological warfare over the past 50 years if permitted. Such agents would be potent weapons in our military arsenal, but they would pose an intolerable risk to all of humanity. Many of the AI systems under development—and indeed, many in use today—pose an intolerable cost to human rights and civil rights, both here in the United States and when exported abroad. Before we ask how to build better AI, we must ask if that AI is truly an acceptable solution for the problems we purport to solve.
1 Rashida Richardson, Jason Schultz, and Kate Crawford, “Dirty Data, Bad Predictions: How Civil Rights Violations Impact Police Data, Predictive Policing Systems, and Justice,” N.Y.U. Law Review Online 94, no. 192 (February 13, 2019), https://papers.ssrn.com/abstract=3333423.
2 Kashmir Hill, “Another Arrest, and Jail Time, Due to a Bad Facial Recognition Match,” The New York Times, December 29, 2020, sec. Technology, https://www.nytimes.com/2020/12/29/technology/facial-recognition-miside….
3 “Comments on Draft NYPD Surveillance Policies,” Center for Constitutional Rights, February 25, 2021, https://ccrjustice.org/node/9092.
4 Julia Angwin et al., “Machine Bias,” ProPublica, May 23, 2016, https://www.propublica.org/article/machine-bias-risk-assessments-in-cri….
5 Tim Lau, “Predictive Policing Explained” (Brennan Center for Justice, April 1, 2020), https://www.brennancenter.org/our-work/research-reports/predictive-poli….
6 Rashida Richardson, “Dirty Data, Bad Predictions.”
7 Josmar Trujillo, “Why NYPD’s ‘Predictive Policing’ Should Scare You,” City Limits, January 29, 2015, sec. CITY VIEWS: OPINIONS and ANALYSIS, https://citylimits.org/2015/01/29/why-nypds-predictive-policing-should-….
8 Lynne Peeples, “What the Data Say about Police Brutality and Racial Bias — and Which Reforms Might Work,” Nature 583, no. 7814 (June 19, 2020): 22–24, https://doi.org/10.1038/d41586-020-01846-z.
9 Wesley Lowery, “A Disproportionate Number of Black Victims in Fatal Traffic Stops,” Washington Post, December 24, 2015, sec. National, https://www.washingtonpost.com/national/a-disproportionate-number-of-bl….
10 John D. Lee and Katrina A. See, “Trust in Automation: Designing for Appropriate Reliance,” Human Factors 46, no. 1 (2004): 50–80, https://doi.org/10.1518/hfes.46.1.50_30392.
11 Josmar Trujillo, “NYPD’s ‘Predictive Policing’.”
 
  Surveillance Technology Oversight Project     II. The Limits of Debiasing in the Design and Development Phase
NIST states that algorithmic tool bias can be mitigated during the design and development phase:
Instead of viewing the challenge of AI bias within a given context or use case, a broader perspective can strike the problem of AI bias where it might be easiest to manage – within the design, development, and use of AI systems.
These unintentional weightings of certain factors can cause algorithmic results that exacerbate and reinforce societal inequities. The surfacing of these inequities is a kind of positive “side effect” of algorithmic modeling, enabling the research community to discover them and develop methods for managing them.

This is a dangerously idealistic approach to ending technology-aided discrimination, particularly in fields like policing. If an algorithm does not “exacerbate and reinforce societal inequities” in a lab, it may easily do so in the real world.

The difficulty with debiasing policing algorithms has to do with how the tools become biased in the first place:
Historical, training data, and measurement biases are “baked-in” to the data used in the algorithmic models underlying those types of decisions. Such biases may produce unjust outcomes for racial and ethnic minorities in areas such as criminal justice.

Policing algorithms’ “training data”—the data that models correct behavior for algorithms—is administrative data that police departments and courts collect on a day-in, day-out basis. Police records include information on reported incidents, police stops, arrests, and the charges leading to those arrests. Court records add information on convictions and acquittals, pretrial detention and bail, and other details about individuals’ passage through the criminal justice system. But those records include the results of biased, corrupt, and criminal policing.12
Here in New York City, NYPD records memorialize unconstitutional practices such as stop-and-frisk, under which 5 million individuals were stopped from 2002 to 2013 in a practice likened to a police “war with Black and Brown people.”13 In 2013, a federal court determined that stop-and-frisk violated the Fourth and Fourteenth Amendments.14 But algorithms trained on administrative data collected from 2002 to 2013 still learn how to replicate the NYPD’s racial profiling from the height of stop-and-frisk.
NYPD records are also shaped by police criminality, including falsifying records, arbitrary arrest and summons quotas, and planting evidence on innocent New Yorkers.15 In 2017 alone, NYC taxpayers paid $335 million to victims of police abuse.16 Those crimes—including many wrongful arrests—are “baked in” to whatever tools are trained on NYPD data. Even data that was supposed to have been expunged remains part of the NYPD’s records: as of at least 2018, officers still had routine access to data on dropped, declined, and dismissed arrests that should have been expunged pursuant to state law.17
If we simply focus on fixing the algorithm, as NIST suggests, the technical solution is to seek out a more balanced training dataset—one that doesn’t systematically target BIPOC communities. But there is no unbiased dataset for policing in America, just records bathed in bias, memorializing practices that no developer should seek to emulate.

Proponents of algorithmic policing tools suggest “cleaning” training data or balancing its outputs to reduce algorithms’ disparate impacts. We doubt that developers have the proper incentives to implement such strategies effectively, and doing so frequently may not even be technically possible. Rather, they will use such techniques to minimize liability and reputational risks. It would be the height of “data hubris,” to use NIST’s term, to imagine that a debiased algorithm can correct centuries of systemic discrimination.
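Purely to make concrete what the “balancing” proponents describe typically involves, the sketch below shows a minimal disparate impact check in Python; the group labels, field names, and the conventional four-fifths threshold are illustrative assumptions rather than anything drawn from this comment or the NIST draft. Even a model that passes such a check has only been shown to be consistent with the historical records it was scored against, which is the very data this comment argues cannot be trusted.

```python
# Minimal sketch of the kind of disparate impact check run when "balancing
# outputs." Group labels, field names, and the four-fifths (0.8) rule of thumb
# are illustrative assumptions, not part of the NIST draft or this comment.
from collections import defaultdict

def selection_rates(records, group_key="group", flagged_key="flagged"):
    """Fraction of records flagged as high risk, per group."""
    totals, flagged = defaultdict(int), defaultdict(int)
    for r in records:
        totals[r[group_key]] += 1
        flagged[r[group_key]] += int(r[flagged_key])
    return {g: flagged[g] / totals[g] for g in totals}

def disparate_impact_ratio(records, group_a, group_b):
    """Ratio of group A's flag rate to group B's; values far from 1.0 signal disparity."""
    rates = selection_rates(records)
    return rates[group_a] / rates[group_b]

# Hypothetical model outputs scored against historical arrest records.
predictions = [
    {"group": "A", "flagged": 1}, {"group": "A", "flagged": 1}, {"group": "A", "flagged": 0},
    {"group": "B", "flagged": 1}, {"group": "B", "flagged": 0}, {"group": "B", "flagged": 0},
]
print(f"disparate impact ratio: {disparate_impact_ratio(predictions, 'A', 'B'):.2f}")
```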
12 Rashida Richardson, “Dirty Data, Bad Predictions.”
13 Ashley Southall and Michael Gold, “Why ‘Stop-and-Frisk’ Inflamed Black and Hispanic Neighborhoods,” The New York Times, November 17, 2019, sec. New York, https://www.nytimes.com/2019/11/17/nyregion/bloomberg-stop-and-frisk-ne….
14 Joseph Goldstein, “Judge Rejects New York’s Stop-and-Frisk Policy,” The New York Times, August 12, 2013, sec. New York, https://www.nytimes.com/2013/08/13/nyregion/stop-and-frisk-practice-vio….
15 Rashida Richardson, “Dirty Data, Bad Predictions.”
16 Jake Offenhartz, “Lawsuits Against NYPD Cost Taxpayers $230 Million Last Year,” Gothamist, April 17, 2019, https://gothamist.com.
17 Eli Hager, “Your Arrest Was Dismissed. But It’s Still In A Police Database.,” The Marshall Project, July 18, 2019, https://www.themarshallproject.org/2019/07/18/your-arrest-was-dismissed-but-it-s-still-in-a-police-database.
  Surveillance Technology Oversight Project     III. Bias and Misuse in the Deployment Phase
In its discussion of the deployment phase, NIST anticipates “off-road” uses of algorithms—unplanned uses where “the tool is used in unforeseen ways.” This bland description does not capture the gravity of police technology abuses. As seen with the misuse of ShotSpotter and facial recognition technology, even if developers fixed the technical drivers of algorithmic bias, those fixes would not address the true scale of police AI’s threat to the public.

NYPD’s Biased Placement of ShotSpotter Units
ShotSpotter is a for-profit corporation that markets systems that use audio surveillance and algorithmic software to purportedly locate gunshots. ShotSpotter boasts high accuracy levels in laboratory conditions,18 though its real-world performance is wanting.19 ShotSpotter’s algorithm may not be biased, but its placement, overwhelmingly in low-income BIPOC communities, is. As shown below, the yellow areas on the left where ShotSpotter is deployed in New York City largely mirror the red neighborhoods on the right with the highest levels of poverty.

Since ShotSpotter is highly error-prone, with one study finding that 89% of reports were false,20 ShotSpotter’s biased placement makes BIPOC communities bear the constant cost of armed officers rushing to the scene of shootings that never happened.21 In one tragic example earlier this year, police responded to a ShotSpotter report of gunshots.22 Five minutes later, they shot and killed Adam Toledo, a 13-year-old who was holding his empty hands up when he died.23 ShotSpotter’s bias can’t be appreciated in a lab, but biased ShotSpotter deployment is already endangering real-world neighborhoods in Chicago, New York, and countless other cities.

NYPD Abuse of Facial Recognition
The NYPD uses at least two facial recognition (“FR”) vendors: DataWorks Plus, its main vendor, and Clearview AI, which NYPD has used on an extended trial basis. The tools’ performance in the lab is unknown. DataWorks does not conduct accuracy and bias testing, according to one of its own managers.24 Clearview AI generally does not submit its system for outside testing and its accuracy and bias are not publicly known. Clearview’s tool displays a disturbing lack of respect for privacy: it identifies unknown individuals by comparing their photos to “3 billion photos scraped from the web,”25 forcing individuals in those billions of photos to stand in a “perpetual line-up.”26
But the most disturbing thing about NYPD’s FR tools is not how they perform in the lab—it is how the department uses them. NYPD officers are free to misuse and abuse the tools, exercising “artistic license” with photos to improve their chances of finding a supposed match, with an unmeasurable impact on accuracy and bias. According to a report on NYPD FR practices from Georgetown’s Center on Privacy and Technology, photo “edits often go well beyond minor lighting adjustments and color correction, and often amount to fabricating completely new identity points not present in the original photo.”27 NYPD has replaced features and expressions in street-camera photos with features from mugshots.28 It has used “3D modeling software to complete partial faces” and to “rotate faces that are turned away from the camera.”29 By scanning altered photographs, officers destroy what little credibility FR has as a reliable source of identification.30 Police practices routinely transform the technology into the very sort of pseudoscience that NIST dismissed as “extreme cases.”

Even worse, the NYPD primarily compares probe images against a gallery of historical mugshots, skewing the risk of false “matches” to disproportionately impact the BIPOC communities who have long faced higher arrest rates. Like the use of NYPD data to train PredPol, this practice can compound the impact of biased police practices even years after they take place.31

Lastly, when using FR, NYPD officers manually select the winning “match” from a list of hundreds of possible results. This adds yet another layer of bias—and potential for outright abuse—in the facial recognition decision process. NIST suggests that when a gap exists between an algorithm’s intended use and its actual use, the solution may be “deployment monitoring and auditing” followed by adjustments to the algorithmic model. But there is no algorithmic fix that will correct officers’ bias or misuse of AI, particularly facial recognition.

18 “ShotSpotter Respond Q&A,” ShotSpotter, December 2020, https://www.shotspotter.com/wp-content/uploads/2020/12/ShotSpotter-Resp….
19 See, for example, “End Police Surveillance,” Roderick & Solange MacArthur Justice Center, 2021, https://endpolicesurveillance.com/.
20 “End Police Surveillance.”
21 “End Police Surveillance: The Burden on Communities of Color,” Roderick & Solange MacArthur Justice Center, 2021, https://endpolicesurveillance.com/.
22 Timothy R. Homan, “Police Technology under Scrutiny Following Chicago Shooting,” Text, TheHill, April 21, 2021, https://thehill.com/homenews/state-watch/549612-police-technology-under….
23 Christoph Koettl and Evan Hill, “How an Officer Killed Adam Toledo: Video Investigation,” The New York Times, April 14, 2021, https://www.nytimes.com/2021/04/16/us/adam-toledo-video-investigation.h….
24 Kashmir Hill, “Wrongfully Accused by an Algorithm,” The New York Times, June 24, 2020, sec. Technology, https://www.nytimes.com/2020/06/24/technology/facial-recognition-arrest….
25 Tate Ryan-Mosley, “The NYPD Used Clearview’s Controversial Facial Recognition Tool. Here’s What You Need to Know,” MIT Technology Review, April 9, 2021, https://www.technologyreview.com/2021/04/09/1022240/clearview-ai-nypd-e… .
26 Clare Garvie, Alvaro Bedoya, and Jonathan Frankle, “The Perpetual Line-Up” (Center on Privacy & Technology at Georgetown Law, October 16, 2016), https://www.perpetuallineup.org/.
27 Clare Garvie, “Garbage In, Garbage Out: Face Recognition on Flawed Data,” Georgetown Law Center on Privacy and Technology, May 16, 2019, https://www.flawedfacedata.com.
28 Garvie, “Garbage In, Garbage Out.”
29 Garvie, “Garbage In, Garbage Out.”
30 Garvie, “Garbage In, Garbage Out.”
31 Mariko Hirose, “Privacy in Public Spaces: The Reasonable Expectation of Privacy against the Dragnet Use of Facial Recognition Technology,” Connecticut Law Review 49, no. 5 (September 2017): 1591–1620.
  Surveillance Technology Oversight Project     IV. Conclusion
As NIST finalizes its report, we recommend revisions that acknowledge the civil rights violations that AI policing tools enable. In the pre-design phase, developers must abandon tools that risk acute harms and high rates of bias. In the design and development phase, developers must acknowledge the limits of technical debiasing, especially for policing tools. And in the deployment phase, supposedly “debiased” tools must not be permitted to be used in biased ways. More broadly, NIST must look beyond technical fixes to algorithms. Only by addressing human behavior and systemic bias can we address the racism and injustice AI enables and augments.
36 Hitachi Group Companies Hicham Abdessamad   Responses to NIST Draft Proposal
The 2019 NIST plan U.S. Leadership in AI: A Plan for Federal Engagement in Developing Technical Standards and Related Tools correctly notes that the mitigation of harmful bias is necessary to create a trustworthy AI standard. In its various filings in response to NIST draft proposals, Hitachi has consistently noted that bias in datasets is of particular concern. Bias can hurt the development and deployment of an AI system when it creates a disparate impact on one segment of the population or exacerbates societal biases. We continue to applaud NIST for its collaborative work with industry to create the foundation needed to grow AI as a technology and tool for innovation and economic prosperity.

At the outset, it is important to remember that AI is based in science. Rigorous data science often follows the scientific method. This can include observations, well-structured questions, research on prior work, clear hypotheses, testing and experimentation, analysis, and thoughtful discussion of conclusions, including limits to the methods, the data, and useful application in the real world. Bias, as defined in the Glossary section, can appear at every step. The Draft contrasts statistical bias (foundationally related to quantitative methods, observable data, and systematic error in parameter estimates) with cognitive bias (more broadly associated with qualitative methods and systematic error in our thinking or beliefs). Cognitive bias is particularly challenging because our ability to conceive of it is often language-constrained and is constantly evolving over time.

NIST’s recognition that bias can be harmful or productive is appreciated. There are instances where bias is useful in ensuring that an AI system produces the correct outcome. The medical field is a clear example: race and gender may be required attributes that warrant inclusion in an AI system, because patients of different races or genders can react differently to the same medical treatment. This is an example where eliminating bias could create worse outcomes for a patient, namely incorrect medical treatment.

NIST should attempt to create more clarity on bias by providing complete definitions and evaluation methods for different types of bias: harmful bias, helpful/acceptable bias, preferred bias, and neutral bias. These definitions and categorizations should specify the harm NIST is seeking to prevent or stop, and describe how to apply the evaluation to each AI application. Creating these classifications and definitions establishes standard terminology that industry can use as it creates and evaluates AI systems. It is important to include context in the specifications, since a bias that is harmful in one situation can be helpful in another, as shown by the medical example.

As NIST notes, the context of an AI system is key, as the medical example demonstrates. This Special Publication can further the understanding of bias through its definitions and offer examples so that industry understands how NIST considers bias in context.

NIST recognizes the potential use of AI to “create conditions that reduce (or eliminate) biased human decisions making and bringing about a more equitable society.” Here it is suggested that the use of counterfactual outcomes is necessary to address bias in a system. We agree with this and appreciate its inclusion, along with the identification of AI making life easier, as tenets of tool deployment. We note, however, that the data for this evaluation is often not actually available, since the alternative outcome did not happen: when an outcome did not take place, there is no data from which to interpret that alternative outcome, only the observable result that did happen. The proposed redress is purposely introducing cognitive bias to remove our perceived level of statistical bias. The assumptions include “we are smarter now” and, at least, that our methods are explicit (open for observation and debate).
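As an illustration of why the counterfactual data is unavailable, consider the following minimal Python sketch (the field names and loan-style example are assumptions made purely for illustration): once a decision is made, only the outcome of the action actually taken is ever recorded, so the alternative outcome has no ground truth against which it can be evaluated.

```python
# Sketch of the "missing counterfactual" problem. Field names and values are
# hypothetical; the point is that outcomes exist only for the action taken.
decisions = [
    {"applicant": 1, "approved": True,  "repaid": True},   # observed outcome
    {"applicant": 2, "approved": True,  "repaid": False},  # observed outcome
    {"applicant": 3, "approved": False, "repaid": None},   # counterfactual never observed
    {"applicant": 4, "approved": False, "repaid": None},   # counterfactual never observed
]

observed = [d for d in decisions if d["repaid"] is not None]
missing = [d for d in decisions if d["repaid"] is None]

# Any "what would have happened if applicants 3 and 4 had been approved?"
# analysis has no recorded label to check against; it must rely on assumptions
# or models, which is where the cognitive bias described above can enter.
print(f"{len(observed)} observed outcomes, {len(missing)} unobservable counterfactuals")
```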

While trying to account for conditions that eliminate cognitive bias, it is possible that designer bias is introduced into the system, and there is no way to test whether the inserted bias is accurate. NIST uses hiring, healthcare, and the criminal justice system to discuss bias, but each of these areas can be highly subjective and influenced by societal values that may change over time. When this type of bias is introduced into a system to reduce or eliminate biased human decisions for a more equitable society, it would be helpful for the resulting AI system to transparently alert the end user that bias has been inserted for this reason and what that bias is. We would ask for examples from NIST in this area to further understand NIST’s views.
  Hitachi Group Companies Hicham Abdessamad   In reviewing this Special Publication, we would offer that there could be an additional, fourth step in the identification and mitigation of bias. Machine learning models are rarely “set it and forget it.” The fourth step would be the operation of the AI system, which must include a feedback loop. The feedback loop would help identify bias that may not have been seen in the previous steps. The operation stage would also address system drift, offering continued validation that, as the model learns and changes, it is not incorporating new bias that creates unwanted outcomes. A quality control process in this operation step could catch the emergence of bias before the system drifts too far. The Special Publication does discuss a testing and auditing practice, and this is a critical step in any scientific research application to identify bias or document limitations of an AI system, which improves the product. By specifically recognizing this feedback loop as a step, NIST can increase the transparency of AI systems and provide the opportunity to describe the bias observed, the category assigned to it, and its impact on the AI system, as well as a basis for evaluating subsequent iterations against the correct version of the system.
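A minimal Python sketch of the kind of operation-stage quality gate described above appears below; the metric, the thresholds, and the field names are illustrative assumptions, not something specified by Hitachi or by the NIST draft.

```python
# Sketch of an operation-stage feedback loop: recompute a fairness metric on
# fresh outcomes each period and flag the model for human review if it drifts
# past a tolerance. Metric choice and thresholds are illustrative assumptions.

def selection_rate(outcomes, group):
    rows = [o for o in outcomes if o["group"] == group]
    return sum(o["selected"] for o in rows) / len(rows)

def rate_gap(outcomes, group_a="A", group_b="B"):
    return abs(selection_rate(outcomes, group_a) - selection_rate(outcomes, group_b))

def operation_check(baseline_gap, new_outcomes, tolerance=0.05):
    """Return an action for this period's feedback-loop review."""
    gap = rate_gap(new_outcomes)
    if gap > baseline_gap + tolerance:
        return "flag_for_review"      # possible emergent bias / drift
    return "continue_monitoring"

# One period of hypothetical post-deployment outcomes.
period = [
    {"group": "A", "selected": 1}, {"group": "A", "selected": 0},
    {"group": "B", "selected": 0}, {"group": "B", "selected": 0},
]
print(operation_check(baseline_gap=0.10, new_outcomes=period))
```

In practice the metric and tolerance would be chosen per application, and a flagged period would route to the human review that the feedback loop is meant to enable.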

Increasing transparency and creating public trust in AI systems is a key component of trustworthy AI. The operations phase that allows monitoring and quality control could include public participation to call out identified, or potentially perceived, bias. Public participation could demonstrate to the public that some applications of a specific AI system are not appropriate in some instances. This would essentially create a “nutritional label” for the AI system so the public can make independent determinations about the benefit or use of the AI system. Ongoing evaluation with those not engaged in the development of the AI system also helps mitigate mission creep/bleed, where the system may go beyond what it is intended to measure.
A useful tool, understanding that bias may exist in some form, would be documentation, like a nutrition label, that classifies or quantifies the bias, describes actions taken to eliminate or limit the bias’s impact on the algorithm or system, describes any bias that has been inserted into the system to “bring about a more equitable society,” and describes the degree to which bias was removed, classifying that removed, observed bias.
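One way such a label could be made machine-readable is sketched below; the field names simply mirror the documentation items listed in the paragraph above, and the structure itself is an illustrative assumption rather than a NIST or Hitachi specification.

```python
# Sketch of a machine-readable "nutrition label" for an AI system, with fields
# mirroring the documentation items described above. Field names and example
# values are illustrative assumptions.
from dataclasses import dataclass
from typing import List

@dataclass
class BiasNutritionLabel:
    system_name: str
    biases_observed: List[str]            # classified / quantified biases
    mitigation_actions: List[str]         # actions taken to limit their impact
    inserted_biases: List[str]            # biases deliberately added, and why
    residual_bias_notes: str              # degree to which bias was removed
    version: str = "1.0"

label = BiasNutritionLabel(
    system_name="example-screening-model",
    biases_observed=["lower selection rate for group B (gap 0.07)"],
    mitigation_actions=["re-balanced training sample", "quarterly threshold review"],
    inserted_biases=["none"],
    residual_bias_notes="gap reduced from 0.12 to 0.07; monitored in operation stage",
)
print(label)
```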

NIST can further help industry by giving guidance on evaluating the trade-offs of inserting bias to make a system align with NIST’s evaluation of an equitable society. For example, if one model fixes gender bias but increases racial bias, while another accounts for racial bias but increases gender bias, industry would need to understand whether the first outcome is preferable to the second when both biases cannot be equalized. Again, this can help provide a more complete picture of the AI system and inform end users so they make appropriate use of the technology.

While much focus is on algorithms and datasets when it comes to bias, this focus is too narrow. It is best to consider not only these elements but how the entire AI system functions in order to understand the effects of bias in a system. AI systems are not accurate in every circumstance, something that may not be discovered if one focuses only on a dataset or an algorithm. A human element is necessary to determine when a recommendation from a system should be accepted or rejected. NIST’s efforts are important in helping define the rules to reduce bias, but we encourage NIST to expand the consideration of bias to include datasets, algorithms, and the outcomes of the system.

Finally, NIST speaks in this Special Publication about criminal justice systems, healthcare, and credit ratings. While these areas are frequently cited as areas where bias exists, focusing too heavily on them could neglect the safety of AI systems in industrial contexts. AI systems should first and foremost create, enhance, or otherwise ensure a safe environment. This is especially true for industrial AI systems and cobot situations. In industrial settings, the key is to make sure the AI system or cobot is enhancing safety, and a system with bias could compromise that. For instance, a cobot that cannot recognize a nearby human, because it cannot detect certain racial, ethnic, gender, or physical attributes, and thus does not adjust to the person’s presence, could create a safety hazard. If the system is not trained to recognize humans with physical disabilities, such as wheelchair users, or humans of various weights or heights, the machine interacting with a human might misjudge the situation and fail to avoid a safety incident. NIST could either include a section on how bias identification affects safety in industrial settings, or consider a physical safety standard for AI systems in industrial settings. There may also be a need to investigate potential mental health safety standards.
  Hitachi Group Companies Hicham Abdessamad   Conclusion
Hitachi appreciates NIST’s vigorous effort to implement the February 11, 2019 Executive Order (EO 13859) on securing the country’s leadership in AI. This Draft Special Publication is a helpful step in that implementation, furthering the U.S. advancement in AI and working with industry to set standards for future innovation in this area. We look forward to our continued collaboration to assist the federal government as it works to develop internationally agreed-upon, consensus-based standards that promote trustworthiness and widespread AI adoption.
39 National Fair Housing Alliance Michael Akinwumi 197-200 The technical characteristics needed to cultivate trust in AI systems should include fairness, responsibility, and auditability.

These principles are core to advancing the ethics of AI and machine learning, and they should be promoted as features required to cultivate trust in AI systems.

Please check:
(1) Müller, Vincent C., "Ethics of Artificial Intelligence and Robotics", The Stanford Encyclopedia of Philosophy (Summer 2021 Edition), Edward N. Zalta (ed.), URL = <https://plato.stanford.edu/archives/sum2021/entries/ethics-ai/>.

(2)
https://www.fatml.org/resources/principles-for-accountable-algorithms
  National Fair Housing Alliance Michael Akinwumi 219 Authors wrote that "not all types of bias are negative" without an example or a clarification of when a bias does not become adversarial to consumers.
  National Fair Housing Alliance Michael Akinwumi 220 The report should focus on both harmful societal outcomes and harmful outcomes for individuals. The scale (either society or a small group of individuals such as communities of color) should not dictate what the focus should be when trust, bias or discrimination of AI systems are at issue.
  National Fair Housing Alliance Michael Akinwumi 335-342 Every AI or machine learning model has a lifespan. An AI system that claims to be responsible should have a mechanism that monitors the usefulness of an AI solution post-deployment.
  National Fair Housing Alliance Michael Akinwumi 344 Authors mentioned "in-place and in-development AI technologies and systems". It is not clear what the qualifiers mean.
  National Fair Housing Alliance Michael Akinwumi 415 Figure 1 suggests that the cycle of AI system management ends at deployment. However, deploying an AI solution generally leads to changes in the underlying data (the data that the AI system acts on post-deployment may differ in scope and patterns from the data used to develop the system). Hence, a component that monitors the behavior and performance of an AI solution post-deployment should be included in the proposed approach for managing AI bias.
  National Fair Housing Alliance Michael Akinwumi 424 Data understanding and decisions about which model or set of models to train usually happen in the pre-design stage. These concepts are currently missing from the section.
  National Fair Housing Alliance Michael Akinwumi 510 The design and development stage is extremely light on the role that algorithms or models play in driving bias and discrimination downstream in the AI solution pipeline.
  National Fair Housing Alliance Michael Akinwumi 523 Authors recommend "... taking context into consideration ..." under Design and Development Stage. However, contexts are usually set at the problem formulation stage of the pre-design phase. In addition, the referenced context only refers to data context.
  National Fair Housing Alliance Michael Akinwumi 562 Similar to other sections of the report, the real-world example highlights proxy issues, though it seems the intent is to call attention to how decisions made at the model development stage may lead to bias and discrimination downstream.
  National Fair Housing Alliance Michael Akinwumi 583 In "Since many AI-based tools can skip deployment to a specified expert end user", it is not clear if authors meant users are the ones who deploy AI or that the deployment is often outsourced.
  National Fair Housing Alliance Michael Akinwumi 636-640 Arguments that compare pre-deployment and post-deployment model performances fit well under model monitoring.
  National Fair Housing Alliance Michael Akinwumi   The report is framed as an approach that could be used for identifying and mitigating bias in AI solutions. As is, it lacks actionable steps or guidance for mitigating or identifying any bias in AI.
40 Chamber of Commerce Chamber Technology Engagement Center Michael Richards   To Whom It May Concern:
The U.S. Chamber of Commerce’s Technology Engagement Center (“C_TEC”) appreciates the opportunity to submit feedback to the National Institute of Standards and Technology’s (“NIST”) draft special publication on “A Proposal for Identifying and Managing Bias within Artificial Intelligence” (“publication”). C_TEC appreciates NIST’s ongoing efforts to advance understanding of AI and convene diverse sets of stakeholders to “advance methods to understand and reduce harmful forms of AI bias.”

The draft publication addresses the importance of managing bias in the development of AI systems and proposes approaches to assessing and mitigating such bias. C_TEC believes that reducing and mitigating unwanted bias is extremely important in building trustworthy AI. A recent C_TEC report1 surveying top industry officials indicated that 68% believe that “bias” significantly impacts consumers’ trust in AI.

In addition, we are encouraged by NIST’s efforts to develop a voluntary Risk Management Framework (RMF) for AI. We encourage NIST to use those upcoming workshops for the RMF to further engage with stakeholders to review how specific frameworks may impact particular industries. Additionally, these workshops can discuss current rules and regulations that are already in place to determine if further guidance is needed. We also encourage NIST to revisit this draft publication as the framework is updated to ensure alignment and updates where appropriate.

Furthermore, C_TEC commends NIST’s understanding that “Bias is neither new nor unique to AI” and that bias within algorithms can be both positive and negative. We also commend NIST’s acknowledgment that the “goal is not zero risk but, rather, identifying, understanding, measuring, managing and reducing bias.” However, C_TEC believes that some areas within the draft publication could benefit from further discussion and clarification. These are outlined below.
1 https://www2.deloitte.com/content/dam/Deloitte/us/Documents/technology/…-ai-full-report-new.pdf
  Chamber of Commerce Chamber Technology Engagement Center Michael Richards   First, the draft publication uses the International Organization for Standardization’s (“ISO”) definition of “bias”: “the degree to which a reference value deviates from the truth.” While C_TEC is supportive of using standard definitions to help ensure common understanding, we do have concerns with ISO’s definition and the term “truth” being used. We believe that truth can be subjective, as what is determined to be true one day can be deemed inaccurate the next and vice versa. For this reason, we believe that any definition of bias should avoid using the term.
  Chamber of Commerce Chamber Technology Engagement Center Michael Richards   Second, the draft publication outlines a three-stage approach of pre-design, design and development, and deployment in reviewing and mitigating bias. A similar process has been defined as “pre-processing, in-processing, and post-processing.” We are concerned that there are many terms for this approach being used within the current literature. We therefore encourage NIST to use established terms when possible and that regulatory agencies be consistent in their vocabulary. This will help reduce confusion or misinterpretation as stakeholders come from a wide array of industries.
  Chamber of Commerce Chamber Technology Engagement Center Michael Richards   Third, C_TEC understands that the draft publication was not looking to “identify and tackle specific bias within cases.” However, we believe that NIST should look to provide real-world examples and use cases to help stakeholders, creators, and industries further understand how such mitigation techniques may be applicable to them. Providing real-world examples and use cases would likely improve stakeholder input, as it would better allow for comments on how such processes and methods are transferable to their work.
  Chamber of Commerce Chamber Technology Engagement Center Michael Richards   Fourth, NIST comments that “Another cause for distrust may be due to an entire class of untested and/or unreliable algorithms deployed in decision-based settings. Often a technology is not tested – or not tested extensively – before deployment, and instead deployment may be used as testing for the technology.” While we understand NIST’s concerns with untested systems, we do believe it is important for NIST to take the context of the deployment into consideration in a manner that does not incur ambiguity or unnecessary concern. The example within the draft publication was of AI “systems during the COVID pandemic that has turned out to be methodologically flawed and biased.” C_TEC would like to highlight that there are times in which prior testing may not be feasible (e.g., emergency circumstances, or needed training datasets that have yet to be created or developed).

In cases like these, NIST should develop a framework or guidelines to help organizations build mechanisms that would help enable continual improvement and continuous identification and mitigation of issues.

We further note that bias is not necessarily a risk in all use cases, as some AI systems may be operating at such low stakes that bias may need to be reviewed and addressed differently. In these instances, a framework that accounts for the full range of bias impacts may not be appropriate or necessary and could impose undue implementation costs. For these reasons, C_TEC is highly supportive of adopting a risk-based approach to AI governance, as outlined within our AI principles2.
2 https://americaninnovators.com/news/u-s-chamber-releases-artificial-intelligence-principles/
  Chamber of Commerce Chamber Technology Engagement Center Michael Richards   Fifth, the draft publication recommends new “standards and guides are needed for terminology, measurement, and evaluation of bias.” C_TEC believes it is important to highlight that many different stakeholders have previously worked closely with their regulators to develop processes for managing and mitigating potentially harmful AI bias. This is why we believe it is vital that NIST work closely with those agencies in an effort to be consistent with other policies and procedures already in place.

For example, the financial services industry is already heavily regulated and must adhere to fair lending standards, which look to reduce unwanted bias. In other cases, much more work is still needed to develop the appropriate standards and best practices around identifying, measuring, and evaluating bias, and for understanding what mitigation strategies would be considered reasonable. NIST should try to harmonize its framework with existing standards and work with stakeholders to develop the appropriate guidance.
  Chamber of Commerce Chamber Technology Engagement Center Michael Richards   Sixth, bias testing in some instances requires data on protected legal classes. Much of this data is not available, and/or stakeholders do not store such data. For this reason, we would encourage NIST to provide further guidance about what it would consider reasonable expectations for those stakeholders.
  Chamber of Commerce Chamber Technology Engagement Center Michael Richards   Seventh, the draft publication concludes that “bias reduction techniques are needed that are flexible and can be applied across contexts, regardless of industry.” C_TEC supports a flexible, non-prescriptive, and performance-based AI framework that can adapt to rapid changes and updates. However, we believe NIST should further clarify what types of “bias reduction techniques,” or reasonable efforts toward managing and reducing the impacts of harmful biases, it is referring to; specifically, whether they apply to the AI algorithms themselves or to the processes that change an outcome to meet a definition of fairness.
  Chamber of Commerce Chamber Technology Engagement Center Michael Richards   Eighth, the draft publication correctly highlights the importance of problem formulation in addressing potential bias; however, guidance on best practices and how to operationalize those practices to address this concern remains underdeveloped. We would encourage NIST to elaborate on this topic in future work to facilitate a constructive conversation around the feasibility of scaling such practices.
  Chamber of Commerce Chamber Technology Engagement Center Michael Richards   Ninth, we encourage NIST to include in its report a clear plan for how its work on measuring and mitigating AI bias can translate into harmonized standards across federal agencies. The proliferation across the federal government of definitions of fairness and bias, and differing expectations on what constitutes reasonable efforts to mitigate bias, creates a real challenge for entities seeking guidance on how to develop and measure their AI systems. NIST can play an important role in encouraging the government to adopt a harmonized approach to AI and bias, as it did in driving the adoption of cybersecurity standards across the federal government.
  Chamber of Commerce Chamber Technology Engagement Center Michael Richards   Tenth, NIST has not taken the decommissioning phase into account within the “AI Lifecycle.” There are instances when systems will be retired and new systems will be deployed, which may impact different stakeholders differently. We ask NIST to provide further clarification on how to address and mitigate such impacts during these transitions.

In conclusion, NIST has a critical role in convening stakeholders to discuss ways to mitigate bias within AI systems. C_TEC continues to support NIST’s efforts on this topic and again appreciates the opportunity to submit comments on this draft publication. We look forward to collaborating with NIST on the next steps for this draft publication and future AI-related activities.
42

AI Blindspot

Americans for Financial Reform Education Fund

California Reinvestment Coalition

Center for Community Progress

Center for Responsible Lending

Consumer Action

Consumer Federation of America

Fair Housing Advocates Association

Fair Housing Advocates of Northern California

Fair Housing Center of Central Indiana

Fair Housing Center of Northern Alabama

Fair Housing Center of Southwest Michigan

Fair Housing Center of West Michigan

Fair Housing Council of Greater San Antonio

Integrated Community Solutions, Inc.

The Leadership Conference on Civil and Human Rights

Long Island Housing Services, Inc.

Louisiana Fair Housing Action Center

Miami Valley Fair Housing Center, Inc.

MICAH - Metropolitan Interfaith Council on Affordable Housing

Mountain State Justice

NAACP Legal Defense and Educational Fund, Inc. (LDF)

National CAPACD

National Coalition For The Homeless

National Community Reinvestment Coalition

National Consumer Law Center (on behalf of its low-income clients)

National Fair Housing Alliance

North Texas Fair Housing Center

NYU Center on Race, Inequality, and the Law

SolasAI

South Suburban Housing Center

Southern Poverty Law Center Action Fund

Woodstock Institute

Maureen Yap   See PDF Here
43 Center on Privacy & Technology at Georgetown Law Cynthia Khoo   See PDF Here
44 PASCO Coalition     See PDF Here
45 Center for Cybersecurity Standards (CCSS) at NSA    219 Not all types of bias are negative, and there many ways to categorize or manage bias; this report… Recommend updating ‘there many ways’ to ‘there are many ways.’
  Center for Cybersecurity Standards (CCSS) at NSA    286 candidates from certain neighborhoods). Recommend considering standards to prevent this type of discrimination.
  Center for Cybersecurity Standards (CCSS) at NSA    295-296 questionnaires are from a specific sampling of the kinds of people who are online, and therefore leaves out many other groups. Data representing certain societal groups may be excluded in the… (Recommend noting that there are varied, diverse groups of people online and that questionnaires need to be posted so as to ensure responses received from diverse populations are included.)
  Center for Cybersecurity Standards (CCSS) at NSA    303 been established to prohibit discrimination based on grounds such as gender, age, and religion.) Recommend updating to gender, age, race, disability and religion.)
  Center for Cybersecurity Standards (CCSS) at NSA    593 Recommend standards to prevent this type of discrimination
  Center for Cybersecurity Standards (CCSS) at NSA    general Bias has been a headliner for AI/ML for the past few years and is widely known. When it comes to trust in AI, in more complex applications, bias probably needs to be taken into consideration somehow.

My last point on the paper is about distrust. Society, and especially AI/ML professionals, are well aware of drift. That is, the model is trained on some data, in some cases time-series data. Then the data changes, because everything changes: society, constructs, behaviors, external influences. As a result, the accuracy of the AI/ML system decays over time, and this is hard to measure and understand. However, it breeds mistrust in all AI/ML. Would I trust a model trained on data from 1999? Probably not, because of how much everything has changed since then.
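The decay described here can at least be surfaced with a simple distribution comparison. The following Python sketch uses the population stability index; the bin edges and the 0.2 alert threshold are common rules of thumb, used here purely as illustrative assumptions.

```python
# Sketch of a simple drift check: compare the distribution of a feature at
# serving time against its distribution at training time using the population
# stability index (PSI). Bin edges and the 0.2 alert threshold are rules of
# thumb used as illustrative assumptions.
import math

def histogram(values, edges):
    counts = [0] * (len(edges) - 1)
    for v in values:
        for i in range(len(edges) - 1):
            if edges[i] <= v < edges[i + 1] or (i == len(edges) - 2 and v == edges[-1]):
                counts[i] += 1
                break
    total = sum(counts) or 1
    return [c / total for c in counts]

def psi(train_values, live_values, edges, eps=1e-6):
    p_ref = histogram(train_values, edges)
    p_cur = histogram(live_values, edges)
    return sum((c - r) * math.log((c + eps) / (r + eps)) for r, c in zip(p_ref, p_cur))

edges = [0, 25, 50, 75, 100]
train = [10, 20, 30, 40, 55, 60, 70, 80, 90, 95]   # data the model was trained on
live = [60, 65, 70, 72, 80, 85, 88, 90, 92, 99]    # data it sees years later

score = psi(train, live, edges)
print(f"PSI = {score:.2f}", "-> investigate / retrain" if score > 0.2 else "-> stable")
```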
46 Brooklyn Defender Services Elizabeth Vasquez et al   See PDF Here
47 Center for Democracy & Technology Gabriel Nicholas and Hannah Quay-de la Vallee 223-228; 262-264 The proposal takes a universal approach to bias in AI and points out the limitations of approaches that “classify bias by type (i.e.: statistical, cognitive), or use case and industrial sector (i.e.: hiring, health care, etc.)” Although in places the document acknowledges the variety of impacts biased AI can have depending on the complexities and dynamics of a given context, it does not adequately address the limitations of a universal, context-agnostic approach to bias in AI, particularly in contexts where human rights and anti-discrimination laws are significantly at issue, such as financial services or the criminal justice system (see comments from the PASCO Coalition for an example of the dangers of deploying AI without considering the human rights implications).
  Center for Democracy & Technology Gabriel Nicholas and Hannah Quay-de la Vallee   The proposal lacks clear definitions for fundamental terms such as “risk”, “harm”, and “artificial intelligence system”. NIST may want to leave room for context-specific authorities to define such terms themselves. However, readers may not have sufficient guidance to come up with their own definitions.
  Center for Democracy & Technology Gabriel Nicholas and Hannah Quay-de la Vallee 239-252 When discussing the challenges posed by bias in AI, the proposal gives use cases where AI systems can cause significant harm to individuals, such as in hiring, health care, and criminal justice. It does not include examples of AI systems that could harm entire populations, such as recommender systems and content moderation algorithms on social networks with hundreds of millions or billions of users. Such harms are not always as salient or easily observable as individual harms, but the sheer scale means they can have a significant impact.
  Center for Democracy & Technology Gabriel Nicholas and Hannah Quay-de la Vallee 242-243 The proposal suggests that adopting a risk-based framework encourages technologists to develop means of “managing and reducing the impacts of harmful biases in AI”. The document fails to mention that AI systems should have built-in means of redress for when bias does occur.
  Center for Democracy & Technology Gabriel Nicholas and Hannah Quay-de la Vallee 387-389 The document positions audits as the primary tool for limiting bias in AI systems. Audits can be a helpful tool for accountability and improvement, but they should not be considered sufficient without other forms of transparency about a given AI system. Focusing on audits without also providing for other forms of transparency overlooks the role of, among others, journalists, academics, impacted communities, and civil society in raising and addressing these issues.
  Center for Democracy & Technology Gabriel Nicholas and Hannah Quay-de la Vallee 387-390 The proposal suggests that the way to build public trust in AI is through empowering an authority of expert AI auditors rather than through building trust in individual AI systems. This is an unduly cramped and incorrect conclusion. Many communities, particularly marginalized communities that are most likely to be affected by bias, may not trust alleged experts and in any case will be guided in their views by their lived experience with AI systems. Moreover, members of impacted communities, particularly those that are already marginalized or at-risk, may be more aware of and adept at detecting bias than auditors. Explainability and transparency are crucial to building trust -- a “black box” system simply will not lead to trust even if it is “validated” by an auditor.
  Center for Democracy & Technology Gabriel Nicholas and Hannah Quay-de la Vallee 418-422 The proposal provides a framework to identify bias in an AI system, but does not offer guidance on how to address bias once discovered. Notwithstanding its title, the proposal is significantly skewed towards “identifying” bias in AI rather than “managing” it.
  Center for Democracy & Technology Gabriel Nicholas and Hannah Quay-de la Vallee 654-664 The proposal implies that AI developers can sufficiently prevent their systems from producing biased behavior and outcomes by accounting for bias during the pre-design, design, and development stages. However, systems may exhibit unexpected behavior or results when placed in a live environment. Consequently, assessment done during the design and development process may be incomplete or incorrect.
  Center for Democracy & Technology Gabriel Nicholas and Hannah Quay-de la Vallee 475-478 We commend NIST for the guidance to assemble and support diverse teams of designers and developers. We do feel the language could be more inclusive where disability is concerned.
  Center for Democracy & Technology Gabriel Nicholas and Hannah Quay-de la Vallee 319-333 We commend NIST for acknowledging the presence of “snake oil salesmen” in the field of AI and the importance of evaluating systems on their efficacy in addition to considerations like non-discrimination and safety.
48 Palantir Anthony Bak   See PDF Here
49 Alliance for Automotive Innovation Hilary Cain   See PDF Here
50 STM Philip Carpenter   See PDF Here
51 (1): UC Berkeley Center for Long-Term Cybersecurity (CLTC); (2): UC Berkeley Center for Information Technology Research in the Interest of Society (CITRIS) Policy Lab; (3): UC Berkeley Center for Human-Compatible Artificial Intelligence (CHAI) Anthony Barrett (1), Jessica Newman (1), Ifejesu Ogunleye (1), Brandie Nonnecke (2), Thomas Gilbert (3) Lines 228-229, 415-416, 542, 655, 673-674, 992-993 NIST SP 1270 first mentions the AI lifecycle on lines 228-229, and first depicts it in Figure 1 on lines 415-416. The AI lifecycle seems sensible overall, though the depiction in Figure 1 seems simplified and linear, implying a “waterfall” sequential lifecycle. Software development often occurs in more iterative fashion, e.g. with Agile development lifecycles, which could have less distinct stages and thus NIST guidance may be less clearly applicable to those lifecycles. NIST mentions these other lifecycles but does not clearly state whether/how NIST AI bias management concepts should be adapted for other lifecycles. (The lifecycle diagram in Figure 2 on lines 673-674 appears to include a loop implying iterative development, with deployment leading back to pre-design, but the diagram has no accompanying text discussion.) In addition, though SP 1270 mentions monitoring in passing on lines 542 and 655, it does not currently depict monitoring as an important part of the AI lifecycle. By contrast, other lifecycles such as the OECD lifecycle explicitly include monitoring (see reference 106 on lines 992-993).

We suggest modifying the AI lifecycle in SP 1270 and adding associated discussion as follows: First, more consistently depict the potential for iterative/cyclical development in the AI lifecycle. Second, add another stage for “Monitoring and Evaluation” after Deployment; monitoring is an important part of the AI lifecycle, and there are risks of bias during this stage, i.e., based on what is measured or evaluated, how, and by whom.
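Purely as an illustration of the suggested change, the cycle could be represented as follows; the loop structure and the added stage name are assumptions of this sketch rather than anything in the current draft.

```python
# Sketch of the suggested iterative AI lifecycle: the three SP 1270 stages plus
# an explicit "monitoring and evaluation" stage, arranged as a cycle so that
# findings feed back into pre-design. The fourth stage name is an assumption.
LIFECYCLE = [
    "pre-design",
    "design and development",
    "deployment",
    "monitoring and evaluation",   # proposed additional stage
]

def next_stage(stage):
    """Return the following stage; monitoring loops back to pre-design."""
    i = LIFECYCLE.index(stage)
    return LIFECYCLE[(i + 1) % len(LIFECYCLE)]

stage = "pre-design"
for _ in range(6):                 # walk a few iterations of the cycle
    print(stage, "->", next_stage(stage))
    stage = next_stage(stage)
```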
  (1): UC Berkeley Center for Long-Term Cybersecurity (CLTC); (2): UC Berkeley Center for Information Technology Research in the Interest of Society (CITRIS) Policy Lab; (3): UC Berkeley Center for Human-Compatible Artificial Intelligence (CHAI) Anthony Barrett (1), Jessica Newman (1), Ifejesu Ogunleye (1), Brandie Nonnecke (2), Thomas Gilbert (3) 279-281 We appreciate that several passages frame potential risk of bias in terms broad enough to potentially be applicable to increasingly advanced AI; e.g., lines 279-281 state that “The difficulty in characterizing and managing AI bias is exemplified by systems built to model concepts that are only partially observable or capturable by data. Without direct measures for these often highly complex considerations, AI development teams often use proxies.” However, increasingly advanced AI systems will pose risks of bias with impacts of greater scale and magnitude, with greater difficulty of mitigation. For a potential long-term example, if organizations utilize AI systems to maximize proxy measures of human wellbeing such as profit or GDP, large-scale optimization for these simple metrics instead of for more holistic human welfare goals could result in environmental or geopolitical harm (Bommasani et al. p. 116). For a near-term example, nearly all state-of-the-art natural language processing (NLP) models now are adapted from, and inherit biases from, one of a small number of “foundation models” such as BERT (Bommasani et al. 2021 pp. 5, 110), and foundation models can amplify effects of biases in training data that already present challenges for machine learning models (Bommasani et al. pp. 132, 134).

We suggest considering how best in SP 1270 to address the potential effects of bias risks from increasingly advanced and broadly-applicable AI such as foundation models. We suggest at least adding a statement such as “Careful mitigation of harmful bias will be even more important for increasingly advanced and broadly-applicable AI systems such as foundation models, which could pose risks of bias with impacts of greater scale and magnitude, with greater difficulty of mitigation.”

References:

Bommasani R, Hudson DA, Adeli E, Altman R, Arora S, von Arx S, Bernstein MS, Bohg J, Bosselut A, Brunskill E, Brynjolfsson E, Buch S, Card D, Castellon R, Chatterji N, Chen A, Creel K, Davis JQ, Demszky D, Donahue C, Doumbouya M, Durmus E, Ermon S, Etchemendy J, Ethayarajh K, Fei-Fei L, Finn C, Gale T, Gillespie L, Goel K, Goodman N, Grossman S, Guha N, Hashimoto T, Henderson P, Hewitt J, Ho DE, Hong J, Hsu K, Huang J, Icard T, Jain S, Jurafsky D, Kalluri P, Karamcheti S, Keeling G, Khani F, Khattab O, Kohd PW, Krass M, Krishna R, Kuditipudi R, Kumar A, Ladhak F, Lee M, Lee T, Leskovec J, Levent I, Li XL, Li X, Ma T, Malik A, Manning CD, Mirchandani S, Mitchell E, Munyikwa Z, Nair S, Narayan A, Narayanan D, Newman B, Nie A, Niebles JC, Nilforoshan H, Nyarko J, Ogut G, Orr L, Papadimitriou I, Park JS, Piech C, Portelance E, Potts C, Raghunathan A, Reich R, Ren H, Rong F, Roohani Y, Ruiz C, Ryan J, Ré C, Sadigh D, Sagawa S, Santhanam K, Shih A, Srinivasan K, Tamkin A, Taori R, Thomas AW, Tramèr F, Wang RE, Wang W, Wu B, Wu J, Wu Y, Xie SM, Yasunaga M, You J, Zaharia M, Zhang M, Zhang T, Zhang X, Zhang Y, Zheng L, Zhou K, and Liang P (2021), On the Opportunities and Risks of Foundation Models. arXiv, https://arxiv.org/abs/2108.07258
  (1): UC Berkeley Center for Long-Term Cybersecurity (CLTC); (2): UC Berkeley Center for Information Technology Research in the Interest of Society (CITRIS) Policy Lab; (3): UC Berkeley Center for Human-Compatible Artificial Intelligence (CHAI) Anthony Barrett (1), Jessica Newman (1), Ifejesu Ogunleye (1), Brandie Nonnecke (2), Thomas Gilbert (3) Lines 223-229, 259-265, 344-345, 595-630, 632-650, 611-613, 638-640 We appreciate that several passages seem to reflect a broad concern about potential risks of bias, instead of focusing inappropriately on intended use cases (which could contribute to overlooking potential changes in use, abuse/misuse, etc., and which could undermine characteristics of trustworthy AI besides mitigation of bias). Passages reflecting broad concern about potential risks of bias instead of focusing excessively on intended use cases include lines 223-229, 259-265, 344-345, 595-630, 632-650, 611-613, and 638-640.
  (1): UC Berkeley Center for Long-Term Cybersecurity (CLTC); (2): UC Berkeley Center for Information Technology Research in the Interest of Society (CITRIS) Policy Lab; (3): UC Berkeley Center for Human-Compatible Artificial Intelligence (CHAI) Anthony Barrett (1), Jessica Newman (1), Ifejesu Ogunleye (1), Brandie Nonnecke (2), Thomas Gilbert (3) Lines 242-243, 351-353, 699-700 While we agree with your statements that “zero risk” seems an infeasible goal for technology, decision makers must decide what constitutes acceptable risk. Decision makers, especially those who are directly involved in the development process, have important risk management choices regarding technologies with harmful biases, including whether and how to modify them, when and how to deploy them, when to stop using them, and whether to forgo using them in the first place. The current draft for SP 1270 does not seem to mention such risk management options. Passages that mention “zero risk” include lines 242-243, 351-353, and 699-700.

We suggest adding at least one mention of such risk management options in SP 1270 after one of the statements about infeasibility of developing zero-risk technology. For example, consider adding “In some cases, stopping the use of an AI system or meaningfully changing its design may be required to prevent a biased outcome.”
  (1): UC Berkeley Center for Long-Term Cybersecurity (CLTC); (2): UC Berkeley Center for Information Technology Research in the Interest of Society (CITRIS) Policy Lab; (3): UC Berkeley Center for Human-Compatible Artificial Intelligence (CHAI) Anthony Barrett (1), Jessica Newman (1), Ifejesu Ogunleye (1), Brandie Nonnecke (2), Thomas Gilbert (3) 256-265 There is a need for accountability tools and metrics that are suited to the risks of actual existing systems that have already been (or are likely to be) deployed—including the API, licenses, and data usage—in addition to and beyond the potential for statistical bias in formal models. Beyond formal verification procedures, entirely new metrics are required for evaluating systems whose effects operate at scales that were previously inaccessible to both governments and corporations. Recent papers that speak to this perspective and acknowledge the distinction between systems and models include: Mitchell et al. (2019), Raji et al. (2020), and Paullada et al. (2020). In a broader sense, there is a need for basic research into how specific formal modeling assumptions dynamically interact with the system interface, either to users, administrators, or other stakeholders. Studies of strategic classification (Miller et al. 2020) and performative prediction (Perdomo et al. 2020) comprise very early theoretical steps in modeling this problem, but much more work is needed.

We suggest adding discussion of these points, and adding the 2020 papers to the literature review (Mitchell et al. 2019 is already in the bibliography).

References:

Miller J, Milli S, and Hardt M (2020) Strategic classification is causal modeling in disguise. International Conference on Machine Learning. Online, http://proceedings.mlr.press/v119/miller20b/miller20b.pdf

Mitchell M, Wu S, Zaldivar A, Barnes P, Vasserman L, Hutchinson B, Spitzer E, Raji ID, Gebru T (2019) Model Cards for Model Reporting, in Conference on Fairness, Accountability, and Transparency 2019, January 29–31, 2019, Atlanta, GA, USA. ACM, https://doi.org/10.1145/3287560

Paullada A, Raji ID, Bender EM, Denton E, and Hanna A (2020) Data and its (dis)contents: A survey of dataset development and use in machine learning research, NeurIPS 2020 Workshop: ML Retrospectives, Surveys & Meta-analyses, Virtual, https://arxiv.org/pdf/2012.05345.pdf

Perdomo JC, Zrnic T, Mendler-Dünner C, and Hardt M (2020) Performative prediction. International Conference on Machine Learning, Online, http://proceedings.mlr.press/v119/perdomo20a/perdomo20a.pdf

Raji ID, Smart A, White RN, Mitchell M, Gebru T, Hutchinson B, Smith-Loud J, Theron D, and Barnes P (2020) Closing the AI Accountability Gap: Defining an End-to-End Framework for Internal Algorithmic Auditing, in Conference on Fairness, Accountability, and Transparency 2020, January 27–30, 2020, Barcelona, Spain, ACM, https://doi.org/10.1145/3351095
53 Modzy   471 The section fails to take into account tactical steps that technical teams can take to evaluate potential bias before models are trained or data is collected, ensuring that potential biases are identified and escalated to the appropriate decision-makers to determine whether models should be developed. This shift in thinking and approach helps teams begin thinking about bias-related issues before models or products are even designed.
  Modzy   632 This section could be more prescriptive about steps teams can take to close the gap between context and performance. One of the persistent challenges in AI model deployment and management today stems from a lack of transparent documentation about model training, provenance, expected performance, potential biases, etc. This challenge is exacerbated by the fact that many organizations don't track how and where models are deployed, and are instead reliant on individual data scientists and engineers babysitting model performance. By documenting this in a centralized location, such as a model registry, teams can ensure that all stakeholders have access to information around model performance, potential biases, good use cases, etc.
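A minimal sketch of the kind of centralized registry record described above is shown below; the field names and the in-memory store are illustrative assumptions, and a real deployment would persist these records in a shared ModelOps service rather than a Python dictionary.

```python
# Sketch of a minimal centralized model registry: one place where training
# provenance, expected performance, known bias caveats, and deployment
# locations are recorded. Field names and the in-memory dict are illustrative
# assumptions; a real registry would persist to a shared service.
REGISTRY = {}

def register_model(name, version, training_data, expected_metrics, bias_notes):
    REGISTRY[(name, version)] = {
        "training_data": training_data,
        "expected_metrics": expected_metrics,
        "bias_notes": bias_notes,
        "deployments": [],
    }

def record_deployment(name, version, environment):
    """Track where each model version is actually running."""
    REGISTRY[(name, version)]["deployments"].append(environment)

register_model(
    "resume-screener", "2.3",
    training_data="applications-2018-2020 snapshot",
    expected_metrics={"auc": 0.81},
    bias_notes="under-represents applicants over 60; see evaluation report",
)
record_deployment("resume-screener", "2.3", "us-prod")
print(REGISTRY[("resume-screener", "2.3")])
```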
  Modzy   642 While the document references interpretability, there is still a huge disconnect today in terms of how to best explain or provide interpretable results to different stakeholders. In many cases, tools only provide results in formats understandable to data scientists, but ignore other non-technical stakeholders. Where possible, teams should incorporate explanations of results that are understandable to all stakeholders, resulting in greater transparency for AI-enabled decisions.
  Modzy   653 While the document acknowledges that risk management tools and techniques will be the focus of future documents, the explanation of auditing and monitoring techniques should be expanded to include more detail around how teams can expand upon the use of a "model registry" or ModelOps platform to perform more comprehensive auditing, monitoring, and governance. Monitoring is a key element of deployment to ensure that model performance remains high over time, and that models aren't subject to data drift and subsequent bias.
54 AHIP Danielle Lloyd   AHIP is responding to the call for public comments on Draft Special Publication
1270, “A Proposal for Identifying and Managing Bias in Artificial Intelligence” (AI).1
We support the intent of the proposal, and applaud the well-researched, comprehensive description of the challenges of addressing bias in AI. Many of the steps outlined in the document are in use by some entities and can be helpful for understanding AI and its benefits and challenges.

The intended use of the model is critical for understanding, identifying, and avoiding unintended bias in AI. Depending on the application, understanding and being transparent about bias in AI may be more important than trying to eliminate it. For example, some AI initiatives are designed to benefit specific groups or populations and might be considered “good bias.” In other words, bias should not be construed as “bad” when the intent of an application may, in fact, be to identify and benefit an underserved class or individual.

Engaging a diverse set of stakeholders, including payers specifically, in the design of AI use cases is a good way to understand when historical or underlying data patterns may be perpetuating bias, and to address or eliminate unintended outcomes as much as possible. Likewise, diverse individuals who may be impacted by the AI should be engaged as well.
1 AHIP is the national association whose members provide health care coverage, services, and solutions to hundreds of millions of Americans every day. We are committed to market-based solutions and public-private partnerships that make health care better and coverage more affordable and accessible for everyone. Visit www.ahip.org to learn how working together, we are Guiding Greater Health. See, https://www.nist.gov/artificial-intelligence/proposalidentifying-and-managing-bias-artificial-intelligence-sp-1270.
  AHIP Danielle Lloyd   Usefulness and industry acceptance will be critical for adoption and effective use of AI.
The NIST proposal is intended to be appropriately agnostic to sector, but how it accommodates different sectors should be clarified, particularly for consumers (e.g., how consumers will interact with and utilize AI-powered tools in different or intersecting sectors). Higher trust will be evident if an AI-powered service offers transparency and high utility to the individual.

Understanding of AI outside of the analytic community will be important to establish sound policy around AI. While systemic error introduced through bias is a significant concern because of the inequity it creates or exacerbates, we recommend NIST also consider the implication of random error in the use of AI. While it is rare to achieve 100% accuracy, the level of deviation must be assessed as part of the purpose for which AI is used, along with a risk analysis and potential error threshold. For example, AI techniques may require transparency for data quality, “decision outputs,” and other “behind-the-scenes” programming that enables the AI functionality. Other examples depend on the level of risk involved, such as whether the AI is being used to accomplish a direct outcome for a patient. The draft proposal could also benefit from more discussion and analysis of the differing levels of risk involved with AI for automation and for decision support, and the implications of being either insufficiently critical or overly optimistic about the various AI system components and outputs.

We are eager to engage in public dialog with vendors/developers about how to mitigate bias as much as possible but understand that “glitches” might still occur. Consensus should be achieved for handling such events. Time and education are also needed for policymakers and regulators to become better equipped to understand AI and to ensure transparent communication to end users and other appropriate stakeholders.
  AHIP Danielle Lloyd   We also believe more public input into understanding “use cases” would help inform NIST’s work. For example, additional public forums could be scheduled by NIST to further explore use cases, applications, outcomes, and potential unintended results, along with the impact on individuals and/or groups. As discussed above, we recommend that NIST engage a diverse set of stakeholders and discuss how to include supported decision-making leveraging AI.

The simplicity, robustness and transparency of these frameworks will be the foundation for building public trust in AI and the organizations that use it. Governance is essential for engaging public trust, consistency across applications, and appropriate and ethical use of the technologies. Likewise, program design will be a key element for AI. We suggest that the document discuss governance structures more fully, as well as how automated and supported decision-making by AI directly relate to the key governance rules.
In addition, the lifecycle stages in the proposal correspond with existing processes, as do the actions suggested to mitigate bias at each stage. We would suggest adding an additional stage which describes the ongoing monitoring of AI for bias and including feedback from users as a “trigger” for re-examination and testing. Some of these elements could be incorporated into the framework, which can be updated as adoption and use of AI evolves.
  AHIP Danielle Lloyd   We recommend that the final document be revised to discuss the many positive impacts of AI more fully. While there is potential for negative impacts of unchecked or potentially unmonitored bias and unintended discriminatory practices or outcomes, this burgeoning area holds significant promise for exponential advancements in healthcare. Policymakers and the public should be made aware of the AI functions in current systems, and the measures to ensure fairness and effectiveness throughout use and application. We must balance the risks associated with AI against the benefits and seek to mitigate risk in a way that does not stifle innovation.
55 ACLU     See PDF Here
56 LEIDOS LEIDOS 212-217 The definition of bias is very close to just “accuracy”
  LEIDOS LEIDOS 212-215 What do you mean by Truth? Are there any methods to determine the Truth? For instance, what is the Truth (for a bias metric) when it comes to hiring?
  LEIDOS LEIDOS 212-217 There are multiple ISO definitions to consider here:
“difference between the expectation of a test result (3.4.1) or measurement result (3.4.2) and a true value (3.2.5)
Note 1 to entry: Bias is the total systematic error as contrasted to random error. There may be one or more systematic error components contributing to the bias. A larger systematic difference from the true value is reflected by a larger bias value.
Note 2 to entry: The bias of a measuring instrument is normally estimated by averaging the error of indication over an appropriate number of repeated measurements. The error of indication is the: “indication of a measuring instrument minus a true value of the corresponding input quantity”.”
https://www.iso.org/obp/ui/#iso:std:iso:3534:-2:ed-2:v1:en
  LEIDOS LEIDOS 562-572 The bias of a model is the difference between a model’s predictions and a measurable true response value. This seems to be conflated with certain categorical variables being under/over-represented, for example, in an admissions process.
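A small illustrative calculation of the distinction this comment draws, using made-up numbers: statistical bias is the systematic gap between predictions and measured true values, while under- or over-representation is a property of the data's composition.

# Illustrative only: statistical prediction bias versus group representation.
import numpy as np

y_true = np.array([10.0, 12.0, 9.0, 11.0, 10.5])
y_pred = np.array([11.0, 13.5, 9.5, 12.0, 11.0])

# Statistical bias: systematic difference between predictions and true values.
prediction_bias = np.mean(y_pred - y_true)  # > 0 indicates systematic over-prediction

# Representation: share of each group in the dataset, a separate property.
groups = np.array(["A", "A", "A", "A", "B"])
representation = {g: float(np.mean(groups == g)) for g in np.unique(groups)}

print(f"prediction bias = {prediction_bias:.2f}")
print("group representation:", representation)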
  LEIDOS LEIDOS 212-217 Context matters here. One can't take a definition given for a specific context and reuse it for another purpose/context without recognizing or acknowledging the different use/intent. Bias (as in error/skew/offset) can happen in physical quantity measurements, in which "truth" can be (most of the time) established. This includes equipment calibration, measurable quantities in physics or physical properties, etc. The ISO/NIST definition for bias may be more closely associated with such physical measures. The context in social cases can be different, where “truth” (such as in college admissions) can be much harder to define, let alone establish. Also, another aspect is bias in design vs. bias in data. Bias in data is more easily “fixed” by bringing in “better” data. Bias in design may be harder to detect and/or not as easily fixed (once design/implementation/deployment has been done).
  LEIDOS LEIDOS   Should add more examples that address cases such as military/intel/healthcare and address what harm vs no-harm means in these scenarios
  LEIDOS LEIDOS 719-731 No discussion of the other 37 important bias examples in the glossary (e.g., the impact of annotation bias measurements)
  LEIDOS LEIDOS 415 Stakeholders, risk management, and standards are mentioned in the diagram, but the document does not provide a clear flow for how these concepts fit together, specifically not addressing standards and how they are tied to risk management strategies
  LEIDOS LEIDOS 512-515 Design and Development stage: no mention in the section of actual metrics for performing validation and evaluation
  LEIDOS LEIDOS 653-664 Fairness terminology is rarely used: 20 academic references for fairness, but only 1-2 direct mentions of fairness terms, in a document about managing bias
57 Workday Evangelos Razis   Workday is pleased to respond to the National Institute of Standards and Technology’s (NIST) request for feedback on its Proposal for Identifying and Managing Bias in Artificial Intelligence (proposal).

Workday is a leading provider of enterprise cloud applications for finance and human resources, helping customers adapt and thrive in a changing world. Workday applications for financial management, human resources, planning, spend management, and analytics have been adopted by thousands of organizations around the world and across industries—from medium-sized businesses to more than 50 percent of the Fortune 500.

Workday incorporates machine learning (ML) technology within our applications to enable customers to make more informed decisions and accelerate operations, as well as assist workers with data-driven predictions that lead to better outcomes. Workday believes ML technology has the potential to impact enterprises in the near-term by making operations more efficient. In the longer term, organizations will be able to reorganize their operations around machine learning and artificial intelligence’s (AI) unique possibilities.

AI is becoming an ever-increasing and transformative presence in our lives, driving human progress in countless ways. To achieve AI’s full potential, however, there must be broad confidence that organizations are developing and using AI responsibly, including by addressing potentially unintended and harmful bias. With this in mind, Workday welcomes NIST’s efforts to develop consensus standards and a risk-based framework for trustworthy artificial intelligence. We are pleased to offer NIST the following comments on its proposal.
  Workday Evangelos Razis   I. General Comments
Workday commends NIST for recognizing that identifying and mitigating harmful bias is needed to cultivate trustworthy and responsible artificial intelligence. Harmful bias is today a chief concern for policymakers, developers and deployers of AI solutions, and the public at large. NIST’s proposal is a welcome tool for organizations seeking to address these concerns and an important contribution to its efforts to develop an AI Risk Management Framework (RMF).

To support efforts to advance responsible AI, earlier this year Workday published Building Trust in Artificial Intelligence and Machine Learning, a whitepaper outlining a “Trustworthy by Design” Regulatory Framework.

The paper calls on AI developers to implement and describe procedures to identify and mitigate sources of potentially harmful bias in their AI systems. It also proposes that AI developers document the procedures they use to test for, identify, and mitigate the effects of potentially harmful bias, as well as establish diverse teams to design and develop AI systems. Workday underscored these and other points in our comments to NIST’s request for information on an AI Risk Management Framework.
  Workday Evangelos Razis   II. Specific Comments
a. Support for a Risk-Based Approach to Governing AI
NIST wisely frames its efforts to reduce bias within AI and ML systems as having the goal of reducing harmful impacts and discriminatory outcomes on people. The clarity and focus of this framing is of great value. Different AI and ML use cases will pose different risks to individuals. A rigorous approach to identifying and managing bias in high-risk use cases will help to build trust in the technology. In addition, NIST’s proposal should also specify that there are low risk use cases for AI and ML enabled technologies which do not require the same rigorous approach to mitigating bias. This distinction would allow organizations to focus resources on identifying and addressing potentially harmful bias in AI systems that pose the highest risk to individuals.

b. Support for a Lifecycle Approach to Risk Management
Workday firmly supports NIST’s lifecycle approach to identifying and mitigating harmful bias in AI systems. As bias may enter an AI system at multiple points, from the pre-design to the development to the deployment stage, a lifecycle approach is an essential starting point for risk management in this area. Workday seeks to leverage a similar approach in its “Trustworthy by Design” Regulatory Framework.

c. The Importance of AI Governance Programs
Workday recommends that NIST’s proposal highlight the importance of governance programs in identifying and addressing harmful bias in AI systems. Organizations put in place governance programs to make tangible the goals, principles, and values underpinning trustworthiness, including addressing harmful bias. Absent these conditions, AI risk-management is a constellation of tools and practices implemented unevenly, without transparency and accountability. Recognizing that the specifics of a governance program will necessarily vary according to the size and capacity of organizations and the particular application/use of AI, NIST should nonetheless account for governance in its proposal and consider the basic elements of a holistic program in its forthcoming risk management framework.

d. Elaborating on Different Roles in the AI Ecosystem
Given the diversity of actors in the AI ecosystem, NIST’s proposal should further elaborate on the different roles companies play in identifying and addressing harmful bias in AI systems. Developers of AI often provide the technology to deployers and not directly to end users. In Workday’s case, we include ML-enabled features in our services, which our customers deploy and use in their operations. Our customers then configure the service, which operates on their data. All of this impacts how those ML-enabled features work and the risk profile they may present to individuals. By further delineating the distinction between developers and deployers, NIST’s proposal will enable all stakeholders to better understand how their organizations can identify and address harmful bias in AI systems.

e. Bias Metrics and Management Techniques
Given that the need to identify and manage harmful bias is most relevant to high-risk AI systems, we recommend that NIST evaluate different bias metrics and management techniques in the literature in the context of existing legal obligations. As NIST is aware from its extensive literature review, researchers have proposed many different definitions of bias and fairness and corresponding bias mitigation techniques. Developers and deployers of AI and ML technologies would find value in NIST categorizing these management techniques along the lines of the proposal (pre-design, design and development, and deployment) and analyzing the viability of these approaches, as well as the various definitions of bias and fairness, in different contexts.
  Workday Evangelos Razis   III. Conclusion
Thank you for the opportunity to respond to NIST’s request for feedback on its Proposal for Identifying and Managing Bias in Artificial Intelligence. We congratulate NIST on the work put into AI thus far, including its efforts to build an AI Risk Management Framework. Workday welcomes opportunities to support NIST in its efforts to advance trustworthy AI. We stand ready to provide further information and to answer any additional questions.
58 PEATWorks     See PDF Here
59 Innocence Project     See PDF Here
60 Pepsico Athina Kanioura 488 PepsiCo has established core project teams to identify, evaluate, and procure data. The project team includes representation across data science, data management, data governance/stewardship, and data strategy to ensure that the data is properly identified, sourced, and cleansed. As it pertains to AI specifically, the data science team is responsible for the conceptualization of the AI and associated outputs, which will include mitigation based on the data set. Part of this mitigation process will include support from Data Management and Data Governance/Stewardship when necessary to provide more domain-specific detail and ensure control around data availability and access.
  Pepsico Athina Kanioura 543 As PepsiCo develops digital products and services, our team does extensive research with users (e.g., interviews) to capture a “day in the life” of the application, users' specific needs, and their pain points. Subsequently, these are transformed into user stories and later into requirements for user experience, user interface, analytics (AI), and data. This product approach allows our team to map the requirements into a roadmap of releases. The users play a large part in this process and are key validators of the functionalities defined in the product.

When it comes to the specific AI components of a product, PepsiCo AI developers (data scientists and ML engineers) have full awareness of the use case context because they are an integral part of the discussions described above. They also provide input to the research by mapping existing AI experiences on those use cases.
  Pepsico Athina Kanioura 547 The developers at PepsiCo have a clear understanding of how the model or the model’s outputs are being used in the presentation to the end users. This helps avoid the introduction of bias in the translation, especially in optimization problems where the proposed solution may be lacking some business constraint information. Validation and unit tests are designed and put in place to prevent bugs/errors and make sure that the implementation does not have unwanted behavior that could result in bias.

The correct selection of variables or features is important to avoid bias in the predictions and also in the interpretation of the results. That is why we formulate the problem, for example, using a canonical function anchored in a proper literature review to understand the potential variables impacting the target variable. From that general definition, we map to the internal and external datasets we have access to. We document any gaps between the ideal variables and the ones we have access to, noting the potential loss of information (surrogate data) and accounting for measurement or aggregation bias. When it comes to understanding the bias coming from the data itself, proper exploratory data analysis (EDA) allows us to see missing data (connected to imputation techniques), variables with little variability, outliers, masked data, or even the need to generate synthetic data to account for certain data inequalities. EDA provides an understanding of statistics related to incoming training data. A key element of unbiasing data is stratifying depending on the type of data required. Correct documentation of this allows us to better inform our algorithmic and evaluation-metric decisions.
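A minimal sketch of the exploratory checks described above, run over an illustrative DataFrame (column names and values are hypothetical):

# Minimal EDA sketch: missingness, near-constant features, and group
# representation, as a first pass before bias-mitigation decisions.
import pandas as pd

df = pd.DataFrame({
    "units_sold": [120, 95, None, 110, 130, 90],
    "region": ["N", "N", "N", "N", "N", "S"],   # heavily skewed representation
    "promo_flag": [0, 0, 0, 0, 0, 0],           # little to no variability
})

missing_share = df.isna().mean()                                   # candidates for imputation
near_constant = [c for c in df.columns if df[c].nunique(dropna=True) <= 1]
representation = df["region"].value_counts(normalize=True)

print(missing_share)      # which variables may need imputation
print(near_constant)      # variables with little or no variability
print(representation)     # groups that may need stratification or synthetic data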
  Pepsico Athina Kanioura 558 At PepsiCo, we use preprocessing, modeling, and postprocessing techniques to ensure that our model predictions are equitable across different groups of users and different scenarios. As we map the requirements of each solution we are building, along with benchmarks from previous AI solutions or naïve approaches to the use case, we adapt our framework to validate the new AI implementation. Making sure that we evaluate our entire end-to-end machine learning life cycle through a lens of fairness and unbiasedness is paramount to our organization's success in building models that create real value.
  Pepsico Athina Kanioura 558 PepsiCo avoids using an auto-machine-learning approach, as the experienced development teams understand the different foundations of each algorithm for the correct data preparation, tuning, and evaluation. For example, in the case of model ensembles, understanding that the constituent models should not be too similar is key to avoiding bias and preserving the variability that ensembles need.

Understanding the algorithms also helps us understand their inductive bias, which allows a learning algorithm to prioritize one solution over another, independent of the observed data. An example of this bias is a tendency to favor lower complexity coming from features. So, unless there is good evidence that a feature is useful, it should be deleted.

For us, making sure that bias is part of the objective/loss function of a model solution allows bias to become as relevant as other commonly used performance metrics (such as MAPE in predictive models).
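One hedged illustration of folding bias into the objective alongside an accuracy metric such as MAPE (the weighting and functional form below are assumptions, not PepsiCo's actual loss):

# Illustrative composite objective: accuracy (MAPE) plus a penalty on
# systematic bias (mean signed error). The weight lambda_bias is an assumption.
import numpy as np

def composite_loss(y_true, y_pred, lambda_bias=0.5):
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    mape = np.mean(np.abs((y_true - y_pred) / y_true))  # assumes no zero targets
    bias = np.abs(np.mean(y_pred - y_true))             # systematic over/under-prediction
    return mape + lambda_bias * bias

# Consistent over-prediction raises the loss even when MAPE alone looks acceptable.
print(composite_loss([100, 110, 90], [108, 118, 97]))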

From a data perspective, we compare the representativeness of the training input data and the data being scored. The aim is to detect whether we are trying to model an observation that lies far from the input space used for training (otherwise the models will typically extrapolate beyond this region in unreliable ways). The recent impact of COVID-19 on data is an example of this.
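A minimal sketch of such a representativeness check, flagging scoring-time observations that fall outside the per-feature range of the training inputs (a deliberately simple rule, one of many possible):

# Simple representativeness check: flag rows whose features fall outside the
# per-feature range observed in training, where extrapolation is risky.
import numpy as np

X_train = np.array([[1.0, 10.0], [2.0, 12.0], [3.0, 11.0], [2.5, 9.5]])
X_new   = np.array([[2.2, 10.5], [8.0, 30.0]])  # second row is far from training data

lo, hi = X_train.min(axis=0), X_train.max(axis=0)
outside_range = (X_new < lo) | (X_new > hi)
needs_review = outside_range.any(axis=1)

print(needs_review)  # [False  True]: the second observation would require extrapolation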

From an output perspective, that means controlling for the different dimensions of use of the application. For example, in demand forecasting, if the forecast is going to be shown at different levels of aggregation, one way to reduce potential bias is to produce the forecast at different levels of granularity.

So, assessing complexity is part of the audit PepsiCo teams perform, because it allows them to plan product needs around data availability, computing resources, and the functional and non-functional tests needed in model operations (MLOps). Unit tests on the prediction data are used to see how input features drive the prediction; these unit tests are integrated into the application's CI/CD pipeline. There is also synergy here with understanding the drivers of a prediction: if a driver is unjustifiably biased, it would appear in driver analyses.
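A hedged sketch of the kind of unit tests on prediction data that could run in a CI/CD pipeline (the stand-in model, tolerances, and test names are hypothetical):

# Hypothetical pytest-style checks on prediction behavior, suitable for CI/CD.
import numpy as np

def predict(X):
    # Stand-in for the deployed model's scoring function (illustrative).
    return 2.0 * X[:, 0] + 0.1 * X[:, 1]

def test_predictions_within_expected_range():
    X = np.array([[1.0, 5.0], [2.0, 3.0]])
    preds = predict(X)
    assert np.all(preds >= 0) and np.all(preds <= 100)

def test_prediction_stable_under_small_input_noise():
    X = np.array([[1.0, 5.0]])
    base = predict(X)
    perturbed = predict(X + 0.01)
    assert np.allclose(base, perturbed, atol=0.5)  # small input change, small output change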

The other area of success in our AI implementations relates to the ability of the AI system to be explained. From a bias perspective, this is a clear need, so that we are able to correct the factors that inadvertently lead to bias. Thus, we compare a post-hoc analysis of the factors influencing the output against the initial data exploration to reveal potential biases. We are evaluating libraries such as Aequitas and Themis-ML as post-processing frameworks to assess fairness.
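As a generic stand-in for the kind of post-processing fairness assessment such libraries provide (this is not the Aequitas or Themis-ML API, just an illustrative disparity calculation):

# Illustrative post-processing check: selection rate per group and the
# disparity ratio relative to the best-off group. Not a library API.
import pandas as pd

scored = pd.DataFrame({
    "group":    ["A", "A", "A", "B", "B", "B"],
    "selected": [1,   1,   0,   1,   0,   0],   # binary model decisions (made up)
})

rates = scored.groupby("group")["selected"].mean()
disparity = rates / rates.max()

print(rates)      # selection rate per group
print(disparity)  # values well below 1.0 flag groups for review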

One approach we use is to approximate a “complex” model (e.g., non-linear) using a simpler one (e.g., linear), which is easier to interpret. The main challenge comes from the fact that the simple model must be flexible enough so it can approximate the complex model accurately.
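A minimal sketch of that surrogate approach, fitting a simple linear model to a complex model's predictions and checking how faithfully it tracks them (data and model choices are illustrative):

# Global surrogate sketch: approximate a complex model with a linear one and
# measure fidelity (R^2 of the surrogate against the complex model's outputs).
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 4))
y = X[:, 0] + 0.5 * X[:, 1] ** 2 + rng.normal(scale=0.1, size=500)

complex_model = RandomForestRegressor(random_state=0).fit(X, y)
y_complex = complex_model.predict(X)

surrogate = LinearRegression().fit(X, y_complex)      # simple, interpretable stand-in
fidelity = r2_score(y_complex, surrogate.predict(X))  # must be high enough to trust

print(f"surrogate fidelity R^2 = {fidelity:.2f}")
print(dict(zip(["x0", "x1", "x2", "x3"], surrogate.coef_.round(2))))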

Another approach concerns the global and local importance of the variables, quantifying the influence of each input variable. The quantification alone does not constitute a complete explanation but serves as a first step in gaining insight into the model’s reasoning.
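One standard way to quantify that global influence is permutation importance; a brief sketch using scikit-learn (a tooling assumption, since the comment does not name a library):

# Permutation importance sketch: quantify each input's global influence by
# measuring the performance drop when that feature is shuffled.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.inspection import permutation_importance

rng = np.random.default_rng(0)
X = rng.normal(size=(400, 3))
y = 3 * X[:, 0] - X[:, 1] + rng.normal(scale=0.1, size=400)

model = GradientBoostingRegressor(random_state=0).fit(X, y)
result = permutation_importance(model, X, y, n_repeats=10, random_state=0)

for name, importance in zip(["x0", "x1", "x2"], result.importances_mean):
    print(f"{name}: {importance:.3f}")  # a first step toward explaining the model's reasoning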

Finally, we also check the trade-off between engineered features that do not correspond to the rational thinking provided by subject matter experts and the predictive power that such a new variable can provide. This is a decision we consider when we use “auto-encoders”, for example.
  Pepsico Athina Kanioura 588 PepsiCo utilizes user testing that can detect any new bias introduced during end-user application. Additionally, user statistics and surveys reveal insights into how applications are used during production.
  Pepsico Athina Kanioura 640 The developers at Pepsi contemplate intention gaps because they are themselves among the users. Subject matter experts (Quality Assurance) use interviews to understand how users will use, or are using, a solution we built, so we can account for potential intention gaps, output misinterpretation, and repurposing efforts. This risk can also be mitigated through training and documentation. We are discussing adding, in future releases, alerts on tool misuse based on anomaly detection to provide additional support.
  Pepsico Athina Kanioura 650 PepsiCo has built numerous dashboards that include explanations of the data in the dashboard itself and tie these to a recorded training, along with the names of whom to contact if there are questions. This has been more helpful than most other approaches, as it keeps the dashboard tied to a person, so it feels personal. In some cases, Pepsi built the AI to restrict users to a specific plan and rules set in the background, and also onboarded people with training and a comprehensive document giving them examples and walk-throughs and explaining how and when to use the information, and when not to.
  Pepsico Athina Kanioura 655 An area of exploration for researchers and companies like Pepsi is counterfactual fairness…
62 National Health Law Program Elizabeth Edwards et al.   See PDF Here
63 Twitter META     See PDF Here
64 One Concern Anand Sampat 550-552 AI practitioners can formally defend their techniques by creating formal peer-review opportunities. For private companies, taking part in a peer-review process and bringing in a third-party set of technical experts (e.g., a Technical Working Group) to regularly review existing methods are two example approaches.
65 Parity     See PDF Here

 

Created August 12, 2021, Updated April 5, 2022