
Lessons Learned from the Consortium: Tool Use in Agent Systems

Today, concerted attention across the AI ecosystem is focused on creating effective AI agents. Increasingly capable AI agents promise great opportunities for economic competitiveness, but they also require their developers, deployers, and users to manage security and reliability risks. AI agents can perceive and take actions in environments; the leading AI agent paradigm today embeds general-purpose AI models into systems with software scaffolding that enables a model to use tools and take actions beyond simple text output. AI agents are increasingly deployed as experimental products that can build software applications, browse the internet, and more.

To date, there has been no comprehensive taxonomy of these agent tools. Such a taxonomy could enable actors across the AI supply chain to share information about system capabilities and considerations more clearly. For example, it could enable an AI agent developer to share tool capabilities and limitations with downstream developers who create applications that make full use of agent capabilities. It could similarly support third-party researchers and users in reporting flaws or incidents involving categories of AI agent tools. A shared vocabulary can support this communication.

To take steps toward providing this resource, CAISI and NIST hosted an AISIC workshop with approximately 140 experts in January. Below we present lessons learned from the community through that workshop.

Participants identified various approaches to structure a taxonomy of tool use, including:

  1. Functionality-focused: What action(s) does the tool enable?
  2. Access patterns: Can the tools access external resources? May they be configured with write permissions?
  3. Risk-based: How critical is the type of tool-enabled action to realizing possible harms? How severe are the possible harms? Are the actions stateful (i.e., compounding, lingering effects) or stateless? Are they reversible?
  4. Reliability: Can the tool be used with some level of consistency by a given model? Is the tool itself reliable?
  6. Modality: the form in which the tool is used, whether plain text, robotics commands, multimodal interaction, or otherwise.
  6. Monitoring: Tools may enable different levels of observability, with some able to leverage existing logs or transcripts, while others require novel approaches to observe the effects of tool-enabled actions.
  7. Autonomy: the extent to which the agent can take initiative or exercise discretion in using the tool without user intervention.

Each approach above has strengths and weaknesses. Some approaches depend on factors beyond the tool itself. A risk-based taxonomy, for example, will depend on deployment conditions; access patterns, autonomy, and other approaches may depend on the ways in which a tool is implemented in practice. Rather than homing in on a single taxonomy, workshop participants raised these multiple approaches, and some suggested that multidimensional intersections across multiple taxonomies would be a promising direction for future work. In practice, taxonomies may build upon or otherwise complement each other: one structured around monitoring may be informed by a risk-based taxonomy, which in turn may be informed by functionality-focused or access-pattern taxonomies. Stakeholders may benefit from creating taxonomies of tool use to fit their particular needs.
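As a sketch of what such a multidimensional intersection might look like in practice, the hypothetical Python record below annotates a single tool along several of the dimensions raised above. The field names, value choices, and example tool are illustrative assumptions rather than a proposed standard.

```python
from dataclasses import dataclass
from enum import Enum


class AccessPattern(Enum):
    READ_ONLY = "read_only"
    CONSTRAINED_WRITE = "constrained_write"
    WRITE = "write"


@dataclass(frozen=True)
class ToolProfile:
    """Illustrative multidimensional annotation for a single agent tool."""
    name: str
    functionality: str         # what action(s) the tool enables
    access: AccessPattern      # external-resource and write permissions
    stateful: bool             # do effects compound or linger?
    reversible: bool           # can effects be undone?
    modality: str              # e.g., "text", "GUI", "robotics"
    observable_via_logs: bool  # can existing logs capture the tool's effects?
    autonomy: str              # e.g., "requires approval", "fully autonomous"


# Example: a web-search sensor annotated along several dimensions at once.
web_search = ToolProfile(
    name="internet_search",
    functionality="perception: retrieve web results",
    access=AccessPattern.READ_ONLY,
    stateful=False,
    reversible=True,
    modality="text",
    observable_via_logs=True,
    autonomy="fully autonomous",
)
```

A record like this lets one tool be classified under several of the workshop's candidate taxonomies simultaneously, rather than forcing a choice among them.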

Consider below two taxonomies that address approaches identified in the AISIC workshop. The first takes a functional approach, categorizing tools by what they enable the model to do. Reflecting workshop discussions, it aims for comprehensive coverage of the categories of tools that enable different types of actions. The taxonomy provides a clear baseline that can be expanded or tailored to particular needs. Developers may use these types as a structured way to reason about the capabilities of their agent systems during development, and subsequently to communicate externally about which types of actions are possible and which may be constrained. Indeed, within each category, example tools may be more or less constrained; e.g., sensors may be subject to filters that reduce the risk of indirect prompt injections from search results.

Figure 1. Functionality-Oriented Taxonomy of Tools in AI Agent Systems

| Purpose | Type | Examples |
|---|---|---|
| Perception: ways a model may perceive the environment | Sensors | Internal database, monitoring, diagnostics, GUI, voice, internet search, physical world |
| Reasoning: ways that a model may reason beyond inference | Planning | Task decomposition, path-finding models |
| | Analysis | Scratchpads, calculators, simulations |
| | Resource management | Memory, self-management |
| Action: ways that a model may directly affect the environment | Authentication | Login, CAPTCHA, wallet |
| | Computer use | Application-specific GUI interaction, website interactions, computer use |
| | Running code | Sandboxed code interpreter, IDE, file operations, code execution |
| | Software extensions | Calendar, social media API |
| | Physical extensions | Robotic arm, laboratory tools in a factory setting, robot in an open environment |
| | Human interaction | Phone calls |
| | Agent interaction | Multi-agent simulation, sub-agents that can interact with the outside world, third-party agent interactions |
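As one illustration of the constraint noted above, where sensors may be filtered to reduce the risk of indirect prompt injection from search results, the sketch below wraps a hypothetical search function with a naive pattern-based filter. The function signature and pattern list are assumptions for illustration only; production filters would need considerably more robust methods.

```python
import re
from typing import Callable

# Naive patterns that often accompany injected instructions in retrieved text.
# This list is purely illustrative; real filters need far more robust methods.
SUSPICIOUS_PATTERNS = [
    r"ignore (all )?previous instructions",
    r"you are now",
    r"system prompt",
]


def filtered_search(raw_search: Callable[[str], list[str]], query: str) -> list[str]:
    """Run an underlying search tool, dropping results that match injection patterns."""
    results = raw_search(query)
    return [
        result for result in results
        if not any(re.search(p, result, re.IGNORECASE) for p in SUSPICIOUS_PATTERNS)
    ]
```

Here the agent's scaffolding, not the model, decides which retrieved text ever reaches the model's context.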

The second taxonomy addresses the constraints that may limit the actions possible with tool use. It considers constraints as a function of tool permissions and the action environment. Some tools may enable read-only actions, while others enable ("write") actions that change state. In practice, many agent implementations limit write access by using tools with restricted interactions or by constraining otherwise effectively unlimited tools like code execution. Some agent implementations may access untrusted resources like the open internet, whereas others are designed for deployment in sanitized settings. This taxonomy reflects access patterns raised in the workshop. Stakeholders may deepen these categories with additional gradations of "write" permissions or trust levels. Such efforts, together with specific knowledge of deployments, may be useful for developing a specific agent risk taxonomy or performing a risk assessment, complementing additional resources like NIST AI 600-1.

Figure 2. Taxonomy of Constrained Tool Access Patterns with Example AI Agent Systems

| Tool Permissions / Environment | Read Only | (Constrained) Write | Write |
|---|---|---|---|
| Trusted Environments | RAG | Application-specific GUI or API use | Coding agent in a trusted repository |
| Untrusted Environments | Deep research | Browser use | Computer use |
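As a minimal sketch of how the "(constrained) write" column might be enforced in scaffolding, the hypothetical file tool below confines writes to an allowlisted directory. The class and method names are illustrative assumptions, not part of any taxonomy or standard.

```python
from pathlib import Path


class ConstrainedFileTool:
    """Hypothetical file tool whose write access is confined to one directory."""

    def __init__(self, writable_root: Path | None = None):
        # None means read-only; otherwise writes are confined to this root.
        self.writable_root = writable_root.resolve() if writable_root else None

    def read(self, path: str) -> str:
        # Reads are always permitted in this sketch.
        return Path(path).read_text()

    def write(self, path: str, content: str) -> None:
        if self.writable_root is None:
            raise PermissionError("tool is configured as read-only")
        target = Path(path).resolve()
        # Reject writes that escape the allowlisted directory.
        if not target.is_relative_to(self.writable_root):
            raise PermissionError(f"write outside {self.writable_root} denied")
        target.write_text(content)
```

Instantiated with writable_root=None, the same tool falls in the read-only column; additional gradations of "write" permissions or trust levels could be layered on in the same way.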

The Consortium contributed valuable expertise to shape these taxonomies and identify additional approaches that can be expanded upon in the future. We invite your feedback and encourage you to adapt and expand upon these findings to serve your needs. Tool taxonomies are one method to improve transparency on capabilities and deployments along the AI agent value chain. We welcome your engagement as we evaluate how best to support stakeholders in AI agent development and deployment. You can share comments via email to CAISI-agents@nist.gov.

Released August 5, 2025, Updated August 7, 2025