Conversation
|
@dmytrostruk please read the following Contributor License Agreement(CLA). If you agree with the CLA, please reply with the following information.
Contributor License AgreementContribution License AgreementThis Contribution License Agreement ( “Agreement” ) is agreed to by the party signing below ( “You” ), 1. Definitions. “Code” means the computer software code, whether in human-readable or machine-executable form, “Project” means any of the projects owned or managed by .NET Foundation and offered under a license “Submit” is the act of uploading, submitting, transmitting, or distributing code or other content to any “Submission” means the Code and any other copyrightable material Submitted by You, including any 2. Your Submission. You must agree to the terms of this Agreement before making a Submission to any 3. Originality of Work. You represent that each of Your Submissions is entirely Your 4. Your Employer. References to “employer” in this Agreement include Your employer or anyone else 5. Licenses. a. Copyright License. You grant .NET Foundation, and those who receive the Submission directly b. Patent License. You grant .NET Foundation, and those who receive the Submission directly or c. Other Rights Reserved. Each party reserves all rights not expressly granted in this Agreement. 6. Representations and Warranties. You represent that You are legally entitled to grant the above 7. Notice to .NET Foundation. You agree to notify .NET Foundation in writing of any facts or 8. Information about Submissions. You agree that contributions to Projects and information about 9. Governing Law/Jurisdiction. This Agreement is governed by the laws of the State of Washington, and 10. Entire Agreement/Assignment. This Agreement is the entire agreement between the parties, and .NET Foundation dedicates this Contribution License Agreement to the public domain according to the Creative Commons CC0 1. |
1 similar comment
|
@dmytrostruk please read the following Contributor License Agreement(CLA). If you agree with the CLA, please reply with the following information.
Contributor License AgreementContribution License AgreementThis Contribution License Agreement ( “Agreement” ) is agreed to by the party signing below ( “You” ), 1. Definitions. “Code” means the computer software code, whether in human-readable or machine-executable form, “Project” means any of the projects owned or managed by .NET Foundation and offered under a license “Submit” is the act of uploading, submitting, transmitting, or distributing code or other content to any “Submission” means the Code and any other copyrightable material Submitted by You, including any 2. Your Submission. You must agree to the terms of this Agreement before making a Submission to any 3. Originality of Work. You represent that each of Your Submissions is entirely Your 4. Your Employer. References to “employer” in this Agreement include Your employer or anyone else 5. Licenses. a. Copyright License. You grant .NET Foundation, and those who receive the Submission directly b. Patent License. You grant .NET Foundation, and those who receive the Submission directly or c. Other Rights Reserved. Each party reserves all rights not expressly granted in this Agreement. 6. Representations and Warranties. You represent that You are legally entitled to grant the above 7. Notice to .NET Foundation. You agree to notify .NET Foundation in writing of any facts or 8. Information about Submissions. You agree that contributions to Projects and information about 9. Governing Law/Jurisdiction. This Agreement is governed by the laws of the State of Washington, and 10. Entire Agreement/Assignment. This Agreement is the entire agreement between the parties, and .NET Foundation dedicates this Contribution License Agreement to the public domain according to the Creative Commons CC0 1. |
There was a problem hiding this comment.
Pull request overview
This PR adds shell tool abstractions to Microsoft.Extensions.AI.Abstractions, enabling AI services to interact with shell commands through both hosted (service-executed) and local (client-executed) patterns. The implementation follows established patterns from similar features like image generation and code interpreter tools.
Changes:
- Added five new experimental types for shell tool interactions: HostedShellTool (marker for service-side execution), ShellTool (abstract base for client-side execution), ShellCallContent (represents shell call requests), ShellResultContent (represents execution results), and ShellCommandOutput (structured output data)
- Registered ShellCallContent and ShellResultContent for polymorphic JSON serialization in AIJsonUtilities
- Added AIShell experimental diagnostic ID to enable [Experimental] marking of all shell-related types
Reviewed changes
Copilot reviewed 13 out of 13 changed files in this pull request and generated no comments.
Show a summary per file
| File | Description |
|---|---|
| src/Shared/DiagnosticIds/DiagnosticIds.cs | Added AIShell experimental diagnostic ID |
| src/Libraries/Microsoft.Extensions.AI.Abstractions/Utilities/AIJsonUtilities.Defaults.cs | Registered ShellCallContent and ShellResultContent for polymorphic serialization with type discriminators "shellCall" and "shellResult" |
| src/Libraries/Microsoft.Extensions.AI.Abstractions/Tools/ShellTool.cs | Abstract base class for client-side shell execution, extends AIFunction with fixed name "local_shell" |
| src/Libraries/Microsoft.Extensions.AI.Abstractions/Tools/HostedShellTool.cs | Marker tool for service-side shell execution with name "shell", extends AITool |
| src/Libraries/Microsoft.Extensions.AI.Abstractions/Contents/ShellResultContent.cs | Represents shell execution results, extends FunctionResultContent with Output and MaxOutputLength properties |
| src/Libraries/Microsoft.Extensions.AI.Abstractions/Contents/ShellCommandOutput.cs | Data class for single command output (stdout, stderr, exit code, timeout flag) |
| src/Libraries/Microsoft.Extensions.AI.Abstractions/Contents/ShellCallContent.cs | Represents shell call requests, extends FunctionCallContent with Commands, TimeoutMs, MaxOutputLength, and Status properties |
| src/Libraries/Microsoft.Extensions.AI.Abstractions/Contents/AIContent.cs | Added commented JsonDerivedType entries for ShellCallContent and ShellResultContent (to be uncommented when no longer experimental) |
| test/Libraries/Microsoft.Extensions.AI.Abstractions.Tests/Tools/ShellToolTests.cs | Tests for ShellTool including constructor, properties, and InvokeAsync with ShellResultContent return |
| test/Libraries/Microsoft.Extensions.AI.Abstractions.Tests/Tools/HostedShellToolTests.cs | Tests for HostedShellTool constructor and AdditionalProperties handling |
| test/Libraries/Microsoft.Extensions.AI.Abstractions.Tests/Contents/ShellResultContentTests.cs | Comprehensive tests including properties, multiple outputs, and polymorphic serialization |
| test/Libraries/Microsoft.Extensions.AI.Abstractions.Tests/Contents/ShellCommandOutputTests.cs | Tests for ShellCommandOutput properties and serialization including timeout scenarios |
| test/Libraries/Microsoft.Extensions.AI.Abstractions.Tests/Contents/ShellCallContentTests.cs | Tests for ShellCallContent including constructor, properties, and polymorphic serialization |
|
Thanks, @dmytrostruk. We'll want to hold on this until OpenAI 2.9.0 is out and we can better validate the design with the shell tool support it exposes, implementing support for these types in Microsoft.Extensions.AI.OpenAI. |
| /// </para> | ||
| /// </remarks> | ||
| [Experimental(DiagnosticIds.Experiments.AIShell, UrlFormat = DiagnosticIds.UrlFormat)] | ||
| public abstract class ShellTool : AIFunction |
There was a problem hiding this comment.
What is the use of this if there are no concrete implementations? Who is expected to provide derived implementations?
There was a problem hiding this comment.
I think you could make it non-abstract and override InvokeCoreAsync to pick between cmd or sh based on the OS. If someone wanted to use a different shell, they could also override the method.
There was a problem hiding this comment.
@jozkee I'm not sure if we will provide default local implementation on our side, since there are a lot of security related concerns. The expectation is that user will provide the implementation based on their shell and OS preferences, in similar way how they provide the implementation for AIFunction.
There was a problem hiding this comment.
Not providing a default (leaving it abstract) is a pit of failure, doesn't reduce risk and will make that every user reimplements it with potentially less safeguards.
| /// <summary> | ||
| /// Gets or sets the timeout in milliseconds for the shell command execution. | ||
| /// </summary> | ||
| public int? TimeoutMs { get; set; } |
There was a problem hiding this comment.
Assuming we want this property, it should be a TimeSpan named Timeout
| /// Represents the output of a single shell command execution. | ||
| /// </summary> | ||
| [Experimental(DiagnosticIds.Experiments.AIShell, UrlFormat = DiagnosticIds.UrlFormat)] | ||
| public class ShellCommandOutput |
| /// <summary> | ||
| /// Gets or sets the output of the shell command execution. | ||
| /// </summary> | ||
| public IList<ShellCommandOutput>? Output { get; set; } |
There was a problem hiding this comment.
I have an open proposal to have a common type for *CallContents, we are debating whether we should have Outputs in the base ToolResultContent and reconcile with FunctionResultContent.Result, and this seems to support that direction: #7299 (comment).
There was a problem hiding this comment.
However, I'm curious, why did you add this instead of reusing Result?
| /// that the service is allowed to execute shell commands if the service is capable of doing so. | ||
| /// </remarks> | ||
| [Experimental(DiagnosticIds.Experiments.AIShell, UrlFormat = DiagnosticIds.UrlFormat)] | ||
| public class HostedShellTool : AITool |
There was a problem hiding this comment.
Which providers run shell commands "on your behalf" i.e. you don't need to run them locally and send the result?
There was a problem hiding this comment.
There was a problem hiding this comment.
I'm not sure how we would do it but there's also the option to reference an existing container https://developers.openai.com/api/docs/guides/tools-shell/#2-reference-the-container-in-responses.
Additionally, there's multiple knobs available in OpenAI, which I'm unsure if we should consider for the abstraction.
| /// </para> | ||
| /// </remarks> | ||
| [Experimental(DiagnosticIds.Experiments.AIShell, UrlFormat = DiagnosticIds.UrlFormat)] | ||
| public abstract class ShellTool : AIFunction |
There was a problem hiding this comment.
I think you could make it non-abstract and override InvokeCoreAsync to pick between cmd or sh based on the OS. If someone wanted to use a different shell, they could also override the method.
| /// <summary> | ||
| /// Gets or sets the status of the shell call. | ||
| /// </summary> | ||
| public string? Status { get; set; } |
There was a problem hiding this comment.
This seems meaningful only for remote shell execution, is that correct? Also, this seems to indicate that you could receive an in-progress ShellCallContent for a remote shell, without its corresponding ShellResultContent, in that case, I think the leaf IChatClient should set InformationalOnly so a FunctionInvokingChatClient doesn't have to handle it.
Another option could be to add another type HostedShellCallContent : ShellCallContent and move Status there, but that may be overkill.
Added shell tool abstractions to
Microsoft.Extensions.AI.Abstractionsfor representing shell tool interactions with AI services.New types:
Registered ShellCallContent and ShellResultContent for polymorphic JSON serialization. All types are marked [Experimental].
Microsoft Reviewers: Open in CodeFlow