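Action Execution Server

Architecture Overview

The Action Execution Server is designed to run inside a sandboxed environment (typically a Docker container) and communicates with the OpenHands backend over HTTP. It exposes endpoints that accept action execution requests and return observations.

Sources: openhands/runtime/action_execution_server.py:1-150, openhands/runtime/impl/action_execution/action_execution_client.py:58-96

Key Components

ActionExecutor

The ActionExecutor class is the core component that handles action execution inside the sandbox. It maintains the state of the execution environment and provides one method per action type.

Sources: openhands/runtime/action_execution_server.py:166-618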

ActionExecutionClient

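The ActionExecutionClient class is the base class for runtime implementations that communicate with the Action Execution Server. It sends HTTP requests, serializes actions, and deserializes observations.

Sources: openhands/runtime/impl/action_execution/action_execution_client.py:59-488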
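
The serialize/deserialize round trip described above can be sketched with the standard library alone. This is an illustration under stated assumptions, not OpenHands code: `SketchAction`, `SketchObservation`, `serialize_action`, and `deserialize_observation` are hypothetical names, and the field layout of both messages is assumed.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class SketchAction:
    """Hypothetical stand-in for an action such as a shell command."""
    type: str
    command: str
    timeout: int = 30


@dataclass
class SketchObservation:
    """Hypothetical stand-in for the result returned by the server."""
    content: str
    exit_code: int


def serialize_action(action: SketchAction) -> bytes:
    # Assumed envelope: the action dict is nested under an "action" key
    return json.dumps({"action": asdict(action)}).encode("utf-8")


def deserialize_observation(body: bytes) -> SketchObservation:
    # Rebuild a typed observation from the JSON body of a response
    data = json.loads(body.decode("utf-8"))
    return SketchObservation(content=data["content"], exit_code=data["exit_code"])


payload = serialize_action(SketchAction(type="run", command="ls -la"))
observation = deserialize_observation(b'{"content": "total 0", "exit_code": 0}')
```

Keeping serialization in the client base class means each runtime implementation only has to decide where the server runs, not how messages are encoded.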

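Server Setup and Initialization

The Action Execution Server is implemented as a FastAPI application that initializes an ActionExecutor instance during its startup lifespan. The server runs on a specified port and exposes endpoints for action execution and file operations.

When the server starts, it:

Parses command-line arguments (port, working directory, plugins, and so on)
Creates an ActionExecutor instance inside the FastAPI lifespan context
Initializes the executor with plugins, a bash session, the browser environment, and so on
Starts listening for HTTP requests

Sources: openhands/runtime/action_execution_server.py:533-581, openhands/runtime/action_execution_server.py:568-580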
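
The startup sequence above can be approximated as follows. This is a minimal sketch that avoids the FastAPI dependency: the flag names (`--working-dir`, `--plugins`), the `SketchExecutor` class, and the `lifespan` helper are all assumptions made for illustration, and the async context manager only mimics the setup-before-yield, teardown-after-yield shape of a lifespan handler.

```python
import argparse
import asyncio
from contextlib import asynccontextmanager


class SketchExecutor:
    """Hypothetical stand-in for the executor; it only tracks readiness."""

    def __init__(self, work_dir: str, plugins: list[str]) -> None:
        self.work_dir = work_dir
        self.plugins = plugins
        self.ready = False

    async def ainit(self) -> None:
        # A real executor would start plugins, a bash session, a browser, etc.
        self.ready = True

    def close(self) -> None:
        self.ready = False


def parse_args(argv: list[str]) -> argparse.Namespace:
    parser = argparse.ArgumentParser(description="Sketch of server startup options")
    parser.add_argument("port", type=int)  # assumed positional argument
    parser.add_argument("--working-dir", default="/workspace")  # assumed flag name
    parser.add_argument("--plugins", nargs="*", default=[])  # assumed flag name
    return parser.parse_args(argv)


@asynccontextmanager
async def lifespan(args: argparse.Namespace):
    # Set up the executor before yielding, tear it down afterwards
    executor = SketchExecutor(args.working_dir, args.plugins)
    await executor.ainit()
    try:
        yield executor
    finally:
        executor.close()


async def main(argv: list[str]) -> tuple[int, bool, bool]:
    args = parse_args(argv)
    async with lifespan(args) as executor:
        ready_inside = executor.ready  # requests would be served here
    return args.port, ready_inside, executor.ready


result = asyncio.run(main(["8000", "--working-dir", "/tmp/ws", "--plugins", "jupyter", "vscode"]))
```

Tying the executor to the lifespan context guarantees that cleanup runs when the server shuts down, even if request handling raised an error.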

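Supported Action Types

The Action Execution Server supports the following action types:

Action Type	Description	Implementation Method
`run`	Executes a shell command	`run(action: CmdRunAction)`
`run_ipython`	Executes Python code in IPython	`run_ipython(action: IPythonRunCellAction)`
`read`	Reads the contents of a file	`read(action: FileReadAction)`
`write`	Writes to a file	`write(action: FileWriteAction)`
`edit`	Edits a file through higher-level operations	`edit(action: FileEditAction)`
`browse`	Interacts with a web page	`browse(action: BrowseURLAction)`
`browse_interactive`	Browses the web interactively	`browse_interactive(action: BrowseInteractiveAction)`
`call_tool_mcp`	Executes an MCP (Model Context Protocol) tool call	`call_tool_mcp(action: MCPAction)`

Each action type has a corresponding method on the ActionExecutor class that handles that action and returns the appropriate observation.

Platform note: browser and MCP functionality are disabled on Windows because of system limitations.

Sources: openhands/runtime/action_execution_server.py:384-611, openhands/runtime/impl/action_execution/action_execution_client.py:333-352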
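
One way to picture the "one method per action type" layout is a name-based dispatch. The sketch below is an assumption about the general pattern, not the project's actual dispatch code: `SketchAction`, `SketchExecutor`, `SUPPORTED`, and `execute` are hypothetical, and only two of the action types are modeled.

```python
from dataclasses import dataclass


@dataclass
class SketchAction:
    """Hypothetical action: a type name plus a single payload string."""
    type: str
    payload: str


class SketchExecutor:
    """Hypothetical executor with one method per supported action type."""

    # Allow-list of action types this sketch knows how to handle
    SUPPORTED = ("run", "read")

    def run(self, action: SketchAction) -> str:
        return f"ran command: {action.payload}"

    def read(self, action: SketchAction) -> str:
        return f"read file: {action.payload}"

    def execute(self, action: SketchAction) -> str:
        # Reject unknown types before looking up the method by name
        if action.type not in self.SUPPORTED:
            return f"error: unsupported action type {action.type!r}"
        handler = getattr(self, action.type)
        return handler(action)


executor = SketchExecutor()
ok = executor.execute(SketchAction(type="run", payload="ls -la"))
rejected = executor.execute(SketchAction(type="browse", payload="https://example.com"))
```

The explicit allow-list keeps the lookup from reaching arbitrary attributes, which matters when the action type arrives over HTTP.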

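HTTP API Endpoints

The Action Execution Server exposes several HTTP API endpoints.

Sources: openhands/runtime/action_execution_server.py:790-1039

Core Endpoint: `/execute_action`

The main API endpoint is /execute_action. It accepts an action request, executes the action, and returns the resulting observation.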

POST /execute_action
{
    "action": {
        "type": "run",
        "command": "ls -la",
        "timeout": 30
    }
}

The server deserializes the action, executes it through the ActionExecutor, and returns the serialized observation.

Sources: openhands/runtime/action_execution_server.py:639-655
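
From the client side, a call to this endpoint might be assembled as shown below. This is a sketch using only `urllib.request` from the standard library; the base URL is an assumed address, `build_execute_request` is a hypothetical helper, and the request is built but not sent so that the example runs without a server.

```python
import json
import urllib.request


def build_execute_request(base_url: str, action: dict) -> urllib.request.Request:
    # Wrap the action in the {"action": ...} envelope and encode it as JSON
    body = json.dumps({"action": action}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/execute_action",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_execute_request(
    "http://localhost:8000",  # assumed address of a running server
    {"type": "run", "command": "ls -la", "timeout": 30},
)

# Sending is left commented out so the sketch runs without a server:
# with urllib.request.urlopen(request) as response:
#     observation = json.load(response)
```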

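File System Endpoints

Several endpoints support file operations:

/list_files - Lists the files in a specified directory
/upload_file - Uploads a file to the sandbox
/download_files - Downloads files from the sandbox
/view - Generates HTML for viewing a file's contents

These endpoints let the OpenHands frontend display and interact with files in the sandbox.

Sources: openhands/runtime/action_execution_server.py:657-888, server/routes/files.py:1-284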
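
As a rough idea of what a directory-listing handler has to compute, the sketch below lists a directory with the standard library. The ordering (directories first) and the trailing slash on directory names are assumed conventions for this example, and `list_files` here is a hypothetical function, not the server's endpoint implementation.

```python
import tempfile
from pathlib import Path


def list_files(directory: Path) -> list[str]:
    # Directories sort before files; names are compared case-insensitively
    entries = sorted(directory.iterdir(), key=lambda p: (p.is_file(), p.name.lower()))
    # Mark directories with a trailing slash (an assumed convention)
    return [f"{p.name}/" if p.is_dir() else p.name for p in entries]


# Build a throwaway directory tree to demonstrate the listing
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "src").mkdir()
    (root / "README.md").write_text("hello")
    (root / "app.py").write_text("print('hi')")
    listing = list_files(root)
```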

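Plugin System

The Action Execution Server supports plugins that extend its functionality. Plugins are initialized at server startup and can provide additional execution environments, such as a Jupyter notebook or VS Code integration.

Plugins are specified during server initialization and loaded in the ActionExecutor's ainit() method.

Sources: openhands/runtime/action_execution_server.py:261-299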
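
The loading step can be pictured as a registry lookup followed by an async initialization call per plugin. Everything in this sketch is hypothetical apart from the method name `ainit`, which the text mentions: the `SketchPlugin` interface, the `PLUGIN_REGISTRY` mapping, and the plugin class names are assumptions chosen for illustration.

```python
import asyncio


class SketchPlugin:
    """Hypothetical plugin interface: a name plus an async initialize hook."""

    name = "base"

    def __init__(self) -> None:
        self.initialized = False

    async def initialize(self) -> None:
        # A real plugin might start a kernel or an editor server here
        self.initialized = True


class JupyterSketch(SketchPlugin):
    name = "jupyter"


class VSCodeSketch(SketchPlugin):
    name = "vscode"


# Registry mapping plugin names to classes (names are illustrative)
PLUGIN_REGISTRY = {cls.name: cls for cls in (JupyterSketch, VSCodeSketch)}


class SketchExecutor:
    def __init__(self, plugin_names: list[str]) -> None:
        self.plugin_names = plugin_names
        self.plugins: dict[str, SketchPlugin] = {}

    async def ainit(self) -> None:
        # Instantiate and initialize each requested plugin during async startup
        for name in self.plugin_names:
            if name not in PLUGIN_REGISTRY:
                raise ValueError(f"unknown plugin: {name}")
            plugin = PLUGIN_REGISTRY[name]()
            await plugin.initialize()
            self.plugins[name] = plugin


executor = SketchExecutor(["jupyter"])
asyncio.run(executor.ainit())
```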

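Frontend Integration

The OpenHands frontend interacts with the Action Execution Server through several mechanisms.

File System Integration

The frontend performs file system operations through the Action Execution Server's file endpoints.

Git Integration

Frontend components such as FileDiffViewer rely on the Action Execution Server's command execution capability to run git-related operations and display file changes and diffs.

Sources: frontend/src/components/features/diff-viewer/file-diff-viewer.tsx:62-71, openhands/runtime/action_execution_server.py:790-1039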

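Security Considerations

The Action Execution Server implements several security measures:

API key authentication: the server can be configured with a session API key, which clients must supply in the X-Session-API-Key header.
Sandbox isolation: the server runs inside a sandboxed environment (typically a Docker container), which limits the impact of code execution.
File access control: file operations are performed relative to the working directory to prevent access to arbitrary files on the host.
Memory monitoring: the server can monitor memory usage and enforce limits to prevent resource exhaustion.
Timeout enforcement: actions have configurable timeouts to stop hanging or resource-intensive operations.

Sources: openhands/runtime/action_execution_server.py:77-84, openhands/runtime/action_execution_server.py:172-186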
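
Two of the measures above, the header check and the working-directory restriction, lend themselves to a short sketch. The header name comes from the text; the functions `is_authorized` and `resolve_in_workdir`, the constant-time comparison, and the rule that a missing key disables authentication are assumptions about how such checks could be written, not a description of the server's code.

```python
import hmac
from pathlib import Path
from typing import Optional


def is_authorized(headers: dict[str, str], expected_key: Optional[str]) -> bool:
    # Assumption: no configured key means authentication is disabled
    if expected_key is None:
        return True
    provided = headers.get("X-Session-API-Key", "")
    # Constant-time comparison avoids leaking key contents through timing
    return hmac.compare_digest(provided.encode(), expected_key.encode())


def resolve_in_workdir(work_dir: Path, requested: str) -> Path:
    # Resolve the request relative to the working directory and reject escapes
    base = work_dir.resolve()
    candidate = (base / requested).resolve()
    if not candidate.is_relative_to(base):
        raise PermissionError(f"path escapes working directory: {requested}")
    return candidate


allowed = is_authorized({"X-Session-API-Key": "s3cret"}, "s3cret")
denied = is_authorized({}, "s3cret")
safe_path = resolve_in_workdir(Path("/tmp"), "project/notes.txt")
```

Resolving both paths before comparing them is what catches `..` segments and absolute paths; comparing raw strings would not.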

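Integration with the Runtime System

The Action Execution Server is instantiated by the various runtime implementations. These runtimes communicate with the server through the ActionExecutionClient base class.

Runtime Implementation Responsibilities

Each runtime implementation is responsible for:

Server lifecycle management: starting the Action Execution Server in the appropriate environment
Communication setup: providing the server's URL and authentication
Environment configuration: setting up a sandbox with the required plugins and configuration
MCP configuration: managing the MCP server configuration used for tool integration
Resource management: handling cleanup and resource disposal

Special Cases

LocalRuntime: uses Poetry to start the server as a subprocess on the local machine
CLIRuntime: implements actions directly, without using the Action Execution Server
Server-based Runtimes: start the server in a container or remote environment

Sources: openhands/runtime/impl/action_execution/action_execution_client.py:59-488, openhands/runtime/impl/local/local_runtime.py:125-559, openhands/runtime/impl/cli/cli_runtime.py:54-776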
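
The division of labor between a shared client base class and its runtime subclasses can be sketched as an abstract class with lifecycle hooks. The names below (`SketchRuntimeClient`, `LocalSketchRuntime`, `start_server`, `stop_server`, `connect`, `close`) are hypothetical and chosen for this example; the local variant only records state where a real runtime would spawn a process.

```python
from abc import ABC, abstractmethod
from typing import Optional


class SketchRuntimeClient(ABC):
    """Hypothetical base class: subclasses decide where the server runs."""

    def __init__(self) -> None:
        self.api_url: Optional[str] = None

    @abstractmethod
    def start_server(self) -> str:
        """Start the server and return its base URL."""

    @abstractmethod
    def stop_server(self) -> None:
        """Stop the server and release its resources."""

    def connect(self) -> None:
        # Lifecycle management plus communication setup
        self.api_url = self.start_server()

    def close(self) -> None:
        # Resource management: stop the server and forget its URL
        self.stop_server()
        self.api_url = None


class LocalSketchRuntime(SketchRuntimeClient):
    def __init__(self, port: int) -> None:
        super().__init__()
        self.port = port
        self.running = False

    def start_server(self) -> str:
        # A real local runtime would spawn a subprocess here
        self.running = True
        return f"http://localhost:{self.port}"

    def stop_server(self) -> None:
        self.running = False


runtime = LocalSketchRuntime(port=8000)
runtime.connect()
url_while_running = runtime.api_url
runtime.close()
```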

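Conclusion

The Action Execution Server is a key component of the OpenHands runtime system that allows agent actions to be executed safely in an isolated environment. It provides a standardized HTTP API for action execution and file operations, supports plugins that extend its functionality, and includes security measures against unauthorized access and resource abuse.

Because the server is modular, the same server can be reused by different runtime implementations to execute agent actions in local, containerized, and remote environments.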