Llama 3.3

目的与范围

本文档提供了Llama 3.3的技术文档，这是一种多语言大型语言模型，具有改进的多模态功能、高级文本生成能力和增强的工具调用功能。Llama 3.3 提供 70B 参数尺寸，并支持 128K token 的上下文长度。本页面重点介绍模型的架构、提示格式、工具调用能力和使用模式。有关其他 Llama 模型的信息，请参阅Llama 模型版本，有关多模态功能的具体细节，请参阅多模态功能。

模型概述

Llama 3.3 是 Meta 的多语言文本生成模型，它在 Llama 3.2 的基础上构建，同时提供了改进的多模态支持。它专为跨多种语言的商业和研究用途而设计，并支持工具调用和代码解释等高级功能。

关键规格

参数	值
参数大小	70B
上下文长度	128K token
训练数据	15+ 万亿 token
知识截止日期	2023年12月
支持的语言	英语、德语、法语、意大利语、葡萄牙语、印地语、西班牙语、泰语
模型架构	带有分组查询注意力 (GQA) 的 Transformer
输入模态	多语言文本
输出模态	多语言文本和代码

来源：models/llama3_3/MODEL_CARD.md8-11

模型架构和功能

Llama 3.3 采用优化的 Transformer 架构和分组查询注意力（Grouped-Query Attention），以实现高效推理。该模型通过预训练和指令微调（SFT 和 RLHF）相结合的方式进行训练，以使其与人类在有用性和安全性方面的偏好保持一致。

来源：models/llama3_3/MODEL_CARD.md7-11

特殊标记和角色

Llama 3.3 使用一组特殊标记来构建提示并控制生成行为。理解这些标记对于有效使用模型至关重要。

特殊标记

标记	描述
`<\|begin_of_text\|>`	指定提示的开始
`<\|end_of_text\|>`	模型将停止生成更多 token（仅限基础模型）
`<\|finetune_right_pad_id\|>`	用于填充文本序列
`<\|start_header_id\|>` 和 `<\|end_header_id\|>`	包含特定消息的角色
`<\|eom_id\|>`	消息结束（用于工具交互）
`<\|eot_id\|>`	回合结束（表示响应完成）
`<\|python_tag\|>`	用于工具调用的特殊标签

支持的角色

Llama 3.3 支持四种不同的对话角色

system: 为模型设置上下文和指导原则
user: 表示人类对模型的输入
tool: 包含工具调用的输出
assistant: 表示模型的响应

来源：models/llama3_3/prompt_format.md5-23

提示格式

Llama 3.3 支持根据不同用例使用不同的提示格式。以下是主要格式

基础模型文本补全

用于基础模型的简单文本补全

<|begin_of_text|>Color of sky is blue but sometimes can also be

模型将根据提示继续生成文本。

指令模型对话

用于用户和助手之间的结构化对话

<|begin_of_text|><|start_header_id|>system<|end_header_id|>

You are a helpful assistant<|eot_id|><|start_header_id|>user<|end_header_id|>

Answer who are you in the form of jeopardy?<|eot_id|><|start_header_id|>assistant<|end_header_id|>

模型会返回相应内容，后跟 <|eot_id|>。

来源：models/llama3_3/prompt_format.md25-66

工具调用能力

Llama 3.3 提供高级工具调用功能，允许与外部函数和服务集成。工具调用有多种方法

零样本函数调用

对于无需预定义工具的灵活函数调用，可以在系统或用户消息中定义函数

<|begin_of_text|><|start_header_id|>system<|end_header_id|>

You are an expert in composing functions. You are given a question and a set of possible functions.
Based on the question, you will need to make one or more function/tool calls to achieve the purpose.
...
[
    {
        "name": "get_weather",
        "description": "Get weather info for places",
        "parameters": {
            "type": "dict",
            "required": [
                "city"
            ],
            "properties": {
                "city": {
                    "type": "string",
                    "description": "The name of the city to get the weather for"
                },
                "metric": {
                    "type": "string",
                    "description": "The metric for weather. Options are: celsius, fahrenheit",
                    "default": "celsius"
                }
            }
        }
    }
]<|eot_id|><|start_header_id|>user<|end_header_id|>

What is the weather in SF and Seattle?<|eot_id|><|start_header_id|>assistant<|end_header_id|>

模型以以下格式响应函数调用

[get_weather(city='San Francisco', metric='celsius'), get_weather(city='Seattle', metric='celsius')]<|eot_id|>

内置工具调用

Llama 3.3 支持三种内置工具

Brave Search：用于网页搜索
Wolfram Alpha：用于复杂数学计算
代码解释器：用于执行 Python 代码

这些可以通过系统提示启用

<|begin_of_text|><|start_header_id|>system<|end_header_id|>

Environment: ipython
Tools: brave_search, wolfram_alpha
...
<|eot_id|><|start_header_id|>user<|end_header_id|>

Search the web for the latest price of 1oz gold?<|eot_id|><|start_header_id|>assistant<|end_header_id|>

模型以工具调用格式响应

<|python_tag|>brave_search.call(query="latest price of 1oz gold")<|eom_id|>

来源：models/llama3_3/prompt_format.md73-272

基于 JSON 的工具调用

Llama 3.3 也可以输出 JSON 格式的工具调用

<|begin_of_text|><|start_header_id|>system<|end_header_id|>

Environment: ipython

Cutting Knowledge Date: December 2023
Today Date: 21 September 2024

You are a helpful assistant.
<|eot_id|><|start_header_id|>user<|end_header_id|>

Answer the user's question by making use of the following functions if needed.
If none of the function can be used, please say so.
Here is a list of functions in JSON format:
{
    "type": "function",
    "function": {
        "name": "trending_songs",
        "description": "Returns the trending songs on a Music site",
        "parameters": {
            "type": "object",
            "properties": [
                {
                    "n": {
                        "type": "object",
                        "description": "The number of songs to return"
                    }
                },
                {
                    "genre": {
                        "type": "object",
                        "description": "The genre of the songs to return"
                    }
                }
            ],
            "required": ["n"]
        }
    }
}

Return function calls in JSON format.<|eot_id|><|start_header_id|>user<|end_header_id|>

Use tools to get latest trending songs<|eot_id|><|start_header_id|>assistant<|end_header_id|>

模型响应为

<|python_tag|>{
    "type": "function",
    "name": "trending_songs",
    "parameters": {
        "n": "10",
        "genre": "all"
    }
}<|eom_id|>

来源：models/llama3_3/prompt_format.md346-422

基于自定义函数标签的工具调用

用于自定义工具调用格式

<|begin_of_text|><|start_header_id|>system<|end_header_id|>

Environment: ipython

Cutting Knowledge Date: December 2023
Today Date: 21 September 2024

You are a helpful assistant.
<|eot_id|><|start_header_id|>user<|end_header_id|>

You have access to the following functions:

Use the function 'trending_songs' to 'Returns the trending songs on a Music site':
{"name": "trending_songs", "description": "Returns the trending songs on a Music site", "parameters": {"genre": {"description": "The genre of the songs to return", "param_type": "str", "required": false}, "n": {"description": "The number of songs to return", "param_type": "int", "required": true}}}

Think very carefully before calling functions.
If you choose to call a function ONLY reply in the following format with no prefix or suffix:

<function=example_function_name>{"example_name": "example_value"}</function>
...

模型响应为

<function=trending_songs>{"n": 10}</function><|eot_id|>

来源：models/llama3_3/prompt_format.md424-477

完整的工具交互周期

与工具的完整交互周期涉及用户、助手和工具之间的多条消息

该周期允许模型在生成最终响应之前使用工具回答问题、执行计算或检索信息。

来源：models/llama3_3/prompt_format.md274-343

代码解释

Llama 3.3 包含一个内置的代码解释器功能，可以通过系统提示 Environment: ipython 激活。这使得模型能够生成和执行 Python 代码

<|begin_of_text|><|start_header_id|>system<|end_header_id|>

Environment: ipython<|eot_id|><|start_header_id|>user<|end_header_id|>

Write code to check if number is prime, use that to see if the number 7 is prime<|eot_id|><|start_header_id|>assistant<|end_header_id|>

模型响应为可执行的 Python 代码

<|python_tag|>def is_prime(n):
    if n <= 1
        return False
    for i in range(2, int(n**0.5) + 1):
        if n % i == 0:
            return False
    return True

print(is_prime(7))  # Output: True<|eom_id|>

来源：models/llama3_3/prompt_format.md239-272

性能与基准测试

Llama 3.3 在各种基准测试中表现出强大的性能，尤其是在多语言能力、代码生成和数学推理方面。

基准比较

类别	基准测试	指标	Llama 3.1 70B Instruct	Llama 3.3 70B Instruct	Llama 3.1 405B Instruct
通用	MMLU (CoT)	macro_avg/acc	86.0	86.0	88.6
可控性	IFEval	-	87.5	92.1	88.6
推理	GPQA Diamond (CoT)	acc	48.0	50.5	49.0
代码	HumanEval	pass@1	80.5	88.4	89.0
数学	MATH (CoT)	sympy_intersection_score	68.0	77.0	73.8
多语言	MGSM	em	86.9	91.1	91.6

Llama 3.3 70B 在可控性 (IFEval)、代码生成 (HumanEval)、数学推理 (MATH) 和多语言能力 (MGSM) 方面比 Llama 3.1 70B 有显著改进。

来源：models/llama3_3/MODEL_CARD.md55-74

系统集成架构

下图展示了 Llama 3.3 如何与应用程序中的各种组件集成

该架构展示了从用户输入到模型，再到直接文本响应或可能需要额外处理才能生成最终输出的工具调用的流程。

来源：models/llama3_3/prompt_format.md

限制和注意事项

尽管 Llama 3.3 提供了高级功能，但在使用该模型时仍有几个重要注意事项

知识截止日期：模型的知识仅限于 2023 年 12 月之前的信息。
语言支持：主要支持 8 种语言（英语、德语、法语、意大利语、葡萄牙语、印地语、西班牙语和泰语）。
安全措施：模型包含安全微调，但部署时应配备额外的系统级安全防护。
工具使用责任：将模型与工具集成的开发人员应实施适当的安全措施并评估第三方服务。
许可限制：使用受 Llama 3.3 社区许可协议和可接受使用政策的约束。

来源：models/llama3_3/MODEL_CARD.md94-109 models/llama3_3/USE_POLICY.md1-72

结论

Llama 3.3 代表了 Meta 在大型语言模型能力方面的持续进步，在多语言支持、工具调用以及编码和数学推理等专业任务方面提供了改进。其灵活的架构和全面的提示格式能够支持广泛的应用，同时在基准测试任务中保持强大的性能。