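Tool calls (aka function calls)
Learn how to implement tool calls, manage the agentic loop, and incorporate human-in-the-loop interactions using the Firebase AI Logic SDK.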
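While it's true that LLMs are trained on essentially the entire internet, they don't know everything.
They know what was on the public internet as of the day they were trained, but nothing newer. They don't know private information about you or your organization. And even the things they do know can easily get tangled up with other things they know.
For these scenarios and many others, we often provide an LLM with one or more tools.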
Definition of a tool
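A tool consists of a name, a description, and a JSON schema for the format of the input data that the LLM supplies when it "calls" the tool. For example, if we prompt an LLM to "reduce the carbs in Grandma's All American Breakfast recipe," it has no idea what Grandma's recipe is unless we provide a lookupRecipe tool that takes a query string and looks up the recipe.
Conceptually, a tool is something we make available for the LLM to call when it needs that data or service. The LLM calls a tool by responding to the app's request with a specially formatted message that represents a "tool call"; the tool call message contains the name of the tool and its JSON arguments. The app handles the tool call and bundles the result into another request to the LLM, which then responds accordingly.
This can continue for multiple rounds.
An app can configure a model instance with any number of tools (although LLMs do better with a small, focused set of tools whose functionality doesn't overlap). The LLM can bundle any number of tool calls into a response and can receive any number of tool results in a request. The LLM ties together the multi-turn exchange of prompts and tool results through the stack of messages that forms the history of request/response pairs.
Once the tool calls are done, the LLM returns its final response, for example, "Here's a high-protein, low-carb version of Grandma's All American Breakfast recipe..."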
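The round trip described above can be sketched in plain Dart. This is a toy model only: the ToolDefinition and ToolCall classes, the handleToolCall function, and the placeholder recipe text are all hypothetical names invented for illustration, and none of them belong to the Firebase AI Logic SDK.

```dart
// A toy model of the tool-call round trip. Nothing here is Firebase AI Logic
// API; every type and name is hypothetical and exists only for illustration.

/// What the app advertises to the LLM: a name, a description, and a JSON
/// schema for the arguments.
class ToolDefinition {
  const ToolDefinition(this.name, this.description, this.inputSchema);
  final String name;
  final String description;
  final Map<String, Object?> inputSchema;
}

/// What the LLM sends back when it wants a tool run: a name plus JSON args.
class ToolCall {
  const ToolCall(this.name, this.args);
  final String name;
  final Map<String, Object?> args;
}

const lookupRecipe = ToolDefinition(
  'lookupRecipe',
  'Looks up a recipe by name.',
  {
    'type': 'object',
    'properties': {
      'query': {'type': 'string'},
    },
    'required': ['query'],
  },
);

/// The app's side of the exchange: run the named tool and return a result
/// that would be bundled into the next request to the LLM.
Map<String, Object?> handleToolCall(ToolCall call) {
  switch (call.name) {
    case 'lookupRecipe':
      final query = call.args['query'] as String;
      // Stand-in data; a real app would query its own recipe store.
      return {'recipe': 'Placeholder recipe text for "$query"'};
    default:
      // Report unknown tools back to the LLM instead of crashing.
      return {'error': 'Unknown tool: ${call.name}'};
  }
}

void main() {
  const call = ToolCall(
    'lookupRecipe',
    {'query': "Grandma's All American Breakfast"},
  );
  print(handleToolCall(call));
}
```

Returning an error map for an unknown tool, rather than throwing, gives the LLM a chance to recover on the next round.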
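Gemini functions
In the Firebase AI Logic SDK, a tool is called a "function," but it means the same thing. In the sample, the clue solver model is configured with a function that looks up details about a word. If the LLM needs word details to help solve a clue, calling the function fetches data from the Free Dictionary API: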
[
{
"word": "tool",
"phonetic": "/tuːl/",
"phonetics": [
{
"text": "/tuːl/",
"audio": "https://api.dictionaryapi.dev/media/pronunciations/en/tool-uk.mp3",
"sourceUrl": "https://commons.wikimedia.org/w/index.php?curid=94709459",
"license": {
"name": "BY-SA 4.0",
"url": "https://creativecommons.org/licenses/by-sa/4.0"
}
}
],
"meanings": [
{
"partOfSpeech": "noun",
"definitions": [
{
"definition": "A mechanical device intended to make a task easier.",
"synonyms": [],
"antonyms": [],
"example": "Hand me that tool, would you? I don't have the right tools to start fiddling around with the engine."
},
...
The app has a Dart function that performs the lookup:
// Look up the metadata for a word in the dictionary API.
Future<Map<String, dynamic>> _getWordMetadataFromApi(String word) async {
final url = Uri.parse(
'https://api.dictionaryapi.dev/api/v2/entries/en/${Uri.encodeComponent(word)}',
);
final response = await http.get(url);
return response.statusCode == 200
? {'result': jsonDecode(response.body)}
: {'error': 'Could not find a definition for "$word".'};
}
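The dictionary response carries far more than the part of speech that the tool description advertises. As a hedged illustration, and not something the sample itself does, a helper like the one below could trim the decoded response before it is returned to the model. The partsOfSpeech name is hypothetical, and the shape it expects (a list of entries, each with a meanings list) is taken from the JSON excerpt shown earlier.

```dart
import 'dart:convert';

/// Collects the distinct parts of speech from a decoded Free Dictionary API
/// response (a list of entries, each with a list of `meanings`).
/// Hypothetical helper; not part of the sample app.
List<String> partsOfSpeech(Object? decoded) {
  final result = <String>{};
  // Anything other than a list (for example, an error object) yields nothing.
  if (decoded is! List) return [];
  for (final entry in decoded) {
    if (entry is! Map) continue;
    final meanings = entry['meanings'];
    if (meanings is! List) continue;
    for (final meaning in meanings) {
      if (meaning is Map && meaning['partOfSpeech'] is String) {
        result.add(meaning['partOfSpeech'] as String);
      }
    }
  }
  return result.toList();
}

void main() {
  const body = '[{"word":"tool","meanings":['
      '{"partOfSpeech":"noun"},{"partOfSpeech":"verb"},{"partOfSpeech":"noun"}'
      ']}]';
  print(partsOfSpeech(jsonDecode(body))); // [noun, verb]
}
```

A smaller tool result keeps the message history short, which matters because the whole history is sent again on every round.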
The model is configured with the lookup function when it's initialized:
// The model for solving clues.
_clueSolverModel = FirebaseAI.googleAI().generativeModel(
model: 'gemini-2.5-flash',
systemInstruction: Content.text(clueSolverSystemInstruction),
tools: [
Tool.functionDeclarations([
FunctionDeclaration(
'getWordMetadata',
'Gets grammatical metadata for a word, like its part of speech. '
'Best used to verify a candidate answer against a clue that implies a '
'grammatical constraint.',
parameters: {
'word': Schema(SchemaType.string, description: 'The word to look up.'),
},
),
]),
],
);
For reliability, it's also a good idea to list the tool in the system instruction:
static String get clueSolverSystemInstruction =>
'''
You are an expert crossword puzzle solver.
...
### Tool: `getWordMetadata`
You have a tool to get grammatical information about a word.
**When to use:**
- This tool is most helpful as a verification step after you have a likely answer.
- Consider using this tool when a clue contains a grammatical hint that could be ambiguous.
- **Good candidates for verification:**
- Clues that seem to be verbs (e.g., "To run," "Waving").
- Clues that are adverbs (e.g., "Happily," "Quickly").
- Clues that specify a plural form.
- **Try to avoid using the tool for:**
- Simple definitions (e.g., "A small dog").
- Fill-in-the-blank clues (e.g., "___ and flow").
- Proper nouns (e.g., "Capital of France").
**Function signature:**
```json
${jsonEncode(_getWordMetadataFunction.toJson())}
```
''';
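When the app makes a request, the model can use the tool whenever it thinks that would help. To support tool calls, you need to implement an agentic loop.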
The agentic loop
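LLMs are functionally stateless, which means that every request has to supply all of the data the model needs. If a request consists only of a prompt and any attached files, the Firebase AI Logic SDK provides the generateContent method on the model instance.
Tool calls, however, require a message history made up of the initial prompt plus the response/request pairs that carry the tool calls and tool results. For this, Firebase AI Logic provides a "chat" object that collects the history. We use it to build an agentic loop: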
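- Start a chat to hold the message history across multiple request/response pairs
- Collect tool results for any tool calls the model provides
- Bundle the tool results into a new request
- Loop until the model returns a response with no tool calls
- Return the text accumulated from all of the responses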
The following expresses this algorithm as an extension method on the GenerativeModel class, so it can be called just like generateContent:
extension on GenerativeModel {
Future<String> generateContentWithFunctions({
required String prompt,
required Future<Map<String, dynamic>> Function(FunctionCall) onFunctionCall,
}) async {
// Use a chat session to support multiple request/response pairs, which is
// needed to support function calls.
final chat = startChat();
final buffer = StringBuffer();
var response = await chat.sendMessage(Content.text(prompt));
while (true) {
// Append the response text to the buffer.
buffer.write(response.text ?? '');
// If no function calls were collected, we're done
if (response.functionCalls.isEmpty) break;
// Append a newline to separate responses.
buffer.write('\n');
// Execute all function calls
final functionResponses = <FunctionResponse>[];
for (final functionCall in response.functionCalls) {
try {
functionResponses.add(
FunctionResponse(
functionCall.name,
await onFunctionCall(functionCall),
),
);
} catch (ex) {
functionResponses.add(
FunctionResponse(functionCall.name, {'error': ex.toString()}),
);
}
}
// Get the next response stream with function results
response = await chat.sendMessage(
Content.functionResponses(functionResponses),
);
}
return buffer.toString();
}
}
The method takes a prompt and a callback that handles the specific tool calls; the sample uses it to handle the word lookup function:
await _clueSolverModel.generateContentWithFunctions(
prompt: getSolverPrompt(clue, length, pattern),
onFunctionCall: (functionCall) async => switch (functionCall.name) {
'getWordMetadata' => await _getWordMetadataFromApi(
functionCall.args['word'] as String,
),
_ => throw Exception('Unknown function call: ${functionCall.name}'),
},
);
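The loop shown earlier runs until the model stops asking for tools. One possible variation, which is an assumption and not part of the sample, is to cap the number of tool rounds so that a confused model can't loop forever. The sketch below uses hypothetical stand-in types (Reply, Send, RunTool) instead of the Firebase AI Logic classes so that it runs on its own; the maxRounds parameter and its default of 8 are arbitrary.

```dart
// Hypothetical stand-ins, not Firebase AI Logic classes.
class Reply {
  const Reply(this.text, this.toolCalls);
  final String text;
  final List<String> toolCalls; // Names of the tools the model asked for.
}

typedef Send = Future<Reply> Function(Object message);
typedef RunTool = Future<Map<String, Object?>> Function(String name);

/// Runs the agentic loop, giving up after [maxRounds] rounds of tool calls.
Future<String> runLoop({
  required String prompt,
  required Send send,
  required RunTool runTool,
  int maxRounds = 8,
}) async {
  final buffer = StringBuffer();
  var reply = await send(prompt);
  for (var round = 0; ; round++) {
    buffer.write(reply.text);
    // No tool calls means the model has produced its final response.
    if (reply.toolCalls.isEmpty) return buffer.toString();
    if (round >= maxRounds) {
      throw StateError('Gave up after $maxRounds tool rounds.');
    }
    buffer.write('\n');
    // Run every requested tool and send all results back in one request.
    final results = <Map<String, Object?>>[];
    for (final name in reply.toolCalls) {
      results.add(await runTool(name));
    }
    reply = await send(results);
  }
}

Future<void> main() async {
  // A fake model that asks for one tool call and then answers.
  var turn = 0;
  Future<Reply> fakeSend(Object message) async => turn++ == 0
      ? const Reply('', ['getWordMetadata'])
      : const Reply('TOOL', []);
  final text = await runLoop(
    prompt: 'Solve the clue.',
    send: fakeSend,
    runTool: (name) async => {'result': 'noun'},
  );
  print(text.trim()); // TOOL
}
```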
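Structured output makes LLMs easy to interface with programmatically, while tools turn an LLM into an "agent" (see the section on interaction modes for details).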
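Structured output and tool calls
Structured output and tool calls are a powerful combination. In the sample, the clue solver has a tool for looking up word details and is also asked to return JSON that bundles the solution with a confidence score, both of which are shown in the app's task list.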
Unfortunately, as of this writing, enabling structured output and functions at the same time in the Firebase AI Logic SDK throws an exception:
Function calling with a response mime type: 'application/json' is unsupported
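As a (hopefully temporary) workaround, the sample removes the structured output configuration and emulates structured output with a tool named returnResult: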
// The model for solving clues.
_clueSolverModel = FirebaseAI.googleAI().generativeModel(
model: 'gemini-2.5-flash',
systemInstruction: Content.text(clueSolverSystemInstruction),
tools: [
Tool.functionDeclarations([
...,
FunctionDeclaration(
'returnResult',
'Returns the final result of the clue solving process.',
parameters: {
'answer': Schema(
SchemaType.string,
description: 'The answer to the clue.',
),
'confidence': Schema(
SchemaType.number,
description: 'The confidence score in the answer from 0.0 to 1.0.',
),
},
),
]),
],
);
returnResult is also described in the system instruction:
static String get clueSolverSystemInstruction =>
'''
You are an expert crossword puzzle solver.
...
### Tool: `returnResult`
You have a tool to return the final result of the clue solving process.
**When to use:**
- Use this tool when you have a final answer and confidence score to return. You
must use this tool exactly once, and only once, to return the final result.
**Function signature:**
```json
${jsonEncode(_returnResultFunction.toJson())}
```
''';
When the model calls returnResult, the sample caches the result, which solveClue reads after calling generateContentWithFunctions:
// Buffer for the result of the clue solving process.
final _returnResult = <String, dynamic>{};
// Cache the return result of the clue solving process via a function call.
// This is how we get JSON responses from the model with functions, since the
// model cannot return JSON directly when tools are used.
Map<String, dynamic> _cacheReturnResult(Map<String, dynamic> returnResult) {
assert(_returnResult.isEmpty);
_returnResult.addAll(returnResult);
return {'status': 'success'};
}
Future<ClueAnswer?> solveClue(Clue clue, int length, String pattern) async {
// Clear the return result cache; this is where the result will be stored.
_returnResult.clear();
// Generate JSON response with functions and schema.
await _clueSolverModel.generateContentWithFunctions(
prompt: getSolverPrompt(clue, length, pattern),
onFunctionCall: (functionCall) async => switch (functionCall.name) {
'getWordMetadata' => ...,
'returnResult' => _cacheReturnResult(functionCall.args),
_ => throw Exception('Unknown function call: ${functionCall.name}'),
},
);
// Use the structured output that the LLM has called function with
assert(_returnResult.isNotEmpty);
return ClueAnswer(
answer: _returnResult['answer'] as String,
confidence: (_returnResult['confidence'] as num).toDouble(),
);
}
Getting both structured output and tool calls with Firebase AI Logic takes a little extra work, but the results are worth it.
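Because the returnResult arguments come from the model and not from an enforced schema, it may be worth checking them before casting. The following is a minimal sketch under that assumption; parseReturnResult is a hypothetical helper that isn't part of the sample, and it uses a Dart 3 record in place of the sample's ClueAnswer class so that it stands alone.

```dart
/// Validates the raw arguments of a `returnResult` call. Returns null when an
/// argument is missing, has the wrong type, or is out of range.
/// Hypothetical helper; the sample casts the values directly.
({String answer, double confidence})? parseReturnResult(
  Map<String, Object?> args,
) {
  final answer = args['answer'];
  final confidence = args['confidence'];
  if (answer is! String || answer.isEmpty) return null;
  // JSON numbers may decode as int or double, so accept any num.
  if (confidence is! num || confidence < 0 || confidence > 1) return null;
  return (answer: answer, confidence: confidence.toDouble());
}

void main() {
  final good = parseReturnResult({'answer': 'TOOL', 'confidence': 0.9});
  print(good?.answer); // TOOL
  final bad = parseReturnResult({'answer': 'TOOL'});
  print(bad); // null
}
```

When validation fails, one option is to return an error map from the tool handler so the model can call returnResult again with corrected arguments.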
Human in the loop
So far, tools have been used to gather data and to format output. They can also be used to bring a human into the process.
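For example, the sample sometimes passes in a pattern that the solution should match (such as "_R_Y"), but the model wants to suggest an answer that doesn't match it (such as "RENT"). A conflict like this is a good time to ask the user for help.
This is called "human in the loop," and it's another way for humans and LLMs to collaborate. Flutter and the Firebase AI Logic SDK make it simple to implement. First, the sample defines the function and configures the model: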
// The new function to let the LLM resolve solution conflicts
static final _resolveConflictFunction = FunctionDeclaration(
'resolveConflict',
'Asks the user to resolve a conflict between the letter pattern and the '
'proposed answer. Use this BEFORE calling returnResult if the answer you '
'want to propose does not match the letter pattern.',
parameters: {
'proposedAnswer': Schema(
SchemaType.string,
description: 'The answer the LLM wants to suggest.',
),
'pattern': Schema(
SchemaType.string,
description: 'The current letter pattern from the grid.',
),
'clue': Schema(SchemaType.string, description: 'The clue text.'),
},
);
// Pass the new tool to the model for solving clues.
final _clueSolverModel = FirebaseAI.googleAI().generativeModel(
model: 'gemini-2.5-flash',
systemInstruction: Content.text(clueSolverSystemInstruction),
tools: [
Tool.functionDeclarations([
...
_resolveConflictFunction,
]),
],
);
// Let the LLM know that it has a new tool.
static String get clueSolverSystemInstruction =>
'''
You are an expert crossword puzzle solver.
...
### Tool: `resolveConflict`
You have a tool to ask the user to resolve a conflict.
**When to use:**
- Use this tool **BEFORE** `returnResult` if your proposed answer conflicts with the provided letter pattern.
- For example, if the pattern is `_ R _ Y` and you want to suggest `RENT` (which fits the clue), there is a conflict at the second letter (`R` vs `E`). You should call `resolveConflict(proposedAnswer: "RENT", pattern: "_ R _ Y", clue: "...")`.
- The tool will return the user's decision (either your proposed answer or a new one). You should then use that result to call `returnResult`.
**Function signature:**
```json
${jsonEncode(_resolveConflictFunction.toJson())}
```
''';
When the model detects a conflict, it calls the tool:
// handle the LLM's request to resolve the conflict
await _clueSolverModel.generateContentWithFunctions(
prompt: getSolverPrompt(clue, length, pattern),
onFunctionCall: (functionCall) async => switch (functionCall.name) {
...
'resolveConflict' => await _handleResolveConflict(
functionCall.args,
onConflict,
),
},
);
// Show the dialog to gather the user's input
Future<Map<String, dynamic>> _handleResolveConflict(
Map<String, dynamic> args,
Future<String> Function(String clue, String proposedAnswer, String pattern)?
onConflict,
) async {
final proposedAnswer = args['proposedAnswer'] as String;
final pattern = args['pattern'] as String;
final clue = args['clue'] as String;
if (onConflict != null) {
final result = await onConflict(clue, proposedAnswer, pattern);
return {'result': result};
}
return {'result': proposedAnswer};
}
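The sample handles the tool through its onConflict implementation, which calls showDialog to collect the user's input. All of this happens in the middle of the agentic loop, but that's fine: the model isn't waiting, because it has already returned a response to the app's initial request. The user can take as long as they like with the UI while the sample awaits the Future returned by showDialog. When that completes, the model picks up where it left off, using the message history and the most recent request (in this case, the data collected from the user).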
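A modal dialog is a simple way to put a human in the loop, but it isn't the only way in Flutter. If you prefer, a Completer instance lets the app enter a "collecting data from the user" mode; once the data is available, call complete on the Completer and resume the agentic loop.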
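A minimal sketch of the Completer approach might look like the following. The UserPrompt class and its ask, answer, and isWaiting members are hypothetical names chosen for illustration; only Completer itself comes from dart:async. The sketch assumes one outstanding question at a time.

```dart
import 'dart:async';

/// Bridges the agentic loop and the UI: the tool handler awaits [ask], and
/// the UI calls [answer] once the user has decided.
/// Hypothetical class; assumes only one question is pending at a time.
class UserPrompt {
  Completer<String>? _pending;

  /// Called from the tool handler; the returned future completes when the
  /// user responds.
  Future<String> ask() {
    final completer = Completer<String>();
    _pending = completer;
    return completer.future;
  }

  /// Whether the UI should be in "collecting data from the user" mode.
  bool get isWaiting => _pending != null;

  /// Called from the UI, for example from a button handler.
  void answer(String value) {
    _pending?.complete(value);
    _pending = null;
  }
}

Future<void> main() async {
  final prompt = UserPrompt();
  final decision = prompt.ask(); // The tool handler would await this.
  print(prompt.isWaiting); // true
  prompt.answer('TRAY'); // Later, from the UI.
  print(await decision); // TRAY
}
```

In a Flutter app, isWaiting could drive which widget is shown, so the question appears inline and not in a dialog.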
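Alternatively, since you own the agentic loop, you can check for calls to a "special" function, one that signals the need to collect data from the user. Such special functions are sometimes called "interrupts," and once you have the user's data, you "resume" the conversation with the model.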
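As a rough sketch of the interrupt check, the loop could scan each response's tool calls for a name on an interrupt list before running anything. The ToolCall record type, the interruptTools set, and findInterrupt are hypothetical and stand in for the SDK's own function-call objects; treating resolveConflict as an interrupt is an assumption made for this example.

```dart
/// A stand-in for a function call from the model (hypothetical type).
typedef ToolCall = ({String name, Map<String, Object?> args});

/// Tool names that a person, not code, has to answer.
const interruptTools = {'resolveConflict'};

/// Returns the first interrupt among [calls], or null if there is none.
ToolCall? findInterrupt(List<ToolCall> calls) {
  for (final call in calls) {
    if (interruptTools.contains(call.name)) return call;
  }
  return null;
}

void main() {
  final calls = <ToolCall>[
    (name: 'getWordMetadata', args: {'word': 'rent'}),
    (
      name: 'resolveConflict',
      args: {'proposedAnswer': 'RENT', 'pattern': '_R_Y'},
    ),
  ];
  final interrupt = findInterrupt(calls);
  print(interrupt?.name); // resolveConflict
}
```

When findInterrupt returns a call, the loop can stop, keep the chat and the pending call, and let the app collect the user's input however it likes. Resuming means sending that input back as the result of the interrupted call and continuing the loop.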
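Remember that LLMs are stateless. The model isn't waiting for you, so you can handle the agentic loop in whatever way works best for your app. You can return to the LLM with an updated message history and a new prompt at any time, whether that's a minute or a month later.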
Unless stated otherwise, the documentation on this page reflects Flutter 3.44.0. Page last updated on 2026-06-12.