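Result Formatting

Introduction

The result formatting subsystem in Whisper converts structured transcription output into file formats suited to different use cases. It is the final stage of the transcription pipeline: it takes processed, timestamped segments and converts them into standardized formats that other tools can consume or that can be displayed to users.

For information on the overall transcription process, see Transcription Pipeline.

Sources: whisper/utils.py, lines 85-105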

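Architecture Overview

The result formatting system is built around a class hierarchy whose members share a common interface for handling different output formats.

Sources: whisper/utils.py, lines 85-293

Writer Selection Flow

Sources: whisper/utils.py, lines 296-318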

Base ResultWriter Class

The ResultWriter class serves as the foundation for all format-specific writers. It defines a common interface for writing transcription results to files.

Key features:

Takes an output directory in the constructor
Provides a callable interface via __call__() that:
- Determines the output filename based on the input audio file
- Opens the output file with the appropriate extension
- Calls the format-specific write_result() method
Defines an abstract write_result() method to be implemented by subclasses

Sources: whisper/utils.py, lines 85-106
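The interface described above can be sketched as follows. This is a minimal illustration written for this page, not the code in whisper/utils.py: the `options` argument, the UTF-8 file encoding, the returned path, and the `WriteEcho` subclass are all assumptions made so the example runs on its own.

```python
import os
import tempfile
from typing import Optional, TextIO


class ResultWriter:
    """Resolves the output path, opens the file, then delegates to write_result()."""

    extension: str  # each subclass sets this, e.g. "txt"

    def __init__(self, output_dir: str):
        self.output_dir = output_dir

    def __call__(self, result: dict, audio_path: str, options: Optional[dict] = None) -> str:
        # "talks/interview.mp3" becomes "<output_dir>/interview.<extension>"
        stem = os.path.splitext(os.path.basename(audio_path))[0]
        output_path = os.path.join(self.output_dir, f"{stem}.{self.extension}")
        with open(output_path, "w", encoding="utf-8") as f:
            self.write_result(result, file=f, options=options)
        return output_path  # returning the path is a convenience of this sketch

    def write_result(self, result: dict, file: TextIO, options: Optional[dict] = None) -> None:
        raise NotImplementedError


class WriteEcho(ResultWriter):
    """Hypothetical subclass, used only to exercise the base class."""

    extension = "echo"

    def write_result(self, result: dict, file: TextIO, options: Optional[dict] = None) -> None:
        print(result["text"], file=file)


with tempfile.TemporaryDirectory() as tmp:
    written = WriteEcho(tmp)({"text": "hello"}, "talks/interview.mp3")
    print(os.path.basename(written))  # interview.echo
```

Keeping path handling in the base class means a new format only has to supply an extension and a write_result() body.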

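Available Output Formats

Whisper supports the following output formats:

| Format | Class | Extension | Description | Use case |
| --- | --- | --- | --- | --- |
| Text | `WriteTXT` | `.txt` | Plain text transcript | Simple readability |
| WebVTT | `WriteVTT` | `.vtt` | Web Video Text Tracks | Web video subtitles |
| SRT | `WriteSRT` | `.srt` | SubRip Text | Video player subtitles |
| TSV | `WriteTSV` | `.tsv` | Tab-separated values | Data analysis, spreadsheets |
| JSON | `WriteJSON` | `.json` | JavaScript Object Notation | Programmatic access, API responses |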

WriteTXT

The simplest format: it outputs only the text content of the segments, one segment per line.

Sources: whisper/utils.py, lines 109-117
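As a rough sketch of the one-segment-per-line behavior, the function below assumes the result is a dict with a "segments" list whose items carry a "text" key, and it strips surrounding whitespace from each segment. Both details are assumptions of this example rather than statements about the library.

```python
import io
from typing import TextIO


def write_txt(result: dict, file: TextIO) -> None:
    """Write the text of each segment on its own line."""
    for segment in result["segments"]:
        # Stripping whitespace is an assumption of this sketch.
        print(segment["text"].strip(), file=file)


result = {"segments": [{"text": " Hello there."}, {"text": " How are you?"}]}
buffer = io.StringIO()
write_txt(result, buffer)
print(buffer.getvalue(), end="")
# Hello there.
# How are you?
```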

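Subtitle Formats (VTT and SRT)

Both subtitle formats derive from the abstract SubtitlesWriter class, which provides common functionality:

Timestamp formatting with configurable options
Word-level highlighting support
Multi-line handling with configurable parameters:
- Maximum line width
- Maximum line count
- Words per line

Key differences between VTT and SRT:

VTT:
- Uses a period as the decimal marker in timestamps (e.g., 00:01:23.456)
- The hours component may be omitted when it is zero
- Includes a WEBVTT header
SRT:
- Uses a comma as the decimal marker (e.g., 00:01:23,456)
- Always includes the hours component
- Includes a sequence number for each subtitle

Sources: whisper/utils.py, lines 119-262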
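To make the layout differences concrete, here is a small sketch that renders the same cue both ways. It takes timestamps that are already formatted as strings, and the function names `render_vtt` and `render_srt` are hypothetical helpers for this example, not part of Whisper.

```python
from typing import Iterable, Tuple

Cue = Tuple[str, str, str]  # (start, end, text), timestamps already formatted


def render_vtt(cues: Iterable[Cue]) -> str:
    """WebVTT: a WEBVTT header followed by unnumbered cues."""
    blocks = ["WEBVTT\n"]
    blocks += [f"{start} --> {end}\n{text}\n" for start, end, text in cues]
    return "\n".join(blocks)


def render_srt(cues: Iterable[Cue]) -> str:
    """SRT: no header, each cue preceded by a 1-based sequence number."""
    blocks = [
        f"{i}\n{start} --> {end}\n{text}\n"
        for i, (start, end, text) in enumerate(cues, start=1)
    ]
    return "\n".join(blocks)


# VTT: period as decimal marker, zero hours omitted.
print(render_vtt([("01:23.456", "01:25.000", "Hello there.")]))
# SRT: comma as decimal marker, hours always present.
print(render_srt([("00:01:23,456", "00:01:25,000", "Hello there.")]))
```

In both formats a blank line separates consecutive cues, which is why each block ends with a newline before joining.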

WriteTSV

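A structured format that outputs the start time, end time, and text as tab-separated columns:

Start and end times are in milliseconds
Includes a header row
Tab characters in the text are replaced with spaces

This format is particularly useful for parsing in other applications or importing into spreadsheets.

Sources: whisper/utils.py, lines 265-284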
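The three rules above can be sketched in a few lines. The header names ("start", "end", "text"), the segment keys, and the rounding to whole milliseconds are assumptions of this example; the read-back step at the end shows why replacing tabs matters for downstream parsing.

```python
import csv
import io
from typing import TextIO


def write_tsv(result: dict, file: TextIO) -> None:
    """Write a header row, then one row per segment with times in milliseconds."""
    print("start", "end", "text", sep="\t", file=file)
    for segment in result["segments"]:
        start_ms = round(1000 * segment["start"])
        end_ms = round(1000 * segment["end"])
        # A tab inside the text would be read as an extra column, so replace it.
        text = segment["text"].strip().replace("\t", " ")
        print(start_ms, end_ms, text, sep="\t", file=file)


result = {"segments": [{"start": 0.0, "end": 2.5, "text": " Hello\tworld"}]}
buffer = io.StringIO()
write_tsv(result, buffer)

# Reading the output back with the standard csv module.
rows = list(csv.reader(io.StringIO(buffer.getvalue()), delimiter="\t"))
print(rows)  # [['start', 'end', 'text'], ['0', '2500', 'Hello world']]
```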

WriteJSON

Outputs the complete transcription result as JSON, preserving all information from the original transcription result.

Sources: whisper/utils.py, lines 287-293
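A minimal sketch of this writer, assuming the result is a plain dict of JSON-compatible values; the sample keys below are illustrative only. The round trip at the end is one way to check that nothing is lost.

```python
import io
import json
from typing import TextIO


def write_json(result: dict, file: TextIO) -> None:
    """Serialize the whole result dict, without selecting or renaming fields."""
    json.dump(result, file)


result = {
    "text": " Hello.",
    "language": "en",
    "segments": [{"id": 0, "start": 0.0, "end": 1.2, "text": " Hello."}],
}
buffer = io.StringIO()
write_json(result, buffer)

# Loading the output gives back an equal dict.
print(json.loads(buffer.getvalue()) == result)  # True
```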

Implementation Details

Timestamp Formatting

The system converts floating-point seconds into formatted timestamps through the format_timestamp() function.

Sources: whisper/utils.py, lines 50-68
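The conversion can be sketched as below. This version is written for illustration and should be checked against whisper/utils.py before relying on details: the parameter names `always_include_hours` and `decimal_marker`, and the choice to raise ValueError on negative input, are assumptions of this sketch.

```python
def format_timestamp(
    seconds: float,
    always_include_hours: bool = False,
    decimal_marker: str = ".",
) -> str:
    """Format seconds as [HH:]MM:SS<marker>mmm."""
    if seconds < 0:
        raise ValueError("non-negative timestamp expected")

    # Work in whole milliseconds to avoid floating-point drift.
    milliseconds = round(seconds * 1000.0)
    hours, milliseconds = divmod(milliseconds, 3_600_000)
    minutes, milliseconds = divmod(milliseconds, 60_000)
    secs, milliseconds = divmod(milliseconds, 1_000)

    # The hours field is dropped when zero unless the caller insists on it.
    hours_marker = f"{hours:02d}:" if always_include_hours or hours > 0 else ""
    return f"{hours_marker}{minutes:02d}:{secs:02d}{decimal_marker}{milliseconds:03d}"


print(format_timestamp(83.456))  # 01:23.456
print(format_timestamp(83.456, always_include_hours=True, decimal_marker=","))  # 00:01:23,456
print(format_timestamp(3723.5))  # 01:02:03.500
```

The second call corresponds to the SRT conventions (comma, hours always present); the first corresponds to VTT.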

Subtitle Generation

The iterate_result() method in SubtitlesWriter handles the complex logic of transforming segments into properly formatted subtitles:

Processes segments with word-level timestamps (if available)
Formats each segment or word according to the output format's requirements
Applies constraints for line width and word count
Creates line and subtitle breaks based on timing and space constraints
Optionally highlights each word during its own time span

Sources: whisper/utils.py, lines 119-228
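The width and line-count constraints can be illustrated with a deliberately simplified sketch. It is not the logic of iterate_result(): the function name `wrap_words`, the word dict keys ("word", "start", "end"), and the greedy wrapping strategy are all assumptions chosen to keep the example short.

```python
from typing import Iterator, List, Tuple


def wrap_words(
    words: List[dict], max_line_width: int, max_line_count: int
) -> Iterator[Tuple[float, float, str]]:
    """Yield (start, end, text) cues from timed words.

    A line is closed when adding the next word would exceed max_line_width;
    a cue is closed when it already holds max_line_count full lines.
    """
    lines: List[str] = [""]
    cue_start = cue_end = None
    for w in words:
        token = w["word"].strip()
        candidate = f"{lines[-1]} {token}".strip()
        if lines[-1] and len(candidate) > max_line_width:
            if len(lines) == max_line_count:
                # No room for another line: emit the cue and start a new one.
                yield cue_start, cue_end, "\n".join(lines)
                lines, cue_start = [""], None
            else:
                lines.append("")
            candidate = token
        lines[-1] = candidate
        if cue_start is None:
            cue_start = w["start"]
        cue_end = w["end"]
    if lines[-1]:
        yield cue_start, cue_end, "\n".join(lines)


words = [
    {"word": " The", "start": 0.0, "end": 0.2},
    {"word": " quick", "start": 0.2, "end": 0.5},
    {"word": " brown", "start": 0.5, "end": 0.8},
    {"word": " fox", "start": 0.8, "end": 1.0},
    {"word": " jumps", "start": 1.0, "end": 1.4},
    {"word": " over", "start": 1.4, "end": 1.6},
]
for start, end, text in wrap_words(words, max_line_width=10, max_line_count=2):
    print(start, end, repr(text))
# 0.0 1.0 'The quick\nbrown fox'
# 1.0 1.6 'jumps over'
```

Each cue takes its start time from its first word and its end time from its last word, so timing follows the text that is actually shown.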

Writer Selection

The get_writer() function creates the appropriate writer instance based on the requested output format:

Takes output_format and output_dir parameters
Returns a callable writer function
Supports a special "all" format that writes to all available formats

Sources: whisper/utils.py, lines 296-318
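The dispatch described above can be sketched with a dictionary of writer factories. To stay self-contained, this example uses hypothetical stub writers that only return the path they would write; the real writers produce files, and the real function's internals may differ.

```python
import os
from typing import Callable, Dict, List

Writer = Callable[[dict, str], List[str]]


def _stub_writer(extension: str) -> Callable[[str], Writer]:
    """Hypothetical stand-in for a writer class: reports the path it would write."""

    def factory(output_dir: str) -> Writer:
        def write(result: dict, audio_path: str) -> List[str]:
            stem = os.path.splitext(os.path.basename(audio_path))[0]
            return [os.path.join(output_dir, f"{stem}.{extension}")]

        return write

    return factory


WRITERS: Dict[str, Callable[[str], Writer]] = {
    ext: _stub_writer(ext) for ext in ("txt", "vtt", "srt", "tsv", "json")
}


def get_writer(output_format: str, output_dir: str) -> Writer:
    """Return one writer, or a wrapper that fans out to every writer for "all"."""
    if output_format == "all":
        all_writers = [factory(output_dir) for factory in WRITERS.values()]

        def write_all(result: dict, audio_path: str) -> List[str]:
            paths: List[str] = []
            for writer in all_writers:
                paths.extend(writer(result, audio_path))
            return paths

        return write_all
    return WRITERS[output_format](output_dir)  # unknown formats raise KeyError


paths = get_writer("all", "out")({}, "clip.wav")
print([os.path.basename(p) for p in paths])
# ['clip.txt', 'clip.vtt', 'clip.srt', 'clip.tsv', 'clip.json']
```

Because the "all" wrapper has the same call signature as a single writer, calling code does not need to know which case it received.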

Usage Example

Within the transcription pipeline, the caller typically obtains a writer from get_writer() and then calls it with the transcription result and the path of the input audio file.

Common options for subtitle writers include:

max_line_width: Maximum characters per line
max_line_count: Maximum lines per subtitle
highlight_words: Whether to highlight each word as it is spoken

Sources: whisper/utils.py, lines 296-318
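A possible end-to-end call is sketched below. It assumes the openai-whisper package is installed and that the installed release accepts an options dict as the third argument to the writer, which is worth verifying against your version. The helper names `subtitle_options` and `transcribe_to_file`, the "base" model, and the sample values 42 and 2 are choices made for this example.

```python
import os
from typing import Optional


def subtitle_options(
    max_line_width: Optional[int] = None,
    max_line_count: Optional[int] = None,
    highlight_words: bool = False,
) -> dict:
    """Collect the subtitle options listed above into one dict."""
    return {
        "max_line_width": max_line_width,
        "max_line_count": max_line_count,
        "highlight_words": highlight_words,
    }


def transcribe_to_file(
    audio_path: str,
    output_dir: str,
    output_format: str = "srt",
    model_name: str = "base",
) -> None:
    # Imported lazily so this module still loads where whisper is not installed.
    import whisper
    from whisper.utils import get_writer

    os.makedirs(output_dir, exist_ok=True)
    model = whisper.load_model(model_name)
    # Word timestamps give the subtitle writers per-word timing to work with.
    result = model.transcribe(audio_path, word_timestamps=True)

    writer = get_writer(output_format, output_dir)
    # Assumption: this release accepts an options dict as the third argument.
    writer(result, audio_path, subtitle_options(max_line_width=42, max_line_count=2))
```

Calling `transcribe_to_file("audio.mp3", "output")` would then be expected to produce `output/audio.srt`, given a real audio file at that path.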

Special Handling

Word-Level Timestamps

When word-level timestamps are available, subtitle writers can provide more granular timing information:

Words can be grouped based on timing patterns
Long pauses can trigger new subtitles
Individual words can be highlighted as they appear in subtitle formats that support it (e.g., WebVTT)

Sources: whisper/utils.py, lines 119-228
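Two of these behaviors, pause-based splitting and per-word highlighting, can be sketched as follows. The function names, the `max_pause` parameter, the 3-second value in the demo, and the use of `<u>` tags to mark the active word are assumptions of this example rather than documented behavior.

```python
from typing import Iterator, List, Tuple


def split_on_pauses(words: List[dict], max_pause: float) -> List[List[dict]]:
    """Start a new group whenever the silence before a word exceeds max_pause seconds."""
    groups: List[List[dict]] = []
    last_end = None
    for w in words:
        if last_end is None or w["start"] - last_end > max_pause:
            groups.append([])
        groups[-1].append(w)
        last_end = w["end"]
    return groups


def highlight_frames(group: List[dict]) -> Iterator[Tuple[float, float, str]]:
    """Yield one (start, end, text) cue per word, marking that word with <u> tags."""
    tokens = [w["word"].strip() for w in group]
    for i, w in enumerate(group):
        marked = [f"<u>{t}</u>" if j == i else t for j, t in enumerate(tokens)]
        yield w["start"], w["end"], " ".join(marked)


words = [
    {"word": " Hello", "start": 0.0, "end": 0.4},
    {"word": " world", "start": 0.5, "end": 0.9},
    {"word": " Again", "start": 5.0, "end": 5.4},  # follows a 4.1 s silence
]
groups = split_on_pauses(words, max_pause=3.0)
print(len(groups))  # 2
for frame in highlight_frames(groups[0]):
    print(frame)
# (0.0, 0.4, '<u>Hello</u> world')
# (0.5, 0.9, 'Hello <u>world</u>')
```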

Character Encoding Safety

The system includes safeguards for character encoding issues:

Checks system default encoding
Uses fallback mechanism for non-UTF-8 systems
Replaces problematic characters to avoid encoding errors

Sources: whisper/utils.py, lines 8-21
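One way to implement such a safeguard is to round-trip text through the target encoding with replacement enabled. In this sketch the function name `make_safe` and its explicit `encoding` parameter are assumptions; the parameter exists so the fallback can be demonstrated on any machine.

```python
import sys


def make_safe(text: str, encoding: str = sys.getdefaultencoding()) -> str:
    """Return text unchanged under UTF-8; otherwise replace unencodable characters."""
    if encoding.lower().replace("-", "") == "utf8":
        return text
    # errors="replace" substitutes "?" for anything the encoding cannot represent.
    return text.encode(encoding, errors="replace").decode(encoding)


print(make_safe("naïve café", encoding="ascii"))  # na?ve caf?
print(make_safe("naïve café", encoding="utf-8") == "naïve café")  # True
```

The replaced output loses information, so this is best applied only to text shown on a console, not to the files the writers produce.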
