Configuration Reference

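Configuration File Overview

Stable Diffusion uses YAML configuration files to define the model architecture, training parameters, and data-processing pipeline. These files specify an entire experiment declaratively, so no code changes are needed.

Sources: configs/latent-diffusion/ffhq-ldm-vq-4.yaml:1-85, configs/latent-diffusion/lsun_churches-ldm-kl-8.yaml:1-91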
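
As a rough orientation, the top-level layout can be sketched as a nested Python dictionary, which is the shape a YAML loader would typically produce. The section names (model, data, lightning) follow the sections of this reference; the helper `missing_sections` and all placeholder values are hypothetical and only serve the illustration.

```python
# Top-level sections covered by this reference.
DOCUMENTED_SECTIONS = ("model", "data", "lightning")

# Skeleton of a parsed config. The empty "params" bodies are placeholders,
# not real settings, and the learning rate is an illustrative value.
config = {
    "model": {
        "base_learning_rate": 1.0e-06,
        "target": "ldm.models.diffusion.ddpm.LatentDiffusion",
        "params": {},
    },
    "data": {"params": {}},
    "lightning": {"callbacks": {}, "trainer": {}},
}


def missing_sections(cfg):
    """Return the documented top-level sections that cfg does not define."""
    return [name for name in DOCUMENTED_SECTIONS if name not in cfg]


print(missing_sections(config))
print(missing_sections({"model": {}}))
```

A check like this is only a convenience for spotting typos in section names; whether a given section is strictly required depends on the training script.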

Model Configuration Structure

The model section is the most important part of the configuration file. It defines the core architecture and hyperparameters of the Stable Diffusion model.

Basic Model Parameters

Key parameters include:

base_learning_rate: Base learning rate for optimization
target: The Python class path to the model implementation
params: Model-specific parameters
- linear_start and linear_end: First and last values of the noise (beta) schedule
- timesteps: Number of diffusion steps
- image_size: Spatial size (height and width) of the latent representation, not of the decoded image
- channels: Number of channels in the latent space

Sources: configs/latent-diffusion/celebahq-ldm-vq-4.yaml:1-14, configs/latent-diffusion/ffhq-ldm-vq-4.yaml:1-14
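
To make linear_start, linear_end, and timesteps concrete, here is a minimal pure-Python sketch of a linear beta schedule. It assumes the schedule interpolates linearly between the square roots of the two endpoints and then squares the result; that is an assumption about the model code, so check the implementation before relying on it. The function names and numeric values are illustrative.

```python
import math


def linear_beta_schedule(linear_start, linear_end, timesteps):
    """Return `timesteps` beta values running from linear_start to linear_end.

    Assumption: interpolation happens in square-root space and the result is
    squared, so the endpoints are reproduced but the curve is not a straight
    line in beta itself.
    """
    if timesteps == 1:
        return [linear_start]
    lo, hi = math.sqrt(linear_start), math.sqrt(linear_end)
    step = (hi - lo) / (timesteps - 1)
    return [(lo + i * step) ** 2 for i in range(timesteps)]


def cumulative_alphas(betas):
    """Running product of (1 - beta_t); it shrinks toward 0 as noise accumulates."""
    out, prod = [], 1.0
    for beta in betas:
        prod *= 1.0 - beta
        out.append(prod)
    return out


betas = linear_beta_schedule(linear_start=0.0015, linear_end=0.0195, timesteps=1000)
alphas_cumprod = cumulative_alphas(betas)
print(len(betas), betas[0], betas[-1])
print(alphas_cumprod[-1])  # small: the sample is almost pure noise at the last step
```

Raising linear_end or timesteps pushes the final cumulative product closer to zero, which is one way to sanity-check a schedule before training.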

UNet Configuration

The UNet is the backbone of the diffusion model, responsible for predicting noise during the denoising process.

Key UNet Parameters

image_size: Spatial dimensions of the input
in_channels and out_channels: Input and output channel dimensions
model_channels: Base channel count for the model
attention_resolutions: Downsampling factors (relative to the UNet input) at which attention is applied; despite the name, these are not pixel resolutions
num_res_blocks: Number of residual blocks per resolution level
channel_mult: Channel multipliers for each resolution level

Sources: configs/latent-diffusion/ffhq-ldm-vq-4.yaml:14-35, configs/latent-diffusion/lsun_churches-ldm-kl-8.yaml:29-41
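
The interaction between model_channels, channel_mult, and attention_resolutions is easier to see with numbers. The sketch below assumes that each resolution level after the first halves the spatial size, and that an entry d in attention_resolutions enables attention at the level whose cumulative downsampling factor equals d. The helper name and the example values are illustrative, not taken from a specific config.

```python
def describe_unet_levels(image_size, model_channels, channel_mult, attention_resolutions):
    """List spatial size, channel width, and attention use per resolution level."""
    levels = []
    downsample = 1  # cumulative downsampling factor at the current level
    for mult in channel_mult:
        levels.append(
            {
                "size": image_size // downsample,
                "channels": model_channels * mult,
                "attention": downsample in attention_resolutions,
            }
        )
        downsample *= 2
    return levels


levels = describe_unet_levels(
    image_size=64,
    model_channels=224,
    channel_mult=[1, 2, 3, 4],
    attention_resolutions=[8, 4, 2],
)
for level in levels:
    print(level)
```

Under these assumptions, a 64x64 latent gets attention at the 32, 16, and 8 levels but not at full resolution, which keeps the memory cost of attention manageable.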

First Stage Model Configuration

The first stage model encodes images into the latent space and decodes latents back to images. Stable Diffusion supports two types of first stage models:

Vector-quantized (VQ) models
KL-regularized autoencoders

Example VQ Model Configuration
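
A VQ first-stage block might look like the following sketch, written as a Python dict (the form a parsed YAML file takes). The values are illustrative and modeled on a factor-4 VQ autoencoder; the nested key names (embed_dim, n_embed, ddconfig, ch_mult, and so on) and the rule "one 2x downsampling per ch_mult entry after the first" are assumptions to verify against the cited config files and the autoencoder code.

```python
vq_first_stage = {
    "target": "ldm.models.autoencoder.VQModelInterface",
    "params": {
        "embed_dim": 3,   # channels of each latent vector (illustrative)
        "n_embed": 8192,  # codebook size (illustrative)
        "ddconfig": {
            "z_channels": 3,
            "resolution": 256,  # input image resolution in pixels
            "in_channels": 3,
            "out_ch": 3,
            "ch": 128,
            "ch_mult": [1, 2, 4],
            "num_res_blocks": 2,
        },
    },
}


def downsampling_factor(ddconfig):
    """Assumed rule: every ch_mult entry after the first halves the resolution."""
    return 2 ** (len(ddconfig["ch_mult"]) - 1)


vq_dd = vq_first_stage["params"]["ddconfig"]
vq_factor = downsampling_factor(vq_dd)
vq_latent_size = vq_dd["resolution"] // vq_factor
print(vq_factor, vq_latent_size)
```

With these numbers a 256-pixel image maps to a 64x64 latent, which is the size the diffusion model's image_size would then need to match.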

Example KL Model Configuration
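
A KL-regularized first stage can be sketched the same way. The class path ldm.models.autoencoder.AutoencoderKL, the key names, and all values below are assumptions modeled on a factor-8 autoencoder; confirm them against the cited config. The helper illustrates one assumed consequence of double_z: the encoder predicts a mean and a log-variance per latent channel, so it emits twice z_channels.

```python
kl_first_stage = {
    "target": "ldm.models.autoencoder.AutoencoderKL",  # assumed class path
    "params": {
        "embed_dim": 4,  # illustrative
        "ddconfig": {
            "double_z": True,
            "z_channels": 4,
            "resolution": 256,
            "in_channels": 3,
            "out_ch": 3,
            "ch": 128,
            "ch_mult": [1, 2, 4, 4],
            "num_res_blocks": 2,
        },
    },
}


def encoder_output_channels(ddconfig):
    """Channels emitted by the encoder before a latent is sampled.

    Assumption: double_z doubles the output so that it holds both the mean
    and the log-variance of the latent distribution.
    """
    z = ddconfig["z_channels"]
    return 2 * z if ddconfig.get("double_z", False) else z


kl_dd = kl_first_stage["params"]["ddconfig"]
print(encoder_output_channels(kl_dd))
```

A VQ model has no distribution to parameterize, which is why a VQ config would leave double_z off under this reading.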

Sources: configs/latent-diffusion/celebahq-ldm-vq-4.yaml:36-58, configs/latent-diffusion/lsun_churches-ldm-kl-8.yaml:43-62

Conditioning Stage Configuration

The conditioning stage defines how conditioning information (like text prompts or class labels) is processed and incorporated into the diffusion model.

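Conditioning Types

Unconditional generation
Class conditioning
Spatial conditioning (for example, for semantic synthesis)

Key Conditioning Parameters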

cond_stage_key: Specifies the key for conditioning (e.g., "class_label", "segmentation")
cond_stage_trainable: Whether the conditioning stage is trainable
conditioning_key: How conditioning is incorporated (e.g., "crossattn" for cross-attention)

Sources: configs/latent-diffusion/celebahq-ldm-vq-4.yaml:59, configs/latent-diffusion/cin-ldm-vq-f8.yaml:65-69, models/ldm/semantic_synthesis256/config.yaml:54-59
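
The three conditioning parameters can be read together to tell which kind of model a config describes. The helper below is hypothetical: it only inspects the documented keys plus cond_stage_config, and it does not instantiate anything. The ClassEmbedder module path in the example is an assumption.

```python
UNCONDITIONAL = "__is_unconditional__"


def describe_conditioning(params):
    """Summarize how a model's params dict conditions the diffusion model."""
    cond_cfg = params.get("cond_stage_config", UNCONDITIONAL)
    if cond_cfg == UNCONDITIONAL:
        return "unconditional"
    key = params.get("cond_stage_key")
    mode = params.get("conditioning_key")
    state = "trainable" if params.get("cond_stage_trainable", False) else "frozen"
    return f"conditioned on {key!r} via {mode!r} ({state} conditioning stage)"


unconditional_params = {"cond_stage_config": UNCONDITIONAL}
class_params = {
    "cond_stage_key": "class_label",
    "cond_stage_trainable": True,
    "conditioning_key": "crossattn",
    # Assumed module path for the class embedder.
    "cond_stage_config": {"target": "ldm.modules.encoders.modules.ClassEmbedder"},
}

print(describe_conditioning(unconditional_params))
print(describe_conditioning(class_params))
```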

Data Configuration

The data section defines dataset loading, preprocessing, and batching parameters.

Main Parameters

batch_size: Number of samples per batch
num_workers: Number of worker processes for data loading
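wrap: Whether to wrap each dataset object in a thin PyTorch Dataset adapter (useful for objects that only implement __len__ and __getitem__)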
train and validation: Dataset configurations for training and validation sets
- target: The Python class path to the dataset implementation
- params: Dataset-specific parameters (typically includes size for image resolution)
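
The batch size also interacts with the model's base_learning_rate. The sketch below assumes the training script multiplies the base rate by the batch size, the number of GPUs, and the gradient-accumulation steps when learning-rate scaling is enabled; this is an assumption about main.py, so confirm it in the script before relying on the numbers. The data values are illustrative.

```python
data_params = {
    "batch_size": 42,
    "num_workers": 5,
    "wrap": False,
    "train": {"target": "taming.data.faceshq.FFHQTrain", "params": {"size": 256}},
    "validation": {"target": "taming.data.faceshq.FFHQValidation", "params": {"size": 256}},
}


def effective_learning_rate(base_lr, batch_size, num_gpus=1, accumulate_grad_batches=1):
    """Learning rate after the assumed linear scaling rule is applied."""
    return accumulate_grad_batches * num_gpus * batch_size * base_lr


lr_single_gpu = effective_learning_rate(2.0e-06, data_params["batch_size"])
lr_four_gpus = effective_learning_rate(2.0e-06, data_params["batch_size"], num_gpus=4)
print(lr_single_gpu, lr_four_gpus)
```

If the scaling rule holds, changing batch_size or the GPU count changes the actual learning rate even when base_learning_rate stays fixed, so the two settings should be tuned together.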

Available Dataset Implementations

taming.data.faceshq.CelebAHQTrain / CelebAHQValidation
taming.data.faceshq.FFHQTrain / FFHQValidation
ldm.data.lsun.LSUNBedroomsTrain / LSUNBedroomsValidation
ldm.data.lsun.LSUNChurchesTrain / LSUNChurchesValidation
ldm.data.imagenet.ImageNetTrain / ImageNetValidation

Sources: configs/latent-diffusion/celebahq-ldm-vq-4.yaml:60-73, configs/latent-diffusion/ffhq-ldm-vq-4.yaml:59-72, configs/latent-diffusion/cin-ldm-vq-f8.yaml:70-85

Lightning Configuration

The lightning section configures PyTorch Lightning's trainer and callbacks, which handle the training loop and logging.

Key Components

callbacks: Definitions of Lightning callbacks
- image_logger: Logs generated images during training
  - batch_frequency: How often to log images
  - max_images: Maximum number of images to log
trainer: Configuration for PyTorch Lightning's Trainer
- benchmark: Whether to use cudnn benchmark for improved speed

Sources: configs/latent-diffusion/celebahq-ldm-vq-4.yaml:76-85, configs/latent-diffusion/lsun_churches-ldm-kl-8.yaml:80-91
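
The effect of batch_frequency and max_images can be sketched with two small functions. This is a simplified model of an image-logging callback, not the repository's implementation: the real callback may apply extra rules (for example, additional logging steps early in training). The class path main.ImageLogger and the values are assumptions.

```python
lightning = {
    "callbacks": {
        "image_logger": {
            "target": "main.ImageLogger",  # assumed class path
            "params": {"batch_frequency": 5000, "max_images": 8},
        }
    },
    "trainer": {"benchmark": True},
}


def should_log_images(batch_idx, batch_frequency):
    """Simplified rule: log on every batch_frequency-th batch, skipping batch 0."""
    return batch_idx > 0 and batch_idx % batch_frequency == 0


def images_to_log(batch, max_images):
    """Cap the number of images taken from one batch at max_images."""
    return batch[:max_images]


logger_params = lightning["callbacks"]["image_logger"]["params"]
print(should_log_images(5000, logger_params["batch_frequency"]))
print(len(images_to_log(list(range(42)), logger_params["max_images"])))
```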

Learning Rate Scheduler

Some configurations include a scheduler for adjusting the learning rate during training:

Main Parameters

warm_up_steps: Number of steps for the warm-up phase
cycle_lengths: Length of scheduler cycles
f_start, f_max, f_min: Starting, maximum, and minimum scaling factors applied to the learning rate

Sources: configs/latent-diffusion/lsun_churches-ldm-kl-8.yaml:20-27
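
How these parameters could combine is shown in the single-cycle sketch below. It assumes a linear warm-up from f_start to f_max followed by a linear interpolation that reaches f_min at the end of the cycle, and it uses scalar arguments for brevity (a config may supply one value per cycle). The function name and the numbers are illustrative; the scheduler class in the repository may differ in detail.

```python
def lr_factor(step, warm_up_steps, cycle_length, f_start, f_max, f_min):
    """Multiplier applied to the learning rate at a given step (single cycle)."""
    if step < warm_up_steps:
        # Warm-up: ramp linearly from f_start toward f_max.
        return f_start + (f_max - f_start) * step / warm_up_steps
    # After warm-up: interpolate so that the factor reaches f_min at cycle end.
    remaining = max(cycle_length - step, 0) / cycle_length
    return f_min + (f_max - f_min) * remaining


# Warm-up only: f_max == f_min keeps the factor constant after warm-up.
warmup_only = dict(warm_up_steps=10000, cycle_length=10**13, f_start=1.0e-06, f_max=1.0, f_min=1.0)
for step in (0, 5000, 10000, 1000000):
    print(step, lr_factor(step, **warmup_only))
```

Setting f_min equal to f_max with a very long cycle, as in this example, effectively turns the scheduler into a pure warm-up.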

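Configurations for Specific Use Cases

The repository ships several predefined configurations for different use cases. They differ as follows:

Unconditional Generation

For generating images without any conditioning.

Uses configurations with cond_stage_config: __is_unconditional__
Examples: celebahq-ldm-vq-4.yaml, ffhq-ldm-vq-4.yaml, and others.

Class-Conditional Generation

For generating images from class labels.

Uses configurations with ClassEmbedder as the conditioning stage
Sets cond_stage_key: "class_label" and conditioning_key: "crossattn"
Example: cin-ldm-vq-f8.yaml

Semantic Image Synthesis

For generating images from semantic segmentation maps.

Uses configurations with SpatialRescaler as the conditioning stage
Sets cond_stage_key: "segmentation" and typically concat_mode: true
Example: models/ldm/semantic_synthesis256/config.yaml

Sources: configs/latent-diffusion/celebahq-ldm-vq-4.yaml, configs/latent-diffusion/cin-ldm-vq-f8.yaml, models/ldm/semantic_synthesis256/config.yaml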

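How Configuration Files Relate to Code

Configuration files map directly onto Python classes in the code base. The target field gives the class path, and the params dictionary is passed as keyword arguments when the class is initialized.

Example Mappings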

target: ldm.models.diffusion.ddpm.LatentDiffusion → from ldm.models.diffusion.ddpm import LatentDiffusion
target: ldm.modules.diffusionmodules.openaimodel.UNetModel → from ldm.modules.diffusionmodules.openaimodel import UNetModel
target: ldm.models.autoencoder.VQModelInterface → from ldm.models.autoencoder import VQModelInterface

The main entry point (main.py) loads these configurations and instantiates objects from the specified target and params.

Sources: configs/latent-diffusion/ffhq-ldm-vq-4.yaml:3, configs/latent-diffusion/ffhq-ldm-vq-4.yaml:15, configs/latent-diffusion/ffhq-ldm-vq-4.yaml:37
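
The target/params mechanism can be sketched in a few lines of standard-library Python. This is a minimal illustration of the idea, not the repository's loader: the helper names are chosen for the example, and the target used here is a stand-in class from the standard library so that the snippet runs without ldm installed.

```python
import importlib


def get_obj_from_str(path):
    """Resolve a dotted path such as 'package.module.ClassName' to the object."""
    module_name, attr = path.rsplit(".", 1)
    return getattr(importlib.import_module(module_name), attr)


def instantiate_from_config(config):
    """Build the object named by config['target'], passing params as kwargs."""
    if "target" not in config:
        raise KeyError("Expected key 'target' to instantiate.")
    return get_obj_from_str(config["target"])(**config.get("params", {}))


# Stand-in target: fractions.Fraction plays the role of a model class.
example = {
    "target": "fractions.Fraction",
    "params": {"numerator": 3, "denominator": 4},
}
obj = instantiate_from_config(example)
print(type(obj).__name__, obj)
```

Because params are passed as keyword arguments, a misspelled key in the YAML file surfaces as a TypeError from the class constructor rather than being silently ignored.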

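Configuration Best Practices

When working with Stable Diffusion configurations:

Start from the existing configuration closest to your use case
Change one section at a time and understand its effect
Check parameter compatibility between related sections (for example, make sure the image size matches between the UNet and the first stage model)
Keep track of which model checkpoint corresponds to which configuration
Consider compute requirements when adjusting model-size parameters (for example, model_channels and attention_resolutions)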
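
The compatibility check between the UNet and the first stage model can be automated with a small script. The sketch below assumes the first stage halves the resolution once per ch_mult entry after the first and that the UNet operates directly on the resulting latent grid. The key names unet_config, first_stage_config, and ddconfig, as well as the helper itself, are assumptions for illustration.

```python
def check_latent_size(model_params):
    """Return a list of size mismatches between the first stage and the UNet."""
    problems = []
    dd = model_params["first_stage_config"]["params"]["ddconfig"]
    latent = dd["resolution"] // 2 ** (len(dd["ch_mult"]) - 1)
    model_size = model_params["image_size"]
    unet_size = model_params["unet_config"]["params"]["image_size"]
    if model_size != latent:
        problems.append(f"model image_size {model_size} != latent size {latent}")
    if unet_size != latent:
        problems.append(f"unet image_size {unet_size} != latent size {latent}")
    return problems


def make_params(ch_mult):
    """Build a minimal, illustrative params dict for the check."""
    return {
        "image_size": 64,
        "unet_config": {"params": {"image_size": 64}},
        "first_stage_config": {
            "params": {"ddconfig": {"resolution": 256, "ch_mult": ch_mult}}
        },
    }


print(check_latent_size(make_params([1, 2, 4])))     # factor 4: sizes agree
print(check_latent_size(make_params([1, 2, 4, 4])))  # factor 8: sizes disagree
```

Running a check like this before launching a job is cheaper than discovering a shape mismatch after the first stage checkpoint has loaded.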

For more information on training models with custom configurations, see Training Pipeline.