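This document is a comprehensive guide to the configuration system used in the Stable Diffusion repository. It explains the structure and available options of the YAML configuration files that control model architecture, training parameters, and inference settings. For information on command-line arguments that can override these configurations, see Command-Line Arguments and Options.
Stable Diffusion uses YAML configuration files to define the model architecture, training parameters, and data-processing pipeline. These files provide a declarative way to specify an entire experiment setup without modifying code.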
Sources: configs/latent-diffusion/ffhq-ldm-vq-4.yaml (lines 1-85), configs/latent-diffusion/lsun_churches-ldm-kl-8.yaml (lines 1-91)
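To make the nesting concrete, here is a minimal sketch that mirrors a configuration as a Python dict (so it runs without a YAML parser) and lists every target class path it contains. The section names follow this document, but the values are illustrative, the data and lightning sections are heavily trimmed, and collect_targets is a hypothetical helper rather than part of the repository.

```python
from typing import Any, List, Tuple

# Hypothetical, trimmed-down config mirroring the YAML layout described here.
# Values are illustrative only; real files contain many more keys.
CONFIG = {
    "model": {
        "base_learning_rate": 2.0e-06,
        "target": "ldm.models.diffusion.ddpm.LatentDiffusion",
        "params": {
            "timesteps": 1000,
            "unet_config": {
                "target": "ldm.modules.diffusionmodules.openaimodel.UNetModel",
                "params": {"model_channels": 224},
            },
            "first_stage_config": {
                "target": "ldm.models.autoencoder.VQModelInterface",
                "params": {"embed_dim": 3},
            },
            "cond_stage_config": "__is_unconditional__",
        },
    },
    "data": {"params": {"batch_size": 42}},
    "lightning": {"trainer": {"benchmark": True}},
}


def collect_targets(node: Any, path: str = "") -> List[Tuple[str, str]]:
    """Return (dotted_key_path, target) pairs for every dict that has a 'target' key."""
    found: List[Tuple[str, str]] = []
    if isinstance(node, dict):
        if "target" in node:
            found.append((path or "<root>", node["target"]))
        for key, value in node.items():
            child = f"{path}.{key}" if path else key
            found.extend(collect_targets(value, child))
    return found


if __name__ == "__main__":
    for key_path, target in collect_targets(CONFIG):
        print(f"{key_path:35s} -> {target}")
```

Every object built from a config is a target/params pair, and sub-models (UNet, first stage) are nested as further target/params pairs inside the parent's params.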
The model section is the most important part of the configuration file. It defines the core architecture and hyperparameters of the Stable Diffusion model.
Key parameters include:
- base_learning_rate: Base learning rate for optimization
- target: The Python class path of the model implementation
- params: Model-specific parameters, including:
  - linear_start and linear_end: Start and end values of the noise (beta) schedule
  - timesteps: Number of diffusion steps
  - image_size: Spatial size of the latent representation
  - channels: Number of channels in the latent space
Sources: configs/latent-diffusion/celebahq-ldm-vq-4.yaml (lines 1-14), configs/latent-diffusion/ffhq-ldm-vq-4.yaml (lines 1-14)
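As an illustration of how linear_start, linear_end, and timesteps combine, the sketch below builds a "linear" beta schedule the way, to our understanding, LDM's make_beta_schedule helper does: it interpolates linearly between the square roots of the two endpoints and then squares the result. It uses plain Python floats instead of torch tensors; the function names and the sample values are assumptions for illustration.

```python
import math
from typing import List


def linear_beta_schedule(linear_start: float, linear_end: float, timesteps: int) -> List[float]:
    """Betas interpolated linearly in sqrt space, then squared (assumed LDM 'linear' schedule)."""
    if timesteps < 2:
        raise ValueError("timesteps must be at least 2")
    lo, hi = math.sqrt(linear_start), math.sqrt(linear_end)
    step = (hi - lo) / (timesteps - 1)
    return [(lo + i * step) ** 2 for i in range(timesteps)]


def alphas_cumprod(betas: List[float]) -> List[float]:
    """Cumulative product of (1 - beta_t): the fraction of signal variance kept at step t."""
    out, acc = [], 1.0
    for b in betas:
        acc *= 1.0 - b
        out.append(acc)
    return out


betas = linear_beta_schedule(0.0015, 0.0195, 1000)  # illustrative values
abar = alphas_cumprod(betas)
print(f"beta_0={betas[0]:.4f} beta_T={betas[-1]:.4f} alpha_bar_T={abar[-1]:.2e}")
```

The first and last betas equal linear_start and linear_end, and the cumulative product shrinks toward zero, so the final step is almost pure noise.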
The UNet is the backbone of the diffusion model, responsible for predicting noise during the denoising process.
Key UNet parameters:
- image_size: Spatial size of the UNet input (the latent)
- in_channels and out_channels: Input and output channel counts
- model_channels: Base channel count of the model
- attention_resolutions: Levels at which attention is applied, given as downsampling factors rather than pixel resolutions
- num_res_blocks: Number of residual blocks per resolution level
- channel_mult: Per-level channel multipliers (level i uses model_channels × channel_mult[i] channels)
Sources: configs/latent-diffusion/ffhq-ldm-vq-4.yaml (lines 14-35), configs/latent-diffusion/lsun_churches-ldm-kl-8.yaml (lines 29-41)
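Because attention_resolutions is easy to misread, the sketch below walks the encoder levels in the way we understand UNetModel to do it: the downsampling factor starts at 1 and doubles after every level except the last, and attention is enabled wherever that factor appears in attention_resolutions. unet_levels is a hypothetical helper and the values are illustrative.

```python
from typing import Dict, List, Sequence


def unet_levels(image_size: int, model_channels: int,
                channel_mult: Sequence[int],
                attention_resolutions: Sequence[int]) -> List[Dict[str, object]]:
    """Describe each encoder level: feature-map size, channel count, attention on/off.

    Assumes the downsampling factor doubles after every level except the last,
    and that attention_resolutions holds downsampling factors, not pixel sizes.
    """
    levels = []
    ds = 1  # current downsampling factor relative to the UNet input
    for level, mult in enumerate(channel_mult):
        levels.append({
            "level": level,
            "resolution": image_size // ds,
            "channels": model_channels * mult,
            "attention": ds in attention_resolutions,
        })
        if level != len(channel_mult) - 1:
            ds *= 2
    return levels


for row in unet_levels(64, 224, [1, 2, 3, 4], [8, 4, 2]):  # illustrative values
    print(row)
```

With a 64×64 latent, factors [8, 4, 2] switch attention on at the 8×8, 16×16, and 32×32 feature maps and leave the full-resolution level without it.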
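The first-stage model encodes images into the latent space and decodes latents back into images. Stable Diffusion supports two types of first-stage models: VQ-regularized autoencoders (ldm.models.autoencoder.VQModelInterface) and KL-regularized autoencoders (ldm.models.autoencoder.AutoencoderKL).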
Example VQ model configuration
Example KL model configuration
Sources: configs/latent-diffusion/celebahq-ldm-vq-4.yaml (lines 36-58), configs/latent-diffusion/lsun_churches-ldm-kl-8.yaml (lines 43-62)
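The diffusion model's image_size and channels must match the latent produced by the first stage. Assuming, as in the LDM autoencoders, that the encoder halves the spatial size once for every ch_mult entry after the first (so three entries give a downsampling factor of 4 and four entries give 8, which is what the f4/f8 suffixes in names such as cin-ldm-vq-f8 appear to denote), and that the latent has embed_dim channels, a consistency check could look like this. The ddconfig keys resolution and ch_mult are assumed from the first-stage configs; latent_shape and check_first_stage are hypothetical helpers.

```python
from typing import Dict, Sequence, Tuple


def latent_shape(resolution: int, ch_mult: Sequence[int], embed_dim: int) -> Tuple[int, int, int]:
    """(channels, height, width) of the latent, assuming one 2x downsample per extra ch_mult level."""
    factor = 2 ** (len(ch_mult) - 1)
    if resolution % factor:
        raise ValueError(f"resolution {resolution} not divisible by factor {factor}")
    side = resolution // factor
    return embed_dim, side, side


def check_first_stage(diffusion_params: Dict, first_stage_params: Dict) -> None:
    """Raise if the diffusion model's image_size/channels disagree with the first stage."""
    dd = first_stage_params["ddconfig"]
    c, h, _ = latent_shape(dd["resolution"], dd["ch_mult"], first_stage_params["embed_dim"])
    if diffusion_params["image_size"] != h or diffusion_params["channels"] != c:
        raise ValueError(
            f"diffusion expects {diffusion_params['channels']}x{diffusion_params['image_size']}, "
            f"first stage yields {c}x{h}"
        )


# Illustrative f4 (VQ) and f8 (KL) setups at 256x256 input resolution.
vq_f4 = {"embed_dim": 3, "ddconfig": {"resolution": 256, "ch_mult": [1, 2, 4]}}
kl_f8 = {"embed_dim": 4, "ddconfig": {"resolution": 256, "ch_mult": [1, 2, 4, 4]}}
check_first_stage({"image_size": 64, "channels": 3}, vq_f4)
check_first_stage({"image_size": 32, "channels": 4}, kl_f8)
print(latent_shape(256, [1, 2, 4], 3), latent_shape(256, [1, 2, 4, 4], 4))
```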
The conditioning stage defines how conditioning information (like text prompts or class labels) is processed and incorporated into the diffusion model.
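Conditioning types:
- Unconditional generation
- Class conditioning
- Spatial conditioning (e.g., for semantic synthesis)
Key conditioning parameters: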
- cond_stage_key: The batch key from which the conditioning input is read (e.g., "class_label", "segmentation")
- cond_stage_trainable: Whether the conditioning stage is trainable
- conditioning_key: How conditioning is incorporated (e.g., "crossattn" for cross-attention)
Sources: configs/latent-diffusion/celebahq-ldm-vq-4.yaml (line 59), configs/latent-diffusion/cin-ldm-vq-f8.yaml (lines 65-69), models/ldm/semantic_synthesis256/config.yaml (lines 54-59)
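Under the usual semantics of these keys, concatenation stacks the conditioning tensor onto the noisy latent along the channel axis, while "crossattn" feeds it to the UNet's cross-attention layers as context; only the former changes the number of input channels the UNet must declare. The hypothetical helper below encodes that rule as a sanity check. It is a sketch under that assumption, covers only "crossattn", "concat", and None (unconditional), and is not the repository's dispatch code.

```python
from typing import Optional


def expected_unet_in_channels(latent_channels: int,
                              conditioning_key: Optional[str],
                              cond_channels: int = 0) -> int:
    """Input channels the UNet should declare for a given conditioning_key (sketch)."""
    if conditioning_key is None or conditioning_key == "crossattn":
        # Unconditional, or conditioning enters via cross-attention context:
        # the UNet input is just the noisy latent.
        return latent_channels
    if conditioning_key == "concat":
        # Conditioning is concatenated channel-wise with the noisy latent.
        return latent_channels + cond_channels
    raise ValueError(f"unsupported conditioning_key in this sketch: {conditioning_key!r}")


print(expected_unet_in_channels(3, "crossattn"))
print(expected_unet_in_channels(3, "concat", cond_channels=3))
```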
The data section defines dataset loading, preprocessing, and batching parameters.
Main parameters:
- batch_size: Number of samples per batch
- num_workers: Number of worker processes for data loading
- wrap: Whether to wrap each dataset in a thin PyTorch Dataset adapter (WrappedDataset in main.py)
- train and validation: Dataset configurations for the training and validation sets, each with:
  - target: The Python class path of the dataset implementation
  - params: Dataset-specific parameters (typically including size, the image resolution)

Available dataset implementations:
- taming.data.faceshq.CelebAHQTrain / CelebAHQValidation
- taming.data.faceshq.FFHQTrain / FFHQValidation
- ldm.data.lsun.LSUNBedroomsTrain / LSUNBedroomsValidation
- ldm.data.lsun.LSUNChurchesTrain / LSUNChurchesValidation
- ldm.data.imagenet.ImageNetTrain / ImageNetValidation
Sources: configs/latent-diffusion/celebahq-ldm-vq-4.yaml (lines 60-73), configs/latent-diffusion/ffhq-ldm-vq-4.yaml (lines 59-72), configs/latent-diffusion/cin-ldm-vq-f8.yaml (lines 70-85)
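The batch_size in the data section also affects the learning rate actually used. To our understanding, when main.py's learning-rate scaling is enabled (the --scale_lr option, which we believe defaults to on), base_learning_rate is multiplied by the number of gradient-accumulation steps, the number of GPUs, and the batch size. A minimal sketch of that arithmetic, with a hypothetical function name:

```python
def effective_learning_rate(base_learning_rate: float, batch_size: int,
                            num_gpus: int = 1, accumulate_grad_batches: int = 1,
                            scale_lr: bool = True) -> float:
    """Learning rate used by the optimizer under the assumed scaling rule."""
    if not scale_lr:
        return base_learning_rate
    return accumulate_grad_batches * num_gpus * batch_size * base_learning_rate


lr = effective_learning_rate(2.0e-06, batch_size=42, num_gpus=2)
print(f"{lr:.2e}")  # 2 GPUs * 42 samples * 2e-6
```

A consequence of this design is that changing batch_size or the GPU count changes the effective learning rate even though base_learning_rate stays the same.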
The lightning section configures PyTorch Lightning's trainer and callbacks, which handle the training loop and logging.
Key components:
- callbacks: Definitions of Lightning callbacks
  - image_logger: Logs generated images during training
    - batch_frequency: How often (in training batches) to log images
    - max_images: Maximum number of images to log at each logging step
- trainer: Configuration for PyTorch Lightning's Trainer
  - benchmark: Whether to enable cuDNN benchmark mode for faster training
Sources: configs/latent-diffusion/celebahq-ldm-vq-4.yaml (lines 76-85), configs/latent-diffusion/lsun_churches-ldm-kl-8.yaml (lines 80-91)
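A simplified sketch of what batch_frequency and max_images control: images are logged on every batch_frequency-th step, and each logging step keeps at most max_images samples. Both helpers are hypothetical, and the repository's image-logging callback may add extra early logging steps or other conditions that this sketch ignores.

```python
from typing import List, Sequence


def logging_steps(total_steps: int, batch_frequency: int) -> List[int]:
    """Steps (1-based) at which images would be logged if logging fires every batch_frequency steps."""
    return [s for s in range(1, total_steps + 1) if s % batch_frequency == 0]


def clip_to_max_images(images: Sequence, max_images: int) -> list:
    """Keep at most max_images items from a batch of generated samples."""
    return list(images[:max_images])


print(logging_steps(20000, 5000))
print(len(clip_to_max_images(list(range(42)), 8)))
```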
Some configurations include a scheduler that adjusts the learning rate during training.
Main parameters:
- warm_up_steps: Number of steps in the warm-up phase
- cycle_lengths: Length of each scheduler cycle
- f_start, f_max, f_min: Starting, maximum, and minimum scaling factors for the learning rate
Sources: configs/latent-diffusion/lsun_churches-ldm-kl-8.yaml (lines 20-27)
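To show how these parameters can shape a schedule, here is a simplified single-cycle sketch: the factor rises linearly from f_start to f_max over warm_up_steps, then moves linearly from f_max to f_min until cycle_length, and is held at f_min afterwards. The factor multiplies the base learning rate. This is an assumption-based illustration, not a copy of the repository's scheduler classes, whose exact formula may differ; lr_factor is a hypothetical name.

```python
def lr_factor(step: int, warm_up_steps: int, cycle_length: int,
              f_start: float, f_max: float, f_min: float) -> float:
    """Multiplier applied to the base learning rate at a given step (single cycle, sketch).

    Linear warm-up from f_start to f_max over warm_up_steps, then linear
    interpolation from f_max to f_min until cycle_length; held at f_min afterwards.
    """
    if step < warm_up_steps:
        return f_start + (f_max - f_start) * step / warm_up_steps
    if step >= cycle_length:
        return f_min
    progress = (step - warm_up_steps) / (cycle_length - warm_up_steps)
    return f_max + (f_min - f_max) * progress


for s in (0, 5000, 10000, 50000):
    print(s, lr_factor(s, warm_up_steps=10000, cycle_length=100000,
                       f_start=1e-6, f_max=1.0, f_min=1.0))
```

With f_min equal to f_max, as in the loop above, the sketch reduces to a warm-up followed by a constant learning rate.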
The repository includes several predefined configurations for different use cases. They differ as follows:
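Unconditional models are used to generate images without conditioning:
- Configured with cond_stage_config: __is_unconditional__
- Examples: celebahq-ldm-vq-4.yaml, ffhq-ldm-vq-4.yaml, etc.

Class-conditional models are used to generate images from class labels:
- Use ClassEmbedder as the conditioning stage
- Set cond_stage_key: "class_label" and conditioning_key: "crossattn"
- Example: cin-ldm-vq-f8.yaml

Semantic synthesis models are used to generate images from semantic segmentation maps:
- Use SpatialRescaler as the conditioning stage
- Set cond_stage_key: "segmentation", typically together with concat_mode: true
- Example: models/ldm/semantic_synthesis256/config.yaml
Sources: configs/latent-diffusion/celebahq-ldm-vq-4.yaml, configs/latent-diffusion/cin-ldm-vq-f8.yaml, models/ldm/semantic_synthesis256/config.yaml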
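The distinguishing keys above can be checked mechanically. The sketch below maps a trimmed model params dict to one of the three variants using only cond_stage_config and cond_stage_key; classify_variant is a hypothetical helper and the example dicts are illustrative, not excerpts of the real files.

```python
from typing import Any, Dict


def classify_variant(model_params: Dict[str, Any]) -> str:
    """Map a LatentDiffusion-style params dict to one of the predefined variants."""
    if model_params.get("cond_stage_config") == "__is_unconditional__":
        return "unconditional"
    key = model_params.get("cond_stage_key")
    if key == "class_label":
        return "class-conditional"
    if key == "segmentation":
        return "semantic-synthesis"
    return "unknown"


# Trimmed, illustrative params; real configs contain many more keys.
examples = {
    "ffhq-like": {"cond_stage_config": "__is_unconditional__"},
    "cin-like": {"cond_stage_key": "class_label", "conditioning_key": "crossattn"},
    "semantic-like": {"cond_stage_key": "segmentation", "concat_mode": True},
}
for name, params in examples.items():
    print(name, "->", classify_variant(params))
```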
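Configuration files map directly to Python classes in the codebase. The target field specifies the class path, and the params dictionary is passed as keyword arguments to initialize that class.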
Example mappings:
- target: ldm.models.diffusion.ddpm.LatentDiffusion → from ldm.models.diffusion.ddpm import LatentDiffusion
- target: ldm.modules.diffusionmodules.openaimodel.UNetModel → from ldm.modules.diffusionmodules.openaimodel import UNetModel
- target: ldm.models.autoencoder.VQModelInterface → from ldm.models.autoencoder import VQModelInterface

The main entry point (main.py) loads these configurations and instantiates objects from the specified target and params.
Sources: configs/latent-diffusion/ffhq-ldm-vq-4.yaml (line 3), configs/latent-diffusion/ffhq-ldm-vq-4.yaml (line 15), configs/latent-diffusion/ffhq-ldm-vq-4.yaml (line 37)
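The target-to-class mapping above can be sketched with importlib: split the dotted path into module and attribute, import the module, and call the class with params as keyword arguments. As far as we know the repository implements this pattern in a helper called instantiate_from_config, but the code below is an independent, self-contained sketch, not a copy of it. The demo uses standard-library classes as stand-ins so it runs without the ldm package installed.

```python
import importlib
from typing import Any, Dict, Union


def get_obj_from_str(path: str) -> Any:
    """Resolve a dotted path such as 'package.module.ClassName' to the object it names."""
    module_name, attr = path.rsplit(".", 1)
    return getattr(importlib.import_module(module_name), attr)


def instantiate_from_config(config: Union[Dict[str, Any], str]) -> Any:
    """Build target(**params); the '__is_unconditional__' sentinel yields None (sketch)."""
    if config == "__is_unconditional__":
        return None
    if not isinstance(config, dict) or "target" not in config:
        raise KeyError("Expected key `target` to instantiate.")
    return get_obj_from_str(config["target"])(**config.get("params", {}))


# Standard-library targets as stand-ins for ldm classes.
delta = instantiate_from_config({"target": "datetime.timedelta", "params": {"hours": 1, "minutes": 30}})
frac = instantiate_from_config({"target": "fractions.Fraction", "params": {"numerator": 3, "denominator": 4}})
print(delta, frac)
```

Because nested sub-configs (unet_config, first_stage_config) are themselves target/params pairs, a parent class can call the same function on them inside its own constructor.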
When working with Stable Diffusion configurations:
… model_channels, attention_resolutions)
For more information on training models with custom configurations, see Training Pipeline.