
Configuration Reference


This document provides a comprehensive guide to the configuration system used in the Stable Diffusion repository. It explains the structure of the YAML configuration files and the options they expose for controlling model architecture, training parameters, and inference settings. For information on the command-line arguments that can override these configurations, see Command-Line Arguments and Options.

Configuration File Overview

Stable Diffusion uses YAML configuration files to define model architectures, training parameters, and data processing pipelines. These files provide a declarative way to specify an entire experimental setup without modifying code.

Sources: configs/latent-diffusion/ffhq-ldm-vq-4.yaml 1-85, configs/latent-diffusion/lsun_churches-ldm-kl-8.yaml 1-91

Model Configuration Structure

The model section is the most important part of the configuration file. It defines the core architecture and hyperparameters of the Stable Diffusion model.

Basic Model Parameters

Key parameters include (see the sketch after this list):

  • base_learning_rate: Base learning rate for optimization
  • target: The Python class path to the model implementation
  • params: Model-specific parameters
    • linear_start and linear_end: Define the noise schedule
    • timesteps: Number of diffusion steps
    • image_size: Size of the latent representation
    • channels: Number of channels in the latent space
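
The fragment below sketches how these parameters appear at the top of a configuration such as ffhq-ldm-vq-4.yaml; the numeric values are typical examples rather than an exact copy of the file.

```yaml
model:
  base_learning_rate: 2.0e-06          # base LR; main.py may scale it by batch size and GPU count
  target: ldm.models.diffusion.ddpm.LatentDiffusion
  params:
    linear_start: 0.0015               # noise schedule endpoints
    linear_end: 0.0195
    timesteps: 1000                    # number of diffusion steps
    first_stage_key: image             # which batch entry holds the input images
    image_size: 64                     # spatial size of the latent representation
    channels: 3                        # number of latent channels
    monitor: val/loss_simple_ema       # metric used for checkpointing
```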

Sources: configs/latent-diffusion/celebahq-ldm-vq-4.yaml 1-14, configs/latent-diffusion/ffhq-ldm-vq-4.yaml 1-14

UNet Configuration

The UNet is the backbone of the diffusion model, responsible for predicting noise during the denoising process.

Key UNet Parameters

  • image_size: Spatial dimensions of the input
  • in_channels and out_channels: Input and output channel dimensions
  • model_channels: Base channel count for the model
  • attention_resolutions: At which resolution levels to apply attention
  • num_res_blocks: Number of residual blocks per resolution level
  • channel_mult: Channel multipliers for each resolution level
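
As a sketch, loosely following ffhq-ldm-vq-4.yaml (confirm exact values against the file you are using), the UNet is configured through a nested unet_config block:

```yaml
unet_config:                         # nested inside model.params
  target: ldm.modules.diffusionmodules.openaimodel.UNetModel
  params:
    image_size: 64                   # must match the latent image_size above
    in_channels: 3                   # latent channels going in
    out_channels: 3                  # predicted noise has the same channel count
    model_channels: 224              # base width; deeper levels are multiples of this
    attention_resolutions: [8, 4, 2] # downsampling factors at which attention is applied
    num_res_blocks: 2                # residual blocks per resolution level
    channel_mult: [1, 2, 3, 4]       # width multiplier per resolution level
    num_head_channels: 32            # channels per attention head
```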

Sources: configs/latent-diffusion/ffhq-ldm-vq-4.yaml 14-35, configs/latent-diffusion/lsun_churches-ldm-kl-8.yaml 29-41

First Stage Model Configuration

The first stage model encodes images into the latent space and decodes latents back into images. Stable Diffusion supports two types of first stage models:

  1. Vector-quantized (VQ) models
  2. KL-regularized autoencoders

Example VQ Model Configuration
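
A sketch of a VQ first stage block, modelled on celebahq-ldm-vq-4.yaml and ffhq-ldm-vq-4.yaml (numeric values and the checkpoint path are illustrative):

```yaml
first_stage_config:                # nested inside model.params
  target: ldm.models.autoencoder.VQModelInterface
  params:
    embed_dim: 3                   # dimensionality of the quantized latent codes
    n_embed: 8192                  # codebook size
    ckpt_path: models/first_stage_models/vq-f4/model.ckpt   # pretrained autoencoder weights (illustrative path)
    ddconfig:
      double_z: false              # VQ autoencoders use a single latent, not mean/variance
      z_channels: 3
      resolution: 256              # pixel resolution the autoencoder operates on
      in_channels: 3
      out_ch: 3
      ch: 128
      ch_mult: [1, 2, 4]           # f=4 downsampling: 256 px -> 64x64 latents
      num_res_blocks: 2
      attn_resolutions: []
      dropout: 0.0
    lossconfig:
      target: torch.nn.Identity    # autoencoder losses are not needed; it stays frozen
```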

Example KL Model Configuration
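
A sketch of a KL first stage block, modelled on lsun_churches-ldm-kl-8.yaml (again, values and the checkpoint path are illustrative):

```yaml
first_stage_config:                # nested inside model.params
  target: ldm.models.autoencoder.AutoencoderKL
  params:
    embed_dim: 4
    monitor: val/rec_loss
    ckpt_path: models/first_stage_models/kl-f8/model.ckpt   # pretrained KL autoencoder (illustrative path)
    ddconfig:
      double_z: true               # KL autoencoders output mean and log-variance
      z_channels: 4
      resolution: 256
      in_channels: 3
      out_ch: 3
      ch: 128
      ch_mult: [1, 2, 4, 4]        # f=8 downsampling: 256 px -> 32x32 latents
      num_res_blocks: 2
      attn_resolutions: []
      dropout: 0.0
    lossconfig:
      target: torch.nn.Identity    # the autoencoder is kept frozen during LDM training
```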

Sources: configs/latent-diffusion/celebahq-ldm-vq-4.yaml 36-58, configs/latent-diffusion/lsun_churches-ldm-kl-8.yaml 43-62

Conditioning Stage Configuration

The conditioning stage defines how conditioning information (like text prompts or class labels) is processed and incorporated into the diffusion model.

Conditioning Types

  1. Unconditional generation

  2. Class-conditional generation

  3. Spatial conditioning (e.g., for semantic synthesis)

Key Conditioning Parameters

  • cond_stage_key: Specifies the key for conditioning (e.g., "class_label", "segmentation")
  • cond_stage_trainable: Whether the conditioning stage is trainable
  • conditioning_key: How conditioning is incorporated (e.g., "crossattn" for cross-attention)
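
A class-conditional setup along the lines of cin-ldm-vq-f8.yaml might look like this (the embedding size and other values are illustrative):

```yaml
# all of the following sit inside model.params
cond_stage_key: class_label        # read class labels from the batch
cond_stage_trainable: true         # train the class embedder jointly with the diffusion model
conditioning_key: crossattn        # inject the embedding via cross-attention
cond_stage_config:
  target: ldm.modules.encoders.modules.ClassEmbedder
  params:
    embed_dim: 512                 # size of the learned class embedding
    key: class_label               # which batch entry the embedder reads
```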

Sources: configs/latent-diffusion/celebahq-ldm-vq-4.yaml 59, configs/latent-diffusion/cin-ldm-vq-f8.yaml 65-69, models/ldm/semantic_synthesis256/config.yaml 54-59

Data Configuration

The data section defines dataset loading, preprocessing, and batching parameters.

Main Parameters

  • batch_size: Number of samples per batch
  • num_workers: Number of worker processes for data loading
  • wrap: Whether to wrap the dataset
  • train and validation: Dataset configurations for training and validation sets
    • target: The Python class path to the dataset implementation
    • params: Dataset-specific parameters (typically includes size for image resolution)
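
A representative data block, adapted loosely from the FFHQ configuration (batch size, worker count, and image size are illustrative):

```yaml
data:
  target: main.DataModuleFromConfig
  params:
    batch_size: 42                 # samples per batch
    num_workers: 5                 # data-loading worker processes
    wrap: false
    train:
      target: taming.data.faceshq.FFHQTrain
      params:
        size: 256                  # images are resized/cropped to 256x256
    validation:
      target: taming.data.faceshq.FFHQValidation
      params:
        size: 256
```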

Available Dataset Implementations

  • taming.data.faceshq.CelebAHQTrain / CelebAHQValidation
  • taming.data.faceshq.FFHQTrain / FFHQValidation
  • ldm.data.lsun.LSUNBedroomsTrain / LSUNBedroomsValidation
  • ldm.data.lsun.LSUNChurchesTrain / LSUNChurchesValidation
  • ldm.data.imagenet.ImageNetTrain / ImageNetValidation

Sources: configs/latent-diffusion/celebahq-ldm-vq-4.yaml 60-73, configs/latent-diffusion/ffhq-ldm-vq-4.yaml 59-72, configs/latent-diffusion/cin-ldm-vq-f8.yaml 70-85

Lightning Configuration

The lightning section configures PyTorch Lightning's trainer and callbacks, which handle the training loop and logging.

Key Components

  • callbacks: Definitions of Lightning callbacks
    • image_logger: Logs generated images during training
      • batch_frequency: How often to log images
      • max_images: Maximum number of images to log
  • trainer: Configuration for PyTorch Lightning's Trainer
    • benchmark: Whether to use cudnn benchmark for improved speed
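
A typical lightning block looks roughly like the following sketch (logging frequency and image limits are illustrative):

```yaml
lightning:
  callbacks:
    image_logger:
      target: main.ImageLogger
      params:
        batch_frequency: 5000      # log sample grids every 5000 training batches
        max_images: 8              # cap on images per logging step
        increase_log_steps: false
  trainer:
    benchmark: true                # enable cudnn autotuning for fixed input sizes
```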

Sources: configs/latent-diffusion/celebahq-ldm-vq-4.yaml 76-85, configs/latent-diffusion/lsun_churches-ldm-kl-8.yaml 80-91

Learning Rate Scheduler

Some configurations include a scheduler for adjusting the learning rate during training.

Main Parameters

  • warm_up_steps: Number of steps for the warm-up phase
  • cycle_lengths: Length of scheduler cycles
  • f_start, f_max, f_min: Starting, maximum, and minimum learning-rate scaling factors
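
In lsun_churches-ldm-kl-8.yaml the scheduler takes roughly this form (the scheduler class and values below are a sketch based on that file; check it for the exact settings):

```yaml
scheduler_config:                   # nested inside model.params
  target: ldm.lr_scheduler.LambdaLinearScheduler
  params:
    warm_up_steps: [10000]          # linear warm-up over the first 10k steps
    cycle_lengths: [10000000000000] # effectively one never-ending cycle
    f_start: [1.0e-06]              # LR scale at the start of warm-up
    f_max: [1.0]                    # peak LR scale after warm-up
    f_min: [1.0]                    # final LR scale (no decay in this setup)
```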

Sources: configs/latent-diffusion/lsun_churches-ldm-kl-8.yaml 20-27

Configurations for Specific Use Cases

The repository includes several predefined configurations for different use cases. They differ as follows:

Unconditional Generation

For generating images without any conditioning:

  • Uses configurations with cond_stage_config: __is_unconditional__
  • Examples: celebahq-ldm-vq-4.yaml, ffhq-ldm-vq-4.yaml, etc.

Class-Conditional Generation

For generating images conditioned on class labels:

  • Uses configurations with ClassEmbedder as the conditioning stage
  • Sets cond_stage_key: "class_label" and conditioning_key: "crossattn"
  • Example: cin-ldm-vq-f8.yaml

Semantic Image Synthesis

For generating images from semantic segmentation maps (see the sketch after this list):

  • Uses configurations with SpatialRescaler as the conditioning stage
  • Sets cond_stage_key: "segmentation" and typically concat_mode: true
  • Example: models/ldm/semantic_synthesis256/config.yaml
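
A rough sketch of the conditioning-related part of such a configuration (channel counts and the number of rescaling stages are illustrative):

```yaml
# inside model.params
cond_stage_key: segmentation         # the batch provides one-hot segmentation maps
concat_mode: true                    # conditioning is concatenated to the UNet input
cond_stage_config:
  target: ldm.modules.encoders.modules.SpatialRescaler
  params:
    n_stages: 2                      # downscale the segmentation map toward the latent resolution
    in_channels: 182                 # number of segmentation classes (dataset dependent)
    out_channels: 3                  # channels after the learned projection
```

Because the conditioning is concatenated rather than cross-attended, the UNet's in_channels must be increased to accommodate the extra conditioning channels on top of the latent channels.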

Sources: configs/latent-diffusion/celebahq-ldm-vq-4.yaml, configs/latent-diffusion/cin-ldm-vq-f8.yaml, models/ldm/semantic_synthesis256/config.yaml

How Configuration Files Map to Code

Configuration files map directly to Python classes in the codebase. The target field specifies the class path, and the params dictionary is passed as keyword arguments when that class is instantiated.

Example Mappings

  • target: ldm.models.diffusion.ddpm.LatentDiffusion → from ldm.models.diffusion.ddpm import LatentDiffusion
  • target: ldm.modules.diffusionmodules.openaimodel.UNetModel → from ldm.modules.diffusionmodules.openaimodel import UNetModel
  • target: ldm.models.autoencoder.VQModelInterface → from ldm.models.autoencoder import VQModelInterface

The main entry point (main.py) loads these configurations and instantiates objects according to the specified target and parameters.
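
To illustrate, the trimmed, hypothetical fragment below annotates how each target/params pair becomes a Python object:

```yaml
model:
  target: ldm.models.diffusion.ddpm.LatentDiffusion   # imported, then called as LatentDiffusion(**params)
  params:
    timesteps: 1000
    unet_config:                                      # nested configs are instantiated the same way
      target: ldm.modules.diffusionmodules.openaimodel.UNetModel
      params:
        model_channels: 224                           # becomes UNetModel(model_channels=224, ...)
```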

Sources: configs/latent-diffusion/ffhq-ldm-vq-4.yaml 3, 15, 37

Configuration Best Practices

When working with Stable Diffusion configurations:

  1. Start from an existing configuration that is closest to your use case
  2. Modify one section at a time and understand its impact
  3. Check parameter compatibility between related sections (e.g., make sure the latent image_size matches between the UNet and the first stage model)
  4. Keep track of which model checkpoints correspond to which configurations
  5. Consider the computational cost when adjusting model size parameters (e.g., model_channels and attention_resolutions)

For more information on training models with custom configurations, see Training Pipeline.