Latent Consistency Models

[[open-in-colab]]

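Latent Consistency Models (LCMs) enable fast, high-quality image generation by directly predicting the reverse diffusion process in latent space rather than pixel space. In other words, an LCM tries to predict the noiseless image from the noisy image directly, whereas a typical diffusion model removes noise from it iteratively. By avoiding the iterative sampling process, LCMs can generate high-quality images in 2-4 steps instead of 20-30 steps.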
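To make the step-count difference concrete, the sketch below lists which timesteps a sampler would visit when it takes only a few steps. It is a toy illustration that assumes 1000 training timesteps and simple even spacing. `evenly_spaced_timesteps` is a hypothetical helper, not a diffusers API, and real schedulers such as LCMScheduler apply their own spacing rules.

```python
def evenly_spaced_timesteps(num_inference_steps, num_train_timesteps=1000):
    """Return descending timesteps spread evenly over the training range.

    Hypothetical helper for illustration only; not part of diffusers.
    """
    if not 1 <= num_inference_steps <= num_train_timesteps:
        raise ValueError("num_inference_steps must be in [1, num_train_timesteps]")
    stride = num_train_timesteps / num_inference_steps
    return [
        round(num_train_timesteps - i * stride) - 1
        for i in range(num_inference_steps)
    ]


few = evenly_spaced_timesteps(4)    # few-step sampling, as with an LCM
many = evenly_spaced_timesteps(25)  # a schedule in the usual 20-30 step range

print(few)        # four widely spaced timesteps
print(len(many))  # 25 denoising steps for the same noise range
```

Each entry corresponds to one denoising step, so the shorter list is where the speedup comes from.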

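LCMs are distilled from pretrained models, which requires about 32 hours of A100 compute. To speed this up, LCM-LoRAs train a LoRA adapter, which has far fewer parameters than the full model. Once trained, an LCM-LoRA can be plugged into a diffusion model.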
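The parameter savings follow from how LoRA factorizes a weight update: instead of training a full d_out x d_in matrix, it trains two low-rank factors of shapes d_out x r and r x d_in. The arithmetic below is a minimal sketch. The layer size and rank are illustrative assumptions, not the values used by any particular LCM-LoRA checkpoint.

```python
def full_param_count(d_in, d_out):
    """Trainable parameters when the whole weight matrix is updated."""
    return d_in * d_out


def lora_param_count(d_in, d_out, rank):
    """Trainable parameters for the two low-rank LoRA factors."""
    return rank * (d_in + d_out)


# Illustrative (assumed) sizes for a single linear layer
d_in, d_out, rank = 1280, 1280, 64

full = full_param_count(d_in, d_out)
lora = lora_param_count(d_in, d_out, rank)

print(f"full: {full:,} parameters")
print(f"LoRA: {lora:,} parameters ({lora / full:.1%} of full)")
```

Lower ranks shrink the adapter further, at the cost of a less expressive update.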

This guide shows you how to use LCMs and LCM-LoRAs for fast inference, and how to combine them with other adapters such as ControlNet or T2I-Adapter.

TIP

LCMs and LCM-LoRAs are available for Stable Diffusion v1.5, Stable Diffusion XL, and SSD-1B. You can find their checkpoints in the Latent Consistency collections.

Text-to-image

Image-to-image

Inpainting

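To use LCM-LoRAs for inpainting, replace the scheduler with [LCMScheduler] and load the LCM-LoRA weights with the [~loaders.StableDiffusionLoraLoaderMixin.load_lora_weights] method. Then use the pipeline as usual: pass a text prompt, an initial image, and a mask image to generate an image in just 4 steps.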

import torch
from diffusers import AutoPipelineForInpainting, LCMScheduler
from diffusers.utils import load_image, make_image_grid

# Load the inpainting pipeline in half precision
pipe = AutoPipelineForInpainting.from_pretrained(
    "runwayml/stable-diffusion-inpainting",
    torch_dtype=torch.float16,
    variant="fp16",
).to("cuda")

# Replace the default scheduler with LCMScheduler
pipe.scheduler = LCMScheduler.from_config(pipe.scheduler.config)

# Load the LCM-LoRA weights
pipe.load_lora_weights("latent-consistency/lcm-lora-sdv1-5")

# Load the initial image and the mask that marks the region to repaint
init_image = load_image("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/inpaint.png")
mask_image = load_image("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/inpaint_mask.png")

prompt = "concept art digital painting of an elven castle, inspired by lord of the rings, highly detailed, 8k"
generator = torch.manual_seed(0)  # fixed seed for reproducibility
image = pipe(
    prompt=prompt,
    image=init_image,
    mask_image=mask_image,
    generator=generator,
    num_inference_steps=4,
    guidance_scale=4,
).images[0]

# Show the initial image, the mask, and the result side by side
make_image_grid([init_image, mask_image, image], rows=1, cols=3)

initial image

generated image

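Adapters

LCMs are compatible with adapters such as LoRA, ControlNet, T2I-Adapter, and AnimateDiff. You can bring the speed of LCMs to these adapters to generate images in a particular style or to condition the model on another input, such as a Canny image.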

LoRA

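LoRA adapters can be fine-tuned quickly to learn a new style from just a few images, then plugged into a pretrained model to generate images in that style.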
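One way to combine an LCM-LoRA with a style LoRA is to load both under separate adapter names and then activate them together. The sketch below wraps that pattern in a hypothetical helper, `apply_lora_mix`, which relies only on the pipeline's `load_lora_weights` and `set_adapters` methods. The adapter names and weights are assumptions to tune for your own checkpoints.

```python
def apply_lora_mix(pipe, loras):
    """Load several LoRAs into `pipe` and activate them with per-adapter weights.

    `loras` maps an adapter name to a (repo_id, weight) pair.
    Hypothetical helper for illustration; not part of diffusers.
    """
    names, weights = [], []
    for name, (repo_id, weight) in loras.items():
        # Register each LoRA under its own adapter name
        pipe.load_lora_weights(repo_id, adapter_name=name)
        names.append(name)
        weights.append(weight)
    # Activate all loaded adapters at once with their relative strengths
    pipe.set_adapters(names, adapter_weights=weights)
    return names, weights
```

With a real pipeline whose scheduler has already been replaced by LCMScheduler, a call might look like `apply_lora_mix(pipe, {"lcm": ("latent-consistency/lcm-lora-sdv1-5", 1.0), "style": (style_lora_repo, 0.8)})`, where `style_lora_repo` is a placeholder for a style LoRA of your choice.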

ControlNet

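ControlNets are adapters that can be trained on a variety of inputs, such as Canny edges, pose estimation, or depth. A ControlNet can be inserted into a pipeline to give the model additional conditioning and control, which leads to more accurate generation.

You can find additional ControlNet models trained on other inputs in lllyasviel's repository.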
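As a rough sketch of how a Canny ControlNet might be combined with an LCM-LoRA, the helpers below build a ControlNet pipeline, replace its scheduler, load the LCM-LoRA, and run a 4-step generation. The base checkpoint and ControlNet repository ids are assumptions that do not come from this guide, as is the `guidance_scale` default. Both function names are hypothetical.

```python
def build_lcm_controlnet_pipeline(
    base_model_id="stable-diffusion-v1-5/stable-diffusion-v1-5",  # assumed base checkpoint
    controlnet_id="lllyasviel/sd-controlnet-canny",               # assumed Canny ControlNet
    lcm_lora_id="latent-consistency/lcm-lora-sdv1-5",
):
    """Build a ControlNet pipeline that samples with LCMScheduler and LCM-LoRA."""
    # Imported lazily so the module can be loaded without a GPU environment
    import torch
    from diffusers import ControlNetModel, LCMScheduler, StableDiffusionControlNetPipeline

    controlnet = ControlNetModel.from_pretrained(controlnet_id, torch_dtype=torch.float16)
    pipe = StableDiffusionControlNetPipeline.from_pretrained(
        base_model_id,
        controlnet=controlnet,
        torch_dtype=torch.float16,
    ).to("cuda")

    # Same two steps as elsewhere in this guide: swap the scheduler, load the LCM-LoRA
    pipe.scheduler = LCMScheduler.from_config(pipe.scheduler.config)
    pipe.load_lora_weights(lcm_lora_id)
    return pipe


def generate_with_control(
    pipe,
    prompt,
    control_image,
    num_inference_steps=4,
    guidance_scale=1.5,  # assumed value; tune for your checkpoint
    generator=None,
):
    """Run the pipeline on a prepared control image and return the first result."""
    result = pipe(
        prompt=prompt,
        image=control_image,
        num_inference_steps=num_inference_steps,
        guidance_scale=guidance_scale,
        generator=generator,
    )
    return result.images[0]
```

`control_image` is expected to be a Canny edge map prepared beforehand. Preparing it is outside the scope of this sketch.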

T2I-Adapter

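T2I-Adapter is an even more lightweight adapter than ControlNet that provides an additional conditioning input to a pretrained model. It is faster than ControlNet, but the results may be slightly worse.

You can find additional T2I-Adapter checkpoints trained on other inputs in TencentArc's repository.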

AnimateDiff

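AnimateDiff is an adapter that adds motion to an image. It works with most Stable Diffusion models, effectively turning them into "video generation" models. Good results from a video model usually require generating multiple frames (16-24), which can be very slow with a regular Stable Diffusion model. LCM-LoRA speeds this up by needing only 4-8 steps per frame.

Load an [AnimateDiffPipeline] and pass it a [MotionAdapter]. Then replace the scheduler with [LCMScheduler] and combine both LoRA adapters with the [~loaders.UNet2DConditionLoadersMixin.set_adapters] method. Now you can pass a prompt to the pipeline and generate an animated image.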

import torch
from diffusers import MotionAdapter, AnimateDiffPipeline, LCMScheduler
from diffusers.utils import export_to_gif

# Load the motion adapter and pass it to the AnimateDiff pipeline
adapter = MotionAdapter.from_pretrained("guoyww/animatediff-motion-adapter-v1-5")
pipe = AnimateDiffPipeline.from_pretrained(
    "frankjoshua/toonyou_beta6",
    motion_adapter=adapter,
).to("cuda")

# set scheduler
pipe.scheduler = LCMScheduler.from_config(pipe.scheduler.config)

# load LCM-LoRA and the motion LoRA under separate adapter names
pipe.load_lora_weights("latent-consistency/lcm-lora-sdv1-5", adapter_name="lcm")
pipe.load_lora_weights("guoyww/animatediff-motion-lora-zoom-in", weight_name="diffusion_pytorch_model.safetensors", adapter_name="motion-lora")

# combine both LoRAs, each with its own weight
pipe.set_adapters(["lcm", "motion-lora"], adapter_weights=[0.55, 1.2])

prompt = "best quality, masterpiece, 1girl, looking at viewer, blurry background, upper body, contemporary, dress"
generator = torch.manual_seed(0)  # fixed seed for reproducibility
frames = pipe(
    prompt=prompt,
    num_inference_steps=5,
    guidance_scale=1.25,
    cross_attention_kwargs={"scale": 1},
    num_frames=24,
    generator=generator
).frames[0]
export_to_gif(frames, "animation.gif")
