Stable Cascade

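This model is built on the Würstchen architecture. Its main difference from other models such as Stable Diffusion is that it works in a much smaller latent space. Why does this matter? The smaller the latent space, the faster inference runs and the cheaper training becomes. How small is it? Stable Diffusion uses a compression factor of 8, encoding a 1024x1024 image to 128x128. Stable Cascade achieves a compression factor of 42 (1024 / 24 ≈ 42.7), meaning a 1024x1024 image is encoded to 24x24 while still allowing a crisp reconstruction. The text-conditional model is then trained in this highly compressed latent space. A previous version of this architecture achieved a 16x cost reduction over Stable Diffusion 1.5.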
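The compression figures above can be checked with a few lines of arithmetic. This sketch uses only the resolutions stated in the text (1024x1024 images, 128x128 latents for Stable Diffusion, 24x24 for Stable Cascade); the helper names `spatial_factor` and `latent_positions` are hypothetical and exist only for illustration.

```python
def spatial_factor(image_side: int, latent_side: int) -> float:
    """Per-side downsampling factor from image pixels to latent positions."""
    return image_side / latent_side


def latent_positions(latent_side: int) -> int:
    """Number of spatial positions in a square latent grid."""
    return latent_side * latent_side


IMAGE_SIDE = 1024
SD_LATENT_SIDE = 128       # Stable Diffusion: 1024x1024 -> 128x128
CASCADE_LATENT_SIDE = 24   # Stable Cascade:   1024x1024 -> 24x24

sd_factor = spatial_factor(IMAGE_SIDE, SD_LATENT_SIDE)            # 8.0
cascade_factor = spatial_factor(IMAGE_SIDE, CASCADE_LATENT_SIDE)  # ~42.67

# How many times fewer spatial positions the text-conditional model has to denoise
position_ratio = latent_positions(SD_LATENT_SIDE) / latent_positions(CASCADE_LATENT_SIDE)

print(f"Stable Diffusion factor: {sd_factor:.1f}")
print(f"Stable Cascade factor:   {cascade_factor:.2f}")
print(f"Latent positions: {latent_positions(SD_LATENT_SIDE)} vs "
      f"{latent_positions(CASCADE_LATENT_SIDE)} ({position_ratio:.1f}x fewer)")
```

This counts spatial positions only and deliberately ignores the number of latent channels, which the text does not state.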

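This makes the model well suited for use cases where efficiency is important. Furthermore, all known extensions such as fine-tuning, LoRA, ControlNet, IP-Adapter, and LCM are possible with this method.

The original codebase can be found at Stability-AI/StableCascade.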

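Model Overview

Stable Cascade consists of three models, Stage A, Stage B, and Stage C, which form a cascade for generating images, hence the name "Stable Cascade".

Stage A and Stage B are used to compress images, similar to the role of the VAE in Stable Diffusion. However, this setup achieves a much higher compression of images. While Stable Diffusion models use a spatial compression factor of 8, encoding a 1024 x 1024 image to 128 x 128, Stable Cascade achieves a compression factor of 42. This encodes a 1024 x 1024 image to 24 x 24 while still allowing the image to be decoded accurately, which makes training and inference much cheaper. Stage C, in turn, is responsible for generating the small 24 x 24 latents from a text prompt.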

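The Stage C model operates on the small 24 x 24 latents and denoises them conditioned on the text prompt. It is also the largest component of the Cascade pipeline and is meant to be used with StableCascadePriorPipeline.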
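As a rough sketch of how a requested output resolution maps onto the Stage C latent grid, the helper below divides each side by about 42.67 and rounds up. We assume this mirrors the `resolution_multiple` value (42.67) in the configuration of diffusers' prior pipeline; treat that, and the hypothetical helper name `stage_c_latent_hw`, as assumptions rather than documented API.

```python
import math

# Assumed per-side compression for Stage C (about 1024 / 24). diffusers' prior
# pipeline exposes a similar value as `resolution_multiple`; treat this as an assumption.
RESOLUTION_MULTIPLE = 42.67


def stage_c_latent_hw(height: int, width: int,
                      multiple: float = RESOLUTION_MULTIPLE) -> tuple[int, int]:
    """Hypothetical helper: spatial size of the Stage C latent for a target image size."""
    if height <= 0 or width <= 0:
        raise ValueError("height and width must be positive")
    # Round up so the latent grid always covers the whole image
    return math.ceil(height / multiple), math.ceil(width / multiple)


for h, w in [(1024, 1024), (1024, 1536), (768, 768)]:
    print(f"{h} x {w} -> latent {stage_c_latent_hw(h, w)}")
```

Rounding up rather than down keeps a 1024 x 1024 request at the 24 x 24 grid described above, since 1024 / 42.67 is just under 24.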

The Stage B and Stage A models are used with StableCascadeDecoderPipeline and are responsible for generating the final image from the small 24 x 24 latents.

Usage Example

```python
import torch
from diffusers import StableCascadeDecoderPipeline, StableCascadePriorPipeline

prompt = "an image of a shiba inu, donning a spacesuit and helmet"
negative_prompt = ""

# Stage C (prior) runs in bfloat16; Stages B and A (decoder) run in float16
prior = StableCascadePriorPipeline.from_pretrained("stabilityai/stable-cascade-prior", variant="bf16", torch_dtype=torch.bfloat16)
decoder = StableCascadeDecoderPipeline.from_pretrained("stabilityai/stable-cascade", variant="bf16", torch_dtype=torch.float16)

# Stage C: generate the compressed image embeddings from the text prompt
prior.enable_model_cpu_offload()
prior_output = prior(
    prompt=prompt,
    height=1024,
    width=1024,
    negative_prompt=negative_prompt,
    guidance_scale=4.0,
    num_images_per_prompt=1,
    num_inference_steps=20
)

# Stages B and A: decode the embeddings into the final image.
# The embeddings are cast to float16 to match the decoder's dtype.
decoder.enable_model_cpu_offload()
decoder_output = decoder(
    image_embeddings=prior_output.image_embeddings.to(torch.float16),
    prompt=prompt,
    negative_prompt=negative_prompt,
    guidance_scale=0.0,
    output_type="pil",
    num_inference_steps=10
).images[0]
decoder_output.save("cascade.png")
```

Using the Lite Versions of the Stage B and Stage C Models

```python
import torch
from diffusers import (
    StableCascadeDecoderPipeline,
    StableCascadePriorPipeline,
    StableCascadeUNet,
)

prompt = "an image of a shiba inu, donning a spacesuit and helmet"
negative_prompt = ""

# Load the Lite versions of the Stage C (prior) and Stage B (decoder) UNets
prior_unet = StableCascadeUNet.from_pretrained("stabilityai/stable-cascade-prior", subfolder="prior_lite")
decoder_unet = StableCascadeUNet.from_pretrained("stabilityai/stable-cascade", subfolder="decoder_lite")

# Swap the Lite UNets into the standard pipelines
prior = StableCascadePriorPipeline.from_pretrained("stabilityai/stable-cascade-prior", prior=prior_unet)
decoder = StableCascadeDecoderPipeline.from_pretrained("stabilityai/stable-cascade", decoder=decoder_unet)

# Stage C: text prompt -> compressed image embeddings
prior.enable_model_cpu_offload()
prior_output = prior(
    prompt=prompt,
    height=1024,
    width=1024,
    negative_prompt=negative_prompt,
    guidance_scale=4.0,
    num_images_per_prompt=1,
    num_inference_steps=20
)

# Stages B and A: image embeddings -> final image
decoder.enable_model_cpu_offload()
decoder_output = decoder(
    image_embeddings=prior_output.image_embeddings,
    prompt=prompt,
    negative_prompt=negative_prompt,
    guidance_scale=0.0,
    output_type="pil",
    num_inference_steps=10
).images[0]
decoder_output.save("cascade.png")
```

Loading Original Checkpoints with `from_single_file`

Loading checkpoints in the original format is supported through the `from_single_file` method of StableCascadeUNet.

```python
import torch
from diffusers import (
    StableCascadeDecoderPipeline,
    StableCascadePriorPipeline,
    StableCascadeUNet,
)

prompt = "an image of a shiba inu, donning a spacesuit and helmet"
negative_prompt = ""

# Load Stage C and Stage B directly from the original single-file checkpoints
prior_unet = StableCascadeUNet.from_single_file(
    "https://huggingface.co/stabilityai/stable-cascade/blob/main/stage_c_bf16.safetensors",
    torch_dtype=torch.bfloat16
)
decoder_unet = StableCascadeUNet.from_single_file(
    "https://huggingface.co/stabilityai/stable-cascade/blob/main/stage_b_bf16.safetensors",
    torch_dtype=torch.bfloat16
)

# Plug the single-file UNets into the pipelines
prior = StableCascadePriorPipeline.from_pretrained("stabilityai/stable-cascade-prior", prior=prior_unet, torch_dtype=torch.bfloat16)
decoder = StableCascadeDecoderPipeline.from_pretrained("stabilityai/stable-cascade", decoder=decoder_unet, torch_dtype=torch.bfloat16)

# Stage C: text prompt -> compressed image embeddings
prior.enable_model_cpu_offload()
prior_output = prior(
    prompt=prompt,
    height=1024,
    width=1024,
    negative_prompt=negative_prompt,
    guidance_scale=4.0,
    num_images_per_prompt=1,
    num_inference_steps=20
)

# Stages B and A: image embeddings -> final image
decoder.enable_model_cpu_offload()
decoder_output = decoder(
    image_embeddings=prior_output.image_embeddings,
    prompt=prompt,
    negative_prompt=negative_prompt,
    guidance_scale=0.0,
    output_type="pil",
    num_inference_steps=10
).images[0]
decoder_output.save("cascade-single-file.png")
```
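The single-file example passes Hugging Face Hub file URLs to `from_single_file`. To make the URL layout explicit, here is a hypothetical helper, `split_hub_file_url`, that is not part of diffusers; it splits such a URL into a repository id, a revision, and a file path, and accepts both the `blob/<revision>` and `resolve/<revision>` forms. It only illustrates the URL convention and is not how diffusers resolves checkpoints internally.

```python
from urllib.parse import urlparse


def split_hub_file_url(url: str) -> tuple[str, str, str]:
    """Hypothetical helper: split a Hub file URL into (repo_id, revision, filename).

    Expects https://huggingface.co/<owner>/<repo>/(blob|resolve)/<revision>/<path>.
    """
    parsed = urlparse(url)
    if parsed.netloc not in ("huggingface.co", "hf.co"):
        raise ValueError(f"not a Hugging Face Hub URL: {url}")
    parts = parsed.path.strip("/").split("/")
    # owner, repo, "blob"/"resolve", revision, then at least one path segment
    if len(parts) < 5 or parts[2] not in ("blob", "resolve"):
        raise ValueError(f"unexpected Hub file URL layout: {url}")
    owner, repo, _kind, revision = parts[:4]
    filename = "/".join(parts[4:])
    return f"{owner}/{repo}", revision, filename


print(split_hub_file_url(
    "https://huggingface.co/stabilityai/stable-cascade/blob/main/stage_b_bf16.safetensors"
))
```

This simple sketch assumes the revision is a single path segment; a branch name that contains `/` would be split incorrectly.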

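Uses

Direct Use

The model is currently intended for research purposes. Possible research areas and tasks include:

Research on generative models.
Safe deployment of models that have the potential to generate harmful content.
Probing and understanding the limitations and biases of generative models.
Generation of artworks and use in design and other artistic processes.
Applications in educational or creative tools.

Excluded uses are described below.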

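Out-of-Scope Use

The model was not trained to be a factual or true representation of people or events, so using it to generate such content is outside the model's capabilities. The model should not be used in any way that violates Stability AI's Acceptable Use Policy.

Limitations and Bias

Limitations

Faces and people in general may not be generated properly.
The autoencoding part of the model is lossy.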

StableCascadeCombinedPipeline

[[autodoc]] StableCascadeCombinedPipeline
	- all
	- __call__

StableCascadePriorPipeline

[[autodoc]] StableCascadePriorPipeline
	- all
	- __call__

StableCascadePriorPipelineOutput

[[autodoc]] pipelines.stable_cascade.pipeline_stable_cascade_prior.StableCascadePriorPipelineOutput

StableCascadeDecoderPipeline

[[autodoc]] StableCascadeDecoderPipeline
	- all
	- __call__
