Stable unCLIP

Stable unCLIP checkpoints are finetuned from Stable Diffusion 2.1 checkpoints to condition on CLIP image embeddings. Stable unCLIP still conditions on text embeddings. Given the two separate conditionings, Stable unCLIP can be used for text-guided image variation. When combined with an unCLIP prior, it can also be used for full text-to-image generation.

The abstract from the paper is:

Contrastive models like CLIP have been shown to learn robust representations of images that capture both semantics and style. To leverage these representations for image generation, we propose a two-stage model: a prior that generates a CLIP image embedding given a text caption, and a decoder that generates an image conditioned on the image embedding. We show that explicitly generating image representations improves image diversity with minimal loss in photorealism and caption similarity. Our decoders conditioned on image representations can also produce variations of an image that preserve both its semantics and style, while varying the non-essential details absent from the image representation. Moreover, the joint embedding space of CLIP enables language-guided image manipulations in a zero-shot fashion. We use diffusion models for the decoder and experiment with both autoregressive and diffusion models for the prior, finding that the latter are computationally more efficient and produce higher-quality samples.

Tips

Stable unCLIP takes noise_level as an input during inference, which determines how much noise is added to the image embeddings. A higher noise_level increases the variation in the final un-noised images. By default, no extra noise is added to the image embeddings (noise_level = 0).
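
As a minimal sketch of this (reusing the image-variation checkpoint from the section below; the noise_level value here is purely illustrative), the argument is passed directly to the pipeline call:

```python
import torch
from diffusers import StableUnCLIPImg2ImgPipeline
from diffusers.utils import load_image

# Same checkpoint as in the image-variation example further down.
pipe = StableUnCLIPImg2ImgPipeline.from_pretrained(
    "stabilityai/stable-diffusion-2-1-unclip", torch_dtype=torch.float16, variant="fp16"
).to("cuda")

init_image = load_image(
    "https://huggingface.co/datasets/hf-internal-testing/diffusers-images/resolve/main/stable_unclip/tarsila_do_amaral.png"
)

# noise_level=0 (the default) stays closest to the input image;
# a larger value (200 here is an arbitrary example) adds noise to the
# image embedding and increases the variation of the result.
image = pipe(init_image, noise_level=200).images[0]
```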

Text-to-Image Generation

Stable unCLIP can be leveraged for text-to-image generation by pipelining it with the prior model of KakaoBrain's open source DALL-E 2 replication, Karlo.

```python
import torch
from diffusers import UnCLIPScheduler, DDPMScheduler, StableUnCLIPPipeline
from diffusers.models import PriorTransformer
from transformers import CLIPTokenizer, CLIPTextModelWithProjection

prior_model_id = "kakaobrain/karlo-v1-alpha"
data_type = torch.float16
prior = PriorTransformer.from_pretrained(prior_model_id, subfolder="prior", torch_dtype=data_type)

prior_text_model_id = "openai/clip-vit-large-patch14"
prior_tokenizer = CLIPTokenizer.from_pretrained(prior_text_model_id)
prior_text_model = CLIPTextModelWithProjection.from_pretrained(prior_text_model_id, torch_dtype=data_type)
prior_scheduler = UnCLIPScheduler.from_pretrained(prior_model_id, subfolder="prior_scheduler")
prior_scheduler = DDPMScheduler.from_config(prior_scheduler.config)

stable_unclip_model_id = "stabilityai/stable-diffusion-2-1-unclip-small"

pipe = StableUnCLIPPipeline.from_pretrained(
    stable_unclip_model_id,
    torch_dtype=data_type,
    variant="fp16",
    prior_tokenizer=prior_tokenizer,
    prior_text_encoder=prior_text_model,
    prior=prior,
    prior_scheduler=prior_scheduler,
)

pipe = pipe.to("cuda")
wave_prompt = "dramatic wave, the Oceans roar, Strong wave spiral across the oceans as the waves unfurl into roaring crests; perfect wave form; perfect wave shape; dramatic wave shape; wave shape unbelievable; wave; wave shape spectacular"

image = pipe(prompt=wave_prompt).images[0]
image
```
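
The call above relies on the pipeline defaults. As a rough sketch (continuing from the pipeline constructed above; the parameter values are illustrative, not recommendations), the usual Stable Diffusion knobs such as num_inference_steps, guidance_scale, and a seeded generator can be passed as well:

```python
# Continuing from `pipe` and `wave_prompt` defined above.
generator = torch.Generator(device="cuda").manual_seed(0)

image = pipe(
    prompt=wave_prompt,
    num_inference_steps=25,   # decoder denoising steps
    guidance_scale=10.0,      # classifier-free guidance strength
    generator=generator,      # for reproducible results
).images[0]
image.save("wave.png")
```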

Text guided Image-to-Image Variation

```python
from diffusers import StableUnCLIPImg2ImgPipeline
from diffusers.utils import load_image
import torch

pipe = StableUnCLIPImg2ImgPipeline.from_pretrained(
    "stabilityai/stable-diffusion-2-1-unclip", torch_dtype=torch.float16, variation="fp16"
)
pipe = pipe.to("cuda")

url = "https://huggingface.co/datasets/hf-internal-testing/diffusers-images/resolve/main/stable_unclip/tarsila_do_amaral.png"
init_image = load_image(url)

images = pipe(init_image).images
images[0].save("variation_image.png")
```

Optionally, you can also pass a prompt to pipe, for example:

```python
prompt = "A fantasy landscape, trending on artstation"

image = pipe(init_image, prompt=prompt).images[0]
image
```
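
Both pipelines also expose the standard diffusers memory helpers listed in the API reference below. A short sketch of enabling them (whether xFormers is usable depends on your environment):

```python
# Optional memory savings on a pipeline object created as above.
pipe.enable_attention_slicing()   # compute attention in slices
pipe.enable_vae_slicing()         # decode the VAE in slices

# Requires the xformers package to be installed:
# pipe.enable_xformers_memory_efficient_attention()
```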

StableUnCLIPPipeline

[[autodoc]] StableUnCLIPPipeline
	- all
	- __call__
	- enable_attention_slicing
	- disable_attention_slicing
	- enable_vae_slicing
	- disable_vae_slicing
	- enable_xformers_memory_efficient_attention
	- disable_xformers_memory_efficient_attention

StableUnCLIPImg2ImgPipeline

[[autodoc]] StableUnCLIPImg2ImgPipeline
	- all
	- __call__
	- enable_attention_slicing
	- disable_attention_slicing
	- enable_vae_slicing
	- disable_vae_slicing
	- enable_xformers_memory_efficient_attention
	- disable_xformers_memory_efficient_attention

ImagePipelineOutput

[[autodoc]] pipelines.ImagePipelineOutput