
# Inpainting

[[open-in-colab]]

Inpainting replaces or edits specific areas of an image. This makes it a useful tool for image restoration, like removing defects and artifacts, or even replacing an image area with something entirely new. Inpainting relies on a mask to determine which regions of an image to fill in; the area to inpaint is represented by white pixels and the area to keep is represented by black pixels. The white pixels are filled in by the prompt.
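If a mask you already have follows the opposite convention (black marking the area to fill), it can be inverted before use; a minimal sketch with Pillow:

```py
from PIL import ImageOps
from diffusers.utils import load_image

mask_image = load_image("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/inpaint_mask.png")
# flip black and white so white marks the area to inpaint
mask_image = ImageOps.invert(mask_image.convert("L"))
```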

With 🤗 Diffusers, here is how you can inpaint:

1. Load an inpainting checkpoint with the [AutoPipelineForInpainting] class. This automatically detects the appropriate pipeline class to load based on the checkpoint:

```py
import torch
from diffusers import AutoPipelineForInpainting
from diffusers.utils import load_image, make_image_grid

pipeline = AutoPipelineForInpainting.from_pretrained(
    "kandinsky-community/kandinsky-2-2-decoder-inpaint", torch_dtype=torch.float16
)
pipeline.enable_model_cpu_offload()
# remove following line if xFormers is not installed or you have PyTorch 2.0 or higher installed
pipeline.enable_xformers_memory_efficient_attention()
```

2. Load the base and mask images:

```py
init_image = load_image("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/inpaint.png")
mask_image = load_image("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/inpaint_mask.png")
```

3. Create a prompt to inpaint the image with and pass it to the pipeline with the base and mask images:

```py
prompt = "a black cat with glowing eyes, cute, adorable, disney, pixar, highly detailed, 8k"
negative_prompt = "bad anatomy, deformed, ugly, disfigured"
image = pipeline(prompt=prompt, negative_prompt=negative_prompt, image=init_image, mask_image=mask_image).images[0]
make_image_grid([init_image, mask_image, image], rows=1, cols=3)
```

*base image · mask image · generated image*

## Create a mask image

For convenience, a mask image is provided in all of the code examples in this guide. You can inpaint on your own images, but you'll need to create a mask image for them. Use the Space below to easily create a mask image.

Upload a base image to inpaint on and use the sketch tool to draw a mask. Once you're done, click **Run** to generate and download the mask image. A programmatic alternative is sketched below.
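If you'd rather create a mask in code, a minimal sketch with Pillow (the mask size and rectangle coordinates are hypothetical):

```py
from PIL import Image, ImageDraw

# start from an all-black mask (black = keep)
mask = Image.new("L", (512, 512), 0)
draw = ImageDraw.Draw(mask)
# paint the region to inpaint white (white = fill in)
draw.rectangle((128, 128, 384, 384), fill=255)
mask.save("mask.png")
```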

### Mask blur

The [~VaeImageProcessor.blur] method provides an option for how to blend the original image and inpaint area. The amount of blur is determined by the `blur_factor` parameter. Increasing the `blur_factor` increases the amount of blur applied to the mask edges, softening the transition between the original image and inpaint area. A low or zero `blur_factor` preserves the sharper edges of the mask.

To use it, create a blurred mask with the image processor:

```py
import torch
from diffusers import AutoPipelineForInpainting
from diffusers.utils import load_image
from PIL import Image

pipeline = AutoPipelineForInpainting.from_pretrained("stable-diffusion-v1-5/stable-diffusion-v1-5", torch_dtype=torch.float16).to('cuda')

mask = load_image("https://huggingface.co/datasets/YiYiXu/testing-images/resolve/main/seashore_mask.png")
blurred_mask = pipeline.mask_processor.blur(mask, blur_factor=33)
blurred_mask
```

*mask with no blur · mask with blur applied*
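The blurred mask is then passed to the pipeline like any other mask image; a minimal sketch continuing the snippet above (the base image URL and prompt are assumptions for illustration):

```py
init_image = load_image("https://huggingface.co/datasets/YiYiXu/testing-images/resolve/main/seashore.png")
prompt = "a beautiful landscape"  # hypothetical prompt
image = pipeline(prompt=prompt, image=init_image, mask_image=blurred_mask).images[0]
```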

## Popular models

Stable Diffusion Inpainting, Stable Diffusion XL (SDXL) Inpainting, and Kandinsky 2.2 Inpainting are among the most popular models for inpainting. SDXL typically produces higher-resolution images than Stable Diffusion v1.5, and Kandinsky 2.2 is also capable of generating high-quality images.

### Stable Diffusion Inpainting

Stable Diffusion Inpainting is a latent diffusion model trained for inpainting on 512x512 images. It is a good starting point because it is relatively fast and generates good quality images. To use this model for inpainting, you'll need to pass a prompt, base image, and mask image to the pipeline:

```py
import torch
from diffusers import AutoPipelineForInpainting
from diffusers.utils import load_image, make_image_grid

pipeline = AutoPipelineForInpainting.from_pretrained(
    "runwayml/stable-diffusion-inpainting", torch_dtype=torch.float16, variant="fp16"
)
pipeline.enable_model_cpu_offload()
# remove following line if xFormers is not installed or you have PyTorch 2.0 or higher installed
pipeline.enable_xformers_memory_efficient_attention()

# load base and mask image
init_image = load_image("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/inpaint.png")
mask_image = load_image("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/inpaint_mask.png")

generator = torch.Generator("cuda").manual_seed(92)
prompt = "concept art digital painting of an elven castle, inspired by lord of the rings, highly detailed, 8k"
image = pipeline(prompt=prompt, image=init_image, mask_image=mask_image, generator=generator).images[0]
make_image_grid([init_image, mask_image, image], rows=1, cols=3)
```

### Stable Diffusion XL (SDXL) Inpainting

SDXL is a larger and more powerful version of Stable Diffusion v1.5. This model can follow a two-stage process (though each model can also be used alone); the base model generates an image, and a refiner model takes that image and further enhances its details and quality. Take a look at the SDXL guide for a more comprehensive guide on how to use SDXL and configure its parameters.

```py
import torch
from diffusers import AutoPipelineForInpainting
from diffusers.utils import load_image, make_image_grid

pipeline = AutoPipelineForInpainting.from_pretrained(
    "diffusers/stable-diffusion-xl-1.0-inpainting-0.1", torch_dtype=torch.float16, variant="fp16"
)
pipeline.enable_model_cpu_offload()
# remove following line if xFormers is not installed or you have PyTorch 2.0 or higher installed
pipeline.enable_xformers_memory_efficient_attention()

# load base and mask image
init_image = load_image("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/inpaint.png")
mask_image = load_image("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/inpaint_mask.png")

generator = torch.Generator("cuda").manual_seed(92)
prompt = "concept art digital painting of an elven castle, inspired by lord of the rings, highly detailed, 8k"
image = pipeline(prompt=prompt, image=init_image, mask_image=mask_image, generator=generator).images[0]
make_image_grid([init_image, mask_image, image], rows=1, cols=3)
```

### Kandinsky 2.2 Inpainting

The Kandinsky model family is similar to SDXL because it also uses two models; the image prior model creates image embeddings, and the diffusion model generates images from them. You can load the image prior and diffusion model separately, but the easiest way to use Kandinsky 2.2 is to load it into the [AutoPipelineForInpainting] class, which uses the [KandinskyV22InpaintCombinedPipeline] under the hood:

```py
import torch
from diffusers import AutoPipelineForInpainting
from diffusers.utils import load_image, make_image_grid

pipeline = AutoPipelineForInpainting.from_pretrained(
    "kandinsky-community/kandinsky-2-2-decoder-inpaint", torch_dtype=torch.float16
)
pipeline.enable_model_cpu_offload()
# remove following line if xFormers is not installed or you have PyTorch 2.0 or higher installed
pipeline.enable_xformers_memory_efficient_attention()

# load base and mask image
init_image = load_image("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/inpaint.png")
mask_image = load_image("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/inpaint_mask.png")

generator = torch.Generator("cuda").manual_seed(92)
prompt = "concept art digital painting of an elven castle, inspired by lord of the rings, highly detailed, 8k"
image = pipeline(prompt=prompt, image=init_image, mask_image=mask_image, generator=generator).images[0]
make_image_grid([init_image, mask_image, image], rows=1, cols=3)
```

*base image · Stable Diffusion Inpainting · Stable Diffusion XL Inpainting · Kandinsky 2.2 Inpainting*

## Non-inpaint specific checkpoints

So far, this guide has used inpaint specific checkpoints such as runwayml/stable-diffusion-inpainting. But you can also use regular checkpoints like stable-diffusion-v1-5/stable-diffusion-v1-5, as in the sketch below. Let's compare the results of the two checkpoints.
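For reference, a regular checkpoint is loaded for inpainting through the same [AutoPipelineForInpainting] class; a minimal sketch:

```py
import torch
from diffusers import AutoPipelineForInpainting

# a regular (non-inpaint) checkpoint can also be loaded by the inpainting pipeline
pipeline = AutoPipelineForInpainting.from_pretrained(
    "stable-diffusion-v1-5/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")
```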

The image on the left is generated from the regular checkpoint, and the image on the right is from the inpaint checkpoint. You'll immediately notice the image on the left is not as clean, and you can still see the outline of the area the model is supposed to inpaint. The image on the right is much cleaner, and the inpainted area appears more natural.

*stable-diffusion-v1-5/stable-diffusion-v1-5 · runwayml/stable-diffusion-inpainting*

However, for more basic tasks like erasing an object from an image (like the rocks in the road, for example), a regular checkpoint yields pretty good results. There isn't as noticeable of a difference between the regular and inpaint checkpoint.

*stable-diffusion-v1-5/stable-diffusion-v1-5 · runwayml/stable-diffusion-inpainting*

The trade-off of using a non-inpaint specific checkpoint is that the overall image quality may be lower, but it generally tends to preserve the mask area (which is why you can see the mask outline). Inpaint specific checkpoints are intentionally trained to generate higher quality inpainted images, and that includes creating a more natural transition between the masked and unmasked areas. As a result, these checkpoints are more likely to change your unmasked area.

If preserving the unmasked area is important for your task, you can use the [VaeImageProcessor.apply_overlay] method to force the unmasked area of an image to remain the same, at the expense of some more unnatural transitions between the masked and unmasked areas.

```py
import PIL
import numpy as np
import torch

from diffusers import AutoPipelineForInpainting
from diffusers.utils import load_image, make_image_grid

device = "cuda"
pipeline = AutoPipelineForInpainting.from_pretrained(
    "runwayml/stable-diffusion-inpainting",
    torch_dtype=torch.float16,
)
pipeline = pipeline.to(device)

img_url = "https://raw.githubusercontent.com/CompVis/latent-diffusion/main/data/inpainting_examples/overture-creations-5sI6fQgYIuo.png"
mask_url = "https://raw.githubusercontent.com/CompVis/latent-diffusion/main/data/inpainting_examples/overture-creations-5sI6fQgYIuo_mask.png"

init_image = load_image(img_url).resize((512, 512))
mask_image = load_image(mask_url).resize((512, 512))

prompt = "Face of a yellow cat, high resolution, sitting on a park bench"
repainted_image = pipeline(prompt=prompt, image=init_image, mask_image=mask_image).images[0]
repainted_image.save("repainted_image.png")

unmasked_unchanged_image = pipeline.image_processor.apply_overlay(mask_image, init_image, repainted_image)
unmasked_unchanged_image.save("force_unmasked_unchanged.png")
make_image_grid([init_image, mask_image, repainted_image, unmasked_unchanged_image], rows=2, cols=2)
```

## Configure pipeline parameters

Image features, like quality and "creativity", depend on pipeline parameters. Knowing what these parameters do is important for getting the results you want. Let's take a look at the most important parameters and see how changing them affects the output.

### Strength

`strength` is a measure of how much noise is added to the base image, which influences how similar the output is to the base image.

- 📈 a high `strength` value means more noise is added to an image and the denoising process takes longer, but you'll get higher quality images that are more different from the base image
- 📉 a low `strength` value means less noise is added to an image and the denoising process is faster, but the image quality may not be as great and the generated image resembles the base image more

```py
import torch
from diffusers import AutoPipelineForInpainting
from diffusers.utils import load_image, make_image_grid

pipeline = AutoPipelineForInpainting.from_pretrained(
    "runwayml/stable-diffusion-inpainting", torch_dtype=torch.float16, variant="fp16"
)
pipeline.enable_model_cpu_offload()
# remove following line if xFormers is not installed or you have PyTorch 2.0 or higher installed
pipeline.enable_xformers_memory_efficient_attention()

# load base and mask image
init_image = load_image("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/inpaint.png")
mask_image = load_image("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/inpaint_mask.png")

prompt = "concept art digital painting of an elven castle, inspired by lord of the rings, highly detailed, 8k"
image = pipeline(prompt=prompt, image=init_image, mask_image=mask_image, strength=0.6).images[0]
make_image_grid([init_image, mask_image, image], rows=1, cols=3)
```

*strength = 0.6 · strength = 0.8 · strength = 1.0*
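To produce a comparison like the one above, the same call can be repeated over several `strength` values; a minimal sketch reusing the pipeline, prompt, and images from the block above:

```py
# generate one image per strength value and compare them side by side
images = [
    pipeline(prompt=prompt, image=init_image, mask_image=mask_image, strength=s).images[0]
    for s in [0.6, 0.8, 1.0]
]
make_image_grid(images, rows=1, cols=3)
```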

### Guidance scale

`guidance_scale` affects how aligned the text prompt and generated image are.

- 📈 a high `guidance_scale` value means the prompt and generated image are closely aligned, so the output is a stricter interpretation of the prompt
- 📉 a low `guidance_scale` value means the prompt and generated image are more loosely aligned, so the output may vary more from the prompt

You can use `strength` and `guidance_scale` together for more control over how expressive the model is. For example, a combination of high `strength` and `guidance_scale` values gives the model the most creative freedom; a sketch sweeping both parameters follows the example below.

```py
import torch
from diffusers import AutoPipelineForInpainting
from diffusers.utils import load_image, make_image_grid

pipeline = AutoPipelineForInpainting.from_pretrained(
    "runwayml/stable-diffusion-inpainting", torch_dtype=torch.float16, variant="fp16"
)
pipeline.enable_model_cpu_offload()
# remove following line if xFormers is not installed or you have PyTorch 2.0 or higher installed
pipeline.enable_xformers_memory_efficient_attention()

# load base and mask image
init_image = load_image("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/inpaint.png")
mask_image = load_image("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/inpaint_mask.png")

prompt = "concept art digital painting of an elven castle, inspired by lord of the rings, highly detailed, 8k"
image = pipeline(prompt=prompt, image=init_image, mask_image=mask_image, guidance_scale=2.5).images[0]
make_image_grid([init_image, mask_image, image], rows=1, cols=3)
```

*guidance_scale = 2.5 · guidance_scale = 7.5 · guidance_scale = 12.5*
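As noted above, `strength` and `guidance_scale` can also be varied together; a minimal sketch sweeping both, reusing the pipeline, prompt, and images from the block above:

```py
# generate one image per (strength, guidance_scale) combination
images = []
for strength in [0.6, 1.0]:
    for guidance_scale in [2.5, 12.5]:
        images.append(
            pipeline(
                prompt=prompt,
                image=init_image,
                mask_image=mask_image,
                strength=strength,
                guidance_scale=guidance_scale,
            ).images[0]
        )
make_image_grid(images, rows=2, cols=2)
```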

### Negative prompt

A negative prompt assumes the opposite role of a prompt; it guides the model away from generating certain things in an image. This is useful for quickly improving image quality and preventing the model from generating things you don't want.

```py
import torch
from diffusers import AutoPipelineForInpainting
from diffusers.utils import load_image, make_image_grid

pipeline = AutoPipelineForInpainting.from_pretrained(
    "runwayml/stable-diffusion-inpainting", torch_dtype=torch.float16, variant="fp16"
)
pipeline.enable_model_cpu_offload()
# remove following line if xFormers is not installed or you have PyTorch 2.0 or higher installed
pipeline.enable_xformers_memory_efficient_attention()

# load base and mask image
init_image = load_image("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/inpaint.png")
mask_image = load_image("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/inpaint_mask.png")

prompt = "concept art digital painting of an elven castle, inspired by lord of the rings, highly detailed, 8k"
negative_prompt = "bad architecture, unstable, poor details, blurry"
image = pipeline(prompt=prompt, negative_prompt=negative_prompt, image=init_image, mask_image=mask_image).images[0]
make_image_grid([init_image, mask_image, image], rows=1, cols=3)
```

*negative_prompt = "bad architecture, unstable, poor details, blurry"*

### Padding mask crop

A method for increasing inpainting quality is to use the `padding_mask_crop` parameter. When enabled, this option crops the masked area with some user-specified padding, and it also crops the same area from the original image. Both the image and mask are upscaled to a higher resolution for inpainting, and then overlaid on the original image. This is a quick and easy way to improve image quality without using a separate pipeline like [StableDiffusionUpscalePipeline].

Add the `padding_mask_crop` parameter to the pipeline call and set it to the desired padding value:

```py
import torch
from diffusers import AutoPipelineForInpainting
from diffusers.utils import load_image
from PIL import Image

generator = torch.Generator(device='cuda').manual_seed(0)
pipeline = AutoPipelineForInpainting.from_pretrained("stable-diffusion-v1-5/stable-diffusion-v1-5", torch_dtype=torch.float16).to('cuda')

base = load_image("https://huggingface.co/datasets/YiYiXu/testing-images/resolve/main/seashore.png")
mask = load_image("https://huggingface.co/datasets/YiYiXu/testing-images/resolve/main/seashore_mask.png")

image = pipeline("boat", image=base, mask_image=mask, strength=0.75, generator=generator, padding_mask_crop=32).images[0]
image
```

*default inpaint image · inpaint image with `padding_mask_crop` enabled*
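To see the effect side by side, the same call can be repeated without `padding_mask_crop` and the two results compared; a minimal sketch continuing the snippet above:

```py
from diffusers.utils import make_image_grid

# re-seed so both runs start from the same noise
generator = torch.Generator(device='cuda').manual_seed(0)
image_default = pipeline("boat", image=base, mask_image=mask, strength=0.75, generator=generator).images[0]
make_image_grid([image_default, image], rows=1, cols=2)
```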

## Chained inpainting pipelines

[AutoPipelineForInpainting] can be chained with other 🤗 Diffusers pipelines to edit their outputs. This is often useful for improving the output quality from your other diffusion pipelines, and if you're using multiple pipelines, it can be more memory-efficient to chain them together to keep the outputs in latent space and reuse the same pipeline components.

### Text-to-image-to-inpaint

Chaining a text-to-image and inpainting pipeline allows you to inpaint a generated image without having to provide a base image to begin with. This makes it convenient to edit your favorite text-to-image outputs without generating an entirely new image.

Start with the text-to-image pipeline to create a castle:

```py
import torch
from diffusers import AutoPipelineForText2Image, AutoPipelineForInpainting
from diffusers.utils import load_image, make_image_grid

pipeline = AutoPipelineForText2Image.from_pretrained(
    "stable-diffusion-v1-5/stable-diffusion-v1-5", torch_dtype=torch.float16, variant="fp16", use_safetensors=True
)
pipeline.enable_model_cpu_offload()
# remove following line if xFormers is not installed or you have PyTorch 2.0 or higher installed
pipeline.enable_xformers_memory_efficient_attention()

text2image = pipeline("concept art digital painting of an elven castle, inspired by lord of the rings, highly detailed, 8k").images[0]
```

Load the mask image of the output from above:

```py
mask_image = load_image("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/inpaint_text-chain-mask.png")
```

And let's inpaint the masked area with a waterfall:

```py
pipeline = AutoPipelineForInpainting.from_pretrained(
    "kandinsky-community/kandinsky-2-2-decoder-inpaint", torch_dtype=torch.float16
)
pipeline.enable_model_cpu_offload()
# remove following line if xFormers is not installed or you have PyTorch 2.0 or higher installed
pipeline.enable_xformers_memory_efficient_attention()

prompt = "digital painting of a fantasy waterfall, cloudy"
image = pipeline(prompt=prompt, image=text2image, mask_image=mask_image).images[0]
make_image_grid([text2image, mask_image, image], rows=1, cols=3)
```

*text-to-image · inpaint*

### Inpaint-to-image-to-image

You can also chain an inpainting pipeline before another pipeline, like image-to-image or an upscaler, to improve the quality.

Begin by inpainting an image:

```py
import torch
from diffusers import AutoPipelineForInpainting, AutoPipelineForImage2Image
from diffusers.utils import load_image, make_image_grid

pipeline = AutoPipelineForInpainting.from_pretrained(
    "runwayml/stable-diffusion-inpainting", torch_dtype=torch.float16, variant="fp16"
)
pipeline.enable_model_cpu_offload()
# remove following line if xFormers is not installed or you have PyTorch 2.0 or higher installed
pipeline.enable_xformers_memory_efficient_attention()

# load base and mask image
init_image = load_image("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/inpaint.png")
mask_image = load_image("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/inpaint_mask.png")

prompt = "concept art digital painting of an elven castle, inspired by lord of the rings, highly detailed, 8k"
image_inpainting = pipeline(prompt=prompt, image=init_image, mask_image=mask_image).images[0]

# resize image to 1024x1024 for SDXL
image_inpainting = image_inpainting.resize((1024, 1024))
```

Now let's pass the image to another inpainting pipeline with SDXL's refiner model to enhance the image details and quality:

```py
pipeline = AutoPipelineForInpainting.from_pretrained(
    "stabilityai/stable-diffusion-xl-refiner-1.0", torch_dtype=torch.float16, variant="fp16"
)
pipeline.enable_model_cpu_offload()
# remove following line if xFormers is not installed or you have PyTorch 2.0 or higher installed
pipeline.enable_xformers_memory_efficient_attention()

image = pipeline(prompt=prompt, image=image_inpainting, mask_image=mask_image, output_type="latent").images[0]
```

Specifying `output_type="latent"` keeps the output in latent space, so the next pipeline can consume it without an unnecessary decode and encode step.

Finally, you can pass this image to an image-to-image pipeline to put the finishing touches on it. It is more efficient to use the [~AutoPipelineForImage2Image.from_pipe] method to reuse the existing pipeline components and avoid unnecessarily loading them all into memory again:

```py
pipeline = AutoPipelineForImage2Image.from_pipe(pipeline)
# remove following line if xFormers is not installed or you have PyTorch 2.0 or higher installed
pipeline.enable_xformers_memory_efficient_attention()

image = pipeline(prompt=prompt, image=image).images[0]
make_image_grid([init_image, mask_image, image_inpainting, image], rows=2, cols=2)
```

*initial image · inpaint · image-to-image*

Image-to-image and inpainting are actually very similar tasks. Image-to-image generates a new image that resembles the existing provided image. Inpainting does the same thing, but it only transforms the image area defined by the mask while the rest of the image is unchanged. You can think of inpainting as a more precise tool for making specific changes, whereas image-to-image has a broader scope for making more sweeping changes.

## Control image generation

Getting an image to look exactly the way you want is challenging because the denoising process is random. While you can control certain aspects of generation by configuring parameters like `negative_prompt`, there are better and more efficient methods for controlling image generation.

### Prompt weighting

Prompt weighting provides a quantifiable way to scale the representation of concepts in a prompt. You can use it to increase or decrease the magnitude of the text embedding vector for each concept in the prompt, which in turn determines how much of each concept is generated. The Compel library offers an intuitive syntax for scaling the prompt weights and generating the embeddings. See the Prompt weighting guide to learn how to create the embeddings.

Once you've generated the embeddings, pass them to the `prompt_embeds` (and `negative_prompt_embeds`, if you're using a negative prompt) parameter of [AutoPipelineForInpainting]. The embeddings replace the `prompt` parameter:

```py
import torch
from diffusers import AutoPipelineForInpainting
from diffusers.utils import load_image, make_image_grid

pipeline = AutoPipelineForInpainting.from_pretrained(
    "runwayml/stable-diffusion-inpainting", torch_dtype=torch.float16,
)
pipeline.enable_model_cpu_offload()
# remove following line if xFormers is not installed or you have PyTorch 2.0 or higher installed
pipeline.enable_xformers_memory_efficient_attention()

# load base and mask image
init_image = load_image("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/inpaint.png")
mask_image = load_image("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/inpaint_mask.png")

image = pipeline(prompt_embeds=prompt_embeds, # generated from Compel
    negative_prompt_embeds=negative_prompt_embeds, # generated from Compel
    image=init_image,
    mask_image=mask_image
).images[0]
make_image_grid([init_image, mask_image, image], rows=1, cols=3)
```
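For reference, the embeddings used above can be produced with the Compel library; a minimal sketch, assuming the pipeline from this example (the weighted prompt wording and weights are illustrative):

```py
from compel import Compel

compel = Compel(tokenizer=pipeline.tokenizer, text_encoder=pipeline.text_encoder)

# "++" increases the weight of the preceding concept in Compel's syntax
prompt_embeds = compel("concept art digital painting of an (elven castle)++, highly detailed")
negative_prompt_embeds = compel("bad architecture, deformed, blurry")

# pad both tensors to the same length, as required when passing them together
[prompt_embeds, negative_prompt_embeds] = compel.pad_conditioning_tensors_to_same_length([prompt_embeds, negative_prompt_embeds])
```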

### ControlNet

ControlNet models are used with other diffusion models like Stable Diffusion, and they provide an even more flexible and accurate way to control how an image is generated. A ControlNet accepts an additional conditioning image input that guides the diffusion model to preserve its features.

For example, let's condition an image with a ControlNet pretrained on inpaint images:

```py
import torch
import numpy as np
from diffusers import ControlNetModel, StableDiffusionControlNetInpaintPipeline
from diffusers.utils import load_image, make_image_grid

# load ControlNet
controlnet = ControlNetModel.from_pretrained("lllyasviel/control_v11p_sd15_inpaint", torch_dtype=torch.float16, variant="fp16")

# pass ControlNet to the pipeline
pipeline = StableDiffusionControlNetInpaintPipeline.from_pretrained(
    "runwayml/stable-diffusion-inpainting", controlnet=controlnet, torch_dtype=torch.float16, variant="fp16"
)
pipeline.enable_model_cpu_offload()
# remove following line if xFormers is not installed or you have PyTorch 2.0 or higher installed
pipeline.enable_xformers_memory_efficient_attention()

# load base and mask image
init_image = load_image("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/inpaint.png")
mask_image = load_image("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/inpaint_mask.png")

# prepare control image
def make_inpaint_condition(init_image, mask_image):
    init_image = np.array(init_image.convert("RGB")).astype(np.float32) / 255.0
    mask_image = np.array(mask_image.convert("L")).astype(np.float32) / 255.0

    assert init_image.shape[0:1] == mask_image.shape[0:1], "image and image_mask must have the same image size"
    init_image[mask_image > 0.5] = -1.0  # set as masked pixel
    init_image = np.expand_dims(init_image, 0).transpose(0, 3, 1, 2)
    init_image = torch.from_numpy(init_image)
    return init_image

control_image = make_inpaint_condition(init_image, mask_image)
```

Now generate an image from the base, mask, and control images. You'll notice features of the base image are strongly preserved in the generated image.

```py
from PIL import Image

prompt = "concept art digital painting of an elven castle, inspired by lord of the rings, highly detailed, 8k"
image = pipeline(prompt=prompt, image=init_image, mask_image=mask_image, control_image=control_image).images[0]
make_image_grid([init_image, mask_image, Image.fromarray(np.uint8(control_image[0][0])).convert('RGB'), image], rows=2, cols=2)

You can take this a step further and chain it with an image-to-image pipeline to apply a new style:

```py
from diffusers import AutoPipelineForImage2Image

pipeline = AutoPipelineForImage2Image.from_pretrained(
    "nitrosocke/elden-ring-diffusion", torch_dtype=torch.float16,
)
pipeline.enable_model_cpu_offload()
# remove following line if xFormers is not installed or you have PyTorch 2.0 or higher installed
pipeline.enable_xformers_memory_efficient_attention()

prompt = "elden ring style castle" # include the token "elden ring style" in the prompt
negative_prompt = "bad architecture, deformed, disfigured, poor details"

image_elden_ring = pipeline(prompt, negative_prompt=negative_prompt, image=image).images[0]
make_image_grid([init_image, mask_image, image, image_elden_ring], rows=2, cols=2)
```

*initial image · ControlNet inpaint · image-to-image*

## Optimize

It can be difficult and slow to run diffusion models if you're resource constrained, but it doesn't have to be with a few optimization tricks. One of the biggest (and easiest) optimizations you can enable is switching to memory-efficient attention. If you're using PyTorch 2.0, scaled-dot product attention is automatically enabled and you don't need to do anything else. For non-PyTorch 2.0 users, you can install and use xFormers's implementation of memory-efficient attention. Both options reduce memory usage and accelerate inference.

You can also offload the model to the CPU to save even more memory:

```diff
+ pipeline.enable_xformers_memory_efficient_attention()
+ pipeline.enable_model_cpu_offload()
```

To speed up your inference code even more, use torch.compile. You should wrap it around the most intensive component in the pipeline, which is typically the UNet:

```py
pipeline.unet = torch.compile(pipeline.unet, mode="reduce-overhead", fullgraph=True)
```

Learn more in the Reduce memory usage and Torch 2.0 guides.