Giving Stable Diffusion Some Depth

Giving Stable Diffusion Some Depth

You’ve likely heard quite a bit of buzz over the last few months about Stable Diffusion. The new version (v2) has come out, and in addition to the standard image-to-image and text-to-image modes, it also has a depth-image-to-image that can be incredibly useful. [Andrew] has a write-up that guides you on using this mode.

The basic idea is that you can take both an image and depth into the model, which allows you to control what gets put where. Stable Diffusion is a bit confusing, but we already have some great resources to wrap your head around it. In terms of input, you can use a depth map from a camera with lidar (many recent phones include this) or have another model (like MiDaS) estimate it from a 2D picture. This becomes powerful when you can preserve a specific composition, such as an iconic scene from a well-known movie. You can keep the characters’ poses on the screen but transform the style of the scene into whatever you wish (as seen above).

We have already covered a technique to generate textures right in blender, but this new depth information has already been implemented to provide better accuracy of the textures.

[Justin Alvey] used it to create architectural photos from dollhouse furniture. Using the MiDaS model, he estimated the depth and threw away the RGB aspects by setting the denoising strength to maximum. The simplified dollhouse furniture was easily recognizable to the model, which helped produce great results.

However, the only downside is that the perspective produces a rather dollhouse feel. Changing the focal length and moving farther away helps. Overall, it’s a clever use of what the new AI model can do. It’s a fast-moving space, so this will likely be out of date in a few months.