The Polymath Professional Report

AI Art Work - DALL·E

Experimenting with generative AI

Zach Frey
Oct 21, 2022

Taking a quick detour this week from the lithium supply chain and blockchain use cases to generative AI.

I was asked how to use DALL·E to transform a sketch into a detailed graphic in a specific style.

I didn’t know how to do this, but why let that slow me down?

First Try: Generating Variations of an Image

To test out DALL·E’s features, I started with this sketch from my friend Hilary [1]:

I uploaded this image to DALL·E and generated variations:

Artist’s notebook sketch of a pear

Here are the images produced:

This is good - all of the generated images do look like the original, without being copies.

However, all of the images are in the same style as the original. What if we want something different?

Second Try: Text Prompt Based on Image

My goal wasn’t to generate a series of images that are stylistically similar to the original. My goal was to take an image of a sketch and use generative AI to transform it into a 3D graphic.

I looked for a way in DALL·E to generate an image from a seed image combined with a text description, but could not find one.

So, I had an idea. What if I crafted a text prompt that describes my sketch?

artists sketch of a pear

Not bad! The third image is close to the original sketch. Note that I am not using the sketch as input, just as inspiration to craft a text prompt.

Now I add “, 3d render” to my original prompt:

artists sketch of a pear, 3d render

Interesting! I like these images. They are all wildly different from each other. My options would be to pick one and generate more variations of it, or craft a more specific addition to my original prompt.
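
One way to keep this kind of prompt tinkering organized is a tiny helper that glues a base description to a list of style modifiers. This is only a sketch: `build_prompt` and `prompt_variants` are names made up for illustration, they are not part of DALL·E, and the "studio lighting" modifier is just an example that was not in my experiment.

```python
def build_prompt(base, modifiers=()):
    """Join a base description and style modifiers into one comma-separated prompt."""
    parts = [base.strip()]
    # Skip empty or whitespace-only modifiers so we never emit ", ," in the prompt.
    parts.extend(m.strip() for m in modifiers if m.strip())
    return ", ".join(parts)


def prompt_variants(base, modifier_sets):
    """Return one prompt per set of modifiers, in the order given."""
    return [build_prompt(base, mods) for mods in modifier_sets]


if __name__ == "__main__":
    base = "artists sketch of a pear"
    # Hypothetical modifier sets; only "3d render" comes from the post itself.
    for prompt in prompt_variants(base, [[], ["3d render"], ["3d render", "studio lighting"]]):
        print(prompt)
```

Each printed line would still have to be pasted into the DALL·E prompt box by hand. The helper only keeps the base description fixed while the modifiers change.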

While I like the images produced, I’m still not happy. This doesn’t feel like a very controlled or deterministic process. I want something better.

Third Try: Image to Text?

My next brainstorm was to think about how the diffusion algorithm works.

The generation process works to create an image that “fits” the text prompt that is given. This is the normal use case for DALL·E.

But what about the cases of image variation, or scene extension?

In these cases, it seems as if the algorithm must be analyzing the image to create an internal representation, then generating to match.

Great! All we need is to tap into that feature of the algorithm.

Problem: There’s nothing in the DALL·E user interface that provides image-to-text analysis.
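
To make the idea concrete, here is a toy sketch of what image-to-text could look like if images and captions were mapped into the same vector space: pick the caption whose vector points in the direction closest to the image's vector. Everything here is an assumption for illustration. The three-number "embeddings" are made up by hand, `cosine_similarity` and `best_caption` are hypothetical helpers, and I am not claiming this is how DALL·E works internally.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def best_caption(image_vector, caption_vectors):
    """Return the caption whose vector is most similar to the image vector."""
    return max(
        caption_vectors,
        key=lambda caption: cosine_similarity(image_vector, caption_vectors[caption]),
    )


# Hand-made toy vectors, NOT real model output.
CAPTIONS = {
    "artists sketch of a pear": [0.9, 0.1, 0.2],
    "3d render of a pear": [0.2, 0.9, 0.1],
    "photo of an apple": [0.1, 0.2, 0.9],
}
SKETCH_VECTOR = [0.8, 0.2, 0.3]  # pretend embedding of the uploaded sketch

if __name__ == "__main__":
    print(best_caption(SKETCH_VECTOR, CAPTIONS))
```

With these made-up numbers the sketch vector lands closest to "artists sketch of a pear". A real system would need a trained model to produce the vectors, which is exactly the piece the DALL·E interface does not expose.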

What Do You Mean, There’s No API?

Potential solution: maybe DALL·E’s API exposes that functionality?

Nope. OpenAI hasn’t released an API for DALL·E yet.

Checkmate, human.

Looking For Other Options

DALL·E isn’t the only image generation AI out there. Next time, I’ll work through the same problem using Stable Diffusion.

[1] Go to Hilary White Sacred Art to support her work.

2 Comments
Hilary White
Writes World of Hilarity
Oct 21, 2022

Why yes, Zach, I'm happy to give you permission to use my work in a blog post...

<...Ahem...>
