Wednesday, January 17, 2024

Experimenting with AI

Over the course of my career in technology, I gained an appreciation of the use of 'accelerators': utilities, tools, and techniques that help get a task done more quickly than doing the same task by hand. Although I am retired now, I have various personal projects that involve some form of coding, so accelerators are still useful. A new option that has really been a game-changer is ChatGPT. When I describe the logic I need, it can sometimes provide complete, working code. At worst, it offers a foundation I can adapt, significantly cutting down the time it takes to create a new script or automation, especially in languages such as Python that I didn't use professionally.

With the integration Microsoft has provided between ChatGPT 4 and DALL-E 3 under the covers of Copilot (previously Bing Chat), I decided to try my hand at using it as an accelerator for creating images. For a few years now, I've had the idea of a blog title image that highlights the various interests I write about on this blog, primarily astronomy, photography and sailing: a nighttime seascape with a sailboat and the Milky Way arching overhead. As I have never received a response to my post seeking help from an artist, I decided to experiment with Copilot to create the image myself.

One thing I learned along the way is that creating with ChatGPT and DALL-E isn't like painting with a tool like Photoshop... there are limits to the degree of control you have. My original intent was to have the horizon centered in the image with a pretty plain foreground and limited detail at the top so that I could crop the standard DALL-E square image down to a landscape orientation. The initial images often placed the horizon lower or higher than specified, and revising the prompt didn't always affect its position in the final image. Similarly, I had originally specified a single-masted sailboat, yet every image generated had a two- or three-masted ship. In short, it appears you have to be flexible about the details of the generated image.

After numerous passes, I started from scratch with the following prompt, which didn't include a reference to a sailboat.

Create a photo-realistic image of a night-time seascape with a calm sea with a large foreground of water and no land visible, with the stars across the sky and the Milky Way rising high in the sky on the right side of the image, a low line of clouds along the horizon with a few small lightning bolts in the far distance and a full moon just above the horizon on the far left side of the image.

Of the 4 images generated, I chose the one that was closest to what I had in mind and provided a second prompt to add a small sailboat off in the distance, along the horizon. Here is the result.

This image came close to matching my original vision. The unexpected rocks under the water in the foreground were easily cropped out, although I left a few visible in the final version. Similarly, the Moon was up high instead of on the horizon but, again, not something I minded cropping out. Although the sailboat was a bit smaller than I'd like, I think it works well.
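
For anyone who would rather script this kind of crop than do it by hand, here is a minimal Python sketch of the arithmetic. It is not what I actually used. The function name, the 1024 x 1024 image size, the 3:1 banner shape and the horizon row are all assumptions chosen for illustration. It keeps the full width of the square image and picks a band of rows centered on the horizon, sliding the band back inside the image if the horizon sits too close to the top or bottom.

```python
def landscape_crop_box(width, height, aspect_ratio, horizon_y):
    """Return (left, top, right, bottom) for a full-width landscape crop.

    aspect_ratio is width divided by height of the desired crop.
    horizon_y is the pixel row of the horizon, counted from the top.
    All names and numbers here are illustrative assumptions.
    """
    crop_height = round(width / aspect_ratio)
    if crop_height > height:
        raise ValueError("image is too short for this aspect ratio at full width")

    # Center the crop band on the horizon row.
    top = horizon_y - crop_height // 2

    # Clamp so the band never extends past the top or bottom edge.
    top = max(0, min(top, height - crop_height))

    return (0, top, width, top + crop_height)


if __name__ == "__main__":
    # Hypothetical example: square image, 3:1 banner, horizon at row 600.
    print(landscape_crop_box(1024, 1024, 3.0, 600))
```

The tuple is ordered left, top, right, bottom, which is the same order an image library such as Pillow expects for a crop box, so the result could be passed along to whatever tool does the actual cropping.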

The one issue with this version is the reflection on the water in the middle. It's obviously not the Moon as that's at top right but it's not clear what it is a reflection of. The brightness on the horizon looks a bit like a sunset so perhaps it is supposed to be the Sun behind the clouds. I decided that I'm fine with the sky brightness but the reflection had to go.

Selecting the image in Copilot opened it in Image Creator, where choosing Customize opened it in Microsoft Designer, an AI-powered design app. One of its features is 'generative erase'. All I had to do was brush over the reflection on the water, and Designer replaced it with fill that matched the water around it. This is similar to the 'generative fill' feature in Photoshop. As you can see, it worked pretty well.

If you read the post linked above, you'll see that my ultimate goal was to have an image like this animated as a GIF. My research suggests that using existing AI tools for this is feasible, but it's not as straightforward as just giving ChatGPT a prompt. It involves creating and compiling a series of images, then employing tools like Stable Diffusion to convert them into a video, which can subsequently be saved as a GIF. It sounds like that could even involve writing code. Eventually, I may give that a try but, for now, I'm satisfied with just cropping this image down and using it as the blog title image for a while.

Your thoughts? Whether you love AI, hate it or are ambivalent, I'd like to hear about it in the comments.

As always, just click on an image to see it full screen!