Many of us have, by now, seen a new crop of images online that look not-quite-right, or not quite plausible (in the sense of slightly wrong images of famous people doing strange things); and many of us know, or have heard about, the explosion in AI imaging through programs like DALL-E, Midjourney, and an ever-increasing number of others. Some of us have friends or online friends who are producing images that have us intrigued.
I have such a friend in Jonathan Hoefler, and as a discussion of the ethics/dangers of AI ensued on one of his Facebook posts (for the purposes of this article, unless noted otherwise, when I refer to “AI” I am referring specifically to the image-generating form of AI, not the text-generating or any other kind or use), I decided I’d best check it out for myself before arguing either for or against.
I was a bit wary of getting into it because I was worried it might “imagine” better than I do, leaving me feeling useless as an artist. I’d also heard it’s addictive, and I was worried about that too. Most of the online concern I’d encountered was centered on copyright, so I wanted to experiment and see how easy it might be to rip off another artist or photographer (which I will do in the 2nd post of this series). I also had some ideas of my own that I wondered if it could “help” me with. And finally, I do love the really fucked-up images I’ve seen, and I wanted to make some surreal, fucked-up images too.
AI is not stealing your images
I want to explain a bit about how these programs (or whatever they are) work. Their source material is billions (trillions?) of images on the internet. Initially it relies on tagging; otherwise it has no idea what the array of pixels is supposed to represent. So let’s say it assembles a few hundred thousand images tagged #horse. These are photos and illustrations and paintings and sculptures from all different angles and sizes. From this it gets a general idea of horseness, which is different from the general idea of dogness or humanness or carrotness. It then uses that information to start collecting untagged images that it now identifies as #horse. If you’ve ever used the face recognition in Adobe Lightroom or any other image-sorting software, you understand how at first you have to tag #Janet several times before it starts finding #Janet (and not-Janet!) for you in other photos.
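That tag-a-few, then-it-finds-the-rest workflow can be caricatured in a few lines of code as a nearest-centroid classifier. To be clear, this is a toy sketch, not how Lightroom or Midjourney actually work (real systems learn feature embeddings from pixels); the tags and numbers here are invented for illustration.

```python
# Toy sketch of "tag a few examples, then the software finds the rest":
# a nearest-centroid classifier. The "features" are hand-made 2D vectors,
# standing in for what a real system would extract from pixels.
from math import dist

# A handful of hand-tagged examples: tag -> list of feature vectors.
tagged = {
    "#horse": [(9.0, 1.0), (8.5, 1.5), (9.5, 0.5)],
    "#carrot": [(1.0, 9.0), (1.5, 8.5), (0.5, 9.5)],
}

# Average each tag's examples into a single "idea" of it (a centroid):
# the general idea of horseness vs. the general idea of carrotness.
centroids = {
    tag: tuple(sum(v) / len(vecs) for v in zip(*vecs))
    for tag, vecs in tagged.items()
}

def auto_tag(vec):
    """Label an untagged image by whichever learned 'idea' it sits closest to."""
    return min(centroids, key=lambda tag: dist(vec, centroids[tag]))

print(auto_tag((8.8, 1.2)))  # near the horse examples -> "#horse"
print(auto_tag((1.2, 8.8)))  # near the carrot examples -> "#carrot"
```

The point of the sketch is only this: after a few hand-labeled examples, everything else gets sorted by proximity to a learned average, with no understanding of what a horse or a carrot is.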
BUT, contrary to many people’s belief, when you type “horse” into one of the AI programs it does not pull up one of its millions of photos of horses and serve it to you … it generates “horse” based on its training of what “horse” is. Similarly, it has “learned” about lighting, styles, techniques, mood, etc. based on the #hashtags that people use (yes, you’ve been training them all along), and it can recreate (more or less) those characteristics when you ask it to, again from scratch, based on its “understanding” of that. It can also approximate very famous people who have been tagged thousands of times.
AI is not intelligent
To test it out, I chose Midjourney, because it’s the one Jonathan uses. I had read that AI has trouble with hands, because #hands is not in common use, and I had seen examples showing how the AI seems to like adding fingers. It doesn’t know how many fingers humans have, so it just puts in a bunch.
My very first prompt was “Hands with carrot-fingers, holding a small white rabbit, moody dark, forest background”. It then generates 4 options; you can choose one or more to upscale, whereupon it adds detail and makes it larger. You can also create more variations based on one of the images, or create 4 more variations on the same prompt.
I was a bit puzzled. Where are my carrot-fingers? I spun again: I got 4 versions with no carrots (though the ears were starting to look a little carroty), but more fingers and different positions for the rabbit. Again: more carrots, but none of them fingers. I could generate this many times, and each iteration would be slightly different, but none of them closer to what I wanted. I could add and subtract parameters to make the image more or less realistic, with different styles or lighting etc., but I might never get carrots for fingers.
So, this brings me to my second, and probably most important point. AI is not intelligent. NONE OF IT IS. AI should more honestly be called Massive Data Training, or something like that. It’s a system trained to recognize objects, styles, techniques, and even “concepts” to a very limited degree, but it doesn’t understand those things, or how they relate to each other in the real world. It’s a little bit smarter than a dog. You can easily train a dog to recognize the word “ball” and be able to apply that word to many kinds of “balls.” With effort you could train a dog to recognize the difference between the striped ball and the red ball in your house, but it would be unlikely to recognize the difference between all striped balls and plain balls; furthermore a dog will never understand that “stripes” are something that can appear on a shirt, or a wall, or that there is any relationship whatsoever between a striped shirt and a striped ball. AI is similar to that, but with a much, much larger “understood” data set.
“Striped ball on box in room.”
Here you can clearly see that it knows “ball”, “stripe[d]”, and “room”, as well as “in”, but it has some trouble with “on”. Where to put the stripes, the box, or the ball is beyond it: it’s just applying them everywhere, in different combinations.
I’m friendly with Rodney Brooks, who for 10 years was the director of the MIT Artificial Intelligence Laboratory and then the MIT Computer Science & Artificial Intelligence Laboratory (CSAIL). Not many people know as much about AI as he does, and I remembered him saying that a small child can outperform AI in understanding and intelligence. So I decided to do a little test. Imagine this: “A rabbit wearing red shoes, holding hands with a carrot wearing black shoes.” Got it? I then asked neighbors with children to get them to draw it.
The kids nailed it: they even got the red shoes on the rabbit and the black shoes on the carrot. They also intuited that holding hands is something nice that people do with friends: all of them are happy. Here’s how Midjourney did with the exact same phrase:
It’s an idiot.
AI is getting better at generating things realistically and in different styles; and soon it will put only 5 fingers on each human hand, and stop making the little weirdnesses and glitches—but by Rodney Brooks’ account, and by others I’ve spoken to who know a lot more about this than I do, it will come no nearer to “understanding”.
So what is it good for?
At the moment, AI is super good at making surprising combinations. Jonathan describes “fighting with it” and then resigning himself to giving in to what it comes up with. Whatever he’s doing (and I have some ideas), the results have been fantastic.
For myself, after some experiments for this post, I started to encourage and embrace Midjourney’s ability to blow my mind. Instead of coming up with an idea of my own, I give it enough rope to hopefully hang itself. And it is totally addictive. To me it’s like playing slots: you put some stuff in, pull a lever and hope. Sometimes you’re rewarded and sometimes you’re disappointed, but I find it very, very hard not to make “just one more.”
I am reasonably convinced that these images are indeed unique in all the world, given that each time I run the prompt I get something different; that when I upscale an image it adds more random details (which sometimes I don’t like); and that I can upscale the same image again and it will add different small details. If you used the same prompts I do, you’d likely get similar results, but not identical.
I feel protective of these images in the same way I would if I had found something, and I’m reluctant to reveal the coordinates of where I found it (i.e. my prompts). This is how I would feel if I were a collector of, say, bottlecaps (or anything): I’d be very proud of my ownership of a certain special bottlecap, and reluctant to tell another bottlecap collector where I found it.
I also think this has some similarities to photography—particularly of scenery. Tourists can line up all day and take the same picture from the same location and the photos will be similar, but not identical. Some people with knowledge and skill, or luck to find the right conditions, will take remarkably better photos of the same scene than others will. But that scene will always be there waiting to be “found”, if you know the location.
So I feel the same way about these images as I do about most of my photos. They’re mine, I like or even love them, but I take no particular pride in having made them—because I don’t feel I did make them. I found them: I held up the camera and pressed a button; I fed something into a machine and won a jackpot.
Garbage in, garbage out
Given that most people are idiots with poor taste, stuffed to the gills with Marvel comics and fantasy TV, drunk on porn* and animé, it should come as no surprise that the vast majority of AI-generated material reflects these interests of the general populace. All you need to do is look at the Midjourney showcase, see these Midjourney prompt examples, or just Google “Midjourney images,” to see what I mean.
(*Re: “porn”: Midjourney has a large number of banned words to prevent the making of pornographic images. This doesn’t prevent the stereotypical renditions of “sexy” women with big tits etc., but it does prevent the otherwise inevitable tsunami of sex acts.)
Airy castles, princesses, warriors, kings, swords, futuristic cities, roided-up heroes and busty heroines, centaurs, pegasi, fairies, dragonflies … they’re all there in great abundance, piled fantasy-mountain high. This general aesthetic is so prevalent it’s actually difficult to get away from, and certain words are polluted beyond repair. If you want to avoid the fantasy look, you have to avoid some of these words. One of them is “hair”:
Nowhere in my prompt did I include woman, face, or anything relating to humans, but the word “hair” triggered the fantasy bias. Look what happened when I included the word “iron” in my prompt (the entire prompt was “iron edelweiss”):
Then I experimented with just the word “King” for a prompt:
Midjourney also has a propensity for ornament. Given my aesthetic history you might think this wouldn’t bother me, but I like my ornament thought out and controlled. I have often inveighed against the mindless regurgitation of ornamental splorp, and Midjourney will vomit it up without provocation, often in the “upscale” stage of the process, thrown in as “detail.”
I have to assume that these AI programs are also learning from themselves—or rather from the people who use them—in which case this fantasy problem is only going to get worse as the algorithms get polluted with more and more of the same.
Furthermore, as “mistakes” get trained out of them, there’s a good chance that genuine surprises will be rarer. It won’t get smarter, it’ll get dumber and more predictable. That’s just my gut feeling, but who knows, really?
I’m still not sure what, if anything, I’m going to do with these. I have ideas, but as with all of my ideas, I’m not sure which are worth following. Images like the one above I’m tempted to just print and frame, because I really, really like it. Maybe that’s enough.
In my next post about imaging AI I’ll look at the controversies surrounding it in the illustration/design/photography industries, and issues of copyright and ownership.
This essay was originally published on Marian’s blog, Marian Bantjes is Writing Again. You can keep up with her work here, or look through her archives on Substack.