The last few months have seen an explosion of AI-generated images, both those that are photorealistic and those that emulate various illustrators. There's a swarm of questions surrounding these images, so I thought I'd tackle a few of those and then talk at slightly more length about why my opinion is what it is. So … the questions:
Q. Is AI-generated imagery ethical?
A. Not in its current form, no.
Q. Isn't AI-generated imagery the same as an artist who learns by studying, copying, and ultimately absorbing the work of others?
A. Not in its current form, no.
Q. Is AI-generated imagery "Art?"
A. Not in its current form, no. Also, who cares, because "what is art?" is among the most pointless, futile, and ultimately worthless questions one can ask. But if we must ask it, then going by not just my definition of "art" but any vaguely societally accepted definition of "art," I repeat … not in its current form, no.
Which raises a final question:
Q. Can AI-generated imagery BE "art?"
A. Yes, absolutely, without question. It will get there, but it will not get there as fast as people think it will get there for one reason, and that reason is a single word, around which I'm going to frame the entire rest of this post. That word is: "intent."
The first thing we need to clear up is that "AI" images are not made by AI. They are made by Machine Learning, which is the very early stages of a technology that may someday yield something like AI. Artificial Intelligence, which of course is what AI stands for, is still quite a ways off. Machine Learning is not an intelligence. It's a set of algorithms that can be given a task and, through a combination of training and model adjustments, can improve at that task over time. It's one of many building blocks that will be needed for true AI, but it's not AI. Machine Learning algorithms do not think. They cannot set their own goals, and what progress they make is limited to fairly straightforward trial and error, the results of which need to be validated by human beings (or validated against goals established by human beings, such as "finish this Mario level without dying"). So, for the rest of this post, we'll be replacing the "AI" acronym with the much more accurate "ML."
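To make that "straightforward trial and error" concrete, here's a toy sketch (not any real image generator's code; the target value, step size, and scoring rule are all invented for illustration). The algorithm has no goals of its own: a human supplies the target and the measure of success, and the model blindly nudges its one parameter, keeping whatever adjustment scores better.

```python
import random

def train(target, steps=1000, seed=0):
    """Hill-climb a single parameter toward a human-established goal.

    The loop never decides *what* to aim for; 'target' and the
    closeness test below are the human's contribution.
    """
    rng = random.Random(seed)
    guess = 0.0
    for _ in range(steps):
        candidate = guess + rng.uniform(-1, 1)  # blind trial
        # "Validation" against the goal the human set:
        if abs(candidate - target) < abs(guess - target):
            guess = candidate  # keep the adjustment that improved
    return guess

print(train(42.0))  # ends up close to 42 after enough trials
```

Real Machine Learning replaces this single number with millions of parameters and the closeness test with a mathematical loss function, but the shape is the same: adjust, measure against an externally supplied goal, repeat. No intent enters the loop at any point.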
If we understand the difference between what we have (ML) and what will someday exist (AI), we can also understand where "intent" comes in pretty rapidly, I think. Machine Learning does not have intent when it makes art. One can argue that the "intent" is supplied by the human who provides the prompts to the ML image generator, but I think this is pretty facile. It's honestly amusing to see people on the internet bickering over prompt ownership, as if coming up with something like "Sexy fair-skinned lady with dragon horns and green eyes in the style of Ross Draws" is a difficult creative exercise. The "intent" here is not to produce a unique work of art through the culmination of one's experience, practice, training, skill, and (yes, for sure, for every human artist who has lived at least since first our ancestors scraped pictures of mammoths into cave walls) one's influences. The "intent" is to try and talk a machine into producing something that one finds visually pleasing - the machine itself has no intent beyond fulfilling the requirements of its programming.
A common argument I've seen in favor of ML-generated imagery as "art" is that human artists' works are really no more than the sum of their influences, the same as a computer that's been trained on hundreds or thousands of images by various artists. Let's leave aside the ethics involved in taking an artist's work without their permission and using it to train a Machine Learning model, because those ethics are spectacularly grim and have very little to do with a human artist sitting in a museum and making a reference drawing of an old masterwork (the closest analog I've seen anyone come up with). Let's instead consider just the actual mechanics of this training, for both human beings and computers. Let's consider the claim that there's no difference between what the human is doing and what the machine is doing.
Human artists are unquestionably influenced by other artists, and arguing otherwise is a fool's game. Many of history's greatest masters spent years, if not decades, directly copying the work of older masters. All of the modern age's crop of incredible illustrators—folks like Tommy Arnold or Karla Ortiz or Pepe Larraz or Sana Takeda or a thousand other amazing artists—owe a great deal to their artistic influences: the people whose work made them think "I want to draw/paint/etc like that." This is not in doubt, and with any of these artists it's a guarantee that to this day, no matter how their personal styles have evolved, you can see the ghosts of those influences within them. So … what's the difference? The difference is intent.
Let me put it this way: when I go to draw a nose, sometimes I draw a nose that looks like a nose that an artist I like drew. Not exactly like it, probably, in part because I don't have as much training or practice or skill as the artists I look up to, but close enough for government work. In this instance, what I'm doing is very similar to the way Machine Learning operates. I'm recalling (or looking directly at) how other artists have solved the problem of representing a human nose, and doing my best to impersonate that. While I am indeed operating with intent here, that intent being "I want to draw the nose in the cool way this other person does," that intent at least resembles the operation of the ML algorithm, even if such an algorithm isn't really operating with that same intent.
But that's not the only way I draw a nose.
Sometimes, what I do instead is draw the skull first. Then I layer the actual anatomy of the nose on top of that. The skull provides the nasal bones, and the rest is cartilage – the upper lateral cartilages, the septal cartilage, the lateral crura, etc. Then there's flesh on top of that, which requires erasing a lot of the subsurface lines while still preserving the shapes that they generate in the flesh laid on top of them. I do this with intent: to understand the how of drawing a nose. I do this to understand and solve the problem of representing this form in a way that will read as both anatomically correct and visually appealing to human eyes (within the scope of the drawing's style - manga faces are in no way anatomically correct but they read as correct, and visually appealing to many people, within the style of the art). I do this to establish muscle and visual memory that will allow me to solve this problem more reliably without having to take the intermediate steps.
Sometimes, I don't draw all of the underlying substructure, but neither do I draw a nose that comes from my personal memory bank. Sometimes I experiment with form, line weight, angle, shading … a dozen other variables. Sometimes I do this while looking at reference. Sometimes I do it from my own imagination. It's this exercise, in the long run, that truly establishes an artist's style. It's the reason you can look at any piece of Loish artwork and know, immediately, that it's hers – even while being able to see the influence of things like the Disney movies of the 90s clearly shining through. She is an amalgamation of her influences, but she is also something uniquely her own, because of the many millions of lines she's laid down over the course of years of practicing. Of drawing with intent. Loish's art is not, to simplify things absurdly for the sake of clarity, "Disney + Manga + Alphonse Mucha = Loish" but rather "Disney + Manga + Alphonse Mucha + Loish = Loish."
Machine Learning-generated images are often beautiful. They're also often distinguishable from the artists on whose work they were trained, albeit usually because there is something uncanny or not-quite-right about them compared to work that was made with intent. Nonetheless, ML-generated images can be claimed to have a style that is their own, even if that style's influences are worn on its sleeve. I maintain that there is a significant difference between a style obtained solely by mashing together bits of data and a style obtained by acting with intent, and I believe it's at the core of the latter where "art" is typically found.
The machines will get there. They're going to get a lot of places, some of them very soon, some of them a little later, but they'll get there. It's going to be scary for most people, at some point. Your job, whatever your job is, will be threatened first by Machine Learning and then, later, by true artificial intelligence. My job as a software engineer, my writing, my illustration (amateur though it is) — all of these are already threatened by Machine Learning. I can't control that. What I can control is the work I choose to consume - what I look at, what I support, what I pay for. When the machines can deliver art made with intent, if I am still alive when they get there, I will reevaluate. For now, they cannot. I'm sticking with human beings.