Why does artificial intelligence (AI) draw strange hands with six or more fingers? This topic, like everything related to neural networks, has become highly relevant and raises a lot of questions, so it's time to settle it once and for all. Practically everyone has tried "playing" with Midjourney or DALL-E by now. The images these tools create in a matter of minutes quickly find their audience, and entire communities of more than 100,000 people have formed to share their AI-generated works.
Things have gone so far that artists on ArtStation, the largest portal for artists, staged a strike against AI, demanding that images not created by humans be labeled. For some, this may bring to mind the game Detroit: Become Human, in which humanity pushed back against intelligent androids that outperformed people at everything and had replaced them in many areas of life, from drivers to professional athletes. That is why the debate around neural networks shows no sign of subsiding, and some people have started to seriously wonder whether AI could replace people in various professions in the real world, not just in a game.
But let's return to the topic of hands. Why can't AI draw fingers correctly, and what affects this? Is it because even humans find hands difficult to draw? Or is the problem the limited data that artificial intelligence relies on? Or are we simply demanding too much from AI? In fact, all of these play a role, and even human psychology affects the result. That is why the gg editorial team investigated why Midjourney has trouble generating human limbs.
To begin with, what are Midjourney and similar tools?
Midjourney is an independent research lab that develops an artificial intelligence program of the same name, which creates images from text descriptions. Images are generated through a chatbot on Discord. The tool is currently in open beta, which started on July 12, 2022. Popular alternatives to Midjourney are DALL-E and Stable Diffusion. They work on a very similar principle. The main differences are their visual style and how advanced the underlying AI is.
For those who want to know more: how exactly is an image created?
A short text request alone is not enough. If you simply ask Midjourney to depict a pig in a jacuzzi, the result will be so-so. So what's the secret behind all those incredible images we've seen? Prompts: detailed text descriptions that specify exactly what kind of image we want the AI to produce. With the right prompt, you can get a realistic pig.
The result "before" and "after" a detailed prompt (Screenshot: itpedia)
So what about the fingers?
Now let's talk about a barrier AI has not yet overcome: correctly depicting fingers and toes. This is far from a hypothesis. The problem is widespread and has already become a topic of both discussion and ridicule.
But why does this happen? An answer already exists, and, curiously, it comes courtesy of another AI.
Answer from the developer of one of the AIs
There is an artificial intelligence called The Jasper Whisperer. It specializes in writing text and also creates generative images (and no, "generative" has nothing to do with "degenerate," although the words sound similar). The Jasper Whisperer also has a blog on Medium that explains why limbs are so hard to reproduce. Several factors contribute to the problem, and each is worth examining separately.
The hand is a complex part of the body
The anatomy of the hand is complex in itself. For a start, fingers come in different shapes and sizes, and they must always be depicted correctly, or the hand will look unnatural. Even a hand in a "relaxed" position carries a lot of detail that needs to be drawn: wrinkles and folds on the knuckles, shading on the palm.
Because of this complex geometry, there is no standard set of lines or shapes that AI can recognize as a hand. The AI has to reconcile many different shapes each time, and the result often ends up with six or more fingers.
In addition, the human hand has about 30 measurable geometric features that differ from person to person, from the length and width of the fingers to the metacarpal bones and carpal joints. Our hands are so complex and individual that their geometry can serve as a biometric identifier, arguably an even better one than our face.
Hands are hard to draw, even for humans
That is why hands are one of the hardest things to draw. Their geometry makes them difficult to illustrate. This is also why cartoon characters are often drawn with three fingers and a thumb: it's much easier, and we see it so often that we don't even notice. So the problem with hands is not new and is not specific to artificial intelligence. It is an art problem. The Jasper Whisperer even finds it a bit funny that AI has the same artistic difficulties as humans.
For those who want to know more: what does ChatGPT think about it?
Another popular AI today is ChatGPT, a chatbot launched by OpenAI in November 2022. It has become known for its detailed answers in many fields of knowledge, including IT, which is why programmers already joke that it will one day replace them.
And if you ask ChatGPT why people find it difficult to draw hands and fingers, it will answer the following: "Drawing hands is often considered one of the most difficult elements for an artist because of the level of detail and complexity. Hands are made up of many small bones, muscles, and tendons. And all of them must be accurately depicted for the drawing to be realistic. In addition, the position of the hands can be difficult to convey because they are constantly in motion and can be positioned in an infinite number of ways. Because of these difficulties, many artists find hand drawing to be one of the most frustrating and challenging aspects of their craft."
ChatGPT also agrees with The Jasper Whisperer's statement.
We demand too much from artificial intelligence
The Jasper Whisperer believes that we are asking AI to do the impossible. Humans have been drawing for as long as there have been humans, yet it can still take an artist a week to draw a realistic hand. Artificial intelligence technology, by contrast, is still developing: Midjourney is less than a year old. And although it already produces stunning images, it has certain limitations.
The situation becomes even more complicated when an image contains several hands, for example, two people holding hands or a group of friends hugging. Each hand must be drawn accurately, otherwise the entire image falls apart. AI hands most often look strange when an image is "overloaded" with them.
What do ordinary users think about this?
I found a fairly detailed explanation from one Reddit user. The point is that AI does not reason logically when it "creates" art. It doesn't know that humans have a skeleton with a certain number of bones, as well as organs, muscles, and everything else. It doesn't know what should be where, or how each part should look as the body moves. All the AI can do is reproduce what it is told. A tool like Midjourney answers the question "WHAT is it?" with its image, not "WHY is it?". That is why it sometimes generates clothing straps that merge into human skin, and other oddities like that. In fact, the bot will never be able to understand the things it "draws" the way you understand them. It doesn't build its art the way a real artist does. Humans understand what they draw on a deeper level and take into account many things that are not visible in the drawing itself.
Some crazy theories
For example, the authors of the website theamericangenius put forward the theory that AI finds ways to calm our fears and assure us that it is not going to take over the world. In this way, it seems to be trying to say: "I'm not a threat, I can't draw simple arms or legs." Joking aside, every theory has a right to exist.
AI learns mainly from images available on the Internet, and this fact gave rise to another theory on Reddit. Artificial intelligence can easily create symmetrical faces because there are millions of photos and drawings of them. There are far fewer images of hands, and hands and their poses are more complex to begin with. The theory is supported by the work of aspiring artists and lessons for beginners, where hands are often hidden in pockets or simply left out of the frame.
A theory about human psychology
The last theory concerns our psychology: we are inclined to look for mistakes in people's hands, not in their faces. To understand what we mean, look at this upside-down image of Adele's face:
At first glance, there's nothing wrong with it. But turn the image the right way up, and the result looks very different:
Why don't we notice this at first? This illusion is known as the "Thatcher effect," named after former British Prime Minister Margaret Thatcher, whose photo was the first to be used for this trick.
The effect highlights a quirk of how our brain works: we cannot properly process an upside-down face. A study by The Naked Scientists suggests that people recognize faces by their parts: eyes, mouth, and nose. That is why, when we are shown an upside-down image of Thatcher, it is not processed properly.
And as Business Insider wrote, we encounter upside-down faces so rarely that we cannot read their expressions. The individual facial features look normal, so our brain assumes the rest of the face looks normal too. That's why we don't notice anything unusual until the face is turned the right way up.
With hands, the situation is quite different. The Jasper Whisperer notes that there is something about hands that we are very sensitive to and know instinctively. So if AI makes a mistake with hands, we notice it immediately. A person may not notice that a shoulder is depicted incorrectly. But if the proportions of the thumb, index, middle, ring, and little fingers are even slightly off, it is immediately noticeable.
So there are two sides to the coin. On the one hand, artificial intelligence does not have a large enough database of photos of human hands and does not fully understand what "anatomically correct hands" are, so it still needs a lot of time to process this particular data. On the other hand, there is the psychological factor: for some reason, people immediately notice imperfections in hands. Still, there are ways to get AI to generate better fingers.
How to make AI draw hands better?
The Jasper Whisperer comes to the rescue again. This AI has a whole guide on how to improve hand generation on its blog.
Give hands something to do
AI handles hands better when they are doing something, for example, holding a cup. This comes down to the training data: you narrow the range of examples to those that show fingers in specific positions. Of course, the result is not always successful. Here are two generated images: the first is from DALL-E, the second from Midjourney. The image of the girl holding a glass is more or less successful, but the one with the fish went wrong somewhere (and not just with the hands).
In the second photo, something went wrong (Illustrations: medium, midjourney)
Use inpainting
Inpainting lets you erase part of a generated image so that the AI fills it in with something else. This is a good way to redraw hands, and DALL-E 2 does it best. For comparison, here are the images before and after inpainting:
Improve it yourself
This method is not for everyone. But if you or a friend know Adobe Photoshop or another graphics editor, you can rework AI-generated hands yourself.
Crop the photo
Sometimes, the easiest and best option is to simply crop the photo a bit so that some of the hands are not in the frame. This is exactly what one of the users on the Midjourney Discord server did.
Provide photos for comparison
Midjourney has a feature called image-to-image: you first give the neural network a photo, and then describe in text what should be done with it. This makes the task much easier for an AI that already has a hard time creating hands.
Give more hints
It is already clear that simply writing "hand" will not give the right result, so we need to give the AI more hints. Describe the pose and action in detail, and mention small details such as nails or wrinkles on the knuckles. Also describe the shape of the hand using terms such as "bent" or "open".
Here again, it's worth reminding you that asking for "5 fingers" does not solve the problem on its own. That is exactly what happened to me. I wrote the prompt The Jasper Whisperer recommends: "hand with 5 fingers, fingernails, wrinkles around the knuckles, open, --ar 2:3 --q 2 --v 4". The parameters at the end set a 2:3 aspect ratio, quality level 2, and model version 4. I did manage to get a hand with 5 fingers, but only in 2 of the 4 images, and each of them resembles concept art for a horror game. Still, this at least gives us a chance of getting a more or less good result after generation.
How else can we make AI draw a hand?
In fact, if you don't want to write lots of hints but do want a hand that doesn't look like it came from a creepy game or movie, you only need one word: "mittens". That single word was enough for me to get this result. So if you don't need a "bare" hand, this is the best option.
But what if the hands need to be doing something in the scene? Then write, for example, "a couple holding hands walking in a park and wearing gloves". If you zoom in on the image, you can find minor flaws, but it's hard to spot anything wrong with the naked eye.
This works because if you simply search for "gloves" in Google Images, you will see that in most photos the gloves are lying flat and all 5 fingers are clearly visible. And AI relies on the photos available online.
If you need a hand without gloves, a manicure can help. For example, enter "wedding ring, and nail polish" in The Jasper Whisperer and voila: 5 fingers, without any defects.
The reason is the same as with the gloves. In about 90% of Google photos of manicures, all 5 fingers are clearly visible, often in the same positions, so it is easier for AI to figure out how to depict them.
To summarize: when can we expect a machine uprising?
So Midjourney and its counterparts are, in fact, able to depict a hand with 5 fingers. Most people's requests simply weren't precise enough, and the complex structure of the hand made things worse, which led to such heated discussions. AI won't always get 5 fingers right, but there are already plenty of workarounds. It's important to remember that some of the neural networks mentioned today are less than a year old. Even experienced artists who have been drawing for years can't always draw a realistic hand quickly. So it's not worth demanding flawless results from neural networks right here and now. Artificial intelligence is learning every day. If you want it to reach a new level, you need to give it more precise prompts with plenty of refinements.
A couple of years ago, when people saw AI's attempts to create something, few took them seriously. Today, however, there is an active discussion about whether machines will replace us. Of course not, just as the need for photographers didn't disappear with the advent of Adobe Photoshop. For professional artists, Midjourney will be another useful tool that speeds up and improves their work. For some, it will be an interesting toy, while others will keep trying to figure out what's wrong with the fingers. And in a few years, we can think again about whether a machine uprising is coming.