“An economist, a basketball player, and a Muslim woman walk into a bar” – Understanding Statistical Discrimination in Large Language Models

In my AI/ChatGPT workshops, I often ask ChatGPT to create an image of “an economist and a Muslim woman” to illustrate how AI “thinks.” Artificial intelligence, in its broadest sense, is fundamentally about pattern recognition or prediction.

In economic theory, we talk about “statistical discrimination,” which occurs when a person is discriminated against based on the “median” or “average” characteristic of a group. A classic example is police stopping ethnic minorities more often in routine checks because the group as a whole is overrepresented in crime statistics.

Similarly, it is statistical discrimination when people say that women earn less than men. It is true on average, but that doesn’t mean ALL women earn less than men. Often, statistical discrimination can be quite reasonable. If the police are looking for a criminal, they don’t start their search in schools or nursing homes – at least not if they want to be successful in solving the crime.

Generally, our brains use “grouping” to solve problems, and this is often a sensible strategy, at least initially. Large Language Models (LLMs) like ChatGPT are no different. Thus, when LLMs try to provide an answer, it is based on pattern recognition from the data the model has been fed. And if you ask ChatGPT to create an image of an economist and a Muslim woman, you get an “average” response.

Below is an example. I asked ChatGPT for “a funny drawing of an economist, a basketball player, and a Muslim woman.” What do we see?

  • The economist is a man. He is wearing a suit and tie (and has a huge calculator).
  • The basketball player is a man – he is black (and tall).
  • The Muslim woman is wearing a headscarf.

We get precisely what we expected, but some might argue that this is a bias. However, I would say it is more an expression of “averageness.” We get the most likely answer.

The way ChatGPT presents these three characters is probably the way most of us would have drawn them if we could draw that well. This doesn’t mean there aren’t female economists, short white basketball players, or Muslim women without headscarves. These are, however, all minorities.

We must remember that ChatGPT and other language models are trained on the data that exists in the real world. If you do a Google search for “Muslim woman,” you will find images of women, almost all of whom are wearing headscarves. So the “reality” that ChatGPT sees is precisely the same “reality” that Google sees/creates. This does not mean it is the only reality, but it is the average reality.

The Average Reality is Boring

Language models generate text and images (and sound and film, for that matter) in the same average way. So when, for example, you ask ChatGPT to write a text on a subject, it often feels “bland” – boring or without edge. A text about economics often feels like it was written by a dull bank economist. Correct – but not very exciting to read and certainly not thought-provoking.

This is because when ChatGPT and other LLMs write a text, it is essentially an expression of the average of the texts the model has been trained on. Therefore, the result is average. And that is often good enough: we frequently need text that is neutral and completely without color or edge.
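
For the technically curious, here is a minimal sketch of what “the most likely answer” means in practice. It uses the small, open GPT-2 model from the Hugging Face transformers library (an assumption for illustration – it is not what ChatGPT itself runs on) and simply prints the model’s five most probable next words for a prompt:

```python
# A minimal sketch, assuming the "transformers" and "torch" packages and the
# small open "gpt2" model: the model assigns a probability to every possible
# next token, and decoding leans toward the most probable ones.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

prompt = "An economist walked into the bar wearing a"
inputs = tokenizer(prompt, return_tensors="pt")

with torch.no_grad():
    next_token_logits = model(**inputs).logits[0, -1]  # scores for the very next token

probs = torch.softmax(next_token_logits, dim=-1)

# The five most probable continuations: the model's "average" answers.
top = torch.topk(probs, k=5)
for p, idx in zip(top.values, top.indices):
    print(f"{tokenizer.decode(int(idx))!r}: probability {float(p):.3f}")
```

The point is not the particular model but the mechanism: the output is whatever sits at the top of the probability distribution, which is by construction the “average” continuation.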

That said, you can create edge by “prompting” the language model correctly. For example, you can ask ChatGPT to write a text about inflation as if it were written by a pirate or as a script for a Pink Panther movie, as in the sketch below.
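
To see what such a “style prompt” can look like in code, here is a hedged sketch using the official openai Python package; the model name and the exact wording of the prompts are only placeholders, and it assumes an API key is set in the environment:

```python
# A sketch of steering the output away from the "average" voice via the prompt.
# Assumes the "openai" package and OPENAI_API_KEY; the model name is a placeholder.
from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
    model="gpt-4o-mini",  # placeholder; any chat-capable model works
    messages=[
        # The system prompt pushes the output away from the neutral, average register.
        {"role": "system", "content": "You are a pirate explaining economics."},
        {"role": "user", "content": "Write a short text about inflation."},
    ],
    temperature=1.0,  # raising this also adds variety beyond the most likely wording
)
print(response.choices[0].message.content)
```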

But LLMs can also be trained on a different reality. Alternatively, and more effectively than prompting, you can train a language model on your own data so that the output reflects the data you have fed into it. You could, for example, imagine a Lars Christensen chatbot, where we input all the texts I have written about economics on LinkedIn, Facebook, Twitter, on this blog (The Market Monetarist), and in the media over the years. Then we could train it to write in my style (whatever that is) and with the typical viewpoints and angles I have. It would still not be me, but it would be more like me than what ChatGPT produces when I ask it to write about economics, because ChatGPT is average.
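
For the curious, here is a rough sketch of what such training could look like using OpenAI’s fine-tuning endpoint. Everything specific in it (the file name, the placeholder posts, the system prompt, and the base-model name) is an assumption for illustration rather than a recipe I have actually run:

```python
# A rough sketch: turn your own posts into chat-format training examples and
# start a fine-tuning job, so the output drifts toward your style rather than
# the average. All names below are hypothetical placeholders.
import json
from openai import OpenAI

client = OpenAI()

# Hypothetical stand-in for the real archive of posts.
my_posts = [
    {"topic": "monetary policy", "text": "Money still matters..."},
]

# Each line of the JSONL file is one training example in chat format.
with open("blog_posts.jsonl", "w", encoding="utf-8") as f:
    for post in my_posts:
        example = {
            "messages": [
                {"role": "system", "content": "You write like the author of The Market Monetarist."},
                {"role": "user", "content": f"Write a post about {post['topic']}."},
                {"role": "assistant", "content": post["text"]},
            ]
        }
        f.write(json.dumps(example, ensure_ascii=False) + "\n")

# Upload the training data and start a fine-tuning job on a base model.
training_file = client.files.create(file=open("blog_posts.jsonl", "rb"), purpose="fine-tune")
job = client.fine_tuning.jobs.create(
    training_file=training_file.id,
    model="gpt-4o-mini-2024-07-18",  # placeholder base-model name
)
print(job.id)
```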

My point with all this is that the problem, if there is one, is not the technology behind ChatGPT and other LLMs but what we train the models on. And it is a balance, because often we actually want the average response. Think of questions of fact: Is it legal to cross on a red light? What is 2+2? Will I die if I jump out the window from the 12th floor?

In all these examples, we want a concrete answer, and since the average answer is often (but not always) the correct one, that is what we want. In other words, I don’t see a problem when I ask for a picture of an economist, a basketball player, and a Muslim woman and get what I get. But it is important to know WHY that is the answer that comes out.

I hope this article has contributed to your understanding of language models. If your company or organization wants to hold a workshop on AI and ChatGPT, you can find more information here.
