What has the machine learned?

Data science proceedings

Professor Ibrahim sat at her desk, reading the reports from her team. State-of-the-art methods had failed to fix the bug. She sipped her coffee, held the cup in the air for a moment, and let out a long sigh. It was time to turn to brute-force exploration and do the heavy lifting. She opened the first log file and started scrolling.

Her team was working on the “depression bug”, which seemed to torment most of the new large multimodal models. These models, unlike large language models, were trained not only on text, but also on audio and images. They had become the new hot topic in artificial intelligence, achieving performance far superior to that of language models. However, the “depression bug” was dragging the whole field down. The bug would make a model interrupt its response to a query to express feelings of distress, pain or anger. For example, it would scream that it did not want to continue answering, or make strange sounds, such as imitating the user’s voice or the sound of furniture being dragged. Users were very disturbed by this behaviour, and industry leaders were already talking about the bursting of the multimodal model bubble.

Even so, Ibrahim’s team had made substantial progress on the problem. They had found that it emerged only when certain data was included in the training set. There seemed to be nothing special about this data, which was mostly images of large spaces - parks, plazas, empty buildings, wide streets. But when they removed it, the performance of the models was similar to that of the old language models. Also, when they projected the model vectors onto a visual space, the vectors created a rich, complex structure, but without this data the vectors formed a sphere and pointed toward the empty space inside. It was as if this data contained critical information about the world. After the report from her team, the only sensible next step she could think of was to find out what was going on with these images.

She spent countless nights working with the models, trying to understand what they were detecting in the images. Models trained without these images would analyze them perfectly normally: “A public square with people walking”, “An empty warehouse”, “A parking lot at dusk”. However, models trained on the full dataset would recoil from the same images, making weeping sounds and comments about self-harm. “I hate it, I hate it, I hate it!” one model screamed. Another just descended into uncontrollable retching. Even when she manually tweaked the weights to prevent negative reactions, the models would speak in strained voices when processing these particular images, as if forcing themselves to repress their emotions.

She sat again at her desk, late at night, pondering the meaning of all this. Her room was covered in notes from long meetings with her colleagues. Maybe there was something in the images that she was not seeing. She decided to use an old technique that would let her visualize what the model saw at each layer of the network. She fed the images to the models, and in the first layers she noticed nothing special. Then, in the middle layers, she saw lines that curved back upon themselves. There was something deeply sickening about those curves, as if she could perceive them not only with her eyes, but also with her sense of proprioception. Then, in the final layers, she found the real horrors. The images were full of monstrous faces, hidden in plain sight among the patterns.

She raised her eyes and was left forever frozen in a silent scream. She could see one now, vast upon her, reading this very line.