The Core Flaws of Modern AI Based on Large Language Models (longpost)
There is so much noise around AI right now, and I would really like to clear up some of the confusion it creates. My original idea was to write a series of articles describing the mechanics of Large Language Models (LLMs), and the Transformer architecture in particular, to give people without a PhD in machine learning a foundation for understanding LLMs. Too many LLMs are packaged as black boxes that hide their deficiencies, and barely anyone takes the time to actually explain where an LLM fails, the common mistakes it makes, why it's absolutely not possible to fix them with current tech, and why for some jobs you absolutely have to have an expert at least doing the final press of the "accept solution" button.
However, I realized the amount of detail I want to cover is just too much, and most people don't really care about it anyway. The good news is that I can limit the description mostly to the properties of the building blocks of LLMs/Transformers, rather than showing how those parts work in detail (the article is still huge, though).
The key points of the article:
- Transformer is a good translator;
- Transformer has no model of the world;
- As a result, Transformer is absolutely terrible at solving truly novel tasks;
- Transformer is good at parroting, i.e. reproducing similar known solutions;
- But even basic math has too many distinct solutions to memorize, so Transformer is unreliable at math;
- Transformer is fundamentally unreliable: small fluctuations in the input can lead to unpredictable results (see the probing sketch after this list);
- Transformer is good at pretending to be an expert without being an expert;
- Chain-of-thought prompting alleviates the problems, but eventually fails the same way.
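
To make the math and input-stability points concrete, here is a minimal sketch of how one might probe a model for these failure modes. It assumes a hypothetical `query_model(prompt) -> str` helper imported from a made-up `my_llm_client` module, standing in for whatever LLM API you actually use; none of these names are real library calls.

```python
import random

# Hypothetical helper: `query_model(prompt) -> str` stands in for whatever
# LLM API you actually use; `my_llm_client` is a made-up module name.
from my_llm_client import query_model


def probe_arithmetic(n_trials: int = 20) -> float:
    """Ask random 4-digit multiplications; return the fraction answered correctly."""
    correct = 0
    for _ in range(n_trials):
        a, b = random.randint(1000, 9999), random.randint(1000, 9999)
        reply = query_model(f"What is {a} * {b}? Reply with only the number.")
        try:
            correct += int(reply.strip().replace(",", "")) == a * b
        except ValueError:
            pass  # a non-numeric reply counts as a miss
    return correct / n_trials


def probe_stability(prompts: list[str]) -> set[str]:
    """Send semantically equivalent phrasings of one question and collect
    the distinct answers; more than one answer means small input
    fluctuations changed the output."""
    return {query_model(p).strip() for p in prompts}
```

On a genuinely reliable system, `probe_arithmetic` would sit near 1.0 and `probe_stability` would return a single answer; the points above claim that current LLMs drift on both, which is what the rest of the article digs into.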