LLMs Aren't Thinking, They're Just Counting Votes
Given its context window, an LLM tries to predict the next word or words: it's playing a sophisticated game of completion. But here's where it gets interesting: if the long-term memory of these LLMs is big enough, they can essentially predict not just individual words, but entire sentences.
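To make that concrete, here is a toy sketch of next-word prediction using a simple bigram count over a made-up corpus. A real LLM learns these statistics with a neural network rather than a lookup table, so this is only an illustration of the "completion game", not how models are actually implemented.

```python
from collections import Counter, defaultdict

# A tiny made-up corpus standing in for training data (illustrative only)
corpus = ("the sun rises in the east . the sun rises in the east . "
          "the sun sets in the west .").split()

# Count which word follows which: a crude bigram model
follows = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    follows[prev][nxt] += 1

def predict_next(word):
    """Return the continuation seen most often after `word` in the corpus."""
    return follows[word].most_common(1)[0][0]

print(predict_next("sun"))  # "rises": seen twice, versus "sets" once
```

The model "completes" text by emitting whatever followed most often in the data it counted, which is the core intuition behind the completion game described above.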
When we ask an LLM a question, something fascinating happens. It doesn't look through its training data the way a human would research an answer; by inference time, that data is long gone. Instead, it leans on the patterns it absorbed during training, specifically the most probable sentence or sentences that fit the question. And "most probable" here has a very specific meaning: it's essentially about frequency, how many times something appears in the training data.
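The frequency-to-probability link can be shown in a few lines. The counts below are invented for illustration; the point is only that normalizing counts turns "most frequent" directly into "most probable".

```python
from collections import Counter

# Made-up counts of candidate completions for "the sun rises in the ..."
counts = Counter({"east": 950, "west": 30, "south": 20})

# Normalize raw frequencies into probabilities
total = sum(counts.values())
probs = {word: n / total for word, n in counts.items()}

print(probs["east"])  # 0.95: "most probable" here literally means "most frequent"
```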
Let's take a simple example: when an LLM tells us that the sun rises in the east, it's not because it truly understands astronomy or the solar system. Rather, it has seen the phrase "sun rises in the east" repeated so many times in its training data that this becomes the highest probability answer. It's less about understanding and more about pattern recognition.
Think of it like this: the LLM democratizes answers through a voting system drawn from the training data. Every instance of an answer in the training data acts like a vote, and the LLM returns the answer with the highest vote count, the most probable answer. This is remarkably similar to how Stack Overflow surfaces answers, where community voting determines what rises to the top. On Stack Overflow, the highest-voted answer is likely the most relevant, but the second answer could be the one that works in your specific case.
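The voting analogy can be sketched directly. The answer strings and vote counts below are entirely made up; the sketch just shows the tally-and-pick-the-winner mechanic, including the Stack Overflow-style runner-up.

```python
from collections import Counter

# Hypothetical pool of answers "seen" in training data for one question
observed_answers = (
    ["the sun rises in the east"] * 950
    + ["the sun rises in the west"] * 30
    + ["the sun rises in the south"] * 20
)

votes = Counter(observed_answers)

# The model's output is, in effect, the highest-voted answer
best, count = votes.most_common(1)[0]
print(best, count)  # the sun rises in the east 950

# ...but, as on Stack Overflow, a runner-up exists one rank down
runner_up, runner_up_votes = votes.most_common(2)[1]
```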
This voting-system approach works surprisingly well within the bounds of common knowledge and frequently discussed topics; it mimics correct answers based on training data. However, it begins to fall apart when a question lies sufficiently outside the bounds of that training data. There's no formal reasoning happening here; LLMs just vote on quantity, tallying up the most common patterns they've seen. They're not thinking. They're counting.