LLMs contain a LOT of parameters. But what’s a parameter?
When the model is trained, each word in its vocabulary is assigned a numerical value that captures the meaning of that word in relation to all other words, based on how the word appears in countless examples across the model’s training data.
Each word is replaced with some kind of code?
Yes. But there’s a little more to it. The numeric value that represents each word (its embedding) is actually a list of numbers, where each number in the list represents a different aspect of meaning that the model has extracted from its training data. The length of this list is another thing that LLM designers can set before training. A common size is 4,096.
Each word within the LLM is represented by a list of 4,096 numbers?
Yes, that list is the word’s embedding. Each of these numbers is adjusted during training. An LLM whose embeddings are 4,096 numbers long is said to have 4,096 dimensions.
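To make that concrete, here is a minimal Python sketch of an embedding table. The vocabulary, the values, and the lookup code are invented for illustration; in a real LLM the numbers are learned parameters, not random values.

```python
import numpy as np

# Toy embedding table: one 4,096-number vector per word in the vocabulary.
# In a real LLM these values are learned during training; here they are random.
vocab = ["table", "chair", "astronaut", "moon"]
embedding_dim = 4096

rng = np.random.default_rng(0)
embeddings = rng.normal(size=(len(vocab), embedding_dim))

# Looking up a word simply means fetching its row of 4,096 numbers.
word_to_id = {word: i for i, word in enumerate(vocab)}
chair_vector = embeddings[word_to_id["chair"]]
print(chair_vector.shape)  # (4096,)
```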
Why 4,096?
It may seem like an odd number, but LLMs (like anything that runs on a computer chip) work best with powers of two: 2, 4, 8, 16, 32, 64, and so on. LLM engineers have found that 4,096 is a power of two that hits a sweet spot between capability and efficiency: models with fewer dimensions are less capable, while models with more dimensions are more expensive and slower to train and run.
Using more numbers allows an LLM to capture very precise information about how a word is used in many different contexts, what precise connotations it may have, how it relates to other words, and so on.
Last February, OpenAI released GPT-4.5, the company’s largest LLM to date (some estimates put its number of parameters at more than 10 trillion). Nick Ryder, a research scientist at OpenAI who worked on the model, told me at the time that larger models can pick up on additional information, such as emotional cues, for example when a speaker’s words indicate hostility: “All of these subtle patterns that come through a human conversation, those are the parts that these larger and larger models will pick up on.”
The result is that all the words within the LLM are encoded in a high-dimensional space. Imagine thousands of words floating in the air around you. Words that are close together have similar meanings. For example, “table” and “chair” will be closer to each other than either is to “astronaut,” which sits close to “moon” and “Musk.” Far in the distance you can see “prestige.” It’s a bit like that, but instead of being related to each other across three dimensions, the words within the LLM are related across 4,096 dimensions.
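One common way to measure how “close” two word vectors are is cosine similarity. The sketch below uses hand-made three-dimensional vectors purely to show the idea; the specific numbers are invented, and real models work with thousands of dimensions.

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Score between -1 and 1; higher means the vectors point in more similar directions."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Made-up 3-dimensional "embeddings" for illustration only.
table = np.array([0.9, 0.1, 0.0])
chair = np.array([0.8, 0.2, 0.1])
astronaut = np.array([0.0, 0.9, 0.8])

print(cosine_similarity(table, chair))      # close to 1: similar meanings
print(cosine_similarity(table, astronaut))  # much lower: unrelated meanings
```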
OK.
It’s amazing stuff. In effect, the LLM compresses the entire Internet into one massive mathematical structure that encodes an unfathomable amount of interconnected information. This is why LLMs can do amazing things, and why it is impossible to fully understand them.