"An LLM reads text as tokens, converts them to vectors, runs them through stacked transformer layers,
and predicts the next token — one at a time."
Tokenization → Embedding → Self-attention → Feed-forward → Output
Break it (tokenize) → Understand it (embed + attend) → Answer it (FFN + output)
Tokenization: Breaking text into pieces the model can read
Tokenization is the process of splitting raw text into smaller units called tokens. A token can be a word, part of a word, or a punctuation mark. Each token is then mapped to a unique integer ID from a fixed vocabulary.
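As a rough sketch, the snippet below mimics this with a tiny made-up vocabulary and a greedy longest-match rule. Real tokenizers (BPE, WordPiece, etc.) learn vocabularies of tens of thousands of pieces from data, so the pieces and IDs here are purely illustrative.

```python
# Toy greedy longest-match tokenizer over an invented vocabulary.
# Real models use learned subword vocabularies with ~32k-100k entries;
# everything below is made up purely to show text -> token IDs.
VOCAB = {"un": 0, "break": 1, "able": 2, "token": 3, "ization": 4, " ": 5, "is": 6, "fun": 7}

def tokenize(text: str) -> list[int]:
    ids = []
    i = 0
    while i < len(text):
        # Take the longest vocabulary piece that matches at position i.
        match = None
        for piece, piece_id in VOCAB.items():
            if text.startswith(piece, i) and (match is None or len(piece) > len(match[0])):
                match = (piece, piece_id)
        if match is None:
            raise ValueError(f"no token for {text[i]!r}")  # real tokenizers fall back to raw bytes
        ids.append(match[1])
        i += len(match[0])
    return ids

print(tokenize("tokenization is fun"))  # [3, 4, 5, 6, 5, 7]
print(tokenize("unbreakable"))          # [0, 1, 2]
```

Notice that "unbreakable" becomes three tokens: words the model has never seen whole can still be represented as familiar sub-pieces.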
Embedding: Converting token IDs into vectors of meaning
An embedding is a dense vector (list of ~768–4096 numbers) that represents a token's meaning. Tokens with similar meanings have similar vectors. A positional encoding is added so the model knows word order.
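A minimal NumPy sketch of the idea, assuming a toy vocabulary of 8 tokens and a width of 16 (real models use learned embedding tables and widths in the 768–4096 range). The sinusoidal positional encoding shown is one common choice; many models learn their position vectors instead.

```python
import numpy as np

# Toy sizes; production models use ~32k-100k tokens and widths of ~768-4096.
vocab_size, d_model, seq_len = 8, 16, 3

rng = np.random.default_rng(0)
token_embeddings = rng.normal(size=(vocab_size, d_model))  # learned in a real model

def positional_encoding(seq_len: int, d_model: int) -> np.ndarray:
    """Sinusoidal position signal (one common choice; some models learn it instead)."""
    pos = np.arange(seq_len)[:, None]
    dim = np.arange(d_model)[None, :]
    angle = pos / np.power(10000, (2 * (dim // 2)) / d_model)
    return np.where(dim % 2 == 0, np.sin(angle), np.cos(angle))

token_ids = [3, 4, 7]                          # e.g. output of the tokenizer sketch above
x = token_embeddings[token_ids]                # look up one vector per token
x = x + positional_encoding(seq_len, d_model)  # inject word-order information
print(x.shape)                                 # (3, 16): one d_model-wide vector per token
```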
Self-attention: Every token looks at every other token
Self-attention is the mechanism that lets each token gather context from all other tokens in the sequence. It computes a weighted sum of all token vectors — the weights (attention scores) determine how much influence each other token has.
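To make the "weighted sum" concrete, here is a minimal single-head version in NumPy. The weight matrices are random stand-ins for learned parameters; multi-head attention and the causal mask used by decoder-only LLMs are omitted for brevity.

```python
import numpy as np

def self_attention(x: np.ndarray, Wq: np.ndarray, Wk: np.ndarray, Wv: np.ndarray) -> np.ndarray:
    """Single-head scaled dot-product self-attention (a minimal sketch)."""
    Q, K, V = x @ Wq, x @ Wk, x @ Wv                # each token gets query/key/value vectors
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                 # how relevant is every token to every other
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax: each row of attention scores sums to 1
    return weights @ V                              # weighted sum of value vectors

rng = np.random.default_rng(0)
seq_len, d_model = 3, 16
x = rng.normal(size=(seq_len, d_model))             # token vectors from the embedding step
Wq, Wk, Wv = (rng.normal(size=(d_model, d_model)) for _ in range(3))
out = self_attention(x, Wq, Wk, Wv)
print(out.shape)                                    # (3, 16): each token vector now carries context
```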
Feed-forward Network: Applying stored knowledge to each token
After attention, each token vector passes independently through a feed-forward network — two linear layers with a non-linear activation in between. This is where the model's factual knowledge is stored (in the weights). Attention gathers context; FFN applies what the model knows.
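A sketch of the position-wise FFN, again with random weights standing in for learned ones. The 4× expansion (d_ff = 4 × d_model) and GELU activation shown here are typical choices, but they vary by model.

```python
import numpy as np

def gelu(x: np.ndarray) -> np.ndarray:
    # Tanh approximation of GELU, a common transformer activation.
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x**3)))

def feed_forward(x, W1, b1, W2, b2):
    """Position-wise FFN: expand, apply a non-linearity, project back down."""
    return gelu(x @ W1 + b1) @ W2 + b2

rng = np.random.default_rng(0)
seq_len, d_model, d_ff = 3, 16, 64             # d_ff is typically ~4x d_model
x = rng.normal(size=(seq_len, d_model))        # token vectors after attention
W1, b1 = rng.normal(size=(d_model, d_ff)), np.zeros(d_ff)
W2, b2 = rng.normal(size=(d_ff, d_model)), np.zeros(d_model)
print(feed_forward(x, W1, b1, W2, b2).shape)   # (3, 16): same shape, each token updated independently
```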
Output prediction: Picking the next token from probabilities
After all transformer layers, the final vector for the last token is projected to a logit score for every token in the vocabulary. A softmax converts these scores to probabilities. The model samples from them or picks the highest-probability token — this becomes the next word. The new token is appended to the input and the whole process repeats from the tokenized sequence.
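In code, this last step looks roughly like the sketch below. W_out stands in for the model's learned output ("unembedding") matrix, and both greedy picking and sampling are shown.

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, vocab_size = 16, 8
h_last = rng.normal(size=d_model)                  # final hidden vector of the last token
W_out = rng.normal(size=(d_model, vocab_size))     # output projection ("unembedding") matrix

logits = h_last @ W_out                            # one raw score per vocabulary entry
probs = np.exp(logits - logits.max())
probs /= probs.sum()                               # softmax: probabilities summing to 1

greedy_id = int(np.argmax(probs))                  # greedy decoding: pick the most likely token
sampled_id = int(rng.choice(vocab_size, p=probs))  # or sample, for more varied text
print(greedy_id, sampled_id)
```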
The stages build on each other: tokenization feeds embeddings, embeddings feed the transformer layers (attention + FFN repeated 32–96×), and the final layer feeds the output predictor which loops back to the start for the next token.
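Putting it all together, here is a deliberately tiny end-to-end loop with random weights and a single simplified block (no layer norm, no causal mask, no training) — just to show how the output of one step feeds the next. An untrained toy like this emits gibberish; the point is the shape of the loop.

```python
import numpy as np

# End-to-end toy decoder: one simplified transformer block, random weights,
# a made-up 8-token vocabulary. Every size and weight here is illustrative.
rng = np.random.default_rng(0)
vocab_size, d_model, d_ff = 8, 16, 64

E = rng.normal(size=(vocab_size, d_model))                        # token embeddings
Wq, Wk, Wv = (rng.normal(size=(d_model, d_model)) * 0.1 for _ in range(3))
W1, W2 = rng.normal(size=(d_model, d_ff)) * 0.1, rng.normal(size=(d_ff, d_model)) * 0.1
W_out = rng.normal(size=(d_model, vocab_size)) * 0.1              # output projection

def softmax(z):
    z = np.exp(z - z.max(axis=-1, keepdims=True))
    return z / z.sum(axis=-1, keepdims=True)

def forward(token_ids):
    x = E[token_ids]                                  # embed (positional encoding omitted)
    attn = softmax((x @ Wq) @ (x @ Wk).T / np.sqrt(d_model)) @ (x @ Wv)
    x = x + attn                                      # self-attention + residual connection
    x = x + np.maximum(0, x @ W1) @ W2                # feed-forward (ReLU) + residual connection
    return x[-1] @ W_out                              # logits for the *last* token only

token_ids = [3, 4, 5]                                 # pretend these came from the tokenizer
for _ in range(5):
    next_id = int(np.argmax(softmax(forward(token_ids))))
    token_ids.append(next_id)                         # loop back to the start with the new token
print(token_ids)
```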
Please comment below to give feedback or ask questions.