terms · voidwest

inference engine

The program that runs a model and produces text. It takes a prompt and returns tokens. ember is an example.

tokenizer

Converts text to tokens (integer IDs) and back. The model doesn't read letters, it reads numbers. The tokenizer is the translator between them.

token

A small text unit. Can be a full word, part of a word, or a single character. The model generates one token after another.

BPE (byte pair encoding)

A tokenization method that starts from single characters (or bytes) and repeatedly merges the most frequent adjacent pair into one new symbol. Builds a vocabulary of subword units. Byte-level variants are used by GPT-2 and GPT-4.
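
A minimal sketch of a single BPE merge step, in Python. The toy corpus and the function names are made up for illustration; real tokenizers repeat this step thousands of times and usually work on bytes.

```python
from collections import Counter

def most_frequent_pair(words):
    """Count adjacent symbol pairs across all words, weighted by word frequency."""
    pairs = Counter()
    for symbols, freq in words.items():
        for a, b in zip(symbols, symbols[1:]):
            pairs[(a, b)] += freq
    return max(pairs, key=pairs.get)

def merge_pair(words, pair):
    """Replace every occurrence of `pair` with one merged symbol."""
    merged = {}
    for symbols, freq in words.items():
        out, i = [], 0
        while i < len(symbols):
            if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
                out.append(symbols[i] + symbols[i + 1])
                i += 2
            else:
                out.append(symbols[i])
                i += 1
        key = tuple(out)
        merged[key] = merged.get(key, 0) + freq
    return merged

# Toy corpus (made up): word as a tuple of symbols -> frequency.
words = {("l", "o", "w"): 5, ("l", "o", "t"): 2, ("n", "e", "w"): 3}
pair = most_frequent_pair(words)   # ("l", "o"), seen 7 times
words = merge_pair(words, pair)    # "l" + "o" becomes the single symbol "lo"
```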

embeddings

Vectors (lists of numbers) that represent a word or token in a multi-dimensional space. Similar words end up with nearby vectors.
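
"Nearby" is often measured with cosine similarity. A small sketch; the three vectors are invented for illustration and do not come from any real model.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Made-up 3-dimensional "embeddings"; real ones have hundreds of dimensions.
cat = [0.9, 0.8, 0.1]
dog = [0.8, 0.9, 0.2]
car = [0.1, 0.2, 0.9]

sim_cat_dog = cosine_similarity(cat, dog)  # close to 1
sim_cat_car = cosine_similarity(cat, car)  # much lower
```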

transformer

The neural network architecture that powers GPT, LLaMA, and most modern LLMs. Built on attention instead of recurrence.

attention

A mechanism that lets each token "attend" to other tokens and gather weighted information based on relevance. This is what links words together.
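
A rough sketch of the scaled dot-product form used in transformers, with one head and no masking. It assumes the query, key, and value vectors are already computed; the function names are illustrative.

```python
import math

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attention(queries, keys, values):
    """For each query: score every key, softmax the scores, average the values."""
    d = len(keys[0])
    outputs = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        weights = softmax(scores)  # relevance of each token, sums to 1
        out = [sum(w * v[j] for w, v in zip(weights, values))
               for j in range(len(values[0]))]
        outputs.append(out)
    return outputs

# Two identical keys get equal weight, so the output is the mean of the values.
result = attention([[1.0, 0.0]], [[1.0, 0.0], [1.0, 0.0]], [[2.0], [4.0]])
```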

softmax

A mathematical function that converts any set of numbers into a probability distribution (summing to 1). Used in attention and in picking the next token.
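
A small sketch in Python. Subtracting the maximum first is a common trick to avoid overflow and does not change the result.

```python
import math

def softmax(logits):
    """Turn a list of real numbers into probabilities that sum to 1."""
    m = max(logits)                        # shift for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

probs = softmax([2.0, 1.0, 0.1])  # largest input gets the largest probability
```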

logits

The raw scores the model outputs, one per vocabulary token, before conversion to probabilities. Unbounded numbers. A logit that is high relative to the others means the model is "confident" this token is correct.

temperature

Controls the randomness of the output by dividing the logits before softmax. Low values = predictable, repetitive output. High values = creative, varied output. Zero = deterministic (greedy): always pick the highest-scoring token.
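
A sketch of how temperature might enter sampling, assuming the usual convention of dividing the logits by it. The function name `sample_token` is hypothetical, and temperature zero is handled as a special case because dividing by zero is undefined.

```python
import math
import random

def sample_token(logits, temperature, rng=random):
    """Pick a token index from logits at the given temperature."""
    if temperature < 0:
        raise ValueError("temperature must be non-negative")
    if temperature == 0:
        # Greedy decoding: always the highest logit.
        return max(range(len(logits)), key=lambda i: logits[i])
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    weights = [math.exp(x - m) for x in scaled]  # unnormalized softmax
    return rng.choices(range(len(logits)), weights=weights, k=1)[0]

greedy = sample_token([1.0, 3.0, 2.0], temperature=0)
varied = sample_token([1.0, 3.0, 2.0], temperature=1.5, rng=random.Random(0))
```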

top-k sampling

Samples only from the k highest-probability tokens, after renormalizing their probabilities. Cuts off weak choices and keeps selection among the best candidates only.
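
A minimal sketch of the filtering step, assuming the input is already a probability list. The name `top_k_filter` is illustrative.

```python
def top_k_filter(probs, k):
    """Keep the k most probable tokens, zero the rest, and renormalize."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    keep = set(order[:k])
    total = sum(probs[i] for i in keep)
    return [probs[i] / total if i in keep else 0.0 for i in range(len(probs))]

filtered = top_k_filter([0.5, 0.3, 0.15, 0.05], k=2)
# Only the first two tokens can now be sampled.
```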

top-p (nucleus) sampling

Samples from the smallest set of tokens whose cumulative probability reaches p. Unlike top-k, the number of candidates adapts to the probability distribution: few when the model is confident, many when it is unsure.
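
A sketch of the nucleus filter under the same assumption (input is a probability list). The name `top_p_filter` is illustrative.

```python
def top_p_filter(probs, p):
    """Keep the most probable tokens until their cumulative mass reaches p."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = [], 0.0
    for i in order:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= p:
            break
    keep = set(kept)
    total = sum(probs[i] for i in keep)
    return [probs[i] / total if i in keep else 0.0 for i in range(len(probs))]

peaked = top_p_filter([0.9, 0.05, 0.03, 0.02], p=0.7)  # one token survives
flat = top_p_filter([0.3, 0.3, 0.2, 0.2], p=0.7)       # three tokens survive
```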

layer normalization

Normalizes each token's values across the feature dimension to have mean zero and variance one, then applies a learned scale and shift. Stabilizes training and helps against exploding and vanishing gradients.
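
A sketch for a single vector. `gamma` and `beta` stand for the learned scale and shift; here they default to 1 and 0, which is an assumption for the demo.

```python
import math

def layer_norm(x, gamma=None, beta=None, eps=1e-5):
    """Normalize one vector to mean 0 and variance 1, then scale and shift."""
    n = len(x)
    mean = sum(x) / n
    var = sum((v - mean) ** 2 for v in x) / n
    normed = [(v - mean) / math.sqrt(var + eps) for v in x]
    gamma = gamma or [1.0] * n
    beta = beta or [0.0] * n
    return [g * v + b for g, v, b in zip(gamma, normed, beta)]

y = layer_norm([1.0, 2.0, 3.0, 4.0])
```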

GELU

An activation function used in GPT-2: x times the standard normal CDF of x. A smoother version of ReLU. Allows small negative values to pass through partially instead of clipping to zero.
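
A sketch comparing GELU with ReLU. The exact form uses the standard normal CDF; the tanh version is the widely used approximation.

```python
import math

def relu(x):
    return max(0.0, x)

def gelu(x):
    """Exact GELU: x times the standard normal CDF of x."""
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))

def gelu_tanh(x):
    """Common tanh approximation of GELU."""
    return 0.5 * x * (1.0 + math.tanh(math.sqrt(2.0 / math.pi) * (x + 0.044715 * x ** 3)))

negative_relu = relu(-1.0)   # clipped to zero
negative_gelu = gelu(-1.0)   # small negative value passes through
```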

RoPE (rotary position embeddings)

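A method for encoding a token's position in the sequence by rotating pairs of dimensions of its query and key vectors by an angle proportional to the position. Attention scores then depend on the relative distance between tokens. Can help the model handle longer sequences than it was trained on, though in practice this usually needs extra scaling tricks. Used in LLaMA.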
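
A sketch of the rotation, assuming consecutive dimensions are paired and the usual base of 10000. Real implementations may pair dimensions differently. The check at the end shows the key property: the dot product depends only on the distance between positions.

```python
import math

def rope(vec, position, base=10000.0):
    """Rotate pairs (x0, x1), (x2, x3), ... by position-dependent angles."""
    d = len(vec)
    out = []
    for i in range(0, d, 2):
        theta = position * base ** (-i / d)  # lower pairs rotate faster
        c, s = math.cos(theta), math.sin(theta)
        x, y = vec[i], vec[i + 1]
        out += [x * c - y * s, x * s + y * c]
    return out

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

q = [1.0, 2.0, 3.0, 4.0]
k = [0.5, -1.0, 2.0, 0.0]
near = dot(rope(q, 5), rope(k, 3))    # positions 5 and 3, distance 2
far = dot(rope(q, 12), rope(k, 10))   # positions 12 and 10, distance 2
# near and far match up to floating-point error.
```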

GQA (grouped-query attention)

An attention optimization in which several query heads share each key/value head, so there are fewer key/value heads than query heads. Shrinks the key/value cache, which saves memory and improves speed with minimal quality impact.
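
A sketch of the head sharing, assuming query heads are grouped contiguously. The function name and the 8-to-2 head counts are made up for the example.

```python
def kv_head_for_query_head(q_head, n_q_heads, n_kv_heads):
    """Return the index of the key/value head that a query head reads from."""
    if n_q_heads % n_kv_heads != 0:
        raise ValueError("query heads must divide evenly into key/value heads")
    group_size = n_q_heads // n_kv_heads
    return q_head // group_size

# 8 query heads sharing 2 key/value heads: 4 query heads per group.
mapping = [kv_head_for_query_head(h, 8, 2) for h in range(8)]
```

With one key/value head per query head this reduces to standard multi-head attention; with a single key/value head it becomes multi-query attention.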

SwiGLU / SiLU

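Activation functions used in LLaMA. SiLU (also called Swish) = x times sigmoid(x). SwiGLU = a gated unit built on SiLU: one projection of the input passes through SiLU and multiplies a second projection element by element.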
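
A sketch of both functions. In a real feed-forward layer the `gate` and `value` vectors would be two separate linear projections of the same input; here they are passed in directly to keep the example short.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def silu(x):
    """SiLU: x times sigmoid(x)."""
    return x * sigmoid(x)

def swiglu(gate, value):
    """Gated unit: SiLU of the gate, multiplied elementwise with the value."""
    return [silu(g) * v for g, v in zip(gate, value)]

out = swiglu([0.0, 1.0], [5.0, 2.0])  # a gate of 0 blocks the first value
```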

RMS norm

A simplified version of layer normalization. Divides by the root mean square of the values without centering (no mean subtraction). Cheaper to compute and used in LLaMA.
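
A sketch for a single vector. `gamma` stands for the learned scale and defaults to 1 here, which is an assumption for the demo.

```python
import math

def rms_norm(x, gamma=None, eps=1e-6):
    """Divide by the root mean square; the mean is not subtracted."""
    rms = math.sqrt(sum(v * v for v in x) / len(x) + eps)
    gamma = gamma or [1.0] * len(x)
    return [g * v / rms for g, v in zip(gamma, x)]

y = rms_norm([3.0, 4.0])
# The mean square of y is about 1, but its mean is not zero.
```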

probing

A research technique for understanding what a model learns internally. Trains a simple classifier on hidden states and checks whether it can predict a specific linguistic property.

morphology (الصرف)

The study of word structure. In Arabic: how a triliteral root (e.g. k-t-b) combines with patterns (e.g. faʿala, mafʿūl) to produce different words, such as kataba ("he wrote") and maktūb ("written").

root-pattern (non-concatenative) system

A word-formation system in which the root (consonants) and the pattern (a vocalic template) interleave instead of concatenating. Unlike English, which glues prefix + stem + suffix in sequence.
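
A toy sketch of the interleaving. Writing the pattern with "C" for each consonant slot is a notation chosen for this example, and the function ignores details such as doubled consonants.

```python
def apply_pattern(root, pattern, slot="C"):
    """Fill each slot in `pattern` with the next root consonant, in order."""
    if pattern.count(slot) != len(root):
        raise ValueError("pattern needs one slot per root consonant")
    consonants = iter(root)
    return "".join(next(consonants) if ch == slot else ch for ch in pattern)

root = ("k", "t", "b")
wrote = apply_pattern(root, "CaCaCa")    # kataba
written = apply_pattern(root, "maCCūC")  # maktūb
book = apply_pattern(root, "CiCāC")      # kitāb
```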

nonce roots

Novel (invented) roots that don't exist in the language. Used in experiments to distinguish between memorization (the model has seen the word) and generalization (the model learned the pattern): a model cannot have memorized a word built on a root it has never seen.

clitic

A grammatical element that clings phonologically to a host word while behaving like a separate word in the syntax. In Arabic: prepositions attached at the front (bi-, li-, ka-) and pronouns attached at the end (-hu, -hum).

finite-state transducer

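A finite-state machine that reads an input string symbol by symbol and writes an output string, following a fixed table of transition rules. Fast and fully rule-driven (no learned weights). Used by older NLP systems before neural networks, for example for morphological analysis.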
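
A toy sketch of the idea. The two-state machine below is invented for illustration: it copies its input but writes "B" for any "b" that comes right after an "a".

```python
# (state, input symbol) -> (next state, output symbol)
TRANSITIONS = {
    ("q0", "a"): ("q1", "a"),
    ("q0", "b"): ("q0", "b"),
    ("q1", "a"): ("q1", "a"),
    ("q1", "b"): ("q0", "B"),
}

def transduce(text, start="q0"):
    """Run the transducer over `text` and return the output string."""
    state, output = start, []
    for symbol in text:
        if (state, symbol) not in TRANSITIONS:
            raise ValueError(f"no rule for {symbol!r} in state {state}")
        state, out = TRANSITIONS[(state, symbol)]
        output.append(out)
    return "".join(output)

result = transduce("abba")
```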

RAG (retrieval-augmented generation)

A technique that combines document retrieval with generation. The system searches a knowledge base and adds the relevant passages to the prompt before the model answers, instead of relying only on the model's internal memory.
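
A toy sketch of the retrieval half. It scores documents by word overlap, which is a stand-in for the embedding search a real system would likely use; the function names and the prompt layout are hypothetical, and the final call to a model is left out.

```python
def retrieve(query, documents, k=1):
    """Return the k documents sharing the most words with the query."""
    query_words = set(query.lower().split())
    def overlap(doc):
        return len(query_words & set(doc.lower().split()))
    return sorted(documents, key=overlap, reverse=True)[:k]

def build_prompt(query, documents, k=1):
    """Put the retrieved text in front of the question."""
    context = "\n".join(retrieve(query, documents, k))
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"

docs = [
    "RoPE rotates query and key vectors by position.",
    "BPE merges frequent symbol pairs.",
]
prompt = build_prompt("what does BPE merge", docs)
```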