hidden-state extraction, leakage-aware probing, and reproducible morphology experiments over GGUF models.
rust 1.92 · mit · backend-ready

ember is a research layer over GGUF models. it owns dataset handling, prompt construction, token-position selection, hidden-state artifacts, probes, baselines, metrics, reports, and validation. it is not trying to replace llama.cpp; ember uses llama.cpp when scale and model coverage matter, while keeping a native rust backend for inspectability and validation.

the extraction artifact contract is backend-neutral: native ember and future llama.cpp extractors write the same manifest, samples, tokenization, positions, layer shards, checksums, and report files. downstream probes read that contract instead of backend-specific output.
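a minimal sketch of what "backend-neutral" could mean in code, assuming the contract is checked as a set of required components. the component names come from the paragraph above; `ArtifactSet`, `missing_components`, and `probe_can_read` are hypothetical names for illustration, and the real on-disk layout is not shown here.

```rust
use std::collections::BTreeSet;

/// Components named by the extraction contract. Variant names mirror the
/// prose; the actual file layout is not specified in this sketch.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum Component {
    Manifest,
    Samples,
    Tokenization,
    Positions,
    LayerShards,
    Checksums,
    Report,
}

/// Every component a complete extraction run is assumed to provide.
pub const REQUIRED: [Component; 7] = [
    Component::Manifest,
    Component::Samples,
    Component::Tokenization,
    Component::Positions,
    Component::LayerShards,
    Component::Checksums,
    Component::Report,
];

/// Hypothetical in-memory view of one extraction run.
#[derive(Debug, Clone)]
pub struct ArtifactSet {
    /// Which backend wrote the artifacts. Informational only: the
    /// completeness check below never looks at it.
    pub backend: String,
    pub present: BTreeSet<Component>,
}

/// Returns the required components that the run did not produce.
pub fn missing_components(set: &ArtifactSet) -> Vec<Component> {
    REQUIRED
        .iter()
        .copied()
        .filter(|c| !set.present.contains(c))
        .collect()
}

/// A downstream probe accepts any backend as long as the contract is complete.
pub fn probe_can_read(set: &ArtifactSet) -> bool {
    missing_components(set).is_empty()
}
```

the point of the sketch is that `backend` never enters the check: a probe only asks whether the contract is complete.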

the first llama.cpp integration point is deliberately narrow: ember can spawn a `llama-cpp-external` extractor through a request file and validate the resulting tokenization/logits artifact skeleton before any intermediate hidden-state patch is required.
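the request-file handoff might look roughly like the sketch below. everything specific in it is an assumption made for illustration: the `key=value` request format, the `--request` flag, and the `tokenization.json` / `logits.bin` file names are not taken from ember or llama.cpp. it only uses `std::process::Command` and `std::fs` from the standard library.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};
use std::process::Command;

/// Hypothetical request contents; the real request schema is not given here.
pub struct ExtractRequest {
    pub model_path: PathBuf,
    pub prompts_path: PathBuf,
    pub out_dir: PathBuf,
}

impl ExtractRequest {
    /// Serialize as simple `key=value` lines (an assumed format).
    pub fn to_text(&self) -> String {
        format!(
            "model={}\nprompts={}\nout_dir={}\n",
            self.model_path.display(),
            self.prompts_path.display(),
            self.out_dir.display()
        )
    }
}

/// Write the request file and build (but do not run) the extractor command.
/// The caller decides when to call `.status()` or `.output()` on the result.
pub fn prepare_extractor(
    program: &str,
    req: &ExtractRequest,
    request_file: &Path,
) -> io::Result<Command> {
    fs::write(request_file, req.to_text())?;
    let mut cmd = Command::new(program);
    // `--request` is an assumed flag name, not a documented one.
    cmd.arg("--request").arg(request_file);
    Ok(cmd)
}

/// Skeleton check: only asks whether the tokenization and logits files exist.
/// Both file names are placeholders for this sketch.
pub fn skeleton_is_valid(out_dir: &Path) -> bool {
    ["tokenization.json", "logits.bin"]
        .iter()
        .all(|name| out_dir.join(name).is_file())
}
```

keeping the build step separate from the spawn step means the command line can be inspected or logged before anything runs.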

start here

validation ladder

smoke: structural execution only; the command loaded artifacts and produced output.
golden logits: output-logit comparison against a trusted reference for the same prompt, tokenizer, model, and quantization path.
activation checks: hidden-state comparison by prompt, tokenizer, model, layer, and token position.
probes: linear or MLP decodability/recoverability, not causal model use.
interventions: only supports behavioral claims when downstream logits or continuations change.
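as one way to picture the golden-logits rung, here is a small sketch that compares a candidate logit vector against a reference. the pass criterion (max absolute difference within a tolerance, plus matching argmax) and the function names are assumptions for illustration, not ember's actual check, and the comparison is only meaningful when prompt, tokenizer, model, and quantization path match.

```rust
/// Largest absolute element-wise difference between two logit vectors.
/// Returns `None` if the lengths differ or the inputs are empty, and
/// `Some(NaN)` if any difference is NaN so callers cannot miss it.
pub fn max_abs_diff(a: &[f32], b: &[f32]) -> Option<f32> {
    if a.len() != b.len() || a.is_empty() {
        return None;
    }
    let mut worst = 0.0_f32;
    for (x, y) in a.iter().zip(b) {
        let d = (x - y).abs();
        if d.is_nan() {
            return Some(f32::NAN);
        }
        worst = worst.max(d);
    }
    Some(worst)
}

/// Index of the largest logit, or `None` for an empty slice.
pub fn argmax(v: &[f32]) -> Option<usize> {
    v.iter()
        .enumerate()
        .max_by(|a, b| a.1.total_cmp(b.1))
        .map(|(i, _)| i)
}

/// Assumed pass criterion: every logit within `tol` of the reference and
/// the same top token. A NaN difference fails because `NaN <= tol` is false.
pub fn golden_check(candidate: &[f32], reference: &[f32], tol: f32) -> bool {
    match max_abs_diff(candidate, reference) {
        Some(d) => d <= tol && argmax(candidate) == argmax(reference),
        None => false,
    }
}
```

the same shape would extend to activation checks by keying each comparison on layer and token position.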

current status

| area | status | read |
| --- | --- | --- |
| CPU runtime | works locally across small/medium GGUF paths | engineering artifact, not production parity |
| Qwen3 0.6B | generation/probe paths run | needs trusted golden-logit reference |
| LLaMA 1B/3B/8B | local smoke/probe artifacts exist | research conclusions remain preliminary |
| Gemma 4 E2B | dense text-only path runs local smoke/benchmark | experimental until golden checks cover architecture details |
| encoder benchmarks | mBERT PADT smoke completed; suite manifest exists | full XLM-R/AraBERTv2 suite still pending |

latest update

the newest engineering pass added a thread-count benchmark section under engineering. local results show that larger dense Q8_0 models benefit from threaded runtime paths on this machine, while the small Qwen3 0.6B run does not. the page keeps that claim deliberately local; it is not a cloud-speed forecast.
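a thread-count sweep of this kind could be timed roughly as follows. this is a sketch under stated assumptions: the sum-of-squares workload is a stand-in rather than ember's runtime, and `parallel_sum_sq` and `sweep` are hypothetical names. it uses only `std::thread::scope` and `std::time::Instant`.

```rust
use std::thread;
use std::time::{Duration, Instant};

/// Sum of squares split across `threads` scoped workers.
/// A stand-in workload for illustration, not a model forward pass.
pub fn parallel_sum_sq(data: &[f32], threads: usize) -> f64 {
    let threads = threads.max(1);
    // Chunk size of at least 1 so `chunks` never receives zero.
    let chunk = data.len().div_ceil(threads).max(1);
    thread::scope(|s| {
        let handles: Vec<_> = data
            .chunks(chunk)
            .map(|part| {
                s.spawn(move || {
                    part.iter()
                        .map(|&x| (x as f64) * (x as f64))
                        .sum::<f64>()
                })
            })
            .collect();
        handles
            .into_iter()
            .map(|h| h.join().expect("worker panicked"))
            .sum::<f64>()
    })
}

/// Run the workload once per thread count and record wall-clock time.
/// Returns (thread count, elapsed time, result) rows.
pub fn sweep(data: &[f32], thread_counts: &[usize]) -> Vec<(usize, Duration, f64)> {
    thread_counts
        .iter()
        .map(|&n| {
            let start = Instant::now();
            let total = parallel_sum_sq(data, n);
            (n, start.elapsed(), total)
        })
        .collect()
}
```

timings from a sweep like this describe one machine and one workload, which is why the result column is kept next to each timing: a faster row only counts if it computes the same answer.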

deeper pages

architecture, design decisions, math primitives, attention, KV cache, bugs, and the first coherent output.
the current systems work, benchmark plot, and engineering subpages.
Arabic NLP notes, morphology probing, tokenizer papers, and running research direction.