ember, a cpu-first llm inference engine in rust

ember

cpu-first rust inference and probing. gguf loading, hidden-state extraction, and validation artifacts.

rust 1.92 ● mit ● cpu-first

ember is a small, readable inference and probing engine. It runs quantized GGUF models on CPU, extracts per-layer hidden states, and keeps benchmark and validation artifacts close to the code. The goal is not to replace llama.cpp; the priorities are clarity and reviewability.

start here

engineering

Runtime updates, thread-count benchmarks, SIMD and model-support links, validation tooling, and the current engineering status.

runtime ● benchmarks ● validation

research/results

Probing notes on Arabic morphology. These are preliminary observations unless matching validation artifacts exist.

arabic nlp ● probing ● preliminary

validation ladder

smoke	Structural run only: the command loaded its artifacts and produced output.
golden logits	Compare logits against a trusted reference for the same prompt, tokenizer, model, and quantization path.
activation checks	Compare hidden states by prompt, tokenizer, model, layer, and token position.
probes	Decodability or recoverability, not causal model use.
interventions	Behavioral claims, which require a change in downstream logits or continuations.
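
A golden-logits check can be sketched as follows. This is a minimal illustration, not ember's validation tooling: the names `LogitReport` and `compare_logits` are hypothetical, and the sketch assumes both logit vectors were already produced for the same prompt, tokenizer, model, and quantization path. It reports the largest absolute difference and whether the top-1 token agrees; NaN values are not handled.

```rust
/// Summary of how closely a candidate logit vector matches a reference.
/// Hypothetical type for illustration; not part of ember's API.
#[derive(Debug, PartialEq)]
pub struct LogitReport {
    /// Largest element-wise absolute difference.
    pub max_abs_diff: f32,
    /// True when both vectors put their maximum at the same index.
    pub top1_match: bool,
}

/// Index of the largest value (first one wins on ties), or None if empty.
fn argmax(xs: &[f32]) -> Option<usize> {
    let mut best: Option<(usize, f32)> = None;
    for (i, &x) in xs.iter().enumerate() {
        match best {
            Some((_, b)) if x <= b => {}
            _ => best = Some((i, x)),
        }
    }
    best.map(|(i, _)| i)
}

/// Compares candidate logits against a trusted reference of equal length.
pub fn compare_logits(reference: &[f32], candidate: &[f32]) -> Result<LogitReport, String> {
    if reference.len() != candidate.len() {
        return Err(format!(
            "vocab size mismatch: reference {} vs candidate {}",
            reference.len(),
            candidate.len()
        ));
    }
    if reference.is_empty() {
        return Err("empty logit vectors".to_string());
    }
    let max_abs_diff = reference
        .iter()
        .zip(candidate)
        .map(|(r, c)| (r - c).abs())
        .fold(0.0_f32, f32::max);
    Ok(LogitReport {
        max_abs_diff,
        top1_match: argmax(reference) == argmax(candidate),
    })
}
```

A caller would pick a tolerance for `max_abs_diff` that suits the quantization path; the sketch deliberately leaves that threshold out because the source does not state one.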

current status

area	status	read
CPU runtime	Runs locally on small and medium GGUF paths	An engineering artifact, not production parity
Qwen3 0.6B	Generation/probe paths work	Needs a trusted golden-logit reference
LLaMA 1B/3B/8B	Local smoke/probe artifacts exist	Research conclusions are still preliminary
Gemma 4 E2B	Dense text-only path runs smoke/benchmark locally	Experimental until covered by golden checks
encoder benchmarks	mBERT PADT smoke completed; suite manifest exists	XLM-R/AraBERTv2 full suite still pending

latest update

The latest engineering update added a thread-count benchmark to the engineering page. Local results indicate that the larger dense Q8_0 models benefit from the threaded runtime paths on this machine, while the small Qwen3 0.6B does not. This describes one local machine; it is not a prediction of speed on cloud hardware.
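
The general shape of a thread-count benchmark can be sketched with the standard library alone. This is an assumption-laden illustration, not ember's benchmark harness: `matvec_threaded` and `time_matvec` are hypothetical names, the workload is a plain f32 matrix-vector product rather than a Q8_0 kernel, and threads are spawned per call via `std::thread::scope`.

```rust
use std::thread;
use std::time::{Duration, Instant};

/// Computes y = W x for a row-major `rows x cols` matrix, splitting the
/// rows into contiguous chunks and giving each chunk to one scoped thread.
/// Hypothetical helper for illustration only.
pub fn matvec_threaded(
    w: &[f32],
    x: &[f32],
    rows: usize,
    cols: usize,
    threads: usize,
) -> Vec<f32> {
    assert_eq!(w.len(), rows * cols, "matrix size mismatch");
    assert_eq!(x.len(), cols, "vector size mismatch");
    let mut y = vec![0.0_f32; rows];
    if rows == 0 {
        return y;
    }
    // Ceiling division so every row lands in some chunk; at least 1 thread.
    let t = threads.max(1);
    let chunk_rows = (rows + t - 1) / t;
    thread::scope(|s| {
        for (chunk_idx, out) in y.chunks_mut(chunk_rows).enumerate() {
            let row0 = chunk_idx * chunk_rows;
            s.spawn(move || {
                for (i, slot) in out.iter_mut().enumerate() {
                    let start = (row0 + i) * cols;
                    let row = &w[start..start + cols];
                    *slot = row.iter().zip(x).map(|(a, b)| a * b).sum();
                }
            });
        }
    });
    y
}

/// Wall-clock time for `iters` repeated products at a given thread count.
pub fn time_matvec(
    w: &[f32],
    x: &[f32],
    rows: usize,
    cols: usize,
    threads: usize,
    iters: usize,
) -> Duration {
    let start = Instant::now();
    for _ in 0..iters {
        // black_box keeps the optimizer from discarding the result.
        std::hint::black_box(matvec_threaded(w, x, rows, cols, threads));
    }
    start.elapsed()
}
```

Calling `time_matvec` for thread counts such as 1, 2, 4, and 8 on the same matrix gives one timing per count. As with the results above, any numbers it produces describe only the machine it ran on.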

deeper pages

internals

Architecture, design decisions, math primitives, attention, KV cache, bugs, and the first coherent output.

engineering update

Current engineering work, the benchmark plot, and subpages.

research notes

Arabic NLP, morphology probing, tokenization papers, and the current research direction.