open research
significance-ordered neural inference.
iota splits tensor bytes into lanes so compression can finally see the structure already hiding in neural network weights. then it uses that structure to run inference with less memory, adapt precision at runtime, and train models more efficiently.
the core insight: trained neural networks are not random data. structure hides in bytes. lane splitting exposes it. everything else follows from there.
benchmarks
benchmarks are reproducible via repo commands. see history.md for methodology and mixed/historical distinctions.
the arc
1. bytes first
weights are not random. structure hides in bytes. lane splitting lets ordinary compression see that structure. the first breakthrough was better compression, achieved by showing zstd the right view of tensor bytes.
2. memory became a flow problem
old belief: big models need big ram. new belief: big models can stream if the working set stays small. this is where gate, demand paging, and bounded-memory inference entered the story.
3. pressure became a law
entropy density stopped being just a metric. it became a control signal. the same signal now guides decode cost, keep level, sleeping, and promotion — at both training and inference time.
4. structure beat reconstruction
lattice moved iota toward shared structural prototypes. the goal shifted from "rebuild every weight exactly before compute" toward "compute directly from the compact structure when possible."
mechanisms
lane-partitioned storage
tensors are physically split into significance-ordered lanes, so exponent structure is separated from mantissa noise. each lane can then be compressed on its own, which lets an ordinary compressor such as zstd see the structure already hiding in neural network weights.
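a minimal sketch of the lane idea, assuming f32 weights and one lane per byte position. the function names (`split_lanes`, `merge_lanes`, `byte_entropy`) are hypothetical and are not the iotacore api. the entropy helper only illustrates why the split helps: the most significant lane tends to repeat a handful of byte values, while the lowest mantissa lane looks close to random.

```rust
// Hypothetical sketch, not the iotacore API.
// Lane 0 holds the sign bit and the top 7 exponent bits of each f32;
// lane 3 holds the least significant mantissa bits.

/// Split weights into 4 byte lanes, ordered from most to least significant.
fn split_lanes(weights: &[f32]) -> [Vec<u8>; 4] {
    let mut lanes: [Vec<u8>; 4] = Default::default();
    for w in weights {
        let bytes = w.to_be_bytes(); // big-endian: byte 0 is the most significant
        for (lane, &b) in lanes.iter_mut().zip(bytes.iter()) {
            lane.push(b);
        }
    }
    lanes
}

/// Rebuild the original weights bit-for-bit.
/// Assumes all four lanes have the same length.
fn merge_lanes(lanes: &[Vec<u8>; 4]) -> Vec<f32> {
    (0..lanes[0].len())
        .map(|i| f32::from_be_bytes([lanes[0][i], lanes[1][i], lanes[2][i], lanes[3][i]]))
        .collect()
}

/// Shannon entropy of a byte stream in bits per byte (0.0 for empty input).
fn byte_entropy(data: &[u8]) -> f64 {
    if data.is_empty() {
        return 0.0;
    }
    let mut counts = [0usize; 256];
    for &b in data {
        counts[b as usize] += 1;
    }
    let n = data.len() as f64;
    counts
        .iter()
        .filter(|&&c| c > 0)
        .map(|&c| {
            let p = c as f64 / n;
            -p * p.log2()
        })
        .sum()
}
```

in this sketch each lane would be handed to the compressor separately. the split is lossless, so `merge_lanes(&split_lanes(&w))` returns the original weights.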
bounded-memory inference
model size becomes an i/o problem instead of a ram problem. stream a 30 GB model through 78 MB of memory. the old rule was model_size <= ram. the new rule is chunk_size <= ram.
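the chunk_size <= ram rule can be sketched with nothing but the standard library. this is an illustration under assumptions, not the iotagate implementation: `stream_chunks` is a hypothetical name, and the per-chunk callback stands in for whatever decode or compute step the runtime performs on each chunk.

```rust
use std::io::{self, Read};

/// Stream `source` through one fixed-size buffer, calling `visit` on each chunk.
/// The buffer is `chunk_size` bytes no matter how large the source is.
/// Returns the total number of bytes processed.
fn stream_chunks<R: Read, F: FnMut(&[u8])>(
    mut source: R,
    chunk_size: usize,
    mut visit: F,
) -> io::Result<u64> {
    assert!(chunk_size > 0, "chunk_size must be non-zero");
    let mut buf = vec![0u8; chunk_size]; // the only buffer: chunk_size <= ram
    let mut total = 0u64;
    loop {
        let n = match source.read(&mut buf) {
            Ok(0) => break, // end of stream
            Ok(n) => n,
            Err(e) if e.kind() == io::ErrorKind::Interrupted => continue,
            Err(e) => return Err(e),
        };
        visit(&buf[..n]); // a read may return fewer bytes than chunk_size
        total += n as u64;
    }
    Ok(total)
}
```

any `Read` source works here, for example a `std::fs::File` or an in-memory byte slice. the working set is bounded by the buffer, so total size only affects how long the loop runs.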
pressure-based precision control
entropy density isn't just a metric — it's a control signal. the same signal guides decode cost, keep level, sleeping, promotion, and demotion. precision adapts per layer instead of globally.
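one way to picture a metric acting as a control signal, sketched under loud assumptions: entropy density is taken here to be shannon entropy of a layer's bytes divided by 8 bits, keep level is taken to be the number of most-significant lanes retained, and the thresholds are invented for illustration. none of these names or numbers come from the iota codebase.

```rust
/// Assumed definition: Shannon entropy of the bytes, normalised to [0, 1]
/// by dividing by 8 bits. Returns 0.0 for empty input.
fn entropy_density(data: &[u8]) -> f64 {
    if data.is_empty() {
        return 0.0;
    }
    let mut counts = [0usize; 256];
    for &b in data {
        counts[b as usize] += 1;
    }
    let n = data.len() as f64;
    let bits: f64 = counts
        .iter()
        .filter(|&&c| c > 0)
        .map(|&c| {
            let p = c as f64 / n;
            -p * p.log2()
        })
        .sum();
    bits / 8.0
}

/// Hypothetical policy: denser layers keep more of their 4 lanes.
/// The thresholds are made up for this sketch.
fn keep_level(density: f64) -> usize {
    match density {
        d if d < 0.25 => 1,
        d if d < 0.50 => 2,
        d if d < 0.75 => 3,
        _ => 4,
    }
}

/// Choose a keep level per layer instead of one global setting.
fn plan_layers(layers: &[Vec<u8>]) -> Vec<usize> {
    layers.iter().map(|l| keep_level(entropy_density(l))).collect()
}
```

the point of the sketch is the shape of the loop: sense each layer, then let the reading pick that layer's precision, so two layers in the same model can end up at different keep levels.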
training-time keep control
the same sensing that guides runtime inference also guides training. entropy-budgeted training recovered 92% of the gap between full precision and naive quantization.
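for readers unfamiliar with the phrasing, "recovered x% of the gap" is usually computed as below. this is the conventional arithmetic, assumed rather than taken from the iota benchmarks, and `gap_recovered` is a hypothetical helper. the source does not say which metric the 92% figure was measured on.

```rust
/// Fraction of the gap between a naive baseline and a full-precision
/// reference that a method recovers. The same formula works for
/// higher-is-better and lower-is-better metrics.
/// Returns None when the two reference points coincide (no gap to recover).
fn gap_recovered(full: f64, naive: f64, method: f64) -> Option<f64> {
    let gap = full - naive;
    if gap == 0.0 {
        return None;
    }
    Some((method - naive) / gap)
}
```

with made-up numbers, a full-precision score of 0.80, a naive-quantization score of 0.70, and a method score of 0.792 give (0.792 - 0.70) / (0.80 - 0.70), roughly 0.92. these values are illustrative only and are not iota measurements.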
structural lattice execution
store blocks as anchor + prototype + orbit + optional residual. compute directly from compact structure instead of reconstructing every weight. the goal shifted from rebuild-then-compute to compute-from-structure.
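a toy sketch of rebuild-then-compute versus compute-from-structure. the field meanings are assumptions made for this example only, since the text above names the parts but does not define them: anchor is treated as a per-block scale, prototype as an index into a shared codebook, orbit as a cyclic rotation of that prototype, and residual as a sparse list of corrections. the real lattice format may differ on every point.

```rust
/// Hypothetical compact block; every field meaning is an assumption.
struct Block {
    anchor: f32,                          // per-block scale
    prototype: usize,                     // index into a shared codebook
    orbit: usize,                         // cyclic rotation applied to the prototype
    residual: Option<Vec<(usize, f32)>>,  // sparse (index, delta) corrections
}

/// Rebuild-then-compute path: materialise every weight in the block.
fn reconstruct(block: &Block, codebook: &[Vec<f32>]) -> Vec<f32> {
    let proto = &codebook[block.prototype];
    let n = proto.len();
    let mut w: Vec<f32> = (0..n)
        .map(|i| block.anchor * proto[(i + block.orbit) % n])
        .collect();
    if let Some(res) = &block.residual {
        for &(i, d) in res {
            w[i] += d;
        }
    }
    w
}

/// Compute-from-structure path: dot product with `x` without allocating weights.
fn dot_from_structure(block: &Block, codebook: &[Vec<f32>], x: &[f32]) -> f32 {
    let proto = &codebook[block.prototype];
    let n = proto.len();
    assert_eq!(x.len(), n, "input length must match the prototype length");
    let mut acc = 0.0f32;
    for i in 0..n {
        acc += x[i] * proto[(i + block.orbit) % n];
    }
    acc *= block.anchor; // apply the scale once per block, not once per weight
    if let Some(res) = &block.residual {
        for &(i, d) in res {
            acc += x[i] * d;
        }
    }
    acc
}
```

both paths give the same dot product up to floating-point rounding. the second one reads the shared prototype in place and touches the residual only where it exists, which is the sense in which compute can skip reconstruction in this toy model.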
local-law normalization
shave removes significance waste. cast removes orientation waste. drift removes relational waste. three projections that expose the real structure in trained weights.
architecture
three crates. clean seams.
iotacore — canonical tensor representation, lane codecs, sensing, and pressure primitives. the physical layer.
iotagate — bounded-memory residency nouns and budget math. the memory layer.
iota — runtime composition. inference, control, attention, kv, structural execution, and bridge surfaces. the execution layer.
intellectual property
patent 1 — lane-partitioned compression and parallel reconstruction. provisional 63/956,662.
patent 2 — entropy-guided adaptive precision control during runtime inference. provisional 63/976,958.
patent 3 — structural/orbital execution line. research lineage and proof surfaces live in the current repo.
attribution
rouyea shave algorithm — blaize rouyea + corey bourgeois
bourgeois orbience algorithm — corey bourgeois + blaize rouyea