open research

significance-ordered neural inference.

iota splits tensor bytes into lanes so compression can finally see the structure already hiding in neural network weights. then it uses that structure to run inference with less memory, adapt precision at runtime, and train models more efficiently.

the core insight: trained neural networks are not random data. structure hides in bytes. lane splitting exposes it. everything else follows from there.


benchmarks

1.40x compression ratio (vs baseline zstd on real llms)

4.3 gb/s decode speed (mac m4 max, 8 lanes)

78 mb peak ram for 29gb model (qwen3-14b on raspberry pi 5)

92% training gap closure (entropy-budgeted vs uniform precision)

62 tok/s structural inference (fused lattice qwen bench)

3.6x faster than zstd (decode throughput)

benchmarks are reproducible via repo commands. see history.md for methodology and mixed/historical distinctions.

the arc

1. bytes first

weights are not random. structure hides in bytes. lane splitting lets ordinary compression see that structure. the first breakthrough was better compression by showing zstd the right view of tensor bytes.

2. memory became a flow problem

old belief: big models need big ram. new belief: big models can stream if the working set stays small. this is where gate, demand paging, and bounded-memory inference entered the story.

3. pressure became a law

entropy density stopped being just a metric. it became a control signal. the same signal now guides decode cost, keep level, sleeping, and promotion — at both training and inference time.

4. structure beat reconstruction

lattice moved iota toward shared structural prototypes. the goal shifted from "rebuild every weight exactly before compute" toward "compute directly from the compact structure when possible."

mechanisms

lane-partitioned storage

tensors are physically split into significance-ordered lanes. exponent structure is separated from mantissa noise. this lets ordinary compression finally see the structure already hiding in neural network weights.
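a minimal sketch of the lane idea, under stated assumptions: it uses python's standard-library zlib as a stand-in for zstd, synthetic gaussian float32 values instead of real llm weights, and one lane per byte position. the names split_lanes and join_lanes are hypothetical and are not taken from the iota repo, and the sizes it produces will not match the benchmarks above.

```python
import random
import struct
import zlib

def split_lanes(raw: bytes, width: int = 4) -> list[bytes]:
    """lane k holds byte k of every value; len(raw) must be a multiple of width."""
    return [raw[k::width] for k in range(width)]

def join_lanes(lanes: list[bytes]) -> bytes:
    """inverse of split_lanes: re-interleave the lanes into the original byte order."""
    width = len(lanes)
    out = bytearray(width * len(lanes[0]))
    for k, lane in enumerate(lanes):
        out[k::width] = lane
    return bytes(out)

# synthetic stand-in for trained weights: small gaussian values as little-endian float32.
random.seed(0)
weights = [random.gauss(0.0, 0.02) for _ in range(20_000)]
raw = struct.pack(f"<{len(weights)}f", *weights)

# in little-endian float32, lane 3 carries the sign and the high exponent bits,
# while lanes 0 and 1 carry only low mantissa bits.
lanes = split_lanes(raw)

interleaved_size = len(zlib.compress(raw, 9))
lane_sizes = [len(zlib.compress(lane, 9)) for lane in lanes]

print("interleaved:", interleaved_size)
print("per lane:   ", lane_sizes, "total:", sum(lane_sizes))
```

in this toy setup the exponent-heavy lane compresses well and the low mantissa lanes barely compress at all, which is the separation the paragraph above describes. the split is lossless: join_lanes(split_lanes(raw)) returns the original bytes.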

bounded-memory inference

model size becomes an i/o problem instead of a ram problem. stream a 29gb model (qwen3-14b) through 78mb of peak ram. the old rule was model_size <= ram. the new rule is chunk_size <= ram.
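the chunk_size <= ram rule can be illustrated with a toy matrix-vector product that reads weights from disk a few rows at a time. this is a sketch only: stream_matvec, the raw float32 file layout, and the tiny 8x4 matrix are assumptions made for the example, and it leaves out everything a real runtime would need (lane decode, paging policy, prefetch).

```python
import os
import struct
import tempfile

def stream_matvec(path, x, rows_per_chunk):
    """compute y = W @ x while holding at most rows_per_chunk rows of W in memory."""
    cols = len(x)
    row_fmt = f"<{cols}f"
    row_bytes = struct.calcsize(row_fmt)
    y, peak_bytes = [], 0
    with open(path, "rb") as f:
        # read one bounded chunk at a time; the previous chunk is dropped on each pass.
        while chunk := f.read(row_bytes * rows_per_chunk):
            peak_bytes = max(peak_bytes, len(chunk))
            for row in struct.iter_unpack(row_fmt, chunk):
                y.append(sum(w * v for w, v in zip(row, x)))
    return y, peak_bytes

# toy "model": an 8x4 float32 matrix written row by row to a temporary file.
rows, cols = 8, 4
matrix = [[float(r + c) for c in range(cols)] for r in range(rows)]
x = [1.0, 2.0, 3.0, 4.0]

with tempfile.NamedTemporaryFile(delete=False) as tmp:
    for row in matrix:
        tmp.write(struct.pack(f"<{cols}f", *row))
    path = tmp.name

try:
    y, peak_bytes = stream_matvec(path, x, rows_per_chunk=2)
finally:
    os.remove(path)

print("result:", y)
print("weight bytes on disk:", rows * cols * 4, "largest chunk held:", peak_bytes)
```

the largest weight buffer is set by rows_per_chunk, not by the file size, so growing the matrix increases i/o time rather than peak memory.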

pressure-based precision control

entropy density isn't just a metric — it's a control signal. the same signal guides decode cost, keep level, sleeping, promotion, and demotion. precision adapts per layer instead of globally.
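one way to picture a per-layer control signal, as a sketch under assumptions: here "entropy density" is taken to mean the shannon entropy of a layer's byte histogram in bits per byte, and the keep level is the number of lanes retained. both definitions, the linear mapping, and the names entropy_density and keep_level are illustrative guesses, not iota's actual sensing or policy.

```python
import math
from collections import Counter

def entropy_density(data: bytes) -> float:
    """shannon entropy of the byte histogram, in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    n = len(data)
    return sum(c / n * math.log2(n / c) for c in Counter(data).values())

def keep_level(density: float, max_lanes: int = 4) -> int:
    """hypothetical rule: denser layers keep more lanes, and every layer keeps at least one."""
    return max(1, min(max_lanes, math.ceil(density / 8.0 * max_lanes)))

# made-up layer payloads with very different byte statistics.
layers = {
    "layer.0": bytes([7]) * 1024,        # one repeated byte: no information
    "layer.1": bytes([0, 1, 2, 3]) * 256,  # four equally likely bytes: 2 bits per byte
    "layer.2": bytes(range(256)) * 4,    # all byte values equally likely: 8 bits per byte
}

# precision is chosen per layer from the same signal, not once for the whole model.
plan = {name: keep_level(entropy_density(data)) for name, data in layers.items()}
print(plan)
```

the point of the sketch is the shape of the loop: one measured signal per layer feeds one decision per layer. a real controller would presumably add hysteresis, budgets, and the sleeping and promotion states mentioned above.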

training-time keep control

the same sensing that guides runtime inference also guides training. entropy-budgeted training recovered 92% of the gap between full precision and naive uniform-precision quantization.
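for readers who want the arithmetic behind a "gap closure" figure, a small sketch follows. it assumes the usual reading of the phrase: the share of the full-precision-vs-naive gap that the candidate method wins back. the function name gap_closure and the metric values are invented for illustration and are not measurements from the repo.

```python
def gap_closure(full: float, naive: float, candidate: float) -> float:
    """fraction of the gap between full precision and the naive baseline that the candidate recovers."""
    gap = full - naive
    if gap == 0:
        raise ValueError("full and naive scores are equal, so there is no gap to close")
    return (candidate - naive) / gap

# illustrative numbers only (higher is better), chosen so the result lands on 92%.
full_precision = 0.80
naive_quantized = 0.55
entropy_budgeted = 0.78

closure = gap_closure(full_precision, naive_quantized, entropy_budgeted)
print(f"gap closure: {closure:.0%}")
```

the same formula works for lower-is-better metrics such as loss, because the sign of the numerator and the denominator flip together.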

structural lattice execution

store blocks as anchor + prototype + orbit + optional residual. compute directly from compact structure instead of reconstructing every weight. the goal shifted from rebuild-then-compute to compute-from-structure.
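a toy reading of compute-from-structure, with loud caveats: the source does not define anchor, prototype, orbit, or residual, so this sketch guesses that the anchor is a per-block scale, the prototype is an index into a shared table of vectors, the orbit is a cyclic shift of that prototype, and the residual is a sparse correction. the Block type and both functions are hypothetical and only show that a dot product can be taken against the compact form without first materializing the block's weights.

```python
from dataclasses import dataclass, field

@dataclass
class Block:
    anchor: float     # per-block scale (assumed meaning)
    prototype: int    # index into a shared prototype table (assumed meaning)
    orbit: int        # cyclic shift applied to the prototype (assumed meaning)
    residual: dict[int, float] = field(default_factory=dict)  # optional sparse corrections

def reconstruct(block, prototypes):
    """rebuild-then-compute path: materialize every weight of the block."""
    p = prototypes[block.prototype]
    n = len(p)
    w = [block.anchor * p[(i - block.orbit) % n] for i in range(n)]
    for i, r in block.residual.items():
        w[i] += r
    return w

def structural_dot(block, prototypes, x):
    """compute-from-structure path: index the shared prototype directly, scale once."""
    p = prototypes[block.prototype]
    n = len(p)
    acc = sum(p[(i - block.orbit) % n] * x[i] for i in range(n))
    return block.anchor * acc + sum(r * x[i] for i, r in block.residual.items())

prototypes = [[1.0, -1.0, 2.0, 0.5]]
block = Block(anchor=0.5, prototype=0, orbit=1, residual={2: 0.25})
x = [1.0, 2.0, 3.0, 4.0]

dense = sum(w * v for w, v in zip(reconstruct(block, prototypes), x))
direct = structural_dot(block, prototypes, x)
print(dense, direct)
```

both paths give the same answer here. the structural path never allocates the per-block weight vector, which is the distinction the paragraph above draws.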

local-law normalization

shave removes significance waste. cast removes orientation waste. drift removes relational waste. three projections that expose the real structure in trained weights.

architecture

three crates. clean seams.

iotacore — canonical tensor representation, lane codecs, sensing, and pressure primitives. the physical layer.

iotagate — bounded-memory residency nouns and budget math. the memory layer.

iota — runtime composition. inference, control, attention, kv, structural execution, and bridge surfaces. the execution layer.

intellectual property

patent 1 — lane-partitioned compression and parallel reconstruction. provisional 63/956,662.

patent 2 — entropy-guided adaptive precision control during runtime inference. provisional 63/976,958.

patent 3 — structural/orbital execution line. research lineage and proof surfaces live in the current repo.

attribution

rouyea shave algorithm — blaize rouyea + corey bourgeois

bourgeois orbience algorithm — corey bourgeois + blaize rouyea