@whyarethis
This is insane. Gemma 4 26B running in 13GB on my MacBook M1, with the full context window, at 20-40 tokens a second. This was a REAP model by @0xseraph, further optimized through coherence physics. Dead heads were pruned and replaced by SVD rotations. Weights were quantized, and the KV cache was optimized to the point of being negligible. I am now working to push the speed higher. Wild to be talking to a local LLM that has been shrunk through the oscillator physics I have been working on for 6+ months now. #project89
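For a rough sense of what those numbers imply, here is a back-of-the-envelope sketch. It assumes the 13GB is almost entirely weights, that "26B" means 26e9 parameters, and that 1GB is 1e9 bytes. None of that is confirmed in the post, and if pruning removed parameters, the real count is lower and the bits per weight correspondingly higher. The helper names are hypothetical.

```python
# Back-of-the-envelope footprint arithmetic (assumptions, not measurements):
# - the 13 GB figure is treated as weights only (KV cache said to be negligible)
# - "26B" is treated as 26e9 parameters, before any pruning
# - 1 GB is taken as 1e9 bytes


def bits_per_weight(model_bytes: float, n_params: float) -> float:
    """Average number of bits stored per parameter."""
    return model_bytes * 8 / n_params


def footprint_gb(n_params: float, bits: float) -> float:
    """Size in GB of n_params parameters stored at `bits` bits each."""
    return n_params * bits / 8 / 1e9


N_PARAMS = 26e9          # assumed parameter count
FOOTPRINT_BYTES = 13e9   # reported size, assumed to be all weights

avg_bits = bits_per_weight(FOOTPRINT_BYTES, N_PARAMS)
fp16_size = footprint_gb(N_PARAMS, 16)

print(f"average bits per weight: {avg_bits:.1f}")    # 4.0
print(f"same model at 16-bit:    {fp16_size:.1f} GB")  # 52.0 GB
```

Under those assumptions the model averages about 4 bits per weight, roughly a quarter of the size of an unpruned 16-bit copy.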