@omarsar0
The inference speed on the @GroqInc examples didn't look real. So I tested it myself, and I don't even know what to say about this. I need to take a closer look at the technical papers. For now, all I can think about are the complex use cases that this speed, combined with support for million-token context lengths, can enable. With breakthroughs in inference and long-context understanding, we are officially entering a new era in LLMs.

I am not surprised that we now have a dedicated inference engine for language processing. From the Groq FAQ: "An LPU has greater compute capacity than a GPU and CPU in regards to LLMs. This reduces the amount of time per word calculated, allowing sequences of text to be generated much faster. Additionally, eliminating external memory bottlenecks enables the LPU Inference Engine to deliver orders of magnitude better performance on LLMs compared to GPUs."

Try it yourself. What you see in the clip is playing at its original speed. This also made me realize how slow I type. 😅
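If you want to reproduce the timing yourself, here is a minimal sketch of how you could measure throughput, assuming Groq's Python SDK (pip install groq), an API key in the GROQ_API_KEY environment variable, and a placeholder model id (swap in whatever model the Groq console currently lists):

```python
import time
from groq import Groq  # assumes the official Groq Python SDK is installed

# The client reads the API key from the GROQ_API_KEY environment variable.
client = Groq()

prompt = "Explain how an LPU differs from a GPU for LLM inference."

# Time a single non-streaming completion end to end.
start = time.perf_counter()
response = client.chat.completions.create(
    model="llama3-8b-8192",  # placeholder model id; check Groq's model list
    messages=[{"role": "user", "content": prompt}],
)
elapsed = time.perf_counter() - start

# The OpenAI-compatible response reports how many tokens were generated.
completion_tokens = response.usage.completion_tokens
print(f"{completion_tokens} tokens in {elapsed:.2f}s "
      f"~ {completion_tokens / elapsed:.0f} tokens/sec")
```

Note that this measures wall-clock time including network latency, so it understates the raw generation rate; it's just a quick sanity check on the numbers in the clip.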