
Cerebras Unveils New AI Inference Technology

August 30, 2024

Cerebras Unicorn News - August 30, 2024

Cerebras Systems has introduced a new AI inference approach built on its wafer-scale engine technology, loading Meta's LLaMA 3.1 model directly onto the chip. The company says the approach delivers significant cost and power savings while boosting processing speeds, positioning Cerebras as a challenger to Nvidia's dominance in the market.

(00:00:00) Introduction

(00:00:33) Cerebras Challenges Nvidia with New AI Inference Approach and Sleeker Designs

Cerebras Systems has introduced a groundbreaking AI inference approach built on its wafer-scale engine (WSE) technology, which loads Meta's open-source LLaMA 3.1 model directly onto the chip. The company says this significantly reduces cost and power consumption while dramatically increasing processing speed, claiming performance 10 times faster than current market solutions: the chip processes 1,800 tokens per second for the 8-billion-parameter version of LLaMA 3.1, compared with 260 tokens per second on state-of-the-art GPUs. Cerebras offers the service through an API to its cloud, so it can be adopted without infrastructure changes, and the company plans to expand the offering with larger models soon.
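Because the service is exposed as a cloud API, an existing application could switch to it with little more than a change of endpoint. Below is a minimal sketch of what such a call might look like, assuming an OpenAI-compatible chat-completions route; the URL, model identifier, and response shape are illustrative assumptions, not confirmed details of the Cerebras API.

```python
import requests

# Assumed endpoint and model name for illustration only; consult the
# Cerebras Cloud documentation for the actual values.
API_URL = "https://api.cerebras.ai/v1/chat/completions"  # assumed OpenAI-style route
API_KEY = "your-api-key-here"

def ask_llama(prompt: str) -> str:
    """Send one chat-completion request to the assumed inference endpoint."""
    response = requests.post(
        API_URL,
        headers={"Authorization": f"Bearer {API_KEY}"},
        json={
            "model": "llama3.1-8b",  # assumed identifier for the 8B-parameter model
            "messages": [{"role": "user", "content": prompt}],
        },
        timeout=30,
    )
    response.raise_for_status()
    # Assumes an OpenAI-style response body: choices -> message -> content.
    return response.json()["choices"][0]["message"]["content"]

print(ask_llama("Summarize wafer-scale inference in one sentence."))
```

At the throughput quoted above, a response of this length would arrive in a fraction of a second, which is the practical appeal of the approach for latency-sensitive applications.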
