Tiiny AI is betting that the next leap in AI hardware won’t live in a datacenter rack — it’ll fit in your hand. The startup has unveiled the Pocket Lab, a palm-sized “supercomputer” designed to run 120-billion-parameter large language models (LLMs) entirely offline.
Small device, big claims
Don’t let the dimensions trick you. At roughly 14.2 × 8 × 2.53 cm and about 300 grams, the Pocket Lab is built to be genuinely portable. Yet Tiiny AI says the unit can host heavyweight open models that typically require expensive GPU clusters, promising PhD-level reasoning, complex multi-step analysis, and deep contextual understanding without the cloud.
Specs that explain the hype
On paper the Pocket Lab reads like a condensed server. Key highlights include:
- 12-core ARMv9.2 CPU for general compute tasks
- A custom heterogeneous compute module (SoC + discrete NPU) delivering around 190 TOPS
- 80 GB LPDDR5X memory and a 1 TB SSD for large-model residency and fast I/O
- Ability to run up to 120B-parameter LLMs fully on-device using aggressive quantization (the back-of-the-envelope math after this list shows why that's plausible)
- Power profile targeted at ~30W TDP and ~65W typical system power — far lower than comparable server setups
- Offline-first operation with one-click deployment for many open-source LLMs and agent frameworks
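Tiiny AI hasn't published its quantization details, but simple arithmetic shows why the 80 GB figure matters. A minimal sketch (all numbers are rough estimates, not vendor specs):

```python
# Rough weight-memory math for on-device LLMs (our estimate, not Tiiny AI's
# published figures): footprint ≈ parameter_count * bits_per_weight / 8.

def weight_footprint_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate weight memory in GB; ignores KV cache and activations."""
    return params_billion * 1e9 * bits_per_weight / 8 / 1e9

for bits in (16, 8, 4, 3):
    print(f"120B @ {bits}-bit ≈ {weight_footprint_gb(120, bits):.0f} GB")

# 120B @ 16-bit ≈ 240 GB   far beyond 80 GB
# 120B @ 8-bit  ≈ 120 GB   still too big
# 120B @ 4-bit  ≈  60 GB   fits, with headroom for KV cache and the OS
# 120B @ 3-bit  ≈  45 GB   fits comfortably, at some quality cost
```

In other words, a 120B model only fits in 80 GB at roughly 4 bits per weight or below, which is why aggressive quantization is central to the pitch.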

How does it pull off 120B models in your pocket?
The secret is a mix of hardware density and software smarts. The Pocket Lab packs a discrete NPU rated at roughly 190 TOPS, but Tiiny AI also leans on two flagship techniques to keep large models practical on limited silicon:
- TurboSparse — a neuron-level sparse activation technique that skips computation for neurons unlikely to fire on a given token, cutting work per token without degrading reasoning quality. The result: fewer computations, similar intelligence (sketched below).
- PowerInfer — an open-source heterogeneous inference engine (popular on GitHub) that dynamically splits workloads between CPU and NPU. It orchestrates computation to mimic server-class throughput at a fraction of the usual power draw.
Combined with aggressive quantization that shrinks model weights enough to sit in the 80 GB of LPDDR5X, these techniques make running 120B models locally plausible rather than theoretical.
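Neither Tiiny AI's kernels nor its activation predictor are public, but the idea behind neuron-level sparse activation fits in a few lines of NumPy. This is a conceptual sketch, not TurboSparse itself: it uses oracle scoring (the full pre-activation) to pick "firing" neurons, whereas a real engine trains a cheap predictor precisely to avoid that full computation.

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, d_ffn = 1024, 4096
x = rng.standard_normal(d_model)               # one token's hidden state
W_up = rng.standard_normal((d_ffn, d_model)) / np.sqrt(d_model)
W_down = rng.standard_normal((d_model, d_ffn)) / np.sqrt(d_ffn)

# Dense FFN: every neuron is computed.
dense = W_down @ np.maximum(W_up @ x, 0.0)

# Sparse path: keep only the 10% of neurons with the largest pre-activations.
# Oracle scoring for clarity; computing W_up @ x in full would defeat the
# savings, so real engines use a small learned router to predict this set.
scores = W_up @ x
active = np.argsort(scores)[-d_ffn // 10:]
h = np.maximum(W_up[active] @ x, 0.0)          # ~10% of the up-projection work
sparse = W_down[:, active] @ h                 # ~10% of the down-projection work

# Note: random weights lack the extreme activation skew of trained ReLU LLMs,
# so the error printed here overstates what sparsity costs in practice.
print(f"active: {len(active)}/{d_ffn} neurons")
print(f"relative error vs dense: {np.linalg.norm(sparse - dense) / np.linalg.norm(dense):.3f}")
```

PowerInfer's published design applies a related observation across processors: a small set of frequently activated "hot" neurons is kept resident on the fast accelerator, while the long tail of "cold" neurons is computed on the CPU.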
Models, privacy, and real-world uses
The Pocket Lab supports a broad catalog of open models — from GPT-OSS and Llama to Qwen, Mistral, and Phi — letting developers pick the architecture that fits their needs. Because the device operates fully offline, it’s attractive for privacy-focused deployments, field research, and developers who want fast iteration without cloud latency or recurring costs.
Imagine testing a new agent workflow on your desk, or running sophisticated NLP tasks in offline environments like remote labs or secure facilities. That’s the kind of use case Tiiny AI is targeting.
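Tiiny AI hasn't documented what its "one-click deployment" looks like under the hood. As a generic stand-in for the fully offline workflow the company describes, here's how local inference on a quantized open model works today with llama-cpp-python (the model path is hypothetical):

```python
# Fully offline inference on a locally stored, quantized GGUF model.
# This illustrates the general workflow, not Pocket Lab's actual API.
from llama_cpp import Llama

llm = Llama(
    model_path="./models/llama-8b-q4.gguf",  # hypothetical local file path
    n_ctx=4096,                              # context window
)

out = llm(
    "Summarize the trade-offs of running LLMs fully on-device.",
    max_tokens=200,
)
print(out["choices"][0]["text"])
```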

What’s next: CES and questions to answer
Tiiny AI plans to showcase the Pocket Lab at CES 2026. The company hasn't announced pricing or a shipping date yet, and real-world benchmarks will be the key test: can a pocket-sized machine consistently handle server-grade workloads across diverse scenarios?
Even so, the Pocket Lab signals an exciting shift. Edge AI is moving beyond tiny sensors and into genuinely powerful, private compute platforms — and that could change how developers, researchers, and privacy-conscious users interact with LLMs.
Source: wccftech