Xiaomi has unveiled MiMo-V2-Flash, its most advanced open-source language model yet — a speed-focused, cost-efficient contender aimed directly at models like DeepSeek and Claude. Designed for agent workflows and multi-step interactions, MiMo-V2-Flash combines strong reasoning and code-generation capabilities with a production-ready emphasis on inference speed and lower operational cost.
What makes MiMo-V2-Flash stand out?
At the core of MiMo-V2-Flash is a Mixture-of-Experts (MoE) architecture with 309 billion total parameters and about 15 billion active parameters during inference. That combination lets Xiaomi boost throughput while keeping compute usage — and billing — down. Imagine getting the reasoning and coding chops of larger models but with much lighter infrastructure demands.
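The efficiency argument comes down to sparsity: with roughly 15B of 309B parameters active per token, only about 5% of the network's weights participate in any forward pass. A toy sketch of the top-k expert routing behind that idea (the expert count, k, and router here are purely illustrative — Xiaomi has not published MiMo-V2-Flash's actual routing configuration):

```python
# Toy Mixture-of-Experts top-k routing sketch (hypothetical config,
# not Xiaomi's actual architecture).
import math

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def route(router_logits, k=2):
    """Pick the top-k experts for a token; only those experts run."""
    probs = softmax(router_logits)
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    norm = sum(probs[i] for i in top)
    return [(i, probs[i] / norm) for i in top]  # (expert_id, renormalized weight)

# Only ~5% of weights are active per forward pass:
active_fraction = 15 / 309
print(f"{active_fraction:.1%}")  # → 4.9%
```

Because the non-selected experts never execute, compute per token scales with the ~15B active parameters, not the full 309B — which is where the throughput and billing advantage comes from.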
Benchmarks and real-world performance
Xiaomi says benchmark results place MiMo-V2-Flash among the top open models. It ranked in the top two open-source models on reasoning tests like AIME 2025 and GPQA-Diamond, and outperformed peers on software-engineering suites such as SWE-Bench Verified and SWE-Bench Multilingual. In some engineering tasks it approaches the level of proprietary models like GPT-5 and Claude 4.5 Sonnet.

Speed and cost: the practical edge
- Throughput: Xiaomi reports generation speeds of up to 150 tokens per second.
- Pricing: API access is priced at $0.10 per 1M input tokens and $0.30 per 1M output tokens, with limited-time free access initially available.
- Efficiency claim: Xiaomi says MiMo-V2-Flash’s inference cost is about 2.5% of Claude’s, making it notably cheaper to run at scale.
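At those listed rates, per-request cost is easy to estimate. A minimal back-of-the-envelope sketch, assuming simple linear billing (actual invoicing may round, tier, or discount differently):

```python
# Estimated API cost at the advertised rates (illustrative only).
INPUT_PER_M = 0.10   # USD per 1M input tokens
OUTPUT_PER_M = 0.30  # USD per 1M output tokens

def request_cost(input_tokens, output_tokens):
    """Linear cost estimate for a single request."""
    return (input_tokens * INPUT_PER_M + output_tokens * OUTPUT_PER_M) / 1_000_000

# e.g. a 2,000-token prompt with a 500-token answer:
print(f"${request_cost(2_000, 500):.6f}")  # → $0.000350
```

Even a million such requests would come to around $350 at these rates, which is what makes the at-scale cost claim plausible.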
Technical innovations that power the model
Two of Xiaomi’s innovations are particularly notable. Multi-Token Prediction (MTP) enables the model to generate multiple tokens at once and validate them before finalizing the output — a tactic that accelerates throughput without sacrificing quality. Meanwhile, Multi-Teacher Online Policy Distillation (MOPD) uses multiple teacher models and token-level reward signals to distill capabilities more efficiently, cutting heavy training resource needs.
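Xiaomi has not published MTP's internals, but the general draft-then-verify idea behind such schemes can be sketched as a loop: propose several tokens in one pass, then keep only the prefix that passes verification. Everything below (function names, the verification rule) is a hypothetical illustration, not the actual implementation:

```python
# Illustrative draft-then-verify decoding loop in the spirit of
# Multi-Token Prediction. All names here are hypothetical.
def generate_with_draft(draft_step, verify, prompt, max_tokens=32, k=4):
    """draft_step proposes k tokens at once; verify returns the
    accepted prefix of that draft. Falls back to one token per step,
    so progress is always made."""
    out = list(prompt)
    while len(out) - len(prompt) < max_tokens:
        draft = draft_step(out, k)        # propose k tokens in one pass
        accepted = verify(out, draft)     # longest prefix the model confirms
        out.extend(accepted or draft[:1]) # advance by at least one token
    return out
```

When the verifier accepts most drafted tokens, each loop iteration emits several tokens for roughly the cost of one decoding step — which is how this family of techniques raises tokens-per-second without changing the final output distribution.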
Developer tools and ecosystem
To make the model usable beyond benchmarks, Xiaomi launched MiMo Studio — a platform for conversational access, web search integration, running agent workflows, and code generation. MiMo-V2-Flash can produce functional HTML pages and is compatible with tooling like Claude Code and Cursor, which should ease adoption among devs and product teams.
Whether you’re building assistants, coding agents, or fast inference services, MiMo-V2-Flash signals Xiaomi’s growing bet on open, high-performance AI that’s built for real-world throughput and lower running costs. The result? A compelling alternative for teams seeking speed and affordability without abandoning advanced reasoning and code-generation capabilities.