What It Is
A 1.7B parameter language model compressed to 290 MB using 1-bit quantization, running entirely in-browser via WebGPU. No server. No API fees. No data leaving your device.
Developed by Prism ML, based on Qwen3-1.7B architecture, with WebGPU implementation hosted by the WebML Community on HuggingFace.
Technical Specifications
| Specification | Value |
|---|---|
| Parameters | 1.7B (1.4B non-embedding) |
| Quantization | Q1_0 g128 (1.125 bits/weight) |
| Context Length | 32,768 tokens |
| Vocabulary | 151,936 |
| Layers | 28 Transformer blocks |
| Attention | GQA (16 query / 8 KV heads) |
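The GQA row matters mainly for KV-cache memory at long contexts. A back-of-envelope sketch, assuming head_dim = 128 and an FP16 cache (neither figure appears in the table; both match the published Qwen3-1.7B configuration):

```python
# Rough KV-cache size estimate from the GQA specs above.
# Assumptions: head_dim = 128, FP16 (2-byte) cache entries.

def kv_cache_bytes(layers, kv_heads, head_dim, context, bytes_per_elem=2):
    """Bytes needed to cache K and V for every layer at full context."""
    return 2 * layers * kv_heads * head_dim * context * bytes_per_elem

full = kv_cache_bytes(layers=28, kv_heads=8, head_dim=128, context=32768)
print(f"KV cache at 32k context: {full / 2**30:.2f} GiB")  # ~3.5 GiB
```

With 16 KV heads (full multi-head attention) the cache would be twice that, so GQA halves KV memory at the 32,768-token limit.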
Memory Compression
| Format | Size | Compression |
|---|---|---|
| FP16 | 3.44 GB | Baseline |
| Browser (1-bit) | 290 MB | 11.9x |
| GGUF Q1_0 | 0.24 GB | 14.2x |
The quantization scheme: for each group of 128 weights, store 1 sign bit per weight plus one shared FP16 scale (128 + 16 = 144 bits per group, i.e. 1.125 bits/weight). A stored 0 dequantizes to −scale, a 1 to +scale. All layers are quantized: embeddings, attention, MLP, and the LM head.
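The scheme can be sketched as follows. The scale-selection rule here (mean absolute value per group) is an illustrative assumption, not necessarily what the reference GGUF quantizer does:

```python
# Minimal sketch of Q1_0 g128: one sign bit per weight plus one shared
# FP16 scale per 128-weight group.
# ASSUMPTION: scale = mean |w| of the group; the actual quantizer may
# choose the scale differently (e.g. to minimize reconstruction error).

GROUP = 128

def quantize_group(weights):
    """Return (sign_bits, scale) for one group of weights."""
    scale = sum(abs(w) for w in weights) / len(weights)
    bits = [1 if w >= 0 else 0 for w in weights]  # 1 -> +scale, 0 -> -scale
    return bits, scale

def dequantize_group(bits, scale):
    return [scale if b else -scale for b in bits]

# Storage cost per full group: 128 sign bits + one 16-bit scale = 144 bits
bits_per_weight = (GROUP * 1 + 16) / GROUP
print(bits_per_weight)  # 1.125
```

Every dequantized weight in a group has the same magnitude; only the sign survives per weight, which is why quality depends so heavily on the per-group scales.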
Performance Benchmarks
| Platform | Throughput | vs FP16 |
|---|---|---|
| RTX 4090 (CUDA) | 674 tok/s | 3.0x faster |
| M4 Pro 48GB (Metal) | 250 tok/s | 3.8x faster |
| iPhone (MLX Swift) | 130 tok/s | — |
The 1-bit kernels beat FP16 because autoregressive decoding is memory-bandwidth-bound: every generated token streams the full weight set from memory, so a roughly 14x smaller model means roughly 14x less data to fetch per token.
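A rough roofline check of that claim, assuming the RTX 4090's spec-sheet memory bandwidth of about 1008 GB/s (an assumption, not a figure from the benchmarks above) and that each decoded token reads the whole weight file once:

```python
# Back-of-envelope decode-speed ceiling: if generation is memory-bound,
# tokens/s is capped at (memory bandwidth) / (bytes read per token).
# ASSUMPTION: 1008 GB/s is the RTX 4090 spec-sheet bandwidth; each token
# streams the full weight file once and nothing else.

def max_tokens_per_sec(bandwidth_gb_s, model_gb):
    return bandwidth_gb_s / model_gb

fp16 = max_tokens_per_sec(1008, 3.44)  # ~293 tok/s ceiling
q1 = max_tokens_per_sec(1008, 0.24)    # ~4200 tok/s ceiling
print(f"FP16 ceiling: {fp16:.0f} tok/s, 1-bit ceiling: {q1:.0f} tok/s")
```

The measured 674 tok/s sits well below the 1-bit ceiling, which is expected: real kernels also pay compute, bit-unpacking, and KV-cache traffic. The point is that the FP16 ceiling itself (~293 tok/s) is what shrinking the weights lifts.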
Competitor Analysis
| Solution | Size | Key Feature |
|---|---|---|
| Bonsai 1.7B | 290 MB | 1-bit quantization |
| WebLLM | 2-4 GB | OpenAI API compatible |
| Transformers.js | Variable | No GPU required |
| Secret Llama | 2-4 GB | Privacy-focused UI |
Bonsai is 7-14x smaller than typical Q4 browser models. The tradeoff is quality for size: aggressive 1-bit quantization costs accuracy relative to Q4, but for prototyping, quick completions, or constrained devices, 290 MB is compelling.
What This Means
Privacy-first AI with zero infrastructure cost.
- Wearables & IoT: 290MB fits on constrained devices
- Offline capability: Works after initial download
- Cross-platform: Same model runs on CUDA, Metal, WebGPU, iOS
Resources
- WebGPU Demo: https://huggingface.co/spaces/webml-community/bonsai-webgpu
- GGUF Model: https://huggingface.co/prism-ml/Bonsai-1.7B-gguf
- GitHub: https://github.com/PrismML-Eng/Bonsai-demo
- Website: https://prismml.com