
Converting a 2D image into a usable 3D model has always meant compromising on geometry. Thin wires, open surfaces, and hollow interiors all break traditional reconstruction methods. TRELLIS.2, released by Microsoft Research under the MIT license, handles them natively with a 4B-parameter model that outputs production-ready PBR materials.
Architecture
TRELLIS.2 introduces O-Voxel (Omni-Voxel), a "field-free" sparse voxel structure that jointly encodes geometry and appearance. Traditional SDF-based methods struggle with open surfaces and non-manifold geometry because a signed distance field needs a well-defined inside and outside, which an open sheet or a self-intersecting surface does not have. O-Voxel bypasses this by not relying on a field function at all.
| Component | Function |
|---|---|
| O-Voxel | Sparse voxel grid encoding both geometry (f^geo) and appearance (f^mat) simultaneously |
| SC-VAE | Sparse Compression VAE with 16× spatial compression—1024³ voxels compressed to ~9.6K latent tokens |
| Flow Transformer | 4B-parameter sparse flow transformer trained with rectified flow matching |
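The SC-VAE row can be sanity-checked with a little arithmetic. The sketch below assumes the 16× factor applies along each axis (the table does not say so explicitly) and treats "~9.6K" as 9,600 tokens; the variable names are illustrative only.

```python
# Back-of-the-envelope check of the SC-VAE figures quoted in the table.
VOXEL_RES = 1024        # voxels per axis (from the table)
SPATIAL_FACTOR = 16     # assumed to be a per-axis downsampling factor
LATENT_TOKENS = 9_600   # "~9.6K latent tokens", rounded figure from the table

latent_res = VOXEL_RES // SPATIAL_FACTOR   # 64 cells per axis
dense_cells = latent_res ** 3              # 262,144 cells if the latent grid were dense
occupancy = LATENT_TOKENS / dense_cells    # fraction of cells that actually carry a token

print(f"latent grid: {latent_res}^3 = {dense_cells:,} cells")
print(f"active tokens: {LATENT_TOKENS:,} ({occupancy:.1%} of the dense grid)")
```

Under these assumptions only about 3.7% of the latent grid is occupied, which is why the representation is described as sparse: tokens exist only where the object has surface.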
Flexible Dual Grids handle arbitrary topologies: open surfaces (leaves, cloth), non-manifold geometries (self-intersections), and enclosed hollow interiors (car interiors, cages). This is where SDF methods fail catastrophically.
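To see why open surfaces are a problem for field-based extraction, consider a toy example. This is not TRELLIS.2 code; it is a minimal sketch with hypothetical helper names. It samples two fields along a line and counts sign changes, which is what zero-level-set extractors such as marching cubes look for.

```python
def sphere_sdf(z, radius=1.0):
    """Signed distance to a closed sphere, sampled along the z-axis."""
    return abs(z) - radius

def open_disk_distance(z):
    """Distance to an open disk lying in the plane z=0, sampled along the
    z-axis through its center. There is no inside, so the value is unsigned."""
    return abs(z)

def count_sign_changes(field, samples):
    """Count adjacent sample pairs where the field crosses zero."""
    values = [field(z) for z in samples]
    return sum(1 for a, b in zip(values, values[1:]) if a * b < 0)

# 20 samples from -1.9 to 1.9; none lands exactly on a surface.
samples = [-1.9 + 0.2 * i for i in range(20)]

print("closed sphere crossings:", count_sign_changes(sphere_sdf, samples))
print("open disk crossings:", count_sign_changes(open_disk_distance, samples))
```

The sphere yields two crossings (entering and leaving the solid), while the disk yields none, so a sign-based extractor never sees it. Workarounds exist, such as thickening the sheet into a closed shell, but they change the geometry.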
Full PBR materials output directly to GLB format:
- Base Color (Albedo)
- Metallic
- Roughness
- Opacity (Alpha)—critical for glass, liquids, atmospheric effects
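These four channels map onto the metallic-roughness material model that GLB files use under the glTF 2.0 specification. The sketch below builds such a material entry by hand. The helper name is hypothetical, and it uses constant per-material factors for brevity, whereas a real export would more likely reference texture maps.

```python
import json

def make_gltf_material(name, base_color, metallic, roughness, opacity):
    """Build a glTF 2.0 material dict from constant PBR factors (illustrative helper)."""
    for label, value in (("metallic", metallic), ("roughness", roughness), ("opacity", opacity)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{label} must be in [0, 1], got {value}")
    material = {
        "name": name,
        "pbrMetallicRoughness": {
            # glTF stores alpha as the fourth component of the base color.
            "baseColorFactor": [*base_color, opacity],
            "metallicFactor": metallic,
            "roughnessFactor": roughness,
        },
    }
    if opacity < 1.0:
        # The default alphaMode is OPAQUE, which ignores alpha; BLEND enables transparency.
        material["alphaMode"] = "BLEND"
    return material

glass = make_gltf_material("glass", (0.9, 0.95, 1.0), metallic=0.0, roughness=0.05, opacity=0.3)
print(json.dumps(glass, indent=2))
```

The `alphaMode` switch is the part that matters for glass and liquids: without it, a viewer treats the material as opaque regardless of the alpha value.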
Specs & Performance
| Resolution | H100 Inference Time | VRAM Requirement |
|---|---|---|
| 512³ | ~3 seconds | 24GB+ |
| 1024³ | ~15 seconds | 32GB+ |
| 1536³ | ~60 seconds | 48GB+ |
The 4B model runs locally but demands serious hardware. Community quantization efforts are targeting 8 to 16 GB by optimizing the SLAT (structured latent) stage.
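As a quick planning aid, the table can be turned into a lookup that picks the highest resolution a given GPU can hold. The numbers are copied from the table above and the function name is hypothetical; treat the thresholds as the article's approximate figures, not measured limits.

```python
# (resolution per axis, approx. H100 seconds, minimum VRAM in GB), from the table above
PROFILES = [(512, 3, 24), (1024, 15, 32), (1536, 60, 48)]

def best_resolution(vram_gb):
    """Return the highest-resolution profile that fits in vram_gb, or None if none fits."""
    fitting = [p for p in PROFILES if p[2] <= vram_gb]
    return max(fitting, key=lambda p: p[0]) if fitting else None

for vram in (16, 24, 40, 80):
    profile = best_resolution(vram)
    if profile is None:
        print(f"{vram} GB: below the minimum listed requirement")
    else:
        res, seconds, _ = profile
        print(f"{vram} GB: {res}^3, roughly {seconds} s on an H100")
```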
Benchmarks & Comparisons
vs. TripoSR: TripoSR wins on speed (under 1 second for previews) but forces everything into watertight manifolds. TRELLIS.2 preserves thin structures like wires and fences that TripoSR destroys.
vs. Meshy-6: Meshy produces cleaner manifold meshes for 3D printing, but TRELLIS.2 outputs native PBR materials. Meshy requires secondary AI retexturing passes; TRELLIS.2 generates metallic/roughness/opacity in one shot.
vs. CRM (2024): CRM was a leading method in 2024 but is structurally outmatched. TRELLIS.2 matches generated models more accurately to complex input prompts and preserves high-frequency details that CRM smooths over.
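"Watertight manifold" in these comparisons has a concrete meaning that is easy to test on a triangle mesh: every edge must be shared by exactly two faces. The sketch below implements only that edge check, using a hypothetical helper name. It does not detect self-intersections or non-manifold vertices, so it is a necessary condition rather than a full validator.

```python
from collections import Counter

def is_watertight(faces):
    """True if every edge of the triangle mesh is shared by exactly two faces."""
    edge_counts = Counter()
    for a, b, c in faces:
        for u, v in ((a, b), (b, c), (c, a)):
            # Store edges with sorted endpoints so direction does not matter.
            edge_counts[(min(u, v), max(u, v))] += 1
    # An empty mesh is not considered watertight.
    return bool(edge_counts) and all(n == 2 for n in edge_counts.values())

tetrahedron = [(0, 1, 2), (0, 3, 1), (1, 3, 2), (2, 3, 0)]  # closed solid
open_sheet = [(0, 1, 2), (0, 2, 3)]                          # flat quad, like a leaf or cloth patch

print("tetrahedron watertight:", is_watertight(tetrahedron))
print("open sheet watertight:", is_watertight(open_sheet))
```

The tetrahedron passes and the open sheet fails, because the sheet's boundary edges each belong to only one face. A pipeline that enforces this property on every output has to either close such sheets or discard them.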
Community Reaction
Hacker News users call TRELLIS/TRELLIS.2 the "first true open-source foundation model for 3D." Microsoft released weights, training code, and a 500K curated dataset under the MIT license.
The raw mesh topology and UV mapping aren't drag-and-drop ready for AAA engines. Assets still need manual retopology for rigging and animation.
Reddit users highlight the VRAM barrier:
> 24GB+ for high-res inference. The community is scrambling to quantize the SLAT stage down to 8GB-16GB.
The alpha handling gets consistent praise:
> Finally, a model that outputs transparent objects like fishbowls with water natively. No post-processing hacks.
Sources
- https://github.com/microsoft/TRELLIS
- https://github.com/microsoft/TRELLIS.2
- https://microsoft.github.io/TRELLIS.2/
- https://arxiv.org/abs/2512.14692
- https://arxiv.org/abs/2412.01975
- https://huggingface.co/spaces/microsoft/TRELLIS.2
- https://trellis3d.github.io/
- https://news.ycombinator.com/item?id=42514968
- https://news.ycombinator.com/item?id=47929302
- https://huggingface.co/microsoft/TRELLIS.2-4B
So What
The O-Voxel architecture is what actually matters here. SDF-based methods have dominated 3D reconstruction for years, and they've all hit the same wall: open surfaces and thin geometry break them. TRELLIS.2 doesn't patch around this limitation—it sidesteps the entire field-function paradigm.
The VRAM requirements are a genuine friction point. 24GB+ sits at the very top of the consumer GPU range, and the higher resolutions need workstation or datacenter cards. But the MIT license plus the 500K curated dataset gives the community what it needs to optimize this down, which is the pattern we've seen with other major open-source releases.
What surprised me: the native alpha/opacity handling. Most image-to-3D models fake transparency or require post-processing. TRELLIS.2 outputs glass and liquids as actual transparent materials. That's the difference between a demo and something usable in production.