trellis-2_01

Converting a 2D image into a usable 3D model has always meant compromising on geometry. Thin wires, open surfaces, and hollow interiors wreck traditional reconstruction methods. TRELLIS.2, released by Microsoft Research under the MIT license, handles all three natively with a 4B-parameter model that outputs PBR materials directly.


Architecture

TRELLIS.2 introduces O-Voxel (Omni-Voxel), a "field-free" sparse voxel structure that jointly encodes geometry and appearance. Traditional SDF-based methods struggle with open surfaces and non-manifold geometry because they rely on continuous field functions. O-Voxel bypasses this entirely.
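To make the open-surface problem concrete, here is a minimal sketch. It is not TRELLIS.2 code, and the two shapes and all function names are illustrative assumptions. A closed sphere has an inside, so its signed distance flips sign along a ray through it. An infinitely thin open sheet encloses nothing, so only an unsigned distance exists, and an extractor that looks for sign changes has nothing to find.

```python
import math

def sphere_sdf(x, y, z, radius=1.0):
    """Signed distance to a closed sphere at the origin: negative inside."""
    return math.sqrt(x * x + y * y + z * z) - radius

def open_sheet_distance(x, y, z):
    """Distance to the open sheet z = 0. There is no enclosed volume,
    so there is no 'inside' and the value is never negative."""
    return abs(z)

def count_sign_changes(values):
    """Count strict sign flips between consecutive samples, which is the
    cue a marching-cubes style extractor uses to locate a surface."""
    return sum(1 for a, b in zip(values, values[1:]) if a * b < 0)

# Sample both fields along the z axis; the offsets avoid landing exactly on zero.
zs = [-1.9 + 0.4 * i for i in range(10)]
sphere_samples = [sphere_sdf(0.0, 0.0, z) for z in zs]
sheet_samples = [open_sheet_distance(0.0, 0.0, z) for z in zs]

sphere_crossings = count_sign_changes(sphere_samples)
sheet_crossings = count_sign_changes(sheet_samples)
print(sphere_crossings, sheet_crossings)
```

The ray enters and leaves the sphere, so two crossings are detected. The sheet is crossed once but produces no sign flip, which is why field-based pipelines usually have to thicken or close such surfaces first.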

Component         Function
O-Voxel           Sparse voxel grid encoding both geometry (f^geo) and appearance (f^mat) simultaneously
SC-VAE            Sparse Compression VAE with 16× spatial compression: 1024³ voxels compressed to ~9.6K latent tokens
Flow Transformer  4B-parameter sparse flow transformer trained with rectified flow matching
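The SC-VAE figures can be sanity-checked with a little arithmetic. The sketch below assumes that the 16× factor applies per axis and that the ~9.6K tokens are the occupied cells of the resulting latent grid. Both readings are assumptions, and the token count will vary from asset to asset.

```python
full_resolution = 1024       # voxels per axis, from the table above
spatial_compression = 16     # SC-VAE downsampling factor, assumed to be per axis
reported_tokens = 9_600      # "~9.6K latent tokens", an approximate figure

latent_resolution = full_resolution // spatial_compression   # cells per axis
dense_latent_cells = latent_resolution ** 3                  # cells if the grid were dense
occupancy = reported_tokens / dense_latent_cells             # fraction actually kept

print(f"latent grid: {latent_resolution}^3 = {dense_latent_cells:,} cells")
print(f"occupied fraction: {occupancy:.1%}")
```

Under these assumptions only a few percent of the latent grid carries tokens, which is what makes a transformer over the latents tractable at this resolution.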

Flexible Dual Grids handle arbitrary topologies: open surfaces (leaves, cloth), non-manifold geometries (self-intersections), and enclosed hollow interiors (car interiors, cages). This is where SDF methods fail catastrophically.
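A toy container shows why a sparse, field-free layout is indifferent to topology. This is a sketch only: `Voxel`, `SparseVoxelGrid`, and the feature tuples are hypothetical stand-ins, not the actual O-Voxel or Flexible Dual Grid layout. Only occupied cells are stored, and nothing in the structure asks whether a cell is inside or outside a solid.

```python
from dataclasses import dataclass, field

@dataclass
class Voxel:
    geo: tuple   # hypothetical stand-in for the geometry features (f^geo)
    mat: tuple   # hypothetical stand-in for material features (f^mat):
                 # R, G, B, metallic, roughness, opacity

@dataclass
class SparseVoxelGrid:
    resolution: int
    cells: dict = field(default_factory=dict)   # (i, j, k) -> Voxel, occupied cells only

    def set(self, ijk, voxel):
        if not all(0 <= c < self.resolution for c in ijk):
            raise IndexError(f"{ijk} is outside a {self.resolution}^3 grid")
        self.cells[ijk] = voxel

    def occupancy(self):
        """Fraction of the dense grid that is actually stored."""
        return len(self.cells) / self.resolution ** 3

grid = SparseVoxelGrid(resolution=64)
glass = Voxel(geo=(0.5, 0.5, 0.5), mat=(0.9, 0.9, 1.0, 0.0, 0.05, 0.3))

# An open, one-voxel-thick sheet: no enclosed volume is needed to store it.
for i in range(64):
    for j in range(64):
        grid.set((i, j, 32), glass)

print(len(grid.cells), grid.occupancy())
```

A hollow cage or a self-intersecting surface would be stored the same way, as a set of occupied cells with per-cell features.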

Full PBR materials output directly to GLB format:

  • Base Color (Albedo)
  • Metallic
  • Roughness
  • Opacity (Alpha)—critical for glass, liquids, atmospheric effects
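These four channels map onto the glTF 2.0 metallic-roughness material that GLB files carry. The sketch below builds such a material as a plain dictionary. The helper name `gltf_material` is made up for illustration, and using constant factors rather than texture maps is a simplification; how the TRELLIS.2 exporter actually writes its materials is not described here.

```python
import json

def gltf_material(name, base_color, metallic, roughness, opacity):
    """Build a glTF 2.0 material dict from scalar PBR values in [0, 1]."""
    for label, value in (("metallic", metallic), ("roughness", roughness), ("opacity", opacity)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{label} must be in [0, 1], got {value}")
    r, g, b = base_color
    return {
        "name": name,
        "pbrMetallicRoughness": {
            # In glTF, opacity is the alpha component of the base color.
            "baseColorFactor": [r, g, b, opacity],
            "metallicFactor": metallic,
            "roughnessFactor": roughness,
        },
        # Alpha is ignored by renderers unless alphaMode is BLEND or MASK.
        "alphaMode": "BLEND" if opacity < 1.0 else "OPAQUE",
    }

glass = gltf_material("glass", (0.9, 0.95, 1.0), metallic=0.0, roughness=0.05, opacity=0.3)
steel = gltf_material("steel", (0.5, 0.5, 0.5), metallic=1.0, roughness=0.4, opacity=1.0)
print(json.dumps(glass, indent=2))
```

The `alphaMode` switch is the detail that matters for glass and liquids: a viewer treats the material as opaque unless the file says otherwise.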

Specs & Performance

Resolution  H100 Inference Time  VRAM Requirement
512³        ~3 seconds           24GB+
1024³       ~15 seconds          32GB+
1536³       ~60 seconds          48GB+
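As a quick way to read the table, the sketch below picks the highest resolution whose VRAM floor fits a given card. The numbers are copied from the table. The helper `best_resolution` is hypothetical, and the timings apply to an H100 only, so other GPUs will be slower.

```python
# (resolution per axis, approx. H100 seconds, minimum VRAM in GB), from the table above
PROFILES = [(512, 3, 24), (1024, 15, 32), (1536, 60, 48)]

def best_resolution(vram_gb):
    """Return the highest-resolution profile that fits in vram_gb, or None."""
    fitting = [p for p in PROFILES if p[2] <= vram_gb]
    return max(fitting, key=lambda p: p[0]) if fitting else None

for vram in (16, 24, 40, 80):
    print(vram, "GB ->", best_resolution(vram))
```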

The 4B model runs locally but demands serious hardware. Community quantization efforts are targeting 8GB-16GB via SLAT stage optimization.
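A rough weight-only estimate shows why quantization is the obvious lever. The sketch assumes exactly 4 billion parameters and generic precisions. It ignores activations, attention buffers, and the VAE stages, and it does not describe the specific schemes the community is using.

```python
PARAMS = 4_000_000_000   # "4B-parameter", rounded

# Bytes needed to store one parameter at each (generic) precision.
BYTES_PER_PARAM = {"fp32": 4, "fp16/bf16": 2, "int8": 1, "int4": 0.5}

def weight_gb(params, bytes_per_param):
    """Memory for the weights alone, in decimal gigabytes."""
    return params * bytes_per_param / 1e9

sizes = {name: weight_gb(PARAMS, b) for name, b in BYTES_PER_PARAM.items()}
for name, gb in sizes.items():
    print(f"{name:>10}: {gb:.1f} GB")
```

At 16-bit precision the weights alone take about 8GB, so the remainder of the 24GB+ budget goes to activations and intermediate sparse structures, which is where most of the savings would have to come from.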


Benchmarks & Comparisons

vs. TripoSR: TripoSR wins on speed (under 1 second for previews) but forces everything into watertight manifolds. TRELLIS.2 preserves thin structures like wires and fences that TripoSR destroys.
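"Watertight" has a simple operational meaning for triangle meshes: every edge is shared by exactly two faces. The sketch below implements that standard edge test. It is a generic check, not code from either project, and it ignores face orientation and vertex-level manifoldness.

```python
from collections import Counter

def edge_counts(faces):
    """Count how many triangles use each undirected edge."""
    counts = Counter()
    for a, b, c in faces:
        for u, v in ((a, b), (b, c), (c, a)):
            counts[(min(u, v), max(u, v))] += 1
    return counts

def is_watertight(faces):
    """True if the mesh is non-empty and every edge borders exactly two faces."""
    counts = edge_counts(faces)
    return bool(counts) and all(n == 2 for n in counts.values())

tetrahedron = [(0, 1, 2), (0, 3, 1), (1, 3, 2), (2, 3, 0)]   # closed solid
open_quad = [(0, 1, 2), (0, 2, 3)]                            # a flat sheet, like a leaf

print(is_watertight(tetrahedron), is_watertight(open_quad))
```

A leaf, a fence panel, or a sheet of cloth fails this test by design, so a pipeline that guarantees watertight output has to thicken or fill such shapes.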

vs. Meshy-6: Meshy produces cleaner manifold meshes for 3D printing, but TRELLIS.2 outputs native PBR materials. Meshy requires secondary AI retexturing passes; TRELLIS.2 generates metallic/roughness/opacity in one shot.

vs. CRM (2024): CRM was a leading method in 2024 but is structurally outmatched. TRELLIS.2 shows superior intent alignment (its generated models match complex input prompts more accurately) and preserves high-frequency details that CRM smooths over.


Community Reaction

Hacker News users call TRELLIS/TRELLIS.2 the "first true open-source foundation model for 3D." Microsoft released weights, training code, and a 500K curated dataset under MIT license.

The raw mesh topology and UV mapping aren't drag-and-drop ready for AAA engines. Assets still need manual retopology for rigging and animation.

Reddit users highlight the VRAM barrier:

  "24GB+ for high-res inference. The community is scrambling to quantize the SLAT stage down to 8GB-16GB."

The alpha handling gets consistent praise:

  "Finally, a model that outputs transparent objects like fishbowls with water natively. No post-processing hacks."


Sources

https://github.com/microsoft/TRELLIS
https://github.com/microsoft/TRELLIS.2
https://microsoft.github.io/TRELLIS.2/
https://arxiv.org/abs/2512.14692
https://arxiv.org/abs/2412.01975
https://huggingface.co/spaces/microsoft/TRELLIS.2
https://trellis3d.github.io/
https://news.ycombinator.com/item?id=42514968
https://news.ycombinator.com/item?id=47929302
https://huggingface.co/microsoft/TRELLIS.2-4B


So What

The O-Voxel architecture is what actually matters here. SDF-based methods have dominated 3D reconstruction for years, and they've all hit the same wall: open surfaces and thin geometry break them. TRELLIS.2 doesn't patch around this limitation—it sidesteps the entire field-function paradigm.

The VRAM requirements are a genuine friction point. 24GB is the ceiling for most consumer GPUs, and the 1024³ and 1536³ settings need more than that. But the MIT license plus the 500K curated dataset means the community will very likely optimize this down. That's the pattern we've seen with every major open-source release.

What surprised me: the native alpha/opacity handling. Most image-to-3D models fake transparency or require post-processing. TRELLIS.2 outputs glass and liquids as actual transparent materials. That's the difference between a demo and something usable in production.