HY-World 2.0

What It Is

HY-World 2.0 is Tencent's multi-modal world model. It generates actual 3D geometry, not video, from text, image, or video input. Its outputs are 3D Gaussian splats, meshes, and point clouds that can be imported directly into Unity, Unreal Engine, Blender, and NVIDIA Isaac Sim.

This is the first open-source 3D world model that competes with closed-source alternatives like World Labs' Marble on Stanford's WorldScore benchmark.


Architecture & Specs

Component        Parameters  Purpose
WorldMirror 2.0  ~1.2B       Feed-forward reconstruction model
HY-Pano 2.0      TBD         Panorama generation (text/image to 360°)
WorldNav         TBD         Trajectory planning and navigation
WorldStereo 2.0  TBD         View generation with memory

WorldMirror 2.0 Architecture: A unified Transformer backbone feeds DPT (Dense Prediction Transformer) decoder heads that predict depth, normals, camera parameters, and 3DGS (3D Gaussian Splatting) attributes together in a single forward pass.
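The "one backbone, several heads, one pass" layout described above can be sketched in a few lines. This is a toy illustration only: the function names, the `Prediction` container, and the arithmetic inside each head are hypothetical stand-ins, not WorldMirror 2.0 code. The point it shows is that the shared features are computed once and every head reads the same features.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Features = List[float]  # stand-in for the backbone's token features


@dataclass
class Prediction:
    """Hypothetical container for the four outputs named in the text."""
    depth: List[float]
    normals: List[float]
    camera: List[float]
    gaussians: List[float]


def shared_backbone(pixels: List[float]) -> Features:
    # Placeholder for the Transformer backbone: here it only centers the input.
    mean = sum(pixels) / len(pixels)
    return [p - mean for p in pixels]


# Toy heads (assumptions for illustration); each consumes the same features.
HEADS: Dict[str, Callable[[Features], List[float]]] = {
    "depth": lambda f: [abs(x) for x in f],
    "normals": lambda f: [x * 0.5 for x in f],
    "camera": lambda f: [sum(f), max(f), min(f)],
    "gaussians": lambda f: [x * x for x in f],
}


def forward(pixels: List[float]) -> Prediction:
    feats = shared_backbone(pixels)  # computed once per input
    return Prediction(**{name: head(feats) for name, head in HEADS.items()})


pred = forward([1.0, 2.0, 3.0])
print(pred.depth, pred.camera)
```

The practical consequence of this design is that adding an output type costs one extra head rather than a second full model run.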

Improvement        v1.0              v2.0
Position Encoding  Absolute RoPE     Normalized RoPE
Depth Supervision  GT depth only     GT depth + normals
Resolution Range   100K-250K pixels  50K-500K pixels
Curriculum         2 stages          3 stages
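To get a feel for the resolution range, the sketch below converts a pixel budget into concrete image dimensions. It assumes the figures in the table are total pixels per image, which the source does not spell out; `fit_to_pixel_budget` is a hypothetical helper written for this note, not part of any HY-World release.

```python
import math


def fit_to_pixel_budget(aspect_w: int, aspect_h: int, budget: int) -> tuple:
    """Return the largest (width, height) with this aspect ratio and area <= budget."""
    scale = math.sqrt(budget / (aspect_w * aspect_h))
    # Truncating both sides keeps the area at or below the budget.
    return int(aspect_w * scale), int(aspect_h * scale)


for label, budget in [("v2.0 lower bound", 50_000), ("v2.0 upper bound", 500_000)]:
    square = fit_to_pixel_budget(1, 1, budget)
    wide = fit_to_pixel_budget(16, 9, budget)
    print(f"{label}: square {square}, 16:9 {wide}")
```

Under that assumption, the v2.0 range spans roughly 223x223 to 707x707 for square inputs, and up to about 942x530 for 16:9 inputs.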

Benchmarks

WorldStereo 2.0 Camera Control

Method           RotErr  TransErr  CLIP-I
SEVA             1.690   1.578     77.16
Gen3C            0.944   1.580     82.33
WorldStereo 2.0  0.492   0.968     89.43
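The size of the gap is easier to read as a relative change. The snippet below works it out from the table values, assuming lower is better for RotErr and TransErr and higher is better for CLIP-I (the metric names suggest this, but the source does not state it). The helper name `error_reduction` is made up for this note.

```python
results = {
    "SEVA": {"RotErr": 1.690, "TransErr": 1.578, "CLIP-I": 77.16},
    "Gen3C": {"RotErr": 0.944, "TransErr": 1.580, "CLIP-I": 82.33},
    "WorldStereo 2.0": {"RotErr": 0.492, "TransErr": 0.968, "CLIP-I": 89.43},
}


def error_reduction(metric: str, ours: str = "WorldStereo 2.0") -> float:
    """Percent reduction versus the best (lowest) baseline on an error metric."""
    best_baseline = min(v[metric] for k, v in results.items() if k != ours)
    return 100.0 * (best_baseline - results[ours][metric]) / best_baseline


# CLIP-I is a similarity score, so report the absolute gain over the best baseline.
clip_gain = results["WorldStereo 2.0"]["CLIP-I"] - max(
    v["CLIP-I"] for k, v in results.items() if k != "WorldStereo 2.0"
)

print(f"RotErr reduction:   {error_reduction('RotErr'):.1f}%")
print(f"TransErr reduction: {error_reduction('TransErr'):.1f}%")
print(f"CLIP-I gain:        {clip_gain:.2f} points")
```

Against the best baseline in each column, that is about 47.9% lower rotation error (vs. Gen3C), about 38.7% lower translation error (vs. SEVA), and 7.10 points higher CLIP-I (vs. Gen3C).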

Reconstruction Quality (Tanks-and-Temples / MipNeRF360)

Method           Tanks-and-Temples F1  MipNeRF360 F1
SEVA             36.73                 28.75
Lyra             32.54                 36.05
WorldStereo 2.0  41.43                 51.27
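For readers unfamiliar with F1 on 3D reconstructions, here is a minimal sketch of the usual point-cloud version: precision is the share of predicted points within a distance threshold of the ground truth, recall is the reverse, and F1 is their harmonic mean. The threshold `tau` and the brute-force nearest-neighbour search are assumptions for illustration; the source does not describe the exact evaluation protocol, and the table values appear to be on a 0-100 scale.

```python
import math


def f_score(pred, gt, tau):
    """F1 between two non-empty point clouds at distance threshold tau."""

    def frac_within(src, dst):
        # Share of src points whose nearest dst point lies within tau.
        hits = sum(1 for p in src if min(math.dist(p, q) for q in dst) <= tau)
        return hits / len(src)

    precision = frac_within(pred, gt)
    recall = frac_within(gt, pred)
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


pred = [(0, 0, 0), (1, 0, 0), (5, 5, 5)]               # one stray point
gt = [(0, 0, 0.05), (1, 0, 0), (2, 0, 0), (3, 0, 0)]   # two points never reconstructed
print(f"F1 = {f_score(pred, gt, tau=0.1):.4f}")
```

In this toy case precision is 2/3 and recall is 2/4, giving F1 = 4/7. Brute force is quadratic in the number of points, so a real evaluation would use a spatial index instead.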

Capabilities

  • Real 3D Assets: Generates actual geometry—not pixel videos
  • Persistent Worlds: Build once, keep forever; unlimited duration
  • Native 3D Consistency: No flickering, inherent spatial coherence
  • Engine Import: Direct to Unity, Unreal, Blender, Isaac Sim
  • Physics Support: Collision detection, real-time rendering
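As a concrete picture of what engine import involves, the sketch below serializes a colored point cloud to ASCII PLY, a common interchange format for point clouds. It is an illustration under assumptions: the source does not say which file formats HY-World 2.0 writes, and `point_cloud_to_ply` is a hypothetical helper, not part of the release.

```python
def point_cloud_to_ply(points, colors):
    """Serialize xyz points and 0-255 RGB colors as an ASCII PLY string."""
    header = [
        "ply",
        "format ascii 1.0",
        f"element vertex {len(points)}",
        "property float x",
        "property float y",
        "property float z",
        "property uchar red",
        "property uchar green",
        "property uchar blue",
        "end_header",
    ]
    rows = [
        f"{x} {y} {z} {r} {g} {b}"
        for (x, y, z), (r, g, b) in zip(points, colors)
    ]
    return "\n".join(header + rows) + "\n"


ply_text = point_cloud_to_ply(
    points=[(0.0, 0.0, 0.0), (1.0, 2.0, 3.0)],
    colors=[(255, 0, 0), (0, 255, 0)],
)
print(ply_text)
```

Writing `ply_text` to a `.ply` file gives something a PLY-aware tool can open. ASCII is used here for readability; large scenes are normally stored in the binary variant.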

Competitor Comparison

Aspect       HY-World 2.0    Marble          Genie 3
Access       Open source     Commercial ($)  Google AI Ultra
Output       Real 3D         3DGS            Pixel video
Duration     Unlimited       Downloadable    ~1 min
Editability  Fully editable  Partial         Non-editable
Self-host    Yes             No              No

Community Reality Check

Reddit's r/LocalLLaMA discussion (51 upvotes, 19 comments):

"Some BIG asterisks here. The code available is for making Gaussian splats from images and videos. Many of the more interesting features and models are not available yet."

What's actually released: WorldMirror 2.0 only. HY-Pano 2.0, WorldNav, and WorldStereo 2.0 are listed as coming soon.

License note: Publicly released but NOT FOSS; commercial restrictions apply.

Quality concern: "If you look at the video full screen, both texture and mesh resolution are very low." The model also generates entire scenes rather than individually editable objects.


Key Takeaway

HY-World 2.0 marks a shift from video world models (ephemeral playback) to persistent, navigable 3D environments. The partial release and license restrictions are real limitations, but this is the first genuinely competitive open-source option for 3D world generation, and it outputs geometry that can be used in production pipelines.