One2Scene: Geometric Consistent Explorable 3D Scene Generation from a Single Image

The Hong Kong Polytechnic University

pengfei.wang@connect.polyu.hk

ICLR 2026

*Equal contribution. †Corresponding author.

Abstract

Generating explorable 3D scenes from a single image is a highly challenging problem in 3D vision. Existing methods struggle to support free exploration, often producing severe geometric distortions and noisy artifacts when the viewpoint moves far from the original perspective. We introduce One2Scene, an effective framework that decomposes this ill-posed problem into three tractable subtasks to enable immersive explorable scene generation. We first use a panorama generator to produce anchor views from a single input image as initialization. Then, we lift these 2D anchors into an explicit 3D geometric scaffold via a generalizable, feed-forward Gaussian Splatting network. Rather than directly reconstructing from the panorama, we reformulate the task as multi-view stereo matching across sparse anchors, which allows us to leverage robust geometric priors learned from large-scale multi-view data. A bidirectional feature fusion module is used to enforce cross-view consistency, yielding an efficient and geometrically reliable scaffold. Finally, the scaffold serves as a strong prior for a novel view generator that can produce photorealistic and geometrically accurate views at arbitrary cameras. By explicitly constructing and conditioning on a 3D-consistent scaffold, One2Scene works stably under large camera motions, facilitating immersive scene exploration. Extensive experiments show that One2Scene substantially outperforms state-of-the-art methods in panorama depth estimation, feed-forward 360° reconstruction, and explorable 3D scene generation.

One2Scene Overview

Our method consists of three stages: (a) an anchor view generation stage to establish an initial 360-degree representation, (b) a feed-forward 3D Gaussian Splatting stage to construct an explicit 3D geometric scaffold, and (c) a synthesis stage that leverages the scaffold information to produce high-quality novel views. The pipeline enables geometrically consistent and photorealistic novel view synthesis from a single input image.
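As an illustration of how a 360-degree panorama can be turned into a sparse set of perspective anchor views for multi-view matching, the sketch below resamples an equirectangular image into pinhole views at chosen yaw angles. This is a minimal sketch under stated assumptions, not the authors' implementation: the function name `sample_anchor_view`, the equirectangular layout, the 90-degree field of view, and the nearest-neighbour sampling are all hypothetical choices made for this example.

```python
import numpy as np

def sample_anchor_view(pano, yaw_deg, fov_deg=90.0, size=64):
    """Resample one pinhole view from an equirectangular panorama.

    Hypothetical helper for illustration; uses nearest-neighbour lookup.
    """
    h, w = pano.shape[:2]
    # Focal length in pixels for the requested horizontal field of view.
    f = 0.5 * size / np.tan(np.radians(fov_deg) / 2.0)
    # Pixel-centre coordinates relative to the principal point.
    coords = np.arange(size) + 0.5 - size / 2.0
    u, v = np.meshgrid(coords, coords)
    # Camera rays with x to the right, y down, z forward.
    dirs = np.stack([u, v, np.full_like(u, f)], axis=-1)
    dirs /= np.linalg.norm(dirs, axis=-1, keepdims=True)
    # Rotate the rays about the vertical axis by the yaw angle.
    yaw = np.radians(yaw_deg)
    rot = np.array([[np.cos(yaw), 0.0, np.sin(yaw)],
                    [0.0, 1.0, 0.0],
                    [-np.sin(yaw), 0.0, np.cos(yaw)]])
    dirs = dirs @ rot.T
    # Convert ray directions to longitude and latitude.
    lon = np.arctan2(dirs[..., 0], dirs[..., 2])           # in [-pi, pi]
    lat = np.arcsin(np.clip(dirs[..., 1], -1.0, 1.0))      # in [-pi/2, pi/2]
    # Map angles to panorama pixel indices (longitude wraps, latitude clamps).
    col = ((lon / (2.0 * np.pi) + 0.5) * w).astype(int) % w
    row = np.clip(((lat / np.pi + 0.5) * h).astype(int), 0, h - 1)
    return pano[row, col]
```

Calling `sample_anchor_view(pano, yaw)` for yaw values of 0, 90, 180, and 270 degrees would give four views that together cover the horizontal ring of the panorama. The number and placement of anchors actually used by One2Scene is not stated here, so these angles are only an example.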

Result Gallery

Results of explorable 3D scene generation from a single image, across different environments and styles.

Public Space

Photorealistic Style

Creative Workspace

Photorealistic Style

Modern Living Space

Photorealistic Style

Modern Workspace

Photorealistic Style

Indoor Dining Room

Photorealistic Style

Office Environment

Photorealistic Style

Suburban Outdoor

Photorealistic Style

Indoor Passageways

Photorealistic Style

Dining Space

Minecraft Style

Living Room

Anime Style

Cozy Living Space

Anime Style

Study Room

Anime Style

Outdoor Scene

Minecraft Style

Aquatic Landscape

Photorealistic Style

Suburban Street

Photorealistic Style

Aquatic Landscape

Anime Style

Comparison to other methods

Videos generated by One2Scene are compared with those of baseline methods across different scenarios. Our method demonstrates superior geometric consistency and visual quality in explorable 3D scene generation.

VMem

VMem+

VMem+Anchor views

SEVA

SEVA+

SEVA+Anchor views

One2Scene (Ours)


Point Cloud Reconstruction Comparison

3D point cloud reconstructions from One2Scene are compared with those of baseline methods. Our method produces more geometrically accurate and complete point clouds with better color fidelity.


VMem

VMem+

VMem+Anchor views

SEVA

SEVA+

SEVA+Anchor views

One2Scene (Ours)



Additional Outdoor Generation Results

More examples of explorable 3D scene generation for outdoor environments. Our method demonstrates consistent geometric accuracy and visual quality across different types of outdoor scenes. Each video showcases smooth camera navigation through the reconstructed 3D environment.