Dual-Camera Smooth Zoom on Mobile Phones

ECCV 2024

Renlong Wu, Zhilu Zhang, Yu Yang, Wangmeng Zuo
Harbin Institute of Technology, Harbin, China  

Abstract

When zooming between dual cameras on a mobile phone, noticeable jumps in geometric content and image color occur in the preview, inevitably degrading the user's zoom experience. In this work, we introduce a new task, i.e., dual-camera smooth zoom (DCSZ), to achieve a smooth zoom preview. Frame interpolation (FI) techniques are a potential solution but struggle with ground-truth collection. To address the issue, we suggest a data-factory solution in which continuous virtual cameras are assembled to generate DCSZ data by rendering reconstructed 3D models of the scene. In particular, we propose a novel dual-camera smooth zoom Gaussian Splatting (ZoomGS), where a camera-specific encoding is introduced to construct a specific 3D model for each virtual camera. With the proposed data factory, we construct a synthetic dataset for DCSZ and use it to fine-tune FI models. In addition, we collect real-world dual-zoom images without ground truth for evaluation. Extensive experiments are conducted with multiple FI methods. The results show that the fine-tuned FI models achieve a significant performance improvement over the original ones on the DCSZ task.

DCSZ Overview

(a) Existing dual-camera zoom. When zooming between the UW and W cameras (i.e., from x0.6 to x1.0), smartphones (e.g., Xiaomi, OPPO, and vivo) generally crop a specific area from the UW image and upscale it to the original resolution. When the zoom factor changes from x0.9 to x1.0, the lens has to switch from UW to W, where notable jumps in geometric content and image color appear in the preview. (b) Our proposed dual-camera smooth zoom (DCSZ). (c) The geometric content and image color jumps in (a) the existing dual-camera zoom. Some examples are shown below.
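For intuition, the listing below gives a minimal Python sketch of the crop-and-upscale behavior described in (a). It is only an illustration under simple assumptions (a centered crop, a UW equivalent zoom of x0.6, and bilinear upscaling); the actual pipelines of commercial ISPs are more involved.

import cv2

def digital_zoom(uw_image, zoom_factor, uw_base_factor=0.6):
    """Emulate digital zoom on the UW frame by center-cropping and upscaling.

    uw_image: HxWx3 array captured by the UW camera (equivalent zoom x0.6).
    zoom_factor: requested preview zoom, assumed in [uw_base_factor, 1.0).
    uw_base_factor: assumed equivalent zoom of the UW camera.
    """
    h, w = uw_image.shape[:2]
    # Fraction of the UW field of view that stays visible at this zoom level.
    crop_ratio = uw_base_factor / zoom_factor
    ch, cw = int(h * crop_ratio), int(w * crop_ratio)
    y0, x0 = (h - ch) // 2, (w - cw) // 2
    crop = uw_image[y0:y0 + ch, x0:x0 + cw]
    # Upscale the crop back to the original preview resolution.
    return cv2.resize(crop, (w, h), interpolation=cv2.INTER_LINEAR)

Every preview frame up to x1.0 is thus resampled from the UW sensor, while the x1.0 frame comes from the W sensor; the switch therefore changes both geometry (parallax between the two lenses) and color (different sensors and tuning), producing the jump that DCSZ aims to remove.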

[Figure: example dual-camera zoom results. Panels: UW Image (x0.6) | W Image (x1.0) | Camera Switching (x0.6->x1.0) | Camera Switching (x0.9->x1.0)]

Method Overview

Overview of the proposed method. (a) Data preparation for the data factory. We collect multi-view dual-camera images and calibrate their camera intrinsic and extrinsic parameters. (b) Construction of ZoomGS in the data factory. ZoomGS employs a camera transition (CamTrans) module to transform the base (i.e., UW camera) Gaussians into camera-specific Gaussians according to the camera encoding. (c) Data generation from the data factory. The virtual (V) camera parameters are constructed by interpolating between the dual-camera ones and are then fed into ZoomGS to generate zoom sequences. (d) Fine-tuning a frame interpolation (FI) model with the constructed zoom sequences.
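As a rough illustration of step (c), the sketch below interpolates the calibrated UW and W camera parameters to obtain virtual camera parameters. The function name, the linear blending of intrinsics and translations, and the quaternion slerp for rotations are our assumptions for illustration, not necessarily the paper's exact formulation.

import numpy as np
from scipy.spatial.transform import Rotation, Slerp

def interpolate_cameras(K_uw, K_w, R_uw, R_w, t_uw, t_w, num_virtual=8):
    """Build virtual (V) camera parameters between the UW and W cameras.

    K_*: 3x3 intrinsics; R_*: 3x3 world-to-camera rotations; t_*: 3-vector translations.
    Returns a list of (K, R, t, alpha) tuples for the virtual cameras.
    """
    slerp = Slerp([0.0, 1.0], Rotation.from_matrix(np.stack([R_uw, R_w])))
    cameras = []
    for alpha in np.linspace(0.0, 1.0, num_virtual):
        K = (1.0 - alpha) * K_uw + alpha * K_w   # linear blend of intrinsics
        R = slerp(alpha).as_matrix()             # spherical blend of rotations
        t = (1.0 - alpha) * t_uw + alpha * t_w   # linear blend of translations
        cameras.append((K, R, t, alpha))
    return cameras

The scalar alpha that places a virtual camera between UW (alpha = 0) and W (alpha = 1) could also serve as the camera encoding fed to the CamTrans module, although the exact encoding used in ZoomGS is not spelled out here.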

ZoomGS Results

[Figure: ZoomGS zoom sequences. Panels: UW Image (x0.6) | W Image (x1.0) | ZoomGS Results (x0.6->x1.0)]

FI Results

Quantitative comparisons of FI models on the synthetic and real-world datasets. The results of our fine-tuned models are marked in bold. The fine-tuned FI models achieve a significant performance improvement over the original ones on the DCSZ task.

Visual comparisons on the synthetic dataset. The FI models synthesize the intermediate geometric content between the dual cameras, as indicated by the yellow arrows. The fine-tuned FI models generate more photo-realistic details.

Visual comparisons on the real-world dataset. The FI models also synthesize the intermediate geometric content in the real world, as indicated by the yellow arrows. Besides, the fine-tuned FI models generate fewer visual artifacts.

BibTeX

@article{DCSZ,
  title={Dual-Camera Smooth Zoom on Mobile Phones},
  author={Renlong Wu and Zhilu Zhang and Yu Yang and Wangmeng Zuo},
  journal={arXiv preprint arXiv:2404.04908},
  year={2024}
}