vggt-omega

Project Url: facebookresearch/vggt-omega

Introduction: [CVPR 2026 Oral] VGGT Omega

VGGT-Ω

Jianyuan Wang¹,² Minghao Chen¹ Shangzhan Zhang¹ Nikita Karaev¹
Johannes Schönberger² Patrick Labatut² Piotr Bojanowski² David Novotny
Andrea Vedaldi¹,² Christian Rupprecht¹

¹Visual Geometry Group, University of Oxford; ²Meta AI

Before using the models, please request access to the checkpoints here. Once your request is approved, you can download the checkpoints. Please note that access requests are reviewed by an automated process based on the information provided in the request.

Model	Resolution	Text alignment	Download
`VGGT-Omega-1B-512`	512	No	Link
`VGGT-Omega-1B-256-Text-Alignment`	256	Yes	Link

The authors are not involved in the review process and cannot approve or reject individual applications. However, the 🤗 Hugging Face demo is available to everyone.

Quick Start

First, clone this repository and install the dependencies:

git clone git@github.com:facebookresearch/vggt-omega.git
cd vggt-omega
pip install -r requirements.txt
pip install -e .

Now, try the model with a few lines of code:

import torch

from vggt_omega.models import VGGTOmega
from vggt_omega.utils.load_fn import load_and_preprocess_images
from vggt_omega.utils.pose_enc import encoding_to_camera

checkpoint_path = "path/to/vggt_omega_1b_512.pt"
image_names = ["path/to/imageA.png", "path/to/imageB.png", "path/to/imageC.png"]

# Build the model on the GPU in eval mode, then load the checkpoint weights.
model = VGGTOmega().to("cuda").eval()
model.load_state_dict(torch.load(checkpoint_path, map_location="cpu"))

# Load and preprocess the input images for the 512 checkpoint.
images = load_and_preprocess_images(image_names, image_resolution=512).to("cuda")

# Run inference without tracking gradients.
with torch.inference_mode():
    predictions = model(images)

# Decode the pose encoding into camera extrinsics and intrinsics.
extrinsics, intrinsics = encoding_to_camera(
    predictions["pose_enc"],
    predictions["images"].shape[-2:],
)

# Depth prediction and its confidence.
depth = predictions["depth"]
depth_conf = predictions["depth_conf"]

# Along the third axis, the first token is the camera token and the rest are registers.
camera_and_register_tokens = predictions["camera_and_register_tokens"]
camera_tokens = camera_and_register_tokens[:, :, :1]
registers = camera_and_register_tokens[:, :, 1:]

For the text-aligned checkpoint, use VGGTOmega(enable_alignment=True) with image_resolution=256 and read predictions["text_alignment_embedding"].
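
Because the two released checkpoints need different constructor and preprocessing settings, it can help to keep them in one place. The following is a minimal sketch: `CHECKPOINT_SETTINGS` and `settings_for` are hypothetical names introduced here for illustration and are not part of `vggt_omega`. Only the checkpoint names, `enable_alignment`, and `image_resolution` come from this README, and the sketch assumes that `enable_alignment` defaults to `False`, as the plain `VGGTOmega()` call in the Quick Start suggests.

```python
# Hypothetical helper, not part of vggt_omega: maps each released checkpoint
# name to the settings this README lists for it.
CHECKPOINT_SETTINGS = {
    "VGGT-Omega-1B-512": {"enable_alignment": False, "image_resolution": 512},
    "VGGT-Omega-1B-256-Text-Alignment": {"enable_alignment": True, "image_resolution": 256},
}


def settings_for(checkpoint_name):
    """Return a copy of the settings for a checkpoint, or raise on unknown names."""
    try:
        return dict(CHECKPOINT_SETTINGS[checkpoint_name])
    except KeyError:
        known = ", ".join(sorted(CHECKPOINT_SETTINGS))
        raise ValueError(
            f"Unknown checkpoint {checkpoint_name!r}; expected one of: {known}"
        ) from None


settings = settings_for("VGGT-Omega-1B-256-Text-Alignment")
print(settings)
```

The returned values would then be passed on as `VGGTOmega(enable_alignment=settings["enable_alignment"])` and `load_and_preprocess_images(image_names, image_resolution=settings["image_resolution"])`.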

Interactive Demo

Install the demo dependencies:

pip install -r requirements_demo.txt

Launch the Gradio demo with a local checkpoint path:

python demo_gradio.py \
  --checkpoint checkpoints/VGGT-Omega-1B-512/model.pt \
  --image-resolution 512

The demo accepts uploaded images or a video, runs camera and depth inference, and visualizes the depth-unprojected point cloud and predicted cameras as a GLB scene.
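
To illustrate what "depth-unprojected" means, here is a minimal pure-Python sketch of lifting a depth map to 3D points with a pinhole camera model. It is not the demo's implementation. The function name `unproject_depth`, the `min_conf` threshold, and the conventions used are assumptions: depth is taken to be measured along the camera z-axis, integer pixel coordinates are taken to sit at pixel centers, and the intrinsics are taken to be a 3x3 matrix `[[fx, 0, cx], [0, fy, cy], [0, 0, 1]]`. The points stay in the camera frame, since moving them to world coordinates depends on the extrinsics convention, which this README does not state.

```python
def unproject_depth(depth, intrinsics, conf=None, min_conf=0.0):
    """Lift a depth map to 3D points in the camera frame (pinhole model).

    depth:      H x W nested list of depths along the camera z-axis (assumed).
    intrinsics: 3 x 3 nested list [[fx, 0, cx], [0, fy, cy], [0, 0, 1]] (assumed).
    conf:       optional H x W confidence map; pixels below min_conf are skipped.
    """
    fx, fy = intrinsics[0][0], intrinsics[1][1]
    cx, cy = intrinsics[0][2], intrinsics[1][2]
    points = []
    for v, row in enumerate(depth):      # v: pixel row (image y)
        for u, z in enumerate(row):      # u: pixel column (image x)
            if conf is not None and conf[v][u] < min_conf:
                continue
            # Invert the projection u = fx * x / z + cx, v = fy * y / z + cy.
            points.append(((u - cx) * z / fx, (v - cy) * z / fy, z))
    return points


# Toy 2x2 depth map with made-up intrinsics, for illustration only.
toy_depth = [[2.0, 2.0], [4.0, 4.0]]
toy_intrinsics = [[2.0, 0.0, 0.5], [0.0, 2.0, 0.5], [0.0, 0.0, 1.0]]
toy_points = unproject_depth(toy_depth, toy_intrinsics)
print(toy_points)
```

In practice the same arithmetic would be vectorized over the `depth` tensor with torch rather than looped in Python.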

Runtime and GPU Memory

We benchmark the end-to-end peak GPU memory usage of VGGT-Omega-1B-512 on a single NVIDIA A100 GPU with 624x416 input images. The measurement covers the full inference program, from loading the model weights onto the GPU through the forward pass, so it includes both the memory needed to store the model itself and the memory used by inference activations and buffers. In other words, a GPU with at least the listed available memory is able to run the corresponding number of input frames under this setup.

Input Frames	1	10	25	50	100	200	300	400	500
Peak Memory (GB)	6.02	6.67	7.80	9.66	13.37	20.82	28.26	35.71	43.15
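
For frame counts between the benchmarked points, a rough estimate can be read off the table by linear interpolation. The sketch below does only that. The function names are hypothetical, and the estimates apply only to the benchmarked setup (VGGT-Omega-1B-512, a single A100, 624x416 inputs); other GPUs, resolutions, or software versions may behave differently.

```python
from bisect import bisect_left

# Benchmarked points copied from the table above.
FRAMES = [1, 10, 25, 50, 100, 200, 300, 400, 500]
PEAK_GB = [6.02, 6.67, 7.80, 9.66, 13.37, 20.82, 28.26, 35.71, 43.15]


def estimate_peak_memory_gb(num_frames):
    """Linearly interpolate peak memory between benchmarked frame counts."""
    if not FRAMES[0] <= num_frames <= FRAMES[-1]:
        raise ValueError("num_frames is outside the benchmarked range 1..500")
    i = bisect_left(FRAMES, num_frames)
    if FRAMES[i] == num_frames:
        return PEAK_GB[i]
    x0, x1 = FRAMES[i - 1], FRAMES[i]
    y0, y1 = PEAK_GB[i - 1], PEAK_GB[i]
    return y0 + (y1 - y0) * (num_frames - x0) / (x1 - x0)


def max_frames_for_budget(budget_gb):
    """Largest frame count in 1..500 whose estimate fits the budget, or 0 if none."""
    best = 0
    for n in range(FRAMES[0], FRAMES[-1] + 1):
        if estimate_peak_memory_gb(n) <= budget_gb:
            best = n
    return best


print(estimate_peak_memory_gb(150))
print(max_frames_for_budget(24.0))
```

Above roughly 100 frames the table grows by about 7.4 GB per additional 100 frames, so the interpolation is close to a straight line in that range.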

The benchmark uses load_and_preprocess_images with the default mode="balanced" and image_resolution=512. For these roughly 3:2 landscape images, this produces 624x416 inputs. You can set mode="max_size" to resize the longest side to 512 instead; for the same aspect ratio, this gives about 512x336 inputs and uses less GPU memory.
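
The two sizes quoted above can be reproduced with simple arithmetic. The sketch below is not the implementation of `load_and_preprocess_images`; it is a guess that happens to match the numbers in this README. It assumes that mode="balanced" keeps the pixel count near image_resolution squared while preserving the aspect ratio, that mode="max_size" scales the longest side to image_resolution, and that both sides are rounded to a multiple of 16. The function name `approx_input_size` and the `multiple` parameter are hypothetical.

```python
import math


def approx_input_size(width, height, image_resolution=512, mode="balanced", multiple=16):
    """Approximate (width, height) after resizing; see the assumptions in the text."""
    aspect = width / height
    if mode == "balanced":
        # Assumed rule: area close to image_resolution**2, same aspect ratio.
        new_w = image_resolution * math.sqrt(aspect)
        new_h = image_resolution / math.sqrt(aspect)
    elif mode == "max_size":
        # Longest side becomes image_resolution.
        scale = image_resolution / max(width, height)
        new_w, new_h = width * scale, height * scale
    else:
        raise ValueError(f"unknown mode: {mode!r}")

    def snap(value):
        # Assumed rule: round each side to the nearest multiple of `multiple`.
        return max(multiple, round(value / multiple) * multiple)

    return snap(new_w), snap(new_h)


# A 3:2 landscape image, e.g. 3000x2000 pixels.
print(approx_input_size(3000, 2000, mode="balanced"))
print(approx_input_size(3000, 2000, mode="max_size"))
```

Under these assumptions, "balanced" feeds about 1.5 times as many pixels per frame as "max_size" for a 3:2 image, which is consistent with the lower memory use noted above.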

License

See the LICENSE file for details about the license under which this code is made available.

[^release]: This Release is intended to support the open source research community.

@misc{wang2026vggtomega,
      title={VGGT-$\Omega$}, 
      author={Jianyuan Wang and Minghao Chen and Shangzhan Zhang and Nikita Karaev and Johannes Schönberger and Patrick Labatut and Piotr Bojanowski and David Novotny and Andrea Vedaldi and Christian Rupprecht},
      year={2026},
      eprint={2605.15195},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2605.15195}, 
}
