Implement 3DGS | Lee Lab

By ihsumlee, 30 May 2026


Ref.:
- Vid2Sim: https://metadriverse.github.io/vid2sim/
Used Packages: ffmpeg, COLMAP, cmake, ninja, Nerfstudio, gsplat
Example of creating a 360-degree outfit showcase using images

Below is a clean setup for your workstation:

Ubuntu 24.04
Dual RTX 5080, 16 GB VRAM each
NVIDIA driver 580.159.03
CUDA / nvcc 12.8
gcc/g++ 13.3

We will use Nerfstudio Splatfacto. Nerfstudio supports end-to-end image and video processing: its ns-process-data command accepts both images and video as input, and the official documentation says COLMAP and FFmpeg should be installed for this workflow. Splatfacto is Nerfstudio's 3D Gaussian Splatting method, in which 3D Gaussians are projected onto 2D camera views for fast rendering.
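
To make the projection step concrete, the sketch below pushes a single 3D Gaussian through a pinhole camera using the local-affine approximation that Gaussian splatting rasterizers build on: the mean is projected directly, and the covariance is mapped as J Σ Jᵀ, where J is the Jacobian of the projection at the mean. This is a plain-Python illustration, not gsplat's CUDA implementation. The function name project_gaussian and the intrinsics fx, fy, cx, cy are assumptions, and the Gaussian is assumed to be already expressed in camera coordinates.

```python
def project_gaussian(mean, cov, fx, fy, cx, cy):
    """Project a camera-space 3D Gaussian onto the image plane.

    mean: (x, y, z) in camera coordinates, with z > 0.
    cov:  3x3 covariance as nested lists, in camera coordinates.
    Returns (mean_2d, cov_2d) in pixel units.
    """
    x, y, z = mean
    if z <= 0:
        raise ValueError("Gaussian must be in front of the camera (z > 0)")

    # Perspective projection of the mean.
    mean_2d = (fx * x / z + cx, fy * y / z + cy)

    # 2x3 Jacobian of the projection, evaluated at the mean.
    J = [
        [fx / z, 0.0, -fx * x / (z * z)],
        [0.0, fy / z, -fy * y / (z * z)],
    ]

    # cov_2d = J @ cov @ J^T, written out with plain loops.
    JC = [[sum(J[i][k] * cov[k][j] for k in range(3)) for j in range(3)]
          for i in range(2)]
    cov_2d = [[sum(JC[i][k] * J[j][k] for k in range(3)) for j in range(2)]
              for i in range(2)]
    return mean_2d, cov_2d
```

For example, an isotropic Gaussian with variance 0.01 placed 2 units in front of a camera with fx = fy = 100 lands at the principal point with a 2D variance of about 25 square pixels, that is, a footprint with a standard deviation of roughly 5 pixels.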

3DGS Setup and Test Guide

1. Create the conda environment

conda create -n ns3dgs python=3.10 -y
conda activate ns3dgs

python -m pip install --upgrade pip setuptools wheel

Check:

python --version
which python

Expected:

Python 3.10.x
.../envs/ns3dgs/bin/python

2. Install PyTorch for CUDA 12.8

Because your workstation has CUDA 12.8, use the CUDA 12.8 PyTorch wheel:

pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu128

Test GPU detection:

python - <<'PY'
import torch
print("torch version:", torch.__version__)
print("torch cuda:", torch.version.cuda)
print("cuda available:", torch.cuda.is_available())
print("gpu count:", torch.cuda.device_count())
for i in range(torch.cuda.device_count()):
    print(i, torch.cuda.get_device_name(i))
PY

Expected result:

cuda available: True
gpu count: 2
0 NVIDIA GeForce RTX 5080
1 NVIDIA GeForce RTX 5080

Do not continue until this test works.

3. Install Ubuntu dependencies

sudo apt update
sudo apt install -y git cmake ninja-build build-essential ffmpeg colmap

Check:

ffmpeg -version
colmap -h
cmake --version
ninja --version
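
The same check can be scripted so that it reports every missing tool at once. The sketch below only tests whether each executable is on PATH; it does not verify versions. The function name missing_tools and the REQUIRED_TOOLS list are assumptions chosen for this guide.

```python
import shutil

# Executables this guide relies on (assumed list, taken from the apt install above).
REQUIRED_TOOLS = ["git", "cmake", "ninja", "ffmpeg", "colmap"]


def missing_tools(tools=REQUIRED_TOOLS):
    """Return the tools from `tools` that are not found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Missing from PATH:", ", ".join(missing))
    else:
        print("All required tools found.")
```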

4. Install Nerfstudio and gsplat

pip install nerfstudio
pip install gsplat

Nerfstudio can be installed with pip install nerfstudio, as described on its PyPI page. The gsplat library provides CUDA-accelerated Gaussian rasterization with Python bindings.

Test installation:

ns-train --help
ns-process-data --help
python -c "import gsplat; print('gsplat OK')"

If gsplat fails, try this compiler workaround:

sudo apt install -y gcc-12 g++-12

export CC=/usr/bin/gcc-12
export CXX=/usr/bin/g++-12

pip uninstall -y gsplat
pip install --no-cache-dir gsplat
python -c "import gsplat; print('gsplat OK')"

5. Create a project folder

mkdir -p ~/vln_3dgs_project/raw_data/video_test/video
mkdir -p ~/vln_3dgs_project/raw_data/image_test/images
mkdir -p ~/vln_3dgs_project/processed_data
mkdir -p ~/vln_3dgs_project/outputs

Recommended first scene:

Good:
- lab desk
- robot workspace
- cabinet corner
- bookshelf
- toolbox area

Avoid:
- white corridor
- glass wall
- empty room
- fast camera motion
- many moving people

Part A: Test 3DGS using a video

A1. Prepare the video

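Copy your video into the folder. The commands below expect the file name your_video_720p.mp4, so rename the file as you copy it (downscale it to 720p first if the source is larger):

cp your_video.mp4 ~/vln_3dgs_project/raw_data/video_test/video/your_video_720p.mp4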

A2. Process the video

Start with 300 frames:

conda activate ns3dgs

ns-process-data video \
  --data ~/vln_3dgs_project/raw_data/video_test/video/your_video_720p.mp4 \
  --output-dir ~/vln_3dgs_project/processed_data/video_test_300 \
  --num-frames-target 300

This step will:

1. extract video frames,
2. run COLMAP,
3. estimate camera poses,
4. create transforms.json,
5. prepare the dataset for Splatfacto.

Check the result:

ls ~/vln_3dgs_project/processed_data/video_test_300

Expected:

images
transforms.json
colmap

If transforms.json is missing, COLMAP most likely failed to reconstruct the camera poses. See Problem 1 in section 8.
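
A present transforms.json can still describe only a handful of frames, so it helps to compare the number of posed frames with the number of extracted images. The sketch below assumes that transforms.json holds a top-level "frames" list with one entry per posed image and that the extracted frames sit in an images subfolder; the function name summarize_dataset is an assumption.

```python
import json
from pathlib import Path


def summarize_dataset(dataset_dir):
    """Compare posed frames in transforms.json with the images on disk."""
    dataset_dir = Path(dataset_dir).expanduser()
    transforms = dataset_dir / "transforms.json"
    if not transforms.is_file():
        # No camera poses were written at all.
        return {"ok": False, "posed_frames": 0, "images_on_disk": 0}

    frames = json.loads(transforms.read_text()).get("frames", [])
    images_dir = dataset_dir / "images"
    n_images = 0
    if images_dir.is_dir():
        n_images = sum(1 for p in images_dir.iterdir() if p.is_file())

    return {
        "ok": len(frames) > 0,
        "posed_frames": len(frames),
        "images_on_disk": n_images,
    }


if __name__ == "__main__":
    print(summarize_dataset("~/vln_3dgs_project/processed_data/video_test_300"))
```

If posed_frames is far below images_on_disk, treat the run like a COLMAP failure and re-record the scene before training.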

A3. Train Splatfacto from the video dataset

Use only one RTX 5080 first:

CUDA_VISIBLE_DEVICES=0 ns-train splatfacto \
  --data ~/vln_3dgs_project/processed_data/video_test_300 \
  --output-dir ~/vln_3dgs_project/outputs

During training, you should see a viewer link:

http://localhost:7007

Open it in your workstation browser.

If you are using SSH:

ssh -L 7007:localhost:7007 user@your_server_ip

Then open this on your local computer:

http://localhost:7007

A4. If CUDA memory is not enough

Each RTX 5080 has 16 GB of VRAM. If training runs out of memory, reduce the number of frames:

ns-process-data video \
  --data ~/vln_3dgs_project/raw_data/video_test/video/your_video_720p.mp4 \
  --output-dir ~/vln_3dgs_project/processed_data/video_test_150 \
  --num-frames-target 150

Then train:

CUDA_VISIBLE_DEVICES=0 ns-train splatfacto \
  --data ~/vln_3dgs_project/processed_data/video_test_150 \
  --output-dir ~/vln_3dgs_project/outputs

Part B: Test 3DGS using images

B1. Prepare image folder

Put your images here:

~/vln_3dgs_project/raw_data/image_test/images

Example:

ls ~/vln_3dgs_project/raw_data/image_test/images

Expected:

0001.jpg
0002.jpg
0003.jpg
...

Recommended first image dataset:

Number of images: 50–200
Resolution: 720p or 1080p
Scene: small indoor scene
Motion: different viewpoints around the scene
Lighting: stable

If your images are very high resolution, resize them:

mkdir -p ~/vln_3dgs_project/raw_data/image_test/images_720p

for img in ~/vln_3dgs_project/raw_data/image_test/images/*; do
  filename=$(basename "$img")
  ffmpeg -y -i "$img" -vf scale=1280:-1 \
    ~/vln_3dgs_project/raw_data/image_test/images_720p/"$filename"
done
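
The scale=1280:-1 filter fixes the width at 1280 pixels and derives the height from the aspect ratio, so the result is only 720 pixels tall for 16:9 input. The sketch below reproduces that arithmetic to preview the output size before resizing a whole folder. The function name scaled_size is an assumption, and the rounding is approximate: ffmpeg may differ by one pixel.

```python
def scaled_size(width, height, target_width=1280):
    """Return the (width, height) after scaling to target_width, keeping aspect ratio."""
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    new_height = round(height * target_width / width)
    return target_width, new_height


if __name__ == "__main__":
    for size in [(1920, 1080), (4032, 3024), (3000, 4000)]:
        print(size, "->", scaled_size(*size))
```

A 1920×1080 frame becomes 1280×720 and a 4032×3024 photo becomes 1280×960, while a portrait 3000×4000 photo becomes 1280×1707. Images narrower than 1280 pixels are upscaled by this filter, so skip the resize step for them.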

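B2. Process the images

The command below reads the resized images_720p folder. If you skipped the resize step, point --data at the images folder instead.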

ns-process-data images \
  --data ~/vln_3dgs_project/raw_data/image_test/images_720p \
  --output-dir ~/vln_3dgs_project/processed_data/image_test

Check:

ls ~/vln_3dgs_project/processed_data/image_test

Expected:

images
transforms.json
colmap

B3. Train Splatfacto from images

CUDA_VISIBLE_DEVICES=0 ns-train splatfacto \
  --data ~/vln_3dgs_project/processed_data/image_test \
  --output-dir ~/vln_3dgs_project/outputs

Open the viewer:

http://localhost:7007

6. How to use both GPUs

For your first training run, use only one GPU.

Your best use of dual RTX 5080 is parallel experiments:

Terminal 1:

CUDA_VISIBLE_DEVICES=0 ns-train splatfacto \
  --data ~/vln_3dgs_project/processed_data/video_test_300 \
  --output-dir ~/vln_3dgs_project/outputs

Terminal 2:

CUDA_VISIBLE_DEVICES=1 ns-train splatfacto \
  --data ~/vln_3dgs_project/processed_data/image_test \
  --output-dir ~/vln_3dgs_project/outputs

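This is easier than trying to use two GPUs for one 3DGS scene. Both runs start their own viewer, so open the link that each run prints rather than assuming port 7007 for both.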
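
The two-terminal pattern can also be driven from one script. The sketch below builds one ns-train command per GPU, using only the flags shown in this guide, and pins each process to its GPU through CUDA_VISIBLE_DEVICES. The helper names build_job and launch_all are assumptions, not part of Nerfstudio.

```python
import os
import subprocess
from pathlib import Path


def build_job(gpu_index, data_dir, output_dir):
    """Return (command, environment) for one single-GPU Splatfacto run."""
    cmd = [
        "ns-train", "splatfacto",
        "--data", str(Path(data_dir).expanduser()),
        "--output-dir", str(Path(output_dir).expanduser()),
    ]
    # Copy the current environment and expose only one GPU to the child process.
    env = dict(os.environ, CUDA_VISIBLE_DEVICES=str(gpu_index))
    return cmd, env


def launch_all(jobs):
    """Start every job in parallel, wait for all of them, and return exit codes."""
    procs = [subprocess.Popen(cmd, env=env) for cmd, env in jobs]
    return [proc.wait() for proc in procs]


# Example usage (left commented out so importing this file starts nothing):
# root = "~/vln_3dgs_project"
# jobs = [
#     build_job(0, f"{root}/processed_data/video_test_300", f"{root}/outputs"),
#     build_job(1, f"{root}/processed_data/image_test", f"{root}/outputs"),
# ]
# print(launch_all(jobs))
```

Separate terminals remain the simpler choice while you still want to watch each run's log; the script is mainly useful once the individual runs are known to work.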

7. How to judge whether the test succeeded

In the viewer, check:

| Item              | Good result                    |
| ----------------- | ------------------------------ |
| camera trajectory | smooth, not scattered randomly |
| object appearance | recognizable                   |
| table/wall/floor  | stable, not twisted            |
| novel view        | not too blurry                 |
| ghost artifacts   | few                            |
| overall scene     | consistent 3D structure        |

If the result is broken, the most common reason is bad camera pose estimation, not the 3DGS model.

8. Common problems and fixes

Problem 1: COLMAP fails

Symptoms:

No transforms.json
Very few registered images
Broken camera path

Fix:

Record slower.
Use more visual texture.
Avoid white walls and glass.
Avoid only rotating the camera.
Move with translation.
Avoid moving people.
Use 150–300 good frames.

Problem 2: CUDA out of memory

Fix:

Use 150 frames instead of 300.
Resize to 720p.
Use a smaller scene.
Use one GPU only.
Close other GPU programs.

Check GPU memory usage and running GPU processes with:

nvidia-smi
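
To see at a glance which GPU has the most free memory before starting a run, the nvidia-smi query mode can be parsed from Python. The sketch below uses the --query-gpu and --format=csv,noheader,nounits options of nvidia-smi; the function names parse_gpu_memory and query_gpu_memory are assumptions, and values are reported in MiB.

```python
import subprocess

QUERY = [
    "nvidia-smi",
    "--query-gpu=index,memory.used,memory.total",
    "--format=csv,noheader,nounits",
]


def parse_gpu_memory(csv_text):
    """Parse 'index, used, total' CSV lines into a list of per-GPU dicts."""
    rows = []
    for line in csv_text.strip().splitlines():
        index, used, total = (field.strip() for field in line.split(","))
        rows.append({
            "gpu": int(index),
            "used_mib": int(used),
            "free_mib": int(total) - int(used),
        })
    return rows


def query_gpu_memory():
    """Run nvidia-smi and return the parsed per-GPU memory usage."""
    result = subprocess.run(QUERY, capture_output=True, text=True, check=True)
    return parse_gpu_memory(result.stdout)
```

Calling query_gpu_memory() on the workstation returns one entry per GPU; pass the index of the GPU with the largest free_mib to CUDA_VISIBLE_DEVICES.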

Problem 3: gsplat compile/import error

Try:

sudo apt install -y gcc-12 g++-12
export CC=/usr/bin/gcc-12
export CXX=/usr/bin/g++-12
pip uninstall -y gsplat
pip install --no-cache-dir gsplat

Problem 4: Viewer cannot open through SSH

Use port forwarding:

ssh -L 7007:localhost:7007 user@your_server_ip

Then open:

http://localhost:7007
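
If the page still does not load, it helps to confirm that something is actually listening on the forwarded port before suspecting the browser. The sketch below attempts a plain TCP connection from the local machine; the function name port_open is an assumption, and a successful connection only shows that the port is reachable, not that the viewer itself is healthy.

```python
import socket


def port_open(host="localhost", port=7007, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and unreachable hosts.
        return False


if __name__ == "__main__":
    print("viewer port reachable:", port_open())
```

A False result on the local machine while training is running on the server usually means the ssh -L session is not open or the run is serving on a different port than 7007.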

9. Recommended first experiment for your workstation

Because your GPUs have 16 GB VRAM, use this setting first:

Input type: video
Scene: lab desk or robot workspace
Video: 30–60 seconds
Resolution: 720p
Frames: 150–300
GPU: CUDA_VISIBLE_DEVICES=0
Method: ns-train splatfacto
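
The video length and frame target above are linked by simple arithmetic. The sketch below estimates how densely a clip is sampled for a given target, assuming a 30 fps recording and evenly spaced frame selection; how ns-process-data actually picks frames may differ, so treat the numbers as an estimate. The function name frame_spacing is an assumption.

```python
import math


def frame_spacing(duration_s, fps, num_frames_target):
    """Estimate (total_frames, spacing, kept_frames) for even frame sampling."""
    total = int(duration_s * fps)
    # Keep every `spacing`-th frame; never less than every frame.
    spacing = max(1, total // num_frames_target)
    kept = math.ceil(total / spacing)
    return total, spacing, kept


if __name__ == "__main__":
    for seconds in (30, 45, 60):
        print(seconds, "s ->", frame_spacing(seconds, 30, 300))
```

Under these assumptions a 30-second clip at 30 fps has 900 frames, so a target of 300 keeps every 3rd frame, while a 60-second clip keeps every 6th. A larger spacing means more camera motion between neighboring frames, which is one reason slow recording helps COLMAP.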

After this works, try:

RGB-D / robot-view video
ROS 2 pose alignment
SAM 2 semantic masks
language-guided correction
VLN/VLA training environment

10. Full command summary

# Create environment
conda create -n ns3dgs python=3.10 -y
conda activate ns3dgs
python -m pip install --upgrade pip setuptools wheel

# Install PyTorch CUDA 12.8
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu128

# Test PyTorch
python - <<'PY'
import torch
print("torch version:", torch.__version__)
print("torch cuda:", torch.version.cuda)
print("cuda available:", torch.cuda.is_available())
print("gpu count:", torch.cuda.device_count())
for i in range(torch.cuda.device_count()):
    print(i, torch.cuda.get_device_name(i))
PY

# Install system tools
sudo apt update
sudo apt install -y git cmake ninja-build build-essential ffmpeg colmap

# Install Nerfstudio and gsplat
pip install nerfstudio
pip install gsplat

# Test installation
ns-train --help
ns-process-data --help
python -c "import gsplat; print('gsplat OK')"

# Create folders
mkdir -p ~/vln_3dgs_project/raw_data/video_test/video
mkdir -p ~/vln_3dgs_project/raw_data/image_test/images
mkdir -p ~/vln_3dgs_project/processed_data
mkdir -p ~/vln_3dgs_project/outputs

# Process video
ns-process-data video \
  --data ~/vln_3dgs_project/raw_data/video_test/video/your_video_720p.mp4 \
  --output-dir ~/vln_3dgs_project/processed_data/video_test_300 \
  --num-frames-target 300

# Train video test
CUDA_VISIBLE_DEVICES=0 ns-train splatfacto \
  --data ~/vln_3dgs_project/processed_data/video_test_300 \
  --output-dir ~/vln_3dgs_project/outputs

# Process images
ns-process-data images \
  --data ~/vln_3dgs_project/raw_data/image_test/images_720p \
  --output-dir ~/vln_3dgs_project/processed_data/image_test

# Train image test
CUDA_VISIBLE_DEVICES=0 ns-train splatfacto \
  --data ~/vln_3dgs_project/processed_data/image_test \
  --output-dir ~/vln_3dgs_project/outputs

As a next step, run the PyTorch GPU test. Once both RTX 5080 GPUs are detected, install Nerfstudio and test one small video.

Teaching Materials

Xie_Vid2Sim_Realistic_and_Interactive_Simulation_from_Video_for_Urban_Navigation_CVPR_2025_paper.pdf (9.66 MB)

14_EmbodiedSplat_Personalized_.pdf (2.23 MB)