Point Cloud Data (PCD) Projects are multi-modal projects that involve labeling and reviewing 3D point cloud data. They were built to support use cases such as autonomous driving, robotics, and drone technology.

Point Cloud Data Supported

Encord supports the following formats for Point Cloud Data:
  • .pcd - Point Cloud Data
  • .ply - Polygon File Format
  • .las - LAS point cloud (up to LAS v1.3)
  • .laz - Compressed LAS (up to LAS v1.3)
  • .mcap - MCAP container
  • .bag - ROS bag files
  • .db3 - ROS2 SQLite bag files
The Encord platform supports many more Point Cloud Data file formats. If you do not see a format you want supported, contact us at support@encord.com.

Data for PCD Projects

PCD Projects use “Scenes”. Scenes are the data units that your Taskers, and possibly your Agents, work with to create and review your labels. A Scene can be one of the following:
  • A PCD file (.pcd, .ply, .las, .laz, and so on)
  • A group of videos and PCD files bound together as a coherent unit.
Encord supports the following for multimodal Scenes:
  • Videos up to 1GB in size
  • Up to 9 videos in a Scene
  • Up to 100 frames in each video
  • Up to 10 million cloud data points per frame
To register/import point cloud data into Encord, the data must be mirrored exactly in the cloud and locally. The main.py script below creates Scenes in Encord for an autonomous driving project.
main.py
# /// script
# requires-python = ">=3.12"
# dependencies = [
#   "matplotlib>=3.10.3",
#   "np>=1.0.2",
#   "nuscenes-devkit>=1.1.9",
#   "pillow>=11.3.0",
#   "pydantic>=2.11.5",
#   "pypcd4>=1.2.1",
#   "requests>=2.32.3",
#   "scipy>=1.15.3",
#   "tqdm>=4.67.1",
# ]
# ///
from __future__ import annotations

import argparse
import json
import os
import pathlib
import re
import shutil
import tarfile
from dataclasses import dataclass
from enum import StrEnum, auto
from math import floor
from typing import Annotated, Any, Literal

import numpy as np
import pypcd4
import requests
import tqdm
from nuscenes import nuscenes
from pydantic import BaseModel, ConfigDict, Field
from scipy.spatial.transform import Rotation

"""
This script processes scenes from the nuScenes (https://www.nuscenes.org/) dataset and converts them into
the Encord upload JSON format for visualization and annotation. It can handle
lidar, radar, and camera data, as well as 3D annotations and ego-vehicle poses.
The script downloads the nuScenes minisplit if not found locally, and processes it, including:
- Converting the point cloud data from .bin to .pcd
- Normalizing timestamps to start from 0 at the beginning of the scene
- Converting positions so that the vehicle's starting position is treated as the origin (0, 0, 0)
"""


def snake2camel(snake: str, start_lower: bool = True) -> str:
    """
    Converts a snake_case string to camelCase.

    The `start_lower` argument determines whether the first letter in the generated camelcase should
    be lowercase (if `start_lower` is True), or capitalized (if `start_lower` is False).
    """
    camel = snake.title()
    camel = re.sub("([0-9A-Za-z])_(?=[0-9A-Z])", lambda m: m.group(1), camel)
    if start_lower:
        camel = re.sub("(^_*[A-Z])", lambda m: m.group(1).lower(), camel)
    return camel


class CamelModel(BaseModel):
    model_config = ConfigDict(alias_generator=snake2camel, populate_by_name=True)


@dataclass
class CameraIntrinsics:
    fx: Annotated[float, Field(description="Focal length x")]
    fy: Annotated[float, Field(description="Focal length y")]
    ox: Annotated[float, Field(description="Principal point offset x")]
    oy: Annotated[float, Field(description="Principal point offset y")]
    s: Annotated[float, Field(description="Axis skew")]


@dataclass
class CameraExtrinsics:
    rotation: Annotated[
        tuple[float, float, float, float, float, float, float, float, float],
        Field(description="Rotation matrix R"),
    ]
    position: Annotated[
        tuple[float, float, float], Field(description="Translation vector T")
    ]


@dataclass
class CameraParams:
    width_px: int
    height_px: int
    intrinsics: Annotated[CameraIntrinsics, Field(description="The intrinsic matrix K")]
    extrinsics: Annotated[
        CameraExtrinsics, Field(description="The extrinsic 4x4 matrix R|T")
    ]


@dataclass
class FrameOfReference:
    id: Annotated[str, Field(description="ID of this frame of reference")]
    parent_FOR: Annotated[
        str | None, Field(description="ID of a parent frame of reference")
    ]
    rotation: tuple[float, float, float, float, float, float, float, float, float]
    position: tuple[float, float, float]


Position = tuple[float, float, float]
EulerOrientation = tuple[float, float, float]
Size = tuple[float, float, float]


class Pose(CamelModel):
    position: Position
    orientation: EulerOrientation


class CuboidGeometry(CamelModel):
    type: Literal["cuboid"] = "cuboid"
    pose: Pose
    size: Size


@dataclass
class _FORIdMixin:
    frame_of_reference_id: Annotated[
        str | None, Field(description="ID of the frame of reference the entity is in")
    ] = None


@dataclass
class _URIMixin:
    uri: str


@dataclass
class _EventMixin:
    timestamp: float | None = None


class URIEvent(CamelModel, _EventMixin, _URIMixin):
    pass


class CameraParamsEvent(CamelModel, _EventMixin, CameraParams):
    pass


class FOREvent(CamelModel, _EventMixin, FrameOfReference):
    pass


class ModelEvent(CamelModel, _EventMixin):
    geometries: list[CuboidGeometry]


class CompositeScene(CamelModel):
    type: Literal["composite"] = "composite"
    streams: dict[str, EventStream]


class EntityType(StrEnum):
    POINT_CLOUD = auto()
    FRAME_OF_REFERENCE = auto()
    IMAGE = auto()
    MODEL = auto()
    CAMERA_PARAMETERS = auto()


class PCDStream(CamelModel, _FORIdMixin):
    entity_type: Literal[EntityType.POINT_CLOUD] = EntityType.POINT_CLOUD
    events: Annotated[list[URIEvent], Field(description="List of point cloud events")]


class CameraStream(CamelModel, _FORIdMixin):
    entity_type: Literal[EntityType.CAMERA_PARAMETERS] = EntityType.CAMERA_PARAMETERS
    events: list[CameraParamsEvent]


class ImageStream(CamelModel):
    entity_type: Literal[EntityType.IMAGE] = EntityType.IMAGE
    events: list[URIEvent]
    camera_id: Annotated[
        str | None,
        Field(
            description="ID of the camera associated with the image. Used to position the image in-scene"
        ),
    ]


class ModelStream(CamelModel):
    entity_type: Literal[EntityType.MODEL] = EntityType.MODEL
    events: list[URIEvent | ModelEvent]
    camera_id: str | None


class FORStream(CamelModel):
    entity_type: Literal[EntityType.FRAME_OF_REFERENCE] = EntityType.FRAME_OF_REFERENCE
    events: Annotated[
        list[FOREvent], Field(description="List of frame of reference events")
    ]


class EventStream(CamelModel):
    type: Literal["event"] = "event"
    id: str
    stream: Annotated[
        PCDStream | CameraStream | FORStream | ImageStream | ModelStream,
        Field(discriminator="entity_type"),
    ]


DATASET_DIR = pathlib.Path("./dataset")


class Config:
    env: str
    output_dir: pathlib.Path
    base_url: str

    def __init__(self):
        self.env = "remote"
        self.output_dir = pathlib.Path("./scenes")
        self.base_url = (
            "https://storage.cloud.google.com/my-bucket-name/scenes/nuscenes" # Replace this with the file path in your bucket to the dataset
        )


config = Config()


def ensure_scene_available(
    root_dir: pathlib.Path, dataset_version: str, scene_name: str
) -> None:
    """
    Ensure that the specified scene is available.

    Downloads minisplit into root_dir if scene_name is part of it and root_dir is empty.

    Raises ValueError if scene is not available and cannot be downloaded.
    """
    try:
        nusc = nuscenes.NuScenes(
            version=dataset_version, dataroot=str(root_dir), verbose=False
        )
    except AssertionError:  # dataset initialization failed
        if dataset_version == "v1.0-mini":
            download_minisplit(root_dir)
            nusc = nuscenes.NuScenes(
                version=dataset_version, dataroot=str(root_dir), verbose=False
            )
        else:
            print(
                f"Could not find dataset at {root_dir} and could not automatically download specified scene."
            )
            exit()

    scene_names = [s["name"] for s in nusc.scene]
    if scene_name not in scene_names:
        raise ValueError(f"{scene_name=} not found in dataset")


def nuscene_sensor_names(nusc: nuscenes.NuScenes, scene_name: str) -> list[str]:
    """Return all sensor names in the scene."""

    sensor_names = set()

    scene = next(s for s in nusc.scene if s["name"] == scene_name)
    first_sample = nusc.get("sample", scene["first_sample_token"])
    for sample_data_token in first_sample["data"].values():
        sample_data = nusc.get("sample_data", sample_data_token)
        if sample_data["sensor_modality"] == "camera":
            current_camera_token = sample_data_token
            while current_camera_token != "":
                sample_data = nusc.get("sample_data", current_camera_token)
                sensor_name = sample_data["channel"]
                sensor_names.add(sensor_name)
                current_camera_token = sample_data["next"]

    # For a known set of cameras, order the sensors in a circle.
    ordering = {
        "CAM_FRONT_LEFT": 0,
        "CAM_FRONT": 1,
        "CAM_FRONT_RIGHT": 2,
        "CAM_BACK_RIGHT": 3,
        "CAM_BACK": 4,
        "CAM_BACK_LEFT": 5,
    }
    return sorted(
        sensor_names, key=lambda sensor_name: ordering.get(sensor_name, float("inf"))
    )


# Write all uri assets required for the scene to a separate output directory
def write_asset(path: pathlib.Path):
    shutil.copyfile(path, pathlib.Path("./output") / path.name)


def write_nuscenes_json(scene: CompositeScene, name: str):
    OUTPUT_FILE = config.output_dir / "nuscenes.json"
    with open(OUTPUT_FILE, "w") as f:
        dummy_json = scene.model_dump_json(by_alias=True, indent=2)
        f.write(dummy_json)
        print("Wrote to", OUTPUT_FILE)


def write_upload_json(scenes: list[tuple[CompositeScene, str]]):
    scenes_final = []
    for scene, name in scenes:
        streams = list(scene.model_dump(by_alias=True)["streams"].values())
        scenes_final.append(
            {
                "title": name,
                "streams": streams,
            }
        )

    final = {"scenes": scenes_final}

    OUTPUT_FILE = config.output_dir / "upload.json"
    with open(OUTPUT_FILE, "w") as f:
        json.dump(final, f, indent=2)
        print("Wrote to", OUTPUT_FILE)


first_timestamp = 0
first_position = [0, 0, 0]
hz = 0


def sub(a, b) -> tuple[float, float, float]:
    return tuple(a[i] - b[i] for i in range(len(a)))


def log_nuscenes(
    nusc: nuscenes.NuScenes, scene_name: str, max_time_sec: float, sample_hz: float
) -> CompositeScene:
    """Log nuScenes scene."""
    print(f"Logging scene {scene_name}")

    result = CompositeScene(streams={})

    scene = next(s for s in nusc.scene if s["name"] == scene_name)

    location = nusc.get("log", scene["log_token"])["location"]

    # Get the first sample
    first_sample_token = scene["first_sample_token"]
    first_sample = nusc.get("sample", scene["first_sample_token"])

    # Get the timestamp (in seconds)
    global first_timestamp
    first_timestamp = first_sample["timestamp"] / 1e6
    global first_position
    first_position = (0, 0, 0)
    global hz
    hz = sample_hz

    first_lidar_tokens = []
    first_radar_tokens = []
    first_camera_tokens = []
    for sample_data_token in first_sample["data"].values():
        sample_data = nusc.get("sample_data", sample_data_token)
        log_sensor_calibration(result, sample_data, nusc)

        if sample_data["sensor_modality"] == "lidar":
            first_lidar_tokens.append(sample_data_token)
        elif sample_data["sensor_modality"] == "radar":
            first_radar_tokens.append(sample_data_token)
        elif sample_data["sensor_modality"] == "camera":
            first_camera_tokens.append(sample_data_token)

    first_timestamp_us = nusc.get("sample_data", first_lidar_tokens[0])["timestamp"]
    max_timestamp_us = first_timestamp_us + 1e6 * max_time_sec

    log_lidar_and_ego_pose(result, location, first_lidar_tokens, nusc, max_timestamp_us)
    log_cameras(result, first_camera_tokens, nusc, max_timestamp_us)
    log_radars(result, first_radar_tokens, nusc, max_timestamp_us)
    log_annotations(result, location, first_sample_token, nusc, max_timestamp_us)

    return result


def log_cameras(
    scene: CompositeScene,
    first_camera_tokens: list[str],
    nusc: nuscenes.NuScenes,
    max_timestamp_us: float,
) -> None:
    """Log camera data."""
    for first_camera_token in first_camera_tokens:
        current_camera_token = first_camera_token
        last_logged_timestamp = -10000
        while current_camera_token != "":
            sample_data = nusc.get("sample_data", current_camera_token)
            if max_timestamp_us < sample_data["timestamp"]:
                break
            sensor_name = sample_data["channel"]

            if sensor_name not in scene.streams:
                scene.streams[sensor_name] = EventStream(
                    id=sensor_name,
                    stream=ImageStream(
                        events=[],
                        camera_id=sensor_name + "-camera",
                        frame_of_reference_id=sensor_name + "-calibration",
                    ),
                )

            timestamp = sample_data["timestamp"] * 1e-6 - first_timestamp
            if hz > 0:
                timestamp *= hz
                timestamp = floor(timestamp)
            if hz > 0 and timestamp - last_logged_timestamp < 1.0:
                current_camera_token = sample_data["next"]
                continue
            last_logged_timestamp = timestamp

            data_file_path = nusc.dataroot / sample_data["filename"]

            # write_asset(data_file_path)
            event = URIEvent(
                uri=config.base_url + "/" + str(data_file_path),
                timestamp=timestamp,
            )
            scene.streams[sensor_name].stream.events.append(event)

            current_camera_token = sample_data["next"]


def log_lidar_and_ego_pose(
    scene: CompositeScene,
    location: str,
    first_lidar_token: list[str],
    nusc: nuscenes.NuScenes,
    max_timestamp_us: float,
) -> None:
    """Log lidar data and vehicle pose."""

    scene.streams["ego_vehicle"] = EventStream(
        id="ego_vehicle",
        stream=FORStream(events=[]),
    )

    last_logged_timestamp = -10000

    for current_lidar_token in first_lidar_token:
        while current_lidar_token != "":
            sample_data = nusc.get("sample_data", current_lidar_token)
            sensor_name = sample_data["channel"]

            if max_timestamp_us < sample_data["timestamp"]:
                break

            timestamp = sample_data["timestamp"] * 1e-6 - first_timestamp
            if hz > 0:
                timestamp *= hz
                timestamp = floor(timestamp)
            if hz > 0 and timestamp - last_logged_timestamp < 1.0:
                current_lidar_token = sample_data["next"]
                continue
            last_logged_timestamp = timestamp

            ego_pose = nusc.get("ego_pose", sample_data["ego_pose_token"])
            rotation = (
                Rotation.from_quat(ego_pose["rotation"], scalar_first=True)
                .as_matrix()
                .transpose()
                .flatten()
            )
            position = ego_pose["translation"]
            if timestamp == 0:
                global first_position
                first_position = position

            event = FOREvent(
                id="ego_vehicle",
                parent_FOR="root",
                position=sub(position, first_position),
                rotation=rotation,
                timestamp=timestamp,
            )
            scene.streams["ego_vehicle"].stream.events.append(event)

            current_lidar_token = sample_data["next"]

            if sensor_name not in scene.streams:
                scene.streams[sensor_name] = EventStream(
                    id=sensor_name,
                    stream=PCDStream(
                        events=[], frame_of_reference_id=sensor_name + "-calibration"
                    ),
                )

            data_file_path = nusc.dataroot / sample_data["filename"]
            pointcloud = nuscenes.LidarPointCloud.from_file(str(data_file_path))
            points = pointcloud.points[:3].T

            fields = ("x", "y", "z")
            types = (
                np.float32,
                np.float32,
                np.float32,
            )

            pc = pypcd4.PointCloud.from_points(points, fields, types)

            # strip .bin extension from filename
            new_path = str(data_file_path.parent / data_file_path.stem)
            pc.save(new_path)

            event = URIEvent(
                uri=config.base_url + "/" + new_path,
                timestamp=timestamp,
            )
            scene.streams[sensor_name].stream.events.append(event)


def log_radars(
    scene: CompositeScene,
    first_radar_tokens: list[str],
    nusc: nuscenes.NuScenes,
    max_timestamp_us: float,
) -> None:
    """Log radar data to the scene"""
    for first_radar_token in first_radar_tokens:
        current_radar_token = first_radar_token
        last_logged_timestamp = -10000
        while current_radar_token != "":
            sample_data = nusc.get("sample_data", current_radar_token)
            if max_timestamp_us < sample_data["timestamp"]:
                break
            sensor_name = sample_data["channel"]

            if sensor_name not in scene.streams:
                scene.streams[sensor_name] = EventStream(
                    id=sensor_name,
                    stream=PCDStream(
                        events=[], frame_of_reference_id=sensor_name + "-calibration"
                    ),
                )

            timestamp = sample_data["timestamp"] * 1e-6 - first_timestamp
            if hz > 0:
                timestamp *= hz
                timestamp = floor(timestamp)
            if hz > 0 and timestamp - last_logged_timestamp < 1.0:
                current_radar_token = sample_data["next"]
                continue
            last_logged_timestamp = timestamp

            data_file_path = nusc.dataroot / sample_data["filename"]
            current_radar_token = sample_data["next"]
            # write_asset(data_file_path)
            event = URIEvent(
                uri=config.base_url + "/" + str(data_file_path),
                timestamp=timestamp,
            )
            scene.streams[sensor_name].stream.events.append(event)


def log_sensor_calibration(
    scene: CompositeScene, sample_data: dict[str, Any], nusc: nuscenes.NuScenes
) -> None:
    """Log sensor calibration (pinhole camera, sensor poses, etc.) to the scene"""
    sensor_name = sample_data["channel"]
    calibrated_sensor_token = sample_data["calibrated_sensor_token"]
    calibrated_sensor = nusc.get("calibrated_sensor", calibrated_sensor_token)
    rotation = (
        Rotation.from_quat(calibrated_sensor["rotation"], scalar_first=True)
        .as_matrix()
        .transpose()
        .flatten()
        .tolist()
    )

    id = sensor_name + "-calibration"
    scene.streams[id] = EventStream(
        id=id,
        stream=FORStream(events=[]),
    )
    position = sub(calibrated_sensor["translation"], first_position)
    event = FOREvent(
        id=id,
        parent_FOR="ego_vehicle",  # "ego_vehicle",
        position=position,
        rotation=rotation,
    )
    scene.streams[id].stream.events.append(event)

    if len(calibrated_sensor["camera_intrinsic"]) != 0:
        intrinsic = calibrated_sensor["camera_intrinsic"]
        camera_id = sensor_name + "-camera"
        scene.streams[camera_id] = EventStream(
            id=camera_id,
            stream=CameraStream(
                events=[],
                frame_of_reference_id=id,  # might be "root"
            ),
        )

        event = CameraParamsEvent(
            timestamp=0,
            width_px=1600,
            height_px=900,
            intrinsics=CameraIntrinsics(
                fx=intrinsic[0][0],
                fy=intrinsic[1][1],
                ox=intrinsic[0][2],
                oy=intrinsic[1][2],
                s=intrinsic[0][1],
            ),
            extrinsics=CameraExtrinsics(
                position=(0, 0, 0),
                rotation=(0, 0, 1, -1, 0, 0, 0, -1, 0),
            ),
        )
        scene.streams[camera_id].stream.events.append(event)


def log_annotations(
    scene: CompositeScene,
    location: str,
    first_sample_token: str,
    nusc: nuscenes.NuScenes,
    max_timestamp_us: float,
) -> None:
    """Log 3D cuboids to the scene"""

    scene.streams["anns"] = EventStream(
        id="anns",
        stream=ModelStream(events=[], camera_id=None),
    )

    current_sample_token = first_sample_token
    last_logged_timestamp = -10000
    while current_sample_token != "":
        sample_data = nusc.get("sample", current_sample_token)
        if max_timestamp_us < sample_data["timestamp"]:
            break

        timestamp = sample_data["timestamp"] * 1e-6 - first_timestamp
        if hz > 0:
            timestamp *= hz
            timestamp = floor(timestamp)
        if hz > 0 and timestamp - last_logged_timestamp < 1.0:
            current_sample_token = sample_data["next"]
            continue
        last_logged_timestamp = timestamp

        ann_tokens = sample_data["anns"]
        geometries = []
        for ann_token in ann_tokens:
            ann = nusc.get("sample_annotation", ann_token)

            width, length, height = ann["size"]

            # Convert rotation to euler angles
            rotation = Rotation.from_quat(ann["rotation"], scalar_first=True).as_euler(
                "XYZ"
            )

            geometries.append(
                CuboidGeometry(
                    pose=Pose(
                        position=sub(ann["translation"], first_position),
                        orientation=rotation,
                    ),
                    size=(length, width, height),
                )
            )

        event = ModelEvent(
            timestamp=timestamp,
            geometries=geometries,
        )
        scene.streams["anns"].stream.events.append(event)

        current_sample_token = sample_data["next"]


def download_file(url: str, dst_file_path: pathlib.Path) -> None:
    """Download file from url to dst_fpath."""
    dst_file_path.parent.mkdir(parents=True, exist_ok=True)
    print(f"Downloading {url} to {dst_file_path}")
    response = requests.get(url, stream=True)
    with tqdm.tqdm.wrapattr(
        open(dst_file_path, "wb"),
        "write",
        miniters=1,
        total=int(response.headers.get("content-length", 0)),
        desc=f"Downloading {dst_file_path.name}",
    ) as f:
        for chunk in response.iter_content(chunk_size=4096):
            f.write(chunk)


def untar_file(
    tar_file_path: pathlib.Path, dst_path: pathlib.Path, keep_tar: bool = True
) -> bool:
    """Untar tar file at tar_file_path to dst."""
    print(f"Untar file {tar_file_path}")
    try:
        with tarfile.open(tar_file_path, "r") as tf:
            tf.extractall(dst_path)
    except Exception as error:
        print(f"Error unzipping {tar_file_path}, error: {error}")
        return False
    if not keep_tar:
        os.remove(tar_file_path)
    return True


def download_minisplit(root_dir: pathlib.Path) -> None:
    """
    Download nuScenes minisplit.

    Adopted from <https://colab.research.google.com/github/nutonomy/nuscenes-devkit/blob/master/python-sdk/tutorials/nuscenes_tutorial.ipynb>
    """
    MINISPLIT_URL = "https://www.nuscenes.org/data/v1.0-mini.tgz"

    zip_file_path = pathlib.Path("./v1.0-mini.tgz")
    if not zip_file_path.is_file():
        download_file(MINISPLIT_URL, zip_file_path)
    untar_file(zip_file_path, root_dir, keep_tar=True)


if __name__ == "__main__":
    parser = argparse.ArgumentParser(description="Converts nuScenes scenes to the Encord upload JSON format")
    parser.add_argument(
        "--root-dir",
        type=pathlib.Path,
        default=DATASET_DIR,
        help="Root directory of nuScenes dataset",
    )
    parser.add_argument(
        "--scene-name",
        type=str,
        default="scene-0061",
        help="Scene name to visualize (typically of form 'scene-xxxx')",
    )
    parser.add_argument(
        "--dataset-version", type=str, default="v1.0-mini", help="Scene id to visualize"
    )
    parser.add_argument(
        "--seconds",
        type=float,
        default=float("inf"),
        help="If specified, limits the number of seconds logged",
    )
    parser.add_argument(
        "--all",
        "-A",
        action="store_true",
        help="If specified, logs all scenes",
    )
    parser.add_argument(
        "--hz",
        type=float,
        default=0.0,
        help="Limit the sample rate",
    )
    args = parser.parse_args()

    # ensure_scene_available(
    #     root_dir=args.root_dir,
    #     dataset_version=args.dataset_version,
    #     scene_name=args.scene_name,
    # )

    nusc = nuscenes.NuScenes(
        version=args.dataset_version, dataroot=args.root_dir, verbose=False
    )

    scene_names: list[str] = [args.scene_name]

    if args.all:
        scene_names = [s["name"] for s in nusc.scene]

    scenes = [
        (
            log_nuscenes(
                nusc, scene_name, max_time_sec=args.seconds, sample_hz=args.hz
            ),
            scene_name,
        )
        for scene_name in scene_names
    ]
    write_upload_json(scenes)
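
To run the script, assuming you use a PEP 723-aware runner such as uv and have pointed Config.base_url at your own bucket, an invocation might look like this (the paths and scene name are illustrative):

uv run main.py --root-dir ./dataset --scene-name scene-0061 --hz 2

The script writes upload.json into the ./scenes output directory. Register that file with Encord after mirroring the converted .pcd files and images to the bucket referenced by base_url.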


PCD Ontologies

PCD Projects support the following object label types:
  • Cuboids
  • Segmentation
  • Polylines
  • Keypoints

Project Settings

Configure Label Editor templates to streamline the annotation and review experience for your Taskers.

Settings and Controls

Toolbar

Hot key: Ctrl + I. Creates a new issue on the data unit.
Hot key: Shift + C. The POV centers on the object (vehicle, robot, drone) that captured the PCD.
Use this feature together with Center content.
  • ON: All points in the PCD workspace (on all axes) are averaged. Using Center content brings the POV to the center of all axes in the workspace.
  • OFF: Using Center content brings the POV to the object that captured the PCD.
Hot key: Shift + up arrow. Zooms into the workspace.
Hot key: Shift + down arrow. Zooms out of the workspace.
Specifies the “ground height” in the workspace. You can specify any value between the highest and lowest points of the PCD on the X axis.
Hot key: Option + R. Radius indicators are useful guides when annotating in 3D space. For example, you might only want to annotate anything that comes within 3 meters of your object (vehicle, robot, drone). You would then use a radius of three and only annotate anything within that radius.
  • ON: Displays one or more radii centered on the object (vehicle, robot, drone) that captured the PCD. You can select the color used for all radii.
  • OFF: Hides all radii.
Hot key: Option + M.
  • ON: Displays all PCD from all frames in the workspace at once.
  • OFF: Displays the PCD for that specific frame.
Hot key: Option + Shift + S. The Scene slicer is a planar point cloud data selection guard. Place and rotate a Scene slicer to confine point cloud data selection. Point cloud data on the opposite side cannot be selected for annotation.
Display or hide various elements in the PCD space.
  • Show top view: Displays view from the top in the right hand work area.
  • Show left view: Displays view from the left in the right hand work area.
  • Show right view: Displays view from the right in the right hand work area.
  • Show control hints: Displays workspace navigation hints.
  • Show camera switcher: Displays video views for the Scene.

Editor Settings - Scene

Specify whether to use a mouse or trackpad to rotate or pan in the workspace.
We STRONGLY recommend using a mouse to rotate or pan in the workspace.
Show or hide a grid to aid in annotating PCD.
Show or hide camera helper lines to aid in annotating PCD.
Specify the size of points in the workspace.
Specify the color of PCD in the workspace based on a number of options.
  • ON: Show all PCD in the workspace.
  • OFF: Hide all PCD in the workspace.
  • ON: Displays background in the workspace.
  • OFF: Hides background in the workspace.
  • ON: Displays all PCD from all frames in the workspace at once.
  • OFF: Displays the PCD for that specific frame.
Specifies the opacity of PCD in 2D views.
The default value is 0. With a value of 0, PCD does not display in 2D views.
Specify PCD source to display in the workspace.

Editor

Click into the PCD workspace to navigate inside it. Keyboard hints display when you can navigate the workspace.
Use the WASD keys to move along the X axis in the workspace.
Use the QE keys to move along the Y axis in the workspace.
Hold down the scroll wheel and move your mouse to rotate in the workspace.
Hold down the right mouse button and move your mouse to pan around the workspace.

Label and Review PCD Data

We strongly recommend that Taskers use a mouse when annotating or reviewing Scenes. Using a mouse makes annotating or reviewing Scenes significantly easier.
  1. Click Start task or Initiate to annotate a PCD data unit. The Label Editor opens with a PCD data unit ready for annotation.
  2. Use the Editor controls and Toolbar buttons to navigate the PCD workspace.
  3. Use the General Settings to customize and streamline the PCD workspace.
  4. Select an object label from the left-hand menu and begin annotating the PCD data unit.
  5. Use your input device (mouse or trackpad) to create a label in the PCD workspace.
  6. Select the label and adjust the label from the right-hand space and from the Toolbar.
  • Copy labels from one frame to another using Command + C and Command + V.
  • Copy labels from within the same frame using Command + C and Command + Shift + V.

Scene Formats

Encord supports various ways of importing/registering Scenes. All examples use the InputScene format as the root structure.

URL to PCD File

A Scene can consist of a single PCD file in cloud storage, for example: https://example.com/left_001.pcd. This format automatically validates that the URL has a supported file extension.

Stream of PCD Files without Timestamps

This format consists of multiple PCD files organized into a stream structure. For multiple PCD files without timing information, Encord assigns implicit timestamps of 1, 2, 3, and so on. At every time point T, Encord displays the last PCD available at or before T.
{
  "left_lidar": {
    "type": "point_cloud",
    "events": [
      {
        "uri": "https://example.com/left_001.pcd"
      },
      {
        "uri": "https://example.com/left_002.pcd"
      }
    ]
  }
}

Stream of PCD Files with Timestamps

The main difference between datasets that have timestamps and those that do not is whether Encord treats items as discrete “frames” or sees time as continuous. If temporal information is missing from the PCD stream, you can add timestamps:
{
  "left_lidar": {
    "type": "point_cloud",
    "events": [
      {
        "uri": "https://example.com/left_001.pcd",
        "timestamp": 1634567890.123
      },
      {
        "uri": "https://example.com/left_002.pcd",
        "timestamp": 1634567890.223
      }
    ]
  }
}
A number of timestamp formats are supported (see Timestamp Formats).

Multiple PCD Streams (Left and Right Sensors)

Two synchronized PCD streams from different sensors:
{
  "left_lidar": {
    "type": "point_cloud",
    "events": [
      {
        "uri": "https://example.com/left_001.pcd",
        "timestamp": 1634567890.123
      },
      {
        "uri": "https://example.com/left_002.pcd",
        "timestamp": 1634567890.223
      }
    ]
  },
  "right_lidar": {
    "type": "point_cloud",
    "events": [
      {
        "uri": "https://example.com/right_001.pcd",
        "timestamp": 1634567890.125
      },
      {
        "uri": "https://example.com/right_002.pcd",
        "timestamp": 1634567890.225
      }
    ]
  }
}

Add Frame of Reference for Ego Vehicle

Add a frame of reference stream to represent an ego vehicle’s pose over time:
{
  "ego_vehicle": {
    "type": "frame_of_reference",
    "id": "ego_vehicle",
    "events": [
      {
        "timestamp": 1634567890.123,
        "pose": {
          "position": {
            "x": 0.0,
            "y": 0.0,
            "z": 0.0
          },
          "rotation": {
            "x": 0.0,
            "y": 0.0,
            "z": 0.0,
            "w": 1.0
          }
        }
      },
      {
        "timestamp": 1634567890.223,
        "pose": {
          "position": {
            "x": 1.5,
            "y": 0.2,
            "z": 0.0
          },
          "rotation": {
            "x": 0.0,
            "y": 0.0,
            "z": 0.1,
            "w": 0.995
          }
        }
      }
    ]
  },
  "left_lidar": {
    "type": "point_cloud",
    "frameOfReference": "ego_vehicle",
    "events": [
      {
        "uri": "https://example.com/left_001.pcd",
        "timestamp": 1634567890.123
      },
      {
        "uri": "https://example.com/left_002.pcd",
        "timestamp": 1634567890.223
      }
    ]
  },
  "right_lidar": {
    "type": "point_cloud",
    "frameOfReference": "ego_vehicle",
    "events": [
      {
        "uri": "https://example.com/right_001.pcd",
        "timestamp": 1634567890.125
      },
      {
        "uri": "https://example.com/right_002.pcd",
        "timestamp": 1634567890.225
      }
    ]
  }
}

Sensor-Specific Frames of Reference with Calibration

Create individual frames of reference for each sensor, with calibration relative to the ego vehicle:
{
  "ego_vehicle": {
    "type": "frame_of_reference",
    "id": "ego_vehicle",
    "events": [
      {
        "timestamp": 1634567890.123,
        "pose": {
          "position": {
            "x": 0.0,
            "y": 0.0,
            "z": 0.0
          },
          "rotation": {
            "x": 0.0,
            "y": 0.0,
            "z": 0.0,
            "w": 1.0
          }
        }
      },
      {
        "timestamp": 1634567890.223,
        "pose": {
          "position": {
            "x": 1.5,
            "y": 0.2,
            "z": 0.0
          },
          "rotation": {
            "x": 0.0,
            "y": 0.0,
            "z": 0.1,
            "w": 0.995
          }
        }
      }
    ]
  },
  "left_lidar_frame": {
    "type": "frame_of_reference",
    "id": "left_lidar_frame",
    "parentForId": "ego_vehicle",
    "events": [
      {
        "timestamp": 1634567890.123,
        "pose": {
          "position": {
            "x": 0.0,
            "y": 0.5,
            "z": 1.8
          },
          "rotation": {
            "x": 0.0,
            "y": 0.0,
            "z": 0.0,
            "w": 1.0
          }
        }
      }
    ]
  },
  "right_lidar_frame": {
    "type": "frame_of_reference",
    "id": "right_lidar_frame",
    "parentForId": "ego_vehicle",
    "events": [
      {
        "timestamp": 1634567890.123,
        "pose": {
          "position": {
            "x": 0.0,
            "y": -0.5,
            "z": 1.8
          },
          "rotation": {
            "x": 0.0,
            "y": 0.0,
            "z": 0.0,
            "w": 1.0
          }
        }
      }
    ]
  },
  "left_lidar": {
    "type": "point_cloud",
    "frameOfReference": "left_lidar_frame",
    "events": [
      {
        "uri": "https://example.com/left_001.pcd",
        "timestamp": 1634567890.123
      },
      {
        "uri": "https://example.com/left_002.pcd",
        "timestamp": 1634567890.223
      }
    ]
  },
  "right_lidar": {
    "type": "point_cloud",
    "frameOfReference": "right_lidar_frame",
    "events": [
      {
        "uri": "https://example.com/right_001.pcd",
        "timestamp": 1634567890.125
      },
      {
        "uri": "https://example.com/right_002.pcd",
        "timestamp": 1634567890.225
      }
    ]
  }
}

Image Stream without Camera Parameters

Add an image stream using an existing camera reference:
{
  "ego_vehicle": {
    "type": "frame_of_reference",
    "id": "ego_vehicle",
    "events": [
      {
        "timestamp": 1634567890.123,
        "pose": {
          "position": {
            "x": 0.0,
            "y": 0.0,
            "z": 0.0
          },
          "rotation": {
            "x": 0.0,
            "y": 0.0,
            "z": 0.0,
            "w": 1.0
          }
        }
      },
      {
        "timestamp": 1634567890.223,
        "pose": {
          "position": {
            "x": 1.5,
            "y": 0.2,
            "z": 0.0
          },
          "rotation": {
            "x": 0.0,
            "y": 0.0,
            "z": 0.1,
            "w": 0.995
          }
        }
      }
    ]
  },
  "left_lidar_frame": {},
  "right_lidar_frame": {},
  "left_lidar": {},
  "right_lidar": {},
  "front_camera_images": {
    "type": "image",
    "frameOfReference": "ego_vehicle",
    "camera": "front_camera_id",
    "events": [
      {
        "uri": "https://example.com/image_001.jpg",
        "timestamp": 1634567890.123
      },
      {
        "uri": "https://example.com/image_002.jpg",
        "timestamp": 1634567890.223
      }
    ]
  }
}

Image Stream with Camera Calibration

Add full camera calibration parameters with the image stream:
{
  "ego_vehicle": {
    "type": "frame_of_reference",
    "id": "ego_vehicle",
    "events": [
      {
        "timestamp": 1634567890.123,
        "pose": {
          "position": {
            "x": 0.0,
            "y": 0.0,
            "z": 0.0
          },
          "rotation": {
            "x": 0.0,
            "y": 0.0,
            "z": 0.0,
            "w": 1.0
          }
        }
      },
      {
        "timestamp": 1634567890.223,
        "pose": {
          "position": {
            "x": 1.5,
            "y": 0.2,
            "z": 0.0
          },
          "rotation": {
            "x": 0.0,
            "y": 0.0,
            "z": 0.1,
            "w": 0.995
          }
        }
      }
    ]
  },
  "left_lidar_frame": {},
  "right_lidar_frame": {},
  "left_lidar": {},
  "right_lidar": {},
  "front_camera_images": {
    "type": "image",
    "frameOfReference": "ego_vehicle",
    "camera": {
      "widthPx": 1920,
      "heightPx": 1080,
      "intrinsics": {
        "type": "simple",
        "fx": 1000.0,
        "fy": 1000.0,
        "ox": 960.0,
        "oy": 540.0,
        "model": {
          "type": "pinhole"
        }
      },
      "extrinsics": {
        "position": {
          "x": 0.0,
          "y": 0.0,
          "z": 2.0
        },
        "rotation": {
          "x": 0.0,
          "y": 0.0,
          "z": 0.0,
          "w": 1.0
        }
      }
    },
    "events": [
      {
        "uri": "https://example.com/image_001.jpg",
        "timestamp": 1634567890.123
      },
      {
        "uri": "https://example.com/image_002.jpg",
        "timestamp": 1634567890.223
      }
    ]
  }
}

Format Details

Pose Representations

Poses can be specified in multiple formats.
Named Position + Quaternion:
{
  "position": {
    "x": 1.0,
    "y": 2.0,
    "z": 3.0
  },
  "rotation": {
    "x": 0.0,
    "y": 0.0,
    "z": 0.0,
    "w": 1.0
  }
}
Named Position + Euler Angles:
{
  "position": {
    "x": 1.0,
    "y": 2.0,
    "z": 3.0
  },
  "rotation": {
    "x": 0.1,
    "y": 0.2,
    "z": 0.3
  }
}
4x4 Affine Transform Matrix (column-major):
[
  1, 0, 0, 0,
  0, 1, 0, 0,
  0, 0, 1, 0,
  1, 2, 3, 1
]
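
For reference, the following sketch (using numpy and scipy, as in the script above) shows how a named position + quaternion pose maps onto the column-major 4x4 matrix; the pose values are illustrative:

import numpy as np
from scipy.spatial.transform import Rotation

# Illustrative pose: position (1, 2, 3) with an identity rotation quaternion (x, y, z, w) = (0, 0, 0, 1)
position = np.array([1.0, 2.0, 3.0])
rotation = Rotation.from_quat([0.0, 0.0, 0.0, 1.0])  # scipy expects scalar-last (x, y, z, w)

# Build the 4x4 affine transform, then flatten it in column-major (Fortran) order
transform = np.eye(4)
transform[:3, :3] = rotation.as_matrix()
transform[:3, 3] = position
column_major = transform.flatten(order="F").tolist()
# column_major == [1, 0, 0, 0,  0, 1, 0, 0,  0, 0, 1, 0,  1, 2, 3, 1]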

Timestamp Formats

Encord supports multiple timestamp formats:
  • Unix timestamp (float): 1634567890.123
  • Unix timestamp (int): 1634567890
  • ISO datetime string: “2021-10-18T10:31:30.123Z”
  • Time-only string: “10:31:30.123”
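For instance, an event can use an ISO datetime string instead of the float timestamps shown in the earlier examples (an illustrative sketch):
{
  "uri": "https://example.com/left_001.pcd",
  "timestamp": "2021-10-18T10:31:30.123Z"
}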

Scene Configuration

You can specify coordinate system conventions:
{
  "worldConvention": {
    "x": "right",
    "y": "forward",
    "z": "up"
  },
  "cameraConvention": {
    "x": "right",
    "y": "down",
    "z": "forward"
  },
  "content": {...} // Your scene content here
}

PCD Concepts

Sensor Data Streams

Streams comprise a sequence of messages coming out of a sensor (LiDAR, camera, accelerometer) at discrete moments in time.

Stream Rendering and Data Access

Encord renders the latest available data per stream, where at any given time point T, the most recent data available at or before T from that stream is displayed. For streams without explicit timestamps, Encord assigns implicit sequential timestamps (1, 2, 3, etc.), allowing for consistent temporal ordering while maintaining flexibility in data ingestion.
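The lookup rule can be pictured with a small Python sketch (illustrative only, not Encord's implementation): given a stream's events sorted by timestamp, the event rendered at time T is the last one at or before T.

from bisect import bisect_right

def event_at(events: list[dict], t: float) -> dict | None:
    """Return the last event with timestamp <= t, or None if the stream has not started yet."""
    timestamps = [e["timestamp"] for e in events]  # events are assumed to be sorted by timestamp
    i = bisect_right(timestamps, t)
    return events[i - 1] if i > 0 else None

# Implicit timestamps 1 and 2 for a stream supplied without timing information
events = [{"uri": "left_001.pcd", "timestamp": 1}, {"uri": "left_002.pcd", "timestamp": 2}]
assert event_at(events, 1.5)["uri"] == "left_001.pcd"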

Frame of Reference Hierarchies

Hierarchical Coordinate System Organization

Frame of reference hierarchies establish spatial relationships between different coordinate systems in a tree structure, with a root frame at the top. A typical hierarchy includes:
  • World Frame: The global reference frame, often representing a fixed point in the environment
  • Ego Vehicle Frame: The coordinate system of the primary platform (car, robot, etc.)
  • Sensor Frames: Individual coordinate systems for each sensor, positioned relative to the vehicle frame
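As an illustrative sketch (not Encord's internal representation), mapping a point from a sensor frame into the world frame simply chains the parent transforms; the calibration values below are hypothetical:

import numpy as np
from scipy.spatial.transform import Rotation

def to_parent(point: np.ndarray, rotation_quat: list[float], translation: list[float]) -> np.ndarray:
    """Transform a point from a child frame into its parent frame (quaternion in x, y, z, w order)."""
    return Rotation.from_quat(rotation_quat).apply(point) + np.asarray(translation)

# Hypothetical setup: lidar mounted 1.8 m above the ego origin, ego vehicle 10 m forward in the world
p_lidar = np.array([5.0, 0.0, 0.0])
p_ego = to_parent(p_lidar, [0.0, 0.0, 0.0, 1.0], [0.0, 0.0, 1.8])   # sensor frame -> ego vehicle frame
p_world = to_parent(p_ego, [0.0, 0.0, 0.0, 1.0], [10.0, 0.0, 0.0])  # ego vehicle frame -> world frame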

Static vs Dynamic Transformations

Frame relationships can be either static (fixed relative position/orientation) or dynamic (changing over time). Static transforms are used for rigidly mounted sensors, while dynamic transforms represent moving parts or the motion of the entire system through the world. Dynamic transforms are represented as messages in a stream.

Coordinate System Conventions

Different domains use different coordinate system conventions. The format allows specification of both world and camera coordinate conventions:
  • World Convention: Typically right-handed systems where axes x,y,z represent directions like “right,” “forward,” and “up”
  • Camera Convention: Often follows computer vision conventions where axes might represent “right,” “down,” and “forward”

Camera Calibration and Image Distortion

Intrinsic Camera Parameters

Projecting 3D world points onto the 2D image plane requires camera calibration, which involves determining both intrinsic and extrinsic parameters. Intrinsic parameters are specific to the camera hardware and include:
  • Focal Length (fx, fy): The distance between the camera lens and the image sensor, typically measured in pixels
  • Principal Point (cx, cy): The coordinates of the image center where the optical axis intersects the image plane
  • Skew Coefficient: Accounts for non-rectangular pixels (rarely used in modern cameras)

Extrinsic Camera Parameters

Extrinsic parameters define the camera’s position and orientation in 3D space relative to the world coordinate system:
  • Rotation Matrix (R): Describes the camera’s orientation using a 3x3 rotation matrix
  • Translation Vector (t): Specifies the camera’s position in world coordinates
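A minimal numpy sketch of the resulting pinhole projection, using one common convention (camera at position t with orientation R, no lens distortion) and illustrative parameter values:

import numpy as np

# Illustrative intrinsics (fx, fy, cx, cy) and extrinsics (R, t)
K = np.array([[1000.0, 0.0, 960.0],
              [0.0, 1000.0, 540.0],
              [0.0, 0.0, 1.0]])
R = np.eye(3)                     # camera orientation in the world frame
t = np.array([0.0, 0.0, 2.0])     # camera position in the world frame

def project(point_world: np.ndarray) -> np.ndarray:
    """Project a 3D world point onto the 2D image plane (pinhole model, no distortion)."""
    point_cam = R.T @ (point_world - t)   # world -> camera coordinates
    u, v, w = K @ point_cam               # camera -> homogeneous pixel coordinates
    return np.array([u / w, v / w])       # normalize to pixel coordinates

pixel = project(np.array([1.0, 0.5, 7.0]))   # a point 5 m in front of the camera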

Lens Distortion Correction

Camera lenses can introduce distortions that cause straight lines to appear curved in images. The format supports distortion correction through distortion coefficients that model:
  • Radial Distortion: Caused by light rays bending more near the lens edges, creating barrel or pincushion effects
  • Tangential Distortion: Results from lens misalignment with the image sensor
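As an illustrative sketch of radial distortion only (the coefficients k1 and k2 below are hypothetical; real values come from your calibration):

def apply_radial_distortion(x: float, y: float, k1: float, k2: float) -> tuple[float, float]:
    """Distort normalized image coordinates (x, y) with a two-term radial model."""
    r2 = x * x + y * y
    factor = 1.0 + k1 * r2 + k2 * r2 * r2
    return x * factor, y * factor

# A point near the image edge is displaced more than one near the center (barrel or pincushion effect)
print(apply_radial_distortion(0.8, 0.6, k1=-0.3, k2=0.1))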

Data Format Architecture

The format supports two primary data organization approaches:
  • Message-Based JSON: Every stream and message within a stream is in a single large JSON scene file. The point cloud and image data themselves are URIs to external files.
  • Container-Based Storage: Single files (MCAP, ROS bag, DB3) contain multiple sensor streams and their messages, and include the point cloud and image data.