THE KITTI VISION BENCHMARK SUITE: RAW DATA RECORDINGS
Karlsruhe Institute of Technology
Toyota Technological Institute at Chicago
www.cvlibs.net

This file gives more information about the KITTI raw data recordings.

General information about streams and timestamps

Each sensor stream is stored in a single folder. The main folder contains meta information and a timestamp file, listing the timestamp of each frame of the sequence to nanosecond precision. Frame numbers in each data stream correspond to frame numbers in every other data stream and to line numbers in the timestamp file (0-based index), as all data has been synchronized. All cameras have been triggered directly by the Velodyne laser scanner, while from the GPS/IMU system (recording at 100 Hz) we have taken the measurement closest to the respective reference frame. For all sequences, image_00 has been used as the synchronization reference stream.
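
As an illustration, here is a minimal C++ sketch for parsing one line of a timestamp file; the format is assumed to be "YYYY-MM-DD HH:MM:SS.sssssssss" with a nanosecond fraction, and date handling is omitted for brevity:

  #include <cstdio>

  // parse a timestamp line into seconds-of-day (the date part is ignored);
  // the fractional nanoseconds are read as part of the seconds field
  double parseTimestamp(const char* line) {
    int year, month, day, hour, min;
    double sec;
    sscanf(line, "%d-%d-%d %d:%d:%lf", &year, &month, &day, &hour, &min, &sec);
    return hour*3600.0 + min*60.0 + sec;
  }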

Rectified color + grayscale stereo sequences

Our vehicle has been equipped with four cameras: one color stereo pair and one grayscale stereo pair. The color and grayscale cameras are mounted close to each other (~6 cm apart); the baseline of both stereo rigs is approximately 54 cm. We have chosen this setup such that we can provide both color and grayscale information for the left and the right view. While the color cameras (obviously) come with color information, the grayscale camera images have higher contrast and slightly less noise.

All cameras are synchronized at about 10 Hz with respect to the Velodyne laser scanner. The trigger is mounted such that camera images coincide roughly with the Velodyne lasers facing forward (in driving direction).

All camera images are provided as lossless compressed, rectified png sequences. The native image resolution is 1382x512 pixels; after rectification it is slightly smaller, see the calibration section below for details. The horizontal opening angle of the cameras is approximately 90 degrees.

The camera images are stored in the following directories:

  • image_00: left rectified grayscale image sequence
  • image_01: right rectified grayscale image sequence
  • image_02: left rectified color image sequence
  • image_03: right rectified color image sequence
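
Within each of these directories, frames are typically stored as zero-padded png files in a data subfolder, alongside the stream's timestamp file. A sketch of path construction, assuming the layout of the released drives with 10-digit zero-padded frame indices:

  #include <cstdio>

  // build the file name of frame i for the left color camera
  // (10-digit zero-padding assumed from the released drives)
  char path[256];
  const char* drive_dir = "2011_09_26/2011_09_26_drive_0001_sync"; // example
  int i = 5; // 0-based frame index
  snprintf(path, sizeof(path), "%s/image_02/data/%010d.png", drive_dir, i);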

Velodyne 3D laser scan data

The Velodyne point clouds are stored in the folder velodyne_points. To save space, each scan has been stored as an Nx4 float matrix in a binary file, written using the following code:

  FILE *stream = fopen(dst_file.c_str(),"wb");
  fwrite(data,sizeof(float),4*num,stream);
  fclose(stream);

Here, data contains 4*num float values: the first three values of each point are its x, y and z coordinates, and the fourth is the reflectance. All scans are stored row-aligned, meaning that the first 4 values belong to the first measurement. Since each scan may have a different number of points, this number must be determined from the file size when reading the file; 1e6 floats is a safe upper bound on the number of values:

  // allocate 4 MB buffer (only ~130*4*4 KB are needed)
  int32_t num = 1000000;
  float *data = (float*)malloc(num*sizeof(float));

  // pointers
  float *px = data+0;
  float *py = data+1;
  float *pz = data+2;
  float *pr = data+3;

  // load point cloud
  FILE *stream = fopen(currFilenameBinary.c_str(),"rb");
  num = fread(data,sizeof(float),num,stream)/4;
  for (int32_t i=0; i<num; i++) {
    point_cloud.points.push_back(tPoint(*px,*py,*pz,*pr));
    px+=4; py+=4; pz+=4; pr+=4;
  }
  fclose(stream);
  free(data); // release the buffer once the points have been copied

x,y and z are stored in metric (m) Velodyne coordinates.

IMPORTANT NOTE: The Velodyne scanner takes depth measurements continuously while rotating around its vertical axis (in contrast to the cameras, which are triggered at a certain point in time). This means that when computing point clouds you have to untwist the points linearly with respect to the Velodyne scanner location at the beginning and the end of the 360° sweep. The timestamps for the beginning and the end of the sweeps can be found in the timestamps file. The Velodyne rotates in counter-clockwise direction.

Of course this untwisting only works for non-dynamic environments.
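
For illustration, here is a minimal untwisting sketch. The assumptions are ours, not the devkit's: constant ego-motion over one sweep (forward speed vf and yaw rate wz taken from the OXTS data), a point's capture time estimated from its azimuth, and azimuth 0 (facing forward) coinciding with the camera trigger; the sign conventions should be verified against the data:

  #include <cmath>

  struct Pt { float x, y, z, r; };

  // correct one point for ego-motion during the sweep (hypothetical helper)
  Pt untwist(Pt p, double sweep_duration, double vf, double wz) {
    double az   = std::atan2((double)p.y, (double)p.x); // [-pi,pi], 0 = forward
    double dt   = az / (2.0*M_PI) * sweep_duration;     // time offset vs. trigger
    double dyaw = wz * dt;                              // ego yaw change [rad]
    double dx   = vf * dt;                              // ego forward motion [m]
    // express the point in the ego frame at trigger time: rotate by dyaw
    // and translate (small-angle approximation for the translation)
    double c = std::cos(dyaw), s = std::sin(dyaw);
    Pt q = p;
    q.x = (float)(c*p.x - s*p.y + dx);
    q.y = (float)(s*p.x + c*p.y);
    return q;
  }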

The relationship between the camera triggers and the velodyne is the following: We trigger the cameras when the velodyne is looking exactly forward (into the direction of the cameras).

GPS/IMU 3D localization unit

The GPS/IMU information is given in a small text file, one per synchronized frame. Each text file contains 30 values, which are:

  • lat: latitude of the oxts-unit (deg)
  • lon: longitude of the oxts-unit (deg)
  • alt: altitude of the oxts-unit (m)
  • roll: roll angle (rad), 0 = level, positive = left side up (-pi..pi)
  • pitch: pitch angle (rad), 0 = level, positive = front down (-pi/2..pi/2)
  • yaw: heading (rad), 0 = east, positive = counter clockwise (-pi..pi)
  • vn: velocity towards north (m/s)
  • ve: velocity towards east (m/s)
  • vf: forward velocity, i.e. parallel to earth-surface (m/s)
  • vl: leftward velocity, i.e. parallel to earth-surface (m/s)
  • vu: upward velocity, i.e. perpendicular to earth-surface (m/s)
  • ax: acceleration in x, i.e. in direction of vehicle front (m/s^2)
  • ay: acceleration in y, i.e. in direction of vehicle left (m/s^2)
  • az: acceleration in z, i.e. in direction of vehicle top (m/s^2)
  • af: forward acceleration (m/s^2)
  • al: leftward acceleration (m/s^2)
  • au: upward acceleration (m/s^2)
  • wx: angular rate around x (rad/s)
  • wy: angular rate around y (rad/s)
  • wz: angular rate around z (rad/s)
  • wf: angular rate around forward axis (rad/s)
  • wl: angular rate around leftward axis (rad/s)
  • wu: angular rate around upward axis (rad/s)
  • posacc: position accuracy (north/east in m)
  • velacc: velocity accuracy (north/east in m/s)
  • navstat: navigation status
  • numsats: number of satellites tracked by primary GPS receiver
  • posmode: position mode of primary GPS receiver
  • velmode: velocity mode of primary GPS receiver
  • orimode: orientation mode of primary GPS receiver
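
As a minimal illustration, the 30 whitespace-separated values of one frame can be read as follows (a sketch; the trailing status fields are integers but parse fine as doubles):

  #include <fstream>
  #include <vector>

  // read the 30 values of one OXTS frame, in the order listed above
  std::vector<double> readOxtsFrame(const char* filename) {
    std::ifstream in(filename);
    std::vector<double> v(30, 0.0);
    for (int i = 0; i < 30; i++) in >> v[i];
    return v;
  }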

An example of how to read and interpret the text files properly is given in the matlab folder: first, use oxts = loadOxtsliteData('2011_xx_xx_drive_xxxx') to read in the GPS/IMU data. Next, use pose = convertOxtsToPose(oxts) to transform the oxts data into local Euclidean poses, specified by 4x4 rigid transformation matrices. For more details see the comments in those files.
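
Internally, convertOxtsToPose maps latitude/longitude to a local Mercator plane. To the best of our understanding of the devkit, the conversion looks roughly as follows (verify against the MATLAB source; lat0 denotes the latitude of the sequence's first frame):

  #include <cmath>

  // lat/lon in degrees; lat0 is the latitude of the sequence's first frame
  void latlonToMercator(double lat, double lon, double lat0,
                        double &mx, double &my) {
    const double er = 6378137.0;                              // earth radius [m]
    double scale = cos(lat0 * M_PI / 180.0);
    mx = scale * er * lon * M_PI / 180.0;                     // east  [m]
    my = scale * er * log(tan((90.0 + lat) * M_PI / 360.0));  // north [m]
  }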

Coordinate Systems

The coordinate systems are defined as follows, where directions are given informally from the driver's view when looking forward onto the road:

  • Camera: x: right, y: down, z: forward
  • Velodyne: x: forward, y: left, z: up
  • GPS/IMU: x: forward, y: left, z: up

All coordinate systems are right-handed.
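
As a consequence, ignoring the small calibrated mounting rotation, mapping a point from Velodyne to (unrectified) camera axes is approximately an axis permutation:

  // nominal axis permutation only; use the calibrated R, T from
  // calib_velo_to_cam.txt for real transformations
  float cam_x = -velo_y;   // right   = -left
  float cam_y = -velo_z;   // down    = -up
  float cam_z =  velo_x;   // forward =  forward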

Sensor Calibration

The sensor calibration zip archive contains files storing matrices in row-aligned (row-major) order, meaning that the first values correspond to the first row:

calib_cam_to_cam.txt: Camera-to-camera calibration

  • S_xx: 1x2 size of image xx before rectification
  • K_xx: 3x3 calibration matrix of camera xx before rectification
  • D_xx: 1x5 distortion vector of camera xx before rectification
  • R_xx: 3x3 rotation matrix of camera xx (extrinsic)
  • T_xx: 3x1 translation vector of camera xx (extrinsic)
  • S_rect_xx: 1x2 size of image xx after rectification
  • R_rect_xx: 3x3 rectifying rotation to make image planes co-planar
  • P_rect_xx: 3x4 projection matrix after rectification

Note: When using this dataset you will most likely need to access only P_rect_xx, as this matrix is valid for the rectified image sequences.
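
The calibration files are plain text with one "KEY: value value ..." line per entry. A minimal sketch for extracting, e.g., P_rect_02 (whitespace-separated values are assumed):

  #include <fstream>
  #include <sstream>
  #include <string>

  // read the 12 row-major entries of P_rect_02 from calib_cam_to_cam.txt
  bool readPrect02(const char* calib_file, double P[12]) {
    std::ifstream in(calib_file);
    std::string line;
    while (std::getline(in, line)) {
      if (line.compare(0, 10, "P_rect_02:") == 0) {
        std::istringstream ss(line.substr(10));
        for (int i = 0; i < 12; i++) ss >> P[i];
        return true;
      }
    }
    return false;
  }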

calib_velo_to_cam.txt: Velodyne-to-camera registration

  • R: 3x3 rotation matrix
  • T: 3x1 translation vector
  • delta_f: deprecated
  • delta_c: deprecated

R|T takes a point in Velodyne coordinates and transforms it into the coordinate system of the left video camera. Likewise it serves as a representation of the Velodyne coordinate frame in camera coordinates.

calib_imu_to_velo.txt: GPS/IMU-to-Velodyne registration

  • R: 3x3 rotation matrix
  • T: 3x1 translation vector

R|T takes a point in GPS/IMU coordinates and transforms it into the coordinate system of the Velodyne scanner. Likewise it serves as a representation of the GPS/IMU coordinate frame in Velodyne coordinates.

Example Transformations

As the transformations sometimes confuse people, here is a short example of how points in the Velodyne coordinate system can be transformed into the coordinate system of the left camera.

In order to transform a homogeneous point X = [x y z 1]' from the Velodyne coordinate system to a homogeneous point Y = [u v 1]' on the image plane of camera xx, the following transformation has to be applied, followed by normalizing with the third homogeneous coordinate:

Y = P_rect_xx * R_rect_00 * (R|T)_velo_to_cam * X

To transform a point X from GPS/IMU coordinates to the image plane:

Y = P_rect_xx * R_rect_00 * (R|T)_velo_to_cam * (R|T)_imu_to_velo * X

The matrices are:

  • P_rect_xx (3x4): rectified cam 0 coordinates -> image plane
  • R_rect_00 (4x4): cam 0 coordinates -> rectified cam 0 coord.
  • (R|T)_velo_to_cam (4x4): velodyne coordinates -> cam 0 coordinates
  • (R|T)_imu_to_velo (4x4): imu coordinates -> velodyne coordinates

Note that the (4x4) matrices above are obtained by padding with zeros and setting R_rect_00(4,4) = (R|T)_velo_to_cam(4,4) = (R|T)_imu_to_velo(4,4) = 1.
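
A minimal C++ sketch of this chain, assuming the matrices have already been read into row-major flat arrays (the helper names are ours):

  // multiply a row-major rows x 4 matrix with a 4-vector
  void matvec(const double *M, int rows, const double x[4], double *y) {
    for (int r = 0; r < rows; r++) {
      y[r] = 0.0;
      for (int c = 0; c < 4; c++)
        y[r] += M[r*4+c] * x[c];
    }
  }

  // project a Velodyne point X = [x y z 1]' to pixel coordinates (u,v);
  // P: 3x4 P_rect_xx, R: 4x4 padded R_rect_00, T: 4x4 (R|T)_velo_to_cam
  void projectVeloToImage(const double P[12], const double R[16],
                          const double T[16], const double X[4],
                          double &u, double &v) {
    double a[4], b[4], y[3];
    matvec(T, 4, X, a);   // velodyne -> cam 0
    matvec(R, 4, a, b);   // cam 0 -> rectified cam 0
    matvec(P, 3, b, y);   // rectified cam 0 -> image plane (homogeneous)
    u = y[0] / y[2];      // normalize with the third homogeneous coordinate
    v = y[1] / y[2];
  }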

Tracklet Labels

Tracklet labels are stored in XML and can be read / written using the C++/MATLAB source code provided with this development kit. For compiling the code you will need to have a recent version of the boost libraries installed.

Each tracklet is stored as a 3D bounding box of given height, width and length, spanning multiple frames. For each frame we have labeled 3D location and rotation in bird's eye view. Additionally, occlusion / truncation information is provided in the form of averaged Mechanical Turk label outputs. All tracklets are represented in Velodyne coordinates.

Objects are classified into the following categories:

  • Car
  • Van
  • Truck
  • Pedestrian
  • Person (sitting)
  • Cyclist
  • Tram
  • Misc

Here, Misc denotes all other categories, e.g., Trailers or Segways.

Reading the Tracklet Label XML Files

This toolkit provides the header cpp/tracklets.h, which can be used to parse a tracklet XML file into the corresponding data structures. Its usage is simple: you can directly include the header file into your code as follows:

  #include "tracklets.h"

  std::string filename = "tracklet_labels.xml"; // per-drive tracklet file
  Tracklets *tracklets = new Tracklets();
  if (!tracklets->loadFromFile(filename)) {
    // throw an error / handle the failure here
  }
  // ... do something with the tracklets ...
  delete tracklets;

In order to compile this code you will need to have a recent version of the boost libraries installed and you need to link against libboost_serialization.

matlab/readTrackletsMex.cpp is a MATLAB wrapper for cpp/tracklets.h. It can be built using make.m. Again, you need to link against libboost_serialization, which might be problematic on newer MATLAB versions due to MATLAB's internal definitions of libstdc, etc. The latest version we know to work on Linux is 2008b; later versions changed MATLAB's internal pointer representation.

Of course you can also directly parse the XML file using your preferred XML parser. If you create another useful wrapper for the header file (e.g., for Python), we would be more than happy if you could share it with us.

Demo Utility for Projecting Tracklets into Images

In matlab/run_demoTracklets.m you will find a demonstration script that reads tracklets and projects them as 2D/3D bounding boxes into the images. You need to compile the MATLAB wrapper above in order to read the tracklets. For further instructions, please have a look at the comments in the respective MATLAB scripts and functions.