~sotirisp/pdm

The Portable Depth Map image format specification

#The Portable Depth Map image format

The Portable Depth Map (PDM) is a simple image format designed specifically for depth images, such as those captured by Intel RealSense cameras or LIDAR sensors. Its design is inspired by the Netpbm family of image formats.

This is the structure of a PDM image:

PDM32
# Optional comment.
<width> <height>
<width*height 32-bit floats in row-major order containing distances in meters>

Some sample reader and writer implementations are provided here.

#Extended description

A PDM file consists of a sequence of one or more PDM images. There are no delimiters, data or padding of any kind before, between or after PDM images. PDM files have the .pdm extension.

Each PDM image consists of the following:

  • The PDM32 magic number followed by a newline character (\n / 0x0A / LF).
  • Zero or more comment lines. Comment lines begin with # and extend to the next newline character. They may be ignored by the image reader.
  • The width formatted as ASCII characters in decimal, a space character (0x20), the height formatted as ASCII characters in decimal and a newline character. The width and height of the image must be in the range [0, 2^32-1] inclusive.
  • The image data stored as width * height IEEE 754 single precision floating point numbers in row-major order. Each float must be stored in little-endian byte order.
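The layout above can be read back with a few lines of code. The following is a minimal sketch of a reader, not the reference implementation: the function name `read_pdm_images` and the choice to decode comments as UTF-8 are assumptions made for illustration, since the specification does not name an encoding for comments.

```python
import io
import struct

MAGIC = b"PDM32\n"

def read_pdm_images(stream):
    """Yield (comments, width, height, depths) for each image in a binary stream."""
    while True:
        magic = stream.read(len(MAGIC))
        if not magic:
            return  # clean end of file: no more images
        if magic != MAGIC:
            raise ValueError("missing PDM32 magic number")
        # Collect zero or more comment lines starting with '#'.
        comments = []
        line = stream.readline()
        while line.startswith(b"#"):
            # Assumption: comments are decoded as UTF-8 (not mandated by the spec).
            comments.append(line[1:].strip().decode("utf-8", errors="replace"))
            line = stream.readline()
        # The first non-comment line holds "<width> <height>\n".
        width_text, height_text = line.decode("ascii").rstrip("\n").split(" ")
        width, height = int(width_text), int(height_text)
        payload = stream.read(4 * width * height)
        if len(payload) != 4 * width * height:
            raise ValueError("truncated image data")
        # '<' selects little-endian, 'f' is an IEEE 754 single precision float.
        depths = struct.unpack(f"<{width * height}f", payload)
        yield comments, width, height, depths

# Two images back to back, with no delimiter between them.
sample = (b"PDM32\n# first\n2 1\n" + struct.pack("<2f", 1.5, 2.5)
          + b"PDM32\n1 1\n" + struct.pack("<f", float("inf")))
images = list(read_pdm_images(io.BytesIO(sample)))
```

Because the generator loops until the stream is exhausted, the same function handles files with one image and files with many.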

The floating point values of zero, not-a-number (NaN) and negative infinity must all be considered as invalid or missing data. Positive infinity may be used to indicate a measurement that is too far away. This can be useful in cases where there's no actual measurement but it's known that there are no obstacles along a particular ray, e.g. in synthetic datasets or rays extending towards the sky.
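These rules can be expressed as a small classification function. This is only a sketch: the function name `classify_depth` and the three labels are hypothetical, and negative finite values are passed through as ordinary values because the specification does not list them among the invalid ones.

```python
import math

def classify_depth(value):
    """Classify one depth value according to the special values listed above."""
    # Zero (including negative zero), NaN and negative infinity mean
    # invalid or missing data.
    if math.isnan(value) or value == 0.0 or value == -math.inf:
        return "invalid"
    # Positive infinity marks a measurement that is too far away.
    if value == math.inf:
        return "too_far"
    # Everything else is treated as a distance in meters.
    return "valid"
```

For example, `classify_depth(float("nan"))` gives `"invalid"` and `classify_depth(2.5)` gives `"valid"`.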

Compression may be optionally provided by some external program such as gzip, bzip2 or xz. The resulting file should have the appropriate extension appended to its name, e.g. foo.pdm would become foo.pdm.gz or foo.pdm.xz for gzip or xz compression respectively.
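A reader or writer can pick the matching decompressor from the appended extension. The sketch below uses only the Python standard library; the helper name `open_pdm` and the extension table are assumptions for illustration, and .bz2 is assumed as the extension for bzip2.

```python
import bz2
import gzip
import lzma
import os
import tempfile

# Map the appended extension to the matching standard-library opener.
OPENERS = {".gz": gzip.open, ".bz2": bz2.open, ".xz": lzma.open}

def open_pdm(path, mode="rb"):
    """Open a PDM file in binary mode, (de)compressing based on its extension."""
    extension = os.path.splitext(path)[1]
    return OPENERS.get(extension, open)(path, mode)

# Round trip of an empty 0x0 image through each variant in a scratch directory.
image = b"PDM32\n0 0\n"
round_trips = {}
with tempfile.TemporaryDirectory() as directory:
    for name in ("foo.pdm", "foo.pdm.gz", "foo.pdm.bz2", "foo.pdm.xz"):
        path = os.path.join(directory, name)
        with open_pdm(path, "wb") as stream:
            stream.write(image)
        with open_pdm(path) as stream:
            round_trips[name] = stream.read()
```

The stream returned by `open_pdm` yields the uncompressed bytes, so the rest of a reader does not need to know whether the file was compressed.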

It is recommended that any other data required to interpret the images, such as camera parameters, be included as human-readable data in the comment section.

#Benchmarks

The file format was benchmarked using the fr1/desk sequence from the TUM RGB-D dataset.

#File size

The following table contains the image size comparison for the first depth image in the dataset (1305031453.374112.png). The lossless PNG optimization was performed using optipng.

| PNG | Optimized PNG | PDM | PDM+gzip | PDM+bzip2 | PDM+LZMA |
|---|---|---|---|---|---|
| 115 KiB | 73 KiB | 1.2 MiB | 81 KiB | 57 KiB | 62 KiB |

The following table contains the image size comparison for all 595 depth images in the dataset. One important thing to note is that when converted to PDM all depth images are placed in the same file. This means that all 4 PDM versions of the dataset consist of a single file instead of 595 individual image files. The lossless PNG optimization was again performed using optipng.

| PNG | Optimized PNG | PDM | PDM+gzip | PDM+bzip2 | PDM+LZMA |
|---|---|---|---|---|---|
| 70 MiB | 44 MiB | 697 MiB | 48 MiB | 33 MiB | 37 MiB |
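The uncompressed PDM sizes follow directly from the format definition. The short calculation below is a sanity check under two assumptions that are not stated in this section: the depth images are 640x480 pixels and the PDM headers contain no comment lines.

```python
WIDTH, HEIGHT = 640, 480  # assumed resolution of the depth images
IMAGE_COUNT = 595
MIB = 1024 * 1024

# Header: magic number line plus the "<width> <height>" line, no comments.
header_bytes = len(b"PDM32\n") + len(f"{WIDTH} {HEIGHT}\n".encode("ascii"))
# Payload: one 4-byte float per pixel.
image_bytes = header_bytes + 4 * WIDTH * HEIGHT

print(f"one image: {image_bytes / MIB:.2f} MiB")
print(f"{IMAGE_COUNT} images: {IMAGE_COUNT * image_bytes / MIB:.0f} MiB")
```

Under these assumptions a single image takes 1,228,814 bytes, which agrees with the 1.2 MiB and 697 MiB figures in the tables.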

#Design goals

The image format design was based on the following goals:

  • Allow storing a wide range of measurements without a significant loss in precision. Depth measurements may range from a few centimeters (in the case of depth cameras) to a few hundred meters (in the case of LIDARs).
  • The scale of the depth values should be unambiguous. People shouldn't have to look for documentation on whether the data is in meters, millimeters or some other unit.
  • Image readers and writers should be easy to implement. No specialized libraries should be required to use the image format.

#Design decisions

#Why not use an existing image format?

It is common to distribute depth images as 16-bit grayscale PNG images. One downside with this approach is that the scaling factor used isn't contained in the image data. Users of the image have to search the dataset documentation to find the appropriate scaling factor to convert 16-bit unsigned integers into floating point values in meters. The limited range of 16-bit unsigned integers also makes it difficult to encode depth images containing a large range of measurements, as is common in outdoor scenes.
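The trade-off between range and resolution in a 16-bit encoding can be made concrete with a little arithmetic. In the sketch below the scaling factors 1000 and 5000 are illustrative choices, and the function names are hypothetical; the encoding assumed is raw = depth in meters * scale.

```python
UINT16_MAX = 65535

def max_depth_m(scale):
    """Largest depth in meters representable when raw = depth * scale."""
    return UINT16_MAX / scale

def resolution_m(scale):
    """Smallest depth step in meters for the same encoding."""
    return 1.0 / scale

for scale in (1000, 5000):
    print(f"scale {scale}: up to {max_depth_m(scale):.3f} m "
          f"in steps of {resolution_m(scale) * 1000:.1f} mm")
```

A finer step shrinks the maximum range, and neither choice reaches the few hundred meters that a LIDAR may report.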

It is possible to save one floating point value per pixel by abusing a PNG RGBA image. However, the scaling factor still needs to be obtained separately from the image data, and when viewed the image looks like something went wrong during encoding.

There are also floating point image formats (e.g. PFM, TIFF, OpenEXR), but they typically assume that values are always in the range [0, 1] inclusive. Because of this, some of the libraries used to read or write these kinds of images will clamp data to this range. It is possible to scale the data to fit in the [0, 1] range, but then the scaling factor is no longer clear, just as in the case of 16-bit grayscale images. It is also possible to exponentiate the negated depth value to obtain a value in the interval [0, 1]. Apart from the fact that the value of the exponent is not clear, the exponentiation of larger values common in LIDAR data introduces loss of precision. The C expression

-log((float) exp(-100.123))

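which simulates storing a depth value using exponentiation and loading it back, returns the value 100.143436, an error of about 2 centimeters. The error arises because exp(-100.123) is so small that it can only be stored as a subnormal single precision float, which keeps just a few significant bits.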
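The same round trip can be reproduced without a C compiler. The sketch below emulates the cast to `float` by packing and unpacking the value as a 32-bit float; the helper name `to_float32` is hypothetical.

```python
import math
import struct

def to_float32(value):
    """Round a Python float (double precision) to the nearest single precision float."""
    return struct.unpack("<f", struct.pack("<f", value))[0]

depth = 100.123
stored = to_float32(math.exp(-depth))  # what a float32 image would contain
recovered = -math.log(stored)          # what a reader would compute
error = recovered - depth

# The stored value is a whole multiple of 2**-149, the smallest positive float32.
print(f"stored = {stored / 2.0 ** -149:.0f} * 2**-149")
print(f"recovered = {recovered:.6f} m, error = {error * 100:.1f} cm")
```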

#Why not add compression to the image format?

Adding compression would complicate the image format and require the use of a compression/decompression library. General purpose compression programs are already installed on most systems and can be used for this purpose. Even though their compression ratio can be lower than that of image-specific compression methods, they are typically good enough, as shown in the benchmarks.

#Why not store the camera parameters in a machine-readable format?

Depth images may be produced by sensors with vastly different projection models. Depth cameras typically use a pinhole camera model, LIDARs use a spherical projection model and an orthographic projection might be used for a heightmap. Accounting for all the potential projection models would make the image format more complex.

#Why fix the scale to meters?

The initial design of the PDM format contained a dedicated scale parameter so that units other than meters could be used. This has the added benefit that, if depth measurements are known to be within a certain range, higher precision can be retained. This wasn't considered an important enough benefit given the precision already afforded by single precision floats in meters. Single precision floats have a precision of 6-7 decimal digits, so even values of a few hundred meters have millimeter precision. This is more than enough for current sensors and applications. If this precision proves insufficient for certain applications, a double precision floating point format can be introduced.
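The precision claim can be checked by measuring the gap between adjacent single precision floats at different distances. The sketch below is valid for positive finite values only, and the function name `float32_spacing` is hypothetical.

```python
import struct

def float32_spacing(value):
    """Gap between value (rounded to float32) and the next larger float32."""
    # Reinterpret the float32 bit pattern as an unsigned 32-bit integer.
    bits = struct.unpack("<I", struct.pack("<f", value))[0]
    current = struct.unpack("<f", struct.pack("<I", bits))[0]
    # For positive finite floats, adding 1 to the bit pattern gives the next float.
    following = struct.unpack("<f", struct.pack("<I", bits + 1))[0]
    return following - current

for meters in (0.05, 1.0, 10.0, 300.0):
    print(f"{meters:7.2f} m -> spacing {float32_spacing(meters) * 1000:.6f} mm")
```

At 300 m the spacing is 2^-15 m, roughly 0.03 mm, which is well below a millimeter.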

#Why use a hybrid text/binary file format?

The textual header allows easy inspection of a file by humans. The binary row-major data allows direct reading and writing of image data, since this is the format in which it's typically stored in memory.
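A writer illustrates how little code the hybrid layout needs. This is a minimal sketch rather than the reference implementation: the function name `write_pdm_image` and its parameters are hypothetical, comments are assumed to be single-line strings, and UTF-8 is an assumed encoding for them.

```python
import io
import struct

def write_pdm_image(stream, width, height, depths, comments=()):
    """Append one PDM image to a binary stream; depths are in row-major order."""
    if len(depths) != width * height:
        raise ValueError("expected width * height depth values")
    # Textual header: magic number, optional comments, then the dimensions.
    stream.write(b"PDM32\n")
    for comment in comments:
        # Assumption: each comment is a single line without newline characters.
        stream.write(b"# " + comment.encode("utf-8") + b"\n")
    stream.write(f"{width} {height}\n".encode("ascii"))
    # Binary payload: little-endian IEEE 754 single precision floats.
    stream.write(struct.pack(f"<{width * height}f", *depths))

buffer = io.BytesIO()
write_pdm_image(buffer, 2, 1, [1.5, float("inf")], comments=["demo"])
encoded = buffer.getvalue()
```

Calling the function repeatedly on the same stream produces a multi-image file, since images simply follow one another.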

#Why store data in little-endian order?

The majority of systems where this format is expected to be used (x86, Linux on ARM) are little-endian. Requiring the data to be in little-endian order allows simplifying the file format by removing the need for a byte order indicator while still allowing the files to be portable on big-endian systems.
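On a little-endian host the payload can be loaded into memory as is, while a big-endian host only needs one extra byte swap. The sketch below shows this with the standard `array` module; the function name `decode_le_floats` is hypothetical, and the code assumes the platform's C `float` is 4 bytes wide.

```python
import array
import sys

def decode_le_floats(data):
    """Decode little-endian 32-bit floats into an array of native floats."""
    values = array.array("f")  # 'f' is the platform's C float (assumed 4 bytes)
    values.frombytes(data)     # copies the bytes in host byte order
    if sys.byteorder == "big":
        values.byteswap()      # only big-endian hosts need to swap
    return values

decoded = decode_le_floats(b"\x00\x00\x80\x3f\x00\x00\x00\x40")
```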

#Why allow multiple images per file?

Having multiple images per file allows storing a dataset in only a few files. For example, if the PPM image format is used for color images, a full dataset could be a PDM file containing all depth images, a PPM file containing all corresponding color images and a text file containing the corresponding poses. Using a single file for all images allows better compression and faster data reading, since opening files is a relatively slow operation.

#License

The Portable Depth Map specification is licensed under the GNU Free Documentation License v1.3 or later. This license covers only the text of the specification. You are free to use PDM images and write code that manipulates PDM images under any license.