The Portable Depth Map (PDM) is a simple image format specifically designed for depth images such as those captured by the Intel RealSense cameras or LIDAR sensors. Its design is inspired by the Netpbm family of image formats.
This is the structure of a PDM image:
PDM32
# Optional comment.
<width> <height>
<width*height 32-bit floats in row-major order containing distances in meters>
Some sample reader and writer implementations are provided here.
A PDM file consists of a sequence of one or more PDM images. There are no delimiters, data or padding of any kind before, between or after PDM images.
PDM files have the `.pdm` extension.
Each PDM image consists of the following, in order:

1. The `PDM32` magic number followed by a newline character (`\n` / `0x0A` / LF).
2. Optional comment lines. Comments start with `#` and extend to the next newline character. They may be ignored by the image reader.
3. The width formatted as ASCII characters in decimal, a space character (`0x20`), the height formatted as ASCII characters in decimal and a newline character. The width and height of the image must be in the range [0, 2^32-1] inclusive.
4. `width * height` IEEE 754 single precision floating point numbers in row-major order. Each float must be stored in little-endian byte order.

The floating point values of zero, not-a-number (NaN) and negative infinity must all be considered invalid or missing data. Positive infinity may be used to indicate a measurement that is too far away. This is useful when there is no actual measurement but it is known that there are no obstacles along a particular ray, e.g. in synthetic datasets or for rays extending towards the sky.
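A reader might apply the rule above with a helper like the following. This is a minimal sketch, and the function name `pdm_is_missing` is hypothetical rather than part of the specification. Negative zero compares equal to zero, so it is also treated as missing. The specification does not say how negative finite values should be handled, so the sketch leaves that decision to the caller.

```c
#include <math.h>
#include <stdbool.h>

/* Hypothetical helper: returns true if a PDM depth value must be treated
   as invalid or missing (zero, NaN or negative infinity). Positive
   infinity is a valid value meaning "too far away / no obstacle". */
bool pdm_is_missing(float d)
{
    return d == 0.0f || isnan(d) || (isinf(d) && d < 0.0f);
}
```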
Compression may optionally be provided by an external program such as `gzip`, `bzip2` or `xz`. The resulting file should have the appropriate extension appended to its name, e.g. `foo.pdm` would become `foo.pdm.gz` or `foo.pdm.xz` for `gzip` or `xz` compression respectively.
It is recommended that any other data required to interpret the images, such as camera parameters, be included as human-readable data in the comment section.
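Here is a minimal writer sketch that stores such data as comment lines. The function name `pdm_write_image` is hypothetical, and so is any `key=value` layout of the comment text. The specification only requires comments to start with `#`. Each float is encoded byte by byte, so the output is little-endian whatever the host byte order.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical writer: appends one PDM image to fp. `comment` may be NULL
   or a string of one or more lines; each line is written as a separate
   "# ..." comment line. Returns 0 on success and -1 on a write error. */
int pdm_write_image(FILE *fp, uint32_t width, uint32_t height,
                    const float *data, const char *comment)
{
    if (fprintf(fp, "PDM32\n") < 0)
        return -1;
    /* Split the comment at newlines and prefix every line with "# ". */
    while (comment != NULL && *comment != '\0') {
        size_t len = strcspn(comment, "\n");
        if (fprintf(fp, "# %.*s\n", (int)len, comment) < 0)
            return -1;
        comment += len;
        if (*comment == '\n')
            comment++;
    }
    if (fprintf(fp, "%lu %lu\n", (unsigned long)width,
                (unsigned long)height) < 0)
        return -1;
    /* Row-major data, each float stored as 4 little-endian bytes. */
    size_t count = (size_t)width * height;
    for (size_t i = 0; i < count; i++) {
        uint32_t bits;
        unsigned char b[4];
        memcpy(&bits, &data[i], sizeof bits);
        b[0] = (unsigned char)(bits & 0xFF);
        b[1] = (unsigned char)((bits >> 8) & 0xFF);
        b[2] = (unsigned char)((bits >> 16) & 0xFF);
        b[3] = (unsigned char)((bits >> 24) & 0xFF);
        if (fwrite(b, 1, sizeof b, fp) != sizeof b)
            return -1;
    }
    return 0;
}
```

Calling the function repeatedly on the same `FILE *` appends further images. The result is still a valid multi-image PDM file, because no delimiters are needed between images.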
The file format was benchmarked using the fr1/desk sequence from the TUM RGB-D dataset.
The following table compares image sizes for the first depth image in the dataset (`1305031453.374112.png`). Lossless PNG optimization was performed using `optipng`.
PNG | Optimized PNG | PDM | PDM+gzip | PDM+bzip2 | PDM+LZMA |
---|---|---|---|---|---|
115 KiB | 73 KiB | 1.2 MiB | 81 KiB | 57 KiB | 62 KiB |
The following table compares image sizes for all 595 depth images in the dataset. Note that when converted to PDM, all depth images are placed in the same file, so each of the 4 PDM versions of the dataset consists of a single file instead of 595 individual image files. Lossless PNG optimization was again performed using `optipng`.
PNG | Optimized PNG | PDM | PDM+gzip | PDM+bzip2 | PDM+LZMA |
---|---|---|---|---|---|
70 MiB | 44 MiB | 697 MiB | 48 MiB | 33 MiB | 37 MiB |
The image format design was based on the following goals:
It is common to distribute depth images as 16-bit grayscale PNG images. One downside with this approach is that the scaling factor used isn't contained in the image data. Users of the image have to search the dataset documentation to find the appropriate scaling factor to convert 16-bit unsigned integers into floating point values in meters. The limited range of 16-bit unsigned integers also makes it difficult to encode depth images containing a large range of measurements, as is common in outdoor scenes.
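Here is a minimal sketch of that conversion. The helper name `depth_from_u16` is hypothetical. The scale must come from the dataset documentation. The TUM RGB-D dataset, for example, documents a factor of 5000, meaning a raw value of 5000 corresponds to 1 m. With that factor the largest representable depth is 65535 / 5000 ≈ 13.1 m. A raw value of 0, commonly used for "no measurement", maps to 0.0f, which PDM also treats as missing.

```c
#include <stdint.h>

/* Hypothetical helper: converts a raw 16-bit depth value to meters using
   a dataset-specific scale given in raw units per meter. */
float depth_from_u16(uint16_t raw, float units_per_meter)
{
    return (float)raw / units_per_meter;
}
```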
It is possible to store one floating point value per pixel by abusing a PNG RGBA image. However, the scaling factor still needs to be obtained separately from the image data, and the image looks as if something went wrong during encoding.
There are also floating point image formats (e.g. PFM, TIFF, OpenEXR), but they typically assume that values are always in the range [0, 1] inclusive. Because of this, some of the libraries used to read or write these kinds of images will clamp data to this range. It is possible to scale the data to fit in the [0, 1] range, but then the scaling factor is no longer clear, as in the case of 16-bit grayscale images. It is also possible to exponentiate the negated depth value to obtain a value in the interval [0, 1]. Apart from the fact that the value of the exponent is not clear, exponentiating the larger values common in LIDAR data introduces a loss of precision. The C expression `-log((float) exp(-100.123))` simulates storing a depth value using exponentiation and then loading it. It returns `100.143436`, an error of 2 centimeters. The loss occurs because `exp(-100.123)` ≈ 3.3 × 10^-44 is far below the smallest normal single precision value (about 1.2 × 10^-38), so it can only be stored as a subnormal float with about 5 significant bits.
Adding compression would complicate the image format and require the use of a compression/decompression library. General purpose compression programs, already installed on most systems, can be used for this purpose instead. Even though their compression ratio can be lower than that of image-specific compression methods, it is typically good enough, as the benchmarks show.
Depth images may be produced by sensors with vastly different projection models. Depth cameras typically use a pinhole camera model, LIDARs use a spherical projection model and an orthographic projection might be used for a heightmap. Accounting for all the potential projection models would make the image format more complex.
The initial design of the PDM format contained a dedicated scale parameter so that units other than meters could be used. This has the added benefit that, if depth measurements are known to lie within a certain range, higher precision can be retained. This benefit was not deemed important enough, considering the precision already afforded by single precision floats in meters. Single precision floats have a precision of 6-7 decimal digits, so even values of a few hundred meters have millimeter precision. This was deemed more than enough for current sensors and applications. If this precision proves insufficient for certain applications, a double precision floating point format can be introduced.
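One way to check the precision claim is to measure the gap between a float and the next representable value. This sketch assumes IEEE 754 single precision floats and a positive, finite input below `FLT_MAX`. The helper `float_spacing` is hypothetical. At 300 m the gap is 2^-15 m, roughly 0.03 mm, well below a millimeter.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: returns the distance from a positive, finite float
   x (below FLT_MAX) to the next larger representable float, i.e. the
   finest resolution available at x. */
float float_spacing(float x)
{
    uint32_t bits;
    float next;
    memcpy(&bits, &x, sizeof bits);
    /* For positive finite floats, incrementing the bit pattern yields
       the next larger value. */
    bits += 1;
    memcpy(&next, &bits, sizeof next);
    return next - x;
}
```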
The textual header allows easy inspection of a file by humans. The binary row-major data allows direct reading and writing of image data, since this is the layout in which it is typically stored in memory.
The majority of systems where this format is expected to be used (x86, Linux on ARM) are little-endian. Requiring the data to be in little-endian order allows simplifying the file format by removing the need for a byte order indicator while still allowing the files to be portable on big-endian systems.
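Portable little-endian encoding and decoding needs only shifts and `memcpy`. This is a minimal sketch assuming the host `float` is IEEE 754 single precision. The names `pdm_load_le_f32` and `pdm_store_le_f32` are hypothetical. On little-endian hosts the result matches a plain `memcpy` of the bytes. On big-endian hosts the shifts perform the byte swap.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: decodes a float stored as 4 little-endian bytes.
   Assembling the value with shifts makes it independent of host order. */
float pdm_load_le_f32(const unsigned char b[4])
{
    uint32_t bits = (uint32_t)b[0]
                  | (uint32_t)b[1] << 8
                  | (uint32_t)b[2] << 16
                  | (uint32_t)b[3] << 24;
    float f;
    memcpy(&f, &bits, sizeof f);
    return f;
}

/* Hypothetical helper: encodes a float as 4 little-endian bytes. */
void pdm_store_le_f32(float f, unsigned char b[4])
{
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);
    b[0] = (unsigned char)(bits & 0xFF);
    b[1] = (unsigned char)((bits >> 8) & 0xFF);
    b[2] = (unsigned char)((bits >> 16) & 0xFF);
    b[3] = (unsigned char)((bits >> 24) & 0xFF);
}
```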
Having multiple images per file allows storing a dataset in only a few files. For example, if the PPM image format is used for color images, a full dataset could consist of a PDM file containing all depth images, a PPM file containing all corresponding color images and a text file containing the corresponding poses. Using a single file for all images allows better compression and faster data reading, since opening files is a relatively slow operation.
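Because images are simply concatenated, a reader can keep reading until it reaches end of file. This is a minimal sketch under several assumptions. The names `pdm_read_image` and `read_u32` and the return convention are hypothetical. Comments are only accepted directly after the magic number, and the parser is strict about the single space and newline in the size line.

```c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Parses a decimal number in [0, 2^32-1] terminated by `end`.
   Returns 0 on success, -1 on malformed input. */
static int read_u32(FILE *fp, int end, uint32_t *out)
{
    uint64_t v = 0;
    int c, digits = 0;
    while ((c = fgetc(fp)) >= '0' && c <= '9') {
        v = v * 10 + (uint64_t)(c - '0');
        if (v > UINT32_MAX)
            return -1;
        digits++;
    }
    if (digits == 0 || c != end)
        return -1;
    *out = (uint32_t)v;
    return 0;
}

/* Hypothetical reader: reads the next image from fp. Returns 1 if an image
   was read, 0 at a clean end of file and -1 on malformed input.
   On success *data is malloc'ed and must be freed by the caller. */
int pdm_read_image(FILE *fp, uint32_t *width, uint32_t *height, float **data)
{
    char magic[6];
    size_t n = fread(magic, 1, sizeof magic, fp);
    if (n == 0 && feof(fp))
        return 0; /* no more images */
    if (n != sizeof magic || memcmp(magic, "PDM32\n", sizeof magic) != 0)
        return -1;
    /* Skip comment lines: '#' up to and including the next newline. */
    int c;
    while ((c = fgetc(fp)) == '#') {
        while ((c = fgetc(fp)) != '\n' && c != EOF)
            ;
        if (c == EOF)
            return -1;
    }
    if (c == EOF || ungetc(c, fp) == EOF)
        return -1;
    if (read_u32(fp, ' ', width) != 0 || read_u32(fp, '\n', height) != 0)
        return -1;
    uint64_t count = (uint64_t)*width * *height;
    if (count > SIZE_MAX / sizeof(float))
        return -1;
    *data = malloc(count ? (size_t)count * sizeof(float) : 1);
    if (*data == NULL)
        return -1;
    /* Decode each little-endian float independently of host byte order. */
    for (uint64_t i = 0; i < count; i++) {
        unsigned char b[4];
        if (fread(b, 1, sizeof b, fp) != sizeof b) {
            free(*data);
            *data = NULL;
            return -1;
        }
        uint32_t bits = (uint32_t)b[0] | (uint32_t)b[1] << 8
                      | (uint32_t)b[2] << 16 | (uint32_t)b[3] << 24;
        memcpy(&(*data)[i], &bits, sizeof bits);
    }
    return 1;
}
```

The data is row-major, so pixel `(x, y)` of an image is `data[(size_t)y * width + x]`.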
The Portable Depth Map specification is licensed under the GNU Free Documentation License v1.3 or later. This license covers only the text of the specification. You are free to use PDM images and write code that manipulates PDM images under any license.