A Quadtree-Based Lightweight Data Compression Approach to Processing Large-Scale Geospatial Rasters

Summary

Huge amounts of geospatial rasters, such as remotely sensed imagery and environmental modeling output, are being generated with increasingly finer spatial, temporal, spectral and thematic resolutions. Given that CPUs on modern computer systems are three orders of magnitude faster than disk I/O speed and two orders of magnitude faster than network bandwidth, it becomes more and more beneficial to allocate computing power for real time compression and decompression to reduce I/O times in streaming large-scale geospatial rasters among CPU memory, disks and distributed computing nodes and file systems. In this study, we aim at developing a lightweight lossless data compression technique that balances the performance between compression and decompression for large-scale geospatial rasters. Our Bitplane bitmap Quadtree (or BQ-Tree) based technique encodes the bitmaps of raster bitplanes as compact quadtrees which can compress and index rasters simultaneously. The technique is simple by design and lightweight by implementations. Apart from to computing Z-order codes for cache efficiency, only bit level operations are required. Extensive experiments using 36 rasters of the NASA Shuttle Range Topography Mission (SRTM) 30 meter resolution elevation data with 20 billion raster cells have shown that our BQ-Tree technique is more than 4X faster for compression and 36% faster for decompression than zlib using a single CPU ore while achieving very similar compression ratios. Our technique further has achieved 10-13X speedups for compression and 4X speedups for decompression using 16 CPU cores. On our experiment machine equipped with dual Intel Xeon 8-core E5-2650V2 CPUs, our technique is able to compress the SRTM raster dataset in about 35 seconds and decompress back in 29 seconds. The result compares favorably with the best known technique with respect to both compression and decompression throughputs. We have made the source code package and a subset of the SRTM raster publically available to facilitate validation and cross comparisons.

Code, Data and Steps to Repeat Experiments

The following three programs (C++ code) allows interested readers to repeat our experiments using a sample dataset.
1) The code can be downloaded as gzipped tar ball by following this link.
2) The sample raster in BIL format can be downloaded as gzipped tar ball by following this link. (Warning: 672 MB).  The raster tile is the third quadrant of the fourth raster data file and has a dimension of 41400*18000 (~1.5 GB uncompressed) with a geographical coverage of (-111.5,33.0,-99.0, 38). More detailed information of the raster tile can be obtained by using the gdalinfo tool in the GDAL libary. 

To repeat our experiments using the sample dataset:
1) download and unzip the above two tar balls to your directory.
2) compile the four C++ programs by following the example command line for compilation at the beginning of each program. You can easily compile the programs to serial implementations by removing "-fopenmp" flag.
3) run bqt_omp_decompression, bqt_omp_compression, zlib_omp_compression and zlib_omp_decompression by following the example command line for execution at the beginning of each program. You will need to recompile to switch between the serial and multi-core implementations.

Related Publications

Jianting Zhang and Simin You (2014) . A Quadtree-Based Lightweight Data Compression Approach to Processing Large-Scale Geospatial Rasters . Technical Report [PDF].