HDF: Hierarchical Data Format

This is intended as a short introduction to the HDF data format. The complete HDF documentation can be found at http://www.hdfgroup.org.


Background

HDF (Hierarchical Data Format) is a file format for sharing data in a distributed environment, designed primarily for scientific data. HDF was developed at the National Center for Supercomputing Applications (NCSA) at the University of Illinois at Urbana-Champaign.

In 2006, The HDF Group (THG), a spin-off from the HDF group at NCSA, was formed as a not-for-profit corporation. Its mission is to sustain the HDF technologies, maintain the HDF libraries and tools, and support HDF user communities worldwide with production-level software and services. The project is open source, so the libraries and utilities are freely available.

What is HDF?

At its lowest level, HDF is a physical file format for storing scientific data. At its highest level, HDF is a collection of utilities and applications for manipulating, viewing, and analyzing data in HDF files. Between these levels, HDF is a software library that provides high-level APIs and a low-level data interface.

The HDF library provides non-sequential (random) access to individual records within a file, and HDF files are fully portable across computer architectures and operating systems. Integer and floating-point numbers are stored in binary rather than as text, which makes files considerably smaller and faster to read and write. The binary data are stored in an architecture-independent representation while remaining compliant with the IEEE specifications. HDF also has its own built-in compression, so data can be compressed automatically when required; like the storage format itself, this compression is independent of architecture and operating system.
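The points about binary storage, byte order and compression can be illustrated with Python's standard library. This is a sketch of the general principle only, not the HDF on-disk format; the helper names encode_text, encode_binary and decode_binary are hypothetical. Writing doubles with an explicit byte order (here big-endian, ">") produces identical bytes on every machine, and zlib stands in for a generic lossless compressor.

```python
import math
import struct
import zlib


def encode_text(values):
    """Decimal text, one value per line, as a plain ASCII file might store them."""
    return "\n".join(repr(v) for v in values).encode("ascii")


def encode_binary(values):
    """IEEE 754 doubles with a fixed big-endian byte order ('>'), 8 bytes each."""
    return struct.pack(f">{len(values)}d", *values)


def decode_binary(data):
    """Inverse of encode_binary; gives the same numbers on any platform."""
    return list(struct.unpack(f">{len(data) // 8}d", data))


values = [math.sin(i * 0.01) for i in range(10_000)]
text = encode_text(values)
binary = encode_binary(values)
packed = zlib.compress(binary)  # lossless compression of the binary stream

print("text bytes:      ", len(text))
print("binary bytes:    ", len(binary))
print("compressed bytes:", len(packed))

# Decompressing and decoding restores every value bit for bit.
assert decode_binary(zlib.decompress(packed)) == values
```

Most of these values need around 18 characters of text each, whereas the binary form always uses 8 bytes per value; how much compression saves depends on how regular the data are.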

The latest version, HDF5, was developed to address the growing difficulty of storing, accessing and sharing complex, voluminous data from a wide range of disciplines. It was created for the data management needs of scientists and engineers working in high-performance, data-intensive computing environments, so the HDF5 library and format emphasize storage and I/O efficiency. For instance, HDF5 can store a dataset in several ways, such as compressed or chunked (split into fixed-size blocks that are stored and read independently). These features give HDF5 advantages over many other common data formats, and it is widely used.
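To make chunked, compressed storage more concrete, here is a deliberately simplified model in Python. ChunkedArray is a hypothetical class, not part of any HDF library: it splits a 1-D list of doubles into fixed-size chunks and compresses each chunk separately, so reading a small slice only decompresses the chunks that overlap it. Real HDF5 chunks can be multi-dimensional, are located through an index structure, and can use configurable filters.

```python
import struct
import zlib


class ChunkedArray:
    """Toy 1-D array of doubles stored as independently compressed chunks.

    Hypothetical illustration of the idea behind chunked storage; it is not
    the HDF5 implementation.
    """

    def __init__(self, values, chunk_size):
        if chunk_size <= 0:
            raise ValueError("chunk_size must be positive")
        self.chunk_size = chunk_size
        self.length = len(values)
        self.chunks = []
        for start in range(0, len(values), chunk_size):
            block = values[start:start + chunk_size]
            raw = struct.pack(f"<{len(block)}d", *block)  # little-endian doubles
            self.chunks.append(zlib.compress(raw))       # each chunk compressed on its own
        self.chunks_decompressed = 0  # how many chunks reads have had to unpack

    def _load_chunk(self, index):
        self.chunks_decompressed += 1
        raw = zlib.decompress(self.chunks[index])
        return struct.unpack(f"<{len(raw) // 8}d", raw)

    def read(self, start, stop):
        """Return elements [start, stop), unpacking only the chunks that overlap."""
        if not 0 <= start <= stop <= self.length:
            raise IndexError("slice out of range")
        if start == stop:
            return []
        out = []
        first = start // self.chunk_size
        last = (stop - 1) // self.chunk_size
        for index in range(first, last + 1):
            chunk = self._load_chunk(index)
            base = index * self.chunk_size
            lo = max(start, base) - base
            hi = min(stop, base + len(chunk)) - base
            out.extend(chunk[lo:hi])
        return out


data = [float(i) for i in range(1000)]
arr = ChunkedArray(data, chunk_size=100)
print(arr.read(250, 260), arr.chunks_decompressed)
```

With 1000 values in chunks of 100, reading ten values from the middle unpacks one chunk out of ten; an unchunked, compressed array would have to be decompressed up to the requested position.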

Why use HDF?

The HDF-EOS project: In 1993 NASA chose HDF as the standard file format for storing data from the Earth Observing System (EOS), a data-gathering system of sensors (mainly on satellites) that supports the Global Change Research Program. HDF-EOS is still supported and maintained by NCSA.

What's in an HDF file?

Earlier versions of HDF (HDF4) support several types of object: raster images, palettes, tables, and multi-dimensional arrays. The latest version of HDF (HDF5) stores two primary types of object: datasets and groups. A dataset is essentially a multi-dimensional array of data elements, and a group is a structure for organising objects in an HDF5 file, much as a directory organises files.
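The way groups contain datasets and other groups can be sketched as a small tree. In HDF5, objects are addressed by slash-separated path names starting from the root group "/", and the sketch below mimics that. The classes Group and Dataset and the methods add_group, add_dataset, get and walk are hypothetical names for this illustration only; they are not the API of the HDF5 library or of any binding to it.

```python
import math


class Dataset:
    """A multi-dimensional array, modelled as a shape plus a flat list of elements."""

    def __init__(self, shape, data):
        if len(data) != math.prod(shape):
            raise ValueError("data length does not match shape")
        self.shape = tuple(shape)
        self.data = list(data)


class Group:
    """A container mapping member names to Datasets or other Groups."""

    def __init__(self):
        self.members = {}

    def _parent_and_name(self, path):
        parts = [p for p in path.split("/") if p]
        if not parts:
            raise ValueError("path must name an object below this group")
        node = self
        for part in parts[:-1]:
            node = node.members.setdefault(part, Group())  # create missing groups
            if not isinstance(node, Group):
                raise TypeError(f"{part!r} is a dataset, not a group")
        return node, parts[-1]

    def add_group(self, path):
        parent, name = self._parent_and_name(path)
        node = parent.members.setdefault(name, Group())
        if not isinstance(node, Group):
            raise TypeError(f"{path!r} already exists as a dataset")
        return node

    def add_dataset(self, path, shape, data):
        dataset = Dataset(shape, data)  # validate before touching the tree
        parent, name = self._parent_and_name(path)
        if name in parent.members:
            raise KeyError(f"{path!r} already exists")
        parent.members[name] = dataset
        return dataset

    def get(self, path):
        node = self
        for part in (p for p in path.split("/") if p):
            if not isinstance(node, Group):
                raise KeyError(path)
            node = node.members[part]
        return node

    def walk(self, prefix=""):
        """Yield (path, object) pairs depth-first, in name order."""
        for name in sorted(self.members):
            child = self.members[name]
            path = f"{prefix}/{name}"
            yield path, child
            if isinstance(child, Group):
                yield from child.walk(path)


root = Group()  # plays the role of the root group "/"
root.add_dataset("/observations/2006/temperature", (2, 3), [1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
root.add_group("/metadata")
for path, obj in root.walk():
    print(path, type(obj).__name__)
```

Adding the dataset creates the intermediate groups "/observations" and "/observations/2006" automatically, which is one reasonable design choice; a stricter model could require each group to be created explicitly.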

How can I read HDF files?

Some pointers to HDF tools are available from the HDF Group website at:

http://www.hdfgroup.org/tools/

The NCAR Command Language (NCL) can read and write HDF4 files. More details are available at:

http://www.ncl.ucar.edu/Document/Manuals/Ref_Manual/NclFormatSupport.shtml#HDF
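HDF4 and HDF5 are different file formats read by different libraries, so before choosing a tool it can help to know which one a file uses. The sketch below checks the leading signature bytes. It assumes, as we understand the format specifications, that an HDF4 file begins with the 4-byte magic number 0x0e031301 and that an HDF5 superblock begins with the 8-byte signature \x89HDF\r\n\x1a\n, located at offset 0 or, when a user block precedes it, at 512, 1024, 2048, ... bytes. The function names sniff_hdf_version and demo are hypothetical; this only identifies the format, and reading the contents still requires one of the libraries or tools mentioned above.

```python
import os
import tempfile

# Signature bytes as we understand the format specifications (assumption):
HDF5_SIGNATURE = b"\x89HDF\r\n\x1a\n"  # 8 bytes that open an HDF5 superblock
HDF4_MAGIC = b"\x0e\x03\x13\x01"        # first 4 bytes of an HDF4 file


def sniff_hdf_version(path):
    """Return "HDF4", "HDF5" or None based on the file's signature bytes."""
    size = os.path.getsize(path)
    with open(path, "rb") as f:
        if f.read(len(HDF4_MAGIC)) == HDF4_MAGIC:
            return "HDF4"
        # An HDF5 file may start with a user block, so the signature is looked
        # for at offset 0 and then at 512, 1024, 2048, ... bytes.
        offset = 0
        while offset + len(HDF5_SIGNATURE) <= size:
            f.seek(offset)
            if f.read(len(HDF5_SIGNATURE)) == HDF5_SIGNATURE:
                return "HDF5"
            offset = 512 if offset == 0 else offset * 2
    return None


def demo():
    """Classify a few synthetic files; only their signature bytes are realistic."""
    samples = {
        "plain.h5": HDF5_SIGNATURE + bytes(100),
        "userblock.h5": bytes(512) + HDF5_SIGNATURE + bytes(100),
        "legacy.hdf": HDF4_MAGIC + bytes(100),
        "notes.txt": b"not an HDF file",
    }
    results = {}
    with tempfile.TemporaryDirectory() as tmp:
        for name, content in samples.items():
            path = os.path.join(tmp, name)
            with open(path, "wb") as f:
                f.write(content)
            results[name] = sniff_hdf_version(path)
    return results


print(demo())
```

The synthetic files are not valid HDF files beyond their first bytes, so a check like this can only suggest which library to try, not guarantee that the file will open.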