Format Specification for Data Exchange

By
Steven E. Gaines
R. Stephen Hipskind

Version 1.2
12 January 1998

Version 1: 16 May 1990
Version 1.1: 6 February 1992

R.S.H. Voice: 650/604-5076  FAX: 650/604-3625  Internet: hipskind@cloud1.arc.nasa.gov
S.E.G. Voice: 650/604-4546  FAX: 650/604-3625  Internet: gaines@cloud1.arc.nasa.gov

Contents
--------

Preface to Version 1.2
Preface
1 Introduction
2 File Naming Convention
   2.1 Aircraft data files
   2.2 Ozonesonde data files
   2.3 Radiosonde data files
3 Concepts and Structures
4 Implementation Considerations
5 Notation
6 Definitions
7 ASCII File Format Specifications
   7.1 Summary of data record formats
8 Examples

Preface to Version 1.2
----------------------

The file format standards defined in Version 1.2 are the same as those in the two previous versions. The major changes in this version are to emphasize the requirement that independent variables be monotonic, and to slightly change the notation for several parameters to show their dependence on the unbounded independent variable. In previous versions those dependencies have been implied, and shown by examples, but not clearly stated. Other minor changes have been made throughout the text to help clarify the concepts and requirements of the format standards, and variable definitions in some of the examples have been condensed.
We appreciate the feedback we have received from users of the exchange files, and have used it as a guide for clarifying the standards.

Preface
-------

This document specifies format standards to be used to facilitate data exchange for aircraft missions managed by the Earth Science Division at NASA Ames Research Center. It is intended as a reference document for creating experimental datasets. The standards should be adhered to in all exchange files being contributed to the project archive, including instrumental measurements, theoretical calculations and operational data.

It is important that the person responsible for actually generating a given dataset refer to this document when determining the format for that dataset. It is the responsibility of the principal investigator or team leader to make sure that the appropriate people have access to this document and that their data conforms to the format standards.

The specifications described in this document grew out of an effort beginning with the 1987 Stratosphere Troposphere Exchange Project (STEP) to put the experimental aircraft data on a medium and in a format that would be accessible to all experiment participants during the field experiment. Flight planning could then take into account the data from a previous flight, increasing the likelihood of meeting the overall goals and objectives of a given campaign. The standards developed for STEP were also used in the 1987 Airborne Antarctic Ozone Experiment (AAOE) and the 1989 Airborne Arctic Stratospheric Expedition (AASE).

The basic premise in specifying format standards was to create self-descriptive datasets using a prescribed header structure to contain information about the data in a given file. The STEP experiment used only the ER-2 aircraft and consisted primarily of in situ, time series data. It was this single dimensional data for which the original format specification was written.
The use of remote sensing instruments on both the DC-8 and ER-2, and the generation of model output files creates multi-dimensional data which do not fit well into that original format. To better account for the variety of data, we have formalized some of the original concepts and have extended the requirements for header information to more adequately characterize the data. The file header now must include explicit specification of data dimensionality. We have also allowed for a more flexible specification of the data structure by the use of a file format index, described in the text.

In writing this new format specification, a conscious effort was made to retain much of the progress made in past experiments towards creating an environment of free data exchange. We have tried to build a logical extension to the original concepts rather than making a radical departure from them. Those with experience in the previous experiments (STEP, AAOE and AASE) should recognize that, in many cases, the format for their exchange files will remain much the same under these new specifications with only relatively minor, but important, changes to the header entries.

We want to acknowledge the fact that this document is the product of interactions between the authors and the experiment participants; indeed, the idea of data exchange standards was originally driven by a consensus of the participants, not by the "data managers". We appreciate the feedback that we have received, both written and oral, and have incorporated many of the suggestions into the final document. We also want to encourage everyone to feel free to contact us if they have any questions, problems or suggestions.
1 Introduction
---------------

This document describes a conceptual framework for specifying exchange data formats, and then gives a detailed description of the standard formats (although there are very important distinctions between measured quantities and those resulting from mathematical model calculations, for simplicity, the two will often be loosely termed "data"). Those considering writing exchange files must review the format options presented in this document and determine the format most suitable for recording their data. If none is deemed suitable, consult with the project archive manager to define a new format option. New file formats will be circulated to project participants as addenda to this document.

The primary goal of instituting format standards for data exchange is to promote accessibility and ease of use of a variety of datasets from different instruments, platforms and numerical models. The specific goals of the proposed system are:

* The exchange files must be readable on all computer systems commonly in use. These include PCs (MS DOS), Apple Macs, DEC VMS and Unix systems.

* The exchange files must be self-describing, such that the information needed to read the data is contained in an order-dependent file header, and the minimum information required to analyze the particular dataset is contained within the file.

* Maintain as much compatibility as possible with existing formats from previous experiments, while allowing flexibility to handle new datasets and formats.

* Minimize the amount of software required to access diverse datasets by categorizing the datasets and allowing a minimal number of data formats.

The complexity of any system of standards increases with increasing generality. This standard represents a compromise between simplicity and generality.
The generality of the proposed system stems from the incorporation of a file format index which, by referencing pre-defined format options, defines the format of both the file header and the data records. Thus, new file formats can be incorporated at a future time without changing those defined in this document. The complexity of this system increases with the number of file formats, so an attempt has been made to minimize the number of format options while at the same time accommodate the existing standard data formats from previous experiments; the file header formats are, however, different from the older standards.

An additional advantage of standardized file formats is that the data files can more easily be checked for format errors. Plans for future field experiments include computer programs to check the format of each data file as part of the procedure for submittal to the data archive.

The system described here assumes that all exchange files are in ASCII, because ASCII coded files are the most universally readable across computer systems from different vendors. It is anticipated that the same standards can be extended to include binary files as well. However, before that can be done, a convention for external data representation must be agreed upon due to the differences in internal representation on different machines. A special naming convention for binary files would also have to be adopted.

Guidelines for choosing file names are given in Section 2. Section 3 describes the basic structure of the data files, the types of variables, and how they regulate the format specifications. Section 4 describes some precautionary measures to ensure readable files. The array and implied loop notation used to specify the formats are defined in Section 5, and a collection of definitions of the variables and terminology is contained in Section 6. The file format specifications are given in Section 7, with a summary of data record formats at the end of the Section.
An example of each standard format is given in Section 8.

Since any particular format option can accommodate a variety of types of data, the concepts, terminology, and format specifications are first presented in an abstract manner so as not to bias or narrow their definition. The examples in Section 8 are included to provide a tangible link between the abstract definitions and actual exchange data files. It will, therefore, be useful to refer to these examples while (or before) reading the rest of this document.

2 File Naming Convention
-------------------------

The main objective of a file naming convention is to convey as much information about the file as possible within the limited number of characters allowed for the name. Because of MS DOS limitations, file names must be limited to a maximum of eight characters, followed by a period, and up to three characters in the file name extension. This is a severe restriction considering the many and varied types of measurements and modeling results that are being exchanged. For this reason, there can be no hard and fast rules for file names, but some guidelines for choosing file names are now offered, and the exceptions will be handled as they arise.

In general, the file name consists of a two character prefix which, together with the extension, uniquely identifies the measurement or instrument. The prefix is followed by a six digit number which specifies the UT year, month, and day on which the data within the file begins. The file name extension is separated from the file name by a period, and consists of at most three characters which may identify the measuring platform, a flight number or time, or a volume number to allow for continuation of large files requiring more than one volume of the exchange medium (diskettes).

To eliminate conflicting file names, the two character file name prefixes must be decided upon prior to an experiment. Also, the extensions may have to be modified to accommodate new data collection platforms.
It is suggested, however, that no matter what the platform, if the data within the file pertains to a particular date or time period then the file name should indicate the UT date on which the data begins. Several suggestions and examples will now be given.

2.1 Aircraft data files
------------------------

Aircraft data files have the standard file name of a two character prefix followed by a six digit number indicating the UT year, month, and day of takeoff. The three character extension starts with either an E (for ER-2) or a D (for DC-8), followed by an A or B to indicate the first or second flight on the indicated day, and ends with a number 1, 2, 3, etc., to indicate the volume number. Some examples are (the prefix MM stands for some arbitrary measurement):

   MM910116.EA1.....first ER-2 flight on 16 January 1991, volume 1
   MM910116.DB1.....second DC-8 flight on 16 January 1991, volume 1

2.2 Ozonesonde data files
--------------------------

If the ozonesonde data file contains sounding data for a single flight at a particular launch site then use the two character prefix to denote the launch site, followed by a six digit number indicating the UT date of launch. The first character of the extension might be a B (balloon), followed by two digits indicating the UT hour of launch. For example:

   LW910116.B17.....17 Z launch at Lerwick on 16 January 1991

Alternatively, an exchange file might include all ozonesonde soundings from a particular launch site for the duration of a particular mission. For this case, the two character prefix could denote the instrument, and the remaining six characters could be used to indicate the UT date on which the observations begin. The extension could denote the platform and launch site.
For example:

   OS890116.BLW.....Lerwick soundings starting 16 January 1989

2.3 Radiosonde data files
--------------------------

Radiosonde data files present a problem because there are so many launch sites, and possible launch times, that it is difficult to uniquely describe the location and launch time with 11 characters. For this reason, all radiosonde soundings for a particular time period might be lumped together in one file with a name like:

   RS910116.B12.....radiosonde soundings for 12 Z on 16 January 1991

3 Concepts and Structures
--------------------------

The reason for writing an exchange file is to convey some measured, calculated, or otherwise derived quantity, which will be called the PRIMARY variable. There may be more than one PRIMARY variable in a given exchange file. In addition, there may be some ancillary information concerning the measurement, calculation, or interpretation of the PRIMARY variables or the data records containing their values. These secondary quantities will be referred to as AUXILIARY variables. Usually the inclusion of AUXILIARY variables is optional, but there are some format options in which they are required because they provide information about the ensuing data records.

Both PRIMARY and AUXILIARY variables are considered as dependent variables, and are always recorded with reference to at least one INDEPENDENT variable. INDEPENDENT variables can be time, spatial coordinates, index values, or any other monotonic quantity that can be used to uniquely identify a particular PRIMARY variable value. Each INDEPENDENT variable represents a dimension on which the PRIMARY variables are dependent. PRIMARY variables are considered as discrete functions of the INDEPENDENT variables, whereas AUXILIARY variables are associated with an explicitly recorded INDEPENDENT variable.

The information recorded within exchange files is of two types, either numeric or character string.
Character strings may contain any printable ASCII character (ASCII decimal values between 32 and 126 inclusive), whereas numeric values are restricted to characters 0 through 9, the plus sign, the minus sign, the period, and the letter E used in exponential notation. Except for the purpose noted in Section 4, an exchange file must not contain non-printable ASCII characters. The non-printable characters have ASCII decimal values of 0 through 31, and values greater than 126.

Each exchange file has a file header which conveys information about the PRIMARY, AUXILIARY, and INDEPENDENT variables, and the order in which they are recorded in the file. Rather than attempt to pre-define a single file header format which accounts for all existing data formats, as well as any future formats, a File Format Index (FFI) is used to uniquely define the exchange file format. By reference to pre-defined format options, the value of the FFI determines the number of INDEPENDENT variables, whether the values of the INDEPENDENT and dependent variables are numeric or character string, the format of the file header, and the format of the data records.

Included in the file header are descriptions and/or units of measure for the INDEPENDENT, PRIMARY, and AUXILIARY variables. All variables must be defined in the records in which they are expected to appear, and cannot be omitted or have blank spaces substituted for their values. Associated with each PRIMARY and AUXILIARY variable is a "missing" value to denote missing or erroneous data values. These missing values must be larger than any "good" data values recorded within the file so that a simple test on the magnitude of a data value will determine if it represents missing or usable data. A scale factor is associated with each numeric PRIMARY and AUXILIARY variable.
The scale factors are included to encourage recording of the data as scaled whole numbers, without a decimal point or exponential notation, and thus reduce the size of the file. There are no scale factors or missing values for the INDEPENDENT variables. The order in which the PRIMARY and AUXILIARY variables are defined in the file header is the same order in which they are recorded in the data records. The order in which the INDEPENDENT variables are defined in the file header determines the dependence of the PRIMARY variables on the INDEPENDENT variables and, therefore, the manner in which the PRIMARY variables are recorded. The recorded dependence of the PRIMARY variables on the INDEPENDENT variables is such that, from the point of view of writing the data records, the most rapidly varying dimension is listed first in the file header, and the most slowly varying dimension is listed last. If the number of values in the most slowly varying dimension is not pre-determined (as with the time dimension in many cases) then it is termed the unbounded dimension. Of necessity, only one dimension, or INDEPENDENT variable, can be unbounded, while the others, if any, must be bounded. The number of values in the bounded dimensions are defined in either the file header or the data records. Values of the unbounded INDEPENDENT variable are explicitly recorded at pre-determined locations within the data records and are termed INDEPENDENT VARIABLE MARKS. The AUXILIARY variables, if any, are specified immediately after the INDEPENDENT VARIABLE MARKS, either within the same record or the subsequent records. The unbounded INDEPENDENT variable must be a monotonic quantity. The bounded INDEPENDENT variables, for a given INDEPENDENT VARIABLE MARK, must also be monotonic. As an illustration, consider airborne lidar measurements of ozone, recorded as a time sequence of vertical profiles of ozone. 
For this example, ozone number density is the PRIMARY variable, altitude above Earth's surface is the bounded INDEPENDENT variable, and a monotonic measure of time is the unbounded INDEPENDENT variable. The same records which contain time may also contain AUXILIARY variables. Since ozone values at all recorded altitudes are read for each time mark, the dependence on altitude is considered the more rapidly varying dependence, so altitude is the first INDEPENDENT variable defined in the file header; time is defined second in the file header.

An examination of the standard formats in Section 7 reveals that there are several file formats which could accommodate the data in this example; the selection of the most appropriate format depends on the nature of the altitude measurements. If the values of altitude are constant then FFI 2010 could be used, with the monotonic, constant altitudes defined in the file header. In this instance, AUXILIARY variables are optional, but one may wish to include additional information, say, aircraft longitude and latitude, in the AUXILIARY variable list. If the altitude values are variable, but the interval between the altitudes is constant, then FFI 2310 is more appropriate. In this instance the number of altitudes, base altitude value, and altitude increment are supplied in the AUXILIARY variable list. If the altitude values and the intervals between altitudes are variable then FFI 2110 is the most appropriate option, with the number of altitudes given in the AUXILIARY variable list, and the altitude values read from the records containing the ozone values. In these last two instances (FFI 2310, 2110), the indicated AUXILIARY variables are required, in the sense that they provide necessary information for reading subsequent data records, but one still has the option to include additional AUXILIARY variables.
Also, the values of the bounded INDEPENDENT variable (altitude) in FFI 2110 and 2310 can be different for each INDEPENDENT VARIABLE MARK (time mark) and, therefore, are dependent on the unbounded INDEPENDENT VARIABLE. For each time mark the altitude values must be monotonic. The file header also contains information on the originators of the exchange file and their affiliations, the source of the PRIMARY variables, the mission which the data supports, and (by popular demand) the number of lines in the file header. The originators will often be the principal investigators for a particular instrument or model simulation, and the instrument and platform, or model, will be the source. At the beginning of each mission, a mission name will be decided upon and used in all exchange files. Also included, are the date for which the data applies, the date the data was reduced or revised (not necessarily the date the file was written, although the two may be the same), the volume number for the exchange file, and the total number of volumes required to record the complete dataset. For large datasets requiring more than one volume of the medium on which they are written (diskette, etc.), the data is continued in a new file, on a new volume, and after a file header with an incremented volume counter (see IVOL, NVOL in Section 6). There are allowances for three types of comments in the file header. Two of these comment types have reserved locations within the file header, and are associated with counters defining the number of lines occupied by each type of comment. The first type is for more complete descriptions of the variables, instrument, or other comments that apply in general to all of a particular kind of dataset; these are called normal comments. The second type, called special comments, are reserved to note special problems or circumstances concerning the data within a specific exchange file. 
If the exchange file is a revised dataset then it is recommended that the special comments describe how it differs from the previous version of the dataset. The third type of comment is merely an annotation which may follow numeric values; these comments must be contained on the same line as, and separated by at least one space from, the last numeric value expected in the record. They should not be included in lines containing character values because the annotations cannot easily be separated from the character string values.

The data records immediately follow the file header records and continue to the end of the file. One or more spaces (ASCII decimal value 32) delimit successive numeric values within a line in both the file header and the data records.

4 Implementation Considerations
--------------------------------

Even though an ASCII file is the most universally readable type of file, there are differences in the way different operating systems define the end of a line for ASCII text files. Therefore, some consideration must be given to the way in which files are transferred between machines with different operating systems. MS DOS uses the ASCII characters for carriage return and line feed to terminate each line, Macintosh uses just carriage return, Unix uses just line feed, whereas VAX/VMS has control words at the beginning of each line which give the number of characters in the line. It is, therefore, impracticable to write an ASCII file which will appear native to every operating system.

If inter-system file transfers are performed using some of the "standard" file transfer software (Kermit, FTP, DECnet-DOS, etc.) then the conversion to the appropriate end-of-line designator is done automatically during the file transfer, assuming it is not a bit for bit (binary) transfer. But if the file is written to some storage medium (diskette, tape, compact disc, etc.)
under one operating system, and read from the medium by another operating system, then it may be necessary to rewrite or edit the file to a form with the appropriate end-of-line designator. Analogous to the end-of-line designator, the end-of-file designator differs with different operating systems, but is appropriately converted using standard file transfer software.

There is currently no convenient solution to the above stated dilemma. It is mentioned mainly to alert originators and users of exchange files to the potential problems with transferring ASCII files between different operating systems. In the past, the MS DOS designators have been the convention for transferring files via diskette and compact disc, but this may change as technology and industry standards change. In any case, prior to each mission, the mode of file transfer, and the acceptable end-of-line and end-of-file designators, will be decided upon and communicated to the project participants.

Except for the purpose of preparing a file for use on a different operating system, there must not be any extraneous non-printable ASCII characters within an exchange file. The non-printable characters have ASCII decimal values of 0 through 31, and values greater than 126. For similar reasons, exchange files must not be Fortran output files with Fortran carriage control characters embedded within the file.

Programming languages impose limitations on record length, magnitudes of integer and real numbers, and precision of real numbers. To comply with limitations in the most commonly used environments, the maximum record length in exchange files is 32766 characters. It is suggested that all numeric values be limited to seven significant digits within the magnitude range of 1.0E-38 to 1.0E+38. For numeric data, there should be an adequate number of digits to resolve the anticipated precision, but in the interest of minimizing the file size, the number of digits should not be larger than necessary.
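
The character and length restrictions stated above lend themselves to a mechanical check. The following Python sketch is illustrative only and is not part of the format standard: the function name check_line is hypothetical, the line is assumed to have already had its end-of-line designator removed, and the 32766 character limit is applied to a single line even though the standard states it for a record, which may span more than one line.

```python
from typing import List

MAX_RECORD_LENGTH = 32766  # maximum record length stated in Section 4


def check_line(line: str) -> List[str]:
    """Return a list of problems found in one line of an exchange file.

    Assumption: the end-of-line designator has already been stripped.
    """
    problems = []
    if len(line) > MAX_RECORD_LENGTH:
        problems.append(
            f"line has {len(line)} characters, more than {MAX_RECORD_LENGTH}"
        )
    for column, char in enumerate(line, start=1):
        code = ord(char)
        # Printable ASCII characters have decimal values 32 through 126.
        if code < 32 or code > 126:
            problems.append(
                f"non-printable character (decimal {code}) at column {column}"
            )
    return problems
```

An empty list means the line passed both checks; a tab character, for instance, is reported because its decimal value is 9.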
Also, unnecessary records of missing values should not be used to pad the beginning or end of the data section of an exchange file. If, for example, the data from an airborne instrument begins 10 minutes after takeoff, and terminates 10 minutes before landing, it is unnecessary to include 10 minutes of missing values before the data begins and after it terminates.

5 Notation
-----------

The array and implied loop notation, used to generalize the exchange file format definitions given in Section 7, will now be explained. The notation is merely a convenient means of specifying the file formats, and not intended to indicate useful or desirable array structures in computer programs.

Quantities enclosed in square brackets [ ] are read with one "read" statement and, therefore, the quantities occupy one record which may exceed one line. One or more quantities appearing in a line, and not enclosed in square brackets, are read as one record and constitute one line in the exchange file. Similar comments apply to writing the records, but the descriptions which follow in this Section are from the perspective of reading the records.

The indices act merely as counters to indicate the dependence of some variable, but several indices are consistently used for special purposes. The index m is always used to count independent variable marks, and the implied loop over m is unbounded. The index s is the counter for the independent variables (dimensions), n for the primary variables, and a for the auxiliary variables. The usage of other indices (i,j,k) is less consistent, but usually they are counters for the bounded independent variable values.

Consider the array X, which contains values of the independent variables on which the primary variables are dependent. To reference a specific array element we write X(2,1). To reference a general array element we write X(i,s), where i and s can assume any allowable values.
To indicate the allowable range of values for i and s we write

   X(i,s), i=1,NX(s), s=1,NIV;

which states that s may take on integer values of 1 to NIV, and i may assume integer values of 1 to NX(s), the value of NX depending on the value of s. NIV is the number of independent variables, and NX(s) is the number of values for the s-th independent variable.

Now consider the array V(X,n), which contains values of the primary variables as functions of two independent variables. Since NIV=2, V(X,n) may also be expressed as V(X(i,1),X(m,2),n), or simply as V(i,m,n). To completely specify the contents of V we write

   V(i,m,n), i=1,NX(1), n=1,NV,

where NV is the number of primary variables, and NX(1) is the number of bounded independent variable values. It is implied that m can pertain to any independent variable mark within the file.

For reading data records, the implied loop notation has a slightly different meaning, because then it implies that during the read operation the loop index will sequentially take on the values dictated by the loop limits. If the terminal value of a loop is smaller than the initial value, the implication is that the loop is not executed. Let the general expression for the data format be:

   [ X(m,2)  ( A(m,a), a=1,NAUXV ) ]
   [ V(i,m,n), i=1,NX(1) ]  n=1,NV

In the above expressions X(m,2) represents the m-th independent variable mark for the unbounded independent variable; A(m,a) is the value of the a-th auxiliary variable at the m-th independent variable mark; and V(i,m,n) is the value of the n-th primary variable at the m-th independent variable mark and i-th bounded independent variable value. NAUXV is the number of auxiliary variables. The square brackets enclosing the first line of the expression indicate that an independent variable mark and NAUXV auxiliary variables are read as one record which may span more than one line.
The second line of the expression is to be interpreted to mean that for the m-th independent variable mark there are NV records of primary variables, which will be read with n starting at a value of 1, incrementing by one for each record, and ending with a value of NV for the last record. The notation within the square brackets indicates that, for each record, NX(1) values of the n-th primary variable at the m-th independent variable mark are read. In this case, the constant values of the bounded independent variable (X(i,1), i=1,NX(1)) are read from the file header.

To be more specific, assume there are three auxiliary variables (NAUXV=3), two primary variables (NV=2), and four values for the bounded independent variable (NX(1)=4). Given these values for the loop limits, the general expressions for the data format imply the following record structure (the intra-record spacing between values is merely for clarity):

   [ X(m,2)     A(m,1)     A(m,2)     A(m,3)   ]
   [ V(1,m,1)   V(2,m,1)   V(3,m,1)   V(4,m,1) ]
   [ V(1,m,2)   V(2,m,2)   V(3,m,2)   V(4,m,2) ]
   [ X(m+1,2)   A(m+1,1)   A(m+1,2)   A(m+1,3)   ]
   [ V(1,m+1,1) V(2,m+1,1) V(3,m+1,1) V(4,m+1,1) ]
   [ V(1,m+1,2) V(2,m+1,2) V(3,m+1,2) V(4,m+1,2) ]
   [ X(m+2,2)   A(m+2,1)   A(m+2,2)   A(m+2,3)   ]
   etc., etc.

If, for the sake of illustration, NAUXV=0 then according to the loop limits in the general expressions for the data format given above, the terminal value of the loop would be smaller than the initial value. The implication would then be that no auxiliary variables were present in the file and, therefore, none would be read from the file. The data records would remain the same as those in the above example, with the exception that there would be no auxiliary variables in the records containing the independent variable marks.

6 Definitions
--------------

A(m,a): value of the a-th auxiliary variable at the m-th independent variable mark (a=1,NAUXV). If A(m,a) is real, the use of scaled whole numbers is encouraged.
AMISS(a): a quantity indicating missing or erroneous data for the a-th auxiliary variable. The value of AMISS(a) must be larger than any "good" value of A(m,a) recorded in the file. The value of AMISS(a) defined in the file header is the same value that appears in the data records for missing/bad values of A(m,a).

ANAME(a): a character string specifying the name and/or description of the a-th auxiliary variable, on one line and not exceeding 132 characters. Include units of measure the data will have after multiplying by the a-th scale factor, ASCAL(a). The order in which the auxiliary variable names are listed in the file header is the same order in which the auxiliary variables are read from the data records, and the same order in which the auxiliary variable scale factors and missing values are read from the file header records.

ASCAL(a): scale factor (real) by which one multiplies recorded values of the a-th auxiliary variable to convert them to the units specified in ANAME(a).

character string: a string of at most 132 printable ASCII characters occupying one line of an exchange file. The printable ASCII characters have ASCII decimal values between 32 and 126 inclusive.

DATE: UT date at which the data within the exchange file begins. For aircraft data files DATE is the UT date of takeoff. DATE is in the form YYYY MM DD (year, month, day) with each integer value separated by at least one space. For example: 1989 1 16 or 1989 01 16 for 16 January 1989.

DX(s): interval (real) between values of the s-th independent variable, X(i,s), i=1,NX(s); in the same units as specified in XNAME(s). DX(s) is zero for a non-uniform interval. DX(s) is non-zero for a constant interval. If DX(s) is non-zero then it is required that NX(s) = (X(NX(s),s)-X(1,s)) / DX(s) + 1. For some file formats the value of DX also depends on the unbounded independent variable and is expressed as DX(m,s).

FFI: file format index (integer). The FFI uniquely defines the file header and data formats.
It is the second value recorded on the first line of an exchange file. The first (left-most) digit in the FFI gives the number of independent variables listed in the file header, the second digit gives the number of required (in the sense that they are necessary for reading the subsequent data records) auxiliary variables. The remaining digits are used to loosely associate file formats with similar characteristics.

independent variable mark: a value of the unbounded independent variable which is explicitly recorded in the data records. Independent variable marks must be monotonic.

integer: a whole number written without a decimal point. Leading zeros are insignificant.

IVOL: volume number (integer) of the total number of volumes required to store a complete dataset, assuming only one file per volume. To be used in conjunction with NVOL to allow data exchange of large datasets requiring more than one volume of the exchange medium (diskette, etc.).

LENA(a): integer number of characters used to record auxiliary variable A(m,a) when A(m,a) is represented as a character string. The value of LENA(a) must be less than 133.

LENX(s): integer number of characters used to record independent variable X(i,s) when X(i,s) is represented as a character string. The value of LENX(s) must be less than 133.

line: refers to a string of printable ASCII characters within an exchange file, terminated by the appropriate end-of-line (or new line) designator for the operating system on which the file resides. The maximum number of printable characters per line is 132.

MNAME: a character string specifying the mission which the data is supporting, on one line and not exceeding 132 characters. The appropriate value for MNAME will be decided upon prior to the start of the mission.

NAUXC: number of auxiliary variables (integer) whose values are recorded as character strings. If NAUXC=0 then no auxiliary variables are recorded as character strings.
NAUXV: number of auxiliary variables (integer). If NAUXV=0 then no auxiliary variables are recorded and no missing values, scale factors, or names for the auxiliary variables are present in the file header.

NCOM(k): a character string containing the k-th normal comment line (k=1,NNCOML).

NIV: number of independent variables (integer) on which the primary variables are dependent.

NLHEAD: number of lines (integer) composing the file header. NLHEAD is the first recorded value on the first line of an exchange file.

NNCOML: number of normal comment lines (integer) within the file header, including blank lines and data column headers, etc. Normal comments are those which apply to all of a particular kind of dataset, and can be used to more completely describe the contents of the file. If NNCOML=0 then there are no normal comment lines.

NSCOML: number of special comment lines (integer) within the file header. Special comments are reserved to note special problems or circumstances concerning the data within a specific exchange file so they may easily be found and flagged by those reading the file. If NSCOML=0 then there are no special comment lines.

NV: number of primary variables in the exchange file (integer).

NVOL: total number of volumes (integer) required to store the complete dataset, assuming one file per volume. If NVOL>1 then each volume must contain a file header with an incremented value for IVOL, and continue the data records with monotonic independent variable marks.

NVPM(s): integer number of independent variable values between independent variable marks, for the s-th independent variable. NVPM(s) = (X(m+1,s)-X(m,s)) / DX(s).

NX(s): number of values (integer) for the s-th independent variable. If NX(s) is defined in the file header then it represents the constant number of values for the s-th independent variable. Otherwise, NX=NX(m,s) is defined in the data records and its values can vary with the independent variable marks.
In the case of an unbounded independent variable, NX(NIV) is never specified in the file but the values of X(m,NIV) are read from the data records (independent variable marks).

NXDEF(s): number of values (integer) of the s-th independent variable which are explicitly defined in the file header. If NXDEF(s)=NX(s) then all values of X(i,s), i=1,NX(s) are recorded in the file header. If NXDEF(s)=1 then only the first value, X(1,s), is recorded in the file header and the remaining values of X(i,s) are calculated as X(i,s) = X(1,s) + (i-1) * DX(s) for i=2,NX(s).

ONAME: a character string specifying the name(s) of the originator(s) of the exchange file, last name first. On one line and not exceeding 132 characters.

ORG: character string specifying the organization or affiliation of the originator of the exchange file. Can include address, phone number, email address, etc. On one line and not exceeding 132 characters.

RDATE: date of data reduction or revision, in the same form as DATE.

real: a real valued number that may include a decimal point or be written in exponential notation. It is preferred that the values of real numbers be limited to seven significant digits within the magnitude range of 1.0E-38 to 1.0E+38.

record: a logical record to be read by one "read" statement. The maximum record length is 32766 characters with a maximum of 132 characters per line. The first character of a record is also the first character of a line.

SCOM(k): a character string containing the k-th special comment line (k=1,NSCOML).

SNAME: a character string specifying the source of the measurements or model results which compose the primary variables, on one line and not exceeding 132 characters. Can include instrument name, measurement platform, etc.

V(X,n): value of n-th primary variable (n=1,NV) at specified values of independent variables X. If V is real then the use of scaled whole numbers, without decimal points, is encouraged.
VMISS(n): a quantity indicating missing or erroneous data values for the n-th primary variable. VMISS(n) must be larger than any "good" data value, of the n-th primary variable, recorded in the file. The value of VMISS(n) defined in the file header is the same value that appears in the data records for missing/bad values of V(X,n).

VNAME(n): a character string giving the name and/or description of the n-th primary variable, on one line and not exceeding 132 characters. Include units of measure the data will have after multiplying by the n-th scale factor, VSCAL(n). The order in which the primary variable names are listed in the file header is the same order in which the primary variables are read from the data records, and the same order in which scale factors and missing values for the primary variables are read from the file header records.

VSCAL(n): scale factor (real) by which one multiplies recorded values of the n-th primary variable to convert them to the units specified in VNAME(n).

X(i,s): i-th value of the s-th independent variable (X(i,s), i=1,NX(s), s=1,NIV).
For some file formats the values of a bounded independent variable may also depend on the unbounded independent variable, and in those cases we will denote the bounded independent variable as X(i,m,s), with s

7 KM ABOVE AIRCRAFT (TRANSITION RANGE VARIES WITH SIGNAL STRENGTH)
HORIZONTAL AVERAGING INTERVAL: 60 KM
30335 26 12819 75 10389 8 25 35 -13324 -945
1340 1519 1660 1779 1868 1939 1973 1992 1989 1955 1934 1897 1817
1721 1619 1514 1434 1343 1258 1203 1140 1088 1037 956 892 878
30360 22 12819 75 10383 8 26 0 -13322 -993
1351 1523 1658 1774 1860 1930 1962 1974 1966 1932 1909 1877 1803
1706 1600 1493 1407 1310 99999 99999 1094 1045
30384 93 12744 75 10378 8 26 24 -13312 -1031
934 1378 1541 1673 1782 1862 1925 1950 1956 1946 1912 1884 1843
1765 1667 1565 1457 1375 1279 1194

23 3010                      {NLHEAD FFI}
Mertz, Fred
Pacific University
NOAA/NMC grid point analyses
AASE
1 1                          {IVOL NVOL}
1989 1 16 1989 1 16          {DATE RDATE}
5.0 2.5 12.0                 {DX(1), DX(2), DX(3)}
8 3                          {NX(1), NX(2)}
1 1                          {NXDEF(1), NXDEF(2)}
-25                          {X(1,1); X(i,1) = -25 -20 -15 -10 -5 0 5 10}
60.0                         {X(1,2); X(j,2) = 60.0 62.5 65.0}
East longitude (deg)
Latitude (deg)
Time (UT hours) from 00 hours on day given by DATE
2                            {NV=number of primary variables}
1.0E-08 0.1                  {scale factors for primary variables}
99999 9999                   {missing values for primary variables}
Potential vorticity (K m**2/(kg s)) on 400 K isentropic surface
Temperature (K) on 400 K isentropic surface
0                            {NAUXV=number of auxiliary variables}
0                            {NSCOML}
0                            {NNCOML}
0
1604 1597 1589 1578 1570 1578 1584 1589    {PV rec}
1598 1583 1561 1534 1506 1478 1447 1446    {PV rec}
1440 1439 1442 1469 1493 1512 1527 1537    {PV rec}
2234 2251 2259 2250 2247 2200 2194 2187    { T rec}
2194 2151 2159 2150 2147 2166 2175 2165    { T rec}
2121 2136 2140 2140 2138 2127 2111 2104    { T rec}
12
1532 1522 1509 1492 1472 1467 1459 1450    {PV rec}
1419 1433 1448 1465 1483 1503 1525 1567    {PV rec}
1670 1691 1711 1724 1737 1744 1745 1743    {PV rec}
2224 2241 2249 2240 2237 2200 2184 2177    { T rec}
2184 2141 2149 2140 2137 2156 2165 2155    { T rec}
2111 2126 2130 2130 2128 2117 2101 2101    { T rec}
24
1587 1578 1569 1558 1546 1533 1641 1626

24 4010                      {NLHEAD FFI}
Mertz, Fred
Pacific University
NOAA/NMC grid point analyses
AASE
1 1                          {IVOL NVOL}
1989 1 16 1989 1 16          {DATE RDATE}
5.0 2.5 40.0 0.0             {DX(1), DX(2), DX(3), DX(4)}
8 3 2                        {NX(1), NX(2), NX(3)}
1 1 2                        {NXDEF(1), NXDEF(2), NXDEF(3)}
-25                          {X(1,1); X(i,1)= -25 -20 -15 -10 -5 0 5 10}
60.0                         {X(1,2); X(j,2)= 60.0 62.5 65.0}
400 440                      {X(k,3)}
East longitude (deg)
Latitude (deg)
Potential temperature (K)
Time (UT hours) from 00 hours on day given by DATE
1                            {NV=number of primary variables}
1.0E-08                      {scale factor for primary variable}
99999                        {missing value for primary variable}
Potential vorticity (K m**2/(kg s))
0                            {NAUXV=number of auxiliary variables}
0                            {NSCOML}
0                            {NNCOML}
0
1604 1597 1589 1578 1570 1578 1584 1589
1598 1583 1561 1534 1506 1478 1447 1446
1440 1439 1442 1469 1493 1512 1527 1537    {last 400K rec}
3135 3151 3175 3198 3220 3240 3260 3278
3326 3348 3369 3389 3409 3428 3446 3465
3498 3492 3485 3476 3468 3459 3464 3446    {last 440K rec}
12
1532 1522 1509 1492 1472 1467 1459 1450
1419 1433 1448 1465 1483 1503 1525 1567
1670 1691 1711 1724 1737 1744 1745 1743    {last 400K rec}
3424 3419 3409 3396 3379 3354 3327 3297
3193 3158 3125 3095 3065 3037 3011 2998
2956 2938 2920 2914 2909 2906 2905 2906    {last 440K rec}
36
1587 1578 1569 1558 1546 1533 1641 1626
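As an illustration of how the definitions above fit together, the following Python sketch reads the first two independent variable marks of the FFI=3010 example. It is a minimal sketch under stated assumptions, not a reference reader: the function names expand_axis and read_marks are hypothetical, the header values are typed in by hand rather than parsed, and the record order (one mark, then for each primary variable NX(2) records of NX(1) values) is inferred from the {PV rec} and { T rec} annotations in the example, not taken from the format specification itself. The bounded independent variables are rebuilt with the NXDEF(s)=1 rule X(i,s) = X(1,s) + (i-1) * DX(s), and recorded values are multiplied by VSCAL(n) unless they equal VMISS(n).

```python
def expand_axis(x1, dx, nx):
    """Return X(i,s) = X(1,s) + (i-1)*DX(s) for i = 1..NX(s) (NXDEF(s)=1 case)."""
    return [x1 + i * dx for i in range(nx)]


def read_marks(text, nx1, nx2, nv, vscal, vmiss):
    """Parse data records into a list of (mark, values) pairs.

    values[n][j][i] is primary variable n at the j-th value of the second
    independent variable and the i-th value of the first, scaled by
    VSCAL(n); None replaces VMISS(n).
    Assumed layout (inferred from the annotated example): one mark, then
    for each primary variable NX(2) records of NX(1) values.
    """
    tokens = text.split()
    per_mark = 1 + nv * nx2 * nx1
    if len(tokens) % per_mark != 0:
        raise ValueError("data section is truncated or has extra values")
    out = []
    for start in range(0, len(tokens), per_mark):
        mark = float(tokens[start])
        pos = start + 1
        values = []
        for n in range(nv):
            recs = []
            for _ in range(nx2):
                raw = [float(t) for t in tokens[pos:pos + nx1]]
                recs.append([None if r == vmiss[n] else r * vscal[n] for r in raw])
                pos += nx1
            values.append(recs)
        out.append((mark, values))
    return out


# Data records for marks 0 and 12 of the FFI=3010 example, annotations removed.
SAMPLE = """
0
1604 1597 1589 1578 1570 1578 1584 1589
1598 1583 1561 1534 1506 1478 1447 1446
1440 1439 1442 1469 1493 1512 1527 1537
2234 2251 2259 2250 2247 2200 2194 2187
2194 2151 2159 2150 2147 2166 2175 2165
2121 2136 2140 2140 2138 2127 2111 2104
12
1532 1522 1509 1492 1472 1467 1459 1450
1419 1433 1448 1465 1483 1503 1525 1567
1670 1691 1711 1724 1737 1744 1745 1743
2224 2241 2249 2240 2237 2200 2184 2177
2184 2141 2149 2140 2137 2156 2165 2155
2111 2126 2130 2130 2128 2117 2101 2101
"""

# Header values copied by hand from the example: X(1,s), DX(s), NX(s).
lon = expand_axis(-25.0, 5.0, 8)   # East longitude (deg)
lat = expand_axis(60.0, 2.5, 3)    # Latitude (deg)
result = read_marks(SAMPLE, nx1=8, nx2=3, nv=2,
                    vscal=[1.0e-08, 0.1], vmiss=[99999, 9999])

for mark, values in result:
    # First potential vorticity and temperature value at each time mark.
    print(mark, values[0][0][0], values[1][0][0])
```

The third mark of the example (24) is followed by only one record in the listing, so it is left out of SAMPLE; the sketch raises ValueError on such an incomplete data section rather than guessing at the missing records.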