NASA-Ames file ingestion at BADC
The purpose of the ingestion scheme is
to move incoming files to the correct final data directory,
to confirm that these files are fully NASA-Ames compliant and to
extract information from these files in order
to maintain the BADC file catalogue.
These general requirements involve the following specific steps:
- Logging -
A cron task runs every 30 minutes to log incoming files.
Logged details include the submitted file name,
the user ID and the arrival date and time.
After a file has been logged, its ownership is changed to
"acsoe" so that a subsequent supplier cannot
overwrite it.
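The logged details can be illustrated with a minimal Python sketch.
It is not the BADC cron task: the function name log_record, the
single-line record layout and the use of the file modification time
as the "arrival" time are all assumptions made for illustration.

```python
import os
import time


def log_record(path):
    """Build one log line: file name, numeric user ID, arrival date and time.

    Assumption: the modification time stands in for the arrival time.
    """
    info = os.stat(path)
    arrived = time.strftime("%Y-%m-%d %H:%M:%S", time.localtime(info.st_mtime))
    return f"{os.path.basename(path)} {info.st_uid} {arrived}"
```

The ownership change itself would need administrator rights and is
left out of the sketch.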
- Creating the catalogue ID -
At least once a week
(this frequency may be increased to daily when a tranche of
new files is expected)
the file-arrival log is scanned manually
and the names of any newly submitted files
checked against existing entries in the BADC file catalogue.
If no previous entry for this file name exists,
the file is given a BADC "generation number" of 1;
otherwise it is given a generation number
one greater than the latest entry known to the catalogue.
The newly submitted file is then assigned a unique catalogue file key
and its name, generation number, user ID and creation time/date are
recorded in the BADC file catalogue.
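The generation-number rule can be sketched in a few lines of Python.
The catalogue is represented here by a hypothetical list of
(file name, generation number) pairs; how the real BADC catalogue is
queried is not described in this document.

```python
def next_generation(file_name, catalogue):
    """Return the generation number for a newly submitted file.

    `catalogue` is a hypothetical list of (file_name, generation) tuples
    standing in for the existing catalogue entries.
    """
    previous = [gen for name, gen in catalogue if name == file_name]
    # First submission gets 1; a resubmission gets latest + 1.
    return max(previous) + 1 if previous else 1
```

For example, a file name with existing generations 1 and 2 would
receive generation 3, and a name not yet catalogued would receive 1.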
- Archiving -
After the new file has been accurately catalogued,
the generation number preceded by "-" is appended to the file name
and the file is moved to the BADC long-term archive directory
/home/tornado/acsoe/cache.
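A minimal sketch of the renaming and move, assuming Python's standard
shutil module does the work. The function names archived_name and
archive are hypothetical, and the file name used in the note below is
invented.

```python
import shutil
from pathlib import Path

ARCHIVE_DIR = Path("/home/tornado/acsoe/cache")


def archived_name(file_name, generation):
    """Append "-" and the generation number to the file name."""
    return f"{file_name}-{generation}"


def archive(src, generation, archive_dir=ARCHIVE_DIR):
    """Move the submitted file into the archive directory under its new name."""
    dest = Path(archive_dir) / archived_name(Path(src).name, generation)
    shutil.move(str(src), str(dest))
    return dest
```

With these definitions a file called example.na with generation number
2 would be archived as example.na-2.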
- Directory identification -
The next step is to work out the correct /badc/acsoe/data directory
using the file name, the file extension, and the ACSOE file organisation rules
provided by the ACSOE data manager, Claire Reeves.
The submitted file is then copied to the correct data directory.
- Mechanical Corrections -
The copy process in step 4 (directory identification) may not produce
an exact copy of the original file.
In particular:
- all end-of-record markers will be set to <LF>
(this change also happens automatically when files are transferred using
ASCII-mode ftp)
- all double quotes (") in the NASA-Ames header will be changed to spaces
- all leading spaces in the header will be removed
- all spaces between the keyword symbol and the keyword value will also be
removed
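The first three corrections can be sketched in Python as follows.
This is an illustration, not the BADC code. It assumes that the first
field of the first record is the header line count (NLHEAD), as in the
NASA-Ames format, and that quotes are replaced before leading spaces
are stripped. The fourth correction is omitted because the keyword
symbol is not specified here.

```python
def mechanical_corrections(text):
    """Apply the end-of-record, quote and leading-space corrections."""
    # Set all end-of-record markers (CRLF or bare CR) to LF.
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    lines = text.split("\n")
    # Assumption: the first field of the first record is NLHEAD,
    # the number of lines in the header.
    nlhead = int(lines[0].replace('"', " ").split()[0])
    for i in range(min(nlhead, len(lines))):
        # Header only: double quotes become spaces, leading spaces are removed.
        lines[i] = lines[i].replace('"', " ").lstrip(" ")
    return "\n".join(lines)
```

Data records after the header are left untouched apart from the
end-of-record change.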
- Compliance Checking -
After any necessary mechanical corrections have been made
the file is tested for compliance
with the NASA-Ames data format and the ACSOE data submission rules.
At BADC this is done with a FORTRAN program, NACHECK.
Broadly speaking this program reads and interprets individual records
in the file header and lists the data tables record-by-record.
All lines in the table listing are
numbered and any line found not to contain the correct number of
primary variables is flagged and reported.
Further editing of the file to bring about full NASA-Ames compliance
can be undertaken at this stage.
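The record-count check can be illustrated with a short Python sketch.
It is not NACHECK and does not reproduce its output. It assumes one
record per line, whitespace-separated values and a single independent
variable at the start of each record; the function name
flag_bad_records is hypothetical.

```python
def flag_bad_records(data_lines, n_primary):
    """Return (line number, value count) for records with the wrong count.

    Assumption: each record holds one independent variable followed by
    `n_primary` primary variables, separated by whitespace.
    """
    expected = n_primary + 1
    flagged = []
    for number, line in enumerate(data_lines, start=1):
        count = len(line.split())
        if count != expected:
            flagged.append((number, count))
    return flagged
```

An empty result means every listed record has the expected number of
values.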
- Cataloguing -
Next, the information needed to update the BADC file catalogue
is extracted from the file header.
This information is re-written in the form of a
shell script which is then executed to update the file catalogue.
- Plotting -
The BADC ingestion scheme also plots
the first meaningful primary variable in the file, in GIF format.
These GIFs, identified by file name, are held in the directory
/home/tornado/acsoe/nachecks.
- Change recording -
The final step is to record, for future reference,
any differences between the final version of
the file stored in the data directory and the submitted version
held in /home/tornado/acsoe/cache.
These differences can be identified by file name and are stored
under /home/tornado/acsoe/filediff.
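One way to produce such a difference record is sketched below using
Python's standard difflib module. The tool and the storage format
actually used at BADC are not stated here, so the unified-diff output
and the function name record_differences are assumptions.

```python
import difflib


def record_differences(submitted_text, final_text, file_name):
    """Return a unified diff between the submitted and final versions.

    The result is an empty string when the two versions are identical.
    """
    diff = difflib.unified_diff(
        submitted_text.splitlines(keepends=True),
        final_text.splitlines(keepends=True),
        fromfile=f"cache/{file_name}",
        tofile=f"data/{file_name}",
    )
    return "".join(diff)
```

The returned text could then be written to a file named after the
data file.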