Elaboration of outgoing programmers' EOX activities: background notes, explanations, considerations, and troubleshooting for end-of-expedition data processing, including discussion of EOX processes and their variations.
The actual scripts to run are here.
Negotiate a time window.
Establish when the final database snapshot will be taken.
The end-of-expedition process may be run when data collection is complete.
...
Agree on a time, then stick to it: any new data applied after this point will not return to shore this cycle. Capturing it would require an additional end-of-expedition data-processing cycle.
Deliver the EOX tech report
See the Confluence Lab Notebook space for prior examples of the development technical report. Keeping a daily activity log throughout the expedition reduces the effort to prepare the final report.
The report can serve as the basis of cross-over discussions. It is also a record of expedition activities and concerns that can provide context to future users of IODP data.
Spot check data copy efforts
Programmers are tasked with the database management effort, but all IODP staff carry some level of data management responsibility. Be aware of these roles and learn what the individuals do. On occasion an expedition is just crazy enough to require each other's support.
- Programmers wear the DBA hat on ship, so they are tasked with being aware of the data loaded there, and they are responsible for ensuring snapshots get home (via tape).
- Laboratory Technical Staff ensure that lab log sheets and content at instrument hosts (IODP and third-party) are copied to DATA1, and sometimes subsets to USERVOL.
- The Ops Manager or Ops Engineer manages and monitors content from rig instrumentation. Data to be returned to shore is staged on DATA1.
- ALOs take care of datasets not assigned to other Technical Staff. Examples over time have included bathymetry, navigation, site-fix data, et al.
- LOs and ALOs take care of shipping/logistics concerns in coordination with others. Examples: the Curator's sample-shipping concerns; Operations' shipping of drilling supplies and components.
- The Publications Specialist has a complete snapshot of the Expedition Proceedings. They stage it where the MCS will take it to tape, and they hand-carry media back to HQ.
- The MCSs take content staged on USERVOL, DATA1, and several other locations to ensure all expedition effort is returned to shore via tape.
Provide courier services
Good backup practice: bring home a copy; leave a copy behind for reference.
...
If routine processes have gone smoothly, the redundant files may be expunged by the next expedition, or the next time you come out.
Honor moratorium
Once content is uploaded to LIMS, raw files have gone to DATA1, and routine backup cycles have occurred, expedition-specific data may (in general) be cleaned off instrument hosts and workstations.
...
If end-of-expedition activities are harried and crunched, there is precedent for omitting this step at EOX. Leave the data on instrument hosts and workstations for the (fresh) oncoming developers and technical staff to handle.
Accounts
Multiple accounts are used to execute the end-of-expedition data processing.
- The Oracle operating system account conducts database exports at the primary shipboard Oracle Database Appliance (ODA).
- Your personal DBA account is used to execute, monitor, and test the export process.
Executing database export (and import)
This process requires the expdp and impdp commands. These command-line data management tools are installed on the ODAs with the RDBMS software; they are not SQLPLUS commands. These extract/transform/load (ETL) tools should be run from the Oracle OS account at the primary ODA via a terminal session. The utilities are database-version specific; the VERSION parameter tells the tools to behave compatibly with a specific RDBMS release.
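As a concrete sketch, the export options can be collected into an expdp parameter file. The file name, dump-file names, and VERSION value below are illustrative only (adjust to the local configuration); the DIRECTORY object dmpdir is the one discussed later in this document:

```
# eox_transfer.par -- illustrative expdp parameter file (names are examples)
DIRECTORY=dmpdir            # Oracle directory object, e.g. mapped to /u02/exports
DUMPFILE=transfer_XXX.dmpdp # XXX = expedition number
LOGFILE=transfer_XXX.explog
SCHEMAS=TRANSFER            # export the TRANSFER schema only
VERSION=19.0                # write a dump compatible with this RDBMS release
```

It would then be invoked from the Oracle OS account as something like `expdp system parfile=eox_transfer.par`.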
...
Windows SSH within a Windows Terminal is very effective, as is operating it under Command (cmd.exe). The blue PowerShell terminal is functional until you resize the window or change font sizes--then the output sequencing gets confused.
Transfer schema cleanout
This script expunges TRANSFER schema table content using a fast process known as TRUNCATE. The cleanout procedure remains useful for TRANSFER schema data management, but the script is no longer required to conduct the shipboard EOX process.
Executing a stored procedure
Programmers prefer to run Oracle stored procedures from SQL Developer. They can also be executed via SQLPLUS or command-line SQL--however, these command-line environments are recommended only when their minimal overhead helps get the job done.
...
- They have the same columns as their production namesakes.
- The tables are intended to be empty at the beginning of EOX.
- These tables do not have any indexes, constraints, or foreign key dependencies.
Why multiple exports?
- Redundancy.
- Ensures that content gets home that is not part of routine EOX/BOX handling, including:
- Location-specific configuration [LIMS analysis, components, constants, instrument, and user definitions.]
- DESCINFO raw workspaces that were precursors to DescLogik uploads into the LIMS schema.
- Log content (PITALOG).
- Useful to provide real content for development, demo, test, and training databases.
Troubleshooting Notes
Archive logging--This is typically not an issue for EOX, but is still a very real issue for BOX.
Oracle treats the changes resulting from end-of-expedition processing just like any other transactions. Due to the size of these transactions it is possible (though unlikely) that EOX activities will exceed the 20 GiB allocated for archive logging. When this occurs, the database blocks all activity until archive log space is freed. All attempts at new transactions will receive the message ORA-00257: archiver stuck.
...
Monitor the generation of archive logs using the Enterprise Manager Console or the RMAN tool.
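Archive-log space consumption can also be checked from a privileged SQL session. This is a sketch using a standard Oracle dynamic performance view, not a prescribed procedure:

```sql
-- Show what is consuming fast-recovery-area space (including archived
-- logs) and how much of it is reclaimable. Run from a DBA session.
SELECT file_type, percent_space_used, percent_space_reclaimable
  FROM v$recovery_area_usage;
```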
Export fails with permission errors
The export [in step (5) of the short form] requires that the system and transfer schemas have permission from Oracle and the ODA operating system to write files to the directory /u02/exports.
This SQL verifies the existence of the directory named dmpdir. Inspection verifies whether it points to /u02/exports.
...
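If the original script is not at hand, a query of this general shape against the standard Oracle data dictionary shows the same information (the view and its columns are standard; only the directory name is specific to this setup):

```sql
-- List the DMPDIR directory object and the OS path it points to.
-- Expect one row whose directory_path is /u02/exports.
SELECT directory_name, directory_path
  FROM all_directories
 WHERE directory_name = 'DMPDIR';
```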
These activities are driven by the MCS and DBA. Coordinate with both.
This script isn't working, why?
The underlying SQL for each procedure is viewable (and modifiable) under SQL Developer:
...
Most likely source of error: database model changes during the expedition. Model changes must be replicated to the various transfer tables and the EOX scripts.
Something went wrong, I want to start over
Not a problem. The function of these scripts is to make a copy of data for backup. There is no irreversible damage being done by any of these activities. It is sufficient to begin again at step (2) of the short form.
The source data schemas are not modified in any way by this process. The various stored procedures only copy from a source (LIMS, DESCINFO2, OPS, etc.) to the target TRANSFER schema.
Prior to Expedition 324, the cleanout method relied on Oracle's DELETE statement. Oracle's guarantee of atomic, consistent, isolated, durable transactions translates to considerable overhead when millions of records are involved. The TRUNCATE command bypasses this overhead, but does not guarantee atomicity, consistency, isolation, or durability.
Data cleanout
CLEANOUT_NEW_TABLES.
Conducts a fast, non-recoverable truncation of table content. Applied only to tables in the TRANSFER schema. DO NOT APPLY THE SAME TECHNIQUE to production tables: this command does not process the indexes, constraints, table statistics, and other table-related data objects that are affected by data removal.
The truncation command has been applied since Expedition 324. Though overflow of the available archive log space is still possible, use of the truncation command alleviates that issue for EOX processing.
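For reference, the operation the procedure relies on is a plain Oracle TRUNCATE, issued once per transfer table. The table name here is illustrative:

```sql
-- Fast, non-recoverable removal of all rows. Generates minimal
-- undo/redo compared with DELETE, which is why it avoids flooding
-- the archive logs. TRANSFER-schema tables only.
TRUNCATE TABLE transfer.new_sample;
```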
Data copy architecture
LIMS_EXPEDITION_TO_NEW.
Copies data by expedition [not by project!] for sample, test, result and several other tables. The style is a series of insert into ... select * from ... statements.
- The script does not check that the tables are empty.
- The script may be run multiple times.
- Records that match will only be copied once (changed as of expedition 324).
- When new records are identified, they are accumulated in the transfer tables.
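The copy behavior described in the bullets above can be sketched as follows. The table and column names are illustrative, not the actual data model, and the expedition number is a placeholder:

```sql
-- Hypothetical sketch of the insert-into-select copy pattern with a
-- duplicate guard, so re-running the copy accumulates only new records.
INSERT INTO transfer.new_sample
SELECT s.*
  FROM lims.sample s
 WHERE s.expedition = '999'
   AND NOT EXISTS (SELECT 1
                     FROM transfer.new_sample t
                    WHERE t.sample_number = s.sample_number);
```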
Useful Utilities
How do I recreate a missing TRANSFER table?
Before we give you the commands, some background information:
...
The source table for any given entry may be found by dropping the prefix (NEW_, LEGACY_, CONT_). Source tables are in the schemas LIMS, OPS, and DESCINFO2.
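Given the prefix rule above, one common Oracle idiom produces a table with the right properties (same columns as the source, empty, no indexes or foreign keys; note that NOT NULL constraints do carry over with CTAS). Table names here are an illustrative example only:

```sql
-- Recreate a missing TRANSFER.NEW_SAMPLE from its source LIMS.SAMPLE.
-- The WHERE 1=0 predicate copies the column definitions but no rows.
CREATE TABLE transfer.new_sample AS
SELECT * FROM lims.sample WHERE 1 = 0;
```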
*nix to Windows Copy examples
Various methods for transferring files if you aren't initially comfortable on a *nix box. Via PuTTY, from the database host to the local Windows box:
...
Tools such as IPSWITCH_FTP, FileZilla, and WinSCP provide GUIs to perform the same task for both secure copy (scp) and secure file transfer (sftp) protocols. On the Unix/Linux side, scp is the equivalent tool. Type man scp for command-line help.
Are the copies you made any good? Are you sure?
Checksum tools read every byte of a file to generate a signature for the file. If the signatures differ between copies of a file, then the copies are not identical. The signatures should match regardless of the platform used. These tools are particularly useful to verify transfers conducted over our (slow, less-reliable-than-I'd-like) satellite connection.
```
Redhat$ md5sum *.dmpdp.bz2

Solaris> digest -a md5 -v transfer_741.dmpdp
md5 (transfer_741.dmpdp) = 9a043b6b899b6eddebc70f30e7df450c

DOS> md5sum transfer_741.dmpdp
9a043b6b899b6eddebc70f30e7df450c *transfer_741.dmpdp

MacOSX$ md5 transfer_741.dmpdp
MD5 (transfer_741.dmpdp) = 9a043b6b899b6eddebc70f30e7df450c
```
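The signature comparison can be exercised end-to-end with throwaway files. File names and content below are scratch examples, not real expedition dumps:

```shell
# Identical content yields identical MD5 signatures; any change,
# however small, yields a different signature.
printf 'EOX test payload\n' > original.dmpdp
cp original.dmpdp copy.dmpdp
a=$(md5sum original.dmpdp | awk '{print $1}')
b=$(md5sum copy.dmpdp | awk '{print $1}')
[ "$a" = "$b" ] && echo "copy verified"
printf 'x' >> copy.dmpdp    # simulate a corrupted transfer
c=$(md5sum copy.dmpdp | awk '{print $1}')
[ "$a" != "$c" ] && echo "corruption detected"
```

Running this prints "copy verified" followed by "corruption detected".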
TECHDOC Cumulus Content
TECHDOC is not brought home. No copy is needed. If it is required, the MCS will manage it.
Labstation / Instrument Host Cleanup
Work with the technicians and research folks to get any needed data off of the labstations and clean up any files on them that need it. Procedures for this will evolve as we gain more experience.
...