Elaboration and of outgoing programmer(s) EOX activities: discussion of EOX processes and variations: background notes, explanations, considerations, and troubleshooting for end-of-expedition data processing.
The actual scripts to run are here.
Negotiate a time window.
Establish when the final database snapshot will be taken.
The end-of-expedition process may be run at when data collection is complete.
The time window required is small. Current hardware You will need at least 2 hours to run the scripts, 4 for more productive expeditionsODA hardware has been observed to export 60 GiB of data in 20 min--while coreflow and full laboratory database uploads are in progress.
Speak with the MCS to gauge their timing for backups and end-of-site or end-of-expedition staging of take-home copies.
Speak with the Staff ScientistExpedition Project Manager.
Speak with the Technical Staff for each laboratory.
Speak with the technical staffLOs and ALOs.
When will you be done adding new data to the database?
Pick Agree on a time, then stick to it: any new data applied after this point will not return to shore this cycle. To capture it would require an additional end-of-expedition data-processing cycle.
Bring home a copy
The MCS place database snapshots and files on tape.
You may be requested to carry a tape home.
If you are carrying a travel drive, please bring a copy of both database snapshots home. The additional redundancy has value. Do not attempt to bring back the ASMAN file directory.
Procedural notes
Accounts. Three accounts are used to execute the end-of-expedition data processing. The oracle
user (via the ODA operating system) runs the database exports. The Oracle database accounts your-surname_dba
and transfer
are applied within the export utility. Credentials for oracle
and transfer
are stored in \\novarupta\vol1\tas\AD\pwsafe\ship.psafe3.
Executing database export (and import). This process requires the expdp
and impdp
commands. These are command-line data management tools not SQLPLUS commands. They should be run from the account oracle@k1.ship.iodp.tamu.edu
--i.e. the database host console. The utilities are database version specific.
Database cleanout. Step (2) is fast (since expedition 324). It guarantees that all the TRANSFER schema tables are empty for subsequent steps.
...
Deliver the EOX tech report
See the Confluence Lab Notebook space for prior examples of the development technical report. Keeping a daily activity log throughout the expedition reduces the effort to prepare the final report.
The report can be the basis of cross-over discussions. It is also a records of expedition activities and concerns that can provide context to future users of IODP data.
Spot check data copy efforts
Programmers are tasked with the database management effort. All IODP staff have some level of data management effort. Be aware of these. Learn about what the individuals do. On occasion an expedition is just crazy enough to require each other's support.
- Programmers wear the DBA hat on ship, so are tasked with being aware of data loaded there. And are responsible to ensure snapshots get home (via tape).
- Laboratory Technical Staff ensure lab log sheets, content at instrument hosts (IODP and 3rd party) are copied to DATA1 and sometimes subsets to USERVOL.
- The Ops Manager or Ops Engineer manage and monitor content from rig instrumentation. Data to return to shore is staged on DATA1.
- ALOs take care of datasets not assigned to other Technical Staff--sometime examples: bathymetry, navigation, site-fix, et al.
- LOs and ALOs take care of shipping/logistics concerns in co-ordination with others. Examples: Curator sample shipping concerns. Operations shipping of drilling supplies and components.
- The Publications Specialist has a complete snapshot of the Expedition Proceedings. They stage it where the MCS will take it to tape; and they hand-carry media back to HQ.
- The MCS take content staged to USERVOL, DATA1 and several other locations to ensure all expedition effort is returned to shore via tape.
Provide courier services
Good practice for backups. Bring home a copy. Leave a copy for reference.
The MCS place database snapshots and logs on tape. You may be requested to carry a tape home if your itinerary supports quick arrival of the data at HQ.
Developers have plenty of storage. Keep a copy of the database export into next expedition. If we need to reference it, it's there. If it is useful to have test data from that expedition, it's there.
If you have space on your development system, take a copy of the snapshot. The redundancy has value--especially when unusual circumstances arise. Do not attempt to bring back the ASMAN file directory.
If routine processes have gone smoothly, the redundant files may be expunged by the next expedition, or the next time you come out.
Honor moratorium
Once content is uploaded to LIMS, and raw files have gone to data1, and routine backup cycles have occurred: expedition-specific data may (in-general) be cleaned off of instrument hosts and workstations.
As laboratory technical staff manage this, communicate if you are able and desire to assist with this.
It is good practice to assist technical staff in this activity: both to raise awareness of its complexity, and to consider opportunities for automation and improvement. At busy times it can help reduce individual workload. Well managed labs will have staging areas where expedition-specific data is placed so it can be readily managed in this context.
It is good practice to confer with seasoned technical staff that manage this for their labs. There is variance between crews as to how and when these procedures are carried out. It is possible to do these activities on a site-by-site basis rather than just EOX.
This task occurs in both EOX and BOX processes so we have more sets of eyes and hands ensuring moratorium compliance.
If end-of-expedition activities are harried and crunched there is precedence to omit this step at EOX. Leave data on instrument hosts and workstations for (fresh) oncoming developers and technical staff to accomplish.
Accounts
Multiple accounts are used to execute the end-of-expedition data processing.
- The Oracle operating system account conducts database exports at the primary shipboard Oracle Data Appliance (ODA).
- Your personal DBA account is used to execute, monitor, and test the export process.
Executing database export (and import)
This process requires the expdp
and impdp
commands. These command-line data management tools are installed at the ODAs with the RDBMS software. They are not SQLPLUS commands. These extract|transfer|load (ETL) tools should be run from the Oracle OS account at the primary ODA via a terminal session. The utilities are database version specific. There is a parameter to tell the tools to behave compatibly with a specific RDBMS release (VERSION).
The tools are quite powerful and straightforward to use. An Overview of Oracle Data Pump.
PuTTY is the tool most often used for a command-line session.
Windows SSH within a Windows Terminal is also very effective. Operating it under Command (cmd.exe) is also effective. The blue powershell terminal is functional until you resize the window or change font sizes--then the output sequencing get confused.
Transfer schema cleanout
This script expunges TRANSFER schema table content using a fast process known as TRUNCATE. The cleanout procedure is useful for TRANSFER schema data management. The script is no longer required to conduct the shipboard EOX process.
Executing a stored procedure
Programmers prefer to run Oracle stored procedures from SQL Developer. They can also be executed via SQLPLUS, or command-line SQL--however, these command-line environments are only recommended under conditions where the minimum overhead of that tooling helps get the job done.
- The order of invocation is not critical. Each call to a stored procedure manages an atomic, consistent, isolated, durable subset of the data content.
- Repetition of a procedure is Ok. The scripts have been revised to prevent generation of duplicate copies of the data.
- Repetition of a procedure is required to capture multiple subsets of the data.
- Each procedure should return a message to the effect Script completed successfully. The
set serveroutput on
command enables this feedback. - For typical expedition data collection no script is expected to run more than 20 minutes. The procedures lims_expedition_to_new(), descinfo_to_new(), and ops_expedition_to_new() are the slowest as these move, respectively, the greatest number of bytes of data, and the greatest number of rows.
Monitoring execution progress. SQLDeveloper provides a "Monitor session..." facility that lets you know the process is still working. A positive feedback check is to use SQL to count the rows in the TRANSFER tables being updated. If the task was started within SQLDeveloper, menu item View > Task Progress offers some visual feedback on progress.
...