BOX - beginning of expedition
EOX - end of expedition
The DBA is responsible for shore EOX processing. In a pinch programmers and systems personnel can fill in (they just don't know it yet).
This expedition is ended. Within two weeks, science participants expect to have moratorium access to LIMS content. DBA, Systems/MCS, and sometimes Programmers need to coordinate to help this happen. The Database Librarian needs to know when these changes occur too.
Make sure we are getting as complete a copy of the data to be warehoused on shore as we can.
0. Notify that the data load is in progress
Lot's of folks want to know: Systems, Data Librarian, Programmers, EPMs, Managers, selected TAS staff.
It is helpful to chat with Curation. If there is a sampling party or post-cruise meeting in progress, it is preferrable to post-pone the Copy the content step to a less busy time. Otherwise activities up to that point can be completed without any production impact.
1. Retrieve data content from tape
Subsets of the EOX tapes are distributed to Publications, Data Librarian, DBA, Operations, and public access storage locations. - Systems
A copy of the shipboard database export is stage for DBA use at OEM:/backup/eox/
- The database dump is backed up to tape via the host
OEMJR:/backup/eox/
. TBD--confirm location and match in DEV and MCS SOPs. - When restored, the file is copied to
OEM:/backup/eox/
.
There should be one data dump file and an accompanying log of the export activity.
- When the file contains a full shipboard database export the naming pattern is like
limsjr_full-398.dmpdp
- When the file contains selected schema exports it is named like
limsjr_schemas-398.dmpdp
For the shore EOX work, the filename is typically shortened to the expedition designation, e.g. 398.dmpdp
.
2. Restore and review
Restore routinely selected database content from the full backup to the shore production transfer schema. - DBA
Once staged extract, transfer, and load scripts may be run.
- DBA scripting for the EOX (and other) processes are hosted at
OEM:/backup/scripts/rci_work/exp/
- This scripting is designed to be run from an ODA command-prompt.
- Privileges to sudo to the ODA oracle user are required.
Other pre-requisites.
- What database instance are you connecting to? You must be at a command-line for that ODA instance.
As the OS oracle user specify which database instance you are using with this syntax
. oraenv
import_transfer.ksh
When executed, this script
- Prompts for the local name (and extension) of the database export file. Upon <Enter>
- Shipboard content is imported into the TRANSFER schema. Indexes are created for the largest rowcount tables: new_sample, new_test, new_result.
- The script now prompts for the expedition of interest, e.g. 398, 397T, etc. Upon <Enter>
- The script imports reference copies of GEODCAT (renamed to include the expedition) and GEODP### schemas.
On completion--of the schema imports above--a variety of SQL statements are run to highlight what data was brought back and how many rows.
- This script imported 398 content over multiple tests between 30 and 40 minutes.
Additional data review scripts take another 10-15 minutes to run. - The script requires no TRANSFER schema preparation--all the tables required are present, the import methodology truncates the tables before importing new content.
- The content is imported directly from SCHEMAS exported on ship (LIMS, DESCINFO2, OPS, GEODCAT, GEODP###, etc.).
There is no duplication of keys incurred by the ETL cycle now in use.
3. Establish moratorium credentials and controls
For the new expedition data. - DBA
Pick a name and credential. The account is shared by a science party for the moratorium period. The role created with the same name ensures that science party is granted the exception to see its moratorium data.
eox_gen_in_moratorium_script.ksh [moratorium-group]
This script accepts a moratorium name, e.g. SHACKLETON, SANTORINI, etc.
It generates another script (below) that will be run to modify NEW_TEST, NEW_RESULT, and NEW_GEOD_DATA tables for moratorium concerns.
- If a moratorium group is not provided, the script will prompt.
- The script does not check these requirement
- The GROUP_NAME field that this string populates is varchar2(10)--10 characters or less. It is conventional to use a memorable uppercase name. Often the EPMs will provide both a name and a credential.
- The string must also be a valid name for a ROLE. Example: should not start with a number; must be valid for naming a Unix file.
in_moratorium_[moratorium-group].ksh
Running this script
- Ensures the current expedition content is tagged for moratorium control.
- Do review the script. Sometimes the EOX expedition number is cleared before running
eox_gen_in_moratorium_script.ksh
.
Upon running the script--wait.
- The NEW_RESULT table typically contains 20-30 million rows. This update takes the longest.
- The NEW_TEST and NEW_SAMPLE table are much faster due to being 3 orders of magnitude smaller.
- Timing is turned on--the scripts will show how long they took to run.
The result table took 2-5 minutes to complete (with indexes in place) for each of ORTLIMS, LIMS19, RTLIMS, and LIMSHQ.
crt_user_exp.sql
Review the script.
- Make a copy of it specifically for the expedition you are processing--e.g.
crt_user_398.sql
- Modify the user credential to reflect COMPLEX_PASSWORD requirements. If the EPM has provided a credential set, use that.
Run the script via SQL.
Provide an explicit path to the script to run it.
Pass the expedition as a parameter to the script. The pattern to follow based on expedition 398 activity
sql / as sysdba SQL>@/backup/scripts/rci_work/exp/crt_user_398.sql 398
Note that this script may be run at any time. It is free of any dependencies on other scripts here.
This credential needs to be distributed to the following email distributions for filing: EPM for the expedition in processing, it_ops@iodp.tamu.edu, programmers@iodp.tamu.edu, database@iodp.tamu.edu (Data Librarian), webmaster@iodp.tamu.edu
4. Copy the content
Into the publicly accessible LIMS. - DBA
chk_tablespace_free.ksh
Review the space available in LABWARE, LABWAREIDX.
grp_cnt_transfer.ksh, grp_cnt_lims.ksh
Review. Run both of these scripts to gather baseline row counts by moratorium group, expedition, and record status. Preparatory step before data is merged.
data_pre_03_dis_trig.ksh
Run this script to generate a pair of scripts that respectively enable and disable triggers.
Future: Remove this script by applying the use of "execute immediate" support for disabling respective triggers automatically in the TRANSFER.EOX_SHORE_MERGE_TRANSFER_NEW_TO_LIMS() script--around LIMS.SAMPLE, LIMS.TEST, LIMS.RESULT, LIMS.GEOD_DATA, LIMS.GEOD_METADATA.
dis_trig_all_<yyyymmdd_hhmmss>.sql
Run this script via SQL to disable all schema triggers.
- Ensures the data goes in faster. And
- Doesn't get modified (again) on the way by.
data_pre_05_without_DI_v6.ksh
Script deprecated--not modified for current use. Superseded for now by the script below.
call transfer.eox_shore_merge_transfer_new_to_lims()
This procedure merges TRANSFER schema NEW_* table content into LIMS. Including
- scales, depth, samples, tests, results; and
- two (recently added) GEODESC tables.
Please review the procedure. To run it
sql / as sysdba SQL> call transfer.eox_shore_merge_transfer_new_to_lims()
Noted
- Without a unique result_number index, the bulk merging of the NEW_RESULT content took upwards of 6 hours.
- With a unique index on NEW_RESULT.RESULT_NUMBER the bulk merge was observed to complete (once) 30 minutes, but more typically in 4 hours.
- Although the data is available, the script does not provide for the merge of DESCINFO2 data--system is no longer used, so omitting now.
Future: The scripts are constructed around MERGE. Test in LIMS19 or RTLIMS to see if this technique is more performant for the bulk data moves. Since RESULT is the largest table with the most indexes we can gain the most improvement in execution times by focussing on this table.
execute immediate "alter table lims.result nologging;" insert /*+append*/ lims.result select * from transfer.new_result where result_number in (select result_number from transfer.new_result minus select result_number from lims.result ); execute immediate "alter table lims.result logging;" commit;
grp_cnt_lims.ksh
Re-run this script for review purposes. Data formerly only in the TRANSFER schema should now be present in the same numbers in the LIMS schema.
enb_trig_all_<yyyymmdd_hhmmss>.sql
If the row counts match, reenable operational triggers.
Notify interested parties that new moratorium data is available in LIMS.
Spot-check and test its availability and completeness.
5. Update the Database Overview
Configuration updates provide summary of the expedition data for participants and future users. - DBA
dbo_compute_v4.sql <exp>
Run this script to compute new expedition summaries for display in the database OVERVIEW application: https://web.iodp.tamu.edu/OVERVIEW/
Original process notes below.
Re-worked below into process above--polishing and efficiency improvements welcome.
The scripts below are applicable to steps 2. thru 5.
EOX load and merge - rci_work/exp
The collection of scripts in this location is used to automate the shore components of EOX processing: retrieve last expedition content from the database dump, create and apply moratorium supports, merge the data into production; provide statistics which give quality assurance/quality control on the process.
These are the scripts (in-order) that are typically used
|