Perth, Australia
+ (61) 417713124
prosolutions@gavinsoorma.com

GoldenGate Bounded Recovery

  • Posted by Gavin Soorma
  • On June 24, 2014
  • 2 Comments
  • Bounded Recovery, BR, BR interval, checkpoint, GoldenGate

The Oracle online redo log files contain both committed as well as uncommitted transactions, but Oracle GoldenGate only writes committed transactions to the trail files. So the question which can be asked is what happens to the transactions which are not committed or especially what happens to those uncommitted long running transactions.

Sometimes long running transactions in batch jobs can take several hours to complete. So until the long running transaction is not completed or committed how will GoldenGate handle the situation where an extract is reading from a particular online redo log file when the transaction starts and then with the passage of time other DML activity in the database causes that particular online redo log file to be archived – and then maybe that archive log file is not available on disk because the nightly RMAN backup job has deleted the archive log files from disk after the backup completes.

So GoldenGate has two kinds of recovery – Normal Recovery where the extract process needs all the archive log files starting from the current recovery read checkpoint of the extract and Bounded Recovery which is what we will discuss here with an example.

In very simple terms there is a Bounded Recovery (BR) Interval for an extract which by default is 4 hours and every 4 hours the extract process will makes a Bounded Recovery checkpoint. At every BR interval GoldenGate will check for any long running transactions which are older than the BR interval (which defaults to 4 hours) and writes information about the current state as well as data of the extract to disk – which again by default is the BR sub-directory in the GoldenGate software home location. This will continue at every BR interval until the long running transaction is committed or a rollback is performed.

In our extract parameter file we use the BR BRINTERVAL command :

BR BRINTERVAL 20M

 

This is what the official documentation states:

The use of disk persistence to store and then recover long-running transactions enables Extract to manage a situation that rarely arises but would otherwise significantly (adversely) affect performance if it occurred. The beginning of a long-running transaction is often very far back in time from the place in the log where Extract was processing when it stopped. A long-running transaction can span numerous old logs, some of which might no longer reside on accessible storage or might even have been deleted. Not only would it take an unacceptable amount of time to read the logs again from the start of a long-running transaction but, since long-running transactions are rare, most of that work would be the redundant capture of other transactions that were already written to the trail or discarded. Being able to restore the state and data of persisted long-running transactions eliminates that work.

 

In this example we will see how BR works by setting the BR Interval of the extract to a low value of 20 minutes and perform a INSERT statement which we do not commit. When we first issued the INSERT statement the extract process is reading from a particular online redo log (sequence #14878).

We switch some redo log files to simulate activity in the database and then backup and delete archivelog sequence 14878. We can see at every 20 minute interval Bounded Recovery is being performed and information that the extract needs about the long running transaction is being written to the BR directory on disk. Even though the archive log file is not present on disk, the extract process does not need that archive log file and uses the Bounded Recovery data which is present in the BR directory to then write data to the trail files when the long running transaction is finally committed.

We issue this INSERT statement and we will not commit the transaction – this is our test long-running transaction.

 

SQL> insert into myobjects
select object_id,object_name,object_type from dba_objects;

75372 rows created.

 

Check the online redo log sequence the extract is currently reading from – in this case it is 14878

 

GGSCI (kens-orasql-001-dev.corporateict.domain) 2> info ext1

EXTRACT EXT1 Last Started 2014-06-21 18:07 Status RUNNING
Checkpoint Lag 00:00:00 (updated 00:00:08 ago)
Process ID 15190
Log Read Checkpoint Oracle Redo Logs
2014-06-21 18:10:21 Seqno 14878, RBA 5936128
SCN 0.9137531 (9137531)

 

Using the SEND EXTRACT SHOWTRANS command, we can identify any transactions in progress or open

 

GGSCI (kens-orasql-001-dev.corporateict.domain) 4> send ext1 showtrans

Sending SHOWTRANS request to EXTRACT EXT1 …

Oldest redo log file necessary to restart Extract is:

Redo Log Sequence Number 14878, RBA 116752

————————————————————
XID: 10.16.1533
Items: 75372
Extract: EXT1
Redo Thread: 1
Start Time: 2014-06-21:18:10:14
SCN: 0.9137521 (9137521)
Redo Seq: 14878
Redo RBA: 116752
Status: Running

 

The INFO EXTRACT SHOWCH command gives us more information about the extract checkpoint information. Basically the position in the source (redo log/transaction logs) where it is reading from and the position in the target (trail file) where it is currently writing to.

It shows us the redo log file (or archive log file) which the extract first read when it started up (Startup Checkpoint) which is 14861.

It shows us the position of the oldest unprocessed transaction in the online redol log /archive redo logs files (Recovery Checkpoint) which is 14878 SCN and SCN 9137521.

Finally it shows us the current position in the online redo log file in terms of SCN where the extract last read a record (Current Checkpoint) which is sequence 14878 but the SCN had advanced to 9137612 because of some other activity in the database.

 

GGSCI (kens-orasql-001-dev.corporateict.domain) 5> info ext1 showch

EXTRACT EXT1 Last Started 2014-06-21 18:07 Status RUNNING
Checkpoint Lag 00:00:00 (updated 00:00:06 ago)
Process ID 15190
Log Read Checkpoint Oracle Redo Logs
2014-06-21 18:11:41 Seqno 14878, RBA 5977088
SCN 0.9137612 (9137612)

Current Checkpoint Detail:

Read Checkpoint #1

Oracle Redo Log

Startup Checkpoint (starting position in the data source):
Thread #: 1
Sequence #: 14861
RBA: 5918224
Timestamp: 2014-06-21 16:49:33.000000
SCN: 0.9129707 (9129707)
Redo File: /u01/app/oracle/fast_recovery_area/GGATE1/archivelog/2014_06_21/o1_mf_1_14861_9tbo7pys_.arc

Recovery Checkpoint (position of oldest unprocessed transaction in the data source):
Thread #: 1
Sequence #: 14878
RBA: 116752
Timestamp: 2014-06-21 18:10:14.000000
SCN: 0.9137521 (9137521)
Redo File: /u01/app/oracle/oradata/ggate1/redo03.log

Current Checkpoint (position of last record read in the data source):
Thread #: 1
Sequence #: 14878
RBA: 5977088
Timestamp: 2014-06-21 18:11:41.000000
SCN: 0.9137612 (9137612)
Redo File: /u01/app/oracle/oradata/ggate1/redo03.log

Write Checkpoint #1

GGS Log Trail

Current Checkpoint (current write position):
Sequence #: 3
RBA: 8130790
Timestamp: 2014-06-21 18:11:44.414364
Extract Trail: ./dirdat/zz
Trail Type: RMTTRAIL

 

After some time (more than 20 minutes) we issue the same SHOWCH command and let us look at the differences we see in the output of the command as compared to the previous SHOWCH.

We can see that because of database activity the extract is now reading from the online redo log sequence 14884 (earlier it was 14878).

But what has remained unchanged is the Recovery Checkpoint which is the oldest redo log sequence that the extract needs to access when the long running transaction which is currently in progress is finally committed.

 

GGSCI (kens-orasql-001-dev.corporateict.domain) 2> info ext1 showch

EXTRACT EXT1 Last Started 2014-06-21 18:07 Status RUNNING
Checkpoint Lag 00:00:00 (updated 00:00:04 ago)
Process ID 15190
Log Read Checkpoint Oracle Redo Logs
2014-06-21 18:40:34 Seqno 14884, RBA 72704
SCN 0.9139491 (9139491)

Current Checkpoint Detail:

Read Checkpoint #1

Oracle Redo Log

Startup Checkpoint (starting position in the data source):
Thread #: 1
Sequence #: 14861
RBA: 5918224
Timestamp: 2014-06-21 16:49:33.000000
SCN: 0.9129707 (9129707)
Redo File: /u01/app/oracle/fast_recovery_area/GGATE1/archivelog/2014_06_21/o1_mf_1_14861_9tbo7pys_.arc

Recovery Checkpoint (position of oldest unprocessed transaction in the data source):
Thread #: 1
Sequence #: 14878
RBA: 116752
Timestamp: 2014-06-21 18:10:14.000000
SCN: 0.9137521 (9137521)
Redo File: /u01/app/oracle/oradata/ggate1/redo03.log

Current Checkpoint (position of last record read in the data source):
Thread #: 1
Sequence #: 14884
RBA: 72704
Timestamp: 2014-06-21 18:40:34.000000
SCN: 0.9139491 (9139491)
Redo File: /u01/app/oracle/oradata/ggate1/redo03.log

 

We also see important information related to the Bounded Recovery (BR) Checkpoint via the INFO EXTRACT SHOWCH command.

As earlier mentioned we had changed the BR interval for this example to 20 minutes from the default value of 4 hours, so every 20 minutes at the BR interval (in this case will 18:07, 18:27. 18:47 and so on ), information about the current state and data of the extract wil be written to disk in the BR sub-directory.

So we see that at 18:27 BR interval, the BR checkpoint had persisted information from the redo log sequence 14881 to disk. So if there was a failure or if the extract was restarted, it will not need any redo log files or archive log files prior to sequence 14881

 

BR Previous Recovery Checkpoint:
Thread #: 0
Sequence #: 0
RBA: 0
Timestamp: 2014-06-21 18:07:35.982719
SCN: Not available
Redo File:

BR Begin Recovery Checkpoint:
Thread #: 0
Sequence #: 14878
RBA: 116752
Timestamp: 2014-06-21 18:10:14.000000
SCN: 0.9137521 (9137521)
Redo File:

BR End Recovery Checkpoint:
Thread #: 1
Sequence #: 14881
RBA: 139776
Timestamp: 2014-06-21 18:27:38.000000
SCN: 0.9138688 (9138688)
Redo File:

 

We can see that some files have been created in the BR directory for extract EXT1

 

GGSCI (kens-orasql-001-dev.corporateict.domain) 4> info ext1

EXTRACT EXT1 Last Started 2014-06-21 18:07 Status RUNNING
Checkpoint Lag 00:00:00 (updated 00:00:06 ago)
Process ID 15190
Log Read Checkpoint Oracle Redo Logs
2014-06-21 18:41:35 Seqno 14884, RBA 131072
SCN 0.9139583 (9139583)

GGSCI (kens-orasql-001-dev.corporateict.domain)

GGSCI (kens-orasql-001-dev.corporateict.domain) 3> shell ls -l ./BR/EXT1

total 20
-rw-r—– 1 oracle oinstall 65536 Jun 21 18:27 CP.EXT1.000000015
drwxr-x— 2 oracle oinstall 4096 Jun 19 17:07 stale

 

So what happens if we delete the old archive log sequence 14878 from disk. Since the BR checkpoint has already persisted information about the long running transaction which was contained in 14878 sequence to disk, it should not need to access this older archive log file.

To test this we take a backup of archive log sequence 14878 and then delete it. Remember this was the redo log sequence the extract was writing to when the long-running transaction first started.

 

RMAN> backup archivelog sequence 14878 delete input;

Starting backup at 21-JUN-14
using target database control file instead of recovery catalog
allocated channel: ORA_DISK_1
channel ORA_DISK_1: SID=24 device type=DISK
channel ORA_DISK_1: starting archived log backup set
channel ORA_DISK_1: specifying archived log(s) in backup set
input archived log thread=1 sequence=14878 RECID=30497 STAMP=850846396
channel ORA_DISK_1: starting piece 1 at 21-JUN-14
channel ORA_DISK_1: finished piece 1 at 21-JUN-14
piece handle=/u01/app/oracle/fast_recovery_area/GGATE1/backupset/2014_06_21/o1_mf_annnn_TAG20140621T234659_9tcb7msp_.bkp tag=TAG20140621T234659 comment=NONE
channel ORA_DISK_1: backup set complete, elapsed time: 00:00:01
channel ORA_DISK_1: deleting archived log(s)
archived log file name=/u01/app/oracle/fast_recovery_area/GGATE1/archivelog/2014_06_21/o1_mf_1_14878_9tbpowlm_.arc RECID=30497 STAMP=850846396
Finished backup at 21-JUN-14

 

Let us now finally commit the long-running transaction.

 

SQL> insert into myobjects
2 select object_id,object_name,object_type from dba_objects;

75372 rows created.

SQL> commit;

Commit complete.

 

In the Extract EXT1 report, we can see information about the long-running transaction as well as the Bounded Recovery Checkpoint and we can see that every 20 minutes the redo log sequence for which the Bounded Recovery Checkpoint is happening is getting incremented.

 

2014-06-21 18:17:42 WARNING OGG-01027 Long Running Transaction: XID 10.16.1533, Items 75372, Extract EXT1, Redo Thread 1, SCN 0.9137521 (9137521), Redo Seq #14878, R
edo RBA 116752.

2014-06-21 18:27:41 INFO OGG-01971 The previous message, ‘WARNING OGG-01027’, repeated 1 times.

2014-06-21 18:27:41 INFO OGG-01738 BOUNDED RECOVERY: CHECKPOINT: for object pool 1: p23540_extr: start=SeqNo: 14878, RBA: 116752, SCN: 0.9137521 (9137521), Timest
amp: 2014-06-21 18:10:14.000000, end=SeqNo: 14881, RBA: 139776, SCN: 0.9138688 (9138688), Timestamp: 2014-06-21 18:27:38.000000, Thread: 1.

2014-06-21 18:47:50 INFO OGG-01738 BOUNDED RECOVERY: CHECKPOINT: for object pool 1: p23540_extr: start=SeqNo: 14885, RBA: 144912, SCN: 0.9139983 (9139983), Timest
amp: 2014-06-21 18:47:47.000000, Thread: 1, end=SeqNo: 14885, RBA: 145408, SCN: 0.9139983 (9139983), Timestamp: 2014-06-21 18:47:47.000000, Thread: 1.

2014-06-21 19:07:59 INFO OGG-01738 BOUNDED RECOVERY: CHECKPOINT: for object pool 1: p23540_extr: start=SeqNo: 14889, RBA: 176144, SCN: 0.9141399 (9141399), Timest
amp: 2014-06-21 19:07:56.000000, Thread: 1, end=SeqNo: 14889, RBA: 176640, SCN: 0.9141399 (9141399), Timestamp: 2014-06-21 19:07:56.000000, Thread: 1.

 

So a point to keep in mind:

If we are using the Bounded Recovery interval with the default value of 4 hours, then ensure that we keep on disk at least at a minimum archive log files for the past 8 hours to cater for anylong running transactions.

 1

2 Comments

satwinder
  • Jan 7 2017
Wonderful explanation of bound recovery. Now, we would be a bit more comfortable in reading the output of transaction or checkpoint related commands as well as report files. U are doing a great job in spreading GoldenGate knowhow to the world.
Tapas
  • Jan 23 2017
Great job ..its very informative .. BR concept precisely defined..

Leave Reply

Your email address will not be published. Required fields are marked *