That’s Me

Gavin Soorma
  • Oracle Certified Professional
  • 7.3, 8i, 9i,10g, 11g
  • 11i Apps DBA OCP
  • 10g RAC OCE
  • Certified GoldenGate Implementation Specialist
  • 10g OCM
  • 11g OCM

  • Feedback

    2,409,840 hits

    Thanks A MILLION for your support!

    Please send me your valuable feedback and suggestions






    Oracle Goldengate Tutorial 8 – Filtering and Mapping data

    Oracle GoldenGate not only provides us a replication solution that is Oracle version independent as well as platform independent, but we can also use it to do data transformation and data manipulation between the source and the target.

    So we can use GoldenGate when the source and database database differ in table structure as well as an ETL tool in a Datawarehouse type environment.

    We will discuss below two examples to demonstrate this feature – column mapping and filtering of data.

    In example 1, we will filter the records that are extracted on the source and applied on the target – only rows where the JOB column value equals ‘MANAGER” in the MYEMP table will be considered for extraction.

    In example 2, we will deal with a case where the table structure is different between the source database and the target database and see how column mapping is performed in such cases.

    Example 1

    Initial load of all rows which match the filter from source to target. The target database MYEMP table will only be populated with rows from the EMP table where filter criteria of JOB=’MANAGER’ is met.

    On Source

    GGSCI (redhat346.localdomain) 4> add extract myload1, sourceistable
    EXTRACT added.

    GGSCI (redhat346.localdomain) 5> edit params myload1

    EXTRACT myload1
    USERID ggs_owner, PASSWORD ggs_owner
    RMTHOST devu007, MGRPORT 7809
    RMTTASK replicat, GROUP myload1
    TABLE scott.myemp, FILTER (@STRFIND (job, “MANAGER”) > 0);

    On Target

    GGSCI (devu007) 2> add replicat myload1, specialrun
    REPLICAT added.

    GGSCI (devu007) 3> edit params myload1

    “/u01/oracle/software/goldengate/dirprm/myload1.prm” [New file]
    REPLICAT myload1
    USERID ggs_owner, PASSWORD ggs_owner
    ASSUMETARGETDEFS
    MAP scott.myemp, TARGET sh.myemp;

    On Source – start the initial load extract

    GGSCI (redhat346.localdomain) 6> start extract myload1

    Sending START request to MANAGER …
    EXTRACT MYLOAD1 starting

    On SOURCE

    SQL> select count(*) from myemp;

    COUNT(*)
    ———-
    14

    SQL> select count(*) from myemp where job=’MANAGER’;

    COUNT(*)
    ———-
    9

    On TARGET

    SQL> select count(*) from myemp where job=’MANAGER’;

    COUNT(*)
    ———-
    9

    Create an online change extract and replicat group using a Filter

    GGSCI (redhat346.localdomain) 10> add extract myload2, tranlog, begin now
    EXTRACT added.

    GGSCI (redhat346.localdomain) 11> add rmttrail /u01/oracle/software/goldengate/dirdat/bb, extract myload2
    RMTTRAIL added.

    GGSCI (redhat346.localdomain) 11> edit params myload2

    EXTRACT myload2
    USERID ggs_owner, PASSWORD ggs_owner
    RMTHOST 10.53.200.225, MGRPORT 7809
    RMTTRAIL /u01/oracle/software/goldengate/dirdat/bb
    TABLE scott.myemp, FILTER (@STRFIND (job, “MANAGER”) > 0);

    On Target

    GGSCI (devu007) 2> add replicat myload2, exttrail /u01/oracle/software/goldengate/dirdat/bb
    REPLICAT added.

    GGSCI (devu007) 3> edit params myload2

    “/u01/oracle/software/goldengate/dirprm/myload2.prm” [New file]
    REPLICAT myload2
    ASSUMETARGETDEFS
    USERID ggs_owner, PASSWORD ggs_owner
    MAP scott.myemp, TARGET sh.myemp;

    On Source – start the online extract group

    GGSCI (redhat346.localdomain) 13> start extract myload2

    Sending START request to MANAGER …
    EXTRACT MYLOAD2 starting

    GGSCI (redhat346.localdomain) 14> info extract myload2

    EXTRACT MYLOAD2 Last Started 2010-02-23 11:04 Status RUNNING
    Checkpoint Lag 00:27:39 (updated 00:00:08 ago)
    Log Read Checkpoint Oracle Redo Logs
    2010-02-23 10:36:51 Seqno 214, RBA 103988

    On Target

    GGSCI (devu007) 4> start replicat myload2

    Sending START request to MANAGER …
    REPLICAT MYLOAD2 starting

    GGSCI (devu007) 5> info replicat myload2

    REPLICAT MYLOAD2 Last Started 2010-02-23 11:05 Status RUNNING
    Checkpoint Lag 00:00:00 (updated 00:00:08 ago)
    Log Read Checkpoint File /u01/oracle/software/goldengate/dirdat/bb000000
    First Record RBA 989

    On Source we now insert two rows into the MYEMP table – one which has the JOB value of ‘MANAGER’ and the other row which has the job value of ‘SALESMAN’


    On SOURCE

    SQL> INSERT INTO MYEMP
    2 (empno,ename,job,sal)
    3 VALUES
    4 (1234,’GAVIN’,’MANAGER‘,10000);

    1 row created.

    SQL> commit;

    Commit complete.

    SQL> INSERT INTO MYEMP
    2 (empno,ename,job,sal)
    3 VALUES
    4 (1235,’BOB’,’SALESMAN‘,1000);

    1 row created.

    SQL> commit;

    Commit complete.

    SQL> select count(*) from myemp;
    COUNT(*)
    ———-
    16

    SQL> select count(*) from myemp where job=’MANAGER’;

    COUNT(*)
    ———-
    10

    On Target, we will see that even though two rows have been inserted into the source MYEMP table, on the target MYEMP table only one row is inserted because the filter has been applied which only includes the rows where the JOB value equals ‘MANAGER’.

    SQL> select count(*) from myemp;

    COUNT(*)
    ———-
    10

    Example 2 – source and target table differ in column structure

    In the source MYEMP table we have a column named SAL whereas on the target, the same MYEMP table has the column defined as SALARY.

    Create a definitions file on the source using DEFGEN utility and then copy that definitions file to the target system

    GGSCI (redhat346.localdomain) > EDIT PARAMS defgen

    DEFSFILE /u01/oracle/ggs/dirsql/myemp.sql
    USERID ggs_owner, PASSWORD ggs_owner
    TABLE scott.myemp;

    [oracle@redhat346 ggs]$ ./defgen paramfile /u01/oracle/ggs/dirprm/defgen.prm

    ***********************************************************************
    Oracle GoldenGate Table Definition Generator for Oracle
    Version 10.4.0.19 Build 002
    Linux, x64, 64bit (optimized), Oracle 11 on Sep 18 2009 00:09:13

    Copyright (C) 1995, 2009, Oracle and/or its affiliates. All rights reserved.

    Starting at 2010-02-23 11:22:17
    ***********************************************************************

    Operating System Version:
    Linux
    Version #1 SMP Wed Dec 17 11:41:38 EST 2008, Release 2.6.18-128.el5
    Node: redhat346.localdomain
    Machine: x86_64
    soft limit hard limit
    Address Space Size : unlimited unlimited
    Heap Size : unlimited unlimited
    File Size : unlimited unlimited
    CPU Time : unlimited unlimited

    Process id: 14175

    ***********************************************************************
    ** Running with the following parameters **
    ***********************************************************************
    DEFSFILE /u01/oracle/ggs/dirsql/myemp.sql
    USERID ggs_owner, PASSWORD *********
    TABLE scott.myemp;
    Retrieving definition for SCOTT.MYEMP

    Definitions generated for 1 tables in /u01/oracle/ggs/dirsql/myemp.sql

    If we were to try and run the replicat process on the target without copying the definitions file, we will see an error as shown below which pertains to the fact that the columns in the source and target database are different and GoldenGate is not able to resolve that.

    2010-02-23 11:31:07 GGS WARNING 218 Aborted grouped transaction on ‘SH.MYEMP’, Database error 904 (ORA-00904: “SAL”: invalid identifier).

    2010-02-23 11:31:07 GGS WARNING 218 SQL error 904 mapping SCOTT.MYEMP to SH.MYEMP OCI Error ORA-00904: “SAL”: invalid identifier (status = 904), SQL .

    We then ftp the definitions file from the source to the target system – in this case to the dirsql directory located in the top level GoldenGate installed software directory

    We now go and make a change to the original replicat parameter file and change the parameter ASSUMEDEFS to SOURCEDEFS which provides GoldenGate with the location of the definitions file.

    The other parameter which is included is the COLMAP parameter which tells us how the column mapping has been performed. The ‘USEDEFAULTS’ keyword denotes that all the other columns in both tables are identical except for the columns SAL and SALARY which differ in both tables and now we are mapping the SAL columsn in source to the SALARY column on the target.

    REPLICAT myload2
    SOURCEDEFS /u01/oracle/software/goldengate/dirsql/myemp.sql
    USERID ggs_owner, PASSWORD ggs_owner
    MAP scott.myemp, TARGET sh.myemp,
    COLMAP (usedefaults,
    salary = sal);

    We now go and start the originall replicat process myload2 which had abended because of the column mismatch (which has now been corrected via the parameter change) and we see that the process now is running without any error.

    now go and start the process which had failed after table modification

    GGSCI (devu007) 2> info replicat myload2

    REPLICAT MYLOAD2 Last Started 2010-02-23 11:05 Status ABENDED
    Checkpoint Lag 00:00:03 (updated 00:11:44 ago)
    Log Read Checkpoint File /u01/oracle/software/goldengate/dirdat/bb000000
    2010-02-23 11:31:03.999504 RBA 1225

    GGSCI (devu007) 3> start replicat myload2

    Sending START request to MANAGER …
    REPLICAT MYLOAD2 starting

    GGSCI (devu007) 4> info replicat myload2

    REPLICAT MYLOAD2 Last Started 2010-02-23 11:43 Status RUNNING
    Checkpoint Lag 00:00:00 (updated 00:00:03 ago)
    Log Read Checkpoint File /u01/oracle/software/goldengate/dirdat/bb000000
    2010-02-23 11:31:03.999504 RBA 1461

    Coming Next! – Monitoring the GoldenGate environment …..

    7 comments to Oracle Goldengate Tutorial 8 – Filtering and Mapping data

    • shubha

      Hi

      Ur documents are very good for basic configuration.

      We have one query – can we replicate data if table having differnt number of columns on source & target database. suppose source database have 1 table with 10 columns,target database have 5 columns on that table.

      Waiting for your reply

      Thanks in advance

    • Hi Shubha – yes it is certainly possible – we can change table structure on the target – column names are different – data types are different – columns are fewer etc etc – read page 247 of the GoldenGate Administration Guide – it has reference to COLS and COLSEXCEPT keywords which basically enables us to specify the columns to include or exclude from the replication process – try it out and let me know how it goes as I have not tried that feature – thanks

    • [...] GoldenGate – using FILTER, COMPUTE and SQLEXEC commands Some time back I had posted a note on column mapping and data transformation using GoldenGate. [...]

    • Vinay Kumar

      Hi Gavin,

      Is it possible to map tables with different structure.
      Like in Source table we have 3 columns but in target table we want only 2 columns.
      How do we achieve that?
      Thanks in advance.

      Regards,
      Vinay Kumar G

    • Hi there is the COLMAP clause – have a look at the documentation Windows and Unix Administration Guide for more information

    • victor

      Hi Gavin,

      we are building a real-time ODS sourcing several Oracle OLTP systems, and want to run PL/SQL package at target to do an ETL-like transform at the end of GG replication, all synchronous, using GG as the invoker of the PL/SQL package. could you be kind enough to give an example of how one would source let’s say INVOICE headers, lines, and CUSTOMERS using GG, and then transform related records, namely invoice header, associated lines, and associated customer with a PL/SQL call, combining new CDC data for the sub-set that came from GG with already resident data on the target.. basic ETL pattern using both source and target data, all in GG.. Is this doable?

      thanks so much in advance!

    • Pradeep Gupta

      Good work Gavin, I am having 3 years of good exp in GoldenGate. I went thru some of your documents and found very clear and nicely written.

      Many thanks for all your blogs, I will surely recommend your site for beginners.

      Cheers

    Leave a Reply

      

      

      

    You can use these HTML tags

    <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <strike> <strong>