This can be easily achieved in two ways:
a) Creating a procedure to delete those records.
b) Creating a custom knowledge module with a step which will delete these records.
Today I will be covering part b).
To do this, we need a logic to delete those records. In my case I have been using this Oracle SQL query:
delete from TABLE1where ROWID not in (select MAX(ROWID) from TABLE1GROUP BY column or columns...)This query uses CUSTOMER_ID to delete duplicates on CUSTOMER_ID.
But imagine that this scope can be much bigger, you can use whatever query needed for your own way of deleting duplicate records and then add that query as a step in your custom knowledge module.
So let's do some quick API calls and write the same query but in a knowledge module manner:
delete from <%=snpRef.getTable("L","TARG_NAME","A")%>So what this script does is it will obtain the Target's table name where the knowledge module is being used and it will obtain the columns marked as UK in ODI's quick edit tab.
where ROWID not in (
select MAX(ROWID)
from <%=snpRef.getTable("L","TARG_NAME","A")%>
GROUP BY <%=odiRef.getColList("","[COL_NAME]","\n\t,\t","","UK")%>)
The important part here is marking the columns with UK so the IKM will know which columns can be added in the GROUP BY statement in this case.
NOTE: Each column that is marked as UK, will be fetched in the GROUP BY statement.
Now let's prepare the interface and a dummy file.
This is the interface and the file that I will use with dummy data, assume that I want to delete one of the duplicates in CUSTOMER_ID column and I want to retain only one of them.
Navigate to Quick-Edit tab, and mark the columns that you want to GROUP BY as UK, in our case CUSTOMER_ID.
-----
Note that you can mark up to 10 columns as User Defined (UD1, UD2, UD3 and so on) in quick-edit, imagine you have a big query and you need to dynamically call via API those columns in the query, then it should look something like this:
<%=odiRef.getColList("","[COL_NAME]","\n\t,\t","","UD1")%> respectively UD2 and so on...
-----
Now we need to set up the Integration Knowledge Itself, in my case I need the same functionality as the IKM SQL Control Append but after the insert new rows statement, I need to delete duplicate records.
So first duplicate the original IKM SQL Control Append IKM so that we don't touch the original one.
Then we have to open it and navigate to Details tab, over there the tasks are listed in top-down order, so add a new step with the + sign on top right corner and rename it as Duplicate Rows Step, order it so it will be executed AFTER insert new rows statement (you can rearrange the order using up and down arrows next to + sign).
So this step is reference by Order 110, make sure that Commit done, and leave other parameters as shown in the screen shot.
Open the command and paste your code over there:
Make sure that the command is written in Command On Target tab.
Now we have created the custom knowledge module, we can also create an option for it if we want to disable this functionality of the KM for a particular run.
So now let's add an option to this IKM. Right click on the knowledge module created and select New Option
Rename that option and make sure you link the options Position with the Order number in Details step of the IKM. In our case Delete Duplicate Rows is in order 110:
I will set the default value to True, it can be optional however you may need it if you have many new custom steps if needed.
Now that we have completely set up this IKM, go to the interface you need it working, and choose this IKM in the flow tab of the interface:
Let's run the interface and see the steps:
| Code Executed by this step |
And this is what is left in the table: