Tuesday, October 29, 2019

Creating a custom Integration Knowledge Module to delete duplicates and retain one record based on a column

Scenario: Consider having records which are duplicate in some kind of column, assume you want to delete all those records which are duplicate for Customer_Id and leave only one record based on a particular query (logic).

This can be easily achieved in two ways:

a) Creating a procedure to delete those records.

b) Creating a custom knowledge module with a step which will delete these records.

Today I will be covering part b).


To do this, we need a logic to delete those records. In my case I have been using this Oracle SQL query:

delete from    TABLE1where ROWID not in (select MAX(ROWID) from    TABLE1GROUP BY column or columns...) 
This query uses CUSTOMER_ID to delete duplicates on CUSTOMER_ID.

But imagine that this scope can be much bigger, you can use whatever query needed for your own way of deleting duplicate records and then add that query as a step in your custom knowledge module.

So let's do some quick API calls and write the same query but in a knowledge module manner:

delete from    <%=snpRef.getTable("L","TARG_NAME","A")%>
where ROWID not in (
select MAX(ROWID)
from     <%=snpRef.getTable("L","TARG_NAME","A")%>
GROUP BY <%=odiRef.getColList("","[COL_NAME]","\n\t,\t","","UK")%>) 
So what this script does is it will obtain the Target's table name where the knowledge module is being used and it will obtain the columns marked as UK in ODI's quick edit tab.

The important part here is marking the columns with UK so the IKM will know which columns can be added in the GROUP BY statement in this case.

NOTE: Each column that is marked as UK, will be fetched in the GROUP BY statement.


Now let's prepare the interface and a dummy file.





This is the interface and the file that I will use with dummy data, assume that I want to delete one of the duplicates in CUSTOMER_ID column and I want to retain only one of them.


Navigate to Quick-Edit tab, and mark the columns that you want to GROUP BY as UK, in our case CUSTOMER_ID.

-----
Note that you can mark up to 10 columns as User Defined (UD1, UD2, UD3 and so on) in quick-edit, imagine you have a big query and you need to dynamically call via API those columns in the query, then it should look something like this:

<%=odiRef.getColList("","[COL_NAME]","\n\t,\t","","UD1")%> respectively UD2 and so on...
-----




Now we need to set up the Integration Knowledge Itself, in my case I need the same functionality as the IKM SQL Control Append but after the insert new rows statement, I need to delete duplicate records.

So first duplicate the original IKM SQL Control Append IKM so that we don't touch the original one.



Then we have to open it and navigate to Details tab, over there the tasks are listed in top-down order, so add a new step with the + sign on top right corner and rename it as Duplicate Rows Step, order it so it will be executed AFTER insert new rows statement (you can rearrange the order using up and down arrows next to + sign).


So this step is reference by Order 110, make sure that Commit done, and leave other parameters as shown in the screen shot.

Open the command and paste your code over there:



Make sure that the command is written in Command On Target tab.

Now we have created the custom knowledge module, we can also create an option for it if we want to disable this functionality of the KM for a particular run.

So now let's add an option to this IKM. Right click on the knowledge module created and select New Option



Rename that option and make sure you link the options Position with the Order number in Details step of the IKM. In our case Delete Duplicate Rows is in order 110:



I will set the default value to True, it can be optional however you may need it if you have many new custom steps if needed.


Now that we have completely set up this IKM, go to the interface you need it working, and choose this IKM in the flow tab of the interface:


Let's run the interface and see the steps:




Code Executed by this step

And this is what is left in the table:




Monday, October 28, 2019

Using .csv or .xlsx to populate an Oracle Table and Generating .xlsx or .csv from a table ODI11G

There are many business requirements in which you may need to populate oracle db tables by using excel elements as a source, or by using a db table to generate excel elements as target.

This post will demonstrate how to achieve that accordingly!

First things first, make sure that Topology is established accordingly, model folders, output file path is configured and source file path is configured.

I won't be covering that topic since it is a separate topic within ODI.


  • First part of this post is Populating an Oracle DB table via .csv source file.
First put the file in the source file folder so that we can read the correct .csv model using "Reverse Engineer feature". The idea is to give the parameters (delimiter, header etc) so that ODI can read the file appropriately.

For the sake of this example I will be creating a simple .csv document in excel.

Keep in mind that if your csv does not have a clean format (specified delimiters, text separators or correct headers) then your source file has to be cleaned before reading from ODI. Secondly keep in mind that naming header columns with spaces such as "First Name" is not preferred since ODI will rename it automatically First_Name or _First_Name which might not be the result desired. So clean your source file accordingly!
.csv Source File

Now lets create a data store in the model folder to read this specific file, I created this file manually in excel so it will be comma delimited, but there are cases in which the file might be tab delimited so be careful on that.
Data Store creation
Use the search button to find the file in the source file folder. the Alias is used to reference the columns in the target data store mapping expressions so make sure that you use a clear one.

After finding the file needed, navigate to"Files" tab so you can specify the parameters that ODI will be using to "Reverse Engineer" the file. In our case the file is comma delimited, it has a MS-DOS Record Separator and the first row is column names. To check the correct Record Separator I use Ultra Edit text editor.



Data Store model parameters
Now navigate to Columns tab, and select Reverse Engineer, after doing so, it should read the file appropriately, if it doesn't, then you have to check once again the parameters given in Files tab.


For HEIGHT and WEIGHT columns, ODI detected them as numeric, but I changed it to String for the sake of this example.

After saving this data store, create a new Interface in the project desired. For this example I will use the interface named CSV2TABLE.




Since I want to populate a table which will reside on the schema SOURCE2PRESTAGE_FILES_DB, make sure you select that as a staging area highlighted in the picture above, I hid Context due to privacy issues but you should select the context which maps the DB schema configured in Topology navigator (as mentioned I won't be covering that topic in this post).

Now navigate to mapping window and add the data store model created for the source file (.csv).

Now it's time to select the table which will be populated. We have two options here, you can either type manually within DBMS the create statement for the table needed as target, and then reverse engineering it and dragging it to target Data store. Or the option covered in this example where there is no table existing already so ODI will create the target table automatically using the same columns as in Source Data store.

This is achieved using the "Add To Target" option:

Add To Target

Doing so, make sure that the target table is named, otherwise it will throw an error.

After this step we need to navigate to Flow tab to specify the steps which ODI will cover when executing this interface.

For the Source set, select LKM File To SQL, this Load Knowledge Module is a built in Module which can correctly read Excel files. For Integration choose IKM SQL Control Append, but make sure that CREATE_TARG_TABLE is set to true because we don't have the target table existing!

Note: After the first time created, you can set the step of this IKM for creating the target table to false, or you can leave it to true but it will generate a new table each time it is executed.

Run the Interface and expect the table to be created.

  • Now lets try to generate a .csv file from an Oracle table
I will use the same table which I created from the file, but now I want to generate a .csv named: TOFILE.csv.

First I will have to create the data store at the Model which is used for output files (Configured in Topology), but we have to make sure that we write Resource Name .csv and since the file has to be created we won't be looking in the search icon for it, we have to write the name directly and specify the output parameters at the Files tab
Since the file has to generated and we don't have it existing, we have to manually create the columns in the Columns tab using the + sign on top right corner, after specifying Datatype and length make sure to save it before creating the interface which generates the file.


Now that we have the Data Store of Source Table and Target File created, we have to create an Interface and add the table to source tab, file to Target tab.

AFTER adding the source and target, make sure to check "Staging Area Different From Target" this tells ODI that the transformation has to occur in source (table level) because the target is a file. Since ODI relies on DB power to transform data, this step is a must.
Staging area has to be the schema in which the source table resides.


At the Flow tab we have to select only IKM (SQL TO FILE) since there won't be any LKM to generate the file


Run the the interface, and the output file will be generated.



NOTE:

There are some cases when there is a specific date format in the source table, and after generating it to file, excel automatically formats it.

To preserve the original format of the date make sure you use this mapping expression in the column

trim('=' || '"'   || to_char(COLUMN ,'yyyy-mm-dd') || '"')

This will preserve the yyyy-mm-dd date format into target file and excel won't format it.

Sunday, October 27, 2019

Intro

Recently I have decided to keep a record of my ODI 11g use cases, I will be posting regularly with interesting use cases which I have developed during my time as a ODI 11g developer. This is not a blog how to use ODI 11g in depth, but rather a blog that documents unique use cases which every ETL dev might face using ODI 11g.

I have been working for a year and a half with ODI 11g, and I find ODI very fun using. This blog will be used as a personal portfolio and a reference for every 11g developer out there.

I hope this blog will help someone in the future.