Saturday, November 9, 2019

Automating email reports using ODI Packages - Accessing Data Sources with changing name mask using ODI Variables

A very important feature which ODI offers is ODI Packages. ODI offers many elements which can be used to automate different processes.

Today I will try to demonstrate the usage of ODI Packages, scheduling runs for those packages and using variables to dynamically access different data sources every day.

In the end, different statistics and automatic emails can be generated to achieve basically everything which can be achieved through an SQL Query in a database :).

Scenario:

- Read a .TXT file which name changes everyday with current date. (test20190811.txt)
- Populate an oracle table everyday except for Saturday and Sunday
- Generate a .csv file from that table with name and date mask (test20190811.csv)
- Attach that file to an email and send that email everyday to a recipient.
- Show row number of the populated table in email.



Firstly we need to create a variable which obtains the value of current date.
That can be done through a simple query in Oracle:

SELECT to_char(sysdate,'YYYYMMDD') from dual;

The variable will be named VAR1, and the Schema provided can be any schema since I am not selecting a particular table but I am selecting from dual which is a built in oracle table. (Otherwise if you need to select from a specific table, the schema where that table is created has to be provided).




Make sure to test the value of the variable using the refresh logo on top right corner, after clicking it, navigate to history tab on the left top corner and check the value if it corresponds to today's date and desired format (in my case YYYYMMDD).


Now that we have created the variable which will do the work, we have to create a new interface which will read the file and populate an non-existing oracle table which will be created during the first run of the interface.

As mentioned, I will be reading a source file .txt named test20190811 (date is the file mask which will change everyday according to current date). Create a new data store which will be used as a source in the interface, make sure that at Resource Name, give the static part of the name, and then for file mask address the variable created, variables in ODI are addressed using "#", in this case #VAR1.

Make sure to write this part manually and don't use the search icon on the right:



After that, define file format in Files tab and make sure to add Columns manually using + sign in Columns tab. (It is not possible to reverse engineer the file because RKM will not obtain the variable value as of 11g.)

After defining manually all of the columns, create a new interface and add the created data store as source, right click on it and Add to target. All of existing columns from source will be mapped into target. 



(Note that the table can be created in database manually and reverse engineered as a new data store and dragged into target tab).

In my case LKM will be assigned LKM File to SQL, IKM SQL Control Append.

Run the interface and check the operator:


Now that we have the interface which loads (File to table) we have to generate a new .csv file containing data from this table and send it as an email attachment and provide statistics within email for every load.

In one of my previous posts, I have shown how to generate a file from an Oracle table, I will generate a .xlsx using ODI package elements this time which is quite interesting.

Let's assemble our elements in a package.

Create a new package, I will name it EMAIL_AUTOMATION.

The first element that I will drag is VAR1 variable, we have to refresh that variable everyday (rememberthat variables have to be refreshed, otherwise they will keep the old value), due to this fact we need to load a new file which updates is name mask everyday so a refreshing is a must:

Drag VAR1 into the diagram, it will be set as a Refresh Variable, which is exactly what we need.



After having refreshed the variable, interface has to be run, we assume that the file will be generated in the path where Interface reads the source file (If it doesn't, we can use ODI File Copy from Toolbox provided and specify the copy path before running the interface but I won't be covering that topic). Drag the interface into the diagram. Connect both points using the Ok arrow which indicates that if one step executes correctly continue with the other step:



After this step file generation is required, I will do that not using an interface but using ODI SQL Unload Element from toolbox,

-Target File (full path of the file which will be generated (I want to name it file#VAR1.csv))
-JDBC Driver: oracle.jdbc.OracleDriver
-JDBC URL: jdbc:oracle:thin@<>:<>;<> (provide database info)
-User: Provide User
-Password: Provide Password
-File Format: Delimited
-SQL QUERY: (select * from <MYSCHEMA>.TARGETTEST), this is the query which "decides" what to output in your file.

Right click on ODI Sql Unload step and select execute step to test it.



Go to the path specified to generate the file and open it:



Having done this correctly you can see the data from the table.


Let's create a new variable which counts the rows in the table, using this query:

SELECT COUNT(*) FROM <MYSCHEMA>.TARGETTEST

Drag this variable into package and make sure to refresh it before the interface, otherwise the row count will not be updated:



As you can see another variable refresh has been added and also an ODISendMail step from Internet folder inside the toolbox on the left, now we need to fill the ODISendMail parameters as below:



- Mail Server (ip address of the mail server)
- From: Any valid email
- To: Any valid email
- Attachment: Full path of the file generated + variable which obtains the dynamically changing mask
- Message Body which consists the date of load, table row count as below:





Save the package and run it (check operator with all of the steps finished):



Wait for the email:




Now lets add a schedule to this package, to do so we must generate a package scenario and also have an existing ODI Agent running.

Generate a scenario under package (Scenario is a frozen code)



Expand the new scenario generated, right click on Scheduling and select new scheduling:





Many constraints can be added, I want it to run every day of the week 7:28 AM.




Friday, November 1, 2019

Enabling Data Quality checks in ODI11g through Flow Control / Isolating errors automatically in an error table

ODI as a tool offers data quality assurance through various scopes.

Today I will demonstrate how to add ODI level constraints and isolate errors which violate a constraint, in error tables that ODI creates automatically (known as E$ tables).

In principle, ODI has 2 kind of data controls known as:

  • Flow Control - which validates data before inserting into a target table (validation is done within I$ tables). What happens is that it leaves only the records which do not violate a constraint into the target table.
  • Static Control - Validation occurs after inserted in target table, though insertion is done in E$ table for validating records but they won't be deleted from target table.
Enabling these controls in ODI, one must use the IKM's Flow Control/Static Control options of an IKM in the Flow tab of an interface.

But first we need to define what constraints do columns have in order to apply Flow Control.

Todays scenario: 

  • Populate a table in which the primary key of the table cannot be null (ID).
  • Column height cannot be more than 200
  • Column date has to be in format YYYYMMDD (i.e 20191101)
Let's dig into the problem.

Create an interface which populates a table consisting in my case columns mentioned above.

I will be reading a .txt source file which is tab delimited. Data populated in the target table looks like this:



Area hand circled with red, marks the violations which will be declared in the datastore of the target table now. We can observe that the ID is null, Height is bigger that 200 and DATE is not in format of YYYYMMDD.

First let us write the SQL statements which validate the conditions:

As per ID: WHERE ID IS NOT NULL will do the work
As per Height: WHERE to_number(HEIGHT) > 200
As per Date: REGEXP_LIKE(DATE, '^\d{4}(0[1-9]|(1[0-2]))(0[1-9]|[1-2][0-9]|3[0-1])$')


Navigate to Quick-Edit tab and select CN and UK (Check Not Null) for ID:
Note: UK has to be selected for the primary key because flow control requires one.


Now we need to create constraints in target table's datastore. Navigate to: Designer - Models - Model where your datastore resides - Right Click Constraints - New Condition:

In my case, I created 2 Conditions

1. CHECK_HEIGHT
2. CHECK_DATE

Given the names, I also choose Type: Oracle Data Integrator Condition

In the Where area: paste the SQL expressions.

Message: the error message which you want to prompt


Note: In Control tab on top left corner Flow and Static Control can be allowed for these constraints. I have checked both, but in my case I will be using only Flow Control.

Save the constraints and now open the interface, go to Flow tab, select IKM_SQL_Control_Append and mark the option FLOW_CONTROL to True.


Under Control Tab next to Flow you can select the CKM which does the controlling.
Besides selecting the CKM it allows you to disable or enable specific constraints (True or False).




Also the maximum number of errors allowed can be specified, meaning that if that number is exceeded, interface will not throw warnings instead it will throw an error!

Run the interface and check the Operator:





In DBMS you can see that a new table has been created in the schema, the E$ table consists of the constraint violating records:




Only one record did not violate the constraint and that record is inserted into the target table:




Tuesday, October 29, 2019

Creating a custom Integration Knowledge Module to delete duplicates and retain one record based on a column

Scenario: Consider having records which are duplicate in some kind of column, assume you want to delete all those records which are duplicate for Customer_Id and leave only one record based on a particular query (logic).

This can be easily achieved in two ways:

a) Creating a procedure to delete those records.

b) Creating a custom knowledge module with a step which will delete these records.

Today I will be covering part b).


To do this, we need a logic to delete those records. In my case I have been using this Oracle SQL query:

delete from    TABLE1where ROWID not in (select MAX(ROWID) from    TABLE1GROUP BY column or columns...) 
This query uses CUSTOMER_ID to delete duplicates on CUSTOMER_ID.

But imagine that this scope can be much bigger, you can use whatever query needed for your own way of deleting duplicate records and then add that query as a step in your custom knowledge module.

So let's do some quick API calls and write the same query but in a knowledge module manner:

delete from    <%=snpRef.getTable("L","TARG_NAME","A")%>
where ROWID not in (
select MAX(ROWID)
from     <%=snpRef.getTable("L","TARG_NAME","A")%>
GROUP BY <%=odiRef.getColList("","[COL_NAME]","\n\t,\t","","UK")%>) 
So what this script does is it will obtain the Target's table name where the knowledge module is being used and it will obtain the columns marked as UK in ODI's quick edit tab.

The important part here is marking the columns with UK so the IKM will know which columns can be added in the GROUP BY statement in this case.

NOTE: Each column that is marked as UK, will be fetched in the GROUP BY statement.


Now let's prepare the interface and a dummy file.





This is the interface and the file that I will use with dummy data, assume that I want to delete one of the duplicates in CUSTOMER_ID column and I want to retain only one of them.


Navigate to Quick-Edit tab, and mark the columns that you want to GROUP BY as UK, in our case CUSTOMER_ID.

-----
Note that you can mark up to 10 columns as User Defined (UD1, UD2, UD3 and so on) in quick-edit, imagine you have a big query and you need to dynamically call via API those columns in the query, then it should look something like this:

<%=odiRef.getColList("","[COL_NAME]","\n\t,\t","","UD1")%> respectively UD2 and so on...
-----




Now we need to set up the Integration Knowledge Itself, in my case I need the same functionality as the IKM SQL Control Append but after the insert new rows statement, I need to delete duplicate records.

So first duplicate the original IKM SQL Control Append IKM so that we don't touch the original one.



Then we have to open it and navigate to Details tab, over there the tasks are listed in top-down order, so add a new step with the + sign on top right corner and rename it as Duplicate Rows Step, order it so it will be executed AFTER insert new rows statement (you can rearrange the order using up and down arrows next to + sign).


So this step is reference by Order 110, make sure that Commit done, and leave other parameters as shown in the screen shot.

Open the command and paste your code over there:



Make sure that the command is written in Command On Target tab.

Now we have created the custom knowledge module, we can also create an option for it if we want to disable this functionality of the KM for a particular run.

So now let's add an option to this IKM. Right click on the knowledge module created and select New Option



Rename that option and make sure you link the options Position with the Order number in Details step of the IKM. In our case Delete Duplicate Rows is in order 110:



I will set the default value to True, it can be optional however you may need it if you have many new custom steps if needed.


Now that we have completely set up this IKM, go to the interface you need it working, and choose this IKM in the flow tab of the interface:


Let's run the interface and see the steps:




Code Executed by this step

And this is what is left in the table:




Monday, October 28, 2019

Using .csv or .xlsx to populate an Oracle Table and Generating .xlsx or .csv from a table ODI11G

There are many business requirements in which you may need to populate oracle db tables by using excel elements as a source, or by using a db table to generate excel elements as target.

This post will demonstrate how to achieve that accordingly!

First things first, make sure that Topology is established accordingly, model folders, output file path is configured and source file path is configured.

I won't be covering that topic since it is a separate topic within ODI.


  • First part of this post is Populating an Oracle DB table via .csv source file.
First put the file in the source file folder so that we can read the correct .csv model using "Reverse Engineer feature". The idea is to give the parameters (delimiter, header etc) so that ODI can read the file appropriately.

For the sake of this example I will be creating a simple .csv document in excel.

Keep in mind that if your csv does not have a clean format (specified delimiters, text separators or correct headers) then your source file has to be cleaned before reading from ODI. Secondly keep in mind that naming header columns with spaces such as "First Name" is not preferred since ODI will rename it automatically First_Name or _First_Name which might not be the result desired. So clean your source file accordingly!
.csv Source File

Now lets create a data store in the model folder to read this specific file, I created this file manually in excel so it will be comma delimited, but there are cases in which the file might be tab delimited so be careful on that.
Data Store creation
Use the search button to find the file in the source file folder. the Alias is used to reference the columns in the target data store mapping expressions so make sure that you use a clear one.

After finding the file needed, navigate to"Files" tab so you can specify the parameters that ODI will be using to "Reverse Engineer" the file. In our case the file is comma delimited, it has a MS-DOS Record Separator and the first row is column names. To check the correct Record Separator I use Ultra Edit text editor.



Data Store model parameters
Now navigate to Columns tab, and select Reverse Engineer, after doing so, it should read the file appropriately, if it doesn't, then you have to check once again the parameters given in Files tab.


For HEIGHT and WEIGHT columns, ODI detected them as numeric, but I changed it to String for the sake of this example.

After saving this data store, create a new Interface in the project desired. For this example I will use the interface named CSV2TABLE.




Since I want to populate a table which will reside on the schema SOURCE2PRESTAGE_FILES_DB, make sure you select that as a staging area highlighted in the picture above, I hid Context due to privacy issues but you should select the context which maps the DB schema configured in Topology navigator (as mentioned I won't be covering that topic in this post).

Now navigate to mapping window and add the data store model created for the source file (.csv).

Now it's time to select the table which will be populated. We have two options here, you can either type manually within DBMS the create statement for the table needed as target, and then reverse engineering it and dragging it to target Data store. Or the option covered in this example where there is no table existing already so ODI will create the target table automatically using the same columns as in Source Data store.

This is achieved using the "Add To Target" option:

Add To Target

Doing so, make sure that the target table is named, otherwise it will throw an error.

After this step we need to navigate to Flow tab to specify the steps which ODI will cover when executing this interface.

For the Source set, select LKM File To SQL, this Load Knowledge Module is a built in Module which can correctly read Excel files. For Integration choose IKM SQL Control Append, but make sure that CREATE_TARG_TABLE is set to true because we don't have the target table existing!

Note: After the first time created, you can set the step of this IKM for creating the target table to false, or you can leave it to true but it will generate a new table each time it is executed.

Run the Interface and expect the table to be created.

  • Now lets try to generate a .csv file from an Oracle table
I will use the same table which I created from the file, but now I want to generate a .csv named: TOFILE.csv.

First I will have to create the data store at the Model which is used for output files (Configured in Topology), but we have to make sure that we write Resource Name .csv and since the file has to be created we won't be looking in the search icon for it, we have to write the name directly and specify the output parameters at the Files tab
Since the file has to generated and we don't have it existing, we have to manually create the columns in the Columns tab using the + sign on top right corner, after specifying Datatype and length make sure to save it before creating the interface which generates the file.


Now that we have the Data Store of Source Table and Target File created, we have to create an Interface and add the table to source tab, file to Target tab.

AFTER adding the source and target, make sure to check "Staging Area Different From Target" this tells ODI that the transformation has to occur in source (table level) because the target is a file. Since ODI relies on DB power to transform data, this step is a must.
Staging area has to be the schema in which the source table resides.


At the Flow tab we have to select only IKM (SQL TO FILE) since there won't be any LKM to generate the file


Run the the interface, and the output file will be generated.



NOTE:

There are some cases when there is a specific date format in the source table, and after generating it to file, excel automatically formats it.

To preserve the original format of the date make sure you use this mapping expression in the column

trim('=' || '"'   || to_char(COLUMN ,'yyyy-mm-dd') || '"')

This will preserve the yyyy-mm-dd date format into target file and excel won't format it.

Sunday, October 27, 2019

Intro

Recently I have decided to keep a record of my ODI 11g use cases, I will be posting regularly with interesting use cases which I have developed during my time as a ODI 11g developer. This is not a blog how to use ODI 11g in depth, but rather a blog that documents unique use cases which every ETL dev might face using ODI 11g.

I have been working for a year and a half with ODI 11g, and I find ODI very fun using. This blog will be used as a personal portfolio and a reference for every 11g developer out there.

I hope this blog will help someone in the future.