Introduction to Informatica

As the Datawarehouses are increasing in the current market it becomes difficult to keep a track of all the data which comes from the sources to the target. So with Informatica Power Center which is a new ETL (Extraction,Transformation and Loading) tool for building a Enterprised wide datawarehouse which is used in extracting data from the source, transforming  the data  as per business requirements and loading the data to a target data warehouse.


Most of the companies nowadays get the data from different sources which consists of huge set of data.

we need to find out how the data is available i.e in which format the data is available, in which structure the file is available and in which databases.

The data must be collected,combined,compared together to look as one single block good to be with similar databases.




If it is a similar database the work will be faster.
If it is different database, the interface has to be upgraded.

In order to overcome the problem of different database, interface a feature called as Data integration comes into picture which allows the data to be communicated with different databases.



when we do the data integration process business problems may come.To solve this Informatica makes use of ETL Process.

ETL (Extract Transform Load).
Functions of ETL system.
1. Extract the data from the source system.
2. Transforming the data and cleansing the data according to the business requirements.
3. Summarizing the data from the source systems.
4. Loading the data to the target database i.e the data-warehouse.
5. Refreshing the data-warehouse with the updated data.

ETL process used in Informatica.
Informatica Power center reads the data from the source systems which may be either a database or file.
It transforms the data meaning that the contents of the data should be understood by the target system.
After Transformation Informatica loads the data into the target database or to a file like how it derived the data from the source system.


Informatica is used in the Real time Applications like:
1. Data Migration.
2. Application Integration.
3. Datawarehousing.
4. Middleware.

Informatica Architecture.


The Informatica Architecture consists of 2 components.
-Client Component.
-Server Component.

Client component:
It consists of RDWM.
1. Repository Manager: It is used to add, edit and delete the users and groups.
2.Designer:  It is used to move the data between various sources and targets. It includes transformation process moving from source to target.
3. Workflow Manager: It consists of ordered set of one or more sessions and other tasks.
4. Workflow Monitor: It is used to monitor the execution of workflows and tasks.

Informatica Administrator console is used to administer the Informatica domain and Informatica security.

Server Components:
Repository service: it is used to manage the repository. It retrieves, inserts, and updates metadata into the repository database tables.

Domain: It is a combination set of nodes and services.
It consists of Integration,Repository Service, Repository Service Process.

Transformation Objects.
Transformation objects are used to read,modify or pass the data derived from the source table to the Target table like tables/files based on some conditions.
For a complete transformation to be considered Mapping is necessary.

Mapping
A mapping is a set of Source tables and target tables along with the transformations.
It is also used for the execution of the Workflows.
Informatica Transformations can be divided into 2 Parts.
1. Based on Connectivity
2. Based on the changes in the number of rows between the source and target tables.

1. Based on Connectivity.
a. Connected Transformations.
b. Unconnected Transformations.

a. Connected Transformations.
are the transformations in which for every input row it is connected to the output row through the transformation.
eg: Aggregator, Router, Joiner, etc

2. Based on the number of rows
a. Active Transformations
b. Passive Transformations

a. Active Transformations.
An Active transformation is one which will satisfy all these conditions:
1. It can add/remove the rows that passes through the transformation.
eg: Filter transformation is active because it removes rows that do not meet the filter condition.
2. Change the transaction boundary: It encloses all the transactions before a commit is called or between two commit calls.
3. Change the rowtype attribute: It can add the Rowtype attribute.
eg : for Update Strategy transformation 0 is used for inserting values, 1 for update, 2 for delete or 3 for reject.
eg: Aggregator, Filter, Joiner, Normalizer, etc. are a few examples of Active transformation.

b. Passive Transformation.
A passive transformation is one which will satisfy all these conditions:
1. It can not add/remove the rows that passes through the transformation unlike active Transformations.
2. Maintains the transaction boundary .
3. Maintains the rowtype attribute.
Expression, ExternalProcedure, HTTP, etc. are a few examples of Passive transformation.

Informatica Concepts and Overview
Source Analyzer
A Source is an ETL entity where the records are fetched and stored in temporary tables or Informatica transformation caches.
After fetching the data based on the mapping design document/business requirement we can transform the data based on thed business requirement
For sourced and target tables we make use of source analyzer and target designer.



The Structure which consists of the Source table are as follows
Source (Table) name
Database location
Name of Columns
Columns Data types
Constraints

From the Source analyser we can import the following files
Relation tables or the database tables, views and synonyms
Flat files
Cobol files
XML files

Target Designer / DatawareHouse Designer.
After transforming the data we load the data into another table structure, called target table.

Reusable Transformations
Reusable transformation is used to avoid duplication of data, reduces the development time and provides centralized change control. Once an object is defined as reusable, it can be called in any mapping or mapplet.
eg: lookups, aggregators, expressions, routers, and sequences can be defined as reusable objects. 

Working with relational sources in informatica.
Importing relational sources
We can add relational source definitions either by importing source definitions / updating source definitions either manually or by reimporting the definition.
To import a source definition, we must connect to the source using ODBC data source or gateway.

Updating a Relational Source Definition.
We can update a source definition with column names, datatypes, or other changes by the following ways:
Editing the definition / Reimport the definition.

Working with Flat files in informatica.
When the source is a flat file.
We select the Source Analyser to import Flat file.
select Flat File Type as Delimited.
Enter file name of the source definition.
Select start import at row 1, since we dont have any header record.
Next select the delimiter as Comma.
Next modify the field names and corresponding datatypes accordingly.
Next in the Properties tab check the option Add Currently Processed Flat File Name Port. When this option is selected, a port specifying currently processed file name gets added to the source definition.


Target Table in informatica
Click on the Target option in Designer and select the Create option, a new window called Create Target table will be displayed.
Enter a name for the table.
Select a database type from the list of available databases.
click on the Create button to create new empty table in our PowerCenter Target Designer.

Working with relational targets in informatica.
when we work with relational targets we must configure the following properties.
Reject truncated and overflow data.
Table name prefix.
Pre-and post-session SQL. 
Target table name 

Working with flat files targets in informatica.
In the session level create a dummy file name for target.
Create a target with file name column property use ("Add File Name column to target instance" in target designer)
In the expression transformation pass the source flat file name to this file name column. It will generate the target file with source file name and your source file is no longer in the repository.

Transformations in Informatica
Transformations in Informatica are divided as Active and Passive Transformations.

Source Qualifier
It  is an active & connected transformation used to connect relational or flat file definition in a mapping.The task in the Source Qualifier are Joins, Filter rows,Sorting input, Distinct rows, Custom SQL Query.


Expression Transformation.
It is passive and unconnected transformation  used to perform row-wise manipulation on an individual record.
The Ports in an expression transformation are Input port (I), Output Port (O), Variable Port (V).

Filter Transformation.
It is an active & connected transformation which is used to filter out rows in a mapping by 
giving the condition in the properties tab.

Joiner transformation.
It is an active & connected transformation which is used to join two or more than two 
heterogeneous sources.
The types of joins are:
a. Normal Join
b. Master Outer Join
c. Details Outer Join
d. Full Outer Join

Joiner Transformations consists of 2 parts Master Join and Detail Join.
The Master join ends at the Transformation level and the Detail join continuous till the 
target.


Sorter transformation 
It is an active & connected transformation which is used to sort the data in an ascending 
or descending order based on single or multiple keys.


Rank Transformation 
It is an active and connected transformation which is used to select the top or bottom rank 
of data. The rank transformation is used to select the smallest or largest numeric/string values.


Router Transformation
It is an active and connected transformation.
It is similar to the filter transformation used to test a condition and filter the data. The unmatched records will be routed to the default group.


Aggregator Transformation.
It is an active and connected transformation used to perform calculations such as sums, 
averages, counts on groups of data. 


Normalizer Transformation.
It is an active and connected transformation which is used to convert a single row into 
multiple rows and vice versa.It is used normally with cobol files.

Lookup Transformation
Lookup transformation is used to look up a source, source qualifier, or target to get the relevant data.It can be connected and unconnected.

a. Connected Lookup.
In Connected lookup we connect input and output ports to other transformation in the mapping.

b. UnConnected Lookups.
It is neither connected to the source nor connected to the target.
An unconnected look up transformation can receive the multi input ports but returns single output port.

Update strategy transformation
It is an active and connected transformation.It is used to insert, update, and delete records in the target table.


Union Transformation
It is used to combine data from multiple sources (excel files, flat file etc) or multiple SQL tables and produce one output to store in the target table. This transformation is an active transformation.It is similar to Union All in SQL.

Slowly Changing Dimensions (SCD)
Slowly changing dimensions are the dimensions in which the data changes slowly, rather than changing regularly on a time basis.

SCD are of 3 types.
SCD Type 1
It is used when there is no need to store historical data in the dimension table.
SCD Type 2
It is used to store the current data as well as historical data in the dimension table.
SCD Type 3
It is used to store the current data and one last previous data in the dimension table.

Workflow Manger.
Workflows, Worklets & Sessions
A workflow is a set of instruction sthat tell the Informatica server how to execute the tasks. 
A worklet is an object that represents a set of tasks. 
A session is a set of instructions to move data from sources to targets.

Types of Tasks.
1. Assignment =It is used to assign a value to a workflow variable.
2. Command =It is used to run a shell command during the workflow.
3. Control =It is used to stop or abort the workflow.
4. Decision = It tells a condition to evaluate.
5. Email = It is used to send email during the workflow.
6. Event-Raise = It notifies the Event-Wait task that an event has occurred.
7. Event-Wait = It waits for the event to completed in order to start the next task.
8. Timer = It waits for a already timed event to start.

Comments

Popular posts from this blog