1. Give a brief introduction to Informatica.
Answer: Informatica Corporation is a software company founded in 1993 by Gaurav Dhillon and Diaz Nesamoney. Informatica Corporation is well known for its data integration (ETL) product PowerCenter. A lot of times when people say Informatica they actually mean Informatica PowerCenter!
Over the last 10 years, Informatica has introduced a series of products, all related to data integration and warehousing. Here is a quick list of Informatica products:
•PowerCenter
•B2B Data Exchange
•MDM – Master Data Management
•TDM/ILM – Test Data Management/ Information Lifecycle Management/ Data Archive
•Informatica Data Masking/ Dynamic Data Masking
•iPaaS – Integration Platform as a Service
•Informatica Data Virtualization
Now that you have an idea about the company and its products, let's talk about PowerCenter. Informatica PowerCenter is a tool that supports all the steps of the Extraction, Transformation and Load process. Many product offerings are built around PowerCenter's ability to connect to different technologies, ranging from mainframe to CRM to Big Data.
Informatica PowerCenter is an easy-to-use tool. It has a simple visual interface, similar to forms in Visual Basic. You drag and drop different objects (known as transformations) and design a process flow for data extraction, transformation and load. These process flow diagrams are known as mappings. Once a mapping is developed, it can be scheduled to run as and when required. In the background, the Informatica server takes care of fetching data from the source, transforming it, and loading it into the target systems or databases.
PowerCenter can communicate with all major data sources (mainframe, Big Data, RDBMS, flat files, XML, SAP, Salesforce, and the list goes on) and can move and transform data between them. It can move huge volumes of data very effectively, often better than bespoke programs written for a specific data movement. It can throttle transactions (performing big updates in small chunks to avoid long locks and a full transaction log). It can join data from two distinct data sources (even an XML file can be joined with a relational table). In all, Informatica can integrate heterogeneous data sources and convert raw data into useful information.
Some facts and figures about Informatica Corporation:
•Founded in 1993, based in Redwood City, California
•5500 + Employees; 5000 + Customers
•NASDAQ Stock Symbol: INFA; Stock Price: $48.73 (11/06/2015)
•Revenues in fiscal year 2014: $1.05 Billion
2. Types of partitioning in Informatica?
Answer: There are five partition types:
1. Pass-through
2. Key range
3. Hash (auto-keys or user keys)
4. Round robin
5. Database partitioning
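To make the difference between three of these partition types concrete, here is a minimal Python sketch. It is not Informatica code: the function names, the use of a CRC32 hash and the row format (dicts) are assumptions made only for illustration.

```python
from zlib import crc32

def round_robin(rows, n):
    """Distribute rows evenly across n partitions, ignoring their content."""
    parts = [[] for _ in range(n)]
    for i, row in enumerate(rows):
        parts[i % n].append(row)
    return parts

def hash_partition(rows, n, key):
    """Send rows that share a key value to the same partition."""
    parts = [[] for _ in range(n)]
    for row in rows:
        # crc32 is a stand-in hash; the real engine's hash is not specified here.
        idx = crc32(str(row[key]).encode()) % n
        parts[idx].append(row)
    return parts

def key_range(rows, bounds, key):
    """Route rows by key range; bounds is a list of (start, end), end exclusive."""
    parts = [[] for _ in bounds]
    for row in rows:
        for i, (start, end) in enumerate(bounds):
            if start <= row[key] < end:
                parts[i].append(row)
                break
    return parts
```

Round robin balances load but scatters related rows, while hash and key range keep rows with the same key together, which matters for grouping transformations.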
3. Which transformations use a cache?
Answer:
1. Lookup transformation
2. Aggregator transformation
3. Rank transformation
4. Sorter transformation
5. Joiner transformation
4. Explain about union transformation?
Answer: A Union transformation is a multiple input group transformation used to merge data from multiple sources, much as a UNION ALL statement in SQL combines the results of two or more SELECT statements.
Like UNION ALL, the Union transformation does not remove duplicate rows. It is an active transformation.
5. Explain about Joiner transformation?
Answer: A Joiner transformation is used to join source data from two related heterogeneous sources. It can also be used to join data from the same source. The Joiner transformation joins sources with at least one matching column. It uses a condition that matches one or more pairs of columns between the two sources.
To configure a Joiner transformation, we set the following:
1) Master and detail source
2) Types of join
3) Condition of the join
6. Explain about Lookup transformation?
Answer: A Lookup transformation is used in a mapping to look up data in a relational table, flat file, view or synonym.
The Informatica server queries the lookup source based on the lookup ports in the transformation. It compares Lookup transformation port values to lookup source column values based on the lookup condition.
A Lookup transformation is used to perform the following tasks:
1) To get a related value.
2) To perform a calculation.
3) To update SCD tables.
7. How do we identify whether a row is an insert or an update in a dynamic lookup cache?
Answer: The Informatica server uses the NewLookupRow port to indicate whether a row is an insert or an update:
NewLookupRow = 0: no change
NewLookupRow = 1: insert
NewLookupRow = 2: update
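The three flag values can be illustrated with a small Python sketch of a dynamic cache. The cache is modelled as a plain dict and the function name is hypothetical. The real behaviour also depends on session settings such as Insert Else Update, which this sketch assumes are enabled.

```python
NO_CHANGE, INSERT, UPDATE = 0, 1, 2

def new_lookup_row(cache, key, values):
    """Return the NewLookupRow flag for a row and apply the change to the cache."""
    if key not in cache:
        cache[key] = values      # row is new: insert it into the cache
        return INSERT
    if cache[key] != values:
        cache[key] = values      # row exists but differs: update the cache
        return UPDATE
    return NO_CHANGE             # row exists and is identical
```

A downstream router or update strategy would then branch on the returned flag.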
8. How many ways can we implement SCD2?
Answer :
1) Date range
2) Flag
3) Versioning
9. How will you check for bottlenecks in Informatica? Where do you start checking?
Answer: Check in this order:
1. Target
2. Source
3. Mapping
4. Session
5. System
10. What is incremental aggregation?
Answer: When the Aggregator transformation runs, its output values are stored in a temporary location called the aggregator cache. The next time the mapping runs, the Aggregator transformation processes only the new records loaded since the previous run, and their values are used to increment the values held in the aggregator cache. This is called incremental aggregation, and it improves performance.
In other words, incremental aggregation means applying only the captured changes in the source to the aggregate calculations in a session.
When the source changes only incrementally and we can capture those changes, we can configure the session to process only those changes. This allows the Informatica server to update the target table incrementally, rather than forcing it to process the entire source and repeat the same calculations each time the session runs. This improves session performance.
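The idea can be sketched in a few lines of Python. This is only an illustration of the principle for a SUM aggregate: the dict stands in for the aggregator cache that persists between runs, and the function name is hypothetical.

```python
def incremental_sum(cache, new_rows):
    """Fold only the new rows into the totals saved from earlier runs."""
    for group, amount in new_rows:
        cache[group] = cache.get(group, 0) + amount
    return cache

# First run processes the initial load; the second run sees only the new rows.
cache = incremental_sum({}, [("A", 10), ("B", 5)])
cache = incremental_sum(cache, [("A", 2)])
```

After the second run the total for group A reflects both loads, even though the first load was never read again.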
11. What types of metadata are stored in the repository?
Answer: Source definitions. Definitions of database objects (tables, views, synonyms) or files that provide source data.
Target definitions. Definitions of database objects or files that contain the target data.
Multi-dimensional metadata. Target definitions that are configured as cubes and dimensions.
Mappings. A set of source and target definitions along with transformations containing business logic that you build into the transformation. These are the instructions that the Informatica Server uses to transform and move data.
Reusable transformations. Transformations that you can use in multiple mappings.
Mapplets. A set of transformations that you can use in multiple mappings.
Sessions and workflows. Sessions and workflows store information about how and when the Informatica Server moves data. A workflow is a set of instructions that describes how and when to run tasks related to extracting, transforming, and loading data. A session is a type of task that you can put in a workflow. Each session corresponds to a single mapping.
The following types of metadata are stored in the repository:
- Database Connections
- Global Objects
- Multidimensional Metadata
- Reusable Transformations
- Short cuts
- Transformations
12. How can we store previous session logs?
Answer: Go to Session Properties –> Config Object –> Log Options.
Select the properties as follows:
Save session log by –> Session runs
Save session log for these runs –> Set the number of log files you want to keep (the default is 0)
If you want to save the log files created by every run, select Save session log by –> Session timestamp instead.
You can find these properties in the session/workflow properties.
13. What is Changed Data Capture?
Answer: Changed Data Capture (CDC) helps identify the data in the source system that has changed since the last extraction. With CDC, data extraction takes place at the same time as the insert, update or delete operations occur in the source tables, and the change data is stored inside the database in change tables. The captured change data is then made available to the target systems in a controlled manner.
14. What is an indicator file, and how can it be used?
Answer: An indicator file is used for event-based scheduling when you don't know when the source data will be available. A shell command, script or batch file creates the indicator file and sends it to a directory local to the Informatica server. The server waits for the indicator file to appear before running the session.
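The waiting behaviour can be sketched as a simple polling loop in Python. This only illustrates the pattern and is not how the Informatica server implements it. The function name, polling interval and timeout are assumptions.

```python
import os
import time

def wait_for_indicator(path, poll_seconds=1.0, timeout_seconds=60.0):
    """Return True once the indicator file exists, or False if the timeout expires."""
    deadline = time.monotonic() + timeout_seconds
    while time.monotonic() < deadline:
        if os.path.exists(path):
            return True
        time.sleep(poll_seconds)
    # One last check in case the file arrived during the final sleep.
    return os.path.exists(path)
```

A caller would start the load only when the function returns True, and raise an alert otherwise.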
15. What is an audit table, and what columns does it contain?
Answer: An audit table holds the workflow names and session names, along with the status and run details of each workflow and session. Typical columns are:
- WKFL_RUN_ID
- WKFL_NME
- START_TMST
- END_TMST
- ROW_INSERT_CNT
- ROW_UPDATE_CNT
- ROW_DELETE_CNT
- ROW_REJECT_CNT
16. If a session fails after loading 10,000 records into the target, how can we resume loading from the 10,001st record the next time we run the session?
Answer: Set the Recovery Strategy in the session properties to "Resume from last checkpoint". Note: set this property before running the session.
17. Informatica Reject File – How to identify rejection reason?
Answer: Each value in the reject file is followed by a column indicator that explains its status:
D – Valid data or Good Data: Writer passes it to the target database. The target accepts it unless a database error occurs, such as finding a duplicate key while inserting.
O – Overflowed Numeric Data: Numeric data exceeded the specified precision or scale for the column. Bad data, if you configured the mapping target to reject overflow or truncated data.
N – Null Value: The column contains a null value. Good data. Writer passes it to the target, which rejects it if the target database does not accept null values.
T – Truncated String Data: String data exceeded a specified precision for the column, so the Integration Service truncated it. Bad data, if you configured the mapping target to reject overflow or truncated data.
Note that the first column of each row holds the row indicator, and the second column holds the column indicator 'D', which signifies that the row indicator is valid.
Here is how a row in a bad file looks:
0,D,7,D,John,D,5000.375,O,,N,BrickLand Road Singapore,T
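As a rough aid for reading such rows, the Python sketch below splits a reject-file line into (value, status) pairs. The column indicators come from the list above. The row indicator meanings (0 insert, 1 update, 2 delete, 3 reject) are not stated in this answer and are included as an assumption, as is the function name.

```python
import csv

# Assumed row indicator meanings; not given in the answer above.
ROW_INDICATORS = {"0": "insert", "1": "update", "2": "delete", "3": "reject"}
COLUMN_INDICATORS = {"D": "valid", "O": "overflow", "N": "null", "T": "truncated"}

def parse_reject_line(line):
    """Return (row_type, [(value, status), ...]) for one reject-file line."""
    fields = next(csv.reader([line]))
    row_type = ROW_INDICATORS.get(fields[0], "unknown")
    # fields[1] is the indicator for the row indicator itself; data starts at index 2.
    data = fields[2:]
    columns = [
        (data[i], COLUMN_INDICATORS.get(data[i + 1], "unknown"))
        for i in range(0, len(data) - 1, 2)
    ]
    return row_type, columns

row_type, columns = parse_reject_line(
    "0,D,7,D,John,D,5000.375,O,,N,BrickLand Road Singapore,T"
)
```

For the sample row, the salary value is flagged as overflow, the empty field as null, and the address as truncated, which points to the rejection reasons.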
18. What is “Insert Else Update” and “Update Else Insert”?
Answer :These options are used when dynamic cache is enabled.
- Insert Else Update option applies to rows entering the lookup transformation with the row type of insert. When this option is enabled the integration service inserts new rows in the cache and updates existing rows. When disabled, the Integration Service does not update existing rows.
- Update Else Insert option applies to rows entering the lookup transformation with the row type of update. When this option is enabled, the Integration Service updates existing rows, and inserts a new row if it is new. When disabled, the Integration Service does not insert new rows.
19. What are the Different methods of loading Dimension tables?
Answer :
Conventional Load – Before loading the data, all the Table constraints will be checked against the data.
Direct load (Faster Loading) – All the Constraints will be disabled. Data will be loaded directly. Later the data will be checked against the table constraints and the bad data won’t be indexed.
20. What are the different types of Commit intervals?
Answer :The different commit intervals are:
- Source-based commit: The Informatica Server commits data based on the number of source rows. The commit point is the commit interval you configure in the session properties.
- Target-based commit: The Informatica Server commits data based on the number of target rows and the key constraints on the target table. The commit point also depends on the buffer block size and the commit interval.
21. How to add source flat file header into target file?
Answer : Edit Task–>Mapping–>Target–>Header Options–> Output field names
22. How to load name of the file into relation target?
Answer : Source Definition–>Properties–>Add currently processed file name port
23. How do we return multiple columns through an unconnected lookup?
Answer: Suppose your lookup table has f_name, m_name and l_name, and you are using an unconnected lookup. In the lookup SQL override, return f_name || '~' || m_name || '~' || l_name as a single column, which the unconnected lookup can return to an expression. Then use a substring function in an Expression transformation to separate the three values and make each an individual port for the downstream transformation or target.
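The concatenate-then-split trick can be illustrated outside Informatica with a short Python sketch. The function names are hypothetical, and the sketch assumes the delimiter never occurs inside the name values.

```python
DELIM = "~"

def lookup_return(f_name, m_name, l_name):
    """Mimic the SQL override: pack three columns into one return value."""
    return DELIM.join([f_name, m_name, l_name])

def split_lookup(value):
    """Mimic the Expression transformation: unpack the value into three ports."""
    f_name, m_name, l_name = value.split(DELIM)
    return f_name, m_name, l_name
```

If a name could contain the delimiter, a different separator would have to be chosen, since the split would otherwise produce too many parts.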
24. What is a Joiner Transformation and why it is an Active one?
Answer: A Joiner is an Active and Connected transformation used to join source data from the same source system or from two related heterogeneous sources residing in different locations or file systems.
The Joiner transformation joins sources with at least one matching column. The Joiner transformation uses a condition that matches one or more pairs of columns between the two sources.
The two input pipelines include a master pipeline and a detail pipeline or a master and a detail branch. The master pipeline ends at the Joiner transformation, while the detail pipeline continues to the target.
In the Joiner transformation, we must configure the transformation properties namely Join Condition, Join Type and Sorted Input option to improve Integration Service performance.
The join condition contains ports from both input sources that must match for the Integration Service to join two rows. Depending on the type of join selected, the Integration Service either adds the row to the result set or discards the row.
The Joiner transformation produces result sets based on the join type, condition, and input data sources. Hence it is an Active transformation.
25. State the limitations where we cannot use Joiner in the mapping pipeline.
Answer: The Joiner transformation accepts input from most transformations. However, following are the limitations:
• A Joiner transformation cannot be used when either input pipeline contains an Update Strategy transformation.
• Joiner transformation cannot be used if we connect a Sequence Generator transformation directly before the Joiner transformation.
26. Out of the two input pipelines of a joiner, which one will you set as the master pipeline?
Answer: During a session run, the Integration Service compares each row of the master source against the detail source. The master and detail sources need to be configured for optimal performance.
To improve performance for an Unsorted Joiner transformation, use the source with fewer rows as the master source. The fewer unique rows in the master, the fewer iterations of the join comparison occur, which speeds the join process.
When the Integration Service processes an unsorted Joiner transformation, it reads all master rows before it reads the detail rows. The Integration Service blocks the detail source while it caches rows from the master source. Once the Integration Service reads and caches all master rows, it unblocks the detail source and reads the detail rows.
To improve performance for a Sorted Joiner transformation, use the source with fewer duplicate key values as the master source.
When the Integration Service processes a sorted Joiner transformation, it blocks data based on the mapping configuration and it stores fewer rows in the cache, increasing performance.
Blocking logic is possible if master and detail input to the Joiner transformation originate from different sources. Otherwise, it does not use blocking logic. Instead, it stores more rows in the cache.
27. What are the different types of Joins available in Joiner Transformation?
Answer: In SQL, a join is a relational operator that combines data from multiple tables into a single result set. The Joiner transformation is similar to an SQL join except that data can originate from different types of sources.
The Joiner transformation supports the following types of joins:
• Normal
• Master Outer
• Detail Outer
• Full Outer
28. Define the various Join Types of Joiner Transformation.
Answer:
• In a normal join, the Integration Service discards all rows of data from the master and detail source that do not match, based on the join condition.
• A master outer join keeps all rows of data from the detail source and the matching rows from the master source. It discards the unmatched rows from the master source.
• A detail outer join keeps all rows of data from the master source and the matching rows from the detail source. It discards the unmatched rows from the detail source.
• A full outer join keeps all rows of data from both the master and detail sources.
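The four join types can be compared with a minimal Python sketch that follows the definitions above. Rows are modelled as dicts and the function name is hypothetical. Unmatched rows are returned without the other side's columns rather than with explicit nulls.

```python
def joiner(master, detail, key, join_type="normal"):
    """Join two lists of dicts on `key`.

    join_type is one of: normal, master_outer, detail_outer, full_outer.
    """
    result = []
    matched_master = set()
    for d in detail:
        # Null keys never match, mirroring the Joiner's treatment of NULLs.
        hits = [
            i for i, m in enumerate(master)
            if d[key] is not None and m[key] == d[key]
        ]
        for i in hits:
            matched_master.add(i)
            result.append({**master[i], **d})
        # Master outer and full outer keep unmatched detail rows.
        if not hits and join_type in ("master_outer", "full_outer"):
            result.append(dict(d))
    # Detail outer and full outer keep unmatched master rows.
    if join_type in ("detail_outer", "full_outer"):
        for i, m in enumerate(master):
            if i not in matched_master:
                result.append(dict(m))
    return result
```

Note the naming: a master outer join keeps every detail row, and a detail outer join keeps every master row.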
29. Describe the impact of number of join conditions and join order in a Joiner Transformation.
Answer: We can define one or more conditions based on equality between the specified master and detail sources. Both ports in a condition must have the same datatype.
If we need to use two ports in the join condition with non-matching datatypes we must convert the datatypes so that they match. The Designer validates datatypes in a join condition.
Additional ports in the join condition increase the time necessary to join two sources.
The order of the ports in the join condition can impact the performance of the Joiner transformation. If we use multiple ports in the join condition, the Integration Service compares the ports in the order we specified.
NOTE: Only the equality operator is available in the Joiner join condition.
30. How does Joiner transformation treat NULL value matching ?
Answer: The Joiner transformation does not match null values. For example, if both EMP_ID1 and EMP_ID2 contain a row with a null value, the Integration Service does not consider them a match and does not join the two rows.
To join rows with null values, replace null input with default values in the Ports tab of the joiner, and then join on the default values.
Note: If a result set includes fields that do not contain data in either of the sources, the Joiner transformation populates the empty fields with null values. If we know that a field will return a NULL and we do not want to insert NULLs in the target, set a default value on the Ports tab for the corresponding port.
31. Suppose we configure Sorter transformations in the master and detail pipelines with the following sorted ports in order: ITEM_NO, ITEM_NAME, PRICE. When we configure the join condition, what are the guidelines we need to follow to maintain the sort order?
Answer: If we have sorted both the master and detail pipelines on the ports ITEM_NO, ITEM_NAME and PRICE, in that order, we must follow these guidelines:
- Use ITEM_NO in the First Join Condition.
- If we add a Second Join Condition, we must use ITEM_NAME.
- If we want to use PRICE as a Join Condition apart from ITEM_NO, we must also use ITEM_NAME in the Second Join Condition.
- If we skip ITEM_NAME and join on ITEM_NO and PRICE, we will lose the input sort order and the Integration Service fails the session.
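These guidelines amount to a prefix rule: the join ports must be a leading prefix of the sorted ports. A small Python check, with a hypothetical function name, makes the rule explicit.

```python
def join_keeps_sort_order(sorted_ports, join_ports):
    """True when the join ports are a non-empty leading prefix of the sorted ports."""
    join_ports = list(join_ports)
    return bool(join_ports) and join_ports == list(sorted_ports[:len(join_ports)])

SORTED = ["ITEM_NO", "ITEM_NAME", "PRICE"]
ok = join_keeps_sort_order(SORTED, ["ITEM_NO", "ITEM_NAME"])
skipped = join_keeps_sort_order(SORTED, ["ITEM_NO", "PRICE"])
```

Here `ok` is True and `skipped` is False, because skipping ITEM_NAME breaks the prefix.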
32. Which transformations cannot be placed between the sort origin and the Joiner transformation if we want to keep the input sort order?
Answer: The best option is to place the Joiner transformation directly after the sort origin to maintain sorted data. However do not place any of the following transformations between the sort origin and the Joiner transformation:
- Custom
- Unsorted Aggregator
- Normalizer
- Rank
- Union transformation
- XML Parser transformation
- XML Generator transformation
- Mapplet [if it contains any one of the above mentioned transformations]
33. Suppose we have the EMP table as our source. In the target we want to view those employees whose salary is greater than or equal to the average salary for their departments. Describe your mapping approach.
Answer: To build the mapping we need the following transformations. After the Source Qualifier of the EMP table, place a Sorter transformation and sort on the DEPTNO port.
Next we place a Sorted Aggregator Transformation. Here we will find out the AVERAGE SALARY for each (GROUP BY) DEPTNO.
When we perform this aggregation, we lose the data for individual employees.
To maintain employee data, we must pass a branch of the pipeline to the Aggregator Transformation and pass a branch with the same sorted source data to the Joiner transformation to maintain the original data.
When we join both branches of the pipeline, we join the aggregated data with the original data.
After that we need a Filter Transformation to filter out the employees having salary less than average salary for their department.
Filter Condition: SAL>=AVG_SAL
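The same logic (aggregate per department, join the average back to each employee row, then filter) can be sketched in Python. The column names follow the EMP example, and the function name is hypothetical.

```python
from collections import defaultdict

def at_or_above_dept_average(employees):
    """Return employees whose SAL is >= the average SAL of their DEPTNO."""
    # Aggregator branch: running [sum, count] of salary per department.
    totals = defaultdict(lambda: [0.0, 0])
    for e in employees:
        totals[e["DEPTNO"]][0] += e["SAL"]
        totals[e["DEPTNO"]][1] += 1
    avg_sal = {dept: s / n for dept, (s, n) in totals.items()}
    # Joiner + Filter: compare each original row with its department average.
    return [e for e in employees if e["SAL"] >= avg_sal[e["DEPTNO"]]]
```

The two passes over the data correspond to the two pipeline branches: one feeds the aggregation and the other preserves the original rows.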
34. How an Expression Transformation differs from Aggregator Transformation?
Answer: An Expression Transformation performs calculation on a row-by-row basis.
An Aggregator Transformation performs calculations on groups.
35. Does an Informatica Aggregator Transformation support only aggregate expressions?
Answer: Apart from aggregate expressions Informatica Aggregator also supports nonaggregate expressions and conditional clauses.
36. How does Aggregator Transformation handle NULL values?
Answer: By default, the aggregator transformation treats null values as NULL in aggregate functions. But we can specify to treat null values in aggregate functions as NULL or zero.
37. What is Incremental Aggregation?
Answer: We can enable the session option, Incremental Aggregation for a session that includes an Aggregator Transformation. When the Integration Service performs incremental aggregation, it actually passes changed source data through the mapping and uses the historical cache data to perform aggregate calculations incrementally.
38. What are the performance considerations when working with Aggregator Transformation?
Answer:
• Filter the unnecessary data before aggregating it. Place a Filter transformation in the mapping before the Aggregator transformation to reduce unnecessary aggregation.
• Improve performance by connecting only the necessary input/output ports to subsequent transformations, thereby reducing the size of the data cache.
• Use Sorted input which reduces the amount of data cached and improves session performance.
39. What differs when we choose Sorted Input for Aggregator Transformation?
Answer: The Integration Service creates the index and data cache files in memory to process the Aggregator transformation. If the Integration Service requires more space than is allocated for the index and data cache sizes in the transformation properties, it stores overflow values in cache files, i.e. it pages to disk. One way to increase session performance is to increase the index and data cache sizes in the transformation properties. When we check Sorted Input, however, the Integration Service uses memory to process the Aggregator transformation and does not use cache files.
40. Under what conditions selecting Sorted Input in aggregator will still not boost session performance?
Answer:
• Incremental Aggregation, session option is enabled.
• The aggregate expression contains nested aggregate functions.
• Source data is data driven.
41. Under what condition selecting Sorted Input in aggregator may fail the session?
Answer:
• If the input data is not sorted correctly, the session will fail.
• Even if the input data is properly sorted, the session may fail if the sort order ports and the group by ports of the Aggregator are not in the same order.
42. Suppose we do not group by on any ports of the aggregator what will be the output.
Answer: If we do not group values, the Integration Service will return only the last row for the input rows.
43. What is the expected value if the column in an aggregator transform is neither a group by nor an aggregate expression?
Answer: The Integration Service produces one row for each group based on the group by ports. Columns that are neither group by ports nor aggregate expressions return the corresponding value of the last record received for the group. However, if we explicitly use the FIRST function, the Integration Service returns the value of the first row of the group. So the default behaviour is that of the LAST function.
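A short Python sketch shows the "last row wins" behaviour for a port that is neither grouped nor aggregated. The function and parameter names are hypothetical, and only a SUM aggregate is modelled.

```python
def aggregate(rows, group_by, sum_port, passthrough_port):
    """One output row per group: the sum, plus the pass-through value
    taken from the last row received for that group."""
    out = {}
    for row in rows:
        g = row[group_by]
        running = out[g][sum_port] if g in out else 0
        out[g] = {
            group_by: g,
            sum_port: running + row[sum_port],
            # Overwritten on every row, so the last value survives.
            passthrough_port: row[passthrough_port],
        }
    return list(out.values())
```

For two rows in department 10 with names A then B, the single output row carries B.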
44. Give one example for each of Conditional Aggregation, Non-Aggregate expression and Nested Aggregation.
Answer: Use conditional clauses in the aggregate expression to reduce the number of rows used in the aggregation. The conditional clause can be any clause that evaluates to TRUE or FALSE.
SUM( SALARY, JOB = 'CLERK' )
Use non-aggregate expressions in group by ports to modify or replace groups:
IIF( PRODUCT = 'Brown Bread', 'Bread', PRODUCT )
The expression can also include one aggregate function within another aggregate function, such as:
MAX( COUNT( PRODUCT ) )
45. What is the difference between STOP and ABORT?
Answer: When we issue the STOP command on an executing session task, the Integration Service stops reading data from the source. It continues processing, writing and committing the data to the targets. If the Integration Service cannot finish processing and committing data, we can issue the ABORT command.
In contrast, the ABORT command has a timeout period of 60 seconds. If the Integration Service cannot finish processing and committing data within the timeout period, it kills the DTM process and terminates the session.
46. Can we copy a session to new folder or new repository?
Answer: Yes we can copy session to new folder or repository provided the corresponding Mapping is already in there.
47. What type of join does Lookup support?
Answer: A Lookup behaves like a SQL LEFT OUTER JOIN.
48. How Union Transformation is used?
Answer: It is a multiple input group transformation that can be used to combine data from different sources. It works like the UNION ALL statement in SQL, which combines the result sets of two SELECT statements.
49. What do you mean by Incremental Aggregation?
Answer: Incremental aggregation is a session option that can be enabled for a session whose mapping contains an Aggregator transformation. When it is enabled, PowerCenter passes only the changed source data through the mapping and uses the historical cache data to perform the new aggregate calculations incrementally.
50. What is the difference between a connected look up and unconnected look up?
Answer: A connected lookup takes its inputs directly from other transformations in the pipeline. An unconnected lookup does not take inputs directly from other transformations; instead, it can be called as a function from any transformation using a :LKP expression. An unconnected lookup can therefore be called multiple times in a mapping.
51. What is a mapplet?
Answer: A mapplet is a reusable object created in the Mapplet Designer. It contains a set of transformations and lets you reuse that transformation logic in multiple mappings.
52. Briefly define reusable transformation?
Answer: A reusable transformation can be used many times across mappings. Unlike a transformation defined inside a single mapping, it is stored as metadata separately from the mappings that use it. Whenever the reusable transformation is changed, the mappings that use it are invalidated.
53. What does update strategy mean, and what are the different option of it?
Answer: Informatica processes data row by row. By default, every row is marked for insert into the target table. An Update Strategy transformation is used when rows have to be updated or inserted based on some condition. The condition must be specified in the Update Strategy expression so that each processed row is marked as an update or an insert.
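The flagging logic can be pictured with a small Python sketch. The constants mirror the DD_INSERT, DD_UPDATE, DD_DELETE and DD_REJECT flags and their numeric values, which this answer does not list and which are included here as an assumption. The function name and the "id" key are hypothetical.

```python
# Assumed flag names and values; not listed in the answer above.
DD_INSERT, DD_UPDATE, DD_DELETE, DD_REJECT = 0, 1, 2, 3

def flag_row(row, existing_keys):
    """Reject rows without a key, update known keys, insert everything else."""
    if row.get("id") is None:
        return DD_REJECT
    return DD_UPDATE if row["id"] in existing_keys else DD_INSERT
```

In a real mapping the equivalent condition would live in the Update Strategy expression, typically fed by a lookup that tells whether the key already exists in the target.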
54. What is the scenario which compels informatica server to reject files?
Answer: This happens when the server encounters DD_REJECT in an Update Strategy transformation. Rows are also rejected when they violate a database constraint or when a field in the row was truncated.
55. What is surrogate key?
Answer: A surrogate key is a replacement for the natural primary key. It is a unique identifier for each row in the table. It is beneficial because the natural primary key can change, which makes updates more difficult. Surrogate keys are always numeric, typically integers.
56. What are the prerequisite tasks to achieve the session partition?
Answer: To partition a session, one needs to configure the session to partition the source data and run the Informatica server on a machine with multiple CPUs.
57. Which files are created during session runs by the Informatica server?
Answer: During session runs, the files created are the error log, the bad file, the workflow log and the session log.
58. Briefly define a session task?
Answer: It is a set of instructions that guides the PowerCenter server on how and when to transfer data from sources to targets.
59. What does command task mean?
Answer: A Command task lets one or more shell commands (on UNIX) or DOS commands (on Windows) run during the workflow.
60. What is standalone command task?
Answer :This task can be used anywhere in the workflow to run the shell commands.
61. What is meant by pre and post session shell command?
Answer: A Command task can be called as the pre-session or post-session shell command for a Session task. One can run it as a pre-session command, a post-session success command or a post-session failure command.
62. What is a predefined event?
Answer :It is a file-watch event. It waits for a specific file to arrive at a specific location.
63. How can you define a user-defined event?
Answer: A user-defined event can be described as a flow of tasks in the workflow. Events can be created and then raised as the need arises.
64. What is a work flow?
Answer: A workflow is a set of instructions that tells the server how to execute tasks.
65. What are the different tools in workflow manager?
Answer: The Workflow Manager contains the following tools:
- Task Developer
- Worklet Designer
- Workflow Designer
66. Name any tools for scheduling other than the Workflow Manager and pmcmd.
Answer: Scheduling can also be done with a third-party tool such as Control-M.
67. What is OLAP (On-Line Analytical Processing)?
Answer: OLAP is a method for performing multi-dimensional analysis.
68. What are the different types of OLAP? Give an example?
Answer: ROLAP (e.g. BO), MOLAP (e.g. Cognos), HOLAP and DOLAP.
69. What do you mean by worklet?
Answer: A worklet is a set of workflow tasks grouped together. Workflow tasks include timer, decision, command, event wait, mail, session, link, assignment, control, etc.
70. What is the use of target designer?
Answer :Target Definition is created with the help of target designer.
71. Where can we find the throughput option in informatica?
Answer: The throughput option can be found in the Workflow Monitor. Right-click on the session, click on Get Run Properties, and look under Source/Target Statistics.
72. What is target load order?
Answer: Target load order is specified on the basis of the source qualifiers in a mapping. If multiple source qualifiers are linked to different targets, one can specify the order in which the Informatica server loads data into the targets.
73. What is the benefit of partitioning a session?
Answer: Partitioning a session divides it into independent execution sequences (partitions) within the session. Its main purpose is to improve the server's performance and efficiency. Extraction, transformation and output for the individual partitions are carried out in parallel.
74. How are indexes created after completing the load process?
Answer: To create indexes after the load process, Command tasks at the session level can be used. Index creation scripts can be included in the session's workflow or in the post-session commands. This type of index creation cannot be controlled at the transformation level after the load process.
75. Explain sessions. Explain how batches are used to combine executions?
Answer :A teaching set that needs to be implemented to convert data from a source to a target is called a session. Session can be carried out using the session’s manager or pmcmd command. Batch execution can be used to combine sessions executions either in serial manner or in a parallel. Batches can have different sessions carrying forward in a parallel or serial manner.
76. How many sessions can one group in a batch?
Answer: Any number of sessions can be grouped, but migration is easier if a batch contains fewer sessions.
77. Explain the difference between a mapping parameter and a mapping variable.
Answer: A mapping variable is a value that can change during the session's execution. On completion, the Informatica server stores the final value of the variable and reuses it when the session restarts. A mapping parameter is a value that does not change during the session's execution. The mapping defines its parameters and how they are used, and values are assigned to them before the session starts.
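The difference can be sketched in plain Python. This is only an analogy, not PowerCenter code: `run_session`, `parameter_region`, and the `repository` dictionary are hypothetical names, with the dictionary standing in for the place where the server keeps a variable's final value between runs.

```python
# Plain-Python analogy of mapping parameter vs. mapping variable.
# All names here are hypothetical; nothing below is an Informatica API.

def run_session(rows, parameter_region, saved_state):
    """Simulate one session run over rows sorted by ascending id.

    parameter_region: fixed for the whole run (like a mapping parameter).
    saved_state: dict that keeps the variable's final value between runs.
    """
    max_id = saved_state.get("max_id", 0)  # variable starts from the saved value
    loaded = []
    for row in rows:
        if row["region"] == parameter_region and row["id"] > max_id:
            loaded.append(row)
            max_id = row["id"]  # the variable changes during the run
    saved_state["max_id"] = max_id  # final value is persisted for the next run
    return loaded


repository = {}
rows = [
    {"id": 1, "region": "EU"},
    {"id": 2, "region": "US"},
    {"id": 3, "region": "EU"},
]
first_run = run_session(rows, "EU", repository)
second_run = run_session(rows + [{"id": 4, "region": "EU"}], "EU", repository)
```

In this sketch the second run loads only the row with id 4, because the variable's saved value (3) carries over, while the parameter value "EU" stays the same throughout each run.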
78. What is a complex mapping?
Answer: A complex mapping has the following features:
- Difficult requirements
- A large number of transformations
- Complex business logic
79. How can one identify whether a mapping is correct without connecting a session?
Answer: The debugging option (the Debugger) can be used to check whether a mapping is correct without connecting a session.
80. Can one use mapping parameters or variables created in one mapping in any other reusable transformation?
Answer: Yes, because a reusable transformation does not contain any mapplet or mapping.
81. Explain the use of the aggregator cache file.
Answer: The Aggregator transformation is processed in chunks during each run. It stores intermediate values in local buffer memory. If extra memory is required, the Aggregator creates additional cache files to store the transformation values.
82. Briefly describe the Lookup transformation.
Answer: Lookup transformations are transformations that have access to an RDBMS-based data set. The server makes access faster by using lookup tables to look up explicit table data or the database. The final data is obtained by matching the lookup condition for all lookup ports delivered during the transformation.
83. What does role playing dimension mean?
Answer: A role-playing dimension is a dimension that is used in several different roles while remaining in the same database domain. For example, a single date dimension can serve as both order date and ship date.
84. How can repository reports be accessed without SQL or other transformations?
Answer: Repository reports are generated by the Metadata Reporter. Because it is a web application, no SQL or other transformation is needed.
85. What types of metadata are stored in the repository?
Answer: The types of metadata include source definitions, target definitions, mappings, mapplets, and transformations.
86. Explain code page compatibility.
Answer: When data moves from one code page to another and both code pages have the same character set, no data loss can occur. All the characters of the source code page must be available in the target code page. If some characters of the source page are not present in the target page, the target covers only a subset of the source, the two code pages are not compatible, and data loss will occur for those characters during transformation.
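The effect can be illustrated with Python's built-in codecs. This is only an analogy for the idea of compatible and incompatible code pages, not a description of how PowerCenter validates them; the sample string is made up.

```python
# "caf\u00e9" is "cafe" with an accented e, which exists in Latin-1 and UTF-8
# but not in ASCII.
text = "caf\u00e9"

# Latin-1 -> UTF-8: every Latin-1 character exists in the target, so the
# text survives the move unchanged.
latin1_bytes = text.encode("latin-1")
round_trip = latin1_bytes.decode("latin-1").encode("utf-8").decode("utf-8")

# Latin-1 -> ASCII: the accented character has no ASCII equivalent, so it is
# replaced with "?" and the original value is lost.
lossy = text.encode("ascii", errors="replace").decode("ascii")
```

Here `round_trip` equals the original text, while `lossy` becomes `caf?`.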
87. How can you validate all mappings in the repository simultaneously?
Answer :All the mappings cannot be validated simultaneously because each time only one mapping can be validated.
88. Briefly explain the Aggregator transformation.
Answer: It performs aggregate calculations such as sums and averages. Unlike the Expression transformation, which calculates row by row, the Aggregator can perform calculations on groups of rows.
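As a rough illustration of a group-wise calculation, the following plain-Python sketch sums a measure per group. The function name `aggregate_sum` and the sample rows are hypothetical and only mimic what a group-by port and a SUM expression would do.

```python
from collections import defaultdict


def aggregate_sum(rows, group_by, measure):
    """Return {group value: sum of measure}, one entry per group."""
    totals = defaultdict(int)
    for row in rows:
        totals[row[group_by]] += row[measure]
    return dict(totals)


employees = [
    {"dept": "HR", "salary": 3000},
    {"dept": "IT", "salary": 4000},
    {"dept": "HR", "salary": 4000},
]
salary_by_dept = aggregate_sum(employees, "dept", "salary")
```

Three input rows produce two output rows (one per department), which is why an aggregation is treated as an active transformation.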
89. Describe the Expression transformation.
Answer: The Expression transformation calculates values in a single row before they are written to the target. It is used for non-aggregate calculations. Conditional statements can also be tested before the output goes to the target tables.
90. What do you mean by the Filter transformation?
Answer: It filters rows in a mapping. Data passes through the Filter transformation, where the filter condition is applied. All ports of a Filter transformation are input/output ports, and only the rows that meet the condition pass through.
91. What is an expression transformation?
Answer :An expression transformation is used to calculate values in a single row. Example: salary+1000
92. How to generate sequence numbers using expression transformation?
Answer :Create a variable port in expression transformation and increment it by one for every row. Assign this variable port to an output port.
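The same idea in plain Python, as a sketch only: the local `counter` plays the role of the variable port, and the added `seq_no` field plays the role of the output port. Both names are hypothetical.

```python
def add_sequence(rows, start=0):
    """Yield each row with a running sequence number attached."""
    counter = start  # plays the role of the variable port
    for row in rows:
        counter += 1  # incremented once for every row
        yield {**row, "seq_no": counter}  # exposed like an output port


numbered = list(add_sequence([{"name": "a"}, {"name": "b"}, {"name": "c"}]))
```

The rows come out numbered 1, 2, 3 in arrival order.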
93. What is a union transformation?
Answer: A union transformation is used to merge data from multiple sources, similar to the UNION ALL SQL statement, which combines the results of two or more SQL statements.
94. As the union transformation gives UNION ALL output, how will you get the UNION output?
Answer :Pass the output of union transformation to a sorter transformation. In the properties of sorter transformation check the option select distinct. Alternatively you can pass the output of union transformation to aggregator transformation and in the aggregator transformation specify all ports as group by ports.
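Both approaches can be mimicked in plain Python. The function names below are hypothetical and the rows are simple tuples; the point is only that sorting with duplicate removal, or grouping by every column, turns UNION ALL output into UNION output.

```python
def union_all(*streams):
    """Concatenate all input streams without removing duplicates."""
    merged = []
    for stream in streams:
        merged.extend(stream)
    return merged


def sorter_distinct(rows):
    """Sort the rows and drop adjacent duplicates (sorter with distinct)."""
    result = []
    for row in sorted(rows):
        if not result or result[-1] != row:
            result.append(row)
    return result


def aggregator_group_by_all(rows):
    """Group by every port: one output row per distinct row value."""
    return list(dict.fromkeys(rows))


stream_a = [(1, "x"), (2, "y")]
stream_b = [(2, "y"), (3, "z")]
merged = union_all(stream_a, stream_b)          # 4 rows, one duplicate
via_sorter = sorter_distinct(merged)            # 3 distinct rows
via_aggregator = aggregator_group_by_all(merged)  # 3 distinct rows
```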
95. What guidelines should be followed when using the union transformation?
Answer: Follow these rules and guidelines when working with the union transformation:
- You can create multiple input groups, but only one output group.
- All input groups and the output group must have matching ports. The precision, datatype, and scale must be identical across all groups.
- The Union transformation does not remove duplicate rows. To remove duplicate rows, you must add another transformation downstream, such as a Sorter with the distinct option or an Aggregator that groups by all ports.
- You cannot use a Sequence Generator or Update Strategy transformation upstream from a Union transformation.
- The Union transformation does not generate transactions.
96. Why is the union transformation an active transformation?
Answer: Union is an active transformation because it combines two or more data streams into one. Though the total number of rows passing into the Union is the same as the total number of rows passing out of it, and the sequence of rows from any given input stream is preserved in the output, the positions of the rows are not preserved, i.e. row number 1 from input stream 1 might not be row number 1 in the output stream. Union does not even guarantee that the output is repeatable.
97. What is a transformation?
Answer :A transformation is a repository object that generates, modifies, or passes data.
98. What is an active transformation?
Answer :An active transformation is the one which changes the number of rows that pass through it.
Example: Filter transformation
99. What is a passive transformation?
Answer :A passive transformation is the one which does not change the number of rows that pass through it.
Example: Expression transformation
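The difference between active and passive can be seen by counting rows in a small plain-Python sketch. The helper names `filter_rows` and `expression` are hypothetical; the `salary + 1000` calculation reuses the example from question 91.

```python
employees = [
    {"name": "Ann", "salary": 5000},
    {"name": "Bob", "salary": 2000},
]


def filter_rows(rows, condition):
    """Active: rows that fail the condition are dropped."""
    return [row for row in rows if condition(row)]


def expression(rows):
    """Passive: every row passes through with one extra calculated value."""
    return [{**row, "new_salary": row["salary"] + 1000} for row in rows]


filtered = filter_rows(employees, lambda row: row["salary"] > 3000)
calculated = expression(employees)
```

Two rows go in; the filter lets one row out, while the expression still returns two.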
100. What is a connected transformation?
Answer :A connected transformation is connected to the data flow or connected to the other transformations in the mapping pipeline.
Example: sorter transformation
101. What is an unconnected transformation?
Answer :An unconnected transformation is not connected to other transformations in the mapping. An unconnected transformation is called within another transformation and returns a value to that transformation.
Example: Unconnected lookup transformation, unconnected stored procedure transformation
102. What are multi-group transformations?
Answer: Transformations that have multiple input groups, multiple output groups, or both are called multi-group transformations.
Examples: Custom, HTTP, Joiner, Router, Union, Unstructured Data, XML source qualifier, XML Target definition, XML parser, XML generator
103. List out all the transformations which use cache?
Answer :Aggregator, Joiner, Lookup, Rank, Sorter
104. What is a blocking transformation?
Answer: A transformation that blocks its input rows is called a blocking transformation.
Example: Custom transformation, unsorted joiner
105. What is a reusable transformation?
Answer: A reusable transformation is one that can be used in multiple mappings. Reusable transformations are created in the Transformation Developer.
106. How do you promote a non-reusable transformation to a reusable transformation?
Answer: Edit the transformation and check the Make Reusable option.
107. How to create a non-reusable instance of reusable transformations?
Answer :In the navigator, select an existing transformation and drag the transformation into the mapping workspace. Hold down the Ctrl key before you release the transformation.
108. Which transformation can be created only as reusable transformation but not as non-reusable transformation?
Answer :External procedure transformation.
109. What is a control task?
Answer :A control task is used to alter the normal processing of a workflow by stopping, aborting, or failing a workflow or worklet.
110. What is a pipeline partition and how does it provide a session with higher performance?
Answer :Within a mapping, a session can break apart different source qualifier to target pipelines into their own reader/transformation/writer thread(s). This allows the Integration Service to run the partition in parallel with other pipeline partitions in the same mapping. The parallelism creates a higher performing session.
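The structure of partitioned processing can be sketched with Python's standard thread pool. This is an analogy only: `transform`, `run_partition`, and `run_partitioned` are hypothetical names, the round-robin split is an assumption made for the example, and Python threads illustrate the shape of the parallelism rather than any real speed-up.

```python
from concurrent.futures import ThreadPoolExecutor


def transform(value):
    """Stand-in for the transformation logic applied to one row."""
    return value * 2


def run_partition(rows):
    """One partition: read its rows, transform them, return the output."""
    return [transform(row) for row in rows]


def run_partitioned(rows, partitions):
    """Split rows round-robin and process each partition on its own thread."""
    chunks = [rows[i::partitions] for i in range(partitions)]
    with ThreadPoolExecutor(max_workers=partitions) as pool:
        results = list(pool.map(run_partition, chunks))
    return [row for chunk in results for row in chunk]


partitioned_output = run_partitioned(list(range(10)), partitions=3)
```

All ten rows are processed, but their order in the combined output follows the partitions rather than the original input order.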
111. What is the maximum number of partitions that can be defined in a single pipeline?
Answer :You can define up to 64 partitions at any partition point in a pipeline.
112. Pipeline partitioning is designed to increase performance; however, list one of its disadvantages.
Answer :Increasing the number of partitions increases the load on the node. If the node does not contain enough CPU bandwidth, you can overload the system.
113. What is a dynamic session partition?
Answer :A dynamic session partition is where the Integration Service scales the number of session partitions at runtime. The number of partitions is based on a number of factors including number of nodes in a grid or source database partitions.
114. List three dynamic partitioning configurations that cause a session to run with one partition.
Answer:
1. You set dynamic partitioning to the number of nodes in the grid, and the session does not run on a grid.
2. You create a user-defined SQL statement or a user-defined source filter.
3. You use dynamic partitioning with an Application Source Qualifier.
115. What is pushdown optimization?
Answer :Pushdown optimization is a feature within Informatica PowerCenter that allows us to push the transformation logic in a mapping into SQL queries that are executed by the database. If not all mapping transformation logic can be translated into SQL, then the Integration Service will process what is left.
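The idea can be illustrated with Python's built-in `sqlite3` module. SQLite is used here only as a convenient stand-in database (it is not one of the databases listed for PowerCenter), and the `orders` table and its rows are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, amount REAL, status TEXT)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, 100.0, "OPEN"), (2, 250.0, "CLOSED"), (3, 75.0, "OPEN")],
)

# Without pushdown: fetch every row, then filter and aggregate in the
# integration engine (here, plain Python).
all_rows = conn.execute("SELECT id, amount, status FROM orders").fetchall()
engine_total = sum(amount for _id, amount, status in all_rows if status == "OPEN")

# With pushdown: the same filter and aggregate logic is expressed as SQL and
# executed by the database, so only the result travels back.
(db_total,) = conn.execute(
    "SELECT SUM(amount) FROM orders WHERE status = 'OPEN'"
).fetchone()
```

Both paths give the same total; the difference is where the work is done and how many rows leave the database.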
116. List the different types of pushdown optimization that can be configured.
Answer :
1. Source-side pushdown optimization – The Integration Service pushes as much transformation logic as possible to the source database.
2. Target-side pushdown optimization – The Integration Service pushes as much transformation logic as possible to the target database.
3. Full pushdown optimization – The Integration Service attempts to push all transformation logic to the target database. If the Integration Service cannot push all transformation logic to the database, it performs both source-side and target-side pushdown optimization.
117. For which databases can pushdown optimization be configured?
Answer:
- IBM DB2
- Microsoft SQL Server
- Netezza
- Oracle
- Sybase ASE
- Teradata
- Databases that use ODBC drivers
118. List several transformations that work with pushdown optimization to push logic to the database.
Answer :Aggregator, Expression, Filter, Joiner, Lookup, Router, Sequence Generator, Sorter, Source Qualifier, Target, Union, Update Strategy
119. What is real-time processing?
Answer: Data sources such as JMS, WebSphere MQ, TIBCO, webMethods, MSMQ, SAP, and web services can publish data in real time. Informatica PowerCenter can use these real-time sources to process data on demand. A session can be specifically configured for real-time processing.
120. What types of real-time data can be processed with Informatica PowerCenter?
Answer :
1. Messages and message queues. Examples include WebSphere MQ, JMS, MSMQ, SAP, TIBCO, and webMethods sources.
2. Web service messages. Example includes receiving a message from a web service client through the Web Services Hub.
3. Change data from PowerExchange change data capture sources.
121. What is a real-time processing terminating condition?
Answer: A real-time processing terminating condition determines when the Integration Service stops reading messages from a real-time source and ends the session.
122. List three real-time processing terminating conditions.
Answer :
1. Idle time – Time the Integration Service waits to receive messages before it stops reading from the source.
2. Message count – Number of messages the Integration Service reads from a real-time source before it stops reading from the source.
3. Reader time limit – Amount of time in seconds that the Integration Service reads source messages from the real-time source before it stops reading from the source.
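How the three conditions interact can be sketched with a small simulation in plain Python. It assumes that reading stops at whichever configured condition is met first. The function `read_until_terminated` and the list of (arrival time, message) pairs are hypothetical; no real message queue or Informatica API is involved.

```python
def read_until_terminated(events, idle_time=None, message_count=None,
                          reader_time_limit=None):
    """Simulate a real-time reader that starts at time 0.

    events: list of (arrival_time_seconds, message), sorted by arrival time.
    Returns (messages_read, reason_for_stopping).
    """
    read = []
    last_activity = 0.0
    for arrival, message in events:
        # Collect the deadlines that have already passed when this
        # message arrives; the earliest one ends the session.
        expired = []
        if idle_time is not None and last_activity + idle_time <= arrival:
            expired.append((last_activity + idle_time, "idle time"))
        if reader_time_limit is not None and reader_time_limit <= arrival:
            expired.append((reader_time_limit, "reader time limit"))
        if expired:
            return read, min(expired)[1]
        read.append(message)
        last_activity = arrival
        if message_count is not None and len(read) >= message_count:
            return read, "message count"
    return read, "source exhausted"  # simulation only: no more test events


events = [(1, "m1"), (2, "m2"), (10, "m3"), (11, "m4")]
by_idle = read_until_terminated(events, idle_time=5)
by_count = read_until_terminated(events, message_count=3)
by_limit = read_until_terminated(events, reader_time_limit=11)
```

With these sample events, an idle time of 5 seconds stops the reader after `m2`, because nothing arrives between second 2 and second 10.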
123. What is real-time processing message recovery?
Answer: Real-time processing message recovery allows the Integration Service to recover unprocessed messages from a failed session. Recovery files, tables, queues, or topics are used to recover the source messages or IDs. Recovery mode can be used to recover these unprocessed messages.
124. How do you join two tables connected to a Source Qualifier without any relationship defined?
Answer: By writing a SQL override.
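A sketch of what such an override query might look like, run against an in-memory SQLite database through Python's `sqlite3` module. The `customers` and `payments` tables, their columns, and the join condition are all made up for illustration; no key relationship is declared between them, so the join is stated entirely in the query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE customers (cust_code TEXT, name TEXT);
    CREATE TABLE payments (payer_code TEXT, amount REAL);
    INSERT INTO customers VALUES ('C1', 'Ann'), ('C2', 'Bob');
    INSERT INTO payments VALUES ('C1', 40.0), ('C1', 60.0), ('C3', 10.0);
    """
)

# The override spells out the join condition itself, because the tables
# have no primary key / foreign key relationship to derive it from.
sql_override = """
    SELECT c.name, p.amount
    FROM customers c
    JOIN payments p ON c.cust_code = p.payer_code
    ORDER BY p.amount
"""
joined = conn.execute(sql_override).fetchall()
```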
125. In a mapping, if there are two targets to load, header and detail, how do you ensure that the header table loads first and then the detail table?
Answer: Use constraint-based loading (if no relationship exists at the Oracle level), or a target load plan (if there is only one source qualifier for the other tables), or select the header target table first and then the detail table while dragging them into the mapping.
126. A mapping takes just 10 seconds to run: it reads a source file and inserts into a target. Before that, however, there is a Stored Procedure transformation that takes around 5 minutes to run and returns ‘Y’ or ‘N’. If ‘Y’, continue the feed; otherwise stop the feed. (Hint: because the Stored Procedure transformation takes much longer than the mapping, it should not run row-wise.)
Answer: There is an option to run the stored procedure once, before the rows start loading.
127. What is the difference between a view and a materialized view?
Answer: A view stores only a query; each time the view is queried, the data is read from the base tables. A materialized view stores the query result, which is loaded or replicated once and then refreshed, so queries against it perform better. A materialized view can be refreshed 1. on commit or 2. on demand, and the refresh methods are complete, fast, force, and never.
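The behaviour can be demonstrated with Python's `sqlite3` module. SQLite has no materialized views, so this sketch emulates one with an ordinary snapshot table (`mv_sales`) and a hypothetical `refresh_complete` helper that rebuilds it on demand; the table names and data are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount INTEGER)")
conn.executemany("INSERT INTO sales VALUES (?, ?)", [("EU", 10), ("EU", 20)])

summary_sql = "SELECT region, SUM(amount) AS total FROM sales GROUP BY region"
# A view stores only the query text.
conn.execute("CREATE VIEW v_sales AS " + summary_sql)
# The snapshot table stores the query result (emulated materialized view).
conn.execute("CREATE TABLE mv_sales AS " + summary_sql)

conn.execute("INSERT INTO sales VALUES ('EU', 30)")  # base table changes

view_total = conn.execute("SELECT total FROM v_sales").fetchone()[0]
snapshot_total = conn.execute("SELECT total FROM mv_sales").fetchone()[0]


def refresh_complete(connection):
    """Rebuild the snapshot from the base table (a complete, on-demand refresh)."""
    connection.execute("DELETE FROM mv_sales")
    connection.execute("INSERT INTO mv_sales " + summary_sql)


refresh_complete(conn)
refreshed_total = conn.execute("SELECT total FROM mv_sales").fetchone()[0]
```

The view reflects the new row immediately (60), while the snapshot keeps the old total (30) until it is refreshed.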
128. What is a bitmap index, and why is it used for DWH?
Answer: In a bitmap index, a bitmap for each key value replaces a list of rowids. Bitmap indexes are efficient for data warehousing because the indexed columns typically have low cardinality and few updates, and because bitmaps are very efficient for evaluating WHERE clauses.
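A toy version of the idea in plain Python, using integers as bitmaps (bit i is set when row i holds the key value). The function names and the two sample columns are hypothetical, and real database bitmap indexes add compression and rowid mapping that are not shown here.

```python
def build_bitmap_index(values):
    """Return {key value: bitmap}, where bit i is set if row i has that value."""
    index = {}
    for row_id, value in enumerate(values):
        index[value] = index.get(value, 0) | (1 << row_id)
    return index


def matching_rows(bitmap):
    """Translate a bitmap back into a list of row numbers."""
    return [i for i in range(bitmap.bit_length()) if (bitmap >> i) & 1]


gender = ["M", "F", "F", "M", "F"]   # low-cardinality column
status = ["A", "A", "I", "A", "A"]   # low-cardinality column

gender_index = build_bitmap_index(gender)
status_index = build_bitmap_index(status)

# WHERE gender = 'F' AND status = 'A' becomes a single bitwise AND.
hits = matching_rows(gender_index["F"] & status_index["A"])
```

Combining predicates with bitwise AND and OR is what makes this layout cheap for WHERE clauses over low-cardinality columns.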
129. What is a star schema? And what is a snowflake schema?
Answer: In a star schema, the center of the star is a large fact table and the points of the star are the dimension tables. A snowflake schema normalizes the dimension tables to eliminate redundancy; that is, the dimension data is grouped into multiple tables instead of one large table.
A star schema contains denormalized dimension tables and a fact table; each primary key value in a dimension table is associated with a foreign key in the fact table.
The fact table contains all business measures (normally numeric data) and foreign key values, and the dimension tables hold details about the subject area. A snowflake schema is basically a set of normalized dimension tables that reduces redundancy in the dimensions.
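A minimal star schema can be sketched with Python's `sqlite3` module. The table names (`dim_product`, `fact_sales`), columns, and rows are made up for the example. In a snowflake variant, the `category` column would move out of `dim_product` into its own table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Dimension table: descriptive details, denormalized (category kept inline).
    CREATE TABLE dim_product (
        product_key INTEGER PRIMARY KEY,
        product_name TEXT,
        category TEXT
    );
    -- Fact table: numeric measures plus a foreign key to the dimension.
    CREATE TABLE fact_sales (
        product_key INTEGER REFERENCES dim_product(product_key),
        quantity INTEGER,
        revenue REAL
    );
    INSERT INTO dim_product VALUES (1, 'Pen', 'Stationery'), (2, 'Desk', 'Furniture');
    INSERT INTO fact_sales VALUES (1, 10, 15.0), (1, 5, 7.5), (2, 1, 120.0);
    """
)

# A typical query joins the fact table to a dimension and aggregates a measure.
report = conn.execute(
    """
    SELECT d.category, SUM(f.revenue)
    FROM fact_sales f
    JOIN dim_product d ON d.product_key = f.product_key
    GROUP BY d.category
    ORDER BY d.category
    """
).fetchall()
```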
130. Why is a staging area database needed for DWH?
Answer: A staging area is needed to clean operational data before it is loaded into the data warehouse. Cleaning here includes merging data that comes from different sources.