Informatica PowerCenter Tuning

Mapping Tuning

1) Filter data in the Source Qualifier whenever possible.

2) When joining, use the table with fewer rows as the master table, and prefer integer columns as the join keys.

3) When using an Aggregator or Joiner, enable the Sorted Input option if you can confirm the input data is already sorted. (Data must be sorted in the order of the 'group by' ports.) If the data volume is small, a Sorter transformation can be placed before the Aggregator; otherwise, do the sort outside Informatica. When the input exceeds roughly 250,000 rows, Sorted Input is usually worthwhile.

4) Use the || operator instead of the CONCAT function to concatenate strings.

5) A Transaction Control transformation can reduce the performance of downstream transformations (an Aggregator, Joiner, Rank, or Sorter with the All Input transformation scope) and of the target, because transformations with the All Input scope drop the incoming transaction boundaries.

6) Avoid aggregate functions in a SQL override (they may fill up the temp space in Oracle when selecting large amounts of data).

7) Optimize the Lookup

Cache lookups; however, when the number of input rows is small (e.g. 5,000) but the lookup table is large (e.g. 5,000,000 rows), the lookup table should not be cached.

Static cache (the default): it is built when the first lookup request is processed, and it is not updated during the session even if rows are inserted into the lookup table.

Dynamic cache: use it when the lookup table is also the target table, so that the lookup cache is updated as rows are written.

Persistent cache: use it when you want to save and reuse the cache files, i.e. when the lookup table does not change between session runs.

Remove unused ports from the Lookup transformation.

In the lookup condition, place the = (equality) operator as the first condition.

By default the Lookup transformation adds an ORDER BY on every lookup port. You can comment out some of those columns in the lookup SQL override to improve performance, or create an index on the ORDER BY columns, as in the sketch below.
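For example, in the lookup SQL override you can supply your own ORDER BY on just the condition column and end the statement with two dashes, which comments out the ORDER BY that the Integration Service would otherwise append (the customers table and its columns are hypothetical):

    SELECT cust_id, cust_name, cust_status
    FROM   customers
    ORDER  BY cust_id --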

 

===================================================================================

 

Identify bottlenecks in the following stages:
Target
Source
Mapping
Session
System

Methods:
1) Test the throughput
To verify database tuning, create a pipeline with one partition and measure the reader throughput in the Workflow Monitor; then add more partitions and check that the throughput scales linearly. If the session has two partitions, the reader throughput should be roughly twice as fast. If it is not, the database needs further tuning.

Optimizing the Target
1) Drop indexes and key constraints before the load and recreate them afterwards, using pre-session and post-session SQL commands or pre-load and post-load stored procedures, as sketched after this list.

2) Minimize the number of relational database connections.

3) Use bulk loading.

4) If the target is a flat file, create it on the Informatica server itself, then FTP it to another server if needed.
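As a sketch of item 1, the pre-session and post-session SQL commands could look like this (the index and table names are hypothetical):

    -- pre-session SQL command: drop the index before the load
    DROP INDEX idx_sales_cust;

    -- post-session SQL command: rebuild the index after the load
    CREATE INDEX idx_sales_cust ON sales (cust_id);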

Optimizing the Source
1) It is better to join multiple sources in the Source Qualifier than with a Joiner transformation (see the sketch after this list).

2) Create an index on the ORDER BY or GROUP BY columns.

3) Use a conditional filter in the Source Qualifier to filter out unnecessary rows.
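A minimal Source Qualifier SQL override illustrating items 1 and 3 (the orders/customers tables and the date filter are hypothetical):

    SELECT o.order_id, o.amount, c.cust_name
    FROM   orders o
    JOIN   customers c ON c.cust_id = o.cust_id  -- join done in the database, not in a Joiner trans
    WHERE  o.order_date >= DATE '2024-01-01';    -- filter out unnecessary rows at the source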

Optimizing the Lookup
1) Consider disabling the lookup cache if the lookup table is huge and there is no efficient SQL override, but also weigh the number of input rows that will hit the lookup. An uncached lookup can be made efficient by creating an index on the columns used in the lookup condition.

2) Use a conditional filter in the SQL override to reduce the number of rows cached (see the sketch after this list).

3) Do not fetch unnecessary fields.

4) Suppress or trim the default ORDER BY clause, and create an index on the columns that remain in the ORDER BY.

5) Place the Lookup transformation toward the end of the mapping; this eliminates carrying extra ports through the pipeline.
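For item 2, a lookup SQL override with a conditional filter could look like this (the table, columns, and the ACTIVE filter are hypothetical); only the matching rows are read into the cache:

    SELECT cust_id, cust_name
    FROM   customers
    WHERE  cust_status = 'ACTIVE'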

Optimizing the Aggregator
1) Place a Sorter transformation before the Aggregator, then enable the Sorted Input option. You can also do the ORDER BY in the Source Qualifier (see the sketch after this list).

2) If possible, use numeric ports for the group-by columns.
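For item 1, when the sort is done in the Source Qualifier, the ORDER BY must list the group-by ports in the same order the Aggregator groups them. A hypothetical SQ override:

    SELECT region, product_id, amount
    FROM   sales
    ORDER  BY region, product_id  -- matches the Aggregator's group-by ports, so Sorted Input can be enabled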

Optimizing the Joiner
1) Mark the source with the smaller number of rows as the master.

2) If possible, use numeric ports in the join condition.

Optimizing the Sorter
1) If the amount of source data is greater than the Sorter cache size, the Integration Service requires more than twice that amount of disk space. For optimal performance, allocate at least 16 MB of physical memory for the sorter cache.

Optimizing the Expression
1) Use operators instead of functions in expressions (see the sketch after this list).

2) Uncheck the output option for ports that are not required as output.
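As a sketch of item 1, string concatenation with the || operator versus nested CONCAT calls (the employees table and name columns are hypothetical):

    -- slower: nested function calls
    SELECT CONCAT(CONCAT(first_name, ' '), last_name) FROM employees;
    -- faster: the concatenation operator
    SELECT first_name || ' ' || last_name FROM employees;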

Optimizing in General
1) Remove unnecessary ports.

2) Do not use the Verbose Data tracing level.

3) Avoid unnecessary data type conversions.

4) When updating targets, link only the necessary ports.

5) Use shortcuts in a shared folder, and reusable components such as mapplets and reusable transformations.

Optimizing in Session
1) A workflow runs faster when you do not configure it to write session and workflow log files. Workflows and sessions always create binary logs, so when a session or workflow log file is also written, the Integration Service writes the logging events twice.

Optimizing by Partition
1) In the task window, under the Config Object tab, there is an option called Partitioning Options with an attribute called Dynamic Partitioning; set it to "Based on number of nodes in grid".
2) The session then dynamically uses the nodes available on the Informatica servers: if Dev has two nodes the session runs with two partitions, and if Prod has five nodes it runs with five.
3) Merge the output files when using dynamic partitioning (sequential merge).
4) It is usually best to select the target instances and the Source Qualifier as partition points; for these transformations, pass-through can be used as the partition type.
5) If the mapping involves an Aggregator, select a partition point before it: it is advisable to make the Sorter transformation a partition point with Hash Auto-Keys as the partition type.

 
