Comparing exp/imp and expdp/impdp, with Some Optimization Tips for Their Use


Oracle Database Logical Backup: Simple EXP/IMP

http://blog.csdn.net/tianlesoftware/archive/2009/10/23/4718366.aspx

 

Oracle 10g EXPDP and IMPDP Usage Notes

http://blog.csdn.net/tianlesoftware/archive/2009/10/16/4674224.aspx

 

 

1. Comparison of exp/imp and expdp/impdp

 

1.1 expdp/impdp call server-side APIs; the actual work runs as jobs inside the database and is executed on the server, so the dump files are read and written on the server side. exp/imp have no such restriction.

 

1.2 exp/imp and expdp/impdp differ in their default modes and in how they work

 

1.2.1 How the exp/imp modes work

The following MetaLink note describes how exp/imp work in their different modes:

Parameter DIRECT: Conventional Path Export Versus Direct Path Export [ID 155477.1]

http://blog.csdn.net/tianlesoftware/archive/2010/12/22/6090759.aspx

 

 

Starting with Oracle7 release 7.3, the Export utility provides two methods for exporting table data:

- Conventional Path Export

- Direct Path Export

 

1) Conventional Path Export.

Conventional path Export uses the SQL SELECT statement to extract data from tables. Data is read from disk into the buffer cache, and rows are transferred to the evaluating buffer. The data, after passing expression evaluation, is transferred to the Export client, which then writes the data into the export file.

 

exp/imp default to the conventional path. In this mode the data is retrieved with SELECT, read into the buffer cache, then transferred to the evaluation buffer, and finally passed to the Export client, which writes it to the dump file.

 

 

 

2) Direct Path Export.

            When using a Direct path Export, the data is read from disk directly into the export session's program global area (PGA): the rows are transferred directly to the Export session's private buffer. This also means that the SQL command-processing layer (evaluation buffer) can be bypassed, because the data is already in the format that Export expects. As a result, unnecessary data conversion is avoided. The data is transferred to the Export client, which then writes the data into the export file.

 

The default is DIRECT=N, which extracts the table data using the conventional path.

 

This parameter is only applicable to the original export client. Export DataPump (expdp) uses a Direct Path unload by default and switches to External Table mode if required.

 

In direct path mode the data is read straight from disk into the PGA; it is already in the format Export expects, so no conversion is needed, and it is passed directly to the Export client and written to the dump file. The evaluation buffer is bypassed entirely, and skipping that step gives a very noticeable improvement in export speed.

 

 

1.2.2 The expdp/impdp modes

            Export/Import DataPump Parameter ACCESS_METHOD - How to Enforce a Method of Loading and Unloading Data ? [ID 552424.1]

            http://blog.csdn.net/tianlesoftware/archive/2010/12/22/6090757.aspx

 

            The two most commonly used methods to move data in and out of databases with Data Pump are the "Direct Path" method and the "External Tables" method.

 

1) Direct Path mode.

After data file copying, direct path is the fastest method of moving data. In this method, the SQL layer of the database is bypassed and rows are moved to and from the dump file with only minimal interpretation. Data Pump automatically uses the direct path method for loading and unloading data when the structure of a table allows it.

 

expdp/impdp use the direct path by default, which is why expdp is faster than exp.

 

2) External Tables mode.

            If data cannot be moved in direct path mode, or if there is a situation where parallel SQL can be used to speed up the data move even more, then the external tables mode is used. The external table mechanism creates an external table that maps the dump file data for the database table. The SQL engine is then used to move the data. If possible, the APPEND hint is used on import to speed the copying of the data into the database.

Note: When the Export NETWORK_LINK parameter is used to specify a network link for an export operation, a variant of the external tables method is used. In this case, data is selected from across the specified network link and inserted into the dump file using an external table.

 

3) Data File Copying mode.

            This mode is used when a transport tablespace job is started, i.e.: the TRANSPORT_TABLESPACES parameter is specified for an Export Data Pump job. This is the fastest method of moving data because the data is not interpreted nor altered during the job, and Export Data Pump is used to unload only structural information (metadata) into the dump file.
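A minimal sketch of this mode, assuming a self-contained tablespace named USERS and the dump directory created later in this article; note that the tablespace must be read-only during the export, and that the data files themselves still have to be copied to the target by OS or RMAN means, since the dump file holds only metadata:

alter tablespace users read only;
expdp system/passwd directory=dump1 dumpfile=tts_users.dmp transport_tablespaces=users transport_full_check=y logfile=tts_users.log
alter tablespace users read write;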

 

4) Network Link Import mode.

            This mode is used when the NETWORK_LINK parameter is specified during an Import Data Pump job. This is the slowest of the four access methods because this method makes use of an INSERT SELECT statement to move the data over a database link, and reading over a network is generally slower than reading from a disk.

 

This mode is very convenient, but it is the slowest of the four, because it is implemented as an INSERT ... SELECT over a database link, which explains the poor speed.
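The MOS note referenced at the start of this section ([ID 552424.1]) also describes forcing one of these methods with the ACCESS_METHOD parameter. A hedged sketch, reusing the placeholder schema and directory names from the example below; in 10g/11g this is an undocumented parameter intended for troubleshooting, so use it only when the automatically chosen method causes problems:

expdp xxx/xxx schemas=xxx directory=dump1 dumpfile=xxx_dp.dmp access_method=direct_path
expdp xxx/xxx schemas=xxx directory=dump1 dumpfile=xxx_et.dmp access_method=external_table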

 

Example:

create directory dump1 as '/oradata/dumpfiles';
grant read, write on directory dump1 to xxx;

 

Create the DB link:

/* Formatted on 2010/12/23 11:28:22 (QP5 v5.115.810.9015) */
CREATE DATABASE LINK TIANLESOFTWARE
   CONNECT TO BUSINESS IDENTIFIED BY <PWD>
   USING '(DESCRIPTION =
             (ADDRESS_LIST =
               (ADDRESS = (PROTOCOL = TCP)(HOST = IP ADDRESS)(PORT = 1521)))
             (CONNECT_DATA =
               (SID = ORCL)
               (SERVER = DEDICATED)))';
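Before using the link in a Data Pump job it is worth verifying it with a quick query; the result (or error) immediately shows whether the link resolves and the credentials are valid:

SELECT * FROM dual@TIANLESOFTWARE;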

 

The DUMPFILE parameter accepts the %U substitution variable:

expdp xxx/xxx schemas=xxx  directory=dump1  dumpfile=xxx_%U.dmp filesize=5g

This produces files of 5 GB each: xxx_01.dmp, xxx_02.dmp, and so on.

 

For more on %U, see:

http://download.oracle.com/docs/cd/B19306_01/server.102/b14215/dp_overview.htm#sthref44

 

This point comes up again further below.

 

expdp xxx/xxx schemas=xxx directory=dump1 network_link=dbl_65 dumpfile=xxx_01.dump,xxx_02.dump

This also works, but it is not clear how large xxx_01.dump will grow before Data Pump starts writing to xxx_02.dump.

 

ESTIMATE_ONLY=y can be used to estimate the size of the export first.
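A hedged sketch of such an estimate, reusing the placeholder schema and directory names from the other examples here; ESTIMATE can be BLOCKS (the default) or STATISTICS, and no dump file is written:

expdp xxx/xxx schemas=xxx directory=dump1 estimate_only=y estimate=blocks logfile=estimate.log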

NETWORK_LINK: with this parameter you do not have to run expdp on the source host itself; you can run it on the target host and pull the data across the network link into a dump file on the target.

expdp xxx/xxx schemas=xxx directory=dump1 network_link=tianlesoftware dumpfile=xxx_%U.dump filesize=10m
Alternatively, use impdp with NETWORK_LINK for an import that needs no dump file at all, as sketched below.
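A hedged sketch of that file-less import, again with the placeholder link, schema and directory names used above; DIRECTORY is still needed, but only for the log file, because the data never touches a dump file:

impdp xxx/xxx schemas=xxx directory=dump1 network_link=tianlesoftware logfile=imp_netlink.log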

 

Note that LOB columns work fine over NETWORK_LINK, but LONG columns raise an error:

ORA-31679: Table data object "xx"."SYS_USER" has long columns, and longs can not be loaded/unloaded using a network link

 

1.3 Impact of network and disk

expdp/impdp are server-side utilities, so the only thing limiting their speed is disk I/O.

exp/imp can run on the server or on a client, so they are constrained by both the network and the disk.

 

1.4 Functional differences between exp/imp and expdp/impdp

1) Importing user usera's objects into user userb: imp uses fromuser=usera touser=userb, while impdp uses remap_schema='usera':'userb'. For example:

            imp system/passwd fromuser=usera touser=userb file=/oracle/exp.dmp log=/oracle/exp.log;

            impdp system/passwd directory=expdp dumpfile=expdp.dmp remap_schema='usera':'userb' logfile=/oracle/exp.log;

2) Changing tablespaces: with exp/imp you have to relocate the tables yourself afterwards, e.g. ALTER TABLE xxx MOVE TABLESPACE tablespace_new; with impdp all it takes is remap_tablespace='tabspace_old':'tablespace_new', for example:
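A hedged sketch combining both remaps, using the same placeholder names as above:

impdp system/passwd directory=expdp dumpfile=expdp.dmp remap_schema='usera':'userb' remap_tablespace='tabspace_old':'tablespace_new' logfile=imp_remap.log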

3) When listing specific tables, exp/imp use tables=('table1','table2','table3'), whereas expdp/impdp use tables='table1','table2','table3'.

 

4) Whether to export data rows:

exp: ROWS=Y exports the data rows, ROWS=N does not.

expdp: CONTENT (ALL: object definitions plus data rows; DATA_ONLY: data rows only; METADATA_ONLY: object definitions only). For example:
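A hedged sketch of a metadata-only export, with the placeholder names used earlier:

expdp xxx/xxx schemas=xxx directory=dump1 dumpfile=xxx_meta.dmp content=metadata_only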

 

 

2. Optimization Tips in Use

 

2.1  exp

From the analysis above we know that the direct path speeds up exports, so exp should be run in direct path mode. Two parameters are involved: DIRECT and RECORDLENGTH.

 

DIRECT controls whether the export uses the direct path (DIRECT=Y) or the conventional path (DIRECT=N). A conventional path export extracts data from the tables with SQL SELECT statements; a direct path export reads the data straight from disk into the PGA and writes it out unchanged, skipping the data conversion in the SQL command-processing layer, which greatly improves export throughput. The larger the data volume, the more pronounced the advantage: a direct path export can be up to three times faster than the conventional method.

 

RECORDLENGTH goes together with DIRECT=Y; it defines the size of the Export I/O buffer and plays a role similar to the BUFFER parameter in conventional path exports. It is recommended to set RECORDLENGTH to the maximum I/O buffer size, 65535 bytes (64 KB). Usage:

exp userid=system/manager full=y direct=y recordlength=65535 file=exp_full.dmp log=exp_full.log

 

 

Some restrictions:

            You cannot use the DIRECT=Y parameter when exporting in tablespace-mode  (i.e. when specifying the parameter TRANSPORT_TABLESPACES=Y).  You can use the DIRECT=Y parameter when exporting in full, user or table mode (i.e.: when specifying FULL=Y or OWNER=scott or TABLES=scott.emp).

-- Direct path cannot be used in tablespace mode (TRANSPORT_TABLESPACES=Y).

 

The parameter QUERY applies ONLY to conventional path Export. It cannot be specified in a direct path export (DIRECT=Y).

-- Direct path does not support the QUERY parameter; QUERY can only be used with a conventional path export.

 

In versions of Export prior to 8.1.5, you could not use direct path Export for tables containing objects and LOBs.

-- With exp versions older than 8.1.5, direct path could not export tables containing object types or LOB columns. Databases that old are rare now, so this can be ignored.

 

The BUFFER parameter applies ONLY to conventional path Export. It has no effect on a direct path Export. This BUFFER parameter specifies the size (in bytes) of the buffer used to fetch rows. It determines the maximum number of rows in an array, fetched by Export.  For direct path Export, use the RECORDLENGTH parameter to specify the size of the buffer that Export uses for writing to the export file.

-- The BUFFER parameter only affects conventional path exports and has no effect on direct path; for direct path, size the buffer with the RECORDLENGTH parameter instead.

 

The RECORDLENGTH parameter specifies the length (in bytes) of the file record. You can use this parameter to specify the size of the Export I/O buffer (highest value is 64 kb). Changing the RECORDLENGTH parameter affects only the size of data that accumulates before writing to disk.  It does not affect the operating system file block size.  If you do not define this parameter, it defaults to your platform-dependent value for BUFSIZ (1024 bytes in most cases).

 

Invoking a Direct path Export with a maximum I/O buffer of 64 KB can improve the performance of the Export by almost 50%. This can be achieved by specifying the additional Export parameters DIRECT and RECORDLENGTH.

 

-- For direct path, setting RECORDLENGTH to 64 KB (65535) gives a noticeable performance gain. For example:

> exp system/manager FILE=exp_full.dmp LOG=exp_full.log FULL=y DIRECT=y RECORDLENGTH=65535

> imp system/manager FILE=exp_full.dmp LOG=imp_full.log FULL=y RECORDLENGTH=65535

 

 

2.2  IMP

 

Oracle import takes several times longer than export to load the data back into the database. At critical moments the import is part of an emergency recovery, so reducing downtime by speeding up the import matters a great deal. There is no silver bullet for a large import, but some sensible settings can cut the total import time.

 

1) Avoid I/O contention

Import is an I/O-intensive operation, so avoiding I/O contention speeds it up. If possible, do not import during peak hours and do not run jobs or other tasks that compete for system resources while the import is running.

2) Increase the sort area

Oracle import loads the data first and builds the indexes afterwards; whatever INDEXES is set to, the indexes backing primary keys are always created. Index creation needs sort space, and when memory is insufficient the sort spills to the temporary tablespace on disk, which is orders of magnitude slower than an in-memory sort. Enlarging the sort area therefore speeds up index creation considerably, and with it the whole import; one way to do this is sketched below.
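A hedged sketch for 9i/10g with the automatic workarea policy: temporarily enlarge PGA_AGGREGATE_TARGET before the import and set it back afterwards. The 2G value is purely illustrative and should be sized to the memory actually available on the server:

SQL> alter system set pga_aggregate_target=2G scope=memory;

(Run the import, then restore the parameter to its original value.)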

3) Tune the BUFFER parameter

The imp BUFFER parameter defines how much data is read from the export file in each pass; the larger it is, the fewer reads the import process has to make and the faster the import runs. The right size depends on the application and the size of the database, but a value around 100 MB is usually enough. Usage:

            imp user/pwd fromuser=user1 touser=user2 file=/tmp/imp_db_pipe1 commit=y feedback=10000 buffer=10240000

4) Use COMMIT=Y

COMMIT=Y commits every time the data buffer fills, instead of committing once per table. This greatly reduces the pressure on rollback segments and other resources and helps the import complete smoothly.

5) Use INDEXES=N

As mentioned above, imp loads the data first and then builds the indexes. Building user-defined indexes during the import can take a long time, especially when a table has several indexes or is very large. When the data has to be in as fast as possible and the indexes can wait, use INDEXES=N to import only the data and skip index creation.

You can use the INDEXFILE option to generate a DDL script for the indexes and create them manually afterwards (a sketch of that variant follows the two commands below), or import twice, data first and indexes second, as follows:

imp user/pwd fromuser=user1 touser=user2 file=/tmp/imp_db_pipe1 commit=y feedback=10000 buffer=10240000 ignore=y rows=y indexes=n

 

imp user/pwd fromuser=user1 touser=user2 file=/tmp/imp_index_pipe1 commit=y feedback=10000 buffer=10240000 ignore=y rows=n indexes=y
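The INDEXFILE variant mentioned above would look roughly like this (the script name is illustrative); with INDEXFILE, imp only writes the index DDL into the file and imports nothing, and the script can then be edited and run after the data-only import:

imp user/pwd fromuser=user1 touser=user2 file=/tmp/imp_db_pipe1 indexfile=/tmp/create_indexes.sql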

6) Increase LARGE_POOL_SIZE

If MTS_SERVICE, MTS_DISPATCHERS and related parameters are set in init.ora and tnsnames.ora has no (SERVER=DEDICATED) entry, the database is running in shared server (MTS) mode. In MTS mode exp/imp use the LARGE_POOL, so it is advisable to raise LARGE_POOL_SIZE to about 150 MB (a sketch follows the MTS check below).

 

To check whether the database is running in MTS mode:

  SQL>select distinct server from v$session;

If the result contains NONE or SHARED, MTS is enabled.
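If MTS is indeed in use, the adjustment itself is a one-liner; a hedged sketch assuming an spfile is in use (and note that when SGA_TARGET is set, this value only acts as a minimum for the auto-tuned large pool):

SQL> alter system set large_pool_size=150M scope=both;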

 

 

2.3 Expdp/Impdp

 

Data Pump is a big performance improvement over exp/imp, and the single biggest factor is PARALLEL. Roughly speaking: expdp/impdp = exp/imp + direct mode + parallel. So to get speed out of Data Pump, set the parallelism parameter.

 

First, two relevant parameters:

Setting Parallelism

For export and import operations, the parallelism setting (specified with the PARALLEL parameter) should be less than or equal to the number of dump files in the dump file set. If there are not enough dump files, the performance will not be optimal because multiple threads of execution will be trying to access the same dump file.

The PARALLEL parameter is valid only in the Enterprise Edition of Oracle Database 10g.

Using Substitution Variables

Instead of, or in addition to, listing specific filenames, you can use the DUMPFILE parameter during export operations to specify multiple dump files, by using a substitution variable (%U) in the filename. This is called a dump file template. The new dump files are created as they are needed, beginning with 01 for %U, then using 02, 03, and so on. Enough dump files are created to allow all processes specified by the current setting of the PARALLEL parameter to be active. If one of the dump files becomes full because its size has reached the maximum size specified by the FILESIZE parameter, it is closed, and a new dump file (with a new generated name) is created to take its place.

 

From

http://download.oracle.com/docs/cd/B19306_01/server.102/b14215/dp_overview.htm#sthref44

 

 

If we run the following command:

expdp full=y directory=dump dumpfile=orcl_%U.dmp parallel=4

 

expdp will then create four dump files for the parallel workers: ORCL_01.DMP, ORCL_02.DMP, ORCL_03.DMP and ORCL_04.DMP, one file per process. The file sizes will differ from process to process; one file can end up very large while another stays small. The fix is the FILESIZE parameter, which caps the size of each file: once a file reaches the limit, a new file is created.

 

For example:

expdp full=y directory=dump dumpfile=orcl_%U.dmp parallel=4 filesize=50M

 

 

The number of dump files matters on import just as it does on export: PARALLEL should not exceed the number of dump files. If it does, the extra processes have no file to work on and contribute nothing to performance.

 

As a rule of thumb, set PARALLEL to the number of CPUs, and keep it no larger than the number of dump files.
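A matching import sketch, assuming the same directory object and the dump file set produced above:

impdp full=y directory=dump dumpfile=orcl_%U.dmp parallel=4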

 

To check the number of CPUs:

SQL> show parameter cpu

 

 

Notes:

1) The import may seem stuck at some point, for example sitting for ten-odd minutes while building an index. Do not interrupt the process; it usually just means there is a lot of data to work through.

 

Check the tablespace usage at intervals: if it keeps changing, the import is still making progress, and all that is needed is patience.

 

The following SQL can be used to check tablespace usage:

/* Formatted on 2010/12/23 13:14:13 (QP5 v5.115.810.9015) */
SELECT   a.tablespace_name,
         ROUND (a.total_size) "total_size(MB)",
         ROUND (a.total_size) - ROUND (b.free_size, 3) "used_size(MB)",
         ROUND (b.free_size, 3) "free_size(MB)",
         ROUND (b.free_size / total_size * 100, 2) || '%' free_rate
  FROM   (  SELECT   tablespace_name, SUM (bytes) / 1024 / 1024 total_size
              FROM   dba_data_files
          GROUP BY   tablespace_name) a,
         (  SELECT   tablespace_name, SUM (bytes) / 1024 / 1024 free_size
              FROM   dba_free_space
          GROUP BY   tablespace_name) b
 WHERE   a.tablespace_name = b.tablespace_name(+);
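Besides watching tablespace growth, a Data Pump job can also be monitored directly; a hedged sketch of two useful queries (the second one also works for index builds during a classic imp):

SELECT owner_name, job_name, state, degree FROM dba_datapump_jobs;

SELECT sid, opname, target, sofar, totalwork
  FROM v$session_longops
 WHERE sofar <> totalwork;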

 

 

2) During an export or import, avoid running the backup/restore command interactively from an ssh session on a client machine: if the connection drops, the job dies with it. Instead, put the backup script in crontab so that it runs on the server; then even if the ssh session is interrupted, the backup or restore is unaffected.

 

 
