2019-11-12 kk diary: a short summary of doing ora2pg-style work in Python

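1. Background

Moving from commercial databases to open-source ones is the current trend, and I am no exception. I set aside some time at work to study the steps for migrating from Oracle to PostgreSQL (pg).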

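2. Problem

Migrating from Oracle to pg raises a series of questions, such as:

  • Which pg architecture can provide the equivalent of an Oracle RAC deployment?
  • How should the SQL and PL/SQL code in Oracle be rewritten?
  • How do Oracle data types map to pg data types?
  • How can the data in Oracle be migrated to pg?

There are surely many more questions beyond these, but this note focuses on the last one: how to migrate Oracle data to pg.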

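3. Research and analysis

3.1 Breaking the problem down

Extract the table structure from Oracle

Rebuild the structure in pg

Extract the data from Oracle

Convert it to pg data types and formats

Insert it into pg

In the experiments later on, ordinary data read from Oracle generally needed no conversion and could be written straight back with insert or copy.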
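
The extract and insert steps above can be sketched with any DB-API cursor. The following is only an illustration under stated assumptions: sqlite3 stands in for both Oracle and pg so the example runs without a database server, and the helper names iter_batches and copy_table are hypothetical, not part of the original program. It reads in batches with fetchmany rather than loading the whole table at once.

```python
import sqlite3


def iter_batches(cursor, batch_size=1000):
    """Yield lists of rows from an already executed DB-API cursor."""
    while True:
        rows = cursor.fetchmany(batch_size)
        if not rows:
            break
        yield rows


def copy_table(src_conn, dst_conn, table, batch_size=1000):
    """Copy every row of `table` from src_conn to dst_conn; return the row count."""
    src_cur = src_conn.cursor()
    src_cur.execute("SELECT * FROM " + table)  # table name is trusted in this sketch
    n_cols = len(src_cur.description)
    # sqlite3 uses '?' placeholders; psycopg2 would use '%s' instead
    insert_sql = "INSERT INTO %s VALUES (%s)" % (table, ",".join(["?"] * n_cols))
    total = 0
    for rows in iter_batches(src_cur, batch_size):
        dst_conn.executemany(insert_sql, rows)
        total += len(rows)
    dst_conn.commit()
    return total


def demo():
    """Create two in-memory databases and copy 25 rows between them."""
    src = sqlite3.connect(":memory:")
    dst = sqlite3.connect(":memory:")
    for conn in (src, dst):
        conn.execute("CREATE TABLE t (id INTEGER, name TEXT)")
    src.executemany("INSERT INTO t VALUES (?, ?)",
                    [(i, "n%d" % i) for i in range(25)])
    src.commit()
    copied = copy_table(src, dst, "t", batch_size=10)
    return copied, dst.execute("SELECT COUNT(*) FROM t").fetchone()[0]
```

Reading in batches keeps memory use bounded, which matters once a table holds millions of rows.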

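3.2 Tool survey

  • ora2pg
  • pgloader
  • write my own tool in Python

To deepen my understanding I felt it would be better to write one myself in Python, so I chose the third option.

3.3 Preparation

Writing a data migration program in Python requires the following packages:

  • cx_Oracle
  • psycopg2 (installation steps omitted; I installed postgresql on my machine first and then ran pip install psycopg2-binary, which installed cleanly with no errors)
  • csv (part of the standard library)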

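3.4 Key programming points

3.4.1 Key psycopg2 APIs explained

psycopg2 provides several efficient APIs for inserting data into pg, for example:

  • execute_values
execute_values(cursor,sql,values)
cursor: the cursor created from the pg connection.
sql: the SQL statement to run. Its distinctive feature is that a single %s placeholder stands for the whole list of value tuples, which is very convenient, for example: insert into table1 values %s
values: a sequence of rows (a two-dimensional array), all inserted as one batch; to insert a single record, pass a list containing one row.
Code example:
import psycopg2.extras
sql='insert into '+pgTable+' values %s'
values=[[1,'a','b'],[2,'c','d']]
# the first argument is a cursor (pgcur=pgconn.cursor()), not the connection
psycopg2.extras.execute_values(pgcur,sql,values)
pgconn.commit()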
  • copy_from
copy_from(file, table, sep='\t', null='\\N', size=8192, columns=None)
Read data from the file-like object file appending them to the table named table.

Parameters:	
file – file-like object to read data from. It must have both read() and readline() methods.
table – name of the table to copy data into.
sep – columns separator expected in the file. Defaults to a tab.
null – textual representation of NULL in the file. The default is the two characters string \N.
size – size of the buffer used to read from the file.
columns – iterable with name of the columns to import. The length and types should match the content of the file to read. If not specified, it is assumed that the entire table matches the file structure.
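Example:
    def copyDataFrom(self,tabname,filepath):
        try:
            # the with block closes the file even if COPY fails
            with open(filepath,'r') as file:
                print('Start to COPY....')
                self.pgCur.copy_from(file,tabname,sep=',',null='')
            self.pgConn.commit()
            print('copy successful!')
        except Exception as e:
            # roll back so the connection stays usable after a failed COPY
            self.pgConn.rollback()
            print('copy failed, cause by %s %s'%('\n',e))
Two parameters deserve special attention. sep is the column separator; the default is a tab, so for a comma-separated file pass sep=','.
null is the string that represents NULL in the file; the default is the two-character string \N. In my file NULL fields are simply empty, so I pass null=''.
Note that copy_from uses PostgreSQL's text COPY format rather than CSV, so fields quoted by the csv module (for example values that contain a comma) are not parsed as CSV.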
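
Since copy_from only needs a file-like object with read() and readline(), the data does not have to pass through a file on disk. Below is a minimal sketch, assuming the default tab separator and the default \N null marker; the helper names copy_escape and rows_to_copy_buffer are hypothetical. It escapes backslashes, tabs and newlines the way the text COPY format expects.

```python
import io


def copy_escape(value, null="\\N"):
    """Render one value for PostgreSQL's text COPY format (tab-separated)."""
    if value is None:
        return null
    text = str(value)
    # Escape the backslash first so the escapes added below are not doubled
    text = text.replace("\\", "\\\\")
    text = text.replace("\t", "\\t").replace("\n", "\\n").replace("\r", "\\r")
    return text


def rows_to_copy_buffer(rows):
    """Build an in-memory, tab-separated buffer that copy_from can read."""
    buf = io.StringIO()
    for row in rows:
        buf.write("\t".join(copy_escape(v) for v in row))
        buf.write("\n")
    buf.seek(0)  # rewind so the reader starts at the first row
    return buf


# Usage with a psycopg2 cursor (needs a live connection, so not run here):
#   cur.copy_from(rows_to_copy_buffer(rows), 'table1')
```

Because the defaults of copy_from already match this format, no sep or null argument is needed in the call.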

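3.4.2 Using the csv library

The csv library makes it easy to produce CSV output. Here it saves the Oracle data to a CSV file, which is then loaded into pg with the copy command. Sample code:

import csv
# rows is the list of tuples fetched from Oracle
file=open('/work/data/tabtest.csv','w')   # on Python 3, also pass newline=''
csvWriter=csv.writer(file,dialect='excel')
csvWriter.writerows(rows)
file.close()
Note: writerows writes a two-dimensional array (many rows) to the CSV file,
while writerow writes a one-dimensional array (a single row).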
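
When values may contain commas or quotes, one option is to let pg parse real CSV through psycopg2's copy_expert and a COPY ... WITH (FORMAT csv) statement. The sketch below is an illustration rather than the method used in this post; write_csv and load_csv are hypothetical helper names, and load_csv needs a live psycopg2 cursor. The csv module writes None as an empty field, which the CSV format of COPY reads back as NULL by default.

```python
import csv
import io


def write_csv(rows, fileobj):
    """Write rows as CSV; None becomes an empty, unquoted field."""
    writer = csv.writer(fileobj, dialect="excel", lineterminator="\n")
    writer.writerows(rows)


def load_csv(cursor, table, fileobj):
    """Load CSV text into `table` through a psycopg2 cursor."""
    # FORMAT csv makes the server honour CSV quoting;
    # unquoted empty fields are loaded as NULL
    cursor.copy_expert("COPY %s FROM STDIN WITH (FORMAT csv)" % table, fileobj)


# Build the CSV in memory to show the quoting that the csv module applies
example = io.StringIO()
write_csv([(1, "a,b", None), (2, 'say "hi"', "x")], example)
```

Here the value a,b is written inside double quotes, so the embedded comma is not mistaken for a column separator when the file is loaded as CSV.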

4. Conclusions

  • Speed: copy_from > execute_values > execute
  • Simple Oracle to pg data type mappings:
number -> numeric;
VARCHAR2,NVARCHAR2,NVARCHAR -> varchar;
date,timestamp -> TIMESTAMP WITHOUT time zone;
blob -> bytea;
clob -> text;

For concrete figures, run your own load tests.
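
The mapping above can be written as a small function. This is a sketch covering only the simple types listed here; the names oracle_to_pg_type and create_table_sql are hypothetical, and the handling of a NUMBER column without precision (plain numeric) is an assumption on my part. Unknown types raise an error so that nothing is converted silently.

```python
def oracle_to_pg_type(data_type, length=None, precision=None, scale=None):
    """Map one Oracle column type to a pg type string (simple types only)."""
    if data_type == "NUMBER":
        if precision is None:
            # NUMBER declared without precision: unconstrained numeric
            return "numeric"
        return "numeric(%d,%d)" % (precision, scale or 0)
    if data_type in ("VARCHAR2", "NVARCHAR2"):
        return "varchar(%d)" % length if length else "varchar"
    if data_type == "DATE" or (data_type.startswith("TIMESTAMP")
                               and "TIME ZONE" not in data_type):
        return "timestamp without time zone"
    if data_type == "BLOB":
        return "bytea"
    if data_type == "CLOB":
        return "text"
    raise ValueError("unsupported Oracle type: %s" % data_type)


def create_table_sql(table, columns):
    """Build CREATE TABLE from (name, type, length, precision, scale) tuples."""
    cols = ["%s %s" % (name, oracle_to_pg_type(dtype, length, precision, scale))
            for name, dtype, length, precision, scale in columns]
    return "create table %s (%s)" % (table, ", ".join(cols))
```

The column tuples follow the order of the fields that dba_tab_columns exposes (column_name, data_type, data_length, data_precision, data_scale).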

5. References

6. Source code

This was written only for personal study and practice, so please overlook the code quality.

# -*-coding=utf-8 -*-
import psycopg2 as pg2
import cx_Oracle as oradb
import psycopg2.extras as pg2extra
import os
import datetime
import csv

# Set the Oracle client character set so that Chinese text read from the database displays correctly
os.environ['NLS_LANG'] = 'SIMPLIFIED CHINESE_CHINA.UTF8' 

class pg(object) :
    def __init__(self,pghost,pgport,pgdatabase,pguser,pgpassword):
        try:
            self.pgConn=pg2.connect(host=pghost,port=pgport,database=pgdatabase,user=pguser,password=pgpassword)
            print('connect %s and %s successful!' %(pghost,pgdatabase))
            self.pgCur=self.pgConn.cursor()
        except Exception as e:
            print('connect failed, cause by: %s %s' %('\n',e))
            # re-raise: without a connection the other methods cannot work
            raise

    def readAll(self,pgTable):
        sql='select * from '+pgTable+ ' order by 1 '
        #print sql
        self.pgCur.execute(sql)
        pgSet=self.pgCur.fetchall()
        #print pgSet
        self.output(pgSet)


    
    # Plain row-by-row insert, committed once at the end; about 78 seconds for 10,000 rows
    def fullInsert(self,pgTable,values):
        # count the table's columns; note that this lookup runs once per inserted row
        self.pgCur.execute('select count(*) from information_schema.columns where table_name=%s',(pgTable,) )
        cols=int(self.pgCur.fetchall()[0][0])
        # build one %s placeholder per column
        parameters=','.join(['%s']*cols)
        #print parameters
        sql='insert into '+pgTable+' values ('+parameters+')'
        #print sql
        self.pgCur.execute(sql,values)
        #self.pgConn.commit()

     
    # Batch insert with psycopg2.extras.execute_values; about 4 seconds for 10,000 rows

    def fullInsert2(self,pgTable,values):       
        sql='insert into '+pgTable+' values %s'
        try:
            pg2extra.execute_values(self.pgCur,sql,values)
        except Exception as e:
            print(e)

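    def copyDataFrom(self,tabname,filepath):
        try:
            # the with block closes the file even if COPY fails
            with open(filepath,'r') as file:
                print('Start to COPY....')
                self.pgCur.copy_from(file,tabname,sep=',',null='')
            self.pgConn.commit()
            print('copy successful!')
        except Exception as e:
            # roll back so the connection stays usable after a failed COPY
            self.pgConn.rollback()
            print('copy failed, cause by %s %s'%('\n',e))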

    def execDDL(self,sql):
        self.pgCur.execute(sql)
        self.pgConn.commit()

    def output(self,pgset):
        for rows in pgset :
            # print the fields of one row on a single line, separated by spaces
            print(' '.join(['%s' % (field,) for field in rows]))

    def commit(self):
        self.pgConn.commit()


class oracle(object):
    def __init__(self,orahost,oraport,oradatabase,orauser,orapassword):
        try:
            connectString=orauser+'/'+orapassword+'@'+orahost+':'+str(oraport)+'/'+oradatabase
            # the connect string contains the password, so it is not printed
            self.oraConn=oradb.connect(connectString,threaded=True)
            print('connect %s and %s successful!' %(orahost,oradatabase))
            self.oraCur=self.oraConn.cursor()
            print('cursor open!')
        except Exception as e:
            print('connect failed, cause by: %s %s' %('\n',e))
            # re-raise: without a connection the other methods cannot work
            raise

    def readAll(self,oraTable,rownum):
        if rownum=='ALL':
            sql='select * from '+oraTable
        else:
            # rownum is applied before the sort: this takes the first N rows Oracle returns, then orders them
            sql='select * from '+oraTable+ ' where rownum<='+str(rownum)+' order by 1 '
        #print sql
        print('start read...')
        self.oraCur.execute(sql)
        print('start fetch')
        # fetchall loads the whole result set into memory
        oraSet=self.oraCur.fetchall()
        return oraSet
        #self.output(oraSet)
    # Unfinished: for now this returns the same rows as readAll and does not write a CSV file
    def exportCsv(self,oraTable,rownum):
        return self.readAll(oraTable,rownum)
    
    # Read the column definitions of an Oracle table from dba_tab_columns
    def genTable(self,owner,tablename):
        sql='''
        SELECT COLUMN_id ,column_name,data_type,data_length,data_precision,data_scale 
        from dba_tab_columns 
        where owner=:1 and table_name=:2
        order BY COLUMN_ID'''
        self.oraCur.execute(sql,(owner,tablename))
        rows=self.oraCur.fetchall()
        return rows

class oracle2pg(object):
    def __init__(self):
        pass
    # Convert an Oracle table structure into a pg CREATE TABLE statement and run it
    def migrateSturct(self,tabstruct,targetDB,targetTable):
        pgnewstru=[]
        for i in tabstruct:
            # i = (column_id, column_name, data_type, data_length, data_precision, data_scale)
            if i[2]=='NUMBER':
                if i[4] is None:
                    # NUMBER declared without precision: use unconstrained NUMERIC
                    coltype='NUMERIC'
                else:
                    coltype='NUMERIC('+str(i[4])+','+str(i[5] or 0)+')'
            # 'CHAR' is assumed here; Oracle has no CHAR2 type
            elif i[2] in ('VARCHAR2','NVARCHAR2','CHAR'):
                coltype='VARCHAR('+str(i[3])+')'
            elif i[2] in ('DATE','TIMESTAMP(6)'):
                coltype='TIMESTAMP WITHOUT TIME ZONE'
            elif i[2]=='BLOB':
                coltype='BYTEA'
            elif i[2]=='CLOB':
                coltype='TEXT'
            else:
                # fail loudly instead of silently emitting a wrong column
                raise ValueError('unsupported Oracle type: %s'%i[2])
            pgnewstru.append(i[1]+' '+coltype)
        createTable='create table '+targetTable+'('+','.join(pgnewstru)+')'
        print(createTable)
        try:
            targetDB.execDDL(createTable)
            print('create table successful!')
        except Exception as e:
            print('Failed as %s %s'%('\n',e))
        #print createTable

    '''
    Insert row by row
    '''
    def migraterows(self,srows,targetDB,targetTable):
        for row in srows:
            targetDB.fullInsert(targetTable,row)
        targetDB.commit()

    '''
    Batch insert
    ''' 
    def migraterows2(self,srows,targetDB,targetTable):
        targetDB.fullInsert2(targetTable,srows)
        targetDB.commit()        



if __name__=='__main__':
   
    oraowner='TEST'
    tabname='ORDER'
    pg1=pg(pghost='192.168.0.1',pgport=5432,pgdatabase='test',pguser='pguser',pgpassword='password')
    ora1=oracle(orahost='192.168.0.2',oraport=1521,oradatabase='test',orauser='orauser',orapassword='password')
    #ora2pg1=oracle2pg()
    Start_time=datetime.datetime.now()
    #rows=ora1.readAll(oraowner+'.'+tabname,'5000000')
    #print 'read completed ,begin write to csv'
    #file=open('/work/data/so_master_new.csv','w')
    #csvWriter=csv.writer(file,dialect='excel')
    #csvWriter.writerows(rows)
    #file.close()
    #print rows
    #orastru=ora1.genTable(oraowner,tabname)
    #ora2pg1.migrateSturct(orastru,pg1,tabname)
    pg1.copyDataFrom(tabname,'/work/data/tabtest.csv')

    End_time=datetime.datetime.now()
    during_time=End_time-Start_time
    # elapsed wall-clock time for the run
    print(during_time)




