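1. Background

Moving from commercial databases to open-source ones is the current trend, and I am no exception: in between my regular work I set aside some time to study the steps for going from Oracle to PostgreSQL (pg).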

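2. Problems

Moving from Oracle to pg means solving a series of problems, such as:

Which pg architecture can provide the same capabilities as an Oracle RAC deployment?
How should Oracle SQL and PL/SQL code be rewritten?
How do Oracle data types map to pg data types?
How can Oracle data be migrated to pg?

Beyond these there are surely many other problems, but in this write-up I focus on the last one: how to migrate Oracle data to pg.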

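3. Research and Analysis

3.1 Breaking Down the Problem

Extract the Oracle table structure
Rebuild the structure in pg
Extract the Oracle data
Convert it to pg data type formats
Insert it into pg

In my later experiments, ordinary data read from Oracle basically needed no conversion and could be written straight into pg with INSERT or COPY.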
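One case where I would not count on "no conversion" is LOB columns: with cx_Oracle, CLOB and BLOB values are normally fetched as LOB objects rather than str/bytes, and their content has to be read (LOB.read()) before psycopg2 can bind it. A minimal sketch, assuming that any fetched value with a callable read() method is such a LOB; normalize_value, normalize_row and FakeLob are hypothetical names used only for illustration.

```python
def normalize_value(value):
    """Return a plain Python value that psycopg2 can bind as a parameter."""
    read = getattr(value, "read", None)
    if callable(read):
        # Assumed to be a LOB locator (e.g. cx_Oracle.LOB): fetch its full content.
        return read()
    return value


def normalize_row(row):
    """Apply normalize_value to every column of one fetched row."""
    return tuple(normalize_value(v) for v in row)


if __name__ == "__main__":
    class FakeLob:
        """Stand-in for a LOB object, for demonstration only."""
        def __init__(self, data):
            self._data = data

        def read(self):
            return self._data

    print(normalize_row((1, "abc", FakeLob("long text"), None)))
    # (1, 'abc', 'long text', None)
```

Rows passed through normalize_row can then go to execute_values or into a CSV file unchanged.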

3.2 Tool Survey

ora2pg
pgloader
Write my own in Python

To deepen my understanding, I decided that writing one myself in Python would be better, so I chose the third option.

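3.3 Preparation

Writing the migration program in Python requires the following packages:

cx_Oracle
psycopg2 (installation details omitted: I first installed PostgreSQL on my machine and then ran pip install psycopg2-binary, which installed cleanly with no errors)
csv (part of the Python standard library)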

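3.4 Key Programming Points

3.4.1 Key psycopg2 APIs

psycopg2 offers several efficient APIs for loading data into pg, for example:

execute_values

execute_values(cursor, sql, values)
cursor: as the name says, a cursor created from the pg connection.
sql: the statement to run. Its special feature is that a single placeholder stands for all of the row values, which is extremely handy, e.g. insert into table1 values %s
values: a sequence of rows (e.g. a two-dimensional list), where each inner sequence is one record, so many records are inserted in one batch. To insert just one record, still wrap it in an outer list, e.g. [[1,'a','b']]. Rows are sent in pages of page_size rows per statement (default 100).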
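Code example:
import psycopg2.extras
pgcur = pgconn.cursor()  # pgconn: an open psycopg2 connection
sql = 'insert into ' + pgTable + ' values %s'
values = [[1, 'a', 'b'], [2, 'c', 'd']]
psycopg2.extras.execute_values(pgcur, sql, values)  # the first argument is a cursor, not the connection
pgconn.commit()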
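To make the "one %s stands for all rows" idea concrete: conceptually, execute_values turns that single placeholder into one parenthesized group per row. The hypothetical helper below builds an equivalent parameterized statement for a plain cursor.execute(); it only illustrates the idea and is not psycopg2's implementation, which merges the values client-side with mogrify and pages them by page_size.

```python
def build_multirow_insert(table, rows):
    """Build 'INSERT INTO table VALUES (%s,...),(%s,...)' plus a flat parameter list.

    table is concatenated into the SQL text, so it must come from trusted code,
    never from user input. All rows must have the same number of columns.
    """
    if not rows:
        raise ValueError("rows must not be empty")
    width = len(rows[0])
    if any(len(r) != width for r in rows):
        raise ValueError("all rows must have the same number of columns")
    # One "(%s,%s,...)" group per row, joined by commas
    group = "(" + ",".join(["%s"] * width) + ")"
    sql = "INSERT INTO " + table + " VALUES " + ",".join([group] * len(rows))
    # Flatten the rows so the parameters line up with the placeholders
    params = [value for row in rows for value in row]
    return sql, params


if __name__ == "__main__":
    sql, params = build_multirow_insert("table1", [[1, "a", "b"], [2, "c", "d"]])
    print(sql)     # INSERT INTO table1 VALUES (%s,%s,%s),(%s,%s,%s)
    print(params)  # [1, 'a', 'b', 2, 'c', 'd']
```

With a plain psycopg2 cursor, cur.execute(sql, params) would insert both rows in a single statement; execute_values does the equivalent work for you and also splits long row lists into pages.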

copy_from

copy_from(file, table, sep='\t', null='\\N', size=8192, columns=None)
Read data from the file-like object file appending them to the table named table.

Parameters:
file – file-like object to read data from. It must have both read() and readline() methods.
table – name of the table to copy data into.
sep – columns separator expected in the file. Defaults to a tab.
null – textual representation of NULL in the file. The default is the two characters string \N.
size – size of the buffer used to read from the file.
columns – iterable with name of the columns to import. The length and types should match the content of the file to read. If not specified, it is assumed that the entire table matches the file structure.
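Example:
    def copyDataFrom(self, tabname, filepath):
        try:
            with open(filepath, 'r') as file:  # the file is closed automatically
                print('Start to COPY....')
                self.pgCur.copy_from(file, tabname, sep=',', null='')
            self.pgConn.commit()
            print('copy successful!')
        except Exception as e:
            self.pgConn.rollback()  # a failed COPY leaves the transaction aborted
            print('copy failed, cause by %s %s' % ('\n', e))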
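Two parameters deserve special mention. The first is sep, the column separator: it defaults to a tab, so for a comma-separated file you must set sep=','.
The second is null, the string your file uses to represent NULL: the default is the two-character string \N. My file writes NULL as an empty field, so I use null=''.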
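One caveat when the file comes from csv.writer (section 3.4.2): copy_from uses COPY's text format, which knows nothing about CSV quoting, so a value containing the separator or a double quote (which csv.writer wraps in quotes) gets split into extra columns or loaded with the quote characters intact. A sketch of an alternative using psycopg2's cursor.copy_expert() with COPY ... FROM STDIN WITH (FORMAT csv), which does parse CSV quoting. copy_rows_csv is a hypothetical helper, buffering everything in io.StringIO is a simplification (for big tables you would pass the CSV file object instead), and FakeCursor only stands in for a real psycopg2 cursor.

```python
import csv
import io


def copy_rows_csv(cursor, table, rows, columns=None):
    """Load rows into table through COPY ... FROM STDIN in CSV format.

    cursor is assumed to be a psycopg2 cursor (anything with copy_expert).
    table and columns are concatenated into the SQL, so they must be trusted.
    """
    buf = io.StringIO()
    # csv.writer writes None as an empty unquoted field, which COPY's CSV
    # format reads back as NULL by default.
    csv.writer(buf).writerows(rows)
    buf.seek(0)
    col_list = " (" + ", ".join(columns) + ")" if columns else ""
    sql = "COPY " + table + col_list + " FROM STDIN WITH (FORMAT csv)"
    cursor.copy_expert(sql, buf)


if __name__ == "__main__":
    class FakeCursor:
        """Records what would be sent to the server instead of sending it."""
        def copy_expert(self, sql, file):
            self.sql = sql
            self.data = file.read()

    cur = FakeCursor()
    copy_rows_csv(cur, "tabtest", [(1, "a,b", None), (2, 'say "hi"', "x")])
    print(cur.sql)  # COPY tabtest FROM STDIN WITH (FORMAT csv)
    print(repr(cur.data))
```

Since Oracle itself treats an empty string as NULL, mapping empty CSV fields back to NULL should not lose information for data coming from Oracle.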

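3.4.2 Using the csv Library

The csv module makes it very easy to produce CSV files. Here it is used to save the Oracle data to a CSV file, which is then loaded into pg with COPY. Example code: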

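import csv
with open('/work/data/tabtest.csv', 'w', newline='') as file:  # newline='' as the csv docs recommend (Python 3)
    csvWriter = csv.writer(file, dialect='excel')
    csvWriter.writerows(rows)  # rows: e.g. the list returned by an Oracle fetchall()
Note: writerows writes a two-dimensional array (a list of rows) to the CSV file,
while writerow writes a one-dimensional array (a single row).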
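For a large table, the rows variable above usually comes from fetchall(), which holds the entire result set in memory (the commented-out main block in the source reads 5,000,000 rows this way). A sketch of a streaming alternative, assuming any DB-API 2.0 cursor on which execute() has already been called (a cx_Oracle cursor qualifies); export_cursor_to_csv and the default batch size of 10,000 are my own illustrative choices, and FakeCursor only exists so the example runs without a database.

```python
import csv


def export_cursor_to_csv(cursor, path, batch_size=10000):
    """Write all remaining rows of an executed DB-API cursor to a CSV file.

    Rows are pulled with fetchmany(), so memory use is bounded by batch_size
    rather than by the size of the whole table. Returns the row count.
    """
    total = 0
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        while True:
            rows = cursor.fetchmany(batch_size)
            if not rows:  # an empty batch means the result set is exhausted
                break
            writer.writerows(rows)
            total += len(rows)
    return total


if __name__ == "__main__":
    import os
    import tempfile

    class FakeCursor:
        """Minimal stand-in that hands out rows in chunks, like fetchmany()."""
        def __init__(self, rows):
            self._rows = list(rows)

        def fetchmany(self, size):
            chunk, self._rows = self._rows[:size], self._rows[size:]
            return chunk

    path = os.path.join(tempfile.mkdtemp(), "tabtest.csv")
    rows = [(i, "row%d" % i) for i in range(5)]
    print(export_cursor_to_csv(FakeCursor(rows), path, batch_size=2))  # 5
```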

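4. Conclusions

Speed: copy_from > execute_values > execute. In my tests, inserting 10,000 rows one at a time with execute took about 78 seconds, versus about 4 seconds with execute_values.
Simple Oracle-to-pg data type mapping:

NUMBER -> NUMERIC
VARCHAR2, NVARCHAR2 -> VARCHAR
DATE, TIMESTAMP -> TIMESTAMP WITHOUT TIME ZONE
BLOB -> BYTEA
CLOB -> TEXT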

For concrete performance numbers, run your own load tests.
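The type mapping above can be applied mechanically to the rows that the genTable method in the source code reads from dba_tab_columns (column_id, column_name, data_type, data_length, data_precision, data_scale). A sketch under my own assumptions: pg_column_type and build_create_table are hypothetical names, NUMBER without a declared precision becomes unconstrained NUMERIC, and any type outside the list raises ValueError instead of being guessed.

```python
import re


def pg_column_type(data_type, data_length, data_precision, data_scale):
    """Map one Oracle column (as described by dba_tab_columns) to a PostgreSQL type."""
    if data_type == "NUMBER":
        if data_precision is None:
            return "NUMERIC"  # NUMBER without a declared precision
        return "NUMERIC(%d,%d)" % (data_precision, data_scale or 0)
    if data_type in ("VARCHAR2", "NVARCHAR2"):
        # data_length is in bytes; dba_tab_columns.CHAR_LENGTH (characters) may fit better
        return "VARCHAR(%d)" % data_length
    if data_type == "DATE" or re.fullmatch(r"TIMESTAMP(\(\d\))?", data_type):
        # Oracle DATE also stores a time of day, hence TIMESTAMP rather than DATE
        return "TIMESTAMP WITHOUT TIME ZONE"
    if data_type == "BLOB":
        return "BYTEA"
    if data_type in ("CLOB", "NCLOB"):
        return "TEXT"
    raise ValueError("no mapping for Oracle type %r" % data_type)


def build_create_table(table, columns):
    """Build CREATE TABLE from rows of
    (column_id, column_name, data_type, data_length, data_precision, data_scale)."""
    defs = [
        "%s %s" % (name, pg_column_type(dtype, length, precision, scale))
        for _col_id, name, dtype, length, precision, scale in columns
    ]
    return "CREATE TABLE %s (%s)" % (table, ", ".join(defs))


if __name__ == "__main__":
    cols = [
        (1, "ID", "NUMBER", 22, 10, 0),
        (2, "AMOUNT", "NUMBER", 22, None, None),
        (3, "NAME", "VARCHAR2", 100, None, None),
        (4, "CREATED", "DATE", 7, None, None),
        (5, "PHOTO", "BLOB", 4000, None, None),
    ]
    print(build_create_table("tabtest", cols))
    # CREATE TABLE tabtest (ID NUMERIC(10,0), AMOUNT NUMERIC, NAME VARCHAR(100),
    #   CREATED TIMESTAMP WITHOUT TIME ZONE, PHOTO BYTEA)   (printed on one line)
```

Because the generated identifiers are unquoted, PostgreSQL stores the table and column names in lower case.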

5. Source Code

Since this was only for personal research and practice, please overlook the code quality.

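# -*- coding: utf-8 -*-
import os
import csv
import datetime

import psycopg2 as pg2
import psycopg2.extras as pg2extra
import cx_Oracle as oradb

# Set the Oracle client character set so that Chinese text read from the
# database displays correctly; it must be set before connecting to Oracle.
os.environ['NLS_LANG'] = 'SIMPLIFIED CHINESE_CHINA.UTF8'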

class pg(object):
    def __init__(self, pghost, pgport, pgdatabase, pguser, pgpassword):
        try:
            self.pgConn = pg2.connect(host=pghost, port=pgport, database=pgdatabase,
                                      user=pguser, password=pgpassword)
            print('connect %s and %s successful!' % (pghost, pgdatabase))
            self.pgCur = self.pgConn.cursor()
        except Exception as e:
            print('connect failed, cause by: %s %s' % ('\n', e))
            raise  # without a connection the object is unusable

    def readAll(self, pgTable):
        sql = 'select * from ' + pgTable + ' order by 1'
        self.pgCur.execute(sql)
        pgSet = self.pgCur.fetchall()
        self.output(pgSet)

    # Plain row-by-row insert, committed once at the end: about 78 s for 10,000 rows.
    # Note that the column-count query below also runs once per row.
    def fullInsert(self, pgTable, values):
        # Unquoted table names are stored in lower case by PostgreSQL
        self.pgCur.execute('select count(*) from information_schema.columns '
                           'where table_name = lower(%s)', (pgTable,))
        cols = int(self.pgCur.fetchone()[0])
        # One %s placeholder per column, e.g. "%s,%s,%s"
        parameters = ','.join(['%s'] * cols)
        sql = 'insert into ' + pgTable + ' values (' + parameters + ')'
        self.pgCur.execute(sql, values)

    # Batch insert with psycopg2.extras.execute_values: about 4 s for 10,000 rows
    def fullInsert2(self, pgTable, values):
        sql = 'insert into ' + pgTable + ' values %s'
        try:
            pg2extra.execute_values(self.pgCur, sql, values)
        except Exception as e:
            print(e)
            self.pgConn.rollback()  # keep the connection usable after a failure
            raise

    # Bulk-load a comma-separated file with COPY (cursor.copy_from)
    def copyDataFrom(self, tabname, filepath):
        try:
            with open(filepath, 'r') as file:
                print('Start to COPY....')
                self.pgCur.copy_from(file, tabname, sep=',', null='')
            self.pgConn.commit()
            print('copy successful!')
        except Exception as e:
            self.pgConn.rollback()
            print('copy failed, cause by %s %s' % ('\n', e))

    def execDDL(self, sql):
        self.pgCur.execute(sql)
        self.pgConn.commit()

    def output(self, pgset):
        # Print each row on one line, fields separated by spaces
        for rows in pgset:
            print(' '.join(str(field) for field in rows))

    def commit(self):
        self.pgConn.commit()


class oracle(object):
    def __init__(self, orahost, oraport, oradatabase, orauser, orapassword):
        try:
            # Easy Connect string: user/password@host:port/service_name
            connectString = (orauser + '/' + orapassword + '@' + orahost + ':'
                             + str(oraport) + '/' + oradatabase)
            self.oraConn = oradb.connect(connectString, threaded=True)
            print('connect %s and %s successful!' % (orahost, oradatabase))
            self.oraCur = self.oraConn.cursor()
            print('cursor open!')
        except Exception as e:
            print('connect failed, cause by: %s %s' % ('\n', e))
            raise

    # Read rows from an Oracle table; rownum is 'ALL' or a row limit.
    # ROWNUM is applied before ORDER BY, so a limited read returns an
    # arbitrary set of N rows, which is then sorted.
    def readAll(self, oraTable, rownum):
        if rownum == 'ALL':
            sql = 'select * from ' + oraTable
        else:
            sql = 'select * from ' + oraTable + ' where rownum <= ' + str(rownum) + ' order by 1'
        print('start read...')
        self.oraCur.execute(sql)
        print('start fetch')
        oraSet = self.oraCur.fetchall()  # loads the whole result set into memory
        return oraSet

    # Unfinished: for now this is identical to readAll
    def exportCsv(self, oraTable, rownum):
        if rownum == 'ALL':
            sql = 'select * from ' + oraTable
        else:
            sql = 'select * from ' + oraTable + ' where rownum <= ' + str(rownum) + ' order by 1'
        print('start read...')
        self.oraCur.execute(sql)
        print('start fetch')
        oraSet = self.oraCur.fetchall()
        return oraSet

    # Read the Oracle table structure (needs SELECT access to dba_tab_columns)
    def genTable(self, owner, tablename):
        sql = '''
        SELECT column_id, column_name, data_type, data_length, data_precision, data_scale
        FROM dba_tab_columns
        WHERE owner = :1 AND table_name = :2
        ORDER BY column_id'''
        self.oraCur.execute(sql, (owner, tablename))
        rows = self.oraCur.fetchall()
        return rows

class oracle2pg(object):
    def __init__(self):
        pass

    # Convert an Oracle table structure (rows from oracle.genTable) into a
    # PostgreSQL CREATE TABLE statement and run it on targetDB.
    def migrateSturct(self, tabstruct, targetDB, targetTable):
        pgstruct = []
        pgnewstru = []
        createTable = 'create table ' + targetTable + '('
        # Each row: (column_id, column_name, data_type, data_length, data_precision, data_scale)
        for i in tabstruct:
            i = list(i)
            if i[2] == 'NUMBER':
                i[2] = 'NUMERIC'
            elif i[2] in ('VARCHAR2', 'NVARCHAR2', 'CHAR2'):
                i[2] = 'VARCHAR'
            elif i[2] in ('DATE', 'TIMESTAMP(6)'):
                i[2] = 'TIMESTAMP WITHOUT TIME ZONE'
            pgstruct.append(i)
        for i in pgstruct:
            if i[2] == 'NUMERIC':
                if i[4] is None:
                    # NUMBER declared without a precision: unconstrained NUMERIC
                    row = i[1] + ' ' + i[2] + ','
                else:
                    row = i[1] + ' ' + i[2] + '(' + str(i[4]) + ',' + str(i[5] or 0) + '),'
            elif i[2] == 'VARCHAR':
                row = i[1] + ' ' + i[2] + '(' + str(i[3]) + '),'
            elif i[2] == 'TIMESTAMP WITHOUT TIME ZONE':
                row = i[1] + ' ' + i[2] + ','
            else:
                # No mapping for this type yet: stop rather than emit a broken column
                raise ValueError('unmapped Oracle type %s for column %s' % (i[2], i[1]))
            pgnewstru.append(row)
        for i in pgnewstru:
            createTable = createTable + i
        createTable = createTable[:-1] + ')'  # drop the trailing comma
        print(createTable)
        try:
            targetDB.execDDL(createTable)
            print('create table successful!')
        except Exception as e:
            print('Failed as %s %s' % ('\n', e))

    # Insert row by row
    def migraterows(self, srows, targetDB, targetTable):
        for row in srows:
            targetDB.fullInsert(targetTable, row)
        targetDB.commit()

    # Batch insert
    def migraterows2(self, srows, targetDB, targetTable):
        targetDB.fullInsert2(targetTable, srows)
        targetDB.commit()



if __name__ == '__main__':
    oraowner = 'TEST'
    tabname = 'ORDER'
    pg1 = pg(pghost='192.168.0.1', pgport=5432, pgdatabase='test', pguser='pguser', pgpassword='password')
    ora1 = oracle(orahost='192.168.0.2', oraport=1521, oradatabase='test', orauser='orauser', orapassword='password')
    # ora2pg1 = oracle2pg()
    Start_time = datetime.datetime.now()
    # Step 1 (commented out): read from Oracle and dump the rows to a CSV file
    # rows = ora1.readAll(oraowner + '.' + tabname, '5000000')
    # print('read completed, begin write to csv')
    # with open('/work/data/so_master_new.csv', 'w') as file:
    #     csvWriter = csv.writer(file, dialect='excel')
    #     csvWriter.writerows(rows)
    # Step 2 (commented out): create the pg table from the Oracle structure
    # orastru = ora1.genTable(oraowner, tabname)
    # ora2pg1.migrateSturct(orastru, pg1, tabname)
    # Step 3: load the CSV file into pg with COPY
    pg1.copyDataFrom(tabname, '/work/data/tabtest.csv')

    End_time = datetime.datetime.now()
    during_time = End_time - Start_time
    print(during_time)

2019-11-12 KK Diary: A Summary of Doing ora2pg Work with Python
