Basic Database Operations in Python

Original

2018-11-04 12:11

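Recently a project lost some important data because of a version problem: the data was never stored in the database, so it had to be counted from the production failure logs, written out, and backfilled into the database. I wasn't very fluent in Python at the time, so I started on the server with grep, awk, and similar commands to extract a formatted data file. The data contained duplicates, so handling it by hand was clearly unrealistic. I then thought of reading the file and processing the data in Java, but the code turned out too cumbersome and was a poor fit for log data processing. At my leader's suggestion I switched to Python. The resulting code was very concise and showed me how efficient Python is for data processing. This post shares the code as a small summary, or a first attempt.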

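The Python topics used in this data processing task are:
1. Pandas for structuring the data, then grouping it and taking the maximum value;
2. the pymysql module for working with a MySQL database;
3. lambda expressions, and map applied to iterable data structures.
Note: for the pandas groupby and apply operations, see the reference article.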
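
As a rough illustration of points 1 and 3, the sketch below builds a small DataFrame and keeps, for each userTel, the row with the latest attendTime. The phone numbers and timestamps are invented for the example, and the helper name latest_per_user is hypothetical; only the column names are taken from the script that follows. It uses idxmax rather than the apply/lambda filter in the script, so when two rows tie for the maximum it keeps one row instead of both.

```python
import pandas as pd


def latest_per_user(df):
    """Return one row per userTel: the row holding the largest attendTime."""
    # idxmax gives, for each group, the index label of the maximum value
    idx = df.groupby('userTel')['attendTime'].idxmax()
    return df.loc[idx].reset_index(drop=True)


# Invented sample data (not taken from the real log)
sample = pd.DataFrame({
    'userTel': ['13800000001', '13800000001', '13800000002'],
    'attendTime': ['2018-03-23 08:59:00', '2018-03-23 18:59:00',
                   '2018-03-23 09:01:00'],
})
# Parse the strings so the comparison is on timestamps rather than text
sample['attendTime'] = pd.to_datetime(sample['attendTime'])

latest = latest_per_user(sample)
# map + lambda turn each phone number into a quoted string
quoted = list(map(lambda tel: "'%s'" % tel, latest['userTel']))
print(latest)
print(quoted)
```

Whether idxmax or the apply-based filter is the better fit depends on whether tied rows should all be kept.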

# -*- coding: utf-8 -*-
"""
Created on Wed Apr  4 19:45:34 2018
Processing MySQL data with pymysql

"""
import pymysql as mysql
import pandas as pd

# Read the data; two lists hold the two columns read from the file
userIds = []
times = []
with open('1.file/attend_fail_2018-03-23-18-59.log', 'r', encoding='utf-8') as fi:
    for line in fi:
        # Skip blank lines (rstrip() leaves an empty string for them)
        if line.rstrip():
            temLineArray = line.split('     ')
            userIds.append(temLineArray[0].strip())
            # strip() also drops the trailing newline left on the last column
            times.append(temLineArray[1].strip())

# Convert the data to a DataFrame
attendPd = pd.DataFrame({
    'userTel': userIds,
    'attendTime': times
})

# For each userTel keep the row(s) with the largest attendTime, forming a new DataFrame
resultPd = attendPd.groupby('userTel').apply(lambda rowData: rowData[rowData.attendTime == rowData.attendTime.max()])
# Add a new column with an initial value
resultPd['userId'] = 0

# Database operations
db_config = {
  'host': 'localhost',
  'port': 3306,
  'user': 'root',
  'passwd': 'root',
  'db' : 'test',
  'charset': 'utf8'}

conn = mysql.connect(**db_config)

# Get a cursor, the handle used to run statements on the connection
cur = conn.cursor()
userTels = list(map(str, resultPd['userTel']))
# Build one %s placeholder per phone number and let pymysql escape the values,
# instead of splicing quoted strings into the SQL text
placeholders = ', '.join(['%s'] * len(userTels))
sql = "select a.employee_id, a.tel from t_test_info a where a.tel IN (%s)" % placeholders
fo = open('1.file/result_2018_04_04.sql', 'w')
try:
    cur.execute(sql, userTels)
    resDb = cur.fetchall()
    for line in resDb:
        for index, row in resultPd.iterrows():
            if row['userTel'] == line[1]:
                # resultPd['userId'][index] = line[0]  # set 'userId' in the DataFrame
                # print("({}, {})".format(line[0], row['attendTime']))
                # Write the computed result to the file
                fo.write("({}, {}),\n".format(line[0], row['attendTime']))
    conn.commit()
except Exception:
    import traceback
    traceback.print_exc()
    conn.rollback()
finally:
    cur.close()
    conn.close()
    fo.close()
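
The nested loop above compares every database row with every DataFrame row. One possible alternative, sketched here under assumptions, is to load the query result into a dict keyed by phone number and look each number up once. To stay runnable without a MySQL server, the sketch uses the standard-library sqlite3 module as a stand-in; sqlite3 takes `?` placeholders where pymysql takes `%s`. The table rows, phone numbers, and the helper name build_value_rows are invented for illustration, and the sketch assumes each tel maps to at most one employee_id.

```python
import sqlite3


def build_value_rows(conn, latest_times):
    """Format "(employee_id, attendTime)" strings for the tels found in the table.

    latest_times: dict of tel -> latest attendTime string (hypothetical input).
    """
    tels = list(latest_times)
    if not tels:
        return []  # an empty IN () list would be a SQL syntax error
    placeholders = ', '.join('?' for _ in tels)
    sql = ("select a.employee_id, a.tel from t_test_info a "
           "where a.tel IN (%s)" % placeholders)
    # tel -> employee_id, so each lookup below is a single dict access
    id_by_tel = {tel: emp_id for emp_id, tel in conn.execute(sql, tels)}
    return ["({}, {})".format(id_by_tel[tel], latest_times[tel])
            for tel in tels if tel in id_by_tel]


# In-memory stand-in for the real table, with invented rows
conn = sqlite3.connect(':memory:')
conn.execute("create table t_test_info (employee_id integer, tel text)")
conn.executemany("insert into t_test_info values (?, ?)",
                 [(1, '13800000001'), (2, '13800000002')])

rows = build_value_rows(conn, {
    '13800000001': '2018-03-23 18:59:00',
    '13800000003': '2018-03-23 09:30:00',  # no matching employee, so skipped
})
print(rows)
conn.close()
```

With pymysql the same shape should carry over by swapping the `?` placeholders for `%s` and running the query through a cursor.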
