Creating an xlsx file and writing training data to it with Python

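Preface

After training a model you always need to test it and save the evaluation results. Until now I had been collecting the results for each dataset and metric into a dict, formatting it with an f-string and writing it to a txt file. That is convenient, but whenever I wanted to process the numbers in Excel it meant copying and pasting back and forth. To spare my hands and knuckles, it is worth trying something more direct: using a third-party Python library to write the data straight into an Excel xlsx file. Below are some notes on the process.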

`Xlwings`&`openpyxl`

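I read many posts, and they recommend all sorts of libraries. I first came across this article: https://zhuanlan.zhihu.com/p/54003662 and planned to try Xlwings. My brief attempt, however, showed that the library has problems on Linux. The official documentation does not mention Linux usage, and the installation instructions only cover Windows and macOS, so Linux support is evidently poor. In the end I chose the more widely used openpyxl as my tool.

The process in detail

My understanding of some of the functions mainly comes from this article: https://blog.csdn.net/weixin_43094965/article/details/82226263. Many thanks to its author.

Here is my code first: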

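import os

from openpyxl import Workbook, load_workbook

# Global variables that provide the required parameters.
# `path_config` is the project's own path configuration and is defined elsewhere (not shown here).
dataset_list = ['lfsd', 'njud', 'nlpr', 'rgbd135', 'sip', 'ssd', 'stereo', 'dutrgbd']
dataset_num_list = [100, 500, 300, 135, 929, 80, 1000, 400]
metric_list = ['MaxF', 'MeanF', 'MAE', 'SM', 'EM']
xlsx_path = os.path.join(path_config['ckpt_path'], 'records.xlsx')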

def pre_mkdir():
    """Check that the files needed before training exist."""
    ...
    if not os.path.exists(xlsx_path):
        make_xlsx(xlsx_path)

def make_xlsx(path):
    """
    Create the xlsx file and write the shared header data into it.

    :param path: path of the xlsx file; the full path must be given
    """

    num_metrics = len(metric_list)
    num_datasets = len(dataset_list)

    # Create a Workbook object
    wb = Workbook()
    # Create a Sheet object at the first position; the title means "experiment results".
    # The default sheet that Workbook() creates is left in place after it.
    sheet = wb.create_sheet(title="實驗結果統計", index=0)

    sheet['A1'] = 'name_dataset'
    sheet['A2'] = 'num_dataset'
    for i, dataset_name in enumerate(dataset_list):
        # The offsets below are 0-based column positions (A is 0).
        # Offsets under 26 map to one letter, larger ones to two letters.
        if (i * num_metrics + 1) // 26 == 0:
            start_region_idx = f"{chr(ord('A') + (i * num_metrics + 1) % 26)}1"
        else:
            start_region_idx = (f"{chr(ord('A') + (i * num_metrics + 1) // 26 - 1)}"
                                f"{chr(ord('A') + (i * num_metrics + 1) % 26)}1")
        if ((i + 1) * num_metrics) // 26 == 0:
            end_region_idx = f"{chr(ord('A') + ((i + 1) * num_metrics) % 26)}1"
        else:
            end_region_idx = (f"{chr(ord('A') + ((i + 1) * num_metrics) // 26 - 1)}"
                              f"{chr(ord('A') + ((i + 1) * num_metrics) % 26)}1")
        region_idx = f"{start_region_idx}:{end_region_idx}"
        sheet.merge_cells(region_idx)  # merge several cells in one row
        sheet[start_region_idx] = dataset_name

        # Build the second row: the dataset size goes under the dataset name
        start_region_idx = start_region_idx.replace('1', '2')
        sheet[start_region_idx] = dataset_num_list[i]

    # Build the third row
    third_row = ['metrics'] + metric_list * num_datasets
    sheet.append(third_row)

    # Finally, save the workbook
    wb.save(path)


def write_xlsx(model_name, data):
    """
    Write data into the xlsx file.

    :param model_name: name of the model
    :param data: result data, containing dataset names and the matching test results
    """

    num_metrics = len(metric_list)
    num_datasets = len(dataset_list)

    # The xlsx file must already have been created by the code above so that the
    # first three rows are in place; everything below starts from the fourth row.
    wb = load_workbook(xlsx_path)
    assert "實驗結果統計" in wb.sheetnames, "Make sure the file being modified is an xlsx file created by `make_xlsx`"
    sheet = wb["實驗結果統計"]
    num_cols = num_metrics * num_datasets + 1
    idx_insert_row = len(sheet['A']) + 1  # first row below the existing ones

    sheet.cell(row=idx_insert_row, column=1, value=model_name)
    for dataset_name in data.keys():
        # Walk over every cell of the first row to find this dataset's header
        for row in sheet.iter_rows(min_row=1, min_col=2, max_col=num_cols, max_row=1):
            for cell in row:
                if cell.value == dataset_name:
                    for i in range(num_metrics):
                        metric_name = sheet.cell(row=3, column=cell.column + i).value
                        sheet.cell(row=idx_insert_row, column=cell.column + i,
                                   value=data[dataset_name][metric_name])
    wb.save(xlsx_path)

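Two functions are written here, make_xlsx and write_xlsx. The rest of this post explains some of the key points.

One thing should be made clear: working with an xlsx file is really just a matter of assigning values to individual positions (cells). So most of the work here is computing cell positions.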
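
To make the expected shape of `data` concrete, here is a minimal sketch rather than the code used above. It assumes `data` is a nested dict of the form {dataset name: {metric name: value}}, which is what the lookups in write_xlsx suggest. The helper name `write_row`, the shortened dataset and metric lists, and all numbers are made up for illustration. Instead of re-scanning the header for every dataset, it reads the first row once and maps each dataset name to its starting column.

```python
from openpyxl import Workbook

# Shortened, hypothetical lists so the layout is easy to follow
dataset_list = ["lfsd", "njud"]
metric_list = ["MaxF", "MAE"]
num_metrics = len(metric_list)

# Rebuild a small header in memory with the same three-row layout
wb = Workbook()
sheet = wb.active
sheet["A1"] = "name_dataset"
sheet["A2"] = "num_dataset"
for i, name in enumerate(dataset_list):
    start_col = i * num_metrics + 2  # column 1 (A) holds the row labels
    sheet.merge_cells(start_row=1, start_column=start_col,
                      end_row=1, end_column=start_col + num_metrics - 1)
    sheet.cell(row=1, column=start_col, value=name)
sheet.append(["metrics"] + metric_list * len(dataset_list))


def write_row(sheet, model_name, data):
    """Write one model's results into the first row below the existing ones."""
    # One pass over row 1: dataset name -> first column of its block
    header = next(sheet.iter_rows(min_row=1, max_row=1))
    start_cols = {cell.value: cell.column for cell in header if cell.value is not None}

    new_row = sheet.max_row + 1
    sheet.cell(row=new_row, column=1, value=model_name)
    for dataset_name, results in data.items():
        start_col = start_cols[dataset_name]
        for offset in range(num_metrics):
            metric_name = sheet.cell(row=3, column=start_col + offset).value
            sheet.cell(row=new_row, column=start_col + offset,
                       value=results[metric_name])
    return new_row


# Made-up numbers, only to show the nested-dict shape
data = {
    "lfsd": {"MaxF": 0.85, "MAE": 0.07},
    "njud": {"MaxF": 0.90, "MAE": 0.04},
}
row_idx = write_row(sheet, "model_a", data)
written = [sheet.cell(row=row_idx, column=c).value for c in range(1, 6)]
print(written)
```

The sketch relies on `cell.column` being an integer, which holds for openpyxl 2.6 and later.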

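There are three ways to index a cell:

The first uses Excel's own addressing scheme [letter + number]. The coordinate can be built with f-strings or other string operations, for example: sheet['A1'] = 'name_dataset'
The second indexes by explicit row and column numbers: metric_name = sheet.cell(row=3, column=cell.column + i).value

The third is by traversal. openpyxl provides iterator methods for walking over rows or columns, for example:

for row in sheet.iter_rows(min_row=1, min_col=2, max_col=num_cols, max_row=1):
    for cell in row:
        ...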

Assigning to a cell works in the same three ways:

The first is much the same as above.
The second has two forms:
- sheet.cell(row=3, column=cell.column + i).value = new_value
- sheet.cell(row=idx_insert_row, column=cell.column + i, value=new_value)

The third looks like the following. Modifying cell here really does change the value in the final sheet, so it can be thought of as a reference:

    for row in sheet.iter_rows(min_row=3, min_col=2, max_col=num_cols, max_row=3):
        for cell in row:
            cell.value = value
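
The three approaches can be tried side by side in a throwaway workbook. This is a minimal sketch that works in memory only; the values written are arbitrary, and it assumes openpyxl 2.6 or later, where `cell.column` is an integer.

```python
from openpyxl import Workbook

wb = Workbook()
sheet = wb.active  # the default worksheet created with the workbook

# 1) Excel-style coordinate, built here with an f-string
row_idx = 1
sheet[f"A{row_idx}"] = "name_dataset"

# 2) Explicit row and column numbers (both start at 1), in either form
sheet.cell(row=1, column=2).value = "lfsd"
sheet.cell(row=1, column=3, value="njud")

# 3) Iteration: each cell yielded belongs to the worksheet itself,
#    so assigning to cell.value changes the sheet
for row in sheet.iter_rows(min_row=2, max_row=2, min_col=1, max_col=3):
    for cell in row:
        cell.value = cell.column * 10

first_row = [sheet.cell(row=1, column=c).value for c in range(1, 4)]
second_row = [sheet.cell(row=2, column=c).value for c in range(1, 4)]
print(first_row)
print(second_row)
```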

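When we create a new xlsx file, some header data has to be written into the sheet first. The layout is roughly this: the first row holds the dataset names, each merged across as many columns as there are metrics; the second row holds the size of each dataset; the third row repeats the metric names under every dataset.

As you can see, this involves both single cells and cells that have been merged. For a row of regular data, inserting the values one by one is tedious. openpyxl provides the .append() method for inserting a whole row at once, so any data that can easily be assembled into a complete list can be written to the sheet with it directly. In make_xlsx(), I built the third row this way: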

    # Build the third row
    third_row = ['metrics'] + metric_list * num_datasets
    sheet.append(third_row)
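
The reason `.append()` lands on the third row is that it writes below the last row that already has cells. The following sketch shows this in isolation; `num_datasets = 2` is an arbitrary value chosen to keep the row short.

```python
from openpyxl import Workbook

metric_list = ["MaxF", "MeanF", "MAE", "SM", "EM"]
num_datasets = 2  # arbitrary, for illustration only

wb = Workbook()
sheet = wb.active
sheet["A1"] = "name_dataset"
sheet["A2"] = "num_dataset"

# Rows 1 and 2 are already in use, so the list goes into row 3,
# one list element per column starting at column A
sheet.append(["metrics"] + metric_list * num_datasets)

third_row = [sheet.cell(row=3, column=c).value for c in range(1, 12)]
print(sheet.max_row, third_row)
```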

The first row, however, needs merged cells. For that I mainly used the following code:

        if (i * num_metrics + 1) // 26 == 0:
            start_region_idx = f"{chr(ord('A') + (i * num_metrics + 1) % 26)}1"
        else:
            start_region_idx = (f"{chr(ord('A') + (i * num_metrics + 1) // 26 - 1)}"
                                f"{chr(ord('A') + (i * num_metrics + 1) % 26)}1")
        if ((i + 1) * num_metrics) // 26 == 0:
            end_region_idx = f"{chr(ord('A') + ((i + 1) * num_metrics) % 26)}1"
        else:
            end_region_idx = (f"{chr(ord('A') + ((i + 1) * num_metrics) // 26 - 1)}"
                              f"{chr(ord('A') + ((i + 1) * num_metrics) % 26)}1")
        region_idx = f"{start_region_idx}:{end_region_idx}"
        sheet.merge_cells(region_idx)  # merge several cells in one row
        sheet[start_region_idx] = dataset_name
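
The two-branch logic above handles one-letter and two-letter columns. As a sketch of a more general alternative, the conversion can be written as a small loop that works for any number of letters. The function name `column_letter` is made up here, and it takes 1-based column numbers, unlike the 0-based offsets in the code above. openpyxl also ships its own helper for this, `openpyxl.utils.get_column_letter`, which takes a 1-based column number as well.

```python
def column_letter(col_idx: int) -> str:
    """Convert a 1-based column number to Excel letters (1 -> A, 27 -> AA)."""
    if col_idx < 1:
        raise ValueError("column numbers start at 1")
    letters = ""
    while col_idx > 0:
        # Excel columns have no zero digit, hence the "- 1" before dividing
        col_idx, remainder = divmod(col_idx - 1, 26)
        letters = chr(ord("A") + remainder) + letters
    return letters


num_metrics = 5
# Header ranges for 8 datasets with 5 metrics each, starting at column B (2)
ranges = [
    f"{column_letter(i * num_metrics + 2)}1:{column_letter((i + 1) * num_metrics + 1)}1"
    for i in range(8)
]
print(ranges[0], ranges[-1])  # B1:F1 AK1:AO1
```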

Following the reference article, merge_cells() can be used here. Its signature, according to the documentation, is:

def merge_cells(self,
                range_string: Any = None,
                start_row: Any = None,
                start_column: Any = None,
                end_row: Any = None,
                end_column: Any = None) -> None

So the region to merge can be given either as a range string, which is what I actually do in make_xlsx(), or as start and end row and column indices.

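In fact, specifying row and column indices is the more flexible option. Because I used range strings, columns beyond Z need extra "carry" handling, which adds string-building work that could have been avoided.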
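
As a sketch of that index-based alternative (not the code used in make_xlsx()), the first-row header can be built with the `start_row`, `start_column`, `end_row` and `end_column` parameters from the signature above. The variable names `start_col` and `end_col` are introduced here for illustration.

```python
from openpyxl import Workbook

dataset_list = ["lfsd", "njud", "nlpr", "rgbd135", "sip", "ssd", "stereo", "dutrgbd"]
num_metrics = 5

wb = Workbook()
sheet = wb.active

for i, dataset_name in enumerate(dataset_list):
    start_col = i * num_metrics + 2        # column 1 (A) is kept for row labels
    end_col = start_col + num_metrics - 1
    # Plain integers: no letters and no carry past column Z to worry about
    sheet.merge_cells(start_row=1, start_column=start_col,
                      end_row=1, end_column=end_col)
    sheet.cell(row=1, column=start_col, value=dataset_name)

merged = sorted(r.coord for r in sheet.merged_cells.ranges)
print(merged)
```

The resulting regions are the same ones the range-string version produces, from B1:F1 for the first dataset to AK1:AO1 for the last.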

To put data into a merged region, use the coordinate of the region's top-left cell.
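
A small check makes this visible. The sketch below assumes openpyxl 2.6 or later, where the remaining cells of a merged region are represented by `MergedCell` placeholders that carry no value of their own.

```python
from openpyxl import Workbook
from openpyxl.cell.cell import MergedCell

wb = Workbook()
sheet = wb.active
sheet.merge_cells("B1:F1")
sheet["B1"] = "lfsd"  # the top-left cell of the region holds the value

other = sheet["C1"]   # any other cell inside the merged region
print(type(other).__name__, other.value)
```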

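Reference links

https://zhuanlan.zhihu.com/p/54003662
https://blog.csdn.net/weixin_43094965/article/details/82226263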

lart
