How to Read a File Line by Line in Go, and How to Use bufio's Scanner.Split()

Original article

2020-02-21 15:23

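I recently started learning Go. To dig deeper into the language, I completed a small project: porting a Linux command-line utility that selects lines from a file, originally written in C, to Go. The project links are:
Original project: https://www.ibm.com/developerworks/cn/linux/shell/clutil/index.html
Go implementation: https://github.com/kangbb/go-learning/tree/master/selpg

Looking back, the project was simple, but I ran into quite a bit of trouble along the way, especially when reading and writing files. This post summarizes what I learned about that.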

How to open a file in Go

There are two main functions:

import "os"
//1
func Open(name string) (file *File, err error)

//2
func OpenFile(name string, flag int, perm FileMode) (file *File, err error)

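The two functions are closely related. Open opens the file read-only and is equivalent to OpenFile(name, os.O_RDONLY, 0), so it is enough when you only need to read. To write, append to, or create a file, use OpenFile; when the flags include os.O_CREATE, perm sets the permission bits of the new file, which matters on Linux. name is the path of the file, flag is a combination of the os.O_* constants that select how the file is opened, and perm is the file's permission mode. For more details, see:
flag: https://go-zh.org/pkg/os/
perm: https://go-zh.org/pkg/os/#FileMode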
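To make the difference between the two calls concrete, here is a minimal sketch, assuming Go 1.16 or later (for os.MkdirTemp and io.ReadAll). The helper name roundTrip and the file name demo.txt are hypothetical and exist only for this illustration. It creates a file with OpenFile, which needs a perm value because of os.O_CREATE, and then reads it back with Open.

```go
package main

import (
	"fmt"
	"io"
	"os"
	"path/filepath"
)

// roundTrip (hypothetical helper) writes content to a new file with
// os.OpenFile and reads it back with os.Open.
func roundTrip(content string) (string, error) {
	// Work inside a throwaway directory so nothing is left behind.
	dir, err := os.MkdirTemp("", "open-demo")
	if err != nil {
		return "", err
	}
	defer os.RemoveAll(dir)
	path := filepath.Join(dir, "demo.txt")

	// O_CREATE makes perm meaningful: 0644 requests rw-r--r-- (before umask).
	out, err := os.OpenFile(path, os.O_WRONLY|os.O_CREATE|os.O_TRUNC, 0644)
	if err != nil {
		return "", err
	}
	if _, err := out.WriteString(content); err != nil {
		out.Close()
		return "", err
	}
	if err := out.Close(); err != nil {
		return "", err
	}

	// os.Open opens the file read-only; no flag or perm is needed.
	in, err := os.Open(path)
	if err != nil {
		return "", err
	}
	defer in.Close()

	data, err := io.ReadAll(in)
	if err != nil {
		return "", err
	}
	return string(data), nil
}

func main() {
	got, err := roundTrip("hello\n")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("%q\n", got)
}
```

Writing through a handle returned by os.Open would fail, because that handle is read-only; this is the main reason to reach for OpenFile.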

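How to read a file line by line in Go

There are three main ways to read a file line by line in Go. The first two are fairly simple. The third is a little harder, but I find it more widely applicable and more flexible.

First approach: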

import (
    "bufio"
    "fmt"
    "io"
    "os"
    "strings"
)

func useNewReader(filename string) {
    count := 0

    fin, err := os.OpenFile(filename, os.O_RDONLY, 0)
    if err != nil {
        panic(err)
    }
    defer fin.Close()

    /* create a Reader */
    rd := bufio.NewReader(fin)

    /* read the file and stop on an error or at EOF */
    for {
        line, err := rd.ReadString('\n')
        /* io.EOF is an ordinary non-nil error, so one check covers both.
           note: a final line with no trailing '\n' is returned together
           with io.EOF and is therefore skipped by this loop */
        if err != nil {
            if err != io.EOF {
                fmt.Fprintln(os.Stderr, "read error:", err)
            }
            break
        }
        count++
        /* process each line.
           for clean output on the command line, strip the '\f' */
        line = strings.Replace(line, "\f", "", -1)
        /* line still ends with '\n', so the format has none */
        fmt.Printf("the line %d: %s", count, line)
    }
}

Second approach:

import (
    "bufio"
    "fmt"
    "os"
)

func useNewScanner(filename string) {
    count := 0

    fin, err := os.OpenFile(filename, os.O_RDONLY, 0)
    if err != nil {
        panic(err)
    }
    defer fin.Close()

    sc := bufio.NewScanner(fin)
    /* by default the Scanner splits the input on '\n' */
    for sc.Scan() {
        count++
        fmt.Printf("the line %d: %s\n", count, sc.Text())
    }
    if err := sc.Err(); err != nil {
        fmt.Println("An error has happened:", err)
    }
}
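One practical caveat with the Scanner approach: by default a Scanner refuses tokens larger than bufio.MaxScanTokenSize (64 KiB), and Scan() then stops with bufio.ErrTooLong. The sketch below shows how Scanner.Buffer can raise that limit. The helper name countLines and the sizes used are assumptions made for this illustration, not part of the original project.

```go
package main

import (
	"bufio"
	"errors"
	"fmt"
	"strings"
)

// countLines (hypothetical helper) counts the lines in input.
// If maxLine > 0, the scanner buffer may grow up to maxLine bytes;
// otherwise the default limit (bufio.MaxScanTokenSize) applies.
func countLines(input string, maxLine int) (int, error) {
	sc := bufio.NewScanner(strings.NewReader(input))
	if maxLine > 0 {
		// Buffer must be called before the first Scan.
		// The initial buffer is 4 KiB and may grow up to maxLine.
		sc.Buffer(make([]byte, 0, 4096), maxLine)
	}
	n := 0
	for sc.Scan() {
		n++
	}
	return n, sc.Err()
}

func main() {
	// One very long line followed by a short one.
	input := strings.Repeat("x", 100000) + "\nshort\n"

	n, err := countLines(input, 0)
	fmt.Println(n, errors.Is(err, bufio.ErrTooLong))

	n, err = countLines(input, 1<<20)
	fmt.Println(n, err)
}
```

With the default limit the first call is expected to fail on the 100000-byte line; with a 1 MiB limit both lines are read. If lines of unbounded length are possible, the Reader.ReadString approach may be the safer choice.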

Third approach:

import (
    "bufio"
    "bytes"
    "fmt"
    "os"
)

/* LineSplit is a bufio.SplitFunc that returns one line per token */
func LineSplit(data []byte, atEOF bool) (advance int, token []byte, err error) {
    /* no more data: stop */
    if atEOF && len(data) == 0 {
        return 0, nil, nil
    }

    /* find the index of the first '\n';
       the next line begins at i+1 and
       the token does not include the '\n'.
       use i >= 0 so that an empty line (i == 0) is matched too */
    if i := bytes.IndexByte(data, '\n'); i >= 0 {
        return i + 1, dropCR(data[0:i]), nil
    }

    /* at EOF, we have a final, non-terminated line */
    if atEOF {
        return len(data), dropCR(data), nil
    }

    /* request more data */
    return 0, nil, nil
}

func dropCR(data []byte) []byte {
    /* drop the first '\f' in the line;
       if you don't need this, you can delete it */
    if i := bytes.IndexByte(data, '\f'); i >= 0 {
        tmp := [][]byte{data[0:i], data[(i + 1):]}
        sep := []byte("")
        data = bytes.Join(tmp, sep)
    }
    /* drop a trailing '\r' (Windows line endings) */
    if len(data) > 0 && data[len(data)-1] == '\r' {
        return data[0 : len(data)-1]
    }
    return data
}

func useSplit(filename string) {
    count := 0

    fin, err := os.OpenFile(filename, os.O_RDONLY, 0)
    if err != nil {
        panic(err)
    }
    defer fin.Close()

    sc := bufio.NewScanner(fin)
    /* set the split function; the default is to read by lines */
    sc.Split(LineSplit)
    /* begin scanning */
    for sc.Scan() {
        count++
        fmt.Printf("the line %d: %s\n", count, sc.Text())
    }
    if err := sc.Err(); err != nil {
        fmt.Println("An error has happened:", err)
    }
}

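Overall, the second approach looks the simplest, because it needs the least code. In fact, the third approach does the same thing as the second; it is just written out explicitly. Scanner.Scan() splits by line by default, so the second approach simply omits: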

  sc.Split(bufio.ScanLines)

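If you read the source carefully, you will notice that the LineSplit function in my third approach is adapted from bufio.ScanLines in Go's bufio package (in current Go releases ScanLines is defined in scan.go, next to bufio.go). The package source is here:

bufio.go: https://go-zh.org/src/bufio/bufio.go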

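I list it here to emphasize how to combine Scanner.Split() and Scanner.Scan() to read a file, or to split a string and print the pieces. Of course, string splitting can also be done with the strings package, which offers a more complete set of functions for that job. One more point worth stressing: when you are unsure how to use a Go function, read the official documentation and the source code of the package.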
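As a small illustration of using Split() and Scan() on a string rather than a file, the sketch below scans a string with the built-in bufio.ScanWords split function and compares the result with strings.Fields. The helper name words is hypothetical.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// words (hypothetical helper) splits s into whitespace-separated
// words using a Scanner and the built-in bufio.ScanWords function.
func words(s string) []string {
	sc := bufio.NewScanner(strings.NewReader(s))
	sc.Split(bufio.ScanWords)
	var out []string
	for sc.Scan() {
		out = append(out, sc.Text())
	}
	return out
}

func main() {
	text := "  go  reads\tfiles\n"
	fmt.Printf("%q\n", words(text))
	// The strings package does the same job in one call.
	fmt.Printf("%q\n", strings.Fields(text))
}
```

For a string that is already in memory, strings.Fields is shorter. The Scanner version becomes useful when the input is a stream (a file, a pipe, a network connection) that should not be loaded all at once.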

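How to read a file page by page in Go

Now that reading by line works, reading by page (pages are separated by the form feed character '\f') is easy: only a small change to the code is needed.

For the first approach: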

page, err := rd.ReadString('\f')

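This is usually enough. Note, though, how Reader.ReadString() behaves: it keeps reading until it finds the delimiter '\f', and if it reaches the end of the file first, it returns the data read so far together with io.EOF. The last page of a file is usually not followed by '\f' (and a file may contain no '\f' at all), so a loop that breaks as soon as err is non-nil never prints that final page. The error check therefore has to handle the leftover data before breaking: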

if err != nil {
    /* the last page usually has no '\f' after it */
    if err == io.EOF && len(page) != 0 {
        count++
        fmt.Printf("the page %d:\n%s\n", count, page)
    }
    break
}
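The behavior of ReadString at end of file can be checked in isolation. The following is a minimal sketch that reads from an in-memory string instead of a file; the helper name readPages is hypothetical. It collects every page, including a final page with no '\f' after it.

```go
package main

import (
	"bufio"
	"fmt"
	"io"
	"strings"
)

// readPages (hypothetical helper) splits input into pages at '\f'
// using Reader.ReadString, keeping the final unterminated page.
func readPages(input string) ([]string, error) {
	rd := bufio.NewReader(strings.NewReader(input))
	var pages []string
	for {
		page, err := rd.ReadString('\f')
		// ReadString may return data together with io.EOF,
		// so look at the data before looking at the error.
		if len(page) > 0 {
			pages = append(pages, strings.TrimSuffix(page, "\f"))
		}
		if err == io.EOF {
			return pages, nil
		}
		if err != nil {
			return pages, err
		}
	}
}

func main() {
	pages, err := readPages("page one\fpage two")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for i, p := range pages {
		fmt.Printf("the page %d: %q\n", i+1, p)
	}
}
```

Handling the data before the error keeps the processing code in one place, at the cost of one extra length check per iteration.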

For the third approach:

/* LineSplit now returns one page per token; the name is kept
   so that useSplit can be reused without changes */
func LineSplit(data []byte, atEOF bool) (advance int, token []byte, err error) {
    /* no more data: stop */
    if atEOF && len(data) == 0 {
        return 0, nil, nil
    }

    /* find the index of the first '\f';
       the next page begins at i+1 and
       the token does not include the '\f'.
       use i >= 0 so that an empty page (i == 0) is matched too */
    if i := bytes.IndexByte(data, '\f'); i >= 0 {
        return i + 1, dropCR(data[0:i]), nil
    }

    /* at EOF, we have a final page with no '\f' after it */
    if atEOF {
        return len(data), dropCR(data), nil
    }

    /* request more data */
    return 0, nil, nil
}

func dropCR(data []byte) []byte {
    /* drop a stray '\f' if one is left in the token;
       if you don't need this, you can delete it */
    if i := bytes.IndexByte(data, '\f'); i >= 0 {
        tmp := [][]byte{data[0:i], data[(i + 1):]}
        sep := []byte("")
        data = bytes.Join(tmp, sep)
    }
    return data
}
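Since the line and page versions of the split function differ only in the delimiter, one possible refactoring is a function that builds a bufio.SplitFunc for any delimiter byte. This is a sketch under that assumption, not the code from the original project; the names splitOn and scanAll are hypothetical.

```go
package main

import (
	"bufio"
	"bytes"
	"fmt"
	"strings"
)

// splitOn (hypothetical helper) returns a bufio.SplitFunc that ends
// each token at delim. The delimiter is not part of the token.
func splitOn(delim byte) bufio.SplitFunc {
	return func(data []byte, atEOF bool) (advance int, token []byte, err error) {
		// No more data: stop.
		if atEOF && len(data) == 0 {
			return 0, nil, nil
		}
		// A delimiter was found: return everything before it.
		if i := bytes.IndexByte(data, delim); i >= 0 {
			return i + 1, data[:i], nil
		}
		// At EOF the remaining bytes form the last token.
		if atEOF {
			return len(data), data, nil
		}
		// Otherwise ask the Scanner for more data.
		return 0, nil, nil
	}
}

// scanAll (hypothetical helper) collects all tokens of input.
func scanAll(input string, delim byte) []string {
	sc := bufio.NewScanner(strings.NewReader(input))
	sc.Split(splitOn(delim))
	var out []string
	for sc.Scan() {
		out = append(out, sc.Text())
	}
	return out
}

func main() {
	text := "line 1\nline 2\fline 3\n"
	fmt.Printf("lines: %q\n", scanAll(text, '\n'))
	fmt.Printf("pages: %q\n", scanAll(text, '\f'))
}
```

The closure keeps the split logic in one place; switching between line mode and page mode becomes a matter of passing '\n' or '\f'.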

That concludes my summary of this learning exercise. I hope it is helpful to you as well.

Source code download: https://github.com/kangbb/go-learning/tree/master/readfile

Kiloveyousmile
