What's the fastest way to read a text file line-by-line?

Translated from: What's the fastest way to read a text file line-by-line?

I want to read a text file line by line, and I'd like to know whether I'm doing it as efficiently as possible in .NET/C#.

This is what I'm trying so far:

using (var filestream = new System.IO.FileStream(textFilePath,
                                                 System.IO.FileMode.Open,
                                                 System.IO.FileAccess.Read,
                                                 System.IO.FileShare.ReadWrite))
using (var file = new System.IO.StreamReader(filestream, System.Text.Encoding.UTF8, true, 128))
{
    string lineOfText; //declared here so the loop below compiles
    while ((lineOfText = file.ReadLine()) != null)
    {
        //Do something with the lineOfText
    }
} //both the reader and the stream are disposed here

Reply #1

Reference: https://stackoom.com/question/XioA/逐行讀取文本文件的最快方法是什麼


Reply #2

Use the following code:

foreach (string line in File.ReadAllLines(fileName))
{ /* do something with line */ }

This made a HUGE difference in read performance.

It comes at the cost of memory consumption, but it was totally worth it!


Reply #3

There's a good discussion of this in the Stack Overflow question "Is 'yield return' slower than 'old school' return?".

It says:

ReadAllLines loads all of the lines into memory and returns a string[]. All well and good if the file is small. If the file is larger than will fit in memory, you'll run out of memory.

ReadLines, on the other hand, uses yield return to return one line at a time. With it, you can read any size file. It doesn't load the whole file into memory.

Say you wanted to find the first line that contains the word "foo", and then exit. Using ReadAllLines, you'd have to read the entire file into memory, even if "foo" occurs on the first line. With ReadLines, you only read one line. Which one would be faster?
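
To make the "foo" scenario concrete, here is a minimal sketch of the early-exit search. The class and method names (FirstMatch, Find) are made up for illustration; only File.ReadLines and LINQ's FirstOrDefault are real framework calls.

```csharp
using System;
using System.IO;
using System.Linq;

static class FirstMatch
{
    // Hypothetical helper: returns the first line containing `needle`,
    // or null if no line matches. File.ReadLines is lazy, so enumeration
    // stops as soon as FirstOrDefault finds a match.
    public static string Find(string path, string needle)
    {
        return File.ReadLines(path)
                   .FirstOrDefault(line => line.Contains(needle));
    }
}
```

Calling FirstMatch.Find(path, "foo") on a file whose first line contains "foo" should stop reading right after that line, whereas the same query over File.ReadAllLines would load every line first.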


Reply #4

While File.ReadAllLines() is one of the simplest ways to read a file, it is also one of the slowest.

If you just want to read the lines of a file without doing much with them, then according to these benchmarks the fastest way to read a file is the age-old method of:

using (StreamReader sr = File.OpenText(fileName))
{
    string s = String.Empty;
    while ((s = sr.ReadLine()) != null)
    {
        //do minimal amount of work here
    }
}

However, if you have to do a lot with each line, then this article concludes that the best way is the following (and it's faster to pre-allocate a string[] if you know how many lines you're going to read):

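string[] AllLines = new string[MAX]; //only allocate memory here
int lineCount = 0; //number of lines actually read

using (StreamReader sr = File.OpenText(fileName))
{
    string line;
    //stop at MAX lines so we never write past the end of the array
    while (lineCount < AllLines.Length && (line = sr.ReadLine()) != null)
    {
        AllLines[lineCount] = line;
        lineCount += 1;
    }
} //Finished. Close the file

//Now parallel process each line that was actually read
Parallel.For(0, lineCount, x =>
{
    DoYourStuff(AllLines[x]); //do your work here
});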
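
If the number of lines isn't known up front, one possible variation is to collect the lines in a List<string> and hand them to Parallel.For afterwards. This is only a sketch, not something the cited article measured: ParallelLineProcessor, Process and the work callback are hypothetical names, and the list costs some reallocation that the pre-sized array avoids.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Threading.Tasks;

static class ParallelLineProcessor
{
    // Hypothetical helper: reads every line first, then runs `work`
    // on each line in parallel. Returns the number of lines read.
    public static int Process(string path, Action<string> work)
    {
        var lines = new List<string>();

        using (StreamReader sr = File.OpenText(path))
        {
            string line;
            while ((line = sr.ReadLine()) != null)
            {
                lines.Add(line);
            }
        } // file is closed before the parallel phase starts

        // The list is only read from here on, so concurrent access is safe.
        Parallel.For(0, lines.Count, i => work(lines[i]));
        return lines.Count;
    }
}
```

The callback runs on several threads at once, so anything it touches (counters, collections) has to be thread-safe.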

Reply #5

If you're using .NET 4, simply use File.ReadLines, which does it all for you. I suspect it's much the same as yours, except it may also use FileOptions.SequentialScan and a larger buffer (128 seems very small).
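
For anyone who wants to try those two knobs by hand, here is a rough sketch that opens the file with FileOptions.SequentialScan and a bigger buffer. It makes no claim about what File.ReadLines does internally; SequentialLineReader and the 64 KB default are assumptions chosen for illustration, so measure before relying on them.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Text;

static class SequentialLineReader
{
    // Hypothetical helper: lazily yields lines from `path`.
    // bufferSize is an illustrative default, not a tuned value.
    public static IEnumerable<string> ReadLines(string path, int bufferSize = 64 * 1024)
    {
        using (var stream = new FileStream(path, FileMode.Open, FileAccess.Read,
                                           FileShare.Read, bufferSize,
                                           FileOptions.SequentialScan))
        using (var reader = new StreamReader(stream, Encoding.UTF8, true, bufferSize))
        {
            string line;
            while ((line = reader.ReadLine()) != null)
            {
                yield return line; // one line at a time, like File.ReadLines
            }
        } // stream and reader are disposed when enumeration ends
    }
}
```

It is used the same way as File.ReadLines: foreach (string line in SequentialLineReader.ReadLines(path)) { ... }.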


Reply #6

If you have enough memory, I've found some performance gains by reading the entire file into a memory stream and then opening a stream reader on that to read the lines. As long as you actually plan on reading the whole file anyway, this can yield some improvements.
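
A minimal sketch of that idea, assuming the whole file fits comfortably in memory. InMemoryLineReader is a made-up name, and whether this beats a plain StreamReader will depend on the file and the disk, so treat it as something to benchmark rather than a guaranteed win.

```csharp
using System.Collections.Generic;
using System.IO;

static class InMemoryLineReader
{
    // Hypothetical helper: loads the file in one read, then splits it
    // into lines from memory instead of from disk.
    public static List<string> ReadLines(string path)
    {
        var lines = new List<string>();
        byte[] bytes = File.ReadAllBytes(path); // whole file in one go

        using (var memory = new MemoryStream(bytes, false)) // read-only view
        using (var reader = new StreamReader(memory))       // UTF-8 by default
        {
            string line;
            while ((line = reader.ReadLine()) != null)
            {
                lines.Add(line);
            }
        }
        return lines;
    }
}
```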
