What's the fastest way to read a text file line-by-line?

Translated from: What's the fastest way to read a text file line-by-line?

I want to read a text file line by line, and I'd like to know whether I'm doing it as efficiently as possible in C# on .NET.

This is what I'm trying so far:

var filestream = new System.IO.FileStream(textFilePath,
                                          System.IO.FileMode.Open,
                                          System.IO.FileAccess.Read,
                                          System.IO.FileShare.ReadWrite);
var file = new System.IO.StreamReader(filestream, System.Text.Encoding.UTF8, true, 128);

string lineOfText; // declared here; ReadLine() returns null at end of file
while ((lineOfText = file.ReadLine()) != null)
{
    //Do something with the lineOfText
}

Reply #1

Reference: https://stackoom.com/question/XioA/逐行读取文本文件的最快方法是什么


Reply #2

Use the following code:

foreach (string line in File.ReadAllLines(fileName))
{ /* do something with line */ }

This made a HUGE difference in reading performance.

It comes at the cost of memory consumption, but it was totally worth it!


Reply #3

There's a good discussion of this in the Stack Overflow question "Is 'yield return' slower than "old school" return?".

It says:

ReadAllLines loads all of the lines into memory and returns a string[]. All well and good if the file is small. If the file is larger than will fit in memory, you'll run out of memory.

ReadLines, on the other hand, uses yield return to return one line at a time. With it, you can read any size file. It doesn't load the whole file into memory.

Say you wanted to find the first line that contains the word "foo", and then exit. Using ReadAllLines, you'd have to read the entire file into memory, even if "foo" occurs on the first line. With ReadLines, you only read one line. Which one would be faster?
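
The "first line containing foo" scenario from the quote can be sketched roughly as follows. This is only an illustration: the class name FirstMatchDemo and both method names are made up for this example and do not come from the quoted answer. Both methods return the same line; the difference is how much of the file gets turned into strings before the answer is known.

```csharp
using System;
using System.IO;
using System.Linq;

// Hypothetical helper for illustration; not part of the quoted answer.
public static class FirstMatchDemo
{
    // File.ReadLines is lazy: FirstOrDefault stops enumerating at the first match,
    // so lines after the match are never materialized as strings.
    public static string FindFirstLazy(string path, string word)
    {
        return File.ReadLines(path).FirstOrDefault(line => line.Contains(word));
    }

    // File.ReadAllLines builds the complete string[] first, then searches it.
    public static string FindFirstEager(string path, string word)
    {
        return File.ReadAllLines(path).FirstOrDefault(line => line.Contains(word));
    }
}
```

Both methods return null when no line matches. Note that the underlying reader still fills its buffer in chunks, so the lazy version may pull slightly more than one line's worth of bytes from disk; what it avoids is reading and allocating the rest of the file.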


Reply #4

While File.ReadAllLines() is one of the simplest ways to read a file, it is also one of the slowest.

If you just want to read the lines in a file without doing much with them, then according to these benchmarks the fastest way to read a file is the age-old method of:

using (StreamReader sr = File.OpenText(fileName))
{
        string s = String.Empty;
        while ((s = sr.ReadLine()) != null)
        {
               //do minimal amount of work here
        }
}

However, if you have to do a lot with each line, then this article concludes that the best way is the following (and it's faster to pre-allocate a string[] if you know how many lines you're going to read):

string[] AllLines = new string[MAX]; //only allocate memory here; MAX must be >= the number of lines
int lineCount = 0; //number of lines actually read

using (StreamReader sr = File.OpenText(fileName))
{
        while (!sr.EndOfStream)
        {
               AllLines[lineCount] = sr.ReadLine();
               lineCount += 1;
        }
} //Finished. Close the file

//Now parallel process each line that was read (not the unused, null slots up to MAX)
Parallel.For(0, lineCount, x =>
{
    DoYourStuff(AllLines[x]); //do your work here
});

Reply #5

If you're using .NET 4, simply use File.ReadLines, which does it all for you. I suspect it's much the same as yours, except it may also use FileOptions.SequentialScan and a larger buffer (128 seems very small).
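
For anyone who wants to try the two tweaks mentioned above by hand, here is a minimal sketch. It is not what File.ReadLines does internally (the answer itself only suspects that); the class name SequentialReader, the method name, and the 64 KB default buffer are all assumptions chosen for illustration.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Text;

// Hypothetical helper for illustration; names and buffer size are assumptions.
public static class SequentialReader
{
    public static IEnumerable<string> ReadLinesSequential(string path, int bufferSize = 64 * 1024)
    {
        // FileOptions.SequentialScan hints to the OS that the file is read front to back.
        using (var stream = new FileStream(path, FileMode.Open, FileAccess.Read,
                                           FileShare.ReadWrite, bufferSize,
                                           FileOptions.SequentialScan))
        using (var reader = new StreamReader(stream, Encoding.UTF8, true, bufferSize))
        {
            string line;
            while ((line = reader.ReadLine()) != null)
            {
                yield return line; // one line at a time, like File.ReadLines
            }
        }
    }
}
```

It can be consumed with a plain foreach over SequentialReader.ReadLinesSequential(textFilePath). Whether the hint or the larger buffer helps at all depends on the file, the disk, and the OS, so it is worth measuring before adopting it.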


Reply #6

If you have enough memory, I've found some performance gains by reading the entire file into a memory stream and then opening a stream reader on that to read the lines. As long as you actually plan on reading the whole file anyway, this can yield some improvements.
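
One possible reading of this approach is sketched below. The answer does not show its code, so the use of File.ReadAllBytes, the class name InMemoryReader, and the method name are assumptions made for this example.

```csharp
using System.Collections.Generic;
using System.IO;

// Hypothetical helper for illustration; the original answer gives no code.
public static class InMemoryReader
{
    public static IEnumerable<string> ReadLinesFromMemory(string path)
    {
        // One bulk read from disk; the whole file must fit in memory.
        byte[] bytes = File.ReadAllBytes(path);

        using (var memory = new MemoryStream(bytes, writable: false))
        using (var reader = new StreamReader(memory)) // UTF-8 by default, detects a BOM
        {
            string line;
            while ((line = reader.ReadLine()) != null)
            {
                yield return line;
            }
        }
    }
}
```

The trade-off is the one the answer states: memory use grows with the file size, so this only makes sense when the whole file will be read anyway and comfortably fits in memory.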
