Find duplicate lines in a file and count how many times each line was duplicated?

This article is a translation of: Find duplicate lines in a file and count how many times each line was duplicated?

Suppose I have a file similar to the following:

123 
123 
234 
234 
123 
345

I would like to find how many times '123' was duplicated, how many times '234' was duplicated, etc. So ideally, the output would be like:

123  3 
234  2 
345  1

#1

Reference: https://stackoom.com/question/SAD7/在文件中查找重複的行並計算每行復制的時間長度


#2

To find and count duplicate lines in multiple files, you can try the following command:

sort <files> | uniq -c | sort -nr

or:

cat <files> | sort | uniq -c | sort -nr
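For the sample file above, either pipeline should print something like the following, with the count first and the most frequent line on top (exact count padding can vary between uniq implementations):

  3 123
  2 234
  1 345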

#3

This will print only duplicate lines, with counts:

sort FILE | uniq -cd

or, with GNU long options (on Linux):

sort FILE | uniq --count --repeated

On BSD and OSX, you have to use grep to filter out unique lines (the pattern '^ *1 ' matches lines whose uniq count is exactly 1):

sort FILE | uniq -c | grep -v '^ *1 '

For the given example, the result would be:

  3 123
  2 234

If you want to print counts for all lines, including those that appear only once:

sort FILE | uniq -c

or, with GNU long options (on Linux):

sort FILE | uniq --count

For the given input, the output is:

  3 123
  2 234
  1 345

In order to sort the output with the most frequent lines on top, you can do the following (to get all results):

sort FILE | uniq -c | sort -nr

or, to get only duplicate lines, most frequent first:

sort FILE | uniq -cd | sort -nr

On OSX and BSD, the final one becomes:

sort FILE | uniq -c | grep -v '^ *1 ' | sort -nr

#4

Via awk:

awk '{dups[$1]++} END{for (num in dups) {print num,dups[num]}}' data

In the awk 'dups[$1]++' command, the variable $1 holds the entire contents of column 1, and the square brackets are array access. So, for the first column of each line in the data file, the node of the array named dups is incremented.

At the end, we loop over the dups array with num as the variable, printing each saved number first and then its duplicate count from dups[num].

Note that your input file has trailing spaces on some lines; if you clean those up, you can use $0 in place of $1 in the command above :)
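As a quick sketch, a run over the sample data file might look like this; note that awk's associative arrays are unordered, so the order of the output lines can differ between runs:

awk '{dups[$1]++} END{for (num in dups) {print num,dups[num]}}' data
123 3
234 2
345 1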


#5

On Windows, using Windows PowerShell, I used the command below to achieve this:

Get-Content .\file.txt | Group-Object | Select Name, Count
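For the sample file, this should produce a table roughly like the following (assuming the input is saved as file.txt):

Name Count
---- -----
123      3
234      2
345      1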

We can also use the Where-Object cmdlet to filter the result:

Get-Content .\file.txt | Group-Object | Where-Object { $_.Count -gt 1 } | Select Name, Count
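With the filter applied, only the lines that actually appear more than once should remain, e.g.:

Name Count
---- -----
123      3
234      2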

#6

Assuming there is one number per line (uniq only collapses adjacent duplicates, which is why the input is sorted first):

sort <file> | uniq -c

With the GNU version you can also use the more verbose --count flag, e.g., on Linux:

sort <file> | uniq --count