Common Text Analysis Tools on Linux

On Linux, use vim to create a text file named test for the examples below, and enter the following content:
hello:world:name:phone
what's your name
you should work hard or you will failure
phone number:162-8990-0988
This dish tast good
too many problem need solve
this is a test file,write some words to test
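
If you would rather not open an editor, the same file can be created from the shell. This is only a minimal sketch: it assumes a throwaway directory from mktemp (the names workdir and line_count are illustrative), and the quoted 'EOF' delimiter keeps the shell from expanding anything inside the text.

```shell
# Create the sample file non-interactively in a scratch directory.
workdir=$(mktemp -d)   # assumption: any writable directory would do
cd "$workdir"

# The quoted 'EOF' makes the here-document literal (no variable expansion).
cat > test <<'EOF'
hello:world:name:phone
what's your name
you should work hard or you will failure
phone number:162-8990-0988
This dish tast good
too many problem need solve
this is a test file,write some words to test
EOF

# Sanity check: the file should contain seven lines.
line_count=$(wc -l < test)
echo "test has $line_count lines"
```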

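1. grep

grep examines its input line by line and prints every line that contains the pattern we are looking for. The basic syntax is:
grep [-acinv] [--color=auto] 'pattern' filename
Options and parameters:

-a: search a binary file as if it were a text file
-c: print the number of lines that match the pattern (a line with several matches is counted once)
-i: ignore differences in case
-n: prefix each output line with its line number
-v: invert the match, that is, print the lines that do not contain 'pattern'
--color: highlight the matched text in color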

[Example 1] Find the lines that contain 'test'
vagrant@vagrant-ubuntu-trusty-64:~$ grep test test
Output: this is a test file,write some words to test

[Example 2] Count the lines that contain 'name'
vagrant@vagrant-ubuntu-trusty-64:~$ grep -c name test
Output: 2

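[Example 3] Use square brackets to match a set of characters
vagrant@vagrant-ubuntu-trusty-64:~$ grep -n 't[ae]st' test
Output:
5:This dish tast good
7:this is a test file,write some words to test
A leading ^ inside the brackets negates the set, so '[^t]oo' matches "oo" preceded by any character other than t:
vagrant@vagrant-ubuntu-trusty-64:~$ grep -n '[^t]oo' test
Output: 5:This dish tast good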

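[Example 4] Line-start and line-end anchors ^ and $
vagrant@vagrant-ubuntu-trusty-64:~$ grep -n '^this' test
Output: 7:this is a test file,write some words to test
vagrant@vagrant-ubuntu-trusty-64:~$ grep -n '^$' test
Output: the empty lines, each shown as its line number followed by a colon (nothing is printed if the file has no empty lines)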
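
The examples above do not exercise -i and -v, so here is a short sketch of both. It recreates the sample file in a scratch directory so that it can be run on its own; the variable names are illustrative.

```shell
# Recreate the sample file in a scratch directory so the sketch is self-contained.
cd "$(mktemp -d)"
cat > test <<'EOF'
hello:world:name:phone
what's your name
you should work hard or you will failure
phone number:162-8990-0988
This dish tast good
too many problem need solve
this is a test file,write some words to test
EOF

# -i ignores case, so '^this' matches both "This" (line 5) and "this" (line 7).
any_case=$(grep -in '^this' test)
echo "$any_case"

# -v inverts the match; combined with -c it counts the lines WITHOUT 'name'.
without_name=$(grep -vc 'name' test)
echo "$without_name"
```

Compared with Example 4, adding -i is the only change needed to pick up line 5 as well.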

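2. egrep

egrep is equivalent to grep -E. grep supports basic regular expressions by default, whereas egrep supports extended regular expressions, which add the following RE characters: +, ?, |, (), and ()+.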
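
As a rough illustration of those extended operators, the sketch below runs grep -E against the sample file from the beginning of this article (recreated here so the block runs on its own). The patterns and variable names are made up for this example.

```shell
# Recreate the sample file in a scratch directory.
cd "$(mktemp -d)"
cat > test <<'EOF'
hello:world:name:phone
what's your name
you should work hard or you will failure
phone number:162-8990-0988
This dish tast good
too many problem need solve
this is a test file,write some words to test
EOF

# | is alternation: count the lines containing either word (lines 1, 2 and 4).
either=$(grep -Ec 'name|phone' test)

# () groups alternatives: "hard" or "word" (lines 3 and 7).
grouped=$(grep -Ec '(ha|wo)rd' test)

# + means "one or more": the three digit groups of the phone number (line 4).
digits=$(grep -En '[0-9]+-[0-9]+-[0-9]+' test)

# ? means "zero or one": "problem" or "problems" (line 6).
optional=$(grep -En 'problems?' test)

echo "$either $grouped"
echo "$digits"
echo "$optional"
```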

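3. awk

awk is a capable data-processing tool. sed usually operates on a whole line at a time, whereas awk tends to split each line into fields and work on those. awk processes its input one line at a time, and the field is its smallest unit of processing, which makes awk well suited to small text data sets.
A typical awk invocation looks like this:
awk 'condition1{action1} condition2{action2}' filename
awk uses $1, $2, ... to refer to the columns of each line; by default columns are separated by spaces or tabs. As a special case, $0 stands for the whole line. There are also the following built-in variables: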

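Variable	Meaning
NF	Total number of fields in the current line
NR	Number of the line awk is currently processing
FS	Current field separator; the default is a space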

[Example 1:] Print the first column, the current line number, and the number of fields.
vagrant@vagrant-ubuntu-trusty-64:~$ last|awk '{print $1 "\t  lines: " NR "\t columns: " NF}'
Output:
vagrant	  lines: 1	 columns: 10
reboot	  lines: 2	 columns: 11
vagrant	  lines: 3	 columns: 10
reboot	  lines: 4	 columns: 11
	  	  lines: 5	 columns: 0
wtmp	  lines: 6	 columns: 7

awk's comparison operators:

Operator	Meaning
>	Greater than
<	Less than
>=	Greater than or equal to
<=	Less than or equal to
==	Equal to
!=	Not equal to

[Example 2:] Use a conditional expression
vagrant@vagrant-ubuntu-trusty-64:~$ cat /etc/passwd| awk 'BEGIN {FS=":"} $3<10 {print $1 "\t " $3}'
Output:
root	 0
daemon	 1
bin	 2
sys	 3
sync	 4
games	 5
man	 6
lp	 7
mail	 8
news	 9

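Because the fields in /etc/passwd are separated by ":", FS is set beforehand in a BEGIN block so that it is already in effect when the first line is read.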
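
The same separator can also be given with awk's -F option. The sketch below compares the two forms on a small, made-up passwd-style file (passwd.sample and its three accounts are assumptions for illustration, not real data); it also passes the file name to awk directly, so cat is not needed.

```shell
cd "$(mktemp -d)"

# A made-up passwd-style file (name:password:UID:...); not real account data.
cat > passwd.sample <<'EOF'
root:x:0:0:root:/root:/bin/bash
daemon:x:1:1:daemon:/usr/sbin:/usr/sbin/nologin
alice:x:1000:1000:Alice:/home/alice:/bin/bash
EOF

# Setting FS in a BEGIN block, as in Example 2.
with_begin=$(awk 'BEGIN {FS=":"} $3<10 {print $1 "\t " $3}' passwd.sample)

# The -F option sets the same field separator from the command line.
with_flag=$(awk -F: '$3<10 {print $1 "\t " $3}' passwd.sample)

# Both forms select the accounts whose UID is below 10.
echo "$with_flag"
```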

Next, create a file named salary.txt to test awk's arithmetic. It contains the following:

Name June July August
xiaoming 3008 4999 8722
damao 788 8999 1000
meimei 8991 9011 10003

[Example 3:] Compute each employee's total salary for June, July, and August
vagrant@vagrant-ubuntu-trusty-64:~$ cat salary.txt | awk 'NR==1{printf  "%10s %10s %10s %10s %10s\n",$1,$2,$3,$4,"Total"}  NR>=2{total=$2+$4+$3 ; printf "%10s %10s %10s %10s %10.2f\n",$1,$2,$3,$4,total}'

Output:
      Name       June       July     August      Total
  xiaoming       3008       4999       8722   16729.00
     damao        788       8999       1000   10787.00
    meimei       8991       9011      10003   28005.00
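
Example 3 sums across each row. As a complementary sketch, the block below sums down each column instead, using awk's END block, which runs once after the last line has been read (the counterpart of BEGIN). The variable names june, july, august, and totals are illustrative.

```shell
cd "$(mktemp -d)"

# The salary.txt file from the text above.
cat > salary.txt <<'EOF'
Name June July August
xiaoming 3008 4999 8722
damao 788 8999 1000
meimei 8991 9011 10003
EOF

# Skip the header (NR>=2), accumulate each month, and print once in END.
totals=$(awk 'NR>=2 {june+=$2; july+=$3; august+=$4}
              END   {print june, july, august}' salary.txt)
echo "$totals"
```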

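4. sed

sed is a handy text-processing tool that can substitute, delete, and insert data.
Usage: sed [-nefri] [action]
Options and parameters:
-n: quiet mode. Normally sed prints every line it reads from stdin to the screen; with -n, only the lines that sed explicitly selects are printed.
-e: give the sed action directly on the command line
-f: read the sed actions from a file; -f filename runs the actions stored in filename
-i: modify the file in place instead of printing the result to the screen
-r: use extended regular expression syntax; the default is basic regular expression syntax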

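Action format: [n1[,n2]] function
n1,n2: optional; they usually give the line numbers the action applies to. For example, to act on lines 10 through 20, write 10,20[function].

The available functions are:
a: append; the text after a appears on a new line below the current line;
c: change; the text after c replaces the lines from n1 to n2;
d: delete; takes no argument;
i: insert; the text after i appears on a new line above the current line;
p: print the selected data; p is usually used together with sed -n;
s: substitute; performs a search and replace directly and is usually combined with regular expressions, for example 1,20s/old/new/g.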
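
To make p and i concrete, here is a minimal sketch run against the sample test file (recreated so the block is self-contained). It assumes GNU sed, which accepts the one-line form of i shown here; the variable names are illustrative.

```shell
# Recreate the sample file in a scratch directory.
cd "$(mktemp -d)"
cat > test <<'EOF'
hello:world:name:phone
what's your name
you should work hard or you will failure
phone number:162-8990-0988
This dish tast good
too many problem need solve
this is a test file,write some words to test
EOF

# -n suppresses the default output, so only lines 2-3 selected by p are printed.
selected=$(sed -n '2,3p' test)
echo "$selected"

# s with the p flag, plus -n, prints only the lines where a substitution happened.
changed=$(sed -n 's/name/NAME/p' test)
echo "$changed"

# i inserts its text above the addressed line (one-line form, GNU sed).
first=$(sed '1i # sample file' test | head -n 1)
echo "$first"
```

Without -n, the p function would print the selected lines twice, once by default and once because of p.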

[Example 1:] Delete lines 2 to 6
vagrant@vagrant-ubuntu-trusty-64:~$ nl test |sed '2,6d'
Output:
     1	hello:world:name:phone
     7	this is a test file,write some words to test

[Example 2:] Append "hello world" after the second line
vagrant@vagrant-ubuntu-trusty-64:~$ nl test |sed '2a hello world'
Output:
     1	hello:world:name:phone
     2	what's your name
hello world
     3	you should work hard or you will failure
     4	phone number:162-8990-0988
     5	This dish tast good
     6	too many problem need solve
     7	this is a test file,write some words to test
To append several lines, separate them with \n (GNU sed interprets \n in the appended text as a newline):
vagrant@vagrant-ubuntu-trusty-64:~$ nl test |sed '2a hello world\nyou name'
Output:
     1	hello:world:name:phone
     2	what's your name
hello world
you name
     3	you should work hard or you will failure
     4	phone number:162-8990-0988
     5	This dish tast good
     6	too many problem need solve
     7	this is a test file,write some words to test

[Example 3:] Replace whole lines
vagrant@vagrant-ubuntu-trusty-64:~$ nl test |sed '2,3c  changed'
Output:
     1	hello:world:name:phone
changed
     4	phone number:162-8990-0988
     5	This dish tast good
     6	too many problem need solve
     7	this is a test file,write some words to test


[Example 4:] Search and replace part of a line
vagrant@vagrant-ubuntu-trusty-64:~$ grep 162 test | sed 's/[0-9]*-[0-9]*-[0-9]*/110/g'
Output:
phone number:110

[Example 5:] Modify the file in place
vagrant@vagrant-ubuntu-trusty-64:~$ sed -i '$a #add by sed'  test
The line "#add by sed" is appended to the end of the test file.
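
Because -i overwrites the file, it can be worth keeping a copy of the original. The sketch below assumes GNU sed, where a suffix attached to -i (here .bak, an arbitrary choice) saves the original under that extension before editing. It works on a scratch copy of the sample file.

```shell
# Recreate the sample file in a scratch directory.
cd "$(mktemp -d)"
cat > test <<'EOF'
hello:world:name:phone
what's your name
you should work hard or you will failure
phone number:162-8990-0988
This dish tast good
too many problem need solve
this is a test file,write some words to test
EOF

# -i.bak edits test in place and keeps the original as test.bak (GNU sed).
sed -i.bak '$a #add by sed' test

# The edited file gains one line; the backup still has the original seven.
last_line=$(tail -n 1 test)
backup_lines=$(wc -l < test.bak)
echo "$last_line"
echo "$backup_lines"
```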

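5. cut

The cut command extracts a section from each piece of input; it processes its input line by line.
Options and parameters:
-d: followed by the delimiter character; used together with -f;
-f: splits each line into fields at the -d delimiter and selects which fields to output;
-c: extracts a fixed range of characters, counted in characters;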

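[Example 1] From the first line, extract the second field, using ":" as the delimiter
vagrant@vagrant-ubuntu-trusty-64:~$ grep hello test | cut -d':' -f 2
Output: world

[Example 2] From the first line, extract everything from the 10th character to the end
vagrant@vagrant-ubuntu-trusty-64:~$ grep hello test | cut -c 10-
Output: ld:name:phone

[Example 3] From the first line, extract the second and fourth fields, using ":" as the delimiter
vagrant@vagrant-ubuntu-trusty-64:~$ grep hello test | cut -d':' -f 2,4
Output: world:phone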
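
The examples above filter with grep first because cut -f, run on the whole file, prints any line that lacks the delimiter unchanged. The sketch below shows that behavior, the -s option that suppresses such lines, and a field range. It runs on a scratch copy of the sample file, and the variable names are illustrative.

```shell
# Recreate the sample file in a scratch directory.
cd "$(mktemp -d)"
cat > test <<'EOF'
hello:world:name:phone
what's your name
you should work hard or you will failure
phone number:162-8990-0988
This dish tast good
too many problem need solve
this is a test file,write some words to test
EOF

# Without -s, lines that contain no ':' are passed through unchanged,
# so all seven lines still appear in the output.
all_lines=$(cut -d':' -f 2 test | wc -l)

# -s suppresses lines without the delimiter, leaving field 2 of lines 1 and 4.
only_delimited=$(cut -s -d':' -f 2 test)

# A field range: fields 2 through 3 of the first line.
range=$(head -n 1 test | cut -d':' -f 2-3)

echo "$all_lines"
echo "$only_delimited"
echo "$range"
```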

*Reference: 《鳥哥的Linux私房菜基礎學習篇》 (VBird's Linux guide, basic learning volume)
