The Three Musketeers of Text Processing: awk

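The three musketeers of text processing:
    Text filtering tool: grep
    Line-oriented text editing tool: sed (supports regular expressions and also matches line by line)
    Text report generator: awk (formats output for presentation); the most powerful of the three, and a programming language in its own right.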
-------------------------------------------------


grep, sed, awk
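        grep: text filter
        sed: line editor
        awk: report generator (processes input one line at a time, iterating over the lines automatically)

            AWK a.k.a Aho, Weinberger, Kernighan

            GNU AWK is gawk; on many Linux distributions awk is a symbolic link to gawk
        $0 refers to the whole line; $1, $2, ... are the fields obtained by splitting the line on the field separator.
        NF is the number of fields in the current line
        $NF is the last field of the current line
        To process every field of a line, awk needs a loop; conditions then pick out the fields you want.
        Basic syntax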
        awk [options] 'program' file file ...
        awk [options] 'PATTERN{action}' file file ...
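A minimal sketch of $0-style field access with NF and $NF. The two input lines are made up for illustration (they only imitate the /etc/passwd layout), so the command can be run on any machine:

```shell
# Two hypothetical passwd-style lines; -F: splits each line on ":".
printf 'alice:x:1000:1000::/home/alice:/bin/bash\nbob:x:1001:1001::/home/bob:/bin/sh\n' |
awk -F: '{ print $1, NF, $NF }'   # first field, field count, last field
# alice 7 /bin/bash
# bob 7 /bin/sh
```

The empty fifth field still counts toward NF, which is why both lines report 7 fields.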
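    1. awk output
        print item1, item2,...
        Key points:
        (1) Items are separated by commas in the program; in the output they are joined by the output field separator (a single space by default);
        (2) Each item can be a string or number, a field of the current record, a variable, or an awk expression; numbers are implicitly converted to strings for output
        (3) If the items after print are omitted, it is equivalent to print $0; to output an empty line, use print ""
            Example for (3):
                    # ifconfig | awk -F: '/inet addr/ {print $2}' | awk -F' ' '{print $0}'
                        192.168.1.108  Bcast
                        127.0.0.1  Mask
                    # ifconfig | awk -F: '/inet addr/ {print $2}' | awk -F' ' '{print }'
                        192.168.1.108  Bcast
                        127.0.0.1  Mask
                    # ifconfig | awk -F: '/inet addr/ {print $2}' | awk -F' ' '{print $1}'
                        192.168.1.108
                        127.0.0.1
            Example for (2): printing literal strings (string constants must be enclosed in double quotes)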
                    # awk -F: '{print "ning"}' /etc/passwd | tail -n 4
                        ning
                        ning
                        ning
                        ning
                    # awk -F: '{print"ning",$1}' /etc/passwd | tail -n 4
                        ning liang3
                        ning ning4
                        ning gentoo
                        ning mysql
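The difference between comma-separated items and plain juxtaposition in print can be sketched as follows; the input line "a b" is an arbitrary sample:

```shell
printf 'a b\n' | awk '{ print $1, $2 }'      # comma: joined by OFS (a space by default) -> a b
printf 'a b\n' | awk '{ print $1 $2 }'       # no comma: string concatenation           -> ab
printf 'a b\n' | awk '{ print $1 "-" $2 }'   # concatenation with a literal string      -> a-b
printf 'a b\n' | awk '{ print }'             # no items: same as print $0               -> a b
```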
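    2. awk variables
        Built-in variables and user-defined variables
        2.1 Built-in variables
            FS: Field Separator, the field separator used when reading input (default: whitespace)
                Example: set the input separator by assigning ":" to FS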
                    # awk 'BEGIN{FS=":"}{print $1,$7}' /etc/passwd | tail -n 4
                        liang3 /bin/bash
                        ning4 /bin/bash
                        gentoo /bin/bash
                        mysql /bin/bash
                    # awk -F: '{print $1,$7}' /etc/passwd | tail -n 4
                        liang3 /bin/bash
                        ning4 /bin/bash
                        gentoo /bin/bash
                        mysql /bin/bash
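            RS: Record Separator, the input record separator (default: newline)
                Example: with RS=":" each colon-separated piece becomes its own record; if the separator never occurs in the input (";" below), the whole input is read as a single record and printed unchanged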
                    # awk 'BEGIN{RS=":"}{print }' /etc/passwd |head -n 7
                        root
                        x
                        0
                        0
                        root
                        /root
                        /bin/bash
                    # awk 'BEGIN{RS=";"}{print }' /etc/passwd |head -n 4
                        root:x:0:0:root:/root:/bin/bash
                        bin:x:1:1:bin:/bin:/sbin/nologin
                        daemon:x:2:2:daemon:/sbin:/sbin/nologin
                        adm:x:3:4:adm:/var/adm:/sbin/nologin
            OFS: Output Field Separator, placed between items that print separates with commas.
                Example:
                    # awk 'BEGIN{FS=":";OFS=":"}{print $1,$7}' /etc/passwd |head -n 4
                        root:/bin/bash
                        bin:/sbin/nologin
                        daemon:/sbin/nologin
                        adm:/sbin/nologin
                    # awk -F: 'BEGIN{OFS=":"}{print $1,$7}' /etc/passwd |head -n 4
                        root:/bin/bash
                        bin:/sbin/nologin
                        daemon:/sbin/nologin
                        adm:/sbin/nologin
            ORS: Output Record Separator, appended after each print
                Example:
                    # awk 'BEGIN{FS=":";ORS="#####"}{print $1,$7}' /etc/passwd |head -n 1
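To make the effect of ORS visible, here is a small sketch on made-up input (two hypothetical name:x:uid lines). ORS replaces the newline that print normally appends, so all records end up on one line; the END block adds a final newline by hand:

```shell
printf 'root:x:0\nbin:x:1\n' |
awk 'BEGIN { FS=":"; OFS="="; ORS=" | " }   # input, output-field and output-record separators
     { print $1, $3 }                       # items joined by OFS, record terminated by ORS
     END { printf "\n" }                    # printf ignores ORS, so this ends the line'
# root=0 | bin=1 | 
```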
            NF: Number of Fields in the current record
            NR: Number of Records read so far, counted across all input files
                Example:
                    # awk 'BEGIN{FS=":";OFS=":"}{print NR ,$1,$7}' /etc/passwd |head -n 4
                        1:root:/bin/bash
                        2:bin:/sbin/nologin
                        3:daemon:/sbin/nologin
                        4:adm:/sbin/nologin
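            FNR: record number within the current file; the count restarts for each file
                Example (the output below is an excerpt around the boundary between the two files):
                    # awk 'BEGIN{FS=":";OFS=":"}{print FNR ,$1,$7}' /etc/passwd /etc/shadow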
                        40:liang3:/bin/bash
                        41:ning4:/bin/bash
                        42:gentoo:/bin/bash
                        43:mysql:/bin/bash
                        1:root:
                        2:bin:
                        3:daemon:
                        4:adm:
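NR and FNR are easiest to compare side by side. The sketch below creates two throwaway files (the names one.txt and two.txt are arbitrary) and prints both counters for every line:

```shell
tmp=$(mktemp -d)                       # scratch directory for the two sample files
printf 'a\nb\n' > "$tmp/one.txt"
printf 'c\nd\n' > "$tmp/two.txt"
awk '{ print NR, FNR, $0 }' "$tmp/one.txt" "$tmp/two.txt"
# 1 1 a
# 2 2 b
# 3 1 c     <- NR keeps counting, FNR restarts at 1 for the second file
# 4 2 d
rm -rf "$tmp"
```

A common idiom built on this is the condition NR==FNR, which is true only while the first file is being read.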
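            ARGV: an array holding the command-line arguments. For awk '{print $0}' 1.txt 2.txt, ARGV[0] holds the command name awk and ARGV[1] holds 1.txt (options and the program text are not included)
                Example: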
                    # awk 'BEGIN{print ARGV[0],ARGV[1]}' /etc/passwd /etc/group
                        awk /etc/passwd
            ARGC: the number of command-line arguments, i.e. the number of elements in ARGV
                Example:
                    (ARGC is 3 here because awk itself, /etc/passwd and /etc/group each count as one argument)
                    # awk 'BEGIN{print ARGV[0],ARGV[1],ARGC}' /etc/passwd /etc/group
                        awk /etc/passwd 3
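            FILENAME: the name of the file awk is currently processing
                Example ($3 is empty here because no -F: is given, so these lines are not split on ":"):
                    # awk '{print $3,FILENAME}' /etc/passwd  | tail -n 4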
                     /etc/passwd
                     /etc/passwd
                     /etc/passwd
                     /etc/passwd
        2.2 User-defined variables
            -v var_name=VALUE
            Variable names are case-sensitive and must not begin with a digit;
            awk [options] 'program' file file ...
                (1) Variables can be defined inside the program;
                    # awk 'BEGIN{a="ning"; print a}'
                        ning
                (2) Variables can be defined on the command line with the -v option;
                    # awk -v a="ning" 'BEGIN{print a}'
                        ning
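A typical use of -v is passing a shell value into the awk program. In this sketch the shell variable min_uid and the three input lines are hypothetical; the -v assignment is performed before the program starts, so the variable is usable in every pattern:

```shell
min_uid=1000   # hypothetical threshold held in a shell variable
printf 'root:x:0\nalice:x:1000\nbob:x:1001\n' |
awk -F: -v min="$min_uid" '$3 >= min { print $1 }'   # min is an awk variable, not a shell one
# alice
# bob
```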
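    3. The awk printf statement
        Usage: printf format, item1, item2, ...
            Key points:
                (1) The format must be specified
                (2) No newline is added automatically; write \n explicitly where a line break is needed
                (3) format specifies the output format of each item that follows
        Every format specifier starts with % followed by a conversion character;
            %c: a single character (a numeric argument is printed as the character with that code)
            %d, %i: decimal integer
            %e, %E: scientific notation
            %f: floating-point number (e.g. %15.2f: width 15, with the digits after the dot giving the precision)
            %g, %G: scientific notation or floating-point format, whichever is shorter
            %s: string
            %u: unsigned integer;
            %%: a literal %;
        Modifiers that can be used in a format (N stands for a number):
            N: display width
            -: left-justify (%-20s pads the string to 20 characters, left-justified; the default is right-justified)
            +: show the sign of a number
            .N: precision
        Examples: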
            # awk -F: '{printf "%20s,%30s\n",$1,$7}' /etc/passwd | tail -n 4
              liang3,                     /bin/bash
               ning4,                     /bin/bash
              gentoo,                     /bin/bash
               mysql,                     /bin/bash
            # awk -F: '{printf "%20s,%-30s\n",$1,$7}' /etc/passwd | tail -n 4    (second column left-justified in 30 characters)
              liang3,/bin/bash
               ning4,/bin/bash
              gentoo,/bin/bash
               mysql,/bin/bash
            # awk -F: '{printf "%20s %-30s\n",$1,$7}' /etc/passwd | tail -n 4    (any literal characters in the format string are printed as-is)
              liang3 /bin/bash
               ning4 /bin/bash
              gentoo /bin/bash
               mysql /bin/bash
            # awk 'BEGIN{printf "%f\n",3.1415}'    (%f defaults to 6 decimal places; missing digits are padded with zeros)
                3.141500
            # awk 'BEGIN{printf "%f\n",3.1415926}'    (extra digits are rounded)
                3.141593
            # awk 'BEGIN{printf "%e\n",3141.5926}'    (scientific notation)
                3.141593e+03
            # awk 'BEGIN{printf "%e\n",3.1415926}'
                3.141593e+00
            # awk 'BEGIN{printf "%20e\n",3.1415926}'    (a field width can also be specified)
                        3.141593e+00
            # awk 'BEGIN{printf "%-20e\n",3.1415926}'
                3.141593e+00
            # awk 'BEGIN{printf "%20.2f\n",3.1415926}'    (width 20, precision of 2 decimal places)
                                3.14
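Width and alignment modifiers are mostly used to line up columns. A small sketch on made-up name:uid input, with a header row printed from BEGIN:

```shell
printf 'root:0\nalice:1000\n' |
awk -F: 'BEGIN { printf "%-10s|%6s\n", "USER", "UID" }   # header: left-justified name, right-justified label
               { printf "%-10s|%6d\n", $1, $2 }          # %-10s pads on the right, %6d pads on the left'
# USER      |   UID
# root      |     0
# alice     |  1000
```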
    4、awk輸出重定向
        print items > output-file
        print items >> output-file
        print items | command
        Special file names:
        /dev/stdin: standard input
        /dev/stdout: standard output
        /dev/stderr: standard error
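The three output forms can be sketched together. The file name names.txt, the awk variable out and the sample input are assumptions made for this illustration; note that the target of > must be a string expression inside awk:

```shell
tmp=$(mktemp -d)
printf 'b 2\na 1\nc 3\n' |
awk -v out="$tmp/names.txt" '
    { print $1 > out }                      # first write truncates the file, later writes append
    { print $2 | "sort -rn" }               # pipe the second column to an external command
    END { print "done" > "/dev/stderr" }    # diagnostic message goes to standard error'
cat "$tmp/names.txt"                        # b, a, c in input order
rm -rf "$tmp"
```

Within one awk run, > opens the file once and keeps writing to it; >> differs only in that an existing file is not truncated on that first open.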
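    5. awk operators
        Arithmetic operators:
            x+y
            x-y
            x*y
            x/y
            x**y, x^y: exponentiation (x^y is the portable form)
            x%y
            -x: negation
            +x: conversion to a number
        String operator: concatenation (there is no operator symbol; adjacent expressions are joined)
        Assignment operators:
            =
            +=
            -=
            *=
            /=
            %=
            ^=
            **=
            ++
            --
            If a regular-expression pattern consists of the = sign itself, write it as /=/ (some awk implementations may read a leading /= as the division-assignment operator; /[=]/ avoids the ambiguity)
        Comparison operators:
            <
            <=
            >
            >=
            ==
            !=
            ~: pattern match; true if the string on the left is matched by the pattern on the right, false otherwise
            !~: true if the string on the left is not matched by the pattern on the right
        Logical operators:
            &&: AND
            ||: OR
            !: NOT
        Conditional expression:
            selector?if-true-expression:if-false-expression
            Explanation: if the condition selector is true, if-true-expression is evaluated; otherwise if-false-expression is evaluated. The value of the whole expression can be assigned to a variable.
            Example: classify each user in /etc/passwd by whether the UID is at least 500, store the result in a variable, and print it with the user name.
                # awk -F: '{utype=($3>=500?"common user":"admin or system user");print $1,"is",utype}' /etc/passwd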
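A short sketch combining a comparison, the ~ match operator, the conditional expression and string concatenation. The name:uid:shell input lines are invented for the example, and the 500 threshold simply follows the convention used above:

```shell
printf 'alice:1000:/bin/bash\ndaemon:2:/sbin/nologin\n' |
awk -F: '{
    kind  = ($2 >= 500) ? "common" : "system"   # conditional expression on a numeric comparison
    login = ($3 ~ /sh$/) ? "yes" : "no"         # ~ tests one field against a regular expression
    print $1 ": " kind ", login=" login         # adjacent expressions are concatenated
}'
# alice: common, login=yes
# daemon: system, login=no
```

Assigning the result of the whole conditional expression, rather than assigning inside each branch, keeps the statement unambiguous across awk implementations.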
     
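    6. Patterns
        awk [options] 'PATTERN{action}' file file ...
        (1) Regexp: a regular expression, written as /PATTERN/
            Only lines matched by /PATTERN/ are processed;
            Example: show the lines containing the string root (the same result as grep)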
                # awk -F: '/root/{print $0}' /etc/passwd
                    root:x:0:0:root:/root:/bin/bash
                    operator:x:11:0:operator:/root:/sbin/nologin
                # grep 'root' /etc/passwd
                    root:x:0:0:root:/root:/bin/bash
                    operator:x:11:0:operator:/root:/sbin/nologin
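        (2) Expression: an expression; the condition is met when its result is non-zero or a non-empty string;
            Only lines that satisfy the condition are processed
            Example: show the name and UID of users in /etc/passwd whose UID is at least 500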
                # awk -F: '$3>=500{print $1,$3}' /etc/passwd | tail -n 4
                    liang3 504
                    ning4 505
                    gentoo 506
                    liang4 507
                printf can also be used here to format the output
                # awk -F: '$3>=500{printf "%12s %-5d\n",$1,$3}' /etc/passwd | tail -n 4
                      liang3 504 
                       ning4 505 
                      gentoo 506 
                      liang4 507
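        (3) Ranges: a range of lines written as /pat1/,/pat2/, similar to address ranges in sed
            Only lines inside the range are processed:
            e.g. /root/,/bin/ covers the lines from one that matches root through the next line that matches bin (both inclusive); the range can start again at a later match
                Example: show the lines of /etc/passwd from a line matching 0 through a line matching bin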
                    # awk -F: '/0/,/bin/{print $0}' /etc/passwd | tail -n 4
                        liang3:x:504:504::/home/liang3:/bin/bash
                        ning4:x:505:505::/home/ning4:/bin/bash
                        gentoo:x:506:506::/home/gentoo:/bin/bash
                        liang4:x:507:507::/home/liang4:/bin/bash
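Because almost every line of /etc/passwd matches both /0/ and /bin/, the example above is hard to read as a range. The sketch below uses made-up input with explicit START and END markers, and also shows how a range of line numbers can be expressed with NR:

```shell
# Range between two regular expressions; a range left open runs to the end of input.
printf 'one\nSTART\ntwo\nEND\nthree\nSTART\nfour\n' |
awk '/START/,/END/ { print NR ": " $0 }'
# 2: START
# 3: two
# 4: END
# 6: START
# 7: four

# Range by line number, written as an ordinary expression pattern.
printf 'one\nSTART\ntwo\nEND\nthree\n' | awk 'NR >= 2 && NR <= 3'
# START
# two
```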
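        (4) BEGIN/END: special patterns; the action runs once before any input is read (BEGIN) or once after all input has been processed (END)
                Example: print a fixed header before the first line of output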
                    # awk -F: 'BEGIN{print "this is boy!!"}{print $1}' /etc/passwd | head -n 4
                        this is boy!!
                        root
                        bin
                        daemon
                    Print a fixed footer after the last line of output
                    # awk -F: '{print $1}END{print "you is boy!!"}' /etc/passwd | tail -n 4
                        gentoo
                        mysql
                        liang4
                        you is boy!!
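Beyond headers and footers, END is the usual place to report values accumulated while reading the input. A minimal sketch that sums a column of made-up numbers:

```shell
printf '3\n4\n5\n' |
awk 'BEGIN { print "values:" }                    # runs once, before the first line
           { sum += $1; print "  " $1 }           # runs for every line
     END   { print "total=" sum, "lines=" NR }    # runs once; NR still holds the last record number'
# values:
#   3
#   4
#   5
# total=12 lines=3
```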
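        (5) Empty: the empty pattern matches every line (when no pattern is given, the action is applied to all lines unconditionally)
    7. Common actions
            awk [options] 'PATTERN{action}' file file ...
                (1) Expressions
                (2) Control statements
                (3) Compound statements
                (4) Input statements
                (5) Output statements
    8. Control statements
    (1) if-else conditional statement (for example, to test the value of a field)
            Format: if (condition) {statements when true} else {statements when false}
            Example: label each user according to whether the UID is at least 500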
                # awk -F: '{if ($3>=500){print $1,"is a common user"}else{print $1,"is an admin or system user"}}' /etc/passwd | tail -n 4
                        ning4 is a common user
                        gentoo is a common user
                        mysql is an admin or system user
                        liang4 is a common user
                or
                # awk -F: '{utype=($3>=500?"common user":"admin or system user");print $1,"is",utype}' /etc/passwd |tail -n 4
                Show lines with at least 5 fields
                # awk '{if (NF>=5){print}}' /etc/inittab  | tail -n 5
                    #   0 - halt (Do NOT set initdefault to this)
                    #   1 - Single user mode
                    #   2 - Multiuser, without NFS (The same as 3, if you do not have networking)
                    #   3 - Full multiuser mode
                    #   6 - reboot (Do NOT set initdefault to this)
                Show lines with at most 20 fields
                # awk '{if (NF<=20) {print }}' /etc/inittab | tail -5
                    #   4 - unused
                    #   5 - X11
                    #   6 - reboot (Do NOT set initdefault to this)
                    #
                    id:3:initdefault:
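    (2) while loop
            Format: while (condition) {loop body; repeated while the condition is true}
                length() built-in function: returns the length of a string
            Example: print the odd-numbered fields of each line in /etc/inittab
                    # awk '{i=1;while(i<=NF){printf "%s",$i;i+=2};printf "\n" }' /etc/inittab
                    or
                    # awk '{i=1;while(i<=NF){printf "%s",$i;i+=2};print"" }' /etc/inittab
                Print the fields in /etc/inittab that are at least 6 characters long
                    # awk '{i=1;while(i<=NF){if(length($i)>=6){print $i};i++}}' /etc/inittab
            do-while loop: the body runs at least once, and the condition is checked afterwards
                    # awk 'BEGIN{i=1;do{print "uui";i++}while(i<=5)}'
    (3) for loop
            Format: for (initialization; condition; update) {loop body}
                Example: print the odd-numbered fields in /etc/inittab
                    # awk '{for (i=1;i<=NF;i+=2){printf "%s",$i};print ""}' /etc/inittab
                Print the fields in /etc/inittab that are at least 6 characters long
                    # awk '{for (i=1;i<=NF;i++){if (length($i)>=6) print $i}}' /etc/inittab
        A for loop can also iterate over the elements of an array;
                Syntax: for (i in array) {for body}
                    Explanation: i takes each index (subscript) of the array in turn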
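The practical difference between while and do-while shows up when the condition is false from the start. In this sketch the starting value 10 is arbitrary; the program runs entirely in BEGIN, so no input file is needed:

```shell
awk 'BEGIN {
    i = 10
    while (i < 5) { print "while", i; i++ }          # condition checked first: body never runs
    do { print "do-while", i; i++ } while (i < 5)    # condition checked last: body runs once
}'
# do-while 10
```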
    
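     (4)    next
            Stops processing the current line early and moves on to the next line:
            Example: show the user name and UID of users whose UID is odd
                # awk -F: '{if ($3%2==0) next; print $1,$3}' /etc/passwd
                or, to show only the odd-numbered lines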
                # awk -F: '{if (NR%2==0) next; print NR,$1}' /etc/passwd | head -n 4
                    1 root
                    3 daemon
                    5 lp
                    7 shutdown
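next is often used as an early filter so that later rules only see the lines of interest. A sketch that skips comments and blank lines in a made-up key=value input:

```shell
printf '# comment\n\nkey=value\n# another\nname=awk\n' |
awk -F= '/^#/ || NF == 0 { next }     # skip comment lines and empty lines
         { print $1 " -> " $2 }       # only reached for the remaining lines'
# key -> value
# name -> awk
```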
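9. Arrays
    array [index-expression]
        index-expression: any string can be used; if an array element does not exist yet, referencing it makes awk create the element and initialize it to the empty string
        Therefore, to test whether an array contains an element, use the form "index in array", e.g. "i in stat"
        To iterate over every element of an array, use this special construct:
            for (var in array) {for body}
            Here var iterates over the indices of array;
        stat["LISTEN"]++
        stat["ESTABLISHED"]++
    Example: count the TCP connections in each state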
        # netstat -tan | awk '/^tcp/{++stat[$NF]}END{for (i in stat){print i, stat[i]}}'
            ESTABLISHED 1
            LISTEN 14
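         Count how many requests each IP address made
            # awk '{ip[$1]++}END{for (i in ip){print i,ip[i]}}' /etc/httpd/logs/access_log
                192.168.1.101 2000
            Explanation: the client IP in $1 is used as the array index, and that element is incremented for every line from that IP. In the END block the for loop iterates over the indices, so i is each IP address in turn and ip[i] is the number of requests from it.
        To try this out:
            # watch -n 1 'netstat -tn'    (watch the connection table as it changes)
            # ab -n 1000 -c 100 http://172.16.100.7/index.html
        Deleting an array element:
            delete array[index]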
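The counting idiom, the in test and delete can be tried together without a live netstat. The state names below are fed in as made-up input, and count is a hypothetical array name:

```shell
printf 'LISTEN\nESTABLISHED\nLISTEN\nTIME_WAIT\n' |
awk '{ count[$1]++ }                                      # the state name is the array index
END {
    delete count["TIME_WAIT"]                             # remove one element
    if (!("TIME_WAIT" in count)) print "TIME_WAIT removed"
    if ("CLOSED" in count) print "unexpected"             # "in" tests without creating the element
    n = 0; for (k in count) n++                           # count the remaining indices
    print "LISTEN=" count["LISTEN"], "ESTABLISHED=" count["ESTABLISHED"], "keys=" n
}'
# TIME_WAIT removed
# LISTEN=2 ESTABLISHED=1 keys=2
```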
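10. awk built-in functions

  split(string,array[,fieldsep[,seps]])
    split("root:x:0:0",user,":")
        Explanation: splits the string on ":" and stores the pieces in user
    Purpose: splits string into pieces using fieldsep as the separator and stores the pieces in the array named array; array indices start at 1
        Example: show each index with the value stored at it (for-in visits the indices in no particular order)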
            # awk 'BEGIN{split("root:x:0:0",user,":"); for (i in user) print i, user[i]}'
                4 0
                1 root
                2 x
                3 0
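                Explanation: the pieces are stored in user, so user[1]="root", user[2]="x", ...
                            i iterates over the indices
                            print i shows the index
                            user[i] is the value stored at that index
        Example: count the current TCP connections per client IP (from netstat -tn)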
            # netstat -tn | awk '/^tcp/{lens=split($5,client,":");ip[client[lens-1]]++}END {for (i in ip) print i,ip[i]}'
                172.16.250.91 1
                172.16.3.2 1
                192.168.1.100 1
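            Explanation: lens=split($5,client,":")
                split returns the number of pieces it produced; $5 has the form address:port, so client[lens-1] is the address
  
    Official gawk documentation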
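Since for-in does not guarantee any order, the return value of split is the usual way to walk the pieces in sequence. A minimal sketch using the same sample string as above:

```shell
awk 'BEGIN {
    n = split("root:x:0:0", user, ":")          # n is the number of pieces (4 here)
    for (i = 1; i <= n; i++) print i, user[i]   # indices 1..n in order
}'
# 1 root
# 2 x
# 3 0
# 4 0
```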
http://www.gnu.org/software/gawk/manual/gawk.html#Built_002din
