The Three Musketeers of Text Processing: awk

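The three text tools:
    Text filter: grep
    Line-oriented text editor: sed (supports regular expressions and also processes input line by line)
    Text report generator: awk (the "beautician" that formats output nicely); the most powerful of the three, and a programming language in its own right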
-------------------------------------------------


grep, sed, awk
        grep: text filter
        sed: line editor
        awk: report generator    (processes input line by line, looping over it automatically)

            AWK: named after its authors Aho, Weinberger and Kernighan

            GNU AWK, gawk: on many Linux systems awk is a link to gawk
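        $0 refers to the whole line; $1, $2, ... are the fields split out by the field separator.
        NF is the number of fields on the current line
        $NF is the last field of the current line
        awk loops over the lines by itself; to handle each field within a line you need a loop of your own, which lets you pick out the fields you want by condition
        Basic syntax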
        awk [options] 'program' file file ...
        awk [options] 'PATTERN{action}' file file ...
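A minimal sketch of the field variables in practice: the hypothetical shell function `show_fields` prints NF, $1 and $NF for each line of made-up input, relying on the default FS (runs of blanks).

```shell
# Hypothetical helper: for each input line print NF (field count),
# $1 (first field) and $NF (last field)
show_fields() {
    awk '{ print NF, $1, $NF }'
}

# Usage with two made-up lines
printf 'alpha beta gamma\none\n' | show_fields
# 3 alpha gamma
# 1 one one
```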
    1. awk output
        print item1, item2,...
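        Key points:
        (1) In the program the items are separated by commas; in the output they are joined by the output field separator (OFS, a space by default);
        (2) Each item can be a string or a number, a field of the current record, a variable, or an awk expression; numbers are implicitly converted to strings for output
        (3) If the items after print are omitted, it is equivalent to print $0; to print an empty line, use print ""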
            (3)
                Examples:
                    # ifconfig | awk -F: '/inet addr*/ {print $2}' | awk -F' ' '{print $0}'
                        192.168.1.108  Bcast
                        127.0.0.1  Mask
                    # ifconfig | awk -F: '/inet addr*/ {print $2}' | awk -F' ' '{print }'
                        192.168.1.108  Bcast
                        127.0.0.1  Mask
                    # ifconfig | awk -F: '/inet addr*/ {print $2}' | awk -F' ' '{print $1}'
                        192.168.1.108
                        127.0.0.1
            (2) Example: printing literal strings (string literals must be enclosed in double quotes "")
                    # awk -F: '{print "ning"}' /etc/passwd | tail -n 4
                        ning
                        ning
                        ning
                        ning
                    # awk -F: '{print"ning",$1}' /etc/passwd | tail -n 4
                        ning liang3
                        ning ning4
                        ning gentoo
                        ning mysql
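The comma rule in point (1) is easy to miss, so here is a small sketch with a hypothetical `print_demo` function and a made-up passwd-style line: commas insert OFS, writing items side by side concatenates them, and `print ""` emits an empty line.

```shell
# Hypothetical helper: contrast comma-separated items (joined with OFS, a space
# by default) with items written side by side (concatenated with nothing between)
print_demo() {
    awk -F: '{ print $1, $3; print $1 $3; print "user=" $1; print "" }'
}

printf 'root:x:0\n' | print_demo
# root 0
# root0
# user=root
# (followed by an empty line from print "")
```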
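    2. awk variables
        Built-in variables and user-defined variables
        2.1 Built-in variables
            FS: Field Separator, the input field separator
                Example: set the input field separator by assigning ":" to FS (same effect as -F:)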
                    # awk 'BEGIN{FS=":"}{print $1,$7}' /etc/passwd | tail -n 4
                        liang3 /bin/bash
                        ning4 /bin/bash
                        gentoo /bin/bash
                        mysql /bin/bash
                    # awk -F: '{print $1,$7}' /etc/passwd | tail -n 4
                        liang3 /bin/bash
                        ning4 /bin/bash
                        gentoo /bin/bash
                        mysql /bin/bash
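            RS: Record Separator, the input record separator (a newline by default)
                Example: with RS=":" every colon-separated piece is treated as one record; if the separator never occurs in the input, the whole input becomes a single record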
                    # awk 'BEGIN{RS=":"}{print }' /etc/passwd |head -n 7
                        root
                        x
                        0
                        0
                        root
                        /root
                        /bin/bash
                    # awk 'BEGIN{RS=";"}{print }' /etc/passwd |head -n 4
                        root:x:0:0:root:/root:/bin/bash
                        bin:x:1:1:bin:/bin:/sbin/nologin
                        daemon:x:2:2:daemon:/sbin:/sbin/nologin
                        adm:x:3:4:adm:/var/adm:/sbin/nologin
            OFS: Output Field Separator, the output field separator.
                Example:
                    # awk 'BEGIN{FS=":";OFS=":"}{print $1,$7}' /etc/passwd |head -n 4
                        root:/bin/bash
                        bin:/sbin/nologin
                        daemon:/sbin/nologin
                        adm:/sbin/nologin
                    # awk -F: 'BEGIN{OFS=":"}{print $1,$7}' /etc/passwd |head -n 4
                        root:/bin/bash
                        bin:/sbin/nologin
                        daemon:/sbin/nologin
                        adm:/sbin/nologin
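            ORS: Output Record Separator, the output record separator (a newline by default)
                Example: each record now ends with "#####" instead of a newline, so the whole output comes out as one line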
                    # awk 'BEGIN{FS=":";ORS="#####"}{print $1,$7}' /etc/passwd |head -n 1
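A sketch combining FS, OFS and ORS on two made-up /etc/passwd-style lines (the helper name `sep_demo` is hypothetical). Because ORS replaces the newline, the output has no line break of its own.

```shell
# Made-up sample lines in /etc/passwd format
sample='root:x:0:0:root:/root:/bin/bash
bin:x:1:1:bin:/bin:/sbin/nologin'

# Hypothetical helper: FS splits input on ":", OFS joins the printed fields
# with "|", and ORS ends each output record with ";" instead of a newline
sep_demo() {
    awk 'BEGIN { FS = ":"; OFS = "|"; ORS = ";" } { print $1, $7 }'
}

printf '%s\n' "$sample" | sep_demo
echo    # ORS replaced the newline, so add one for the terminal
# root|/bin/bash;bin|/sbin/nologin;
```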
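            NF: Number of Fields, the number of fields in the current record
            NR: Number of Records, the current record (line) number, counted cumulatively across all input files;
                Example: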
                    #awk 'BEGIN{FS=":";OFS=":"}{print NR ,$1,$7}' /etc/passwd |head -n 4
                        1:root:/bin/bash
                        2:bin:/sbin/nologin
                        3:daemon:/sbin/nologin
                        4:adm:/sbin/nologin
            FNR: the record (line) number, counted separately for each file
                Example (reading /etc/shadow requires root): # awk 'BEGIN{FS=":";OFS=":"}{print FNR ,$1,$7}' /etc/passwd /etc/shadow
                        40:liang3:/bin/bash
                        41:ning4:/bin/bash
                        42:gentoo:/bin/bash
                        43:mysql:/bin/bash
                        1:root:
                        2:bin:
                        3:daemon:
                        4:adm:
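To see NR and FNR side by side without needing root for /etc/shadow, this sketch uses two throwaway files from `mktemp` (standing in for the real files) and a hypothetical helper `nr_fnr`.

```shell
# Hypothetical helper: print NR, FNR and the line itself for all files given
nr_fnr() {
    awk '{ print NR, FNR, $0 }' "$@"
}

# Two throwaway files stand in for /etc/passwd and /etc/shadow
f1=$(mktemp)
f2=$(mktemp)
printf 'a\nb\n' > "$f1"
printf 'c\n' > "$f2"
nr_fnr "$f1" "$f2"
# 1 1 a
# 2 2 b
# 3 1 c     <- NR keeps counting, FNR restarts for the second file
rm -f "$f1" "$f2"
```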
            ARGV: an array holding the command-line arguments. For awk '{print $0}' 1.txt 2.txt, ARGV[0] holds the command name awk and ARGV[1] holds 1.txt
                Example:
                    # awk 'BEGIN{print ARGV[0],ARGV[1]}' /etc/passwd /etc/group
                        awk /etc/passwd
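            ARGC: the number of command-line arguments, i.e. the number of elements in ARGV (ARGV[0] included)
                Example:
                    (Why 3? The arguments are awk, /etc/passwd and /etc/group; the program text itself is not counted)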
                    # awk 'BEGIN{print ARGV[0],ARGV[1],ARGC}' /etc/passwd /etc/group
                        awk /etc/passwd 3
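A sketch that walks ARGV with a plain index loop (hypothetical helper `show_args`). It skips ARGV[0], whose exact value depends on how awk was invoked, and since only BEGIN is used the file arguments are never opened.

```shell
# Hypothetical helper: print ARGC and ARGV[1..ARGC-1]; only a BEGIN block is
# used, so awk never tries to read the files named on the command line
show_args() {
    awk 'BEGIN { print "ARGC=" ARGC; for (i = 1; i < ARGC; i++) print i, ARGV[i] }' "$@"
}

show_args /etc/passwd /etc/group
# ARGC=3
# 1 /etc/passwd
# 2 /etc/group
```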
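            FILENAME: the name of the file awk is currently processing;
                Example: (without -F:, /etc/passwd lines typically contain no blanks, so $3 is empty and each output line starts with the space inserted by OFS)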
                    # awk '{print $3,FILENAME}' /etc/passwd  | tail -n 4
                     /etc/passwd
                     /etc/passwd
                     /etc/passwd
                     /etc/passwd
        2.2 User-defined variables
            -v var_name=VALUE
            Variable names are case-sensitive and cannot start with a digit;
            awk [options] 'program' file file ...
                (1) Variables can be defined inside the program;
                    # awk 'BEGIN{a="ning"; print a}'
                        ning
                (2) Variables can be defined on the command line with the -v option;
                    # awk -v a="ning" 'BEGIN{print a}'
                        ning
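A common use of -v is passing a shell variable into the program instead of splicing it into the quoted program text. The names `min_uid` and `list_users` below are hypothetical, and the input lines are made up.

```shell
# Hypothetical shell variable and helper: -v copies the shell value into the
# awk variable min before any input is read
min_uid=500
list_users() {
    awk -F: -v min="$min_uid" '$3 >= min { print $1 }'
}

printf 'root:x:0\nliang3:x:504\n' | list_users
# liang3

# Variable names are case-sensitive: a and A are two different variables
awk -v a="ning" -v A="NING" 'BEGIN { print a, A }'
# ning NING
```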
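    3. awk's printf command
        Usage: printf format, item1, item2, ...
            Key points:
                (1) A format must be given
                (2) No newline is added automatically; add \n explicitly when you need one
                (3) The format specifies an output format for each of the items that follow
        Format specifiers start with % followed by a character:
            %c: display a character (for a numeric argument, the character with that code; for a string, its first character)
            %d, %i: decimal integer
            %e, %E: scientific notation
            %f: floating-point number (e.g. %15.2f: the number after the dot sets the precision)
            %g, %G: scientific or floating-point notation, whichever is more compact
            %s: string
            %u: unsigned integer;
            %%: a literal %;
        Modifiers that can be used in the format:
            #: field width (a number, e.g. %20s)
            -: left-justify (%-20s left-justifies the string in a 20-character field; the default is right-justified)
            +: always show the sign of numeric values
            .#: precision
        Examples: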
            # awk -F: '{printf "%20s,%30s\n",$1,$7}' /etc/passwd | tail -n 4
              liang3,                     /bin/bash
               ning4,                     /bin/bash
              gentoo,                     /bin/bash
               mysql,                     /bin/bash
            # awk -F: '{printf "%20s,%-30s\n",$1,$7}' /etc/passwd | tail -n 4 (left-justified in 30 characters)
              liang3,/bin/bash                    
               ning4,/bin/bash                    
              gentoo,/bin/bash                    
               mysql,/bin/bash
            # awk -F: '{printf "%20s %-30s\n",$1,$7}' /etc/passwd | tail -n 4 (any characters placed between the specifiers are printed literally)
              liang3 /bin/bash                    
               ning4 /bin/bash                    
              gentoo /bin/bash                    
               mysql /bin/bash
            # awk 'BEGIN{printf "%f\n",3.1415}' (pads with zeros to the default precision of 6 decimal places)
                3.141500
            # awk 'BEGIN{printf "%f\n",3.1415926}' (digits beyond the precision are rounded)
                3.141593
            # awk 'BEGIN{printf "%e\n",3141.5926}' (scientific notation)
                3.141593e+03
            # awk 'BEGIN{printf "%e\n",3.1415926}'
                3.141593e+00
            # awk 'BEGIN{printf "%20e\n",3.1415926}' (a width can also be given)
                        3.141593e+00
            # awk 'BEGIN{printf "%-20e\n",3.1415926}'
                3.141593e+00  
            # awk 'BEGIN{printf "%20.2f\n",3.1415926}' (width 20, 2 decimal places)
                                3.14
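A minimal sketch combining width, left-justification and precision into an aligned table; `uid_table` is a hypothetical helper, the input lines are made up, and the `$3 / 3` column exists only to show %f.

```shell
# Hypothetical helper: name left-justified in 10 columns, UID right-justified
# in 5, then an illustrative float with 2 decimals in a 6-character field
uid_table() {
    awk -F: '{ printf "%-10s|%5d|%6.2f\n", $1, $3, $3 / 3 }'
}

printf 'root:x:0\nliang3:x:504\n' | uid_table
# root      |    0|  0.00
# liang3    |  504|168.00
```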
    4. awk output redirection
        print items > output-file
        print items >> output-file
        print items | command
        Special files:
        /dev/stdin: standard input
        /dev/stdout: standard output
        /dev/stderr: standard error
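A sketch of two of the redirections listed above, assuming a hypothetical helper `sort_names` and made-up input: output is piped through the external `sort`, and a status line goes to /dev/stderr. `close("sort")` waits for sort to finish, so its output comes out before anything printed later.

```shell
# Hypothetical helper: pipe the names through an external sort with
# print | "sort", then write a status line to standard error
sort_names() {
    awk -F: '{ print $1 | "sort" }
             END { close("sort"); print "done" > "/dev/stderr" }'
}

printf 'mysql:x\nbin:x\nroot:x\n' | sort_names
# stdout: bin, mysql, root (one per line)    stderr: done
```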
    5. awk operators
        Arithmetic operators:
            x+y
            x-y
            x*y
            x/y
            x**y, x^y: exponentiation (^ is the POSIX form; ** is a gawk extension)
            x%y
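            -x: negation
            +x: convert to a number
        String operator: concatenation (written by placing the operands side by side, with no operator symbol)
        Assignment operators: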
            =
            +=
            -=
            *=
            /=
            %=
            ^=
            **=
            ++
            --
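            Note: since /= is also an assignment operator, some awk versions misparse a regex constant whose first character is = (such as /==/); writing the leading = as [=] (e.g. /[=]=/) avoids the ambiguity
        Comparison operators: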
            <
            <=
            >
            >=
            ==
            !=
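            ~: pattern match; true if the string on the left is matched by the regular expression on the right, otherwise false
            !~: negated match; true if the string on the left is not matched by the regular expression on the right
        Logical operators:
            &&: AND
            ||: OR
            !: NOT
        Conditional expression:
            selector?if-true-expression:if-false-expression
            Explanation: if the condition selector is true, if-true-expression is evaluated (here an assignment expression); otherwise if-false-expression is evaluated (also an assignment expression)
            Example: classify the users in /etc/passwd by UID (500 or greater, or not) and print each user name together with the value assigned to the variable.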
                # awk -F: '{$3>=500?utype="common user":utype="admin or system user";print $1,"is",utype}' /etc/passwd
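The example above assigns inside both branches of ?:. A more common style, sketched here with a hypothetical `classify` function and made-up lines, assigns the result of ?: once; it also shows ~ and string concatenation.

```shell
# Hypothetical helper: ?: is evaluated once and its result assigned;
# ~ tests a regex match; strings written side by side are concatenated
classify() {
    awk -F: '{
        utype = ($3 >= 500 ? "common user" : "admin or system user")
        login = ($7 ~ /nologin$/ ? "no login" : "login shell")
        print $1 ": " utype ", " login
    }'
}

printf '%s\n' 'root:x:0:0:root:/root:/bin/bash' \
              'bin:x:1:1:bin:/bin:/sbin/nologin' \
              'liang3:x:504:504::/home/liang3:/bin/bash' | classify
# root: admin or system user, login shell
# bin: admin or system user, no login
# liang3: common user, login shell
```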
     
    6. Patterns
        awk [options] 'PATTERN{action}' file file ...
        (1) Regexp: a regular expression, written as /PATTERN/
            Only lines matched by /PATTERN/ are processed;
            Example: show the lines containing the string root
                # awk -F: '/root/{print $0}' /etc/passwd
                    root:x:0:0:root:/root:/bin/bash
                    operator:x:11:0:operator:/root:/sbin/nologin
                # grep 'root' /etc/passwd
                    root:x:0:0:root:/root:/bin/bash
                    operator:x:11:0:operator:/root:/sbin/nologin
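        (2) Expression: the condition is met when the result is non-zero or a non-empty string;
            Only lines that satisfy the condition are processed
            Example: show the users in /etc/passwd whose UID is 500 or greater, together with their UID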
                # awk -F: '$3>=500{print $1,$3}' /etc/passwd | tail -n 4
                    liang3 504
                    ning4 505
                    gentoo 506
                    liang4 507
                printf can also be used here to format the output
                # awk -F: '$3>=500{printf "%12s %-5d\n",$1,$3}' /etc/passwd | tail -n 4
                      liang3 504 
                       ning4 505 
                      gentoo 506 
                      liang4 507
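        (3) Ranges: a line range, like sed's address ranges: /start/,/end/
            Only lines within the range are processed:
            e.g. /root/,/bin/ selects from a line matching root through the next line matching bin
                Example: show the lines of /etc/passwd from a line matching the string 0 through the next line matching bin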
                    # awk -F: '/0/,/bin/{print $0}' /etc/passwd | tail -n 4
                        liang3:x:504:504::/home/liang3:/bin/bash
                        ning4:x:505:505::/home/ning4:/bin/bash
                        gentoo:x:506:506::/home/gentoo:/bin/bash
                        liang4:x:507:507::/home/liang4:/bin/bash
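        (4) BEGIN/END: special patterns whose actions run exactly once, before any input is read (BEGIN) or after all input has been processed (END)
                Example: print some text before the first line of output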
                    # awk -F: 'BEGIN{print "this is boy!!"}{print $1}' /etc/passwd | head -n 4
                        this is boy!!
                        root
                        bin
                        daemon
                    Print some text after the last line of output
                    # awk -F: '{print $1}END{print "you is boy!!"}' /etc/passwd | tail -n 4
                        gentoo
                        mysql
                        liang4
                        you is boy!!
        (5) Empty: the empty pattern matches every line (a program with no pattern simply processes every line unconditionally; not covered further here)
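A sketch putting several pattern types into one program (hypothetical helper `report`, made-up input). Every pattern is tested against every line, so one line can trigger several actions.

```shell
# Hypothetical helper: BEGIN prints a header, a regex pattern and an
# expression pattern are both tested on every line, END prints totals
report() {
    awk -F: '
        BEGIN      { print "nologin accounts:" }
        /nologin$/ { print "  " $1; n++ }
        $3 >= 500  { common++ }
        END        { printf "%d nologin, %d common\n", n, common }
    '
}

printf '%s\n' 'root:x:0:0:root:/root:/bin/bash' \
              'bin:x:1:1:bin:/bin:/sbin/nologin' \
              'liang3:x:504:504::/home/liang3:/bin/bash' | report
# nologin accounts:
#   bin
# 1 nologin, 1 common
```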
    7. Common actions
            awk [options] 'PATTERN{action}' file file ...
                (1) Expressions
                (2) Control statements
                (3) Compound statements
                (4) Input statements
                (5) Output statements
    8. Control statements
    (1) if-else conditional statement (e.g. to test a field)
            Syntax: if (condition) {statements when true} else {statements when false}
            Example: report whether each user's UID is 500 or greater
                # awk -F: '{if ($3>=500){print $1,"is a common user"}else{print $1,"is an admin or system user"}}' /etc/passwd | tail -n 4
                        ning4 is a common user
                        gentoo is a common user
                        mysql is an admin or system user
                        liang4 is a common user
                or
                # awk -F: '{$3>=500?utype="common user":utype="admin or system user";print $1,"is",utype}' /etc/passwd |tail -n 4
                Show the lines with 5 or more fields
                # awk '{if (NF>=5){print}}' /etc/inittab  | tail -n 5
                    #   0 - halt (Do NOT set initdefault to this)
                    #   1 - Single user mode
                    #   2 - Multiuser, without NFS (The same as 3, if you do not have networking)
                    #   3 - Full multiuser mode
                    #   6 - reboot (Do NOT set initdefault to this)
                Show the lines with 20 or fewer fields
                # awk '{if (NF<=20) {print }}' /etc/inittab | tail -5
                    #   4 - unused
                    #   5 - X11
                    #   6 - reboot (Do NOT set initdefault to this)
                    #
                    id:3:initdefault:
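if-else statements can be chained with else if; here is a sketch with a hypothetical `uid_kind` helper and made-up lines. The UID threshold of 500 follows the examples above (many newer distributions start regular users at 1000).

```shell
# Hypothetical helper: a three-way classification with if / else if / else
uid_kind() {
    awk -F: '{
        if ($3 == 0)       print $1, "is root"
        else if ($3 < 500) print $1, "is a system user"
        else               print $1, "is a common user"
    }'
}

printf 'root:x:0\nbin:x:1\nliang3:x:504\n' | uid_kind
# root is root
# bin is a system user
# liang3 is a common user
```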
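    (2) while loop
            Syntax: while (condition) {loop body, repeated as long as the condition is true}
                length() built-in function: returns the length of a string
            Example: print the odd-numbered fields of each line of /etc/inittab
                    # awk '{i=1;while(i<=NF){printf "%s ",$i;i+=2};printf "\n"}' /etc/inittab
                    or
                    # awk '{i=1;while(i<=NF){printf "%s ",$i;i+=2};print ""}' /etc/inittab
                Print the fields of /etc/inittab that are 6 or more characters long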
                    # awk '{i=1;while(i<=NF){if(length($i)>=6){print $i};i++}}' /etc/inittab
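            do-while loop: the body runs once before the condition is tested; only BEGIN is used here, so /etc/inittab is never read and "uui" is printed five times
            # awk 'BEGIN{i=1;do{print "uui";i++}while(i<=5)}' /etc/inittab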
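A sketch of both loop forms (hypothetical helpers `odd_fields` and `count_to`): the while loop joins the odd-numbered fields with single spaces, and the do-while loop shows that its body runs once even when the condition is false from the start.

```shell
# Hypothetical helper: collect the odd-numbered fields with a while loop and
# print them separated by single spaces (no trailing blank)
odd_fields() {
    awk '{
        out = ""
        i = 1
        while (i <= NF) {
            out = (i == 1 ? $i : out " " $i)
            i += 2
        }
        print out
    }'
}

# Hypothetical helper: a do-while body always runs at least once
count_to() {
    awk -v n="$1" 'BEGIN { i = 1; do { print i; i++ } while (i <= n) }'
}

printf 'a b c d e\n' | odd_fields   # a c e
count_to 3                          # 1, 2, 3 on separate lines
count_to 0                          # still prints 1
```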
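    (3) for loop
            Syntax: for (initialization; condition; update) {loop body}
                Example: print the odd-numbered fields of /etc/inittab
                    # awk '{for (i=1;i<=NF;i+=2){printf "%s ",$i};print ""}' /etc/inittab
                Print the fields of /etc/inittab that are 6 or more characters long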
                    # awk '{for (i=1;i<=NF;i++){if (length($i)>=6) print $i}}' /etc/inittab
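        A for loop can also iterate over the elements of an array;
                Syntax: for (i in array) {for body}
                    Explanation: i takes each index of the array in turn (in an unspecified order)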
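A sketch of both for forms with hypothetical helpers and made-up input: a C-style loop over the fields, and for-in over an array. Since for-in visits indices in an unspecified order, its output is piped through sort.

```shell
# Hypothetical helper: C-style for over the fields, keeping those that are
# 6 or more characters long
long_fields() {
    awk '{ for (i = 1; i <= NF; i++) if (length($i) >= 6) print $i }'
}

# Hypothetical helper: for (w in seen) visits every index; the visiting order
# is unspecified, so the result is sorted afterwards
word_count() {
    awk '{ for (i = 1; i <= NF; i++) seen[$i]++ }
         END { for (w in seen) print w, seen[w] }' | LC_ALL=C sort
}

printf 'id:3:initdefault: short longer\n' | long_fields
# id:3:initdefault:
# longer
printf 'b a b\n' | word_count
# a 1
# b 2
```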
    
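    (4) next
            Stops processing the current line and moves straight on to the next one:
            Example: show the users whose UID is odd, with their UID (lines with an even UID are skipped)
                # awk -F: '{if ($3%2==0) next; print $1,$3}' /etc/passwd
                Show only the odd-numbered lines (even NR is skipped):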
                # awk -F: '{if (NR%2==0) next; print NR,$1}' /etc/passwd | head -n 4
                    1 root
                    3 daemon
                    5 lp
                    7 shutdown
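next can also sit in a pattern-action pair of its own, which some find easier to read; a sketch with a hypothetical `odd_uid_users` helper and made-up lines.

```shell
# Hypothetical helper: the first rule skips records with an even UID,
# so the second rule only sees the odd ones
odd_uid_users() {
    awk -F: '$3 % 2 == 0 { next } { print $1, $3 }'
}

printf 'root:x:0\nbin:x:1\ndaemon:x:2\nadm:x:3\n' | odd_uid_users
# bin 1
# adm 3
```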
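9. Arrays
    array[index-expression]
        index-expression: can be any string; if an array element does not exist yet, referencing it makes awk create it automatically and initialize it to the empty string
        Therefore, to test whether an array contains an element, you must use the form "index in array", e.g. "i in stat"
        To iterate over every element of an array, use this special construct:
            for (var in array) {for body}
            where var iterates over the indices of array;
        stat["LISTEN"]++       (string indices must be quoted; a bare LISTEN would be an uninitialized variable)
        stat["ESTABLISHED"]++
    Example: count the TCP connections in each state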
        # netstat -tan | awk '/^tcp/{++stat[$NF]}END{for (i in stat){print i, stat[i]}}'
            ESTABLISHED 1
            LISTEN 14
         Count how many requests each IP made
            # awk '{ip[$1]++}END{for (i in ip){print i,ip[i]}}' /etc/httpd/logs/access_log
                192.168.1.101 2000
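            Explanation: each client IP ($1) is used as an array index and that element is incremented once per log line, so ip[x] ends up holding the number of requests from x. In END, for (i in ip) assigns each index (an IP address) to i in turn, and the loop prints the IP followed by its count ip[i]
        To try this out:
            # watch -n 1 'netstat -tn'  (show the connection table, refreshed every second)
            # ab -n 1000 -c 100 http://172.16.100.7/index.html  (ApacheBench: 1000 requests, 100 concurrent)
        Deleting an array element: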
            delete array[index]
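A sketch of the array operations above on made-up netstat-style lines (hypothetical helper `state_count`): `in` tests membership without creating the element, and delete removes one entry before reporting.

```shell
# Hypothetical helper: count connection states, check for LISTEN with "in",
# drop TIME_WAIT with delete, then print the remaining counts sorted
state_count() {
    awk '/^tcp/ { stat[$NF]++ }
         END {
             if (!("LISTEN" in stat)) print "no LISTEN sockets"   # "in" does not create the element
             delete stat["TIME_WAIT"]                             # drop one state before reporting
             for (s in stat) print s, stat[s]
         }' | LC_ALL=C sort
}

printf '%s\n' 'tcp 0 0 0.0.0.0:22 0.0.0.0:* LISTEN' \
              'tcp 0 0 10.0.0.1:22 10.0.0.2:5000 ESTABLISHED' \
              'tcp 0 0 10.0.0.1:22 10.0.0.3:5001 TIME_WAIT' \
              'tcp 0 0 0.0.0.0:80 0.0.0.0:* LISTEN' | state_count
# ESTABLISHED 1
# LISTEN 2
```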
10. awk built-in functions

  split(string,array[,fieldsep[,seps]])
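    split("root:x:0:0", user, ":")
        Explanation: splits the string on ":" and stores the pieces in the array user
    Function: splits string into pieces using fieldsep as the separator and stores the pieces in the array named array, with indices starting at 1; split returns the number of pieces (the optional seps argument is a gawk extension)
        Example: show each index together with the value assigned to it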
            # awk 'BEGIN{split("root:x:0:0",user,":"); for (i in user) print i, user[i]}'
                4 0
                1 root
                2 x
                3 0
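                Explanation: the pieces are assigned to user: user[1]=root, user[2]=x, ...
                            i iterates over the indices
                            print i
                            print user[i], the value stored at that index
                            (for (i in ...) visits the indices in an unspecified order, which is why 4 comes first)
        Example: count the current TCP connections per remote IP (netstat -tn)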
            # netstat -tn | awk '/^tcp/{lens=split($5,client,":");ip[client[lens-1]]++}END {for (i in ip) print i,ip[i]}'
                172.16.250.91 1
                172.16.3.2 1
                192.168.1.100 1
            Explanation: lens=split($5,client,":")
                split returns the number of pieces; $5 is the foreign address IP:port, so client[lens-1] is the IP part
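Because split() returns the number of pieces, a numeric loop from 1 to that count visits them in order, unlike the for-in loop above; a sketch with a hypothetical `split_demo` helper.

```shell
# Hypothetical helper: split returns the piece count n; looping i = 1..n
# prints the pieces in their original order
split_demo() {
    awk 'BEGIN { n = split("root:x:0:0", user, ":"); print n
                 for (i = 1; i <= n; i++) print i, user[i] }'
}

split_demo
# 4
# 1 root
# 2 x
# 3 0
# 4 0
```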
  
    Official gawk documentation
http://www.gnu.org/software/gawk/manual/gawk.html#Built_002din
