sed Study Notes

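  I first ran into sed in the Linux kernel. It looked very complicated, full of things like / /, and I couldn't make sense of it at all. Now I work on Android, and I wanted to follow its build process by inserting something into the .mk files that shows which file the build has reached. There are a lot of .mk files, and each one needs a line inserted at the beginning and at the end, so doing them one by one was out of the question. That is when I thought of sed. Online, the book sed & awk seemed to be well regarded, so I found it and read it. The Chinese copies I found were unpleasant to read, with blurry text, so I read the English original; luckily it isn't long, or I wouldn't have made it through.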

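  Personally I think sed is mainly used inside bash scripts. I have never had to write a standalone sed script, and I haven't done any text-formatting work with it, so I can't say much about that. Here is how I inserted a line at the beginning and at the end of every .mk file: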

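#!/bin/bash
# insert.sh: add an $(info ...) line at the top and bottom of each file passed as an argument

while (($# > 0))
do
    # 1i\ inserts the text on the next line before line 1
    sed -i '1i\
$(info this is the file '"$1"' begin)' "$1"
    # $a\ appends the text on the next line after the last line
    sed -i '$a\
$(info this is the file '"$1"' end)' "$1"
    shift
done

exit 0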

 

Run it like this:

$ ./insert.sh `find . -name '*.mk'`

 

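As you can see, I changed at least 100 files; editing them one by one by hand would have bored me stiff, and I would never have finished.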

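I won't go into the bash parts; let's look at sed. The -i option edits the file in place; internally sed still writes its output somewhere first and then uses it to overwrite the original file. Single quotes stop bash from interpreting characters such as $, but $1 does need to be expanded by bash, so the script closes the single quote, puts "$1" in double quotes, and reopens the single quote. 1 is an address meaning the first line; likewise, $ means the last line. i inserts a line before the current line, and a appends a line after the current line.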
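To see the two commands at work without touching a real source tree, here is a sketch that runs them on one throwaway file. It assumes GNU sed (a bare -i with no backup suffix is a GNU extension; BSD sed wants -i ''), and the file name Android.mk inside a mktemp directory is just a made-up sample.

```shell
# Assumes GNU sed: -i without a suffix argument is a GNU extension.
dir=$(mktemp -d)
f="$dir/Android.mk"                      # hypothetical sample makefile
printf 'LOCAL_PATH := $(call my-dir)\n' > "$f"

# '1i\' inserts before line 1; '$a\' appends after the last line.
# The single quote is closed around "$f" so that the shell expands it.
sed -i '1i\
$(info this is the file '"$f"' begin)' "$f"
sed -i '$a\
$(info this is the file '"$f"' end)' "$f"

cat "$f"
```

The file ends up with three lines: the begin marker, the original line, and the end marker.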

 

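In short, a sed command is an address followed by a command. The wrinkle is that there may be two addresses: written as address1,address2, they select every line from the first address through the second, and some commands do not accept this two-address form. To apply several commands to the same address, enclose them in {} and put one command per line, like this: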

address{
command1
command2
command3
}
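As a small sketch of a two-address range combined with a { } group (the BEGIN and END marker lines are made-up sample input): inside the range, the marker lines are deleted and every other line is indented, while lines outside the range pass through unchanged.

```shell
# Range /^BEGIN$/,/^END$/ selects the markers and everything between them.
out=$(printf 'x\nBEGIN\na\nb\nEND\ny\n' | sed '
/^BEGIN$/,/^END$/{
/^BEGIN$/d
/^END$/d
s/^/    /
}')
printf '%s\n' "$out"
```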

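There is another way. Say you want to imitate grep: find the lines in a file that contain the string hello and print their line numbers and contents. You can do it like this: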

sed -n '/hello/{=;p}' file

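Here -n suppresses the automatic printing of every line, = prints the line number, and p prints the line itself. GNU sed accepts this one-line form with the commands separated by ;. Some other seds, such as BSD sed, also want a ; before the closing }, so {=;p;} is the safer spelling. The example itself isn't very useful: grep -n hello file does the same job more directly and is easier to read.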
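Running the hello example on some made-up input shows the output format; the group is written as {=;p;} here so that it also works with seds that want a ; before the closing }. For comparison, grep -n hello would print 2:hello world on a single line.

```shell
# Each matching line produces two output lines: its number, then its text.
out=$(printf 'foo\nhello world\nbar\nsay hello\n' | sed -n '/hello/{=;p;}')
printf '%s\n' "$out"
```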

 

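Addresses are usually regular expressions, placed between / /. You can also give a specific line number, as in my script above, where 1 means the first line and $ means the last line, but I expect line numbers to be less useful in practice.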
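A quick sketch of the address forms mentioned here (a line number, $, a regular expression, and a range mixing a number with a regex) on a made-up four-line input:

```shell
lines() { printf 'one\ntwo\nthree\nfour\n'; }   # sample input

byline=$(lines | sed -n '3p')        # line number: the third line
last=$(lines | sed -n '$p')          # $: the last line
byregex=$(lines | sed -n '/^t/p')    # regex: lines starting with t
range=$(lines | sed -n '2,/^f/p')    # line 2 through the next line starting with f

printf '%s\n---\n' "$byline" "$last" "$byregex" "$range"
```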

 

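Generally, sed's execution cycle is: read one line of input, check it against each address, and run the command whenever the address matches. Some commands change this flow, for example n.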
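A minimal example of n changing the cycle: with -n, each cycle reads a line, n replaces it with the following line, and p prints that one, so only the even-numbered lines come out.

```shell
# Cycle 1: "1" is replaced by "2", which is printed; cycle 2: "3" -> "4".
out=$(printf '1\n2\n3\n4\n' | sed -n 'n;p')
printf '%s\n' "$out"
```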

 

Below are all the sed commands from sed & awk, in an order I rearranged:

=  [address]=

Write to standard output the line number of addressed line.

p  [address1[,address2]]p

Print the addressed line(s). Note that this can result in duplicate output unless default output is suppressed by using "#n" or the -n command-line option. Typically used before commands that change flow control (d, n, b) and might prevent the current line from being output.

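A simple print command: it outputs the contents of the pattern space (the current line) without changing anything. It does not print the line number; that is what = is for. The -n option suppresses sed's normal output, and that is when you need p to print lines.
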
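The difference -n makes, on a two-line sample: without it, p duplicates every line; with it, only the addressed line is printed.

```shell
dup=$(printf 'a\nb\n' | sed 'p')        # each line printed by p, then again automatically
only=$(printf 'a\nb\n' | sed -n '2p')   # automatic printing off: just line 2
printf '%s\n---\n' "$dup" "$only"
```
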
i  [address1]i\

text

Insert text before each line matched by address. (See a for details on text.)

a  [address]a\

text

Append text following each line matched by address. If text goes over more than one line, newlines must be "hidden" by preceding them with a backslash. The text will be terminated by the first newline that is not hidden in this way. The text is not available in the pattern space and subsequent commands cannot be applied to it. The results of this command are sent to standard output when the list of editing commands is finished, regardless of what happens to the current line in the pattern space.

c  [address1[,address2]]c\

text

Replace (change) the lines selected by the address with text. When a range of lines is specified, all lines as a group are replaced by a single copy of text. The newline following each line of text must be escaped by a backslash, except the last line. The contents of the pattern space are, in effect, deleted and no subsequent editing commands can be applied to it (or to text).

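Text added by these three commands does not pass through the pattern space (the buffer in which each input line is processed), so later commands cannot modify it. i and a leave the pattern space itself untouched, while c effectively deletes it.
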
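A sketch showing both points on made-up input: the line inserted by i is not touched by the following s command even though it contains hello, and c on a range replaces the whole range with a single copy of the text.

```shell
# i: the inserted "hello" is written out directly, so s/hello/bye/ only
# changes the input line.
ins=$(printf 'hello\n' | sed '1i\
hello
s/hello/bye/')

# c with a range: lines 2 and 3 become one "X".
chg=$(printf '1\n2\n3\n4\n' | sed '2,3c\
X')

printf '%s\n---\n' "$ins" "$chg"
```
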
l  [address1[,address2]]l

List the contents of the pattern space, showing nonprinting characters as ASCII codes. Long lines are wrapped.

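I have never used this command, but it is handy for revealing invisible characters: a tab shows up as \t and the end of each line is marked with $, so trailing spaces become visible.
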
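For example, assuming GNU sed, a tab and two trailing spaces in a made-up line show up like this:

```shell
# -n so that the line is shown only once, in l's escaped form.
out=$(printf 'a\tb  \n' | sed -n 'l')
printf '%s\n' "$out"    # a\tb  $
```
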
d  [address1[,address2]]d

Delete line(s) from pattern space. Thus, the line is not passed to standard output. A new line of input is read and editing resumes with first command in script.

n  [address1[,address2]]n

Read next line of input into pattern space. Current line is sent to standard output. New line becomes current line and increments line counter. Control passes to command following n instead of resuming at the top of the script.

q  [address]q

Quit when address is encountered. The addressed line is first written to output (if default output is not suppressed), along with any text appended to it by previous a or r commands.

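Of these three, d and n are mostly used in combinations. d changes the flow: the rest of the script is skipped and the next cycle starts. n does not change the flow (execution simply continues with the next command), but the contents of the pattern space are replaced by the next line. q stops sed altogether after the current line. b without a label only skips the rest of the script for the current line and then carries on with the next line, so it cannot fully take the place of q.
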
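A sketch of the difference on made-up input: 3q stops after the third line (like head -n 3), /^#/d drops comment lines, and b on line 2 only skips the remaining command for that line while sed keeps going.

```shell
head3=$(printf '1\n2\n3\n4\n5\n' | sed '3q')          # quit after line 3
nocomm=$(printf '# c\nx\n# d\ny\n' | sed '/^#/d')     # delete comment lines
skip=$(printf '1\n2\n3\n' | sed -e '2b' -e 's/$/!/')  # line 2 skips the s command

printf '%s\n---\n' "$head3" "$nocomm" "$skip"
```
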
s  [address1[,address2]]s/pattern/replacement/[flags]

Substitute replacement for pattern on each addressed line. If pattern addresses are used, the pattern // represents the last pattern address specified. The following flags can be specified:

n

Replace nth instance of /pattern/ on each addressed line. n is any number in the range 1 to 512, and the default is 1.

g

Replace all instances of /pattern/ on each addressed line, not just the first instance.

p

Print the line if a successful substitution is done. If several successful substitutions are done, multiple copies of the line will be printed.

w file

Write the line to file if a replacement was done. A maximum of 10 different files can be opened.

y  [address1[,address2]]y/abc/xyz/

Transform each character by position in string abc to its equivalent in string xyz.

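Both of the above are substitution commands. y works character by character: y/abc/xyz/ turns every a into x, every b into y and every c into z. s is the one used most often. The empty pattern mentioned above works like this:
/hello/s//hai/g
This means: if the current line contains the string hello, change every hello on that line into hai.
The / after s is the delimiter, and any other character can be used instead, for example s!hello/me!hai/you!, where the delimiter is !. This is meant for regular expressions that contain a lot of / characters; you could also escape every one of them, as in s/hello\/me/hai\/you/, but that gets hard to read.
The replacement can contain metacharacters, mainly &, which stands for whatever the pattern matched: if the pattern is hel*o and it matches helllllllo, & is replaced by that string.
\1 to \9 stand for the text matched by the corresponding \( \) group in the pattern: the first group is \1, the second \2, and so on. There is also this case:
s/[tab]/\
/2
Here [tab] means you actually type the Tab key. The command turns the second tab on each line into a newline. I prefer writing it as s/\t/\n/2, because typing a literal Tab at an interactive bash prompt does not work (Tab triggers completion there); a literal tab inside a script file is fine. Note that \t, and \n in the replacement, are GNU sed extensions.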
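The s and y features above, one at a time on made-up strings. The last command uses \t in the pattern and \n in the replacement, so it assumes GNU sed; with other seds, use a literal tab and the backslash-newline form instead.

```shell
amp=$(echo 'helllllo world' | sed 's/hel*o/<&>/')         # & = the whole match
swap=$(echo 'john smith' | sed 's/\(.*\) \(.*\)/\2 \1/')   # \1, \2 = groups
second=$(echo 'a:b:c:d' | sed 's/:/-/2')                   # only the 2nd match
all=$(echo 'hello hello' | sed '/hello/s//hai/g')          # // reuses /hello/
delim=$(echo 'hello/me' | sed 's!hello/me!hai/you!')       # ! as delimiter
mapped=$(echo 'aabbcc' | sed 'y/abc/xyz/')                 # a->x, b->y, c->z
tab2nl=$(printf 'a\tb\tc\n' | sed 's/\t/\n/2')             # GNU: 2nd tab -> newline

printf '%s\n' "$amp" "$swap" "$second" "$all" "$delim" "$mapped" "$tab2nl"
```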

r  [address]r file

Read contents of file and append after the contents of the pattern space. Exactly one space must be put between r and the filename.

w  [address1[,address2]]w file

Append contents of pattern space to file. This action occurs when the command is encountered rather than when the pattern space is output. Exactly one space must separate the w and the filename. A maximum of 10 different files can be opened in a script. This command will create the file if it does not exist; if the file exists, its contents will be overwritten each time the script is executed. Multiple write commands that direct output to the same file append to the end of the file.

Reading and writing files; nothing very interesting here.
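Still, a short sketch of both, using temporary files whose names are all made up: w copies the ERROR lines into a separate file, and r inserts a file's contents after a matching line.

```shell
dir=$(mktemp -d)
printf 'ok 1\nERROR disk\nok 2\nERROR net\n' > "$dir/log"   # sample input
printf 'NOTE inserted\n' > "$dir/note"

# w: everything after "w " up to the end of the script is the filename.
sed -n '/^ERROR/w '"$dir/errors" "$dir/log"

# r: the file is output after the matching line at the end of its cycle.
out=$(sed '/^ok 1$/r '"$dir/note" "$dir/log")

cat "$dir/errors"
printf '%s\n' "$out"
```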

: :label

Label a line in the script for the transfer of control by b or t. label may contain up to seven characters. (The POSIX standard says that an implementation can allow longer labels if it wishes to. GNU sed allows labels to be of any length.)

b  [address1[,address2]]b [label]

Transfer control unconditionally (branch) to :label elsewhere in script. That is, the command following the label is the next command applied to the current line. If no label is specified, control falls through to the end of the script, so no more commands are applied to the current line.

t  [address1[,address2]]t [label]

Test if successful substitutions have been made on addressed lines, and if so, branch to line marked by :label. (See b and :.) If label is not specified, control falls through to bottom of script.

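Clearly, :label exists to support b and t, which change the flow of execution. t tests whether a substitution succeeded, so it usually follows an s command: if any substitution has succeeded since the last input line was read (or since the last t), it branches.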
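A classic use of t is a loop that keeps substituting until nothing changes. This sketch inserts thousands separators into a number: each pass adds one comma, and ta jumps back to :a as long as the substitution succeeded. Passing the label with its own -e keeps it on a line of its own, so nothing relies on how a particular sed treats a ; after a label.

```shell
# Pass 1: 1234567 -> 1234,567; pass 2: -> 1,234,567; pass 3 fails, loop ends.
out=$(echo 1234567 | sed -e ':a' -e 's/\(.*[0-9]\)\([0-9]\{3\}\)/\1,\2/' -e 'ta')
printf '%s\n' "$out"
```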

N  [address1[,address2]]N

Append next input line to contents of pattern space; the new line is separated from the previous contents of the pattern space by a newline. (This command is designed to allow pattern matches across two lines. Using \n to match the embedded newline, you can match patterns across multiple lines.)

D  [address1[,address2]]D

Delete first part (up to embedded newline) of multiline pattern space created by N command and resume editing with first command in script. If this command empties the pattern space, then a new line of input is read, as if the d command had been executed.

P  [address1[,address2]]P

Print first part (up to embedded newline) of multiline pattern space created by N command. Same as p if N has not been applied to a line.
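A minimal sketch of N and P on made-up input: N;s/\n/,/ joins each pair of lines, and with -n, N;P prints only the first line of each pair.

```shell
joined=$(printf '1\n2\n3\n4\n' | sed 'N;s/\n/,/')   # embedded newline -> comma
firsts=$(printf 'a\nb\nc\nd\n' | sed -n 'N;P')      # first line of each pair
printf '%s\n---\n' "$joined" "$firsts"
```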

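Combined, these commands can do some neat things. For example:
/^$/{
N
/^\n$/D
}
This squeezes several consecutive blank lines down to one.
/UNIX$/{
    N
    /\nSystem/{
        s// Operating &/
        P
        D
    }
}
This turns UNIX at the end of one line followed by System at the start of the next (UNIX\nSystem) into UNIX Operating\nSystem.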
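Here are both examples run on small made-up inputs. Note that the UNIX result keeps a space before the newline, because the replacement text has a space between Operating and &.

```shell
# Runs of blank lines collapse to a single blank line.
squeeze=$(printf 'a\n\n\n\nb\n' | sed '/^$/{
N
/^\n$/D
}')

# s// reuses the last regex, /\nSystem/; P prints up to the newline, D
# deletes that part and restarts the cycle with "System".
unix=$(printf 'UNIX\nSystem\n' | sed '/UNIX$/{
N
/\nSystem/{
s// Operating &/
P
D
}
}')

printf '%s\n---\n' "$squeeze" "$unix"
```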

g  [address1[,address2]]g

Copy (get) contents of hold space (see h or H command) into the pattern space, wiping out previous contents.

G  [address1[,address2]]G

Append newline followed by contents of hold space (see h or H command) to contents of the pattern space. If hold space is empty, a newline is still appended to the pattern space.

h  [address1[,address2]]h

Copy pattern space into hold space, a special temporary buffer. Previous contents of hold space are wiped out.

H  [address1[,address2]]H

Append newline and contents of pattern space to contents of the hold space. Even if hold space is empty, this command still appends the newline first.

x  [address1[,address2]]x

Exchange contents of the pattern space with the contents of the hold space.

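These commands work with the hold space. The pattern space is where the current content is processed; the hold space is more like a storeroom. At the start the hold space is empty. Using these commands well takes some ingenuity: used cleverly they are very powerful, used poorly they don't achieve much.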
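A classic hold-space example is reversing the order of lines, which is what tac does. On every line except the first (1! negates the address), G appends the hold space to the pattern space; h then saves the result back into the hold space, and on the last line $p prints it. The 1! is there because G on an empty hold space would still append a newline, leaving a blank line at the end.

```shell
# Hold space grows as 1, then 2\n1, then 3\n2\n1.
rev=$(printf '1\n2\n3\n' | sed -n '1!G;h;$p')
printf '%s\n' "$rev"
```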

Here is a piece of code worth admiring:

#! /bin/sh
# phrase -- search for words across lines
# $1 = search string; remaining args = filenames
search=$1
shift
for file
do
sed '
/'"$search"'/b
N
h
s/.*\n//
/'"$search"'/b
g
s/ *\n/ /
/'"$search"'/{
g
b
}
g
D' "$file"

done

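This is a shell script, and I won't explain it. See if you can guess what it does. It could still be improved, for example to handle a phrase that spans three or more lines.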
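If you want to check your guess, here is the same sed program wrapped in a shell function (the name phrase and the sample file are made up for the test) and run on a small file. Lines that contain the phrase are printed, and so are pairs of lines where the phrase is split across the line break; other lines are dropped. One caveat, assuming GNU sed: when N runs on the last line of the input, GNU sed prints the pattern space before exiting, so a non-matching last line can still show up in the output.

```shell
phrase() {
    search=$1
    shift
    for file
    do
        sed '
/'"$search"'/b
N
h
s/.*\n//
/'"$search"'/b
g
s/ *\n/ /
/'"$search"'/{
g
b
}
g
D' "$file"
    done
}

dir=$(mktemp -d)
printf 'alpha\nthe quick\nbrown fox\n' > "$dir/text"   # "quick brown" is split
out=$(phrase 'quick brown' "$dir/text")
printf '%s\n' "$out"
```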

 

OK, that's all for this summary; that covers sed. When I have time I'll write up awk as well.

 

2011-01-08 16:47:10

 
