Boosting Development Efficiency with the Shell

https://www.tanglei.name/blog/linux-shell-makes-more-efficient.html

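This article grew out of a short talk I gave to my team, written up here as a post. The title is "Boosting Development Efficiency with the Shell", though "the command line" would be more accurate: it does not cover shell programming, and instead introduces the basic command-line tools commonly used on Linux or Mac for everyday tasks. (An earlier post, Mac 軟件推薦(續)之程序猿篇, already touched on most of what is covered here.)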

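After reading this article, you should have a rough picture of these commands and know which command handles which kind of task.
There is no need to memorize specific options: when you need one, look it up with cmd --help or man cmd, and the ones you use often will stick.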

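The article first covers some common command-line tools on Linux/Mac, then some common commands, and finally works through a few cases that show how powerful they are:
given an nginx log file, which 10 paths produce the most HTTP 404 responses? Which 10 requests take the longest? How many page views ("PV") are there per hour?
Or, given an article, which 10 words occur most often?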

Mac environment

zsh
oh-my-zsh
plugin
- git
- autojump
- osx (man-preview / quick-look / pfd (print Finder directory) / cdf (cd Finder))
Common key bindings (bindkey)
Demo: highlighting / git / smart completion / jumping (j, d)…

For more Mac tips, see these three posts: Mac 軟件推薦(序), Mac 軟件推薦續(!程序猿篇), and Mac 軟件推薦(續)之程序猿篇

Basic shell commands

which/whereis, plus the commonly used whatis, man, and --help

➜  .oh-my-zsh git:(master)$ whereis ls
/bin/ls
➜  .oh-my-zsh git:(master)$ which ls
ls: aliased to ls -G

Basic file and directory operations

rm, mkdir, mv, cp, cd, ls, ln, file, stat, wc(-l/w/c), head, more, tail, cat...

The power tool, the pipe: |

Shell text processing

This section walks through the general usage and options of 12 commands by example.

find, grep, xargs, cut, paste, comm
join, sort, uniq, tr, sed, awk

find

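Common options
- file name -name, file type -type, maximum search depth -maxdepth
- time filters -[cam]time: inode status change (ctime, which is not creation time), access (atime), modification (mtime)
- action to run -exec

Examples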

find ./ -name "*.json"
find . -maxdepth 7 -name "*.json" -type f
find . -name "*.log.gz" -ctime +7 -size +1M -delete   # -atime / -mtime work the same way
find . -name "*.scala" -atime -7 -exec du -h {} \;
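The options above can be tried safely on a throwaway directory. The following is a minimal sketch; the directory layout and file names are made up for illustration.

```shell
# Build a small scratch tree (hypothetical file names).
demo_dir=$(mktemp -d)
mkdir -p "$demo_dir/conf/deep"
touch "$demo_dir/a.json" "$demo_dir/conf/b.json" \
      "$demo_dir/conf/deep/c.json" "$demo_dir/notes.txt"

# Regular files named *.json, at most two levels below the start directory.
# conf/deep/c.json sits at depth 3 and is therefore skipped.
shallow_json=$(cd "$demo_dir" && find . -maxdepth 2 -type f -name "*.json" | sort)
echo "$shallow_json"

# -exec runs one command per match; here wc -c prints each file's size in bytes.
(cd "$demo_dir" && find . -type f -name "*.json" -exec wc -c {} \;)

rm -rf "$demo_dir"
```

Running find without -delete first, as done here, is a cheap way to check what a destructive command would match.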

grep

Common options
- -v (invert-match),
- -c (count),
- -n (line-number),
- -i (ignore-case),
- -l, -L, -R (-r, --recursive), -e

Examples

grep 'partner' ./*.scala -l
grep -e 'World' -e 'first' -i -R ./   # multiple -e patterns are ORed

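Related commands for compressed files: zgrep / zcat xx | grep. On some BSD versions grep -z behaves like zgrep, whereas in GNU grep -z means NUL-separated input.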
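A small sketch of how these grep options combine, using two scratch files whose contents are invented for the demonstration:

```shell
demo_dir=$(mktemp -d)
printf 'Hello World\nthe first line\nnothing here\n' > "$demo_dir/a.txt"
printf 'no match\n' > "$demo_dir/b.txt"

# -c counts matching lines; two -e patterns act as OR; -i ignores case.
match_count=$(grep -c -i -e 'world' -e 'FIRST' "$demo_dir/a.txt")
echo "$match_count"

# -v inverts the match; -n prefixes each printed line with its line number.
non_matching=$(grep -v -n -i -e 'world' -e 'first' "$demo_dir/a.txt")
echo "$non_matching"

# -l prints only the names of files containing a match; -R recurses.
files_with_match=$(cd "$demo_dir" && grep -l -R 'World' .)
echo "$files_with_match"

rm -rf "$demo_dir"
```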

xargs

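Common options
- -n (maximum number of arguments per command line),
- -I (placeholder substitution)
- -d (delimiter); not supported by the BSD xargs on Mac, so mind the difference from the GNU version

Examples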

find . -type f -name "*.jpg" | xargs -n1 -I {} du -sh {}
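File names containing spaces are split into several arguments by a plain find | xargs pipeline. A common remedy, sketched here with made-up file names, is to pair find -print0 with xargs -0, which both the GNU and BSD versions support:

```shell
demo_dir=$(mktemp -d)
touch "$demo_dir/plain.jpg" "$demo_dir/with space.jpg"

# NUL-separated names survive spaces intact.
# -I {} already runs one command per input item, so -n1 is not needed with it.
listed=$(cd "$demo_dir" && find . -type f -name "*.jpg" -print0 |
  xargs -0 -I {} basename {} | sort)
echo "$listed"

# -n 2 groups the input into command lines of at most two arguments each.
grouped=$(printf '%s\n' 1 2 3 4 5 | xargs -n 2 echo)
echo "$grouped"

rm -rf "$demo_dir"
```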

cut

Common options
- -b (bytes)
- -c (characters)
- -f (field list), -d (delimiter); field ranges: n, n-, -m, n-m

Examples

echo "helloworldhellp" | cut -c1-10
cut -d, -f2-8 csu.db.export.csv
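The field-range forms are easier to see on a single known line. A minimal sketch, with an invented CSV record:

```shell
csv_line='1001,alice,alice@example.com,Beijing'

# n-m: fields 2 through 3 of a comma-delimited line.
name_and_mail=$(printf '%s\n' "$csv_line" | cut -d, -f2-3)
# n-: from field 3 to the end of the line.
from_third=$(printf '%s\n' "$csv_line" | cut -d, -f3-)
# -c selects by character position instead of by field.
first_chars=$(printf '%s\n' "$csv_line" | cut -c1-4)

echo "$name_and_mail"
echo "$from_third"
echo "$first_chars"
```

Note that cut splits on every delimiter it sees, so it is only reliable for CSV data whose values contain no embedded commas.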

paste

Common options
- -d delimiter
- -s serial: join all lines of each file into a single line

Examples

➜  Documents$ cat file1
1 11
2 22
3 33
4 44
➜  Documents$ cat file2
one     1
two     2
three   3
one1    4

➜  Documents$ paste -d, file1 file2
1 11,one     1
2 22,two     2
3 33,three   3
4 44,one1    4
➜  Documents$ paste -s -d: file1 file2
1 11:2 22:3 33:4 44
one     1:two     2:three   3:one1    4

join

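Similar to SQL's ... inner join ... on ...; -t sets the delimiter, which defaults to whitespace (spaces or tabs). Both inputs must be sorted on the join field.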

➜  Documents$ cat j1
1 11
2 22
3 33
4 44
5 55
➜  Documents$ cat j2
one     1   0
one     2   1
two     4   2
three   5   3
one1    5   4
➜  Documents$ join -1 1 -2 3 j1 j2
1 11 one 2
2 22 two 4
3 33 three 5
4 44 one1 5
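join expects both inputs to be sorted on the join field, which the j1/j2 files above happen to be. When the data is not sorted, sort it first. A minimal sketch with two invented files:

```shell
demo_dir=$(mktemp -d)
printf '3 carol\n1 alice\n2 bob\n' > "$demo_dir/users"
printf '2 admin\n1 guest\n4 nobody\n' > "$demo_dir/roles"

# Sort both inputs on the join field (field 1) before joining.
sort -k1,1 "$demo_dir/users" > "$demo_dir/users.sorted"
sort -k1,1 "$demo_dir/roles" > "$demo_dir/roles.sorted"

# Inner join on field 1: ids 3 and 4 have no partner and are dropped.
joined=$(join "$demo_dir/users.sorted" "$demo_dir/roles.sorted")
echo "$joined"

# -a 1 also keeps unpairable lines from the first file, like a left join.
left_joined=$(join -a 1 "$demo_dir/users.sorted" "$demo_dir/roles.sorted")
echo "$left_joined"

rm -rf "$demo_dir"
```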

comm

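Common options
- usage: comm [-123i] file1 file2
- both inputs must be sorted lexically; the output has 3 columns: lines only in file1 / only in file2 / in both
- -1, -2, -3 suppress the corresponding column; -i ignores case (BSD/Mac comm)

Examples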

➜  Documents$ seq 1 5 >file11
➜  Documents$ seq 2 6 >file22
➜  Documents$ cat file11
1
2
3
4
5
➜  Documents$ cat file22
2
3
4
5
6
➜  Documents$ comm file11 file22
1
        2
        3
        4
        5
    6
➜  Documents$ comm -1 file11 file22
    2
    3
    4
    5
6
➜  Documents$ comm -2 file11 file22
1
    2
    3
    4
    5
➜  Documents$ comm -23 file11 file22
1

Related command: diff (similar to git diff)

sort

Common options
- -d, --dictionary-order
- -n, --numeric-sort
- -r, --reverse
- -b, --ignore-leading-blanks
- -k, --key

Examples

➜  Documents$ cat file2
one     1
two     2
three   3
one1    4
➜  Documents$ sort file2
one     1
one1    4
three   3
two     2
➜  Documents$ sort -b -k2 -r file2
one1    4
three   3
two     2
one     1

uniq

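Common options
- -c prefix each line with its repeat count
- -d print only repeated lines
- -u print only lines that are not repeated
- -f skip the first N fields when comparing

Examples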

➜  Documents$ cat file4
11
22
33
11
11
➜  Documents$ sort file4 | uniq -c
   3 11
   1 22
   1 33
➜  Documents$ sort file4 | uniq -d
11
➜  Documents$ sort file4 | uniq -u
22
33
➜  Documents$ cat file3
one     1
two     1
three   3
one1    4
➜  Documents$ uniq -c -f 1 file3
   2 one     1
   1 three   3
   1 one1    4

Note: uniq only compares adjacent lines, so it is usually combined with sort.
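The effect of skipping sort is easy to demonstrate on the same values as file4. A minimal sketch (paste -s is used only to print each result on one line):

```shell
values=$(printf '%s\n' 11 22 33 11 11)

# Without sort, the first 11 is not adjacent to the last two, so it survives separately.
unsorted_unique=$(printf '%s\n' "$values" | uniq | paste -s -d ' ' -)
# With sort, all three 11s become adjacent and collapse into one line.
sorted_unique=$(printf '%s\n' "$values" | sort | uniq | paste -s -d ' ' -)

echo "$unsorted_unique"
echo "$sorted_unique"
```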

tr

Common options
- -c complement of the set
- -d delete
- -s squeeze adjacent repeats

Examples

➜  Documents$ echo '1111234444533hello' | tr  '[1-3]' '[a-c]'
aaaabc44445cchello
➜  Documents$ echo '1111234444533hello' | tr -d '[1-3]'
44445hello
➜  Documents$ echo '1111234444533hello' | tr -dc '[1-3]'
11112333
➜  Documents$ echo '1111234444533hello' | tr -s '[0-9]'
123453hello
➜  Documents$ echo 'helloworld' | tr '[:lower:]' '[:upper:]'
HELLOWORLD
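One detail worth knowing about the examples above: in GNU and BSD tr, a range such as 1-3 does not need square brackets, and brackets written inside a set are treated as literal characters. The sketch below uses an invented input that contains brackets to make the difference visible:

```shell
input='1111234444533hello [x]'

# Ranges work without brackets: map 1-3 to a-c.
mapped=$(printf '%s\n' "$input" | tr '1-3' 'a-c')
# With brackets in the set, the literal [ and ] are deleted as well.
deleted_with_brackets=$(printf '%s\n' "$input" | tr -d '[1-3]')
# Without brackets, only the digits 1 to 3 are deleted.
deleted_plain=$(printf '%s\n' "$input" | tr -d '1-3')

echo "$mapped"
echo "$deleted_with_brackets"
echo "$deleted_plain"
```

Character classes such as '[:lower:]' are different: there the brackets are part of the class name and are required.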

sed

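Common commands and options
- d delete (a sed command, not an option)
- s substitute (also a command); the g flag replaces every match on a line
- -e chain multiple commands
- -i edit the file in place (on Mac, BSD sed requires a backup-suffix argument; pass "" for no backup)

Examples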

➜  Documents$ cat file2
one     1
two     2
three   3
one1    4
➜  Documents$ sed "2,3d" file2
one     1
one1    4
➜  Documents$ sed '/one/d' file2
two     2
three   3
➜  Documents$ sed 's/one/111/g' file2
111     1
two     2
three   3
1111    4
# replace one with 111 and delete lines containing two
➜  Documents$ sed -e 's/one/111/g' -e '/two/d' file2
111     1
three   3
1111    4
# \( \) marks a group (escaped parentheses), \1 refers back to it
➜  Documents$ sed 's/\([0-9]\)/\1.html/g' file2
one     1.html
two     2.html
three   3.html
one1.html    4.html
# same as above; & stands for the matched text
➜  Documents$ sed 's/[0-9]/&.html/g' file2
one     1.html
two     2.html
three   3.html
one1.html    4.html
➜  Documents$ cat mobile.csv
"13090246026"
"18020278026"
"18520261021"
"13110221022"
➜  Documents$ sed 's/\([0-9]\{3\}\)[0-9]\{4\}/\1xxxx/g' mobile.csv
"130xxxx6026"
"180xxxx8026"
"185xxxx1021"
"131xxxx1022"

awk

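Basic variables, options, and syntax
- NR record (line) number, NF number of fields
- $1 first field, $2, $3…
- -F fs sets the field separator fs, a string or a regular expression
- Syntax: awk 'BEGIN{ commands } pattern{ commands } END{ commands }', processed as follows:
  1. run the BEGIN block
  2. for each input line, run pattern{ commands }; pattern can be a regular expression /regexp/, a relational expression, and so on
  3. after all input is processed, run the END block

Examples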

➜  Documents$ cat file5
11  11 aa cc
22  22 bb
33  33 d
11  11
11  11
# line number, number of fields, third field
➜  Documents$ awk '{print NR"("NF"):", $3}' file5
1(4): aa
2(3): bb
3(3): d
4(2):
5(2):
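# split on a string separator and print fields 1 and 2 (input is the masked output of the earlier sed command, since mobile.csv itself contains no "xxxx")
➜  Documents$ sed 's/\([0-9]\{3\}\)[0-9]\{4\}/\1xxxx/g' mobile.csv | awk -F"xxxx" '{print $1, $2}'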
"130 6026"
"180 8026"
"185 1021"
"131 1022"
# add a pattern expression
➜  Documents$ awk '$1>=22 {print NR":", $3}' file5
2: bb
3: d
# sum 1 to 36: all numbers, odd numbers, even numbers
➜  Documents$ seq 36 | awk 'BEGIN{sum=0; print "question:"} {print $1" +"; sum+=$1} END{print "="; print sum}' | xargs | sed 's/+ =/=/'
question: 1 + 2 + 3 + 4 + 5 + 6 + 7 + 8 + 9 + 10 + 11 + 12 + 13 + 14 + 15 + 16 + 17 + 18 + 19 + 20 + 21 + 22 + 23 + 24 + 25 + 26 + 27 + 28 + 29 + 30 + 31 + 32 + 33 + 34 + 35 + 36 = 666
➜  Documents$ seq 36 | awk 'BEGIN{sum=0; print "question:"} $1 % 2 ==1 {print $1" +"; sum+=$1} END{print "="; print sum}' | xargs | sed 's/+ =/=/'
question: 1 + 3 + 5 + 7 + 9 + 11 + 13 + 15 + 17 + 19 + 21 + 23 + 25 + 27 + 29 + 31 + 33 + 35 = 324
➜  Documents$ seq 36 | awk 'BEGIN{sum=0; print "question:"} $1 % 2 !=1 {print $1" +"; sum+=$1} END{print "="; print sum}' | xargs | sed 's/+ =/=/'
question: 2 + 4 + 6 + 8 + 10 + 12 + 14 + 16 + 18 + 20 + 22 + 24 + 26 + 28 + 30 + 32 + 34 + 36 = 342

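Other advanced syntax: for, while, all kinds of built-in functions, and more. awk is a powerful language in its own right, and learning its basics pays off.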
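As one illustration of those features, awk's associative arrays combined with a for loop give a simple group-by. This is a minimal sketch on invented key/value input, not something from the original talk:

```shell
# Sum the second column per key in the first column.
# The iteration order of "for (k in sum)" is unspecified, so sort the result.
totals=$(printf 'a 1\nb 2\na 3\nb 4\nc 5\n' |
  awk '{ sum[$1] += $2 } END { for (k in sum) print k, sum[k] }' | sort)
echo "$totals"
```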

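Practical applications

Log analysis

Given an nginx log file, you can do a lot: find the slowest requests so they can be optimized, count page views ("PV") per hour, and so on.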

➜  Documents$ head -n5 std.nginx.log
106.38.187.225 - - [20/Feb/2017:03:31:01 +0800] www.tanglei.name "GET /baike/208344.html HTTP/1.0" 301 486 "-" "Mozilla/5.0 (compatible; MSIE 7.0; Windows NT 5.1; .NET CLR 1.1.4322) 360JK yunjiankong 975382" "106.38.187.225, 106.38.187.225" - 0.000
106.38.187.225 - - [20/Feb/2017:03:31:02 +0800] www.tanglei.name "GET /baike/208344.html HTTP/1.0" 301 486 "-" "Mozilla/5.0 (compatible; MSIE 7.0; Windows NT 5.1; .NET CLR 1.1.4322) 360JK yunjiankong 975382" "106.38.187.225, 106.38.187.225" - 0.000
10.130.64.143 - - [20/Feb/2017:03:31:02 +0800] stdbaike.bdp.cc "POST /baike/wp-cron.php?doing_wp_cron=1487532662.2058920860290527343750 HTTP/1.1" 200 182 "-" "WordPress/4.5.6; http://www.tanglei.name/baike" "10.130.64.143" 0.205 0.205
10.130.64.143 - - [20/Feb/2017:03:31:02 +0800] www.tanglei.name "GET /external/api/login-status HTTP/1.0" 200 478 "-" "-" "10.130.64.143" 0.003 0.004
10.130.64.143 - - [20/Feb/2017:03:31:02 +0800] www.tanglei.name "GET /content_util/authorcontents?count=5&offset=0&israndom=1&author=9 HTTP/1.0" 200 11972 "-" "-" "10.130.64.143" 0.013 0.013

Above is a sample of the nginx log. For example, to find the top 10 request paths that returned 404:

head -n 10000 std.nginx.log | awk '{print $8 "," $10}' | grep ',404' | sort | uniq -c | sort -nr -k1 | head -n 10
#or
head -n 10000 std.nginx.log | awk '$10==404 {print $8}' |sort | uniq -c | sort -nr -k1 | head -n 10

Of course, you may not get it right on the first try. Usually you test the logic on a small slice of the data first, or cache intermediate results.

cat std.nginx.log | awk '{print $8 "," $10}' | grep ',404' >404.log
sort 404.log | uniq -c | sort -nr -k1 | head -n 10
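To experiment without the real log, the same pipeline can be run on a few synthetic lines. The sketch below invents four log lines (hosts, IPs, and paths are placeholders) in the same field layout as std.nginx.log, where field 8 is the path and field 10 the status:

```shell
log_file=$(mktemp)
# Hypothetical sample lines; only the field positions mirror the real log.
cat > "$log_file" <<'EOF'
10.0.0.1 - - [20/Feb/2017:03:31:01 +0800] example.test "GET /missing HTTP/1.0" 404 162 "-" "-" "10.0.0.1" 0.001 0.001
10.0.0.2 - - [20/Feb/2017:03:40:09 +0800] example.test "GET /ok HTTP/1.0" 200 478 "-" "-" "10.0.0.2" 0.003 0.004
10.0.0.3 - - [20/Feb/2017:04:02:15 +0800] example.test "GET /missing HTTP/1.0" 404 162 "-" "-" "10.0.0.3" 0.002 0.002
10.0.0.1 - - [20/Feb/2017:04:10:44 +0800] example.test "GET /gone HTTP/1.0" 404 162 "-" "-" "10.0.0.1" 0.001 0.001
EOF

# Compare the status field exactly, count per path, sort by count descending.
# The final awk only normalizes the padding that uniq -c adds.
top_404=$(awk '$10 == 404 { print $8 }' "$log_file" |
  sort | uniq -c | sort -nr -k1 | head -n 10 | awk '{ print $1, $2 }')
echo "$top_404"

rm -f "$log_file"
```

Comparing $10 in awk is slightly stricter than grep ',404', which would also match a path or status that merely contains that text.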

Other examples: requests per hour, request latency, and so on.

➜  Documents$ head -n 100000 std.nginx.log | awk -F: '{print $1 $2}' | cut -f3 -d/ | uniq -c
8237 201703
15051 201704
16083 201705
18561 201706
22723 201707
19345 201708
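The slowest-requests question from the introduction can be handled in the same style. This sketch assumes the last field of each line is the request time in seconds, as it appears to be in the sample above, and again uses invented log lines:

```shell
log_file=$(mktemp)
# Hypothetical sample lines; the last field is assumed to be the request time.
cat > "$log_file" <<'EOF'
10.0.0.1 - - [20/Feb/2017:03:31:01 +0800] example.test "GET /a HTTP/1.0" 200 162 "-" "-" "10.0.0.1" 0.001 0.001
10.0.0.2 - - [20/Feb/2017:03:40:09 +0800] example.test "GET /b HTTP/1.0" 200 478 "-" "-" "10.0.0.2" 0.250 0.250
10.0.0.3 - - [20/Feb/2017:04:02:15 +0800] example.test "GET /c HTTP/1.0" 200 162 "-" "-" "10.0.0.3" 0.030 0.030
10.0.0.1 - - [20/Feb/2017:04:10:44 +0800] example.test "GET /d HTTP/1.0" 200 162 "-" "-" "10.0.0.1" 0.120 0.120
EOF

# Print "<time> <path>" and sort numerically, slowest first.
slowest=$(awk '{ print $NF, $8 }' "$log_file" | sort -nr -k1,1 | head -n 2)
echo "$slowest"

# Requests per hour: the hour is the second ':'-separated piece of field 4.
per_hour=$(awk '{ split($4, t, ":"); count[t[2]]++ }
  END { for (h in count) print h, count[h] }' "$log_file" | sort)
echo "$per_hour"

rm -f "$log_file"
```

Using $NF avoids counting fields, which matters because the user-agent string in real logs contains a varying number of spaces.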

Other practical cases: IP blocking

Case: batch-fixing database data

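Background: a bug in a service inserted wrong image paths into the database. URLs of the form (sensitive data has been replaced for security)
https://www.tanglei.name/upload/photos/129630//internal-public/shangtongdai/2017-02-19-abcdefg-eb85-4c24-883e-hijklmn.jpg
had to be rewritten as
http://www.tanglei.me/internal-public/shangtongdai/2017-02-19-abcdefg-eb85-4c24-883e-hijklmn.jpg. MySQL and similar databases did not seem to support regex-based replacement directly (MySQL only gained REGEXP_REPLACE in 8.0), so this was not easy to do in plain SQL.
Exporting the data and processing it with a Python script would also work, but with the command-line tools above the whole job takes only a few dozen seconds.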

Steps:

Prepare the data (export the query result to customers.csv)

select id, photo_url_1, photo_url_2, photo_url_3 from somedb.sometable where 
photo_url_1 like 'https://www.tanglei.name/upload/photos/%//internal-public/%' or
photo_url_2 like 'https://www.tanglei.name/upload/photos/%//internal-public/%' or
photo_url_3 like 'https://www.tanglei.name/upload/photos/%//internal-public/%';

Replace in the file
When replacing with sed, first test that the substitution works as expected.

# test that it works
head -n 5 customers.csv | sed 's|https://www.tanglei.name/upload/photos/[0-9]\{1,\}/|http://www.tanglei.me|g'
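# edit the file in place (BSD/Mac sed syntax; GNU sed takes -i without the empty argument); use sed -i ".bak" instead to keep a backup of the original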
sed -i "" 's|https://www.tanglei.name/upload/photos/[0-9]\{1,\}/|http://www.tanglei.me|g' customers.csv
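Because the -i syntax differs between BSD and GNU sed, a form that both accept is the suffix attached directly to the option, as in -i.bak. The following sketch applies the same substitution to a one-line scratch file; the CSV content is a made-up stand-in for the real export:

```shell
demo_dir=$(mktemp -d)
# Hypothetical one-row export with a broken URL.
printf '%s\n' '1,"https://www.tanglei.name/upload/photos/129630//internal-public/a.jpg"' \
  > "$demo_dir/customers.csv"

# -i.bak (no space before the suffix) works with GNU and BSD sed
# and keeps the untouched original as customers.csv.bak.
sed -i.bak 's|https://www\.tanglei\.name/upload/photos/[0-9]\{1,\}/|http://www.tanglei.me|g' \
  "$demo_dir/customers.csv"

rewritten=$(cat "$demo_dir/customers.csv")
original=$(cat "$demo_dir/customers.csv.bak")
echo "$rewritten"

rm -rf "$demo_dir"
```

Using | as the s delimiter, as the commands above do, avoids escaping every slash in the URL.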

Build the SQL, then run it

awk -F, '{print "update sometable set photo_url_1 = " $2, ", photo_url_2 = " $3, ", photo_url_3 = " $4, " where id = " $1 ";" }' customers.csv > customer.sql
# then run the generated SQL
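A variant of the SQL-building step using awk's printf, which gives tighter control over spacing than print with commas. The sketch assumes the export already wraps each URL in double quotes and that no value contains a comma; the sample row is invented:

```shell
csv_file=$(mktemp)
# Hypothetical export row: id, then three already-quoted URL columns.
cat > "$csv_file" <<'EOF'
1,"http://www.tanglei.me/a.jpg","http://www.tanglei.me/b.jpg","http://www.tanglei.me/c.jpg"
EOF

# One UPDATE statement per CSV row.
sql=$(awk -F, '{
  printf "update sometable set photo_url_1 = %s, photo_url_2 = %s, photo_url_3 = %s where id = %s;\n", $2, $3, $4, $1
}' "$csv_file")
echo "$sql"

rm -f "$csv_file"
```

It is worth inspecting the first few generated statements with head before running the whole file against the database.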

Other

Play Framework session

Old way: requires starting the Play environment, which is slow

sbt "project site" consoleQuick
import play.api.libs._
val sec = "secret...secret"
var uid = "97522"
Crypto.sign(s"uid=$uid", sec.getBytes("UTF-8")) + s"-uid=$uid"

New way:

➜  Documents$  ~/stdcookie.sh 97522
918xxxxdf64abcfcxxxxc465xx7554dxxxx21e-uid=97522
➜  Documents$ cat ~/stdcookie.sh
#!/bin/bash
# do not remove the shebang line above
uid=$1
hash=`echo -n "uid=$uid" | openssl dgst -sha1 -hmac "secret...secret"`
echo "$hash-uid=$uid"

Word frequency: the example below counts the 20 most frequent words in the text of Trump's inaugural address.

➜  Documents$ head -n3 chuanpu.txt
Chief Justice Roberts, President Carter, President Clinton, President Bush, President Obama, fellow Americans and people of the world, thank you.

We, the citizens of America, are now joined in a great national effort to rebuild our country and restore its promise for all of our people. Together we will determine the course of America and the world for many, many years to come.
➜  Documents$ cat chuanpu.txt | tr -dc 'a-zA-Z ' | xargs -n 1 | sort | uniq -c | sort -nr -k1 | head -n 20
  65 the
  63 and
  48 of
  46 our
  42 will
  37 to
  21 We
  20 is
  18 we
  17 America
  15 a
  14 all
  13 in
  13 for
  13 be
  13 are
  10 your
  10 not
  10 And
  10 American
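Two things stand out in that result: "We" and "we" are counted separately, and because tr -dc 'a-zA-Z ' also deletes newlines, the last word of one line is glued to the first word of the next. A possible refinement, sketched on a short invented text rather than the speech itself, folds case and splits on every non-letter:

```shell
text='We, the people. The people and the world;
we will win.'

# 1. fold case so "We" and "we" count together
# 2. turn every non-letter (including newlines) into a line break
# 3. drop the empty lines this creates
# 4. count, then sort by count descending and by word for ties
top_words=$(printf '%s\n' "$text" |
  tr 'A-Z' 'a-z' |
  tr -c 'a-z' '\n' |
  grep -v '^$' |
  sort | uniq -c | sort -k1,1nr -k2 | head -n 3 | awk '{ print $1, $2 }')
echo "$top_words"
```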

Random strings

➜  Documents$ cat /dev/urandom | LC_CTYPE=C tr -dc 'a-zA-Z0-9' | fold -w 32 | head -n 5
cpBnvC0niwTybSSJhUUiZwIz6ykJxBvu
VDP56NlHnugAt2yDySAB9HU2Nd0LlYCW
0WEDzpjPop32T5STvR6K6SfZMyT6KvAI
a9xBwBat7tJVaad279fOPdA9fEuDEqUd
hTLrOiTH5FNP2nU3uflsjPUXJmfleI5c
➜  Documents$ cat /dev/urandom | head -c32 | base64
WoCqUye9mSXI/WhHODHDjzLaSb09xrOtbrJagG7Kfqc=

Image processing with sips: compress images, batch-resize them, and so on

➜  linux-shell-more-effiency$ sips -g all which-whereis.png
/Users/tanglei/Documents/linux-shell-more-effiency/which-whereis.png
  pixelWidth: 280
  pixelHeight: 81
  typeIdentifier: public.png
  format: png
  formatOptions: default
  dpiWidth: 72.000
  dpiHeight: 72.000
  samplesPerPixel: 4
  bitsPerSample: 8
  hasAlpha: yes
  space: RGB
  profile: DELL U2412M
➜  linux-shell-more-effiency$ sips -Z 250 which-whereis.png
/Users/tanglei/Documents/linux-shell-more-effiency/which-whereis.png
  /Users/tanglei/Documents/linux-shell-more-effiency/which-whereis.png
➜  linux-shell-more-effiency$ sips -g all which-whereis.png
/Users/tanglei/Documents/linux-shell-more-effiency/which-whereis.png
  pixelWidth: 250
  pixelHeight: 72
  typeIdentifier: public.png
  format: png
  formatOptions: default
  dpiWidth: 72.000
  dpiHeight: 72.000
  samplesPerPixel: 4
  bitsPerSample: 8
  hasAlpha: yes
  space: RGB
  profile: DELL U2412M
➜  linux-shell-more-effiency$ sips -z 100 30 which-whereis.png
/Users/tanglei/Documents/linux-shell-more-effiency/which-whereis.png
  /Users/tanglei/Documents/linux-shell-more-effiency/which-whereis.png
➜  linux-shell-more-effiency$ sips -g pixelWidth -g pixelHeight which-whereis.png
/Users/tanglei/Documents/linux-shell-more-effiency/which-whereis.png
  pixelWidth: 30
  pixelHeight: 100
