IP (unique IPs): the number of distinct client IPs seen within a given time period
How to count this with squid:
[root@localhost etc]# cut -d " " -f1 /usr/local/squid/var/logs/access.log|sort|uniq|wc -l
2823
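The same count can also be done in one pass without sorting, which may help on large logs. This is a minimal sketch, assuming the client IP is the first space-separated field as in the command above. The function name count_unique_ips is made up for illustration.

```shell
#!/bin/bash
# count_unique_ips FILE
# Hypothetical helper: prints how many distinct values appear in field 1.
count_unique_ips() {
    # seen[] remembers every first field already met; n is bumped only
    # the first time a value shows up. "n + 0" prints 0 for an empty file.
    awk '!seen[$1]++ { n++ } END { print n + 0 }' "$1"
}

# Example call (path taken from the command above):
# count_unique_ips /usr/local/squid/var/logs/access.log
```

The trade-off is memory: awk keeps one array entry per distinct IP, while sort|uniq spills to temporary files when the input is large.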
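When using awk, the file name must be passed to awk itself:
awk '{print $1}' test_8.5.2016|sort|uniq
Putting the file name after uniq instead (awk '{print $1}'|sort|uniq test_8.5.2016) prints whole log lines such as:
10.185.130.78 - - [04/Aug/2016:23:59:29 +0800] "GET http://swcdn.apple.com/content/downloads/52/05/041-9986/25o8fo3gq5m75ylgi9h1vgf4b0r5fovzhr/041-9986.zh_TW.dist HTTP/1.1" 200 53879 TCP_MISS:HIER_DIREC
This is not a difference between cut and awk: uniq reads the named file directly and ignores the pipe, so awk never sees the log.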
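PV (page views): the number of times pages were viewed
In squid each log line is treated as one PV, but the overall total is not very meaningful (what use is the combined PV of so many different sites?). It is more useful to grep out the entries for a single domain and count those.
Rough total over everything: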
[root@localhost etc]# wc -l /usr/local/squid/var/logs/access.log|awk '{print $1}'
3874236
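The single-domain count suggested above could look roughly like this. It is only a sketch: it assumes the requested URL is the seventh space-separated field, as in the sample log line earlier, and it compares the host part exactly so that apple.com does not also match swcdn.apple.com. The name pv_for_domain is hypothetical.

```shell
#!/bin/bash
# pv_for_domain DOMAIN FILE
# Hypothetical helper: counts log lines whose URL (field 7) has host DOMAIN.
pv_for_domain() {
    awk -v want="$1" '
        {
            host = $7
            sub("^[a-zA-Z]+://", "", host)   # drop a leading scheme such as http://
            sub("[/:].*$", "", host)         # drop the path and any :port
            if (host == want) n++
        }
        END { print n + 0 }
    ' "$2"
}

# Example call:
# pv_for_domain swcdn.apple.com /usr/local/squid/var/logs/access.log
```

A plain grep of the domain would also match it anywhere else in the line and would treat the dots as regex wildcards, which is why the host is extracted first here.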
Counting per domain
(implemented with a for loop)
[root@localhost etc]# for i in `cut -d " " -f7 test_8.5.2016 |cut -d "/" -f3|sort|uniq`;do echo "${i} is visited (times)"; grep $i test_8.5.2016 -c ;done
cn180156 is visited (times)
1
dungcoivb.googlepages.com is visited (times)
1
lh-hn-505 is visited (times)
8
swcdn.apple.com is visited (times)
4
www.whatismyip.com is visited (times)
1
Next, trying this out on the Zhengzhou server
#!/bin/bash
#Version 1.0
#Author Scott
#Mail [email protected]
#Introduction Counts how often each website was visited over the past few hours in a given log
cut -d " " -f7 /bash/script/log|sort|uniq \
>/bash/script/website.log
for i in `grep -v ^htt /bash/script/website.log`
do
echo -ne "$i is visited\n"
grep -c $i /bash/script/log
done
for j in `cut -d "/" -f3 /bash/script/website.log|uniq`
do
echo "$j is visited"
grep $j /bash/script/log -c
done
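The script above ran for 1.5 hours without producing a result. The log has about 3 million entries, and every loop iteration runs grep over the whole file.
Improved script: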
#!/bin/bash
#Version 1.1
#Author Scott
#Mail [email protected]
#Introduction Counts how often each website was visited over the past few hours in a given log
cut -d " " -f7 /bash/scripts/log|sort\
>/bash/scripts/website.log
grep -v ^htt /bash/scripts/website.log|uniq -c >loged.log &&\
cut -d "/" -f3 /bash/scripts/website.log|uniq -c >loged.log
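Further improvement: append the second result with >> instead of overwriting loged.log, then sort by count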
[root@hadphost scripts]# wc -l log
2805719 log
[root@hadphost scripts]# cat pv_test.sh
#!/bin/bash
#Version 1.2
#Author Scott
#Mail [email protected]
#Introduction Counts how often each website was visited over the past few hours in a given log
cut -d " " -f7 /bash/scripts/log|sort\
>/bash/scripts/website.log
grep -v ^htt /bash/scripts/website.log|uniq -c >loged.log &&\
cut -d "/" -f3 /bash/scripts/website.log|uniq -c >>loged.log &&\
sort -rbn loged.log >queue.log
[root@hadphost scripts]# wc -l queue.log
18408 queue.log
[root@hadphost scripts]# fg
time sh pv_test.sh
real 1m40.616s
user 1m37.560s
sys 0m1.807s
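For comparison, the whole tally can also be done in a single awk pass, so the 2.8 million URLs do not have to be sorted before counting; only the much shorter list of hosts is sorted at the end. This is a sketch under the same assumption that the URL is field 7. The name count_by_host is made up, and it has not been timed against this log.

```shell
#!/bin/bash
# count_by_host FILE
# Hypothetical helper: prints "COUNT HOST" lines, most visited first.
count_by_host() {
    awk '
        {
            host = $7
            sub("^[a-zA-Z]+://", "", host)   # strip the scheme, e.g. http://
            sub("[/:].*$", "", host)         # strip the path and any :port
            if (host != "") hits[host]++     # one increment per log line
        }
        END { for (h in hits) print hits[h], h }
    ' "$1" | sort -rn
}

# Example call (path taken from the script above):
# count_by_host /bash/scripts/log > queue.log
```

Each log line is counted once here. In the cut-based version, an entry with no "/" in it would pass through cut -d "/" -f3 unchanged, because cut prints lines that contain no delimiter unless -s is given, so such entries may end up counted by both pipelines.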
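UV (unique visitors): the number of distinct clients
The squid log does not currently contain the data needed for this, so it is not investigated here.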