Using shell to analyze Baidu web spider visits to list pages in nginx logs

Original

2018-09-11 07:23

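#!/bin/bash
# desc: this script analyzes Baidu web spider (Baiduspider/2.0) visits in an nginx access log
# date: 2014.02.25
# tested in CentOS 5.9 x86_64
# saved in /usr/local/bin/baidu-web.sh
# written by [email protected] www.zjyxh.com

# Yesterday's date as MMDD, used in the output file names (requires GNU date)
dt=$(date -d "yesterday" +%m%d)

if [ -n "$1" ]; then
  if [ -e "$1" ]; then
    # Keep only the lines that mention Baiduspider/2.0 (case-insensitive)
    grep -i "Baiduspider/2.0" "$1" > "baiduspider-${dt}.txt"
    num=$(wc -l < "baiduspider-${dt}.txt")
    echo "baiduspider number is ${num}, file is baiduspider-${dt}.txt"
    # Field 7 is the request path in nginx's default combined log format.
    # Count hits per path and sort numerically, most visited first.
    # The report is named after the first 10 characters of the given path.
    awk '{print $7}' "baiduspider-${dt}.txt" | sort | uniq -c | sort -rn > "$(printf '%s\n' "$1" | cut -c 1-10)-${dt}.txt"
    echo "$1 was done"
  else
    echo "$1 does not exist!"
  fi
else
  echo "usage: $0 file_path"
fi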

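This shell analysis of the Baidu web spider uses the same method as the one for the Baidu news spider; the only change is swapping the keyword from baiduspider-news to baiduspider/2.0.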
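Since the two analyses differ only in the keyword, the shared pipeline could be wrapped in one function that takes the keyword as a parameter. The following is a minimal sketch, not the author's script: the function name `count_spider_paths` is hypothetical, and it assumes the log uses nginx's default combined format, where the request path is the seventh whitespace-separated field.

```shell
#!/bin/bash
# Hypothetical helper (not part of the original script):
# print "count path" pairs for log lines matching a spider keyword,
# most visited path first.
#   $1 = keyword, e.g. "Baiduspider/2.0" or "Baiduspider-news"
#   $2 = path to an nginx access log (combined format assumed)
count_spider_paths() {
    local keyword="$1"
    local logfile="$2"
    # -i: ignore case, -F: treat the keyword as a literal string
    grep -iF -- "$keyword" "$logfile" \
        | awk '{print $7}' \
        | sort \
        | uniq -c \
        | sort -rn
}

# Example calls (the log path is an assumption):
# count_spider_paths "Baiduspider/2.0"  /var/log/nginx/access.log
# count_spider_paths "Baiduspider-news" /var/log/nginx/access.log
```

With this shape, the web spider and news spider reports come from the same code path, and `sort -rn` keeps the ranking numeric, so a path with 10 hits sorts above one with 9.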
