PHP 日誌收集系統

最近業務中涉及到遠程服務器的日誌收集需求, 出於限制技術棧擴大的想法,使用PHP進行了實現.

實現過程中有些小小需要注意的點,記錄如下:

1. 主動獲取. 由於服務器較多, 如果使用Flume之類的架構, 需要在每臺服務器上安裝軟件, 這就產生了運維成本 . 所以我們使用 收集端主動獲取的方式. 不需要在生產者(服務端)安裝軟件.

2.SSH連接. 每臺服務器都配置了SSH連接權限,使用PHP的 ssh2擴展即可遠程連接並訪問服務器內容.

3.服務器日誌結構統一.  每臺服務器上的日誌文件都按同一目錄 規則放置,以簡化程序邏輯.

4.CLI運行. 收集是持續運行的程序,使用CLI模式,要注意,此時所使用的INI文件問題.

5.SSH連接異常.  有時,由於網絡問題,導致SSH連接或驗證失敗, 延時重試即可.

6.日誌截斷與壓縮. 通常,我們的運維會在每天的固定時間對日誌進行截斷和壓縮, 這就有了兩種類型的文件需要讀取:壓縮與未壓縮的日誌, 需要分別處理.

7.日誌中的時間戳. 以秒爲單位 的時間戳不足以區分請求, 我們增加$msec以毫秒計量, 同一毫秒內,同一IP來源,同一UA的可以認爲是一個請求.

8.讀取目錄.  使用readdir即可讀取SSH格式的遠程目錄, readdir("ssh2.sft://......"); 過濾掉不需要的文件後, 按文件創建時間排序,逐個處理.

9.讀取壓縮文件. 如果用file_get_contents會導致界面長期無響應, 我使用了fopen, fread 分步讀取. 一次讀取8K(再大也沒有用了).  每讀取一定次數後,輸出一個進度顯示.

10.壓縮文件緩存. 讀取成功後, 保存到緩存目錄 , 以便備份以及下次使用. 如果程序出錯或重新運行時, 先檢查緩存目錄, 如果有緩存文件,就不用從網絡上讀取了.

11.解壓縮. 使用gzdecode即可. 這會導致PHP內存需要暴增, 調整PHP.INI吧, 把內存限制擴大.

12.壓縮日誌處理完成記錄. 處理完成一個壓縮文件後, 在數據庫中記錄下來, 以後PHP程序運行後,就不用重複處理了.

13.未壓縮日誌處理. 未壓縮的日誌表明,此日誌仍在增長中. 不需要緩存. 使用數據庫記錄,當前文件指針(使用ftell,fseek). 記錄文件創建日期.

14.未壓縮日誌判斷. 當文件日期與記錄的日期不同時, 或文件小於記錄中的文件大小, 說明 此文件被更新了, 需要重置文件指針.

    否則可以直接定位(fseek),以繼續從上次處理的位置進行.

15.日誌行分解. 使用正則即可,根據空格及定界符進行區分. 也可使用logParser第三方類庫來處理. 爲節省內存開銷.可使用Iterator 協程模式, 逐行返回.

16.日誌判重. 事先讀取每個服務器的最後 日誌時間戳(毫秒)以及IP,UA. 

17.日誌保存. 我是使用了MYSQL來保存日誌. 每一行日誌執行一次MYSQL會極大浪費運行時間, 可以累積4000行再一次性插入.

18.錯誤處理. 除了SSH連接失敗外, 還會讀取半行日誌,導致分解失敗, 此時也拋出異常. 由主程序捕獲,並重新運行即可.

源程序如下:

<?php
/**
 * Created by IcePHP Framework.
 * User: 藍冰大俠
 * Date: 2018/4/11
 * Time: 15:09
 */

class MLogImport
{
    /**
     * 當前正在處理的站點名稱
     * @var string
     */
    private $feed;

    /**
     * 當前正在處理的站點主機地址
     * @var string
     */
    private $host;

    /**
     * 當前站點的登錄名/站點名稱
     * @var string
     */
    private $user;

    /**
     * 當前站點的登錄密碼
     * @var string
     */
    private $pass;

    /**
     * 當前站點的日誌所在目錄
     * @var string
     */
    private $logPath;

    //用於本地保存服務器日誌的目錄,僅備份壓縮後的日誌
    const CACHE_PATH = DIR_ROOT . 'run/serverLog/';

    /**
     * 當前站點的SSH連接
     * @var resource
     */
    private $sftp;

    /**
     * 當前正在處理的文件名
     * @var string
     */
    private $file;

    /**
     * 本站最後一條日誌
     * @var array
     */
    private $lastRow;

    private $agentPatterns;
    private $agentReplaces;

    public function __construct()
    {
        $maps = [
            '/AppleWebKit\/[\d\.]*/i' => 'AppleWebKit/...',
            '/Mobile\/[\d\w]*/i' => 'Mobile/...',
            '/Safari\/[\d\.]*/i' => 'Safari/...',
            '/CriOS\/[\d\.]*/i' => 'CriOS/...',
            '/GSA\/[\d\.]*/i' => 'GSA/...',
            '/Version\/[\d\.]*/i' => 'Version/...',
            '/Chrome\/[\d\.]*/i' => 'Chrome/...',
            '/Edge\/[\d\.]*/i' => 'Edge/...',
            '/Firefox\/[\d\.]*/i' => 'Firefox/...',
            '/SamsungBrowser\/[\d\.]*/i' => 'SamsungBrowser/...',
            '/build\/[\d\w\-\.]*/i' => 'build/...',
            '/Silk\/[\d\.]*/i' => 'Silk/...',
            '/Crosswalk\/[\d\.]*/i' => 'Crosswalk/...',
            '/Gecko\/[\d\.]*/i' => 'Gecko/...',
            '/NTENTBrowser\/[\.\d]*/i' => 'NTENTBrowser/...',
            '/Snapchat\/[\d\w\-\.]*/i' => 'Snapchat/...',
            '/Java\/[\d\.]*/i' => 'Java/...',
            '/UCBrowser\/[\d\.]*/i' => 'UCBrowser',

            '/\(Linux[^\)]*SAMSUNG[^\)]*\)/i' => 'SAMSUNG...',
            '/\([^\)]*IPAD[^\)]*\)/i' => 'IPAD...',
            '/\([^\)]*SM-[^\)]*\)/i' => 'SM...',

            '/LG\-[\w\d]*/i' => 'LG...',
            '/LGL\d[\w\d]*/i' => 'LGL...',
            '/itel it\d*/i' => 'itel...',
            '/XT\d*/i' => 'XT...',
            '/TECNO\-[\w\d]*/i' => 'TECNO...',
            '/RCT[\d\w]*/i' => 'RCT...',
            '/Micromax\s[\w\d]*/i' => 'Micromax...',
            '/LGMS[\d]*/i' => 'LGMS...',
            '/GT\-[\w\d]*/i' => 'GT...',
            '/HUAWEI\s[\-\w\d]*/i' => 'HUAWEI...',
            '/Lenovo\s[\-\w\d]*/i' => 'Lenovo...',
            '/SCH\-[\w\d]*/i' => 'SCH...',
            '/rv\:[\d\.]*/i' => 'rv:...',
            '/Lumia\s\d+/i' => 'Lumia...',
            '/Instagram\s[\d\.]*/i' => 'Instagram...',

            '/iPhone OS 5[_\d]*/i' => 'iOS 5...',
            '/iPhone OS 6[_\d]*/i' => 'iOS 6...',
            '/iPhone OS 7[_\d]*/i' => 'iOS 7...',
            '/iPhone OS 8[_\d]*/i' => 'iOS 8...',
            '/iPhone OS 9[_\d]*/i' => 'iOS 9...',
            '/iPhone OS 10[_\d]*/i' => 'iOS 10...',
            '/iPhone OS 11[_\d]*/i' => 'iOS 11...',
            '/iOS 5[_\d]*/i' => 'iOS 5...',
            '/iOS 6[_\d]*/i' => 'iOS 6...',
            '/iOS 7[_\d]*/i' => 'iOS 7...',
            '/iOS 8[_\d]*/i' => 'iOS 8...',
            '/iOS 9[_\d]*/i' => 'iOS 9...',
            '/iOS 10[_\d]*/i' => 'iOS 10...',
            '/iOS 11[_\d]*/i' => 'iOS 11...',
            '/Android 2[\.\d]*/i' => 'Android 2...',
            '/Android 3[\.\d]*/i' => 'Android 3...',
            '/Android 4[\.\d]*/i' => 'Android 4...',
            '/Android 5[\.\d]*/i' => 'Android 5...',
            '/Android 6[\.\d]*/i' => 'Android 6...',
            '/Android 7[\.\d]*/i' => 'Android 7...',
            '/Android 8[\.\d]*/i' => 'Android 8...',

            '/QuantcastSDK[^\s]*(\s\(\d+\))?/i' => 'QuantcastSDK...',
        ];
        $this->agentPatterns = array_keys($maps);
        $this->agentReplaces = array_values($maps);
    }

    /**
     * 記錄一個站點的賬號,密碼,日誌路徑
     * @param string $host 主機/賬號
     * @param string $user 站點名稱/登錄名
     * @param string $pass 登錄密碼
     * @param string $logPath 日誌文件路徑
     */
    public function site(string $host, string $user, string $pass, string $logPath): void
    {
        $this->feed = $user;
        $this->host = $host;
        $this->user = $user;
        $this->pass = $pass;
        $this->logPath = $logPath;

        //重新連接的間隔時間
        $interval = 1;
        $connect = null;
        while (true) {
            //連接主機
            $connect = ssh2_connect($this->host, '22');

            //賬號密碼驗證成功
            if (false !== ssh2_auth_password($connect, $this->user, $this->pass)) {
                break;
            }

            //間隔時間2秒,4,8,...
            $interval *= 2;
            echo "auth wrong at $this->host, retry after $interval seconds\r\n";

            //間隔指定 時間後,重新連接
            sleep($interval);
        }

        //登錄成功
        echo "\r\nlogin $this->feed\r\n";

        //讀取文件列表
        $this->sftp = ssh2_sftp($connect);
        if(!$this->sftp){
            throw new Exception('ssh2_sftp fail.');
        }
        $handle = opendir("ssh2.sftp://{$this->sftp}{$this->logPath}");   //ssh2.sftp://Resource #33/home/.....
        if (!$handle) {
            throw new Exception('open dir ssh2.sftp fail.');
        }

        $zippedFiles = [];
        $unzippedFile = '';
        while (false !== ($file = readdir($handle))) {
            $filePath = "ssh2.sftp://{$this->sftp}{$this->logPath}/$file";

            //必須是文件,目錄的不要
            if (!is_file($filePath)) continue;

            //必須是訪問日誌
            if (left($file, 10) !== 'access.log') continue;

            //如果是壓縮文件
            if (substr($file, -3) === '.gz') {
                //4.5之前的不處理(這天改格式了)
                if (substr($file, 11, 8) < '20180405') continue;
                $zippedFiles[] = $file;
            } else {
                $unzippedFile = $file;
            }
        }
        closedir($handle);

        //本站最後請求時間
        $this->lastRow = table('log')->row('*', ['feedName' => $this->feed], 'id desc')->toArray();

        //按創建時間正序排序
        asort($zippedFiles);

        //逐個文件處理壓縮文件
        foreach ($zippedFiles as $file) {
            $this->file = $file;
            $this->zipped();
        }

        //如果有非壓縮日誌,處理
        if ($unzippedFile) {
            $this->file = $unzippedFile;
            $this->unzipped();
        }
    }


    /**
     * 讀取遠程 文件內容
     * @param $indicator string 遠程 文件指示器
     * @param $size int 文件大小
     * @return Iterator 遍歷器
     */
    private function readUnzipped(string $indicator, int $size): Iterator
    {
        echo "Begin read File:$this->file:" . STool::kmgt($size) . "\r\n";

        //打開文件,指向上次讀取的位置
        $f = fopen($indicator, 'r');
        if (!$f) {
            return;
        }
        if ($this->offset) {
            fseek($f, $this->offset);
            echo "Seek to $this->offset\r\n";
        }

        //總行數
        $lines = 0;

        //逐行讀取
        while (!feof($f)) {
            $lines++;
            $line = fgets($f);

            //更新偏移量
            $this->offset = ftell($f);

            //返回行數
            yield $line;

            //每200行輸出一個顯示
            if ($lines % 500 == 0) {
                echo "read $this->feed $this->file Lines:$lines\r\n";
            }
        }
        fclose($f);
        echo "read $this->feed $this->file Lines:$lines\r\n";
        echo "End.\r\n";
    }


    /**
     * 讀取遠程 文件內容
     * @return string 緩存文件路徑
     */
    private function readZipped(): string
    {
        //構造遠程文件地址
        $indicator = "ssh2.sftp://$this->sftp$this->logPath/$this->file";

        //文件大小
        $fileSize = filesize($indicator);
        $size = STool::kmgt($fileSize);

        //如果有緩存文件且緩存文件大小一致,則使用緩存文件
        $cacheFile = self::CACHE_PATH . $this->feed . '/' . $this->file;
        if (is_file($cacheFile) and filesize($cacheFile) == $fileSize) {
            echo "Read Zipped File From Cache:" . $this->file . ' ' . $size . "\r\n";
            return $cacheFile;
        }

        //從服務器讀文件
        echo "Begin read File:{$this->file}:" . $size . "\r\n";
        $fileHandle = fopen($indicator, 'rb');
        if (!$fileHandle) {
            dump($indicator, 'OPEN FAIL');
            exit;
        }

        //讀取遠程文件內容
        $content = '';
        $i = 0;
        while (!feof($fileHandle)) {
            //每次能讀回8K字節
            $content .= fread($fileHandle, 65536);

            //每128K顯示一次讀取進度
            $i++;
            if ($i % 16 == 0) {
                echo "$this->feed $this->file Reading :" . STool::kmgt(strlen($content)) . "/$size\r\n";
            }
        }
        fclose($fileHandle);

        //保存到緩存文件中
        echo "Save to cache:" . $cacheFile . " \r\n";
        makeDir(dirname($cacheFile));
        file_put_contents($cacheFile, $content);

        //返回壓縮文件內容
        return $cacheFile;
    }

    /**
     * 字符串分行
     * @param string $content
     * @return Iterator
     */
    public function explode(string $content): Iterator
    {
        $size = strlen($content);
        $pointer = 0;
        while ($pointer < $size) {
            $next = strpos($content, "\n", $pointer);
            if ($next === false) {
                $line = substr($content, $pointer);
                $next = $size;
            } else {
                $line = substr($content, $pointer, $next - $pointer);
            }
            yield $line;
            $pointer = $next + 1;
        }
    }

    private function valid(string $url): bool
    {
        return false !== strpos($url, '/?s=') or false !== strpos($url, '/?ss=') or preg_match('/^\/.*\/.*\/$/i', $url);
    }

    /**
     * 處理一個壓縮日誌文件
     */
    private function zipped(): void
    {
        //檢查文件已經處理過
        $fileTable = table('zipped');
        if ($fileTable->exist(['feedName' => $this->feed, 'fileName' => $this->file])) return;

        //讀取文件內容
        $gz=gzopen($this->readZipped(),'r');

        echo "\r\nBegin Process File\r\n";
        //$memTable = $this->createTemporaryTable(uniqid('tmp_'));

        //要插入的日誌表
        $logTable = table('log');

        //要插入的行緩衝區
        $rows = [];
        $insertRowsCount = 0;

        $content = null;
        $key=0;
        while(!gzeof($gz)) {
            $line=gzgets($gz);
            if ((++$key) % 30000 == 0) {
                echo "Analysis LINES:$key\r\n";
            }

            //空行不處理
            $line = trim($line);
            if (!$line) continue;

            //行分解
            $parts = $this->explodeLine($line);
            if (!$parts) continue;

            //判斷 是否是 搜索 行
            if (!$this->valid($parts['url'])) continue;

            //檢查是否已經處理過
            if ($this->lastRow) {
                if ($parts['timestamp'] < $this->lastRow['timestamp']) continue;
                if ($parts['timestamp'] == $this->lastRow['timestamp'] and $parts['url'] == $this->lastRow['url'] and $parts['ip'] == $this->lastRow['ip']) {
                    continue;
                }
            }

            //加入緩衝 區
            $parts['feedName'] = $this->feed;
            $rows[] = $parts;

            //每4000行執行一次插入,再多就會出現placeholder太多
            if (count($rows) >= 4000) {
                $logTable->inserts($rows);
                $insertRowsCount += count($rows);
                SDebug::clearMsgs();
                $rows = [];
            }
        }

        //處理最後剩餘的行
        if (count($rows)) {
            $logTable->inserts($rows);
            $insertRowsCount += count($rows);
            SDebug::clearMsgs();
        }

        echo "insert LINES:$insertRowsCount\r\n";

        //標記此文件已經處理過
        //$fileTable->begin();
        //$this->move($memTable);
        $fileTable->insert(['feedName' => $this->feed, 'fileName' => $this->file]);
        //$fileTable->commit();
    }

    /**
     * 將臨時表中的日誌轉移到正式表中
     * @param STable $memTable 臨時表對象
     */
    private function move(STable $memTable)
    {
        $fields = ['feedName', 'accessTime', 'timestamp', 'ip', 'requestTime', 'responseTime', 'method', 'url', 'code', 'length', 'referrer', 'agentId', 'created', 'updated', 'forward'];;
        $fieldsStr = implode(',', $fields);
        $memTable->execute("Insert" . " Into log($fieldsStr) select $fieldsStr from " . $memTable->name());
        $memTable->deleteAll();
    }

    /**
     * 當前文件的偏移
     * @var int
     */
    private $offset;

    /**
     * 處理一個未壓縮的日誌文件
     */
    private function unzipped(): void
    {
        //檢查上次處理情況
        $fileTable = table('unzipped');

        //如果沒有記錄,則生成一條初始記錄
        if ($fileTable->notExist(['feedName' => $this->feed])) {
            $fileTable->insert(['feedName' => $this->feed, 'offset' => 0, 'size' => 0, 'timestamp' => 0]);
        }

        //取出處理信息,其中包含 offset(上次文件指針位置),size(上次文件大小), lasttime(上次最後時間)
        $info = $fileTable->row('*', ['feedName' => $this->feed]);

        //構造遠程文件地址
        $indicator = "ssh2.sftp://$this->sftp$this->logPath/$this->file";

        //文件大小
        $fileSize = filesize($indicator);

        //文件變小了, 說明是新文件
        if ($fileSize < $info['size']) {
            $this->offset = 0;
        } else {
            // 取首行
            $f = fopen($indicator, 'r');
            $firstLine = fgets($f);
            fclose($f);
            $first = $this->explodeLine($firstLine);
            $timestamp = $first['timestamp'];
            if ($timestamp > $info['timestamp']) {
                $this->offset = 0;
            } else {
                $this->offset = $info['offset'];
            }
        }

        echo "\r\nBegin Process File\r\n";

        //要插入的日誌表
        $logTable = table('log');

        //要插入的行緩衝區
        $rows = [];
        $insertedRowsCount = 0;

        $iterator = $this->readUnzipped($indicator, $fileSize);
        $lastTime = 0;
        foreach ($iterator as $key => $line) {
            //空行不處理
            $line = trim($line);
            if (!$line) continue;

            //分解 日誌行
            $parts = $this->explodeLine($line);
            if (!$parts) continue;

            //判斷 是否是 搜索 行
            if (!$this->valid($parts['url'])) continue;

            //判斷是否已經導入
            if ($this->lastRow and (floatval($parts['timestamp']) < floatval($this->lastRow['timestamp']))) continue;
            $rows[] = array_merge($parts, [
                'feedName' => $this->feed
            ]);

            //最大的時間戳
            $lastTime = $parts['timestamp'];

            //批量插入
            if (count($rows) >= 100) {
                $insertedRowsCount += count($rows);
                $logTable->inserts($rows);
                $fileTable->update(['size' => $fileSize, 'offset' => $this->offset, 'timestamp' => $lastTime], ['feedName' => $this->feed]);
                echo "Insert LINES:$insertedRowsCount\r\n";
                SDebug::clearMsgs();
                $rows = [];
            }
        }

        //處理最後剩餘的行
        if (count($rows)) {
            $insertedRowsCount += count($rows);
            $logTable->inserts($rows);
            $fileTable->update(['size' => $fileSize, 'offset' => $this->offset, 'timestamp' => $lastTime], ['feedName' => $this->feed]);
            echo "Insert LINES:$insertedRowsCount\r\n";
            SDebug::clearMsgs();
        }
    }

    /**
     * 分解一行日誌
     * @param $line string
     * @return array
     * @throws Exception 匹配失敗
     */
    private function explodeLine(string $line): array
    {
        //[08/Apr/2018:03:30:17 +0800] 1523129417.075 72.178.128.43 - 0.114 - "GET /index.php/blog/search/?s=lowering%20ldl%20cholesterol&subid=tgr_zhen_BFX0J1ILON6N__rmlwuf_73004751 HTTP/1.1" 499 0 "http://168634854.keywordblocks.com/Cholesterol_Hdl_Ldl_Ratio.cfm?&vsid=1661264105777118&vi=1523124812717930856&dytm=1523124813100&kbbq=%26sde%3D1%26adepth%3D1%26ddepth%3D3&tdAdd[]=%7C%40%7Csde%3D1%7C%40%7Cadepth%3D1%7C%40%7Cddepth%3D3&sbdrId=135&vgd_matchstr=CommercialUrlOn%7Chlid%3D2002&matchstring=CommercialUrlOn%7Chlid%3D2002&vgd_bdata=ss%3D320x568%7C%7CMM%3D1.0%7C%7Cbb%3D145%7C%7CMP%3D.*%2Fcholesterol-management%2F.*%7C%7Cfbb%3D0%7C%7CRB%3D34.18110604079318%7C%7Cbtd%3D2341877441767294977%7C%7Ccbid%3D34.18110604079318%7C%7CMB%3D15.0%7C%7CMC%3DAUTO%7C%7Curl_l%3D50%7C%7Chour_group_l%3D20%7C%7CRImp%3D9.0%7C%7Cbid%3D15.1%7C%7Cdevice_l%3D20%7C%7CisRef%3D0&verid=111299&acid=427573913652889251523124810846&hvsid=00001523124812870012196577713996&upk=1523124813.1380&sttm=1523124812870&=&kp=1&kbc=143697&bdrid=4&subBdr=135&kt=266&ki=5912010&ktd=274911461948&kbc2=rpc%3D0.14&fdkt=266&lkpgd=UUID%3Duuid_s8_3_1523124813_778621763%7C%7CSI%3D863%7C%7CMPTD%3D232%7C%7CPTD%3D6922032661652308480%7C%7CSID%3D14%7C%7CCI%3D863%7C%7CMN%3D8%7C%7Cerpm%3D-1.0%7C%7CMI%3D863%7C%7CKTGD%3D3866%7C%7CKSE%3D1523124813242%7C%7CAN%3D5%7C%7CHID%3D3%7C%7CPTD2%3D16896&&lktgd=3866&&fp=biwFab2EOSptF9Dp9P5pLIuIHpVTe2ha94T5u6HCtebISTPUlc1la6_ujtvHa-nb8nGHPkJ_EnIwZF7mo3KnR2p3XYd1wmF70O9szYDQ9ufyP0OyS-gxVg%3D%3D&c=O5LJq2Lix-2w0IdspaXDCw&cme=rs5xevxSmJb0u22ZZHKqUTjYupvdJAHcw4kmb0sBhK6UBgyb-EKIO8Yg8DI2Uv0ZcpIG4AQvPb75jBLoeAG5VMn2cBgcO0Er9uHnU2G2b5527aplb-EHrVG_De8s_c_9-9bkhpH6jUmk3eK5uGthWBagtuatdg2SBe72cEUSh9aPY9sVJrkoOPaGQsQOH5rqAz1TMLK3_fisF-ozH6JyNg%3D%3D%7C%7CNDHRnZ9Gz3KXlI-i9OnZqQ%3D%3D%7C5gDUJdTGiJzedmq9hanWYg%3D%3D%7CtrJ5NInYpv_AyRdJRHyQbAoA6iGqXTxu%7CRrUTbnOe6Nf9cTuAtIJVy9no3H-wuOVy%7CN7fu2vKt8_s%3D%7Cl44MelaykDW0jQJG6bjukdQlinX0DB9oV4Sm9gijr_bD43Zl1UaHw39JatxHgP46euFaB3PMdSqZJqb8JKnexHrlF_K3RJ5R%7CJf0d-WoAdPuDA6UD6Gc_F1zJX7Ucny7osFvXic8Z4MU%3D%7Cue9AR4Lxeuwq7AuXzY3UTfqQIZ7T1ETAepQ5ZjhMUrn8F4iL72pDJxv9w1vxSK2jeiEactQl6VTIdrnkiwcfmH0laLhDYgMhmFyUaT5z0ZmFu4kbMwh587f73k-Z2prl2NRyNqvZoZL_mL9UwcCaoUiGM916VV0SyiuEizF5kMH-PgMGZNtaVAulY1i6cP1h%7C&ib=0&cid=8CU12LGKP&crid=285618735&size=300x250&lpid=&tsid=1&ksu=233&chid=&https=0&kwdsMaxTm=400&ugd=3&maxProviderPixel1=%2F%2Fc.ad-srv.co%2Fpixel&maxProviderPixel2=%2F%2Fc.adyield.co%2Fpixel&rms=1523124813&&sc=TX&asn=11427&kals=base&kalog=SI%3D863%7C%7CTPTD%3D516%7C%7CCI%3D863%7C%7CUUID%3Duuid_s12_nc1b_4_1523124812_210054693%7C%7CSID%3D11%7C%7CHID%3D4%7C%7CMI%3D863%7C%7CMPTD%3D176&kasts=tstype%3DBASE_BAG%7C%7C&kata=8ce5&clsKb=2&ecref=w77E%3ASS7mE8NQ.BJGYO.NmYS7mE8NSuSTOj%2BTJeJjQ%2BImLY1j%2BD1zyJS%3Fx7YMN1YE18yzvJY4RuuT%26x7YM7JLYvBw17n8QnzmLY1jnjOjnNwmjJQ7JLmj%26x7YMQmxLNJv%26x7YMYJO8xYvG%26x7YMNmz7Jz7vfHFWiXW9FiuH%26yM7yv%26yM78vUBOofiiAXAf9uWuH%26yMOJvY%26yMOYv%26yM1Evu7f%26yMzBvy%26yMN8vu9HHHfuWWX%26yM18vX9hiuHFXXFA%26yMjEvi9fhX9A%26yMj8v%26UvBw17n8QnzmLY1jnjOjnNwmjJQ7JLmj%26yNj8Ov%3Dd9C%3DgdBfCqpRD%3DfKDVQK6rMLAkw9YDzi%20YAVJOGawyN077a0Ezk7z8903%20AuogkUFBTjI%20%3DK8jODdV1K8Qg4KTBMBNR&kct=20512&abpl=2" "Mozilla/5.0 (iPhone; CPU iPhone OS 11_2_6 like Mac OS X) AppleWebKit/604.5.6 (KHTML, like Gecko) Version/11.0 Mobile/15D100 Safari/604.1" "-"

        $ns = '([^\s]*)';
        $str = '"([^"]*)"';
        $datetime = '\[([^\]]*)\]';

        //正則匹配
        $matched = preg_match("/$datetime $ns $ns \- $ns $ns $str $ns $ns $str $str $str/i", $line, $matches);
        if (!$matched) {
            throw new Exception('NOT MATCH');
        }

        //空格區別 MODE URL HTTP協議
        list($mode, $url, $protocol) = explode(' ', $matches[6]);

        return [
            'accessTime' => datetime(strtotime($matches[1])), //訪問時間(秒)
            'timestamp' => floatval($matches[2]),//訪問時間戳(帶毫秒)
            'ip' => $matches[3], //請求者IP
            'requestTime' => floatval($matches[4]), //Nginx處理請求的時間
            'responseTime' => floatval($matches[5]), //Nginx完成整個響應的時間
            'method' => $mode, //GET/POST/...
            'url' => $url, //請求地址
            'code' => $matches[7], //響應代碼
            'length' => intval($matches[8]), //響應正文長度
            'referrer' => left($matches[9], 250), //引用
            'agentId' => $this->getAgentId($matches[10]), //用戶代理
            'forward' => $matches[11] //真實IP
        ];
    }

    //獲取Agent與ID的對應關係
    private function getAgentMap()
    {
        $rows = table('agent')->select('id,agent', null, 'agent')->toArray();
        return array_column($rows, 'id', 'agent');
    }

    //根據一個Agent,獲取對應ID,如果沒有則創建一個對應關係
    private function getAgentId($agent)
    {
        //如果UA爲空
        if (!$agent) {
            return 0;
        }

        //靜態內存緩存
        static $maps;
        if (!$maps) {
            $maps = $this->getAgentMap();
        }


        //縮減[FBAN/FBIOS;...]
        $sub = mid($agent, '[', ']');
        if ($sub) {
            $agent = str_replace('[' . $sub . ']', '[...]', $agent);
        }

        $agent = str_replace(' (KHTML, like Gecko)', '', $agent);

        //變種歸併
        $agent = preg_replace($this->agentPatterns, $this->agentReplaces, $agent);

        //Agent縮減到250個字符
        $agent = left($agent, 191);
        if (!isset($maps[$agent])) {
            $id = table('agent')->insertIgnore(['agent' => $agent]);
            $maps[$agent] = $id;
        }

        return $maps[$agent];
    }

}

————————————————
版權聲明:本文爲CSDN博主「藍冰大俠」的原創文章,遵循CC 4.0 BY-SA版權協議,轉載請附上原文出處鏈接及本聲明。
原文鏈接:https://blog.csdn.net/bluehire/article/details/79985203

發表評論
所有評論
還沒有人評論,想成為第一個評論的人麼? 請在上方評論欄輸入並且點擊發布.
相關文章