Finding a suitable logging Handler in Python: implementing your own Handler

I have recently been building the server side of an app with Python Tornado. A system going to production obviously needs logging, so I naturally reached for Python's built-in logging library.
 
In logging, writing out log records is delegated entirely to Handlers, but none of the Handlers bundled with logging meets our needs.
We want logs split by time period. With FileHandler everything goes into a single file, which over time grows so large that both reading and writing become a problem; TimedRotatingFileHandler can split logs, but with multiple processes the rotated files may overwrite one another and log entries get lost.
And so I set off on the hunt for a suitable Handler.
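For reference, the stock time-based setup that falls short here looks roughly like this (a minimal sketch; the file name and schedule are made up):

import logging
from logging.handlers import TimedRotatingFileHandler

# Rotate every hour, keep one week of files; fine single-process,
# unsafe once several processes share the same log path.
handler = TimedRotatingFileHandler('app.log', when='H', backupCount=24 * 7)
logging.getLogger().addHandler(handler)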
 
The first attempt was to use FileHandler together with a script (run, say, under supervisord) that splits the log on a schedule. The problem is that the running FileHandler keeps the log file open, so the file cannot simply be swapped out from underneath it: the handler keeps writing into the very same open file, and the log never actually switches over. Attempt failed.
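A quick illustration of why the external rotation script fails (POSIX semantics; on Windows os.rename refuses outright because the handler still holds the file open):

import os

f = open('app.log', 'a')             # stands in for the running FileHandler
os.rename('app.log', 'app.log.old')  # the external "rotation" step
f.write('still lands in app.log.old\n')  # the open handle follows the inode
f.flush()                            # and no fresh app.log appears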
 
The next attempt was to subclass logging's own rotating handler, TimedRotatingFileHandler, and override the method that performs the rollover.
Here I used a class written by a fellow developer:
#!/usr/bin/env python
# -*- coding: utf-8 -*-
#
# Copyright 2012 Ethan Zhang<http://github.com/Ethan-Zhang> 
#
# Licensed under the Apache License, Version 2.0 (the "License"); you may
# not use this file except in compliance with the License. You may obtain
# a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
# WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
# License for the specific language governing permissions and limitations
# under the License.


import time
import os
from logging.handlers import TimedRotatingFileHandler

class MultiProcessTimedRotatingFileHandler(TimedRotatingFileHandler):

    def doRollover(self):
        """ 
        do a rollover; in this case, a date/time stamp is appended to the filename
        when the rollover happens.  However, you want the file to be named for the
        start of the interval, not the current time.  If there is a backup count,
        then we have to get a list of matching filenames, sort them and remove
        the one with the oldest suffix.
        """
        if self.stream:
            self.stream.close()
        # get the time that this sequence started at and make it a TimeTuple
        t = self.rolloverAt - self.interval
        if self.utc:
            timeTuple = time.gmtime(t)
        else:
            timeTuple = time.localtime(t)
        dfn = self.baseFilename + "." + time.strftime(self.suffix, timeTuple)
        #if os.path.exists(dfn):
        #    os.remove(dfn)
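        # Multi-process tweak vs. the stock handler: if another process has
        # already rotated the file, skip the rename instead of deleting that
        # process's freshly rotated copy (which is what the original does).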
        if not os.path.exists(dfn):
            os.rename(self.baseFilename, dfn)
        if self.backupCount > 0:
            # find the oldest log file and delete it
            #s = glob.glob(self.baseFilename + ".20*")
            #if len(s) > self.backupCount:
            #    s.sort()
            #    os.remove(s[0])
            for s in self.getFilesToDelete():
                os.remove(s)
        #print "%s -> %s" % (self.baseFilename, dfn)
        self.mode = 'a'
        self.stream = self._open()
        currentTime = int(time.time())
        newRolloverAt = self.computeRollover(currentTime)
        while newRolloverAt <= currentTime:
            newRolloverAt = newRolloverAt + self.interval
        #If DST changes and midnight or weekly rollover, adjust for this.
        if (self.when == 'MIDNIGHT' or self.when.startswith('W')) and not self.utc:
            dstNow = time.localtime(currentTime)[-1]
            dstAtRollover = time.localtime(newRolloverAt)[-1]
            if dstNow != dstAtRollover:
                if not dstNow:  # DST kicks in before next rollover, so we need to deduct an hour
                    newRolloverAt = newRolloverAt - 3600
                else:           # DST bows out before next rollover, so we need to add an hour
                    newRolloverAt = newRolloverAt + 3600
        self.rolloverAt = newRolloverAt
Testing showed that log files could still end up overwritten: the exists()/rename() pair is not atomic, so two processes rolling over at the same moment can still race each other. This plan got vetoed as well.
 
"If file-based Handlers can't do the job, then let's try keeping the logs in a database." After the failures above, this idea surfaced.
It takes two steps: first, define a custom Handler; second, pick a database to store the logs in.
 
So how do you define a custom Handler? The answer can be found by following the trail.
First, go into the Python logging package and look at its bundled handlers; by comparing them you can work out the minimal contract a custom Handler must satisfy, which means finding the simplest Handler implementation among them.
In handlers.py the simplest one turns out to be SMTPHandler.
 
Its getSubject is only called internally by emit, so the recipe for a custom Handler follows easily: subclass logging.Handler and implement emit (which is called whenever a record is written).
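A minimal sketch of that recipe (the class name and the print target are mine, purely for illustration):

import logging

class PrintHandler(logging.Handler):
    """Bare-minimum custom handler: emit() is the only hook you must supply."""
    def emit(self, record):
        try:
            msg = self.format(record)  # apply the attached Formatter, as SMTPHandler does
            print msg
        except Exception:
            self.handleError(record)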
 
With step one solved, step two is choosing the database.
My first thought was sqlite, but a quick check showed its performance wasn't great, so I turned to mongodb instead.
And just before rolling up my sleeves to write one myself, an itchy-fingered web search showed that, of course, someone had already written a mongodb handler. Sigh...
import logging

from bson.timestamp import Timestamp
from pymongo import Connection
from pymongo.collection import Collection
from pymongo.errors import OperationFailure, PyMongoError
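# NOTE: Connection, Collection.save() and Connection.disconnect() are
# pymongo 2.x APIs; pymongo 3+ replaced them with MongoClient,
# insert_one() and close().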


"""
Example format of generated bson document:
{
    'thread': -1216977216,
    'threadName': 'MainThread',
    'level': 'ERROR',
    'timestamp': Timestamp(1290895671, 63),
    'message': 'test message',
    'module': 'test_module',
    'fileName': '/var/projects/python/log4mongo-python/tests/test_handlers.py',
    'lineNumber': 38,
    'method': 'test_emit_exception',
    'loggerName':  'testLogger',
    'exception': {
        'stackTrace': 'Traceback (most recent call last):
                       File "/var/projects/python/log4mongo-python/tests/test_handlers.py", line 36, in test_emit_exception
                       raise Exception(\'exc1\')
                       Exception: exc1',
        'message': 'exc1',
        'code': 0
    }
}
"""


class MongoFormatter(logging.Formatter):

    DEFAULT_PROPERTIES = logging.LogRecord('', '', '', '', '', '', '', '').__dict__.keys()

    def format(self, record):
        """Formats LogRecord into python dictionary."""
        # Standard document
        document = {
            'timestamp': Timestamp(int(record.created), int(record.msecs)),
            'level': record.levelname,
            'thread': record.thread,
            'threadName': record.threadName,
            'message': record.getMessage(),
            'loggerName': record.name,
            'fileName': record.pathname,
            'module': record.module,
            'method': record.funcName,
            'lineNumber': record.lineno
        }
        # Standard document decorated with exception info
        if record.exc_info is not None:
            document.update({
                'exception': {
                    'message': str(record.exc_info[1]),
                    'code': 0,
                    'stackTrace': self.formatException(record.exc_info)
                }
            })
        # Standard document decorated with extra contextual information
        if len(self.DEFAULT_PROPERTIES) != len(record.__dict__):
            contextual_extra = set(record.__dict__).difference(set(self.DEFAULT_PROPERTIES))
            if contextual_extra:
                for key in contextual_extra:
                    document[key] = record.__dict__[key]
        return document


class MongoHandler(logging.Handler):

    def __init__(self, level=logging.NOTSET, host='localhost', port=27017, database_name='logs', collection='logs',
                 username=None, password=None, fail_silently=False, formatter=None, capped=False,
                 capped_max=1000, capped_size=1000000, **options):
        """Setting up mongo handler, initializing mongo database connection via pymongo."""
        logging.Handler.__init__(self, level)
        self.host = host
        self.port = port
        self.database_name = database_name
        self.collection_name = collection
        self.username = username
        self.password = password
        self.fail_silently = fail_silently
        self.connection = None
        self.db = None
        self.collection = None
        self.authenticated = False
        self.formatter = formatter or MongoFormatter()
        self.capped = capped
        self.capped_max = capped_max
        self.capped_size = capped_size
        self.options = options
        self._connect()

    def _connect(self):
        """Connecting to mongo database."""

        try:
            self.connection = Connection(host=self.host, port=self.port, **self.options)
        except PyMongoError:
            if self.fail_silently:
                return
            else:
                raise

        self.db = self.connection[self.database_name]
        if self.username is not None and self.password is not None:
            self.authenticated = self.db.authenticate(self.username, self.password)

        if self.capped:
            try:  # We don't want to override the capped collection (and it throws an error anyway)
                self.collection = Collection(self.db, self.collection_name, capped=True, max=self.capped_max, size=self.capped_size)
            except OperationFailure:
                # Capped collection exists, so get it.
                self.collection = self.db[self.collection_name]
        else:
            self.collection = self.db[self.collection_name]

    def close(self):
        """If authenticated, logging out and closing mongo database connection."""
        if self.authenticated:
            self.db.logout()
        if self.connection is not None:
            self.connection.disconnect()

    def emit(self, record):
        """Inserting new logging record to mongo database."""
        if self.collection is not None:
            try:
                self.collection.save(self.format(record))
            except Exception:
                if not self.fail_silently:
                    self.handleError(record)
Luckily, I noticed that its connection supports neither mongodb's master-slave mode nor replica sets. Ha, at last somewhere to scratch my coding itch (even if it came to only two or three lines of code).
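The patch itself isn't shown here, so below is my guess at what those few lines look like, written as a subclass for readability (the actual patch edits MongoHandler in place, which is why the usage snippet afterwards keeps the original class name). It leans on the fact that pymongo 2.x Connection() accepts a full MongoDB URI as its host, and the URI is where replica-set or master-slave members are listed:

class MongoUrlHandler(MongoHandler):
    """Hypothetical reconstruction: route a MongoDB URI through to Connection()."""

    def __init__(self, url=None, **kwargs):
        self.url = url  # e.g. 'mongodb://host1:port1,host2:port2'
        MongoHandler.__init__(self, **kwargs)

    def _connect(self):
        if self.url is not None:
            self.host = self.url  # Connection(host=<uri>) parses the URI itself
        MongoHandler._connect(self)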
logger = logging.getLogger()
mongodb_url = 'mongodb://192.168.10.200:10001,192.168.10.201:10001'
handler = MongoHandler(url=mongodb_url)
logger.addHandler(handler)
Add these few lines and it is ready to use.
 
That should have wrapped things up, but then the flaw in this approach occurred to me: the logging system would now depend on a database, yet logging is a foundational module of the whole system, so this approach does not really fit either.
Back to the original idea, then: write logs straight to files, and since the handlers bundled with logging cannot split files reliably, write our own.
The result is this fully hand-rolled Handler that splits logs by time:
#!/usr/bin/env python
# -*- coding:utf-8 -*-

import logging
import os
import datetime

# Default path pattern: one file per hour, e.g. logs/2013-08-01/13.log
_filefmt = os.path.join("logs", "%Y-%m-%d", "%H.log")
class MyLoggerHandler(logging.Handler):
    def __init__(self, filefmt=None):
        self.filefmt = filefmt
        if filefmt is None:
            self.filefmt = _filefmt
        logging.Handler.__init__(self)

    def emit(self, record):
        msg = self.format(record)
        # The target path encodes the current time slot, so "rotation" is
        # simply the clock ticking over into a new file name.
        _filePath = datetime.datetime.now().strftime(self.filefmt)
        _dir = os.path.dirname(_filePath)
        try:
            if not os.path.exists(_dir):
                os.makedirs(_dir)  # may race with sibling processes; tolerated
        except Exception:
            print "cannot make dirs"
            print "filepath is " + _filePath
        try:
            # Open in append mode, write one record, close immediately:
            # no process holds the file between writes, so there is
            # nothing to clobber when the time slot rolls over.
            _fobj = open(_filePath, 'a')
            _fobj.write(msg)
            _fobj.write("\n")
            _fobj.flush()
            _fobj.close()
        except Exception:
            print "cannot write to file"
            print "filepath is " + _filePath
The idea is to release the file handle as soon as each record has been written, so concurrent processes never fight over the file. One possible optimization would be to buffer a batch of records, or a time window's worth, and flush them in one write (I will leave that until performance actually demands it).
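Incidentally, the standard library already ships this buffering pattern: logging.handlers.MemoryHandler queues records and forwards them to a target handler once its capacity fills up (or a record at flushLevel or above arrives). A sketch of wiring it onto the handler above:

import logging
from logging.handlers import MemoryHandler

logger = logging.getLogger()
file_handler = MyLoggerHandler()
# Flush every 100 records, or immediately on ERROR and above.
buffered = MemoryHandler(capacity=100, flushLevel=logging.ERROR,
                         target=file_handler)
logger.addHandler(buffered)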
 
Looking back, the hunt for a suitable Handler came down to three steps: first, confirming that Python's bundled handlers would not fit; second, working out how to implement a custom handler; third, choosing a storage medium for the logs (the detour here went a little far, drifting away from the original plan). The comforting part is that the search can finally be laid to rest.
 