mangos (Part 1): Overview and Message-Processing Mechanism

1. Overview

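I wanted to look at an open-source server framework and expected it to be quite complex, but the mangos code is written very clearly. mangos is not a World of Warcraft private-server emulator. It is an open-source free-software project, written in C++ and C#, that implements a framework for massively multiplayer online role-playing game (MMORPG) servers. SVN path: http://svn.code.sf.net/p/mangos/code/trunk. The checkout seems to be over 100 MB. I built the Release configuration of the vc8 project with VS2005, and it compiled on the first try.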

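Folders in the top-level directory:
contrib: third-party tools
dep: open-source libraries it depends on, such as ACE and SQLite
src: project source code
sql: database scripts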

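Folders under src:
bindings: script files, presumably for binding scripts.
framework: parts of the game framework, including the network framework, the game-system framework, utilities, and platform code.
game: presumably the game itself, including the implementations of the world system, combat system, game events, game maps, and so on.
mangosd: the mangosd main program, including the program entry point.
realmd: game realm information, including RealmList and related code.
shared: presumably common functions and libraries; the database code lives here.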

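Thread layout:
1. main: the process's main thread; initializes the World, creates the child threads, and reclaims resources
2. WorldRunnable: the main logic thread
3. CliRunnable: debugging thread (command line)
4. RARunnable: Remote Administration; presumably handles remote administration commands
5. MaNGOSsoapRunnable: SOAP protocol interface
6. FreezeDetectorRunnable: heartbeat (freeze) detection
7. SqlDelayThread: database thread
8. PatcherRunnable: client updates (sends patch files)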

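Apart from SqlDelayThread, the thread classes are named with the suffix Runnable and implemented by inheritance, so the thread layout is obvious at a glance and each class name documents what its thread does.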
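To make the inheritance-based naming concrete, here is a minimal C++11 sketch of the pattern. It assumes a hypothetical Runnable base class with a virtual run() and uses std::thread rather than the ZThread library mangos actually uses; HeartbeatRunnable and startThread are illustrative names, not mangos code.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <thread>

// Hypothetical base class: each thread type derives from it and overrides run().
class Runnable {
public:
    virtual ~Runnable() = default;
    virtual void run() = 0;
};

// Hypothetical worker, standing in for something like FreezeDetectorRunnable:
// it ticks roughly once per millisecond until asked to stop.
class HeartbeatRunnable : public Runnable {
public:
    explicit HeartbeatRunnable(const std::atomic<bool>& stop) : stop_(stop) {}
    void run() override {
        while (!stop_.load()) {
            ++ticks;
            std::this_thread::sleep_for(std::chrono::milliseconds(1));
        }
    }
    std::atomic<int> ticks{0};
private:
    const std::atomic<bool>& stop_;
};

// Runs any Runnable on its own std::thread; the caller must join it.
inline std::thread startThread(Runnable& r) {
    return std::thread([&r] { r.run(); });
}
```

Because every thread body is a class named XxxRunnable, a reader can enumerate the server's threads just by scanning class names.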

Event dispatch and handling:
WorldRunnable::run → World::Update → World::UpdateSessions → WorldSession::Update (all events for one socket) → the various handlers

2. The WorldRunnable Class

/// Heartbeat for the World
void WorldRunnable::run()
{
    ///- Init new SQL thread for the world database
    WorldDatabase.ThreadStart();                                // let thread do safe mySQL requests (one connection call enough)
    sWorld.InitResultQueue();

    uint32 realCurrTime = 0;
    uint32 realPrevTime = getMSTime();

    uint32 prevSleepTime = 0;                               // used for balanced full tick time length near WORLD_SLEEP_CONST

    ///- While we have not World::m_stopEvent, update the world
    while (!World::m_stopEvent)
    {
        ++World::m_worldLoopCounter;
        realCurrTime = getMSTime();

        uint32 diff = getMSTimeDiff(realPrevTime,realCurrTime);

        sWorld.Update( diff );
        realPrevTime = realCurrTime;

        // diff (D0) include time of previous sleep (d0) + tick time (t0)
        // we want that next d1 + t1 == WORLD_SLEEP_CONST
        // we can't know next t1 and then can use (t0 + d1) == WORLD_SLEEP_CONST requirement
        // d1 = WORLD_SLEEP_CONST - t0 = WORLD_SLEEP_CONST - (D0 - d0) = WORLD_SLEEP_CONST + d0 - D0
        if (diff <= WORLD_SLEEP_CONST+prevSleepTime)
        {
            prevSleepTime = WORLD_SLEEP_CONST+prevSleepTime-diff;
            ZThread::Thread::sleep(prevSleepTime);
        }
        else
            prevSleepTime = 0;
    }

    ... // clean up resources
}
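  1. This is the thread that drives the game world. At first the word "Heartbeat" puzzled me: is it for keeping long-lived connections alive? In fact, "Heartbeat" here should be read as the place that drives the whole world, like a person's heart or a car's engine.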

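  2. WorldDatabase.ThreadStart(); the name is explicit: "ThreadStart" says the call concerns a thread. When something involves thread management internally, the name should say so rather than being a plain "Start". Likewise, the matching call at the end is named WorldDatabase.ThreadEnd();. (Judging by the comment on that line, "let thread do safe mySQL requests", the call appears to prepare the calling thread for MySQL access rather than spawn a new thread, but the naming point stands.)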

  3. After each round of processing, the thread sleeps for a while; the code carries this comment:
    // diff (D0) include time of previous sleep (d0) + tick time (t0)
    // we want that next d1 + t1 == WORLD_SLEEP_CONST
    // we can't know next t1 and then can use (t0 + d1) == WORLD_SLEEP_CONST requirement
    // d1 = WORLD_SLEEP_CONST - t0 = WORLD_SLEEP_CONST - (D0 - d0) = WORLD_SLEEP_CONST + d0 - D0

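This may be peculiar to game servers: a game server does not need to process user requests in "real time"; it only needs to make users feel that it is real time. It is like a TV: whether LCD or CRT, every set has a refresh rate. Processing in true real time would also introduce many unnecessary problems. For example, with no sleep, and assuming the server could keep up, a user who somehow sent 1000 "punch" commands within one second would knock out anyone the fist touched, unless the server handled this specially. The fixed processing period also gives the whole game world a minimum time slice, a minimum unit of action frequency, which simplifies all kinds of business logic later on.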

Does sleeping waste CPU? No. The thread sleeps only when it has finished all its work within one WORLD_SLEEP_CONST; if the thread is fully loaded, it keeps working without sleeping.

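The calculation looks complicated but is not hard. Ignoring other cases, suppose the thread always finishes in time (diff <= WORLD_SLEEP_CONST + prevSleepTime).
Then D0 = d0 + t0, and d1 = WORLD_SLEEP_CONST + d0 - D0 = WORLD_SLEEP_CONST - t0.
So if WORLD_SLEEP_CONST is 100 ms and processing takes 10 ms (t0), the thread only needs to sleep 90 ms (100 - 10), and the next diff is 10 + 90 = 100 ms. In this ideal case diff stays at 100 ms. When the thread is busy and sWorld.Update takes longer than 100 ms, that is, diff > WORLD_SLEEP_CONST + prevSleepTime <==> t0 + d0 > WORLD_SLEEP_CONST + d0 (prevSleepTime is d0) <==> t0 > WORLD_SLEEP_CONST, diff is no longer 100 ms: prevSleepTime is set to 0, no sleep happens, and diff becomes simply the time sWorld.Update took.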
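The balancing rule can be checked with a small simulation. This sketch assumes WORLD_SLEEP_CONST = 100 ms, as in the example above (the real constant's value is not shown here), and the helper names nextSleep and simulateDiffs are hypothetical. As the source comment says, the sleep is computed from the diff measured at the top of the loop, so it compensates for the previous tick's processing time.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Next sleep length, following WorldRunnable::run():
// d1 = W + d0 - D0 while the loop keeps up, otherwise 0 (no sleep).
inline uint32_t nextSleep(uint32_t w, uint32_t diff, uint32_t prevSleep) {
    if (diff <= w + prevSleep)
        return w + prevSleep - diff;
    return 0;
}

// Feeds a sequence of tick durations (ms) through the loop and returns the
// diff that World::Update would receive on each tick. As in the real loop,
// diff is measured at the top of the iteration, so it equals the previous
// tick time plus the previous sleep; the very first diff is 0.
inline std::vector<uint32_t> simulateDiffs(uint32_t w, const std::vector<uint32_t>& tickTimes) {
    std::vector<uint32_t> diffs;
    uint32_t diff = 0;
    uint32_t prevSleep = 0;
    for (uint32_t t : tickTimes) {
        diffs.push_back(diff);                 // sWorld.Update(diff) runs and takes t ms
        prevSleep = nextSleep(w, diff, prevSleep);
        diff = t + prevSleep;                  // elapsed time seen by the next iteration
    }
    return diffs;
}
```

With four 10 ms ticks, simulateDiffs(100, {10, 10, 10, 10}) yields {0, 110, 100, 100}: the first diff is 0 and the first sleep is a full 100 ms, so the second diff is 110 ms; after that diff settles at 100 ms.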

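Why compute diff at all? It supplies the elapsed-time parameter for the timers inside sWorld.Update. If diff were always 100 ms, passing m_worldLoopCounter alone would be enough, but diff is not always 100 ms, so "how much time has passed" must be passed in. Compared with having each timer record its own last timestamp and poll the current time, this is simpler and, most importantly, more accurate. Otherwise, because of call order, the second timer might read a different system time from the first, and time would drift out of sync for players: within the same smallest time slice of one world, why should someone else's clock run faster than mine?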
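A timer driven by the passed-in diff, rather than by reading the clock itself, can be sketched like this. The method names follow the calls visible in World::Update (Update, Passed, Reset, SetInterval, GetCurrent), but the bodies are assumptions, not mangos's actual IntervalTimer. Because every timer is advanced by the same diff, all of them see exactly the same elapsed time within one tick.

```cpp
#include <cassert>
#include <cstdint>

// Sketch of a diff-driven interval timer; method names mirror the calls in
// World::Update, but the implementation here is an assumption.
class IntervalTimer {
public:
    void SetInterval(int64_t interval) { interval_ = interval; }
    // Advance by the elapsed time handed down from WorldRunnable::run().
    void Update(int64_t diff) {
        current_ += diff;
        if (current_ < 0) current_ = 0;
    }
    bool Passed() const { return current_ >= interval_; }
    // Keep any overshoot so a late tick does not push the schedule back.
    void Reset() {
        if (current_ >= interval_) current_ -= interval_;
    }
    int64_t GetCurrent() const { return current_; }
private:
    int64_t interval_ = 0;
    int64_t current_ = 0;
};
```

For example, with a 250 ms interval and diff = 100 ms per tick, the timer fires on the 3rd and 5th ticks, and Reset() carries the 50 ms overshoot from the 3rd tick forward instead of discarding it.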

  4. From here on, the main logic lives in sWorld.Update( diff ), which updates the whole world model.

3. The World Class

/// The World
class World
{
    public:
        ...
        //player Queue
        typedef std::list<WorldSession*> Queue;
        void AddQueuedPlayer(WorldSession*);
        void RemoveQueuedPlayer(WorldSession*);

        void Update(time_t diff);

        void UpdateSessions( time_t diff );

        void ProcessCliCommands();
        void QueueCliCommand(CliCommandHolder* command) { cliCmdQueue.add(command); }

        void UpdateResultQueue();
        void InitResultQueue();
        ...
    protected:
        void _UpdateGameTime();
        void ScriptsProcess();
        // callback for UpdateRealmCharacters
        void _UpdateRealmCharCount(QueryResult *resultCharCount, uint32 accountId);

        void InitDailyQuestResetTime();
        void ResetDailyQuests();
    private:
        ...
        typedef HM_NAMESPACE::hash_map<uint32, WorldSession*> SessionMap;
        SessionMap m_sessions;
        std::set<WorldSession*> m_kicked_sessions;
        uint32 m_maxActiveSessionCount;
        uint32 m_maxQueuedSessionCount;

        std::multimap<time_t, ScriptAction> m_scriptSchedule;

        uint32 m_ShutdownTimer;
        uint32 m_ShutdownMask;
        ...
        // CLI command holder to be thread safe
        ZThread::LockedQueue<CliCommandHolder*, ZThread::FastMutex> cliCmdQueue;
        SqlResultQueue *m_resultQueue;
        //Player Queue
        Queue m_QueuedPlayer;

        //sessions that are added async
        void AddSession_(WorldSession* s);
        ZThread::LockedQueue<WorldSession*, ZThread::FastMutex> addSessQueue;
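        // Note: new sessions are not added to m_sessions directly. They are added asynchronously:
        // first pushed onto this temporary queue, then moved into m_sessions when the session-update timer fires.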
};
/// Update the World !
void World::Update(time_t diff)
{
    ///- Update the different timers
    for(int i = 0; i < WUPDATE_COUNT; i++)
    {
        if(m_timers[i].GetCurrent()>=0)
            m_timers[i].Update(diff);
        else
            m_timers[i].SetCurrent(0);
    }

    ///- Update the game time and check for shutdown time
    _UpdateGameTime();

    /// Handle daily quests reset time
    if(m_gameTime > m_NextDailyQuestReset)
    {
        ResetDailyQuests();
        m_NextDailyQuestReset += DAY;
    }

    /// <ul><li> Handle auctions when the timer has passed
    if (m_timers[WUPDATE_AUCTIONS].Passed())
    {
        ...
    }

    /// <li> Handle session updates when the timer has passed
    if (m_timers[WUPDATE_SESSIONS].Passed())
    {
        m_timers[WUPDATE_SESSIONS].Reset();

        UpdateSessions(diff);
    }

    /// <li> Handle weather updates when the timer has passed
    if (m_timers[WUPDATE_WEATHERS].Passed())
    {
       ...
    }
    /// <li> Update uptime table
    if (m_timers[WUPDATE_UPTIME].Passed())
    {
       ...
    }

    /// <li> Handle all other objects
    if (m_timers[WUPDATE_OBJECTS].Passed())
    {
       ...
    }

    // execute callbacks from sql queries that were queued recently
    UpdateResultQueue();

    ///- Erase corpses once every 20 minutes
    if (m_timers[WUPDATE_CORPSES].Passed())
    {
        m_timers[WUPDATE_CORPSES].Reset();

        CorpsesErase();
    }

    ///- Process Game events when necessary
    if (m_timers[WUPDATE_EVENTS].Passed())
    {
        m_timers[WUPDATE_EVENTS].Reset();                   // to give time for Update() to be processed
        uint32 nextGameEvent = gameeventmgr.Update();
        m_timers[WUPDATE_EVENTS].SetInterval(nextGameEvent);
        m_timers[WUPDATE_EVENTS].Reset();
    }

    /// </ul>
    ///- Move all creatures with "delayed move" and remove and delete all objects with "delayed remove"
    MapManager::Instance().DoDelayedMovesAndRemoves();

    // update the instance reset times
    sInstanceSaveManager.Update();

    // And last, but not least handle the issued cli commands
    ProcessCliCommands();
}
void World::UpdateSessions( time_t diff )
{
    while(!addSessQueue.empty())
    {
      WorldSession* sess = addSessQueue.next ();
      AddSession_ (sess);
    }

    ///- Delete kicked sessions at add new session
    for (std::set<WorldSession*>::iterator itr = m_kicked_sessions.begin(); itr != m_kicked_sessions.end(); ++itr)
        delete *itr;
    m_kicked_sessions.clear();

    ///- Then send an update signal to remaining ones
    for (SessionMap::iterator itr = m_sessions.begin(), next; itr != m_sessions.end(); itr = next)
    {
        next = itr;
        ++next;

        if(!itr->second)
            continue;

        ///- and remove not active sessions from the list
        if(!itr->second->Update(diff))                      // As interval = 0
        {
            delete itr->second;
            m_sessions.erase(itr);
        }
    }
}
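The next = itr; ++next; step in UpdateSessions exists because erasing an element invalidates the iterator that points at it. The source saves the successor first, presumably because the pre-C++11 hash_map erase(iterator) returned nothing. In C++11, unordered_map::erase returns the next iterator, so the same loop can be written as below; FakeSession is a hypothetical stand-in for WorldSession.

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_map>

// Hypothetical stand-in for WorldSession: Update() returns false once the
// session should be dropped (e.g. its socket has closed).
struct FakeSession {
    bool alive;
    bool Update(uint32_t /*diff*/) { return alive; }
};

using SessionMap = std::unordered_map<uint32_t, FakeSession*>;

// Same behavior as World::UpdateSessions: null entries are skipped, inactive
// sessions are deleted and erased without ever touching an invalid iterator.
inline void updateSessions(SessionMap& sessions, uint32_t diff) {
    for (auto itr = sessions.begin(); itr != sessions.end(); ) {
        if (itr->second && !itr->second->Update(diff)) {
            delete itr->second;
            itr = sessions.erase(itr);   // C++11: erase returns the next iterator
        } else {
            ++itr;
        }
    }
}
```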
  1. Sessions are created in WorldSocket but released in World: World::UpdateSessions( time_t diff ) frees any session it finds to be invalid. This is unconventional, but it makes management easier.

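  2. A basic principle of the whole program is single-threaded logic with as few locks as possible. Locks are still indispensable, because not all processing can happen in the TCP reactor's read-callback thread; in the producer-consumer model, taking data out of the shared buffer for processing requires a lock. The number of locks is not really the key issue. First, lock contention must not cause long waits, which means a lock must not be held for long; otherwise throughput suffers directly. Second, contention itself should be rare. Both conditions call for fine-grained locks. Also, more locks do not necessarily mean more contention, just as more threads do not necessarily mean more context-switch overhead, since most of those threads may be sleeping and never need to be switched in. If a lock is held only briefly, even very frequent acquisition may produce no contention: by the time another thread asks for the lock, the holder has usually already released it.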

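Because the TCP read thread and the World heartbeat thread are different threads, passing messages between them necessarily involves locks. The places where these two threads share a lock are:
1) ZThread::LockedQueue (addSessQueue in World and _recvQueue in WorldSession)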
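A minimal LockedQueue in the same spirit can be sketched with std::mutex. This is not ZThread's implementation; it only illustrates the idea from the paragraphs above: each add/next/empty call holds the mutex for a single push or pop, so the critical section is tiny and the reactor thread and the world thread rarely wait on each other. The empty()-then-next() pattern used by UpdateSessions and WorldSession::Update is safe only because there is a single consumer. produceAndDrain is a hypothetical demo helper.

```cpp
#include <cassert>
#include <deque>
#include <mutex>
#include <thread>
#include <utility>

// Sketch of a mutex-protected FIFO in the spirit of ZThread::LockedQueue
// (not its real implementation). Each call locks only around one operation.
template <typename T>
class LockedQueue {
public:
    void add(T item) {
        std::lock_guard<std::mutex> lock(mutex_);
        queue_.push_back(std::move(item));
    }
    // Removes and returns the front item. Callers check empty() first;
    // that is race-free only when there is a single consumer.
    T next() {
        std::lock_guard<std::mutex> lock(mutex_);
        T item = std::move(queue_.front());
        queue_.pop_front();
        return item;
    }
    bool empty() {
        std::lock_guard<std::mutex> lock(mutex_);
        return queue_.empty();
    }
private:
    std::mutex mutex_;
    std::deque<T> queue_;
};

// One producer (think: reactor read thread) and one consumer (think: world
// thread) running at the same time; returns the sum of everything consumed.
inline long produceAndDrain(int n) {
    LockedQueue<int> q;
    std::thread producer([&q, n] {
        for (int i = 1; i <= n; ++i) q.add(i);
    });
    long sum = 0;
    int seen = 0;
    while (seen < n) {
        while (!q.empty()) { sum += q.next(); ++seen; }
        std::this_thread::yield();
    }
    producer.join();
    return sum;
}
```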

4. The WorldSession Class

/*
* A FastMutex is a small fast implementation of a non-recursive, mutually exclusive
* Lockable object. This implementation is a bit faster than the other Mutex classes
* as it involved the least overhead. However, this slight increase in speed is 
* gained by sacrificing the robustness provided by the other classes. 
*
* A FastMutex has the useful property of not being interruptable; that is to say  
* that acquire() and tryAcquire() will not throw Interrupted_Exceptions.
*/

/// Player session in the World
class MANGOS_DLL_SPEC WorldSession
{
    public:
        void QueuePacket(WorldPacket* new_packet);
        bool Update(uint32 diff);

    public:                                                 // opcodes handlers

        void Handle_NULL(WorldPacket& recvPacket);          // not used
        void Handle_EarlyProccess( WorldPacket& recvPacket);// just mark packets processed in WorldSocket::OnRead
        ...// many handler methods removed for readability
        void HandleGuildBankSetTabText(WorldPacket& recv_data);

    private:
        ...
        Player *_player;
        WorldSocket *m_Socket;
        ZThread::LockedQueue<WorldPacket*,ZThread::FastMutex> _recvQueue; // non-recursive FastMutex, chosen for speed
};

/// Update the WorldSession (triggered by World update)
bool WorldSession::Update(uint32 /*diff*/)
{
  if (m_Socket)
    if (m_Socket->IsClosed ())
      { 
        m_Socket->RemoveReference (); // released via reference counting rather than a smart pointer
        m_Socket = NULL;
      }

    WorldPacket *packet;

    ///- Retrieve packets from the receive queue and call the appropriate handlers
    /// \todo Is there a way to consolidate the OpcondeHandlerTable and the g_worldOpcodeNames to only maintain 1 list?
    /// answer : there is a way, but this is better, because it would use redundant RAM
    while (!_recvQueue.empty())
    {
        packet = _recvQueue.next();

        /*#if 1
        sLog.outError( "MOEP: %s (0x%.4X)",
                        LookupOpcodeName(packet->GetOpcode()),
                        packet->GetOpcode());
        #endif*/

        if(packet->GetOpcode() >= NUM_MSG_TYPES)
        {
            sLog.outError( "SESSION: received non-existed opcode %s (0x%.4X)",
                LookupOpcodeName(packet->GetOpcode()),
                packet->GetOpcode());
        }
        else
        {
            OpcodeHandler& opHandle = opcodeTable[packet->GetOpcode()]; // unconventional, but fast: direct array index
            switch (opHandle.status)
            {
                case STATUS_LOGGEDIN:
                    if(!_player)
                    {
                        // skip STATUS_LOGGEDIN opcode unexpected errors if player logout sometime ago - this can be network lag delayed packets
                        if(!m_playerRecentlyLogout)
                            logUnexpectedOpcode(packet, "the player has not logged in yet");
                    }
                    else if(_player->IsInWorld())
                        (this->*opHandle.handler)(*packet);
                    // lag can cause STATUS_LOGGEDIN opcodes to arrive after the player started a transfer
                    break;
                case STATUS_TRANSFER_PENDING:
                    if(!_player)
                        logUnexpectedOpcode(packet, "the player has not logged in yet");
                    else if(_player->IsInWorld())
                        logUnexpectedOpcode(packet, "the player is still in world");
                    else
                        (this->*opHandle.handler)(*packet);
                    break;
                case STATUS_AUTHED:
                    m_playerRecentlyLogout = false;
                    (this->*opHandle.handler)(*packet);
                    break;
                case STATUS_NEVER:
                    sLog.outError( "SESSION: received not allowed opcode %s (0x%.4X)",
                        LookupOpcodeName(packet->GetOpcode()),
                        packet->GetOpcode());
                    break;
            }
        }

        delete packet;
    }

    ///- If necessary, log the player out
    time_t currTime = time(NULL);
    if (!m_Socket || (ShouldLogOut(currTime) && !m_playerLoading))
        LogoutPlayer(true);

    if (!m_Socket)
        return false;                                       //Will remove this session from the world session map

    return true;
}
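  1. WorldSession::Update is the system's hot path, the place that limits performance; every statement in it affects throughput. To speed it up, the code takes a somewhat unorthodox approach, for example: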
    OpcodeHandler& opHandle = opcodeTable[packet->GetOpcode()];
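    1) opcodeTable is an array of structs rather than a map; indexing an array is certainly faster than a map lookup.
    2) The struct is defined as follows: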
    struct OpcodeHandler
    {
    char const* name;
    SessionStatus status;
    void (WorldSession::*handler)(WorldPacket& recvPacket);
    };
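    It holds a descriptive name, the session status required to process the opcode, and a handler function pointer. The truly unconventional part is taking a pointer to a member function of the class directly and calling the method through it. Someone used to boost would usually reach for bind here, but that incurs object-lifetime overhead, especially with boost::bind (in my observation, six or seven constructor and destructor calls between binding and release). The approach above finds the matching handler quickly and calls it directly. With this many handlers, many programmers would wrap each one in a class and create an object per opcode, which likewise carries object-lifetime overhead.
    This approach encapsulates the complex, frequently changing part without hurting performance. (A large switch-case would probably be about as efficient, but it is less readable and would need frequent modification.)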

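Be careful with this technique. A pointer to a member function must always be invoked on a valid object; here it is, because the session invokes its own handlers through this. Calling a non-static member function through a null or dangling object pointer is undefined behavior in C++, even if it appears to work when the function touches no data members. Also note that a static member function cannot access non-static data members (whereas a non-static member function can access static data members).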
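The opcodeTable technique can be reproduced in a few lines of standard C++. This sketch assumes a hypothetical three-opcode protocol and simplified Session and Packet types; only the shape (an array of {name, status, pointer-to-member} indexed by opcode, a status check, then (this->*handler)(packet)) follows WorldSession::Update.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical, cut-down versions of the mangos types.
enum SessionStatus { STATUS_AUTHED, STATUS_LOGGEDIN, STATUS_NEVER };
enum : uint16_t { MSG_PING = 0, MSG_MOVE = 1, MSG_UNUSED = 2, NUM_MSG_TYPES = 3 };

struct Packet { uint16_t opcode; };

class Session;

struct OpcodeHandler {
    const char* name;
    SessionStatus status;                  // state the session must be in
    void (Session::*handler)(Packet&);     // pointer to a member function
};

class Session {
public:
    bool loggedIn = false;
    std::string log;                       // records what happened, for the demo
    void HandlePing(Packet&) { log += "ping;"; }
    void HandleMove(Packet&) { log += "move;"; }
    void HandleNull(Packet&) {}
    void Dispatch(Packet& p);
};

// Plain array indexed by opcode: one bounds check and one index, no hashing.
static const OpcodeHandler opcodeTable[NUM_MSG_TYPES] = {
    { "MSG_PING",   STATUS_AUTHED,   &Session::HandlePing },
    { "MSG_MOVE",   STATUS_LOGGEDIN, &Session::HandleMove },
    { "MSG_UNUSED", STATUS_NEVER,    &Session::HandleNull },
};

void Session::Dispatch(Packet& p) {
    if (p.opcode >= NUM_MSG_TYPES) { log += "bad;"; return; }
    const OpcodeHandler& h = opcodeTable[p.opcode];
    switch (h.status) {
        case STATUS_LOGGEDIN:
            if (!loggedIn) { log += "early;"; break; }
            (this->*h.handler)(p);         // always invoked on a live object: this
            break;
        case STATUS_AUTHED:
            (this->*h.handler)(p);
            break;
        case STATUS_NEVER:
            log += "never;";
            break;
    }
}
```

Lookup is a bounds check plus one array index, and the call through the member pointer needs no bound functor object, which is the overhead the author wanted to avoid.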

  2. The messages in _recvQueue are packets assembled by m_Socket and pushed onto the queue.

5. The WorldSocket Class

/**
 * WorldSocket.
 * 
 * This class is responsible for the communication with 
 * remote clients.
 * Most methods return -1 on failure. 
 * The class uses reference counting.
 *
 * For output the class uses one buffer (64K usually) and 
 * a queue where it stores packets if there is no place in 
 * the buffer. The reason this is done, is because the server 
 * does really a lot of small-size writes to it, and it doesn't 
 * scale well to allocate memory for every. When something is 
 * written to the output buffer the socket is not immediately 
 * activated for output (again for the same reason), there 
 * is 10ms ceiling (that's why there is Update() method). 
 * This concept is similar to TCP_CORK, but TCP_CORK 
 * uses 200ms ceiling. As result overhead generated by 
 * sending packets from "producer" threads is minimal, 
 * and doing a lot of writes with small size is tolerated.
 * 
 * The calls to Update () method are managed by WorldSocketMgr
 * and ReactorRunnable.
 * 
 * For input, the class uses one 1024 bytes buffer on stack 
 * to which it does recv() calls. And then received data is 
 * distributed where it's needed. 1024 matches pretty well the 
 * traffic generated by client for now.
 *  
 * The input/output do speculative reads/writes (AKA it tries 
 * to read all data available in the kernel buffer or tries to 
 * write everything available in userspace buffer), 
 * which is ok for using with Level and Edge Triggered IO 
 * notification.
 * 
 */
class WorldSocket : protected WorldHandler
{
public:
  /// Add refference to this object.
  long AddReference (void);

  /// Remove refference to this object.
  long RemoveReference (void);

  int ProcessIncoming (WorldPacket* new_pct);

};
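WorldSocket's AddReference/RemoveReference pair (and the m_Socket->RemoveReference() call in WorldSession::Update) is intrusive reference counting. A generic sketch with std::atomic follows. It assumes that the creator holds the first reference and that the object deletes itself when the count reaches zero; that is the usual convention, but not necessarily how the real WorldSocket, which is built on ACE, implements it. RefCounted is a hypothetical name.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical intrusive reference-counted base. The object frees itself
// when the last owner calls RemoveReference().
class RefCounted {
public:
    long AddReference() { return ++refs_; }
    long RemoveReference() {
        long left = --refs_;
        if (left == 0)
            delete this;                   // last owner gone: free the object
        return left;                       // do not touch members after delete
    }
protected:
    virtual ~RefCounted() = default;       // only RemoveReference may delete
private:
    std::atomic<long> refs_{1};            // creator holds the first reference
};
```

Both the reactor thread and the session can hold a reference; whichever releases last frees the socket, which is what lets WorldSession drop m_Socket without coordinating with the network thread.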

// key function
int WorldSocket::ProcessIncoming (WorldPacket* new_pct)
{
    ACE_ASSERT (new_pct);

    // manage memory ;)
    ACE_Auto_Ptr<WorldPacket> aptr (new_pct);

    const ACE_UINT16 opcode = new_pct->GetOpcode ();

    if (this->closing_)
        return -1;

    // dump recieved packet
    if (sWorldLog.LogWorld ())
    {
        sWorldLog.Log ("CLIENT:\nSOCKET: %u\nLENGTH: %u\nOPCODE: %s (0x%.4X)\nDATA:\n",
                     (uint32) get_handle (),
                     new_pct->size (),
                     LookupOpcodeName (new_pct->GetOpcode ()),
                     new_pct->GetOpcode ());

        uint32 p = 0;
        while (p < new_pct->size ())
        {
            for (uint32 j = 0; j < 16 && p < new_pct->size (); j++)
                sWorldLog.Log ("%.2X ", (*new_pct)[p++]);
            sWorldLog.Log ("\n");
        }
        sWorldLog.Log ("\n\n");
    }

    // like one switch ;)
    if (opcode == CMSG_PING)
    {
        return HandlePing (*new_pct);
    }
    else if (opcode == CMSG_AUTH_SESSION)
    {
        if (m_Session)
        {
            sLog.outError ("WorldSocket::ProcessIncoming: Player send CMSG_AUTH_SESSION again");
            return -1;
        }

        return HandleAuthSession (*new_pct);
    }
    else if (opcode == CMSG_KEEP_ALIVE)
    {
        DEBUG_LOG ("CMSG_KEEP_ALIVE ,size: %d", new_pct->size ());

        return 0;
    }
    else
    {
        ACE_GUARD_RETURN (LockType, Guard, m_SessionLock, -1);

        if (m_Session != NULL)
        {
            // OK ,give the packet to WorldSession
            aptr.release ();
            // WARNINIG here we call it with locks held.
            // Its possible to cause deadlock if QueuePacket calls back
            m_Session->QueuePacket (new_pct);
            // here the packet is handed to the WorldSession, which queues it on _recvQueue
            return 0;
        }
        else
        {
            sLog.outError ("WorldSocket::ProcessIncoming: Client not authed opcode = %u", uint32(opcode));
            return -1;
        }
    }

    ACE_NOTREACHED (return 0);
}
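  1. After this class assembles a packet, it calls ProcessIncoming. At that point the code is still running in the single read-event thread of ACE_Reactor, so if it blocks here, all TCP reads block. Yet one piece of business logic, HandleAuthSession, is handled right here, and this login handler queries the database among other things, which takes some time. That is acceptable as long as logins are not too frequent. One can also view this operation as "critically important": without a logged-in user, nothing that follows means anything. So messages that are infrequent yet important are processed directly in the TCP thread.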

6. Summary

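  1. For performance, the code is not especially "object-oriented": there is plenty of switch-case, and even a less common technique such as calling class methods through member function pointers. There are few new/delete operations in the code. There are no shared_ptr smart pointers; auto_ptr is used in a few places.
  2. For a game server, supporting 5000 connections on one machine is already good; the key question is whether all requests can be handled within one processing cycle, whose maximum length is 100 ms.
  3. I/O runs on its own threads while the rest of the logic is single-threaded, which reduces thread-switching and lock-contention overhead and makes full use of the CPU. Few places involve I/O: network receive, which ACE's Reactor already handles, and database reads and writes, which need a thread of their own. Most of the remaining work is pure logic.
  4. For database operations, complex ones are handed to a separate thread, while simple ones are handled directly in the polling thread. For work handed to the separate thread, the caller updates memory, queues the operation for the database thread, and returns success right away. I have not seen any mechanism for handling failures; apparently MySQL is trusted to be reliable.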
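The "queue the write and report success immediately" approach from point 4 can be sketched as a worker thread fed by a queue. DelayedSqlWorker is a hypothetical name loosely modeled on SqlDelayThread's role, the SQL strings in the usage are made up, and the sink vector stands in for actually running SQL against a database. Like the code the author describes, it gives the caller no way to learn that a statement failed.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <queue>
#include <string>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical sketch: the logic thread enqueues SQL text and returns at once;
// a dedicated worker thread "executes" it later. Appending to sink_ stands in
// for a real database call. No result or error is ever reported back.
class DelayedSqlWorker {
public:
    explicit DelayedSqlWorker(std::vector<std::string>& sink)
        : sink_(sink), worker_([this] { loop(); }) {}

    // Lets the worker drain the remaining statements, then joins it.
    ~DelayedSqlWorker() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            stop_ = true;
        }
        cv_.notify_one();
        worker_.join();
    }

    // Called from the logic thread: a short locked push, never a DB round trip.
    void Execute(std::string sql) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            pending_.push(std::move(sql));
        }
        cv_.notify_one();
    }

private:
    void loop() {
        std::unique_lock<std::mutex> lock(mutex_);
        for (;;) {
            cv_.wait(lock, [this] { return stop_ || !pending_.empty(); });
            if (pending_.empty())
                return;                      // stop requested and queue drained
            std::string sql = std::move(pending_.front());
            pending_.pop();
            lock.unlock();                   // run the "query" without holding the lock
            sink_.push_back(std::move(sql)); // only this thread touches sink_ until join
            lock.lock();
        }
    }

    std::vector<std::string>& sink_;
    std::mutex mutex_;
    std::condition_variable cv_;
    std::queue<std::string> pending_;
    bool stop_ = false;
    std::thread worker_;  // declared last so it starts after the members it uses
};
```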