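PG Database Kernel Analysis Study Notes: MultiXact Log Manager
The MultiXact log is the log PG uses to record combined transaction IDs. Because PG uses multi-version concurrency control, several transactions may be associated with the same tuple at the same time. To handle them uniformly when locking (row-level shared locks), PG combines the transaction IDs associated with the tuple and represents them with a single MultiXactID, which has to fit in the tuple's t_xmax field. Like the CLOG and SubTrans logs, the MultiXact log is implemented on top of SLRU buffers.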
typedef uint32 TransactionId;
/* MultiXactId must be equivalent to TransactionId, to fit in t_xmax */
typedef TransactionId MultiXactId;
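1 Data structures of the MultiXact log manager
MultiXact is a many-to-one mapping: a run of entries in the transaction ID array maps to one MultiXactID (see figure). The mapping therefore has to store two pieces of information: the offset at which the run starts (Offset) and the length of the run (NMembers).
Figure: How transaction IDs are combined into a MultiXactID
Because the allocation of MultiXactIDs has to be tracked, PG defines the data structure MultiXactStateData (shown below).
Data structure: MultiXactStateData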
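The Offset/NMembers idea can be pictured with a toy in-memory model. This is a minimal sketch and not PostgreSQL's implementation: the arrays `toy_members` and `toy_entries`, the struct `ToyMultiEntry`, and the helper `toy_get_members` are hypothetical names, and the sample transaction IDs are made up.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t TransactionId;
typedef TransactionId MultiXactId;
typedef uint32_t MultiXactOffset;

/* Hypothetical toy entry: where a MultiXactId's members start and how many there are. */
typedef struct ToyMultiEntry
{
    MultiXactOffset offset;     /* index of the first member in toy_members */
    int             nmembers;   /* number of consecutive members */
} ToyMultiEntry;

/* Flat array of member transaction IDs (made-up sample data). */
static const TransactionId toy_members[] = {100, 101, 102, 200, 201};

/* toy_entries[m] describes MultiXactId m; slot 0 is left unused in this toy model. */
static const ToyMultiEntry toy_entries[] = {
    {0, 0},
    {0, 3},     /* MultiXactId 1 -> XIDs 100, 101, 102 */
    {3, 2},     /* MultiXactId 2 -> XIDs 200, 201 */
};

/* Return a pointer to the first member of "multi" and store the count in *nmembers. */
static const TransactionId *
toy_get_members(MultiXactId multi, int *nmembers)
{
    assert(multi >= 1 && multi < sizeof(toy_entries) / sizeof(toy_entries[0]));
    *nmembers = toy_entries[multi].nmembers;
    return &toy_members[toy_entries[multi].offset];
}
```

Calling `toy_get_members(2, &n)` yields a pointer to the run starting at index 3 and sets `n` to 2, which is the one-ID-to-many-XIDs lookup the figure describes.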
// multixact.c
/*
 * MultiXact state shared across all backends.  All this state is protected
 * by MultiXactGenLock.  (We also use MultiXactOffsetControlLock and
 * MultiXactMemberControlLock to guard accesses to the two sets of SLRU
 * buffers.  For concurrency's sake, we avoid holding more than one of these
 * locks at a time.)
 */
typedef struct MultiXactStateData
{
    /* next MultiXactId to be assigned */
    MultiXactId nextMXact;
    /* starting offset for the next MultiXactId */
    MultiXactOffset nextOffset;
    /* Have we completed multixact startup? */
    bool        finishedStartup;
    /*
     * Oldest multixact that is still potentially referenced by a relation.
     * Anything older than this should not be consulted.  These values are
     * updated by vacuum.
     */
    MultiXactId oldestMultiXactId;
    Oid         oldestMultiXactDB;
    /*
     * Oldest multixact offset that is potentially referenced by a multixact
     * referenced by a relation.  We don't always know this value, so there's
     * a flag here to indicate whether or not we currently do.
     */
    MultiXactOffset oldestOffset;
    bool        oldestOffsetKnown;
    /* support for anti-wraparound measures */
    MultiXactId multiVacLimit;
    MultiXactId multiWarnLimit;
    MultiXactId multiStopLimit;
    MultiXactId multiWrapLimit;
    /* support for members anti-wraparound measures */
    MultiXactOffset offsetStopLimit;    /* known if oldestOffsetKnown */
    /*
     * Per-backend data starts here.  We have two arrays stored in the area
     * immediately following the MultiXactStateData struct. Each is indexed by
     * BackendId.
     *
     * In both arrays, there's a slot for all normal backends (1..MaxBackends)
     * followed by a slot for max_prepared_xacts prepared transactions.  Valid
     * BackendIds start from 1; element zero of each array is never used.
     *
     * OldestMemberMXactId[k] is the oldest MultiXactId each backend's current
     * transaction(s) could possibly be a member of, or InvalidMultiXactId
     * when the backend has no live transaction that could possibly be a
     * member of a MultiXact.  Each backend sets its entry to the current
     * nextMXact counter just before first acquiring a shared lock in a given
     * transaction, and clears it at transaction end. (This works because only
     * during or after acquiring a shared lock could an XID possibly become a
     * member of a MultiXact, and that MultiXact would have to be created
     * during or after the lock acquisition.)
     *
     * OldestVisibleMXactId[k] is the oldest MultiXactId each backend's
     * current transaction(s) think is potentially live, or InvalidMultiXactId
     * when not in a transaction or not in a transaction that's paid any
     * attention to MultiXacts yet.  This is computed when first needed in a
     * given transaction, and cleared at transaction end.  We can compute it
     * as the minimum of the valid OldestMemberMXactId[] entries at the time
     * we compute it (using nextMXact if none are valid).  Each backend is
     * required not to attempt to access any SLRU data for MultiXactIds older
     * than its own OldestVisibleMXactId[] setting; this is necessary because
     * the checkpointer could truncate away such data at any instant.
     *
     * The oldest valid value among all of the OldestMemberMXactId[] and
     * OldestVisibleMXactId[] entries is considered by vacuum as the earliest
     * possible value still having any live member transaction.  Subtracting
     * vacuum_multixact_freeze_min_age from that value we obtain the freezing
     * point for multixacts for that table.  Any value older than that is
     * removed from tuple headers (or "frozen"; see FreezeMultiXactId.  Note
     * that multis that have member xids that are older than the cutoff point
     * for xids must also be frozen, even if the multis themselves are newer
     * than the multixid cutoff point).  Whenever a full table vacuum happens,
     * the freezing point so computed is used as the new pg_class.relminmxid
     * value.  The minimum of all those values in a database is stored as
     * pg_database.datminmxid.  In turn, the minimum of all of those values is
     * stored in pg_control and used as truncation point for pg_multixact.  At
     * checkpoint or restartpoint, unneeded segments are removed.
     */
    MultiXactId perBackendXactIds[FLEXIBLE_ARRAY_MEMBER];
} MultiXactStateData;
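The struct comment states that OldestVisibleMXactId can be computed as the minimum of the valid OldestMemberMXactId[] entries, falling back to nextMXact when none is valid. The sketch below illustrates that rule on a plain array. It is an illustration under assumptions, not the code in multixact.c: `toy_multi_precedes` and `toy_oldest_visible` are hypothetical names, locking is omitted, InvalidMultiXactId is assumed to be 0, and the comparison is done modulo 2^32 so that it still gives the intended order after the counter wraps around.

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t MultiXactId;

/* Assumption: the invalid MultiXactId is 0. */
#define InvalidMultiXactId ((MultiXactId) 0)

/* Wraparound-aware "a is older than b": compare the difference as a signed value. */
static bool
toy_multi_precedes(MultiXactId a, MultiXactId b)
{
    int32_t diff = (int32_t) (a - b);

    return diff < 0;
}

/*
 * Hypothetical helper: scan slots 1..nslots-1 (slot 0 is never used, as the
 * struct comment says) and return the oldest valid entry, or next_mxact if
 * no slot holds a valid value.
 */
static MultiXactId
toy_oldest_visible(const MultiXactId *oldest_member, int nslots,
                   MultiXactId next_mxact)
{
    MultiXactId oldest = next_mxact;

    for (int i = 1; i < nslots; i++)
    {
        MultiXactId m = oldest_member[i];

        if (m != InvalidMultiXactId && toy_multi_precedes(m, oldest))
            oldest = m;
    }
    return oldest;
}
```

With slots `{0, 0, 50, 40, 0}` and a next value of 60 the helper returns 40; with every slot invalid it returns 60.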
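MultiXactStateData manages and maintains MultiXactIDs. PG also needs a uniform way to hold the many-to-one mapping itself while working with a MultiXactID; for this it defines mXactCacheEnt, an entry in a backend-local cache.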
/*
 * Definitions for the backend-local MultiXactId cache.
 *
 * We use this cache to store known MultiXacts, so we don't need to go to
 * SLRU areas every time.
 *
 * The cache lasts for the duration of a single transaction, the rationale
 * for this being that most entries will contain our own TransactionId and
 * so they will be uninteresting by the time our next transaction starts.
 * (XXX not clear that this is correct --- other members of the MultiXact
 * could hang around longer than we did.  However, it's not clear what a
 * better policy for flushing old cache entries would be.)  FIXME actually
 * this is plain wrong now that multixact's may contain update Xids.
 *
 * We allocate the cache entries in a memory context that is deleted at
 * transaction end, so we don't need to do retail freeing of entries.
 */
typedef struct mXactCacheEnt
{
    MultiXactId multi;
    int         nmembers;
    dlist_node  node;
    MultiXactMember members[FLEXIBLE_ARRAY_MEMBER];
} mXactCacheEnt;
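To make the role of such a cache entry concrete, here is a minimal sketch of a lookup-before-SLRU cache. It is deliberately simplified and is not PostgreSQL's code: a singly linked list stands in for dlist_node, members are bare transaction IDs instead of MultiXactMember, plain malloc/free replaces the per-transaction memory context, and all `toy_` names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

typedef uint32_t TransactionId;
typedef TransactionId MultiXactId;

/* Hypothetical simplified cache entry with a C99 flexible array member. */
typedef struct ToyCacheEnt
{
    MultiXactId         multi;
    int                 nmembers;
    struct ToyCacheEnt *next;
    TransactionId       members[];
} ToyCacheEnt;

static ToyCacheEnt *toy_cache_head = NULL;

/* Copy the members into a new entry at the list head; 0 on success, -1 if out of memory. */
static int
toy_cache_put(MultiXactId multi, int nmembers, const TransactionId *members)
{
    size_t       bytes = (size_t) nmembers * sizeof(TransactionId);
    ToyCacheEnt *ent = malloc(sizeof(ToyCacheEnt) + bytes);

    if (ent == NULL)
        return -1;
    ent->multi = multi;
    ent->nmembers = nmembers;
    if (nmembers > 0)
        memcpy(ent->members, members, bytes);
    ent->next = toy_cache_head;
    toy_cache_head = ent;
    return 0;
}

/* Return the cached entry for "multi", or NULL on a miss (the caller would then read the SLRU). */
static const ToyCacheEnt *
toy_cache_get(MultiXactId multi)
{
    for (const ToyCacheEnt *e = toy_cache_head; e != NULL; e = e->next)
    {
        if (e->multi == multi)
            return e;
    }
    return NULL;
}

/* Drop every entry, mirroring the "cache lasts for a single transaction" policy. */
static void
toy_cache_reset(void)
{
    while (toy_cache_head != NULL)
    {
        ToyCacheEnt *next = toy_cache_head->next;

        free(toy_cache_head);
        toy_cache_head = next;
    }
}
```

The explicit `toy_cache_reset` is only needed because this sketch uses malloc; the comment above explains that PG avoids retail freeing by allocating the entries in a memory context that is deleted at transaction end.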
The MultiXact log is stored under PGDATA/pg_multixact/, which contains two subdirectories, members and offsets, holding the member transaction IDs and the offsets respectively.
[postgres@localhost ~/db1_normal/pg_multixact]$ ls
members offsets
[postgres@localhost ~/db1_normal/pg_multixact]$
A MultiXactID is mapped to its physical storage location with the following macros:
/* We need four bytes per offset */
#define MULTIXACT_OFFSETS_PER_PAGE (BLCKSZ / sizeof(MultiXactOffset))
#define MultiXactIdToOffsetPage(xid) \
    ((xid) / (MultiXactOffset) MULTIXACT_OFFSETS_PER_PAGE)
#define MultiXactIdToOffsetEntry(xid) \
    ((xid) % (MultiXactOffset) MULTIXACT_OFFSETS_PER_PAGE)
#define MultiXactIdToOffsetSegment(xid) (MultiXactIdToOffsetPage(xid) / SLRU_PAGES_PER_SEGMENT)
/* page in which a member is to be found */
#define MXOffsetToMemberPage(xid) ((xid) / (TransactionId) MULTIXACT_MEMBERS_PER_PAGE)
#define MXOffsetToMemberSegment(xid) (MXOffsetToMemberPage(xid) / SLRU_PAGES_PER_SEGMENT)
/* Location (byte offset within page) of flag word for a given member */
#define MXOffsetToFlagsOffset(xid) \
    ((((xid) / (TransactionId) MULTIXACT_MEMBERS_PER_MEMBERGROUP) % \
      (TransactionId) MULTIXACT_MEMBERGROUPS_PER_PAGE) * \
     (TransactionId) MULTIXACT_MEMBERGROUP_SIZE)
#define MXOffsetToFlagsBitShift(xid) \
    (((xid) % (TransactionId) MULTIXACT_MEMBERS_PER_MEMBERGROUP) * \
     MXACT_MEMBER_BITS_PER_XACT)
/* Location (byte offset within page) of TransactionId of given member */
#define MXOffsetToMemberOffset(xid) \
    (MXOffsetToFlagsOffset(xid) + MULTIXACT_FLAGBYTES_PER_GROUP + \
     ((xid) % MULTIXACT_MEMBERS_PER_MEMBERGROUP) * sizeof(TransactionId))
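A worked example makes the arithmetic easier to follow. The sketch below repeats the macros above and supplies the constants they depend on, which are not shown in these notes. Those values are assumptions: an 8192-byte block, 32 SLRU pages per segment, and member groups made of 4 flag bytes followed by 4 transaction IDs (8 flag bits per member). The struct `ToyMemberLocation` and the helper `toy_locate_member` are hypothetical names used only for illustration.

```c
#include <stddef.h>
#include <stdint.h>

typedef uint32_t TransactionId;
typedef TransactionId MultiXactId;
typedef uint32_t MultiXactOffset;

/* Assumed constants; they are not part of the excerpt above. */
#define BLCKSZ 8192
#define SLRU_PAGES_PER_SEGMENT 32
#define MXACT_MEMBER_BITS_PER_XACT 8
#define MXACT_MEMBER_FLAGS_PER_BYTE 1
#define MULTIXACT_FLAGBYTES_PER_GROUP 4
#define MULTIXACT_MEMBERS_PER_MEMBERGROUP \
    (MULTIXACT_FLAGBYTES_PER_GROUP * MXACT_MEMBER_FLAGS_PER_BYTE)
#define MULTIXACT_MEMBERGROUP_SIZE \
    (sizeof(TransactionId) * MULTIXACT_MEMBERS_PER_MEMBERGROUP + \
     MULTIXACT_FLAGBYTES_PER_GROUP)
#define MULTIXACT_MEMBERGROUPS_PER_PAGE (BLCKSZ / MULTIXACT_MEMBERGROUP_SIZE)
#define MULTIXACT_MEMBERS_PER_PAGE \
    (MULTIXACT_MEMBERGROUPS_PER_PAGE * MULTIXACT_MEMBERS_PER_MEMBERGROUP)

/* Macros as quoted in the notes. */
#define MULTIXACT_OFFSETS_PER_PAGE (BLCKSZ / sizeof(MultiXactOffset))
#define MultiXactIdToOffsetPage(xid) \
    ((xid) / (MultiXactOffset) MULTIXACT_OFFSETS_PER_PAGE)
#define MultiXactIdToOffsetEntry(xid) \
    ((xid) % (MultiXactOffset) MULTIXACT_OFFSETS_PER_PAGE)
#define MultiXactIdToOffsetSegment(xid) \
    (MultiXactIdToOffsetPage(xid) / SLRU_PAGES_PER_SEGMENT)
#define MXOffsetToMemberPage(xid) \
    ((xid) / (TransactionId) MULTIXACT_MEMBERS_PER_PAGE)
#define MXOffsetToMemberSegment(xid) \
    (MXOffsetToMemberPage(xid) / SLRU_PAGES_PER_SEGMENT)
#define MXOffsetToFlagsOffset(xid) \
    ((((xid) / (TransactionId) MULTIXACT_MEMBERS_PER_MEMBERGROUP) % \
      (TransactionId) MULTIXACT_MEMBERGROUPS_PER_PAGE) * \
     (TransactionId) MULTIXACT_MEMBERGROUP_SIZE)
#define MXOffsetToFlagsBitShift(xid) \
    (((xid) % (TransactionId) MULTIXACT_MEMBERS_PER_MEMBERGROUP) * \
     MXACT_MEMBER_BITS_PER_XACT)
#define MXOffsetToMemberOffset(xid) \
    (MXOffsetToFlagsOffset(xid) + MULTIXACT_FLAGBYTES_PER_GROUP + \
     ((xid) % MULTIXACT_MEMBERS_PER_MEMBERGROUP) * sizeof(TransactionId))

/* Hypothetical helper type: everything needed to find one member on disk. */
typedef struct ToyMemberLocation
{
    int    segment;         /* segment file number under members/ */
    int    page;            /* SLRU page number */
    size_t flags_offset;    /* byte offset of the group's flag word in the page */
    int    flags_shift;     /* bit shift of this member's flags in that word */
    size_t xid_offset;      /* byte offset of the member's TransactionId */
} ToyMemberLocation;

static ToyMemberLocation
toy_locate_member(MultiXactOffset offset)
{
    ToyMemberLocation loc;

    loc.page = (int) MXOffsetToMemberPage(offset);
    loc.segment = (int) MXOffsetToMemberSegment(offset);
    loc.flags_offset = MXOffsetToFlagsOffset(offset);
    loc.flags_shift = (int) MXOffsetToFlagsBitShift(offset);
    loc.xid_offset = MXOffsetToMemberOffset(offset);
    return loc;
}
```

Under these assumed constants an offsets page holds 2048 entries and a members page holds 409 groups of 4 members (1636 members, 20 bytes per group). MultiXactId 5000 therefore falls on offsets page 2, entry 904, segment 0. Member offset 10001 falls on members page 6; it is member 1 of group 46 on that page, so its flag word starts at byte 920, its flag bits are shifted by 8, and its TransactionId sits at byte 928.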
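The MultiXact log is buffered by two SLRU buffers, MultiXactOffsetCtl and MultiXactMemberCtl, which hold the offsets and the members respectively. They are registered in shared memory when the postmaster starts and manage MultiXactIDs globally.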
// multixact.c
/*
 * Links to shared-memory data structures for MultiXact control
 */
static SlruCtlData MultiXactOffsetCtlData;
static SlruCtlData MultiXactMemberCtlData;
#define MultiXactOffsetCtl (&MultiXactOffsetCtlData)
#define MultiXactMemberCtl (&MultiXactMemberCtlData)
// slru.h
/*
 * SlruCtlData is an unshared structure that points to the active information
 * in shared memory.
 */
typedef struct SlruCtlData
{
    SlruShared  shared;
    /*
     * This flag tells whether to fsync writes (true for pg_xact and multixact
     * stuff, false for pg_subtrans and pg_notify).
     */
    bool        do_fsync;
    /*
     * Decide which of two page numbers is "older" for truncation purposes. We
     * need to use comparison of TransactionIds here in order to do the right
     * thing with wraparound XID arithmetic.
     */
    bool        (*PagePrecedes) (int, int);
    /*
     * Dir is set during SimpleLruInit and does not change thereafter.  Since
     * it's always the same, it doesn't need to be in shared memory.
     */
    char        Dir[64];
} SlruCtlData;
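The PagePrecedes comment says that page age must be decided with wraparound-aware arithmetic. The sketch below shows one way such a callback can work for offsets pages. It is a simplified illustration, not the function PostgreSQL installs: `toy_multi_precedes`, `toy_offset_page_precedes`, and `TOY_OFFSETS_PER_PAGE` are hypothetical names, the page size of 2048 entries assumes an 8192-byte block with 4-byte offsets, and the real code in multixact.c handles details this sketch leaves out.

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t MultiXactId;

/* Assumption: 8192-byte page / 4-byte offset = 2048 entries per page. */
#define TOY_OFFSETS_PER_PAGE 2048

/*
 * Wraparound-aware "a is older than b".  The unsigned difference is
 * reinterpreted as signed, so IDs just below 2^32 count as older than
 * small IDs allocated after the counter wrapped.
 */
static bool
toy_multi_precedes(MultiXactId a, MultiXactId b)
{
    int32_t diff = (int32_t) (a - b);

    return diff < 0;
}

/* Compare two pages by the first MultiXactId stored on each of them. */
static bool
toy_offset_page_precedes(int page1, int page2)
{
    MultiXactId first1 = (MultiXactId) page1 * TOY_OFFSETS_PER_PAGE;
    MultiXactId first2 = (MultiXactId) page2 * TOY_OFFSETS_PER_PAGE;

    return toy_multi_precedes(first1, first2);
}

/* Same function-pointer shape as the PagePrecedes member of SlruCtlData. */
static bool (*toy_page_precedes) (int, int) = toy_offset_page_precedes;
```

A plain `page1 < page2` test would treat the highest page (2097151 with these constants) as newer than page 0, whereas the modular comparison treats it as older, which is the order truncation needs once the ID space has wrapped.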
References
PG數據庫內核分析 (PG Database Kernel Analysis), Section 7.11.4, MultiXact log manager