The SGI STL Memory Allocator

Preface

template <class T, class Alloc = alloc>
class vector {
typedef simple_alloc<value_type, Alloc> data_allocator;
//All memory allocation and deallocation in a vector object is delegated to data_allocator; data_allocator owns the memory
…
}


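data_allocator, the component that manages memory for vector, is made up of four functions. All four simply forward their calls, so it is really Alloc that does the work. Alloc is the STL memory allocator: every allocation and deallocation in the STL goes through it. Understanding where Alloc comes from and how it works is therefore essential to studying the STL source.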
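The forwarding described above can be sketched as follows. This is a minimal illustration, not the SGI source: `simple_alloc_sketch`, `byte_alloc`, and `sum_first_n` are hypothetical names, and `byte_alloc` is an assumed stand-in for Alloc built on malloc/free. The adapter only converts element counts into byte counts and passes the request on.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

// Hypothetical stand-in for Alloc: a byte-oriented allocator over malloc/free.
struct byte_alloc {
    static void* allocate(std::size_t bytes) { return std::malloc(bytes); }
    static void deallocate(void* p, std::size_t /* bytes */) { std::free(p); }
};

// Adapter in the spirit of simple_alloc: four static functions that turn
// element counts into byte counts and forward to Alloc.
template <class T, class Alloc>
struct simple_alloc_sketch {
    static T* allocate(std::size_t n) {
        return n == 0 ? nullptr : static_cast<T*>(Alloc::allocate(n * sizeof(T)));
    }
    static T* allocate() {
        return static_cast<T*>(Alloc::allocate(sizeof(T)));
    }
    static void deallocate(T* p, std::size_t n) {
        if (n != 0) Alloc::deallocate(p, n * sizeof(T));
    }
    static void deallocate(T* p) {
        Alloc::deallocate(p, sizeof(T));
    }
};

// Allocates room for n ints through the adapter, fills them with 1..n,
// and returns their sum.
inline int sum_first_n(int n) {
    typedef simple_alloc_sketch<int, byte_alloc> data_allocator;
    assert(n >= 0);
    int* p = data_allocator::allocate(static_cast<std::size_t>(n));
    assert(p != nullptr || n == 0);
    int sum = 0;
    for (int i = 0; i < n; ++i) { p[i] = i + 1; sum += p[i]; }
    data_allocator::deallocate(p, static_cast<std::size_t>(n));
    return sum;
}
```

A container written this way never names malloc or the memory pool directly; swapping the second template argument changes the allocation strategy without touching the container code.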

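Allocator header files and basic functions

There are three main header files: stl_construct.h, stl_alloc.h, and stl_uninitialized.h

Construction and destruction tools: construct() and destroy()

Both functions are implemented in stl_construct.h and handle object construction and destruction.

construct() and destroy() are implemented as global functions.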

 

#ifndef __SGI_STL_INTERNAL_CONSTRUCT_H
#define __SGI_STL_INTERNAL_CONSTRUCT_H

#include <new.h>

__STL_BEGIN_NAMESPACE

template <class T1, class T2>
inline void construct(T1* p, const T2& value) {
  new (p) T1(value);
}

//Generic version (ForwardIterator first, ForwardIterator last): takes two iterators
//It finds the element's value type, then uses __type_traits<> to choose the most suitable strategy
template <class ForwardIterator>
inline void destroy(ForwardIterator first, ForwardIterator last) {
  __destroy(first, last, value_type(first));
}
//Check whether the element's value type (value_type) has a trivial destructor
template <class ForwardIterator, class T>
inline void __destroy(ForwardIterator first, ForwardIterator last, T*) {
  typedef typename __type_traits<T>::has_trivial_destructor trivial_destructor;
  __destroy_aux(first, last, trivial_destructor());
}
//Case: the element's value type (value_type) has a trivial destructor
template <class ForwardIterator> 
inline void __destroy_aux(ForwardIterator, ForwardIterator, __true_type) {}
//Case: the element's value type (value_type) has a non-trivial destructor
template <class ForwardIterator>
inline void
__destroy_aux(ForwardIterator first, ForwardIterator last, __false_type) {
  for ( ; first < last; ++first)
    destroy(&*first);
}

//Specialization (char*, char*): for char* iterators
inline void destroy(char*, char*) {}
//Specialization (wchar_t*, wchar_t*): for wchar_t* iterators
inline void destroy(wchar_t*, wchar_t*) {}

//Single-pointer version (T* pointer): takes one pointer and destroys the object it points to
template <class T>
inline void destroy(T* pointer) {
    pointer->~T();
}

__STL_END_NAMESPACE

#endif /* __SGI_STL_INTERNAL_CONSTRUCT_H */

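Open question: how are value_type() and __type_traits<> implemented?

construct() takes a pointer p and an initial value, and constructs an object initialized with that value in the storage p points to. destroy() comes in two versions. The first takes a pointer and destroys the object it points to by calling the destructor directly. The second takes a pair of iterators and destroys every object in [first, last). It first determines the value type (value_type) the iterators refer to, then uses __type_traits<T> to check whether that type's destructor is trivial. If it is (trivial destructor), the function does nothing, because calling a do-nothing destructor over and over only hurts efficiency. Otherwise it loops over the range and destroys the objects one by one.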
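The dispatch described above can be sketched in modern C++. This sketch is an assumption-laden illustration rather than the SGI code: it substitutes the standard std::iterator_traits and std::is_trivially_destructible for SGI's value_type() and __type_traits<>, and every name ending in `_sketch`, plus `destroy_one`, `destroy_range`, `Tracked`, and `destroyed_count`, is hypothetical.

```cpp
#include <cassert>
#include <iterator>
#include <new>
#include <type_traits>

// Construct a T1 from value in the raw storage at p (placement new).
template <class T1, class T2>
inline void construct_sketch(T1* p, const T2& value) {
    assert(p != nullptr);
    ::new (static_cast<void*>(p)) T1(value);
}

// Single-pointer version: call the destructor directly.
template <class T>
inline void destroy_one(T* p) { p->~T(); }

// Trivial destructor: nothing to do.
template <class ForwardIt>
inline void destroy_range_aux(ForwardIt, ForwardIt, std::true_type) {}

// Non-trivial destructor: destroy the objects one by one.
template <class ForwardIt>
inline void destroy_range_aux(ForwardIt first, ForwardIt last, std::false_type) {
    for (; first != last; ++first) destroy_one(&*first);
}

// Range version: pick an overload at compile time from the value type.
template <class ForwardIt>
inline void destroy_range(ForwardIt first, ForwardIt last) {
    typedef typename std::iterator_traits<ForwardIt>::value_type value_type;
    destroy_range_aux(first, last, std::is_trivially_destructible<value_type>());
}

// Hypothetical helper type that counts how many instances are alive.
struct Tracked {
    static int& live() { static int n = 0; return n; }
    Tracked() { ++live(); }
    Tracked(const Tracked&) { ++live(); }
    ~Tracked() { --live(); }
};

// Constructs three copies in raw storage, destroys them as a range, and
// returns how many objects the range destroy removed.
inline int destroyed_count() {
    alignas(Tracked) unsigned char buf[3 * sizeof(Tracked)];
    Tracked* p = reinterpret_cast<Tracked*>(buf);
    Tracked seed;
    for (int i = 0; i < 3; ++i) construct_sketch(p + i, seed);
    int before = Tracked::live();
    destroy_range(p, p + 3);
    return before - Tracked::live();
}
```

Because the tag argument is an empty object whose type is known at compile time, the trivial case should compile down to nothing, which is the efficiency point the paragraph makes.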

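Memory allocation and deallocation: std::alloc

The functions in this part are implemented in stl_alloc.h and manage the allocation and deallocation of object memory.

SGI's design philosophy for memory allocation and deallocation:


To address the memory fragmentation that small blocks can cause, SGI uses a two-level allocator. The first-level allocator calls malloc and free directly. The second-level allocator chooses a strategy by size: if the request exceeds 128 bytes, the block is considered large enough and the request is handed to the first-level allocator; otherwise the block is considered small and, to reduce fragmentation, it is served from a memory pool.

First-level allocator: malloc_alloc

Second-level allocator: __default_alloc_template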




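Both levels are wrapped by one interface, simple_alloc, whose four member functions simply forward to the member functions of the allocator passed to it. All SGI STL containers allocate and release memory through this simple_alloc interface.


The discussion below covers how the two allocator levels work, leaving threading aside for now.

Analysis of the first-level allocator, __malloc_alloc_template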

template <int inst>
class __malloc_alloc_template {//first-level allocator

private:
//The following members handle out-of-memory conditions
static void *oom_malloc(size_t);

static void *oom_realloc(void *, size_t);

#ifndef __STL_STATIC_TEMPLATE_MEMBER_BUG
    static void (* __malloc_alloc_oom_handler)();
#endif

public:

static void * allocate(size_t n)
{
    void *result = malloc(n);
    if (0 == result) result = oom_malloc(n);
    return result;
}

static void deallocate(void *p, size_t /* n */)
{
    free(p);
}

static void * reallocate(void *p, size_t /* old_sz */, size_t new_sz)
{
    void * result = realloc(p, new_sz);
    if (0 == result) result = oom_realloc(p, new_sz);
    return result;
}

static void (* set_malloc_handler(void (*f)()))()
{
    void (* old)() = __malloc_alloc_oom_handler;
    __malloc_alloc_oom_handler = f;
    return(old);
}

};

// malloc_alloc out-of-memory handling

#ifndef __STL_STATIC_TEMPLATE_MEMBER_BUG
template <int inst>
void (* __malloc_alloc_template<inst>::__malloc_alloc_oom_handler)() = 0;
#endif

template <int inst>
void * __malloc_alloc_template<inst>::oom_malloc(size_t n)
{
    void (* my_malloc_handler)();
    void *result;

    for (;;) {
        my_malloc_handler = __malloc_alloc_oom_handler;
        if (0 == my_malloc_handler) { __THROW_BAD_ALLOC; }
        (*my_malloc_handler)();
        result = malloc(n);
        if (result) return(result);
    }
}

template <int inst>
void * __malloc_alloc_template<inst>::oom_realloc(void *p, size_t n)
{
    void (* my_malloc_handler)();
    void *result;

    for (;;) {
        my_malloc_handler = __malloc_alloc_oom_handler;
        if (0 == my_malloc_handler) { __THROW_BAD_ALLOC; }
        (*my_malloc_handler)();
        result = realloc(p, n);
        if (result) return(result);
    }
}

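What the first-level allocator does:

If the memory being allocated to the client, or returned by the client, is larger than 128 bytes, the first-level allocator manages it.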



The new-handler mechanism?
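The handler loop used by oom_malloc() can be sketched as below. This is a simplified illustration under stated assumptions: `g_oom_handler`, `set_oom_handler`, `raw_alloc`, `g_fail_next`, `oom_retry`, `allocate_sketch`, and `counting_handler` are hypothetical names, `g_fail_next` is a test hook that simulates malloc failures, and throwing std::bad_alloc is assumed in place of __THROW_BAD_ALLOC.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>

typedef void (*oom_handler_t)();

// Hypothetical global that plays the role of __malloc_alloc_oom_handler.
static oom_handler_t g_oom_handler = nullptr;

// Install a new handler and return the previous one.
inline oom_handler_t set_oom_handler(oom_handler_t f) {
    oom_handler_t old = g_oom_handler;
    g_oom_handler = f;
    return old;
}

// Test hook (assumption): the next g_fail_next allocations report failure.
static int g_fail_next = 0;
inline void* raw_alloc(std::size_t n) {
    if (g_fail_next > 0) { --g_fail_next; return nullptr; }
    return std::malloc(n);
}

// Call the handler, retry, and repeat until the allocation succeeds.
// With no handler installed there is nothing left to try, so throw.
inline void* oom_retry(std::size_t n) {
    for (;;) {
        oom_handler_t handler = g_oom_handler;
        if (handler == nullptr) throw std::bad_alloc();
        handler();
        if (void* p = raw_alloc(n)) return p;
    }
}

// Fast path first; fall back to the retry loop only on failure.
inline void* allocate_sketch(std::size_t n) {
    assert(n > 0);
    void* p = raw_alloc(n);
    return p != nullptr ? p : oom_retry(n);
}

// Hypothetical handler that only counts its invocations; a real handler
// would try to release memory so that the retry can succeed.
static int g_handler_calls = 0;
inline void counting_handler() { ++g_handler_calls; }
```

The handler is re-read on every iteration, so a handler may replace itself or uninstall itself (set the pointer to null) to end the loop, which is the same contract the standard new-handler follows.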

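Out-of-memory handling: oom_malloc(), oom_realloc(), and the handler pointer __malloc_alloc_oom_handler, all shown in the listing above. When malloc() or realloc() fails, each function loops: it calls the installed handler, retries the allocation, and repeats until the allocation succeeds. If no handler is installed, it invokes __THROW_BAD_ALLOC.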


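Analysis of the second-level allocator, __default_alloc_template

Purpose of the second-level allocator:

If the memory being allocated to the client, or returned by the client, is 128 bytes or smaller, the second-level allocator manages it.

It maintains a table of free lists (the blocks on the individual lists are 8, 16, ..., 128 bytes in size) plus a memory pool that feeds the lists. When the pool runs short, memory is taken from the heap to replenish the pool and the free lists. Blocks on the free lists are handed out to client objects that need them, and when a client returns memory, it goes back onto a free list.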
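The size-to-list mapping can be checked with a small sketch. The two functions mirror the arithmetic of ROUND_UP and FREELIST_INDEX in the listing; the lowercase names and the unprefixed enum constants are hypothetical.

```cpp
#include <cassert>
#include <cstddef>

enum { ALIGN = 8, MAX_BYTES = 128, NFREELISTS = MAX_BYTES / ALIGN };

// Round bytes up to the next multiple of ALIGN (ALIGN must be a power of two).
inline std::size_t round_up(std::size_t bytes) {
    return (bytes + ALIGN - 1) & ~static_cast<std::size_t>(ALIGN - 1);
}

// Index of the free list that serves a request of this size:
// 1..8 -> 0, 9..16 -> 1, ..., 121..128 -> 15.
inline std::size_t freelist_index(std::size_t bytes) {
    assert(bytes > 0 && bytes <= MAX_BYTES);
    return (bytes + ALIGN - 1) / ALIGN - 1;
}
```

For example, a 9-byte request is rounded up to 16 bytes and served from list 1, so up to 7 bytes per block are traded for the ability to reuse blocks of a few fixed sizes.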

Annotated code:

template <bool threads, int inst>
class __default_alloc_template {   //second-level allocator

private:
  // Really we should use static const int x = N
  // instead of enum { x = N }, but few compilers accept the former.
# ifndef __SUNPRO_CC
    enum {__ALIGN = 8};         //alignment: every block size is a multiple of this
    enum {__MAX_BYTES = 128};   //largest block size served by this allocator
    enum {__NFREELISTS = __MAX_BYTES/__ALIGN};//number of entries in free_list[]
# endif
  static size_t ROUND_UP(size_t bytes) {//round bytes up to a multiple of 8
        return (((bytes) + __ALIGN-1) & ~(__ALIGN - 1));
  }
__PRIVATE:
  union obj {//a union lets the link pointer share storage with the client data, so free blocks cost no extra memory
        union obj * free_list_link;
        char client_data[1];    /* The client sees this.        */
  };
private:
# ifdef __SUNPRO_CC
    static obj * __VOLATILE free_list[]; //master table of free lists; each entry heads a list of equal-sized blocks waiting to be allocated
        // Specifying a size results in duplicate def for 4.1
# else
    static obj * __VOLATILE free_list[__NFREELISTS]; 
# endif
  static  size_t FREELIST_INDEX(size_t bytes) {//index of the free list that serves a request of this size
        return (((bytes) + __ALIGN-1)/__ALIGN - 1);
  }

  // Returns an object of size n, and optionally adds to size n free list.
  static void *refill(size_t n);
  // Allocates a chunk for nobjs of size "size".  nobjs may be reduced
  // if it is inconvenient to allocate the requested number.
  static char *chunk_alloc(size_t size, int &nobjs);

  // Chunk allocation state. start_free and end_free bound the memory pool.
  static char *start_free;
  static char *end_free;
  static size_t heap_size;

# ifdef __STL_SGI_THREADS
    static volatile unsigned long __node_allocator_lock;
    static void __lock(volatile unsigned long *); 
    static inline void __unlock(volatile unsigned long *);
# endif

# ifdef __STL_PTHREADS
    static pthread_mutex_t __node_allocator_lock;
# endif

# ifdef __STL_WIN32THREADS
    static CRITICAL_SECTION __node_allocator_lock;
    static bool __node_allocator_lock_initialized;

  public:
    __default_alloc_template() {
	// This assumes the first constructor is called before threads
	// are started.
        if (!__node_allocator_lock_initialized) {
            InitializeCriticalSection(&__node_allocator_lock);
            __node_allocator_lock_initialized = true;
        }
    }
  private:
# endif

    class lock {
        public:
            lock() { __NODE_ALLOCATOR_LOCK; }
            ~lock() { __NODE_ALLOCATOR_UNLOCK; }
    };
    friend class lock;

public:
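    //Purpose: maintain a table of free lists (block sizes 8, 16, ..., 128 bytes) plus a memory pool that feeds the lists.
    //When the pool runs short, take memory from the heap to replenish the pool and the free lists.
    //Blocks on the free lists are handed out to client objects that need them.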
  /* n must be > 0      */
  static void * allocate(size_t n)
  {//allocate memory
    obj * __VOLATILE * my_free_list;
    obj * __RESTRICT result;

    if (n > (size_t) __MAX_BYTES) {//requests larger than 128 bytes go to the first-level allocator
        return(malloc_alloc::allocate(n));
    }
    my_free_list = free_list + FREELIST_INDEX(n);//pick the appropriate one of the 16 free lists
    // Acquire the lock here with a constructor call.
    // This ensures that it is released in exit or during stack
    // unwinding.
#       ifndef _NOTHREADS
        /*REFERENCED*/
        lock lock_instance;
#       endif
    result = *my_free_list;
    if (result == 0) {//this free list is empty; refill it
        void *r = refill(ROUND_UP(n));//returns one block to the caller and may add more blocks to this free list from the memory pool
        return r;
    }
    *my_free_list = result -> free_list_link;//unlink the first block from the free list
    return (result);
  };

  /* p may not be 0 */
  static void deallocate(void *p, size_t n)
  {//release memory
    obj *q = (obj *)p;
    obj * __VOLATILE * my_free_list;

    if (n > (size_t) __MAX_BYTES) {//blocks larger than 128 bytes are released by the first-level allocator
        malloc_alloc::deallocate(p, n);
        return;
    }
    my_free_list = free_list + FREELIST_INDEX(n);//find the matching free list
    // acquire lock
#       ifndef _NOTHREADS
        /*REFERENCED*/
        lock lock_instance;
#       endif /* _NOTHREADS */
    q -> free_list_link = *my_free_list;//reclaim the block by inserting it at the head of the matching free list
    *my_free_list = q;
    // lock is released here
  }

  static void * reallocate(void *p, size_t old_sz, size_t new_sz);

} ;

typedef __default_alloc_template<__NODE_ALLOCATOR_THREADS, 0> alloc;
typedef __default_alloc_template<false, 0> single_client_alloc;



/* We allocate memory in large chunks in order to avoid fragmenting     */
/* the malloc heap too much.                                            */
/* We assume that size is properly aligned.                             */
/* We hold the allocation lock.                                         */
template <bool threads, int inst>
char*
__default_alloc_template<threads, inst>::chunk_alloc(size_t size, int& nobjs)
{//take space from the memory pool for use by the free lists
    char * result;
    size_t total_bytes = size * nobjs;
    size_t bytes_left = end_free - start_free;//space remaining in the memory pool

    if (bytes_left >= total_bytes) {//the pool can satisfy the whole request
        result = start_free;
        start_free += total_bytes;
        return(result);
    } else if (bytes_left >= size) {//the pool cannot satisfy the whole request but can supply at least one block
        nobjs = bytes_left/size;
        total_bytes = size * nobjs;
        result = start_free;
        start_free += total_bytes;
        return(result);
    } else {//the pool cannot supply even one block
        size_t bytes_to_get = 2 * total_bytes + ROUND_UP(heap_size >> 4);
        // Try to make use of the left-over piece.
        if (bytes_left > 0) {//the pool still holds a leftover piece; give it to the matching free list
            obj * __VOLATILE * my_free_list =
                        free_list + FREELIST_INDEX(bytes_left);

            ((obj *)start_free) -> free_list_link = *my_free_list;
            *my_free_list = (obj *)start_free;
        }
        //allocate heap space to replenish the memory pool
        start_free = (char *)malloc(bytes_to_get);//request bytes_to_get bytes for the pool
        if (0 == start_free) {//the system cannot supply bytes_to_get bytes for the pool
            //heap exhausted: malloc failed
            obj * __VOLATILE * my_free_list, *p;
            // Try to make do with what we have.  That can't
            // hurt.  We do not try smaller requests, since that tends
            // to result in disaster on multi-process machines.
            int i;
            //Search the free lists whose block size is at least size for an unused block.
            //If one exists, take it off its list, make it the pool, and retry;
            //whatever the retry does not use stays in the pool.
            for (i = size; i <= __MAX_BYTES; i += __ALIGN) {
                my_free_list = free_list + FREELIST_INDEX(i);
                p = *my_free_list;
                if (0 != p) {
                    *my_free_list = p -> free_list_link;
                    start_free = (char *)p;
                    end_free = start_free + i;
                    return(chunk_alloc(size, nobjs));
                    // Any leftover piece will eventually make it to the
                    // right free list.
                }
            }
	    end_free = 0;	// In case of exception: no memory is available anywhere
            start_free = (char *)malloc_alloc::allocate(bytes_to_get);//call the first-level allocator and let its out-of-memory handling try to help
            //this either throws an exception or the shortage is relieved
            // This should either throw an
            // exception or remedy the situation.  Thus we assume it
            // succeeded.
        }
        heap_size += bytes_to_get;
        end_free = start_free + bytes_to_get;
        return(chunk_alloc(size, nobjs));//the pool is now large enough, so retry: part goes to the free list and the rest stays in the pool
    }
}


/* Returns an object of size n, and optionally adds to size n free list.*/
/* We assume that n is properly aligned.                                */
/* We hold the allocation lock.                                         */
template <bool threads, int inst>
void* __default_alloc_template<threads, inst>::refill(size_t n)//returns an object of size n (n is already a multiple of 8) and may add nodes to the matching free list
{
    int nobjs = 20;
    char * chunk = chunk_alloc(n, nobjs);//try to obtain nobjs blocks as new free-list nodes; one of them goes to the caller
    obj * __VOLATILE * my_free_list;
    obj * result;
    obj * current_obj, * next_obj;
    int i;

    if (1 == nobjs) return(chunk);//only one block was obtained: give it to the caller; the free list gains no nodes
    my_free_list = free_list + FREELIST_INDEX(n);//locate the free list that will receive the new nodes

    /* Build free list in chunk */
      result = (obj *)chunk;//this block goes to the caller
      *my_free_list = next_obj = (obj *)(chunk + n);//link the remaining nobjs-1 blocks together and put them on the free list
      for (i = 1; ; i++) {
        current_obj = next_obj;
        next_obj = (obj *)((char *)next_obj + n);
        if (nobjs - 1 == i) {
            current_obj -> free_list_link = 0;
            break;
        } else {
            current_obj -> free_list_link = next_obj;
        }
      }
    return(result);
}

template <bool threads, int inst>
void*
__default_alloc_template<threads, inst>::reallocate(void *p,
                                                    size_t old_sz,
                                                    size_t new_sz)//reallocate the block to a new size
{
    void * result;
    size_t copy_sz;

    if (old_sz > (size_t) __MAX_BYTES && new_sz > (size_t) __MAX_BYTES) {
        return(realloc(p, new_sz));
    }
    if (ROUND_UP(old_sz) == ROUND_UP(new_sz)) return(p);
    result = allocate(new_sz);
    copy_sz = new_sz > old_sz? old_sz : new_sz;
    memcpy(result, p, copy_sz);
    deallocate(p, old_sz);
    return(result);
}

#ifdef __STL_PTHREADS
    template <bool threads, int inst>
    pthread_mutex_t
    __default_alloc_template<threads, inst>::__node_allocator_lock
        = PTHREAD_MUTEX_INITIALIZER;
#endif

#ifdef __STL_WIN32THREADS
    template <bool threads, int inst> CRITICAL_SECTION
    __default_alloc_template<threads, inst>::__node_allocator_lock;

    template <bool threads, int inst> bool
    __default_alloc_template<threads, inst>::__node_allocator_lock_initialized
	= false;
#endif

#ifdef __STL_SGI_THREADS
__STL_END_NAMESPACE
#include <mutex.h>
#include <time.h>
__STL_BEGIN_NAMESPACE
// Somewhat generic lock implementations.  We need only test-and-set
// and some way to sleep.  These should work with both SGI pthreads
// and sproc threads.  They may be useful on other systems.
template <bool threads, int inst>
volatile unsigned long
__default_alloc_template<threads, inst>::__node_allocator_lock = 0;

#if __mips < 3 || !(defined (_ABIN32) || defined(_ABI64)) || defined(__GNUC__)
#   define __test_and_set(l,v) test_and_set(l,v)
#endif

template <bool threads, int inst>
void 
__default_alloc_template<threads, inst>::__lock(volatile unsigned long *lock)
{
    const unsigned low_spin_max = 30;  // spin cycles if we suspect uniprocessor
    const unsigned high_spin_max = 1000; // spin cycles for multiprocessor
    static unsigned spin_max = low_spin_max;
    unsigned my_spin_max;
    static unsigned last_spins = 0;
    unsigned my_last_spins;
    static struct timespec ts = {0, 1000};
    unsigned junk;
#   define __ALLOC_PAUSE junk *= junk; junk *= junk; junk *= junk; junk *= junk
    int i;

    if (!__test_and_set((unsigned long *)lock, 1)) {
        return;
    }
    my_spin_max = spin_max;
    my_last_spins = last_spins;
    for (i = 0; i < my_spin_max; i++) {
        if (i < my_last_spins/2 || *lock) {
            __ALLOC_PAUSE;
            continue;
        }
        if (!__test_and_set((unsigned long *)lock, 1)) {
            // got it!
            // Spinning worked.  Thus we're probably not being scheduled
            // against the other process with which we were contending.
            // Thus it makes sense to spin longer the next time.
            last_spins = i;
            spin_max = high_spin_max;
            return;
        }
    }
    // We are probably being scheduled against the other process.  Sleep.
    spin_max = low_spin_max;
    for (;;) {
        if (!__test_and_set((unsigned long *)lock, 1)) {
            return;
        }
        nanosleep(&ts, 0);
    }
}

template <bool threads, int inst>
inline void
__default_alloc_template<threads, inst>::__unlock(volatile unsigned long *lock)
{
#   if defined(__GNUC__) && __mips >= 3
        asm("sync");
        *lock = 0;
#   elif __mips >= 3 && (defined (_ABIN32) || defined(_ABI64))
        __lock_release(lock);
#   else 
        *lock = 0;
        // This is not sufficient on many multiprocessors, since
        // writes to protected variables and the lock may be reordered.
#   endif
}
#endif

template <bool threads, int inst>
char *__default_alloc_template<threads, inst>::start_free = 0;

template <bool threads, int inst>
char *__default_alloc_template<threads, inst>::end_free = 0;

template <bool threads, int inst>
size_t __default_alloc_template<threads, inst>::heap_size = 0;

template <bool threads, int inst>
__default_alloc_template<threads, inst>::obj * __VOLATILE
__default_alloc_template<threads, inst> ::free_list[
# ifdef __SUNPRO_CC
    __NFREELISTS
# else
    __default_alloc_template<threads, inst>::__NFREELISTS
# endif
] = {0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, };
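To make the arithmetic in chunk_alloc() concrete, the sketch below reproduces only its size calculations, with no real memory involved. It assumes that malloc succeeds whenever the heap is consulted; `chunk_plan`, `plan_chunk`, and `round_up8` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>

struct chunk_plan {
    std::size_t bytes_to_get;   // bytes requested from the heap (0 if the pool sufficed)
    std::size_t handed_out;     // bytes delivered for the requested block size
    std::size_t left_in_pool;   // bytes remaining in the pool afterwards
    int nobjs;                  // number of blocks actually delivered
};

inline std::size_t round_up8(std::size_t bytes) {
    return (bytes + 7) & ~static_cast<std::size_t>(7);
}

// Mirrors the three branches of chunk_alloc(), assuming malloc succeeds.
inline chunk_plan plan_chunk(std::size_t size, std::size_t bytes_left,
                             std::size_t heap_size, int nobjs = 20) {
    assert(size > 0 && size % 8 == 0);   // size is assumed to be aligned
    const std::size_t total = size * static_cast<std::size_t>(nobjs);
    chunk_plan r = {0, 0, 0, nobjs};
    if (bytes_left >= total) {
        // The pool covers the whole request.
        r.handed_out = total;
        r.left_in_pool = bytes_left - total;
    } else if (bytes_left >= size) {
        // The pool covers at least one block: deliver as many as fit.
        r.nobjs = static_cast<int>(bytes_left / size);
        r.handed_out = size * static_cast<std::size_t>(r.nobjs);
        r.left_in_pool = bytes_left - r.handed_out;
    } else {
        // Not even one block: any leftover moves to a free list and the
        // pool is replaced by a fresh heap chunk, which then covers the request.
        r.bytes_to_get = 2 * total + round_up8(heap_size >> 4);
        r.handed_out = total;
        r.left_in_pool = r.bytes_to_get - total;
    }
    return r;
}
```

Starting from an empty pool, a first request for 32-byte blocks gives plan_chunk(32, 0, 0): 1280 bytes come from the heap, 640 are delivered as 20 blocks, and 640 stay in the pool. A following request for 64-byte blocks, plan_chunk(64, 640, 1280), is served from the pool alone and yields only 10 blocks.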


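The most basic data structure for managing memory:

union obj {//a union lets the link pointer share storage with the client data, so free blocks cost no extra memory
       union obj * free_list_link;
       char client_data[1];    /* The client sees this.        */
  };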
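How this union turns free blocks into a linked list can be sketched as follows. The union is the same shape as above; `push_block`, `pop_block`, and `lifo_round_trip` are hypothetical helpers that correspond to the head insertion in deallocate() and the head removal in allocate().

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

union obj {
    union obj* free_list_link;   // used while the block sits on a free list
    char client_data[1];         // used while the client owns the block
};

// Return a block to the list: the block's own first bytes hold the link.
inline void push_block(obj*& head, void* p) {
    assert(p != nullptr);
    obj* q = static_cast<obj*>(p);
    q->free_list_link = head;
    head = q;
}

// Take the first block off the list, or return null if the list is empty.
inline void* pop_block(obj*& head) {
    obj* result = head;
    if (result != nullptr) head = result->free_list_link;
    return result;
}

// Splits one 32-byte chunk into two 16-byte blocks, pushes both, and
// checks that they come back in last-in, first-out order.
inline bool lifo_round_trip() {
    char* chunk = static_cast<char*>(std::malloc(32));
    if (chunk == nullptr) return false;
    obj* head = nullptr;
    push_block(head, chunk);
    push_block(head, chunk + 16);
    void* a = pop_block(head);
    void* b = pop_block(head);
    void* c = pop_block(head);
    std::free(chunk);
    return a == chunk + 16 && b == chunk && c == nullptr;
}
```

Since a block is either on a free list or in the client's hands, never both, the link pointer and the client data can occupy the same bytes, which is why the smallest block size only needs to be large enough to hold one pointer.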

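Basic tools for handling memory

STL defines five global functions that operate on uninitialized memory; they are very helpful when implementing containers.

For example, when implementing a container, its range constructor is usually completed in two steps:

(1) Allocate a memory block large enough to hold all the elements in the range;

(2) Use uninitialized_copy() to construct the elements in that block.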
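The two steps above can be sketched with the standard library's std::uninitialized_copy. `tiny_buffer` is a hypothetical, deliberately minimal container; it assumes forward iterators and uses ::operator new for raw storage instead of SGI's alloc.

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <memory>
#include <new>

template <class T>
class tiny_buffer {
public:
    // Range constructor: (1) allocate raw storage, (2) construct elements in it.
    template <class ForwardIt>
    tiny_buffer(ForwardIt first, ForwardIt last)
        : size_(static_cast<std::size_t>(std::distance(first, last))),
          data_(static_cast<T*>(::operator new(size_ * sizeof(T)))) {
        try {
            std::uninitialized_copy(first, last, data_);
        } catch (...) {
            // uninitialized_copy has already destroyed what it built;
            // only the raw storage remains to be released.
            ::operator delete(data_);
            throw;
        }
    }

    ~tiny_buffer() {
        for (std::size_t i = 0; i < size_; ++i) data_[i].~T();
        ::operator delete(data_);
    }

    tiny_buffer(const tiny_buffer&) = delete;
    tiny_buffer& operator=(const tiny_buffer&) = delete;

    std::size_t size() const { return size_; }
    const T& operator[](std::size_t i) const {
        assert(i < size_);
        return data_[i];
    }

private:
    std::size_t size_;   // declared first so it is initialized before data_
    T* data_;
};
```

Keeping allocation and construction separate means no element is default-constructed and then overwritten; each one is copy-constructed exactly once, directly in its final location.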

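construct(), used for construction    //already discussed at the beginning of this article

destroy(), used for destruction     //already discussed at the beginning of this article

The following three functions are all implemented in stl_uninitialized.h; the higher-level functions their implementations rely on, copy(), fill(), and fill_n(), are in stl_algobase.h.

Generic and specialized versions of uninitialized_copy():

Generic version of uninitialized_fill():

Generic version of uninitialized_fill_n():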
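As a rough idea of what a generic uninitialized_fill_n() has to do, here is a sketch with the usual "all or nothing" behavior: if a constructor throws, the elements built so far are destroyed before the exception propagates. It is not the SGI implementation (which also dispatches on whether the type is POD); `uninitialized_fill_n_sketch` and `fill_and_sum` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <iterator>
#include <new>

// Construct n copies of x in the raw storage starting at first.
// On an exception, destroy whatever was already constructed and rethrow.
template <class ForwardIt, class Size, class T>
ForwardIt uninitialized_fill_n_sketch(ForwardIt first, Size n, const T& x) {
    typedef typename std::iterator_traits<ForwardIt>::value_type V;
    ForwardIt cur = first;
    try {
        for (; n > 0; --n, ++cur)
            ::new (static_cast<void*>(&*cur)) V(x);
        return cur;
    } catch (...) {
        for (; first != cur; ++first) (*first).~V();
        throw;
    }
}

// Fills n ints in malloc'ed storage with value and returns their sum.
inline long fill_and_sum(std::size_t n, int value) {
    assert(n > 0);
    int* raw = static_cast<int*>(std::malloc(n * sizeof(int)));
    assert(raw != nullptr);
    int* end = uninitialized_fill_n_sketch(raw, n, value);
    long sum = 0;
    for (int* p = raw; p != end; ++p) sum += *p;
    std::free(raw);
    return sum;
}
```

For types such as int, where construction cannot fail and assignment is equivalent to construction, an implementation can skip this loop and call the higher-level fill_n() instead.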


References:

The code is from SGI STL.

The screenshots are from Hou Jie's book 《STL源代码剖析》.
