[leveldb] SkipList (Part 7)

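The SkipList (skip list) data structure backs both the Memtable and the Immutable Memtable.
The Memtable is the in-memory table that stores inserted key-value (KV) data. The SkipList is what makes KV insertion and lookup fast.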

1. Introduction

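A SkipList trades space for time: it builds multiple levels of index on top of a sorted linked list to speed up lookups, which amounts to a "binary search" over a linked list. It is a dynamic data structure that supports fast insertion, deletion, and lookup, each in O(log n) expected time.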
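Complexity:

  • Insertion, deletion, and lookup each take O(log n) time on average (O(n) in the worst case, because node heights are chosen at random);
  • Space complexity is O(n).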

2. Structure

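In LevelDB's implementation, the head node of a SkipList is initialized with 12 next pointers, one per possible level (kMaxHeight = 12).
[Figure: skip list structure, a diagram borrowed from the web]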

3. Source Code Analysis

Node

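This code relies on C++ memory ordering (std::memory_order). If you are unfamiliar with it, look it up separately; it is not covered here.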

// Implementation details follow
template <typename Key, class Comparator>
struct SkipList<Key, Comparator>::Node {
  explicit Node(const Key& k) : key(k) {}

  Key const key;

  // Variants with memory barriers.
  // Accessors/mutators for links.  Wrapped in methods so we can
  // add the appropriate barriers as necessary.
  Node* Next(int n) {
    assert(n >= 0);
    // Use an 'acquire load' so that we observe a fully initialized
    // version of the returned Node.
    return next_[n].load(std::memory_order_acquire);
  }
  void SetNext(int n, Node* x) {
    assert(n >= 0);
    // Use a 'release store' so that anybody who reads through this
    // pointer observes a fully initialized version of the inserted node.
    next_[n].store(x, std::memory_order_release);
  }
  
  // Variants without memory barriers; cheaper than the barrier variants above.
  // No-barrier variants that can be safely used in a few locations.
  Node* NoBarrier_Next(int n) {
    assert(n >= 0);
    return next_[n].load(std::memory_order_relaxed);
  }
  void NoBarrier_SetNext(int n, Node* x) {
    assert(n >= 0);
    next_[n].store(x, std::memory_order_relaxed);
  }

 private:
  // Array of length equal to the node height.  next_[0] is lowest level link.
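  // For each level, the node that follows the current node. For example:
  //   level 2:  N1 -> N2
  //   level 1:  N1 -> N2
  // If N1 is the current node, next_[x] stores N2.
  // next_[0] is the original (complete) linked list.
  // Declared with one slot; NewNode() allocates room for `height` slots.
  std::atomic<Node*> next_[1];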
};
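
The pairing of Next() (acquire load) with SetNext() (release store) is what lets a reader follow a freshly published pointer and still see a fully initialized node. The following is a minimal sketch of that publication pattern, not LevelDB code: `Payload`, `g_slot`, `Publish`, `WaitAndRead`, and `RunPublishDemo` are hypothetical names introduced only for illustration.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical payload, standing in for a freshly built skip list node.
struct Payload {
  int value = 0;
};

// Shared slot the writer publishes into; plays the role of next_[n].
std::atomic<Payload*> g_slot{nullptr};

// Writer: fully initialize the object, then publish it with a release store.
void Publish(Payload* p, int v) {
  p->value = v;                                // plain write before publishing
  g_slot.store(p, std::memory_order_release);  // publish the pointer
}

// Reader: spin until the pointer becomes visible, then read through it.
// The acquire load pairs with the release store above, so the write to
// `value` is guaranteed to be visible once the pointer is.
int WaitAndRead() {
  Payload* p = nullptr;
  while ((p = g_slot.load(std::memory_order_acquire)) == nullptr) {
    std::this_thread::yield();
  }
  return p->value;
}

// Runs one reader and one writer; returns the value the reader observed.
int RunPublishDemo(int v) {
  g_slot.store(nullptr, std::memory_order_relaxed);
  Payload payload;
  int seen = -1;
  std::thread reader([&] { seen = WaitAndRead(); });
  std::thread writer([&] { Publish(&payload, v); });
  writer.join();
  reader.join();
  return seen;
}
```

Calling `RunPublishDemo(42)` returns 42. If both operations used `std::memory_order_relaxed` instead, the C++ memory model would no longer guarantee that the reader sees the initialized `value`, which is why the NoBarrier_ variants are only used where another barrier already covers the access.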
SkipList
// Memory management (arena allocator).
class Arena;

template <typename Key, class Comparator>
class SkipList {
 private:
  struct Node;

 public:
  // Create a new SkipList object that will use "cmp" for comparing keys,
  // and will allocate memory using "*arena".  Objects allocated in the arena
  // must remain allocated for the lifetime of the skiplist object.
  explicit SkipList(Comparator cmp, Arena* arena);

  SkipList(const SkipList&) = delete;
  SkipList& operator=(const SkipList&) = delete;

  // Insert key into the list.
  // REQUIRES: nothing that compares equal to key is currently in the list.
  void Insert(const Key& key);

  // Returns true iff an entry that compares equal to key is in the list.
  bool Contains(const Key& key) const;

  // Iterator; the English comments below are self-explanatory.
  // Iteration over the contents of a skip list
  class Iterator {
   public:
    // Initialize an iterator over the specified list.
    // The returned iterator is not valid.
    explicit Iterator(const SkipList* list);

    // Returns true iff the iterator is positioned at a valid node.
    bool Valid() const;

    // Returns the key at the current position.
    // REQUIRES: Valid()
    const Key& key() const;

    // Advances to the next position.
    // REQUIRES: Valid()
    void Next();

    // Advances to the previous position.
    // REQUIRES: Valid()
    void Prev();

    // Advance to the first entry with a key >= target
    void Seek(const Key& target);

    // Position at the first entry in list.
    // Final state of iterator is Valid() iff list is not empty.
    void SeekToFirst();

    // Position at the last entry in list.
    // Final state of iterator is Valid() iff list is not empty.
    void SeekToLast();

   private:
    // The SkipList this iterator is bound to.
    const SkipList* list_;
    // The node the iterator currently points at.
    Node* node_;
    // Intentionally copyable
  };

 private:
  // Maximum number of levels in the skip list; the lowest level is level 0.
  enum { kMaxHeight = 12 };

  // Returns the number of levels currently in use.
  inline int GetMaxHeight() const {
    return max_height_.load(std::memory_order_relaxed);
  }

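  // Allocates a new node.
  Node* NewNode(const Key& key, int height);

  // Returns a random height for the node being inserted. For example, if it
  // returns 4, the node is linked into levels 0 through 3.
  int RandomHeight();
  // Equality test.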
  bool Equal(const Key& a, const Key& b) const { return (compare_(a, b) == 0); }

  // Is key positioned after node n?
  // Return true if key is greater than the data stored in "n"
  bool KeyIsAfterNode(const Key& key, Node* n) const;

  // Return the earliest node that comes at or after key.
  // Return nullptr if there is no such node.
  //
  // If prev is non-null, fills prev[level] with pointer to previous
  // node at "level" for every level in [0..max_height_-1].
  Node* FindGreaterOrEqual(const Key& key, Node** prev) const;

  // Return the latest node with a key < key.
  // Return head_ if there is no such node.
  Node* FindLessThan(const Key& key) const;

  // Return the last node in the list.
  // Return head_ if list is empty.
  Node* FindLast() const;

  // Immutable after construction
  Comparator const compare_;
  Arena* const arena_;  // Arena used for allocations of nodes

  // Dummy head node with kMaxHeight links; head_->Next(0) is the first element.
  Node* const head_;

  // Modified only by Insert().  Read racily by readers, but stale
  // values are ok.
  std::atomic<int> max_height_;  // Height of the entire list

  // Random number generator used by RandomHeight().
  // Read/written only by Insert().
  Random rnd_;
};
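
The class calls `compare_(a, b)` and tests the result against zero (`== 0`, `< 0`, `>= 0`), so the Comparator is expected to be a callable that returns a three-way result. Below is a minimal sketch of such a comparator for int keys. `ThreeWayIntComparator` and `EqualKeys` are hypothetical names used only for illustration; they are not part of LevelDB.

```cpp
#include <cassert>

// Hypothetical three-way comparator for int keys:
// negative if a < b, zero if equal, positive if a > b.
struct ThreeWayIntComparator {
  int operator()(const int& a, const int& b) const {
    if (a < b) return -1;
    if (a > b) return +1;
    return 0;
  }
};

// Mirrors the shape of SkipList::Equal(): two keys are equal when the
// comparator returns zero for them.
bool EqualKeys(const ThreeWayIntComparator& cmp, int a, int b) {
  return cmp(a, b) == 0;
}
```

With a comparator like this, the key type and ordering are decided entirely by the template arguments, which is what keeps the skip list itself independent of any particular key format.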

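// Creates a new node holding key; height is the number of levels the node
// appears in. sizeof(Node) already covers next_[0] (level 0), so only
// (height - 1) extra pointers are allocated after it. The memory comes from
// Arena::AllocateAligned, which returns aligned memory for faster CPU access.
template <typename Key, class Comparator>
typename SkipList<Key, Comparator>::Node* SkipList<Key, Comparator>::NewNode(
    const Key& key, int height) {
  // One Node (key plus next_[0]) followed by (height - 1) more pointers.
  char* const node_memory = arena_->AllocateAligned(
      sizeof(Node) + sizeof(std::atomic<Node*>) * (height - 1));
  // Construct a Node in place (placement new) in the memory allocated above.
  return new (node_memory) Node(key);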
}

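// Iterator constructor. Next() and SeekToFirst() walk level 0; Seek(), Prev(),
// and SeekToLast() search from the top level down.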
template <typename Key, class Comparator>
inline SkipList<Key, Comparator>::Iterator::Iterator(const SkipList* list) {
  list_ = list;
  node_ = nullptr;
}

// Is the iterator positioned at a valid node?
template <typename Key, class Comparator>
inline bool SkipList<Key, Comparator>::Iterator::Valid() const {
  return node_ != nullptr;
}

template <typename Key, class Comparator>
inline const Key& SkipList<Key, Comparator>::Iterator::key() const {
  assert(Valid());
  return node_->key;
}

template <typename Key, class Comparator>
inline void SkipList<Key, Comparator>::Iterator::Next() {
  assert(Valid());
  node_ = node_->Next(0);  // Forward iteration follows the level-0 links.
}

// Moves to the node before the current one.
template <typename Key, class Comparator>
inline void SkipList<Key, Comparator>::Iterator::Prev() {
  // Instead of using explicit "prev" links, we just search for the
  // last node that falls before key.
  assert(Valid());
  node_ = list_->FindLessThan(node_->key);
  if (node_ == list_->head_) {
    node_ = nullptr;
  }
}

// Positions at the first node whose key is >= target.
template <typename Key, class Comparator>
inline void SkipList<Key, Comparator>::Iterator::Seek(const Key& target) {
  node_ = list_->FindGreaterOrEqual(target, nullptr);
}

// Positions at the first node on level 0.
template <typename Key, class Comparator>
inline void SkipList<Key, Comparator>::Iterator::SeekToFirst() {
  // head_ is a dummy node with no meaningful key; only its links are used.
  node_ = list_->head_->Next(0);
}

template <typename Key, class Comparator>
inline void SkipList<Key, Comparator>::Iterator::SeekToLast() {
  node_ = list_->FindLast();
  if (node_ == list_->head_) {
    node_ = nullptr;
  }
}
// Picks the random height for a node to insert. For example, 4 means the node
// is linked into levels [0..3].
template <typename Key, class Comparator>
int SkipList<Key, Comparator>::RandomHeight() {
  // Increase height with probability 1 in kBranching
  static const unsigned int kBranching = 4;
  int height = 1;
  while (height < kMaxHeight && ((rnd_.Next() % kBranching) == 0)) {
    height++;
  }
  assert(height > 0);
  assert(height <= kMaxHeight);
  return height;
}

// Is key positioned after node n?
template <typename Key, class Comparator>
bool SkipList<Key, Comparator>::KeyIsAfterNode(const Key& key, Node* n) const {
  // null n is considered infinite
  return (n != nullptr) && (compare_(n->key, key) < 0);
}

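// Finds the first node whose key is >= key, searching from the top level.
//  1. If no such node exists, the returned next is nullptr.
//  2. If prev is non-null, prev[level] records, for each level, the last
//     node whose key is < key.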
template <typename Key, class Comparator>
typename SkipList<Key, Comparator>::Node*
SkipList<Key, Comparator>::FindGreaterOrEqual(const Key& key,
                                              Node** prev) const {
  Node* x = head_;
  int level = GetMaxHeight() - 1;
  while (true) {
    Node* next = x->Next(level);
    if (KeyIsAfterNode(key, next)) {
      // Keep searching in this list
      x = next;
    } else {
      if (prev != nullptr) prev[level] = x;
      if (level == 0) {
        return next;
      } else {
        // Switch to next list
        level--;
      }
    }
  }
}

// Finds the last node whose key is < key, searching from the top level.
// Returns head_ if there is no such node.
template <typename Key, class Comparator>
typename SkipList<Key, Comparator>::Node*
SkipList<Key, Comparator>::FindLessThan(const Key& key) const {
  Node* x = head_;
  int level = GetMaxHeight() - 1;
  while (true) {
    assert(x == head_ || compare_(x->key, key) < 0);
    Node* next = x->Next(level);
    if (next == nullptr || compare_(next->key, key) >= 0) {
      if (level == 0) {
        return x;
      } else {
        // Switch to next list
        level--;
      }
    } else {
      x = next;
    }
  }
}

// Starting from the top level, walks to the last element.
template <typename Key, class Comparator>
typename SkipList<Key, Comparator>::Node* SkipList<Key, Comparator>::FindLast()
    const {
  Node* x = head_;
  int level = GetMaxHeight() - 1;
  while (true) {
    Node* next = x->Next(level);
    if (next == nullptr) {
      if (level == 0) {
        return x;
      } else {
        // Switch to next list
        level--;
      }
    } else {
      x = next;
    }
  }
}

// SkipList constructor; initializes the members.
template <typename Key, class Comparator>
SkipList<Key, Comparator>::SkipList(Comparator cmp, Arena* arena)
    : compare_(cmp),
      arena_(arena),
      head_(NewNode(0 /* any key will do */, kMaxHeight)),
      max_height_(1),
      rnd_(0xdeadbeef) {
  for (int i = 0; i < kMaxHeight; i++) {
    head_->SetNext(i, nullptr);
  }
}

// Inserts key.
template <typename Key, class Comparator>
void SkipList<Key, Comparator>::Insert(const Key& key) {
  // TODO(opt): We can use a barrier-free variant of FindGreaterOrEqual()
  // here since Insert() is externally synchronized.
  // Find the first node x whose key is >= key, and record in prev[] the last
  // node on each level whose key is < key.
  Node* prev[kMaxHeight];
  Node* x = FindGreaterOrEqual(key, prev);

  // Either no such node exists, or the node found must not compare equal to key.
  // Our data structure does not allow duplicate insertion
  assert(x == nullptr || !Equal(key, x->key));

  // Pick the random height for the new node.
  int height = RandomHeight();
  if (height > GetMaxHeight()) {
    // If the height exceeds the current one, head_ is the predecessor on
    // each of the new levels.
    for (int i = GetMaxHeight(); i < height; i++) {
      prev[i] = head_;
    }
    // It is ok to mutate max_height_ without any synchronization
    // with concurrent readers.  A concurrent reader that observes
    // the new value of max_height_ will see either the old value of
    // new level pointers from head_ (nullptr), or a new value set in
    // the loop below.  In the former case the reader will
    // immediately drop to the next level since nullptr sorts after all
    // keys.  In the latter case the reader will use the new node.
    max_height_.store(height, std::memory_order_relaxed);
  }

  // Create the new node.
  x = NewNode(key, height);
  // Insertion is just pointer splicing on each level, which is why
  // prev[kMaxHeight] was needed.
  for (int i = 0; i < height; i++) {
    // NoBarrier_SetNext() suffices since we will add a barrier when
    // we publish a pointer to "x" in prev[i].
    x->NoBarrier_SetNext(i, prev[i]->NoBarrier_Next(i));
    prev[i]->SetNext(i, x);
  }
}

// Does the skip list contain key?
template <typename Key, class Comparator>
bool SkipList<Key, Comparator>::Contains(const Key& key) const {
  Node* x = FindGreaterOrEqual(key, nullptr);
  if (x != nullptr && Equal(key, x->key)) {
    return true;
  } else {
    return false;
  }
}
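
To see the search and insert logic in isolation, here is a simplified, single-threaded sketch that follows the same shape as FindGreaterOrEqual(), RandomHeight(), and Insert() above. It is not LevelDB's class: `MiniSkipList` and its `Keys()` helper are hypothetical, keys are plain ints, `std::vector` and `std::unique_ptr` replace the Arena and the over-allocated next_ array, `std::mt19937` replaces LevelDB's Random, and there are no atomics.

```cpp
#include <cassert>
#include <memory>
#include <random>
#include <vector>

// Simplified, single-threaded skip list sketch (NOT LevelDB's implementation).
class MiniSkipList {
 public:
  MiniSkipList() : rng_(0xdeadbeef), max_height_(1) {
    head_ = NewNode(0 /* key of the dummy head is never read */, kMaxHeight);
  }

  // REQUIRES: key is not already in the list.
  void Insert(int key) {
    Node* prev[kMaxHeight];
    Node* x = FindGreaterOrEqual(key, prev);
    assert(x == nullptr || x->key != key);
    int height = RandomHeight();
    if (height > max_height_) {
      // On levels that were unused so far, the predecessor is the head.
      for (int i = max_height_; i < height; i++) prev[i] = head_;
      max_height_ = height;
    }
    x = NewNode(key, height);
    for (int i = 0; i < height; i++) {
      // Splice x between prev[i] and its old successor on level i.
      x->next[i] = prev[i]->next[i];
      prev[i]->next[i] = x;
    }
  }

  bool Contains(int key) const {
    Node* x = FindGreaterOrEqual(key, nullptr);
    return x != nullptr && x->key == key;
  }

  // Returns all keys in ascending order by walking level 0.
  std::vector<int> Keys() const {
    std::vector<int> out;
    for (Node* x = head_->next[0]; x != nullptr; x = x->next[0]) {
      out.push_back(x->key);
    }
    return out;
  }

 private:
  static constexpr int kMaxHeight = 12;

  struct Node {
    Node(int k, int h) : key(k), next(h, nullptr) {}
    int key;
    std::vector<Node*> next;  // next[i] is the successor on level i
  };

  Node* NewNode(int key, int height) {
    pool_.push_back(std::make_unique<Node>(key, height));
    return pool_.back().get();
  }

  // Grow the height with probability 1/4 per extra level.
  int RandomHeight() {
    int height = 1;
    while (height < kMaxHeight && rng_() % 4 == 0) height++;
    return height;
  }

  // First node with key >= `key`; fills prev[level] when prev is non-null.
  Node* FindGreaterOrEqual(int key, Node** prev) const {
    Node* x = head_;
    int level = max_height_ - 1;
    while (true) {
      Node* next = x->next[level];
      if (next != nullptr && next->key < key) {
        x = next;  // keep moving right on this level
      } else {
        if (prev != nullptr) prev[level] = x;
        if (level == 0) return next;
        level--;  // drop down one level
      }
    }
  }

  std::vector<std::unique_ptr<Node>> pool_;  // owns every node
  std::mt19937 rng_;
  int max_height_;
  Node* head_;
};
```

Inserting 5, 1, 9, 3, 7 in that order and then calling `Keys()` yields the keys in sorted order, and `Contains(4)` is false. The sketch leaves out everything that makes the real class safe for concurrent readers, so it should be read only as an illustration of the search and splice steps.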

4. Summary

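  1. Every search in the SkipList proceeds from the top level down.
  2. Why `% 4` in the random height? A node is promoted to the next level with probability 1/4, so level x holds on average 1/4 as many nodes as level x-1. Taking the random number modulo 4 therefore decides how many levels a node appears in.
  3. LevelDB implements SkipList as a template, which makes it more general.
  4. LevelDB's SkipList has no delete operation because a deletion is also an insertion: deleting a key's value is carried out in the Memtable as inserting a record that carries a deletion marker for the key. The actual removal is lazy, and the KV is dropped during a later Compaction.
  5. Why does LevelDB use a SkipList for insertion and lookup rather than a red-black tree or some other data structure?
    A SkipList handles range lookups efficiently, and it is also not too complicated to implement.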
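
The effect of the 1/4 promotion probability in point 2 can be checked numerically. If a node reaches height k or more with probability (1/4)^(k-1), the expected height is about 1/(1 - 1/4) = 4/3, so each node carries roughly 1.33 links on average. The sketch below samples heights with the same loop shape as RandomHeight(); `SampleHeight` and `AverageHeight` are hypothetical helpers, and `std::mt19937` is used here in place of LevelDB's own Random class.

```cpp
#include <cassert>
#include <random>

// Same loop shape as RandomHeight(): add a level with probability 1/kBranching.
// std::mt19937 is a stand-in for LevelDB's Random (an assumption of this sketch).
int SampleHeight(std::mt19937& rng) {
  const unsigned int kBranching = 4;
  const int kMaxHeight = 12;
  int height = 1;
  while (height < kMaxHeight && (rng() % kBranching) == 0) {
    height++;
  }
  return height;
}

// Average height over `samples` draws from a generator seeded with `seed`.
double AverageHeight(int samples, unsigned int seed) {
  std::mt19937 rng(seed);
  long long total = 0;
  for (int i = 0; i < samples; i++) {
    total += SampleHeight(rng);
  }
  return static_cast<double>(total) / samples;
}
```

With a large sample count, for example `AverageHeight(1000000, 42)`, the result should land close to 4/3, which is consistent with the O(n) space bound stated in the introduction.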