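Storage engines
Dynamic search trees (built for fast lookup) mainly include:
- Binary search tree (BST; its in-order traversal yields an increasing sequence, which can be used to test whether a tree is a BST)
- Balanced binary tree (AVL; how to test whether a tree is an AVL tree)
- Red-black tree
- B-tree (balanced multiway search tree)
- B+ tree
A binary tree stored on disk is deep, so a lookup costs too many I/Os; B-trees and B+ trees are shallow, which makes them a better fit for relational database indexes (MySQL).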
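The two tests mentioned in the list above (in-order traversal for a BST, height balance for an AVL tree) can be sketched as follows. This is a minimal illustration, not taken from any library: `TreeNode`, `IsBst`, `BalancedHeight`, and `IsAvl` are hypothetical names, and keys are assumed to be unique integers.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <vector>

// Hypothetical minimal node type, used only for this illustration.
struct TreeNode {
  int key;
  TreeNode* left;
  TreeNode* right;
};

// Appends the keys in in-order (left, node, right) sequence.
void InOrder(const TreeNode* n, std::vector<int>* out) {
  if (n == nullptr) return;
  InOrder(n->left, out);
  out->push_back(n->key);
  InOrder(n->right, out);
}

// With unique keys, a tree is a BST iff its in-order sequence is strictly increasing.
bool IsBst(const TreeNode* root) {
  std::vector<int> keys;
  InOrder(root, &keys);
  for (std::size_t i = 1; i < keys.size(); ++i) {
    if (keys[i - 1] >= keys[i]) return false;
  }
  return true;
}

// Returns the subtree height, or -1 if any node's two subtrees
// differ in height by more than 1.
int BalancedHeight(const TreeNode* n) {
  if (n == nullptr) return 0;
  int lh = BalancedHeight(n->left);
  if (lh < 0) return -1;
  int rh = BalancedHeight(n->right);
  if (rh < 0) return -1;
  if (std::abs(lh - rh) > 1) return -1;
  return 1 + std::max(lh, rh);
}

// An AVL tree is a BST that is also height-balanced at every node.
bool IsAvl(const TreeNode* root) {
  return IsBst(root) && BalancedHeight(root) >= 0;
}
```

A chain such as 3 -> 4 -> 5 (each node has only a right child) passes `IsBst` but fails `IsAvl`, which is the degenerate shape that balancing is meant to prevent.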
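- Hash storage engine
A persistent implementation of a hash table. It supports insert, delete, update, and random reads, but not ordered scans; the corresponding storage systems are key-value stores. Inserts and lookups in a hash table are O(1) on average, faster than the O(log n) of a balanced tree, so when ordered traversal is not needed a hash table performs best.
Related storage systems: Bitcask, Ceph
- BTree storage engine
The most widely used data structure, especially in databases.
It supports not only single-record insert, delete, read, and update, but also ordered scans (through the pointers that link B+ tree leaf nodes); the corresponding storage systems are relational databases (MySQL and others).
- LSMTree storage engine (representative systems: HBase, leveldb)
It also supports insert, delete, read, update, and ordered scans, and it avoids random disk writes by writing in batches. There is a trade-off: compared with a B+ tree, an LSM tree gives up some read performance in exchange for much higher write performance.
It suits applications where index inserts are more frequent than lookups.
Storage engines fall into the three categories above.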
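To make the hash storage engine idea concrete, here is a minimal sketch in the spirit of a Bitcask-style design: values are appended to a log and an in-memory hash table maps each key to the position of its latest value. Everything here is an assumption for illustration: the "log file" is an in-memory `std::string`, the names `HashLogStore`, `Put`, `Get`, `Delete`, and `LogBytes` are hypothetical, and deletion simply drops the index entry rather than writing a tombstone record.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>

// Illustrative only: an append-only "log" held in memory, plus a hash index.
class HashLogStore {
 public:
  // Appends the value to the log and points the index at the new copy.
  void Put(const std::string& key, const std::string& value) {
    Entry e = {log_.size(), value.size()};
    index_[key] = e;
    log_.append(value);  // sequential append, never an in-place update
  }

  // One hash lookup plus one positioned read. Returns false if key is absent.
  bool Get(const std::string& key, std::string* value) const {
    std::unordered_map<std::string, Entry>::const_iterator it = index_.find(key);
    if (it == index_.end()) return false;
    *value = log_.substr(it->second.offset, it->second.length);
    return true;
  }

  // Simplification: only the index entry is removed; stale bytes stay in the log.
  void Delete(const std::string& key) { index_.erase(key); }

  // Total bytes appended so far, including superseded values.
  std::size_t LogBytes() const { return log_.size(); }

 private:
  struct Entry {
    std::size_t offset;
    std::size_t length;
  };
  std::string log_;
  std::unordered_map<std::string, Entry> index_;
};
```

Because the index is unordered, there is no efficient way to list keys in sorted order, which is the "no ordered scan" limitation noted above. Overwritten values remain in the log until some merge step reclaims them.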
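B+ tree vs. LSMTree:
- An LSM tree writes in batches and defers persisting data to disk. When the write-to-read ratio is high (more writes than reads), an LSM tree outperforms a B-tree. As inserts accumulate, B+ tree nodes split to maintain the structure, random disk reads and writes become more likely, and performance gradually degrades.
- B-tree write path: a write to a B-tree is an in-place update with two parts. First locate the block that should hold the record, then write the new data into that block at its physical location on disk. When memory is plentiful, part of the B-tree is cached, so locating the block can sometimes complete in memory; for clarity, assume memory is tiny, just enough to hold one B-tree block. In that model each write costs two random seeks (one to find the block, one for the in-place write), which is expensive.
LSM:
- Bloom filter: a bitmap that gives probabilistic answers. It quickly tells you whether a small sorted structure may contain a given key: a "no" is definite, a "yes" may be a false positive. When the answer is "no", the binary search in that structure can be skipped, at the cost of a few hash computations. Efficiency improves, but the price is extra space.
- Compaction (merging small trees into a large tree): small trees hurt read performance, so a background process keeps merging them into the large tree. Most lookups of old data can then be served in about log2(N) steps, instead of the (N/m)*log2(n) steps needed to search every small tree.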
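The Bloom filter described above can be sketched as below. This is a simplified illustration, not the filter used by any particular LSM implementation: `BloomFilter`, `Add`, and `MayContain` are hypothetical names, `std::hash` stands in for a purpose-built hash function, and the probe positions are derived by double hashing from a single hash value.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <string>
#include <vector>

class BloomFilter {
 public:
  // num_bits must be greater than zero.
  BloomFilter(std::size_t num_bits, int num_hashes)
      : bits_(num_bits, false), num_hashes_(num_hashes) {
    assert(num_bits > 0);
  }

  // Sets one bit per probe position.
  void Add(const std::string& key) {
    for (int i = 0; i < num_hashes_; ++i) bits_[Position(key, i)] = true;
  }

  // false: the key was definitely never added.
  // true: the key may have been added (false positives are possible).
  bool MayContain(const std::string& key) const {
    for (int i = 0; i < num_hashes_; ++i) {
      if (!bits_[Position(key, i)]) return false;
    }
    return true;
  }

 private:
  // Double hashing: probe i lands at (h1 + i * h2) mod num_bits.
  std::size_t Position(const std::string& key, int i) const {
    std::uint64_t h1 = std::hash<std::string>()(key);
    std::uint64_t h2 = (h1 >> 17) | (h1 << 47);  // rotated copy of h1
    return static_cast<std::size_t>(
        (h1 + static_cast<std::uint64_t>(i) * h2) % bits_.size());
  }

  std::vector<bool> bits_;
  int num_hashes_;
};
```

A reader would consult `MayContain` before searching a sorted file and skip the file entirely when it returns false. More bits per key lower the false-positive rate, which is the space cost mentioned above.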
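LSM design goals:
1. Sequential disk I/O is faster than random disk I/O (it runs at roughly the disk's theoretical throughput), so random I/O should be avoided, ideally by turning it into sequential I/O.
2. Data is appended to a file (a sequentially written log), which raises the question of how to serve reads.
How should the design deliver efficient reads for complex access patterns (lookup by key, or by range)?
- Binary search: keep the file data sorted and use binary search to find a specific key.
- Hash: use a hash function to split the data into buckets.
- B+ tree: use a B+ tree, ISAM, or a similar method to reduce reads of external files.
- External file: store the data as a log and build a hash or search tree that maps to the corresponding file.
Merging balanced trees / merging skip lists / merging sorted linked lists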
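All of the merges listed above reduce to merging sorted sequences. The sketch below merges two sorted runs of key-value pairs the way a compaction step might, keeping the newer value when both runs contain the same key. It is an illustration under stated assumptions: `KV`, `Run`, and `MergeRuns` are hypothetical names, each run is sorted by key with no duplicate keys, and deletions (tombstones) are not modeled.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

typedef std::pair<std::string, std::string> KV;  // (key, value)
typedef std::vector<KV> Run;                     // sorted by key, unique keys

// Merges two sorted runs in one sequential pass over each.
// On equal keys the entry from "newer" wins and the stale one is dropped.
Run MergeRuns(const Run& newer, const Run& older) {
  Run out;
  out.reserve(newer.size() + older.size());
  std::size_t i = 0, j = 0;
  while (i < newer.size() && j < older.size()) {
    if (newer[i].first < older[j].first) {
      out.push_back(newer[i++]);
    } else if (older[j].first < newer[i].first) {
      out.push_back(older[j++]);
    } else {
      out.push_back(newer[i++]);  // same key: keep the newer value
      ++j;                        // skip the superseded value
    }
  }
  while (i < newer.size()) out.push_back(newer[i++]);
  while (j < older.size()) out.push_back(older[j++]);
  return out;
}
```

Both inputs are read front to back and the output is written front to back, so the merge uses only sequential I/O, which matches design goal 1.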
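A brief introduction to aio
Three different things go by the name aio:
posix aio: implemented in user space by glibc, which maintains a thread pool to emulate asynchronous I/O. The interface is aio_read/aio_write/aio_xxxx. Performance is relatively poor.
linux aio: the Linux-specific aio implementation. The interface consists of five system calls: io_setup, io_destroy, io_submit, io_cancel, and io_getevents.
libaio: a user-space wrapper library around linux aio.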
// description:
// Thread safety
// Writes require external synchronization, most likely a mutex.
// Reads require a guarantee that the SkipList will not be destroyed while the read is in progress. Apart from that, reads progress without any internal locking or synchronization.
//
// Invariants:
// (1) Allocated nodes are never deleted until the SkipList is destroyed. This is trivially guaranteed by the code since we never delete any skip list nodes.
// (2) The contents of a Node except for the next/prev pointers are immutable after the Node has been linked into the SkipList. Only Insert() modifies the list, and it is careful to initialize a node and use release-stores to publish the nodes in one or more lists.
template<typename Key, class Comparator>
class SkipList {
private:
struct Node;
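public:
class Iterator; // Forward declaration; the class is defined after the SkipList implementation below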
// Create a new SkipList object that will use "cmp" for comparing keys, and will allocate memory using "*arena". Objects allocated in the arena must remain allocated for the lifetime of the skiplist object.
explicit SkipList(Comparator cmp, Arena* arena);
// Insert key into the list.
// REQUIRES: nothing that compares equal to key is currently in the list.
void Insert(const Key& key);
// Returns true iff an entry that compares equal to key is in the list.
bool Contains(const Key& key) const;
private:
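enum { kMaxHeight = 12 }; // Maximum number of levels
// Immutable after construction
Comparator const compare_; // Key comparison function; fixed at construction, because changing the ordering after keys are inserted would leave the list in an unpredictable state
Arena* const arena_; // Arena (LevelDB's memory pool) used for allocations of nodes
Node* const head_; // Head (sentinel) node of the skip list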
// Modified only by Insert(). Read racily by readers, but stale
// values are ok.
port::AtomicPointer max_height_; // Height of the entire list (current number of levels)
inline int GetMaxHeight() const { // Returns the current number of levels
return reinterpret_cast<intptr_t>(max_height_.NoBarrier_Load());
}
// Read/written only by Insert().
Random rnd_; // Random number generator used to pick node heights
Node* NewNode(const Key& key, int height); // Allocate a node holding key with the given height (number of levels)
int RandomHeight(); // Pick a random height for a new node
bool Equal(const Key& a, const Key& b) const { return (compare_(a, b) == 0); } // True iff the two keys compare equal
// Return true if key is greater than the data stored in "n"
bool KeyIsAfterNode(const Key& key, Node* n) const;
// Return the earliest node that comes at or after key.
// Return NULL if there is no such node.
// If prev is non-NULL, fills prev[level] with pointer to previous
// node at "level" for every level in [0..max_height_-1].
Node* FindGreaterOrEqual(const Key& key, Node** prev) const;
// Return the latest node with a key < key, return head_ if there is no such node.
Node* FindLessThan(const Key& key) const;
// Return the last node in the list.
// Return head_ if list is empty.
Node* FindLast() const;
// No copying allowed
SkipList(const SkipList&); // Copy construction and copy assignment are declared but never defined
void operator=(const SkipList&);
};
// Implementation details follow
template<typename Key, class Comparator>
struct SkipList<Key,Comparator>::Node { // Definition of a skip list node
explicit Node(const Key& k) : key(k) { }
Key const key;
// Accessors/mutators for links. Wrapped in methods so we can add the appropriate barriers as necessary.
Node* Next(int n) {
assert(n >= 0);
// Use an 'acquire load' so that we observe a fully initialized version of the returned Node.
return reinterpret_cast<Node*>(next_[n].Acquire_Load());
}
void SetNext(int n, Node* x) {
assert(n >= 0);
// Use a 'release store' so that anybody who reads through this pointer observes a fully initialized version of the inserted node.
next_[n].Release_Store(x);
}
// No-barrier variants that can be safely used in a few locations.
Node* NoBarrier_Next(int n) {
assert(n >= 0);
return reinterpret_cast<Node*>(next_[n].NoBarrier_Load());
}
void NoBarrier_SetNext(int n, Node* x) {
assert(n >= 0);
next_[n].NoBarrier_Store(x);
}
private:
// Array of length equal to the node height. next_[0] is lowest level link.
port::AtomicPointer next_[1]; // Forward pointers, one per level; NewNode() over-allocates so the array really holds "height" entries
};
template<typename Key, class Comparator>
typename SkipList<Key,Comparator>::Node* SkipList<Key,Comparator>::NewNode(const Key& key, int height) { // Allocate a node with the given key and height
char* mem = arena_->AllocateAligned(sizeof(Node) + sizeof(port::AtomicPointer) * (height - 1));
return new (mem) Node(key); // Placement new: construct the Node in the arena-allocated memory
}
template<typename Key, class Comparator>
SkipList<Key,Comparator>::SkipList(Comparator cmp, Arena* arena) // Constructor
: compare_(cmp),
arena_(arena),
head_(NewNode(0 /* any key will do */, kMaxHeight)), // The head node's key is never compared
max_height_(reinterpret_cast<void*>(1)),
rnd_(0xdeadbeef) {
for (int i = 0; i < kMaxHeight; i++) {
head_->SetNext(i, NULL); // Every level of the head node starts as an empty list
}
}
template<typename Key, class Comparator>
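int SkipList<Key,Comparator>::RandomHeight() { // Returns a random height; the skip list's expected O(log n) search cost depends on this randomness
// Increase height with probability 1 in kBranching
static const unsigned int kBranching = 4;
int height = 1;
// A loop is used instead of a single random draw so that each extra level is kept with probability 1/kBranching. Heights then follow a geometric distribution, and each level holds about 1/kBranching as many nodes as the level below it.
while (height < kMaxHeight && ((rnd_.Next() % kBranching) == 0)) {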
height++;
}
assert(height > 0);
assert(height <= kMaxHeight);
return height;
}
template<typename Key, class Comparator>
bool SkipList<Key,Comparator>::KeyIsAfterNode(const Key& key, Node* n) const { // Return true if key is greater than the key stored in "n"
// NULL n is considered infinite (this covers the NIL that terminates each level)
return (n != NULL) && (compare_(n->key, key) < 0);
}
template<typename Key, class Comparator>
typename SkipList<Key,Comparator>::Node* SkipList<Key,Comparator>::FindGreaterOrEqual(const Key& key, Node** prev) const {
Node* x = head_;
int level = GetMaxHeight() - 1;
while (true) {
Node* next = x->Next(level);
if (KeyIsAfterNode(key, next)) { // key sorts after next; a true result implies next is non-NULL
// Keep searching in this list
x = next;
} else {
if (prev != NULL) prev[level] = x; // x is the last node at this level whose key is < key; Insert() splices the new node in right after it
if (level == 0) {
return next;
} else {
// Switch to next list (the next lower level)
level--;
}
}
}
}
template<typename Key, class Comparator>
typename SkipList<Key,Comparator>::Node* SkipList<Key,Comparator>::FindLessThan(const Key& key) const { // Return the latest node with a key < key, return head_ if there is no such node.
Node* x = head_;
int level = GetMaxHeight() - 1;
while (true) {
assert(x == head_ || compare_(x->key, key) < 0);
Node* next = x->Next(level);
if (next == NULL || compare_(next->key, key) >= 0) { // Starting from the highest level, move as far right as possible on each level
// When next is NULL or next's key is >= the search key, drop one level; at level 0, x is the answer
if (level == 0) {
return x;
} else {
// Switch to next list
level--; // Continue the search one level down
}
} else {
x = next;
}
}
}
template<typename Key, class Comparator>
typename SkipList<Key,Comparator>::Node* SkipList<Key,Comparator>::FindLast() const { // Walk to the end of the highest level, then drop a level and keep walking, until the end of level 0
Node* x = head_;
int level = GetMaxHeight() - 1;
while (true) {
Node* next = x->Next(level);
if (next == NULL) {
if (level == 0) {
return x;
} else {
// Switch to next list
level--;
}
} else {
x = next;
}
}
}
template<typename Key, class Comparator>
void SkipList<Key,Comparator>::Insert(const Key& key) { // Insert a node holding key
// TODO(opt): We can use a barrier-free variant of FindGreaterOrEqual()
// here since Insert() is externally synchronized.
Node* prev[kMaxHeight];
Node* x = FindGreaterOrEqual(key, prev); // prev[i] receives the predecessor of key at level i
// Our data structure does not allow duplicate insertion
assert(x == NULL || !Equal(key, x->key));
int height = RandomHeight();
if (height > GetMaxHeight()) {
for (int i = GetMaxHeight(); i < height; i++) {
prev[i] = head_;
}
//fprintf(stderr, "Change height from %d to %d\n", max_height_, height);
// It is ok to mutate max_height_ without any synchronization
// with concurrent readers. A concurrent reader that observes
// the new value of max_height_ will see either the old value of
// new level pointers from head_ (NULL), or a new value set in
// the loop below. In the former case the reader will
// immediately drop to the next level since NULL sorts after all
// keys. In the latter case the reader will use the new node.
max_height_.NoBarrier_Store(reinterpret_cast<void*>(height));
}
x = NewNode(key, height); // Allocate the new node
for (int i = 0; i < height; i++) { // Splice the node into each of its levels, from level 0 upward
// NoBarrier_SetNext() suffices since we will add a barrier when we publish a pointer to "x" in prev[i].
x->NoBarrier_SetNext(i, prev[i]->NoBarrier_Next(i));
prev[i]->SetNext(i, x);
}
}
template<typename Key, class Comparator>
bool SkipList<Key,Comparator>::Contains(const Key& key) const { // True iff the skip list contains key
Node* x = FindGreaterOrEqual(key, NULL); // First node whose key is >= the search key
if (x != NULL && Equal(key, x->key)) { // Non-NULL and equal: the key is present
return true;
} else {
return false;
}
}
// Iteration over the contents of a skiplist
template<typename Key, class Comparator>
class SkipList<Key,Comparator>::Iterator { // Skip list iterator
public:
// Initialize an iterator over the specified list.
// The returned iterator is not valid.
explicit Iterator(const SkipList* list);
// Returns true iff the iterator is positioned at a valid node.
bool Valid() const;
// Returns the key at the current position.
// REQUIRES: Valid()
const Key& key() const;
// Advances to the next position.
// REQUIRES: Valid()
void Next();
// Advances to the previous position.
// REQUIRES: Valid()
void Prev();
// Advance to the first entry with a key >= target
void Seek(const Key& target);
// Position at the first entry in list.
// Final state of iterator is Valid() iff list is not empty.
void SeekToFirst();
// Position at the last entry in list.
// Final state of iterator is Valid() iff list is not empty.
void SeekToLast();
private:
const SkipList* list_;
Node* node_;
// Intentionally copyable: the default copy constructor copies the members directly
};
template<typename Key, class Comparator>
inline SkipList<Key,Comparator>::Iterator::Iterator(const SkipList* list) { // Constructor: the iterator starts out not positioned at any node
list_ = list;
node_ = NULL;
}
template<typename Key, class Comparator>
inline bool SkipList<Key,Comparator>::Iterator::Valid() const { // Returns true iff the iterator is positioned at a valid node.
return node_ != NULL;
}
template<typename Key, class Comparator>
inline const Key& SkipList<Key,Comparator>::Iterator::key() const { // Returns the key at the current position.
assert(Valid());
return node_->key;
}
template<typename Key, class Comparator>
inline void SkipList<Key,Comparator>::Iterator::Next() { // Advances to the next position.
assert(Valid());
node_ = node_->Next(0); // Follow the level-0 link to the next node
}
template<typename Key, class Comparator>
inline void SkipList<Key,Comparator>::Iterator::Prev() { // Advances to the previous position.
// Instead of using explicit "prev" links, we just search for the
// last node that falls before key.
assert(Valid());
node_ = list_->FindLessThan(node_->key); // Find the preceding node; if that is head_, the iterator becomes invalid (NULL)
if (node_ == list_->head_) {
node_ = NULL;
}
}
template<typename Key, class Comparator>
inline void SkipList<Key,Comparator>::Iterator::Seek(const Key& target) { // Advance to the first entry with a key >= target
node_ = list_->FindGreaterOrEqual(target, NULL);
}
// First node
template<typename Key, class Comparator>
inline void SkipList<Key,Comparator>::Iterator::SeekToFirst() { // Position at the first entry in list.
node_ = list_->head_->Next(0);
}
template<typename Key, class Comparator>
inline void SkipList<Key,Comparator>::Iterator::SeekToLast() { // Position at the last entry in list.
node_ = list_->FindLast(); // Find the last node; if the list is empty, the iterator becomes invalid (NULL)
if (node_ == list_->head_) {
node_ = NULL;
}
}
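The effect of the loop in RandomHeight() can be checked with a small standalone simulation. This is a sketch, not part of LevelDB: `std::mt19937` stands in for LevelDB's `Random` class, and `kSketchMaxHeight`, `kSketchBranching`, `SampleHeight`, and `HeightHistogram` are hypothetical names. The loop shape mirrors RandomHeight(): each extra level is taken with probability 1/4, capped at 12.

```cpp
#include <cassert>
#include <random>
#include <vector>

const int kSketchMaxHeight = 12;          // mirrors kMaxHeight
const unsigned int kSketchBranching = 4;  // mirrors kBranching

// Same loop shape as RandomHeight(): keep climbing with probability 1/4.
int SampleHeight(std::mt19937* rng) {
  int height = 1;
  while (height < kSketchMaxHeight && ((*rng)() % kSketchBranching) == 0) {
    height++;
  }
  return height;
}

// Returns counts where counts[h] is the number of samples with height h.
// Index 0 is unused because every node has height >= 1.
std::vector<int> HeightHistogram(int samples, unsigned int seed) {
  std::mt19937 rng(seed);
  std::vector<int> counts(kSketchMaxHeight + 1, 0);
  for (int i = 0; i < samples; ++i) {
    counts[SampleHeight(&rng)]++;
  }
  return counts;
}
```

With enough samples, roughly three quarters of the nodes should end up with height 1 and each higher level should hold about a quarter as many nodes as the one below. That thinning is what lets a search skip over most nodes on the upper levels.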
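Index storage engines underlying MySQL
[An index speeds up data access because the storage engine no longer scans the whole table to find the data it needs; it starts searching from the root node of the index.]
Storage engines use B-Tree indexes in different ways, each with its own trade-offs:
1. MyISAM uses prefix compression to make indexes smaller
2. The NDB cluster storage engine uses a T-Tree structure
3. InnoDB uses B+Tree
This raises a question: why B-Tree (rather than hash, red-black tree, and so on)?
1. A B-Tree keeps data in order, which serves range queries (the advantage over hash)
2. Fewer I/Os (the advantage over a red-black tree)
B-Tree indexes suit full key value, key range, and key prefix lookups (leftmost-prefix lookups):
- Match the full value
- Match a leftmost prefix
- Match a column prefix
- Match a range of values
- Match one column exactly and a range on another column
- Index-only queries (queries that touch only the index)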
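Why an ordered index supports leftmost-prefix and "exact plus range" lookups can be illustrated with an ordered map keyed by a composite key. This is only an analogy, not how MySQL implements its indexes: `std::map` (typically a red-black tree) stands in for the B-Tree, and the (last_name, age) index, `IndexKey`, `Index`, `FindByLastName`, and `FindByLastNameAndAgeRange` are hypothetical names.

```cpp
#include <cassert>
#include <climits>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical composite index on (last_name, age) -> row id.
typedef std::pair<std::string, int> IndexKey;
typedef std::map<IndexKey, int> Index;

// Leftmost-prefix match: all rows with the given last_name, any age.
// Entries with the same first column are adjacent in key order.
std::vector<int> FindByLastName(const Index& idx, const std::string& last_name) {
  std::vector<int> rows;
  Index::const_iterator it = idx.lower_bound(IndexKey(last_name, INT_MIN));
  for (; it != idx.end() && it->first.first == last_name; ++it) {
    rows.push_back(it->second);
  }
  return rows;
}

// Exact match on the first column plus a range on the second:
// last_name == given name and lo <= age <= hi.
std::vector<int> FindByLastNameAndAgeRange(const Index& idx,
                                           const std::string& last_name,
                                           int lo, int hi) {
  std::vector<int> rows;
  if (lo > hi) return rows;  // empty range
  Index::const_iterator it = idx.lower_bound(IndexKey(last_name, lo));
  Index::const_iterator end = idx.upper_bound(IndexKey(last_name, hi));
  for (; it != end; ++it) rows.push_back(it->second);
  return rows;
}
```

A lookup by age alone gets no help from this ordering, because rows with the same age are scattered across different last names; that is the reason the index is only usable starting from its leftmost column.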