Random keys in LMDB degenerates performance  

  RSS

jerry
(@jerry)
Eminent Member
Joined:9 months  ago
Posts: 31
09/10/2018 8:28 am  

I explored the rspace/data.mdb file generated by RNode 0.6.4 and saw the keys are distributed randomly.

lmdb

I would suggest RNode use monotonic keys in LMDB.
e.g.  prepending unix timestamp to the current key  

There are two major negative impacts if using  random keys in LMDB.

First,  LMDB is backed by B+ tree on disk. Inserting random keys would result in the tree fanout quickly. 
As the minimum fill rate is 1/2 for every internal node, the nodes are mostly filled up to the 1/2 (as the data spreads evenly due to its randomness) generating more internal nodes than before. More the internal nodes the more disk seeks LMDB has to perform to write the correct leaf to write the data in to. And you can't expect linear curve even on fast SSD disk.

Second, LMDB implements Copy-on-Write for robustness(Process crashes does not corrupt the file).
So after the transaction is done, the previously used pages are free. That means that they can be reclaimed, and the next transaction will do just that.
The principle of locality tells most recent added keys would be accessed soon.
The problem with random keys is that it tends to spread writes all over the file. it requires a lot more seeks when you need to commit a transaction.
An example can be found here:  https://dzone.com/articles/degenerate-performance

The above is just basing on theoretical analysis but I hope developers can consider it because RSpace is the kernel component for COMM events.  

To achieve the goal of 40K COMM events / sec ,  rspace has to work super fast.

 

Edited: 2 months  ago

ReplyQuote
  
Working

Please Login or Register