Concerns on RSpace performance  

jerry
(@jerry)
Eminent Member
Joined:8 months  ago
Posts: 31
14/10/2018 8:11 am  

I am a big fan of RChain and eager to see its breakthrough technology succeed. In a recent load test I saw RNode reach 3K COMM events/s. That is a large increase, thanks to improvements like fine-grained locks. However, 3K is still far from the 10K I have in mind.

So where is the bottleneck? The Rholang interpreter? (De)serialization? Disk I/O? I don't know.

But I am mostly concerned about the RSpace implementation, which is backed by LMDB. As I understand it, each COMM event matches a message between a producer and a consumer, so I feel COMM events rely heavily on LMDB. Reading from or writing to LMDB is an I/O-bound operation. Could that be the bottleneck?

After reading RSpace's source code, I noticed that the LMDB environment flag is set to MDB_NOTLS:

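import java.nio.ByteBuffer
import java.nio.file.Path
import org.lmdbjava.{Env, EnvFlags}

object BlockStoreTestFixture {
  // Opens an LMDB environment; MDB_NOTLS is the only flag set by default.
  def env(
      path: Path,
      mapSize: Long,
      flags: List[EnvFlags] = List(EnvFlags.MDB_NOTLS)
  ): Env[ByteBuffer] =
    Env
      .create()
      .setMapSize(mapSize)
      .setMaxDbs(8)
      .setMaxReaders(126)
      .open(path.toFile, flags: _*)
}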

Is it possible to add extra flags like MDB_WRITEMAP and MDB_MAPASYNC? In my experience that speeds up the LMDB commit() operation by 10%-30%. It certainly carries the risk of losing Durability (the D in ACID) if the OS or application crashes before data is persisted to disk.
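
To make the suggestion concrete, here is a minimal sketch of an environment opened with the relaxed flags. It assumes the lmdbjava library used in the snippet above; the object name FastEnv and its methods are hypothetical and not taken from the RNode code base. Note that MDB_MAPASYNC only has an effect together with MDB_WRITEMAP, and the LMDB documentation warns that a system crash in this mode can corrupt the database, not only lose the last transactions.

```scala
import java.nio.ByteBuffer
import java.nio.file.Path
import org.lmdbjava.{Env, EnvFlags}

// Hypothetical helper, not part of RNode.
object FastEnv {
  // Trade durability for commit speed: pages are written through a writable
  // memory map and flushed to disk asynchronously.
  val relaxedFlags: List[EnvFlags] =
    List(EnvFlags.MDB_NOTLS, EnvFlags.MDB_WRITEMAP, EnvFlags.MDB_MAPASYNC)

  def open(path: Path, mapSize: Long): Env[ByteBuffer] =
    Env
      .create()
      .setMapSize(mapSize)
      .setMaxDbs(8)
      .setMaxReaders(126)
      .open(path.toFile, relaxedFlags: _*)

  // On a normal exit, force a synchronous flush before closing.
  def shutdown(env: Env[ByteBuffer]): Unit = {
    env.sync(true)
    env.close()
  }
}
```

Whether the 10%-30% gain shows up in RNode would have to be measured; the sketch only shows where the flags would go.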

I think Durability is not so important for RNode. When the RNode process exits normally, it flushes data to disk. If it crashes, some data is lost, and RNode can download the missing data from neighbor nodes on the next startup.

If RNode can tolerate losing durability on a crash, I think there is much room for improvement. Here are just some ideas.

  • The current B+tree can be used to store pointers (file offsets) instead of the actual data.
  • The actual data (e.g. blocks) is stored in another file on disk.
  • In this case, the B+tree could be very small and could be held entirely in RAM.
  • The B+tree could even be replaced by a HashMap if traversal/iteration is not needed (it seems the code using the LMDB cursor in RSpace is going to be dropped).
  • A dedicated background thread serializes data from memory to disk.
  • Recent data (blocks/GNA?) can be cached in memory, so accessing it needs no disk seek or deserialization.
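
As a rough illustration of the first four ideas, here is a minimal sketch of a HashMap index that stores (offset, length) pointers into an append-only data file. All names (OffsetStore, put, get, flush) are hypothetical and nothing here comes from RNode. The sketch deliberately leaves out the background writer thread, the in-memory cache, and rebuilding the index after a restart.

```scala
import java.io.EOFException
import java.nio.ByteBuffer
import java.nio.channels.FileChannel
import java.nio.file.{Path, StandardOpenOption}
import scala.collection.mutable

// Hypothetical sketch: values live in an append-only file, while the index
// (key -> offset and length) lives entirely in RAM.
final class OffsetStore(file: Path) extends AutoCloseable {
  private val channel = FileChannel.open(
    file,
    StandardOpenOption.CREATE,
    StandardOpenOption.READ,
    StandardOpenOption.WRITE
  )
  private val index = mutable.HashMap.empty[String, (Long, Int)]
  private var end: Long = channel.size()

  // Append the value at the end of the file and remember where it went.
  // No fsync happens here, so a crash can lose recent writes.
  def put(key: String, value: Array[Byte]): Unit = synchronized {
    val buf = ByteBuffer.wrap(value)
    var pos = end
    while (buf.hasRemaining) pos += channel.write(buf, pos)
    index.update(key, (end, value.length))
    end = pos
  }

  // Look up the pointer in RAM, then do a single positioned read.
  def get(key: String): Option[Array[Byte]] = synchronized {
    index.get(key).map { case (offset, length) =>
      val buf = ByteBuffer.allocate(length)
      var pos = offset
      while (buf.hasRemaining) {
        val n = channel.read(buf, pos)
        if (n < 0) throw new EOFException(s"truncated record for key $key")
        pos += n
      }
      buf.array()
    }
  }

  // Force buffered writes to disk, e.g. on normal shutdown.
  def flush(): Unit = synchronized { channel.force(true) }

  override def close(): Unit = synchronized {
    channel.force(true)
    channel.close()
  }
}
```

A lookup costs one hash probe plus one positioned read, and writes only reach the disk for certain when flush() or close() is called, which is the same durability trade-off described above.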

I don't want to make indiscreet remarks or criticisms, as I am not familiar with RNode. So you can treat my words as brainstorming or crap. At least I hope they may inspire the RNode developers 😉.

 

 

