This is the first in a series of six blog posts discussing our benchmarking of the ledger we are building. Since our ultimate goal is to build a scalable ledger, the absolute number of transactions per second is a poor metric for understanding how well the system performs: a perfectly scalable system would not be bounded in throughput at all. Rather, the interesting question is: how does the system perform as a function of the computational resources available per node?
The answer to this question depends strongly on a number of different components in the system, each of whose performance varies with the conditions under which it operates. For instance, transaction synchronization for a single shard can never exceed the limitations of the network connection it runs on. If you ran a shard over a 56 kbit/s modem with an average transaction size of 2 kB, bandwidth alone would cap you at roughly 3 transactions per second for that shard. On a modern 100 Mbit/s connection, by contrast, transaction synchronization could peak at around 5,000 transactions per second.
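This bandwidth bound is simple to compute. The sketch below is a back-of-envelope helper (the function name and structure are ours, not part of the ledger codebase), assuming the 2048-byte average transaction size used throughout this series:

```python
def max_tx_per_second(link_bps: float, tx_size_bytes: int) -> float:
    """Upper bound on tx/s when synchronization is purely bandwidth-limited.

    link_bps: link speed in bits per second.
    tx_size_bytes: average transaction size in bytes.
    """
    return link_bps / (tx_size_bytes * 8)

# 56 kbit/s modem with 2 kB transactions: only a few tx/s.
print(round(max_tx_per_second(56_000, 2048), 1))        # → 3.4

# 100 Mbit/s link with 2 kB transactions: thousands of tx/s.
print(round(max_tx_per_second(100_000_000, 2048)))      # → 6104
```

The 100 Mbit/s bound of ~6,100 tx/s is the theoretical ceiling; the ~5,000 tx/s peak quoted above is plausible once protocol overhead is accounted for.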
To answer the above question, we will benchmark a number of individual components to first quantify their performance. Each of these benchmarks is based on test conditions and/or assumptions about the system that are expected to reflect either a production environment or the worst-case scenario. Over the series of blog posts, we will address:
For reference, throughout these benchmarks we work under the assumption that the average transaction size is around 2048 bytes unless otherwise stated. This is well above the size of a typical Bitcoin transaction and has been chosen to ensure that the numbers remain valid even with the more advanced synergetic and smart contracts which Fetch.AI is enabling with its ledger.
The high-level architecture of the Fetch.AI ledger transaction management system involves a main chain that controls multiple shards, which we refer to as lanes. These lanes can be considered individual sub-chains that are coordinated through the main chain, with state maintenance performed by a number of transaction executors working across the lanes:
This simplified block diagram provides an overview of Fetch.AI’s scaling solution: transactions arrive at the main node, are sharded according to their digest, and are dispatched to the appropriate lane, usually on a different machine. This allows a high degree of parallelism in transaction execution and dynamic balancing against system demands. For further information, refer to the Fetch.AI yellow paper on the scalable ledger.
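Digest-based sharding can be sketched in a few lines. This is an illustrative toy, not the ledger's actual lane-assignment rule (which is specified in the yellow paper); the lane count and modulo mapping here are our assumptions:

```python
import hashlib

NUM_LANES = 16  # assumed lane count for illustration

def lane_for(tx_bytes: bytes, num_lanes: int = NUM_LANES) -> int:
    """Map a transaction to a lane using its digest.

    Because cryptographic digests are uniformly distributed, this
    spreads transactions evenly across lanes with no coordination.
    """
    digest = hashlib.sha256(tx_bytes).digest()
    return int.from_bytes(digest[:8], "big") % num_lanes

# The mapping is deterministic: every node agrees on the lane
# for a given transaction without any communication.
print(lane_for(b"example transaction"))
```

Deterministic assignment is what lets the main node dispatch each transaction to a single lane (often on a different machine) while all nodes agree on the placement.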
One of the key elements (and hence possible bottlenecks) of the system is the transaction store. Each lane in the Fetch.AI ledger adopts an architecture very close to those found in other ledgers: transactions are first held in a transient memory store and later written to disk. We summarise the simplified architecture in the image below:
The object store was benchmarked and we summarise the results here:
| Test Name | Time [ms] | TX total | TX size [bytes] | Rate [TX/s] | Throughput [MB/s] |
|---|---|---|---|---|---|
| tx_write_0_1 | 118 | 1 | 16 | 8 | 0.135 ∙ 10⁻³ |
| tx_write_1_1 | 115 | 1 | 2048 | 9 | 17.7 ∙ 10⁻³ |
| tx_write_0_8 | 118 | 8 | 16 | 68 | 1.08 ∙ 10⁻³ |
| tx_write_1_8 | 120 | 8 | 2048 | 67 | 136 ∙ 10⁻³ |
| tx_write_0_64 | 123 | 64 | 16 | 518 | 8.29 ∙ 10⁻³ |
| tx_write_1_64 | 119 | 64 | 2048 | 537 | 1.1 |
| tx_write_0_512 | 122 | 512 | 16 | 4177 | 66.8 ∙ 10⁻³ |
| tx_write_1_512 | 115 | 512 | 2048 | 4433 | 9.08 |
| tx_write_0_262k | 2.16 ∙ 10³ | 262144 | 16 | 121248 | 1.94 |
| tx_write_1_262k | 7.32 ∙ 10³ | 262144 | 2048 | 35808 | 73.34 |
| tx_write_0_1M | 7.57 ∙ 10³ | 2097152 | 16 | 132096 | 2.11 |
| tx_write_1_1M | 31.3 ∙ 10³ | 2097152 | 2048 | 31973 | 65.5 |
For this test, increasingly large numbers of transactions were submitted to the system, with sizes varied between an artificially small 16 bytes and the expected 2048 bytes.
Certain data structures in the object store are designed to amortize the cost of disk writes, so the average rate of transactions per second increases significantly as larger numbers of transactions are submitted. As the table shows, for the expected transaction size of ~2048 bytes, the rate rapidly approaches just above 30k TX/s. Given that the system is designed to shard the object store across the lanes (where each lane has its own disk), this result is comfortably within bounds.
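The derived columns in the table follow directly from the raw measurements. A small helper makes the relationship explicit (our cross-check from the row data, assuming decimal megabytes):

```python
def derived(time_ms: float, tx_total: int, tx_size_bytes: int):
    """Compute Rate [TX/s] and Throughput [MB/s] from raw benchmark values."""
    rate_tx_s = tx_total / (time_ms / 1000.0)
    throughput_mb_s = rate_tx_s * tx_size_bytes / 1e6
    return rate_tx_s, throughput_mb_s

# tx_write_1_262k: 262144 transactions of 2048 bytes in 7.32 s
rate, mb = derived(7.32e3, 262144, 2048)
print(round(rate))   # → 35812 TX/s (table: 35808)
print(round(mb, 1))  # → 73.3 MB/s (table: 73.34)
```

Applying the same check to every row is how one verifies the table is internally consistent.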
For the test of the transient store, transactions were written to and read back from the store. In addition, 10% of these transactions were scheduled to be written to the object store, representing the subset of the mempool that is committed long term. This mirrors expected operation: transactions arrive, are stored in the mempool for some time, are accessed for verification and mining, and are ultimately either committed to long-term storage or discarded.
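The workload can be re-created in miniature. The sketch below is purely illustrative, using toy in-memory dictionaries in place of the real transient and object stores, and measuring a rate for a much smaller batch than the benchmark's one million transactions:

```python
import random
import time

def run_workload(n_tx: int, tx_size: int = 2048, commit_fraction: float = 0.1):
    """Write, read back, and commit ~10% of transactions; return (TX/s, committed)."""
    transient, object_store = {}, {}
    payload = bytes(tx_size)
    start = time.perf_counter()
    for i in range(n_tx):
        transient[i] = payload              # transaction arrives in the mempool
        assert transient[i] is payload      # read back for verification/mining
        if random.random() < commit_fraction:
            object_store[i] = transient.pop(i)  # committed to long-term storage
    elapsed = time.perf_counter() - start
    return n_tx / elapsed, len(object_store)

rate, committed = run_workload(100_000)
print(f"{rate:,.0f} TX/s, {committed} committed")
```

Even this naive Python loop sustains a high rate, which illustrates why a memory-backed transient store decoupled from disk writes is unlikely to be the bottleneck.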
| Test Name | TX count | Time [ns] | TX size [bytes] | Rate [TX/s] | Throughput [MB/s] |
|---|---|---|---|---|---|
| trans_store_1M | 1 ∙ 10⁶ | 8.58 ∙ 10⁹ | 2048 | 116527 | 239 |
In conclusion, the transient store is unlikely to pose a problem, as it is decoupled from disk writes and can use standard in-memory data structures to provide very high rates.
Having established that each lane's low-level transaction handling mechanisms are capable of handling 30k+ transactions per second, we will discuss the full performance of a single-lane system in the next blog post.