By Somnath Mishra and Susmit Sil
Ethereum is a distributed public Blockchain network that supports the development of applications in which any valuable asset can be transferred automatically, in a rule-based manner, between the participating parties. However, the Ethereum Blockchain network faces severe scalability challenges. There are several potential solutions, but a balanced study is needed to understand the benefits and limitations of each before deploying them.
A Brief Introduction to Ethereum
Ethereum has emerged as much more than a public Blockchain platform for exchanging digital currency amongst peers. A key feature called ‘Smart Contract’ distinguishes Ethereum by giving it immense flexibility for wider application in enterprise use cases and even for launching more types of cryptocurrencies. This capability has catapulted it to the status of a popular Blockchain platform that is being used to develop several enterprise applications, and has led to an explosion in the number of transactions on the Ethereum public Blockchain.
Though the volume of transactions has been increasing, the inherent properties of Ethereum mining restrict the network to between 7 and 15 transactions per second. What is the primary reason for the transaction volume limit on the Ethereum network? What is the precise nature of this issue?
Mining: Genesis of Ethereum network’s throughput limitations
Mining is the process by which new transactions are added to the public Blockchain. Miners validate new transactions and record them on the global ledger, which removes the need for a central authority to govern the network.
As part of the block mining process, the block’s metadata is passed through a hashing function, which returns a fixed-length string of alphanumeric characters. The mathematical puzzle at the heart of mining is to produce a hash that satisfies a given criterion, the current target. The only leeway miners have to affect the generated hash is the value of the nonce, so they keep changing the nonce and regenerating the hash. Once the generated hash matches the current target, the puzzle is solved. The first miner to match the criterion broadcasts the newly generated block to the whole network. Every other node on the network validates it and adds it to its copy of the ledger. For the freshly minted block, the miner receives a financial reward, generally in the form of the platform’s cryptocurrency.
While generating a hash that matches the criterion is compute-intensive, verifying that a given hash meets the expected target takes far less time. This asymmetry is why the scheme is aptly named proof of work (PoW).
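The asymmetry between finding and checking a nonce can be illustrated with a minimal sketch. It assumes SHA-256 as a stand-in hash (Ethereum's actual algorithm, ethash, is different), and the metadata string and `TARGET` value are hypothetical choices made only to keep the search short.

```python
import hashlib

def meets_target(block_metadata: bytes, nonce: int, target: int) -> bool:
    """Cheap verification: one hash and one comparison."""
    digest = hashlib.sha256(block_metadata + nonce.to_bytes(8, "big")).digest()
    return int.from_bytes(digest, "big") < target

def mine(block_metadata: bytes, target: int) -> int:
    """Expensive search: keep changing the nonce until the hash is below the target."""
    nonce = 0
    while not meets_target(block_metadata, nonce, target):
        nonce += 1
    return nonce

# Hypothetical difficulty: about 1 in 4096 hashes falls below 2**244.
TARGET = 2 ** 244
metadata = b"parent-hash|transaction-root|timestamp"
nonce = mine(metadata, TARGET)
print(nonce, meets_target(metadata, nonce, TARGET))
```

The miner runs `mine`, which may take thousands of attempts, while every other node only needs a single call to `meets_target` to validate the block.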
On an Ethereum network, a block is generated every 12-15 seconds on average. If miners generate blocks more quickly or more slowly, the difficulty is automatically readjusted so that an average of around 12 seconds is maintained.
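The feedback loop behind this readjustment can be sketched as follows. This is a simplified proportional rule written for illustration, not Ethereum's actual difficulty formula; the function name, the 12-second default and the 5% step limit are all assumptions.

```python
def adjust_difficulty(difficulty: int, observed_block_time: float,
                      target_block_time: float = 12.0,
                      max_step: float = 0.05) -> int:
    """Raise difficulty when blocks arrive too fast, lower it when they arrive too slowly."""
    ratio = target_block_time / observed_block_time
    # Limit each adjustment so the difficulty changes gradually (assumed 5% cap).
    ratio = max(1 - max_step, min(1 + max_step, ratio))
    return max(1, int(difficulty * ratio))

print(adjust_difficulty(1000, observed_block_time=6.0))   # blocks too fast: difficulty rises
print(adjust_difficulty(1000, observed_block_time=24.0))  # blocks too slow: difficulty falls
```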
Ethereum uses ethash as its proof of work consensus algorithm. This algorithm is designed in a way that it requires more memory, making it harder to mine using ASIC machines. Ethereum hopes to reduce economic incentives for mining centralization by specifically designing an ASIC-resistant PoW algorithm.
ethash emphasises a property called memory hardness. Its performance is limited by how fast a computer can move data around in memory, not by how fast it can perform calculations. Consumer graphics cards handle this kind of workload well, so an ASIC, though more expensive, yields little advantage over the latest commodity hardware.
The Scalability Issue in Ethereum Network
The popularity of Ethereum has contributed to a large number of transactions being performed daily on its network. This has hampered the overall scalability of the platform, and the resulting congestion forces users to wait for an inordinately long time.
At today’s complexity level, which is self-adjusted by the network, the Ethereum network can process 7 to 15 transactions per second. As a comparison, Visa can process around 45,000 transactions per second. Hence, in order to achieve enterprise-class scalability, the Ethereum network needs to improve its transaction processing capacity by several orders of magnitude.
In the Ethereum network, transactions are processed sequentially on a node by the EVM (Ethereum Virtual Machine). Full nodes execute each transaction and store the complete state. In addition, a transaction generated by one node has to be executed by every node on the network. This need for all nodes to process every transaction is another major contributor to the network’s slowness and low throughput.
What can be done to augment the network throughput? While there are several potential solutions, what does each solution entail? Every solution option has its own pros and cons and it is instructive to take them into full consideration before the scalability challenge of Ethereum Blockchain network is addressed.
Scalability Options for Ethereum Platform
There are primarily three potential solutions to address the scalability challenges of the Ethereum platform effectively:
- Sharding: Sharding involves dividing the chain state into smaller partitions. In such a scheme, a transaction no longer has to be processed by every node on the network; it needs to be processed only by the nodes in the partition where the transaction originated. Thus, by reducing the number of nodes that must process each transaction, overall network throughput can be improved.
- State Channel: This solution mandates that the core platform prioritizes the operations it works on while the remaining operations are shifted off the chain (off-chain). A large sequence of transactions is processed off-chain and the proof is submitted to the chain later on.
- Plasma: Plasma is another off-chain scaling technique which helps in conducting off-chain transactions while relying on the underlying Blockchain to provide the needed security.
Overview of Sharding
Sharding is a time-tested concept widely used in database implementation. Its key objective is to reduce the burden on each network node by removing the need to process each and every transaction being executed on the network.
This is done by splitting the complete state of the network into shards, or sections, each holding its own fraction of the state and transaction history. Transactions are processed in parallel in each shard, and each node only has to validate the transactions happening within its own shard. To scale this solution further, a shard can in turn have sub-shards to provide even higher levels of throughput.
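The partitioning idea can be sketched in a few lines. The rule used here, assigning a transaction to a shard by the sender's address modulo a shard count, is an assumption chosen for illustration, as are `NUM_SHARDS` and the transaction fields.

```python
from collections import defaultdict

NUM_SHARDS = 4  # hypothetical shard count

def shard_of(address: str) -> int:
    """Map a hex address to a shard (assumed rule: address modulo shard count)."""
    return int(address, 16) % NUM_SHARDS

def partition(transactions: list) -> dict:
    """Group transactions by the shard in which they originated."""
    by_shard = defaultdict(list)
    for tx in transactions:
        by_shard[shard_of(tx["sender"])].append(tx)
    return by_shard

transactions = [
    {"sender": "0x01", "amount": 5},
    {"sender": "0x02", "amount": 7},
    {"sender": "0x05", "amount": 1},
]
by_shard = partition(transactions)
# A node assigned to shard 1 validates only by_shard[1], not the full list.
print({shard: len(txs) for shard, txs in by_shard.items()})
```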
Sharding improves the performance of the main chain (known also as the beacon chain). It is known as an on-chain solution which requires a hard fork of the main network. There are several concepts that come into play while implementing Sharding:
Collation
A collation is the shard-chain equivalent of a block in the main chain. It consists of a header and a list of transactions, which are wrapped much as they are in a block. Each collation also points to its parent collation in the shard chain.
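As a rough data-structure sketch, a collation can be modelled as a header plus a transaction list, with the header carrying the hash of the parent collation. The field names, the use of SHA-256 and the flat (non-Merkle) transaction digest are assumptions made for brevity.

```python
import hashlib
from dataclasses import dataclass

@dataclass(frozen=True)
class CollationHeader:
    shard_id: int
    parent_hash: str  # hash of the parent collation's header
    tx_root: str      # digest summarising the transaction list

    def hash(self) -> str:
        payload = f"{self.shard_id}|{self.parent_hash}|{self.tx_root}".encode()
        return hashlib.sha256(payload).hexdigest()

@dataclass(frozen=True)
class Collation:
    header: CollationHeader
    transactions: tuple

def make_collation(shard_id: int, parent_hash: str, transactions: list) -> Collation:
    # Simplification: a flat hash of the transactions stands in for a Merkle root.
    tx_root = hashlib.sha256("|".join(transactions).encode()).hexdigest()
    return Collation(CollationHeader(shard_id, parent_hash, tx_root), tuple(transactions))

genesis = make_collation(0, "0" * 64, [])
child = make_collation(0, genesis.header.hash(), ["A.1 pays A.2 5 coins"])
print(child.header.parent_hash == genesis.header.hash())
```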
Shard Chain Consensus
Only small pieces of proof of collations have to be recorded on the main chain. The shard chains have their own sandbox of transactions, and shard validators verify the shard they are watching. Shard chains are also attached to the main chain in order to reach a higher level of consensus using a proof of stake mechanism. A collator is assigned for each period. The assignment is random and holds for one period, which is defined by the time needed to submit the shard chain’s collation header to the main chain.
Validation Manager Contract
Validation Manager Contract (VMC) is the core concept of Sharding. It is a Smart Contract that plays a significant role in the shard chains. It is the contract which facilitates joining of shard chains to the main chain. The primary features of VMC are:
- Need for a deposit of ETH (the digital currency used on Ethereum) on the main PoW chain before a validator is registered.
- Induction of validator as an active validator when the transaction processing is completed.
- Slashing of the stake of the validators if they are found faulty based on proof of stake.
- Maintaining the pool of validators with complete privacy by randomly sampling a node to allocate to a specific shard chain as collator for a given time period.
- Verifying for and writing the valid collation header hash to provide immediate on-chain verification.
- Facilitating collators to provide their vote on chain and ensuring overall governance.
Figure 1: Proof of shard states recorded on main chain via VMC
VMC maintains n shard chains, each operating in parallel. The clients of a shard, say shard X, only have to verify the transactions on shard X. A period is a window of block times with a fixed length, measured in the number of blocks submitted on the main chain during that period. As an example, each period can have 5 blocks.
For every period, a validator is randomly selected for each shard and is given the right to introduce a block on that shard. Each shard is provided with a set of 100 validators (also known as attesters) who are selected to attest to the transactions. At least 67 signatures are published along with the header of the block to the main chain. Attesters form a committee that signs off on the beacon chain block, thus creating a link to the shard block of a shard chain.
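The selection and sign-off rules described above can be sketched as follows. The seeded pseudo-random generator stands in for whatever randomness source the protocol actually uses, and the function names and validator labels are hypothetical; only the committee size of 100 and the threshold of 67 come from the text.

```python
import random

COMMITTEE_SIZE = 100       # attesters per shard, as described above
REQUIRED_SIGNATURES = 67   # minimum signatures published with the header

def select_for_period(validator_pool: list, shard_id: int, period: int, seed: str):
    """Pick a block proposer and an attester committee for one shard and period."""
    # Assumption: a seeded PRNG stands in for the protocol's randomness source.
    rng = random.Random(f"{seed}:{shard_id}:{period}")
    proposer = rng.choice(validator_pool)
    committee = rng.sample(validator_pool, COMMITTEE_SIZE)
    return proposer, committee

def header_accepted(committee: list, signers: list) -> bool:
    """Count only signatures from committee members, each at most once."""
    return len(set(committee) & set(signers)) >= REQUIRED_SIGNATURES

pool = [f"validator-{i}" for i in range(1000)]
proposer, committee = select_for_period(pool, shard_id=3, period=7, seed="demo-seed")
print(header_accepted(committee, committee[:67]))  # threshold met
print(header_accepted(committee, committee[:66]))  # one signature short
```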
Levels of Nodes
- Top level node: It processes the main chain, which includes the headers and signatures of shard blocks. It doesn’t download the data for shard blocks.
- Super full node: It downloads the whole data of the main chain which includes each and every shard block referenced in the main chain.
- Light node: It downloads and verifies only the main chain block headers.
- Single shard node: It downloads every collation on specific shards.
Single Shard 1% Takeover Attack: There is a possibility that an attacker takes over a majority of the collators in one single shard. In the case of a PoW consensus-based Blockchain, one cannot stop miners from applying their control on a given shard as only 1% of the total hash power is required to take over the shard. This is why PoS consensus should be used for the Blockchain wherever sharding is to be implemented.
In order to counter this challenge, random sampling can be used. Under this scheme, a collator does not have the right to choose the shard it wants to work on; collators are randomly allocated to shards, and the allocation is revealed to them only at the last moment. In addition, shards are reshuffled to reduce the possibility of collators colluding.
Cross-Shard Communication: There could be situations where there is a need for transactions to span multiple shards. If a transaction requires the use of addresses which are present in two different shard chains, a receipt-based protocol is used in which a user in one shard chain creates a receipt and the receipt is passed to the next shard chain. The Merkle proof in the receipt moves from one shard to the other through the main chain.
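A short calculation shows why random allocation helps. Assuming each committee seat is filled independently from a large validator pool (a binomial approximation), the chance that an attacker holding a given share of validators captures a majority of one committee can be computed directly; the committee size and threshold below are illustrative values.

```python
from math import comb

def takeover_probability(attacker_share: float, committee_size: int, threshold: int) -> float:
    """Probability that at least `threshold` of the sampled seats go to the attacker."""
    return sum(
        comb(committee_size, k)
        * attacker_share ** k
        * (1 - attacker_share) ** (committee_size - k)
        for k in range(threshold, committee_size + 1)
    )

# With 1% of all validators, winning 51 of 100 random seats is practically impossible.
print(takeover_probability(0.01, 100, 51))
# An attacker would need close to half of all validators for a realistic chance.
print(takeover_probability(0.50, 100, 51))
```

When collators can pick their shard, 1% of the total power concentrated on one shard is enough; when seats are assigned at random, that same 1% is spread thinly across all shards.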
It can be understood better with an example:
- There are two shards i.e. Shard A and Shard B.
- A transaction is made from account A.1 of shard A to account B.1 of shard B.
- The balance of account A.1 on shard A is reduced by X coins and a receipt is generated.
- This produces a Merkle proof containing the receipt.
- The transaction is included in the collation and in the block on the main chain.
- VMC sends receipt to shard B and shard B confirms the receipt.
- One has to verify that the receipt is unspent.
- Once verified, the balance is increased for shard B’s B.1 account by X coins.
- Receipt is consumed by Shard B.
- Receipt is passed to Shard A via main chain.
- Receipt is deleted by Shard A bringing the whole transaction to completion.
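The receipt flow in this example can be sketched as a toy model. Merkle proofs, the main chain and the VMC relay are deliberately left out, and the class and function names are hypothetical; the sketch only shows the debit, the unspent check, the credit and the final deletion.

```python
class Shard:
    def __init__(self, name: str, balances: dict):
        self.name = name
        self.balances = dict(balances)
        self.issued = {}    # receipts created on this shard and not yet deleted
        self.spent = set()  # ids of receipts already consumed on this shard

def debit_and_issue(source, sender, target, recipient, amount, receipt_id):
    """Reduce the sender's balance on the source shard and create a receipt."""
    if source.balances[sender] < amount:
        raise ValueError("insufficient balance")
    source.balances[sender] -= amount
    receipt = {"id": receipt_id, "to": target.name, "recipient": recipient, "amount": amount}
    source.issued[receipt_id] = receipt
    return receipt

def consume(target, receipt):
    """Credit the recipient on the target shard, but only for an unspent receipt."""
    if receipt["to"] != target.name:
        raise ValueError("receipt addressed to another shard")
    if receipt["id"] in target.spent:
        raise ValueError("receipt already spent")
    recipient = receipt["recipient"]
    target.balances[recipient] = target.balances.get(recipient, 0) + receipt["amount"]
    target.spent.add(receipt["id"])

def delete_receipt(source, receipt_id):
    """Final step: the source shard removes the consumed receipt."""
    del source.issued[receipt_id]

shard_a = Shard("A", {"A.1": 10})
shard_b = Shard("B", {"B.1": 0})
receipt = debit_and_issue(shard_a, "A.1", shard_b, "B.1", 4, receipt_id="r-1")
consume(shard_b, receipt)
delete_receipt(shard_a, "r-1")
print(shard_a.balances, shard_b.balances)
```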
Overview of State Channels
State Channel is a technique for performing transactions and other state updates off-chain. Activities happening inside a state channel maintain a high degree of security and finality. Payment channels were introduced to Bitcoin through the Lightning Network; state channels are a more generic form of the same idea. In addition to payments, they can be used to perform arbitrary state updates on the Blockchain. A typical flow is as follows:
- Two parties, A and B, want to perform some operation together.
- Party A creates and signs a transaction and sends it to Party B.
- Party B also creates and signs a transaction and sends it back to Party A.
- Party B also keeps a copy for itself.
- These back-and-forth transactions can continue for a long time, each one updating the current state of the operation between the parties. Each transaction contains a nonce that determines its position in the sequence.
- None of the transactions has impacted anything on chain till now.
- When the flow of transactions between Party A and Party B is over, they can close the channel by submitting the final state, i.e. all the transactions, to a contract on the chain.
- The contract ensures that the final state is signed by both parties. It also provides a fixed time during which either party can object if the submitted transactions are not correct.
- Once the final state is confirmed, the relevant party receives the payment which was pre-deposited on the contract, based on the outcome of the flow.
The challenge period runs from the submission of the final state until its confirmation. The need for it can be explained with the help of an example. Assume that, while submitting the final state of the transactions, a party submits an older version of the state. The contract cannot tell on its own whether what it received is the final state or not. The challenge period gives the other party the opportunity to prove that a more recent state exists. The contract can then confirm which version is more recent, so an attempted fraud can be easily prevented.
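The close-and-challenge logic can be sketched as a small model. HMAC with per-party secrets stands in for real digital signatures, time is a plain integer, and `ChannelContract`, `make_update` and the state strings are hypothetical names; the point is only that the highest fully signed nonce wins once the challenge period ends.

```python
import hashlib
import hmac

def sign(secret: bytes, nonce: int, state: str) -> str:
    """Stand-in signature: an HMAC over nonce and state (real channels use key pairs)."""
    return hmac.new(secret, f"{nonce}|{state}".encode(), hashlib.sha256).hexdigest()

def make_update(secrets: dict, nonce: int, state: str) -> dict:
    """Build a state update signed by every party in the channel."""
    sigs = {party: sign(secret, nonce, state) for party, secret in secrets.items()}
    return {"nonce": nonce, "state": state, "sigs": sigs}

class ChannelContract:
    def __init__(self, secrets: dict, challenge_period: int):
        self.secrets = secrets
        self.challenge_period = challenge_period
        self.best = None      # most recent valid state seen so far
        self.deadline = None  # set when the first closing state arrives

    def _signed_by_all(self, update: dict) -> bool:
        return all(
            hmac.compare_digest(update["sigs"].get(party, ""),
                                sign(secret, update["nonce"], update["state"]))
            for party, secret in self.secrets.items()
        )

    def submit(self, update: dict, now: int) -> None:
        if not self._signed_by_all(update):
            raise ValueError("update is not signed by every party")
        if self.deadline is not None and now > self.deadline:
            raise ValueError("challenge period has expired")
        if self.deadline is None:
            self.deadline = now + self.challenge_period
        if self.best is None or update["nonce"] > self.best["nonce"]:
            self.best = update  # a higher nonce replaces an older state

    def settle(self, now: int) -> str:
        if self.deadline is None or now <= self.deadline:
            raise ValueError("challenge period is still open")
        return self.best["state"]

secrets = {"A": b"secret-a", "B": b"secret-b"}
contract = ChannelContract(secrets, challenge_period=10)
stale = make_update(secrets, nonce=1, state="A=5,B=5")
latest = make_update(secrets, nonce=2, state="A=3,B=7")
contract.submit(stale, now=0)   # one party tries to close with an old state
contract.submit(latest, now=4)  # the other party challenges with a newer one
final_state = contract.settle(now=11)
print(final_state)
```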
Features of State Channel
- Reliance on Availability – If one of the parties lost its internet connection during a challenge, it won’t be possible for the truthful party to respond before the challenge period expires. But, parties can pay someone else to keep a copy of the state and maintain availability on its behalf.
- Useful with Multiple State Updates – It is preferred to have a state channel when multiple state updates are to be applied with a high volume of transactions. There is an initial cost in creating a channel on which the contract is deployed. However, the cost per state update within that channel is quite low.
- Strong privacy properties – These arise because everything happens within the channel between the parties. Nothing is broadcast publicly or recorded on chain; only the opening and closing transactions need to be made public.
- Useful for applications with a defined set of participants – This is so because the contract needs to constantly know all the parties who are part of the channel. Parties could be added or removed but that would require a change in the contract every time the parties change.
- Instant Finality – The moment all the parties sign a state update, the transaction is considered final.
Overview of Plasma
The idea of Plasma is that one can create child Blockchains attached to the main Ethereum Blockchain. These child chains can in turn create child chains of their own, and so on. With this technique, the complex operations are performed at the child chain level. Thus, entire applications serving a large number of users could be created with only minimal interaction with the main chain. Plasma is also categorized under off-chain scaling implementations.
The Flow of Plasma
Initially, Smart Contracts are created on the main chain. These Smart Contracts serve as the root of the child chain. The main chain smart contract records state hashes of the child chain and allows users to move assets between the main chain and the child chain.
The child chain is created only after it is rooted in the main chain. It runs its own consensus algorithm, PoS, independently of the main chain, so mining is not required. The block producers are economically incentivized to remain truthful by means of collateral, which is destroyed if a bad actor behaves fraudulently. Compared to PoW, this consensus mechanism leads to faster block creation.
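One common way to record a compact state hash is a Merkle root, which also lets anyone later prove that a particular transaction was part of the committed data. The sketch below assumes SHA-256, duplicates the last node on odd-sized levels, and uses hypothetical function names; it is not taken from any specific Plasma implementation.

```python
import hashlib

def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def merkle_root(leaves: list) -> bytes:
    """Fold a non-empty list of leaves into a single root hash."""
    level = [h(leaf) for leaf in leaves]
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])  # duplicate the last node on odd levels
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]

def merkle_proof(leaves: list, index: int) -> list:
    """Collect the sibling hashes needed to rebuild the root from one leaf."""
    level = [h(leaf) for leaf in leaves]
    proof = []
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])
        sibling = index ^ 1
        proof.append((level[sibling], sibling < index))
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        index //= 2
    return proof

def verify(leaf: bytes, proof: list, root: bytes) -> bool:
    """Check a leaf against the root recorded on the main chain."""
    node = h(leaf)
    for sibling, sibling_is_left in proof:
        node = h(sibling + node) if sibling_is_left else h(node + sibling)
    return node == root

child_chain_txs = [b"tx-1", b"tx-2", b"tx-3"]
root = merkle_root(child_chain_txs)      # only this value goes to the main chain
proof = merkle_proof(child_chain_txs, 2)
print(verify(b"tx-3", proof, root))
```

Only the 32-byte root needs to be stored on the main chain, regardless of how many transactions the child chain block contains.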
Once the child chain is initialized, basic rules of the application can be set by deploying the actual application smart contract on the child chain, having all the application logic and rules.
Even when a single entity has control over 100% of block production on the child chain, Plasma guarantees that every party can always withdraw their funds and assets back onto the main chain at any time. The worst such an entity can do is force users to leave the child chain. A Plasma exit is the process that allows the users of a Plasma chain to stop participating in the child chain and move their funds back to the main chain.
Challenges with Plasma
A major challenge with Plasma arises when everyone using a child chain tries to leave it at the same time. In such a situation, the main chain will not have enough capacity to process everyone’s withdrawal transactions within the given challenge period. One of the techniques to mitigate this is to extend the challenge period in response to withdrawal demand.
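A back-of-the-envelope calculation makes the capacity limit concrete. The 15 transactions per second figure is the upper bound quoted earlier in this article; the 7-day challenge period and the assumption of one main chain transaction per exit are hypothetical inputs chosen for illustration.

```python
def max_exits(challenge_period_seconds: int, main_chain_tps: float,
              share_for_exits: float = 1.0) -> int:
    """Upper bound on exits the main chain can process within one challenge period."""
    return int(challenge_period_seconds * main_chain_tps * share_for_exits)

SEVEN_DAYS = 7 * 24 * 3600  # hypothetical challenge period

print(max_exits(SEVEN_DAYS, 15))        # entire main chain devoted to exits
print(max_exits(SEVEN_DAYS, 15, 0.10))  # only 10% of capacity available for exits
```

If the number of users trying to leave exceeds this bound, some exits cannot be processed in time, which is why lengthening the challenge period is one of the proposed mitigations.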
Scalability Solutions – A Comparison
While the right choice depends on the specific scenario, the following points help in comparing the options:
- Sharding is implemented on Ethereum’s base protocol which requires hard forking and is not an off-chain solution. On the other hand, both State channels and Plasma are off-chain solutions.
- State channel is quite similar to Plasma. The goal is to move as much transaction load off the main chain as possible.
- When all the parties involved in a state channel agree to close the channel and withdraw their funds, it can be done immediately. This is not possible on Plasma, as users need to go through a withdrawal process that is subject to a limited challenge period.
- In comparison to Plasma, State channels may be less expensive and faster as well.
- State channels could be built on Plasma child chains to help lower the cost further.
Ethereum Platform Scalability – A Solution in Sight
The Ethereum Blockchain platform extends well beyond digital cash applications and makes possible a number of unique applications which do not require any third-party facilitation or control. Beyond the advanced expertise it demands of technical teams, the platform comes with an inherent scalability challenge in the speed of transaction execution, especially because every node has to execute every transaction on the network. There are several solutions available to address this challenge, including Sharding, State Channel and Plasma. Some solutions split the transactions across partitions, while others move them off the main chain. There are specific advantages, efficiencies and complexities associated with each methodology. The detailed study of these three options indicates a high probability of success if an appropriate option is adopted before deploying a specific application on the Ethereum platform. It also makes ample sense to collaborate with a trusted partner with the requisite expertise in order to move quickly.
The Coforge Thought Board:
Unleash the True Potential of Ethereum Blockchain Platform by Resolving its Scalability Challenges
| Abbreviation | Full form |
|---|---|
| PoW | Proof of Work |
| ASIC | Application Specific Integrated Circuit |
| EVM | Ethereum Virtual Machine |
| VMC | Validation Manager Contract |
| PoS | Proof of Stake |