2024-10-27
Vitalik Buterin has introduced a plan
called "The Purge" to reduce Ethereum's data bloat and simplify the
protocol. The plan involves trimming unnecessary data storage, eliminating
outdated features, and making it easier for new nodes to join the network. This
includes proposals like history expiry and state expiry, which aim to manage
data more efficiently and ensure the network remains scalable.
Ethereum, one of the leading blockchain
platforms, faces the challenge of managing its long-term growth and complexity.
This issue is inherent to the Ethereum protocol: over time, both the volume of
historical data and the number of protocol features keep growing.
As Ethereum grows, it must balance the
need to maintain its key feature of permanence against the need to minimize
bloat and complexity. "The Purge" is an initiative aimed at simplifying and
sustaining the network for the long term.
The Challenge of Blockchain Bloat
Every blockchain, by its nature,
accumulates vast amounts of data. Each transaction, contract, and account
created needs to be stored permanently by all nodes in the network.
This creates a burden on clients and
increases the time it takes for new clients to sync with the network.
Furthermore, adding new features to the protocol increases its complexity,
while removing outdated features is a much more challenging task.
Ethereum’s ability to survive and
thrive in the long term requires countermeasures to reduce this growing
complexity and bloat.
However, the challenge lies in
preserving Ethereum’s permanence—one of the platform’s core strengths. Users
must be able to trust that the records they leave on the blockchain, such as
NFTs or smart contracts, will remain unchanged and accessible even after long
periods.
This continuity is important for
decentralized applications (dapps) that aim to remove their upgrade keys and go
fully decentralized.
The Purge
Ethereum’s developers have acknowledged
this problem and come up with a solution to address it.
The Purge aims to minimize the burden
on nodes by reducing unnecessary data and features while ensuring that the core
properties of the blockchain remain intact.
Ethereum has already seen success in
these areas with the elimination of the proof-of-work consensus mechanism and
the near-complete removal of the outdated SELFDESTRUCT opcode.
The goal is to continue these efforts,
achieving long-term scalability and security.
The Purge’s key objectives are twofold:
reducing the storage requirements for nodes and simplifying the protocol by
eliminating unnecessary features.
History Expiry: Addressing the Growing Size of Historical Data
One of the most pressing problems
Ethereum faces is the sheer amount of historical data that each node is
required to store.
As of 2024, a fully synced Ethereum
node needs around 1.1 terabytes of disk space for the execution client alone,
with hundreds of gigabytes more needed for the consensus client.
The vast majority of this data consists
of historical blocks, transactions, and receipts, most of which are no longer
needed for day-to-day operations.
How It Works
Ethereum plans to tackle this issue
through the concept of history expiry.
While consensus requires agreement on
the current state of the blockchain, the network doesn’t need every node to
store the entire history forever.
Instead, historical data can be stored
in a distributed way, similar to how torrent networks operate, where each
participant only stores a small percentage of the total data.
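The torrent-like idea can be illustrated with a toy model. The sketch below is
not how any Ethereum client distributes history; the round-robin assignment and
the names assign_chunks and replica_count are assumptions made for illustration.
It only shows that every chunk can stay retrievable while each node holds a
small fraction of the total.

```python
def assign_chunks(num_nodes: int, num_chunks: int, replication: int) -> dict[int, set[int]]:
    """Give every history chunk to `replication` distinct nodes (round-robin).

    Hypothetical scheme for illustration only; real networks would use
    something closer to a DHT or content-addressed routing.
    """
    if not 1 <= replication <= num_nodes:
        raise ValueError("replication must be between 1 and num_nodes")
    holdings: dict[int, set[int]] = {node: set() for node in range(num_nodes)}
    for chunk in range(num_chunks):
        for offset in range(replication):
            holdings[(chunk + offset) % num_nodes].add(chunk)
    return holdings


def replica_count(holdings: dict[int, set[int]], chunk: int) -> int:
    """Count how many nodes hold a given chunk."""
    return sum(chunk in held for held in holdings.values())


# 100 nodes share 1,000 chunks, each chunk kept by 5 nodes.
holdings = assign_chunks(num_nodes=100, num_chunks=1000, replication=5)
fraction_per_node = len(holdings[0]) / 1000  # share of history held by node 0
```

In this toy setup each node keeps 5% of the history, yet any chunk can be
fetched from five different peers.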
The Ethereum network is already moving
in this direction. Currently, consensus blocks are only stored for about six
months, and blobs (the data blobs introduced by EIP-4844) are stored for about
18 days.
Ethereum Improvement Proposal (EIP)
4444 aims to implement a one-year storage period for historical blocks and
receipts, after which this data can be offloaded to a distributed network of
nodes.
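A retention window of this kind boils down to a simple age check. The sketch
below is only an illustration: BlockRecord, split_by_retention, and the
365-day constant are assumed names and values, not part of EIP-4444 or any
client.

```python
from dataclasses import dataclass

# Assumed retention window of one year, expressed in seconds.
SECONDS_PER_YEAR = 365 * 24 * 60 * 60


@dataclass(frozen=True)
class BlockRecord:
    """Minimal stand-in for a stored block (hypothetical type)."""
    number: int
    timestamp: int  # Unix time in seconds


def split_by_retention(blocks, now, retention=SECONDS_PER_YEAR):
    """Return (keep, prunable): blocks inside and outside the retention window."""
    keep, prunable = [], []
    for block in blocks:
        if now - block.timestamp > retention:
            prunable.append(block)  # old enough to hand off to the archive network
        else:
            keep.append(block)
    return keep, prunable
```

A node following this rule would keep serving recent blocks itself and rely on
the distributed network for anything in the prunable list.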
This model allows for a reduction in
storage requirements without compromising data availability, as the network can
still retrieve old data through peer-to-peer sharing and verify it with Merkle
proofs.
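To show why data fetched from an untrusted peer can still be trusted, here is
a minimal Merkle proof sketch. It is a simplification under stated
assumptions: SHA-256 stands in for the hash function, the tree is a plain
binary tree that duplicates the last node on odd levels, and the helper names
are hypothetical. Ethereum's actual tries and hashing differ.

```python
import hashlib


def _h(data: bytes) -> bytes:
    # SHA-256 is a stand-in; it is not the hash Ethereum uses for its tries.
    return hashlib.sha256(data).digest()


def build_levels(leaves: list[bytes]) -> list[list[bytes]]:
    """Build all tree levels, from hashed leaves up to the single root."""
    if not leaves:
        raise ValueError("need at least one leaf")
    level = [_h(leaf) for leaf in leaves]
    levels = []
    while True:
        if len(level) > 1 and len(level) % 2 == 1:
            level = level + [level[-1]]  # pad odd levels by repeating the last node
        levels.append(level)
        if len(level) == 1:
            return levels
        level = [_h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]


def make_proof(levels: list[list[bytes]], index: int) -> list[tuple[bytes, bool]]:
    """Collect (sibling_hash, sibling_is_left) pairs from leaf to root."""
    proof = []
    for level in levels[:-1]:
        sibling = index ^ 1
        proof.append((level[sibling], sibling < index))
        index //= 2
    return proof


def verify_proof(leaf: bytes, proof: list[tuple[bytes, bool]], root: bytes) -> bool:
    """Recompute the root from a leaf and its proof, then compare."""
    node = _h(leaf)
    for sibling, sibling_is_left in proof:
        node = _h(sibling + node) if sibling_is_left else _h(node + sibling)
    return node == root
```

A node that kept only the root can accept an old block from any peer as long
as verify_proof returns True for the accompanying proof.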
State Expiry: Managing Long-Term Storage Growth
While history expiry addresses the
past, state expiry is aimed at controlling the ongoing growth of Ethereum’s
state, which includes account balances, contract code, and storage. The state
grows by approximately 50 GB per year, and without intervention, this will
continue indefinitely.
The State Expiry Solution
State expiry presents a more complex
challenge than history expiry, as the Ethereum Virtual Machine (EVM) is
designed with the assumption that state objects (such as account balances or
contract storage) are permanent. The idea behind state expiry is to allow state
objects to "expire" after a certain period of inactivity, at which
point they would be removed from the active state but could be
"resurrected" if needed later.
Various proposals have been made to
achieve this, including splitting the state into chunks that are only stored if
recently accessed. If an object is not accessed for a set period, it would be
removed, with a commitment (a compact cryptographic digest of the removed data)
stored in its place. If needed, the object could be brought back by providing
the original data along with a proof that it matches the commitment.
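The expire-and-resurrect cycle can be sketched in a few lines. This is a toy
model, not a proposal from the source: the ExpiringState class, the use of a
plain SHA-256 hash as the commitment, and the rule that an expired key must be
resurrected before it is written again are all assumptions for illustration.

```python
import hashlib


class ExpiringState:
    """Toy key-value state where idle entries are replaced by a hash commitment."""

    def __init__(self, expiry_period: int):
        self.expiry_period = expiry_period
        self.active: dict[str, tuple[bytes, int]] = {}  # key -> (value, last access time)
        self.commitments: dict[str, bytes] = {}         # key -> commitment to expired value

    @staticmethod
    def _commit(key: str, value: bytes) -> bytes:
        # Stand-in commitment: a hash binding the key to its value.
        return hashlib.sha256(key.encode() + b"\x00" + value).digest()

    def write(self, key: str, value: bytes, now: int) -> None:
        if key in self.commitments:
            raise KeyError(f"{key!r} is expired; resurrect it first")
        self.active[key] = (value, now)

    def read(self, key: str, now: int) -> bytes:
        value, _ = self.active[key]      # KeyError if expired or never written
        self.active[key] = (value, now)  # reading counts as access
        return value

    def expire(self, now: int) -> list[str]:
        """Move entries idle for at least expiry_period out of the active state."""
        stale = [k for k, (_, last) in self.active.items()
                 if now - last >= self.expiry_period]
        for key in stale:
            value, _ = self.active.pop(key)
            self.commitments[key] = self._commit(key, value)
        return stale

    def resurrect(self, key: str, value: bytes, now: int) -> bool:
        """Restore an expired entry if the supplied value matches its commitment."""
        expected = self.commitments.get(key)
        if expected is None or self._commit(key, value) != expected:
            return False
        del self.commitments[key]
        self.active[key] = (value, now)
        return True
```

The point of the design is that the network only has to remember the small
commitment; whoever wants the object back is responsible for supplying the
data.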
One such proposal, EIP-7736, suggests a
stem-and-leaf design where related storage slots are grouped together. If a
group (or stem) is not accessed for a set time, it is removed, but a commitment
to the group remains. This allows for efficient state expiry while ensuring
that critical data can be recovered.
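The grouping idea can be sketched as follows. This is a loose illustration
rather than the EIP-7736 specification: the 31-byte stem plus 1-byte leaf
index split, the StemStore class, and the SHA-256 group commitment are
assumptions chosen to keep the example small.

```python
import hashlib


def split_key(key: bytes) -> tuple[bytes, int]:
    """Split a 32-byte key into a 31-byte stem and a 1-byte leaf index (assumed layout)."""
    if len(key) != 32:
        raise ValueError("expected a 32-byte key")
    return key[:31], key[31]


def commit_group(leaves: dict[int, bytes]) -> bytes:
    """Stand-in commitment over all leaves of one stem, in index order."""
    digest = hashlib.sha256()
    for index in sorted(leaves):
        value = leaves[index]
        digest.update(bytes([index]) + len(value).to_bytes(4, "big") + value)
    return digest.digest()


class StemStore:
    """Toy store that tracks access per stem and expires whole stems at once."""

    def __init__(self):
        self.stems: dict[bytes, dict[int, bytes]] = {}  # stem -> {leaf index: value}
        self.last_access: dict[bytes, int] = {}         # stem -> last access period
        self.expired: dict[bytes, bytes] = {}           # stem -> group commitment

    def put(self, key: bytes, value: bytes, period: int) -> None:
        stem, leaf = split_key(key)
        if stem in self.expired:
            raise KeyError("stem is expired; it must be restored first")
        self.stems.setdefault(stem, {})[leaf] = value
        self.last_access[stem] = period

    def expire_stems(self, current_period: int, max_idle: int) -> list[bytes]:
        """Drop every stem idle for at least max_idle periods, keeping a commitment."""
        removed = []
        for stem, last in list(self.last_access.items()):
            if current_period - last >= max_idle:
                self.expired[stem] = commit_group(self.stems.pop(stem))
                del self.last_access[stem]
                removed.append(stem)
        return removed
```

Tracking activity per stem rather than per slot means one timestamp covers a
whole group of related slots, which keeps the bookkeeping overhead small.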
Address Space and Security Considerations
State expiry introduces new
complexities, particularly around address space. Ethereum’s current 20-byte
address format might need to be expanded to accommodate additional information,
such as expiration periods. Several proposals have been made, including
expanding addresses to 32 bytes to include version numbers and expiration data.
However, this raises backward compatibility issues, as many existing contracts
rely on the 20-byte format.
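To make the 32-byte idea concrete, here is a sketch of how such an address
could be packed and parsed. The field layout (1-byte version, 2 reserved zero
bytes, 3-byte period, 26-byte body) and the names ExtendedAddress,
pack_address, and unpack_address are hypothetical, chosen only to illustrate
carrying a version and an expiration period inside the address.

```python
from typing import NamedTuple

ADDRESS_LENGTH = 32
BODY_LENGTH = 26  # assumed: bytes left over after version, reserved, and period fields


class ExtendedAddress(NamedTuple):
    version: int  # 1 byte
    period: int   # 3 bytes, the expiration/creation period
    body: bytes   # 26 bytes identifying the account


def pack_address(addr: ExtendedAddress) -> bytes:
    """Serialize to 32 bytes: version | 2 reserved zero bytes | period | body."""
    if not 0 <= addr.version <= 0xFF:
        raise ValueError("version must fit in one byte")
    if not 0 <= addr.period < 2 ** 24:
        raise ValueError("period must fit in three bytes")
    if len(addr.body) != BODY_LENGTH:
        raise ValueError("body must be 26 bytes")
    return bytes([addr.version]) + b"\x00\x00" + addr.period.to_bytes(3, "big") + addr.body


def unpack_address(raw: bytes) -> ExtendedAddress:
    """Parse a 32-byte address produced by pack_address."""
    if len(raw) != ADDRESS_LENGTH:
        raise ValueError("expected 32 bytes")
    if raw[1:3] != b"\x00\x00":
        raise ValueError("reserved bytes must be zero")
    return ExtendedAddress(raw[0], int.from_bytes(raw[3:6], "big"), raw[6:])
```

The compatibility problem is visible here: a contract that truncates or stores
addresses as 20 bytes would silently drop the version and period fields.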
Another proposal, address space
contraction, involves reducing the address space to free up room for additional
data. While this could introduce risks such as address collisions (two
different pieces of code being assigned the same address), careful management
could mitigate these risks.
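The collision risk can be estimated with the standard birthday bound. The
sketch below is a back-of-the-envelope calculation, not an analysis from the
source; collision_work_bits is a hypothetical helper and the 128-bit figure is
just an example of a shortened address.

```python
import math


def collision_work_bits(address_bits: int) -> float:
    """Approximate log2 of the number of random addresses needed for a
    roughly 50% chance that some two of them collide (birthday bound)."""
    # n is about sqrt(2 * ln(2) * 2**bits), so log2(n) is bits/2 plus a small constant.
    return address_bits / 2 + 0.5 * math.log2(2 * math.log(2))


current = collision_work_bits(160)    # today's 20-byte addresses
shortened = collision_work_bits(128)  # example: 4 bytes repurposed for other data
```

Every byte taken out of the address removes about 4 bits of collision
resistance, which is why proposals that shrink the usable address space need
careful handling.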
Feature Cleanup: Reducing Protocol Complexity
One of the primary goals of the Purge
is to reduce the complexity of the Ethereum protocol by removing outdated or
underused features. The SELFDESTRUCT opcode, for example, was originally
designed to allow contracts to voluntarily delete themselves, reducing the
state size. However, it introduced significant complexity and security risks.
The Dencun upgrade (EIP-6780) restricted it so that it only deletes a contract
when called in the same transaction that created it, and it may be fully
deprecated in the future.
Other simplification efforts include
transitioning from RLP (Recursive Length Prefix) encoding to SSZ (Simple
Serialize), which is used by Ethereum’s beacon chain. SSZ is more efficient and
easier to work with, and transitioning fully to SSZ would streamline Ethereum’s
data structures. Additionally, older transaction types and complex precompiles
(built-in contracts that clients implement natively for specific operations)
may be removed to further simplify the protocol.
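The difference between the two encodings mentioned above is easiest to see
side by side. The sketch below implements the basic RLP rules for byte
strings, non-negative integers, and lists, plus the SSZ encoding of a single
uint64. It is a minimal illustration with assumed function names, not a full
implementation of either format.

```python
def _length_prefix(length: int, offset: int) -> bytes:
    """RLP header: short form for lengths up to 55, long form otherwise."""
    if length <= 55:
        return bytes([offset + length])
    length_bytes = length.to_bytes((length.bit_length() + 7) // 8, "big")
    return bytes([offset + 55 + len(length_bytes)]) + length_bytes


def rlp_encode(item) -> bytes:
    """Encode bytes, non-negative ints, or nested lists of them as RLP."""
    if isinstance(item, int):
        if item < 0:
            raise ValueError("RLP integers must be non-negative")
        # Big-endian with no leading zeros; zero becomes the empty string.
        item = item.to_bytes((item.bit_length() + 7) // 8, "big")
    if isinstance(item, bytes):
        if len(item) == 1 and item[0] < 0x80:
            return item  # a single small byte is its own encoding
        return _length_prefix(len(item), 0x80) + item
    if isinstance(item, list):
        payload = b"".join(rlp_encode(element) for element in item)
        return _length_prefix(len(payload), 0xC0) + payload
    raise TypeError(f"cannot RLP-encode {type(item).__name__}")


def ssz_encode_uint64(value: int) -> bytes:
    """SSZ basic type: fixed 8 bytes, little-endian."""
    return value.to_bytes(8, "little")


rlp_bytes = rlp_encode(1024)         # variable length, depends on the value
ssz_bytes = ssz_encode_uint64(1024)  # always 8 bytes
```

RLP output varies in length with the value and carries no type information,
while SSZ fixes the size from the declared type, which makes offsets
predictable and Merkle proofs over individual fields easier to build.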
The Purge is a crucial step in
Ethereum’s journey toward long-term sustainability and scalability. By
addressing both historical and state storage growth, reducing protocol
complexity, and implementing more efficient data handling techniques, Ethereum
can ensure that it remains a robust and decentralized platform for years to
come. These improvements are complex and will take time to fully implement.