Ethereum block, mining, and forking
Before we dive into the details of Ethereum block architecture, we need to revisit the blockchain structure of Ethereum. In Chapter 1, Blockchain Basics, we learned that an Ethereum blockchain is a Merkle tree where the leaves represent execution states of a code. For the sake of simplicity, a finer detail was suppressed. The Merkle tree of Ethereum is not a binary Merkle tree, as we saw for the bitcoin blockchain. You see, binary Merkle trees are great data structures when it comes to authenticating information that is in a list format, that is, a series of data chunks placed one after another. For such a transaction tree, it really doesn't matter how long it takes to edit the tree after it gets created. This is because the transactions remain in a form of one frozen tree which can only keep growing.
Ethereum, on the other hand, is a state tree. Here, the situation is more complex. The state in Ethereum is represented as a key-value map where the key signifies the address and the value signifies the declared amount, balance listing, nonce, code, and account storage. Even the storage is itself a tree.
Here, unlike the transaction history, the states are frequently updated. This happens due to the change in the balance and nonce of accounts, insertion of frequently created new accounts, and frequent insertion and deletion of keys in storage.
A desirable data structure in such cases should be able to quickly calculate the new tree root after an insert, update, edit or delete operation, without re-computing the entire tree. Additionally, the depth of such a tree should be bounded. This prevents any hacker from infinitely deepening the tree and launching a DoS attack on its legitimate users. Also, making updates in a different order or even re-computing the tree from scratch should not change the root; that is, the root should be dependent only on the data.
Donald R. Morrison described such a tree data structure around 1968. It was called the Practical Algorithm to Retrieve Information Coded in Alphanumeric (PATRICIA) tree. Basically, it is a Radix tree where Radix equals 2, which means that each bit of the key is compared individually and each node is a two-way branch. In such a tree, searching, inserting, and deleting occur with linear time complexity, even in the worst case scenario.
Figure 2.14 illustrates a how a typical insertion occurs within a Patricia tree:
Hence, the Ethereum blockchain is technically a Merkle-Patricia tree where each block header in the Ethereum blockchain contains three such trees for representing three kinds of object:
- Transactions (represent transfer of value)
- Receipts (data chunks showing the effect of each transaction)
- State (represents all information of an account)
These three kinds of objects are used by a miner to answer the following five queries, which is essentially the mining activity on an Ethereum blockchain:
- Has a transaction been included in a particular block?
- What are all instances of an event emitted by this address in the past 30 days?
- What is my account's current balance?
- Does this account even exist?
- What would the output be if I simulate execution of this transaction on this contract?
The first query is handled by the transaction tree, the second by the receipt tree, and the third and fourth by the state tree. The first four are fairly straightforward to compute. The miner simply finds the object, fetches the Merkle branch where the list of hashes goes up from the object to the tree root and replies back to the light client with the branch. The fifth query is also handled by the state tree, but the way it is computed is more complex. Here, we need to construct the Merkle state transition proof. Basically, this proof makes the following claim: if we run transaction T on the state with root R, the result will be a state with root R', with log L and output O. Although not theoretically necessary, the concept of an output exists in Ethereum, because every transaction is a function call. In order to compute the proof, the miner locally creates a fake block, sets the state to S, and pretends to be a light client while applying the transaction. That is, if the process of applying the transaction requires the client to determine the balance of an account, the light client makes a balance query. If the light client needs to check a particular item in the storage of a particular contract, the light client makes a query for that, and so on. The miner responds to all of its own queries correctly, but keeps track of all the data that it sends back. The miner then sends the client the combined data from all of these requests as a proof. The client then undertakes the exact same procedure, but using the provided proof as its database; if its result is the same as what the miner claims, then the client accepts the proof.
So, how is the proof accepted? This is done using a consensus algorithm, which compares PoW from different miners .This algorithm is called ETH hash. Even though the future versions of the Ethereum framework might switch to PoS, PoW is currently in use. We will cover this algorithm in detail in the later chapters. As of now, remember the fact that this algorithm combines the nonce with a DAG to generate a new hash and check whether it is less than the target threshold to solve the dumb puzzle of mining, as discussed in Chapter 1, Blockchain Basics. The problem is that the DAG, which is essentially a direct acyclic graph of a huge dataset (more than 2 GB), takes a lot of random access memory (RAM) to process. This memory hardness is the main reason why we cannot use an ASIC miner to mine Ethereum blocks, as ASICs are very low in memory caches. The best hardware clients to mine Ethereum are GPU miners.
Here are the listed specifications shown for configuring an Etherum miner using a GPU device:
For example: RX 470 Specs
- Cost: ∼ $175 (March 2017)
- Memory: 8 gigabytes
- Memory Bandwidth: 211 gigabyte/sec
Apart from these three tree objects, the block architecture has several other modules which are illustrated in Figure 2.15 and are defined in the Gavin Wood's yellow paper:
If we study this representation closely, we find that, apart from the parent hash (prev hash), we also have an uncle hash or ommers hash. So, what is an uncle doing in an Ethereum blockchain? If I am a block and the blockchain is my family tree, my uncle would be very close to my dad. Yet, it would have been quite different if I had been born from the genes of my uncle. In that case, my dad would have been my uncle. So, the parent hash is my dad and is the correct previous hash from which I have come. Uncle hashes represent those blocks that were very close to being my correct previous block, but could not make it, as it suffered a blow from my dad. By a blow, I mean that my dad had done more PoW than my uncle; hence, the mining consensus chose him and not my uncle. So now, what happens to my uncle? He has now become a miner activated hard fork (MAHF) with respect to my blockchain, hence, is not accessible to me as I lie on my parent blockchain just after my dad, that is, the prev hash.
In somewhat similar lines, we can discuss the difference between Ethereum and Ethereum classic. If you are new to cryptocurrency and new to Ethereum, you might have noticed that you can buy two different types of Ethereum. So, you can buy Ethereum that is ETH but you can also buy Ethereum classic that is ETC. So, what are the technical and ideological differences between these two? At the beginning of the Ethereum genesis block in the blockchain, there was only ETH and there was no Ethereum classic. So, how did the ETC get created? Everything started with the DAO project on Ethereum. Decentralized autonomous organization (DAO) was a project built on top of Ethereum. This project was basically an application which could act as a self-steering organization and it had the biggest crowd sale in the history of cryptocurrency. People invested their own fiat money into this project and they received ETH tokens for their investment.
However, there was a flaw in the smart contracts, which this decentralized organization consisted of. This flaw or exploit made it possible for a group of hackers to take 50 million worth of ether from the crowd sale. Note that these hackers were technically exploiters, because it wasn't really a hack due to a bug but an exploitation of a flaw in the smart contract design logic of the DAO. So, by using this flaw the exploiters were able to steal 50 million USD worth of ethers. This caused a lot of discussions and the majority of the Ethereum community wanted to do a hard fork and reverse this theft, so that investors could get their money back and the hackers would lose their money. However, to do that, they needed to make alterations to the blockchain and this is where the whole ideological debate started within the Ethereum community. While the majority of the community agreed on the alterations and hard fork, a smaller but vocal minority opposed this notion of mutating the blockchain because the whole idea of a blockchain is that it should be immutable. Code once engraved in the blockchain should be treated as law, so if an exploit does occur, we may amend it from happening again, but we can't just go back and reverse what has happened, however unfortunate. So, at this point of the debate, the community was standing on a juncture where one portion wanted the blockchain to mutate, while another wanted to keep going as if nothing happened. So, this vocal minority kept continuing in the old blockchain, which became the Ethereum classic as a result of user/community activated hard fork. Meanwhile, the ETH was eventually mutated to return the money to investors as the stakes were high. This is why we have both ETH and ETC as cryptocurrencies.
Now, if you bought some ETH or ETC, where would you store it? For fiat money, we generally store them in bank accounts or in our wallet. Now, in a decentralized setup we do not have a bank. However, we do have wallets and clients, which we will discuss in the next section.