Decentralized Applications and Blockchain

This is an introduction to decentralized applications and blockchain, with an emphasis on Ethereum and Bitcoin. It includes an explanation of the primary concepts, mostly focusing on Ethereum's idea of a "turing-complete programming language" to offer a powerful platform for decentralized applications.

Important Note: A large amount of this information was directly interpreted from the Ethereum Whitepaper. Most of the methods discussed or implemented are not my own original ideas.

Introducing Blockchain Technology

In many computer applications, it is useful for the application's state to be nearly entirely independent of any single entity.

What this means is that the application is not reliant on computer(s) owned by a single organization and is instead reliant on a network of anyone's computer(s). Computers in the network agree on the application's state in order for it to be valid, and anybody can join this network (with a catch, as you will see soon).

This is the concept behind blockchain. A blockchain is a growing list of objects called blocks. Blocks are stored in order, where the iith block references the (i1)(i - 1)th block by it's hash.

Assume that nn is the total number of blocks in the blockchain (n>1)(n > 1). The first block is called the genesis block. If the data in the nin - ith block changes, where 0<i<n0 < i < n, its hash also changes, effectively "breaking" the chain. Blocks n1,,ni+1n - 1, \dots, n - i + 1 are now invalid as they no longer reference a valid block.

If you feel unfamiliar with this concept, it might be useful to look into how cryptographic hashes work (remember that they are deterministic algorithms).

The State Machine

It is common to refer to the blockchain as an "infinite state machine" (technically, it is a transition system). There exists a state transition function APPLY(S, TX) -> S', where S' represents the state of the state machine after the execution of the function.

  • S represents the current state
  • TX represents the transactions that took place between S and the expected S'

Both Ethereum and Bitcoin implement the state machine in different ways according to their architecture.

An Overview

APPLY(S, TX) generally does the following:

  1. Verify TX, usually by ensuring that the entity that made the transaction owns the money.
  2. Apply transaction fees, if applicable.
  3. Make requested changes (transfer currency, etc.)
  4. Give "change" (remaining money)
  5. Return S'

This is a very general representation of what both Bitcoin and Ethereum do in their state transition function. However, there are some key changes that we will explore below.

Cryptocurrencies and Architectures

There are different representations of the cryptocurrencies associated with these platforms for decentralized applications. Ethereum's and Bitcoin's are discussed below.

Note that for now, the term "decentralized application" is just thrown around without a proper definition. We will discuss this more later on. For now, think of a decentralized application as a program that runs on a network of computers rather than on only one computer.

Bitcoin's Unspent Transaction Outputs (UTXO)

A large difference between the architectures of Ethereum and Bitcoin is the way digital currency is attributed to users. This major difference makes for some significant differences in the workings of each infrastructure's virtual machine.

Although Bitcoin's protocols are outside of the scope of this writing, I feel that it is important to include to gain a better understanding of the state machine.

UTXO stands for Unspent Transaction Output. As the name suggests, UTXO is the result of a transaction.

Generally, Bitcoin users will use a wallet to manage funds. For every transaction, they will make a bitcoin "address," a human-readable string which is referenced to claim ownership of the UTXO.

The state in Bitcoin's state machine contains all existing bitcoins and their owning addresses. The input and output UTXO refer to the UTXO at S and S' respectively.

This system is similar to how exchanges by cash work in real life. Assume there are two people, Person A and Person B. Person A has received 3 BTC, 2 BTC, 2 BTC, and 8 BTC, all from separate previous transactions. Person A wants to give 9 BTC to Person B. Then, instead of sending 9 BTC, the least possible BTC and UTXO that can be sent to minimize the change must be sent without splitting the UTXO.

In this case, the best way is to send 8 BTC and 2 BTC (the input UTXO), a total of 10 BTC. The output UTXO are:

  • 9 BTC belonging to Person A
  • 1 BTC of change, belonging to Person B

Note that the sum of all input UTXO must be greater than or equal to the sum of all output UTXO (10(8+2)10 \ge (8 + 2)). The reason why the sum can be greater is because transaction fees may exist to pay "miners." This concept will be discussed later.

UTXO must be sent as a whole and cannot be "split" or "merged" until ownership is verified (to prevent double payments). This is because the system by which UTXO is identified is its reference to the owner's Bitcoin address. If the BTC is not sent as a whole UTXO, the ownership cannot be verified.

Ethereum's Account System

Ethereum's accounts are a powerful alternative to UTXO in validating ownership of digital currencies and generalizing the idea of "value" and "data" to decentralized applications as well as users.

More concretely, Ethereum's state consists of "accounts" made up of the following data:

  • A nonce (a "counter used to make sure each transaction can only be processed once," as stated exactly in the Ethereum whitepaper)
  • The balance of the account in ETH (the native cryptocurrency of Ethereum)
  • The contract code, for "smart contracts" (see below) and decentralized applications (not-required field)
  • Persistent key-value storage

Users can control their account using public key cryptography. The user receives a private (secret) key as well as a public key. Whenever a transaction is made, the user digitally signs the transaction, and the public key can be used later to validate if the signature made by the owner of the private key associated with that public key.

Ethereum refers to these types of accounts as "externally owned accounts."

The other type of account is a "contract account," which is essentially a piece of software that runs on the Ethereum Virtual Machine (dicussed later). I like to think of contracts as "APIs" that are invoked when a message is sent to them (or when you make an "API call," to continue the analogy). Contracts can also access the persistent storage of their account. Throughout the article, these are referred to as "smart contracts," or just "contracts."

Transactions vs. Messages

In Ethereum, information is exchanged between accounts through data objects called messages. When externally owned accounts want to send information, they do so through a transaction.

A transaction contains the following data:

  • The receiver of the transaction
  • The sender's digital signature
  • The amount of ETH being transferred
  • A limit on the number of "computational steps" that can be taken by this transaction, denoted STARGTGAS
  • The cost of each "computational step," denoted GASPRICE
  • An arbitrary data field (not required)

Note that transactions can be sent only by users. Let's understand what these fields represent. The first three are self-explanatory. For more information on the second field, take a look at digital signatures.

The last field, the data field, is generally not used, but smart contracts can read from data field to get some information. A good example might be to use the data field to pass parameters when invoking the smart contract.

Smart Contracts and Transaction Fees

Smart contracts, or more generally, decentralized applications, are the primary motivation for Ethereum's architecture. As we will look into soon, Ethereum has a turing-complete programming language that can be used to execute instructions on what is called the "Ethereum Virtual Machine." A crucial issue with this (and possibly the reason why Bitcoin doesn't offer the same capabilities) is because it becomes difficult to stop abuse.

Ethereum's solution to this is to impose costs on computational power. Computational steps are measured in what is called "gas."

In essence, the purpose of STARGAS and GASPRICE is to ensure that contracts cannot make infinite loops or run a ton of computational steps without paying some price. The GASPRICE represents the cost that the sender pays for every computational step.

The gas price (transaction fee) is deducted from the sender of the transaction. Computations can only be done if there is enough gas remaining. Look here for more information on how gas prices are calculated. The main concept here is that these transaction fees determined by the use of computational resources allow for a full-fledged development platform on Ethereum while also preventing abuse.

Messages

Although contracts cannot send transactions, they can communicate with other contracts by sending messages. Note that a transaction is also a "type" of message. A message contains the following:

  • The sender of the message
  • The receiver of the message
  • Amount of ETH to transfer (added to the balance of the other contract)
  • STARTGAS
  • An arbitrary data field (not required)

When a contract sends a message to another contract, the gas expended by the other contract is accounted into the transaction fee for the initial transaction, meaning that it is still billed from the externally owned account.

The Ethereum Virtual Machine (EVM)

The EVM is an execution environment for decentralized applications. Code executed on the EVM is called "EVM Code." When executing contract code, data can be stored in the following ways:

  • A stack (the data structure)
  • A dynamically-sized byte array, called the "memory"
  • A persistent key-value store.

EVM Code is generally a low level language, somewhat resembling assembly language. It has its own "instruction set" for executing smart contract code. A good way to see this is to take a look at Ethereum's opcodes.

When code is executed, there is a program counter that is incremented for every instruction. When an instruction such as the JUMP instruction is executed, the program counter is set to whatever the operand is.

When a program is executed on the EVM, the execution can be characterized by the following:

  • Global State: the S that was discussed in the state machine, it contains all Ethereum accounts, their storage, and their balance
  • Transaction: the transaction that invoked the execution of the program
  • Message: the message that is causing the execution, if applicable
  • Code: the EVM code to be executed
  • Memory
  • Stack
  • Current (Remaining) Gas: the remaining gas, to monitor usage of computational power

Most of the variables are self-explanatory, but you can learn more about the EVM here or here. Implementations of the EVM can be found in many programming languages.

Ethereum code is executed as a part of the state transition function. A transaction is added to a block, the related EVM code is executed, and the block is validated.

The Merkle Tree

The Merkle (or hash) Tree is a common data structure used in cryptography. The concept is that the parent node is labeled with the hash of the labels of the child nodes, and the leaves connect to some data block (see the graphic below, from wikipedia)

Merkle Tree

In Bitcoin, a Merkle tree is used to store information regarding the transactions. The reason why a merkle tree is used is to quickly validate information. The tree ensures that, even if the information regarding the transactions comes from different sources, all of the downloaded informatin is the same as long as the hashes match. If two bottom hashes do not create the hash of the top block, then it is apparent that there is an error and some sort of error exists in the data. This error could potentially be malicious, so the Merkle tree ensures the validity of the transaction information for each block in the blockchain throughout the entire network.

Ethereum uses a variant of the Merkle tree, called the Patricia Trie. It is an extension of the radix trie, and it aims to solve the inefficiency of storing data in the trie.

Consensus

The aim of distributed applications is to manage it on a network of computers rather on just one computer. This section focuses strictly on consensus mechanisms. These are mechanisms that ensure a "voting" system, where all machines on the network vote for the true state of the application (or the blockchain). The majority vote is taken, so the application state is always determined by what the majority of the network wants.

A general rule of thumb is that the more computers there are on the network, the more "secure" the application (and cryptocurrency) is. This is because it would take "more votes" to top the majority of the computers on the network.

Proof of Work (PoW) and "Mining"

Proof of Work is a concensus mechanism that relies on the computational power of computers on the network. In order for a new block to be created, each computer must solve a computational problem.

This computational task is characteried by changing the content of a block so that its hash satisfies some requirement set (usually algorithmically, based on the number of computers in the network) by the network. Consider a hash function hash(x), where x is the data in a block. The data in a block can be represented by a tuple of objects b = (transactions, nonce, ..., prev_block_hash). Fix all fields except nonce.

This field is a number. Changing the number changes the output of hash(b). The computational problem to be solved is to ensure that hash(b) returns a value that meets some requirement, usually requiring a certain number of zeros at the beginning. For example, 000... might be a valid hash, but 00H... is not.

This is a rather difficult task to complete, as it is usually done by trial and error. The process would take lots of computational power, but with so many computers it becomes plausible. The process of trying to guess the nonce field is called mining. The more comptuers there are on the network, the more zeros there are.

We now explain why this works. Suppose a user wanted add a block with some information. The process of guessing this nonce is so difficult that it takes approximately a majority of the entire networks computing power to complete. This means that for anyone to add a block, they would need a huge amount of computing power that only grows according to the network's size.

This is a clever way of preventing malicious actions, but it also comes with a large cost of resources (compute). Proof of Stake (PoS) is another method that some platforms have already adopted. It involves the staking of cryptocurrency to validate blocks. It is meant to be faster, more efficient, and more sustainable than PoW. Ethereum plans to adopt PoS in the coming years.

Further Reading

This was a general introduction to blockchain and decentralized applications, but there is much more to learn. Here are some resources worth looking at (that I've looked at myself):

Last Updated: April 2022