Blockchain – fundamentals
It’s the heyday of blockchain technology. At the end if 2018, the term appeared in over 86 million internet publications and in over 27 thousand repositories on GitHub. There is also an ever increasing number of internet searches comprising this term or names of cryptocurrrencies, startups or blockchain-based projects initiated by leading IT companies and financial institutions. Is this phenomenon heralding a new technological revolution or is it just a reflection of an ephemeral bubble? To understand this, let us see what blockchain really is.
And to do so, we will anayse some fundamental concepts.
Hash function is an operation in which an input sequence of signs of any length is transformed into an output string of a fixed length and of following characteristics:
- collision-free – understood not as an absence of collisions, but as a situation where it’s hard to find two inputs x and two outputs y giving H(x) = H(y);
- hiding – it is hard to find input x for a given output y=H(x);
- puzzle-friendly – for every output y, if k is a random value (high min-entropy distribution), it is hard to find such an input x that H(k.x)=y.
Hashing functions help generate message digests – this is why they are also called shortcut functions.
Hash pointer is a structure indicating the place where data is stored and comprising data and its shortcuts. It enables locating information and confirming that it hasn’t been altered.
The hash pointer mechanism is applicable in a basic data structure, for instance in a list, to produce a new data structure – called a blockchain. Each block contains encrypted information and hashed pointers to a previous block:
If a given block is modified, its encryption code changes, meaning the hash pointer of the following block is altered and all the blocks of the chain must be updated. This is why the best way to add information to a chain is to add a new block at its end:
The immutability of Blockchains
A blockchain has a cryptographic guarantee of immutability. It means that it is impossible to add, delete or alter any block without modifying the “head” of a chain (to be precise, such an operation is only theoretically, but not practically, possible, as it would require generating a collision of a hash function). Cryptographic methods, in turn, do not guarantee an immutability of the “head” and an absolute immutability of the whole chain. We will come back to it further in the text.
To make our chain immune to changes, both those resulting from a crash and from an intentional interference, we may apply replication. Assuming that an undesirable modification concerns a relatively small number of copies, we will be able to easily identify it and define which data set is “true” and which is “false”.
However, in order for a replication to really protect a chain, separate copies must function independently – both technically, i.e., they must be placed on separate servers in different locations, and organisationally – i.e., they must be managed by different admins who don’t have the same supervisor or financial links.
Providing our system has two different versions of a chain – how to determine which is the right one?
The Byzantine Generals Problem was defined in 1980 by Marshall Pease, Leslie Lamport and Robert Shostak as a situation where components must be meshed in a dispersed system.
Generals besieging an enemy city must reach unanimity on whether to attack or withdraw. If they fail to agree and only some of the generals attack, a stronger enemy will win over the dispersed army.
Consensus in private blockchains
At present, there exist a couple of algorithms addressing the meshing problem. One of the most frequently used for blockchains is PBFT (Practical Byzantine Fault Tolerance), invented in 1999 by Miguel Castro and Barbara Liskov. We won’t go into details of its functioning here, however, we will discuss its two features.
First, this algorithm guarantees liveness and safety, providing that at most (N-1)/3 from among N copies behave abnormally. If we assume that f copies might behave abnormally in a given moment, we need 3f+1 replicas that will grant the desired algorithm performance.
Second, in order to function properly, this algorithm requires that the list of protocol members is set beforehand. In practice, this narrows down its application to private blockchains. It also means that a person controlling the membership can influence the protocol outcomes by embedding nodes into the network. These nodes will implement decisions according to this person’s intention and/or eject from the network nodes which function disruptively.
Consensus in public blockchains
In public blockchains, where there are no mechanisms of verification and control of membership, there exists a threat of “Sybil”attacks. For this reason, usually LCCR (Longest Chain Consensus Rule) schemes are applied. In their framework, the longest chain is the true version, but there are also limits of frequency according to which following blocks are created. Some of the most popular algorithms from this group are:
- proof-of-work – in order to add a new block to a chain, solving a complex equation and providing its proof is required. The inconvenience of this approach lies is the cost of energy spent on performing the required task;
- proof-of-stake – a chain version is voted upon. The weight of every vote is proportional to a wager placed at stake. This approach, however, has a significant inconvenience called “nothing-at-stake”;
LCCR-type schemes /proof-of-xxx are prone to attacks of the “51%” type which means that whoever controls more than a half of resources constituting the base of an algorithm’s functioning (computing power in the case of PoW and resources in the case of PoS), de facto decides which version of the chain will be put in place. This isn’t a theoretical susceptibility – internet has a lot of such cases, for instance the most recent attack on Verge.
It is also worth noting that there is no notion of “stop” for LCCR algorithms in public blockchains. As the time passes, there is only a decreased possibility of change of the current version of the chain. For instance, in the case of Bitcoin, a block is considered “confirmed” if it’s followed by 6 other blocks in the chain which happens after about an hour, providing that an average time gap between blocks is 10 minutes.
For practical reasons, a chain usually doesn’t store full data of records. Instead, records are usually hashed in blocks or in hash trees – Merkle trees – invented in 1979 by Ralph Merkle.
Merkle trees allow to separate chunks of data, the immutability of which we want to preserve with a blockchain, from the chain itself. The replicas can be divided into two categories:
- full replicas – storing both the chain and the data;
- partial replicas – storing only the chain.
It’s particularly useful in the case of:
- big chunks of data;
- a large number of items (proving that a node belongs to a tree requires logarythmic complexity);
- secret data.
Regarding its idiosyncratic character, a blockchain should not be treated as a substitute for a traditional database. A good practice is to place data in a dedicated base (relational, document, etc.) and place only hashes in a chain. It’s a good way of preserving the proof-of-work role of blockchain, while simultaneously minimising its size.
So far, we analysed blockchain as a set of data characterized by immutability, disregarding the question of what exactly we would like to store in it. As we already know how to achieve immutability of a chain, it’s now time to ensure that the data we embed inside is correct.
For instance, in the case of assets, we expect that users can dispose of only their own resources and that there won’t be a possibility of introducing to a blockchain a transfer of resources exceeding the value of a sender’s resources.
Additionally, in the case of cryptocurrencies, we want to limit the frequency of issuing of “coins” to avoid devaluation. By creating a blockchain-based database of documents, we are trying to ensure that the time stamp on a newly added file is neither older than the date and time of a recently saved document, nor that it’s placed in the future.
To provide data coherence, validation of such rules should be included in the communication protocol between replicas as an extension of the above described consensus algorithm. The mechanism should be implemented in such a way that only those versions of the chain that comply with the defined rules can be taken into account while setting a current version of a chain.
However, it should be remembered that in public blockchains there is no guarantee that all nodes will apply the same validation logic. In the event of a crash or an intentional interference in the consensus protocol, a version of a chain supported by a part of nods and rejected by others can be inserted. In such a situation, the final result of coordination will depend on the applied algorithm and on the distribution of powers of supporters and opponents of a given version of a chain.
Blockchain as a platform
The validation of rules can be performed in two ways:
- by rigidly embedding the logic in to the protocol;
- by embedding into the protocol a specific language which allows implementation of rules.
The first method limits the application of a chain to one and it’s strictly regulated by the implemented set of rules. If the second method is applied, a chain is transformed into a platform, which can be multifunctional.
While constructing a blockchain-based platform, it is worth noting that there are two categories of languages: Turing complete (for example Solidity language used in Ethereum) and Turing incomplete (for example Bitcoin Script which – as the name indicates – uses Bitcoin and Multichain). Turing complete languages have a considerably greater expresiveness, because their use permits to implement a logic of any complexity. On the other hand, it’s in general impossible to state beforehand how much time will be required to design a software written in this language and if it will ever stop. If the rule validation logic is too complex, it extends the time required to reach consensus and might threaten the stability of the whole protocol.
Smart Contracts are a special type of solutions that can be built on such a platform. The name has been coined in 1994 by Nick Szabo. It happened 4 years before the famous Wei Dai’a (1998) containing the first mention of cryptocurrencies (b-money) on one of cyberpunks’ mailing lists and 14 years before the creation of the first block of Bitcoin (2009).
Smart Contracts permit to address the problem of a mutual lack of trust without the need to engage a third party. Escrow contracts are the easiest example: in their framework two parties plan to exchange goods, however they can’t do it simultaneously. None of the parties wants to be the initiator and take the risk of the other party not complying with their constractual obligations. In the analog world, such circumstances would bring a trusted middleman – for example a bank – to the table. In the case of Smart Contracts, the compliance with contractual obligations is guaranteed by the code performed by nodes of a given network in the framework of the consensus protocol.
Blockchain-based technologies offer powerful possibilities. However, while designing solutions based on them, one needs to take two issues into account.
First, the limitations of this technology should not be overlooked – especially those which result from requirements and characteristics of a consensus protocol applied.
Second, it’s always worth checking at the beginning if a blockchain is always the best way of solving a given problem. To do this, a couple of simple questions should be answered:
If we decide to use a blockchain, we should plan beforehand what will be the range of data we want to store in a chain.
Even a brief analysis of specific properties of a blockchain permits to notice that this technology is not as universal as for example databases (relational or document) and is not likely to revolutionize as many areas of life and business as its current popularity might suggest.
At the same time, it is clearly visible that all new solutions based on blockchain are born from experiments and are approved by the market. Here are some worthwhile examples that we have found while doing our research:
Software House ASC engineers are enthousiasts of new technologies. However, the thrill of their making doesn’t outshine our obligation to make them a source of a real value.Author: Robert Kuśmierek, Lead Software Engineer, ASC LAB
Have you enjoyed the article? If yes, please share it with your network!