Blockchain and Bitcoin – Block Structure and Data Model
Earlier articles in this series explained what UTXOs are, how transactions work, how wallets produce keys and signatures, how fees and mempools behave, and how nodes form a network.
All of that lives inside a container. The container is the block.
Blocks are the units that miners create, nodes validate, and everyone links together into a single chain. A block is more than a list of transactions: it is a specific data structure with a defined header, transaction list, and internal tree. This structure is what allows nodes to verify proof of work, to check integrity, and to store and index the chain efficiently.
This article looks inside that container. It explains what a block is, how its fields are organized, how transactions inside a block are summarized by a Merkle tree, and how all blocks together form a data model for the whole system.
1. Core ideas in brief
Before diving into details, we can quickly name the main pieces.
1.1 Block
A block is a bundle of data that contains:
-
A header with basic metadata and a proof of work target
-
A list of transactions, starting with a special coinbase transaction
Every block (after the very first one known as the genesis block) references the hash of a previous block in its header. This forms a chain.
1.2 Block header
The block header is a fixed size, 80 byte structure at the front of the block. It includes:
-
A version number
-
The hash of the previous block
-
A Merkle root that summarizes the transactions inside the block
-
A timestamp field
-
A compact representation of the difficulty target
-
A nonce that miners vary in their search for a valid hash
Nodes can hash the header alone to check proof of work without looking at all transaction data.
1.3 Block body and transactions
The block body contains:
-
A compact count of how many transactions are included
-
The transactions themselves
The first transaction is the coinbase. It is special because it creates new coins as part of the block reward and can collect fees. All other transactions must spend existing UTXOs.
1.4 Coinbase transaction
The coinbase transaction has no real inputs from previous UTXOs. Instead it has a single placeholder input whose script field carries the block height (required since BIP 34) followed by arbitrary data chosen by the miner, such as pool tags or extra nonce bytes. Its outputs usually send newly created coins and collected fees to addresses chosen by the miner.
Consensus rules limit the total value that can be created in the coinbase. Nodes check this carefully.
1.5 Merkle tree and Merkle root
The Merkle tree is a binary tree built from the transaction hashes in a block. The root of this tree is a single hash called the Merkle root.
The Merkle root in the header commits to the exact set and order of transactions in the block. Nodes can use Merkle proofs to verify that a given transaction is part of a block without downloading all other transactions.
1.6 Block hash
The block hash is the result of applying SHA-256 twice to the serialized block header. Miners search for a header whose hash is at or below a target value determined by the difficulty.
Nodes refer to blocks by this hash when they talk about the best chain. The header includes the previous block hash, so each block hash indirectly commits to all earlier history.
1.7 Height and chain
The block height is the number of blocks that precede a block in the chain. The genesis block has height zero, the next block has height one, and so on.
Nodes use both heights and hashes when indexing blocks. The chain is a sequence of blocks where each header’s previous hash matches the hash of the block at the previous height.
2. The block header in detail
The header is the most critical part of a block. It is small, but it carries everything that proof of work and chain linking needs.
We can look at each field and ask why it is there.
2.1 Version
The version is an integer that can be used by software to signal certain features or readiness for certain soft forks. It does not affect the basic validation of old rules. Nodes that do not understand some bits can still accept the block as long as all other rules are satisfied.
Over time, particular bits in the version field have been used to signal support for new consensus changes. This is part of the upgrade process, which belongs in a separate article on governance.
2.2 Previous block hash
The previous block hash is a 256 bit value that identifies the parent block. The header stores the hash of the header of that parent.
This field is what ties blocks into a chain. It has two important effects:
-
It fixes the order. To be valid, a block must point to a parent that the node already knows or will learn about.
-
It links proof of work. If an attacker wants to change some old block, they must redo proof of work for that block and for every later block that depends on it.
The chain is not stored as an array inside each block. Instead, the pointers are one way. Each header points back to exactly one parent. Nodes build the forward links in memory by indexing hashes.
2.3 Merkle root
The Merkle root is a hash that summarizes all transactions in the block. The block header does not list the transactions themselves. Instead, it commits to them via this single root.
This makes it possible to:
-
Prove that a given transaction is included in the block without listing all transactions
-
Detect any change in any transaction, because it would change the Merkle root
The details of how the tree is built are covered in the Merkle section, but conceptually the root is the fingerprint of the transaction list.
2.4 Timestamp
The timestamp is a number representing the time the miner claims to have created the block, stored as a 32 bit count of seconds since the Unix epoch.
Nodes use the timestamp for purposes such as:
-
Checking that a block is not unreasonably far in the future compared to their local time
-
Driving the difficulty adjustment algorithm over long periods
The timestamp does not need to be exact. In Bitcoin Core it must be greater than the median timestamp of the previous 11 blocks and no more than two hours ahead of the node’s own notion of the current time. That is enough for the protocol.
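The timestamp constraints can be sketched as below. This is a minimal illustration that assumes the two rules used by Bitcoin Core (later than the median of the previous 11 block timestamps, and at most two hours ahead of the local clock). The function names are hypothetical, and the real implementation uses a network-adjusted clock rather than a plain local time.

```python
MAX_FUTURE_DRIFT = 2 * 60 * 60  # two hours, in seconds (assumed rule)
MEDIAN_SPAN = 11                # number of previous blocks considered


def median_time_past(prev_timestamps):
    """Median of the most recent (up to) 11 timestamps, given oldest first."""
    window = sorted(prev_timestamps[-MEDIAN_SPAN:])
    return window[len(window) // 2]


def timestamp_acceptable(block_time, prev_timestamps, local_time):
    """Check a claimed block time against past blocks and the local clock."""
    if block_time <= median_time_past(prev_timestamps):
        return False  # not later than the median of recent blocks
    if block_time > local_time + MAX_FUTURE_DRIFT:
        return False  # too far in the future
    return True


# Eleven blocks spaced ten minutes apart, starting at t = 1000.
recent = [1000 + 600 * i for i in range(11)]
print(median_time_past(recent))
print(timestamp_acceptable(4500, recent, local_time=8000))
```

Note that a block may carry a timestamp earlier than its parent, as long as it stays above the median.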
2.5 Difficulty target (nBits)
The difficulty target is stored in a compact format called nBits. It encodes a 256 bit target value. For a block to be valid, its header hash must be less than or equal to this target.
Nodes verify:
-
That the target value encoded in nBits is consistent with the difficulty adjustment rules and the chain history
-
That the header hash is at or below this target
Changing this field changes how hard it is to find a valid block. Difficulty retargeting uses the timestamps of past blocks to adjust nBits over time. That topic is usually covered in a consensus and security article.
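To make the compact format concrete, here is a small sketch of decoding nBits into the full target. It assumes the usual layout: the top byte is a base 256 exponent and the low three bytes are a mantissa. The function name `bits_to_target` is illustrative, and the sketch ignores the sign bit that the full encoding reserves.

```python
def bits_to_target(bits: int) -> int:
    """Expand the 32 bit compact nBits value into a 256 bit target."""
    exponent = bits >> 24          # number of bytes in the full target
    mantissa = bits & 0x007FFFFF   # top three bytes (sign bit ignored here)
    if exponent <= 3:
        return mantissa >> (8 * (3 - exponent))
    return mantissa << (8 * (exponent - 3))


# 0x1d00ffff is the nBits value of the genesis block.
target = bits_to_target(0x1D00FFFF)
print(f"{target:064x}")
```

A smaller target means fewer acceptable hashes, so a lower exponent or mantissa corresponds to higher difficulty.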
2.6 Nonce
The nonce is a 32 bit field that miners vary when searching for different header hashes. Each distinct nonce gives a different candidate header hash.
Since the header hash depends on all header fields, miners also vary other parts, such as the coinbase transaction and extra nonce fields inside it, to get more variation beyond 32 bits.
The nonce itself has no meaning beyond being part of the header data. It is a knob miners can turn while they search.
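The search itself can be sketched as a simple loop. This is a toy illustration rather than real mining code: `header_prefix` stands for the first 76 bytes of a serialized header (everything except the nonce), the helper names are hypothetical, and the example uses an artificially easy target so that it finishes quickly.

```python
import hashlib
import struct


def double_sha256(data: bytes) -> bytes:
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()


def search_nonce(header_prefix: bytes, target: int, max_nonce: int = 2**32):
    """Try nonces in order until the header hash is at or below the target.

    Returns the first nonce that works, or None if the 32 bit space is
    exhausted (a real miner would then change the coinbase and retry).
    """
    for nonce in range(max_nonce):
        header = header_prefix + struct.pack("<I", nonce)
        # The hash is interpreted as a little-endian 256 bit integer.
        if int.from_bytes(double_sha256(header), "little") <= target:
            return nonce
    return None


# Toy run: an all-zero prefix and a target that about 1 in 256 hashes meets.
easy_target = 1 << 248
print(search_nonce(bytes(76), easy_target))
```

At real difficulty levels the loop would almost always return None, which is why miners also vary data outside the nonce field.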
2.7 Header as the unit of proof
Together, these fields form a fixed size header that nodes can hash very quickly.
The idea is:
-
Validation of the rules for transactions and UTXO spending is done separately
-
Proof of work is checked by hashing the header and comparing with the target
This separation lets nodes check proof of work for many candidate headers without touching all transaction data until needed, which saves bandwidth and time.
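As an illustration of checking proof of work from the header alone, the sketch below serializes the six fields into 80 bytes and hashes them twice with SHA-256. The field values are those of the Bitcoin genesis block. The function names are illustrative, and the target is written out directly rather than decoded from nBits to keep the example short.

```python
import hashlib
import struct


def serialize_header(version, prev_hash_hex, merkle_root_hex, timestamp, bits, nonce):
    """Pack the six header fields into the 80 byte wire format.

    Hashes are displayed in reversed byte order, so they are flipped here.
    """
    return (
        struct.pack("<i", version)
        + bytes.fromhex(prev_hash_hex)[::-1]
        + bytes.fromhex(merkle_root_hex)[::-1]
        + struct.pack("<III", timestamp, bits, nonce)
    )


def block_hash(header: bytes) -> str:
    """Double SHA-256 of the header, shown in the conventional display order."""
    digest = hashlib.sha256(hashlib.sha256(header).digest()).digest()
    return digest[::-1].hex()


GENESIS_HEADER = serialize_header(
    1,
    "00" * 32,  # the genesis block has no parent
    "4a5e1e4baab89f3a32518a88c31bc87f618f76673e2cc77ab2127b7afdeda33b",
    1231006505,
    0x1D00FFFF,
    2083236893,
)
GENESIS_TARGET = 0xFFFF << 208  # target encoded by nBits 0x1d00ffff

print(len(GENESIS_HEADER))
print(block_hash(GENESIS_HEADER))
print(int(block_hash(GENESIS_HEADER), 16) <= GENESIS_TARGET)
```

No transaction data is needed for this check, which is what lets header-only clients verify proof of work cheaply.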
3. The block body and transaction list
The rest of the block, after the header, is the body that contains the transactions.
3.1 Transaction count and encoding
Immediately after the header, the block encodes the number of transactions it contains. This number is stored in a compact variable length format.
Then each transaction is serialized one after another. Nodes can parse the stream by reading a transaction, moving forward by its length, and reading the next one until the count is satisfied.
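The compact variable length format can be sketched as follows. It assumes the standard encoding in which values below 0xfd take one byte and larger values use a marker byte followed by a 2, 4, or 8 byte little-endian integer. The function name `read_compact_size` is illustrative.

```python
def read_compact_size(buf: bytes, offset: int = 0):
    """Decode a compact size integer; return (value, offset after it)."""
    first = buf[offset]
    if first < 0xFD:
        return first, offset + 1
    # Marker byte selects the width of the integer that follows.
    width = {0xFD: 2, 0xFE: 4, 0xFF: 8}[first]
    start = offset + 1
    value = int.from_bytes(buf[start:start + width], "little")
    return value, start + width


# A block with 253 transactions encodes its count as fd fd 00.
count, next_offset = read_compact_size(bytes.fromhex("fdfd00"))
print(count, next_offset)
```

The returned offset is where the first serialized transaction begins, so a parser can continue reading from there.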
3.2 The coinbase transaction
The coinbase transaction appears first and has special properties.
From a structural point of view:
-
It has exactly one input
-
That input does not reference any previous transaction
-
Instead, it contains arbitrary data in a field known as coinbase data
The outputs of the coinbase transaction are regular outputs. They send funds to scripts that can be later spent.
From a consensus point of view:
-
The total value of coinbase outputs must not exceed the sum of the current block subsidy and the total fees from all other transactions in the block
-
Nodes check this by summing input and output values across the block
The coinbase is also where some extension mechanisms are anchored, for example commitments for Segregated Witness data.
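The value limit on the coinbase can be sketched in a few lines. The subsidy schedule used here (50 bitcoin at the start, halving every 210,000 blocks) is Bitcoin's schedule. The function names are hypothetical, and amounts are in satoshis.

```python
COIN = 100_000_000          # satoshis per bitcoin
HALVING_INTERVAL = 210_000  # blocks between subsidy halvings


def block_subsidy(height: int) -> int:
    """New coins a block at this height is allowed to create."""
    halvings = height // HALVING_INTERVAL
    if halvings >= 64:
        return 0
    return (50 * COIN) >> halvings


def coinbase_value_ok(height, coinbase_output_values, total_fees):
    """Coinbase outputs may not exceed subsidy plus fees of the other transactions."""
    return sum(coinbase_output_values) <= block_subsidy(height) + total_fees


print(block_subsidy(0), block_subsidy(840_000))
print(coinbase_value_ok(840_000, [312_500_000, 1_000], total_fees=1_000))
```

A miner may claim less than the allowed amount; the rule is only an upper bound.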
3.3 Regular transactions
All other transactions in the block are ordinary transactions that:
-
Spend one or more UTXOs created by previous transactions
-
Create one or more new outputs
In the data model, each transaction has a transaction id, often called TXID, which is a hash of its serialized form (with details around witness data after Segregated Witness). Inputs refer to previous outputs by their TXID and index.
From the block’s perspective, the ordering of transactions matters for validation, because some transactions in the block can depend on outputs created earlier in the same block. Nodes must process transactions in order to ensure that any such internal dependencies are satisfied.
3.4 Ordering and miner choice
Miners are free to choose the order of transactions in a block, as long as dependencies are satisfied and all consensus rules are followed.
In practice, miners order transactions mainly by economic criteria such as fee rate, and by technical constraints such as package dependencies.
The Merkle tree is built over the transactions in exactly this order, so the order becomes part of the block’s committed structure.
4. Merkle trees and transaction commitment
The Merkle tree is a central part of the block’s data model. It connects the full list of transactions to the compact Merkle root in the header.
4.1 Building the Merkle tree
The process starts from the transaction ids.
-
Take the transaction id of each transaction in the block. These hashes form the leaves of the tree. The first transaction in the block becomes the first leaf.
-
Pair the leaves in order. For each pair, concatenate the two hashes and apply double SHA-256 to the result to get a parent node.
-
If there is an odd number of leaves at any level, the last hash is duplicated to form a pair with itself.
-
Repeat this process on the resulting list of parent hashes, climbing up level by level, until only one hash remains.
The final hash at the top is the Merkle root. It depends on all leaf hashes and on their order.
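The steps above can be sketched as a short function. It assumes the leaves are 32 byte transaction ids in their internal byte order and that parents are computed with double SHA-256. The function names are illustrative.

```python
import hashlib


def double_sha256(data: bytes) -> bytes:
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()


def merkle_root(leaves):
    """Compute the Merkle root over an ordered, non-empty list of leaf hashes."""
    if not leaves:
        raise ValueError("a block always has at least the coinbase transaction")
    level = list(leaves)
    while len(level) > 1:
        if len(level) % 2 == 1:
            level.append(level[-1])  # duplicate the last hash on odd levels
        level = [
            double_sha256(level[i] + level[i + 1])
            for i in range(0, len(level), 2)
        ]
    return level[0]


# Three stand-in leaves; real leaves would be transaction ids.
txids = [double_sha256(name) for name in (b"tx0", b"tx1", b"tx2")]
print(merkle_root(txids).hex())
```

With a single transaction the root equals that transaction's id, which is the case for blocks that contain only a coinbase.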
4.2 Why this structure is useful
The tree has two main properties.
First, it gives a compact commitment. A single 256 bit Merkle root can represent any number of transactions. If any transaction changes, the leaf hash changes and so does the root.
Second, it supports efficient proofs. A Merkle proof for a particular transaction includes:
-
The hash of the transaction itself
-
The hashes of the siblings of each node along the path from that leaf to the root
A verifier:
-
Hashes the transaction to get the leaf hash.
-
Iteratively combines it with the provided sibling hashes, following the correct left and right ordering.
-
Computes a candidate root.
-
Checks whether this root matches the Merkle root in the block header.
The length of the proof grows with the logarithm of the number of transactions. For a block with 4,000 transactions, a proof needs only 12 sibling hashes of 32 bytes each.
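A verifier along these lines can be sketched as below, together with a helper that a full node might use to produce the proof. Both function names are hypothetical. The leaf's position in the block tells the verifier whether each sibling goes on the left or the right.

```python
import hashlib


def double_sha256(data: bytes) -> bytes:
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()


def merkle_branch(leaves, index):
    """Return (sibling hashes from bottom to top, root) for leaves[index]."""
    branch = []
    level = list(leaves)
    while len(level) > 1:
        if len(level) % 2 == 1:
            level.append(level[-1])
        branch.append(level[index ^ 1])  # sibling sits next to us in the pair
        level = [
            double_sha256(level[i] + level[i + 1])
            for i in range(0, len(level), 2)
        ]
        index //= 2
    return branch, level[0]


def verify_branch(leaf, index, branch, root):
    """Recompute the root from a leaf and its sibling hashes."""
    node = leaf
    for sibling in branch:
        if index % 2 == 0:
            node = double_sha256(node + sibling)   # we are the left child
        else:
            node = double_sha256(sibling + node)   # we are the right child
        index //= 2
    return node == root


leaves = [double_sha256(bytes([i])) for i in range(5)]
branch, root = merkle_branch(leaves, 3)
print(len(branch), verify_branch(leaves[3], 3, branch, root))
```

The verifier never sees the other transactions, only one hash per level of the tree.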
4.3 Simplified payment verification
Simplified payment verification clients use Merkle proofs to check that a transaction is included in a block without storing the block’s full contents.
They:
-
Download the chain of block headers and verify proof of work
-
For a given transaction, ask a full node for a Merkle proof that this transaction appears in a specific block
-
Check the proof against the Merkle root in the header
If the header is part of the chain with the most accumulated proof of work, and the proof is correct, the client has strong evidence that the transaction was accepted by miners, even if it does not verify all scripts and UTXO rules itself.
This illustrates how the header and tree together form a layered data model with different verification levels.
5. Size, weight and limits
The block is not allowed to grow without bound. There are consensus limits on its size and shape.
5.1 Historical size limit
Before Segregated Witness, blocks had a simple size limit of 1,000,000 bytes. This limited how much data miners could include and constrained the number of transactions per block.
The size limit helped protect nodes and the network from very large blocks that would be slow to download and process.
5.2 Weight and virtual size
Later, a concept of block weight was introduced. It assigns different costs to different parts of a transaction. Witness data, which carries signatures and some script parts, is discounted relative to other data.
From a data model perspective:
-
Each block has a maximum weight of 4,000,000 weight units
-
A transaction’s weight counts each non-witness byte as four units and each witness byte as one unit
-
For human reasoning, this is often mapped to virtual bytes by dividing weight by four and rounding up
The important idea is that blocks are constrained by a consensus rule that limits total weight. This shapes what miners can include and how transactions are encoded.
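The arithmetic can be sketched as follows, assuming the weight formula from Segregated Witness: three times the size without witness data plus the total size, which counts non-witness bytes four times and witness bytes once. The function names and the example sizes are illustrative.

```python
MAX_BLOCK_WEIGHT = 4_000_000  # consensus limit, in weight units


def tx_weight(base_size: int, total_size: int) -> int:
    """Weight from the size without witness data and the full serialized size."""
    return base_size * 3 + total_size


def virtual_size(weight: int) -> int:
    """Virtual bytes: weight divided by four, rounded up."""
    return (weight + 3) // 4


# A transaction with 200 non-witness bytes and 110 witness bytes.
weight = tx_weight(base_size=200, total_size=310)
print(weight, virtual_size(weight))
```

A transaction without witness data has equal base and total sizes, so its virtual size is simply its size in bytes.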
5.3 Impact on structure
The weight system influenced transaction formats and block composition. However, it does not change the logical view that a block is:
-
A header
-
A count
-
A list of transactions
It simply affects how much of that list can be present in one block while remaining valid.
6. The blockchain as a data model
Now that we understand individual blocks, we can look at how they combine to form the overall data model for Bitcoin.
6.1 Linear history of blocks
Conceptually, the chain is a linear sequence of blocks starting from the genesis block. In practice:
-
Each block header points back to a previous block hash
-
Nodes maintain a mapping from block hash to a record that includes height, header and some metadata
-
The active chain is the path from the genesis block to the current tip with the most accumulated work
There can be side branches where some blocks do not lie on the active chain. These are important for reorganization logic but not part of the main history.
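Selecting the active chain by accumulated work can be sketched with a toy block index. The sketch assumes that the work of a block is estimated as 2^256 divided by (target + 1), which is the expected number of hashes needed to meet the target. The index layout and function names are hypothetical, and a real node caches cumulative work instead of recomputing it.

```python
def block_work(target: int) -> int:
    """Expected number of hashes needed to find a header at or below target."""
    return (1 << 256) // (target + 1)


def cumulative_work(index, block_hash):
    """Sum the work of a block and all its ancestors by following parent links."""
    total = 0
    while block_hash is not None:
        parent, target = index[block_hash]
        total += block_work(target)
        block_hash = parent
    return total


def best_tip(index):
    """Pick the block whose chain back to genesis has the most work."""
    return max(index, key=lambda h: cumulative_work(index, h))


EASY = 0xFFFF << 208
index = {
    "genesis": (None, EASY),
    "a1": ("genesis", EASY),
    "a2": ("a1", EASY),
    "b1": ("genesis", EASY // 4),  # one block, but about four times harder
}
print(best_tip(index))
```

In this toy index the shorter branch wins because its single block represents more work than the two easier blocks combined, which is why height alone does not decide the active chain.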
6.2 UTXO set as current state
The blockchain is the history. The UTXO set is the current state derived from that history.
From a data model perspective, you can imagine that:
-
The sequence of blocks is the append only log
-
The UTXO set is a materialized view constructed by applying each transaction as an update
Nodes can reconstruct the UTXO set from scratch by replaying the log. In practice, they store the UTXO set in an indexed database for fast lookup, separate from raw block data on disk.
6.3 Transactions as edges in a graph
Inside this block sequence, transactions form a directed graph:
-
Each UTXO created by a transaction is a node in the graph
-
Spending that UTXO with a later transaction creates a directed edge from the old output to the new outputs
This graph structure coexists with the linear block structure:
-
Blocks provide time ordering and proof of work
-
The transaction graph describes how value flows
On chain analysis often works with this graph. Nodes themselves work with it implicitly when updating the UTXO set.
7. Transaction referencing inside the data model
Every input in every transaction needs to reference a previous output. This referencing is entirely hash based.
7.1 Transaction ids and output indices
Each transaction has a transaction id, which is a hash of its serialized form following certain rules.
Outputs in a transaction are stored in a numbered list. The first output has index zero, the next index one, and so on.
An input that wants to spend an output must specify:
-
The transaction id of the transaction that created the output
-
The index of that output inside that transaction
This pair identifies a specific UTXO uniquely in the data model.
7.2 Dependencies inside one block
Transactions in the same block can depend on each other. For example:
-
Transaction A creates an output
-
Transaction B in the same block spends that output
In order for the block to be valid:
-
The transaction that creates the output must appear earlier in the block
-
Nodes must apply transactions in order to maintain the UTXO set correctly
If a transaction tried to spend an output created later in the same block, the node would not find that UTXO in its current set when processing the spending transaction, and the block would be rejected.
This ordering requirement is part of the data model and affects how miners assemble blocks.
7.3 Preventing double spends
The UTXO set model and referencing rules also prevent double spends:
-
When a node processes a transaction, it checks that each input’s referenced UTXO is present in the UTXO set
-
It removes that UTXO when accepting the transaction
If another transaction later tries to reference the same UTXO as an input, the node will not find it in the UTXO set and will reject the transaction.
This holds both within a single block and across blocks in the chain.
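Both the ordering rule and the double spend rule fall out of one loop over the block, as in this sketch. The data layout is a simplification chosen for the example: a transaction is a tuple of an id, a list of spent outpoints, and a list of output values, and the UTXO set is a plain dictionary. Scripts, signatures, and amounts checks are left out.

```python
def connect_block(utxos, block):
    """Apply a block's transactions in order and return the new UTXO set.

    utxos maps (txid, output index) to a value. Raises ValueError when an
    input refers to an output that is missing or already spent.
    """
    view = dict(utxos)  # work on a copy so a failed block changes nothing
    for txid, inputs, output_values in block:
        for outpoint in inputs:
            if outpoint not in view:
                raise ValueError(f"missing or already spent: {outpoint}")
            del view[outpoint]  # spending removes the UTXO
        for n, value in enumerate(output_values):
            view[(txid, n)] = value  # new outputs become spendable at once
    return view


utxos = {("funding", 0): 50}
block = [
    ("A", [("funding", 0)], [30, 20]),
    ("B", [("A", 0)], [30]),  # spends an output created earlier in the block
]
print(connect_block(utxos, block))
```

Swapping A and B, or adding a second transaction that spends ("funding", 0), makes the lookup fail and the whole block is rejected.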
8. Storage and indexing in full nodes
Although the protocol defines a logical structure, node software must choose concrete ways to store this data on disk and in memory.
8.1 Block storage
Nodes typically store blocks in files on disk:
-
Blocks are appended in roughly the order they are received during initial download
-
Metadata tables keep track of which file and offset contain a given block
Blocks can be compressed or stored in a relatively raw format. The exact layout varies between implementations.
8.2 Block index
Alongside raw block storage, nodes maintain a block index in a database. Each entry in the block index usually contains:
-
The block header
-
The block’s height
-
The hash of the previous block
-
Pointers to the position of the full block on disk
-
Some validation flags and metadata
With this index, a node can quickly navigate the chain, find ancestors, and determine the active tip.
8.3 UTXO set storage
The UTXO set is stored separately from blocks, usually in a key value database.
Each entry maps an outpoint, which is a combination of a transaction id and output index, to:
-
The amount of bitcoin in that output
-
The locking script and some flags
When processing transactions, nodes consult this database to check whether a referenced UTXO exists and to learn its properties. Updates to this database happen as blocks are connected and disconnected during chain changes.
8.4 Optional transaction index
Some node configurations also build an index from transaction id to its location in the blockchain. This allows quick lookup of any transaction by id.
This index is not required for consensus. It is a convenience feature. It increases disk usage and build time, so many nodes disable it when they do not need such queries.
8.5 Pruning old block data
Nodes that operate in pruned mode:
-
Still build and maintain the full block index and UTXO set
-
Still verify all blocks as they come in
-
After a block is sufficiently buried, delete its full transaction data from disk while keeping header and index metadata
From a data model standpoint, pruning trades away archival history in exchange for reduced disk use. The logical chain is still fully validated. Only the ability to serve old blocks to others is reduced.
9. Segregated Witness and block layout changes
So far we have talked about blocks as if all transaction data were in a single structure. The introduction of Segregated Witness added some complexity to the data model, while keeping compatibility with old nodes.
9.1 Witness data
Witness data includes signatures and some other script related elements. With Segregated Witness:
-
The transaction id is computed over a serialization of the transaction that excludes witness data
-
The witness data is stored separately and can be discounted in weight calculations
This split affects:
-
How transaction ids are computed
-
How nodes serialize and deserialize blocks
Conceptually, though, a block still has a header and a list of transactions. The witness is an additional layer attached to those transactions.
9.2 Two kinds of transaction identifier
After Segregated Witness, there are two relevant identifiers for a transaction:
-
The traditional transaction id, which does not include witness data in its hash
-
A witness transaction id, which does include witness data
The Merkle root in the block header continues to commit to the traditional transaction ids. An additional commitment to the witness transaction ids is placed in an output of the coinbase transaction.
From a data model perspective:
-
Nodes that are not aware of Segregated Witness can still see a valid chain of blocks and verify the Merkle roots and transaction ids
-
Nodes that understand witness data can also check the extra commitment and fully validate signatures that live in the witness structure
This layered design allowed the new format to be deployed as a soft fork.
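As a sketch of what the extra commitment looks like, the code below builds the coinbase output script described in BIP 141. It assumes the witness Merkle root has already been computed over the witness transaction ids, and that the reserved value is 32 zero bytes, which is the common choice. The function name is illustrative.

```python
import hashlib

# OP_RETURN, push of 36 bytes, then the 4 byte commitment header from BIP 141.
WITNESS_COMMITMENT_PREFIX = bytes.fromhex("6a24aa21a9ed")


def double_sha256(data: bytes) -> bytes:
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()


def witness_commitment_script(witness_root: bytes, reserved: bytes = bytes(32)) -> bytes:
    """Script of the coinbase output that commits to the witness data."""
    commitment = double_sha256(witness_root + reserved)
    return WITNESS_COMMITMENT_PREFIX + commitment


# Placeholder root; a real one is a Merkle root over witness transaction ids.
script = witness_commitment_script(bytes(32))
print(len(script), script[:6].hex())
```

Older nodes see only an unspendable output in the coinbase, which is why the commitment does not break their validation.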
9.3 Block structure with witness
In practical serialization, a block with Segregated Witness transactions carries:
-
The same kind of header as before
-
A transaction count
-
Transaction data that includes markers and flags to indicate the presence of witness
-
Witness sections that follow the inputs and outputs of each transaction
The logical model remains:
-
Header
-
List of transactions with associated witness
The commitments in the header and coinbase give nodes a way to verify consistency between these pieces.
10. Reorganizations and structural change
Although blocks are designed to form a single chain, in practice there can be temporary disagreements about which block is at the tip.
10.1 Competing branches
When there are two branches of the chain that share a common ancestor but differ after that point:
-
Each branch has its own sequence of blocks and its own version of the UTXO set if you were to replay history along that branch
-
Nodes track both branches in their block index but designate one as active based on accumulated work
In this situation, the data model includes multiple possible views of history, with only one being considered the main one.
10.2 Reconnecting and disconnecting blocks
When a node switches from one branch to another:
-
It disconnects blocks from the old branch, undoing their transactions in reverse order and rolling back the UTXO set
-
It connects blocks from the new branch, applying their transactions forward and updating the UTXO set
The block index records which blocks belong to which branch. The UTXO set always reflects the active branch.
From the storage point of view, both branches share the same underlying block files and index entries. The difference is in which path through the index is considered authoritative.
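Connecting and disconnecting can be sketched with per-block undo data that remembers what each transaction spent. The layout is a simplification for the example: a transaction is a tuple of an id, spent outpoints, and output values, and the UTXO set is a dictionary that is modified in place. The function names are hypothetical.

```python
def connect(utxos, block):
    """Apply a block in place and return undo data (spent entries per transaction)."""
    undo = []
    for txid, inputs, output_values in block:
        spent = [(outpoint, utxos.pop(outpoint)) for outpoint in inputs]
        for n, value in enumerate(output_values):
            utxos[(txid, n)] = value
        undo.append(spent)
    return undo


def disconnect(utxos, block, undo):
    """Reverse a connected block: walk its transactions backwards."""
    for (txid, _inputs, output_values), spent in zip(reversed(block), reversed(undo)):
        for n in range(len(output_values)):
            del utxos[(txid, n)]      # remove outputs the transaction created
        for outpoint, value in spent:
            utxos[outpoint] = value   # restore outputs the transaction spent


utxos = {("funding", 0): 50}
old_tip = [("A", [("funding", 0)], [30, 20]), ("B", [("A", 0)], [30])]
new_tip = [("C", [("funding", 0)], [50])]

undo = connect(utxos, old_tip)      # old branch is active
disconnect(utxos, old_tip, undo)    # roll back to the common ancestor
connect(utxos, new_tip)             # switch to the competing branch
print(utxos)
```

Walking the transactions in reverse matters because a later transaction in the block may have spent an output of an earlier one.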
10.3 Immutability in practice
At the structural level, immutability is not absolute. A block never changes its parent, but the branch it sits on can lose its active status: a block that once was part of the active chain can end up on a side branch after a reorganization.
What gives practical finality is that switching to a different branch that diverges far in the past would require enormous new proof of work. The difficulty rules make deep reorganizations extremely unlikely under normal conditions.
This topic belongs mostly to a consensus and security article, but it helps to understand that the data model is designed to support multiple branches while keeping a clear criterion for choosing one active chain.
11. Summary
Block structure and the data model are the skeleton that holds together all the other concepts in this series.
In this article we saw that:
-
A block consists of a header and a list of transactions, with the coinbase first.
-
The header contains version, previous block hash, Merkle root, timestamp, difficulty target and nonce, which together support linking and proof of work.
-
Transactions inside a block are summarized by a Merkle tree whose root is stored in the header.
-
Blocks are limited by size and weight rules that shape how much data can fit into a single unit of history.
-
The blockchain is an append only log of blocks, while the UTXO set is the current state derived from that log.
-
Transactions reference previous outputs using transaction ids and indices, forming a graph inside the block sequence.
-
Full nodes store blocks, maintain a block index and a UTXO set, and may prune old data while preserving the verified state.
-
Segregated Witness added an extra layer of commitment and changed how some data is serialized, without changing the basic idea of a header plus transactions structure.
-
The data model supports temporary branches and reorganizations, with rules that select a single active chain based on accumulated work.
With this structural understanding, you have the missing piece between individual transactions and the higher level story of consensus and security. The next natural step is to focus on how blocks are chosen and agreed upon in a decentralized way, which leads into an article specifically about consensus, forks and the security model.