Privacy and a First Look at On-chain Analysis
Bitcoin is often described as anonymous digital cash. That description is misleading.
Bitcoin gives you addresses that are not directly tied to your real name in the protocol. At the same time, every transaction and every unspent output is stored forever on a public ledger. Anyone can download the chain and study how coins move.
This combination creates a very particular kind of privacy. It is not complete anonymity and it is not full transparency in the usual banking sense. It is something in between. To understand it, we need to look at how addresses, UTXOs, transactions and analysis techniques interact.
This article starts with short definitions of the main ideas. Then it goes deeper into how transaction structure affects privacy, how basic on-chain analysis works, and what practical habits can reduce unnecessary information leaks.
1. Core ideas in brief
1.1 Pseudonymity
Bitcoin does not know your legal identity. It only knows public keys, addresses, scripts and transaction data. When you send or receive, the chain records movements between addresses.
If nobody ever learns that a particular address belongs to you, the coins at that address are not trivially linked to your name. In that sense you are pseudonymous. Your activity is tied to a pseudonym rather than to your civil identity.
The problem is that in real life people eventually connect some of these pseudonyms to real identities through exchanges, merchants, reused addresses or simple communication.
1.2 UTXOs and addresses
The current state of the ledger is a set of unspent transaction outputs, UTXOs for short. Each UTXO has a fixed amount of bitcoin and a script that defines who can spend it.
Addresses are human friendly encodings of locking scripts or script hashes. When you receive bitcoin, you are really receiving one or more UTXOs that pay to some script controlled by your keys. The outputs are visible on chain and the addresses they pay to are also visible.
Privacy on chain is largely about how you create, combine and spend UTXOs associated with different addresses.
1.3 Transactions as a public graph
Every transaction consumes one or more previous UTXOs and creates new ones. This creates a directed graph:
- Inputs point to previous outputs.
- New outputs can later be spent as inputs of future transactions.
Anyone can follow this graph and see how coins flow from one set of outputs to another over time. Analysis methods build on this graph structure.
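The graph idea can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: the names `OutPoint`, `Tx` and `trace_forward` are hypothetical, the transactions are toy data, and fees are ignored.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class OutPoint:
    """Reference to one output: (transaction id, output index)."""
    txid: str
    index: int


@dataclass
class Tx:
    txid: str
    inputs: list = field(default_factory=list)   # OutPoints being spent
    outputs: list = field(default_factory=list)  # (address, amount_sats) pairs


def trace_forward(txs, start):
    """Return the txids that are reachable by following `start` forward."""
    by_id = {tx.txid: tx for tx in txs}
    # Edge of the graph: each spent outpoint maps to the transaction spending it.
    spent_by = {op: tx.txid for tx in txs for op in tx.inputs}
    seen, stack = [], [start]
    while stack:
        spender = spent_by.get(stack.pop())
        if spender is None or spender in seen:
            continue  # unspent output, or transaction already visited
        seen.append(spender)
        n_outputs = len(by_id[spender].outputs)
        stack.extend(OutPoint(spender, i) for i in range(n_outputs))
    return seen


# Toy data: "a" creates a coin, "b" spends it, "c" spends the second output of "b".
example_txs = [
    Tx("a", [], [("addr1", 50_000)]),
    Tx("b", [OutPoint("a", 0)], [("addr2", 30_000), ("addr3", 20_000)]),
    Tx("c", [OutPoint("b", 1)], [("addr4", 20_000)]),
]
print(trace_forward(example_txs, OutPoint("a", 0)))
```

Real analysis tools work on the full chain and track amounts along each edge, but the structure is the same: outputs are nodes, and spending creates the edges.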
1.4 Heuristics and clustering
On-chain analysis often uses heuristics. A heuristic is a rule that is not guaranteed to be correct in every case but that is often correct in practice.
For example, if a transaction has many inputs, analysts may assume that most or all of those inputs are controlled by the same entity, because a single wallet often gathers several UTXOs it controls when it needs more funds. This is one step in building clusters of addresses that likely belong to the same owner.
Such heuristics must be treated as probabilistic. They are useful, but they can be wrong in some situations.
1.5 On-chain vs off-chain information
On-chain information comes entirely from the blockchain itself. It includes:
- Addresses and scripts in outputs.
- Amounts and timing of all transactions.
- Relations between inputs and outputs.
Off-chain information comes from outside sources:
- Know-your-customer (KYC) data at exchanges.
- Merchant logs and invoices.
- Public donation pages that display receiving addresses.
- Communications, leaks or investigations.
Strong on-chain analysis usually combines both. The chain gives structure and links. Off-chain data provides labels and identities for parts of the graph.
2. Why Bitcoin is not private by default
At a glance, using a string of characters as an address looks anonymous. There is no name attached and no user profile embedded in the protocol. However, several properties of the system limit privacy.
2.1 Permanent and global transparency
Every full node verifies the same ledger, and anyone can obtain a complete copy of it. This means:
- All transactions are public.
- All UTXOs are visible.
- Anyone can replay the entire history.
There is no concept of a private account statement that only you and a bank can see. Once a transaction is confirmed, it becomes a permanent part of a public dataset that can be studied indefinitely.
2.2 Linkability through transaction structure
The way you construct transactions often reveals which UTXOs and addresses are likely controlled by the same entity.
If you regularly combine funds from many of your own addresses into a single transaction, observers see these inputs together and can suspect common ownership.
If you reuse the same address for many incoming payments, observers can see that all those coins are going to the same place.
Over time, these patterns accumulate.
2.3 Real world identity leaks
In practice, many interactions with bitcoin connect on-chain activity to off-chain identity:
- When you deposit or withdraw from a regulated exchange, the exchange knows your identity and which addresses it uses for you.
- When you publish a donation address on a website with your name, you link that address to yourself.
- When you pay a merchant who keeps records of customer orders and addresses, there is a connection between your order and on-chain transactions.
Once one address is linked to your identity, on-chain analysis can reveal much more about your past and future activity than you might expect.
3. How transaction structure affects privacy
The basic Bitcoin transaction format does not know anything about privacy. It simply defines inputs and outputs. The way your wallet chooses inputs and constructs outputs can either reduce or amplify information leaks.
3.1 Address reuse
Address reuse means receiving multiple payments to the same address.
From a privacy perspective, this is nearly always undesirable:
- Everyone can see that all those funds are locked to the same script.
- Observers can see the total balance at any time and, whenever you spend from that address, where the funds go.
For example, if you publish a single donation address on a public website and reuse it forever, anyone can see:
- The total amount that address has received over time.
- How often funds are moved out.
- What other addresses they end up in when you spend.
The common recommendation is to use a new address for each incoming payment whenever possible. Modern wallets can generate many addresses from a single seed and track them internally.
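To see how little effort this takes for an observer, here is a minimal sketch that totals incoming payments per address and flags reused ones. The function names and the sample data are hypothetical; a real tool would read the outputs from the chain.

```python
from collections import defaultdict


def summarize_receipts(outputs):
    """Total the payments and amounts received by each address.

    `outputs` is a list of (address, amount_sats) pairs.
    """
    summary = defaultdict(lambda: {"payments": 0, "total_sats": 0})
    for address, sats in outputs:
        summary[address]["payments"] += 1
        summary[address]["total_sats"] += sats
    return dict(summary)


def reused_addresses(outputs):
    """Return the addresses that received more than one payment."""
    summary = summarize_receipts(outputs)
    return sorted(a for a, s in summary.items() if s["payments"] > 1)


# Hypothetical data: one static donation address and one fresh address.
received = [("donate", 1_000), ("donate", 2_500), ("fresh1", 700)]
print(reused_addresses(received))
print(summarize_receipts(received)["donate"])
```

With a fresh address per payment, the same three payments would appear as three unrelated entries until they are spent together.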
3.2 Multi-input transactions
When a wallet needs to send more bitcoin than is contained in any single UTXO it controls, it often gathers several UTXOs as inputs in one transaction.
If those UTXOs come from different addresses, on-chain observers see a transaction that spends from all those addresses at once. One common heuristic is to assume that all inputs in such a transaction belong to the same owner, since in ordinary wallet use a single entity holds the keys that sign every input.
This is not always true. Advanced protocols can intentionally mix inputs from different users. For ordinary wallet use, however, the heuristic is frequently correct.
The result is that a single payment can link many previously separate addresses into one cluster in the analyst’s model.
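A minimal sketch of this clustering step, assuming each transaction has already been reduced to the list of addresses that appear in its inputs. The union-find structure is a common way to merge such groups; the class and function names are illustrative, not taken from any particular tool.

```python
class UnionFind:
    """Tracks which addresses have been merged into the same cluster."""

    def __init__(self):
        self.parent = {}

    def find(self, x):
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            # Path halving keeps later lookups short.
            self.parent[x] = self.parent[self.parent[x]]
            x = self.parent[x]
        return x

    def union(self, a, b):
        root_a, root_b = self.find(a), self.find(b)
        if root_a != root_b:
            self.parent[root_b] = root_a


def cluster_by_common_inputs(transactions):
    """Apply the multi-input heuristic to lists of input addresses."""
    uf = UnionFind()
    for input_addresses in transactions:
        for addr in input_addresses:
            # Every input address is merged with the first one in the same tx.
            uf.union(input_addresses[0], addr)
    clusters = {}
    for addr in list(uf.parent):
        clusters.setdefault(uf.find(addr), set()).add(addr)
    return sorted(sorted(members) for members in clusters.values())


# Hypothetical input sets: A and B spent together, later B and C spent together.
print(cluster_by_common_inputs([["A", "B"], ["B", "C"], ["D"]]))
```

Note how A and C end up in one cluster even though they never appear in the same transaction. The heuristic is transitive, which is also why a single false link, for example from a collaborative transaction, can wrongly merge two large clusters.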
3.3 Change outputs
When you spend, your inputs often add up to more than the amount you want to send. The difference returns to you as a change output.
Consider a simple example:
- You have a single UTXO of 0.5 BTC.
- You want to pay 0.3 BTC.
- The wallet creates a transaction with one input (your 0.5 BTC UTXO) and two outputs. One is 0.3 BTC to the recipient. The other is about 0.2 BTC minus fees to a new address you control as change.
On chain, nobody sees a label that says “this output is change”. Analysts still try to guess which output is likely to be change and which is likely to be the payment. They use several clues:
- Payment amounts are sometimes “round” values. Change often is not.
- Wallets often send change to a fresh address that has never appeared on chain before.
- The distribution of amounts in many transactions can reveal typical patterns.
If an analyst correctly identifies the change output, they can follow that new UTXO and link it back to your existing cluster of addresses.
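The 0.5 BTC example and the "round amount" clue can be worked through in code. This is a rough sketch: the fee of 1,500 satoshis is an assumed value, and `guess_change_index` is a deliberately naive, hypothetical version of one heuristic, not how any specific analysis tool works.

```python
SATS_PER_BTC = 100_000_000


def change_amount(input_sats, payment_sats, fee_sats):
    """Change is whatever the inputs hold beyond the payment and the fee."""
    change = input_sats - payment_sats - fee_sats
    if change < 0:
        raise ValueError("inputs do not cover payment plus fee")
    return change


def trailing_zeros(sats):
    """Count trailing decimal zeros as a crude measure of 'roundness'."""
    count = 0
    while sats > 0 and sats % 10 == 0:
        sats //= 10
        count += 1
    return count


def guess_change_index(output_sats):
    """Guess that the least round output is change; None if there is a tie."""
    scores = [trailing_zeros(value) for value in output_sats]
    lowest = min(scores)
    candidates = [i for i, score in enumerate(scores) if score == lowest]
    return candidates[0] if len(candidates) == 1 else None


payment = 30_000_000                                 # 0.3 BTC
change = change_amount(50_000_000, payment, 1_500)   # assumed fee: 1,500 sats
print(change / SATS_PER_BTC)                         # a little under 0.2 BTC
print(guess_change_index([payment, change]))         # index of the likely change
```

When both outputs look equally round, or equally odd, the function returns `None`, which mirrors the point that these clues are probabilistic and sometimes give no answer at all.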
3.4 Consolidation and splitting
Wallets sometimes consolidate small UTXOs into larger ones when fees are low. This reduces future transaction size and can save fees later.
From a privacy angle, consolidation collects many separate outputs into a single transaction and creates a new cluster. All these previously scattered UTXOs now appear together as inputs.
Likewise, splitting one large UTXO into many outputs can reveal relationships between those outputs if done in an observable way.
These activities are not inherently bad. They simply change the shape of the transaction graph in ways that can be more or less revealing.
4. A first look at on-chain analysis
On-chain analysis is the practice of studying the blockchain data to infer patterns, behaviors and sometimes identities. It can be used by researchers, businesses and law enforcement.
This section sketches the main ideas at a high level.
4.1 Building an address graph
The first step is to represent the blockchain as a graph:
- Nodes in the graph can represent addresses, scripts or UTXOs.
- Edges represent flows of value or shared usage.
Analysts then apply heuristics to connect addresses that are likely controlled by the same entity. For example:
- The multi-input heuristic, where inputs in one transaction are assumed to share an owner.
- The change output heuristic, where one output in a transaction is marked as likely change.
By repeating this process over many blocks, the analyst builds clusters of addresses. Each cluster is a guess about which addresses belong to the same entity.
4.2 Labeling clusters
Clusters by themselves are nameless. To become interesting, they need labels.
Labels can come from:
- Exchanges and services that publish deposit or hot wallet addresses.
- Research where people send tiny amounts to known services and observe where those coins move.
- Public statements from organizations that reveal addresses they control.
- Data shared by service operators or obtained through legal processes.
Once a cluster is labeled as “Exchange X” or “Service Y”, all transactions in and out of that cluster can be studied as flows to or from that entity.
4.3 Flow analysis
With clusters and labels in place, analysts can ask questions such as:
- What fraction of mined coins goes to exchanges over time?
- How much value flows between two services?
- How long do coins tend to sit idle before they move again?
In some cases, when off-chain information indicates that a particular user interacted with a specific service at a certain time, analysts can narrow down the set of possible transactions associated with that user. From there, further flows may be studied.
For neutral research, this can provide insights into market behavior, liquidity and long term holding patterns. For investigations, it can form one piece of a larger puzzle.
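Once clusters and labels exist, a question like "how much value flows between two services" becomes a simple aggregation. The sketch below assumes three hypothetical inputs: a mapping from address to cluster id, a mapping from cluster id to label, and a list of (sender address, receiver address, amount) transfers already extracted from the chain.

```python
from collections import Counter


def label_of(address, cluster_of, labels):
    """Look up an address's cluster, then that cluster's label."""
    cluster = cluster_of.get(address)
    return labels.get(cluster, "unlabeled")


def flows_between_labels(transfers, cluster_of, labels):
    """Sum transferred satoshis for each (sender label, receiver label) pair."""
    totals = Counter()
    for src, dst, sats in transfers:
        key = (label_of(src, cluster_of, labels), label_of(dst, cluster_of, labels))
        totals[key] += sats
    return dict(totals)


# All names below are made up for illustration.
cluster_of = {"a1": 1, "a2": 1, "b1": 2, "c1": 3}
labels = {1: "Exchange X", 2: "Service Y"}   # cluster 3 has no label yet
transfers = [
    ("a1", "b1", 40_000),
    ("a2", "b1", 10_000),
    ("b1", "c1", 5_000),
]
print(flows_between_labels(transfers, cluster_of, labels))
```

The two transfers from different addresses of "Exchange X" are merged into a single flow of 50,000 satoshis to "Service Y". The quality of the result depends entirely on how accurate the clusters and labels are.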
4.4 Recognizing common patterns
Certain transaction patterns have recognizable shapes. Examples include:
- Large consolidation transactions from services that gather many UTXOs into a few.
- Regular payout patterns from mining pools or custodial wallets.
- Transactions with distinctive timing or structure used by specific protocols.
On-chain analysis tools often include libraries of such patterns and highlight them in visualizations.
5. Everyday activities that leak information
Many ordinary activities that feel harmless in daily use can have large privacy consequences once seen in the context of the public ledger.
5.1 Using exchanges
When you use a regulated exchange that requires identity verification, the exchange sees:
- Your legal identity.
- The addresses it uses to receive deposits from you.
- The addresses it uses when you withdraw.
If such an exchange shares some of this information with partners or authorities, those addresses and surrounding clusters can become labeled in analysis datasets.
This does not mean that every use of an exchange is automatically exposed to the world. It means that whoever has access to those labels and the blockchain can see much more about how funds move to and from that exchange.
5.2 Public donation and tip addresses
Publishing a static donation address on a public profile is convenient. It is also highly revealing.
Anyone can see:
- Total donations received.
- When donations arrive.
- When funds are moved and how they are grouped with other addresses.
If later you reuse the same wallet in a more private context, these on-chain connections remain.
5.3 Reuse across roles
Suppose you run a small business and also use the same wallet for personal savings. If you accidentally mix outputs from business payments and personal funds in one transaction, analysis tools may connect these two roles in their models.
From the wallet’s point of view, it is just picking UTXOs to satisfy a payment. From the chain’s point of view, it is revealing that those coins share a common controller.
5.4 Revealing habits and schedules
Because timestamps are public, repeated payment patterns can reveal habits:
- Salaries that are paid on a fixed schedule.
- Regular subscription payments.
- Periodic withdrawals and deposits to services.
Even without knowing who you are, an analyst can see patterns of activity over months or years tied to specific address clusters.
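As a rough illustration, a schedule can be detected from nothing more than the timestamps of a cluster's payments. The sketch below flags a series as periodic when the gaps between payments vary little relative to their average. The function name and the 10 percent tolerance are arbitrary assumptions.

```python
from statistics import mean, pstdev

DAY = 86_400  # seconds


def looks_periodic(timestamps, tolerance=0.1):
    """Return True when the gaps between payments are nearly constant.

    `timestamps` are Unix times in seconds. `tolerance` is the maximum
    allowed ratio of standard deviation to mean gap (an assumed cutoff).
    """
    if len(timestamps) < 3:
        return False  # too few payments to speak of a pattern
    ordered = sorted(timestamps)
    gaps = [later - earlier for earlier, later in zip(ordered, ordered[1:])]
    average_gap = mean(gaps)
    if average_gap == 0:
        return False
    return pstdev(gaps) / average_gap <= tolerance


# Hypothetical series: roughly monthly payments vs. irregular ones.
monthly = [0, 30 * DAY, 60 * DAY + 3_600, 90 * DAY]
irregular = [0, 1 * DAY, 40 * DAY, 41 * DAY]
print(looks_periodic(monthly), looks_periodic(irregular))
```

Block timestamps are only approximate, so a real analysis would need a more forgiving test, but even this crude check separates the two series.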
6. Practical steps that can improve privacy
Within normal and lawful use, there are habits that improve privacy on chain. None of them provide perfect secrecy, but they can reduce avoidable leaks.
6.1 Avoiding unnecessary address reuse
Using a fresh address for each incoming payment is a simple and effective habit.
Modern hierarchical wallets can generate many addresses from one seed. They keep track of which addresses belong to you and which UTXOs are associated with them. You do not need to remember all these addresses manually.
By not reusing addresses, you avoid giving observers an easy way to see that multiple incoming payments go to the same script.
6.2 Being careful with UTXO selection
Some wallets offer coin control features. These allow you to choose which UTXOs to spend. This can prevent unintentional linking of funds from different contexts.
Examples of cautious behavior include:
- If you receive income from several unrelated sources, avoid using a single transaction that spends from all of them at once unless necessary.
- If you want to pay someone from a particular subset of your holdings, try to select UTXOs that already belong to that subset.
This is a more advanced habit and requires an understanding of how your wallet presents UTXOs.
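The idea behind coin control can be sketched as a selection routine that refuses to cross contexts. Everything here is hypothetical: the `context` tag is something the user would assign, the largest-first strategy is just one simple choice, and fees are ignored to keep the example short.

```python
def select_utxos(utxos, target_sats, context):
    """Pick UTXOs from a single context, largest first, to reach the target.

    Each UTXO is a dict with 'id', 'sats' and a user-assigned 'context' tag.
    Raises ValueError rather than silently pulling coins from another context.
    """
    candidates = sorted(
        (u for u in utxos if u["context"] == context),
        key=lambda u: u["sats"],
        reverse=True,
    )
    chosen, total = [], 0
    for utxo in candidates:
        if total >= target_sats:
            break
        chosen.append(utxo)
        total += utxo["sats"]
    if total < target_sats:
        raise ValueError(f"not enough funds in context {context!r}")
    return chosen


wallet = [
    {"id": "a", "sats": 50_000, "context": "business"},
    {"id": "b", "sats": 20_000, "context": "personal"},
    {"id": "c", "sats": 30_000, "context": "business"},
]
print([u["id"] for u in select_utxos(wallet, 60_000, "business")])
```

The important design choice is the failure mode. A request for 90,000 satoshis from the business context raises an error even though the wallet as a whole could cover it, because covering it would link business and personal coins in one transaction.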
6.3 Being aware of consolidation
Consolidating many small UTXOs into one is a good idea from a fee management perspective. You can do it when fees are low and then enjoy smaller, cheaper transactions later.
However, consolidation also links all these UTXOs on chain. If they came from different roles or privacy contexts, consolidation removes separation between them in the on-chain view.
A cautious approach is to consolidate within reasonable boundaries. Group UTXOs that you are comfortable being seen as belonging together. Avoid consolidating across clearly distinct roles such as business and personal funds.
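The fee side of this trade-off can be estimated with simple arithmetic. The sketch below uses rough virtual-size figures often quoted for native SegWit (P2WPKH) transactions, about 68 vbytes per input, 31 per output and 11 of overhead. Treat these numbers and the two fee rates as assumptions for illustration, not exact values.

```python
# Approximate sizes in virtual bytes for P2WPKH (assumed, rounded figures).
OVERHEAD_VB = 11
INPUT_VB = 68
OUTPUT_VB = 31


def tx_vsize(n_inputs, n_outputs):
    """Rough virtual size of a transaction."""
    return OVERHEAD_VB + INPUT_VB * n_inputs + OUTPUT_VB * n_outputs


def fee(n_inputs, n_outputs, sat_per_vb):
    """Fee in satoshis at a given fee rate."""
    return tx_vsize(n_inputs, n_outputs) * sat_per_vb


LOW_RATE, HIGH_RATE = 2, 50  # hypothetical fee rates in sat/vB

# Option 1: spend ten small UTXOs directly when fees are high
# (one payment output plus one change output).
direct = fee(10, 2, HIGH_RATE)

# Option 2: consolidate ten UTXOs into one while fees are low,
# then spend that single UTXO later at the high rate.
consolidated = fee(10, 1, LOW_RATE) + fee(1, 2, HIGH_RATE)

print(direct, consolidated, direct - consolidated)
```

Under these assumptions consolidation saves a meaningful amount in fees. What the calculation cannot show is the privacy cost: the consolidation transaction links all ten UTXOs, so the saving is only worth taking for coins you are comfortable seeing grouped together.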
6.4 Using second layer solutions when appropriate
Second layer solutions, such as payment channels, can improve privacy in some scenarios because not every interaction appears directly on the base chain.
For example, you might:
- Open a channel once, which appears as an on-chain transaction.
- Then send many small payments over that channel that are not individually recorded on the chain.
This does not make you invisible. Channel openings and closings are still visible. Routing information may be partially observable by participants. However, it can reduce the amount of detailed activity that is written into the long term ledger.
7. Limits of privacy improvements
It is important to be realistic. No set of habits can create a perfect cloak of invisibility on a public blockchain.
Several factors limit what is possible.
7.1 Strong adversaries and long time horizons
Adversaries with significant resources can:
- Run many full nodes and observe transactions at the network layer.
- Collect off-chain data from services, leaks or legal processes.
- Apply sophisticated clustering and statistical methods.
- Combine information across long periods.
Even if you follow good practices, some parts of your activity may still be linkable.
7.2 Human error and complexity
The more complex a privacy setup becomes, the easier it is to make mistakes. For example:
- Accidentally reusing an address that you intended to retire.
- Mixing funds from different contexts in a single transaction.
- Revealing information in communication that contradicts your on-chain strategy.
This is one reason why general best practices often focus on simple, repeatable habits rather than intricate schemes.
7.3 Interaction with others
Your privacy is not only determined by your own behavior. It also depends on how others handle data.
If you pay someone who reuses addresses or who publishes invoices with your address, or if a service you use experiences a data breach, these events can reveal more than your own on-chain behavior might suggest.
8. How on-chain analysis is used in practice
On-chain analysis is not inherently good or bad. It is a technique. Different actors use it for different purposes.
8.1 Research and education
Researchers and educators use on-chain analysis to:
- Study adoption over time.
- Understand how long holders keep coins before spending.
- Measure the distribution of transaction sizes.
- Explain how real users behave in aggregate.
These studies often use anonymized aggregates and inform theory and public understanding.
8.2 Business intelligence
Businesses that interact with Bitcoin can use on-chain data to:
- Monitor liquidity across services.
- Track flows between their own wallets.
- Detect abnormal behavior that may indicate technical problems or fraud.
Since the ledger is public, they can also see competitors’ high-level activity, although not internal details.
8.3 Compliance and investigations
Regulated entities and public authorities may use on-chain analysis as one input to their compliance programs. For instance:
- Checking that incoming funds are not clearly linked to thefts whose addresses have been publicly flagged.
- Tracing flows after a public incident such as a hack or a scam.
On-chain patterns alone rarely tell a full story. They are usually combined with off-chain records, communications and other evidence when serious investigations are conducted.
9. A realistic view of privacy on Bitcoin
Putting all these pieces together gives a more balanced picture of privacy on the Bitcoin base layer.
- The ledger is completely transparent at the level of addresses, UTXOs and transaction structure.
- Identities are not stored on the chain, but they can be linked through off-chain data and behavioral patterns.
- Normal use habits such as address reuse, multi-input transactions and consolidation can make linkability easier.
- Careful habits can reduce avoidable leaks, but they do not guarantee perfect secrecy against strong analysis.
From a user perspective, this suggests a simple attitude:
- Treat Bitcoin as pseudonymous rather than anonymous.
- Assume that, over the long term, interested observers can learn a great deal about flows between clusters of addresses.
- Use best practices to avoid unnecessary exposure, especially when mixing different roles like personal and business funds.
10. Summary
This article has introduced privacy and on-chain analysis at a conceptual level.
Key points are:
- Bitcoin uses pseudonyms in the form of addresses and scripts, but the ledger is public and permanent.
- Transactions link UTXOs in a graph that can be studied by anyone.
- On-chain analysis uses heuristics to build clusters of addresses and combines them with off-chain labels.
- Everyday behaviors such as address reuse, multi-input spending, change outputs and consolidation can reveal relationships between funds.
- Good habits, like using fresh addresses and being careful with UTXO selection, can improve privacy but not make it perfect.
- On-chain analysis is used for research, business intelligence and investigations, depending on who applies it.
With this first look, it becomes easier to understand both the strengths and the limits of privacy on the Bitcoin base layer. Future work can explore more advanced techniques and second layer designs that aim to provide stronger privacy guarantees, always with the understanding that they operate on top of a fundamentally transparent settlement system.
Other links
1. Blockchain and Bitcoin - Overview and Big Picture
https://shiluqi.blogspot.com/2025/11/blockchain-and-bitcoin-main-article.html