2026-06-04

What Chainalysis actually sees (and what it doesn't)

To most users, an analytics engine such as Chainalysis, TRM or Elliptic is a black box that has somehow decided their funds are "dirty". In reality the methodology is well documented: clustering, attribution, path scoring. Understanding the mechanics is the first step to disputing the engine's conclusions.

Step 1: clustering (grouping one owner's addresses)

Blockchains are pseudonymous: billions of addresses, orders of magnitude fewer owners. The engine's first task is clustering: determining which addresses one entity controls. The base heuristics have been known for well over a decade: co-spend (addresses signing inputs of one transaction are assumed to belong to one wallet), change-address detection, behavioural patterns (activity hours, typical amounts, recurring routes). Large exchange and service clusters are identified even more easily, through hot-wallet and consolidation patterns.
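The co-spend heuristic is easy to sketch. The snippet below is a minimal illustration, not any vendor's implementation: it assumes transactions arrive as dictionaries with a hypothetical "inputs" list of addresses, and it merges addresses with a plain union-find structure. Real engines layer change detection and behavioural signals on top of this.

```python
from collections import defaultdict


def cluster_by_cospend(transactions):
    """Group addresses that appear together as inputs of one transaction."""
    parent = {}

    def find(addr):
        # Register unseen addresses as their own cluster root.
        parent.setdefault(addr, addr)
        while parent[addr] != addr:
            parent[addr] = parent[parent[addr]]  # path halving
            addr = parent[addr]
        return addr

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    for tx in transactions:
        inputs = tx["inputs"]  # hypothetical field name
        for addr in inputs:
            find(addr)
        for other in inputs[1:]:
            union(inputs[0], other)

    clusters = defaultdict(set)
    for addr in parent:
        clusters[find(addr)].add(addr)
    return sorted((sorted(c) for c in clusters.values()), key=lambda c: c[0])


# Toy data: A and B co-spend, B and C co-spend, so A, B, C end up in one cluster.
sample_txs = [
    {"inputs": ["A", "B"]},
    {"inputs": ["B", "C"]},
    {"inputs": ["D"]},
    {"inputs": ["E", "F"]},
]
clusters = cluster_by_cospend(sample_txs)
print(clusters)
```

Note that A and C never sign the same transaction, yet they land in one cluster through B. That transitivity is what makes the heuristic powerful, and also what lets a single wrong merge spread.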

Step 2: attribution (pinning a label on the cluster)

A cluster by itself is just a set of addresses. The value of Chainalysis-grade databases is in the labels: "this is exchange X's hot wallet", "this is Tornado Cash", "this is scam project Y's cluster". Labels come from test purchases and deposits (the vendor sends funds to an exchange and watches where they land), public incidents and court materials, sanctions lists, and partner tagging. Crucially, attribution is a database's claim, not a blockchain fact. It can be stale, incomplete and occasionally wrong, and it is updated retroactively, which is why last year's transactions can suddenly turn "dirty".
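The retroactive effect can be shown with a toy label store. This is a sketch under stated assumptions: the tuple layout (cluster id, date the label was added, label), the cluster id and the dates are all invented for illustration and do not reflect any vendor's schema.

```python
from datetime import date


def label_as_of(label_history, cluster_id, as_of):
    """Return the latest label recorded for the cluster on or before as_of."""
    known = [
        (added, label)
        for (cid, added, label) in label_history
        if cid == cluster_id and added <= as_of
    ]
    return max(known)[1] if known else None


# Hypothetical history: the cluster was tagged long after the payment happened.
history = [("cluster-483920", date(2026, 3, 1), "scam")]
payment_date = date(2025, 6, 10)

label_at_payment = label_as_of(history, "cluster-483920", payment_date)
label_today = label_as_of(history, "cluster-483920", date(2026, 6, 4))
print(label_at_payment, label_today)
```

The on-chain payment is identical in both lookups. Only the database changed between them, which is why the date a label was added is worth asking for in a dispute.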

Step 3: path scoring (computing your risk)

Your deposit is scored by its links: how many hops to risky clusters, what share of volume arrived from them, which risk category (sanctions weigh more than gambling). Hence the key counterintuitive consequence: your risk score is a property of your funds' path, not of your behaviour. The exchange-side view of this scoring is covered in a separate article.
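A toy version of such scoring makes the point tangible. Everything numeric here is an assumption made for illustration: the category weights, the per-hop decay of 0.5 and the additive formula are not the formula of Chainalysis or any other vendor, which keep their exact models proprietary.

```python
# Hypothetical weights: sanctions exposure counts for more than gambling.
CATEGORY_WEIGHT = {"sanctions": 1.0, "scam": 0.8, "mixer": 0.7, "gambling": 0.3}
HOP_DECAY = 0.5  # assumed: each extra hop halves the contribution


def path_risk_score(exposures):
    """Sum share * category weight * hop decay over all risky sources, capped at 1."""
    score = 0.0
    for e in exposures:
        hop_factor = HOP_DECAY ** (e["hops"] - 1)
        score += e["share"] * CATEGORY_WEIGHT[e["category"]] * hop_factor
    return min(score, 1.0)


# Same path (20% of volume, 2 hops away), different category of the source.
same_path = {"share": 0.2, "hops": 2}
sanctions_score = path_risk_score([{**same_path, "category": "sanctions"}])
gambling_score = path_risk_score([{**same_path, "category": "gambling"}])
print(round(sanctions_score, 3), round(gambling_score, 3))
```

Nothing in the inputs describes what the depositor did. The score is computed entirely from where the funds came from, which is the consequence stated above.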

What the engine does NOT see

It doesn't see identities: outside KYC touchpoints (exchanges, verified exchange services) the graph is anonymous, and the engine knows "cluster #483920", not a name. It doesn't see intent: "received a client's payment" and "received a scammer's payout" are identical to the graph. It doesn't see off-chain context: your contracts, invoices and chats are not in the graph, which is exactly why they must be submitted separately as Source of Funds.

Where false positives come from

Aggressive clustering

Custodial services and exchange services commingle thousands of clients' funds. When the engine treats such a shared wallet as a single entity, merely being a "neighbour" in it drags someone else's risk into your trail.

Stale or coarse attribution

A cluster may have changed hands; a service may have been resold; a "high-risk exchange" label sometimes covers perfectly legal regional platforms.

Inherited deep-hop risk

At 4 to 5 hops the "dirty" share typically dilutes to low single-digit percentages, yet a conservatively tuned exchange may still flag the deposit.
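The arithmetic of dilution can be sketched as follows. The model is deliberately simple and assumed for illustration: at every hop the tainted funds make up a fixed fraction (here 40%) of what the next wallet receives, and the exchange flags anything at or above a threshold. Both the fraction and the thresholds are hypothetical.

```python
def diluted_share(mix_fractions):
    """Multiply the tainted fraction at each hop to get the share at the end of the path."""
    share = 1.0
    for fraction in mix_fractions:
        share *= fraction
    return share


def is_flagged(share, threshold):
    """An exchange flags the deposit when the tainted share reaches its threshold."""
    return share >= threshold


share_4_hops = diluted_share([0.4] * 4)  # about 2.6%
share_5_hops = diluted_share([0.4] * 5)  # about 1.0%

CONSERVATIVE_THRESHOLD = 0.01  # hypothetical: flag at 1% exposure
LENIENT_THRESHOLD = 0.05       # hypothetical: flag at 5% exposure

print(f"{share_4_hops:.2%}", is_flagged(share_4_hops, CONSERVATIVE_THRESHOLD))
print(f"{share_5_hops:.2%}", is_flagged(share_5_hops, LENIENT_THRESHOLD))
```

With these assumed numbers the same deposit passes at one exchange and is frozen at another. The difference lies in the threshold, not in the funds.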

Why this works for your defence too

The methodology is symmetric: an independent forensic report built with the same class of tooling shows your graph the way the exchange sees it, including the specific point where the risk arose. The position then becomes concrete: here is the hop, here is the share, here are the documents for that transaction, rather than "I'm honest, trust me". Attempts to "clean the trail" with mixers after a flag backfire: severing the graph is itself a heavy flag. How we apply this in cases is described in the Onyx AML methodology.

Bottom line

Chainalysis is neither an all-seeing eye nor a random-accusation generator; it is a statistical machine with known heuristics and known weaknesses. You can argue with it concretely, in the language of graphs and documents. If your case has already been flagged, assess it or send a request: we'll look at your trail with the same eyes the exchange uses.