What Chainalysis actually sees (and what it doesn't)
To most users, a tool like Chainalysis, TRM or Elliptic is a black box that has somehow decided their funds are "dirty". In reality the general methodology is well documented: clustering, attribution, path scoring. Understanding the mechanics is the first step to disputing the conclusions.
Step 1: clustering — grouping one owner's addresses
Blockchains are pseudonymous: there are billions of addresses but orders of magnitude fewer owners. The engine's first task is clustering, that is, determining which addresses one entity controls. The base heuristics have been public for over a decade: co-spend (addresses that sign inputs of the same transaction are presumed to belong to one wallet), change-address detection, and behavioural patterns (activity hours, typical amounts, recurring routes). Large exchange and service clusters are even easier to identify, because their hot-wallet and consolidation patterns stand out.
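The co-spend heuristic can be sketched with a union-find structure. This is a minimal illustration, not any vendor's implementation: the simplified transaction format (a list of input addresses per transaction) and the function name `cluster_by_cospend` are assumptions. Real engines also add exceptions for CoinJoin-style transactions, where the inputs belong to different owners.

```python
from collections import defaultdict


def cluster_by_cospend(transactions):
    """Group addresses that appear together as inputs of one transaction.

    `transactions` is a hypothetical simplified format: an iterable of
    lists, each holding the input addresses of one transaction.
    """
    parent = {}

    def find(addr):
        parent.setdefault(addr, addr)
        # Path halving keeps the trees shallow.
        while parent[addr] != addr:
            parent[addr] = parent[parent[addr]]
            addr = parent[addr]
        return addr

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    for inputs in transactions:
        if not inputs:
            continue
        first = inputs[0]
        find(first)  # register addresses that only ever spend alone
        for other in inputs[1:]:
            union(first, other)

    clusters = defaultdict(set)
    for addr in list(parent):
        clusters[find(addr)].add(addr)
    return sorted(sorted(members) for members in clusters.values())


txs = [
    ["A", "B"],  # A and B co-spend
    ["B", "C"],  # B and C co-spend, so A, B and C merge transitively
    ["D"],       # D never co-spends with anyone
]
print(cluster_by_cospend(txs))  # [['A', 'B', 'C'], ['D']]
```

The transitive merge is the important part: A and C never appear in the same transaction, yet they end up in one cluster through B. The same property is what makes an over-eager merge expensive, since one wrong link joins two whole clusters.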
Step 2: attribution — pinning a label on the cluster
A cluster by itself is just a set of addresses. The value of Chainalysis-grade databases is in the labels: "this is exchange X's hot wallet", "this is Tornado Cash", "this is scam project Y's cluster". Where labels come from: test purchases and deposits (the vendor sends funds to an exchange and watches where they land), public incidents and court materials, sanctions lists, partner tagging. Crucially: attribution is a database's claim, not a blockchain fact. It can be stale, incomplete and occasionally wrong — and it updates retroactively, which is why last year's transactions suddenly "got dirty".
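The retroactive effect is easier to see with dates attached to labels. The sketch below is hypothetical throughout: the `LABELS` store, the cluster identifier, the dates and the function `label_as_of` are invented for illustration and do not describe any vendor's data model.

```python
from datetime import date

# Hypothetical label store: cluster id -> (label, date the label was added).
LABELS = {
    "cluster-483920": ("scam", date(2024, 3, 1)),
}


def label_as_of(cluster_id, query_date):
    """Return the label the database held for a cluster on `query_date`."""
    entry = LABELS.get(cluster_id)
    if entry is None:
        return None
    label, added_on = entry
    return label if added_on <= query_date else None


# A payment from this cluster was unlabelled when screened in 2023...
print(label_as_of("cluster-483920", date(2023, 6, 1)))  # None
# ...but the same payment is flagged when re-screened a year later.
print(label_as_of("cluster-483920", date(2024, 6, 1)))  # scam
```

Nothing on the blockchain changed between the two queries; only the database did.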
Step 3: path scoring — computing your risk
Your deposit is scored by its links: how many hops to risky clusters, what share of volume arrived from them, which risk category (sanctions weigh more than gambling). Hence the key counterintuitive consequence: your risk score is a property of your funds' path, not of your behaviour. The exchange-side view of this scoring is covered in a separate article.
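A toy version of path scoring makes the three inputs concrete. Everything here is an assumption for illustration: the category weights, the per-hop decay factor and the function name `score_deposit` are invented, and commercial engines use their own parameters.

```python
# Hypothetical category weights: sanctions weigh more than gambling.
CATEGORY_WEIGHT = {"sanctions": 1.0, "scam": 0.8, "mixer": 0.7, "gambling": 0.3}

# Hypothetical decay: each extra hop halves the weight of an exposure.
HOP_DECAY = 0.5


def score_deposit(exposures):
    """Return a risk score in [0, 1] for one deposit.

    `exposures` is a list of (category, hops, share) tuples, where
    `hops` is the distance to the risky cluster (1 = direct) and
    `share` is the fraction of the deposit's volume traced to it.
    """
    score = 0.0
    for category, hops, share in exposures:
        weight = CATEGORY_WEIGHT[category]
        decay = HOP_DECAY ** (hops - 1)
        score += weight * decay * share
    return min(score, 1.0)


# Same 20% share, different category and distance.
direct_sanctions = score_deposit([("sanctions", 1, 0.20)])
distant_gambling = score_deposit([("gambling", 3, 0.20)])
print(direct_sanctions > distant_gambling)  # True
```

Note that none of the inputs describes what the depositor did. The score is computed entirely from where the funds have been.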
What the engine does NOT see
It doesn't see identities: outside KYC touchpoints (exchanges, verified exchange services) the graph is anonymous, so the engine knows "cluster #483920", not a surname. It doesn't see intent: "received a client's payment" and "received a scammer's tranche" are identical to the graph. It doesn't see off-chain context: your contracts, invoices and chats are not in the graph, which is exactly why they must be submitted separately as Source of Funds.
Where false positives come from
Aggressive clustering
Custodial services and exchange services commingle thousands of clients' funds, so sharing a wallet with a risky "neighbour" drags someone else's risk into your trail.
Stale or coarse attribution
A cluster may have changed hands; a service may have been resold; a "high-risk exchange" label sometimes covers perfectly legal regional platforms.
Inherited deep-hop risk
At 4–5 hops the "dirty" share dilutes to single-digit percentages, yet a conservatively tuned exchange may still trigger a flag.
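The dilution can be illustrated with simple arithmetic. The sketch assumes, purely for illustration, that at every hop the tainted funds are commingled with 1.5 times as much clean money and that the exchange flags anything above 1% exposure. Both numbers and the name `diluted_share` are hypothetical.

```python
def diluted_share(initial_share, clean_ratio, hops):
    """Share of tainted funds after `hops` transfers.

    At each hop the incoming amount is assumed to be commingled with
    `clean_ratio` times as much clean money (a simplifying assumption).
    """
    share = initial_share
    for _ in range(hops):
        share = share / (1 + clean_ratio)
    return share


THRESHOLD = 0.01  # hypothetical conservative exchange setting: 1%

for hops in range(1, 6):
    share = diluted_share(1.0, clean_ratio=1.5, hops=hops)
    flagged = share > THRESHOLD
    print(f"hop {hops}: {share:.2%} tainted, flagged={flagged}")
```

Under these assumptions the tainted share is about 2.56% at hop 4 and about 1.02% at hop 5, and both still exceed the 1% threshold.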
Why this works for your defence too
The methodology is symmetric: an independent forensic report built with the same class of tooling shows your graph the way the exchange sees it, including the specific point where the risk arose. The position then becomes concrete: here is the hop, here is the share, here are the documents for that transaction, instead of "I'm honest, trust me". Attempts to "clean the trail" with mixers after a flag backfire: severing the graph is itself a heavy flag. How we apply this in cases is described in the Onyx AML methodology.
Bottom line
Chainalysis is neither an all-seeing eye nor a random-accusation generator. It is a statistical machine with known heuristics and known weaknesses, and you can argue with it concretely, in the language of graphs and documents. If your case has already been flagged, assess it or send a request: we'll look at your trail with the same eyes the exchange uses.