Project · Architecture
Every month, accounting teams spend weeks turning raw transactions into financial statements. An agent can absorb this work — but only by earning trust one task at a time, proving itself through independent verification, and getting better every cycle.
At the end of every month, a company needs to produce financial statements — how much money came in, how much went out, what the company owns, what it owes. These numbers have to be right. The CEO, the board, investors, and auditors all rely on them.
Getting from raw transactions to trustworthy statements is called closing the books. It typically takes 5–10 business days — longer at companies with multiple entities or legacy systems — and follows six steps:
1. Capture every transaction — payments, invoices, payroll — in the company’s central financial record.
2. Add entries the system can’t capture automatically: estimated expenses, the monthly portion of annual contracts, equipment depreciation.
3. Prove the company’s records match reality. Does the cash in the ledger match the bank? Does the detail add up to the totals?
4. Write explanations for anything significant: “Expenses up 23% because we opened the Dublin office and hired three people.”
5. Generate the income statement, balance sheet, and cash flow statement in the formats the board, investors, and auditors expect.
6. The controller and CFO review everything. Auditors may request supporting documents. Then the books are “closed.”
The goal is to get this from 12 days to one.
Most of the close isn’t decision-making. Roughly 65% of it is mechanical work that follows the same patterns every month: the same entries, the same reconciliations, the same reports. This work matters — errors mean wrong financial statements — but it doesn’t require the controller’s expertise. It requires their time. That’s the opening.
An agent that absorbs the mechanical work — but only after it earns the right to.
The instinct with AI is to automate everything at once. That’s wrong for accounting. Financial statements are legal documents. They get audited. Errors have real consequences. You can’t hand an AI the keys on day one.
Instead, the agent earns trust one task at a time. It starts by watching. Then it drafts entries for the controller to review. Once it has proven reliable for a specific type of entry — after months of getting it right — it starts posting automatically, with a full audit trail.
Depreciation might be fully automated in three months. Revenue recognition might take a year. Tax work might always need a human. And that’s fine. The agent doesn’t need to own everything. It needs to own enough to turn a 12-day close into a 1-day close.
Four levels. Each task advances independently, based on its accuracy track record. The trajectory below is illustrative — the framework, not documented customer state.
1. **Watches.** Human does the work. Agent observes and learns the pattern.
2. **Drafts.** Agent prepares the entry. Human reviews everything before it posts.
3. **Posts.** Agent posts automatically. Human only reviews flagged items.
4. **Owns.** Fully automated with audit trail. Human sees a summary.
| Task | Plausible level after 12 months | Why |
|---|---|---|
| Depreciation | Owns | Same calculation every month. Pure rule application. Candidate for early automation. |
| Payroll accrual | Owns | Pulled from payroll system on a known schedule. Reconciles cleanly. |
| Bank reconciliation | Posts | Most line items auto-match given good integrations. Humans handle the exceptions. |
| Vendor cost estimates | Posts | Once estimates track actuals closely for several months, the agent posts and humans spot-check. |
| Revenue recognition | Drafts | ASC 606 requires judgment on variable consideration and performance obligations. Drafted, not posted. |
| Intercompany & allocations | Watches | Multi-entity logic and policy choices that change with corporate structure. Human-led. |
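The trust ramp can be sketched as a small per-task state machine. This is a minimal illustration, not documented product behavior: the three-month promotion streak and the reset-to-drafting rule after an error are assumptions chosen to match the framework described above (advancement driven by the reconciliation track record, errors constrained to one task).

```python
from enum import IntEnum

class TrustLevel(IntEnum):
    WATCHES = 1   # human does the work; agent observes
    DRAFTS = 2    # agent prepares entries; human reviews everything
    POSTS = 3     # agent posts; human reviews only flagged items
    OWNS = 4      # fully automated with audit trail

# Hypothetical threshold: consecutive months of clean reconciliations
# required before a task advances one level.
PROMOTION_STREAK = 3

def next_level(level: TrustLevel, clean_streak: int) -> TrustLevel:
    """Advance one level after a sustained accuracy streak; never skip levels."""
    if level < TrustLevel.OWNS and clean_streak >= PROMOTION_STREAK:
        return TrustLevel(level + 1)
    return level

def after_error(level: TrustLevel) -> TrustLevel:
    """An error resets trust for this task only (assumed: back to drafting)."""
    return min(level, TrustLevel.DRAFTS)
```

Because each task carries its own level and streak, a mistake on revenue recognition has no effect on depreciation: the blast radius is one task.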
The strategic insight: the trust ramp is the moat. After 12 months of earned trust, switching to a different system means every task drops back to Level 1. All those months of proven reliability? Gone. You can’t export earned trust.
How does the agent demonstrate it’s getting things right? The same way accountants do: by reconciling.
Every time the agent posts an entry, it checks the result against an independent source. Cash in the ledger matches cash at the bank. Customer invoices add up to the accounts receivable total. Revenue schedules tie to the balance sheet. If any check fails, the agent flags the problem before anyone reviews the work.
This is structurally better than the agent saying “I’m 94% confident this is right.” Confidence is a self-assessment. A reconciliation is independent evidence. It doesn’t prove everything — balance ties don’t guarantee correct classification, cutoff, or completeness, and those need their own controls (the safety layers in the next section). But it’s a verifiable signal the agent can’t talk its way past. And the reconciliation track record is what drives the trust ramp — a task advances not because someone decides to trust it, but because its entries have verified successfully for months in a row.
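The core of a reconciliation check is simple: two independently maintained records must agree. The figures and the zero-tolerance default below are illustrative, a sketch of the idea rather than the production check.

```python
from decimal import Decimal

def reconcile(ledger: Decimal, independent: Decimal,
              tolerance: Decimal = Decimal("0.00")) -> bool:
    """Pass only when two independently maintained records agree within
    tolerance: evidence, not self-reported confidence."""
    return abs(ledger - independent) <= tolerance

# Illustrative figures: ledger cash vs. the bank statement.
cash_ties = reconcile(Decimal("1204350.17"), Decimal("1204350.17"))

# A broken tie is flagged before any human reviews the work.
ar_ties = reconcile(Decimal("88200.00"), Decimal("88700.00"))
```

The design point is that the agent never grades its own work: the right-hand side of every comparison comes from a source the agent doesn’t control.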
No single check protects the books. Multiple independent safety layers run on every entry — any one can catch an error, even if the others miss it.
The best AI agent systems in production share a design principle: never rely on one safety mechanism. Anthropic’s Claude Code layers permissions, hooks, a classifier, and sandboxed execution on top of each other — each independent, any one can block. The close agent applies the same principle:
1. **Policy.** Does this entry comply with the company’s accounting rules? A depreciation entry that violates the capitalization threshold gets blocked — regardless of the task’s trust level.
2. **Materiality.** Is this entry unusually large or otherwise atypical for this account? A $500K entry to an account that normally sees $50K gets flagged for human review, even if the task is fully automated.
3. **Reconciliation.** Does the ledger still tie to its independent sources? If posting an entry breaks a reconciliation, the agent flags it immediately.
4. **Cross-check.** Do all the entries make sense together? Debits equal credits, intercompany balances net to zero, and the trial balance is in balance.
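Defense-in-depth reduces to one property: every layer is an independent predicate, and any single failure blocks auto-posting. The sketch below assumes a hypothetical materiality rule (flag anything over 5x the account’s typical amount); the field names and threshold are illustrative.

```python
from dataclasses import dataclass

@dataclass
class Entry:
    account: str
    amount: float
    debits_equal_credits: bool
    within_policy: bool
    typical_amount: float
    reconciles: bool

# Four independent layers, mirroring policy, materiality,
# reconciliation, and cross-check. None consults the others.
def policy_check(e: Entry) -> bool:
    return e.within_policy

def materiality_check(e: Entry) -> bool:
    return e.amount <= 5 * e.typical_amount  # assumed 5x threshold

def reconciliation_check(e: Entry) -> bool:
    return e.reconciles

def cross_check(e: Entry) -> bool:
    return e.debits_equal_credits

LAYERS = [policy_check, materiality_check, reconciliation_check, cross_check]

def can_auto_post(entry: Entry) -> bool:
    # Every layer must pass; any one can block, even if the rest miss it.
    return all(layer(entry) for layer in LAYERS)
```

Note what the function takes as input: the entry itself, never the agent’s explanation of the entry.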
Evaluate the entry, not the explanation. Claude Code’s safety classifier evaluates what the agent is doing, not what the agent says about what it’s doing. The close agent applies the same principle — when deciding whether an entry can auto-post, the system looks at the amounts, the accounts, and the reconciliation result. It ignores the agent’s reasoning. If the entry is right, the evidence proves it. If it’s wrong, no argument should let it through.
Extensible by each company. The system supports hooks — custom checks that plug into the safety layers without changing the agent. A pharmaceutical company might flag R&D expenses above $25K. A public company might log every auto-posted entry to a compliance workpaper. Each company configures its own.
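A hook system like this could look as simple as a registry of company-defined checks. Everything here is a hypothetical sketch: the registry, the `flag`/`pass` results, and the two example hooks are invented to illustrate the extension point, not Rillet’s actual API.

```python
HOOKS = []

def hook(fn):
    """Register a custom check that runs alongside the built-in layers."""
    HOOKS.append(fn)
    return fn

@hook
def flag_large_rnd(entry: dict) -> str:
    # e.g. a pharma company flags R&D expenses above $25K for review
    if entry["account"] == "R&D" and entry["amount"] > 25_000:
        return "flag"
    return "pass"

@hook
def log_auto_posts(entry: dict) -> str:
    # e.g. a public company logs every auto-posted entry to a workpaper
    if entry.get("auto_posted"):
        print(f"workpaper: {entry['account']} {entry['amount']}")
    return "pass"

def run_hooks(entry: dict) -> str:
    """Any single hook can flag an entry for human review."""
    results = [h(entry) for h in HOOKS]
    return "flag" if "flag" in results else "pass"
```

Because hooks only observe entries and return a verdict, a company can add arbitrarily specific controls without touching the agent itself.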
Two reinforcing loops. One absorbs manual work. The other escalates trust. Together they compound.
The first loop creates new automated tasks; the second advances their trust level. The close gets shorter every month.
Illustrative trajectory for a single growth-stage SaaS customer. Real-world reference point: Postscript ($100M+ ARR) closes in three days on Rillet today.
| Month | Manual entries | Agent-drafted | Auto-posted | Close days |
|---|---|---|---|---|
| 1 | 45 | 0 | 0 | 12 |
| 3 | 30 | 15 | 8 | 8 |
| 6 | 15 | 12 | 25 | 4 |
| 12 | 5 | 5 | 42 | 1 |
The agent doesn’t improvise. It executes accounting rules that the controller sets and the auditor can inspect.
Some decisions are clear-cut: “Depreciate laptops over 3 years, straight-line.” The agent runs these automatically. Other decisions require judgment: “How much should we estimate for this vendor’s invoice that hasn’t arrived?” The agent proposes an answer, the human decides.
Capitalization thresholds, depreciation schedules, lease amortization methods, intercompany eliminations once mappings are set. Codified, versioned, auditable. The auditor can inspect the exact rule applied to any entry.
Revenue recognition (variable consideration, performance obligations), accrual estimates, vendor classifications, contract modifications. The agent proposes based on data and history. The human reviews and decides. Over time, as proposals prove accurate, oversight lightens.
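A deterministic rule from the policy engine can be sketched directly from the example above. The laptop cost is an illustrative figure; the point is that the rule is plain code an auditor can read, version, and replay against any entry.

```python
from decimal import Decimal

def straight_line_monthly(cost: Decimal, years: int,
                          salvage: Decimal = Decimal("0")) -> Decimal:
    """Codified, versioned, auditable: the exact rule applied to a
    depreciation entry can be inspected and re-run later."""
    return (cost - salvage) / (years * 12)

# "Depreciate laptops over 3 years, straight-line": a $2,160 laptop
# (illustrative cost) yields a fixed monthly entry.
monthly = straight_line_monthly(Decimal("2160"), 3)
```

Judgment calls get no such function. An accrual estimate is a proposal with supporting data, and a human makes the decision.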
Beyond the rules, the agent accumulates knowledge a senior controller carries in their head: “Deloitte always samples the 10 largest customers.” “AWS invoices arrive on the 5th but cover the prior month.” “The Q4 revenue spike is annual renewals — don’t flag it.” “CFO wants GAAP revenue on the P&L but ARR on the KPI page.”
Instead of a month-end sprint, the agent works continuously — categorizing transactions as they arrive, reconciling daily, preparing estimates as data comes in. By the time the period ends, the work is mostly done.
The agent acts as a coordinator. It decomposes the close into independent tasks and runs them in parallel — depreciation, payroll, and bank reconciliation all execute simultaneously because none depend on the others. Each task can be handled by a specialized sub-process tuned for that specific area, and the coordinator assembles the results.
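The coordinator pattern can be sketched with a standard fan-out/fan-in. The three task functions are stand-ins for specialized sub-processes; their names and return values are invented for illustration.

```python
import concurrent.futures

# Stand-ins for specialized sub-processes; in the real system each
# would do actual ledger work. None depends on the others, so the
# coordinator can run them simultaneously.
def run_depreciation():
    return ("depreciation", "posted")

def run_payroll_accrual():
    return ("payroll accrual", "posted")

def run_bank_reconciliation():
    return ("bank reconciliation", "2 exceptions flagged")

INDEPENDENT_TASKS = [run_depreciation, run_payroll_accrual,
                     run_bank_reconciliation]

def close_coordinator() -> dict:
    """Fan the independent tasks out in parallel, then assemble results."""
    with concurrent.futures.ThreadPoolExecutor() as pool:
        futures = [pool.submit(task) for task in INDEPENDENT_TASKS]
        return dict(f.result() for f in futures)

results = close_coordinator()
```

Tasks with real dependencies (say, allocations that need final expense totals) would simply be scheduled after their inputs complete; only the independent ones run concurrently.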
The most time-consuming part of the close isn’t posting entries — it’s explaining what changed. “Why is spending up 23%?” This is called flux analysis, and it usually takes 2–3 days.
The agent can draft these explanations because it has the entry-level detail, the policy context, and the institutional memory to tell the story:
“Operating expenses increased $312K (23%) month-over-month. Three items explain 94% of the change: (1) $180K from the new Dublin office lease, which started April 1. (2) $45K in one-time legal fees for the Acme acquisition that closed April 18. (3) $67K from three new sales hires who started mid-month — partial month impact; the full run-rate will be $110K/month. The remaining $20K is within normal variance. Excluding one-time items, run-rate expenses are 8% above budget, driven entirely by the Dublin lease which was approved after the budget was finalized.”
Every fact traces to a specific entry, a specific rule, or a specific data source. The controller can verify any claim with a click. They edit the narrative — they don’t write it from scratch.
Not every accounting platform can build this. Three prerequisites rule out the incumbents.
The agent needs continuous reconciliation and incremental period-close work, not a monthly catch-up. Rillet’s ledger and reporting layer were built to update incrementally. Legacy ERPs like NetSuite and Sage Intacct lean on batched consolidation, allocations, and reporting jobs — serviceable for a 10-day close, awkward for a 1-day one.
The agent needs accounting rules to be machine-readable, not buried in documentation. Rillet was built with AI as a primary consumer of its data — its existing AI already operates inside the ledger. Legacy ERPs would need to rebuild their data model, breaking compatibility with thousands of existing customers.
The absorption loop depends on new data sources being easy to connect. Rillet has native integrations with Stripe, Salesforce, Ramp, Brex, Rippling, and 12,000+ banks. When the agent proposes a new integration, it’s a configuration change — not a development project.
Rillet’s core customers — SaaS companies with complex revenue, multiple entities, and fast growth — are the ideal training ground. High volume, complex contracts, monthly cycles, and sophisticated CFOs mean the agent gets hard reps in a structured environment. With $100M+ in funding from Sequoia, Andreessen Horowitz, and ICONIQ, they have the runway to build it.
Vertical AI companies win by going deep enough that no one above or below them can follow. Rillet’s close agent fits a pattern that’s already producing category-defining companies.
The most valuable AI companies emerging right now aren’t building general-purpose tools. They’re going deep into a single regulated, high-stakes domain and owning the workflow end-to-end.
Trained on case law, integrated into legal workflows, accumulates firm-specific knowledge. A general LLM can draft a contract. Harvey knows how your firm drafts contracts for this type of deal.
Owns the customer service conversation end-to-end — not a chatbot bolted on, but an agent that resolves issues, processes refunds, and learns a brand’s policies. Replaces the workflow, not just the interface.
Lives inside the ledger, posts entries, reconciles balances, earns trust per-task over months. Not AI for accounting — AI that does accounting.
The pattern: pick a domain with regulatory gravity (compliance makes shortcuts impossible), structured workflows (the work follows repeatable patterns), and high-stakes outputs (errors have real consequences). Then build the AI into the system of record — don’t bolt it on top.
NetSuite, Sage Intacct, and the legacy ERP vendors face three structural barriers:
Legacy ERPs batch-process transactions overnight. The close agent needs real-time ingestion and continuous reconciliation. This isn’t a feature gap — it’s a foundational architecture difference that means rebuilding the core product.
NetSuite has tens of thousands of customers on its current data model. Every schema change risks breaking existing integrations. Rillet’s data model was designed for AI consumption from day one. Incumbents can’t get there without alienating the customers paying the bills today.
Even if a legacy ERP shipped a comparable agent tomorrow, every customer starts at Level 1. Twelve months in, a Rillet customer would have many tasks at Level 3 or 4 — verified reconciliations, institutional memory, tuned safety. That earned trust doesn’t transfer. It’s a state you have to earn.
Could OpenAI or Anthropic build this themselves? They have the best models. But the model is maybe 20% of the value.
The model provides reasoning, pattern recognition, and natural language. Rillet provides everything else: the ledger where entries post, integrations to Stripe and banks and payroll, GAAP compliance infrastructure, and company-specific state accumulated over months. Labs don’t want to build a general ledger, integrate with 12,000 banks, or carry the liability of posting entries. They want to sell the model. Rillet is the best customer for that model — not a competitor to it.
The hardest problem in AI right now isn’t capability — it’s deploying capability responsibly. Anthropic builds constitutional AI, RLHF, layered classifiers, and sandboxed execution so that powerful models can be deployed without unacceptable risk. The harness is what makes the capability usable.
Rillet faces the same problem, domain-specific. The model can already reason about accounting — but you can’t point it at a general ledger and say “close the books.” The hard problem isn’t “can the AI do accounting?” It’s “how do you let it?” The answer is the same design pattern:
| AI safety pattern | Frontier AI (Anthropic) | Accounting AI (Rillet) |
|---|---|---|
| Earned autonomy | RLHF, constitutional AI — alignment earned through training | Trust ramp — autonomy earned per-task through verified accuracy |
| Defense-in-depth | Permissions, classifiers, hooks, sandbox — independent layers | Policy, materiality, reconciliation, cross-check — any one catches errors |
| Output evaluation | Classifier evaluates the action, not the model’s reasoning | System evaluates the entry itself, not the agent’s explanation |
| Independent verification | Red-teaming and interpretability — external proof of correctness | Reconciliation — independent proof two records agree |
This parallel isn’t cosmetic. It reveals what Rillet is actually building: the AI safety infrastructure for financial systems. The model is interchangeable — swap GPT-4 for Claude for Gemini, the architecture works the same. What’s not interchangeable is the domain-specific harness that makes any model safe to deploy against a real ledger.
The counterintuitive insight: the safety architecture is what lets Rillet move fast, not what holds it back. Without it, every new capability requires manual risk assessment and careful rollout. With it, Rillet can ship a new capability every week — because every capability enters the trust ramp at Level 1. It watches. It drafts. It proves itself. Errors in draft mode are free. A mistake on one task type resets trust for that task only — everything else keeps running. The architecture constrains the blast radius automatically.
This is also a compounding moat. Every month the agent runs, the safety infrastructure gets more tuned — tighter materiality thresholds, more refined policy rules, richer institutional memory about what “normal” looks like. A competitor doesn’t just lack the trust data. They lack the safety calibration that makes the trust data meaningful.
The expansion opportunity runs along two axes: deeper into the workflow (what the agent does) and upmarket in company scale (who it does it for). Both compound from the same foundation.
Close automation is the entry point, not the ceiling. Every month the agent runs, it accumulates data, context, and trust that unlock adjacent workflows:
The expansion logic is simple: every adjacent workflow depends on the same ledger data, the same company context, and the same trust relationship. An FP&A tool without ledger access is guessing. An audit tool without the entry-level detail is generating templates. Rillet’s agent already has the data, the context, and the trust. Each new workflow is incremental — not a cold start.
Rillet starts with growth-stage SaaS companies — the ideal training ground. But the close agent is the mechanism that pulls the product upmarket over time. Each tier adds complexity the agent learns to handle, and that complexity becomes the barrier to entry for the next competitor.
The upmarket pull is structural. Each tier’s complexity is what the agent learns by operating in the tier below. Multi-entity consolidation is just the single-entity close repeated with intercompany elimination. SOX compliance is the safety architecture formalized into regulatory language — the audit trail, policy engine, and reconciliation record already exist. The agent doesn’t need to be rebuilt for enterprise. It needs enough reps at scale.
The contract values tell the story: growth-stage pays $50–100K/year for close automation, mid-market with multi-entity and SOX pays $250–500K, enterprise running continuous close across global entities is $1M+. The product gets more valuable as the customer gets more complex — and the complexity is what makes it harder for competitors to follow.
The agent doesn’t eliminate the accounting team. It changes what the team spends its time on — and what the company needs to hire for.
A typical SaaS company doing $50–200M in revenue employs 15–25 people in finance and accounting. Most exist to handle volume.
60% of the team executes mechanical workflows. Smart people doing work beneath their training — because someone has to, and the systems don’t.
As the agent absorbs mechanical work over 12–24 months, the team composition shifts. Not all at once — it follows the trust ramp. As tasks move from Level 1 to Level 4, the human time required for each task drops toward zero. The shape below is illustrative for a typical $50–200M ARR SaaS finance org — magnitudes will vary by company.
| Role | Today (20) | With agent (10–12) | What changes |
|---|---|---|---|
| Staff accountants | 6 | 1–2 | Agent handles routine entries and recs. Remaining staff handle exceptions and complex one-time transactions. |
| AP/AR specialists | 3 | 1 | Automated matching and payment processing. One person manages exceptions and vendor relationships. |
| Junior analysts | 3 | 1 | Agent generates reports and drafts narratives. Analyst focuses on insight, not data assembly. |
| Controller | 1 | 1 | Role transforms. No longer assembling the close — now reviewing a finished package and making judgment calls. |
| FP&A | 2 | 3–4 | Grows. Freed-up headcount and budget shift here. More scenario modeling, business partnering, strategic analysis. |
| Strategic finance | 0 | 1–2 | New. M&A analysis, unit economics, investor relations. Roles that didn’t exist because there was no bandwidth. |
| CFO + VP Finance | 2 | 2 | Same headcount, different time allocation. Less firefighting, more strategy. |
The headcount change understates the transformation. What matters is the value of the hours being spent. The same team, restructured, produces dramatically more valuable output — even with fewer people. The percentages and dollar figures below are illustrative, not benchmarks.
- Before: ~400 person-hours/month on close.
- After: ~240 person-hours/month on close + strategy.
Today the controller spends most of their time managing the process — assigning tasks, chasing deadlines, assembling the package. It’s project management disguised as accounting.
- **Today:** Manages 6+ people through a 12-day close. Reviews every entry. Builds the financial package manually.
- **Month 6:** Agent handles routine entries. Controller reviews flagged items and exceptions. Close is 4 days.
- **Month 12:** Reviews a finished package. Makes judgment calls on estimates and classifications. Close is 1 day.
- **Beyond:** Partners with the CFO on business decisions. Uses real-time financial data for forward-looking analysis.
This isn’t a demotion — it’s an upgrade. The controller’s judgment, auditor relationships, and business context become more valuable. They stop spending that expertise on formatting spreadsheets and start spending it on decisions that affect the business. And the work that was never getting done starts happening:
- Instead of delivering backward-looking reports two weeks after month-end, FP&A partners with department heads in real time: “Your engineering spend is trending 15% over budget — here are three options.”
- The agent surfaces anomalies continuously, not once a month. Cash flow problems, margin erosion, concentration risk — flagged as they develop, not discovered during close.
Accounting teams spend most of the close on mechanical work that follows the same patterns every month. An agent can absorb this work — but only by earning trust one task at a time, proving each entry through independent reconciliation, and compounding that reliability month over month. Multiple independent safety layers ensure no single failure lets an error through. Two loops drive the system forward: one absorbs manual work into automation, the other escalates the agent’s autonomy as it proves itself. After twelve months, a 12-day close becomes a 1-day close — and the team’s composition inverts from 60% execution to 70% strategy.

The moat is structural: earned trust, institutional memory, and tuned safety checks all reset to zero if you leave. Legacy ERPs can’t rebuild their architecture. Frontier labs won’t build a ledger.

The close is the wedge — and the same data, context, and trust relationship expand naturally into FP&A, audit, treasury, and strategic finance. The result isn’t a smaller finance function. It’s a fundamentally better one: fewer people reconciling accounts, more people driving business decisions, and a CFO who operates with real-time financial intelligence instead of two-week-old reports.