Run your first GL reconciliation
A walkthrough of the GL reconciliation workflow from first invocation to approved output — what happens at each layer and what the reviewer sees.
This article walks through a GL reconciliation run in Yig Thinker — from invocation to ready-to-ship output. It describes the layers, not the implementation details.
Before starting, you need beta access and a configured deployment. If you do not have access yet, join the waitlist at yig.com/waitlist.
What a GL reconciliation does
A GL reconciliation produces a list of differences for a reviewer, not a single corrected number. The reviewer decides.
A GL reconciliation compares two representations of the same period’s ledger data — typically a source GL system and a downstream system such as a data warehouse, reporting tool, or spreadsheet — and surfaces the differences.
The goal is not to produce a single “correct” number automatically. It is to identify every line where the two sources disagree, explain why each difference might exist, and present those findings to a reviewer.
Step 1 — Invoke the workflow
The first run is a calibration, not a verdict. Sarah is a controller at ACME US. It is day 4 of the Q2 2026 close, and she is running her first GL reconciliation: comparing ACME’s warehouse-loaded trial balance (TB) line by line against the SAP general ledger of record.
From the CLI:
$ yig thinker run gl_recon --period Q2-2026 --entity "ACME US"
From Slack:
@yig-thinker run gl_recon for ACME US Q2
From either surface, the invocation reaches the agent at layer 2 (L2). The planner reads the gl_recon template, which declares the sequence of data reads and comparisons, and begins execution.
Step 2 — The agent reads the sources
The reads are scoped to the period and entity specified. Nothing else is read. The planner calls the data connectors declared in the template — for a standard recon, two reads: the primary source (GL system) and the comparison source (warehouse or spreadsheet).
The audit log records each read: which connector, which scope, how many rows. If either read fails — network error, permission denied, source unavailable — the planner halts and records the failure. It does not proceed with partial data.
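The halt-on-failure rule can be sketched in a few lines. This is a minimal illustration, not Yig Thinker's implementation: `read_sources`, `ReadFailure`, `ReadRecord`, and the connector callables are hypothetical names chosen for the example.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

Row = Dict[str, object]
Connector = Callable[[str], List[Row]]  # hypothetical: takes a scope, returns rows


class ReadFailure(Exception):
    """Raised when a declared read fails; the run halts instead of continuing."""


@dataclass
class ReadRecord:
    connector: str
    scope: str
    rows: Optional[int] = None   # row count on success
    error: Optional[str] = None  # failure reason on error


def read_sources(connectors: Dict[str, Connector], scope: str,
                 audit_log: List[ReadRecord]) -> Dict[str, List[Row]]:
    """Run every declared read, log each one, and halt on the first failure."""
    results: Dict[str, List[Row]] = {}
    for name, read in connectors.items():
        try:
            rows = read(scope)
        except Exception as exc:
            # Record the failure, then stop: no comparison on partial data.
            audit_log.append(ReadRecord(name, scope, error=str(exc)))
            raise ReadFailure(f"{name} read failed for {scope}") from exc
        audit_log.append(ReadRecord(name, scope, rows=len(rows)))
        results[name] = rows
    return results
```

The point of the design is that a failed read raises before any result is returned, so a later step never receives one source without the other.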
Step 3 — The planner compares the sources
The comparison runs against the declared tolerance, not a default the agent invented. For each GL line in the primary source, the planner looks for the corresponding line in the comparison source. It records:
- Lines that match exactly.
- Lines that are present in one source but not the other.
- Lines that are present in both but with different amounts.
The comparison result is the raw material for the draft.
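The three outcomes listed above can be sketched as a single pass over both sources. This is an illustrative sketch under stated assumptions: `compare_ledgers` is a hypothetical name, each source is reduced to a mapping from line ID to amount, and the tolerance is always passed in by the caller rather than defaulted. A tolerance of zero gives exact matching. The delta is primary minus comparison, the same sign convention as the sample draft in Step 4.

```python
from decimal import Decimal
from typing import Dict, List, Tuple


def compare_ledgers(primary: Dict[str, Decimal],
                    compare: Dict[str, Decimal],
                    tolerance: Decimal) -> Dict[str, list]:
    """Classify every line ID seen in either source (hypothetical sketch)."""
    matched: List[str] = []
    missing: List[Tuple[str, str]] = []          # (line_id, source it is absent from)
    amount_diff: List[Tuple[str, Decimal]] = []  # (line_id, primary - compare)

    for line_id in sorted(primary.keys() | compare.keys()):
        if line_id not in primary:
            missing.append((line_id, "primary"))
        elif line_id not in compare:
            missing.append((line_id, "compare"))
        else:
            delta = primary[line_id] - compare[line_id]
            if abs(delta) <= tolerance:
                matched.append(line_id)
            else:
                amount_diff.append((line_id, delta))

    return {"matched": matched, "missing": missing, "amount_diff": amount_diff}
```

Using `Decimal` rather than floats avoids introducing rounding differences in the comparison itself.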
Step 4 — The agent produces the draft
The output shape is fixed in the template. The content changes; the shape does not. A standard GL reconciliation draft contains:
- A summary: lines compared, matched, differences by type.
- A matched-lines section, collapsed by default and expandable.
- A differences table: each differing line with amounts, delta, candidate explanation.
- Yellow flags: lines the agent cannot place.
Sarah’s first draft renders like this in the Slack thread:
gl_recon · ACME US · 2026-Q2 · v1
─────────────────────────────────────────────────────
1,247 lines compared · 1,241 matched · 6 differences · 2 flags
Differences (collapsed; click to expand)
─────────────────────────────────────────────────────
line_id   primary   compare      delta   candidate
─────────────────────────────────────────────────────
GL-0412    42,100         0     42,100   timing (accrual unposted)
GL-0718   128,400   128,414        -14   rounding (fx conversion)
GL-0904         0   128,000   -128,000   ⚑ no counterpart found
GL-1102    58,000    57,200        800   fx rate mismatch
GL-1244         0    42,500    -42,500   ⚑ post-period entry?
GL-1247    12,000    11,800        200   rounding (allocation)
Flags requiring judgment
─────────────────────────────────────────────────────
⚑ GL-0904 — large variance, no counterpart in primary
⚑ GL-1244 — entry visible in warehouse, missing in primary
The draft is presented to the reviewer at whatever surface the workflow was invoked from.
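The summary line and the delta column in Sarah's draft follow directly from the differences table. The sketch below reproduces them from the six rows shown above. The `Difference` class and `summary_line` function are hypothetical names for illustration, not part of the product, and the sketch assumes every compared line is either matched or listed as a difference.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Difference:
    line_id: str
    primary: int
    compare: int
    candidate: str
    flagged: bool = False

    @property
    def delta(self) -> int:
        # Same convention as the draft: primary minus comparison source.
        return self.primary - self.compare


def summary_line(lines_compared: int, differences: List[Difference]) -> str:
    """Build the one-line summary from the total and the differences table."""
    matched = lines_compared - len(differences)
    flags = sum(d.flagged for d in differences)
    return (f"{lines_compared:,} lines compared · {matched:,} matched · "
            f"{len(differences)} differences · {flags} flags")


differences = [
    Difference("GL-0412", 42_100, 0, "timing (accrual unposted)"),
    Difference("GL-0718", 128_400, 128_414, "rounding (fx conversion)"),
    Difference("GL-0904", 0, 128_000, "no counterpart found", flagged=True),
    Difference("GL-1102", 58_000, 57_200, "fx rate mismatch"),
    Difference("GL-1244", 0, 42_500, "post-period entry?", flagged=True),
    Difference("GL-1247", 12_000, 11_800, "rounding (allocation)"),
]

print(summary_line(1_247, differences))
# 1,247 lines compared · 1,241 matched · 6 differences · 2 flags
```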
Step 5 — The reviewer acts
The reviewer’s attention belongs on the flagged lines first. The draft renders the same content on every surface; the chrome differs.
| Surface | What Sarah sees | How she responds |
|---|---|---|
| CLI | The draft above as formatted text; prompt asks [a]ccept · [r]eject · [c]omment <id> | Types c GL-0904 "misposted to DE-001; reclass next cycle" then a to accept |
| Slack | Same draft as a structured message; action buttons under each flag | Replies in-thread to each flag; clicks Accept all candidates when done |
| Excel sidebar | Sidebar table with rows linked to TB cells; clicking GL-0904 scrolls the workbook to row 904 | Comments on each flag in the sidebar; staged edits appear as diffs before write-back |
On a first run, the reviewer’s order of attention is consistent:
- The flagged lines first. These are the lines the agent could not place. Each one needs a human judgment before the run can complete.
- The unexplained differences next. Lines with a candidate explanation that does not match Sarah’s expectation of why this line would differ. She can reject the candidate or edit the explanation.
- The matched-line spot-check last. Not every cycle; on a first run, expand a random sample of the matched section and verify the agent matched the right rows.
For each yellow-flagged line, Sarah reads the agent’s explanation and makes the call: timing difference, real error, policy item? She records her judgment in a comment. For each difference line with a candidate, she accepts, rejects, or edits it.
When Sarah is satisfied, she approves. The approval is recorded in the audit log: reviewer name, timestamp, and a reference to the version of the draft that was approved.
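As a rough sketch of the approval step: the record carries the three fields named above, and approval is refused while any flagged line lacks a recorded judgment. The names `Approval` and `approve`, and the idea of passing flag comments as a dictionary, are assumptions made for this example rather than the product's actual interface.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Dict, Optional


@dataclass(frozen=True)
class Approval:
    reviewer: str
    approved_at: datetime
    draft_version: str


def approve(reviewer: str, draft_version: str,
            flag_comments: Dict[str, Optional[str]]) -> Approval:
    """Return an approval record, or refuse if any flag has no judgment yet."""
    unresolved = [line_id for line_id, comment in flag_comments.items()
                  if not comment]
    if unresolved:
        raise ValueError(
            f"flags still need a judgment: {', '.join(unresolved)}")
    # Timestamp in UTC so the record reads the same for every reviewer.
    return Approval(reviewer, datetime.now(timezone.utc), draft_version)
```

Calling `approve("Sarah", "v1", {"GL-0904": "misposted to DE-001; reclass next cycle", "GL-1244": None})` raises, because GL-1244 has no comment yet.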
Step 6 — The output ships
The approved output is the data of record from this point on. It is written to the destination configured for this workflow — a spreadsheet, a reporting system, a close folder, or wherever this team’s reconciled GL data lives.
The audit trail at layer 3 (L3) records the full chain: invocation, data reads, draft, every reviewer action, and the final approved state. The trail is append-only and does not change after the fact.
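One way to picture an append-only trail is a log that exposes an append operation and a read-only view, and nothing else. The sketch below is an in-memory illustration only. `AuditTrail` and `AuditEvent` are hypothetical names, and a real deployment would need durable, tamper-resistant storage that this example does not attempt.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class AuditEvent:
    seq: int     # position in the trail, assigned on append
    kind: str    # e.g. "invocation", "read", "draft", "review", "approval"
    detail: str


class AuditTrail:
    """In-memory sketch: events can be appended and read, never edited."""

    def __init__(self) -> None:
        self._events: List[AuditEvent] = []

    def append(self, kind: str, detail: str) -> AuditEvent:
        event = AuditEvent(len(self._events), kind, detail)
        self._events.append(event)
        return event

    def events(self) -> Tuple[AuditEvent, ...]:
        # A tuple of frozen records: callers cannot reorder or rewrite history.
        return tuple(self._events)
```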
What you should expect across cycles
The flag rate is not a quality metric. It is a calibration curve.
Cycle 1. On Sarah’s first run, the agent has no context for what normal looks like. It errs toward flagging — expect 8–12 flags on a thousand-line TB. Sarah’s comments on each become the deployment’s record of normal.
Cycle 3. By August, the agent has seen three cycles. Routine timing-difference flags have collapsed into the candidate-explanation category. Expect 3–5 flags — the ones that genuinely need judgment.
Cycle 10. A year in. Material variances, post-period entries, and structural changes flag. The noise has been resolved into candidates. A reviewer on cycle 10 reviews exceptions; a reviewer on cycle 1 was teaching the deployment what exceptions are.
The agent does not learn this by training on data. It accumulates cycle-level context within the deployment — the reviewer’s resolved-flag history, the workflow definition’s evolution, the candidate-explanation patterns that have been accepted enough times to become defaults. The bet is that the cycle-10 reviewer trusts the agent because they remember teaching it on cycle 1 — and the audit trail makes that history defensible to a reviewer who arrives at cycle 14 having never met cycle 1.
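The idea of a candidate explanation becoming a default after enough acceptances can be pictured as a simple counter with a threshold. This is a hypothetical sketch, not a description of how the deployment stores its context: the `CandidateHistory` class, the `promote_after` threshold, and keying the history by line ID are all assumptions made for illustration.

```python
from collections import Counter
from typing import Optional


class CandidateHistory:
    """Counts reviewer acceptances per (line, candidate) pair."""

    def __init__(self, promote_after: int) -> None:
        # Hypothetical threshold: acceptances needed before a candidate
        # is offered as the default explanation for that line.
        self.promote_after = promote_after
        self._accepted: Counter = Counter()

    def record_accept(self, line_id: str, candidate: str) -> None:
        self._accepted[(line_id, candidate)] += 1

    def default_for(self, line_id: str) -> Optional[str]:
        """Return the most-accepted candidate at or above the threshold."""
        eligible = [(count, candidate)
                    for (lid, candidate), count in self._accepted.items()
                    if lid == line_id and count >= self.promote_after]
        return max(eligible)[1] if eligible else None
```

Nothing here is a trained model: the history is a count of explicit reviewer decisions, which is what keeps it explainable from the audit trail.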