Run your first GL reconciliation

A walkthrough of the GL reconciliation workflow from first invocation to approved output — what happens at each layer and what the reviewer sees.

Published 2026-05-11

This article walks through a GL reconciliation run in Yig Thinker — from invocation to ready-to-ship output. It describes the layers, not the implementation details.

Before starting, you need beta access and a configured deployment. If you do not have access yet, join the waitlist at yig.com/waitlist.

What a GL reconciliation does

A GL reconciliation produces a list of differences for a reviewer, not a single corrected number. The reviewer decides.

A GL reconciliation compares two representations of the same period’s ledger data — typically a source GL system and a downstream system such as a data warehouse, reporting tool, or spreadsheet — and surfaces the differences.

The goal is not to produce a single “correct” number automatically. It is to identify every line where the two sources disagree, explain why each difference might exist, and present those findings to a reviewer.

Step 1 — Invoke the workflow

The first run is a calibration, not a verdict.

Sarah is a controller at ACME US. It is day 4 of the Q2 2026 close. She is running her first GL reconciliation, comparing ACME’s warehouse-loaded trial balance (TB) line by line against the SAP general ledger of record.

From the CLI:

$ yig thinker run gl_recon --period Q2-2026 --entity "ACME US"

From Slack:

@yig-thinker run gl_recon for ACME US Q2

From either surface, the invocation reaches the agent at L2. The planner reads the gl_recon template — the declared sequence of data reads and comparisons — and begins execution.

Step 2 — The agent reads the sources

The reads are scoped to the period and entity specified. Nothing else is read.

The planner calls the data connectors declared in the template. A standard recon makes two reads: the primary source (the GL system) and the comparison source (a warehouse or spreadsheet).

The audit log records each read: which connector, which scope, how many rows. If either read fails — network error, permission denied, source unavailable — the planner halts and records the failure. It does not proceed with partial data.
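
The halt-on-failure rule can be sketched in Python. This is an illustration under stated assumptions, not Yig Thinker's implementation: `ReadFailure`, `read_sources`, and the connector callables are hypothetical names, and the audit log is modelled as a plain list of dictionaries.

```python
from typing import Callable

class ReadFailure(Exception):
    """Raised when any declared source read fails; the run must halt."""

# Hypothetical connector shape: takes (period, entity) and returns rows.
Connector = Callable[[str, str], list[dict]]

def read_sources(connectors: dict[str, Connector], period: str, entity: str,
                 audit_log: list[dict]) -> dict[str, list[dict]]:
    """Read every declared source, or halt without returning partial data."""
    results: dict[str, list[dict]] = {}
    scope = {"period": period, "entity": entity}
    for name, connector in connectors.items():
        try:
            rows = connector(period, entity)
        except Exception as exc:
            # Record the failure, then stop: no comparison on partial data.
            audit_log.append({"connector": name, "scope": scope,
                              "status": "failed", "error": str(exc)})
            raise ReadFailure(f"{name} read failed: {exc}") from exc
        # Record which connector ran, over which scope, and how many rows.
        audit_log.append({"connector": name, "scope": scope,
                          "status": "ok", "rows": len(rows)})
        results[name] = rows
    return results
```

Raising instead of returning whatever was read so far means a caller cannot accidentally compare one complete source against nothing.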

Step 3 — The planner compares the sources

The comparison runs against the declared tolerance, not a default the agent invented.

For each GL line in the primary source, the planner looks for the corresponding line in the comparison source. It records:

Lines that match exactly.
Lines that are present in one source but not the other.
Lines that are present in both but with different amounts.

The comparison result is the raw material for the draft.
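
The three outcomes listed above can be sketched as a small classification function. The function name, the dictionary-of-amounts input, and the result keys are assumptions made for illustration. The article does not spell out how the tolerance interacts with an exact match, so this sketch treats any delta within the tolerance as matched; a tolerance of zero gives exact matching.

```python
from decimal import Decimal

def compare_sources(primary: dict[str, Decimal], comparison: dict[str, Decimal],
                    tolerance: Decimal) -> dict[str, list]:
    """Classify each line as matched, present on one side only, or differing."""
    result: dict[str, list] = {"matched": [], "only_in_primary": [],
                               "only_in_comparison": [], "amount_differs": []}
    # Walk the union of line ids so lines missing from either side are seen.
    for line_id in sorted(primary.keys() | comparison.keys()):
        if line_id not in comparison:
            result["only_in_primary"].append(line_id)
        elif line_id not in primary:
            result["only_in_comparison"].append(line_id)
        else:
            delta = primary[line_id] - comparison[line_id]
            if abs(delta) <= tolerance:
                result["matched"].append(line_id)
            else:
                result["amount_differs"].append((line_id, delta))
    return result
```

The `tolerance` parameter deliberately has no default value, so the caller has to pass the declared tolerance explicitly.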

Step 4 — The agent produces the draft

The output shape is fixed in the template. The content changes; the shape does not.

A standard GL reconciliation draft contains:

A summary: lines compared, matched, differences by type.
A matched-lines section, collapsed by default and expandable.
A differences table: each differing line with amounts, delta, candidate explanation.
Yellow flags: lines the agent cannot place.

Sarah’s first draft renders like this in the Slack thread:

gl_recon · ACME US · 2026-Q2 · v1
─────────────────────────────────────────────────────
1,247 lines compared · 1,241 matched · 6 differences · 2 flags

Differences (collapsed; click to expand)
─────────────────────────────────────────────────────
  line_id   primary   compare   delta     candidate
  ─────────────────────────────────────────────────
  GL-0412   42,100    0         42,100    timing (accrual unposted)
  GL-0718   128,400   128,414   -14       rounding (fx conversion)
  GL-0904   0         128,000   -128,000  ⚑ no counterpart found
  GL-1102   58,000    57,200    800       fx rate mismatch
  GL-1244   0         42,500    -42,500   ⚑ post-period entry?
  GL-1247   12,000    11,800    200       rounding (allocation)

Flags requiring judgment
─────────────────────────────────────────────────────
  ⚑ GL-0904 — large variance, no counterpart in primary
  ⚑ GL-1244 — entry visible in warehouse, missing in primary

The draft is presented to the reviewer at whatever surface the workflow was invoked from.
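
The figures in the sample draft are internally consistent, which a few lines of Python can confirm. The rows are copied from the draft; the tuple layout and variable names are illustrative only.

```python
# Rows copied from the sample draft: (line_id, primary, compare, reported delta)
rows = [
    ("GL-0412", 42_100, 0, 42_100),
    ("GL-0718", 128_400, 128_414, -14),
    ("GL-0904", 0, 128_000, -128_000),
    ("GL-1102", 58_000, 57_200, 800),
    ("GL-1244", 0, 42_500, -42_500),
    ("GL-1247", 12_000, 11_800, 200),
]

# Each reported delta is primary minus compare.
deltas_ok = all(primary - compare == delta
                for _, primary, compare, delta in rows)

# Summary line: lines compared minus lines matched equals the differences listed.
lines_compared, lines_matched = 1_247, 1_241
summary_ok = lines_compared - lines_matched == len(rows)

print(deltas_ok, summary_ok)  # True True
```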

Step 5 — The reviewer acts

The reviewer’s attention belongs on the flagged lines first.

The draft renders the same content on every surface; only the chrome differs.

Surface	What Sarah sees	How she responds
CLI	The draft above as formatted text; prompt asks `[a]ccept · [r]eject · [c]omment <id>`	Types `c GL-0904 "misposted to DE-001; reclass next cycle"` then `a` to accept
Slack	Same draft as a structured message; action buttons under each flag	Replies in-thread to each flag; clicks `Accept all candidates` when done
Excel sidebar	Sidebar table with rows linked to TB cells; clicking GL-0904 scrolls the workbook to row 904	Comments on each flag in the sidebar; staged edits appear as diffs before write-back

On a first run, the reviewer’s order of attention is consistent:

The flagged lines first. These are the lines the agent could not place. Each one needs a human judgment before the run can complete.
The unexplained differences next. These are lines whose candidate explanation does not match Sarah’s expectation of why the line would differ. She can reject the candidate or edit the explanation.
The matched-line spot-check last. This is not needed every cycle, but on a first run Sarah expands a random sample of the matched section and verifies that the agent matched the right rows.

For each yellow-flagged line, Sarah reads the agent’s explanation and makes the call: timing difference, real error, policy item? She records her judgment in a comment. For each difference line with a candidate, she accepts, rejects, or edits it.
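
The order of attention described above can be expressed as a simple queue builder. This is a sketch only: `DraftLine`, its `status` labels, and `review_queue` are hypothetical names, not part of the product.

```python
import random
from dataclasses import dataclass
from typing import Optional

@dataclass
class DraftLine:
    line_id: str
    status: str  # "flag", "difference", or "matched" (hypothetical labels)

def review_queue(lines: list[DraftLine], sample_size: int,
                 seed: Optional[int] = None) -> list[DraftLine]:
    """Order a draft for review: flags, then differences, then a matched sample."""
    flags = [line for line in lines if line.status == "flag"]
    differences = [line for line in lines if line.status == "difference"]
    matched = [line for line in lines if line.status == "matched"]
    # Spot-check: a random sample of matched lines, capped at what exists.
    rng = random.Random(seed)
    sample = rng.sample(matched, min(sample_size, len(matched)))
    return flags + differences + sample
```

Passing a `seed` makes the spot-check sample reproducible, which helps if a later reviewer wants to see which matched rows were checked.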

When Sarah is satisfied, she approves. The approval is recorded in the audit log: reviewer name, timestamp, and a reference to the version of the draft that was approved.

Step 6 — The output ships

The approved output is the data of record from this point on.

It is written to the destination configured for this workflow: a spreadsheet, a reporting system, a close folder, or wherever the team’s reconciled GL data lives.

The audit trail at L3 records the full chain: invocation, data reads, draft, every reviewer action, and the final approved state. The trail is append-only and does not change after the fact.
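
One way to picture an append-only trail is a structure that exposes only an append operation and a read-only view. The class and field names below are assumptions for illustration. In-process immutability is only a convention in Python, so a real trail would rely on guarantees from the storage layer.

```python
from dataclasses import dataclass
from datetime import datetime, timezone

@dataclass(frozen=True)
class AuditEvent:
    kind: str      # e.g. "invocation", "read", "draft", "comment", "approval"
    actor: str
    detail: str
    at: datetime

class AuditTrail:
    """Append-only: events can be added and read, never edited or removed."""

    def __init__(self) -> None:
        self._events: list[AuditEvent] = []

    def append(self, kind: str, actor: str, detail: str) -> AuditEvent:
        event = AuditEvent(kind, actor, detail, datetime.now(timezone.utc))
        self._events.append(event)
        return event

    def events(self) -> tuple[AuditEvent, ...]:
        # Return an immutable snapshot so callers cannot mutate the trail.
        return tuple(self._events)

trail = AuditTrail()
trail.append("invocation", "sarah", "gl_recon ACME US Q2-2026")
trail.append("approval", "sarah", "approved draft v1")
```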

What you should expect across cycles

The flag rate is not a quality metric. It is a calibration curve.

Cycle 1. On Sarah’s first run, the agent has no context for what normal looks like. It errs toward flagging — expect 8–12 flags on a thousand-line TB. Sarah’s comments on each become the deployment’s record of normal.

Cycle 3. By August, the agent has seen three cycles. Routine timing-difference flags have collapsed into the candidate-explanation category. Expect 3–5 flags — the ones that genuinely need judgment.

Cycle 10. A year in, what still gets flagged is material variances, post-period entries, and structural changes. The noise has been resolved into candidates. A reviewer on cycle 10 reviews exceptions; a reviewer on cycle 1 was teaching the deployment what exceptions are.

The agent does not learn this by training on data. It accumulates cycle-level context within the deployment — the reviewer’s resolved-flag history, the workflow definition’s evolution, the candidate-explanation patterns that have been accepted enough times to become defaults. The bet is that the cycle-10 reviewer trusts the agent because they remember teaching it on cycle 1 — and the audit trail makes that history defensible to a reviewer who arrives at cycle 14 having never met cycle 1.
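
The idea of a candidate explanation being "accepted enough times to become a default" can be sketched as a count over the resolved history. The threshold value, the tuple layout, and the function name are all hypothetical; the article gives no number for how many acceptances are enough.

```python
from collections import Counter

PROMOTION_THRESHOLD = 3  # hypothetical value, not taken from the article

def default_explanations(history: list[tuple[str, str, str]],
                         threshold: int = PROMOTION_THRESHOLD) -> dict[str, str]:
    """Map line_id to the explanation accepted at least `threshold` times.

    `history` holds (line_id, explanation, decision) tuples from past cycles.
    """
    accepted = Counter((line_id, explanation)
                       for line_id, explanation, decision in history
                       if decision == "accepted")
    defaults: dict[str, str] = {}
    # most_common() yields the highest counts first, so the most frequently
    # accepted explanation wins when a line has more than one.
    for (line_id, explanation), count in accepted.most_common():
        if count >= threshold and line_id not in defaults:
            defaults[line_id] = explanation
    return defaults
```

Nothing here involves model training: the defaults are derived from the reviewer's recorded decisions, and they can be recomputed from the history at any time.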