Skip to content

aint-vscp/batayan-agent

Folders and files

NameName
Last commit message
Last commit date

Latest commit

Β 

History

8 Commits
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 

Repository files navigation

🧾 Batayan β€” AI that argues with receipts

Batayan is a reasoning agent that decides what you're entitled to β€” and proves every "yes" and "no" with the exact rule and citation β€” built on Microsoft Foundry with Foundry IQ.

tests eval track IQ license

πŸŽ₯ Demo video: docs/batayan-demo.mp4 β€” a ~90-second narrated walkthrough of the three reasoning beats.

Batayan (Filipino: basis, grounds, foundation) is a submission for the Microsoft Agents League Hackathon β€” Reasoning Agents track. It turns an eligibility question into a transparent, multi-step argument: it decomposes the question into atomic rules, retrieves a citation for each rule from a Foundry IQ knowledge base, reasons over the whole set of constraints, and returns a verdict that is either proven or honestly abstained β€” never guessed.

Question      : Am I eligible for DOST-SEI?
β–Έ PLAN β€” decomposed into 6 atomic rule(s)
β–Έ EVIDENCE LEDGER
  [3] financial_need                 βœ— FAIL
      evidence  : grounded "...household income must not exceed PHP 400,000." (as of 2026-01-15)
      check     : Not satisfied: household_income_annual ≀ 400000 (actual: 650000)
      fix       : Household income is above the cap. Consider a merit-only scholarship...
β–Έ VERDICT: βœ— INELIGIBLE β€” Fails 1 rule: financial_need. All 6 decisive rules were grounded and checked.
β–Έ YOU MAY STILL QUALIFY ELSEWHERE
  βœ” CHED Tulong-Dunong Grant β€” All 4 decisive rules are satisfied and individually cited.

🌟 Why this exists

Every day, people are told "you don't qualify" β€” for a scholarship, a benefit, an insurance claim, a visa β€” and never told why. The rules are real, but they're scattered across PDFs, they contradict each other, and the systems that apply them are black boxes. A plain chatbot makes this worse: it will confidently hallucinate a verdict.

Batayan's contract is the opposite: it will never assert a verdict it cannot cite, and it will abstain rather than guess. A denial comes with the exact rule that caused it and the single thing you can do about it.


🎯 What it does

  • Decomposes an eligibility question into atomic, independently-checkable rules (a reasoning step, not a retrieval step).
  • Grounds every rule in a cited source passage β€” source file, section, and the rule's effective as_of date β€” so verdicts stay auditable even after a rulebook changes.
  • Returns three outcomes, not two: ELIGIBLE, INELIGIBLE, or INSUFFICIENT_EVIDENCE. Abstention is a first-class, celebrated result.
  • Gates confidence on coverage: if any decisive rule can't be grounded or the applicant hasn't provided the needed fact, Batayan abstains and tells you exactly what's missing.
  • Turns a "no" into a path forward: auto-refers you to other programs you do qualify for, and gives a concrete fix for each failed rule.
  • Is rulebook-agnostic: the same engine adjudicates student scholarships and an enterprise HR leave policy with no code change β€” only the knowledge base differs.

🧠 How it works β€” Microsoft Foundry + Foundry IQ

Batayan is the same agent in two interchangeable run modes that share one result contract (ReasoningResult):

Offline mode (default) Foundry mode (--engine foundry)
Orchestration Local, deterministic reasoning loop Microsoft Foundry Agent Service runs the multi-step thread
Retrieval & grounding Local cited corpus + quote verification Foundry IQ β€” agentic, permission-aware retrieval with extractive citations
Dependencies Zero (stdlib only) β†’ reliable on stage azure-ai-projects, azure-identity
Purpose The demo runs anywhere, no Azure needed Real cloud integration, env-gated
flowchart LR
    Q["Eligibility question<br/>+ applicant profile"] --> AG

    subgraph AG["Microsoft Foundry Agent Service β€” reasoning loop"]
        direction TB
        D["1 Β· DECOMPOSE<br/>question β†’ atomic rules"] --> R
        R["2 Β· RETRIEVE per rule"] --> G
        G["3 Β· GROUND<br/>verify cited quote + as_of"] --> E
        E["4 Β· EVALUATE predicate<br/>vs. applicant fact"] --> A
        A["5 Β· AGGREGATE<br/>coverage gate β†’ verdict / abstain"]
    end

    R <-->|"agentic, permission-aware,<br/>cited retrieval"| IQ[("Foundry IQ<br/>Knowledge Base<br/>(rulebooks)")]
    AG --> RES["ReasoningResult<br/>verdict + per-rule citations<br/>+ remediation"]
Loading

Why Foundry IQ (the required Microsoft IQ layer)? The reasoning track is scored 60% on reasoning + reliability + accuracy. Foundry IQ is purpose-built for agentic knowledge retrieval: it decomposes complex queries, runs permission-aware retrieval across knowledge sources, and returns grounded answers with extractive citations β€” which is exactly what a trustworthy eligibility verdict needs. The offline engine mirrors this same decompose β†’ retrieve β†’ ground β†’ decide loop so every claim in this README is true in both modes.

πŸ“„ Architecture deep-dive: docs/architecture.md Β· Safety model: docs/safety.md


πŸƒ Run it locally (zero dependencies)

Requires Python 3.10+. The offline engine needs nothing else.

git clone https://github.com/aint-vscp/batayan-agent
cd batayan-agent
pip install -e .            # installs the `batayan` command (no runtime deps)

# Or run with no install at all:
#   $env:PYTHONPATH="src"; python -m batayan demo      (PowerShell)
#   PYTHONPATH=src python -m batayan demo               (bash)

The 60-second demo (three reasoning beats + the reuse beat)

batayan demo

Ask one question

batayan ask "Am I eligible for DOST-SEI?" --applicant examples/liza.json     # ELIGIBLE, fully cited
batayan ask --program "DOST-SEI"          --applicant examples/mateo.json    # INELIGIBLE + referral
batayan ask --program "DOST-SEI"          --applicant examples/aisha.json    # abstains (missing income)
batayan ask --program "parental leave"    --applicant examples/employee-ramon.json   # same engine, HR rulebook
batayan ask --program "DOST-SEI" --applicant examples/liza.json --json       # machine-readable result

Explore

batayan programs     # list every program across knowledge bases
batayan kb           # list knowledge bases and their cited sources
batayan eval         # run the labelled eval set β†’ confusion matrix

Run on Microsoft Foundry (real Foundry IQ)

pip install -e .[foundry]
cp .env.example .env          # fill in your Foundry endpoint, model, Foundry IQ KB
batayan ask --program "DOST-SEI" --applicant examples/liza.json --engine foundry

πŸ”¬ Reliability, measured (not claimed)

Batayan ships its own labelled evaluation set (eval/dataset.json, 15 cases across two knowledge bases, balanced across all three verdict classes). Ground truth is unarguable because the rulebooks are self-authored.

cases: 15   accuracy: 100.0%

expected \ predicted  ELIG      INELIG    ABSTAIN
ELIG                  4         0         0
INELIG                0         7         0
ABSTAIN               0         0         4

β˜… abstention recall (correctly refused to guess): 100.0%
β˜… denial precision  (no wrongful 'ineligible'):    100.0%

batayan eval exits non-zero if any case is wrong, so it doubles as a reproducible reliability gate β€” run it locally, or via the included GitHub Actions workflow (.github/workflows/ci.yml) when Actions is enabled on the account.


πŸ›‘οΈ Safety model

Batayan is engineered around the failure modes that sink LLM agents:

Pitfall Batayan's defense
Hallucinated verdicts Every decisive rule must be grounded in a cited quote or the verdict abstains.
Confidently wrong "no" Denial precision is measured; a denial always names the failing rule + a fix.
Guessing on missing data INSUFFICIENT_EVIDENCE is a native output; coverage gating blocks confident verdicts.
Stale rules Every citation carries an as_of date; verdicts remain auditable across rulebook versions.
Over-permissioned data Maps to Foundry IQ's permission-aware retrieval (Purview sensitivity labels) in cloud mode.
Tampered sources Grounding re-verifies the quote against the source; a missing/edited quote β†’ abstain (tested).

⚠️ The bundled rulebooks are simplified and illustrative for the demo β€” not official scholarship or HR criteria. Batayan's role is to reason over whatever rulebook it is given and prove its verdict, not to be the source of truth.


β™Ώ Accessibility & inclusion

  • Plain-language verdicts β€” the output reads as a human explanation, not a model dump; the on-screen gloss always defines Batayan = basis / grounds.
  • No-color / screen-reader friendly β€” honours NO_COLOR; the trace is linear text that reads cleanly aloud.
  • Low-resource by design β€” the offline engine has zero dependencies and runs on any machine with Python; no GPU, no paid API, no connectivity required to get a grounded answer.
  • Built for the under-served β€” the demo domain targets first-generation students navigating opaque scholarship rules.

πŸ” One engine, any rulebook (Best Use of IQ)

Eligibility is the same reasoning problem as HR benefits, insurance adjudication, and regulatory KYC. Batayan ships two knowledge bases to prove the engine is the product:

  • knowledge/scholarships/ β€” DOST-SEI & CHED Tulong-Dunong (student scholarships)
  • knowledge/hr-leave/ β€” Acme Corp paid parental leave (enterprise policy)

Add a new domain by dropping a manifest.json, a prose rulebook, and a *.rules.json into knowledge/ β€” no code changes.


πŸ—‚οΈ Project layout

batayan-agent/
β”œβ”€β”€ src/batayan/
β”‚   β”œβ”€β”€ schema.py        # ReasoningResult contract (verdict, outcomes, citations)
β”‚   β”œβ”€β”€ knowledge.py     # knowledge bases, offline agentic retrieval, grounding
β”‚   β”œβ”€β”€ reasoner.py      # the multi-step loop: decompose β†’ retrieve β†’ ground β†’ decide
β”‚   β”œβ”€β”€ foundry.py       # env-gated Foundry Agent Service + Foundry IQ integration
β”‚   β”œβ”€β”€ evaluate.py      # eval harness + confusion matrix + abstention recall
β”‚   └── cli.py           # the legible "evidence ledger" CLI
β”œβ”€β”€ knowledge/           # two knowledge bases (cited rulebooks + structured rules)
β”œβ”€β”€ examples/            # applicant/employee profiles for the demo beats
β”œβ”€β”€ eval/dataset.json    # labelled evaluation set (ground truth)
└── tests/               # behavioural tests (eligible / ineligible / abstain / grounding)

πŸ‘₯ Team

Name Role Background
Vash Puno Lead / Engineer / Pitch Microsoft Student Ambassador Β· President, PUP Microsoft Student Community Β· Cloud Architect (Azure)

πŸ“œ License

MIT β€” see LICENSE.


Built for the Microsoft Agents League Hackathon (Reasoning Agents track), 2026. "Batayan" β€” because every answer deserves a basis.

About

Batayan - a grounded, multi-step eligibility-reasoning agent on Microsoft Foundry + Foundry IQ. AI that argues with receipts. (Agents League - Reasoning Agents track)

Topics

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors

Languages