# Reconciler PoC Validation: Design Document

> **Status:** Draft
> **Author:** @prox
> **Date:** 2026-03-06
> **Depends on:** [NetBird Reconciler Design](2026-03-03-netbird-reconciler-design.md)

## Goal

Validate the reconciler end-to-end on a fresh, isolated NetBird instance before pointing it at production. Prove that:

1. Declaring state in `netbird.json` → reconcile → resources appear in NetBird.
2. Event poller detects peer enrollment and renames the peer.
3. State export from a live NetBird instance produces a valid `netbird.json`.

## Scope

### In scope

- Deploy a self-contained stack on VPS-A (`vps-a.networkmonitor.cc`): fresh NetBird, Caddy, Gitea, and reconciler, all via Docker Compose.
- `GITEA_ENABLED` feature flag so the reconciler works without Gitea integration.
- State export tool: `GET /export` endpoint + `--export` CLI flag.
- Core reconcile: groups, setup keys, policies created via `/reconcile`.
- Event poller: detect enrollment, rename peer, with or without Gitea commit-back.

### Out of scope (deferred)

- Enrollment pipeline integration (docs site → Gitea PR).
- CI workflows (dry-run on PR, reconcile on merge).
- Production deployment to real NetBird environments.
- Key encryption with `age` / artifact upload.

## Architecture

```
VPS-A (vps-a.networkmonitor.cc)
├── Caddy (reverse proxy, HTTPS, ACME)
│   ├── /                → NetBird Dashboard
│   ├── /api             → NetBird Management API
│   ├── /signalexchange  → Signal (gRPC)
│   ├── /relay           → Relay
│   └── /reconciler/*    → Reconciler HTTP API
├── NetBird Management (config, IdP, API)
├── NetBird Signal (gRPC peer coordination)
├── NetBird Relay (data relay for NATed peers)
├── Coturn (STUN/TURN)
├── Gitea (hosts netbird-gitops repo)
└── Reconciler (reconcile API + event poller)
```

All containers share a single Docker Compose stack with a common network. Caddy terminates TLS and routes by path prefix.

## Changes to Reconciler

### 1. Feature Flag: `GITEA_ENABLED`

New environment variable. Default: `true` (backward compatible).

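A minimal sketch of how the flag might be parsed, with Gitea settings only required when it is on. The names `loadConfig`, `Config`, `GITEA_URL`, and `GITEA_TOKEN` are assumptions for illustration; the actual `src/config.ts` may name and structure these differently.

```typescript
// Sketch only: loadConfig, Config, GITEA_URL and GITEA_TOKEN are hypothetical
// names, not taken from the reconciler's source.
type Env = Record<string, string | undefined>;

interface Config {
  giteaEnabled: boolean;
  gitea?: { url: string; token: string };
}

function parseGiteaEnabled(raw: string | undefined): boolean {
  // Unset or empty means enabled, which keeps existing deployments unchanged.
  if (raw === undefined || raw.trim() === "") return true;
  const value = raw.trim().toLowerCase();
  if (value === "true") return true;
  if (value === "false") return false;
  throw new Error(`GITEA_ENABLED must be "true" or "false", got "${raw}"`);
}

function loadConfig(env: Env): Config {
  const giteaEnabled = parseGiteaEnabled(env["GITEA_ENABLED"]);
  // With the flag off, Gitea variables are not read or validated at all.
  if (!giteaEnabled) return { giteaEnabled };

  const url = env["GITEA_URL"];
  const token = env["GITEA_TOKEN"];
  if (!url || !token) {
    throw new Error("GITEA_URL and GITEA_TOKEN are required when GITEA_ENABLED=true");
  }
  return { giteaEnabled, gitea: { url, token } };
}
```

Taking the environment as a plain record (rather than reading `Deno.env` directly) keeps the function easy to unit-test; a caller could pass `Deno.env.toObject()` at startup.
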
**When `GITEA_ENABLED=false`:**

| Component         | Behavior                                                                                                                        |
| ----------------- | ------------------------------------------------------------------------------------------------------------------------------- |
| Config validation | Skip `GITEA_*` env var requirements                                                                                             |
| Startup           | Don't create Gitea client                                                                                                       |
| `POST /reconcile` | Works normally: accepts `netbird.json` from request body, applies to NetBird API                                                |
| Event poller      | Still runs. Detects `peer.setupkey.add` events, renames peers. Skips commit-back of `enrolled: true`. Logs enrollment instead.  |
| `GET /export`     | Works normally (no Gitea dependency)                                                                                            |

**When `GITEA_ENABLED=true`:** Current behavior, unchanged.

**Affected files:**

- `src/config.ts`: conditional Gitea env var validation
- `src/main.ts`: conditional Gitea client creation, pass flag to poller
- `src/poller/loop.ts`: guard commit-back behind flag

### 2. State Export

New module: `src/export.ts`

Transforms `ActualState` (from `src/state/actual.ts`) into a valid `netbird.json` conforming to `DesiredStateSchema`.

**Mapping logic:**

| NetBird resource      | Export strategy                                                                                                                                  |
| --------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------ |
| Groups                | Map ID → name. Skip auto-generated groups (`All`, `ch-` prefixed). Peer refs mapped to setup key names where possible, otherwise peer hostname.  |
| Setup keys            | Export with current config. Set `enrolled: true` if `used_times >= usage_limit`, else `false`.                                                   |
| Policies              | Map source/destination group IDs → names. Include port rules.                                                                                    |
| Routes                | Map group IDs → names, include network CIDRs.                                                                                                    |
| DNS nameserver groups | Map group refs → names.                                                                                                                          |

**Interfaces:**

```
GET /export
  → 200: { state: , meta: { exported_at, source_url, groups_count, ... } }

CLI:
  deno run src/main.ts --export --netbird-api-url --netbird-api-token
  → stdout: netbird.json content
```

The CLI mode is standalone: it creates a NetBird client, fetches state, exports, and exits. No HTTP server is started.

**Affected files:**

- `src/export.ts`: new, transformation logic
- `src/server.ts`: new endpoint, `GET /export`
- `src/main.ts`: new CLI flag, `--export`

### 3. No Structural Changes

The reconcile engine (`diff.ts`, `executor.ts`), NetBird client, and state schema remain unchanged. The export tool and feature flag are additive.

## Ansible Playbook

Location: `poc/ansible/` within this repo.

```
poc/
  ansible/
    inventory.yml
    playbook.yml
    group_vars/
      all/
        vars.yml            # domain, ports, non-secret config
        vault.yml           # secrets (gitignored)
        vault.yml.example   # template for secrets
    templates/
      docker-compose.yml.j2
      management.json.j2    # NetBird management config (embedded IdP)
      Caddyfile.j2
      dashboard.env.j2
      relay.env.j2
      turnserver.conf.j2
      reconciler.env.j2
      gitea.env.j2
```

**Playbook tasks:**

1. Install Docker + Docker Compose (if not present)
2. Create working directory structure
3. Template all config files
4. Pull images, `docker compose up -d`
5. Wait for Gitea to be ready
6. Create Gitea admin user + `BlastPilot` org + `netbird-gitops` repo via API
7. Seed `netbird.json` into the repo with initial test state

**Key config decisions:**

- **Caddy** for reverse proxy (proven in existing PoC templates).
- **Embedded IdP** for NetBird (no external OAuth, same as existing PoC).
- **Secrets auto-generated** at deploy time (NetBird encryption key, TURN secret, relay secret). Printed to stdout for operator reference.
- Reconciler env vars templated from `vault.yml` (NetBird API token, Gitea token).

**SSH key:** `~/.ssh/hetzner` (same as docs site deployment).

**Deploy command:** `ansible-playbook -i inventory.yml playbook.yml`

## Test netbird.json

The seed state for validation:

```json
{
  "groups": {
    "ground-stations": { "peers": [] },
    "pilots": { "peers": [] }
  },
  "setup_keys": {
    "GS-TestHawk-1": {
      "type": "one-off",
      "expires_in": 604800,
      "usage_limit": 1,
      "auto_groups": ["ground-stations"],
      "enrolled": false
    },
    "Pilot-TestHawk-1": {
      "type": "one-off",
      "expires_in": 604800,
      "usage_limit": 1,
      "auto_groups": ["pilots"],
      "enrolled": false
    }
  },
  "policies": {
    "pilots-to-gs": {
      "enabled": true,
      "sources": ["pilots"],
      "destinations": ["ground-stations"],
      "bidirectional": true
    }
  },
  "routes": {},
  "dns": { "nameserver_groups": {} }
}
```

This creates two groups, two one-off setup keys (each valid for 604800 seconds, i.e. 7 days), and a bidirectional policy between pilots and ground stations. Minimal but sufficient to validate the full reconcile + enrollment flow.

## Validation Plan

### Phase 1: Deploy

1. Wipe VPS-A (or just `docker compose down -v` if redeploying).
2. Run playbook → full stack up.
3. Access NetBird dashboard at `https://vps-a.networkmonitor.cc` and verify clean state (only default "All" group).
4. Access Gitea at `https://vps-a.networkmonitor.cc/gitea` (or dedicated port) and verify the `BlastPilot/netbird-gitops` repo exists with seed `netbird.json`.

### Phase 2: Reconcile

5. `curl -X POST "https://vps-a.networkmonitor.cc/reconciler/reconcile?dry_run=true" -d @netbird.json`
   → Verify plan shows: create 2 groups, 2 setup keys, 1 policy.
6. `curl -X POST "https://vps-a.networkmonitor.cc/reconciler/reconcile" -d @netbird.json`
   → Verify response includes `created_keys` with actual key values.
7. Open NetBird dashboard → verify groups, setup keys, and policy exist.
8. `curl "https://vps-a.networkmonitor.cc/reconciler/export"`
   → Compare exported state with input. Verify round-trip consistency.

### Phase 3: Enrollment

9. Copy a setup key value from the step 6 response.
10. On a test machine: `netbird up --setup-key `.
11. Check NetBird dashboard: peer appears, gets auto-renamed by poller, placed in correct group.
12. Check reconciler logs: enrollment event detected, peer renamed, log entry written (no Gitea commit, since `GITEA_ENABLED=false` for the initial test).

### Phase 4: State Export (against real instance)

13. Run CLI export against `dev.netbird.achilles-rnd.cc`:
    ```
    deno run src/main.ts --export \
      --netbird-api-url https://dev.netbird.achilles-rnd.cc/api \
      --netbird-api-token
    ```
14. Review output. This validates that GitOps can be bootstrapped from an existing environment.
15. Optionally: dry-run reconcile the exported state against the same instance. It should produce an empty plan (no changes needed).

## Success Criteria

- [ ] Reconcile creates all declared resources in NetBird.
- [ ] Dry-run returns accurate plan without side effects.
- [ ] Export produces valid `netbird.json` from a live instance.
- [ ] Export → dry-run round-trip yields empty plan (idempotent).
- [ ] Poller detects enrollment and renames peer within 30s.
- [ ] Reconciler starts and operates correctly with `GITEA_ENABLED=false`.
- [ ] Reconciler starts and operates correctly with `GITEA_ENABLED=true` + Gitea.

## Risks

| Risk                                                          | Mitigation                                                                  |
| ------------------------------------------------------------- | --------------------------------------------------------------------------- |
| NetBird Management API behavior differs from docs             | Testing against real instance; reconciler has comprehensive error handling  |
| Export misses edge cases in resource mapping                  | Validate with dry-run round-trip (export → reconcile → empty plan)          |
| Poller misses events during 30s poll interval                 | Acceptable for PoC; production can tune interval or add webhook trigger     |
| VPS-A resources (2 vCPU, 4GB RAM) insufficient for full stack | Monitor; NetBird + Gitea are lightweight individually                       |
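
To make the export edge-case risk concrete, here is a minimal sketch of the setup key rule from the State Export mapping (`enrolled: true` if `used_times >= usage_limit`). The type and function names are hypothetical and not taken from `src/export.ts`. The sketch also assumes that a `usage_limit` of 0 means unlimited use; if that holds, applying the rule literally would mark every unlimited key as enrolled, so this should be confirmed against the NetBird API reference.

```typescript
// Sketch only: type and function names are hypothetical illustrations.
interface ActualSetupKey {
  name: string;
  type: string;
  used_times: number;
  usage_limit: number;
  auto_groups: string[]; // group IDs as returned by the API
}

interface ExportedSetupKey {
  type: string;
  usage_limit: number;
  auto_groups: string[]; // group names, as used in netbird.json
  enrolled: boolean;
}

function isEnrolled(key: Pick<ActualSetupKey, "used_times" | "usage_limit">): boolean {
  // Assumption: usage_limit 0 means "unlimited", so the key is never exhausted.
  if (key.usage_limit <= 0) return false;
  return key.used_times >= key.usage_limit;
}

function exportSetupKeys(
  keys: ActualSetupKey[],
  groupNamesById: Map<string, string>,
): Record<string, ExportedSetupKey> {
  const out: Record<string, ExportedSetupKey> = {};
  for (const key of keys) {
    out[key.name] = {
      type: key.type,
      usage_limit: key.usage_limit,
      // Map group IDs to names; fail loudly rather than export a dangling ID.
      auto_groups: key.auto_groups.map((id) => {
        const name = groupNamesById.get(id);
        if (name === undefined) {
          throw new Error(`unknown group ID ${id} on setup key ${key.name}`);
        }
        return name;
      }),
      enrolled: isEnrolled(key),
    };
  }
  return out;
}
```

Throwing on an unknown group ID is one possible design choice; it surfaces mapping gaps during the export itself instead of leaving them for the dry-run round-trip to catch.
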