netbird-gitops/docs/plans/2026-03-06-schema-expansion.md
2026-03-06 16:28:01 +02:00

# Schema Expansion: Full NetBird State Coverage

> **For Claude:** REQUIRED SUB-SKILL: Use superpowers:executing-plans to
> implement this plan task-by-task.

**Goal:** Expand the reconciler schema and export to cover all NetBird resource
types: posture checks, networks (with resources and routers), peers, users, and
resource-backed policies.

**Architecture:** Each new resource type follows the existing pattern: add NB
types → add schema → add to ActualState → add client methods → add diff logic →
add executor handlers → add export → add tests. Policies are extended to support
`destination_resource` as an alternative to `destinations`. The "All" group gets
hardcoded exclusion from deletion.

**Tech Stack:** Deno 2.x, TypeScript, Zod, injectable fetch for testing.

---
### Task 1: Fix "All" group hardcoded exclusion + policy null-safety
**Files:**
- Modify: `src/reconcile/diff.ts:66-70` (add "All" name check)
- Modify: `src/reconcile/diff.ts:138-145` (null-safety for destinations)
- Modify: `src/reconcile/diff.test.ts` (add test for "All" exclusion with
`issued: "api"`)
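The diff already restricts deletion to groups with `issued === "api"`, but the
built-in "All" group also reports `issued: "api"` in real environments, so that
filter alone does not protect it. Add an explicit name exclusion. Also guard
against `null` `destinations` in policy rules, which occur on resource-backed
policies.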
**Changes to `src/reconcile/diff.ts`:**
In `diffGroups`, line 67, change:
```typescript
if (!desiredNames.has(group.name) && group.issued === "api") {
```
to:
```typescript
if (!desiredNames.has(group.name) && group.issued === "api" && group.name !== "All") {
```
In `diffPolicies`, around line 143, wrap destinations extraction:
```typescript
const actualDests = extractGroupNames(
  existing.rules.flatMap((r) => r.destinations ?? []),
  actual,
).sort();
```
Add test: `computeDiff does not delete "All" group even when issued is "api"`.
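The two guards in this task can be illustrated in isolation. The sketch below uses simplified stand-in types and hypothetical helper names (`groupsToDelete`, `destinationIds`), not the real ones in `diff.ts`, and assumes rule destinations are objects carrying an `id`.

```typescript
// Simplified stand-ins for the real NetBird types; the shapes are assumptions.
interface SketchGroup {
  name: string;
  issued: string;
}

interface SketchRule {
  destinations: Array<{ id: string }> | null;
}

/** Names of API-issued groups that are absent from desired state, never "All". */
function groupsToDelete(
  desiredNames: Set<string>,
  actual: SketchGroup[],
): string[] {
  return actual
    .filter((g) =>
      !desiredNames.has(g.name) && g.issued === "api" && g.name !== "All"
    )
    .map((g) => g.name);
}

/** Collects destination IDs, treating null (resource-backed rules) as empty. */
function destinationIds(rules: SketchRule[]): string[] {
  return rules.flatMap((r) => r.destinations ?? []).map((d) => d.id).sort();
}
```

A test for the real `computeDiff` would follow the same shape: feed in an "All" group with `issued: "api"` that is missing from desired state and assert that no delete operation is produced for it.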
Run: `deno task test`
---
### Task 2: Add posture check and network types to `src/netbird/types.ts`
**Files:**
- Modify: `src/netbird/types.ts`
Add these interfaces after the existing types:
```typescript
/** Posture check as returned by GET /api/posture-checks */
export interface NbPostureCheck {
  id: string;
  name: string;
  description: string;
  checks: Record<string, unknown>;
}

/** Network as returned by GET /api/networks */
export interface NbNetwork {
  id: string;
  name: string;
  description: string;
  resources: string[];
  routers: string[];
  policies: string[];
  routing_peers_count: number;
}

/** Network resource as returned by GET /api/networks/{id}/resources */
export interface NbNetworkResource {
  id: string;
  name: string;
  description: string;
  type: "host" | "subnet" | "domain";
  address: string;
  enabled: boolean;
  groups: Array<
    { id: string; name: string; peers_count: number; resources_count: number }
  >;
}

/** Network router as returned by GET /api/networks/{id}/routers */
export interface NbNetworkRouter {
  id: string;
  peer: string | null;
  peer_groups: string[] | null;
  metric: number;
  masquerade: boolean;
  enabled: boolean;
}

/** User as returned by GET /api/users */
export interface NbUser {
  id: string;
  name: string;
  email: string;
  role: "owner" | "admin" | "user";
  status: "active" | "invited" | "blocked";
  auto_groups: string[];
  is_service_user: boolean;
}
```
Also add `source_posture_checks` to `NbPolicy`:
```typescript
export interface NbPolicy {
  id: string;
  name: string;
  description: string;
  enabled: boolean;
  rules: NbPolicyRule[];
  source_posture_checks: string[]; // posture check IDs
}
```
And add `destinationResource` to `NbPolicyRule`:
```typescript
export interface NbPolicyRule {
  // ... existing fields ...
  destinationResource?: { id: string; type: string } | null;
}
```
Run: `deno task check`
---
### Task 3: Add client methods for new resource types
**Files:**
- Modify: `src/netbird/client.ts`
Add sections for:
**Posture Checks:**
```typescript
listPostureChecks(): Promise<NbPostureCheck[]>
createPostureCheck(data: Omit<NbPostureCheck, "id">): Promise<NbPostureCheck>
updatePostureCheck(id: string, data: Omit<NbPostureCheck, "id">): Promise<NbPostureCheck>
deletePostureCheck(id: string): Promise<void>
```
**Networks:**
```typescript
listNetworks(): Promise<NbNetwork[]>
createNetwork(data: { name: string; description?: string }): Promise<NbNetwork>
updateNetwork(id: string, data: { name: string; description?: string }): Promise<NbNetwork>
deleteNetwork(id: string): Promise<void>
```
**Network Resources (nested under network):**
```typescript
listNetworkResources(networkId: string): Promise<NbNetworkResource[]>
createNetworkResource(networkId: string, data: { name: string; description?: string; address: string; enabled: boolean; groups: string[] }): Promise<NbNetworkResource>
updateNetworkResource(networkId: string, resourceId: string, data: { name: string; description?: string; address: string; enabled: boolean; groups: string[] }): Promise<NbNetworkResource>
deleteNetworkResource(networkId: string, resourceId: string): Promise<void>
```
**Network Routers:**
```typescript
listNetworkRouters(networkId: string): Promise<NbNetworkRouter[]>
createNetworkRouter(networkId: string, data: Omit<NbNetworkRouter, "id">): Promise<NbNetworkRouter>
updateNetworkRouter(networkId: string, routerId: string, data: Omit<NbNetworkRouter, "id">): Promise<NbNetworkRouter>
deleteNetworkRouter(networkId: string, routerId: string): Promise<void>
```
**Users:**
```typescript
listUsers(): Promise<NbUser[]>
createUser(data: { email: string; name?: string; role: string; auto_groups: string[]; is_service_user: boolean }): Promise<NbUser>
updateUser(id: string, data: { name?: string; role?: string; auto_groups?: string[] }): Promise<NbUser>
deleteUser(id: string): Promise<void>
```
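Since the tech stack calls for an injectable fetch, a client method might be structured roughly as follows. This is a sketch only: `SketchClient`, `FetchLike`, and the private `request` helper are hypothetical names, and the `Authorization` header scheme is an assumption that should be matched to whatever the existing client already sends.

```typescript
// Minimal fetch shape so the sketch stays self-contained. The global `fetch`
// satisfies it, and tests can pass a fake instead.
type FetchLike = (
  url: string,
  init: { method: string; headers: Record<string, string>; body?: string },
) => Promise<{ ok: boolean; status: number; json(): Promise<unknown> }>;

interface SketchPostureCheck {
  id: string;
  name: string;
  description: string;
  checks: Record<string, unknown>;
}

class SketchClient {
  private readonly baseUrl: string;
  private readonly token: string;
  private readonly fetchFn: FetchLike;

  constructor(baseUrl: string, token: string, fetchFn: FetchLike) {
    this.baseUrl = baseUrl;
    this.token = token;
    this.fetchFn = fetchFn;
  }

  private async request<T>(
    method: string,
    path: string,
    body?: unknown,
  ): Promise<T> {
    const init: Parameters<FetchLike>[1] = {
      method,
      // Header scheme is an assumption; mirror the existing client.
      headers: {
        "Authorization": `Token ${this.token}`,
        "Content-Type": "application/json",
      },
    };
    if (body !== undefined) init.body = JSON.stringify(body);
    const res = await this.fetchFn(`${this.baseUrl}${path}`, init);
    if (!res.ok) throw new Error(`${method} ${path} failed: ${res.status}`);
    return (await res.json()) as T;
  }

  listPostureChecks(): Promise<SketchPostureCheck[]> {
    return this.request<SketchPostureCheck[]>("GET", "/posture-checks");
  }

  // Nested resources carry the parent network ID in the path.
  listNetworkResources(networkId: string): Promise<unknown[]> {
    return this.request<unknown[]>("GET", `/networks/${networkId}/resources`);
  }
}
```

In tests, the fake fetch can record each call and return canned JSON, which keeps the new client methods testable without a live NetBird instance.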
Run: `deno task check`
---
### Task 4: Expand ActualState with new resource collections
**Files:**
- Modify: `src/state/actual.ts`
Add to `ActualState` interface:
```typescript
postureChecks: NbPostureCheck[];
postureChecksByName: Map<string, NbPostureCheck>;
networks: NbNetwork[];
networksByName: Map<string, NbNetwork>;
networkResources: Map<string, NbNetworkResource[]>; // networkId -> resources
networkRouters: Map<string, NbNetworkRouter[]>; // networkId -> routers
users: NbUser[];
usersByEmail: Map<string, NbUser>;
```
Expand `ClientLike` to include:
```typescript
| "listPostureChecks"
| "listNetworks"
| "listNetworkResources"
| "listNetworkRouters"
| "listUsers"
```
In `fetchActualState`: fetch posture checks, networks, users in the initial
`Promise.all`. Then for each network, fetch its resources and routers in a
second parallel batch.
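The two-phase fetch described above could look roughly like this. The sketch covers only networks and users, uses a hypothetical `SketchListClient` interface and function name, and leaves the resource and router payloads as `unknown`.

```typescript
interface SketchNetwork {
  id: string;
  name: string;
}

interface SketchListClient {
  listNetworks(): Promise<SketchNetwork[]>;
  listUsers(): Promise<Array<{ email: string }>>;
  listNetworkResources(networkId: string): Promise<unknown[]>;
  listNetworkRouters(networkId: string): Promise<unknown[]>;
}

async function fetchNetworkState(client: SketchListClient) {
  // Phase 1: top-level collections, fetched together.
  const [networks, users] = await Promise.all([
    client.listNetworks(),
    client.listUsers(),
  ]);

  // Phase 2: resources and routers for every network, all in parallel.
  const perNetwork = await Promise.all(
    networks.map(async (n) => {
      const [resources, routers] = await Promise.all([
        client.listNetworkResources(n.id),
        client.listNetworkRouters(n.id),
      ]);
      return { id: n.id, resources, routers };
    }),
  );

  return {
    networks,
    networksByName: new Map(networks.map((n) => [n.name, n] as const)),
    networkResources: new Map(
      perNetwork.map((p) => [p.id, p.resources] as const),
    ),
    networkRouters: new Map(perNetwork.map((p) => [p.id, p.routers] as const)),
    users,
    usersByEmail: new Map(users.map((u) => [u.email, u] as const)),
  };
}
```

The second phase has to wait for the first because the nested endpoints need network IDs, but within each phase nothing is sequential.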
Run: `deno task check`
---
### Task 5: Expand the Zod schema with new resource types
**Files:**
- Modify: `src/state/schema.ts`
Add schemas:
```typescript
export const PostureCheckSchema = z.object({
  description: z.string().default(""),
  checks: z.record(z.string(), z.unknown()),
});

export const NetworkResourceSchema = z.object({
  name: z.string(),
  description: z.string().default(""),
  type: z.enum(["host", "subnet", "domain"]),
  address: z.string(),
  enabled: z.boolean().default(true),
  groups: z.array(z.string()),
});

export const NetworkRouterSchema = z.object({
  peer: z.string().optional(),
  peer_groups: z.array(z.string()).optional(),
  metric: z.number().int().min(1).max(9999).default(9999),
  masquerade: z.boolean().default(true),
  enabled: z.boolean().default(true),
});

export const NetworkSchema = z.object({
  description: z.string().default(""),
  resources: z.array(NetworkResourceSchema).default([]),
  routers: z.array(NetworkRouterSchema).default([]),
});

export const PeerSchema = z.object({
  groups: z.array(z.string()),
  login_expiration_enabled: z.boolean().default(false),
  inactivity_expiration_enabled: z.boolean().default(false),
  ssh_enabled: z.boolean().default(false),
});

export const UserSchema = z.object({
  name: z.string(),
  role: z.enum(["owner", "admin", "user"]),
  auto_groups: z.array(z.string()).default([]),
});
```
Extend `PolicySchema` to support `destination_resource`:
```typescript
export const DestinationResourceSchema = z.object({
  id: z.string(), // resource name, resolved at reconcile time
  type: z.string(),
});

export const PolicySchema = z.object({
  description: z.string().default(""),
  enabled: z.boolean(),
  sources: z.array(z.string()),
  destinations: z.array(z.string()).default([]),
  destination_resource: DestinationResourceSchema.optional(),
  bidirectional: z.boolean(),
  protocol: z.enum(["tcp", "udp", "icmp", "all"]).default("all"),
  action: z.enum(["accept", "drop"]).default("accept"),
  ports: z.array(z.string()).optional(),
  source_posture_checks: z.array(z.string()).default([]),
});
```
Add to `DesiredStateSchema`:
```typescript
export const DesiredStateSchema = z.object({
  groups: z.record(z.string(), GroupSchema),
  setup_keys: z.record(z.string(), SetupKeySchema),
  policies: z.record(z.string(), PolicySchema).default({}),
  posture_checks: z.record(z.string(), PostureCheckSchema).default({}),
  networks: z.record(z.string(), NetworkSchema).default({}),
  peers: z.record(z.string(), PeerSchema).default({}),
  users: z.record(z.string(), UserSchema).default({}),
  routes: z.record(z.string(), RouteSchema).default({}),
  dns: z.object({
    nameserver_groups: z.record(z.string(), DnsNameserverGroupSchema).default(
      {},
    ),
  }).default({ nameserver_groups: {} }),
});
```
Update `validateCrossReferences` to also check:
- Peer groups reference existing groups
- User auto_groups reference existing groups
- Network resource groups reference existing groups
- Policy source_posture_checks reference existing posture checks
- Policy destination_resource.id references an existing network resource name
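The additional checks listed above amount to set-membership tests over the parsed state. The sketch below assumes the validator collects error strings (the real `validateCrossReferences` may report problems differently) and uses a trimmed-down `SketchDesired` type and a hypothetical function name.

```typescript
// Only the fields needed for cross-reference checks; shapes are assumptions.
interface SketchDesired {
  groups: Record<string, unknown>;
  posture_checks: Record<string, unknown>;
  peers: Record<string, { groups: string[] }>;
  users: Record<string, { auto_groups: string[] }>;
  networks: Record<
    string,
    { resources: Array<{ name: string; groups: string[] }> }
  >;
  policies: Record<
    string,
    { source_posture_checks: string[]; destination_resource?: { id: string } }
  >;
}

function crossReferenceErrors(state: SketchDesired): string[] {
  const errors: string[] = [];
  const groupNames = new Set(Object.keys(state.groups));
  const postureNames = new Set(Object.keys(state.posture_checks));
  const resourceNames = new Set(
    Object.values(state.networks).flatMap((n) => n.resources.map((r) => r.name)),
  );

  // Records one error per reference that is not in the known set.
  const check = (
    where: string,
    kind: string,
    known: Set<string>,
    refs: string[],
  ) => {
    for (const ref of refs) {
      if (!known.has(ref)) errors.push(`${where}: unknown ${kind} "${ref}"`);
    }
  };

  for (const [name, peer] of Object.entries(state.peers)) {
    check(`peer ${name}`, "group", groupNames, peer.groups);
  }
  for (const [email, user] of Object.entries(state.users)) {
    check(`user ${email}`, "group", groupNames, user.auto_groups);
  }
  for (const [netName, net] of Object.entries(state.networks)) {
    for (const res of net.resources) {
      check(`network ${netName} resource ${res.name}`, "group", groupNames, res.groups);
    }
  }
  for (const [name, policy] of Object.entries(state.policies)) {
    check(`policy ${name}`, "posture check", postureNames, policy.source_posture_checks);
    if (policy.destination_resource) {
      check(`policy ${name}`, "network resource", resourceNames, [
        policy.destination_resource.id,
      ]);
    }
  }
  return errors;
}
```

Note that resource names are pooled across all networks here, so two networks with a same-named resource would be indistinguishable to a policy; whether that matters depends on how `destination_resource.id` is resolved at reconcile time.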
Run: `deno task check`
---
### Task 6: Add operations for new resource types
**Files:**
- Modify: `src/reconcile/operations.ts`
Add to `OperationType`:
```typescript
| "create_posture_check" | "update_posture_check" | "delete_posture_check"
| "create_network" | "update_network" | "delete_network"
| "create_network_resource" | "update_network_resource" | "delete_network_resource"
| "create_network_router" | "update_network_router" | "delete_network_router"
| "create_user" | "update_user" | "delete_user"
| "update_peer"
```
Update `EXECUTION_ORDER` — networks must be created before resources/routers,
posture checks before policies that reference them:
```typescript
export const EXECUTION_ORDER: OperationType[] = [
  "create_posture_check",
  "update_posture_check",
  "create_group",
  "update_group",
  "create_setup_key",
  "rename_peer",
  "update_peer_groups",
  "update_peer",
  "create_network",
  "update_network",
  "create_network_resource",
  "update_network_resource",
  "create_network_router",
  "update_network_router",
  "create_user",
  "update_user",
  "create_policy",
  "update_policy",
  "create_route",
  "update_route",
  "create_dns",
  "update_dns",
  // Deletions in reverse dependency order
  "delete_dns",
  "delete_route",
  "delete_policy",
  "delete_user",
  "delete_network_router",
  "delete_network_resource",
  "delete_network",
  "delete_peer",
  "delete_setup_key",
  "delete_posture_check",
  "delete_group",
];
```
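How the reconciler consumes `EXECUTION_ORDER` is not spelled out here. One plausible use, sketched below with a hypothetical `sortByExecutionOrder` helper, is to rank each operation by its position in the list and sort on that rank.

```typescript
function sortByExecutionOrder<T extends { type: string }>(
  ops: T[],
  order: readonly string[],
): T[] {
  const rank = new Map(order.map((t, i) => [t, i] as const));
  // Types missing from the list sort last. Array.prototype.sort is stable,
  // so operations of the same type keep their relative order.
  const rankOf = (op: T) => rank.get(op.type) ?? order.length;
  return [...ops].sort((a, b) => rankOf(a) - rankOf(b));
}
```

With this approach, a network creation always lands ahead of its resources and routers regardless of the order in which the diff emitted them.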
Run: `deno task check`
---
### Task 7: Add diff logic for new resource types
**Files:**
- Modify: `src/reconcile/diff.ts`
Add `diffPostureChecks`, `diffNetworks`, `diffPeers`, `diffUsers` functions and
call them from `computeDiff`.
**Posture checks:** Compare by name. Create if missing. Update if `checks`
object or description changed (deep JSON compare). Delete if not in desired.
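One caveat with a deep JSON compare is that plain `JSON.stringify` is sensitive to key order, so two equivalent `checks` objects can compare as different. The sketch below, with hypothetical names and a simplified operation shape, uses a key-sorted serialization to avoid spurious updates; it assumes `checks` contains only JSON-representable values.

```typescript
/** Serializes JSON-like data with object keys sorted, so key order is irrelevant. */
function stableStringify(value: unknown): string {
  if (Array.isArray(value)) {
    return `[${value.map(stableStringify).join(",")}]`;
  }
  if (value !== null && typeof value === "object") {
    const obj = value as Record<string, unknown>;
    const entries = Object.keys(obj)
      .sort()
      .map((k) => `${JSON.stringify(k)}:${stableStringify(obj[k])}`);
    return `{${entries.join(",")}}`;
  }
  return JSON.stringify(value);
}

interface PostureSpec {
  description: string;
  checks: Record<string, unknown>;
}

type PostureOp = {
  type: "create_posture_check" | "update_posture_check" | "delete_posture_check";
  name: string;
};

function diffPostureChecksSketch(
  desired: Record<string, PostureSpec>,
  actualByName: Map<string, PostureSpec>,
): PostureOp[] {
  const ops: PostureOp[] = [];
  for (const [name, want] of Object.entries(desired)) {
    const have = actualByName.get(name);
    if (!have) {
      ops.push({ type: "create_posture_check", name });
    } else if (
      have.description !== want.description ||
      stableStringify(have.checks) !== stableStringify(want.checks)
    ) {
      ops.push({ type: "update_posture_check", name });
    }
  }
  for (const name of actualByName.keys()) {
    if (desired[name] === undefined) {
      ops.push({ type: "delete_posture_check", name });
    }
  }
  return ops;
}
```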
**Networks:** Compare by name. Create network if missing. For each network, diff
resources and routers:
- Resources: match by name within the network. Create/update/delete.
- Routers: match by peer name (or peer_group). Create/update/delete.
**Peers:** Compare by name. Only update operations (never create/delete).
Compare `groups` (excluding "All"), `login_expiration_enabled`,
`inactivity_expiration_enabled`, `ssh_enabled`.
**Users:** Compare by email. Create if missing. Update if role or auto_groups
changed. Delete if not in desired (but never delete "owner" role).
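The user rules can be sketched as below. The names and operation shape are hypothetical, and the sketch assumes the actual users' `auto_groups` have already been resolved from IDs to group names so they can be compared with desired state directly.

```typescript
interface DesiredUserSketch {
  name: string;
  role: string;
  auto_groups: string[];
}

interface ActualUserSketch {
  email: string;
  role: string;
  auto_groups: string[]; // group names (assumed already resolved from IDs)
}

type UserOp = {
  type: "create_user" | "update_user" | "delete_user";
  email: string;
};

/** Order-insensitive comparison of two name lists. */
function sameMembers(a: string[], b: string[]): boolean {
  return JSON.stringify([...a].sort()) === JSON.stringify([...b].sort());
}

function diffUsersSketch(
  desired: Record<string, DesiredUserSketch>,
  actual: ActualUserSketch[],
): UserOp[] {
  const ops: UserOp[] = [];
  const actualByEmail = new Map(actual.map((u) => [u.email, u] as const));

  for (const [email, want] of Object.entries(desired)) {
    const have = actualByEmail.get(email);
    if (!have) {
      ops.push({ type: "create_user", email });
    } else if (
      have.role !== want.role ||
      !sameMembers(have.auto_groups, want.auto_groups)
    ) {
      ops.push({ type: "update_user", email });
    }
  }

  for (const user of actual) {
    // Users holding the "owner" role are never deleted, even when they are
    // absent from desired state.
    if (desired[user.email] === undefined && user.role !== "owner") {
      ops.push({ type: "delete_user", email: user.email });
    }
  }
  return ops;
}
```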
**Policies update:** Handle `destination_resource` — when present, skip
group-based destination comparison. Handle `source_posture_checks`.
Run: `deno task check`
---
### Task 8: Add executor handlers for new operations
**Files:**
- Modify: `src/reconcile/executor.ts`
Add `case` handlers in `executeSingle` for all new operation types. Network
operations need special handling: resources and routers reference the network
ID, which may be newly created. Track `createdNetworkIds` similar to
`createdGroupIds`.
Posture check operations: create/update/delete via client methods. Track
`createdPostureCheckIds`.
User operations: resolve `auto_groups` names to IDs.
Network resource operations: resolve `groups` names to IDs.
Network router operations: resolve `peer` name to peer ID, or `peer_groups`
names to group IDs.
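All of the name-to-ID resolutions above share one lookup pattern: prefer an ID created earlier in the same run, then fall back to the pre-run snapshot. A minimal sketch, with hypothetical helper names and assuming both sources are keyed by name:

```typescript
function resolveIdByName(
  kind: string,
  name: string,
  createdIds: Map<string, string>,
  existingByName: Map<string, { id: string }>,
): string {
  // IDs created earlier in this run take precedence over the snapshot.
  const id = createdIds.get(name) ?? existingByName.get(name)?.id;
  if (id === undefined) throw new Error(`${kind} "${name}" not found`);
  return id;
}

function resolveIdsByName(
  kind: string,
  names: string[],
  createdIds: Map<string, string>,
  existingByName: Map<string, { id: string }>,
): string[] {
  return names.map((n) => resolveIdByName(kind, n, createdIds, existingByName));
}
```

A `create_network_resource` handler would, for example, resolve its parent through `createdNetworkIds` and its `groups` through `createdGroupIds` before calling the client. Throwing on an unknown name surfaces ordering mistakes in `EXECUTION_ORDER` early rather than sending a bad payload.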
Update `ExecutorClient` type to include all new client methods.
Run: `deno task check`
---
### Task 9: Update export to cover new resource types
**Files:**
- Modify: `src/export.ts`
Add `exportPostureChecks`, `exportNetworks`, `exportPeers`, `exportUsers`
functions.
**Posture checks:** Keyed by name. Pass through `checks` object as-is. Include
`description`.
**Networks:** Keyed by name. For each network, fetch resources and routers from
ActualState maps. Resources: resolve group IDs to names. Routers: resolve peer
ID to peer name (via `actual.peersById`), resolve peer_group IDs to group names.
**Peers:** Keyed by peer name. Include groups (resolved to names, excluding
"All"), `login_expiration_enabled`, `inactivity_expiration_enabled`,
`ssh_enabled`.
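The peer export might be shaped as follows. This sketch assumes each peer lists its groups by ID and that a `groupNameById` lookup is available; both names and shapes are hypothetical. Sorting the group names is an addition, not something the plan requires, intended to keep exported files stable between runs.

```typescript
interface ExportPeerInput {
  name: string;
  groups: Array<{ id: string }>;
  login_expiration_enabled: boolean;
  inactivity_expiration_enabled: boolean;
  ssh_enabled: boolean;
}

interface ExportedPeer {
  groups: string[];
  login_expiration_enabled: boolean;
  inactivity_expiration_enabled: boolean;
  ssh_enabled: boolean;
}

function exportPeersSketch(
  peers: ExportPeerInput[],
  groupNameById: Map<string, string>,
): Record<string, ExportedPeer> {
  const out: Record<string, ExportedPeer> = {};
  for (const peer of peers) {
    const groupNames = peer.groups.map((g) => {
      const name = groupNameById.get(g.id);
      if (name === undefined) throw new Error(`unknown group ID "${g.id}"`);
      return name;
    });
    out[peer.name] = {
      // "All" is implicit for every peer, so it is left out of the export.
      groups: groupNames.filter((n) => n !== "All").sort(),
      login_expiration_enabled: peer.login_expiration_enabled,
      inactivity_expiration_enabled: peer.inactivity_expiration_enabled,
      ssh_enabled: peer.ssh_enabled,
    };
  }
  return out;
}
```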
**Users:** Keyed by email. Include name, role, auto_groups (resolved to names).
**Policies:** Handle `destinationResource` — resolve resource ID to resource
name. Include `source_posture_checks` resolved to posture check names.
Update the `exportState` return to include all new sections.
Run: `deno task check`
---
### Task 10: Export the three environments to state/*.json
Run the export against all three NetBird environments (dev, prod, ext):
```bash
mkdir -p state
deno task export -- --netbird-api-url https://dev.netbird.achilles-rnd.cc/api --netbird-api-token <DEV_TOKEN> > state/dev.json
deno task export -- --netbird-api-url https://achilles-rnd.cc/api --netbird-api-token <PROD_TOKEN> > state/prod.json
deno task export -- --netbird-api-url https://ext.netbird.achilles-rnd.cc/api --netbird-api-token <EXT_TOKEN> > state/ext.json
```
Verify each file parses with the updated schema. Visually inspect for
completeness against dashboards.
---
### Task 11: Update tests
**Files:**
- Modify: `src/reconcile/diff.test.ts` — tests for new diff functions
- Modify: `src/reconcile/executor.test.ts` — tests for new executor cases
- Modify: `src/export.test.ts` — tests for new export functions
- Modify: `src/state/schema.test.ts` — tests for new schema validation
- Modify: `src/state/actual.test.ts` — tests for expanded fetchActualState
- Modify: `src/integration.test.ts` — update mock data to include new resource
types
All existing tests must continue to pass. New tests should cover:
- Posture check CRUD diff/execute
- Network with resources and routers diff/execute
- Peer update diff (group changes, setting changes)
- User CRUD diff/execute
- Policy with destination_resource (export and diff)
- Policy with source_posture_checks (export and diff)
- Export of all new resource types
Run: `deno task test` — all tests must pass.
---
### Task 12: Final verification
Run full quality gate:
```bash
deno task check # type check
deno fmt --check # formatting
deno task test # all tests
```
All must pass.