Benedikt Linn · Code University Berlin · 2026-05-18
→ Classic multi-agent / asymmetric-roles setup with physically realistic drone dynamics (AR-Drone 2.0 digital twin).
| Phase | Algorithm | Curriculum | Demo mode |
|---|---|---|---|
| 1: DroneCSI | PPO | 5 levels (parcours_difficulty + wind_strength) | — |
| 4: DroneShepherd | PPO + GAIL | Init from DroneCSI.onnx (transfer) | Human demos (4 Hz recording) |
| Index | Observation | Range |
|---|---|---|
| 0 | Altitude / 10 | [0,1] |
| 1–3 | Velocity (vx, vy, vz) / 5 | [-1,1] |
| 4–6 | Euler rotation / 180 | [-1,1] |
| 7–9 | Angular velocity / 5 | [-1,1] |
| 10–12 | Direction-to-target (local, normalized) | unit vector, components in [-1,1] |
| 13 | Distance-to-target / 50 | [0,1] |
Identical structure across both phases → DroneCSI.onnx is a drop-in curriculum start for phase 4.
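The table above can be sketched as a small function that assembles the 14-float vector. This is a minimal illustration, not the project's agent code: the function name `build_observation`, the argument names, the clamping to the listed ranges, and the assumption that Euler angles arrive in degrees wrapped to [-180, 180] are all hypothetical.

```python
import math

def clamp(x, lo, hi):
    """Limit x to the closed interval [lo, hi]."""
    return max(lo, min(hi, x))

def build_observation(altitude, velocity, euler_deg, angular_velocity, to_target_local):
    """Return the 14 observations in the index order of the table.

    Clamping to the listed ranges is an assumption; the table only
    states the divisor and the nominal range for each entry.
    """
    dist = math.sqrt(sum(c * c for c in to_target_local))
    # Guard against a zero-length vector when the drone sits on the target.
    if dist > 1e-6:
        direction = [c / dist for c in to_target_local]
    else:
        direction = [0.0, 0.0, 0.0]

    obs = [clamp(altitude / 10.0, 0.0, 1.0)]                        # index 0
    obs += [clamp(v / 5.0, -1.0, 1.0) for v in velocity]            # 1-3
    obs += [clamp(e / 180.0, -1.0, 1.0) for e in euler_deg]         # 4-6
    obs += [clamp(w / 5.0, -1.0, 1.0) for w in angular_velocity]    # 7-9
    obs += direction                                                # 10-12
    obs.append(clamp(dist / 50.0, 0.0, 1.0))                        # 13
    return obs

obs = build_observation(
    altitude=5.0,
    velocity=(2.5, 0.0, -5.0),
    euler_deg=(90.0, 0.0, -180.0),
    angular_velocity=(0.0, 0.0, 0.0),
    to_target_local=(3.0, 0.0, 4.0),
)
print(len(obs), obs)
```

Because both phases share this layout, a network trained on it in phase 1 sees inputs of the same shape and scale in phase 4.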
behaviors:
  DroneCSI:
    trainer_type: ppo
    hyperparameters:
      batch_size: 512              # mini-batch per PPO update
      buffer_size: 10240           # full rollout before update (20 × batch_size)
      learning_rate: 3.0e-4
      beta: 5.0e-3                 # entropy bonus → exploration
      epsilon: 0.2                 # PPO clipping
      learning_rate_schedule: linear
    network_settings:
      hidden_units: 256
      num_layers: 3                # deeper MLP → richer dynamics representation
      normalize: true
    max_steps: 1_000_000
    time_horizon: 256              # length of return bootstrap
Two orthogonal difficulty axes, both reward-triggered:
| Lesson | Parcours | Wind | Reward threshold |
|---|---|---|---|
| 0 | easy | calm | 1.5 → next |
| 1 | medium | 0.3 | 2.5 → next |
| 2 | hard | 0.6 | 3.5 → next |
| 3 | complex | 0.85 | 5.0 → next |
| 4 ✓ | full | full storm | terminal |
→ All 5 levels cleared in 150k steps · ~3 h wall time on BigOne (RTX 4090).
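The reward-triggered progression in the table can be sketched as a lookup plus a threshold check. This is a simplified illustration under stated assumptions: the names `LESSONS` and `next_lesson` are hypothetical, the comparison uses "greater than or equal", and the smoothing and minimum-lesson-length handling that a real curriculum trainer may apply is left out.

```python
# Lesson table copied from the slide; threshold None marks the terminal lesson.
LESSONS = [
    {"parcours": "easy",    "wind": "calm",       "threshold": 1.5},
    {"parcours": "medium",  "wind": 0.3,          "threshold": 2.5},
    {"parcours": "hard",    "wind": 0.6,          "threshold": 3.5},
    {"parcours": "complex", "wind": 0.85,         "threshold": 5.0},
    {"parcours": "full",    "wind": "full storm", "threshold": None},
]

def next_lesson(current, mean_reward):
    """Return the lesson index to use after observing mean_reward.

    Advances by one lesson when the reward threshold of the current
    lesson is met; the last lesson never advances.
    """
    threshold = LESSONS[current]["threshold"]
    if threshold is not None and mean_reward >= threshold:
        return current + 1
    return current

# Walk through a made-up sequence of mean rewards (illustrative values only).
lesson = 0
for reward in [0.8, 1.6, 2.0, 2.7, 3.9, 5.2, 9.0]:
    lesson = next_lesson(lesson, reward)
    print(f"reward={reward:>4} -> lesson {lesson} ({LESSONS[lesson]['parcours']})")
```

Both axes advance together here because each lesson row fixes a parcours level and a wind value at the same time.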
Spec: "The more fear in the wolves, the more reward. Each surviving sheep is a multiplier. 5-minute episode. Penalty for sheep killed."
| Signal | Value | Trigger |
|---|---|---|
| Δ wolf fear (per step) | +4 × dFear | Wolf accumulates fear inside the scarer cone |
| Wolf panic event | +3.0 | Once per panic trigger |
| Sheep alive | +0.0003 × N_sheep | Every FixedUpdate |
| Sheep killed | −3.0 | Immediately on catch |
| Drone crash (emergency) | −1.0 | EndEpisode |
| Drone destroyed (3 HP lost) | −1.0 + EndRound | 3× wolf collision (1 s i-frames) |
| Episode end: sheep saved | +5 × N_survived | EndRound() |
| Perfect defense | × 1.5 multiplier | All sheep survived |
| Fast-win bonus | +0.05 / remaining second | Only on perfect defense |
Risk: the agent could learn to maximise "keep wolves afraid" via overly aggressive play — sacrificing sheep as collateral.
survivalRatio = sheepSaved / initialSheepCount
Lerp(0.2, 1.0, survivalRatio)
// Survival-ratio shaping: pushes agent away from
// "trade sheep for fear-spam" exploits.
AddReward(endBonus * Mathf.Lerp(0.2f, 1f, survivalRatio));
EndEpisode();
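To see what the survival-ratio shaping does numerically, the end-of-episode bonus can be sketched in Python. This is an illustration under assumptions: `endBonus` is taken to be the "+5 × N_survived" row of the reward table, the function names are hypothetical, and the perfect-defense multiplier and fast-win bonus are left out because the slides do not say how they combine with the Lerp factor.

```python
def lerp(a, b, t):
    """Linear interpolation with t clamped to [0, 1], like Unity's Mathf.Lerp."""
    t = max(0.0, min(1.0, t))
    return a + (b - a) * t

def shaped_end_bonus(sheep_saved, initial_sheep_count, reward_per_sheep=5.0):
    """End bonus scaled by a survival-ratio factor between 0.2 and 1.0."""
    if initial_sheep_count <= 0:
        return 0.0
    # True division; with two ints in C# this would need a float cast.
    survival_ratio = sheep_saved / initial_sheep_count
    end_bonus = reward_per_sheep * sheep_saved  # assumed: +5 per surviving sheep
    return end_bonus * lerp(0.2, 1.0, survival_ratio)

for saved in (10, 5, 1, 0):
    print(saved, "of 10 saved ->", round(shaped_end_bonus(saved, 10), 3))
```

Under these assumptions, saving half the flock earns 25 × 0.6 = 15 instead of 50 for a full flock. Losing sheep is penalised twice: once through the smaller base bonus and once through the smaller factor.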
Pure reward shaping leaves exploration too slow → we inject human demonstrations.
reward_signals:
  extrinsic:
    gamma: 0.99
    strength: 1.0
  gail:
    strength: 0.5                  # weighted against extrinsic
    gamma: 0.99
    demo_path: Assets/Demonstrations/DroneSessions/
    use_actions: true              # GAIL discriminates on (state, action)
    use_vail: false
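The two `strength` values can be read as weights on the per-step rewards. The sketch below shows that weighting with the textbook GAIL reward, -log(1 - D(s, a)). It is a simplification: the function names are hypothetical, and a trainer may keep separate value estimates per signal rather than a single summed scalar.

```python
import math

def gail_reward(discriminator_prob, eps=1e-7):
    """Textbook GAIL reward -log(1 - D), where D is the discriminator's
    estimate that a (state, action) pair came from the human demos."""
    d = min(max(discriminator_prob, 0.0), 1.0 - eps)  # keep the log finite
    return -math.log(1.0 - d)

def combined_reward(extrinsic, discriminator_prob,
                    extrinsic_strength=1.0, gail_strength=0.5):
    """Strength-weighted sum of the environment reward and the GAIL reward."""
    return (extrinsic_strength * extrinsic
            + gail_strength * gail_reward(discriminator_prob))

# A step with no environment reward still earns a signal when the
# behaviour looks demo-like to the discriminator.
for d in (0.1, 0.5, 0.9):
    print(f"D={d}: combined={combined_reward(0.0, d):.4f}")
```

With `strength: 0.5` the imitation term guides early exploration. The extrinsic reward, at strength 1.0, keeps the larger weight.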
shepherd.sessions API
Reward signals need a working game loop. Today's iteration:
| Feature | Impl. | Why |
|---|---|---|
| Downward light-cone scarer | SpotLight 90° + SphereCollider trigger | Visual clarity + reliable fear transmission |
| Wolf fear for AI bots | WolfFear.AddFear() guard removed | Demo ran silent before — bots now react to the scarer |
| Drone HP system | 3 HP · 1 s i-frames · round-end signal | Before: drone went silent on death, round kept ticking |
| Wolf hunting drive | Speed 7 + 1.5× sprint < 10 m · target lock + stuck timer | Pre-fix: wolf zig-zagged, never caught a sheep |
| End-screen outcomes | "Drone destroyed – wolf wins" vs. "All sheep saved" | Clear demo UX instead of a silent timeout |
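The HP row in the table (3 HP, 1 s of invulnerability after each hit, round-end signal on destruction) can be sketched as a small state machine. This is an illustration only: the class and method names are hypothetical, and time is passed in explicitly instead of being read from the engine clock.

```python
class DroneHealth:
    """Hit points with invulnerability frames after each wolf collision."""

    def __init__(self, max_hp=3, iframe_seconds=1.0):
        self.hp = max_hp
        self.iframe_seconds = iframe_seconds
        self.invulnerable_until = float("-inf")

    def on_wolf_collision(self, now):
        """Register a collision at time `now` (seconds).

        Returns True only for the hit that destroys the drone, which is
        the moment the round-end signal would be raised.
        """
        if self.hp <= 0 or now < self.invulnerable_until:
            return False  # already destroyed, or still inside the i-frame window
        self.hp -= 1
        self.invulnerable_until = now + self.iframe_seconds
        return self.hp == 0

drone = DroneHealth()
for t in (0.0, 0.5, 1.0, 2.5):
    destroyed = drone.on_wolf_collision(t)
    print(f"t={t}: hp={drone.hp} destroyed={destroyed}")
```

Returning the destruction event from the collision handler lets the caller end the round immediately, so the round does not keep ticking after the drone is gone.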
⚠ WebGL dropped: Unity 6 + URP = 72 GB shader variants, 3× build failures. New model: standalone clients.
/api/shepherd/builds
No Docker Swarm needed · no GPU on u-server · training runs on BigOne; PPO at 1M steps also fits on an 8 GB vGPU.
A user-friendly login — no one needs to copy-paste CLI tokens:
/oauth/authorize?response_type=code&client_id=…&code_challenge=… (PKCE / RFC 7636)
http://127.0.0.1:51742/callback?code=… via loopback (RFC 8252)
shepherd.{code}
→ This returned a 500 error this morning. Two layers were fixed: the php-cli storage volume mount was missing (so the keys generated by passport:keys vanished), and Passport 13 ships no default consent view (an inline closure is now registered).
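The PKCE part of the authorize request can be sketched with the Python standard library. This is a minimal illustration, not the client's actual code: it assumes the S256 challenge method from RFC 7636, and `example-client`, the `state` value, and the helper names are hypothetical placeholders. The `redirect_uri`, `code_challenge_method`, and `state` parameters are standard OAuth 2.0 fields that the slide's abbreviated URL does not show.

```python
import base64
import hashlib
import secrets
from urllib.parse import urlencode

def _b64url(data: bytes) -> str:
    """Base64url without padding, as RFC 7636 requires."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")

def make_code_verifier(n_bytes: int = 32) -> str:
    """Random high-entropy verifier (43 characters for 32 bytes)."""
    return _b64url(secrets.token_bytes(n_bytes))

def make_code_challenge(verifier: str) -> str:
    """S256 challenge: base64url(SHA-256(verifier))."""
    return _b64url(hashlib.sha256(verifier.encode("ascii")).digest())

def build_authorize_query(client_id: str, redirect_uri: str,
                          challenge: str, state: str) -> str:
    """Query string for the /oauth/authorize request."""
    return urlencode({
        "response_type": "code",
        "client_id": client_id,
        "redirect_uri": redirect_uri,
        "code_challenge": challenge,
        "code_challenge_method": "S256",
        "state": state,
    })

verifier = make_code_verifier()  # kept by the client, sent later with the code
query = build_authorize_query(
    client_id="example-client",  # hypothetical placeholder
    redirect_uri="http://127.0.0.1:51742/callback",
    challenge=make_code_challenge(verifier),
    state=secrets.token_urlsafe(16),
)
print("/oauth/authorize?" + query)
```

The client keeps `verifier` in memory and presents it when exchanging the code received on the loopback callback. An intercepted code is therefore useless without the verifier.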
Three platform builds currently in production, all from the shepherd-v0.3.0 release:
Auto-sync via gh workflow run "Shepherd · Upload builds from release" — new versions land on prod within a minute. JSON API: /api/shepherd/builds
Drone runs DroneCSI.onnx inference in demo mode; keyboard heuristic captures GAIL demos in training mode.
✓ Phase 1: DroneCSI 150k steps, 35.25 reward
✓ Phase 4: reward shaping + anti-exploit
✓ GAIL demo-upload API + DemoUploader.cs
✓ Standalone build pipeline (Linux/Win/Mac)
✓ Demo sync for training (JWT pull)
✓ Multiplayer session management (Reverb WebSocket)
✓ OAuth2 PKCE login (Passport 13) (today)
✓ Gameplay layer: HP, scarer cone, wolf-hunting AI (today)
✓ Auto-deploy via shepherd-upload-builds workflow
⧗ Phase 4 training run (1.5M steps planned)
⧗ Multiplayer PvP tournament mode
○ Recurrent net + LSTM for longer-horizon strategies
○ Integration with a real drone SDK (DJI / Autel)
○ SDF heatmap in the dashboard (where the AI defends well/poorly)
app.linn.games/shepherd · Builds API
github.com/nileneb/DroneDetect · github.com/nileneb/app.linn.games