DroneDetect

Curriculum-based reinforcement learning for autonomous shepherd drones — with a live multiplayer demo.

Unity 6 · ML-Agents 4 · PPO + GAIL · Standalone Multiplayer · OAuth2 PKCE

Benedikt Linn · Code University Berlin · 2026-05-18

Problem

The drone acts as an active protector of a sheep flock
Threat: wolves — either rule-based (sprint + target lock) or human-controlled (multiplayer)
Defensive action: a scarer — light cone + sound that builds wolf fear and triggers a panic flight
The drone carries 3 HP; each wolf collision costs 1 HP, followed by 1 s of invulnerability (i-frames)
Constraints: 3-minute demo round · 5-minute training episode

→ Classic multi-agent / asymmetric-roles setup with physically realistic drone dynamics (AR-Drone 2.0 digital twin).

Pipeline Architecture

Phase 1: DroneCSI (flight foundation) → Phase 4: DroneShepherd (shepherding) → Standalone Deploy (app.linn.games/shepherd)

Phase	Algorithm	Curriculum	Demo mode
1: DroneCSI	PPO	5 levels (parcours_difficulty + wind_strength)	—
4: DroneShepherd	PPO + GAIL	Init from DroneCSI.onnx (transfer)	Human demos (4 Hz recording)

Observation & Action Space

14 observations · 4 continuous actions

Index	Observation	Range
0	Altitude / 10	[0,1]
1–3	Velocity (vx, vy, vz) / 5	[-1,1]
4–6	Euler rotation / 180	[-1,1]
7–9	Angular velocity / 5	[-1,1]
10–12	Direction-to-target (local, normalized)	unit
13	Distance-to-target / 50	[0,1]

Identical structure across both phases → DroneCSI.onnx is a drop-in curriculum start for phase 4.
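
The table above can be read as a simple normalisation recipe. A minimal Python sketch, assuming out-of-range values are clamped (the slide only lists the target ranges) and that Euler angles arrive in degrees within [-180, 180]; `build_observation` and `clamp` are hypothetical names, not taken from the project:

```python
import math


def clamp(x, lo, hi):
    """Limit x to the closed interval [lo, hi]."""
    return max(lo, min(hi, x))


def build_observation(altitude, velocity, euler_deg, angular_velocity,
                      to_target_local, distance):
    """Return the 14-value observation vector in the table's index order."""
    obs = [clamp(altitude / 10.0, 0.0, 1.0)]                        # index 0
    obs += [clamp(v / 5.0, -1.0, 1.0) for v in velocity]            # 1-3
    obs += [clamp(a / 180.0, -1.0, 1.0) for a in euler_deg]         # 4-6
    obs += [clamp(w / 5.0, -1.0, 1.0) for w in angular_velocity]    # 7-9
    norm = math.sqrt(sum(c * c for c in to_target_local))
    # Unit direction to the target; zero vector if the drone is on the target.
    obs += [c / norm if norm > 0 else 0.0 for c in to_target_local]  # 10-12
    obs.append(clamp(distance / 50.0, 0.0, 1.0))                    # 13
    return obs
```

Because both phases share this layout, a policy exported from phase 1 sees inputs with the same meaning in phase 4.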

Phase 1: DroneCSI — PPO Setup


behaviors:
  DroneCSI:
    trainer_type: ppo
    hyperparameters:
      batch_size: 512        # mini-batch per PPO update
      buffer_size: 10240     # full rollout before update (20 × batch_size)
      learning_rate: 3.0e-4
      beta: 5.0e-3           # entropy bonus → exploration
      epsilon: 0.2           # PPO clipping
      learning_rate_schedule: linear
    network_settings:
      hidden_units: 256
      num_layers: 3          # deeper MLP → richer dynamics representation
      normalize: true
    max_steps: 1_000_000
    time_horizon: 256        # length of return bootstrap
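
Two of the numbers in this config can be checked by hand. A small Python sketch of the arithmetic, assuming the `linear` schedule decays the learning rate in a straight line towards zero at `max_steps` (the exact floor value used by the trainer is not modelled here); the function names are hypothetical:

```python
BATCH_SIZE = 512
BUFFER_SIZE = 10240
LEARNING_RATE = 3.0e-4
MAX_STEPS = 1_000_000


def minibatches_per_pass(buffer_size=BUFFER_SIZE, batch_size=BATCH_SIZE):
    """Number of mini-batches in one pass over the full buffer."""
    return buffer_size // batch_size


def linear_lr(step, initial=LEARNING_RATE, max_steps=MAX_STEPS):
    """Learning rate at a given step under an idealised linear decay."""
    fraction_left = max(0.0, 1.0 - step / max_steps)
    return initial * fraction_left
```

Under these assumptions the buffer splits into 20 mini-batches, and at 150k steps the learning rate would still be about 2.55e-4.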

Phase 1: Curriculum Progression

Two orthogonal difficulty axes, both reward-triggered:

Lesson	Parcours	Wind	Reward threshold
0	easy	calm	1.5 → next
1	medium	0.3	2.5 → next
2	hard	0.6	3.5 → next
3	complex	0.85	5.0 → next
4 ✓	full	full storm	terminal

→ All 5 levels cleared in 150k steps · ~3 h wall time on BigOne (RTX 4090).
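
The reward-triggered progression in the table can be sketched in a few lines of Python. This is a simplification: it assumes that reaching the threshold is enough to advance, and it ignores any smoothing or minimum lesson length the trainer may apply. `advance_lesson` is a hypothetical helper:

```python
# Reward thresholds for lessons 0-3, taken from the table; lesson 4 is terminal.
THRESHOLDS = [1.5, 2.5, 3.5, 5.0]


def advance_lesson(lesson, mean_reward, thresholds=THRESHOLDS):
    """Return the next lesson index if the threshold is met, else the same one."""
    if lesson < len(thresholds) and mean_reward >= thresholds[lesson]:
        return lesson + 1
    return lesson


def run_curriculum(mean_rewards, thresholds=THRESHOLDS):
    """Replay a sequence of mean rewards and return the lesson after each one."""
    lesson = 0
    history = []
    for reward in mean_rewards:
        lesson = advance_lesson(lesson, reward, thresholds)
        history.append(lesson)
    return history
```

Each step up raises parcours difficulty and wind strength together, so one reward threshold gates both axes.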

Phase 1: Training Results

Cumulative reward: 35.25 (↑ +39.5 vs. random)
Episode length: 297 (↑ from 158, 88% longer)
Curriculum level: 4 / 4 (✓ all reached)

Phase 4: Reward Shaping

Spec: "The more fear in the wolves, the more reward. Each surviving sheep is a multiplier. 5-minute episode. Penalty for sheep killed."

Signal	Value	Trigger
Δ wolf fear (per step)	+4 × dFear	Wolf accumulates fear inside the scarer cone
Wolf panic event	+3.0	Once per panic trigger
Sheep alive	+0.0003 × N_sheep	Every FixedUpdate
Sheep killed	−3.0	Immediately on catch
Drone crash (emergency)	−1.0	EndEpisode
Drone destroyed (3 HP lost)	−1.0 + EndRound	3× wolf collision (1 s i-frames)
Episode end: sheep saved	+5 × N_survived	EndRound()
Perfect defense	× 1.5 multiplier	All sheep survived
Fast-win bonus	+0.05 / remaining second	Only on perfect defense
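
The per-step rows of the table combine into a single dense reward. A minimal Python sketch, assuming only fear increases are rewarded (the table says the wolf "accumulates" fear) and that several events can occur in the same step; `step_reward` and its parameters are hypothetical names, and the real logic lives in the Unity agent:

```python
FEAR_GAIN = 4.0                  # +4 x dFear
PANIC_BONUS = 3.0                # once per panic trigger
ALIVE_BONUS_PER_SHEEP = 0.0003   # every FixedUpdate, per living sheep
SHEEP_KILLED_PENALTY = -3.0      # immediately on catch


def step_reward(d_fear, panic_events, sheep_alive, sheep_killed):
    """Dense reward for one physics step, built from the table's per-step rows."""
    fear_term = FEAR_GAIN * max(0.0, d_fear)   # assumption: no reward for fear decay
    return (fear_term
            + PANIC_BONUS * panic_events
            + ALIVE_BONUS_PER_SHEEP * sheep_alive
            + SHEEP_KILLED_PENALTY * sheep_killed)
```

For scale: if FixedUpdate runs at Unity's default 0.02 s timestep (an assumption, the slides do not state it), a 300 s episode has 15,000 steps, so the alive bonus adds up to about 4.5 per sheep that survives the whole episode, close to the +5 per survivor paid at episode end.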

Reward-Hacking Prevention

Risk: the agent could learn to maximise "keep wolves afraid" via overly aggressive play — sacrificing sheep as collateral.

Mitigation:

survivalRatio = sheepSaved / initialSheepCount
Episode-end bonus is scaled by Lerp(0.2, 1.0, survivalRatio)
→ losing every sheep: the end bonus is scaled to 0.2× (an 80% cut), even if fear was high
→ saving all sheep: full reward + 1.5× multiplier


// Survival-ratio shaping: pushes agent away from
// "trade sheep for fear-spam" exploits.
AddReward(endBonus * Mathf.Lerp(0.2f, 1f, survivalRatio));
EndEpisode();
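
To see what the survival scaling does to concrete numbers, here is a Python sketch of the episode-end calculation. The order of operations is an assumption: the 1.5× multiplier is applied to the per-sheep bonus and the fast-win bonus is added afterwards; the slides do not specify this. `end_of_round_reward` is a hypothetical name:

```python
def lerp(a, b, t):
    """Linear interpolation with t clamped to [0, 1], like Unity's Mathf.Lerp."""
    t = max(0.0, min(1.0, t))
    return a + (b - a) * t


def end_of_round_reward(sheep_saved, initial_sheep, seconds_remaining=0.0):
    """Episode-end bonus scaled by the survival ratio."""
    survival_ratio = sheep_saved / initial_sheep
    end_bonus = 5.0 * sheep_saved                 # +5 per surviving sheep
    if sheep_saved == initial_sheep:              # perfect defense
        end_bonus = end_bonus * 1.5 + 0.05 * seconds_remaining
    return end_bonus * lerp(0.2, 1.0, survival_ratio)
```

With 10 sheep, saving all of them with 20 s left yields 76.0 under these assumptions, while saving 5 yields about 15.0: half the flock earns roughly a fifth of the reward, which is the intended pressure against trading sheep for fear.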

Phase 4: GAIL Imitation Learning

Pure reward shaping leaves exploration too slow → we inject human demonstrations.


reward_signals:
  extrinsic:
    gamma: 0.99
    strength: 1.0
  gail:
    strength: 0.5             # weighted against extrinsic
    gamma: 0.99
    demo_path: Assets/Demonstrations/DroneSessions/
    use_actions: true         # GAIL discriminates on (state, action)
    use_vail: false

Demo recording: via DemonstrationRecorder, 4 Hz position logging
Replay pipeline: existing multiplayer sessions → GAIL demos via the app.linn.games shepherd.sessions API
Bootstrap: DroneCSI.onnx as initial policy (transfer)
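
The 4 Hz logging mentioned above amounts to thinning a dense stream of physics frames. A minimal Python sketch of such a down-sampler, using integer millisecond timestamps to avoid floating-point drift; the `(timestamp_ms, position)` frame layout and the name `sample_at_rate` are assumptions for illustration, not the project's recording format:

```python
def sample_at_rate(frames, rate_hz=4):
    """Keep at most rate_hz frames per second from (timestamp_ms, position) pairs."""
    period_ms = 1000 // rate_hz
    kept = []
    next_due = None
    for t_ms, position in frames:
        if next_due is None:
            next_due = t_ms                 # always keep the first frame
        if t_ms >= next_due:
            kept.append((t_ms, position))
            # Advance the deadline past the current time so a long gap
            # does not cause a burst of consecutive samples.
            while next_due <= t_ms:
                next_due += period_ms
    return kept
```

Fed with 50 Hz frames over one second, this keeps the frames at 0, 260, 500, 760 and 1000 ms.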

Gameplay Layer 2026-05-18

Reward signals need a working game loop. Today's iteration:

Feature	Impl.	Why
Downward light-cone scarer	SpotLight 90° + SphereCollider trigger	Visual clarity + reliable fear transmission
Wolf fear for AI bots	`WolfFear.AddFear()` guard removed	Demo ran silent before — bots now react to the scarer
Drone HP system	3 HP · 1 s i-frames · round-end signal	Before: drone went silent on death, round kept ticking
Wolf hunting drive	Speed 7 + 1.5× sprint < 10 m · target lock + stuck timer	Pre-fix: wolf zig-zagged, never caught a sheep
End-screen outcomes	"Drone destroyed – wolf wins" vs. "All sheep saved"	Clear demo UX instead of a silent timeout
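
The HP row of the table describes a small state machine: three lives, a 1 s invulnerability window after each hit, and a round-end signal at zero. A Python sketch of that logic, with hypothetical names (`DroneHealth`, `on_wolf_collision`); the actual implementation is a Unity component:

```python
class DroneHealth:
    """Hit points with an invulnerability window after each hit."""

    def __init__(self, max_hp=3, iframe_seconds=1.0):
        self.hp = max_hp
        self.iframe_seconds = iframe_seconds
        self._invulnerable_until = float("-inf")

    @property
    def destroyed(self):
        """True once all HP are gone; the caller should end the round."""
        return self.hp <= 0

    def on_wolf_collision(self, now):
        """Register a collision at time `now` (seconds). Return True if it cost HP."""
        if self.destroyed or now < self._invulnerable_until:
            return False
        self.hp -= 1
        self._invulnerable_until = now + self.iframe_seconds
        return True
```

The i-frames keep a single prolonged contact with a wolf from draining all three HP within a few physics steps.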

Deploy Topology: Standalone (Pivot)

⚠ WebGL dropped: Unity 6 + URP = 72 GB shader variants, 3× build failures. New model: standalone clients.

BigOne · Training & Build

RTX 4090 + 64 GB RAM
Unity 6 + ML-Agents training
Linux standalone build (IL2CPP)
Windows + macOS via local Mono cross-compile

⇄

u-server · Production

app.linn.games (Laravel 12 + Filament)
OAuth2 PKCE via Passport 13 (NEW)
Reverb WebSocket for multiplayer sessions
Event recording (4 Hz position tracks)
ZIP download via /api/shepherd/builds

No Docker Swarm needed · no GPU on u-server · training runs on BigOne; the 1M-step PPO run also fits on an 8 GB vGPU.

Multiplayer Login 2026-05-18 fix

A user-friendly login — no one needs to copy-paste CLI tokens:

Unity standalone button "Login with app.linn.games" → opens the system browser
Browser lands on /oauth/authorize?response_type=code&client_id=…&code_challenge=… (PKCE / RFC 7636)
If not signed in → 302 → /login, Fortify-2FA capable
After login → consent card "Approve / Deny"
Redirect to http://127.0.0.1:51742/callback?code=… via loopback (RFC 8252)
Unity exchanges the code for an access token, stores it in PlayerPrefs
MatchmakeManager flips to online mode; RevbClient connects to shepherd.{code}

→ This was a 500 error this morning — two layers fixed: the php-cli storage volume mount was missing (passport:keys vanished), and Passport 13 ships no default consent view (inline closure registered).
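
The PKCE part of the flow above (RFC 7636, S256 method) is easy to sketch. The Python below is illustrative only, since the real client is a Unity standalone; the `client_id` value, the `state` parameter and the function names are assumptions, while the loopback redirect URI is the one shown in the steps:

```python
import base64
import hashlib
import secrets
from urllib.parse import urlencode


def make_code_verifier(n_bytes=32):
    """Random URL-safe verifier; 32 bytes encode to 43 characters."""
    raw = secrets.token_bytes(n_bytes)
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")


def make_code_challenge(verifier):
    """S256 challenge: base64url(SHA-256(verifier)) without padding."""
    digest = hashlib.sha256(verifier.encode("ascii")).digest()
    return base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")


def build_authorize_url(base_url, client_id, redirect_uri, challenge, state):
    """Assemble the /oauth/authorize URL the system browser is sent to."""
    query = urlencode({
        "response_type": "code",
        "client_id": client_id,            # hypothetical value supplied by caller
        "redirect_uri": redirect_uri,
        "code_challenge": challenge,
        "code_challenge_method": "S256",
        "state": state,                    # assumption: CSRF protection value
    })
    return f"{base_url}/oauth/authorize?{query}"
```

The client keeps the verifier in memory and sends it with the code exchange, so an intercepted authorization code is useless without it.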

Live Downloads

Three platform builds currently in production, all from the shepherd-v0.3.0 release:

🐧 Linux x64 · 371 MB · IL2CPP · .zip download
🪟 Windows x64 · 60 MB · Mono · .zip download
🍎 macOS arm64 · 62 MB · Mono · .zip download

Auto-sync via gh workflow run "Shepherd · Upload builds from release" — new versions land on prod within a minute. JSON API: /api/shepherd/builds

Live-Demo Walkthrough

Grab a standalone client (see previous slide)
Click "Login with app.linn.games" → system browser → JWT stored in PlayerPrefs
Pick a role: wolf 🐺 or drone 🚁
Session code from backend / auto-join open session — OFFLINE mode for solo play
3-minute demo round (180 s) or 5-minute training round (300 s)
HUD: timer · sheep status · wolf-fear bar · drone HP
Round end: "Drone destroyed – wolf wins" or "All sheep saved" + automatic .demo upload

Drone runs DroneCSI.onnx inference in demo mode; keyboard heuristic captures GAIL demos in training mode.

Status & Roadmap

Done

✓ Phase 1: DroneCSI, 150k steps, 35.25 reward
✓ Phase 4: reward shaping + anti-exploit
✓ GAIL demo-upload API + DemoUploader.cs
✓ Standalone build pipeline (Linux/Win/Mac)
✓ Demo sync for training (JWT pull)
✓ Multiplayer session management (Reverb WebSocket)
✓ OAuth2 PKCE login (Passport 13) (today)
✓ Gameplay layer: HP, scarer cone, wolf-hunting AI (today)
✓ Auto-deploy via shepherd-upload-builds workflow

In Progress

⧗ Phase 4 training run (1.5M steps planned)
⧗ Multiplayer PvP tournament mode

Next

○ Recurrent net + LSTM for longer-horizon strategies
○ Integration with a real drone SDK (DJI / Autel)
○ SDF heatmap in the dashboard (where the AI defends well/poorly)

Q & A

app.linn.games/shepherd · Builds API

github.com/nileneb/DroneDetect · github.com/nileneb/app.linn.games
