# LLM-Guided UAV Mission Planning: The Frontier from Reasoning to Execution
UAV Intelligence Series · Chapter X+1 Spotlight: LLMs as mission planners, symbolic planning integration, real-time inference architecture
## 1. Why are LLMs suitable for UAV mission planning?
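The challenge of UAV mission planning lies in open-world uncertainty:
Traditional planning (model-based):
- Input: exact goal state + exact environment model
- Output: optimal action sequence
- Limitation: breaks down when the model is inaccurate; cannot handle goals expressed in language
LLM planning (knowledge-based):
- Input: natural-language instruction + visual observations + world knowledge
- Output: executable action sequence
- Strength: strong generalization; zero-shot understanding of new tasks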
Advantages of LLMs:
- World knowledge: pre-training captures rich physical common sense ("water flows", "cars are faster than people")
- Zero-shot inference: no separate training is needed for each task
- Multi-step planning: complex tasks are decomposed into chains of sub-goals (Chain-of-Thought)
## 2. LLM paradigms for task planning
### 2.1 Paradigm 1: LLM as planner (directly outputs actions)
Representative work:
ReAct (Reasoning + Acting)
- Core idea: the LLM alternates between "reasoning" and "acting"
- Each step: obs → think → action → next_obs
- Applicable to: scenarios with observable state and clear environmental feedback
- Adaptation to UAVs: requires a fast action→obs loop
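The obs → think → action → next_obs cycle can be sketched as a plain loop. This is a minimal illustration, not the ReAct implementation: `scripted_policy` is a hypothetical stand-in for the LLM call, and `ToyUAV` is an assumed toy environment in which each `climb` action adds 5 m of altitude.

```python
from typing import Callable, Dict, List, Tuple

Observation = Dict[str, float]
Step = Tuple[Observation, str, str]  # (observation, thought, action)


def scripted_policy(obs: Observation, history: List[Step]) -> Tuple[str, str]:
    """Hypothetical stand-in for an LLM call: returns (thought, action)."""
    if obs["altitude_m"] < 10.0:
        return "Below the 10 m survey altitude, so keep climbing.", "climb"
    return "Survey altitude reached, the task is complete.", "done"


class ToyUAV:
    """Assumed toy environment: each 'climb' raises the altitude by 5 m."""

    def __init__(self) -> None:
        self.altitude_m = 0.0

    def observe(self) -> Observation:
        return {"altitude_m": self.altitude_m}

    def step(self, action: str) -> Observation:
        if action == "climb":
            self.altitude_m += 5.0
        return self.observe()


def react_loop(
    policy: Callable[[Observation, List[Step]], Tuple[str, str]],
    env: ToyUAV,
    max_steps: int = 10,
) -> List[Step]:
    """Alternate reasoning and acting until the policy emits 'done'."""
    history: List[Step] = []
    obs = env.observe()
    for _ in range(max_steps):
        thought, action = policy(obs, history)  # think
        history.append((obs, thought, action))
        if action == "done":
            break
        obs = env.step(action)  # act, then observe the result
    return history


trace = react_loop(scripted_policy, ToyUAV())
for obs, thought, action in trace:
    print(obs, "|", thought, "|", action)
```

Each pass through the loop waits for one policy call, which is why the action→obs latency bounds how quickly such a planner can react on a real UAV.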
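SayCan (PaLM-SayCan, 2022)
- Combines the LLM's estimate of how useful each skill is for the instruction with a learned estimate of how feasible that skill is in the current physical state
- The LLM says what would be useful to do; the robot's affordance (value) functions say what it can actually do right now
- Takeaway: a UAV can filter out infeasible actions based on its own status (battery level, flight restrictions)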
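A SayCan-style selection multiplies the LLM's usefulness score for each skill by a feasibility score and picks the maximum. The sketch below is an assumption-laden toy: the skill names, the LLM scores, and the battery thresholds are invented for illustration, and the hand-written `affordance` rule stands in for the learned value functions used in SayCan.

```python
from typing import Dict, Tuple

# Hypothetical minimum battery (percent) needed to attempt each skill.
SKILL_MIN_BATTERY: Dict[str, float] = {
    "fly_to_roof": 30.0,
    "return_home": 10.0,
    "land": 0.0,
}


def affordance(skill: str, battery_pct: float) -> float:
    """Feasibility in [0, 1]; a hand-written rule, not a learned value function."""
    return 1.0 if battery_pct >= SKILL_MIN_BATTERY[skill] else 0.0


def select_skill(
    llm_scores: Dict[str, float], battery_pct: float
) -> Tuple[str, Dict[str, float]]:
    """Pick the skill that maximizes usefulness * feasibility."""
    combined = {
        skill: score * affordance(skill, battery_pct)
        for skill, score in llm_scores.items()
    }
    best = max(combined, key=combined.get)
    return best, combined


# Assumed LLM scores for the instruction "inspect the roof".
llm_scores = {"fly_to_roof": 0.6, "return_home": 0.3, "land": 0.1}

best_full, _ = select_skill(llm_scores, battery_pct=80.0)
best_low, _ = select_skill(llm_scores, battery_pct=15.0)
print(best_full, best_low)
```

With a full battery the most useful skill wins; at 15 % the roof flight is scored as infeasible and the next most useful feasible skill is chosen instead.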
### 2.2 Paradigm 2: LLM + PDDL symbolic planning
PDDL (Planning Domain Definition Language) is a classic task-planning language that models tasks as discrete symbolic problems.
Core idea:
VLM perception → PDDL problem generation → classical planner → UAV action sequence
Advantages:
- Plans are explainable and can be verified
- A sound planner guarantees that the plan reaches the goal, provided the symbolic model matches the real world
- Suitable for safety-critical scenarios (flights in urban airspace)
Challenges:
- PDDL modeling is itself a bottleneck (it requires domain experts)
- The continuous dynamics of UAVs do not fit the discrete assumptions of PDDL well
- Possible solution: PDDL handles high-level task decomposition, while MPC handles low-level trajectory execution
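To make the symbolic side concrete, here is a minimal STRIPS-style planner written directly in Python rather than in PDDL syntax. It is a sketch under assumptions: the facts, the five actions, and the photo mission are invented for illustration, and breadth-first search stands in for a real classical planner.

```python
from collections import deque
from typing import FrozenSet, Iterable, List, NamedTuple, Optional


class Action(NamedTuple):
    name: str
    pre: FrozenSet[str]     # facts that must hold before the action
    add: FrozenSet[str]     # facts the action makes true
    delete: FrozenSet[str]  # facts the action makes false


def fs(*facts: str) -> FrozenSet[str]:
    return frozenset(facts)


# Hypothetical UAV domain: fly from base to a site, photograph it, return.
ACTIONS = [
    Action("takeoff", fs("landed", "at_base"), fs("airborne"), fs("landed")),
    Action("fly_to_site", fs("airborne", "at_base"), fs("at_site"), fs("at_base")),
    Action("take_photo", fs("airborne", "at_site"), fs("photo_taken"), fs()),
    Action("fly_to_base", fs("airborne", "at_site"), fs("at_base"), fs("at_site")),
    Action("land", fs("airborne"), fs("landed"), fs("airborne")),
]


def plan(
    init: Iterable[str], goal: Iterable[str], actions: List[Action]
) -> Optional[List[str]]:
    """Breadth-first search over symbolic states; returns a shortest plan."""
    start, goal_set = frozenset(init), frozenset(goal)
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        state, path = queue.popleft()
        if goal_set <= state:
            return path
        for a in actions:
            if a.pre <= state:
                nxt = (state - a.delete) | a.add
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append((nxt, path + [a.name]))
    return None  # goal unreachable in this model


mission = plan({"at_base", "landed"}, {"photo_taken", "at_base", "landed"}, ACTIONS)
print(mission)
```

Every step of the returned plan can be checked against the preconditions, which is the explainability benefit listed above. Each symbolic step (for example `fly_to_site`) would then be handed to the trajectory layer for execution.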
---
### 2.3 Paradigm 3: LLM + RAG (retrieval-augmented generation)
GenerativeMPC (arXiv, 2026)
**Paper:** *GenerativeMPC: VLM-RAG-guided Whole-Body MPC with Virtual Impedance for Bimanual Mobile Manipulation*
**Author:** Marcelino Julio Fernando et al.
**Source:** arXiv, April 2026
Core idea:
VLM perceives the current scene → retrieve the relevant manipulation knowledge base → RAG generates manipulation suggestions → MPC executes
Key techniques:
- Knowledge retrieval: retrieve the examples most relevant to the current scene from the manipulation knowledge base (including robot control experience data)
- Virtual impedance: generate compliance control parameters to avoid rigid collisions
- RAG filtering: ground the LLM output in retrieved examples so that it stays physically executable
Adaptation to UAVs:
- Retrieve building regulations (height restrictions, no-fly zones)
- Retrieve historical mission experience (flight parameters under similar weather conditions)
- Retrieve safety protocols (minimum obstacle avoidance distance, emergency procedures)
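The retrieval step for a UAV knowledge base can be sketched with a bag-of-words cosine similarity. Everything here is illustrative: the three knowledge-base entries and their numbers are made up, and the word-count `embed` function is a stand-in for the learned embeddings a real RAG pipeline would use.

```python
import math
from collections import Counter
from typing import List

# Hypothetical knowledge base; the entries and numbers are invented.
KNOWLEDGE_BASE = [
    "maximum altitude 120 m in urban zone",
    "strong wind above 10 m/s reduce cruise speed to 5 m/s",
    "keep minimum obstacle distance of 2 m near buildings",
]


def embed(text: str) -> Counter:
    """Toy embedding: lower-cased word counts."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query: str, kb: List[str], k: int = 1) -> List[str]:
    """Return the k entries most similar to the query."""
    q = embed(query)
    return sorted(kb, key=lambda doc: cosine(q, embed(doc)), reverse=True)[:k]


def build_prompt(query: str, kb: List[str], k: int = 1) -> str:
    """Prepend the retrieved entries to the task before calling the LLM."""
    context = "\n".join(f"- {doc}" for doc in retrieve(query, kb, k))
    return f"Context:\n{context}\n\nTask: {query}"


query = "planning flight in strong wind"
print(build_prompt(query, KNOWLEDGE_BASE))
```

The prompt that reaches the planner now carries the wind rule, so the LLM's suggestion is conditioned on retrieved constraints rather than on its pre-training alone.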
## 3. Real-time reasoning architecture
### 3.1 Dual-process architecture (arXiv, 2026)
**Paper:** *A Dual-Process Architecture for Real-Time VLM-Based Indoor Navigation*
**Author:** Joonhee Lee, Hyunseung Shin, Jeonggil Ko
**Source:** arXiv:2601.19401, January 2026
Core design:
┌─────────────────────────────────────────────┐
│             System Architecture             │
│                                             │
│  Process 1 (Slow): VLM Reasoning Thread     │
│  ┌───────────────────────────────────────┐  │
│  │ VLM: "What should I do next?"         │  │
│  │ Frequency: ~0.2-1 Hz                  │  │
│  │ Output: Navigation goal / decision    │  │
│  └───────────────────────────────────────┘  │
│                     ↓ goal                  │
│  Process 2 (Fast): Control Execution Thread │
│  ┌───────────────────────────────────────┐  │
│  │ MPC: Track trajectory to goal         │  │
│  │ Frequency: ~100 Hz                    │  │
│  │ Output: Motor control signals         │  │
│  └───────────────────────────────────────┘  │
└─────────────────────────────────────────────┘
Design principles:
- Fast process (MPC): millisecond-level response; handles real-time obstacle avoidance
- Slow process (VLM): second-level reasoning; handles high-level decisions
- Decoupling is critical: the VLM is not on the control critical path, so its latency does not affect the control frequency
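The decoupling can be illustrated with a deterministic single-loop simulation in which the planner refreshes the goal at 1 Hz while the controller runs every tick at 100 Hz. This is a sketch under assumptions, not the paper's implementation: `slow_planner` is a hypothetical stand-in for the VLM, a saturated proportional controller stands in for the MPC, the state is a 1D position, and planner latency is ignored. A real system would run the two processes in separate threads or processes.

```python
from typing import Tuple


def slow_planner(call_index: int) -> float:
    """Hypothetical stand-in for the VLM: returns a 1D goal position in metres."""
    waypoints = [5.0, 10.0, 10.0]
    return waypoints[min(call_index, len(waypoints) - 1)]


def fast_controller(pos: float, goal: float, kp: float = 2.0, v_max: float = 5.0) -> float:
    """Saturated proportional controller standing in for the MPC; returns velocity."""
    return max(-v_max, min(v_max, kp * (goal - pos)))


def simulate(
    duration_s: int, control_hz: int = 100, planner_hz: int = 1
) -> Tuple[float, int, int]:
    dt = 1.0 / control_hz
    ticks_per_plan = control_hz // planner_hz
    pos, goal = 0.0, 0.0
    plan_calls = control_calls = 0
    for tick in range(duration_s * control_hz):
        # Slow process: refresh the goal only once per planning period.
        if tick % ticks_per_plan == 0:
            goal = slow_planner(plan_calls)
            plan_calls += 1
        # Fast process: runs on every tick, always using the latest goal.
        pos += fast_controller(pos, goal) * dt
        control_calls += 1
    return pos, plan_calls, control_calls


final_pos, plan_calls, control_calls = simulate(duration_s=3)
print(f"pos={final_pos:.2f} m, planner calls={plan_calls}, control calls={control_calls}")
```

The controller never waits for the planner: it keeps tracking the most recent goal, so a slower planner only makes the goal staler and does not lower the control rate.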
### 3.2 Hierarchical planning framework
**High level (LLM/VLM, second level):**
Task understanding → sub-goal decomposition → global path planning → hand-off to low-level execution
**Middle level (differentiable optimization, 100 ms level):**
RRT*/MPC → local path replanning → smooth trajectory generation
**Low level (PID/MPC, millisecond level):**
Attitude control → motor allocation → execution
---
## 4. Key algorithms in depth
### 4.1 VoxPoser: LLM-synthesized 3D value maps
**Paper:** *VoxPoser: Composable 3D Value Maps for Robotic Manipulation with Language Models*
**Author:** Wenlong Huang, Chen Wang, Ruohan Zhang, Yunzhu Li, Jiajun Wu, Li Fei-Fei
**Source:** arXiv:2307.05973, July 2023
**Core contribution:**
- The LLM generates code that composes a **3D spatial heat map** (composable 3D value map)
- The heat map encodes "where to go" and "what to avoid"
- The map is used directly as the objective for trajectory optimization
**Extension to UAVs:**
- The VLM outputs a 3D occupancy heat map
- The heat map drives the MPC cost function
- VoxPoser for UAVs = "3D spatial affordance from language"
**Note:** VoxPoser first appeared on arXiv and was subsequently published at the Conference on Robot Learning (CoRL) 2023.
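The idea of a heat map that encodes "where to go" and "what to avoid" can be sketched on a small voxel grid. This is not VoxPoser's code: the map here is written by hand instead of being composed by an LLM, it is expressed as a cost (lower is better) rather than a value, and a greedy descent over 6-connected neighbors stands in for the trajectory optimizer. Grid size, target, and obstacle are assumed values.

```python
import itertools
import math
from typing import Dict, List, Tuple

Voxel = Tuple[int, int, int]
STEPS = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]


def build_cost_map(
    shape: Voxel,
    target: Voxel,
    avoid: Voxel,
    avoid_radius: float = 1.5,
    avoid_weight: float = 10.0,
) -> Dict[Voxel, float]:
    """Cost = distance to the target plus a penalty near the voxel to avoid."""
    cost = {}
    for v in itertools.product(*(range(n) for n in shape)):
        c = math.dist(v, target)              # "where to go"
        d = math.dist(v, avoid)
        if d < avoid_radius:                  # "what to avoid"
            c += avoid_weight * (avoid_radius - d)
        cost[v] = c
    return cost


def greedy_path(cost: Dict[Voxel, float], start: Voxel, max_steps: int = 50) -> List[Voxel]:
    """Follow the steepest cost decrease; stops at a (possibly local) minimum."""
    path, cur = [start], start
    for _ in range(max_steps):
        neighbors = [tuple(c + s for c, s in zip(cur, step)) for step in STEPS]
        neighbors = [n for n in neighbors if n in cost]
        best = min(neighbors, key=cost.get)
        if cost[best] >= cost[cur]:
            break
        cur = best
        path.append(cur)
    return path


target, obstacle = (4, 4, 1), (2, 2, 1)
cost_map = build_cost_map(shape=(5, 5, 3), target=target, avoid=obstacle)
path = greedy_path(cost_map, start=(0, 0, 1))
print(path)
```

In an MPC setting the same map would be sampled along candidate trajectories and added to the cost function, which avoids the local-minimum problem of pure greedy descent.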
---
### 4.2 CoNVO (Conditional Neural Value Optimization)
Combines LLM planning with value iteration:
- The LLM provides **prior preferences** (which actions are more reasonable)
- Value iteration provides an **optimality guarantee** with respect to the modeled problem
- More robust than pure LLM planning and more flexible than pure classical planning
---
## 5. World model assisted planning
### 5.1 Why World Model?
The knowledge of the LLM is static, but the UAV environment is dynamic:
- The wind will change
- Obstacles will move
- GNSS signals can drift
The world model allows UAVs to **predict the future**:
Current state + action → world model → predicted sequence of future states
The LLM then plans over the predicted future states (plan over imagined futures).
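Planning over imagined futures can be sketched by rolling candidate action sequences through a model and keeping the cheapest one. The example is a toy under stated assumptions: the 1D `world_model` with constant wind drift is hand-written rather than learned, and exhaustive search over short action sequences stands in for the LLM or learned planner.

```python
import itertools
from typing import List, Sequence, Tuple


def world_model(state: float, action: float, wind: float) -> float:
    """Hypothetical 1D dynamics: the position moves by the action plus wind drift."""
    return state + action + wind


def rollout(state: float, actions: Sequence[float], wind: float) -> List[float]:
    """Imagine the future states for one candidate action sequence."""
    states = []
    for a in actions:
        state = world_model(state, a, wind)
        states.append(state)
    return states


def plan_over_imagined_futures(
    state: float,
    goal: float,
    wind: float,
    horizon: int = 3,
    actions: Sequence[float] = (-1.0, 0.0, 1.0),
) -> Tuple[Tuple[float, ...], float]:
    """Return the sequence whose imagined trajectory stays closest to the goal."""
    best_seq, best_cost = (), float("inf")
    for seq in itertools.product(actions, repeat=horizon):
        cost = sum(abs(s - goal) for s in rollout(state, seq, wind))
        if cost < best_cost:
            best_seq, best_cost = seq, cost
    return best_seq, best_cost


calm_plan, calm_cost = plan_over_imagined_futures(state=0.0, goal=1.0, wind=0.0)
windy_plan, windy_cost = plan_over_imagined_futures(state=0.0, goal=1.0, wind=-0.5)
print(calm_plan, calm_cost)
print(windy_plan, windy_cost)
```

With no wind one push is enough and the plan then holds position; with a headwind the model predicts the drift, so the plan keeps pushing. In a receding-horizon setup only the first action would be executed before replanning.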
### 5.2 Representative papers
**Dreamer series** (Danijar Hafner et al.)
- Based on the RSSM (recurrent state-space model) dynamics model
- Performs reinforcement learning on imagined futures
- Validated on physical robots (robot arms, unmanned ground vehicles)
**VMP (Video Motion Planning)**
- Uses video generation models for motion planning
- Generate future frames → extract motion vectors → control the UAV
---
## 6. Safety and Verification
### 6.1 Why safety is key
When UAVs fly in cities, a poor decision may cause **human casualties**. There is a fundamental tension between the probabilistic output of an LLM and the deterministic guarantees required by aviation safety.
### 6.2 Safety framework
**CBF (Control Barrier Functions):**
- ASMA introduces CBFs to UAV vision-and-language navigation (VLN)
- A CBF constrains the control input so that the safe set stays forward invariant: a state that starts safe never enters the unsafe region
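For the simplest possible case the CBF condition has a closed-form solution. Assume a 1D single integrator $\dot{x} = u$ and a safe set $h(x) = x_{\max} - x \ge 0$ (distance to a wall); the condition $\dot{h} \ge -\alpha h$ then reduces to $u \le \alpha\,(x_{\max} - x)$. The sketch below applies that bound as a filter on a nominal command. It is an illustration with assumed numbers, not the scene-aware formulation used in ASMA, which requires solving an optimization problem.

```python
from typing import List


def cbf_filter(u_nom: float, x: float, x_max: float, alpha: float = 1.0) -> float:
    """Largest command not exceeding u_nom that satisfies u <= alpha * h(x)."""
    h = x_max - x  # barrier value: remaining distance to the boundary
    return min(u_nom, alpha * h)


def simulate(
    u_nom: float,
    x0: float = 0.0,
    x_max: float = 10.0,
    dt: float = 0.1,
    steps: int = 100,
    use_filter: bool = True,
) -> List[float]:
    """Euler simulation of x' = u with or without the safety filter."""
    xs, x = [x0], x0
    for _ in range(steps):
        u = cbf_filter(u_nom, x, x_max) if use_filter else u_nom
        x += u * dt
        xs.append(x)
    return xs


filtered = simulate(u_nom=3.0)
unfiltered = simulate(u_nom=3.0, use_filter=False)
print(f"filtered max x = {max(filtered):.3f}, unfiltered max x = {max(unfiltered):.3f}")
```

Far from the wall the filter leaves the nominal command untouched; close to it the command is scaled down so the vehicle approaches the boundary without crossing it. In discrete time this holds here because `alpha * dt` is at most 1.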
**Formal verification:**
- Use TLA+ / NuSMV for state-machine verification
- LLM plans are executed only after they pass model checking
**Shielding:**
- Low-level shield: monitors the LLM output and intercepts unsafe actions
- High-level LLM: focuses on task completion and leaves safety details to the shield
- **Similar to the "Guardian Angel" architecture in autonomous driving**
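A shield can be sketched as a rule-based filter that sits between the planner and the flight controller. The rules, limits, and the `Action` type below are assumptions chosen for illustration (a 120 m altitude cap, one rectangular no-fly box, a 20 % battery floor), and replacing a rejected action with a hover is only one possible fallback.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Action:
    kind: str          # "goto" or "hover"
    x: float = 0.0
    y: float = 0.0
    alt: float = 0.0


HOVER = Action("hover")


@dataclass(frozen=True)
class Shield:
    """Hypothetical safety rules; all limits are illustrative."""
    max_alt_m: float = 120.0
    no_fly_box: Tuple[float, float, float, float] = (40.0, 40.0, 60.0, 60.0)  # x_min, y_min, x_max, y_max
    min_battery_pct: float = 20.0

    def is_safe(self, action: Action, battery_pct: float) -> bool:
        if action.kind == "hover":
            return True
        x_min, y_min, x_max, y_max = self.no_fly_box
        in_no_fly = x_min <= action.x <= x_max and y_min <= action.y <= y_max
        return (
            0.0 <= action.alt <= self.max_alt_m
            and not in_no_fly
            and battery_pct >= self.min_battery_pct
        )

    def filter(self, action: Action, battery_pct: float) -> Action:
        """Pass safe actions through unchanged; replace unsafe ones with a hover."""
        return action if self.is_safe(action, battery_pct) else HOVER


shield = Shield()
ok = shield.filter(Action("goto", 10.0, 10.0, 50.0), battery_pct=80.0)
blocked = shield.filter(Action("goto", 50.0, 50.0, 50.0), battery_pct=80.0)
print(ok, blocked)
```

Because the shield only inspects proposed actions, the planner above it can be swapped or retrained without touching the safety rules.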
---
## 7. Frontier topics and future directions
### 7.1 End-to-end VLA (Vision-Language-Action)
**Latest trend:** Skip the hierarchical design of "perception → planning → control" and output **action tokens** directly from the VLM.
Representative work:
- **RT-2** (Google DeepMind): fine-tunes a VLM so that it directly outputs robot actions as tokens
- **π₀** (Physical Intelligence): a generalist VLA for controlling multiple robot embodiments
- **UAV versions** (emerging): similar ideas applied to drones
**Challenges:**
- Continuous action space vs. discrete language tokens
- Safety verification is difficult (end-to-end black box)
- Data scarcity (requires large-scale robot teleoperation data)
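The gap between continuous actions and discrete tokens is commonly bridged by uniform binning of each action dimension; RT-2, for example, uses 256 bins per dimension. The sketch below shows such a tokenizer and its inverse for a single dimension. The action range of [-1, 1] and the function names are assumptions for illustration.

```python
def tokenize(value: float, low: float, high: float, n_bins: int = 256) -> int:
    """Map a continuous value to a bin index in [0, n_bins - 1]."""
    clipped = min(max(value, low), high)
    frac = (clipped - low) / (high - low)
    return min(int(frac * n_bins), n_bins - 1)


def detokenize(token: int, low: float, high: float, n_bins: int = 256) -> float:
    """Map a bin index back to the center of its bin."""
    return low + (token + 0.5) * (high - low) / n_bins


# Assumed normalized velocity command in [-1, 1].
command = 0.3
token = tokenize(command, -1.0, 1.0)
recovered = detokenize(token, -1.0, 1.0)
print(token, recovered)
```

The round trip loses at most half a bin width, which is the price of discretization: a finer grid lowers the error but enlarges the action vocabulary.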
### 7.2 Multi-robot collaborative LLM planning
**SysNav (arXiv, March 2026)**
**Paper:** *SysNav: Multi-Level Systematic Cooperation Enables Real-World, Cross-Embodiment Object Navigation*
**Author:** Haokun Zhu et al.
**Source:** arXiv:2603.xxxxx, March 2026
**Core contribution:**
- Multi-agent collaborative navigation across different robot platforms
- The LLM handles high-level coordination (which agent goes to which area)
- Distributed perception fusion (agents share their visual observations)
### 7.3 Physical Intelligence × UAV
- **Foundation Models for Manipulation** → **Foundation Models for Flight**
- A dedicated pre-trained "UAV brain" model may appear in the future
- Similar to LLaVA but specialized in 3D spatial reasoning + flight dynamics
---
## 8. Summary and recommendations
| Dimension | Current best | Future direction |
|------|---------|---------|
| Planning paradigm | Dual-process architecture (feasible in real time) | End-to-end VLA (long-term goal) |
| World knowledge | RAG (reliable but slow) | World model (fast but requires training) |
| Safety | CBF + shielding | Formal verification (guarantees relative to the verified model) |
| Edge deployment | 4-bit LLaVA (barely real-time) | Special-purpose accelerators (NPU/TPU) |
**Recommendations:**
1. **The fastest route to results**: dual-process architecture + LLaVA-7B + a UAV platform
2. **The most room for innovation**: VLM + safety verification framework (still largely unexplored)
3. **Long-term strategy**: collect your own UAV control data and train a dedicated VLA model
---
## References
1. Lee et al. *A Dual-Process Architecture for Real-Time VLM-Based Indoor Navigation*. arXiv:2601.19401, 2026.
2. Fernando et al. *GenerativeMPC: VLM-RAG-guided Whole-Body MPC with Virtual Impedance for Bimanual Mobile Manipulation*. arXiv, 2026.
3. Huang et al. *VoxPoser: Composable 3D Value Maps for Robotic Manipulation with Language Models*. arXiv:2307.05973, 2023.
4. Brohan et al. *RT-2: Vision-Language-Action Models Transfer Web Knowledge to Robotic Control*. arXiv, 2023.
5. Zhu et al. *SysNav: Multi-Level Systematic Cooperation Enables Real-World, Cross-Embodiment Object Navigation*. arXiv, 2026.
6. Ahn et al. *Do As I Can and Not As I Say: Grounding Language in Robotic Affordances*. arXiv, 2022.