Urban low-altitude UAV route planning: semantic mapping and functional area division

A review of research progress in semantic mapping and functional-area perception for urban UAV route planning, covering recent work from CVPR, ICCV, IROS, and RA-L (2022-2025).

Direction Four: Semantic Mapping + Functional Area Awareness · Extended Chapter · Technical Blog Series, Part 4


1. Background: From geometric map to semantic map

Traditional UAV path planning relies on purely geometric environment representations: occupancy grids, octrees, or voxel maps. These representations encode only “whether the space is flyable”; they cannot express “where to fly” or “why a region must not be flown”.

Semantic maps add scene understanding on top of the geometric representation: they identify semantic information such as building type (residential/commercial/industrial), road grade, crowd density, and functional area boundaries. This capability is critical for low-altitude urban planning. A UAV crossing a business district plaza faces a completely different level of risk than one crossing a school playground, yet a purely geometric map treats both as equivalent free space.

Furthermore, functional zoning divides urban low-altitude airspace into areas with different regulatory levels: the 120 m true-height ceiling, no-fly zones, restricted areas, controlled areas, and so on. Semantic awareness lets a UAV proactively understand and comply with these rules, rather than relying solely on pre-annotated static no-fly zone maps.


2. Basics of semantic mapping: perception → understanding

2.1 Semantic segmentation: from pixels to scene understanding

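Semantic segmentation is the core perceptual basis of semantic mapping. Given an image $I \in \mathbb{R}^{H \times W \times 3}$, the semantic segmentation model outputs pixel-wise class labels:

$$
\hat{y}_{u,v} = \arg\max_{c \in \mathcal{C}} P(c \mid I, \mathbf{e}_{u,v})
$$

where $\mathcal{C}$ is the set of semantic categories (such as buildings, roads, vegetation, vehicles, people, sky), and $\mathbf{e}_{u,v}$ is the position encoding of pixel $(u, v)$.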
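As a minimal sketch of the per-pixel decision rule, the snippet below takes a small grid of per-class probabilities and picks the most likely label for each pixel. The class list, the function name `label_pixels`, and the probability values are made up for illustration; a real model would produce the probabilities from the image.

```python
# Illustrative class set; not tied to any particular dataset.
CLASSES = ["building", "road", "vegetation", "sky"]


def label_pixels(probs):
    """Return the arg-max class name for every pixel.

    probs[v][u] is a list of per-class probabilities for pixel (u, v),
    ordered like CLASSES.
    """
    labels = []
    for row in probs:
        labels.append(
            [CLASSES[max(range(len(CLASSES)), key=p.__getitem__)] for p in row]
        )
    return labels


# A hypothetical 1 x 2 "image" of class probabilities.
probs = [[[0.7, 0.1, 0.1, 0.1], [0.05, 0.15, 0.2, 0.6]]]
labels = label_pixels(probs)
```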

Mainstream semantic segmentation architectures for urban scenes include:

- DeepLabv3+ (Chen et al., ECCV 2018): uses atrous convolution to enlarge the receptive field without losing resolution, effectively capturing large-scale structures such as urban buildings and roads.

2.2 Instance segmentation and target detection

On top of semantic segmentation, instance segmentation further distinguishes individual objects of the same class: it separates each pedestrian in a “pedestrian group” into an independent instance, providing fine-grained support for intention prediction and collision avoidance.

| Method | Core Idea | Inference Speed | Representative Work |
|--------|---------|---------|---------|
| Two-stage | Detect boxes first, then segment masks | ~10 FPS | Mask R-CNN (ICCV 2017) |
| One-stage | Jointly predict masks and categories | ~25 FPS | YOLACT (ICCV 2019) |
| Transformer-based | DETR-style detection + mask | ~15 FPS | Mask2Former (CVPR 2022) |
| Foundation model | SAM + detector | ~20 FPS | SEEM (NeurIPS 2023) |

The YOLO series (Ultralytics YOLOv8, 2023) is widely used for real-time UAV semantic perception: it can reach a detection frame rate of 50+ FPS on Jetson Orin, which corresponds to under 20 ms per frame and suits the real-time perception requirements of flight control systems.

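2.3 Depth estimation: 2D → 3D geometry

Semantic mapping requires lifting 2D semantic labels into 3D space. Monocular depth estimation converts an RGB image $I$ into a dense depth map:

$$
D = f_{\theta}(I), \quad D \in \mathbb{R}^{H \times W}
$$

where $f_{\theta}$ is the depth estimation network with parameters $\theta$.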

Key methods include:

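Combined with the camera intrinsic matrix $K$, a 2D pixel coordinate $(u, v)$ with depth $d$ can be back-projected into a 3D point in the camera frame:

$$
\mathbf{P} = d \cdot K^{-1} \begin{bmatrix} u \\ v \\ 1 \end{bmatrix}
$$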
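The back-projection can be sketched for a standard pinhole camera without skew, where the intrinsics reduce to focal lengths `fx`, `fy` and principal point `cx`, `cy`. The function name and the numeric intrinsics below are illustrative assumptions, not values from any specific camera.

```python
def back_project(u, v, depth, fx, fy, cx, cy):
    """Lift pixel (u, v) with metric depth into a 3D camera-frame point.

    Assumes a pinhole model with zero skew, so applying K^-1 reduces to
    subtracting the principal point and dividing by the focal length.
    """
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return (x, y, depth)


# Hypothetical intrinsics for a 640 x 480 image.
FX, FY, CX, CY = 500.0, 500.0, 320.0, 240.0

center_point = back_project(320, 240, 10.0, FX, FY, CX, CY)
offset_point = back_project(420, 240, 10.0, FX, FY, CX, CY)
```

A pixel at the principal point maps onto the optical axis, while a pixel 100 columns to the right at 10 m depth lands 2 m to the side with these intrinsics.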


3. Urban functional area division and low-altitude airspace classification

3.1 Differences in flight constraints in urban functional areas

Urban space is divided into different functional areas according to land use, and the degree of restriction on UAV flight varies significantly between them:

| Functional Area | Typical Scenarios | Flight Constraints | Main Risks |
|--------|---------|---------|---------|
| Residential area | Residential neighborhoods | Altitude limit (< 30 m), time-of-day restrictions | Privacy invasion, noise complaints |
| Business district | CBD, shopping malls | Flight within visual line of sight | Dense crowds, signal interference |
| Industrial area | Factories, warehouses | Possible no-fly zones | Electromagnetic interference, heavy vehicles |
| School/Hospital | Primary and secondary schools, hospitals | Strict no-fly or approval system | Security sensitive |
| Transportation hubs | Near train stations and airports | Total flight ban | Aviation safety |
| Park/Green space | City parks | Relatively relaxed (requires approval) | Crowd gathering |

3.2 Low-altitude airspace classification system

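China's “Interim Regulations on the Management of Unmanned Aircraft Flights” (issued by the State Council and the Central Military Commission, effective 1 January 2024) establish a vertical control framework built around a true height of 120 m: airspace above 120 m true height is designated as controlled airspace, while airspace below it, outside other controlled zones, is generally open to micro, light, and small UAVs.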

Semantic mapping requires encoding these regulatory constraints into the planning system so that the UAV can automatically determine the flyable height and area boundaries based on the functional area in which it is located.
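One way to encode such constraints is a simple lookup from functional area to an allowed height. The sketch below is only an illustration: the zone names, the `ZONE_RULES` table, and the choice to treat approval-gated and unknown zones as closed are assumptions loosely based on the table in 3.1, not regulatory values.

```python
LEGAL_CEILING_M = 120.0  # true-height ceiling discussed in 3.2

# Hypothetical per-zone rules. ceiling_m=None means "no zone-specific cap";
# needs_approval zones stay closed until an authorization flag is set.
ZONE_RULES = {
    "residential": {"ceiling_m": 30.0, "needs_approval": False},
    "business": {"ceiling_m": None, "needs_approval": False},
    "school_hospital": {"ceiling_m": None, "needs_approval": True},
    "transport_hub": {"ceiling_m": 0.0, "needs_approval": False},
    "park": {"ceiling_m": None, "needs_approval": True},
}


def max_flight_height(zone, approved=False):
    """Return the maximum allowed true height (m) in a functional area."""
    rule = ZONE_RULES.get(zone)
    if rule is None:
        return 0.0  # unknown zone: be conservative and do not fly
    if rule["needs_approval"] and not approved:
        return 0.0
    cap = rule["ceiling_m"]
    return LEGAL_CEILING_M if cap is None else min(cap, LEGAL_CEILING_M)
```

A planner could call `max_flight_height` per grid cell to clip candidate waypoints before any cost is evaluated.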

3.3 Data sources for semantic classification of functional areas

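The division of urban functional areas relies on multi-source geographic information.

Multi-source integration framework:

$$
\mathcal{F}_{\text{zone}}(\mathbf{x}) = \alpha \cdot f_{\text{osm}}(\mathbf{x}) + \beta \cdot f_{\text{poi}}(\mathbf{x}) + \gamma \cdot f_{\text{remote}}(\mathbf{x}) + \delta \cdot f_{\text{plan}}(\mathbf{x})
$$

where $f_{\text{osm}}$, $f_{\text{poi}}$, $f_{\text{remote}}$, and $f_{\text{plan}}$ are the zone scores at location $\mathbf{x}$ derived from OpenStreetMap data, point-of-interest (POI) data, remote-sensing imagery, and urban planning data respectively, and $\alpha, \beta, \gamma, \delta$ are the fusion weights.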
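The weighted fusion can be illustrated with per-zone score vectors from each source. This is a sketch under assumptions: the zone list, the scores, and the weights are invented for the example, and each source is assumed to already output comparable scores for the same location.

```python
ZONES = ["residential", "business", "park"]


def fuse_zone_scores(sources, weights):
    """Weighted sum of per-zone scores from several data sources.

    sources: dict mapping source name -> list of scores ordered like ZONES
    weights: dict mapping source name -> fusion weight
    Returns (fused_scores, best_zone).
    """
    fused = [0.0] * len(ZONES)
    for name, scores in sources.items():
        w = weights[name]
        for k, s in enumerate(scores):
            fused[k] += w * s
    best_zone = ZONES[max(range(len(ZONES)), key=fused.__getitem__)]
    return fused, best_zone


# Hypothetical scores for one location x.
sources = {
    "osm": [0.6, 0.3, 0.1],
    "poi": [0.2, 0.7, 0.1],
    "remote": [0.5, 0.2, 0.3],
    "plan": [0.1, 0.8, 0.1],
}
weights = {"osm": 0.3, "poi": 0.3, "remote": 0.2, "plan": 0.2}

fused, best_zone = fuse_zone_scores(sources, weights)
```

Because the weights sum to 1 and each source vector sums to 1, the fused vector here can still be read as a distribution over zones; that property does not hold for arbitrary weights.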


4. Dynamic semantic understanding: intention prediction and uncertainty quantification

4.1 Pedestrian/Vehicle Intention Prediction

Dynamic obstacles (pedestrians, cyclists, vehicles) in urban streets pose a major threat to safe UAV flight. Intention prediction requires not only predicting the future location of obstacles, but also understanding their behavioral intentions:

$$
\hat{\mathbf{a}}_t^{(i)} = \arg\max_{\mathbf{a} \in \mathcal{A}} P(\mathbf{a} \mid \mathbf{b}_{1:t}^{(i)}, \mathcal{E})
$$

where $\mathbf{b}_{1:t}^{(i)}$ is the historical behavior trajectory of obstacle $i$, $\mathcal{E}$ is the environmental context (traffic light status, crosswalks, etc.), and $\mathcal{A}$ is the intention set (crossing the road, waiting at the roadside, walking along the sidewalk, etc.).

Social LSTM (Alahi et al., CVPR 2016) first introduced social pooling to model pedestrian interaction; Trajectron++ (Salzmann et al., ECCV 2020) models multi-agent interaction with a graph-structured recurrent network, significantly improving prediction accuracy in urban intersection scenes.

4.2 UAV-UAV conflict detection

In urban low-altitude corridors, multiple UAVs may operate simultaneously. Conflict detection requires predicting potential collisions in space and time:

$$
\text{Conflict} \Leftrightarrow \exists t \in [t_{\text{start}}, t_{\text{end}}]: \|\mathbf{p}_A(t) - \mathbf{p}_B(t)\| < d_{\text{safe}}
$$

where $d_{\text{safe}}$ is the safe separation distance and $\mathbf{p}_A(t)$, $\mathbf{p}_B(t)$ are the predicted trajectories of the two UAVs.
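The conflict condition can be checked approximately by sampling both predicted trajectories at shared timestamps. The sketch below assumes the two trajectories are already time-aligned; the function name, the 30 m separation, and the trajectories are illustrative. Discrete sampling can miss a conflict that occurs between samples, so the time step should be small relative to the closing speed.

```python
import math


def first_conflict(traj_a, traj_b, d_safe):
    """Return the first sampled time at which separation drops below d_safe.

    traj_a, traj_b: lists of (t, (x, y, z)) sampled at the same timestamps.
    Returns None if no sampled conflict is found.
    """
    for (t, pa), (_, pb) in zip(traj_a, traj_b):
        if math.dist(pa, pb) < d_safe:
            return t
    return None


# Two hypothetical UAVs flying head-on at 10 m/s, sampled once per second.
traj_a = [(t, (10.0 * t, 0.0, 50.0)) for t in range(6)]
traj_b = [(t, (100.0 - 10.0 * t, 0.0, 50.0)) for t in range(6)]
# A third UAV on a parallel track 200 m to the side.
traj_c = [(t, (10.0 * t, 200.0, 50.0)) for t in range(6)]

head_on = first_conflict(traj_a, traj_b, d_safe=30.0)
parallel = first_conflict(traj_a, traj_c, d_safe=30.0)
```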

Conflict resolution strategies include:

4.3 Uncertainty-aware planning

Semantic classification is inherently uncertain: a glass curtain wall on a building facade may be misclassified as sky, and vegetation may be misclassified as building. Uncertainty-aware planning incorporates this perception uncertainty into decision-making by restricting the planner to a confident free set:

$$
\mathcal{X}_{\text{safe}} = \{ \mathbf{x} \mid P(\text{free} \mid \mathbf{x}) \geq \tau \}
$$

where $\tau$ is a confidence threshold. Trajectories are planned only in free areas with high enough confidence, which reserves a safety margin for sensing errors. This idea is in line with robust optimization: ensuring safety in the worst case over an uncertainty set.
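A minimal sketch of this confidence gating: given a grid of free-space probabilities, keep only the cells whose confidence clears a threshold. The threshold value, the function name, and the grid are assumptions chosen for illustration.

```python
def confident_free_mask(p_free, tau=0.9):
    """Mark a cell plannable only if its free-space confidence is >= tau."""
    return [[p >= tau for p in row] for row in p_free]


# Hypothetical 2 x 3 grid of P(free) values from the perception stack.
p_free = [
    [0.99, 0.95, 0.60],
    [0.97, 0.85, 0.20],
]
mask = confident_free_mask(p_free, tau=0.9)
```

Raising `tau` shrinks the plannable region, trading path length for robustness to misclassification.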


5. Semantic-aware planning: cost function design

5.1 Semantically enhanced cost map

Traditional planning uses a geometric costmap, in which each grid cell encodes only the collision probability. A semantically enhanced cost map superimposes a semantic cost on top of the geometric cost:

$$
C(i,j) = C_{\text{geo}}(i,j) + \lambda \cdot C_{\text{sem}}(i,j)
$$

where $\lambda$ weights the semantic term. The semantic cost is set according to the functional area to which the cell belongs:

$$
C_{\text{sem}}(i,j) = \begin{cases} 0 & \text{open park} \\ 1 & \text{commercial plaza} \\ 5 & \text{residential area} \\ 20 & \text{school/hospital} \\ +\infty & \text{no-fly zone} \end{cases}
$$
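The semantic cost table can be layered on a geometric cost grid as sketched below. The additive combination with a weight `lam` is an assumption about how the two terms are merged; the per-zone values are the ones listed in the piecewise definition, and the zone keys are illustrative names.

```python
import math

# Per-zone semantic cost, taken from the piecewise definition.
SEMANTIC_COST = {
    "open_park": 0.0,
    "commercial_plaza": 1.0,
    "residential": 5.0,
    "school_hospital": 20.0,
    "no_fly": math.inf,
}


def semantic_costmap(geo_cost, zone_labels, lam=1.0):
    """Combine geometric cost and weighted semantic cost cell by cell."""
    combined = []
    for geo_row, zone_row in zip(geo_cost, zone_labels):
        row = []
        for g, zone in zip(geo_row, zone_row):
            s = SEMANTIC_COST[zone]
            # Keep no-fly cells infinite even if lam is 0 (avoids 0 * inf = nan).
            row.append(math.inf if math.isinf(s) else g + lam * s)
        combined.append(row)
    return combined


geo_cost = [[0.1, 0.2], [0.0, 0.3]]
zone_labels = [["open_park", "residential"], ["no_fly", "school_hospital"]]
costmap = semantic_costmap(geo_cost, zone_labels, lam=0.5)
```

Any graph search that skips infinite-cost cells will then treat no-fly zones as impassable while still preferring parks over residential areas.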

5.2 Soft constraints and hard constraints

Hard constraints are physical/regulatory restrictions that cannot be violated:

- Flying inside a no-fly zone is absolutely forbidden
- Do not fly below the minimum safe altitude
- The distance to any obstacle must not be less than the safety margin

Soft constraints are preferences that can be violated at a cost:

- Prefer flying over parks rather than residential areas
- Prefer staying close to building walls rather than crossing open squares (to reduce wind disturbance)
- Prefer flying outside noise-sensitive periods

Semantic-aware planning handles these two types of constraints through hierarchical optimization: minimizing the cost of the soft constraints while satisfying the hard constraints.

5.3 EGPBS: Semantic-aware safety planning

EGPBS (Environment Graph-based Planning with Buffer Shrinking) is a semantic-aware planning framework for urban scenes (ideas derived from IROS 2023 related research):

1. Environment graph construction: model the urban scene as a graph $\mathcal{G} = (\mathcal{V}, \mathcal{E})$, where nodes $\mathcal{V}$ represent semantic areas (building blocks, streets, parks) and edges $\mathcal{E}$ represent connectivity between areas
2. Safety buffer shrinking: in narrow sections of low-altitude passages, the semantic-aware safety buffer automatically shrinks so that narrow corridors remain passable
3. Graph search + trajectory optimization: A* searches for a coarse path on the environment graph, followed by time-domain optimization with the MINCO trajectory family


6. Safety and Compliance: STMP/LAANC Integration

6.1 STMP: Space-time Risk Matrix Planning

STMP (Spatial-Temporal Mitigation Planning) is a drone risk assessment framework attributed here to the FAA. It evaluates the overall risk level of each flight by analyzing factors such as population density, distance to airports, and military facilities in the flight area. Semantic mapping can directly support STMP evaluation:

- Population density layer: estimate ground pedestrian density $\rho_{\text{people}}(\mathbf{x})$ through semantic segmentation
- Sensitive facility layer: mark schools, hospitals, and religious sites using POI data
- Aviation facilities layer: overlay airport clearance areas and route protection zones

Comprehensive risk score:

$$
R(\mathcal{T}) = \int_0^T \left( \alpha \cdot \rho_{\text{people}}(\mathbf{p}(t)) + \beta \cdot I_{\text{airport}}(\mathbf{p}(t)) + \gamma \cdot I_{\text{sensitive}}(\mathbf{p}(t)) \right) dt
$$

where $\mathbf{p}(t)$ is the UAV position along trajectory $\mathcal{T}$ of duration $T$, $I_{\text{airport}}$ and $I_{\text{sensitive}}$ are the aviation and sensitive facility layer terms, and $\alpha, \beta, \gamma$ are weights.
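The risk integral can be approximated numerically along a sampled trajectory with the trapezoidal rule. In this sketch the three layers are passed in as plain functions of position; the layer definitions, weights, and trajectory are invented for the example and do not come from any official risk model.

```python
def trajectory_risk(samples, density, airport, sensitive,
                    alpha=1.0, beta=10.0, gamma=2.0):
    """Trapezoidal approximation of the risk integral along a trajectory.

    samples: list of (t, position) with strictly increasing t.
    density, airport, sensitive: callables mapping a position to a float.
    """
    def rate(p):
        return alpha * density(p) + beta * airport(p) + gamma * sensitive(p)

    total = 0.0
    for (t0, p0), (t1, p1) in zip(samples, samples[1:]):
        total += 0.5 * (rate(p0) + rate(p1)) * (t1 - t0)
    return total


# Hypothetical layers over positions (x, y): crowds start at x >= 10,
# a sensitive facility sits at x >= 20, and no airport is nearby.
def density(p):
    return 0.5 if p[0] >= 10 else 0.0


def airport(p):
    return 0.0


def sensitive(p):
    return 1.0 if p[0] >= 20 else 0.0


samples = [(0.0, (0.0, 0.0)), (1.0, (10.0, 0.0)), (2.0, (20.0, 0.0))]
risk = trajectory_risk(samples, density, airport, sensitive)
```

Comparing this score across candidate trajectories gives a simple way to rank routes by accumulated exposure.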

6.2 LAANC: Real-time Airspace Authorization

LAANC (Low Altitude Authorization and Notification Capability) is a real-time airspace authorization system for drones provided by the FAA. The UAV queries whether its current location is within authorized airspace through the UTM (UAS Traffic Management) interface, and can apply for real-time authorization.

Integration path between the semantic perception system and LAANC:

1. The UAV's semantic mapping identifies the functional area at the current location
2. If the UAV is near the boundary of a restricted area, it initiates an authorization request to LAANC
3. LAANC returns the authorization status (Approved / Pending / Denied)
4. Once authorization is granted, the planning system unlocks flight permission in that area


7. Mathematical framework: multi-modal perception fusion and semantic cost map construction

7.1 Bayesian semantic fusion

The core of multi-sensor fusion is Bayesian inference. Let $z_t$ be the semantic observation (camera segmentation result) at time $t$ and $m$ the semantic map; the posterior semantic map is then:

$$
P(m \mid z_{1:t}) \propto P(z_t \mid m, z_{1:t-1}) \cdot P(m \mid z_{1:t-1})
$$

In a practical implementation, the observation likelihood $P(z_t \mid m, z_{1:t-1})$ is modeled by a CRF (Conditional Random Field) or an MLP classifier, taking into account spatial smoothing priors (neighboring pixels tend to have similar labels).
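For a single map cell, the recursive update can be sketched as below. This simplified version assumes observations are conditionally independent given the true class, so the likelihood reduces to a fixed confusion table $P(z \mid m)$ instead of a CRF or MLP; the class names and table values are invented for illustration.

```python
def bayes_update(prior, likelihood):
    """One recursive Bayes step for a single cell.

    prior: dict class -> P(m = class | z_1..t-1)
    likelihood: dict class -> P(z_t | m = class)
    """
    unnorm = {c: prior[c] * likelihood[c] for c in prior}
    total = sum(unnorm.values())
    if total == 0.0:
        raise ValueError("observation has zero probability under the prior")
    return {c: v / total for c, v in unnorm.items()}


# Hypothetical confusion model: P(observed "sky" | true class).
# Glass facades make buildings look like sky fairly often.
LIKELIHOOD_OBS_SKY = {"building": 0.20, "vegetation": 0.05, "sky": 0.90}

belief = {"building": 1 / 3, "vegetation": 1 / 3, "sky": 1 / 3}
for _ in range(2):  # two consecutive "sky" observations of the same cell
    belief = bayes_update(belief, LIKELIHOOD_OBS_SKY)
```

Repeated consistent observations sharpen the belief, while the nonzero building likelihood keeps some probability mass on the glass-facade hypothesis.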

7.2 Factor graph optimization of semantic SLAM

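The joint optimization of semantic mapping and localization is formulated as a factor graph:

$$
\mathcal{X}^*, \mathcal{M}^* = \arg\min_{\mathcal{X}, \mathcal{M}} \sum_{i} \left\| \mathbf{r}_{\text{odom}}^{(i)} \right\|^2_{\Sigma_{\text{odom}}} + \sum_{j} \left\| \mathbf{r}_{\text{loop}}^{(j)} \right\|^2_{\Sigma_{\text{loop}}} + \sum_{k} \left\| \mathbf{r}_{\text{sem}}^{(k)} \right\|^2_{\Sigma_{\text{sem}}}
$$

where $\mathcal{X}$ denotes the poses and $\mathcal{M}$ the semantic map, $\mathbf{r}_{\text{odom}}$ is the odometry residual, $\mathbf{r}_{\text{loop}}$ is the loop closure detection residual, and $\mathbf{r}_{\text{sem}}$ is the semantic observation residual (the consistency constraint between 3D semantic points and the semantic map).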
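To make the structure of this objective concrete, here is a toy 1D version solved by plain gradient descent. It is only a sketch: real semantic SLAM systems optimize 6-DoF poses with dedicated solvers, and the factor formats, the weight `w_sem`, and all measurements below are assumptions made for the example. The map is held fixed (a known landmark position), so only poses are optimized.

```python
def optimize_poses(n_poses, odom, loops, sem, w_sem=0.5, lr=0.05, iters=2000):
    """Minimize the sum of squared odometry, loop, and semantic residuals.

    odom, loops: lists of (i, j, meas) with residual (x[j] - x[i]) - meas
    sem: list of (i, landmark_pos, meas) with residual (landmark_pos - x[i]) - meas
    The first pose is held at 0 to fix the gauge.
    """
    x = [0.0] * n_poses
    for i, j, d in odom:  # initialize by chaining odometry
        x[j] = x[i] + d
    for _ in range(iters):
        grad = [0.0] * n_poses
        for i, j, d in odom + loops:
            r = (x[j] - x[i]) - d
            grad[i] -= 2.0 * r
            grad[j] += 2.0 * r
        for i, landmark, d in sem:
            r = (landmark - x[i]) - d
            grad[i] -= 2.0 * w_sem * r
        grad[0] = 0.0  # anchor the first pose
        x = [xi - lr * g for xi, g in zip(x, grad)]
    return x


odom = [(0, 1, 1.0), (1, 2, 1.0)]  # two 1 m steps
loops = [(0, 2, 2.2)]              # loop closure disagrees slightly
sem = [(2, 3.0, 0.9)]              # labeled landmark at 3.0 seen 0.9 m ahead
poses = optimize_poses(3, odom, loops, sem)
```

Lowering `w_sem` plays the role of the relaxation discussed next: ambiguous semantic observations then pull less strongly on the pose estimate.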

The key challenge of semantic SLAM lies in the ambiguity of semantic observations: the same type of semantic labels may correspond to completely different geometric shapes (for example, buildings of different styles are labeled “building”), and appropriate relaxation needs to be introduced in the factor graph.


8.1 Large language model + semantic awareness

Vision-language models (VLMs) such as GPT-4V bring open-vocabulary perception to semantic mapping: the system is no longer limited to a predefined closed set of semantic categories, but can understand arbitrary semantic concepts described in natural language.

Application scenario: when the user says “Avoid the school area”, a VLM can identify school features (playground, flag-raising platform, school sign) in the image; when the user says “Fly over the road with the coffee shop”, it can locate the target road. This upgrades semantic mapping from “passive query” to “active understanding”.

8.2 Privacy protection and data desensitization

Semantic mapping involves a large number of images of urban environments, raising privacy concerns (visibility into building interiors, recording of human activities). Technical response strategies include:


9. Summary

Semantic mapping elevates urban low-altitude UAV planning from geometric perception to cognitive understanding. Through semantic segmentation, depth estimation, and functional area division, a UAV can understand “where am I flying”, “why is this place sensitive”, and “how should I get around it”, instead of only knowing “are there any obstacles here”.

Key research directions include open-vocabulary semantic awareness (empowered by large models), uncertainty-aware planning (coping with perception errors), and STMP/LAANC compliance integration (regulation-driven semantic constraints). As the regulatory framework for the urban low-altitude economy continues to improve, semantic awareness will become a standard component of urban UAV planning systems.


*This article is the fourth extended chapter in a series on urban low-altitude drone route planning.*