Why does STRtree sometimes miss collisions with rotated labels?

Shapely's STRtree indexes axis-aligned bounding boxes (MBRs). A rotated label's MBR is larger than the actual glyph footprint, so an MBR-only query may flag a collision that doesn't exist visually, but it will never miss a true overlap, because every geometry lies inside its own MBR. If you need rotated support, index the rotated polygon itself and query with predicate='intersects', so that each MBR hit is confirmed against the actual geometry rather than its envelope.

How do I account for label halos (text buffers) in the bounding box?

Add twice the halo radius to both width and height in estimate_label_bbox: width += 2 * halo_units and height += 2 * halo_units, where halo_units is the halo size converted from screen pixels to the coordinate units of your CRS. In EPSG:3857 at zoom 14 with 256-pixel tiles, 1 pixel ≈ 9.55 map units (metres at the equator); compute the conversion as halo_units = halo_screen_px * (2 * π * 6378137) / (256 * 2**zoom).
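The pixel-to-CRS conversion can be sketched as a pair of small helpers. The names units_per_pixel and halo_px_to_units are illustrative and not part of the resolver, and the 256-pixel tile size is an assumption (some renderers use 512).

```python
import math

EARTH_RADIUS_M = 6378137.0  # sphere radius used by EPSG:3857
TILE_SIZE_PX = 256          # assumed tile size; adjust for 512-px renderers


def units_per_pixel(zoom: float, tile_size: int = TILE_SIZE_PX) -> float:
    """EPSG:3857 map units covered by one screen pixel at the given zoom."""
    world_width = 2 * math.pi * EARTH_RADIUS_M
    return world_width / (tile_size * 2 ** zoom)


def halo_px_to_units(halo_screen_px: float, zoom: float) -> float:
    """Convert a halo radius in screen pixels to EPSG:3857 units."""
    return halo_screen_px * units_per_pixel(zoom)


# A 2 px halo at zoom 14 is roughly 19 map units.
halo_units = halo_px_to_units(2, 14)
```

The result can be passed straight to the halo_units parameter of estimate_label_bbox.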

Can I run this resolver in parallel across priority tiers?

Within a single priority tier, testing candidates against the already-placed higher-priority labels is a read-only operation, so you can parallelise it with concurrent.futures.ThreadPoolExecutor. Labels in the same tier can still collide with each other, however, so the commit step (checking each winner against the other winners in the tier, appending to placed_boxes, and rebuilding the STRtree) must remain serial. A practical pattern is to test all candidates for a tier in parallel, pick one surviving bbox per feature serially, then bulk-insert them and rebuild the tree once per tier.
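A minimal sketch of that tier pattern follows, assuming Shapely 2.x. The function names free_candidates and place_tier are hypothetical, and the warm-up query is only a precaution in case the index is built lazily on first use.

```python
from concurrent.futures import ThreadPoolExecutor

from shapely.geometry import box
from shapely.strtree import STRtree


def free_candidates(cands, tree):
    """Return the candidates that miss every box in the (read-only) tree."""
    if tree is None:
        return list(cands)
    return [c for c in cands if len(tree.query(c, predicate="intersects")) == 0]


def place_tier(placed_boxes, tier_candidates, max_workers=4):
    """Place one tier: parallel read-only tests, then a serial commit."""
    tree = STRtree(placed_boxes) if placed_boxes else None
    if tree is not None:
        tree.query(placed_boxes[0])  # warm-up before threads share the tree

    # Parallel phase: each feature's candidates are filtered independently.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        survivors = list(pool.map(lambda c: free_candidates(c, tree),
                                  tier_candidates))

    # Serial phase: winners in the same tier must not overlap each other.
    committed = []
    for cands in survivors:
        for cand in cands:
            if not any(cand.intersects(other) for other in committed):
                committed.append(cand)
                break
    return placed_boxes + committed


placed = [box(0, 0, 10, 10)]
tier = [
    [box(5, 5, 15, 15), box(20, 0, 30, 10)],    # first hits a placed box
    [box(25, 5, 35, 15), box(40, 0, 50, 10)],   # first hits a sibling winner
]
placed = place_tier(placed, tier)
```

The caller rebuilds the STRtree from the returned list once, at the start of the next tier.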

Solving Label Overlap in Dense Urban Maps with Python

Solving label overlap in dense urban maps with Python requires a priority-driven bounding-box collision resolver: sort features by cartographic importance, generate multiple candidate anchor positions per feature, and accept the first position that does not intersect an already-placed label tracked in a shapely STRtree. When implemented correctly, this approach eliminates manual label nudging for 70–90% of features while preserving visual hierarchy across street networks, POIs, and administrative boundaries.

Core Algorithm and Workflow

Dense urban environments produce thousands of candidate label positions that inevitably intersect. A robust Python pipeline treats this as a constrained placement problem — not a simple text-rendering task — and resolves it in four deterministic steps.

Priority Assignment. Rank features by cartographic weight — for example, highway=10, arterial=7, local=4, poi=3. Higher-priority labels claim positions first, ensuring critical infrastructure stays visible regardless of density.
Candidate Generation. For each feature, produce 4–8 anchor positions. Points use radial offsets; linestrings use along-line positions via LineString.interpolate (or shapely.line_interpolate_point in Shapely 2.x); polygons use representative_point() rather than centroid to avoid placing labels outside irregular boundaries.
Collision Testing. Convert each candidate anchor into a bounding box using estimated font metrics, then query the growing STRtree of already-placed label boxes with predicate='intersects'. Any intersection disqualifies the candidate.
Commit and Fallback. Accept the first collision-free candidate and add its bounding box to the spatial index. If all candidates collide, retry at fallback_scale=0.85 before considering suppression as a last resort.
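For linestring features, the along-line variant of the Candidate Generation step might look like the sketch below. The function name line_candidates and the default fractions are assumptions chosen for illustration; the boxes stay axis-aligned and ignore the line's bearing.

```python
from shapely.geometry import LineString, box


def line_candidates(line: LineString, width: float, height: float,
                    fractions=(0.5, 0.35, 0.65, 0.2, 0.8)):
    """Candidate boxes centred on points at given fractions of the line length.

    The midpoint is tried first, then positions progressively further out.
    """
    boxes = []
    for frac in fractions:
        pt = line.interpolate(frac, normalized=True)
        boxes.append(box(pt.x - width / 2, pt.y - height / 2,
                         pt.x + width / 2, pt.y + height / 2))
    return boxes


street = LineString([(0, 0), (100, 0)])
cands = line_candidates(street, width=20, height=10)
```

The returned list has the same shape as the output of a point-based candidate generator, so the collision test and commit steps need no changes.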

This methodology directly implements the spatial-index-driven approach described in the Label Collision Avoidance Algorithms overview and integrates cleanly into any Programmatic Map Styling and Label Automation pipeline.

Production-Ready Python Implementation

The script below is a complete, copy-pasteable collision resolver. It uses geopandas for spatial operations and shapely for bounding-box geometry. Font metrics are approximated by a character-width ratio, which is sufficient for batch processing; for pixel-perfect output, replace estimate_label_bbox with measurements from a Matplotlib renderer (renderer.get_text_width_height_descent).

import geopandas as gpd
from shapely.geometry import box
from shapely.strtree import STRtree


def estimate_label_bbox(text: str, font_size: float, scale: float = 1.0,
                        char_width_ratio: float = 0.55,
                        line_height_ratio: float = 1.2,
                        halo_units: float = 0.0) -> tuple[float, float]:
    """
    Estimate bounding-box dimensions for a label string.

    Parameters
    ----------
    text            : The label string.
    font_size       : Font size in the same units as the projected CRS.
    scale           : Multiplier applied to both dimensions (use <1 for fallback).
    char_width_ratio: Average glyph width as a fraction of font_size.
    line_height_ratio: Line height as a fraction of font_size.
    halo_units      : Halo/buffer radius in CRS units; added to all four sides.

    Returns
    -------
    (width, height) in CRS units.
    """
    width = len(text) * font_size * char_width_ratio * scale + 2 * halo_units
    height = font_size * line_height_ratio * scale + 2 * halo_units
    return width, height


def generate_candidates(geom, width: float, height: float,
                        offsets: list | None = None) -> list:
    """
    Generate candidate bounding boxes around a geometry's representative point.

    Default offsets cover 8 compass positions.  For linestring features,
    replace this with along-line positions from LineString.interpolate.
    """
    if offsets is None:
        half_w, half_h = width / 2, height / 2
        offsets = [
            (0,       half_h),   # top-centre
            (0,      -half_h),   # bottom-centre
            (half_w,  0),        # right
            (-half_w, 0),        # left
            (half_w,  half_h),   # top-right
            (-half_w, -half_h),  # bottom-left
            (half_w, -half_h),   # bottom-right
            (-half_w, half_h),   # top-left
        ]
    cx, cy = geom.representative_point().coords[0]
    return [
        box(cx + dx - width / 2, cy + dy - height / 2,
            cx + dx + width / 2, cy + dy + height / 2)
        for dx, dy in offsets
    ]


def resolve_label_collisions(
    gdf: gpd.GeoDataFrame,
    font_size: float = 10.0,
    scale: float = 1.0,
    priority_col: str = "priority",
    label_col: str = "label",
    fallback_scale: float = 0.85,
    halo_units: float = 0.0,
) -> gpd.GeoDataFrame:
    """
    Place labels by priority, resolving overlaps via candidate testing.

    Parameters
    ----------
    gdf           : Input GeoDataFrame in a projected CRS (e.g. EPSG:3857).
    font_size     : Approximate label height in CRS units.
    scale         : Global scale multiplier for all labels.
    priority_col  : Column name containing numeric priority (higher = more important).
    label_col     : Column name containing the label string.
    fallback_scale: Scale factor applied on a second attempt when all candidates collide.
    halo_units    : Halo radius in CRS units to include in collision extent.

    Returns
    -------
    GeoDataFrame with columns [label, bbox (geometry), priority, geometry, scaled].

    Performance note
    ----------------
    STRtree in Shapely ≥2.0 is immutable after construction.  This implementation
    rebuilds it after each placement, giving O(N² log N) overall.  For datasets
    with N > 5 000 features, process in priority tiers and rebuild once per tier.
    """
    gdf = gdf.sort_values(priority_col, ascending=False).reset_index(drop=True)

    placed_boxes: list = []
    records: list[dict] = []
    tree: STRtree | None = None

    for _, row in gdf.iterrows():
        text = str(row[label_col])
        w, h = estimate_label_bbox(text, font_size, scale, halo_units=halo_units)
        candidates = generate_candidates(row.geometry, w, h)

        def _try_place(cands, is_fallback: bool = False) -> bool:
            nonlocal tree
            for cand in cands:
                if tree is not None and len(tree.query(cand, predicate="intersects")) > 0:
                    continue
                placed_boxes.append(cand)
                records.append({
                    "label":    text,
                    "bbox":     cand,
                    "priority": row[priority_col],
                    "geometry": row.geometry,
                    "scaled":   is_fallback,
                })
                # Shapely 2.x STRtree is immutable; rebuild after each commit.
                tree = STRtree(placed_boxes)
                return True
            return False

        if not _try_place(candidates):
            # Fallback: shrink label and retry before suppressing.
            w_f, h_f = estimate_label_bbox(
                text, font_size, scale * fallback_scale, halo_units=halo_units
            )
            _try_place(generate_candidates(row.geometry, w_f, h_f), is_fallback=True)
        # Features that fail both passes are silently suppressed (omitted from output).

    return gpd.GeoDataFrame(records, geometry="bbox", crs=gdf.crs)


# ---------------------------------------------------------------------------
# Example usage
# ---------------------------------------------------------------------------
# gdf = gpd.read_file("city_features.geojson").to_crs("EPSG:3857")
#
# priority_map = {"highway": 10, "arterial": 7, "local": 4, "hospital": 9,
#                 "school": 8, "cafe": 3, "residential": 2}
# gdf["priority"] = gdf["feature_type"].map(priority_map).fillna(1).astype(int)
# gdf["label"]    = gdf["name"]
#
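# # font_size and halo_units are in EPSG:3857 units. At zoom 14 with 256-px
# # tiles, 1 px ≈ 9.55 units, so an 11 px label ≈ 105 and a 2 px halo ≈ 19.
# placed = resolve_label_collisions(
#     gdf, font_size=105.0, scale=1.0, halo_units=19.0, fallback_scale=0.85
# )
# # Keep only the bbox geometry when writing: GeoJSON holds one geometry per feature.
# placed.drop(columns="geometry").to_file("resolved_labels.geojson")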

Performance Tuning and Cartographic Best Practices

Rebuild the index per priority tier, not per label. The per-placement rebuild loop above is O(N² log N), which is correct and readable for datasets up to ~5,000 features. Beyond that, group rows by priority tier, test all candidates within the tier concurrently against the existing tree (it is only read, not written), check the surviving winners against each other so that labels in the same tier do not overlap, then bulk-insert the winning boxes and rebuild once. This brings practical complexity closer to O(T · N log N) where T is the number of tiers.
Convert coordinates to a projected CRS before computing extents. All metric calculations in estimate_label_bbox assume linear units. Always call gdf.to_crs("EPSG:3857") — or a local UTM zone for higher accuracy — before running the resolver. Passing geographic coordinates (EPSG:4326) will produce nonsensical pixel-equivalent offsets. The importance of CRS selection is detailed further in Projection Selection Algorithms.
Include the halo radius in the collision extent. Text halos (glows or buffers applied in the renderer) increase the effective footprint of each label. Add twice the halo radius to both width and height via the halo_units parameter; failing to do so causes halos to overlap even when the resolver reports clean placement.
Use abbreviation dictionaries before scaling. Rather than immediately shrinking a label on the first collision, attempt substitution first: "Street" → "St", "Avenue" → "Ave", "Boulevard" → "Blvd". This preserves readability better than a 15% size reduction, and the shorter string often fits in a previously-rejected candidate slot.
Respect minimum legibility thresholds. Set a floor at approximately 8 pt equivalent (in your projected-CRS units) below which labels are suppressed rather than further reduced. Labels below this threshold are illegible at the target export DPI and create visual noise rather than information — this threshold interacts directly with the DPI strategy described in DPI and Resolution Management.
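The abbreviation-before-scaling rule and the legibility floor above can be combined into a single list of attempts per label. This is a sketch: label_variants and abbreviate are hypothetical helpers, the dictionary holds only the three substitutions named above, and min_font_size is assumed to be in the same units as font_size.

```python
import re

ABBREVIATIONS = {"Street": "St", "Avenue": "Ave", "Boulevard": "Blvd"}
_PATTERN = re.compile(
    r"\b(" + "|".join(map(re.escape, ABBREVIATIONS)) + r")\b"
)


def abbreviate(text: str) -> str:
    """Replace whole words found in ABBREVIATIONS with their short form."""
    return _PATTERN.sub(lambda m: ABBREVIATIONS[m.group(1)], text)


def label_variants(text: str, font_size: float,
                   fallback_scale: float = 0.85,
                   min_font_size: float = 8.0) -> list:
    """Return (text, font_size) attempts in the order they should be tried."""
    attempts = [(text, font_size)]
    short = abbreviate(text)
    if short != text:
        attempts.append((short, font_size))      # shorter text, full size
    scaled = font_size * fallback_scale
    if scaled >= min_font_size:                  # respect the legibility floor
        attempts.append((short, scaled))
    return attempts


variants = label_variants("Ocean Boulevard", 10.0)
```

Each attempt would be passed through estimate_label_bbox and candidate testing in turn; a label is suppressed only when every attempt collides.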

Integration and Next Steps

The resolver outputs a GeoDataFrame whose bbox geometry column contains the committed label extents alongside the original feature geometry. This separation makes it renderer-agnostic:

Matplotlib / static export. Pass the centroid coordinates of each bbox directly to ax.annotate calls. The committed bounding boxes can optionally be drawn as debug rectangles with placed.plot(ax=ax, facecolor='none', edgecolor='red'), where placed is the resolver output.
Vector tile pipelines. Serialize resolved labels with placed.to_file("resolved_labels.geojson") or placed.to_postgis("resolved_labels", engine), dropping the extra source-geometry column first and reprojecting to EPSG:4326 if your tile generator expects standard GeoJSON coordinates. Then reference this layer in your tile generator (Tippecanoe, Martin, or pg_tileserv) with placement already solved, bypassing the engine’s own heuristic placement entirely.
Mapbox GL / MapLibre. Export resolved [cx, cy] centroids, converted to longitude/latitude, as a GeoJSON FeatureCollection and use layout: { "text-field": ["get", "label"], "text-allow-overlap": true }. The renderer's overlap guard is now unnecessary because collisions were resolved upstream.
CI integration. Wrap the resolver in a dask.dataframe.map_partitions call to parallelize across tiles or administrative units, and store the resolved GeoJSON artefacts as build outputs. This pairs naturally with the batch-processing patterns covered under Programmatic Map Styling and Label Automation.
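As an illustration of the Mapbox GL / MapLibre export, the sketch below turns committed label boxes into a GeoJSON FeatureCollection of centre points. It assumes the boxes are in EPSG:3857 and applies the inverse spherical Mercator formula by hand; in a GeoPandas pipeline, to_crs("EPSG:4326") does the same job. The function names are hypothetical.

```python
import json
import math

EARTH_RADIUS_M = 6378137.0  # sphere radius used by EPSG:3857


def mercator_to_lonlat(x: float, y: float) -> tuple:
    """Inverse spherical Mercator: EPSG:3857 metres to degrees."""
    lon = math.degrees(x / EARTH_RADIUS_M)
    lat = math.degrees(2 * math.atan(math.exp(y / EARTH_RADIUS_M)) - math.pi / 2)
    return lon, lat


def labels_to_feature_collection(labels) -> dict:
    """labels: iterable of (text, (minx, miny, maxx, maxy)) in EPSG:3857."""
    features = []
    for text, (minx, miny, maxx, maxy) in labels:
        lon, lat = mercator_to_lonlat((minx + maxx) / 2, (miny + maxy) / 2)
        features.append({
            "type": "Feature",
            "geometry": {"type": "Point", "coordinates": [lon, lat]},
            "properties": {"label": text},
        })
    return {"type": "FeatureCollection", "features": features}


fc = labels_to_feature_collection([("Main St", (-10.0, -10.0, 10.0, 10.0))])
geojson_text = json.dumps(fc)
```

With a shapely box, the tuple argument is simply its bounds attribute.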

For multi-scale atlas work, run the resolver independently at each zoom level with appropriately scaled font_size values, then merge the outputs into a single GeoJSON with a zoom_level property. The Typography Rules for Maps cluster covers the scale-dependent type specifications that should drive those per-zoom font sizes.

Frequently Asked Questions

Why does STRtree sometimes report a collision for labels that don’t visually overlap?

STRtree indexes the minimum bounding rectangle (MBR) of each geometry, not the geometry itself. For axis-aligned bounding boxes (which shapely.geometry.box produces), the MBR equals the box exactly, so there are no false positives. False positives arise when you index rotated label polygons and query without a predicate, because their MBRs are larger than the actual footprint. The fix is to query with predicate='intersects' against the actual rotated polygon, not its envelope.
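The difference is easy to reproduce. The sketch below, assuming Shapely 2.x, indexes one label rotated by 45 degrees and probes a corner of its envelope that the label itself does not cover; the coordinates are arbitrary illustration values.

```python
from shapely import affinity
from shapely.geometry import box
from shapely.strtree import STRtree

# A long, thin label rotated 45 degrees about its centre.
label = affinity.rotate(box(0, 0, 100, 10), 45, origin="center")
tree = STRtree([label])

# A probe box inside the label's envelope but far from the rotated strip.
probe = box(12, 35, 20, 42)

envelope_hits = tree.query(probe)                        # MBR test only
exact_hits = tree.query(probe, predicate="intersects")   # confirmed on geometry
```

Here the envelope-only query reports the label as a hit, while the predicate query returns no matches.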

How do I handle multi-line labels?

Multiply height by the number of lines: height = font_size * line_height_ratio * n_lines * scale. Detect newlines in the label string with n_lines = text.count('\n') + 1, and use max(len(line) for line in text.split('\n')) as the character count for width. The candidate generation logic is unchanged.
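A multi-line variant of the estimator could look like this. It is a sketch that mirrors the parameters of estimate_label_bbox; the name estimate_multiline_bbox is an assumption.

```python
def estimate_multiline_bbox(text: str, font_size: float, scale: float = 1.0,
                            char_width_ratio: float = 0.55,
                            line_height_ratio: float = 1.2,
                            halo_units: float = 0.0) -> tuple:
    """Estimate (width, height) for a label that may contain newlines."""
    lines = text.split("\n")
    n_lines = len(lines)                        # same as text.count("\n") + 1
    longest = max(len(line) for line in lines)  # widest line drives the width
    width = longest * font_size * char_width_ratio * scale + 2 * halo_units
    height = n_lines * font_size * line_height_ratio * scale + 2 * halo_units
    return width, height


w, h = estimate_multiline_bbox("Golden Gate\nPark", font_size=10.0)
```

For a single-line string it returns the same dimensions as the single-line formula, so it can replace the original estimator without changing existing placements.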

What happens if I skip sorting by priority?

Without a deterministic sort, the placement order depends on row position in the GeoDataFrame, which varies with file format and read order. Lower-priority features (cafés, residential streets) can claim positions before highways and hospitals, degrading cartographic hierarchy. Always include a stable sort key — add a secondary sort on name to break ties deterministically: gdf.sort_values([priority_col, label_col], ascending=[False, True]).

Label Collision Avoidance Algorithms — the parent page covering the full spectrum of collision detection strategies, from greedy placement to simulated annealing.
Automating Multi-Layer Legend Creation with GeoPandas — apply the same priority-driven bounding-box logic to legend item placement.
Typography Rules for Maps — the cartographic type specifications (size, weight, spacing) that drive the font_size and halo_units inputs to this resolver.
