Identifying Canopy Gaps Using Morphological Filters (scipy.ndimage Black Top-Hat)

Identifying canopy gaps with a morphological black top-hat transform gives you a deterministic, scale-aware alternative to flat height thresholding or manual digitization. When you reduce an airborne LiDAR Canopy Height Model (CHM) to a binary CHM < 2 m mask, you inherit fragmented gap edges, false positives from low-stature stands, and terrain-induced artifacts on slopes. Treating the CHM as a continuous height surface and probing it with a structuring element of known radius mitigates all three. This page is the morphological-filter implementation that the Forest Gap & Understory Analysis workflow references, and it consumes the normalized rasters produced upstream in the Canopy Height Modeling & Terrain Extraction pipeline. If your CHM still carries ground-interpolation error, fix that first in Canopy Height Model Creation, because morphological filters cannot recover a depression that the height normalization never preserved.

When to Use a Morphological Filter

Reach for a morphological black top-hat when you need gaps defined relative to their local canopy envelope rather than against a single global height. The table contrasts the three approaches you will realistically choose between for raster-based gap delineation.

Method	How it defines a gap	Strengths	Weaknesses
Flat height threshold (`CHM < h`)	Any pixel below an absolute height	Trivial, one parameter	Ignores local context; flags low-stature stands and slope shadows; ragged edges
Morphological black top-hat (this page)	Pixel lies below `h` and inside a depression that the closing raises by at least the drop threshold, i.e. one too narrow for the structuring element to fit inside	Scale-aware, suppresses noise, consolidates fragments deterministically	Two coupled parameters (radius + drop); sensitive to CRS units; openings wider than the element give little response
Region-growing / watershed	Seeded flood-fill from local minima	Captures irregular gap shapes well	Sensitive to seed placement and CHM noise, slower, harder to reproduce at tile scale

Choose the morphological approach when reproducibility and tile-parallel throughput matter — the structuring element makes the scale of detection explicit, and the operation is a pure function of its parameters. The combination used here is closing minus original (a black top-hat), which is large precisely where the CHM sits in a basin of low canopy surrounded by taller crowns.

Opening (erosion then dilation) removes isolated canopy peaks smaller than the element.
Closing (dilation then erosion) bridges narrow canopy breaks and fills minor depressions.
Black top-hat (closing − original) extracts depressions that fall below the structurally smoothed surface — the basis of this detector.
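
The scale-awareness of these operators can be seen on a toy profile. The sketch below is illustrative only: the 1-D transect, the 20 m canopy height, and the 7-pixel flat element are assumptions chosen for the demonstration, not values from the workflow. It builds the black top-hat as closing minus original, checks it against scipy.ndimage.black_tophat, and shows that a 3-pixel break is extracted while a 15-pixel opening, which the element fits inside, is not.

```python
import numpy as np
from scipy.ndimage import black_tophat, grey_closing

# Hypothetical 1-D CHM transect: 20 m canopy with two openings at ground level.
chm = np.full(40, 20.0)
chm[5:8] = 0.0     # narrow break, 3 px wide
chm[20:35] = 0.0   # wide opening, 15 px wide

# Flat 7 px element (assumed size): closing fills only depressions it cannot fit inside.
closed = grey_closing(chm, size=7)
top_hat = closed - chm

# SciPy ships the same composite operation directly.
assert np.array_equal(top_hat, black_tophat(chm, size=7))

print(top_hat[5:8])     # narrow break: response equals the full 20 m drop
print(top_hat[20:35])   # wide opening: no response, the closing left it unfilled
```

The wide opening going undetected is the practical meaning of "scale-aware": the element size sets an upper bound on the width of depression the detector responds to.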

Minimal Reproducible Example

The implementation uses scipy.ndimage for the morphology, scikit-image's disk for the circular footprint, and rasterio for windowed I/O, so it never loads an entire regional CHM into memory. Each block is read with explicit overlap padding so edge pixels always see a full neighbourhood; the padding is trimmed before write-back. For the windowing contract, see the Rasterio windowed read/write documentation.

import numpy as np
import rasterio
from rasterio.windows import Window
from scipy.ndimage import grey_closing, label, sum_labels as ndimage_sum
from skimage.morphology import disk

def detect_canopy_gaps(
    chm_path: str,
    output_path: str,
    gap_height_thresh: float = 2.0,
    min_gap_area_m2: float = 15.0,
    struct_radius_m: float = 5.0,
    overlap_px: int = 32,
) -> None:
    """
    Identify canopy gaps with a morphological black top-hat transform
    (grey_closing - chm), which highlights dark valleys — i.e. low spots
    surrounded by taller canopy. Processed in overlapping windows so edge
    pixels never see a truncated neighbourhood.
    """
    with rasterio.open(chm_path) as src:
        if src.count != 1:
            raise ValueError("Input CHM must be a single-band raster.")
        if src.crs is None or src.transform is None:
            raise ValueError("Input CHM must contain valid CRS and transform metadata.")

        res = src.res[0]  # Assume square pixels for simplicity
        struct_elem = disk(max(1, int(round(struct_radius_m / res))))
        area_thresh_px = min_gap_area_m2 / (res ** 2)

        meta = src.meta.copy()
        meta.update(dtype='uint8', count=1, nodata=0)

        with rasterio.open(output_path, 'w', **meta) as dst:
            for _ji, window in src.block_windows(1):
                # Expand window with overlap to prevent edge truncation.
                col_start_src = max(window.col_off - overlap_px, 0)
                row_start_src = max(window.row_off - overlap_px, 0)
                win = Window(
                    col_start_src,
                    row_start_src,
                    min(window.width  + 2 * overlap_px, src.width  - col_start_src),
                    min(window.height + 2 * overlap_px, src.height - row_start_src),
                )

                chm_block = src.read(1, window=win).astype(np.float32)

                # Clamp NaN and negative samples (including negative nodata
                # sentinels) to 0 m so they cannot corrupt the closing.
                valid_mask = chm_block > 0
                if not np.any(valid_mask):
                    continue
                chm_block[~valid_mask] = 0

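                # Black top-hat: closing(chm) - chm, large where chm is locally low.
                # Pass the disk as `footprint` (a flat element). `structure` would be
                # read as a non-flat element whose values are added to the heights.
                closed       = grey_closing(chm_block, footprint=struct_elem)
                gap_surface  = closed - chm_block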

                # Pixels are gaps if they sit in a deep depression AND fall below
                # the absolute height threshold (so tall canopy with a small dip
                # is not flagged as ground).
                raw_gap_mask = (
                    (gap_surface >= gap_height_thresh)
                    & (chm_block < gap_height_thresh)
                ).astype(np.uint8)

                # Filter connected components by minimum area.
                labeled, num_features = label(raw_gap_mask)
                if num_features > 0:
                    sizes = ndimage_sum(raw_gap_mask, labeled, range(1, num_features + 1))
                    keep_labels = np.where(sizes >= area_thresh_px)[0] + 1
                    final_mask = np.isin(labeled, keep_labels).astype(np.uint8)
                else:
                    final_mask = np.zeros_like(raw_gap_mask, dtype=np.uint8)

                # Trim the overlap padding before writing back to the un-padded window.
                # The expanded window starts `min(overlap_px, window.row_off)` rows
                # *above* the original — that prefix is the slice we discard.
                row_start = min(overlap_px, window.row_off)
                col_start = min(overlap_px, window.col_off)
                h = min(window.height, final_mask.shape[0] - row_start)
                w = min(window.width,  final_mask.shape[1] - col_start)

                dst.write(
                    final_mask[row_start:row_start + h, col_start:col_start + w],
                    1,
                    window=window,
                )

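The two coupled tests are what separate a true gap from a small dip in tall, closed canopy. gap_surface >= gap_height_thresh requires that the closing raised the surface by at least the threshold, which happens only where a depression is too narrow for the structuring element to fit inside it. chm_block < gap_height_thresh requires that the pixel itself is genuinely low. One consequence follows from the first test: an opening wider than the element's diameter is left largely unfilled by the closing, so its interior produces little or no top-hat response. Size struct_radius_m to the widest gap you need to detect.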
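
A small synthetic case makes the difference between the two tests concrete. This is a sketch under stated assumptions: the 21 x 21 array, the 25 m canopy, and the 7 x 7 square footprint (used instead of a disk for brevity) are all invented for illustration.

```python
import numpy as np
from scipy.ndimage import grey_closing

# Hypothetical 21 x 21 CHM: closed 25 m canopy with two 3 x 3 depressions.
chm = np.full((21, 21), 25.0)
chm[4:7, 4:7] = 22.0      # shallow dip inside tall canopy (still 22 m high)
chm[12:15, 12:15] = 0.5   # true gap reaching near ground level

gap_height_thresh = 2.0
footprint = np.ones((7, 7), dtype=bool)  # assumed flat square element

drop = grey_closing(chm, footprint=footprint) - chm

drop_only = drop >= gap_height_thresh                 # relative test alone
both_tests = drop_only & (chm < gap_height_thresh)    # relative AND absolute

print(int(drop_only.sum()))    # 18: the dip and the gap both pass
print(int(both_tests.sum()))   # 9: only the true gap survives
```

The 3 m dip passes the relative test because the closing lifts it back to 25 m, yet its pixels are still 22 m tall, so the absolute test rejects it.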

Parameter Reference

Parameter	Type	Default	Recommended range	Ecological rationale
`gap_height_thresh`	`float` (m)	`2.0`	2.0–3.5 closed canopy; 1.0–1.5 open woodland	Minimum vertical drop that defines a gap; doubles as the absolute ceiling below which a pixel is considered sub-canopy.
`min_gap_area_m2`	`float` (m²)	`15.0`	10–50 temperate/boreal	Removes micro-depressions and single-pixel noise that carry no ecological meaning; converted to pixels via native resolution.
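`struct_radius_m`	`float` (m)	`5.0`	3–8	Should approximate the dominant crown radius. The closing fills only depressions narrower than the element, so too large a radius merges adjacent gaps and too small a radius misses openings wider than the element's diameter.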
`overlap_px`	`int` (px)	`32`	`≥ 2 × struct_radius_m / res`	Pad width that lets every edge pixel see a full neighbourhood; prevents seams at tile boundaries.

Two hard constraints sit underneath this table. First, the CHM must be in a metric projected CRS (UTM, State Plane) — geographic degrees distort the structuring-element radius and invalidate every area calculation. Second, area_thresh_px is computed from the raster’s native resolution, so the same min_gap_area_m2 yields a different pixel count at 0.5 m versus 1.0 m; never hard-code a pixel area.
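
The resolution dependence can be worked through numerically. The helper below is hypothetical (its name and return order are not part of detect_canopy_gaps); it mirrors the conversions the function performs and adds the overlap lower bound from the table, assuming square pixels in a metric CRS.

```python
import math

def metric_to_pixel_params(res_m, struct_radius_m=5.0, min_gap_area_m2=15.0):
    """Convert metric parameters to pixel units for a square-pixel raster.

    Hypothetical helper; res_m is the pixel size in metres.
    """
    if res_m <= 0:
        raise ValueError("Pixel size must be positive.")
    radius_px = max(1, int(round(struct_radius_m / res_m)))
    area_thresh_px = min_gap_area_m2 / (res_m ** 2)
    min_overlap_px = math.ceil(2 * struct_radius_m / res_m)
    return radius_px, area_thresh_px, min_overlap_px

for res in (0.5, 1.0):
    print(res, metric_to_pixel_params(res))
```

With the defaults, a 0.5 m raster gives a 10 px radius, a 60 px area threshold, and a 20 px minimum overlap; a 1.0 m raster gives 5 px, 15 px, and 10 px. In the real function, a guard such as checking src.crs.is_projected before the conversion would be one way to catch geographic input early.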

Expected Output and Verification

detect_canopy_gaps writes a single-band uint8 GeoTIFF, 1 for gap pixels and 0 (nodata) elsewhere, perfectly co-registered with the input CHM. Before trusting it downstream, assert three things: the grid aligns, the result is genuinely binary, and the total gap fraction is ecologically plausible (closed temperate canopy is rarely more than ~20–30 % gap).

import numpy as np
import rasterio

with rasterio.open("chm.tif") as chm, rasterio.open("gaps.tif") as gaps:
    # 1. Grid alignment — gaps must inherit the CHM transform and shape.
    assert chm.transform == gaps.transform, "Output is not co-registered with the CHM."
    assert chm.shape == gaps.shape, "Output grid differs from input grid."

    mask = gaps.read(1)
    # 2. Strictly binary.
    assert set(np.unique(mask)).issubset({0, 1}), "Mask is not binary."

    # 3. Sanity-check the gap fraction.
    gap_fraction = mask.mean()
    print(f"Gap fraction: {gap_fraction:.1%}")
    assert gap_fraction < 0.40, "Gap fraction implausibly high — re-check struct_radius_m."

For a visual check, overlay the mask on the CHM in QGIS or with rasterio.plot.show: true gaps should be compact, sit inside taller canopy, and follow real openings (skid trails, blowdowns, treefall gaps) rather than tracking the slope aspect — the latter signals leftover terrain bias from the DTM.
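
A per-gap area summary complements the gap-fraction check. The sketch below uses a hypothetical helper, gap_areas_m2, and a synthetic mask standing in for gaps.read(1); the 0.5 m resolution is an assumption for the example.

```python
import numpy as np
from scipy.ndimage import label

def gap_areas_m2(mask, res_m):
    """Return the area of each 4-connected gap in square metres (hypothetical helper)."""
    labeled, num_features = label(mask)
    if num_features == 0:
        return np.array([], dtype=float)
    # bincount index 0 is the background, so drop it.
    pixel_counts = np.bincount(labeled.ravel())[1:]
    return pixel_counts * res_m ** 2

# Synthetic stand-in for gaps.read(1) at an assumed 0.5 m resolution.
mask = np.zeros((10, 10), dtype=np.uint8)
mask[1:4, 1:4] = 1   # 9 px
mask[6:8, 3:8] = 1   # 10 px

areas = gap_areas_m2(mask, res_m=0.5)
print(areas)
```

Because detect_canopy_gaps filters components inside each padded window, a whole-raster pass like this can in principle reveal fragments smaller than min_gap_area_m2 where a gap extends beyond the overlap at a block boundary.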

Common Pitfalls

Geographic CRS input. Running on WGS84 degrees makes struct_radius_m / res meaningless and area filtering silently wrong. Reproject to a metric CRS during Canopy Height Model Creation, not here.
NaN propagation through grey_closing. If nodata reaches the morphology as NaN, the maximum and minimum comparisons inside the closing become unreliable in every neighbourhood that contains it. The chm_block[~valid_mask] = 0 line must run before grey_closing, exactly as written.
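Insufficient tile overlap. Closing chains a dilation and an erosion, each reaching one element radius, so every output pixel depends on input up to two radii away. An overlap_px smaller than that leaves a seam of false gaps along block edges. Keep overlap_px ≥ 2 × struct_radius_m / res.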
Tall understory read as canopy. In multi-strata stands, regeneration above gap_height_thresh hides real gaps. Raise the threshold to ≥3 m or mask known shrub zones with a separate land-cover raster before detection.
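
The tile-overlap pitfall can be reproduced on a toy transect. This is a minimal sketch, assuming a 1-D CHM, a 7 px flat element, and a block boundary at index 32, none of which come from the workflow. It compares the top-hat on the whole array with the same operation on a tile read without overlap and on a tile read with a halo of twice the element radius.

```python
import numpy as np
from scipy.ndimage import grey_closing

SIZE = 7                # assumed flat element width in px (radius 3)
HALO = SIZE - 1         # 2 x radius: reach of dilation plus erosion
TILE_END = 32           # hypothetical block boundary

# 1-D CHM: 20 m canopy with an 8 px opening that straddles the boundary.
chm = np.full(64, 20.0)
chm[29:37] = 0.0

def top_hat(a):
    return grey_closing(a, size=SIZE) - a

reference = top_hat(chm)[:TILE_END]                    # whole-array result
unpadded = top_hat(chm[:TILE_END])                     # tile read with no overlap
padded = top_hat(chm[:TILE_END + HALO])[:TILE_END]     # halo read, then trimmed

print(np.array_equal(padded, reference))    # halo result matches the reference
print(unpadded[29:32], reference[29:32])    # no-overlap tile invents a drop at the seam
```

Without the halo, the boundary handling mirrors the truncated opening back on itself, the closing treats it as a narrow break, and the tile reports a 20 m drop that the whole-array result does not contain.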

Forest Gap & Understory Analysis — the parent workflow this detector plugs into (fragmentation metrics, PAR estimation).
Canopy Height Model Creation — produce the normalized CHM this method consumes.
Generating high-res DTM from ALS data — the terrain surface whose quality determines whether gaps align with field observation.
Normalizing LiDAR point clouds with PDAL — height-above-ground normalization that precedes any CHM.
