Digital Terrain Model Generation for Forestry and Ecological Workflows

A Digital Terrain Model (DTM) is the bare-earth surface that every vertical forest metric is measured against, so when it is wrong, every downstream number (canopy height, biomass, flow accumulation) inherits the error. The concrete problem this page solves: you have a classified or unclassified LiDAR tile over steep, densely vegetated terrain, and you need a 1 m bare-earth raster whose elevations hold up against RTK survey checks and agency specifications. Unlike a Digital Surface Model (DSM), which captures the uppermost canopy, rooftops, and power lines, a DTM isolates the true ground so that topographic gradients, hydrological flow paths, and microhabitat variability can be quantified cleanly. This work sits inside the broader Canopy Height Modeling & Terrain Extraction framework, and its output is the geometric baseline that the rest of that pipeline subtracts against.

Prerequisites

Confirm each of these before running the pipeline — a single mismatch here silently corrupts every downstream raster:

PDAL ≥ 2.6 with the filters.smrf, filters.csf, and writers.gdal stages available (pdal --version).
Python ≥ 3.10 with rasterio ≥ 1.3, numpy ≥ 1.24, and optionally richdem or whitebox for derivatives.
Input tiles as LAS/LAZ with a populated, correct SRS in the header (verify with pdal info --metadata tile.laz).
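A known projected CRS with linear units, ideally metres, such as a UTM zone or a State Plane zone. Many State Plane definitions use US survey feet, so confirm the unit before choosing a resolution. Geographic (degree) CRSs produce meaningless slope and resolution values.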
Vertical datum noted (e.g. NAVD88 via a geoid model) so the DTM is orthometric, not ellipsoidal, when paired with field elevations.
Ground-return density target of at least 1 return/m² across the area of interest; flag anything sparser for manual review.
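
The density target in the last item can be checked with a few lines once the ground-return count and the tile footprint are known. The sketch below is illustrative only: `ground_density` and `needs_review` are hypothetical helper names, and the default of 1 return/m² simply mirrors the target stated above.

```python
def ground_density(ground_count: int, width_m: float, height_m: float) -> float:
    """Average ground returns per square metre over a rectangular tile footprint."""
    area = width_m * height_m
    if area <= 0:
        raise ValueError("tile footprint must have positive area")
    return ground_count / area


def needs_review(ground_count: int, width_m: float, height_m: float,
                 min_density: float = 1.0) -> bool:
    """True when the tile-wide average falls below the density target."""
    return ground_density(ground_count, width_m, height_m) < min_density
```

A tile-wide average can hide locally sparse pockets under dense canopy, so a tile that passes this check may still need a closer look at its void pattern after rasterization.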

If the CRS or datum is unverified, resolve it first via Coordinate Reference Systems for Forestry — reprojecting a DTM after the fact reintroduces interpolation error that classification can no longer fix.

Concept: Ground Filtering as a Surface Estimation Problem

Generating a DTM is fundamentally a surface-estimation problem: from a point set $P = \{(x_{i}, y_{i}, z_{i})\}$ contaminated by vegetation and noise returns, recover the subset that lies on the ground and interpolate it onto a regular grid. Two filter families dominate forested work.

The Cloth Simulation Filter (CSF) inverts the point cloud and drapes a simulated cloth of rigidity $r$ over it under gravity; a point is classified as ground when its distance to the settled cloth falls below a threshold $h_{t}$:

$$\text{ground}(p_{i}) = \begin{cases} 1 & \text{if } |z_{i} - z_{cloth}(x_{i}, y_{i})| \leq h_{t} \\ 0 & \text{otherwise} \end{cases}$$
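
As a rough illustration of the decision rule only (the cloth simulation itself is omitted), the sketch below applies the threshold to points whose settled cloth heights are assumed to be known already. The function name and the sample values are made up for the example.

```python
def classify_against_cloth(z, z_cloth, h_t=0.5):
    """Return 1 (ground) or 0 (non-ground) for each point.

    z       : point elevations
    z_cloth : settled cloth elevation at each point's (x, y); assumed given
    h_t     : distance threshold, in the same units as z
    """
    if len(z) != len(z_cloth):
        raise ValueError("z and z_cloth must have the same length")
    return [1 if abs(zi - ci) <= h_t else 0 for zi, ci in zip(z, z_cloth)]


# Invented sample: the third point sits about 8 m above the cloth (canopy).
labels = classify_against_cloth(
    z=[101.2, 101.4, 109.8, 102.0],
    z_cloth=[101.1, 101.3, 101.5, 101.6],
    h_t=0.5,
)
```

Only the third point is rejected here; the other three lie within 0.5 units of the cloth and are kept as ground.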

The Simple Morphological Filter (SMRF) instead opens the surface with a progressively widening window and rejects points whose elevation exceeds a slope-dependent tolerance. Across a window of size $w$ on terrain of slope $s$, the maximum permitted elevation difference for a ground point is approximately:

$$\Delta z_{max} = s \cdot w + ε$$

where $ε$ absorbs sensor noise. The practical consequence: CSF excels on gently rolling broadleaf terrain where the cloth can settle smoothly, while SMRF’s explicit slope term makes it the safer choice on karst, terraces, and abrupt breaks. That trade-off is explored in depth in the filter comparison under LiDAR Point Cloud Preprocessing, which also owns the upstream noise-removal stage this pipeline assumes is already complete.
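
To make the tolerance concrete, the sketch below evaluates the approximate relation for a few window sizes. It is a worked illustration of the formula, not PDAL's implementation; the function name and the 0.05 m noise allowance are assumptions chosen for the example.

```python
def max_elevation_difference(slope: float, window: float, eps: float = 0.05) -> float:
    """Approximate tolerance: slope (rise over run) times window size, plus a noise allowance."""
    return slope * window + eps


for w in (2.0, 6.0, 18.0):
    tol = max_elevation_difference(0.15, w)
    print(f"window={w:>4} -> tolerance={tol:.2f}")
```

With a slope of 0.15, the tolerance grows from about 0.35 at a window of 2 to about 2.75 at a window of 18, which shows how a wide window on steep ground admits large elevation steps as terrain.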

Step-by-Step Python Pipeline

The end-to-end flow is: preprocess → classify ground → rasterize → validate. The first stage is documented separately; this page picks up from a cleaned tile and produces a validated bare-earth GeoTIFF.

Step 1 — Classify ground returns

Run a ground filter to tag terrain points as ASPRS Class 2. Driving PDAL from Python keeps the parameters version-controlled alongside the rest of the workflow.

import json
import pdal

def classify_ground(in_laz: str, out_laz: str, slope: float = 0.15, window: float = 18.0) -> int:
    """Classify ground returns with SMRF and write a tagged LAZ. Returns ground-point count."""
    pipeline = {
        "pipeline": [
            in_laz,
            {
                "type": "filters.smrf",
                "slope": slope,
                "window": window,
                "scalar": 1.2,
                "threshold": 0.45,
            },
            {
                "type": "writers.las",
                "filename": out_laz,
                "forward": "all",
                "compression": "laszip",
            },
        ]
    }
    p = pdal.Pipeline(json.dumps(pipeline))
    p.execute()
    arr = p.arrays[0]
    return int((arr["Classification"] == 2).sum())

In karst or terraced stands, slope=0.15 with window=18.0 tracks abrupt elevation changes more reliably than a tension-based cloth. For dense broadleaf canopy where the cloth tends to sag into gaps, swap in filters.csf and reduce its threshold to 0.25–0.35.
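
For the CSF alternative mentioned above, a pipeline definition might look like the following sketch. The helper name `csf_pipeline_json` is hypothetical. The `threshold`, `resolution`, and `rigidness` keys are filters.csf options, and the values other than the 0.25–0.35 threshold range are placeholder starting points to tune, not recommendations.

```python
import json


def csf_pipeline_json(in_laz: str, out_laz: str, threshold: float = 0.3,
                      cloth_resolution: float = 1.0, rigidness: int = 3) -> str:
    """Build a PDAL pipeline (as a JSON string) that classifies ground with CSF."""
    pipeline = {
        "pipeline": [
            in_laz,
            {
                "type": "filters.csf",
                "threshold": threshold,          # point-to-cloth distance for ground
                "resolution": cloth_resolution,  # cloth grid spacing, CRS units
                "rigidness": rigidness,          # assumed value; tune per site
            },
            {
                "type": "writers.las",
                "filename": out_laz,
                "forward": "all",
                "compression": "laszip",
            },
        ]
    }
    return json.dumps(pipeline, indent=2)


# The string can be handed to pdal.Pipeline(...) exactly as in classify_ground.
spec = csf_pipeline_json("tile.laz", "tile_csf.laz")
```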

Step 2 — Isolate ground and rasterize to bare earth

With ground returns tagged, keep only Class 2 and aggregate them into a grid. output_type="min" selects the lowest return in each cell, which avoids the upward bias that mean or max aggregation introduces from residual low vegetation.

import pdal
import json

def rasterize_dtm(classified_laz: str, out_tif: str, resolution: float = 1.0) -> None:
    """Write a bare-earth GeoTIFF from classified ground returns."""
    pipeline = {
        "pipeline": [
            classified_laz,
            {"type": "filters.range", "limits": "Classification[2:2]"},
            {
                "type": "writers.gdal",
                "filename": out_tif,
                "resolution": resolution,
                "output_type": "min",
                "data_type": "float32",
                "gdalopts": "COMPRESS=DEFLATE,PREDICTOR=2,TILED=YES",
            },
        ]
    }
    pdal.Pipeline(json.dumps(pipeline)).execute()
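
The bias argument for "min" is easy to see with numbers. In this toy cell (values invented for illustration), three ground returns sit near 250 m and one misclassified shrub return sits roughly 0.7 m higher.

```python
from statistics import mean

# Elevations of returns tagged Class 2 inside one grid cell (invented values).
cell_returns = [250.02, 249.98, 250.05, 250.70]  # last value: low vegetation tagged as ground

aggregates = {
    "min": min(cell_returns),
    "mean": mean(cell_returns),
    "max": max(cell_returns),
}

for name, value in aggregates.items():
    print(f"{name:>4}: {value:.3f} m")
```

The mean is pulled upward by the single vegetation return and the max adopts it outright, while the min stays on the ground cluster. The flip side is that min is sensitive to low outliers, which is one reason this pipeline assumes noise removal has already run.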

Step 3 — Fill small voids without flattening real terrain

Sparse-return pockets under dense canopy leave NoData holes. Fill only narrow gaps so genuine topography survives; rasterio.fill.fillnodata interpolates across a bounded search distance.

import numpy as np
import rasterio
from rasterio.fill import fillnodata

def fill_small_voids(in_tif: str, out_tif: str, max_search: int = 6) -> None:
    """Interpolate NoData cells that lie within max_search pixels of valid data.

    Narrow gaps close completely; wide voids are filled only along their rim,
    so their interiors stay NoData for validation to flag.
    """
    with rasterio.open(in_tif) as src:
        band = src.read(1, masked=True)
        profile = src.profile
        nodata = src.nodata
    # fillnodata interpolates where mask == 0, so valid cells must be non-zero.
    mask = (~np.ma.getmaskarray(band)).astype(np.uint8)
    filled = fillnodata(band.filled(nodata), mask=mask,
                        max_search_distance=max_search, smoothing_iterations=0)
    with rasterio.open(out_tif, "w", **profile) as dst:
        dst.write(filled, 1)

Keep max_search small (4–8 px at 1 m). A larger radius will hallucinate ground across genuine gaps such as wide riparian corridors, which is exactly the artifact validation is meant to catch.
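
The difference between a narrow gap and a wide void can be shown on a single raster row. The sketch below is a simplified one-dimensional stand-in with a strict gap-width rule, not how rasterio.fill.fillnodata works internally; the function name and sample row are invented for illustration.

```python
def fill_narrow_gaps(row, max_gap):
    """Linearly interpolate runs of None no longer than max_gap; leave longer runs as None.

    Runs touching either end of the row are left alone because they lack
    a valid neighbour on one side.
    """
    out = list(row)
    i, n = 0, len(out)
    while i < n:
        if out[i] is not None:
            i += 1
            continue
        start = i
        while i < n and out[i] is None:
            i += 1
        end = i  # index of the first valid cell after the run, or n
        gap = end - start
        if start > 0 and end < n and gap <= max_gap:
            left, right = out[start - 1], out[end]
            for k in range(start, end):
                t = (k - start + 1) / (gap + 1)
                out[k] = left + t * (right - left)
    return out


# A 2-cell gap (filled) and a 5-cell gap (kept as a void) with max_gap=3.
row = [10.0, None, None, 13.0, None, None, None, None, None, 19.0]
filled = fill_narrow_gaps(row, max_gap=3)
```

The two-cell gap is bridged from its neighbours, while the five-cell gap stays empty, which is the behaviour you want across features such as wide riparian corridors.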

PDAL Configuration Reference

When the workflow runs from a job scheduler rather than Python, the same logic lives in a declarative JSON pipeline. Storing it in version control guarantees reproducibility across multi-temporal campaigns. The annotated stages:

{
  "pipeline": [
    "classified_tile.laz",
    {
      "type": "filters.range",
      "limits": "Classification[2:2]"
    },
    {
      "type": "writers.gdal",
      "filename": "dtm_1m.tif",
      "resolution": 1.0,
      "output_type": "min",
      "data_type": "float32",
      "gdalopts": "COMPRESS=DEFLATE,PREDICTOR=2,TILED=YES"
    }
  ]
}

filters.range with Classification[2:2] keeps only ASPRS ground returns; everything else is dropped before aggregation.
resolution is the grid cell size in CRS units. Keep it at or above the inverse square root of your ground-return density (1 m for 1 return/m², 2 m for 0.25 returns/m²) so most cells contain at least one return.
output_type of "min" is correct for bare earth; "idw" produces a smoother surface but can bridge real voids, so reserve it for hydrological grids that need full coverage.
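gdalopts enables lossless DEFLATE compression with internal tiling. PREDICTOR=2 (horizontal differencing) shrinks smooth elevation rasters at no accuracy cost; for float32 data GDAL also offers PREDICTOR=3, the floating-point predictor, which is worth comparing on a sample tile.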
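
Because density is counted per unit area, the cell edge that averages one ground return per cell is the inverse square root of the density. A minimal sketch of that arithmetic follows; `min_cell_size` is a hypothetical helper name.

```python
import math


def min_cell_size(ground_density: float) -> float:
    """Smallest cell edge (in CRS units) that averages one ground return per cell.

    ground_density is in returns per square CRS unit, e.g. returns/m².
    """
    if ground_density <= 0:
        raise ValueError("density must be positive")
    return 1.0 / math.sqrt(ground_density)


for density in (0.25, 1.0, 4.0):
    print(f"{density} returns/m2 -> cell >= {min_cell_size(density):.2f} m")
```

This assumes returns are spread evenly. Under canopy they tend to cluster in gaps, so rounding the cell size up leaves fewer voids than the average suggests.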

The canonical reference for these stages is the PDAL Documentation, which details every filter and writer option used here.

Validation & Verification

Spatial validation is non-negotiable in conservation and research contexts. A production DTM must be assessed against independent ground control points or high-precision RTK survey data before it feeds any model. The headline metric is Root Mean Square Error against $n$ check points:

$$\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (z_{DTM, i} - z_{survey, i})^{2}}$$

The script below samples the DTM at each survey coordinate and reports RMSE, mean error (bias), and mean absolute error.

import numpy as np
import rasterio

def validate_dtm(dtm_tif: str, control_xyz: np.ndarray) -> dict:
    """control_xyz: (n,3) array of survey x, y, z in the DTM's CRS."""
    with rasterio.open(dtm_tif) as src:
        sampled = np.array([v[0] for v in src.sample(control_xyz[:, :2])], dtype="float64")
        nodata = src.nodata
    # Drop check points that land on voids: NaN cells or the declared NoData value.
    valid = np.isfinite(sampled)
    if nodata is not None and not np.isnan(nodata):
        valid &= sampled != nodata
    if not valid.any():
        raise ValueError("no control points fall on valid DTM cells")
    resid = sampled[valid] - control_xyz[valid, 2]
    return {
        "n": int(valid.sum()),
        "rmse": float(np.sqrt(np.mean(resid ** 2))),
        "mean_error": float(np.mean(resid)),   # signed → vertical bias
        "mae": float(np.mean(np.abs(resid))),
    }

A near-zero mean_error with a small rmse is the target; a persistently positive bias usually signals residual vegetation leaking into ground returns, while error that climbs with slope points to sparse returns under canopy. Log these metrics alongside the pipeline JSON so every product carries an auditable provenance chain that satisfies agency standards such as the USGS LiDAR Base Specification.
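
One way to read these numbers together: squared RMSE splits into squared bias plus the variance of the residuals. An RMSE close to the absolute mean error therefore points to a systematic offset, while a large RMSE with near-zero mean error points to scatter. The sketch below checks that identity on invented residuals; `decompose_rmse` is a hypothetical helper name.

```python
import math
from statistics import fmean, pstdev


def decompose_rmse(residuals):
    """Return (rmse, bias, spread), where rmse**2 == bias**2 + spread**2.

    spread is the population standard deviation of the residuals.
    """
    values = list(residuals)
    if not values:
        raise ValueError("need at least one residual")
    bias = fmean(values)
    spread = pstdev(values)
    rmse = math.sqrt(fmean([r * r for r in values]))
    return rmse, bias, spread


# Invented residuals (DTM minus survey, metres): consistently positive.
rmse, bias, spread = decompose_rmse([0.30, 0.25, 0.35, 0.28, 0.32])
print(f"rmse={rmse:.3f}  bias={bias:.3f}  spread={spread:.3f}")
```

In this made-up case nearly all of the RMSE comes from the bias term, the signature the paragraph above associates with vegetation leaking into the ground class.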

Failure Modes & Gotchas

CRS mismatch between tiles and control points. src.sample returns garbage if the survey points are in a different CRS than the DTM — reproject the control set first and never assume the LAS header was correct.
Geographic CRS leaking through. A resolution of 1.0 in degrees is roughly 111 km per cell side in latitude. Always rasterize in a projected, metre-based CRS.
NoData treated as a real elevation. If nodata is unset or wrong, void cells (often -9999 or 0) get averaged into slope and flow calculations, producing cliff artifacts. Set and propagate NoData explicitly.
Over-aggressive void filling. A large max_search_distance invents ground across wide gaps; keep it small and let validation flag what is left.
output_type="max" on bare earth. Selecting the highest return per cell biases the surface upward by the height of any misclassified low vegetation — use min for DTMs.
Tile-edge seams. Classifying tiles in isolation produces discontinuities at boundaries; buffer each tile with a halo of neighbouring points before classification and crop back afterward.
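
For the tile-edge item, the buffer-then-crop bookkeeping reduces to two sets of bounds per tile. The sketch below computes both and formats them in PDAL's 2-D bounds syntax, which filters.crop accepts through its bounds option. The helper names and the sample coordinates are illustrative assumptions.

```python
def buffered_bounds(xmin, ymin, xmax, ymax, halo=30.0):
    """Expand a tile's nominal extent by halo on every side (CRS units)."""
    if halo < 0:
        raise ValueError("halo must be non-negative")
    return (xmin - halo, ymin - halo, xmax + halo, ymax + halo)


def pdal_bounds(xmin, ymin, xmax, ymax) -> str:
    """Format an extent as a PDAL 2-D bounds string: ([xmin, xmax], [ymin, ymax])."""
    return f"([{xmin}, {xmax}], [{ymin}, {ymax}])"


# Invented 1 km tile in a metre-based CRS.
nominal = (500000.0, 4100000.0, 501000.0, 4101000.0)
read_extent = buffered_bounds(*nominal, halo=30.0)  # classify on this extent
crop_extent = pdal_bounds(*nominal)                 # crop back to this afterwards
```

Classification runs on `read_extent` so the filter sees neighbouring points, and the crop stage then uses `crop_extent` so adjacent tiles do not overlap in the mosaic.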

Performance & Scale Notes

Regional inventories span thousands of tiles, so the pipeline must run tile-by-tile rather than loading a whole survey into memory. Process standard 1 km² tiles in parallel with a buffered halo (typically 30–50 m) so ground filters see context across seams, then crop each result to its nominal extent before mosaicking with gdalbuildvrt followed by gdal_translate. PDAL’s filters.splitter or filters.chipper subdivides oversized tiles in memory, and a process pool over independent tiles scales nearly linearly because each tile is embarrassingly parallel.

import concurrent.futures
from functools import partial
from pathlib import Path

def _build_one(tile: str, out_dir: str) -> str:
    """Classify and rasterize one tile. Module-level so the process pool can pickle it."""
    out = Path(out_dir)
    stem = Path(tile).stem
    classified = str(out / f"{stem}_g.laz")
    dtm = str(out / f"{stem}_dtm.tif")
    classify_ground(tile, classified)
    rasterize_dtm(classified, dtm)
    return dtm

def build_dtms(tiles: list[str], out_dir: str, workers: int = 6) -> list[str]:
    """Classify + rasterize many tiles in parallel; returns output paths."""
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    # Call from under `if __name__ == "__main__":` on platforms that spawn workers.
    with concurrent.futures.ProcessPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(partial(_build_one, out_dir=out_dir), tiles))

Keep workers at or below physical core count — ground classification is CPU-bound, and oversubscription thrashes memory once several large tiles decompress at once.
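
A small guard can keep a requested worker count from exceeding what the machine offers. This is a sketch under one notable assumption: os.cpu_count() reports logical CPUs, which can be double the physical core count on hyper-threaded hardware, so treat the result as an upper bound. The helper name and the reserve of one CPU are hypothetical choices.

```python
import os


def safe_worker_count(requested: int, reserve: int = 1) -> int:
    """Cap requested workers at the logical CPU count minus a reserve, never below 1."""
    available = os.cpu_count() or 1  # logical CPUs; may exceed physical cores
    return max(1, min(requested, available - reserve))


workers = safe_worker_count(6)
```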

Downstream Ecological Integration

The validated DTM becomes the geometric baseline for the rest of the analysis. Subtracting the DTM from the matching DSM yields the normalized height surface that drives Canopy Height Model Creation, and the same terrain surface feeds hydrological routing — richdem and whitebox derive flow direction, flow accumulation, and catchment boundaries directly from it. Topographic derivatives such as slope, aspect, Terrain Ruggedness Index, and Topographic Wetness Index then inform habitat suitability, soil-moisture, and erosion models across heterogeneous landscapes. For the full algorithm-selection treatment of the rasterization step, see Generating high-res DTM from ALS data.
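
The subtraction itself is one line, but NoData handling is where it usually goes wrong. The sketch below assumes both grids are already aligned arrays of identical shape and share a NoData value of -9999; the function name and the sample elevations are invented for illustration.

```python
import numpy as np


def canopy_height(dsm, dtm, nodata=-9999.0):
    """Return DSM minus DTM, with NaN wherever either input is NoData."""
    dsm = np.asarray(dsm, dtype="float64")
    dtm = np.asarray(dtm, dtype="float64")
    if dsm.shape != dtm.shape:
        raise ValueError("DSM and DTM must share the same grid shape")
    valid = (dsm != nodata) & (dtm != nodata)
    chm = np.full(dsm.shape, np.nan)
    chm[valid] = dsm[valid] - dtm[valid]
    return chm


# Invented 2x2 grids; one void in each input.
dsm = [[312.4, 305.0], [-9999.0, 298.7]]
dtm = [[290.1, 304.8], [291.0, -9999.0]]
chm = canopy_height(dsm, dtm)
```

Masking before subtracting keeps a void in either surface from turning into a canopy height of several thousand metres in the output.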

Frequently Asked Questions

Should I use CSF or SMRF for forested terrain?

Use CSF on gently rolling broadleaf or mixed stands where a draped cloth settles cleanly, and SMRF where slope is steep or breaks are abrupt (karst, terraces, road cuts) because its explicit slope tolerance handles sharp transitions that make a cloth sag. When in doubt, run both on a representative tile and compare RMSE against control points.

What resolution should my DTM be?

Match the cell size to ground-return density: with roughly one ground return per square metre, a 1 m grid leaves most cells populated. Going finer than your return density produces a raster full of voids that fill-interpolation then has to invent. For wide-area hydrology, a 1–2 m DTM is usually sufficient and far cheaper to store and process.

Why is my DTM systematically higher than the field survey?

A positive mean error almost always means low vegetation or noise is leaking into the ground class. Lower the classification threshold (the threshold option in filters.csf or filters.smrf), confirm outlier removal ran during preprocessing, and re-check the validation bias. Selecting output_type="max" instead of "min" produces the same upward shift.

How do I keep the DTM aligned with my other rasters?

Rasterize every layer to the same CRS, origin, and resolution. When scripting, snap the raster bounds to whole multiples of the cell size before deriving the transform with rasterio.transform.from_bounds, so that pixel edges from every layer land on one shared grid. This eliminates the sub-pixel shifts that compound during raster algebra against vegetation indices and canopy models.
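
A minimal sketch of the snapping step, using only the standard library: the bounds are expanded outward to multiples of the cell size, and the resulting width and height are what rasterio.transform.from_bounds(west, south, east, north, width, height) expects. The helper name and the sample extent are assumptions for illustration.

```python
import math


def snap_bounds(xmin, ymin, xmax, ymax, res):
    """Expand bounds outward to whole multiples of res; return (bounds, width, height)."""
    if res <= 0:
        raise ValueError("resolution must be positive")
    west = math.floor(xmin / res) * res
    south = math.floor(ymin / res) * res
    east = math.ceil(xmax / res) * res
    north = math.ceil(ymax / res) * res
    width = round((east - west) / res)
    height = round((north - south) / res)
    return (west, south, east, north), width, height


# Invented extent with ragged edges, snapped to a 1 m grid.
bounds, width, height = snap_bounds(500000.3, 4100000.7, 500999.6, 4100999.9, res=1.0)
```

Any two rasters snapped this way with the same resolution share pixel edges, so their overlap can be differenced cell for cell without resampling.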

Can I generate a DTM without a separate ground-classification step?

Only if the tiles already carry valid ASPRS Class 2 labels from the vendor. Even then, validate the classification before trusting it — vendor ground tags are often tuned for general mapping, not for the dense-understory conditions that ecological work demands, so re-running a filter with forest-tuned parameters frequently lowers RMSE.

LiDAR Point Cloud Preprocessing — the upstream noise-removal and ingestion stage this pipeline assumes.
Generating high-res DTM from ALS data — interpolation algorithm selection in depth.
Canopy Height Model Creation — subtracts this DTM from the DSM to derive vegetation height.
Forest Gap & Understory Analysis — uses the normalized surface built on this terrain baseline.
Coordinate Reference Systems for Forestry — getting CRS and datum right before you rasterize.
