Normalizing LiDAR Point Clouds with PDAL: Height-Above-Ground Workflows

Normalizing LiDAR point clouds with PDAL converts absolute elevations referenced to a geodetic datum into heights above ground level (HAGL). This transformation is non-negotiable for accurate canopy work: without it, a point at 350 m ASL inside a 30-metre stand is indistinguishable from a bare-ground point at 350 m ASL in flat terrain. This page is one focused recipe inside LiDAR Point Cloud Preprocessing, which itself sits within the wider Canopy Height Modeling & Terrain Extraction workflow. It names the exact PDAL filters — filters.csf, filters.hag_nn, and filters.hag_delaunay — and the parameters that decide whether your normalized cloud carries clean height-above-ground or systematic artifacts. Executed through PDAL’s declarative pipeline architecture, normalization becomes reproducible, memory-efficient, and fully scriptable within Python-based ecological workflows.

Normalization subtracts the classified ground surface from every return, so a 30 m tree reads 30 m HAGL whether it sits on flat ground or a steep ravine.

When to use this approach

Height-above-ground normalization is the right move whenever a downstream metric must be slope-invariant — canopy height percentiles, gap fraction, vertical complexity, and aboveground biomass all assume each return’s value is measured from the local ground, not from sea level. PDAL is the right tool when you need this to be deterministic, scriptable, and able to chain straight into rasterization. Two normalization filters ship with PDAL, and the choice between them is dictated by ground-return density and tile size.

Scenario	Recommended filter	Why	Avoid
Dense ground returns (4+ pts/m²), large tiles	`filters.hag_nn` with `count=1`	Nearest-ground interpolation is fast and memory-light; dense ground makes single-neighbour lookup accurate	`hag_delaunay` (TIN build is needless overhead here)
Sparse ground (<2 pts/m²), smooth terrain	`filters.hag_delaunay`	A TIN interpolates a continuous surface across gaps, avoiding bullseye artifacts around isolated ground points	`hag_nn count=1` (produces stepped, bullseye heights)
Noisy understory, mixed conifer–broadleaf	`filters.hag_nn` with `count=3`–`5`	Averaging several nearest ground returns smooths residual classification noise	`hag_delaunay` on huge tiles (memory exhaustion)
You only need a gridded bare-earth surface, not per-point heights	Neither — rasterize instead	Feed classified ground into Digital Terrain Model Generation and grid it	Normalizing when no per-point HAGL is needed
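
The decision table above can be written down as a small helper. This is a minimal sketch, not a PDAL feature: the function name choose_hag_filter and its arguments are hypothetical, the 2 pts/m² cut-off comes from the table, and falling back to hag_nn with count=1 for densities between 2 and 4 pts/m² is an assumption because the table does not cover that band.

```python
def choose_hag_filter(ground_pts_per_m2: float, noisy_understory: bool = False) -> dict:
    """Return a PDAL stage dict that follows the decision table above.

    Hypothetical helper: thresholds are copied from the table, and the
    fallback for 2-4 pts/m2 is an assumption, not a documented rule.
    """
    if noisy_understory:
        # Several neighbours smooth residual classification noise (table: count 3-5).
        return {"type": "filters.hag_nn", "count": 3}
    if ground_pts_per_m2 < 2.0:
        # Sparse ground: a TIN interpolates across gaps between ground returns.
        return {"type": "filters.hag_delaunay"}
    # Dense ground (and the uncovered middle band): nearest-neighbour lookup.
    return {"type": "filters.hag_nn", "count": 1}


print(choose_hag_filter(6.0))
print(choose_hag_filter(1.2))
```

The returned dict can be dropped into the "pipeline" list in place of a hand-written normalization stage.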

If your goal is a continuous canopy surface rather than height-tagged points, the normalized cloud you build here is the direct input to Canopy Height Model Creation; normalize first, rasterize the height dimension second.

Spatial constraints and coordinate reference validation

Before initiating ground classification, validate spatial metadata. PDAL reads Vertical Reference System (VRS) metadata from LAS/LAZ Variable Length Records (VLRs), but inconsistent datum definitions or mixed vertical units (feet vs. metres) propagate silently through pipelines, introducing systematic height offsets. Audit the header with pdal info --summary input.laz and confirm that srs.horizontal and srs.vertical are populated and correct. If vertical units are in feet, use filters.assign to scale the Z dimension before classification:

{
  "type": "filters.assign",
  "value": "Z = Z * 0.3048"
}
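
The 0.3048 factor is the international foot. Some US datasets are delivered in US survey feet, which are defined as 1200/3937 m, about 2 parts per million longer. The sketch below builds the same filters.assign stage for either unit; the names FOOT_TO_METRE and z_to_metres_stage are hypothetical, and which unit a given file uses has to come from its metadata, not from this code.

```python
# Conversion factors to metres. Both values are definitions, not measurements.
FOOT_TO_METRE = {
    "international_foot": 0.3048,
    "us_survey_foot": 1200.0 / 3937.0,  # approximately 0.3048006096
}


def z_to_metres_stage(unit: str) -> dict:
    """Build a filters.assign stage that rescales Z from feet to metres.

    Hypothetical helper; `unit` must be one of the FOOT_TO_METRE keys.
    """
    factor = FOOT_TO_METRE[unit]
    return {"type": "filters.assign", "value": f"Z = Z * {factor!r}"}


print(z_to_metres_stage("international_foot"))
print(z_to_metres_stage("us_survey_foot"))
```

Note that rescaling Z this way changes the values only; the vertical unit recorded in the file's spatial reference is a separate piece of metadata.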

Horizontal and vertical datums must align with your ecological analysis grid. If the source data uses NAVD88 (orthometric) heights but your downstream GIS expects WGS84 ellipsoidal heights, apply filters.reprojection prior to ground filtering. A mismatched vertical datum shifts every elevation by the local geoid separation, so canopy models referenced to an external terrain surface can drift by 10–40 metres in mountainous terrain, invalidating all subsequent biomass calculations. Getting CRS and datum correct up front is exactly the discipline covered in Coordinate Reference Systems for Forestry.
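
The header audit can be automated once the output of pdal info --summary has been parsed as JSON. The sketch below is a hedged example: audit_srs is a hypothetical helper, and the layout it expects (a "summary" object holding an "srs" object with "horizontal", "vertical", and "units" entries) is an assumption about the summary output that should be checked against your PDAL version.

```python
def audit_srs(info: dict) -> list:
    """Return a list of problems found in a parsed `pdal info --summary` dict.

    Assumed layout: info["summary"]["srs"] with "horizontal", "vertical"
    and "units" = {"horizontal": ..., "vertical": ...}.
    """
    srs = info.get("summary", {}).get("srs", {})
    problems = []
    if not srs.get("horizontal"):
        problems.append("horizontal CRS missing")
    if not srs.get("vertical"):
        problems.append("vertical CRS missing")
    units = srs.get("units", {})
    for axis in ("horizontal", "vertical"):
        unit = (units.get(axis) or "").lower()
        # Only flag units that are present and clearly not metres.
        if unit and unit not in {"metre", "meter", "m"}:
            problems.append(f"{axis} units are '{unit}', not metres")
    return problems


example = {"summary": {"srs": {"horizontal": "PROJCS[...]", "vertical": "",
                               "units": {"horizontal": "metre", "vertical": "foot"}}}}
print(audit_srs(example))
```

An empty list means the header passed these basic checks; it does not prove the declared datum is the one the data was actually collected in.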

Minimal reproducible example

PDAL normalizes point clouds through a chained, declarative JSON pipeline. The production-ready sequence for forested environments follows three sequential operations: statistical outlier removal, ground point classification, and height-above-ground computation. filters.hag_nn (Height Above Ground — nearest neighbour) computes HAGL for every point by interpolating from the classified ground surface.

The JSON below encodes the full pipeline as five stages in order: the reader, the three filters described above, and the writer. Outlier removal must precede CSF so noise is never misclassified as ground.

{
  "pipeline": [
    {
      "type": "readers.las",
      "filename": "input_tile.laz"
    },
    {
      "type": "filters.outlier",
      "method": "statistical",
      "mean_k": 12,
      "multiplier": 2.2
    },
    {
      "type": "filters.csf",
      "ignore": "Classification[7:7]",
      "resolution": 1.0,
      "threshold": 0.5,
      "rigidness": 3,
      "iterations": 500,
      "step": 0.65
    },
    {
      "type": "filters.hag_nn",
      "count": 1
    },
    {
      "type": "writers.las",
      "filename": "normalized_output.laz",
      "extra_dims": "all",
      "forward": "all"
    }
  ]
}

filters.hag_nn adds a HeightAboveGround extra dimension to every point by querying the count nearest ground-classified neighbours (Class 2) and interpolating the surface beneath each point. Setting count=1 gives the nearest-ground interpolation; count=3 or higher averages more neighbours and smooths the surface but is slower. An alternative, filters.hag_delaunay, constructs a TIN from ground returns before interpolating — it is more accurate in areas with sparse ground density at the cost of higher memory use.
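
To make the effect of count concrete, here is a conceptual sketch of nearest-ground normalization in plain Python. It is not PDAL's implementation: hag_nearest is a hypothetical function, it uses a brute-force search where PDAL uses a spatial index, and the inverse-distance weighting used for count > 1 is an assumption about how neighbours are combined.

```python
import math


def hag_nearest(point, ground, count=1):
    """Height of `point` above the ground Z estimated from its nearest ground points.

    point  : (x, y, z) of the return to normalize
    ground : list of (x, y, z) ground-classified returns
    count  : number of nearest ground points to combine (inverse-distance weighted)
    """
    x, y, z = point
    # Rank ground returns by horizontal (2D) distance and keep the closest `count`.
    nearest = sorted(ground, key=lambda g: math.hypot(g[0] - x, g[1] - y))[:count]
    weighted = []
    for gx, gy, gz in nearest:
        d = math.hypot(gx - x, gy - y)
        if d == 0.0:
            # A ground return directly beneath the point defines the surface.
            return z - gz
        weighted.append((1.0 / d, gz))
    ground_z = sum(w * gz for w, gz in weighted) / sum(w for w, _ in weighted)
    return z - ground_z


ground = [(0.0, 0.0, 100.0), (10.0, 0.0, 110.0)]
canopy_return = (2.0, 0.0, 125.0)
print(hag_nearest(canopy_return, ground, count=1))  # 25.0
print(hag_nearest(canopy_return, ground, count=2))  # 23.0
```

With one neighbour the return inherits the elevation of the closest ground point (100 m); with two, the weighted ground estimate rises to 102 m because the slope toward the second point is taken into account.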

Ground classification and parameter tuning

Ground filtering in dense, multi-layered forests often fails when default parameters are applied. filters.smrf (Simple Morphological Filter) performs adequately on gentle, open slopes but struggles with steep ravines or dense shrub layers due to its fixed window morphology. For temperate and tropical canopies, filters.csf (Cloth Simulation Filter) is preferred due to its tunable cloth rigidity and slope tolerance. A full side-by-side of these two ground filters lives in Digital Terrain Model Generation; here the goal is simply a clean Class 2 surface to subtract.

In areas exceeding 30° gradient, reduce threshold to 0.3 to prevent low vegetation from being classified as ground. Setting rigidness to 3 prevents the simulated cloth from sagging into canopy voids, which artificially inflates ground elevation and compresses normalized tree heights. Misconfigured CSF parameters are the primary cause of negative HAGL values in understory returns, a common artifact that breaks downstream biomass allometric equations. For rigorous validation, consult the USGS 3D Elevation Program (3DEP) LiDAR Base Specification regarding ground classification accuracy thresholds.

Exporting an explicit DTM alongside the normalized cloud

For workflows requiring a DTM GeoTIFF as well as the normalized cloud, run writers.gdal on the checkpointed ground-classified cloud (ground_classified.laz below), isolating Class 2 with filters.range so only ground returns reach the grid:

{
  "pipeline": [
    "ground_classified.laz",
    {
      "type": "filters.range",
      "limits": "Classification[2:2]"
    },
    {
      "type": "writers.gdal",
      "filename": "dtm_1m.tif",
      "resolution": 1.0,
      "output_type": "min",
      "data_type": "float32",
      "gdalopts": "COMPRESS=DEFLATE,PREDICTOR=2"
    }
  ]
}

Parameter reference

CSF parameter names in PDAL differ slightly from the original CSF library; the table below lists the height-relevant arguments with dense-forest starting values and the ecological reason each matters.

Parameter	Stage	Type	Default	Recommended range	Ecological rationale
`mean_k`	`filters.outlier`	int	8	8–16	Neighbours used in the statistical distance test; raise it for dense surveys so isolated noise is caught before it can pose as ground
`multiplier`	`filters.outlier`	float	2.0	2.0–2.5	Standard-deviation cutoff; too low erases legitimate steep-slope ground returns
`resolution`	`filters.csf`	float	1.0	0.5–1.0 (raise to 1.5–2.0 if sparse)	Cloth grid spacing; coarsen in sparse ground to force broader aggregation and suppress bullseye artifacts
`threshold`	`filters.csf`	float	0.5	0.25–0.5 (0.3 on >30° slope)	Max cloth-to-point distance classified as ground; tighten so low vegetation is not pulled into Class 2
`rigidness`	`filters.csf`	int	3	3	Cloth stiffness (1 = flexible, 3 = rigid); 3 stops the cloth sagging into canopy voids and inflating ground height
`iterations`	`filters.csf`	int	500	500	Simulation steps; 500 converges on most forest tiles
`step`	`filters.csf`	float	0.65	0.65	Time-step per iteration of the cloth simulation
`count`	`filters.hag_nn`	int	1	1–5	Nearest ground neighbours averaged per point; raise to smooth residual classification noise, lower for speed

In low-density acquisitions, two mitigations keep the interpolated ground surface honest: increase resolution in filters.csf to 1.5–2.0 m to force broader ground-point aggregation, and always run filters.outlier before CSF so noise points are never misclassified as ground and dragged into the height surface.
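
The slope and density rules from the tuning section and the table above can be folded into one helper. This is a minimal sketch under stated assumptions: csf_stage and its two inputs are hypothetical, the 30° and 2 pts/m² cut-offs are the ones quoted on this page, and picking 1.5 m (the low end of the 1.5–2.0 m range) for sparse ground is an arbitrary choice.

```python
def csf_stage(max_slope_deg: float, ground_pts_per_m2: float) -> dict:
    """Build a filters.csf stage from terrain slope and ground-return density.

    Hypothetical helper; cut-offs follow the guidance on this page.
    """
    steep = max_slope_deg > 30.0
    sparse = ground_pts_per_m2 < 2.0
    return {
        "type": "filters.csf",
        # Skip points already flagged as noise (class 7) by filters.outlier.
        "ignore": "Classification[7:7]",
        # Coarser cloth on sparse ground to aggregate more returns per cell.
        "resolution": 1.5 if sparse else 1.0,
        # Tighter cloth-to-point distance on steep slopes.
        "threshold": 0.3 if steep else 0.5,
        "rigidness": 3,
        "iterations": 500,
        "step": 0.65,
    }


print(csf_stage(max_slope_deg=35.0, ground_pts_per_m2=1.2))
```

Treat the output as a starting point and confirm it against the negative-HAGL check in the verification section.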

Expected output and verification

A correct run emits a LAZ that carries a HeightAboveGround extra dimension on every point, with minimum values near 0 for bare-ground returns and maximum values bounded by plausible canopy height for the stand type. Validate before trusting the cloud downstream:

import json
import pdal

stats = {
    "pipeline": [
        "normalized_output.laz",
        {"type": "filters.stats", "dimensions": "HeightAboveGround"},
    ]
}

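pipeline = pdal.Pipeline(json.dumps(stats))
pipeline.execute()

# Older python-pdal releases return metadata as a JSON string, newer ones as a dict.
meta = pipeline.metadata
if isinstance(meta, str):
    meta = json.loads(meta)
hag = meta["metadata"]["filters.stats"]["statistic"][0]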

print(f"min HAGL: {hag['minimum']:.2f} m   max HAGL: {hag['maximum']:.2f} m")

# Bare ground should floor near zero; a strongly negative minimum means
# the cloth sagged or low noise was misclassified as ground.
assert hag["minimum"] > -1.0, "negative HAGL — tighten CSF threshold / run outlier first"
assert hag["maximum"] < 80.0, "implausible canopy height — check vertical datum"

Running pdal info --stats normalized_output.laz reports the same HeightAboveGround statistics from the command line. Cross-check the minimum against known flat terrain or GNSS-surveyed ground control points: a systematic offset there usually traces to a vertical-datum mismatch rather than to classification error.
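
The ground-control cross-check can be reduced to two numbers: the mean and the spread of HAGL sampled at surveyed ground points. A large mean with a small spread points to a datum offset, while a large spread points to classification problems. The sketch below assumes you have already extracted HAGL values at the control points; diagnose_ground_residuals and both 0.15 m tolerances are hypothetical and should be replaced by your survey's accuracy budget.

```python
import statistics


def diagnose_ground_residuals(hag_at_gcps, bias_tol=0.15, spread_tol=0.15):
    """Classify HAGL residuals at ground control points.

    hag_at_gcps : HAGL values (m) sampled where the true height is 0
    bias_tol    : hypothetical tolerance on the mean residual (m)
    spread_tol  : hypothetical tolerance on the standard deviation (m)
    Returns (mean, spread, verdict).
    """
    mean = statistics.fmean(hag_at_gcps)
    spread = statistics.pstdev(hag_at_gcps)
    if spread > spread_tol:
        verdict = "scattered residuals: suspect ground classification"
    elif abs(mean) > bias_tol:
        verdict = "constant offset: suspect vertical datum"
    else:
        verdict = "ok"
    return mean, spread, verdict


print(diagnose_ground_residuals([0.52, 0.49, 0.51, 0.48]))
```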

Common pitfalls

Negative HAGL in the understory. The cloth sagged into canopy gaps, or low vegetation was misclassified as Class 2. Set rigidness to 3 and reduce threshold to 0.3 on steep terrain so shrub returns stay out of the ground class.
Spiky canopy heights. Noise points were classified as ground because outliers were not removed before CSF ran. Always place filters.outlier (statistical) ahead of filters.csf so a single stray return cannot anchor the ground surface.
Missing HeightAboveGround dimension in the output. The writer did not write the extra dimension. Set extra_dims="all" in writers.las; forward="all" additionally carries the source header fields over but does not by itself preserve extra dimensions.
Memory exhaustion on large tiles. filters.hag_delaunay builds a TIN from every ground return at once. Switch to filters.hag_nn with count=1, or split the tile with filters.splitter before normalization.
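
For the memory pitfall, splitting can happen inside the same pipeline. The sketch below is one possible layout, assuming the input is already ground-classified: tiled_hag_pipeline and the 500 m default are hypothetical, filters.splitter is given only its length option, and the "#" placeholder in the output name asks writers.las to write one file per tile.

```python
def tiled_hag_pipeline(src: str, out_template: str, tile_length_m: float = 500.0) -> dict:
    """Build a pipeline that splits a classified cloud into square tiles
    and normalizes each tile separately.

    out_template must contain '#', which the writer replaces per tile.
    """
    if "#" not in out_template:
        raise ValueError("out_template needs a '#' placeholder, e.g. 'hag_#.laz'")
    return {
        "pipeline": [
            {"type": "readers.las", "filename": src},
            {"type": "filters.splitter", "length": tile_length_m},
            {"type": "filters.hag_nn", "count": 1},
            {"type": "writers.las", "filename": out_template, "extra_dims": "all"},
        ]
    }


print(tiled_hag_pipeline("ground_classified.laz", "hag_#.laz", 250.0))
```

One trade-off to keep in mind: points near a tile edge can only see ground returns inside their own tile, so heights along the seams deserve a visual check.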

Python integration and memory management

For ecological research pipelines, PDAL’s Python bindings enable batch processing and dynamic parameter injection. The following pattern demonstrates memory-efficient tile processing:

import pdal
import json
import pathlib

def normalize_tile(input_laz: pathlib.Path, output_laz: pathlib.Path) -> int:
    """
    Normalize a single LAZ tile: outlier removal -> CSF ground classification
    -> height-above-ground computation -> write with HeightAboveGround dimension.

    Returns the number of points processed.
    """
    pipeline_def = {
        "pipeline": [
            {"type": "readers.las", "filename": str(input_laz)},
            {
                "type": "filters.outlier",
                "method": "statistical",
                "mean_k": 12,
                "multiplier": 2.2,
            },
            {
                "type": "filters.csf",
                # Skip points that filters.outlier flagged as noise (class 7).
                "ignore": "Classification[7:7]",
                "resolution": 1.0,
                "threshold": 0.5,
                "rigidness": 3,
                "iterations": 500,
                "step": 0.65,
            },
            {"type": "filters.hag_nn", "count": 1},
            {
                "type": "writers.las",
                "filename": str(output_laz),
                "extra_dims": "all",
                "forward": "all",
            },
        ]
    }

    pipeline = pdal.Pipeline(json.dumps(pipeline_def))
    count = pipeline.execute()
    return count


# Batch across a directory of tiles
out_dir = pathlib.Path("normalized")
out_dir.mkdir(parents=True, exist_ok=True)  # make sure the output directory exists first
for tile in sorted(pathlib.Path("raw_tiles").glob("*.laz")):
    out = out_dir / tile.name
    n = normalize_tile(tile, out)
    print(f"{tile.name}: {n:,} points normalized -> {out}")

For multi-core environments, wrap normalize_tile in concurrent.futures.ProcessPoolExecutor to parallelize across tiles. Each worker process builds and executes its own pipeline, so no PDAL state is shared between tiles; budget memory for roughly one tile's points per worker. Detailed filter syntax is documented in the PDAL Filters Reference. Once tiles are normalized, the height-tagged cloud feeds straight into Forest Gap & Understory Analysis for gap detection and vertical complexity metrics.
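
A minimal sketch of that wrapper follows. normalize_many is a hypothetical name, and the worker is passed in as an argument (any function taking an input path and an output path, such as normalize_tile above) so that one failed tile is recorded without stopping the batch. The executor class is also a parameter, which is a design choice made here to allow a thread pool when testing.

```python
from concurrent.futures import ProcessPoolExecutor, as_completed
from pathlib import Path


def normalize_many(tiles, out_dir, worker, max_workers=4,
                   executor_cls=ProcessPoolExecutor):
    """Run worker(src_path, dst_path) for every tile in parallel.

    Returns (results, failures): dicts keyed by tile file name holding the
    worker's return value or the error message respectively.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    results, failures = {}, {}
    with executor_cls(max_workers=max_workers) as pool:
        futures = {
            pool.submit(worker, Path(t), out_dir / Path(t).name): Path(t).name
            for t in tiles
        }
        for fut in as_completed(futures):
            name = futures[fut]
            try:
                results[name] = fut.result()
            except Exception as exc:  # keep going if one tile fails
                failures[name] = str(exc)
    return results, failures
```

With a process pool, the worker has to be a top-level function so it can be pickled, and on platforms that spawn new processes the call should sit under an if __name__ == "__main__": guard.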

Frequently Asked Questions

Should I use filters.hag_nn or filters.hag_delaunay?

Use hag_nn with count=1 for dense ground returns and large tiles — it is fast and memory-light. Switch to hag_delaunay when ground returns are sparse (<2 pts/m²) and you see bullseye artifacts around isolated points, because the TIN interpolates a continuous surface across gaps. On very large tiles hag_delaunay can exhaust memory, so split first or stay with hag_nn.

Why are some of my height-above-ground values negative?

Negative HAGL means the interpolated ground surface ended up above some non-ground returns, usually from the cloth bridging across a gap or residual low outliers misclassified as Class 2. Set CSF rigidness to 3, lower threshold toward 0.3, and confirm filters.outlier ran before classification.

Do I need to reproject before normalizing?

Only if the horizontal or vertical datum does not match your analysis grid. Normalization itself is datum-agnostic — it subtracts a local ground surface — but a vertical-datum mismatch (ellipsoidal vs. orthometric) produces a constant offset that masquerades as a height bias. Apply filters.reprojection with the correct compound CRS before ground filtering, never after.
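
As a hedged illustration of "before ground filtering", the helper below inserts a filters.reprojection stage directly after the reader of an existing pipeline definition. with_reprojection is a hypothetical name, and the compound CRS string in the example (NAD83 / UTM zone 10N with NAVD88 height) is only a placeholder for whatever your analysis grid uses.

```python
from typing import Optional


def with_reprojection(pipeline: dict, out_srs: str, in_srs: Optional[str] = None) -> dict:
    """Return a copy of `pipeline` with filters.reprojection placed right
    after the first stage (assumed to be the reader)."""
    stage = {"type": "filters.reprojection", "out_srs": out_srs}
    if in_srs:
        # Only needed when the file header does not declare its own CRS.
        stage["in_srs"] = in_srs
    stages = list(pipeline["pipeline"])
    return {"pipeline": stages[:1] + [stage] + stages[1:]}


base = {"pipeline": [
    {"type": "readers.las", "filename": "input_tile.laz"},
    {"type": "filters.outlier", "method": "statistical"},
]}
print(with_reprojection(base, "EPSG:26910+5703"))
```

Vertical transformations between ellipsoidal and orthometric heights generally depend on geoid grids being available to the underlying projection library, so it is worth confirming that Z actually changed after the stage runs.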

Can normalization run in the same pipeline as ground classification?

Yes, and it should. filters.outlier, filters.csf, and filters.hag_nn chain in a single PDAL pipeline so the cloud is read once and the HeightAboveGround dimension is written in one pass. Checkpointing the classified cloud to disk is only worthwhile when you want to reuse it for a separate DTM export.

How do I keep the HeightAboveGround dimension in the output file?

Set extra_dims="all" on writers.las; forward="all" is a sensible companion that preserves the source header fields. The HAGL value is stored as an extra dimension, not a core LAS field, so a writer left at defaults silently drops it and downstream tools see an unnormalized cloud.

LiDAR Point Cloud Preprocessing — the parent workflow this normalization step belongs to
Canopy Height Modeling & Terrain Extraction — the full point-cloud-to-canopy pipeline
Digital Terrain Model Generation — rasterize the classified ground surface this step subtracts
Canopy Height Model Creation — turn the height-tagged cloud into a continuous canopy surface
Forest Gap & Understory Analysis — structural analysis built on the normalized cloud
