M.J. Pronk

Master thesis (2026) - H.B. Rotteveel, H. Ledoux, M.J. Pronk
Accurate data on inland water surface elevations are becoming increasingly important as climate change intensifies extreme weather events worldwide. Classic methods of collecting water data are often expensive and limited in both space and time, so water data are largely concentrated in the Global North. The Ice, Cloud and Land Elevation Satellite 2 (ICESat-2), with its advanced Light Detection And Ranging (LiDAR) instrument, is well suited to collecting such data, as it can measure elevation with centimeter-level accuracy across the planet. Its existing inland water data products, ATL13 and ATL22, are nevertheless constrained by their reliance on water masks that exclude lakes and reservoirs smaller than 0.1 km² and rivers narrower than 50 m.

The goal of this thesis is therefore to create a Random Forest (RF) model that uses ICESat-2 data to predict the presence of inland water, specifically for water bodies smaller than 25 m. A set of window-based features that encode the interaction between photons and water was derived at multiple window radii. These features aim to quantify the presence of afterpulses and bottom reflectances, a low slope, a narrow spread of photon elevations, and a high photon density. The final features were selected by first de-correlating all candidate features using Ward’s linkage clustering and then keeping the best-scoring ones according to Mean Decrease in Impurity (MDI) and permutation importance. Photon density proved to be the strongest predictor, contributing 41.2% of the total mean decrease in impurity in the model.
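As an illustration of what a window-based feature can look like, the sketch below computes photon density, elevation spread, and slope for the photons within a given along-track radius of a point. It is a minimal example under stated assumptions, not the thesis implementation: the function name `window_features`, the along-track/elevation inputs, and the exact feature definitions (photons per metre, population standard deviation, least-squares slope) are all hypothetical, and the afterpulse and bottom-reflectance features are not covered.

```python
from bisect import bisect_left, bisect_right
from statistics import pstdev


def window_features(along_track, elevation, center, radius=2.5):
    """Illustrative features for photons within `radius` metres of `center`.

    Assumptions (not from the thesis): `along_track` is sorted ascending,
    `elevation` is aligned with it, and both are plain lists of floats.
    """
    # Locate the slice of photons inside [center - radius, center + radius].
    lo = bisect_left(along_track, center - radius)
    hi = bisect_right(along_track, center + radius)
    xs, zs = along_track[lo:hi], elevation[lo:hi]
    n = len(xs)
    density = n / (2 * radius)  # photons per metre along track

    if n < 2:
        # Too few photons to estimate spread or slope.
        return {"count": n, "density": density, "spread": 0.0, "slope": 0.0}

    # Least-squares slope of elevation against along-track distance.
    mean_x, mean_z = sum(xs) / n, sum(zs) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxz = sum((x - mean_x) * (z - mean_z) for x, z in zip(xs, zs))
    slope = sxz / sxx if sxx > 0 else 0.0

    return {
        "count": n,
        "density": density,
        "spread": pstdev(zs),  # population standard deviation of elevation
        "slope": slope,
    }


# Toy data: eleven photons spaced 0.5 m apart over a perfectly flat surface.
xs = [i * 0.5 for i in range(11)]
print(window_features(xs, [1.0] * 11, center=2.5))
```

Evaluating the same function at several values of `radius` would give one set of features per window size, in the spirit of the multiple window radii mentioned above.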

The final random forest model used a 2.5 m window and was trained on 7 million points from water segments of at most 25 m (≤ 25 m) and an equal number of land points in the Netherlands. The model was evaluated on approximately 520 million ICESat-2 photons across the country and achieved a recall above 80.0% for water segments longer than 6 m and up to 87.1% for water bodies between 10 m and 25 m. Two potential improvements, using the features from multiple windows and selecting only windows with a minimum number of photons, did not yield better results.
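Recall broken down by water segment length, as reported above, can be computed along the following lines. This is a hedged sketch only: it assumes recall is counted per true-water photon, that each photon carries the length of the water segment it belongs to, and that bins are half-open; the function name `recall_by_length`, the bin edges, and the toy records are hypothetical and not taken from the thesis.

```python
def recall_by_length(records, bin_edges):
    """Recall per segment-length bin.

    `records` is an iterable of (segment_length_m, predicted_water) pairs,
    one per photon that is water according to the ground truth.
    Bins are half-open intervals [low, high); empty bins are omitted.
    """
    bins = list(zip(bin_edges[:-1], bin_edges[1:]))
    hits, totals = {}, {}
    for length, predicted in records:
        for low, high in bins:
            if low <= length < high:
                totals[(low, high)] = totals.get((low, high), 0) + 1
                hits[(low, high)] = hits.get((low, high), 0) + int(predicted)
                break
    # Recall = correctly detected water photons / all true water photons.
    return {b: hits[b] / totals[b] for b in bins if b in totals}


# Toy records (made up for illustration): segment length in metres, prediction.
records = [
    (3.0, False), (4.0, True),
    (7.0, True), (8.0, True),
    (12.0, True), (15.0, True), (20.0, False), (22.0, True),
]
for (low, high), recall in recall_by_length(records, [0, 6, 10, 25]).items():
    print(f"{low}-{high} m: recall = {recall:.2f}")
```

With half-open bins, a segment of exactly 25 m would fall outside the last bin; an inclusive upper edge would be needed to match a "≤ 25 m" definition.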

The main sources of misclassification are currently the presence of snow, uncertainty near water edges, and incorrect ground truth data. Manual validation in the Swiss Alps, Greenland, and a Mexican mangrove forest showed promising results. The RF performs reasonably well outside its training environment, suggesting that the model learned general photon-water interactions rather than region-specific characteristics. Future work should focus on expanding the training dataset with geographically diverse data to improve performance, and on combining the classified photons through clustering to create line segments of water surface elevation. ...