T. Jia | TU Delft Repository

Self-supervised learning for multi-label sewer defect classification

Journal article (2026) - Tugba Yildizli, Tianlong Jia, Jeroen Langeveld, Riccardo Taormina

Automated sewer defect detection has advanced through deep learning, particularly supervised methods using CCTV images, but these methods rely on large annotated datasets. This paper proposes a self-supervised learning (SSL) approach to reduce labeling demands. The method comprises self-supervised pre-training on unlabeled images using SwAV (Swapping Assignments between multiple Views) followed by fine-tuning for multi-label classification. Experiments on the Sewer-ML dataset demonstrate that the SSL approach, trained on only 35k labeled images, achieves an F1-score of 69.11% and an F2_CIW of 54.22%, surpassing the fully supervised baseline trained from scratch on 1.04 million images. Increasing the unlabeled pre-training data further enhances performance, while ImageNet initialization consistently outperforms training from scratch. Self-supervised learning also helps mitigate the effects of mislabeled data, which is observed to be present even in the Sewer-ML ground truth. Overall, self-supervised learning provides an accurate, scalable, and cost-effective alternative to fully supervised approaches, particularly in data-scarce or imperfectly labeled scenarios.
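As a rough illustration of the two metrics quoted in this abstract, the sketch below computes an F-beta score from per-class counts and a weighted average of per-class F2 scores. It assumes that F2_CIW is a class-importance-weighted mean of per-class F2 scores; the abstract does not define it, and the counts and weights used here are made-up values, not Sewer-ML data.

```python
def f_beta(tp, fp, fn, beta=1.0):
    """F-beta score from true positives, false positives and false negatives."""
    b2 = beta * beta
    denominator = (1 + b2) * tp + b2 * fn + fp
    # Return 0.0 when a class has no positives and no predictions at all.
    return (1 + b2) * tp / denominator if denominator else 0.0


def weighted_f_beta(per_class_counts, class_weights, beta=2.0):
    """Weighted mean of per-class F-beta scores (assumed form of F2_CIW)."""
    scores = [f_beta(tp, fp, fn, beta) for tp, fp, fn in per_class_counts]
    return sum(w * s for w, s in zip(class_weights, scores)) / sum(class_weights)


# Hypothetical (tp, fp, fn) counts for two defect classes and made-up weights.
counts = [(6, 2, 4), (8, 2, 2)]
weights = [1.0, 0.5]
print(round(f_beta(6, 2, 4, beta=1.0), 3))         # F1 of the first class
print(round(weighted_f_beta(counts, weights), 3))  # weighted F2 over both classes
```

With beta = 2, recall counts four times as much as precision in the denominator, which suits a setting where missing a defect is costlier than a false alarm.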

Exploring transferability of plastic-water hyacinth interaction and detection in rivers

Journal article (2026) - Giel W.A. Hagenbeek, Tim H.M. van Emmerik, Tianlong Jia, Pummarin Khamdahsag, Kittiphon Boonma, Riccardo Taormina, Thomas Mani, Marc Rußwurm

Rivers are major pathways for plastic pollution to oceans, with high emissions in tropical regions. Research in the Saigon River showed that invasive water hyacinths (WHs) can trap macroplastics and serve as proxies for detecting river plastic using remote sensing. We explore the transferability of this phenomenon and its detection methods to the Chao Phraya River. Along a 62.1 km river course, WHs trapped an average of 32% of floating plastics, reaching local maxima of 78%, comparable to 54%–82% in the Saigon. Plastic concentration in WHs was 59 times higher than in open water, increasing downstream. Object detection models transferred well for WHs and entangled plastics (Chao Phraya: mAP50 = 68% and 54%; Saigon River: mAP50 = 70% and 52%) but poorly for free-floating plastics (23% vs. 48%). Physical sampling found 14 times more plastics within WHs than imagery, highlighting WHs’ role in trapping plastics and their potential for monitoring and targeted clean-up efforts.

Deep learning-based Methods for Detecting and Quantifying floating litter in Riverine Environments

Doctoral thesis (2025) - T. Jia, R. Taormina, Z. Kapelan

Litter, particularly plastic, accumulating in water bodies is a challenging environmental issue that affects ecosystems, human health and the economy. Rivers are the main pathways of land-based plastic waste to the ocean, but they also act as potential temporary and long-term plastic sinks, where significant amounts of plastic waste accumulate, and even remain trapped for decades. The detection and quantification of floating litter in rivers and urban waterways is thus essential for evaluating pollution levels and informing mitigation actions. However, traditional monitoring methods, such as sampling with nets and booms, are not suitable for large-scale structured monitoring across multiple geographic locations in extensive river systems. Deep Learning (DL) methods have shown great promise in automatic detection and quantification of floating litter from images or videos. Given that this specific field is still in its early stages, this thesis aims to enhance the understanding of DL-based litter detection and quantification in riverine environments, identify key knowledge gaps, and explore methodologies to address these gaps and drive further advancements in this field...

Semi-supervised learning-based identification of the attachment between sludge and microparticles in wastewater treatment

Journal article (2025) - Tianlong Jia, Jing Yu, Ao Sun, Yipeng Wu, Shuo Zhang, Zhaoxu Peng

Monitoring the microparticle transfer process in wastewater treatment systems is crucial for improving treatment performance. Supervised deep learning methods perform well at automatically detecting particles, but they rely on vast amounts of labeled data for training. To overcome this issue, we proposed a semi-supervised learning (SSL) method based on the Simple framework for Contrastive Learning of visual Representations (SimCLR), to detect microparticles free from sludge and attached to sludge. First, we pre-trained a ResNet50 backbone with SimCLR to extract features from a large amount of unlabeled data (1,000 images). Then, we constructed a Mask R-CNN architecture based on the pre-trained ResNet50, and fine-tuned it on a small quantity of labeled data (≈200 images with ≈600 annotated particles) in a supervised fashion. We showcased its performance and practical applicability for microscopy images obtained from the water lab of TU Delft. The results demonstrate that the SSL methods obtain a significant improvement in mean average precision of up to 5% compared to the conventional supervised learning method, when a limited amount of labeled data is available (e.g., 91 labeled images). Furthermore, these methods improve the average precision for detecting attached particles by over 12%. With the detection results from the SSL methods, we measured the attachment efficiency of microparticles to sludge under varying mixed liquor suspended solids concentration and aeration intensity. The precise measurements demonstrate the effectiveness and practical applicability of the SSL method in facilitating long-term monitoring of particle transfer processes in biological wastewater treatment systems.

A semi-supervised learning-based framework for quantifying litter fluxes in river systems

Journal article (2025) - Tianlong Jia, Riccardo Taormina, Rinze de Vries, Zoran Kapelan, Tim H.M. van Emmerik, Paul Vriend, Imke Okkerman

Supervised deep learning methods have been widely employed to detect floating macroplastic litter (>5 mm) in (fresh)water bodies. However, few studies have used them to quantify floating litter fluxes in rivers with wide cross-sections, which is important for pollution assessment. Additionally, commonly used supervised learning (SL) models rely on extensive labeled data, which is time-consuming and expensive to obtain. Moreover, regardless of the model type, current deep learning models for litter detection usually fail to correctly identify small litter items. To overcome these issues, we propose a semi-supervised learning (SSL)-based framework combined with Slicing Aided Hyper Inference (SAHI) for quantifying cross-sectional floating litter fluxes in rivers. The framework includes four steps: (a) collecting camera images of river surfaces from multiple locations across the river, (b) developing a robust litter detection model using SSL, (c) applying this model with SAHI to detect litter items in images, and (d) post-processing the detection results to quantify fluxes. The SSL method involves: (i) self-supervised pre-training of a ResNet50 on a large amount of unlabeled data, and (ii) supervised fine-tuning of a Faster R-CNN with the ResNet50 backbone on a limited amount of labeled data. We evaluated the in-domain detection performance of SSL models with varying pre-training epochs and pre-training dataset sizes, using images from waterways of the Netherlands, Indonesia and Vietnam, which were used for model pre-training and fine-tuning. Additionally, we assessed the zero-shot out-of-domain detection performance of SSL models and the litter flux quantification performance of the proposed framework on a Vietnam case study, which was not used for model development. We benchmarked our results against the SL methods and human visual counting.
The results show that SSL models benefit from longer pre-training time and larger pre-training datasets, achieving an in-domain F1-score increase of 0.2 and a zero-shot out-of-domain increase of up to 0.14, over baseline SL benchmarks. Furthermore, the SAHI method correctly identifies 45 additional small litter items (areas < 1,000 cm²), improving the F1-score by up to 0.19, compared to the results obtained without SAHI. The flux measurement results indicate that the SSL-based framework substantially underestimates fluxes by a factor of 3–4 compared to human measurements, due to missed detections of transparent litter items and items entrapped in water hyacinths. However, it estimates nearly twice the fluxes of the baseline SL-based framework, aligning more closely with human measurements. These findings highlight the potential of the SSL-based framework to enhance litter flux measurement. Scaling it with broader datasets could significantly advance global-scale litter monitoring systems.
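The slicing idea behind step (c) can be sketched as follows: the image is cut into overlapping windows, a detector is run on each window, and the resulting boxes are shifted back into full-image coordinates. This is only an illustration of the general principle, not the SAHI library's implementation or the authors' configuration; the function names, the 512-pixel slice size and the 0.2 overlap ratio are assumptions chosen for the example.

```python
def slice_windows(image_width, image_height, slice_size=512, overlap_ratio=0.2):
    """Return (x_min, y_min, x_max, y_max) windows that cover the whole image."""
    step = max(1, int(slice_size * (1 - overlap_ratio)))

    def starts(length):
        # A single window suffices when the image is smaller than one slice.
        if length <= slice_size:
            return [0]
        positions = list(range(0, length - slice_size, step))
        positions.append(length - slice_size)  # last window touches the border
        return positions

    return [
        (x, y, min(x + slice_size, image_width), min(y + slice_size, image_height))
        for y in starts(image_height)
        for x in starts(image_width)
    ]


def to_full_image(box, window):
    """Shift a box predicted inside a window back to full-image coordinates."""
    x_offset, y_offset = window[0], window[1]
    return (box[0] + x_offset, box[1] + y_offset,
            box[2] + x_offset, box[3] + y_offset)


windows = slice_windows(1024, 512)
print(windows)
print(to_full_image((10, 20, 30, 40), windows[1]))
```

Small items occupy a larger share of each window than of the full frame, which is the usual motivation for sliced inference; overlapping predictions from neighbouring windows would still need to be merged, for example with non-maximum suppression.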

Corrigendum to “Advancing deep learning-based acoustic leak detection methods towards application for water distribution systems from a data-centric perspective” [Water Research 261(2024) 121999]

Journal article (2024) - Yipeng Wu, Xingke Ma, Guancheng Guo, Tianlong Jia, Yujun Huang, Shuming Liu, Jingjing Fan, Xue Wu

The authors regret that the implementation order of data augmentation and data splitting was incorrectly stated. Data augmentation should be implemented after data splitting. While the correct implementation order and its impacts on leakage detection performance were accurately discussed in Section 3.2 “Biased results caused by data leakage”, there were errors in the highlights, abstract, and conclusions sections. The corrections are as follows: 1. The second highlight should be corrected to “Data augmentation after splitting prevents biased results due to data leakage.” 2. In the abstract, the corresponding sentence should be corrected to “Results indicate the importance of implementing data augmentation after data splitting to prevent data leakage and overly optimistic outcomes.” 3. In the second paragraph of the conclusions, the first sentence should be corrected to “It is recommended to implement data augmentation after data splitting to avoid data leakage, which could lead to biased and overly optimistic results.” The authors would like to apologise for any inconvenience caused.
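The ordering stated in this corrigendum can be sketched in a few lines: the data are split first, and only the training portion is augmented, so no augmented copy of a test signal can end up in the training set. This is a minimal sketch under assumed names and parameters (the function name, the 20% test fraction, the number of copies and the noise level are all hypothetical); jittering stands in for the augmentation methods discussed in the original article.

```python
import random


def split_then_augment(signals, labels, test_fraction=0.2, copies=2,
                       noise_std=0.01, seed=0):
    """Split first, then add jittered copies to the training set only."""
    rng = random.Random(seed)
    indices = list(range(len(signals)))
    rng.shuffle(indices)

    n_test = max(1, int(len(indices) * test_fraction))
    test_idx, train_idx = indices[:n_test], indices[n_test:]

    test = [(signals[i], labels[i]) for i in test_idx]
    train = [(signals[i], labels[i]) for i in train_idx]

    # Augment only training signals; the test set stays untouched.
    augmented = list(train)
    for signal, label in train:
        for _ in range(copies):
            jittered = [x + rng.gauss(0.0, noise_std) for x in signal]
            augmented.append((jittered, label))
    return augmented, test


# Toy signals: signal i is four samples of the constant value i.
signals = [[float(i)] * 4 for i in range(10)]
labels = [i % 2 for i in range(10)]
train_set, test_set = split_then_augment(signals, labels)
print(len(train_set), len(test_set))
```

Augmenting before the split would let jittered variants of the same recording fall on both sides of the split, which is the data leakage the corrigendum warns about.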

Detecting the interaction between microparticles and biomass in biological wastewater treatment process with Deep Learning method

Journal article (2024) - Tianlong Jia, Zhaoxu Peng, Jing Yu, Antonella L. Piaggio, Shuo Zhang, Merle K. de Kreuk

Investigating the interaction between influent particles and biomass is fundamental to biological wastewater treatment. Micro-level methods, such as microscope image analysis with the conventional ImageJ processing software, allow for this. However, these methods are costly and time-consuming, and require extensive manual parameter tuning. To deal with this problem, we proposed a deep learning (DL) method to automatically detect and quantify microparticles free from biomass and entrapped in biomass from microscope images. Firstly, we introduced a “TU Delft-Interaction between Particles and Biomass” dataset containing labeled microscope images. Then, we built DL models using this dataset with seven state-of-the-art model architectures for an instance segmentation task, such as Mask R-CNN, Cascade Mask R-CNN, Yolact and YOLOv8. The results show that the Cascade Mask R-CNN with ResNet50 backbone achieves promising detection accuracy, with a mAP50_box and mAP50_mask of 90.6% on the test set. Then, we benchmarked our results against the conventional ImageJ processing method. The results show that the DL method significantly outperforms the ImageJ processing method in terms of detection accuracy and processing cost. The DL method shows a 13.8% improvement in micro-average precision, and a 21.7% improvement in micro-average recall, compared to the ImageJ method. Moreover, the DL method can process 70 images within 1 min, while the ImageJ method takes at least 6 h. The promising performance of our method makes it a potential alternative for examining the interaction between microparticles and biomass in biological wastewater treatment processes in an affordable manner. This approach offers more useful insights into the treatment process, helping to further reveal microparticle transfer in biological treatment systems.

Detecting floating litter in freshwater bodies with semi-supervised deep learning

Journal article (2024) - Tianlong Jia, Rinze de Vries, Zoran Kapelan, Tim H.M. van Emmerik, Riccardo Taormina

Researchers and practitioners have extensively utilized supervised Deep Learning methods to quantify floating litter in rivers and canals. These methods require large amounts of labeled data for training. The labeling work is expensive and laborious, resulting in small open datasets available in the field compared to the comprehensive datasets for computer vision, e.g., ImageNet. Fine-tuning models pre-trained on these larger datasets helps improve litter detection performance and reduces data requirements. Yet, the effectiveness of using features learned from generic datasets is limited in large-scale monitoring, where automated detection must adapt across different locations, environmental conditions, and sensor settings. To address this issue, we propose a two-stage semi-supervised learning method to detect floating litter based on the Swapping Assignments between multiple Views of the same image (SwAV). SwAV is a self-supervised learning approach that learns the underlying feature representation from unlabeled data. In the first stage, we used SwAV to pre-train a ResNet50 backbone architecture on about 100k unlabeled images. In the second stage, we added new layers to the pre-trained ResNet50 to create a Faster R-CNN architecture, and fine-tuned it with a limited number of labeled images (≈1.8k images with 2.6k annotated litter items). We developed and validated our semi-supervised floating litter detection methodology for images collected in canals and waterways of Delft (the Netherlands) and Jakarta (Indonesia). We tested for out-of-domain generalization performance in a zero-shot fashion using additional data from Ho Chi Minh City (Vietnam), Amsterdam and Groningen (the Netherlands). We benchmarked our results against the same Faster R-CNN architecture trained via supervised learning alone by fine-tuning ImageNet pre-trained weights.
The findings indicate that the semi-supervised learning method matches or surpasses the supervised learning benchmark when tested on new images from the same training locations. We measured better performance when little data (≈200 images with about 300 annotated litter items) is available for fine-tuning and with respect to reducing false positive predictions. More importantly, the proposed approach demonstrates clear superiority for generalization to the unseen locations, with improvements in average precision of up to 12.7%. We attribute this superior performance to the more effective high-level feature extraction obtained by SwAV pre-training on relevant unlabeled images. Our findings highlight a promising direction to leverage semi-supervised learning for developing foundational models, which have revolutionized artificial intelligence applications in most fields. By scaling our proposed approach with more data and compute, we can make significant strides in monitoring to address the global challenge of litter pollution in water bodies.

Advancing deep learning-based acoustic leak detection methods towards application for water distribution systems from a data-centric perspective

Journal article (2024) - Yipeng Wu, Xingke Ma, Guancheng Guo, Tianlong Jia, Yujun Huang, Shuming Liu, Jingjing Fan, Xue Wu

Against the backdrop of severe leakage issues in water distribution systems (WDSs), numerous researchers have focused on the development of deep learning-based acoustic leak detection technologies. However, these studies often prioritize model development while neglecting the importance of data. This research explores the impact of data augmentation techniques on enhancing deep learning-based acoustic leak detection methods. Five random transformation-based methods are proposed: jittering, scaling, warping, iterated amplitude adjusted Fourier transform (IAAFT), and masking. Jittering, scaling, warping, and IAAFT directly process original signals, while masking operates on time-frequency spectrograms. Acoustic signals from a real-world WDS are augmented, and the efficacy is validated using convolutional neural network classifiers to identify the spectrograms of acoustic signals. Results indicate the importance of implementing data augmentation after data splitting to prevent data leakage and overly optimistic outcomes. Among the techniques, IAAFT stands out, significantly increasing data volume and diversity, improving recognition accuracy by over 7%. Masking enhances performance mainly by compelling the classifier to learn global features of the spectrograms. Sequential application of IAAFT and masking further strengthens leak detection performance. Furthermore, when applying a complex model to acoustic leakage detection through transfer learning, data augmentation can also enhance the effectiveness of transfer learning. These findings advance artificial intelligence-driven acoustic leak detection technology from a data-centric perspective towards more mature applications.

Advancing deep learning-based detection of floating litter using a novel open dataset

Journal article (2023) - Tianlong Jia, Andre Jehan Vallendar, Rinze de Vries, Zoran Kapelan, Riccardo Taormina

Supervised Deep Learning (DL) methods have shown promise in monitoring floating litter in rivers and urban canals, but further advancements are hard to obtain due to the limited availability of relevant labeled data. To address this challenge, researchers often utilize techniques such as transfer learning (TL) and data augmentation (DA). However, no study has yet reported a rigorous evaluation of the effectiveness of these approaches for floating litter detection and their effects on the models' generalization capability. To overcome the problem of limited data availability, this work introduces the “TU Delft—Green Village” dataset, a novel labeled dataset of 9,473 camera and phone images of floating macroplastic litter and other litter items, captured during experiments in a drainage canal of TU Delft. We use the new dataset to conduct a thorough evaluation of the detection performance of five DL architectures for multi-class image classification. We focus the analysis on a systematic evaluation of the benefits of TL and DA on model performance. Moreover, we evaluate the generalization capability of these models for unseen litter items and new device settings, such as increasing the cameras' height and tilting them to 45°. The results obtained show that, for the specific problem of floating litter detection, fine-tuning all layers is more effective than the common approach of fine-tuning the classifier alone. Among the tested DA techniques, we find that simple image flipping boosts model accuracy the most, while other methods have little impact on the performance. The SqueezeNet and DenseNet121 architectures perform the best, achieving an overall accuracy of 89.6% and 91.7%, respectively.
We also observe that both models retain good generalization capability, which drops significantly only for the most complex scenario tested, but the overall accuracy rises significantly to around 75% when adding a limited number of images to the training data, combined with flipping augmentation. The detailed analyses conducted here and the released open-source dataset offer valuable insights and serve as a precious resource for future research.

PHyL v1.0

A parallel, flexible, and advanced software for hydrological and slope stability modeling at a regional scale

Journal article (2023) - Guoding Chen, Ke Zhang, Sheng Wang, Tianlong Jia

Physically-based hydrological-geotechnical modeling at large scales is difficult, especially due to the time-consuming nature of flow routing and 3D soil stability models. Although parallelization techniques are commonly used for each model individually, there is currently no concurrent parallelization strategy for both. This study proposed an open-source, Parallelized, and modular modeling software for regional Hydrologic processes and Landslides simulation and prediction (PHyL v1.0). It offers parallel computation in both hydrological and 3D slope stability modules, cross-scale modeling ability via a soil moisture downscaling method, and advanced input/output (I/O) and post-processing visualization. Additionally, PHyL v1.0 is flexible and extensible, making it compatible with all mainstream operating systems. We applied PHyL v1.0 in the Yuehe River Basin, where the computational efficiencies, parallel performance, parameter sensitivity analysis, and predictive capabilities were evaluated. PHyL v1.0 is therefore suitable as advanced software for high-resolution and complex simulations of regional floods and landslides.

Deep learning for detecting macroplastic litter in water bodies

A review

Review (2023) - Tianlong Jia, Zoran Kapelan, Rinze de Vries, Paul Vriend, Eric Copius Peereboom, Imke Okkerman, Riccardo Taormina

Plastic pollution in water bodies is an unresolved environmental issue that damages all aquatic environments, and causes economic and health problems. Accurate detection of macroplastic litter (plastic items >5 mm) in water is essential to estimate the quantities, compositions and sources, identify emerging trends, and design preventive measures or mitigation strategies. In recent years, researchers have demonstrated the potential of computer vision (CV) techniques based on deep learning (DL) for automated detection of macroplastic litter in water bodies. However, a systematic review to describe the state-of-the-art of the field is lacking. Here we provide such a review, and we highlight current knowledge gaps and suggest promising future research directions. The review compares 34 papers with respect to their application- and modeling-related criteria. The results show that researchers have employed a variety of DL architectures implementing different CV techniques to detect macroplastic litter in various aquatic environments. However, key knowledge gaps must be addressed to overcome the lack of: (i) DL-based macroplastic litter detection models with sufficient generalization capability, (ii) DL-based quantification of macroplastic (mass) fluxes and hotspots and (iii) scalable macroplastic litter monitoring strategies based on robust DL-based quantification. We advocate for the exploration of data-centric artificial intelligence approaches and semi-supervised learning to develop models with improved generalization capabilities. These models can boost the development of new methods for the quantification of macroplastic (mass) fluxes and hotspots, and allow for structural monitoring strategies that leverage robust DL-based quantification. While the identified gaps concern all bodies of water, we recommend increased efforts with respect to riverine ecosystems, considering their major role in transport and storage of litter.
