J. Wen | TU Delft Repository

Anticipating daily human actions

Comparing pipelines for long-term skeleton-based prediction in real-world scenarios

Journal article (2026) - Junhan Wen, Xucong Zhang, Jouh Yeong Chew

Human action anticipation remains a key challenge for efficient human-robot interaction because higher levels of abstraction are difficult to learn. This work explores three action anticipation pipelines as a guideline for future work. Specifically, two pipelines adopt a top-down approach: they recognize current actions and then anticipate future actions using either traditional machine learning models or Large Language Models (LLMs). The third pipeline follows a bottom-up strategy by first forecasting future motions and then inferring actions. Our results show that top-down pipelines achieve higher accuracy and robustness, demonstrating the advantage of abstract reasoning over direct motion-based inference. ...

Performance and interaction assessment of neural network architectures and bivariate smart predict-then-optimize

Journal article (2025) - Junhan Wen, Thomas Abeel, Mathijs de Weerdt

Smart “predict, then optimize” (SPO) (Elmachtoub in Manag Sci 68(1): 9–26, 2022) is an end-to-end learning strategy for models that predict parameters in optimization problems. Unlike minimizing mean squared error (MSE), which targets prediction accuracy, SPO aims to ensure that predictions lead to the best possible decisions. The associated loss function, termed SPO loss, measures the regret of a decision relative to the optimal outcome under the realized parameters. Existing literature has demonstrated the viability of SPO; however, these studies often focus on classical optimization problems and employ a limited set of models for benchmarking. In this study, we tackled a decision-making task inspired by real-world challenges across a wide range of neural network models. Unlike classical problems, our task requires a unique approach: collaboratively training two models to predict different variables. On top of that, one of the decision variables also affects the feasibility of the decisions, further increasing the complexity. While our implementation validates the benefits of SPO, we were surprised to find that models trained exclusively on SPO loss do not consistently attain the minimum regret. Our further investigation into hyperparameters illustrates that the well-tuned models learned very similar patterns from the feature set, irrespective of whether MSE or SPO loss was used. In other words, the change from MSE to SPO loss in training primarily affected the layer biases. Therefore, to improve the learning efficacy with SPO loss, we propose prioritizing the learning of feature patterns as the fundamental step. Possible strategies include using specialized neural network layers to capture deeper patterns more effectively or simply warming up by training with MSE. A warm-up process is particularly advantageous for models whose outputs are closely tied to constraints, as their prediction accuracy significantly impacts the decision feasibility.
The insights are investigated empirically through two real-world trading scenarios. By leveraging datasets with diverse properties, we demonstrate the novelty and generalizability of our investigation.
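The difference between MSE and decision regret can be illustrated with a minimal sketch. The toy problem below (choosing the single cheapest option) and all names and numbers are assumptions made for illustration; they are not the trading task or the models studied in the article.

```python
from typing import Sequence


def best_decision(costs: Sequence[float]) -> int:
    """Toy optimization problem (assumed): pick the option with the lowest cost."""
    return min(range(len(costs)), key=lambda i: costs[i])


def spo_regret(pred_costs: Sequence[float], true_costs: Sequence[float]) -> float:
    """Regret: true cost of the decision made from predictions,
    minus the true cost of the decision made with perfect information."""
    chosen = best_decision(pred_costs)
    optimal = best_decision(true_costs)
    return true_costs[chosen] - true_costs[optimal]


def mse(pred: Sequence[float], true: Sequence[float]) -> float:
    """Mean squared error between predicted and true parameters."""
    return sum((p - t) ** 2 for p, t in zip(pred, true)) / len(true)


# Hypothetical values, chosen only to contrast the two losses.
true_costs = [3.0, 1.0, 2.0]
shifted = [13.0, 11.0, 12.0]  # far from the truth, but ranks the options correctly
close = [2.0, 2.5, 4.0]       # nearer to the truth, but picks the wrong option

for name, pred in [("shifted", shifted), ("close", close)]:
    print(name, "MSE:", round(mse(pred, true_costs), 3),
          "regret:", spo_regret(pred, true_costs))
```

In this toy setting, a uniformly shifted prediction has a large MSE yet zero regret, while a prediction that is closer in MSE can still lead to a worse decision.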

“From iMage to Market”: Machine-Learning-Empowered Fruit Supply

Doctoral thesis (2025) - J. Wen, M.M. de Weerdt, Thomas Abeel

Artificial intelligence (AI) has become a widely discussed and transformative technology, with its adoption growing across industries to drive insights and impact. In this thesis, we explore how AI methods and algorithms can facilitate the operation of soft-fruit supply chains, using strawberries as a case study.

The thesis begins by presenting the general background and various perspectives from related works on how AI and machine learning (ML) have been applied to address problems in agricultural or horticultural practices. This includes tasks that, while not directly optimizing supply strategies, still contribute to solving broader challenges. In a nutshell, this thesis categorizes the scope of study into three scales: the single-fruit scale, the greenhouse scale, and the market scale. Within each scale, we review the existing research, identify knowledge gaps, and introduce robust and applicable methodologies capable of dealing with real-world conditions.

Since no publicly available datasets met the requirements of the research plan, we established several datasets for research on the soft-fruit supply chain by collecting, annotating, and (pre-)processing data. These newly curated datasets not only support the research presented in this thesis but also lay a foundation for future research from various perspectives. Details about these datasets are introduced in Chapter 2. Moreover, we conceptualize the process of gathering longitudinal observations from growth monitoring images as a multiple object tracking (MOT) task. We named the image collection and its MOT annotations “The Growing Strawberries” (GSD). The computer vision challenges that GSD brings are further benchmarked and discussed in Chapter 3. Following this, the core contributions of the thesis are presented in Chapters 3 to 6, each corresponding to a published paper or one currently under review. Finally, Chapter 7 summarizes the research findings, answering the research questions proposed in Chapter 1 and discussing the overall work of the thesis.

We discuss these contributions for each of the three mentioned scales separately:

At the fruit scale, we designed and analyzed novel methodologies to keep track of fruit growth and to predict key properties, including both external characteristics like ripeness and internal qualities such as sweetness. For ripeness, we propose to use appearance properties, mainly the hue, as an objective metric to quantify it. For sweetness, we trained deep neural networks to perform non-destructive prediction using environmental and image data, individually and in combination.
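As a rough illustration of using hue as an objective appearance metric, the sketch below computes the mean hue of a set of RGB pixels. The function name, the input format, and the use of a circular mean are assumptions for this example and are not taken from the thesis.

```python
import colorsys
import math
from typing import Iterable, Tuple


def mean_hue_degrees(pixels: Iterable[Tuple[int, int, int]]) -> float:
    """Circular mean hue (0-360 degrees) of 8-bit RGB pixels.

    Hue is an angle, so it is averaged on the unit circle; a plain
    arithmetic mean would mishandle reds that wrap around 0/360 degrees.
    """
    sin_sum = 0.0
    cos_sum = 0.0
    count = 0
    for r, g, b in pixels:
        hue, _sat, _val = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
        angle = 2.0 * math.pi * hue
        sin_sum += math.sin(angle)
        cos_sum += math.cos(angle)
        count += 1
    if count == 0:
        raise ValueError("at least one pixel is required")
    return math.degrees(math.atan2(sin_sum / count, cos_sum / count)) % 360.0


# Hypothetical pixel samples: a greenish patch and a reddish patch.
unripe_patch = [(0, 255, 0)]
ripe_patch = [(255, 0, 0), (255, 255, 0)]
print(mean_hue_degrees(unripe_patch), mean_hue_degrees(ripe_patch))
```

Under this assumed setup, a patch whose mean hue lies closer to red than to green would be read as riper; any actual threshold or mapping to a ripeness score would have to come from the thesis itself.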

Our use of color analysis and ML models provides a non-destructive and generalizable approach that ensures consistency when upstream and downstream parties in a supply chain estimate the properties of fruits. Meanwhile, the models perform comparably to laboratory benchmarks even under imperfect, outdoor data collection. We also demonstrated the model in a mobile app to facilitate adoption in the field.

By benchmarking state-of-the-art MOT algorithms on GSD, we illustrated the new challenges brought by this use case: first, the tracked objects change appearance during tracking due to their biological development, and second, sparse frame rates introduce irregular movements from image to image. We showcased how fruit properties, such as ripeness, change over the fruit's life cycle. The results not only provide quantitative measurements that describe the fruit's biological development, but also expose the pain points of current MOT algorithms' predictions. In the meantime, by quantifying these changes over the biological development, we also retrieve relevant information and datasets to support predictions of the changes.

At the greenhouse scale, we designed a framework that optimizes the timing of fruit harvesting by integrating the aforementioned quantified changes over biological development, based on sequential demands for the quantities to be harvested. Essentially, the framework makes fruit-specific decisions on harvest dates by leveraging the monitoring data. The decisions are thus made to enhance both current and future demand-fulfillment capabilities. At each stage of this framework, we evaluated various methods and discussed their effectiveness in achieving the stage targets, for example, how to process the infield data to obtain coherent functions describing ripeness development, how to predict future changes, and how to include different perspectives in the optimization model. As the decisions are made for each specific fruit, the work also demonstrates significant potential for integration with mobile apps and harvesting robots. On top of that, the information retrieval function can also serve as a standalone application to provide objective fruit-level quality assessment.

At the market scale, we focus on the portfolio optimization of a grower under a widely applied mechanism of the market system: the majority of demands for harvests are predetermined through advance contracts, which also serves as an a priori condition of the solution proposed at the greenhouse level. The local market, with dynamic prices and demands, can be used to offset losses arising from the difference between contracted demands and the actual yield. To mitigate outlying decision failures, we introduced the “smart predict-then-optimize” (SPO) method, which trains models to predict future yield and local market prices.
Our results illustrate that SPO loss primarily affects the layer biases in neural networks, in contrast to models trained using mean squared error (MSE). This difference essentially leads to more conservative estimations in decision-making scenarios, and also highlights the importance of effective MSE-based pre-training. Additionally, our study reveals how SPO loss makes models interact when multiple neural networks are trained to predict decision parameters with diverse functions. This insight expands the applicability of SPO loss across a broader range of use cases and model architectures, underscoring its contribution to the field of decision-focused learning.

In conclusion, this thesis introduces diverse data-driven methodologies to tackle the distinct tasks involved in optimizing fruit supply, using strawberries as a case study. Central to our approach is the effective utilization of data, which serves as the foundation for solutions that span from fruit-level evaluations to market-level planning. By leveraging analytics of non-destructive data, our solutions provide objective estimations of fruit quality, fostering a more consistent shared understanding between sellers and buyers while reducing potential food waste. Overall, these advancements push the boundaries of AI in supporting decision-making during the supply of soft fruits, particularly for smaller growers. The findings not only empower more efficient and sustainable supply chain operations but also highlight the strong potential for many practical real-world applications.

The Growing Strawberries Dataset

Tracking Multiple Objects with Biological Development over an Extended Period

Conference paper (2024) - Junhan Wen, Camiel R. Verschoor, Chengming Feng, Irina Mona Epure, Thomas Abeel, Mathijs De Weerdt

Multiple Object Tracking (MOT) is a rapidly developing research field that targets precise and reliable tracking of objects. Unfortunately, most available MOT datasets contain only short video clips and therefore fail to capture the substantial long-term variations found in real-world scenarios. Long-term MOT poses unique challenges due to changes in both the objects and the environment, which remain relatively unexplored. To fill the gap, we propose a time-lapse image dataset inspired by the growth monitoring of strawberries, dubbed The Growing Strawberries Dataset (GSD). The data was captured hourly by six cameras, covering a span of 16 months in 2021 and 2022. During this time, it encompassed a total of 24 plants in two separate greenhouses. The changes in appearance, weight, and position during the ripening process, along with variations in the illumination during data collection, distinguish the task from previous MOT research. These practical issues resulted in a drastic performance degradation in the track identification and association tasks of state-of-the-art MOT algorithms. We believe The Growing Strawberries will provide a platform for evaluating such long-term MOT tasks and inspire future research. The dataset is available at https://doi.org/10.4121/e3b31ece-cc88-4638-be10-8ccdd4c5f2f7.v1. ...

“How sweet are your strawberries?”

Predicting sugariness using non-destructive and affordable hardware

Journal article (2023) - Junhan Wen, Thomas Abeel, Mathijs de Weerdt

Global soft fruit supply chains rely on trustworthy descriptions of product quality. However, crucial criteria such as sweetness and firmness cannot be accurately established without destroying the fruit. Since traditional alternatives are subjective assessments by human experts, it is desirable to obtain quality estimations in a consistent and non-destructive manner. The majority of research on fruit quality measurements analyzed fruits in the lab with uniform data collection. However, it is laborious and expensive to scale this up to the level of the whole yield. The “harvest-first, analysis-second” method also comes too late to inform adjustments to harvesting schedules. In this research, we validated our hypothesis that in-field data acquirable via commodity hardware can yield acceptable accuracies. The research focuses primarily on the sugariness of strawberries, described by the juice’s total soluble solids (TSS) content (unit: °Brix or Brix). We benchmarked the accuracy of strawberry Brix prediction using convolutional neural networks (CNN), variational autoencoders (VAE), principal component analysis (PCA), kernelized ridge regression (KRR), support vector regression (SVR), and multilayer perceptron (MLP), based on fusions of image data, environmental records, plant load information, and other inputs. Our results suggest that: (i) models trained on environment and plant load data can perform reliable prediction of aggregated Brix values, with the lowest RMSE at 0.59; (ii) image data can further improve the Brix predictions of individual fruits from (i), from 1.27 to as low as 1.10, but image data alone is not sufficiently reliable. ...
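For readers unfamiliar with the reported metric, the following is a minimal sketch of how a root mean squared error (RMSE) is computed between predicted and measured Brix values. The function name and the sample values are hypothetical and are not data from the article.

```python
import math
from typing import Sequence


def rmse(predicted: Sequence[float], measured: Sequence[float]) -> float:
    """Root mean squared error, in the same unit as the inputs (here: Brix)."""
    if len(predicted) != len(measured) or not measured:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(p - m) ** 2 for p, m in zip(predicted, measured)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical Brix readings, for illustration only.
measured_brix = [8.5, 9.0, 9.0]
predicted_brix = [8.0, 9.0, 10.0]
print(round(rmse(predicted_brix, measured_brix), 3))
```

Because the RMSE is expressed in the unit of the target, a value of 0.59 can be read as a typical prediction error of roughly 0.59 °Brix.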

Non-Destructive Infield Quality Estimation of Strawberries using Deep Architectures

Conference paper (2023) - Cees Jol, Junhan Wen, Jan van Gemert

Strawberries are profitable fruits, yet they have a short shelf life. Therefore, it is crucial to anticipate their quality and harvest them at the best time, which is vital not only for finding the appropriate market but also for minimizing food and economic waste. To this end, non-destructive strawberry quality measurements are useful. Much research has been conducted on post-harvest strawberries: because the fruits are analyzed only after harvesting, these methods cannot be used to find a good time to harvest. Our research targets pre-harvest analysis to support the timing decisions of harvests. As such, we used an infield image dataset that was collected during the cultivation of strawberries. The images are labeled with quality assessments and measurements from post-harvest destructive tests. We evaluated deep learning for quality estimation and trained our algorithms to predict the ripeness, firmness, and sweetness of strawberries. Additionally, we applied depth estimation algorithms and shape inpainting models to estimate the size of strawberries using images. Our results demonstrate the feasibility of infield quality attribute prediction. ...