G.W.K. Paardekooper
Please Note
This page displays the records of the person named above and is not linked to a unique person identifier. This record may need to be merged into a profile.
3 records found
Transformer-Based Synthetic Relational Data
Closing the Gap Between Diffusion-Based and Transformer-Based Synthetic Relational Data Generation
Data sharing for research and industrial applications faces significant challenges due to privacy constraints and regulatory requirements, driving the need for high-quality synthetic alternatives.
Recent advances in synthetic data generation have demonstrated considerable success for single-table datasets, with emerging research extending these capabilities to multi-table relational scenarios.
While transformer and diffusion architectures achieve state-of-the-art performance in single-table generation, a notable performance gap emerges when applied to relational data, where diffusion approaches consistently outperform transformer-based methods.
This thesis examines the factors contributing to this performance difference by evaluating multiple baselines on both single-table and relational tabular datasets, with REaLTabformer and ClavaDDPM as the state-of-the-art transformer- and diffusion-based approaches, respectively.
Our investigation reveals that the performance gap can mainly be attributed to the inadequate processing of contextual relationships and to suboptimal strategies for representing inter-table dependencies in transformer-based models.
To close this gap, we introduce two changes for transformer-based models: layer sharing to enhance parameter utilization and contextual encoding to better preserve the relational structure.
These changes provide insight into the key design principles behind effective synthetic relational data generation using transformer-based models, particularly the need for architectures that account for context and facilitate practical knowledge transfer.
The proposed methods result in substantial performance improvements, with a 1.52-fold improvement in Logistic Detection and a 1.94-fold reduction in the Discriminator Measure metric.
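Logistic Detection is a detection-style metric: a discriminator is trained to separate real rows from synthetic ones, and the better it does, the worse the synthetic data. As a minimal sketch, assuming the definition used by the SDMetrics library (a logistic-regression discriminator whose ROC AUC is mapped to a score in [0, 1], where 1 is best), the scoring step could look like the code below. This is an illustration, not the thesis's evaluation code; the discriminator itself is omitted, and `real_scores` and `synth_scores` are hypothetical predicted probabilities of "synthetic" on held-out rows.

```python
from typing import Sequence


def roc_auc(real_scores: Sequence[float], synth_scores: Sequence[float]) -> float:
    """ROC AUC via the Mann-Whitney formulation: the probability that a random
    synthetic row gets a higher 'is synthetic' score than a random real row.
    Ties count as half a win. O(n * m), chosen for clarity over speed."""
    wins = 0.0
    for s in synth_scores:
        for r in real_scores:
            if s > r:
                wins += 1.0
            elif s == r:
                wins += 0.5
    return wins / (len(synth_scores) * len(real_scores))


def detection_score(auc: float) -> float:
    """Assumed SDMetrics-style mapping: 1 - (max(AUC, 0.5) * 2 - 1).
    1.0 means the discriminator cannot beat chance; 0.0 means it separates
    real and synthetic rows perfectly."""
    return 1.0 - (max(auc, 0.5) * 2.0 - 1.0)


# Hypothetical discriminator outputs on held-out rows.
real = [0.1, 0.2, 0.3, 0.4]
synth = [0.15, 0.35, 0.6, 0.9]
auc = roc_auc(real, synth)
print(auc, detection_score(auc))  # 0.75 0.5
```

Clamping the AUC at 0.5 means a discriminator that does worse than chance is treated the same as chance, so the score only penalises detectable differences between real and synthetic data.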
A Data Management System for P3D
Building a data management system for a moulding company
Bachelor thesis
(2021)
Zeger Mouw, Abri Bharos, Tim Pelser, Erik Sennema, Gijs Paardekooper, A. Katsifodimos, O.W. Visser, J. Gross
P3D is an injection moulding company that uses a technology called PRIM® (Printed Injection Mould) to create products for its customers. Unlike traditional moulding companies, P3D’s business model revolves around an efficient workflow and fast delivery of products to its customers. At the moment, P3D is able to fabricate and deliver its products quickly because of a small team of skilled designers and engineers with extensive expertise in injection moulding. Because the company relies on this expertise, however, much of the data about its workflow is either memorised or only written down on paper. This poses a problem, as P3D would like to grow in the future and will not be able to rely entirely on memorised or analogue data. The goal of this project is to create a data management system (DMS) that digitises information about P3D’s workflow and stores it in an easily accessible, centralised system.
Student report
(2020)
Victor Ionescu, Mike van der Meer, Bram van Kooten, Gijs Paardekooper, Jasper Teunissen, Mathijs de Weerdt, Jesse Mulderij
Currently, the literature on Multiagent Path Finding (MAPF) does not give a sufficiently broad overview of the different approaches. Many papers are hard to read and require prior knowledge of MAPF. The goal of this report is to give a broad overview of MAPF. To this end, we explain in detail what MAPF problems look like and give a clear overview of the strengths and weaknesses of different solutions. Besides this theoretical analysis, we also analyse and critique the benchmarking performed by other researchers. From this, we conclude that the field of MAPF lacks agreement on terminology. Furthermore, performance analysis is left to the researchers’ own choice, skewing results in their favour.
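To make "what MAPF problems look like" concrete, here is a minimal sketch of the common "classic" formulation, assumed here rather than taken from the report: agents move on a grid one step per timestep, wait at their goal after arriving, and a set of paths is a valid solution only if no two agents occupy the same cell at the same time (a vertex conflict) or swap cells along the same edge (a swap conflict). The function and type names are illustrative.

```python
from typing import List, Optional, Tuple

Cell = Tuple[int, int]
Path = List[Cell]  # cell occupied by one agent at t = 0, 1, 2, ...


def position(path: Path, t: int) -> Cell:
    """Agents wait at their goal after arriving, so clamp to the last cell."""
    return path[min(t, len(path) - 1)]


def first_conflict(paths: List[Path]) -> Optional[Tuple[str, int, int, int]]:
    """Return (kind, agent_i, agent_j, t) for the earliest conflict, or None.

    kind is "vertex" if two agents occupy the same cell at time t, or
    "swap" if they exchange cells between t - 1 and t.
    """
    horizon = max(len(p) for p in paths)
    for t in range(horizon):
        for i in range(len(paths)):
            for j in range(i + 1, len(paths)):
                if position(paths[i], t) == position(paths[j], t):
                    return ("vertex", i, j, t)
                if t > 0 and (
                    position(paths[i], t) == position(paths[j], t - 1)
                    and position(paths[j], t) == position(paths[i], t - 1)
                ):
                    return ("swap", i, j, t)
    return None


# Two agents that try to pass through each other in a one-cell-wide corridor.
print(first_conflict([[(0, 0), (0, 1)], [(0, 1), (0, 0)]]))  # ('swap', 0, 1, 1)
```

Solvers differ mainly in how they search for conflict-free paths; a validity check like this one is the shared definition they are all measured against.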