N.H.C. Tomassen
Please Note
This page displays the records of the person named above and is not linked to a unique person identifier. This record may need to be merged to a profile.
2 records found
Decentralized learning allows data owners to collaboratively train machine learning models without relying on a central server, making it attractive for privacy-sensitive and distributed environments. However, although the data stays on-premises, model updates exchanged between peers can still leak private information about the underlying dataset. In particular, Membership Inference Attacks (MIAs) allow an adversary to determine whether a specific data sample was used in the training process, posing a significant privacy risk. Differential privacy (DP), a common defense against this leakage, injects carefully calibrated noise into model updates, but this inevitably hurts utility. Recent studies have shown that chunking, where model updates are split into chunks and only a subset of chunks is shared with neighboring nodes, can also mitigate leakage. Existing approaches include topology-aware chunking, where the number of chunks for a specific node depends on the communication topology, and topology-independent fixed-𝐾 chunking, where a fixed number of chunks 𝐾 is used for all nodes. However, it remains unclear how the underlying topology influences the effectiveness of such defenses.
This thesis investigates whether topology-aware chunking can improve the privacy-utility tradeoff compared to topology-independent chunking strategies. We study decentralized image classification on CIFAR-100 across several communication topologies, including ring, star, grid, fully connected, 𝑑-regular, and Erdős-Rényi graphs. Privacy leakage is measured by the success of the MIA, reported as Area Under the Curve (AUC), while utility is measured by global test accuracy. The results show that the effectiveness of topology-aware chunking is strongly influenced by the underlying communication graph. Without defenses, MIA AUC remains high across all graph families (around 0.97-0.99). Topology-aware chunking reduces leakage significantly in dense graphs, for example lowering AUC to 0.61 in the fully connected graph, but introduces uneven protection for sparse or heterogeneous topologies, where low-degree nodes remain vulnerable.
Compared to topology-aware chunking, topology-independent fixed-𝐾 chunking proves to be a stronger and more uniform baseline. It often achieves equal or better privacy-utility tradeoffs, especially in utility-focused settings. To address the key limitation of topology-aware chunking, we propose ChunkDP, a defense that combines topology-aware chunking with degree-scaled DP noise. ChunkDP improves over DP-only by recovering a portion of the lost accuracy while keeping leakage close to random guessing (AUC 0.53). We show that ChunkDP can outperform fixed-𝐾 chunking in balanced privacy-utility settings.
Overall, the results show that topology-awareness alone does not guarantee a better privacy-utility tradeoff. Its effectiveness depends on graph density, node degree, and the desired privacy-utility balance. Fixed-𝐾 remains a robust defense, while the topology-aware ChunkDP can be useful in balanced privacy-utility scenarios.
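As a rough illustration of the chunking idea described in the abstract, the following minimal Python sketch splits a flattened model update into chunks and gives each neighbor only one of them. It is not the thesis implementation. The function names, the contiguous split, the "one random chunk per neighbor" selection, and the rule that a topology-aware node uses as many chunks as it has neighbors are all assumptions made for illustration.

```python
import random


def split_into_chunks(update, k):
    """Split a flat list of parameters into k contiguous, near-equal chunks.

    Returns a list of (start_index, values) pairs.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    base, extra = divmod(len(update), k)
    chunks, start = [], 0
    for i in range(k):
        size = base + (1 if i < extra else 0)
        chunks.append((start, update[start:start + size]))
        start += size
    return chunks


def chunks_for_neighbors(update, neighbors, k, rng):
    """Give each neighbor one randomly chosen chunk (assumed selection rule)."""
    chunks = split_into_chunks(update, k)
    return {nb: rng.choice(chunks) for nb in neighbors}


def fixed_k(_degree, k=4):
    """Topology-independent rule: the same number of chunks for every node."""
    return k


def degree_based_k(degree):
    """Hypothetical topology-aware rule: one chunk per neighbor."""
    return max(1, degree)


rng = random.Random(0)
update = [0.1 * i for i in range(10)]  # stand-in for a flattened model update
neighbors = ["b", "c", "d"]
shared = chunks_for_neighbors(update, neighbors, degree_based_k(len(neighbors)), rng)
for nb, (start, values) in shared.items():
    print(nb, start, len(values))
```

Under the assumed degree-based rule, a node with a single neighbor uses one chunk and therefore shares its whole update. This is one possible way to picture why low-degree nodes could stay exposed, whereas a fixed 𝐾 gives every node the same split regardless of the graph.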
Uncovering the Secrets of the Maven Repository
Analysis of Library Sizes in Maven Central
This research explores the size variations of artifacts in Maven Central, a repository containing a large collection of Java artifacts. This analysis sheds light on the coding habits and dependency management ecosystems within Maven Central, emphasizing the importance of managing artifact sizes effectively. It also provides valuable insights to library maintainers and clients who want to download libraries. For example, we can determine the average amount of space required to download 100 libraries.
The analysis is done by selecting a single version for each artifact in Maven Central and extracting metadata from the corresponding files.
The results reveal that the average size of an artifact is 1447 KB, although this average is heavily influenced by a few exceptionally large artifacts. Approximately 86% of the artifacts have a size smaller than 400 KB, indicating that the majority of artifacts are relatively lightweight.
The large artifacts identified in the analysis are predominantly attributed to two categories. The first category contains extensive projects with a substantial number of files, while the second category includes machine learning or big data projects that include massive data files.
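The arithmetic behind the "100 libraries" example, and the way a few large artifacts pull the mean upward, can be sketched as follows. Only the 1447 KB mean comes from the abstract. The list of sizes is synthetic data invented for illustration, and the helper names are hypothetical.

```python
from statistics import mean, median

MEAN_ARTIFACT_KB = 1447  # average artifact size reported in the abstract


def expected_download_kb(n_libraries, mean_kb=MEAN_ARTIFACT_KB):
    """Estimate the total download size as count times mean size."""
    return n_libraries * mean_kb


def share_below(sizes_kb, threshold_kb):
    """Fraction of artifacts strictly smaller than the threshold."""
    return sum(1 for s in sizes_kb if s < threshold_kb) / len(sizes_kb)


# Synthetic sizes (not from the study): seven small artifacts, one very large.
sizes_kb = [50, 80, 120, 200, 300, 350, 390, 9000]

print(expected_download_kb(100))   # 144700 KB for 100 libraries
print(mean(sizes_kb))              # 1311.25, dominated by the single outlier
print(median(sizes_kb))            # 250.0, closer to a typical artifact
print(share_below(sizes_kb, 400))  # 0.875
```

In a skewed distribution like this synthetic one, the mean is a reasonable basis for estimating total download volume, while the median better describes what a typical artifact looks like.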