C.I. Ugwuoke | TU Delft Repository

Privacy Threats and Cryptographic Solutions to Genome Data Processing

Doctoral thesis (2021) - C.I. Ugwuoke

The genome is the blueprint of life and holds a detailed genotype and phenotype description of any organism. This alone makes genetic data sensitive, whether in biological or electronic form. The possibility of sequencing the genome has opened doors to further probing of the data in its electronic form. Once the biological sample has been sequenced, the electronic genome is stored, processed, and transmitted for a variety of purposes including, but not limited to, medical care, research, solving crimes, and entertainment. However, due to the sensitivity of genome data, the security and privacy of the electronic data are considered imperative. Owing to the privacy and security concerns associated with sharing genome data with third-party entities for processing, various secure and privacy-preserving solutions have been considered. Example scenarios include a researcher obtaining research data that contains the genomes of individuals, or a healthcare institution outsourcing the genomes of its patients to a cloud environment for storage and processing. In all of these scenarios, it is important that the utility (accuracy and efficiency) of the data is maintained while privacy (confidentiality and unlinkability) is preserved at the same time. In this thesis, we focus on maintaining data utility when processing electronic genome data as well as preserving the privacy of the individuals whose data are analysed. We apply privacy-enhancing techniques such as secure multi-party computation and homomorphic encryption to existing problems and develop provably secure cryptographic protocols that are fit for purpose in each scenario. ...

PREDICT

Efficient Private Disease Susceptibility Testing in Direct-to-Consumer Model

Conference paper (2020) - Chibuike Ugwuoke, Zekeriya Erkin, Marcel Reinders, Reginald Lagendijk

Genome sequencing has advanced rapidly in the last decade, making it easier for anyone to obtain a digital genome at low cost from companies such as Helix, MyHeritage, and 23andMe. Companies now offer their services in a direct-to-consumer (DTC) model without the intervention of a medical institution, thereby providing people with direct services for paternity testing, ancestry testing, and disease susceptibility testing (DST) to infer predisposition to diseases. Genome analyses are partly motivated by curiosity, and people often want to partake without fear of privacy invasion. Existing privacy protection solutions for DST adopt cryptographic techniques to protect the genome of a patient from the party responsible for computing the analysis. These techniques include homomorphic encryption, which can be computationally expensive and could take minutes for only a few single-nucleotide polymorphisms (SNPs). A predominant approach computes DST over encrypted data, but the design depends on a medical unit and exposes the test results of patients to that unit, making the design uncomfortable for privacy-aware individuals. Hence, it is pertinent to have an efficient privacy-preserving DST solution with a DTC service. We propose a novel DTC model that protects the privacy of SNPs and prevents leakage of test results to any party other than the genome owner. Conversely, we protect the privacy of the algorithms or trade secrets used by the genome-analyzing companies. Our work utilizes a secure obfuscation technique in computing DST, eliminating expensive computations over encrypted data. Our approach significantly outperforms existing state-of-the-art solutions in runtime and scales linearly for equivalent levels of security. As an example, computing DST for 10,000 SNPs requires approximately 96 milliseconds on commodity hardware. With this efficient and privacy-preserving solution, which is also simulation-based secure, we open possibilities for performing genome analyses on collectively shared data resources.

ECONoMy

Ensemble collaborative learning using masking

Conference paper (2019) - Lars Van De Kamp, Chibuike Ugwuoke, Zekeriya Erkin

In a society where digital data has become ubiquitous and is projected to remain so for the foreseeable future, machine learning has become a dependable tool for analyzing these big datasets. However, where the data or the machine learning algorithms are considered privacy-sensitive, one faces the challenge of preserving the utility of machine learning in a privacy-preserving setting. In this paper, we focus on a use case where decentralized parties have privately owned machine learning algorithms and want to jointly generate a public model without violating the privacy of their individual models and data. We present ECONoMy: a privacy-preserving protocol that supports collaborative learning using an ensemble technique. Set in an honest-but-curious security model, ECONoMy is lightweight and provides efficiency and privacy in settings with a large number of participants, such as IoT devices. ...
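
As a generic illustration of aggregation by masking, the sketch below uses pairwise additive masks: every pair of parties shares a random value that one adds and the other subtracts, so the masks cancel in the total. This is an assumed textbook construction, not the ECONoMy protocol itself; the names `pairwise_masks`, `masked_sum`, and `MODULUS` are hypothetical.

```python
import secrets

MODULUS = 2**32  # hypothetical modulus for the masked arithmetic


def pairwise_masks(num_parties, modulus=MODULUS):
    """Return one net mask per party; all masks together sum to 0 mod modulus."""
    masks = [0] * num_parties
    for i in range(num_parties):
        for j in range(i + 1, num_parties):
            r = secrets.randbelow(modulus)  # random value shared by parties i and j
            masks[i] = (masks[i] + r) % modulus  # party i adds it
            masks[j] = (masks[j] - r) % modulus  # party j subtracts it
    return masks


def masked_sum(private_values, modulus=MODULUS):
    """Sum the values while the aggregator only ever sees masked inputs."""
    masks = pairwise_masks(len(private_values), modulus)
    masked = [(v + m) % modulus for v, m in zip(private_values, masks)]
    return sum(masked) % modulus


# Example: each party contributes its vote count for one class of an ensemble.
votes = [3, 0, 5, 2]
total = masked_sum(votes)
```

In this sketch the masks are generated in one place for brevity; in a real deployment each pair of parties would have to agree on its shared value privately.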

Secure Fixed-point Division for Homomorphically Encrypted Operands

Conference paper (2018) - Chibuike Ugwuoke, Zekeriya Erkin, Inald Lagendijk

Due to privacy threats associated with computing on outsourced data, processing data in the encrypted domain has become a viable alternative. Secure computation on encrypted data is relevant for analysing datasets in areas (such as genome processing, private data aggregation, and cloud computations) that require basic arithmetic operations. Division in which all inputs are encrypted has not been achieved non-interactively using homomorphic schemes. In interactive protocols, obtaining an encrypted quotient from encrypted values is computationally expensive. To the best of our knowledge, existing homomorphic solutions for encrypted division are often relaxed to consider a public or private divisor. We acknowledge that other techniques, such as secret sharing and garbled circuits, have been adopted to compute secure division, but we are interested in homomorphic solutions. We propose an efficient and interactive two-party protocol that computes the fixed-point quotient of two encrypted inputs, using an efficient and secure comparison protocol as a sub-protocol. Our proposal provides a computational advantage, with complexity linear in the digit precision of the quotient. We provide a proof of security in the universally composable framework, together with complexity analyses. We present experimental results for two cryptosystem implementations in order to compare performance. An efficient prototype of our protocol is implemented using an additively homomorphic scheme (Paillier), whereas a less efficient version based on a fully homomorphic scheme (BGV) is also presented as a proof of concept and for analysis of our proposal. ...
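
To make the idea of a comparison-driven fixed-point quotient concrete, here is a plaintext sketch of binary restoring division, which uses exactly one comparison per quotient bit. It is an analogue under stated assumptions, not the paper's protocol: the inputs are unencrypted, and in the encrypted setting each comparison would presumably be replaced by a secure comparison sub-protocol. The function name and the `int_bits`/`frac_bits` parameters are hypothetical.

```python
def fixed_point_divide(numerator, divisor, int_bits=16, frac_bits=8):
    """Return (floor(numerator * 2**frac_bits / divisor), comparisons used)."""
    if numerator < 0 or divisor <= 0:
        raise ValueError("expects numerator >= 0 and divisor > 0")
    total_bits = int_bits + frac_bits
    remainder = numerator << frac_bits  # scale so the quotient carries frac_bits
    if remainder >= divisor << total_bits:
        raise ValueError("quotient does not fit in int_bits integer bits")
    quotient = 0
    comparisons = 0
    # One comparison per output bit, from the most significant bit downwards.
    for k in range(total_bits - 1, -1, -1):
        comparisons += 1
        if remainder >= divisor << k:
            remainder -= divisor << k
            quotient |= 1 << k
    return quotient, comparisons


quotient, comparisons = fixed_point_divide(7, 2, int_bits=8, frac_bits=4)
```

With these arguments the scaled quotient is 56, that is 56 / 2**4 = 3.5, and the loop performs 12 comparisons, one per bit of the result.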

Privacy-safe linkage analysis with homomorphic encryption

Conference paper (2017) - Chibuike Ugwuoke, Zekeriya Erkin, Reginald L. Lagendijk

Genetic data are an important resource in genetic epidemiology, where they are used to investigate biologically coded information within the human genome. Considerable research effort has been invested in recent years to fully sequence and understand the genome. Personalised medicine, insight into patient response to treatments, and knowledge of the relationships between specific genes and characteristics such as phenotypes and diseases are a few of the positive impacts of studying the genome. The sensitivity, longevity, and non-modifiable nature of genetic data make it even more interesting; consequently, the security and privacy of the storage and processing of genomic data demand attention. A common activity carried out by geneticists is association analysis between two alleles, or between a genetic locus and a disease. We demonstrate how cryptographic techniques such as homomorphic encryption schemes and multiparty computation can be used to carry out such analysis in a privacy-friendly manner. We compute a 3 × 3 contingency table and then genome analysis algorithms such as linkage disequilibrium (LD) measures, all in the encrypted domain. Our computation guarantees privacy of the genome data under our security settings and provides up to 98.4% improvement compared to an existing solution. ...
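
For readers unfamiliar with the plaintext computation being protected, the sketch below builds a 3 × 3 genotype contingency table for two SNPs and derives one LD-style statistic from it. It makes several assumptions not stated in the abstract: genotypes are encoded as minor-allele counts 0, 1, 2, and the statistic is the squared correlation of those counts, a common estimate for unphased data that may differ from the LD measures used in the paper. Nothing here is encrypted, and the function names are hypothetical.

```python
def contingency_table(snp_a, snp_b):
    """Count individuals by genotype pair; genotypes are coded 0, 1 or 2."""
    table = [[0] * 3 for _ in range(3)]
    for a, b in zip(snp_a, snp_b):
        table[a][b] += 1
    return table


def genotype_r_squared(table):
    """Squared correlation of the two genotype codes, computed from the table."""
    n = sum(sum(row) for row in table)
    cells = [(a, b, table[a][b]) for a in range(3) for b in range(3)]
    mean_a = sum(a * c for a, _, c in cells) / n
    mean_b = sum(b * c for _, b, c in cells) / n
    var_a = sum(a * a * c for a, _, c in cells) / n - mean_a ** 2
    var_b = sum(b * b * c for _, b, c in cells) / n - mean_b ** 2
    if var_a == 0 or var_b == 0:
        raise ValueError("a monomorphic SNP has no defined correlation")
    cov = sum(a * b * c for a, b, c in cells) / n - mean_a * mean_b
    return cov ** 2 / (var_a * var_b)


# Two SNPs that always carry the same genotype in every individual.
snp_a = [0, 1, 2, 1, 0, 2]
snp_b = [0, 1, 2, 1, 0, 2]
table = contingency_table(snp_a, snp_b)
r_squared = genotype_r_squared(table)
```

Because the statistic depends on the data only through the nine cell counts, computing the table privately is the step that matters most for protecting individual genotypes.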

A Privacy-Preserving GWAS Computation with Homomorphic Encryption

Conference paper (2016) - Chibuike Ugwuoke, Zekeriya Erkin, Inald Lagendijk

The continuous decline in the cost of DNA sequencing has produced both positive and negative feelings in the academic and research community. It has now become possible to harvest large amounts of genetic data, the study of which researchers believe will help improve preventive and personalised healthcare and lead to a better understanding of diseases and of responses to treatments. However, there is more information embedded in genes than is currently understood, and genomic data contain information not just about the owner but also about relatives who might not consent to sharing it. Unrestricted access to genomic data can be privacy-invasive, hence the urgent need to regulate access to such data and to develop protocols that allow privacy-preserving techniques in both the computations and the analyses that involve these very sensitive data. In this work, we discuss how a careful combination of cryptographic primitives, such as homomorphic encryption, can be used to privately implement common algorithms specific to genome-wide association studies (GWAS). This comes at a cost: we have to accommodate the trade-off between speed of computation and privacy. ...
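
The additive property that makes homomorphic encryption useful for count-based statistics can be sketched with a toy Paillier cryptosystem, shown below summing encrypted minor-allele counts. This is a textbook construction with the generator fixed to g = n + 1, not the implementation from the paper. The primes are far too small for real security, the function names are hypothetical, and the code needs Python 3.8 or later for the modular inverse via `pow`.

```python
import math
import secrets


def keygen(p=65537, q=2147483647):
    """Build a toy key pair from two known primes (insecure key size)."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p - 1, q - 1)
    mu = pow(lam, -1, n)  # valid because the generator is g = n + 1
    return n, (lam, mu)


def encrypt(n, message):
    """Encrypt an integer 0 <= message < n under public key n."""
    n_sq = n * n
    while True:
        r = secrets.randbelow(n - 1) + 1  # random value in [1, n - 1]
        if math.gcd(r, n) == 1:
            break
    # (n + 1)**message equals 1 + n * message modulo n**2.
    return ((1 + n * message) % n_sq) * pow(r, n, n_sq) % n_sq


def add(n, c1, c2):
    """Multiplying ciphertexts adds the underlying plaintexts."""
    return c1 * c2 % (n * n)


def decrypt(n, private_key, ciphertext):
    lam, mu = private_key
    u = pow(ciphertext, lam, n * n)
    return (u - 1) // n * mu % n


n, private_key = keygen()
allele_counts = [0, 1, 2, 1, 0]  # one minor-allele count per individual
encrypted_total = encrypt(n, 0)
for count in allele_counts:
    encrypted_total = add(n, encrypted_total, encrypt(n, count))
total = decrypt(n, private_key, encrypted_total)
```

The party performing the additions never sees an individual count; only the holder of the private key can recover the total, which here is 4.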