| 1 |
|
Evolutionary Optimization of Kernel Weights Improves Protein Complex Comembership Prediction
In recent years, more and more high-throughput data sources useful for protein complex prediction have become available (e.g., gene sequence, mRNA expression, and interactions). The integration of these different data sources can be challenging. Recently, it has been recognized that kernel-based classifiers are well suited for this task. However, the different kernels (data sources) are often combined using equal weights. Although several methods have been developed to optimize kernel weights, no large-scale example of an improvement in classifier performance has been shown yet. In this work, we employ an evolutionary algorithm to determine weights for a larger set of kernels by optimizing a criterion based on the area under the ROC curve. We show that setting the right kernel weights can indeed improve performance. We compare this to the existing kernel weight optimization methods (i.e., (regularized) optimization of the SVM criterion or aligning the kernel with an ideal kernel) and find that these do not result in a significant performance improvement and can even cause a decrease in performance. Results also show that an expert approach of assigning high weights to features with high individual performance is not necessarily the best strategy.
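The idea of weighting kernels and tuning the weights against an AUC-based criterion can be illustrated with a minimal sketch. This is not the authors' implementation: the scoring rule in `kernel_scores`, the simple elitist mutation loop in `evolve_weights`, and all function and parameter names are assumptions made for illustration, and the AUC is computed on the training data rather than by cross-validation.

```python
import random

def combine_kernels(kernels, weights):
    """Weighted sum of equally sized kernel matrices (lists of lists)."""
    total = sum(weights)
    norm = [w / total for w in weights]  # rescale so the weights sum to 1
    n = len(kernels[0])
    return [[sum(w * k[i][j] for w, k in zip(norm, kernels)) for j in range(n)]
            for i in range(n)]

def auc(scores, labels):
    """Area under the ROC curve: fraction of correctly ranked pos/neg pairs."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

def kernel_scores(K, labels):
    """Hypothetical stand-in classifier (not an SVM): mean similarity to the
    other positives minus mean similarity to the other negatives."""
    scores = []
    for i in range(len(K)):
        pos = [K[i][j] for j, y in enumerate(labels) if y == 1 and j != i]
        neg = [K[i][j] for j, y in enumerate(labels) if y == 0 and j != i]
        scores.append(sum(pos) / len(pos) - sum(neg) / len(neg))
    return scores

def evolve_weights(kernels, labels, generations=200, sigma=0.2, seed=0):
    """Elitist (1+1) evolution: mutate the weights, keep the child if AUC does not drop."""
    rng = random.Random(seed)
    best = [1.0] * len(kernels)  # start from the equal-weights baseline
    best_fit = auc(kernel_scores(combine_kernels(kernels, best), labels), labels)
    for _ in range(generations):
        child = [max(1e-6, w + rng.gauss(0.0, sigma)) for w in best]
        fit = auc(kernel_scores(combine_kernels(kernels, child), labels), labels)
        if fit >= best_fit:
            best, best_fit = child, fit
    return best, best_fit
```

Calling `evolve_weights([K1, K2], labels)` with two precomputed kernel matrices and 0/1 labels (at least two samples per class) returns the final weights and the AUC they reach; the starting point is the equal-weights combination that the abstract describes as the common default.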
|
|
| 2 |
|
Delineation of amplification, hybridization and location effects in microarray data yields better-quality normalization
Background: Oligonucleotide arrays have become one of the most widely used high-throughput tools in biology. Due to their sensitivity to experimental conditions, normalization is a crucial step when comparing measurements from these arrays. Normalization is, however, far from a solved problem. Frequently, we encounter datasets with significant technical effects that currently available methods are not able to correct.
Results: We show that by a careful decomposition of probe-specific amplification, hybridization and array location effects, a normalization can be performed that allows for a much improved analysis of these data. Identification of the technical sources of variation between arrays has allowed us to build statistical models that are used to estimate how the signal of individual probes is affected, based on their properties. This enables a model-based normalization that is probe-specific, in contrast with the signal intensity distribution normalization performed by many current methods. In addition, we propose a novel way of handling background correction, enabling the use of background information to weight probes during summarization. Testing of the proposed method shows a much improved detection of differentially expressed genes over earlier proposed methods, even when tested on (experimentally tightly controlled and replicated) spike-in datasets.
Conclusions: When a limited number of arrays are available, or when arrays are run in different batches, technical effects have a large influence on the measured expression of genes. We show that a detailed modelling and correction of these technical effects allows for an improved analysis in these situations.
|
|
| 3 |
|
Exploring Sequence Characteristics Related to High-Level Production of Secreted Proteins in Aspergillus niger
Protein sequence features are explored in relation to the production of over-expressed extracellular proteins by fungi. Knowledge of features influencing protein production and secretion could be employed to improve enzyme production levels in industrial bioprocesses via protein engineering. A large set of fungal genes, over 600 homologous and nearly 2,000 heterologous, was overexpressed in Aspergillus niger using a standardized expression cassette and scored for high versus no production. Subsequently, sequence-based machine learning techniques were applied to identify relevant DNA and protein sequence features. The amino-acid composition of the protein sequence was found to be most predictive, and interpretation revealed that the same features are important for both homologous and heterologous gene expression: tyrosine and asparagine composition correlated positively with high-level production, whereas methionine and lysine composition contributed to unsuccessful production. The predictor is available online at http://bioinformatics.tudelft.nl/hipsec. Subsequent work aims at validating these findings by protein engineering as a method for increasing expression levels per gene copy.
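The amino-acid composition feature named above can be computed as in the following minimal sketch. The function name, the 20-letter alphabet constant, and the choice to ignore non-standard residue symbols are assumptions for illustration; the abstract does not specify how the authors handled such cases.

```python
from collections import Counter

# The 20 standard amino acids, one-letter codes.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def amino_acid_composition(sequence):
    """Return the relative frequency of each standard amino acid in a protein sequence.

    Symbols outside the 20-letter alphabet (e.g. X or *) are ignored, which is
    an assumption of this sketch.
    """
    seq = sequence.upper()
    counts = Counter(ch for ch in seq if ch in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    return {aa: counts[aa] / total for aa in AMINO_ACIDS}
```

The resulting 20-dimensional vector (for example, the fractions under the keys `"Y"`, `"N"`, `"M"` and `"K"`) is the kind of input a sequence-based classifier could use to separate high from no production.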
|
|
| 4 |
|
Detecting recurrent gene mutation in interaction network context using multi-scale graph diffusion
Background
Delineating the molecular drivers of cancer, i.e. determining cancer genes and the pathways which they deregulate, is an important challenge in cancer research. In this study, we aim to identify pathways of frequently mutated genes by exploiting their network neighborhood encoded in the protein-protein interaction network. To this end, we introduce a multi-scale diffusion kernel and apply it to a large collection of murine retroviral insertional mutagenesis data. The diffusion strength plays the role of scale parameter, determining the size of the network neighborhood that is taken into account. As a result, in addition to detecting genes with frequent mutations in their genomic vicinity, we find genes that harbor frequent mutations in their interaction network context.
Results
We identify densely connected components of known and putatively novel cancer genes and demonstrate that they are strongly enriched for cancer-related pathways across the diffusion scales. Moreover, the mutations in the clusters exhibit a significant pattern of mutual exclusion, supporting the conjecture that such genes are functionally linked. Using the multi-scale diffusion kernel, various infrequently mutated genes are found to harbor significant numbers of mutations in their interaction network neighborhood. Many of them are well-known cancer genes.
Conclusions
The results demonstrate the importance of defining recurrent mutations while taking into account the interaction network context. Importantly, the putative cancer genes and networks detected in this study are found to be significant at different diffusion scales, confirming the necessity of a multi-scale analysis.
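The role of the diffusion strength as a scale parameter can be illustrated with a small sketch that uses the standard graph diffusion kernel exp(-βL), where L is the graph Laplacian. This is an assumption about the general technique, not a reproduction of the study's method; the function names, the truncated Taylor series, and the toy graph are all illustrative, and the series is only suitable for small graphs and moderate β.

```python
def laplacian(adj):
    """Graph Laplacian L = D - A for a symmetric adjacency matrix without self-loops."""
    n = len(adj)
    return [[(sum(adj[i]) if i == j else 0) - adj[i][j] for j in range(n)]
            for i in range(n)]

def matmul(a, b):
    """Plain square matrix product on lists of lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def diffusion_kernel(adj, beta, terms=40):
    """Approximate exp(-beta * L) with a truncated Taylor series."""
    n = len(adj)
    m = [[-beta * x for x in row] for row in laplacian(adj)]
    result = [[float(i == j) for j in range(n)] for i in range(n)]  # identity
    term = [row[:] for row in result]
    for k in range(1, terms):
        # term becomes (-beta * L)^k / k!
        term = [[x / k for x in row] for row in matmul(term, m)]
        result = [[r + t for r, t in zip(rr, tr)] for rr, tr in zip(result, term)]
    return result

def smooth_counts(adj, counts, beta):
    """Spread per-gene mutation counts over the network neighborhood."""
    K = diffusion_kernel(adj, beta)
    n = len(counts)
    return [sum(K[i][j] * counts[j] for j in range(n)) for i in range(n)]

# Toy path graph 0 - 1 - 2 with all mutations observed in gene 0.
path = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
local = smooth_counts(path, [3, 0, 0], beta=0.1)   # small scale: stays near gene 0
broad = smooth_counts(path, [3, 0, 0], beta=2.0)   # large scale: reaches gene 2
```

With a small β the smoothed signal stays close to the mutated gene, while a larger β distributes it over more distant neighbors, so genes with few mutations of their own can still score highly when their network neighborhood is frequently mutated. The total count is preserved at every scale.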
|
|
| 5 |
|
Transformation of office buildings: policy instruments to promote the conversion of office premises (Transformatie van kantoorgebouwen: sturingsmiddelen om herbestemming van kantoorpanden te bevorderen)
For several years there has been large-scale structural vacancy in office buildings. In 1996 the office market recovered and demand rose sharply. In 1997 this recovery continued, particularly in the middle and top segments. In various submarkets rents are improving and vacancy is falling. A large part of new construction, however, results not from expansion demand but from relocations. Because of higher quality requirements, office organizations move to new premises of high building quality in high-grade locations. The premises left behind prove difficult to let. They no longer meet market demand: 'good buildings drive out bad buildings'. A solution must be found for these premises, and also for office buildings in the lowest segment of the office market that have been vacant for longer. Transformation, i.e. conversion to other functions, can provide one.
FGH Bank has argued for an active conversion policy for several years. FGH Bank expects the market for transformation to grow strongly. Concrete information on this is lacking, however. To gain more insight into the potential of this market and its importance for a healthy and balanced office market, FGH took the initiative for an extensive scientific study. The Bouwmanagement & Vastgoedbeheer (Building Management & Real Estate Management) group of the Faculty of Architecture of Delft University of Technology was asked to study the market for conversion. The study comprises three parts:
1. an exploration of the effectiveness and feasibility of policy instruments to promote conversion;
2. development of a set of instruments for matching supply and demand, at stock level and building level;
3. determining the market for conversion in the four large urban regions.
This working document describes the findings of the first phase of the study.
This study could not have taken place without the support of others. Our thanks go to Mr H. Copier, MBA, director of FGH Bank, for his willingness to make this study financially possible. Through the enthusiasm and perseverance of the authors, the first part has been completed with, in our view, very interesting results. Dr. G.P.M.R. Dewulf and dr.ir. D.J.M. van der Voordt contributed to the surveys and the final version of this report. A special word of thanks goes to the respondents, who gave their opinion on possible and desired means of promoting the conversion of vacant offices.
|
|