1 

Improving the analysis of designed studies by combining statistical modelling with study design information
article 
2009

Author: 
Thissen, U.
·
Wopereis, S.
·
Berg, S.A.A. van den
·
Bobeldijk, I.
·
Kleemann, R.
·
Kooistra, T.
·
Dijk, K.W. van
·
Ommen, B. van
·
Smilde, A.K.

Keywords: 
Biology · Biomedical Research · Animals · Databases, Factual · Humans · Metabolomics · Models, Statistical

Background: In the life sciences, so-called designed studies are used for studying complex biological systems. The data derived from these studies comply with a study design aimed at generating relevant information while diminishing unwanted variation (noise). Knowledge about the study design can be used to decompose the total data into data blocks that are associated with specific effects. Subsequent statistical analysis can be improved by this decomposition if it is applied to selected combinations of effects. Results: The benefit of this approach was demonstrated with an analysis that combines multivariate PLS (Partial Least Squares) regression with data decomposition from ANOVA (Analysis of Variance): ANOVA-PLS. As a case, a nutritional intervention study on Apolipoprotein E3-Leiden (APOE3Leiden) transgenic mice is used to study the relation between liver lipidomics and a plasma inflammation marker, Serum Amyloid A. The performance of ANOVA-PLS was compared to PLS regression on the non-decomposed data with respect to the quality of the modelled relation, model reliability, and interpretability. Conclusion: It was shown that ANOVA-PLS leads to a statistical model that is more reliable and more interpretable than standard PLS analysis. A subsequent biological interpretation yielded more relevant metabolites from the model. The concept of combining data decomposition with a subsequent statistical analysis, as in ANOVA-PLS, is however not limited to PLS regression in metabolomics but can be applied to many statistical methods and many different types of data. © 2009 Thissen et al; licensee BioMed Central Ltd.
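The decomposition step described in this abstract can be illustrated numerically. Below is a hedged toy sketch with invented data and a single two-level treatment factor: each measurement is split into a grand mean, a treatment-effect block, and a residual block, as an ANOVA decomposition would do. The subsequent PLS regression on selected blocks from the paper is not reproduced here.

```python
import random
import statistics

random.seed(0)
treated = [random.gauss(1.0, 0.5) for _ in range(10)]   # invented effect size
control = [random.gauss(0.0, 0.5) for _ in range(10)]
data = [(1, v) for v in treated] + [(0, v) for v in control]

grand = statistics.fmean(v for _, v in data)
group_mean = {g: statistics.fmean(v for gg, v in data if gg == g) for g in (0, 1)}

# total data = grand mean + treatment-effect block + residual block
effect = [group_mean[g] - grand for g, _ in data]
resid = [v - group_mean[g] for g, v in data]
assert all(abs(grand + e + r - v) < 1e-12 for (g, v), e, r in zip(data, effect, resid))

# the effect block carries only the design factor; within-group noise
# has moved entirely into the residual block
print(round(group_mean[1] - group_mean[0], 2))   # estimated treatment contrast
```

A statistical model fitted to the effect block alone is then no longer diluted by the residual (noise) variation, which is the point the abstract makes.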

[PDF]
[Abstract]

2 

Residential exposure should be considered in appropriate terms: summary of discussions
Chemicals/CAS: Pesticides

[Abstract]

3 

Toxicological evaluation of chemical mixtures
This paper addresses major developments in the safety evaluation of chemical mixtures during the past 15 years, reviews today's state of the art of mixture toxicology, and discusses challenges ahead. Well-thought-out, tailor-made mechanistic and empirical designs for studying the toxicity of mixtures have gradually replaced trial-and-error approaches, improving insight into the testability of joint action and interaction of constituents of mixtures. The acquired knowledge has successfully been used to evaluate the safety of combined exposures and complex mixtures such as, for example, the atmosphere at hazardous waste sites, drinking water disinfection by-products, natural flavouring complexes, and the combined intake of food additives. To consolidate the scientific foundation of mixture toxicology, studies are in progress to revisit the biological concepts and mathematics underlying formulas for low-dose extrapolation and risk assessment of chemical mixtures. Conspicuous developments include the production of new computer programs applicable to mixture research (CombiTool, BioMol, Reaction Network Modelling), the application of functional genomics and proteomics to mixture studies, the use of nano-optochemical sensors for in vivo imaging of physiological processes in cells, and the application of optical sensor micro- and nanoarrays for complex sample analysis. Clearly, the input of theoretical biologists, biomathematicians and bioengineers in mixture toxicology is essential for the development of this challenging branch of toxicology into a scientific subdiscipline of full value. © 2002 Elsevier Science Ltd. All rights reserved. Chemicals/CAS: Air Pollutants; Xenobiotics

[Abstract]

4 

Assessing reasonable worst-case full-shift exposure levels from data of variable quality
Exposure assessors involved in regulatory risk assessments often need to estimate a reasonable worst-case full-shift exposure level from very limited exposure information. Full-shift exposure data of very high quality are rare. A full-shift value can also be calculated from (short-term) task-based values, derived either from measured data or from models. The simplest option is to use the task-based exposure level as the full-shift value. A second option is to calculate a time-weighted average (TWA), using (reasonable worst-case) estimates of the duration and the exposure level of the relevant tasks. The third option is to use a Monte Carlo analysis with estimated input distributions for exposure level and duration of exposure. If an estimated distribution of respiratory volume is also included, this leads to a distribution of inhaled amounts. The 90th percentile of such a distribution is generally substantially lower than the fixed point estimate calculated using high-end values for each parameter. This technique can thus prevent unnecessarily conservative estimates in risk assessment. The output distribution can also be used as valuable input to the risk management process, because it provides information on the probabilities of exposure levels, which can influence the cost-benefit analysis of the risk management process. Finally, the sensitivity analysis of the Monte Carlo simulation can give guidance for further studies to increase the accuracy of the exposure assessment.
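The Monte Carlo option described above can be sketched in a few lines. All distribution parameters below are invented for demonstration only: task duration and task-level exposure are drawn from lognormal distributions, a distribution of 8-hour TWA values is built, and its 90th percentile is compared with a fixed point estimate that compounds high-end (95th percentile) values of each input.

```python
import math
import random

random.seed(42)
N = 20_000
SHIFT = 480.0                      # full shift, minutes
MU_D, SD_D = 4.0, 0.5              # lognormal parameters, task duration (min)
MU_L, SD_L = 1.0, 0.8              # lognormal parameters, task level (mg/m3)

# Monte Carlo distribution of full-shift TWA exposure
twa = []
for _ in range(N):
    duration = min(random.lognormvariate(MU_D, SD_D), SHIFT)
    level = random.lognormvariate(MU_L, SD_L)
    twa.append(level * duration / SHIFT)
twa.sort()
p90 = twa[int(0.9 * N)]

# Fixed point estimate: high-end (95th percentile) value for each input
d_hi = min(math.exp(MU_D + 1.645 * SD_D), SHIFT)
l_hi = math.exp(MU_L + 1.645 * SD_L)
point = l_hi * d_hi / SHIFT

# the compounded worst case exceeds the simulated 90th percentile
print(p90 < point)
```

The gap between `p90` and `point` is the "unnecessary conservatism" the abstract refers to: the probability that both inputs are simultaneously at their 95th percentiles is far smaller than 10%.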

[Abstract]

5 

Probabilistic exposure assessment is essential for assessing risks: summary of discussions
Chemicals/CAS: Pesticides

[Abstract]

6 

Risk assessment of worker and residential exposure to pesticides: conclusions and recommendations
Chemicals/CAS: Pesticides

[Abstract]

7 

The optimum decision rules for the oddity task
This paper presents the optimum decision rule for an m-interval oddity task in which m-1 intervals contain the same signal and one is different or odd. The optimum decision rule depends on the degree of correlation among observations. The present approach unifies the different strategies that occur with "roved" or "fixed" experiments (Macmillan & Creelman, 1991, p. 147). It is shown that the commonly used decision rule for an m-interval oddity task corresponds to the special case of highly correlated observations. However, as is also true for the same-different paradigm, there exists a different optimum decision rule when the observations are independent. The relation between the probability of a correct response and d′ is derived for the three-interval oddity task. Tables of this relation are presented for the three-, four-, and five-interval oddity tasks. Finally, an experimental method is proposed that allows one to determine the decision rule used by the observer in an oddity experiment.
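The commonly used rule the abstract refers to can be simulated rather than derived. Below is a hedged Monte Carlo sketch (not the paper's analytic derivation) of the three-interval oddity task under the familiar rule "pick the observation farthest from the mean of the other two", with independent unit-variance observations separated by d′:

```python
import random
import statistics

def pc_oddity(d_prime, n=20_000, seed=3):
    """Proportion correct for the 3-interval oddity task under the
    'farthest from the mean of the others' decision rule."""
    rng = random.Random(seed)
    correct = 0
    for _ in range(n):
        odd = rng.randrange(3)                  # position of the odd interval
        obs = [rng.gauss(d_prime if i == odd else 0.0, 1.0) for i in range(3)]
        m = statistics.fmean(obs)
        # farthest from the mean of the other two is equivalent to
        # farthest from the overall mean (the criteria differ by a factor 3/2)
        pick = max(range(3), key=lambda i: abs(obs[i] - m))
        correct += (pick == odd)
    return correct / n

print(round(pc_oddity(0.0), 2))                 # near chance, about 1/3
print(pc_oddity(2.0) > pc_oddity(1.0) > 1 / 3)
```

Such a simulation reproduces the qualitative shape of the tabulated proportion-correct versus d′ relation, but the paper's tables come from the exact optimum-rule derivation, not from this rule.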

[Abstract]

8 

Multiple imputation of discrete and continuous data by fully conditional specification
The goal of multiple imputation is to provide valid inferences for statistical estimates from incomplete data. To achieve that goal, imputed values should preserve the structure in the data, as well as the uncertainty about this structure, and include any knowledge about the process that generated the missing data. Two approaches for imputing multivariate data exist: joint modeling (JM) and fully conditional specification (FCS). JM is based on parametric statistical theory, and leads to imputation procedures whose statistical properties are known. JM is theoretically sound, but the joint model may lack flexibility needed to represent typical data features, potentially leading to bias. FCS is a semiparametric and flexible alternative that specifies the multivariate model by a series of conditional models, one for each incomplete variable. FCS provides tremendous flexibility and is easy to apply, but its statistical properties are difficult to establish. Simulation work shows that FCS behaves very well in the cases studied. The present paper reviews and compares the approaches. JM and FCS were applied to pubertal development data of 3801 Dutch girls that had missing data on menarche (two categories), breast development (five categories) and pubic hair development (six stages). Imputations for these data were created under two models: a multivariate normal model with rounding and a conditionally specified discrete model. The JM approach introduced biases in the reference curves, whereas FCS did not. The paper concludes that FCS is a useful and easily applied flexible alternative to JM when no convenient and realistic joint distribution can be specified. © 2007 SAGE Publications.

[Abstract]

9 

Revision of the ICIDH Severity of Disabilities Scale by data linking and item response theory
The Severity of Disabilities Scale (SDS) of the ICIDH reflects the degree to which an individual's ability to perform a certain activity is restricted. This paper describes the application of two models from item response theory (IRT), the graded response model and the partial credit model, in order to derive a tentative proposal for a revised SDS. The key ingredient of the approach is to scale existing disability items obtained in different studies on a common scale by exploiting the overlap. Both IRT models are fitted to a linked data set containing items for measuring walking disability. Based on these solutions, a tentative SDS is constructed. The paper concludes with a discussion of the implications, limitations and advantages of the approach. Copyright © 2001 John Wiley & Sons, Ltd.

[Abstract]

10 

Development of good modelling practice for physiologically based pharmacokinetic models for use in risk assessment: The first steps
article 
2008

Author: 
Loizou, G.
·
Spendiff, M.
·
Barton, H.A.
·
Bessems, J.
·
Bois, F.Y.
·
d'Yvoire, M.B.
·
Buist, H.
·
Clewell III, H.J.
·
Meek, B.
·
GundertRemy, U.
·
Goerlitz, G.
·
Schmitt, W.

Keywords: 
Health · Good modelling practice · PBPK · Risk assessment · article · Canada · documentation · dosimetry · Europe · Greece · priority journal · risk assessment · scientific literature · United States · Animals · Humans · Legislation, Drug · Models, Statistical · Pharmacokinetics · Quantitative StructureActivity Relationship · Risk Assessment

The increasing use of tissue dosimetry estimated using pharmacokinetic models in chemical risk assessments in various jurisdictions necessitates the development of internationally recognized good modelling practice (GMP). Such practices would facilitate the sharing of models and model evaluations and their consistent application in risk assessments. Clear descriptions are needed of good practices for (1) model development, i.e., research and analysis activities; (2) model characterization, i.e., methods to describe how consistent the model is with biology and the strengths and limitations of available models and data, such as sensitivity analyses; (3) model documentation; and (4) model evaluation, i.e., independent review that will assist risk assessors in their decisions on whether and how to use the models, and also help model developers understand expectations for various purposes, e.g., research versus application in risk assessment. Next steps in the development of guidance for GMP, and research to improve the scientific basis of the models, are described based on a review of the current status of the application of physiologically based pharmacokinetic (PBPK) models in risk assessments in Europe, Canada, and the United States, presented at the International Workshop on the Development of GMP for PBPK Models held in Greece on April 27-29, 2007. Crown Copyright © 2008.

[Abstract]

11 

Health expectancy and the problem of substitute morbidity
During the past century, the developed world has witnessed not only a dramatic increase in life expectancy (ageing) but also a concomitant rise in chronic disease and disability. Consequently, the tension between 'living longer' on the one hand and health-related 'quality of life' on the other has become an increasingly important health policy problem. The paper deals with two consequences of this so-called epidemiological transition in population health. The first concerns the question of how, given these impressive changes, population health can be measured in an adequate and policy-relevant present-day fashion. The second is the so-called phenomenon of 'substitute morbidity and mortality': more and more acute fatal diseases are replaced by non-fatal delayed degenerative diseases like dementia and arthritis. How the phenomenon of substitute morbidity and mortality affects the development of population health is illustrated with the epidemiological transitions, worldwide shifts in the main causes of death, assumptions used in models, adverse consequences of medical technologies, and some results from intervention trials. Substitute morbidity and mortality may thwart our disease-specific expectations of interventions and calls for a shift to a 'total population health' perspective when judging the potential health gains of interventions. A better understanding of the dynamics that underlie the changes in population health is necessary. The implications for data collection are more emphasis on morbidity data and their relation with mortality, more longitudinal studies, stricter requirements for intervention trials, and more use of modelling as a tool. A final recommendation is the promotion of integrative measures of population health. For the latter, several results are presented suggesting that, although the amount of morbidity and disability grows with increasing life expectancy, this consists mainly of mild ill health.
This finding supports the 'dynamic equilibrium' theory. In absolute numbers, however, the burden of disease will continue to increase with further ageing of the population.

[Abstract]

12 

Unidimensionality and reliability under Mokken scaling of the Dutch-language version of the SF-36
The subscales of the SF-36 in the Dutch National Study are investigated with respect to unidimensionality and reliability. It is argued that these properties deserve separate treatment. For unidimensionality we use a nonparametric model from item response theory, called the Mokken scaling model, and compute the corresponding scalability coefficients. We estimate reliability under the Mokken model, assuming that the items are doubly homogeneous, and compare it to Cronbach's α. The scalability of the subscale general health perceptions is medium (H = 0.46), and for the other subscales it is strong (H ≥ 0.6). The reliability in terms of α indicates that all subscales can be used in basic research (α > 0.70), but that only physical functioning can be used for clinical applications of quality of life (α > 0.90). The relative merits of our approach are discussed.
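For reference, the Cronbach's α used above as the reliability benchmark is straightforward to compute. Below is a hedged toy example with an invented 4-item, 6-respondent score matrix (not data from the study):

```python
import statistics

# each row is a respondent, each column an item score (invented data)
scores = [
    [3, 4, 3, 4],
    [2, 2, 3, 2],
    [5, 4, 4, 5],
    [1, 2, 1, 2],
    [4, 5, 4, 4],
    [2, 3, 2, 2],
]
k = len(scores[0])                                       # number of items
item_vars = [statistics.pvariance(col) for col in zip(*scores)]
total_var = statistics.pvariance([sum(row) for row in scores])

# Cronbach's alpha: k/(k-1) * (1 - sum of item variances / total variance)
alpha = k / (k - 1) * (1 - sum(item_vars) / total_var)
print(round(alpha, 2))
```

Against the abstract's thresholds, this invented scale would clear both the research (α > 0.70) and clinical (α > 0.90) cut-offs.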

[Abstract]

13 

Cross-validation and refinement of the Stoffenmanager as a first-tier exposure assessment tool for REACH
Objectives: For regulatory risk assessment under REACH, a tiered approach is proposed in which the first-tier models should provide a conservative exposure estimate that can discriminate between scenarios which are of concern and those which are not. The Stoffenmanager is mentioned as a first-tier approach in the REACH guidance. In an attempt to investigate the validity of the Stoffenmanager algorithms, a cross-validation study was performed. Methods: Exposure estimates using the Stoffenmanager algorithms were compared with exposure measurement results (n=254). Correlations between observed and predicted exposures, bias, and precision were calculated. Stratified analyses were performed for the scenarios "handling of powders and granules" (n=82), "handling solids resulting in comminuting" (n=60), "handling of low-volatile liquids" (n=40) and "handling of volatile liquids" (n=72). Results: The relative bias of the four algorithms ranged between 9% and 77% with a precision of approximately 1.7. The 90th percentile estimate of one of the four algorithms was not conservative enough. Based on these statistics and analyses of residual plots, the underlying algorithm was adapted. Subsequently, the calibration and cross-validation datasets were merged into one dataset (n=952) used to calibrate the adapted Stoffenmanager algorithms. This new calibration resulted in new exposure algorithms for the four scenarios. Conclusions: The Stoffenmanager is capable of discriminating between exposure levels, mainly between scenarios in different companies. The 90th percentile estimates of the Stoffenmanager were verified to be sufficiently conservative. Therefore, the Stoffenmanager could be a useful tier 1 exposure assessment tool for REACH.
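The bias and precision statistics reported above can be computed in a few lines. One common convention for lognormal exposure data (an assumption here, not necessarily the exact definitions used in the study) works on the log of the predicted/observed ratio; the paired exposure values below are invented:

```python
import math
import statistics

# invented paired exposure values, same units (e.g. mg/m3)
observed = [0.8, 1.5, 3.2, 0.4, 2.1, 5.0, 1.1, 0.6]
predicted = [1.0, 1.2, 4.0, 0.5, 1.8, 6.5, 1.6, 0.9]

log_ratios = [math.log(p / o) for p, o in zip(predicted, observed)]

# relative bias: mean over/underestimation on the original scale
rel_bias = math.exp(statistics.fmean(log_ratios)) - 1
# precision: geometric standard deviation of the prediction/observation ratio
precision = math.exp(statistics.stdev(log_ratios))

print(round(100 * rel_bias), round(precision, 2))
```

On this convention, a precision of 1.7 (as in the abstract) means predictions scatter around observations by roughly a factor of 1.7 in either direction.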

[Abstract]

14 

Discovering gene expression patterns in time course microarray experiments by ANOVA-SCA
article 
2007

Author: 
Nueda, M.J.
·
Conesa, A.
·
Westerhuis, J.A.
·
Hoefsloot, H.C.J.
·
Smilde, A.K.
·
Talón, M.
·
Ferrer, A.

Keywords: 
Biology · Analytical research · analysis of variance · article · bioinformatics · correlation analysis · data base · gene expression profiling · genetic selection · genetic transcription · mathematical analysis · microarray analysis · nonhuman · priority journal · simultaneous component analysis · statistical analysis · time series analysis · Algorithms · Analysis of Variance · Computational Biology · Computer Simulation · Data Interpretation, Statistical · Gene Expression Profiling · Models, Genetic · Models, Statistical · Oligonucleotide Array Sequence Analysis · Principal Component Analysis · Time Factors · Transcription, Genetic

Motivation: Designed microarray experiments are used to investigate the effects that controlled experimental factors have on gene expression and to learn about the transcriptional responses associated with external variables. In these datasets, signals of interest coexist with varying sources of unwanted noise in a framework of (co)relation among the measured variables and with the different levels of the studied factors. Discovering experimentally relevant transcriptional changes requires methodologies that take all these elements into account. Results: In this work, we develop the application of analysis of variance-simultaneous component analysis (ANOVA-SCA; Smilde et al., Bioinformatics, 2005) to the analysis of multiple-series time course microarray data as an example of multifactorial gene expression profiling experiments. We denoted this implementation as ASCA-genes. We show how the combination of ANOVA modeling and a dimension reduction technique is effective in extracting targeted signals from the data, bypassing structural noise. The methodology is valuable for identifying main and secondary responses associated with the experimental factors and for spotting relevant experimental conditions. We additionally propose a novel approach for gene selection in the context of the relation of individual transcriptional patterns to global gene expression signals. We demonstrate the methodology on both real and synthetic datasets. © 2007 The Author(s).

[Abstract]

15 

Risk assessment and food allergy: the probabilistic model applied to allergens
In order to assess the risk of unintended exposure to food allergens, traditional deterministic risk assessment is usually applied, leading to inconclusive statements such as 'an allergic reaction cannot be excluded'. TNO therefore developed a quantitative risk assessment model for allergens based on probabilistic techniques, resulting in a more exhaustive risk assessment and more detailed information. By now, this approach is recognized as the future approach in allergen risk assessment. A case study (hazelnut proteins in chocolate spread) is presented as a proof of concept. © 2006 Elsevier Ltd. All rights reserved.

[Abstract]

16 

Simplivariate models: Ideas and first examples
article 
2008

Author: 
Hageman, J.A.
·
Hendriks, M.M.W.B.
·
Westerhuis, J.A.
·
Werf, M.J. van der
·
Berger, R.
·
Smilde, A.K.

Keywords: 
Phenylalanine · Algorithm · Article · Biological model · Biology · Computer program · Computer simulation · Escherichia coli · Genetics · Genomics · Metabolism · Metabolomics · Statistical model · Systems biology · Theoretical model · Algorithms · Computational Biology · Computer Simulation · Escherichia coli · Genomics · Metabolomics · Models, Biological · Models, Genetic · Models, Statistical · Models, Theoretical · Phenylalanine · Software · Systems Biology

One of the new expanding areas in functional genomics is metabolomics: measuring the metabolome of an organism. Data being generated in metabolomics studies are very diverse in nature, depending on the design underlying the experiment. Traditionally, variation in measurements is conceptually broken down into systematic variation and noise, where the latter contains, e.g., technical variation. There is increasing evidence that this distinction does not hold (or is too simple) for metabolomics data. A more useful distinction is in terms of informative and non-informative variation, where informative relates to the problem being studied. In most common methods for analyzing metabolomics (or any other high-dimensional X-omics) data this distinction is ignored, thereby severely hampering the results of the analysis. This leads to poorly interpretable models and may even obscure the relevant biological information. We developed a framework from first data analysis principles by explicitly formulating the problem of analyzing metabolomics data in terms of informative and non-informative parts. This framework allows for flexible interactions with the biologists involved in formulating prior knowledge of underlying structures. The basic idea is that the informative parts of the complex metabolomics data are approximated by simple components with a biological meaning, e.g. in terms of metabolic pathways or their regulation. Hence, we termed the framework 'simplivariate models', which constitutes a new way of looking at metabolomics data. The framework is given in its full generality and exemplified with two methods, IDR analysis and plaid modeling, that fit into the framework. Using this strategy of 'divide and conquer', we show that meaningful simplivariate models can be obtained from a real-life microbial metabolomics data set. For instance, one of the simple components contained all the measured intermediates of the Krebs cycle of E. coli.
Moreover, these simplivariate models were able to uncover regulatory mechanisms present in the phenylalanine biosynthesis route of E. coli. © 2008 Hageman et al.

[PDF]
[Abstract]

17 

Probabilistic risk assessment model for allergens in food: sensitivity analysis of the minimum eliciting dose and food consumption
Previously, TNO developed a probabilistic model to predict the likelihood of an allergic reaction, resulting in a quantitative assessment of the risk associated with unintended exposure to food allergens. The likelihood is estimated by including in the model the proportion of the population who is allergic, the proportion consuming the food and the amount consumed, the likelihood of the food containing an adventitious allergen and its concentration, and the minimum eliciting dose (MED) distribution for the allergen. In the present work a sensitivity analysis was performed to identify which parts of the model most influence the output. A shift in the distribution of the MED reflecting a more potent allergen, and an increase in the proportion of the population consuming a food, increased the number of estimated allergic reactions considerably. In contrast, the number of estimated allergic reactions hardly changed when the MEDs were based on a more severe response, or when the amount of food consumed was increased. Development of this work will help to generate a more accurate picture of the potential public health impact of allergens. It highlights areas where research is best focused, specifically the determination of minimum eliciting doses and understanding of the food choices of allergic individuals. © 2008 ILSI Europe and TNO Quality of Life.
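The sensitivity finding described above can be mimicked with a small Monte Carlo sketch. All distributions and parameters below are invented for illustration and are not the TNO model; the sketch only reproduces the qualitative result that shifting the MED distribution toward a more potent allergen raises the estimated reaction rate:

```python
import random

def reaction_rate(med_mu, n=50_000, seed=7):
    """Fraction of simulated allergic eating occasions where the
    ingested allergen dose exceeds the individual's MED."""
    rng = random.Random(seed)
    count = 0
    for _ in range(n):
        portion = rng.lognormvariate(3.0, 0.6)   # g of food consumed
        conc = rng.lognormvariate(0.0, 1.0)      # mg allergen per g of food
        med = rng.lognormvariate(med_mu, 1.0)    # mg, individual threshold
        if portion * conc > med:
            count += 1
    return count / n

baseline = reaction_rate(med_mu=6.0)
potent = reaction_rate(med_mu=4.0)   # MED distribution shifted downward
print(potent > baseline)
```

Because the output is a rate of exceedances of a tail threshold, a shift in the MED distribution moves the estimate strongly, whereas parameters that act away from that tail move it much less, matching the abstract's sensitivity conclusions.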

[Abstract]

18 

Glutathione depletion in rat hepatocytes: a mixture toxicity study with α,β-unsaturated esters
1. Glutathione (GSH) depletion is often reported as an early cytotoxic effect caused by many reactive organic chemicals. In the present study, GSH depletion in primary rat hepatocytes was used as an in vitro effect-equivalent to measure the toxic potency of α,β-unsaturated esters (acrylates and methacrylates). 2. When these compounds were administered as a mixture, GSH depletion was dose additive. The result of the mixture study shows that GSH depletion may be a useful effect-equivalent for the risk assessment of mixtures of α,β-unsaturated esters. 3. To gain more insight into the underlying mechanisms of GSH depletion, the metabolism of two esters was investigated in greater detail. One of them, allyl methacrylate, was metabolized to acrolein. This metabolic pathway can explain the high potency of allyl methacrylate to deplete GSH despite its low intrinsic chemical reactivity.

[Abstract]

19 

Toxicological evaluation and risk assessment of chemical mixtures
A major objective of combination toxicology is to establish whether a mixture of chemicals will result in an effect similar to that expected on the basis of additivity. This requires understanding of the basic concepts of the combined toxicological action of the compounds of the mixture: simple similar action (dose addition), simple dissimilar action (effect or response addition), and interaction (synergism, potentiation, antagonism). The number of possible combinations of chemicals is innumerable, and in vivo testing of these mixtures is unattainable from an ethical, economical, or pragmatic perspective. Prediction of the effect of a mixture based on knowledge of each of the constituents requires detailed information on the composition of the mixture, the exposure level, the mechanism of action, and the receptor of the individual compounds. Often, such information is unavailable or only partially available, and additional studies are needed. Research strategies and methods to assess joint action or interaction of chemicals in mixtures, such as whole-mixture testing, physiologically based toxicokinetic modeling, and isobologram and dose-response surface analyses, are discussed. Guidance is given for the risk assessment of both simple and complex mixtures. We hypothesize that, as a rule, exposure to mixtures of chemicals at (low) non-toxic doses of the individual constituents is of no health concern. Verifying this hypothesis is a challenge; detecting exceptions to the rule in time is the real challenge, and one of major practical importance.

[Abstract]

20 

Computerized adaptive testing for measuring development of young children
Developmental indicators used for routine measurement in The Netherlands are usually chosen to optimally identify delayed children. Measurements on the majority of children without problems are therefore quite imprecise. This study explores the use of computerized adaptive testing (CAT) to monitor the development of young children. CAT is expected to improve the measurement precision of the instrument. We perform two simulation studies, one with real data and one with simulated data, to evaluate the usefulness of CAT. It is shown that CAT selects developmental indicators that maximally match the individual child, so that all children can be measured to the same precision. Copyright © 2006 John Wiley & Sons, Ltd.
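The CAT loop the abstract evaluates can be sketched under a Rasch model with an invented item bank: repeatedly administer the unused item whose difficulty is closest to the current ability estimate (approximately the maximum-information item), then update the estimate with a Newton step on the log-likelihood. This is an illustration of the principle only, not the instrument or the estimation procedure used in the study.

```python
import math
import random

rng = random.Random(5)
bank = [b / 2 for b in range(-8, 9)]    # item difficulties -4.0 .. 4.0 (invented)
theta_true = 1.5                        # simulated child's true ability
theta = 0.0                             # running ability estimate
asked, resp = [], []

def p_correct(theta, b):                # Rasch model response probability
    return 1.0 / (1.0 + math.exp(-(theta - b)))

for _ in range(12):
    # select the unused item nearest the current estimate: under the Rasch
    # model this (approximately) maximizes Fisher information
    b = min((d for d in bank if d not in asked), key=lambda d: abs(d - theta))
    asked.append(b)
    resp.append(rng.random() < p_correct(theta_true, b))
    # one Newton step on the log-likelihood of all responses so far
    grad = sum(r - p_correct(theta, d) for r, d in zip(resp, asked))
    info = sum(p_correct(theta, d) * (1 - p_correct(theta, d)) for d in asked)
    theta += grad / info

print(len(asked), round(theta, 2))
```

Because item selection tracks the current estimate, every simulee ends up answering items near their own level, which is the mechanism behind the equal-precision claim in the abstract.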

[Abstract]
