T. Höllt
DimenFix
A novel meta-strategy to preserve user-defined data values on dimensionality reduction layouts
Dimensionality Reduction (DR) methods have become essential tools in the data analysis toolbox. Typically, DR methods combine features of a multivariate dataset to produce dimensions in a reduced space, preserving some data properties, usually pairwise distances or local neighborhoods. Preserving such properties makes DR methods attractive, but it is also one of their weaknesses. When calculating the embedded dimensions, usually through non-linear strategies, the original feature values are lost and not explicitly represented in the spatialization of the produced layouts, making it challenging to interpret the results and understand the features’ contributions to the attained representations. Some strategies have been proposed to tackle this issue, such as coloring the DR layouts or generating explanations. Still, they are post-processes, so specific features (values) are not guaranteed to be preserved or represented. This paper proposes DimenFix, a novel meta-DR strategy that explicitly preserves the values of a particular user-defined feature or external data (not used to generate a layout) in one of the embedded axes. DimenFix can be used to preserve ordinal (e.g., numerical measures) and nominal (e.g., labels) values and works with virtually any gradient-descent DR method. It requires minimal changes to the underlying DR technique and runs in linear time in the number of data instances. In our results, involving Force Scheme and t-SNE adaptations, DimenFix was capable of representing features without heavily impacting distance or neighborhood preservation, allowing the creation of hybrid layouts that join characteristics of scatter plots and DR methods.
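The idea of reserving one embedded axis for a user-defined feature can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `fixed_axis_layout`, the plain stress-based gradient descent, and all parameters are assumptions chosen for illustration. The sketch pins the y-coordinate of every point to the chosen feature value and lets gradient descent move only the x-coordinate. The pairwise stress used here is quadratic in the number of points; only the pinning step itself is linear.

```python
import math
import random


def fixed_axis_layout(points, fixed_values, iterations=200,
                      learning_rate=0.05, seed=0):
    """Hypothetical sketch: 2D layout whose y-axis is pinned to fixed_values.

    points       -- list of high-dimensional tuples
    fixed_values -- one value per point, shown unchanged on the y-axis
    Only the x-coordinates are optimized, by gradient descent on the
    squared difference between layout and original pairwise distances.
    """
    rng = random.Random(seed)
    n = len(points)
    # Target distances measured in the original feature space.
    target = [[math.dist(points[i], points[j]) for j in range(n)]
              for i in range(n)]
    xs = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    ys = list(fixed_values)  # never updated below

    for _ in range(iterations):
        grads = [0.0] * n
        for i in range(n):
            for j in range(n):
                if i == j:
                    continue
                dx = xs[i] - xs[j]
                dy = ys[i] - ys[j]
                d = math.hypot(dx, dy) or 1e-9  # avoid division by zero
                # Derivative of (d - target)^2 with respect to x_i,
                # up to a constant factor.
                grads[i] += (d - target[i][j]) * dx / d
        for i in range(n):
            xs[i] -= learning_rate * grads[i] / n
    return list(zip(xs, ys))


data = [(0, 0, 0), (1, 0, 0), (0, 1, 0), (1, 1, 1)]
feature = [0.0, 0.2, 0.5, 1.0]
layout = fixed_axis_layout(data, feature)
```

Reading the resulting layout, the vertical position of a point is exactly its feature value, as in a scatter plot, while the horizontal position is whatever best preserves distances under that constraint.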
Gene co-expression provides crucial insights into biological functions; however, there is a lack of exploratory analysis tools for localized gene co-expression in large-scale datasets. We present GeneSurfer, an interactive interface designed to explore localized transcriptome-wide gene co-expression patterns in the 3D spatial domain. Key features of GeneSurfer include transcriptome-wide gene filtering and gene clustering based on spatial local co-expression within transcriptomically similar cells, multi-slice 3D rendering of average expression of gene clusters, and on-the-fly Gene Ontology term annotation of co-expressed gene sets. Additionally, GeneSurfer offers multiple linked views for investigating individual genes or gene co-expression in the spatial domain at each exploration stage. Demonstrating its utility with both spatially resolved transcriptomics and single-cell RNA sequencing data from the Allen Brain Cell Atlas, GeneSurfer effectively identifies and annotates localized transcriptome-wide co-expression, providing biological insights and facilitating hypothesis generation and validation.
Exploration and analysis of high-dimensional data are important tasks in many fields that produce large and complex data, like the financial sector, systems biology, or cultural heritage. Tailor-made visual analytics software is developed for each specific application, limiting its applicability in other fields. However, as diverse as these fields are, their characteristics and requirements for data analysis are conceptually similar. Many applications share abstract tasks and data types and are often constructed with similar building blocks. Developing such applications, even when based mostly on existing building blocks, requires significant engineering effort. We developed ManiVault, a flexible and extensible open-source visual analytics framework for analyzing high-dimensional data. The primary objective of ManiVault is to facilitate rapid prototyping of visual analytics workflows for visualization software developers and practitioners alike. ManiVault is built using a plugin-based architecture that offers easy extensibility. While our architecture deliberately keeps plugins self-contained, to guarantee maximum flexibility and re-usability, we have designed and implemented a messaging API for tight integration and linking of modules to support common visual analytics design patterns. We provide several visualization and analytics plugins, and ManiVault's API makes the integration of new plugins easy for developers. ManiVault facilitates the distribution of visualization and analysis pipelines and results for practitioners through saving and reproducing complete application states. As such, ManiVault can be used as a communication tool among researchers to discuss workflows and results. A copy of this paper and all supplemental material is available at osf.io/9k6jw, and source code at github.com/ManiVaultStudio.
Cytosplore Simian Viewer
Visual Exploration for Multi-Species Single-Cell RNA Sequencing Data
In spatial transcriptomics (ST) data, biologically relevant features such as tissue compartments or cell-state transitions are reflected by gene expression gradients. Here, we present SpaceWalker, a visual analytics tool for exploring the local gradient structure of 2D and 3D ST data. The user can be guided by the local intrinsic dimensionality of the high-dimensional data to define seed locations, from which a flood-fill algorithm identifies transcriptomically similar cells on the fly, based on the high-dimensional data topology. In several use cases, we demonstrate that the spatial projection of these flooded cells highlights tissue architectural features and that interactive retrieval of gene expression gradients in the spatial and transcriptomic domains confirms known biology. We also show that SpaceWalker generalizes to several different ST protocols and scales well to large, multi-slice, 3D whole-brain ST data while maintaining real-time interaction performance.
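The flood-fill step described above can be sketched as a breadth-first traversal of a k-nearest-neighbor graph built in the high-dimensional space. This is an illustrative sketch only, not SpaceWalker's code: the helper names `knn_graph` and `flood_fill`, the brute-force neighbor search, and the `max_depth` stopping rule are all assumptions.

```python
import math
from collections import deque


def knn_graph(vectors, k):
    """Brute-force k-nearest-neighbor lists (hypothetical helper).

    Returns, for every index i, the indices of its k closest other
    vectors by Euclidean distance.
    """
    neighbors = []
    for i, v in enumerate(vectors):
        others = [j for j in range(len(vectors)) if j != i]
        others.sort(key=lambda j: math.dist(v, vectors[j]))
        neighbors.append(others[:k])
    return neighbors


def flood_fill(neighbors, seed, max_depth):
    """Breadth-first flood from seed along kNN edges.

    Returns a dict mapping each reached index to the number of
    steps ("waves") needed to reach it, up to max_depth.
    """
    depth = {seed: 0}
    queue = deque([seed])
    while queue:
        node = queue.popleft()
        if depth[node] == max_depth:
            continue  # do not expand beyond the allowed depth
        for nb in neighbors[node]:
            if nb not in depth:
                depth[nb] = depth[node] + 1
                queue.append(nb)
    return depth


# Toy "expression profiles": one tight group and two distant points.
cells = [(0,), (1,), (2,), (3,), (10,), (11,)]
graph = knn_graph(cells, k=2)
flooded = flood_fill(graph, seed=0, max_depth=2)
```

In this toy example the flood stays within the first four cells, because no neighbor list reachable from the seed points to the two distant ones. Projecting the reached indices back to their spatial positions is what would highlight a region in the tissue.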
The cognitive abilities of humans are distinctive among primates, but their molecular and cellular substrates are poorly understood. We used comparative single-nucleus transcriptomics to analyze samples of the middle temporal gyrus (MTG) from adult humans, chimpanzees, gorillas, rhesus macaques, and common marmosets to understand human-specific features of the neocortex. Human, chimpanzee, and gorilla MTG showed highly similar cell-type composition and laminar organization as well as a large shift in proportions of deep-layer intratelencephalic-projecting neurons compared with macaque and marmoset MTG. Microglia, astrocytes, and oligodendrocytes had more-divergent expression across species compared with neurons or oligodendrocyte precursor cells, and neuronal expression diverged more rapidly on the human lineage. Only a few hundred genes showed human-specific patterning, suggesting that relatively few cellular and molecular changes distinctively define adult human cortical structure.
Microglia have been identified as key players in the pathogenesis of Alzheimer's disease and other neurodegenerative diseases. Iba1, and more specifically TMEM119 and P2RY12, are gaining ground as presumably more specific microglia markers, but a comprehensive characterization of the expression of these three markers, individually as well as combined, is currently missing. Here we used a multispectral immunofluorescence dataset, in which over seventy thousand microglia from both aged controls and Alzheimer patients have been analysed for expression of Iba1, TMEM119 and P2RY12 on a single-cell level. For all markers, we studied the overlap and differences in expression patterns and the effect of proximity to β-amyloid plaques. We found no difference in absolute microglia numbers between control and Alzheimer subjects, but the prevalence of specific combinations of markers (phenotypes) differed greatly. In controls, the majority of microglia expressed all three markers. In Alzheimer patients, a significant loss of TMEM119+-phenotypes was observed, independent of the presence of β-amyloid plaques in their proximity. Conversely, phenotypes showing loss of P2RY12, but consistent Iba1 expression, were increasingly prevalent around β-amyloid plaques. No morphological features were conclusively associated with loss or gain of any of the markers or any of the identified phenotypes. All in all, none of the three markers was expressed by all microglia, nor can any be wholly regarded as a pan- or homeostatic marker, and preferential phenotypes were observed depending on the surrounding pathological or homeostatic environment. This work could help select and interpret microglia markers in previous and future studies.
Chronic intestinal inflammation underlies inflammatory bowel disease (IBD). Previous studies indicated alterations in the cellular immune system; however, it has been challenging to interrogate the role of all immune cell subsets simultaneously. Therefore, we aimed to identify immune cell types associated with inflammation in IBD using high-dimensional mass cytometry. We analyzed 188 intestinal biopsies and paired blood samples of newly-diagnosed, treatment-naive patients (n=42) and controls (n=26) in two independent cohorts. We applied mass cytometry (36-antibody panel) to resolve single cells and analyzed the data with unbiased Hierarchical-SNE. In addition, imaging-mass cytometry (IMC) was performed to reveal the spatial distribution of the immune subsets in the tissue. We identified 44 distinct immune subsets. Correlation network analysis identified a network of inflammation-associated subsets, including HLA-DR+CD38+ EM CD4+ T cells, T regulatory-like cells, PD1+ EM CD8+ T cells, neutrophils, CD27+ TCRγδ cells and NK cells. All disease-associated subsets were validated in a second cohort. This network was abundant in a subset of patients, independent of IBD subtype, severity or intestinal location. Putative disease-associated CD4+ T cells were detectable in blood. Finally, imaging-mass cytometry revealed the spatial colocalization of neutrophils, memory CD4+ T cells and myeloid cells in the inflamed intestine. Our study indicates that a cellular network of both innate and adaptive immune cells colocalizes in inflamed biopsies from a subset of patients. These results contribute to dissecting disease heterogeneity and may guide the development of targeted therapeutics in IBD.
High-dimensional imaging is becoming increasingly relevant in many fields from astronomy and cultural heritage to systems biology. Visual exploration of such high-dimensional data is commonly facilitated by dimensionality reduction. However, common dimensionality reduction methods do not include spatial information present in images, such as local texture features, into the construction of low-dimensional embeddings. Consequently, exploration of such data is typically split into a step focusing on the attribute space followed by a step focusing on spatial information, or vice versa. In this paper, we present a method for incorporating spatial neighborhood information into distance-based dimensionality reduction methods, such as t-Distributed Stochastic Neighbor Embedding (t-SNE). We achieve this by modifying the distance measure between high-dimensional attribute vectors associated with each pixel such that it takes the pixel's spatial neighborhood into account. Based on a classification of different methods for comparing image patches, we explore a number of different approaches. We compare these approaches from a theoretical and experimental point of view. Finally, we illustrate the value of the proposed methods by qualitative and quantitative evaluation on synthetic data and two real-world use cases.
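One way to make a pixel-to-pixel distance take the spatial neighborhood into account is to compare the patches around the two pixels instead of the pixels alone. The sketch below shows a single, simple variant and is not taken from the paper, which explores several approaches: the function name `patch_distance`, the square window, and the choice to average over corresponding offsets are assumptions.

```python
import math


def patch_distance(image, p, q, radius=1):
    """Hypothetical neighborhood-aware distance between pixels p and q.

    image  -- image[y][x] is the attribute vector of one pixel
    p, q   -- (y, x) coordinates of the two pixels, inside the image
    radius -- half-width of the square window; 0 gives the plain
              attribute-space distance
    The result is the mean Euclidean distance between attribute
    vectors at corresponding offsets in both windows. Offsets that
    fall outside the image for either pixel are skipped.
    """
    height, width = len(image), len(image[0])
    (py, px), (qy, qx) = p, q
    total, count = 0.0, 0
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            ay, ax = py + dy, px + dx
            by, bx = qy + dy, qx + dx
            inside = (0 <= ay < height and 0 <= ax < width and
                      0 <= by < height and 0 <= bx < width)
            if inside:
                total += math.dist(image[ay][ax], image[by][bx])
                count += 1
    return total / count  # count >= 1: offset (0, 0) is always valid


# 3x3 single-channel image with one bright pixel in the center.
img = [[(0,), (0,), (0,)],
       [(0,), (1,), (0,)],
       [(0,), (0,), (0,)]]
plain = patch_distance(img, (0, 0), (0, 1), radius=0)
spatial = patch_distance(img, (0, 0), (0, 1), radius=1)
```

The two corner-row pixels have identical attributes, so their plain distance is zero, yet their neighborhoods differ because only one of the shifted windows lines up with the bright center. A matrix of such distances could then be passed to a distance-based method such as t-SNE.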
Author Correction
Comparative cellular analysis of motor cortex in human, marmoset and mouse (Nature, (2021), 598, 7879, (111-119), 10.1038/s41586-021-03465-8)
In the version of this article initially published, the Acknowledgements section was incomplete and has now been amended to include the following: “NIH BRAIN Initiative awards U01 MH121282 to J.R.E and M.M.B, U19 MH114831 to J.R.E. and E.M.C., U19 MH114830 to H.Z., U01 MH114819 to G.F., 1U01MH114828 to K.Z. and J.C., RF1MH123220 to M.H. and R.H.S., and U19 MH114821. NIH awards R01DC019370 to R.H., R24MH114815 to R.H. and O.R.W., and R24 MH114788 to O.R.W. Nancy and Buster Alvord Endowment to C.D.K.” The changes have been made to the HTML and PDF versions of the article.
Diffusion Tensor Imaging (DTI) is a non-invasive magnetic resonance imaging technique that, combined with fiber tracking algorithms, allows the characterization and visualization of white matter structures in the brain. The resulting fiber tracts are used, for example, in tumor surgery to evaluate the potential brain functional damage due to tumor resection. The DTI processing pipeline from image acquisition to the final visualization is rather complex, generating undesirable uncertainties in the final results. Most DTI visualization techniques do not provide any information regarding the presence of uncertainty. When planning surgery, a fixed safety margin around the fiber tracts is often used; however, it cannot capture local variability and distribution of the uncertainty, thereby limiting the informed decision-making process. Stochastic techniques can be used to estimate uncertainty for the DTI pipeline. However, they have high computational and memory requirements that make them infeasible in a clinical setting. The delay in the visualization of the results further hinders the workflow. We propose a progressive approach that relies on a combination of wild-bootstrapping and fiber tracking to be used within the progressive visual analytics paradigm. We present a local bootstrapping strategy, which reduces the computational and memory costs, and provides fiber-tracking results in a progressive manner. We have also implemented a progressive aggregation technique that computes the distances in the fiber ensemble during progressive bootstrap computations. We present experiments with different scenarios to highlight the benefits of using our progressive visual analytics pipeline in a clinical workflow along with a use case and analysis obtained by discussions with our collaborators.
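Wild bootstrapping, the resampling scheme named above, can be illustrated on a deliberately simple model. The sketch below is not the paper's pipeline: it replaces the diffusion tensor fit with a straight-line least-squares fit, and the function names `fit_line` and `wild_bootstrap_slopes` are assumptions. The principle is the same: fit the model once, keep the fitted values, flip the sign of each residual at random, and refit to obtain one sample of the estimate's variability.

```python
import random


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def wild_bootstrap_slopes(xs, ys, n_resamples=100, seed=0):
    """Hypothetical sketch of a wild bootstrap with random sign flips.

    Each resample keeps the fitted value at every x and adds the
    original residual multiplied by +1 or -1 chosen at random.
    Returns the slope estimated from every resample.
    """
    rng = random.Random(seed)
    intercept, slope = fit_line(xs, ys)
    fitted = [intercept + slope * x for x in xs]
    residuals = [y - f for y, f in zip(ys, fitted)]
    slopes = []
    for _ in range(n_resamples):
        resampled = [f + r * rng.choice((-1.0, 1.0))
                     for f, r in zip(fitted, residuals)]
        slopes.append(fit_line(xs, resampled)[1])
    return slopes


# Noise-free data: every resample reproduces the original slope.
exact_slopes = wild_bootstrap_slopes([0, 1, 2, 3], [1, 3, 5, 7])
```

With noisy measurements the slopes spread out, and that spread is the uncertainty estimate. Because each resample is independent, estimates can be refined a few resamples at a time, which is what makes the scheme suitable for progressive computation.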
The primary motor cortex (M1) is essential for voluntary fine-motor control and is functionally conserved across mammals. Here, using high-throughput transcriptomic and epigenomic profiling of more than 450,000 single nuclei in humans, marmoset monkeys and mice, we demonstrate a broadly conserved cellular makeup of this region, with similarities that mirror evolutionary distance and are consistent between the transcriptome and epigenome. The core conserved molecular identities of neuronal and non-neuronal cell types allow us to generate a cross-species consensus classification of cell types, and to infer conserved properties of cell types across species. Despite the overall conservation, however, many species-dependent specializations are apparent, including differences in cell-type proportions, gene expression, DNA methylation and chromatin state. Few cell-type marker genes are conserved across species, revealing a short list of candidate genes and regulatory mechanisms that are responsible for conserved features of homologous cell types, such as the GABAergic chandelier cells. This consensus transcriptomic classification allows us to use patch–seq (a combination of whole-cell patch-clamp recordings, RNA sequencing and morphological characterization) to identify corticospinal Betz cells from layer 5 in non-human primates and humans, and to characterize their highly specialized physiology and anatomy. These findings highlight the robust molecular underpinnings of cell-type diversity in M1 across mammals, and point to the genes and regulatory pathways responsible for the functional identity of cell types and their species-specific adaptations.