Self-supervised Learning for Tumor Microenvironment Analysis

Addressing Label Scarcity in Multiplexed Immunofluorescence Imaging with Novel Feature Extraction Techniques

More Info
expand_more

Abstract

The study of tumor microenvironments (TMEs) and immune cell composition in cancer, a disease characterized by uncontrolled growth and spread of tumor cells, has become increasingly important for understanding tumor progression and patient outcomes. Tools such as the TME-Analyzer enable this kind of research, but their manual workflows highlight a common problem in medical imaging: the scarcity of labeled data. This limits the efficiency and applicability of supervised learning algorithms to improve such medical image analysis tools. Self-supervised learning algorithms offer a promising alternative by learning feature representations without requiring labeled data. This thesis aims to address the issue of label scarcity by exploring the potential of self-supervised learning models for TME analysis involving the classification of individual cells in multiplex immunofluorescence (MxIF) microscopy images of triple-negative breast cancer (TNBC) tissue.

To enable the learning of feature representations from MxIF images with an arbitrary number of color channels, this thesis proposes to pre-train an encoder network on every image channel separately according to the SimCLR algorithm and perform classification of multi-channel images by feeding the concatenated feature representation outputs of every channel to a classifier network — referred to as the Siamese configuration. A hyperparameter search is conducted to optimize the SimCLR encoder’s ability to learn high-quality feature representations of individual cells in MxIF images of TNBC tissue. Upon obtaining an optimal set of hyperparameters, the effectiveness of the learned feature representations in improving label-efficiency for individual cell classification is assessed.

The results demonstrate that the proposed Siamese configuration improves the accuracy of classifying the inflammation status of TNBC tumor sections by 2.63%. Additionally, the optimal set of hyperparameters identified through the search include the use of the normalized temperature cross-entropy loss function with low temperature and an added image intensity thresholding term, as well as zoom and brightness/contrast augmentations. Furthermore, the optimized self-supervised learning model improves label-efficiency for individual cell classification, maintaining performance with only 40% of labeled data, while performance drops only when the label percentage is reduced below this threshold.