This thesis investigates machine learning approaches for predicting the glass transition temperature of polymeric materials from chemical structure representations. Multiple molecular encoding strategies were evaluated, including character‑level tokenization of SMILES strings with count vectorization and TF‑IDF weighting, as well as structural fingerprints such as Morgan fingerprints and MACCS keys. These representations were applied to a range of models, including Support Vector Regression, Random Forests, Gaussian Process Regression, and deep learning architectures such as Artificial Neural Networks (ANNs) and Convolutional Neural Networks (CNNs). Hyperparameters were tuned with Bayesian optimization via Optuna. Experimental results show that Random Forests combining tokenized SMILES with numerical descriptors achieved competitive predictive accuracy, while fingerprint‑based models provided interpretable and computationally efficient alternatives. The study highlights the trade‑offs between representation richness, interpretability, and predictive accuracy, offering insights into the design of data‑driven pipelines for polymer property prediction in aerospace and materials engineering applications.
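To make the character-level encoding concrete, the following is a minimal sketch of count vectorization and TF-IDF weighting applied to SMILES strings. It is an illustration under stated assumptions, not the pipeline used in the thesis: the three repeat-unit SMILES, the function names, and the smoothed IDF with L2 row normalization (the form used by default in scikit-learn's TF-IDF transformer) are all assumed here. Note that pure character-level splitting breaks multi-character atoms such as Cl or Br into separate tokens.

```python
import math
from collections import Counter


def char_tokenize(smiles):
    """Split a SMILES string into single-character tokens."""
    return list(smiles)


def build_vocabulary(corpus):
    """Collect the sorted set of characters seen across all SMILES strings."""
    return sorted({tok for s in corpus for tok in char_tokenize(s)})


def count_vectorize(corpus, vocab):
    """Return one row of raw token counts per SMILES string."""
    rows = []
    for s in corpus:
        counts = Counter(char_tokenize(s))
        rows.append([counts.get(tok, 0) for tok in vocab])
    return rows


def tfidf_transform(count_rows):
    """Apply smoothed IDF weighting and L2-normalize each row.

    Assumed formula: idf = ln((1 + n_docs) / (1 + df)) + 1.
    """
    n_docs = len(count_rows)
    n_terms = len(count_rows[0])
    # Document frequency: number of strings containing each token.
    df = [sum(1 for row in count_rows if row[j] > 0) for j in range(n_terms)]
    idf = [math.log((1 + n_docs) / (1 + d)) + 1.0 for d in df]
    out = []
    for row in count_rows:
        weighted = [c * w for c, w in zip(row, idf)]
        norm = math.sqrt(sum(v * v for v in weighted)) or 1.0
        out.append([v / norm for v in weighted])
    return out


# Hypothetical repeat-unit SMILES; '*' marks the attachment points.
smiles_corpus = ["*CC*", "*CC(C)*", "*CC(c1ccccc1)*"]

vocab = build_vocabulary(smiles_corpus)
counts = count_vectorize(smiles_corpus, vocab)
tfidf = tfidf_transform(counts)
```

The resulting rows could be concatenated with numerical descriptors and passed to a regressor such as a Random Forest. Tokens that occur in every string, like `*` here, receive the minimum IDF weight of 1, so rarer structural characters contribute relatively more to each vector.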