Heuristic Optimization of Amazon Redshift Table Configurations

Focusing on Distribution Style, Sort Keys and Column Encodings in Amazon Redshift

Master Thesis (2025)

Author(s)

X.L. Hu (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Contributor(s)

N. Yorke-Smith – Mentor (TU Delft - Electrical Engineering, Mathematics and Computer Science)

A. Katsifodimos – Mentor (TU Delft - Electrical Engineering, Mathematics and Computer Science)

C. Lofi – Graduation committee member (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Derek van den Broek – Mentor (PostNL B.V.)

Faculty

Electrical Engineering, Mathematics and Computer Science

Keywords

Query, Heuristic, Table configuration optimisation, Columnar database

To reference this document use

https://resolver.tudelft.nl/uuid:e6220a35-548c-4fc1-a37d-eb90b9b4f9cd

Publication Year

2025

Language

English

Graduation Date

17-07-2025

Awarding Institution

Delft University of Technology

Programme

Computer Science

Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

This thesis presents a comprehensive, heuristic, cost-driven framework for optimizing table configurations in Amazon Redshift, focusing on distribution styles, sort keys and column encodings. Unlike existing approaches that optimize these parameters independently, this research develops a sequential optimization methodology that captures the interdependencies between configuration choices and their effects on performance across different data scales.
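This record does not spell out the search procedure. The sketch below is therefore only one plausible reading of a "sequential" heuristic. The three parameter groups are chosen in the order the abstract lists them, and each later stage is evaluated with the earlier choices held fixed. This is one way such a scheme can account for interdependencies that independent tuning would miss. `TableConfig`, `sequential_optimize` and `toy_cost` are hypothetical names, and the column names are invented. The toy cost function is a placeholder, not the thesis's cost model: in practice the cost would come from measured or estimated query metrics such as CPU time or I/O.

```python
from dataclasses import dataclass, replace
from typing import Callable, Dict, Optional, Sequence, Tuple


@dataclass(frozen=True)
class TableConfig:
    """Hypothetical representation of one candidate table configuration."""
    dist_style: str = "AUTO"                      # AUTO, EVEN, KEY or ALL
    dist_key: Optional[str] = None                # only meaningful for KEY
    sort_key: Tuple[str, ...] = ()                # compound sort key columns, in order
    encodings: Tuple[Tuple[str, str], ...] = ()   # sorted (column, encoding) pairs


CostFn = Callable[[TableConfig], float]


def sequential_optimize(
    base: TableConfig,
    dist_options: Sequence[Tuple[str, Optional[str]]],
    sort_options: Sequence[Tuple[str, ...]],
    encoding_options: Dict[str, Sequence[str]],
    cost: CostFn,
) -> TableConfig:
    """Greedy, one stage at a time; each stage keeps the earlier choices fixed."""
    # Stage 1: distribution style/key, everything else held at the base config.
    config = min([replace(base, dist_style=s, dist_key=k) for s, k in dist_options], key=cost)
    # Stage 2: sort key, evaluated under the chosen distribution.
    config = min([replace(config, sort_key=sk) for sk in sort_options], key=cost)
    # Stage 3: encodings column by column, evaluated under distribution + sort key.
    for column, candidates in encoding_options.items():
        trials = []
        for enc in candidates:
            chosen = dict(config.encodings)
            chosen[column] = enc
            trials.append(replace(config, encodings=tuple(sorted(chosen.items()))))
        config = min(trials, key=cost)
    return config


def toy_cost(c: TableConfig) -> float:
    """Hypothetical stand-in for a measured or estimated query cost (lower is better)."""
    total = 10.0
    if c.dist_style == "KEY" and c.dist_key == "order_id":
        total -= 3.0   # pretend: co-locates the join with the secondary table
    if c.sort_key[:1] == ("event_date",):
        total -= 2.0   # pretend: lets date-filtered scans skip blocks
    total -= 0.5 * sum(1 for _, enc in c.encodings if enc in ("AZ64", "ZSTD"))
    return total
```

Because a greedy pass like this never revisits an earlier stage, it can miss combinations in which a different distribution choice would pay off only together with a particular sort key. A fuller search would trade that risk against many more costly evaluations.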

The study addresses four research questions that examine optimization strategies for the individual parameters and their combined effect on system performance. The experimental evaluation uses two datasets. The first is a primary table of 300 million records across 23 columns, on which the optimization is performed. The second is a secondary join table of 117.5 million records across 12 columns, which remains unchanged. For the scale-dependent analysis, subsets of 10 million and 100 million records are drawn from the primary dataset to allow a controlled comparison across data volumes.
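In Amazon Redshift, the three parameters studied here are set through `CREATE TABLE` attributes: `ENCODE` per column, and `DISTSTYLE`, `DISTKEY` and `COMPOUND SORTKEY` at table level. The helper below is a hypothetical sketch that renders one candidate configuration as DDL text. It could be used, for example, to recreate the primary table or one of its subsets under a new configuration. This record does not describe how the thesis actually applied configurations, and the table and column names are invented. Encoding names are passed through unchecked.

```python
from typing import Dict, Optional, Sequence

VALID_DIST_STYLES = {"AUTO", "EVEN", "KEY", "ALL"}


def render_create_table(
    table: str,
    columns: Dict[str, str],          # column name -> SQL data type
    encodings: Dict[str, str],        # column name -> compression encoding, e.g. "AZ64"
    dist_style: str = "AUTO",
    dist_key: Optional[str] = None,
    sort_key: Sequence[str] = (),
) -> str:
    """Render a hypothetical candidate configuration as Redshift CREATE TABLE DDL."""
    if dist_style not in VALID_DIST_STYLES:
        raise ValueError(f"unknown DISTSTYLE: {dist_style}")
    # In this sketch a distribution key is required for KEY and rejected otherwise.
    if (dist_style == "KEY") != (dist_key is not None):
        raise ValueError("DISTKEY must be given if and only if DISTSTYLE is KEY")
    # Every referenced column must exist in the column list.
    for col in [*encodings, *sort_key, *([dist_key] if dist_key else [])]:
        if col not in columns:
            raise ValueError(f"unknown column: {col}")

    column_lines = []
    for name, sql_type in columns.items():
        enc = encodings.get(name)
        column_lines.append(f"  {name} {sql_type}" + (f" ENCODE {enc}" if enc else ""))

    parts = [f"CREATE TABLE {table} (", ",\n".join(column_lines), ")", f"DISTSTYLE {dist_style}"]
    if dist_key is not None:
        parts.append(f"DISTKEY ({dist_key})")
    if sort_key:
        parts.append(f"COMPOUND SORTKEY ({', '.join(sort_key)})")
    return "\n".join(parts) + ";"
```

Calling `render_create_table("events", {"order_id": "BIGINT"}, {"order_id": "AZ64"}, dist_style="KEY", dist_key="order_id")` yields a key-distributed table with an AZ64-encoded column. Redshift can also change these attributes on an existing table through `ALTER TABLE`, which is an alternative to recreating it.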

Key findings show that table configuration optimization in Amazon Redshift is strongly scale-dependent across the experimental datasets. Three distinct performance regimes are identified:

- a small-scale regime (10M records), in which optimization effectiveness depends on the query type;
- a medium-scale regime (100M records), marked by shifting optimization trade-offs;
- a large-scale regime (300M records), dominated by I/O and storage optimizations.

The outcomes are mixed. Improvements range from a 21 percent CPU reduction at small scale to a 62 percent I/O improvement at large scale, and strategies that are effective at one scale can become counterproductive at another. Overall, the framework shows variable success in selecting distribution styles, sort keys and encodings.

The research identifies fundamental challenges in optimizing Amazon Redshift table configurations: Redshift's internal algorithms remain opaque, and optimization benefits scale non-linearly across the tested data volumes. The framework provides useful insight into scale-dependent optimization patterns. However, the mixed results show how difficult it is to achieve consistent performance improvements across scales and query types. These findings challenge the assumption that optimization benefits are uniform and underline the need for empirical validation in cloud database optimization. They also offer practical guidance for database administrators and a theoretical foundation for developing adaptive optimization systems.

Files

TU_Delft_Master_Thesis_1_.pdf

(pdf | 0 Mb)

License info not available

File under embargo until 17-07-2027