Modern machine learning systems face unprecedented challenges in processing continuously arriving data streams while maintaining both computational efficiency and privacy compliance. Traditional batch learning approaches exhibit quadratic scaling in memory and computational requirements, making them unsuitable for long-term deployment in resource-constrained environments. Despite significant advances in continual learning for computer vision and natural language processing, tabular data, which accounts for the majority of industrial machine learning applications, remains comparatively underexplored.
This thesis introduces IMLP (Incremental MLP), an attention-augmented architecture for energy-efficient continual learning on tabular data streams. IMLP extends a standard multilayer perceptron with attention-based feature rehearsal: instead of storing raw historical samples, it maintains a fixed-size buffer of learned 256-dimensional representations. This design keeps per-step computational cost constant regardless of stream length while preserving task-relevant knowledge without retaining personally identifiable information.
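As a rough illustration of the rehearsal mechanism, the sketch below assumes a PyTorch implementation; the class name, buffer size, number of attention heads, and buffer-update policy are illustrative assumptions, not the thesis's actual design.

```python
# Hypothetical sketch of attention-based feature rehearsal; dimensions and
# the attention mechanism are assumptions, not the published implementation.
import torch
import torch.nn as nn


class FeatureRehearsalMLP(nn.Module):
    def __init__(self, in_dim: int, n_classes: int,
                 buffer_size: int = 128, d: int = 256):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(in_dim, d), nn.ReLU())
        # Fixed-size buffer of learned 256-dimensional representations:
        # memory cost stays constant no matter how long the stream runs.
        self.register_buffer("rehearsal", torch.zeros(buffer_size, d))
        self.attn = nn.MultiheadAttention(embed_dim=d, num_heads=4,
                                          batch_first=True)
        self.head = nn.Linear(d, n_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h = self.encoder(x).unsqueeze(1)  # (B, 1, d)
        mem = self.rehearsal.unsqueeze(0).expand(x.size(0), -1, -1)  # (B, M, d)
        # The current representation attends over buffered past
        # representations, so no raw (potentially identifying) samples
        # are ever retained.
        ctx, _ = self.attn(h, mem, mem)
        return self.head((h + ctx).squeeze(1))
```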
We conduct a comprehensive evaluation across 36 diverse TabZilla classification tasks against 14 baseline methods spanning gradient boosting, classical machine learning, and neural architectures. Using calibrated power-measurement equipment and rigorous statistical analysis via Friedman omnibus tests with post-hoc comparisons, we establish that IMLP achieves a $4.2\times$ median speedup and a 79.6\% energy reduction over standard MLPs while maintaining competitive accuracy (80.6\% vs.\ 82.9\% balanced accuracy).
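The omnibus test itself is standard; a minimal sketch of the protocol using SciPy is shown below, with placeholder random scores (rows as datasets, columns as methods) standing in for the real results.

```python
# Friedman omnibus test over per-dataset scores, followed by an optional
# post-hoc step. The array shape and score values are placeholders.
import numpy as np
from scipy.stats import friedmanchisquare

rng = np.random.default_rng(0)
# Placeholder scores: 36 datasets (rows) x 3 methods (columns).
scores = rng.uniform(0.6, 0.9, size=(36, 3))

stat, p = friedmanchisquare(*(scores[:, j] for j in range(scores.shape[1])))
print(f"Friedman chi-square = {stat:.2f}, p = {p:.4f}")
# If p is small, follow with post-hoc pairwise comparisons, e.g. via
# scikit_posthocs.posthoc_nemenyi_friedman(scores).
```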
Our key findings show that IMLP trades a modest 2.3-percentage-point accuracy reduction for substantial efficiency gains, recovering 97.5\% of cumulative-learning performance while training on current-segment data alone. The approach proves robust across datasets spanning 5 to 2,000 features and across diverse domains, including medical diagnosis, sensor data, and financial applications. We further introduce NetScore-T, a composite metric for evaluating accuracy-efficiency trade-offs, under which IMLP lies on the Pareto frontier of the neural methods evaluated.
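The abstract leaves NetScore-T undefined; by analogy with the original NetScore metric (Wong, 2018), a plausible form, stated here as an assumption rather than the thesis's definition, balances balanced accuracy $a$ against energy $e$ and training time $t$:
\[
\Omega_{T} \;=\; 20 \log\!\left( \frac{a^{\alpha}}{e^{\beta}\, t^{\gamma}} \right),
\]
where the exponents $\alpha$, $\beta$, and $\gamma$ weight the relative importance of accuracy, energy, and time.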
Taken together, these results establish the practicality of continual learning in resource-constrained environments and contribute the first systematic study of energy consumption in neural continual learning for tabular data, enabling deployment scenarios previously considered computationally infeasible.