Mv
M. van den Bos
info
Please Note
<p>This page displays the records of the person named above and is not linked to a unique person identifier. This record may need to be merged to a profile.</p>
2 records found
1
Search Strategies for Optimal Decision Trees
Classification and Regression with Continuous Features
Interpretable machine learning models, such as decision trees, are needed when decisions require trust. Optimal decision trees are shown to generalise better to new data than those constructed greedily, but due to the NP-hardness of the problem they are hard to apply to large datasets. Previous methods either do not take into account continuous features, are designed for a fixed maximum tree depth of two and three, or do not consider objectives other than classification.
We introduce CODTree, the first specialised branch-and-bound algorithm that finds optimal classification and regression trees with continuous features for an arbitrary maximum depth. Our algorithm is able to run different search strategies and includes a specialised solver for shallow trees. Our experiments show that global best-first search with a heuristic that prioritises smaller nodes with better lower bounds has the best time to optimality and anytime performance, and that the specialised solver for shallow trees provides a geometric mean speedup of 77.4x.
Compared to the state-of-the-art for classification, we have comparable runtime, but our search strategy has four orders of magnitude fewer operations for some datasets, although there are diminishing returns for greater depths. For regression, our algorithm is significantly faster than the state-of-the-art and, to the best of our knowledge, the first to find optimal regression trees at depth four. ...
We introduce CODTree, the first specialised branch-and-bound algorithm that finds optimal classification and regression trees with continuous features for an arbitrary maximum depth. Our algorithm is able to run different search strategies and includes a specialised solver for shallow trees. Our experiments show that global best-first search with a heuristic that prioritises smaller nodes with better lower bounds has the best time to optimality and anytime performance, and that the specialised solver for shallow trees provides a geometric mean speedup of 77.4x.
Compared to the state-of-the-art for classification, we have comparable runtime, but our search strategy has four orders of magnitude fewer operations for some datasets, although there are diminishing returns for greater depths. For regression, our algorithm is significantly faster than the state-of-the-art and, to the best of our knowledge, the first to find optimal regression trees at depth four. ...
Interpretable machine learning models, such as decision trees, are needed when decisions require trust. Optimal decision trees are shown to generalise better to new data than those constructed greedily, but due to the NP-hardness of the problem they are hard to apply to large datasets. Previous methods either do not take into account continuous features, are designed for a fixed maximum tree depth of two and three, or do not consider objectives other than classification.
We introduce CODTree, the first specialised branch-and-bound algorithm that finds optimal classification and regression trees with continuous features for an arbitrary maximum depth. Our algorithm is able to run different search strategies and includes a specialised solver for shallow trees. Our experiments show that global best-first search with a heuristic that prioritises smaller nodes with better lower bounds has the best time to optimality and anytime performance, and that the specialised solver for shallow trees provides a geometric mean speedup of 77.4x.
Compared to the state-of-the-art for classification, we have comparable runtime, but our search strategy has four orders of magnitude fewer operations for some datasets, although there are diminishing returns for greater depths. For regression, our algorithm is significantly faster than the state-of-the-art and, to the best of our knowledge, the first to find optimal regression trees at depth four.
We introduce CODTree, the first specialised branch-and-bound algorithm that finds optimal classification and regression trees with continuous features for an arbitrary maximum depth. Our algorithm is able to run different search strategies and includes a specialised solver for shallow trees. Our experiments show that global best-first search with a heuristic that prioritises smaller nodes with better lower bounds has the best time to optimality and anytime performance, and that the specialised solver for shallow trees provides a geometric mean speedup of 77.4x.
Compared to the state-of-the-art for classification, we have comparable runtime, but our search strategy has four orders of magnitude fewer operations for some datasets, although there are diminishing returns for greater depths. For regression, our algorithm is significantly faster than the state-of-the-art and, to the best of our knowledge, the first to find optimal regression trees at depth four.
Optimal Regression Trees via Dynamic Programming
Optimization techniques for learning Regression Trees
Decision trees make decisions in a way interpretable to humans, this is important when machines are increasingly used to aid in making high-stakes and socially sensitive decisions. While heuristics have been used for a long time to find decision trees with reasonable accuracy, recent approaches find fully optimal trees. Due to the computational hardness of finding fully optimal decision trees, it is only practically possible to find shallow trees for a limited dataset size. However, continuous algorithmic improvements keep pushing the scale of feasible solutions. Dynamic Programming approaches promise to find scalable optimal decision trees but need to be adapted for different objectives, such as regression. We combine and adapt the algorithmic techniques of two Dynamic Programming methods, creating a new method that improves the scalability of optimal regression trees. This new method often achieves an order of magnitude speed improvement over a previous state-of-the-art method.
...
Decision trees make decisions in a way interpretable to humans, this is important when machines are increasingly used to aid in making high-stakes and socially sensitive decisions. While heuristics have been used for a long time to find decision trees with reasonable accuracy, recent approaches find fully optimal trees. Due to the computational hardness of finding fully optimal decision trees, it is only practically possible to find shallow trees for a limited dataset size. However, continuous algorithmic improvements keep pushing the scale of feasible solutions. Dynamic Programming approaches promise to find scalable optimal decision trees but need to be adapted for different objectives, such as regression. We combine and adapt the algorithmic techniques of two Dynamic Programming methods, creating a new method that improves the scalability of optimal regression trees. This new method often achieves an order of magnitude speed improvement over a previous state-of-the-art method.