From Axis-aligned to Oriented Bounding Boxes

An optimization method to reconstruct Oriented Bounding Boxes from Axis-aligned Bounding Boxes of vehicles in low-altitude aerial imagery


Abstract

Increasingly intelligent autonomous driving functionalities demand a detailed analysis of the behaviour of traffic participants. This level of analysis requires datasets that accurately describe the movement of all objects in a specific scene. Recent developments in small Unmanned Aerial Vehicles (sUAVs) and drones enable a promising approach to collecting such data: applying object detection and tracking methods to sUAV/drone imagery to extract the kinematic parameters of traffic participants.
Analysing the behaviour of vehicles in urban traffic has been a topic of research for many years. However, the vehicle detection and tracking pipelines used in existing approaches lack the required accuracy in estimating a vehicle's orientation and contours. Recent approaches detect vehicles as Axis-aligned Bounding Boxes (ABBs), which contain no information about a vehicle's orientation and contours. Alternatively, vehicles can be detected as Oriented Bounding Boxes (OBBs), which do contain this information. For low-altitude aerial imagery, however, publicly available data to train OBB detection methods is scarce.
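For concreteness, the two box formats can be parameterised as follows. This is an illustrative sketch, not a format prescribed by the thesis; the field names are assumptions.

```python
from dataclasses import dataclass

@dataclass
class ABB:
    # Axis-aligned: position and extent only; orientation and contours are lost.
    x_min: float
    y_min: float
    x_max: float
    y_max: float

@dataclass
class OBB:
    # Oriented: centre, vehicle footprint (length, width) and heading.
    cx: float
    cy: float
    length: float
    width: float
    theta: float  # heading in radians
```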
To fill this gap in data availability, this thesis introduces the novel `Axis-aligned to Oriented Bounding Boxes' methodology (A2OBB). Based on a non-linear least squares approximation of the geometrical relationship between an ABB and an OBB, A2OBB finds the set of OBB parameters that fits optimally within all provided ABBs. As such, A2OBB can be applied (1) as an enhancement tool to convert existing ABB annotations to OBBs or (2) as an extension to existing detection networks that output detections in ABB format.

This thesis contributes the formulation of the A2OBB methodology and a detailed analysis of its OBB reconstruction performance. Four experiments investigate the impact of two assumptions made in the proposed methodology on the reconstruction performance. The results illustrate a sensitivity to: (1) a change in perspective between the observed vehicle and the camera, (2) a mismatch between the perceived shape of vehicles and the assumed rectangular shape, and (3) the error introduced by the detection network (in the application of extending ABB detection networks). To reduce the impact of the shape mismatch, a correction factor is introduced that significantly improves the reconstruction performance.

As a result, A2OBB can reconstruct OBBs from a set of manually annotated ABBs with length and width approximations accurate within 5% and 15% of their respective ground truths and an orientation estimate with an average error below 2.5°. Overall, manually annotated ABBs are converted to OBBs with an average Intersection over Union (IoU) of 85%. Detections produced by an in-house, out-of-the-box implementation of a YOLOv3 network are reconstructed with length and width biases of 4.5% and 10% of their respective ground truths and with an orientation error between 2.5° and 5°. Overall, the OBBs reconstructed from these detected ABBs reach an IoU of 77%.
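The following is a minimal sketch of this kind of fit, not the thesis implementation. It assumes the standard relation between an OBB with footprint (l, w) and heading θ and its enclosing ABB, W = l·|cos θ| + w·|sin θ| and H = l·|sin θ| + w·|cos θ|, and fits one shared footprint plus a per-frame heading to a track of observed ABBs with SciPy's non-linear least squares. The function name `fit_obb` and the shared-footprint parameterisation are illustrative assumptions.

```python
import numpy as np
from scipy.optimize import least_squares

def enclosing_abb(l, w, theta):
    """Width and height of the ABB enclosing an OBB with footprint (l, w)."""
    c, s = np.abs(np.cos(theta)), np.abs(np.sin(theta))
    return l * c + w * s, l * s + w * c

def fit_obb(abb_w, abb_h):
    """Fit (l, w, theta_1..theta_T) to T observed ABB widths and heights."""
    abb_w = np.asarray(abb_w, dtype=float)
    abb_h = np.asarray(abb_h, dtype=float)
    T = len(abb_w)

    def residuals(params):
        l, w, theta = params[0], params[1], params[2:]
        W, H = enclosing_abb(l, w, theta)
        return np.concatenate([W - abb_w, H - abb_h])

    # Initial guess: footprint from the smallest observed box; headings away
    # from 0 to avoid the non-smooth point of |sin| at theta = 0.
    x0 = np.concatenate([[abb_w.min(), abb_h.min()], np.full(T, 0.3)])
    lower = np.concatenate([[0.1, 0.1], np.full(T, -np.pi)])
    upper = np.concatenate([[np.inf, np.inf], np.full(T, np.pi)])
    sol = least_squares(residuals, x0, bounds=(lower, upper))
    return sol.x[0], sol.x[1], sol.x[2:]

# Toy example: a 4.5 m x 1.8 m vehicle observed under three headings.
W, H = enclosing_abb(4.5, 1.8, np.deg2rad([10.0, 35.0, 60.0]))
l, w, theta = fit_obb(W, H)
print(l, w, np.rad2deg(theta))  # ~4.5, ~1.8; headings up to a 4-fold ambiguity
```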
Note that each orientation resulting from A2OBB's reconstruction is only one of four possible solutions; a subsequent step is required to select the correct one.
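This four-fold ambiguity is a property of the enclosing-box geometry rather than of A2OBB itself: an ABB constrains only |cos θ| and |sin θ|, and both are unchanged when θ is replaced by −θ, 180° − θ, or 180° + θ. All four angles therefore produce exactly the same enclosing ABB, so distinguishing between them requires information beyond the box dimensions.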