The use of machine learning to identify the correctness of HS Code for the customs import declarations

Conference Paper (2021)
Author(s)

Hao Chen (IBM Ireland)

Ben van Rijnsoever (IBM Nederland)

Marcel Molenhuis (Customs Administration of the Netherlands)

Dennis van Dijk (Customs Administration of the Netherlands)

Yao Hua Tan (TU Delft - Information and Communication Technology)

B.D. Rukanova (TU Delft - Information and Communication Technology)

Research Group
Information and Communication Technology
Copyright
© 2021 Hao Chen, Ben Van Rijnsoever, Marcel Molenhuis, Dennis van Dijk, Y. Tan, B.D. Rukanova
DOI related publication
https://doi.org/10.1109/DSAA53316.2021.9564203
Publication Year
2021
Language
English
Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

With the increasing volume of international trade around the world, the number of cross-border import declarations is growing rapidly, resulting in an unprecedented scale of potentially fraudulent transactions, in particular false commodity codes (e.g., HS Codes). An incorrect HS Code creates duty risk and adversely affects revenue collection. Physical investigation by customs administrations is impractical given the substantial quantity of declarations. This paper provides an automatic approach that harnesses machine learning techniques to relieve the burden on customs targeting officers. We introduce a novel model based on an off-the-shelf embedding encoder to identify the correctness of an HS Code without any human effort. Determining whether the HS Code correctly matches the commodity description is a classification task, so labelled data is typically required. However, the lack of gold-standard labelled data sets in the customs domain limits the development of supervised approaches. Our model is developed with an unsupervised mechanism and trained on unlabelled historical declaration records, which makes it robust and easy for different customs administrations to adopt. Rather than classifying whether the HS Code is correct or not, our model predicts a score that indicates the degree to which the HS Code is correct. We have evaluated the proposed model on a ground-truth data set provided by Dutch customs officers. Results show promising performance, with 71% overall accuracy.
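The idea of scoring, rather than classifying, how well an HS Code matches a commodity description can be illustrated with a minimal sketch. This is not the authors' model: the bag-of-words `embed` function is a toy stand-in for an off-the-shelf embedding encoder, the per-code profile and cosine similarity are assumptions chosen for illustration, and the historical records and function names are hypothetical.

```python
import math
from collections import Counter

def embed(text):
    """Toy stand-in for an embedding encoder: bag-of-words term counts."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def build_profiles(history):
    """Aggregate description vectors per declared HS code.

    Uses only unlabelled historical (hs_code, description) pairs;
    no record is marked as correct or incorrect.
    """
    profiles = {}
    for hs_code, description in history:
        profiles.setdefault(hs_code, Counter()).update(embed(description))
    return profiles

def correctness_score(profiles, hs_code, description):
    """Score in [0, 1]: higher means the description resembles past
    declarations under this HS code. Unknown codes score 0.0."""
    profile = profiles.get(hs_code)
    if profile is None:
        return 0.0
    return cosine(embed(description), profile)

# Hypothetical historical declarations (illustrative only).
history = [
    ("0901", "roasted coffee beans"),
    ("0901", "ground coffee arabica"),
    ("8517", "mobile phone smartphone"),
    ("8517", "wireless telephone handset"),
]
profiles = build_profiles(history)

for code in ("0901", "8517"):
    print(code, round(correctness_score(profiles, code, "arabica coffee beans"), 3))
```

In this sketch a declaration of "arabica coffee beans" scores higher under the code whose past descriptions it resembles, so a targeting officer could rank declarations by score instead of relying on a hard correct/incorrect label.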

Files

The_use_of_machine_learning_to... (pdf)
(pdf | 1.12 Mb)
- Embargo expired on 06-04-2022
License info not available