Exploring the Feasibility of Crowd-Powered Decomposition of Complex User Questions in Text-to-SQL Tasks

Conference Paper (2022)
Author(s)

S. Salimzadeh (TU Delft - Web Information Systems)

Ujwal Gadiraju (TU Delft - Web Information Systems)

C. Hauff (TU Delft - Web Information Systems)

A. van Deursen (TU Delft - Software Technology)

Research Group
Web Information Systems
Copyright
© 2022 S. Salimzadeh, Ujwal Gadiraju, C. Hauff, A. van Deursen
DOI
https://doi.org/10.1145/3511095.3531282
Publication Year
2022
Language
English
Pages (from-to)
154-165
ISBN (electronic)
978-1-4503-9233-4
Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

Natural Language Interfaces to Databases (NLIDBs), also known as Text-to-SQL models, enable users with different levels of knowledge of Structured Query Language (SQL) to access relational databases without any programming effort. By translating natural language questions into SQL queries, NLIDBs not only minimize the burden of memorizing database schemas and writing complex SQL queries, but also allow non-experts to retrieve information from databases in natural language. However, existing NLIDBs largely fail to translate complex natural language questions into SQL, preventing them from being deployed in real-world scenarios and from generalizing to unseen complex databases. In this paper, we explored the feasibility of decomposing complex user questions into multiple sub-questions, each with reduced complexity, as a means to circumvent the problem of complex SQL generation. We investigated whether complex user questions can be decomposed such that each sub-question is simple enough for existing NLIDBs to generate correct SQL queries, using non-expert crowd workers in juxtaposition with SQL experts. Through an empirical study on an NLIDB benchmark dataset, we found that crowd-powered decomposition of complex user questions boosted the accuracy of an existing Text-to-SQL pipeline from 30% to 59% (a 96% relative improvement). Similarly, decomposition by SQL experts boosted accuracy to 76% (a 153% relative improvement). Our findings suggest that crowd-powered decomposition can be a scalable alternative for producing the training data necessary to build machine learning models that automatically decompose complex user questions, thereby improving Text-to-SQL pipelines.
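To make the decomposition idea concrete, the sketch below is a minimal, hypothetical illustration (not the paper's actual pipeline or benchmark): a question that requires a complex SQL query with a join, grouping, and a HAVING clause is split into two simpler sub-questions, each answerable by a short query of the kind existing NLIDBs translate more reliably. The schema, table names, and threshold are invented for this example.

```python
import sqlite3

# Toy schema and data (hypothetical; the paper uses an NLIDB benchmark dataset).
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE department (id INTEGER PRIMARY KEY, name TEXT)")
cur.execute("CREATE TABLE employee (id INTEGER, name TEXT, salary REAL, dept_id INTEGER)")
cur.executemany("INSERT INTO department VALUES (?, ?)",
                [(1, "Sales"), (2, "Engineering")])
cur.executemany("INSERT INTO employee VALUES (?, ?, ?, ?)",
                [(1, "Ada", 95000, 2), (2, "Bob", 60000, 1), (3, "Cy", 80000, 2)])

# Complex question: "Which departments have an average salary above 70000?"
# Answering it directly needs a join + aggregation + HAVING, the kind of
# query on which Text-to-SQL models often fail.
complex_sql = """
    SELECT d.name
    FROM department d JOIN employee e ON e.dept_id = d.id
    GROUP BY d.id
    HAVING AVG(e.salary) > 70000
"""
direct = [row[0] for row in cur.execute(complex_sql)]

# Decomposed version: two sub-questions, each mapping to a short query.
# Sub-question 1: "What is the average salary in each department?"
avg_by_dept = cur.execute(
    "SELECT dept_id, AVG(salary) FROM employee GROUP BY dept_id").fetchall()
# The filter on sub-question 1's result is applied outside SQL.
high_ids = [dept_id for dept_id, avg_salary in avg_by_dept if avg_salary > 70000]
# Sub-question 2: "What are the names of the departments with these ids?"
placeholders = ",".join("?" * len(high_ids))
decomposed = [row[0] for row in cur.execute(
    "SELECT name FROM department WHERE id IN (%s)" % placeholders, high_ids)]

print(direct, decomposed)  # both approaches return the same departments
```

Here both routes produce the same answer; the point of the study is that humans (crowd workers or experts) supply the decomposition step, so that each sub-question stays within the complexity range current Text-to-SQL models handle.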