Bi-VAKs: Bi-Temporal Versioning Approach for Knowledge Graphs

Master Thesis (2022)
Author(s)

L.H. Meijer (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Contributor(s)

Christoph Lofi – Mentor (TU Delft - Web Information Systems)

Faculty
Electrical Engineering, Mathematics and Computer Science
Copyright
© 2022 Lisa Meijer
More Info
Publication Year
2022
Language
English
Graduation Date
27-07-2022
Awarding Institution
Delft University of Technology
Programme
Computer Science | Data Science and Technology
Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

Linked Data collections change continuously over time for numerous reasons: users may insert new observations, or they may correct erroneous statements in these knowledge graphs. To avoid losing historically important information, this evolution of Linked Data collections increases the need to version them. Furthermore, retrieving prior versions and the changes between them can provide Linked Data users with relevant information. However, for some changes to these collections we should record both their transaction time and their valid time. To address these two problems (versioning Linked Data collections and handling bi-temporal changes), we introduce the Bi-Temporal Versioning Approach for Knowledge graphs (Bi-VAKs): a prototypical bi-temporal, change-based Version Control System for an arbitrary RDF dataset. Bi-VAKs registers both the transaction time and the valid time of a set of modified quads, and therefore allows for coupled historical and retrospective SPARQL queries. In addition, to enhance collaboration between its users, Bi-VAKs keeps track of provenance data, supports diverged states (branches), and provides a standard data access interface. However, since the standard RDF data model is atemporal, defining such a set of modifications (an update) in RDF poses difficult challenges. Firstly, to indicate the transaction time and the valid time of an update, Bi-VAKs divides a revision (or version) into a transaction revision and a valid revision, which directly separates the metadata from the actual data. Secondly, to denote and retrieve the modified triples/quads within an update, Bi-VAKs uses RDF-star and SPARQL-star. To connect these revisions we develop three reference strategies: the explicit, the implicit, and the combined reference strategy.
These strategies let a transaction revision refer to its corresponding valid revision(s) either explicitly or implicitly, through a shared revision number and branch index. Based on these strategies, we develop several approaches to query the updates from the revision-store, and we propose several methods to construct a (prior) version. However, the existing uni-temporal benchmarks cannot be used to evaluate these design decisions for our bi-temporal versioning approach. Therefore, we extend the BEAR benchmark into a bi-temporal benchmark (Bi-BEAR). Using this benchmark, we demonstrate that all three reference strategies require about the same storage size. We observe that using a snapshot and retrieving all updates worsen the version materialisation (VM), delta materialisation (DM), and version query (VQ) performance. The VM query lookup time decreases considerably if only the matching updates are queried. Modified updates, branches, and a larger number of updates in the revision-store slightly lower the VM, DM, and VQ query performance. For the implicit and combined reference strategies, the query time is roughly the same, and sometimes even better, if we sort the updates instead of aggregating them directly. Overall, the implicit reference strategy performs best, closely followed by the combined reference strategy.
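The abstract does not spell out the internal data model of Bi-VAKs, so the following is only a minimal Python sketch of the bi-temporal idea it describes. It assumes an in-memory list in place of the RDF-star/SPARQL-star revision-store, and every class, field, and function name (`ValidRevision`, `TransactionRevision`, `resolve`, `materialise`) is hypothetical. It shows how keeping transaction revisions (when and by whom an update was recorded) apart from valid revisions (the modified quads and when they hold) lets a version be materialised for a chosen pair of transaction time and valid time, and how the explicit and implicit reference strategies link the two. The combined strategy is not modelled, and the replay order (valid time, then transaction time) is also an assumption.

```python
from dataclasses import dataclass
from datetime import datetime

# Hypothetical, in-memory stand-in for a revision-store (not the Bi-VAKs API).
# A quad: (subject, predicate, object, named graph); plain strings keep it small.
Quad = tuple[str, str, str, str]


@dataclass(frozen=True)
class ValidRevision:
    """Hypothetical valid revision: the modified quads and the time they are valid from."""
    id: str
    revision_number: int
    branch_index: int
    valid_time: datetime
    deletions: frozenset[Quad] = frozenset()
    insertions: frozenset[Quad] = frozenset()


@dataclass(frozen=True)
class TransactionRevision:
    """Hypothetical transaction revision: when and by whom an update was recorded."""
    revision_number: int
    branch_index: int
    transaction_time: datetime
    author: str  # provenance
    valid_refs: tuple[str, ...] = ()  # valid revision ids; explicit strategy only


def resolve(tr, store, strategy):
    """Return the valid revisions that a transaction revision refers to."""
    if strategy == "explicit":
        # The transaction revision names its valid revision(s) directly.
        return [vr for vr in store if vr.id in tr.valid_refs]
    if strategy == "implicit":
        # No stored link: match on the same revision number and branch index.
        return [vr for vr in store
                if (vr.revision_number, vr.branch_index)
                == (tr.revision_number, tr.branch_index)]
    raise ValueError(f"strategy not covered by this sketch: {strategy}")


def materialise(snapshot, transactions, store, tx_time, valid_time,
                branch=0, strategy="implicit"):
    """Rebuild the version as recorded at tx_time and as valid at valid_time."""
    updates = []
    for tr in transactions:
        # Skip updates on other branches or not yet recorded at tx_time.
        if tr.branch_index != branch or tr.transaction_time > tx_time:
            continue
        for vr in resolve(tr, store, strategy):
            if vr.valid_time <= valid_time:
                updates.append((vr.valid_time, tr.transaction_time, vr))
    # Assumed order: sort by valid time, then by transaction time, then replay.
    updates.sort(key=lambda u: (u[0], u[1]))
    version = set(snapshot)
    for _, _, vr in updates:
        version -= vr.deletions
        version |= vr.insertions
    return version


# Usage: a correction recorded in March 2022 that is valid from June 2021.
EX = "http://example.org/"
age_30 = (EX + "alice", EX + "age", "30", EX + "graph")
age_31 = (EX + "alice", EX + "age", "31", EX + "graph")
store = [ValidRevision("v1", 1, 0, datetime(2021, 6, 1),
                       deletions=frozenset({age_30}),
                       insertions=frozenset({age_31}))]
transactions = [TransactionRevision(1, 0, datetime(2022, 3, 1), "editor",
                                    valid_refs=("v1",))]

before = materialise({age_30}, transactions, store,
                     datetime(2022, 1, 1), datetime(2021, 12, 31))
after = materialise({age_30}, transactions, store,
                    datetime(2022, 4, 1), datetime(2021, 12, 31))
```

In this toy example, a transaction time before March 2022 ignores the correction, so `before` still holds the original age, while `after` reflects it. With a later transaction time but a valid time before June 2021, the original value is returned again. In the sketch, the implicit strategy needs no stored link per transaction revision, at the cost of a match on revision number and branch index at query time.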

Files

MSc_thesis_Lisa_Meijer.pdf
(pdf | 6.74 Mb)
License info not available