Sub-document Timestamping:

A Study on the Content Creation Dynamics of Web Documents

Conference Paper (2016)
Author(s)

Yue Zhao (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Claudia Hauff (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Research Group
Web Information Systems
URL related publication
http://10.1007/978-3-319-43997-6_16
Publication Year
2016
Language
English
Pages (from-to)
203-214
Publisher
Springer
ISBN (print)
978-3-319-43996-9
ISBN (electronic)
978-3-319-43997-6

Abstract

The creation time of documents is an important piece of information in temporal information retrieval, especially for document clustering, timeline construction, and search engine improvements. Given the manner in which content on the Web is created, updated, and deleted, the common assumption that each document has only one creation time is not suitable for Web documents. In this paper, we investigate to what extent this assumption is wrong. We introduce two methods to timestamp individual parts (sub-documents) of Web documents and analyze in detail the creation and update dynamics of three classes of Web documents.