Sub-document timestamping of web documents

More Info
expand_more

Abstract

Knowledge about a (Web) document's creation time has been shown to be an important factor in various temporal information retrieval settings. Commonly, it is assumed that such documents were created at a single point in time. While this assumption may hold for news articles and similar document types, it is a clear oversimplification for general Web documents. In this paper, we investigate to what extent (i) this simplifying assumption is violated for a corpus of Web documents, and, (ii) it is possible to accurately estimate the creation time of individual Web documents' components (so-called sub-documents).