Patentopia

A multi-stage patent extraction platform with disambiguation for certain semantic challenges

Conference Paper (2022)
Author(s)

Andrea Belz (University of Southern California)

Alexandra Graddy-Reed (University of Southern California)

FNU Shweta (University of Southern California)

Aleksandar Giga (TU Delft - Delft Centre for Entrepreneurship)

Shivesh Meenakshi Murali (University of Southern California)

Department
Delft Centre for Entrepreneurship
Copyright
© 2022 Andrea Belz, Alexandra Graddy-Reed, FNU Shweta, A. Giga, Shivesh Meenakshi Murali
DOI related publication
https://doi.org/10.1109/BigData55660.2022.10020918
Publication Year
2022
Language
English
Bibliographical Note
Green Open Access added to TU Delft Institutional Repository 'You share, we take care!' - Taverne project https://www.openaccess.nl/en/you-share-we-take-care Otherwise as indicated in the copyright section: the publisher is the copyright holder of this work and the author uses the Dutch legislation to make this work public.
Pages (from-to)
3478-3485
ISBN (print)
978-1-6654-8046-8
ISBN (electronic)
978-1-6654-8045-1
Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

Bibliographic name disambiguation is a major semantic challenge, yet it is critical to social science studies of important intellectual assets. Here we contribute to innovation research in several ways. We show a significant synonym problem in author names and discuss how a heuristic pre-processing step that standardizes name variants helps; homonyms generated with Chinese names, however, are particularly difficult to resolve and manifest in an associated location list. We also identify a new phenomenon of "onomastic profusion," the frequent use of certain words in firm names for semantic reasons, which can confound disambiguation clustering algorithms. We illustrate these concerns with Patentopia, our customized platform, which accesses the PatentsView portal for the United States Patent and Trademark Office database and is available for free academic use. This multi-stage system uses heuristics in concert with the PatentsView clustering process and reports metadata to further assist analysis. As highly relevant use cases, we illustrate system performance with data derived from two important public innovation programs, I-Corps and Small Business Innovation Research (SBIR), and we close with implications for bibliometric analysis of current patent data.
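
To make the synonym problem concrete, the following is a minimal sketch of what a heuristic pre-processing step for standardizing name variants might look like. It is not the Patentopia implementation: the function names `standardize_name` and `group_variants`, the suffix list, and the specific rules (comma reordering, accent stripping, suffix removal) are all assumptions chosen for illustration.

```python
import re
import unicodedata
from collections import defaultdict

# Hypothetical suffix list for illustration; not taken from the paper.
SUFFIXES = {"jr", "sr", "ii", "iii", "phd"}


def standardize_name(raw: str) -> str:
    """Return a normalized key for a person name (illustrative heuristic only)."""
    # Treat "Last, First" as "First Last" by swapping around the first comma.
    if "," in raw:
        last, _, first = raw.partition(",")
        raw = f"{first} {last}"
    # Decompose accented characters and drop the combining marks.
    text = unicodedata.normalize("NFKD", raw)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    # Lowercase and replace anything that is not a letter or whitespace.
    text = re.sub(r"[^a-z\s]", " ", text.lower())
    # Drop generational and honorific suffixes, then collapse whitespace.
    tokens = [tok for tok in text.split() if tok not in SUFFIXES]
    return " ".join(tokens)


def group_variants(names):
    """Group raw name strings that share the same standardized key."""
    groups = defaultdict(list)
    for name in names:
        groups[standardize_name(name)].append(name)
    return dict(groups)
```

A step like this merges synonyms such as "Smith, John Jr." and "JOHN SMITH" under one key. It does nothing for homonyms: two different inventors who share a name still map to the same key, which is why the abstract points to additional signals, such as the associated location list, for those cases.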

Files

Patentopia_A_multi_stage_paten... (pdf)
(pdf | 1.3 Mb)
- Embargo expired on 26-07-2023
License info not available