PathMiner

A library for mining of path-based representations of code

Conference Paper (2019)
Author(s)

Vladimir Kovalenko (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Egor Bogomolov (National Research University Higher School of Economics (HSE University))

Timofey Bryksin (National Research University Higher School of Economics (HSE University))

Alberto Bacchelli (University of Zurich)

Research Group
Software Engineering
DOI related publication
https://doi.org/10.1109/MSR.2019.00013 (final published version)
Publication Year
2019
Language
English
Article number
8816777
Pages (from-to)
13-17
ISBN (print)
978-1-7281-3370-6
ISBN (electronic)
978-1-7281-3412-3
Event
16th IEEE/ACM International Conference on Mining Software Repositories, MSR 2019 (2019-05-26 - 2019-05-27), Montreal, Canada
Downloads counter
312
Collections
Institutional Repository
Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

One recent, significant advance in modeling source code for machine learning algorithms has been the introduction of the path-based representation, an approach that represents a snippet of code as a collection of paths from its syntax tree. Such a representation efficiently captures the structure of code, which, in turn, carries its semantics and other information. Building the path-based representation involves parsing the code and extracting the paths from its syntax tree; together, these steps add up to a substantial technical job. Because no common reusable toolkit exists for this task, the burden of mining diverts researchers from the essential work and hinders newcomers to the field of machine learning on code. In this paper, we present PathMiner, an open-source library for mining path-based representations of code. PathMiner is fast, flexible, well-tested, and easily extensible to support input code in any common programming language. Preprint [https://doi.org/10.5281/zenodo.2595271]; released tool [https://doi.org/10.5281/zenodo.2595257].
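
To make the two steps named in the abstract concrete (parsing the code, then extracting paths from its syntax tree), the following is a minimal sketch in Python built on the standard `ast` module. It is not PathMiner's API or implementation. The function names (`extract_path_contexts`, `leaf_chains`, `token`), the `^` (up) and `v` (down) notation, and the choice to treat `Name` and `Constant` nodes as leaves are all assumptions made for illustration.

```python
import ast
from itertools import combinations


def token(node):
    """Hypothetical leaf label: identifier, literal, or node type name."""
    if isinstance(node, ast.Name):
        return node.id
    if isinstance(node, ast.Constant):
        return repr(node.value)
    if isinstance(node, ast.arg):
        return node.arg
    return type(node).__name__


def children(node):
    # Assumption: ignore Load/Store/Del context markers so that
    # identifiers become leaves of the tree.
    return [c for c in ast.iter_child_nodes(node)
            if not isinstance(c, ast.expr_context)]


def leaf_chains(tree):
    """Return the root-to-leaf chain of nodes for every leaf, left to right."""
    chains = []

    def walk(node, chain):
        chain = chain + [node]
        kids = children(node)
        if not kids:
            chains.append(chain)
        for kid in kids:
            walk(kid, chain)

    walk(tree, [])
    return chains


def extract_path_contexts(code):
    """Parse `code` and return (start token, path, end token) triples,
    one for each pair of leaves in the syntax tree."""
    tree = ast.parse(code)
    contexts = []
    for a, b in combinations(leaf_chains(tree), 2):
        # Length of the shared prefix; its last node is the lowest
        # common ancestor of the two leaves.
        i = 0
        limit = min(len(a), len(b)) - 1
        while i < limit and a[i] is b[i]:
            i += 1
        up = [type(n).__name__ for n in reversed(a[i:])]
        top = type(a[i - 1]).__name__
        down = [type(n).__name__ for n in b[i:]]
        path = " ^ ".join(up + [top]) + " v " + " v ".join(down)
        contexts.append((token(a[-1]), path, token(b[-1])))
    return contexts


if __name__ == "__main__":
    for start, path, end in extract_path_contexts("x = y + 1"):
        print(start, "|", path, "|", end)
```

Each triple pairs two leaf tokens with the sequence of node types that connects them through their lowest common ancestor. A practical miner would typically also bound the number of extracted paths, since the number of leaf pairs grows quadratically with the size of the snippet; this sketch omits such limits.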

Files

Pathminer_preprint.pdf
(pdf | 0.586 Mb)
License info not available