Ontology alignment is widely used to find the correspondences between different ontologies in diverse fields. After discovering the alignments, several performance scores are available to evaluate them. The scores typically require the identified alignment and a reference containing the underlying actual correspondences of the given ontologies. The current trend in alignment evaluation is to put forward a new score (e.g., precision, weighted precision, semantic precision, etc.) and to compare various alignments by juxtaposing the obtained scores. However, it is genuinely challenging to select one measure among others for comparison. On top of that, the claim that one system performs better than another cannot be substantiated solely by comparing two scalars. In this article, we propose statistical procedures that make it possible to favor one system over another on a sound theoretical basis. McNemar's test is the statistical means by which two ontology alignment systems are compared over one matching task. The test applies to a 2 × 2 contingency table, which can be constructed in two different ways based on the alignments, each of which has its own merits and pitfalls. The ways of constructing the contingency table and various apposite statistics derived from McNemar's test are elaborated in detail. When more than two alignment systems are compared, the family-wise error rate is likely to be inflated; thus, ways of controlling this error are also discussed. A directed graph visualizes the outcome of McNemar's test in the presence of multiple alignment systems. From this graph, it is readily seen whether one system is better than another or whether their differences are imperceptible. The proposed statistical methodologies are applied to the systems that participated in the OAEI 2016 anatomy track, and several well-known similarity metrics are also compared on the same matching problem. © 2018 ACM.
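
The comparison described above can be illustrated with a minimal sketch. The idea is that, for each candidate correspondence, we record whether each of the two systems classified it correctly against the reference alignment; the discordant counts (one system correct, the other wrong) populate the off-diagonal cells of the 2 × 2 contingency table, from which the continuity-corrected McNemar statistic and the exact binomial p-value follow. The function name, inputs, and example data below are illustrative assumptions, not the authors' implementation.

```python
from math import comb

def mcnemar(correct_a, correct_b):
    """McNemar's test from per-correspondence correctness of two matchers.

    correct_a, correct_b: boolean sequences indicating whether system A
    (resp. B) handled the i-th candidate correspondence correctly with
    respect to the reference alignment.
    Returns the continuity-corrected chi-square statistic and the exact
    two-sided binomial p-value.
    """
    # Discordant cells of the 2x2 table:
    # b: A correct, B wrong; c: A wrong, B correct.
    b = sum(a and not x for a, x in zip(correct_a, correct_b))
    c = sum(x and not a for a, x in zip(correct_a, correct_b))
    n = b + c
    # Continuity-corrected chi-square statistic (1 degree of freedom).
    chi2 = (abs(b - c) - 1) ** 2 / n if n else 0.0
    # Exact two-sided p-value under Binomial(n, 0.5) on the discordant pairs.
    k = min(b, c)
    p = min(1.0, 2 * sum(comb(n, i) for i in range(k + 1)) / 2 ** n) if n else 1.0
    return chi2, p

# Hypothetical example: 15 candidate correspondences, 10 discordant (b=8, c=2).
chi2, p = mcnemar([True] * 8 + [False] * 2 + [True] * 5,
                  [False] * 8 + [True] * 2 + [True] * 5)
```

Only the discordant pairs enter the statistic, which is why McNemar's test is apt here: correspondences that both systems find (or both miss) carry no information about which system is better.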