How do Metric Score Distributions affect the Type I Error Rate of Statistical Significance Tests in Information Retrieval?