Evidence-based software portfolio management: a tool description and evaluation

Context: In this paper we describe and evaluate a tool for Evidence-Based Software Portfolio Management (EBSPM) that we developed over time in close cooperation with software practitioners from The Netherlands and Belgium. Objectives: The goal of the EBSPM-tool is to measure, analyze, and benchmark the performance of interconnected sets of software projects in terms of size, cost, duration, and number of defects, in order to support innovation of a company's software delivery capability. The tool supports building and maintaining a research repository of finalized software projects from different companies, business domains, and delivery approaches. Method: The tool consists of two parts: first, a Research Repository, currently holding data on 490 finalized software projects from four different companies; second, a Performance Dashboard, built around a so-called Cost Duration Matrix. Results: We evaluated the tool by describing its use in two practical applications in industrial case studies. Conclusions: We show that the EBSPM-tool can be used successfully in an industrial context, especially for its benchmarking and visualization purposes.


INTRODUCTION
Benchmarking is an important part of learning how and where to innovate for software companies. In this paper we describe a software project benchmark tool that compares the performance of software projects in terms of cost, duration, number of defects, and size with a measurement repository of finalized software projects from different companies.
Over a period of seven years, and continuing, we have collected performance data of finalized software projects in industry, in close cooperation with a number of large banking and telecom companies in The Netherlands and Belgium. Based on this we built a research repository of core metrics data of, by now, approximately 500 software projects.

We noticed four shortcomings
While collecting and analyzing project data in industrial practice we experienced four major shortcomings.
First, many software companies and software researchers view the success and failure of software projects in a way that is strongly tied to realizing the estimated plan, as described, for example, in the Standish CHAOS research [2]. In recent research we argue that this focus might be misguided, because finalizing a project according to its plan does not imply that the achieved performance is good too (perhaps the plan was simply bad) [3].
Second, although many software benchmark repositories are available (Jones [4] identified in 2011 no fewer than twenty-five sources of software benchmarks; Menzies and Zimmermann [5] mention thirteen repositories of software engineering data), we experience in practice that many practitioners and researchers struggle with the question of how to convince decision makers based on facts and evidence from benchmark repositories. Although some more or less open solutions exist (e.g. ISBSG, Promise), the majority of the benchmark sources are data collections available from commercial companies. Most benchmark sources give no insight into the raw data; commercial benchmarks in particular tend to offer trends and aggregated data only.
Third, almost none of the benchmark sources include cost data of software projects as a basis for productivity. All benchmarks use effort as the core metric for productivity indicators. In itself, the choice of effort as a core metric for productivity is sound. In the practice of collecting data for our research repository, however, we noticed that it is challenging to collect reliable effort data for software projects.
Fourth, in practice this vast variety of benchmark sources makes it difficult for practitioners to decide how to implement a mature measurement and analysis capability that suits the needs of a company's decision makers. As an effect of, on the one hand, the great variety of benchmark approaches and, on the other, the low degree of standardization, we observe that a change of measurement and analysis approach can often be linked to changes in decision makers and changes in the primary development approach. As an example, a misconception apparently common in many companies is that "going agile means opting for a new measurement approach too".

A conceptually new solution
In order to deal with these shortcomings, driven by the somewhat lagging evidence-based software engineering capability in both research and industry [6], and by our conviction that software companies should pay more attention to evidence-based software engineering from a software portfolio point of view, we developed a conceptually new instrument: the Cost Duration Matrix. The primary goal of this matrix is to identify good practice software deliveries and bad practice software projects within the scope of a company's software delivery portfolio as a whole, or within the scope of a broader, industry-wide portfolio. Based on this subdivision we performed further analysis of factors that could be strongly related to good practice and factors that could be linked to bad practice [1].
In our analysis approach we compare all sorts of software projects, whether plan-driven projects, repeating iterations in a release, or deliveries after one or more sprints in Scrum, as part of a software portfolio as a whole. In earlier research we showed how analyzing software delivery portfolios in a binary way (e.g. better than average or worse than average), from the angle of different metrics (e.g. cost, duration, defects), can help software companies define success or failure and understand where realistic improvements are achievable [1].
We named our approach Evidence-Based Software Portfolio Management (EBSPM). Hence, we refer to our tool as the EBSPM-tool. In particular, the main contributions of the tool are twofold. First, it positions a Cost Duration Matrix as core instrument within a Performance Dashboard for analysis of good practice and bad practice in company-wide portfolios of software projects. Second, it provides a research repository, holding data of industrial software projects from different companies, on a standardized set of metrics: size, cost, duration, and defects.
The remainder of this paper is organized in the following way: In Section 2 we outline relevant prior work. In Section 3 we describe the EBSPM-tool and its functional components. In order to evaluate the EBSPM-tool we describe in Section 4 how the tool was used in two scenarios in industrial practice. In Section 5 we discuss the applicability of the tool for research purposes, in a practical setting in industry, and in education. Finally, Section 6 includes a summary of the current status of the EBSPM-tool.

RELATED WORK
Although many sources for benchmarking of software engineering are available [4] [5], the need for good economic models will grow rather than diminish as software becomes increasingly ubiquitous [7]. There is a lack of successful and agreed-upon standard metric sets and selection processes [8]. We assume that, in particular, follow-up studies with regard to software estimation techniques and algorithms [

THE EBSPM-TOOL
The EBSPM-tool offers two basic (initial) features, a Research Repository, and a Performance Dashboard, including a Cost Duration Matrix. These initial features are described in the following paragraphs.

Research Repository
An important feature of the EBSPM-tool is the availability of a research repository, including data of approximately 500 finalized software projects from different software companies in The Netherlands and Belgium, which is the source for the benchmark functionality. Table 1 gives an overview of the research repository, including the different aspects that are analyzed in our research. The data in the research repository is stored in a MS Excel file.
All software deliveries within the repository were measured over a period of nine years in four different companies (banking, telecom, supplier of a business-to-business software solution). Deliveries were measured by experienced, often certified, specialists. Delivery data was based on formal project administrations and reviewed by stakeholders (e.g. project managers, product owners, finance departments, project support). All deliveries were reviewed thoroughly by the lead author before they were included in the research repository. An important difference between the data in the EBSPM research repository and other software engineering repositories is that, where possible, we collected a company's software project portfolio as a whole. Where other repositories focus on projects, we focus on portfolios instead. This enables us to analyze good practice versus bad practice. A second major difference is that, instead of focusing on collecting effort data of software projects, we collect cost data besides effort data.
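The standardized metric set per delivery (size, cost, duration, defects, plus descriptive attributes such as company and delivery approach) can be pictured as a simple record type. The field names below are our own illustrative assumptions, not the actual columns of the Excel repository:

```python
from dataclasses import dataclass

@dataclass
class ProjectRecord:
    """One finalized software delivery in the research repository.
    Field names are illustrative assumptions, not the real schema."""
    company: str            # anonymized company identifier
    domain: str             # business domain, e.g. "banking", "telecom"
    method: str             # delivery approach, e.g. "scrum", "plan-driven"
    size_fp: float          # functional size in function points (FPs)
    cost: float             # total delivery cost
    duration_months: float  # elapsed calendar time
    defects: int            # number of defects found

    def is_valid(self) -> bool:
        """Basic sanity check before a record enters the repository."""
        return (self.size_fp > 0 and self.cost > 0
                and self.duration_months > 0 and self.defects >= 0)
```

A review step like `is_valid` mirrors, in a minimal way, the manual review of deliveries described above before inclusion in the repository.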

Performance Dashboard
In order to visualize the outcomes of the analysis we built a Performance Dashboard in the Business Intelligence (BI) solution Tableau.

Figure 1. The performance dashboard in the EBSPM-tool showing a sample of 172 projects from the ISBSG [27] repository plotted against the EBSPM research repository. The size of the data points indicates the size of a software project (bigger circles indicate bigger projects in FPs). The color of the data points indicates the quality (redder circles indicate more defects per FP).

The core component within the Performance Dashboard (see Figure 1) is a Cost Duration Matrix. Each software project is depicted as a circle: the larger the circle, the larger the project (in function points), and the redder the circle, the more defects per function point were found. The position of each project in the matrix represents the cost and duration deviation of the delivery relative to the benchmark, expressed as percentages. The horizontal and vertical 0%-lines represent zero deviation, i.e. projects that are exactly consistent with the benchmark. A delivery at (0%, 0%) behaves exactly in accordance with the benchmark; a delivery at (-100%, -100%) would cost nothing and be ready immediately; and a delivery at (+100%, +100%) would be twice as expensive and take twice as long as expected from the benchmark. Based on these percentages all deliveries from the repository are plotted in the Cost Duration Matrix, resulting in four quadrants:
1. Good Practice (upper right): projects that score better than the average of the total repository for both cost and duration.
2. Cost over Time (bottom right): projects that score better than the average of the total repository for cost, yet worse than average for duration.
3. Bad Practice (bottom left): projects that score worse than the average of the total repository for both cost and duration.
4. Time over Cost (upper left): projects that score better than the average of the total repository for duration, yet worse than average for cost.
The overall performance of the portfolio is furthermore summarized by the two red 'median' lines. For each project in the research repository three Key Performance Indicators are calculated as measures of performance: Cost per FP, Duration per FP, and Defects per FP.
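The deviation and quadrant logic described above can be sketched in a few lines. The actual EBSPM benchmark model is not published here, so as an assumption we derive a project's expected cost and duration from the repository's median Cost per FP and Duration per FP; all function and field names are hypothetical:

```python
from statistics import median

def deviations(project, repository):
    """Return (cost_dev, duration_dev) as fractions relative to the
    benchmark: 0.0 means exactly on the benchmark, -1.0 means the
    delivery cost nothing / was ready immediately, +1.0 means twice
    as expensive / twice as long. Records are dicts with the
    hypothetical keys 'size_fp', 'cost', 'duration'."""
    exp_cost = median(p["cost"] / p["size_fp"] for p in repository) * project["size_fp"]
    exp_dur = median(p["duration"] / p["size_fp"] for p in repository) * project["size_fp"]
    return (project["cost"] / exp_cost - 1.0,
            project["duration"] / exp_dur - 1.0)

def quadrant(cost_dev, duration_dev):
    """Classify a project by the sign of its deviations (the 0% lines)."""
    if cost_dev < 0 and duration_dev < 0:
        return "Good Practice"    # cheaper and faster than the benchmark
    if cost_dev < 0:
        return "Cost over Time"   # cheaper, but slower
    if duration_dev < 0:
        return "Time over Cost"   # faster, but more expensive
    return "Bad Practice"         # more expensive and slower
```

For example, a project that costs 20% less than its benchmark expectation but takes 20% longer lands in the Cost over Time quadrant.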

EVALUATION
In order to evaluate the usefulness of the EBSPM-tool, we describe its application in two recent case studies [3] [26] that we performed in industry. Within these studies we used the EBSPM-tool for analysis, benchmarking, and visualization purposes. In both case studies we used the tool to analyze and visualize a specific subset of software projects against the tool's repository.
In the first study [3] we analyzed a sample of 22 software projects that were performed within a Belgian telecom company. All sampled projects were in scope of an electronic survey among stakeholders from IT and business departments that were involved in a project. In the survey we assessed stakeholder satisfaction and perceived value. In the subsequent analysis we compared the outcomes of the survey with quantitative project metrics such as project size, project cost, project duration, number of defects, and Estimation Quality Factor (EQF) for both duration and cost. We calculated a Cost Duration Index, based on the relative position of a project in the Cost Duration Matrix of the EBSPM-tool [3].
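The exact formula of the Cost Duration Index is not given here; see [3] for the definition. Purely as a hypothetical illustration of the idea of collapsing a project's position in the Cost Duration Matrix into one score, one could average the cost and duration deviations (the function name and sign convention below are our assumptions, not the authors' implementation):

```python
def cost_duration_index(cost_dev, duration_dev):
    """Hypothetical single-score summary of a project's position in
    the Cost Duration Matrix. Inputs are fractional deviations from
    the benchmark (e.g. -0.25 means 25% cheaper or faster than
    expected). Higher is better: a project exactly on the benchmark
    scores 0.0; one 100% cheaper and faster scores 1.0."""
    return -(cost_dev + duration_dev) / 2.0
```

Such a score makes projects from different quadrants directly comparable with survey outcomes like stakeholder satisfaction.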
In the second study [26] we performed causal analysis on a sample of nine software releases and eight once-only projects, all performed on the same CRM system in a Belgian telecom company. In order to analyze the performance of each project we interviewed eleven stakeholders from both IT and business on the backgrounds of the projects. The case study resulted in a number of observations from both quantitative and qualitative analysis [26].
In both studies we used the EBSPM-tool to visualize the applicable research sample against the EBSPM research repository as a whole. We used the Performance Dashboard in two ways. First, we included screenshots of the dashboard with a sample of projects in the applicable research papers [3] [26]. Second, we made intensive use of the Performance Dashboard as an interactive visualization during presentations about our case studies to stakeholders within the Belgian telecom company. The fact that the dashboard enables fast interactions and selections on company in scope, business domain, project name, and development method helped us to gain management commitment and to advise on actionable follow-up.

Evaluation of Validity
We observe large differences between the two repositories. The difference in average project size between the repositories might partly explain these. However, within the somewhat limited scope of this tool description we cannot argue for any causes that might explain these dissimilarities. Further research is needed to uncover the backgrounds of these remarkable differences.

Impact / Implications
What can we do with the EBSPM-tool and the results from this short study?
We try to answer this question from three angles: research, practice, and education.
Research. We make our research repository available to other researchers for research and education purposes. We are currently discussing with the cooperating companies how we can make the repository publicly available via Promise, with regard to compliance issues. Based on the example given in Figure 1, we argue that further research is needed on the differences found between the ISBSG data and our research repository.
Practice. Although a vast number of benchmarking sources can already be found, we observe that collecting historic data of software deliveries is in itself a task that teaches data analysts and decision makers in software companies a lot about their software delivery capability. The outcomes of the evaluation of the EBSPM-tool in an industrial context show that companies should be careful about adopting one single source of benchmarking as the truth, especially when it was collected outside their own environment.
As a spin-off from our research on EBSPM, we are currently cooperating with a number of software companies in order to further improve our analysis approach and characteristics of metrics to be collected (e.g. we focus on value and satisfaction too), and to enlarge and further evaluate the content of our research repository.
Education. Software engineering economics, especially from an evidence-based point of view, is usually included in university curricula only as part of other disciplines. We think that the EBSPM-tool can help to clarify the importance of economic aspects of software development to students and teachers, as it visualizes the somewhat diffuse and large content of a company's software portfolio repository in a conveniently arranged way.

CONCLUSIONS
In order to evaluate the EBSPM-tool we looked at the applicability of both the research repository and the Performance Dashboard in an industrial context. We found that the tool supports analysis of software project and portfolio performance. Besides that, the tool enables both internal and external benchmarking. A major difference from other tools is that the EBSPM-tool values project cost data above project effort data.