Measuring the Cost of Open Source Software Innovation on GitHub

Abstract

Open source software (OSS) is software that anyone can inspect, modify, and distribute freely, subject only to very limited restrictions such as attribution. Although OSS is vital to virtually all aspects of modern society, including much of the core infrastructure of the Internet (e.g., Apache HTTP Server has the highest market share among HTTP servers), there is currently no standard methodology for measuring the scope and impact of these intangible assets. GitHub is today the world’s largest code hosting platform, with over 40 million users and 88 million public repositories. This study presents a framework that repurposes GitHub’s administrative data to discover, profile, and measure the development of OSS. The data cover over 5 million original, non-deprecated repositories with a machine-detectable license approved by the Open Source Initiative (OSI). For each repository, we collect metadata such as commits (e.g., author, timestamp, lines added and deleted), license, and information about contributors. Using a cost-approach method from software engineering, we harmonize this information so that it can be compared with current data on software investment in the US national accounts. To that end, we develop a methodology that attributes direct contributions to US-based entities and classifies these contributors into economic sectors, making the estimates comparable with the national accounts framework. Our current estimate of 2019 US investment in OSS is $34 billion. Lastly, we discuss what our findings suggest about how well the current national accounts framework captures OSS and potential ways to improve it.

Date
2022-01-07 15:45 — 2022-01-09 17:45
Location
Remote (available on-demand through the ASSA 2022 App)