Assessing the Quality of Research Data Infrastructure Software

  • Smit, Mike (PI)

Project: Research project

Project Details

Description

Science is evolving rapidly, incorporating technologies such as autonomous vehicles, high-throughput scientific instruments, high-fidelity numerical models, and sensor networks, all of which generate data of increasing frequency, variety, and volume. Scientists want (or are compelled) to share these data, which requires research data infrastructure (RDI: digital infrastructure organized to promote data sharing and consumption in support of research). RDI software is often homegrown: created and deployed by people who have not received formal training in software engineering, or by organizations whose primary mandate is something other than software development. These developers are often also users, adding features as they or their colleagues identify a need. Our understanding of software engineering as a field and practice does not universally translate to this software. The software serves a growing set of users, but no one has systematically assessed its maintainability, longevity, technical debt, community resilience, or other indicators of long-term health, nor is it known whether existing approaches to assessing these qualities are effective for RDI software. As RDI grows in importance and complexity, we must be confident in its reliability and accuracy. We know that atypical development processes can yield high-quality results, but we also know that RDI software is being stretched by new expectations for features such as performance at scale, automated data cleaning, and visualization. This project will apply software quality metrics for maintainability, technical debt, sustainability, and the use of best practices to RDI software. These metrics have been researched extensively for software created by professional developers. We will identify the most relevant metrics and tools and apply them to case studies based on RDI for ocean science.
We will conduct ethnographic studies to compare what the metrics tell us with the lived experience of user communities and developers. Based on the results, we will recommend metrics that work well and produce results that match reality. Software quality metrics will have a practical impact on the quality of RDI software only if developers benefit from using them. We will study how these metrics are perceived and integrated into development processes for RDI software, in comparison with existing studies of open source, commercial, and scientific software. Good science needs good data. Findable, Accessible, Interoperable, and Reusable (FAIR) data is achievable only if we have research data infrastructure that is maintainable, reliable, and usable. While RDI is somewhat fragmented at present, the long-term vision is a global network that links research data across disciplines and borders. A necessary precursor is ensuring that our software is up to this task. This project will contribute significantly to meeting this need, while also advancing empirical software engineering research.

Status: Active
Effective start/end date: 1/1/22 → …

ASJC Scopus Subject Areas

  • Software
  • Information Systems