Project Details
Description
Motivation and Objectives: According to Fortune Business Insights, the global market for machine learning (ML) software is projected to reach $117 billion by 2027. ML has found applications in major areas including healthcare, transportation, and business analytics. However, the technology has yet to mature and can be unreliable. Bugs in ML software are highly complex and hard to resolve because of their data-driven, non-deterministic nature, and they can have deadly consequences (e.g., the fatal crash of Uber's self-driving car). The standard procedure for correcting bugs is labour-intensive and inefficient, taking up ~50% of a developer's time. Most of this time is spent finding and understanding the faulty code before any code is actually changed. To date, many approaches have been designed to find and fix bugs in traditional software, but most are not accurate enough and are ill-suited to ML software. My research program aims to design intelligent frameworks for understanding, finding, and reproducing bugs in ML software.

Research Plan and Methodology: This proposal encompasses three complementary activities: (1) advancing the current understanding of ML bugs, (2) finding bugs in ML software, and (3) reproducing the identified bugs for a reliable diagnosis. First, we will construct a large dataset from ML applications hosted on GitHub to systematically study the characteristics and central challenges of ML bugs and to analyze how effective traditional debugging solutions are against them. Second, we will design an intelligent framework that can (a) detect faulty components using intelligent information retrieval methods, (b) detect the faulty code within these components using their static properties and dynamic behaviours, and (c) complement these results with meaningful explanations (e.g., type of bug). Third, we will design an intelligent framework that can (a) help a developer understand how a bug might be triggered, and (b) deliver appropriate test cases to reproduce the identified bugs in ML software using reinforcement learning and a technology sandbox.

Novelty and Expected Significance: This research program has three novel aspects: (a) intelligent debugging support for ML software, (b) extension of developers' cognitive abilities with machine intelligence, and (c) enrichment of tools' results with complementary information. It will advance the current state of research on cost-effective debugging and will also benefit parallel practices such as change management. My research will also produce tools that will be adopted by industry, for instance through my collaborations with Mozilla Corporation and Canadian software companies. By supporting developers in solving ML bugs efficiently and by providing high-quality training to students in an area of acute need, this program will assist in the development of safe, reliable ML software and contribute significantly to the Canadian economy.
Status | Active |
---|---|
Effective start/end date | 1/1/22 → … |
Funding
- Natural Sciences and Engineering Research Council of Canada: US$22,284.00
ASJC Scopus Subject Areas
- Artificial Intelligence
- Information Systems