Software systems continue their exponential growth in size and complexity. With this growth, software quality and maintenance have become major concerns. Not only does maintenance effort often eclipse the effort of initial development, but the maintenance burden has also become the reason why many software systems become unsustainable over time.
Software engineers apply many refactoring techniques to maintain reasonable code quality as software evolves. These techniques are typically applied reactively, only after quality has degraded and maintenance has become a concern. My research aims to identify quality concerns and codebase degradation before they materialize, so that refactorings can be applied proactively, well before deficiencies emerge.
There is a significant and growing body of research that aims to predict software failures and bugs using many techniques, including machine learning. My research will build on these advances but will focus on predicting code quality attributes rather than failures and bugs. One paper in particular, by Liu et al. [14], ignited my interest in this research area. In this paper, the authors applied a deep learning technique using several neural networks to predict the emergence of code smells. They used a data set from the Quality Corpus Repository [17] and extracted several code quality features from it. The approach demonstrated the significant potential of machine learning for predicting code smells over time, and the authors found it to be significantly superior to the most prevalent techniques and code analysis tools.
There are numerous open questions and challenges in this research area. The first major challenge is the availability of sufficiently large and appropriately labeled training data sets. The second is identifying the features that influence how code quality evolves over time. The third is identifying and evaluating the most appropriate and effective machine learning techniques for predicting different code quality attributes. I expect that my research will 1) create a large number of labeled data sets that can help propel research in this area, 2) identify the key features that have the most impact on code quality and its evolution, and 3) help establish how effective several machine learning techniques are at delivering the desired predictive code quality analysis.
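To make the second goal more concrete, the following is a minimal sketch of how candidate features could be ranked against a code smell label. It is not the deep learning approach of [14]; it swaps in a simple correlation ranking for illustration only. The metric names (`loc`, `complexity`, `params`), the label `smelly_next`, and all values are hypothetical and invented for this example.

```python
from statistics import mean, pstdev

# Hypothetical toy data (values are made up): each row describes one class in
# release N; `smelly_next` marks whether a smell appeared in release N+1.
SAMPLES = [
    {"loc": 120, "complexity": 4, "params": 2, "smelly_next": 0},
    {"loc": 300, "complexity": 6, "params": 3, "smelly_next": 0},
    {"loc": 150, "complexity": 5, "params": 1, "smelly_next": 0},
    {"loc": 500, "complexity": 18, "params": 2, "smelly_next": 1},
    {"loc": 200, "complexity": 20, "params": 3, "smelly_next": 1},
    {"loc": 450, "complexity": 22, "params": 1, "smelly_next": 1},
]


def correlation(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    mx, my = mean(xs), mean(ys)
    sx, sy = pstdev(xs), pstdev(ys)
    if sx == 0 or sy == 0:
        return 0.0
    cov = mean([(x - mx) * (y - my) for x, y in zip(xs, ys)])
    return cov / (sx * sy)


def rank_features(samples, label):
    """Return (feature, |correlation with label|) pairs, strongest first."""
    features = [name for name in samples[0] if name != label]
    ys = [row[label] for row in samples]
    scores = {
        name: abs(correlation([row[name] for row in samples], ys))
        for name in features
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    for name, score in rank_features(SAMPLES, "smelly_next"):
        print(f"{name}: {score:.2f}")
```

A ranking like this only captures linear, one-feature-at-a-time association. In the research itself, it would at most serve as a baseline against which learned models are compared.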
- Seminar Lecture on an IEEE transaction paper – [ Presentation ]