On correlations between vulnerabilities, quality-, and design-metrics
- Over the past few decades society’s dependence on software systems has grown significantly. These systems are utilized in nearly every matter of life today and often handle sensitive, private data. This situation has turned software security analysis into an essential and widely researched topic in the field of computer science. Researchers in this field tend to make the assumption that the quality of the software systems' code directly affects the possibility for security gaps to arise in it. Because this assumption is based on properties of the code, proving it true would mean that security assessments can be performed on software, even before a certain version of it is released. A study based on this implication has already attempted to mathematically assess the existence of such a correlation, studying it based on quality and security metric calculations. The present study builds upon that study in finding an automatic method for choosing well-fitted software projects as a sample for this correlation analysis and extends the variety of projects considered for the it. In this thesis, the automatic generation of graphical representations both for the correlations between the metrics as well as for their evolution is also introduced. With these improvements, this thesis verifies the results of the previous study with a different and broader project input. It also focuses on analyzing the correlations between the quality and security metrics to real-world vulnerability data metrics. The data is extracted and evaluated from dedicated software vulnerability information sources and serves to represent the existence of proven security weaknesses in the studied software. The study discusses some of the difficulties that arise when trying to gather such information and link it to the difference in the information contained in the repositories of the studied projects. This thesis confirms the significant influence that quality metrics have on each other. It also shows that it is important to view them together as a whole and suppose that their correlation could influence the appearance of unwanted vulnerabilities as well. One of the important conclusions I can draw from this thesis is that the visualization of metric evolution graphs, helps the understanding of the values as well as their connection to each other in a more meaningful way. It allows for better grasp of their influence on each other as opposed to only studying their correlation values. This study confirms that studying metric correlations and evolution trends can help developers improve their projects and prevent them from becoming difficult to extend and maintain, increasing the potential for good quality as well as more secure software code.