OPUS 4 | Search

Technical and Methodological Improvements to Mining Software Repositories (2024)

Härtel, Johannes

Empirical studies in software engineering use software repositories as data sources to understand software development. Repository data is either used to answer questions that guide the decision-making in the software development, or to provide tools that help with practical aspects of developers’ everyday work. Studies are classified into the field of Empirical Software Engineering (ESE), and more specifically into Mining Software Repositories (MSR). Studies working with repository data often focus on their results. Results are statements or tools, derived from the data, that help with practical aspects of software development. This thesis focuses on the methods and high order methods used to produce such results. In particular, we focus on incremental methods to scale the processing of repositories, declarative methods to compose a heterogeneous analysis, and high order methods used to reason about threats to methods operating on repositories. We summarize this as technical and methodological improvements. We contribute the improvements to methods and high-order methods in the context of MSR/ESE to produce future empirical results more effectively. We contribute the following improvements. We propose a method to improve the scalability of functions that abstract over repositories with high revision count in a theoretically founded way. We use insights on abstract algebra and program incrementalization to define a core interface of highorder functions that compute scalable static abstractions of a repository with many revisions. We evaluate the scalability of our method by benchmarks, comparing a prototype with available competitors in MSR/ESE. We propose a method to improve the definition of functions that abstract over a repository with a heterogeneous technology stack, by using concepts from declarative logic programming and combining them with ideas on megamodeling and linguistic architecture. We reproduce existing ideas on declarative logic programming with languages close to Datalog, coming from architecture recovery, source code querying, and static program analysis, and transfer them from the analysis of a homogeneous to a heterogeneous technology stack. We provide a prove-of-concept of such method in a case study. We propose a high-order method to improve the disambiguation of threats to methods used in MSR/ESE. We focus on a better disambiguation of threats, operationalizing reasoning about them, and making the implications to a valid data analysis methodology explicit, by using simulations. We encourage researchers to accomplish their work by implementing ‘fake’ simulations of their MSR/ESE scenarios, to operationalize relevant insights about alternative plausible results, negative results, potential threats and the used data analysis methodologies. We prove that such way of simulation based testing contributes to the disambiguation of threats in published MSR/ESE research.

Decentralized Fair Data Exchange with Minimal Mutual Trust using Distributed Ledgers (2024)

Lohr, Matthias

In international business relationships, such as international railway operations, large amounts of data can be exchanged among the parties involved. For the exchange of such data, a limited risk of being cheated by another party, e.g., by being provided with fake data, as well as reasonable cost and a foreseeable benefit, is expected. As the exchanged data can be used to make critical business decisions, there is a high incentive for one party to manipulate the data in its favor. To prevent this type of manipulation, mechanisms exist to ensure the integrity and authenticity of the data. In combination with a fair exchange protocol, it can be ensured that the integrity and authenticity of this data is maintained even when it is exchanged with another party. At the same time, such a protocol ensures that the exchange of data only takes place in conjunction with the agreed compensation, such as a payment, and that the payment is only made if the integrity and authenticity of the data is ensured as previously agreed. However, in order to be able to guarantee fairness, a fair exchange protocol must involve a trusted third party. To avoid fraud by a single centralized party acting as a trusted third party, current research proposes decentralizing the trusted third party, e.g., by using a distributed ledger based fair exchange protocol. However, for assessing the fairness of such an exchange, state-of-the-art approaches neglect costs arising for the parties conducting the fair exchange. This can result in a violation of the outlined expectation of reasonable cost, especially when distributed ledgers are involved, which are typically associated with non-negligible costs. Furthermore, the performance of typical distributed ledger-based fair exchange protocols is limited, posing an obstacle to widespread adoption. To overcome the challenges, in this thesis, we introduce the foundation for a data exchange platform allowing for a fully decentralized fair data exchange with reasonable cost and performance. As a theoretical foundation, we introduce the concept of cost fairness, which considers cost for the fairness assessment by requesting that a party following the fair exchange protocol never suffers any unilateral disadvantages. We prove that cost fairness cannot be achieved using typical public distributed ledgers but requires customized distributed ledger instances, which usually lack complete decentralization. However, we show that the highest unilateral cost are caused by a grieving attack. To allow fair data exchanges to be conducted with reasonable cost and performance, we introduce FairSCE, a distributed ledger-based fair exchange protocol using distributed ledger state channels and incorporating a mechanism to protect against grieving attacks, reducing the possible unilateral cost that have to be covered to a minimum. Based on our evaluation of FairSCE, the worst-case cost for data exchange, even in the presence of malicious parties, is known, which allows an estimate of the possible benefit and, thus, the preliminary estimate of economic utility. Furthermore, to allow for an unambiguous assessment of the correct data being transferred while still allowing for sensitive parts of the data to be masked, we introduce an approach for the hashing of hierarchically structured data, which can be used to ensure integrity and authenticity of the data being transferred.

Applications for Symbol Elimination in Combination with Hierarchical Reasoning (2024)

Peuter, Dennis

The goal of this PhD thesis is to investigate possibilities of using symbol elimination for solving problems over complex theories and analyze the applicability of such uniform approaches in different areas of application, such as verification, knowledge representation and graph theory. In the thesis we propose an approach to symbol elimination in complex theories that follows the general idea of combining hierarchical reasoning with symbol elimination in standard theories. We analyze how this general approach can be specialized and used in different areas of application. In the verification of parametric systems it is important to prove that certain safety properties hold. This can be done by showing that a property is an inductive invariant of the system, i.e. it holds in the initial state of the system and is invariant under updates of the system. Sometimes this is not the case for the condition itself, but for a stronger condition it is. In this thesis we propose a method for goal-directed invariant strengthening. In knowledge representation we often have to deal with huge ontologies. Combining two ontologies usually leads to new consequences, some of which may be false or undesired. We are interested in finding explanations for such unwanted consequences. For this we propose a method for computing interpolants in the description logics EL and EL⁺, based on a translation to the theory of semilattices with monotone operators and a certain form of interpolation in this theory. In wireless network theory one often deals with classes of geometric graphs in which the existence or non-existence of an edge between two vertices in a graph relies on properties on their distances to other nodes. One possibility to prove properties of those graphs or to analyze relations between the graph classes is to prove or disprove that one graph class is contained in the other. In this thesis we propose a method for checking inclusions between geometric graph classes.

Refine

Author

Year of publication

Document Type

Language

Institute

3 search hits