Master's Thesis
Refine
Year of publication
- 2013 (2) (remove)
Document Type
- Master's Thesis (2) (remove)
Language
- English (2) (remove)
Keywords
- Text Analysis (1)
- Text Mining (1)
Large amounts of qualitative data make the utilization of computer-assisted methods for their analysis inevitable. In this thesis Text Mining as an interdisciplinary approach, as well as the methods established in the empirical social sciences for analyzing written utterances are introduced. On this basis a process of extracting concept networks from texts is outlined and the possibilities of utilitzing natural language processing methods within are highlighted. The core of this process is text processing, to whose execution software solutions supporting manual as well as automated work are necessary. The requirements to be met by these solutions, against the background of the initiating project GLODERS, which is devoted to investigating extortion racket systems as part of the global fiσnancial system, are presented, and their fulσlment by the two most preeminent candidates reviewed. The gap between theory and pratical application is closed by a prototypical application of the method to a data set of the research project utilizing the two given software solutions.
We present the conceptual and technological foundations of a distributed natural language interface employing a graph-based parsing approach. The parsing model developed in this thesis generates a semantic representation of a natural language query in a 3-staged, transition-based process using probabilistic patterns. The semantic representation of a natural language query is modeled in terms of a graph, which represents entities as nodes connected by edges representing relations between entities. The presented system architecture provides the concept of a natural language interface that is both independent in terms of the included vocabularies for parsing the syntax and semantics of the input query, as well as the knowledge sources that are consulted for retrieving search results. This functionality is achieved by modularizing the system's components, addressing external data sources by flexible modules which can be modified at runtime. We evaluate the system's performance by testing the accuracy of the syntactic parser, the precision of the retrieved search results as well as the speed of the prototype.