Graphs are known to be a good representation of structured data. TGraphs, which are typed, attributed, ordered, and directed graphs, are a very general kind of graph that can be used in many domains. The Java Graph Laboratory (JGraLab) provides an efficient implementation of TGraphs with all their properties. JGraLab ships with many features, including a query language (GReQL2) for extracting data from a graph. However, it lacks a generic library of important common graph algorithms. This study thesis extends JGraLab with a generic algorithm library called Algolib, which provides a generic and extensible implementation of several important common graph algorithms. The major aspects of this work are the generic nature of Algolib, its extensibility, and the software engineering methods used to achieve both. Algolib is designed to be extensible in two ways: existing algorithms can be extended to solve specialized problems, and further algorithms can easily be added to the library.
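The extensibility described here can be illustrated with a small sketch. The following Python code (all names hypothetical; Algolib itself is a Java library built on JGraLab) shows a visitor-hook pattern of the kind that lets a generic depth-first search be extended into a specialised algorithm such as topological sorting:

```python
# Minimal sketch of visitor-based extensibility (hypothetical names;
# not Algolib's actual API, which is Java).

class DFSVisitor:
    """Base visitor: subclass and override hooks to extend the algorithm."""
    def visit_vertex(self, v): pass
    def visit_tree_edge(self, u, v): pass
    def finish_vertex(self, v): pass

class TopologicalOrder(DFSVisitor):
    """A specialised problem solved by extending the generic traversal."""
    def __init__(self):
        self.order = []
    def finish_vertex(self, v):
        self.order.append(v)

def depth_first_search(graph, start, visitor):
    """Generic iterative DFS over an adjacency-dict graph."""
    seen = {start}
    stack = [(start, iter(graph.get(start, [])))]
    visitor.visit_vertex(start)
    while stack:
        v, neighbours = stack[-1]
        w = next(neighbours, None)
        if w is None:
            stack.pop()
            visitor.finish_vertex(v)
        elif w not in seen:
            seen.add(w)
            visitor.visit_tree_edge(v, w)
            visitor.visit_vertex(w)
            stack.append((w, iter(graph.get(w, []))))

g = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": []}
topo = TopologicalOrder()
depth_first_search(g, "a", topo)
print(list(reversed(topo.order)))  # ['a', 'c', 'b', 'd'], a valid topological order
```

The traversal logic is written once, and specialised behaviour is plugged in through the visitor: the same kind of separation that keeps a library like Algolib generic while remaining open to extension.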
The purpose of this thesis is to explore the sentiment distributions of Wikipedia concepts.
We analyse the sentiment of the entire English Wikipedia corpus, comprising 5,669,867 articles and 1,906,375 talk pages, using a lexicon-based method with four different lexicons.
We also explore the sentiment distributions over time using the sentiment scores obtained from the selected corpus. The results are compared not only between articles and talk pages but also among the four lexicons: OL, MPQA, LIWC, and ANEW.
Our findings show that among the four lexicons, MPQA has the highest and ANEW the lowest sensitivity to emotional expressions. Wikipedia articles show more sentiment than talk pages according to OL, MPQA, and LIWC, whereas talk pages show more sentiment than articles according to ANEW. Moreover, sentiment exhibits temporal trends, and each lexicon has its own bias toward texts describing different topics.
In addition, our research provides three interactive widgets for visualising the sentiment distributions of Wikipedia concepts with respect to their time and geolocation attributes.
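At its core, the lexicon-based method counts polarity-bearing words. A minimal Python sketch of that scoring step (with a toy lexicon; the actual study uses the OL, MPQA, LIWC, and ANEW resources) could look like this:

```python
import re

# Toy polarity lexicon; stand-in for OL, MPQA, LIWC, or ANEW.
LEXICON = {"good": 1, "great": 1, "helpful": 1, "bad": -1, "poor": -1, "vandalism": -1}

def sentiment_score(text):
    """Average polarity of lexicon words found in the text.

    Returns 0.0 (neutral) for texts without any lexicon hits."""
    tokens = re.findall(r"[a-z']+", text.lower())
    hits = [LEXICON[t] for t in tokens if t in LEXICON]
    return sum(hits) / len(hits) if hits else 0.0

print(sentiment_score("The article is in good shape, great work"))  # 1.0
```

Lexicon differences in coverage and word weighting are exactly what produces the sensitivity gaps between MPQA and ANEW reported above.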
This thesis focuses on approximate inference in assumption-based argumentation frameworks. Argumentation is a central idea in the computational modelling of theoretical and practical reasoning in AI. The foundational approach in this field is the abstract argumentation framework developed by Dung, and assumption-based argumentation can be regarded as an instance of abstract argumentation with structured arguments. When facing large-scale data, a key challenge of reasoning in assumption-based argumentation is constructing arguments and resolving attacks over a given claim with minimal computational cost while maintaining acceptable accuracy. This thesis proposes and investigates approximate methods that randomly select and construct samples of frameworks based on graphical dispute derivations. The presented approach aims to improve reasoning performance and achieve an acceptable trade-off between computation time and accuracy. The evaluation shows that, in general, randomly sampling and constructing inference rules for potential arguments over a query reduces the running time at the cost of slightly lower accuracy.
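The sampling idea, randomly selecting inference rules and checking whether the query can still be derived, can be sketched as below. This is a simplified illustration under strong assumptions: rules are a plain head-to-bodies dict, and the graphical dispute derivations used in the thesis are not reproduced.

```python
import random

def derivable(claim, rules, assumptions, seen=frozenset()):
    """Backward chaining: a claim holds if it is an assumption or the
    body of some rule for it is fully derivable (loops cut via `seen`)."""
    if claim in assumptions:
        return True
    if claim in seen:
        return False
    return any(all(derivable(b, rules, assumptions, seen | {claim}) for b in body)
               for body in rules.get(claim, []))

def approx_support(claim, rules, assumptions, sample_ratio=0.7, trials=200):
    """Estimate how robustly a claim is derivable under random rule samples."""
    flat = [(head, body) for head, bodies in rules.items() for body in bodies]
    k = max(1, int(sample_ratio * len(flat)))
    hits = 0
    for _ in range(trials):
        sampled_rules = {}
        for head, body in random.sample(flat, k):
            sampled_rules.setdefault(head, []).append(body)
        hits += derivable(claim, sampled_rules, assumptions)
    return hits / trials

rules = {"p": [["a"], ["q"]], "q": [["b", "c"]]}
assumptions = {"a", "b", "c"}
print(approx_support("p", rules, assumptions))  # 1.0: p is derivable in every sample
```

Trading the full rule set for random samples is what buys the reported speed-up, at the price of occasionally missing a derivation.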
Navigation is a natural way to explore and discover content in a digital environment. Hence, providers of online information systems such as Wikipedia, a free online encyclopedia, are interested in providing navigational support to their users. To this end, an essential task approached in this thesis is the analysis and modeling of navigational user behavior in information networks with the goal of paving the way for the improvement and maintenance of web-based systems. Using large-scale log data from Wikipedia, this thesis first studies information access by contrasting search and navigation as the two main information access paradigms on the Web. Second, this thesis validates and builds upon existing navigational hypotheses to introduce an adaptation of the well-known PageRank algorithm. This adaptation is an improvement of the standard PageRank random surfer navigation model that results in a more "reasonable surfer" by accounting for the visual position of links, the information network regions they lead to, and the textual similarity between the link source and target articles. Finally, using agent-based simulations, this thesis compares user models that have a different knowledge of the network topology in order to investigate the amount and type of network topological information needed for efficient navigation. An evaluation of agents' success on four different networks reveals that in order to navigate efficiently, users require only a small amount of high-quality knowledge of the network topology. Aside from the direct benefits to content ranking provided by the "reasonable surfer" version of PageRank, the empirical insights presented in this thesis may also have an impact on system design decisions and Wikipedia editor guidelines, e.g., for link placement and webpage layout.
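The "reasonable surfer" weighting can be sketched with networkx by biasing each link's transition probability. In the sketch below, the per-link feature values and the way they are combined are invented for illustration; the thesis derives them from the visual link position, the target's network region, and source-target text similarity:

```python
import networkx as nx

# Each edge gets a weight from hypothetical link features:
# (source, target, visual prominence, text similarity).
links = [
    ("Koblenz", "Rhine", 0.9, 0.8),
    ("Koblenz", "Germany", 0.6, 0.7),
    ("Rhine", "Germany", 0.3, 0.5),
    ("Germany", "Koblenz", 0.2, 0.4),
]
G = nx.DiGraph()
for src, dst, prominence, similarity in links:
    G.add_edge(src, dst, weight=prominence * similarity)

# Weighted PageRank: the surfer follows prominent, topically
# similar links with higher probability than obscure ones.
ranks = nx.pagerank(G, alpha=0.85, weight="weight")
print(ranks)
```

Compared with the uniform random surfer, pages reached through prominent, topically related links accumulate more rank mass, which is the intuition behind the improved model.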
Ontologies play an important role in knowledge representation, enabling information sharing and the collaborative development of knowledge bases. They are changed, adapted, and reused across applications and domains, resulting in multiple versions of an ontology. Comparing different versions and analysing the changes at a higher level of abstraction can be insightful for understanding how an ontology has evolved. While there is existing work on detecting (syntactical) differences and changes in ontologies, there is still a need for analysing ontology changes at a higher level of abstraction, such as ontology evolution or refactoring patterns. Our approach starts from a classification of model refactoring patterns known from software engineering and uses DL reasoning to identify such refactoring patterns in OWL ontologies.
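As an illustration of what recognising one such pattern can involve, the toy sketch below detects an "extract superclass" refactoring purely structurally, with class hierarchies as plain dicts; the thesis itself works on OWL ontologies and uses DL reasoning, which this sketch does not attempt:

```python
def extract_superclass_candidates(v1, v2):
    """Detect an 'extract superclass' refactoring between two ontology
    versions, each given as {class: set of direct superclasses}.

    A new class C is a candidate if at least two classes that shared a
    superclass X in v1 now sit below C, and C itself sits below X."""
    new_classes = set(v2) - set(v1)
    candidates = []
    for c in new_classes:
        moved = {cls for cls in v1 if c in v2.get(cls, set())}
        for x in v2.get(c, set()):
            if sum(1 for cls in moved if x in v1.get(cls, set())) >= 2:
                candidates.append((c, x, moved))
    return candidates

v1 = {"Car": {"Vehicle"}, "Truck": {"Vehicle"}, "Boat": {"Vehicle"}}
v2 = {"MotorVehicle": {"Vehicle"}, "Car": {"MotorVehicle"},
      "Truck": {"MotorVehicle"}, "Boat": {"Vehicle"}}
print(extract_superclass_candidates(v1, v2))
# [('MotorVehicle', 'Vehicle', {'Car', 'Truck'})]  (set order may vary)
```

DL reasoning generalises this purely syntactic check: it can also recognise the pattern when the subclass relations are entailed rather than asserted.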
The novel mobile application csxPOI (short for: collaborative, semantic, and context-aware points-of-interest) enables its users to collaboratively create, share, and modify semantic points of interest (POIs). Semantic POIs describe geographic places with explicit semantic properties of a collaboratively created ontology. As the ontology includes multiple subclassifications and instantiations, and as it links to DBpedia, the richness of annotation goes far beyond mere textual annotations such as tags. With the intuitive interface of csxPOI, users can easily create, delete, and modify their POIs and those shared by others. Thereby, the users adapt the structure of the ontology underlying the semantic annotations of the POIs. Data mining techniques are employed to cluster and thus improve the quality of the collaboratively created POIs. The semantic POIs and the collaborative POI ontology are published as Linked Open Data.
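The clustering step might look like the following sketch, which merges duplicate POIs created by different users via DBSCAN over geographic coordinates (the sample data and the 100 m radius are illustrative assumptions, not taken from the paper):

```python
import numpy as np
from sklearn.cluster import DBSCAN

# Hypothetical POI coordinates (lat, lon) submitted by different users
# for the same two real-world places.
pois = np.array([
    [50.3569, 7.5890],   # Deutsches Eck, user A
    [50.3571, 7.5893],   # Deutsches Eck, user B
    [50.3625, 7.6057],   # Koblenz main station, user A
    [50.3627, 7.6055],   # Koblenz main station, user B
])

# Haversine metric expects radians; eps of ~100 m on Earth's surface.
eps_km = 0.1
labels = DBSCAN(eps=eps_km / 6371.0, min_samples=2,
                metric="haversine").fit_predict(np.radians(pois))
print(labels)  # [0 0 1 1]: duplicate POIs collapse into one cluster per place
```

Clustering near-duplicates like this is one plausible way to raise the quality of collaboratively created POIs before publishing them as Linked Open Data.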
Commonsense reasoning can be seen as a process of identifying dependencies amongst events and actions. Understanding the circumstances surrounding these events requires background knowledge with sufficient breadth to cover a wide variety of domains. In recent decades, there has been considerable work on extracting commonsense knowledge, and a number of these projects provide their collected data as semantic networks such as ConceptNet and CausalNet. In this thesis, we undertake the Choice Of Plausible Alternatives (COPA) challenge, a problem set of 1,000 multiple-choice questions, each consisting of a premise and two alternative choices. Our approach differs from previous work in using shortest paths between concepts in a causal graph, with the edge weight as causality metric. We use CausalNet as the primary network, implement a few design choices to explore the strengths and drawbacks of this approach, and propose an extension that leverages the commonsense knowledge base of ConceptNet.
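A minimal sketch of the shortest-path scoring idea is shown below with networkx; the causal edges and weights are toy values standing in for CausalNet, and the inverse-weight cost transform is one simple way (an assumption, not necessarily the thesis' exact choice) to make stronger causal chains correspond to shorter paths:

```python
import networkx as nx

# Toy causal graph: edge weight = causality strength (higher = stronger).
G = nx.DiGraph()
G.add_weighted_edges_from([
    ("rain", "wet street", 0.9),
    ("wet street", "slip", 0.6),
    ("rain", "umbrella", 0.8),
    ("sun", "slip", 0.1),
])

# Convert strengths to costs so stronger causal chains become shorter paths.
for u, v, d in G.edges(data=True):
    d["cost"] = 1.0 / d["weight"]

def plausibility(premise, alternative):
    try:
        return 1.0 / nx.dijkstra_path_length(G, premise, alternative, weight="cost")
    except nx.NetworkXNoPath:
        return 0.0

# COPA-style question: "It rained." -- what happened as a result?
for alt in ("slip", "umbrella"):
    print(alt, plausibility("rain", alt))
```

The alternative whose causal path from the premise scores higher would be chosen as the answer to the COPA question.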
In this paper, we compare two approaches for exploring large, hierarchical data spaces of social media data on mobile devices using facets. While the first approach arranges the facets in a 3x3 grid, the second approach makes use of a scrollable list of facets for exploring the data. We have conducted a between-group experiment of the two approaches with 24 subjects (20 male, 4 female) executing the same set of tasks of typical mobile users' information needs. The results show that the grid-based approach requires significantly more clicks, but subjects need less time for completing the tasks. Furthermore, it shows that the additional clicks do not hamper the subjects' satisfaction. Thus, the results suggest that the grid-based approach is a better choice for faceted search on touchscreen mobile devices. To the best of our knowledge, such a summative evaluation of different approaches for faceted search on mobile devices has not been done so far.
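For a between-group design like this, a click-count comparison is typically checked with a non-parametric test. The sketch below illustrates that analysis step on invented click counts; the paper's raw data and its actual statistical procedure are not reproduced here:

```python
from scipy.stats import mannwhitneyu

# Hypothetical clicks per task for the two groups (12 subjects each).
grid_clicks = [14, 16, 15, 17, 13, 18, 16, 15, 14, 17, 16, 15]
list_clicks = [10, 11, 9, 12, 10, 11, 13, 9, 10, 12, 11, 10]

stat, p = mannwhitneyu(grid_clicks, list_clicks, alternative="two-sided")
print(f"U={stat}, p={p:.4f}")  # a small p would support "significantly more clicks"
```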
As a multilingual system, Wikipedia provides many challenges for academics and engineers alike. One such challenge is the cultural contextualisation of Wikipedia content and the lack of approaches to quantify it effectively. What also seems to be lacking is an effort to establish sound computational practices and frameworks for measuring cultural variations in the data. Current approaches are mostly dictated by data availability, which makes them difficult to apply in other contexts, and they rarely scale because they require significant qualitative or translation effort. To address these limitations, this thesis develops and tests two modular quantitative approaches aimed at quantifying culture-related phenomena in systems that rely on multilingual user-generated content. In particular, they allow one to: (1) operationalise a custom concept of culture in a system; (2) quantify and compare culture-specific content or coverage biases in such a system; and (3) map a large-scale landscape of shared cultural interests and focal points. Empirical validation of these approaches is split into two parts. First, an approach to mapping Wikipedia communities of shared co-editing interests is validated on two large Wikipedia datasets comprising multilateral geopolitical and linguistic editor communities. Both datasets reveal measurable clusters of consistent co-editing interest and computationally confirm that these clusters correspond to existing colonial, religious, socioeconomic, and geographical ties. Second, an approach to quantifying content differences is validated on a multilingual Wikipedia dataset and a multi-platform (Wikipedia and Encyclopedia Britannica) dataset, both limited to the selected knowledge domain of national history. This analysis allows, for the first time on a large scale, quantifying and visualising the distribution of historical focal points in articles on national histories. All results are cross-validated either by domain experts or by external datasets.
Main thesis contributions. This thesis: (1) presents an effort to formalise the process of measuring cultural variations in user-generated data; (2) introduces and tests two novel approaches to quantifying cultural contextualisation in multilingual data; (3) synthesises a valuable overview of the literature on defining and quantifying culture; (4) provides important empirical insights into the effect of culture on Wikipedia content and coverage, demonstrating that Wikipedia is not context-free and that these differences should not be treated as noise but rather as an important feature of the data; and (5) makes practical service contributions through sharing data and visualisations.
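The co-editing clustering from the first validation can be sketched in miniature: project a bipartite edition-article graph onto the language editions and detect modularity-based clusters. The toy data below is invented and orders of magnitude smaller than the thesis' datasets:

```python
import networkx as nx
from networkx.algorithms import bipartite, community

# Toy bipartite graph: language editions <-> articles they co-edit.
edits = {
    "pt": ["Brazil", "Angola", "Fado"],
    "es": ["Brazil", "Tango", "Angola"],
    "fr": ["Fado", "Tango", "Alsace"],
    "de": ["Alsace", "Rhine"],
    "nl": ["Rhine", "Alsace"],
}
B = nx.Graph()
for lang, articles in edits.items():
    B.add_node(lang, bipartite=0)
    B.add_edges_from((lang, a) for a in articles)

# Project onto editions: edge weight = number of co-edited articles.
P = bipartite.weighted_projected_graph(B, list(edits))
clusters = community.greedy_modularity_communities(P, weight="weight")
print([sorted(c) for c in clusters])  # e.g. [['es', 'pt'], ['de', 'fr', 'nl']]
```

On real data, such clusters are what the thesis cross-checks against colonial, religious, socioeconomic, and geographical ties.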
This thesis explores and examines the effectiveness of traditional machine learning (ML), neural network (NN), and state-of-the-art deep learning (DL) models for identifying mental distress indicators in social media discourse from Reddit and Twitter, platforms heavily used by teenagers. Different NLP vectorization techniques, namely TF-IDF, Word2Vec, GloVe, and BERT embeddings, are employed with ML models such as Decision Tree (DT), Random Forest (RF), Logistic Regression (LR), and Support Vector Machine (SVM), followed by NN models such as Convolutional Neural Network (CNN), Recurrent Neural Network (RNN), and Long Short-Term Memory (LSTM), to methodically analyse their impact as feature representations. DL models such as BERT, DistilBERT, MentalRoBERTa, and MentalBERT are fine-tuned end-to-end for the classification task. This thesis also compares different text preprocessing techniques, such as tokenization, stopword removal, and lemmatization, to assess their impact on model performance. Systematic experiments with different configurations of vectorization and preprocessing techniques across model types and categories have been implemented to find the most effective configurations and to gauge the models' strengths, limitations, and ability to detect and interpret mental distress indicators in text. The analysis reveals that the MentalBERT DL model significantly outperformed all other models: its pretraining on mental health data, combined with rigorous end-to-end fine-tuning, gave it an edge in detecting nuanced linguistic indicators of mental distress in complex, contextual text. These insights confirm the high potential of ML and NLP technologies for developing AI systems that support mental health analysis. The thesis lays a foundation for future work, demonstrating the need for a collaborative approach involving experts from different domains and for exploring next-generation large language models to develop robust, clinically approved mental health AI systems.
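One cell of the configuration grid described above, TF-IDF features feeding a Logistic Regression classifier, can be sketched with scikit-learn as follows (the four-post corpus and labels are placeholders; the thesis trains on labelled Reddit and Twitter data):

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Placeholder corpus; real experiments use labelled social media posts.
texts = ["I feel hopeless and can't sleep anymore",
         "Great hike today, feeling energised",
         "Nothing matters, I just want it to stop",
         "Excited to start my new project tomorrow"]
labels = [1, 0, 1, 0]  # 1 = distress indicator, 0 = no indicator

# TF-IDF + Logistic Regression: one (vectorizer, model) combination
# out of the grid of configurations evaluated systematically.
clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2), lowercase=True),
                    LogisticRegression(max_iter=1000))
clf.fit(texts, labels)
print(clf.predict(["I feel hopeless lately"]))  # likely [1] on this toy data
```

Swapping the vectorizer (Word2Vec, GloVe, BERT embeddings) or the classifier (SVM, RF, CNN, LSTM) yields the other cells of the comparison; the fine-tuned DL models replace the whole pipeline end to end.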