Refine
Year of publication
Document Type
- Master's Thesis (93) (remove)
Language
- English (93) (remove)
Keywords
Assessing ChatGPT’s Performance in Analyzing Students’ Sentiments: A Case Study in Course Feedback
(2024)
The emergence of large language models (LLMs) like ChatGPT has impacted fields such as education, transforming natural language processing (NLP) tasks like sentiment analysis. Transformers form the foundation of LLMs, with BERT, XLNet, and GPT as key examples. ChatGPT, developed by OpenAI, is a state-of-the-art model and its ability in natural language tasks makes it a potential tool in sentiment analysis. This thesis reviews current sentiment analysis methods and examines ChatGPT’s ability to analyze sentiments across three labels (Negative, Neutral, Positive) and five labels (Very Negative, Negative, Neutral, Positive, Very Positive) on a dataset of student course reviews. Its performance is compared with fine tuned state-of-the-art models like BERT, XLNet, bart-large-mnli, and RoBERTa-large-mnli using quantitative metrics. With the help of 7 prompting techniques which are ways to instruct ChatGPT, this work also analyzed how well it understands complex linguistic nuances in the given texts using qualitative metrics. BERT and XLNet outperform ChatGPT mainly due to their bidirectional nature, which allows them to understand the full context of a sentence, not just left to right. This, combined with fine-tuning, helps them capture patterns and nuances better. ChatGPT, as a general purpose, open-domain model, processes text unidirectionally, which can limit its context understanding. Despite this, ChatGPT performed comparably to XLNet and BERT in three-label scenarios and outperformed others. Fine-tuned models excelled in five label cases. Moreover, it has shown impressive knowledge of the language. Chain-of-Thought (CoT) was the most effective technique for prompting with step by step instructions. ChatGPT showed promising performance in correctness, consistency, relevance, and robustness, except for detecting Irony. As education evolves with diverse learning environments, effective feedback analysis becomes increasingly valuable. Addressing ChatGPT’s limitations and leveraging its strengths could enhance personalized learning through better sentiment analysis.
Exploring Academic Perspectives: Sentiments and Discourse on ChatGPT Adoption in Higher Education
(2024)
Artificial intelligence (AI) is becoming more widely used in a number of industries, including in the field of education. Applications of artificial intelligence (AI) are becoming crucial for schools and universities, whether for automated evaluation, smart educational systems, individualized learning, or staff support. ChatGPT, anAI-based chatbot, offers coherent and helpful replies based on analyzing large volumes of data. Integrating ChatGPT, a sophisticated Natural Language Processing (NLP) tool developed by OpenAI, into higher education has sparked significant interest and debate. Since the technology is already adapted by many students and teachers, this study delves into analyzing the sentiments expressed on university websites regarding ChatGPT integration into education by creating a comprehensive sentiment analysis framework using Hierarchical Residual RSigELU Attention Network (HR-RAN). The proposed framework addresses several challenges in sentiment analysis, such as capturing fine-grained sentiment nuances, including contextual information, and handling complex language expressions in university review data. The methodology involves several steps, including data collection from various educational websites, blogs, and news platforms. The data is preprocessed to handle emoticons, URLs, and tags and then, detect and remove sarcastic text using the eXtreme Learning Hyperband Network (XLHN). Sentences are then grouped based on similarity and topics are modeled using the Non-negative Term-Document Matrix Factorization (NTDMF) approach. Features, such as lexico-semantic, lexico structural, and numerical features are extracted. Dependency parsing and coreference resolution are performed to analyze grammatical structures and understand semantic relationships. Word embedding uses the Word2Vec model to capture semantic relationships between words. The preprocessed text and extracted features are inputted into the HR-RAN classifier to categorize sentiments as positive, negative, or neutral. The sentiment analysis results indicate that 74.8% of the sentiments towards ChatGPT in higher education are neutral, 21.5% are positive, and only 3.7% are negative. This suggests a predominant neutrality among users, with a significant portion expressing positive views and a very small percentage holding negative opinions. Additionally, the analysis reveals regional variations, with Canada showing the highest number of sentiments, predominantly neutral, followed by Germany, the UK, and the USA. The sentiment analysis results are evaluated based on various metrics, such as accuracy, precision, recall, F-measure, and specificity. Results indicate that the proposed framework outperforms conventional sentiment analysis models. The HR-RAN technique achieved a precision of 98.98%, recall of 99.23%, F-measure of 99.10%, accuracy of 98.88%, and specificity of 98.31%. Additionally, word clouds are generated to visually represent the most common terms within positive, neutral, and negative sentiments, providing a clear and immediate understanding of the key themes in the data. These findings can inform educators, administrators, and developers about the benefits and challenges of integrating ChatGPT into educational
settings, guiding improvements in educational practices and AI tool development.
This thesis explores and examines the effectiveness and efficacy of traditional machine learning (ML), advanced neural networks (NN) and state-of-the-art deep learning (DL) models for identifying mental distress indicators from the social media discourses based on Reddit and Twitter as they are immensely used by teenagers. Different NLP vectorization techniques like TF-IDF, Word2Vec, GloVe, and BERT embeddings are employed with ML models such as Decision Tree (DT), Random Forest (RF), Logistic Regression (LR) and Support Vector Machine (SVM) followed by NN models such as Convolutional Neural Network (CNN), Recurrent Neural Network (RNN) and Long Short-Term Memory (LSTM) to methodically analyse their impact as feature representation of models. DL models such as BERT, DistilBERT, MentalRoBERTa and MentalBERT are end-to-end fine tuned for classification task. This thesis also compares different text preprocessing techniques such as tokenization, stopword removal and lemmatization to assess their impact on model performance. Systematic experiments with different configuration of vectorization and preprocessing techniques in accordance with different model types and categories have been implemented to find the most effective configurations and to gauge the strengths, limitations, and capability to detect and interpret the mental distress indicators from the text. The results analysis reveals that MentalBERT DL model significantly outperformed all other model types and categories due to its specific pretraining on mental data as well as rigorous end-to-end fine tuning gave it an edge for detecting nuanced linguistic mental distress indicators from the complex contextual textual corpus. This insights from the results acknowledges the ML and NLP technologies high potential for developing complex AI systems for its intervention in the domain of mental health analysis. This thesis lays the foundation and directs the future work demonstrating the need for collaborative approach of different domain experts as well as to explore next generational large language models to develop robust and clinically approved mental health AI systems.
Artificial neural networks is a popular field of research in artificial intelli-
gence. The increasing size and complexity of huge models entail certain
problems. The lack of transparency of the inner workings of a neural net-
work makes it difficult to choose efficient architectures for different tasks.
It proves to be challenging to solve these problems, and with a lack of in-
sightful representations of neural networks, this state of affairs becomes
entrenched. With these difficulties in mind a novel 3D visualization tech-
nique is introduced. Attributes for trained neural networks are estimated
by utilizing established methods from the area of neural network optimiza-
tion. Batch normalization is used with fine-tuning and feature extraction to
estimate the importance of different parts of the neural network. A combi-
nation of the importance values with various methods like edge bundling,
ray tracing, 3D impostor and a special transparency technique results in a
3D model representing a neural network. The validity of the extracted im-
portance estimations is demonstrated and the potential of the developed
visualization is explored.
Digital transformation is a prevailing trend in the world, especially in dynamic Asia. Vietnam has recorded remarkable changes in the economy as domestic enterprises have made new strides in the digital transformation process. MB Bank, one of the prestigious financial groups in Vietnam, also takes advantage of digital transformation to have the opportunity to break through to become a large-scale technology enterprise with many factors such as improving customer experience, increasing customer base and increasing customer satisfaction. enhance competitiveness, build trust and loyalty for customers. However, in the process of converting MB, there are also many challenges that require banks to have appropriate policies to handle. It can be said that MB Bank is a typical case study of digital transformation in the banking sector in Vietnam.
Digital Transformation Maturity of Vietnam Aviation Industry: The Effect of Organizational Readiness
(2023)
The paper studies the digital transformation maturity in the context of the aviation industry in Vietnam. Digital transformation can mean enhancing existing processes, finding new opportunities within existing business domains, or finding new opportunities outside existing business domains. In the era of post Covid-19, digital transformation will play a vital role in the recovery with the support from digital technology to leverage the communication and implementation of new projects or changes.
Digital transformation and digital transformation maturity sometimes are used indistinguishing, but they are two different definitions. This paper will further explain the differences and will apply digital transformation maturity as a scale for the digital transformation in the report.
Due to the lack of experiment in the relationship between digital transformation maturity and the organizational readiness, the study will explore four components of organizational readiness, including digital leadership, digital culture, digital capabilities, and digital partnering.
FinTech is deemed to be an underexplored phenomenon even in academic and real environments. Among (1) “Sustainable FinTech” – the application of information technology as innovation in established financial services providers’ business operation; and (2) “Disruptive FinTech” – the provision of financial products and services by non-incumbents which in most cases are information technology entrepreneurs, the former receives more attention. In order to contribute to Disruptive FinTech category, the thesis strive to examine Entrepreneurial Strategy framework applied for technology players taking part in Vietnam financial market.
Challenges of Implementing Innovation Strategies at Large Organizations: A case of Lotte Group
(2023)
For many decades, one of the most important focuses of research has been on determining whether or not there is a correlation between the size of an organization and its level of innovation. Unlike small companies, large companies often have well-established structure that are hard to change and change managements seems to be much more difficult especially related to innovation. Nevertheless, there are many examples to prove the opposites. Some large organization like Apple, Amazon... always show great innovation efforts and keep changing in a much positive way. Therefore, the aim of this thesis is to discuss of how large organization can be able to implement innovation when having much drawbacks compare to SMEs. Through the use of a qualitative research approach, researcher was able to explore essential information on the innovation strategies that large companies are using in order to innovate and how they could overcome existing challenges by studying the working process of Lotte Group – one of the biggest companies in Korea.
The paper is a study focusing on exploring which factors and examining the impact of those factors influencing the entrepreneurial intention among students in the Construction industry, specifically among students of Hanoi Construction University and Hanoi Architecture University. The study also mentions some solution of this findings for entrepreneurship in the Construction field in Vietnam that the author might think of based on this research work for future study. The Theory of planned behavior is used as the theoritical framework for this study. Both qualitative and quantitative methods are employed. The questionaire will be conducted among students of the two universities mentioned above. Then, an exploratory factor analysis (EFA) will performed to test the validity of the constructs. The research findings provide factors and their impact factors influencing the entrepreneurial intention and propose some solutions to improve the entrepreneurship in the Construction field in Vietnam.
Predictive Process Monitoring is becoming more prevalent as an aid for organizations to support their operational processes. However, most software applications available today require extensive technical know-how by the operator and are therefore not suitable for most real-world scenarios. Therefore, this work presents a prototype implementation of a Predictive Process Monitoring dashboard in the form of a web application. The system is based on the PPM Camunda Plugin presented by Bartmann et al. (2021) and allows users to easily create metrics, visualizations to display these metrics, and dashboards in which visualizations can be arranged. A usability test is with test users of different computer skills is conducted to confirm the application’s user-friendliness.
In this thesis the possibilities for real-time visualization of OpenVDB
files are investigated. The basics of OpenVDB, its possibilities, as well
as NanoVDB and its GPU port, were studied. A system was developed
using PNanoVDB, the graphics API port of OpenVDB. Techniques were
explored to improve and accelerate a single ray approach of ray tracing.
To prove real-time capability, two single scattering approaches were
also implemented. One of these was selected, further investigated and
optimized to achieve interactive real-time rendering.
It is important to give artists immediate feedback on their adjustments, as
well as the possibility to change all parameters to ensure a user friendly
creation process.
In addition to the optical rendering, corresponding benchmarks were
collected to compare different improvement approaches and to prove
their relevance. Attention was paid to the rendering times and memory
consumption on the GPU to ensure optimal use. A special focus, when
rendering OpenVDB files, was put on the integrability and extensibility of
the program to allow easy integration into an existing real-time renderer
like U-Render.
The growing numbers of breeding rooks (Corvus frugilegus) in the city of Landau (Rhineland- Palatinate, Germany) increase the potential for conflict between rooks and humans, which is mainly associated with noise and faeces. Therefore, the aim of this work is a better understanding of the breeding tree selection of the rook in order to develop options for action and management in the future.
Part I of this thesis provides general background information on the rook and includes mapping of the rookeries in the Anterior Palatinate and South Palatinate including Landau in the year 2020. That mapping revealed that the number of rural colonies has decreased, while the number of urban colonies has increased in the study area in the last few years. In line with current literature, tree species and tree size were important criteria for breeding tree selection. However, the mapping showed that additional factors must be important as well.
Therefore, as rooks seem to often breed along traffic axes, Part II of this thesis examines how temperature, artificial light and noise, which are all linked to traffic axes, affect the breeding tree selection of the rook in the city of Landau. The following three hypotheses are developed: (1) manually selected breeding trees (Bm) have a warmer microclimate than manually selected non-breeding trees (Nm) or randomly selected non-breeding trees (Nr), (2) Bm are exposed to a higher light level than Nm or Nr and (3) Bm are exposed to a higher noise level than Nm or Nr. To test these hypotheses, 15 Bm, 13 Nm and 16 Nr are investigated.
The results show that Bm were exposed to more noise than both types of non-breeding trees (μBm, noise = 36.52481 dB, μNm, noise = 31.27229 dB, μNr, noise = 29.17417 dB) where the difference between Bm and Nr was significant. In addition, there was a tendency for Bm to be exposed to less light (μBm, light = 0.356 lx) than Nm (μNm, light = 0.4107692 lx) and significantly less light than Nr (μNr, light = 1.995 lx), while temperature did not differ between the groups (μBm, temp = 16.90549 °C, μNm, temp = 16.93118 °C, μNr, temp = 17.28639 °C).
This study shows for the first time that rooks prefer trees which are exposed to low light levels and high noise levels, i.e. more intense traffic noise, for breeding. It can only be speculated that the cause of this is lower enemy pressure at such sites. The fact that temperature does not seem to have any influence on breeding tree selection may be due to only small temperature differences at nest height, which might be compensated by breeding behaviour. Consequently, in the long term one management approach could be to divert traffic from inner-city areas, especially schools and hospitals, to bypasses. If tree genera suitable for rooks, such as plane trees, are planted along the bypasses, those sites could provide suitable alternative habitats to inner-city breeding locations, which become less attractive for breeding due to noise reduction. In the short term in addition to locally implemented repellent measures the most effective approach is to strengthen rook acceptance among the population. However, further research is needed to verify the results of this thesis and to gain further insights into rook breeding site selection in order to develop effective management measures.
Der Zweck dieser Arbeit ist es, sich auf die kritischen Forschungsherausforderungen und -themen zu konzentrieren, die UI/UX-Designprinzipien umgeben, mit einem Schwerpunkt auf kulturübergreifenden Konzepten aus der Perspektive von E-Learning-Plattformen. Zu diesem Zweck betrachten wir zunächst die kulturellen Dimensionen auf der Grundlage des Hofstede-Rahmens mit dem Ziel, wichtige kulturelle Werte zu identifizieren. Als zweites Ziel der Forschung erleichtert eine Reihe von Kriterien, die so genannte Usability-Heuristik von Nielsen, die Erkennung von Usability Problemen bei der Gestaltung von Benutzeroberflächen (UI). Die Usability-Heuristiken umfassen zehn Variablen, die die Interaktion zwischen dem Benutzer und einem Produkt oder System beeinflussen. Wenn wir uns näher mit
diesen Themen befassen, werden wir in der Lage sein, eine Matrix mit Beziehungen zwischen der heuristischen Bewertung von Nielsen und dem kulturellen Rahmen von Geert Hofstede aufzudecken. Abschließend erörtern wir das mögliche Potenzial kultureller Werte zur Beeinflussung von Benutzeroberflächen für E-Learning-Plattformen. In der Tat gibt es einige Funktionen in E-Learning-Plattformen, die aufgrund der Kultur weniger diskutiert werden, obwohl sie sehr praktisch in die Plattformen integriert werden können.
Advanced Auditing of Inconsistencies in Declarative Process Models using Clustering Algorithms
(2021)
To have a compliant business process of an organization, it is essential to ensure a onsistent process. The measure of checking if a process is consistent or not depends on the business rules of a process. If the process adheres to these business rules, then the process is compliant and efficient. For huge processes, this is quite a challenge. Having an inconsistency in a process can yield very quickly to a non-functional process, and that’s a severe problem for organizations. This thesis presents a novel auditing approach for handling inconsistencies from a post-execution perspective. The tool identifies the run-time inconsistencies and visualizes them in heatmaps. These plots aim to help modelers observe the most problematic constraints and help them make the right remodeling decisions. The modelers assisted with many variables can be set in the tool to see a different representation of heatmaps that help grasp all the perspectives of the problem. The heatmap sort and shows the run-time inconsistency patterns, so that modeler can decide which constraints are highly problematic and should address a re-model. The tool can be applied to real-life data sets in a reasonable run-time.
In this thesis, the performance of the IceCube projects photon propagation
code (clsim) is optimized. The process of GPU code analysis and perfor-
mance optimization is described in detail. When run on the same hard-
ware, the new version achieves a speedup of about 3x over the original
implementation. Comparing the unmodified code on hardware currently
used by IceCube (NVIDIA GTX 1080) against the optimized version run on
a recent GPU (NVIDIA A100) a speedup of about 9.23x is observed. All
changes made to the code are shown and their performance impact as well
as the implications for simulation accuracy are discussed individually.
The approach taken for optimization is then generalized into a recipe.
Programmers can use it as a guide, when approaching large and complex
GPU programs. In addition, the per warp job-queue, a design pattern used
for load balancing among threads in a CUDA thread block, is discussed in
detail.
This thesis focuses on approximate inference in assumption-based argumentation frameworks. Argumentation provides a significant idea in the computerization of theoretical and practical reasoning in AI. And it has a close connection with AI, engaging in arguments to perform scientific reasoning. The fundamental approach in this field is abstract argumentation frameworks developed by Dung. Assumption-based argumentation can be regarded as an instance of abstract argumentation with structured arguments. When facing a large scale of data, a challenge of reasoning in assumption-based argumentation is how to construct arguments and resolve attacks over a given claim with minimal cost of computation and acceptable accuracy at the same time. This thesis proposes and investigates approximate methods that randomly select and construct samples of frameworks based on graphical dispute derivations to solve this problem. The presented approach aims to improve reasoning performance and get an acceptable trade-off between computational time and accuracy. The evaluation shows that for reasoning in assumption-based argumentation, in general, the running time is reduced with the cost of slightly low accuracy by randomly sampling and constructing inference rules for potential arguments over a query.
The Material Point Method (MPM) has proven to be a very capable simulation method in computer graphics that is able to model materials that were previously very challenging to animate [1, 2]. Apart from simulating singular materials, the simulation of multiple materials that interact with each other introduces new challenges. This is the focus of this thesis. It will be shown that the self-collision capabilities of the MPM can naturally handle multiple materials interacting in the same scene on a collision basis, even if the materials use distinct constitutive models. This is then extended by porous interaction of materials as in[3], which also integrates easily with MPM.It will furthermore be shown that regular single-grid MPM can be viewed as a subset of this multi-grid approach, meaning that its behavior can also be achieved if multiple grids are used. The porous interaction is generalized to arbitrary materials and freely changeable material interaction terms, yielding a flexible, user-controllable framework that is independent of specific constitutive models. The framework is implemented on the GPU in a straightforward and simple way and takes advantage of the rasterization pipeline to resolve write-conflicts, resulting in a portable implementation with wide hardware support, unlike other approaches such as [4].
The industry standard Decision Model and Notation (DMN) has enabled a new way for the formalization of business rules since 2015. Here, rules are modeled in so-called decision tables, which are defined by input columns and output columns. Furthermore, decisions are arranged in a graph-like structure (DRD level), which creates dependencies between them. With a given input, the decisions now can be requested by appropriate systems. Thereby, activated rules produce output for future use. However, modeling mistakes produces erroneous models, which can occur in the decision tables as well as at the DRD level. According to the Design Science Research Methodology, this thesis introduces an implementation of a verification prototype for the detection and resolution of these errors while the modeling phase. Therefore, presented basics provide the needed theoretical foundation for the development of the tool. This thesis further presents the architecture of the tool and the implemented verification capabilities. Finally, the created prototype is evaluated.
On-screen interactive presentations have got immense popularity in the domain of attentive interfaces recently. These attentive screens adapt their behavior according to the user's visual attention. This thesis aims to introduce an application that would enable these attentive interfaces to change their behavior not just according to the gaze data but also facial features and expressions. The modern era requires new ways of communications and publications for advertisement. These ads need to be more specific according to people's interests, age, and gender. When advertising, it's important to get a reaction from the user but not every user is interested in providing feedback. In such a context more, advance techniques are required that would collect user's feedback effortlessly. The main problem this thesis intends to resolve is, to apply advanced techniques of gaze and face recognition to collect data about user's reactions towards different ads being played on interactive screens. We aim to create an application that enables attentive screens to detect a person's facial features, expressions, and eye gaze. With eye gaze data we can determine the interests and with facial features, age and gender can be specified. All this information will help in optimizing the advertisements.
Blockchain in Healthcare
(2020)
The underlying characteristics of blockchain can facilitate data provenance, data integrity, data security, and data management. It has the potential to transform the healthcare sector. Since the introduction of Bitcoin in the fintech industry, the blcockhain technology has been gaining a lot of traction and its purpose is not just limited to finance. This thesis highlights the inner workings of blockchain technology and its application areas with possible existing solutions. Blockchain could lay the path for a new revolution in conventional healthcare systems. We presented how individual sectors within the healthcare industry could use blockchain and what solution persists. Also, we have presented our own concept to improve the existing paper-based prescription management system which is based on Hyperledger framework. The results of this work suggest that healthcare can benefit from blockchain technology bringing in the new ways patients can be treated.
Since the invention of U-net architecture in 2015, convolutional networks based on its encoder-decoder approach significantly improved results in image analysis challenges. It has been proven that such architectures can also be successfully applied in different domains by winning numerous championships in recent years. Also, the transfer learning technique created an opportunity to push state-of-the-art benchmarks to a higher level. Using this approach is beneficial for the medical domain, as collecting datasets is generally a difficult and expensive process.
In this thesis, we address the task of semantic segmentation with Deep Learning and make three main contributions and release experimental results that have practical value for medical imaging.
First, we evaluate the performance of four neural network architectures on the dataset of the cervical spine MRI scans. Second, we use transfer learning from models trained on the Imagenet dataset and compare it to randomly initialized networks. Third, we evaluate models trained on the bias field corrected and raw MRI data. All code to reproduce results is publicly available online.
Constituent parsing attempts to extract syntactic structure from a sentence. These parsing systems are helpful in many NLP applications such as grammar checking, question answering, and information extraction. This thesis work is about implementing a constituent parser for German language using neural networks. Over the past, recurrent neural networks have been used in building a parser and also many NLP applications. In this, self-attention neural network modules are used intensively to understand sentences effectively. With multilayered self-attention networks, constituent parsing achieves 93.68% F1 score. This is improved even further by using both character and word embeddings as a representation of the input. An F1 score of 94.10% was the best achieved by constituent parser using only the dataset provided. With the help of external datasets such as German Wikipedia, pre-trained ELMo models are used along with self-attention networks achieving 95.87% F1 score.
Thesis is devoted to the topic of challenges and solutions for human resources management (HRM) in international organizations. The aim is to investigate methodological approaches to assessment of HRM challenges and solutions, and to apply them on practice, to develop ways of improvement of HRM of a particular enterprise. The practical research question investigated is “Is the Ongoing Professional Development – Strategic HRM (OPD-SHRM) model a better solution for HRM system of PrJSC “Philip Morris Ukraine”?”
To achieve the aim of this work and to answer the research question, we have studied theoretical approaches to explaining and assessing HRM in section 1, analyzed HRM system of an international enterprise in section 2, and then synthesized theory and practice to find intersection points in section 3.
Research findings indicate that the main challenge of HRM is to balance between individual and organizational interests. Implementation of OPD-SHRM is one of the solutions. Switching focus from satisfaction towards success will bring both tangible and intangible benefits for individuals and organization. In case of PrJSC “Philip Morris Ukraine”, the maximum forecasted increase is 330% in net profit, 350% in labor productivity, and 26% in Employee Development and Engagement Index.
Current political issues are often reflected in social media discussions, gathering politicians and voters on common platforms. As these can affect the public perception of politics, the inner dynamics and backgrounds of such debates are of great scientific interest. This thesis takes user generated messages from an up-to-date dataset of considerable relevance as Time Series, and applies a topic-based analysis of inspiration and agenda setting to it. The Institute for Web Science and Technologies of the University Koblenz-Landau has collected Twitter data generated beforehand by candidates of the European Parliament Election 2019. This work processes and analyzes the dataset for various properties, while focusing on the influence of politicians and media on online debates. An algorithm to cluster tweets into topical threads is introduced. Subsequently, Sequential Association Rules are mined, yielding wide array of potential influence relations between both actors and topics. The elaborated methodology can be configured with different parameters and is extensible in functionality and scope of application.
Belief revision is the subarea of knowledge representation which studies the dynamics of epistemic states of an agent. In the classical AGM approach, contraction, as part of the belief revision, deals with the removal of beliefs in knowledge bases. This master's thesis presents the study and the implementation of concept contraction in the Description Logic EL. Concept contraction deals with the following situation. Given two concept C and D, assuming that C is subsumed by D, how can concept C be changed so that it is not subsumed by D anymore, but is as similar as possible to C? This approach of belief change is different from other related work because it deals with contraction in the level of concepts and not T-Boxes and A-Boxes in general. The main contribution of the thesis is the implementation of the concept contraction. The implementation provides insight into the complexity of contraction in EL, which is tractable since the main inference task in EL is also tractable. The implementation consists of the design of five algorithms that are necessary for concept contraction. The algorithms are described, illustrated with examples, and analyzed in terms of time complexity. Furthermore, we propose an new approach for a selection function, adapt for the concept contraction. The selection function uses metadata about the concepts in order to select the best from an input set. The metadata is modeled in a framework that we have designed, based on standard metadata frameworks. As an important part of the concept contraction, the selection function is responsible for selecting the best concepts that are as similar as possible to concept C. Lastly, we have successfully implemented the concept contraction in Python, and the results are promising.
To construct a business process model manually is a highly complex and error-prone task which takes a lot of time and deep insights into the organizational structure, its operations and business rules. To improve the output of business analysts dealing with this process, different techniques have been introduced by researchers to support them during construction with helpful recommendations. These supporting recommendation systems vary in their way of what to recommend in the first place as well as their calculations taking place under the hood to recommend the most fitting element to the user. After a broad introduction into the field of business process modeling and its basic recommendation structures, this work will take a closer look at diverse proposals and descriptions published in current literature regarding implementation strategies to effectively and efficiently assist modelers during their business process model creation. A critical analysis of presentations in the selected literature will point out strengths and weaknesses of their approaches, studies and descriptions of those. As a result, the final concept matrix in this work will give a precise and helpful overview about the key features and recommendation methods used and implemented in previous research studies to pinpoint an entry into future works without the downsides already spotted by fellow researchers.
Commonsense reasoning can be seen as a process of identifying dependencies amongst events and actions. Understanding the circumstances surrounding these events requires background knowledge with sufficient breadth to cover a wide variety of domains. In the recent decades, there has been a lot of work in extracting commonsense knowledge, a number of these projects provide their collected data as semantic networks such as ConceptNet and CausalNet. In this thesis, we attempt to undertake the Choice Of Plausible Alternatives (COPA) challenge, a problem set with 1000 questions written in multiple-choice format with a premise and two alternative choices for each question. Our approach differs from previous work by using shortest paths between concepts in a causal graph with the edge weight as causality metric. We use CausalNet as primary network and implement a few design choices to explore the strengths and drawbacks of this approach, and propose an extension using ConceptNet by leveraging its commonsense knowledge base.
Implementation of Agile Software Development Methodology in a Company – Why? Challenges? Benefits?
(2019)
The software development industry is enhancing day by day. The introduction of agile software development methodologies was a tremendous structural change in companies. Agile transformation provides unlimited opportunities and benefits to the existing and new developing companies. Along with benefits, agile conversion also brings many unseen challenges. New entrants have the advantage of being flexible and cope with the environmental, consumer, and cultural changes, but existing companies are bound to rigid structure.
The goal of this research is to have deep insight into agile software development methodology, agile manifesto, and principles behind the agile manifesto. The prerequisites company must know for agile software development implementation. The benefits a company can achieve by implementing agile software development. Significant challenges that a company can face during agile implementation in a company.
The research objectives of this study help to generate strong motivational research questions. These research questions cover the cultural aspects of company agility, values and principles of agile, benefits, and challenges of agile implementation. The project management triangle will show how benefits of cost, benefits of time, and benefits of quality can be achieved by implementing agile methodologies. Six significant areas have been explored, which shows different challenges a company can face during implementation agile software development methodology. In the end, after the in depth systematic literature review, conclusion is made following some open topics for future work and recommendations on the topic of implementation of agile software development methodology in a company.
Business Process Querying (BPQ) is a discipline in the field of Business Process Man- agement which helps experts to understand existing process models and accelerates the development of new ones. Its queries can fetch and merge these models, answer questions regarding the underlying process, and conduct compliance checking in return. Many languages have been deployed in this discipline but two language types are dominant: Logic-based languages use temporal logic to verify models as finite state machines whereas graph-based languages use pattern matching to retrieve subgraphs of model graphs directly. This thesis aims to map the features of both language types to features of the other to identify strengths and weaknesses. Exemplarily, the features of Computational Tree Logic (CTL) and The Diagramed Modeling Language (DMQL) are mapped to one another. CTL explores the valid state space and thus is better for behavioral querying. Lacking certain structural features and counting mechanisms it is not appropriate to query structural properties. In contrast, DMQL issues structural queries and its patterns can reconstruct any CTL formula. However, they do not always achieve exactly the same semantic: Patterns treat conditional flow as sequential flow by ignoring its conditions. As a result, retrieved mappings are invalid process execution sequences, i.e. false positives, in certain scenarios. DMQL can be used for behavioral querying if these are absent or acceptable. In conclusion, both language types have strengths and are specialized for different BPQ use cases but in certain scenarios graph-based languages can be applied to both. Integrating the evaluation of conditions would remove the need for logic-based languages in BPQ completely.
Data visualization is an effective way to explore data. It helps people to get a valuable insight of the data by placing it in a visual context. However, choosing a good chart without prior knowledge in the area is not a trivial job. Users have to manually explore all possible visualizations and decide upon ones that reflect relevant and desired trend in the data, are insightful and easy to decode, have a clear focus and appealing appearance. To address these challenges we developed a Tool for Automatic Generation of Good viSualizations using Scoring (TAG²S²). The approach tackles the problem of identifying an appropriate metric for judging visualizations as good or bad. It consists of two modules: visualization detection: given a data-set it creates a list of combination of data attributes for scoring and visualization ranking: scores each chart and decides which ones are good or bad. For the later, an utility metric of ten criteria was developed and each visualization detected in the first module is evaluated on these criteria. Only those visualizations that received enough scores are then presented to the user. Additionally to these data parameters, the tool considers user perception regarding the choice of visual encoding when selecting a visualization. To evaluate the utility of the metric and the importance of each criteria, test cases were developed, executed and the results presented.
The erosion of the closed innovation paradigm in conjunction with increasing competitive pressure has boosted the interest of both researchers and organizations in open innovation. Despite such rising interest, several companies remain reluctant to open their organizational boundaries to practice open innovation. Among the many reasons for such reservation are the pertinent complexity of transitioning toward open innovation and a lack of understanding of the procedures required for such endeavors. Hence, this thesis sets out to investigate how organizations can open their boundaries to successfully transition from closed to open innovation by analyzing the current literature on open innovation. In doing so, the transitional procedures are structured and classified into a model comprising three phases, namely unfreezing, moving, and institutionalizing of changes. Procedures of the unfreezing phase lay the foundation for a successful transition to open innovation, while procedures of the moving phase depict how the change occurs. Finally, procedures of the institutionalizing phase contribute to the sustainability of the transition by employing governance mechanisms and performance measures. Additionally, the individual procedures are characterized along with their corresponding barriers and critical success factors. As a result of this structured depiction of the transition process, a guideline is derived. This guideline includes the commonly employed actions of successful practitioners of open innovation, which may serve as a baseline for interested parties of the paradigm. With the derivation of the guideline and concise depiction of the individual transitional phases, this thesis consequently reduces the overall complexity and increases the comprehensibility of the transition and its implications for organizations.
With the appearance of modern virtual reality (VR) headsets on the consumer market, there has been the biggest boom in the history of VR technology. Naturally, this was accompanied by an increasing focus on the problems of current VR hardware. Especially the control in VR has always been a complex topic.
One possible solution is the Leap Motion, a hand tracking device that was initially developed for desktop use, but with the last major software update it can be attached to standard VR headsets. This device allows very precise tracking of the user’s hands and fingers and their replication in the virtual world.
The aim of this work is to design virtual user interfaces that can be operated with the Leap Motion to provide a natural method of interaction between the user and the VR environment. After that, subject tests are performed to evaluate their performance and compare them to traditional VR controllers.
Despite the inception of new technologies at a breakneck pace, many analytics projects fail mainly due to the use of incompatible development methodologies. As big data analytics projects are different from software development projects, the methodologies used in software development projects could not be applied in the same fashion to analytics projects. The traditional agile project management approaches to the projects do not consider the complexities involved in the analytics. In this thesis, the challenges involved in generalizing the application of agile methodologies will be evaluated, and some suitable agile frameworks which are more compatible with the analytics project will be explored and recommended. The standard practices and approaches which are currently applied in the industry for analytics projects will be discussed concerning enablers and success factors for agile adaption. In the end, after the comprehensive discussion and analysis of the problem and complexities, a framework will be recommended that copes best with the discussed challenges and complexities and is generally well suited for the most data-intensive analytics projects.
The Internet of Things (IoT) is a fast-growing, technological concept, which aims to integrate various physical and virtual objects into a global network to enable interaction and communication between those objects (Atzori, Iera and Morabito, 2010). The application possibilities are manifold and may transform society and economy similarly to the usage of the internet (Chase, 2013). Furthermore, the Internet of Things occupies a central role for the realisation of visionary future concepts, for example, Smart City or Smart Healthcare. In addition, the utilisation of this technology promises opportunities for the enhancement of various sustainability aspects, and thus for the transformation to a smarter, more efficient and more conscious dealing with natural resources (Maksimovic, 2017). The action principle of sustainability increasingly gains attention in the societal and academical discourse. This is reasoned by the partly harmful consumption and production patterns of the last century (Mcwilliams et al., 2016). Relating to sustainability, the advancing application of IoT technology also poses risks. Following the precautionary principle, these risks should be considered early (Harremoës et al., 2001). Risks of IoT for sustainability include the massive amounts of energy and raw materials which are required for the manufacturing and operation of IoT objects and furthermore, the disposal of those objects (Birkel et al., 2019). The exact relations in the context of IoT and sustainability are insufficiently explored to this point and do not constitute a central element within the discussion of this technology (Behrendt, 2019). Therefore, this thesis aims to develop a comprehensive overview of the relations between IoT and sustainability.
To achieve this aim, this thesis utilises the methodology of Grounded Theory in combination with a comprehensive literature review. The analysed literature primarily consists of research contributions in the field of Information Technology (IT). Based on this literature, aspects, solution approaches, effects and challenges in the context of IoT and sustainability were elaborated. The analysis revealed two central perspectives in this context. IoT for Sustainability (IoT4Sus) describes the utilisation and usage of IoT-generated information to enhance sustainability aspects. In contrast, Sustainability for IoT (Sus4IoT) fo-cuses on sustainability aspects of the applied technology and highlights methods to reduce negative impacts, which are associated with the manufacturing and operation of IoT. Elaborated aspects and relations were illustrated in the comprehensive CCIS Framework. This framework represents a tool for the capturing of relevant aspects and relations in this context and thus supports the awareness of the link between IoT and sustainability. Furthermore, the framework suggests an action principle to optimise the performance of IoT systems regarding sustainability.
The central contribution of this thesis is represented by the providence of the CCIS Framework and the contained information regarding the aspects and relations of IoT and sustainability.
Willingness to pay and willingness to accept on a two-sided platform - The use case of DoBeeDo
(2019)
It is widely known that especially for technology-based start-ups, entrepreneurs need to set up the boundaries of the business and define the product/service to offer in order to minimize the risk of failure. The goal of this thesis is to not only emphasize the importance of the business model development and evaluation but also show an example customer validation process for an emerging start-up named DoBeeDo, which is a mobile app operating on a two-sided market. During the process of customer validation a survey has been conducted to evaluate the interest of the target groups as well as the fit of their expectations using the Willingness to Pay and Willingness to Accept measures. The paper includes an analysis and evaluation of the gathered results and assesses whether the execution of the Customer Development Model can be continued.
The status of Business Process Management (BPM) recommender systems is not quite clear as research states. The use of recommenders familiarized itself with the world during the rise of technological evolution in the past decade.Ever since then, several BPM recommender systems came about. However, not a lot of research is conducted in this field. It is not well known to what broad are the technologies used and how are they used. Moreover, this master’s thesis aims at surveying the BPM recommender systems existing. Building on this, the recommendations come in different shapes. They can be positionbased where an element is to be placed at an element’s front, back or to autocomplete a missing link. On the other hand, Recommendations can be textual, to fill the labels of the elements. Furthermore, the literature review for BPM recommender systems took place under the guides of a literature review framework. The framework suggests 5stages of consecutive stages for this sake. The first stage is defining a scope for the research. Secondly, conceptualizing the topic by choosing key terms for literature research. After that in the third stage, comes the research stage.As for the fourth stage, it suggests choosing analysis features over which the literature is to be synthesized and compared. Finally, it recommends defining the research agenda to describe the reason for the literature review. By invoking the mentioned methodology, this master’s thesis surveyed 18 BPM recommender systems. It was found as a result of the survey that there
are not many different technologies for implementing the recommenders. It was also found that the majority of the recommenders suggest nodes that are yet to come in the model, which is called forward recommending. Also, one of the results of the survey indicated the scarce use of textual recommendations to BPM labels. Finally, 18 recommenders are considered less than excepted for a developing field therefore as a result, the survey found a shortage in the number of BPM recommender systems. The results indicate several shortages in several aspects in the field of BPM recommender systems. On this basis, this master’s thesis recommends the future work on it the results.
Tracking is an integral part of many modern applications, especially in areas like autonomous systems and Augmented Reality. For performing tracking there are a wide array of approaches. One that has become a subject of research just recently is the utilization of Neural Networks. In the scope of this master thesis an application will be developed which uses such a Neural Network for the tracking process. This also requires the creation of training data as well as the creation and training of a Neural Network. Subsequently the usage of Neural Networks for tracking will be analyzed and evaluated. This includes several aspects. The quality of the tracking for different degrees of freedom will be checked as well as the the impact of the Neural Network on the applications performance. Additionally the amount of required training data is investigated, the influence of the network architecture and the importance of providing depth data as part of the networks input. This should provide an insight into how relevant this approach could be for its adoption in future products.
Business rules have become an important tool to warrant compliance at their business processes. But the collection of these business rules can have various conflicting elements. This can lead to a violation of the compliance to be achieved. This conflicting elements are therefore a kind of inconsistencies, or quasi incon- sistencies in the business rule base. The target for this thesis is to investigate how those quasi inconsistencies in business rules can be detected and analyzed. To this aim, we develop a comprehensive library which allows to apply results from the scientific field of inconsistency measurement to business rule formalisms that are actually used in practice.
Most social media platforms allow users to freely express their opinions, feelings, and beliefs. However, in recent years the growing propagation of hate speech, offensive language, racism and sexism on the social media outlets have drawn attention from individuals, companies, and researchers. Today, sexism both online and offline with different forms, including blatant, covert, and subtle lan- guage, is a common phenomenon in society. A notable amount of work has been done over identifying sexist content and computationally detecting sexism which exists online. Although previous efforts have mostly used peoples’ activities on social media platforms such as Twitter as a public and helpful source for collecting data, they neglect the fact that the method of gathering sexist tweets could be biased towards the initial search terms. Moreover, some forms of sexism could be missed since some tweets which contain offensive language could be misclassified as hate speech. Further, in existing hate speech corpora, sexist tweets mostly express hostile sexism, and to some degree, the other forms of sexism which also appear online was disregarded. Besides, the creation of labeled datasets with manual exertion, relying on users to report offensive comments with a tremendous effort by human annotators is not only a costly and time-consuming process, but it also raises the risk of involving discrimination under biased judgment.
This thesis generates a novel sexist and non-sexist dataset which is constructed via "UnSexistifyIt", an online web-based game that incentivizes the players to make minimal modifications to a sexist statement with the goal of turning it into a non-sexist statement and convincing other players that the modified statement is non-sexist. The game applies the methodology of "Game With A Purpose" to generate data as a side-effect of playing the game and also employs the gamification and crowdsourcing techniques to enhance non-game contexts. When voluntary participants play the game, they help to produce non-sexist statements which can reduce the cost of generating new corpus. This work explores how diverse individual beliefs concerning sexism are. Further, the result of this work highlights the impact of various linguistic features and content attributes regarding sexist language detection. Finally, this thesis could help to expand our understanding regarding the syntactic and semantic structure of sexist and non-sexist content and also provides insights to build a probabilistic classifier for single sentences into sexist or non-sexist classes and lastly find a potential ground truth for such a classifier.
Our work finds the fine grained edits in context of neighbouring tokens in Wikipedia articles. We cluster those edits according to similar neighbouring context. We encode neighbouring context into vector space using word vectors. We evaluate clusters returned by our algorithm on extrinsic and intrinsic metric and compare it with previous work. We analyse the relation between extrinsic and intrinsic measurements of fine grained edit tokens.
The goal of this master thesis was to develop a CRM system for the Assist team of CompuGroup Medical that is aiding in integrating open innovation into the development of the Minerva 2.0 software. To achieve this, CRM methodology has been combined with Social Networking Systems, following the research of Lin and Chen (2010, pp. 11 – 30). To achieve the predefined goals literature has been analyzed on how to successfully im- plement a CRM system as well as an online community. Subsequently the results have been applied to the development of the Minerva Community according to the guidelines of Design Science suggested by Hevner et al. (2004, pp. 75 – 104). The finished product is designed based on customer and management requirements and evaluated from a customer and company perspective.
The purpose of this thesis is to explore the sentiment distributions of Wikipedia concepts.
We analyse the sentiment of the entire English Wikipedia corpus, which includes 5,669,867 articles and 1,906,375 talks, by using a lexicon-based method with four different lexicons.
Also, we explore the sentiment distributions from a time perspective using the sentiment scores obtained from our selected corpus. The results obtained have been compared not only between articles and talks but also among four lexicons: OL, MPQA, LIWC, and ANEW.
Our findings show that among the four lexicons, MPQA has the highest sensitivity and ANEW has the lowest sensitivity to emotional expressions. Wikipedia articles show more sentiments than talks according to OL, MPQA, and LIWC, whereas Wikipedia talks show more sentiments than articles according to ANEW. Besides, the sentiment has a trend regarding time series, and each lexicon has its own bias regarding text describing different things.
Moreover, our research provides three interactive widgets for visualising sentiment distributions for Wikipedia concepts regarding the time and geolocation attributes of concepts.
We examine the systematic underrecognition of female scientists (Matilda effect) by exploring the citation network of papers published in the American Physical Society (APS) journals. Our analysis shows that articles written by men (first author, last author and dominant gender of authors) receive more citations than similar articles written by women (first author, last author and dominant gender of authors) after controlling for the journal of publication, year of publication and content of the publication. Statistical significance of the overlap between the lists of references was considered as the measure of similarity between articles in our analysis. In addition, we found that men are less likely to cite articles written by women and women are less likely to cite articles written by men. This pattern leads to receiving more citations by articles written by men than similar articles written by women because the majority of authors who published in APS journals are male (85%). We also observed Matilda effect reduces when articles are published in journals with the highest impact factors. In other words, people’s evaluation of articles published in these journals is not affected by the gender of authors significantly. Finally, we suggested a method that can be applied by editors in academic journals to reduce the evaluation bias to some extent. Editors can identify missing citations using our proposed method to complete bibliographies. This policy can reduce the evaluation bias because we observed papers written by female scholars (first author, last author, the dominant gender of authors) miss more citations than articles written by male scholars (first author, last author, the dominant gender of authors).
Ontologies are valuable tools for knowledge representation and important building blocks of the Semantic Web. They are not static and can change over time. Changing an ontology can be necessary for various reasons: the domain that is represented by an ontology can change or an ontology is reused and must be adapted to the new context. In addition, modeling errors could have been introduced into the ontology which must be found and removed. The non-triviality of the change process has led to the emerge of ontology change as an own field of research. The removal of knowledge from ontologies is an important aspect of this change process, because even the addition of new knowledge to an ontology potentially requires the removal of older, conflicting knowledge. Such a removal must be performed in a thought-out way. A naïve change of concepts within the ontology can easily remove other, unrelated knowledge or alter the semantics of concepts in an unintended way [2]. For these reasons, this thesis introduces a formal operator for the fine-grained retraction of knowledge from EL concepts which is partially based on the postulates for belief set contraction and belief base contraction [3, 4, 5] and the work of Suchanek et al. [6]. For this, a short introduction to ontologies and OWL 2 is given and the problem of ontology change is explained. It is then argued why a formal operator can support this process and why the Description Logic EL provides a good starting point for the development of such an operator. After this, a general introduction to Description Logic is given. This includes its history, an overview of its applications and common reasoning tasks in this logic. Following this, the logic EL is defined. In a next step, related work is examined and it is shown why the recovery postulate and the relevance postulate cannot be naïvely employed in the development of an operator that removes knowledge from EL concepts. Following this, the requirements to the operator are formulated and properties are given which are mainly based on the postulates for belief set and belief base contraction. Additional properties are developed which make up for the non-applicability of the recovery and relevance postulates. After this, a formal definition of the operator is given and it is shown that the operator is applicable to the task of a fine-grained removal of knowledge from EL concepts. In a next step, it is proven that the operator fulfills all the previously defined properties. It is then demonstrated how the operator can be combined with laconic justifications [7] to assist a human ontology editor by automatically removing unwanted consequences from an ontology. Building on this, a plugin for the ontology editor Protégé is introduced that is based on algorithms that were derived from the formal definition of the operator. The content of this work is then summarized and a final conclusion is drawn. The thesis closes with an outlook into possible future work.
Knowledge-based authentication methods are vulnerable to Shoulder surfing phenomenon.
The widespread usage of these methods and not addressing the limitations it has could result in the user’s information to be compromised. User authentication method ought to be effortless to use and efficient, nevertheless secure.
The problem that we face concerning the security of PIN (Personal Identification Number) or password entry is shoulder surfing, in which a direct or indirect malicious observer could identify the user sensitive information. To tackle this issue we present TouchGaze which combines gaze signals and touch capabilities, as an input method for entering user’s credentials. Gaze signals will be primarily used to enhance targeting and touch for selecting. In this work, we have designed three different PIN entry method which they all have similar interfaces. For the evaluation, these methods were compared based on efficiency, accuracy, and usability. The results uncovered that despite the fact that gaze-based methods require extra time for the user to get familiar with yet it is considered more secure. In regards to efficiency, it has the similar error margin to the traditional PIN entry methods.
This Master Thesis is an exploratory research to determine whether it is feasible to construct a subjectivity lexicon using Wikipedia. The key hypothesis is that that all quotes in Wikipedia are subjective and all regular text are objective. The degree of subjectivity of a word, also known as ''Quote Score'' is determined based on the ratio of word frequency in quotations to its frequency outside quotations. The proportion of words in the English Wikipedia which are within quotations is found to be much smaller as compared to those which are not in quotes, resulting in a right-skewed distribution and low mean value of Quote Scores.
The methodology used to generate the subjectivity lexicon from text corpus in English Wikipedia is designed in such a way that it can be scaled and reused to produce similar subjectivity lexica of other languages. This is achieved by abstaining from domain and language-specific methods, apart from using only readily-available English dictionary packages to detect and exclude stopwords and non-English words in the Wikipedia text corpus.
The subjectivity lexicon generated from English Wikipedia is compared against other lexica; namely MPQA and SentiWordNet. It is found that words which are strongly subjective tend to have high Quote Scores in the subjectivity lexicon generated from English Wikipedia. There is a large observable difference between distribution of Quote Scores for words classified as strongly subjective versus distribution of Quote Scores for words classified as weakly subjective and objective. However, weakly subjective and objective words cannot be differentiated clearly based on Quote Score. In addition to that, a questionnaire is commissioned as an exploratory approach to investigate whether subjectivity lexicon generated from Wikipedia could be used to extend the coverage of words of existing lexica.
The content aggregator platform Reddit has established itself as one of the most popular websites in the world. However, scientific research on Reddit is hindered as Reddit allows (and even encourages) user anonymity, i.e., user profiles do not contain personal information such as the gender. Inferring the gender of users in large-scale could enable the analysis of gender-specific areas of interest, reactions to events, and behavioral patterns. In this direction, this thesis suggests a machine learning approach of estimating the gender of Reddit users. By exploiting specific conventions in parts of the website, we obtain a ground truth for more than 190 million comments of labeled users. This data is then used to train machine learning classifiers to use them to gain insights about the gender balance of particular subreddits and the platform in general. By comparing a variety of different approaches for classification algorithm, we find that character-level convolutional neural network achieves performance with an 82.3% F1 score on a task of predicting a gender of a user based on his/her comments. The score surpasses 85% mark for frequent users with more than 50 comments. Furthermore, we discover that female users are less active on Reddit platform, they write fewer comments and post in fewer subreddits on average, when compared to male users.
Topic models are a popular tool to extract concepts of large text corpora. These text corpora tend to contain hidden meta groups. The size relation of these groups is frequently imbalanced. Their presence is often ignored when applying a topic model. Therefore, this thesis explores the influence of such imbalanced corpora on topic models.
The influence is tested by training LDA on samples with varying size relations. The samples are generated from data sets containing a large group differences i.e language difference and small group differences i.e. political orientation. The predictive performance on those imbalanced corpora is judged using perplexity.
The experiments show that the presence of groups in training corpora can influence the prediction performance of LDA. The impact varies due to various factors, including language-specific perplexity scores. The group-related prediction performance changes for groups when varying the relative group sizes. The actual change varies between data sets.
LDA is able to distinguish between different latent groups in document corpora if differences between groups are large enough, e.g. for groups with different languages. The proportion of group-specific topics is under-proportional to the share of the group in the corpus and relatively smaller for minorities.
The output of eye tracking Web usability studies can be visualized to the analysts as screenshots of the Web pages with their gaze data. However, the screenshot visualizations are found to be corrupted whenever there are recorded fixations on fixed Web page elements on different scroll positions. The gaze data are not gathered on their fixated fixed elements; rather they are scattered on their recorded scroll positions. This problem has raised our attention to find an approach to link gaze data to their intended fixed elements and gather them in one position on the screenshot. The approach builds upon the concept of creating the screenshot during the recording session, where images of the viewport are captured on visited scroll positions and lastly stitched into one Web page screenshot. Additionally, the fixed elements in the Web page are identified and linked to their fixations. For the evaluation, we compared the interpretation of our enhanced screenshot against the video visualization, which overcomes the problem. The results revealed that both visualizations equally deliver accurate interpretations. However, interpreting the visualizations of eye tracking Web usability studies using the enhanced screenshots outperforms the video visualizations in terms of speed and it requires less temporal demands from the interpreters.