Refine
Year of publication
- 2018 (7) (remove)
Document Type
- Master's Thesis (7) (remove)
Keywords
Institute
- Institute for Web Science and Technologies (7) (remove)
Knowledge-based authentication methods are vulnerable to Shoulder surfing phenomenon.
The widespread usage of these methods and not addressing the limitations it has could result in the user’s information to be compromised. User authentication method ought to be effortless to use and efficient, nevertheless secure.
The problem that we face concerning the security of PIN (Personal Identification Number) or password entry is shoulder surfing, in which a direct or indirect malicious observer could identify the user sensitive information. To tackle this issue we present TouchGaze which combines gaze signals and touch capabilities, as an input method for entering user’s credentials. Gaze signals will be primarily used to enhance targeting and touch for selecting. In this work, we have designed three different PIN entry method which they all have similar interfaces. For the evaluation, these methods were compared based on efficiency, accuracy, and usability. The results uncovered that despite the fact that gaze-based methods require extra time for the user to get familiar with yet it is considered more secure. In regards to efficiency, it has the similar error margin to the traditional PIN entry methods.
Ontologies are valuable tools for knowledge representation and important building blocks of the Semantic Web. They are not static and can change over time. Changing an ontology can be necessary for various reasons: the domain that is represented by an ontology can change or an ontology is reused and must be adapted to the new context. In addition, modeling errors could have been introduced into the ontology which must be found and removed. The non-triviality of the change process has led to the emerge of ontology change as an own field of research. The removal of knowledge from ontologies is an important aspect of this change process, because even the addition of new knowledge to an ontology potentially requires the removal of older, conflicting knowledge. Such a removal must be performed in a thought-out way. A naïve change of concepts within the ontology can easily remove other, unrelated knowledge or alter the semantics of concepts in an unintended way [2]. For these reasons, this thesis introduces a formal operator for the fine-grained retraction of knowledge from EL concepts which is partially based on the postulates for belief set contraction and belief base contraction [3, 4, 5] and the work of Suchanek et al. [6]. For this, a short introduction to ontologies and OWL 2 is given and the problem of ontology change is explained. It is then argued why a formal operator can support this process and why the Description Logic EL provides a good starting point for the development of such an operator. After this, a general introduction to Description Logic is given. This includes its history, an overview of its applications and common reasoning tasks in this logic. Following this, the logic EL is defined. In a next step, related work is examined and it is shown why the recovery postulate and the relevance postulate cannot be naïvely employed in the development of an operator that removes knowledge from EL concepts. Following this, the requirements to the operator are formulated and properties are given which are mainly based on the postulates for belief set and belief base contraction. Additional properties are developed which make up for the non-applicability of the recovery and relevance postulates. After this, a formal definition of the operator is given and it is shown that the operator is applicable to the task of a fine-grained removal of knowledge from EL concepts. In a next step, it is proven that the operator fulfills all the previously defined properties. It is then demonstrated how the operator can be combined with laconic justifications [7] to assist a human ontology editor by automatically removing unwanted consequences from an ontology. Building on this, a plugin for the ontology editor Protégé is introduced that is based on algorithms that were derived from the formal definition of the operator. The content of this work is then summarized and a final conclusion is drawn. The thesis closes with an outlook into possible future work.
We examine the systematic underrecognition of female scientists (Matilda effect) by exploring the citation network of papers published in the American Physical Society (APS) journals. Our analysis shows that articles written by men (first author, last author and dominant gender of authors) receive more citations than similar articles written by women (first author, last author and dominant gender of authors) after controlling for the journal of publication, year of publication and content of the publication. Statistical significance of the overlap between the lists of references was considered as the measure of similarity between articles in our analysis. In addition, we found that men are less likely to cite articles written by women and women are less likely to cite articles written by men. This pattern leads to receiving more citations by articles written by men than similar articles written by women because the majority of authors who published in APS journals are male (85%). We also observed Matilda effect reduces when articles are published in journals with the highest impact factors. In other words, people’s evaluation of articles published in these journals is not affected by the gender of authors significantly. Finally, we suggested a method that can be applied by editors in academic journals to reduce the evaluation bias to some extent. Editors can identify missing citations using our proposed method to complete bibliographies. This policy can reduce the evaluation bias because we observed papers written by female scholars (first author, last author, the dominant gender of authors) miss more citations than articles written by male scholars (first author, last author, the dominant gender of authors).
Topic models are a popular tool to extract concepts of large text corpora. These text corpora tend to contain hidden meta groups. The size relation of these groups is frequently imbalanced. Their presence is often ignored when applying a topic model. Therefore, this thesis explores the influence of such imbalanced corpora on topic models.
The influence is tested by training LDA on samples with varying size relations. The samples are generated from data sets containing a large group differences i.e language difference and small group differences i.e. political orientation. The predictive performance on those imbalanced corpora is judged using perplexity.
The experiments show that the presence of groups in training corpora can influence the prediction performance of LDA. The impact varies due to various factors, including language-specific perplexity scores. The group-related prediction performance changes for groups when varying the relative group sizes. The actual change varies between data sets.
LDA is able to distinguish between different latent groups in document corpora if differences between groups are large enough, e.g. for groups with different languages. The proportion of group-specific topics is under-proportional to the share of the group in the corpus and relatively smaller for minorities.
The content aggregator platform Reddit has established itself as one of the most popular websites in the world. However, scientific research on Reddit is hindered as Reddit allows (and even encourages) user anonymity, i.e., user profiles do not contain personal information such as the gender. Inferring the gender of users in large-scale could enable the analysis of gender-specific areas of interest, reactions to events, and behavioral patterns. In this direction, this thesis suggests a machine learning approach of estimating the gender of Reddit users. By exploiting specific conventions in parts of the website, we obtain a ground truth for more than 190 million comments of labeled users. This data is then used to train machine learning classifiers to use them to gain insights about the gender balance of particular subreddits and the platform in general. By comparing a variety of different approaches for classification algorithm, we find that character-level convolutional neural network achieves performance with an 82.3% F1 score on a task of predicting a gender of a user based on his/her comments. The score surpasses 85% mark for frequent users with more than 50 comments. Furthermore, we discover that female users are less active on Reddit platform, they write fewer comments and post in fewer subreddits on average, when compared to male users.
This Master Thesis is an exploratory research to determine whether it is feasible to construct a subjectivity lexicon using Wikipedia. The key hypothesis is that that all quotes in Wikipedia are subjective and all regular text are objective. The degree of subjectivity of a word, also known as ''Quote Score'' is determined based on the ratio of word frequency in quotations to its frequency outside quotations. The proportion of words in the English Wikipedia which are within quotations is found to be much smaller as compared to those which are not in quotes, resulting in a right-skewed distribution and low mean value of Quote Scores.
The methodology used to generate the subjectivity lexicon from text corpus in English Wikipedia is designed in such a way that it can be scaled and reused to produce similar subjectivity lexica of other languages. This is achieved by abstaining from domain and language-specific methods, apart from using only readily-available English dictionary packages to detect and exclude stopwords and non-English words in the Wikipedia text corpus.
The subjectivity lexicon generated from English Wikipedia is compared against other lexica; namely MPQA and SentiWordNet. It is found that words which are strongly subjective tend to have high Quote Scores in the subjectivity lexicon generated from English Wikipedia. There is a large observable difference between distribution of Quote Scores for words classified as strongly subjective versus distribution of Quote Scores for words classified as weakly subjective and objective. However, weakly subjective and objective words cannot be differentiated clearly based on Quote Score. In addition to that, a questionnaire is commissioned as an exploratory approach to investigate whether subjectivity lexicon generated from Wikipedia could be used to extend the coverage of words of existing lexica.
The purpose of this thesis is to explore the sentiment distributions of Wikipedia concepts.
We analyse the sentiment of the entire English Wikipedia corpus, which includes 5,669,867 articles and 1,906,375 talks, by using a lexicon-based method with four different lexicons.
Also, we explore the sentiment distributions from a time perspective using the sentiment scores obtained from our selected corpus. The results obtained have been compared not only between articles and talks but also among four lexicons: OL, MPQA, LIWC, and ANEW.
Our findings show that among the four lexicons, MPQA has the highest sensitivity and ANEW has the lowest sensitivity to emotional expressions. Wikipedia articles show more sentiments than talks according to OL, MPQA, and LIWC, whereas Wikipedia talks show more sentiments than articles according to ANEW. Besides, the sentiment has a trend regarding time series, and each lexicon has its own bias regarding text describing different things.
Moreover, our research provides three interactive widgets for visualising sentiment distributions for Wikipedia concepts regarding the time and geolocation attributes of concepts.