Currently more than 850 biological databases exist. The majority of biological knowledge is not in these databases but rather contained as free text in scientific literature. For systems biology tasks it is often necessary to integrate and extract data from heterogeneous databases and free text as well as to analyse the information in the context of experimental data. ONDEX is an integration framework which aims to address these challenges by combining features of database integration, text mining and sequence analysis with methods for graph-based data analysis and visualisation. The main topics of this diploma thesis are the redesign of the ONDEX backend, the development of a data exchange format, the development of a query environment and the provision of Web services for data integration, data exchange and queries. These Web services allow backend workflow control from both local and remote workstations.
The thesis develops and evaluates a hypothetical model of the factors that influence user acceptance of weblog technology. Previous acceptance studies are reviewed, and the various models employed are discussed. The eventual model is based on the technology acceptance model (TAM) by Davis et al. It conceptualizes and operationalizes a quantitative survey conducted by means of an online questionnaire, strictly from a user perspective. Finally, it is tested and validated by applying methods of data analysis.
Public electronic procurement (eProcurement), and electronic sourcing (eSourcing) in particular, is almost certainly on the agenda when eGovernment experts meet. Not surprisingly, eProcurement is the first high-impact service to be addressed in the European Union's recent Action Plan. This is mainly due to the fact that public procurement makes up almost 20% of Europe's GDP and therefore holds huge savings potential. To some extent this potential lies in the common European market, since effective cross-border eSourcing solutions can open many doors, both for buyers and suppliers. To achieve this, systems, processes and tools need to be adoptable and transferable, and able to communicate with each other. In a word, they need to be interoperable. In many relevant domains, interoperability has reached a very positive level: standards have been established and workflows have been put in place. In other domains, however, there is still a long road ahead. As a consequence, it is crucial to define requirements for such interoperable eSourcing systems and to identify the progress in research and practice.
The internet is becoming more and more important in daily life. Fundamental changes can be observed in the private sector as well as in the public sector. In the course of this, the active involvement of citizens in planning political procedures is increasingly supported electronically. The expectations culminate in the assumption that information and communication technology (ICT) can enhance civic participation and reduce disenchantment with politics. Out of these expectations, many e-participation projects were initiated in Germany. Initiatives were established, e.g. the "Initiative eParticipation", which provided many incentives for electronic participation in policy and administration in order to strengthen decision-making processes with internet-supported participation practices. This thesis consists of two major parts. In the first part, definitions of the essential terms are presented. The position of e-participation within the dimension of e-business is pointed out. In order to explain e-participation, the basics of classical offline participation are presented. It will be shown that a change is in progress, and not only because of the deployment of ICT. Subsequently, a framework to characterize e-participation is presented. The European Union is encouraging the implementation of e-participation, so the city of Koblenz should be no exception. But what is the current situation in Koblenz? To answer this question, the status quo was examined with the help of a survey among the citizens of Koblenz, which was developed, conducted and evaluated. This is the second major part of this thesis.
Entwicklung eines Regelungsverfahrens zur Pfadverfolgung für ein Modellfahrzeug mit Sattelanhänger
(2009)
Besides the progressive automation of internal goods traffic, another important area should be considered: the carriage of goods in selected external areas. The use of driverless trucks in logistics centers can improve economic efficiency. In particular, this requires precise control procedures that keep trucks on predetermined paths. The general aim of this work is the adaptation and evaluation of a path-following control method for articulated vehicles. The differences in kinematic behavior between trucks with a one-axle trailer and semi-trailer vehicles will be emphasized. Additionally, the characteristic kinematic properties of semi-trailers will be considered in adapting the control procedure. This control procedure was initially designed for trucks with a one-axle trailer. It must work in both forward and backward movement. The control process will be integrated as a self-contained component into the control software of the model vehicle. To this end, the geometry of the model vehicle will be specified, and the possible special cases of the control process will be identified. The work also documents the most relevant software components of the implemented control process.
The development of a pan-European public E-Procurement system is an important target of the European Union to enhance the efficiency, transparency and competitiveness of public procurement procedures conducted within the European single market. A great obstacle for cross-border electronic procurement is the heterogeneity of national procurement systems in terms of technical, organizational and legal differences. To overcome this obstacle the European Commission funds several initiatives that contribute to the aim of achieving interoperability for pan-European public procurement. Pan European Public Procurement OnLine (PEPPOL) is one of these initiatives that aims at piloting an interoperable pan-European E-Procurement solution to support businesses and public purchasing entities from different member states to conduct their procurement processes electronically.

As interoperability and inter-connection of distributed heterogeneous information systems are the major requirements in the European procurement domain, and the VCD sub-domain in particular, service-oriented architecture (SOA) seems to provide a promising approach to realize such an architecture, as it promotes loose coupling and interoperability. This master thesis therefore discusses the SOA approach and how its concepts, methodologies and technologies can be used for the development of interoperable IT systems for electronic public procurement. This discussion is enhanced through a practical application of the discussed SOA methodologies by conceptualizing and prototyping of a sub-system derived from the overall system domain of the Virtual Company Dossier. For that purpose, important aspects of interoperability and related standards and technologies will be examined and put into the context of public electronic procurement. Furthermore, the paradigm behind SOA will be discussed, including the derivation of a top-down development methodology for service-oriented systems.
Mobile payment has been a payment option in the market for a long time now and was predicted to become a widely used payment method. However, over the years, the market penetration rate of mPayments has been relatively low, despite it having all the characteristics required of a convenient payment method. The primary reason cited for this is a lack of customer acceptance, mainly caused by the lack of perceived security on the part of the end user. Although biometric authentication is not a new technology, it is experiencing a revival in the light of present-day terror threats and increased security requirements in various industries. The application of biometric authentication in mPayments is analysed here and a suitable biometric authentication method for use with mPayments is recommended. The issue of enrolment and the human and technical factors to be considered are discussed, and the STOF business model is applied to a BiMoP (biometric mPayment) application.
Multi-agent systems are a mature approach to model complex software systems by means of Agent-Oriented Software Engineering (AOSE). However, their application is not widely accepted in mainstream software engineering. Parallel to this the interdisciplinary field of Agent-based Social Simulation (ABSS) finds increasing recognition beyond the purely academic realm which starts to draw attention from the mainstream of agent researchers. This work analyzes factors to improve the uptake of AOSE as well as characteristics which separate the two fields AOSE and ABSS to understand their gap. Based on the efficiency-oriented micro-agent concept of the Otago Agent Platform (OPAL) we have constructed a new modern and self-contained micro-agent platform called µ². The design takes technological trends into account and integrates representative technologies, such as the functionally-inspired JVM language Clojure (with its Transactional Memory), asynchronous message passing frameworks and the mobile application platform Android. The mobile version of the platform shows an innovative approach to allow direct interaction between Android application components and micro-agents by mapping their related internal communication mechanisms. This empowers micro-agents to exploit virtually any capability of mobile devices for intelligent agent-based applications, robotics or simply act as a distributed middleware. Additionally, relevant platform components for the support of social simulations are identified and partially implemented. To show the usability of the platform for simulation purposes an interaction-centric scenario representing group shaping processes in a multi-cultural context is provided. The scenario is based on Hofstede's concept of 'Cultural Dimensions'. It does not only confirm the applicability of the platform for simulations but also reveals interesting patterns for culturally augmented in- and out-group agents. 
This explorative research advocates the potential of micro-agents as a powerful general system modelling mechanism while bridging the convergence between mobile and desktop systems. The results stimulate future work on the micro-agent concept itself, the suggested platform and the deeper exploration of mechanisms for seamless interaction of micro-agents with mobile environments. Last but not least, the further elaboration of the simulation model as well as its use to augment intelligent agents with cultural aspects offer promising perspectives for future research.
Tractography on HARDI data
(2011)
Diffusion weighted imaging is an important modality in clinical imaging and the only possibility to gain insight into the human brain noninvasively and in vivo. The applications of this imaging technique are diverse. It is used to study the brain, its structure, its development and the functionality of its different areas. Further important fields of application are neurosurgical planning, the examination of pathologies, and the investigation of Alzheimer's disease, strokes and multiple sclerosis. This thesis gives a brief introduction to MRI and diffusion MRI. Based on this, the most widely used data representation for diffusion MRI in clinical imaging, the diffusion tensor, is introduced. As the diffusion tensor suffers from severe limitations, new techniques subsumed under the term HARDI (high angular resolution diffusion imaging) are introduced and discussed in detail. Further, an extensive introduction to tractography, i.e. approaches that aim at reconstructing neuronal fibers, is given. Based on the knowledge from the theoretical part, established tractography algorithms are redesigned to handle HARDI data and thus improve the reconstruction of neuronal fibers. Among these algorithms, a novel approach is presented that successfully reconstructs fibers on phantom data as well as on human brain data. Further, a novel global classification approach is presented to cluster voxels according to their diffusion properties.
Identifying reusable legacy code able to implement SOA services is still an open research issue. This master thesis presents an approach to identify legacy code for service implementation based on dynamic analysis and the application of data mining techniques.

As part of the SOAMIG project, code execution traces were mapped to business processes. Due to the large number of traces generated by dynamic analyses, the traces must be post-processed in order to provide useful information.

For this master thesis, two data mining techniques - cluster analysis and link analysis - were applied to the traces. First tests on a Java/Swing legacy system provided good results compared to an expert's allocation of legacy code.
Magnetic resonance (MR) tomography is an imaging method that is used to expose the structure and function of tissues and organs in the human body for medical diagnosis. Diffusion weighted (DW) imaging is a specific MR imaging technique which enables us to gain insight into the connectivity of white matter pathways noninvasively and in vivo. It allows for making predictions about the structure and integrity of those connections. In clinical routine this modality finds application in the planning phase of neurosurgical operations, such as tumor resections. This is especially helpful if the lesion is deeply seated in a functionally important area, where there is a risk of damage. This work reviews the concepts of MR imaging and DW imaging. Generally, at the current resolution of diffusion weighted data, single white matter axons cannot be resolved. The captured signal rather describes whole fiber bundles. Besides this, different complex fiber configurations, such as crossings, splittings and fannings, often occur in a single voxel. For this reason, the main goal is to assist tractography algorithms, which are often confounded in such complex regions. Tractography is a method which uses local information to reconstruct global connectivities, i.e. fiber tracts. In the course of this thesis, existing reconstruction methods such as diffusion tensor imaging (DTI) and q-ball imaging (QBI) are evaluated on synthetically generated data and real human brain data, and the amount of valuable information provided by the individual reconstruction methods and their corresponding limitations are investigated. The output of QBI is the orientation distribution function (ODF), whose local maxima coincide with the underlying fiber architecture. We determine those local maxima. Furthermore, we propose a new voxel-based classification scheme based on diffusion tensor metrics.
The main contribution of this work is the combination of voxel-based classification, local maxima from the ODF and global information from a voxel's neighborhood, which leads to the development of a global classifier. This classifier validates the detected ODF maxima and enhances them with neighborhood information. Hence, specific asymmetric fibrous architectures can be determined. The output of the global classifier is a set of potential tracking directions. Subsequently, a fiber tractography algorithm is designed that integrates along the potential tracking directions and is able to reproduce splitting fiber tracts.
Particle swarm optimization is an optimization technique based on simulation of the social behavior of swarms.
The goal of this thesis is to solve 6DOF local pose estimation using a modified particle swarm technique introduced by Khan et al. in 2010. Local pose estimation is achieved by using continuous depth and color data from an RGB-D sensor. Datasets are acquired from different camera poses and registered into a common model. The accuracy and computation time of the implementation are compared to state-of-the-art algorithms and evaluated in different configurations.
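As a rough illustration of the underlying technique, the following is a minimal sketch of a plain, textbook particle swarm optimizer. It is not the modified variant by Khan et al. used in the thesis; the function names, the parameter values (`w`, `c1`, `c2`) and the toy `sphere` objective are assumptions made for this example. The six dimensions merely stand in for the six degrees of freedom of a pose.

```python
import random


def pso(objective, dim, bounds, n_particles=30, iters=200,
        w=0.7, c1=1.5, c2=1.5, seed=0):
    """Textbook global-best PSO (illustrative, not the thesis's variant)."""
    rng = random.Random(seed)
    lo, hi = bounds
    # Random start positions inside the search box, zero start velocities.
    pos = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    best_pos = [p[:] for p in pos]             # personal best positions
    best_val = [objective(p) for p in pos]     # personal best values
    g = min(range(n_particles), key=best_val.__getitem__)
    g_pos, g_val = best_pos[g][:], best_val[g]  # swarm-wide best

    for _ in range(iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Inertia + pull towards personal best + pull towards swarm best.
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (best_pos[i][d] - pos[i][d])
                             + c2 * r2 * (g_pos[d] - pos[i][d]))
                # Move and clamp to the search box.
                pos[i][d] = min(hi, max(lo, pos[i][d] + vel[i][d]))
            val = objective(pos[i])
            if val < best_val[i]:
                best_val[i], best_pos[i] = val, pos[i][:]
                if val < g_val:
                    g_val, g_pos = val, pos[i][:]
    return g_pos, g_val


def sphere(x):
    """Toy objective (assumption): squared distance from the origin."""
    return sum(v * v for v in x)


best, value = pso(sphere, dim=6, bounds=(-5.0, 5.0))
```

In a pose estimation setting the toy objective would be replaced by a registration error between the current RGB-D frame and the model; that part is outside the scope of this sketch.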
The purpose of this master thesis is to enable the robot Lisa to process complex commands and extract the necessary information in order to perform a complex task as a sequence of smaller tasks. This is to be achieved by improving Lisa's understanding of her environment by adding semantics to the maps that she builds. The complex command itself is expected to be already parsed. Therefore the way the input is processed to become a parsed command is out of the scope of this work. The maps that Lisa builds will be improved by the addition of semantic annotations that can include any kind of information that might be useful for the performance of generic tasks. This can include (but is not necessarily limited to) hierarchical classifications of locations, objects and surfaces. The processing of the command, together with some information about the environment, shall trigger the performance of a sequence of actions. These actions are expected to be among Lisa's currently implemented tasks and will rely on the existing modules that perform them.
Nevertheless, the aim of this work is not only to be able to use currently implemented tasks in a more complex sequence of actions but also to make it easier to add new tasks to the complex commands that Lisa can perform.
The World Wide Web (WWW) has become a very important communication channel. Its usage has grown steadily over the years. Website owners have been interested in identifying user behaviour ever since Tim Berners-Lee developed the first web browser in 1990. But as the influence of the online channel today eclipses all other media, the interest in monitoring website usage and user activities has intensified as well. Gathering and analysing data about the usage of websites can help to understand customer behaviour, improve services and potentially increase profit.
It is further essential for ensuring effective website design and management, efficient mass customization and effective marketing. Web Analytics (WA) is the area addressing these considerations. However, changing technologies and evolving Web Analytic methods and processes present a challenge to organisations starting with Web Analytic programmes. Because they lack resources in various areas and operate different types of websites, small and medium-sized enterprises (SME) and non-profit organisations in particular struggle to operate WA in an effective manner.
This research project aims to identify the existing gap between theory, tool possibilities and business needs for undertaking Web Analytic programmes. The topic was therefore examined from three different perspectives: the academic literature, Web Analytic tools and an interpretative case study. The researcher utilized an action research approach to investigate Web Analytics, presenting a holistic overview and identifying the gaps that exist. The outcome of this research project is an overall framework which provides guidance for SMEs that operate information websites on how to proceed in a Web Analytic programme.
Large amounts of qualitative data make the utilization of computer-assisted methods for their analysis inevitable. In this thesis, Text Mining as an interdisciplinary approach as well as the methods established in the empirical social sciences for analyzing written utterances are introduced. On this basis, a process of extracting concept networks from texts is outlined and the possibilities of utilizing natural language processing methods within it are highlighted. The core of this process is text processing, whose execution requires software solutions supporting manual as well as automated work. The requirements to be met by these solutions, against the background of the initiating project GLODERS, which is devoted to investigating extortion racket systems as part of the global financial system, are presented, and their fulfilment by the two most preeminent candidates is reviewed. The gap between theory and practical application is closed by a prototypical application of the method to a data set of the research project utilizing the two given software solutions.
We present the conceptual and technological foundations of a distributed natural language interface employing a graph-based parsing approach. The parsing model developed in this thesis generates a semantic representation of a natural language query in a 3-staged, transition-based process using probabilistic patterns. The semantic representation of a natural language query is modeled in terms of a graph, which represents entities as nodes connected by edges representing relations between entities. The presented system architecture provides the concept of a natural language interface that is both independent in terms of the included vocabularies for parsing the syntax and semantics of the input query, as well as the knowledge sources that are consulted for retrieving search results. This functionality is achieved by modularizing the system's components, addressing external data sources by flexible modules which can be modified at runtime. We evaluate the system's performance by testing the accuracy of the syntactic parser, the precision of the retrieved search results as well as the speed of the prototype.
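The graph-based semantic representation described above can be pictured with a small data structure. The following is only a minimal sketch under assumptions: the class name `QueryGraph`, its methods and the example query are hypothetical and are not taken from the thesis, whose parser builds such graphs through a probabilistic, transition-based process that is not reproduced here.

```python
from dataclasses import dataclass, field


@dataclass
class QueryGraph:
    """Entities as nodes, relations as labelled directed edges (illustrative)."""
    nodes: set = field(default_factory=set)
    edges: list = field(default_factory=list)  # (source, relation, target)

    def relate(self, source, relation, target):
        """Add both entities and the relation that connects them."""
        self.nodes.update((source, target))
        self.edges.append((source, relation, target))

    def neighbors(self, entity):
        """Return (relation, target) pairs leaving the given entity."""
        return [(rel, tgt) for src, rel, tgt in self.edges if src == entity]


# Hypothetical query: "theses written by Smith and published in 2014"
graph = QueryGraph()
graph.relate("thesis", "written_by", "Smith")
graph.relate("thesis", "published_in", "2014")
```

Keeping the graph independent of any particular vocabulary or knowledge source mirrors the modular design the abstract describes: a separate module could translate the edges into queries against whichever data source is plugged in.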
Object recognition is a well-investigated area in image-based computer vision and several methods have been developed. Approaches based on Implicit Shape Models, which separate objects into fundamental visual object parts and spatial relationships between the individual parts, have recently become popular for recognizing objects in 2D images. This knowledge is then used to identify unknown object instances. However, since the emergence of affordable depth cameras like Microsoft Kinect, recognizing unknown objects in 3D point clouds has become an increasingly important task. In the context of indoor robot vision, an algorithm is developed that extends existing methods based on Implicit Shape Model approaches to the task of 3D object recognition.
Web application testing is an active research area. Garousi et al. conducted a systematic mapping study and classified 79 papers published between 2000 and 2011. However, there seems to be a lack of information exchange between the scientific community and tool developers.
This thesis systematically analyzes the field of functional, system-level web application testing tools. 194 candidate tools were collected in the tool search and screened, with 23 tools being selected as the foundation of this thesis. These 23 tools were systematically used to generate a feature model of the domain. The methodology to support this is an additional contribution of this thesis. It processes the end user documentation of tools belonging to an examined domain and creates a feature model. The feature model gives an overview of the existing features, their alternatives and their distribution. It can be used to identify trends, problems and extraordinary features, to support tool purchase decisions, or to guide scientists in focusing their research.
The mitral valve is one of the four valves in the human heart. It is located in the left heart chamber and its function is to control the blood flow from the left atrium to the left ventricle. Pathologies can lead to malfunctions of the valve so that blood can flow back to the atrium. Patients with a faulty mitral valve function may suffer from fatigue and chest pain. The functionality can be surgically restored, which is often a long and exhaustive intervention. Thorough planning is necessary to ensure a safe and effective surgery. This can be supported by creating pre-operative segmentations of the mitral valve. A post-operative analysis can determine the success of an intervention. This work will combine existing and new ideas to propose a new approach to (semi-)automatically create such valve models. The manual part can guarantee a high quality model and reliability, whereas the automatic part contributes to saving valuable labour time.
The main contributions of the automatic algorithm are an estimated semantic separation of the two leaflets of the mitral valve and an optimization process that is capable of finding a coaptation line and area between the leaflets. The segmentation method can perform a fully automatic segmentation of the mitral leaflets if the annulus ring is already given. The intermediate steps of this process will be integrated into a manual segmentation method so that a user can guide the whole procedure. The quality of the valve models generated by the method proposed in this work will be measured by comparing them to completely manually segmented models. This will show that commonly used methods to measure the quality of a segmentation are too general and do not suffice to reflect the real quality of a model. Consequently, the work at hand will introduce a set of measurements that can qualify a mitral valve segmentation in more detail and with respect to anatomical landmarks. Besides the intra-operative support for a surgeon, a segmented mitral valve provides additional benefits. The ability to obtain and objectively describe the patient-specific valve anatomy may be the basis for future medical research in this field, and automation allows large data sets to be processed with reduced expert dependency. Further, simulation methods that use the segmented models as input may predict the outcome of a surgery.
Geographic cluster based routing in ad-hoc wireless sensor networks is a current field of research. Various algorithms to route in wireless ad-hoc networks based on position information already exist. Among them are algorithms that use the traditional beaconing approach as well as algorithms that work beaconlessly (no information about the environment is required besides the node's own position and the destination). Geographic cluster based routing with guaranteed message delivery can be carried out on overlay graphs as well. Until now, the required planar overlay graphs have not been constructed reactively.
This thesis proposes a reactive algorithm, the Beaconless Cluster Based Planarization (BCBP) algorithm, which constructs a planar overlay graph and noticeably reduces the number of messages required for that. Based on an algorithm for cluster based planarization, it beaconlessly constructs a planar overlay graph in a unit disk graph (UDG). A UDG is a model for a wireless network in which every participant has the same sending radius. Evaluation of the algorithm shows it to be more efficient than the non-beaconless variant. Another result of this thesis is the Beaconless LLRAP (BLLRAP) algorithm, for which planarity but not continued connectivity could be proven.
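To make the unit disk graph model concrete, the sketch below builds a UDG from node positions: two nodes are connected exactly when their Euclidean distance does not exceed the common sending radius. The function name, node labels and coordinates are assumptions for illustration. The sketch only shows the network model; it does not implement BCBP, BLLRAP or any planarization step.

```python
from itertools import combinations
from math import dist


def unit_disk_graph(positions, radius=1.0):
    """Return the undirected edges of the UDG over the given node positions."""
    edges = set()
    for (a, pa), (b, pb) in combinations(positions.items(), 2):
        # Every node has the same sending radius, so links are symmetric.
        if dist(pa, pb) <= radius:
            edges.add(frozenset((a, b)))
    return edges


# Hypothetical node placement in the plane.
nodes = {"a": (0.0, 0.0), "b": (0.6, 0.0), "c": (1.5, 0.0), "d": (0.5, 0.5)}
edges = unit_disk_graph(nodes)
```

Here `a` and `c` are 1.5 apart and therefore not directly connected; a message from `a` to `c` would have to be routed via `b`.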
One task of executives and project managers in IT companies or departments is to hire suitable developers and to assign them to suitable problems. In this paper, we propose a new technique that directly leverages previous work experience of developers in a systematic manner. Existing evidence for developer expertise based on the version history of existing projects is analyzed. More specifically, we analyze the commits to a repository in terms of affected API usage. On these grounds, we associate APIs with developers and thus we assess API experience of developers. In transitive closure, we also assess programming domain experience.
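The idea of associating APIs with developers through commits can be sketched as a simple counting scheme. The data layout, developer names and the API-to-domain mapping below are hypothetical; the actual analysis in the paper works on the version history of real repositories and may weight or filter API usage differently.

```python
from collections import Counter, defaultdict


def api_experience(commits):
    """Count, per developer, how often their commits touched each API."""
    experience = defaultdict(Counter)
    for author, apis in commits:
        experience[author].update(apis)
    return experience


def domain_experience(experience, api_domains):
    """Lift API counts to programming domains via an API-to-domain mapping."""
    domains = defaultdict(Counter)
    for author, apis in experience.items():
        for api, count in apis.items():
            if api in api_domains:
                domains[author][api_domains[api]] += count
    return domains


# Hypothetical commit data: (author, APIs affected by the commit).
commits = [
    ("alice", ["java.util", "javax.swing"]),
    ("bob", ["java.sql"]),
    ("alice", ["javax.swing"]),
]
# Hypothetical mapping from APIs to programming domains.
api_domains = {"javax.swing": "GUI", "java.sql": "Database"}

exp = api_experience(commits)
dom = domain_experience(exp, api_domains)
```

In this toy data, `alice` ends up with two commits' worth of GUI experience and `bob` with one of database experience.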
Code package managers like Cabal track dependencies between packages. But packages rarely use all of the functionality that their dependencies provide. This leads to unnecessary compilation of unused parts and to speculative conflicts between package versions where there are no actual conflicts. In two case studies we show how relevant these two problems are. We then describe how we could avoid them by tracking dependencies not between packages but between individual code fragments.
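The benefit of fragment-level tracking can be illustrated with a reachability computation: only fragments reachable from the fragments a program actually uses need to be compiled. The fragment names and the dependency table are hypothetical, and the sketch is written in Python for brevity; it does not reflect how Cabal works or how the thesis implements its tracking.

```python
def needed_fragments(deps, roots):
    """Return all fragments reachable from the roots via fragment dependencies."""
    seen, stack = set(), list(roots)
    while stack:
        frag = stack.pop()
        if frag not in seen:
            seen.add(frag)
            stack.extend(deps.get(frag, ()))
    return seen


# Hypothetical fragment-level dependencies; "lib" is one package with four fragments.
deps = {
    "app.main": ["lib.parse"],
    "lib.parse": ["lib.tokens"],
    "lib.render": ["lib.fonts"],
}
needed = needed_fragments(deps, ["app.main"])
```

With package-level tracking, `app` would depend on all of `lib`; at fragment level, `lib.render` and `lib.fonts` are never compiled and cannot cause version conflicts.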
Software systems are often developed as a set of variants to meet diverse requirements. Two common approaches to this are "clone-and-own" and software product lines. Both approaches have advantages and disadvantages. In previous work, we and our collaborators proposed an idea which combines both approaches to manage variants, similarities and cloning by using a virtual platform and cloning-related operators.
In this thesis, we present an approach for aggregating essential metadata to enable a propagate operator, which implements a form of change propagation. For this we have developed a system to annotate code similarities which were extracted throughout the history of a software repository. The annotations express similarity maintenance tasks, which can then either be executed automatically by propagate or have to be performed manually by the user. In this work we outline the automated metadata extraction process and the system for annotating similarities; we explain how the implemented system can be integrated into the workflow of an existing version control system (Git); and, finally, we present a case study using the 101haskell corpus of variants.
In this work a framework is developed that is used to create an evaluation scheme for the evaluation of text processing tools. The evaluation scheme is developed using a model-dependent software evaluation approach and the focus of the model-dependent part is the text-processing process which is derived from the Conceptual Analysis Process developed in the GLODERS project. As input data a German court document is used containing two incidents of extortion racketeering which happened in 2011 and 2012. The evaluation of six different tools shows that one tool offers great results for the given dataset when it is compared to manual results. It is able to identify and visualize relations between concepts without any additional manual work. Other tools also offer good results with minor drawbacks. The biggest drawback for some tools is the unavailability of models for the German language. They can perform automated tasks only on English documents. Nonetheless some tools can be enhanced by self-written code which allows users with development experience to apply additional methods.
Statistical Shape Models (SSMs) are one of the most successful tools in 3D image analysis and especially medical image segmentation. By modeling the variability of a population of training shapes, the statistical information inherent in such data is used for the automatic interpretation of new images. However, building a high-quality SSM requires manually generated ground truth data from clinical experts. Unfortunately, the acquisition of such data is a time-consuming, error-prone and subjective process. Due to this effort, most SSMs are based on a limited set of ground truth training data, which makes the models less statistically meaningful. On the other hand, image data itself is abundant in clinics from daily routine. In this work, methods for automatically constructing a reliable SSM without the need for manual image interpretation by experts are proposed. Thus, the training data is assumed to be the result of any segmentation algorithm or may originate from other sources, e.g. non-expert manual delineations. Depending on the algorithm, the output segmentations will contain errors to a higher or lower degree. In order to account for these errors, areas with a low probability of being a boundary should be excluded from the training of the SSM. Therefore, the probabilities are estimated with the help of image-based approaches. By including many shape variations, the corrupted parts can be statistically reconstructed. Two approaches for reconstruction are proposed - an Imputation method and Weighted Robust Principal Component Analysis (WRPCA). This allows the inclusion of many data sets from clinical routine, covering many more variations of shape examples. To assess the quality of the models, which are robust against erroneous training shapes, an evaluation compares their generalization and specificity to those of a model built from ground truth data.
The results show, that especially WRPCA is a powerful tool to handle corrupted parts and yields to reasonable models, which have a higher quality than the initial segmentations.
The publication of open source software aims to support the reuse, the distribution and the general utilization of software. This can only be enabled by the correct usage of open source software licenses. Therefore, associations provide a multitude of open source software licenses with different features, from which a developer can choose in order to regulate the interaction with his software. Those licenses are the core theme of this thesis.
After an extensive literature review, two general research questions are elaborated in detail. First, a usage analysis of licenses in the open source sector is carried out to identify current trends and statistics. This includes questions concerning the distribution of licenses, the consistency in their usage, their association over a period of time and their publication.
Afterwards the recommendation of licenses for specific projects is investigated. To this end, a recommendation logic is presented, which includes several influences on a suitable license choice, to generate the most applicable recommendation. Besides the exact features of a license from which a user can choose, different methods of ranking the recommendation results are proposed. This is based on the examination of the current situation of open source licensing and license suggestion. Finally, the logic is evaluated on the exemplary use case of the 101companies project.
“Did I say something wrong?” A word-level analysis of Wikipedia articles for deletion discussions
(2016)
This thesis focuses on gaining linguistic insights into textual discussions on a word level. It was of special interest to distinguish messages that constructively contribute to a discussion from those that are detrimental to it. In doing so, we wanted to determine whether “I”- and “You”-messages are indicators of either of the two discussion styles. These messages are nowadays often used in guidelines for successful communication. Although their effects have been successfully evaluated multiple times, a large-scale analysis has never been conducted. Thus, we used Wikipedia Articles for Deletion (short: AfD) discussions together with the records of blocked users and developed a fully automated procedure for creating an annotated data set. In this data set, messages were labelled either constructive or disruptive. We applied binary classifiers to the data to determine characteristic words for both discussion styles. We also investigated whether function words like pronouns and conjunctions play an important role in distinguishing the two. We found that “You”-messages were a strong indicator of disruptive messages, which matches their attributed effects on communication. However, we found “I”-messages to be indicative of disruptive messages as well, which is contrary to their attributed effects. The importance of function words could neither be confirmed nor refuted. Other characteristic words for either communication style were not found. Yet, the results suggest that a different model might represent disruptive and constructive messages in textual discussions better.
The purpose of this research is to examine various existing cloud-based Internet of Things (IoT) development platforms and evaluate one platform (IBM Watson IoT) in detail using a use case scenario. The Internet of Things (IoT) is an emerging technology that has a vision of interconnecting the virtual world (e.g. clouds, social networks) and the physical world (e.g. devices, cars, fridges, people, animals) through Internet technology. For example, the IoT concept of smart cities, which has the objective of improving the efficiency and development of business, social and cultural services in the city, can be achieved by using sensors, actuators, clouds and mobile devices (IEEE, 2015). A sensor (e.g. a temperature sensor) in a building (physical world) can send real-time data to the IoT cloud platform (virtual world), where it can be monitored, stored, analysed, or used to trigger some action (e.g. turn on the cooling system in the building if the temperature exceeds a threshold limit). Although the IoT creates vast opportunities in different areas (e.g. transportation, healthcare, manufacturing industry), it also brings challenges such as standardisation, interoperability, scalability, security and privacy. In this research report, IoT concepts and related key issues are discussed.
The focus of this research is to compare various cloud-based IoT platforms in order to understand the business and technical features they offer. The cloud-based IoT platforms from IBM, Google, Microsoft, PTC and Amazon have been studied.
To design the research, the Design Science Research (DSR) methodology has been followed, and to model the real-time IoT system the IOT-A modelling approach has been used.
The comparison of different cloud-based IoT development platforms shows that all of the studied platforms provide basic IoT functionalities such as connecting the IoT devices to the cloud-based IoT platform, collecting data from the IoT devices, data storage and data analytics. However, IBM's IoT platform appears to have an edge over the other platforms studied in this research because of its integrated run-time environment, which also makes it more developer friendly. Therefore, IBM Watson IoT for Bluemix is selected for further examination of its capabilities. The IBM Watson IoT for Bluemix offerings include analytics, risk management, connect and information management. A use case was implemented to assess the capabilities that the IBM Watson IoT platform offers. Digital artifacts (i.e. applications) are produced to evaluate IBM's IoT solution. The results show that IBM offers a very scalable, developer and deployment friendly IoT platform. Its cognitive, contextual and predictive analytics provide promising functionality that can be used to gain insights from the IoT data transmitted by the sensors and other IoT devices.
This thesis analyzes the online attention towards scientists and their research topics. The studies compare the attention dynamics towards the winners of important scientific prizes with those towards scientists who did not receive a prize. Web signals such as Wikipedia page views, Wikipedia edits, and Google Trends were used as a proxy for online attention. One study focused on the time between the creation of the article about a scientist and the creation of the articles about their research topics. It was discovered that articles about research topics were created closer in time to the articles of prize winners than to those of scientists who did not receive a prize. One possible explanation could be that the research topics are more closely related to the scientist who got an award. This supports the view that scientists who received the prize introduced the topics to the public. Another study considered the public attention trends towards the related research topics before and after a page of a scientist was created. It was observed that after a page about a scientist was created, research topics of prize winners received more attention than the topics of scientists who did not receive a prize. Furthermore, it was demonstrated that Nobel Prize winners get a lower amount of attention before receiving the prize than the potential nominees from the list of Citation Laureates of Thomson Reuters. Also, their popularity declines faster after receiving it. It was also shown that it is difficult to predict the prize winners based on the attention dynamics towards them.
While Virtual Reality has been around for decades, it has gained new life in recent years. The release of the first consumer hardware devices allows fully immersive and affordable VR for the user at home. This availability led to a new focus of research on technical problems as well as psychological effects. The concepts of presence, describing the feeling of being in the virtual place, and body ownership, together with their impact, have been central topics in research for a long time and are still not fully understood.
To enable further research in the area of Mixed Reality, we want to introduce a framework that integrates the user's body and surroundings inside a visually coherent virtual environment. As one of two main aspects, we want to merge real and virtual objects into a shared environment in such a way that they are no longer visually distinguishable. To achieve this, the main focus is not supposed to be on high graphical fidelity but on a simplified representation of reality. The essential question is, what level of visual realism is necessary to create a believable mixed reality environment that induces a sense of presence in the user? The second aspect considers the integration of virtual persons. Can characters be recorded and replayed in such a way that they are perceived as believable entities of the world and therefore act as a part of the user's environment?
The purpose of this thesis was the development of a framework called Mixed Reality Embodiment Platform. This initial system implements fundamental functionalities to be used as a basis for future extensions to the framework. We also provide a first application that enables user studies to evaluate the framework and contribute to the aforementioned research questions.
In recent years head mounted displays (HMD) and their ability to create virtual realities comparable with the real world have moved more into the focus of press coverage and consumers. The reason for this lies in constant improvements in available computing power, miniaturisation of components as well as constantly shrinking power consumption. These trends originate in the general technical progress driven by advancements made in the smartphone sector. This gives more people than ever access to the components required to create these virtual realities. However, at the same time there is only limited research which uses the current generation of HMDs, especially when comparing the virtual and real world against each other. The approach of this thesis is to look into the process of navigating both real and virtual spaces while using modern hardware and software. One of the key areas is spatial and peripheral perception, without which it would be difficult to navigate a given space. The influence of prior real and virtual experiences on these will be another key aspect. The final area of focus is the influence on the emotional state and how it compares to the real world. To research these influences, an experiment using the Oculus Rift DK2 HMD will be held in which subjects will be guided through a real space as well as a virtual model of it. Data will be gathered in a quantitative manner by using surveys. Finally, the findings will be discussed based on a statistical evaluation. During these tests the different perceptions of distances and room size will be compared, along with how they change based on the current reality. Furthermore, the influence of prior spatial activities both in the real and the virtual world will be looked into. Lastly, it will be checked how real these virtual worlds are and whether they are sufficiently sophisticated to trigger the same emotional responses as the real world.
This work covers techniques for interactive and physically-based rendering of hair for computer generated imagery (CGI). To this end, techniques for the simulation and approximation of the interaction of light with hair are derived and presented. Furthermore, it is described how hair, despite such computationally expensive algorithms, can be rendered interactively. Techniques for computing the shadowing in hair as well as approaches to render hair as transparent geometry are also presented. A main focus of this work is the DBK-Buffer, which was conceived, implemented and evaluated. Using the DBK-Buffer, it is possible to render thousands of hairs as transparent geometry without being dependent on either the newest GPU hardware generation or a great amount of video memory. Moreover, a comprehensive evaluation of all the techniques described was conducted with respect to the visual quality, performance and memory requirements. This revealed that hair can be rendered physically-based at interactive or even at real-time frame rates.
In scientific data visualization huge amounts of data are generated, which implies the task of analyzing these in an efficient way. This includes the reliable detection of important parts and a low expenditure of time and effort. This is especially important for the large seismic volume datasets that are required for the exploration of oil and gas deposits. Since the generated data is complex and a manual analysis is very time-intensive, a semi-automatic approach could on the one hand reduce the time required for the analysis and on the other hand offer more flexibility than a fully automatic approach.
This master's thesis introduces an algorithm which is capable of locating regions of interest in seismic volume data automatically by detecting anomalies in local histograms. Furthermore, the results are visualized and a variety of tools for the exploration and interpretation of the detected regions are developed. The approach is evaluated by experiments with synthetic data and in interviews with domain experts on the basis of real-world data. Finally, further improvements to integrate the algorithm into the seismic interpretation workflow are suggested.
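The idea of flagging regions whose local histogram deviates from the rest of the data can be illustrated with a minimal sketch. It is not the algorithm of the thesis: it works on a one-dimensional trace instead of a volume, and the function names, the L1 distance, the non-overlapping windows, the bin count and the threshold are all assumptions chosen for illustration.

```python
def histogram(values, bins, lo, hi):
    """Normalised histogram of `values` over `bins` equal-width bins in [lo, hi]."""
    counts = [0] * bins
    width = (hi - lo) / bins
    for v in values:
        # Clamp so that v == hi falls into the last bin.
        idx = min(int((v - lo) / width), bins - 1)
        counts[idx] += 1
    total = len(values)
    return [c / total for c in counts]


def anomalous_windows(signal, window, bins=4, threshold=0.5):
    """Return start indices of non-overlapping windows whose local histogram
    differs from the global histogram by more than `threshold` (L1 distance)."""
    lo, hi = min(signal), max(signal)
    if hi == lo:  # constant signal: no window can stand out
        return []
    reference = histogram(signal, bins, lo, hi)
    flagged = []
    for start in range(0, len(signal) - window + 1, window):
        local = histogram(signal[start:start + window], bins, lo, hi)
        distance = sum(abs(a - b) for a, b in zip(local, reference))
        if distance > threshold:
            flagged.append(start)
    return flagged


# Hypothetical 1-D trace: mostly low amplitudes with one high-amplitude burst.
trace = [0.1, 0.2, 0.1, 0.2] * 4 + [0.9, 1.0, 0.9, 1.0] + [0.1, 0.2, 0.1, 0.2] * 3
regions = anomalous_windows(trace, window=4)
```

In this toy trace only the window containing the burst is flagged. A semi-automatic workflow as described above would presumably expose parameters such as the threshold to the interpreter rather than fix them.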
With the emergence of current generation head-mounted displays (HMDs), virtual reality (VR) is regaining much interest in the field of medical imaging and diagnosis. Room-scale exploration of CT or MRI data in virtual reality feels like an intuitive application. However, in VR retaining a high frame rate is more critical than for conventional user interaction seated in front of a screen. There is strong scientific evidence suggesting that low frame rates and high latency have a strong influence on the appearance of cybersickness. This thesis explores two practical approaches to overcome the high computational cost of volume rendering for virtual reality. One lies in the exploitation of coherency properties of the especially costly stereoscopic rendering setup. The main contribution is the development and evaluation of a novel acceleration technique for stereoscopic GPU ray casting. Additionally, an asynchronous rendering approach is pursued to minimize the amount of latency in the system. A selection of image warping techniques has been implemented and evaluated methodically, assessing their applicability for VR volume rendering.
The Internet of Things (IoT) is a network of addressable, physical objects that contain embedded sensing, communication and actuating technologies to sense and interact with their environment (Geschickter 2015). Like every novel paradigm, the IoT sparks interest throughout all domains both in theory and practice, resulting in the development of systems pushing technology to its limits. These limits become apparent when having to manage an increasing number of Things across various contexts. A plethora of IoT architecture proposals have been developed and prototype products, such as IoT platforms, have been introduced. However, each of these architectures and products applies its own interpretation of an IoT architecture and its individual components, so that IoT is currently more an Intranet of Things than an Internet of Things (Zorzi et al. 2010). Thus, this thesis aims to develop a common understanding of the elements forming an IoT architecture and provide high-level specifications in the form of a Holistic IoT Architecture Framework.
Design Science Research (DSR) is used in this thesis to develop the architecture framework based on the pertinent literature. The development of the Holistic IoT Architecture Framework includes the identification of two new IoT Architecture Perspectives that became apparent during the analysis of the IoT architecture proposals identified in the extant literature. While applying these novel perspectives, the need for a new component for the architecture framework, which was merely implicitly mentioned in the literature, became obvious as well. The components of various IoT architecture proposals as well as the novel component, the Thing Management System, were combined, consolidated and related to each other to develop the Holistic IoT Architecture Framework. Subsequently, it was shown that the specifications of the architecture framework are suitable to guide the implementation of a prototype.
This contribution provides a common understanding of the basic building blocks, actors and relations of an IoT architecture.
The Internet of Things (IoT) has recently developed from the far-away vision of ubiquitous computing into very tangible endeavors in politics and economy, implemented in expensive preparedness programs. Experts predict considerable changes in business models that need to be addressed by organizations in order to respond to competition. Although there is a need to develop strategies for upcoming transformations, the organizational change literature has not yet turned to the specific change related to the new technology. This work aims at investigating IoT-related organizational change by identifying and classifying different change types. It therefore combines the methodological approach of grounded theory with a discussion and classification of identified change informed by a structured literature review of organizational change literature. This includes a meta-analysis of case studies using a qualitative, exploratory coding approach to identify categories of organizational change related to the introduction of IoT. Furthermore, a comparison of the identified categories to former technology-related change is provided using the example of Electronic Business (e-business), Enterprise Resource Planning (ERP) systems, and Customer Relationship Management (CRM) systems. As a main result, this work develops a comprehensive model of IoT-related business change. The model presents two main themes of change indicating that personal smart things will transform businesses by means of using more personal devices, suggesting and scheduling actions of their users, and trying to avoid hazards. At the same time, the availability of information in organizations will further increase to a state where information is available ubiquitously. This will ultimately enable accessing real-time information about objects and persons anytime and from any place. As a secondary result, this work gives an overview of concepts of technology-related organizational change in academic literature.
Coordination and awareness mechanisms are important in systems for Computer-Supported Cooperative Work (CSCW) and traditional groupware systems. They have been a key focus of research into collaborative groupware and its capability to enable people to efficiently collaborate and coordinate work. Until now, no classification of the mechanisms has been undertaken to identify commonalities and differences in coordination and awareness mechanisms and to show their significance in collaborative environments. In addition, there is little investigation of coordination and awareness mechanisms in new forms of groupware such as socially enabled Enterprise Collaboration Systems (ECS). Indeed, both in science and in practice, ECS incorporating social software have become increasingly important. Based on the combination of traditional groupware and social software, ECS also include coordination and awareness mechanisms that may simplify collaboration, but these have not yet been investigated.
Therefore, the aim of this thesis is to identify coordination and awareness mechanisms in the academic literature to provide a general overview of examples of those mechanisms. Additionally, this thesis aims to classify the mechanism examples. Based on a deep literature analysis, concepts described in the literature are chosen and applied with the intention of analysing the mechanisms and reaching a classification. Based on the classification of the identified mechanisms, their commonalities and differences are examined and described to gain a better understanding of them. For illustration purposes, examples of coordination and awareness mechanisms and their application are portrayed. The mechanism examples refer to the classification groups derived. The selection of the mechanisms for the visualization is based on significant differences in their functionality. Subsequently, the selected mechanisms, which are based more on traditional groupware, are checked to a limited extent for whether they can be found in socially enabled ECS. The collaborative platform IBM Connections serves as a practical example of ECS incorporating social software. IBM Connections is used at the University of Koblenz to run the platform "UniConnect". On the platform it is investigated which of the mechanism examples identified in the literature are applied in IBM Connections and which additional mechanisms are created by users. This work is the first step in the study of coordination and awareness mechanisms in socially enabled ECS. In addition, new mechanisms are expected to be detected, since the social factor in collaborative work is new.
The purpose of this thesis is to examine and collect examples of coordination and awareness mechanisms in the literature and to analyse them. Additionally, the purpose is to provide a first overview of mechanisms and to classify them by investigating their commonalities. Besides, this thesis should give an incentive for further investigations of coordination and awareness mechanisms in socially integrated ECS.
This thesis proposes the use of MSR (Mining Software Repositories) techniques to identify software developers with exclusive expertise about specific APIs and programming domains in software repositories. A pilot Tool for finding such “Islands of Knowledge” in Node.js projects is presented and applied in a case study to the 180 most popular npm packages. It is found that on average each package has 2.3 Islands of Knowledge, which is possibly explained by the finding that npm packages tend to have only one main contributor. In a survey, the maintainers of 50 packages are contacted and asked for opinions on the results produced by the Tool. Together with their responses, this thesis reports on experiences made with the pilot Tool and how future iterations could produce even more accurate statements about programming expertise distribution in developer teams.
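As a rough illustration of what "exclusive expertise" could mean operationally, the following sketch treats an API as an island when exactly one developer has ever used it. This definition, the function name `islands_of_knowledge` and the (developer, API) record format are assumptions made for the example; the pilot Tool's actual criteria and mining pipeline are not described here.

```python
from collections import defaultdict


def islands_of_knowledge(usages):
    """Map each developer to the APIs that only they have touched.

    `usages` is an iterable of (developer, api) pairs, for example mined
    from commit history (the mining step itself is out of scope here).
    """
    developers_by_api = defaultdict(set)
    for developer, api in usages:
        developers_by_api[api].add(developer)

    islands = defaultdict(set)
    for api, developers in developers_by_api.items():
        if len(developers) == 1:  # exactly one developer knows this API
            (sole_expert,) = developers
            islands[sole_expert].add(api)
    return dict(islands)


# Hypothetical usage records for one package.
records = [
    ("alice", "fs"), ("alice", "http"), ("bob", "http"),
    ("alice", "crypto"), ("bob", "stream"),
]
result = islands_of_knowledge(records)
```

Here `http` is shared by both developers and therefore is not an island, while `fs`, `crypto` and `stream` each have a single expert.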
The output of eye tracking Web usability studies can be visualized to the analysts as screenshots of the Web pages with their gaze data. However, the screenshot visualizations are found to be corrupted whenever there are recorded fixations on fixed Web page elements at different scroll positions. The gaze data are not gathered on their fixated fixed elements; rather they are scattered over their recorded scroll positions. This problem motivated us to find an approach to link gaze data to their intended fixed elements and gather them in one position on the screenshot. The approach builds upon the concept of creating the screenshot during the recording session, where images of the viewport are captured at visited scroll positions and finally stitched into one Web page screenshot. Additionally, the fixed elements in the Web page are identified and linked to their fixations. For the evaluation, we compared the interpretation of our enhanced screenshot against the video visualization, which overcomes the problem. The results revealed that both visualizations deliver equally accurate interpretations. However, interpreting the visualizations of eye tracking Web usability studies using the enhanced screenshots outperforms the video visualizations in terms of speed and places a lower temporal demand on the interpreters.
Using semantic data from general-purpose programming languages does not provide the unified experience one would want for such an application. Static error checking is lacking, especially with regards to static typing of the data. Based on the previous work of λ-DL, which integrates semantic queries and concepts as types into a typed λ-calculus, this work takes its ideas a step further to meld them into a real-world programming language. This thesis explores how λ-DL's features can be extended and integrated into an existing language, researches an appropriate extension mechanism and produces Semantics4J, a JastAdd-based Java language semantic data extension for type-safe OWL programming, together with examples of its usage.
Mapping ORM to TGraph
(2017)
Object Role Modeling (ORM) is a semantic modeling language used to describe objects and their relations amongst each other. Both objects and relations may be subject to rules or ORM constraints.
TGraphs are ordered, attributed, typed and directed graphs. The type of a TGraph and its components, the edges and vertices, is defined using the schema language graph UML (grUML), a profiled version of UML class diagrams. The goal of this thesis is to map ORM schemas to grUML schemas in order to be able to represent ORM schema instances as TGraphs.
Up to this point, the preferred representation for ORM schema instances has been in the form of relational tables. Though mappings from ORM schemas to relational schemas exist, those publicly available do not support most of the constraints ORM has to offer.
Constraints can be added to grUML schemas using the TGraph query language GReQL, which can efficiently check whether a TGraph validates the constraint or not. The graph library JGraLab provides efficient implementations of TGraphs and their query language GReQL and supports the generation of grUML schemas.
The first goal of this work is to perform a complete mapping from ORM schemas to grUML schemas, using GReQL to specify constraints. The second goal is to represent ORM instances in the form of TGraphs.
This work gives an overview of ORM, TGraphs, grUML and GReQL and the theoretical mapping from ORM schemas to grUML schemas. It also describes the implementation of this mapping, deals with the representation of ORM schema instances as TGraphs and addresses the question of how grUML constraints can be validated.
The extensive literature in the data visualization field indicates that the process of creating efficient data visualizations requires the data designer to have a large set of skills from different fields (such as computer science, user experience, and business expertise). However, there is a lack of guidance about the visualization process itself. This thesis aims to investigate the different processes for creating data visualizations and develop an integrated framework to guide the process of creating data visualizations that enables the user to create more useful and usable data visualizations. Firstly, existing frameworks in the literature will be identified, analyzed and compared. During this analysis, eight views of the visualization process are developed. These views represent the set of activities which should be done in the visualization process. Then, a preliminary integrated framework is developed based on an analysis of these findings. This new integrated framework is tested in the field of Social Collaboration Analytics on an example from the UniConnect platform. Lastly, the integrated framework is refined and improved based on the results of testing, with the help of diagrams, visualizations and textual description. The results show that the visualization process is not of a waterfall type. It is an iterative methodology with certain phases of work, demonstrating how to address the eight views with different levels of stakeholder involvement. The findings are the basis for a visualization process which can be used in future work to develop a fully functional methodology.
This thesis explores the possibilities of probabilistic process modelling for Computer Supported Cooperative Work (CSCW) systems in order to predict the behaviour of the users present in the CSCW system. Toward this objective, the applicability, advantages, limitations and challenges of probabilistic modelling are explored in the context of CSCW systems. Finally, as a primary goal, seven models are created and examined to show the feasibility of probabilistic process discovery and of predicting the users' behaviour in CSCW systems.
The content aggregator platform Reddit has established itself as one of the most popular websites in the world. However, scientific research on Reddit is hindered as Reddit allows (and even encourages) user anonymity, i.e., user profiles do not contain personal information such as gender. Inferring the gender of users at large scale could enable the analysis of gender-specific areas of interest, reactions to events, and behavioral patterns. In this direction, this thesis suggests a machine learning approach for estimating the gender of Reddit users. By exploiting specific conventions in parts of the website, we obtain a ground truth for more than 190 million comments of labeled users. This data is then used to train machine learning classifiers and to gain insights about the gender balance of particular subreddits and the platform in general. By comparing a variety of classification algorithms, we find that a character-level convolutional neural network achieves an 82.3% F1 score on the task of predicting the gender of a user based on his/her comments. The score surpasses the 85% mark for frequent users with more than 50 comments. Furthermore, we discover that female users are less active on the Reddit platform: they write fewer comments and post in fewer subreddits on average when compared to male users.
This Master Thesis is an exploratory study to determine whether it is feasible to construct a subjectivity lexicon using Wikipedia. The key hypothesis is that all quotes in Wikipedia are subjective and all regular text is objective. The degree of subjectivity of a word, also known as “Quote Score”, is determined based on the ratio of the word's frequency in quotations to its frequency outside quotations. The proportion of words in the English Wikipedia which are within quotations is found to be much smaller than the proportion of those which are not in quotes, resulting in a right-skewed distribution and a low mean value of Quote Scores.
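The Quote Score can be illustrated with a minimal sketch. It assumes that the score is the plain ratio of a word's count inside quotations to its count outside; the whitespace tokenisation, the function name `quote_scores`, the handling of words that never occur outside quotes and the toy sentences are illustrative assumptions and not the implementation of the thesis.

```python
from collections import Counter


def quote_scores(quoted_tokens, regular_tokens):
    """Return {word: count_in_quotes / count_outside_quotes}.

    Words that never occur outside quotations are skipped to avoid
    division by zero (an assumption of this sketch).
    """
    in_quotes = Counter(quoted_tokens)
    outside = Counter(regular_tokens)
    return {
        word: in_quotes[word] / outside[word]
        for word in in_quotes
        if outside[word] > 0
    }


# Toy corpus (hypothetical): tokens inside and outside quotation marks.
quoted = "a truly wonderful and truly moving film".split()
regular = "the film was released in 2001 and was truly long".split()

scores = quote_scores(quoted, regular)
```

In this toy corpus "truly" occurs twice in quotes and once outside, so it receives a higher score than "film", which occurs once in each. On a real corpus most words would appear far more often outside quotations, which matches the low mean score reported above.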
The methodology used to generate the subjectivity lexicon from the text corpus of the English Wikipedia is designed in such a way that it can be scaled and reused to produce similar subjectivity lexica for other languages. This is achieved by abstaining from domain- and language-specific methods, apart from using readily available English dictionary packages to detect and exclude stopwords and non-English words in the Wikipedia text corpus.
The subjectivity lexicon generated from the English Wikipedia is compared against other lexica, namely MPQA and SentiWordNet. It is found that words which are strongly subjective tend to have high Quote Scores in the subjectivity lexicon generated from the English Wikipedia. There is a large observable difference between the distribution of Quote Scores for words classified as strongly subjective and the distribution of Quote Scores for words classified as weakly subjective or objective. However, weakly subjective and objective words cannot be differentiated clearly based on Quote Score. In addition, a questionnaire is used as an exploratory approach to investigate whether a subjectivity lexicon generated from Wikipedia could be used to extend the word coverage of existing lexica.
Ontologies are valuable tools for knowledge representation and important building blocks of the Semantic Web. They are not static and can change over time. Changing an ontology can be necessary for various reasons: the domain that is represented by an ontology can change or an ontology is reused and must be adapted to the new context. In addition, modeling errors could have been introduced into the ontology which must be found and removed. The non-triviality of the change process has led to the emerge of ontology change as an own field of research. The removal of knowledge from ontologies is an important aspect of this change process, because even the addition of new knowledge to an ontology potentially requires the removal of older, conflicting knowledge. Such a removal must be performed in a thought-out way. A naïve change of concepts within the ontology can easily remove other, unrelated knowledge or alter the semantics of concepts in an unintended way [2]. For these reasons, this thesis introduces a formal operator for the fine-grained retraction of knowledge from EL concepts which is partially based on the postulates for belief set contraction and belief base contraction [3, 4, 5] and the work of Suchanek et al. [6]. For this, a short introduction to ontologies and OWL 2 is given and the problem of ontology change is explained. It is then argued why a formal operator can support this process and why the Description Logic EL provides a good starting point for the development of such an operator. After this, a general introduction to Description Logic is given. This includes its history, an overview of its applications and common reasoning tasks in this logic. Following this, the logic EL is defined. In a next step, related work is examined and it is shown why the recovery postulate and the relevance postulate cannot be naïvely employed in the development of an operator that removes knowledge from EL concepts. 
Following this, the requirements for the operator are formulated and properties are given which are mainly based on the postulates for belief set and belief base contraction. Additional properties are developed which compensate for the non-applicability of the recovery and relevance postulates. After this, a formal definition of the operator is given and it is shown that the operator is applicable to the task of fine-grained removal of knowledge from EL concepts. Next, it is proven that the operator fulfills all the previously defined properties. It is then demonstrated how the operator can be combined with laconic justifications [7] to assist a human ontology editor by automatically removing unwanted consequences from an ontology. Building on this, a plugin for the ontology editor Protégé is introduced that is based on algorithms derived from the formal definition of the operator. The content of this work is then summarized and a final conclusion is drawn. The thesis closes with an outlook on possible future work.
We examine the systematic underrecognition of female scientists (Matilda effect) by exploring the citation network of papers published in the American Physical Society (APS) journals. Our analysis shows that articles written by men (first author, last author and dominant gender of authors) receive more citations than similar articles written by women (first author, last author and dominant gender of authors) after controlling for the journal of publication, year of publication and content of the publication. The statistical significance of the overlap between reference lists was used as the measure of similarity between articles in our analysis. In addition, we found that men are less likely to cite articles written by women and women are less likely to cite articles written by men. Because the majority of authors who published in APS journals are male (85%), this pattern results in articles written by men receiving more citations than similar articles written by women. We also observed that the Matilda effect is reduced when articles are published in journals with the highest impact factors. In other words, people's evaluation of articles published in these journals is not significantly affected by the gender of the authors. Finally, we suggested a method that editors of academic journals can apply to reduce the evaluation bias to some extent. Editors can use our proposed method to identify missing citations and complete bibliographies. This policy can reduce the evaluation bias because we observed that papers written by female scholars (first author, last author, dominant gender of authors) miss more citations than articles written by male scholars (first author, last author, dominant gender of authors).
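One way to read "statistical significance of the overlap between reference lists" is as a hypergeometric tail probability. The following is only a minimal sketch under that assumption; the abstract does not state which test the thesis uses, and the function name `overlap_p_value` and its parameters are hypothetical.

```python
from math import comb


def overlap_p_value(pool_size, refs_a, refs_b, shared):
    """Probability of seeing at least `shared` common references by chance.

    Assumption (not from the abstract): article B's `refs_b` references are
    drawn uniformly at random, without replacement, from a pool of
    `pool_size` citable papers, `refs_a` of which are cited by article A.
    A small p-value then suggests the two articles are similar.
    """
    if not (0 <= refs_a <= pool_size and 0 <= refs_b <= pool_size):
        raise ValueError("reference list sizes must lie within the pool size")
    if shared < 0:
        raise ValueError("shared must be non-negative")
    total = comb(pool_size, refs_b)
    upper = min(refs_a, refs_b)
    # Sum the hypergeometric probabilities for overlaps of size shared..upper.
    tail = sum(
        comb(refs_a, k) * comb(pool_size - refs_a, refs_b - k)
        for k in range(shared, upper + 1)
    )
    return tail / total


if __name__ == "__main__":
    # Hypothetical numbers: 1000 citable papers, two articles with 30 and 25
    # references, 5 of them shared.
    print(overlap_p_value(1000, 30, 25, 5))
```

Under this reading, pairs of articles whose p-value falls below a chosen threshold would be treated as similar; the threshold itself would be a further assumption.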
Knowledge-based authentication methods are vulnerable to shoulder surfing.
The widespread use of these methods without addressing their limitations could result in users' information being compromised. A user authentication method ought to be effortless to use and efficient, yet secure.
The problem we face concerning the security of PIN (Personal Identification Number) or password entry is shoulder surfing, in which a direct or indirect malicious observer can identify the user's sensitive information. To tackle this issue, we present TouchGaze, which combines gaze signals and touch capabilities as an input method for entering a user's credentials. Gaze signals are primarily used to enhance targeting, and touch is used for selecting. In this work, we have designed three different PIN entry methods, all of which have similar interfaces. For the evaluation, these methods were compared based on efficiency, accuracy, and usability. The results showed that although gaze-based methods require extra time for users to become familiar with them, they are considered more secure. With regard to efficiency, they have an error margin similar to that of traditional PIN entry methods.
Topic models are a popular tool to extract concepts from large text corpora. These text corpora tend to contain hidden meta groups. The size relation of these groups is frequently imbalanced. Their presence is often ignored when applying a topic model. Therefore, this thesis explores the influence of such imbalanced corpora on topic models.
The influence is tested by training LDA on samples with varying size relations. The samples are generated from data sets containing large group differences (i.e., language differences) and small group differences (i.e., political orientation). The predictive performance on those imbalanced corpora is judged using perplexity.
The experiments show that the presence of groups in training corpora can influence the prediction performance of LDA. The impact depends on several factors, including language-specific perplexity scores. The group-specific prediction performance changes when the relative group sizes are varied. The size of this change differs between data sets.
LDA is able to distinguish between different latent groups in document corpora if the differences between groups are large enough, e.g. for groups with different languages. The proportion of group-specific topics is lower than the group's share of the corpus, and relatively lower still for minority groups.
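The perplexity measure used to judge predictive performance can be illustrated with a short sketch. It uses the standard definition, the exponential of the negative mean per-token log-likelihood on held-out text. The function name and the input format are assumptions for illustration; how the thesis obtains token likelihoods from LDA is not described in the abstract.

```python
import math


def perplexity(token_log_probs):
    """Perplexity = exp(-(1/N) * sum of per-token log-probabilities).

    `token_log_probs` holds the natural-log probability a model assigns to
    each held-out token. Lower perplexity means better prediction.
    """
    if not token_log_probs:
        raise ValueError("need at least one token log-probability")
    mean_log_prob = sum(token_log_probs) / len(token_log_probs)
    return math.exp(-mean_log_prob)


if __name__ == "__main__":
    # Hypothetical example: a model that spreads probability uniformly over
    # a 4-word vocabulary has a perplexity of about 4 on any text.
    uniform = [math.log(0.25)] * 8
    print(perplexity(uniform))

    # Hypothetical per-group comparison: the group whose tokens receive
    # lower probabilities ends up with the higher perplexity.
    majority = [math.log(0.10)] * 5
    minority = [math.log(0.02)] * 5
    print(perplexity(majority), perplexity(minority))
```

Computing the score separately for the held-out documents of each group, as in the second example, is one way to compare group-specific prediction performance across different group size ratios.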