The search for scientific literature in scientific information systems is a discipline at the intersection of information retrieval and digital libraries. Recent user studies show two typical weaknesses of the classical IR model: the ranking of retrieved, possibly relevant documents and the language problem during the query formulation phase. At the same time, traditional retrieval systems that rely primarily on textual document and query features have been stagnating for years, as can be observed in IR evaluation campaigns such as TREC or CLEF. Alternative approaches to overcome these two problems are therefore needed. Two different search support systems are presented in this work and evaluated in a laboratory setting using the IR test collections GIRT and iSearch with 150 and 65 topics, respectively. These two systems are (1) a query expansion mechanism that is based on the analysis of co-occurrences of document attributes and (2) a ranking mechanism that applies informetric analysis of the productivity of information producers in the information production process. Both systems were compared to a baseline system using the Solr search engine. Both methods showed positive effects when applying additional document attributes such as author names, ISSN codes and controlled terms. The query expansion showed an improvement in precision (bpref +12%) and in recall (R +22%).
The alternative ranking methods were able to compete with the baseline for author names and ISSN codes and were able to beat the baseline by using controlled terms (MAP +14%). A clear negative influence was seen when using entities such as publishers or locations. Both methods were able to generate a substantially different ordering of the result set, measured using Kendall's tau. Thus, in addition to the improved relevance of the result list, the user can get a new and different view of the document set. Query expansion using author names, ISSN codes and thesaurus terms revealed the great potential that lies within the rich metadata sets of digital library systems. The proposed ranking methods could outperform standard relevance ranking methods after filtering for the existence of a so-called power law. This showed that the proposed ranking methods cannot be applied universally but require specific frequency distributions in the metadata. A connection between the underlying informetric laws of Bradford, Lotka and Zipf is made clear. The evaluated methods were implemented as interactive search support systems that can be used in an interactive prototype and in the social science digital library system Sowiport. In addition, the methods can be adapted to other systems and environments using a free software framework and a web API.
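The two mechanisms described above can be illustrated with a minimal Python sketch. It is not the implementation evaluated in this work: the function names `expand_query` and `rerank_by_productivity`, the dictionary layout of the documents and the use of raw co-occurrence counts are assumptions made only for illustration. The evaluated systems may use other association measures and run on top of Solr.

```python
from collections import Counter


def expand_query(query_term, documents, attribute, top_k=3):
    """Suggest attribute values that co-occur most often with a query term.

    Assumption: each document is a dict with a "text" field and list-valued
    metadata fields (e.g. "authors", "issn"). Raw counts stand in for
    whatever association measure a real system would use.
    """
    counts = Counter()
    for doc in documents:
        # A document "matches" if the term appears in its text (case-insensitive).
        if query_term.lower() in doc["text"].lower():
            counts.update(doc.get(attribute, []))
    return [value for value, _ in counts.most_common(top_k)]


def rerank_by_productivity(documents, attribute):
    """Re-rank a result set by the productivity of its information producers.

    Productivity is approximated here as the number of documents in the
    result set that share an attribute value (e.g. the same ISSN). Python's
    sort is stable, so documents with equal productivity keep their
    original relative order.
    """
    frequency = Counter(
        value for doc in documents for value in doc.get(attribute, [])
    )

    def productivity(doc):
        return max((frequency[v] for v in doc.get(attribute, [])), default=0)

    return sorted(documents, key=productivity, reverse=True)


if __name__ == "__main__":
    # Hypothetical toy result set, not taken from GIRT or iSearch.
    results = [
        {"id": 1, "text": "Unemployment in Germany", "authors": ["A"], "issn": ["1111"]},
        {"id": 2, "text": "Youth unemployment", "authors": ["A", "B"], "issn": ["2222"]},
        {"id": 3, "text": "Labor market and unemployment", "authors": ["B"], "issn": ["2222"]},
        {"id": 4, "text": "Migration", "authors": ["C"], "issn": ["2222"]},
        {"id": 5, "text": "unemployment policy", "authors": ["A"], "issn": ["3333"]},
    ]
    print(expand_query("unemployment", results, "authors", top_k=2))
    print([d["id"] for d in rerank_by_productivity(results, "issn")])
```

In this sketch the expansion terms come from the metadata of matching documents rather than from their text, and the re-ranking moves documents from the most frequent sources to the top, which is why the resulting order can differ substantially from a text-based relevance ranking.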
The amount of information on the Web is constantly increasing, and a wide variety of information is available, such as news, encyclopedia articles, statistics, survey data, stock information, events, bibliographies etc. The information is characterized by heterogeneity in aspects such as information type, modality, structure, granularity and quality, and by its distributed nature. The two primary techniques by which users look for information on the Web are (1) using Web search engines and (2) browsing the links between information. The dominant mode of information presentation is static, in the form of text, images and graphics. Interactive visualizations offer a number of advantages for the presentation and exploration of heterogeneous information on the Web: (1) they provide different representations for different, very large and complex types of information and (2) large amounts of data can be explored interactively using their attributes, which can support and expand the cognition process of the user. So far, interactive visualizations are still not an integral part of the search process on the Web. The technical standards and interaction paradigms needed to make interactive visualization usable by the masses are being introduced only slowly through standardization organizations. This work examines how interactive visualizations can be used for the linking and search process of heterogeneous information on the Web. Based on principles from the areas of information retrieval (IR), information visualization and information processing, a model is created that extends the existing structural models of information visualization with two new processes: (1) linking of information in visualizations and (2) searching, browsing and filtering based on glyphs. The Vizgr toolkit implements the developed model in a web application. 
In four different application scenarios, aspects of the model are instantiated and either evaluated in user tests or examined by means of examples.