Das Suchergebnis hat sich seit Ihrer Suchanfrage verändert. Eventuell werden Dokumente in anderer Reihenfolge angezeigt.
  • Treffer 3 von 244
Zurück zur Trefferliste

Study on Data Placement Strategies in Distributed RDF Stores

  • The distributed setting of RDF stores in the cloud poses many challenges. One such challenge is how the data placement on the compute nodes can be optimized to improve the query performance. To address this challenge, several evaluations in the literature have investigated the effects of existing data placement strategies on the query performance. A common drawback in theses evaluations is that it is unclear whether the observed behaviors were caused by the data placement strategies (if different RDF stores were evaluated as a whole) or reflect the behavior in distributed RDF stores (if cloud processing frameworks like Hadoop MapReduce are used for the evaluation). To overcome these limitations, this thesis develops a novel benchmarking methodology for data placement strategies that uses a data-placement-strategy-independent distributed RDF store to analyze the effect of the data placement strategies on query performance. With this evaluation methodology the frequently used data placement strategies have been evaluated. This evaluation challenged the commonly held belief that data placement strategies that emphasize local computation, such as minimal edge-cut cover, lead to faster query executions. The results indicate that queries with a high workload may be executed faster on hash-based data placement strategies than on, e.g., minimal edge-cut covers. The analysis of the additional measurements indicates that vertical parallelization (i.e., a well-distributed workload) may be more important than horizontal containment (i.e., minimal data transport) for efficient query processing. Moreover, to find a data placement strategy with a high vertical parallelization, the thesis tests the hypothesis that collocating small connected triple sets on the same compute node while balancing the amount of triples stored on the different compute nodes leads to a high vertical parallelization. Specifically, the thesis proposes two such data placement strategies. The first strategy called overpartitioned minimal edge-cut cover was found in the literature and the second strategy is the newly developed molecule hash cover. The evaluation revealed a balanced query workload and a high horizontal containment, which lead to a high vertical parallelization. As a result these strategies showed a better query performance than the frequently used data placement strategies.

Volltext Dateien herunterladen

Metadaten exportieren

Weitere Dienste

Teilen auf Twitter Suche bei Google Scholar
Metadaten
Verfasserangaben:Daniel Janke
URN:urn:nbn:de:kola-20250
Gutachter:Steffen Staab, Georg Lausen
Betreuer:Steffen Staab
Dokumentart:Dissertation
Sprache:Englisch
Datum der Fertigstellung:05.02.2020
Datum der Veröffentlichung:06.02.2020
Veröffentlichende Institution:Universität Koblenz, Universitätsbibliothek
Titel verleihende Institution:Universität Koblenz, Fachbereich 4
Datum der Abschlussprüfung:31.01.2020
Datum der Freischaltung:06.02.2020
Seitenzahl:xiii, 197
Institute:Fachbereich 4 / Institute for Web Science and Technologies
BKL-Klassifikation:54 Informatik
Lizenz (Deutsch):License LogoEs gilt das deutsche Urheberrecht: § 53 UrhG