Optimizing SPARQL Queries over Disparate RDF Data Sources through Distributed Semi-joins

Bibliographical Metadata
Subject: Querying Distributed RDF Data Sources
Keywords: SPARQL, RDF, Distributed Querying
Year: 2008
Authors: Jan Zemánek, Simon Schenk, Vojtěch Svátek, Abraham Bernstein
Venue: ISWC
Content Metadata
Problem: SPARQL Query Federation
Approach: No data available now.
Implementation: Distributed SPARQL
Evaluation: Performance Evaluation

Abstract

With the ever-increasing amount of data on the Web available at SPARQL endpoints, the need for an integrated and transparent way of accessing the data has arisen. It is highly desirable to be able to pose SPARQL queries that make use of data residing in disparate data sources served by multiple SPARQL endpoints. We aim to provide such a capability and thus enable an integrated way of querying the whole Semantic Web at once.

Conclusion

We briefly presented our Sesame extension, Distributed SPARQL, which aims to provide an integrated way of querying data sources scattered across multiple SPARQL endpoints. We briefly described its implementation and the optimization used so far, and outlined the direction of its future development. Distributed SPARQL is part of the Networked Graphs project and is publicly available at https://launchpad.net/networkedgraphs.

Future work

We would like to further improve query evaluation performance by introducing join reordering that is aware of distributed joins. We will reuse the current Sesame optimization techniques for local queries and add our own component that reorders joins according to their relative costs. The costs will be based on statistics that combine sub-query selectivity with whether a triple pattern is to be evaluated locally or at a remote SPARQL endpoint. In addition to join reordering, we would like to use statistics about SPARQL endpoints to optimize queries even further. We hope that the recent initiative called Vocabulary of Interlinked Datasets (http://community.linkeddata.org/MediaWiki/index.php?VoiD) will mature to the point where it can be used for this purpose.
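
The cost-based reordering idea described above can be illustrated with a minimal Java sketch. It is not the Distributed SPARQL implementation and does not use the Sesame API. The class JoinReorderSketch, the PatternEstimate type, the REMOTE_PENALTY constant, and the cost formula (selectivity multiplied by a fixed penalty for remote evaluation) are all assumptions made for illustration. Selectivity is taken here to be the estimated fraction of triples a pattern matches, so lower values are cheaper.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class JoinReorderSketch {

    /** Hypothetical statistics for one triple pattern (not a Sesame type). */
    static final class PatternEstimate {
        final String pattern;      // textual form of the triple pattern
        final double selectivity;  // assumed: estimated fraction of triples matched (0..1)
        final boolean remote;      // true if evaluated at a remote SPARQL endpoint

        PatternEstimate(String pattern, double selectivity, boolean remote) {
            this.pattern = pattern;
            this.selectivity = selectivity;
            this.remote = remote;
        }
    }

    /** Assumed multiplier expressing that remote evaluation costs more than local. */
    static final double REMOTE_PENALTY = 10.0;

    /** Relative cost: selectivity, scaled up when the pattern is evaluated remotely. */
    static double cost(PatternEstimate p) {
        return p.selectivity * (p.remote ? REMOTE_PENALTY : 1.0);
    }

    /** Returns a new list with the cheapest patterns first; the input is not modified. */
    static List<PatternEstimate> reorder(List<PatternEstimate> patterns) {
        List<PatternEstimate> ordered = new ArrayList<>(patterns);
        ordered.sort(Comparator.comparingDouble(JoinReorderSketch::cost));
        return ordered;
    }

    public static void main(String[] args) {
        List<PatternEstimate> patterns = Arrays.asList(
            new PatternEstimate("?p foaf:knows ?q", 0.2, true),
            new PatternEstimate("?p rdf:type foaf:Person", 0.5, false),
            new PatternEstimate("?p foaf:mbox <mailto:a@example.org>", 0.01, true));

        // Print the patterns in the order in which they would be joined.
        for (PatternEstimate p : reorder(patterns)) {
            System.out.println(p.pattern + "  cost=" + cost(p));
        }
    }
}
```

A fixed penalty is the simplest way to express the local versus remote distinction. Per-endpoint statistics of the kind mentioned above could replace the constant with a value that depends on the endpoint.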

Approach

Positive Aspects: No data available now.

Negative Aspects: No data available now.

Limitations: No data available now.

Challenges: No data available now.

Proposes Algorithm: No data available now.

Methodology: No data available now.

Requirements: No data available now.

Implementations

Download-page: No data available now.

Access API: No data available now.

Information Representation: No data available now.

Data Catalogue: -

Runs on OS: No data available now.

Vendor: No data available now.

Uses Framework: No data available now.

Has Documentation URL: No data available now.

Programming Language: Java

Version: No data available now.

Platform: Sesame

Toolbox: No data available now.

GUI: No

Research Problem

Subproblem of: No data available now.

RelatedProblem: No data available now.

Motivation: No data available now.

Evaluation

Experiment Setup: No data available now.

Evaluation Method: No data available now.

Hypothesis: No data available now.

Description: No data available now.

Dimensions: No data available now.

Benchmark used: No data available now.

Results: No data available now.