SPLENDID: SPARQL Endpoint Federation Exploiting VOID Descriptions

Bibliographical Metadata
Subject:	Querying Distributed RDF Data Sources
Year:	2011
Authors:	Olaf Görlitz, Steffen Staab
Venue:	COLD
Content Metadata
Problem:	SPLENDID

Abstract

In order to leverage the full potential of the Semantic Web it is necessary to transparently query distributed RDF data sources in the same way as it has been possible with federated databases for ages. However, there are significant differences between the Web of (linked) Data and the traditional database approaches. Hence, it is not straightforward to adapt successful database techniques for RDF federation. Reasons are the missing cooperation between SPARQL endpoints and the need for detailed data statistics for estimating the costs of query execution plans. We have implemented SPLENDID, a query optimization strategy for federating SPARQL endpoints based on statistical data obtained from voiD descriptions.

Conclusion

SPLENDID allows for transparent query federation over distributed SPARQL endpoints. To achieve good query execution performance, data source selection and query optimization are based on basic statistical information obtained from VOID descriptions. The use of open Semantic Web standards, such as VOID and SPARQL endpoints, allows for flexible integration of various distributed and linked RDF data sources. We have described in detail the implementation of the data source selection and the join order optimization. The evaluation shows that our approach can achieve good query performance and is competitive with other state-of-the-art federation implementations. From our analysis of the source selection we conclude that at least predicate and type statistics should be included in the VOID descriptions of RDF datasets. The use of third-party sameAs links, however, can significantly increase the number of requests and thus hamper the efficiency of query execution plans. The comparison of the two physical join implementations employed has shown that network overhead plays an important role. Both hash join and bind join can significantly reduce the query processing time for certain types of queries. With SPLENDID we would also like to advocate the adoption of VOID statistics for Linked Data.
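
The role of predicate and type statistics in source selection can be illustrated with a minimal sketch. It is not SPLENDID's implementation: the `VoidStats` class, the `select_sources` function, the prefixed names, and the endpoint URLs are hypothetical and only assume that each endpoint's VOID description yields per-predicate triple counts and per-class entity counts.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

RDF_TYPE = "rdf:type"


@dataclass
class VoidStats:
    """Hypothetical, simplified view of one endpoint's VOID statistics."""
    endpoint: str
    predicate_counts: Dict[str, int] = field(default_factory=dict)  # predicate -> triples
    class_counts: Dict[str, int] = field(default_factory=dict)      # class -> entities


def select_sources(pattern: Tuple[str, str, str],
                   catalog: List[VoidStats]) -> List[Tuple[str, int]]:
    """Return (endpoint, estimated count) pairs that may answer a triple pattern.

    pattern is an (s, p, o) tuple; terms starting with '?' are variables.
    """
    _, p, o = pattern
    selected = []
    for stats in catalog:
        if p.startswith("?"):
            # Unbound predicate: the statistics cannot exclude any source.
            selected.append((stats.endpoint, sum(stats.predicate_counts.values())))
        elif p == RDF_TYPE and not o.startswith("?"):
            # Bound rdf:type object: consult the per-class statistics.
            if o in stats.class_counts:
                selected.append((stats.endpoint, stats.class_counts[o]))
        elif p in stats.predicate_counts:
            # Bound predicate: keep only sources that contain it.
            selected.append((stats.endpoint, stats.predicate_counts[p]))
    return selected


# Example catalog with two made-up endpoints.
catalog = [
    VoidStats("http://a.example/sparql",
              {"foaf:name": 100, "rdf:type": 50},
              {"foaf:Person": 40}),
    VoidStats("http://b.example/sparql", {"dc:title": 10}),
]
```

With this catalog, a pattern on `foaf:name` is routed only to the first endpoint, while a pattern with an unbound predicate has to be sent to both. The counts returned here are the kind of basic input a join order optimizer could use for cardinality estimates.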

Future work

As next steps, we plan to investigate whether VOID descriptions can easily be extended with more detailed statistics in order to allow for more accurate cardinality estimates and, thus, better query execution plans. On the other hand, the actual query execution has not yet been optimized in SPLENDID. Therefore, we plan to integrate optimization techniques as used in FedX. Moreover, the adoption of the SPARQL 1.1 federation extension will also allow for more efficient query execution.
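
As an illustration of what the SPARQL 1.1 federation extension makes possible, the sketch below builds a query that evaluates one triple pattern at a remote endpoint via `SERVICE` and ships already-known bindings in a `VALUES` block. This is an assumption about how such a request could look, not a description of SPLENDID or FedX; the function name `service_query`, the endpoint URL, and the example IRIs are hypothetical.

```python
def service_query(endpoint, pattern, var, bindings):
    """Build a SPARQL 1.1 query that evaluates one triple pattern remotely.

    endpoint: IRI of the remote SPARQL endpoint
    pattern:  triple pattern text using full IRIs
    var:      variable name (without '?') bound by an earlier join step
    bindings: IRIs bound to var so far, sent along in a VALUES block
    """
    values = " ".join(f"<{iri}>" for iri in bindings)
    return (
        "SELECT * WHERE {\n"
        f"  SERVICE <{endpoint}> {{\n"
        f"    VALUES ?{var} {{ {values} }}\n"
        f"    {pattern} .\n"
        "  }\n"
        "}"
    )


# Example with made-up IRIs: one request carries two bindings.
query = service_query(
    "http://example.org/sparql",
    "?person <http://xmlns.com/foaf/0.1/name> ?name",
    "person",
    ["http://example.org/p1", "http://example.org/p2"],
)
```

Sending several bindings in one request is one way to reduce the number of round trips, which matters given the role that network overhead plays in federated query execution.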

Implementations

GUI: No
