Latest revision as of 21:52, 11 July 2018

A Semantic Web Middleware for Virtual Data Integration on the Web
Bibliographical Metadata
Subject: Querying Distributed RDF Data Sources
Keywords: not available.
Year: 2008
Authors: Andreas Langegger, Wolfram Wöß, Martin Blöchl
Venue: ESWC
Content Metadata
Problem: SPARQL Query Federation
Approach: Querying Distributed RDF Data Sources
Implementation: SemWIQ
Evaluation: Sample Queries Evaluation

Abstract

In this contribution a system is presented which provides access to distributed data sources using Semantic Web technology. While it was primarily designed for data sharing and scientific collaboration, it is regarded as a base technology useful for many other Semantic Web applications. The proposed system allows data to be retrieved using SPARQL queries; data sources can register and leave freely, and any RDF Schema or OWL vocabularies can be used to describe their data, as long as they are accessible on the Web. Data heterogeneity is addressed by RDF wrappers like D2R-Server placed on top of local information systems. A query does not directly refer to actual endpoints; instead it contains graph patterns adhering to a virtual data set. A mediator finally pulls and joins RDF data from different endpoints, providing a transparent on-the-fly view to the end user. The SPARQL protocol has been defined to enable systematic data access to remote endpoints. However, remote SPARQL queries require endpoint URIs to be specified explicitly. The presented system allows users to execute queries without the need to specify target endpoints. Additionally, it is possible to execute join and union operations across different remote endpoints. The optimization of such distributed operations is a key factor concerning the performance of the overall system. Therefore, proven concepts from database research can be applied.
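
The mediator idea described in the abstract can be illustrated with a minimal Python sketch. It is not SemWIQ's implementation: the catalog layout, the endpoint URLs, the sample bindings, and the function names fetch, join, and mediate are all assumptions made for illustration, and the remote endpoints are replaced by in-memory lists. The query names only classes of a virtual data set; the catalog decides which endpoints are contacted, and the mediator joins the returned bindings.

```python
from itertools import product

# Hypothetical catalog: which endpoints serve instances of which class.
CATALOG = {
    "obs:Observer": ["http://example.org/sparql/people"],
    "obs:Observation": ["http://example.org/sparql/kso"],
}

# Hypothetical in-memory stand-ins for remote endpoints.
# Each row is one solution: a mapping from variable name to value.
ENDPOINT_DATA = {
    "http://example.org/sparql/people": [
        {"person": "p1", "name": "Otruba"},
        {"person": "p2", "name": "Someone Else"},
    ],
    "http://example.org/sparql/kso": [
        {"obs": "obs1", "person": "p1"},
        {"obs": "obs2", "person": "p1"},
    ],
}


def fetch(rdf_class):
    """Union of the bindings from every endpoint registered for rdf_class."""
    rows = []
    for endpoint in CATALOG.get(rdf_class, []):
        rows.extend(ENDPOINT_DATA[endpoint])
    return rows


def join(left, right):
    """Natural join: keep row pairs that agree on all shared variables."""
    joined = []
    for l_row, r_row in product(left, right):
        shared = set(l_row) & set(r_row)
        if all(l_row[v] == r_row[v] for v in shared):
            joined.append({**l_row, **r_row})
    return joined


def mediate(classes):
    """Answer a query that names only classes, never endpoint URIs."""
    result = fetch(classes[0])
    for rdf_class in classes[1:]:
        result = join(result, fetch(rdf_class))
    return result


results = mediate(["obs:Observer", "obs:Observation"])
for row in results:
    print(row)
```

The caller never mentions an endpoint, which mirrors the transparency the abstract describes. A real mediator would send SPARQL sub-queries over HTTP rather than read local lists.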

Conclusion

In this contribution a mediator-based system for virtual data integration based on Semantic Web technology has been presented. The system is primarily developed for sharing scientific data, but because of its generic architecture it is expected to be usable for many other Semantic Web applications. In this paper, query federation based on SPARQL and Jena/ARQ has been demonstrated in detail, and several concepts for query optimization, which is currently on the agenda, have been discussed. Additional contributions can be expected after the implementation of the additional features mentioned before.

Future work

Other future work will be the support for DESCRIBE queries and IRIs as subjects. In the future, the mediator should also use an OWL-DL reasoner to infer additional types for subject nodes specified in the query pattern. Currently, types have to be explicitly specified for each BGP (more precisely, for the first occurrence: the algorithm caches already known types). OWL-DL constraints, for example a qualified cardinality restriction on obs:byObserver with owl:allValuesFrom obs:Observer, would allow the mediator to deduce the types of other nodes in the query pattern.
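
The kind of type deduction described here can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the planned reasoner integration: restrictions are modelled as a plain dictionary instead of OWL axioms, the class obs:Observation and the property foaf:name are hypothetical, and the function name infer_types is invented for this example. Only obs:byObserver and obs:Observer come from the text above.

```python
# Hypothetical restriction table:
# (class of the subject, property) -> class that all values must belong to.
ALL_VALUES_FROM = {
    ("obs:Observation", "obs:byObserver"): "obs:Observer",
}


def infer_types(bgp):
    """Return {node: set of types} for a list of (s, p, o) triple patterns."""
    types = {}

    # Step 1: collect the types stated explicitly with rdf:type.
    for s, p, o in bgp:
        if p == "rdf:type":
            types.setdefault(s, set()).add(o)

    # Step 2: propagate types along restricted properties until nothing changes.
    changed = True
    while changed:
        changed = False
        for s, p, o in bgp:
            for cls in list(types.get(s, ())):
                target = ALL_VALUES_FROM.get((cls, p))
                if target and target not in types.get(o, set()):
                    types.setdefault(o, set()).add(target)
                    changed = True
    return types


bgp = [
    ("?obs", "rdf:type", "obs:Observation"),
    ("?obs", "obs:byObserver", "?person"),
    ("?person", "foaf:name", "?name"),
]
inferred = infer_types(bgp)
print(inferred)
```

In this sketch only ?obs carries an explicit type; the type of ?person follows from the restriction, so the query author would not need to state it.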

Approach

Model: Architectural

Implementations

Download-page: https://sourceforge.net/projects/semwiq/

Access API: -

Information Representation: RDF

Data Catalogue: RDF stats + VoID

Runs on OS: OS independent

Vendor: Open source

Uses Framework: Jena

Has Documentation URL: http://semwiq.faw.uni-linz.ac.at/core/2007-10-24/catalog.owl

Programming Language: Java

Version: 1

Platform: Jena

Toolbox: -

GUI: Yes


Evaluation

Experiment Setup: The tests were performed with the following setup: the mediator and the test client were running on a 2.16 GHz Intel Core 2 Duo with 2 GB memory and a 2 MBit link to the remote endpoints. All endpoints were simulated on the same physical host running two AMD Opteron CPUs at 1.6 GHz and 2 GB memory.

Evaluation Method: Evaluate the system using a set of sample queries

Hypothesis: -

Description: For the following sample queries, real-world data of sunspot observations recorded at Kanzelhöhe Solar Observatory (KSO) have been used. The observatory is also a partner in the Austrian Grid project. The queries are shown in Fig. 2. Query 1 retrieves the first name, the last name, and optionally the e-mail address of scientists who have made observations. Query 2 retrieves all observations ever recorded by Mr. Otruba.

Dimensions: Performance

Benchmark used: Kanzelhöhe Solar Observatory (KSO)

Results: Because ARQ uses a pipelining concept, the response time is very good, even when data has to be retrieved from a remote data source.
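
Why pipelining helps response time can be shown with Python generators. This is a sketch of the general iterator technique, not ARQ's actual operators: the functions remote_scan and pipelined_join, the fetch log, and the sample rows are assumptions made for illustration. The join yields its first result as soon as the first input row has arrived, instead of waiting for the whole input.

```python
def remote_scan(rows, log):
    """Yield rows one at a time, recording each simulated remote fetch."""
    for row in rows:
        log.append(("fetched", row["id"]))
        yield row


def pipelined_join(left, right_index, key):
    """Join a stream of rows against a lookup table, one row at a time."""
    for l_row in left:
        for r_row in right_index.get(l_row[key], []):
            yield {**l_row, **r_row}


# Hypothetical data: 1000 observations, all by the same observer.
log = []
observations = [{"id": i, "observer": "p1"} for i in range(1000)]
observers = {"p1": [{"name": "Otruba"}]}

stream = pipelined_join(remote_scan(observations, log), observers, "observer")
first = next(stream)  # the first answer is available immediately

print(first)
print(len(log), "row(s) fetched before the first result")
```

Only one row has been fetched when the first result is produced; the remaining rows are pulled only as the consumer asks for more.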

Has future work: For a mediator, minimizing response time is usually more important than maximizing throughput. Because ARQ is pipelined, response time is very good. However, shipping data is costly, so another goal is to minimize the amount of data transferred. When using the REST-based SPARQL protocol, a second requirement is to minimize the number of requests. Queries 2 and 3 perform poorly, mainly because of poor join ordering and because filters are not pushed down to local sub-plans.
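
The effect of pushing a filter down to the data source, one of the optimizations named in the future-work note above, can be sketched as follows. This is an illustration under stated assumptions rather than the system's optimizer: the function ship, the predicate by_p1, and the sample rows are invented, and the "cost" is simply the number of rows an endpoint would have to send to the mediator.

```python
def ship(rows, predicate=None):
    """Simulate an endpoint answering a sub-query.

    A pushed-down predicate is applied at the endpoint, before shipping.
    Returns the shipped rows and how many rows crossed the network.
    """
    if predicate is not None:
        rows = [r for r in rows if predicate(r)]
    return rows, len(rows)


def by_p1(row):
    """Hypothetical filter: keep observations made by observer p1."""
    return row["observer"] == "p1"


# Hypothetical data: every tenth observation was made by p1.
observations = [
    {"id": i, "observer": "p1" if i % 10 == 0 else "p2"} for i in range(100)
]

# Without push-down: ship everything, then filter at the mediator.
shipped, cost_naive = ship(observations)
naive = [r for r in shipped if by_p1(r)]

# With push-down: the endpoint filters first and ships only the matches.
pushed, cost_pushed = ship(observations, by_p1)

print(cost_naive, cost_pushed)
```

Both plans return the same rows, but the pushed-down plan ships a tenth of the data in this example, which matters on a slow link to the endpoints.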