Querying the Web of Interlinked Datasets using VOID Descriptions

Bibliographical Metadata
Year: 2012
Authors: Ziya Akar, Tayfun Gökmen Halaç, Erdem Eser Ekinci, Oguz Dikenelli
Venue: LDOW
Content Metadata
Problem: SPARQL Query Federation
Approach: analyzing query structure with respect to the metadata of datasets
Implementation: WoDQA
Evaluation: No evaluation exists.

Abstract

Query processing is an important way of accessing data on the Semantic Web. Today, the Semantic Web is characterized as a web of interlinked datasets, and thus querying the web can be seen as dataset integration on the web. This dataset integration must be transparent to the data consumer, as if she were querying the whole web. To decide which datasets should be selected and integrated for a query, one requires metadata about the web of data. In this paper, to enable this transparency, we introduce a federated query engine called WoDQA (Web of Data Query Analyzer), which automatically discovers the datasets relevant to a query using VOID documents as metadata. WoDQA focuses on powerful dataset elimination by analyzing query structure with respect to the metadata of datasets. Dataset and linkset descriptions in VOID documents are analyzed for a SPARQL query, and a federated query is constructed. By means of the linkset concept of VOID, links between datasets are incorporated into the selection of federated data sources. The current version of WoDQA is available as a SPARQL endpoint.
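
The selection step described in the abstract can be illustrated with a minimal sketch. It is not WoDQA's implementation (which is written in Java on Jena) and covers only one simple rule: a dataset is kept for a triple pattern when its VOID description declares the vocabulary of the pattern's predicate. Linkset analysis is left out. The `VoidDataset` class, the helper names and the `example.org` endpoints are assumptions made for illustration only.

```python
from dataclasses import dataclass, field


@dataclass
class VoidDataset:
    """Hypothetical, simplified view of one void:Dataset description."""
    name: str
    endpoint: str                                    # value of void:sparqlEndpoint
    vocabularies: set = field(default_factory=set)   # values of void:vocabulary


def is_variable(term):
    return term.startswith("?")


def namespace_of(iri):
    """Return the namespace part of an IRI (up to the last '#' or '/')."""
    cut = max(iri.rfind("#"), iri.rfind("/"))
    return iri[: cut + 1]


def relevant_datasets(pattern, datasets):
    """Eliminate datasets that do not declare the predicate's vocabulary."""
    _, predicate, _ = pattern
    if is_variable(predicate):
        return list(datasets)  # an unbound predicate rules nothing out
    namespace = namespace_of(predicate)
    return [ds for ds in datasets if namespace in ds.vocabularies]


def format_term(term):
    # Assumption: every term is either a variable or a full IRI.
    return term if is_variable(term) else f"<{term}>"


def build_federated_query(patterns, datasets):
    """Wrap each triple pattern in SERVICE clauses for its relevant endpoints."""
    blocks = []
    for pattern in patterns:
        triple = " ".join(format_term(t) for t in pattern) + " ."
        sources = relevant_datasets(pattern, datasets)
        if not sources:
            raise ValueError(f"no dataset matches pattern {pattern}")
        services = [f"{{ SERVICE <{ds.endpoint}> {{ {triple} }} }}" for ds in sources]
        blocks.append(" UNION ".join(services))
    return "SELECT * WHERE {\n  " + "\n  ".join(blocks) + "\n}"


# Hypothetical VOID store with two dataset descriptions.
datasets = [
    VoidDataset("people", "http://example.org/people/sparql",
                {"http://xmlns.com/foaf/0.1/"}),
    VoidDataset("places", "http://example.org/places/sparql",
                {"http://www.w3.org/2003/01/geo/wgs84_pos#"}),
]
patterns = [
    ("?person", "http://xmlns.com/foaf/0.1/based_near", "?place"),
    ("?place", "http://www.w3.org/2003/01/geo/wgs84_pos#lat", "?lat"),
]
print(build_federated_query(patterns, datasets))
```

In this sketch the user writes only the raw triple patterns; the endpoint for each pattern is derived from the dataset descriptions, which mirrors the transparency goal stated above.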

Conclusion

In this paper, we have introduced a query federation engine called WoDQA that discovers the datasets in a VOID store that are related to a query and distributes the query over these datasets. The novelty of our approach is an exhaustive dataset selection mechanism which, besides analyzing datasets for each triple pattern, also analyzes the relations between triple patterns and the links between datasets. WoDQA focuses on discovering relevant datasets and eliminating irrelevant ones using a rule-based approach introduced in this paper. To select datasets effectively, our approach requires VOID descriptions that make it possible to query the dataset, reflect the actual content of the dataset completely and accurately, and include linksets between datasets. WoDQA allows users to construct raw queries without needing to know how the query will be divided into sub-queries or where the sub-queries are executed. Query results are complete provided that the VOID descriptions of the datasets are available, accurate and complete.

The initial version of WoDQA introduced in this paper has some disadvantages arising from the query federation approach on which it builds. As mentioned previously, follow-your-nose has problems such as missing results and large document retrieval. Similar problems may occur for query federation. First, finding complete results to queries requires that the metadata of all datasets be well-defined and accurate. Providing such accurate dataset metadata requires an automated mechanism that continuously updates the metadata. However, even if a tool implementing this requirement existed, providing accurate dataset metadata via such a tool would remain the responsibility of dataset publishers.

Further problems of query federation are high latency and low selectivity of datasets, which are similar to the retrieval of large documents in follow-your-nose. Query optimization can be a solution to these problems. Grouping triple patterns to filter more triples on an endpoint can prevent high latency (required processing time), and changing the query evaluation order according to dataset selectivity statistics can prevent retrieving large result sets. To make WoDQA function in the wild, the optimization step of query federation must be implemented. We plan to incorporate triple pattern selectivity into query reorganization using VOID properties about statistics.

On the other hand, we could not evaluate our approach in this paper, since the VOID documents in current VOID stores are not well-defined. Because SPARQL endpoint definitions, linkset descriptions or vocabularies are missing in most VOID documents, we had no chance to execute comprehensive scenarios. Developing a tool which extracts well-defined VOID descriptions of datasets, and thereby evaluating our approach, is required future work to confirm the applicability of WoDQA on linked open data. Also, evaluating the analysis cost of WoDQA for a large VOID store will become possible once well-defined VOID descriptions are constructed.
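
The two optimizations named in the conclusion, grouping triple patterns per endpoint and reordering evaluation by selectivity, are described only as planned work. The following is a rough sketch of what they could look like, not the authors' design. It assumes that each triple pattern has already been assigned a list of candidate endpoints, and it uses the dataset size from `void:triples` as a crude stand-in for selectivity. The function names and the `example.org` endpoints are hypothetical.

```python
from collections import defaultdict


def group_by_endpoint(assignments):
    """Group patterns that can be answered by exactly one endpoint.

    `assignments` is a list of (triple_pattern, candidate_endpoints) pairs.
    Patterns sharing the same single endpoint can be sent together, so the
    endpoint filters more triples before results travel over the network.
    Patterns with several candidate endpoints are returned separately.
    """
    groups = defaultdict(list)
    ungrouped = []
    for pattern, endpoints in assignments:
        if len(endpoints) == 1:
            groups[endpoints[0]].append(pattern)
        else:
            ungrouped.append((pattern, endpoints))
    return dict(groups), ungrouped


def order_by_size(groups, triple_counts):
    """Order groups so that endpoints with fewer triples are queried first.

    `triple_counts` maps an endpoint to its void:triples value; endpoints
    without statistics are placed last.
    """
    return sorted(groups.items(),
                  key=lambda item: triple_counts.get(item[0], float("inf")))


# Hypothetical endpoints and pattern assignments.
A = "http://example.org/people/sparql"
B = "http://example.org/places/sparql"
assignments = [
    ("?p a foaf:Person .", [A]),
    ("?p foaf:name ?n .", [A]),
    ("?p owl:sameAs ?q .", [A, B]),
    ("?q dbo:birthPlace ?c .", [B]),
]

groups, ungrouped = group_by_endpoint(assignments)
for endpoint, patterns in order_by_size(groups, {A: 1000, B: 10}):
    print(endpoint, patterns)
```

Dataset size is only a coarse proxy; per-predicate statistics in the VOID description, where publishers provide them, would give a finer estimate.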

Future work

Developing a tool which extracts well-defined VOID descriptions of datasets, and thereby evaluating our approach, is required future work to confirm the applicability of WoDQA on linked open data. Also, evaluating the analysis cost of WoDQA for a large VOID store will become possible once well-defined VOID descriptions are constructed.

Implementations

Download-page: https://sourceforge.net/projects/wodqa/

Access API: -

Information Representation: RDF

Data Catalogue: VoID stores

Runs on OS: OS independent

Vendor: Open source

Has Documentation URL: https://sourceforge.net/projects/wodqa/

Programming Language: Java

Version: 1.0

Platform: Jena

Toolbox: -

GUI: Yes

Research Problem

Subproblem of: Query processing on Linked Data

Related Problem: Missing results and large document retrieval.

Motivation: No data available now.

Evaluation

Experiment Setup: -

Evaluation Method: -

Hypothesis: -

Description: -

Dimensions: -

Benchmark used: -

Results: -
