Querying the Web of Interlinked Datasets using VOID Descriptions

Bibliographical Metadata
Year: 2012
Authors: Ziya Akar, Tayfun Gökmen Halaç, Erdem Eser Ekinci, Oguz Dikenelli
Venue: LDOW
Content Metadata
Implementation: WoDQA

Abstract

Query processing is an important way of accessing data on the Semantic Web. Today, the Semantic Web is characterized as a web of interlinked datasets, so querying the web can be seen as dataset integration on the web. This integration must be transparent to the data consumer, as if she were querying the whole web. Deciding which datasets should be selected and integrated for a query requires metadata about the web of data. In this paper, to enable this transparency, we introduce a federated query engine called WoDQA (Web of Data Query Analyzer), which automatically discovers datasets relevant to a query using VOID documents as metadata. WoDQA focuses on powerful dataset elimination by analyzing query structure with respect to the metadata of datasets. Dataset and linkset descriptions in VOID documents are analyzed for a SPARQL query, and a federated query is constructed. By means of the linkset concept of VOID, links between datasets are incorporated into the selection of federated data sources. The current version of WoDQA is available as a SPARQL endpoint.
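The selection step described in the abstract can be illustrated with a minimal sketch. This is not WoDQA's actual rule set: the `VoidDataset` class, the two elimination rules, and the example endpoints are assumptions made for illustration, and linksets are not modeled. Terms are assumed to be either variables (starting with `?`) or full IRIs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VoidDataset:
    """Hypothetical, simplified view of one void:Dataset description."""
    name: str
    endpoint: str        # value of void:sparqlEndpoint
    vocabularies: tuple  # namespaces listed under void:vocabulary
    uri_space: str       # value of void:uriSpace


def is_variable(term):
    return term.startswith("?")


def relevant_datasets(pattern, datasets):
    """Keep the datasets whose metadata does not rule out the triple pattern."""
    subject, predicate, _ = pattern
    selected = []
    for ds in datasets:
        # Assumed rule 1: a bound predicate must belong to a declared vocabulary.
        if not is_variable(predicate) and not any(
            predicate.startswith(ns) for ns in ds.vocabularies
        ):
            continue
        # Assumed rule 2: a bound subject must lie in the dataset's URI space.
        if not is_variable(subject) and not subject.startswith(ds.uri_space):
            continue
        selected.append(ds)
    return selected


def federate(patterns, datasets):
    """Wrap every triple pattern in SERVICE clauses for its selected endpoints."""
    blocks = []
    for pattern in patterns:
        s, p, o = (t if is_variable(t) else f"<{t}>" for t in pattern)
        services = [
            f"{{ SERVICE <{ds.endpoint}> {{ {s} {p} {o} . }} }}"
            for ds in relevant_datasets(pattern, datasets)
        ]
        if not services:
            raise ValueError(f"no dataset matches pattern {pattern}")
        blocks.append(" UNION ".join(services))
    return "SELECT * WHERE {\n  " + "\n  ".join(blocks) + "\n}"
```

Calling `federate` with a pattern whose predicate is in only one dataset's vocabulary yields a query with a single `SERVICE` clause for that dataset's endpoint, while a pattern with an unbound predicate is sent to every dataset.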

Conclusion

In this paper, we have introduced a query federation engine called WoDQA that discovers the datasets in a VOID store that are relevant to a query and distributes the query over these datasets. The novelty of our approach is an exhaustive dataset selection mechanism that analyzes relations between triple patterns and links between datasets, in addition to analyzing datasets for each triple pattern. WoDQA focuses on discovering relevant datasets and eliminating irrelevant ones using the rule-based approach introduced in this paper. To select datasets effectively, our approach requires VOID descriptions that allow the dataset to be queried, reflect the actual content of the dataset completely and accurately, and include linksets between datasets. WoDQA allows users to construct raw queries without needing to know how a query will be divided into sub-queries or where the sub-queries are executed. Query results are complete under the assumption that available, accurate and complete VOID descriptions of the datasets exist.

The initial version of WoDQA introduced in this paper has some disadvantages arising from the query federation approach on which it builds. As mentioned previously, follow-your-nose has problems such as missing results and large document retrieval. Similar problems may occur for query federation. First, finding complete results to queries requires that the metadata of all datasets be well-defined and accurate. Providing such accurate metadata requires an automated mechanism that continuously updates it. However, even if a tool implementing this requirement existed, providing accurate dataset metadata via such a tool would remain the responsibility of dataset publishers. Further problems of query federation are high latency and low selectivity of datasets, which are similar to the retrieval of large documents in follow-your-nose. Query optimization can be a solution for these problems. Grouping triple patterns to filter more triples on an endpoint can prevent high latency (required processing time), and changing the query evaluation order according to dataset selectivity statistics can prevent retrieving large result sets. To make WoDQA function in the wild, the optimization step of query federation needs to be implemented. We plan to incorporate triple pattern selectivity into query reorganization using VOID properties about statistics.

On the other hand, we could not evaluate our approach in this paper, since VOID documents in current VOID stores are not well-defined. Since SPARQL endpoint definitions, linkset descriptions or vocabularies are missing in most VOID documents, we could not execute comprehensive scenarios. Developing a tool that extracts well-defined VOID descriptions of datasets, and thereby evaluating our approach, is required future work to confirm the applicability of WoDQA to linked open data. Evaluating the analysis cost of WoDQA for a large VOID store will also become possible once well-defined VOID descriptions are constructed.
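The reordering idea mentioned in the conclusion (evaluating selective triple patterns first, based on VOID statistics) can be sketched as follows. The paper only lists this as planned work, so this is an illustration under assumptions, not the authors' design: the `stats` dictionary is a hypothetical stand-in for counts that a VOID description might publish through `void:triples` and `void:propertyPartition`, and the cost model is deliberately naive.

```python
def estimated_size(pattern, stats):
    """Estimate how many triples a pattern matches, from hypothetical VOID counts."""
    _, predicate, _ = pattern
    # Per-predicate counts stand in for void:propertyPartition statistics.
    # Unknown or variable predicates fall back to the dataset total (void:triples).
    return stats["by_property"].get(predicate, stats["total"])


def reorder(patterns, stats):
    """Return the patterns sorted so the most selective one is evaluated first."""
    return sorted(patterns, key=lambda pattern: estimated_size(pattern, stats))
```

Because `sorted` is stable, patterns with equal estimates keep their original relative order, so a query without usable statistics is left unchanged.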

Implementations

GUI: No
