SERIMI – Resource Description Similarity, RDF Instance Matching and Interlinking

From Openresearch
Bibliographical Metadata
Subject: Ontology matching
Keywords: data integration, RDF interlinking, instance matching, linked data, entity recognition, entity search.
Year: 2011
Authors: Samur Araujo, Jan Hidders, Daniel Schwabe, Arjen P. de Vries, Abraham Bernstein
Venue: ArXiv
Content Metadata
Problem: No data available now.
Approach: No data available now.
Implementation: No data available now.
Evaluation: No data available now.

Abstract

The interlinking of datasets published in the Linked Data Cloud is a challenging problem and a key factor for the success of the Semantic Web. Manual rule-based methods are the most effective solution for the problem, but they require skilled human data publishers to go through a laborious, error-prone, and time-consuming process of manually describing rules that map instances between two datasets. Thus, an automatic approach for solving this problem is more than welcome. In this paper, we propose a novel interlinking method, SERIMI, for solving this problem automatically. SERIMI matches instances between a source and a target dataset, without prior knowledge of the data, domain, or schema of these datasets. Experiments conducted with benchmark collections demonstrate that our approach considerably outperforms state-of-the-art automatic approaches for solving the interlinking problem on the Linked Data Cloud.

Conclusion

RDF instance matching, in the context of interlinking RDF datasets published in the Linked Data Cloud, is the task of determining whether two resources refer to the same real-world entity. This is a challenging task in high demand by data publishers who wish to interlink their datasets in the cloud.

In this work, we propose a novel approach, called SERIMI, for solving the RDF instance-matching problem automatically. SERIMI matches instances between a source and a target dataset, without prior knowledge of the data, domain, or schema of these datasets. It does so by approximating the notion of similarity, pairing instances based on entity labels as well as structural (ontological) context. As part of the SERIMI approach, we proposed the CRDS function to approximate that judgment of similarity.

We used two collections proposed by the OAEI 2010 initiative to evaluate SERIMI. On average, SERIMI outperforms two representative systems, RiMOM and ObjectCoref, which addressed the same problem using the same collections and reference alignment, in 70% of the cases.
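To make the idea of pairing instances by label and then by structural context more concrete, here is a minimal Python sketch. It is not the SERIMI implementation and it does not reproduce the CRDS function, whose definition is not given on this page. As a stand-in it uses a plain Jaccard overlap of predicate-object pairs. The toy triples, the `src:`/`tgt:`/`ex:` identifiers, the use of `rdfs:label` as the label property, the shared predicate names across both datasets, and the `threshold` value are all assumptions made for illustration only.

```python
from collections import defaultdict

# Toy triples (subject, predicate, object); every URI and value is made up.
SOURCE = [
    ("src:1", "rdfs:label", "Paris"),
    ("src:1", "ex:country", "France"),
    ("src:1", "ex:population", "2100000"),
]
TARGET = [
    ("tgt:a", "rdfs:label", "Paris"),
    ("tgt:a", "ex:country", "France"),
    ("tgt:a", "ex:population", "2100000"),
    ("tgt:b", "rdfs:label", "Paris"),
    ("tgt:b", "ex:state", "Texas"),
]

LABEL = "rdfs:label"  # assumed label property; real datasets may use others


def describe(triples):
    """Group triples into {subject: {predicate: set(objects)}}."""
    desc = defaultdict(lambda: defaultdict(set))
    for s, p, o in triples:
        desc[s][p].add(o)
    return desc


def labels(description):
    """Lower-cased labels of one resource."""
    return {v.lower() for v in description.get(LABEL, set())}


def context(description):
    """Predicate-object pairs other than the label: a crude structural context."""
    return {(p, o) for p, objs in description.items() if p != LABEL for o in objs}


def jaccard(a, b):
    """Set overlap in [0, 1]; a stand-in similarity, NOT the CRDS function."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def match(source_triples, target_triples, threshold=0.5):
    src, tgt = describe(source_triples), describe(target_triples)
    links = []
    for s, s_desc in src.items():
        # Step 1: select candidates that share a label (exact, case-insensitive).
        candidates = [t for t, t_desc in tgt.items() if labels(s_desc) & labels(t_desc)]
        # Step 2: among candidates, keep the one with the most similar context.
        scored = [(jaccard(context(s_desc), context(tgt[t])), t) for t in candidates]
        if scored:
            score, best = max(scored)
            if score >= threshold:
                links.append((s, best, score))
    return links


print(match(SOURCE, TARGET))
```

In this toy data both target resources are labelled "Paris", so the label alone cannot decide the match; the context overlap selects `tgt:a`. The sketch relies on identical predicate names in both datasets, which is a simplification that a schema-agnostic method cannot assume.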

Future work

As future work, we intend to investigate how our model can be adjusted to consider partial string matching in the similarity function that we proposed, and to accommodate different score distribution metrics as the threshold for the parameter. Also, we intend to evaluate this approach on different collections that may provide a more accurate reference alignment than the ones that we used in this work.

Approach

Positive Aspects: No data available now.

Negative Aspects: No data available now.

Limitations: No data available now.

Challenges: No data available now.

Proposes Algorithm: No data available now.

Methodology: No data available now.

Requirements: No data available now.

Implementations

Download-page: No data available now.

Access API: No data available now.

Information Representation: No data available now.

Data Catalogue: No data available now.

Runs on OS: No data available now.

Vendor: No data available now.

Uses Framework: No data available now.

Has Documentation URL: No data available now.

Programming Language: No data available now.

Version: No data available now.

Platform: No data available now.

Toolbox: No data available now.

GUI: No

Research Problem

Subproblem of: No data available now.

Related Problem: No data available now.

Motivation: No data available now.

Evaluation

Experiment Setup: No data available now.

Evaluation Method: No data available now.

Hypothesis: No data available now.

Description: No data available now.

Dimensions: No data available now.

Benchmark used: No data available now.

Results: No data available now.
