LIMES - A Time-Efficient Approach for Large-Scale Link Discovery on the Web of Data

Bibliographical Metadata
Subject: Link Discovery
Keywords: Linked Data, Web of Data, Link Discovery, Record Linkage, Duplicate Detection, Instance-Based Matching
Year: 2011
Authors: Axel-Cyrille Ngonga Ngomo, Sören Auer
Content Metadata
Problem: Link Discovery
Approach: Mathematical characteristics of metric spaces
Implementation: LIMES
Evaluation: Performance Analysis

Abstract

The Linked Data paradigm has evolved into a powerful enabler for the transition from the document-oriented Web into the Semantic Web. While the amount of data published as Linked Data grows steadily and has surpassed 25 billion triples, less than 5% of these triples are links between knowledge bases. Link discovery frameworks provide the functionality necessary to discover missing links between knowledge bases in a semi-automatic fashion. Yet, the task of linking knowledge bases requires a significant amount of time, especially when it is carried out on large data sets. This paper presents and evaluates LIMES - a novel time-efficient approach for link discovery in metric spaces. Our approach utilizes the mathematical characteristics of metric spaces to compute estimates of the similarity between instances. These estimates are then used to filter out a large number of the instance pairs that do not satisfy the mapping conditions. Thus, LIMES can reduce the number of comparisons needed during the mapping process by several orders of magnitude. We present the mathematical foundation and the core algorithms employed in the implementation. We evaluate LIMES with synthetic data to elucidate its behavior on small and large data sets with different configurations and show that our approach can significantly reduce the time complexity of a mapping task. In addition, we compare the runtime of our framework with a state-of-the-art link discovery tool. We show that LIMES is more than 60 times faster when mapping large knowledge bases.
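
The filtering idea in the abstract can be illustrated with a small sketch. It is not the LIMES implementation (which is written in Java). It assumes that the estimate comes from the triangle inequality: for a metric m and a reference point e (an "exemplar"), m(s, t) >= |m(s, e) - m(e, t)|, so a pair whose lower bound already exceeds the threshold can be discarded without computing m(s, t). The function name `link_with_exemplars`, the Euclidean metric, the hand-picked exemplars and the toy points are all assumptions made for illustration.

```python
from collections import defaultdict
from math import dist  # Euclidean distance, a metric (Python 3.8+)


def link_with_exemplars(source, target, exemplars, threshold, metric=dist):
    """Hypothetical helper: return (matches, computed).

    matches  -- set of (s, t) pairs with metric(s, t) <= threshold
    computed -- number of metric evaluations spent while matching
    """
    # Preprocessing: attach each target point to its nearest exemplar and
    # remember the distance between the two.
    clusters = defaultdict(list)
    for t in target:
        d_et, nearest = min((metric(ex, t), ex) for ex in exemplars)
        clusters[nearest].append((t, d_et))

    matches, computed = set(), 0
    for s in source:
        for ex, members in clusters.items():
            d_se = metric(s, ex)
            computed += 1
            for t, d_et in members:
                # Triangle inequality: metric(s, t) >= |d_se - d_et|.
                if abs(d_se - d_et) > threshold:
                    continue  # cannot be a match, skip the real comparison
                computed += 1
                if metric(s, t) <= threshold:
                    matches.add((s, t))
    return matches, computed


# Toy data, invented for this illustration only.
source = [(0.0, 0.0), (10.0, 10.0)]
target = [(0.0, 1.0), (10.0, 9.0), (50.0, 50.0), (51.0, 50.0)]
exemplars = [target[0], target[2]]  # hand-picked, not a selection strategy
matches, computed = link_with_exemplars(source, target, exemplars, threshold=1.5)
print(sorted(matches), computed)
```

On this toy input the sketch finds the same two matches as a brute-force comparison of all 8 pairs while evaluating the metric fewer times. The saving is small here because the data set is tiny; the abstract's claim of several orders of magnitude refers to large knowledge bases.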

Conclusion

We presented the LIMES framework, which implements a very time-efficient approach for the discovery of links between knowledge bases on the Linked Data Web. We evaluated our approach both with synthetic and real data and showed that it outperforms state-of-the-art approaches with respect to the number of comparisons and runtime. In particular, we showed that the speedup of our approach grows with the a-priori time complexity of the mapping task, making our framework especially suitable for handling large-scale matching tasks (cf. results of the SimCities experiment).

Future work

We aim to explore the combination of LIMES with active learning strategies in such a way that a manual configuration of the tool becomes unnecessary. Instead, matching results will be computed quickly by using the exemplars in both the source and target knowledge bases. Subsequently, they will be presented to the user, who will give feedback to the system by rating the quality of the matches found. This feedback will in turn be used to improve the matching configuration and to generate a revised list of matching suggestions for the user. This iterative process will continue until a sufficiently high quality of matches (in terms of precision and recall) is reached.

Approach

Positive Aspects: No data available now.

Negative Aspects: No data available now.

Limitations: No data available now.

Challenges: No data available now.

Proposes Algorithm: No data available now.

Methodology: No data available now.

Requirements: No data available now.

Implementations

Download-page: http://limes.sf.net

Access API: No data available now.

Information Representation: No data available now.

Data Catalogue: No data available now.

Runs on OS: No data available now.

Vendor: Open Source

Uses Framework: No data available now.

Has Documentation URL: http://limes.sf.net

Programming Language: Java

Version: No data available now.

Platform: No data available now.

Toolbox: No data available now.

GUI: No

Research Problem

Subproblem of: No data available now.

RelatedProblem: No data available now.

Motivation: No data available now.

Evaluation

Experiment Setup: No data available now.

Evaluation Method: Compare LIMES with different numbers of exemplars on knowledge bases of different sizes.

Hypothesis: No data available now.

Description: No data available now.

Dimensions: Performance

Benchmark used: DBpedia, DrugBank, LinkedCT, MESH

Results: LIMES outperforms SILK in all experimental settings. Notably, the difference in performance grows with the product of the sizes of the source and target knowledge bases.
