LIMES - A Time-Efficient Approach for Large-Scale Link Discovery on the Web of Data
LIMES - A Time-Efficient Approach for Large-Scale Link Discovery on the Web of Data | |
---|---|
Bibliographical Metadata | |
Subject: | Link Discovery |
Keywords: | Linked Data, Web of Data, Link Discovery, Record Linkage, Duplicate Detection, Instance-Based Matching |
Year: | 2011 |
Authors: | Axel-Cyrille Ngonga Ngomo, Sören Auer |
Venue | IJCAI |
Content Metadata | |
Problem: | No data available now. |
Approach: | No data available now. |
Implementation: | No data available now. |
Evaluation: | No data available now. |
Abstract
The Linked Data paradigm has evolved into a powerful enabler for the transition from the document-oriented Web into the Semantic Web. While the amount of data published as Linked Data grows steadily and has surpassed 25 billion triples, less than 5% of these triples are links between knowledge bases. Link discovery frameworks provide the functionality necessary to discover missing links between knowledge bases in a semi-automatic fashion. Yet, the task of linking knowledge bases requires a significant amount of time, especially when it is carried out on large data sets. This paper presents and evaluates LIMES - a novel time-efficient approach for link discovery in metric spaces. Our approach utilizes the mathematical characteristics of metric spaces to compute estimates of the similarity between instances. These estimates are then used to filter out a large number of those instance pairs that do not satisfy the mapping conditions. Thus, LIMES can reduce the number of comparisons needed during the mapping process by several orders of magnitude. We present the mathematical foundation and the core algorithms employed in the implementation. We evaluate LIMES with synthetic data to elucidate its behavior on small and large data sets with different configurations and show that our approach can significantly reduce the time complexity of a mapping task. In addition, we compare the runtime of our framework with a state-of-the-art link discovery tool. We show that LIMES is more than 60 times faster when mapping large knowledge bases.
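The filtering idea in the abstract can be illustrated with the triangle inequality, which holds in every metric space: for any reference point e, |d(s, e) - d(e, t)| <= d(s, t). If that lower bound already exceeds the distance threshold, the pair (s, t) cannot be a link and the exact comparison can be skipped. The following is a minimal sketch of this general technique only, not the LIMES implementation. The function names, the Euclidean metric, the toy points, and the hand-picked reference points (called exemplars here, following the term used in the future work section) are all assumptions made for illustration.

```python
from typing import Callable, List, Sequence, Tuple

Point = Tuple[float, ...]
Metric = Callable[[Point, Point], float]


def euclidean(a: Point, b: Point) -> float:
    """Euclidean distance, used here as an example of a metric."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def brute_force_links(source: Sequence[Point], target: Sequence[Point],
                      dist: Metric, threshold: float) -> List[Tuple[Point, Point]]:
    """Baseline: compare every source instance with every target instance."""
    return [(s, t) for s in source for t in target if dist(s, t) <= threshold]


def link_with_exemplar_filter(source: Sequence[Point], target: Sequence[Point],
                              exemplars: Sequence[Point], dist: Metric,
                              threshold: float):
    """Return (links, number of exact source-target comparisons).

    Hypothetical helper: distances to the exemplars are computed first,
    then the triangle inequality is used to discard pairs early.
    """
    # Distances from every target instance to every exemplar, computed once.
    target_to_ex = [[dist(t, e) for e in exemplars] for t in target]
    links = []
    exact_comparisons = 0
    for s in source:
        s_to_ex = [dist(s, e) for e in exemplars]
        for t, t_to_ex in zip(target, target_to_ex):
            # Best available lower bound on d(s, t) over all exemplars.
            lower_bound = max(abs(a - b) for a, b in zip(s_to_ex, t_to_ex))
            if lower_bound > threshold:
                continue  # d(s, t) >= lower_bound > threshold: cannot match
            exact_comparisons += 1
            if dist(s, t) <= threshold:
                links.append((s, t))
    return links, exact_comparisons


# Toy data (assumed for illustration only).
source = [(0.0, 0.0), (10.0, 10.0), (5.0, 5.0)]
target = [(0.1, 0.0), (10.0, 10.2), (20.0, 20.0)]
exemplars = [(0.0, 0.0), (20.0, 20.0)]
threshold = 0.5

links, exact = link_with_exemplar_filter(source, target, exemplars,
                                         euclidean, threshold)
print(links, exact)
```

Because the bound is never larger than the true distance, the filter discards only pairs that cannot satisfy the threshold, so the result matches the brute-force baseline. The sketch still pays for the distances to the exemplars, so any saving depends on how many pairs the bound rules out.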
Conclusion
We presented the LIMES framework, which implements a very time-efficient approach for the discovery of links between knowledge bases on the Linked Data Web. We evaluated our approach both with synthetic and real data and showed that it outperforms state-of-the-art approaches with respect to the number of comparisons and runtime. In particular, we showed that the speedup of our approach grows with the a-priori time complexity of the mapping task, making our framework especially suitable for handling large-scale matching tasks (cf. results of the SimCities experiment).
Future work
We aim to explore the combination of LIMES with active learning strategies in such a way that manual configuration of the tool becomes unnecessary. Instead, matching results will be computed quickly by using the exemplars in both the source and target knowledge bases. Subsequently, they will be presented to the user, who will give feedback to the system by rating the quality of the matches found. This feedback will in turn be used to improve the matching configuration and to generate a revised list of matching suggestions for the user. This iterative process will be continued until a sufficiently high quality of matches (in terms of precision and recall) is reached.
Approach
Positive Aspects: No data available now.
Negative Aspects: No data available now.
Limitations: No data available now.
Challenges: No data available now.
Proposes Algorithm: No data available now.
Methodology: No data available now.
Requirements: No data available now.
Implementations
Download-page: No data available now.
Access API: No data available now.
Information Representation: No data available now.
Data Catalogue: No data available now.
Runs on OS: No data available now.
Vendor: No data available now.
Uses Framework: No data available now.
Has Documentation URL: No data available now.
Programming Language: No data available now.
Version: No data available now.
Platform: No data available now.
Toolbox: No data available now.
GUI: No
Research Problem
Subproblem of: No data available now.
RelatedProblem: No data available now.
Motivation: No data available now.
Evaluation
Experiment Setup: No data available now.
Evaluation Method: No data available now.
Hypothesis: No data available now.
Description: No data available now.
Dimensions: No data available now.
Benchmark used: No data available now.
Results: No data available now.