LIMES - A Time-Efficient Approach for Large-Scale Link Discovery on the Web of Data

Bibliographical Metadata
Subject:	Link Discovery
Keywords:	Linked Data, Web of Data, Link Discovery, Record Linkage, Duplicate Detection, Instance-Based Matching
Year:	2011
Authors:	Axel-Cyrille Ngonga Ngomo, Sören Auer
Venue:	IJCAI
Content Metadata
Problem:	No data available now.
Approach:	No data available now.
Implementation:	No data available now.
Evaluation:	No data available now.

Abstract

The Linked Data paradigm has evolved into a powerful enabler for the transition from the document-oriented Web into the Semantic Web. While the amount of data published as Linked Data grows steadily and has surpassed 25 billion triples, less than 5% of these triples are links between knowledge bases. Link discovery frameworks provide the functionality necessary to discover missing links between knowledge bases in a semi-automatic fashion. Yet linking knowledge bases requires a significant amount of time, especially when it is carried out on large data sets. This paper presents and evaluates LIMES, a novel time-efficient approach for link discovery in metric spaces. Our approach uses the mathematical characteristics of metric spaces to compute estimates of the similarity between instances. These estimates are then used to filter out a large number of the instance pairs that do not satisfy the mapping conditions. Thus, LIMES can reduce the number of comparisons needed during the mapping process by several orders of magnitude. We present the mathematical foundation and the core algorithms employed in the implementation. We evaluate LIMES with synthetic data to elucidate its behavior on small and large data sets with different configurations and show that our approach can significantly reduce the time complexity of a mapping task. In addition, we compare the runtime of our framework with a state-of-the-art link discovery tool. We show that LIMES is more than 60 times faster when mapping large knowledge bases.
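The filtering idea described in the abstract can be illustrated with a minimal Python sketch. It assumes a true metric (Euclidean distance by default), a distance threshold `theta` (a pair is linked when d(s, t) <= theta) and a hypothetical farthest-first heuristic for choosing exemplars among the targets. The names `choose_exemplars` and `link` are illustrative only, not the LIMES API, and the paper's own exemplar computation and similarity-to-distance handling may differ. The key step is the reverse triangle inequality d(s, t) >= |d(s, e) - d(t, e)|: once d(s, e) and d(t, e) are known for an exemplar e, any pair whose lower bound already exceeds `theta` can be discarded without computing d(s, t).

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple

Point = Tuple[float, ...]
Metric = Callable[[Point, Point], float]


def euclidean(a: Point, b: Point) -> float:
    """Euclidean distance; any true metric works with the triangle-inequality bound."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def choose_exemplars(targets: Sequence[Point], k: int, d: Metric) -> List[Point]:
    """Hypothetical farthest-first heuristic (an assumption, not the paper's procedure):
    start with the first target, then repeatedly add the target farthest from
    all exemplars chosen so far."""
    if not targets:
        return []
    exemplars = [targets[0]]
    while len(exemplars) < min(k, len(targets)):
        farthest = max(targets, key=lambda t: min(d(t, e) for e in exemplars))
        exemplars.append(farthest)
    return exemplars


def link(sources: Sequence[Point], targets: Sequence[Point], theta: float,
         k: int, d: Metric = euclidean) -> Tuple[List[Tuple[Point, Point]], int]:
    """Return (links, exact source-target distance computations).

    The count excludes the distances to exemplars, which are the
    "estimates" used for filtering."""
    exemplars = choose_exemplars(targets, k, d)
    # Assign every target to its closest exemplar and cache d(t, e).
    clusters: Dict[int, List[Tuple[Point, float]]] = {i: [] for i in range(len(exemplars))}
    for t in targets:
        dists = [d(t, e) for e in exemplars]
        i = min(range(len(exemplars)), key=dists.__getitem__)
        clusters[i].append((t, dists[i]))

    links: List[Tuple[Point, Point]] = []
    comparisons = 0
    for s in sources:
        for i, e in enumerate(exemplars):
            d_se = d(s, e)
            for t, d_te in clusters[i]:
                # Reverse triangle inequality: d(s, t) >= |d(s, e) - d(t, e)|.
                if abs(d_se - d_te) > theta:
                    continue  # lower bound exceeds theta, so the pair cannot match
                comparisons += 1
                if d(s, t) <= theta:
                    links.append((s, t))
    return links, comparisons
```

For example, `link([(0.0, 0.0), (5.0, 5.0)], [(0.1, 0.0), (5.0, 5.2), (9.0, 9.0)], theta=0.5, k=2)` finds both matching pairs while computing only 2 of the 6 source-target distances exactly. Because every target belongs to exactly one exemplar's cluster and a pair is skipped only when its lower bound exceeds `theta`, the result is the same as a brute-force comparison; only the number of exact comparisons shrinks.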

Conclusion

We presented the LIMES framework, which implements a very time-efficient approach for the discovery of links between knowledge bases on the Linked Data Web. We evaluated our approach with both synthetic and real data and showed that it outperforms state-of-the-art approaches with respect to the number of comparisons and runtime. In particular, we showed that the speedup of our approach grows with the a priori time complexity of the mapping task, making our framework especially suitable for handling large-scale matching tasks (cf. results of the SimCities experiment).

Future work

We aim to explore the combination of LIMES with active learning strategies in such a way that manual configuration of the tool becomes unnecessary. Instead, matching results will be computed quickly by using the exemplars in both the source and target knowledge bases. Subsequently, they will be presented to the user, who will give feedback to the system by rating the quality of the matches found. This feedback will in turn be used to improve the matching configuration and to generate a revised list of matching suggestions for the user. This iterative process will continue until the matches reach a sufficiently high quality in terms of precision and recall.

Approach

Positive Aspects: No data available now.

Negative Aspects: No data available now.

Limitations: No data available now.

Challenges: No data available now.

Proposes Algorithm: No data available now.

Methodology: No data available now.

Requirements: No data available now.

Implementations

Download-page: No data available now.

Access API: No data available now.

Information Representation: No data available now.

Data Catalogue: No data available now.

Runs on OS: No data available now.

Vendor: No data available now.

Uses Framework: No data available now.

Has Documentation URL: No data available now.

Programming Language: No data available now.

Version: No data available now.

Platform: No data available now.

Toolbox: No data available now.

GUI: No

Research Problem

Subproblem of: No data available now.

Related Problem: No data available now.

Motivation: No data available now.

Evaluation

Experiment Setup: No data available now.

Evaluation Method: No data available now.

Hypothesis: No data available now.

Description: No data available now.

Dimensions: No data available now.

Benchmark used: No data available now.

Results: No data available now.

