Search by property

This page provides a simple browsing interface for finding entities described by a property and a named value. Other available search interfaces include the page property search and the ask query builder.

A list of all pages that have property "Has future work" with the following value:

The Avalanche system has shown how a completely heterogeneous distributed query engine that makes no assumptions about data distribution could be implemented. The current approach does, however, have a number of limitations. In particular, we need to better understand the objective functions employed by the planner, investigate whether the requirements placed on participating triple-stores are reasonable, explore whether Avalanche can be changed to a stateless model, and empirically evaluate whether the approach truly scales to a large number of hosts. Here we discuss each of these issues in turn.

The core optimization of the Avalanche system lies in its cost and utility functions. The basic utility function only considers possible joins, with no information regarding the probability of the respective join. The proposed utility extension UE estimates the join probability of two highly selective molecules. Although this improves the accuracy of the objective function, its limitation to highly selective molecules is often impractical, as many queries (such as our example query) combine highly selective molecules with non-selective ones. Hence, we need to find a probabilistic distributed join cardinality estimation for low-selectivity molecules. One approach might be the use of Bloom-filter caches to store precomputed, “popular” estimates. Another might be to investigate sampling techniques for distributed join estimation.

In order to support Avalanche, existing triple-stores should be able to:
– report statistics: cardinalities, Bloom filters, and other future extensions;
– support the execution of distributed joins (common in distributed databases), which could be delegated to an intermediary but would be inefficient;
– share the same key space (this can be URIs, but would result in bandwidth-intensive joins and merges).

Whilst these requirements seem simple, we need to investigate how complex these extensions of triple-stores are in practice. Even better would be an extension of the SPARQL standard with the above-mentioned operations, which we will attempt to propose.

The current Avalanche process assumes that hosts keep partial results throughout plan execution to reduce the cost of local database operations, and that result views are kept for the duration of a query. This limits the number of queries a host can handle. We intend to investigate whether a stateless approach is feasible. Note that the simple approach (the use of RESTful services) may not be applicable, as the size of the state (i.e., the partial results) may be huge and overburden the available bandwidth.

We designed Avalanche with the need for high scalability in mind. The core idea follows the principle of decentralization. It also supports asynchrony, using asynchronous HTTP requests to avoid blocking; autonomy, by delegating the coordination and execution of the distributed join/update/merge operations to the hosts; concurrency, through the pipeline shown in Figure 1; symmetry, by allowing each endpoint to act as the initiating Avalanche node for a query caller; and fault tolerance, through a number of time-outs and stopping conditions. Nonetheless, an empirical evaluation of Avalanche with a large number of hosts is still missing. This is a non-trivial shortcoming (due to the lack of suitable, partitioned datasets and the significant experimental complexity) that we intend to address in the near future.
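The future-work text names Bloom filters and sampling as possible routes to join cardinality estimation for low-selectivity molecules, but specifies neither. The following is a minimal sketch of two generic, textbook estimators, not Avalanche's method, under our own assumptions: each host can list the values its bindings take for the shared join variable as URI strings, and the remote host can either ship a Bloom filter over its values or answer "how many bindings carry value v" queries. Probing local values against a remote Bloom filter gives an upper bound on the semi-join size (Bloom filters have no false negatives); sampling the local side and scaling the matches gives an unbiased estimate of the full join size. The names `BloomFilter`, `bloom_semijoin_upper_bound`, `sampled_join_estimate` and all parameters are hypothetical, and the sketch does not implement the proposed cache of precomputed estimates.

```python
import hashlib
import random
from collections import Counter
from typing import Iterable, Sequence


class BloomFilter:
    """Minimal Bloom filter over strings (hypothetical helper, not an Avalanche API)."""

    def __init__(self, num_bits: int = 1024, num_hashes: int = 3) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)  # one byte per bit, for readability

    def _positions(self, item: str) -> list:
        # Derive k bit positions from salted SHA-256 digests of the item.
        positions = []
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode("utf-8")).digest()
            positions.append(int.from_bytes(digest[:8], "big") % self.num_bits)
        return positions

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos] = 1

    def might_contain(self, item: str) -> bool:
        # Never False for an added item; may be True for an absent one.
        return all(self.bits[pos] for pos in self._positions(item))


def bloom_semijoin_upper_bound(local_keys: Iterable[str], remote_filter: BloomFilter) -> int:
    """Count local bindings whose join value may occur on the remote host.

    The result is never smaller than the true semi-join size; false
    positives can inflate it. Remote multiplicities are not captured.
    """
    return sum(1 for key in local_keys if remote_filter.might_contain(key))


def sampled_join_estimate(
    left_keys: Sequence[str],
    right_counts: Counter,
    sample_size: int,
    seed: int = 0,
) -> float:
    """Estimate the join size from a uniform sample of the left side's values.

    right_counts maps a join value to the number of right-side bindings
    carrying it (assumed to be answerable by the remote host).
    """
    if sample_size < 1:
        raise ValueError("sample_size must be at least 1")
    if not left_keys:
        return 0.0
    sample_size = min(sample_size, len(left_keys))
    rng = random.Random(seed)
    sample = rng.sample(list(left_keys), sample_size)  # without replacement
    matches = sum(right_counts[key] for key in sample)
    # Scale matches in the sample up to the full left side.
    return len(left_keys) / sample_size * matches


if __name__ == "__main__":
    left = ["ex:a", "ex:b", "ex:b", "ex:c"]
    right = ["ex:b", "ex:c", "ex:c", "ex:d"]

    remote_filter = BloomFilter()
    for value in right:
        remote_filter.add(value)

    # At least 3 (ex:b, ex:b, ex:c); ex:a could add 1 as a false positive.
    print(bloom_semijoin_upper_bound(left, remote_filter))
    # 4.0: sampling the whole left side makes the estimate exact (2 + 2).
    print(sampled_join_estimate(left, Counter(right), sample_size=len(left)))
```

The Bloom filter only needs to be shipped once per host and variable, which fits the "report statistics" requirement listed above, whereas the sampling estimator needs one count lookup per sampled value; with a sample smaller than the left side, its estimate varies with `seed`.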
Since there are only a few results, nearby values are also displayed.

Showing below up to 2 results starting with #1.


List of results

    • Avalanche: Putting the Spirit of the Web back into Semantic Web Querying