Events and Group Seminars
Here is collected the list of seminars of the Data and Information Management group, part of the IDI department. The list starts from July 2009.
09 Mar 2011 | Dr. George Tsatsaronis A Maximum-Entropy Approach for Accurate Document Annotation in the Biomedical Domain +
The increasing amount of scientific literature on the Internet and the
absence of efficient tools for classifying and searching these
documents are two of the most important factors that limit both the
speed of search and the quality of the results. Previous studies have
shown that the use of ontologies makes it possible to process document
and query information at the semantic level, which greatly improves
the search for relevant information and takes us one step closer to
the Semantic Web. A fundamental step in these approaches is the
annotation of documents with ontology concepts, which can also be seen
as a classification task. In this work we address this issue for the
biomedical domain and present a new automated and robust method, based
on a Maximum Entropy approach, for annotating biomedical literature
documents with MeSH concepts, achieving very high F-measure.
|
01 Mar 2011 | Dr. Kim Jin-Dong The Activities of DBCLS +
The Database Center for Life Science (DBCLS) is a government-funded center in Japan whose mission is the integration of life science databases. In this talk, I will introduce the activities of DBCLS, which range from DB hosting, integration, the Semantic Web, and NLP to licensing issues, while seeking possible collaboration with NTNU.
Kim Jin-Dong received a Ph.D. in computer science, specializing in NLP, from Korea University in 2000. He was a Project Researcher and Lecturer at the University of Tokyo from 2001 to 2010 and has been a Project Associate Professor at DBCLS since 2010. He is a co-author of the GENIA corpus and a co-organizer of the BioNLP shared task.
|
11 Feb 2011 | Muhammad Ali Norozi Relevancy in Schema-Agnostic Environment +
Relevance is an important component of free-text search and often
distinguishes one implementation from another. Relevance is used to
score matching documents and rank them according to the user's intent.
One of the reasons for Google's popularity is its good relevance
ranking, originally based on the PageRank algorithm. The emergence of
semi-structured data as a standard for data representation opened up
new areas of interest to both the database and information retrieval
communities. Although the information retrieval and database
viewpoints were, until quite recently, irreconcilable, semi-structured
retrieval has helped to bridge the gap. This work explores relevance
in semi-structured retrieval, both in isolation and as a bridge
between the database and information retrieval communities.
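The PageRank idea mentioned above can be sketched as a simple power iteration over a toy link graph. The graph, damping factor, and iteration count below are illustrative assumptions, not taken from the talk:

```python
def pagerank(links, damping=0.85, iterations=50):
    """Power-iteration PageRank over a dict mapping page -> list of outlinks."""
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Teleportation mass, shared uniformly.
        new = {p: (1.0 - damping) / n for p in pages}
        for p, outs in links.items():
            if outs:
                share = damping * rank[p] / len(outs)
                for q in outs:
                    new[q] += share
            else:  # dangling page: spread its rank uniformly
                for q in pages:
                    new[q] += damping * rank[p] / n
        rank = new
    return rank

# Page "c" is linked by both "b" and "d", so it accumulates the highest score.
graph = {"a": ["b"], "b": ["c"], "c": ["a"], "d": ["c"]}
scores = pagerank(graph)
```

The scores form a probability distribution over pages, so they sum to one; a page with more (and better-ranked) in-links ends up with a larger share.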
|
08 Dec 2010 | Tanja Mercun Presentation of data and navigation in bibliographic information systems +
Inefficient, difficult-to-use, and outdated user interfaces of library
catalogues and other bibliographic information systems have been
criticized continuously over the years, and with a growing selection
of information sources and providers on the web, an increasing number
of users have started to bypass library systems when searching for
information. In exploring new ways to extract more value from library
data and improve library catalogues, we have chosen the implementation
of the FRBR model as our central approach. The FRBR model has great
potential not only for more effective cataloguing, but especially for
end-user oriented organization and display of records and search
results, navigation, and presentation of relationships.
In the presentation we will discuss our current work on possible uses
of FRBR in user interfaces of library catalogues that could improve
the findability of resources as well as support exploration and
discovery.
|
03 Dec 2010 | Naimdjon Takhirov An XML-based representational document format for FRBR systems +
Metadata related to cultural items such as movies, books and music is
a valuable resource that currently is exploited in many applications
and services based on mashup and linked data. Unfortunately, existing
metadata formats do not have the semantics needed for versatile
integration and reuse of such information across domains and
applications. The conceptual model in the Functional Requirements for
Bibliographic Records is a major contribution towards a solution, but
the existing large body of legacy data makes a transition to this
model difficult. In this paper we present a format for exchange of
MARC-based information that makes the entities and relationships of
the FRBR model explicit. The main purpose of this format is to enable
the exchange of FRBR enriched MARC records while still maintaining
compatibility with MARC-based systems.
|
26 Nov 2010 | Massimiliano Ruocco Event Clusters Detection on Flickr Images using a Suffix-Tree Structure +
Image clustering is a problem that has been treated extensively in both
Content-Based (CBIR) and Text-Based (TBIR) Image Retrieval Systems. In
this paper, we propose a new image clustering approach that takes
annotation, time, and geographical position into account. Our goal is to
develop a clustering method that allows an image to be part of an event
cluster. We extend a well-known clustering algorithm called Suffix Tree
Clustering (STC), which was originally developed to cluster text
documents using a document snippet. To be able to use this algorithm, we
consider an image with annotation as a document. Then, we extend it to
also include time and geographical position. This appears to be
particularly useful on the images gathered from online photo-sharing
applications such as Flickr. Here image tags are often subjective and
incomplete. For this reason, clustering based on textual annotations
alone is not enough to capture all context information related to an
image. Our approach has been suggested to address this challenge. In
addition, we propose a novel algorithm to extract event clusters. The
algorithm is evaluated using an annotated dataset from Flickr, and a
comparison between different granularity of time and space is provided.
|
25 Nov 2010 | Krisztian Balog Entity Search +
We have come to depend on technological resources to create order and
find meaning in the ever-growing amount of online data. A large fraction
of (web) search queries concern named entities: persons, organizations,
locations, etc. These information needs are better answered by returning
specific objects instead of just any type of documents that merely
mention them.
In this talk I will briefly review my work on entity-oriented retrieval.
Starting with the task of finding people in organizational environments,
I will gradually expand the scope of the search both in terms of type
(from people to other types of entities) and scale (from intranet to
internet). To address these tasks I propose a probabilistic retrieval
framework based on statistical language modeling techniques. On top of
the basic layer of these solid text-based models, I will discuss how
top-down semantic information can be incorporated. Using standard data
sets from international evaluation campaigns, I will demonstrate that
the proposed approaches achieve state-of-the-art performance in terms of
effectiveness, while maintaining high efficiency.
|
19 Nov 2010 | Marek Ciglan Fast Detection of Size-Constrained Communities in Large Networks +
Community detection in networks is a prominent task in graph data
mining, owing to the rapid emergence of graph data, e.g., information
networks and social networks. In this paper, we propose a new
algorithm for detecting communities in networks. Our approach differs
from others in its ability to constrain the size of the communities
being generated, a property important for a class of applications. In
addition, the algorithm is greedy in nature and belongs to a small
family of community detection algorithms with pseudo-linear time
complexity, making it applicable also to large networks. The algorithm
is able to detect small clusters independently of the network size. It
can be viewed as a complementary approach to methods optimizing
modularity, which tend to produce larger communities as the network
size grows. Extensive evaluation of the algorithm on synthetic
benchmark graphs for community detection showed that the proposed
approach is very competitive with state-of-the-art methods,
outperforming other approaches in some settings.
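As a rough illustration of the size-constraint idea (not the paper's actual algorithm), a greedy label-propagation sketch can simply refuse moves that would push a community past a size cap:

```python
from collections import Counter

def constrained_label_propagation(adj, max_size, rounds=10):
    """Greedy label propagation with a hard cap on community size.

    adj: dict node -> set of neighbors. Each node starts in its own
    community and adopts the most common neighbor label whose community
    still has room. A toy sketch under illustrative assumptions.
    """
    label = {v: v for v in adj}
    size = Counter(label.values())
    for _ in range(rounds):
        changed = False
        for v in adj:
            counts = Counter(label[u] for u in adj[v])
            for cand, _ in counts.most_common():
                if cand == label[v]:
                    break  # already in the (locally) best community
                if size[cand] < max_size:
                    size[label[v]] -= 1
                    size[cand] += 1
                    label[v] = cand
                    changed = True
                    break
        if not changed:
            break
    return label

# Two triangles joined by one edge; with max_size=3 each triangle
# settles into its own community instead of merging.
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3},
       3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4}}
labels = constrained_label_propagation(adj, max_size=3)
```

The cap is what keeps the result independent of network size: without it, plain label propagation would happily grow one giant community across the bridge edge.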
|
12 Nov 2010 | Simon Jonassen A Combined Semi-Pipelined Query Processing Architecture for Distributed Full-Text Retrieval +
Term-partitioning is an efficient way to distribute a large inverted
index. Two fundamentally different query processing approaches are
pipelined and non-pipelined. While the pipelined approach provides
higher query throughput, the non-pipelined approach provides shorter
query latency. In this work we propose a third alternative, combining
non-pipelined inverted index access, heuristic decision between
pipelined and non-pipelined query execution and an improved query
routing strategy. Our results show that the method combines the
advantages of both approaches, providing high throughput and short
query latency. Our method increases throughput by up to 26% compared
to the non-pipelined approach and reduces latency by up to 32%
compared to the pipelined approach.
|
15 Oct 2010 | Joao da Rocha Junior On the Selectivity of Multidimensional Routing Indices +
Recently, the problem of efficiently supporting advanced query
operators, such as nearest neighbor or range queries, over
multidimensional data in widely distributed environments has attracted
much attention. In unstructured peer-to-peer (P2P) networks, peers store
data in an autonomous manner, thus multidimensional routing indices
(MRI) are required, in order to route user queries efficiently to only
those peers that may contribute to the query result set. Focusing on a
hybrid unstructured P2P network, in this paper, we analyze the
parameters for building MRI of high selectivity. In the case where
similar data are located at different parts of the network, MRI exhibit
extremely poor performance, which renders them ineffective. We present
algorithms that boost the query routing performance by detecting similar
peers and reassigning these peers to other parts of the hybrid network
in a distributed and scalable way. The resulting MRI are able to eagerly
discard routing paths during query processing. We demonstrate the
advantages of our approach experimentally and show that our framework
enhances a state-of-the-art approach for similarity search in terms of
reduced network traffic and number of contacted peers.
|
03 Oct 2010 | Nattiya Kanhabua Determining Time of Queries for Re-ranking Search Results +
Recent work on analyzing query logs shows that a significant
fraction of queries are temporal, i.e., relevancy is dependent on time,
and temporal queries play an important role in many domains, e.g.,
digital libraries and document archives. Temporal queries can be divided
into two types: 1) those with temporal criteria explicitly provided by
users, and 2) those with no temporal criteria provided. In this paper,
we deal with the latter type of queries, i.e., queries that comprise
only keywords and whose relevant documents are associated with
particular time periods not given by the queries. We propose a number of methods to
determine the time of queries using temporal language models. After
that, we show how to increase the retrieval effectiveness by using the
determined time of queries to re-rank the search results. Through
extensive experiments we show that our proposed approaches improve
retrieval effectiveness.
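The core idea of determining the time of a keyword query with temporal language models can be sketched as follows. The corpus, Dirichlet smoothing parameter, and scoring details below are illustrative assumptions, not the paper's exact models:

```python
import math
from collections import Counter

def time_of_query(query, dated_docs, mu=100.0):
    """Guess the most likely time period for a keyword query.

    dated_docs: dict period -> list of documents (each a list of words).
    Each period gets a Dirichlet-smoothed unigram language model; the
    query is assigned to the period with the highest log-likelihood.
    """
    # Background model over all periods, used for smoothing.
    background = Counter()
    for docs in dated_docs.values():
        for doc in docs:
            background.update(doc)
    total_bg = sum(background.values())

    best_period, best_score = None, float("-inf")
    for period, docs in dated_docs.items():
        model = Counter()
        for doc in docs:
            model.update(doc)
        total = sum(model.values())
        score = 0.0
        for w in query:
            p_bg = background[w] / total_bg if total_bg else 0.0
            score += math.log((model[w] + mu * p_bg + 1e-12) / (total + mu))
        if score > best_score:
            best_period, best_score = period, score
    return best_period

corpus = {
    "2004": [["olympics", "athens", "games"], ["athens", "ceremony"]],
    "2008": [["olympics", "beijing", "games"], ["beijing", "stadium"]],
}
guess = time_of_query(["beijing", "olympics"], corpus)
```

Once a period is determined this way, documents from that period can be boosted when re-ranking the result list, which is the second half of the paper's pipeline.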
|
27 Aug 2010 | Nils Grimsmo Fast Optimal Twig Joins +
In XML search systems twig queries specify predicates on node values and
on the structural relationships between nodes, and a key operation is to
join individual query node matches into full twig matches. Linear time
twig join algorithms exist, but many non-optimal algorithms with better
average-case performance have been introduced recently. These use
somewhat simpler data
structures that are faster in practice, but have exponential worst-case
time complexity. In this paper we explore and extend the solution space
spanned by previous approaches. We introduce new data structures and
improved strategies for filtering out useless data nodes, yielding
combinations that are both worst-case optimal and faster in practice. An
experimental study shows that
our best algorithm outperforms previous approaches by an average factor
of three on common benchmarks. On queries with at least one unselective
leaf node, our algorithm can be an order of magnitude faster, and it is
never more than 20% slower on any tested benchmark query.
|
13 Aug 2010 | Georg Russ Spatial Data Mining in Precision Agriculture +
The talk will first briefly introduce the area of precision agriculture and high-resolution geodata, before covering two important tasks that arise in this area nowadays. One of those tasks is yield prediction, for which some of the issues with spatial data must be taken into account; for this purpose, a simple spatial cross-validation technique has been developed. The second task falls in the area of management zone delineation, i.e., the subdivision of an agricultural site into zones that should be managed differently with respect to fertilizer or pesticides, for example. A spatial clustering-based approach to this non-trivial task will be presented.
|
15 Jun 2010 | Nattiya Kanhabua Exploiting Time-based Synonyms in Searching Document Archives +
Query expansion of named entities can be employed to increase retrieval effectiveness. A peculiarity of named entities, compared to other vocabulary terms, is that they are very dynamic in appearance, and synonym relationships between terms change with time. In this paper, we present an approach to extracting synonyms of named entities over time from the whole history of Wikipedia. In addition, we use their temporal patterns as a feature in ranking and classifying them into two types: time-independent and time-dependent. Time-independent synonyms are invariant to time, while time-dependent synonyms are relevant to a particular time period, i.e., the synonym relationships change over time. Further, we describe how to make use of both types of synonyms to increase retrieval effectiveness: query expansion with time-independent synonyms for ordinary search, and query expansion with time-dependent synonyms for search with respect to temporal criteria. Finally, through an evaluation based on TREC collections, we demonstrate how the retrieval performance of queries consisting of named entities can be improved using our approach.
|
21 May 2010 | Christos Doulkeridis Reverse Top-k Queries: Current State and Research Challenges +
Top-k queries are widely applied for retrieving a ranked set of the k most interesting objects based on the individual user preferences. As an example, in online marketplaces, customers (users) typically seek a ranked set of products (objects) that satisfy their needs. Reversing top-k queries leads to a query type that instead returns the set of customers that find a product appealing (it belongs to the top-k result set of their preferences). In this talk, we provide an introduction to reverse top-k queries and a brief overview of query processing algorithms and techniques.
In addition, we propose efficient algorithms for processing meaningful variations of reverse top-k queries, such as identifying the most influential products to customers, where influence is defined as the cardinality of the reverse top-k result set. Finally, a roadmap of open problems and research challenges that rely on reverse top-k queries will be presented.
|
23 Apr 2010 | Xiangliang Zhang Hi-AP and StrAP: Algorithms and Applications ---Clustering Large-scale and Streaming Data +
The clustering of large-scale streaming data is a key issue in many application domains. In this talk, we present two algorithms: Hi-AP for clustering large-scale data and StrAP for clustering streaming data. Our Hi-AP algorithm has the merits of 1) only quasi-linear complexity; 2) better clustering performance; and 3) not requiring the number of clusters to be specified. Our StrAP algorithm summarizes data streams with an incrementally updated model. It is designed for the data streaming setting and has the merits of (1) seamlessly updating the clustering model; (2) adapting to changes in the data distribution; and (3) an intelligible compressed data model. Based on Hi-AP and StrAP, we developed a multi-scale online grid monitoring system in the fashion of autonomic computing. We will show the performance of the monitoring system running on a 5-million-job trace from the European EGEE grid and how the system helps to discover device problems (e.g., clogging of LogMonitor).
|
25 Feb 2010 | Akrivi Vlachou Reverse Top-k Queries +
Rank-aware query processing has become essential for many applications that return to the user only the top-k objects based on the individual user’s preferences. Top-k queries have been mainly studied from the perspective of the user, focusing primarily on efficient query processing. In this work, for the first time, we study top-k queries from the perspective of the product manufacturer. Given a potential product, which are the user preferences for which this product is in the top-k
query result set? We identify a novel query type, namely reverse top-k query, that is essential for manufacturers to assess the potential market and impact of their products based on the competition. We formally define reverse top-k queries and introduce two versions of the query, namely monochromatic and bichromatic and present efficient algorithms. Our experimental evaluation
demonstrates the efficiency of our techniques, which reduce the required number of top-k computations by 1 to 3 orders of magnitude.
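A minimal brute-force sketch of the bichromatic reverse top-k definition follows. The products, user preference vectors, and linear scoring function are illustrative assumptions; the talk's contribution is precisely avoiding this naive per-user top-k computation:

```python
def top_k(weights, products, k):
    """Indices of the k products with the highest linear score for one user."""
    scored = sorted(range(len(products)),
                    key=lambda i: sum(w * a for w, a in zip(weights, products[i])),
                    reverse=True)
    return set(scored[:k])

def reverse_top_k(query, products, users, k):
    """Which users rank the query product among their top-k?"""
    candidates = products + [query]
    q_idx = len(products)  # index of the query product
    return [u for u, w in enumerate(users) if q_idx in top_k(w, candidates, k)]

# Product attributes (higher is better) and per-user preference weights.
products = [(5.0, 1.0), (1.0, 5.0), (3.0, 3.0)]
users = [(1.0, 0.0), (0.0, 1.0), (0.5, 0.5)]
result = reverse_top_k((4.0, 4.0), products, users, k=1)
```

Here only the balanced user (index 2) scores the query product (4.0, 4.0) above every existing product, so the reverse top-1 set contains just that user; the two extreme users already prefer one of the specialized products.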
|
19 Feb 2010 | Muhammad Ali Norozi Ranking the Web using Linear Algebra +
The talk is about link analysis ranking algorithms and their state-of-the-art mathematical interpretations. I will present a notable improvement in the convergence behavior of query-dependent algorithms such as HITS, SALSA, and their descendants (e.g., Exponentiated and Randomized HITS) using extrapolation techniques, which accelerate the algorithms by reducing the number of iterations needed and thus yield much faster convergence. In the experiments I obtained even better results than theoretically predicted: a speedup of 3 to 19 times.
|
05 Feb 2010 | Mihaela A. Bornea Serializability with Snapshot Isolation under the Hood +
This presentation proposes a new multi-version concurrency control algorithm, called serializable generalized snapshot isolation (SGSI), targeting middleware-replicated database systems. Under this algorithm, each replica runs snapshot isolation locally and the replication middleware guarantees global serializability by performing enhanced certification for update transactions. We prove the correctness of the proposed algorithm and employ novel techniques both to extract transaction readsets and to perform enhanced certification that prevents read-write and write-write conflicts, without changing the underlying database replicas. We build a prototype replicated database system, which uses snapshot-isolated database engines while maintaining serializable execution. We assess the algorithm experimentally using the TPC-W benchmark, show that it is practical, and demonstrate that it has low overhead for small degrees of replication.
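The certification step can be sketched roughly as follows, in a simplified model with explicit read/write sets and commit timestamps; the actual middleware extracts readsets automatically and is considerably more involved:

```python
def certify(txn, committed, snapshot):
    """Certification test in the spirit of SGSI.

    Abort a transaction if any concurrently committed transaction
    (commit timestamp after the transaction's snapshot) wrote an item
    this transaction read or wrote.

    txn: dict with 'readset' and 'writeset' (sets of item keys).
    committed: list of (commit_ts, writeset) pairs.
    snapshot: start timestamp of txn's snapshot.
    """
    for commit_ts, writeset in committed:
        if commit_ts <= snapshot:
            continue  # already visible in the snapshot: no conflict
        if writeset & txn["writeset"]:
            return False  # write-write conflict (plain SI also aborts this)
        if writeset & txn["readset"]:
            return False  # read-write conflict: would break serializability
    return True

committed = [(5, {"x"}), (12, {"y"})]
# Reads "y", which was overwritten after its snapshot -> must abort.
t_read_y = {"readset": {"y"}, "writeset": {"z"}}
# Reads "x", whose write is already in the snapshot -> may commit.
t_read_x = {"readset": {"x"}, "writeset": {"w"}}
ok1 = certify(t_read_y, committed, snapshot=10)
ok2 = certify(t_read_x, committed, snapshot=10)
```

The read-write check is what distinguishes this from the first-committer-wins rule of plain snapshot isolation, which only examines writeset overlaps.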
|
22 Jan 2010 | Orestis Gorgas Software structure and code reproduction through sequence diagram analysis +
One of the major problems with modern software systems is that, due to their size, their complexity, and the constant upgrades they are subject to, it has become very hard to afford the time, money, and effort to analyze and maintain them. Additionally, the tasks that a system performs are frequently quite different from those the system is intended to complete. The model-driven approach deals with software systems through an abstract view using a variety of models, in order to make the design, implementation, and maintenance of software systems easier.
This presentation is a walk-through of the design and development of a transformation that can be applied to the sequence diagrams of a software system to produce an abstract framework of the code. During the phase of design and development of a software system, this framework can guide the insertion of extra code that can be transformed into a runnable system. During the phase of maintenance, the generated framework can be compared with the actual code to trace inconsistencies between the code and the sequence diagrams. Thus it will be easier, after an update of the sequence diagrams, to spot the areas where the code has to be updated, and vice versa. The transformation of the sequence diagrams and the comparison of the code framework with the actual code are demonstrated through an example system to which the techniques proposed in this work are applied.
|
08 Jan 2010 | Nils Grimsmo Towards Unifying Advances in Twig Join Algorithms +
Twig joins are key building blocks in current XML indexing systems, and numerous algorithms and useful data structures have been introduced. We give a structured, qualitative analysis of recent advances, which leads to the identification of a number of opportunities for further improvements. Cases where combining competing or orthogonal techniques would be advantageous are highlighted, such as algorithms avoiding redundant computations and schemes for cheaper intermediate result management. We propose some direct improvements over existing solutions, such as reduced memory usage and stronger filters for bottom-up algorithms. In addition we identify cases where previous work has been overlooked or not used to its full potential, such as for virtual streams, or the benefits of previous techniques have been underestimated, such as for skipping joins. Using the identified opportunities as a guide for future work, we are hopefully one step closer to unification of many advances in twig join algorithms.
|
26 Nov 2009 | Katja Hose Maintenance Strategies for Routing Indexes +
Processing queries efficiently in large-scale unstructured P2P networks is a crucial part of operating such systems. The straightforward solution of querying all the peers in the network (flooding) leads to complete query answers but does not scale well with the number of peers. Thus, in order to avoid the expensive flooding of the network for query processing, routing indexes are used. Each peer maintains such an index for its neighbors. It provides a compact representation (data summary) of data accessible via each neighboring peer. Based on this information and a given query, a peer can decide whether it is worthwhile to forward the query to a particular neighbor or not. As P2P networks are dynamic systems and peers might change their local data over time, an important problem in this context is to keep these data summaries up-to-date without paying high maintenance costs.
This talk discusses the problem of updating routing indexes in P2P-based environments in the absence of global knowledge and central instances. Using the QTree, a combination of R-trees and histograms, as an example base structure for routing indexes, this talk presents a classification of maintenance strategies and discusses several approaches to keep maintenance costs at a reasonable level.
|
23 Oct 2009 | Truls A. Bjorklund A Confluence of Column Stores and Search Engines: Opportunities and Challenges +
IR and DB integration has been a long-standing research challenge. Most of the work trying to integrate the two fields is motivated by specific application scenarios. In this paper we approach the problem from another perspective: instead of focusing on IR and DB as whole fields, we restrict the focus to search engines and column stores. We present observations of similarities between the two technologies, and aggregate information on parallel developments in the two fields. We argue that these developments point towards a confluence of column stores and search engines; one may in fact argue that this confluence has already started. We evaluate the potential for developing an engine capable of handling the workloads traditionally supported by the different systems, namely decision support and search workloads, by identifying potential opportunities and challenges. The opportunities include potential areas for technology transfer and more efficient support for features. The identified challenges outline areas for future work whose success will help decide whether a confluence of column stores and search engines is feasible.
|
07 Oct 2009 | Iraklis Varlamis Monitoring the evolution of interests in the blogosphere +
This presentation describes blogTrust, an innovative modular and extensible prototype application for monitoring changes in the interests of blogosphere participants. A new approach for the analysis of weblog contents is introduced, which can yield new insights on the analysis of the blogosphere by monitoring the convergence or dispersion of blogosphere interests.
BlogTrust uses established, robust data mining techniques to support every step of the process. The motivation for the work is a hypothesized strong connection between important (global or "local") events and the rapid reduction in the divergence of (global or "local") weblog topic coverage.
Experimental results on real data provide support for our hypothesis, indicate the most critical points in the proposed process, and point to interesting directions for further research.
|
14 Aug 2009 | Marek Ciglan Mining Interesting Relations from the Wikipedia Link Graph +
Recently, Wikipedia has gained a lot of popularity amongst researchers as a source of data, mainly in the areas of natural language processing and information retrieval and extraction. In this preliminary-work presentation, we describe our ideas for exploiting Wikipedia in a new manner: mining non-trivial semantic relations between sets of topics.
We will present some experiments with the use of a spreading activation algorithm on the link structure of Wikipedia to achieve this goal. In this talk, we discuss the challenges of our approach, describe the proposed solutions, and give a short demonstration of our research prototype.
|
17 Jul 2009 | George Tsatsaronis Text Relatedness based on a Word Thesaurus +
Measuring the relatedness between two text segments in an automated manner is a tedious task. Text conveys semantics that are hard for a computer program to capture. Without doubt, a measure of relatedness between text segments must take into account both the lexical and the semantic relatedness between words. Such a measure that captures well both aspects of text relatedness may help in many tasks, such as text retrieval, classification and clustering. We present a new approach for measuring the semantic relatedness between words based on their implicit semantic links. The approach does not require any type of training, since it exploits a word thesaurus in order to devise implicit semantic links between words. Based on this approach, we introduce a new measure of semantic relatedness between texts, which capitalizes on the semantic relatedness between individual words, and extends it to measure the relatedness between sets of words. We gradually validate our method: we first evaluate the performance of the semantic relatedness measure between individual words in three tasks and then proceed with evaluating the performance of our method in measuring text-to-text semantic relatedness in sentence-to-sentence similarity, paraphrase recognition and text classification. Experimental evaluation shows that the proposed method outperforms every lexicon-based method of word semantic relatedness in the selected tasks and the tested data sets, and competes well against corpus-based approaches that require training. Finally, we show that the proposed measure can be successfully applied to more complex linguistic tasks (e.g. paraphrasing) and that it is able to capture the human notion of relatedness better than traditional lexical matching techniques.
|
26 Jun 2009 | Joao da Rocha-Junior AGiDS: A Grid-based Strategy for Distributed Skyline Query Processing +
Skyline queries help users make intelligent decisions over complex
data, where different and often conflicting criteria are considered. A
challenging problem is to support skyline queries in distributed
environments, where data is scattered over independent sources. The
query response time of skyline processing over distributed data
depends on the amount of transferred data and the query processing
cost at each server. In this paper, we propose AGiDS, a framework for
efficient skyline processing over distributed data. Our approach
reduces significantly the amount of transferred data, by using a
grid-based data summary that captures the data distribution on each
server. AGiDS consists of two phases to compute the result: in the
first phase the querying server gathers the grid-based summary,
whereas in the second phase a skyline request is sent only to the
servers that may contribute to the skyline result set asking only for
the points of non-dominated regions. We provide an experimental
evaluation showing that our approach performs efficiently and
outperforms existing techniques.
The same paper will be presented between August 31 and September 4 at
the Second International Conference on Data Management in Grid and P2P
Systems (Globe 2009).
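For readers unfamiliar with skyline queries, the dominance test at their core can be sketched in a few lines. This is a naive centralized version under the illustrative assumption that smaller values are better; the point of AGiDS is to avoid exactly this all-pairs comparison across distributed servers:

```python
def dominates(p, q):
    """p dominates q if p is at least as good in every dimension and
    strictly better in at least one (here: smaller is better)."""
    return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))

def skyline(points):
    """Naive skyline: keep the points not dominated by any other point."""
    return [p for p in points
            if not any(dominates(q, p) for q in points if q != p)]

# E.g., hotels as (price, distance-to-beach): cheaper and closer is better.
hotels = [(50, 8), (80, 2), (60, 5), (90, 9), (55, 6)]
result = skyline(hotels)
```

Only (90, 9) is dominated here (by (50, 8), which is both cheaper and closer); the remaining four hotels all represent different, incomparable trade-offs between the two criteria.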
|
12 Jun 2009 | Akrivi Vlachou Angle-based Space Partitioning for Efficient Parallel Skyline Computation +
Recently, skyline queries have attracted much attention in the database
research community. Space partitioning techniques, such as recursive division
of the data space, have been used for skyline query processing in centralized,
parallel and distributed settings. Unfortunately, such grid-based partitioning
is not suitable in the case of a parallel skyline query, where all partitions
are examined at the same time, since many data partitions do not contribute to
the overall skyline set, resulting in a lot of redundant processing. In this
talk, we present a novel angle-based space partitioning scheme using the
hyperspherical coordinates of the data points. We demonstrate both formally as
well as through an exhaustive set of experiments that this new scheme is very
suitable for skyline query processing in a parallel shared-nothing
architecture. The intuition of our partitioning technique is that the skyline
points are spread evenly across all partitions. Our novel partitioning scheme
alleviates most of the problems of traditional grid partitioning techniques,
thus managing to reduce the response time and share the computational workload
more fairly.
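In two dimensions, the angle-based assignment can be sketched as follows. The points and partition count are illustrative; the talk works with hyperspherical coordinates in arbitrary dimensions:

```python
import math

def angle_partition(points, num_partitions):
    """Assign 2-d points (non-negative coordinates) to partitions by the
    angle of their polar coordinates, so that each angular slice sees
    points from the whole range of magnitudes, unlike a grid that cuts
    along the axes."""
    parts = [[] for _ in range(num_partitions)]
    for x, y in points:
        theta = math.atan2(y, x)  # in [0, pi/2] for non-negative data
        idx = min(int(theta / (math.pi / 2) * num_partitions),
                  num_partitions - 1)
        parts[idx].append((x, y))
    return parts

points = [(1, 9), (9, 1), (5, 5), (2, 8), (8, 2)]
parts = angle_partition(points, 2)
```

With two partitions the data space splits along the 45-degree line: points favoring the x-axis land in one slice, points favoring the y-axis in the other, so skyline points (which tend to hug the dominance frontier across all angles) are shared between both workers instead of piling into one grid cell.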
|
20 Feb 2009 | Jon Olav Hauglid PROQID and DYFRAM +
1) PROQID: Partial Restarts of Queries in Distributed
Databases
In a number of application areas, distributed database systems can
be used to provide persistent storage of data while offering efficient
access to both local and remote data. With an increasing
number of sites (computers) involved in a query, the probability
of failure at query time increases. Recovery has previously only
focused on database updates while query failures have been handled
by complete restart of the query. This technique is not always
applicable in the context of large queries and queries with deadlines.
In this paper we present an approach for partial restart of
queries that incurs minimal extra network traffic during query recovery.
Based on results from experiments on an implementation
of the partial restart technique in a distributed database system, we
demonstrate its applicability and significant reduction of query cost
in the presence of failures.
2) DYFRAM: Dynamic Fragmentation and Replica
Management in Distributed Database Systems
In distributed database systems, tables are frequently fragmented
and replicated over a number of sites in order to reduce network
communication costs. How to fragment, when to replicate and
how to allocate the fragments to the sites are challenging problems
that have previously been solved either by static fragmentation,
replication, and allocation, or based on a priori query analysis.
Many emerging applications of distributed database systems
generate very dynamic workloads with frequent changes in access
patterns from different sites. In those contexts, continuous refragmentation
and reallocation can significantly improve performance.
In this paper we present DYFRAM, a decentralized approach for
dynamic table fragmentation and allocation in distributed database
systems based on observation of the access patterns of sites to tables.
The approach performs fragmentation, replication, and reallocation
based on recent access history, aiming at maximizing the
number of local accesses compared to accesses from remote sites.
Through simulations, we show that the approach significantly reduces
communication costs for typical access patterns, thus demonstrating
the feasibility of our approach.
|