Report ID
2005-10
Report Authors
S. Alireza Aghili, Hua-Gang Li, Divyakant Agrawal, and Amr El Abbadi
Report Date
Abstract
XML queries specify predicates on the content and the structure of the elements of tree-structured XML documents. Hence, discovering the occurrences of twig (tree structure) query patterns is a core operation for XML query processing. Prior works have typically applied top-down decomposition of the twig patterns into (i) binary (parent-child or ancestor-descendant) relationships, or (ii) path expression queries, followed by a join operation to reconstruct matched twig patterns. However, most of these methods (1) rely on the user\'s knowledge of the underlying database to pose well-formed queries, and (2) suffer from inspecting too many irrelevant results. In this paper, we propose a novel heuristic for matching of XML twig query patterns, named TWIX, which imposed minimal restrictions on the user and causes substantial reduction of the search space through a distributed binary labeling technique. The algorithm incorporates a holistic ranking scheme of structure and content, named TRANK, to rank and report the top-k results. Furthermore, TWIX benefits from an interactive graphical user interface twig query matching. Experimental results on real datasets depict the ranking semantics and efficient filtration of the search space.
Document
2005-10.pdf217.83 KB