An efficient comparison technique for sanitized XML trees

Rahaman, Mohammad Ashiqur;Roudier, Yves
Research Report RR-09-227


When comparing different versions of large tree structured data the detection of changes and according generation of the minimum cost edit script is a  CPU and disc I/O intensive task. State of the art requires the complete XML trees to be in memory and intermediate normalized trees to be computed before any comparison may start. Furthermore, the comparison of sanitized XML trees is not addressed in these techniques. In this paper, we propose a comparison technique for sanitized XML documents  which ultimately results into a minimum cost edit script that transforms an initial version of XML tree to a target tree. This method makes use of encrypted integer labels which encode the original XML structure and content. The content of the sanitized XML is readable only by a legitimate party. Based on this encoding, any third party can compare the tree nodes on the fly without relying on any intermediate normalized trees. Besides, it allows partial comparison as opposed to computing the full trees a-priori of starting any matching operation. To support our approach a modular algorithm describing the comparison technique is provided along with its complexity analysis.

Sécurité numérique
Eurecom Ref:
© EURECOM. Personal use of this material is permitted. The definitive version of this paper was published in Research Report RR-09-227 and is available at :
See also: