Right Protection via Watermarking
with Provable Preservation of Distance-based Mining

Online companion to journal paper

Abstract: Protection of one’s intellectual property is a topic with important technological and legal facets. The significance of this issue is amplified nowadays due to the ease of data dissemination through the Internet. Here, we provide technological mechanisms for establishing the ownership of a dataset consisting of multiple objects. However, at the same time one needs to maintain the dataset’s utility for mining operations. We provide algorithms that guarantee balance between right protection and utility preservation. We consider a right protection scheme based on watermarking. Our approach is robust to many common transformations such as data rotation, translation, scaling, noise addition and resampling. However, watermarking is essentially a noise-adding operation and so it may distort the original object graph. We study how to properly tune the watermark embedding power so that distance relations between the dataset objects are not distorted. As an illustrative paradigm, we focus on preserving the Nearest-Neighbor (NN) and the Minimum Spanning Tree of the original distance graph; this leads to preservation of many mining tasks that depend on the above topological preservation, such as NN-search and classification, as well as a variety of visualization techniques. Our approach plainly applies to any mining operation that depends on the ordering of distances between objects. We prove fundamental and tight lower and upper bounds on the distance between objects post-watermarking. In particular, we establish a restricted isometry property, i.e., tight bounds on the distance contraction/expansion due to watermarking. We leverage this result to effectively prune the search space and provide fast algorithms for NN-preserving and MST-preserving watermarking. We have tested these schemes in a variety of datasets; the fast algorithms achieve from two up to five orders of magnitude speed-up over the exhaustive schemes without any sacrifice in NN or MST preservation.

Keywords: Watermarking, Distance preservation, Nearest Neighbors (NN), Minimum Spanning Tree (MST), Restricted Isometry Property (RIP)

 

Additional resources: