Bicocca Open Archive

Bernardini, G., Chen, H., Fici, G., Loukides, G., Pissis, S. (2020). Reverse-safe data structures for text indexing. In 2020 Proceedings of the Workshop on Algorithm Engineering and Experiments (ALENEX) (pp.199-213). Society for Industrial and Applied Mathematics Publications [10.1137/1.9781611976007.16].

Reverse-safe data structures for text indexing

Bernardini, Giulia; Chen, Huiping; Fici, Gabriele; Loukides, Grigorios; Pissis, Solon P.

2020

Abstract

We introduce the notion of reverse-safe data structures. These are data structures that prevent the reconstruction of the data they encode (i.e., they cannot be easily reversed). A data structure D is called z-reverse-safe when there exist at least z datasets with the same set of answers as the ones stored by D. The main challenge is to ensure that D stores as many answers to useful queries as possible, is constructed efficiently, and has size close to the size of the original dataset it encodes. Given a text of length n and an integer z, we propose an algorithm which constructs a z-reverse-safe data structure that has size O(n) and answers pattern matching queries of length at most d optimally, where d is maximal for any such z-reverse-safe data structure. The construction algorithm takes O(n^ω log d) time, where ω is the matrix multiplication exponent. We show that, despite the n^ω factor, our engineered implementation takes only a few minutes to finish for million-letter texts. We further show that plugging our method into data analysis applications gives insignificant or no data utility loss. Finally, we show how our technique can be extended to support applications under a realistic adversary model.
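The definition of z-reverse-safety in the abstract can be illustrated on toy inputs with a brute-force sketch. This is not the authors' O(n^ω log d) construction: it simply enumerates every candidate text, so it is only usable for very short strings. It also assumes that the "answers" stored by D are the occurrence counts of every pattern of length at most d; the function names `pattern_counts`, `count_consistent_texts`, and `largest_safe_d` are hypothetical and chosen for this illustration only.

```python
from collections import Counter
from itertools import product


def pattern_counts(text, d):
    """Occurrence count of every substring of text with length 1..d.

    Assumption: this multiset stands in for the answers stored by D.
    """
    counts = Counter()
    for length in range(1, d + 1):
        for i in range(len(text) - length + 1):
            counts[text[i:i + length]] += 1
    return counts


def count_consistent_texts(text, d):
    """Brute force: number of texts with the same answers as text.

    Any text sharing the length-1 counts has the same length and letters,
    so enumerating strings over the letters of text is sufficient.
    """
    alphabet = sorted(set(text))
    target = pattern_counts(text, d)
    return sum(
        1
        for candidate in product(alphabet, repeat=len(text))
        if pattern_counts("".join(candidate), d) == target
    )


def largest_safe_d(text, z):
    """Largest d such that at least z texts share all answers up to length d.

    Returns 0 if even d = 1 is not z-reverse-safe. The count cannot grow
    with d, because matching on lengths 1..d+1 implies matching on 1..d.
    """
    best = 0
    for d in range(1, len(text) + 1):
        if count_consistent_texts(text, d) >= z:
            best = d
        else:
            break
    return best
```

For the text `abaca`, the texts `abaca` and `acaba` have identical counts for every pattern of length at most 2, so under these assumptions `largest_safe_d("abaca", 2)` is 2, while asking for z = 3 forces d down to 1.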

Contribution type: paper
Keywords: data structure; algorithm; combinatorics; de Bruijn graph; data mining; privacy
Content language: English
Conference name: 2020 Symposium on Algorithm Engineering and Experiments, ALENEX 2020
Conference year: 2020
Volume editors: Blelloch, G; Finocchi, I
Proceedings title: 2020 Proceedings of the Workshop on Algorithm Engineering and Experiments (ALENEX)
Proceedings ISBN: 9781611976007
Publication date: 2020
Volume number: 2020-
First page: 199
Last page: 213
DOI: https://dx.doi.org/10.1137/1.9781611976007.16
Fulltext: none
Citation: Bernardini, G., Chen, H., Fici, G., Loukides, G., Pissis, S. (2020). Reverse-safe data structures for text indexing. In 2020 Proceedings of the Workshop on Algorithm Engineering and Experiments (ALENEX) (pp.199-213). Society for Industrial and Applied Mathematics Publications [10.1137/1.9781611976007.16].
Appears in types: 02 - Conference contribution

Files in this item:

There are no files associated with this item.

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/10281/258298
