Let W be a string of length n over an alphabet Σ, k be a positive integer, and be a set of length-k substrings of W. The ETFS problem asks us to construct a string X_{ED} such that: (i) no string of occurs in X_{ED}; (ii) the order of all other length-k substrings over Σ is the same in W and in X_{ED}; and (iii) X_{ED} has minimal edit distance to W. When W represents an individual’s data and represents a set of confidential substrings, algorithms solving ETFS can be applied for utility-preserving string sanitization [Bernardini et al., ECML PKDD 2019]. Our first result here is an algorithm to solve ETFS in (kn²) time, which improves on the state of the art [Bernardini et al., arXiv 2019] by a factor of |Σ|. Our algorithm is based on a non-trivial modification of the classic dynamic programming algorithm for computing the edit distance between two strings. Notably, we also show that ETFS cannot be solved in (n^{2-δ}) time, for any δ>0, unless the strong exponential time hypothesis is false. To achieve this, we reduce the edit distance problem, which is known to admit the same conditional lower bound [Bringmann and Künnemann, FOCS 2015], to ETFS.

Bernardini, G., Chen, H., Loukides, G., Pisanti, N., Pissis, S., Stougie, L., et al. (2020). String Sanitization under Edit Distance. Intervento presentato a: 31st Annual Symposium on Combinatorial Pattern Matching (CPM 2020), Copenhagen [10.4230/LIPIcs.CPM.2020.7].

String Sanitization under Edit Distance

Giulia Bernardini;
2020

Abstract

Let W be a string of length n over an alphabet Σ, k be a positive integer, and be a set of length-k substrings of W. The ETFS problem asks us to construct a string X_{ED} such that: (i) no string of occurs in X_{ED}; (ii) the order of all other length-k substrings over Σ is the same in W and in X_{ED}; and (iii) X_{ED} has minimal edit distance to W. When W represents an individual’s data and represents a set of confidential substrings, algorithms solving ETFS can be applied for utility-preserving string sanitization [Bernardini et al., ECML PKDD 2019]. Our first result here is an algorithm to solve ETFS in (kn²) time, which improves on the state of the art [Bernardini et al., arXiv 2019] by a factor of |Σ|. Our algorithm is based on a non-trivial modification of the classic dynamic programming algorithm for computing the edit distance between two strings. Notably, we also show that ETFS cannot be solved in (n^{2-δ}) time, for any δ>0, unless the strong exponential time hypothesis is false. To achieve this, we reduce the edit distance problem, which is known to admit the same conditional lower bound [Bringmann and Künnemann, FOCS 2015], to ETFS.
Si
paper
String algorithms, data sanitization, edit distance, dynamic programming, conditional lower bound
English
31st Annual Symposium on Combinatorial Pattern Matching (CPM 2020)
9783959771498
Bernardini, G., Chen, H., Loukides, G., Pisanti, N., Pissis, S., Stougie, L., et al. (2020). String Sanitization under Edit Distance. Intervento presentato a: 31st Annual Symposium on Combinatorial Pattern Matching (CPM 2020), Copenhagen [10.4230/LIPIcs.CPM.2020.7].
Bernardini, G; Chen, H; Loukides, G; Pisanti, N; Pissis, S; Stougie, L; Sweering, M
File in questo prodotto:
Non ci sono file associati a questo prodotto.

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/10281/277041
Citazioni
  • Scopus 7
  • ???jsp.display-item.citation.isi??? ND
Social impact