Let W be a string of length n over an alphabet, k be a positive integer, and S be a set of length-k substrings of W. The ETFS problem asks us to construct a string XED such that: (i) no string of S occurs in XED; (ii) the order of all other length-k substrings overis the same in W and in XED; and (iii) XED has minimal edit distance to W. When W represents an individual's data and S represents a set of confidential substrings, algorithms solving ETFS can be applied for utility-preserving string sanitization [Bernardini et al., ECML PKDD 2019]. Our first result here is an algorithm to solve ETFS in O(kn2) time, which improves on the state of the art [Bernardini et al., arXiv 2019] by a factor of . Our algorithm is based on a non-trivial modification of the classic dynamic programming algorithm for computing the edit distance between two strings. Notably, we also show that ETFS cannot be solved in O(n2-) time, for any> 0, unless the strong exponential time hypothesis is false. To achieve this, we reduce the edit distance problem, which is known to admit the same conditional lower bound [Bringmann and Künnemann, FOCS 2015], to ETFS. 2012 ACM Subject Classification Theory of computation ! Pattern matching.

Bernardini, G., Chen, H., Loukides, G., Pisanti, N., Pissis, S., Stougie, L., et al. (2020). String Sanitization under Edit Distance. Intervento presentato a: 31st Annual Symposium on Combinatorial Pattern Matching, CPM 2020 - 17 June 2020 - 19 June 2020, Copenhagen [10.4230/LIPIcs.CPM.2020.7].

String Sanitization under Edit Distance

Giulia Bernardini;
2020

Abstract

Let W be a string of length n over an alphabet, k be a positive integer, and S be a set of length-k substrings of W. The ETFS problem asks us to construct a string XED such that: (i) no string of S occurs in XED; (ii) the order of all other length-k substrings overis the same in W and in XED; and (iii) XED has minimal edit distance to W. When W represents an individual's data and S represents a set of confidential substrings, algorithms solving ETFS can be applied for utility-preserving string sanitization [Bernardini et al., ECML PKDD 2019]. Our first result here is an algorithm to solve ETFS in O(kn2) time, which improves on the state of the art [Bernardini et al., arXiv 2019] by a factor of . Our algorithm is based on a non-trivial modification of the classic dynamic programming algorithm for computing the edit distance between two strings. Notably, we also show that ETFS cannot be solved in O(n2-) time, for any> 0, unless the strong exponential time hypothesis is false. To achieve this, we reduce the edit distance problem, which is known to admit the same conditional lower bound [Bringmann and Künnemann, FOCS 2015], to ETFS. 2012 ACM Subject Classification Theory of computation ! Pattern matching.
paper
Conditional lower bound; Data sanitization; Dynamic programming; Edit distance; String algorithms;
English
31st Annual Symposium on Combinatorial Pattern Matching, CPM 2020 - 17 June 2020 - 19 June 2020
2020
9783959771498
2020
161
7
none
Bernardini, G., Chen, H., Loukides, G., Pisanti, N., Pissis, S., Stougie, L., et al. (2020). String Sanitization under Edit Distance. Intervento presentato a: 31st Annual Symposium on Combinatorial Pattern Matching, CPM 2020 - 17 June 2020 - 19 June 2020, Copenhagen [10.4230/LIPIcs.CPM.2020.7].
File in questo prodotto:
Non ci sono file associati a questo prodotto.

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/10281/277041
Citazioni
  • Scopus 15
  • ???jsp.display-item.citation.isi??? 5
Social impact