The Lyndon factorization of a word has been extensively studied in different contexts and several variants of it have been proposed. In particular, the canonical inverse Lyndon factorization ICFL, introduced in [5], maintains the main properties of the Lyndon factorization since it can be computed in linear time and it is uniquely determined. In this paper we investigate new properties of this factorization with the purpose of exploring its use in string queries. As a main result, we prove an upper bound on the length of the longest common extension (or longest common prefix) for two factors of a word w. This bound is at most the maximum length of two consecutive factors of ICFL(w). A tool used in the proof is a property that we state for factors with nonempty borders in ICFL(w): a nonempty border of a factor mi cannot be a prefix of the next factor mi+1. Another interesting result relates sorting of global suffixes, i.e., suffixes of a word w, and sorting of local suffixes, i.e., suffixes of the factors in ICFL(w). Finally, given a word w and a factor x of w, we prove that their Lyndon factorizations share factors, except for the first and last term of the Lyndon factorization of x. This property suggests that, given two words sharing a common overlap, their Lyndon factorizations could be used to capture the common overlap of these two words.
Bonizzoni, P., De Felice, C., Zaccagnino, R., Zizza, R. (2020). Lyndon words versus inverse lyndon words: Queries on suffixes and bordered words. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (pp.385-396). Springer [10.1007/978-3-030-40608-0_27].
Lyndon words versus inverse lyndon words: Queries on suffixes and bordered words
Bonizzoni P.
;
2020
Abstract
The Lyndon factorization of a word has been extensively studied in different contexts and several variants of it have been proposed. In particular, the canonical inverse Lyndon factorization ICFL, introduced in [5], maintains the main properties of the Lyndon factorization since it can be computed in linear time and it is uniquely determined. In this paper we investigate new properties of this factorization with the purpose of exploring its use in string queries. As a main result, we prove an upper bound on the length of the longest common extension (or longest common prefix) for two factors of a word w. This bound is at most the maximum length of two consecutive factors of ICFL(w). A tool used in the proof is a property that we state for factors with nonempty borders in ICFL(w): a nonempty border of a factor mi cannot be a prefix of the next factor mi+1. Another interesting result relates sorting of global suffixes, i.e., suffixes of a word w, and sorting of local suffixes, i.e., suffixes of the factors in ICFL(w). Finally, given a word w and a factor x of w, we prove that their Lyndon factorizations share factors, except for the first and last term of the Lyndon factorization of x. This property suggests that, given two words sharing a common overlap, their Lyndon factorizations could be used to capture the common overlap of these two words.I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.