TY - JOUR
T1 - Comparing incomplete sequences via longest common subsequence
AU - Castelli, Mauro
AU - Dondi, Riccardo
AU - Mauri, Giancarlo
AU - Zoppis, Italo
N1 - Castelli, M., Dondi, R., Mauri, G., & Zoppis, I. (2019). Comparing incomplete sequences via longest common subsequence. Theoretical Computer Science, 796, 272-285. https://doi.org/10.1016/j.tcs.2019.09.022
PY - 2019/12
Y1 - 2019/12
N2 - Inspired by scaffold filling, a recent approach for genome reconstruction from incomplete data, we consider a variant of the well-known longest common subsequence problem for the comparison of two sequences. The new problem, called Longest Filled Common Subsequence, aims to compare a complete sequence with an incomplete one, i.e. with some missing elements. Longest Filled Common Subsequence (LFCS), given a complete sequence A, an incomplete sequence B, and a multiset M of symbols missing in B, asks for a sequence B⁎ obtained by inserting the symbols of M into B so that B⁎ induces a common subsequence with A of maximum length. We investigate the computational and approximation complexity of the problem and we show that it is NP-hard and APX-hard when A contains at most two occurrences of each symbol, and we give a polynomial time algorithm when the input sequences are over a constant-size alphabet. We give a [Formula presented] approximation algorithm for the Longest Filled Common Subsequence problem. Finally, we present a fixed-parameter algorithm for the problem, when it is parameterized by the number of symbols inserted in B that “match” symbols of A.
AB - Inspired by scaffold filling, a recent approach for genome reconstruction from incomplete data, we consider a variant of the well-known longest common subsequence problem for the comparison of two sequences. The new problem, called Longest Filled Common Subsequence, aims to compare a complete sequence with an incomplete one, i.e. with some missing elements. Longest Filled Common Subsequence (LFCS), given a complete sequence A, an incomplete sequence B, and a multiset M of symbols missing in B, asks for a sequence B⁎ obtained by inserting the symbols of M into B so that B⁎ induces a common subsequence with A of maximum length. We investigate the computational and approximation complexity of the problem and we show that it is NP-hard and APX-hard when A contains at most two occurrences of each symbol, and we give a polynomial time algorithm when the input sequences are over a constant-size alphabet. We give a [Formula presented] approximation algorithm for the Longest Filled Common Subsequence problem. Finally, we present a fixed-parameter algorithm for the problem, when it is parameterized by the number of symbols inserted in B that “match” symbols of A.
KW - Approximation algorithms
KW - Computational complexity
KW - Fixed-parameter algorithms
KW - Longest common subsequence
KW - String algorithms
UR - http://www.scopus.com/inward/record.url?scp=85072570657&partnerID=8YFLogxK
UR - http://gateway.webofknowledge.com/gateway/Gateway.cgi?GWVersion=2&SrcAuth=Alerting&SrcApp=Alerting&DestApp=WOS_CPL&DestLinkType=FullRecord&UT=WOS:000496338900019
U2 - 10.1016/j.tcs.2019.09.022
DO - 10.1016/j.tcs.2019.09.022
M3 - Article
AN - SCOPUS:85072570657
VL - 796
SP - 272
EP - 285
JO - Theoretical Computer Science
JF - Theoretical Computer Science
SN - 0304-3975
ER -