A Boyer-Moore Approach to Degenerate Pattern Matching
Lin Jie (林劼); Lin Shuye (林舒晔)
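Abstract:
Degenerate pattern matching has important applications in bioinformatics, but because of the problem's high computational complexity, existing algorithms are difficult to apply in practice. Building on an analysis of the characteristics of degenerate patterns and of the classic Boyer-Moore (BM) algorithm, this paper proposes a method for solving the degenerate pattern matching problem within the BM framework. During preprocessing, a compatibility rule is defined and used to compute the shift array; the array is then applied in the search phase to speed up matching of degenerate patterns. In the average case the algorithm runs in linear time, and it performs well in practical applications.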
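The abstract does not spell out the compatibility rule or the exact form of the shift array, so the following is only a minimal sketch of the general idea, not the authors' algorithm. It assumes that the pattern uses IUPAC nucleotide codes, that the text contains only the plain bases A, C, G, T, and that the simplified Horspool variant of the BM bad-character shift is acceptable. The names `IUPAC`, `compatible`, `build_shift_table`, and `search` are hypothetical and chosen for illustration.

```python
# Assumption: degenerate symbols follow the IUPAC nucleotide code,
# where each symbol stands for a set of concrete bases.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT",
    "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG",
    "N": "ACGT",
}


def compatible(pattern_symbol, text_base):
    """Illustrative compatibility rule: the text base must belong to
    the set of bases that the pattern symbol stands for."""
    return text_base in IUPAC[pattern_symbol]


def build_shift_table(pattern, alphabet="ACGT"):
    """Horspool-style shift table adapted to degenerate patterns.

    For each base, store the distance from the rightmost compatible
    pattern position (excluding the last one) to the end of the pattern.
    Bases compatible with no such position get the full length m.
    """
    m = len(pattern)
    shift = {base: m for base in alphabet}
    for j in range(m - 1):
        # Later (more rightward) positions overwrite earlier ones,
        # so the smallest safe shift is kept for every base.
        for base in IUPAC[pattern[j]]:
            shift[base] = m - 1 - j
    return shift


def search(text, pattern):
    """Return the start indices of all windows of text that are
    compatible with the degenerate pattern."""
    n, m = len(text), len(pattern)
    if m == 0 or m > n:
        return []
    shift = build_shift_table(pattern)
    hits = []
    i = 0
    while i <= n - m:
        # Compare right to left, as in Boyer-Moore.
        j = m - 1
        while j >= 0 and compatible(pattern[j], text[i + j]):
            j -= 1
        if j < 0:
            hits.append(i)
        # Shift by the table entry of the base under the window's last
        # position; a base outside the alphabet can match nothing,
        # so the window may skip past it entirely.
        i += shift.get(text[i + m - 1], m)
    return hits
```

For example, `search("AAGCTTAG", "RGC")` returns `[1]`, because `R` is compatible with both `A` and `G`. The more degenerate the pattern, the smaller the shifts become: a pattern such as `GNN` yields a shift of 1 for every base, so the sketch degrades to a position-by-position scan in that case.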
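Keywords: pattern matching; degenerate pattern; algorithm; DNA sequence; bioinformatics
Foundation: Funded by the Education Department of Fujian Province (JA08033)
Authors: Lin Jie (林劼); Lin Shuye (林舒晔)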
References:
- [1] Boyer R S, Moore J S. A fast string searching algorithm[J]. Communications of the ACM, 1977, 20(10): 762-772.
- [2] Rahman M S, Iliopoulos C S, Mouchard L. Pattern matching in degenerate DNA/RNA sequence[C]∥Proceedings of the Workshop on Algorithms and Computation (WALCOM), Dhaka, Bangladesh, 2007: 109-120.
- [3] Shaner M C, Blair I M, Schneider T D. Sequence logos: a powerful, yet simple, tool[C]∥Proceedings of the Twenty-Sixth Annual Hawaii International Conference on System Sciences, Volume 1: Architecture and Biotechnology Computing, Hawaii, USA. IEEE Computer Society Press, 1993: 813-821.
- [4] Holub J, Smyth W F, Wang S. Fast pattern-matching on indeterminate strings[J]. Journal of Discrete Algorithms, 2008, 6(1): 37-50.
- [5] Antoniou P, Iliopoulos C S, Mouchard L, et al. Algorithms for mapping short degenerate and weighted sequences to a reference genome[J]. International Journal of Computational Biology and Drug Design, 2009, 2(4): 385-397.