Biochemistry (Moscow), Issue 12

How Often Does Filtering of Alignment Columns Improve the Phylogenetic Inference of Two-Domain Proteins?


Andrey I. Sigorskikh1, Daria D. Latortseva1, Anna S. Karyagina2,3,4, and Sergey A. Spirin3,5,a,*

1Faculty of Bioengineering and Bioinformatics, Lomonosov Moscow State University, 119992 Moscow, Russia

2Gamaleya National Research Center of Epidemiology and Microbiology, Ministry of Healthcare of the Russian Federation, 123098 Moscow, Russia

3Belozersky Institute of Physico-Chemical Biology, Lomonosov Moscow State University, 119992 Moscow, Russia

4All-Russia Research Institute of Agricultural Biotechnology, 127550 Moscow, Russia

5National Research University Higher School of Economics, 109028 Moscow, Russia

* To whom correspondence should be addressed.

Received September 23, 2022; Revised November 1, 2022; Accepted November 1, 2022
Protein phylogeny is usually reconstructed from a multiple alignment of amino acid sequences. One problem with such alignments is that they contain regions with different degrees of conservation, including regions where the alignment quality is questionable. This problem is often addressed by filtering the alignment columns with software developed specifically for this purpose. In this work, we investigated various approaches to phylogeny reconstruction using proteins with two evolutionary domains as examples. The sequences of such proteins are inherently heterogeneous in their degree of conservation because they contain both evolutionary domains and the linkers between them, as well as N- and C-terminal regions. We show that filtering the alignment columns improves the quality of reconstruction, on average, only when full-length sequences are used and only for eukaryotic proteins. Restricting the alignment to the evolutionary domains, and discarding the less conserved linkers and terminal sequences, on average worsened the quality of phylogenetic reconstruction.
KEY WORDS: phylogenetic inference, evolutionary domains, filtration of multiple sequence alignment

DOI: 10.1134/S0006297922120239