In a recent study posted to the bioRxiv* preprint server, researchers conducted an evolutionary analysis of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2)-related bat β-coronavirus genomes.

Several studies have shown that SARS-CoV-2, the causative agent of coronavirus disease 2019 (COVID-19), is of zoonotic origin. Earlier epidemics, such as the outbreaks of Middle East respiratory syndrome (MERS) and SARS, showed that bats are viral reservoirs; however, the mechanisms of interaction between viruses and bat genomes remain unclear.

Study: Evolutionary analysis of genomes of SARS-CoV-2-related bat viruses suggests old roots, constant effective population size, and possible increase of fitness. Image Credit: Rudmer Zwerver /

About the study

In the present study, the researchers used genomic sequences of bat β-coronaviruses to analyze their pattern of evolution and estimate related parameters.

Phylogenetic analysis of the nucleotide sequences, which were either partitioned or unpartitioned (the 5’ and 3’ untranslated regions, UTRs), was performed with the Bayesian Evolutionary Analysis by Sampling Trees 2 (BEAST2) software. Changes in population size over time were estimated with the Bayesian Skyline method.

The roots of the trees built from the 47 genomic nucleotide sequences, from the sub-sequences encoding the spike (S) and envelope (E) proteins, and from the 3’ and 5’ UTRs were analyzed to determine the origin of the bat viral genome. Changes in fitness were evaluated using a Moran model inspired by McFarland’s Tug-of-War model.

In the Moran model, the size of a largely unchanged population corresponds to the expected age of the most recent common ancestor (MRCA), measured in generations. For a qualitative assessment, the results of the Moran model were compared with those obtained from BEAST2.
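To illustrate the link between population size and the age of the common ancestor, here is a minimal sketch of a neutral Moran process. It is not the Tug-of-War variant used in the study, and the function name and parameters are hypothetical. Each step picks one individual to reproduce and one to die, and the loop stops once every individual descends from the same founder.

```python
import random

def moran_steps_to_common_ancestor(n, rng):
    """Run a neutral Moran process on n individuals.

    Returns the number of birth-death steps until all individuals
    descend from a single founder (illustrative assumption: no selection).
    """
    lineage = list(range(n))  # founder label carried by each individual
    steps = 0
    while len(set(lineage)) > 1:
        parent = rng.randrange(n)   # individual chosen to reproduce
        dying = rng.randrange(n)    # individual chosen to die
        lineage[dying] = lineage[parent]
        steps += 1
    return steps

if __name__ == "__main__":
    rng = random.Random(1)
    for size in (10, 20, 40):
        runs = [moran_steps_to_common_ancestor(size, rng) for _ in range(50)]
        print(size, sum(runs) / len(runs))
```

Averaged over many runs, larger populations take longer to reach a single common ancestor, which is why a stable population size and an old root are compatible findings.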

The site frequency spectra (SFS) were computed using MATLAB. The researchers compared the SFS of the bat viral genomes with those predicted by the Moran model to assess changes in bat viral fitness over time. For the SFS evaluation, a sequence difference at a given site that fell within the range of ambiguity was not counted as a mutation.
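The SFS calculation can be sketched in a few lines. The study used MATLAB; the Python below is only an illustration, and its handling of ambiguity (any base outside A, C, G, T is never counted as a mutation) is an assumption about what the ambiguity rule means. The function name and inputs are hypothetical.

```python
UNAMBIGUOUS = set("ACGT")

def site_frequency_spectrum(alignment, ancestral):
    """Unfolded SFS for a list of aligned sequences.

    Entry k-1 of the result is the number of sites at which exactly k
    sequences carry a base that differs from the ancestral base.
    Ambiguous bases are not counted as mutations (assumption).
    """
    n = len(alignment)
    counts = [0] * (n + 1)
    for pos, anc in enumerate(ancestral):
        if anc not in UNAMBIGUOUS:
            continue  # cannot polarize the site without a clear ancestral base
        derived = sum(
            1 for seq in alignment
            if seq[pos] in UNAMBIGUOUS and seq[pos] != anc
        )
        counts[derived] += 1
    # Drop the classes with 0 or n derived copies (not polymorphic)
    return counts[1:n]

if __name__ == "__main__":
    seqs = ["ACGT", "ACGA", "ANGA", "TCGA"]
    print(site_frequency_spectrum(seqs, "ACGT"))
```

In the toy alignment, one site has a single derived copy and one has three, while the `N` in the third sequence is ignored.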

Bat viral genomes were obtained from the National Center for Biotechnology Information (NCBI) Virus database, whereas SARS genes were obtained from the Reference Sequence (RefSeq) database from 2004 to 2017. Phylogenetic trees generated by BEAST2 were summarized with TreeAnnotator and visualized with FigTree. The ancestral sequence was inferred using the DNA maximum parsimony method from the Phylogeny Inference Package (PHYLIP).

The Hasegawa, Kishino, and Yano (HKY) model was used to describe nucleotide substitutions. The method described by Fitch was used to count the base changes required by a given phylogenetic tree.
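The Fitch count can be sketched as follows. This is a minimal illustration for a rooted binary tree written as nested tuples, not the PHYLIP implementation; the function names and the toy tree are hypothetical.

```python
def fitch_changes(tree, leaf_states):
    """Fitch's bottom-up pass for one alignment column.

    tree: a leaf name (str) or a pair (left_subtree, right_subtree).
    leaf_states: dict mapping leaf name to its base at this site.
    Returns (candidate state set at this node, minimum number of changes).
    """
    if isinstance(tree, str):
        return {leaf_states[tree]}, 0
    left_set, left_cost = fitch_changes(tree[0], leaf_states)
    right_set, right_cost = fitch_changes(tree[1], leaf_states)
    common = left_set & right_set
    if common:
        # Children agree on at least one state: no extra change needed
        return common, left_cost + right_cost
    # Children disagree: take the union and count one change
    return left_set | right_set, left_cost + right_cost + 1

def parsimony_score(tree, alignment):
    """Sum the Fitch counts over all columns of a name -> sequence dict."""
    length = len(next(iter(alignment.values())))
    return sum(
        fitch_changes(tree, {name: seq[i] for name, seq in alignment.items()})[1]
        for i in range(length)
    )

if __name__ == "__main__":
    toy_tree = (("a", "b"), ("c", "d"))
    toy_alignment = {"a": "AC", "b": "AT", "c": "GC", "d": "GT"}
    print(parsimony_score(toy_tree, toy_alignment))
```

In the toy example the first column needs one change and the second needs two, so the tree requires three changes in total.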

Felsenstein’s bootstrap method, as implemented in the Seqboot program, was used to assess the reliability of the most parsimonious bat evolutionary tree. In the final step, the Consense program of the PHYLIP package was run with the “Majority Rule extended” option.
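The idea behind the bootstrap and consensus steps can be sketched as below. This is not PHYLIP code: the function names are hypothetical, the tree-building step between the two functions is omitted, and the consensus shown is the plain majority rule. The extended variant additionally adds less frequent groups that are compatible with the majority groups, which this sketch does not do.

```python
import random
from collections import Counter

def bootstrap_alignment(alignment, rng):
    """Resample alignment columns with replacement (one bootstrap replicate)."""
    length = len(next(iter(alignment.values())))
    columns = [rng.randrange(length) for _ in range(length)]
    return {name: "".join(seq[c] for c in columns) for name, seq in alignment.items()}

def majority_clades(replicate_clades, threshold=0.5):
    """Keep clades found in more than `threshold` of the replicate trees.

    replicate_clades: one collection of clades per replicate tree,
    each clade given as a frozenset of leaf names.
    Returns a dict mapping each retained clade to its support fraction.
    """
    total = len(replicate_clades)
    counts = Counter(clade for clades in replicate_clades for clade in set(clades))
    return {clade: count / total for clade, count in counts.items() if count / total > threshold}

if __name__ == "__main__":
    rng = random.Random(0)
    print(bootstrap_alignment({"a": "ACGT", "b": "TGCA"}, rng))
    trees = [[frozenset("ab")], [frozenset("ab")], [frozenset("ac")]]
    print(majority_clades(trees))
```

Each replicate keeps the alignment length and the pairing of bases within a column; only the choice of columns changes.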

Study findings

According to the study, the roots of the bat coronavirus phylogenies date from a few decades to over a thousand years ago, while population sizes remained largely unchanged over time. The findings of the Moran model and BEAST2 were in qualitative agreement, indicating that a viral population that originated up to thousands of years ago has not changed significantly in size since.

Comparison of the SFS of the bat viral genomes with the Moran Tug-of-War model indicated a probable increase in the fitness of the bat virus over time due to directional selection, as suggested by the excess of ‘driver’ mutations compared to ‘passenger’ mutations.

The SFS results for the 47 genomic sequences were the most striking, as they closely matched the classical theoretical prediction of the Moran model for a consistently large effective population size under the infinite sites model (ISM).
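For reference, the classical neutral expectation for a constant-size population under the ISM is that the number of sites with i derived copies in a sample of n sequences is proportional to 1/i. The sketch below computes that textbook expectation; the scaled mutation rate `theta` is a hypothetical input, and the exact scaling used in the study's Moran model may differ.

```python
def expected_neutral_sfs(theta, n):
    """Expected unfolded SFS under neutrality and constant population size.

    Returns [theta / 1, theta / 2, ..., theta / (n - 1)] for a sample of
    n sequences, where theta is the scaled mutation rate (assumed input).
    """
    return [theta / i for i in range(1, n)]

if __name__ == "__main__":
    print(expected_neutral_sfs(6.0, 4))
```

An observed spectrum that departs from this 1/i shape is what points to selection or to a changing population size.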

Given the way the bat immune system is tuned, the interaction between bats and the virus was considered most likely to be stable. Mutation rate estimates indicated that the E gene and the UTRs were highly conserved, whereas the S gene showed a high mutation rate.


The phylogenetic analysis of bat β-coronavirus genomes indicated an origin more than a thousand years ago, with no significant change in population size since then. However, viral fitness likely increased.

Since bats are considered natural reservoirs of β-coronaviruses, phylogenetic assessment of the evolutionary patterns of these viruses may help in understanding how SARS-CoV-2, the cause of COVID-19, emerged in humans. Furthermore, a better understanding of the bat viral genome could support the development of genetic vaccines and immune strategies targeting SARS-CoV-2 genomic sequences for improved protection against COVID-19.

*Important notice

bioRxiv publishes preliminary scientific reports that are not peer-reviewed and, therefore, should not be regarded as conclusive, used to guide clinical practice or health-related behavior, or treated as established information.
