In a recent study posted to the bioRxiv* preprint server, researchers found that severe acute respiratory syndrome (SARS)-like coronaviruses (SL-CoVs) recombine frequently with diverse gene pools.

Study: Correlated substitutions reveal SARS-like coronaviruses recombine frequently with a diverse set of structured gene pools. Image Credit: Christina Krivonos/Shutterstock

Recombination allows viruses to adapt to selective pressures and avoid accumulating deleterious mutations, which could lead to extinction. Positive-sense (+) single-stranded (ss) RNA viruses exhibit variable levels of recombination. Population genomics has been instrumental in monitoring SARS-CoV-2 and understanding the correlations between transmission patterns and genomic substitutions.

Moreover, a quantitative population genomics-based understanding of the relative contribution of mutations and recombination to the evolution of SARS-CoV-2 and SL-CoVs is being developed. Most tools to study recombination are based on phylogeny, in which recombination breakpoints are evaluated by analyzing phylogenetic incongruence. The recombination parameters are then inferred using Bayesian and Markov chain Monte Carlo methods.

Such methods have successfully identified recombination events. However, their computational demand makes them difficult to apply to large-scale population genomics data. Notably, they rely on sampled or observed sequences and do not capture recombination with larger, unobserved gene pools.

The study and findings

In the present study, researchers adapted a previously described method, mcorr, to infer recombination parameters of (+) ssRNA viruses. This method was originally used to infer bacterial recombination rates. A notable difference in the current model is that it accounts for the copy-choice recombination of RNA viruses. The model predicts the conditional probability of a synonymous substitution at a genomic site, given a substitution at another site a set distance away. This probability, viewed as a function of the distance between sites, is referred to as the correlation profile.
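To make the idea of a correlation profile concrete, the following is a minimal Python sketch rather than the mcorr implementation. It assumes the input sequences are already aligned and already reduced to the sites of interest (for example, synonymous sites); the function name `correlation_profile` and the toy sequences are hypothetical.

```python
from itertools import combinations


def correlation_profile(seqs, max_lag):
    """Estimate P(l): the probability that two sequences differ at site i + l,
    given that they differ at site i, pooled over all sequence pairs.

    Assumption: `seqs` are equal-length, aligned strings that have already
    been reduced to the sites of interest (e.g. synonymous sites).
    """
    n_sites = len(seqs[0])
    joint = [0] * (max_lag + 1)  # site i and site i + lag both differ
    cond = [0] * (max_lag + 1)   # site i differs (and i + lag is in range)
    for a, b in combinations(seqs, 2):
        diff = [x != y for x, y in zip(a, b)]
        for lag in range(1, max_lag + 1):
            for i in range(n_sites - lag):
                if diff[i]:
                    cond[lag] += 1
                    if diff[i + lag]:
                        joint[lag] += 1
    # Lags with no conditioning events are left out of the result.
    return {lag: joint[lag] / cond[lag]
            for lag in range(1, max_lag + 1) if cond[lag]}


# Toy input: substitutions confined to the first two sites.
profile = correlation_profile(["AAAAAA", "TTAAAA"], max_lag=2)
```

In this toy input the two substitutions sit next to each other, so the estimated probability is higher at a distance of one site than at a distance of two. Real analyses pool many sequence pairs and fit a model to the resulting curve, which this sketch does not attempt.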

Ideally, the correlation profile should decline rapidly in a highly recombining viral population, whereas it should be flat in a non-recombining population. First, the team analyzed correlated substitutions in poliovirus. They generated a genome-wide plot of the correlation coefficient for pairwise synonymous substitutions across the coding region (CDS) of the major poliovirus serotypes.

The plot revealed that substitutions were more likely to be correlated in the first 800 codons. In line with the literature, the team found substantial recombination in poliovirus. Next, 191 whole-genome SL-CoV sequences from Nextstrain were aligned to a SARS-CoV-2 reference genome. The SL-CoVs included SARS-CoV, SARS-CoV-2, and bat CoVs.

The authors found that correlated substitutions accumulated in the CDS of the open-reading frame (orf)1ab preceding the -1 ribosomal frameshift and in the CDS of the spike. Correlation profiles and recombination parameters were computed for each gene. There was strong evidence for recombination in the CDS of spike and orf1a.

The CDS of the nucleocapsid (N) and orf3a also showed recombination. The inferred recombination parameters suggested that the genes showing evidence of recombination had been recombining frequently. Further, correlated substitutions were measured for SARS-CoV-2 using complete genome assemblies available in the National Center for Biotechnology Information (NCBI) database.

The researchers observed weakly correlated substitutions across the SARS-CoV-2 genome, unlike SL-CoVs. Correlation profiles across multiple genes remained flat. The team estimated pairwise synonymous diversity across the 191 SL-CoVs and clustered the sequences using the average linkage algorithm to construct a dendrogram.

This tree was split into 11 clusters, with distinct clusters for SARS-CoV-2 and SARS-CoV and several clusters for bat CoVs. To determine whether a statistically significant clonal signal was present in the sampled genomes, the pool diversity, inferred from the correlation profile, was compared to sample diversity, measured from sequence data.
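The clustering step can be illustrated with a short, self-contained Python sketch of average linkage clustering. It is an illustration under stated assumptions, not the authors' pipeline: the fraction of differing aligned sites stands in for pairwise synonymous diversity, and the function names and toy sequences are hypothetical.

```python
def pairwise_diversity(a, b):
    """Fraction of aligned sites at which two sequences differ.

    Assumption: a simple stand-in for pairwise synonymous diversity.
    """
    return sum(x != y for x, y in zip(a, b)) / len(a)


def average_linkage_clusters(seqs, n_clusters):
    """Merge the two closest clusters repeatedly until n_clusters remain.

    The distance between two clusters is the average of all pairwise
    distances between their members (average linkage).
    """
    clusters = [[i] for i in range(len(seqs))]
    dist = {(i, j): pairwise_diversity(seqs[i], seqs[j])
            for i in range(len(seqs)) for j in range(i + 1, len(seqs))}

    def linkage(c1, c2):
        total = sum(dist[(min(i, j), max(i, j))] for i in c1 for j in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > n_clusters:
        # Find the closest pair of clusters (ties broken by position).
        _, p, q = min((linkage(clusters[p], clusters[q]), p, q)
                      for p in range(len(clusters))
                      for q in range(p + 1, len(clusters)))
        merged = clusters[p] + clusters[q]
        clusters = [c for k, c in enumerate(clusters) if k not in (p, q)]
        clusters.append(merged)
    return sorted(sorted(c) for c in clusters)


# Toy input: two pairs of near-identical sequences.
groups = average_linkage_clusters(["AAAA", "AAAT", "TTTT", "TTTA"], 2)
```

Stopping the merging at a chosen number of clusters corresponds to cutting the dendrogram at a given height. The study's cut produced 11 clusters; the toy input here is cut into two.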

Mutational divergence was used as a measure of clonal divergence. The difference between pool and sample diversity was expressed relative to the variability in the measurement of sample diversity, a quantity referred to as the residual clonality (RC) effect size. Recombination parameters were inferred for pairs among the 11 SL-CoV clusters, and almost all cluster pairs showed recombination.
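The RC effect size can be read as a standardized difference. The sketch below is one plausible reading of that description, not the preprint's exact estimator: it assumes sample diversity is measured repeatedly (for example, over resampled data) and that variability is summarized by the sample standard deviation. The function name `rc_effect_size` and the numbers are hypothetical.

```python
from statistics import mean, stdev


def rc_effect_size(pool_diversity, sample_diversities):
    """Standardized gap between pool diversity and sample diversity.

    Assumption: variability in sample diversity is summarized by the
    standard deviation of repeated measurements; the preprint may use a
    different estimator.
    """
    spread = stdev(sample_diversities)
    if spread == 0:
        raise ValueError("sample diversity measurements show no variability")
    return (pool_diversity - mean(sample_diversities)) / spread


# Hypothetical numbers, for illustration only.
effect = rc_effect_size(0.5, [0.1, 0.2, 0.3])
```

Under this reading, a larger value means the gap between pool and sample diversity is large compared with measurement noise, which is what makes a clonal signal statistically distinguishable.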

Mutational divergence and recombination coverage were plotted against RC effect size for these cluster pairs. Based on the mutational divergence, an average linkage tree was constructed for the 11 SL-CoV clusters. The data had sufficient RC to infer the clonal structures for most SL-CoV lineages.

While the genome-wide pairwise distance-based dendrogram suggested that SARS-CoV-2 shared its most recent common ancestor (MRCA) with cluster 1 bat CoVs, the clonal tree indicated that SARS-CoV-2 shared an MRCA with clusters 1, 3, 4, and 5. The whole-genome correlation profiles for all sequence pairs were calculated to identify SL-CoVs recombining with shared gene pools.

The recombination between sequence pairs was viewed as a network in which nodes represent strains and edges link strains that recombined with a shared pool. The network appeared substantially connected, suggesting that multiple pairs recombined with shared gene pools. Further investigation revealed that clusters were less interlinked with one another, indicating that distinct gene pools were present despite the high level of recombination and sharing of gene pools across SL-CoV lineages.
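One simple way to quantify how interlinked clusters are in such a network is to compare the density of edges within clusters to the density of edges between them. The Python sketch below does this for a toy network; the strain names, cluster labels, edges, and the function `link_densities` are all hypothetical and are not taken from the study.

```python
from itertools import combinations


def link_densities(cluster_of, edges):
    """Return (within-cluster, between-cluster) edge densities.

    Density is the number of observed edges divided by the number of
    possible strain pairs of that kind. Edges are treated as undirected.
    """
    edge_set = {frozenset(e) for e in edges}
    within = [0, 0]   # [observed edges, possible pairs]
    between = [0, 0]
    for a, b in combinations(sorted(cluster_of), 2):
        bucket = within if cluster_of[a] == cluster_of[b] else between
        bucket[1] += 1
        if frozenset((a, b)) in edge_set:
            bucket[0] += 1
    return (within[0] / within[1] if within[1] else 0.0,
            between[0] / between[1] if between[1] else 0.0)


# Hypothetical strains, cluster labels, and shared-pool edges.
cluster_of = {"s1": 1, "s2": 1, "s3": 2, "s4": 2}
edges = [("s1", "s2"), ("s3", "s4"), ("s2", "s3")]
within_density, between_density = link_densities(cluster_of, edges)
```

A within-cluster density well above the between-cluster density is the pattern expected when gene pools are structured yet still partly shared across lineages.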

Conclusions

In summary, the authors showed that the mcorr method can be adapted to infer recombination parameters for (+) ssRNA viruses. Applied to SL-CoVs, the method revealed strong recombination signatures in the CDS of spike and orf1a. Overall, this method allows for the analysis of enormous datasets and helps to understand the interplay among population structure, selection, and recombination.

*Important notice

bioRxiv publishes preliminary scientific reports that are not peer-reviewed and, therefore, should not be regarded as conclusive, used to guide clinical practice or health-related behavior, or treated as established information.
