3.2.2. NGS for Detection of SARS-CoV-2
In late January 2020, Lu et al. reported SARS-CoV-2 genomic data from nine patients presenting with pneumonia of unknown origin at three hospitals in Wuhan, China [1]. Bronchoalveolar lavage fluid (BALF) and cultured isolates were used as samples. The patients’ samples were negative for known respiratory pathogens, with five tested by the Chinese CDC and four by the BGI group in Beijing, China. NGS technology was used to sequence and identify the causative pathogen, with the BGI and CDC labs differing slightly in their sequencing techniques and bioinformatic processing pipelines. In both groups, gaps between contigs were closed using Sanger sequencing and terminal genome regions were identified via rapid amplification of cDNA ends (RACE).
At the BGI group, RNA extraction of BALF samples was carried out with a QIAamp Viral RNA Mini Kit, and a probe-capture technique was used to remove human nucleic acid material. Next, RNA was reverse transcribed to cDNA, second-strand synthesis was performed, and a DNA library was constructed. The DNA library was quantified with a Qubit method and transformed into a single-stranded circular library. Rolling circle amplification was used to construct DNA nanoballs, and they were subsequently qualified. The DNBSEQ-T7 high throughput sequencer from MGI was used with paired-end, 100 bp read lengths. High quality reads were filtered for human reads against the hg19 human reference genome with Burrows-Wheeler alignment software. The remaining data were aligned with published data on coronaviruses from the US National Center for Biotechnology Information. Mapped reads were assembled with SPAdes software to create a consensus genome sequence.
The Chinese CDC sequencing protocol similarly used the QIAamp Viral RNA Mini Kit to extract viral RNA from the clinical samples, followed by cDNA synthesis and second-strand synthesis. cDNA libraries were generated and then purified with Agencourt AMPure XP beads to remove contaminants. Following quantitation, the sequencing was carried out on MiSeq or iSeq platforms from Illumina. The terminal genome regions were identified using the RACE system from Invitrogen. Assembled genomes were confirmed with traditional Sanger sequencing. The raw sequencing reads were filtered via the same protocol used by the BGI group, and CLCBio software version 11.0.1 was used for de novo assembly, variant calling, and alignment. The bat-SL-CoVZC45 virus (sharing 87.99% sequence similarity) was also used to perform a mapped assembly.
Sequencing yielded eight full genomes and two partial genomes (one patient’s BALF sample was used to isolate the virus, which was also sequenced, yielding 10 total samples). The sequences were used to generate PCR-based assays that were then used to confirm the presence of the SARS-CoV-2 virus, with cycle threshold (Ct) values ranging from 22.85 to 34.23.
Sequencing the viral genome in this study yielded highly useful information during the early stages of the SARS-CoV-2 outbreak. Genomic analyses revealed that, while the whole genome sequence of SARS-CoV-2 is highly similar to bat-SL-CoVZC45 (87.99% similarity) and bat-SL-CoVZXC21 (87.23%), the receptor binding domain (S1) sequence of the spike protein (S) was more similar to that of SARS-CoV, the virus responsible for the first SARS outbreak in the early 2000s. This evidence supports the suggestion that SARS-CoV-2 uses the ACE-2 receptor to gain entry into cells, the same route utilized by SARS-CoV. The utilization of ACE-2 receptors by SARS-CoV-2 has also been demonstrated in infectivity studies by Zhou et al. [3]. The phylogenetic analysis, made possible by the assembled sequences, allowed the classification of the virus, showing that it belongs to the subgenus Sarbecovirus, a member of the Betacoronavirus genus. The high sequence similarity (over 99.9%) among viral samples obtained from the nine patients in Wuhan provides evidence of very recent entry into the human population.
Other laboratories in China conducted parallel investigations at the onset of the outbreak, such as Zhu et al. [4], who used a similar combination of Illumina and nanopore sequencing, RACE, and Sanger sequencing to identify and characterize the SARS-CoV-2 genomes extracted from three patient samples in Wuhan, China. Their bioinformatics pipeline included CLC Genomics software, version 4.6.1; MUSCLE; and RAxML (13) for phylogenetic analysis. Their sequencing protocol yielded more than 20,000 viral reads per sample, obtaining one full-length genome and two nearly full-length genomes. They similarly noted that contigs aligned with high similarity with bat-SL-CoVZC45. In their report, published 24 January 2020, they reached similar conclusions to Lu et al. regarding the phylogenetic characterization of the virus and used their de novo generated sequences to design primers for PCR-based diagnostic assays.
Groups all over the world are now investigating possible diagnostic interventions made possible by NGS technology. Campos et al. reported the use of metatranscriptomic next-generation sequencing technology in the detection of SARS-CoV-2 in a nasopharyngeal swab specimen from a patient in Feira de Santana-Bahia, Brazil [34]. They used the Ion S5 platform from ThermoFisher with an Ion 540™ chip and the Ion Total RNA-Seq kit v2. This platform uses an ion-semiconductor sequencing process, and they implemented the Low Input RiboMinus™ Eukaryote System v2 from ThermoFisher to remove rRNA from one sample. The rRNA-depleted library contained human transcripts as 77.29% of total reads, while the whole RNA library had 84.49% of total reads as human transcripts. Contigs from the rRNA-depleted library provided 29.9% genome coverage, while contigs from the non-depleted sample yielded only 5.4% genome coverage. Total genome coverage from all viral reads in the rRNA-depleted sample was 59.9%. These results indicate that rRNA-depletion strategies may play a role in improving NGS diagnostic abilities.
Moore et al. have reported on the use of amplicon- and metagenomic-MinION-based sequencing in the identification of SARS-CoV-2 and co-infections, respectively [35]. Amplicon-based NGS is a tool that is commonly used to provide highly specific data on the presence of organisms in a sample via primers targeting highly conserved areas of a genome. This is contrasted with metagenomic NGS (mNGS), which takes a “shotgun” approach, identifying all genetic material in a sample, not just sequences that contain the highly conserved genetic region. Primers for amplicon-based sequencing of SARS-CoV-2 were designed to generate approximately 1000 base pair fragments with roughly 200 base pair overlaps, sufficient for the viral sequence to be reconstructed from the individual fragments, and the assay successfully sequenced the SARS-CoV-2 genome in both patients studied. The study was limited in that it included only these two patients, both from the UK. For validation, they spiked samples with VP35 RNA from Ebola virus as an internal control. The mapping software successfully identified the internal control RNA and also identified the presence of Fusobacterium periodonticum and human cytomegalovirus (human betaherpesvirus 5) in addition to SARS-CoV-2 in the mNGS results. The group used the WIMP rev. 3.2.2 workflow in EPI2ME, the cloud-based pipeline from Oxford Nanopore Technologies (ONT), for bioinformatic analysis. Patient 1 was sampled twice (two days apart) and patient 2 was sampled once, yielding total reads of 8,698,559 (78.6% of which were human reads), 9,890,327 (97.7%), and 5,849,966 (92.7%), respectively, during mNGS. The metagenomic approach did not provide uniform genome coverage among the three genomes, and the amplicon-based sequencing method provided a much higher read depth than the metagenomic approach. This study highlights the useful abilities enabled by the hypothesis-free approach taken by NGS. Identifying co-infections via the use of NGS, whether viral or bacterial, is highly relevant to clinical decision making and could help guide treatment and patient outcomes.
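The overlapping-amplicon layout described above (fragments of roughly 1000 bp with about 200 bp of overlap) can be illustrated with a simple tiling calculation. The sketch below is not the primer design procedure used by Moore et al.; it only computes idealized fragment coordinates. The function name `tile_amplicons` is hypothetical, and the genome length of 29,903 nt (the Wuhan-Hu-1 reference length) is an assumption not taken from the text.

```python
def tile_amplicons(genome_length, amplicon_length=1000, overlap=200):
    """Return 0-based, end-exclusive (start, end) coordinates of overlapping tiles.

    Idealized layout only: real primer design must also account for primer
    binding sites, melting temperatures, and sequence variation.
    """
    if not 0 <= overlap < amplicon_length:
        raise ValueError("overlap must be non-negative and smaller than amplicon_length")
    if genome_length <= amplicon_length:
        return [(0, genome_length)]

    step = amplicon_length - overlap  # distance between consecutive tile starts
    tiles = []
    start = 0
    while start + amplicon_length < genome_length:
        tiles.append((start, start + amplicon_length))
        start += step
    # Anchor the final tile at the 3' end so the whole genome is covered.
    tiles.append((genome_length - amplicon_length, genome_length))
    return tiles


# Assumed genome length (Wuhan-Hu-1 reference); not stated in the text above.
tiles = tile_amplicons(29_903)
print(len(tiles), "amplicons; first:", tiles[0], "last:", tiles[-1])
```

Anchoring the last tile at the genome end makes its overlap with the previous tile larger than 200 bp, which is harmless for assembly; the neighboring tiles keep the nominal overlap.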
A table comparing currently employed NGS methods for detection of COVID-19 (short read: Illumina and Ion Torrent; long read: Nanopore WGS assay) is presented (Table 1).
3.2.5. Understanding Physical and Chemical Properties of the Virus
As previously noted, Lu et al. used sequencing results generated in their lab to provide evidence that the structure of the receptor binding domain (S1) of the spike (S) protein was highly similar to that of the original SARS-CoV virus [1], indicating that the protein would likely bind the ACE-2 receptor. This type of genetic analysis is a highly useful byproduct of sequencing infectious agents, as it can reveal structural parameters that guide pharmaceutical development.
Kim et al. used NGS technology to provide a high resolution readout of the SARS-CoV-2 transcriptome and epi-transcriptome using viral RNA isolated from a patient in South Korea, revealing a complicated array of RNA transcripts and RNA modification sites [42]. They used a combination of sequence-by-synthesis (SBS) and Nanopore-based direct RNA sequencing (DRS) methods to map the full-length SARS-CoV-2 genome (gRNA) as well as sub-genomic RNAs (sgRNAs) which code for structural proteins, open reading frames (ORFs), and transcription regulatory sequences (TRSs) of the SARS-CoV-2 transcriptome. DRS techniques enable long reads of RNA without conversion to cDNA, allowing RNA modifications to be observed. The group generated in vitro RNAs as negative controls to study the modifications made to the virus inside of cells. Differences in ion current were noted at 41 sites between the patient-isolated and negative-control viral genomes, pointing to likely sites of RNA modification, and cross-examination revealed that “AAGAA” motifs were commonly associated with these modification sites. The group developed in-house software to analyze DRS sequencing data, and this software was used to measure the poly-A tails of full-length gRNAs and sgRNAs. They found that gRNAs have longer poly-A tails than sgRNAs, and that sgRNA poly-A tails have two distinct populations, one with ~30 nts and one with ~45 nts. These differences likely result from the age of the RNA and could indicate strategies for viral RNA degradation. They also noted that modified RNA molecules had shorter poly-A tails. The authors speculate that these modifications could play roles in viral RNA stability control or evasion of host immune response. This study provides rich information on the complex intracellular processes involved in SARS-CoV-2 infection and could lead to breakthroughs in antiviral development.
Coutard et al. analyzed SARS-CoV-2 sequencing data and identified a furin-like cleavage site present in the S protein amino acid sequence that is not present in coronaviruses of the same clade [43]. They proposed that the addition of this cleavage site in the amino acid structure of the S protein may have been a gain of function mutation that allowed for efficient spread into the human population. They supplement their hypothesis with evidence that furin expression levels are high in the lungs and that host cells attempt to inhibit the activity and availability of furin-like enzymes during viral infections [44]. The authors note that the presence of similar furin and furin-like cleavage sites has been linked to higher pathogenicity in infectious bronchitis virus, influenza viruses, and other human coronaviruses.
Anand et al. provided supporting evidence on the importance of this furin cleavage site [45]. They performed computational analyses on 10,967 SARS-CoV-2 genomes available in the GISAID public database, and report that the furin cleavage site on the spike (S) protein is identical to the furin cleavage site present on the human epithelial sodium channel alpha subunit (ENaC-α). The ENaC-α subunit requires cleavage by furin proteases for activation in the same manner as the S protein of SARS-CoV-2, and the ENaC channel is present in high levels in regions of initial SARS-CoV-2 infection (respiratory epithelium, nasal cavity, etc.). They hypothesize that the furin cleavage site present on the SARS-CoV-2 virus could compete for activation with the ENaC-α furin cleavage site, causing dysregulation of cellular electrolyte balance and leading to the high levels of fluid found in the lungs of some COVID-19 patients. Taken together, this analysis of sequencing data provides another interesting proposal that could point to an antiviral candidate for SARS-CoV-2 and shed light on the high transmission capability of the novel virus.
In Japan, Wakida et al. have developed a technique that they call “Fate-Seq” which uses next-generation sequencing to identify viral RNA sequences that can help ensure RNA stability inside host cells [46]. The study found that the original SARS-CoV virus, first seen in the early 2000s, contains 21 RNA sequences that could confer stability to viral RNA inside host cells, and comparative analysis with the SARS-CoV-2 genome shows high levels of conservation in these sequences. These findings may point to mechanisms that the novel SARS-CoV-2 virus uses to inhibit host RNA degradation systems and promote viral replication.
Yadav et al., researchers in India, used the Illumina MiniSeq platform to sequence three initial positive SARS-CoV-2 samples discovered by RT-PCR testing in February of 2020 [47]. One of the sequences was of low data quality and so was excluded from the analysis. Extraction was performed with the QIAamp Viral RNA kit, followed by quantification with the Qubit RNA High-Sensitivity kit. Libraries were prepared and quantified with the KAPA Library Quantification Kit, and CLC Genomics Workbench version 11.0 was used for bioinformatics analysis. For case 1, 20,096 viral reads were obtained from 5,615,846 total reads for a reconstructed SARS-CoV-2 genome of 29,854 nucleotides (99.83% coverage). Case 3 provided 11,296 viral reads out of 1,405,038 total reads for a reconstructed genome of 29,851 nucleotides (99.83% coverage).
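The read counts reported above imply that well under 1% of the sequenced reads in each case were viral. A minimal sketch of that arithmetic follows; the function name `viral_read_fraction` is hypothetical, and only the read counts come from the text.

```python
def viral_read_fraction(viral_reads, total_reads):
    """Fraction of all sequenced reads that were assigned to the virus."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    if not 0 <= viral_reads <= total_reads:
        raise ValueError("viral_reads must lie between 0 and total_reads")
    return viral_reads / total_reads


# (viral reads, total reads) as reported for the two analyzed cases
cases = {"case 1": (20_096, 5_615_846), "case 3": (11_296, 1_405_038)}

for name, (viral, total) in cases.items():
    print(f"{name}: {viral_read_fraction(viral, total):.3%} of reads were viral")
```

Despite these small fractions, both cases still produced near-complete genomes, which illustrates how far deep sequencing can compensate for a low proportion of viral reads.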
They used these data to identify predicted linear and conformational B-cell epitopes as well as T-cell epitopes using a variety of software products that predict amino acid structures based on sequencing data. These epitopes represent promising targets for vaccine development and warrant further investigation. The sequence data from the two complete genomes showed 0.04% nucleotide divergence and 0.10% amino acid divergence, and phylogenetic analysis indicated that these patients represented separate introductions into the country. Of note, blood samples from these three confirmed COVID-19 patients were negative, highlighting the need for the development of accurate testing assays. This work reiterates the value of sequencing data in helping public health authorities gain information that can inform their management decisions.
Key findings made possible by NGS technology
1. The presence, structure, and function of the SARS-CoV-2 spike protein [1,43].
2. The presence and ramifications of a furin-like cleavage site on the SARS-CoV-2 spike protein [43,45].
3. Mechanisms of viral stability inside human cells [42,46].
4. The presence and structure of B-cell and T-cell epitopes on viral proteins [47].
3.2.6. SARS-CoV-2 Phylogenetics and Mutational Characteristics
Scientists in Italy, a country especially hard hit by the outbreak, began performing NGS analyses early in the pandemic to gain a better understanding of how the virus has spread across the country. Lorusso et al. sequenced 46 samples from patients in the Abruzzo region of central Italy between 16 March and 23 March 2020 [48]. They chose these 46 samples for NGS among 839 SARS-CoV-2 positive samples based on their low Ct scores during RT-PCR testing. Their protocol utilized the MiniSeq Mid Output Kit (300 cycles) with 150 bp paired-end reads, and they used the Trimmomatic, Bowtie2, and SAMtools software products for bioinformatic processing.
Forty-five out of 46 NGS-generated sequences were of high read quality and suitable for analysis, while 16/45 had horizontal coverage >95.2% and were deposited in the GISAID database. Coverage depth for these 16 sequences ranged from 87× to 3721×. All sequences generated were >99% similar to the Wuhan-Hu-1 reference strain, yet all contained single nucleotide polymorphism (SNP) mutations. Phylogenetic analysis of these genomes tentatively points to two separate modes of introduction to the region. Twenty-nine out of the 45 sequences showed R203K and G204R mutations in the N protein, while 13 did not (3 were partial genomes and were missing this portion of the genome). These N protein mutations have been associated with Northern Europe, but a lack of sequences from the outbreak in northern Italy makes it difficult to draw robust conclusions at this time. More studies monitoring mutational data will help researchers trace routes of viral spread, allowing authorities to better stop outbreaks in the future, and they will also be vital to ensure that vaccine, antiviral, and diagnostic primer-probe designs are up to date and adequate for an ever-changing virus. Monitoring viral mutation also helps researchers understand mechanisms of the virus, such as those of transmissibility and pathogenicity.
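The two coverage figures quoted above measure different things. The sketch below assumes the usual reading of these terms, which the text does not define: horizontal coverage as the fraction of reference positions covered by at least one read, and coverage depth as the mean number of reads per position. The function name `coverage_summary` and the toy depth values are hypothetical; in practice, per-position depths could be exported from an alignment with a tool such as `samtools depth -a`.

```python
def coverage_summary(depths, min_depth=1):
    """Summarize per-position read depths along a reference genome.

    depths: one integer per reference position (0 where no read aligns).
    Returns the fraction of positions with depth >= min_depth and the mean depth.
    """
    if not depths:
        raise ValueError("depths must contain at least one position")
    covered = sum(1 for d in depths if d >= min_depth)
    return {
        "horizontal_coverage": covered / len(depths),
        "mean_depth": sum(depths) / len(depths),
    }


# Toy 10-position "genome" with two uncovered positions (illustrative values only).
toy_depths = [0, 0, 90, 100, 110, 120, 95, 105, 100, 80]
print(coverage_summary(toy_depths))
```

A genome can therefore have a high mean depth and still fall below a horizontal-coverage threshold if reads pile up in some regions and leave others empty.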
Several groups have reported on the mutational characteristics of the SARS-CoV-2 genome, including van Dorp et al., who assessed 7666 public genome assemblies [49] and commented on the relatively poor conservation of the N and S genes in the viral genome. The group identified 198 recurrent mutations, that is, mutations that have emerged independently multiple times, with 80% of these being non-synonymous mutations. More than 15 recurrent mutations were noted in each of the Nsp6, Nsp11, Nsp13, and S protein regions. Thus, these regions of the viral genome are poorly conserved, as shown in similar studies.
In one such study, Wang et al. submitted a comprehensive analysis of 6156 SARS-CoV-2 genomes obtained between 5 January and 24 April, providing a breakdown of mutation patterns in the viral genome as well as across geography and time [50]. The group analyzed the geographical distribution of the genomes by k-means clustering. They noted that genomes in the study, gathered from across the world, cluster into five main groups with common mutations as shown in Table 2. Note that these are not the only mutations present in the genome of each virus, just those that members of a cluster have in common; there can be subtypes within these groups. Some information can be drawn from these clusters. Clusters 1 and 2 were apparent in the early data and are the dominant subtypes found in Asian countries, which is intuitive as the virus originated in Asia and subsequently spread across the world, accumulating more mutations along the way. The authors also note that the 23403A > G mutation (D614G amino acid change) found in clusters 3–5 is a spike protein mutation and could be a contributing factor to the high levels of spread seen in Europe and the United States. All five groups can be found at some level in the United States, and the authors went further to classify the SARS-CoV-2 genomes found in the US into three major clusters (A, B, and C). Cluster A is spread out across the nation, although in somewhat smaller numbers. Cluster B is highly prevalent on the US west coast, especially in the state of Washington, while the east coast shows a high prevalence of Cluster C. This distribution provides some evidence that the east coast COVID-19 outbreak originated mainly from Europe, as Cluster C is a descendant of Cluster 3 from the world-wide data.
The mutations were also assessed for their protein alterations, and a mutation ratio and mutation h-index were determined for each genomic region. The mutation ratio reflects the absolute number of mutations found in the data relative to the length of the region (i.e., number of mutations found divided by the number of codons, or residues, in that region). This number reflects how conserved each region is. The mutation h-index is also provided to account for the fact that some mutations occur many times while others appear only a handful of times in the data; this index is mathematically defined as “the maximum value of h such that the given protein genetic section has h single mutations that have each occurred at least h times” [50]. Their calculated mutation ratios and h-indices for a given genomic region were highly correlated with each other. When assessing these values together, it is noted that the most conserved regions of the viral genome are, in order, the envelope (E) protein, main protease, and endoribonuclease. In contrast, the least conserved regions were the nucleocapsid (N) protein, spike (S) protein, and papain-like protease.
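The two metrics defined above are simple to compute once mutations have been counted per region. The following is a minimal sketch under that assumption; the function names and the toy counts are hypothetical and are not data from Wang et al.

```python
def mutation_ratio(num_mutations, num_residues):
    """Number of distinct mutations in a region divided by its length in residues."""
    if num_residues <= 0:
        raise ValueError("num_residues must be positive")
    return num_mutations / num_residues


def mutation_h_index(occurrence_counts):
    """Largest h such that h distinct mutations have each occurred at least h times.

    occurrence_counts: one entry per distinct mutation in the region, giving
    how many genomes carry that mutation.
    """
    h = 0
    for rank, count in enumerate(sorted(occurrence_counts, reverse=True), start=1):
        if count >= rank:
            h = rank
        else:
            break
    return h


# Toy region: six distinct mutations seen 10, 8, 5, 4, 3 and 1 times (made-up values).
toy_counts = [10, 8, 5, 4, 3, 1]
print("h-index:", mutation_h_index(toy_counts))                      # 4
print("ratio:", mutation_ratio(len(toy_counts), num_residues=75))    # 0.08
```

In the toy region, four mutations each occur at least four times, but there are not five that each occur at least five times, so the h-index is 4. The h-index therefore discounts regions whose mutation count is inflated by many rare, one-off changes.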
Several prominent mutations appeared early in the pandemic with high frequency, including D614G (nt 23,403) in the S-protein and P323L (nt 14,408) in the RNA-dependent RNA polymerase (RdRp) protein, among others. As noted, the authors speculate that the D614G mutation, present in clusters 3–5, may confer a higher transmission ability to the virus, and also mention that its proximity, and that of other mutations, to epitope regions may be relevant to vaccine development. Pachetti et al. analyzed 220 SARS-CoV-2 genomes obtained from patients across the world from December 2019 through mid-March 2020 and investigated mutations in the RdRp gene, including the P323L (nt 14,408) mutation [51]. RdRp is involved in viral replication, and thus likely plays a role in the generation of new mutations in the viral genome. A silent mutation in this region appeared in their data on 9 February in the UK (nucleotide position 14,408), while the non-synonymous P323L mutation appeared on 20 February in Lombardy, Italy. They divided the genomes obtained after 9 February into groups that either had the 14,408 mutation (n = 53) or did not (n = 84) and found that those with the RdRp mutation had a statistically significantly higher number of mutations, with a median number of point mutations of three versus one, respectively (p < 0.001). These data suggest that an RdRp mutation could confer higher mutation rates by interfering with viral proofreading abilities or by some other mechanism.
Chen et al. have reported on the effect of S-protein receptor binding domain (RBD) mutations on viral infectivity [52]. They used a computational approach to estimate the changes in binding affinity between the viral S-protein RBD and the human ACE-2 receptor that occur following mutations found in 13,752 SARS-CoV-2 genomes available in the GISAID database. They assessed the five clusters of genomes presented by Wang et al., with the addition of a small sixth cluster, and found 55 amino acid mutations on the RBD. After evaluating the presence and frequency of each RBD mutation in each cluster, they determined that the mutational patterns of the RBD in five out of the six genome clusters had evolved towards higher RBD-ACE2 binding affinity (except for cluster 3). The authors implied that this points towards a trend of increased infectivity in the SARS-CoV-2 virus, but these results are limited in that only the receptor binding domain of the spike protein was assessed. Other areas of the S-protein or other unknown factors could play important roles in viral infection.
Shen et al. used metatranscriptome NGS technology to evaluate mutational properties of the SARS-CoV-2 virus in eight BALF samples from patients in Wuhan [53]. Using an Illumina HiSeq 2500/4000 platform, they searched for intra-host variants (differing versions of the virus present within an individual patient) using a variety of bioinformatic software applications. Variants had to meet rigorous inclusion criteria to be fit for analysis, including: “(1) sequencing depth ≥ 50, (2) minor allele frequency (MAF) ≥ 5%, (3) MAF ≥ 2% on each strand, (4) minor allele count ≥ 5 on each strand, (5) minor allele supported by the inner part of the read (excluding 10 base pairs on each end), and (6) both alleles identified in ≥3 reads that specifically mapped to the genome of Betacoronavirus [53]”. Sequence depth ranged from 18× to 32,291×, and five samples had 50× depth or greater on >80% of their genome.
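The count-based inclusion criteria quoted above, criteria (1) through (4), can be expressed as a simple per-site filter. The sketch below is an illustration only, not the authors' pipeline: the `SiteCounts` class and `passes_count_filters` function are hypothetical, and criteria (5) and (6) are omitted because they require read-level information (position within the read and taxonomic assignment of each read).

```python
from dataclasses import dataclass


@dataclass
class SiteCounts:
    """Read counts at one genome position, split by strand (hypothetical layout)."""
    total_fwd: int   # all reads covering the site on the forward strand
    total_rev: int   # all reads covering the site on the reverse strand
    minor_fwd: int   # forward-strand reads supporting the minor allele
    minor_rev: int   # reverse-strand reads supporting the minor allele


def passes_count_filters(site, min_depth=50, min_maf=0.05,
                         min_strand_maf=0.02, min_strand_count=5):
    """Apply criteria (1)-(4): depth, overall MAF, per-strand MAF, per-strand count."""
    depth = site.total_fwd + site.total_rev
    if depth < min_depth:                                   # criterion (1)
        return False
    if site.total_fwd == 0 or site.total_rev == 0:
        return False  # per-strand criteria cannot be met; also avoids division by zero
    if (site.minor_fwd + site.minor_rev) / depth < min_maf:  # criterion (2)
        return False
    if site.minor_fwd / site.total_fwd < min_strand_maf:     # criterion (3), forward
        return False
    if site.minor_rev / site.total_rev < min_strand_maf:     # criterion (3), reverse
        return False
    # criterion (4): enough minor-allele reads on each strand
    return site.minor_fwd >= min_strand_count and site.minor_rev >= min_strand_count


# Made-up example sites, not data from the study.
print(passes_count_filters(SiteCounts(60, 40, 6, 5)))   # True: meets all four criteria
print(passes_count_filters(SiteCounts(60, 40, 9, 2)))   # False: only 2 minor reads on reverse strand
```

Requiring support on both strands is a common guard against strand-specific sequencing artifacts, which is presumably why the per-strand thresholds appear alongside the overall frequency cutoff.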
The study reported a median number of intra-host variants to be four with a range of 0–51, but the transmission of these intra-host variants was not observed, indicating that a bottleneck may be associated with the transmission, although more publicly shared sequences are needed to make broad generalizations. The authors note that high numbers of variants present in individual patients may increase viral fitness, making eradication more difficult. Chronically infected individuals could provide opportunities for the virus to improve its evolutionary fitness, but the biological significance of high mutation rates in some individuals is still unknown and a source for further study.
In addition to using sequencing technology to study the SARS-CoV-2 virus, genomic sequencing technology can also be applied to the patients themselves to draw conclusions about the virus. Ellinghaus et al. recently released a genome-wide association study identifying two genetic regions in humans that appear to be associated with severe complications from SARS-CoV-2 [54]. Variants at the ABO blood locus and a region of chromosome 3 both were found in higher proportions in patients with respiratory failure, which they defined as requiring mechanical ventilation or supplemental oxygen. The study assessed 1610 COVID-19 patients with respiratory failure from Spain and Italy and 2205 healthy controls. Patients with A-positive blood types were 1.45 times more likely to have severe COVID-19 complications, while those with type-O blood enjoyed a protective effect, being only around two-thirds (0.65) as likely to have similar problems. The region of chromosome 3 contains several relevant genes. SLC6A20 codes for an amino acid transport protein that interacts with ACE-2, while two other genes in the region code for CCR9 and CXCR6 cytokine receptors, respectively. These genes are sites for urgent investigation as the cytokine receptors are involved in the human immune system, and the SLC6A20 gene’s interaction with ACE-2 indicates a high likelihood that it is involved in viral transmission.
Key findings made possible by NGS technology
1. Information on the spread of SARS-CoV-2 into and across national borders [47,48,50] and identification of the emergence of distinct viral clades throughout the world [50].
2. Mutational rates and characteristics of distinct regions of the SARS-CoV-2 genome [49,50,52].
3. Information on important individual mutations that have an outsized impact on the continued spread of the virus including the D614G and P323L mutations [50,51].
4. Analysis of intra-host SARS-CoV-2 variants [53].
5. Implications of specific human genotypes on susceptibility to developing severe symptoms made possible by human genome sequencing [54].
With the recent discovery of several new variants throughout the world, the need for more ubiquitous sequencing is ever more urgent. In India, there has been a recent surge in new cases along with the identification of a new variant, B.1.617. Previously, India’s cases peaked in September of 2020 with around 90,000 new cases per day, then declined in the following months to numbers nearing 10,000 cases per day. However, with the identification of the new variants, from March 2021 onward there has been a drastic increase in new cases, which now exceed 250,000 per day. Further, a variant discovered to be spreading rapidly in the UK, lineage B.1.1.7 [501Y.V1], has been the source of much media coverage. Models have suggested that this lineage spreads 56% more rapidly than other SARS-CoV-2 lineages [55]. This new strain represents an increasing share of global infections at a point when the virus is already widespread, which suggests that its fitness and/or transmissibility has outmatched that of its peers [56]. On 18 December 2020, the South African government announced a similar emergent strain of the virus, called 501Y.V2.
The new strains of SARS-CoV-2 share an N501Y amino acid mutation on the RBD region of the spike protein, in addition to many other changes present in these viral lineages. The 501Y.V1 variant contains eight changes to the spike protein alone, while the 501Y.V2 variant has nine such changes. One mutation found in the spike protein of the 501Y.V2 variant, E484K, was identified in a preprint on 28 December 2020 as a possible mutation that could confer an immunological escape to SARS-CoV-2 [57].
Thus, screening for mutations and identifying any new strains is critical in the ongoing battle to contain the pandemic. Unfortunately, there has been a lag in screening the phylogenetic evolution of the SARS-CoV-2 genome, including in the US, primarily because resources were directed at qPCR-based COVID-19 diagnostics. The identification of new strains and the upsurge of cases in several parts of the world have led to the adoption of NGS technology on a larger scale. In the US, laboratories testing for SARS-CoV-2 using qPCR have started either to sequence a limited number of cases themselves or to send them to national laboratories for variant monitoring. Though this is an important initial step, there is a need for more laboratories to incorporate NGS technology for variant monitoring, as the process is most effective when all positive cases are screened for variant detection.
Overall, scientists are optimistic that currently available vaccines will be able to suppress the spread of SARS-CoV-2, but these changes are worrisome as the currently available vaccines utilize the spike protein as a way to produce an immunogenic effect in patients. Studies are urgently needed to assess the effect of vaccines against these emerging strains. Thus, there is an urgent need for surveillance testing by NGS to provide a means to keep track of the mutations in the SARS-CoV-2 genome.