Increasingly, clinical samples of all types are being analyzed with the goal of matching patients to targeted therapies. In the summer of 2017, Thermo Fisher received the first FDA approval of a companion diagnostic test covering multiple non-small cell lung cancer (NSCLC) therapies. Foundation Medicine had earlier received FDA approval for FoundationFocus CDxBRCA, which uses the company's Comprehensive Genomic Profiling assay, as a companion diagnostic for an ovarian cancer treatment. Roche's cobas EGFR Mutation Test v2 was approved by the FDA in the summer of 2016 as a PCR-based assay for specific epidermal growth factor receptor (EGFR) mutations in metastatic NSCLC, serving as a companion diagnostic for Tarceva (erlotinib) therapy. Other commercial laboratory-developed tests (LDTs), from companies such as Foundation Medicine and Guardant Health, are actively marketed to oncologists, and regulatory submissions for them are underway. The overall trend is clear: next-generation sequencing (NGS) has arrived in the molecular pathology laboratory, and its use is widening from solid tumor samples to hematological cancers and cell-free DNA.
Pre-analytical variability in sample types
One key consideration in NGS-based pathology workflows is how samples are treated before nucleic acid purification. After a surgical biopsy, the tissue is fixed in formalin and then embedded in paraffin wax so that it can be sectioned with a microtome. The stained sections are then examined by a pathologist, and, if necessary, several slides from the biopsy are provided to the molecular laboratory for analysis. Two problems arise at this stage:
- Variability in the quality of fixation reagents and in incubation time.
- Artifacts in the sequencing data caused by variation in reagents or by deamination (formalin fixation promotes cytosine deamination, which appears as spurious C>T/G>A substitutions).
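Because formalin-induced deamination tends to surface as an excess of low-allele-fraction C>T (and, on the opposite strand, G>A) calls, a laboratory might screen FFPE results for that signature. The Python sketch below is a crude, illustrative check only: the `SnvCall` record, the function names, and the 10% VAF cutoff are hypothetical, and production pipelines typically weigh further evidence before discarding a call.

```python
from dataclasses import dataclass

# C>T on the forward strand, or G>A (the same event read from the opposite strand)
DEAMINATION_CHANGES = {("C", "T"), ("G", "A")}


@dataclass
class SnvCall:
    """Hypothetical minimal record for a single-nucleotide variant call."""
    chrom: str
    pos: int
    ref: str
    alt: str
    vaf: float  # variant allele fraction, 0.0 to 1.0


def is_deamination_type(call: SnvCall) -> bool:
    """True if the substitution matches the C>T/G>A deamination signature."""
    return (call.ref.upper(), call.alt.upper()) in DEAMINATION_CHANGES


def deamination_fraction(calls: list[SnvCall]) -> float:
    """Share of SNV calls that are C>T/G>A; an unusual excess may hint at FFPE damage."""
    if not calls:
        return 0.0
    return sum(is_deamination_type(c) for c in calls) / len(calls)


def flag_possible_ffpe_artifacts(calls: list[SnvCall], max_vaf: float = 0.10) -> list[SnvCall]:
    """Return low-VAF C>T/G>A calls for review; max_vaf is a hypothetical cutoff, not a validated one."""
    return [c for c in calls if is_deamination_type(c) and c.vaf <= max_vaf]
```

In this design a flagged call is a prompt for review rather than an automatic rejection, since genuine low-frequency C>T mutations also exist.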
Sources of inherent sequencing error in raw NGS data
Manufacturers of sequencing instruments have gone to great lengths to obtain the highest possible data quality, but inherent difficulties remain. NGS workflows involve multiple steps, each of which can introduce errors:
- PCR: Library preparation and hybridization-based enrichment involve many rounds of PCR. Although the error rate of proofreading polymerases may be on the order of 1 in 50,000 bases, any error that occurs in an early cycle is copied in every subsequent cycle and is therefore amplified in the final library.
- Cluster amplification: The prepared library then undergoes cluster amplification within the sequencing instrument, which can introduce additional errors.
- Base calling: The bases in the final sequenced library then need to be called, and this process is also prone to errors. For example, a cluster may partially overlap its neighbor, lowering the quality score assigned to that base. Then there is the calling itself: each base carries its own quality score, and each read has an overall quality score as well. What quality is sufficient, and should it be judged at the level of a single base or of the entire read? Is one high-quality mutant read enough? These judgment calls are governed by pages of parameters, and poorly chosen settings can constrain or mislead the base-calling software.
- Variant calling: With an abundance of high-quality sequence reads in hand, the data must then be translated into a Variant Call Format (VCF) file. Variant calling relies on correct alignment of the sequencing reads to a reference genome, which is often not a simple task. Gene fusions, copy number variants, and repetitive sequences such as transposons (all of which play important roles in some cancers) profoundly complicate the variant calling process. Additionally, many important somatic variants may be present in only an extremely small proportion of the sequenced tissue, so the reads supporting them are rare.
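A back-of-the-envelope model helps explain why early PCR errors matter so much. The Python sketch below assumes idealized PCR (every strand copied exactly once per cycle, 100% efficiency) and counts how many strands inherit a single polymerase error. The function names and the 150 bp amplicon length are hypothetical; the 1 in 50,000 error rate is the figure quoted for proofreading enzymes in the PCR step above.

```python
from fractions import Fraction


def mutant_strand_fraction(error_cycle: int, total_cycles: int,
                           start_duplexes: int = 1) -> Fraction:
    """Fraction of final strands carrying a polymerase error that first arose
    in one newly synthesized strand during `error_cycle` (1-based).

    Idealized model: every strand is copied once per cycle, and the copy of an
    error-carrying strand carries the complementary change.
    """
    if not 1 <= error_cycle <= total_cycles:
        raise ValueError("error_cycle must be between 1 and total_cycles")
    strands = 2 * start_duplexes  # both strands of each starting molecule
    mutant = 0
    for cycle in range(1, total_cycles + 1):
        strands *= 2  # each strand templates one new strand
        mutant *= 2   # ...including strands that already carry the error
        if cycle == error_cycle:
            mutant += 1  # the new error lands in one freshly made copy
    return Fraction(mutant, strands)


def expected_errors_per_copy(error_rate: Fraction, length_bp: int) -> Fraction:
    """Expected number of polymerase errors each time a length_bp template is copied."""
    return error_rate * length_bp


if __name__ == "__main__":
    for k in (1, 5, 10):
        print(f"error in cycle {k}:", mutant_strand_fraction(k, total_cycles=30), "of strands")
    # 1 in 50,000 from the text; 150 bp is a hypothetical amplicon length
    print("expected errors per copy:", expected_errors_per_copy(Fraction(1, 50000), 150))
```

Under this model, an error made in cycle 1 of a library built from a single template molecule ends up in a quarter of all strands, enough to look like a genuine variant, whereas the same error in cycle 10 reaches only 1/2048 of strands. More starting molecules dilute the effect proportionally.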
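The per-base versus per-read question raised in the base-calling step can be made concrete. Quality scores in FASTQ files use the Phred scale, Q = -10 log10(P_error), and current Illumina output encodes them as ASCII characters with an offset of 33. The Python sketch below shows how a per-base threshold and a per-read threshold can disagree; the function names and the thresholds `min_base_q=20` and `min_read_q=25` are hypothetical choices for illustration, not settings from any particular pipeline.

```python
import math

PHRED_OFFSET = 33  # Sanger / Illumina 1.8+ FASTQ quality encoding


def phred_scores(quality_string: str) -> list[int]:
    """Decode a FASTQ quality string into integer Phred scores."""
    return [ord(ch) - PHRED_OFFSET for ch in quality_string]


def error_probability(q: int) -> float:
    """Invert the Phred definition Q = -10 * log10(P_error)."""
    return 10 ** (-q / 10)


def mean_read_quality(scores: list[int]) -> float:
    """Average the error probabilities, then convert back to the Phred scale.

    Averaging raw Q values instead would let one very poor base hide
    behind many good ones.
    """
    mean_p = sum(error_probability(q) for q in scores) / len(scores)
    return -10 * math.log10(mean_p)


def passes_filters(quality_string: str, pos: int,
                   min_base_q: int = 20, min_read_q: int = 25) -> bool:
    """Hypothetical rule: the base at `pos` and the read overall must both pass."""
    scores = phred_scores(quality_string)
    return scores[pos] >= min_base_q and mean_read_quality(scores) >= min_read_q


if __name__ == "__main__":
    # Hypothetical 50-base read: Q40 everywhere except one Q10 base at index 10
    read_q = "I" * 10 + "+" + "I" * 39
    print("read-level quality:", round(mean_read_quality(phred_scores(read_q)), 1))
    print("base 0 usable:", passes_filters(read_q, 0))
    print("base 10 usable:", passes_filters(read_q, 10))
```

For this read, the read as a whole clears the hypothetical read-level cutoff, but position 10 fails the base-level cutoff, so whether a variant at that position counts depends on which level the parameters enforce.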
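Why rare somatic variants are hard to call can be shown with a simplified binomial model. The Python sketch below assumes each read at a position independently shows a particular alternate base through error alone with a fixed probability, and asks how many supporting reads are needed before errors become an implausible explanation. The function names, the 1,000x depth, the 0.1% per-read error rate, and the significance level `alpha` are hypothetical; real somatic variant callers use considerably richer error models.

```python
from math import comb


def prob_at_least_k_errors(depth: int, k: int, error_rate: float) -> float:
    """P(X >= k) for X ~ Binomial(depth, error_rate).

    The chance that independent sequencing errors alone produce k or more
    reads showing a given alternate base at one position.
    """
    below_k = sum(
        comb(depth, i) * error_rate**i * (1 - error_rate) ** (depth - i)
        for i in range(k)
    )
    return max(0.0, 1.0 - below_k)


def min_alt_reads(depth: int, error_rate: float, alpha: float = 1e-6) -> int:
    """Smallest alt-read count that errors alone reach with probability below alpha."""
    k = 0
    while prob_at_least_k_errors(depth, k, error_rate) >= alpha:
        k += 1
    return k


def expected_alt_reads(depth: int, vaf: float) -> float:
    """Expected number of reads supporting a true variant at allele fraction vaf."""
    return depth * vaf


if __name__ == "__main__":
    depth, error_rate = 1000, 0.001  # hypothetical depth and per-read error rate
    print("alt reads needed:", min_alt_reads(depth, error_rate))
    print("expected reads for a 0.5% VAF variant:", expected_alt_reads(depth, 0.005))
```

With these assumed inputs the threshold works out to roughly ten supporting reads, while a true variant at 0.5% allele fraction is expected to contribute only about five, which is why low-fraction somatic variants demand deeper sequencing or lower error rates.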