Increasingly, clinical samples of all types are being analyzed with the goal of matching patients to targeted therapies. In the summer of 2017, Thermo Fisher received the first FDA companion diagnostic test approval for multiple non-small cell lung cancer (NSCLC) therapies, while Foundation Medicine received FDA approval for their FoundationFocus CDxBRCA as a companion diagnostic for an ovarian cancer treatment, using their Comprehensive Genomic Profiling assay. Roche’s cobas EGFR Mutation Test v2 was approved by the FDA in the summer of 2016 as a PCR-based assay of specific epidermal growth factor receptor (EGFR) mutations in metastatic NSCLC, serving as a companion diagnostic to Tarceva (erlotinib) therapy. Other commercial laboratory-developed tests (LDTs) from organizations like Foundation Medicine and Guardant Health are actively marketed and sold to oncologists, and their submissions for regulatory approval are underway. The overall trend is clear: next-generation sequencing (NGS) has arrived in the molecular pathology laboratory, widening from examination of solid tumor samples into analysis of hematological cancers and cell-free DNA.
Pre-analytical variability in sample types
One key consideration in NGS-based pathology workflows is how the samples are treated before nucleic acid purification. After a surgical biopsy, tissue is fixed in formalin, and then embedded in paraffin wax for sectioning by a microtome. Finally, stained sections are examined by a pathologist. If necessary, several biopsy slides will be provided to the molecular laboratory for analysis.
Two problems occur at this phase:
- Variability in the reagent quality and the incubation time after preservation.
- Artifacts in the sequencing data as a result of variation in reagents or deamination.
Formalin-Fixed Paraffin-Embedded (FFPE) samples are especially challenging. While FFPE is an important tool for long-term storage at room temperature, the lowered pH can cause acid-induced hydrolysis and fragmentation of DNA. This pH change is caused by the oxidation of formaldehyde by atmospheric oxygen, which produces formic acid. Acid hydrolysis can also leave abasic sites in the template, and during library preparation DNA polymerase may read through such a site. Adenines, and secondarily guanines, may thus be preferentially incorporated opposite the lesion. For additional background, see this reference
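As a rough illustration of how deamination artifacts might be screened for, the sketch below flags single-base calls that match the cytosine deamination signature (C>T, or G>A when read from the opposite strand) and that also sit at a low allele fraction. The function name and the `vaf_ceiling` value are assumptions made for this example, not validated cutoffs or part of any specific pipeline.

```python
# Substitution patterns consistent with cytosine deamination:
# C>T on one strand appears as G>A on the other.
DEAMINATION_PATTERNS = {("C", "T"), ("G", "A")}


def is_possible_ffpe_artifact(ref, alt, vaf, vaf_ceiling=0.10):
    """Return True when a single-base change looks like a deamination artifact.

    vaf_ceiling is an illustrative assumption, not a validated threshold.
    """
    matches_pattern = (ref.upper(), alt.upper()) in DEAMINATION_PATTERNS
    return matches_pattern and vaf < vaf_ceiling


# (reference base, alternate base, variant allele fraction); invented values
calls = [("C", "T", 0.03), ("A", "G", 0.03), ("G", "A", 0.45)]
flags = [is_possible_ffpe_artifact(r, a, v) for r, a, v in calls]
print(flags)
```

A flag of this kind would only mark a call for closer review; a genuine C>T mutation at low frequency would be flagged as well, so it is not a substitute for orthogonal confirmation.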
For cell-free DNA (cfDNA) analysis, a number of research articles have compared standard and proprietary commercial techniques. A recent publication comparing three commonly used blood collection methods (EDTA, Streck and CellSave) indicated a 6-hour window to isolate plasma. However, collection methods for cfDNA still need further examination and clinical verification prior to routine use.
Sources of inherent sequencing error in the NGS raw data
Manufacturers of sequencing instruments have gone to great lengths to obtain the highest possible data quality. However, inherent difficulties remain. There are multiple steps involved in NGS workflows, each of which can result in errors.
- PCR: Many rounds of PCR are involved in library preparation and hybridization-based enrichment. While the error rate using proofreading enzymes may be on the order of 1 in 50,000, any error that occurs in early cycles will be amplified in the final library.
- Cluster amplification: The prepared library undergoes cluster amplification within the sequencing instrument, which may also introduce some level of sequencing error.
- Base calling: The bases in the final sequenced library then need to be called. The base-calling process is also prone to errors. For example, a single cluster may overlap partially with its neighbor, giving a lower quality score for that base. Then there is the calling itself: each base has a quality score attached to it, and each read will have its own quality score. What quality is sufficient, and should it be judged at the level of a single base or of the entire read? Is one high-quality mutant read enough? These judgement calls are encoded in pages of parameters that constrain or direct the base-calling software.
- Variant calling: Now, with an abundance of high-quality sequence reads, the data must be translated into a Variant Call Format (VCF) file. Variant calling relies on proper alignment of the sequencing reads with a reference genome, but this is often not a simple task. Gene fusions, copy number variants, and repetitive sequences such as transposons (which all play important roles in some cancers) profoundly complicate the variant calling process. Additionally, many important somatic variants may make up an extremely small proportion of the sequenced tissue.
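The base-quality questions raised in the list above can be made concrete with Phred scores, where a score Q corresponds to an error probability of 10^(-Q/10). The sketch below decodes a FASTQ quality string (assuming the common Phred+33 encoding), masks individual low-quality bases, and sums the expected number of errors across a read. The function names and the `min_q` default of 20 are choices made for this example, not settings from any particular instrument or pipeline.

```python
import math


def phred_to_error_prob(q):
    """Convert a Phred quality score to the probability the base is wrong."""
    return 10 ** (-q / 10)


def error_prob_to_phred(p):
    """Convert an error probability back to a Phred quality score."""
    return -10 * math.log10(p)


def decode_fastq_quality(qual_string, offset=33):
    """Decode a FASTQ quality string, assuming Phred+33 encoding."""
    return [ord(ch) - offset for ch in qual_string]


def mask_low_quality(seq, quals, min_q=20):
    """Per-base decision: replace bases below min_q with 'N'."""
    return "".join(b if q >= min_q else "N" for b, q in zip(seq, quals))


def expected_errors(quals):
    """Per-read summary: expected number of wrong bases in the read."""
    return sum(phred_to_error_prob(q) for q in quals)


# 'I' decodes to 40, '5' to 20 and '#' to 2 under Phred+33
quals = decode_fastq_quality("II5I#")
masked = mask_low_quality("ACGTA", quals)
read_errors = expected_errors(quals)
print(quals, masked, round(read_errors, 3))
```

The two summaries can disagree: a read may have a good overall score yet carry a poor base at exactly the position of interest, which is why the per-base and per-read cutoffs are usually set separately.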
Additional notes on variant calling
PIK3CA as a case study:
PIK3CA (phosphatidylinositol 3-kinase p110α catalytic subunit) is an important oncogene, one of the most highly mutated oncogenes in human colorectal, breast and liver cancers. PIK3CA has recently been found to be important in determining the clinical benefit of HER2-targeted therapies in breast cancer. However, the genome also contains a pseudogene related to PIK3CA. When you map a sequence read containing a fragment of the PIK3CA target, there are now two places in the genome that it can map to. Your alignment and calling algorithms have to take the pseudogene copy into account when determining whether the sequence read came from the pseudogene or from the genuine gene.
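To illustrate the ambiguity, the toy sketch below assigns a read to whichever candidate locus it matches with fewer mismatches and declines to assign it when the two loci tie. The sequences are invented for the example and are not real PIK3CA or pseudogene sequence; production aligners express this uncertainty through mapping quality scores rather than a simple mismatch count.

```python
def hamming(a, b):
    """Count mismatching positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return sum(x != y for x, y in zip(a, b))


def assign_read(read, candidates):
    """Return the name of the unique best-matching locus, or None on a tie.

    candidates maps a locus name to a reference segment of the read's length.
    """
    scored = sorted((hamming(read, ref), name) for name, ref in candidates.items())
    if len(scored) > 1 and scored[0][0] == scored[1][0]:
        return None  # equally good at two loci: origin is ambiguous
    return scored[0][1]


# Invented toy segments that differ only at index 8 (A versus T)
candidates = {
    "gene": "ACGTTGCAAGCT",
    "pseudogene": "ACGTTGCATGCT",
}

print(assign_read("ACGTTGCAAGCT", candidates))  # exact match to the gene
print(assign_read("ACTTTGCAAGCT", candidates))  # variant elsewhere, still the gene
print(assign_read("ACGTTGCACGCT", candidates))  # change at the distinguishing base
```

The last read differs from both loci at the only position that distinguishes them, so nothing in the read itself can say where it came from. Short amplicons and fragmented FFPE DNA make this situation more likely, because fewer distinguishing bases fall within each read.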
Variant Allele Frequency:
The clinical oncology market for solid tumor analysis has a nominal variant allele frequency floor of 5%, below which standard informatics pipelines will refuse to make a call. Pillar Biosciences, however, is able to go down to 2% from FFPE samples, and as low as 1% from high-quality samples, owing to its enrichment technology and informatics pipeline.
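As a minimal sketch of what a frequency floor means in practice, the example below computes variant allele frequency as the fraction of reads at a position that carry the variant, then compares it with a configurable floor. The function names and read counts are invented for illustration, and this is not a description of any vendor's pipeline; only the 5%, 2% and 1% floors come from the text above.

```python
def variant_allele_frequency(alt_reads, total_reads):
    """Fraction of reads covering a position that support the variant."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    if not 0 <= alt_reads <= total_reads:
        raise ValueError("alt_reads must be between 0 and total_reads")
    return alt_reads / total_reads


def passes_floor(alt_reads, total_reads, vaf_floor=0.05):
    """Return True when the observed frequency meets the calling floor."""
    return variant_allele_frequency(alt_reads, total_reads) >= vaf_floor


# Invented example: 30 variant reads out of 1000 covering the position (3%)
vaf = variant_allele_frequency(30, 1000)
for floor in (0.05, 0.02, 0.01):
    print(f"floor {floor:.0%}: call made = {passes_floor(30, 1000, floor)}")
```

In this example the same 3% variant is missed at the 5% floor but called at 2% or 1%. Lowering the floor only helps if the error sources described earlier are controlled, since at low frequencies a handful of erroneous reads can look like a real variant.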