To complement next-generation sequencing technologies, there is a pressing need for efficient pre-sequencing capture methods with reduced cost and DNA requirements. The Alu family of short interspersed nucleotide elements is the most abundant type of transposable element in the human genome, with over one million Alu elements identified. We have used inter-Alu PCR with an enhanced range of amplicons, in conjunction with next-generation sequencing, to generate an Alu-anchored scan, or 'AluScan'. To illustrate the method, one pair of glioma DNA samples was sequenced by means of AluScan. The more than 10 Mb of sequence obtained, derived from more than 8,000 genes, showed highly reproducible capture of the genome. In addition, 341 somatic indels and 274 somatic SNVs were identified. We therefore suggest AluScan as a good alternative for accelerating genomic studies. Meanwhile, an exploration of cancer genomics was also performed based on Affymetrix microarray data. We examined the possible use of machine learning to reveal associations between recurrent copy number variations (CNVs) and predisposition to cancer. Recurrent focal constitutional CN-gains and CN-losses were identified from the non-tumor and tumor blood cells of Caucasian and Korean cohorts, respectively. In both instances, highly significant differences were revealed with respect to the CNV signatures identified by (a) Correlation-based Feature Selection (CFS), (b) Frequency-based Selection, and (c) Classifier-based Selection. The extensive discrimination between cancer patients and normal individuals, with average prediction accuracies of 93.6% and 86.5%, indicated a possible predisposition to cancer based on recurrent CNVs. Inspired by these findings, we also attempted to call CNVs using AluScan data.
However, the special features of AluScan data have rendered them inaccessible to analysis by most algorithms designed for calling CNVs from whole-genome sequencing and exome-capture data, which require a paired control sample to proceed. Accordingly, in the present study an 'AluScanCNV' method has been developed to call CNVs from AluScans, using a group of reference samples to construct a reference template, a transformed distribution of the read-depth ratio between sequence windows on the target sequence and the reference template to call local CNVs, a Poisson binomial distribution to identify recurrent CNVs, and sequential merging of windows to reveal large CNVs. Application of the method to the AluScans of 21 non-cancerous and 39 cancer tissues led to the identification of an average of 532 local CNVs with a length of 500 kb per AluScan; a total of 49 recurrent CNVs with copy number gain and 65 recurrent CNVs with copy number loss in liver cancer samples; and a total of 6 and 12 large CNVs, including the well-known deletions on chromosomes 1q and 19p, in two glioma samples, and on chromosome 9 in one of the gliomas. The AluScanCNV method was found to be very robust when applied to CNV calling from AluScan data. In addition, it can be applied without loss of generality to other next-generation sequencing data. Since the method does not require a paired control, it can also be employed to identify germline CNVs in normal samples. The utility of the method for calling recurrent CNVs extensively broadens the scope of systematic analysis of recurrent CNV regions. The current work has established a standard analysis flow for AluScan data as well as other targeted sequencing data. In addition, a simple statistical model for calculating recurrent patterns was applied for the first time to next-generation sequencing data, which could enable AluScan to serve as a quick and robust method for detecting germline and somatic CNVs in cancer patients.
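The calling steps described above can be illustrated with a minimal Python sketch. It is not the AluScanCNV implementation: the function names, the total-read normalization, the choice of a log2 transform for the read-depth ratio, and the use of per-sample chance probabilities as inputs to the Poisson binomial test are all assumptions made for illustration. The abstract states only that a reference template, a transformed read-depth ratio, and a Poisson binomial distribution are used.

```python
import math
from typing import List, Sequence


def normalize(depths: Sequence[float]) -> List[float]:
    """Scale per-window read counts so they sum to 1 (assumed normalization)."""
    total = sum(depths)
    if total <= 0:
        raise ValueError("sample has no reads")
    return [d / total for d in depths]


def build_reference_template(reference_samples: Sequence[Sequence[float]]) -> List[float]:
    """Average the normalized per-window depths over a group of reference samples."""
    normalized = [normalize(sample) for sample in reference_samples]
    return [sum(column) / len(normalized) for column in zip(*normalized)]


def log2_ratios(sample: Sequence[float], template: Sequence[float],
                pseudo: float = 1e-9) -> List[float]:
    """Per-window log2 ratio of sample depth to template depth.

    The log2 transform is an assumption; the source only says 'transformed'.
    The pseudo-count avoids division by zero in empty windows.
    """
    return [math.log2((s + pseudo) / (t + pseudo))
            for s, t in zip(normalize(sample), template)]


def poisson_binomial_pmf(probs: Sequence[float]) -> List[float]:
    """P(K = k), k = 0..n, for independent Bernoulli trials with unequal probabilities."""
    pmf = [1.0]
    for p in probs:
        updated = [0.0] * (len(pmf) + 1)
        for k, mass in enumerate(pmf):
            updated[k] += mass * (1.0 - p)   # this sample has no CNV in the window
            updated[k + 1] += mass * p       # this sample has a CNV in the window
        pmf = updated
    return pmf


def recurrence_p_value(probs: Sequence[float], observed: int) -> float:
    """P(K >= observed): chance that at least `observed` samples share a CNV window.

    `probs[i]` is the hypothetical probability that sample i shows a CNV in any
    given window by chance, e.g. its fraction of windows called as CNV.
    """
    return sum(poisson_binomial_pmf(probs)[max(observed, 0):])
```

Under these assumptions, a window whose `recurrence_p_value` is small across the cohort would be a candidate recurrent CNV. The Poisson binomial is used instead of an ordinary binomial because each sample can carry a different background CNV rate.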
Based on the findings of this study, these recurrent CNVs could be used as a complex signature to distinguish cancer patients from normal individuals. The current study highlights the importance of recurrent CNV patterns in blood samples from cancer patients, which might shed light on a new biomarker for cancer prognosis and early detection.
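As a rough illustration of how recurrent CNVs might be screened as signature features, the sketch below keeps CNV regions whose occurrence frequency differs between cancer and control cohorts. This is only one plausible reading of "Frequency-based Selection"; the function names, the binary present/absent encoding, and the `min_diff` threshold are hypothetical and are not taken from the thesis.

```python
from typing import List, Sequence


def cnv_frequency(samples: Sequence[Sequence[int]]) -> List[float]:
    """Fraction of samples carrying each CNV region (rows: samples, 1 = present)."""
    return [sum(column) / len(samples) for column in zip(*samples)]


def select_by_frequency(cases: Sequence[Sequence[int]],
                        controls: Sequence[Sequence[int]],
                        min_diff: float = 0.3) -> List[int]:
    """Indices of CNV regions whose frequency differs by at least `min_diff`
    between cases and controls (the threshold is an illustrative assumption)."""
    case_freq = cnv_frequency(cases)
    control_freq = cnv_frequency(controls)
    return [index
            for index, (a, b) in enumerate(zip(case_freq, control_freq))
            if abs(a - b) >= min_diff]
```

The selected region indices could then be passed as features to any standard classifier. Prediction accuracy should be estimated on samples held out from the selection step, so that the selection itself does not inflate the result.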
| Date of Award | 2014 |
|---|---|
| Original language | English |
| Awarding Institution | The Hong Kong University of Science and Technology |
AluScan data analyses and copy number variation-based prediction of cancer risk
Ding, X. (Author). 2014
Student thesis: Doctoral thesis