Identification of structural variants in the human genome using long-read next-generation sequencing with Oxford Nanopore technology

Project period: 2018 - 2019
Project leader: Michał Piętal

DNA sequence analysis is a foundation of contemporary biological sciences, and of other fields as well. DNA sequencing technologies drive the development of molecular research and broaden our knowledge of human biology. For many years, the core sequencing technique was Sanger’s method. Today, next-generation sequencing (NGS) techniques are gaining significance. They enable the development and extension of molecular methods by eliminating the main shortcomings of conventional sequencing techniques, namely cost and time. The main advantage of NGS techniques over conventional methods is the simultaneous detection of a large number of genetic markers, yielding high-resolution genetic data. It has become possible to obtain genetic data from microorganisms, insects, plants, or soil (so-called environmental DNA, eDNA).

As an example, the 1000 Genomes Project (1KGP), established in 2008, is an international research effort that aims to create the most comprehensive repository of genetic variability in the human genome. The researchers involved initially planned to sequence a thousand individual genomes from around the world, across various ethnic groups. The first paper documenting the project was published in Nature in 2010. The aim of the final project phase was to characterize genetic variability across 26 selected populations, in order to create a catalogue of human genome variability on a global scale. According to the 1000 Genomes Phase 3 paper (Nature, 2015), structural variants are a natural source of human genome variability, affecting nearly 20 million bases of sequence. A typical human genome contains approximately 2,100 to 2,500 structural variants, such as deletions, duplications, insertions, and inversions, with large deletions being the most frequent. Research has established a connection between structural variants and many disease-related phenotypes, including obesity, cancer, autism, and schizophrenia. Despite the enormous progress in sequencing techniques, detection of structural variants remains challenging because of the complex nature of the variants, the significant number of long variants, and the tendency of variants to occur in repetitive genomic loci. Technologies such as the Nanopore MinION, which produce longer reads, are better suited to this task than short-read techniques. Long reads can span entire repetitive regions, which may help resolve gaps in the sequence being analyzed. Moreover, they allow the identification of very large structural variants contained within a single read.
Furthermore, the large number of structural variants whose occurrence is correlated with single nucleotide polymorphisms (SNPs) reported in genome-wide association studies (GWAS) underlines the importance of improving the sensitivity of variant detection and genotyping in clinical cases. The prediction of genetic variability may also be important for autoimmune diseases, whose etiology is complex and not yet fully understood. In such cases, the development of a method for detecting and analyzing structural variants would contribute to the development of personalized medicine.
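
To illustrate how a structural variant contained within a single long read can become visible after alignment, the following minimal Python sketch scans the CIGAR string of one aligned read and reports large deletions and insertions. It is only an illustration under stated assumptions and not the algorithm used in this project: the function name `sv_candidates_from_cigar`, the `SVCandidate` record, and the 50 bp minimum length (a commonly used convention for structural variants) are all hypothetical choices made for this example.

```python
import re
from typing import List, NamedTuple


class SVCandidate(NamedTuple):
    """Hypothetical record for one candidate variant found in a read."""
    sv_type: str    # "DEL" or "INS"
    ref_start: int  # reference position where the event begins
    length: int     # event length in base pairs


# CIGAR operations as defined in the SAM format: a length followed by an op code.
CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")

# Operations that advance the position on the reference sequence.
REF_CONSUMING = set("MDN=X")


def sv_candidates_from_cigar(cigar: str, ref_start: int,
                             min_len: int = 50) -> List[SVCandidate]:
    """Report deletions and insertions of at least min_len bp in one alignment.

    Assumption: min_len = 50 is an illustrative threshold, not a project setting.
    """
    candidates = []
    ref_pos = ref_start
    for length_str, op in CIGAR_RE.findall(cigar):
        length = int(length_str)
        if op == "D" and length >= min_len:
            candidates.append(SVCandidate("DEL", ref_pos, length))
        elif op == "I" and length >= min_len:
            candidates.append(SVCandidate("INS", ref_pos, length))
        if op in REF_CONSUMING:
            ref_pos += length
    return candidates


if __name__ == "__main__":
    # A made-up long-read alignment with a 300 bp deletion and a 120 bp insertion.
    for candidate in sv_candidates_from_cigar("1000M300D500M120I2000M", 10000):
        print(candidate)
```

In practice, the calls from many reads covering the same locus would have to be clustered and filtered, and other variant types such as inversions or duplications require split-read evidence that a single CIGAR string does not capture.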

MinION (Oxford Nanopore) is a new technology that has been available to selected users since 2015. The breakthrough it brings rests on the following key features: (i) ease of use, (ii) mobility/portability, (iii) long reads (up to about 20 kbp). The technology can be used in the following application areas: rapid viral identification, monitoring of Ebola or influenza viruses, environmental monitoring, food safety monitoring, SV analysis of malignant cells, haplotyping, fetal DNA analysis, antibiotic resistance detection, and many more.

The main aim of our research project is to increase the applicant's competence in experimental technologies such as Oxford Nanopore long-read sequencing. Another aim is to generate experimental data (which are not publicly available to date) for subsequent bioinformatics research and for the submission of research grants. The research project to be submitted as a proposal (within up to 12 months) envisages the creation of bioinformatics tools and web services for the analysis of the human genome (and others), and for the analysis of the correspondence between rare diseases and the occurrence of structural variants (SVs). The samples to be sequenced originate from 1000 Genomes families (parents and children). DNA was isolated from peripheral blood (according to the protocols used in the 1000 Genomes project). Using the methodology of the 1000 Genomes consortium together with our own algorithms, the main types of structural variants will be identified.