Posted by Administrator on 2016-06-15
Syntekabio takes part in this workshop, held December 2-4, 2015, and presents the genomic big data environment it has developed.
IDENTIFYING RARE GERMLINE MUTATIONS IN 2,200 NGS GERMLINE DATA FROM ICGC
Jongsun Jung1, Jonghui Hong1, Wan Choi2, Ho-Youl Jung2 and HyungLae Kim3
1GDIC (Genome Data Integration Center), Syntekabio, Inc., 981 Venture Town, Korea Institute of Science and Technology, Seoul, 136-130, Korea; 2Cloud Computing Research Department, ETRI, 218 Gajeongno Yuseong-gu, Daejeon, 305-700, Korea; 3PGM21 (Personalized Genomic Medicine21), Ewha Womans University Medical Center, 1071, Anyangcheon-ro, Yangcheon-gu, 158-710, Seoul, Korea.
A novel calling and ranking algorithm for somatic and germline variants, ADISCAN (allelic depth and imbalance scanning), was used to identify rare germline mutations in 2,200 cancer germline data sets (CGD) from ICGC and 2,600 normal germline data sets (NGD) from the 1000 Genomes Project. Integrating and processing these two large data sets was a major technical challenge, which was handled on the MAHA supercomputing infrastructure from ETRI (Electronics and Telecommunications Research Institute) and Syntekabio.
In this analysis, we generated a matrix with the 2,600 normal germline samples on the y-axis and the 3 billion nucleotides of the human genome on the x-axis, from which we computed the minor allele frequency (MAF) at every position of the whole genome. In addition, positions from both NGD and CGD were filtered against a non-synonymous variant database.
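As a rough illustration of the per-position MAF computation described above, the following Python sketch works on a toy samples-by-positions matrix. It is not the ADISCAN or MAHA implementation. It assumes diploid, biallelic genotypes encoded as alternate-allele counts (0, 1, 2) with `None` for a missing call; the function name `minor_allele_frequencies` and the toy data are hypothetical.

```python
from typing import List, Optional

Genotype = Optional[int]  # alternate-allele count 0/1/2, or None if not called


def minor_allele_frequencies(genotypes: List[List[Genotype]]) -> List[Optional[float]]:
    """Return the MAF for each column (position) of a samples x positions matrix.

    Assumption: diploid, biallelic sites. A position with no called
    genotypes yields None.
    """
    if not genotypes:
        return []
    n_positions = len(genotypes[0])
    mafs: List[Optional[float]] = []
    for p in range(n_positions):
        # Collect the called genotypes of all samples at this position.
        called = [row[p] for row in genotypes if row[p] is not None]
        if not called:
            mafs.append(None)
            continue
        # Each called sample contributes two alleles.
        alt_freq = sum(called) / (2 * len(called))
        # The minor allele is whichever of ref/alt is less frequent.
        mafs.append(min(alt_freq, 1.0 - alt_freq))
    return mafs


# Toy matrix: 4 samples (rows) x 4 positions (columns); values are made up.
toy = [
    [0, 1, 2, 0],
    [0, 0, 2, None],
    [0, 1, 1, 0],
    [0, 0, 2, 0],
]
toy_mafs = minor_allele_frequencies(toy)
print(toy_mafs)
```

A matrix of 2,600 samples by 3 billion positions would not be held in nested Python lists; the sketch only shows the arithmetic applied at each position.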
All known coding-sequence positions of each CGD sample were then checked against the NGD minor allele frequencies to identify rare germline mutations. In this presentation, rare germline mutations are defined simply as those with MAF < 0.01.
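The rare-variant definition above (MAF < 0.01) can be sketched as a simple lookup-and-filter step. This is only an illustration under stated assumptions, not the authors' pipeline: variants are keyed by a hypothetical `(chromosome, position, ref, alt)` tuple, the NGD frequencies are assumed to be available as a dictionary, a variant absent from NGD is assumed to have MAF 0, and all coordinates and frequencies shown are invented.

```python
from typing import Dict, List, Tuple

Variant = Tuple[str, int, str, str]  # (chromosome, position, ref, alt); hypothetical key

RARE_MAF_THRESHOLD = 0.01  # from the text: rare means MAF < 0.01


def rare_germline_variants(
    cancer_variants: List[Variant],
    normal_maf: Dict[Variant, float],
    threshold: float = RARE_MAF_THRESHOLD,
) -> List[Variant]:
    """Keep the cancer-germline variants whose normal-cohort MAF is below threshold."""
    rare = []
    for variant in cancer_variants:
        # Assumption: a variant never observed in the normal cohort has MAF 0.
        maf = normal_maf.get(variant, 0.0)
        if maf < threshold:
            rare.append(variant)
    return rare


# Invented example data, for illustration only.
ngd_maf = {
    ("chr1", 1000, "A", "G"): 0.20,    # common in the normal cohort
    ("chr2", 2000, "C", "T"): 0.004,   # rare in the normal cohort
    ("chr3", 3000, "G", "A"): 0.01,    # exactly at the threshold, so not rare
}
cgd_sample = [
    ("chr1", 1000, "A", "G"),
    ("chr2", 2000, "C", "T"),
    ("chr3", 3000, "G", "A"),
    ("chr4", 4000, "T", "C"),          # not seen in the normal cohort
]
rare = rare_germline_variants(cgd_sample, ngd_maf)
print(rare)
```

The comparison is strict, so a variant with MAF exactly 0.01 is not counted as rare, matching the "MAF < 0.01" wording.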
Overall, the statistics and functional consequences of the rare germline mutations, with regard to cancer predisposition and risk, will be discussed further at the 11th Scientific Workshop of the International Cancer Genome Consortium in Mumbai, India.