




Introduction to Next-Generation Sequencing Data Analysis
Tong Chunfa, 2013.12.23

Main contents
- Principles and workflow of resequencing
- Data structure and quality assessment
- The SRA database and data retrieval
- Using Bowtie2, BWA and SAMtools

Principles and workflow of resequencing

Data structure and quality assessment
- FASTQ format
- FastQC

FASTQ format

A FASTQ file containing a single sequence might look like this:

    @SEQ_ID
    GATTTGGGGTTCAAAGCAGTATCGATCAAATAGTAAATCCATTTGTTCAACTCACAGTTT
    +
    !''*((((***+))%%%++)(%%%%).1***-+*''))**55CCF>>>>>>CCCCCCC65

Illumina sequence identifiers

    @HWUSI-EAS100R:6:73:941:1973#0/1

Versions of the Illumina pipeline since 1.4 appear to use #NNNNNN instead of #0 for the multiplex ID, where NNNNNN is the sequence of the multiplex tag.

With Casava 1.8 the format of the line has changed:

    @EAS139:136:FC706VJ:2:2104:15343:197393 1:Y:18:ATCACG

Quality

A quality value Q is an integer mapping of p, the probability that the corresponding base call is incorrect.

Phred quality score:

    Q_sanger = -10 log10(p)

The Solexa pipeline (the software delivered with the Illumina Genome Analyzer) earlier used:

    Q_solexa = -10 log10(p / (1 - p))

Encoding
- Sanger format can encode a Phred quality score from 0 to 93 using ASCII 33 to 126.
- Illumina's newest pipeline version (CASAVA 1.8) produces FASTQ directly in Sanger format.
- Solexa/Illumina 1.0 format can encode a Solexa/Illumina quality score from -5 to 62 using ASCII 59 to 126.
- Starting with Illumina 1.3 and before Illumina 1.8, the format encoded a Phred quality score from 0 to 62 using ASCII 64 to 126.
- Starting in Illumina 1.5 and before Illumina 1.8, the Phred scores 0 to 2 have a slightly different meaning.

[Slide: American Standard Code for Information Interchange (ASCII) table]

FastQC

Basic Statistics

    Measure              Value
    Filename             NHS066-47_L4_1.fq.gz
    File type            Conventional base calls
    Encoding             Sanger / Illumina 1.9
    Total Sequences      3992798
    Filtered Sequences   0
    Sequence length      100
    %GC                  37

Per Base Sequence Quality
- The central red line is the median value.
- The yellow box represents the inter-quartile range (25-75%).
- The upper and lower whiskers represent the 10% and 90% points.
- The blue line represents the mean quality.

Per Sequence Quality Scores
- A warning is raised if the most frequently observed mean quality is below 27, which equates to a 0.2% error rate.
- An error is raised if the most frequently observed mean quality is below 20, which equates to a 1% error rate.

Per Base Sequence Content
- Warning: the difference between A and T, or between G and C, is greater than 10% at any position.
- Failure: the difference between A and T, or between G and C, is greater than 20% at any position.

Per Base GC Content
- Warning: the GC content of any base strays more than 5% from the mean GC content.
- Failure: the GC content of any base strays more than 10% from the mean GC content.

Per Sequence GC Content
- Warning: the sum of the deviations from the normal distribution represents more than 15% of the reads.
- Failure: the sum of the deviations from the normal distribution represents more than 30% of the reads.

Per Base N Content
- Warning: any position shows an N content above 5%.
- Error: any position shows an N content above 20%.

Sequence Length Distribution
- Warning: not all sequences are the same length.
- Error: any of the sequences has zero length.

Duplicate Sequences
- Warning: non-unique sequences make up more than 20% of the total.
- Error: non-unique sequences make up more than 50% of the total.

Overrepresented Sequences

    Sequence                                            Count   Percentage  Possible source
    AATTAGATCGGAAGAGCACACGTCTGAACTCCAGTCACTCGAAGATCTCG  65311   1.636       TruSeq Adapter, Index 10 (97% over 36bp)
    ATTAGATCGGAAGAGCACACGTCTGAACTCCAGTCACTCGAAGATCTCGT  6464    0.162       TruSeq Adapter, Index 10 (97% over 36bp)
    AATAGATCGGAAGAGCACACGTCTGAACTCCAGTCACTCGAAGATCTCGT  4633    0.116       TruSeq Adapter, Index 10 (97% over 36bp)
    AATTAGTCGGAAGAGCACACGTCTGAACTCCAGTCACTCGAAGATCTCGT  4463    0.112       TruSeq Adapter, Index 10 (97% over 34bp)
    AATTATGGATAATTAAAGTATTCCCCCCTTTTTTTTATGATATTTTTGAC  3994    0.100       No Hit

- Warning: any sequence represents more than 0.1% of the total.
- Failure: any sequence represents more than 1% of the total.

Overrepresented Kmers
- Warning: any k-mer is enriched more than 3-fold overall, or more than 5-fold at any individual position.
- Error: any k-mer is enriched more than 10-fold at any individual base position.

Saving a Report

    NHS066-47_L4_1.fq_fastqc.zip

The SRA database and data retrieval

Viewing and downloading SRR576183

fastq-dump converts SRA files to FASTQ format:

    fastq-dump --split-files -DQ "+" ./SRR576183.sra
    fastq-dump --split-files -DQ "+" --gzip ./SRR576183.sra

FASTQ data can also be downloaded directly:

    ftp://ftp.era.ebi.ac.uk/vol1/fastq/SRR576/SRR576183

Aligning reads to a reference sequence
- BWA
- Bowtie2
- SOAP
- SAMtools

BWA

Websites: http:/bio- and https:/ [both URLs truncated in source]

Install:

    wget http:/ [URL truncated in source]
    tar -xjvf bwa-0.7.5a.tar.bz2
    cd bwa-0.7.5a
    make

Download test.tar.gz from 93 [location truncated in source].

Index the reference, then align single-end and paired-end reads:

    ./bwa-0.7.5a/bwa index ref.fa
    ./bwa-0.7.5a/bwa mem ref.fa test_PE1.fa > aln-se.sam
    ./bwa-0.7.5a/bwa mem ref.fa test_PE1.fa test_PE2.fa > aln-pe.sam

Bowtie2

Website: http:/bowtie- [URL truncated in source]

Download bowtie2-2.1.0-linux-x86_64.zip, then:

    unzip bowtie2-2.1.0-linux-x86_64.zip
    mv bowtie2-2.1.0 bowtie2
    cd bowtie2/example
    mkdir work
    cd work

Index a reference genome:

    ../../bowtie2-build ../reference/lambda_virus.fa lambda_virus

Align single-end reads:

    ../../bowtie2 -x lambda_virus -U ../reads/reads_1.fq -S eg1.sam

Align paired-end reads:

    ../../bowtie2 -x lambda_virus -1 ../reads/reads_1.fq -2 ../reads/reads_2.fq -S eg2.sam

Options:
- -x: basename of the index
- -U: unpaired reads
- -1, -2: files with mate 1 and mate 2 reads
- -S: output file in SAM format

SAM output

Each alignment line has the following tab-separated fields:

1. Name of the read that aligned.
2. Sum of all applicable flags. Flags relevant to Bowtie are:
   - 1: The read is one of a pair.
   - 2: The alignment is one end of a proper paired-end alignment.
   - 4: The read has no reported alignments.
   - 8: The read is one of a pair and its mate has no reported alignments.
   - 16: The alignment is to the reverse reference strand.
   - 32: The other mate in the paired-end alignment is aligned to the reverse reference strand.
   - 64: The read is mate 1 in a pair.
   - 128: The read is mate 2 in a pair.
3. Name of the reference sequence where the alignment occurs.
4. 1-based offset into the forward reference strand where the leftmost character of the alignment occurs.
5. Mapping quality.
6. CIGAR string representation of the alignment.
7. Name of the reference sequence where the mate's alignment occurs. Set to = if the mate's reference sequence is the same as this alignment's, or * if there is no mate.
8. 1-based offset into the forward reference strand where the leftmost character of the mate's alignment occurs. The offset is 0 if there is no mate.
9. Inferred fragment size. The size is negative if the mate's alignment occurs upstream of this alignment, and 0 if there is no mate.
10. Read sequence (reverse-complemented if aligned to the reverse strand).
11. ASCII-encoded read qualities (reversed if the read aligned to the reverse strand). The encoded quality values are on the Phred quality scale and the encoding is ASCII-offset by 33 (ASCII char !), as in a FASTQ file.
12. Optional fields. Fields are tab-separated. bowtie2 outputs zero or more of these optional fields for each alignment, depending on the type of the alignment.

Optional fields (field 12):
- AS:i: Alignment score. Only present if the SAM record is for an aligned read.
- XS:i: Alignment score for the second-best alignment. Only present if the SAM record is for an aligned read and more than one alignment was found for the read.
- YS:i: Alignment score for the opposite mate in the paired-end alignment. Only present if the SAM record is for a read that aligned as part of a paired-end alignment.
- XN:i: The number of ambiguous bases in the reference covering this alignment. Only present if the SAM record is for an aligned read.
- XM:i: The number of mismatches in the alignment. Only present if the SAM record is for an aligned read.
- XO:i: The number of gap opens, for both read and reference gaps, in the alignment. Only present if the SAM record is for an aligned read.
- XG:i: The number of gap extensions, for both read and reference gaps, in the alignment. Only present if the SAM record is for an aligned read.
- NM:i: The edit distance, that is, the minimal number of one-nucleotide edits (substitutions, insertions and deletions) needed to transform the read string into the reference string. Only present if the SAM record is for an aligned read.
- YP:i: Equals 1 if the read is part of a pair that has at least N concordant alignments, where N is the argument specified to -M plus one. Equals 0 if the read is part of a pair that has fewer than N alignments. For example, if -M 2 is specified and 3 distinct, concordant paired-end alignments are found, YP:i:1 is printed. If fewer than 3 are found, YP:i:0 is printed. Only present if the SAM record is for a read that aligned as part of a paired-end alignment.
- YM:i: Equals 1 if the read aligned with at least N unpaired alignments, where N is the argument specified to -M plus one. Equals 0 if the read aligned with fewer than N unpaired alignments. For example, if -M 2 is specified and 3 distinct, valid, unpaired alignments are found, YM:i:1 is printed. If fewer than 3 are found, YM:i:0 is printed. Only present if the SAM record is for a read that Bowtie 2 attempted to align in an unpaired fashion.
- YF:Z: String indicating the reason why the read was filtered out. Only appears for reads that were filtered out.
- MD:Z: A string representation of the mismatched reference bases in the alignment. Only present if the SAM record is for an aligned read.

SAMtools

Website: http:/ [URL truncated in source]

Install SAMtools: download samtools-0.1.19.tar.bz2 and unpack it:

    tar xjvf samtools-0.1.19.tar.bz2

Or:

    git clone git:/ [URL truncated in source]

Then build:

    cd samtools-0.1.19
    make

SAMtools: Primer / Tutorial ([URL truncated in source] /samtools_primer.html)
- Sample Data Files
- Aligning Reads Using Bowtie2
- Converting SAM to BAM
- Sorting and Indexing
- Identifying Genomic Variants
- Understanding [truncated in source]
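
The relationship between quality characters, Phred scores and error probabilities described in the Quality and Encoding slides can be checked with a few lines of Python. This is a minimal sketch: the function names are illustrative and not part of any tool mentioned above, and the offsets 33 and 64 are the ones listed for the Sanger and Illumina 1.3 to 1.7 encodings.

```python
import math


def phred_to_error_prob(q):
    """Invert Q = -10 * log10(p) to get the error probability p."""
    return 10 ** (-q / 10)


def error_prob_to_phred(p):
    """Phred score for an error probability p (0 < p <= 1)."""
    return -10 * math.log10(p)


def decode_quality(qual, offset=33):
    """Convert a FASTQ quality string to a list of integer scores.

    offset=33 assumes Sanger / Illumina 1.8+ encoding;
    offset=64 assumes Illumina 1.3 to 1.7 encoding.
    """
    return [ord(ch) - offset for ch in qual]


if __name__ == "__main__":
    print(decode_quality("!5CF"))             # Sanger-encoded characters
    print(decode_quality("h", offset=64))     # Illumina 1.3+ encoded character
    for q in (20, 27):
        print(q, phred_to_error_prob(q))
```

Scores of 20 and 27 correspond to error probabilities of 1% and roughly 0.2%, which are the thresholds FastQC uses in its Per Sequence Quality Scores module.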
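
The four-line FASTQ layout and the Casava 1.8 identifier shown earlier can be read with a small parser. The sketch below is illustrative only: the function names and dictionary keys are hypothetical, the field meanings follow the commonly documented Casava 1.8 layout (instrument, run, flowcell, lane, tile, x, y, then read number, filter flag, control bits, index), and the short sequence and quality strings are made up for the example.

```python
import io


def read_fastq(handle):
    """Yield (identifier, sequence, quality) from a 4-line-per-record FASTQ."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            return  # end of file
        seq = handle.readline().rstrip("\n")
        handle.readline()  # '+' separator line, ignored
        qual = handle.readline().rstrip("\n")
        if not header.startswith("@") or len(seq) != len(qual):
            raise ValueError("malformed FASTQ record: " + header)
        yield header[1:], seq, qual


def parse_casava18_id(identifier):
    """Split a Casava 1.8 identifier into named fields (names are illustrative)."""
    left, right = identifier.split(" ", 1)
    instrument, run, flowcell, lane, tile, x, y = left.split(":")
    read, filtered, control, index = right.split(":")
    return {
        "instrument": instrument,
        "run": int(run),
        "flowcell": flowcell,
        "lane": int(lane),
        "tile": int(tile),
        "x": int(x),
        "y": int(y),
        "read": int(read),
        "filtered": filtered == "Y",
        "control": int(control),
        "index": index,
    }


# Made-up 10-base record using the identifier from the slides.
EXAMPLE = "@EAS139:136:FC706VJ:2:2104:15343:197393 1:Y:18:ATCACG\nGATTTGGGGT\n+\n!''*((((**\n"

if __name__ == "__main__":
    for name, seq, qual in read_fastq(io.StringIO(EXAMPLE)):
        print(parse_casava18_id(name), seq, qual)
```

For real data an established parser is usually a better choice. The sketch only shows how the fields line up.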
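
Because the SAM flag in field 2 is a sum of the individual bit values listed above, it can be decoded with bitwise tests. The following sketch assumes only the eight flag values named in the slides. The short labels are illustrative, and bit 8 is labelled with its SAM specification meaning (the mate is unaligned).

```python
# Bit value -> short label (labels are illustrative, not official names).
SAM_FLAGS = {
    1: "paired",
    2: "proper_pair",
    4: "unaligned",
    8: "mate_unaligned",
    16: "reverse_strand",
    32: "mate_reverse_strand",
    64: "mate1",
    128: "mate2",
}


def decode_flag(flag):
    """Return the labels of all bits set in a SAM flag value."""
    return [label for bit, label in SAM_FLAGS.items() if flag & bit]


if __name__ == "__main__":
    for value in (99, 147, 4):
        print(value, decode_flag(value))
```

For example, 99 = 1 + 2 + 32 + 64 describes mate 1 of a properly paired alignment whose mate lies on the reverse strand, and 147 = 1 + 2 + 16 + 128 describes the matching mate 2.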
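
The field layout of a SAM alignment line (eleven mandatory columns followed by optional TAG:TYPE:VALUE fields) can be illustrated with a small parser. This is a sketch under stated assumptions: the key names are hypothetical, and the example record is invented for illustration rather than taken from real Bowtie 2 output.

```python
MANDATORY = ["qname", "flag", "rname", "pos", "mapq", "cigar",
             "rnext", "pnext", "tlen", "seq", "qual"]
INTEGER_FIELDS = ("flag", "pos", "mapq", "pnext", "tlen")


def parse_sam_line(line):
    """Parse one SAM alignment line into a dict with a nested 'tags' dict."""
    cols = line.rstrip("\n").split("\t")
    if len(cols) < len(MANDATORY):
        raise ValueError("expected at least 11 tab-separated fields")
    record = dict(zip(MANDATORY, cols[:11]))
    for key in INTEGER_FIELDS:
        record[key] = int(record[key])
    tags = {}
    for field in cols[11:]:
        tag, typ, value = field.split(":", 2)
        # Type 'i' is an integer; other types are kept as strings here.
        tags[tag] = int(value) if typ == "i" else value
    record["tags"] = tags
    return record


# Invented example record (not real aligner output).
EXAMPLE_LINE = "\t".join([
    "read1", "99", "ref", "100", "42", "10M", "=", "300", "210",
    "ACGTACGTAC", "IIIIIIIIII", "AS:i:-3", "XM:i:1", "NM:i:1", "MD:Z:4A5",
])

if __name__ == "__main__":
    rec = parse_sam_line(EXAMPLE_LINE)
    print(rec["qname"], rec["pos"], rec["cigar"], rec["tags"])
```

Header lines, which start with @, are not handled and should be skipped before calling the function.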