Saturday, November 24, 2018

[minimap2] How to Download, Install and execute minimap2

How to Download, Install, Execute and Make a Reference file in minimap2?

QUESTION

How to Download minimap2?
How to Install minimap2?
How to Make a Reference index in minimap2?
How to Execute minimap2?

ANSWER

Download minimap2.
git clone https://github.com/lh3/minimap2.git

Install minimap2
cd minimap2
make
Run minimap2 without arguments to confirm that it was built correctly. It should print the usage message:
./minimap2
Usage: minimap2 [options] <target.fa>|<target.idx> [query.fa] [...]

Options:
Indexing:
-H use homopolymer-compressed k-mer (preferrable for PacBio)
-k INT k-mer size (no larger than 28) [15]
-w INT minizer window size [10]
-I NUM split index for every ~NUM input bases [4G]
-d FILE dump index to FILE []

Mapping:
-f FLOAT filter out top FLOAT fraction of repetitive minimizers [0.0002]
-g NUM stop chain enlongation if there are no minimizers in INT-bp [5000]
-G NUM max intron length (effective with -xsplice; changing -r) [200k]
-F NUM max fragment length (effective with -xsr or in the fragment mode) [800]
-r NUM bandwidth used in chaining and DP-based alignment [500]
-n INT minimal number of minimizers on a chain [3]
-m INT minimal chaining score (matching bases minus log gap penalty) [40]
-X skip self and dual mappings (for the all-vs-all mode)
-p FLOAT min secondary-to-primary score ratio [0.8]
-N INT retain at most INT secondary alignments [5]

Alignment:
-A INT matching score [2]
-B INT mismatch penalty [4]
-O INT[,INT] gap open penalty [4,24]
-E INT[,INT] gap extension penalty; a k-long gap costs min{O1+k*E1,O2+k*E2} [2,1]
-z INT[,INT] Z-drop score and inversion Z-drop score [400,200]
-s INT minimal peak DP alignment score [80]
-u CHAR how to find GT-AG. f:transcript strand, b:both strands, n:don't match GT-AG [n]

Input/Output:
-a output in the SAM format (PAF by default)
-Q don't output base quality in SAM
-L write CIGAR with >65535 ops at the CG tag
-R STR SAM read group line in a format like '@RG\tID:foo\tSM:bar' []
-c output CIGAR in PAF
--cs[=STR] output the cs tag; STR is 'short' (if absent) or 'long' [none]
--MD output the MD tag
--eqx write =/X CIGAR operators
-Y use soft clipping for supplementary alignments
-t INT number of threads [3]
-K NUM minibatch size for mapping [500M]
--version show version number

Preset:
-x STR preset (always applied before other options; see minimap2.1 for details) []
- map-pb/map-ont: PacBio/Nanopore vs reference mapping
- ava-pb/ava-ont: PacBio/Nanopore read overlap
- asm5/asm10/asm20: asm-to-ref mapping, for ~0.1/1/5% sequence divergence
- splice: long-read spliced alignment
- sr: genomic short-read mapping

See `man ./minimap2.1' for detailed description of these and other advanced command-line options.
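Besides reading the usage text, the build can be checked from a script. The sketch below assumes the binary is at ./minimap2 after make, as above, and relies only on the --version option listed in the usage text. The function name check_minimap2 is made up for illustration.

```shell
# Hypothetical helper: confirm that a minimap2 binary exists and runs.
# The path argument defaults to ./minimap2, the file produced by `make`.
check_minimap2() {
    local bin="${1:-./minimap2}"
    if [ ! -x "$bin" ]; then
        echo "not found or not executable: $bin" >&2
        return 1
    fi
    # --version is listed under Input/Output in the usage text.
    local version
    version="$("$bin" --version)" || return 1
    echo "minimap2 version: $version"
}
```

Calling check_minimap2 ./minimap2 from the minimap2 directory prints the version string, or returns a non-zero status if the build did not produce a working binary.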

Make a Reference index in minimap2
minimap2 -d ucsc.hg19.mmi ucsc.hg19.fasta
Building the index took about 3 minutes for me (ucsc.hg19.fasta is about 3 GB), and the log reports a peak memory use of about 11 GB.
You should see a log like the following.
minimap2 -d ucsc.hg19.mmi ucsc.hg19.fasta
[M::mm_idx_gen::75.576*1.76] collected minimizers
[M::mm_idx_gen::90.135*1.96] sorted minimizers
[M::main::102.770*1.83] loaded/built the index for 93 target sequence(s)
[M::mm_idx_stat] kmer size: 15; skip: 10; is_hpc: 0; #seq: 93
[M::mm_idx_stat::103.674*1.82] distinct minimizers: 100029963 (38.72% are singletons); average occurrences: 5.458; average spacing: 5.746
[M::main] Version: 2.14-r886-dirty
[M::main] CMD: minimap2 -d ucsc.hg19.mmi ucsc.hg19.fasta
[M::main] Real time: 103.849 sec; CPU: 189.092 sec; Peak RSS: 11.213 GB
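The usage line accepts either <target.fa> or <target.idx>, so the .mmi file written by -d can be passed in place of the FASTA on later runs. A minimal sketch that rebuilds the index only when it is missing or older than the FASTA is shown here. The function name and the MINIMAP2 variable are assumptions for illustration. One caveat, as far as I understand the minimap2 README: indexing options such as -k and -w are fixed when the index is built, so an index made with the defaults may not match a preset that sets its own values. The optional third argument passes a preset at indexing time for that reason.

```shell
# Hypothetical helper: build <index> from <fasta> only when needed.
# MINIMAP2 is an assumed variable naming the binary (defaults to "minimap2").
# Usage: build_index_if_needed ref.fasta ref.mmi [preset]
build_index_if_needed() {
    local fasta="$1" index="$2" preset="${3:-}"
    # Skip the build when a non-empty index newer than the FASTA exists.
    if [ -s "$index" ] && [ "$index" -nt "$fasta" ]; then
        echo "index is up to date: $index" >&2
        return 0
    fi
    if [ -n "$preset" ]; then
        # Apply the preset so its k-mer and window sizes go into the index.
        "${MINIMAP2:-minimap2}" -x "$preset" -d "$index" "$fasta"
    else
        "${MINIMAP2:-minimap2}" -d "$index" "$fasta"
    fi
}
```

For example, build_index_if_needed ucsc.hg19.fasta ucsc.hg19.sr.mmi sr would build a short-read index once and skip the work on later calls. The file name ucsc.hg19.sr.mmi is only a suggested naming scheme.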


Execute minimap2
Suppose we want to map HiSeq short reads from FASTQ files. The preset for that case is -ax sr. The general preset examples come first, followed by the command for this case.
Presets
# use presets (no test data)

./minimap2 -ax map-pb ref.fa pacbio.fq.gz > aln.sam # PacBio genomic reads
./minimap2 -ax map-ont ref.fa ont.fq.gz > aln.sam # Oxford Nanopore genomic reads
./minimap2 -ax asm20 ref.fa pacbio-ccs.fq.gz > aln.sam # PacBio CCS genomic reads
./minimap2 -ax sr ref.fa read1.fa read2.fa > aln.sam # short genomic paired-end reads
./minimap2 -ax splice ref.fa rna-reads.fa > aln.sam # spliced long reads (strand unknown)
./minimap2 -ax splice -uf -k14 ref.fa reads.fa > aln.sam # noisy Nanopore Direct RNA-seq
./minimap2 -ax splice -uf -C5 ref.fa query.fa > aln.sam # Final PacBio Iso-seq or traditional cDNA
./minimap2 -cx asm5 asm1.fa asm2.fa > aln.paf # intra-species asm-to-asm alignment
./minimap2 -x ava-pb reads.fa reads.fa > overlaps.paf # PacBio read overlap
./minimap2 -x ava-ont reads.fa reads.fa > overlaps.paf # Nanopore read overlap
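When the same script handles several data types, the choice of preset can be kept in one place. This is only a sketch: the labels on the left (pacbio, nanopore, and so on) are invented here, while the preset names on the right are the ones shown in the usage text.

```shell
# Hypothetical lookup: map a short data-type label to a minimap2 preset.
# The labels are invented for this sketch; the preset names come from
# the "Preset" section of the usage text.
preset_for() {
    case "$1" in
        pacbio)      echo "map-pb" ;;
        nanopore)    echo "map-ont" ;;
        short-reads) echo "sr" ;;
        rna-long)    echo "splice" ;;
        assembly)    echo "asm5" ;;
        *) echo "unknown data type: $1" >&2; return 1 ;;
    esac
}
```

It could then be used as ./minimap2 -ax "$(preset_for nanopore)" ref.fa ont.fq.gz > aln.sam.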
minimap2 -ax sr \
-t <Thread> \
-R <ReadGroup> \
ucsc.hg19.fasta \
sample_1.fastq.gz \
sample_2.fastq.gz
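The placeholders in the command above can be filled in from variables. The following is a minimal sketch, not the exact script I ran: the function name, the THREADS, READ_GROUP and DRY_RUN variables, and the output file name are all assumptions. It redirects the SAM output to a file, since minimap2 writes alignments to standard output.

```shell
# Hypothetical wrapper around the paired-end short-read command.
# THREADS, READ_GROUP and DRY_RUN are assumed variable names.
# Usage: map_short_reads ref.fasta reads_1.fastq.gz reads_2.fastq.gz out.sam
map_short_reads() {
    local ref="$1" r1="$2" r2="$3" out="$4"
    local threads="${THREADS:-3}"   # 3 is the -t default shown in the usage text
    local rg="${READ_GROUP:-}"      # e.g. '@RG\tID:foo\tSM:bar', as in the -R help line
    # Build the argument list step by step.
    set -- -ax sr -t "$threads"
    if [ -n "$rg" ]; then
        set -- "$@" -R "$rg"
    fi
    set -- "$@" "$ref" "$r1" "$r2"
    if [ "${DRY_RUN:-0}" = "1" ]; then
        # Print the command for inspection only (not shell-quoted).
        printf '%s\n' "minimap2 $* > $out"
        return 0
    fi
    minimap2 "$@" > "$out"
}
```

Setting DRY_RUN=1 before calling map_short_reads ucsc.hg19.fasta sample_1.fastq.gz sample_2.fastq.gz sample.sam prints the command without running it, which is a cheap way to check the arguments before a long job.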
Compared with BWA 0.7.12, the mapping time was about 20% shorter for HiSeq short reads.
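To repeat this kind of comparison on your own data, the arithmetic is simply the difference in wall-clock time divided by the slower time. A small sketch follows; the function name is hypothetical, and the numbers in the note after it are invented to show the calculation, not measurements from this post.

```shell
# Hypothetical helper: percentage by which `new` is shorter than `old`.
# Both arguments are wall-clock seconds that you measured yourself,
# for example with the shell's `time` keyword.
pct_reduction() {
    awk -v old="$1" -v new="$2" \
        'BEGIN { printf "%.1f\n", (old - new) / old * 100 }'
}
```

With made-up timings of 1000 seconds for one mapper and 800 seconds for the other, pct_reduction 1000 800 prints 20.0.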


Reference
https://github.com/lh3/minimap2
