# create testsample for cfDNA UniFlow

Small test data set based on IC17 from Snyder et al. 2016 (Cell): https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSM1833242 / SRX1120780

## calculate average coverage

> mosdepth -n testsample /PATH/TO/cfDNA-data/Cell_2016/BAMs/hg19/IC17.bam

print resulting average coverage:

> tail -n 1 testsample.mosdepth.summary.txt | awk '{print "average coverage: "$4}'

expected result:

> average coverage: 25.12
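
The `tail -n 1` call above relies on the genome-wide `total` row being the last line of the summary file. A slightly more defensive sketch selects that row by name instead. The function name `mean_coverage` is hypothetical, and the layout (row label in column 1, mean coverage in column 4) is assumed from the command above.

```shell
# Print the mean coverage from a mosdepth summary file.
# Assumption: the genome-wide row is labelled "total" in column 1
# and the mean coverage is in column 4.
mean_coverage() {
    # $1: path to the *.mosdepth.summary.txt file
    awk '$1 == "total" { print $4; found = 1 }
         END { if (!found) exit 1 }' "$1"
}
```

Calling `mean_coverage testsample.mosdepth.summary.txt` should print the same value as the `tail`/`awk` pipeline, and it returns a non-zero exit status if no `total` row is found.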

## get factor for subsampling

### target 1X

> factor=$(tail -n 1 testsample.mosdepth.summary.txt | awk '{print 1/$4}'); echo "${factor}"

expected result:

> 0.0398089

### target 5X

> factor=$(tail -n 1 testsample.mosdepth.summary.txt | awk '{print 5/$4}'); echo "${factor}"

expected result:

> 0.199045
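
The two factor calculations above differ only in the target coverage, so they can be sketched as one helper. The function name `subsample_factor` and its argument order are hypothetical. The guard rejects targets at or above the measured coverage, since subsampling can only reduce coverage.

```shell
# Compute the subsampling factor target/mean.
# Hypothetical helper: $1 = mean coverage of the full BAM,
#                      $2 = target coverage (e.g. 1 or 5).
subsample_factor() {
    awk -v mean="$1" -v target="$2" 'BEGIN {
        mean += 0; target += 0          # force numeric comparison
        # Downsampling only works if 0 < target < mean.
        if (mean <= 0 || target <= 0 || target >= mean) exit 1
        print target / mean
    }'
}
```

With the coverage reported above, `subsample_factor 25.12 1` and `subsample_factor 25.12 5` reproduce the two expected factors.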

## downsample to target coverage

### target 1X

> samtools view -s 0.0398089 -b /PATH/TO/cfDNA-data/Cell_2016/BAMs/hg19/IC17.bam > testsample_hg19_1x_full.bam

### target 5X

> samtools view -s 0.199045 -b /PATH/TO/cfDNA-data/Cell_2016/BAMs/hg19/IC17.bam > testsample_hg19_5x_full.bam
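
`samtools view -s` reads its value as `SEED.FRACTION`: the integer part is the random seed and the fractional part is the proportion of templates to keep. The commands above therefore use seed 0. To draw a different but still reproducible subsample, the seed can be made explicit. A minimal sketch, with the hypothetical helper name `subsample_arg`:

```shell
# Build the SEED.FRACTION value expected by `samtools view -s`.
# Hypothetical helper: $1 = integer seed, $2 = fraction in (0, 1).
subsample_arg() {
    awk -v seed="$1" -v frac="$2" 'BEGIN {
        seed += 0; frac += 0            # force numeric handling
        if (frac <= 0 || frac >= 1) exit 1
        # Seed becomes the integer part, fraction the decimals.
        printf "%.7f\n", seed + frac
    }'
}
```

For example, `samtools view -s "$(subsample_arg 42 0.0398089)" -b in.bam > out.bam` would subsample to the 1X factor with seed 42; `in.bam` and `out.bam` are placeholders.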

## extract test chromosomes

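Region queries require a coordinate-sorted, indexed BAM file, so index the downsampled file first:

> samtools index testsample_hg19_1x_full.bam

Then extract chromosomes 20 to 22. The region names `20 21 22` assume that the reference uses contig names without a `chr` prefix:

> samtools view -bh testsample_hg19_1x_full.bam 20 21 22 > testsample_hg19_1x_chr20-22.bam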

## create FASTQ files

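Convert the BAM file to FASTQ files. `samtools collate` groups reads by name so that mates are adjacent, which lets `samtools fastq` split pairs into R1 and R2: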

> samtools collate -O testsample_hg19_1x_chr20-22.bam | samtools fastq -t -1 testsample_hg19_1x_chr20-22_R1.fastq.gz -2 testsample_hg19_1x_chr20-22_R2.fastq.gz -s testsample_hg19_1x_chr20-22_PEsingleton.fastq.gz -0 testsample_hg19_1x_chr20-22_single_read.fastq.gz
