Experimental studies

LW-FQZip 2 is compared with other state-of-the-art FASTQ data compression algorithms (including Quip, DSRC2, CRAM, FQZcomp, LFQC, LEON, SCALCE, LW-FQZip 1, bzip2, and gzip) on 10 real-world FASTQ files downloaded from the Sequence Read Archive of the National Center for Biotechnology Information (NCBI). The experimental results show that LW-FQZip 2 obtains better compression ratios than the other methods at reasonable time and memory costs. Further discussion is available in our paper. Details of the implementation, data sets, and experimental studies are provided in the supplementary file.

Table 1: Compression results of LW-FQZip 2 (Normal mode)

| Read type | Dataset | Platform | Size (MB) | Compression ratio | Compressed size (MB) | Compression time (s) | Decompression time (s) |
|---|---|---|---|---|---|---|---|
| Long-read | SRR2916693 | 454 GS | 425 | 16.5% | 71 | 35 | 25 |
| Long-read | SRR2994368 | Illumina MiSeq | 4688 | 17.3% | 812 | 300 | 240 |
| Long-read | SRR3211986 | PacBio RS | 1759 | 33.3% | 585 | 203 | 400 |
| Long-read | ERR739513 | MinION | 871 | 35.2% | 307 | 122 | 170 |
| Long-read | SRR3190692 | Illumina MiSeq | 11379 | 12.7% | 1441 | 540 | 416 |
| Short-read | ERR385912 | Illumina HiSeq 2000 | 641 | 6.4% | 41 | 25 | 12 |
| Short-read | ERR386131 | Ion Torrent PGM | 1371 | 16.5% | 226 | 87 | 73 |
| Short-read | SRR034509 | Illumina Analyzer II | 5247 | 23.7% | 1241 | 301 | 275 |
| Short-read | ERR174310 | Illumina HiSeq 2000 | 105122 | 21.0% | 22061 | 14050 | 10428 |
| Short-read | ERR194147 | Illumina HiSeq 2000 | 202631 | 20.1% | 40812 | 26488 | 19737 |
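
The ratio column in Table 1 appears to be the compressed size expressed as a percentage of the original size, so lower values are better. The following is a minimal sketch of that arithmetic, plus a compression throughput figure, using three rows copied from Table 1. The function names (`compression_ratio`, `throughput`, `summarize`) are illustrative and are not part of LW-FQZip 2. Because the table lists sizes rounded to whole megabytes, a recomputed ratio can differ slightly from the reported one.

```python
# Rows copied from Table 1:
# (dataset, original size MB, compressed size MB, compression s, decompression s)
ROWS = [
    ("SRR2994368", 4688, 812, 300, 240),
    ("ERR385912", 641, 41, 25, 12),
    ("ERR194147", 202631, 40812, 26488, 19737),
]


def compression_ratio(original_mb, compressed_mb):
    """Compressed size as a percentage of the original size (lower is better)."""
    return 100.0 * compressed_mb / original_mb


def throughput(original_mb, seconds):
    """Megabytes of original data processed per second."""
    return original_mb / seconds


def summarize(rows):
    """Return (dataset, ratio %, compression MB/s) for each row, rounded to 0.1."""
    return [
        (name, round(compression_ratio(orig, comp), 1), round(throughput(orig, c_time), 1))
        for name, orig, comp, c_time, _d_time in rows
    ]


if __name__ == "__main__":
    for name, ratio, speed in summarize(ROWS):
        print(f"{name}: {ratio}% of original size, {speed} MB/s compression")
```

The same helpers can be applied to the high compression ratio mode results to compare the space saved against the additional running time.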

Table 2: Compression results of LW-FQZip 2 (-g) (High compression ratio mode)

| Read type | Dataset | Platform | Size (MB) | Compression ratio | Compressed size (MB) | Compression time (s) | Decompression time (s) |
|---|---|---|---|---|---|---|---|
| Long-read | SRR2916693 | 454 GS | 425 | 15.3% | 65 | 303 | 295 |
| Long-read | SRR2994368 | Illumina MiSeq | 4688 | 16.0% | 748 | 1260 | 1198 |
| Long-read | SRR3211986 | PacBio RS | 1759 | 32.3% | 568 | 759 | 725 |
| Long-read | ERR739513 | MinION | 871 | 34.8% | 303 | 333 | 320 |
| Long-read | SRR3190692 | Illumina MiSeq | 11379 | 11.7% | 1330 | 2520 | 2372 |
| Short-read | ERR385912 | Illumina HiSeq 2000 | 641 | 5.0% | 32 | 282 | 268 |
| Short-read | ERR386131 | Ion Torrent PGM | 1371 | 16.0% | 219 | 324 | 301 |
| Short-read | SRR034509 | Illumina Analyzer II | 5247 | 22.7% | 1193 | 1200 | 1080 |
| Short-read | ERR174310 | Illumina HiSeq 2000 | 105122 | 20.1% | 21152 | 42600 | 30000 |
| Short-read | ERR194147 | Illumina HiSeq 2000 | 202631 | 14.3% | 28915 | 71400 | 60540 |

Compression ratio (all methods are configured to obtain their best compression ratios)
Note: 0 means that an unknown error occurred during compression