LZMA Compression

A few months back I proposed using lzma compression instead of bzip2, but that was right before the release of 2007 Spring, so it was just not the right time...

Now, OTOH, cooker is fully active again and it's the appropriate time for making decisions and changes regarding this. :)

I proposed the change on the cooker & maintainers lists yesterday and people seem mostly positive; everyone seems to agree that it's at least better than the current bzip2 compression.

The reasons are that it's faster to decompress, achieves better compression and can also result in lower memory usage.

At Gustavo's request I made some comparisons on my old & slow Blade 100 (UltraSPARC IIi, 500 MHz) to make the difference more obvious:

(bzip2 -9 is standard for man pages)
[peroyvind@blade100 SPECS?]$ MANPAGER='true' time man ./bash.1.bz2
4.43user 0.20system 0:04.64elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (1major+6454minor)pagefaults 0swaps

(lzma -5 is used to get slightly lower memory usage than bzip2 while still compressing better; the compression level shouldn't affect decompression time for lzma though)
[peroyvind@blade100 SPECS?]$ MANPAGER='true' time man ./bash.1.lzma
1.66user 0.15system 0:01.92elapsed 94%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (12major+4923minor)pagefaults 0swaps

[peroyvind@blade100 SPECS?]$ MANPAGER='true' time man ./zgrep.1.bz2
0.54user 0.11system 0:00.66elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (1major+5436minor)pagefaults 0swaps

[peroyvind@blade100 SPECS?]$ MANPAGER='true' time man ./zgrep.1.lzma
0.46user 0.11system 0:00.58elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (1major+4786minor)pagefaults 0swaps

Gustavo also did some further investigation, where sizes were compared too:

-rw-r--r-- 1 spuk spuk 31315707 Jun 7 16:25 manpages.gz (gzip -9)
-rw-r--r-- 1 spuk spuk 17808514 Jun 7 16:33 manpages.lzma (lzma -5)
-rw-r--r-- 1 spuk spuk 22764006 Jun 7 16:35 manpages.bz2 (bzip2 -9)
-rw-r--r-- 1 spuk spuk 115609592 Jun 7 16:37 manpages
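
To put those sizes in perspective, here is a small Python sketch that turns the byte counts from the listing into percentages of the uncompressed size (roughly 27% for gzip, 20% for bzip2 and 15% for lzma). The helper name `percent_of_original` is just made up for this illustration; only the numbers come from the listing.

```python
# Sizes in bytes, copied from the listing above.
original = 115609592
sizes = {
    "gzip -9": 31315707,
    "bzip2 -9": 22764006,
    "lzma -5": 17808514,
}


def percent_of_original(size, total=original):
    """Return the compressed size as a percentage of the uncompressed size."""
    return 100.0 * size / total


for name, size in sizes.items():
    print(f"{name:9s} {size:>10d} bytes  {percent_of_original(size):5.1f}% of original")

# How much smaller the lzma archive is than the bzip2 one.
saving_vs_bzip2 = 100.0 * (1 - sizes["lzma -5"] / sizes["bzip2 -9"])
print(f"lzma -5 is {saving_vs_bzip2:.1f}% smaller than bzip2 -9")
```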

$ find /usr/share/man -name *.bz2 | xargs cat | wc -c
40741331

(Compressing everything into a single file gives better compression, of course.)

$ time lzmadec /dev/null
3.76user 0.06system 0:04.41elapsed 86%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+772minor)pagefaults 0swaps

$ time bzcat /dev/null
14.78user 0.12system 0:17.67elapsed 84%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+1074minor)pagefaults 0swaps

$ time zcat /dev/null
1.35user 0.06system 0:02.52elapsed 55%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (1major+588minor)pagefaults 0swaps
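
For anyone who wants to repeat this kind of comparison without the command line tools, here is a rough sketch using the gzip, bz2 and lzma modules from the Python standard library. It is not how the numbers above were produced: the sample text is synthetic, and `make_sample`, `CODECS` and `benchmark` are names invented for this example. The compression levels mirror the ones used in this post, and `lzma.FORMAT_ALONE` selects the legacy .lzma container.

```python
import bz2
import gzip
import lzma
import time


def make_sample(repeats=2000):
    """Build some repetitive, man-page-like text (purely synthetic test data)."""
    line = ".TH EXAMPLE 1\n.SH NAME\nexample \\- a made-up manual page entry\n"
    return (line * repeats).encode("ascii")


# name -> (compress, decompress); levels mirror those mentioned in the post.
CODECS = {
    "gzip -9": (lambda d: gzip.compress(d, compresslevel=9), gzip.decompress),
    "bzip2 -9": (lambda d: bz2.compress(d, compresslevel=9), bz2.decompress),
    "lzma -5": (
        lambda d: lzma.compress(d, format=lzma.FORMAT_ALONE, preset=5),
        lzma.decompress,  # auto-detects the container format
    ),
}


def benchmark(data, rounds=5):
    """Return {name: (compressed_size, best_decompress_seconds)}."""
    results = {}
    for name, (compress, decompress) in CODECS.items():
        packed = compress(data)
        best = float("inf")
        for _ in range(rounds):
            start = time.perf_counter()
            unpacked = decompress(packed)
            best = min(best, time.perf_counter() - start)
        assert unpacked == data  # sanity check: the round trip must be lossless
        results[name] = (len(packed), best)
    return results


if __name__ == "__main__":
    for name, (size, seconds) in benchmark(make_sample()).items():
        print(f"{name:9s} {size:8d} bytes  {seconds * 1000:8.3f} ms to decompress")
```

Timings from something like this depend heavily on the machine and on the input, so feeding it real man pages instead of the synthetic sample would give a fairer picture.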

So it seems to be a very nice improvement: not that much in size compared to bzip2, but a lot when it comes to decompression time, where it gets close to gzip while compressing way better (although some people care so little about this that they're obsessed with not wanting any compression at all ;p)!

To make it work though, support needs to be implemented. I implemented it in lesspipe earlier, which took care of handling compressed formats for man in the past, but now it seems that man takes care of this by itself. Not a big problem though: implementing support for it was quite simple (also sent back to and accepted upstream :), and the same goes for info (from texinfo), where I also implemented support for it yesterday. I'm not entirely certain about the correctness of my patch for install-info though, but then again I didn't quite get how it's used there, nor does it seem like support for it there is something we actually use/need...
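
The general idea behind this kind of support is simply to pick a decompressor based on the file suffix. The sketch below shows that idea in Python; it is not the actual patch for man, lesspipe or texinfo, and `OPENERS` and `read_page` are hypothetical names used only here.

```python
import bz2
import gzip
import lzma
from pathlib import Path

# Suffix -> opener. On read, lzma.open() auto-detects the legacy .lzma container.
OPENERS = {
    ".gz": gzip.open,
    ".bz2": bz2.open,
    ".lzma": lzma.open,
}


def read_page(path):
    """Return the decompressed bytes of a (possibly compressed) man page."""
    path = Path(path)
    opener = OPENERS.get(path.suffix, open)  # unknown suffix: treat as plain file
    with opener(path, "rb") as handle:
        return handle.read()
```

With this, `read_page("bash.1.lzma")` and `read_page("bash.1.bz2")` would return the same bytes, provided both files hold the same page.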

Not big news for everyone, but fun for all of us who like to squeeze the very best out of even the tiniest things, even if most others don't care. I know at least that this includes both me and Austin. :o)

Version 1.449 last modified by Arkub on 01/04/2008 at 15:43

 



Creator: proyvind Karlsen on 2007/06/08 15:05
(c) Mandriva 2007