From: Taras Glek <tglek <at> mozilla.com>
Subject: Re: [RFC/PATCH 0/2] ext4: Transparent Decompression Support
Newsgroups: gmane.comp.file-systems.ext4
Date: Thursday 25th July 2013 16:42:18 UTC
Dhaval Giani wrote:
> On 07/24/2013 07:36 PM, Jörn Engel wrote:
>> On Wed, 24 July 2013 17:03:53 -0400, Dhaval Giani wrote:
>>> I am posting this series early in its development phase to solicit some
>>> feedback.
>> At this state, a good description of the format would be nice.
> Sure. The format is quite simple. There is a 20 byte header followed 
> by an offset table giving us the offsets of 16k compressed zlib chunks 
> (The 16k is the default number, it can be changed with the use of szip 
> tool, the kernel should still decompress it as that data is in the 
> header). I am not tied to the format. I used it as that is what is being 
> used here. My final goal is to have the filesystem agnostic of the 
> compression format as long as it is seekable.
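To make the "header, offset table, independently compressed chunks" idea concrete, here is a minimal sketch of a seekable container and a random-access read. The 20-byte header layout, the `DEMO` magic, and the names `pack` and `read_at` are assumptions made up for illustration; this is not the actual szip layout, only the same general shape.

```python
import struct
import zlib

# Hypothetical 20-byte header (NOT the real szip layout):
# magic, uncompressed size, chunk size, chunk count, reserved.
HEADER = struct.Struct("<4sIIII")
MAGIC = b"DEMO"


def pack(data, chunk_size=16 * 1024):
    """Compress data in fixed-size chunks and prepend header + offset table."""
    chunks = [zlib.compress(data[i:i + chunk_size])
              for i in range(0, len(data), chunk_size)]
    # The table holds n + 1 absolute offsets so chunk i spans
    # offsets[i]..offsets[i + 1].
    offsets = [HEADER.size + 4 * (len(chunks) + 1)]
    for chunk in chunks:
        offsets.append(offsets[-1] + len(chunk))
    header = HEADER.pack(MAGIC, len(data), chunk_size, len(chunks), 0)
    table = struct.pack("<%dI" % len(offsets), *offsets)
    return header + table + b"".join(chunks)


def read_at(blob, offset, length):
    """Return uncompressed bytes [offset, offset + length), inflating only
    the chunks that overlap that range."""
    magic, total, chunk_size, n_chunks, _ = HEADER.unpack_from(blob, 0)
    if magic != MAGIC:
        raise ValueError("bad magic")
    offsets = struct.unpack_from("<%dI" % (n_chunks + 1), blob, HEADER.size)
    end = min(offset + length, total)
    if offset >= end:
        return b""
    first = offset // chunk_size
    last = (end - 1) // chunk_size
    out = bytearray()
    for i in range(first, last + 1):
        out += zlib.decompress(blob[offsets[i]:offsets[i + 1]])
    start = offset - first * chunk_size
    return bytes(out[start:start + (end - offset)])
```

Because each chunk is a complete zlib stream, a read of a few bytes costs at most one or two chunk decompressions regardless of file size, which is the property that makes the format usable for on-demand reads.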
>>> We are implementing transparent decompression with a focus on ext4. One
>>> of the main usecases is that of Firefox on Android. Currently libxul.so
>>> is compressed and it is loaded into memory by a custom linker on
>>> demand. With the use of transparent decompression, we can make do
>>> without the custom linker. More details (i.e. code) about the linker can
>>> be found at https://github.com/glandium/faulty.lib
>> It is not quite clear what you want to achieve here.
> To introduce transparent decompression. Let someone else do the 
> compression for us, and supply decompressed data on demand  (in this 
> case a read call). Reduces the complexity which would otherwise have 
> to be brought into the filesystem.
The main use of file compression for Firefox (it's useful on the Linux 
desktop too) is to improve IO throughput and reduce startup latency. For 
compression to be a net win, an application should be aware of 
what is being compressed and what isn't. For example, IO patterns on 
large libraries (e.g. the 30 MB libxul.so) are well suited to compression, 
but those on SQLite databases are not. Similarly for our disk cache: images 
should not be compressed, but JavaScript should be. Footprint wins are useful 
on Android, but it's the increased IO throughput on crappy storage 
devices that makes this most attractive.

In addition to being aware of which files should be compressed, Firefox 
is aware of the usage patterns of various files, so it could schedule 
compression at the optimal time.

The above needs tie in nicely with the simplification of not implementing 
compression at the fs level.
>>    One approach is
>> to create an empty file, chattr it to enable compression, then write
>> uncompressed data to it.  Nothing in userspace will ever know the file
>> is compressed, unless you explicitly call lsattr.
>> If you want to follow some other approach where userspace has one
>> interface to write the compressed data to a file and some other
>> interface to read the file uncompressed, you are likely in a world of
>> pain.
> Why? If it is going to only be a few applications who know the file is 
> compressed, and read it to get decompressed data, why would it be 
> painful? What about introducing a new flag, O_COMPR which tells the 
> kernel, btw, we want this file to be decompressed if it can be. It can 
> fallback to O_RDONLY or something like that? That gets rid of the 
> chattr ugliness.
This transparent decompression idea is based on our experience with 
HFS+. Apple uses the fs-attribute approach. OS X is able to compress 
application libraries at installation time; apps remain blissfully 
unaware but get an extra boost in startup perf.

So in Linux, the package manager could compress .so files, textual data 
files, etc.
>> Assuming you use the chattr approach, that pretty much comes down to
>> adding compression support to ext4.  There have been old patches for
>> ext2 around that never got merged.  Reading up on the problems
>> encountered by those patches might be instructive.
> Do you have subjects for these? When I googled for ext4 compression, I 
> found http://code.google.com/p/e4z/ which doesn't seem to exist, and 
> checking in my LKML archives gives too many false positives.
> Thanks!
> Dhaval