Home
Reading
Searching
Subscribe
Sponsors
Statistics
Posting
Contact
Spam
Lists
Links
About
Hosting
Filtering
Features Download
Marketing
Archives
FAQ
Blog
 
Gmane
From: =?UTF-8?Q?Luk=C3=A1=C5=A1_Lalinsk=C3=BD?= <lalinsky <at> gmail.com>
Subject: Re: Bazaar, OLPC, and the "no-packs" repository format.
Newsgroups: gmane.comp.version-control.bazaar-ng.general
Date: Thursday 20th December 2007 13:06:00 UTC (over 10 years ago)
On Št, 2007-12-20 at 06:25 -0600, John Arbash Meinel wrote:
> Robert has at least conceptualized a hash-based index, so lookups become
more
> of an O(1) operation. Which makes it O(N) for N packs, but that is better
than
> O(N log(M)) for our current indexes. I wonder if you could even combine
the
> hash maps to make it O(1) across all indexes that you have read...

I've implemented a prototype of a similar idea about two weeks ago, but
gave up on finishing it because I found out that the main bottleneck for
me was reading data from all over the big pack files. The indexing code
is in the attachment, but it's only a part of the plugin, which itself
requires patched bzrlib and breaks 'packs-0.92', so I did't bother with
attaching the whole package (it's not very usable, anyway).

The general idea is that the index is split into segments, which are
grouped by their left-hand history (similar to toposort, but backwards)
and the index header contains 32 bit hashes of each key of each segment.
The combined index also uses a global cache, so in most cases you don't
search in all indexes, but only the ones that actually contain the key
(for 15k revisions in bzr.dev I get no hash collisisions, so it's
something like O(1+very_small_constant*N)). But that helps only in cases
where you have a few big indexes, not a *lot* of tiny ones.

Lukas
 
CD: 21ms