Subject: Re: Bazaar-NG vs. Mercurial -- speed comparison
Newsgroups: gmane.comp.version-control.bazaar-ng.general
Date: 2006-05-13 06:44:08 GMT

Diwaker Gupta wrote:
> Bryan Sullivan (of Mercurial) recently posted this benchmark:
> <http://lists.freestandards.org/pipermail/lsb-futures/2006-May/002080.html>
>
> I know that some of the speed difference is due to the fact that bzr
> doesn't need a specialized server at the other end point -- it can
> pull natively over HTTP, SFTP and so on. But in that case, I think we
> should emphasize this point strongly in the feature list. As someone
> who is new to both systems, the above speed comparison numbers will
> easily bias one towards Mercurial. I'm not saying Mercurial is bad --
> I use it on a daily basis and its *great*. All I'm saying is that bzr
> should play to its strengths.
>
> Diwaker

I wanted to post a little bit of a rebuttal to this. But first I would
like to say that Mercurial really does show off as a fast little system.

First, Mercurial uses a custom server rather than working off of 'plain'
http/sftp. This does give it a huge latency advantage. It has some
drawbacks, in that it is another thing that needs to be set up, holes
opened in firewalls, etc. Though honestly 'hg serve' isn't really hard to
set up. And it is something that we want to pay attention to. I think we
want to support it, just not require it.

Mercurial uses a Python extension (C code) to do diff & patch, which can
certainly be a bottleneck (though I'm thinking bzr's bottleneck might be
its use of XML, which certainly used to be how it handled weaves). I
should revisit my performance testing with knits. Mercurial has used
'revlogs' for a long time, which are very similar to knits.

We might consider looking into using a similar diff & patch, though it
would mean requiring some sort of build stage for bzr. Right now it is
very nice that bzr just works from the source tree.
(Also, on one of my production servers, where I host some repositories, I
don't install gcc, which makes Mercurial difficult to install.)

hg has a lot less code which is imported by default. So a plain 'hg root'
takes only 0.1s, whereas 'bzr root' takes 1.1s. Now, if you switch and
use bzrtools' 'bzr shell' command, which leaves the bzrlib code in
memory, stuff like 'bzr root' again becomes very fast.

Again, we could look at what we import when. I did a cleanup at one point
in time, which helped simplistic tests like that. Though long term we
found out that delayed imports can be costly inside internal loops. Bzr
could be better about not having to load support for all of its features
until they are actually needed. 'hg' actually uses a solution called
'demandload', which we probably could just move directly into the bzr
code. I'm not sure if this imposes much of a runtime overhead, though I
do see at least one more __getattribute__ function call per module.

As to the specific benchmarks...

Timing an hg clone of the hg code isn't quite the same as timing a get of
the bzr.dev code. bzr.dev has 5002 revisions, while hg has 2253
changesets (4553 changes). So there is at least a factor of 2 there. Not
huge, but not trivial.

Also, there is the raw amount of data:

$ du -ksh bzr.dev/ mercurial/
28M     bzr.dev
5.5M    mercurial

Mercurial has 2MB of source files, and bzr.dev has 4MB. So while
mercurial is about 1/5th the size of bzr, it also has 1/2 the code, and
1/2 the revisions, so I'm guessing it is only compressing slightly better
than bzr.
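
To spell out that back-of-the-envelope estimate, here is a small Python
sketch using the numbers above. It assumes (a simplification, not
something measured) that repository size scales linearly with both the
amount of source and the number of revisions. The function name is made
up for illustration.

```python
def normalized_size_ratio(size_a, size_b, source_a, source_b, revs_a, revs_b):
    """Return how much larger repository A is than B after factoring out
    the difference in source size and revision count.

    Assumption: on-disk size grows linearly with both source size and
    number of revisions, so the two factors simply multiply.
    """
    size_ratio = size_a / size_b        # raw on-disk ratio
    source_ratio = source_a / source_b  # ratio of source file sizes
    revs_ratio = revs_a / revs_b        # ratio of revision counts
    return size_ratio / (source_ratio * revs_ratio)


# Figures quoted in this mail: sizes in MB, then revision counts.
ratio = normalized_size_ratio(
    size_a=28.0, size_b=5.5,      # du output: bzr.dev vs. mercurial
    source_a=4.0, source_b=2.0,   # source files: bzr.dev vs. mercurial
    revs_a=5002, revs_b=2253,     # bzr revisions vs. hg changesets
)
print(f"bzr.dev is about {ratio:.2f}x larger after normalizing")
```

Under that assumption the leftover factor comes out a little above 1,
which is where "only compressing slightly better" comes from.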

In a local network, this is what I get:

$ time hg clone http://juju.arbash-meinel.com:8000/
real    0m18.448s
user    0m5.906s
sys     0m4.346s

$ time bzr get http://bzr.arbash-meinel.com/mirrors/bzr/bzr.dev/ http
real    1m49.052s
user    0m34.059s
sys     0m10.676s

$ time bzr get sftp://juju/srv/bzr/public/mirrors/bzr/bzr.dev/ sftp
real    1m41.964s
user    0m36.068s
sys     0m10.979s

So bzr still needs to do some catching up, but in a local network it is
only 6x slower. (Honestly I thought sftp would spank http, I don't know
whether this is good or bad :)

I can say that the max theoretical speed for bzr would be:

$ time rsync -av juju:/srv/bzr/public/mirrors/bzr/.bzr/ xxx
real    0m15.583s
user    0m1.890s
sys     0m2.482s

hg does a lot better, but it also isn't copying nearly as much data
around:

$ time rsync -av juju:dev/hg/mercurial
real    0m3.940s
user    0m0.381s
sys     0m0.849s

hg is 4-5 times slower than rsync, while bzr is 6.5-7 times slower.

Remote network:

$ time hg clone http://catharsis.i-clic.uihc.uiowa.edu:8000/ tmp
real    0m17.449s
user    0m5.931s
sys     0m4.561s

$ time bzr get http://src.i-clic.uihc.uiowa.edu/bzr/bzr/bzr.dev xxx
real    6m25.014s
user    0m32.949s
sys     0m9.863s

$ time bzr get sftp://src/srv/bzr/bzr/bzr.dev/ yyy
real    7m46.863s
user    0m43.455s
sys     0m13.142s

This shows hg as roughly 22x faster than bzr over http (27x against
sftp), though you still have to count the fact that hg is copying only
1/5th the amount of data.

Local clone:

'hg' uses hardlinks when copying its repositories, which means it saves
disk space, and definitely saves time (on Linux). The downside is that
FAT32 doesn't support them. Technically NTFS does, though some people
fear it (I think they should be fine, though I don't have much experience
myself). I *do* know that on Mac OS X (HFS+), they are abysmal. Coming
from Arch, which loved hardlinked revlibs, I could measure how much
slower it was to use hardlinks than plain copying. (It has to do with how
HFS+ stores hardlink entries: *badly*.)

bzr decided to use a different solution, repositories.
So to be fair, let's try this:

$ \time hg clone mercurial/ xxx
2.20user 0.36system 0:02.58elapsed

$ \time bzr init-repo bzr
1.24user 0.17system 0:01.41elapsed

$ cd bzr/
$ \time bzr branch ~/bzr/mirrors/bzr/bzr.dev/ bzr.dev
51.69user 4.19system 0:59.35elapsed

(Yes, this first copy is much slower)

$ \time bzr branch bzr.dev bzr-dev2
Branched 1706 revision(s).
1.71user 0.23system 0:01.97elapsed

Notice that this time, we actually create a new branch faster than hg,
without using hardlinks, and in a system which has approximately a 1.1s
startup overhead. Also, this can be done remotely, so you can create a
new branch over sftp with very little overhead. (I don't know if sftp
supports hardlinking or not.) Though I suppose since hg uses a custom
server, they could still make branching on the remote end stay cheap.

Now, I admit to a little bit of cheating, in that I didn't create working
trees here. This does have relevance when you have public repositories
(meaning it is fairly cheap [...] However, so let me go fix that:

$ rm .bzr/repository/no-working-trees
$ \time bzr branch bzr.dev bzr-dev3
12.37user 1.17system 0:14.43elapsed

Creating the working tree is pretty slow in bzr, and it is something we
need to look closely at.

So I think hg definitely does have a lot of things we should look closely
at. But I did want to make people aware that hg isn't 30x faster than
bzr. In many cases it is more in the 4-7x range.

John
=:->