Subject: [ricwheeler <at> gmail.com: suspiciously good fsck times?]
Newsgroups: gmane.comp.file-systems.ext4
Date: 2008-07-10 17:28:29 GMT

Transferring this thread to the linux-ext4 list instead of
linux-ext4-owner.

Subject: suspiciously good fsck times?
Date: 2008-07-10 12:36:42 GMT

Just to be mean, I have been trying to test the fsck speed of ext4 with
lots of small files. The test I ran uses fs_mark to fill a 1TB Seagate
drive with 45.6 million 20k files (distributed between 256
subdirectories).

Running on ext3, "fsck -f" takes about one hour.

Running on ext4, with uninit_bg, the same fsck is finished in a bit over
5 minutes - more than 10x faster. (Without uninit_bg, the fsck takes
about 10 minutes).

Is this too good to be true? Below is the fsck run itself, the tree is
Ted's latest git tree and his 1.41 WIP tools,

ric

[root@localhost Perf]# time /sbin/fsck.ext4 -t -t -f /dev/sdb1
e4fsck 1.41-WIP (07-Jul-2008)
Pass 1: Checking inodes, blocks, and sizes
Pass 1: Memory used: 40632k/69424k (36424k/4209k), time: 204.95/78.22/25.58
Pass 1: I/O read: 11140MB, write: 0MB, rate: 54.35MB/s
Pass 2: Checking directory structure
Pass 2: Memory used: 70184k/61968k (51803k/18382k), time: 76.47/50.27/ 8.77
Pass 2: I/O read: 3023MB, write: 0MB, rate: 39.53MB/s
Pass 3: Checking directory connectivity
Peak memory: Memory used: 70184k/61968k (59256k/10929k), time: 281.72/128.59/34.35
Pass 3A: Memory used: 70184k/61968k (59256k/10929k), time: 0.00/ 0.00/ 0.00
Pass 3A: I/O read: 0MB, write: 0MB, rate: 0.00MB/s
Pass 3: Memory used: 70184k/61968k (51803k/18382k), time: 0.03/ 0.00/ 0.00
Pass 3: I/O read: 1MB, write: 0MB, rate: 37.86MB/s
Pass 4: Checking reference counts
Pass 4: Memory used: 70184k/44968k (27354k/42831k), time: 2.37/ 2.36/ 0.00
Pass 4: I/O read: 0MB, write: 0MB, rate: 0.00MB/s
Pass 5: Checking group summary information
Pass 5: Memory used: 70184k/240k (64619k/5566k), time: 19.40/ 5.52/ 0.29
Pass 5: I/O read: 34MB, write: 0MB, rate: 1.75MB/s
/dev/sdb1: 45600268/61054976 files (0.0% non-contiguous), 232657574/244190000 blocks
Memory used: 70184k/240k (64889k/5296k), time: 303.54/136.48/34.65
I/O read: 14198MB, write: 1MB, rate: 46.77MB/s

real    5m3.993s
user    2m16.477s
sys     0m35.041s

Subject: Re: suspiciously good fsck times?
Date: 2008-07-10 15:18:22 GMT

On Thu, Jul 10, 2008 at 08:36:42AM -0400, Ric Wheeler wrote:
>
> Just to be mean, I have been trying to test the fsck speed of ext4 with
> lots of small files. The test I ran uses fs_mark to fill a 1TB Seagate
> drive with 45.6 million 20k files (distributed between 256
> subdirectories).
>
> Running on ext3, "fsck -f" takes about one hour.
>
> Running on ext4, with uninit_bg, the same fsck is finished in a bit over
> 5 minutes - more than 10x faster. (Without uninit_bg, the fsck takes
> about 10 minutes).
>
> Is this too good to be true? Below is the fsck run itself, the tree is
> Ted's latest git tree and his 1.41 WIP tools,

Wow. My guess is that flex_bg is making the difference. What we
would want to compare is the I/O read statistics line:

> I/O read: 14198MB, write: 1MB, rate: 46.77MB/s

That's pretty good, and indicates we've avoided a *lot* of seeking.
The e2fsck -t -t output for ext3 should show roughly the same amount of
I/O read (with 20k files, there would be no advantage towards using
extents), but the I/O rate is probably *much* lower, indicating a lot
more seeking is going on.

Can you send the full e2fsck -t -t output of the ext3 run? And what
are the hdparm -t -t results of the disk?

If I'm right, if you create the filesystem with mke2fs -t ext4dev -O
^flex_bg,^uninit_bg, you should see performance back to the old ext3
levels.

- Ted

P.S. We probably do want to examine the block allocation layout with
flex_bg to make sure that the filesystem ages well in the long term.

Subject: Re: suspiciously good fsck times?
Date: 2008-07-10 15:49:51 GMT

Theodore Tso wrote:
> On Thu, Jul 10, 2008 at 08:36:42AM -0400, Ric Wheeler wrote:
>
>> Just to be mean, I have been trying to test the fsck speed of ext4 with
>> lots of small files. The test I ran uses fs_mark to fill a 1TB Seagate
>> drive with 45.6 million 20k files (distributed between 256
>> subdirectories).
>>
>> Running on ext3, "fsck -f" takes about one hour.
>>
>> Running on ext4, with uninit_bg, the same fsck is finished in a bit over
>> 5 minutes - more than 10x faster. (Without uninit_bg, the fsck takes
>> about 10 minutes).
>>
>> Is this too good to be true? Below is the fsck run itself, the tree is
>> Ted's latest git tree and his 1.41 WIP tools,
>>
>
> Wow. My guess is that flex_bg is making the difference. What we
> would want to compare is the I/O read statistics line:
>
>
>> I/O read: 14198MB, write: 1MB, rate: 46.77MB/s
>>
>
> That's pretty good, and indicates we've avoided a *lot* of seeking.
> The e2fsck -t -t output for ext3 should show roughly the same amount of
> I/O read (with 20k files, there would be no advantage towards using
> extents), but the I/O rate is probably *much* lower, indicating a lot
> more seeking is going on.
>

We did run fsck through seekwatcher & saw a significant reduction in
seeks/sec for ext4. Eric has the pretty pictures that he can share.

> Can you send the full e2fsck -t -t output of the ext3 run? And what
> are the hdparm -t -t results of the disk?
>

I didn't run the ext3 test with -t -t (but can refill and rerun, takes
about 12 hours).

This disk is a relatively new Seagate 1TB drive, specs at:

http://www.seagate.com/ww/v/index.jsp?vgnextoid=0732f141e7f43110VgnVCM100000f5ee0a0aRCRD

hdparm test:

[root@localhost rwheeler]# /sbin/hdparm -t -t /dev/sdb

/dev/sdb:
 Timing buffered disk reads: 186 MB in 3.03 seconds = 61.33 MB/sec

> If I'm right, if you create the filesystem with mke2fs -t ext4dev -O
> ^flex_bg,^uninit_bg, you should see performance back to the old ext3
> levels.
>

With uninit_bg off, it ran about 10 minutes, but it would be interesting
to run without either.

> - Ted
>
> P.S. We probably do want to examine the block allocation layout with
> flex_bg to make sure that the filesystem ages well in the long term.
>

Testing aged file systems is always the holy grail - this workload is a
fairly artificial one and was laid down with 4 threads currently writing
to a shared subdirectory.

ric

Subject: Re: suspiciously good fsck times?
Date: 2008-07-10 16:13:54 GMT

On Thu, Jul 10, 2008 at 11:49:51AM -0400, Ric Wheeler wrote:
> We did run fsck through seekwatcher & saw a significant reduction in
> seeks/sec for ext4. Eric has the pretty pictures that he can share.

Pictures are always fun! It would be great to see the comparison
between ext3 and ext4 for fsck in this case.

> [root@localhost rwheeler]# /sbin/hdparm -t -t /dev/sdb
>
> /dev/sdb:
> Timing buffered disk reads: 186 MB in 3.03 seconds = 61.33 MB/sec
>

I meant hdparm -t -T, but that's ok, the 61.33 MB/sec is what I was
curious about. So for this very artificial benchmark, fsck was using
2/3rd of the disk's full benchmark. Not bad.
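As a cross-check on the throughput comparison above, the per-pass read rates and their share of the hdparm figure can be recomputed from the numbers quoted in the thread. This is only a sketch: the helper names are made up for illustration, and it assumes the first field of each "time:" triple is wall-clock seconds (the total, 303.54, matches the "real 5m3.993s" line from time).

```python
# Figures copied from the e2fsck -t -t and hdparm output quoted in this thread.
# The helper names are illustrative only; they are not part of e2fsprogs.

def rate_mb_per_s(read_mb: float, elapsed_s: float) -> float:
    """Average read rate over a pass, in MB/s."""
    return read_mb / elapsed_s

# (MB read, wall-clock seconds) taken from the "I/O read" and "time" lines.
# Assumption: the first field of "time: a/b/c" is elapsed real time.
PASSES = {
    "pass 1": (11140, 204.95),
    "pass 2": (3023, 76.47),
    "pass 5": (34, 19.40),
    "total": (14198, 303.54),
}

HDPARM_MB_PER_S = 61.33  # "Timing buffered disk reads" result for /dev/sdb

def fraction_of_disk(name: str) -> float:
    """Read rate of the named pass as a fraction of the hdparm rate."""
    read_mb, elapsed_s = PASSES[name]
    return rate_mb_per_s(read_mb, elapsed_s) / HDPARM_MB_PER_S

if __name__ == "__main__":
    for name, (read_mb, elapsed_s) in PASSES.items():
        rate = rate_mb_per_s(read_mb, elapsed_s)
        print(f"{name}: {rate:.2f} MB/s, {fraction_of_disk(name):.0%} of hdparm rate")
```

On these inputs the whole run works out to roughly three quarters of the hdparm rate, and pass 2 (the directory pass) to a little under two thirds, so the "2/3rd" figure is in the right neighborhood.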
From: Eric Sandeen <sandeen <at> redhat.com>
Subject: Re: suspiciously good fsck times?
Date: 2008-07-10 16:14:28 GMT

Ric Wheeler wrote:
> Theodore Tso wrote:
>> On Thu, Jul 10, 2008 at 08:36:42AM -0400, Ric Wheeler wrote:
>>
>>> Just to be mean, I have been trying to test the fsck speed of ext4 with
>>> lots of small files. The test I ran uses fs_mark to fill a 1TB Seagate
>>> drive with 45.6 million 20k files (distributed between 256
>>> subdirectories).
>>>
>>> Running on ext3, "fsck -f" takes about one hour.
>>>
>>> Running on ext4, with uninit_bg, the same fsck is finished in a bit over
>>> 5 minutes - more than 10x faster. (Without uninit_bg, the fsck takes
>>> about 10 minutes).
>>>
>>> Is this too good to be true? Below is the fsck run itself, the tree is
>>> Ted's latest git tree and his 1.41 WIP tools,
>>>
>> Wow. My guess is that flex_bg is making the difference. What we
>> would want to compare is the I/O read statistics line:

I thought we actually had flex_bg off at least on the first run and it
still looked good. (Ric just made the fs with mkfs.ext3 -j -I 256 -E
test_fs initially I think)

Val & I talked about this a little, and came to the conclusion that
directory fragmentation might be a pretty big part of it.

I did a similar workload on a much smaller fs, and the largest dir
(~11MB) looked like this on ext3:

BLOCKS:
(0-4):3950592-3950596, (5):3950604, (6-7):3950606-3950607, (8):3950630,
(9):3950871, (10-11):3950875-3950876, (IND):3950899, (12):3950900,
(13):3950934, (14):3950937, (15-16):3950943-3950944, (17):3951390,
(18):3951396, (19):3951402, (20):3951406, (21):3951408, (22):3951410,
(23):3951581, (24):3951684, (25):3951985, (26):3952031, (27):3952156,
(28):3952322, (29):3952418, (30):3952599, (31):3952626, (32):3954038,
(33):3954693, (34):3954698, (35):3954874, (36):3955108, (37):3955708,
(38):3955711, (39):3956034, (40):3956598, (41):3957173, (42):3957179,
(43):3957622, (44):3957763, (45):3957824, (46):3957910, (47):3958190,
(48):3958302, (49):3958488, (50):3958834, (51):3959173, (52):3959468,
(53):3959842, (54):3959903, (55):3960029, (56):3960245, (57):3960446
..... ad nauseam ...
(4032):4893557, (4033):4894194, (4034):4894719, (4035):4937580,
(4036):4937887, (4037):4939087, (4038):4939233, (4039):4939502,
(4040):4939508, (4041):4940473, (4042-4043):4940939-4940940,
(4044):4941191, (4045):4941402, (4046-4048):4941409-4941411,
(4049):4943061, (4050):4943307, (4051-4052):4943314-4943315
TOTAL: 4058

compared to ext4:

BLOCKS:
(0):1900544, (1-5070):1900546-1905615
TOTAL: 5071

> We did run fsck through seekwatcher & saw a significant reduction in
> seeks/sec for ext4. Eric has the pretty pictures that he can share.

sure do (AFAIK these were with neither flex_bg nor uninit_bg):

http://people.redhat.com/esandeen/ext4/e4fsck-1T.png
http://people.redhat.com/esandeen/ext4/e3fsck-1T.png
http://people.redhat.com/esandeen/ext4/ext3-ext4-fsck-1T.png

I'm still working out what's what. But that hockey-stick-shaped red
line for ext4 is intriguing, I think it's very densely packed $SOMETHING
that ext3 had to seek all over for, guessing it's the directories.
Although that strikes me as an odd place for the root-level directories
to land.

I need to check, does ext3 use reservation windows for directories?
Looks like maybe it should... :)

-Eric

Subject: Re: suspiciously good fsck times?
Date: 2008-07-10 17:21:17 GMT
On Thu, Jul 10, 2008 at 11:14:28AM -0500, Eric Sandeen wrote:
> Val & I talked about this a little, and came to the conclusion that
> directory fragmentation might be a pretty big part of it.
Hmm, could be. Let's see. Ric said 46.5 million files, I don't know
how big the filenames were, but let's assume a directory entry size of
32, so that means if we assume perfect packing, 128 directory entries
per 4k block. Let's use 100 directory entries/block just to make the
math easier, so that's 465,000 blocks. If we assume a 10ms seek time,
and that the blocks are totally scattered, that's 4650 seconds, or
1.29 hours. So that's roughly within the ballpark that Ric measured.
- Ted
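The back-of-envelope estimate above can be reproduced in a few lines of Python. This is a minimal sketch using only the assumptions stated in the message (46.5 million entries, 32-byte directory entries, 4k blocks, one 10ms seek per directory block); the function names are hypothetical and none of the inputs are measurements.

```python
# Back-of-envelope seek estimate. All inputs are the assumptions stated in
# the message above, not measurements. Function names are illustrative only.

def directory_blocks(files: int, entries_per_block: int) -> int:
    """Directory blocks needed for `files` entries, rounded up to whole blocks."""
    return -(-files // entries_per_block)

def scattered_read_seconds(blocks: int, seek_ms: int = 10) -> float:
    """Time to read every block if each read costs one full seek."""
    return blocks * seek_ms / 1000

if __name__ == "__main__":
    files = 46_500_000  # figure used in the estimate above
    # 100 is the rounded figure from the message; 128 is perfect packing
    # of 32-byte entries into a 4096-byte block.
    for per_block in (100, 128):
        blocks = directory_blocks(files, per_block)
        seconds = scattered_read_seconds(blocks)
        print(f"{per_block}/block: {blocks} blocks, {seconds:.0f} s, {seconds / 3600:.2f} h")
```

With 100 entries per block this gives 465,000 blocks and 4650 seconds, matching the figures in the message. Using the 45.6 million file count from the original report instead changes the result only slightly (4560 seconds), so the conclusion is not sensitive to that difference.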
From: Eric Sandeen <sandeen <at> redhat.com>
Subject: Re: suspiciously good fsck times?
Date: 2008-07-10 17:23:08 GMT

Theodore Tso wrote:
> On Thu, Jul 10, 2008 at 11:14:28AM -0500, Eric Sandeen wrote:
>> Val & I talked about this a little, and came to the conclusion that
>> directory fragmentation might be a pretty big part of it.
>
> Hmm, could be. Let's see. Ric said 46.5 million files, I don't know
> how big the filenames were, but let's assume a directory entry size of
> 32, so that means if we assume perfect packing, 128 directory entries
> per 4k block. Let's use 100 directory entries/block just to make the
> math easier, so that's 465,000 blocks. If we assume a 10ms seek time,
> and that the blocks are totally scattered, that's 4650 seconds, or
> 1.29 hours. So that's roughly within the ballpark that Ric measured.
>
> - Ted

btw guys this thread is not on linux-ext4, it's going to linux-ext4-owner

maybe someone who has them all can bounce to the list ;)

-Eric