From: Jason Baron <jbaron <at> redhat.com>
Subject: Re: [RFC PATCH 0/6] jump label v3
Newsgroups: gmane.linux.kernel
Date: Thursday 19th November 2009 21:55:58 UTC
On Wed, Nov 18, 2009 at 07:54:24PM -0800, Roland McGrath wrote:
> 2. optimal compiled hot path code
>    You and Richard have been working on this in gcc and we know the state
>    of it now.  When we get the cold labels feature done, it will be ideal
>    for -O(2?).  But people mostly use -Os and there no block reordering
>    gets done now (I think perhaps this even means likely/unlikely don't
>    really change which path is the straight line, just the source order
>    of the blocks still determines it).  So we hope for more incremental
>    improvements here, and maybe even really optimal code for -O2 soon.
>    But at least for -Os it may not be better than "unconditional jump
>    around" as the "straight line" path in the foreseeable future.  As
>    noted, that alone is still a nice savings over the status quo for the
>    disabled case.  (You gave an "average cycles saved" for this vs a load
>    and test, but do you have any comparisons of how those two compare to
>    no tracepoint at all?)

I've run that in the past, and for the nop + jump sequence it's between
2 and 4 cycles on average vs. no tracepoint.

> 3. bookkeeping magic to find all the jumps to enable for a given
>    Here you have a working first draft, but it looks pretty clunky.
>    That strcmp just makes me gag.  For a first version that's still
>    pretty simple, I think it should be trivial to use a pointer
>    comparison there.  For tracepoints, it can be the address of the
>    struct tracepoint.  For the general case, it can be the address of
>    the global that would be the flag variable in case of no asm goto support.
>    For more incremental improvements, we could cut down on running
>    through the entire table for every switch.  If there are many
>    different switches (as there are already for many different
>    tracepoints), then you really just want to run through the
>    insn-patch list for the particular switch when you toggle it.  
>    It's possible to group this all statically at link time, but all
>    the linker magic hacking required to get that to go is probably
>    more trouble than it's worth.  
>    A simple hack is to run through the big unsorted table at boot time
>    and turn it into a contiguous table for each switch.  Then
>    e.g. hang each table off the per-switch global variable by the same
>    name that in a no-asm-goto build would be the simple global flag.

that probably makes the most sense. Do a sort of the jump table and then
store an (offset, length) pair with each switch. I was thinking of this as
an optimization (the tracepoint code is already O(N) per switch toggle,
where N = total number of all tracepoint site locations, and not O(n), where
n = number of sites per tracepoint). Certainly, if this is a gating issue
for this patchset, I can fix it now.
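
As a rough userspace sketch of the sort plus (offset, length) idea, something
like the following could work. The struct layouts and function names here are
hypothetical, not what the patchset uses, and the actual instruction patching
is replaced by an optional callback:

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical entry layout, one per patchable site. */
struct jump_entry {
	uintptr_t code;   /* address of the nop/jump site */
	uintptr_t target; /* label to jump to when enabled */
	uintptr_t key;    /* address of the owning switch */
};

/* Hypothetical per-switch record: a range in the sorted table. */
struct jump_key {
	size_t first; /* offset of the first entry for this switch */
	size_t count; /* number of entries for this switch */
};

static int cmp_key(const void *a, const void *b)
{
	const struct jump_entry *x = a, *y = b;

	return (x->key > y->key) - (x->key < y->key);
}

/* Done once (e.g. at boot): group entries by switch address. */
static void jump_table_sort(struct jump_entry *tab, size_t n)
{
	qsort(tab, n, sizeof(*tab), cmp_key);
}

/* Walk the sorted table once and record each switch's range. */
static void jump_table_index(const struct jump_entry *tab, size_t n)
{
	size_t i = 0;

	while (i < n) {
		struct jump_key *k = (struct jump_key *)tab[i].key;

		k->first = i;
		k->count = 0;
		while (i < n && tab[i].key == (uintptr_t)k) {
			k->count++;
			i++;
		}
	}
}

/* Toggle cost is O(n) in the sites of this switch, not the whole table. */
static size_t jump_key_toggle(const struct jump_entry *tab,
			      const struct jump_key *k,
			      void (*patch)(const struct jump_entry *))
{
	size_t i;

	for (i = k->first; i < k->first + k->count; i++)
		if (patch)
			patch(&tab[i]);
	return k->count;
}
```

The key comparison is a plain pointer compare, so the strcmp goes away as
well. A switch with no sites simply keeps a zero-length range.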

> Finally, for using this for general purposes unrelated to tracepoints,
> I envision something like:
> 	foo(int x, int y)
> 	{
> 		if (x > y && mostly_not(foobar))
> 			do_foobar(x - y);
> 	}
> 	... set_mostly_not(foobar, onoff);
> where it's:
> #define DECLARE_MOSTLY_NOT(name) ... __something_##name
> #define mostly_not(name) ({ int _doit = 0; __label__ _yes; \
> 			    JUMP_LABEL(name, _yes, __something_##name); \
> 			    if (0) _yes: __cold _doit = 1; \
> 			    unlikely (_doit); })
> I don't think we've tried to figure out how well this compiles yet.
> But it shows the sort of thing that we can do to expose this feature
> in a way that's simple and unrestrictive for kernel code to use casually.

cool. the assembly output would be interesting here...
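
For comparison, a minimal sketch of what the no-asm-goto fallback of that
interface might look like, where the switch is just the simple global flag
mentioned above. It assumes GCC's __builtin_expect for unlikely(); the macro
bodies and the foobar_hits counter are made up for illustration and are not
from the patchset:

```c
/* Assumption: GCC/Clang builtin; the kernel has its own unlikely(). */
#define unlikely(x) __builtin_expect(!!(x), 0)

/* Fallback: the switch is an ordinary global int. */
#define DECLARE_MOSTLY_NOT(name) int __something_##name
#define mostly_not(name) unlikely(__something_##name)
#define set_mostly_not(name, onoff) \
	((void)(__something_##name = !!(onoff)))

DECLARE_MOSTLY_NOT(foobar);

/* Hypothetical stand-in so the effect of the switch is observable. */
static int foobar_hits;

static void do_foobar(int d)
{
	foobar_hits += d;
}

static void foo(int x, int y)
{
	if (x > y && mostly_not(foobar))
		do_foobar(x - y);
}
```

This is the load-and-test version, so it only shows the calling convention.
The jump label build would keep the same call sites and swap the macro bodies.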

