Features Download
From: Chandler Carruth <chandlerc <at> gmail.com>
Subject: [RFC] BlockFrequency is the wrong metric; we need a new one
Newsgroups: gmane.comp.compilers.llvm.devel
Date: Sunday 2nd February 2014 10:13:05 UTC (over 3 years ago)
Right now, all profile information is funneled through two analysis passes
prior to any part of the optimizer using it.

First, we have BranchProbabilityInfo, which provides a simple interface to
the simplest form of profile information: local and relative branch
probabilities. These merely express the likelihood of taking one of a
mutually exclusive set of exit paths from a basic block. They are very
simple, and the foundation of the profile information. Even the other
analysis is merely built on top of this one.

Second we have BlockFrequencyInfo which attempts to provide a more "global"
(function-wide, not actually program wide) view of the statistical
frequency with which any particular basic block is executed. This is nicely
principled analysis that just computes the probabilistic flow of control
through the various branches according to their probabilities established
in the first analysis.

However, I think that BlockFrequencyInfo provides the wrong set of
information. There is one critical reason why. Let's take a totally
uninteresting CFG:

A -> B1, B2
B1 -> C1, C2
B2 -> C3, C4
C1 -> D1, D2
C2 -> ret
C3 -> D3, D4
C4 -> ret
D1 -> E1, E2
D2 -> ret
D3 -> E3, E4
D4 -> ret

You can imagine this repeating on for as many levels as you like. This
isn't an uncommon situation with real code. BlockFrequencyInfo computes for
this a very logical answer:

---- Block Freqs ----
 a = 1.0
  a -> b1 = 0.5
  a -> b2 = 0.5
 b1 = 0.5
  b1 -> c1 = 0.25
  b1 -> c2 = 0.25
 b2 = 0.5
  b2 -> c3 = 0.25
  b2 -> c4 = 0.25
 c1 = 0.25
  c1 -> d1 = 0.125
  c1 -> d2 = 0.125
 c2 = 0.25
 c3 = 0.25
  c3 -> d3 = 0.125
  c3 -> d4 = 0.125
 c4 = 0.25
 d1 = 0.125
  d1 -> e1 = 0.0625
  d1 -> e2 = 0.0625
 d2 = 0.125
 d3 = 0.125
  d3 -> e3 = 0.0625
  d3 -> e4 = 0.0625
 d4 = 0.125

One way of thinking about this is that for any basic block X which is
predicated by N branches of unbiased probability (50/50), the frequency
computed for that block is 2^(-N).

The problem is that this doesn't represent anything close to reality.
Processors' branch prediction works precisely because very, *very* few
branches in programs are 50/50. Most programs do not systematically explore
breadth first the full diversity of paths through the program. And yet, in
the absense of better information our heuristics would lead us to believe
(and act!) as though this were true.

Now, I'm not saying that the computation of block frequencies is wrong.
Merely that it cannot possibly be used for at least one of it purposes --
it's relative frequency (to the entry block "basis" frequency) is
completely useless for detecting hot or cold regions of a function -- it
will simply claim that all regions of the function are cold. What it *is*
useful for is establishing a total ordering over the basic blocks of a
function. So it works well for some things like code layout, but is grossly
misleading for others.

There are several possible solutions here. I'll outline my proposal as well
as some other ideas.

BlockWeights instead of BlockFrequencies. My idea is that we don't really
care about the depth of the control dependence for a particular basic
block. We care about the accumulated *bias* toward or away from a basic
block. This is predicated on the idea that branches are overwhelmingly
predictable. As a consequence, evenly distributed probabilities are really
just *uncertainties*. My proposed way of implementing this would take the
exact algorithm already used in BlockFrequency, but instead of computing a
frequency for an edge based on the probability of the branch times the
frequency of the predecessor, instead compute it as the frequency of the
predecessor times how "biased" the branch is relative to other branches in
the predecessor. Essentially, this would make a branch probability set of
{0.5, 0.5} produce edge frequencies equal to the predecessor's edge

The result of such a system would produce weights for every block in the
above CFG as '1.0', or equivalent to the entry block weight. This to me is
a really useful metric -- it indicates that no block in the CFG is really
more or less likely than any other. Only *biases* in a specific direction
would cause the block weights to go up or down significantly. This would
immediately make the analysis useful to consumers such as the vectorizer,
unroller, or when we have the capability, the inliner and an outliner which
respect cold regions of functions.

One alternative proposed by Nick was using the average of block frequencies
across a domtree as the denominator to which a particular block's frequency
is compared. I haven't really thought as much about this because it seems
quite expensive to compute and is less intuitive to me, but I wanted to
mention it. Nick might be able to provide an explanation of it that would
help folks.

Related to the idea of using the domtree, I can think of many layers on top
of the existing block frequencies that could be used to accurately do "cold
region" detection, but they would all be significantly more analysis on top
of the existing system and so are somewhat less appealing to me.

Perhaps others have still more ideas? Also comments or suggestions on mine
are very welcome.

I have a really simple patch that implements my suggestion using the
existing frequency infrastructure (as it is almost exactly the same). I
would also want to rename everything to avoid confusion as these would no
longer be "frequencies" in any strict sense. I can provide test data as
needed on the result of this change if people like the direction.

CD: 4ms