Features Download
From: Philip Reames <listmail <at> philipreames.com>
Subject: Re: [PATCH] Move ownership of GCStrategy objects to LLVMContext
Newsgroups: gmane.comp.compilers.llvm.devel
Date: Friday 16th January 2015 22:56:13 UTC (over 3 years ago)
+ llvmdev in the hopes that a broader audience sees this and chimes in.

At this point, it's clear Chandler and I disagree on the right approach 
going forward.  What do others think?

There are really two independent questions here.  I've tried to separate 
them and provide appropriate context for the discussion so far.

Chandler, if I managed to misrepresent your positions at all, sorry!, 
and please correct me.

On 01/15/2015 05:41 PM, Chandler Carruth wrote:

*ISSUE #1* - Should GC specific properties be checked in the IR Verifier?
> On Thu, Jan 15, 2015 at 4:58 PM, Philip Reames 
> > wrote:
>>     So, I'm not sold on this being in the verifier at all. But that
>>     seems quite likely a separate discussion.
>     Why?  Feel free to forward this to llvm-dev depending on the
>     answer and it's length.
>     Being able to state things like: "statepoints in functions with GC
>     X only relocate values in an address space X expects" seems to be
>     entirely reasonable and desirable to me.
> Absolutely, but I don't think it belongs in the IR verifier. The 
> verifier needs to be extremely light weight and I don't think it makes 
> sense to check these kinds of constraints there. The contract of the 
> verifier is that it validates those properties which you should feel 
> free to assert() on when writing other passes. I don't think we should 
> add such properties lightly.
> But of course it makes sense to check these properties. One way is 
> through the verifyAnalysis functionality in the old pass manager (and 
> there will be some largely-automated analog in the new one). This 
> allows an analysis to check that its analysis specific invariants 
> continue to hold. Another strategy is the (somewhat poorly named) Lint 
> pass. The idea is that for some invariants it may make more sense to 
> add a custom pass which does nothing other than check and diagnose 
> violations of an invariant.
> If checking this invariant in other ways were really hard to do, then 
> maybe it would make sense in the verifier. But I'm not seeing anything 
> like that at this point.
I believe it does make sense to check *some* GC related properties in 
the Verifier.  My position is that the GC attribute essentially imposes 
an additional set of restrictions on what it means to have valid IR.  
Specifically, we should be able to check cheap, local facts about the IR 
such as:
- A GC which uses gc.statepoints does not also use gc.roots and vice versa.
- A gc.statepoint/gc.relocate sequence only relocates pointers with the 
"right" address space.  (This applies only to GCs which use address 
spaces to distinguish pointers.)
- If a derived pointer is relocated, it's base must also be available in 
the gc.statepoint.  (This applies only to relocating GCs.)  Otherwise, 
the base pointer (must?, may?) be (null?, undef?).
- There should be no gc.relocates in IR used by a non-relocating GC.
- A gc.statepoint whose relocations haven't been made explicit yet - 
indicated by a possible future flag on gc.statepoint - isn't legal for a 
GC which doesn't use late rewriting.  (See http://reviews.llvm.org/D6975 
for context.)

(The previous list is meant to be illustrated examples.  Only some of 
them actually exist so far, and I make no commitment to ever implement 
the others since they involve infrastructure and design work which 
hasn't happened yet.)

To use your previously stated criteria, every one of the above is (or at 
least should/would be) an assertion in the lowering code or in some pass.

I do *not* believe that the Verifier is the right place to check more 
complicated GC properties.  For example, trying to check that a 
gc.statepoint relocates all pointer values which might be live over the 
gc.statepoint is an expensive validation check which does *not* belong 
in the Verifier.  I have no opinion on what mechanism might be best to 
represent such checks (yet.)  Both the verifyAnalysis, and Lint 
mechanisms might be reasonable choices.

There's also a gray area in between that I'm really not sure about. For 
example, the late safepoint placement/rewriting stuff relies on there 
not being new addrspacecast instructions inserted by the optimizer.  
(And none in the original IR.)  It's unclear whether having that as a GC 
specific check in the Verifier is the right approach or not.  I think we 
can address these case by case as they come up.

*ISSUE #2 *- What should the access model for GCStrategy be? Is it part 
of the IR?  Or should it be treated like an Analysis result?

>     "If so, have you looked at possibly making this a function
>     analysis that has a cache of these which it populates exactly the
>     same way you populate the context? That should still allow you to
>     use essentially the same lookup path here.
>     The reason I suggest this approach is that it helps simplify the
>     core IR a bit by letting the IR just deal in an opaque attribute
>     and relying on an analysis to reason about it. I think the code
>     would be almost identical to what you have here just shuffled
>     around a bit. I'd be happy with that either as the first version
>     or with that happening as the immediate follow-up refactoring
>     after this patch. " -- previous comment by Chandler
>     "But I think it would be better to not have this in the Function
>     API, and instead be an analysis over the function." -- previous
>     comment by Chandler
>     Response by Philip follows:
>     The analogous thing would be a GCStrategyCache and a wrapping
>     GCStrategyCacheWrapper pass.  I have no objection to doing this if
>     that's what the community would prefer, but I believe this is
>     undesirable.  My reasoning is:
>     1) There is no possibility of invalidation.  A GCStrategy is
>     immutable and can only be changed at compile time (of LLVM itself).
> There are actually quite a few things that have no invalidation 
> criteria but I think make sense being exposed via the pass machinery. 
> TargetLibraryInfo is an example I've just been touching.
>     2) If we have multiple instances of GCStrategyCache, we have
>     multiple copies of each GCStrategy floating around with different
>     lifetimes.  This seems potentially confusing at best. 
>     (Particularly for any out of tree users... I'd really like the
>     lifetime model to be simple.)
> I don't think this is necessary. You could easily have these be strict 
> lazily created singletons.
>     3) We're adding an IMHO utterly unnecessary abstraction which only
>     increases the complexity of an already fairly complex mechanism. 
>     I see no benefit here. 
> I think there is a significant benefit to separating the concerns of 
> GC strategy from the concerns of the core IR. I also clearly don't 
> think the complexity is going to be that large.
Chandler - Thinking about this overnight, I've realized I don't actually 
have much of an objection here.  I still think the direction you're 
pushing is the wrong one, but it doesn't actually matter that much to my 
overall goals.  Having said that, let me try to convince you why the 
current (as of change 226311 which started this thread) is better.

(For the record, I will respect whatever the community decides and 
migrate the code in that direction.  I submitted the change in question 
while we debate - with Chandler's permission - while we debate, but I 
will update as needed.)

My position comes down to two key points: the GCStrategy associated with 
a given Function is fundamental part of the IR definition, and the 
complexity we're talking about for the core IR is a single class with a 
set of boolean flags* - i.e. pretty minor.

* Let's ignore the performCustomLowering mechanism.  I'd like to factor 
that out, or get rid of it, anyway.

The properties of the garbage collector that a given piece of code is 
being compiled for is a fundamental part of the IR definition. There is 
no "safe default" to use.  If information about the GC is not available, 
the code will be likely be miscompiled.  As a result, a missing 
GCStrategy (i.e. one whose string key we don't recognize) is, and should 
be, a fatal error.  (Chandler's proposal wouldn't change this last point.)

The analogy I'd make would be to data layout information.  The 
DataLayout effects the semantics of the IR, and is thus considered "part 
of" the Module, rather than something "computed from" a Module.  
Similarly, if you're given a gc.statepoint for a relocating GC, it's not 
legal to treat it like one for a non-relocating GC.  In particular, 
there are legal optimizations for the later which are illegal for the 
former. The targeted GC is semantically "part of" the IR.

Of course, you could also analogize this to information about the 
sub-target a particular module is targeting.  We clearly don't treat 
sub-targets as "part of" the Module.  So honestly, it could go either way.

I had previously made an argument based on object lifetime.  After 
further reflection, having GCStrategy be a immutable object is fine by 
me.  It does imply we need to get rid of the performCustomLowering 
mechanism, but that's something I'd wanted to do anyway.  Note that the 
in tree ShadowStackGC subclass of GCStrategy is not an immutable object 
today.  There may also be out of tree subclasses with assumptions about 
mutability and lifetime baked in.  This is likely the case - if there 
are actually any users out there other than the ones who've already 
spoken up - since the ShadowStackGC was the logic starting place for a 
custom GC strategy implementation.

CD: 10ms