Features Download
From: Chandler Carruth <chandlerc <at> google.com>
Subject: Re: Upstreaming PNaCl's IR simplification passes
Newsgroups: gmane.comp.compilers.llvm.devel
Date: Tuesday 4th March 2014 23:17:52 UTC (over 3 years ago)
On Tue, Mar 4, 2014 at 1:04 PM, Mark Seaborn  wrote:

> The PNaCl project has implemented various IR simplification passes that
> simplify LLVM IR by lowering complex features to simpler features.  We'd
> like to upstream some of these IR passes to LLVM.  We'd like to explore
> this acceptable, and if so, how we should go about doing this.

My question is somewhat different. I'm not questioning whether these are
acceptable, I'm questioning why these are interesting and important for the
LLVM project.

Neither PNaCl nor Emscripten open source projects have extensive developer
overlap with the LLVM community, and the developers have not (so far)
become super active maintainers of LLVM, although your recent patches to
fix some bugs uncovered by PNaCl have been much appreciated. These lowering
passes are likely to have few (most likely, zero) in-tree users for the
foreseeable future. I'm not enthusiastic about the community taking on the
maintenance, update, and code review burden of these.

I would point you at the several emails I have written to folks adding new
significant features to LLVM about how to offset this by contributing
maintenance and improvements to the core infrastructure, fixing bugs and
generally making things better sufficient to offset the ongoing complexity
cost of the new features. Fortunately, the PNaCl passes seem somewhat less
complex than (for instance) the x32 backend, but they seem likely to still
add a reasonable amount of complexity. They will certainly be challenging
to review and get the design into an acceptable state across the community.
At this point, I'm not really optimistic about there being a large enough
body of community members excited about getting these passes in to offset
these costs. I'm happy to be proven wrong of course, and would also be
happy to see you, other PNaCl developers, or Emscripten developers become
more active in the community in order to build this trust and establish a
good basis for these to go into LLVM.

> The immediate reason is that Emscripten is reusing PNaCl's IR passes for
> its new "fastcomp" backend [1].  It would be really useful if PNaCl and
> Emscripten could collaborate via upstream LLVM rather than a branch.

While this does seem like a useful thing for your two projects, it isn't
clear why this benefits the LLVM community. Perhaps it does, but I'd like
to see that clarified.

> Some background:  There are two related use cases for these IR
> simplification passes:
>  1) Simplifying the task of writing a new LLVM backend.  This is
> Emscripten's use case.  The IR simplification passes reduce the number of
> cases a backend has to handle, so they would be useful for anyone else
> creating a new backend.

If these simplify writing a backend, why wouldn't the patches include
commensurate simplifications to LLVM's backends? That would both give them
an in-tree customer, and more immediate value to the community and project
as a whole.

>  2) Using a subset of LLVM IR as a stable distribution format for
> executables.  This is PNaCl's use case.  PNaCl's IR subset omits various
> complex IR features, which we lower using the IR simplification passes
>  Renderscript is an example of another project that uses IR as a stable
> distribution format, though I think currently Renderscript is not
> subsetting IR much.

Given that the bitcode is stable, I don't understand why this is important.
What technical problems are you solving other than making the IR match some
predetermined form chosen by PNaCl?

> Some examples of PNaCl's IR simplification passes are:

I have a bunch of questions about the specific passes you mention. Perhaps
these questions are better answered in the review thread for the patches,
but they are at least things that I would think about and try to address if
and when you send out the code review.

>  * Calling conventions lowering:  ExpandVarArgs and ExpandByVal lower
> varargs and by-value argument passing respectively.  They would be useful
> for any backend that doesn't want to implement varargs or by-value
> conventions.

Why wouldn't these be applicable to existing backends? What is hard about
the existing representations?

>  * Instruction-level lowering:
>     * ExpandStructRegs splits up struct values into scalars, removing the
> "insertvalue" and "extractvalue" instructions.

There are already passes that do this outside of function arguments and
return values. Why is a new one needed? How do you handle the
overflow-detecting operations?

>     * PromoteIntegers legalizes integer types (e.g. i30 is converted to
> i32).

Does it split up too-wide integers? Do we really want another integer
legalization framework in LLVM? I am actually interested in doing (partial)
legalization in the IR during lowering (codegenprep time) in order to
simplify the backend, but I don't think we should develop such a framework
independently of the legalization currently used in the backends.

>  * Module-level lowering:  This implements, at the IR level,
> that is traditionally provided by "ld".  e.g. ExpandCtors lowers
> llvm.global_ctors to the __init_array_start and __init_array_end symbols
> that are used by C libraries at startup.

This doesn't make any sense to me. The IR representation is strictly
simpler. It is trivially lowered in a backend. I don't understand what this
would benefit.

> There seems to be plenty of precedent for IR-to-IR lowering passes --
> already contains passes such as LowerInvoke, LowerSwitch and LowerAtomic.

Note that these are quite different -- they lower from a front-end
convenient form toward the canonical IR form. You are talking about
something totally different that deals with target-oriented lowering. The
correct place to look for analogies is CodeGenPrep.

> The PNaCl team (which I'm a member of) is happy to take on the work of
> maintaining this code, such as updating it as LLVM IR evolves and doing
> code reviews.  We would upstream this gradually, pass by pass, so the
> changes would be manageable.

While this is appreciated, the PNaCl team should work to much more actively
contribute to the core of LLVM if it wants to be trusted to maintain this

All of that said, while I have a lot of concerns, I do want to clarify
something: I actually think that this is the correct fundamental direction
for LLVM. I *want* to see PNaCl and Emscripten both be significantly more
involved in the community, and I think that using lowering to simplify
backends is a Very Good Thing. However, I think that unless there is a
significant consensus amongst the active LLVM developers that they are OK
accepting and maintaining these patches (currently, I'm not), I think that
the community engagement needs to happen first.

CD: 4ms