From: John McCall via swift-dev <swift-dev <at> swift.org>
Subject: A type-checking performance case study
Newsgroups: gmane.comp.lang.swift.devel
Date: Friday 8th April 2016 18:47:19 UTC
The standard library team asked the compiler team to take a look at the
performance of the type-checker.  Their observation was that several
in-progress additions to the standard library were causing a
disproportionate increase in the time required to compile a
simple-seeming expression, and they were concerned that this was going to
block library progress.  This is the resulting analysis.

Thanks go to Doug Gregor for being a sounding board for a lot of ideas that
came up during this investigation.

1. Introduction

In general, it is expected that changes to the source code can cause
non-linear increases in type-checking time.  For one, all type systems with
some form of generic type propagation have the ability to create
exponentially-large types that will generally require exponential work to
manipulate.  But Swift also introduces several forms of disjunctive
constraint, most notably overload resolution, and solving these can require
combinatorial explosions in type-checker work.  Nonetheless, for any
particular problem, there are usually things we can do to avoid or limit
that combinatorial explosion, and so it is always reasonable to take a
look.
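
To make the first point concrete, a few lines suffice to build a type
that is exponentially large in the size of the source:

  // Each binding doubles the size of the inferred tuple type; after n
  // lines the type contains 2^n copies of Int.
  let t0 = (0, 0)       // (Int, Int)
  let t1 = (t0, t0)     // ((Int, Int), (Int, Int))
  let t2 = (t1, t1)     // 4 copies of (Int, Int)
  let t3 = (t2, t2)     // 8 copies, and so on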

Some background for the rest of this discussion.  The current semantics for
name lookup, member or otherwise, are that all possible matches are dropped
into an overload set with no explicit ranking between them.  If solutions
can be found with multiple overloads, they are disambiguated (hopefully)
using the ambiguity resolution rules.  Some aspects of the ambiguity
resolution rules do apply to declarations in the abstract; for example,
there is a rule that prefers a solution that picks a function with a
strictly-more-specialized signature.  However, in general, such rules
cannot be applied immediately during type-checking.  This is because of the
"score" system in which solutions are ranked according to the kind and
number of various undesired implicit conversions they perform: for example,
it is possible that a solution using a more-specialized function will end
up requiring more undesired conversions and thus rank worse.  Therefore,
the type-checker must find all possible solutions and only then rank them,
which often prevents short-circuiting the combinatorial explosion.
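
As a toy illustration of that specialization rule (invented
declarations):

  protocol Shape {}
  struct Circle: Shape {}

  func area<S: Shape>(of shape: S) -> String { return "generic" }
  func area(of circle: Circle) -> String { return "concrete" }

  // Both overloads type-check for a Circle argument; the ambiguity
  // rules prefer the strictly-more-specialized concrete signature.
  print(area(of: Circle()))   // prints "concrete"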

2. The problem

Okay, concrete stuff.  The expression in question is:

  (0..<5).lazy.map { String($0) }

The types of '0' and '5' are not immediately constrained; in fact, it will
turn out that they can only be constrained by assigning them their default
literal type.  Many expressions that are expensive to type-check feature
this property, because the assignment of default types must occur very late
in type-checking, as a matter of last resort.
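
For example:

  // A literal only gets its default type as a last resort; any
  // contextual constraint takes precedence.
  let a = 0            // no other constraint: defaults to Int
  let b: Double = 0    // contextual type wins: this '0' is a Double
  let c = 0.0 + 0      // '+' forces this '0' to become a Double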

'..<' has two overloads, one requiring the value type to be Comparable and
a more specialized one which also requires it to be Strideable.  They yield
different types.
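
Schematically, the pair looks something like this (a toy stand-in; I am
not reproducing the exact standard library declarations):

  // 'ToyRange' and 'ToyCountableRange' are invented stand-ins for the
  // two different result types.
  infix operator ..<< : RangeFormationPrecedence

  struct ToyRange<Bound: Comparable> { let lower, upper: Bound }
  struct ToyCountableRange<Bound: Strideable> { let lower, upper: Bound }

  func ..<< <Bound: Comparable>(lower: Bound, upper: Bound)
      -> ToyRange<Bound> {
    return ToyRange(lower: lower, upper: upper)
  }
  func ..<< <Bound: Strideable>(lower: Bound, upper: Bound)
      -> ToyCountableRange<Bound> {
    return ToyCountableRange(lower: lower, upper: upper)
  }

  // For '0 ..<< 5', both overloads are viable; the more-specialized
  // Strideable one eventually wins, but the solver must explore both.
  let r = 0 ..<< 5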

'lazy' is a member name and therefore can only be resolved when the base
type is known.  As it happens, in both cases the overload set will end up
the same: a small number of overloads from various protocol extensions. 
The proposed library change adds several of these, greatly exacerbating the
explosion.  All of these definitions yield different types.

'map' is a member name and therefore can only be resolved when the base
type is known.  It has a small number of overloads in different types and
protocol extensions.
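
Again as a toy (using the current protocol names; the 'toy' identifiers
are invented):

  // The same member name declared in extensions at different levels of
  // the protocol hierarchy, each yielding a different type.
  struct LazySeq<S: Sequence> { let base: S }
  struct LazyColl<C: Collection> { let base: C }

  extension Sequence {
    var toyLazy: LazySeq<Self> { return LazySeq(base: self) }
  }
  extension Collection {
    var toyLazy: LazyColl<Self> { return LazyColl(base: self) }
  }

  // Both declarations enter the overload set for an Array; each new
  // extension like these multiplies the solver's work.
  let l = [1, 2, 3].toyLazy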

'String.init' has a huge number of overloads, somewhere around fifty.

3. Current type-checker behavior

The type-checker ends up taking the right basic approach for solving this:
it experimentally picks one of the overloads for '..<', then an overload
for '.lazy', then an overload for '.map', then an overload for
'String.init'.
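
Schematically, that search is a backtracking loop over the overload
disjunctions; the following is only a sketch (invented names, not the
actual solver code):

  struct Overload { let name: String }

  // Stand-in for constraint simplification: the real solver fails fast
  // here when a choice contradicts the constraints accumulated so far.
  func isViable(_ choice: Overload, given chosen: [Overload]) -> Bool {
    return true
  }

  func solve(_ disjunctions: [[Overload]], _ chosen: [Overload],
             _ solutions: inout [[Overload]]) {
    guard let first = disjunctions.first else {
      solutions.append(chosen)   // every disjunction decided: a candidate
      return
    }
    let rest = Array(disjunctions.dropFirst())
    for choice in first where isViable(choice, given: chosen) {
      solve(rest, chosen + [choice], &solutions)
    }
  }

Absent early pruning, the number of candidates is the product of the
disjunction sizes.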

As the constraint system is progressively solved, it will turn out that the
value type of the integer literals will propagate all the way down to be
the type of '$0' and hence of the argument to String.  This sort of
propagation could, in general, require the literals to have a specific
type.  In this case, since 'String(x)' will take basically anything, the
literals are not constrained, and we will end up defaulting their types.  I
mention this only to underline why the type-checker cannot apply this
defaulting quickly.

Because the literals end up needing to be defaulted, and because there are
two possible default types for integer literals (we will use 'Double' if
'Int' does not work), the basic combinatorial formula for this constraint
system is something like 2 x 4 x 4 x 50 x 2 x 2, i.e. on the order of
6,400 candidate solutions.  (I don't remember the actual numbers of
overloads.)  This is why type-checking is slow.

4. Possible solutions

As a patch to the problem, I recognized that many of those 50 overloads of
'String.init' were not viable due to argument-label mismatches, but were
only being recognized as such in the innermost loop.  (The rule here is
that you cannot pass a tuple value and have it automatically "explode" to
match the parameter list; the syntactic call arguments must structurally
match the parameters.  This was an accepted proposal with a corresponding
compiler implementation from February.  Unfortunately, it turns out that
our enforcement of this is inconsistent.)  It was simple enough to avoid
this by marking the overloads as non-viable upon lookup; it even turns out
that we already had this heuristic, and it just hadn't kept pace with the
language change.  So this limits the explosion in this case.
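
To illustrate the label rule with invented initializers (a stand-in for
the real String set):

  struct Wrapper {
    init(_ value: Int) {}
    init(repeating value: Int, count: Int) {}
    init(describing subject: Any) {}
  }

  // For the call below, only init(_:) structurally matches a single
  // unlabelled argument; the labelled inits can be marked non-viable
  // at lookup rather than in the innermost solver loop.
  let w = Wrapper(42)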

There are various other type-checking heuristics that might also have
helped with this.  For example, the overload sets of 'lazy' are the same in
both cases; this is something that could theoretically be observed without
expanding the overload set of '..<', and in some cases this might allow
some downstream type-checking to proceed outside of any contingencies.  I
don't know that it would have significantly helped in this case, however.

Anything beyond that will require language changes.

Note that there have been proposals about restricting various kinds of
implicit conversions.  Those proposals are not pertinent here; the implicit
conversions are not a significant contributor to this particular problem.
The real problem is overloading.

One very drastic solution to this problem would be to eliminate
non-labelled overloads.  (A perhaps more-reasonable way to describe this is
to say that argument labels are a part of the name, and therefore two
declarations with different labels are not overloads.)  That is, given the
syntactic form of an expression, we would always be able to find a single
function that it could invoke.  The current argument-labelling design of
the language does make this more palatable than it was in earlier
iterations.  However, I don't think it's palatable enough: this change
would be very limiting, and in particular I think we are very unlikely to
accept it for operators or single-argument initializers.
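
Concretely, under that rule:

  // These would be two distinct functions rather than overloads,
  // because their argument labels differ:
  func move(to position: Int) {}
  func move(by offset: Int) {}

  // ...whereas these would become an invalid redeclaration, since the
  // syntactic form 'f(x)' alone must identify a single function:
  // func f(_ x: Int) {}
  // func f(_ x: String) {}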

A less drastic solution suggests itself when we examine the nature of these
overloads.  '..<', 'lazy', and 'map' all obey an implicit hierarchy of
overloads, each imposing stricter constraints than the last.  Furthermore,
specifically for 'lazy' and 'map', this hierarchy actually follows the
explicit hierarchy of types and the protocols they conform to.  Perhaps we
could use this to declare that certain declarations unconditionally hide
declarations with the same name from "base" types: for example, a
declaration from a concrete type would always hide a declaration from a
protocol (or an extension thereof) that the type declares conformance to,
and a declaration from a protocol would hide a declaration from a protocol
it is derived from.  (This rule could still be implemented in the presence
of conditional conformances, but it would be more difficult to apply
"statically" during type-checking; we might have to compute a full solution
before we could recognize that an overload was actually hidden by a
conditional conformance.)  In this case, I believe this would have left us
with only a single overload to consider for both 'lazy' and 'map', greatly
reducing the combinatorial space of solutions.  Of course, it's also
possible that this would remove viable solutions.
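
For a flavor of the rule, using invented types:

  protocol Container {}
  extension Container {
    func summary() -> String { return "a container" }
  }

  struct Box: Container {
    // Today both declarations enter the overload set and the ambiguity
    // rules pick this one; under the proposed rule the extension's
    // summary() would be unconditionally hidden for lookups on Box and
    // never considered by the solver at all.
    func summary() -> String { return "a box" }
  }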

5. Actions and recommendations

I've landed the viability-limiting patch as
563057ca980fb9a8fce899184761a812164dc9d2.  Hopefully it sticks.

I strongly recommend that we pursue declaration hiding as a language
change.  Even independent of type-checker performance, this seems quite
defensible as a language direction, since it makes it much easier to reason
about overloading in the presence of protocol extensions and so on. 
However, there are still a lot of major open questions with the design.

John.