From: John Rose <john.r.rose-QHcLZuEGTsvQT0dZR+AlfA <at> public.gmane.org>
Subject: optional arguments for bootstrap methods
Newsgroups: gmane.comp.java.openjdk.mlvm.devel
Date: Friday 22nd October 2010 08:43:06 UTC
Based partly on our discussions at the Summit about "live constants", and
also based on the likely requirements of Project Lambda, the JSR 292 EG is
likely to allow any single invokedynamic instruction to pass one or more
extra constant values into the bootstrap method invocation.

Here is the current thinking.  Language implementors, please tell us if we
are missing anything.

We call these "static arguments", in contrast to the normal "dynamic
arguments" that are received on every method call.  For invokedynamic, the
dynamic arguments are received as if by 'invokeExact' on the method handle
that the BSM has bound to the invokedynamic instruction instance.  The BSM
decides, once at link time, which method handle to choose based on the
static arguments.

There are three standard static arguments always passed to the BSM:
 1. an indication of the caller class (note: this is likely to change to a
MethodHandles.Lookup capability)
 2. a String naming the method apparently being called
 3. a MethodType indicating the dynamic arguments and return value types

The String and MethodType are extracted from the NameAndType constant at
the invokedynamic site.
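
As a rough illustration of the three standard arguments, here is a minimal
Java sketch.  It assumes the BSM shape of the released java.lang.invoke API
(a MethodHandles.Lookup as the first argument, a CallSite as the result),
which is not exactly the draft described in this message.  The class name,
'greet' and 'linkAndCall' are hypothetical, and the BSM is called by hand
because Java source cannot emit invokedynamic directly.

```java
import java.lang.invoke.CallSite;
import java.lang.invoke.ConstantCallSite;
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

final class ThreeArgBootstrapSketch {
    // Hypothetical method that the call site ends up bound to.
    static String greet(String who) {
        return "hello, " + who;
    }

    // BSM receiving only the three standard static arguments:
    // the caller's lookup, the method name, and the call site type.
    static CallSite bootstrap(MethodHandles.Lookup lookup, String name, MethodType type)
            throws ReflectiveOperationException {
        MethodHandle target = lookup.findStatic(ThreeArgBootstrapSketch.class, name, type);
        return new ConstantCallSite(target);
    }

    // Stands in for the JVM: links once, then passes the dynamic argument.
    static String linkAndCall(String who) {
        try {
            CallSite site = bootstrap(MethodHandles.lookup(), "greet",
                    MethodType.methodType(String.class, String.class));
            return (String) site.dynamicInvoker().invokeExact(who);
        } catch (Throwable t) {
            throw new IllegalStateException(t);
        }
    }

    public static void main(String[] args) {
        System.out.println(linkAndCall("world"));
    }
}
```

Here the name and type play the role of the NameAndType constant, and
'who' is the only dynamic argument.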

The invokedynamic instruction points to a constant pool entry that looks
like this:

struct InvokeDynamic_info {
  u1 tag; // always CONSTANT_InvokeDynamic = 18
  u2 bsm_index;   // ref to CONSTANT_MethodHandle
  u2 descr_index; // ref to CONSTANT_NameAndType
  u2 argc;  // count of optional static arguments
  u2 argv[argc];  // refs to anything 'ldc' can refer to (int, long,
                  // float, double, class, method handle, method type)
}

If we take this path, we will switch to the tag '18', to reduce confusion
when old and new class files are mixed.

The existing tag '17' for the no-extra-args format will drop out of use and
be illegal in JDK7 FCS.
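
To make the proposed layout concrete, here is a minimal Java sketch of a
reader for one such entry.  It follows the struct as written in this
message only; the class and field names are hypothetical, and it assumes
the entry's bytes have already been isolated from the rest of the class
file.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;

final class InvokeDynamicInfoSketch {
    static final int CONSTANT_InvokeDynamic = 18;

    final int bsmIndex;    // constant pool index of the CONSTANT_MethodHandle
    final int descrIndex;  // constant pool index of the CONSTANT_NameAndType
    final int[] argv;      // constant pool indexes of the static arguments

    private InvokeDynamicInfoSketch(int bsmIndex, int descrIndex, int[] argv) {
        this.bsmIndex = bsmIndex;
        this.descrIndex = descrIndex;
        this.argv = argv;
    }

    // Class file data is big-endian, which is what DataInputStream reads.
    static InvokeDynamicInfoSketch parse(byte[] entry) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(entry))) {
            int tag = in.readUnsignedByte();
            if (tag != CONSTANT_InvokeDynamic) {
                throw new IllegalArgumentException("unexpected tag " + tag);
            }
            int bsm = in.readUnsignedShort();
            int descr = in.readUnsignedShort();
            int argc = in.readUnsignedShort();
            int[] args = new int[argc];
            for (int i = 0; i < argc; i++) {
                args[i] = in.readUnsignedShort();
            }
            return new InvokeDynamicInfoSketch(bsm, descr, args);
        } catch (IOException e) {
            throw new IllegalArgumentException("truncated entry", e);
        }
    }

    public static void main(String[] args) {
        byte[] entry = {18, 0, 5, 0, 9, 0, 2, 0, 11, 0, 12};
        InvokeDynamicInfoSketch info = parse(entry);
        System.out.println(info.bsmIndex + " " + info.descrIndex + " " + info.argv.length);
    }
}
```

An entry carrying the old tag '17' is rejected by the tag check, which is
the kind of confusion the new tag is meant to prevent.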

Depending on the value of argc, the BSM will be invoked in one of three
ways:
  if (argc == 0)  binding = bsm.invokeGeneric(lookup, name, type);
  if (argc == 1)  binding = bsm.invokeGeneric(lookup, name, type,
                                              (Object) argv[0]);
  if (argc > 1)   binding = bsm.invokeGeneric(lookup, name, type,
                                              (Object[]) argv);

Note that the BSM, since it is derived from a CONSTANT_MethodHandle, can
only be a "direct method handle", a pointer to a Java method.  It cannot be
adapted (e.g., as a spreader or collector).  But in user-visible code, it
would be reasonable to express a typical BSM as an overloaded method, whose
third overloading takes a varargs array:
  MethodHandle myBSM(MethodHandles.Lookup look, String name,
                     MethodType type);
  MethodHandle myBSM(MethodHandles.Lookup look, String name,
                     MethodType type, Object arg);
  MethodHandle myBSM(MethodHandles.Lookup look, String name,
                     MethodType type, Object... args);
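
The three-way dispatch and the overloaded BSM can be sketched together in
Java.  This is only an approximation under stated assumptions: it uses
MethodHandle.invoke, the name under which invokeGeneric appears in the
released API, and the 'link' method is a hypothetical stand-in for the
JVM's linker that picks the overload by argc.  Each BSM returns a constant
handle that merely reports which overload ran.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

final class OverloadedBsmSketch {
    static MethodHandle myBSM(MethodHandles.Lookup look, String name, MethodType type) {
        return MethodHandles.constant(String.class, name + ":0");
    }

    static MethodHandle myBSM(MethodHandles.Lookup look, String name, MethodType type,
                              Object arg) {
        return MethodHandles.constant(String.class, name + ":1:" + arg);
    }

    static MethodHandle myBSM(MethodHandles.Lookup look, String name, MethodType type,
                              Object... args) {
        return MethodHandles.constant(String.class, name + ":" + args.length);
    }

    // Hypothetical linker: chooses the invocation shape from the argument count.
    static MethodHandle link(String name, MethodType type, Object... argv) {
        MethodHandles.Lookup lookup = MethodHandles.lookup();
        MethodType base = MethodType.methodType(MethodHandle.class,
                MethodHandles.Lookup.class, String.class, MethodType.class);
        try {
            if (argv.length == 0) {
                MethodHandle bsm = lookup.findStatic(OverloadedBsmSketch.class, "myBSM", base);
                return (MethodHandle) bsm.invoke(lookup, name, type);
            } else if (argv.length == 1) {
                MethodHandle bsm = lookup.findStatic(OverloadedBsmSketch.class, "myBSM",
                        base.appendParameterTypes(Object.class));
                return (MethodHandle) bsm.invoke(lookup, name, type, argv[0]);
            } else {
                MethodHandle bsm = lookup.findStatic(OverloadedBsmSketch.class, "myBSM",
                        base.appendParameterTypes(Object[].class));
                // The whole array is passed as the single trailing argument.
                return (MethodHandle) bsm.invoke(lookup, name, type, argv);
            }
        } catch (Throwable t) {
            throw new IllegalStateException(t);
        }
    }

    // Links a zero-argument call site and calls it once.
    static String describe(String name, Object... argv) {
        try {
            MethodHandle binding = link(name, MethodType.methodType(String.class), argv);
            return (String) binding.invokeExact();
        } catch (Throwable t) {
            throw new IllegalStateException(t);
        }
    }

    public static void main(String[] args) {
        System.out.println(describe("foo"));
        System.out.println(describe("foo", 42));
        System.out.println(describe("foo", "a", "b", "c"));
    }
}
```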

It is natural to ask why we are using varargs, when we could just specify
that the extra static arguments could be passed positionally.  The simple
answer is that positional arguments are of limited use, but a varargs
array can be used to encode very rich and useful BSM arguments.

Since very few Java methods take more than 10 parameters, allowing up to
255 extra arguments is not very interesting.  (Actually the limit would be
251 non-long non-double arguments, since there are three to start with,
plus the BSM itself.)  Writing a BSM which takes (say) 100 arguments would
be silly.  (Note that BSMs cannot be collectArguments adapters; they have
to be simple JVM methods or constructors.)  And a related one that takes 99
arguments would have to be a completely distinct method.  It is clear that
any large number of arguments has to be passed in an array.  So let's pass
them all in a trailing varargs parameter.

Will users want more than a couple of extra static arguments?  I think so. 
It will provide a way to bind interesting specifications directly into the
classfile, without cumbersome bytecode-based construction.  Examples:
 - a serialized AST structure, built from a mix of strings and method
handles, to be interpreted
 - complex application-defined constants, such as lists or sets
 - similarly, templates for partly-constant data structures (the
invokedynamic builds a factory method for the template)
 - vtables (i.e., maps of names to method handles)

All of these things can be created by executable bytecodes in <clinit>, but
implementors will (in many cases) be able to create them more compactly
from a series of constants.  For example, a list of integer values will
occupy 2+1+4 bytes per element if encoded as a sequence of static
arguments.  (The '2' is the argv element; the 1+4 is the CONSTANT_Integer.)
Using <clinit>-style bytecodes, the same element will require
(1+2+2+1)+1+4 bytes, where the parenthesized numbers stand for a sequence
of "aload buf; bipush J; ldc N; aastore".  (This sequence stores the
element into an object array, which is going to be passed to something like
Arrays.asList.)  The ratio is 7 to 11.  For integer values which repeat,
the ratio is closer to 2 to 6.
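
The byte counts can be checked with a small Java sketch.  It assumes the
short instruction forms that yield the quoted totals of 11 and 6 (a
one-byte aload_N, a two-byte bipush, a two-byte ldc and a one-byte
aastore); the class and method names are hypothetical.

```java
final class EncodingSizeSketch {
    static final int ARGV_REF = 2;                    // one u2 slot in argv
    static final int CONSTANT_INTEGER = 1 + 4;        // tag byte plus 4-byte value
    static final int STORE_SEQUENCE = 1 + 2 + 2 + 1;  // aload_N, bipush J, ldc N, aastore

    // Bytes used when the list is passed as static arguments.
    static int staticArgBytes(int elements, int distinctValues) {
        return elements * ARGV_REF + distinctValues * CONSTANT_INTEGER;
    }

    // Bytes used when the list is built by bytecodes in a static initializer.
    static int bytecodeBytes(int elements, int distinctValues) {
        return elements * STORE_SEQUENCE + distinctValues * CONSTANT_INTEGER;
    }

    public static void main(String[] args) {
        System.out.println(staticArgBytes(1, 1) + " vs " + bytecodeBytes(1, 1));
        System.out.println(staticArgBytes(100, 1) + " vs " + bytecodeBytes(100, 1));
    }
}
```

With one distinct value per element the costs are 7 and 11 bytes.  With a
single value repeated 100 times they are 205 and 605 bytes, which
approaches the 2 to 6 ratio because the CONSTANT_Integer is shared.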

There is a limit to this technique, of course, since the constant pool has
only 65535 constants.  But this limit is shared with the <clinit>-style
technique.

A key use case for one or two BSM arguments is closure construction for
Project Lambda.  Here, an extra static argument can specify a private
synthetic method which gives the body (code, not data) of the closure.  The
data parts are normal dynamic arguments.  The BSM produces a factory
function which (efficiently) binds the data values to the statically
specified closure body.  A second BSM argument might be the SAM type
intended for the closure.  (That could also be inferred from the
MethodType.)
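
A minimal Java sketch of the closure case follows.  It is not how Project
Lambda is or will be implemented; it only shows one extra static argument
(the body, as a method handle) being combined with a dynamic argument (the
captured value).  It assumes the CallSite-returning BSM shape of the
released API, and 'addBody', 'bind' and 'closureBSM' are hypothetical
names.

```java
import java.lang.invoke.CallSite;
import java.lang.invoke.ConstantCallSite;
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

final class ClosureBootstrapSketch {
    // The closure body (code): first parameter is captured data.
    static int addBody(int captured, int x) {
        return captured + x;
    }

    // Binds one captured value to the body, yielding the closure.
    static MethodHandle bind(MethodHandle body, int captured) {
        return MethodHandles.insertArguments(body, 0, captured);
    }

    // BSM with one extra static argument: the handle for the closure body.
    static CallSite closureBSM(MethodHandles.Lookup lookup, String name, MethodType type,
                               MethodHandle body) throws ReflectiveOperationException {
        MethodHandle binder = lookup.findStatic(ClosureBootstrapSketch.class, "bind",
                MethodType.methodType(MethodHandle.class, MethodHandle.class, int.class));
        // The factory takes the captured value and returns the bound closure.
        MethodHandle factory = MethodHandles.insertArguments(binder, 0, body);
        return new ConstantCallSite(factory.asType(type));
    }

    // Stands in for the JVM: links the site, builds a closure, then calls it.
    static int demo(int captured, int x) {
        try {
            MethodHandles.Lookup lookup = MethodHandles.lookup();
            MethodHandle body = lookup.findStatic(ClosureBootstrapSketch.class, "addBody",
                    MethodType.methodType(int.class, int.class, int.class));
            CallSite site = closureBSM(lookup, "closure",
                    MethodType.methodType(MethodHandle.class, int.class), body);
            MethodHandle closure = (MethodHandle) site.dynamicInvoker().invokeExact(captured);
            return (int) closure.invokeExact(x);
        } catch (Throwable t) {
            throw new IllegalStateException(t);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo(10, 5));
    }
}
```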

Another key use case is an invokedynamic instruction that implements an
arbitrary live constant, by linking the call site (of zero arguments) to a
method handle which always returns the desired constant. 
(MethodHandles.constant will do this.)  The only missing bit is the
serialized data behind the live constant.  Again, allowing an essentially
unbounded array gives implementors the right degree (I think) of
flexibility.
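
As a sketch of the live-constant case, the following Java code builds an
immutable list from a varargs array of static arguments and links the call
site to MethodHandles.constant.  It assumes the CallSite-returning BSM
shape of the released API; 'listConstantBSM', 'link' and 'call' are
hypothetical names, and the BSM is invoked by hand.

```java
import java.lang.invoke.CallSite;
import java.lang.invoke.ConstantCallSite;
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

final class LiveConstantSketch {
    // BSM whose extra static arguments are the elements of the constant list.
    static CallSite listConstantBSM(MethodHandles.Lookup lookup, String name,
                                    MethodType type, Object... elements) {
        List<Object> value =
                Collections.unmodifiableList(new ArrayList<Object>(Arrays.asList(elements)));
        // The call site takes no dynamic arguments and always returns 'value'.
        return new ConstantCallSite(MethodHandles.constant(type.returnType(), value));
    }

    // Stands in for the JVM linking a zero-argument call site of type ()List.
    static MethodHandle link(Object... elements) {
        CallSite site = listConstantBSM(MethodHandles.lookup(), "list",
                MethodType.methodType(List.class), elements);
        return site.dynamicInvoker();
    }

    // Executes the linked call site once.
    static List<?> call(MethodHandle linked) {
        try {
            return (List<?>) linked.invokeExact();
        } catch (Throwable t) {
            throw new IllegalStateException(t);
        }
    }

    public static void main(String[] args) {
        MethodHandle primes = link(2, 3, 5, 7);
        System.out.println(call(primes));
        System.out.println(call(primes) == call(primes));
    }
}
```

The list is built once at link time; every later execution of the call
site returns the same object.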

If, instead of constants, we want templated values (think Groovy strings
like "hello, $name"), the statically determined structure of the value can
be expressed in static arguments to an invokedynamic, with the inserted
values ("$name") passed on the stack.  The BSM produces a factory function
which builds the desired result.  The BSM might use a templating engine to
partially evaluate the static structure, so that the dynamically changing
parts can be combined in at full speed.
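
The templating idea might look roughly like this in Java.  The literal
fragments of the template arrive as static arguments and the inserted
values arrive as dynamic arguments.  This is a sketch under assumptions:
it uses the CallSite-returning BSM shape of the released API, does no
partial evaluation beyond pre-binding the fragments, and 'render',
'templateBSM' and 'greet' are hypothetical names.

```java
import java.lang.invoke.CallSite;
import java.lang.invoke.ConstantCallSite;
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;
import java.util.Arrays;

final class TemplateBootstrapSketch {
    // Interleaves the literal fragments with the inserted values.
    static String render(String[] literals, Object[] values) {
        StringBuilder sb = new StringBuilder(literals[0]);
        for (int i = 0; i < values.length; i++) {
            sb.append(values[i]).append(literals[i + 1]);
        }
        return sb.toString();
    }

    // BSM whose static arguments are the literal fragments of the template.
    static CallSite templateBSM(MethodHandles.Lookup lookup, String name, MethodType type,
                                Object... literals) throws ReflectiveOperationException {
        String[] parts = Arrays.copyOf(literals, literals.length, String[].class);
        if (parts.length != type.parameterCount() + 1) {
            throw new IllegalArgumentException("need one more literal than inserted values");
        }
        MethodHandle render = lookup.findStatic(TemplateBootstrapSketch.class, "render",
                MethodType.methodType(String.class, String[].class, Object[].class));
        MethodHandle target = MethodHandles.insertArguments(render, 0, (Object) parts)
                .asCollector(Object[].class, type.parameterCount())
                .asType(type);
        return new ConstantCallSite(target);
    }

    // Stands in for a call site compiled from the template "hello, $name!".
    static String greet(String name) {
        try {
            CallSite site = templateBSM(MethodHandles.lookup(), "template",
                    MethodType.methodType(String.class, String.class), "hello, ", "!");
            return (String) site.dynamicInvoker().invokeExact(name);
        } catch (Throwable t) {
            throw new IllegalStateException(t);
        }
    }

    public static void main(String[] args) {
        System.out.println(greet("world"));
    }
}
```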

(A useful thing missing here is substructure sharing:  What if two
invokedynamic instructions need almost the same static arguments?  This can
be dealt with in user code, via a static table created in <clinit> or a
similar method.  Shared values can be referred to by small integers
assigned by the language backend.  In essence, the components we are
proposing help language implementors to build better versions of constant
pools and vtables, with compactness and efficiency similar to the
corresponding native structures.)
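
One way to picture the shared-table approach is the Java sketch below.
The table is filled by the static initializer, and each call site passes a
small integer as its single extra static argument.  The table contents,
the slot numbering and all names here are hypothetical, and the
CallSite-returning BSM shape of the released API is assumed.

```java
import java.lang.invoke.CallSite;
import java.lang.invoke.ConstantCallSite;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

final class SharedTableSketch {
    // Built once by the static initializer; slots are assigned by the backend.
    static final Object[] SHARED = {
        Collections.unmodifiableList(Arrays.asList("red", "green", "blue")),  // slot 0
        Collections.unmodifiableList(Arrays.asList("north", "south"))         // slot 1
    };

    // BSM whose single extra static argument is a slot number in SHARED.
    static CallSite sharedBSM(MethodHandles.Lookup lookup, String name, MethodType type,
                              Object slot) {
        Object value = SHARED[(Integer) slot];
        return new ConstantCallSite(MethodHandles.constant(type.returnType(), value));
    }

    // Stands in for one call site: links it against a slot and executes it.
    static List<?> linkAndGet(int slot) {
        try {
            CallSite site = sharedBSM(MethodHandles.lookup(), "shared",
                    MethodType.methodType(List.class), slot);
            return (List<?>) site.dynamicInvoker().invokeExact();
        } catch (Throwable t) {
            throw new IllegalStateException(t);
        }
    }

    public static void main(String[] args) {
        // Two separately linked call sites that share slot 0.
        System.out.println(linkAndGet(0) == linkAndGet(0));
        System.out.println(linkAndGet(1));
    }
}
```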

In conclusion:  It is true that most use cases for BSM arguments will only
need one or two extra arguments.  But if we allow an array of strings,
integers, method handles, etc., with a reasonable length, suddenly our
language implementor friends have a flexible and natural way to encode
the "serialized" version of their live constants.

So, let's allow not just one or two static arguments, and not a useless
251 either, but rather a useful 65535.

(I'd go for a larger number, 2**31-1, but it would not mesh with the other
16-bit numbers in the class file format.  That's got to be fixed in a big
format revision, another day.)

-- John
 