Subject: Future plans for GC in LLVM
Date: Friday 5th December 2014 01:50:14 UTC
Now that the statepoint changes have landed, I wanted to start a discussion about what's next for GC support in LLVM. I'm going to sketch out a strawman proposal, but I'm not set on any of this. I mostly just want to draw interested parties out of the woodwork. :)

Overall Direction:

In the short term, my intent is to preserve the functionality of the existing code, but migrate towards a position where the gcroot-specific pieces are optional and well separated. I also plan to start updating the documentation to reflect a separation between the general support for garbage collection (function attributes, identifying references, load and store barrier lowering, generating stack maps) and the implementation choices (gcroot & its lowering vs statepoints & addr spaces for identifying references).

Longer term, I plan to *EVENTUALLY DELETE* the existing gcroot lowering code and in-tree GCStrategies unless an interested party speaks up. I have no problem with retaining some of the existing pieces for legacy support or helping users to migrate, but as of right now, I don't know of any such active users. The only exception to this might be the shadow stack GC. "Eventually" in this context means at least six months from now, but likely less than 18 months. Hopefully, that's vague enough. :)

HELP - If anyone knows which OCaml implementation and which Erlang implementation triggered the in-tree GC strategies, please let me know!

Near Term Changes:

- Migrate ownership of GCStrategy objects from GCModuleInfo to LLVMContext. In theory, this loses the ability for two different Modules to have the same collector with different state, but I know of no use case for this.
- Modify the primary Function::getGC/setGC interface to take/return a reference to the GCStrategy object, not a string. I will provide a Function::setGCString and getGCString.
- Extend the GCStrategy class to include a notion of which compilation strategy is being used. The two choices right now will be Legacy and Statepoint.
  (Longer term, this will likely become a more fine-grained choice.)
- Separate GCStrategy and related pieces from the GCFunctionInfo/GCModuleInfo/GCMetadataPrinter lowering code. At first, this will simply mean clarifying documentation and rearranging code a bit.
- Document/clarify the callbacks used to customize the lowering. Decide which of these make sense to preserve and document.

(Lest anyone get the wrong idea, the above changes are intended to be minor cleanup. I'm not looking to do anything controversial yet.)

Questions:

- Is providing the ability to generate a custom binary stack map format a valuable feature? Adapting the new statepoint infrastructure to work with the existing GCMetadataPrinter classes wouldn't be particularly hard.
- Are there any GCs out there that need gcroot's single stack slot per value implementation? By default, statepoints may generate a different stackmap for every safepoint in a function.
- Is using gcroot and allocas to mark pointers as garbage collected references valuable? (As opposed to using address spaces on the SSA values themselves?) Long term, should we retain the gcroot marker intrinsics at all?

Philip

Appendix: The Current Implementations

Key Classes:

GCStrategy - Provides a configurable description of the collector. The strategy can also override parts of the default GC root lowering strategy. The concept of such a collector description is very valuable, but the current implementation could use some cleanup. In particular, the custom lowering hooks are a bit of a mess.

GCMetadataPrinter - Provides a means to dump a custom binary format describing each function's safepoints. All safepoints in a function must share a single root Value to stack slot mapping.

GCModuleInfo/GCFunctionInfo - These contain the metadata which is saved to enable GCMetadataPrinter.
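
For readers who want something concrete, the first three items under "Near Term Changes" could look roughly like the sketch below. This is a standalone mock-up, not LLVM code. The `sketch` namespace, the `Context` class standing in for LLVMContext, `getOrCreateGCStrategy`, `hasGC`, `CompilationStrategy`, and the collector name "example-statepoint-gc" are all assumptions made for illustration. Only the names GCStrategy, Function::getGC/setGC, getGCString/setGCString, and the Legacy/Statepoint choices come from the proposal itself.

```cpp
// Hypothetical mock-up of the proposed ownership and interface changes.
// None of these classes are the real LLVM ones; they only mirror the names.
#include <cassert>
#include <map>
#include <memory>
#include <string>
#include <utility>

namespace sketch {

// The two compilation strategies named in the proposal.
enum class CompilationStrategy { Legacy, Statepoint };

// Stand-in for a collector description.
class GCStrategy {
public:
  GCStrategy(std::string Name, CompilationStrategy CS)
      : Name(std::move(Name)), CS(CS) {}
  const std::string &getName() const { return Name; }
  CompilationStrategy getCompilationStrategy() const { return CS; }

private:
  std::string Name;
  CompilationStrategy CS;
};

// Stand-in for LLVMContext: owns exactly one GCStrategy per collector name.
class Context {
public:
  // Returns the strategy registered under Name, creating it on first use.
  // (Assumed helper; CS is only consulted when the strategy is created.)
  GCStrategy &getOrCreateGCStrategy(
      const std::string &Name,
      CompilationStrategy CS = CompilationStrategy::Legacy) {
    std::unique_ptr<GCStrategy> &Slot = Strategies[Name];
    if (!Slot)
      Slot.reset(new GCStrategy(Name, CS));
    return *Slot;
  }

private:
  std::map<std::string, std::unique_ptr<GCStrategy>> Strategies;
};

// Stand-in for llvm::Function with the proposed accessors.
class Function {
public:
  explicit Function(Context &Ctx) : Ctx(Ctx) {}

  bool hasGC() const { return Strategy != nullptr; }

  // Primary interface: works with the strategy object, not a string.
  GCStrategy &getGC() const {
    assert(Strategy && "function has no GC strategy");
    return *Strategy;
  }
  void setGC(GCStrategy &S) { Strategy = &S; }

  // String-based convenience wrappers.
  std::string getGCString() const {
    if (!Strategy)
      return std::string();
    return Strategy->getName();
  }
  void setGCString(const std::string &Name) {
    Strategy = &Ctx.getOrCreateGCStrategy(Name);
  }

private:
  Context &Ctx;
  GCStrategy *Strategy = nullptr;
};

// Two functions naming the same collector end up sharing one strategy object.
bool sharedAcrossFunctions() {
  Context Ctx;
  GCStrategy &S = Ctx.getOrCreateGCStrategy("example-statepoint-gc",
                                            CompilationStrategy::Statepoint);
  Function F(Ctx), G(Ctx);
  F.setGC(S);
  G.setGCString("example-statepoint-gc");
  return &F.getGC() == &G.getGC() &&
         G.getGC().getCompilationStrategy() == CompilationStrategy::Statepoint;
}

} // namespace sketch
```

Because the context owns the strategy, everything that names the same collector sees the same object. That is the trade-off noted in the first bullet: two Modules in one context can no longer hold independently configured instances of the same collector.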