Subject: Scala Modularisation - Requirements
Date: Saturday 5th September 2009 12:10:30 UTC
In order to start on modularising Scala, we need to define what the key goal and requirements are. Some of these have already been suggested, so this is partly a summary of those points and partly a chance for others to suggest ideas I've missed. Hopefully this can then result in a SID, but I want to avoid filling out the boxes of the SID itself until we've got a good idea of what is important.

Goal
====

The goal of modularising Scala is to split up the scala-library in such a way that individual units of functionality are encapsulated in their own modules, so that adding/upgrading can be done on a module-by-module basis. In order to know about upgrades, each module will need to be versioned. Modules also need to express (versioned) dependencies on each other.

The end goal will be to have multiple modules spawned off from the current scala-library (such as -actors, -testing). Running Scala programs (with the scala shell script) should work in exactly the same way as before, but with more JARs on the classpath.

Out of scope
------------

A modular library should subsequently allow for a Maven/CPAN/SBaz-like distribution system. Whilst modularisation should enable the distribution of modules, this isn't in scope for the immediate task of modularisation. However, it should be noted that modularisation would have to be achieved before this kind of repository could be enhanced in any case.

This is also not intended to develop a module system to compete with those already in use; in fact, wherever possible, the modularisation should focus on the logical inter-dependencies and the way in which the modules are broken up, such that it should be possible to adopt different module systems (OSGi, SimpleModuleSystem etc.).

As with the distribution system, modularisation may subsequently result in the ability of Scala to expose module information via a 'module' object, similar to 2.8's package object, but this won't be considered at this stage.
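
To make the goal of versioned modules with versioned dependencies a little more concrete, here is a rough sketch of what such a description could look like. It is purely illustrative and not a proposed API: the names ModuleVersions, Version, VersionRange, Dependency, Module and unsatisfied are all made up for this example, the version is assumed to be numeric major.minor.micro (any qualifier is ignored), and ranges are assumed to be "at least X and below Y", as in the scala-core >= 2.8 and < 3.0 example discussed under 'Version numbering and Scala releases'.

```scala
// Illustrative only: every name here is hypothetical, not an existing Scala or OSGi API.
object ModuleVersions {

  /** A numeric major.minor.micro version; a qualifier is deliberately left out. */
  final case class Version(major: Int, minor: Int, micro: Int = 0) extends Ordered[Version] {
    // Compare major first, then minor, then micro.
    def compare(that: Version): Int =
      if (major != that.major) Integer.compare(major, that.major)
      else if (minor != that.minor) Integer.compare(minor, that.minor)
      else Integer.compare(micro, that.micro)
  }

  /** A range that includes `lower` and excludes `upper`. */
  final case class VersionRange(lower: Version, upper: Version) {
    def contains(v: Version): Boolean = v >= lower && v < upper
  }

  /** A dependency on another module, by name and acceptable version range. */
  final case class Dependency(module: String, range: VersionRange)

  /** A module with its own version and its versioned dependencies. */
  final case class Module(name: String, version: Version, dependencies: List[Dependency] = Nil)

  /** The dependencies of `m` that no module in `available` satisfies. */
  def unsatisfied(m: Module, available: Seq[Module]): List[Dependency] =
    m.dependencies.filterNot { d =>
      available.exists(a => a.name == d.module && d.range.contains(a.version))
    }
}
```

With this sketch, a scala-actors 2.8 module that depends on scala-core in the range [2.8, 3.0) has no unsatisfied dependencies when a scala-core 2.9 is available, but reports the scala-core dependency as unsatisfied when only scala-core 3.0 is present.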

Requirements
============

* It must continue to be possible to run a Scala program using the existing shell/batch scripts.
  Regardless of what module system(s) are supported, it should be possible to run a Scala program by concatenating modules/JARs on the command line.

* Whilst Scala should be module-system agnostic, for practical purposes they should support OSGi.
  This will enable the ongoing development of the Scala IDE in Eclipse, and builds on the OSGi support already in Scala and related libraries. However, the modules should not depend on OSGi except to enable OSGi-specific functionality; in other words, they should run just as well outside an OSGi container as inside one.

* Scala module dependencies should form a directed acyclic graph.
  In other words, circular dependencies should not exist (since not all module systems may be able to handle them). For most code, this shouldn't be an issue; however, it may be a specific issue in the case of scala.Predef (e.g. Predef.scala: val $scope = scala.xml.TopScope). In this case, if refactoring cannot remove the cycle, it might be better to merge modules than to introduce cycles.

* Each module should be versioned.
  A suggestion is to use the (current) OSGi semantics of major.minor.micro.qualifier; since we're adopting OSGi for the IDE support, this may be a natural fit. In the spirit of announcing backward compatibility, it is suggested that the major be bumped whenever an incompatible change occurs; in the case of the compiler emitting incompatible bytecode, the major version of the core should be bumped. Thus code that depends on (say) scala-core-1.0 could automatically show a breakage against scala-core-2.0. (The name scala-core here refers to the key data structures, like Predef, Tuple etc.; however, this name is used solely for illustration and is not a requirement.)

* Module dependencies should be versioned.
  In order to ensure modules are compatible, we should define each module's dependencies on a particular version (or version range). This can be achieved through build processes that depend on explicit versions (or version ranges).

* Module JARs should be named with a version number.
  In order to distinguish between multiple versions of a module, the JAR name should include (components of) the version number. This will facilitate entries in e.g. Class-Path dependencies, as well as allowing an at-a-glance check of which versions are present. Various build systems generate this automatically (e.g. Maven) and the 'scala' shell scripts include lib/* anyway. Note that whilst the Scala IDE (or other development environments) may be affected if they use the JAR name explicitly, modularisation is going to affect this in any case (e.g. if scala-library.jar is replaced with scala-core.jar, scala-actors.jar etc. then build paths will need to be changed anyway).

* An automated build process should be able to build Scala.
  An automated build/test system is needed; modularisation may have an impact on how the build system works, but it must be possible to generate an entire build from source (with a bootstrap compiler if needed).

* The build process should allow modules to be built independently.
  It should be possible, when developing a dependent module, to build that module independently of its dependencies. This would allow work on scala-actors to be built whilst consuming a pre-built scala-core module.

* One-to-one relationship between module and SCM location.
  Practically, it makes sense for each module to correspond to a single SCM location. This is likely to require relocating existing Scala source files, which will need to be synchronised by the core Scala team.
  (It is probably going to need a proof-of-concept split based on some arbitrary version of HEAD, along with the per-module build information, prior to the SID; if agreed, then HEAD can be re-snapped and put into place.)

* One-to-one relationship between module and IDE project.
  Each module should be able to be checked out and developed in an IDE. This includes project metadata and module metadata. It may be desirable to hook in other tools to post-process manually/automatically generated metadata, but the result should be persisted in the source control system as well.

* One-to-one relationship between module and Scala packages.
  For consistency, each module should correspond to a separate Scala package. Although some module systems don't demand this, avoiding such 'split packages' is likely to result in modules that are easier to build. At least for the initial cut of Scala's modules, we should try to encourage this.

* Limit scope of 'open' modules.
  Some code, such as remote proxies, generally needs unfettered access to classes. In order to achieve this in some module systems, a generic 'import *' is needed to make any class available to the classloader. Some Scala code (scala.actors.remote) may need such 'open' modules, but the danger of making any module open is that unnecessary (or circular) references can be introduced. Thus, where possible, open modules should be avoided, and if such modules are needed, they should be limited to the smallest scope possible (e.g. scala-actors may be closed whilst scala-actors-remote may be open).

* Run as self-hosted projects within Eclipse PDE.
  In order to facilitate ongoing development of the Scala IDE, it must be possible to self-host the Scala IDE within an Eclipse PDE session.
  It's worth noting that if the Scala modules are also themselves OSGi bundles, then the requirement of being able to edit modules independently might be satisfied by installing some of the modules into the target workspace (as binaries) whilst checking out some modules as source projects. The OSGi binding should then pick up the dependencies regardless of whether the project is in the (source) workspace or the (binary) target platform.

Other issues
============

* Module documentation.
  Does it make sense for module documentation (say, generated by 'scaladoc') to be made available on a per-module basis, or woven together into a top-level documentation site? Perhaps each module could generate its own documentation and this could be merged (cf. Maven's 'site' document generation process).

* Scala IDE projects and PDE nature.
  Since modularisation of Scala is going to facilitate the development of the IDE-related components, does it follow that the Scala IDE should be based not only on Eclipse JDT but also on Eclipse PDE? This would allow Scala projects to be developed as first-class bundles, which, as noted elsewhere, are implicitly JARs in any case. Regardless of this, it should be possible to build Scala outside of an IDE environment (hence the automated build requirement above). Having all Scala projects implicitly be OSGi bundles will allow others to generate Scala modules fairly trivially. We may wish to enable this with a 'toggle', so that downstream developers can build a Scala/OSGi bundle (with PDE) while others choose to use just JDT dependencies.

* Version numbering and Scala releases.
  There doesn't necessarily have to be synchronisation between individual modules and the 'release version' of Scala. However, going forward, it might make sense to have a core library (which includes the scala package) and compiler (which may have tight dependencies on the core library) correspond to the 'release version' of Scala.
  So, Scala 2.9 may ship with scala-core-2.9.jar and scala-compiler-2.9.jar, but other libraries (actors, testing) might not change version number. This is meant as a discussion point that results in suggestions to the Scala release team. (Note that if major.minor.micro is adopted, and changes in major indicate breakages, and modules depend on version ranges, then (say) scala-actors-2.8 could define a dependency on scala-core >= 2.8 and < 3.0, so any further 2.x stream would be OK; but if an incompatible class change occurred, then as long as scala-core was bumped up to 3.0, the scala-actors package would need to be updated to take account of the dependency change.)

* Candidate modules.
  From the dependencies in the source packages already, it seems that there's a natural split amongst packages as follows:

  - scala.actors
  - scala.actors.remote (? may be some circular dependencies between this and scala.actors)
  - scala.concurrency
  - scala.xml (? dependency of 'scope' in Predef; can this be refactored/removed?)
  - scala.testing

  There may be other dependencies which can be refactored at a later stage, but once the approach is proved at a simple level then other opportunities (scala.io) might present themselves. The work that has been done already (http://wiki.github.com/jsuereth/scala-jigsaw) suggests that at least some of this is already possible, although it should be noted that this doesn't do modularity at compile time but as a post-processing of the current scala-library, so may not catch all cases.

* Package-level vs Module-level dependencies.
  Although strictly an OSGi implementation detail, it's possible to define module dependencies in terms of modules (Require-Bundle) or packages (Import-Package). The latter allows for easier refactoring of package locations (so a bundle with "Import-Package: scala.actors.remote" doesn't really care whether it comes from the scala-actors module or the scala-actors-remote module).
  Maven build projects, on the other hand, tend to prefer the former style of dependencies for the maven-pom dependencies, though the built artefacts can use the Import-Package style of dependency with little effort. We may wish to formalise the use of Import-Package as a requirement for the OSGi aspects of the modules, to ease future refactoring (for example, if we pull out scala.io in the future, bundles which use Import-Package: scala.io will continue to work).

* Build system constraints.
  There may be requirements on the EPFL infrastructure which constrain which build choices are available, over and above the existence of Java runtimes. For example, licensing or practical experience may prevent the use of some tools. It should be observed that some Scala projects already use Maven (http://scala-tools.org/mvnsites/maven-scala-plugin/ used by http://liftweb.net/download.html) and that the current scala-library is uploaded to Maven. It's also worth observing that Maven is both a build system and a distribution/repository system used by many Java projects. Perhaps those with experience of these can comment?

Summary
=======

This is intended to start discussion on the requirements (rather than the implementation of those requirements, which is a separate story). Furthermore, whilst I've split this up into 'Requirements' and 'Other issues', these are only my thoughts (and the collected thoughts of some of the pre-conversations on the subject), so any/all of these are up for rejection, debate, or amendment. In addition, there are bound to be some that I have missed.

Please let me know your thoughts, and debate new items or the removal of these ones. In order to keep the mailing list traffic down, please let's trim e-mails, and if we need to, refer to items by their titles, which I repeat here for summarisation and ease of replying:

Requirements:
* It must continue to be possible to run a Scala program using the existing shell/batch scripts.
* Whilst Scala should be module-system agnostic, for practical purposes they should support OSGi.
* Scala module dependencies should form a directed acyclic graph.
* Each module should be versioned.
* Module dependencies should be versioned.
* Module JARs should be named with a version number.
* An automated build process should be able to build Scala.
* The build process should allow modules to be built independently.
* One-to-one relationship between module and SCM location.
* One-to-one relationship between module and IDE project.
* One-to-one relationship between module and Scala packages.
* Limit scope of 'open' modules.
* Run as self-hosted projects within Eclipse PDE.

Other issues:
* Module documentation.
* Scala IDE projects and PDE nature.
* Version numbering and Scala releases.
* Candidate modules.
* Package-level vs Module-level dependencies.
* Build system constraints.
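
As a footnote to the 'directed acyclic graph' requirement, here is a minimal sketch of how a build could check it mechanically. It assumes the module dependencies are available as a simple map from module name to the names it depends on directly; ModuleGraph and buildOrder are hypothetical names, and the module names in the usage note are illustrative only (as with scala-core above). The technique is an ordinary depth-first topological sort, not anything specific to OSGi or to the existing Scala build.

```scala
import scala.collection.mutable

// Illustrative only: ModuleGraph and buildOrder are hypothetical names.
object ModuleGraph {

  /**
   * Returns Right(order), with every module listed after the modules it
   * depends on, or Left(cycle) naming the modules on one dependency cycle.
   * `deps` maps a module name to the names it depends on directly.
   */
  def buildOrder(deps: Map[String, Set[String]]): Either[List[String], List[String]] = {
    val done = mutable.LinkedHashSet.empty[String] // finished modules, in build order
    var cycle: Option[List[String]] = None

    // Depth-first visit; `path` holds the modules being visited, most recent first.
    def visit(module: String, path: List[String]): Unit =
      if (cycle.isEmpty && !done(module)) {
        if (path.contains(module)) {
          // We came back to a module still being visited: record the cycle.
          cycle = Some(path.reverse.dropWhile(_ != module))
        } else {
          deps.getOrElse(module, Set.empty[String]).toList.sorted
            .foreach(visit(_, module :: path))
          if (cycle.isEmpty) done += module
        }
      }

    // Visit every module mentioned as a key or as a dependency, in name order.
    (deps.keySet ++ deps.values.flatten).toList.sorted.foreach(visit(_, Nil))
    cycle.toLeft(done.toList)
  }
}
```

For example, if scala-actors and scala-xml each depend on scala-core, buildOrder returns a Right whose list starts with scala-core. If scala-core is additionally made to depend on scala-xml (the Predef situation described above), it returns a Left naming scala-core and scala-xml, which is the point at which the modules would need to be refactored or merged.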