From: Alex Blewitt <alex.blewitt-Re5JQEeQqe8AvxtiuMwx3w <at> public.gmane.org>
Subject: Scala Modularisation - Requirements
Newsgroups: gmane.comp.lang.scala.internals
Date: Saturday 5th September 2009 12:10:30 UTC
In order to start on Modularising Scala, we need to define what the  
key goal and requirements are. Some of these have already been  
suggested, so this is something of a summary of those points, and a
chance for others to suggest ideas I've missed. Hopefully this can then
result in a SID, but I want to avoid filling out the boxes
of the SID itself until we've got a good idea of what is important.


The goal of modularising Scala is to split up the scala-library in  
such a way that individual units of functionality are encapsulated in  
their own modules, such that adding/upgrading modules can be done on a  
module-by-module basis. In order to know about upgrades, each module  
will need to be versioned. Modules also need to express (versioned)  
dependencies on each other.

The end goal will be to have multiple modules spawned off from the  
current scala-library (such as -actors, -testing). Running Scala
programs (with the scala shell script) should work in exactly the same  
way as before, but with more JARs on the classpath.

Out of scope

A modular library should subsequently allow for a Maven/CPAN/SBaz like  
distribution system. Whilst modularisation should enable the  
distribution of modules, this isn't in scope for the immediate task of  
modularisation. However, it should be noted that modularisation would  
have to be achieved before this kind of repository could be enhanced  
in any case.

This is also not intended to develop competing modular systems to
those already in use; in fact, wherever possible, the modularisation  
should focus on the logical inter-dependencies and the way in which  
the modules are broken up, such that it should be possible to adopt  
different module systems (OSGi, SimpleModuleSystem etc.). As with the  
distribution system, modularisation may then subsequently result in  
the ability of Scala to expose module information via a 'module'  
object, similar to 2.8's package object, but this won't be considered  
at this stage.

Requirements


* It must continue to be possible to run a Scala program using the  
existing shell/batch scripts.

Regardless of what module system(s) are supported, it should be  
possible to run a Scala program by concatenating modules/JARs on the  
command line.

* Whilst Scala should be module-system agnostic, for practical  
purposes they should support OSGi.

This will enable the ongoing development of the Scala IDE in Eclipse,  
and builds on the OSGi support already in Scala and related libraries.  
However, the modules should not be dependent on OSGi save for enabling  
OSGi-specific functionality; in other words, they should run just as  
well outside as inside an OSGi container.

* Scala module dependencies should form a directed acyclic graph.

In other words, circular dependencies should not exist (since not all  
module systems may be able to handle this). For most code, this  
shouldn't be an issue; however, it may be a specific issue in the case  
of scala.Predef (e.g. Predef.scala: val $scope = scala.xml.TopScope).  
In this case, if refactoring cannot be done to achieve the end goal,  
it might be better to merge modules than to introduce cycles.
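
As a rough illustration of the acyclic requirement, the sketch below orders modules so that dependencies come first, and reports the modules that cannot be ordered when a cycle exists. The module names and the shape of the dependency map are assumptions made for the example; nothing here is part of an existing Scala build.

```scala
object ModuleGraph {
  // Hypothetical dependency map: module name -> modules it depends on.
  type Deps = Map[String, Set[String]]

  /** Right(order with dependencies first), or Left(modules in, or downstream of, a cycle). */
  def buildOrder(deps: Deps): Either[Set[String], List[String]] = {
    // Modules that only appear as a dependency are treated as nodes too.
    val all: Deps = (deps.keySet ++ deps.values.flatten)
      .map(m => m -> deps.getOrElse(m, Set.empty[String]))
      .toMap

    @annotation.tailrec
    def loop(remaining: Deps, done: List[String]): Either[Set[String], List[String]] =
      if (remaining.isEmpty) Right(done.reverse)
      else {
        // A module is ready once none of its dependencies is still waiting.
        val ready = remaining.collect {
          case (m, ds) if ds.forall(d => !remaining.contains(d)) => m
        }.toList.sorted
        if (ready.isEmpty) Left(remaining.keySet) // nothing can proceed: a cycle exists
        else loop(remaining -- ready, ready.reverse ::: done)
      }

    loop(all, Nil)
  }
}
```

With scala-actors and scala-xml both depending on a hypothetical scala-core, buildOrder returns scala-core first. If scala-core also depended on scala-xml (the Predef case above), it would return a Left naming both modules, which is the signal to refactor or merge.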

* Each module should be versioned.

A suggestion is to use the (current) OSGi semantics of  
major.minor.micro.qualifier; since we're adopting OSGi for the IDE  
support, this may be a natural fit. In the spirit of announcing  
backward compatibility, it is suggested that the major be bumped  
whenever an incompatible change occurs; in the case of the compiler  
emitting incompatible bytecode, the major version of the core should  
be bumped. Thus code that depends on (say) scala-core-1.0 could  
automatically show a breakage against scala-core-2.0. (The name
scala-core here refers to the key data structures, like Predef, Tuple etc.;
however, this name is used solely for illustration and is not a  

* Module dependencies should be versioned.

In order to ensure modules are compatible, we should define each module's
dependencies on a particular version (or version range). This can be
achieved through build processes that depend on explicit versions (or
version ranges).
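
To make the versioning and version-range ideas concrete, here is a minimal sketch of a major.minor.micro.qualifier version with a half-open range check. The names Version, VersionRange and sameMajor are invented for this example; they are not an existing Scala or OSGi API, and the ordering (numeric parts first, then the qualifier compared as a string) is an assumption modelled on the OSGi scheme.

```scala
object Versions {
  // Missing numeric parts default to 0 and the qualifier to "".
  final case class Version(major: Int, minor: Int = 0, micro: Int = 0, qualifier: String = "")

  object Version {
    // Compare numerically on major, minor, micro, then lexically on the qualifier.
    implicit val ordering: Ordering[Version] =
      Ordering.by[Version, (Int, Int, Int, String)](v => (v.major, v.minor, v.micro, v.qualifier))

    private def num(p: String): Option[Int] =
      if (p.nonEmpty && p.forall(_.isDigit)) scala.util.Try(p.toInt).toOption else None

    /** Parses "2.8", "2.8.0" or "2.8.0.final"; returns None for anything else. */
    def parse(s: String): Option[Version] = {
      val parts = s.split("\\.", 4).toList
      val nums = parts.take(3).map(num)
      if (nums.contains(None)) None
      else {
        val n = nums.flatten.padTo(3, 0)
        Some(Version(n(0), n(1), n(2), parts.drop(3).headOption.getOrElse("")))
      }
    }
  }

  /** Half-open range: floor is included, ceiling is excluded. */
  final case class VersionRange(floor: Version, ceiling: Version) {
    def contains(v: Version): Boolean =
      Version.ordering.gteq(v, floor) && Version.ordering.lt(v, ceiling)
  }

  object VersionRange {
    /** Everything from floor up to, but not including, the next major version. */
    def sameMajor(floor: Version): VersionRange =
      VersionRange(floor, Version(floor.major + 1))
  }
}
```

Under this sketch, a module built against 2.8 would declare VersionRange.sameMajor(Version(2, 8)), which accepts 2.9 but rejects 3.0, matching the idea that a major bump announces an incompatible change.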

* Module JARs should be named with a version number.

In order to distinguish between multiple versions of a module, the JAR  
should include (components of the) version number. This will  
facilitate entries in e.g. Class-Path dependencies as well as allowing  
an at-a-glance check of which version numbers are present. Various
build systems generate this automatically (e.g. Maven) and the 'scala'  
shells include lib/* anyway. Note that whilst the Scala IDE (or other  
development environments) may be affected if they use the JAR name  
explicitly, modularisation is in any case going to affect this (e.g.
if scala-library.jar is replaced with scala-core.jar, scala-actors.jar
etc. then build paths will need to be changed anyway).
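
For illustration only, this is how a tool might recover the module name and version from such a JAR name. The "<name>-<version>.jar" convention, with the version starting at the first dash followed by a digit, is an assumption borrowed from common Maven-style naming rather than anything agreed here.

```scala
object JarNames {
  // "<name>-<version>.jar", where the version starts with a digit,
  // e.g. scala-core-2.9.jar or scala-actors-2.8.0.final.jar.
  private val Versioned = """(.+?)-(\d[\w.-]*)\.jar""".r

  /** Some((moduleName, version)) for a versioned JAR name, None otherwise. */
  def split(fileName: String): Option[(String, String)] = fileName match {
    case Versioned(name, version) => Some((name, version))
    case _                        => None
  }
}
```

An unversioned name such as scala-library.jar yields None, which is one way a tool could spot JARs that have not yet adopted the convention.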

* An automated build process should be able to build Scala.

An automated build/test system is needed; modularisation may have an  
impact on how the build system works, but it must be possible to  
generate an entire build from source (with a bootstrap compiler if necessary).

* The build process should allow modules to be built independently.

It should be possible, for development of a dependent module, to build  
that dependent module independently of its dependencies. This would  
allow work to be built on scala-actors whilst consuming a pre-built  
scala-core module.

* One-to-one relationship between module and SCM location.

Practically, it makes sense for each module to correspond to a single  
SCM location. This is likely to result in needing to relocate existing  
scala source files which will need to be synchronized by the core  
Scala team. (It is probably going to need a proof-of-concept split
based on some arbitrary version of HEAD, along with the per-module
build information, prior to the SID; if agreed, then the HEAD can be
re-snapped and put into place.)

* One-to-one relationship between module and IDE project.

Each module should be able to be checked out and developed in an IDE.  
This includes project metadata and module metadata. It may be  
desirable to hook in other tools to post-process
manually/automatically generated metadata, but this should be persisted in the
source control system as well.

* One-to-one relationship between module and Scala packages.

For consistency, each module should correspond to a separate Scala  
package. Although some module systems don't demand this, avoiding such
'split packages' is likely to result in modules that are easier to build.
At least for the initial cut of Scala's modules, we should try to
encourage this.
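
A check for split packages could be as small as the sketch below. It assumes we already have, from some build step not shown here, a map from each module to the packages it contains; the module and package names used with it are only examples.

```scala
object SplitPackages {
  /** Packages provided by more than one module, mapped to the modules involved. */
  def find(contents: Map[String, Set[String]]): Map[String, Set[String]] = {
    // Flatten module -> packages into (package, module) pairs.
    val pairs = for {
      (module, pkgs) <- contents.toSeq
      pkg            <- pkgs.toSeq
    } yield pkg -> module

    pairs
      .groupBy(_._1)
      .map { case (pkg, ps) => pkg -> ps.map(_._2).toSet }
      .filter { case (_, modules) => modules.size > 1 }
  }
}
```

If scala.actors.remote were present in both a scala-actors and a scala-actors-remote module, find would report that package together with both module names; an empty result means the one-to-one rule holds.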

* Limit scope of 'open' modules.

Some code generally needs unfettered access to classes, such as remote  
proxies. In order to achieve this in some module systems, a generic  
'import *' is needed to make any class available to the classloader.  
Some scala code (scala.actors.remote) may need such 'open' modules but  
the danger of making any module open is that unnecessary (or circular)  
references can be introduced. Thus, where possible, open modules
should be avoided and, if such modules are needed, they should be limited to
the smallest scope possible (e.g. scala-actors may be closed whilst
scala-actors-remote may be open).

* Run as self-hosted projects within Eclipse PDE.

In order to facilitate on-going development of the Scala IDE, it must  
be possible to self-host the Scala IDE within an Eclipse PDE
session. It's worth noting that if the Scala modules are also  
themselves OSGi bundles, then the requirement of being able to edit  
modules independently might be satisfied by installing some of the  
modules into the target workspace (as binaries) whilst checking out  
some modules as source projects. The OSGi binding should then pick up  
the dependencies regardless of whether the project is in the (source)  
workspace or the (binary) target platform.

Other issues

* Module documentation.

Does it make sense for module documentation (say, generated by
'scaladoc') to be made available on a per-module basis, or woven
together into a top-level documentation site? Perhaps each module
could generate its own documentation and this could be merged (cf.
Maven's 'site' document generation process).

* Scala IDE projects and PDE nature.

Since modularisation of Scala is going to facilitate the development  
of the IDE related components, does it follow that the Scala IDE  
should be based on not only Eclipse JDT but also Eclipse PDE? This  
would allow Scala projects to be developed as first-class bundles,  
which as noted elsewhere, are implicitly JARs in any case. Regardless  
of this, it should be possible to build Scala outside of an IDE  
environment (hence the automated build requirement above). Having all
Scala projects be implicitly OSGi bundles would let others generate
Scala modules fairly trivially. We may wish to enable this with a
'toggle', so that downstream developers can choose between building a
Scala/OSGi bundle (with PDE) and just using JDT dependencies.

* Version numbering and Scala releases.

There doesn't necessarily have to be synchronisation between  
individual modules and the 'release version' of Scala. However, going  
forward, it might make sense to have a core library (which includes  
the scala package) and compiler (which may have tight dependencies on  
the core library) correspond to the 'release version' of Scala. So,  
Scala 2.9 may ship with scala-core-2.9.jar and scala-compiler-2.9.jar,  
but other libraries (actors, testing) might not change version number.  
This is meant to start as a discussion point to result in suggestions  
to the Scala release team. (Note that if major/minor/micro is
adopted, changes in major indicate breakages, and modules depend
on version ranges, then (say) scala-actors-2.8 could define a
dependency on scala-core >= 2.8 and < 3.0. Any further 2.x release
would then be OK, but if an incompatible class change occurred and
scala-core was bumped up to 3.0, the scala-actors module would
need to be updated to take account of the dependency change.)

* Candidate modules.

From the dependencies in the source packages already, it seems that
there's a natural split amongst packages as follows:
- scala.actors
- scala.actors.remote (? may be some circular dependencies between  
this and scala.actors)
- scala.concurrent
- scala.xml (? dependency of 'scope' in Predef; can this be refactored/
- scala.testing

There may be other dependencies which can be refactored at a later  
stage, but once it is proved at a simple level then other  
opportunities (scala.io) might present themselves. The work that has  
been done already (http://wiki.github.com/jsuereth/scala-jigsaw)
suggests that at least some of this is already possible, although it  
should be noted that this doesn't do modularity at compile-time but as  
a post-processing of the current scala-library, so may not catch out  
all cases.

* Package-level vs Module-level dependencies.

Although strictly an OSGi implementation detail, it's possible to  
define module dependencies in terms of modules (Require-Bundle) or  
packages (Import-Package). The latter allows for easier refactoring
of package locations (so a bundle with "Import-Package:
scala.actors.remote" doesn't really care whether it comes from the
scala-actors module or the scala-actors-remote module). Maven build
projects on the other hand tend to prefer the former style of
dependencies for the maven-pom dependencies, though the built
artefacts can use the Import-Package style dependency with little
effort. We may wish to formalise the use of Import-Package as a
requirement for the OSGi aspects of the modules to ease future
refactoring (for example, if we pull out scala.io in the future,
bundles which Import-Package: scala.io will continue to work).
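
The difference between the two styles can be shown with a deliberately simplified model. Bundle, resolvePackage and visibleViaBundle below are hypothetical names for this sketch; a real OSGi resolver also takes versions and other constraints into account, none of which is modelled here.

```scala
object Wiring {
  // Much simplified stand-in for a bundle: a name plus the packages it exports.
  final case class Bundle(name: String, exports: Set[String])

  /** Import-Package style: any installed bundle exporting the package will do. */
  def resolvePackage(pkg: String, installed: Seq[Bundle]): Option[Bundle] =
    installed.find(_.exports.contains(pkg))

  /** Require-Bundle style: the package is only visible if the named bundle exports it. */
  def visibleViaBundle(pkg: String, bundleName: String, installed: Seq[Bundle]): Boolean =
    installed.exists(b => b.name == bundleName && b.exports.contains(pkg))
}
```

Suppose scala.io starts out exported by a scala-core bundle and is later moved to a separate scala-io bundle. A consumer resolving by package still finds a provider after the move, whereas a consumer that named scala-core as its source of scala.io no longer sees the package and has to be changed.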

* Build system constraints.

There may be requirements on the EPFL infrastructure which constrain  
which build choices are available, over and above the existence of  
Java runtimes. For example, licensing or practical experience may  
prevent the use of some tools. It should be observed that some Scala
projects already use Maven (http://scala-tools.org/mvnsites/maven-scala-plugin/
used by http://liftweb.net/download.html) and that the current
scala-library is uploaded to Maven. It's also worth observing that Maven is
both a build system and a distribution/repository system used by many  
Java projects. Perhaps those with experience of these can comment?


This is intended to start discussion on the requirements (rather than  
the implementation of those requirements, which is a separate story).  
Furthermore, whilst I've split this up into the 'Requirements' and  
'Other issues', these are only my thoughts (and the collected thoughts  
of some of the pre-conversations on the subject) so any/all of these  
are up for rejection, debate, or amendment. In addition, there's bound  
to be some that I have missed.

Please let me know your thoughts, and debate on new items or the  
removal of these ones. In order to keep the mail list traffic down to  
a slightly smaller set, please let's trim e-mails down, and if we need
to, refer to them by the titles, which I repeat here for summarisation  
and ease-of-replying:

* It must continue to be possible to run a Scala program using the  
existing shell/batch scripts.
* Whilst Scala should be module-system agnostic, for practical  
purposes they should support OSGi.
* Scala module dependencies should form a directed acyclic graph.
* Each module should be versioned.
* Module dependencies should be versioned.
* Module JARs should be named with a version number.
* An automated build process should be able to build Scala.
* The build process should allow modules to be built independently.
* One-to-one relationship between module and SCM location.
* One-to-one relationship between module and IDE project.
* One-to-one relationship between module and Scala packages.
* Limit scope of 'open' modules.
* Run as self-hosted projects within Eclipse PDE.
Other issues:
* Module documentation.
* Scala IDE projects and PDE nature.
* Version numbering and Scala releases.
* Candidate modules.
* Package-level vs Module-level dependencies.
* Build system constraints.