
From: Joaquim Baptista <pxquim <at> gmail.com>
Subject: Re: Multi-file output
Newsgroups: gmane.text.docutils.devel
Date: Sunday 27th May 2007 15:43:08 UTC

On 2007/05/09, at 00:17, Lea Wiemann wrote:

> Since you mentioned multiple PDF files, I'd be curious if
> you currently have an actual use case for multiple output files in any
> format other than HTML, or if that's just something you were
> expecting/hoping to fall out for free.

In the real world, I work as the director of a small team of  
technical writers documenting a modular software product that is over  
10 years old.

In such a long-lived documentation effort, you develop lots of  
contents over time, and develop different strategies to deliver those  
contents to different people.

I have struggled with the following issues over the years:

1) There is no simple way to just divide those contents into a set of  
books. What works today will not work tomorrow or two years from now.

2) Different user profiles may require different books, often sharing  
some of the contents. Either you plan for maintenance, or you will be  
killed by maintenance later on.

3) Different output formats work better with different amounts of  
information. For example, support prefers a large help file with  
everything, but printable books work better with 100 pages or less.

4) The different help files and books still require links between them.
For example, links should just work in the "everything included"
help, but something sensible should happen between books as well.

In summary, your "books":
- Are always part of something larger
- Will evolve over time

When you add the requirement of having flexible output capabilities  
in multiple formats, you quickly stumble on the following issues:
- Having the ability to publish a subset of a "book" in a sensible way
- Having the ability to publish several "books" as an integrated  
output (for a large help or for a large web site)
- Having sensible links between contents that work on the different  
output formats and different partitions. For example, link in HTML,  
link with page reference in PDF (possibly to a different document!)
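
To make the last point concrete, here is a minimal sketch in Python of a
link that adapts to the output format. Everything in it is an assumption
for illustration only: the function name render_link, its parameters, and
the exact output strings are hypothetical and not part of docutils.

```python
def render_link(label, target_doc, anchor, output_format, current_doc, page):
    """Render one cross-reference for a given output format (hypothetical)."""
    if output_format == "html":
        # In HTML a plain hyperlink is enough, whatever the target document.
        return f'<a href="{target_doc}.html#{anchor}">{label}</a>'
    if output_format == "pdf":
        if target_doc == current_doc:
            # Same book: a page reference is sufficient.
            return f"{label} (page {page})"
        # Different book: name the other document as well.
        return f'{label} (see "{target_doc}", page {page})'
    raise ValueError(f"unknown output format: {output_format}")
```

The same source link then yields a hyperlink in HTML and a page reference
in PDF, naming the other book when the target lives elsewhere.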

Right now docutils fails in a very basic way: unless you resort to  
"raw" hacks, the images and the links do not adapt to the output  
format at all.

I used a custom build system based on XML and heavily inspired by the  
old linuxdoc that mostly solved these issues:
- Images were specified without extension. When publishing, each  
output format would try different extensions in turn. This allowed  
the HTML output to prefer .gif while the printable output  
preferred .wmf over .gif et al.
- Documents in a directory were considered peers. Assuming that  
documents were output to another directory, links between them worked  
in whatever output format. Links had the form "[email protected]", and worked in  
HTML, PDF, and Windows 95 help (PDF and Help used M$ Word as an  
intermediate format).
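
The image lookup described above can be sketched roughly as follows. This
is not the original build system (which was XML based); the names
PREFERRED_EXTENSIONS and resolve_image are hypothetical, and the preference
lists only contain the extensions mentioned above.

```python
from pathlib import Path

# Hypothetical per-format preference lists, most preferred first.
PREFERRED_EXTENSIONS = {
    "html": [".gif"],
    "print": [".wmf", ".gif"],
}


def resolve_image(stem, output_format, image_dir):
    """Return the first existing image for this format, or None."""
    for ext in PREFERRED_EXTENSIONS[output_format]:
        candidate = Path(image_dir) / (stem + ext)
        if candidate.is_file():
            return candidate
    return None
```

The source only says "screenshot"; each output format picks the best file
that actually exists.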

We moved to DITA after outgrowing the custom build system. DITA uses  
"topics" instead of documents as a base. DITA has obvious advantages  
in a multi-writer environment (shorter pieces mean less fighting to  
edit the same piece), but it also lacks any idea of "documentation set".

> Lea Wiemann's project at the Google Summer of Code:
> Time Line
> =========
> present - May 27
>     Have some preliminary discussion on the Docutils mailing list
>     about how each step should be implemented.  I expect that much
>     design discussion will still take place during the implementation
>     phase as issues arise.

Sorry for replying so late... the real world got to me, and I  
actually missed this part of your message until today :-(

> May 28 - June 10
>     Add support for multiple input documents.  This may involve adding
>     a new format for a top-level "master" document which references
>     all files in the documentation.

As I see it, you cannot have sensible links between different  
documents unless you have some idea of "documentation set". This  
allows you to generate a book that is part of a set of books, and  
thus make links work across documents.

The master document is beautifully implemented in DITA, where the  
similar concept of "map" is basically a tree of references to  
included topics. Maps also feature clever ways to add "related links"  
between topics.

Important in the DITA concept of map is that you can have maps of  
maps (useful to tame complexity) as well as alternative maps.  
Meaning, a single topic can belong to different maps, and thus you  
can easily organize the same contents in small and large maps as  
needed, while varying the order and nesting of the topics.
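
As a rough illustration of the idea (in Python, not in DITA's XML syntax),
a map can be modelled as a tree of topic references that may also contain
other maps. The class names TopicRef and Map, the flatten helper, and the
topic file names used in the test data are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class TopicRef:
    topic: str                                      # hypothetical topic file name
    children: list = field(default_factory=list)    # nested TopicRef objects


@dataclass
class Map:
    title: str
    entries: list = field(default_factory=list)     # TopicRef or Map objects


def flatten(node, depth=0):
    """Yield (depth, topic) pairs in reading order."""
    if isinstance(node, Map):
        # A nested map contributes its entries at the current depth.
        for entry in node.entries:
            yield from flatten(entry, depth)
    else:
        yield depth, node.topic
        for child in node.children:
            yield from flatten(child, depth + 1)
```

A small map can be reused inside a larger one, and the same topic can
appear in both at different positions and nesting levels.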

DITA is also planning to have a "chunking" attribute that aims at
decoupling the inputs from the outputs, particularly for HTML output. The
idea is that you can have multiple input files generate a single HTML
file and vice versa.
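
A minimal sketch of that decoupling, in Python. The policy names "merge"
and "split", the function plan_outputs, and the output file names are
assumptions made up for illustration; they are not the DITA attribute
values.

```python
def plan_outputs(inputs, policy):
    """Map each output file name to the (input, section) pieces it holds.

    inputs: dict of input file name -> list of section names.
    policy: "merge" for one output file, "split" for one file per section.
    """
    pieces = [(name, sec) for name, secs in inputs.items() for sec in secs]
    if policy == "merge":
        # Many inputs, one output.
        return {"all.html": pieces}
    if policy == "split":
        # One input may fan out into several outputs.
        return {f"{name}-{sec}.html": [(name, sec)] for name, sec in pieces}
    raise ValueError(f"unknown policy: {policy}")
```

The point is only that the number of output files is decided at publishing
time, not by how the input happens to be divided into files.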
