Working With DocBook XML On Mac OS X

Here is what I did to my Mac OS X 10.4 environment to work with DocBook XML.

First I needed an editing mode for Emacs that would let me compose DocBook content. I downloaded James Clark's nXML mode. This can be installed in /usr/local/share/emacs/site-lisp, which is on the Emacs load path by default. I put it in with my other Emacs modes in ~/.emacs.d/site-lisp instead.

$ cd ~/.emacs.d/site-lisp
$ curl \
-o nxml-mode-20041004.tar.gz

$ tar -zxf nxml-mode-20041004.tar.gz
$ ln -s nxml-mode-20041004 nxml-mode
$ rm nxml-mode-20041004.tar.gz

The symbolic link means that if I install a newer version of nXML in the future, I will not have to change anything else that refers to it, such as my ~/.emacs file (see below).
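
As a rough sketch of what such an upgrade might look like, the following repoints the version-free link at a newer directory. It works in a scratch directory so nothing real is touched. The directory name nxml-mode-NEWER is a hypothetical placeholder, not an actual release, and the -n flag is assumed to behave as it does in current BSD and GNU versions of ln.

```shell
# Scratch directory so the demonstration touches nothing real.
demo_dir=$(mktemp -d)

# Hypothetical release directories; "nxml-mode-NEWER" is a placeholder name.
mkdir "$demo_dir/nxml-mode-20041004" "$demo_dir/nxml-mode-NEWER"

# Initial version-free link, as created in the steps above.
ln -s nxml-mode-20041004 "$demo_dir/nxml-mode"

# Upgrade: -f replaces the existing link, and -n stops ln from following
# the old link and creating the new one inside the old directory.
ln -sfn nxml-mode-NEWER "$demo_dir/nxml-mode"

# Show where the link points now.
readlink "$demo_dir/nxml-mode"   # prints nxml-mode-NEWER
```

Anything that refers to the nxml-mode path keeps working unchanged after the link is repointed.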

I want nXML mode to be loaded whenever I launch Emacs. I want Emacs to enter nXML mode whenever I open a file with a .xml suffix (which I give to my DocBook XML files). To make both of these things happen I added the following to my ~/.emacs file:

;; This is not necessary if you put everything
;; in /usr/local/share/emacs/site-lisp, which is 
;; on the load-path already.
(add-to-list 'load-path "~/.emacs.d/site-lisp/")
;; The elisp in rng-auto.el does all the work of 
;; setting up nXML mode for you.
(load "nxml-mode/rng-auto.el")
;; Enter nXML mode whenever opening a file with any of these suffixes.
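(setq auto-mode-alist
      (append (list
               ;; \\' anchors each pattern to the end of the file name.
               '("\\.xml\\'" . nxml-mode)
               '("\\.xsl\\'" . nxml-mode)
               '("\\.xsd\\'" . nxml-mode)
               '("\\.rng\\'" . nxml-mode)
               '("\\.xhtml\\'" . nxml-mode))
              auto-mode-alist))
;; Wrap long lines automatically while editing in nXML mode.
(add-hook 'nxml-mode-hook 'turn-on-auto-fill)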

Next I downloaded the DocBook grammar and XSL transformation files from the DocBook download page on SourceForge. I plan to migrate to DocBook 5.0, whose XML grammar is defined in the docbook namespace, so I needed the "namespace-capable" XSL files, which are packaged as docbook-xsl-ns. I downloaded and unpacked these into /usr/local/share/xml/xsl/, then made a symbolic link in that directory called simply docbook-xsl. Because the link is version-free and points to, say, docbook-xsl-ns-1.73.2, I can update the stylesheets without having to change any of my scripts that depend on their location.

After downloading the DocBook XSL files I could publish my DocBook 4 content as XHTML using xsltproc, which comes with Mac OS X 10.4:

$ export XSL=/usr/local/share/xml/xsl/docbook-xsl
$ xsltproc $XSL/xhtml/chunk.xsl my_content.xml
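
Run as above, the chunking stylesheet writes its HTML files into the current directory. The following is a minimal sketch of a wrapper that collects them in a separate directory instead, using the DocBook XSL base.dir parameter. The function name publish_xhtml, the DRY_RUN variable, and the default html directory are hypothetical names introduced only for this illustration.

```shell
# Hypothetical helper: chunk a DocBook file into its own output directory.
# Assumes XSL points at the DocBook XSL directory, as set above.
publish_xhtml() {
    src=$1
    out_dir=${2:-html}          # hypothetical default output directory
    # base.dir tells the chunking stylesheet where to write its files;
    # the trailing slash matters because the value is prepended to each name.
    set -- xsltproc --stringparam base.dir "$out_dir/" "$XSL/xhtml/chunk.xsl" "$src"
    if [ -n "${DRY_RUN:-}" ]; then
        echo "$@"               # print the command instead of running it
    else
        mkdir -p "$out_dir" && "$@"
    fi
}
```

With XSL exported as above, calling publish_xhtml my_content.xml would place the chunks under html/, and setting DRY_RUN=1 first shows the xsltproc command without running it.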

Publishing DocBook XML as PDF requires an additional tool. The conversion involves an intermediate step: the XML is first converted to "Formatting Objects" (FO), and the FO is then transformed into PDF. FO is another XML dialect, and the DocBook XSL files I downloaded include the necessary transformation rules, so xsltproc can be used to convert XML to FO. To convert the FO to PDF I use the Java application Apache-FOP. I downloaded the latest version of FOP, 0.94, and installed it by copying all of the necessary .jar files into ~/Library/Java/Extensions, e.g.:

$ cd ~/Library/Java/Extensions
$ cp ~/Downloads/fop-0.94/build/fop.jar .
$ cp ~/Downloads/fop-0.94/lib/*.jar .

I could have instead put these in /Library/Java/Extensions and then Apache-FOP would have been available to all users on my system.

Because of licensing issues, Apache cannot distribute hyphenation support with FOP, so I had to download and install this separately. The tool for this is Objects For Formatting Objects, or "Offo". To install Offo I simply copied the fop-hyph.jar file into ~/Library/Java/Extensions. This module is not necessary, but it is nice to have.

Having installed Apache-FOP, I publish my DocBook XML file as PDF as follows (assuming XSL is set as above):

$ xsltproc --output my_content.fo $XSL/fo/docbook.xsl my_content.xml
$ java org.apache.fop.cli.Main -fo my_content.fo -pdf my_content.pdf
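
The two steps above can be wrapped in a small shell function. This is only a sketch. The function name publish_pdf and the DRY_RUN variable are hypothetical, and the intermediate file name (the input name with its .xml suffix replaced by .fo) is an assumption made for the illustration.

```shell
# Hypothetical helper wrapping the XML -> FO -> PDF pipeline.
# Assumes XSL is set as above and the FOP jars are on the Java class path.
publish_pdf() {
    base=${1%.xml}              # my_content.xml -> my_content
    run=${DRY_RUN:+echo}        # expands to "echo" only when DRY_RUN is set
    # Step 1 (XML -> FO) must succeed before step 2 (FO -> PDF) runs.
    $run xsltproc --output "$base.fo" "$XSL/fo/docbook.xsl" "$base.xml" &&
        $run java org.apache.fop.cli.Main -fo "$base.fo" -pdf "$base.pdf"
}
```

Calling publish_pdf my_content.xml would then leave my_content.fo and my_content.pdf next to the source file. With DRY_RUN=1 set, the function only prints the two commands.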

Note that Apache-FOP is capable of transforming XML to PDF by itself, without using xsltproc to perform the intermediate transform. FOP comes with another Java application, Apache Xalan, which performs the XSL transformation (XSLT) from XML to FO. FOP also includes a script to make the complete XML-to-PDF process relatively simple. However, I found xsltproc to be faster and more capable than Xalan (especially for modular DocBook), so I prefer it. And if you only need to publish your DocBook as X/HTML, then xsltproc, which is already included with Mac OS X 10.4, is all you need; you do not need FOP (or Xalan) at all.

DocBook 5

The above steps were sufficient for working with DocBook XML 4.x content, but one additional step was required for DocBook XML 5.x content (assuming you have created new content in DocBook 5 form, or transitioned your existing 4.x content to 5.0 as discussed in Norm Walsh's transition guide). In addition to using the namespace-capable XSL files (see above), I needed to add a RELAX NG grammar file to nXML mode that would handle DocBook 5. To do this I downloaded docbookxi.rnc from the RELAX NG page of the latest DocBook 5.x schema pages. The "xi" in that file name refers to XInclude, indicating that this grammar accepts XML Inclusions and can therefore accommodate modular DocBook.

To install the new DocBook-5-capable grammar file in nXML, I moved it into my nxml-mode directory (the one the symbolic link in ~/.emacs.d/site-lisp points to) and named it docbook5xi.rnc so that it would not conflict with the DocBook 4.x grammar already in there. I then added the following line to nxml-mode/schema/schemas.xml:

 <namespace ns="" uri="docbook5xi.rnc"/>

This allows nXML to associate the XInclude-capable DocBook 5 RELAX NG grammar with my .xml files, such as this one:

<?xml version="1.0" encoding="utf-8" ?>
<book  xmlns=""
   <title>Working With DocBook on Mac OS X</title>
  <xi:include href="chapter1.xml"/>
  <xi:include href="chapter2.xml"/>

And that is what I needed to do in order to start working with DocBook content on Mac OS X. That, and copying my XSL customizations over from my Linux environment.