Using HTMLbase

some authors actually write better material when they are assured that it will look sufficiently beautiful when it appears in print

Donald Knuth

Introduction

Although browsers are more than adequate at formatting text for display on the screen, it is widely acknowledged that they fall down somewhat when it comes to the printed page. HTMLbase is a TeX macro package, whose purpose is to remedy this defect.

In case you did not know, TeX is the freely available typesetting system written by Donald Knuth of Stanford University. He wrote TeX so that his magnum opus The Art of Computer Programming would be properly typeset. TeX runs on most computers, and gives essentially identical results everywhere.

HTMLbase is built upon SGMLbase, which is an SGML parser implemented in TeX, along with some typesetting macros. If you have SGML documents to typeset, then SGMLbase may provide the solution. We hope to release the companion package XMLbase later this year.

The present version of this package is a quick and, in places, dirty prototype. The SGMLbase material is pretty solid; the rest may have failings. Feedback from users is welcome, and will help shape the next version. Any beauties in its output should be credited to the high quality of the TeX typesetting engine it uses. Any deficiencies are due to the present author's typographic sense and the macros he has written, and also to a shortage of time. Next time, he will do better.

This style file will typeset a limited subset of all possible HTML documents. We confine ourselves to a task that can be done fairly easily. At the same time, we are laying the foundations for further developments.

Retrieval

To use HTMLbase you will need a working TeX installation. Please visit the TeX Users Group (TUG) at www.tug.org for information on this.

You will also need SGMLbase. If you have the unzip program, please retrieve sgmlbase.zip and unzip it. This will be easier for you, and helps conserve bandwidth.

If you have the tar and gzip programs, you can use sgmlbase.tar.gz and unpack it. Again, this is easier, and saves bandwidth.

If you have neither of these, retrieve sgmlbase.bun and LaTeX this file. (All three files are presently available at http://www.active-tex.demon.co.uk/.)

Either way, you will end up with a bunch of files (including a copy of this HTML document) in your current directory. So if you are connected via a dial-up line, why not retrieve one of the zipped (80k), gzipped tar (79k) or LaTeX bundle (265k) forms of SGMLbase, and continue when you are off-line?
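
The three retrieval routes described above can be summarized in a small shell sketch. The function name unpack_command is hypothetical and not part of the distribution, and the sketch only prints a suggested command rather than running it. The gzip -dc ... | tar xf - pipeline is one common way to unpack a gzipped tar file; it is an assumption here, not something the distribution prescribes.

```shell
# Print a command that would unpack the given SGMLbase archive.
# unpack_command is a hypothetical helper, not part of the distribution.
unpack_command() {
    case "$1" in
        *.zip)    echo "unzip $1" ;;
        *.tar.gz) echo "gzip -dc $1 | tar xf -" ;;
        *.bun)    echo "latex $1" ;;
        *)        echo "unknown archive type: $1" >&2; return 1 ;;
    esac
}

unpack_command sgmlbase.zip
unpack_command sgmlbase.tar.gz
unpack_command sgmlbase.bun
```

Whichever command applies, run it in the directory where you intend to do your HTMLbase work.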

Installation

You will now need to create the SGMLbase format file. If you've not done something like this before, you may find it a little tricky. In particular, format files have to be stored in a special place, if they are to be visible whatever the current directory. But if you do all your HTMLbase work in a single directory, this will not bother you.

MSDOS. C:\WORK> tex -i sgmlbase will create sgmlbase.fmt

Unix and Linux. $ initex sgmlbase will create sgmlbase.fmt

Running HTMLbase

Files

SGMLbase and HTMLbase are stored as the files sgmlbase.tex and htmlbase.tex respectively. This HTML document is stored as the file using.html (or using.htm if your file system does not support long names). The distribution also includes a dummy file texput.tex, which is used to supply the default job name.

Examples

MSDOS

C:\WORK> tex &sgmlbase -lhtmlbase -ousing.htm using.htm will cause TeX to load htmlbase.tex and then typeset using.htm, generating using.dvi as output.

Unix and Linux

$ tex \&sgmlbase -lhtmlbase -ousing.html using.html will generate using.dvi exactly as before.
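
On Unix, the backslash in \&sgmlbase is needed because an unquoted & is a shell control operator that would put the command in the background. As a small illustration, the following lines show that backslash-escaping and single-quoting pass the same argument through. Here printf stands in for tex, so that the sketch runs without a TeX installation.

```shell
# Both spellings hand the literal argument "&sgmlbase" to the program.
# printf is used in place of tex so that the argument can be inspected.
printf '%s\n' \&sgmlbase
printf '%s\n' '&sgmlbase'
```

Either spelling should therefore work in the Unix command line above.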

The command line

HTMLbase inherits the command line processing that is built into SGMLbase. If a filename is given without extension, then TeX appends .tex to form the file name. SGMLbase inherits this feature.

In the above examples the -o option, as in -ousing.htm, sets the output file. Whatever name is specified, TeX must be able to find a file of that name, otherwise it will complain. If -o is not specified, then texput is used instead. (If -o is specified several times, then only the first one counts.)

The -l option, as in -lhtmlbase, causes SGMLbase to load the specified file. Because no extension was supplied, TeX loads htmlbase.tex.

The unadorned using.htm (or using.html) causes the named file to be parsed. Because htmlbase has already been loaded, the output of the parser is passed through to the typesetting engine.

These are, at present, the only command line options. They can occur in any order, and any number of times. Multiple files can be typeset in a single job. It is not even required that they all be in the same syntax, although this is an advanced feature. Loading your own macro files between htmlbase and your own HTML documents is perhaps the easiest way to customize HTMLbase.
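
The rules in this section can be modelled by a short shell sketch. It is only an illustration of the behaviour described above, not how SGMLbase itself is implemented (SGMLbase does this work in TeX macros). The function names with_default_ext and describe_args are hypothetical, and nothing here calls TeX.

```shell
# Append .tex when a file name has no extension, as TeX does.
with_default_ext() {
    case "$1" in
        *.*) echo "$1" ;;
        *)   echo "$1.tex" ;;
    esac
}

# Describe how a list of command line arguments would be handled.
describe_args() {
    output=texput          # default job name when -o is absent
    seen_output=no
    for arg in "$@"; do
        case "$arg" in
            -o*)
                # Only the first -o counts.
                if [ "$seen_output" = no ]; then
                    output=${arg#-o}
                    seen_output=yes
                fi
                ;;
            -l*) echo "load $(with_default_ext "${arg#-l}")" ;;
            *)   echo "parse $(with_default_ext "$arg")" ;;
        esac
    done
    echo "output $output"
}

describe_args -lhtmlbase -ousing.htm using.htm
```

For the MSDOS example, the sketch reports that htmlbase.tex is loaded, that using.htm is parsed, and that using.htm names the output. The sketch does not model how the name of the .dvi file is derived from that argument.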

A dirty trick

This is for advanced users. For the -o option to work, TeX must be able to find the named file. But it does not particularly care about what is in the file, or even what its extension is.

To have demo as the output file, first create a file demo.log. Now call HTMLbase as before, but use -odemo.log to specify the name of the output file. This will work, even if there are no other demo files visible.
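
As a sketch of these steps on Unix, the lines below create an empty demo.log and then print, rather than run, a matching command line. The command is the Unix example from earlier with -odemo.log substituted; the input file name using.html is an assumption about what you want to typeset.

```shell
# Create demo.log if it does not exist, so that TeX can find a file of that name.
touch demo.log

# Print (rather than run) the command line, with -odemo.log naming the output.
echo 'tex \&sgmlbase -lhtmlbase -odemo.log using.html'
```

Run the printed command yourself once sgmlbase.fmt has been created.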

Active TeX

As you may know, although TeX is programmable, its programming language is somewhat unusual. It has no rigid distinction between code and data. All is just one long stream of tokens.

With Active TeX, every character is active, which means that it is a macro. This gives the macro programmer great power and flexibility. (With today's computers, we can well afford the loss in performance that results.)

HTMLbase is not the first TeX macro package that attempts to typeset HTML (or SGML) documents directly. David Carlisle, Jeroen Hellingman and Toby Thurston (and perhaps some others) have written such packages. They make some, but not all, of the characters active.

HTMLbase differs in that it makes all characters active. Although in the short term this requires the programmer to work harder, we believe that in the long term it allows for a more robust parser, and higher quality typesetting.

Conclusions

I hope that you have enjoyed using HTMLbase, and are now in possession of at least two printed forms of this document, namely that produced by your browser, and that produced by HTMLbase.

This style file is suitable for typesetting fairly simple documents, such as this one. This is a first version, and comments from users are welcome.

There are some important features, such as tables, that are not yet implemented. We hope to provide these in a subsequent version. As this software is released under the General Public License of the Free Software Foundation, you can, if you wish, add such features yourself. (Certain conditions apply; please read the license if you wish to do this.)

Jonathan Fine, 203 Coldhams Lane, Cambridge CB1 3HY, UK

fine@active-tex.demon.co.uk

10 January 2000