(This document was automatically generated from LaTeX source by the ltx2x program.)

LTX2X: A LaTeX to X Auto-tagger

Peter R. Wilson
Catholic University of America
(This work was performed while a Guest Researcher at the National Institute of Standards and Technology)

Email: pwilson@cme.nist.gov

January 1997

Abstract

LTX2X is a table-driven program that will replace LaTeX commands by user-defined text. This report describes the beta version of the system. LTX2X supports both a declarative command style and an interpreted procedural language tentatively called EXPRESS-A. Details are given of the program functionality, including examples. System installation instructions are provided.

Introduction

LaTeX [LAMPORT94], which is built on top of TeX [KNUTH84a], is a document tagging system that is very popular in the academic and scientific publishing communities because of the high quality typeset material that the system outputs for normal text and especially for mathematics.

In particular, many of the documents forming the International Standard ISO 10303, commonly referred to as STEP [STEPIS], have been written using LaTeX as the document tagging language. Lately there have been moves towards converting the STEP documents to embody SGML [GOLDFARB90] rather than LaTeX markup. This has led to an interest in the automatic conversion from LaTeX to SGML documents. The LTX2X system is an initial attempt to provide a generic capability for converting LaTeX tags into other kinds of tags.

The LTX2X system described below is in a beta release state. That is, there is probably some more work to be done on it but experience from use is needed to determine desirable additional functionality. However, the code has been stable for some time. Bug reports or suggested enhancements (especially if the suggestions are accompanied by working code) are encouraged, as are constructive comments about this document.

Essentially, LTX2X reads a file containing LaTeX markup, replaces the LaTeX commands by user-defined text, and writes the result out to another file. The program operates from a command table that specifies the replacement text. In general, no programming knowledge or skills are required to write a command table, which LTX2X will then interpret. Some knowledge of LaTeX is required, but no more than is necessary for authoring a LaTeX document.

LTX2X has proved capable of performing such functions as:

The remainder of this introduction gives an overview of the LTX2X program. The command table is described in more detail in section sec:command-table and information on running the LTX2X program is provided in section sec:program. Section sec:expressa gives an overview of the EXPRESS-A language. (Footnote: The overview is necessarily rather brief as I am shortly moving to a new place of employment and EXPRESS-A is the latest addition to the system.) Although the functionality available through the command table facility is suitable for many tasks, especially since an interpreter for the EXPRESS-A general programming language is included within LTX2X, section sec:special gives details on how the system can be extended for cases where this proves to be inadequate.

The report ends with several appendices. An example command table for deTeXing a document is reproduced in sec:detexing and some of the issues in converting from LaTeX to HTML are discussed in sec:htmling. The known limitations of LTX2X are listed in sec:limitations and a summary of the command table facilities is given in sec:summary. Appendix sec:install provides instructions on installing the LTX2X program, together with copyright and warranty information. Finally, sec:ctabgrammar and sec:expgrammar provide grammars for the command table and EXPRESS-A, respectively.

Overview

The intent of Leslie Lamport, the author of LaTeX, was to provide a document tagging system that enabled the capture of the logical structure of a document. This system uses Donald Knuth's TeX system as its typesetting engine [KNUTH84a], and thus has an inherent capability for high quality typesetting.

All LaTeX commands are distinguished by starting with a backslash (\). Generally speaking, the name of a command is a string of alphabetic characters (e.g. \acommand). Commands may take arguments. Required arguments are enclosed in curly braces (i.e. { and }). Optional arguments are enclosed in square brackets (i.e. [ and ]). The general syntax for a command is the command name (preceded by a backslash) followed by the argument list with a maximum (Footnote: Under very unusual circumstances this limit may be exceeded.) of nine arguments.

The LTX2X program reads a LaTeX document file and outputs a transformation of this file. By default it outputs the normal text, while for each LaTeX command and argument it performs some user-specified actions; typically these actions involve the output of specific text corresponding to the particular command. The actions are specified in a command table file, written by the user, which is read into the LTX2X system before document processing is begun. A command table consists of a listing of the LaTeX commands of interest together with the desired actions for each of these commands and their arguments. Different effects may be easily obtained by changing the command table file. For example, a simple command table file may be written that will delete all the LaTeX commands from a document, resulting in a plain ASCII file with no embedded markup. (Footnote: To aficionados, this process is known as de-TeXing.) A more complex command table may be written that will replace LaTeX tags with appropriate SGML tags.

In some circles it is traditional to introduce a programming language by providing an example program that prints `Hello world'. In contrast, the following command table file called bye.ct, when used in conjunction with a typical vanilla LaTeX file, will transform the LaTeX file to a file that consists only of the words `Goodbye document'.

C=        bye.ct   "Goodbye document" for ltx2x

TYPE= COMMAND
NAME= \documentclass
  START_TAG= "Goodbye document"
  PC_AT_END= NO_PRINT
END_TYPE
  
C= just in case a LaTeX v2.09 document
TYPE= COMMAND
NAME= \documentstyle
  START_TAG= "Goodbye document"
  PC_AT_END= NO_PRINT
END_TYPE
  
C= just in case there is no \documentclass/style command
TYPE= BEGIN_DOCUMENT
  START_TAG= "Goodbye document"
  PC_AT_END= NO_PRINT
END_TYPE

TYPE= OTHER_COMMAND
  PRINT_CONTROL= NO_PRINT
END_TYPE

TYPE= OTHER_BEGIN
  PRINT_CONTROL= NO_PRINT
END_TYPE

TYPE= OTHER_END
  PRINT_CONTROL= NO_PRINT
END_TYPE

END_CTFILE=  end of bye.ct

Essentially the command table instructs LTX2X what to print for each LaTeX command. A command table file consists of a series of commands, one per line and introduced by a keyword such as TYPE=. Keywords are case insensitive but by convention are written in upper case. Comments in a command table are introduced by the keyword C=.

The main body of a command table consists of the specification of LaTeX commands of interest and the actions to be taken for these. Each specification commences with the keyword TYPE= and is completed by the keyword END_TYPE, the relevant actions being listed between these two keywords.

LTX2X treats some LaTeX commands specially; among these are \begin{document} and \end{document}. In a command table these are specified by the types TYPE= BEGIN_DOCUMENT and TYPE= END_DOCUMENT. The actions at \begin{document} are firstly to print the string `Goodbye document' (specified in the line START_TAG= "Goodbye document") and secondly to stop printing any output (specified in the line PC_AT_END= NO_PRINT).

By not specifying the END_DOCUMENT entry, the default action is used for the \end{document} command.

The command table entries for the commands \documentclass and \documentstyle specify that, if either of these is in the source document, then it is to be replaced by the text string "Goodbye document", and then all further printing is to be switched off.

The other three entries in the command table specify the actions for any other kind of LaTeX command. The keyword OTHER_BEGIN signifies a LaTeX command of the form \begin{name} and OTHER_END signifies a command of the form \end{name}. The keyword OTHER_COMMAND signifies any other kind of LaTeX command (e.g., \acommand ... ). The actions declared for these are all PRINT_CONTROL= NO_PRINT which shuts off any printing of the command or its arguments. In the command table bye.ct these are only included to prevent printing before the \begin{document}.

To run LTX2X with the above command table, type the following (where > is assumed to be the system prompt):

> ltx2x -f bye.ct input.tex output.tex
where bye.ct is the name of the command table, and input.tex and output.tex are the names of the input LaTeX file and the resulting processed file respectively.

As an example of a more useful command table file, the following one called decomm.ct will remove all LaTeX comments from a typical LaTeX source file.

C=  decomm.ct  Command table file for ltx2x to de-comment LaTeX source

C= ------------------------------------ set newline characters
ESCAPE_CHAR= ?
NEWLINE_CHAR= N

C=   ----------------------------------- built in commands
TYPE= BEGIN_DOCUMENT
  START_TAG= "\begin{document}"
END_TYPE

TYPE= END_DOCUMENT
  START_TAG= "\end{document}"
END_TYPE

TYPE= BEGIN_VERB
  START_TAG= "\verb|"
END_TYPE

TYPE= END_VERB
  START_TAG= "|"
END_TYPE

TYPE= BEGIN_VERBATIM
  START_TAG= "\begin{verbatim}"
END_TYPE

TYPE= END_VERBATIM
  START_TAG= "\end{verbatim}"
END_TYPE

TYPE= LBRACE
  START_TAG= "{"
END_TYPE

TYPE= RBRACE
  START_TAG= "}"
END_TYPE

TYPE= PARAGRAPH
  START_TAG= "?N?N    "
END_TYPE

C= ------------------- define '\item' tags within lists

TYPE= BEGIN_LIST_ENV
NAME= itemize
  START_TAG= "\begin{itemize}"
  START_ITEM= "\item "
END_TYPE

TYPE= BEGIN_LIST_ENV
NAME= enumerate
  START_TAG= "\begin{enumerate}"
  START_ITEM= "\item "
END_TYPE

TYPE= BEGIN_LIST_ENV
NAME= description
  START_TAG= "\begin{description}"
  START_ITEM= "\item"
  START_ITEM_PARAM= "["
  END_ITEM_PARAM= "] "
END_TYPE

TYPE= END_LIST_ENV
  NAME= itemize
END_TYPE

TYPE= END_LIST_ENV
  NAME= enumerate
END_TYPE

TYPE= END_LIST_ENV
  NAME= description
END_TYPE

C=    --------------------- pass through all other LaTeX commands

TYPE= OTHER_COMMAND
END_TYPE

TYPE= OTHER_BEGIN
END_TYPE

TYPE= OTHER_END
END_TYPE

END_CTFILE= end of file decomm.ct
In the above command table file, the first pair of commands (ESCAPE_CHAR= and NEWLINE_CHAR=) defines the character pair that is to be used to signify a `newline' within a tag. An example of its use appears later in the file in the PARAGRAPH command type.

As indicated above, LTX2X treats some LaTeX commands specially. These are listed next in the command table. The special LaTeX commands are the begin and end of the document and verbatim environments, together with the \verb command, left and right braces, the \  command (a backslash followed by a space), and the LTX2X PARAGRAPH specification. There are default actions for these, but apart from the \  command the defaults are not appropriate in this case. Above, the actions are to replace the LaTeX command by the string forming the LaTeX command. The exception is that paragraphs (the PARAGRAPH specification) should start with at least one blank line and be indented some spaces.

The LaTeX \item command is used within lists. LTX2X has to be told how to treat the \item command within each kind of list. This has been done above for the itemize, enumerate and description environments.

The final instructions in the command table file tell LTX2X to pass through the text of all other commands and their arguments. The end of the command table file is either the physical end of the file or the command END_CTFILE=, whichever comes first. The END_CTFILE= command acts like the C= command in that arbitrary text can be put after the command.

To use the decomm.ct command table to de-comment a LaTeX file, type the following (where > is assumed to be the system prompt):

> ltx2x -f decomm.ct input.tex output.tex
where input.tex and output.tex are the names of the input LaTeX file for de-commenting and the resulting de-commented version respectively.

The command table file

By default, LTX2X does not output any LaTeX comments. Otherwise, whenever it comes across a LaTeX command it looks at the data in the command table file to determine what actions it should take. The two most typical actions are either to print out the command as read in, or to replace the command by some (possibly empty) text.

Each line in a command table file is either blank or starts with a keyword followed by one or more blanks. For example, a comment in the file is a line that starts with C= ; the remainder of the line is any comment text. Comments may be placed anywhere in the file.

Special print characters in tags

LTX2X is written in C [KERNIGHAN88]. The C language enables certain non-printing characters to be defined. These are typically written in the form \c where \ is the C escape character and c is a particular character. LTX2X understands some of these special printing characters and the command table enables these to be given non-default values.

The default escape character (\) may be redefined via the ESCAPE_CHAR= command. For example,

ESCAPE_CHAR= ?
will make the question mark character the escape character. The escape character is changed in most command tables to avoid clashing with the LaTeX \ character. The following commands can be used to redefine the C special characters. Each of these commands takes a single character as its value. If a relevant command is not given, then the default value is used.
NEWLINE_CHAR=
a new line (default is n)
HORIZONTAL_TAB_CHAR=
horizontal tab (default is t)
VERTICAL_TAB_CHAR=
vertical tab (default is v)
BACKSPACE_CHAR=
backspace (default is b)
CARRIAGE_RETURN_CHAR=
carriage return (default is r)
FORMFEED_CHAR=
formfeed (default is f)
AUDIBLE_ALLERT_CHAR=
beep the terminal (default is a)
HEX_CHAR=
following characters form the hexadecimal number of the character to be printed (default is x) (e.g. ?xA3)
These command lines are all optional within a command table and their ordering is immaterial. However, if any are present then they must be at the beginning of the command table.

The above special characters are useful when specifying the replacement text for LaTeX commands.
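
As a sketch of how these settings might be combined, the following fragment redefines the escape, newline and tab characters at the head of a command table and then uses them in a tag. The choice of ?, N and T is purely illustrative, and it is assumed here that the tab character is written in a tag in the same way as the newline character.

```text
C= special character settings must come first in the command table
ESCAPE_CHAR= ?
NEWLINE_CHAR= N
HORIZONTAL_TAB_CHAR= T

C= start each paragraph with a blank line followed by a tab
TYPE= PARAGRAPH
  START_TAG= "?N?N?T"
END_TYPE
```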

LaTeX command types

The commands for controlling the actions performed on LaTeX commands are enclosed between the command lines TYPE= and END_TYPE, as below.

TYPE= CommandType
  C= a possibly empty set of commands
END_TYPE
where CommandType is an LTX2X keyword signifying the kind of LaTeX command being specified.

Built in command types

Some LaTeX commands are pre-defined within LTX2X. Default actions are provided for these but it is recommended that type specifications for each of these commands be put in the command table anyway. The keywords for these commands are:

BEGIN_DOCUMENT
Corresponds to the LaTeX command \begin{document}.
END_DOCUMENT
Corresponds to the LaTeX command \end{document}.
BEGIN_VERBATIM
Corresponds to the LaTeX commands \begin{verbatim} and \begin{verbatim*}.
END_VERBATIM
Corresponds to the LaTeX commands \end{verbatim} and \end{verbatim*}.
BEGIN_VERB
Corresponds to the LaTeX commands \verb and \verb*, together with the succeeding character.
END_VERB
Corresponds to the appearance of the character that completes the LaTeX commands \verb and \verb*.
LBRACE
Corresponds to the LaTeX left brace character {.
RBRACE
Corresponds to the LaTeX right brace character }.
BEGIN_DOLLAR
Corresponds to the LaTeX $ symbol signalling the start of an in-text math formula.
END_DOLLAR
Corresponds to the LaTeX $ symbol signalling the end of an in-text math formula.
PARAGRAPH
Corresponds to the LaTeX protocol of a blank line signalling the start/end of a paragraph.
SLASH_SPACE
Corresponds to the LaTeX \  command (a backslash followed by a space).
OTHER_COMMAND
Corresponds to any LaTeX command of the form \command not specified elsewhere within the command table.
OTHER_BEGIN
Corresponds to any LaTeX command of the form \begin{environment} not specified elsewhere within the command table.
OTHER_END
Corresponds to any LaTeX command of the form \end{environment} not specified elsewhere within the command table.

The ordering of these built in type specifications is immaterial. If any of the above are not specified within the command table then LTX2X will use their default action. With the exception of the SLASH_SPACE command type, the default action is to do nothing (i.e., produce no output). The default action for the SLASH_SPACE command type is to output a space.
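
Because the default for most built in types is to produce no output, a command table that is meant to pass in-text mathematics through unchanged would need explicit entries along the following lines. This is only a sketch, written by analogy with the LBRACE and RBRACE entries of decomm.ct shown earlier.

```text
C= re-emit the math delimiters that would otherwise be dropped
TYPE= BEGIN_DOLLAR
  START_TAG= "$"
END_TYPE

TYPE= END_DOLLAR
  START_TAG= "$"
END_TYPE
```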

Optional command types

For the purposes of LTX2X, LaTeX commands are divided into various classes. The keywords for these classes, and the class descriptions, are listed below.

TEX_CHAR
Corresponding to LaTeX's special characters (with the exception of the $, { and } characters).
CHAR_COMMAND
Corresponding to LaTeX commands of the type \c where c is a single non-alphabetic character.
COMMAND
Corresponding to LaTeX commands of the type \command, where command is the name of the command (except for \begin, \end and \item).
BEGIN_ENV
Corresponding to LaTeX commands of the type \begin{environment} where environment is the name of the environment, except for those list environments whose bodies consist of \item commands.
END_ENV
Corresponding to LaTeX commands of the type \end{environment}, with the same restrictions as for BEGIN_ENV.
BEGIN_LIST_ENV
Corresponding to LaTeX commands of the type \begin{environment} where environment is the name of an environment whose body consists of \item commands.
END_LIST_ENV
Corresponding to LaTeX commands of the type \end{environment} to match BEGIN_LIST_ENV.
VCOMMAND
Corresponding to a LaTeX \verb-like command.
BEGIN_VENV
Corresponding to the start of a verbatim-like environment.
END_VENV
Corresponding to the end of a verbatim-like environment.
SECTIONING
Corresponding to LaTeX commands of the type \command, where command is a document sectioning command such as chapter or subsection.
SPECIAL
Reserved for possible future use.
SPECIAL_COMMAND
Corresponding to the COMMAND keyword, except that some special output processing is to be defined.
SPECIAL_BEGIN_ENV
Corresponding to the BEGIN_ENV keyword, except that some special output processing is to be defined.
SPECIAL_END_ENV
Corresponding to the END_ENV keyword, except that some special output processing is to be defined.
SPECIAL_BEGIN_LIST
Corresponding to the BEGIN_LIST_ENV keyword, except that some special output processing is to be defined.
SPECIAL_END_LIST
Corresponding to the END_LIST_ENV keyword, except that some special output processing is to be defined.
SPECIAL_SECTIONING
Corresponding to the SECTIONING keyword, except that some special output processing is to be defined.
_PICTURE_
Corresponding to some of the LaTeX picture drawing commands.
COMMAND_...
Corresponding to some of the LaTeX commands whose arrangements of required and optional arguments are untypical.

The ordering of these types within a command table is immaterial.

Each of the above type specifications requires a NAME= command, whose value is the name of the relevant command or environment being specified. For example, the following is a (partial) specification of the figure environment and the caption command.

TYPE= BEGIN_ENV
NAME= figure
END_TYPE

TYPE= END_ENV
NAME= figure
END_TYPE

TYPE= COMMAND
NAME= \caption
END_TYPE

Command action tags

When LTX2X reads a LaTeX command it performs the following actions:

  1. Looks up the name of the command or environment in the command table. If it is not found, then the appropriate default type is used.
  2. Sets the printing mode according to the PC_AT_START= command.
  3. Performs the actions specified in the command table by the START_TAG= command.
  4. Processes any specified arguments to the command.
  5. Performs the actions specified in the command table by the END_TAG= command.
  6. Sets the printing mode according to the PC_AT_END= command.

NOTES:
  1. Except for the default processing of OTHER_ types, it does not output the command itself.
  2. If a tag action is not specified, then the default action is null (e.g., nothing will appear in the output).

Within a command table all text strings for output are enclosed within double quotes. For example:

START_TAG=     "Some "text" string\nanother line of text."

Assuming that \n means a newline, when this string action is performed by LTX2X it will appear in the output file as:

Some "text" string
another line of text.

A text string starts with the first double quote and ends with the last double quote on the command line. A text string has to be written on a single line within the command table. C language special print characters can be embedded within the text string (e.g. the \n for a newline in the above example). Remember that the first section of the command table is used for specifying the particular command table version of these.

If a text string is too long to fit comfortably on a single line in the command table, it may be continued via the STRING: command. As many of these can be used in succession as required (subject to internal limitations within LTX2X).

For instance,

START_TAG=     "Some "text" string\n"
  STRING: "another line of text."
has the same effect as the previous example.

The following specification is designed to write out the contents of the \caption command (Footnote: Strictly speaking, the specification does not do this exactly, but this simplified illustration will be corrected in the next sections.), preceded by the word `CAPTION' and followed by at least one blank line (assuming that the escape character has been set to ?).

TYPE= COMMAND
NAME= \caption
  START_TAG= "?n      CAPTION "
  END_TAG= "?n?n"
END_TYPE
Assuming that somewhere in a LaTeX file there is the command
stuff
\caption{This is a caption.}
more stuff
then the expected effect (see footnote) is
stuff

    CAPTION This is a caption.

more stuff

Argument actions

LaTeX commands can take arguments. The text for a required argument is enclosed in curly braces, while the text for an optional argument is enclosed in square brackets. LTX2X can be directed to perform actions at the start and end of each argument.

The number of required arguments is specified by the command line REQPARAMS= where the value of the command is a digit between 1 and 9 inclusive.

LTX2X assumes that a command can have only one optional argument, and that this is either first or last in the argument list. The potential presence of an optional argument is indicated by the command line OPT_PARAM=, where the value is either the keyword FIRST (for first in the list) or LAST (for last in the list).

The actions to be performed at the start and end of each required argument are specified via the commands START_TAG_1= and END_TAG_1= for the first required argument, through START_TAG_9= and END_TAG_9= for the ninth argument. The actions to be performed at the start and end of the optional argument are specified by the command lines START_OPT= and END_OPT=.

The argument delimiters (the braces or brackets) are not printed.

In the simplest case, the action is to print a specified text string (enclosed in double quotes, and continued with STRING: commands if necessary). Other kinds of actions are also possible. An unspecified tag defaults to doing no action.
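
As an illustration of these argument tags, the sketch below wraps the arguments of the LaTeX \cite command, which takes an optional note followed by one required key, in SGML-like tags. The tag names <cite>, <note> and <key> are invented for this example and do not come from any particular DTD.

```text
TYPE= COMMAND
NAME= \cite
  START_TAG= "<cite>"
  C= the optional argument, if present, precedes the required one
  OPT_PARAM= FIRST
  START_OPT= "<note>"
  END_OPT= "</note>"
  C= one required argument: the citation key
  REQPARAMS= 1
  START_TAG_1= "<key>"
  END_TAG_1= "</key>"
  END_TAG= "</cite>"
END_TYPE
```

With an entry of this kind, the brackets and braces around the arguments are dropped and the text of each argument is expected to appear between its own start and end tags.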

Print options

Argument processing

By default, LTX2X processes (i.e. outputs as appropriate) the text of an argument. Printing of the argument text may be disabled, if required. The command lines that control argument printing are PRINT_P1= through PRINT_P9= for required arguments and PRINT_OPT= for the optional argument. The value of each of these commands is one of several keywords, the most common being NO_PRINT; this switches off printing of the text of the indicated argument. Default printing is resumed after the indicated argument.

Continuing the caption example from earlier, we can now complete it. The full syntax of the LaTeX command is:

\caption[optional table of contents entry]{Caption in the text}
That is, it has one required argument, which prints the caption text both in the body of the document and in the table of contents, unless the optional first argument is present, in which case its value gets printed in the table of contents instead.

Assume that an instance of the caption command in a document is:

Some stuff
\caption[Short caption]{Long caption for the body of the text.}
More stuff
Recall the previous command table caption specification. The actual output from processing this would be
Some stuff

    CAPTION [Short caption]{Long caption for the body of the text.}

More stuff
because, unless LTX2X is told that there are command arguments and how they should be treated, it will just print them out together with their surrounding delimiters.

The following command table entry will give more acceptable results.

TYPE= COMMAND
NAME= \caption
  START_TAG= "?n      CAPTION "
  END_TAG= "?n?n"
  OPT_PARAM= FIRST
  PRINT_OPT= NO_PRINT
  REQPARAMS= 1
END_TYPE

For the above captioning instance, the output will now be:

Some stuff

    CAPTION Long caption for the body of the text.

More stuff

The default print mode is to print text to the output file.

The keywords that can be used to control argument printing are:

NO_PRINT
Do not print anything.
TO_SYSBUF
Print to the LTX2X system buffer.
TO_BUFFER num
Print to the LTX2X buffer number num.
TO_FILE name
Print to the file called name.
NO_OP
Skip all processing of the argument.
Note that even if the print mode is set to NO_PRINT, the argument text will still be processed. Only the NO_OP specification temporarily turns off the processing.
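
The following sketch shows two of these keywords in use. The first entry discards the argument of \label, and the second diverts the text of each \footnote to a separate file. The file name footnotes.txt is an assumption made for this illustration.

```text
C= drop the label name from the output
TYPE= COMMAND
NAME= \label
  REQPARAMS= 1
  PRINT_P1= NO_PRINT
END_TYPE

C= send footnote text to a separate (hypothetical) file
TYPE= COMMAND
NAME= \footnote
  REQPARAMS= 1
  PRINT_P1= TO_FILE footnotes.txt
END_TYPE
```

In both cases the normal print mode resumes once the argument has been processed, so no explicit reset is needed.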

General printing

Just as the printing mode can be set for each argument, it can also be set at the start and end of processing a LaTeX command or environment.

The specifications PC_AT_START= and PC_AT_END= can be used to set the printing mode at the start of processing a command and at the end, respectively. The keywords that can be used in these specifications are:

NO_PRINT
Do not print anything.
TO_SYSBUF
Print to the LTX2X system buffer.
TO_BUFFER num
Print to the LTX2X buffer number num.
TO_FILE name
Print to the file called name.
RESET
Reset the print mode back to what it was.

Unlike the argument printing controls, the print mode is not automatically reset. This has to be explicitly specified.

As an example, assume that it is required to remove all figure environments from a LaTeX source and put them into a file on their own. The following command table code could be used to accomplish this.

TYPE= BEGIN_ENV
NAME= figure
  PC_AT_START= TO_FILE allfigs.tex
  START_TAG= "?n\begin{figure}"
END_TYPE

TYPE= END_ENV
NAME= figure
  START_TAG= "\end{figure}"
  PC_AT_END= RESET
END_TYPE
When a LaTeX figure environment is started, printing is switched to go to the file called allfigs.tex. At the end of the figure environment, the print mode is reset back to what it was before the environment began. If at the first figure environment the allfigs.tex file did not exist, then LTX2X would create it automatically.
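
The same pair of specifications can be used to drop an environment altogether. The sketch below assumes that everything inside an abstract environment is to be omitted from the output; it relies on RESET restoring whatever print mode was in force before the environment began.

```text
TYPE= BEGIN_ENV
NAME= abstract
  C= stop printing before anything in the environment is output
  PC_AT_START= NO_PRINT
END_TYPE

TYPE= END_ENV
NAME= abstract
  C= printing is not resumed automatically, so reset it explicitly
  PC_AT_END= RESET
END_TYPE
```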

Read actions

As noted above, one of the actions that can be specified for a LaTeX command's argument is to set the print mode for printing to a buffer or a file. Similarly there are actions which will read from a buffer or a file and print the contents. Within an argument tag these kinds of actions are specified via the keyword SOURCE:. This can take one of several values:

SYSBUF
Print the contents of the LTX2X system buffer.
BUFFER num
Print the contents of the LTX2X buffer number num.
FILE name
Print the contents of the file called name.

In a previous example, the LaTeX figure environments were all written to the file allfigs.tex. This file could be read in again just before the end of the document so that all figures will be typeset after everything else.

TYPE= END_DOCUMENT
  END_TAG= "?n %  figures collected here by LTX2X ?n"
    SOURCE: FILE allfigs.tex
    STRING: "?n\end{document}?n"
END_TYPE

As another example of the use of the print actions consider the LaTeX \maketitle command. This typesets the arguments of the \title, \author and \date commands, which must have been previously specified but not necessarily in this ordering. Here is one way this can be simulated using LTX2X.

TYPE= COMMAND
NAME= \title
  START_TAG=
    RESET_BUFFER: 1
  REQPARAMS= 1
  PRINT_P1= TO_BUFFER 1
END_TYPE

TYPE= COMMAND
NAME= \author
  START_TAG=
    RESET_BUFFER: 2
  REQPARAMS= 1
  PRINT_P1= TO_BUFFER 2
END_TYPE

TYPE= COMMAND
NAME= \date
  START_TAG=
    RESET_BUFFER: 3
  REQPARAMS= 1
  PRINT_P1= TO_BUFFER 3
END_TYPE

TYPE= COMMAND
NAME= \maketitle
  START_TAG= "?n"
    SOURCE: BUFFER 1
    STRING: "?n?n"
    SOURCE: BUFFER 2
    STRING: "?n?n"
    SOURCE: BUFFER 3
    STRING: "?n?n"
  END_TAG=
    RESET_BUFFER: 1
    RESET_BUFFER: 2
    RESET_BUFFER: 3
END_TYPE
For the \title command, the print mode for its argument is set for printing to the buffer number 1. The single action at the start of the command is to make sure that buffer 1 is empty (the line RESET_BUFFER: 1). The actions for the \author and \date commands are similar, except that they print their argument texts to buffers 2 and 3 respectively.

The \maketitle command takes no arguments, so all actions must be placed under START_TAG= and/or END_TAG=. There is a set of actions specified for START_TAG=. Firstly a newline is printed and this is followed by the contents of buffer 1 (i.e., the text of the argument of the \title command). Then two newlines are printed, followed by the contents of buffer 2 (the author). Finally another two newlines are printed, followed by the contents of buffer 3 (the date) and another two newlines. The actions for END_TAG= are to clear the contents of the three buffers.

Just to extend the example, here is a specification for the LaTeX \thanks command. LTX2X is not designed to do footnoting (as it does not do page breaking) so instead the thanks text will be placed inside parentheses on a new line.

TYPE= COMMAND
NAME= \thanks
  START_TAG= "?n ("
  REQPARAMS= 1
  END_TAG= ") "
END_TYPE

Given these command table specifications and the following portion of a LaTeX document

\date{29 February 2000}
\title{The Calculation of Leap Days\thanks{Originally published in JIR}}
\author{A. N. Other}
...
\maketitle
then the output from LTX2X will be:
The Calculation of Leap Days
 (Originally published in JIR)

A. N. Other 

29 February 2000
Note that as the \thanks command appears within the argument of the \title command, it is written to the same place as the text of the argument of \title. Thus, it also gets written to the output file when \maketitle is processed.

Print switching

There are individual actions that enable the printing destination to be changed at will within the action set for any particular tag.

SWITCH_TO_BUFFER: num
Direct any following printing to the LTX2X buffer number num.
SWITCH_TO_FILE: name
Direct any following printing to the file called name.
SWITCH_TO_SYSBUF
Direct any following printing to the LTX2X system buffer.
SWITCH_BACK:
Undo the effect of the last SWITCH_TO... action.

As an example of the utility of this type of action, consider again the LaTeX \maketitle command. When LaTeX processes this command, it typesets the date as specified by the \date command, or if this has not been specified then it prints the current date instead. We can arrange for LTX2X to do something similar by adding the following to the command table shown earlier for the \date and \maketitle commands.

TYPE= COMMAND
NAME= \documentclass
  OPT_PARAM= FIRST
  REQPARAMS= 1
  PRINT_OPT= NO_PRINT
  PRINT_P1= NO_PRINT
  START_TAG=
      c= Initialise buffer 3 to `Today'
    RESET_BUFFER: 3
    SWITCH_TO_BUFFER: 3
    STRING: "Today"
    SWITCH_BACK:
END_TYPE
At the start of the document, the above actions put the string Today into BUFFER 3, having first ensured that it is empty. If the LaTeX source includes a \date command, then the contents of the buffer will be overwritten, otherwise it will be as initialised. In any event, when the \maketitle command is processed, the value output for the date will be either Today or whatever the argument was of the \date command.

Notes on the use of buffers and files

Resetting a buffer or a file always has the effect of emptying it of any prior contents.

When printing from a buffer or a file, the entire contents are printed. There is no limit to the number of times that a buffer or a file can be used as a printing source.

When printing to a buffer, the new strings are appended at the end of the current contents of the buffer, at least until it overflows. Unlike the behaviour of files, this is independent of any intervening prints from the buffer.

When printing to a file, the new strings are appended at the end of the current contents of the file. However, if a file is printed to after it has been printed from, the prior contents of the file are lost, and the new string is added at the start of the file. In general, it is safest to treat files as either read-only or write-only.
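The difference between the two kinds of print destination can be pictured with a small model. The following Python sketch is only an illustration of the behaviour described in these notes, not the LTX2X implementation; the class names Buffer and FileStore are invented for the purpose, and buffer overflow is not modelled.

```python
class Buffer:
    """Toy model of a buffer: every write appends, even after a read."""

    def __init__(self):
        self.text = ""

    def reset(self):
        self.text = ""

    def write(self, s):
        self.text += s

    def read(self):
        return self.text


class FileStore:
    """Toy model of a file: a write that follows a read discards old contents."""

    def __init__(self):
        self.text = ""
        self._read_since_write = False

    def reset(self):
        self.text = ""
        self._read_since_write = False

    def write(self, s):
        if self._read_since_write:
            self.text = ""  # the prior contents are lost
            self._read_since_write = False
        self.text += s

    def read(self):
        self._read_since_write = True
        return self.text


buf, fil = Buffer(), FileStore()
for store in (buf, fil):
    store.write("first ")
    store.read()           # an intervening print from the store
    store.write("second")

print(repr(buf.read()))    # 'first second'
print(repr(fil.read()))    # 'second'
```

The model makes the advice above concrete: a buffer can safely be read and written alternately, whereas a file is best treated as either read-only or write-only.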

User specified modes

Consider the LaTeX command \\. In normal text this signifies that a line break must occur. In a tabular environment, though, it signifies the end of a row in a table. Suppose that in the LTX2X processing of a tabular environment it is required to start and end each row with a vertical bar and to separate each column also with a vertical bar. However, in normal text a \\ command should just translate into a newline. Just to complicate matters further, assume that in an eqnarray environment, the & column separator is to be translated to some spaces, and that the string `(X)' is to be put at the end of every row.

In other words, we need to process some commands differently according to where they appear in the LaTeX source. An LTX2X command table provides this capability through mode setting and mode-dependent actions. Here is an example of using this facility to solve the requirements outlined above.

TYPE= BEGIN_ENV
NAME= tabular
  C= starting actions, etc., here
  END_TAG=
    SET_MODE: tabular
END_TYPE

TYPE= END_ENV
NAME= tabular
  START_TAG=
    RESET_MODE:
END_TYPE

TYPE= BEGIN_ENV
NAME= eqnarray
  C= starting actions, etc., here
  END_TAG=
    SET_MODE: eqn
END_TYPE

TYPE= END_ENV
NAME= eqnarray
  START_TAG= "    (X)?n"
    RESET_MODE:
END_TYPE

TYPE= TEX_CHAR
NAME= &
  START_TAG= "  |  "
IN_MODE= eqn
  START_TAG= "  "
END_MODE
END_TYPE

TYPE= CHAR_COMMAND
NAME= \\
  START_TAG= "?n"
IN_MODE= tabular
  START_TAG= " |?n"
    STRING: "     |  "    
END_MODE
IN_MODE= eqn
  START_TAG= "    (X)?n"
END_MODE
END_TYPE

Let us look at the specification for the tabular environment first. The END_TAG= action is specified by the single command line SET_MODE: tabular, where tabular is any convenient name for identifying a mode. Thus, this will set the mode to be tabular. The action at the end of the environment is to reset the mode (RESET_MODE:) to whatever its previous value was. It is assumed that the last row in any tabular environment is finished by \\. Similar actions are performed for the eqnarray environment, except that the mode is called eqn instead of tabular. The other difference is that it is assumed that the last row is not ended by \\, so the end of the eqnarray environment also has to act like the \\.

Turning now to the specification for the & command, the first part of the specification identifies the type and name of the LaTeX command. This is then followed by the mode-independent set of actions, which in this case consists of printing a vertical bar with some spaces on either side of it. Following these are any mode-dependent actions, bracketed between IN_MODE= and END_MODE. The value for IN_MODE= is the name of the relevant mode. In this case the only mode-dependent action occurs when MODE eqn is in effect and it is to print some spaces instead of the default spaces and vertical bar.

The specification for the \\ command has its set of mode-independent default actions, namely just to print a newline, and two sets of mode-dependent actions. When the tabular mode is in effect, it prints some spaces, a vertical bar, a newline, more spaces, a vertical bar, and finally some more spaces. On the other hand, when the eqn mode is in effect, it prints some spaces, the string `(X)' and a newline. If a mode is in effect that is not defined within the specification (e.g., mode anon) it performs the default mode-independent actions.
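The way that a mode selects a set of actions, with a fall-back to the mode-independent default, can be sketched in a few lines of Python. This is a hedged illustration only: the names ModeTracker, LINEBREAK and output_for are invented here, and keeping the modes on a stack is my assumption about how RESET_MODE: restores the `previous value'.

```python
class ModeTracker:
    """Remembers the current mode; reset_mode returns to the previous one."""

    def __init__(self):
        self._stack = []

    def set_mode(self, name):
        self._stack.append(name)

    def reset_mode(self):
        if self._stack:
            self._stack.pop()

    def current(self):
        return self._stack[-1] if self._stack else None


# Output strings for the \\ command, copied from the specification above
# (with ?n written as a Python newline).
LINEBREAK = {
    "default": "\n",
    "modes": {"tabular": " |\n     |  ", "eqn": "    (X)\n"},
}


def output_for(spec, mode):
    """Pick the mode-dependent output, or the default if the mode is unknown."""
    return spec["modes"].get(mode, spec["default"])


modes = ModeTracker()
in_text = output_for(LINEBREAK, modes.current())
modes.set_mode("tabular")
in_tabular = output_for(LINEBREAK, modes.current())
modes.set_mode("anon")               # a mode the \\ specification does not mention
in_anon = output_for(LINEBREAK, modes.current())
modes.reset_mode()                   # back to tabular
after_reset = modes.current()
```

Here in_text and in_anon both hold the default newline, while in_tabular holds the row-ending string for tables.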

As a perhaps more practical example, the following command table code will convert simple LaTeX tabular environments to appropriate mark-up for HTML tables. It is assumed that the tabular environment is always within a table environment.

To set the perspective a little, here is the code for a simple table in LaTeX:

\begin{table}[tbp]
\centering
\caption{A simple table typeset by \LaTeX.} \label{tab:lxtab}
\begin{tabular}{|l||r|r||r|r|} \hline
Stock & \multicolumn{2}{c||}{1994} & \multicolumn{2}{c|}{1995} \\ \cline{2-5}
      &  low    &  high  &   low  & high  \\ \hline
ABC   &  27     &  36    &   23   & 45     \\
DEF   &  53     &  72    &   19   & 54     \\
GHI   &  28     &  49    &   17   & 79     \\ \hline
\end{tabular}
\end{table}
This will be typeset as shown in table tab:lxtab.
(Table tab:lxtab)
A simple table typeset by LaTeX.
Stock     1994        1995
         low  high   low  high
ABC       27    36    23    45
DEF       53    72    19    54
GHI       28    49    17    79

The corresponding HTML code for the table after translation is:

<p><center><table border>
<caption>A simple table typeset by LaTeX.</caption> <a name="tab:lxtab"></a>


<tr><td> Stock </td><td colspan=2> 1994 </td><td colspan=2> 1995 </tr>
<tr><td> </td><td> low </td><td> high </td><td> low </td><td> high </tr>
<tr><td> ABC </td><td> 27 </td><td> 36 </td><td> 23 </td><td> 45 </tr>
<tr><td> DEF </td><td> 53 </td><td> 72 </td><td> 19 </td><td> 54 </tr>
<tr><td> GHI </td><td> 28 </td><td> 49 </td><td> 17 </td><td> 79 </tr>
<tr><td> </table></center>

In the HTML browser that I use this is displayed approximately as shown for table tab:httab.

(Table tab:httab)
A simple table typeset after translation to HTML.
Stock     1994        1995
         low  high   low  high
ABC       27    36    23    45
DEF       53    72    19    54
GHI       28    49    17    79

In HTML a table is enclosed between <table> and </table> tags. Each row of the table is enclosed between <tr> and </tr> tags, and each element in a row is enclosed between <td> and </td> tags. Under certain circumstances the closing tags (i.e., those like </...>) can be inferred by the HTML processors and need not be explicitly put into the source text. The equivalent HTML tags to a LaTeX \multicolumn{num}{col}{text} command are
<td colspan=num> text </td>.

The general actions that LTX2X has to perform in doing the LaTeX to HTML translation are:

We solve this last problem partly by using buffers (numbers 8 and 9 in the specification below) as temporary storage, and partly by a subtle specification for the \multicolumn command.

C=   start of a table
TYPE= BEGIN_ENV
NAME= table
  START_TAG= "<center><table border>"
  OPT_PARAM= FIRST
  C=  ignore the optional positioning argument
  PRINT_OPT= NO_PRINT
END_TYPE


C= end a table
TYPE= END_ENV
NAME= table
  START_TAG= "</table></center>"
END_TYPE

C= start a tabular
TYPE= BEGIN_ENV
NAME= tabular
  START_TAG= "?n<tr><td"
    RESET_BUFFER: 8
    RESET_BUFFER: 9
  OPT_PARAM= FIRST
  C= ignore the optional positioning argument
  PRINT_OPT= NO_PRINT
  REQPARAMS= 1
  C= ignore the column specification
  PRINT_P1= NO_OP
  END_TAG=
    SET_MODE: tabular
  PC_AT_END= TO_BUFFER 9
END_TYPE

C= end a tabular
TYPE= END_ENV
NAME= tabular
  PC_AT_START= RESET
  START_TAG= ">"
    RESET_BUFFER: 8
    RESET_BUFFER: 9
    RESET_MODE:
END_TYPE

C= we can do some processing of the \multicolumn command
TYPE= COMMAND
NAME= \multicolumn
  PC_AT_START= TO_BUFFER 8
  REQPARAMS= 2
  START_TAG_1= " colspan="
  PRINT_P2= NO_PRINT
  PC_AT_END= RESET
END_TYPE

C= now for the end/start of a row
TYPE= CHAR_COMMAND
NAME= \\
  START_TAG= "<br>"
IN_MODE= tabular
  PC_AT_START= RESET
  START_TAG=
    SOURCE: BUFFER 8
    STRING: "> "
    RESET_BUFFER: 8
    SOURCE: BUFFER 9
  END_TAG= "</tr>?n<tr><td "
    RESET_BUFFER: 9
  PC_AT_END= TO_BUFFER 9
END_MODE
END_TYPE

C= and the column separator
TYPE= TEX_CHAR
NAME= &
  PC_AT_START= RESET
  START_TAG=
    SOURCE: BUFFER 8
    STRING: "> "
    RESET_BUFFER: 8
    SOURCE: BUFFER 9
  END_TAG= " </td><td "
    RESET_BUFFER: 9
  PC_AT_END= TO_BUFFER 9
END_TYPE

Regarding the \multicolumn specification, we state that as far as LTX2X is concerned, it only has two required parameters, and that the action for the second one is NO_PRINT. The first argument is written to buffer 8 after `colspan=' has first been put into it. LTX2X will treat the actual third argument to the \multicolumn as ordinary text, just as if there was no \multicolumn in the LaTeX source. We use buffer 9 for storing the text of a data element. When LTX2X processes a & column delimiter it first outputs the contents of buffer 8 (the number of columns specification) and then appropriate HTML characters. It then outputs the contents of buffer 9 (the element text), finishes off the element and partially starts the next element. Similar actions are performed at the start of the tabular environment and at the end of each row in the table.

Sectioning command types

LTX2X does some particular processing for sectioning command types. Although LaTeX can determine where any section of a document ends, other tagging systems cannot always do this. They require both a `begin section' and an `end section' tag. LTX2X can take account of the nesting depth of document sections and, given appropriate specifications, can supply both `begin section' and `end section' tags appropriately. This requires a little bit more in the way of specifications than we have met so far.

For a SECTIONING command type, the command line SECTIONING_LEVEL= must be included within the specification. The value of this command is a keyword from the following list.

PART
For a sectioning command equivalent to the LaTeX \part command.
CHAPTER
For a sectioning command equivalent to the LaTeX \chapter command.
SECT
For a sectioning command equivalent to the LaTeX \section command.
SUBSECT
For a sectioning command equivalent to the LaTeX \subsection command.
SUBSUBSECT
For a sectioning command equivalent to the LaTeX \subsubsection command.
PARA
For a sectioning command equivalent to the LaTeX \paragraph command.
SUBPARA
For a sectioning command equivalent to the LaTeX \subparagraph command.

When a sectioning command is read from the LaTeX source, LTX2X firstly performs the END_TAG= actions for any `lower level' sections that this one is closing off. It then performs the START_TAG= actions for the current command, and stores its own END_TAG= actions for later use. It then goes on and processes any arguments as usual. The END_DOCUMENT command automatically closes off any opened sections.
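The bookkeeping just described amounts to keeping a stack of pending end tags. The following Python sketch is a minimal model of that idea, not the LTX2X source; the class name Sectioner is invented, and I have assumed that a new section also closes an open section at its own level, as well as any at lower levels.

```python
# Sectioning levels, outermost first, as listed above.
LEVELS = ["PART", "CHAPTER", "SECT", "SUBSECT", "SUBSUBSECT", "PARA", "SUBPARA"]


class Sectioner:
    """Emits start tags and the stored end tags of the sections being closed."""

    def __init__(self):
        self.pending = []   # stack of (depth, end_tag) for open sections
        self.out = []       # tags emitted so far

    def start(self, level, start_tag, end_tag):
        depth = LEVELS.index(level)
        # Close every open section at this depth or deeper, innermost first.
        while self.pending and self.pending[-1][0] >= depth:
            self.out.append(self.pending.pop()[1])
        self.out.append(start_tag)
        self.pending.append((depth, end_tag))  # keep the end tag for later

    def end_document(self):
        # Close whatever is still open, innermost first.
        while self.pending:
            self.out.append(self.pending.pop()[1])


doc = Sectioner()
doc.start("SECT", "<div.1>", "</div.1>")
doc.start("SUBSECT", "<div.2>", "</div.2>")
doc.start("SECT", "<div.1>", "</div.1>")   # closes the subsection, then the section
doc.end_document()
print(doc.out)
```

The second SECT causes </div.2> and then </div.1> to be emitted before the new <div.1>, which is the ordering that an `end section' style of tagging requires.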

As an example, assume that some kind soul has supplied a LaTeX style file that makes the commands \clause synonymous with \subsection, and \sclause synonymous with \subsubsection, etc. Also assume that it is required to output start and end tags of the form <div.1> and </div.1> for sections, <div.2> for clauses, etc., and surround the headings with tags <heading> and </heading>. Further, the first optional argument is of no interest as the output is going to be used by a processing system unable to automatically handle tables of contents. Part of an appropriate command table for doing this is:

TYPE= SECTIONING
NAME= \section
  SECTIONING_LEVEL= SECT
  START_TAG= "?n?n<div.1>?n"
  END_TAG= "?n</div.1>"
  OPT_PARAM= FIRST
  PRINT_OPT= NO_PRINT
  REQPARAMS= 1
  START_TAG_1= "<heading>"
  END_TAG_1= "</heading>?n"
END_TYPE


TYPE= SECTIONING
NAME= \clause
  SECTIONING_LEVEL= SUBSECT
  START_TAG= "?n?n<div.2>?n"
  END_TAG= "?n</div.2>"
  OPT_PARAM= FIRST
  PRINT_OPT= NO_PRINT
  REQPARAMS= 1
  START_TAG_1= "<heading>"
  END_TAG_1= "</heading>?n"
END_TYPE

An example output resulting from this command table (if it had been applied to this document) is:

...
</div.2>
</div.1>


<div.1>
<heading>The command table file</heading>

By default, ...

List environment types

In LaTeX the use of the \item command is restricted to within a list environment. The typeset appearance of an \item typically depends on the particular environment in which it is used. LTX2X has a limited capability of modifying its \item tagging output. It can also provide an `end item' tag for those tagging systems that require such a thing.

For such list environments, identified by the command type keyword BEGIN_LIST_ENV, the following command lines should be included within the type specification.

START_ITEM=
Actions to be performed at the start of each \item command in the list.
END_ITEM=
Actions to be performed after processing all the \item's text.
START_ITEM_PARAM=
Actions to be performed at the start of an \item's optional argument text.
END_ITEM_PARAM=
Actions to be performed at the end of an \item's optional argument text.
As usual, an unspecified tag defaults to no actions.

For example, assume that we are not interested in tagging the end of an item, but we do want to mark each item in an itemize environment with the lowercase letter `o', each enumerate item with `(N)' and put a colon after the optional argument in a description environment. Also, each item should have some indentation from the left hand margin.

TYPE= BEGIN_LIST_ENV
NAME= itemize
  START_ITEM= "?n    o "
END_TYPE

TYPE= END_LIST_ENV
NAME= itemize
END_TYPE

TYPE= BEGIN_LIST_ENV
NAME= enumerate
  START_ITEM= "?n    (N) "
END_TYPE

TYPE= END_LIST_ENV
NAME= enumerate
END_TYPE

TYPE= BEGIN_LIST_ENV
NAME= description
  START_ITEM= "?n  "
  END_ITEM_PARAM= " : "
END_TYPE

TYPE= END_LIST_ENV
NAME= description
END_TYPE

With the above commands, this LaTeX text:

\begin{description}
\item[An example]
  \begin{itemize}
  \item the first item;
  \item the second item.
  \end{itemize}
\end{description}
will be transformed into:
  An example :
    o the first item;
    o the second item.

Character types

LaTeX treats some characters specially. These special characters are: #, $, %, &, ~, _, ^, \, {, }, and, under some circumstances, also the character @. LTX2X recognizes these special characters and, if directed, will perform specified actions; otherwise it treats them as it treats any alphanumeric character, which is just to print it.

It has already been stated that commands for the left and right braces (i.e. { and }) must be given within the command table as command types LBRACE, RBRACE respectively. The dollar symbol ($) must also be specified via the two command types BEGIN_DOLLAR and END_DOLLAR. Here is an example of replacing the dollar signs by tags intended to indicate the start and end of a mathematical phrase.

TYPE= BEGIN_DOLLAR
  START_TAG= "<math>"
END_TYPE


TYPE= END_DOLLAR
  START_TAG= "</math>"
END_TYPE

Commands for the other special LaTeX characters are specified with the TEX_CHAR command type keyword.

The characters _ (underscore) and ^ (caret) are used in LaTeX math mode to indicate subscripting and superscripting respectively. The following will replace ^ by <sup>, print the superscript text (which must be enclosed in braces (Footnote: It is good practice to always enclose superscript and subscript text in braces, even though TeX does not always require this.) ) and at the end close with </sup>.

TYPE= TEX_CHAR
NAME= ^
  START_TAG= "<sup>"
  REQPARAMS= 1
  END_TAG= "</sup>"
END_TYPE

Given the above specifications, then $(2^{15} - 1)$ will be transformed into
<math>(2<sup>15</sup> - 1)</math>.
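For readers who find it easier to follow a transformation in code, the effect of the three specifications above can be imitated in Python. This is only a sketch under simplifying assumptions: the function name tag_math is invented, dollar signs are assumed to alternate strictly between opening and closing, and the superscript text is assumed to be in braces with no nested braces.

```python
import re


def tag_math(text):
    """Replace $...$ by <math>...</math> and ^{...} by <sup>...</sup>."""
    pieces = []
    in_math = False
    for ch in text:
        if ch == "$":
            # Dollars alternate: BEGIN_DOLLAR first, then END_DOLLAR.
            pieces.append("</math>" if in_math else "<math>")
            in_math = not in_math
        else:
            pieces.append(ch)
    tagged = "".join(pieces)
    # A caret followed by a brace-free braced group becomes a <sup> element.
    return re.sub(r"\^\{([^{}]*)\}", r"<sup>\1</sup>", tagged)


print(tag_math("$(2^{15} - 1)$"))   # <math>(2<sup>15</sup> - 1)</math>
```

LTX2X itself works from the command table rather than from regular expressions, so the sketch shows the result of the translation, not the mechanism.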

Verbatim like types

The command type VCOMMAND is for the processing of LaTeX \verb-like commands where the argument of the command is to be typeset as-is. For example, there might be a command called \url which takes one argument which is meant to be an Internet URL. If the application was the conversion of a LaTeX document to HTML, then the following specification could be useful.

TYPE= VCOMMAND
NAME= \url
  REQPARAMS= 1
  PRINT_P1= TO_BUFFER 7
  START_TAG=
    RESET_BUFFER: 7
  END_TAG= "<a href=""
    SOURCE: BUFFER 7
    STRING: "">"
    SOURCE: BUFFER 7
    STRING: "</a>"
    RESET_BUFFER: 7
END_TYPE

If the LaTeX source included:

... obtainable from 
\url{http://www.cdrom.com/pub/tex}
then the resulting LTX2X output would be:
... obtainable from 
<a href="http://www.cdrom.com/pub/tex">http://www.cdrom.com/pub/tex</a>
If this were then read via an appropriate browser, a link to the URL http://www.cdrom.com/pub/tex would be automatically established.

Similarly, verbatim-like environments can also be specified with the types BEGIN_VENV and END_VENV. For example, the html.sty package defines three LaTeX environments for documents that might be converted from LaTeX tagging to HTML tagging. One of these, latexonly, is for LaTeX code that is not to occur in the HTMLed document, and another is htmlonly, which contains HTML code that is required for an HTML version of the document but which is not to appear in the LaTeXed document. The third one is rawhtml, which is for HTML code to be output verbatim to the HTML document source. These could be simulated by:

TYPE= BEGIN_VENV
NAME= latexonly
  PC_AT_START= NO_PRINT
END_TYPE

TYPE= END_VENV
NAME= latexonly
  PC_AT_END= RESET
END_TYPE

TYPE= BEGIN_ENV
NAME= htmlonly
END_TYPE

TYPE= END_ENV
NAME= htmlonly
END_TYPE

TYPE= BEGIN_VENV
NAME= rawhtml
END_TYPE

TYPE= END_VENV
NAME= rawhtml
END_TYPE

Odd command types

The majority of commands in LaTeX that take optional arguments have only a single optional argument that is either immediately after the command or after all the required arguments. There are, however, some commands that do not fit this pattern. This set of command types enables at least some of these `odd' commands to be handled.

The command type keyword is of the form COMMAND_code, where code indicates the type and ordering of the arguments. The code is composed from combinations of the letters O (for an optional argument) and P (for a required parameter (i.e., argument)). The ordering of these letters in the code specifies the type and ordering of the command's arguments.

The `odd' command types are:

COMMAND_OOP
Corresponding to a LaTeX command of the form
\com[OptParam][OptParam]{ReqParam}. For example, the \makebox command falls into this category.

COMMAND_OOOPP
Corresponding to a LaTeX command of the form
\com[OptParam][OptParam][OptParam]{ReqParam}{ReqParam}. For example, the \parbox command falls into this category.

COMMAND_OPO
Corresponding to a LaTeX command of the form
\com[OptParam]{ReqParam}[OptParam]. For example, the \RequirePackage and \LoadClass commands fall into this category.

COMMAND_POOOP
Corresponding to a LaTeX command of the form
\com{ReqParam}[OptParam][OptParam][OptParam]{ReqParam}.

COMMAND_POOP
Corresponding to a LaTeX command of the form
\com{ReqParam}[OptParam][OptParam]{ReqParam}. For example, the \newcommand and its companion commands fall into this category.

COMMAND_POOPP
Corresponding to a LaTeX command of the form
\com{ReqParam}[OptParam][OptParam]{ReqParam}{ReqParam}. For example, the \newenvironment and its companion command fall into this category.

As usual, the command name is required, as are any actions. However, it is not necessary to specify the number of required arguments (i.e. REQPARAMS=) nor the position of the optional argument (i.e. OPT_PARAM=), as LTX2X already has this information. The tag actions are according to the argument ordering given in the code and are specified by the required argument tags (e.g. START_TAG_n= and END_TAG_n=). Do not use any of the command lines for optional arguments. Argument actions are controlled in the usual manner.
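To make the meaning of the code letters concrete, here is a small Python sketch that splits the text following a command into arguments according to a code such as POOP. It is an illustration of the naming scheme only, not of how LTX2X parses its input; the function names read_group and split_args are invented, and nesting is only tracked for the delimiter pair being read.

```python
def read_group(text, pos, open_ch, close_ch):
    """Return (content, next_pos) for a balanced group at pos, else None."""
    if pos >= len(text) or text[pos] != open_ch:
        return None
    depth = 0
    for i in range(pos, len(text)):
        if text[i] == open_ch:
            depth += 1
        elif text[i] == close_ch:
            depth -= 1
            if depth == 0:
                return text[pos + 1:i], i + 1
    raise ValueError("unbalanced group")


def split_args(code, text):
    """Split text into (letter, content) pairs following the O/P code."""
    delimiters = {"P": ("{", "}"), "O": ("[", "]")}
    pos, args = 0, []
    for letter in code:
        while pos < len(text) and text[pos].isspace():
            pos += 1
        group = read_group(text, pos, *delimiters[letter])
        if group is None:
            if letter == "P":
                raise ValueError("missing required argument")
            args.append((letter, None))     # optional argument not given
        else:
            content, pos = group
            args.append((letter, content))
    return args


# \newcommand{\lx}[1][LTX2X]{The #1 program} has the pattern POOP.
print(split_args("POOP", r"{\lx}[1][LTX2X]{The #1 program}"))
```

An omitted optional argument is reported as None, so a call written as \newcommand{\lx}{LTX2X} still yields four entries in the order given by the code.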

A typical example of the use of these commands is to suppress any processing of the LaTeX \newcommand and its ilk. For example:

TYPE= COMMAND_POOP
NAME= \providecommand
  PRINT_P1= NO_OP
  PRINT_P2= NO_OP
  PRINT_P3= NO_OP
  PRINT_P4= NO_OP
END_TYPE

TYPE= COMMAND_POOPP
NAME= \renewenvironment
  PRINT_P1= NO_OP
  PRINT_P2= NO_OP
  PRINT_P3= NO_OP
  PRINT_P4= NO_OP
  PRINT_P5= NO_OP
END_TYPE

Other command types

The OTHER_ command types (OTHER_COMMAND, OTHER_BEGIN and OTHER_END) are very limited in what can be affected. Basically, these provide for default printing actions if the corresponding LaTeX command has not been identified elsewhere in the command table.

If there are no commands within the specification, the name of the command and all its arguments will be printed verbatim.

The command lines START_TAG= and END_TAG= cause the corresponding actions to be performed before and after the name of the command is printed. Any arguments are printed verbatim.

The command line PRINT_CONTROL= with a value of NO_PRINT causes the command name not to be printed, nor any arguments that LTX2X may find associated with the command.

Picture types

The _PICTURE_ command types differ from all the other types in LTX2X, just as they do in LaTeX. In LaTeX some of the picture drawing commands take arguments of the form (number, number), representing a coordinate pair, as well as the usual required arguments enclosed in curly braces and possibly an optional argument enclosed in square brackets. Within LTX2X, commands that take coordinate arguments are treated specially in the command table.

Generally speaking, the LTX2X command types are of the form PICTURE_code, where code indicates the type and ordering of the arguments. The code is composed from combinations of the letters C (for a coordinate argument), O (for an optional argument) and P (for a required argument). For example, PICTURE_PCOP indicates a picture command that has a required argument, followed by a coordinate argument, followed by an optional argument and finally another required argument.

The provided picture types are:

BEGIN_PICTURE_CC
Corresponding to a LaTeX command of the form
\begin{PictureEnv}(coords)(coords), where the final coordinate argument is optional.
PICTURE_CCPP
Corresponding to a LaTeX command of the form
\com(coords)(coords){ReqParam}{ReqParam}. For example, the \multiput command falls into this category.
PICTURE_CO
Corresponding to a LaTeX command of the form
\com(coords)[OptParam]. For example, the standard LaTeX \oval command falls into this category.
PICTURE_COP
Corresponding to a LaTeX command of the form
\com(coords)[OptParam]{ReqParam}. For example, the \makebox and \framebox commands fall into this category.
PICTURE_CP
Corresponding to a LaTeX command of the form
\com(coords){ReqParam}. For example, the \put, \line and \vector commands fall into this category.
PICTURE_OCC
Corresponding to a LaTeX command of the form
\com[OptParam](coords)(coords). For example, the \graphpaper command from the graphpap package falls into this category.
PICTURE_OCCC
Corresponding to a LaTeX command of the form
\com[OptParam](coords)(coords)(coords). For example, the \qbezier command falls into this category.
PICTURE_OCO
Corresponding to a LaTeX command of the form
\com[OptParam](coords)[OptParam]. For example, the \oval command from the pict2e package falls into this category.
PICTURE_PCOP
Corresponding to a LaTeX command of the form
\com{ReqParam}(coords)[OptParam]{ReqParam}. For example, the \dashbox and \savebox commands fall into this category.
END_PICTURE
Corresponding to a LaTeX command of the form
\end{PictureEnv}.

As usual, the command name is required, as are any actions. However, it is not necessary to specify the number of required arguments (i.e. REQPARAMS=) nor the position of the optional argument (i.e. OPT_PARAM=), as LTX2X already has this information. The tag actions are according to the argument ordering given in the code and are specified by the required argument tags (e.g. START_TAG_n= and END_TAG_n=). Do not use any of the command lines for optional arguments. Argument actions are controlled in the usual manner.

As an example, the following specifications within a command table should be sufficient to ensure that any picture commands in a source file are not passed through to the output file.

TYPE= BEGIN_PICTURE_CC
NAME= picture
PRINT_P1= NO_PRINT
PRINT_P2= NO_PRINT
END_TYPE

TYPE= PICTURE_CP
NAME= \put
PRINT_P1= NO_PRINT
PRINT_P2= NO_OP
END_TYPE

TYPE= PICTURE_CCPP
NAME= \multiput
PRINT_P1= NO_PRINT
PRINT_P2= NO_PRINT
PRINT_P3= NO_OP
PRINT_P4= NO_OP
END_TYPE

TYPE= PICTURE_PCOP
NAME= \savebox
PRINT_P1= NO_OP
PRINT_P2= NO_PRINT
PRINT_P3= NO_OP
PRINT_P4= NO_OP
END_TYPE

TYPE= PICTURE_OCC
NAME= \graphpaper
PRINT_P1= NO_OP
PRINT_P2= NO_PRINT
PRINT_P3= NO_PRINT
END_TYPE

TYPE= PICTURE_OCCC
NAME= \qbezier
PRINT_P1= NO_OP
PRINT_P2= NO_PRINT
PRINT_P3= NO_PRINT
PRINT_P4= NO_PRINT
END_TYPE

TYPE= END_PICTURE
NAME= picture
END_TYPE

NOTE 1:
The action NO_OP cannot be applied to an argument that is a coordinate pair.

NOTE 2:
As LTX2X is essentially limited to printing actions, and cannot actually process any LaTeX picture drawing commands, the suppression of picture printing is probably the most usual use of the picture commands.

Special command types

The SPECIAL_ commands, namely SPECIAL_COMMAND, SPECIAL_BEGIN_ENV, SPECIAL_BEGIN_LIST, SPECIAL_END_ENV, SPECIAL_END_LIST and SPECIAL_SECTIONING, are provided for cases where some special kind of output processing is required that is not built into LTX2X. In order to implement any commands of these types, it is necessary to modify the internals of LTX2X and recompile the source code. This is not recommended.

File inclusion

A command table file can include other command table files. In turn an included file can recursively include other command table files. The file inclusion command line is

INCLUDE= FileName
where FileName is the name of the command table file to be included. The effect is that the above line is replaced by the contents of FileName.

For example, assume that there are three command table files called respectively detex.ct, detex.l2x and detex.fl. The contents of these files are:

C=  ----------file detex.ct
...
INCLUDE= detex.l2x
...
C= ----------end of detex.ct
END_CTFILE=        end of detex.ct
and for detex.l2x as:
C= ---------- file detex.l2x

TYPE= COMMAND
NAME= \lx
  START_TAG= "LTX2X"
END_TYPE

INCLUDE= detex.fl

TYPE= COMMAND
NAME= \ctab
  START_TAG= "command table"
END_TYPE

C= ---------- end of file detex.l2x
END_CTFILE=         end of file detex.l2x
and lastly detex.fl is:
C= ---------- file detex.fl

TYPE= COMMAND
NAME= \fl
  START_TAG= "FLaTTeN"
END_TYPE

C= ---------- end file detex.fl
END_CTFILE=         end file detex.fl
Then, as far as LTX2X is concerned, the original detex.ct file is treated as though it had been written as:
C=  ----------file detex.ct
...
C= ---------- file detex.l2x

TYPE= COMMAND
NAME= \lx
  START_TAG= "LTX2X"
END_TYPE

C= ---------- file detex.fl

TYPE= COMMAND
NAME= \fl
  START_TAG= "FLaTTeN"
END_TYPE

C= ---------- end file detex.fl

TYPE= COMMAND
NAME= \ctab
  START_TAG= "command table"
END_TYPE

C= ---------- end of file detex.l2x
...
C= ----------end of detex.ct
END_CTFILE=        end of detex.ct

Note that nasty things will happen if you have a cycle of inclusions. That is, you must not have anything similar to file A including file B which includes file C which in turn includes either file A or B.
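A cycle of this kind is easy to detect before running LTX2X. The following Python sketch expands INCLUDE= lines from an in-memory dictionary of files and reports a cycle instead of looping. It is a hypothetical checker, not part of LTX2X: the function name expand and the dictionary representation are my own, and the treatment of END_CTFILE= simply mirrors the expanded listing shown above.

```python
def expand(name, tables, active=()):
    """Return the lines of a command table with its INCLUDE= lines expanded.

    tables maps a file name to its list of lines; active is the chain of
    files currently being expanded, used to detect cycles.
    """
    if name in active:
        chain = " -> ".join(active + (name,))
        raise ValueError("cyclic INCLUDE: " + chain)
    lines = []
    for line in tables[name]:
        if line.startswith("INCLUDE="):
            child = line[len("INCLUDE="):].strip()
            lines.extend(expand(child, tables, active + (name,)))
        elif line.startswith("END_CTFILE="):
            if not active:          # keep the end marker of the top file only
                lines.append(line)
            break
        else:
            lines.append(line)
    return lines


good = {
    "detex.ct": ["C= top", "INCLUDE= detex.l2x", "END_CTFILE= top"],
    "detex.l2x": ["C= middle", "INCLUDE= detex.fl", "END_CTFILE= middle"],
    "detex.fl": ["C= inner", "END_CTFILE= inner"],
}
print(expand("detex.ct", good))

bad = {"a.ct": ["INCLUDE= b.ct"], "b.ct": ["INCLUDE= a.ct"]}
try:
    expand("a.ct", bad)
except ValueError as err:
    print(err)                      # cyclic INCLUDE: a.ct -> b.ct -> a.ct
```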

Interpreter commands

LTX2X includes an interpreter for a procedural programming language that is based on the ISO international standard EXPRESS information modeling language [EBOOK, EXPRESSIS]. At the moment the programming language within LTX2X is anonymous, but for ease of reference I will call it EXPRESS-A (EXPRESS -Almost? -Approximate? -Anonymous?). The EXPRESS-A language is described later in section sec:expressa, but for now it is sufficient to know the commands that signify the start and end of this code.

The command CODE_SETUP= indicates the commencement of code to be run before any document processing occurs. The END_CODE command signifies the end of this code block. This block should be placed in the command table before any other commands except for the ESCAPE... commands, if any. This block can contain variable declarations, function and procedure declarations, and statements.

Code consisting purely of statements can be placed anywhere that a tagging action may be specified. These statements are enclosed between a CODE: and END_CODE pair of commands.

The EXPRESS-A language is described in detail in sec:expressa, but to give a flavour of it here is a simple possible application. It has been noted that LTX2X has difficulty in processing the contents of the LaTeX picture environment. The following portions of a command table write the contents of a figure environment to an external file and use the programming language to keep a count of the number of figures so processed.

c=  declare and initialise a variable
CODE_SETUP=
  LOCAL
    fignum : INTEGER;
  END_LOCAL;
  fignum := 0;
END_CODE

c= write figure contents to an external file
TYPE= BEGIN_ENV
NAME= figure
  OPT_PARAM= FIRST
  PRINT_OPT= NO_PRINT
  PC_AT_START= TO_FILE figs.tmp
  START_TAG=
    CODE:
      fignum := fignum + 1;           -- increment figure counter
      println;                        -- print a blank line
      println('%%% FIGURE ', fignum); -- write counter as a LaTeX comment
    END_CODE
    STRING: "\begin{figure}"
END_TYPE

c= close figure environment, back to normal output, and output
c= text indicating that a figure should be here
TYPE= END_ENV
NAME= figure
  PC_AT_START= RESET
  START_TAG=
    SWITCH_TO_FILE: figs.tmp
    STRING: "\end{figure}?n?n"
    SWITCH_BACK:
    CODE:
      println;
      println('PLACE FOR FIGURE ', fignum);
      println;
    END_CODE
END_TYPE

The LTX2X program

LTX2X is written using flex and bison. The resulting C code should compile on any system. More details are given later, but for the end-user the next section describes how to run the program, assuming that it is available on your system.

Running LTX2X

The syntax for running the compiled version of LTX2X is:

ltx2x [-c] [-f table-file] [-p number] [-w] [-D dir_cat_char]
      [-P path_separators] [-S]
      [-i number] [-l number] [-t] [-y number] [-C] [-E] 
      input-file output-file
where elements in square brackets are options. The options fall into two groups, one for the casual user and the other for those who may be interested in the internals of LTX2X. The first group of options includes:
-c
By default, LTX2X ignores all LaTeX comments in the input file. This option causes LTX2X to write the comments to the output file.

-f
By default, LTX2X reads the command table from a file called ltx2x.ct. If the required command table is in a file with another name this option is used to change from the default file. For example,
> ltx2x in.tex out.l2x
reads a command table from ltx2x.ct, while
> ltx2x -f detex.ct in.tex out.l2x
reads a command table from file detex.ct.

-p
This option causes LTX2X to `pretty print' the output file (as far as it is able to). The number is required and it indicates the desired maximum number of characters per output line. If this is considered to be too small, then LTX2X chooses a value. Note that pretty printing is only applied to the source file --- not to any replacement tags. That is, it only tries to format the running text from the source file.

-w
By default, LTX2X outputs source white space just as it reads it. This option causes LTX2X to collapse any amount of contiguous white space to a single space. The -p option includes the -w option.

-D
The value of this option is the character that the operating system uses to catenate directory names to form a path (see sec:search). The default value is a slash (i.e. /). The default could be changed to a backslash, for example, by -D \.

-P
The environment variable (see sec:search) contains a list of directories (also known as path names). In the operating system that I use, these are separated by the colon (:) character which, together with the semi-colon and space characters, form the LTX2X default separators. The path separator characters can be changed with this option. For example, -P : will make the separators be a colon or a space (space is automatically included in the separator list).

-S
This option enables the source level debugger (see the Source level debugger section) for any embedded EXPRESS-A code.

The second group of options is principally for those who might be extending the LTX2X system.

-i
This produces information that may be useful for debugging the EXPRESS-A interpreter. The option takes a number, which is an integer between 1 and 9 inclusive. The greater the number, the more diagnostics are generated.

-l
This produces information that may be useful for debugging the LTX2X program. The option takes a number, which is an integer between 1 and 9 inclusive. The greater the number, the more diagnostics are generated.

-t
This generates diagnostics related to the processing of the command table file.

-y
Like the -l option, but produces diagnostic information from the parser (this is actually a null option, but may be useful in the future).

-C
Disable any interpreter debugging information during the code generation pass. This is not necessary unless the -i option is used.

-E
Disable any interpreter debugging information during the code execution processing. This is not necessary unless the -i option is used.

LTX2X first reads the specified command table file, together with any included files, looking first in the current directory, then in the directories specified by the LTX2XTABLES environment variable (if it exists). It then reads the input-file from the current directory, performs the actions specified in the command table and outputs the results to the output-file.

Three other files are also generated.

When LTX2X is running normally it prints out a counter to the terminal indicating how many hundreds of input source file lines it has processed. Lack of such output is an indication that the program may be in a loop and chewing up CPU cycles to no avail. In this case, stop the program and examine the output for indications of where the trouble is occurring.

A limited number of errors are allowed when processing the command table and the input LaTeX file before LTX2X gives up and quits. In particular, if it is reading a command table file that includes another file, say one called zilch, that it cannot read, it prints the following message to the user's terminal.

Can't open file zilch
Enter new file name, or I to ignore, or Q to quit
: 
A Q (or q) response stops LTX2X from any further processing. An I (or i) response causes LTX2X to stop looking for the included file and continue processing the current file. Any other response is taken to be the name of an included file, which LTX2X then tries to read. If it fails, then the above message is repeated. The user is given a limited number of opportunities to identify a readable file before LTX2X quits altogether with this message:
Last attempt. Can't open file zilch. I'm giving up.

Regarding performance, the time taken by LTX2X to process a document does not appear to be significantly different from the time to LaTeX the same document.

Directory searching

The program employs a search algorithm to find files that may not be in the current directory. It first looks in the current directory and if a file of the given name is found, then that is used. If the file is not found, then it searches for it among directories that are specified in a system environment variable. This variable specifies a list of pathnames, where the directories forming the path are combined using a catenation character. For example, dir1/dir2/dir3 could be a pathname, where the slash (/) is the catenation character. If it is looking for file afile.txt it will catenate the file name to the path name (e.g. dir1/dir2/dir3/afile.txt) and look for that. The pathnames in the list are separated by another character (in fact it can be one from a list of characters). For example, here is a list of two pathnames: dir1/dir2;dir1/dir4, where the semi-colon (;) is the pathname separator.

By default, the program uses a slash (/) as the directory catenation character and the pathname separators can be a space, or a colon or a semi-colon (i.e., any of :;). All these characters can be altered via the program command line options, and should be set to match the conventions of your operating system.

The environment variable used by the program is LTX2XTABLES. On the operating system that I use, I set this in my login file like:

setenv LTX2XTABLES .:/dir1/dir2:/dir3/dir4
Your system may have different conventions. Note that if the environment variable is not set, only files in the current directory are considered.
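
The search just described can be sketched in a few lines of Python. This is only an illustration of the described behaviour, not the C code inside LTX2X; the function names candidate_paths and find_file are hypothetical, and the sketch assumes that the catenation character is a single character and that empty entries in the list are simply skipped.

```python
import os


def candidate_paths(filename, env_value, catenation="/", separators=":; "):
    """List the places to look for filename, current directory first."""
    candidates = [filename]
    if env_value:
        # Split the variable's value on any of the separator characters.
        directories = [env_value]
        for sep in separators:
            directories = [part for d in directories for part in d.split(sep)]
        for directory in directories:
            if directory:  # assumption: empty entries (e.g. from "::") are skipped
                candidates.append(directory.rstrip(catenation) + catenation + filename)
    return candidates


def find_file(filename, env_name="LTX2XTABLES", catenation="/", separators=":; "):
    """Return the first candidate that exists as a file, or None."""
    value = os.environ.get(env_name)
    for path in candidate_paths(filename, value, catenation, separators):
        if os.path.isfile(path):
            return path
    return None
```

With the setenv example above, candidate_paths('afile.txt', '.:/dir1/dir2:/dir3/dir4') lists afile.txt, ./afile.txt, /dir1/dir2/afile.txt and /dir3/dir4/afile.txt, in that order. The catenation and separators parameters play the roles of the -D and -P options.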

System components

The system consists of five main components --- a lexer, a parser, a library of support functions and command table parsing code, a user-defined library of functions, and an interpreter for the EXPRESS-A language.

The lexer

The lexer is generated by flex [LEVINE92] (a more functional version of lex [LESK75]). The source for the lexer is in file l2x.l. Its principal function is to read a LaTeX source file and recognize LaTeX commands. In general, it passes off the relevant command tokens to the parser for performing appropriate actions.

However, the lexer does do some processing of the source itself.

The lexer is designed to recognize four kinds of LaTeX commands.

When it finds a command, it looks up the command or environment name in the command table and sends the appropriate token and its command table location to the parser. As a special case the contents of verbatim-like environments and the argument of verb-like commands are processed within the lexer and not sent to the parser.

The parser

The parser is generated by bison [LEVINE92] (a more powerful version of YACC [JOHNSON75]). The source for the parser is in file l2x.y. Essentially it defines a very simple grammar for a LaTeX document. That is, the grammar is limited to generic kinds of commands and command arguments. It does not understand the `meaning' of any of the commands or arguments.

When the parser receives a token from the lexer it tries to match it with one of the grammar rules, performing the actions specified by the command table. Here is an extract from the parser grammar file, l2x.y, for a LaTeX command that has two required arguments followed by an optional argument.

l2xComm2Opt: COMMAND_2_OPT
        {
          start_with_req($1);
        }
        ReqParam
        {
          action_p_p1($1,1);
        }
        ReqParam
        {
          action_p_opt($1,2);
        }
        OptParam
        {
          action_last_opt($1);
        }
        ;

The actions are enclosed in braces, and are interspersed with the elements of the grammar.

The token COMMAND_2_OPT indicates that the lexer has found a command that takes two required arguments followed by an optional argument. The parser then performs some actions. The start_with_req function is the standard LTX2X function for the first action in a command production where the final argument is optional. The $1 refers to the location of the particular command in the command table, and its value is passed to the parser by the lexer.

The parser then expects a left brace (i.e. {, token LBRACE) as the start of the required argument, followed by the text of the argument and finished off by a right brace (i.e. }, token RBRACE); the grammar for all of this is specified in the production called ReqParam. If it finds these it performs some further actions, otherwise it reports an error. In this case the action is defined by the function action_p_p1, which is the standard action performed between two required arguments (the second argument in the function call specifies the Pth argument that has been recognized). Another required argument is then expected. In this case the action is defined by the function action_p_opt, which is the standard action performed between the end of the Pth required argument and the start of an optional argument. It then looks for an optional argument, the grammar for which is specified in the production called OptParam. The final action is specified by the standard function action_last_opt for finishing off a command that ends with an optional argument.

The grammar for a command that has two required arguments, and possibly an initial optional argument, is similar:

l2xComm2: COMMAND_2
        {
          start_with_opt($1);
        }
        OptParam
        {
          action_opt_first($1);
        }
        ReqParam
        {
          action_p_p1($1,1);
        }
        ReqParam
        {
          action_last_p($1,2);
        }
        ;

The support libraries

Source code for the C main program and support functions is in file l2xlib.c. The main program is responsible for reading in the command table and calling the lexer and parser to do the appropriate processing. The file also contains a variety of support functions that are, or could be, used in the lexer, parser, action library, or user-defined library.

The standard actions for the grammar are contained in file l2xacts.c.

The user-defined library

The intent of this library is that masochistic users can define their own functions for use within LTX2X when processing their SPECIAL_ commands, without having to modify the LTX2X support or action libraries.

Source code for the user-defined library should be maintained in a file called l2xusrlb.c and a corresponding header file called l2xusrlb.h.

The EXPRESS-A interpreter

The EXPRESS-A interpreter is based on algorithms originally developed by Ronald Mak [MAKR91] for interpreting Pascal. His original algorithms have been modified and extended to cater for EXPRESS-A. The interpreter module has a minimal interface with the rest of the LTX2X system, and could easily be modified to be a stand-alone program (in fact it started that way in the first place). The interface between LTX2X and the interpreter is confined to the small l2xistup.c file.

The EXPRESS-A programming language

EXPRESS is a language for information modeling and includes both declarative and procedural aspects [EBOOK]. There are also two other companion languages called respectively EXPRESS-G and EXPRESS-I. The former of these is a graphical form of the declarative aspects of EXPRESS, and the latter is an instantiation and test case specification language. These languages are either ISO international standards [EXPRESSIS] or on the way to becoming so [EXPRESSITR].

Certain of the procedural aspects of EXPRESS and EXPRESS-I are relevant to the LTX2X concepts and so, together with some other reasons, it seemed appropriate to provide an interpreter for a similar language for use within LTX2X. EXPRESS-A provides a major subset of the EXPRESS procedural language, together with some Pascal-like additions for input and output. Of particular note, strings are a built-in type in EXPRESS-A. The language also supports three-valued logic and the concept of an `indeterminate' value of any type.

Earlier I gave an example command table to replace the text of a LaTeX document with the words `Goodbye document'. Here is an EXPRESS-A program that outputs `Goodbye document'.

println('Goodbye document');
END_CODE

The following gives a brief overview of EXPRESS-A. For more details consult Schenck & Wilson [EBOOK].

Basic elements

EXPRESS-A is a case-insensitive language and uses the ASCII character set. Two kinds of comments are supported --- an end of line comment, which starts with a -- pair and continues until the end of the current line --- and an extended comment. An extended comment starts with a (* pair and is ended by a matching *) pair; extended comments may be nested.
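
As an illustration of the two comment styles, here is a small Python sketch that removes them from a piece of EXPRESS-A text, keeping a depth counter so that nested extended comments are handled. It is not the interpreter's scanner; the name strip_comments is hypothetical, and for brevity the sketch assumes that no string literal in the text contains a -- or (* pair.

```python
def strip_comments(source):
    """Remove -- end-of-line comments and nested (* ... *) comments."""
    out = []
    depth = 0  # current nesting depth of extended comments
    i = 0
    while i < len(source):
        pair = source[i:i + 2]
        if pair == "(*":
            depth += 1
            i += 2
        elif pair == "*)" and depth > 0:
            depth -= 1
            i += 2
        elif depth > 0:
            i += 1  # inside an extended comment: discard the character
        elif pair == "--":
            end = source.find("\n", i)  # skip up to, but keep, the newline
            i = len(source) if end == -1 else end
        else:
            out.append(source[i])
            i += 1
    return "".join(out)
```

For example, strip_comments('x (* outer (* inner *) still outer *) y') leaves only the x and the y (with the two surrounding spaces), because the first *) closes the inner comment and not the outer one.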

The language contains many reserved words, some of which are only applicable to the EXPRESS and EXPRESS-I languages.

Identifiers are composed of an initial letter, possibly followed by any number of letters, digits, and the underscore character.

Literals are self defining constant values. An integer literal consists of one or more digits, the first of which shall not be zero. Real numbers start with one or more digits, followed by a decimal point. Further digits may occur after the point, and finally there may be an exponent in the `e' notation format (e.g., 123.456e-78).
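
The rules for numeric literals can be written as regular expressions. The Python sketch below is my reading of the rules, not the interpreter's lexer, and the names INTEGER, REAL and classify are hypothetical. One assumption is flagged in the code: read literally, the integer rule excludes a lone 0, yet the examples elsewhere in this document use 0, so the sketch accepts it as a special case. The exponent letter is matched in either case because the language is case-insensitive.

```python
import re

# Assumption: a lone 0 is accepted, although the stated rule forbids a leading zero.
INTEGER = re.compile(r"0|[1-9][0-9]*")
# Digits, a decimal point, optional further digits, optional exponent.
REAL = re.compile(r"[0-9]+\.[0-9]*([eE][+-]?[0-9]+)?")


def classify(text):
    """Return 'REAL', 'INTEGER' or None for a candidate numeric literal."""
    if REAL.fullmatch(text):
        return "REAL"
    if INTEGER.fullmatch(text):
        return "INTEGER"
    return None
```

Under these patterns 123.456e-78 and 27. are reals and 42 is an integer, while 007, .5 and 1e5 are rejected (the last two because a real must start with digits and must contain a decimal point).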

A string literal is any sequence of characters enclosed by single quote marks. If a single quote mark is meant to form part of the string, two quote marks must be used at that point.
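
The quote-doubling rule is easy to get wrong by hand, so here is a Python sketch of converting between a string value and its literal form. The function names are hypothetical and the sketch only illustrates the rule stated above.

```python
def encode_string_literal(value):
    """Write a string value as a literal, doubling any embedded quote marks."""
    return "'" + value.replace("'", "''") + "'"


def decode_string_literal(literal):
    """Recover the string value from a literal such as 'it''s'."""
    if len(literal) < 2 or literal[0] != "'" or literal[-1] != "'":
        raise ValueError("not a quoted string literal")
    body = literal[1:-1]
    # Every quote mark inside the body must be half of a doubled pair.
    if "'" in body.replace("''", ""):
        raise ValueError("unpaired quote mark inside the literal")
    return body.replace("''", "'")
```

For instance, the value it's is written as the literal 'it''s', and decoding that literal gives back it's.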

A logical literal consists of one of these keywords: FALSE, UNKNOWN or TRUE.

EXPRESS-A also includes some other constants. PI stands for the value of the mathematical constant π (3.1415...), and CONST_E stands for the value of the mathematical constant e (2.7182...), the base of natural logarithms. The special token ? stands for an indeterminate value of any type. The three constants THE_DAY, THE_MONTH and THE_YEAR are integer values for the current date holding the day of the month (1--31), the month of the year (1--12) and the year (four digits), respectively.

Data types

EXPRESS-A is a typed language. The simple data types are: INTEGER, REAL, STRING and LOGICAL.

The aggregation data types are ARRAY, BAG, LIST, and SET. The array data type is of a fixed size and must have declared lower and upper bounds (index range), such as ARRAY [-7:10] OF. The other aggregate data types are dynamic in size, but may have lower and upper bounds specified for the number of elements, such as SET [2:5] OF, meaning a set that should have between two and five members. For the dynamic aggregates the upper bound may be given as ?, which means an unlimited upper bound, such as LIST [2:?] OF. If a bound specification is absent, then the dynamic aggregate can hold from zero to any number of elements. (Footnote: The dynamic aggregates may not be fully implemented due to lack of time.)

Aggregates are one dimensional, but can be chained together for multi-dimensional aggregates, like

ARRAY [1:4] OF LIST OF INTEGER;

The enumeration data type is a parenthesised comma separated list of identifiers. These identifiers represent the values of the enumerated type; for instance

ENUMERATION OF (red, green, blue)

A defined data type is one declared and named by the user using the TYPE and END_TYPE construct. For example

TYPE length = REAL; END_TYPE;
TYPE crowd_size = INTEGER; END_TYPE;
TYPE signal_colour = ENUMERATION OF (red, amber, green); END_TYPE;

An entity data type consists of a list of attributes and their types, enclosed in an ENTITY and END_ENTITY pair. An entity type is named.

ENTITY an_ent;
  auditorium_width : length;
  audience         : crowd_size;
  title            : STRING;
  profit           : REAL;
END_ENTITY;

EXPRESS-A provides for algorithms in the form of functions and procedures.

A FUNCTION is an algorithm that operates on parameters and returns a single resultant value of a specified data type. An invocation of a function in an expression evaluates to the resultant value at the point of invocation. For example:

FUNCTION func (par1 : INTEGER; par2 : STRING) : STRING;
  LOCAL
    str : STRING;
    -- other variable declarations
  END_LOCAL;
  -- the algorithm statements are here
  RETURN(str);
END_FUNCTION;
Note that the parameters are typed.

A PROCEDURE is an algorithm that receives parameters from the point of invocation and operates on them in some manner. Changes to the parameters within the procedure are only reflected to the point of invocation when the formal parameter is preceded by the keyword VAR. For example:

PROCEDURE proc (par1 : INTEGER; VAR par2 : STRING);
  -- local declarations and the algorithm statements
END_PROCEDURE;
Note that the parameters are typed. In this case the value of par2 may be changed.

Variables are declared in a local block, enclosed by the keywords LOCAL and END_LOCAL. A variable declaration consists of an identifier and its type, such as:

LOCAL
  str    : STRING;
  e1, e2 : an_ent;     -- e1 and e2 are both of type an_ent
  e3     : an_ent;     -- so is e3
  num    : INTEGER;
  col    : signal_colour;
  matrix : ARRAY [1:15] OF ARRAY [1:15] OF REAL;
END_LOCAL;

The above declarations must be in the following order:

  1. ENTITY and/or TYPE declarations
  2. FUNCTION and/or PROCEDURE declarations
  3. a LOCAL declaration block

After the above can come any number of statements.

Statements

EXPRESS-A supports the following statements:

All the above statements are completed by a ; (semicolon). The null statement just consists of a semicolon.

The assignment statement is used to assign an instance to a local variable or parameter. The data types must be compatible.

LOCAL
  a, b, c : REAL;
END_LOCAL;
...
  a := 2.3E-6;
  b := a;
  a := -27.0;
  c := 33.3*b;

The call statement invokes a procedure or a function. The actual parameters provided with the call must agree in number, order and type with the formal parameters specified in the procedure or function declaration. The supplied parameter values must be assignment compatible with the formal parameters. This is an example of calling the EXPRESS-A defined INSERT procedure which takes three parameters:

INSERT(my_list, list_element, 0);

The compound statement consists of one or more statements enclosed between a BEGIN and END pair. The enclosed statements are treated as a single statement.

...
  BEGIN
    a := 2.3e-7;
    b := a;
    c := b*33.3;
  END;

The case statement is a means of selectively executing statements based on the value of an expression.

LOCAL
  a : INTEGER;
  x, y : REAL;
END_LOCAL;
...
  a := 2;
  x := 21.9;
  CASE 2*a OF
    1         : x := SIN(x);
    2         : x := SQRT(x);
    3         : x := LOG(x);
    4         : x := COS(x);  -- this is executed
    5, 6      : y := y**x;
    OTHERWISE : x := 0.0;
  END_CASE;
The integer expression following the CASE keyword is evaluated. The result is compared to the values of the case labels and the statement following the first matching label is executed. Execution then continues at the statement following the END_CASE;. If no label matches, then no statements within the case block are executed, except if an OTHERWISE label is included, which will match anything. All other labels are examined before looking for the OTHERWISE.
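
The selection rule can be modelled in Python as follows. This is a sketch of the rule as stated, not of the interpreter; select_branch and BRANCHES are hypothetical names, and each branch simply carries the text of its statement so that the choice can be inspected.

```python
def select_branch(selector, branches, otherwise=None):
    """Return the statement of the first branch whose labels include selector."""
    for labels, statement in branches:
        if selector in labels:
            return statement
    # OTHERWISE is consulted only after every label has failed to match;
    # with no OTHERWISE, None means that nothing in the block is executed.
    return otherwise


# The branches of the CASE example, in source order.
BRANCHES = [
    ((1,), "x := SIN(x)"),
    ((2,), "x := SQRT(x)"),
    ((3,), "x := LOG(x)"),
    ((4,), "x := COS(x)"),
    ((5, 6), "y := y**x"),
]
```

With a equal to 2, select_branch(2*a, BRANCHES, 'x := 0.0') picks the COS branch, as in the example; a selector of 9 falls through to the OTHERWISE statement.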

The if ... then ... else statement allows the conditional execution of statements depending on the value of a LOGICAL expression. When the expression evaluates to TRUE the statement(s) following the THEN are executed, after which control passes to the statement following the closing END_IF. When the logical expression evaluates to FALSE or UNKNOWN the THEN statements are jumped over and execution starts at the statement(s) following the ELSE keyword if present, or at the statement following the END_IF keyword.

IF a > 20 THEN
  b := a + 2;
  c := c - 1;
ELSE
  IF a > 10 THEN
    b := a + 1;
  ELSE
    c := c + 1;
  END_IF;
END_IF;

The repeat statement is used to control the conditional repetition of a series of statements. The control conditions are:

REPEAT i := 100 TO 0 BY -7 WHILE r >= 0.0 UNTIL err < 1.0e-8;
  ...
  r := ...;
  err := ...;
END_REPEAT;
At entry to the REPEAT statement the iteration variable is initialized to the first bound. If the variable is greater than the TO bound and the increment is positive, or the variable is less than the TO bound and the increment is negative, processing jumps to after the END_REPEAT; otherwise processing continues. The WHILE condition is checked and if TRUE then the statements in the body are executed; if it is not TRUE the loop ends. After the body has been executed the UNTIL condition is checked. If this is not TRUE then processing continues by incrementing the iteration variable by either unity or by the BY value if present. The whole process then starts again with the checking of the iteration variable against the TO bound.
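
The order of the checks may be easier to follow as code. The Python generator below is a sketch of the control flow only: the loop is left once the iteration variable has passed the TO bound in the direction of travel, the WHILE test comes before the body, and the UNTIL test comes after it. The name repeat and its parameters are hypothetical, and unlike the real statement the sketch always requires the increment control.

```python
def repeat(start, stop, step=1, while_cond=lambda i: True, until_cond=lambda i: False):
    """Yield the iteration variable for each pass in which the body runs."""
    i = start
    while True:
        # Bound test: stop once the variable has passed the TO bound.
        if (step > 0 and i > stop) or (step < 0 and i < stop):
            break
        if not while_cond(i):  # WHILE is tested before the body
            break
        yield i                # the body executes here
        if until_cond(i):      # UNTIL is tested after the body
            break
        i += step              # BY value, or unity by default
```

For example, list(repeat(100, 0, -7)) runs the body for 100, 93, and so on down to 2, while list(repeat(1, 10, until_cond=lambda i: i == 3)) runs it for 1, 2 and 3.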

All three types of controls are optional. If none are given then the REPEAT statement will loop for ever. The escape statement causes an immediate transfer out of the REPEAT statement in which it occurs. The skip statement causes a jump to the end of the REPEAT statement in which it occurs (i.e., to the point where the UNTIL expression is tested).

REPEAT UNTIL (a = 1);
  ...
  IF a = 0 THEN 
    ESCAPE;
  END_IF;
  ...
  IF a > 10 THEN
    SKIP;
  END_IF;
  ...
  ...
-- SKIP transfers control to here
END_REPEAT;
 -- ESCAPE transfers control to here

The return statement terminates the execution of a FUNCTION or PROCEDURE. The RETURN statement within a function must specify an expression, the value of which is the value returned by the function. A RETURN in a procedure must not specify an expression.

RETURN(a <> b);  -- example for within a function
RETURN;          -- example for within a procedure

Expressions

Expressions are combinations of operators, operands and function calls which are evaluated to produce a value. The simplest expression is either a literal value or the name of a variable.

Arithmetic operators

The arithmetic operators act on number values and produce a number result. If any operand is indeterminate (i.e., ?) then the result is also indeterminate. The operators are:

Unary
The operators + and -, the latter of which negates its following operand.
Binary
Addition (+), subtraction (-), multiplication (*), real division (/), exponentiation (**), integer division (DIV), and modulo (MOD).

Relational operators

The result of a relational expression is a LOGICAL value. If either operand is indeterminate, the expression evaluates to UNKNOWN.

Value comparison
Equal (=), not equal (<>), greater than (>), less than (<), greater than or equal (>=), and less than or equal (<=).

Membership
The IN operator tests an item for membership in a dynamic aggregate (e.g., IF fred IN mylist THEN ...).

Matching
The LIKE operator compares a string against a pattern, evaluating to TRUE if they match. The pattern characters are:

Some examples:

Logical operators

The logical operators produce a logical result. Except for the NOT operator which takes one logical operand (e.g., NOT op), they take two logical operands (e.g., op1 XOR op2).

The evaluation of the NOT operator is given in the table below.

Table: The NOT logical operator

Operand value   Result value
TRUE            FALSE
UNKNOWN         UNKNOWN
FALSE           TRUE

The evaluation of the AND, OR and XOR operators is given in the table below.

Table: The AND, OR and XOR logical operators

Op1       Op2       Op1 AND Op2   Op1 OR Op2   Op1 XOR Op2
TRUE      TRUE      TRUE          TRUE         FALSE
TRUE      UNKNOWN   UNKNOWN       TRUE         UNKNOWN
TRUE      FALSE     FALSE         TRUE         TRUE
UNKNOWN   TRUE      UNKNOWN       TRUE         UNKNOWN
UNKNOWN   UNKNOWN   UNKNOWN       UNKNOWN      UNKNOWN
UNKNOWN   FALSE     FALSE         UNKNOWN      UNKNOWN
FALSE     TRUE      FALSE         TRUE         TRUE
FALSE     UNKNOWN   FALSE         UNKNOWN      UNKNOWN
FALSE     FALSE     FALSE         FALSE        FALSE
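
The tables can be reproduced by a few short rules. The following Python sketch models the three logical values as True, None (for UNKNOWN) and False; that representation and the function names are my own choices for illustration and say nothing about how the interpreter stores logicals.

```python
TRUE, UNKNOWN, FALSE = True, None, False


def l_not(a):
    """NOT: UNKNOWN stays UNKNOWN, otherwise the value is inverted."""
    return UNKNOWN if a is UNKNOWN else (not a)


def l_and(a, b):
    """AND: any FALSE wins, then any UNKNOWN, otherwise TRUE."""
    if a is FALSE or b is FALSE:
        return FALSE
    if a is UNKNOWN or b is UNKNOWN:
        return UNKNOWN
    return TRUE


def l_or(a, b):
    """OR: any TRUE wins, then any UNKNOWN, otherwise FALSE."""
    if a is TRUE or b is TRUE:
        return TRUE
    if a is UNKNOWN or b is UNKNOWN:
        return UNKNOWN
    return FALSE


def l_xor(a, b):
    """XOR: UNKNOWN if either operand is UNKNOWN, else TRUE when they differ."""
    if a is UNKNOWN or b is UNKNOWN:
        return UNKNOWN
    return a is not b
```

Note the asymmetry that the tables show: UNKNOWN AND FALSE is FALSE and UNKNOWN OR TRUE is TRUE, because in those cases the other operand alone settles the result.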

Miscellaneous

Function call
A function may be called without the result necessarily being assigned to a variable. If fun is a function with two arguments (for simplicity, integer arguments) and returning a logical value, then
log := fun(i1, i2);
fun(i3, 24*i4);
are both legitimate calls.

Dot operator
The dot operator is used to access an attribute from an entity. If ent is an ENTITY type with an attribute attr, then ent.attr evaluates to the value of the attr attribute within the ent.

String operators

The + operator takes two strings as its operands and evaluates to the string that is the concatenation of its operands. For example:
str1 := 'string1';
str2 := 'string2';
str1 := str1 + str2;
-- str1 = 'string1string2'   is TRUE

The substring operator [i1:i2] is a postfix operator that, when applied to a string, evaluates to the string whose characters are composed of the i1'th through the i2'th characters, inclusively, of its operand. Note that i2 must be greater than or equal to i1, and both must be within the limits of the number of characters in the string. For example:

str1 := 'string';
str2 := str1[2:4];
str1 := str1 + str2;
-- str1 = 'stringtri'   is TRUE
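
Because the substring bounds count from one and include both ends, they differ from the slicing found in many other languages. The Python sketch below shows the translation; the function name substring is hypothetical.

```python
def substring(s, i1, i2):
    """Characters i1 through i2 of s, counting from 1 and including both ends."""
    if not (1 <= i1 <= i2 <= len(s)):
        raise IndexError("substring bounds must satisfy 1 <= i1 <= i2 <= length")
    return s[i1 - 1:i2]
```

Here substring('string', 2, 4) is 'tri', so concatenating it onto 'string' gives 'stringtri'.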

Aggregate operators

The index operator [i] is a postfix operator that can be applied to an aggregate operand; the expression evaluates to the value of the aggregate at the index position. For example, if lagg is a list of integers:
insert(lagg, 20, 0);
insert(lagg, 40, 0);
insert(lagg, 60, 0);
insert(lagg, 80, 0);
-- lagg[2] = 60    is TRUE
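
The result may look surprising until one notices that each call inserts at the front of the list. The Python sketch below models this under the assumption, consistent with the example but not spelled out here, that INSERT places the new element after the given position, with position 0 meaning the front, and that the index operator counts from one. The function names are hypothetical.

```python
def express_insert(agg, element, position):
    """Insert element after the given 1-based position; 0 means the front."""
    if not 0 <= position <= len(agg):
        raise IndexError("position is outside the list")
    agg.insert(position, element)


def index(agg, i):
    """The index operator: element i of agg, counting from 1."""
    if not 1 <= i <= len(agg):
        raise IndexError("index is outside the aggregate")
    return agg[i - 1]


lagg = []
for value in (20, 40, 60, 80):
    express_insert(lagg, value, 0)
# lagg now holds 80, 60, 40, 20, so its second element is 60.
```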

Interval expression

An interval expression is a LOGICAL expression consisting of three operands and two operators. It has the form:
{ low op1 test op2 high }
where op1 and op2 are either of the two relational operators < or <=, and low, test and high are expressions of the same type. The interval expression is equivalent to:
((low op1 test) AND (test op2 high))
The value of the interval expression is given by
  1. If any operand is indeterminate, then it evaluates to UNKNOWN.
  2. If either of the logical relationships evaluates to FALSE, then it evaluates to FALSE.
  3. If both logical relationships evaluate to TRUE, then it evaluates to TRUE.
  4. Otherwise it evaluates to UNKNOWN.
For example:
i := 10;
{1 <= i < 20}  -- is TRUE
{1 <= i < 10}  -- is FALSE
i := ?;
{1 <= i < 10}  -- is UNKNOWN
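
These rules can be sketched in Python, again using None for an indeterminate operand and for the UNKNOWN result. The function name interval is hypothetical. With determinate operands each comparison is either TRUE or FALSE, so rule 4 never arises in this simplified model.

```python
import operator

UNKNOWN = None
_OPS = {"<": operator.lt, "<=": operator.le}


def interval(low, op1, test, op2, high):
    """Evaluate { low op1 test op2 high } with three-valued logic."""
    if low is None or test is None or high is None:
        return UNKNOWN  # rule 1: an indeterminate operand
    # Rules 2 and 3: FALSE if either comparison fails, TRUE if both hold.
    return _OPS[op1](low, test) and _OPS[op2](test, high)
```

The three cases of the example correspond to interval(1, '<=', 10, '<', 20), interval(1, '<=', 10, '<', 10) and interval(1, '<=', None, '<', 10).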

Built in procedures and functions

Procedures

The following procedures are an integral part of EXPRESS-A. They are shown as signatures to indicate the data types of the formal parameters. For convenience, GENERIC is used to indicate any type.

Functions

The following functions are supplied as part of EXPRESS-A. They are exhibited as signatures to show the formal parameters. For convenience, NUMBER is being used to denote either an INTEGER or a REAL number.

Source level debugger

The EXPRESS-A interpreter includes a source level debugger for use when your code appears to be misbehaving. When in operation the debugger will prompt for a command to be entered. It understands the following commands.

Example EXPRESS-A code

The following demonstrates most of the functionality of EXPRESS-A. Most of this is not particularly interesting, except possibly for the algorithms for calculating the date of Easter and for generating magic squares.

      c=        fun.ct  Test of CODE ltx2x

CODE_SETUP=
  ENTITY ent;
    attr1, attr3 : INTEGER;
    attr2 : STRING;
  END_ENTITY;

  TYPE joe = INTEGER;
  END_TYPE;

  TYPE colour = ENUMERATION OF (red, blue, green);
  END_TYPE;


PROCEDURE easter;
(* calculates the date of Easter for the present year 
   The algorithm can be applied to any year between 
   1900 and 2099 inclusive, but if so, then the year
   should be checked to ensure that it is within this range. *)
  LOCAL
    n, a, b, m, q, w : INTEGER;
    day : INTEGER;
    month : STRING;
  END_LOCAL;

  n := THE_YEAR - 1900;
  a := n MOD 19;
  b := (7*a + 1) DIV 19;
  m := (11*a + 4 - b) MOD 29;
  q := n DIV 4;
  w := (n + q + 31 - m) MOD 7;
  day := 25 - m - w;
  month := 'April';
  IF (day < 1) THEN
    month := 'March';
    day := day + 31;
  END_IF;
  writeln('In ', THE_YEAR:1, ' Easter is on ', month,  day:3);
END_PROCEDURE;


FUNCTION magic_square(order:INTEGER): LOGICAL;
(* calculates magic squares from order 1 through 15.
   The order must be an odd number. *)
  LOCAL
  row, col, num : INTEGER;
  sqr_order : INTEGER;
  magic : ARRAY[1:15] OF ARRAY[1:15] OF INTEGER;
  END_LOCAL;

  IF (order > 15) THEN  -- only squares up to order 15
    RETURN(FALSE);
  ELSE
    IF (order < 1) THEN -- squares have at least one entry
      RETURN(FALSE);
    ELSE
      IF (NOT ODD(order)) THEN -- squares are odd
        RETURN(FALSE);
      END_IF;
    END_IF;
  END_IF;

  sqr_order := order**2;
  row := 1;
  col := (order + 1) DIV 2;
  REPEAT num := 1 TO sqr_order;
    magic[row][col] := num;
    IF ((num MOD order) <> 0) THEN
      IF (row = 1) THEN row := order; ELSE row := row - 1; END_IF;
      IF (col = order) THEN col := 1; ELSE col := col + 1; END_IF;
    ELSE
      IF (num <> sqr_order) THEN row := row +