The DOX Text Processor

User Manual

Version 0.1

© 2006 Emanuele Altieri <emaalt@users.sourceforge.net>

Table of Contents:

1  Introduction

1.1  What is DOX?

1.2  Output Formats

1.3  System Requirements

1.4  License Information

2  The Document

2.1  Overview

2.2  Configuration Parameters

2.3  Title of the Document

2.4  Table of Contents

2.5  Sections

2.5.1  Paragraphs

2.5.2  Lists

2.5.3  Figures

2.5.4  Tables

2.5.5  Preformatted Blocks

2.6  Formatting Words or Sentences

2.7  Labels and References

2.7.1  Labels

2.7.2  References

2.8  Comments

1  Introduction

1.1  What is DOX?

DOX is a document processing system inspired by TeX, but much more lightweight and simpler to use. As with TeX, documents are written in plain text. However, DOX focuses on keeping the source text as simple as possible. This is achieved by replacing obscure formatting symbols and constructs with more "natural" alternatives.

For instance, paragraphs do not need to be enclosed in any kind of construct. Paragraphs are simply blocks of text separated by empty lines:

Linux is a computer operating system and its kernel. It is one of the most
prominent examples of free software and of open-source development; unlike
proprietary operating systems such as Windows, all of its underlying source
code is available to the public for anyone to freely use, modify, improve, and
redistribute.

In the narrowest sense, the term Linux refers to the Linux kernel, but it is
commonly used to describe entire Unix-like operating systems (also known as
GNU/Linux) that are based on the Linux kernel combined with libraries and...

The instructions needed to format a document are not included in the source file. Doing so would make the text much harder to read. Instead, they are specified in an external entity. For instance, if the selected output format is HTML, formatting is done entirely through cascading style sheets (CSS).

Another example illustrating the simplicity of DOX is the list construct. Again, DOX uses a "natural" representation for lists:

This is the paragraph preceding the list.

   - This is the first item in the list.
   
   - This is the second item in the list.
   
     -- This is a subitem of the previous point.
   
     -- Another subitem.
   
   - Back to the original level.

This is the paragraph following the list.

List items are separated by empty lines. The white space preceding the "-" symbols is optional and helps make the document clearer.

Sections in a document must begin with one or more "=" symbols, followed by the title of the section:

= History                                                          [<- history]

== Licensing

The Linux kernel, along with most of the GNU components, is licensed under the
GNU General Public License (GPL). The GPL requires that all source code
modifications and derived works also be licensed under the GPL, and is
sometimes referred to as...

== Pronunciation

Linux is most commonly pronounced either to rhyme with minix, or to sound like
lie-nix. The first pronunciation is considered more correct, while the second
has become popular for sounding more natural in English...

Notice that a label has been defined for the History section. A label is defined by simply embedding a [<-LabelName] token anywhere in the text of the element. When referencing a label, the direction of the arrow is reversed (->), as shown below:

The history of the Linux operating system is described in [->history].

Document composition is explained in Section 2.

1.2  Output Formats

The current version of the DOX text processor (version 0.1) only supports one output format, XHTML. We hope to be able to support more output formats in future releases.

1.3  System Requirements

The DOX processor is written in Java 1.5, so a Java Runtime Environment (JRE) version 1.5 or later is required in order to run the processor. JREs are available for most architectures and operating systems. They can be downloaded from the Sun website at http://java.sun.com/j2se/1.5.0/download.jsp.

1.4  License Information

The DOX processing system source code is released under the GNU General Public License, version 2. This documentation is released under the GNU Free Documentation License, version 1.2.

2  The Document

2.1  Overview

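The general structure of a DOX source document is very simple. A document is divided into the following blocks, in order of appearance:

  1. Configuration parameters (Section 2.2).
  2. The title of the document, with any subtitles (Section 2.3).
  3. The table of contents (Section 2.4).
  4. Sections (Section 2.5).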

A document can contain any number of sections. In turn, each section may contain multiple paragraphs, lists, figures, and other sub-sections in any order, as described in Section 2.5.

2.2  Configuration Parameters

Configuration parameters appear at the beginning of a document. A configuration parameter is written as a backquote (`) followed by the parameter name, a colon, and the value: `ParameterName: Value.

At this time, the only supported configuration parameter is the CSS parameter, used to specify a cascading style-sheet file. This option is relevant only when the output format of the text processor is XHTML.

`CSS: manual.css
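
As a rough illustration of this format, a configuration line could be recognized as in the Python sketch below. The function name parse_config_line and the regular expression are assumptions made for this example; they are not taken from the DOX processor, which may accept a wider or narrower set of names and values.

```python
import re

# Hypothetical pattern: a backquote, a parameter name, a colon, then the value.
CONFIG_LINE = re.compile(r"^`\s*([A-Za-z]+)\s*:\s*(.*?)\s*$")


def parse_config_line(line):
    """Return (name, value) for a configuration line, or None otherwise."""
    match = CONFIG_LINE.match(line)
    if match is None:
        return None
    return match.group(1), match.group(2)
```

With this sketch, the line shown above yields the pair CSS and manual.css, while a line such as "= Abstract" is not recognized as a parameter.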

2.3  Title of the Document

The title of a document is specified using a " symbol, followed by the title's text:

" Algorithms and data structures for flash memories

A title can span multiple lines; it ends at the first empty line.

`CSS: paper.css

" Lowering the barriers to programming: A taxonomy of programming environments
  and languages for novice programmers

= Abstract

...

Subtitles are also possible, using two or more " characters. The meaning of each subtitle is not predefined. In the example below, the "" subtitle is used to indicate the paper's authors, while the """ subtitle specifies the publishing date.

"   Power reduction techniques for microprocessor systems

""  Vasanth Venkatachalam, Michael Franz

""" September 2005

= Abstract

...
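
The relationship between the number of " characters and the kind of title can be sketched in Python as follows. The function parse_title_block is hypothetical and only illustrates the counting rule described above: it receives one block of text (everything up to the next empty line) and joins continuation lines with single spaces, which is an assumption about how multi-line titles are rendered.

```python
def parse_title_block(block):
    """Return (level, text) for a title block, or None if it is not a title.

    Level 1 is the title, level 2 the first kind of subtitle, and so on.
    """
    stripped = block.lstrip()
    # Count the leading double-quote characters.
    level = len(stripped) - len(stripped.lstrip('"'))
    if level == 0:
        return None
    # Join continuation lines into a single line of text.
    text = " ".join(stripped[level:].split())
    return level, text
```

Applied to the three blocks of the example above, this sketch returns levels 1, 2, and 3 respectively.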

2.4  Table of Contents

A table of contents is automatically generated by the DOX processor when a <TOC> token is found. The table of contents is generated at the location of the token, which should therefore be placed between the title of the document and the first section.

"  Title of the document

"" Subtitle

<TOC>

= First Section

...
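
To give an idea of what generating a table of contents involves, the Python sketch below numbers a list of section titles from their levels. It is only an illustration: the function build_toc is hypothetical, the numbering style (1, 1.1, 1.2, 2, ...) is inferred from the table of contents of this manual, and the input is assumed to respect the section level rules of Section 2.5.

```python
def build_toc(sections):
    """Turn (level, title) pairs into numbered table-of-contents lines.

    The levels are assumed to be legal: the first level is 1 and no level
    exceeds the previous level plus one.
    """
    counters = []  # counters[i] is the current number at level i + 1
    lines = []
    for level, title in sections:
        # Forget the counters of deeper levels.
        del counters[level:]
        if len(counters) < level:
            counters.append(1)   # first section at a new, deeper level
        else:
            counters[-1] += 1    # next section at an existing level
        number = ".".join(str(n) for n in counters)
        lines.append(number + "  " + title)
    return lines
```

For instance, the pairs (1, "Introduction"), (2, "What is DOX?"), (2, "Output Formats"), (1, "The Document") produce the entries numbered 1, 1.1, 1.2, and 2.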

2.5  Sections

A section begins with one or more = symbols, followed by the title of the section. The number of = symbols indicates the level of the section (i.e., section, subsection, subsubsection, etc.).

= Section

== SubSection

=== SubSubSection

= Section

For instance:

= Introduction

Version control is the art of managing changes to information. It has long been
a critical tool for programmers, who typically spend their time making small
changes to software and then undoing those changes...

== What is Subversion?

Subversion is a free/open-source version control system. That is, Subversion
manages files and directories over time. A tree of files is placed into a
central repository. The repository...

== History

In early 2000, CollabNet, Inc. (http://www.collab.net) began seeking developers
to write a replacement for CVS. CollabNet offers a collaboration software suite
called CollabNet Enterprise Edition (CEE) ...


= Basic Concepts

...

The title of a section can span multiple lines. The end of the title is delimited by an empty line.

=== Name lookup, templates, and accessing members of 
    base classes

It is illegal to start a section at a level greater than the previous level plus one. For similar reasons, the first section in a document must be at level 1.

On the other hand, it is perfectly legal to start a section at any level lower than the previous one. In the example below, section "C" is at level 3, while the following section "D" is at level 1.

= Section A

== SubSection B

=== SubSubSection C

= Section D
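
The two level rules can be checked mechanically. The Python sketch below is one possible way to do so; the function check_section_levels is hypothetical and takes the section levels in document order (the number of = symbols of each section).

```python
def check_section_levels(levels):
    """Return the index of the first illegal section level, or None."""
    previous = 0  # forces the first section to be at level 1
    for index, level in enumerate(levels):
        # A section may go at most one level deeper than the previous one.
        if level < 1 or level > previous + 1:
            return index
        previous = level
    return None
```

The sequence 1, 2, 3, 1 from the example above is accepted, while 1, 3 is rejected at its second entry and a document starting at level 2 is rejected at its first.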

The following elements can appear in any number and order inside a section.

2.5.1  Paragraphs

A paragraph is simply a block of text separated from the rest of the section by at least one empty line. For instance:

== Statements and Declarations in Expressions

A compound statement enclosed in parentheses may appear as an expression in GNU
C. This allows you to use loops, switches, and local variables within an
expression.

Recall that a compound statement is a sequence of statements surrounded by
braces; in this construct, parentheses go around the braces. For example:
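
Splitting a section into blocks separated by empty lines can be sketched in Python as shown below. The function split_blocks is hypothetical; it treats lines containing only white space as empty, and it deliberately ignores preformatted and HTML blocks, whose content may legitimately contain empty lines.

```python
def split_blocks(source):
    """Split source text into blocks separated by one or more empty lines."""
    blocks, current = [], []
    for line in source.splitlines():
        if line.strip():
            current.append(line)          # line belongs to the current block
        elif current:
            blocks.append("\n".join(current))  # empty line closes the block
            current = []
    if current:
        blocks.append("\n".join(current))
    return blocks
```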

Paragraphs generally begin with an alphanumeric character or a reference (Section 2.7.2). In all other cases, however, the token PAR: must be inserted at the beginning of the paragraph. This allows the text processor to recognize the block as a paragraph. For instance:

PAR: (NOTE: This paragraph starts with a parenthesis, so a "PAR:" token must be
inserted at the beginning of the text block).

The PAR: token is also interpreted as an artificial empty line when the paragraph contains no text.

An artificial empty line is inserted between this paragraph and the second
paragraph.

PAR:

Second paragraph, after the artificial empty line.
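
The rule for when a PAR: token is required can be summarized by the Python sketch below. The function needs_par_token is hypothetical and follows only the rule stated in this section: a block that starts with an alphanumeric character or with a reference is recognized as a paragraph on its own.

```python
def needs_par_token(block):
    """Return True if a paragraph block needs a leading PAR: token."""
    text = block.lstrip()
    if text.startswith("PAR:"):
        return False  # already marked as a paragraph
    if text.startswith("[->"):
        return False  # a reference may open a paragraph directly
    # Anything that does not start with a letter or digit needs the token.
    return not text[:1].isalnum()
```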

2.5.2  Lists

Lists can be unordered or ordered. Unordered lists begin with a dash symbol (-). Each item of the list must be separated by one or more empty lines:

Paragraph before the list.

   - List item

   - List item

     -- List subitem

     -- List subitem

        --- List subsubitem

   - List item

Paragraph after the list.

The number of dash symbols (-) indicates the level of an item in the list. White spaces in front of an item are optional.

Similarly, ordered lists begin with a percent symbol (%) instead of a dash:

Paragraph before the list.

   % List item

   % List item

     %% List subitem

     %% List subitem

        %%% List subsubitem

   % List item

Paragraph after the list.
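
The kind and level of a list item follow directly from its marker. The Python sketch below illustrates this; the function parse_list_item and its regular expression are assumptions made for the example and are not taken from the DOX processor.

```python
import re

# Optional leading white space, a run of "-" or "%" symbols, then the text.
LIST_ITEM = re.compile(r"^\s*(-+|%+)\s+(.*)$", re.DOTALL)


def parse_list_item(block):
    """Return (kind, level, text) for a list item block, or None."""
    match = LIST_ITEM.match(block)
    if match is None:
        return None
    marker, text = match.groups()
    kind = "unordered" if marker[0] == "-" else "ordered"
    # The number of marker symbols is the nesting level of the item.
    return kind, len(marker), text.strip()
```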

2.5.3  Figures

Figures can be inserted in a document using the <fig> tag:

<fig PATH width=Wpx height=Hpx>

where PATH indicates the path of the image file, W its width in pixels, and H its height. The width and height parameters are optional.

This is a simple figure:

   <fig images/example.png>

This is a figure tag with explicit width and height attributes:

   <fig images/example.png width=100px height=50px>

Splitting the figure tag into multiple lines is also valid:

   <fig
      images/example.png
      width=100px
      height=50px>
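
The following Python sketch shows one way the figure tag could be parsed. The function parse_fig_tag and its regular expression are hypothetical; in particular, the sketch assumes that width, when present, is written before height, as in the examples above.

```python
import re

# \s+ also matches newlines, so a tag split over several lines is accepted.
FIG_TAG = re.compile(
    r"<fig\s+(?P<path>[^\s>]+)"
    r"(?:\s+width=(?P<width>\d+)px)?"
    r"(?:\s+height=(?P<height>\d+)px)?"
    r"\s*>"
)


def parse_fig_tag(text):
    """Return the path, width, and height of a figure tag, or None."""
    match = FIG_TAG.search(text)
    if match is None:
        return None
    width, height = match.group("width"), match.group("height")
    return {
        "path": match.group("path"),
        "width": int(width) if width else None,    # None when omitted
        "height": int(height) if height else None,
    }
```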

Figure elements allow an optional caption and/or label (Section 2.7.1). A caption or label must start after the terminal ">" symbol of the figure tag (on the same line) and can span multiple lines.

Figure with a caption:

   <fig example1.png> Example image.

Figure with a caption and a label:

   <fig example2.png> Example image.                          [<- fig.example2]

Figure with just a label:

   <fig example3.png>                                         [<- fig.example3]

Figure with a caption spanning multiple lines:

   <fig example4.png> This caption spans multiple lines.      [<- fig.example4]
                      Notice that the label does not interfere with the text
                      and could in fact be placed anywhere within the caption.

   <fig example5.png>                                         [<- fig.example5]
                      Another example in which the caption follows the label.
                      The label is considered part of the caption by the DOX
                      processor, so this text is accepted without any problem.

2.5.4  Tables

Currently, the DOX processor lacks a native construct for tables. Instead, tables are written using an HTML text block.

<<
... HTML Code ...
>>

The text within a <<...>> block is interpreted as HTML code. Therefore, a table can be constructed in HTML as follows:

Paragraph preceding the table.

<<
<table>
   <tr> <th> First Name </th> <th> Last Name </th> </tr>
   <!-- ==========================================   -->
   <tr> <td> Frank      </td> <td> Smith     </td> </tr>
   <tr> <td> John       </td> <td> Johnson   </td> </tr>
   <tr> <td> Antony     </td> <td> Miller    </td> </tr>
</table>
>>

Paragraph following the table.

Of course, an HTML block can also be used for things other than tables. However, doing so is not recommended as HTML blocks may become obsolete once a native construct for tables is introduced.

HTML blocks support the same caption/label construct described for figures in Section 2.5.3. For instance:

<<
<table>
   <tr> <th> First Name </th> <th> Last Name </th> </tr>
   <!-- ==========================================   -->
   <tr> <td> Frank      </td> <td> Smith     </td> </tr>
   <tr> <td> John       </td> <td> Johnson   </td> </tr>
   <tr> <td> Antony     </td> <td> Miller    </td> </tr>
</table>
>> A table of people.                                         [<- table.people]

2.5.5  Preformatted Blocks

A preformatted block of text is output "as-is", without any interpretation or formatting of the characters within it.

{{
... Preformatted Text ...
}}

For instance:

{{
/**
 * Object representing a block of HTML code.
 */
public class Html extends Element {
   /** Begin of HTML block token */
   private static final String BEGIN = "<<";
   
   /** End of HTML block token */
   private static final String END = ">>";
   
   /** Text inside the HTML block */
   private String text = "";
   
   ...
}}

However, there are two exceptions to the no-formatting rule:

  1. All white space (including newlines) preceding the first non-space character and following the last non-space character is deleted.
  2. The }\} character sequence is replaced with }}.
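
The two exceptions amount to a trim followed by a substitution, as in the Python sketch below. The function process_preformatted is hypothetical and receives the text found between the {{ and }} tokens.

```python
def process_preformatted(text):
    """Apply the two exceptions to the content of a preformatted block."""
    # 1. Drop white space before the first and after the last
    #    non-space character.
    trimmed = text.strip()
    # 2. Replace the escaped sequence }\} with }}.
    return trimmed.replace("}\\}", "}}")
```

The escape is what makes it possible to write two consecutive closing braces inside a block without ending it.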

Preformatted blocks support a caption construct as described for figures in Section 2.5.3 and tables in Section 2.5.4. For instance:

{{
/**
 * Object representing a block of HTML code.
 */
public class Html extends Element {
   /** Begin of HTML block token */
   private static final String BEGIN = "<<";
   
   /** End of HTML block token */
   private static final String END = ">>";
   
   /** Text inside the HTML block */
   private String text = "";
   
   ...
}} Code for the Html class.                             [<- program.htmlClass]

2.6  Formatting Words or Sentences

Inside a text block, words or sentences can be formatted in italic, bold, or monospace type.

The italic type is selected by placing two underscore characters (_) at the beginning and end of a word or sentence. For instance:

This __word__ is formatted using italic characters. 

This __sentence is formatted__ using italic characters.

Similarly, the bold type is selected by two star symbols (*):

This **word** is formatted using bold characters.

Finally, the monospace type is selected by a single "|" character:

This |word| is formatted using monospace characters.
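
The three markers can be translated with simple substitutions, as in the Python sketch below. The function format_inline is hypothetical, and the output tags (i, b, and tt) are an assumption: the markup actually produced by the DOX processor is not specified in this manual. The sketch does not escape special XHTML characters.

```python
import re


def format_inline(text):
    """Convert DOX inline markers to XHTML tags (assumed tag names)."""
    text = re.sub(r"__(.+?)__", r"<i>\1</i>", text)      # italic
    text = re.sub(r"\*\*(.+?)\*\*", r"<b>\1</b>", text)  # bold
    text = re.sub(r"\|(.+?)\|", r"<tt>\1</tt>", text)    # monospace
    return text
```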

2.7  Labels and References

2.7.1  Labels

A label is defined by inserting a [<-LabelName] tag anywhere in the title (for sections) or caption of an element, where LabelName is a string of characters in the set [A-Za-z0-9._]. For instance:

= Title of the section                                               [<- intro]

<fig example.png> A figure.                                    [<- fig.example]

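The elements that support labels are listed below:

  - Sections (the label is placed in the title).
  - Figures (Section 2.5.3).
  - HTML blocks (Section 2.5.4).
  - Preformatted blocks (Section 2.5.5).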

In the case of HTML and preformatted blocks, label names should be in the form "ContentType.Name".

Possible ContentType values are:

For instance:

<<
...
>> Caption of an HTML block.                                 [<- table.example]

Labels never interfere with the text of a title or caption. For this reason, the code below, although deprecated, is still valid:

<<
...
>> Caption of [<- table.example] an HTML block.

<<
...
>> [<- table.example] Caption of an HTML block.

Thanks to this property, it is possible to have multi-line titles and captions that contain a label:

= This section title spans multiple lines and also has a label,      [<- intro]
  but the label does not interfere with the text of the title.
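
The way a label is lifted out of a title or caption can be sketched in Python as follows. The function extract_label is hypothetical; it accepts optional white space after the arrow (as in the examples of this manual) and joins the remaining text with single spaces, which is an assumption about how multi-line titles and captions are rendered.

```python
import re

# Label names use the character set [A-Za-z0-9._] described above.
LABEL = re.compile(r"\[<-\s*([A-Za-z0-9._]+)\s*\]")


def extract_label(text):
    """Return (label, clean_text); label is None when no label is present."""
    match = LABEL.search(text)
    label = match.group(1) if match else None
    # Remove the label token, then normalize the remaining white space.
    without_label = LABEL.sub(" ", text)
    clean_text = " ".join(without_label.split())
    return label, clean_text
```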

2.7.2  References

References to labels can appear anywhere in a title, paragraph, or caption. A reference is written as [->LabelName], where LabelName is the name of the referenced label. For instance:

Please refer to [->table.people] for a list of clients.
[->table.values] lists all possible values.

The DOX processor replaces each reference with an appropriate string in the form Type Index. For instance: Section 1.1, Table 2, etc. The type string of a referenced object is Section or Figure for sections and figures respectively. On the other hand, the type string of an HTML or preformatted block is determined by the name of its label. For instance, a reference to table.values is resolved as Table X, while a reference to prog.example is resolved as Program X.

Sections can also be referenced using the form [->"LabelName"], which is expanded to Section X, Title of the Section.
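
Reference resolution can be pictured as a lookup followed by a substitution. The Python sketch below is only an illustration: the function resolve_references and the shape of the targets table (label name mapped to a type string, an index, and a title) are assumptions, and a reference to an unknown label simply raises a KeyError here.

```python
import re

# Matches [->LabelName] and the quoted form [->"LabelName"].
REFERENCE = re.compile(r'\[->\s*("?)([A-Za-z0-9._]+)\1\s*\]')


def resolve_references(text, targets):
    """Replace each reference using targets: label -> (type, index, title)."""

    def replace(match):
        quoted, label = match.group(1), match.group(2)
        kind, index, title = targets[label]  # KeyError for unknown labels
        resolved = kind + " " + index
        if quoted:
            # The quoted form also appends the title of the section.
            resolved += ", " + title
        return resolved

    return REFERENCE.sub(replace, text)
```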

2.8  Comments

Source lines that begin with a // token are discarded. For instance:

// This line is discarded
// Another discarded line

Paragraph text.

However, these lines will not be treated as comments if they are part of a text block:

Paragraph text. // This line is part of a paragraph, so it is NOT discarded.

Paragraph text.
// This line is also part of a paragraph and it is NOT discarded.
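
One reading of these rules is that a // line is discarded only when it does not continue a text block, that is, when it appears at the start of the document, after an empty line, or after another discarded line. The Python sketch below implements that reading. The function strip_comments is hypothetical, and the sketch does not account for preformatted or HTML blocks, where // lines would presumably have to be preserved.

```python
def strip_comments(source):
    """Drop // lines that are not part of a text block."""
    kept = []
    in_text_block = False
    for line in source.split("\n"):
        if line.startswith("//") and not in_text_block:
            continue  # comment line: discard it, block state unchanged
        kept.append(line)
        # A non-empty line starts or continues a text block.
        in_text_block = line.strip() != ""
    return "\n".join(kept)
```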