The valep$\mathsf{\TeX }$ Handbook

Author:	Christian Damböck
Version:	0.1
Date:	May 2024
Location:	https://valep.vc.univie.ac.at/files/py/valeptex/valeptex_handbook.html

A. What $\mathsf{valep\TeX}$ is [not]

$\mathsf{valep\TeX}$ is a $\mathsf{\LaTeX}$ parser for programmers and editors who need a stable solution for converting a specific $\mathsf{\LaTeX}$ pattern or style to HTML, (TEI-)XML, EPUB, JSON, txt and other formats. Although this first version only supports HTML and (to some extent) txt, other renderings are easily implemented and will certainly be supported in the future with standard rendering schemes. Since $\mathsf{valep\TeX}$ directly converts $\mathsf{\LaTeX}$ macros without trying to read their declarations – the only exception to this rule is text macros – the user must first customize it by feeding $\mathsf{valep\TeX}$ appropriate command declarations and rendering rules. $\mathsf{valep\TeX}$ offers class definitions and rendering procedures for some standard $\mathsf{\LaTeX}$ macros, but in most cases these need to be supplemented. If, for example, a document uses the custom macro \greetingcard for typesetting a chapter heading for a greeting card, then the user will have to add suitable specifications. This task is basically twofold. First, add a class declaration in a file “/custom/my_classes.py”:

class greetingcard(Section):
    args = 'title'
    level = 1

Second, add rendering information for this macro in a file “html/custom/my_renderings.jinja2s”:

name: greetingcard
<h1 class="card">{{ obj.attributes.title }}</h1>

This information tells $\mathsf{valep\TeX}$ that \greetingcard is a level 1 section header with one parameter (first part) that should be rendered as an HTML tag “h1” of class “card” (second part). (See sections B and C for the details.)

$\mathsf{valep\TeX}$ produces only a semantic rendering of the original document, which means that you have to write your own style sheets to define the graphical layout of the output document according to your wishes. For standard output, $\mathsf{valep\TeX}$ uses some standard CSS and JavaScript content that is in no way related to any custom definitions of the parsed $\mathsf{\LaTeX}$ document. Moreover, $\mathsf{valep\TeX}$ does not generate embedded tables of contents, indices, bibliographies or any other automated output of a typical $\mathsf{\LaTeX}$ document, because this is typically treated differently in HTML output.

The overall strategy of $\mathsf{valep\TeX}$ is somewhat borrowed from plasTeX. plasTeX is much richer in features, but $\mathsf{valep\TeX}$ is more flexible in important ways: it provides better rendering of tabulars and tabbings, and it is faster than plasTeX and therefore better suited for very large documents or large batch jobs.

$\mathsf{valep\TeX}$ is designed to convert $\mathsf{\LaTeX}$ documents to HTML in the context of VALEP – the Virtual Archive for Logical Empiricism. It is funded by the Austrian Science Fund (research grants P34887 and PUD 31-G). The other results of this project can be seen here.

While it is helpful to use consistent $\mathsf{\LaTeX}$ code that is approved by the $\mathsf{\LaTeX}$ compiler, this is not a requirement in $\mathsf{valep\TeX}$. You are free to create your own pseudo $\mathsf{\LaTeX}$; in particular, you can use macros that have no $\mathsf{\LaTeX}$ declaration at all. $\mathsf{valep\TeX}$ will also not hang if the $\mathsf{\LaTeX}$ syntax is incorrect, i.e. if brace blocks or environments do not end.

There is still a long to-do list in this version of $\mathsf{valep\TeX}$, mainly because certain features of $\mathsf{\LaTeX}$ are hardly used in VALEP and the Carnap edition we are working on. These include theorems, counters, and bibliographies. But each of these things might be more or less easy to handle in pre- or post-processing.

Further reading: How to embed $\mathsf{\LaTeX}$, HTML, and TEI-XML with $\mathsf{valep\TeX}$ and carnap-compact

1. First steps

Open a terminal window in the folder where this document is located. Make sure that Python is installed and then type:

python valeptex.py valeptex_handbook ENTER

The result should be an HTML file “valeptex_handbook.html” in the same folder. This HTML file should of course contain the text of this handbook in a more or less nicely formatted form. If this is the case, then $\mathsf{valep\TeX}$ is obviously working on your computer and you can try it with any other $\mathsf{\LaTeX}$ file in a different folder or even just run

python valeptex.py --folder

python valeptex.py --subfolders

to convert all .tex documents in the current folder (and its subfolders). – However, it is highly recommended that you read at least the next section of this document before attempting any other conversions.

B. A tour through $\mathsf{valep\TeX}$

1. $\mathsf{valep\TeX}$ is a semantic converter …

This parser is not suitable if your goal is to generate, from scratch, HTML content that more or less perfectly matches the graphical layout of your $\mathsf{\LaTeX}$ document. There are already very good solutions for this, e.g. pandoc and several online converters.

There is only one very specific reason why you should try to use $\mathsf{valep\TeX}$ instead of the tools mentioned above, and this is if you want to create semantic markup. While graphical markup only specifies what should be put on a page or screen, semantic markup first and foremost specifies the meaning of the text to be marked up. Semantic markup is the right choice whenever the meaning of the processed content needs to be preserved.

A semantic tag generally associates an optional text with some semantic meaning, such as: is highlighted, is a heading, was deleted, is a comment, is the beginning of a text. This also clarifies what kind of parsing semantic markup needs: it must allow replacing any specific semantic tag of language A by exactly one semantic specification of language B. Whether and to what extent language B reproduces the graphical layout that a specific semantic feature received in language A is left to language B and its internal features.

2. …and therefore it converts only text macros

One aspect of semantic markup in the specific case of $\mathsf{\LaTeX}$ is that macros are used in $\mathsf{\LaTeX}$ in two different ways. First, as commands that either switch to a certain format or attach a semantic meaning or format to a piece of text. Second, as text macros that simply write some text. Assuming that macros of the first kind always have some semantic meaning, $\mathsf{valep\TeX}$ intentionally does not try to interpret environments or macros that receive parameters. For example, the macro \chapter{Title of this chapter} is supposed to say something meaningful about its content, namely that it represents a heading of a certain level. To replace \chapter with some more or less esoteric internal $\mathsf{\TeX}$ specification, which more or less only deals with the graphical appearance of the string “Title of this chapter”, would mean to lose the semantic meaning of “chapter”. The information “chapter” is something that should still be present at the end of the parsing process in order to allow the renderer to deal with this information, e.g. by replacing the “chapter” command with an h1 HTML tag. Therefore, $\mathsf{valep\TeX}$ interprets only text macros but no environments or macros with more than zero parameters.

In the Carnap edition for which this parser is designed, we use text macros that represent metadata about people, publications, and archival sources as collected in VALEP. These text macros are then used to consistently write index entries. For example, there is a text macro \carnap whose definition contains biographical information about the philosopher Rudolf Carnap. All these macros are collected in a $\mathsf{\LaTeX}$ package, which contains several thousand macro definitions. $\mathsf{valep\TeX}$ loads these definitions and replaces all text macros wherever they appear in the document.
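To make the idea of text macro replacement concrete, here is a minimal sketch in Python. It is not the code used by $\mathsf{valep\TeX}$; the function names collect_text_macros and expand_text_macros are hypothetical, and the sketch only handles \newcommand declarations without parameters and without nested braces in the body.

```python
import re

# Matches \newcommand{\name}{body} with no parameter count and a brace-free body.
DECLARATION = re.compile(r"\\newcommand\{\\([A-Za-z]+)\}\{([^{}]*)\}")


def collect_text_macros(declarations: str) -> dict[str, str]:
    """Map each zero-parameter macro name to its replacement text."""
    return {name: body for name, body in DECLARATION.findall(declarations)}


def expand_text_macros(text: str, macros: dict[str, str]) -> str:
    """Replace every known text macro; leave unknown macros untouched."""
    def substitute(match: re.Match) -> str:
        return macros.get(match.group(1), match.group(0))

    # \name, optionally followed by the empty group {} that terminates it.
    return re.sub(r"\\([A-Za-z]+)(?:\{\})?", substitute, text)


macros = collect_text_macros(r"\newcommand{\carnap}{Rudolf Carnap}")
expanded = expand_text_macros(r"\carnap{} met \neurath.", macros)
```

Unlike a real $\mathsf{\TeX}$ engine, this sketch does not swallow the whitespace that follows a macro name, and macros with parameters such as \chapter pass through unchanged.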

You can easily accomplish similar things with your own text macros. Suppose you have your text macros either in the form of a list of \newcommand declarations, or in the form of a file mymacros.tex, or even in the form of a tex package mypackage.sty. Then you can put all these things in a file “default-fileopening.tex”, which must be stored in the “custom” folder. Just list your declarations there or write things like

\input{mymacros}
\usepackage{mypackage}

But you should also use this option with caution, because $\mathsf{valep\TeX}$ will try to load everything in each of the files you include in the default fileopening, and this can cause the system to slow down or even hang. So as a first step it is probably always best to run $\mathsf{valep\TeX}$ with a default fileopening that is either empty or contains only content you already tested, and then gradually add more content to the default fileopening. If everything works fine you may even use the program option “--useownpreamble” to parse a document not with the default preamble but with its own preamble settings.

3. Five different ways to process a macro

The first option: declare a text macro. $\mathsf{valep\TeX}$ either replaces a text macro by means of its own resources, or the user can add a declaration for the text macro to the “default-fileopening.tex”. This macro will then be replaced with the content of its declaration whenever it occurs in the text. Use this option for any text macros that have no semantic meaning that you want to carry through to the rendering process.

The second option: existing class and rendering information. Consider an example. The command “chapter” (without the backslash) is defined as a Python class in the file “parser_classes_standard_latex.py”. The specification looks like this:

class chapter(Section):
    args = '[ toc ] title'
    level = 1

The first line declares “chapter” as a Python class, which is defined as a subclass of the Python class “Section”. Section, in turn, is defined as a subclass of “Command”, as is any $\mathsf{\LaTeX}$command or environment that is declared as a Python class.

The second line declares each parameter of “chapter”. The names “toc” and “title” are arbitrary names that are used to refer to a parameter in the rendering process, as will be shown shortly. The brackets indicate that “toc” is an optional parameter; “title” is a required parameter, since it is not set in brackets. The third line specifies another property, which is only relevant for members of the “Section” class: the sectioning level, which is an integer between -10 and 20 (see the specification of “Section” in the file “parser_classes.py”).
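The bracket convention of the “args” string can be illustrated with a small sketch. The function parse_args_spec is hypothetical and only covers the two cases described here (bracketed optional names and bare required names); the actual parser may support further notation.

```python
def parse_args_spec(spec: str) -> list[tuple[str, bool]]:
    """Return (name, is_optional) pairs for a spec such as '[ toc ] title'."""
    params = []
    optional = False
    # Pad brackets with spaces so '[toc]' and '[ toc ]' tokenize alike.
    for token in spec.replace("[", " [ ").replace("]", " ] ").split():
        if token == "[":
            optional = True
        elif token == "]":
            optional = False
        else:
            params.append((token, optional))
    return params


chapter_params = parse_args_spec("[ toc ] title")
```

For the “chapter” class, chapter_params holds one optional parameter named “toc” followed by one required parameter named “title”.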

The rendering information for the “chapter” command can be found in the file “html/latex-standard.jinja2s”:

name: chapter chapter_
<h1 class="editor">{{ obj.attributes.title }}</h1>

To be correct, this declaration must be preceded by a blank line and begin with the sequence “name:” followed by command names separated by whitespace. The second command name in this declaration is the asterisk form “chapter*”, which in $\mathsf{valep\TeX}$ must be specified with an underscore instead of an asterisk to be compatible with Python syntax.

The second line of the declaration – and any subsequent lines – contains the rendering information, which in this case says that the “chapter” command should be rendered in HTML as an h1 tag with the class specification "editor". The middle of the block, enclosed in double curly braces – which is jinja syntax: see section D.5 – tells the renderer that the parameter of the h1 tag is the “title” attribute of the “chapter” command.
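The following sketch shows how such a rendering declaration could be read and applied. It is an illustration under simplifying assumptions, not the renderer of $\mathsf{valep\TeX}$: the functions parse_specs and render are hypothetical, templates are assumed to contain no blank lines, and only expressions of the form {{ obj.attributes.NAME }} are substituted.

```python
import re
from types import SimpleNamespace


def parse_specs(text: str) -> dict[str, str]:
    """Split a .jinja2s-like text into {command name: template}."""
    specs = {}
    # Blocks are separated by blank lines; each starts with a 'name:' line.
    for block in re.split(r"\n\s*\n", text.strip()):
        header, _, template = block.partition("\n")
        if header.startswith("name:"):
            for name in header[len("name:"):].split():
                specs[name] = template  # a later block replaces an earlier one
    return specs


def render(template: str, obj) -> str:
    """Substitute {{ obj.attributes.NAME }} expressions only."""
    def lookup(match):
        return str(getattr(obj.attributes, match.group(1)))

    return re.sub(r"\{\{\s*obj\.attributes\.(\w+)\s*\}\}", lookup, template)


SPEC_TEXT = (
    "name: chapter chapter_\n"
    '<h1 class="editor">{{ obj.attributes.title }}</h1>\n'
    "\n"
    "name: markboth\n"
)
specs = parse_specs(SPEC_TEXT)
heading = render(specs["chapter"], SimpleNamespace(
    attributes=SimpleNamespace(title="Introduction")))
```

Both “chapter” and “chapter_” end up with the same template, and a declaration without any rendering lines (here “markboth”) yields an empty template.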

The third option: your own class and rendering information. Use the custom specification files, which are stored by the user in the folder “custom” or, for renderer “XXX”, in the folder “XXX/custom”. $\mathsf{valep\TeX}$ identifies each file in this folder with the extension “.py” as a class specification file and each file with the extension “.jinja2s” as a rendering specification file. If you want to change existing specifications from the system files, just add them to your custom files: they will override the system specifications.

The fourth option: intentionally let a command be ignored. To have a command and all its parameters – or an environment with all its contents – ignored by the system, add an appropriate class specification and a rendering specification whose second line is left blank, telling the system that the command should be ignored. For example, the command \markboth specifies the content of the running titles of a book, which is nothing that an HTML document will typically implement. The following specifications ensure that the HTML document ignores the command:

class markboth(Command):
    args = 'left right'
    
name: markboth

The system correctly identifies the command but the renderer ignores it. Only with this strategy is it guaranteed that the HTML document contains no remnants of the original command.

The fifth option: do nothing at all. If a command is not mentioned in the rendering files (though there might still be class information available on it), the renderer processes this command with a standard rendering procedure. The information, consisting of the command name and all parameters in brackets and curly braces, is placed in some HTML tags that are set to the CSS style “hidden” by default. For example, the command \mycomm{my Argument}, which has no rendering specification, results in this HTML output:

<latex-cmd data-name="mycomm"><latex-reqarg data-nr="1">
my Argument</latex-reqarg></latex-cmd>

“latex-cmd” is the default tag that $\mathsf{valep\TeX}$ uses for unknown $\mathsf{\LaTeX}$ commands. The command name is set as the “data-name” attribute, and the body of the “latex-cmd” tag contains numbered instances of “latex-reqarg” and “latex-optarg” for all required and optional parameters of the command. As you can easily see, this may already be a way to handle this command directly in HTML and CSS. All you need to do is address the “mycomm” command in your CSS (for instance via its “data-name” attribute), along with any tags of the form “latex-reqarg” or “latex-optarg”. So, in principle, your custom $\mathsf{\LaTeX}$ macros are available in HTML even without any customization inside $\mathsf{valep\TeX}$. It is a matter of taste and convenience to decide at which stage of the conversion process you want to customize the output.
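As an illustration of this fallback, the sketch below builds the generic tags for an unknown command. The function render_unknown is hypothetical; in particular, the order in which optional and required arguments are emitted and the exact whitespace of the real output are assumptions.

```python
from html import escape


def render_unknown(name, required=(), optional=()):
    """Wrap an unrecognized command in generic custom tags."""
    parts = [f'<latex-cmd data-name="{escape(name)}">']
    # Assumption: optional arguments come first, each kind numbered from 1.
    for nr, arg in enumerate(optional, start=1):
        parts.append(f'<latex-optarg data-nr="{nr}">{escape(arg)}</latex-optarg>')
    for nr, arg in enumerate(required, start=1):
        parts.append(f'<latex-reqarg data-nr="{nr}">{escape(arg)}</latex-reqarg>')
    parts.append("</latex-cmd>")
    return "".join(parts)


fallback = render_unknown("mycomm", required=["my Argument"])
```

A stylesheet can then target the result with an attribute selector on “data-name”, without any change to the parser.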

4. How to specify a layout for the HTML output

$\mathsf{valep\TeX}$ not only renders the part of a document that consists of macros and text, but also puts this content into an HTML structure that can be freely customized. There is a predefined file “html/default-layout.jinja2” (with no “s” at the end) in the system folder, which can be replaced by the user: just put a file with the same name in the “custom” folder. The only important thing is that the file must contain the jinja expression “{{ obj }}” to indicate where the body of the HTML file should be placed. This custom file template can, in addition to calling MathJax, specify custom CSS and JavaScript and the like.

C. The class hierarchy of $\mathsf{valep\TeX}$

The implementation of $\mathsf{\LaTeX}$ commands in $\mathsf{valep\TeX}$ is fully object-oriented, which means that you can always define new commands as subclasses of other commands while inheriting their properties. Therefore it is important to understand the class hierarchy of $\mathsf{valep\TeX}$.

A note on class names. In general, the name of a command class is exactly the same as the name string of the corresponding $\mathsf{\LaTeX}$ command, with two exceptions. The asterisk form of a command, as mentioned above, is represented in Python by an underscore at the end of the class name. And there are some $\mathsf{\LaTeX}$ command names that interfere with other Python class names. For example, the $\mathsf{\LaTeX}$ environment “list” shares its name with Python list objects. Therefore, in this case the string “list” is internally converted to “List”, which also affects the rendering procedure: your rendering for the $\mathsf{\LaTeX}$ environment “list” must use the class name “List”.
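The two naming exceptions can be sketched as a small helper. The function class_name_for is hypothetical, and the general rule it applies to name clashes (capitalize the first letter whenever the name is a Python keyword or built-in) is an assumption that generalizes the single “list” example given here.

```python
import builtins
import keyword


def class_name_for(latex_name: str) -> str:
    """Derive a Python-safe class name from a LaTeX command name."""
    name = latex_name.lstrip("\\")
    if name.endswith("*"):
        name = name[:-1] + "_"  # starred form: chapter* -> chapter_
    if keyword.iskeyword(name) or hasattr(builtins, name):
        name = name[0].upper() + name[1:]  # e.g. list -> List
    return name


names = [class_name_for(n) for n in ("chapter*", "list", r"\emph")]
```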

An overview of the class hierarchy. Any Python class representing a $\mathsf{\LaTeX}$ command that is to be rendered in $\mathsf{valep\TeX}$ must be defined as a subclass of the class “Command” – otherwise the class will simply be ignored by the system. This does not mean, however, that you always have to explicitly set “Command” as the superclass, because you can inherit the properties of “Command” from any other existing subclass of it. These are the main subclasses that provide important features:

Command 
   |
   |___ Environment = 
   |    anything of the form \begin{env} ... \end{env}
   |       |
   |       |__ Formula = $...$ \(...\) \[...\] etc. etc.
   |       |
   |       |__ Table = tabular, tabular*, tabbing etc.
   |       |
   |       |__ LaTeXList and Item = any lists and items
   |       |      |
   |       |      |__ Bibliography 
   |       |
   |       |__ FormatEnv = center, flushright, flushleft etc.
   |       |
   |       |__ VerbatimEnvironment = code remains unchanged
   |              |
   |              |__ verbatim for code listings (e.g. this table)
   |
   |___ Section = any sectioning command
   |
   |___ Whitespace 
   |       |
   |       |__ NoLineend = \quad \qquad ~ \, ␣ etc.
   |       | 
   |       |__ Lineend = \par \\ \\*[x]  \newline etc.
   |
   |___ Note = any footnote, endnote, marginnote
   |
   |___ Reference = any index entry, label, link etc.
   |      |
   |      |__ Index = any index entry (open reference)
   |      |
   |      |__ Label = any label or anchor in the text
   |      |
   |      |__ Ref = any reference to an internal Anchor
   |      | 
   |      |__ Link = any reference to an external resource
   |      |
   |      |__ Graphic = any image 
   |
   |___ TextMacro = produces (formatted or unformatted) text
   |
   |___ Format = italics, underlined, deletion, etc. etc.
   |   
   |___ VerbatimCommand = code remains unchanged
   |
   |___ Math = prevents that a macro is interpreted in math mode
   |      
   |___ LaTeXOnly = will be entirely removed (even from formulas)

These are the current special Command classes (later versions may introduce new ones). When you create classes for your custom macros, it might be a good idea to figure out where they fit in the hierarchy above, and not always inherit your class from Command, but also use the more specific classes. You can even inherit your class from any special command such as, e.g. \emph or \chapter. – Some examples:

class mytabular(Table):
    pass

If “mytabular” shares the general structure with “tabular”, this is all you need. However, it is important to use “Table” (or any of its subclasses, e.g. “tabular”) instead of the more general “Command” or “Environment” class, because otherwise the parser will not be able to process “mytabular” properly as a table environment. Tables are indeed a special case, because both parsing and rendering involve iteration over rows, columns, and cells. $\mathsf{valep\TeX}$ uses a special form of declaration for rendering tables, which is not jinja style:

name: mytabular
@tb<table>@rb<tr>@cb<td>@cc@ce</td>@re</tr>@te</table>

This declaration breaks down into seven elements:

@tb ... beginning of the table
@rb ... beginning of the row
@cb ... beginning of the cell
@cc ... cell content (usually blank)
@ce ... end of the cell
@re ... end of the row
@te ... end of the table

What you write after any of the @xy keystrings is entirely up to you.
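A minimal sketch of how such a seven-part declaration could drive table rendering is given below. The functions parse_table_spec and render_table are hypothetical and not taken from the parser; the sketch assumes that the cell text is placed where “@cc” stands and that the literal text of a declaration never contains the character “@”.

```python
import re

KEYS = ("tb", "rb", "cb", "cc", "ce", "re", "te")


def parse_table_spec(spec: str) -> dict[str, str]:
    """Split a table spec into the text following each @xy keystring."""
    found = dict(re.findall(r"@(tb|rb|cb|cc|ce|re|te)([^@]*)", spec))
    return {key: found.get(key, "") for key in KEYS}


def render_table(spec: str, rows: list[list[str]]) -> str:
    """Render rows of cell strings according to the table spec."""
    p = parse_table_spec(spec)
    out = [p["tb"]]
    for row in rows:
        out.append(p["rb"])
        for cell in row:
            # Assumption: the cell text goes where @cc stands, followed by
            # whatever literal text was written after @cc (usually nothing).
            out.append(p["cb"] + cell + p["cc"] + p["ce"])
        out.append(p["re"])
    out.append(p["te"])
    return "".join(out)


SPEC = "@tb<table>@rb<tr>@cb<td>@cc@ce</td>@re</tr>@te</table>"
table_html = render_table(SPEC, [["a", "b"], ["c", "d"]])
```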

Correctly classifying your custom Note, Index, or Format commands is not that critical for parsing, since they are all fairly straightforward instances of Command. However, assigning each command to the right semantic category is important for the rendering process. For example, you may want to have some optional renderings where Note and Index commands are suppressed but Format commands are printed (e.g. as plain text). So a correct assignment is still important here for correct semantic handling of your document.
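The benefit of a correct semantic category can be sketched with a few stand-in classes. The classes below only mimic the names used in this section; they are not the real classes from “parser_classes.py”, and the function plain_text is hypothetical.

```python
class Command:
    """Stand-in base class holding the text content of a command."""
    def __init__(self, text=""):
        self.text = text


class Note(Command): pass
class Index(Command): pass
class Format(Command): pass

class footnote(Note): pass
class index(Index): pass
class emph(Format): pass


def plain_text(nodes):
    """Join strings and Format contents; drop Note and Index commands."""
    out = []
    for node in nodes:
        if isinstance(node, str):
            out.append(node)
        elif isinstance(node, (Note, Index)):
            continue  # suppressed in this rendering
        elif isinstance(node, Format):
            out.append(node.text)
    return "".join(out)


sample = ["A ", emph("key"), footnote("an aside"), " point", index("key")]
sample_text = plain_text(sample)
```

Because the decision is made per category, a custom command only needs the right superclass to be handled correctly by such a rendering.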

A more critical thing regarding parsing is the class “Verbatim”:

Verbatim 
   |
   |___ VerbatimEnvironment
   |
   |___ VerbatimCommand

This class is not only used for the presentation here – the diagrams in this section were all written with the $\mathsf{\LaTeX}$ environment “verbatim”, which is specified in $\mathsf{valep\TeX}$ as a subclass of “VerbatimEnvironment”. The “Verbatim” class can also be used for any command or environment where the parameter (in the case of a command) or the body content (in the case of an environment) should be passed through the parsing process unchanged. This is used for the command “\htmlcode”, which receives the following $\mathsf{\LaTeX}$, class, and rendering specifications:

.tex declaration:
\newcommand{\htmlcode}[1]{}

.py class specification:
class htmlcode(VerbatimCommand):
    args = 'self'

.jinja2s rendering specification:
name: htmlcode
{{ obj }}

In $\mathsf{\LaTeX}$, the command produces no output at all, because the HTML code would most likely cause a mess. In the Python class declaration, the command is declared as a subclass of “VerbatimCommand” (not “VerbatimEnvironment”). And the renderer just declares that the content should be put into the HTML source without any addition. If you now write something like this somewhere in your $\mathsf{\LaTeX}$ document:

\htmlcode{<img src="myimage.svg" />}

then the rendered HTML document will contain

<img src="myimage.svg" />.

To achieve this, the renderer takes care to leave the string unchanged, i.e. it doesn’t remove any whitespace and doesn’t replace any macros. For example, \"{o} is rendered here as \"{o} instead of ö. To get ö you have to use the corresponding Unicode character inside the $\mathsf{\LaTeX}$ file. Finally, there is an important difference between the original “verbatim” environment and any custom environment or command you declare as a subclass of “Verbatim”. In the “verbatim” environment, $\mathsf{valep\TeX}$ replaces instances of < and > with &lt; and &gt; to prevent HTML from interpreting the contents of the environment as HTML tags. In the case of all other subclasses of “Verbatim” this substitution is intentionally skipped, because what you want to achieve is that the content is interpreted as HTML, XML, etc. code.
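The difference between the two kinds of verbatim content can be summarized in a short sketch. The function render_verbatim and its boolean flag are hypothetical; the sketch only mirrors the substitution of angle brackets described here.

```python
def render_verbatim(content: str, is_verbatim_env: bool) -> str:
    """Escape angle brackets for 'verbatim'; pass other content through."""
    if is_verbatim_env:
        # Shown as literal code, so the browser must not parse it as markup.
        return content.replace("<", "&lt;").replace(">", "&gt;")
    # Other Verbatim subclasses (e.g. htmlcode) are meant to be live markup.
    return content


listing = render_verbatim("<b>bold</b>", is_verbatim_env=True)
passthrough = render_verbatim('<img src="myimage.svg" />', is_verbatim_env=False)
```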

Special classes for declarations and output features. Not all classes of $\mathsf{valep\TeX}$ are declared as subclasses of “Command”. First, there are internal classes that facilitate the parsing and rendering process but have nothing to do with $\mathsf{\LaTeX}$ commands (see section D). Second, there are classes that only act as additional labels via multiple inheritance. These are

Division   = indicates the end of a section
Ignore     = do not include this while normalizing text

“Division” can be specified as an additional superclass if a command COMM typically ends a section and/or starts a block where a new level of sectioning takes place, and COMM is not declared as a “Section” class itself.

“Ignore” can be used as an additional flag if the contents of a command should not be included in text normalization procedures. Ignored commands may contain temporary additions by the editor, for example.

D. The implementation of $\mathsf{valep\TeX}$

This is a description of the internal structure of $\mathsf{valep\TeX}$, which might be helpful if you want to get familiar with the code to edit it or use it in your own applications, e.g. if you want to pre- and post-process files and use $\mathsf{valep\TeX}$ only as one step of a more complex Python workflow.

1. The $\mathsf{valep\TeX}$ file structure

Your installation of $\mathsf{valep\TeX}$ should consist of a folder containing the following files, along with any instances of “__pycache__”, which is a Python system folder:

custom                            = see section B.2
default_fileopening.tex           = see section B.2
parser_classes.py                 = see this section
parser_classes_standard_latex.py  = see section C
parser_functions.py               = see this section
valeptex.py                       = see this section
valeptex_handbook.tex             = this handbook
custom                            folder for all custom .py files
html                              folder for all html specs
xml                               folder for all xml specs

any of these specification folders must contain: 
default-layout.jinja2             = see section B.4
latex-standard.jinja2s            = see section B.3
custom                            = all custom rendering specs

Remember that all your custom files should be located in “custom” folders, namely, “/custom” for custom .py files and “html/custom” or “XXX/custom” for any custom rendering specifications. Note that you can of course also change the content of system files, but then you might lose your changes if you upgrade to a newer version of $\mathsf{valep\TeX}$.

2. The main file valeptex.py

The __main__ function takes the following parameters:

tex_file          = specific LaTeX file to be parsed
--folder          = all LaTeX files in the current folder
--subfolders      = also includes all subfolders
--normalize       = adds a .txt output
--writelog        = writes a logfile valeptex.log
--writeunknown    = log only contains unknown commands
--useownpreamble  = use the document's own preamble
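For readers who want to script around these options, the following sketch reproduces the parameter list with Python's standard argparse module. Whether “valeptex.py” itself uses argparse is not stated in this handbook, so the function build_parser is purely illustrative.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a command-line parser mirroring the options listed above."""
    parser = argparse.ArgumentParser(prog="valeptex.py")
    parser.add_argument("tex_file", nargs="?",
                        help="specific LaTeX file to be parsed")
    flags = [
        ("--folder", "all LaTeX files in the current folder"),
        ("--subfolders", "also includes all subfolders"),
        ("--normalize", "adds a .txt output"),
        ("--writelog", "writes a logfile valeptex.log"),
        ("--writeunknown", "log only contains unknown commands"),
        ("--useownpreamble", "use the document's own preamble"),
    ]
    for flag, text in flags:
        parser.add_argument(flag, action="store_true", help=text)
    return parser


options = build_parser().parse_args(["valeptex_handbook", "--normalize"])
```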

The --normalize option generates a plain text representation of the .tex file(s), which can be useful for feeding into translation engines, search engines, and the like. By default, no .txt file is written.

The --writelog option writes all specifications, error messages and unknown commands from “stack” (explained below); the --writeunknown option limits the log to a list of all unknown commands. By default, no log file is written.

Finally, the --useownpreamble option implies that the parser is not using the default preamble but the document’s own preamble, if there is any: incomplete $\mathsf{\LaTeX}$ documents that do not contain a document environment are still parsed using the default preamble.

Before the parsing process can start, the script has to create a variable “stack”, which is an instance of the class “docstack” in “parser_classes.py”. This variable is passed through the whole process, even if multiple files are batch converted. Among other things it contains

- A list of all commands known to the system
- A list of all \newcommand declarations found
- A list of all theorems, packages, counters, etc.
- Default file opening information
- An error message stream
- A list of unknown commands
- The jinja specification needed for rendering

The main advantage of the “stack” variable is that the declarations referenced in the default fileopening only need to be loaded once, even if a batch conversion runs over several files. If these declarations are large – as is the case in VALEP – this speeds up batch processes significantly.
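The load-once behaviour can be illustrated with a simplified stand-in for the “stack” object. The class DocStack, its field names, and the function ensure_preamble are hypothetical and far smaller than the real “docstack” class.

```python
from dataclasses import dataclass, field


@dataclass
class DocStack:
    """Simplified stand-in for the shared state of a batch conversion."""
    known_commands: dict = field(default_factory=dict)
    newcommands: dict = field(default_factory=dict)
    errors: list = field(default_factory=list)
    unknown_commands: list = field(default_factory=list)
    preamble_loaded: bool = False
    load_count: int = 0  # only here to make the effect observable


def ensure_preamble(stack, load_declarations):
    """Load the default declarations only on the first call."""
    if not stack.preamble_loaded:
        stack.newcommands.update(load_declarations())
        stack.preamble_loaded = True
        stack.load_count += 1
    return stack


stack = DocStack()
for _ in range(3):  # three files in one batch share the same stack
    ensure_preamble(stack, lambda: {"carnap": "Rudolf Carnap"})
```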

Once the “stack” variable is filled – there are several functions that support this task (load_jinja_template, load_jinja_specs, etc.) – the system starts either with a single file or with all files in the current folder (with or without its subfolders) and passes them one by one to the parse_latex function.

The parse_latex function passes the document through the parsing process. It takes a filename and a stack variable and returns a parsed and rendered string along with the stack variable. If you want to embed $\mathsf{valep\TeX}$ in a process that requires some pre- and postprocessing, parse_latex is probably the function to use. If you are preprocessing a tex_file before passing it to parse_latex, it is probably best to store it in a temporary file and pass the filename to parse_latex (you can delete the temporary file at the end). Your script can then call parse_latex and afterwards postprocess the string it receives. – These are the main steps of the process as performed by parse_latex:

(1) start the process       start_parse
(2) produce a DOM object    parse_blocks
(3) render the document     doc_list.render
(4) embed in template       doc_list.render_document
(5) clean up                polish the rendered document
(6) return                  string and stack

All these tasks are done by the functions in “parser_functions.py” and the classes in “parser_classes.py”.
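A workflow with pre- and postprocessing around parse_latex might be organized as in the sketch below. The wrapper convert_with_hooks is hypothetical; parse_latex is passed in as a parameter and is assumed to accept a filename and a stack and to return a string and a stack, as described above. Whether the filename has to be given with or without its extension is not covered by this sketch.

```python
import os
import tempfile


def convert_with_hooks(source, stack, parse_latex, preprocess, postprocess):
    """Preprocess LaTeX source, parse it via a temporary file, postprocess."""
    fd, path = tempfile.mkstemp(suffix=".tex")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            handle.write(preprocess(source))
        # The file is closed here, so the parser can open it on any platform.
        rendered, stack = parse_latex(path, stack)
    finally:
        os.remove(path)  # the temporary file is no longer needed
    return postprocess(rendered), stack


def fake_parse(path, stack):
    """Stub standing in for parse_latex, used only to exercise the wrapper."""
    with open(path, encoding="utf-8") as handle:
        return "<p>" + handle.read() + "</p>", stack


result, same_stack = convert_with_hooks(
    "a  b", {}, fake_parse, lambda s: s.replace("  ", " "), str.upper)
```

Passing the parser as an argument keeps the wrapper testable without an installation of $\mathsf{valep\TeX}$; in real use you would hand over the imported parse_latex function instead of the stub.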

3. The file parser_functions.py

This file contains the two main functions of the parser “start_parse” and “parse_blocks”. The former is a fairly straightforward and simple construct: it (1) loads the file, along with recursive calls to \input{file} commands, (2) appends any content from the default preamble (at the first call of start_parse during a batch process), and (3) does some polishing on the string:

- remove escape sequences \{ \} \$ \& \# \% \_ \~{}
- replace escaped characters \c{c} \"{o} etc. 
  with their Unicode equivalents
- remove comments and linebreaks
- but leave any Verbatim Commands untouched
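Two of these polishing steps can be sketched with the standard library. The functions strip_comments and replace_accents are hypothetical; the accent table is deliberately small, a line break “\\” directly followed by a comment is not handled, and the sketch does not protect Verbatim content as the real function does.

```python
import re
import unicodedata

# LaTeX accent command -> Unicode combining character (incomplete table).
COMBINING = {'"': "\u0308", "'": "\u0301", "`": "\u0300",
             "^": "\u0302", "~": "\u0303", "c": "\u0327"}


def strip_comments(text: str) -> str:
    """Remove everything from an unescaped % to the end of the line."""
    return re.sub(r"(?<!\\)%.*", "", text)


def replace_accents(text: str) -> str:
    """Turn accent macros with a braced letter into precomposed characters."""
    def compose(match):
        accent, letter = match.groups()
        return unicodedata.normalize("NFC", letter + COMBINING[accent])

    return re.sub(r"""\\(["'`^~c])\{([A-Za-z])\}""", compose, text)


polished = replace_accents(strip_comments(r'Sch\"{o}n gar\c{c}on % a remark'))
```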

The second function “parse_blocks” is by far the most complex element of the parser and actually does the bulk of the parsing. During this process, the string polished with start_parse remains untouched and is traversed exactly once, token by token. This is facilitated by the three functions “match_brackets”, “match_braces” and “match_environments”, which identify the end of a block (of brackets, braces or \begin{SOMETHING}…\end{SOMETHING}). These are the main tasks of “parse_blocks”:

- catch Verbatim stuff and leave the content unchanged
- replace any text macros with a known declaration
- identify the parameters of all other commands
- identify declarations and store them in “stack”
- identify and store every command in the DOM tree
- differentiate between known and unknown commands
- take care of other special commands:
  - formulas
  - tables
- call any \include{file} commands at the end
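To give an impression of what a block-matching helper has to do, here is a minimal sketch of brace matching. It shares its name with “match_braces” but is not the implementation from “parser_functions.py”; the signature and the return value of -1 for unbalanced input are assumptions.

```python
def match_braces(text: str, start: int) -> int:
    """Return the index of the brace closing the one at text[start], or -1."""
    if start >= len(text) or text[start] != "{":
        raise ValueError("start must point at an opening brace")
    depth = 0
    i = start
    while i < len(text):
        char = text[i]
        if char == "\\":
            i += 2  # skip escaped characters such as \{ and \}
            continue
        if char == "{":
            depth += 1
        elif char == "}":
            depth -= 1
            if depth == 0:
                return i
        i += 1
    return -1  # unbalanced input: report it instead of looping forever


snippet = r"\emph{a {b} \} c} tail"
opening = snippet.index("{")
closing = match_braces(snippet, opening)
```

Returning a sentinel for an unclosed block is one simple way to keep a parser from hanging on incorrect syntax.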

It seems that the elegance of the “parse_blocks” function is the main reason why $\mathsf{valep\TeX}$ is so much faster and probably more stable than plasTeX. “parse_blocks” is highly recursive: it contains no fewer than 28 calls to itself, and this function alone builds the entire DOM tree.

However, there are still open tasks: the parsing of tables and tabbings is still rather rudimentary; and there is plenty of room for further treatment of sections, theorems, counters, and bibliographies.

4. The file parser_classes.py

It might be worth mentioning that when I started writing this parser, I first worked out a preliminary version of “start_parse” and “parse_blocks” without including any object-oriented features. Only once large parts of these two functions had been developed did I start to implement the class features that can now be found in the file “parser_classes.py”. Whether this somewhat unusual approach causes any shortcomings (or is even an advantage) is something I cannot say at this stage. However, the current version of the parser is strongly based on object-oriented features as shown in section C.

Besides the “Command” class and the “docstack” class already discussed above, there are two main classes on which the whole parsing process is based: “docnode” and “jinjaSpec”.

“docnode” represents the main design idea of the parser: any $\mathsf{\LaTeX}$ document is transformed into a linear representation in docnode, namely a list of elements that are either strings – i.e. pure sequences of Unicode characters – or potentially nested instances of the “Command” class. This is the general scheme of the $\mathsf{valep\TeX}$ DOM:

 text, command, text, command, text, ...    docnode
          |
        name/class reqarg reqarg            command
                     |
                   text, command, ...       docnode
                            |
                          name/class ...    command
                                              ...

The first layer is a docnode object that decomposes into a list whose elements are either commands or text snippets. Each command is either a simple text macro, or the command has parameters, each of which is represented by a docnode object, and so on. This whole nested structure is created by the “parse_blocks” function.
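The nested structure can be mimicked with plain Python lists. In the sketch below, the class CommandNode and the function flatten_text are hypothetical stand-ins: a node list plays the role of a docnode, and every argument of a command is again a node list.

```python
from dataclasses import dataclass, field


@dataclass
class CommandNode:
    """Stand-in for a parsed command with its arguments."""
    name: str
    args: list = field(default_factory=list)  # each argument is a node list


def flatten_text(nodes) -> str:
    """Collect all plain strings of a nested node list in document order."""
    parts = []
    for node in nodes:
        if isinstance(node, str):
            parts.append(node)
        else:
            for arg in node.args:
                parts.append(flatten_text(arg))
    return "".join(parts)


document = [
    "See ",
    CommandNode("emph", [["the ", CommandNode("textbf", [["bold"]]), " part"]]),
    ".",
]
flat = flatten_text(document)
```

The recursion in flatten_text follows the same alternation of node lists and commands that the scheme above depicts.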

A “jinjaSpec” object, on the other hand, is constructed from the available jinja2s files: those provided by the system plus custom files. The __init__ function of “jinjaSpec” parses the contents of these files line by line. For any command X there may be multiple specifications in the files, but jinjaSpec keeps only the last one, so that specifications from custom files overwrite those from system files. The details of the rendering process are not important here. Rather, it seems appropriate to have a look at the syntax of jinja2s files.
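Before turning to the syntax, the “last specification wins” behaviour can be sketched in a few lines of Python. This is not the code of “jinjaSpec”; the function `load_specs` is hypothetical, and it assumes that every specification starts with a `name:` line (as in the examples of this handbook) and that custom files are passed after system files.

```python
def load_specs(*sources: str) -> dict:
    """Collect 'name: X' blocks from several sources; later blocks win."""
    specs = {}
    for text in sources:          # pass system files first, custom files last
        name, body = None, []
        for line in text.splitlines():
            if line.startswith("name:"):
                if name is not None:
                    specs[name] = "\n".join(body).strip()
                name, body = line[len("name:"):].strip(), []
            elif name is not None:
                body.append(line)
        if name is not None:
            specs[name] = "\n".join(body).strip()
    return specs


system_file = "name: emph\n<em>{{ obj }}</em>\n\nname: textbf\n<b>{{ obj }}</b>\n"
custom_file = "name: emph\n<i class=\"emph\">{{ obj }}</i>\n"

specs = load_specs(system_file, custom_file)
print(specs["emph"])
```

Because a plain dictionary is used, a later assignment for the same command name simply replaces the earlier one, which is all that the override rule requires.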

5. A custom pseudo jinja syntax

The main reason for using the jinja syntax here was that I started working with plasTeX and already had a bunch of jinja specifications for my own $\mathsf{\LaTeX}$ macros that I wanted to reuse with my own parser. In fact, however, I use neither jinja itself nor any jinja module in $\mathsf{valep\TeX}$, nor do I closely follow the jinja specification. Rather, I have used some elements of jinja while ignoring others, and I have also added some ad hoc features, e. g. for specifying tables or normal text representations of macros. This loose handling of jinja is probably relevant only to those who are already familiar with the jinja syntax. My pseudo-jinja syntax includes five different syntactic elements, of which only the first three are originally jinja:

{% ... %} for statements
{{ ... }} for expressions
{# ... #} for comments
{! ... !} for normaltext representations
@tb ... 
@rb ... 
@cb ... 
@cc ...
@ce ... 
@re ... 
@te ...   for tables

Note that the overall structure of a jinja2s file has already been discussed in section B.3, and table specifications in section C. Normal text representations can be added to any jinja specification; by convention they should be added at the end of the specification. The only purpose of a {! … !} string is to specify a normal text representation for a macro that, for example, produces text in a more graphical way. The macro \LaTeX does not automatically produce the string “LaTeX”, especially in HTML, where it is rendered as a formula, which means that it shows up in the system as a vector graphic. To add the string “LaTeX” as normal text, the jinja specification of \LaTeX looks like this:

name: LaTeX
\(\mathsf{\LaTeX}\){! LaTeX !}

For standard rendering jobs the {! …!} string is simply ignored. But for normal text output, the system grabs the normal text string and delivers “LaTeX” instead of “$\mathsf{\LaTeX}$”.
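The two uses of a {! … !} string can be sketched as follows. The regular expression and the two function names are assumptions made for this illustration and say nothing about how $\mathsf{valep\TeX}$ itself extracts the string; in particular, returning `None` for a specification without a normal text part is a choice of the sketch.

```python
import re

# Matches a {! ... !} part; the captured group is the normal text itself.
NORMALTEXT = re.compile(r"\{!\s*(.*?)\s*!\}", re.DOTALL)


def standard_rendering(spec: str) -> str:
    """Drop the {! ... !} part, as a standard rendering job would."""
    return NORMALTEXT.sub("", spec)


def normal_text(spec: str):
    """Return the {! ... !} string, or None if the spec has none."""
    match = NORMALTEXT.search(spec)
    return match.group(1) if match else None


spec = r"\(\mathsf{\LaTeX}\){! LaTeX !}"

print(standard_rendering(spec))
print(normal_text(spec))
```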

{# Comments #} can be placed anywhere in a jinja2s specification and will be ignored.

{{ Expressions }} are the key elements of a jinja2s specification: they insert the contents of a “Command” object into the strings that specify tag elements. With the simplest expression {{ obj }} you can always get the default parameter of an object, which is either the only parameter or, if there is more than one parameter, the last required parameter. Arguments can also be accessed using the names specified in the “args” string of the class specification. For example:

name: href
<a href="{{ obj.attributes.link }}">
{{ obj.attributes.caption }}</a>

This specifies how to map a \href command to an HTML link tag. The two attributes “link” and “caption” are given in the class specification of href:

class href(Link):
    args = 'link caption'

Finally, $\mathsf{valep\TeX}$ also supports a form of jinja statements, namely if clauses, which can have the following form:

{% if EXP %} specification content 
{% elif EXP %} specification content
...
{% else %} specification content
{% endif %}

Multiple elif clauses are allowed. The main purpose of if clauses is to set some tags only if a certain parameter actually exists. This is an example from the Carnap edition:

name: neueseite
<new-p></new-p><fac-simile>{% if obj  %}
<img class="fac-simile" 
src="https:// ... facsimile/{{ obj }}" 
height="100%" />{% endif %}</fac-simile>

Here, a tag fac-simile contains an img tag with an appropriate src specification only if the information {{ obj }} needed to construct the link is present.
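The selection logic of such if clauses can be sketched in Python. The function `resolve_ifs` is hypothetical: it looks expressions up in a plain dictionary, treats empty or missing values as false, and does not handle nested if clauses. It illustrates the branch selection only and is not the statement handling of $\mathsf{valep\TeX}$.

```python
import re

IF_BLOCK = re.compile(r"\{%\s*if\s+(.*?)\s*%\}(.*?)\{%\s*endif\s*%\}", re.DOTALL)
BRANCH = re.compile(r"\{%\s*(elif\s+.*?|else)\s*%\}", re.DOTALL)


def resolve_ifs(spec: str, values: dict) -> str:
    """Keep the first branch whose expression is truthy in `values`."""
    def choose(match):
        # Splitting on the branch tags yields [body, tag, body, tag, body, ...]
        pieces = BRANCH.split(match.group(2))
        bodies = pieces[0::2]
        conditions = [match.group(1)]
        for tag in pieces[1::2]:
            # None marks the unconditional else branch.
            conditions.append(None if tag == "else" else tag[len("elif"):].strip())
        for condition, body in zip(conditions, bodies):
            if condition is None or values.get(condition):
                return body
        return ""
    return IF_BLOCK.sub(choose, spec)


spec = '<p{% if obj.attributes.skip %} class="skip"{% else %} class="plain"{% endif %}>'

print(resolve_ifs(spec, {"obj.attributes.skip": "2em"}))
print(resolve_ifs(spec, {}))
```

With no else branch and a false condition, the whole clause disappears, which corresponds to the empty fac-simile tag in the example above.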

In the current implementation of $\mathsf{valep\TeX}$, the jinja part does not cover for loops, filters, and other elements included in the jinja specification. This may change in the future.

Appendix 1: How to control the rendered output

There are three scenarios for idiosyncratic $\mathsf{\LaTeX}$ macros that are not rendered by the default resources of $\mathsf{valep\TeX}$ and/or get only default rendering (cf. section B.3). First, many idiosyncratic $\mathsf{\LaTeX}$ macros are simply irrelevant to the rendered output and can be ignored. Second, there is an easy way to render a macro using the tools described in section B.3. Third, there still might be cases where it turns out to be impossible to directly produce suitable output with $\mathsf{valep\TeX}$ while leaving the original $\mathsf{\LaTeX}$ document unchanged. In this third scenario, some changes to the original $\mathsf{\LaTeX}$ documents become necessary. Three cases are noteworthy.

Minor hacks. Often it is simply a bug or an idiosyncratic feature of the parser that leads to unwanted output. In such cases you may just add some whitespace or braces that leave the PDF output unchanged but tidy up the HTML output. If this does not work, choose the next option:

Provide alternative $\mathsf{\LaTeX}$ content for the renderer. The simplest way to systematically change your $\mathsf{\LaTeX}$ macros in ways that prepare them for the parser is to create additional parameters. The soft option is to add an optional parameter to a macro that does not yet have one. For this purpose you need to increment the number of parameters in the command declaration and also increment all #n expressions. Consider the following example:

\newcommand{\mymacro}[1]{\par\bigskip#1\par}

This macro switches to a new paragraph and sets a bigskip before the paragraph starts. In the renderer you can of course simulate this, e. g. in the following form:

name: mymacro
</p><p>{{ obj }}</p>

And you could also add some style information here that increases the space between the paragraphs. But HTML layout is very different from $\mathsf{\LaTeX}$ layout, which is also the reason why macros like \bigskip are generally ignored by the renderer. So you may decide to add no additional vertical space by default, but enable the user to insert some in specific cases. For this purpose you may change the definition of \mymacro:

\newcommand{\mymacro}[2][]{\par\bigskip#2}

as well as the class declaration and rendering information:

class mymacro(Command):
    args = '[ skip ] text'	
	
name: mymacro
</p><p{% if obj.attributes.skip %}
 style="margin-top:{{ obj.attributes.skip }}"
{% endif %}>{{ obj }}</p>

Now you can add spacing information to your command:

\mymacro[300em]{This is the most vertically spaced 
 text I ever saw.}
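The relation between the “args” string and the parameters of the macro can be made explicit with a short sketch. It assumes, following the example above, that a name in square brackets denotes an optional parameter and a bare name a required one; the function `parse_args` is hypothetical and not part of $\mathsf{valep\TeX}$ (see section C for the actual conventions).

```python
import re


def parse_args(spec: str):
    """Split an args string into (name, is_optional) pairs, in order."""
    pairs = []
    for optional, required in re.findall(r"\[\s*(\w+)\s*\]|(\w+)", spec):
        pairs.append((optional, True) if optional else (required, False))
    return pairs


print(parse_args("[ skip ] text"))
print(parse_args("link caption"))
```

Read this way, `'[ skip ] text'` mirrors the declaration `[2][]` of \mymacro: two parameters, the first of which is optional.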

The harder way to add options for the renderer would be to create an additional required parameter, which would imply that every instance of the command must contain an additional curly brace block. The advantage of this strategy is that you can entirely separate the code for the $\mathsf{\LaTeX}$ document from the code for the renderer.

\newcommand{\latexhtml}[2]{#1}

This command ignores the second parameter in $\mathsf{\LaTeX}$, while you may deal with it in $\mathsf{valep\TeX}$ as follows:

class latexhtml(Command):
    args = 'latex html'

name: latexhtml
... {{ obj.attributes.html }} ...

In this case, however, you still have to write the code for the second parameter in $\mathsf{\LaTeX}$, because it will run through the parser. If this does not work either, you can choose the second main alternative:

Use non-$\mathsf{\LaTeX}$ code inside $\mathsf{\LaTeX}$. If it is impossible or just too complicated to express in $\mathsf{\LaTeX}$ code what you want the renderer to produce, then you may decide to create a command or environment derived from the “Verbatim” class in the way described in section C.

\newcommand{\plainhtml}[1]{}

class plainhtml(VerbatimCommand):
    args = 'html'
    
name: plainhtml
{{ obj }}

This command produces no $\mathsf{\LaTeX}$ output, and its class specification identifies it as a subclass of “VerbatimCommand”, which means that the parser leaves the code untouched. Whereas the second parameter of \latexhtml must contain $\mathsf{\LaTeX}$ code, the parameter of \plainhtml may contain any code, which will be reproduced token by token by the renderer, so you can write in your $\mathsf{\LaTeX}$ document:

\plainhtml{<img src="myimage.svg" width="200" />}

Note that only this command will cause the browser to interpret the content as an image tag, whereas the standard $\mathsf{\LaTeX}$ “verbatim” environment will lead to the output of the string

<img src="myimage.svg" width="200" />

You can, of course, also combine the two main strategies in all kinds of ways.
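The difference between \plainhtml and the “verbatim” environment noted above comes down to HTML escaping. The following sketch uses `html.escape` from the Python standard library to show what must be emitted so that a browser displays the string instead of interpreting it; whether $\mathsf{valep\TeX}$ uses this particular function internally is not stated here and is merely assumed for the illustration.

```python
import html

raw = '<img src="myimage.svg" width="200" />'

# What \plainhtml passes on: the markup itself, which a browser interprets.
as_markup = raw

# What a verbatim rendering has to emit so that the browser shows the string.
as_text = html.escape(raw)

print(as_markup)
print(as_text)
```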

Appendix 2: Formulas with $\mathsf{\LaTeX}$ and MathJax

$\mathsf{\LaTeX}$ is still the only useful tool for typesetting formulas. These formulas can also be displayed in HTML using MathJax (https://www.mathjax.org/). For example:

\[\mathfrak{e}(\text{tf} , \mathfrak{K}_i , e) = \sum ^s_{m=0} [m \times \mathfrak{c}(h_m , e)]. \]

The only limitation in MathJax is that you need to deal with custom macros first. For example, there may be a macro \boldP that you use in formulas and which is defined this way:

\newcommand{\boldP}{\ensuremath{\mathbf{P}}}

In formulas you can then write things like $\boldP = x^2$, but if you render this formula with MathJax without telling it the macro definition, the result is the undesired \boldP$=x^2$, because the macro is not defined in MathJax. Instead of explicitly defining this macro inside MathJax, you can also include it in the preamble and let $\mathsf{valep\TeX}$ do the processing. This has the additional advantage that even where you use the macro in text mode, $\mathsf{valep\TeX}$ will automatically transfer it to math mode, so that it is correctly rendered by MathJax. This certainly covers most cases where people want to use custom macros in formulas.
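The idea of resolving a text macro before the formula reaches MathJax can be sketched as a simple substitution. The table `TEXT_MACROS` and the function `expand_text_macros` are hypothetical; the sketch assumes that the macro body has been read from the preamble and that the \ensuremath wrapper has already been removed, and it makes no claim about how $\mathsf{valep\TeX}$ performs this step.

```python
import re

# Zero-argument macros taken from the preamble (hypothetical table).
TEXT_MACROS = {"boldP": r"\mathbf{P}"}

# A control word: a backslash followed by letters only.
CONTROL_WORD = re.compile(r"\\([A-Za-z]+)")


def expand_text_macros(formula: str, macros: dict = TEXT_MACROS) -> str:
    """Replace known zero-argument macros; leave all other commands alone."""
    def replace(match):
        return macros.get(match.group(1), match.group(0))
    return CONTROL_WORD.sub(replace, formula)


print(expand_text_macros(r"$\boldP = x^2$"))
```

After the substitution the formula contains only commands that MathJax knows, so no macro definition has to be passed to MathJax.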

The second scenario is less important, and $\mathsf{valep\TeX}$ does not offer an internal solution at this stage. It concerns macros that are not mere text macros and that MathJax does not interpret correctly. These can be either your own custom macros or some esoteric math commands that MathJax simply does not know. An example of the latter is the command \nicefrac{num}{denom} from the nicefrac package. This macro works in both math and text mode and produces nice fractions like ²⁄₃. Unfortunately, there is no rendering for this command available in MathJax, so you need to deal with it yourself. You can specify a rendering procedure for the macro:

class nicefrac(Format):
    args = 'num denom'

name: nicefrac
<sup>{{ obj.attributes.num }}</sup>⁄
<sub>{{ obj.attributes.denom }}</sub>

This rendering is not particularly good, but it works in text mode, e.g. \nicefrac{2}{3} produces ²⁄₃. However, the rendering does not reach math mode, so if you write $\nicefrac{2}{3}$ instead, the result will be \nicefrac23. There is simply no support for custom macros with parameters inside formulas in the present version of $\mathsf{valep\TeX}$: cases like this must be handled directly in MathJax. Still, for the bulk of all $\mathsf{\LaTeX}$ scenarios this will not cause any issues.

Appendix 3: Footnotes, margin notes and the like

Footnotes, endnotes, margin notes, and the like – for simplicity’s sake, I will refer to them as “footnotes” – need to be treated differently in print and on a screen, simply because a screen is structured differently from a sheet of paper. There are many different ways to create footnotes on a screen; the following three seem to be the most important. First, one can simply put the note at a different place in the document or even in a separate document. To achieve this, some postprocessing of the HTML output of $\mathsf{valep\TeX}$ would be necessary to position the notes correctly at the desired place.

Fortunately, the other two varieties are not only much more useful for editorial purposes but also involve no postprocessing at all. One may either put the footnote somewhere in the margin. [Margin note: Note that you can see this feature only in the HTML output, because in $\mathsf{\LaTeX}$ this is simply a footnote.] The other option, which I generally prefer, is to put the footnote close to the original text in a popup window. [Popup note: This is my preferred way of typesetting footnotes in HTML/XML.] In this case, the CSS and Javascript are adjusted in such a way that you can even keep the note on the screen by clicking on it (click again to deactivate it – and bear in mind that you can see this only in the HTML output 😉).

As you can see, this even works in tabulars and boxes.† [Margin note †: Note, however, that the position is not identical to the previous margin note. Positioning in tables is actually not as straightforward for margin notes, which is another reason why I prefer popup notes.]

Although $\mathsf{\LaTeX}$ forces you to split the notes here into a label part [margin note: Another margin footnote.] and a text part, the result is as desired.
Exercise: see what happens if you change the rendering file, e. g. to also display the content of \fnEtext.

Appendix 4: $\mathsf{\LaTeX}$ for data entry

$\mathsf{\LaTeX}$ is undoubtedly a dinosaur in the field of markup languages. Its main shortcomings are

A very complicated syntax full of exceptions and ad hoc solutions.
A one-for-all solution: no clear distinction between markup language, style language and scripting language.
Lack of flexibility: the $\mathsf{\TeX}$ basis of $\mathsf{\LaTeX}$ leads to serious limitations.

Compare it to HTML/XML. There is an extremely simple syntax; there is a clear distinction between the markup language (HTML/XML), the stylesheets (e. g. in CSS), and the scripting (e. g. with Javascript or Python); and there is infinite flexibility, since neither HTML nor XML is tied to a base tool or engine and can therefore be extended in any direction. Nevertheless, even HTML/XML have their limitations, albeit at different levels than $\mathsf{\LaTeX}$.

They are less suitable for printing than $\mathsf{\LaTeX}$.
They do not provide adequate typesetting for formulas.
They are poorly suited for manual data entry.

To start with the third point, the extremely simple syntax of HTML/XML comes at a price, because it implies that these markup languages are hardly usable for manual data entry by those who prefer to “speak” a markup language directly rather than use a WYSIWYG tool. The reasons for preferring manual data entry may be partly a matter of taste, but there are also strong logical reasons for rejecting WYSIWYG tools, since manual data entry is the only way to get full control over the output. Especially for complex edition projects, manual data entry will usually be the best choice. And this is where $\mathsf{\LaTeX}$ offers unbeatable advantages. The complex syntax of $\mathsf{\LaTeX}$ is mainly, if not exclusively, due to the efforts of its inventors to facilitate manual data entry. $\mathsf{\LaTeX}$ is a language that is perfectly designed to be used by real people to edit their texts while directly using the markup language.

When it comes to formulas, every editor ends up using $\mathsf{\LaTeX}$ anyway. So why not just use $\mathsf{\LaTeX}$ from the start, given that it is so much better for manual data entry and that formulas are its exclusive domain? And there is even a third reason to choose $\mathsf{\LaTeX}$ as a platform for data entry, namely for all editions that are not exclusively electronic but plan to produce a printed version or at least a PDF version. Here, the peculiarities of book production make $\mathsf{\LaTeX}$ the best choice. Books are in many respects much more complex typesetting tasks than web pages. Editors can create a $\mathsf{\LaTeX}$ document for perfect book and print typesetting, as well as for perfect formula typesetting, and then produce HTML/XML and other non-print formats from it, while simply discarding any information that is purely a matter of print typesetting.

Have fun with $\mathsf{valep\TeX}$!

Processed with $\mathsf{valep\TeX}$, Version 0.1, May 2024.
