Wgml Sequencing
Introduction
This page is intended to consolidate information, developed while working on other topics, on the sequence in which wgml 4.0 performs various actions. Although some "rounding out" of the topics is unavoidable, a comprehensive discussion of these topics lies in the future.
Duplicating the steps shown is mandatory only to the extent that following the same sequence as wgml 4.0 is needed to ensure that our wgml produces the same output file from the same input.
From time to time, statements are made about where and when text output occurred. This always refers to output intended to be part of the document, as opposed to the control codes or, in the test framework, identifying text, emitted as a result of interpreting the various compiled function blocks. It might be wondered how it was possible to be certain that no text output occurred when that output included space characters. These steps were taken to ensure accuracy in this matter:
- The :DEVICE block was given an :OUTTRANS block which translated " " to "|".
- Only %image() (never %text()) was used for the function block output, and any embedded spaces were not converted.
- All text output was interpreted.
Thus, spaces intended to appear in the document when printed appear as "|" characters and were quite obvious -- as was their absence.
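The effect of such a translation can be illustrated with a minimal C sketch. The table layout and the names `outtrans`, `init_outtrans` and `translate_output` are hypothetical, chosen only for illustration; nothing here is taken from wgml 4.0 itself.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical output translation table: identity except ' ' -> '|'. */
static uint8_t outtrans[256];

static void init_outtrans(void)
{
    for (int i = 0; i < 256; i++) {
        outtrans[i] = (uint8_t)i;
    }
    outtrans[' '] = '|';    /* make document spaces visible */
}

/* Translate len characters of document text in place. */
static void translate_output(uint8_t *text, size_t len)
{
    assert(text != NULL);
    for (size_t i = 0; i < len; i++) {
        text[i] = outtrans[text[i]];
    }
}
```

Text passed through `translate_output()` shows every space as "|", while spaces emitted by function blocks, which bypass the table in this sketch, remain ordinary spaces.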
The test framework used implemented all of the function blocks. When a block is identified as not being interpreted when it is expected to be, that applies to the situation when that block exists: that is, the block exists, a context exists in which it is usually interpreted, and yet in this particular case it is not.
Blocks which do not exist, of course, cannot be interpreted. Every statement that a block is interpreted at a particular point must be understood as qualified by "if that block exists". Unless otherwise noted, the effect of a block not existing is identical to it existing and doing nothing whatsoever when interpreted.
Startup and General Processing
This section discusses what can be said, from the evidence available and from reasonable hypotheses, of how wgml 4.0 starts up document processing. A few insights into the general course of document processing are also documented.
The Initial Startup Sequence
The startup sequence appears to be:
- Process the command line and other startup activities (hypothetical, but quite likely).
- Extract the information for the specified device from the binary device library.
- Interpret the START :PAUSE block.
- Interpret the START :INIT block.
- If more than one pass was specified, display "pass #1".
- Find the document specification.
- Make the document specification the current file.
- Find any included layout files, and make each of them the current file in turn.
- Find any layout file specified by the command line option LAYOUT and make it the current file.
- Process the layout.
- Make the document specification the current file again. Any included files will be opened and made the current file as they are called for during the rest of the processing.
- Begin formatting the document.
- Interpret the DOCUMENT :PAUSE block.
- Interpret the DOCUMENT :INIT block.
- Perform an implicit %enterfont().
How This Was Determined
There are several factors which determine how much detail wgml 4.0 reports during document processing. These items were used to maximize screen output in investigating sequencing:
- The command-line option "incl" was used so that file names would be displayed as each file was used.
- The command-line option "layout" was used to more fully show how the various files that can affect the layout were processed.
- The command-line option "pass" was used to determine, so far as possible, what happened on each pass.
- The :PAUSE and :FONTPAUSE blocks were implemented so that they produced output if interpreted.
- The blocks in the :DRIVER block were implemented so that they could be clearly identified in the output file.
Now consider this output from wgml 4.0:
WATCOM Script/GML V4.0 Copyright by WATCOM International Corp. 1985,1993.
Processing device information
*** START PAUSE block.
pass #1
Current file is 'e:\progdev\cpp\owtest\wgml\docs\plain.gml'
Current file is 'e:\progdev\cpp\owtest\wgml\docs\testlay.gml'
Processing layout
Current file is 'e:\progdev\cpp\owtest\wgml\docs\plain.gml'
Formatting document
*** DOCUMENT PAUSE block.
*** FONTPAUSE pause01.
Examination of the output file shows these blocks in this order:
- The START :INIT block :VALUE block
- The START :INIT block :FONTVALUE block (multiple instances)
- The DOCUMENT :INIT block :VALUE block
- The DOCUMENT :INIT block :FONTVALUE block (multiple instances)
- The :FONTSWITCH block :STARTVALUE block for :DEFAULTFONT 0
- The :FONTSTYLE block :STARTVALUE block for :DEFAULTFONT 0
- The :FONTSTYLE block :LINEPROC block :STARTVALUE block for :DEFAULTFONT 0
- The :FONTSTYLE block :LINEPROC block :FIRSTWORD block for :DEFAULTFONT 0
By using %setsymbol() and %image(%getstrsymbol()) (which returns a non-null result only after the %setsymbol(), and so the block it is in, has been interpreted) it can be shown that the blocks are interpreted in this order:
- The START :PAUSE block
- The START :INIT block
- The DOCUMENT :PAUSE block
- The DOCUMENT :INIT block
- The :FONTPAUSE block for :DEFAULTFONT 0
- The :FONTSWITCH block :STARTVALUE block for :DEFAULTFONT 0
- The :FONTSTYLE block :STARTVALUE block for :DEFAULTFONT 0
- The :FONTSTYLE block :LINEPROC block :STARTVALUE block for :DEFAULTFONT 0
- The :FONTSTYLE block :LINEPROC block :FIRSTWORD block for :DEFAULTFONT 0
It is, of course, this analysis that produced the bulk of the sequence given above.
The multiple instances of the :INIT block :FONTVALUE block are explored in more detail here.
Supplemental Tests
These tests helped in determining the relative order in which certain actions occurred, and in justifying treating certain sequences as independent of the text line output sequence.
If an invalid document specification is used, then the screen output is:
Processing device information
*** START PAUSE block.
****ERROR**** IO--001: For file 'none'
System message is 'No such file or directory'
Cannot open file
and the output file contains:
- The START :INIT block :VALUE block
- The START :INIT block :FONTVALUE block (multiple instances)
This shows that the START blocks are done before wgml 4.0 attempts to locate the document specification.
If a large amount of text, a large number of passes, and the "Pause/Break" key on the keyboard are used with a device configured to write to an OS/2 printer from the DOS version of wgml 4.0, then OS/2's timeout for DOS printing causes two files to be captured. The first contains exactly what the sequence above indicates, down to and including the implicit %enterfont(); the second begins with the initial vertical positioning. This shows that the sequence above is output to the device on the first pass -- and that the rest of the output to the device is done on the last pass.
If the :LINEPROC 1 of :DEFAULTFONT 0 contains only the pass number, then wgml 4.0 does emit the message
Abnormal program termination: Memory protection fault
but only after the sequence above has completed. It has no problems interpreting the :LINEPROC block :STARTVALUE or :FIRSTWORD block as part of the explicit or implicit invocation of device function %enterfont(). This implies that the sequence shown above is indeed independent of the normal sequencing for text line output.
General Processing
Examination of the screen output (from the :DEVICE block) from wgml 4.0 when more than one pass is specified produces some additional information:
- The current "pass #" is emitted at the start of each pass.
- The same files are opened on each pass in the same order, including any layout file specified on the command line or in an option file.
- The layout, however, is only processed (that is, the message stating that it is being processed only appears) on pass 1.
- The DOCUMENT :PAUSE block, DOCUMENT :INIT block and virtual %enterfont(0) are done on pass 1 (that is, the DOCUMENT :PAUSE block only appears on pass 1).
- The :FONTPAUSE block for :DEFAULTFONT 0 which is part of the implicit %enterfont() only appears on pass 1.
- The remaining :PAUSE and :FONTPAUSE blocks do not appear until the last pass is done.
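The observations above, together with the startup sequence, can be summarized as a small C sketch that records which events occur on which pass. The event names, the `EventLog` struct and `run_passes()` are hypothetical; the sketch models only the observed ordering, not how wgml 4.0 is actually organized internally.

```c
#include <assert.h>

/* Hypothetical event names for the observed actions. */
enum Event {
    EV_START_PAUSE, EV_START_INIT, EV_PASS_MESSAGE, EV_OPEN_FILES,
    EV_PROCESS_LAYOUT, EV_DOCUMENT_PAUSE, EV_DOCUMENT_INIT,
    EV_IMPLICIT_ENTERFONT, EV_REMAINING_OUTPUT
};

enum { MAX_EVENTS = 256 };

struct EventLog {
    enum Event events[MAX_EVENTS];
    int count;
};

static void record(struct EventLog *elog, enum Event ev)
{
    assert(elog->count < MAX_EVENTS);
    elog->events[elog->count++] = ev;
}

/* Record the observed order of events for the given number of passes. */
static void run_passes(struct EventLog *elog, int passes)
{
    assert(passes >= 1);

    /* The START blocks precede any attempt to find the document. */
    record(elog, EV_START_PAUSE);
    record(elog, EV_START_INIT);

    for (int pass = 1; pass <= passes; pass++) {
        if (passes > 1) {
            record(elog, EV_PASS_MESSAGE);   /* "pass #n" */
        }
        record(elog, EV_OPEN_FILES);         /* same files, same order */
        if (pass == 1) {
            record(elog, EV_PROCESS_LAYOUT); /* reported on pass 1 only */
            record(elog, EV_DOCUMENT_PAUSE);
            record(elog, EV_DOCUMENT_INIT);
            record(elog, EV_IMPLICIT_ENTERFONT);
        }
        if (pass == passes) {
            /* The rest of the device output appears on the last pass. */
            record(elog, EV_REMAINING_OUTPUT);
        }
    }
}
```

With a single pass, the first pass is also the last pass, so both the pass-1 events and the remaining output are recorded in the same iteration.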
The Last Pass
This section uses concepts which are developed here and here.
The last pass is when the bulk of the output file is produced. This does not necessarily mean that the various blocks are not interpreted during the preceding passes; indeed, the fact that the global symbol table can be read and written in these blocks quite likely means that they are interpreted during each pass; but their output only reaches the screen (for the :DEVICE block) or the output file (for the :DRIVER block) on the last pass.
These actions are observed at the start of the last pass:
- Perform the initial vertical positioning.
- Establish the left margin.
- Output the first text line.
These steps will now be discussed in greater detail.
Initial Vertical Positioning
The initial vertical positioning can be modeled as a straightforward adaptation of the normal vertical positioning sequence. Starting with the value of "0" for the fields currentState.y_address and desiredState.y_address and the return value of device function %y_address(), setting the value of field desiredState.y_address to the desired value and applying that sequence will produce the observed effects.
It should be kept in mind that, if the :ABSOLUTEADDRESS block is defined, then, since no :NEWLINE blocks appear, the value returned by the device function %y_address() in the :FONTSTYLE block :LINEPROC block :ENDVALUE block which now immediately follows the implicit %enterfont(0) is still found set to the desired vertical position. The actual positioning occurs when the first line of text is actually output, using the :ABSOLUTEADDRESS block (unless %dotab() is encountered and causes the :ABSOLUTEADDRESS block to be interpreted earlier).
Establishing the Left Margin
The left margin is established in this context:
- The font used is :DEFAULTFONT 0.
- The value returned by device function %font_number() is "0".
- The value returned by device function %x_address() is "0".
- The value returned by device function %y_address() is the value set as part of the initial vertical positioning.
using this sequence:
- Interpret the :LINEPROC block :ENDVALUE block.
- Set desiredState.x_address to the value corresponding to the left margin.
- Set the value returned by device function %x_address() to the value of desiredState.x_address.
- Interpret the :LINEPROC block :STARTVALUE block.
- Interpret the :LINEPROC block :FIRSTWORD block, if defined; if not, interpret the :LINEPROC block :STARTWORD block.
The device function %dotab(), if used in the blocks interpreted in the last two steps, will cause horizontal positioning to occur. This implies that currentState.x_address has the value "0".
If device function %dotab() is used in more than one block, it will only cause horizontal positioning in the first block. Thus, in terms of the model, the value of currentState.x_address is set to the current print head position after the horizontal positioning has occurred.
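A minimal C sketch of this part of the model follows. The struct `DeviceModel`, its field names and the two boolean parameters are hypothetical; in particular, the idea that %dotab() positions only when the current and desired positions differ is an assumption that happens to reproduce the behavior described, not a documented mechanism.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model state; only the horizontal fields are shown. */
struct DeviceModel {
    int32_t current_x;          /* currentState.x_address */
    int32_t desired_x;          /* desiredState.x_address */
    int32_t reported_x;         /* value returned by %x_address() */
    int     positioning_count;  /* times horizontal positioning occurred */
};

/* Assumed effect of %dotab(): position only if the print head is not
 * already at the desired position, then record the new position. */
static void dotab(struct DeviceModel *d)
{
    if (d->current_x != d->desired_x) {
        d->positioning_count++;
        d->current_x = d->desired_x;
    }
}

/* The sequence for establishing the left margin. The flags say whether
 * the corresponding block happens to invoke %dotab(). */
static void establish_left_margin(struct DeviceModel *d, int32_t left_margin,
                                  bool startvalue_uses_dotab,
                                  bool firstword_uses_dotab)
{
    assert(d != NULL);
    /* :LINEPROC :ENDVALUE is interpreted here; it has no modeled effect. */
    d->desired_x = left_margin;
    d->reported_x = d->desired_x;
    if (startvalue_uses_dotab) {
        dotab(d);               /* :LINEPROC :STARTVALUE */
    }
    if (firstword_uses_dotab) {
        dotab(d);               /* :FIRSTWORD, or :STARTWORD if absent */
    }
}
```

Starting from a zeroed `DeviceModel`, calling `establish_left_margin()` with both flags set positions the print head exactly once, because the second `dotab()` finds the current and desired positions already equal.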
First Text Line
The first text line is the first line actually output to the device. This can be part of a title page, or a heading (the :H0 heading was tested), or part of a banner (one with "body" and "top" specified was tested), or a horizontal line drawn with characters defined by the :BOX block (both boxes produced by control word .bx and by tag :FIG were used), and perhaps other features not yet discovered, as well as the first line of actual document text.
For anything except actual document text, that is, at least as seen so far, anything except text in a block controlled by tag :P., the line is processed per the appropriate sequences, keeping in mind that the vertical positioning has already been done and so will not be done again. If an indent is specified, it is handled in the usual manner.
For actual document text, at least when in a block controlled by tag :P, then, if there is an indent, a second sequence appears, separately establishing the indent.
This is the context:
- The font used is :DEFAULTFONT 0.
- The value returned by device function %font_number() is "0".
- The value returned by device function %x_address() is "0".
- The value returned by device function %y_address() is the value set as part of the initial vertical positioning.
If device function %dotab() was encountered in establishing the left margin, the value returned by device function %x_address() will not be "0" but rather will reflect the horizontal positioning done as a result.
This is the sequence used:
- Interpret the :LINEPROC block :ENDVALUE block.
- Set desiredState.x_address to the value corresponding to the left margin plus the indent.
- Set the value returned by device function %x_address() to the value of desiredState.x_address.
- Interpret the :LINEPROC block :STARTVALUE block.
- Interpret the :LINEPROC block :FIRSTWORD block, if defined; if not, interpret the :LINEPROC block :STARTWORD block.
The device function %dotab(), if used in any of the blocks interpreted in this sequence, will cause horizontal positioning to occur, if not done previously. The value of currentState.x_address is set to the current print head position after the horizontal positioning has occurred.
Thus, the indent is established in the same sense that the margin was. After this, the line is processed per the appropriate sequences, keeping in mind that the vertical positioning has already been done and so will not be done again.
Possible Future Research
The WGML Reference discusses the various values of the various place attributes in terms of when the function blocks are interpreted, and the terminology poses some questions that may need to be looked at in the future. Consider this table, where the third column summarizes the event which causes the corresponding block to be interpreted:
Block     Place          Location
:INIT     START          wgml starts processing the input source
:INIT     DOCUMENT       wgml starts processing a document
:FINISH   DOCUMENT       wgml finishes processing a document
:FINISH   END            wgml finishes processing the input source
:PAUSE    START          wgml begins processing the source input
:PAUSE    DOCUMENT       wgml begins processing the document text
:PAUSE    DOCUMENT_PAGE  the beginning of each document page
:PAUSE    DEVICE_PAGE    wgml begins a new page on the output device
The last two lines will be discussed below.
It is an open question whether or not "the input source" and "the source input" refer to the same concept; it is quite likely that they do, although, technically, the first would refer to the stream from which the input is received, and the latter to the input itself.
It is also an open question whether or not "a document" and "the document text" refer to the same concept; and this case is less clear, since "a document" might refer to the "document specification", of which the "document text" is but a part. Unless, of course, what is meant by "the document text" is "the document specification text". Alternately, either or both could refer to the output file.
Whether it is worth while to investigate these questions is anybody's guess at this point.
A more interesting question is the distinction between "the input source/the source input" and "a document/the document text".
As far as the :PAUSE and :INIT blocks are concerned, the sequencing above suggests a straightforward interpretation: "input" includes everything wgml 4.0 uses to produce the document, including all command line options, while "document" refers to a subset of the "input", that is, the document specification itself.
And it has long been known (see the discussion toward the end of this section) that, if an END :FINISH block is present, any DOCUMENT :FINISH block will be ignored by wgml 4.0 -- which implies that these two events occur at the same time.
So, since the order of events is known, it is not clear that this question needs to be further examined either. Only time will tell how important it is to investigate these issues.
Turning now to DOCUMENT_PAGE and DEVICE_PAGE, the distinction between "document page" and "device page" is also a question that may or may not require further study. The WGML Reference defines a document page in this way in Section 15.10.2.1 PLACE Attribute:
A document page is the amount of output that WATCOM Script/GML formats for a page in the document. The document page may be smaller or larger than the physical page produced by the output device. If the page being printed is both the document page and the device page, the document page pause block takes precedence over the device page pause block.
A good example of this difference can be seen by using device TERM: generally, the document pages are longer than the screens (device pages), and the pauses are written to reflect the difference between starting a new page and continuing the current page.
This was also observed using the very simple document specification described here. Each DEVICE_PAGE or DOCUMENT_PAGE :PAUSE block interpretation was paired with an interpretation of the :NEWPAGE block in the output file. When the header and footer were re-enabled, headers and footers only appeared in conjunction with a DOCUMENT_PAGE :PAUSE block.
Other aspects of the concept of "selected fonts", such as the use of the command line option FILE to associate font names and font styles with font numbers for which no :DEFAULTFONT block was defined in the :DEVICE block, will need to be explored: do these become generated :DEFAULTFONT instances? How do they interact with any such instances resulting from the use of a font name with the :BOX block or :UNDERSCORE block? Other ways of pairing, in effect, :DEVICEFONT instances and :FONTSTYLE instances will also have to be explored for their effect on the number of :DEFAULTFONT instances used for a given document specification.
Outputting Lines
Our wgml is intended to replace the existing wgml 4.0 in the Open Watcom documentation build system, which means that it should produce the same output as wgml 4.0 when given the same inputs. These "outputs" are text files which are used, in the case of device WHELP, as input to whlpcvt.exe, and, in the case of PS, primarily for the creation of PDF files. This means that our wgml will not only have to emit lines of text but also do so identically to wgml 4.0. Clearly, as time goes by, exactly what this involves will become clearer and clearer.
It has been noted elsewhere that the code written so far necessarily depends on a model of wgml 4.0 which may prove to be, if not actually incorrect, then less useful than alternate models which might develop in the course of implementing our wgml. That is just as true here as anywhere else; the code that results from these investigations should be regarded as a useful first draft, subject to revision as needed.
The Physical Device Model
The physical device model that works best in describing how wgml 4.0 outputs a text line is that of a dot matrix printer. Even when the actual output is a disk file, the terminology still works. These are some notes on the terms used:
- A page is a physical piece of paper, on which (a part of) the document is printed. When used of an output file, this is the text between any of
- the start of the file and the first :NEWPAGE block;
- any two successive :NEWPAGE blocks;
- the final :NEWPAGE block and the end of the file.
- A line is a specific vertical position on the page. When used of an output file, it is the location the line will have in the final product.
- A print head is a physical device which prints a letter on the page. When used of an output file, it is a short form of print head position or print position.
- The print head position or print position is the position of the print head. When used of an output file, it is the location the next character output will appear in the final product.
- A pass is the physical movement of the print head over a given line on the paper. The various "overprint" font styles are created by specifying multiple passes over the same line. In an output file, of course, each pass appears on a separate line, but, in the final product, they will be printed on top of each other. This term is often used to designate the actions taken by wgml 4.0 during each pass.
Text Line Model
This section is based on a very simple, quite vague, model of how wgml 4.0 produces text lines:
The value of wgml 4.0 lies in how it uses its layout, tags, control words, symbols, macros, and so on to produce a document. Nonetheless, if all that processing is factored out, wgml 4.0 can be said to produce a sequence of text lines for output, preceded by the :INIT block(s), followed by the :FINISH block, and with a small number of other blocks embedded in the sequence of text lines (some startup material, the :NEWPAGE block, the :HLINE block, the :VLINE block, and the :DBOX block, and boxing lines drawn using the :BOX block).
The rest of this section describes the model, not necessarily the reality, of how wgml 4.0 processes each text line.
A text line is a linked list of these structs:
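#include <stdint.h>

struct TextChars {
    struct TextChars *next;         /* next instance in the text line */
    uint8_t          font_number;   /* font used for these characters */
    int32_t          x_address;     /* position of the first character */
    uint32_t         length;        /* width of the character sequence */
    uint16_t         repetitions;   /* times the sequence is output */
    uint16_t         count;         /* number of characters at chars */
    const uint8_t    *chars;        /* first character; not null-terminated */
};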
This struct is said to control the sequence of non-space characters pointed to by field chars. This struct will be discussed in more detail below.
Each line of text is assembled in an instance of this struct:
struct TextLine {
    int32_t          y_address;     /* line on which the text appears */
    struct TextChars *first;        /* first TextChars instance */
};
through this procedure:
- Set the field y_address to the desired line.
- If a TextChars instance is left over from the previous line, make it the first TextChars instance in this line, adjusting its field x_address to the appropriate starting position for the line and adjusting the remaining line length.
- Acquire the next TextChars instance and determine its length. If it will fit on the current line, attach it to the list, set its field x_address appropriately and adjust the remaining line length.
- Repeat step 3 until the new TextChars instance will not fit on the line. Save this instance (and its length) for use with step 2 of the next line.
- If justification is to be done, do it now.
- Output the text line to the device.
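The core of this procedure (steps 1 through 4) can be sketched in C as follows. This is a simplified illustration under stated assumptions: the function name `assemble_line()` and its parameters are hypothetical, a fixed `space_width` is assumed between instances, justification is omitted, and an instance too long for an empty line is simply placed rather than hyphenated.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct TextChars {
    struct TextChars *next;
    uint8_t          font_number;
    int32_t          x_address;
    uint32_t         length;
    uint16_t         repetitions;
    uint16_t         count;
    const uint8_t    *chars;
};

struct TextLine {
    int32_t          y_address;
    struct TextChars *first;
};

/* Move as many instances as fit from the list at words onto line.
 * Returns the first instance that did not fit (NULL if none); the
 * caller passes it back in as words when building the next line. */
static struct TextChars *assemble_line(struct TextLine *line,
                                       struct TextChars *words,
                                       int32_t y, int32_t left_margin,
                                       uint32_t line_length,
                                       uint32_t space_width)
{
    assert(line != NULL);

    uint32_t used = 0;              /* line length consumed so far */
    struct TextChars *last = NULL;  /* last instance placed on the line */

    line->y_address = y;            /* step 1 */
    line->first = NULL;

    while (words != NULL) {         /* steps 2 through 4 */
        uint32_t gap = (last == NULL) ? 0 : space_width;
        if (last != NULL && used + gap + words->length > line_length) {
            break;                  /* does not fit: left for next line */
        }
        words->x_address = left_margin + (int32_t)(used + gap);
        used += gap + words->length;
        if (last == NULL) {
            line->first = words;
        }
        last = words;
        words = words->next;
    }
    if (last != NULL) {
        last->next = NULL;          /* detach the leftover instances */
    }
    return words;
}
```

Calling `assemble_line()` repeatedly, each time passing the previous return value as `words`, yields one TextLine per call until it returns NULL.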
Additional insights into the layout procedure can be found here.
The fields in struct TextLine are used in this way:
- TextLine.y_address encodes the position on the page of the line on which the text is to appear. Note the distinction between "the line on which the text is to appear", which is a number, and "the text line", which is the text which is to appear.
- TextLine.first is a pointer to the first TextChars instance.
The fields in struct TextChars are used in this way:
- TextChars.next is a pointer to the next TextChars instance.
- TextChars.font_number encodes the font number, which, for now at least, is taken to be a binary :DEFAULTFONT block and so to specify a :FONTSTYLE instance, a :DEVICEFONT instance and (through the :DEVICEFONT instance) a :FONTPAUSE instance and a :FONTSWITCH instance.
- TextChars.x_address encodes the position on the line of the first character in the sequence of non-space characters.
- TextChars.length encodes the length of the sequence of non-space characters. This is the value that will be used to increment, among other things, the value returned by device function %x_address(). It is 32-bit because the corresponding attributes are 32-bit.
- TextChars.repetitions contains the number of times the sequence of non-space characters is to be output.
- TextChars.count contains the number of characters in the sequence of non-space characters.
- TextChars.chars contains a pointer to the first non-space character in the sequence of non-space characters.
The positional fields are signed in case they need to be used with negative numbers, as for relative positioning or for devices that position the print head relative to the bottom or right side of the page. It is known that most 32-bit attribute values are, in fact, limited to $7FFFFFFF, that is, to the positive values of an int32_t, so the model includes the possibility that negative values may occur within wgml 4.0.
The field TextChars.chars is not a pointer to a null-terminated string because the value is expected to be a pointer into a buffer containing (potentially) the entire text of the document, which will not consist of null-terminated strings. In these cases the value of the field repetitions would be "1". This allows the text to be processed to be separated from the structure encoding how it is to be output.
In some cases, the TextChars field chars would point at a single character, such as one of the boxing characters or the underscore character. In this case, the field repetitions would contain a value larger than "1", indicating that the character is to be printed multiple times.
It is important to notice two things that this model does not require:
- This model does not require wgml to construct the entire page before outputting any part of it.
- This model does not require that a text line to be output actually exist physically in a contiguous buffer.
The last point, of course, refers to the pre-output state of the text line: the result of outputting it is, indeed, to place it into a contiguous output buffer, from which it is sent to the device (or output file).
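As an illustration of how the fields count and repetitions might drive the copy into a contiguous output buffer, consider this C sketch. The function `emit_chars()` and its buffer-size handling are hypothetical, and output translation and positioning are deliberately left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct TextChars {
    struct TextChars *next;
    uint8_t          font_number;
    int32_t          x_address;
    uint32_t         length;
    uint16_t         repetitions;
    uint16_t         count;
    const uint8_t    *chars;
};

/* Copy the characters controlled by tc into out, repeating the
 * sequence tc->repetitions times. Returns the number of characters
 * written; copying stops early if out is full. */
static size_t emit_chars(const struct TextChars *tc,
                         uint8_t *out, size_t out_size)
{
    assert(tc != NULL && out != NULL);

    size_t written = 0;
    for (uint16_t r = 0; r < tc->repetitions; r++) {
        for (uint16_t i = 0; i < tc->count; i++) {
            if (written >= out_size) {
                return written;
            }
            out[written++] = tc->chars[i];
        }
    }
    return written;
}
```

An ordinary word uses a count equal to its length and a repetitions value of "1", with chars pointing into the document text; a boxing or underscore line uses a count of "1" and a larger repetitions value.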
Insights Into wgml 4.0 Layout
This section contains information developed during testing which provides insight into how wgml 4.0 does the layout for a document. It is necessarily incomplete.
When drawing horizontal lines using the characters defined by the :BOX block, the code doing layout clearly forms the entire line in a buffer and uses, in terms of the model, a single TextChars instance to control it. It also uses a special sequence to output it.
Several other instances in which a complete line, including internal spaces, is controlled by a single TextChars instance, exist:
- Text in a box created with tag :FIG -- but not with control word .bx.
- Index entries -- but not the page number, at least, not as part of the same TextChars instance.
- The title "Table of Contents" -- but not the entries.
- The title "List of Figures" -- but not the entries.
In some cases, of course, the text may already be in a contiguous buffer and so require no additional construction. So far, these appear to use the normal output sequence for text lines.
A TextChars instance will be said to be "empty" if its count field contains "0", that is, if it controls no characters. Such an instance can be used to produce horizontal positioning using a specified font. The :LINEPROC block :STARTWORD and :ENDWORD blocks will appear in their normal positions. Such instances are presumed to be produced by the wgml 4.0 layout code, so the text line output sequence need only process the TextChars instance.
The use of "empty" TextChars instances is unavoidable: consider how these tags are used:
:HP0. :HP1. :HP2. :HP3. :SF.
These are usually used to surround a phrase, starting with a non-space character. Except for the very first word in a paragraph or other layout element, this phrase will be preceded by text, which will be controlled by the preceding TextChars instance. The text controlled by a TextChars instance ends with the last non-space character. This leaves a space character which has the same font as the previous TextChars instance. When that font differs from the font specified by the tags shown, then an empty TextChars instance will be needed to ensure that that space is output. The value of the x_address fields of the empty TextChars instance and the first TextChars instance of the phrase will be identical: the horizontal positioning will be done by the empty TextChars instance; none will be done by the first TextChars instance of the phrase. Note that this information is based solely on tests of :HP1 versus default text (implicit :HP0). It does, however, apply to spaces that follow the text in an :HP1 phrase but are part of the phrase: it is not unique to the default font, although that is where it is most likely to be seen in practice.
When the font styles of the empty TextChars instance had a subsequent pass (variants of "plain" and "bold" with three :LINEPROCs each were used), then the horizontal positioning appears on the subsequent passes as well, at least within the TextLine.
An empty TextChars instance sometimes occurs at the start of a text line. This is actually an extension of the situation discussed above: in this case, the phrase starts a new line. Investigation, including the use of "uscore" (for variety) on the very first word of text, suggests that these are the rules:
- The first text line uses, on the first pass, the :DEFAULTFONT of the first TextChars instance.
- All subsequent text lines use, on the first pass, the :DEFAULTFONT of the last TextChars instance in the preceding TextLine.
- On the subsequent passes, these rules appear to apply:
- If all TextChars instances are associated with a font style that defines a :LINEPROC instance for that pass, then the same rules apply as on the first pass.
- In other cases, it is ignored and the :DEFAULTFONT of the first TextChars instance which uses a font style which has a :LINEPROC defined for that pass is used for the initial horizontal positioning as well as the text output (if any).
In terms of the model being used, the correct :DEFAULTFONT to use with the initial horizontal positioning must be determined by the page layout code of wgml 4.0 (which, after all, deals with text lines not passes and so "sees" all the TextChars instances, not just those with :LINEPROC blocks for a particular pass) and presented to the text output code with the appropriate initial TextChars instance already present in the TextLine provided. The subsequent passes, necessarily, have this TextChars instance, but the sequencing ignores it in the cases indicated.
Careful testing confirms that, if the document specification has a line break after the space encoded by the empty TextChars instance, then the behavior differs from that seen when no such line break exists. The actual behavior also depends on whether or not the empty TextChars has "0" as the value of its font_number field. That is to say, when the space between "first" and "sentence" is in the default font,
:P.This :HP1.is the :eHP1.first :HP1.sentence of:eHP1.
then a TextChars instance which does horizontal positioning only is not produced, but rather a perfectly ordinary TextChars doing both horizontal positioning and text output, while
:P.This :HP1.is the :eHP1.first :HP1.sentence of:eHP1.
does produce a TextChars instance which does horizontal positioning only, and the next TextChars instance starts its text ("sentence") on the left margin. On the other hand, when the space between "is" and "the" is part of the :HP1 phrase (and so uses font style "bold"),
first:eHP1. paragraph. This :HP1.is :eHP1. the second sentence in
then a TextChars instance which does horizontal positioning only is produced, but the next TextChars instance starts its text ("the") further in from the left margin (wgml 4.0 prints two spaces immediately before "the"), while
first:eHP1. paragraph. This :HP1.is :eHP1.the second sentence in
does produce a TextChars instance which does horizontal positioning only, and the next TextChars instance starts its text ("the") on the left margin. This was tested with various :DEVICEFONT and :FONTSTYLE instances: all that mattered is whether the default font (:DEFAULTFONT 0) was involved or not and whether a newline was present or not. Neither the :DEVICEFONT nor the :FONTSTYLE associated with :DEFAULTFONT 0 made any difference in the pattern shown.
Thus, intentionally or not, wgml 4.0 does distinguish between a space character and a space character followed by a newline character; it also treats the default font differently from the other fonts. When our wgml's layout code is written, additional research on this topic may be needed.
When a TextChars instance would normally control a really long sequence of characters (certainly one longer than the allowed line length will cause this, although it is possible that shorter sequences may cause this also in some contexts), wgml 4.0 will hyphenate it. It does this by finding the character which would appear at the right margin, and breaking the word at that point, so that that character starts a new TextChars instance, and places a hyphen in the right margin position. No attempt is made to find stems or other "allowed" hyphenation points. When our wgml's layout code is written, additional research on this topic may be needed.
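The break described above can be sketched as follows. This is only an illustration, assuming a character device on which every character occupies one zero-based column, and assuming that the break happens whenever the text would run past the right margin; the function names and parameters are hypothetical and are not taken from wgml 4.0.

```c
#include <stddef.h>
#include <stdio.h>

/* Returns the index of the character that starts the next TextChars
 * instance, or text_len when the whole text fits on the line.
 * Assumes one column per character and zero-based columns. */
size_t margin_break_index(size_t text_len, size_t start_col, size_t right_margin)
{
    if (right_margin < start_col)
        return 0;                      /* nothing fits before the margin */
    if (start_col + text_len <= right_margin + 1)
        return text_len;               /* last character is at or before the margin */
    return right_margin - start_col;   /* this character would land on the margin */
}

/* Writes the part kept on the current line, followed by the hyphen that
 * takes the right margin position. Output is truncated to head_size. */
void format_hyphenated_head(const char *text, size_t brk,
                            char *head, size_t head_size)
{
    snprintf(head, head_size, "%.*s-", (int)brk, text);
}
```

With a ten-character word starting at column 0 and a right margin at column 5, the break index is 5: the first five characters and a hyphen stay on the line, and the sixth character starts the new TextChars instance.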
If the information in wgml Fonts is carefully considered, then it is apparent that:
- input translation occurs before width computation; and
- width computation occurs before output translation.
Indeed, those steps must occur as part of forming the TextChars instances into a TextLine instance. Output translation must occur much later, when the text is placed into the output buffer and so is not actually part of the layout process at all.
Program Context Model
This section will, eventually, discuss what program context the sequences presented require.
A few definitions are needed, in no particular order at present:
- The phrase text line will refer to text, whether resulting from ordinary text, a title page, a banner, a title, or any other named feature, which is to be printed out on the same line of a page by the device.
- The phrase initial vertical positioning will refer to the establishment of the vertical position of the first text line in the output document.
- The phrase initial horizontal positioning will refer to the zero-based horizontal location specified by the first TextChars instance in a TextLine instance.
- The phrase internal horizontal positioning will refer to the space between TextChars instances in a TextLine instance.
This struct is postulated:
#include <stdint.h>
struct PageState {
    uint8_t  font_number;   /* font number */
    uint32_t y_address;     /* vertical position */
    uint32_t x_address;     /* horizontal position */
};
and from it these two variables are postulated:
struct PageState currentState, desiredState;
The variable currentState will always contain the current font number and location of the print head. The variable desiredState will contain the values which are to be adopted next. Note that neither of these contains the values returned by device functions %font_number(), %x_address(), or %y_address().
These flags are postulated:
text_output text_pass uline_on
The flag text_output is used to indicate whether or not the current line of text is considered to be following a preceding line of text. The flag text_pass is used to control whether or not text is to be output. The flag uline_on is used to control whether or not the character provided by the :UNDERSCORE block is to be output.
The Sequence for Text Lines
This sequence appears to apply to text lines, wherever they occur. It does not apply to horizontal and vertical lines using the characters defined by the :BOX block.
The sequence appears to be:
- Set the value of field desiredState.y_address to the value of field TextLine.y_address.
- Set the value of field desiredState.font_number to the value of field TextChars.font_number of the first TextChars instance.
- Set the value of field desiredState.x_address to the value of field TextChars.x_address of the first TextChars instance.
- If the value of the flag text_output is "true", interpret the :LINEPROC block :ENDVALUE block using the value of the field currentState.font_number to identify the appropriate :FONTSTYLE block.
- Perform the normal vertical positioning.
- Perform the first pass.
- For each subsequent pass:
- Set the value of field desiredState.font_number to the value of field TextChars.font_number of the first TextChars instance.
- Set the value of field desiredState.x_address to the value of field TextChars.x_address of the first TextChars instance.
- Interpret the :LINEPROC block :ENDVALUE block using the value of the field currentState.font_number to identify the appropriate :FONTSTYLE block.
- Perform the overprint vertical positioning.
- Perform the subsequent pass.
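The ordering of the steps above can be sketched in C. This is a minimal sketch, not wgml 4.0 code: the TextChars and TextLine layouts are reduced to the fields the sequence mentions, the passes parameter (how many passes the line needs) is assumed to be known to the caller, and each block interpretation or sub-sequence is replaced by a hypothetical step() call that only records its name.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

struct PageState { uint8_t font_number; uint32_t y_address; uint32_t x_address; };
struct TextChars { uint8_t font_number; uint32_t x_address; };
struct TextLine  { uint32_t y_address; const struct TextChars *first; };

struct PageState currentState, desiredState;
bool text_output;
char trace[256];                /* hypothetical log standing in for real work */

static void step(const char *name)
{
    size_t used = strlen(trace);
    snprintf(trace + used, sizeof trace - used, "%s;", name);
}

/* Copies font and horizontal position from the first TextChars instance. */
static void aim_at_first(const struct TextLine *line)
{
    desiredState.font_number = line->first->font_number;
    desiredState.x_address   = line->first->x_address;
}

/* passes: number of passes needed for this line (assumed known to caller). */
void output_text_line(const struct TextLine *line, unsigned passes)
{
    desiredState.y_address = line->y_address;
    aim_at_first(line);
    if (text_output)
        step("lineproc_endvalue");   /* style chosen by currentState.font_number */
    step("normal_vertical");
    step("first_pass");
    for (unsigned pass = 2; pass <= passes; ++pass) {
        aim_at_first(line);
        step("lineproc_endvalue");   /* unconditional on subsequent passes */
        step("overprint_vertical");
        step("subsequent_pass");
    }
}
```

The one asymmetry worth noting is that the :LINEPROC block :ENDVALUE block depends on text_output only before the first pass.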
The Sequences for Boxing
These sequences appear to apply to horizontal and vertical lines using the characters defined by the :BOX block.
-- these are just copies of the above -- will be revised -- -- there may only be one sequence, the intended distinction is between :FIG text lines and everything else --
The first sequence appears to be:
- Set the values of the fields of desiredState to the values of the corresponding fields of the TextLine and its first TextChars instance.
- Set the value returned by device function %font_number() to the value of the field desiredState.font_number.
- If the value of the flag text_output is "true", interpret the :LINEPROC block :ENDVALUE block using the value of the field currentState.font_number to identify the appropriate :FONTSTYLE block.
- Perform the normal vertical positioning.
- Perform the first pass.
- For each subsequent pass:
- Interpret the :LINEPROC block :ENDVALUE block using the value of the field currentState.font_number to identify the appropriate :FONTSTYLE block.
- Perform the overprint vertical positioning.
- Perform the subsequent pass.
The second sequence appears to be:
- Set the values of the fields of desiredState to the values of the corresponding fields of the TextLine and its first TextChars instance.
- Set the value returned by device function %font_number() to the value of the field desiredState.font_number.
- If the value of the flag text_output is "true", interpret the :LINEPROC block :ENDVALUE block using the value of the field currentState.font_number to identify the appropriate :FONTSTYLE block.
- Perform the normal vertical positioning.
- Perform the first pass.
- For each subsequent pass:
- Interpret the :LINEPROC block :ENDVALUE block using the value of the field currentState.font_number to identify the appropriate :FONTSTYLE block.
- Perform the overprint vertical positioning.
- Perform the subsequent pass.
Related Topics
The Normal Vertical Positioning
The normal vertical positioning sequence is:
- If the values of the fields currentState.y_address and desiredState.y_address are different, then:
- Set the value returned by device function %x_address() to "0".
- Set the value returned by device function %y_address() to the value of desiredState.y_address.
- If the :ABSOLUTEADDRESS block is not defined, use one or more :NEWLINE blocks to position the print head vertically to the correct line.
Note these two cases:
- If the :ABSOLUTEADDRESS block is defined, then the values returned by device functions %x_address() and %y_address() are still changed but the print head is not moved.
- If the values of the fields currentState.y_address and desiredState.y_address are the same, nothing happens.
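A minimal sketch of this sequence follows. The variables dev_x_address and dev_y_address are hypothetical stand-ins for the values returned by device functions %x_address() and %y_address(), and the actual interpretation of the :NEWLINE blocks is reduced to updating currentState.y_address.

```c
#include <stdbool.h>
#include <stdint.h>

struct PageState { uint8_t font_number; uint32_t y_address; uint32_t x_address; };

struct PageState currentState, desiredState;
uint32_t dev_x_address, dev_y_address;  /* stand-ins for %x_address(), %y_address() */

/* Returns true when :NEWLINE blocks were used to move the print head. */
bool normal_vertical_positioning(bool have_absoluteaddress)
{
    if (currentState.y_address == desiredState.y_address)
        return false;                    /* same line: nothing happens */

    dev_x_address = 0;
    dev_y_address = desiredState.y_address;

    if (have_absoluteaddress)
        return false;                    /* values change, head does not move */

    /* One or more :NEWLINE blocks would be interpreted here; doing so is
     * deemed to bring currentState.y_address to the position attained. */
    currentState.y_address = desiredState.y_address;
    return true;
}
```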
The Overprint Vertical Positioning
The overprint vertical positioning sequence is:
- If the :ABSOLUTEADDRESS block is not defined:
- Interpret the :NEWLINE block for which the value of attribute advance is "0". If no such block exists, then interpret the :NEWLINE block for which the value of attribute advance is "1".
- Set the value returned by device function %x_address() and the value of desiredState.x_address to "0".
Note this case:
- If the :ABSOLUTEADDRESS block is defined, then the value returned by device function %x_address() and the value of desiredState.x_address are still changed to "0" but the print head is not moved.
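The overprint case can be sketched the same way. Here dev_x_address is a hypothetical stand-in for the value returned by device function %x_address(), and the sketch assumes that a :NEWLINE block with advance "1" exists whenever one with advance "0" does not.

```c
#include <stdbool.h>
#include <stdint.h>

struct PageState { uint8_t font_number; uint32_t y_address; uint32_t x_address; };

struct PageState desiredState;
uint32_t dev_x_address;          /* stand-in for %x_address() */

/* Returns the advance of the :NEWLINE block interpreted, or -1 if none. */
int overprint_vertical_positioning(bool have_absoluteaddress,
                                   bool have_newline_advance0)
{
    int advance = -1;

    if (!have_absoluteaddress)
        advance = have_newline_advance0 ? 0 : 1;  /* fall back to advance 1 */

    /* These are reset whether or not the print head was moved. */
    dev_x_address = 0;
    desiredState.x_address = 0;
    return advance;
}
```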
:ABSOLUTEADDRESS and :NEWLINE
When the :ABSOLUTEADDRESS block is interpreted, the effect is to do both horizontal and vertical positioning to the location specified by the values returned by device functions %x_address() and %y_address(). Part of interpreting the :ABSOLUTEADDRESS block is deemed to be:
- replacing the value of currentState.x_address with the value of desiredState.x_address; and
- replacing the value of currentState.y_address with the value of desiredState.y_address.
When the :ABSOLUTEADDRESS block does not exist, one or more :NEWLINE blocks are interpreted. Part of interpreting a :NEWLINE block is deemed to be:
- replacing the value of currentState.y_address with the value of the vertical position actually attained.
Of course, when this process is complete, then the value of currentState.y_address will be the same as the value of desiredState.y_address.
When more than one line needs to be skipped, wgml 4.0 appears to follow this simple algorithm:
- Interpret the :NEWLINE block the value of whose attribute advance is as large as possible, but no larger than the number of lines that need to be skipped.
- Reduce the number of lines that need to be skipped by the value of the attribute advance of the :NEWLINE block just interpreted.
- If more lines need to be skipped, return to step 1.
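The algorithm above is a simple greedy selection and can be sketched as follows. The function name and its parameters are hypothetical; the advances array stands for the advance values of whatever :NEWLINE blocks the device defines, and stopping early when no block can make progress is an assumption of the sketch, not observed behavior.

```c
#include <stddef.h>

/* Records the advance of each :NEWLINE block interpreted in `used` (up to
 * used_cap entries) and returns how many blocks were interpreted. Stops
 * early if no defined block can make progress without overshooting. */
size_t skip_lines(unsigned lines, const unsigned *advances, size_t n_advances,
                  unsigned *used, size_t used_cap)
{
    size_t count = 0;

    while (lines > 0) {
        unsigned best = 0;

        /* Largest advance that is no larger than the lines still to skip. */
        for (size_t i = 0; i < n_advances; ++i)
            if (advances[i] <= lines && advances[i] > best)
                best = advances[i];

        if (best == 0)
            break;                 /* no usable :NEWLINE block */
        if (count < used_cap)
            used[count] = best;
        ++count;
        lines -= best;
    }
    return count;
}
```

For example, with :NEWLINE blocks for advances 0, 1, 2 and 5, skipping eight lines interprets the blocks for 5, 2 and 1, in that order.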
Future Research
Using the default layout, which produces a single-spaced document, the line numbers, as reported by device function %y_address() were successive multiples of "2" whether :ABSOLUTEADDRESS was available or not. When the layout was altered to specify a double-spaced document, the first non-zero line number was still "2", but after that they were incremented by "4". Curiously, the number of lines skipped over by the various :NEWLINE blocks was correct: one for single-spaced, two for double-spaced. An :ABSOLUTEADDRESS block written to use this value as-is would seem to have incorrect line spacing. This may depend on the test devices, that is, it may be an artifact that does not affect real devices.
The value of the :DEVICE block attribute vertical_base_units was "6"; the value of the :FONT block attribute line_height was "1"; the value returned by device function %line_height() was "2": could wgml 4.0 be trying to fit three lines into one inch?
Applying Font Styles
This section deals with the sequence used by wgml 4.0 to output each segment for the current pass. The discussion of the :FONTSTYLE block found here deals with how the block is used.
In researching this topic, the :FONTSTYLE block :LINEPROC blocks were set up to indicate when the various sub-blocks were interpreted. These indicators started and ended with a : character and included a two-letter abbreviation of the sub-block name, the pass number, and a one to three letter abbreviation of the font style name. These indicators made it very clear where each block of each pass of each style was being interpreted.
The First Pass Sequence
This section deals with the sequence usually used to output the first pass of text lines.
This sequence is used for an entire text line, that is, for all of the TextChars instances in the current TextLine instance, for the first pass. It has some presuppositions:
- The fields of desiredState have been set appropriately.
- The value returned by device function %font_number() is that specified by the field desiredState.font_number.
- The :LINEPROC block :ENDVALUE block, if needed, has been done.
- The vertical positioning has been done.
The sequence itself is:
- Do the "first TextChars instance" sequence on the first TextChars instance in the TextLine.
- Do the "subsequent TextChars instance" sequence on each TextChars instance in turn until a TextChars instance which contains a new value of field font_number is reached.
- Each time a TextChars instance which contains a new value of field font_number is reached:
- Do the "new font TextChars instance" sequence on that TextChars instance.
- Do the "subsequent TextChars instance" sequence on each TextChars instance in turn until a TextChars instance which contains a new value of field font_number is reached.
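The choice of sub-sequence for each TextChars instance depends only on its position and on whether its font number differs from that of the instance before it. A minimal sketch, with hypothetical names and with the TextLine reduced to an array of font numbers:

```c
#include <stddef.h>
#include <stdint.h>

enum Sequence { SEQ_FIRST, SEQ_NEW_FONT, SEQ_SUBSEQUENT };

/* font_numbers[i] is the font_number field of the i-th TextChars instance. */
enum Sequence sequence_for(const uint8_t *font_numbers, size_t index)
{
    if (index == 0)
        return SEQ_FIRST;
    if (font_numbers[index] != font_numbers[index - 1])
        return SEQ_NEW_FONT;
    return SEQ_SUBSEQUENT;
}
```

For font numbers 0, 0, 1, 1, 0 this selects the first, subsequent, new font, subsequent and new font sequences, in that order.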
The "first TextChars instance" sequence appears to be:
- Set the value returned by device function %x_address() to the value of the field desiredState.x_address.
- If a font switch is required, do one. If not, interpret the :FONTSTYLE block :STARTVALUE block.
- Interpret the :LINEPROC block :STARTVALUE block.
- Interpret the :LINEPROC block :FIRSTWORD block, if defined; if not, interpret the :LINEPROC block :STARTWORD block.
- Interpret the :LINEPROC block :STARTWORD block if a font switch was not done.
- If the value of the text_pass flag is "true":
- Do the initial horizontal positioning, if not already done.
- Print out the text controlled by the TextChars instance, if any.
- Update the value of currentState.x_address to reflect the current position of the print head.
- Set the value returned by device function %x_address() to the value of the field currentState.x_address.
- Interpret the :LINEPROC block :ENDWORD block.
If no font switch occurs and no :LINEPROC block :FIRSTWORD block was defined, then the :LINEPROC block :STARTWORD block will be interpreted twice in succession.
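The order in which blocks are interpreted by the "first TextChars instance" sequence, including the doubled :STARTWORD block, can be sketched as a trace. This is an illustration only: emit() and the trace buffer are hypothetical, the font switch is reduced to a single entry, and the steps that update the x_address values are omitted because they produce no block output.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

char trace[256];                 /* hypothetical record of what was interpreted */

static void emit(const char *name)
{
    size_t used = strlen(trace);
    snprintf(trace + used, sizeof trace - used, "%s ", name);
}

void first_textchars_sequence(bool font_switch_required,
                              bool firstword_defined, bool text_pass)
{
    emit(font_switch_required ? "font_switch" : "FONTSTYLE:STARTVALUE");
    emit("LINEPROC:STARTVALUE");
    emit(firstword_defined ? "LINEPROC:FIRSTWORD" : "LINEPROC:STARTWORD");
    if (!font_switch_required)
        emit("LINEPROC:STARTWORD");      /* may repeat the previous block */
    if (text_pass) {
        emit("initial_positioning");
        emit("text");
    }
    emit("LINEPROC:ENDWORD");
}
```

Calling first_textchars_sequence(false, false, true) records the :STARTWORD block twice in succession, which is the case described above.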
The "new font TextChars instance" sequence appears to be:
- Set the value of field desiredState.font_number to the value of field font_number in the current TextChars instance.
- Set the value of field desiredState.x_address to the value of field x_address in the current TextChars instance.
- Set the value returned by device function %font_number() to the value of the field desiredState.font_number.
- Interpret the :LINEPROC block :ENDVALUE block, using the value of the field currentState.font_number to identify the appropriate :FONTSTYLE block.
- Do a font switch. (It will be required by definition: if the value of field font_number has not changed, this sequence will not be in effect).
- Interpret the :LINEPROC block :STARTVALUE block.
- Interpret the :LINEPROC block :FIRSTWORD block, if defined; if not, interpret the :LINEPROC block :STARTWORD block.
- If the value of the text_pass flag is "true":
- Do the internal horizontal positioning, if not already done.
- Print out the text controlled by the TextChars instance, if any.
- Update the value of currentState.x_address to reflect the current position of the print head.
- Set the value returned by device function %x_address() to the value of the field currentState.x_address.
- Interpret the :LINEPROC block :ENDWORD block.
Note that the :LINEPROC block :STARTWORD block appears at most once, and only if no :LINEPROC block :FIRSTWORD block is defined.
The "subsequent TextChars instance" sequence appears to be:
- Set the value of field desiredState.x_address to the value of field x_address in the current TextChars instance.
- Interpret the :LINEPROC block :STARTWORD block.
- If the value of the text_pass flag is "true":
- Do the internal horizontal positioning, if any.
- Print out the text controlled by the TextChars instance, if any.
- Update the value of currentState.x_address to reflect the current position of the print head.
- Set the value returned by device function %x_address() to the value of the field currentState.x_address.
- Interpret the :LINEPROC block :ENDWORD block.
Since this sequence will only be used when the value of field font_number has not changed, no font switch will occur, and the :LINEPROC block :STARTWORD block will always appear.
If any of the blocks interpreted (including all blocks interpreted as part of a font switch) include device function %dotab() and certain other conditions are met, the horizontal positioning will be done during the interpretation of that block.
At this level, a font switch is "required" whenever the font number for the current TextChars instance differs from the value of currentState.font_number. The font switch sequence determines whether an actual "font switch", in the sense of actually interpreting any :FONTSWITCH block sub-blocks, is needed.
The Subsequent Pass Sequence
This section deals with the sequence usually used to output the subsequent passes of text lines.
This turns out to be less straightforward than the first pass sequence. One of the reasons for this appears to be that, while all font styles define a first pass (if not done explicitly, then this is used), not all define any subsequent passes. Thus, when processing a subsequent pass, some TextChars instances may be associated with a font style which does nothing on that pass. The one complication that cannot occur is a font style that skips a pass: as stated here, the values of the :LINEPROC block attributes pass must, within a given :FONTSTYLE, start at "1" and be numbered consecutively (that is, no gaps are allowed).
The term processed will be used to refer to those TextChars sequences which are associated with a font style which defines a :LINEPROC block for the current pass. The term skipped will be used to refer to those TextChars sequences which are associated with a font style which does not define a :LINEPROC block for the current pass.
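Because passes are numbered consecutively from "1", the processed/skipped distinction reduces to comparing the pass number with the number of passes a font style defines. A minimal sketch, in which struct FontStyle is a hypothetical reduction of the :FONTSTYLE block to a name and a pass count:

```c
#include <stdbool.h>

struct FontStyle {
    const char *name;
    unsigned    passes;   /* number of passes defined, numbered 1..passes */
};

/* True if a TextChars instance using this style is processed on `pass`;
 * false if it is skipped. */
bool is_processed(const struct FontStyle *style, unsigned pass)
{
    return pass >= 1 && pass <= style->passes;
}
```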
Testing was done with three font styles:
- "plain", which had one pass;
- "plain2", which had two passes; and
- "bold", which had four passes.
When a second pass in which all TextChars instances were processed was examined, it was seen to be done with these presuppositions:
- The fields of desiredState have been set appropriately.
- The value returned by device function %font_number() is that specified by the last TextChars instance in the TextLine.
- The value returned by device function %x_address() is that resulting from the processing of the last TextChars instance in the TextLine on the previous pass.
- The :LINEPROC block :ENDVALUE block has been done.
- The overprint vertical positioning has been done.
- The value returned by device function %x_address() was set to "0" by the overprint vertical positioning.
The sequence itself is identical to that given above. The "first TextChars instance" sequence differs only in omitting step 1: the value returned by device function %x_address() is not set to the value of the field desiredState.x_address on subsequent passes. The other two subsequences are identical to those used above.
This difference is what distinguishes the first pass from the subsequent passes as such. But what of passes in which some TextChars instances are skipped?
Initially, there appeared to be a veritable zoo of different patterns. However, it soon became apparent that the pattern for each pass depended almost entirely on which font styles were in use, and in which order. What mattered in most cases was where each skipped TextChars instance was located.
There are three positions in which a skipped TextChars instance may appear:
- It may appear at the start of the TextLine, and so before the first TextChars instance which is processed (initial).
- It may appear within the TextLine between two TextChars instances that are processed (medial).
- It may appear at the end of the TextLine, that is, after the last TextChars instance to be processed (final).
There may, of course, be more than one skipped TextChars instance in any of those positions. Note that a line consisting entirely of skipped TextChars instances does not have the pass in question (nothing appears), so the skipped/processed dichotomy only makes sense in a TextLine that contains at least one of each.
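The three positions can be determined from the indices of the first and last processed TextChars instances. The following is a sketch under the assumption that the caller already knows, for the current pass, which instances are processed; the enum and function names are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

enum SkipPosition {
    POS_PROCESSED,   /* the instance is not skipped at all */
    POS_INITIAL,     /* skipped, before the first processed instance */
    POS_MEDIAL,      /* skipped, between two processed instances */
    POS_FINAL,       /* skipped, after the last processed instance */
    POS_NO_PASS      /* nothing is processed: the line lacks this pass */
};

enum SkipPosition skip_position(const bool *processed, size_t count, size_t index)
{
    size_t first = count, last = count;

    for (size_t i = 0; i < count; ++i) {
        if (processed[i]) {
            if (first == count)
                first = i;
            last = i;
        }
    }
    if (first == count)
        return POS_NO_PASS;
    if (processed[index])
        return POS_PROCESSED;
    if (index < first)
        return POS_INITIAL;
    if (index > last)
        return POS_FINAL;
    return POS_MEDIAL;
}
```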
The medial position appears to work this way:
- The :FONTSTYLE block :ENDVALUE block, preceded by the :LINEPROC block :ENDVALUE block, is interpreted at the end of the processed TextChars instance immediately preceding the skipped TextChars instance. Multiple skipped TextChars instances, even with different values for font_number, have no additional effect.
- If the next processed TextChars instance uses the same font style as the last processed TextChars instance, then the :FONTSTYLE block :STARTVALUE block is interpreted; however, since the :FONTPAUSE block is not interpreted, this is not the font switch sequence. The :LINEPROC block :ENDVALUE block does not appear between the :FONTSTYLE block :ENDVALUE block and the :FONTSTYLE block :STARTVALUE block.
- If the next processed TextChars instance uses a different font style than the last processed TextChars instance, then the full font switch is done, starting with the :FONTSTYLE block :ENDVALUE block (which thus appears twice) and including the :FONTPAUSE block. The :FONTSTYLE block :ENDVALUE block is preceded by the :LINEPROC block :ENDVALUE block.
The final position appears to be quite simple:
- The :FONTSTYLE block :ENDVALUE block, preceded by the :LINEPROC block :ENDVALUE block, is interpreted at the end of the processed TextChars instance immediately preceding the skipped TextChars instance. Multiple skipped TextChars instances, even with different values for font_number, have no additional effect. It is then followed by the :LINEPROC block :ENDVALUE block emitted as part of the setup for the next line or pass.
The initial position shows this sequence twice:
- The :FONTSTYLE block :STARTVALUE block was done for "bold".
- The :LINEPROC block :STARTVALUE block was done for "bold" for the current pass.
- The :LINEPROC block :FIRSTWORD block was done for "bold" for the current pass. Prior testing included these blocks, so, if the :LINEPROC block :FIRSTWORD block is not defined, then the :LINEPROC block :STARTWORD block for "bold" for the current pass would be done instead.
The sequence itself is normal; the duplication is not.
The initial position appears to be quite simple:
- The sequence above is done twice on the second or subsequent pass in which some TextChars instances are skipped when the TextLine ends with a TextChars instance which is skipped. It does not matter if the TextLine starts with a skipped TextChars instance or not; if it starts with multiple skipped TextChars instances, even with different values for font_number, there is no additional effect.
It is possible that the reason this does not happen on the first pass in which some TextChars instances are skipped is because that pass begins with a font switch.
The behavior noted above for the initial and final positions of skipped TextChars instances also occurs in a different context:
- The first TextChars instance is skipped on the current pass.
- The last TextChars instance is processed on the current pass.
- All TextChars instances were processed on the preceding pass, which was a subsequent pass, not a first pass.
It must be noted that, even if the other conditions are met, if only the first pass processed all TextChars instances, then this behavior does not occur. It is not clear why this happens; however, there should be no problem writing the code so that it does happen or not as appropriate.
The :LINEPROC Block With %ulineon()/%ulineoff()
Most of the testing done in researching the sequencing used an :UNDERSCORE block which had a non-null character string as the value of attribute font. This attribute can instead take a number (which designates a :DEFAULTFONT block) or an empty string (which causes the underscore to use the same font as the text).
The complexity which resulted from this caused me to restart the testing using, first, just font style "plain" with no markup at all, and then to use "plain" and "bold". This has helped in isolating the effects of using device functions %ulineon() and %ulineoff() when the font attribute of the :UNDERSCORE block is a non-null character string. Testing was done with the overprint versions of both font style uline and font style uscore.
These modifications of the sequences above are found:
- Only the horizontal positioning is actually done using the specified font style.
- The :STARTVALUE block, :FIRSTWORD block, :STARTWORD block, and :ENDVALUE block are all done from pass 2 of the specified font style.
- The font is then changed. However, this is not quite an ordinary font change because no :FONTSTYLE block :STARTVALUE block is interpreted.
- The underscore characters for the current TextChars instance are then emitted.
- A normal font change is then done, starting with the :FONTSTYLE block :ENDVALUE block for font style "plain".
If the :UNDERSCORE block uses a numeric value for attribute font, then much the same happens as shown above. The differences are:
- The :LINEPROC block :ENDWORD block is interpreted after the :FIRSTWORD block (and so before the :STARTWORD block) when the horizontal positioning (but not the initial horizontal positioning, that is, the left margin plus any indentation) is done.
- If the font style associated with the designated :DEFAULTFONT block is the same as that assigned in the option file, then the :LINEPROC block :ENDVALUE block is interpreted after the underscores and before the :FONTSTYLE block :ENDVALUE block for that font style.
- If the font style associated with the designated :DEFAULTFONT block is different from that assigned in the option file, then the only difference is that the :FONTSTYLE block :ENDVALUE block is for the font style associated with the designated :DEFAULTFONT block.
When the font attribute of the :UNDERSCORE block was given an empty string as its value, this was observed:
- The normal font switch, ending with the :FONTSTYLE block :STARTVALUE block, was done.
- The first TextChars instance, that is, the one preceded by the left margin, was done as a unit: first the initial horizontal positioning, then the underscore characters, with nothing in between. This was preceded by the :LINEPROC block :STARTVALUE block and (for font style "uline") the :FIRSTWORD block or (for font style "uscore", which had no :FIRSTWORD block) the :STARTWORD block.
- The :FONTSTYLE block :STARTVALUE block was then repeated. No :LINEPROC sub-blocks occurred between the last underscore and this block.
- The remaining TextChars instances were then done. Each was output as a unit (spaces followed by underscores for any text for font style "uscore", a solid line of underscores for font style "uline", that is, the usual and expected behavior) but was preceded and followed by :LINEPROC sub-blocks per the procedure shown above.
- The actual sequence of :LINEPROC sub-blocks leading up to the second TextChars instance (that is, the first of the grouped instances) was (for font style "uline") the :STARTVALUE block, the :FIRSTWORD block, the :ENDWORD block, and the :STARTWORD block (in that order) or (for font style "uscore", which had no :FIRSTWORD block) the :STARTVALUE block, the :STARTWORD block, the :ENDWORD block, and the :STARTWORD block.
Clearly, at some point, a more detailed analysis of the use of these sub-blocks will be needed.
Drawing Lines Using Characters
This section is not about this situation:
- The value of the attribute frame of tag :FIG is a character string.
Suppose the character string is 'abcde'. The output line which resulted with the test file was:
abcdeabcdeabcdeabcdeabcd
This line was output using the normal procedure for a TextLine with a single TextChars instance. The only point of interest is that the entire "horizontal line" was formed as a unit. This turns out to be quite common when wgml 4.0 draws lines using characters, rather than the :HLINE block, the :VLINE block, or the :DBOX block.
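The sample line is consistent with the frame string simply being repeated, and cut off, to fill the available width. A minimal sketch of that reading follows; the function name is hypothetical, and the width of 24 used in the example is inferred from the length of the sample line rather than from any stated layout value.

```c
#include <stddef.h>
#include <string.h>

/* Fills `out` with `width` characters by cycling through `pattern`.
 * `out` must have room for width + 1 bytes. An empty pattern yields "". */
void fill_frame_line(const char *pattern, size_t width, char *out)
{
    size_t plen = strlen(pattern);
    size_t n = plen ? width : 0;

    for (size_t i = 0; i < n; ++i)
        out[i] = pattern[i % plen];
    out[n] = '\0';
}
```

Calling fill_frame_line("abcde", 24, line) reproduces the sample line shown above.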
While attempting to express the content seen in the output files generated by wgml 4.0 from the test document specifications in terms of the model and structs discussed here, it became apparent that, when the characters specified by the :BOX block are used to create a horizontal or a vertical line, the procedure for outputting those characters differs from the standard procedure given earlier. This section is about this alternate procedure.
A horizontal line appears as a single block, for example:
+----------------+
just as the line formed from the character string did above.
A vertical line appears as a single character, for example:
|
This is what caught my attention: these items are neither preceded by the :LINEPROC block :STARTWORD block nor followed by the :LINEPROC block :ENDWORD block.
The sequences discussed in this section apply to horizontal lines (not followed by text) associated with the use of both control word .bx and tag :FIG, and with both the top line and the bottom line of the box.
The sequence, which uses the normal font switch sequence if a font switch is required, is:
- If a font switch is required, do one. If not, interpret the :FONTSTYLE block :STARTVALUE block.
- Interpret the :LINEPROC block :STARTVALUE block.
- Interpret the :LINEPROC block :FIRSTWORD block, if defined; if not defined, interpret the :LINEPROC block :STARTWORD block.
- Print out the horizontal or vertical line.
- If a font switch was done in step 1, do a font switch back to the original font. If not, interpret the :FONTSTYLE block :STARTVALUE block.
- Interpret the :LINEPROC block :STARTVALUE block.
- Interpret the :LINEPROC block :FIRSTWORD block, if defined; if not defined, interpret the :LINEPROC block :STARTWORD block.
The value returned by device function %font_number() is what would be expected normally at all points. The value returned by device function %x_address() is constant throughout the sequence -- even after the line has been printed out, device function %x_address() returns the value designating the left margin.
If the value of attribute font of the :BOX block is a character string, then the font style "plain" is used. If the value of attribute font of the :BOX block is a number, then the font style specified by that :DEFAULTFONT is used, but only the first pass :LINEPROC block is used (and, of course, only the :LINEPROC sub-blocks shown above are interpreted): no overprinting occurs.
The steps seen next fall into two categories:
- what happens most of the time; and
- what happens with vertical lines when tag :FIG is being used.
Testing has been patchy both because of my inexperience with Watcom GML (for example, no attempt was made to wrap text around a box and so no horizontal lines followed by text were observed) and because wgml 4.0 does not always emit vertical line characters where expected. Device TERM shows these problems clearly with my use of control word .bx; with tag :FIG, TERM shows the box with a vertical line on the right, something that my test driver never showed.
Most of the time, that is, for control word .bx and for tag :FIG for horizontal lines (at least those with no text following them) this happens next:
- The text controlled by the first TextChars instance, if any, is printed out.
- The :LINEPROC block :ENDWORD block is interpreted.
After that, the normal text output sequence is clearly in effect. The value of device function %x_address() is updated to reflect the position of the last character printed with this :LINEPROC block :ENDWORD block.
For tag :FIG, when the vertical line is drawn, this happens next:
- Interpret the :LINEPROC block :STARTVALUE block.
- Interpret the :LINEPROC block :FIRSTWORD block, if defined; if not defined, interpret the :LINEPROC block :STARTWORD block.
The value returned by device function %x_address() in these steps refers to the start position for the text itself. Thus, if the left margin requires six spaces, seven characters will be printed (six space characters plus one vertical line character) but the value returned by %x_address() will increase from "6" to "8" without ever having the value "7". The resulting space character appears before the output text, that is, it is treated as normal horizontal positioning. This is followed by the output text itself, printed out almost as if it were a TextLine with one TextChars instance, the difference being that the initial space character is treated as internal horizontal positioning (the :ABSOLUTEADDRESS block, if available, is not used).
There are also two other conclusions that may be drawn:
- It appears that wgml 4.0 does not use the same procedure for outputting all possible text lines.
- It appears that wgml 4.0 keeps track of the widths of characters output which are intended to actually appear in the final document in a way that all text line output sequences can access.
The model provides a variable field, currentState.x_address, which can probably be used to perform this function.
Presumably, the value of currentState.x_address is read by the :LINEPROC block :ENDWORD block or :ENDVALUE block, whichever comes first, and set to zero at some point after that but before additional characters are output.
Supplemental Tests
When the implicit %textpass() is used, there are no :LINEPROC blocks. The result is entirely consistent with the sequences shown with nothing appearing when the :LINEPROC blocks are interpreted; that is, the text appears but nothing else does.
When font style "bold" was modified to use three :LINEPROC blocks, identical except for the value of n (see the notes on the test setup above), the results showed that the third pass was treated identically to the second pass, except, of course, that the :LINEPROC sub-blocks were from the appropriate pass. Thus, wherever "second pass" is mentioned, the same remarks can be taken to apply to all subsequent passes as well.
When a "plain" :FONTSTYLE block with a :LINEPROC 1 block with no device function %textpass() in its :STARTVALUE block is used, then the first pass only was affected: the TextChars instances using font style "plain" were processed normally, except that neither the horizontal positioning nor the text appeared. Each TextChars instance was clearly present, marked by the :ENDWORD block (the :STARTWORD block appeared or not as indicated above). The values returned by device function %x_address() were updated exactly as they were when device function %textpass() was present; this is why, in the sequences above, the steps involving device function %x_address() and the x_address fields are shown as not depending on whether or not text is actually output.
This is, of course, completely different from the behavior on the second pass, where the TextChars instances using font style "plain" are skipped by either spaces or, if available and more than eight spaces would be needed, the :HTAB block.
Removing device function %textpass() from the first-pass :LINEPROC block, the second-pass :LINEPROC block, or both, of the overprint "bold" :FONTSTYLE produced the same effect: the TextChars instances using font style "bold" worked normally on the affected pass, except that neither the horizontal positioning nor the text appeared.
Horizontal Positioning
The definitions of "initial horizontal positioning" and "internal horizontal positioning" are found here. This section discusses both, as well as the action of device function %dotab(). The reason for this is that there are at least three different patterns with which wgml 4.0 does horizontal positioning.
This is the core supposition of the model with regard to horizontal positioning:
Horizontal positioning only occurs when there is a difference between the values of the fields currentState.x_address and desiredState.x_address.
The history of this section will show that this supposition has varied between using the value of the field desiredState.x_address and the value returned by device function %x_address(). Since the state variables are purely hypothetical, it is difficult to be certain what value they contain at any given point. For normal text line output, at least, the deciding test is to put device function %dotab() into a second or third pass :LINEPROC block :STARTVALUE block: the effect is to do the initial horizontal positioning and change the value returned by device function %x_address() from "0" to the proper position. Thus, for normal text line output, horizontal positioning cannot depend on the value returned by device function %x_address(), leaving the value of the field desiredState.x_address as the best option in terms of the model.
Extensive testing confirms that device function %dotab() always produces horizontal positioning when the core supposition calls for it.
The first pattern is the normal pattern for initial horizontal positioning:
- If the :ABSOLUTEADDRESS block is available, it is used.
- Otherwise, if the :HTAB block is available, it is used if the number of spaces needed would be greater than eight.
- Otherwise, spaces are used.
The second pattern is the normal pattern for internal horizontal positioning:
- If the :HTAB block is available, it is used if the number of spaces needed would be greater than eight.
- Otherwise, spaces are used.
Note that the :ABSOLUTEADDRESS block is not used, even if it is available.
The third pattern is the pattern used with device function %dotab():
- If the :ABSOLUTEADDRESS block is available, it is used.
- Otherwise, spaces are used.
Note that the :HTAB block is not used, ever. This allows the :HTAB block, if desired, to be defined as device function %dotab().
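The three patterns, together with the core supposition, can be summarized in a short Python sketch. The function name choose_positioning, the pattern labels "initial", "internal", and "dotab", and the boolean parameters are hypothetical and were introduced only for illustration; the order of preference and the threshold of eight spaces are taken from the lists above. The sketch models only which mechanism is chosen, not the output itself, and the number of spaces needed is passed in separately because the text does not say how it is derived from the x_address fields.

```python
def choose_positioning(pattern, current_x, desired_x, spaces_needed,
                       has_absoluteaddress=False, has_htab=False):
    """Return the mechanism used for horizontal positioning, or None.

    pattern is one of "initial", "internal", or "dotab" (hypothetical labels
    for the first, second, and third patterns respectively).
    """
    if pattern not in ("initial", "internal", "dotab"):
        raise ValueError("unknown pattern: " + repr(pattern))
    # Core supposition: positioning only occurs when currentState.x_address
    # and desiredState.x_address differ.
    if current_x == desired_x:
        return None
    # :ABSOLUTEADDRESS is used for initial positioning and by %dotab(),
    # but never for internal positioning.
    if pattern in ("initial", "dotab") and has_absoluteaddress:
        return ":ABSOLUTEADDRESS"
    # :HTAB is used for initial and internal positioning when more than
    # eight spaces would be needed, but never by %dotab().
    if pattern in ("initial", "internal") and has_htab and spaces_needed > 8:
        return ":HTAB"
    # In every remaining case, spaces are used.
    return "spaces"
```

For example, with both blocks available and ten spaces needed, the sketch selects :ABSOLUTEADDRESS for initial positioning and :HTAB for internal positioning.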
Device function %dotab() will produce horizontal positioning in accordance with the core supposition above in these blocks:
- :FONTSTYLE block: :STARTVALUE and :ENDVALUE blocks
- :LINEPROC block: :STARTVALUE, :FIRSTWORD, :STARTWORD, and :ENDVALUE blocks
- :FONTSWITCH block: :STARTVALUE and :ENDVALUE blocks
Device function %dotab() was never observed in the test files to produce horizontal positioning in the :LINEPROC block :ENDWORD block, most likely because, at the point this block was interpreted, the core supposition never called for it to do so.
The initial horizontal positioning can consist of two elements, which can appear separately at the start of the output file:
- The correct value to establish (skip over) the left margin.
- The correct value for any further indentation which must be skipped over.
In this context, the term "indentation" includes at least these cases:
- The layout specifies an indentation for the given line (for example, the first line of a paragraph may have an indentation specified).
- On the second pass, the horizontal spacing from the left margin to the first TextChars instance associated with a :FONTSTYLE block which has a :LINEPROC block defined for that pass is treated as an indentation.
Other contexts may exist.
Switching Fonts
While this might appear to be a fairly simple topic, it turns out to have a few interesting quirks.
The test framework was set up so that each :DEVICEFONT associated a unique :FONTPAUSE and :FONTSWITCH with the font it names, and so that each :DEFAULTFONT associated a unique :FONTSTYLE with the font it names (which has the effect of tying the :DEVICEFONT which names the same font into that :DEFAULTFONT). Each :FONTPAUSE block was implemented to increment a symbol, and each :FONTPAUSE block, together with each sub-block of each :FONTSTYLE block and each :FONTSWITCH block, was implemented to print it out as an "Instance" number. This allowed the :FONTPAUSE, :FONTSWITCH, and :FONTSTYLE blocks to be associated with each other unambiguously. The file "default.opt", using the FONT option, was then used to vary this setup for test purposes.
The program context model discussed here is extended in this section to include two flags:
- do_always, which is set by the binary library parsing code; and
- do_now, which is set separately for each font switch.
The following sections contain further details.
The Normal Sequence
The tests performed revealed the actual sequence of events in switching a font. This precondition must be satisfied:
The value returned by device function %font_number() must have been changed to that of the font being switched to before this sequence is applied.
The top-level sequence is:
- Set the value of the flag do_now.
- Perform the switch-from sequence.
- Perform the switch-to sequence.
Thus, it has three sub-sequences.
The sequence for setting the value of the flag do_now is separated out because of its complexity. The "switch-from" and "switch-to" sequences are separated out because device function %enterfont() (explicit or implicit) does the switch-to procedure without a switch-from procedure.
If any of the blocks shown contain device function %dotab(), then the horizontal positioning may occur when they are interpreted, as discussed here.
The sequence for setting the value of the flag do_now is:
- If the font switch involves two distinct :FONTSWITCH instances, set the value of the flag do_now to "true".
- If the font switch involves only one :FONTSWITCH instance, set the value of the flag do_now to the value of the flag do_always.
- If the value of the flag do_now is "false", then set it per the result of the :FONTSWITCH block :STARTVALUE block evaluation.
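The three steps for setting do_now can be sketched in Python as follows. The function name set_do_now and its parameters are hypothetical; in particular, evaluate_startvalues stands in for the :FONTSWITCH block :STARTVALUE block evaluation and is assumed to be a callable returning "true" when the two evaluations differ.

```python
def set_do_now(distinct_fontswitch, do_always, evaluate_startvalues):
    """Return the value of the flag do_now for one font switch.

    distinct_fontswitch: True if two distinct :FONTSWITCH instances are involved.
    do_always: the flag set by the binary library parsing code.
    evaluate_startvalues: hypothetical callable performing the :STARTVALUE
    evaluation; it is only called when do_now is still false.
    """
    if distinct_fontswitch:
        # Two distinct :FONTSWITCH instances: always do the switch.
        do_now = True
    else:
        # Only one :FONTSWITCH instance: start from do_always.
        do_now = do_always
    if not do_now:
        # Only now does the result of the evaluation matter.
        do_now = bool(evaluate_startvalues())
    return do_now
```

Passing the evaluation as a callable reflects the order of the steps: the evaluation is consulted only when the first two steps leave do_now "false".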
This is the "switch-from" sequence:
- Interpret the :FONTSTYLE block :ENDVALUE block for the :DEFAULTFONT instance being switched from.
- If the value of the flag do_now is true, then interpret the :FONTSWITCH block :ENDVALUE block for the :DEFAULTFONT instance being switched from.
This is the "switch-to" sequence:
- Interpret the :FONTPAUSE block for the :DEFAULTFONT instance being switched to.
- If the value of the flag do_now is true, then interpret the :FONTSWITCH block :STARTVALUE block for the :DEFAULTFONT instance being switched to.
- Interpret the :FONTSTYLE block :STARTVALUE block for the :DEFAULTFONT instance being switched to.
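The "switch-from" and "switch-to" sequences can be sketched in Python as functions that list, in order, the blocks that would be interpreted. All function names are hypothetical, the blocks are represented by descriptive strings rather than by anything wgml 4.0 is known to use, and do_now is assumed to have been set already.

```python
def switch_from(font_from, do_now):
    """List the blocks interpreted for the font being switched from."""
    steps = [":FONTSTYLE :ENDVALUE (%s)" % font_from]
    if do_now:
        steps.append(":FONTSWITCH :ENDVALUE (%s)" % font_from)
    return steps


def switch_to(font_to, do_now):
    """List the blocks interpreted for the font being switched to."""
    steps = [":FONTPAUSE (%s)" % font_to]
    if do_now:
        steps.append(":FONTSWITCH :STARTVALUE (%s)" % font_to)
    steps.append(":FONTSTYLE :STARTVALUE (%s)" % font_to)
    return steps


def switch_font(font_from, font_to, do_now):
    """Normal sequence: switch-from followed by switch-to."""
    return switch_from(font_from, do_now) + switch_to(font_to, do_now)
```

Keeping switch_to separate mirrors the observation that device function %enterfont() does the switch-to procedure without a switch-from procedure.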
The :FONTSTYLE sub-blocks were included because including them in this sequence appears to make more sense than attempting to include them elsewhere. For example, as noted here, at times the :FONTSTYLE block :ENDVALUE block occurs twice, and the second occurrence is clearly part of a font switch while the first is clearly not. Thus, the structure of the output file suggests that they belong to this sequence when this sequence is invoked.
Evaluating Function Blocks
The WGML Reference states in part in Section 15.9.11.2 STARTVALUE Section:
When a switch between two fonts is necessary, the startvalue sections of the two fonts are evaluated. The font switch is only performed if the results of the two evaluations are different.
This is, at best, only partially correct:
- If the :FONTSWITCH instances involved are distinct, then the :FONTSWITCH blocks are always interpreted, without regard to whether their "evaluations" are the same or different.
- Even when the :FONTSWITCH instances are the same, if the 21 flags show that any one of these device functions:
%date() %pages() %time() %wgml_header()
is present in the :FONTSWITCH block :STARTVALUE block, then the :FONTSWITCH blocks are always interpreted, without regard to whether their "evaluations" are the same or different.
The binary device library parsing code is presumed, as part of the model, to set the flag do_always to "true" if any of those four device functions are used, and to "false" if none of them are used.
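A minimal Python sketch of this presumption follows. The names ALWAYS_FUNCTIONS and compute_do_always are hypothetical, and the flags are modeled here as a collection of device function names found in the :FONTSWITCH block :STARTVALUE block rather than as the 21 flags themselves.

```python
# The four device functions whose presence forces the :FONTSWITCH blocks
# to be interpreted, per the list above.
ALWAYS_FUNCTIONS = frozenset({"%date()", "%pages()", "%time()", "%wgml_header()"})


def compute_do_always(functions_in_startvalue):
    """Return True if any of the four device functions is present."""
    return any(name in ALWAYS_FUNCTIONS for name in functions_in_startvalue)
```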
The remaining seventeen functions whose presence is signaled by a flag are:
%default_width() %font_outname1() %font_outname2() %font_resident() %font_height() %font_number() %font_space() %line_height() %line_space() %page_depth() %page_width() %tab_width() %thickness() %x_address() %x_size() %y_address() %y_size()
When the :FONTSWITCH instances are the same, then each of these, when it causes the only difference between the two "evaluations", does control whether or not the :FONTSWITCH blocks are actually interpreted.
This leaves all Type I and these Type II device functions to consider:
%add() %decimal() %divide() %getnumsymbol() %getstrsymbol() %hex() %lower() %remainder() %subtract()
If this section is consulted, it will be seen that, for these seven Type II device functions:
%add() %decimal() %divide() %hex() %lower() %remainder() %subtract()
gendev 4.1 folds the entire expression into a single literal parameter, which it compiles as if it were the argument of device function %image(), unless a non-literal parameter which is not one of the seven device functions shown is present in the expression. But the only device functions that can be used as a non-literal parameter are precisely those involved with the 21 flags plus device functions %getnumsymbol() and %getstrsymbol().
Device functions %getnumsymbol() and %getstrsymbol(), when tested in contexts where they returned the same value, which was output rather than just tested, behaved exactly like the seventeen device functions listed above. The only difference is that their presence in a :FONTSTYLE block :STARTVALUE block cannot be detected by consulting the 21 flags.
Since these evaluations are only done if the same :FONTSWITCH block :STARTVALUE block is used by both the "switch-from" font and the "switch-to" font, and the only difference is the font number (that is, the :DEFAULTFONT and its related :FONT block), it follows that the only device functions that can differ are those discussed here plus device function %font_number(), each of which is associated with a specific one of the 21 flags:
Flag  Device Function
01    %font_outname1()
02    %font_outname2()
03    %font_resident()
06    %default_width()
07    %font_number()
16    %font_height()
17    %font_space()
These flags, then, must be made available for use in the evaluation process, the sequence for which has this precondition:
the value of the flag do_now is "false"
and contains these steps:
- Invoke each of the device functions listed above which are present in the :FONTSWITCH block :STARTVALUE block for each font.
- If any pair of results differ, set the value of the flag do_now to "true".
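The evaluation can be sketched in Python as follows. The names FONT_DEPENDENT and evaluate_startvalues are hypothetical, as is the representation of each font's results as a dictionary mapping device function names to values; only the seven device functions and the comparison rule are taken from the text. The sketch assumes the precondition holds, that is, that do_now is "false" when it is called.

```python
# The seven font-dependent device functions identified above.
FONT_DEPENDENT = (
    "%font_outname1()", "%font_outname2()", "%font_resident()",
    "%default_width()", "%font_number()", "%font_height()", "%font_space()",
)


def evaluate_startvalues(present, from_values, to_values):
    """Return the new value of do_now.

    present: names of the device functions found in the shared :FONTSWITCH
    block :STARTVALUE block (as reported by the flags).
    from_values / to_values: results of invoking each device function for
    the font being switched from and the font being switched to.
    """
    for name in FONT_DEPENDENT:
        # Only functions actually present in the block are invoked.
        if name in present and from_values[name] != to_values[name]:
            return True
    return False
```

Because only the values are compared, nothing in the block is interpreted during the evaluation.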
Thus, some of the 21 flags will have to be reported by the binary device parsing code for the :FONTSWITCH block. Those for which no use has been discovered, however, need not be, including all of them for the :FONTSTYLE block, until a use for them is discovered.
This method of evaluation, by avoiding actually interpreting the blocks, also avoids any side effects from such device functions as %dotab(), %enterfont(), or %flushpage(), which might have a negative effect on the output.
Modified Sequence Used With %ulineon()/%ulineoff()
When the character provided by the :UNDERSCORE block is used with device functions %ulineon() and %ulineoff() and, as noted here, a font name is given for use with the underscore character, the font switch sequence changes: the last step of the switch-to sequence, the interpretation of the :FONTSTYLE block :STARTVALUE block, does not occur.
At least, at the moment this appears to be a modified sequence. An alternative is that this step is actually part of the font style application sequence, and that the reason for its omission is to be found there.

