Device Function Language
From Open Watcom
Introduction
This page focuses on the device functions as such. Closely related (so closely that some cross-referencing is inevitable) pages are:
- Device File Blocks, which documents where device functions occur in the :PAUSE and :FONTPAUSE blocks;
- Driver File Blocks, which documents where device functions occur in almost every included block; and
- Device Functions, which documents those parts of the binary file format within which the compiled form of the device functions is placed.
From the viewpoint of the page Binary Device Files, then, this page is just another part of the topic it addresses.
However, considered by itself, this page will do much more than document a part of the source and binary formats used by the device library. It will document the device function language, which is to say:
- which device functions are recognized by gendev 4.1 and implemented by wgml 4.0;
- a description of the language they form: how they are grouped, how they interact, how they can be used; and
- a description of the compiled form of these functions.
All of the statements made here were confirmed by actual test. Many of them match statements made in the WGML Reference or the README file producible from the WGML 3.33 Update and should be taken as confirming the documentation, not as new discoveries. Those which contradict the documentation or which provide information not included in the documentation are new discoveries.
Definitions
In April, 1982, I purchased a book by R. G. Loeliger titled Threaded Interpreted Languages. I then implemented a TIL: not FORTH, the system I was using provided a line editor that reacted to "@" by erasing the text and starting over, but the same sort of language. I managed a cross-compiler (which, in the TIL world, means that it produced stand-alone programs rather than packages that had to be invoked within the interpreter) and an assembler for the Z-80 that used the Zilog instruction formats (i.e., not RPN formats). The latter involved treating the opcodes as verbs, that is, as the names of functions, rather than processing them as data, a technique that may or may not reappear when gendev is written. This was interrupted when I purchased my first "IBM PC" clone, and thus could obtain software with which useful work could be done.
That constitutes my entire experience in compiler theory and practice. If any of the definitions offered here are wrong, please let me know (on the newsgroup) and I will make corrections.
A device function name is a token which gendev 4.1 recognizes as naming a device function and so compiles into the CodeBlock it is creating.
The term value block tag refers to any or all of the tags in this table:
Start Tags     End Tags
:ENDVALUE      :eENDVALUE
:ENDWORD       :eENDWORD
:FIRSTWORD     :eFIRSTWORD
:FONTVALUE     :eFONTVALUE
:STARTVALUE    :eSTARTVALUE
:STARTWORD     :eSTARTWORD
:VALUE         :eVALUE
In addition, the term start tag will be used to refer to the tags in the first column, and the term end tag will be used to refer to the tags in the second column, when it is clear that it is value block tags that are under discussion. These tags are said to correspond when they are on the same line in the table above.
The term function block is used to refer to the device functions placed between a start tag and its corresponding end tag.
The term function sequence is used to refer to a Type I device function and all of its parameters, and all of its parameters' parameters, to whatever depth they exist.
Although gendev is generally said to "encode" the attributes in its source files in the binary files it produces, gendev is said to compile the function blocks into the field function of the corresponding CodeBlock.
While wgml is generally said to "use" or "apply" the data encoded in binary device files, wgml is said to interpret the field function of each CodeBlock it uses.
These codes were intended to refer to the various types of function block and CodeBlock, and occasionally are so used, although so far I have found it clearer to cite the exact context instead:
Source  Binary  Context
FB00    CB00    a :VALUE block
FB02    CB02    a :FONTVALUE block
FB04    CB04    an :ENDVALUE block not within a :LINEPROC block
FB05    CB05    a :STARTVALUE block not within a :LINEPROC block
FB08    CB08    an :ENDVALUE block within a :LINEPROC block
FB09    CB09    a :STARTVALUE block within a :LINEPROC block
FB28    CB28    an :ENDWORD block
FB29    CB29    a :STARTWORD block
FB49    CB49    a :FIRSTWORD block
They consist, of course, of FB or CB followed by the corresponding CodeBlock designator.
The term parameter block is used to refer to either of the two structs identified below used to hold the parameters of device functions that have been compiled.
Two successive %text() functions with literal parameters are compiled as if they were a single %text() function with the parameters concatenated. The %image() device function is compiled the same way. The term merged will be used to refer to this situation; that is, the %text() or %image() functions will be said to be "merged".
Some device functions have similar names and are occasionally referred to as a group:
- %binaryN() is used to refer to any or all of
%binary() %binary1() %binary2() %binary4()
- %ifX() is used to refer to any or all of
%ifeqn() %ifeqs() %ifnen() %ifnes()
Device Function List
The device functions given in the WGML Reference are:
%add() %binary1() %binary2() %binary4() %cancel() %clear3270() %clearPC() %date() %decimal() %default_width() %divide() %flushpage() %font_height() %font_number() %font_outname1() %font_outname2() %font_resident() %font_space() %hex() %image() %line_height() %line_space() %page_depth() %page_width() %pages() %recordbreak() %remainder() %sleep() %subtract() %tab_width() %text() %thickness() %time() %wait() %wgml_header() %x_address() %x_size() %y_address() %y_size()
The additional device functions given in the README file producible from the WGML 3.33 Update are:
%endif() %getnumsymbol() %getstrsymbol() %ifeqn() %ifeqs() %ifnen() %ifnes() %lower() %setsymbol()
Note: the README actually shows "%endif" (no parentheses); however, all instances shown are "%endif()" and, in fact, removing the parentheses produces this note:
String is = <endif>
and this error message:
DF--001: Unrecognized device function tag
from gendev 4.1. Clearly, "%endif()" is the correct form.
The research program findfunc.exe identified these completely undocumented functions in the :DEVICE and :DRIVER blocks available to me:
%binary() %dotab() %enterfont() %textpass() %ulineoff() %ulineon()
%binary() was not, in fact, found with findfunc, but was discovered when research was done to see if %binary2() or %binary4() were used anywhere.
It is, of course, possible that others are recognized by gendev 4.1 and wgml 4.0 but, since they are neither documented nor used, they cannot be identified. Well, provided I've found all the device functions used in the source files available to me.
Grammar
I am using (or abusing) the term "grammar" to refer to all aspects of the source form of the device function language.
Available Documentation
The WGML Reference provides basic information about the device functions it documents.
The README file producible from the WGML 3.33 Update provides one-line descriptions of the additional device functions it lists.
Additional sources of documentation do exist. The first source consists of error messages listed in the WGML Reference. The second source consists of :CMT. lines in the various source files available to me.
Orthography
This section deals with how the language is written. These rules apply:
1. Only alphabetic letters, the underscore character, and numbers are used in device function names.
2. Only 7-bit ASCII character encodings are used.
3. No device function names contain spaces.
4. Each alphabetic letter can be in upper or lower case.
5. Once the start tag of a function block has been seen, no other tags may appear except the end tag which, of course, terminates the function block. In particular, neither :CMT. nor :INCLUDE may appear.
6. Each ( must be matched by a ). An end tag does not close open parentheses.
7. Multiple parameters to the same device function must be separated by commas.
8. Spaces may not be used between the device function name and the preceding %.
9. Spaces may not be used between the device function name and the following (.
10. Whitespace can be used between device function names and, within the parentheses, before and/or after the parameters. This allows function blocks to be written on multiple lines, if desired.
Rules 1, 2, and 3 are based entirely on examination of the known device function names.
Rule 4 is based on actual testing with gendev.
With regard to rule 5, a typical error message (one exists for each end tag) produced by gendev when a tag is encountered which is not the expected end tag is:
SN--046: Expecting :evalue tag
With regard to rules 6 and 7, omitting a closing parenthesis produces this note:
Parameter = recordbreak , Tag = image
and this error message from gendev:
DF--005: Commas must separate device function parameters
when a device function is encountered next and the error message
DF--004: Not a valid character in a device function
(presumably a reference to :) when the end tag is encountered next.
With regard to rules 8, 9, and 10, this example is offered:
When one of the :VALUE blocks in my test.pcd file,
%image("*** START PAUSE block.")
%recordbreak()
is modified to be
%image ("*** START PAUSE block.")
%recordbreak()
then gendev issues this note:
String is = <image ("*** START PAUSE block.")%recordbreak()>
and this error message from gendev:
DF--001: Unrecognized device function tag
If the first line is modified to:
% image("*** START PAUSE block.")
then the same error results with, of course, a slightly different note:
String is = < image("*** START PAUSE block.")%recordbreak()>
from which these conclusions can be drawn:
- since % does not appear in the string shown in the note, it is not part of the device function name but instead marks where the device function name begins;
- the string shown in the note extends to the end of the function block; this is probably done to make the location of the error as clear as possible; and
- whitespace is generally allowed between device function names and within the parentheses before and/or after the parameter.
It is, of course, true that rules 8 and 9 may reflect the fact that none of the device function names actually used begins or ends with a space, rather than that they cannot begin or end with a space. It is simply not possible to distinguish the two cases.
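The orthography rules for names can be illustrated with a small scanner. This is only a sketch in Python, not a description of how gendev works: the function name device_function_names is hypothetical, and the code assumes that literals are delimited by double quotes, which is the only delimiter appearing in the examples above.

```python
import re

# '%' immediately followed by a name made of letters, digits, and underscores,
# immediately followed by '(' (no spaces on either side of the name).
NAME = re.compile(r"%([A-Za-z0-9_]+)\(")

def device_function_names(block):
    """Return the lower-cased device function names found in a function block."""
    # Assumption: literals are double-quoted. Blank them so that a '%' inside
    # a string is not mistaken for the start of a device function name.
    without_literals = re.sub(r'"[^"]*"', '""', block)
    # Case is not significant in names, so normalise to lower case.
    return [m.group(1).lower() for m in NAME.finditer(without_literals)]

print(device_function_names('%image("*** START PAUSE block.")\n%recordbreak()'))
print(device_function_names('%image ("*** START PAUSE block.")'))
```

The second call finds no name at all, because the space before the ( prevents a match. gendev, in contrast, reports an error in that case; the sketch makes no attempt to reproduce its diagnostics.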
Device Function Types
The WGML Reference distinguishes two types of device functions with respect to whether or not they directly produce output to the device:
The result of some device functions will be used as final values for the sequence being defined. A final value is sent directly to the output device. Some of the device functions produce results which are not suitable for use as a final value. The result of this type of function must be supplied as a parameter value to a device function which can produce a final value.
The function-by-function documentation then identifies these functions as producing "results which are not suitable for use as a final value":
%add() %date() %decimal() %default_width() %divide() %font_height() %font_number() %font_outname1() %font_outname2() %font_resident() %font_space() %hex() %line_height() %line_space() %page_depth() %page_width() %pages() %remainder() %subtract() %tab_width() %thickness() %time() %wgml_header() %x_address() %x_size() %y_address() %y_size()
It might be thought that the remaining functions:
%binary1() %binary2() %binary4() %cancel() %clear3270() %clearPC() %flushpage() %image() %recordbreak() %sleep() %text() %wait()
all produce a "final value". This, however, is not the case: the only characteristic they have in common is that they cannot be used as parameters to other functions.
These are the functions documented to produce a "final value":
%binary1() %binary2() %binary4() %image() %text()
These are documented to have an effect only when used in a :DEVICE block, not in a :DRIVER block:
%clear3270() %clearPC() %wait()
which leaves functions without an explicit grouping:
%cancel() %flushpage() %recordbreak() %sleep() %wait()
although they could be considered "control functions".
The WGML Reference then makes this statement:
Prior to transmitting the device function sequences to the output device, WATCOM Script/GML translates each character of the sequence into another character. The translation values are defined in the font definitions used with the device. Some of the device functions produce final values which will not be translated.
In actual fact, per the documentation, only the %text() function's output is translated. Of course, of the other functions which actually produce a final value, translating the output of %binary1(), %binary2(), and %binary4() (which insert uint8_t, uint16_t, and uint32_t values, respectively) would not make much sense. The only other such function is %image() and, in fact, the only documented difference between %image() and %text() is that the result of the %image() function is not translated, while that of %text() is.
Based on the above, these device function types can be distinguished:
- Type Ia device functions produce final values.
- Type Ib device functions are used to control the process.
- Type Ic device functions are used for user interaction.
- Type II device functions can only be used as arguments to another device function.
Investigation of the Type II device functions suggests these sub-types:
- Type IIa device functions are used for mathematical operations.
- Type IIb device functions provide values from a :DEVICE, a :DRIVER, or a :FONT block.
- Type IIc device functions provide formatting.
- Type IId device functions do various other things.
To categorize all of the functions, not just those given in the WGML Reference, I took advantage of the fact that wgml executes the CodeBlocks produced from the :INIT block with the value "start" for the attribute place almost immediately, even before looking for the document specification file. wgml can thus be used to produce the output of any device function or combination of device functions allowed in an :INIT block very easily.
Starting with a complete list of all device functions, these caused gendev to emit this message:
DF--008: This tag at start of device function sequence is invalid
thus showing that they are Type II functions:
%add() %date() %decimal() %default_width() %divide() %font_height() %font_number() %font_outname1() %font_outname2() %font_resident() %font_space() %getnumsymbol() %getstrsymbol() %hex() %line_height() %line_space() %lower() %page_depth() %page_width() %pages() %remainder() %subtract() %tab_width() %thickness() %time() %wgml_header() %x_address() %x_size() %y_address() %y_size()
That leaves these as the Type I functions:
%binary() %binary1() %binary2() %binary4() %cancel() %clear3270() %clearPC() %dotab() %endif() %enterfont() %flushpage() %ifeqn() %ifnen() %ifeqs() %ifnes() %image() %recordbreak() %setsymbol() %sleep() %text() %textpass() %ulineoff() %ulineon() %wait()
Dividing them into the three subtypes given above required the use of both gendev and wgml, and, in the process, produced the initial information on sequencing and function signatures. The functions of each sub-type are:
Type Ia ("final"):
%binary() %binary1() %binary2() %binary4() %image() %text()
Type Ib ("control"):
%cancel() %dotab() %endif() %enterfont() %flushpage() %ifeqn() %ifnen() %ifeqs() %ifnes() %recordbreak() %setsymbol() %sleep() %textpass() %ulineoff() %ulineon()
Type Ic ("user interaction"):
%clear3270() %clearPC() %wait()
Type IIa ("math"):
%add() %divide() %remainder() %subtract()
Type IIb ("device info"):
%default_width() %font_height() %font_number() %font_outname1() %font_outname2() %font_resident() %font_space() %line_height() %line_space() %page_depth() %page_width() %pages() %tab_width() %thickness()
Type IIc ("formatting"):
%decimal() %hex() %lower()
Type IId ("other"):
%date() %getnumsymbol() %getstrsymbol() %time() %wgml_header() %x_address() %x_size() %y_address() %y_size()
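The classification above can be captured as a lookup table. The following Python sketch simply transcribes the lists in this section; the names TYPES, function_type, and can_start_sequence are hypothetical and exist only for illustration.

```python
# Sub-type -> device function names (without '%' and '()'), as listed above.
TYPES = {
    "Ia": "binary binary1 binary2 binary4 image text",
    "Ib": "cancel dotab endif enterfont flushpage ifeqn ifnen ifeqs ifnes "
          "recordbreak setsymbol sleep textpass ulineoff ulineon",
    "Ic": "clear3270 clearpc wait",
    "IIa": "add divide remainder subtract",
    "IIb": "default_width font_height font_number font_outname1 font_outname2 "
           "font_resident font_space line_height line_space page_depth "
           "page_width pages tab_width thickness",
    "IIc": "decimal hex lower",
    "IId": "date getnumsymbol getstrsymbol time wgml_header x_address x_size "
           "y_address y_size",
}
TYPE_OF = {name: t for t, names in TYPES.items() for name in names.split()}

def function_type(name):
    """Return the sub-type of a device function, or None if it is not listed."""
    return TYPE_OF.get(name.lower())

def can_start_sequence(name):
    """Only Type I device functions may head a function sequence."""
    return function_type(name) in ("Ia", "Ib", "Ic")

print(function_type("clearPC"), can_start_sequence("add"))
```

A Type II name such as add yields False from can_start_sequence, which corresponds to the situation in which gendev emits DF--008.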
Device Function Signatures
This is the information accumulated as a result of investigating other topics. The binary file was examined to make some of these determinations.
A few notes on parameters:
- gendev does object if it does not find the required number of parameters;
- gendev does not object if it finds more than the required number of parameters;
- gendev only compiles the required number of parameters, moving from left to right; additional parameters are ignored (at least for %emit());
- random tests showed functions accepting strings or numbers without regard to the documented requirements.
There are five return value/parameter types:
- numeric, that is, a sequence of digits; for a literal parameter, either decimal or, if preceded by $, hexadecimal; for a return value, some suitable integer;
- uint8_t, uint16_t and uint32_t are used for one-byte, two-byte and four-byte integers (the two-byte and four-byte integers are little-endian);
- character, that is, a sequence of characters; for a literal parameter, it is enclosed in delimiters;
- symbol, which is the same as character but is used as the name of a user-defined symbol rather than as a value; and
- void, which, as usual, means that the function takes no parameter or returns no value, depending on where it is used.
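The two forms of a numeric literal can be sketched as follows. This is a Python illustration under stated assumptions, not gendev's parser: the function name parse_numeric is hypothetical, and Python's int() accepts some forms (a leading sign, for instance) that gendev may well reject.

```python
def parse_numeric(literal):
    """Convert a numeric literal to an integer.

    A leading '$' marks hexadecimal; anything else is read as decimal.
    """
    literal = literal.strip()
    if literal.startswith("$"):
        return int(literal[1:], 16)
    return int(literal, 10)

print(parse_numeric("15"), parse_numeric("$7FFFFFFF"))
```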
This is for the Type I device functions. "Returns" indicates what, if anything, is inserted into the output buffer. "Parameters" reflects the documented types.
Function     Returns    Parameters            Side Effects
binary       uint8_t    numeric
binary1      uint8_t    numeric
binary2      uint16_t   numeric
binary4      uint32_t   numeric
cancel       void       character
clear3270    void       void
clearPC      void       void
dotab        void       void
endif        void       void
enterfont    void       numeric               invoke :FONTSWITCH
flushpage    void       void
ifeqn        void       numeric, numeric
ifnen        void       numeric, numeric
ifeqs        void       character, character
ifnes        void       character, character
image        character  character
recordbreak  void       void                  flushes the buffer
setsymbol    void       symbol, character
sleep        void       numeric               hangs gendev
text         character  character
textpass     void       void                  insert current text
ulineoff     void       void                  wgml underscore off
ulineon      void       void                  wgml underscore on
wait         void       void
This is for the Type II device functions. "Returns" indicates what the function returns. "Parameters" reflects the documented types.
Function       Returns    Parameters        Side Effects
add            numeric    numeric, numeric
date           character  void
decimal        character  numeric
default_width  numeric    void
divide         numeric    numeric, numeric
font_height    numeric    void
font_number    numeric    void
font_outname1  character  void
font_outname2  character  void
font_resident  character  void
font_space     numeric    void
getnumsymbol   numeric    symbol
getstrsymbol   character  symbol
hex            character  numeric
line_height    numeric    void
line_space     numeric    void
lower          character  character
page_depth     numeric    void
page_width     numeric    void
pages          numeric    void
remainder      numeric    numeric, numeric
subtract       numeric    numeric, numeric
tab_width      numeric    void
thickness      numeric    void
time           character  void
wgml_header    character  void
x_address      numeric    void
x_size         numeric    void
y_address      numeric    void
y_size         numeric    void
The return types map to specific output-related functions that can accept that type of parameter:
Return Type Useable With
character %image(), %lower(), %text()
numeric %add(), %binary(), %binary1(), %binary2(),
%binary4(), %decimal(), %divide(), %hex(),
%remainder(), %subtract()
Ultimately, of course, a Type I function must be used to actually insert bytes into the output buffer. By using %decimal() or %hex() with functions having numeric return values, %image() and %text() can insert the result of any Type II function. The functions %binary(), %binary1(), %binary2(), and %binary4(), in contrast, only work correctly with functions providing numeric return values: when used with character values, for example, %wgml_header() ("V4.0 PC/DOS"), they produce:
Function Sequence                  Decimal Result  Hex Result
%image(%decimal(%wgml_header()))   394944          606C0
%image(%hex(%wgml_header()))       395040          60720
%text(%decimal(%wgml_header()))    395136          60780
%text(%hex(%wgml_header()))        395232          607E0
It appears possible that wgml 4.0 is treating a char * as if it were a uint16_t.
Device Function Notes
This section contains the notes pertaining to gendev 4.1. The page Device Function Notes contains the notes pertaining to wgml 4.0.
The initial version of our gendev will have to behave in a very similar manner to gendev 4.1, the primary difference being the emission of error messages giving useful information indicating what the problem encountered was instead of relying on "Abnormal program termination" messages which do not give any useful indication of the problem. Also, it should not hang when device function %sleep() is encountered.
General Rule
The general rule for gendev 4.1 is this:
- Extensive testing has shown that gendev 4.1 will allow any device function to occur in any block, except as noted below.
%sleep()
The function %sleep(), when used with a literal parameter, hangs gendev (whether DOS version 3.33, DOS version 4.1, or OS/2 version 4.1), and peaks the processor per Task Manager (Windows XP)/System Activity Monitor (OS/2)! This function is documented to cause "WATCOM Script/GML to suspend document processing for the specified number of seconds". None of the source files available to me use this function.
When used in this way:
%setsymbol("fred","1")
%sleep(%getnumsymbol("fred"))
gendev 4.1 successfully produces the binary file. wgml 4.0, however, then proceeds to hang!
There is additional information here. It records a coding error on the part of gendev 4.1 when compiling %sleep() with a non-literal parameter which may, or may not, explain why wgml 4.0 hangs.
%textpass(), %ulineoff(), and %ulineon()
Extensive testing has shown that these functions cannot be used anywhere else than in a :LINEPROC block within a :FONTSTYLE block.
Within a :LINEPROC block within a :FONTSTYLE block, these functions can appear in these sub-blocks:
- %textpass() is only allowed in :STARTVALUE blocks;
- %ulineon() is allowed in :STARTVALUE, :FIRSTWORD, and :STARTWORD blocks but not in :ENDWORD or :ENDVALUE blocks; and
- %ulineoff() is allowed in :FIRSTWORD, :STARTWORD, :ENDWORD, and :ENDVALUE blocks, but not in :STARTVALUE blocks.
These rules apply to their interaction:
- At most one %textpass() can be used in a :LINEPROC block.
- Neither %ulineon() nor %ulineoff() can be used in the same function block as %textpass().
- %ulineon() must be found before %ulineoff() in the same :LINEPROC block.
- If %ulineon() is found, then %ulineoff() must be present in the same :LINEPROC block.
When a %textpass() function is used where it is not allowed, this message is emitted by gendev:
SN--023: Invalid location for a TEXTPASS directive
When more than one %textpass() function is used in the :STARTVALUE block of a :LINEPROC block, this error message results:
SN--024 More than one TEXTPASS directive specified in :lineproc
When a %ulineoff() function is used where it is not allowed, this message is emitted by gendev:
SN--052 Invalid location for a ULINEOFF directive
When a %ulineon() function is used where it is not allowed, this message is emitted by gendev:
SN--051 Invalid location for a ULINEON directive
When %textpass() and either (or both) of %ulineon() or %ulineoff() occur in the same function block, gendev presents this message:
SN--029 Both a TEXTPASS directive and a ULINEON or ULINEOFF found in a :lineproc
If %ulineoff() is found anywhere in a :LINEPROC block and no %ulineon() precedes it in that same :LINEPROC block (they do not have to be in the same sub-block), then this message results:
SN--060 Expecting a ULINEON directive before a ULINEOFF
If %ulineon() is found anywhere in a :LINEPROC block and no %ulineoff() follows it in that same :LINEPROC block (they do not have to be in the same sub-block), then this message results:
SN--061 No corresponding ULINEOFF directive for a ULINEON
It is their characterization as "directives" in the messages shown that suggested the name "Directive" for one of the structs found in the parameter block.
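The placement and interaction rules for these three functions can be summarized as a checker. The Python sketch below is an illustration only: the function name check_lineproc and its input format are hypothetical, and it reports every rule that is violated, whereas the order in which gendev 4.1 performs its checks (and whether it stops at the first error) is an assumption not established above.

```python
ULINEON_TAGS = {"startvalue", "firstword", "startword"}
ULINEOFF_TAGS = {"firstword", "startword", "endword", "endvalue"}

def check_lineproc(sub_blocks):
    """Return the message codes triggered by one :LINEPROC block.

    sub_blocks is a list of (tag, [function names]) pairs in source order,
    for example ("startvalue", ["ulineon", "text"]).
    """
    messages = []
    textpass_count = 0
    seen_on = False      # a %ulineon() has appeared in this :LINEPROC block
    pending_on = False   # the latest %ulineon() has no %ulineoff() yet
    for tag, names in sub_blocks:
        has_uline = False
        for name in names:
            if name == "textpass":
                textpass_count += 1
                if tag != "startvalue":
                    messages.append("SN--023")   # invalid location
                elif textpass_count > 1:
                    messages.append("SN--024")   # more than one
            elif name == "ulineon":
                has_uline = True
                if tag not in ULINEON_TAGS:
                    messages.append("SN--051")
                seen_on = pending_on = True
            elif name == "ulineoff":
                has_uline = True
                if tag not in ULINEOFF_TAGS:
                    messages.append("SN--052")
                if not seen_on:
                    messages.append("SN--060")   # no %ulineon() before it
                pending_on = False
        if "textpass" in names and has_uline:
            messages.append("SN--029")           # same function block
    if pending_on:
        messages.append("SN--061")               # %ulineon() never closed
    return messages

print(check_lineproc([("startvalue", ["ulineon"]), ("endvalue", ["ulineoff"])]))
```

The call shown describes a well-formed block and so yields an empty list.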
Memory Protection Faults
This section discusses the situations in which gendev 4.1 emits the message
Abnormal program termination: Memory protection fault
and halts.
This occurs when these device functions:
%add() %divide() %remainder() %subtract()
are used with two literal arguments directly with %image() and %text(). The specific function sequences tested were:
%image(%add(1,2)) %image(%divide(2,1)) %image(%remainder(1,2)) %image(%subtract(2,1)) %text(%add(1,2)) %text(%divide(2,1)) %text(%remainder(1,2)) %text(%subtract(2,1))
These function sequences resulted in the Memory protection fault in these blocks:
START :PAUSE DOCUMENT :PAUSE DOCUMENT_PAGE :PAUSE DEVICE_PAGE :PAUSE
Binary Output Values
This section discusses the output values of these device functions:
%binary() %binary1() %binary2() %binary4()
If the value passed to %binary() requires more than one byte to express it, then only the lower-order byte is used by gendev when the function is compiled. %binary1() behaves the same way.
%binary2() is used in a few drivers, but only with the parameter "0", producing two null bytes. Testing shows that it will encode two-byte values in little-endian form. If given a value that requires more than two bytes, only the lower-order two bytes are encoded.
%binary4() is not used, so far as I can tell, in the source files available to me. Testing shows that gendev emits this error message
SN--001: Number is too large or contains invalid characters
for values above "$7FFFFFFF". Testing also showed that "$7FFFFFFF" was compiled into "0xFFFFFFFF". Additional testing only increased my confusion, and there is no point to it anyway, since %binary4(), as noted above, is not used.
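The truncation behavior described for %binary(), %binary1(), and %binary2() amounts to masking off the low-order bytes and writing them little-endian. A minimal Python sketch follows, with the hypothetical name encode_binary. It also accepts a width of 4, but for small values only: it makes no attempt to reproduce the unexplained results reported above for large %binary4() parameters.

```python
def encode_binary(value, width):
    """Encode value in `width` little-endian bytes, keeping only the
    low-order bytes when the value needs more than `width` bytes."""
    mask = (1 << (8 * width)) - 1
    return (value & mask).to_bytes(width, "little")

print(encode_binary(0x1234, 1).hex())    # one byte, as for %binary()/%binary1()
print(encode_binary(0x123456, 2).hex())  # two bytes, as for %binary2()
```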
Consideration should be given to adding a new Type I device function, %nulls(), which would take a numeric parameter and generate the indicated number of nulls, if gendev/wgml is ever released for general use.
Compiled Form
Introduction
When I first encountered the compiled form of the function block, I looked at this very simple :VALUE. block:
:value. %text( "Just a test of the START PAUSE block." ) :evalue.
but I presented it like this:
2C 00 FF FF 00 16 25 00 J u s t a t e s t o f t h e S T A R T P A U S E b l o c k .
This turned out to be a mistake, one of many made in the course of my investigation: although 0x002C was clearly the length, I did not actually show 0x002C bytes. I was also misled by the use of "0x00" to designate a CodeBlock compiled from a :VALUE block into believing that two nulls followed the CodeBlock.
A more accurate picture of the encoding can be given by enclosing the data bytes within a tabular array:
2C 00
00 01 02 03 04 05 06 07 08 09 0A 0B 0C 0D 0E 0F
0000 FF FF 00 16 25 00 J u s t a t e s
0010 t o f t h e S T A R T P
0020 A U S E b l o c k . 00
This made it clear that, for device function %text() with an explicit string parameter, the CodeBlock ends with a "0x00" byte. When I began analysing the :DRIVER block, I quickly realized that the byte after this "0x00" byte was, in fact, a designator for the next CodeBlock (if there was a next CodeBlock).
I also drew these conclusions:
- The function %text() is compiled as "0xFF 0xFF 0x00 0x16".
- The value "0x25 0x00" is a count of the number of bytes in the parameter of the function %text(), but not the final "0x00".
The second conclusion is easier to verify if the same format is applied to the parameter:
25 00
00 01 02 03 04 05 06 07 08 09 0A 0B 0C 0D 0E 0F
0000 J u s t a t e s t o f t
0010 h e S T A R T P A U S E b
0020 l o c k .
I then considered a more complicated (and more realistic, for a :PAUSE block) :VALUE. block:
:value. %text( "Just a test of the DOCUMENT PAUSE block." ) %text( "Press enter to start the document." ) %recordbreak() %wait()%clearpc() :evalue.
which, presented in the same format as before, is compiled as:
5D 00
00 01 02 03 04 05 06 07 08 09 0A 0B 0C 0D 0E 0F
0000 4E 00 00 16 4A 00 J u s t a t e s
0010 t o f t h e D O C U M E N
0020 T P A U S E b l o c k . P r
0030 e s s e n t e r t o s t a
0040 r t t h e d o c u m e n t .
0050 00 01 00 00 01 01 00 00 25 FF FF 00 1E
These conclusions appear to be drawable:
- The strings in the two %text() function invocations are concatenated into a single parameter.
- The function %text() now appears to be compiled as "0x4E 0x00 0x00 0x16": thus, the actual encoding may be "0x00 0x16", that is, two bytes rather than four.
- If the "0x4E 0x00" is a count, it counts the same bytes as the "0x4A 0x00", just including "0x00 0x16 0x4A 0x00" in the count. In particular, it does not count the "0x00" after the string.
- If the "0x00" after the string is taken as part of the encoding of function %text() and the encoding is taken as four bytes, the remaining encodings are:
01 00 00 01 01 00 00 25 FF FF 00 1E
It was suggested by my co-implementor that:
- Perhaps the "0xFFFF" string marks the last or only function.
When I first tested a Type II function, a further clarification resulted. The :VALUE block is:
:value. %text( "Just a test of the INIT block." ) %text( %font_outname1() ) %recordbreak() :evalue.
and the result, displayed as above, is:
46 00
00 01 02 03 04 05 06 07 08 09 0A 0B 0C 0D 0E 0F
0000 22 00 00 16 1E 00 J u s t a t e s
0010 t o f t h e I N I T b l
0020 o c k . 00 1A 00 10 16 FF FF 0D 00 FF FF 00
0030 00 00 00 00 00 37 00 00 FF FF FF FF 00 00 00 00
0040 00 00 FF FF 00 01
This illustrates two things:
- "0x10 0x16" is a %text() device function with a single-parameter parameter block.
- Parameter blocks, even when the parameter is a Type II function, are quite different from the struct used for Type I functions.
If the error message used to identify Type II device functions is recalled:
DF--008: This tag at start of device function sequence is invalid
it is now possible to characterize a function block as:
a linked list of device function sequences
where each sequence is headed by a Type I device function.
The struct used for this linked list is:
FunctionList {
uint16_t offset;
uint8_t parameter_type;
uint8_t byte_code;
}
The field offset counts the number of bytes from the field byte_code to the first byte of the next offset. This field will have the value "0xFFFF" when the last compiled function is reached.
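The way the offset field chains the entries together can be shown by walking the second example above. This is a Python sketch with the hypothetical name walk_function_list; it assumes well-formed input, performs no bounds checking beyond what struct.unpack_from does, and does not interpret whatever follows each FunctionList header.

```python
import struct

def walk_function_list(code):
    """Return (position, offset, parameter_type, byte_code) for each entry."""
    entries = []
    pos = 0
    while True:
        offset, parameter_type, byte_code = struct.unpack_from("<HBB", code, pos)
        entries.append((pos, offset, parameter_type, byte_code))
        if offset == 0xFFFF:          # last compiled function
            return entries
        # offset is counted from byte_code, which sits 3 bytes into the entry
        pos += 3 + offset

# The compiled :VALUE block with two merged %text() literals shown earlier.
text = (b"Just a test of the DOCUMENT PAUSE block."
        b"Press enter to start the document.")
sample = (bytes.fromhex("4E 00 00 16 4A 00") + text + b"\x00"
          + bytes.fromhex("01 00 00 01 01 00 00 25 FF FF 00 1E"))

for position, offset, parameter_type, byte_code in walk_function_list(sample):
    print(f"{position:04X}: offset={offset:04X} "
          f"type={parameter_type:02X} code={byte_code:02X}")
```

The walk visits four entries, at 0x0000, 0x0051, 0x0055, and 0x0059, one for each function sequence remaining after the two %text() invocations are merged.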
The following sections discuss the values of the fields parameter_type and byte_code.
Two notes on investigative technique:
- In many cases, where the visible test was not clear enough as to what was going on, wgml's screen output was redirected to a text file and that file was examined using wdump.
- In those same cases, the output file was also examined using wdump.
Literal Parameters
This section discusses literal parameters and how they are treated by gendev when it is compiling a function block.
For %image() and %text(), this is quite simple: no parameter block is present, instead an instance of this struct appears:
CharParameter {
uint16_t count;
char data[count];
char null = 0x00;
}
It might be thought that having both a count and a null terminator is a bit much, but, as will be seen shortly, the value of field data can include nulls, so the value of field count must be used to extract the data.
This may not apply to %text(), since text editors don't generally allow non-characters to be entered into strings; however, once a function is written to process a CharParameter for output for %image(), that function will probably be used for %text() as well. For one thing, such a function must take output record length into account as well; it will not be a simple "write the chars and return" function.
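Reading and writing a CharParameter can be sketched as below. The Python function names pack_char_parameter and unpack_char_parameter are hypothetical; the point illustrated is that the count, not the terminating null, delimits the data.

```python
import struct

def pack_char_parameter(data):
    """Build a CharParameter: little-endian count, the data, a trailing null."""
    return struct.pack("<H", len(data)) + data + b"\x00"

def unpack_char_parameter(buf, pos=0):
    """Return (data, position just past the trailing null)."""
    (count,) = struct.unpack_from("<H", buf, pos)
    start = pos + 2
    # Slice by count: the data itself may legitimately contain null bytes.
    return buf[start:start + count], start + count + 1

packed = pack_char_parameter(b"Just a test of the START PAUSE block.")
print(packed[:2].hex(), unpack_char_parameter(packed))
```

For the string used in the first example of this section, the first two bytes are "25 00", matching the count seen in the compiled form.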
A CharParameter used with %image() can also contain null bytes in its data field, but for a very different reason.
Consider this function block:
:value %binary(3) %binary1(4) %binary2(5) %binary4(6) :evalue
gendev compiles this into:
FF FF 00 15 08 00 03 04 05 00 06 00 00 00 00
which is indistinguishable from a compiled %image() device function.
Although the exact process cannot be reconstructed, it can be conceptualized as having two steps:
- each of the %binaryN() functions was converted to the compiled form of an %image() function with the correct byte(s) in its CharParameter; then
- since the resulting forms met the criteria for being merged, they were merged into the compiled form of a single %image() function whose CharParameter had, for the value of its data field, the merged values of the data fields of all of the CharParameter blocks formed in the first step.
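The two conceptual steps can be mimicked in a few lines of C. This is a sketch of the concept, not of gendev's actual code, which is unknown: the function names are hypothetical, and the byte widths (one byte for %binary() and %binary1(), two for %binary2(), four for %binary4(), low byte first) are read off the dump above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Step 1: append `width` bytes of `value`, low byte first. */
static size_t append_le(uint8_t *buf, size_t used, uint32_t value, unsigned width)
{
    unsigned i;

    for (i = 0; i < width; i++)
        buf[used + i] = (uint8_t)(value >> (8 * i));
    return used + width;
}

/* Step 2: wrap the merged data as a final compiled %image() sequence. */
static size_t emit_image(uint8_t *out, const uint8_t *data, uint16_t count)
{
    size_t n = 0;
    size_t i;

    out[n++] = 0xFF;                     /* offset: last function */
    out[n++] = 0xFF;
    out[n++] = 0x00;                     /* parameter_type */
    out[n++] = 0x15;                     /* byte_code for %image() */
    out[n++] = (uint8_t)(count & 0xFF);  /* CharParameter.count */
    out[n++] = (uint8_t)(count >> 8);
    for (i = 0; i < count; i++)
        out[n++] = data[i];
    out[n++] = 0x00;                     /* CharParameter.null */
    return n;
}

/* %binary(3) %binary1(4) %binary2(5) %binary4(6); out needs 15 bytes. */
size_t compile_binary_example(uint8_t *out)
{
    uint8_t data[8];
    size_t  used = 0;

    assert(out != NULL);
    used = append_le(data, used, 3, 1);
    used = append_le(data, used, 4, 1);
    used = append_le(data, used, 5, 2);
    used = append_le(data, used, 6, 4);
    return emit_image(out, data, (uint16_t)used);
}
```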
This preprocessing of literals by gendev extends to several other cases. Note: although all tests were performed with %image(), %text() very likely behaves the same way.
The Type II device function %lower() is documented in the README file produceable from the WGML 3.33 Update by the statement "returns the lower case of the string" (where "the string" is its parameter). When the parameter is a literal string, this is actually done by gendev, so that both of these invocations:
%image(%lower("SUZY"))
%image("suzy")
are compiled as:
08 00 00 15 04 00 73 75 7A 79 00 suzy
The Type II device functions %hex() and %decimal() convert their numeric parameters into hexadecimal and decimal character representations (respectively). If they are, in turn, used as the parameter of %image(), then that representation is treated as a character literal. Thus, both of these invocations:
%image(%hex(15))
%image("f")
are compiled as:
05 00 00 15 01 00 66 00 f
and both of these invocations
%image(%decimal(15))
%image("15")
are compiled as:
06 00 00 15 02 00 31 35 00 15
If the parameter is a character literal, something strange happens:
%image(%hex("fred"))
09 00 00 15
05 00 31 37 37 35 63 00 1775c
The parameter is, of course, supposed to be numeric.
The Type II device functions %add(), %divide(), %remainder(), and %subtract() take two numeric parameters and produce a numeric result. If the parameters are both literals, then the result is treated as a literal. So far as I can tell, this applies to any level of nesting. For example, the invocation
%image(%decimal(%add(3,%add(3,2))))
is compiled as:
05 00 00 15 01 00 38 00 8
since 3 + 3 + 2 is 8. Similarly, the invocation
%binary(%add(%add(3,%add(3,2)),15))
is compiled as:
05 00 00 15 01 00 17 00 0x17
since 15 + 8 is 23 in decimal, which is 17 in hexadecimal notation.
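The folding of nested literal arithmetic can be illustrated with a small expression tree. This is a sketch under stated assumptions: the Node type and the functions fold(), as_decimal() and as_hex() are hypothetical, and the use of unsigned 32-bit C arithmetic for %divide() and %remainder() is a guess, since only %add() was shown above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

typedef enum { N_LIT, N_ADD, N_SUB, N_DIV, N_REM } NodeKind;

typedef struct Node {
    NodeKind           kind;
    uint32_t           value;  /* used only by N_LIT */
    const struct Node *lhs;    /* used by the arithmetic kinds */
    const struct Node *rhs;
} Node;

/* Fold a tree of literal arithmetic into a single value.
 * Returns 0 on success, -1 if a division by zero is met. */
int fold(const Node *n, uint32_t *out)
{
    uint32_t a, b;

    assert(n != NULL && out != NULL);
    if (n->kind == N_LIT) {
        *out = n->value;
        return 0;
    }
    if (fold(n->lhs, &a) != 0 || fold(n->rhs, &b) != 0)
        return -1;
    switch (n->kind) {
    case N_ADD: *out = a + b; return 0;
    case N_SUB: *out = a - b; return 0;
    case N_DIV: if (b == 0) return -1; *out = a / b; return 0;
    case N_REM: if (b == 0) return -1; *out = a % b; return 0;
    default:    return -1;
    }
}

/* The character literal that %decimal() of a folded value becomes. */
int as_decimal(uint32_t v, char *buf, size_t cap)
{
    return snprintf(buf, cap, "%lu", (unsigned long)v);
}

/* The character literal that %hex() of a folded value becomes (lower case). */
int as_hex(uint32_t v, char *buf, size_t cap)
{
    return snprintf(buf, cap, "%lx", (unsigned long)v);
}
```

Folding the tree for %add(3,%add(3,2)) and passing the result to as_decimal() gives the single character "8", matching the compiled form shown above.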
This behavior has implications for parsing these functions. wgml is not affected, since it can simply emit the characters in the CharParameter for each %image() function it encounters. The research programs copparse and cfparse are affected, since they are intended to be completed with a parser that provides enough information to reconstruct the source code for a given binary device file.
For copparse and cfparse, this will probably turn out to be the best method possible:
- Non-character bytes will be presented as arguments to %binary().
- Character bytes will be presented, as strings, as arguments to %image() or to %text(), depending on which binary code (0x15 or 0x16) is present.
- It will not be possible to distinguish such invocations as %image(%decimal(%add(3,%add(3,2)))) from the form in which they will be reported, %image("8").
- It will not be possible to distinguish such invocations as %image(%lower("FRED")) from the form in which they will be reported, %image("fred").
So the goal has to be, not to reproduce the source file (impossible in any case since :CMT. lines are not compiled), but to produce a source file which, when processed by gendev, produces the same binary file which copparse or cfparse analysed.
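The reporting method listed above might look something like this in C. It is a sketch only, not the code in copparse or cfparse: the function names are hypothetical, and the definition of a "character byte" (printable ASCII other than the double quote) is an assumption made so that every reported string can be written between quotes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Assumption: "character bytes" are printable ASCII other than the quote. */
static int is_text_byte(uint8_t b)
{
    return b >= 0x20 && b < 0x7F && b != '"';
}

/* Write a source-like rendering of one CharParameter's data into out.
 * byte_code is 0x15 for %image() or 0x16 for %text().
 * Returns the length written, or -1 if out is too small. */
int reconstruct(const uint8_t *data, size_t count, uint8_t byte_code,
                char *out, size_t cap)
{
    const char *fn = (byte_code == 0x16) ? "%text" : "%image";
    size_t      used = 0;
    size_t      i = 0;

    assert(out != NULL && cap > 0);
    out[0] = '\0';
    while (i < count) {
        const char *sep = (used > 0) ? " " : "";
        int         n;

        if (is_text_byte(data[i])) {
            /* Gather a run of character bytes into one string. */
            size_t start = i;
            while (i < count && is_text_byte(data[i]))
                i++;
            n = snprintf(out + used, cap - used, "%s%s(\"%.*s\")", sep, fn,
                         (int)(i - start), (const char *)(data + start));
        } else {
            /* Report each non-character byte as its own %binary(). */
            n = snprintf(out + used, cap - used, "%s%%binary(%u)", sep,
                         (unsigned)data[i]);
            i++;
        }
        if (n < 0 || (size_t)n >= cap - used)
            return -1;
        used += (size_t)n;
    }
    return (int)used;
}
```

Given the data bytes of the "suzy" example, this reports %image("suzy") whether the source used %lower() or not, which is exactly the loss of information described above.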
Parameter Blocks
These are the values of field FunctionList.parameter_type which have been seen and their inferred meanings:
- 0x00 indicates no parameter block at all.
- 0x10 indicates that a parameter block is present.
The values shown do not fully determine the behavior of wgml: while "0x10" shows that a parameter block is present, "0x00" does not mean that no parameter is present, only that no parameter block exists, as can be seen from the examples above using %text(). Thus, wgml must use its own knowledge of %image() and %text() to determine what to expect:
Parameter Type  Function  wgml Action
0x00            %image()  expect character parameter only
0x00            %text()   expect character parameter only
0x00            other     expect no parameter
0x10            any       expect parameter block
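The same decision can be written as a small C function. This is a sketch: the enum and the function name what_follows() are hypothetical, and the EXPECT_UNKNOWN case simply covers parameter_type values that have not been observed.

```c
#include <stdint.h>

typedef enum {
    EXPECT_NONE,             /* no parameter at all */
    EXPECT_CHAR_PARAMETER,   /* a bare CharParameter follows */
    EXPECT_PARAMETER_BLOCK,  /* a parameter block follows */
    EXPECT_UNKNOWN           /* parameter_type value not seen in testing */
} Expectation;

/* Decide what follows byte_code in a compiled device function. */
Expectation what_follows(uint8_t parameter_type, uint8_t byte_code)
{
    switch (parameter_type) {
    case 0x10:
        return EXPECT_PARAMETER_BLOCK;
    case 0x00:
        /* 0x15 is %image() and 0x16 is %text(). */
        if (byte_code == 0x15 || byte_code == 0x16)
            return EXPECT_CHAR_PARAMETER;
        return EXPECT_NONE;
    default:
        return EXPECT_UNKNOWN;
    }
}
```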
The remainder of this section discusses parameter block structure. It only applies in cases where the Parameter Type is "0x10".
If a character parameter is given in the source, then the CharParameter struct, discussed in Literal Parameters, is used as part of the parameter block. Several other structs are also useful in understanding parameter block structure.
The following discussion is based on limited testing. When the parsing code is written, it will be modified as needed.
The first struct encountered is a Parameter struct:
Parameter {
uint16_t offset1;
uint16_t offset2;
uint16_t offset3;
uint16_t offset4 = 0x0000;
}
Discussion of how these fields are used must be postponed until the full structure of the parameter block has been examined.
Each parameter block begins with a header:
ShortHeader {
Parameter parameter;
}
or:
LongHeader {
Parameter parameter;
uint16_t value = 0x0000;
uint16_t nulls = 0x0000;
}
Each parameter is encoded in this struct:
Directive {
char op_code;
Parameter parameter;
uint16_t value;
uint16_t nulls = 0x0000;
}
Now the various fields will be discussed.
The field nulls contains two null bytes under all tested device function sequences.
The value of the field value will always be "0x0000" in a LongHeader instance; in Directive instances where the parameter was not given as a numeric literal, its value will also be "0x0000", but when a literal numeric parameter was used, then it will contain the value of that literal.
The values observed for the field op_code are:
- 0x00 if a literal character parameter was given, in which case a CharParameter is appended to the Directive instance with the value of that parameter.
- 0x3C if a literal numeric parameter was given, in which case the field value in the Directive instance contains the value of that parameter.
- other values are the byte code of the Type II device function which was used as the parameter.
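Reading one Directive from a byte buffer and classifying it by op_code could be sketched like this. The names directive_read() and directive_kind() are hypothetical, and the fields are decoded one at a time rather than by overlaying a C struct on the bytes, because a compiler would normally pad a struct that begins with a single char.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DIRECTIVE_SIZE 13u  /* 1 + 4 * 2 + 2 + 2 bytes */

typedef struct {
    uint8_t  op_code;
    uint16_t offset1, offset2, offset3, offset4;
    uint16_t value;
    uint16_t nulls;
} Directive;

typedef enum { DIR_CHAR_LITERAL, DIR_NUM_LITERAL, DIR_FUNCTION } DirectiveKind;

/* 16-bit fields are stored low byte first in the dumps on this page. */
static uint16_t u16le(const uint8_t *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

/* Decode the 13 bytes at p. Returns 0 on success, -1 if too few bytes. */
int directive_read(const uint8_t *p, size_t len, Directive *d)
{
    assert(p != NULL && d != NULL);
    if (len < DIRECTIVE_SIZE)
        return -1;
    d->op_code = p[0];
    d->offset1 = u16le(p + 1);
    d->offset2 = u16le(p + 3);
    d->offset3 = u16le(p + 5);
    d->offset4 = u16le(p + 7);
    d->value   = u16le(p + 9);
    d->nulls   = u16le(p + 11);
    return 0;
}

/* Classify a Directive using the op_code values observed so far. */
DirectiveKind directive_kind(const Directive *d)
{
    if (d->op_code == 0x00)
        return DIR_CHAR_LITERAL;  /* a CharParameter follows */
    if (d->op_code == 0x3C)
        return DIR_NUM_LITERAL;   /* the literal is in d->value */
    return DIR_FUNCTION;          /* byte code of a Type II function */
}
```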
The ShortHeader struct has only been seen used correctly when the first parameter is a character string. If there are no parameters, or there is only one parameter, whether it is a character string or not, or if the second parameter is a character string but the first is not, then the LongHeader struct is used. When device function %sleep() with a non-literal parameter was set up for interpretation, however, it turned out that, in this case, a ShortHeader is used. As noted below, however, it was encoded in such a way that the interpreter expected a LongHeader until it was hacked to ignore the value encoded (0x000D) and use the correct value (0x0009) instead.
Now the fields in struct Parameter can be characterized. It may be helpful to present a table of multiples for Directive instance lengths, since these are the values used in the illustrations:
Nr of Instances  Total Length
0                0x0000
1                0x000D
2                0x001A
3                0x0027
4                0x0034
5                0x0041
6                0x004E
7                0x005B
To illustrate the use of the field Parameter.offset1, consider this schematized compiled function sequence (the Parameter.offset1 fields are emphasized):
Function sequence: %image(%add(%subtract(%line_space(),%font_number()),%remainder(%line_space(),%font_number())))
%image():    10 15
LongHeader:     [FF FF] 0D 00 FF FF 00 00 00 00 00 00
Level 1:     0E [00 00] 1A 00 41 00 00 00 00 00 00 00
Level 2:     0F [0D 00] 27 00 34 00 00 00 00 00 00 00
Level 3:     34 [1A 00] FF FF FF FF 00 00 00 00 00 00
Level 3:     28 [1A 00] FF FF FF FF 00 00 00 00 00 00
Level 2:     11 [0D 00] 4E 00 5B 00 00 00 00 00 00 00
Level 3:     34 [41 00] FF FF FF FF 00 00 00 00 00 00
Level 3:     28 [41 00] FF FF FF FF 00 00 00 00 00 00
This suggests that the field Parameter.offset1, when not "0xFFFF", when added to the start of the first Directive following the parameter block locates the first byte of the Directive instance representing the function of which the current Directive instance represents a parameter.
These values of Parameter.offset1 appear to have special meanings:
- 0xFFFF indicates that this is a ShortHeader or LongHeader instance;
- 0x0000 indicates that this is a parameter of the Type I device function standing at the head of the function sequence.
To illustrate the use of the field Parameter.offset2, consider this schematized compiled function sequence (the Parameter.offset2 fields are emphasized):
Function sequence: %image(%add(%subtract(%line_space(),%font_number()),%remainder(%line_space(),%font_number())))
%image():    10 15
LongHeader:     FF FF [0D 00] FF FF 00 00 00 00 00 00
Level 1:     0E 00 00 [1A 00] 41 00 00 00 00 00 00 00
Level 2:     0F 0D 00 [27 00] 34 00 00 00 00 00 00 00
Level 3:     34 1A 00 [FF FF] FF FF 00 00 00 00 00 00
Level 3:     28 1A 00 [FF FF] FF FF 00 00 00 00 00 00
Level 2:     11 0D 00 [4E 00] 5B 00 00 00 00 00 00 00
Level 3:     34 41 00 [FF FF] FF FF 00 00 00 00 00 00
Level 3:     28 41 00 [FF FF] FF FF 00 00 00 00 00 00
This suggests that the field Parameter.offset2, when not "0xFFFF", when added to the start of the parameter block locates the first byte of the Directive instance representing the first parameter of the function represented by the current Directive instance.
The value of Parameter.offset2 in a ShortHeader instance is "0x0009", matching the number of bytes in a ShortHeader instance, just as the value "0x000D" found in a LongHeader matches the number of bytes in a LongHeader. These are the only two values observed in either type of header. As noted above, device function %sleep(), with a non-literal parameter, is encoded using a ShortHeader but with a value of "0x000D" for Parameter.offset2, which is, of course, wrong and may explain why device function %sleep() hangs wgml 4.0.
To illustrate the use of the field Parameter.offset3, consider this schematized compiled function sequence (the Parameter.offset3 fields are emphasized):
Function sequence: %image(%add(%subtract(%line_space(),%font_number()),%remainder(%line_space(),%font_number())))
%image():    10 15
LongHeader:     FF FF 0D 00 [FF FF] 00 00 00 00 00 00
Level 1:     0E 00 00 1A 00 [41 00] 00 00 00 00 00 00
Level 2:     0F 0D 00 27 00 [34 00] 00 00 00 00 00 00
Level 3:     34 1A 00 FF FF [FF FF] 00 00 00 00 00 00
Level 3:     28 1A 00 FF FF [FF FF] 00 00 00 00 00 00
Level 2:     11 0D 00 4E 00 [5B 00] 00 00 00 00 00 00
Level 3:     34 41 00 FF FF [FF FF] 00 00 00 00 00 00
Level 3:     28 41 00 FF FF [FF FF] 00 00 00 00 00 00
This suggests that the field Parameter.offset3, when not "0xFFFF", when added to the start of the parameter block locates the first byte of the Directive instance representing the second parameter of the function represented by the current Directive instance.
The field Parameter.offset4 is always 0x0000. It was so-named because it is tempting to think that it was intended to provide the offset to a third parameter, but is never used for that purpose because no device function, in its compiled form, has more than two parameters.
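As a check on this reading of offset2 and offset3, the following C sketch walks the example sequence and rebuilds its source form. It rests on an inference that should be treated as an assumption: every offset is measured from the byte_code byte of the Type I function, and that byte plus the LongHeader is treated as one more 13-byte record. This is the base that makes the offsets in the example land on Directive boundaries; it has not been checked against other sequences, and literal parameters (op_code 0x00 and 0x3C) are not handled. All function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define RECORD_SIZE 13u     /* op_code + Parameter + value + nulls */
#define NO_OFFSET   0xFFFFu

static uint16_t u16le(const uint8_t *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

/* Names for only the byte codes that occur in the example. */
static const char *name_of(uint8_t code)
{
    switch (code) {
    case 0x0E: return "%add";
    case 0x0F: return "%subtract";
    case 0x11: return "%remainder";
    case 0x15: return "%image";
    case 0x28: return "%font_number";
    case 0x34: return "%line_space";
    default:   return "%?";
    }
}

/* Append s to out, truncating silently if out is full. */
static void append(char *out, size_t cap, const char *s)
{
    strncat(out, s, cap - strlen(out) - 1);
}

/* Render the record at `at`, following offset2 and then offset3. */
static int render(const uint8_t *base, size_t len, uint16_t at, int depth,
                  char *out, size_t cap)
{
    uint16_t first, second;

    if (depth > 16 || (size_t)at + RECORD_SIZE > len)
        return -1;
    first  = u16le(base + at + 3);   /* Parameter.offset2 */
    second = u16le(base + at + 5);   /* Parameter.offset3 */
    append(out, cap, name_of(base[at]));
    append(out, cap, "(");
    if (first != NO_OFFSET && render(base, len, first, depth + 1, out, cap) != 0)
        return -1;
    if (second != NO_OFFSET) {
        append(out, cap, ",");
        if (render(base, len, second, depth + 1, out, cap) != 0)
            return -1;
    }
    append(out, cap, ")");
    return 0;
}

/* base points at the byte_code of the Type I function. */
int render_sequence(const uint8_t *base, size_t len, char *out, size_t cap)
{
    assert(base != NULL && out != NULL && cap > 0);
    out[0] = '\0';
    return render(base, len, 0, 0, out, cap);
}
```

Fed the bytes of the example starting at the 0x15, this reproduces the function sequence shown at the top of each illustration, which is some evidence that the three offsets are being read correctly.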
Device Function Code Bytes
This table lists the code bytes found for all known device functions:
Device Function    Code Byte
%add()             0E
%binary()          09
%binary1()         09
%binary2()         0A
%binary4()         0B
%cancel()          24
%clear3270()       1F
%clearPC()         1E
%date()            3B
%decimal()         0C
%default_width()   27
%divide()          10
%dotab()           23
%endif()           1C
%enterfont()       06
%flushpage()       1D
%font_height()     31
%font_number()     28
%font_outname1()   37
%font_outname2()   38
%font_resident()   39
%font_space()      32
%getnumsymbol()    12
%getstrsymbol()    13
%hex()             0D
%ifeqn()           1A
%ifeqs()           18
%ifnen()           1B
%ifnes()           19
%image()           15
%line_height()     33
%line_space()      34
%lower()           14
%page_depth()      2A
%page_width()      2B
%pages()           35
%recordbreak()     01
%remainder()       11
%setsymbol()       17
%sleep()           26
%subtract()        0F
%tab_width()       29
%text()            16
%textpass()        20
%thickness()       30
%time()            3A
%ulineoff()        22
%ulineon()         21
%wait()            25
%wgml_header()     36
%x_address()       2C
%x_size()          2E
%y_address()       2D
%y_size()          2F
When the list is sorted by byte code, several ranges of values are missing:
- 0x00 before %recordbreak();
- 0x02 through 0x05 between %recordbreak() and %enterfont(); and
- 0x07 and 0x08 between %enterfont() and %binary().
The highest value is "0x3B"; however the values "0x00" and "0x3C" are used to designate literal parameters in the Directive struct discussed in the section on parameter blocks.
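For a parser, the table is most convenient when indexed by code byte. The following C sketch shows one way to hold it; the names df_names and df_name() are hypothetical. Missing codes are left as NULL, and 0x09 is reported as %binary() because %binary1() compiles to the same value.

```c
#include <stddef.h>
#include <stdint.h>

#define DF_CODE_MAX 0x3B

/* Device function names indexed by code byte; gaps stay NULL. */
static const char *const df_names[DF_CODE_MAX + 1] = {
    [0x01] = "%recordbreak()",   [0x06] = "%enterfont()",
    [0x09] = "%binary()",        /* %binary1() shares this code */
    [0x0A] = "%binary2()",       [0x0B] = "%binary4()",
    [0x0C] = "%decimal()",       [0x0D] = "%hex()",
    [0x0E] = "%add()",           [0x0F] = "%subtract()",
    [0x10] = "%divide()",        [0x11] = "%remainder()",
    [0x12] = "%getnumsymbol()",  [0x13] = "%getstrsymbol()",
    [0x14] = "%lower()",         [0x15] = "%image()",
    [0x16] = "%text()",          [0x17] = "%setsymbol()",
    [0x18] = "%ifeqs()",         [0x19] = "%ifnes()",
    [0x1A] = "%ifeqn()",         [0x1B] = "%ifnen()",
    [0x1C] = "%endif()",         [0x1D] = "%flushpage()",
    [0x1E] = "%clearPC()",       [0x1F] = "%clear3270()",
    [0x20] = "%textpass()",      [0x21] = "%ulineon()",
    [0x22] = "%ulineoff()",      [0x23] = "%dotab()",
    [0x24] = "%cancel()",        [0x25] = "%wait()",
    [0x26] = "%sleep()",         [0x27] = "%default_width()",
    [0x28] = "%font_number()",   [0x29] = "%tab_width()",
    [0x2A] = "%page_depth()",    [0x2B] = "%page_width()",
    [0x2C] = "%x_address()",     [0x2D] = "%y_address()",
    [0x2E] = "%x_size()",        [0x2F] = "%y_size()",
    [0x30] = "%thickness()",     [0x31] = "%font_height()",
    [0x32] = "%font_space()",    [0x33] = "%line_height()",
    [0x34] = "%line_space()",    [0x35] = "%pages()",
    [0x36] = "%wgml_header()",   [0x37] = "%font_outname1()",
    [0x38] = "%font_outname2()", [0x39] = "%font_resident()",
    [0x3A] = "%time()",          [0x3B] = "%date()",
};

/* Return the name for a code byte, or NULL for a missing or unknown code. */
const char *df_name(uint8_t code)
{
    return (code <= DF_CODE_MAX) ? df_names[code] : NULL;
}
```

Since 0x00 and 0x3C are not device functions, df_name() returns NULL for both, leaving the caller to treat them as the literal markers used in the Directive struct.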
Projected Code for wgml
When I started working on the binary device file format, I did not realize that it would make sense to create the Binary Device File Subsystem after finishing work on the format itself and before investigating the Device Functions as such. Even when various characteristics of that code occurred to me, I did not write them down and, as a result, certain minor corrections had to be done while reviewing the Wiki before starting on the Device Function Language.
This section contains my initial thoughts on the code to be produced after the Device Function Language has been investigated. It will be updated as new ideas or details occur. What code is actually produced will only be determined when the time comes, but this should help ensure that I remember everything I thought might be done.
The Minimum
This has, in fact, been accomplished: cfparse.exe and copparse.exe now display the compiled device functions in the form of the source device functions that gendev compiled to create the binary file.
This allows the source for a binary device file to be reconstructed subject to these limitations:
- the comments are unrecoverable as they do not appear in the binary format;
- multiple :INTRANS, :OUTTRANS, and :WIDTH blocks appear in the binary file as single blocks, so the reconstruction will only show (at most) one block of each type;
- included blocks in the :DRIVER block have no prescribed order (other than having to appear after all of the attributes) and so their order in the original source file cannot be completely reconstructed;
- some compiled %image() functions were actually compiled from those function sequences involving literal parameters which are discussed here; these are reported as a mixture of %binary() and %image() functions but the full complexity of the source cannot be recovered.
The reconstructed source file will, however, produce the same binary file when processed by gendev as the file that was analysed.
Device Function API
The parser, developed from the initial version already done, can be built as a set of C functions, each of which performs the appropriate action.
It has become clear that almost every function will, in fact, be implemented. The only exceptions are device functions %binary2() and %binary4(), which never appear in compiled form. The use of device function %binary2() suggests that replacing it with a device function %nulls() would be a useful enhancement.
Some of these functions presuppose a specific environment. Thus, the function %font_outname1() clearly presupposes that some sort of font-related struct, which contains an entry for the attribute font_out_name1, is available when the function is executed. More subtly, perhaps, the function %recordbreak(), which flushes the output buffer, presupposes not only the existence of an output buffer but also that these functions directly (if not exclusively) control it. Characterization of this environment, that is, of the things that must exist or that must be provided for these functions to work at all, should help in guiding the development of the part of wgml that actually does the output.
Other items may occur. Feel free to point them out.
:DEVICE Blocks
These notes build on the material given here to produce guidelines for interpreting the compiled device functions in :DEVICE blocks. This material will be reshaped as appropriate as development of the page Device Function Notes proceeds.
The goal here is to produce as consistent and sensible an interpreter as possible. It will not necessarily do everything as wgml 4.0 does, but, for the device files in the Open Watcom repository, it should behave exactly as wgml 4.0 does.
Fortunately, for the Open Watcom document build system, wgml 4.0 does exactly nothing with device functions in :DEVICE blocks because neither the WHELP nor the PS :DEVICE block contains either :PAUSE or :FONTPAUSE blocks. Indeed, the only :DEVICE block in the Open Watcom repository that contains :PAUSE blocks is TERM; no :DEVICE block known to me contains a :FONTPAUSE block.
The :PAUSE blocks in the TERM :DEVICE block use these device functions:
%clearPC()
%recordbreak()
%text() (with a literal argument)
%wait()
So, those functions must work in our wgml exactly as they do in wgml 4.0. Our wgml can handle the others in a reasonable (certainly) but not-necessarily-identical-to-how-wgml-4.0-does-it manner.
These Type I functions will be interpreted:
%clear3270() %clearPC() %endif() %ifeqn() %ifeqs() %ifnen() %ifnes() %image() %recordbreak() %setsymbol() %text() %wait()
and these will be ignored:
%binary() %binary1() %binary2() %binary4() %cancel() %dotab() %enterfont() %flushpage()
Those functions which are interpreted will work correctly. Those functions which are ignored will do nothing whatsoever: no MPFs, no error messages, no output, no effect on anything. Since %clear3270() and %clearPC() behave identically, they will be implemented identically.
When %sleep() is used with a non-literal parameter, gendev 4.1 does compile it; however, wgml 4.0 proceeds to hang when it encounters it. Our wgml will have to do better.
The remaining Type I functions will never be encountered: %textpass(), %ulineoff(), and %ulineon() are only allowed in :LINEPROC blocks by gendev 4.1.
All Type II functions return a value of either character or numeric type. Initially, I regarded wgml 4.0's behavior when the return type of the called function did not match the required type of the calling function as a problem to be solved. However, further reflection suggests these points:
- The current device library causes none of these problems, so our wgml does not have to solve them to work properly.
- These problems are really language usage problems, and so should be caught by our gendev, so that the source file can be corrected.
For this reason, I now expect the interpreter used in wgml to use an array of pointers to functions, each taking no parameters and returning a void *. The calling function will cast the void * to either a uint32_t or a char *. As noted above, the current library has no mismatches and our gendev will prevent any in the future.
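A minimal sketch of such a table, assuming it is indexed by code byte, might look like this. Everything here is hypothetical: the function and variable names, the two stub implementations, and the placeholder values they return stand in for whatever wgml state the real functions would consult.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef void *(*df_fn)(void);

/* Hypothetical stand-ins for wgml state. */
static uint32_t    current_font_number = 3;
static const char *current_date = "placeholder date";

static void *df_font_number(void)
{
    return (void *)(uintptr_t)current_font_number;
}

static void *df_date(void)
{
    return (void *)current_date;
}

/* Indexed by code byte; only two entries are filled in for this sketch. */
static const df_fn type2_table[0x3C] = {
    [0x28] = df_font_number,
    [0x3B] = df_date,
};

/* The caller knows it needs a number, so it casts the result back. */
uint32_t call_numeric(uint8_t code)
{
    assert(code < 0x3C && type2_table[code] != NULL);
    return (uint32_t)(uintptr_t)type2_table[code]();
}

/* The caller knows it needs a string, so it casts the result back. */
const char *call_character(uint8_t code)
{
    assert(code < 0x3C && type2_table[code] != NULL);
    return (const char *)type2_table[code]();
}
```

The numeric result goes through uintptr_t on the way in and out so that the conversion between integer and pointer is well defined on platforms where the two differ in size. Nothing in the table records which kind of value a function returns, which is why the design depends on gendev rejecting mismatches.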
As with the current interpreter, %image() will still output literal parameters character-by-character, since gendev 4.1 (and our gendev) can put nulls into a literal parameter. There is, so far as I can tell, no way to insert nulls into the various values returned by those device functions which return character values.
There are some situations where error messages will be needed.
Consider first the functions %setsymbol(), %getnumsymbol(), and %getstrsymbol(). With wgml 4.0, this works when tested:
%setsymbol("fred","tom")
%setsymbol(%getstrsymbol("fred"),"sally")
%image(%getstrsymbol(%getstrsymbol("fred")))
and produces the output "sally", while this does not work:
%setsymbol("fred",3)
%setsymbol(%getnumsymbol("fred"),"sally")
since it produces this message:
SN--057: The symbol name must have at least one character
Our wgml should behave similarly, since symbols are used with the conditionals. Note that, if the second parameter is numeric, the observed behavior is to assign a null string as the value of the symbol -- treating it, in effect, as an "ignorable value".
Similarly, if any of the conditionals
%ifeqn() %ifeqs() %ifnen() %ifnes()
are given a parameter of the wrong type, an error message should result since otherwise wgml has no way of telling whether or not to execute the functions controlled by the conditional statement.
Finally, if %divide() or %remainder() is passed, as its second parameter, a numeric value of "0", that will need to be reported, since proceeding with the operation would produce a "Divide overflow" message.
If all the information discovered about the use of these functions is considered, then this appears to be the best solution:
- Initially, every global variable whose value is returned by a device function is set to either "0" or a null string. Indeed, all the functions indicated above as ignored use this same function.
- After the START :PAUSE block has been interpreted, then every global variable whose value is returned by a device function and is dependent on the data in the device library is set to the proper value. This leaves the ignored functions and those globals maintained by wgml alone. Symbols given values by the START :PAUSE block will still have those values.
- If there is only one pass, then no further changes are needed. However, if there are multiple passes, then this will be needed:
- After the DOCUMENT :PAUSE block has been interpreted, the functions implementing %image() and %text() will have to be replaced with versions which do not actually produce output. They must, however, otherwise be fully functional.
- At the start of the last pass, the functions implementing %image() and %text() which actually send output to the screen will have to be swapped back in, so that the various :PAUSE and :FONTPAUSE blocks appear in their proper place.
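The swapping described in these steps could be done through a function pointer. This is a sketch under stated assumptions, not a design decision: the names are hypothetical, stdout stands in for the device, and a simple byte counter stands in for whatever bookkeeping must continue while output is suppressed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

typedef void (*emit_fn)(const uint8_t *data, size_t count);

static size_t bytes_processed;  /* bookkeeping done on every pass */

/* Version used on passes where output is wanted. */
static void emit_to_device(const uint8_t *data, size_t count)
{
    fwrite(data, 1, count, stdout);  /* writes nulls as well as characters */
    bytes_processed += count;
}

/* Version used on intermediate passes: same bookkeeping, no output. */
static void emit_silently(const uint8_t *data, size_t count)
{
    (void)data;
    bytes_processed += count;
}

static emit_fn emit_image = emit_to_device;

/* Swap the implementation in or out at a pass boundary. */
void set_output_enabled(int enabled)
{
    emit_image = enabled ? emit_to_device : emit_silently;
}

/* What the interpreter calls for %image() with a literal parameter. */
void df_image(const uint8_t *data, size_t count)
{
    assert(data != NULL || count == 0);
    emit_image(data, count);
}

size_t df_bytes_processed(void)
{
    return bytes_processed;
}
```

A caller would use set_output_enabled(0) after the DOCUMENT :PAUSE block has been interpreted and set_output_enabled(1) at the start of the last pass; %text() would need the same treatment.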
To illustrate the first and second steps, consider %font_height(): examination of the actual values output shows clearly that it has (as expected) a non-zero value for all values of %font_number(). Yet it is ignored or returns "0" in the START :PAUSE block -- and only in the START :PAUSE block. When used with %decimal() or %hex(), output always appears (and is non-zero except, as stated, in the START :PAUSE block).
Some items may need to be reset at the start of each pass: for example, the global symbol table may need to be restored to a prior state: if the START and DOCUMENT :INIT and :PAUSE blocks are executed, without output except on the first pass, before the start of each pass, then the "prior state" would be whatever it starts at when wgml starts up; if the START and DOCUMENT :INIT and :PAUSE blocks are not executed again after the first pass, then that state would have to include the result of any symbol table activity in those blocks.
Other items, such as the information used for the table of contents, table of figures, and the index are almost certainly retained between passes: the primary purpose of multiple passes appears to be resolving forward references (second pass) and using the correct page numbers (third pass -- only needed for this purpose if resolving the forward references changes which page any of these items falls on). Multiple passes may produce other benefits as well; there may be other state information that needs to be retained, rather than reset, between passes. Only time will tell.
:DRIVER Blocks
Device function usage in :DRIVER blocks can be controlled by writing our gendev to retain those restrictions enforced by gendev 4.0 and add any additional restrictions suggested by the examination of the weirder device functions.
As with the :DEVICE block, each function block must be presumed to be done on each pass if only to ensure that the global symbol table has the correct state at each point. However, output is restricted to the first pass for the :INIT blocks and the implicit %enterfont(), and the last pass for everything else. Thus, the device functions %binary(), %image(), %nulls() (if added), and %text() will need two versions: one to do no output at all, and one to do output to the device. Alternatively, perhaps the output buffer could be written so that it can either output to the device or not actually do any output at all, as desired. Presumably, all other actions, including the updating of the value returned by device function %x_address() and any internal variables, will be done even when no actual output is occurring, although that will probably depend on what the code requires to work properly.
Device Function Utility API
There are also several areas where it might be possible to write functions that wgml could call to perform the output. Those that have occurred to me are discussed here until a better location is found for them; others may exist; feel free to add them as they occur to you.
Items identified elsewhere include:
- the various output sequences;
- the input translation;
- the output translation; and
- the width computation.

