Writing XML with Unotal

“Writing XML with Unotal” describes how XML documents can be written in Unotal, which works around some limitations of XML. These documents can then be converted to XML.

A Small Introduction to Unotal

Unotal was developed to represent data in a natural way. As a very coarse description, it is XML with structured attributes. There are also no end tags for elements: an element always starts with a left angle bracket and ends with a right angle bracket, and in Unotal it is called a “room”. The type of a room is marked with an ampersand "&".

A tower description in Unotal

< &towerdescription
  height=<40 &meter>
  name=<[Miller tower] &text>>

A string may be written as it is, but it must be enclosed in square brackets if it contains special characters, such as a blank. The position of the room type within a room is not fixed, so the room "<&meter 40>" might also be written as the room "<40 &meter>".
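The two rules just stated (bracketed strings and free placement of the room type) can be illustrated with a small Python sketch. The function parse_flat_room is hypothetical and not part of any Unotal implementation; it assumes a single, non-nested room without attributes and ignores the rest of the Unotal grammar.

```python
import re

# One token is either a bracketed string "[...]" or a run of non-blank characters.
_TOKEN = re.compile(r"\[([^\]]*)\]|(\S+)")


def parse_flat_room(text):
    """Parse a flat room such as "<40 &meter>" into (type, body entries).

    Hypothetical helper: handles only one non-nested room without attributes.
    """
    text = text.strip()
    if not (text.startswith("<") and text.endswith(">")):
        raise ValueError("a room starts with '<' and ends with '>'")
    room_type = None
    body = []
    for match in _TOKEN.finditer(text[1:-1]):
        bracketed, word = match.groups()
        if bracketed is not None:
            body.append(bracketed)      # brackets protect blanks inside a string
        elif word.startswith("&"):
            room_type = word[1:]        # the ampersand marks the room type
        else:
            body.append(word)           # plain string written as it is
    return room_type, body
```

Under these assumptions, parse_flat_room("<&meter 40>") and parse_flat_room("<40 &meter>") yield the same result, because the type is recognised by its ampersand and not by its position.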

Unotal leaves the writer free to always use attribute names for property names and element type names for data type names, because:

In Unotal, attribute values may be structured.

Thus, one can tell at a glance what is a property name and what is a type name.

Writing XML with Unotal

Because Unotal allows a more natural notation of data, one might want to write documents in Unotal instead of XML and then have them converted to XML. For example, a natural tower description can be given as follows.

A tower description in Unotal

< &towerdescription
  height=<40 &meter>
  name=<[Miller tower] &text>>

An XML language (DTD) might require the description to be written as follows.

A tower description in XML

<towerdescription name="Miller tower">
  <height><meter>40</meter></height>
</towerdescription>

Above, "height" is semantically not a type of the element but the name of the relation between the towerdescription and the value "<meter>40</meter>", i.e., the rôle of that value. In XML, however, this cannot be written with the proper means, i.e., as an attribute, so an element has to be abused for that purpose.

Because XML attribute values may not be structured, one has to abuse element types for property names, which are not types of a single thing at all but names for the relation between two things.

How should the converter know that the name attribute has to be converted to an XML attribute, but the height attribute to an XML subelement? A special description file might provide this information. For a recent project, however, another approach was chosen, in which the writer needs to know something about the XML structure but can still express whether a name is a property name or a type name.

A tower description in Unotal for XML conversion

< &towerdescription
  name=[Miller tower]
  height-<<40 &meter>>
  >

Here the hyphen (to be pronounced as “is”) replaces the equals sign and at the same time gives the XML writer the hint that the property is to be written as an XML subelement instead of an XML attribute. In this way it can still be expressed that the name on the left-hand side of the hyphen is a property name and not a type name. So property names are written on the left-hand side of an equals sign or a hyphen, while type names are written as type names. Property names written with an equals sign are converted to XML attributes; property names written with a hyphen are converted to XML subelements with the property name as the element type name. The double angle brackets are required only if a type is specified, because the text then has to be translated into "<height><meter>40</meter></height>", i.e., into two elements, where the given property name is used as the type of the outer XML element.

A note on the implementation: unlike the equals sign, the hyphen used in this way has no special meaning in Unotal. So the above tower description in Unotal is a room with the type "towerdescription" and an attribute "name". It has three entries in its body: the entry "height", the entry "-", and the entry "<<40 &meter>>". The hyphen is interpreted by the XML writer: if the right-hand side is a room, the left-hand side becomes its type and the hyphen is removed, so the XML writer first converts the above tower description to the one shown next. The intermediate representation of the text "height-<<40 &meter>>" is thus the room "<&height<40 &meter>>", which is then converted to the XML element "<height><meter>40</meter></height>".

An intermediate representation of the room given above

< &towerdescription
  name=[Miller tower]
  < &height <40 &meter>>
  >
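The rewriting step that produces this intermediate representation might be sketched in Python as follows. The Room class and the fold_hyphens function are hypothetical names chosen for illustration, not the actual implementation. Two behaviours are assumptions that go beyond the rule stated above: a plain string on the right-hand side becomes the content of a new room (as suggested by entries such as "title - [Virtual Library]"), and a room that already has a type is wrapped instead of retyped.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Room:
    """Hypothetical in-memory model of a Unotal room."""
    type: Optional[str] = None
    attrs: dict = field(default_factory=dict)
    body: list = field(default_factory=list)   # strings and nested Rooms


def fold_hyphens(room: Room) -> Room:
    """Replace each body triple (name, "-", value) by one room typed `name`."""
    body = room.body
    folded = []
    i = 0
    while i < len(body):
        is_triple = (i + 2 < len(body)
                     and isinstance(body[i], str)
                     and body[i + 1] == "-")
        if is_triple:
            name, value = body[i], body[i + 2]
            if isinstance(value, Room):
                inner = fold_hyphens(value)
                if inner.type is None:
                    # Rule from the text: the left-hand side becomes the type.
                    folded.append(Room(name, inner.attrs, inner.body))
                else:
                    # Assumption: an already typed room is wrapped.
                    folded.append(Room(name, body=[inner]))
            else:
                # Assumption: a plain string becomes the room's content.
                folded.append(Room(name, body=[value]))
            i += 3
        else:
            entry = body[i]
            folded.append(fold_hyphens(entry) if isinstance(entry, Room) else entry)
            i += 1
    return Room(room.type, dict(room.attrs), folded)
```

Applied to a room with the type "towerdescription", the attribute "name", and the three body entries "height", "-", and the untyped room around "<40 &meter>", this sketch returns a room whose body holds the single room "<&height <40 &meter>>".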

This modified tower description can then be converted to XML in a straightforward way by writing Unotal attributes as XML attributes and Unotal rooms as XML elements.

A tower description in XML generated from the above Unotal description

<towerdescription name="Miller tower">
  <height><meter>40</meter></height>
</towerdescription>
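This last step (attributes to XML attributes, rooms to XML elements) can be sketched with Python's standard xml.etree.ElementTree module. The Room class and the room_to_element function are again hypothetical and only model the intermediate representation; the Room definition is repeated so that the sketch runs on its own. Unlike the listing above, the output is not pretty-printed.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Room:
    """Hypothetical in-memory model of a Unotal room."""
    type: Optional[str] = None
    attrs: dict = field(default_factory=dict)
    body: list = field(default_factory=list)   # strings and nested Rooms


def room_to_element(room: Room) -> ET.Element:
    """Write a typed room as an XML element, its attributes as XML attributes."""
    if room.type is None:
        raise ValueError("only a typed room can become an XML element")
    element = ET.Element(room.type, dict(room.attrs))
    last_child = None
    for entry in room.body:
        if isinstance(entry, Room):
            last_child = room_to_element(entry)
            element.append(last_child)
        elif last_child is None:
            # Text before the first child element.
            element.text = (element.text or "") + entry
        else:
            # Text following a child element is stored as that child's tail.
            last_child.tail = (last_child.tail or "") + entry
    return element


tower = Room("towerdescription", {"name": "Miller tower"},
             [Room("height", body=[Room("meter", body=["40"])])])
xml_text = ET.tostring(room_to_element(tower), encoding="unicode")
print(xml_text)
```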

Here is an example of an XHTML document written in Unotal. Some attributes that require special treatment by the converter have names beginning with a dot ".".

An XHTML document written in Unotal

< &xml

  .xmldecl = [version = "1.0" encoding="UTF-8"]

  .doctype = [html PUBLIC "-//W3C//DTD XHTML 1.1//EN" "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd"]

  html -

  < xmlns="http://www.w3.org/1999/xhtml" xml:lang="en"

    head - < title - [Virtual Library] >

    body - < &p [Moved to ]
      < &a href=[http://vlib.org/] [vlib.org] > [.] >>

  >

One can see that the head and body attributes of the html element can be written as what they are: as attributes of the html room. This gives a clear contrast, so that the paragraph type "p" and the anchor type "a" stand out as what they are: types of their rooms. The brackets make it obvious which spaces (blanks) are a significant part of the text and which are not: the spaces inside brackets are significant, the spaces outside brackets are not.

Also, the text "head - < title - [Virtual Library] >" (where the hyphen is pronounced “is”) seems to be quite readable and writable, compared with the text "<head><title>Virtual Library</title></head>".

The hyphen notation allows properties with the same name to be repeated and can retain the order of property definitions. Both can be important when creating XML documents. (The order of Unotal attributes is not significant and thus might get lost during processing.)

Multiple occurrences of properties with the same name and a specific order in Unotal

< &article
  keyword-<alpha>
  keyword-<beta>
  >
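Why repetition needs the hyphen notation can be shown with a small Python sketch. It assumes, for illustration only, that attributes are held in a mapping keyed by name while hyphen properties are held as an ordered list of entries; the XML shape built here is the one the conversion rule for hyphen properties would suggest, not output taken from the actual converter. A Python dict happens to keep insertion order, so the sketch demonstrates only the loss of repeated names, not the loss of order.

```python
import xml.etree.ElementTree as ET

# The two keyword properties of the article room, in the order written.
entries = [("keyword", "alpha"), ("keyword", "beta")]

# Held as attributes (a name -> value mapping), the second entry
# overwrites the first, so one keyword is lost.
as_attributes = dict(entries)

# Held as ordered body entries, both survive and become subelements.
article = ET.Element("article")
for name, value in entries:
    ET.SubElement(article, name).text = value

xml_text = ET.tostring(article, encoding="unicode")
print(as_attributes)
print(xml_text)
```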

Another Example

Another XML example

<web-resource-collection>

  <web-resource-name>User Section</web-resource-name>

  <description>no description</description>

  <url-pattern>/protected/*</url-pattern>

  <http-method>POST</http-method>

  <http-method>GET</http-method>

</web-resource-collection>

The preceding example written in Unotal

web-resource-collection -

< web-resource-name - [User Section]
  description       - [no description]
  url-pattern       - [/protected/*]
  http-method       - [POST]
  http-method       - [GET]
  >