
A lesson on data representation with XML
https://www.purl.org/stefan_ram/pub/xml_data_representation_en (permalink) is the canonical URI of this page.
Stefan Ram

Representation of data with XML

Structures of Data

Data describe properties of some object. All data are of a certain data type. An assertion about an object has the form: “The property p of the object o has the value L (t).”, where t is a data type and L is a literal that is to be interpreted using the type t. An example of such an assertion is “The property height of the object “tower” has the value "40" (meter).” Here the type “meter” specifies how to interpret the literal "40", i.e., as “40 meters”. Other examples of data-type names include "int" and "float" in some programming languages.

A data language appropriate for representing the most fundamental assertions involving data should at least allow names of properties and types to be marked as such. For example, it might look as follows, using the convention of writing object names in front of braces, property names in front of equals signs, and data-type names in parentheses after literals.

Example
"towerdescription"
{ "height" = "40" ( "meter" );
"circumference" = "20" ( "meter" );
"name" = "Miller tower" ( "text" ); };

The name of a data type (such as "meter") may be used repeatedly within a description, while the name of a property (such as "height") can reasonably be used at most once: a tower can have only one height, but several dimensions measured in meters.
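The difference between unique property names and repeatable type names can be sketched in Python. This is only an illustrative model, not part of the notation above; the names `Value` and `tower_description` are assumptions introduced here.

```python
from typing import NamedTuple


class Value(NamedTuple):
    """A literal together with the name of the data type used to interpret it."""
    literal: str
    type_name: str


# Property names are dictionary keys, so each one can occur at most once.
# Type names are ordinary field values and may repeat freely.
tower_description = {
    "height": Value("40", "meter"),
    "circumference": Value("20", "meter"),
    "name": Value("Miller tower", "text"),
}

# Collect all properties whose values are measured in meters.
in_meters = [prop for prop, value in tower_description.items()
             if value.type_name == "meter"]
print(in_meters)
```

A dictionary is used because it enforces the "at most once" rule for property names by construction, while placing no such restriction on the type names stored in the values.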

To conclude this section, one can observe that there are property names, like "height" and "name", and data-type names, like "meter" and "text", and that property names are obviously something different from data-type names. Property names describe the rôle a value has within a description, while type names describe how to interpret a given representation (like a literal) of a value. For example, the data type of the height might be changed while its property name "height" and the meaning of the whole assertion are left unchanged.
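That a type name only governs the interpretation of a literal can be sketched in Python as follows. The conversion table `FACTOR_TO_MM` and the function `interpret_length_mm` are assumptions made for this illustration; the text itself defines no conversion rules.

```python
# Hypothetical conversion factors from each length type to millimeters.
FACTOR_TO_MM = {"meter": 1000, "mm": 1}


def interpret_length_mm(literal: str, type_name: str) -> int:
    """Interpret a literal using its data type; return the length in millimeters."""
    return int(literal) * FACTOR_TO_MM[type_name]


# The same height, written once in meters and once in millimeters.
height_in_meter = interpret_length_mm("40", "meter")
height_in_mm = interpret_length_mm("40000", "mm")
print(height_in_meter == height_in_mm)
```

The property name "height" does not appear in the interpretation at all: only the literal and the type name are needed to recover the value.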

Example 1
"towerdescription"
{ "height" = "40000" ( "mm" );
"name" = "Miller tower" ( "text" ); };

Representation of Data with XML

Now let's look at how well XML fulfills the fundamental requirements for the representation of assertions involving properties and data types.

There is one obvious starting point: property names map to attribute names and data-type names map to element-type names.

Properties in XML
<towerdescription
height="40"
name="Miller tower" />

Data types in XML
<meter>40</meter>

<text>Miller tower</text>

Mapping property names to attribute names is supported by the suggestive notation with the equals sign and by the fact that an attribute name may occur at most once within an element, which is exactly the natural property of property names described above.
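The "at most once" rule for attribute names can be observed with Python's standard `xml.etree.ElementTree` module. This is a minimal sketch; the helper `is_well_formed` and the second, deliberately broken document are assumptions added for illustration.

```python
import xml.etree.ElementTree as ET

element = ET.fromstring('<towerdescription height="40" name="Miller tower" />')
# Attributes are exposed as a mapping from attribute name to string value.
properties = dict(element.attrib)
print(properties)


def is_well_formed(document: str) -> bool:
    """Return True if the parser accepts the document, False otherwise."""
    try:
        ET.fromstring(document)
        return True
    except ET.ParseError:
        return False


# Repeating an attribute name within one element is rejected by the parser.
duplicate_ok = is_well_formed('<towerdescription height="40" height="41" />')
print(duplicate_ok)
```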

The association of data-type names with element-type names is in turn supported by the shared word “type”: an element, after all, describes a certain value (possibly a structured value), so that the type of this element is the type of that value in the sense of a data type.

Representation of properties and data types in XML
property names: XML attribute names

data-type names: XML element-type names

But one can see that it is not easy to combine the two approaches into one object description. The data type might be added to the text of the attribute values.

Typed Properties in XML
<towerdescription
height="40 (meter)"
name="Miller tower (text)" />

In this case, however, one is no longer using XML, but a new custom language inside the attribute values. The problem with XML that becomes visible is:

Attribute values cannot be structured with the means of XML.
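What this means in practice can be sketched in Python: the XML parser hands back each attribute value as one opaque string, so the "literal (type)" convention has to be taken apart by separate, hand-written code. The regular expression `TYPED_VALUE` is an assumption about how that ad-hoc convention might be read, not a rule defined anywhere in XML.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical pattern for the ad-hoc "literal (type)" convention.
TYPED_VALUE = re.compile(r"^(?P<literal>.*) \((?P<type>[^()]+)\)$")

element = ET.fromstring(
    '<towerdescription height="40 (meter)" name="Miller tower (text)" />'
)

typed = {}
for prop, raw in element.attrib.items():
    # The XML parser sees only a string; splitting it is left to this code.
    match = TYPED_VALUE.match(raw)
    if match is None:
        raise ValueError(f"not in 'literal (type)' form: {raw!r}")
    typed[prop] = (match.group("literal"), match.group("type"))

print(typed)
```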

Another approach would be to use subelements.

Typed Properties in XML, 1
<towerdescription>
<height><meter>40</meter></height>
<name><text>Miller tower</text></name>
</towerdescription>
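Reading this subelement form back can be sketched with Python's `xml.etree.ElementTree`. The sketch assumes that every property element contains exactly one type element, which holds for the listing above but is not enforced by XML itself.

```python
import xml.etree.ElementTree as ET

document = """
<towerdescription>
<height><meter>40</meter></height>
<name><text>Miller tower</text></name>
</towerdescription>
"""

root = ET.fromstring(document)

typed = {}
for prop_element in root:
    # By convention only: the outer element name is the property name
    # and the inner element name is the type name.
    type_element = prop_element[0]
    typed[prop_element.tag] = (type_element.text, type_element.tag)

print(typed)
```

Nothing in the document tells the program which element names are rôles and which are types; that knowledge lives only in the nesting depth assumed by the loop.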

With subelements the structure can be expressed, but the distinction between properties and data types gets blurred, because element-type names are used for both property names (rôles) and type names.

Finally, one might even decide to use an attribute for the name of the tower, because this does not need a type specification: if the type of the property "name" has to be "text", then the type information can be omitted after defining "text" to be the default type for this case. The type specification of the height has to be retained if several types (such as "meter" or "mm") are possible.

Typed Properties in XML, 2
<towerdescription name="Miller tower">
<height><meter>40</meter></height>
</towerdescription>

This looks a little shorter and makes better use of an attribute in one case, but it is actually even more disordered, because any regular correspondence between the actual structure of the data and the structure of the XML document (attributes and elements) is now lost: property names and data types are expressed irregularly, sometimes by attribute names and sometimes by element types.
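The irregularity shows up directly in code that reads such a document: it needs two separate paths for the same kind of information. The following Python sketch assumes "text" as the default type for attribute values, as suggested above; the name `DEFAULT_TYPE` is introduced here for illustration.

```python
import xml.etree.ElementTree as ET

DEFAULT_TYPE = "text"  # assumed default type for untyped attribute values

root = ET.fromstring(
    '<towerdescription name="Miller tower">'
    "<height><meter>40</meter></height>"
    "</towerdescription>"
)

typed = {}
# Path 1: properties written as attributes carry no type, so use the default.
for prop, literal in root.attrib.items():
    typed[prop] = (literal, DEFAULT_TYPE)
# Path 2: properties written as subelements wrap a single type element.
for prop_element in root:
    type_element = prop_element[0]
    typed[prop_element.tag] = (type_element.text, type_element.tag)

print(typed)
```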

Representation of properties and data types in XML, 1
property names: XML attribute names or XML element-type names

data-type names: XML element-type names (or omitted)

Because attribute values cannot be structured in XML, element-type names, which should specify the type of data, are abused to give the name of a property, i.e., the name of the rôle of the data within its container, which would better be specified with an attribute name. A distinction between attributes and subelement types by their meaning is not possible; instead, technical restrictions govern the choice, resulting in a disordered mixture of both.

Several other languages have been designed to allow structured attribute values and might be used instead of XML to avoid the problems described here.

A Hypothetical XML Variant

A hypothetical variant of XML might allow for structured attributes.

A hypothetical XML variant
< tower
height = <meterlength>40</meterlength>
name = "Miller tower"
/>

With such a variant of XML, types of data can always be written as element-type names, while rôles of data can always be written as attribute names.

The deep structure of the assertion made could easily be comprehended by looking at the visible surface structure of the XML element. It would be immediately visible what constitutes a type and what constitutes a relation.

In XHTML, the element-type name "head" and the element-type name "body" are intended to describe rôles of elements within an element with the type "html".

A hypothetical variant of XHTML
< html
head = ...
body = ...
/>

Multiple occurrences of a value with the rôle "head" would not make sense, and a specific order of these entries is not necessary (although required in XHTML). This shows that these two values are indeed candidates for attributes.

See Also

Measurement Units in XML Datatypes
Frank Olken and John McCarthy
http://pueblo.lbl.gov/~olken/mendel/w3c/xml.schema.wg/units/syntax.htm

Unotal
Unotal is an XML-like notation, but has structured attributes.
Stefan Ram
https://www.purl.org/stefan_ram/pub/unotal_en

Copyright 2004 Stefan Ram, Berlin. All rights reserved.