specification                                                     S. Ram
                                                           Stefan L. Ram
                                                       February 11, 2006


                    Unotal - Syntax For Information

Abstract

   This memo describes the notation "Unotal". Unotal might be used to
   represent textual as well as structured information. A Unotal
   standard interpretation can be used to interpret a Unotal room as an
   assertion. These denotational semantics of Unotal, however, are not
   part of this specification, which only specifies the syntax of
   Unotal.

   This specification does not contain a tutorial-like introduction or
   examples, which are available elsewhere.

Table of Contents

   1. Introduction
   2. Status of productions and natural language text
   3. Character Set and Encoding
   4. Token Grammar
      4.1 White Space Tokens
      4.2 Free String Tokens
      4.3 Single Tokens
      4.4 Bracketed String Tokens
      4.5 Tokens
   5. Base Structure
      5.1 Base Strings
      5.2 Base Rooms
      5.3 Base Expressions
   6. Extended Structure
      6.1 Concatenation
      6.2 Assignments Process
      6.3 Comments
      6.4 Namespaces
      6.5 Types
      6.6 Rooms
      6.7 Expressions
   Author's Address
   A. Acknowledgements

1. Introduction

   This specification intentionally contains little or no semantics,
   examples, explanations, or rationales. These might be added as
   separate documents. This document is intended to be a reference for
   the Unotal syntax. It is not a tutorial and might not be the best
   text to read as a first introduction to Unotal.

2. Status of productions and natural language text

   A Unotal unit is a tuple of Unicode characters, encoded with UTF-8,
   as described by the rules of this specification. These rules cannot
   be expressed by EBNF productions alone: the additional restrictions
   and explanations given in the English text of this specification
   apply, too.

3. Character Set and Encoding

   In the EBNF productions, the characters "%d" precede a decimal
   number to specify the character whose code point is given by that
   number.

      ::= %d0 - %d2097151.
      ::= %d10.
      ::= %d12.
      ::= %d13.
      ::= %d29.
      ::= %d32.
      ::= %d33.
      ::= %d35.
      ::= %d37.
      ::= %d38.
      ::= %d40.
      ::= %d41.
      ::= %d43.
      ::= %d44.
      ::= %d45.
      ::= %d46.
      ::= %d47.
      ::= %d58.
      ::= %d60.
      ::= %d61.
      ::= %d62.
      ::= %d91.
      ::= %d92.
      ::= %d93.
      ::= %d95.
      ::= %d123.
      ::= %d125.
      ::= %d126.

4. Token Grammar

4.1 White Space Tokens

      ::= | | | .
      ::= {}.

4.2 Free String Tokens

   A sequence of at least one is considered to be a free string if it
   is neither directly preceded nor directly followed by a free string
   character.

   The upper-case names of the following productions refer to the
   Unicode character categories of the same name. A "", "", or "" is a
   character whose two-letter Unicode character category name starts
   with "L", "M", or "N", respectively.

      ::= | | | | | | | | | .
      ::= { }.

4.3 Single Tokens

   A single token is a character that cannot be part of a free string.
   Hence, a single token is never merged with other characters to form
   a multi-character token, unless it is part of a bracketed string.

      ::= ( | | | | | | | | | | | ).

4.4 Bracketed String Tokens

   The reverse solidi "\" (read: "except") in the following productions
   precede symbols to be excluded from the set being described by the
   production. For example, " \ \ " means any character except the left
   and the right square bracket.

   A is a that is not contained in any other . The text value of a is
   its , with the final vertical bar removed if the text core ends in a
   sequence of one or more vertical bars that is directly preceded by
   an .

      ::= \ \ \ \ .
      ::= ( | | ).
      ::= .
      ::= | | | .
      ::= {}.
      ::= .
      ::= .

4.5 Tokens

   Some of the token types of this section are not used in any other
   productions of this text, but are defined here for reference in
   other texts.
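   The token classes of Sections 4.1 to 4.4 can be recognized in one
   left-to-right scan. The Python function below is a minimal sketch
   of such a scanner under stated assumptions; it is not part of this
   specification. Because several production names above are not
   recoverable here, it assumes that white space consists of LF, FF, CR
   and SPACE (%d10, %d12, %d13, %d32), that free-string characters are
   those of the Unicode categories L*, M* and N* plus a hypothetical
   (empty) EXTRA_FREE set, that every other character outside a
   bracketed string forms a single token, and that bracketed strings
   nest without any escape mechanism. The vertical-bar rule of Section
   4.4 is not modelled. The names tokenize, is_free_char and the token
   kinds are hypothetical.

```python
import unicodedata
from typing import List, Tuple

# Assumption: white space is LF, FF, CR and SPACE (%d10, %d12, %d13, %d32);
# the production names of Section 4.1 are not available here.
WHITE_SPACE = frozenset("\n\f\r ")

# Hypothetical: further free-string characters beyond categories L, M, N.
EXTRA_FREE: frozenset = frozenset()

Token = Tuple[str, str]  # (kind, lexeme)


def is_free_char(ch: str) -> bool:
    """True for letters, marks and numbers (Unicode category L*, M*, N*)."""
    return unicodedata.category(ch)[0] in "LMN" or ch in EXTRA_FREE


def tokenize(text: str) -> List[Token]:
    """Scan text into tokens of kind 'space', 'free', 'bracketed' or 'single'."""
    tokens: List[Token] = []
    i, n = 0, len(text)
    while i < n:
        ch = text[i]
        if ch in WHITE_SPACE:
            j = i
            while j < n and text[j] in WHITE_SPACE:
                j += 1
            tokens.append(("space", text[i:j]))
        elif is_free_char(ch):
            # Take the maximal run, so the free string is neither directly
            # preceded nor directly followed by a free-string character.
            j = i
            while j < n and is_free_char(text[j]):
                j += 1
            tokens.append(("free", text[i:j]))
        elif ch == "[":
            # Outermost bracketed string: count nesting depth so inner
            # brackets stay inside one token (no escapes are modelled).
            depth, j = 0, i
            while j < n:
                if text[j] == "[":
                    depth += 1
                elif text[j] == "]":
                    depth -= 1
                    if depth == 0:
                        break
                j += 1
            if j == n:
                raise ValueError(f"unclosed '[' at offset {i}")
            j += 1  # include the closing bracket
            tokens.append(("bracketed", text[i:j]))
        elif ch == "]":
            raise ValueError(f"unmatched ']' at offset {i}")
        else:
            # Any other character cannot belong to a free string and is
            # never merged with its neighbours: one character, one token.
            j = i + 1
            tokens.append(("single", ch))
        i = j
    return tokens


if __name__ == "__main__":
    print(tokenize("<a [b]~[c]>"))
    # [('single', '<'), ('free', 'a'), ('space', ' '), ('bracketed', '[b]'),
    #  ('single', '~'), ('bracketed', '[c]'), ('single', '>')]
```

   Scanning a bracketed string before classifying single characters is
   what keeps a single-token character such as "~" from being split out
   of "[a~b]", in line with Section 4.3.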
   The symbol denotes a token that is different from all other tokens
   but otherwise unspecified. The reverse solidi in the following
   productions precede symbols to be excluded from the set being
   described by the production.

      ::= | | | .
      ::= \ \ .
      ::= | .
      ::= | .

5. Base Structure

5.1 Base Strings

   Starting with this section, additional white space may be inserted
   between any two symbols on the right-hand side of a production; it
   may also be inserted before or after any such symbol. This white
   space is not shown explicitly in the productions.

   White space must be used to separate two adjacent free strings,
   which otherwise would be regarded as one single free string. This
   white space is not shown explicitly in the productions either.

      ::= | .

5.2 Base Rooms

      ::= | .
      ::= {}.
      ::= .

5.3 Base Expressions

      ::= | .

6. Extended Structure

   The following sections add structure to the base structure. For
   some applications, it might be appropriate to use the base
   structure only.

   All of the following procedures are valid only directly within a
   room or outside of any room; that is, they are not valid within a
   bracketed string. The procedures are to be applied in the order in
   which they are given here.

6.1 Concatenation

   A is searched from left to right for . When a token is found that
   is a valid start of a concatenation, this concatenation must be
   extended as far as possible; that is, if a is followed by tokens
   such that a longer can be built, the longer one must be built. For
   example, the room "<[a]~[b]~[c]>" contains one concatenation with
   three bracketed strings, not one followed by a tilde "~" and a
   bracketed string "[c]". After this process, the is viewed as a room
   containing symbols.

      ::= .
      ::= .
      ::= | .
      ::= | .
      ::= .

   For example, the "" contains three -symbols, namely "a",
   "[b]~[c]", and "[d]".
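   The left-to-right, extend-as-far-as-possible rule of Section 6.1
   amounts to a greedy (maximal-munch) grouping of the tokens of a
   room. The Python sketch below illustrates this under stated
   assumptions and is not part of this specification: the tokens of one
   room are assumed to be given as (kind, lexeme) pairs with white space
   already removed and without the enclosing "<" and ">", the
   concatenation operator is taken to be the tilde "~" as in the
   example "<[a]~[b]~[c]>", and both free and bracketed strings are
   assumed to be valid operands. Nested rooms are not handled. The
   names concatenate, lexemes and TILDE are hypothetical.

```python
from typing import List, Tuple

# Hypothetical token representation: (kind, lexeme), where kind is
# "free", "bracketed" or "single"; white space is assumed removed.
Token = Tuple[str, str]

# Assumption: the concatenation operator is the single token "~".
TILDE: Token = ("single", "~")


def is_string(token: Token) -> bool:
    # Assumption: free strings and bracketed strings may both be operands.
    return token[0] in ("free", "bracketed")


def concatenate(room: List[Token]) -> List[List[Token]]:
    """Group the tokens inside one room into symbols, scanning left to
    right; a run  string ~ string {~ string}  becomes a single symbol."""
    symbols: List[List[Token]] = []
    i, n = 0, len(room)
    while i < n:
        j = i + 1
        if is_string(room[i]):
            # Extend the concatenation as far as possible: keep taking
            # "~ string" pairs while they are available.
            while j + 1 < n and room[j] == TILDE and is_string(room[j + 1]):
                j += 2
        symbols.append(room[i:j])
        i = j
    return symbols


def lexemes(symbols: List[List[Token]]) -> List[str]:
    """Render each symbol back to source text for inspection."""
    return ["".join(lexeme for _, lexeme in symbol) for symbol in symbols]


if __name__ == "__main__":
    # Tokens of a room whose three symbols are "a", "[b]~[c]" and "[d]".
    room = [("free", "a"), ("bracketed", "[b]"), TILDE,
            ("bracketed", "[c]"), ("bracketed", "[d]")]
    print(lexemes(concatenate(room)))  # ['a', '[b]~[c]', '[d]']
```

   Because the inner loop only stops when no further "~ string" pair
   follows, the tokens of "<[a]~[b]~[c]>" yield one symbol rather than
   "[a]~[b]" followed by "~" and "[c]".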
   The "[b]~[c]" itself consists of three s, but has to be interpreted
   as only one within its room.

6.2 Assignments Process

   To identify assignments within a room, first, the leftmost token
   sequence that is an assignment has to be identified. The tokens of
   this assignment are then considered to be consumed by this
   assignment. Then, the next leftmost assignment has to be searched
   for, ignoring all tokens that have already been consumed; this is
   repeated until no more assignments can be identified.

      ::= .
      ::= .

   When analyzing Unotal text to identify a , the interpretation as a
   must not be chosen if another interpretation according to the
   preceding productions for is possible.

      ::= .

6.3 Comments

   To identify comments within a room, first, the leftmost token
   sequence that is a comment has to be identified. The tokens of this
   comment are then considered to be consumed by this comment. Then,
   the next leftmost comment has to be searched for, ignoring all
   tokens that have already been consumed; this is repeated until no
   more comments can be identified.

      ::= .

   When analyzing Unotal text to identify a , the interpretation as a
   must not be chosen if another interpretation according to the
   preceding production for is possible.

      ::= .

6.4 Namespaces

      ::= .

   When analyzing Unotal text to identify a , the following
   interpretation as an must not be chosen if another interpretation
   according to the preceding production for is possible.

      ::= .

6.5 Types

      ::= .

   When analyzing Unotal text to identify a , the following
   interpretation as an must not be chosen if another interpretation
   according to the preceding production for is possible.

      ::= .

6.6 Rooms

      ::= | .
      ::= {}.
      ::= .

6.7 Expressions

      ::= | .

Author's Address

   Stefan L. Ram
   Stefan L. Ram

   EMail: ram@zedat.fu-berlin.de
   URI:   http://www.purl.org/stefan_ram/

Appendix A. Acknowledgements

   The author gratefully acknowledges the use of RFC-2629 software that
   was written by M. Rose.