< &rfc ipr=[none] docName=[draft-ram-unotal-01] front - < title - [Unotal - Syntax For Information] author - < initials = [S.L.] surname = [Ram] fullname = [Stefan L. Ram] organization - [Stefan L. Ram] address - < email - [ram@zedat.fu-berlin.de] uri - [http://www.purl.org/stefan_ram/]>> date - < month = [February] year = [2006] > area - [Applications] keyword - [text] keyword - [code] abstract - < < &t [This memo describes the notation "Unotal". ] [Unotal might be used to represent textual as well as structured information. ] [A Unotal standard interpretation can be used to interpret a Unotal room as an assertion. ] [These denotational semantics of Unotal, however, are not part of this specification, which only specifies the syntax of Unotal. ] [This specification does not contain a tutorial-like introduction or examples, which are available elsewhere. ] >>> middle - < < &section title=[Introduction] < &t [This specification intentionally contains little or no semantics, examples, explanations, or rationales. ] [These might be added as separate documents. ] [This document is intended to be a reference for the Unotal syntax. ] [It is not a tutorial and might not be the best text to read for a first introduction to Unotal. ] >> < &section title=[Status of productions and natural language text] < &t [A Unotal unit is a tuple of Unicode-characters encoded with UTF-8 ] [as described by the rules of this specification. ] [These rules cannot be expressed by EBNF-productions alone. ] [Additional restrictions and explanations given in the English text of this specification apply, too. ] > > < &section title=[Character Set and Encoding] < &t [The characters "%d" in the EBNF-productions precede a decimal number to specify the character with the code point given by the number. ] > < &artwork [ ::= %d0 - %d2097151. ::= %d10.
::= %d12. ::= %d13. ::= %d29. ::= %d32. ::= %d33. ::= %d35. ::= %d37. ::= %d38. ::= %d40. ::= %d41. ::= %d43. ::= %d44. ::= %d45. ::= %d46. ::= %d47. ::= %d58. ::= %d60. ::= %d61. ::= %d62. ::= %d91. ::= %d92. ::= %d93. ::= %d95. ::= %d123. ::= %d125. ::= %d126. ] > > < &section title=[Token Grammar] < &section title=[White Space Tokens] < &artwork [ ::= | | | . ::= {}. ]>> < &section title=[Free String Tokens] < &t [A sequence of at least one is considered to be a free string ] [if it is neither directly preceded nor directly followed by a free string character. ] > < &t [The upper-case names of the following productions refer to ] [the Unicode character categories of the same name. ] [A "", "", or "" is a character whose two-letter Unicode character category name starts with "L", "M", or "N", respectively. ] > < &artwork [ ::= | | | | | | | | | . ::= { }. ]>> < &section title=[Single Tokens] < &t [A single token is a character that cannot be a part of a free string. ] [So a single token is never merged with other characters to form a multi-character token, unless it is part of a bracketed string. ] > < &artwork [ ::= ( | | | | | | | | | | | ). ]>> < &section title=[Bracketed String Tokens] < &t [The reverse solidi "\" (read: "except") in the following productions precede symbols to be excluded from the set being described by the production. ] [For example, " \ \ " means any character except the left and the right square bracket. ] > < &t [A is a that is not contained in any other . ] > < &t [The text value of a is its , with the final vertical bar removed ] [if the text core ends in a sequence of vertical bars that is directly preceded by an . ] [This sequence of vertical bars needs to consist of at least one vertical bar. ] > < &artwork [ ::= \ \ \ \ . ::= ( | | ). ::= . ::= | | | . ::= {}. ::= . ::= . ]>> < &section title=[Tokens] < &t [Some of the token types of this section are not used in any other productions of this text, but are defined here for reference in other texts. 
] [The symbol denotes a token that is different from any other token but otherwise unspecified. ] > < &t [The reverse solidi in the following productions precede symbols to be excluded from the set being described by the production. ] > < &artwork [ ::= | | | . ::= \ \ . ::= | . ::= | . ]>>> < &section title=[Base Structure] < &section title=[Base Strings] < &t [Starting with this section, additional white space may be inserted between all symbols on the right-hand side of a production; it might also be inserted in front of or after any such symbol. ] [This white space is not shown explicitly in the productions. ] > < &t [White space must be used to separate two adjacent free strings, which otherwise would be regarded as one single free string. ] [This white space is also not shown explicitly in the productions. ] > < &artwork [ ::= | . ]>> < &section title=[Base Rooms] < &artwork [ ::= | . ::= {}. ::= . ]>> < &section title=[Base Expressions] < &artwork [ ::= | . ]>> > < &section title=[Extended Structure] < &t [The following sections add structure to the base structure. ] > < &t [For some applications, it might be appropriate to use the base structure only. ] > < &t [All of the following procedures are valid only directly within a room or outside of any room. I.e., they are not valid within a bracketed string. ] > < &t [All of the following procedures are to be applied in the order in which they are given here. ] > < &section title=[Concatenation] < &t [A is searched from left to right for . When a token is found that is a valid start of a concatenation, this concatenation must be extended as far as possible, that is, if a is followed by tokens so that a longer might be built, this has to be done. ] > < &t [For example, the room "<[a]~[b]~[c]>" contains one concatenation with three bracketed strings, not one followed by a tilde "~" and a bracketed string "[c]". ] > < &t [After this process the is viewed as a room containing symbols. ] > < &artwork [ ::= . ::= . ::= | . ::= | . ::= . 
]> < &t [For example, the "" contains three -symbols, namely "a", "[b]~[c]", and "[d]". The "[b]~[c]" consists of three s itself, but has to be interpreted as only one within its room. ] > > < &section title=[Assignments Process] < &t [To identify assignments within a room, first, the leftmost token sequence that is an assignment has to be identified. ] [The tokens of this assignment then are considered to be consumed for this assignment. ] [Then, the next leftmost assignment has to be searched for, ignoring all tokens that were already consumed, until no more assignments can be identified. ] > < &artwork [ ::= . ::= . ]> < &t [When analyzing Unotal text to identify a , the interpretation as a must not be chosen if another interpretation according to the preceding productions for is possible. ] > < &artwork [ ::= . ]>> < &section title=[Comments] < &t [To identify comments within a room, first, the leftmost token sequence that is a comment has to be identified. ] [The tokens of this comment then are considered to be consumed for this comment. ] [Then, the next leftmost comment has to be searched for, ignoring all tokens that were already consumed, until no more comments can be identified. ] > < &artwork [ ::= . ]> < &t [When analyzing Unotal text to identify a , the interpretation as a must not be chosen if another interpretation according to the preceding production for is possible. ] > < &artwork [ ::= . ]>> < &section title=[Namespaces] < &artwork [ ::= . ]> < &t [When analyzing Unotal text to identify a , the following interpretation as an must not be chosen if another interpretation according to the preceding production for is possible. ] > < &artwork [ ::= . ]> > < &section title=[Types] < &artwork [ ::= . ]> < &t [When analyzing Unotal text to identify a , the following interpretation as an must not be chosen if another interpretation according to the preceding production for is possible. ] > < &artwork [ ::= . ]> > < &section title=[Rooms] < &artwork [ ::= | . ::= {}. ::= . 
]>> < &section title=[Expressions] < &artwork [ ::= | . ]>> > > back - < < &section title=[Acknowledgements] < &t [The author gratefully acknowledges the use of RFC-2629 software that was written by M. Rose. ]>>>>