Simplification and restrictions are two topics on which I have been very evasive throughout this book. The reason is that they are pretty technical and have little direct, concrete impact when you're writing a Relax NG schema. Still, this book wouldn't be complete without describing them.
Why should we care about the simplification at all if it's so technical and looks like an implementation algorithm? To be honest, most of the time we don't have to care about it. The simplification can be seen as an intermediary step performed when a Relax NG processor reads a schema. During this step, all the syntactic sugar is removed, and the processor can then work with a perfectly normalized schema. On the other hand, the few restrictions that exist in Relax NG are formalized relative to this normalized version of the schema: because of the flexibility of Relax NG, formalizing them on schemas before simplification would be very complex and very difficult to read. The downside is that when you hit one of these restrictions, you often need to understand the main principles of the simplification process to understand what's happening. The good news is that it doesn't happen often!
!!!Simplification
Throughout its design, Relax NG has tried to keep a balance between simplicity of use, simplicity of implementation and simplicity of its data model. What's simple to implement is often simple to use, but many features that are very useful to users add complexity for implementers and clutter the data model. This is the case, for instance, for all the features designed to create building blocks (named patterns, includes, embedded grammars). They are very useful to users, but whether you've used named patterns or a Russian doll style has no impact on the validation itself. This is also the case for shorthands such as the mixed pattern, which is just a more concise way of writing an interleave pattern with an embedded text pattern.
The quest for simplicity has had a huge influence on the design of Relax NG. Here is James Clark's view on the subject:
"Simplicity of specification often goes hand in hand with simplicity of use. But I find that these are often in conflict with simplicity of implementation. An example would be ambiguity restrictions as in XSD: these make implementation simpler (well, at least for people who don't want to learn a new algorithm) but make specification and use more complex. In general, RELAX NG aims for implementation to be practical and safe (i.e. implementations shouldn't use huge amounts of time/memory for particular schemas/instances), but apart from that favors simplicity of specification/use over simplicity of implementation."
To keep the description of the restrictions and of the validation algorithm simple while still offering these useful features to users, Relax NG describes validation against a Relax NG schema as a two-step process:
* First, the schema is read and simplified. The purpose of the simplification is to remove all the additional complexity of the "syntactic sugar" and to reduce the schema to its simplest equivalent form.
* Then, instance documents are validated against the simplified schema. Since all the syntactic sugar has been removed from the simplified schema, the description of the validation does not need to take it into account, which leads to much simpler algorithms.
The simplification is described for each Relax NG element in the reference manual. We won't go deep into its details here, but will just give the main points. If you don't want too much detail, let's just say that the simplification:
* removes all the syntactic sugar,
* consolidates all the external schemas,
* restricts the schema to a subset of the available Relax NG elements, and
* transforms the resulting structure into a flat schema where each element is embedded in a named pattern and each named pattern contains the definition of a single element.
The Relax NG specification is very clear on two points: Relax NG processors perform this simplification on the data model that results from reading the complete schema, and the result of the simplification doesn't have to be serialized as XML. However, I think that showing intermediary results as XML helps to visualize the simplification process. (Note that these intermediary results are indented for readability, even though whitespace-only text nodes are removed in one of the first steps of the simplification.)
The XML syntax is closer than the compact syntax to the data model used to describe the simplification, so the details of the simplification are shown below on XML snippets. For each sequence of steps, I also give the whole schema in the compact syntax, to provide a better overall view of the impact on the structure of the schema. (Note that some effects of the simplification are simply not visible in the compact syntax.)
The schema used in this chapter consolidates features seen throughout this book, so as to cover most of the elements affected by the simplification. It is composed of four documents:
* library.rnc (or .rng):
namespace a = "http://relaxng.org/ns/compatibility/annotations/1.0"
namespace hr = "http://eric.van-der-vlist.com/ns/person"
namespace local = ""
default namespace ns1 = "http://eric.van-der-vlist.com/ns/library"
namespace sn = "http://www.snee.com/ns/stages"
a:documentation [[ "Relax NG schema for our library" ]
sn:stages [[
sn:stage [[ name = "library" ]
sn:stage [[ name = "book" ]
sn:stage [[ name = "author" ]
sn:stage [[ name = "character" ]
sn:stage [[ name = "author-or-book" ]
]
start =
[[ sn:stages = "library" ] element library { book-element+ }
| [[ sn:stages = "book author-or-book" ] book-element
| [[ sn:stages = "author author-or-book" ] author-element
| [[ sn:stages = "character" ] character-element
include "foreign.rnc" {
foreign-elements = element * - (local:* | ns1:* | hr:*) { anything }*
foreign-attributes = attribute * - (local:* | ns1:* | hr:*) { text }*
}
author-element =
element hr:author {
attribute id {
xsd:NMTOKEN { maxLength = " 16 " }
},
name-element,
born-element,
died-element?
}
include "book-content.rnc"
book-content &= foreign-nodes
book-element = element book { book-content }
born-element = element hr:born { xsd:date }
character-element = external "character-element.rnc"
died-element = element hr:died { xsd:date }
isbn-element = element isbn { foreign-attributes, token }
name-element = element hr:name { xsd:token }
qualification-element = element qualification { text }
title-element = element title { foreign-attributes, text }
available-content = "true" | xsd:token " false " | " "
* book-content.rnc (or .rng):
book-content =
attribute id { text },
attribute available { available-content },
isbn-element,
title-element,
author-element*,
character-element*
* foreign.rnc (or .rng):
anything =
(element * { anything }
| attribute * { text }
| text)*
foreign-elements = element * { anything }*
foreign-attributes = attribute * { text }*
foreign-nodes = (foreign-attributes | foreign-elements)*
* character-element.rnc (or .rng):
start =
element character {
attribute id { text },
parent name-element,
parent born-element,
parent qualification-element
}
- !!Whitespace and attribute normalization and inheritance
The first sequence of simplification steps performs various normalizations without changing the structure of the schema:
- * Annotations (i.e. attributes and elements from foreign namespaces) are removed.
* Text nodes containing only whitespace are removed, except in "value" and "param" elements. Leading and trailing whitespace is removed from "name", "type" and "combine" attributes and from the content of "name" elements.
* Characters that are not allowed in "datatypeLibrary" attributes are escaped. These attributes are transferred through inheritance to each "data" and "value" pattern and removed from all the other elements.
* "value" patterns without a "type" attribute get the "token" datatype from the built-in datatype library (type="token" and datatypeLibrary="").
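To make the removal of annotations and whitespace more concrete, here is a minimal Python sketch that uses the standard xml.etree.ElementTree module on the XML syntax. The function names are hypothetical. The sketch assumes that every namespace-qualified attribute is an annotation, so it would also drop attributes such as xml:base, which a real processor has to honor.

```python
import xml.etree.ElementTree as ET

RNG = "http://relaxng.org/ns/structure/1.0"

def local_name(elem):
    # ElementTree stores qualified names as "{namespace-uri}local-name".
    return elem.tag.split("}", 1)[1] if elem.tag.startswith("{") else elem.tag

def strip_annotations_and_whitespace(elem):
    """Remove annotations and insignificant whitespace from an RNG tree, in place."""
    # Namespace-qualified attributes are treated as foreign (annotations).
    for attr in [a for a in elem.attrib if a.startswith("{")]:
        del elem.attrib[attr]
    # Leading and trailing whitespace is not significant in these attributes.
    for attr in ("name", "type", "combine"):
        if attr in elem.attrib:
            elem.set(attr, elem.get(attr).strip())
    # Child elements outside the RELAX NG namespace are annotations.
    for child in list(elem):
        if not child.tag.startswith("{%s}" % RNG):
            elem.remove(child)
    kind = local_name(elem)
    if kind == "name" and elem.text:
        elem.text = elem.text.strip()
    elif kind not in ("value", "param"):
        # Whitespace-only text nodes are dropped everywhere else.
        if elem.text is not None and not elem.text.strip():
            elem.text = None
        for child in elem:
            if child.tail is not None and not child.tail.strip():
                child.tail = None
    for child in elem:
        strip_annotations_and_whitespace(child)
    return elem
```

Note that the content of "value" and "param" is left untouched, which is why " 16 " and " false " keep their spaces in the results below.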
- After this sequence of steps, the structure of the schema is still unchanged, but all the cosmetic features which have no impact on the schema have been removed. For instance, the following schema snippet:
- <?xml version="1.0" encoding="utf-8"?>
- <grammar xmlns="http://relaxng.org/ns/structure/1.0"
- xmlns:hr="http://eric.van-der-vlist.com/ns/person"
- ns="http://eric.van-der-vlist.com/ns/library"
- xmlns:a="http://relaxng.org/ns/compatibility/annotations/1.0"
- xmlns:sn="http://www.snee.com/ns/stages"
- datatypeLibrary="http://www.w3.org/2001/XMLSchema-datatypes">
- <a:documentation>Relax NG schema for our library</a:documentation>
- <sn:stages>
- <sn:stage name="library"/>
- <sn:stage name="book"/>
- <sn:stage name="author"/>
- <sn:stage name="character"/>
- <sn:stage name="author-or-book"/>
- </sn:stages>
- <start>
- <choice>
- <element name=" library " sn:stages="library">
- <oneOrMore>
- <ref name="book-element"/>
- </oneOrMore>
- </element>
- <ref name="book-element" sn:stages="book author-or-book"/>
- <ref name="author-element" sn:stages="author author-or-book"/>
- <ref name="character-element" sn:stages="character"/>
- </choice>
- </start>
- .../...
- <define name=" author-element ">
- <element name="hr:author" datatypeLibrary="">
- <attribute name="id" datatypeLibrary="http://www.w3.org/2001/XMLSchema-datatypes">
- <data type="NMTOKEN">
- <param name="maxLength"> 16 </param>
- </data>
- </attribute>
- <ref name=" name-element"/>
- <ref name="born-element"/>
- <optional>
- <ref name="died-element"/>
- </optional>
- </element>
- </define>
- .../...
- <define name="available-content">
- <choice>
- <value>true</value>
- <value type="token"> false </value>
- <value> </value>
- </choice>
- </define>
- </grammar>
will be transformed during this sequence of steps into the following. (Note that I still show whitespace for readability, even though whitespace-only text nodes have been removed.)
- <?xml version="1.0"?>
- <grammar xmlns="http://relaxng.org/ns/structure/1.0"
- xmlns:hr="http://eric.van-der-vlist.com/ns/person"
- xmlns:a="http://relaxng.org/ns/compatibility/annotations/1.0"
- xmlns:sn="http://www.snee.com/ns/stages"
- ns="http://eric.van-der-vlist.com/ns/library">
- <start>
- <choice>
- <element name="library">
- <oneOrMore>
- <ref name="book-element"/>
- </oneOrMore>
- </element>
- <ref name="book-element"/>
- <ref name="author-element"/>
- <ref name="character-element"/>
- </choice>
- </start>
- .../...
- <define name="author-element">
- <element name="hr:author">
- <attribute name="id">
- <data datatypeLibrary="http://www.w3.org/2001/XMLSchema-datatypes" type="NMTOKEN">
- <param name="maxLength"> 16 </param>
- </data>
- </attribute>
- <ref name="name-element"/>
- <ref name="born-element"/>
- <optional>
- <ref name="died-element"/>
- </optional>
- </element>
- </define>
- .../...
- <define name="available-content">
- <choice>
- <value type="token" datatypeLibrary="">true</value>
- <value datatypeLibrary="http://www.w3.org/2001/XMLSchema-datatypes" type="token"> false </value>
- <value type="token" datatypeLibrary=""> </value>
- </choice>
- </define>
- </grammar>
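The "datatypeLibrary" inheritance and the defaulting of "value" types visible in this result can be sketched in the same way. This sketch assumes that annotations and whitespace have already been handled. It leaves out the escaping of disallowed characters, and inherit_datatype_library is a hypothetical name.

```python
import xml.etree.ElementTree as ET

RNG = "http://relaxng.org/ns/structure/1.0"

def inherit_datatype_library(elem, inherited=""):
    """Move datatypeLibrary onto data/value patterns and default value types."""
    # The nearest ancestor-or-self declaration applies; the default is "".
    current = elem.get("datatypeLibrary", inherited)
    kind = elem.tag.split("}", 1)[1]
    if kind in ("data", "value"):
        elem.set("datatypeLibrary", current)
        if kind == "value" and "type" not in elem.attrib:
            # An untyped value uses "token" from the built-in library.
            elem.set("type", "token")
            elem.set("datatypeLibrary", "")
    else:
        # Every other element loses its datatypeLibrary attribute.
        elem.attrib.pop("datatypeLibrary", None)
    for child in elem:
        inherit_datatype_library(child, current)
    return elem
```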
- After this sequence of steps, our schema is:
- namespace a = "http://relaxng.org/ns/compatibility/annotations/1.0"
- namespace hr = "http://eric.van-der-vlist.com/ns/person"
- namespace local = ""
- default namespace ns1 = "http://eric.van-der-vlist.com/ns/library"
- namespace sn = "http://www.snee.com/ns/stages"
- start =
- element library { book-element+ }
- | book-element
- | author-element
- | character-element
- include "foreign.rnc" {
- foreign-elements = element * - (local:* | ns1:* | hr:*) { anything }*
- foreign-attributes = attribute * - (local:* | ns1:* | hr:*) { text }*
- }
- author-element =
- element hr:author {
- attribute id {
- xsd:NMTOKEN { maxLength = " 16 " }
- },
- name-element,
- born-element,
- died-element?
- }
- include "book-content.rnc"
- book-content &= foreign-nodes
- book-element = element book { book-content }
- born-element = element hr:born { xsd:date }
- character-element = external "character-element.rnc"
- died-element = element hr:died { xsd:date }
- isbn-element = element isbn { foreign-attributes, token }
- name-element = element hr:name { xsd:token }
- qualification-element = element qualification { text }
- title-element = element title { foreign-attributes, text }
- available-content = "true" | xsd:token " false " | " "
- !!Retrieval of external schemas
- This second sequence of steps reads and processes "externalRef" and "include" patterns:
- *"externalRef" patterns are replaced by the content of the resource referenced by their "href" attributes and all the simplification steps up to this one must be recursively applied during this replacement to make sure that we merge schemas at the same level of simplification.
- *The schemas referenced by "include" patterns are read and all the simplification steps up to this are recursively are applied on these schemas. Their definitions are overridden by those found in the "include" pattern itself when it is the case and the content of their grammar is added in a new "div" pattern to the current schema. This "div" pattern is needed to carry namespace information to the next sequence of steps and will be remove in the following one.
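Here is a minimal sketch of the "externalRef" replacement, again with ElementTree. File access is abstracted behind load, a hypothetical callback that returns the root element of the referenced schema, already simplified up to this step. Loop detection and base URI resolution are left out.

```python
import xml.etree.ElementTree as ET

RNG = "http://relaxng.org/ns/structure/1.0"
EXTERNAL_REF = "{%s}externalRef" % RNG

def resolve_external_refs(elem, load):
    """Replace externalRef descendants of elem by the schemas they reference.

    load(href) is a hypothetical callback returning the root element of the
    referenced schema, already simplified up to this step.
    """
    for i, child in enumerate(list(elem)):
        if child.tag == EXTERNAL_REF:
            target = load(child.get("href"))
            # The ns attribute of externalRef is copied if the target has none.
            if "ns" in child.attrib and "ns" not in target.attrib:
                target.set("ns", child.get("ns"))
            resolve_external_refs(target, load)  # nested references
            elem[i] = target
        else:
            resolve_external_refs(child, load)
    return elem
```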
- After this sequence of steps, we obtain a standalone schema without any reference to external documents.
- The following snippet:
- <define name="character-element">
- <externalRef href="character-element.rng" ns="http://eric.van-der-vlist.com/ns/library"/>
- </define>
- will be transformed into:
- <define name="character-element">
- <grammar ns="http://eric.van-der-vlist.com/ns/library">
- <start>
- <element name="character">
- <attribute name="id"/>
- <parentRef name="name-element"/>
- <parentRef name="born-element"/>
- <parentRef name="qualification-element"/>
- </element>
- </start>
- </grammar>
- </define>
- And the snippet:
- <include href="foreign.rng">
- <define name="foreign-elements">
- <zeroOrMore>
- <element>
- <anyName>
- <except>
- <nsName ns=""/>
- <nsName ns="http://eric.van-der-vlist.com/ns/library"/>
- <nsName ns="http://eric.van-der-vlist.com/ns/person"/>
- </except>
- </anyName>
- <ref name="anything"/>
- </element>
- </zeroOrMore>
- </define>
- <define name="foreign-attributes">
- <zeroOrMore>
- <attribute>
- <anyName>
- <except>
- <nsName ns=""/>
- <nsName ns="http://eric.van-der-vlist.com/ns/library"/>
- <nsName ns="http://eric.van-der-vlist.com/ns/person"/>
- </except>
- </anyName>
- </attribute>
- </zeroOrMore>
- </define>
- </include>
- is replaced by:
- <div>
- <define name="foreign-elements">
- <zeroOrMore>
- <element>
- <anyName>
- <except>
- <nsName ns=""/>
- <nsName ns="http://eric.van-der-vlist.com/ns/library"/>
- <nsName ns="http://eric.van-der-vlist.com/ns/person"/>
- </except>
- </anyName>
- <ref name="anything"/>
- </element>
- </zeroOrMore>
- </define>
- <define name="foreign-attributes">
- <zeroOrMore>
- <attribute>
- <anyName>
- <except>
- <nsName ns=""/>
- <nsName ns="http://eric.van-der-vlist.com/ns/library"/>
- <nsName ns="http://eric.van-der-vlist.com/ns/person"/>
- </except>
- </anyName>
- </attribute>
- </zeroOrMore>
- </define>
- <define name="anything">
- <zeroOrMore>
- <choice>
- <element>
- <anyName/>
- <ref name="anything"/>
- </element>
- <attribute>
- <anyName/>
- </attribute>
- <text/>
- </choice>
- </zeroOrMore>
- </define>
- <define name="foreign-nodes">
- <zeroOrMore>
- <choice>
- <ref name="foreign-attributes"/>
- <ref name="foreign-elements"/>
- </choice>
- </zeroOrMore>
- </define>
- </div>
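The transformation of an "include" into a "div" can be sketched as below, following the rules summarized above as I read them:
* the included grammar is renamed to "div";
* the definitions it shares with the "include" are dropped;
* the result is wrapped, together with the overriding definitions, in a second "div" that keeps the attributes of the "include" (such as "ns") except "href".

This nesting differs slightly from the single "div" presented above, but it has the same meaning once "div" elements are removed. load is a hypothetical callback. The sketch omits the check that each overriding definition actually exists in the included grammar.

```python
import xml.etree.ElementTree as ET

RNG = "http://relaxng.org/ns/structure/1.0"

def tag(local):
    return "{%s}%s" % (RNG, local)

def components(container):
    """Yield grammar components, looking through div wrappers."""
    for child in container:
        if child.tag == tag("div"):
            yield from components(child)
        else:
            yield child

def drop_overridden(container, names, drop_start):
    """Remove the defines named in `names` (and start if asked) from a grammar."""
    for child in list(container):
        if child.tag == tag("div"):
            drop_overridden(child, names, drop_start)
        elif (child.tag == tag("define") and child.get("name") in names) or (
                child.tag == tag("start") and drop_start):
            container.remove(child)

def include_to_div(include, load):
    """Turn an include element into a div; load(href) is a hypothetical loader."""
    grammar = load(include.get("href"))
    own = list(components(include))
    overridden = {c.get("name") for c in own if c.tag == tag("define")}
    drop_overridden(grammar, overridden, any(c.tag == tag("start") for c in own))
    grammar.tag = tag("div")  # the included grammar becomes a div
    attrs = {k: v for k, v in include.attrib.items() if k != "href"}
    div = ET.Element(tag("div"), attrs)
    div.append(grammar)
    div.extend(list(include))  # followed by the overriding components
    return div
```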
- The schema after this sequence of steps is:
- namespace a = "http://relaxng.org/ns/compatibility/annotations/1.0"
- namespace hr = "http://eric.van-der-vlist.com/ns/person"
- namespace local = ""
- default namespace ns1 = "http://eric.van-der-vlist.com/ns/library"
- namespace sn = "http://www.snee.com/ns/stages"
- start =
- element library { book-element+ }
- | book-element
- | author-element
- | character-element
- div {
- foreign-elements = element * - (local:* | ns1:* | hr:*) { anything }*
- foreign-attributes = attribute * - (local:* | ns1:* | hr:*) { text }*
- anything =
- (element * { anything }
- | attribute * { text }
- | text)*
- foreign-nodes = (foreign-attributes | foreign-elements)*
- }
- author-element =
- element hr:author {
- attribute id {
- xsd:NMTOKEN { maxLength = " 16 " }
- },
- name-element,
- born-element,
- died-element?
- }
- div {
- book-content =
- attribute id { text },
- attribute available { available-content },
- isbn-element,
- title-element,
- author-element*,
- character-element*
- }
- book-content &= foreign-nodes
- book-element = element book { book-content }
- born-element = element hr:born { xsd:date }
- character-element =
- grammar {
- start =
- element character {
- attribute id { text },
- parent name-element,
- parent born-element,
- parent qualification-element
- }
- }
- died-element = element hr:died { xsd:date }
- isbn-element = element isbn { foreign-attributes, token }
- name-element = element hr:name { xsd:token }
- qualification-element = element qualification { text }
- title-element = element title { foreign-attributes, text }
- available-content = "true" | xsd:token " false " | " "
!!Name class normalization
- This third sequence of steps performs the normalization of name classes:
* The "name" attributes of "element" and "attribute" patterns are replaced by "name" elements, i.e., name classes matching only that single name.
* "ns" attributes are transferred through inheritance to the elements that need them, i.e., "name", "nsName" and "value" ("value" patterns need this attribute to support QName datatypes). They are removed from all the other elements. Note that the "ns" attribute behaves like the default namespace in XML and doesn't apply to attribute names. When an "attribute" pattern has no "ns" attribute, the "name" element created from its "name" attribute gets an empty "ns" attribute, i.e., no namespace URI.
* QNames (qualified names) used in "name" elements are replaced by their local parts, and the "ns" attribute of these elements is set to the namespace URI bound to their prefix.
After this sequence of steps, name classes are almost normalized (the "except" and "choice" name classes will be normalized in the next sequence of steps).
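These three rules can be sketched as a single recursive pass. ElementTree does not keep track of in-scope namespace declarations, so the sketch takes an explicit prefixes dictionary to resolve QNames. That dictionary is an assumption of this illustration, and normalize_names is a hypothetical name.

```python
import xml.etree.ElementTree as ET

RNG = "http://relaxng.org/ns/structure/1.0"
XML_NS = "http://www.w3.org/XML/1998/namespace"

def tag(local):
    return "{%s}%s" % (RNG, local)

def normalize_names(elem, prefixes, ns=""):
    """Normalize name classes in place.

    prefixes is an assumed prefix -> namespace URI map supplied by the caller.
    """
    ns = elem.get("ns", ns)  # inherited like a default namespace
    kind = elem.tag.split("}", 1)[1]
    if kind in ("element", "attribute") and "name" in elem.attrib:
        name = ET.Element(tag("name"))
        name.text = elem.attrib.pop("name")
        if kind == "attribute" and "ns" not in elem.attrib:
            name.set("ns", "")  # unqualified attribute names have no namespace
        elem.insert(0, name)
    if kind in ("name", "nsName", "value") and "ns" not in elem.attrib:
        elem.set("ns", ns)
    if kind == "name" and elem.text and ":" in elem.text:
        prefix, elem.text = elem.text.split(":", 1)
        elem.set("ns", XML_NS if prefix == "xml" else prefixes[prefix])
    for child in elem:
        normalize_names(child, prefixes, ns)
    if kind not in ("name", "nsName", "value"):
        elem.attrib.pop("ns", None)  # ns is only kept where it is needed
    return elem
```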
- During this sequence of steps, the snippet:
- <element name="hr:author">
- <attribute name="id">
- <data datatypeLibrary="http://www.w3.org/2001/XMLSchema-datatypes" type="NMTOKEN">
- <param name="maxLength"> 16 </param>
- </data>
- </attribute>
- <ref name="name-element"/>
- <ref name="born-element"/>
- <optional>
- <ref name="died-element"/>
- </optional>
- </element>
- is transformed into:
- <element>
- <name ns="http://eric.van-der-vlist.com/ns/person">author</name>
- <attribute>
- <name ns="">id</name>
- <data datatypeLibrary="http://www.w3.org/2001/XMLSchema-datatypes" type="NMTOKEN">
- <param name="maxLength"> 16 </param>
- </data>
- </attribute>
- <ref name="name-element"/>
- <ref name="born-element"/>
- <optional>
- <ref name="died-element"/>
- </optional>
- </element>
Note that none of these modifications are visible in the compact syntax. The compact syntax requires all namespace declarations to appear in the declaration section of the schema, and it makes no distinction between "name" elements and "name" attributes.
- !!Pattern normalization
In this fourth sequence of steps, patterns are normalized:
* "div" elements are replaced by their children.
* "define", "oneOrMore", "zeroOrMore", "optional", "list" and "mixed" patterns are transformed to have exactly one child pattern. If they have more than one child pattern, these patterns are wrapped in a "group" pattern.
* "element" patterns follow a similar rule and are transformed to have exactly one name class and one child pattern.
* "except" patterns and name classes are also transformed to have exactly one child. Because they have different semantics, their children are wrapped in a "choice" element rather than a "group".
* If an "attribute" pattern has no child pattern, a "text" pattern is added.
* "group" and "interleave" patterns and "choice" patterns and name classes are recursively transformed to have exactly two child elements. If such an element has only one child, it is replaced by that child. If it has more than two children, its first two children are repeatedly combined into a new element of the same kind until exactly two children remain.
* "mixed" patterns are transformed into "interleave" patterns between their unique child pattern and a "text" pattern.
* "optional" patterns are transformed into "choice" patterns between their unique child pattern and an "empty" pattern.
* "zeroOrMore" patterns are transformed into "choice" patterns between a "oneOrMore" pattern containing their unique child pattern and an "empty" pattern.
After this sequence of steps, the number of different types of patterns has been reduced to a set of "primitive" patterns, and all the remaining patterns have a fixed number of child elements.
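The rule that gives "group", "interleave" and "choice" exactly two children is essentially a left fold. Here is a minimal Python sketch with ElementTree (binarize is a hypothetical name). It returns the possibly replaced element, because a wrapper with a single child disappears.

```python
import xml.etree.ElementTree as ET

RNG = "http://relaxng.org/ns/structure/1.0"

def tag(local):
    return "{%s}%s" % (RNG, local)

BINARY = {tag("group"), tag("interleave"), tag("choice")}

def binarize(elem):
    """Give group/interleave/choice exactly two children; returns the result."""
    for i, child in enumerate(list(elem)):
        elem[i] = binarize(child)  # a lone-child wrapper is replaced by its child
    if elem.tag in BINARY:
        children = list(elem)
        if len(children) == 1:
            return children[0]
        # Fold from the left: (a, b, c, d) becomes (((a, b), c), d).
        while len(children) > 2:
            pair = ET.Element(elem.tag)
            pair.extend(children[:2])
            children = [pair] + children[2:]
        elem[:] = children
    return elem
```

The left-nested parentheses in the compact schemas of this section, such as "((a | b) | c) | d" for the "start" pattern, are exactly what this fold produces.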
- During this sequence of steps, the snippet:
- <define name="foreign-elements">
- <zeroOrMore>
- <element>
- <anyName>
- <except>
- <nsName ns=""/>
- <nsName ns="http://eric.van-der-vlist.com/ns/library"/>
- <nsName ns="http://eric.van-der-vlist.com/ns/person"/>
- </except>
- </anyName>
- <ref name="anything"/>
- </element>
- </zeroOrMore>
- </define>
- is transformed into:
- <define name="foreign-elements">
- <choice>
- <oneOrMore>
- <element>
- <anyName>
- <except>
- <choice>
- <choice>
- <nsName ns=""/>
- <nsName ns="http://eric.van-der-vlist.com/ns/library"/>
- </choice>
- <nsName ns="http://eric.van-der-vlist.com/ns/person"/>
- </choice>
- </except>
- </anyName>
- <ref name="anything"/>
- </element>
- </oneOrMore>
- <empty/>
- </choice>
- </define>
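The rewriting of "optional", "zeroOrMore" and "mixed" seen in this example can be sketched as follows. The sketch assumes, as the previous rules guarantee, that each of these patterns already has exactly one child. expand_shorthands is a hypothetical name.

```python
import xml.etree.ElementTree as ET

RNG = "http://relaxng.org/ns/structure/1.0"

def tag(local):
    return "{%s}%s" % (RNG, local)

def expand_shorthands(elem):
    """Rewrite optional, zeroOrMore and mixed patterns in place."""
    for child in elem:
        expand_shorthands(child)
    kind = elem.tag.split("}", 1)[1]
    if kind in ("optional", "zeroOrMore", "mixed"):
        (only,) = list(elem)  # earlier steps left exactly one child
        elem.remove(only)
        if kind == "mixed":
            # mixed p  ->  interleave(p, text)
            elem.tag = tag("interleave")
            elem.extend([only, ET.Element(tag("text"))])
        else:
            if kind == "zeroOrMore":
                # zeroOrMore p  ->  choice(oneOrMore(p), empty)
                wrapper = ET.Element(tag("oneOrMore"))
                wrapper.append(only)
                only = wrapper
            # optional p  ->  choice(p, empty)
            elem.tag = tag("choice")
            elem.extend([only, ET.Element(tag("empty"))])
    return elem
```

Rewriting the element in place, rather than building a new one, keeps its position in the parent without any bookkeeping.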
- After this sequence of steps, our schema is:
- namespace a = "http://relaxng.org/ns/compatibility/annotations/1.0"
- namespace hr = "http://eric.van-der-vlist.com/ns/person"
- namespace local = ""
- default namespace ns1 = "http://eric.van-der-vlist.com/ns/library"
- namespace sn = "http://www.snee.com/ns/stages"
- start =
- ((element library { book-element+ }
- | book-element)
- | author-element)
- | character-element
- foreign-elements =
- element * - ((local:* | ns1:*) | hr:*) { anything }+
- | empty
- foreign-attributes =
- attribute * - ((local:* | ns1:*) | hr:*) { text }+
- | empty
- anything =
- ((element * { anything }
- | attribute * { text })
- | text)+
- | empty
- foreign-nodes = (foreign-attributes | foreign-elements)+ | empty
- author-element =
- element hr:author {
- ((attribute id {
- xsd:NMTOKEN { maxLength = " 16 " }
- },
- name-element),
- born-element),
- (died-element | empty)
- }
- book-content =
- ((((attribute id { text },
- attribute available { available-content }),
- isbn-element),
- title-element),
- (author-element+ | empty)),
- (character-element+ | empty)
- book-content &= foreign-nodes
- book-element = element book { book-content }
- born-element = element hr:born { xsd:date }
- character-element =
- grammar {
- start =
- element character {
- ((attribute id { text },
- parent name-element),
- parent born-element),
- parent qualification-element
- }
- }
- died-element = element hr:died { xsd:date }
- isbn-element = element isbn { foreign-attributes, token }
- name-element = element hr:name { xsd:token }
- qualification-element = element qualification { text }
- title-element = element title { foreign-attributes, text }
- available-content = ("true" | xsd:token " false ") | " "
- !!First set of constraints
A first set of constraints is checked at this point. They are mainly sanity checks that match common XML sense, but they are easier and safer to check at this stage than on the complete schema:
* Some "except" name classes are forbidden, such as an "anyName" excluding "anyName", which would match no name at all. An "except" child of "anyName" must not contain "anyName", and an "except" child of "nsName" must contain neither "anyName" nor "nsName".
* It's not possible to define attributes named "xmlns" (with an empty namespace URI) or having the namespace URI "http://www.w3.org/2000/xmlns" (corresponding to the "xmlns" prefix).
* Datatype libraries must be used correctly: each type must exist in its datatype library, and its "param" elements must be valid for that type.
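The first two constraints are simple tree checks on the normalized name classes. Here is a rough Python sketch that returns a list of messages rather than raising errors. check_name_classes is a hypothetical name, and the xmlns namespace URI string is the one given above. The datatype checks depend on each datatype library and are not covered.

```python
import xml.etree.ElementTree as ET

RNG = "http://relaxng.org/ns/structure/1.0"
XMLNS_URI = "http://www.w3.org/2000/xmlns"  # URI as given in the text above

def tag(local):
    return "{%s}%s" % (RNG, local)

def contains(elem, *local_names):
    """True if elem has a descendant with one of the given local names."""
    return any(elem.find(".//" + tag(n)) is not None for n in local_names)

def check_name_classes(root):
    """Return messages for violations of the except and xmlns constraints."""
    errors = []
    for parent in root.iter():
        for exc in parent.findall(tag("except")):
            if parent.tag == tag("anyName") and contains(exc, "anyName"):
                errors.append("anyName inside anyName/except")
            if parent.tag == tag("nsName") and contains(exc, "anyName", "nsName"):
                errors.append("anyName or nsName inside nsName/except")
    for attribute in root.iter(tag("attribute")):
        for n in attribute[0].iter():  # the name class is the first child
            if n.tag in (tag("name"), tag("nsName")) and n.get("ns") == XMLNS_URI:
                errors.append("attribute in the xmlns namespace")
            if n.tag == tag("name") and n.get("ns") == "" and n.text == "xmlns":
                errors.append("attribute named xmlns")
    return errors
```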
- !!Grammar merge
During this sequence of steps, "define" and "start" elements are combined where needed within each grammar, and the grammars are merged into a single top-level grammar:
* In each grammar, multiple "start" elements and multiple "define" elements with the same name are combined according to their "combine" attributes.
* The named patterns are then renamed so that their names are unique across the whole schema, and the references to them are updated accordingly.
* A top-level "grammar" with its "start" element is created if not already present. All the named patterns are moved to become children of this top-level grammar, "parentRef" elements are replaced by "ref" elements, and all the other "grammar" and "start" elements are replaced by their child elements.
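The combination of "define" elements can be sketched as follows for a single grammar whose components are its direct children (the "div" elements are gone at this point). This is an illustration only:
* combine_defines is a hypothetical name;
* "start" elements, which are combined in the same way, are not handled;
* the sketch does not check that all the "combine" attributes for a name agree.

Since each define has exactly one child after the previous steps, each merge produces a binary "choice" or "interleave". The combination of book-content with foreign-nodes in the schema below is an example.

```python
import xml.etree.ElementTree as ET

RNG = "http://relaxng.org/ns/structure/1.0"

def tag(local):
    return "{%s}%s" % (RNG, local)

def combine_defines(grammar):
    """Merge the same-named define children of one grammar, in place."""
    first_by_name = {}
    for define in grammar.findall(tag("define")):
        name = define.get("name")
        if name not in first_by_name:
            first_by_name[name] = define
            continue
        first = first_by_name[name]
        method = define.get("combine") or first.get("combine")
        if method not in ("choice", "interleave"):
            raise ValueError("%s: cannot combine definitions" % name)
        # Each define has exactly one child, so the wrapper is binary.
        wrapper = ET.Element(tag(method))
        wrapper.extend(list(first) + list(define))
        first[:] = [wrapper]
        first.set("combine", method)  # remember the method for later merges
        grammar.remove(define)
    for define in first_by_name.values():
        define.attrib.pop("combine", None)
    return grammar
```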
During this sequence of steps, the snippet:
- <define name="born-element">
- <element>
- <name ns="http://eric.van-der-vlist.com/ns/person">born</name>
- <data datatypeLibrary="http://www.w3.org/2001/XMLSchema-datatypes" type="date"/>
- </element>
- </define>
- <define name="character-element">
- <grammar>
- <start>
- <element>
- <name ns="http://eric.van-der-vlist.com/ns/library">character</name>
- <group>
- <group>
- <group>
- <attribute>
- <name ns="">id</name>
- <text/>
- </attribute>
- <parentRef name="name-element"/>
- </group>
- <parentRef name="born-element"/>
- </group>
- <parentRef name="qualification-element"/>
- </group>
- </element>
- </start>
- </grammar>
- </define>
- is replaced by:
- <define name="born-element-id2613943">
- <element>
- <name ns="http://eric.van-der-vlist.com/ns/person">born</name>
- <data datatypeLibrary="http://www.w3.org/2001/XMLSchema-datatypes" type="date"/>
- </element>
- </define>
- <define name="character-element-id2613924">
- <element>
- <name ns="http://eric.van-der-vlist.com/ns/library">character</name>
- <group>
- <group>
- <group>
- <attribute>
- <name ns="">id</name>
- <text/>
- </attribute>
- <ref name="name-element-id2613832"/>
- </group>
- <ref name="born-element-id2613943"/>
- </group>
- <ref name="qualification-element-id2613840"/>
- </group>
- </element>
- </define>
The specification does not prescribe any algorithm to create unique names for named patterns, so these names vary between implementations.
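Since the naming scheme is left to implementations, the following sketch simply appends a counter. The "-idN" suffix is a hypothetical choice, loosely modeled on the names above. The sketch resolves "ref" against the enclosing grammar and "parentRef" against its parent grammar, turning "parentRef" into "ref". It assumes that "div" elements have already been removed, so that each grammar's definitions are its direct children.

```python
import itertools
import xml.etree.ElementTree as ET

RNG = "http://relaxng.org/ns/structure/1.0"

def tag(local):
    return "{%s}%s" % (RNG, local)

def rename_defines(root):
    """Make define names unique schema-wide and update ref/parentRef, in place."""
    counter = itertools.count(1)

    def visit(elem, scopes):
        if elem.tag == tag("grammar"):
            # Each grammar opens a scope mapping its own names to fresh ones.
            scope = {d.get("name"): "%s-id%d" % (d.get("name"), next(counter))
                     for d in elem.findall(tag("define"))}
            scopes = scopes + [scope]
        if elem.tag in (tag("define"), tag("ref")):
            elem.set("name", scopes[-1][elem.get("name")])
        elif elem.tag == tag("parentRef"):
            # parentRef points to the grammar enclosing the current one.
            elem.set("name", scopes[-2][elem.get("name")])
            elem.tag = tag("ref")
        for child in elem:
            visit(child, scopes)

    visit(root, [])
    return root
```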
- The complete schema after this sequence of steps is:
- namespace local = ""
- namespace ns1 = "http://eric.van-der-vlist.com/ns/person"
- default namespace ns2 = "http://eric.van-der-vlist.com/ns/library"
- start =
- ((element library { book-element-id2613963+ }
- | book-element-id2613963)
- | author-element-id2614058)
- | character-element-id2613924
- foreign-elements-id2614183 =
- element * - ((local:* | ns2:*) | ns1:*) { anything-id2614112 }+
- | empty
- foreign-attributes-id2614152 =
- attribute * - ((local:* | ns2:*) | ns1:*) { text }+
- | empty
- anything-id2614112 =
- ((element * { anything-id2614112 }
- | attribute * { text })
- | text)+
- | empty
- foreign-nodes-id2614043 =
- (foreign-attributes-id2614152 | foreign-elements-id2614183)+ | empty
- author-element-id2614058 =
- element ns1:author {
- ((attribute id {
- xsd:NMTOKEN { maxLength = " 16 " }
- },
- name-element-id2613832),
- born-element-id2613943),
- (died-element-id2613856 | empty)
- }
- book-content-id2614016 =
- (((((attribute id { text },
- attribute available { available-content-id2613805 }),
- isbn-element-id2613872),
- title-element-id2613819),
- (author-element-id2614058+ | empty)),
- (character-element-id2613924+ | empty))
- & foreign-nodes-id2614043
- book-element-id2613963 = element book { book-content-id2614016 }
- born-element-id2613943 = element ns1:born { xsd:date }
- character-element-id2613924 =
- element character {
- ((attribute id { text },
- name-element-id2613832),
- born-element-id2613943),
- qualification-element-id2613840
- }
- died-element-id2613856 = element ns1:died { xsd:date }
- isbn-element-id2613872 =
- element isbn { foreign-attributes-id2614152, token }
- name-element-id2613832 = element ns1:name { xsd:token }
- qualification-element-id2613840 = element qualification { text }
- title-element-id2613819 =
- element title { foreign-attributes-id2614152, text }
- available-content-id2613805 = ("true" | xsd:token " false ") | " "
- !!Schema flattening
- The basic style of the schema (Russian doll or named patterns) has been preserved by the previous steps. The goal of this new step is, on the contrary, to normalize the use of named patterns. The target is a schema similar in structure to a DTD, with each element cleanly embedded in its own named pattern and no other use of named patterns than embedding a single element:
- * For each element which is not the unique child of a "define" element, a named pattern is created to embed its definition.
- * Each named pattern which does not embed a single element pattern is removed, and the references to this named pattern are replaced by its definition.
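The two rules above can be sketched in Python. This is a minimal illustration, not the algorithm of the specification or of any particular implementation: it assumes a hypothetical encoding of simplified patterns as nested tuples such as ("element", "library", content) or ("ref", "book"), uses plain strings for element names, and invents its own "__name-elt-N" naming scheme in the spirit of the generated names shown in this chapter. Like Relax NG itself, it assumes that named patterns which do not embed an element are never recursive.

```python
import itertools

# Hypothetical tuple encoding of simplified patterns:
#   ("element", name, content), ("ref", name), ("oneOrMore", p),
#   ("choice", a, b), ("group", a, b), ("text",), ("empty",), ...
# Element names are plain strings to keep the sketch short.

def flatten(start, defines):
    counter = itertools.count(1)
    defs = {}

    def hoist(p):
        """Move every nested element into a freshly named pattern."""
        if p[0] == "element":
            name = f"__{p[1]}-elt-{next(counter)}"
            defs[name] = ("element", p[1], hoist(p[2]))
            return ("ref", name)
        return (p[0],) + tuple(hoist(c) if isinstance(c, tuple) else c for c in p[1:])

    # Rule 1: an element that is the sole child of a define keeps that define;
    # every other element gets a named pattern of its own.
    for name, body in defines.items():
        if body[0] == "element":
            defs[name] = ("element", body[1], hoist(body[2]))
        else:
            defs[name] = hoist(body)
    start = hoist(start)

    # Rule 2: named patterns that do not embed a single element are removed
    # and every reference to them is replaced by their definition.
    def inline(p):
        if p[0] == "ref" and defs[p[1]][0] != "element":
            return inline(defs[p[1]])
        return (p[0],) + tuple(inline(c) if isinstance(c, tuple) else c for c in p[1:])

    kept = {n: inline(b) for n, b in defs.items() if b[0] == "element"}
    return inline(start), kept


defines = {
    "book": ("element", "book", ("text",)),
    "books": ("oneOrMore", ("ref", "book")),   # does not embed an element
}
start = ("choice", ("element", "library", ("ref", "books")), ("ref", "book"))
new_start, new_defs = flatten(start, defines)
print(new_start)
# ('choice', ('ref', '__library-elt-1'), ('ref', 'book'))
```

As with "__library-elt-id2615152" above, the "library" element moves into its own named pattern, while the intermediate "books" pattern disappears and its definition is copied where it was referenced.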
- During this step,
- <start>
- <choice>
- <choice>
- <choice>
- <element>
- <name ns="http://eric.van-der-vlist.com/ns/library">library</name>
- <oneOrMore>
- <ref name="book-element-id2613963"/>
- </oneOrMore>
- </element>
- <ref name="book-element-id2613963"/>
- </choice>
- <ref name="author-element-id2614058"/>
- </choice>
- <ref name="character-element-id2613924"/>
- </choice>
- </start>
- is replaced by:
- <start>
- <choice>
- <choice>
- <choice>
- <ref name="__library-elt-id2615152"/>
- <ref name="book-element-id2613963"/>
- </choice>
- <ref name="author-element-id2614058"/>
- </choice>
- <ref name="character-element-id2613924"/>
- </choice>
- </start>
- .../...
- <define name="__library-elt-id2615152">
- <element>
- <name ns="http://eric.van-der-vlist.com/ns/library">library</name>
- <oneOrMore>
- <ref name="book-element-id2613963"/>
- </oneOrMore>
- </element>
- </define>
- The full schema after this step is:
- namespace local = ""
- namespace ns1 = "http://eric.van-der-vlist.com/ns/person"
- default namespace ns2 = "http://eric.van-der-vlist.com/ns/library"
- start =
- ((__library-elt-id2615152 | book-element-id2613963)
- | author-element-id2614058)
- | character-element-id2613924
- author-element-id2614058 =
- element ns1:author {
- ((attribute id {
- xsd:NMTOKEN { maxLength = " 16 " }
- },
- name-element-id2613832),
- born-element-id2613943),
- (died-element-id2613856 | empty)
- }
- book-element-id2613963 =
- element book {
- (((((attribute id { text },
- attribute available { ("true" | xsd:token " false ") | " " }),
- isbn-element-id2613872),
- title-element-id2613819),
- (author-element-id2614058+ | empty)),
- (character-element-id2613924+ | empty))
- & (((attribute * - ((local:* | ns2:*) | ns1:*) { text }+
- | empty)
- | (__-elt-id2615098+ | empty))+
- | empty)
- }
- born-element-id2613943 = element ns1:born { xsd:date }
- character-element-id2613924 =
- element character {
- ((attribute id { text },
- name-element-id2613832),
- born-element-id2613943),
- qualification-element-id2613840
- }
- died-element-id2613856 = element ns1:died { xsd:date }
- isbn-element-id2613872 =
- element isbn {
- (attribute * - ((local:* | ns2:*) | ns1:*) { text }+
- | empty),
- token
- }
- name-element-id2613832 = element ns1:name { xsd:token }
- qualification-element-id2613840 = element qualification { text }
- title-element-id2613819 =
- element title {
- (attribute * - ((local:* | ns2:*) | ns1:*) { text }+
- | empty),
- text
- }
- __-elt-id2615020 =
- element * {
- ((__-elt-id2615020
- | attribute * { text })
- | text)+
- | empty
- }
- __library-elt-id2615152 = element library { book-element-id2613963+ }
- __-elt-id2615098 =
- element * - ((local:* | ns2:*) | ns1:*) {
- ((__-elt-id2615020
- | attribute * { text })
- | text)+
- | empty
- }
- !!Final cleanup
- Now, we're almost done and just need to do a bit of final cleanup:
- *Recursively propagate "notAllowed" patterns upward wherever their presence makes their parent pattern "notAllowed" as well, and remove "notAllowed" alternatives from choices. Note that this simplification doesn't cross element boundaries: "element foo { notAllowed }" is not transformed into "notAllowed".
- *Remove "empty" elements which have no effect.
- *Move "empty" elements to be the first child in "choice" elements.
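A minimal Python sketch of these cleanup rules, assuming a hypothetical tuple encoding of the simplified patterns (with binary "choice", "group" and "interleave", as after simplification). The removal of "notAllowed" from "except" clauses is left out, and the names are illustrative only.

```python
EMPTY = ("empty",)
NOT_ALLOWED = ("notAllowed",)

def cleanup(p):
    """Apply the notAllowed and empty rules bottom-up (hypothetical encoding)."""
    kind = p[0]
    if kind == "element":
        # Propagation stops at element boundaries:
        # element foo { notAllowed } stays as it is.
        return ("element", p[1], cleanup(p[2]))
    if kind in ("attribute", "list", "oneOrMore"):
        child = cleanup(p[-1])
        if child == NOT_ALLOWED:
            return NOT_ALLOWED            # the parent can't match anything either
        if kind == "oneOrMore" and child == EMPTY:
            return EMPTY
        return p[:-1] + (child,)
    if kind in ("group", "interleave", "choice"):
        a, b = cleanup(p[1]), cleanup(p[2])
        if kind == "choice":
            if a == NOT_ALLOWED:
                return b                  # drop the notAllowed alternative
            if b == NOT_ALLOWED:
                return a
            if a == EMPTY and b == EMPTY:
                return EMPTY
            if b == EMPTY:
                return ("choice", EMPTY, a)   # "empty" becomes the first child
            return ("choice", a, b)
        if NOT_ALLOWED in (a, b):
            return NOT_ALLOWED            # group/interleave with notAllowed matches nothing
        if a == EMPTY:
            return b                      # "empty" has no effect in a group/interleave
        if b == EMPTY:
            return a
        return (kind, a, b)
    return p                              # leaves: text, empty, notAllowed, ref, data, value


schema = ("element", "foo",
          ("group", ("choice", ("ref", "a"), NOT_ALLOWED),
                    ("choice", ("ref", "b"), EMPTY)))
print(cleanup(schema))
# ('element', 'foo', ('group', ('ref', 'a'), ('choice', ('empty',), ('ref', 'b'))))
```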
- After this cleanup, our schema is:
- namespace local = ""
- namespace ns1 = "http://eric.van-der-vlist.com/ns/person"
- default namespace ns2 = "http://eric.van-der-vlist.com/ns/library"
- start =
- ((__library-elt-id2615152 | book-element-id2613963)
- | author-element-id2614058)
- | character-element-id2613924
- author-element-id2614058 =
- element ns1:author {
- ((attribute id {
- xsd:NMTOKEN { maxLength = " 16 " }
- },
- name-element-id2613832),
- born-element-id2613943),
- (empty | died-element-id2613856)
- }
- book-element-id2613963 =
- element book {
- (((((attribute id { text },
- attribute available { ("true" | xsd:token " false ") | " " }),
- isbn-element-id2613872),
- title-element-id2613819),
- (empty | author-element-id2614058+)),
- (empty | character-element-id2613924+))
- & (empty
- | ((empty
- | attribute * - ((local:* | ns2:*) | ns1:*) { text }+)
- | (empty | __-elt-id2615098+))+)
- }
- born-element-id2613943 = element ns1:born { xsd:date }
- character-element-id2613924 =
- element character {
- ((attribute id { text },
- name-element-id2613832),
- born-element-id2613943),
- qualification-element-id2613840
- }
- died-element-id2613856 = element ns1:died { xsd:date }
- isbn-element-id2613872 =
- element isbn {
- (empty
- | attribute * - ((local:* | ns2:*) | ns1:*) { text }+),
- token
- }
- name-element-id2613832 = element ns1:name { xsd:token }
- qualification-element-id2613840 = element qualification { text }
- title-element-id2613819 =
- element title {
- (empty
- | attribute * - ((local:* | ns2:*) | ns1:*) { text }+),
- text
- }
- __-elt-id2615020 =
- element * {
- empty
- | ((__-elt-id2615020
- | attribute * { text })
- | text)+
- }
- __library-elt-id2615152 = element library { book-element-id2613963+ }
- __-elt-id2615098 =
- element * - ((local:* | ns2:*) | ns1:*) {
- empty
- | ((__-elt-id2615020
- | attribute * { text })
- | text)+
- }
- !!!Restrictions
- With the exception of the constraints expressed by the Relax NG schema for Relax NG and those which are part of the simplification itself (see above), all the restrictions of Relax NG are expressed on the simplified schema. Most of them are obvious and easy to understand.
- !!Constraints on the attributes
- These constraints match the definition of the attributes given by the XML 1.0 recommendation, namely:
- *Attributes can't contain other attributes: "attribute" patterns cannot have another "attribute" pattern in their descendants.
- *Attributes can't contain elements: "attribute" patterns cannot have a "ref" pattern in their descendants.
- *Attributes can't be duplicated: an "attribute" pattern may not appear inside a "group" or "interleave" pattern that is itself inside a "oneOrMore" pattern, since the attribute could then be repeated. Furthermore, if "attribute" patterns are combined in a "group" or "interleave" pattern, their name classes must not overlap, i.e., there must be no name which belongs to both name classes.
- *Attributes which have an infinite name class ("anyName" or "nsName") must be enclosed in a "oneOrMore" pattern. In other words, we can't specify that we want to allow only one or a certain number of occurrences of these attributes. Furthermore, they can only have "text" as their model (in other words, "data" patterns are forbidden here).
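These rules can be checked mechanically on the simplified schema. The following Python sketch assumes a hypothetical tuple encoding of patterns and name classes; it covers the nesting rules, the repetition rule and the "oneOrMore" requirement for infinite name classes, but neither the name class overlap test nor the restriction on the content of infinite-name-class attributes.

```python
# Hypothetical encoding: ("attribute", nameclass, content), ("element", nameclass, content),
# name classes as ("name", ns, local), ("nsName", ns), ("anyName",), ("choice", a, b).

def infinite(nc):
    """True if the name class has an anyName or nsName descendant."""
    if nc[0] in ("anyName", "nsName"):
        return True
    return nc[0] == "choice" and (infinite(nc[1]) or infinite(nc[2]))

def kinds(p):
    """Yield the kind of p and of everything below it."""
    yield p[0]
    for c in p[1:]:
        if isinstance(c, tuple):
            yield from kinds(c)

def check_attributes(p, ancestors=()):
    """Return a list of attribute-related violations found in pattern p."""
    kind = p[0]
    if kind == "attribute":
        nc, content = p[1], p[2]
        errors = []
        nested = {"attribute", "ref"} & set(kinds(content))
        if nested:
            errors.append("attribute contains " + ", ".join(sorted(nested)))
        if "oneOrMore" in ancestors:
            below = ancestors[ancestors.index("oneOrMore") + 1:]
            if "group" in below or "interleave" in below:
                errors.append("attribute may be repeated")
        if infinite(nc) and "oneOrMore" not in ancestors:
            errors.append("infinite name class outside oneOrMore")
        return errors
    if kind == "element":
        # The checks are made within the content of one element at a time.
        return check_attributes(p[2], ())
    errors = []
    for c in p[1:]:
        if isinstance(c, tuple):
            errors += check_attributes(c, ancestors + (kind,))
    return errors


# The "bar" attribute of the first example below, after simplification
bad = ("element", ("name", "", "foo"),
       ("group",
        ("attribute", ("name", "", "bar"),
         ("choice", ("empty",),
          ("oneOrMore",
           ("choice",
            ("choice", ("ref", "anything"), ("attribute", ("anyName",), ("text",))),
            ("text",))))),
        ("text",)))
print(check_attributes(bad))
# ['attribute contains attribute, ref']
```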
- Let's give some examples of schemas which may look valid at a quick glance but run into these restrictions (please note that all these schemas are invalid).
- !Example: content model of attributes
- This schema attempts to define that any content model can be accepted in the "bar" attribute:
- anything =
- (element * { anything }
- | attribute * { text }
- | text)*
- start =
- element foo {
- attribute bar { anything },
- text
- }
- Unfortunately, it's translated into:
- start = __foo-elt-id2602800
- __-elt-id2602788 =
- element * {
- empty
- | ((__-elt-id2602788
- | attribute * { text })
- | text)+
- }
- __foo-elt-id2602800 =
- element foo {
- attribute bar {
- empty
- | ((__-elt-id2602788
- | attribute * { text })
- | text)+
- },
- text
- }
- This allows, within the "bar" attribute, both a reference to a named pattern (which means an element in the simplified syntax) and an attribute. Both are forbidden.
- To fix this, we must make sure that the "anything" defined for the content of the attribute is compatible with the content of attributes as defined by the XML specification, for instance:
- anything =
- (text)
- start =
- element foo {
- attribute bar { anything },
- text
- }
- which will be simplified into:
- start = __foo-elt-id2602296
- __foo-elt-id2602296 =
- element foo {
- attribute bar { text },
- text
- }
- This schema expresses what we wanted to express, and it is valid.
- !Example: duplication of attributes
- Let's say we want to extend the definition of our "title" element to have the same attributes and content model as the XHTML 2.0 "span" element. If we look into the Relax NG module implementing the "span" element, we can see that its definition is:
- span = element span { span.attlist, Inline.model }
- We may want to just include it in the definition of the "title" element, which already includes an "xml:lang" attribute:
- namespace x = "http://www.w3.org/2002/06/xhtml2"
-
- start = book
- include "xhtml-attribs-2.rnc" inherit = x
- include "xhtml-inltext-2.rnc" inherit = x
- include "xhtml-datatypes-2.rnc" inherit = x
- book =
- element book {
- attribute id { text },
- attribute available { text },
- element isbn { text },
- element title {
- attribute xml:lang { xsd:language },
- span.attlist,
- Inline.model
- }
- }
- Unfortunately, this is invalid because the "xml:lang" attribute is already included somewhere in the "span.attlist" pattern and gets pulled in during simplification, which defines the "title" element as:
- __title-elt-id2641936 =
- element title {
- (attribute xml:lang { xsd:language },
- (((((((((empty
- | attribute id { xsd:ID }),
- (empty
- | attribute class { xsd:NMTOKENS })),
- (empty
- | attribute title { text })),
- (empty
- | attribute xml:lang { xsd:language })),
- (empty
- | attribute dir {
- (("ltr" | "rtl") | "lro")
- | "rlo"
- })),
- ((empty
- | attribute edit {
- (("inserted" | "deleted") | "changed")
- | "moved"
- }),
- (empty
- | attribute datetime { xsd:dateTime }))),
- ((((((((empty
- | attribute href { xsd:anyURI }),
- (empty
- | attribute cite { xsd:anyURI })),
- (empty
- | attribute target { xsd:NMTOKEN })),
- (empty
- | attribute rel { xsd:NMTOKENS })),
- (empty
- | attribute rev { xsd:NMTOKENS })),
- (empty
- | attribute accesskey {
- xsd:string { length = "1" }
- })),
- (empty
- | attribute navindex {
- xsd:nonNegativeInteger {
- pattern = "[0-9]+"
- minInclusive = "0"
- maxInclusive = "32767"
- }
- })),
- (empty
- | attribute base { xsd:anyURI }))),
- ((empty
- | attribute src { xsd:anyURI }),
- (empty
- | attribute type { text }))),
- ((((empty
- | attribute usemap { xsd:anyURI }),
- (empty
- | attribute ismap { "ismap" })),
- (empty
- | attribute shape {
- (("rect" | "circle") | "poly")
- | "default"
- })),
- (empty
- | attribute coords { text })))),
- (empty
- | (empty
- | (text
- | (((((((((((((abbr-id2635861 | cite-id2635889)
- | code-id2635918)
- | dfn-id2635947)
- | em-id2635975)
- | kbd-id2636004)
- | l-id2636032)
- | quote-id2636061)
- | samp-id2636090)
- | span-id2636118)
- | strong-id2636147)
- | sub-id2636176)
- | sup-id2636204)
- | var-id2636233)))+)
- }
- To fix this, we just need to remove the "xml:lang" attribute from our definition:
- namespace x = "http://www.w3.org/2002/06/xhtml2"
-
- start = book
- include "xhtml-attribs-2.rnc" inherit = x
- include "xhtml-inltext-2.rnc" inherit = x
- include "xhtml-datatypes-2.rnc" inherit = x
- book =
- element book {
- attribute id { text },
- attribute available { text },
- element isbn { text },
- element title {
- span.attlist,
- Inline.model
- }
- }
- !Example: name class overlap
- Let's say we have the following schema:
- default namespace lib = "http://eric.van-der-vlist.com/ns/library"
- namespace local = ""
-
- start = book
- book =
- element book {
- attribute id { text },
- attribute available { text },
- foreign-attributes,
- element isbn { text },
- element title {
- attribute xml:lang { xsd:language },
- text
- }
- }
- foreign-attributes = attribute * - (local:* | lib:* ) { text }*
- (book.rnc)
- Although we already accept foreign attributes, we may want to be more precise about the definition of some Dublin Core elements and extend our schema as:
- namespace dc="http://purl.org/dc/elements/1.1/"
- include "book.rnc"
-
- book.content &= attribute dc:rights { text } ?
- Unfortunately, this is invalid because it gets simplified as:
- book-id2604347 =
- element book {
- ((((attribute id { text },
- attribute available { text }),
- (empty
- | attribute * - (lib:* | local:*) { text }+)),
- __isbn-elt-id2604556),
- __title-elt-id2604551)
- & attribute ns1:rights { text }
- }
-
- Here, the attribute "dc:rights" is included in the name class "* - (lib:* | local:*)". To fix this, we need to redefine the named pattern "foreign-attributes" to exclude the name "dc:rights", or even the whole Dublin Core namespace:
- default namespace lib = "http://eric.van-der-vlist.com/ns/library"
- namespace dc="http://purl.org/dc/elements/1.1/"
- namespace local = ""
-
- include "book.rnc" {
- foreign-attributes = attribute * - (local:* | lib:* | dc:* ) { text }*
- }
-
- book.content &= attribute dc:rights { text } ?
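Whether two name classes overlap can be decided by testing a small, finite set of candidate names: for each "name", the name itself; for each "nsName", a name in that namespace with a local name that cannot exist; for "anyName", a name whose namespace and local name cannot exist. If any name belongs to both classes, one of these candidates does too. The Python sketch below is one possible way to implement this idea, under a hypothetical tuple encoding of name classes in which "\x00" plays the role of the impossible namespace or local name.

```python
# Hypothetical encoding of name classes:
#   ("name", ns, local)   ("nsName", ns, except_or_None)
#   ("anyName", except_or_None)   ("choice", a, b)
ILLEGAL = "\x00"   # never part of a real namespace URI or local name

def contains(nc, ns, local):
    kind = nc[0]
    if kind == "name":
        return (nc[1], nc[2]) == (ns, local)
    if kind == "nsName":
        return nc[1] == ns and not (nc[2] and contains(nc[2], ns, local))
    if kind == "anyName":
        return not (nc[1] and contains(nc[1], ns, local))
    return contains(nc[1], ns, local) or contains(nc[2], ns, local)   # choice

def representatives(nc):
    """One candidate name per name, nsName and anyName in the name class."""
    kind = nc[0]
    if kind == "name":
        yield (nc[1], nc[2])
    elif kind == "nsName":
        yield (nc[1], ILLEGAL)
        if nc[2]:
            yield from representatives(nc[2])
    elif kind == "anyName":
        yield (ILLEGAL, ILLEGAL)
        if nc[1]:
            yield from representatives(nc[1])
    else:
        yield from representatives(nc[1])
        yield from representatives(nc[2])

def overlap(a, b):
    candidates = set(representatives(a)) | set(representatives(b))
    return any(contains(a, ns, local) and contains(b, ns, local)
               for ns, local in candidates)


LIB = "http://eric.van-der-vlist.com/ns/library"
DC = "http://purl.org/dc/elements/1.1/"
foreign = ("anyName", ("choice", ("nsName", LIB, None), ("nsName", "", None)))
fixed = ("anyName", ("choice", ("choice", ("nsName", "", None), ("nsName", LIB, None)),
                     ("nsName", DC, None)))
rights = ("name", DC, "rights")
print(overlap(foreign, rights), overlap(fixed, rights))
# True False
```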
- !!Constraints on lists
- Lists work on text nodes by splitting them into tokens, which are then handled as text nodes. It is therefore not possible to find elements or attributes in a list. Text nodes and nested lists would be confusing and are forbidden too:
- *List patterns cannot have any "list", "ref" (remember that after simplification, elements are accessed through references to named patterns), "attribute" or "text" patterns as descendants. The "interleave" pattern is also forbidden as a descendant of "list" patterns because it would complicate implementations and was not considered worth it.
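This restriction amounts to a scan of the descendants of every "list" pattern. A minimal Python sketch, assuming a hypothetical tuple encoding of the simplified patterns:

```python
# Pattern kinds that may not appear anywhere below a "list" pattern
FORBIDDEN_IN_LIST = {"list", "ref", "attribute", "text", "interleave"}

def descendant_kinds(p):
    """Yield the kind of every pattern strictly below p."""
    for c in p[1:]:
        if isinstance(c, tuple):
            yield c[0]
            yield from descendant_kinds(c)

def check_lists(p):
    """Return, for each offending list pattern, the forbidden kinds it contains."""
    errors = []
    if p[0] == "list":
        bad = FORBIDDEN_IN_LIST & set(descendant_kinds(p))
        if bad:
            errors.append(sorted(bad))
    for c in p[1:]:
        if isinstance(c, tuple):
            errors += check_lists(c)
    return errors


# The tempting definition of the "price" element in the example below
price = ("element", ("name", "", "price"),
         ("list", ("interleave", ("data", "decimal"), ("data", "token"))))
print(check_lists(price))
# [['interleave']]
```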
- !Example: list and interleave
- Let's say we'd like to define a price element as allowing either a number followed by a token, such as:
- <price>1 Euro</price>
- or a token followed by a number:
- <price>Euro 1</price>
- We might be tempted to write:
- element price {
- list { xsd:decimal & xsd:token }
- }
- But this would be invalid because "interleave" is forbidden in a "list". To work around this limitation, we need to list all the possible combinations, which is easy in this example but can rapidly grow out of control:
- element price {
- list { (xsd:decimal, xsd:token) | (xsd:token, xsd:decimal) }
- }
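To see how fast this workaround grows, here is a small hypothetical Python helper that generates the compact syntax for a list accepting a set of tokens in any order. With n different tokens, it needs n! alternatives: 2 for our price, 24 for four tokens and 720 for six.

```python
from itertools import permutations

def any_order_list(types):
    """Build a compact-syntax list pattern accepting the given tokens in any order."""
    alternatives = ["(" + ", ".join(order) + ")" for order in permutations(types)]
    return "list { " + " | ".join(alternatives) + " }"

print(any_order_list(["xsd:decimal", "xsd:token"]))
# list { (xsd:decimal, xsd:token) | (xsd:token, xsd:decimal) }
```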
- !!Constraints on "except" patterns
- Except patterns (i.e., "except" elements used in a "data" pattern) are about single data values:
- *An "except" element with a "data" parent can only contain "data", "value" and "choice" elements.
- !!Constraints on "start" patterns
- After simplification, a start pattern describes the list of possible root elements. It can therefore only contain "ref" elements, possibly combined through "choice" elements.
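As a minimal sketch of this rule, assuming a hypothetical tuple encoding of the simplified patterns, a start pattern can be checked with a short recursive function. Here it is applied to the start pattern of the simplified schema shown earlier in this chapter.

```python
def is_valid_start(p):
    """True if p is a "ref", or "choice" patterns nested over "ref"s only."""
    if p[0] == "ref":
        return True
    if p[0] == "choice":
        return is_valid_start(p[1]) and is_valid_start(p[2])
    return False


start = ("choice",
         ("choice",
          ("choice", ("ref", "__library-elt-id2615152"), ("ref", "book-element-id2613963")),
          ("ref", "author-element-id2614058")),
         ("ref", "character-element-id2613924"))
print(is_valid_start(start))
# True
```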
- !!Constraints on content models
- Relax NG defines three different content models for an element:
- *Empty when the element has only attributes.
- *Simple when, besides any attributes, the element's content has been described using "data", "value" or "list" patterns.
- *Complex in the other cases.
- Note that this is identical to the definition given by W3C XML Schema, and similar to but somewhat different from the meaning of these terms in "plain" XML: an element expressed as "<foo>bar</foo>" is considered by Relax NG as having complex content if its content has been described using a "text" pattern, and simple content if its content has been described using other patterns. This means that it is not enough for an element to contain only a text node to have simple content: the element must also have been described with a "data" orientation. When this is not the case, i.e., when the "text" pattern has been used, the element is considered "document" oriented, as a special case of mixed content in which no elements are included.
- The restriction on content models is that empty content can be grouped with any other content model, but simple and complex content can't be grouped together (through "group" or "interleave" patterns): they can only appear as alternatives under the definition of the same element. In other words, for each alternative, you need to choose whether you are "data" or "text" oriented; you can't mix both mindsets.
- We have already mentioned the practical consequence of this restriction on mixed content model in "Chapter 7: Constraining Text Values": it is not possible to use "data" patterns to specify constraints on the text nodes occurring in mixed elements.
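The content type rules can be written down as a small function. This Python sketch follows the way the Relax NG specification formulates them: the three content types are ordered empty < complex < simple, combinations take the maximum, and "group" or "interleave" (and "oneOrMore", which groups a pattern with itself) require the two content types to be groupable. Note that, under this formulation, two simple contents can't be grouped either. The tuple encoding is hypothetical, and "except" clauses and "notAllowed" are not handled.

```python
EMPTY, COMPLEX, SIMPLE = 0, 1, 2   # ordered: empty < complex < simple

def groupable(a, b):
    """Empty goes with anything; otherwise only complex goes with complex."""
    return a == EMPTY or b == EMPTY or (a == COMPLEX and b == COMPLEX)

def content_type(p):
    """Return EMPTY, COMPLEX or SIMPLE, or None when the combination is forbidden."""
    kind = p[0]
    if kind in ("data", "value", "list"):
        return SIMPLE                 # "data" oriented (except clauses not checked)
    if kind in ("text", "ref"):
        return COMPLEX                # "text" oriented, or a child element
    if kind == "empty":
        return EMPTY
    if kind == "attribute":
        return EMPTY if content_type(p[2]) is not None else None
    if kind == "oneOrMore":
        ct = content_type(p[1])
        return ct if ct is not None and groupable(ct, ct) else None
    if kind not in ("group", "interleave", "choice"):
        raise ValueError(f"not covered by this sketch: {kind}")
    a, b = content_type(p[1]), content_type(p[2])
    if a is None or b is None:
        return None
    if kind == "choice":
        return max(a, b)              # alternatives may mix both mindsets
    return max(a, b) if groupable(a, b) else None   # group / interleave


# Mixed content with a typed text node: text and data grouped together
print(content_type(("group", ("text",), ("data", "decimal"))))
# None
```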
- !!Limitations on "interleave"
- The last two limitations concern "interleave". Their goal is to facilitate the implementation of this feature, which is so sorely lacking in other schema languages, by reducing the number of combinations that Relax NG processors need to explore to support "interleave":
- *Elements combined through interleave must have name classes without overlap: we have already seen a similar restriction with attributes which are always combined through interleave.
- *There must be at most one "text" pattern in each set of patterns combined by "interleave".
- These limitations don't impact the expressive power of Relax NG (i.e. the set of content models which can be written with Relax NG): even if we may hit them from time to time, schemas can always be rewritten to work around them; but they are a nuisance when combining existing patterns with mixed content models.
- They are needed for the different algorithms currently used to implement Relax NG and James Clark thinks that they could be removed in future versions of Relax NG: "Hopefully better algorithms will be developed that will allow this restriction to be removed in future versions."
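Both limitations can be checked on each "interleave" of the simplified schema by collecting what occurs on each side, looking through "choice", "group", "interleave" and "oneOrMore" but not into attributes, lists or other elements. In this Python sketch, the tuple encoding is hypothetical and each named pattern is mapped to a plain element name; a real implementation would compare name classes with an overlap test instead of plain names.

```python
def occurs(p, defines):
    """Element names and a text flag found without crossing attribute/list/element."""
    kind = p[0]
    if kind == "ref":
        return {defines[p[1]]}, False
    if kind == "text":
        return set(), True
    if kind in ("choice", "group", "interleave", "oneOrMore"):
        names, text = set(), False
        for child in p[1:]:
            n, t = occurs(child, defines)
            names |= n
            text = text or t
        return names, text
    return set(), False               # attribute, list, data, value, empty...

def interleave_errors(p, defines):
    """Report text on both sides, or shared element names, for each interleave."""
    errors = []
    if p[0] == "interleave":
        (n1, t1), (n2, t2) = occurs(p[1], defines), occurs(p[2], defines)
        if t1 and t2:
            errors.append("text on both sides of interleave")
        if n1 & n2:
            errors.append("overlapping elements: " + ", ".join(sorted(n1 & n2)))
    for child in p[1:]:
        if isinstance(child, tuple):
            errors += interleave_errors(child, defines)
    return errors


# Hypothetical mapping from named patterns to element names
defines = {"em": "em", "a1": "author", "a2": "author"}
# A shortened version of the "title" content of the example below
title = ("interleave", ("text",), ("oneOrMore", ("choice", ("text",), ("ref", "em"))))
print(interleave_errors(title, defines))
# ['text on both sides of interleave']
```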
- !Example: at most one text pattern in interleave
- We may have the following schema to describe our books:
- start = book
- book = element book { book.content }
- book.content =
- attribute id { text },
- attribute available { text },
- element isbn { text },
- title
- title = element title { title.attributes, title.content }
- title.attributes = attribute xml:lang { xsd:language }
- title.content = text
- (book.rnc)
- To add the XHTML "Inline.model" to "title.content" we could be tempted to write:
-
- include "book.rnc"
- include "xhtml-attribs-2.rnc"
- include "xhtml-inltext-2.rnc"
- include "xhtml-datatypes-2.rnc"
-
- title.content &= Inline.model
- Unfortunately, "Inline.model" already contains a "text" pattern and this gets simplified as:
- title-id2635741 =
- element title {
- attribute lang { xsd:language },
- (text
- & (empty
- | (empty
- | (text
- | (((((((((((((abbr-id2636549 | cite-id2636578)
- | code-id2636607)
- | dfn-id2636636)
- | em-id2636664)
- | kbd-id2636693)
- | l-id2636721)
- | quote-id2636750)
- | samp-id2636778)
- | span-id2636807)
- | strong-id2636836)
- | sub-id2636865)
- | sup-id2636893)
- | var-id2636922)))+))
- }
- Here we find "text" patterns on both sides of an "interleave".
- To fix this, we need to replace our combination with a redefinition of "title.content":
- include "book.rnc" {
- title.content = Inline.model
- }
- include "xhtml-attribs-2.rnc"
- include "xhtml-inltext-2.rnc"
- include "xhtml-datatypes-2.rnc"
- We have not lost any expressive power (we are able to describe what we wanted to describe), but we have lost modularity: changes made to "title.content" in "book.rnc" would now have to be manually added to our derived schema.