<?xml version="1.0" encoding="iso-8859-1"?>
<?xml-stylesheet type="text/css" href="chapter.css"?><chapter>
<title>Chapter 1: Relax NG In Perspective</title>
<simplesect/>
<sect1>
<title>XML is about diversity</title>
<para>I have heard people jest that XML stands for &quot;eXcellent Marketing Language&quot;, and I have often felt that, unfortunately, this has become a very accurate definition. Nevertheless, the official meaning of XML is &quot;eXtensible Markup Language&quot;, and this one is still more accurate!</para>
<para>XML is not extensible in the sense that the XML specifications themselves would be extensible: many experts think that both the XML recommendation itself and the pile of XML-related specifications have already become legacy and are very hard, if not impossible, to update and extend.</para>
<para>XML is extensible in the sense that it lets you define your own sets of elements and attributes, which can express virtually any hierarchical structure. And this is not only accurate in theory but real: the extensibility of XML is being used, some would even say overused, and we have lost count of the different sets of XML elements and attributes (let's call them XML vocabularies) used by different people for different applications. And, of course, all of these vocabularies need to be validated, which means that there is a need for validation tools that are easy to adapt to each of them.</para>
</sect1>
<sect1>
<title>XML is about the independence of documents over applications</title>
<para>I have also heard many people elaborate on the tight relationship or parallel between XML and object orientation, saying that XML is to data what object orientation is to programs and that XML is a perfect serialization format for object systems. That's not untrue, but we can also see XML as anti-object-oriented, or maybe post-object-oriented, because it reintroduces a clean separation between data and programs, which is the complete opposite of the basic object-oriented principle that objects encapsulate both data and processing.</para>
<para>In the XML world, XML documents live their own lives independently of programs: they can be edited, read, displayed and transformed using generic tools independent of any application, and it is vitally important that they can also be validated independently of any application.</para>
<para>Validating XML documents independently of any application is not only important, because XML documents themselves have become independent of applications; it is also difficult, because of the extensibility of XML. The diversity of XML vocabularies is virtually infinite. It is one of the main assets of XML, and it is something we certainly do not want to limit through the tools used to validate XML documents. Validation is also difficult because of the diversity of what we can call &quot;validation&quot;.</para>
</sect1>
<sect1>
<title>There is more than one aspect to validation</title>
<para>Validation can be about checking the structure of XML documents; it can be about checking the content of each text node and attribute independently of the others (this is often called datatyping); it can be about checking constraints between nodes; it can be about checking constraints between nodes and external information such as lookup tables or links; it can be about checking &quot;business rules&quot;; and it can be about anything else, such as spell checking.</para>
<para>Validation is key to improving the quality of our XML-based information systems, and that appears to be badly needed. I recently attended two presentations about two independent projects in very different domains, and both came up with the alarming ratio of one out of ten real-world XML documents containing errors. With such a high proportion, validation is not only useful but indispensable! Can you imagine a fund transfer system in which 10% of the transactions contained errors?</para>
</sect1>
<sect1>
<title>Relax NG is the best tool to validate the structure of XML documents</title>
<para>Relax NG won't solve this issue all by itself: it is not designed to solve the whole issue. Relax NG is designed to be the best tool available to solve two pieces of the problem: validating the structure of XML documents by itself, and providing a plug for datatype libraries that validate the content of text nodes and attributes. It is also designed to be used as part of the ISO DSDL framework, which aims to deal with the issue of validation at large (see Appendix A: DSDL for more information about DSDL).</para>
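<para>As a minimal sketch of this division of labour, here is a small schema for a hypothetical &quot;event&quot; vocabulary (the element and attribute names are my own assumptions, chosen only for illustration). The element and attribute patterns describe the structure, while the data patterns hand the checking of text and attribute values over to a datatype library, in this case the W3C XML Schema datatypes selected through the datatypeLibrary attribute:</para>
<programlisting><![CDATA[<!-- Hypothetical vocabulary, for illustration only -->
<element name="event" xmlns="http://relaxng.org/ns/structure/1.0"
         datatypeLibrary="http://www.w3.org/2001/XMLSchema-datatypes">
  <!-- Structure: an event has an id attribute, a title and a date -->
  <attribute name="id">
    <!-- Content: checked by the datatype library -->
    <data type="NMTOKEN"/>
  </attribute>
  <element name="title">
    <text/>
  </element>
  <element name="date">
    <!-- Content: checked by the datatype library -->
    <data type="date"/>
  </element>
</element>]]></programlisting>
<para>Swapping the datatype library would change what counts as a valid date or identifier without touching the structural patterns.</para>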
<para>This strong focus is what makes Relax NG so different from its main rival, W3C XML Schema. One of the reasons for the complexity of W3C XML Schema is that it includes many features that have been kept out of the scope of Relax NG. W3C XML Schema is concerned not only with validating the structure of XML documents, but also with validating the content of text nodes and attributes and the integrity between keys and references. Furthermore, W3C XML Schema is not only concerned with validation: it also attempts to be a modeling language for XML documents that can classify the elements and attributes of XML documents, identify their semantics, and be used to perform automatic binding between XML documents and objects. All these goals may be fair, but there is always a risk of losing focus when trying to solve too many different problems with a single technology.</para>
<para>Relax NG, by contrast, has always kept its focus on the validation of XML structure, and no compromise was made for the sake of any other feature during the development of the specification. The result is that Relax NG can claim to be the logical successor of XML DTDs and the best tool available to validate the structure of XML documents. Relax NG is powerful: its expressive power is such that virtually any XML vocabulary can be described with it, which is not the case with W3C XML Schema, nor even with DTDs. Maybe still more important for people who have to write schemas, Relax NG is also very simple: because it doesn't try to model XML documents, validate too many things and brew coffee, its syntax is intuitive, has been kept simple, and isn't cluttered by limitations that are complex to learn and difficult to remember.</para>
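<para>To give an idea of this expressive power, here is a minimal sketch built on a hypothetical &quot;author&quot; vocabulary (the names are my own assumptions, not taken from any real schema). It states that the name of an author is given either as an attribute or as a child element, but not both; as far as I know, this constraint cannot be expressed as such with a DTD or with W3C XML Schema 1.0:</para>
<programlisting><![CDATA[<!-- Hypothetical vocabulary, for illustration only -->
<element name="author" xmlns="http://relaxng.org/ns/structure/1.0">
  <!-- The name is either an attribute or a child element -->
  <choice>
    <attribute name="name"/>
    <element name="name">
      <text/>
    </element>
  </choice>
  <!-- An optional email element may follow -->
  <optional>
    <element name="email">
      <text/>
    </element>
  </optional>
</element>]]></programlisting>
<para>An author carrying its name as an attribute and an author carrying it as a child element should both be accepted, while an author with both forms, or with neither, should be rejected.</para>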
</sect1>
<sect1>
<title>Unexpected uses of Relax NG</title>
<para>This focus doesn't mean that Relax NG is a niche language meant to stay limited to its original goal. Relax NG may well follow the path of XSLT (also developed by James Clark): XSLT, whose development was focused on document transformation, has become the Swiss army knife of XML developers, and its unpredicted success shows that it is being used for much more than had originally been forecast.</para>
<para>The same will likely happen with Relax NG, and I can give a couple of examples.</para>
<para>The other day, I had to write a converter from a flat non-XML format into XML. The structure of the resulting document was described by a non-Relax NG schema, and after various attempts to find hacks to map the 400 different information items of this flat structure onto as many elements and attributes, I found that the easiest way was through a Relax NG schema. I transformed the schema of the XML vocabulary into a Relax NG schema that was as simple as possible. A trivial program (written in Python in this case, but any other language could have been used) can then just walk through this structure while parsing the flat document and dispatch the information items where they belong. This was made easy by the uncluttered simplicity of the syntax of Relax NG, and it would have taken me much more time with any other schema language.</para>
<para>The second example is taken from Relax NG itself. As we will discover in &quot;Chapter 4: Non XML Syntax&quot;, a non-XML compact syntax is available for Relax NG, and in its specification this syntax is described by an EBNF grammar. Knowing James Clark, I was sure he hadn't written it by hand but had generated it from XML, and when I wrote the reference guide for this syntax (&quot;Chapter 18: Non XML Syntax Reference&quot;), I asked him to send me the source of this grammar as XML. I was expecting a format such as the DocBook EBNF module, and guess what I received? A Relax NG schema, of course! The syntax of Relax NG is flexible enough to describe the productions of an EBNF grammar, and &quot;Chapter 18: Non XML Syntax Reference&quot; is generated from this schema. It is only a summary, and the semantics and restrictions of Relax NG are not fully respected, but Relax NG is still a nice way to describe this EBNF.</para>
</sect1>
<sect1>
<title>Relax NG as a pivot format</title>
<para>These two examples are a little bit extreme; more to the point, Relax NG appears to be the perfect pivot format for XML schema-related tasks. The first time I started to think of Relax NG as a pivot format was while attending the presentation about the Sun Multi-Schema Validator (MSV) by Kohsuke Kawaguchi at XML 2001. During his talk, Kohsuke Kawaguchi explained that the different grammar-based schema languages supported by MSV (DTDs, Relax NG, Relax and W3C XML Schema) were translated into a common data model by the validator and that the validation algorithm relied on this data model. After his talk, I asked him what this common data model was, and he answered that it was Relax NG. This is good evidence that the expressive power of Relax NG is such that nearly all of the constraints that can be described with other schema languages can be described with Relax NG.</para>
<para>This advantage could be a major drawback: if the expressive power of Relax NG is so much greater than that of other languages, a schema written with Relax NG might be impossible to translate into other languages. This issue happens to be more theoretical than practical. If you know both Relax NG and W3C XML Schema, you can imagine Relax NG schemas that cannot be translated into W3C XML Schema, but this doesn't happen often in real-life schemas, and when it does, you can always weigh the need to express such a schema against the need to be able to publish a W3C XML Schema schema. And the reason I can be fairly confident when I say that most Relax NG schemas can be translated into other schema languages is that I have seen it! James Clark has developed Trang, a magic tool that takes a Relax NG schema and converts it into a W3C XML Schema schema or a DTD.</para>
<para>Simpler to write by hand, Relax NG is also simpler to generate, and that is important too, since a growing number of applications, especially but not only those with huge schemas containing hundreds of elements and attributes, tend to generate their schemas from logical models with a higher level of abstraction rather than create them from scratch by hand. Whether you are using UML as your design tool, a simple spreadsheet as the OASIS UBL project does, or sample documents as with my examplotron, it is easier to derive a Relax NG schema from this model than a schema in any other schema language.</para>
</sect1>
<sect1>
<title>Why should anyone use any other schema language?</title>
<para>With converters available to and from other schema languages, and with Relax NG being easier to write, easier to generate and easier for applications to use, why would anyone even consider using any other schema language as their main pivot schema language? As far as validation alone is concerned, I really cannot see any good reason.</para>
<para>The only area where Relax NG is still a little bit behind is datatype assignment and data binding. Datatype assignment appears to be getting increasingly important for a whole set of applications; many new features of the XPath 2.0, XSLT 2.0 and XQuery 1.0 family of future W3C recommendations, for instance, depend on it. Because datatype assignment was out of the scope of Relax NG itself, Relax NG is very permissive about &quot;non deterministic&quot; schemas, which could lead to non-deterministic type assignment, and this is something to keep in mind when writing Relax NG schemas that will be transformed into W3C XML Schema schemas. I will present the latest updates on this subject in &quot;Chapter 16: Determinism and Datatype Assignment&quot;.</para>
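<para>To make the idea of a non-deterministic schema more concrete, here is a minimal sketch using a hypothetical &quot;measure&quot; vocabulary (the names are my own assumptions, chosen only for illustration). Relax NG accepts two alternative definitions of the same element with different datatypes:</para>
<programlisting><![CDATA[<!-- Hypothetical vocabulary, for illustration only -->
<element name="measure" xmlns="http://relaxng.org/ns/structure/1.0"
         datatypeLibrary="http://www.w3.org/2001/XMLSchema-datatypes">
  <choice>
    <!-- First alternative: value holds an integer -->
    <element name="value">
      <data type="integer"/>
    </element>
    <!-- Second alternative: value holds a boolean -->
    <element name="value">
      <data type="boolean"/>
    </element>
  </choice>
</element>]]></programlisting>
<para>A document such as &lt;measure&gt;&lt;value&gt;1&lt;/value&gt;&lt;/measure&gt; is valid, since &quot;1&quot; is both a legal integer and a legal boolean, but nothing says which of the two types should be assigned to the value. W3C XML Schema does not allow two declarations of the same element with different types in the same content model, so a schema like this one cannot be translated as it stands.</para>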
</sect1>
</chapter>