- <html>
- <body>
- <p>
- This is a Free Software DOM Level 3 implementation, supporting these features:
- <ul>
- <li>"XML"</li>
- <li>"Events"</li>
- <li>"MutationEvents"</li>
- <li>"HTMLEvents" (won't generate them though)</li>
- <li>"UIEvents" (also won't generate them)</li>
- <li>"USER-Events" (a conformant extension)</li>
- <li>"Traversal" (optional)</li>
- <li>"XPath"</li>
- <li>"LS" and "LS-Async"</li>
- </ul>
- It is intended to be a reasonable base both for
- experimentation and for supporting additional DOM modules as clean layers.
- </p>
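<p>
One way to confirm at run time which of these feature sets are present is
the standard DOMImplementation.hasFeature call. The sketch below is only an
illustration: it assumes the DOM is reached through a JAXP
DocumentBuilderFactory, the class name FeatureProbe is hypothetical, and the
answers describe whichever DOM implementation is installed rather than this
package in particular.
</p>

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.DOMImplementation;

/** Hypothetical helper: asks the installed DOM which feature sets it claims. */
final class FeatureProbe {
    private FeatureProbe() {}

    /** Returns true if the DOM reached through JAXP reports the feature, in any version. */
    static boolean supports(String feature) {
        try {
            DOMImplementation impl = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .getDOMImplementation();
            // A null version asks about any version of the feature.
            return impl.hasFeature(feature, null);
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException("no usable JAXP parser", e);
        }
    }

    public static void main(String[] args) {
        String[] features = {"XML", "Events", "MutationEvents", "Traversal", "XPath", "LS"};
        for (String f : features) {
            System.out.println(f + ": " + supports(f));
        }
    }
}
```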
- <p>
- Note that while DOM does not specify its behavior in the
- face of concurrent access, this implementation does.
- Specifically:
- <ul>
- <li>If only one thread at a time accesses a Document,
- or if several threads cooperate for read-only access,
- then no concurrency conflicts will occur.</li>
- <li>If several threads mutate a given document
- (or send events using it) at the same time,
- there is currently no guarantee that
- they won't interfere with each other.</li>
- </ul>
- </p>
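<p>
Applications that do need several threads to mutate one document have to
serialize that access themselves. A minimal sketch, assuming a read/write
lock is an acceptable way to do so: readers share the lock (which relies on
the read-only guarantee stated above), writers take it exclusively. The
GuardedDocument class and its method names are hypothetical and are not part
of this package.
</p>

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;
import java.util.function.Consumer;
import java.util.function.Function;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;

/** Hypothetical wrapper that funnels all access to one Document through a lock. */
final class GuardedDocument {
    private final Document doc;
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();

    GuardedDocument(Document doc) {
        this.doc = doc;
    }

    /** Creates an empty document with a single root element. */
    static GuardedDocument newEmpty(String rootName) {
        try {
            Document d = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
            d.appendChild(d.createElement(rootName));
            return new GuardedDocument(d);
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException("no usable JAXP parser", e);
        }
    }

    /** Read-only access: any number of readers may run at once. */
    <T> T read(Function<Document, T> reader) {
        lock.readLock().lock();
        try {
            return reader.apply(doc);
        } finally {
            lock.readLock().unlock();
        }
    }

    /** Mutation (or event dispatch): one thread at a time, no readers. */
    void mutate(Consumer<Document> writer) {
        lock.writeLock().lock();
        try {
            writer.accept(doc);
        } finally {
            lock.writeLock().unlock();
        }
    }
}
```

<p>
The wrapper only helps if no code keeps a reference to the bare Document or
its nodes and touches them outside read or mutate.
</p>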
- <h3>Design Goals</h3>
- <p>
- A number of DOM implementations are available in Java, including
- commercial ones from Sun, IBM, Oracle, and DataChannel as well as
- noncommercial ones from Docuverse, OpenXML, and Silfide. Why have
- another? Some of the goals of this version:
- </p>
- <ul>
- <li>Advanced DOM support. This was the first generally available
- implementation of DOM Level 2 in Java, and one of the first Level 3
- and XPath implementations.</li>
- <li> Free Software. This one is distributed under the GPL (with
- "library exception") so it can be used with a different class of
- application.</li>
- <li>Second implementation syndrome. I can do it simpler this time
- around ... and heck, writing it only takes a bit over a day once you
- know your way around.</li>
- <li>Sanity check the then-current Last Call DOM draft. Best to find
- bugs early, when they're relatively fixable. Yes, bugs were found.</li>
- <li>Modularity. Most of the implementations mentioned above are part
- of huge packages; take all (including bugs, of which some have far
- too many), or take nothing. I prefer a menu approach, when possible.
- This code is standalone, not beholden to any particular parser or XSL
- or XPath code.</li>
- <li>OK, I'm a hacker, I like to write code.</li>
- </ul>
- <p>
- This also works with the GNU Compiler for Java (GCJ). GCJ promises
- to be quite the environment for programming Java, both directly and from
- C++ using the new CNI interfaces (which really use C++, unlike JNI). </p>
- <h3>Open Issues</h3>
- <p>At this writing:</p>
- <ul>
- <li>See below for some restrictions on the mutation event
- support ... some events aren't reported (and likely won't be).</li>
- <li>More testing and conformance work is needed.</li>
- <li>We need an XML Schema validator (actually we need validation in the DOM
- full stop).</li>
- </ul>
- <p>
- I ran a profiler a few times and removed some of the performance hotspots,
- but it's not tuned. Reporting mutation events, in particular, is
- rather costly -- it started at about a 40% penalty for appendChild calls,
- I've got it down around 12%, but it'll be hard to shrink it much further.
- The overall code size is relatively small, though you may want to be rid of
- many of the unused DOM interface classes (HTML, CSS, and so on).
- </p>
- <h2><a name="features">Features of this Package</a></h2>
- <p> Starting with DOM Level 2, you can really see that DOM is constructed
- as a bunch of optional modules around a core of either XML or HTML
- functionality. Different implementations will support different optional
- modules. This implementation provides a set of features that should be
- useful if you're not depending on the HTML functionality (lots of convenience
- functions that mostly don't buy much except API surface area) and user
- interface support. That is, browsers will want more -- but what they
- need should be cleanly layered over what's already here. </p>
- <h3> Core Feature Set: "XML" </h3>
- <p> This DOM implementation supports the "XML" feature set, which basically
- gets you four things over the bare core (which you're officially not supposed
- to implement except in conjunction with the "XML" or "HTML" feature). In
- order of decreasing utility, those four things are: </p> <ol>
- <li> ProcessingInstruction nodes. These are probably the most
- valuable thing. Handy little buggers, in part because all the APIs
- you need to use them are provided, and they're designed to let you
- escape XML document structure rules in controlled ways.</li>
- <li> CDATASection nodes. These are of limited utility since CDATA
- is just text that prints funny. These are of use to some sorts of
- applications, though I encourage folk to not use them. </li>
- <li> DocumentType nodes, and associated Notation and Entity nodes.
- These appear to be useless. Briefly, these "Type" nodes expose no
- typing information. They're only really usable to expose some lexical
- structure that almost every application needs to ignore. (XML editors
- might like to see them, but they need true typing information much more.)
- I strongly encourage people not to use these. </li>
- <li> EntityReference nodes can show up. These are actively annoying,
- since they add an extra level of hierarchy, are the cause of most of
- the complexity in attribute values, and their contents are immutable.
- Avoid these.</li>
- </ol>
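<p>
As an illustration of the first item, the sketch below creates a
ProcessingInstruction with the core DOM calls and places it ahead of the
root element, a position ordinary elements cannot occupy. It assumes a JAXP
parser is available to create the empty document; the PiExample class, the
"doc" element name and the stylesheet data are made up for the example.
</p>

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.ProcessingInstruction;

/** Hypothetical example of creating and finding a processing instruction. */
final class PiExample {
    private PiExample() {}

    /** Builds a document whose first child is an xml-stylesheet PI. */
    static Document withStylesheetPi(String href) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
            Node root = doc.appendChild(doc.createElement("doc"));
            ProcessingInstruction pi = doc.createProcessingInstruction(
                    "xml-stylesheet", "type=\"text/xsl\" href=\"" + href + "\"");
            // PIs may sit outside the root element; here one goes before it.
            doc.insertBefore(pi, root);
            return doc;
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException("no usable JAXP parser", e);
        }
    }

    /** Returns the target of the first top-level PI, or null if there is none. */
    static String firstPiTarget(Document doc) {
        for (Node n = doc.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n.getNodeType() == Node.PROCESSING_INSTRUCTION_NODE) {
                return ((ProcessingInstruction) n).getTarget();
            }
        }
        return null;
    }
}
```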
- <h3> Optional Feature Sets: "Events", and friends </h3>
- <p> Events may be one of the more interesting new features in Level 2.
- This package provides the core feature set and exposes mutation events.
- No gooey events though; if you want that, write a layered implementation! </p>
- <p> Three mutation events aren't currently generated:</p> <ul>
- <li> <em>DOMSubtreeModified</em> is poorly specified. Think of this
- as generating one such event around the time of finalization, which
- is a fully conformant implementation. This implementation is exactly
- as useful as that one. </li>
- <li> <em>DOMNodeRemovedFromDocument</em> and
- <em>DOMNodeInsertedIntoDocument</em> are supposed to get sent to
- every node in a subtree that gets removed or inserted (respectively).
- This can be <em>extremely costly</em>, and the removal and insertion
- processing is already significantly slower due to event reporting.
- It's much easier, and more efficient, to have a listener higher in the
- tree watch removal and insertion events through the bubbling or capture
- mechanisms, than it is to watch for these two events.</li>
- </ul>
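<p>
The approach recommended in the last item, one listener placed higher in
the tree that relies on bubbling, might look roughly like this. It is a
sketch under assumptions: the DOM's nodes are assumed to implement the
standard EventTarget interface and to report DOMNodeInserted and
DOMNodeRemoved, and the MutationWatch class and element names are
hypothetical.
</p>

```java
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.events.EventListener;
import org.w3c.dom.events.EventTarget;

/** Hypothetical example: one listener on an ancestor sees changes below it. */
final class MutationWatch {
    private MutationWatch() {}

    private static Document newDocument() {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException("no usable JAXP parser", e);
        }
    }

    /** Inserts and removes a node two levels down; returns what the root listener saw. */
    static List<String> demo() {
        Document doc = newDocument();
        Element root = doc.createElement("root");
        Element section = doc.createElement("section");
        doc.appendChild(root);
        root.appendChild(section);

        List<String> log = new ArrayList<>();
        EventListener listener =
                e -> log.add(e.getType() + ":" + ((Node) e.getTarget()).getNodeName());

        // Assumption: element nodes implement EventTarget ("Events" feature).
        EventTarget target = (EventTarget) root;
        // useCapture = false: the listener runs as events bubble up to root.
        target.addEventListener("DOMNodeInserted", listener, false);
        target.addEventListener("DOMNodeRemoved", listener, false);

        Element item = doc.createElement("item");
        section.appendChild(item);   // reported as DOMNodeInserted on item
        section.removeChild(item);   // reported as DOMNodeRemoved on item
        return log;
    }
}
```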
- <p> In addition, certain kinds of attribute modification aren't reported.
- A fix is known, but it couldn't report the previous value of the attribute.
- More work could fix all of this (as well as reduce the generally high cost
- of childful attributes), but that's not been done yet. </p>
- <p> Also, note that it is a <em>Bad Thing™</em> to have the listener
- for a mutation event change the ancestry for the target of that event.
- Or to prevent mutation events from bubbling to where they're needed.
- Just don't do those, OK? </p>
- <p> As an experimental feature (named "USER-Events"), you can provide
- your own "user" events. Just name them anything starting with "USER-"
- and you're set. Dispatch them through bubbling, capturing, or
- whatever takes your fancy. One important thing you can't currently do is
- pass any data (like an object) with those events. Maybe later there
- will be a "UserEvent" interface letting you get some substantial use
- out of this mechanism even if you're not "inside" of a DOM package.</p>
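<p> A user event can be created and dispatched with the standard DOM Level 2
event calls. The sketch below makes a few assumptions: that createEvent
accepts "Events" as the event module name, that the event name
"USER-refresh" is acceptable (it is invented here), and that nodes implement
EventTarget. The UserEventDemo class is hypothetical. </p>

```java
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.events.DocumentEvent;
import org.w3c.dom.events.Event;
import org.w3c.dom.events.EventTarget;

/** Hypothetical example: dispatch a "USER-" event and catch it while it bubbles. */
final class UserEventDemo {
    private UserEventDemo() {}

    /** Returns a record of where the user event was observed. */
    static List<String> demo() {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException("no usable JAXP parser", e);
        }
        Element root = doc.createElement("root");
        Element child = doc.createElement("child");
        doc.appendChild(root);
        root.appendChild(child);

        List<String> seen = new ArrayList<>();
        // Listen on the parent, in the bubbling phase.
        ((EventTarget) root).addEventListener("USER-refresh",
                e -> seen.add(e.getType() + "@" + ((Node) e.getCurrentTarget()).getNodeName()),
                false);

        // Assumption: "Events" is accepted as the module name for plain events.
        Event event = ((DocumentEvent) doc).createEvent("Events");
        event.initEvent("USER-refresh", true, false); // bubbles, not cancelable
        ((EventTarget) child).dispatchEvent(event);
        return seen;
    }
}
```

<p> As noted above, the event itself carries no payload, so any data has to
travel by some other route, such as state the listener can already reach. </p>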
- <p> You can create and send HTML events. Ditto UIEvents. Since DOM
- doesn't require a UI, it's the UI's job to send them; perhaps that's
- part of your application. </p>
- <p><em>This package may be built without the ability to report mutation
- events, gaining a significant speedup in DOM construction time. However,
- if that is done then certain other features -- notably node iterators
- and getElementsByTagName -- will not be available.</em></p>
- <h3> Optional Feature: "Traversal" </h3>
- <p> Each DOM node has all you need to walk to everything connected
- to that node. Lightweight, efficient utilities are easily layered on
- top of just the core APIs. </p>
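<p> For example, a depth-first walk needs nothing beyond getFirstChild and
getNextSibling. This is a minimal sketch; the CoreWalk class is hypothetical,
and it assumes a JAXP parser is available to build the tree from a string. </p>

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

/** Hypothetical utility: walks a tree using only core Node methods. */
final class CoreWalk {
    private CoreWalk() {}

    /** Appends the names of all elements at or below node, in document order. */
    static void collectElementNames(Node node, List<String> out) {
        if (node.getNodeType() == Node.ELEMENT_NODE) {
            out.add(node.getNodeName());
        }
        for (Node child = node.getFirstChild(); child != null; child = child.getNextSibling()) {
            collectElementNames(child, out);
        }
    }

    /** Parses the text and returns its element names in document order. */
    static List<String> elementNames(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            List<String> names = new ArrayList<>();
            collectElementNames(doc, names);
            return names;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("could not parse document", e);
        }
    }
}
```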
- <p> Traversal APIs are an optional part of DOM Level 2, providing
- a not-so-lightweight way to walk over DOM trees, if your application
- didn't already have such utilities for use with data represented via
- DOM. Implementing this helped debug the (optional) event and mutation
- event subsystems, so it's provided here. </p>
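<p> Using the Traversal API itself, the same kind of walk can be written with
a NodeIterator. The sketch assumes the Document implements the standard
DocumentTraversal interface (that is, the "Traversal" feature is present);
the IteratorWalk class is hypothetical. </p>

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.traversal.DocumentTraversal;
import org.w3c.dom.traversal.NodeFilter;
import org.w3c.dom.traversal.NodeIterator;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

/** Hypothetical example: list element names with a DOM Level 2 NodeIterator. */
final class IteratorWalk {
    private IteratorWalk() {}

    /** Parses the text and returns its element names in document order. */
    static List<String> elementNames(String xml) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("could not parse document", e);
        }
        // Assumption: the Document supports the optional "Traversal" feature.
        NodeIterator it = ((DocumentTraversal) doc).createNodeIterator(
                doc.getDocumentElement(),   // start at the root element
                NodeFilter.SHOW_ELEMENT,    // visit elements only
                null,                       // no additional filter
                false);                     // do not expand entity references
        List<String> names = new ArrayList<>();
        for (Node n = it.nextNode(); n != null; n = it.nextNode()) {
            names.add(n.getNodeName());
        }
        it.detach(); // release the iterator once finished
        return names;
    }
}
```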
- <p> At this writing, the "TreeWalker" interface isn't implemented. </p>
- <h2><a name='avoid'>DOM Functionality to Avoid</a></h2>
- <p> For what appear to be a combination of historical and "committee
- logic" reasons, DOM has a number of <em>features which I strongly advise
- you to avoid using</em> in your library and application code. These
- include the following types of DOM nodes; see the documentation for the
- implementation class for more information: <ul>
- <li> CDATASection
- (<a href='DomCDATA.html'>DomCDATA</a> class)
- ... use normal Text nodes instead, so you don't have to make
- every algorithm recognize multiple types of character data
- <li> DocumentType
- (<a href='DomDoctype.html'>DomDoctype</a> class)
- ... if this held actual typing information, it might be useful
- <li> Entity
- (<a href='DomEntity.html'>DomEntity</a> class)
- ... neither parsed nor unparsed entities work well in DOM; it
- won't even tell you which attributes identify unparsed entities
- <li> EntityReference
- (<a href='DomEntityReference.html'>DomEntityReference</a> class)
- ... permitted implementation variances are extreme, all children
- are readonly, and these can interact poorly with namespaces
- <li> Notation
- (<a href='DomNotation.html'>DomNotation</a> class)
- ... only really usable with unparsed entities (which aren't well
- supported; see above) or perhaps with PIs after the DTD, not with
- NOTATION attributes
- </ul>
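<p> When documents arrive through a JAXP parser, two standard
DocumentBuilderFactory settings can keep CDATASection and EntityReference
nodes out of the tree in the first place. This is a sketch assuming JAXP is
the route by which documents are built; the PlainTextParse class is
hypothetical. </p>

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

/** Hypothetical example: ask the parser for plain Text instead of CDATA and entity references. */
final class PlainTextParse {
    private PlainTextParse() {}

    static Document parse(String xml) {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setCoalescing(true);              // CDATA sections become ordinary Text
        factory.setExpandEntityReferences(true);  // entity references are replaced by their content
        try {
            return factory.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("could not parse document", e);
        }
    }

    /** Returns true if any direct child of parent is a CDATASection node. */
    static boolean hasCdataChild(Node parent) {
        for (Node n = parent.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n.getNodeType() == Node.CDATA_SECTION_NODE) {
                return true;
            }
        }
        return false;
    }
}
```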
- <p> If you really need to use unparsed entities or notations, use SAX;
- it offers better support for all DTD-related functionality.
- It also exposes actual
- document typing information (such as element content models).</p>
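<p> As a rough illustration of the SAX route, the standard DTDHandler
interface reports each notation and unparsed entity declaration as the DTD is
read. The sketch assumes a JAXP SAXParserFactory is available; the
DtdDeclarations class and the format of the strings it collects are invented
for the example. </p>

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.DTDHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;

/** Hypothetical example: collect notation and unparsed entity declarations via SAX. */
final class DtdDeclarations {
    private DtdDeclarations() {}

    /** Parses the text and returns one line per declaration reported to the DTDHandler. */
    static List<String> list(String xml) {
        List<String> found = new ArrayList<>();
        DTDHandler handler = new DTDHandler() {
            @Override
            public void notationDecl(String name, String publicId, String systemId) {
                found.add("notation " + name);
            }

            @Override
            public void unparsedEntityDecl(String name, String publicId,
                                           String systemId, String notationName) {
                found.add("unparsed entity " + name + " (" + notationName + ")");
            }
        };
        try {
            XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
            reader.setDTDHandler(handler);
            reader.parse(new InputSource(new StringReader(xml)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("could not parse document", e);
        }
        return found;
    }
}
```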
- <p> Also, when accessing attribute values, use methods that provide their
- values as single strings, rather than those which expose value substructure
- (Text and EntityReference nodes). (See the <a href='DomAttr.html'>DomAttr</a>
- documentation for more information.) </p>
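<p> In practice that means calling Element.getAttribute (or Attr.getValue),
which returns the whole value as one string, instead of walking the children
of the Attr node. A minimal sketch, assuming a JAXP parser builds the tree;
the AttrValues class is hypothetical. </p>

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

/** Hypothetical example: read an attribute value as a single string. */
final class AttrValues {
    private AttrValues() {}

    /** Returns the named attribute of the root element, or "" if it is absent. */
    static String rootAttribute(String xml, String name) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            // One call, one string: no Text or EntityReference children to inspect.
            return doc.getDocumentElement().getAttribute(name);
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("could not parse document", e);
        }
    }
}
```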
- <p> Note that many of these features were provided as partial support for
- editor functionality (including the incomplete DTD access). Full editor
- functionality requires access to potentially malformed lexical structure,
- at the level of unparsed tokens and below. Access at such levels is so
- complex that using it in non-editor applications sacrifices all the
- benefits of XML; editor applications need extremely specialized APIs. </p>
- <p> (This isn't a slam against DTDs, note; only against the broken support
- for them in DOM. Even despite inclusion of some dubious SGML legacy features
- such as notations and unparsed entities,
- and the ongoing proliferation of alternative schema and validation tools,
- DTDs are still the most widely adopted tool
- to constrain XML document structure.
- Alternative schemes generally focus on data transfer style
- applications; open document architectures comparable to
- DocBook 4.0 don't yet exist in the schema world.
- Feel free to use DTDs; just don't expect DOM to help you.) </p>
- </body>
- </html>