sxml.texi 40 KB

12345678910111213141516171819202122232425262728293031323334353637383940414243444546474849505152535455565758596061626364656667686970717273747576777879808182838485868788899091929394959697989910010110210310410510610710810911011111211311411511611711811912012112212312412512612712812913013113213313413513613713813914014114214314414514614714814915015115215315415515615715815916016116216316416516616716816917017117217317417517617717817918018118218318418518618718818919019119219319419519619719819920020120220320420520620720820921021121221321421521621721821922022122222322422522622722822923023123223323423523623723823924024124224324424524624724824925025125225325425525625725825926026126226326426526626726826927027127227327427527627727827928028128228328428528628728828929029129229329429529629729829930030130230330430530630730830931031131231331431531631731831932032132232332432532632732832933033133233333433533633733833934034134234334434534634734834935035135235335435535635735835936036136236336436536636736836937037137237337437537637737837938038138238338438538638738838939039139239339439539639739839940040140240340440540640740840941041141241341441541641741841942042142242342442542642742842943043143243343443543643743843944044144244344444544644744844945045145245345445545645745845946046146246346446546646746846947047147247347447547647747847948048148248348448548648748848949049149249349449549649749849950050150250350450550650750850951051151251351451551651751851952052152252352452552652752852953053153253353453553653753853954054154254354454554654754854955055155255355455555655755855956056156256356456556656756856957057157257357457557657757857958058158258358458558658758858959059159259359459559659759859960060160260360460560660760860961061161261361461561661761861962062162262362462562662762862963063163263363463563663763863964064164264364464564664764864965065165265365465565665765865966066166266366466566666766866967067167267367467567667767867968068168268368468568668768868969069169269369469569669769869970070170270370470570670770870971071171271371471571671771871972072172272372472572672772872973073173273373473573673773873974074174274374474574674774874975075175275375475575675775875976076176276376476576676776876977077177277377477577677777877978078178278378478578678778878979079179279379479579679779879980080180280380480580680780880981081181281381481581681781881982082182282382482582682782882983083183283383483583683783883984084184284384484584684784884985085185285385485585685785885986086186286386486586686786886987087187287387487587687787887988088188288388488588688788888989089189289389489589689789889990090190290390490590690790890991091191291391491591691791891992092192292392492592692792892993093193293393493593693793893994094194294394494594694794894995095195295395495595695795895996096196296396496596696796896997097197297397497597697797897998098198298398498598698798898999099199299399499599699799899910001001100210031004100510061007100810091010101110121013101410151016101710181019102010211022102310241025102610271028102910301031103210331034103510361037103810391040104110421043104410451046104710481049105010511052105310541055105610571058105910601061106210631064106510661067106810691070107110721073107410751076107710781079108010811082108310841085108610871088108910901091109210931094109510961097109810991100110111021103110411051106110711081109111011111112111311141115111611171118111911201121112211231124112511261127112811291130113111321133113411351136113711381139114011411142114311441145
  1. @c -*-texinfo-*-
  2. @c This is part of the GNU Guile Reference Manual.
  3. @c Copyright (C) 2013, 2017, 2021 Free Software Foundation, Inc.
  4. @c See the file guile.texi for copying conditions.
  5. @c SXPath documentation based on SXPath.scm by Oleg Kiselyov,
  6. @c which is in the public domain according to <http://okmij.org/ftp/>
  7. @c and <http://ssax.sourceforge.net/>.
  8. @node SXML
  9. @section SXML
  10. SXML is a native representation of XML in terms of standard Scheme data
  11. types: lists, symbols, and strings. For example, the simple XML
  12. fragment:
  13. @example
  14. <parrot type="African Grey"><name>Alfie</name></parrot>
  15. @end example
  16. may be represented with the following SXML:
  17. @example
  18. (parrot (@@ (type "African Grey")) (name "Alfie"))
  19. @end example
  20. SXML is very general, and is capable of representing all of XML.
  21. Formally, this means that SXML is a conforming implementation of the
  22. @uref{http://www.w3.org/TR/xml-infoset/,XML Information Set} standard.
  23. Guile includes several facilities for working with XML and SXML:
  24. parsers, serializers, and transformers.
  25. @menu
  26. * SXML Overview:: XML, as it was meant to be
  27. * Reading and Writing XML:: Convenient XML parsing and serializing
  28. * SSAX:: Custom functional-style XML parsers
  29. * Transforming SXML:: Munging SXML with @code{pre-post-order}
  30. * SXML Tree Fold:: Fold-based SXML transformations
  31. * SXPath:: XPath for SXML
  32. * sxml ssax input-parse:: The SSAX tokenizer, optimized for Guile
  33. * sxml apply-templates:: A more XSLT-like approach to SXML transformations
  34. @end menu
  35. @node SXML Overview
  36. @subsection SXML Overview
  37. (This section needs to be written; volunteers welcome.)
  38. @node Reading and Writing XML
  39. @subsection Reading and Writing XML
  40. The @code{(sxml simple)} module presents a basic interface for parsing
  41. XML from a port into the Scheme SXML format, and for serializing it back
  42. to text.
  43. @example
  44. (use-modules (sxml simple))
  45. @end example
  46. @deffn {Scheme Procedure} xml->sxml [string-or-port] [#:namespaces='()] @
  47. [#:declare-namespaces?=#t] [#:trim-whitespace?=#f] @
  48. [#:entities='()] [#:default-entity-handler=#f] @
  49. [#:doctype-handler=#f]
  50. Use SSAX to parse an XML document into SXML. Takes one optional
  51. argument, @var{string-or-port}, which defaults to the current input
  52. port. Returns the resulting SXML document. If @var{string-or-port} is
  53. a port, it will be left pointing at the next available character in the
  54. port.
  55. @end deffn
  56. As is normal in SXML, XML elements parse as tagged lists. Attributes,
  57. if any, are placed after the tag, within an @code{@@} element. The root
  58. of the resulting XML will be contained in a special tag, @code{*TOP*}.
  59. This tag will contain the root element of the XML, but also any prior
  60. processing instructions.
  61. @example
  62. (xml->sxml "<foo/>")
  63. @result{} (*TOP* (foo))
  64. (xml->sxml "<foo>text</foo>")
  65. @result{} (*TOP* (foo "text"))
  66. (xml->sxml "<foo kind=\"bar\">text</foo>")
  67. @result{} (*TOP* (foo (@@ (kind "bar")) "text"))
  68. (xml->sxml "<?xml version=\"1.0\"?><foo/>")
  69. @result{} (*TOP* (*PI* xml "version=\"1.0\"") (foo))
  70. @end example
  71. All namespaces in the XML document must be declared, via @code{xmlns}
  72. attributes. SXML elements built from non-default namespaces will have
  73. their tags prefixed with their URI. Users can specify custom prefixes
  74. for certain namespaces with the @code{#:namespaces} keyword argument to
  75. @code{xml->sxml}.
  76. @example
  77. (xml->sxml "<foo xmlns=\"http://example.org/ns1\">text</foo>")
  78. @result{} (*TOP* (http://example.org/ns1:foo "text"))
  79. (xml->sxml "<foo xmlns=\"http://example.org/ns1\">text</foo>"
  80. #:namespaces '((ns1 . "http://example.org/ns1")))
  81. @result{} (*TOP* (ns1:foo "text"))
  82. (xml->sxml "<foo xmlns:bar=\"http://example.org/ns2\"><bar:baz/></foo>"
  83. #:namespaces '((ns2 . "http://example.org/ns2")))
  84. @result{} (*TOP* (foo (ns2:baz)))
  85. @end example
  86. By default, namespaces passed to @code{xml->sxml} are treated as if they
  87. were declared on the root element. Passing a false
  88. @code{#:declare-namespaces?} argument will disable this behavior,
  89. requiring in-document declarations of namespaces before use..
  90. @example
  91. (xml->sxml "<foo><ns2:baz/></foo>"
  92. #:namespaces '((ns2 . "http://example.org/ns2")))
  93. @result{} (*TOP* (foo (ns2:baz)))
  94. (xml->sxml "<foo><ns2:baz/></foo>"
  95. #:namespaces '((ns2 . "http://example.org/ns2"))
  96. #:declare-namespaces? #f)
  97. @result{} error: undeclared namespace: `bar'
  98. @end example
  99. By default, all whitespace in XML is significant. Passing the
  100. @code{#:trim-whitespace?} keyword argument to @code{xml->sxml} will trim
  101. whitespace in front, behind and between elements, treating it as
  102. ``unsignificant''. Whitespace in text fragments is left alone.
  103. @example
  104. (xml->sxml "<foo>\n<bar> Alfie the parrot! </bar>\n</foo>")
  105. @result{} (*TOP* (foo "\n" (bar " Alfie the parrot! ") "\n"))
  106. (xml->sxml "<foo>\n<bar> Alfie the parrot! </bar>\n</foo>"
  107. #:trim-whitespace? #t)
  108. @result{} (*TOP* (foo (bar " Alfie the parrot! ")))
  109. @end example
  110. Parsed entities may be declared with the @code{#:entities} keyword
  111. argument, or handled with the @code{#:default-entity-handler}. By
  112. default, only the standard @code{&lt;}, @code{&gt;}, @code{&amp;},
  113. @code{&apos;} and @code{&quot;} entities are defined, as well as the
  114. @code{&#@var{N};} and @code{&#x@var{N};} (decimal and hexadecimal)
  115. numeric character entities.
  116. @example
  117. (xml->sxml "<foo>&amp;</foo>")
  118. @result{} (*TOP* (foo "&"))
  119. (xml->sxml "<foo>&nbsp;</foo>")
  120. @result{} error: undefined entity: nbsp
  121. (xml->sxml "<foo>&#xA0;</foo>")
  122. @result{} (*TOP* (foo "\xa0"))
  123. (xml->sxml "<foo>&nbsp;</foo>"
  124. #:entities '((nbsp . "\xa0")))
  125. @result{} (*TOP* (foo "\xa0"))
  126. (xml->sxml "<foo>&nbsp; &foo;</foo>"
  127. #:default-entity-handler
  128. (lambda (port name)
  129. (case name
  130. ((nbsp) "\xa0")
  131. (else
  132. (format (current-warning-port)
  133. "~a:~a:~a: undefined entitity: ~a\n"
  134. (or (port-filename port) "<unknown file>")
  135. (port-line port) (port-column port)
  136. name)
  137. (symbol->string name)))))
  138. @print{} <unknown file>:0:17: undefined entitity: foo
  139. @result{} (*TOP* (foo "\xa0 foo"))
  140. @end example
  141. By default, @code{xml->sxml} skips over the @code{<!DOCTYPE>}
  142. declaration, if any. This behavior can be overridden with the
  143. @code{#:doctype-handler} argument, which should be a procedure of three
  144. arguments: the @dfn{docname} (a symbol), @dfn{systemid} (a string), and
  145. the internal doctype subset (as a string or @code{#f} if not present).
  146. The handler should return keyword arguments as multiple values, as if it
  147. were calling its continuation with keyword arguments. The continuation
  148. accepts the @code{#:entities} and @code{#:namespaces} keyword arguments,
  149. in the same format that @code{xml->sxml} itself takes. These entities
  150. and namespaces will be prepended to those given to the @code{xml->sxml}
  151. invocation.
  152. @example
  153. (define (handle-foo docname systemid internal-subset)
  154. (case docname
  155. ((foo)
  156. (values #:entities '((greets . "<i>Hello, world!</i>"))))
  157. (else
  158. (values))))
  159. (xml->sxml "<!DOCTYPE foo><p>&greets;</p>"
  160. #:doctype-handler handle-foo)
  161. @result{} (*TOP* (p (i "Hello, world!")))
  162. @end example
  163. If the document has no doctype declaration, the @var{doctype-handler} is
  164. invoked with @code{#f} for the three arguments.
  165. In the future, the continuation may accept other keyword arguments, for
  166. example to validate the parsed SXML against the doctype.
  167. @deffn {Scheme Procedure} sxml->xml tree [port]
  168. Serialize the SXML tree @var{tree} as XML. The output will be written to
  169. the current output port, unless the optional argument @var{port} is
  170. present.
  171. @end deffn
  172. @deffn {Scheme Procedure} sxml->string sxml
  173. Detag an sxml tree @var{sxml} into a string. Does not perform any
  174. formatting.
  175. @end deffn
  176. @node SSAX
  177. @subsection SSAX: A Functional XML Parsing Toolkit
  178. Guile's XML parser is based on Oleg Kiselyov's powerful XML parsing
  179. toolkit, SSAX.
  180. @subsubsection History
  181. Back in the 1990s, when the world was young again and XML was the
  182. solution to all of its problems, there were basically two kinds of XML
  183. parsers out there: DOM parsers and SAX parsers.
  184. A DOM parser reads through an entire XML document, building up a tree of
  185. ``DOM objects'' representing the document structure. They are very easy
  186. to use, but sometimes you don't actually want all of the information in
  187. a document; building an object tree is not necessary if all you want to
  188. do is to count word frequencies in a document, for example.
  189. SAX parsers were created to give the programmer more control on the
  190. parsing process. A programmer gives the SAX parser a number of
  191. ``callbacks'': functions that will be called on various features of the
  192. XML stream as they are encountered. SAX parsers are more efficient, but
  193. much harder to use, as users typically have to manually maintain a
  194. stack of open elements.
  195. Kiselyov realized that the SAX programming model could be made much
  196. simpler if the callbacks were formulated not as a linear fold across the
  197. features of the XML stream, but as a @emph{tree fold} over the structure
  198. implicit in the XML. In this way, the user has a very convenient,
  199. functional-style interface that can still generate optimal parsers.
  200. The @code{xml->sxml} interface from the @code{(sxml simple)} module is a
  201. DOM-style parser built using SSAX, though it returns SXML instead of DOM
  202. objects.
  203. @subsubsection Implementation
  204. @code{(sxml ssax)} is a package of low-to-high level lexing and parsing
  205. procedures that can be combined to yield a SAX, a DOM, a validating
  206. parser, or a parser intended for a particular document type. The
  207. procedures in the package can be used separately to tokenize or parse
  208. various pieces of XML documents. The package supports XML Namespaces,
  209. internal and external parsed entities, user-controlled handling of
  210. whitespace, and validation. This module therefore is intended to be a
  211. framework, a set of ``Lego blocks'' you can use to build a parser
  212. following any discipline and performing validation to any degree. As an
  213. example of the parser construction, the source file includes a
  214. semi-validating SXML parser.
  215. SSAX has a ``sequential'' feel of SAX yet a ``functional style'' of DOM.
  216. Like a SAX parser, the framework scans the document only once and
  217. permits incremental processing. An application that handles document
  218. elements in order can run as efficiently as possible. @emph{Unlike} a
  219. SAX parser, the framework does not require an application register
  220. stateful callbacks and surrender control to the parser. Rather, it is
  221. the application that can drive the framework -- calling its functions to
  222. get the current lexical or syntax element. These functions do not
  223. maintain or mutate any state save the input port. Therefore, the
  224. framework permits parsing of XML in a pure functional style, with the
  225. input port being a monad (or a linear, read-once parameter).
  226. Besides the @var{port}, there is another monad -- @var{seed}. Most of
  227. the middle- and high-level parsers are single-threaded through the
  228. @var{seed}. The functions of this framework do not process or affect
  229. the @var{seed} in any way: they simply pass it around as an instance of
  230. an opaque datatype. User functions, on the other hand, can use the seed
  231. to maintain user's state, to accumulate parsing results, etc. A user
  232. can freely mix their own functions with those of the framework. On the
  233. other hand, the user may wish to instantiate a high-level parser:
  234. @code{SSAX:make-elem-parser} or @code{SSAX:make-parser}. In the latter
  235. case, the user must provide functions of specific signatures, which are
  236. called at predictable moments during the parsing: to handle character
  237. data, element data, or processing instructions (PI). The functions are
  238. always given the @var{seed}, among other parameters, and must return the
  239. new @var{seed}.
  240. From a functional point of view, XML parsing is a combined
  241. pre-post-order traversal of a ``tree'' that is the XML document itself.
  242. This down-and-up traversal tells the user about an element when its
  243. start tag is encountered. The user is notified about the element once
  244. more, after all element's children have been handled. The process of
  245. XML parsing therefore is a fold over the raw XML document. Unlike a
  246. fold over trees defined in [1], the parser is necessarily
  247. single-threaded -- obviously as elements in a text XML document are laid
  248. down sequentially. The parser therefore is a tree fold that has been
  249. transformed to accept an accumulating parameter [1,2].
  250. Formally, the denotational semantics of the parser can be expressed as
  251. @smallexample
  252. parser:: (Start-tag -> Seed -> Seed) ->
  253. (Start-tag -> Seed -> Seed -> Seed) ->
  254. (Char-Data -> Seed -> Seed) ->
  255. XML-text-fragment -> Seed -> Seed
  256. parser fdown fup fchar "<elem attrs> content </elem>" seed
  257. = fup "<elem attrs>" seed
  258. (parser fdown fup fchar "content" (fdown "<elem attrs>" seed))
  259. parser fdown fup fchar "char-data content" seed
  260. = parser fdown fup fchar "content" (fchar "char-data" seed)
  261. parser fdown fup fchar "elem-content content" seed
  262. = parser fdown fup fchar "content" (
  263. parser fdown fup fchar "elem-content" seed)
  264. @end smallexample
  265. Compare the last two equations with the left fold
  266. @smallexample
  267. fold-left kons elem:list seed = fold-left kons list (kons elem seed)
  268. @end smallexample
  269. The real parser created by @code{SSAX:make-parser} is slightly more
  270. complicated, to account for processing instructions, entity references,
  271. namespaces, processing of document type declaration, etc.
  272. The XML standard document referred to in this module is
  273. @uref{http://www.w3.org/TR/1998/REC-xml-19980210.html}
  274. The present file also defines a procedure that parses the text of an XML
  275. document or of a separate element into SXML, an S-expression-based model
  276. of an XML Information Set. SXML is also an Abstract Syntax Tree of an
  277. XML document. SXML is similar but not identical to DOM; SXML is
  278. particularly suitable for Scheme-based XML/HTML authoring, SXPath
  279. queries, and tree transformations. See SXML.html for more details.
  280. SXML is a term implementation of evaluation of the XML document [3].
  281. The other implementation is context-passing.
  282. The present frameworks fully supports the XML Namespaces Recommendation:
  283. @uref{http://www.w3.org/TR/REC-xml-names/}.
  284. Other links:
  285. @table @asis
  286. @item [1]
  287. Jeremy Gibbons, Geraint Jones, "The Under-appreciated Unfold," Proc.
  288. ICFP'98, 1998, pp. 273-279.
  289. @item [2]
  290. Richard S. Bird, The promotion and accumulation strategies in
  291. transformational programming, ACM Trans. Progr. Lang. Systems,
  292. 6(4):487-504, October 1984.
  293. @item [3]
  294. Ralf Hinze, "Deriving Backtracking Monad Transformers," Functional
  295. Pearl. Proc ICFP'00, pp. 186-197.
  296. @end table
  297. @subsubsection Usage
  298. @deffn {Scheme Procedure} current-ssax-error-port
  299. @end deffn
  300. @deffn {Scheme Procedure} with-ssax-error-to-port port thunk
  301. @end deffn
  302. @deffn {Scheme Procedure} xml-token? _
  303. @verbatim
  304. -- Scheme Procedure: pair? x
  305. Return `#t' if X is a pair; otherwise return `#f'.
  306. @end verbatim
  307. @end deffn
  308. @deffn {Scheme Syntax} xml-token-kind token
  309. @end deffn
  310. @deffn {Scheme Syntax} xml-token-head token
  311. @end deffn
  312. @deffn {Scheme Procedure} make-empty-attlist
  313. @end deffn
  314. @deffn {Scheme Procedure} attlist-add attlist name-value
  315. @end deffn
  316. @deffn {Scheme Procedure} attlist-null? x
  317. Return @code{#t} if @var{x} is the empty list, else @code{#f}.
  318. @end deffn
  319. @deffn {Scheme Procedure} attlist-remove-top attlist
  320. @end deffn
  321. @deffn {Scheme Procedure} attlist->alist attlist
  322. @end deffn
  323. @deffn {Scheme Procedure} attlist-fold kons knil lis1
  324. @end deffn
  325. @deffn {Scheme Procedure} define-parsed-entity! entity str
  326. Define a new parsed entity. @var{entity} should be a symbol.
  327. Instances of &@var{entity}; in XML text will be replaced with the string
  328. @var{str}, which will then be parsed.
  329. @end deffn
  330. @deffn {Scheme Procedure} reset-parsed-entity-definitions!
  331. Restore the set of parsed entity definitions to its initial state.
  332. @end deffn
  333. @deffn {Scheme Procedure} ssax:uri-string->symbol uri-str
  334. @end deffn
  335. @deffn {Scheme Procedure} ssax:skip-internal-dtd port
  336. @end deffn
  337. @deffn {Scheme Procedure} ssax:read-pi-body-as-string port
  338. @end deffn
  339. @deffn {Scheme Procedure} ssax:reverse-collect-str-drop-ws fragments
  340. @end deffn
  341. @deffn {Scheme Procedure} ssax:read-markup-token port
  342. @end deffn
  343. @deffn {Scheme Procedure} ssax:read-cdata-body port str-handler seed
  344. @end deffn
  345. @deffn {Scheme Procedure} ssax:read-char-ref port
  346. @end deffn
  347. @deffn {Scheme Procedure} ssax:read-attributes port entities
  348. @end deffn
  349. @deffn {Scheme Procedure} ssax:complete-start-tag tag-head port elems entities namespaces
  350. @end deffn
  351. @deffn {Scheme Procedure} ssax:read-external-id port
  352. @end deffn
  353. @deffn {Scheme Procedure} ssax:read-char-data port expect-eof? str-handler seed
  354. @end deffn
  355. @deffn {Scheme Procedure} ssax:xml->sxml port namespace-prefix-assig
  356. @end deffn
  357. @deffn {Scheme Syntax} ssax:make-parser . kw-val-pairs
  358. @end deffn
  359. @deffn {Scheme Syntax} ssax:make-pi-parser orig-handlers
  360. @end deffn
  361. @deffn {Scheme Syntax} ssax:make-elem-parser my-new-level-seed my-finish-element my-char-data-handler my-pi-handlers
  362. @end deffn
  363. @node Transforming SXML
  364. @subsection Transforming SXML
  365. @subsubsection Overview
  366. @heading SXML expression tree transformers
  367. @subheading Pre-Post-order traversal of a tree and creation of a new tree
  368. @smallexample
  369. pre-post-order:: <tree> x <bindings> -> <new-tree>
  370. @end smallexample
  371. where
  372. @smallexample
  373. <bindings> ::= (<binding> ...)
  374. <binding> ::= (<trigger-symbol> *preorder* . <handler>) |
  375. (<trigger-symbol> *macro* . <handler>) |
  376. (<trigger-symbol> <new-bindings> . <handler>) |
  377. (<trigger-symbol> . <handler>)
  378. <trigger-symbol> ::= XMLname | *text* | *default*
  379. <handler> :: <trigger-symbol> x [<tree>] -> <new-tree>
  380. @end smallexample
  381. The @code{pre-post-order} function, in the @code{(sxml transform)}
  382. module, visits the nodes and nodelists pre-post-order (depth-first).
  383. For each @code{<Node>} of the form @code{(@var{name} <Node> ...)}, it
  384. looks up an association with the given @var{name} among its
  385. @var{<bindings>}. If failed, @code{pre-post-order} tries to locate a
  386. @code{*default*} binding. It's an error if the latter attempt fails as
  387. well. Having found a binding, the @code{pre-post-order} function first
  388. checks to see if the binding is of the form
  389. @smallexample
  390. (<trigger-symbol> *preorder* . <handler>)
  391. @end smallexample
  392. If it is, the handler is 'applied' to the current node. Otherwise, the
  393. pre-post-order function first calls itself recursively for each child of
  394. the current node, with @var{<new-bindings>} prepended to the
  395. @var{<bindings>} in effect. The result of these calls is passed to the
  396. @var{<handler>} (along with the head of the current @var{<Node>}). To be
  397. more precise, the handler is _applied_ to the head of the current node
  398. and its processed children. The result of the handler, which should also
  399. be a @code{<tree>}, replaces the current @var{<Node>}. If the current
  400. @var{<Node>} is a text string or other atom, a special binding with a
  401. symbol @code{*text*} is looked up.
  402. A binding can also be of a form
  403. @smallexample
  404. (<trigger-symbol> *macro* . <handler>)
  405. @end smallexample
  406. This is equivalent to @code{*preorder*} described above. However, the
  407. result is re-processed again, with the current stylesheet.
  408. @subsubsection Usage
  409. @deffn {Scheme Procedure} SRV:send-reply . fragments
  410. Output the @var{fragments} to the current output port.
  411. The fragments are a list of strings, characters, numbers, thunks,
  412. @code{#f}, @code{#t} -- and other fragments. The function traverses the
  413. tree depth-first, writes out strings and characters, executes thunks,
  414. and ignores @code{#f} and @code{'()}. The function returns @code{#t} if
  415. anything was written at all; otherwise the result is @code{#f} If
  416. @code{#t} occurs among the fragments, it is not written out but causes
  417. the result of @code{SRV:send-reply} to be @code{#t}.
  418. @end deffn
  419. @deffn {Scheme Procedure} foldts fdown fup fhere seed tree
  420. @end deffn
  421. @deffn {Scheme Procedure} post-order tree bindings
  422. @end deffn
  423. @deffn {Scheme Procedure} pre-post-order tree bindings
  424. @end deffn
  425. @deffn {Scheme Procedure} replace-range beg-pred end-pred forest
  426. @end deffn
  427. @node SXML Tree Fold
  428. @subsection SXML Tree Fold
  429. @subsubsection Overview
  430. @code{(sxml fold)} defines a number of variants of the @dfn{fold}
  431. algorithm for use in transforming SXML trees. Additionally it defines
  432. the layout operator, @code{fold-layout}, which might be described as a
  433. context-passing variant of SSAX's @code{pre-post-order}.
  434. @subsubsection Usage
  435. @deffn {Scheme Procedure} foldt fup fhere tree
  436. The standard multithreaded tree fold.
  437. @var{fup} is of type [a] -> a. @var{fhere} is of type object -> a.
  438. @end deffn
  439. @deffn {Scheme Procedure} foldts fdown fup fhere seed tree
  440. The single-threaded tree fold originally defined in SSAX. @xref{SSAX},
  441. for more information.
  442. @end deffn
  443. @deffn {Scheme Procedure} foldts* fdown fup fhere seed tree
  444. A variant of @code{foldts} that allows pre-order tree
  445. rewrites. Originally defined in Andy Wingo's 2007 paper,
  446. @emph{Applications of fold to XML transformation}.
  447. @end deffn
  448. @deffn {Scheme Procedure} fold-values proc list . seeds
  449. A variant of @code{fold} that allows multi-valued seeds. Note that the
  450. order of the arguments differs from that of @code{fold}. @xref{SRFI-1
  451. Fold and Map}.
  452. @end deffn
  453. @deffn {Scheme Procedure} foldts*-values fdown fup fhere tree . seeds
  454. A variant of @code{foldts*} that allows multi-valued
  455. seeds. Originally defined in Andy Wingo's 2007 paper, @emph{Applications
  456. of fold to XML transformation}.
  457. @end deffn
  458. @deffn {Scheme Procedure} fold-layout tree bindings params layout stylesheet
  459. A traversal combinator in the spirit of @code{pre-post-order}.
  460. @xref{Transforming SXML}.
  461. @code{fold-layout} was originally presented in Andy Wingo's 2007 paper,
  462. @emph{Applications of fold to XML transformation}.
  463. @example
  464. bindings := (<binding>...)
  465. binding := (<tag> <handler-pair>...)
  466. | (*default* . <post-handler>)
  467. | (*text* . <text-handler>)
  468. tag := <symbol>
  469. handler-pair := (pre-layout . <pre-layout-handler>)
  470. | (post . <post-handler>)
  471. | (bindings . <bindings>)
  472. | (pre . <pre-handler>)
  473. | (macro . <macro-handler>)
  474. @end example
  475. @table @var
  476. @item pre-layout-handler
  477. A function of three arguments:
  478. @table @var
  479. @item kids
  480. the kids of the current node, before traversal
  481. @item params
  482. the params of the current node
  483. @item layout
  484. the layout coming into this node
  485. @end table
  486. @var{pre-layout-handler} is expected to use this information to return a
  487. layout to pass to the kids. The default implementation returns the
  488. layout given in the arguments.
  489. @item post-handler
  490. A function of five arguments:
  491. @table @var
  492. @item tag
  493. the current tag being processed
  494. @item params
  495. the params of the current node
  496. @item layout
  497. the layout coming into the current node, before any kids were processed
  498. @item klayout
  499. the layout after processing all of the children
  500. @item kids
  501. the already-processed child nodes
  502. @end table
  503. @var{post-handler} should return two values, the layout to pass to the
  504. next node and the final tree.
  505. @item text-handler
  506. @var{text-handler} is a function of three arguments:
  507. @table @var
  508. @item text
  509. the string
  510. @item params
  511. the current params
  512. @item layout
  513. the current layout
  514. @end table
  515. @var{text-handler} should return two values, the layout to pass to the
  516. next node and the value to which the string should transform.
  517. @end table
  518. @end deffn
  519. @node SXPath
  520. @subsection SXPath
  521. @subsubsection Overview
  522. @heading SXPath: SXML Query Language
  523. SXPath is a query language for SXML, an instance of XML Information set
  524. (Infoset) in the form of s-expressions. See @code{(sxml ssax)} for the
  525. definition of SXML and more details. SXPath is also a translation into
  526. Scheme of an XML Path Language, @uref{http://www.w3.org/TR/xpath,XPath}.
  527. XPath and SXPath describe means of selecting a set of Infoset's items or
  528. their properties.
  529. To facilitate queries, XPath maps the XML Infoset into an explicit tree,
  530. and introduces important notions of a location path and a current,
  531. context node. A location path denotes a selection of a set of nodes
  532. relative to a context node. Any XPath tree has a distinguished, root
  533. node -- which serves as the context node for absolute location paths.
  534. Location path is recursively defined as a location step joined with a
  535. location path. A location step is a simple query of the database
  536. relative to a context node. A step may include expressions that further
  537. filter the selected set. Each node in the resulting set is used as a
  538. context node for the adjoining location path. The result of the step is
  539. a union of the sets returned by the latter location paths.
  540. The SXML representation of the XML Infoset (see SSAX.scm) is rather
  541. suitable for querying as it is. Bowing to the XPath specification, we
  542. will refer to SXML information items as 'Nodes':
  543. @example
  544. <Node> ::= <Element> | <attributes-coll> | <attrib>
  545. | "text string" | <PI>
  546. @end example
  547. This production can also be described as
  548. @example
  549. <Node> ::= (name . <Nodeset>) | "text string"
  550. @end example
  551. An (ordered) set of nodes is just a list of the constituent nodes:
  552. @example
  553. <Nodeset> ::= (<Node> ...)
  554. @end example
  555. Nodesets, and Nodes other than text strings are both lists. A <Nodeset>
  556. however is either an empty list, or a list whose head is not a symbol. A
  557. symbol at the head of a node is either an XML name (in which case it's a
  558. tag of an XML element), or an administrative name such as '@@'. This
  559. uniform list representation makes processing rather simple and elegant,
  560. while avoiding confusion. The multi-branch tree structure formed by the
  561. mutually-recursive datatypes <Node> and <Nodeset> lends itself well to
  562. processing by functional languages.
  563. A location path is in fact a composite query over an XPath tree or its
  564. branch. A singe step is a combination of a projection, selection or a
  565. transitive closure. Multiple steps are combined via join and union
  566. operations. This insight allows us to @emph{elegantly} implement XPath
  567. as a sequence of projection and filtering primitives -- converters --
  568. joined by @dfn{combinators}. Each converter takes a node and returns a
  569. nodeset which is the result of the corresponding query relative to that
  570. node. A converter can also be called on a set of nodes. In that case it
  571. returns a union of the corresponding queries over each node in the set.
  572. The union is easily implemented as a list append operation as all nodes
  573. in a SXML tree are considered distinct, by XPath conventions. We also
  574. preserve the order of the members in the union. Query combinators are
  575. high-order functions: they take converter(s) (which is a Node|Nodeset ->
  576. Nodeset function) and compose or otherwise combine them. We will be
  577. concerned with only relative location paths [XPath]: an absolute
  578. location path is a relative path applied to the root node.
  579. Similarly to XPath, SXPath defines full and abbreviated notations for
  580. location paths. In both cases, the abbreviated notation can be
  581. mechanically expanded into the full form by simple rewriting rules. In
  582. the case of SXPath the corresponding rules are given in the
  583. documentation of the @code{sxpath} procedure.
  584. @xref{sxpath-procedure-docs,,SXPath procedure documentation}.
  585. The regression test suite at the end of the file @file{SXPATH-old.scm}
  586. shows a representative sample of SXPaths in both notations, juxtaposed
  587. with the corresponding XPath expressions. Most of the samples are
  588. borrowed literally from the XPath specification.
  589. Much of the following material is taken from the SXPath sources by Oleg
  590. Kiselyov et al.
  591. @subsubsection Basic Converters and Applicators
  592. A converter is a function mapping a nodeset (or a single node) to another
  593. nodeset. Its type can be represented like this:
  594. @example
  595. type Converter = Node|Nodeset -> Nodeset
  596. @end example
  597. A converter can also play the role of a predicate: in that case, if a
  598. converter, applied to a node or a nodeset, yields a non-empty nodeset,
  599. the converter-predicate is deemed satisfied. Likewise, an empty nodeset
  600. is equivalent to @code{#f} in denoting failure.
  601. @deffn {Scheme Procedure} nodeset? x
  602. Return @code{#t} if @var{x} is a nodeset.
  603. @end deffn
  604. @deffn {Scheme Procedure} node-typeof? crit
  605. This function implements a 'Node test' as defined in Sec. 2.3 of the
  606. XPath document. A node test is one of the components of a location
  607. step. It is also a converter-predicate in SXPath.
  608. The function @code{node-typeof?} takes a type criterion and returns a
  609. function, which, when applied to a node, will tell if the node satisfies
  610. the test.
  611. The criterion @var{crit} is a symbol, one of the following:
  612. @table @code
  613. @item id
  614. tests if the node has the right name (id)
  615. @item @@
  616. tests if the node is an <attributes-coll>
  617. @item *
  618. tests if the node is an <Element>
  619. @item *text*
  620. tests if the node is a text node
  621. @item *PI*
  622. tests if the node is a PI (processing instruction) node
  623. @item *any*
  624. @code{#t} for any type of node
  625. @end table
  626. @end deffn
  627. @deffn {Scheme Procedure} node-eq? other
  628. A curried equivalence converter predicate that takes a node @var{other}
  629. and returns a function that takes another node. The two nodes are
  630. compared using @code{eq?}.
  631. @end deffn
  632. @deffn {Scheme Procedure} node-equal? other
  633. A curried equivalence converter predicate that takes a node @var{other}
  634. and returns a function that takes another node. The two nodes are
  635. compared using @code{equal?}.
  636. @end deffn
  637. @deffn {Scheme Procedure} node-pos n
  638. Select the @var{n}'th element of a nodeset and return as a singular
  639. nodeset. If the @var{n}'th element does not exist, return an empty
  640. nodeset. If @var{n} is a negative number the node is picked from the
  641. tail of the list.
  642. @example
  643. ((node-pos 1) nodeset) ; return the the head of the nodeset (if exists)
  644. ((node-pos 2) nodeset) ; return the node after that (if exists)
  645. ((node-pos -1) nodeset) ; selects the last node of a non-empty nodeset
  646. ((node-pos -2) nodeset) ; selects the last but one node, if exists.
  647. @end example
  648. @end deffn
  649. @deffn {Scheme Procedure} filter pred?
  650. A filter applicator, which introduces a filtering context. The argument
  651. converter @var{pred?} is considered a predicate, with either @code{#f}
  652. or @code{nil} meaning failure.
  653. @end deffn
  654. @deffn {Scheme Procedure} take-until pred?
  655. @example
  656. take-until:: Converter -> Converter, or
  657. take-until:: Pred -> Node|Nodeset -> Nodeset
  658. @end example
  659. Given a converter-predicate @var{pred?} and a nodeset, apply the
  660. predicate to each element of the nodeset, until the predicate yields
  661. anything but @code{#f} or @code{nil}. Return the elements of the input
  662. nodeset that have been processed until that moment (that is, which fail
  663. the predicate).
  664. @code{take-until} is a variation of the @code{filter} above:
  665. @code{take-until} passes elements of an ordered input set up to (but not
  666. including) the first element that satisfies the predicate. The nodeset
  667. returned by @code{((take-until (not pred)) nset)} is a subset -- to be
  668. more precise, a prefix -- of the nodeset returned by @code{((filter
  669. pred) nset)}.
  670. @end deffn
  671. @deffn {Scheme Procedure} take-after pred?
  672. @example
  673. take-after:: Converter -> Converter, or
  674. take-after:: Pred -> Node|Nodeset -> Nodeset
  675. @end example
  676. Given a converter-predicate @var{pred?} and a nodeset, apply the
  677. predicate to each element of the nodeset, until the predicate yields
  678. anything but @code{#f} or @code{nil}. Return the elements of the input
  679. nodeset that have not been processed: that is, return the elements of
  680. the input nodeset that follow the first element that satisfied the
  681. predicate.
  682. @code{take-after} along with @code{take-until} partition an input
  683. nodeset into three parts: the first element that satisfies a predicate,
  684. all preceding elements and all following elements.
  685. @end deffn
  686. @deffn {Scheme Procedure} map-union proc lst
  687. Apply @var{proc} to each element of @var{lst} and return the list of results.
  688. If @var{proc} returns a nodeset, splice it into the result
  689. From another point of view, @code{map-union} is a function
  690. @code{Converter->Converter}, which places an argument-converter in a joining
  691. context.
  692. @end deffn
  693. @deffn {Scheme Procedure} node-reverse node-or-nodeset
  694. @example
  695. node-reverse :: Converter, or
  696. node-reverse:: Node|Nodeset -> Nodeset
  697. @end example
  698. Reverses the order of nodes in the nodeset. This basic converter is
  699. needed to implement a reverse document order (see the XPath
  700. Recommendation).
  701. @end deffn
  702. @deffn {Scheme Procedure} node-trace title
  703. @example
  704. node-trace:: String -> Converter
  705. @end example
  706. @code{(node-trace title)} is an identity converter. In addition it
  707. prints out the node or nodeset it is applied to, prefixed with the
  708. @var{title}. This converter is very useful for debugging.
  709. @end deffn
  710. @subsubsection Converter Combinators
  711. Combinators are higher-order functions that transmogrify a converter or
  712. glue a sequence of converters into a single, non-trivial converter. The
  713. goal is to arrive at converters that correspond to XPath location paths.
  714. From a different point of view, a combinator is a fixed, named
  715. @dfn{pattern} of applying converters. Given below is a complete set of
  716. such patterns that together implement XPath location path specification.
  717. As it turns out, all these combinators can be built from a small number
  718. of basic blocks: regular functional composition, @code{map-union} and
  719. @code{filter} applicators, and the nodeset union.
  720. @deffn {Scheme Procedure} select-kids test-pred?
  721. @code{select-kids} takes a converter (or a predicate) as an argument and
  722. returns another converter. The resulting converter applied to a nodeset
  723. returns an ordered subset of its children that satisfy the predicate
  724. @var{test-pred?}.
  725. @end deffn
  726. @deffn {Scheme Procedure} node-self pred?
  727. Similar to @code{select-kids} except that the predicate @var{pred?} is
  728. applied to the node itself rather than to its children. The resulting
  729. nodeset will contain either one component, or will be empty if the node
  730. failed the predicate.
  731. @end deffn
  732. @deffn {Scheme Procedure} node-join . selectors
  733. @example
  734. node-join:: [LocPath] -> Node|Nodeset -> Nodeset, or
  735. node-join:: [Converter] -> Converter
  736. @end example
  737. Join the sequence of location steps or paths as described above.
  738. @end deffn
  739. @deffn {Scheme Procedure} node-reduce . converters
  740. @example
  741. node-reduce:: [LocPath] -> Node|Nodeset -> Nodeset, or
  742. node-reduce:: [Converter] -> Converter
  743. @end example
  744. A regular functional composition of converters. From a different point
  745. of view, @code{((apply node-reduce converters) nodeset)} is equivalent
  746. to @code{(foldl apply nodeset converters)}, i.e., folding, or reducing,
  747. a list of converters with the nodeset as a seed.
  748. @end deffn
  749. @deffn {Scheme Procedure} node-or . converters
  750. @example
  751. node-or:: [Converter] -> Converter
  752. @end example
  753. This combinator applies all converters to a given node and produces the
  754. union of their results. This combinator corresponds to a union
  755. (@code{|} operation) for XPath location paths.
  756. @end deffn
  757. @deffn {Scheme Procedure} node-closure test-pred?
  758. @example
  759. node-closure:: Converter -> Converter
  760. @end example
  761. Select all @emph{descendants} of a node that satisfy a
  762. converter-predicate @var{test-pred?}. This combinator is similar to
  763. @code{select-kids} but applies to grand... children as well. This
  764. combinator implements the @code{descendant::} XPath axis. Conceptually,
  765. this combinator can be expressed as
  766. @example
  767. (define (node-closure f)
  768. (node-or
  769. (select-kids f)
  770. (node-reduce (select-kids (node-typeof? '*)) (node-closure f))))
  771. @end example
  772. This definition, as written, looks somewhat like a fixpoint, and it will
  773. run forever. It is obvious however that sooner or later
  774. @code{(select-kids (node-typeof? '*))} will return an empty nodeset. At
  775. this point further iterations will no longer affect the result and can
  776. be stopped.
  777. @end deffn
  778. @deffn {Scheme Procedure} node-parent rootnode
  779. @example
  780. node-parent:: RootNode -> Converter
  781. @end example
  782. @code{(node-parent rootnode)} yields a converter that returns a parent
  783. of a node it is applied to. If applied to a nodeset, it returns the
  784. list of parents of nodes in the nodeset. The @var{rootnode} does not
  785. have to be the root node of the whole SXML tree -- it may be a root node
  786. of a branch of interest.
  787. Given the notation of Philip Wadler's paper on semantics of XSLT,
  788. @verbatim
  789. parent(x) = { y | y=subnode*(root), x=subnode(y) }
  790. @end verbatim
  791. Therefore, @code{node-parent} is not the fundamental converter: it can
  792. be expressed through the existing ones. Yet @code{node-parent} is a
  793. rather convenient converter. It corresponds to a @code{parent::} axis
  794. of SXPath. Note that the @code{parent::} axis can be used with an
  795. attribute node as well.
  796. @end deffn
  797. @anchor{sxpath-procedure-docs}
  798. @deffn {Scheme Procedure} sxpath path
  799. Evaluate an abbreviated SXPath.
  800. @example
  801. sxpath:: AbbrPath -> Converter, or
  802. sxpath:: AbbrPath -> Node|Nodeset -> Nodeset
  803. @end example
  804. @var{path} is a list. It is translated to the full SXPath according to
  805. the following rewriting rules:
  806. @example
  807. (sxpath '())
  808. @result{} (node-join)
  809. (sxpath '(path-component ...))
  810. @result{} (node-join (sxpath1 path-component) (sxpath '(...)))
  811. (sxpath1 '//)
  812. @result{} (node-or
  813. (node-self (node-typeof? '*any*))
  814. (node-closure (node-typeof? '*any*)))
  815. (sxpath1 '(equal? x))
  816. @result{} (select-kids (node-equal? x))
  817. (sxpath1 '(eq? x))
  818. @result{} (select-kids (node-eq? x))
  819. (sxpath1 ?symbol)
  820. @result{} (select-kids (node-typeof? ?symbol)
  821. (sxpath1 procedure)
  822. @result{} procedure
  823. (sxpath1 '(?symbol ...))
  824. @result{} (sxpath1 '((?symbol) ...))
  825. (sxpath1 '(path reducer ...))
  826. @result{} (node-reduce (sxpath path) (sxpathr reducer) ...)
  827. (sxpathr number)
  828. @result{} (node-pos number)
  829. (sxpathr path-filter)
  830. @result{} (filter (sxpath path-filter))
  831. @end example
  832. @end deffn
  833. @node sxml ssax input-parse
  834. @subsection (sxml ssax input-parse)
  835. @subsubsection Overview
  836. A simple lexer.
  837. The procedures in this module surprisingly often suffice to parse an
  838. input stream. They either skip, or build and return tokens, according to
  839. inclusion or delimiting semantics. The list of characters to expect,
  840. include, or to break at may vary from one invocation of a function to
  841. another. This allows the functions to easily parse even
  842. context-sensitive languages.
  843. EOF is generally frowned on, and thrown up upon if encountered.
  844. Exceptions are mentioned specifically. The list of expected characters
  845. (characters to skip until, or break-characters) may include an EOF
  846. "character", which is to be coded as the symbol, @code{*eof*}.
  847. The input stream to parse is specified as a @dfn{port}, which is usually
  848. the last (and optional) argument. It defaults to the current input port
  849. if omitted.
  850. If the parser encounters an error, it will throw an exception to the key
  851. @code{parser-error}. The arguments will be of the form @code{(@var{port}
  852. @var{message} @var{specialising-msg}*)}.
  853. The first argument is a port, which typically points to the offending
  854. character or its neighborhood. You can then use @code{port-column} and
  855. @code{port-line} to query the current position. @var{message} is the
  856. description of the error. Other arguments supply more details about the
  857. problem.
  858. @subsubsection Usage
  859. @deffn {Scheme Procedure} peek-next-char [port]
  860. @end deffn
  861. @deffn {Scheme Procedure} assert-curr-char expected-chars comment [port]
  862. @end deffn
  863. @deffn {Scheme Procedure} skip-until arg [port]
  864. @end deffn
  865. @deffn {Scheme Procedure} skip-while skip-chars [port]
  866. @end deffn
  867. @deffn {Scheme Procedure} next-token prefix-skipped-chars break-chars [comment] [port]
  868. @end deffn
  869. @deffn {Scheme Procedure} next-token-of incl-list/pred [port]
  870. @end deffn
  871. @deffn {Scheme Procedure} read-text-line [port]
  872. @end deffn
  873. @deffn {Scheme Procedure} read-string n [port]
  874. @end deffn
  875. @deffn {Scheme Procedure} find-string-from-port? _ _ . _
  876. Looks for @var{str} in @var{<input-port>}, optionally within the first
  877. @var{max-no-char} characters.
  878. @end deffn
  879. @node sxml apply-templates
  880. @subsection (sxml apply-templates)
  881. @subsubsection Overview
  882. Pre-order traversal of a tree and creation of a new tree:
  883. @smallexample
  884. apply-templates:: tree x <templates> -> <new-tree>
  885. @end smallexample
  886. where
  887. @smallexample
  888. <templates> ::= (<template> ...)
  889. <template> ::= (<node-test> <node-test> ... <node-test> . <handler>)
  890. <node-test> ::= an argument to node-typeof? above
  891. <handler> ::= <tree> -> <new-tree>
  892. @end smallexample
  893. This procedure does a @emph{normal}, pre-order traversal of an SXML
  894. tree. It walks the tree, checking at each node against the list of
  895. matching templates.
  896. If the match is found (which must be unique, i.e., unambiguous), the
  897. corresponding handler is invoked and given the current node as an
  898. argument. The result from the handler, which must be a @code{<tree>},
  899. takes place of the current node in the resulting tree. The name of the
  900. function is not accidental: it resembles rather closely an
  901. @code{apply-templates} function of XSLT.
  902. @subsubsection Usage
  903. @deffn {Scheme Procedure} apply-templates tree templates
  904. @end deffn