GML-Definition.tex 19 KB

123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051525354555657585960616263646566676869707172737475767778798081828384858687888990919293949596979899100101102103104105106107108109110111112113114115116117118119120121122123124125126127128129130131132133134135136137138139140141142143144145146147148149150151152153154155156157158159160161162163164165166167168169170171172173174175176177178179180181182183184185186187188189190191192193194195196197198199200201202203204205206207208209210211212213214215216217218219220221222223224225226227228229230231232233234235236237238239240241242243244245246247248249250251252253254255256257258259260261262263264265266267268269270271272273274275276277278279280281282283284285286287288289290291292293294295296297298299300301302303304305306307308309310311312313314315316317318319320321322323324325326327328329330331332333334335336337338339340341342343344345346347348349350351352353354355356357358359360361362363364365366367368369370371372373374375376377378379380381382383384385386387388389390391392393394395396397398399400401402403404405406407408409410411412413414415416417418419420421422423424425426427428429430431432433434435436437438439440441442443444445446447448449450451452453454455456457458459460461462463464465466467468469470471472473474475476477478479480481482483484485486487488489490491492493494495496497498499500501502503504505506507508509510511512513514515516517518519520521522523524525526527528529530531532533534535536537538539540541542543544545546547548549550551552553554555556557558559560561562563564565566567568569570571572573574575576577578579580581582
  1. % The GML Format
  2. %
  3. \chapter{The GML Format}
  4. This section describes the syntax of a GML file, and how scanner and
  5. parser for the object tree are constructed. Graphs are of no
  6. relevance here; the next chapter will show how to extract a graph
  7. from a GML file.
  8. %
  9. % Syntax
  10. %
  11. \section{Syntax}
  12. Figure \ref{f:GML:Grammar} shows the syntax of GML in BNF
  13. notation. In this format, $x^{+}$ denotes a sequence of one or
  14. more $x$ items, and $x^{*}$ denotes a sequence of zero ore more $x$
  15. items. Characters in `quotes' denote terminal characters, and words in
  16. $<$\emph{angle brackets}$>$ denote nonterminals.
  17. \begin{figure}[htbp]
  18. \begin{bnf}{$<$WhiteSpace$>$}
  19. \Declare{GML}{
  20. \NonTerm{List}
  21. }
  22. \Declare{List}{
  23. \Empty{}
  24. \Alternative{
  25. \NonTerm{KeyValue}
  26. \ZeroOrMore{(\OneOrMore{\NonTerm{WhiteSpace}}\NonTerm{KeyValue})}
  27. }
  28. }
  29. \Declare{KeyValue}{
  30. \NonTerm{Key}
  31. \OneOrMore{\NonTerm{WhiteSpace}}
  32. \NonTerm{Value}
  33. }
  34. \Declare{Value}{
  35. \NonTerm{Integer}
  36. \Alternative{\NonTerm{Real}}
  37. \Alternative{\NonTerm{String}}
  38. \Alternative{\Term{$\left[\right.$} \NonTerm{List} \Term{$\left.\right]$}}
  39. }
  40. \Declare{Key}{
  41. \Range{\FromTo{a}{z}\FromTo{A}{Z}}\ZeroOrMore{\Range{\FromTo{a}{z}\FromTo{A}{Z}\FromTo{0}{9}}}
  42. }
  43. \Declare{Integer}{
  44. \NonTerm{Sign} \OneOrMore{\NonTerm{Digit}}
  45. }
  46. \Declare{Real}{
  47. \NonTerm{Sign}
  48. \OneOrMore{\NonTerm{Digit}}
  49. \Term{.}
  50. \OneOrMore{\NonTerm{Digit}}
  51. \NonTerm{Mantissa}
  52. }
  53. \Declare{String}{
  54. \Term{"}
  55. \NonTerm{Instring}
  56. \Term{"}
  57. }
  58. \Declare{Sign}{
  59. \Empty{}
  60. \Alternative{\Term{+}}
  61. \Alternative{\Term{-}}
  62. }
  63. \Declare{Digit}{
  64. \Range{\FromTo{0}{9}}
  65. }
  66. \Declare{Mantissa}{
  67. \Empty{}
  68. \Alternative{\Term{E} \NonTerm{Sign} \OneOrMore{Digit}}
  69. \Alternative{\Term{e} \NonTerm{Sign} \OneOrMore{Digit}}
  70. }
  71. \Declare{Instring}{
  72. ${\mathrm ASCII} - \{ \Term{\&}, \Term{"} \}$
  73. \Alternative{\Term{\&} \ZeroOrMore{\NonTerm{character}} \Term{;}}
  74. }
  75. \Declare{Whitespace}{
  76. \NonTerm{space}
  77. \Alternative{\NonTerm{tabulator}}
  78. \Alternative{\NonTerm{newline}}
  79. }
  80. \end{bnf}
  81. \caption{The GML Grammar in BNF Format.}
  82. \label{f:GML:Grammar}
  83. \end{figure}
  84. \section{Further specifications}
  85. \subsection{ISO 8859 Character Set}
  86. In Figure \ref{f:GML:Grammar}, \emph{Instring} excludes the
  87. characters \verb|"| and \verb|&| characters from a string. This
  88. is necessary because a \verb|"| inside a string would terminate
  89. the string prematurely. The \verb|&| character is used by the
  90. ISO 8859 character set to introduce a special character. This
  91. special character starts with an ampersand, is followed by a name
  92. and is terminated by a semicolon. For example, \verb|&quot;|
  93. inserts \verb|"|, \verb|&amp;| inserts \verb|&|, and
  94. \verb|&auml;| inserts a German `\"{a}'. For a complete list of
  95. characters, see tables \ref{t:ISO8859-1:basic},
  96. \ref{t:ISO8859-1:special}, \ref{t:ISO8859-1:capital} and
  97. \ref{t:ISO8859-1:lowercase} in section \ref{s:ISO8859}.
  98. We do not allow ISO 8859 characters outside strings, especially
  99. not in keywords. Thus, a sloppy parser might not know the ISO
  100. 8859 character set, but just ASCII, and can safely read and write
  101. GML as 7-bit ASCII files. Many applications do in fact not need
  102. to look into the labels, or use only simple labels.
  103. \subsection{Line Length}
  104. The maximum line length in the file format must not exceed
  105. \textbf{254} characters. This ensures that even systems with a
  106. more restrictive line length can cope with a GML file.
  107. \subsection{Key Syntax}
  108. We use a very restricted format for keys, which does not allow
  109. characters such as '\texttt{\_}', '\texttt{\$}' or '\texttt{:}'.
  110. This is because they might not be legal characters
  111. for variables in some interpreted languages. With this
  112. restriction, keys may be used as identifiers in interpreted
  113. languages. It also simplifies the syntax a lot.
  114. \subsection{Key Size}
  115. Because of the maximum line length, keys must less than
  116. \textbf{254} characters. However, it is quite convenient to have
  117. a key and a data item on one line, so it is a good idea to have a
  118. key size of less then than 126.
  119. \subsection{Line breaks}
  120. Line breaks may occur anywhere in the file format where white
  121. space is allowed. Line breaks inside strings are line breaks in
  122. the string\footnote{This limits strings to a maximum \emph{line
  123. width} of 253 characters, which seems reasonable.}.
  124. \subsection{\# Comments}
  125. Any line which starting with \verb|#| (whitespace \emph{before}
  126. the \verb|#| is allowed) is ignored by the parser. This is a
  127. standard treatment in most UNIX programs. For example, using the
  128. following as a first line
  129. \begin{alltt}
  130. #!/usr/local/bin/gmlview
  131. \end{alltt}
  132. \noindent specifies that the program \texttt{gmlview} interprets the file.
  133. It is also common to include foreign data (e.g.\ a Postscript
  134. representation of the graph) through that mechanism. Many drawing
  135. programs use the reverse mechanism to insert their data into a
  136. Postscript file.
  137. GML includes also a \texttt{comment} key which adds comments to a
  138. file. However, a parser will read and store these comments, so
  139. they should be reserved for \emph{small} comments. Any comments
  140. inserted with \texttt{\#} should be ignored by the parser.
  141. \subsection{Order of duplicate keys}
  142. \label{s:GML:OrderOfDupliactekeys}
  143. It is perfectly legal to have duplicate keys within the same
  144. list. For example, an array might be represented as follows:
  145. \begin{quote}
  146. \begin{alltt}
  147. array [
  148. element [ {\textnormal{\ldots{}}} ]
  149. element [ {\textnormal{\ldots{}}} ]
  150. element [ {\textnormal{\ldots{}}} ]
  151. ]
  152. \end{alltt}
  153. \end{quote}
  154. \noindent To avoid problems, we require that the order of \emph{duplicate}
  155. keys is \emph{preserved} by the parser. The order of not
  156. duplicate keys does need not be preserved. This is because
  157. programs might not be able to record the exact order of the
  158. attributes in the file. If would also make it more difficult to
  159. add more attribues as the file format grows\footnote{Of course,
  160. one could require that the new attributes are just appended to
  161. the old list. However, if a program is written in a modular
  162. fashion, the attributes will be written by procedures
  163. \mbox{\texttt{p1}, \texttt{p2}, \ldots, \texttt{pn}} in that
  164. order. If \texttt{p1} is extended, the new attributes would
  165. have to be written \emph{after} \texttt{pn}, which would break
  166. the modularization.}.
  167. \subsection{Unknown Keys}
  168. Any parser which encounters an unknown key should preserve it and
  169. its value, and write them back when the graph is saved into a
  170. file. Exceptions to this policy are changes in the structure,
  171. e.g.\ deletion of the parent of the unknown objects, and
  172. consistency problems (see \ref{c:GML:Consistency}).
  173. \subsection{Default values}
  174. One important requirement for GML is that an application which
  175. writes a file may omit all ``not interesting'' keys. However, the
  176. following default values should be be assumed for missing
  177. key-value pairs:
  178. \begin{itemize}
  179. \item \textbf{0} for \emph{Integer} values.
  180. \item \textbf{0.0} for \emph{Real} values.
  181. \item \textbf{\texttt{""}} for \emph{String} values.
  182. \item $\left[\right]$ for \emph{List} values.
  183. \end{itemize}
  184. \noindent This makes sure that files with missing keys are treated
  185. equally by different programs. For example, one could define that
  186. a missing object width is substituted by some default value. But,
  187. what should be used ? Common values are 1, 2, 16, 32, 42 and 64,
  188. or the \emph{current default setting of the program}. However,
  189. especially the last variant is highly dependend on the current
  190. state of the program, and might lead to overlapping objects and
  191. therefore hard to read drawings.
  192. Nevertheless, a program may (and should) implement an option
  193. ``substitute defaults for missing values''. Such a clean up
  194. operation\footnote{Graphlet provides such operations in the
  195. ``Tool'' menu under the keyword ``Clean up''.} which is
  196. available by request is less confusing than the substitution of
  197. system state dependend values.
  198. %
  199. % Graphlet's Parser Implementation
  200. %
  201. \section{Graphlet's Parser Implementation}
  202. It remains to show how to construct an Object tree from a GML file.
  203. This is done while parsing the second and third rule in the syntax
  204. definition.
  205. \begin{tabbing}
  206. \emph{List}\texttt{\ ::=\ }\=\texttt{\ \ \ \ }\=\texttt{\ \ \ \ }\=\kill
  207. \emph{List}\texttt{\ ::=\ }
  208. \emph{KeyValue} ( \emph{WhiteSpace}$^+$ \emph{KeyValue})$^{*}$ \\
  209. \> \> \textbf{var} \emph{l}: List(Object); \\
  210. \> \> \emph{l} := emptyList; \\
  211. \> \> \textbf{foreach} \emph{KeyValue} in the list \textbf{do} \\
  212. \> \> \> \emph{o} := new Object(\emph{KeyValue.Key}); \\
  213. \> \> \> \emph{o}.value := \emph{KeyValue.Value}; \\
  214. \> \> \> \textbf{append} \emph{o} to \emph{l}; \\
  215. \> \> \textbf{done} \\
  216. \> \> \textbf{return} l; \\
  217. \\
  218. \emph{Value}\texttt{\ ::=\ }\emph{Integer} $\mid$ \\
  219. \> \> \textbf{return} \emph{Integer}; \\
  220. \>\emph{Real} $\mid$ \\
  221. \> \> \textbf{return} \emph{Real}; \\
  222. \>\emph{String} $\mid$ \\
  223. \> \> \textbf{return} \emph{String}; \\
  224. \>\verb|[| \emph{List} \verb|]| \\
  225. \> \> \textbf{return} \emph{List};
  226. \end{tabbing}
  227. \section{How to represent common data structures}
  228. As earlier said, GML is by no means restricted to graphs. In
  229. fact, all common data types can be represented in GML. This
  230. section is a cookbook for designing data structure
  231. representations in GML.
  232. \subsection{Boolean}
  233. Boolean values can be represented by Integers. \texttt{false} is
  234. represented by 0, \texttt{true} is represented by any other
  235. number.
  236. \begin{notes}
  237. \item We decided not to implement a separate datatype for
  238. boolean values because this would only complicate the parser.
  239. \end{notes}
  240. \subsection{Large numbers}
  241. \label{s:GML:LargeNumbers}
  242. Large integer or floating point data types can be represented as
  243. Strings. The string is the standard ASCII representation of the
  244. value.
  245. \begin{notes}
  246. \item A more compact representation for large integer values
  247. could be obtained by coding the bits of the number in the same
  248. fashion as the \texttt{uuencode} program which is available on
  249. UNIX systems. However, this would assume that the layout of the
  250. bits in memory is fixed (which is \emph{not} the case), and
  251. would also complicate the parser. It is also not clear how many
  252. applications would need this.
  253. % except TL
  254. \item If space is an issue, An application may encode very
  255. large integer numbers as a list if integers in a $2^{15}$- or
  256. $2^{31}$-adic number system.
  257. \end{notes}
  258. \subsection{Bitset}
  259. \label{s:GML:Bitset}
  260. A bitset should be represented as a string of 0 and 1 characters.
  261. \begin{notes}
  262. \item The same as for large numbers (section
  263. \ref{s:GML:LargeNumbers}) applies here, too.
  264. \end{notes}
  265. \subsection{Records}
  266. A record data type
  267. \begin{quote}
  268. \begin{alltt}
  269. name: record
  270. a: type1
  271. b: type2
  272. \ldots{}
  273. end
  274. \end{alltt}
  275. \end{quote}
  276. \noindent can be represented in GML as follows:
  277. \begin{quote}
  278. \begin{alltt}
  279. name [
  280. a \ttcomment{value\_of\_a}
  281. b \ttcomment{value\_of\_b}
  282. \ttcomment{\ldots{}}
  283. ]
  284. \end{alltt}
  285. \end{quote}
  286. \noindent In place of \emph{value\_of\_field}, insert the value of
  287. the corresponding field of the record. This can either be an
  288. integer, a real or a string value or a list.
  289. \subsection{Lists, Sets, Arrays}
  290. A list of data type
  291. \begin{quote}
  292. \begin{alltt}
  293. name: List(SomeType)
  294. \end{alltt}
  295. \end{quote}
  296. \noindent can be represented in GML as follows:
  297. \begin{quote}
  298. \begin{alltt}
  299. name [
  300. obj \ttcomment{value\_of\_first\_element}
  301. obj \ttcomment{value\_of\_second\_element}
  302. \ldots{}
  303. ]
  304. \end{alltt}
  305. \end{quote}
  306. \noindent where \texttt{obj} should be replaced by a suitable name for the
  307. elements of the list. Section \ref{s:GML:OrderOfDupliactekeys}
  308. specifies that a parser must preserve the order of the
  309. \texttt{obj} attributes when it reads the program.
  310. Sets and arrays are represented with exactly the same schema. In
  311. the case of arrays, it might be neccessary to use a list for
  312. \emph{value\_of\_field} and add an \texttt{id} key to the value.
  313. This \texttt{id} represents the index of the array element.
  314. Associative arrays might use a \texttt{name} key or some more
  315. complex structure.
  316. \subsection{Name clashes}
  317. The freedom of adding new keys bears the problem of name clashes.
  318. To avoid this problem as much as possible, any application should
  319. put its private information in a object of type list who's
  320. private key represents the name of the application. For example,
  321. the Graphlet system will insert all private information into a an
  322. object with key named \texttt{Graphlet}\footnote{Of course, this
  323. is still not 100\% save, but as close as we can get with a
  324. reasonable effort. HTML uses the same approach and works still
  325. fine after all these years.}.
  326. %
  327. % Extracting Graphs from GML files
  328. %
  329. \chapter{Extracting Graphs from GML files}
  330. Up to this point, GML was not related to graphs in any way. In
  331. fact, GML is designed so that it can map any data structure onto
  332. an ASCII file. To extract a graph from a GML file, we parse the
  333. file and extract the list of Object structures. Then, we run
  334. through the objects and extract the graph structure from them:
  335. \begin{program}
  336. \textbf{var} \emph{objects}: List(Object); \\
  337. \\
  338. \emph{objects} := parse (file); \\
  339. \textbf{foreach} $o$ \textbf{in} \emph{objects} \textbf{where} $o$.key = ``Graph'' \textbf{do} \\
  340. \> \emph{g} := \textbf{new} graph; \\
  341. \> \textbf{foreach} $\bar{o}$ \textbf{in} $o$.value \textbf{where} $\bar{o}$.key = ``Node'' \textbf{do} \\
  342. \> \> $n$ := \textbf{new} node ($g$); \\
  343. \> \> $n$.attributes := $\bar{o}$.value; \\
  344. \> \> \textbf{remove} $\bar{o}$ \textbf{from} $o$.value; \\
  345. \> \textbf{done} \\
  346. \> \textbf{foreach} $\bar{o}$ \textbf{in} $o$.value \textbf{where} $\bar{o}$.key = ``Edge'' \textbf{do} \\
  347. \> \> $e$ := \textbf{new} edge ($\bar{o}$.source.value, $\bar{o}$.target.value); \\
  348. \> \> $e$.attributes := $\bar{o}$.value; \\
  349. \> \> \textbf{remove} $\bar{o}$ \textbf{from} $o$; \\
  350. \> \textbf{done} \\
  351. \> $g$.attributes := $o$.value; \\
  352. \> \textbf{remove} $o$ \textbf{from} \emph{objects}; \\
  353. \textbf{done} \\
  354. \end{program}
  355. \noindent To write a graph to a GML file, use the following schema:
  356. \begin{program}
  357. \textbf{procedure} print ($g$: Graph); \\
  358. \textbf{begin} \\
  359. \> \textbf{print} ``\verb|graph [|''; \\
  360. \> \textbf{print} ($g$.attributes); \\
  361. \> \textbf{foreach} $n$ \textbf{in} g.nodes \textbf{do} \\
  362. \> \> \textbf{print} ``\verb|node [|''; \\
  363. \> \> \textbf{print} ($n$.attributes); \\
  364. \> \> \textbf{print} \verb|]|''; \\
  365. \> \textbf{done} \\
  366. \> \textbf{foreach} $e$ \textbf{in} $g$.edges \textbf{do} \\
  367. \> \> \textbf{print} ``\verb|edge [|''; \\
  368. \> \> \textbf{print} $e$.attributes; \\
  369. \> \> \textbf{print} ``\verb|]|''; \\
  370. \> \textbf{done} \\
  371. \> \textbf{print} ``\verb|]|''; \\
  372. \textbf{end} \\
  373. \\
  374. \textbf{procedure} print (\emph{objects}: List(Object)) \\
  375. \textbf{begin} \\
  376. \> \textbf{foreach} $o$ \textbf{in} \emph{objects} \textbf{do} \\
  377. \> \> \textbf{print} $o$.key; \\
  378. \> \> \textbf{case} type($o$) \textbf{of} \\
  379. \> \> \> Integer : \\
  380. \> \> \> \> \textbf{print} $o$.value; \\
  381. \> \> \> Real : \\
  382. \> \> \> \> \textbf{print} $o$.value; \\
  383. \> \> \> String : \\
  384. \> \> \> \> \textbf{print} $o$.value; \ttcomment{must ensure line
  385. length $<$ 255} \\
  386. \> \> \> List : \\
  387. \> \> \> \> \textbf{print} ``\verb|[|'' \\
  388. \> \> \> \> \textbf{print} $o$.value; \\
  389. \> \> \> \> \textbf{print} ``\verb|]|''; \\
  390. \> \> \textbf{end} \\
  391. \> \textbf{done} \\
  392. \textbf{end} \\
  393. \end{program}
  394. \noindent In the above program, we assume that a statement \textbf{print
  395. $s$} writes $s$ with the following properties:
  396. \begin{itemize}
  397. \item $s$ is surrounded by quotes.
  398. \item The output is adjusted to a maximum line length of 254 characters,
  399. inclusive quotes.
  400. \item All characters are properly translated into the ISO 8859
  401. character set. Especially, the \verb|&| and \verb|"| characters
  402. are replaced by \verb|&amp;| and \verb|&quot;|.
  403. \end{itemize}
  404. It is a good idea to save all objects in the file and print them
  405. out again, unless they are unsafe and changes to the graph have
  406. been made (see also chapter \ref{c:GML:Consistency}). This will
  407. make sure that valuable information supplied by other programs
  408. wont get lost.
  409. %
  410. % Consistency
  411. %
  412. \chapter{Consistency: Safe and Unsafe Objects}
  413. \label{c:GML:Consistency}
  414. With GML, we can specify objects that depend on other objects.
  415. That is, changing one object forces other objects to be updated.
  416. There are many examples for such behaviour:
  417. \begin{description}
  418. \item[Edge coordinates] Edge coordinates must be updated when
  419. the endnodes of the edge move. Technically, if a node is
  420. removed, all adjacent edges must be removed.
  421. \item[Label coordinates] The same as above applies to label
  422. coordinates.
  423. \item[Graph Theory] Information on graph theoretical
  424. properties, such as planarity, is often needed by applications
  425. and should therefore be added to a graph. This may save costly
  426. recomputation (if an appropriate algorithm is available at
  427. all). This information must be updated whenever the structure
  428. of the graph changes.
  429. \item[Maximum and Minimum Coordinates] Some applications need
  430. information on the maximum or minimum x and y coordinates.
  431. They must be updated whenever a node or edge changes
  432. \emph{position} or \emph{size}.
  433. \item[Size of the graph] Information on the size of the graph,
  434. that is the number of nodes and/or edges is also an often
  435. found information in graph file formats. However, this
  436. information must be updated whenever the structure of the graph
  437. changes.
  438. \end{description}
  439. In some cases, it is quite obvious how to handle updates. On the
  440. other hand, in the case of Graph theoretic properties, it might
  441. be difficult and time costly to update, if at all feasible.
  442. There are many solutions to this problem. Besides omitting all
  443. potentially dangerous keys, a simple solution would be to define
  444. consistency conditions for a few known objects and forbid any
  445. other consistency violating objects. This would especially mean
  446. that graph theoretical properties, which might be the result of
  447. several hours computing, are ruled out.
  448. The most complex solution would be to give a precise
  449. specification for the dependencies of each object, and force
  450. updates. However, this would lead to a very complex file formant,
  451. and would be difficult to implement.
  452. Therefore, we settle for the following solution: It is legal to
  453. add objects which become inconsistent after changes, but a a hint
  454. must be given that the key bears a potential consistency problem.
  455. \begin{definition}[Safe and Unsafe Objects]
  456. We call any object which must be updated or removed upon a
  457. change unsafe. We discriminate safe and unsafe objects by
  458. their key:
  459. \begin{itemize}
  460. \item An object who's key starts with a \textbf{lower case} letter
  461. starts is \emph{safe}.
  462. \item An object who's key starts with a \textbf{capital letter} starts
  463. is \emph{unsafe}.
  464. \end{itemize}
  465. Any program that changes (a) the topoligical structure of the
  466. graph or (b) an attribute must delete all objects it cannot
  467. update properly.
  468. \end{definition}
  469. %%% Local Variables:
  470. %%% mode: latex
  471. %%% TeX-master: "GML"
  472. %%% End: