- From: C. M. Sperberg-McQueen <cmsmcq@blackmesatech.com>
- Date: Fri, 24 Jun 2022 20:27:22 -0600
- To: M Joel Dubinko <micah@dubinko.info>
- Cc: public-ixml@w3.org
M Joel Dubinko writes:

> Idle thought,
>
> Is it possible to represent the XML grammar (of XML itself) in ixml?
> Not that this seems immediately useful, other than perhaps as a
> torture test...

Yes, I think so, though the closest anyone has come is a very simple toy grammar that illustrates the principle and exhibits the complications that arise in the process. The toy grammar, which I have now retrieved and sanity-checked with a functioning ixml processor, looks like this:

    { A grammar for a small subset of XML, as an illustration. }

    document: ws?, element, ws?.
    element: start-tag, content, end-tag; sole-tag.
    -start-tag: -"<", @gi, (ws, attribute)*, ws?, -">".
    -end-tag: -"</", @gi2, ws?, -">".
    -sole-tag: -"<", @gi, (ws, attribute)*, ws?, -"/>".
    attribute: @name, ws?, -"=", ws?, @value.
    @value: dqstring; sqstring.
    -dqstring: dq, ~['"']*, dq.
    -sqstring: sq, ~["'"]*, sq.
    -dq: -['"'].
    -sq: -["'"].
    -content: (PCDATA-char; processing-instruction; comment; element)*.
    -PCDATA-char: (~["<>&"]; "&amp;"; "&lt;"; "&gt;").
    processing-instruction: -"<?", @name, ws, @pi-data, -"?>".
    comment: -"<!--", commentdata, -"-->".
    name: [L; "_"], [L; Nd; "_-."]*.  { a slight simplification }
    -pi-data: ~["?"]*.                { another slight simplification }
    -commentdata: ~["-"]*.            { and another }
    gi: name.
    gi2: name.
    -ws: -[#20; #A; #D; #9]+.

As you can see, it's somewhat simpler than the grammar in the XML spec, but I don't know of any principled reason that the spec grammar could not be translated in its entirety into ixml. (The XML spec does use some constructs that ixml lacks, such as subtraction, written A - B and matching any string that matches A but not B. These would complicate the effort, but I don't think it's impossible to translate them into ixml, just error-prone and tedious.)

Among the input sequences that should be accepted by this grammar is the following XML representation of a haiku.
    <haiku author="Basho" date="1686">
      <line>When the old pond</line>
      <line>gets a new frog</line>
      <line>it's a new pond.</line>
    </haiku>

When we ask a processor to parse that input against that grammar, the results remind us that XML has a number of additional constraints that are attached to the grammar but are not part of it. Here, for example, nothing in the grammar requires the name in the end-tag to match the name in the start-tag (XML's Element Type Match well-formedness constraint), which is why the grammar records the two names separately as gi and gi2. (For those who care about such things: the well-formedness and validity constraints, and in particular the idea of attaching them to the relevant productions in the grammar, came from the notation of attribute grammars.)

Also, ixml creates elements with the names of the nonterminals in the grammar. So the output does not look like the input XML, although it is easy to see that it has a similar structure, and it would be an easy XSLT exercise to translate it into conventional XML. It looks like an XML document describing the structure of another XML document -- which, I suppose, is what it is.

    <document>
       <element gi="haiku" gi2="haiku">
          <attribute name="author" value="Basho"/>
          <attribute name="date" value="1686"/>
          <element gi="line" gi2="line">When the old pond</element>
          <element gi="line" gi2="line">gets a new frog</element>
          <element gi="line" gi2="line">it's a new pond.</element>
       </element>
    </document>

There is extra whitespace in the output because the grammar makes no effort to suppress inter-element whitespace. Again, that is easy to deal with in an XSLT post-processing step.

I hope this helps, and thank you for the question.

Michael

-- 
C. M. Sperberg-McQueen
Black Mesa Technologies LLC
http://blackmesatech.com
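The post-processing described in the message as an XSLT exercise can also be sketched outside XSLT. What follows is a minimal sketch in Python that uses the standard library's xml.etree.ElementTree in place of XSLT (a swapped-in technique, not the author's). It assumes the ixml output has exactly the shape the toy grammar produces, and the names IXML_OUTPUT, convert, to_conventional, and strip_ws are hypothetical. It also checks that each end-tag name matches its start-tag name, one of the XML constraints that sits outside the grammar, and drops whitespace-only text when strip_ws is true. Entity references such as &amp; from the original input arrive as literal text and are not expanded.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: a copy of the ixml output for the haiku, including
# the inter-element whitespace the toy grammar does not suppress.
IXML_OUTPUT = """<document>
   <element gi="haiku" gi2="haiku">
      <attribute name="author" value="Basho"/>
      <attribute name="date" value="1686"/>
      <element gi="line" gi2="line">When the old pond</element>
      <element gi="line" gi2="line">gets a new frog</element>
      <element gi="line" gi2="line">it's a new pond.</element>
   </element>
</document>"""


def _append_text(parent, last, text, strip_ws):
    """Attach text after `last`, or at the start of `parent` if `last` is None."""
    if not text or (strip_ws and not text.strip()):
        return  # nothing to add, or whitespace-only text we were asked to drop
    if last is None:
        parent.text = (parent.text or "") + text
    else:
        last.tail = (last.tail or "") + text


def to_conventional(node, strip_ws=True):
    """Rebuild a conventional XML element from one ixml <element> node.

    Assumption: `node` has the shape produced by the toy grammar.
    """
    gi, gi2 = node.get("gi"), node.get("gi2")
    # The grammar cannot require gi2 == gi (Element Type Match), so check
    # it here; elements written as sole-tags (<x/>) have no gi2 at all.
    if gi2 is not None and gi != gi2:
        raise ValueError(f"end-tag </{gi2}> does not match start-tag <{gi}>")
    result = ET.Element(gi)
    last = None  # most recent child appended to `result`
    _append_text(result, last, node.text, strip_ws)
    for child in node:
        if child.tag == "attribute":
            result.set(child.get("name"), child.get("value"))
        elif child.tag == "element":
            last = to_conventional(child, strip_ws)
            result.append(last)
        elif child.tag == "comment":
            last = ET.Comment(child.text or "")
            result.append(last)
        elif child.tag == "processing-instruction":
            last = ET.ProcessingInstruction(child.get("name"), child.get("pi-data"))
            result.append(last)
        # Content text that followed this child is stored in its tail.
        _append_text(result, last, child.tail, strip_ws)
    return result


def convert(ixml_xml, strip_ws=True):
    """Turn serialized ixml output back into conventional XML text."""
    document = ET.fromstring(ixml_xml)
    root = to_conventional(document.find("element"), strip_ws)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(convert(IXML_OUTPUT))
```

Run as a script, this prints `<haiku author="Basho" date="1686"><line>When the old pond</line><line>gets a new frog</line><line>it's a new pond.</line></haiku>`; passing strip_ws=False keeps the inter-element whitespace instead.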
Received on Saturday, 25 June 2022 02:27:41 UTC