Re: Adding implicit string values

From: C. M. Sperberg-McQueen <cmsmcq@blackmesatech.com>
Date: Tue, 13 Apr 2021 11:56:59 -0600
Cc: "C. M. Sperberg-McQueen" <cmsmcq@blackmesatech.com>, Steven Pemberton <Steven.Pemberton@cwi.nl>, public-ixml@w3.org
Message-Id: <24F2341C-497D-4F59-A2E7-9B760142D84D@blackmesatech.com>
To: Tom Hillman <tom@expertml.com>
Interesting idea.  

Once more I am reminded of Prolog definite clause grammars, which have a facility sometimes referred to as ‘pushback tokens’.  Normally, DCG rules are written

  n --> a, b, c.

or 

  n --> "terminal".

But it’s also possible to write

  n, "injected-terminal" --> a, b, c.

which has the operational meaning “parse the a, b, and c, and if that succeeds, then push ‘injected-terminal’ onto the front of the remaining input sequence.”  It can be used to handle lookahead:  the rule 

    n, "x" --> "abc", "x". 

effectively recognizes the input “abc” as an n, but only when followed by “x”. 
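
For readers who do not think in Prolog, here is a minimal sketch of that operational reading in Python.  It is only a simulation under stated assumptions, not how any Prolog system implements DCGs: the input is taken to be a list of already-separated tokens, and the function name parse_n is hypothetical.

```python
def parse_n(tokens):
    """Simulate the DCG rule   n, "x" --> "abc", "x".

    Returns the remaining input if an n is recognized, otherwise None.
    Assumption: the input is a list of tokens, not a character stream.
    """
    body = ["abc", "x"]      # right-hand side: terminals the rule consumes
    pushback = ["x"]         # terminals pushed back once the body succeeds
    if tokens[:len(body)] != body:
        return None          # the body did not match, so the rule fails
    rest = tokens[len(body):]
    return pushback + rest   # "x" is at the front of the input again


print(parse_n(["abc", "x", "y"]))  # ['x', 'y']
print(parse_n(["abc", "y"]))       # None
```

The net effect is that only “abc” is consumed, and only when an “x” follows it, which is the lookahead behavior described above.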

I believe that some sufficiently strong-minded people have managed to formulate a declarative meaning for pushback tokens, but I have not actually internalized it.  (Pause.  OK, now I’ve checked, and I find that they did not actually get any further than renaming the construct “semi-context notation”; the only description they give is completely operational.)

However, there is a catch in the example:  we don’t want attributes named “article-citation-level” and “book-citation-level” and so on; they all need to be named “level”.  I haven’t seen a way to extend ixml to do that without breaking the simple rule that element and attribute names are given by the nonterminals.  
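
Absent such an extension, the renaming can be done as a second pass over the ixml output.  The following is a rough sketch only, in Python with the standard xml.etree.ElementTree module.  The attribute names come from Tom’s example; the function name rename_level_attributes, the sample element, and the assumption that the attribute ends up on the citation element are all hypothetical.

```python
import xml.etree.ElementTree as ET

# Suffix shared by the attribute names in Tom's example grammar.
SUFFIX = "-citation-level"


def rename_level_attributes(root):
    """Rename every attribute ending in '-citation-level' to 'level', in place."""
    for element in root.iter():
        # Copy the names first: the attribute dict is modified inside the loop.
        for name in list(element.attrib):
            if name.endswith(SUFFIX):
                element.set("level", element.attrib.pop(name))
    return root


# Hypothetical parser output for an article citation.
doc = ET.fromstring(
    '<citation article-citation-level="a">'
    "<author>Alka-Seltzer, L.</author>"
    "</citation>"
)
rename_level_attributes(doc)
print(ET.tostring(doc, encoding="unicode"))
```

This is exactly the kind of darkroom step the proposal hopes to avoid, so it illustrates the cost of the workaround rather than a recommendation.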

I like the in-camera, in-darkroom analogy, in part because it allows for situations in which there are tradeoffs.  At this point, Steven may be fearing that everyone is wanting to take a beautiful, minimal design and add bells and whistles to it that will ruin its simplicity, so I will say explicitly that a lot of the beauty of ixml is in the extremely high power-to-mechanism ratio, and we should be very wary of adding too much visible mechanism.  

It’s been a long time since I looked at the ISO standard for EBNF, but what I remember as my dominant impression when I looked at it was that they had taken a very simple, reasonably attractive design that said nothing at all about tokenization, and the ISO WG had added tokenization with so much mechanism (and so little taste) that it spoiled the entire design.  I have a vague recollection that looking at it again later I thought maybe it wasn’t all THAT bad.  But the general principle is clear:  when you add a lot of things to a beautiful minimal design, you can pretty much guarantee that when you’re done it won’t be minimal any more.

Michael

> On 13 Apr 2021, at 4:35 AM, Tom Hillman <tom@expertml.com> wrote:
> 
> Michael's TEI reference example makes me think that there is another missing feature that we may want to consider.
> 
> If there is some grammatical furniture in an input parse, we can choose to discard it using the `-` mark, and suppress the corresponding rule in the grammar from being serialised as an XML node.
> 
> Thus, something explicit in the non-XML format can become implicit in the XML format.
> 
> Michael's use case is an example of where we might want to do the opposite: take something implicit in the non-XML format and make it explicit in the XML format.
> 
> We can do that, as Michael discusses, by adding a rule name that we might not want in our final output and processing further, but I think it would be better if we can do as much as possible "in camera" rather than "in the dark room".
> 
> Perhaps we could consider a `+` mark to complement the `-` mark;  the effect would be to create some XML node that need not be present in the parsed input:
> 
>                citation: article-citation; journal-citation; book-citation.
>       -article-citation: +article-citation-level, author, title, journal, volume, locator, date.
>       -journal-citation: +journal-citation-level, journal.
>          -book-citation: +book-citation-level, author, title, location, publisher, date.
> @article-citation-level: "a".
> @journal-citation-level: "j".
>    @book-citation-level: "m".
> 
> Tom
> 
> _________________
> Tomos Hillman
> eXpertML Ltd
> +44 7793 242058
> On 13 Apr 2021, 02:54 +0100, C. M. Sperberg-McQueen <cmsmcq@blackmesatech.com>, wrote:
>> 
>> So we start with
>> 
>> <article>Alka-Seltzer, L. Untersuchungen über die tomatostaltische Reflexe beim Walküre. *Bayreuth Monatschr. f. exp. Biol.* 184, 34-43, 1815.</article>
>> ...
>> <book>Hun, O. & Deu, I. *Tonic, diatonic, & catatonic stage-distress syndromes.* Basel, Karger, 1960.</book>
>> 
>> and use simple grammars to parse these into a richer tagging. (Trying to show how these would be tagged, I realize I can’t get these into TEI on a single pass, because the TEI output wants <title level="a"> for article titles, <title level="j"> for journal titles, and <title level="m"> for book titles, and we don’t have any literal “a”, “j”, or “m” in the data to use to populate the ‘level’ attribute.)

********************************************
C. M. Sperberg-McQueen
Black Mesa Technologies LLC
cmsmcq@blackmesatech.com
http://www.blackmesatech.com
********************************************
Received on Tuesday, 13 April 2021 17:57:22 UTC