experiment for namespace declaration and text injection from C. M. Sperberg-McQueen on 2021-06-08 (public-ixml@w3.org from June 2021)

From: C. M. Sperberg-McQueen <cmsmcq@blackmesatech.com>
Date: Mon, 7 Jun 2021 21:49:34 -0600
To: public-ixml@w3.org
Cc: "C. M. Sperberg-McQueen" <cmsmcq@blackmesatech.com>
Message-Id: <E025EF56-E0B0-4068-83D3-5AF613616582@blackmesatech.com>

This is a followup to

https://lists.w3.org/Archives/Public/public-ixml/2021May/0002

in preparation for our June call.

If we want to think about using namespaces and injecting text, I think we could do worse than experiment with pragmas.

If as a group we agree to make namespaces and text injection part of the language, then we won't use pragmas or comments to do it, but the experiments will give us a sense of what needs to be said and what kinds of syntax we want.

And if as a group we don't change the spec, then any implementor can use pragmas (or meaningful comments, which amount to non-standardized pragmas) to do it.

Imagine that I am writing a grammar for a non-XML representation of (some class of) XSLT stylesheets. I would like the line

out: xml

to turn into

<xsl:output method="xml" indent="yes"/>

Task 1: bind a namespace to a prefix.

What we need is a prolog that occurs before the grammar, with namespace declarations (and potentially other things if we find the need). It might be as simple as:

{!xmlns xsl "http://www.w3.org/1999/XSL/Transform" !}

Task 2: specify that the output element should use it.

The simplest way to do this is to allow colons in nonterminals and specify that if a nonterminal has the form prefix:localname, then it's to be interpreted as a QName and the XML element or attribute emitted should be in the appropriate namespace.

xsl:output: 'out:', s, method, indent.
@method: 'xml'; 'html'; 'text'.
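
Tasks 1 and 2 can be sketched outside ixml to show what a processor would have to do. The following Python is only an illustration under stated assumptions: the pragma syntax is the hypothetical one shown above, the names XMLNS_PRAGMA, read_bindings and expand are invented for this sketch, and the standard library's ElementTree stands in for whatever serializer an implementation really uses.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical pragma syntax, copied from the example in this message:
#   {!xmlns PREFIX "NAMESPACE-URI" !}
XMLNS_PRAGMA = re.compile(r'\{!\s*xmlns\s+(\S+)\s+"([^"]*)"\s*!\}')

def read_bindings(prolog):
    """Collect prefix -> namespace URI bindings from xmlns pragmas."""
    return {prefix: uri for prefix, uri in XMLNS_PRAGMA.findall(prolog)}

def expand(name, bindings):
    """Turn a 'prefix:localname' nonterminal into ElementTree's '{uri}localname'."""
    if ":" not in name:
        return name  # no prefix: the name is in no namespace
    prefix, local = name.split(":", 1)
    if prefix not in bindings:
        raise ValueError(f"unbound namespace prefix: {prefix}")
    return "{" + bindings[prefix] + "}" + local

prolog = '{!xmlns xsl "http://www.w3.org/1999/XSL/Transform" !}'
bindings = read_bindings(prolog)

# Ask ElementTree to serialize with the author's prefix rather than ns0.
for prefix, uri in bindings.items():
    ET.register_namespace(prefix, uri)

element = ET.Element(expand("xsl:output", bindings), {"method": "xml"})
serialized = ET.tostring(element, encoding="unicode")
print(serialized)
```

The point of the sketch is that the colon in the nonterminal needs no further machinery: once the prolog supplies the bindings, the name maps to an expanded name and the serializer takes care of the xmlns:xsl declaration.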

Task 3: inject the attribute-value pair indent="yes".

Injecting the attribute is easy; injecting the 'yes' requires something new.

Approach 3.1: Pragmas. With a pragma, it might be:

@indent: {! {http://example.org/eixml}inject "yes" !}.

or (assuming the appropriate binding)

@indent: {!eixml:inject "yes" !}.

Read declaratively as a normal grammar rule, this says that the lhs @indent rewrites to the empty string. Read operationally, it adds that whenever @indent occurs in the parse, an implementation supporting the eixml 'inject' keyword will inject the string "yes" into the data.

Approach 3.2: new marker + (inject text)

Text injection is the mirror image of text suppression using the mark -, so we might also imagine just adding + as a mark on terminals. In that case, we could write

@indent: +'yes'.

I am pretty sure this introduces no ambiguity, but I have not checked.
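
The mirror-image relation between - and + can be made concrete with a small sketch. This is Python rather than ixml, and the Terminal and Nonterminal classes and the consumed and emitted functions are invented here for illustration. It assumes that a +-marked terminal matches no input at all and contributes its string to the output, which is one plausible reading of the proposal, not a settled definition.

```python
from dataclasses import dataclass, field
from typing import List, Union

@dataclass
class Terminal:
    text: str
    mark: str = ""  # "" = keep, "-" = suppress, "+" = inject (the proposed mark)

@dataclass
class Nonterminal:
    name: str
    children: List[Union["Nonterminal", Terminal]] = field(default_factory=list)
    mark: str = ""  # "" = element, "@" = attribute, "-" = no wrapper

def consumed(node) -> str:
    """Input text matched by the node; a +terminal matches nothing."""
    if isinstance(node, Terminal):
        return "" if node.mark == "+" else node.text
    return "".join(consumed(child) for child in node.children)

def emitted(node) -> str:
    """Text contributed to the output; a -terminal contributes nothing."""
    if isinstance(node, Terminal):
        return "" if node.mark == "-" else node.text
    return "".join(emitted(child) for child in node.children)

# @indent: +'yes'.
indent = Nonterminal("indent", [Terminal("yes", mark="+")], mark="@")

# A suppressed literal such as -'out:' for comparison.
keyword = Terminal("out:", mark="-")

print(repr(consumed(indent)), repr(emitted(indent)))
print(repr(consumed(keyword)), repr(emitted(keyword)))
```

Under that assumption the two marks are exact duals: - is consumed but not emitted, + is emitted but not consumed, and an unmarked terminal is both.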

Approach 3.3: new marker # (serialize as text node)

Alternatively, we can add a new marker for non-terminals. Right now, non-terminals can be serialized as elements (or element names), as attributes (or attribute names), or not at all. If we add a mark meaning "serialize as character data or attribute value" -- the equivalent of the xsl:text instruction -- then we could inject the attribute-value pair we want by writing

@indent: yes.
#yes: .

If the right-hand side is nonempty, it gets serialized after the #-marked left-hand side. This could be used for simple transductions:

xsl:output: method, indent.
@method: 'xml'; 'text'; default-method.
-default-method: #xml.
#xml: .

@indent: indyes; indno.
-indyes: -'indent', #yes; default-indent.
-default-indent: #yes.
-indno: -'noindent', #no.
#yes: .
#no: .
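
To see what the # mark would do in the transduction above, here is a rough Python sketch of the serialization step only. The parse tree is built by hand for the input "noindent" (no parser is involved), and the Leaf and Node classes and the text_of and attributes functions are names invented for this illustration. It assumes that a #-marked nonterminal serializes its own name as text, followed by whatever its right-hand side produces.

```python
from dataclasses import dataclass, field

@dataclass
class Leaf:
    text: str
    mark: str = ""  # "-" means the matched text is suppressed

@dataclass
class Node:
    name: str
    mark: str = ""  # "@" attribute, "-" no wrapper, "#" serialize name as text
    children: list = field(default_factory=list)

def text_of(node) -> str:
    """String value of a node under the assumed semantics of the # mark."""
    if isinstance(node, Leaf):
        return "" if node.mark == "-" else node.text
    own = node.name if node.mark == "#" else ""
    return own + "".join(text_of(child) for child in node.children)

def attributes(node) -> dict:
    """Collect the @-marked children of a node as attribute name -> value."""
    return {
        child.name: text_of(child)
        for child in node.children
        if isinstance(child, Node) and child.mark == "@"
    }

# Hand-built parse of the input "noindent":
#   method takes the default-method branch, indent takes the indno branch.
tree = Node("xsl:output", "", [
    Node("method", "@", [Node("default-method", "-", [Node("xml", "#")])]),
    Node("indent", "@", [
        Node("indno", "-", [Leaf("noindent", "-"), Node("no", "#")]),
    ]),
])

result = attributes(tree)
print(result)
```

In this reading the grammar writer gets a small vocabulary of output strings (xml, yes, no) that are independent of the input keywords, which is what makes the defaulting branches possible.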

The one thing that worries me a bit is the realization that after writing XForms 1.0 to require literals in all sorts of places, the XForms WG spent much of 1.1 revising the spec to allow indirect specification of values — instead of literals, references to values in instance documents are allowed. I worry that it may be over-optimistic to assume that the text to be injected will always be known to the grammar writer and need not be read out of the input. (But if it’s read out of the input, it need not be injected. The only use for saying ‘inject here a string that came from the input document’ would be to re-order things or make something that occurred once in the input occur multiple times in the output. I think our answer to that use case should be: XSLT is your friend. Use it.)

I wonder what other people think.

Michael

********************************************
C. M. Sperberg-McQueen
Black Mesa Technologies LLC
cmsmcq@blackmesatech.com
http://www.blackmesatech.com
********************************************

Received on Tuesday, 8 June 2021 03:50:16 UTC