Re: Refactor of ixml grammar from Dave Pawson on 2021-11-04 (public-ixml@w3.org from November 2021)

From: Dave Pawson <dave.pawson@gmail.com>
Date: Thu, 4 Nov 2021 13:43:50 +0000
To: Tom Hillman <tom@expertml.com>
Cc: Steven Pemberton <steven.pemberton@cwi.nl>, "C. M. Sperberg-McQueen" <cmsmcq@blackmesatech.com>, ixml <public-ixml@w3.org>
Message-ID: <CAEncD4e6Kp2L=A-3-uakVMnqyaWsznwGmZVZvcsn82V8sjjU8w@mail.gmail.com>

On Thu, 4 Nov 2021 at 13:28, Tom Hillman <tom@expertml.com> wrote:
>
> I would import it as unparsed-text() and treat it as a string.  It is the grammar's job to define whether or not end of lines are meaningful; to the parser, they are just characters, as Steven says.

The comment was that the end-of-line character(s) are being stripped, so
how would one isolate the 'string' into lines for further processing?
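
As a rough illustration of the "lines first, then within each line" sequence discussed further down the thread, here is a minimal Python sketch (neither XSLT nor ixml). The names split_lines and split_fields and the sample data are hypothetical, and the naive comma split assumes no quoted CSV fields. The first step is roughly what an XPath tokenize() call with a line-ending pattern would do.

```python
import re


def split_lines(text):
    """Split text on CRLF, CR, or LF; the terminators are dropped."""
    lines = re.split(r"\r\n|\r|\n", text)
    # A trailing terminator (or empty input) leaves one empty final piece.
    if lines and lines[-1] == "":
        lines.pop()
    return lines


def split_fields(line, sep=","):
    """Second pass: split a single line into fields (no quoting support)."""
    return line.split(sep)


# Hypothetical sample mixing three line-ending conventions.
sample = "a,b,c\r\n1,2,3\n4,5,6\r"
rows = [split_fields(line) for line in split_lines(sample)]
```

This only works if the line terminators are still present in the string handed to the first step, which is the point of the question.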

regards



>
> Tom
>
> _________________
> Tomos Hillman
> eXpertML Ltd
> +44 7793 242058
> On 4 Nov 2021, 13:25 +0000, Dave Pawson <dave.pawson@gmail.com>, wrote:
>
> On Thu, 4 Nov 2021 at 13:16, Tomos Hillman <yamahito@gmail.com> wrote:
>
>
> JayParser is an XSLT implementation of iXML, Dave:
>
> https://github.com/eXpertML/JayParser
>
> There's also a talk online from last year's Declarative Amsterdam, if you want to see how it works: TLDR is to import the JayParser.xsl and call e:parse-with-grammar($string-to-parse, $grammar-as-xml).
>
>
> So how might you tokenize a plain text file, isolating each text line, Tomos?
> (This assumes the XSLT implementation understands ixml.)
>
> regards
>
>
>
>
> Unfortunately, it doesn't scale so well, yet; a(nother) rewrite has been on my todo list for a while!
>
> Thanks,
> Tom
> On 4 Nov 2021, 12:27 +0000, Dave Pawson <dave.pawson@gmail.com>, wrote:
>
> Sorry Steven, I meant an XSLT tokenize. I 'presume' that XSLT may eventually
> take in iXML?
>
> regards
>
> On Thu, 4 Nov 2021 at 10:39, Steven Pemberton <steven.pemberton@cwi.nl> wrote:
>
>
> I'm not sure if I exactly understand your comment, but ixml doesn't tokenise at all. It only recognises characters.
> Steven
>
> On Thursday 04 November 2021 09:45:32 (+01:00), Dave Pawson wrote:
>
> A common XSLT processing sequence for converting plain text (e.g. CSV)
> to XML is to tokenize by end of line first, then within each line.
>
> I'd hope you might support this form of processing.
>
> regards
>
> On Thu, 4 Nov 2021 at 08:31, Steven Pemberton <steven.pemberton@cwi.nl> wrote:
>
>
> Good points.
>
> ABC completely denies the existence of end-of-line characters. It delivers input as an array of lines, where the line terminators have been elided. This is because different operating systems use different line end conventions, and the language hides these differences. So there is no way to get a LF delivered.
>
> Steven
>
> On Thursday 04 November 2021 01:32:13 (+01:00), C. M. Sperberg-McQueen wrote:
>
> I like most of these changes.
>
> But having
>
> ixml: s, rule+.
> rule: (mark, s)?, name, s, -[":="], s, -alts, -".", s.
>
> instead of
>
> ixml: s, rule+s.
> rule: (mark, s)?, name, s, -[":="], s, -alts, -".".
>
> has the unfortunate effect that a grammar like
>
> { Section 1: …}
> a: … .
> b: … .
>
>
> { Section 2: …}
> z: … .
> y: … .
>
> produces XML in which the comment ‘ Section 2: … ’ turns up not
> between the last rule of section 1 and the first rule of section 2, but
> within the last rule of section 1.
>
> Also, I’m curious what the bug involving lf was.
>
> Michael
>
> On 3 Nov 2021, at 5:02 PM, Steven Pemberton <steven.pemberton@cwi.nl> wrote:
>
> In an idle moment, I refactored the grammar. Comments gladly received.
> Changes:
> * I hid all nonessential terminals. I know above all Tom was asking for this.
> * I moved the spaces from the rule for ixml into the rule for rule. Tidier and more consistent.
> * I renamed S to s.
> * I simplified 'namestart', since I realised class L covered all the cases.
>
> I think that's all.
>
> See attachment.
>
> Steven
> <ixml-new.ixml>
>
> --
> Dave Pawson
> XSLT XSL-FO FAQ.
> Docbook FAQ.



-- 
Dave Pawson
XSLT XSL-FO FAQ.
Docbook FAQ.

Received on Thursday, 4 November 2021 13:44:14 UTC