- From: Chris Lahey <clahey@clahey.net>
- Date: Tue, 4 Dec 2012 23:50:19 -0500
- To: James Clark <jjc@jclark.com>
- Cc: David Carlisle <davidc@nag.co.uk>, "public-microxml (public-microxml@w3.org)" <public-microxml@w3.org>
I've run into a couple of issues with the spec. Is this a good forum for the discussion?

Specifically, when in Main Tokenization Mode, the first listed possible parse is DATA_CHAR with a default handler, but all possible strings match this rule, so if you apply the rules in order, the whole document will just be parsed as a list of DataChars. (This is what my code is doing right now, but I already changed the order, so that's just debugging that has to happen on my end.) I think the spec should specify the order in which matches take precedence.

You also don't specify what happens when you get to end of stream while in Main mode (or a bunch of other modes, actually). My guess is you stop outputting things, but I think that should be specified.

Also, the default handling rule for NUMERIC_CHAR_REF requires the original character data if the integer is over 10FFFF, but the associated data for a NUMERIC_CHAR_REF is the integer.

I got my Tokenization code to compile, which is a pretty good step. There's still a fair amount of work to do, but I'm pretty happy with the spec so far.

Thanks much,
Chris

On Mon, Nov 26, 2012 at 9:53 AM, James Clark <jjc@jclark.com> wrote:
> Yes, you're right, thanks. Fixed now.
>
> James
>
>
> On Mon, Nov 26, 2012 at 9:48 PM, David Carlisle <davidc@nag.co.uk> wrote:
>>
>> On 26/11/2012 13:47, James Clark wrote:
>>>
>>> The write-up is here:
>>
>>
>> newlines are normalized by replacing any #xA character or #xD/#xA
>> character sequence, by a #xA character.
>>
>> I think that first xA should be xD
>>
>> David
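To illustrate the rule-ordering concern raised at the top of this message, here is a minimal sketch of a first-match-wins tokenizer. The rule names echo the ones mentioned above, but the patterns, the `tokenize` function, and the two rule tables are assumptions made up for illustration; they are not taken from the spec, which may intend a different disambiguation strategy (longest match, for instance). The sketch also assumes that reaching end of input simply ends the token stream, which is the guess made above and not something the spec is known to state.

```python
import re

# Hypothetical rule table: the patterns are illustrative only and are
# NOT copied from the MicroXML spec.
RULES_SPECIFIC_FIRST = [
    ("NUMERIC_CHAR_REF", re.compile(r"&#x[0-9A-Fa-f]+;")),
    ("DATA_CHAR", re.compile(r".", re.DOTALL)),  # catch-all, listed last
]

# Same rules with the catch-all listed first, as in the order described above.
RULES_CATCH_ALL_FIRST = list(reversed(RULES_SPECIFIC_FIRST))


def tokenize(text, rules):
    """Try the rules in list order at each position; the first match wins."""
    pos = 0
    tokens = []
    while pos < len(text):
        for name, pattern in rules:
            match = pattern.match(text, pos)
            if match:
                tokens.append((name, match.group(0)))
                pos = match.end()
                break
        else:
            raise ValueError(f"no rule matches at offset {pos}")
    # Assumption: end of input just ends the token stream.
    return tokens


if __name__ == "__main__":
    sample = "a&#x41;"
    print(tokenize(sample, RULES_SPECIFIC_FIRST))
    print(tokenize(sample, RULES_CATCH_ALL_FIRST))
```

With the catch-all first, every character of the sample comes out as a separate DATA_CHAR token and the character reference is never recognized; with the specific rule first, the reference is tokenized as a unit. An explicit precedence statement would remove that ambiguity.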
Received on Wednesday, 5 December 2012 04:51:16 UTC