Re: ARC updated to latest syntax doc, WD feedback from Mark Birbeck on 2008-02-25 (public-rdf-in-xhtml-tf@w3.org from February 2008)

From: Mark Birbeck <mark.birbeck@x-port.net>
Date: Mon, 25 Feb 2008 21:40:13 +0000
To: bnowack@semsol.com
Cc: public-rdf-in-xhtml-tf@w3.org
Message-ID: <a707f8300802251340x110b3be5p9f9baa521e39faf2@mail.gmail.com>

Hi Benjamin,

Great review...thanks.

>  the latest ARC2 revision (2008-02-25)[1] has an updated RDFa extractor
>  which passes all approved test cases (I think). I've also updated the
>  extractor[2] linked from CrazyIvan. For some reason, I get a FAIL on
>  some tests there, although I'm pretty sure the extractor is compliant.
>
>  I live-logged[3] my way through the spec, which might be helpful as
>  feedback on the WD (warning: it's a little tongue-in-cheek and/or
>  impatient here and there). I also only looked at the processing
>  instructions, w/o really reading anything else of the spec.

Thanks. The idea of the section _prior_ to the processing section is
to explain some of the concepts, such as chaining. The idea of the
section _after_ the processing section is to go into more detail on
the steps described--a kind of 'annotation'.

So if there are gaps after reading all three sections, then we do need
to correct them.


> Here is a
>  short (chronological) summary for (slightly) better readability:
>
>  11:19:14   * there are almost twice as many steps now, compared to the
>   previous spec. I would've expected a simplified final parsing process.

In the main this is because I broke quite a few steps into more
detailed versions of themselves after a number of comments. Also,
there were a couple of places where the steps were just plain wrong,
and would not have been sufficient to provide the functionality we
needed, so extra steps had to be added.


>  * step 9 is missing

There must be a mark-up error, since all lists use <li>. But
thanks...well spotted!


>  * CURIE is not a valid abbreviation for "compact URI", should be cURI,
>   or CURI, no? or do I need Marie Curie capabilities to spot the E? ;)

You are not the first to say that. Bridges and streets have been named
after people who have made far less contribution to humanity than the
CURIE family did. I'm not in a position to name a bridge after them,
but I can at least name a rather modest software technique (and
forthcoming specification) for them.


>  * steps 1-6 seem fine to me, easy to follow

Ok.


>  * step 7 seems to require a [new subject] in order to create a triple.
>   This is not explicitly mentioned in the intro sentence. (this is
>   different from step 6, which explicitly says "none of this ... if
>   there is no [new subject]")(nitpick)

The [new subject] is actually set in step 5. When in step 6 we say "if
there is no [new subject]", we mean "no [new subject] set in the
previous (fifth) step". The [new subject] property is listed in section
5.3.

Would you mind looking again at this and saying what aspects of this
could be tightened up to make it clearer? (Or were you reading too
quickly? ;))


>  * step 8: fine to me

Ok.


>  * step 10: understood, I guess it's identical to the previous spec version

Pretty much. The wording on child text nodes should be clearer in this
version, though.


>  * hmm, "once the triple has been created". There can be multiple, so maybe
>   s/the/a/, i.e. *any* XMLLiteral object stops recursion (as I understand
>   it)

Yes...good point. Thanks.


>  * step 11: "using the rules described here". What exactly does *here* refer
>   to? step 11, or the whole processing section?

It's recursive, so it's the entire processing sequence. If I added
something like "beginning again at step 1", or something like that,
would that make it better?


>  * hmm, the distinction between the passed-in context, the current context, and
>  local variables is a bit confusing. I just used the passed-in context and
>  overwrote its values while I was making my way through the process. That
>  worked fine in the prev. spec, but is apparently not correct now

No...that's right, and it took a lot of work to fix. :(

The problem arose when one of our very talented reviewers asked when
the 'incomplete triple' list should be reset. This led to the
discovery that if you just overwrite the contexts (including resetting
the list of incomplete triples), then there are many situations where
you start to get incorrect results on siblings.

The major change to solve this was to complete the triples _after_
doing the recursing. This had a couple of other benefits, and
generally seems to be more robust.
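
To make the sibling problem concrete, here is a minimal Python sketch.
It is a toy model, not the draft's algorithm: the Node class, the two
walk functions and the rule "a hanging rel hands its children a fresh
list" are all assumptions made for illustration, and it does not model
the reordered completion step at all. It only records which list of
incomplete triples each element gets to see.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Node:
    name: str                      # label used only to report results
    rel: Optional[str] = None      # a hanging @rel leaves a triple incomplete
    children: list = field(default_factory=list)


def walk_overwriting(node, incomplete, seen):
    """Overwrites the passed-in list, so the change leaks to later siblings."""
    seen[node.name] = list(incomplete)
    if node.rel:
        incomplete[:] = [node.rel]          # mutates the caller's list in place
    for child in node.children:
        walk_overwriting(child, incomplete, seen)
    return seen


def walk_with_local_list(node, incomplete, seen):
    """Builds a local list for the children and leaves the passed-in one alone."""
    seen[node.name] = list(incomplete)
    local = [node.rel] if node.rel else incomplete
    for child in node.children:
        walk_with_local_list(child, local, seen)
    return seen


tree = Node("outer", children=[
    Node("first", rel="b", children=[Node("inner")]),
    Node("second"),
])

leaky = walk_overwriting(tree, [], {})
clean = walk_with_local_list(tree, [], {})
print(leaky["second"])   # ['b']  (the sibling inherits first's hanging rel)
print(clean["second"])   # []
```

In both versions "inner" sees the pending predicate, which is what we
want. Only the overwriting version lets it reach "second".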


>  * hmm, I fail to grok the text in the blue box in step 11

It's an 'RDF thing'. :) If you have a statement where the object is a
bnode, and that bnode is not used in any other statements, then it's
pretty meaningless. It's a real edge case, and we were originally
going to just ignore it, but as a by-product of the fix that I just
described (postponing the completion of hanging triples until after
the recursion) it turned out you could actually fix this minor
annoyance.

So now, if an author does this:

  <div about="a">
    <div rel="b" />
  </div>

no triples will be generated. (Before we generated a triple with an
object of a bnode.)

Does this make any more sense? Do you have any suggestions on how the
wording could be improved? Or is it simply that it is the wrong place
to start trying to explain things? (I.e., we should move it out of the
processing steps.)
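
As a way of pinning down which triples count as "pretty meaningless",
the following Python sketch filters them out after the fact. Note that
this is a different technique from the one in the draft, which avoids
generating the triple in the first place by postponing completion. The
function names and the "_:" prefix convention for bnodes are
assumptions made for this example.

```python
from collections import Counter


def is_bnode(term):
    # Assumption: blank nodes are written with the conventional "_:" prefix.
    return term.startswith("_:")


def drop_dangling_bnode_objects(triples):
    """Drop triples whose object is a bnode that no other triple mentions."""
    mentions = Counter()
    for subject, _predicate, obj in triples:
        for term in {subject, obj}:        # count each triple once per term
            if is_bnode(term):
                mentions[term] += 1
    return [t for t in triples
            if not (is_bnode(t[2]) and mentions[t[2]] == 1)]


# The old behaviour for <div about="a"><div rel="b" /></div>:
print(drop_dangling_bnode_objects([("a", "b", "_:x")]))   # []
```

A bnode object that is also the subject of another statement is kept,
since it then connects two pieces of data.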


>  * "the final step (step 12, below) involves returning a flag. If the flag is
>  true, then incomplete triples are completed in the next step (step 11)"
>
>  * I *am* in step 11, s/next/this/ in the last sentence?
>
>  * ok, I'll try to read it as "text correct, numbers wrong", then 12 is the
>  final step, and step 11 is actually step 10 and can point at step 11 as "next
>  step"

Yes...you are right that the numbering has gone up the spout.


>  * "after having recursed into the processing of descendants" (could this be
>  said in simpler words?)

"But no simpler"? :)

I guess...any suggestions?


>  * ok, I've done step 13 (12) now, that was easy

:)


>  * now back to 12 (11)
>
>  * the "not the local list of incomplete triples" hint is helpful, I would've
>  been confused again w/o it

Good.


>   * in step 12, everything from the 3rd block sentence ("Note that if [new
>  subject] is a bnode, then ... during this step") should perhaps be moved to
>  a separate block. The first 2 sentences tell what I should do, I had the
>  impression I had to add a bnode check for new subject here after reading on.

By "the first 2 sentences" do you mean the ones before the blue block?
If so, are you saying that they are clear enough on their own, and that
making the point again in the blue block is confusing?


>   * "if direction is not 'forward'": are there any other possible values than
>  "reverse"?

No. It's essentially an 'else' statement, but using a test. (It just
shows that I'm unable to escape years of programming in assembler...if
ever we had two conditions A and B, we'd always test for A and not A,
so that you had a fallback.)

I guess we could change this if people think it's confusing.
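
For what it's worth, the "test for A and not A" habit looks like this in
a short Python sketch. The function name and the (predicate, direction)
pair representation are assumptions made for the example. The triple
shapes follow my reading of the draft: 'forward' runs from the parent
subject to the new subject, and 'reverse' the other way round.

```python
def complete_triples(incomplete, parent_subject, new_subject):
    """Turn (predicate, direction) pairs into finished triples."""
    triples = []
    for predicate, direction in incomplete:
        if direction == "forward":
            triples.append((parent_subject, predicate, new_subject))
        elif direction == "reverse":
            triples.append((new_subject, predicate, parent_subject))
        else:
            # The fallback: an unexpected value fails loudly instead of
            # being silently treated as 'reverse'.
            raise ValueError(f"unknown direction: {direction!r}")
    return triples


pending = [("b", "forward"), ("d", "reverse")]
print(complete_triples(pending, "a", "c"))
# [('a', 'b', 'c'), ('c', 'd', 'a')]
```

With only two possible values a plain 'else' gives the same triples; the
explicit second test just makes a bad value visible.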


>  17:45:37 * pass/fail 63/0

Hurray!


>  I had to tweak the processing rules to get there, though. Something seems to
>  be incorrect (or confused me) in the context of processing incomplete triples:
>
>  * step 12: "If the [skip element] flag is 'false' ...": this condition should
>  be removed. Otherwise, a number of test cases don't get a PASS.

Which ones?

Not having this rule was a major flaw before, as pointed out by Ivan,
Ed and Diego. Without it, the processing rules fell apart in
situations like this:

  <div about="a">
    <div>
      <div rel="b">
        <div>
          <div resource="c" />
        </div>
      </div>
    </div>
  </div>

(The previous rules were often completing all the triples on the empty
elements.)
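
Here is a rough Python sketch of why the [skip element] test matters
for markup like the above. It is a toy walker, not the draft's
processing sequence: it knows only about, rel and resource, ignores
most attribute combinations, and the Element class, the extract
function and the honour_skip switch are invented for illustration. In
particular, the honour_skip=False branch is only a guess at the kind of
failure described, with an attribute-less element being given a
placeholder bnode and allowed to complete pending triples.

```python
from dataclasses import dataclass, field
from itertools import count
from typing import Optional


@dataclass
class Element:
    about: Optional[str] = None
    rel: Optional[str] = None
    resource: Optional[str] = None
    children: list = field(default_factory=list)


def extract(root, honour_skip=True):
    triples = []
    bnode_ids = count()

    def walk(el, parent_subject, incomplete):
        if el.rel:
            subject = el.about or parent_subject
            if el.resource:
                triples.append((subject, el.rel, el.resource))
                next_subject, local = el.resource, []
            else:
                # Hanging @rel: wait for a descendant to supply the object.
                next_subject, local = subject, [el.rel]
        else:
            new_subject = el.about or el.resource
            skip_element = new_subject is None
            if skip_element and honour_skip:
                # Nothing to say here: pass the context on untouched.
                next_subject, local = parent_subject, incomplete
            else:
                if skip_element:
                    new_subject = f"_:b{next(bnode_ids)}"  # placeholder bnode
                for predicate in incomplete:
                    triples.append((parent_subject, predicate, new_subject))
                next_subject, local = new_subject, []
        for child in el.children:
            walk(child, next_subject, local)

    walk(root, None, [])
    return triples


doc = Element(about="a", children=[
    Element(children=[
        Element(rel="b", children=[
            Element(children=[Element(resource="c")]),
        ]),
    ]),
])

print(extract(doc))                      # [('a', 'b', 'c')]
print(extract(doc, honour_skip=False))   # [('_:b0', 'b', '_:b1')]
```

With the test in place the empty elements are transparent and the
triple reaches "c". Without it, in this toy model, the empty elements
swallow both ends of the statement.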


>  HTH,

Very much so...thanks.

Regards,

Mark

-- 
  Mark Birbeck

  mark.birbeck@x-port.net | +44 (0) 20 7689 9232
  http://www.x-port.net | http://internet-apps.blogspot.com

  x-port.net Ltd. is registered in England and Wales, number 03730711
  The registered office is at:

    2nd Floor
    Titchfield House
    69-85 Tabernacle Street
    London
    EC2A 4RR
Received on Monday, 25 February 2008 21:40:25 UTC