Re: Comments on PR-rdfa-syntax-20080904

Hi Danny,

On Wed, Sep 10, 2008 at 2:08 PM, Danny Ayers <da@talisplatform.com> wrote:
> While I work for a member company (Talis), the opinions below are my own,
> though I suspect the summary at least will coincide with that of the company
> rep (cc'd).
>
> Hope you can make sense of my plain-text markup, sections: Summary, General
> Points, Substantive Points, Editorial, Nitpicking

I can make sense of them, no problem.

I'm firing off a quick 'unofficial' reply, if you don't mind, to
elicit a little more information, and perhaps give some background
that might answer some of your issues.

But obviously there will be a more 'formal' reply from the group at
some point, as required by the process.


> *** Summary ***
>
> While I believe the document likely contains all the information necessary
> to use RDFa I can't tell for sure. The way it's currently organized leads me
> to suggest it needs one or two more revisions before proceeding any further.

Obviously you are entitled to say this, but this is the first time
I've heard any such comments. The document has been reviewed by a lot
of people over the last year or so, and in general the comments have
been positive.

So do you think you could be more specific than you have been?


> IMHO it could use compressing, making more formal, and significant chunks
> moving to other docs - some of the informative bits to the primer...

The primer is aimed mainly at authors, whilst this document is aimed
primarily at implementers. The general feeling is that the RDFa spec
should be self-contained, and provide everything that an implementer
needs to produce an XHTML+RDFa processor.

(Also, removing all of the informative material would probably make
the document more difficult, not less, since it would be very terse.)


> ...the CURIE def to another spec.

You are exactly right, and there is a separate CURIE spec. However,
the timing of the specifications means that we can't refer to it
normatively from the current RDFa spec.


> Pragmatically, as it stands I suspect most
> publishers/consumers & parser authors will simply get confused.

I don't want to sound overly sensitive here, since of course any spec
can be improved upon. :)

But the number of implementations that *already* exist [1], and the
positive comments we have received from reviewers and implementers do
make me think that the spec is not that confusing.

However, if you have very specific proposals, they would be most welcome.


> *** General Points ***
>
> a large proportion of this doc is informative (and a lot of small sections
> gives the impression of it being piecemeal, rather than a coherent spec)
>
> I think it would allow more flexibility in future spec creation if CURIES
> were defined in an independent spec

As above...you are exactly right. :)

There is such a spec [2], and if it were further along in the W3C
process, we would be able to refer to it normatively.


> while Relax NG might not be as widely adopted as DTDs, for the purposes of a
> specification like this, such a description would be a lot more helpful than
> the DTDs

This module has been designed specifically to work with XHTML
Modularisation, which currently uses DTDs. You are exactly right about
Relax NG though, and Shane has been working on a new version of M12N
that uses it. But that is a little way off, and for now DTDs are the
only formally supported mechanism (i.e., in the sense of the
specifications).


> the distinction rendered data vs. structured data doesn't seem clear

We aren't really concerned with the rendered data. What are you
thinking in particular that we should clarify?


> how does a parser distinguish between intentional RDFa and HTML tag soup?

It doesn't. In general the use of existing XHTML attributes like @rel
and @href will generate triples that one could reasonably infer from a
standard HTML document. And since @about, @datatype and other such
attributes don't exist already, then we don't think we're going to get
any 'false positives'.
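
As a rough illustration of how @rel and @href alone can yield a triple,
here is a minimal Python sketch. It is not the processing rules from the
spec: the names rel_href_triples and RelHrefTriples are made up for this
example, the subject is simply assumed to be the document URI, and every
unprefixed @rel token is mapped into the XHTML vocabulary namespace,
whereas the spec only lists specific reserved values.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

# Namespace used for the reserved XHTML link types.
XHV = "http://www.w3.org/1999/xhtml/vocab#"


class RelHrefTriples(HTMLParser):
    """Hypothetical helper: collects (subject, predicate, object) tuples."""

    def __init__(self, base):
        super().__init__()
        self.base = base      # assumption: the document URI is the subject
        self.triples = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        rel = attributes.get("rel")
        href = attributes.get("href")
        if not rel or not href:
            return
        for token in rel.split():
            if ":" in token:
                continue      # prefixed values (CURIEs) are skipped here
            predicate = XHV + token.lower()
            self.triples.append((self.base, predicate, urljoin(self.base, href)))


def rel_href_triples(markup, base):
    parser = RelHrefTriples(base)
    parser.feed(markup)
    parser.close()
    return parser.triples


triples = rel_href_triples(
    '<link rel="next" href="page2.html" /><p class="note">no triple here</p>',
    "http://example.com/page1.html",
)
```

The <p> element carries no @rel or @href, so it contributes nothing,
which is the sense in which ordinary markup does not produce 'false
positives'.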

Can I just ask for clarification: are you asking this question merely
because you think the spec should be clearer, or because you have a
strong opinion on the matter? If the latter, then it would be worth
raising your doubts in a separate thread, since there has been a great
deal of discussion about this issue in the past.


> *** Substantive Points ***
>
> * How to Read this Document *
> "...authors don't need to understand RDF to use it"
>
> While I appreciate the intent, I believe this statement to be wrong -
> accurate communication (of data) requires both the producer and consumer to
> understand the language. Suggest rewording to something along the lines of:
> "...authors don't need complete understanding of RDF to use it"

I don't think it is true that an author needs to understand RDF.
There are many scenarios where they might need to do little more than
cut-and-paste. The full sentence that you have quoted from says this:

  Although RDFa is designed to be easy to author—and authors don't need to
  understand RDF to use it—anyone writing applications that consume RDFa
  will need to understand RDF.


> * 3.1. Statements *
>
> "A statement is a basic unit of information that has been constructed in a
> specific format to make it easier to process"
>
> ew.

That could, perhaps, be improved upon... :)


> * 3.7. Graphs
>
> appears unfinished...

How so? A graph is a collection of triples.


> * 4.1. Document Conformance *
>
> This seems a bit messy...

You seem to use a lot of this kind of wording in your comments.


> ... both substantively & editorially. For starters, where
> is @version in the http://www.w3.org/1999/xhtml namespace?

I don't know what you mean by that. In section A.3 [3] you'll find
references to it in the DTD. Is that what you mean, or do you mean
something else?


> * 4.3. RDFa Processor Conformance *
>
> "A conforming RDFa Processor MAY make available additional triples that have
> been generated using rules not described here, but these triples MUST NOT be
> made available in the [default graph]. (Whether these additional triples are
> made available in one or more additional [RDF graph]s is
> implementation-specific, and therefore not defined here.)"
>
> This seems over-constrained. If I have a doc which contains RDFa plus GRDDL
> plus [something not yet defined] RDF-in-HTML data, I would expect them to at
> least be able to be interpreted as a single graph. i.e. the graph scope
> should be the document, not the RDFa processor's interpretation. (That's
> assuming "default graph" is meant to mean what I think - it's not defined
> here as far as I can see). I don't see how Appendix C. Deployment Advice
> fits in here either.

The problem with the approach you suggest is that it is then difficult
to establish what is and is not a conforming processor. If I were to
add a rule to my processor that added triples based on the value of
@class or @alt, would that be conforming or not? If I had a processor
that ignored certain triples, would that be conforming?

So the approach we took was to ensure that there would be *at least*
one graph that represented all triples in the XHTML document that
could be obtained with the core rules. This graph is required
*exactly*, i.e., it should not contain more or fewer triples than the
rules state should be obtained.

Now, if some processor wants to then interpret Microformats like
@class="hcard", or to run a GRDDL transform, or whatever, they can do,
but those triples should not be part of the default graph. This allows
people to experiment with new features in RDFa without users having to
worry that the processor they are using is 'non-conformant'.
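
The separation described above can be sketched in a few lines of
Python. This is only an illustration under assumptions: GraphStore and
its method names are hypothetical, and the spec deliberately does not
define how (or whether) additional graphs are exposed.

```python
class GraphStore:
    """Hypothetical container that keeps core-rule triples apart from extras."""

    def __init__(self):
        self.default_graph = set()     # exactly the triples from the core rules
        self.additional_graphs = {}    # graph name -> set of extension triples

    def add_core_triple(self, triple):
        self.default_graph.add(triple)

    def add_extension_triple(self, graph_name, triple):
        # Extension output never touches the default graph.
        self.additional_graphs.setdefault(graph_name, set()).add(triple)


store = GraphStore()
doc = "http://example.com/card.html"

# A triple obtained from the core RDFa rules (illustrative values).
core = (doc, "http://xmlns.com/foaf/0.1/name", "Mark")
store.add_core_triple(core)

# A triple from some extension, e.g. a Microformats interpretation.
extra = (doc, "http://www.w3.org/2006/vcard/ns#fn", "Mark")
store.add_extension_triple("microformats", extra)
```

A consumer that only cares about conformance looks at default_graph and
gets the same result from any processor; the experimental output stays
available under its own name.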


> * 5.2. Evaluation Context *
> This section seems loosely defined for normative material. I don't think
> it'd take much effort to tie it to the XML DOM, in a similar fashion to 5.5.
> Sequence (SAX).

Well, that's two separate statements. I assume you are not saying
"it's loosely defined because it doesn't use DOM terminology"?

I believe you are saying that (a) it's loosely defined, and (b) you
think it would be better defined in terms of the DOM.

I have to disagree that it's loosely defined, but would welcome any
precise comments you have on where it could be improved. I think a
general statement that it could be improved by adopting DOM
terminology is not really good enough, since even after making the
change to please you, someone else could come along and say that they
think it is loosely defined because it doesn't use the post-schema
validation infoset terminology...and so on.

So if you think it's loosely defined, I think you need to say why, in
terms of the spec as it is now.


> Use of [] on the CURIE attributes seems inconsistent.

Again, forgive me if I ask you for something more precise than "seems".

Where exactly is it inconsistent? There are places where we need to
'escape' CURIEs so that they don't get mixed up with URLs, but are you
saying that even after taking that into account there are
inconsistencies?

Or do you mean that this point has not been made sufficiently clearly?
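
For reference, the 'escaping' in question can be sketched in Python as
follows. This is a simplified illustration, not the spec's algorithm:
the function name is invented, blank nodes and the empty prefix are
ignored, and the prefix mappings are assumed to be passed in as a plain
dictionary.

```python
from urllib.parse import urljoin


def resolve_uri_or_safe_curie(value, prefixes, base):
    """Hypothetical resolver for attributes that accept a URI or a [CURIE]."""
    value = value.strip()
    if value.startswith("[") and value.endswith("]"):
        # Square brackets mark the content as a CURIE, never a URI.
        prefix, separator, reference = value[1:-1].partition(":")
        if not separator or prefix not in prefixes:
            return None   # unknown prefix: ignored in this sketch
        return prefixes[prefix] + reference
    # Anything without brackets is treated as a (possibly relative) URI.
    return urljoin(base, value)


prefixes = {"foaf": "http://xmlns.com/foaf/0.1/"}
base = "http://example.com/"

escaped = resolve_uri_or_safe_curie("[foaf:name]", prefixes, base)
unescaped = resolve_uri_or_safe_curie("foaf:name", prefixes, base)
relative = resolve_uri_or_safe_curie("me.html", prefixes, base)
```

Without the brackets, "foaf:name" is read as a URI whose scheme happens
to be "foaf", which is exactly the mix-up the brackets are there to
prevent.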


> // here I got lost
>
> * 9.3. @rel/@rev attribute values *
>
> unnecessary repetition of HTML defs - a single example would do (i.e.
> rel="cite")

I think the def's example is a reasonable one, since it's something
that people do a lot.


> *** Editorial ***
>
> various places:
> mark-up => markup

Interesting. :) I always use the former, but I notice that the W3C
'house style' is to use the latter. Speaking for myself, I agree that
this should be changed.


> * 2.1. The RDFa Attributes *
>
> the "X in RDF terminology" bits should perhaps be linked to corresponding
> places in the RDF Primer

That's an interesting point. I guess it would be useful for some
people, although anyone who is not familiar with one term is probably
not familiar with all of them -- and so should read the whole primer.
:)

But it's a good point, and speaking for myself I'd support your suggestion.


> * 3. RDF Terminology *
>
> Sytax => Syntax

Thanks.


> There are a few unfulfilled refs like <em>[triples]</em>

Thanks again.


> In the informative sections, it may be more reader-friendly to use
> 'property' rather than 'predicate'.

I know that this distinction was made quite consciously in early
drafts, because when you are talking about both XHTML DOMs _and_ RDF
in the same document, it is useful to have a way of distinguishing
'property' from 'property'...

...if you see what I mean. :)

It will be interesting to see what others think.


> consider merging 3.1 Statements with 3.2 Triples

I think the current structure distinguishes between the quasi-human
readable 'statement', and the machine-readable 'triple', which is
useful for those new to RDF.


> *** Nitpicking ***
>
> * Abstract *
>
> "The modern Web is made up of an enormous number of documents that have been
> created using HTML."
> =>
> "The current Web is primarily made up of an enormous number of documents
> that have been created using HTML."

I like "primarily", although I don't mind either way on "modern" and
"current". :)


> ----
>
> "RDFa is a specification for attributes to express structured data in any
> markup language."
> Is RDFa specified for any language other than XHTML? Where?

Not by us, but Yahoo! have incorporated RDFa into DataRSS [4] which
can in turn be carried in Atom and RSS.

Also, most RDFa parsers will process RDFa if it occurs in HTML, so an
obvious next step will be to define HTML+RDFa.

The general point is that as we refined the spec, we always ensured
that nothing we did would jeopardise the consistent use of the same
rules in other markup languages.


> ----
>
> "rendered data can be copied and pasted along with its relevant structure"
> Yuck, something more like this seems more appropriate:
> "rendered data can be manipulated and reused along with its relevant
> structure"

Sure...it's as long as it's short...

:)


> ----
>
> * Motivation *
> "RDF/XML [RDF-SYNTAX] provides sufficient flexibility to represent all of
> the abstract concepts in RDF [RDF-CONCEPTS]."
> - with certain limitations, e.g. properties which can't be expressed as
> qnames can't be serialized as RDF/XML, e.g. http://example.com/1234

Good point.

How about "most of"? :)


> ----
>
> 'hard-wired' - not sure everyone will understand, anyone got a synonym?

Really? Speaking only for myself, I don't mind if it's changed, but I
don't think it's an unusual expression.


Thanks for your comments, and once again, just to point out that this
isn't a formal reply, but more an attempt to get some clarification on
some of your issues.

Regards,

Mark

[1] <http://www.w3.org/2006/07/SWD/RDFa/implementation-report/>
[2] <http://www.w3.org/TR/curie/>
[3] <http://www.w3.org/TR/2008/PR-rdfa-syntax-20080904/#a_DTD_driver>
[4] <http://developer.yahoo.com/searchmonkey/smguide/datarss_primer.html>

-- 
Mark Birbeck, webBackplane

mark.birbeck@webBackplane.com

http://webBackplane.com/mark-birbeck

webBackplane is a trading name of Backplane Ltd. (company number
05972288, registered office: 2nd Floor, 69/85 Tabernacle Street,
London, EC2A 4RR)

Received on Wednesday, 10 September 2008 17:33:35 UTC