Re: Eliminate @profile? from Shane McCarron on 2011-07-14 (public-rdfa-wg@w3.org from July 2011)

From: Shane McCarron <shane@aptest.com>
Date: Thu, 14 Jul 2011 14:08:03 -0500
To: public-rdfa-wg@w3.org
Message-ID: <4E1F3E93.3060202@aptest.com>
Forgive me.  I missed the meeting and I am still reeling from this.  My 
comments are inline and are not as well thought out as they should be.   
But I wanted to get this out whilst it is still fresh in everyone's mind.

Oh... and -10 to removing @profile

On 7/14/2011 12:44 PM, Gregg Kellogg wrote:
> On today's  RDF Web Apps call [1], there was some discussion of 
> @profile. ISSUE-96 [2] relates to document ready. I encourage people 
> with an opinion on the use of @profile in RDFa to voice their opinions.
>
> Basically, until all @profile documents are loaded and processed, an 
> application cannot reliably access the RDFa API because the URI 
> resolution of types and properties cannot be reliably resolved until 
> all term and prefix definitions have completed. Also, the failure to 
> load a profile can mean that an entire sub-tree of the document must 
> be ignored.

The RDFa processing model requires that the document be evaluated from 
beginning to end, and that @profile attributes are processed as they are 
encountered.  RDFa doesn't allow for ad hoc evaluation of *parts* of a 
document / DOM tree.  Assuming I am correct in this, my response to the 
above has to be "so?"  I can envision an RDFa API implementor wanting to 
break the processing model by evaluating the document piecemeal as 
requests for triples are made, but I don't think that is feasible 
REGARDLESS of whether or not @profile is supported.  For example, a 
given triple in the middle of a document might reference a bnode that is 
from some other part of the document.   You wouldn't know if that 
reference made sense unless you had processed the whole document, 
right?  (Note - that might be a red herring but it seems important to me 
right now).

I think it is a fundamental requirement of the processing model that the 
entire document is processed and its triples identified before the RDFa 
API can perform any operations on the document.  If this is the case... 
then surely the issue of whether a profile is accessible is resolved 
long before the RDFa API has to do anything?  All it would need do is 
query the triple store returned from the RDFa Processor and return the 
triples that match the query.  (I know the RDFa API has other 
capabilities, but they all come down to manipulating triples or finding 
data related to certain components of triples).  If there are no triples 
because a profile was unavailable... okay.
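
A minimal sketch of the ordering argued for above, assuming a processor that hands back a finished list of triples before any query runs. The names process_document and query, and the example.org IRIs, are hypothetical and not taken from the RDFa API draft. The blank node is only described by the last triple, so a lookup that ran before the whole document was processed would miss it.

```python
from typing import Iterable, List, Optional, Tuple

Triple = Tuple[str, str, str]


def process_document(extracted: Iterable[Triple]) -> List[Triple]:
    """Stand-in for a complete processor run: consume everything, then return the store."""
    return list(extracted)


def query(store: List[Triple],
          subject: Optional[str] = None,
          predicate: Optional[str] = None,
          obj: Optional[str] = None) -> List[Triple]:
    """Return the triples matching every component that was supplied."""
    return [
        (s, p, o) for (s, p, o) in store
        if (subject is None or s == subject)
        and (predicate is None or p == predicate)
        and (obj is None or o == obj)
    ]


# Hypothetical output of a processor, in document order. The bnode _:b0 is
# referenced by the first triple and only described by the last one.
DOCUMENT_TRIPLES = [
    ("http://example.org/doc", "http://purl.org/dc/terms/creator", "_:b0"),
    ("http://example.org/doc", "http://purl.org/dc/terms/title", "Example"),
    ("_:b0", "http://xmlns.com/foaf/0.1/name", "Alice"),
]

# Processing finishes first; only then does the API layer query the store.
store = process_document(DOCUMENT_TRIPLES)
creator = query(store, predicate="http://purl.org/dc/terms/creator")[0][2]
creator_names = [o for (_, _, o) in query(store, subject=creator)]
```

If a profile had been unavailable, the store would simply hold fewer triples (or none) and the same query code would return an empty list.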


>
> Loading a profile is problematic due to HTTP same-origin restrictions 
> [3]. This can be alleviated by ensuring that profile documents are all 
> CORS enabled, but it remains a complicating factor. Not all authors 
> will be in a position to set CORS policy for served profiles.

Do you honestly believe that there will be that many important 
profiles?  I agree that Bob's Auto Shop might find setting up CORS and a 
profile challenging, but they aren't going to do it anyway.  There will 
be a handful of important profiles, and those will work right.  Almost 
by definition... since if they do not work right you won't get any 
triples.  Organizations like Dublin Core and Creative Commons will 
ensure their profiles work right, their CORS is set up right, etc.  How 
could they not?  It's in their own interest, it's not actually hard if 
you can read and use an editor, and it is REQUIRED for their content to 
be used on the semantic web via RDFa.  If you are building a profile, 
you are building it for use in RDFa via @profile.  q.e.d.

>
> A profile may fail to load because of network connectivity issues, 
> meaning that the same document may be interpreted differently 
> depending on environmental conditions.

Yes, but we have always said this is such a vanishingly small issue that 
it is not worth worrying about, other than to define what a conforming 
processor should do when it occurs.

>
> Multiple profiles may define the same term differently, which could 
> lead to confusion (on the part of the user, behavior is well-specified 
> within the processing rules).
>

Nothing we can possibly do here will change this.  Hell, people 
interpret the meaning of the existing @rel values differently in HTML4.  
And there was only ONE Profile for that!  Even if there were no 
@profile, @vocab does the same thing in an even more flagrant manner 
(since a processor doesn't dereference @vocab to infer meaning - it just 
trusts that there are terms).

> Note that the default profile does not present the same problems, 
> since it is assumed that RDFa processors will internally cache the 
> default profile. Concerns were raised about the relatively closed 
> nature of relying on the default profile for prefix definitions, as 
> frequent changes to the profile place a burden on processor 
> developers, and even with a simple registration form, it places a 
> barrier to entry and is generally not in the open nature of RDF.

I guess I agree that there might be a barrier.  However, we never 
envisioned that the default profile definition would change frequently.  
Surely the collection of interesting terms will not expand quickly.  And 
it is even less likely that new, interesting vocabularies that are 
expected to be included in a default profile will arise daily!  The 
vision was that, once a year or so, there might be a reason to revise 
the profile.  At least, that was my vision.  Moreover, since a profile 
has an explicit URI, I can reference it via @profile and KNOW that I am 
getting the collection I wanted at the time I wrote my document (or set 
up my web site / CMS, or whatever).

>
> Personally, I really see the advantage of a profile mechanism. In 
> addition to declaring prefixes, it allows an author to establish a 
> number of terms for use within their document. CURIEs have been 
> criticized (rightly or wrongly) as being too complex and error prone, 
> but terms are consistent with similar usage in Microdata and 
> Microformats, but it's not feasible to include a large number of terms 
> in a default profile, where term definitions may overlap between 
> different vocabularies.

Right.  But if I were an *author* who cared about a collection of terms, 
I would use @vocab.  @profile wasn't really targeted at authors.  It was 
targeted at taxonomy creators (e.g. microformats, that news thingy, 
dublin core, facebook, even schema.org) to make it easier for authors to 
rely upon the set of terms they need to express their content.

>
> However, the use of profiles is a substantially complicating factor in 
> RDFa. Removing it still makes RDFa authoring much simpler than other 
> RDF variations, as for most developers, almost all of the prefix 
> definitions they would need to use can be included in the default 
> profile. Also, the use of @vocab provides much of the benefits of a 
> custom profile when a single vocabulary is being used (e.g., 
> http://www.w3.org/2006/03/hcard or http://schema.org/). Also, custom 
> prefix definitions may still be introduced into a document using the 
> @prefix definition.

However, it means that organizations outside of our own have no way to 
define the collections of prefixes and terms that are relevant to their 
content developers (schema.org, facebook, the news people).  The major 
reason to support @profile was to permit those organizations and others 
to override the defaults.  Without such a mechanism, there is no way, 
for example, for me to ensure that MY authors are restricted to using 
terms I want and prefixes that map to what I mean.  Instead, they are 
going to map to whatever the implementation that happens to be parsing 
the content means.  Frankly, that's worse than useless to me.

In the absence of @profile, the only way I as a content author or 
publisher with captive authors can be confident that the semantics are 
EXACTLY what I want is to.... what?  Define all my prefixes explicitly 
on the document element and require that my authors only use scoped 
terms? I could probably define my own vocabulary via @vocab...  But as 
currently defined @vocab doesn't clear the terms from the default 
profile out of my context.  So I have no good way to control what is 
available in my context.  If my vocabulary defines "nert" my users will 
use that.  If one accidentally types @rel="next" it will turn into a 
triple because I have no way to turn that off.  And if two years down 
the road the default profile learns some new term (e.g., 'security') and 
one of my authors mistakenly uses that term in a document, I suddenly 
have a new triple that has some weird meaning that author and I never 
intended.  And I have no way to prevent it!  At least with @profile we 
have a way to explicitly reference the default profile and know what is 
getting imported into the context space (although there is no way to 
perform a 'reset' as there is with @vocab - I still think that's a 
mistake).
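
The scenario above can be sketched with a deliberately simplified model in which the evaluation context is just a dictionary of term mappings, with the default profile's terms merged in alongside the publisher's. This is an assumption made for illustration and not the processing rules from the specification. The resolve_term helper and the example.org IRIs are hypothetical.

```python
from typing import Dict, Optional


def resolve_term(term: str, context: Dict[str, str]) -> Optional[str]:
    """Return the IRI mapped to a term, or None if the term would be ignored."""
    return context.get(term)


# Terms the (simplified) default profile contributes today.
default_profile = {"next": "http://www.w3.org/1999/xhtml/vocab#next"}

# Terms the publisher actually wants authors to use (hypothetical vocabulary).
publisher_terms = {"nert": "http://example.org/vocab#nert"}

# The default terms cannot be switched off, so both sets end up in the context.
context = {**default_profile, **publisher_terms}

intended = resolve_term("nert", context)   # the term the author meant
typo = resolve_term("next", context)       # a slip still yields an IRI
before = resolve_term("security", context) # unknown today, so ignored

# Two years later the default profile gains a new term (IRI is made up here).
default_profile_v2 = {**default_profile,
                      "security": "http://example.org/default#security"}
context_v2 = {**default_profile_v2, **publisher_terms}
after = resolve_term("security", context_v2)  # now silently produces an IRI
```

In this model the same author markup goes from no triple to a triple purely because the default set changed, which is the loss of control described above.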


>
> This would also have the benefit that the RDFa API would not have 
> profile load latency issues
>
> * Potential same-origin problems in loading profile,
> * Profile loading relies on network connectivity,
> * Processing complication introduced due to profile load failure,
> * Latency introduced by having to load profile before proceeding with 
> application processing,
> * Need to add notification mechanisms to know when profile processing 
> has completed,
> * Potential difference in CURIE/term meaning based on multiple 
> overlapping profile definitions,
> * No clear community requirement for profiles other than the default. 
> (Sophisticated authors have other syntactic mechanisms to draw on).

I feel very strongly that letting the RDFa API drive the definition of 
RDFa is a mistake.  The API is a nice feature, and it works regardless 
of the issues above.  Latency, notifications, mutation, etc. are all 
issues in ALL client-side APIs.  Rare edge cases like the network 
working fine to load the initial page, but no longer working .5 seconds 
later when I want to load a profile, are no reason to throw out the 
feature.  It is trivial to implement the RDFa processor such that it 
calls back to the RDFa API when it is done processing... if that's how 
you want to implement it.  Until that callback has occurred, the RDFa 
API isn't ready.  I don't know what the RDFa API says about this today, 
but no matter what happens with @profile there will always be a gap 
between DOMReady and when the RDFa API can do something with triples.... 
so I am sure you handle this already (or will do).
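
One way to picture the callback arrangement described above, as a rough sketch only. The class RDFaAPI, its methods, and run_processor are hypothetical names and do not come from the RDFa API draft. The processor is modelled as a function that finishes its work, including any profile loading, and then notifies the API layer once.

```python
from typing import Callable, Iterable, List, Tuple

Triple = Tuple[str, str, str]


class RDFaAPI:
    """Hypothetical API facade that is unusable until processing has finished."""

    def __init__(self) -> None:
        self.ready = False
        self._store: List[Triple] = []

    def on_processing_done(self, triples: List[Triple]) -> None:
        """Callback invoked by the processor once the whole document is done."""
        self._store = list(triples)
        self.ready = True

    def get_triples(self) -> List[Triple]:
        if not self.ready:
            raise RuntimeError("RDFa processing has not completed yet")
        return list(self._store)


def run_processor(extracted: Iterable[Triple],
                  done: Callable[[List[Triple]], None]) -> None:
    """Stand-in processor: finish everything, then call back exactly once."""
    done(list(extracted))


api = RDFaAPI()
ready_before = api.ready  # False: nothing has been processed yet

run_processor(
    [("http://example.org/doc", "http://purl.org/dc/terms/title", "Example")],
    api.on_processing_done,
)
ready_after = api.ready   # True: the callback has fired
triples = api.get_triples()
```

Nothing in this arrangement depends on whether a profile was fetched along the way; the gap between the document being available and the API being ready exists either way.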

And I disagree that there are no community requirements for this.  I am 
sure that Ivan has an opinion about this.  Others will chime in.  I am 
in the community.  And I have a requirement for it.  Mark Birbeck is in 
the community.  He has a requirement for it.

Here's my requirement.  I need to have my triples be deterministic 
(modulo the network failing).  And I don't want to create an infinite 
number of cargo cult programmers in order to achieve this.  It is 
insufficient to say authors can declare all the prefixes they want at the 
top of their documents.  That's how we got in this mess in the first 
place.  I want to be able to tell my authors "use this profile and you 
will be able to embed all the semantics that we need in our environment" 
and that's it.  Moreover, since it IS a profile, and the profile is in 
RDFa, my authors can LOOK AT IT!  It is an HTML document.  It defines 
terms and prefixes.  It is self-documenting.  My authors can know 
IMMEDIATELY what they can use in documents they write for me.

If the API is hard to implement or might be laggy, change the API.  As 
written today the processing model is straightforward to implement.  
Many of us have done it.   You don't actually need any new event 
notifications - the existing model is fine.  The RDFa API needs to know 
when the document has been parsed.  Once that is done, everything about 
the document is readily available.  Could that take a little time?  
Sure.  But it isn't "overhead" on every HTML document.  It is overhead 
on HTML documents where a script author has made an RDFa API request to 
retrieve semantic data.  And there WILL be overhead to retrieve that 
data.  Will there be MORE overhead on a document that references a 
profile that has not yet been cached by the user agent?  Yep!  Once.  
Then it will be cached.  Your implementation doesn't cache profiles it 
has retrieved?  Why not?  Fix it.
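
A minimal sketch of the "fetch once, then cache" behaviour argued for above. ProfileCache and fake_fetch are hypothetical, the profile URI is made up, and the fetcher is a stub standing in for a real network request so the example stays self-contained.

```python
from typing import Callable, Dict, List


class ProfileCache:
    """Keeps each retrieved profile in memory, keyed by its URI."""

    def __init__(self, fetch: Callable[[str], Dict[str, str]]) -> None:
        self._fetch = fetch
        self._cache: Dict[str, Dict[str, str]] = {}

    def get(self, uri: str) -> Dict[str, str]:
        # Only the first request for a given URI pays the retrieval cost.
        if uri not in self._cache:
            self._cache[uri] = self._fetch(uri)
        return self._cache[uri]


fetch_log: List[str] = []


def fake_fetch(uri: str) -> Dict[str, str]:
    """Stub fetcher: records the call and returns a fixed prefix mapping."""
    fetch_log.append(uri)
    return {"dc": "http://purl.org/dc/terms/"}


cache = ProfileCache(fake_fetch)
first = cache.get("http://example.org/profile")
second = cache.get("http://example.org/profile")
fetch_count = len(fetch_log)  # one retrieval for two lookups
```

A real user agent would also need an expiry or revalidation policy, which this sketch leaves out.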

>
> At the time profiles were introduced, there was no mechanism for 
> importing standard prefix definitions in bulk. For almost all cases, a 
> built-in default profile definition addresses this issue. Going 
> further to allow for arbitrary profile introduction may be going too 
> far, at least for this particular version of the standard.

See above.  I strongly disagree.  The time to do this is now.  There 
won't be a next time.  We are not requiring that everyone use it.  And 
there won't be a million little profiles out there.  But there will be 
*some*.  And we can't envision what those will be or which will 
succeed.  In the absence of this mechanism I will be forced to declare 
all of the prefixes I care about every time.  I cannot rely upon the 
default profile because IT CAN CHANGE UNDERNEATH ME and I have no 
announcement mechanism! (I know, there goes Shane, bitching about 
announcement mechanisms again.)

If the working group decides to go this way, there's nothing I can do 
about it.  But it is short sighted.  If I wanted to be short sighted, I 
would have worked on HTML5.

-- 
Shane McCarron
Managing Director, Applied Testing and Technology, Inc.
+1 763 786 8160 x120
Received on Thursday, 14 July 2011 19:08:42 UTC