Re: Eliminate @profile? from Manu Sporny on 2011-07-19 (public-rdfa-wg@w3.org from July 2011)

From: Manu Sporny <msporny@digitalbazaar.com>
Date: Mon, 18 Jul 2011 22:43:57 -0400
To: public-rdfa-wg@w3.org
Message-ID: <4E24EF6D.1020801@digitalbazaar.com>
On 07/14/2011 03:08 PM, Shane McCarron wrote:
> But I wanted to get this out whilst it is still fresh in everyone's 
> mind.

Wow, Shane... when does the paperback of this epic e-mail come out? :P

> Oh... and -10 to removing @profile

I'm currently a +1 to remove @profile due to the headaches it will cause
in browser implementations and because of the headaches it will cause me
in the SAX-based implementation. I say this fully realizing that I was
one of the main proponents, along with Shane, of the @profile feature in
RDFa.

> I can envision an RDFa API 
> implementor wanting to break the processing model by evaluating the 
> document piecemeal as requests for triples are made, but I don't 
> think that is feasible REGARDLESS of whether or not @profile is 
> supported. 

This didn't come up in the conversation, and I don't think anyone
suggested that the document be evaluated in a piecemeal fashion. Rather,
the point was raised that if a @profile could not be dereferenced, we
would have to do one of two things:

1. Tell people not to process the subtree.
2. Tell people not to process the entire document.
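
To make the two options concrete, here is a minimal sketch in Python. It
is not taken from any real RDFa processor: the Node class, the set of
available profiles, and both function names are invented for
illustration, and the code only reports which elements would still be
processed under each policy.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class Node:
    """Hypothetical stand-in for an element; `profile` mimics @profile."""
    name: str
    profile: Optional[str] = None
    children: List["Node"] = field(default_factory=list)


def process_skip_subtree(node: Node, available: Set[str]) -> List[str]:
    """Option 1: drop only the subtree whose profile failed to load."""
    if node.profile is not None and node.profile not in available:
        return []
    processed = [node.name]
    for child in node.children:
        processed.extend(process_skip_subtree(child, available))
    return processed


def all_profiles(node: Node) -> Set[str]:
    """Collect every profile URL referenced anywhere in the tree."""
    found = {node.profile} if node.profile is not None else set()
    for child in node.children:
        found |= all_profiles(child)
    return found


def process_skip_document(node: Node, available: Set[str]) -> List[str]:
    """Option 2: a single missing profile means nothing is processed."""
    if not all_profiles(node) <= available:
        return []
    return process_skip_subtree(node, available)


# Made-up document: one profile loads, the other does not.
doc = Node("html", children=[
    Node("div1", profile="http://example.org/ok"),
    Node("div2", profile="http://example.org/missing",
         children=[Node("span")]),
])
available = {"http://example.org/ok"}

print(process_skip_subtree(doc, available))
print(process_skip_document(doc, available))
```

Under option 1 the html and div1 elements survive; under option 2 the
result is empty, which is the outcome discussed in the next reply.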

> For example, a given triple in the middle of a document 
> might reference a bnode that is from some other part of the
> document. You wouldn't know if that reference made sense unless you
> had processed the whole document, right?  (Note - that might be a
> red herring but it seems important to me right now).

True, and this would mean that if just one @profile in the document was
not available, the entire document shouldn't be processed. This makes
the case for removing @profile stronger, IMHO.

> I think it is a fundamental requirement of the processing model that 
> the entire document is processed and its triples identified before 
> the RDFa API can perform any operations on the document.  

I think everyone agrees on this point.

> If this is 
> the case... then surely the issue of whether a profile is accessible 
> is resolved long before the RDFa API has to do anything?  

The question wasn't that. It was: how does the API notify the developer
that the page data is ready to be processed? There is an onload event;
do we need another type of event, "ondata" for instance? Or is the RDFa
API lazy, so that you can call it at any time?
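
The two styles in that question can be sketched as follows. This is a
rough illustration in Python rather than browser JavaScript, and nothing
here comes from the RDFa API drafts: DocumentData, on_data, finish and
get_triples are all hypothetical names.

```python
from typing import Callable, List, Tuple

Triple = Tuple[str, str, str]
Listener = Callable[[List[Triple]], None]


class DocumentData:
    """Hypothetical store that tells callers when triples are ready."""

    def __init__(self) -> None:
        self._ready = False
        self._triples: List[Triple] = []
        self._listeners: List[Listener] = []

    def on_data(self, listener: Listener) -> None:
        """Event style: call the listener once data is ready.

        If processing already finished, the listener runs immediately.
        """
        if self._ready:
            listener(list(self._triples))
        else:
            self._listeners.append(listener)

    def finish(self, triples: List[Triple]) -> None:
        """Called by the (imaginary) processor when parsing completes."""
        self._triples = list(triples)
        self._ready = True
        for listener in self._listeners:
            listener(list(self._triples))
        self._listeners.clear()

    def get_triples(self) -> List[Triple]:
        """Lazy style: callable at any time, empty until processing ends."""
        return list(self._triples)
```

With the event style the developer is told when data exists; with the
lazy style an early call simply sees an empty result, which is the
behaviour questioned further down.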

> All it 
> would need do is query the triple store returned from the RDFa 
> Processor and return the triples that match the query.  (I know the 
> RDFa API has other capabilities, but they all come down to 
> manipulating triples or finding data related to certain components
> of triples).  If there are no triples because a profile was 
> unavailable... okay.

I don't think that's okay anymore. It's a trade-off: which use cases
are we attempting to address with @profile? The main one in my mind is
Microformats. In that case, this would work:

<div vocab="http://microformats.org/vocab#" typeof="hcard">
   <span property="fn">Shane McCarron</span>
</div>
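
As a rough sketch of why the snippet above needs no profile fetch: a
bare term is resolved by appending it to the in-scope @vocab value. The
Python below assumes that simple concatenation rule; the function names
and the "_:b0" blank node label are made up for illustration.

```python
from typing import List, Tuple

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def resolve_term(term: str, vocab: str) -> str:
    """Resolve a bare term against the in-scope @vocab by concatenation."""
    return vocab + term


def hcard_triples(vocab: str, name: str) -> List[Tuple[str, str, str]]:
    """Triples for the snippet above; "_:b0" is a made-up bnode label."""
    subject = "_:b0"
    return [
        (subject, RDF_TYPE, resolve_term("hcard", vocab)),
        (subject, resolve_term("fn", vocab), name),
    ]


for triple in hcard_triples("http://microformats.org/vocab#",
                            "Shane McCarron"):
    print(triple)
```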

However, one of the main questions we were contemplating was what
happens /before/ the step when a developer starts using the API. How
does the developer know when their data is ready?

>> Loading a profile is problematic due to HTTP same-origin 
>> restrictions [3]. This can be alleviated by ensuring that profile
>> documents are all CORS enabled, but it remains a complicating 
>> factor. Not all authors will be in a position to set CORS policy 
>> for served profiles.
> 
> Do you honestly believe that there will be that many important 
> profiles?  I agree that Bob's Auto Shop might find setting up CORS 
> and a profile challenging, but they aren't going to do it anyway.
> There will be a handful of important profiles, and those will work 
> right.  Almost by definition... since if they do not work right you 
> won't get any triples.  Organizations like Dublin Core and Creative 
> Commons will ensure their profiles work right, their CORS is set up 
> right, etc.  How could they not?  It's in their own interest, it's 
> not actually hard if you can read and use an editor, and it is 
> REQUIRED for their content to be used on the semantic web via RDFa. 
> If you are building a profile, you are building it for use in RDFa 
> via @profile.  q.e.d.

Yes, but the simple fact remains that in order to /solidly/ support
@profile, CORS or not, either the JavaScript code needs hacks or the
browser needs to support it. I think this was one of the driving
factors during the discussion. If there is no browser support for CORS,
or a profile is not CORS-enabled, the RDFa API is not easily
implementable in JavaScript. If we remove @profile, that problem goes
away. It's a pretty big reduction in complexity and our @profile use
cases are still supported.

> Yes, but we have always said this is such a vanishingly small issue 
> that it is not worth worrying about, other than to define what a 
> conforming processor should do when it occurs.

The bigger issue is that the processor has to go out to the network at all.

>> Note that the default profile does not present the same problems, 
>> since it is assumed that RDFa processors will internally cache the
>>  default profile. Concerns were raised about the relatively closed
>>  nature of relying on the default profile for prefix definitions, 
>> as frequent changes to the profile place a burden on processor 
>> developers, and even with a simple registration form, it places a 
>> barrier to entry and is generally not in the open nature of RDF.
> 
> I guess I agree that there might be a barrier.  However, we never 
> envisioned that the default profile definition would change 
> frequently. Surely the collection of interesting terms will not 
> expand quickly.  And it is even less likely that new, interesting 
> vocabularies that are expected to be included in a default profile 
> will arise daily!  The vision was that, once a year or so, there 
> might be a reason to revise the profile.  At least, that was my 
> vision.  Moreover, since a profile has an explicit URI, I can 
> reference it via @profile and KNOW that I am getting the collection
> I wanted at the time I wrote my document (or set up my web site /
> CMS, or whatever).

Yes, but I would imagine that the default profile would continue to live
on where it does and be updated using the mechanism we have outlined. We
would just tell people to include/cache the data in their processors.
Perhaps we'd make the document available in a very simple to use set of
key-value pairs instead of requiring an RDFa vocabulary to describe it?
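
A minimal sketch of what "include/cache the data in their processors"
could look like if the default profile were plain key-value pairs. The
two entries are well-known namespaces used purely as examples, not the
actual default profile contents, and expand_curie is a hypothetical
helper.

```python
from typing import Dict, Optional

# Illustrative built-in table; a real processor would ship the
# published default profile instead of these two example entries.
DEFAULT_PREFIXES: Dict[str, str] = {
    "dc": "http://purl.org/dc/terms/",
    "foaf": "http://xmlns.com/foaf/0.1/",
}


def expand_curie(curie: str,
                 local_prefixes: Optional[Dict[str, str]] = None
                 ) -> Optional[str]:
    """Expand prefix:reference without touching the network.

    Prefixes declared in the document (local_prefixes) take priority
    over the built-in defaults. Returns None for unknown prefixes.
    """
    prefixes = dict(DEFAULT_PREFIXES)
    prefixes.update(local_prefixes or {})
    prefix, sep, reference = curie.partition(":")
    if not sep or prefix not in prefixes:
        return None
    return prefixes[prefix] + reference


print(expand_curie("foaf:name"))
print(expand_curie("ex:thing", {"ex": "http://example.org/ns#"}))
print(expand_curie("unknown:thing"))
```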

> However, it means that organizations outside of our own have no way 
> to define the collections of prefixes and terms that are relevant to 
> their content developers (schema.org, facebook, the news people). The
> major reason to support @profile was to permit those organizations
> and others to override the defaults.  Without such a mechanism, there
> is no way, for example, for me to ensure that MY authors are
> restricted to using terms I want and prefixes that map to what I
> mean.

That's true, this is bad. However, we'll see if these organizations take
issue with the removal of @profile. I know the ePub 3.0 folks were using
@profile heavily - let's see if they think this is an awful move.

I agree that having absolutely no way to declare terms is bad - perhaps
we should re-visit Mark's @token proposal? Or re-consider the merging of
@prefixes and @tokens. We've done this successfully in JSON-LD. I
remember there being one technical reason we didn't do it in RDFa...
Nathan raised it, but it slips my mind at the moment.

> In the absence of @profile, the only way I as a content author or 
> publisher with captive authors can be confident that the semantics 
> are EXACTLY what I want is to.... what?  Define all my prefixes 
> explicitly on the document element and require that my authors only 
> use scoped terms? I could probably define my own vocabulary via 
> @vocab...  But as currently defined @vocab doesn't clear the terms 
> from the default profile out of my context.

We could make @vocab override the default profile. That approach has
its own set of downsides, but we may be able to live with those?
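
One way to picture the difference, as a sketch only: the precedence
order below is an assumption based on Shane's description, not text
from the spec, and the single default term is just an example entry.

```python
from typing import Dict, Optional

# Illustrative only: one term a default profile might define.
DEFAULT_TERMS: Dict[str, str] = {
    "license": "http://www.w3.org/1999/xhtml/vocab#license",
}


def resolve(term: str, vocab: Optional[str],
            vocab_overrides: bool) -> Optional[str]:
    """Resolve a bare term under two possible precedence rules.

    vocab_overrides=False: default-profile terms win, @vocab is the
    fallback (the situation Shane describes).
    vocab_overrides=True: once @vocab is set, it wins for every term.
    """
    if vocab is not None and vocab_overrides:
        return vocab + term
    if term in DEFAULT_TERMS:
        return DEFAULT_TERMS[term]
    if vocab is not None:
        return vocab + term
    return None


mine = "http://example.org/vocab#"
print(resolve("license", mine, vocab_overrides=False))
print(resolve("license", mine, vocab_overrides=True))
```

With the override in place, a publisher's "license" always means the
publisher's own term, whatever the default profile says.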

> I feel very strongly that letting the RDFa API drive the definition 
> of RDFa is a mistake.

I agree. I also note that the RDFa API may have just highlighted
something that was an issue and we've since (possibly) figured out a way
to support the @profile use cases without the feature.

> And I disagree that there are no community requirements for this.  I 
> am sure that Ivan has an opinion about this.  Others will chime in. I
> am in the community.  And I have a requirement for it.  Mark Birbeck
> is in the community.  He has a requirement for it.

I don't know if Ivan and Mark had a requirement for this. I'm sure
they'll speak up... but I thought the major requirement was the
Microformats community and vocabulary/term mixing.

> Here's my requirement.  I need to have my triples be deterministic 
> (modulo the network failing).  And I don't want to create an infinite
> number of cargo cult programmers in order to achieve this. It is
> insufficient to say authors can declare all the prefixes they want at
> the top of their documents.  That's how we got in this mess in the
> first place.

True, but it may be sufficient to say: "Declare the @vocab if you want
to use terms". That seems simpler to me. I agree that it's not as nice
as @profile. However, it may be that a modified @vocab attribute without
@profile is the right balance.

> I want to be able to tell my authors "use this profile
> and you will be able to embed all the semantics that we need in our
> environment" and that's it.  

Why couldn't you say: "Use this vocabulary and you will be able to embed
all the semantics that we need in our environment"?

> Moreover, since it IS a profile, and the
> profile is in RDFa, my authors can LOOK AT IT!  It is an HTML 
> document.  It defines terms and prefixes.  It is self-documenting. My
> authors can know IMMEDIATELY what they can use in documents they 
> write for me.

True, one of the many benefits of having a @profile in the first place.
You could kinda do the same for @vocab, but you wouldn't have control
over it on an application-by-application basis.

> If the API is hard to implement or might be laggy, change the API. As
> written today the processing model is straightforward to implement.
> Many of us have done it.   You don't actually need any new event
> notifications - the existing model is fine.  The RDFa API needs to
> know when the document has been parsed.  Once that is done, 
> everything about the document is readily available.  Could that take 
> a little time? Sure.  But it isn't "overhead" on every HTML
> document. It is overhead on HTML documents where a script author has
> made an RDFa API request to retrieve semantic data.  And there WILL
> be overhead to retrieve that data.  Will there be MORE overhead on a 
> document that references a profile that has not yet been cached by 
> the user agent?  Yep!  Once. Then it will be cached.  Your 
> implementation doesn't cache profiles it has retrieved?  Why not? Fix
> it.

All compelling points. There are some nits I have with "you don't
actually need any new event notifications", but the rest of the points
hold in spite of that.
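
Shane's "Once. Then it will be cached." can be sketched like this. The
ProfileCache class and the injected fetch function are hypothetical; a
real user agent would also have to deal with expiry and failed fetches,
which this leaves out.

```python
from typing import Callable, Dict, List


class ProfileCache:
    """Fetch each profile URL at most once, then serve it from memory."""

    def __init__(self, fetch: Callable[[str], Dict[str, str]]) -> None:
        self._fetch = fetch          # injected so no real network is used
        self._cache: Dict[str, Dict[str, str]] = {}

    def get(self, url: str) -> Dict[str, str]:
        """Return the prefix mappings for `url`, fetching only on a miss."""
        if url not in self._cache:
            self._cache[url] = self._fetch(url)
        return self._cache[url]


fetched: List[str] = []


def fake_fetch(url: str) -> Dict[str, str]:
    """Stand-in for a network fetch; records every call it receives."""
    fetched.append(url)
    return {"ex": "http://example.org/ns#"}


cache = ProfileCache(fake_fetch)
cache.get("http://example.org/profile")
cache.get("http://example.org/profile")
print(len(fetched))  # the second lookup did not hit the network
```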

>> At the time profiles were introduced, there was no mechanism for 
>> importing standard prefix definitions in bulk. For almost all 
>> cases, a built-in default profile definition addresses this issue. 
>> Going further to allow for arbitrary profile introduction may be 
>> going too far, at least for this particular version of the
>> standard.
> 
> See above.  I strongly disagree.  The time to do this is now.  There
> won't be a next time.  We are not requiring that everyone use it. 

True, but the browser vendors don't want to implement any feature that
requires them to do work that they feel is unnecessary to support the
use cases. The @profile feature is something that they /really/ don't
like for two reasons:

1. It requires event/callback complexity at the API level that
   Microdata doesn't have.
2. It requires extra document fetches, while the @vocab solution doesn't
   and still addresses almost all of the use cases we had for @profile.

> And there won't be a million little profiles out there.  But there 
> will be *some*.  And we can't envision what those will be or which 
> will succeed.  In the absence of this mechanism I will be forced to 
> declare all of the prefixes I care about every time.  I cannot rely 
> upon the default profile because IT CAN CHANGE UNDERNEATH ME and I 
> have no announcement mechanism! (I know, there goes Shane, bitching 
> about announcement mechanisms again.)

Another very good point.

So, of the many points that Shane made, I believe that the following
would have to be resolved in order to remove @profile support:

1. Why is waiting for all @profile documents to load and then
   proceeding a bad thing? What makes it technically challenging to
   implement in a browser?
2. Is there an announcement mechanism for RDFa Core 1.1? We removed
   @version and pseudo-replaced it with @profile. Do we need to
   re-introduce @version? If we don't do this, an RDFa 2.0 processor
   may accidentally corrupt the intent of an RDFa 1.1 document.

-- manu

-- 
Manu Sporny (skype: msporny, twitter: manusporny)
President/CEO - Digital Bazaar, Inc.
blog: PaySwarm Developer Tools and Demo Released
http://digitalbazaar.com/2011/05/05/payswarm-sandbox/
Received on Tuesday, 19 July 2011 02:44:37 UTC