Re: FragIds in semantic web (ACTION-543) from Jonathan Rees on 2011-04-22 (www-tag@w3.org from April 2011)

From: Jonathan Rees <jar@creativecommons.org>
Date: Fri, 22 Apr 2011 08:57:54 -0400
To: Jeni Tennison <jeni.tennison@googlemail.com>
Cc: Larry Masinter <masinter@adobe.com>, "www-tag@w3.org List" <www-tag@w3.org>
Message-ID: <BANLkTi=RvUVpFE1_X13ZEyUv54icqmOw3g@mail.gmail.com>

On Wed, Apr 6, 2011 at 6:18 PM, Jeni Tennison
<jeni.tennison@googlemail.com> wrote:
> Hi Larry,
>
> I have an action (ACTION-543: Propose addition to MIME/Web draft to discuss sem-web use of fragids not grounded in media type) to propose some wording to slot into your "MIME and the Web" draft which I'm taking to be the version at:
>
>  http://tools.ietf.org/id/draft-masinter-mime-web-info-02.html
>
> You already have a Section 4.6 (Fragment identifiers) which touches on the issue, so I suggest extending that to read something like:
>
> ---
>  The Web added the notion of being able to address part of an entity
>  and not the whole content by adding a 'fragment identifier' to the
>  URL that addressed the data. Of course, this originally made sense
>  for the original Web with just HTML, but how would it apply to other
>  content types? The URL spec glibly noted that "the definition of the
>  fragment identifier meaning depends on the Internet Media Type", but
>  unfortunately, few of the Internet Media Type definitions included
>  this information, and practices diverged greatly.
>
>  Content negotiation becomes extremely difficult when the interpretation
>  of fragment identifiers depends on the MIME type as there is no
>  guarantee that the syntax of a fragment identifier that is legal for
>  one MIME type is also legal (or interpreted in an equivalent way) for
>  another MIME type. For example, the common `#identifier` syntax for
>  HTML is not consistent with the XPointer-based syntax defined for XML.

I don't understand this. Can you explain or give an example? It seems
to me that XPointer and HTML were designed to be compatible - in fact
they overlap in the application/xhtml+xml media type.

Are you referring to RFC 3023, or to 3023bis?

The problem of inconsistency between simultaneous representations has
been raised before and should be raised here - it's not syntactic,
it's semantic. If I have French and Spanish HTML files, both with #foo
fragids, the fragid "identifies" different elements in the two
documents - since the documents are different. Yet there is no problem
in practice as long as the elements serve the same function in
interaction (they "say the same thing" in the two different
languages).
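
To make the French/Spanish case concrete, here is a minimal sketch in Python. The two documents and the helper names (FragmentFinder, resolve) are invented for illustration, and the parser is deliberately naive (it assumes no void elements such as <br> inside the matched element). It only shows that the same fragid picks out different elements in the two representations.

```python
from html.parser import HTMLParser


class FragmentFinder(HTMLParser):
    """Collect the text of the first element whose id equals the fragid."""

    def __init__(self, fragid):
        super().__init__()
        self.fragid = fragid
        self.depth = 0        # nesting depth inside the matched element
        self.text = []
        self.found = False

    def handle_starttag(self, tag, attrs):
        if self.depth:
            self.depth += 1
        elif not self.found and dict(attrs).get("id") == self.fragid:
            self.found = True
            self.depth = 1

    def handle_endtag(self, tag):
        if self.depth:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth:
            self.text.append(data)


def resolve(html, fragid):
    """Return the text of the element identified by fragid, or None."""
    finder = FragmentFinder(fragid)
    finder.feed(html)
    finder.close()
    return "".join(finder.text) if finder.found else None


# Hypothetical representations of one negotiated resource.
FRENCH = '<html><body><h2 id="foo">Conditions d\'utilisation</h2></body></html>'
SPANISH = '<html><body><h2 id="foo">Condiciones de uso</h2></body></html>'

print(resolve(FRENCH, "foo"))
print(resolve(SPANISH, "foo"))
```

The two results are different elements with different text, yet a reader following #foo lands in the "same place" in either language, which is the sense of consistency I mean.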

I think the AWWW story about consistency between fragment identifier
meaning among representations is probably worth repeating here, or
referring to, not because it is a complete solution to the problem but
because it's the closest thing we have so far to an interpretation of
RFC 3986 that makes sense.

>  This is exacerbated in common semantic web practice, which not only
>  makes heavy use of content negotiation but in which URLs with fragment
>  identifiers are used to identify real-world Things.

Ouch! Since when is the Web not part of the real world? Just read the
newspapers...

I think you should just say "things that are not document fragments".

I am not convinced that conneg gets heavy use in semantic web contexts
- I once spent an hour or so trying to find a single example of RDF
conneg, and the only one I found was FOAF. Just for my own
edification, could you elaborate on this?

>  In these cases,
>  the URI as a whole is used to identify the real-world Thing, and the
>  fragment identifier does not address a part of any entity, so
>  interpreting the fragment identifier based on the MIME type of whatever
>  entity happens to be returned does not make sense.

If you look at the registration for application/rdf+xml you'll see
that it treats this situation explicitly. That is, it provides a
"follow your nose" story that goes from the URI to its RDF-aware
non-fragment referent. It is also compatible with RFC 3986 as long as you
admit that "fragment" has been generalized and is now a misnomer. I
don't think there is any problem with the specs for this particular
media type, or for the newly submitted N3 and Turtle media types. I
haven't checked the other RDF serializations such as
application/owl+xml and text/owl-manchester. (Actually I'm having a
hard time finding these registrations at all...)
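
A rough sketch of that "follow your nose" path, in Python. Everything here is an assumption made for illustration: the example.org URIs, the TRIPLES data, and the fake_fetch stand-in for an HTTP GET plus an RDF parse. The only real library call is urllib.parse.urldefrag.

```python
from urllib.parse import urldefrag

# Hypothetical statements assumed to be in the RDF representation
# retrieved from http://example.org/people.
TRIPLES = [
    ("http://example.org/people#alice", "rdf:type", "foaf:Person"),
    ("http://example.org/people#alice", "foaf:name", "Alice"),
]


def fake_fetch(doc_url):
    """Stand-in for dereferencing doc_url and parsing the RDF returned."""
    return TRIPLES if doc_url == "http://example.org/people" else []


def follow_your_nose(uri, fetch):
    """Strip the fragid, fetch the document, return statements about the full URI."""
    doc_url, _fragment = urldefrag(uri)
    graph = fetch(doc_url)
    # The full URI, fragid included, is the subject being described; the
    # fragid is not used to select a piece of the document's syntax.
    return [(p, o) for s, p, o in graph if s == uri]


print(follow_your_nose("http://example.org/people#alice", fake_fetch))
```

Note that the fragid never addresses a part of the retrieved entity here, which is why "fragment" is a misnomer in this use.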

As far as I know the use of fragids in RDF is limited to situations
where there is a representation that has a media type that allows this
use. The exception is the new RDFa specs, which we have discussed.
They include examples that use fragids in a way not documented by the
media type registrations. This is a new practice and it could be
repaired in several ways:
  - by amending the registrations (text/html, and application/xml via RFC 3023)
  - by amending RFC 3986
  - by making AWWW more aggressive in specifying fragid semantics
  - by discouraging the new practice

You have couched the RDF problem as generic across all media types but
as far as I know it is limited (so far) to the approximately 10 RDF
serializations. As you say the most parsimonious solution might be
fully generic, rather than attacking the registrations one by one, but
that's something we need to figure out.

Here's something interesting that I've just thought of - you might
think that a fragid can at least be resolved locally, i.e. if the
reference occurs in representation A then its definition can be
expected to be found in representation A. But it seems likely that in
RDF-conneg situations you might have a reference in A that would have
to be resolved in representation B, e.g. the RDF version linking to
the HTML version or vice versa. Does this happen much in practice? We
see occurrences of FOAF RDF fragids in the FOAF HTML file, for
example, and maybe vice versa.

For me the biggest problem with conneg+fragid is that it destroys the
follow your nose story. Given a particular fragid there is no way to
know ahead of time which representation it's defined in and therefore
what to try to conneg for. If you get it wrong, there's no way to
iterate through all the representations looking for the fragid
definition, except in the unlikely event that the server supports
Transparent Content Negotiation (TCN).
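
A toy model of the problem, in Python. The VARIANTS table, the fragids in it, and both function names are made up; negotiate() stands in for a server doing ordinary conneg. The client can only find the defining representation if something hands it the list of candidate media types, which is what plain conneg does not give you.

```python
# Hypothetical variants of a single resource: media type -> fragids that
# the corresponding representation defines.
VARIANTS = {
    "text/html": {"intro", "terms"},
    "application/rdf+xml": {"me", "Person"},
}


def negotiate(accept):
    """Simulated server: return (media_type, fragids) for the first acceptable type."""
    for media_type in accept:
        if media_type in VARIANTS:
            return media_type, VARIANTS[media_type]
    return None


def find_defining_type(fragid, candidate_types):
    """Conneg for each candidate type in turn until one defines the fragid."""
    for media_type in candidate_types:
        response = negotiate([media_type])
        if response and fragid in response[1]:
            return response[0]
    return None


# With the full variant list the search succeeds.
print(find_defining_type("me", ["text/html", "application/rdf+xml"]))
# A client that guesses only text/html never finds the definition.
print(find_defining_type("me", ["text/html"]))
```

The candidate_types argument is the crux: without a variant list from the server, the client has nothing to iterate over.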

I think the RDFa WG's justification for encouraging off-label fragids
without any commitment to modifying the specs is that fragid semantics
are *already* broken in so many ways that it's not worth trying to
play by the rules, and breaking them in one more way won't make any
difference to anyone. I'm not happy with that but can't argue with it.

Best,
Jonathan
Received on Friday, 22 April 2011 12:58:21 UTC