Re: HTML media type vs. # URIs that do not identify document elements from Jonathan Rees on 2010-02-09 (www-tag@w3.org from February 2010)

From: Jonathan Rees <jar@creativecommons.org>
Date: Tue, 9 Feb 2010 08:58:20 -0500
To: noah_mendelsohn@us.ibm.com
Cc: Dan Connolly <connolly@w3.org>, Ben Adida <ben@adida.net>, www-tag@w3.org
Message-ID: <760bcb2a1002090558ofa3c390m9d705dae5435814f@mail.gmail.com>

There are two different cases, and we probably ought to consider them
separately.

1. In my original message the same # URI is used to refer to two
different things: an element (as the text/html media type registration
would have it) and something else, say a person (as RDF would have it).

2. In the situation you describe the # URI is only used to refer to
the non-element; there is no element with the given name.

Your link checker would detect a problem in case 2 but not in case 1.
I'm not sure what business a link checker has in looking at URIs that
are only used in non-hypertext contexts (e.g. rel= but never href=),
but I guess that would be its prerogative.
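
To make the difference between the two cases concrete, here is a minimal
sketch of the kind of checker Noah describes. It assumes a fragment
"resolves" when some element carries a matching id attribute (or a
matching name attribute on <a>), and it ignores percent-encoding; the
function and class names are hypothetical. Because it only asks whether
such an element exists, never what RDF says the URI denotes, it flags
case 2 and passes case 1.

```python
from html.parser import HTMLParser
from urllib.parse import urldefrag


class AnchorCollector(HTMLParser):
    """Hypothetical helper: collect fragment targets (id=, and name= on <a>)."""

    def __init__(self):
        super().__init__()
        self.anchors = set()

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if value is None:
                continue
            if name == "id" or (tag == "a" and name == "name"):
                self.anchors.add(value)


def fragment_resolves(uri, html_text):
    """Return True if the URI's fragment names an element in html_text.

    A URI with no fragment is treated as resolving. Simplified sketch:
    no percent-decoding, no XHTML/XML id handling.
    """
    _, frag = urldefrag(uri)
    if not frag:
        return True
    collector = AnchorCollector()
    collector.feed(html_text)
    collector.close()
    return frag in collector.anchors


if __name__ == "__main__":
    # Case 1: an element named "alice" exists, even if RDF also uses #alice for a person.
    case1 = '<h2 id="alice">Alice</h2>'
    # Case 2: #me is only used in RDFa (about=), with no element of that name.
    case2 = '<p about="#me" typeof="foaf:Person">Me</p>'
    print(fragment_resolves("http://example.com/xyz.html#alice", case1))  # True
    print(fragment_resolves("http://example.com/xyz.html#me", case2))     # False
```

A checker like this cannot see the case 1 conflict at all, which is the
point: the element exists, so from the checker's perspective nothing is
broken.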

In case 1 the risk would only be to an agent that knew about the RFC
(RFC 2854, the text/html registration), inferred from the existence of
an HTML representation that the URI referred to an element, and then
later encountered RDF that put the URI's referent in a class that's
disjoint with element (e.g. foaf:Agent), thus deriving a contradiction.
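
The case 1 chain of inference can be sketched as a toy reasoner. This is
only an illustration under stated assumptions: ex:DocumentElement is a
hypothetical class standing for "element of an HTML document", and the
disjointness axiom between it and foaf:Agent is assumed for the example.
The one FOAF fact used is that foaf:Person is a subclass of foaf:Agent.

```python
# Hypothetical class for "an element of an HTML document" (not a real vocabulary term).
ELEMENT = "ex:DocumentElement"

# FOAF declares foaf:Person a subclass of foaf:Agent.
SUBCLASS_OF = {"foaf:Person": "foaf:Agent"}

# Assumed axiom for illustration: nothing is both a document element and an agent.
DISJOINT = [frozenset({ELEMENT, "foaf:Agent"})]


def types_of(uri, media_types, type_triples):
    """Collect the types an agent would ascribe to the URI's referent."""
    types = set()
    # Reading per the text/html registration: a # URI into HTML names an element.
    if "#" in uri and "text/html" in media_types:
        types.add(ELEMENT)
    # Types asserted by RDF (subject, class) pairs.
    for subject, cls in type_triples:
        if subject == uri:
            types.add(cls)
    # Close the set under the subclass map.
    changed = True
    while changed:
        changed = False
        for t in list(types):
            sup = SUBCLASS_OF.get(t)
            if sup and sup not in types:
                types.add(sup)
                changed = True
    return types


def contradictions(types):
    """Return every assumed-disjoint pair that the referent falls into."""
    return [sorted(pair) for pair in DISJOINT if pair <= types]


if __name__ == "__main__":
    uri = "http://example.com/people.html#alice"
    both = types_of(uri, {"text/html", "application/rdf+xml"}, [(uri, "foaf:Person")])
    print(contradictions(both))  # [['ex:DocumentElement', 'foaf:Agent']]
```

If the HTML representation is removed (or the media type registration
lets the publisher opt out), the ELEMENT type is never inferred and the
contradiction disappears.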

FOAF (http://xmlns.com/foaf/0.1/) follows pattern 2 (which is
unfortunate since on following one of its # URIs you're taken to the
top of the file, not to the specification of that # URI). I have not
found an instance of pattern 1 yet, but would be surprised if there
weren't one.

(By the way, I just read the fine print in the 'best practices' note
http://www.w3.org/TR/swbp-vocab-pub/ and it advises using 303 for
namespace documents in the situation where HTML/RDF conneg is being
done. E.g. VoID (http://rdfs.org/ns/void) follows this pattern. The
document at the end of the 303 is itself at risk, but the chance that
in this case there are non-element # URIs based on the 303 target URI
seems vanishingly small.)

I will take the answer to my original question as "no one who reads
www-tag has thought about updating the media type registration".
Perhaps it's not so important, as in five or ten minutes of searching
I found fewer instances of HTML/RDF conneg than I expected to, and
future instances may adhere to the SWBP note.

Jonathan

On Mon, Feb 8, 2010 at 7:07 PM,  <noah_mendelsohn@us.ibm.com> wrote:
> Dan Connolly writes:
>
>> [Noah Mendelsohn writes:]
>> > As to amending the media type specification: in principle I might be
>> > concerned, precisely because people could have invested in code that
>> > interpreted the failure to resolve as an error (at least in the same
>> > spirit that 404 is an error).
>>
>> What failure to resolve? Could you elaborate the case you have
>> in mind?
>
> I'll be glad to try.  I can at least propose something hypothetical that
> might be suggestive of the concern.  Imagine that someone had built an
> agent that helps debug broken links, perhaps for use in conjunction with a
> content management system.  Per the applicable specs,  if
> http://example.com/xyz.html#frag returns text/html, then the URI is a
> reference (or attempted reference) to an element in the file, and the tool
> should include it in a list of broken links if it doesn't resolve.  If
> instead the resource owner uses this as a reference to some other
> secondary resource [2], it will get incorrectly counted as a broken link.
>
> I think this is also in the spirit of the concerns that Eric Bowman
> raised.   I don't know if you'll find this particular example convincing,
> but it's the sort of thing I have in mind.
>
> Noah
>
> [1] http://lists.w3.org/Archives/Public/www-tag/2010Feb/0058.html
> [2] http://www.w3.org/TR/webarch/#def-secondary-resource
>
> --------------------------------------
> Noah Mendelsohn
> IBM Corporation
> One Rogers Street
> Cambridge, MA 02142
> 1-617-693-4036
> --------------------------------------
>
> Dan Connolly <connolly@w3.org>
> 02/05/2010 06:30 PM
>
>        To:     noah_mendelsohn@us.ibm.com
>        cc:     Ben Adida <ben@adida.net>, Jonathan Rees <jar@creativecommons.org>, www-tag@w3.org
>        Subject:        Re: HTML media type vs. # URIs that do not identify document elements
>
>
> On Fri, 2010-02-05 at 16:54 -0500, noah_mendelsohn@us.ibm.com wrote:
>> Dan Connolly wrote:
>>
>> > You say that like it's a bad thing.
>> >
>> > i.e. what's "wrong" about that?
>>
>> > [..]
>>
>> > Why would browsers do anything different
>> > from what they do now?
>>
>> Perhaps I wasn't clear: I have no problem at all with what the browsers
>> are doing.
>>
>> I believe Jonathan pointed out a use case in which the semantic Web
>> community was serving text/html documents, with fragids used for purposes
>> that were not in conformance with the applicable media type specification.
>>  You acknowledge that's the issue, where you say:
>>
>> > I wrote about this in a 2006 workshop paper...
>> >
>> > [[
>> > In order for this to work with documents published both in RDF/XML and
>> > XHTML, the XHTML media type specifications may need to be amended so
>> > that authors can opt out of the section-of-the-document meaning of
>> > fragment identifiers that they publish. For example, the profile
>> > attribute from section 7.4.4.3 Meta data profiles of the HTML 4
>> > specification[HTML4] seems like a reasonable opt-out signal.
>> > ]]
>> >  -- section Fragments as sections vs. people
>> >   http://www.w3.org/2006/04/irw65/urisym#docdata
>>
>> Right, but there's at least some damage in the meantime, with content out
>> on the Web that's in violation of current applicable specifications. I'm
>> not claiming the Web will crumble tomorrow over this, but I don't think
>> it's a good thing.  I used the browser example merely to point out the
>> kind of damage that might, at least in principle, be observed.
>
> But what you call "damage" looks like stuff working as expected
> and as designed, to me. I don't see the point at all.
>
>
>> As to amending the media type specification: in principle I might be
>> concerned, precisely because people could have invested in code that
>> interpreted the failure to resolve as an error (at least in the same
>> spirit that 404 is an error).
>
> What failure to resolve? Could you elaborate the case you have
> in mind? In the case that Jonathan brought up, there's no
> failure to resolve.
>
>
>> In practice, it's hard for me to imagine
>> that there would be significant trouble for anyone, and something like a
>> profile attribute seems like a reasonable way to signal the opt-out.
>>
>> Noah
>
> --
> Dan Connolly, W3C http://www.w3.org/People/Connolly/
> gpg D3C2 887B 0F92 6005 C541  0875 0F91 96DE 6E52 C29E
Received on Tuesday, 9 February 2010 14:06:50 UTC