Re: HTTPS and the Semantic Web from Henry Story on 2016-05-22 (semantic-web@w3.org from May 2016)

From: Henry Story <henry.story@bblfish.net>
Date: Sun, 22 May 2016 09:55:27 +0200
To: Wouter Beek <w.g.j.beek@vu.nl>
Cc: nathan <nathan@webr3.org>, Simon Spero <sesuncedu@gmail.com>, Halpin Harry <hhalpin@ibiblio.org>, Carvalho Melvin <melvincarvalho@gmail.com>, Patrick Hayes <phayes@ihmc.us>, Archer Phil <phila@w3.org>, Semantic Web IG <semantic-web@w3.org>
Message-Id: <FA4BD953-1B68-4571-B7DE-42A436850D5C@bblfish.net>

> On 22 May 2016, at 09:14, Wouter Beek <w.g.j.beek@vu.nl> wrote:
> 
> Hi Henry,
> 
> Thanks for the pointer to POWDER; I was not aware of it yet.

It's underutilised, and just waiting for the right moment to emerge, I think.

> 
> On Sun, May 22, 2016 at 8:23 AM, Henry Story <henry.story@bblfish.net> wrote:
>> I want to point out that a similar issue has already been around for as long as the SW exists: IRIs that differ only in terms of escaping are different SW names even though they denote the same Web location.  In practice I do not always see a data publisher make explicit (`owl:sameAs') assertions between [3] and [4] (although some do, I've seen them in LOD Laundromat).
> 
> I think that equivalence is covered by the URI and IRI spec. URIs have to be compared for equivalence after denormalisation, including relative URI
> resolution, i.e. <https://www.w3.org/2001/sw/wiki/POWDER> is the same as <https://www.w3.org/2002/../2001/sw/wiki/POWDER>.
> 
> Do you have a reference for the use of denormalization in IRI equivalence checking in RDF?  IIUC  the current RDF 1.1 specification <https://www.w3.org/TR/rdf11-concepts/#section-IRIs> takes a different stance:
> 
> IRI equality: Two IRIs are equal if and only if they are equivalent under Simple String Comparison according to section 5.1 <http://tools.ietf.org/html/rfc3987#section-5.1> of [RFC3987 <https://www.w3.org/TR/rdf11-concepts/#bib-RFC3987>]. Further normalization MUST NOT be performed when comparing IRIs for equality.

As I understand it, the IRI spec is already talking of denormalised URIs as far as percent-encoding goes, since it works at the level of Unicode strings. So that
has already been dealt with by the time one reaches section 5.1. (Of course, if after percent-decoding URIs into IRIs one still has percent symbols in an IRI, then that no longer counts as percent-encoding.) In section 5.2 it says:

   Any kind of IRI comparison REQUIRES that all escapings or encodings
   in the protocol or format that carries an IRI are resolved.  This is
   usually done when the protocol or format is parsed.

Otherwise RFC3987 speaks about a number of equivalence types, and categorises them by the
effort required to process them:

   Protocols or implementations that compare IRIs for different
   purposes will often be subject to differing design trade-offs in
   regards to how much effort should be spent in reducing aliased
   identifiers.  This section describes various methods that may be
   used to compare IRIs, the trade-offs between them, and the types of
   applications that might use them.
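
To make that trade-off concrete, here is a minimal Python sketch contrasting plain string comparison with one slightly more expensive method that resolves percent-encoding first. The function names are made up for illustration, and the decoding rule is a simplification (it decodes every non-ASCII character plus the ASCII unreserved set, and leaves reserved characters such as "/" encoded), so it should not be read as the exact procedure of either spec.

```python
import re
from urllib.parse import unquote

# ASCII characters that can be percent-decoded without changing meaning.
_UNRESERVED = set(
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-._~"
)
# One or more consecutive %XX triplets.
_PCT_RUN = re.compile(r"(?:%[0-9A-Fa-f]{2})+")


def _decode_run(match):
    """Decode a run of %XX triplets, keeping reserved ASCII encoded."""
    run = match.group(0)
    try:
        decoded = unquote(run, errors="strict")
    except UnicodeDecodeError:
        # Not valid UTF-8: leave the run encoded, only uppercase the hex.
        return run.upper()
    return "".join(
        ch if (ord(ch) > 0x7F or ch in _UNRESERVED)
        else "%{:02X}".format(ord(ch))
        for ch in decoded
    )


def simple_equal(a, b):
    """Cheapest method: character-by-character string comparison."""
    return a == b


def normalised_equal(a, b):
    """Costlier method: compare after resolving percent-encoding."""
    return _PCT_RUN.sub(_decode_run, a) == _PCT_RUN.sub(_decode_run, b)


encoded = "http://example.org/ros%C3%A9"
plain = "http://example.org/ros\u00e9"
print(simple_equal(encoded, plain))
print(normalised_equal(encoded, plain))
```

The cheaper method reports more aliases as distinct names; the costlier one collapses them, at the price of extra processing per comparison.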

Clearly some normalisation is performed when comparing IRIs for equality, such as when parsing
documents with relative URIs. Indeed, that RFC says:

   In testing for equivalence, applications should not directly compare
   relative references; the references should be converted to their
   respective target IRIs before comparison. 

It would be silly not to denormalise <../foo#me> to the right URI, and instead to keep the `..` in the final URI.
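
As a small illustration of that conversion step, the following Python sketch resolves relative references against a base before any comparison takes place. The base "https://example.org/people/henry/card" is a hypothetical document location chosen only for the example; the second call mirrors the POWDER wiki URL mentioned earlier in the thread.

```python
from urllib.parse import urljoin

# Hypothetical base IRI of the document in which <../foo#me> appears.
base = "https://example.org/people/henry/card"

# Resolving the relative reference removes the ".." segment.
target = urljoin(base, "../foo#me")
print(target)

# The same resolution applied to the W3C wiki example from this thread.
powder = urljoin("https://www.w3.org/2002/", "../2001/sw/wiki/POWDER")
print(powder)

# Only the resolved target IRIs are compared, never the relative strings.
print(powder == "https://www.w3.org/2001/sw/wiki/POWDER")
```

Note that `urljoin` only removes dot segments while resolving a relative reference; given an already absolute URL containing `..`, it returns the string unchanged.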

I am not sure why the RDF spec thinks it should override the IRI spec, nor why it thinks it has the
authority to do so. I have noticed a few other flaws in that spec, including a bizarre attempt at bnode
naming using .well-known, which makes no sense. My guess is that some sections of RDF 1.1 just did
not get the scrutiny of review required.

> 
> The relation of my remark to the HTTPS discussion is that I can find empirical evidence in LOD Laundromat that some people are already adding `owl:sameAs' links between what they consider to be syntactic variations of the same identifiers.
> 
> You are right that HTTP/HTTPS is not a syntactic rewrite of the same identifier according to the IRI spec, but my point is that percent-encoded/unencoded is not a syntactic rewrite of the same identifier according to the RDF spec either.

I think it is, as shown in section 5.2:

   Any kind of IRI comparison REQUIRES that all escapings or encodings
   in the protocol or format that carries an IRI are resolved.  This is
   usually done when the protocol or format is parsed.  Examples of such
   escapings or encodings are entities and numeric character references
   in [HTML4] and [XML1].  As an example,
   "http://example.org/ros&eacute;" (in HTML),
   "http://example.org/ros&#233;" (in HTML or XML), and
   "http://example.org/ros&#xE9;" (in HTML or XML) are all resolved into
   what is denoted in this document (see section 1.4) as
   "http://example.org/ros&#xE9;" (the "&#xE9;" here standing for the
   actual e-acute character, to compensate for the fact that this
   document cannot contain non-ASCII characters).
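
The RFC's example can be reproduced with a short Python sketch, assuming the carrying format is HTML and using the standard library's `html.unescape` as a stand-in for what an HTML parser does. It covers only the escapings of the carrier format; it does not touch percent-encoding inside the IRI itself.

```python
import html

# The three spellings from the RFC 3987 example, as they would appear
# inside an HTML document.
carried = [
    "http://example.org/ros&eacute;",
    "http://example.org/ros&#233;",
    "http://example.org/ros&#xE9;",
]

# Resolve the carrier format's escapings, as a parser would.
resolved = [html.unescape(s) for s in carried]

# Before resolution the three strings are distinct ...
print(len(set(carried)))
# ... afterwards they are one and the same IRI.
print(len(set(resolved)))
```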

> 
> ---
> Cheers!,
> Wouter.
>

Received on Sunday, 22 May 2016 07:56:00 UTC