- From: Henry Story <henry.story@bblfish.net>
- Date: Sun, 22 May 2016 09:55:27 +0200
- To: Wouter Beek <w.g.j.beek@vu.nl>
- Cc: nathan <nathan@webr3.org>, Simon Spero <sesuncedu@gmail.com>, Halpin Harry <hhalpin@ibiblio.org>, Carvalho Melvin <melvincarvalho@gmail.com>, Patrick Hayes <phayes@ihmc.us>, Archer Phil <phila@w3.org>, Semantic Web IG <semantic-web@w3.org>
- Message-Id: <FA4BD953-1B68-4571-B7DE-42A436850D5C@bblfish.net>
> On 22 May 2016, at 09:14, Wouter Beek <w.g.j.beek@vu.nl> wrote:
>
> Hi Henry,
>
> Thanks for the pointer to POWDER; I was not aware of it yet.

It's underutilised, and just waiting for the moment to emerge, I think.

> On Sun, May 22, 2016 at 8:23 AM, Henry Story <henry.story@bblfish.net> wrote:
>> I want to point out that a similar issue has already been around for as long as the SW exists: IRIs that differ only in terms of escaping are different SW names even though they denote the same Web location. In practice I do not always see a data publisher make explicit (`owl:sameAs') assertions between [3] and [4] (although some do, I've seen them in LOD Laundromat).
>
> I think that equivalence is covered by the URI and IRI spec. URIs have to be compared for equivalence after denormalisation, including relative URI resolution, i.e. <https://www.w3.org/2001/sw/wiki/POWDER> is the same as <https://www.w3.org/2002/../2001/sw/wiki/POWDER>.
>
> Do you have a reference for the use of denormalization in IRI equivalence checking in RDF? IIUC the current RDF 1.1 specification <https://www.w3.org/TR/rdf11-concepts/#section-IRIs> takes a different stance:
>
> IRI equality: Two IRIs are equal if and only if they are equivalent under Simple String Comparison according to section 5.1 <http://tools.ietf.org/html/rfc3987#section-5.1> of [RFC3987 <https://www.w3.org/TR/rdf11-concepts/#bib-RFC3987>]. Further normalization MUST NOT be performed when comparing IRIs for equality.

As I understand it, the IRI spec is already talking of denormalised URIs as far as percent-encoding goes, since it works at the level of Unicode strings. So that has already been dealt with by the time they reach section 5.1. (Of course, if after percent-decoding of URIs into IRIs one still has percent symbols in an IRI, then that no longer counts as percent-encoding.)
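To make the two notions of equality concrete, here is a minimal Python sketch. It contrasts character-by-character comparison (the Simple String Comparison that the RDF 1.1 text points to) with a looser comparison that first decodes percent-escapes. The function names and the example.org IRIs are hypothetical and chosen only for illustration; whether decoding before comparison is appropriate is exactly the point under discussion here.

```python
from urllib.parse import unquote


def simple_string_equal(a: str, b: str) -> bool:
    """Character-by-character comparison, in the spirit of RFC 3987 section 5.1."""
    return a == b


def equal_after_percent_decoding(a: str, b: str) -> bool:
    """Hypothetical looser comparison: decode %XX escapes (as UTF-8) first."""
    return unquote(a) == unquote(b)


# Illustrative pair: the same path written with and without percent-encoding.
encoded = "http://example.org/ros%C3%A9"
unencoded = "http://example.org/ros\u00e9"

print(simple_string_equal(encoded, unencoded))           # False
print(equal_after_percent_decoding(encoded, unencoded))  # True
```

Note that blanket decoding with `unquote` also decodes escaped reserved characters such as `%2F`, so it is not a safe general-purpose equivalence test.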
In section 5.2 it says:

    Any kind of IRI comparison REQUIRES that all escapings or encodings in the protocol or format that carries an IRI are resolved. This is usually done when the protocol or format is parsed.

Otherwise RFC3987 speaks about a number of equivalence types, and categorises them by the effort required to process them:

    Protocols or implementations that compare IRIs for different purposes will often be subject to differing design trade-offs in regards to how much effort should be spent in reducing aliased identifiers. This section describes various methods that may be used to compare IRIs, the trade-offs between them, and the types of applications that might use them.

Clearly some normalisation is performed when comparing IRIs for equality, such as when parsing documents with relative URIs. Indeed it says in that RFC:

    In testing for equivalence, applications should not directly compare relative references; the references should be converted to their respective target IRIs before comparison.

It would be silly not to denormalise <../foo#me> to the right URI and instead keep the `..` in the final URI.

I am not sure why the RDF spec thinks it should override the IRI spec, nor why it thinks it has the authority to do so. I have noticed a few other flaws in that spec, including some bizarre attempt at bnode naming using .well-known which makes no sense. My guess is that some sections of RDF 1.1 just did not get the scrutiny of review required.

> The relation of my remark to the HTTPS discussion is that I can find empirical evidence in LOD Laundromat that some people are already adding `owl:sameAs' links between what they consider to be syntactic variations of the same identifiers.
>
> You are right that HTTP/HTTPS is not a syntactic rewrite of the same identifier according to the IRI spec, but my point is that percent-encoded/unencoded is not a syntactic rewrite of the same identifier according to the RDF spec either.

I think it is.
As shown in section 5.2:

    Any kind of IRI comparison REQUIRES that all escapings or encodings in the protocol or format that carries an IRI are resolved. This is usually done when the protocol or format is parsed. Examples of such escapings or encodings are entities and numeric character references in [HTML4] and [XML1]. As an example, "http://example.org/ros&eacute;" (in HTML), "http://example.org/ros&#233;" (in HTML or XML), and "http://example.org/ros&#xE9;" (in HTML or XML) are all resolved into what is denoted in this document (see section 1.4) as "http://example.org/ros&#xE9;" (the "&#xE9;" here standing for the actual e-acute character, to compensate for the fact that this document cannot contain non-ASCII characters).

> ---
> Cheers!,
> Wouter.
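The two parse-time steps quoted from RFC 3987 in this message, resolving relative references against a base and resolving character references in the carrying format, can be sketched in Python as follows. This is only an illustration using the standard library; the base IRI is an assumption, picked so that the relative reference resolves to the POWDER wiki address mentioned earlier.

```python
from html import unescape
from urllib.parse import urljoin

# Assumed base IRI, for illustration only.
base = "https://www.w3.org/2002/"

# Resolve a relative reference to its target before any comparison,
# which removes the ".." segment.
target = urljoin(base, "../2001/sw/wiki/POWDER")

# Three ways an HTML document might carry the same IRI.
carried = [
    "http://example.org/ros&eacute;",  # named entity (HTML)
    "http://example.org/ros&#233;",    # decimal character reference
    "http://example.org/ros&#xE9;",    # hexadecimal character reference
]

# Resolve the format-level escapes, as an HTML parser would.
resolved = {unescape(s) for s in carried}

print(target)
print(resolved)
```

After both steps, what remains are plain strings, so a character-by-character comparison can then be applied to them.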
Received on Sunday, 22 May 2016 07:56:00 UTC