
Does expanding a CURIE into an IRI always succeed?

From: <akrasner@riseup.net>
Date: Thu, 03 Jan 2019 02:17:48 -0800
To: www-html@w3.org
Message-ID: <f0b3e26167764e504ce2eb5ba01c166f@riseup.net>
Hello!

tl;dr: if a base URI and a CURIE suffix both have a fragment, expanding
the CURIE results in an invalid IRI. Am I correct?


I'm implementing CURIE[1] expansion, i.e. turning a CURIE into an IRI by
appending its suffix to some given base URI. The working group note
linked in [1] suggests that "In all cases a parsed CURIE will produce an
IRI". I'm not sure exactly what that means, but while implementing
expansion I started wondering: when you concatenate a base URI and a
CURIE's suffix, is the result *always* a valid IRI?

I noticed one case in which it isn't (I didn't find any others, but I
may have missed some). If your base IRI and CURIE suffix both have a
fragment, then the result is an invalid IRI, because a literal '#'
character isn't allowed anywhere in an IRI except to start its
fragment. For example:

Base IRI: https://riseup.net/some/path#

CURIE: ru:something#xyz

Result: https://riseup.net/some/path#something#xyz

The result's fragment part is "something#xyz", which contains a '#',
and that makes it an invalid IRI.
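
To make the example above concrete, here is a minimal Python sketch of
naive expansion by concatenation. The names `expand_curie`,
`has_at_most_one_hash` and the `prefixes` mapping are hypothetical, not
taken from the CURIE note, and the check covers only the single rule
discussed here (a literal '#' may appear only to start the fragment). It
is not a full IRI validator.

```python
def expand_curie(curie, prefixes):
    """Naively expand a CURIE: look up the prefix and append the suffix."""
    prefix, sep, suffix = curie.partition(":")
    if not sep or prefix not in prefixes:
        raise ValueError(f"missing or unknown prefix in {curie!r}")
    # Plain string concatenation, no escaping or validation.
    return prefixes[prefix] + suffix


def has_at_most_one_hash(iri):
    """Check only the '#' rule: at most one literal '#' in the whole IRI."""
    return iri.count("#") <= 1


# Hypothetical prefix mapping matching the example above.
prefixes = {"ru": "https://riseup.net/some/path#"}

result = expand_curie("ru:something#xyz", prefixes)
print(result)                        # https://riseup.net/some/path#something#xyz
print(has_at_most_one_hash(result))  # False
```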

What I'm wondering is:

(1) Am I observing correctly that CURIE expansion can indeed produce
invalid IRIs, so I should be prepared to return an error (or an invalid
IRI) when implementing this expansion?

(2) When expanding, should I / may I percent-encode that '#' character
so that I always get a valid IRI? My use case is JSON-LD and RDF, in
which the exact IRI string is the ID of something, so it has to be
precise.
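
As an illustration of what option (2) could look like, here is a minimal
Python sketch that percent-encodes every '#' after the first one as
"%23". The function name `escape_extra_hashes` is hypothetical, and
whether such escaping is permitted or advisable is exactly the open
question; the sketch only shows the mechanics.

```python
def escape_extra_hashes(iri):
    """Keep the first '#' as the fragment delimiter, encode the rest as %23."""
    head, sep, fragment = iri.partition("#")
    # If there is no '#', sep and fragment are empty and iri is unchanged.
    return head + sep + fragment.replace("#", "%23")


raw = "https://riseup.net/some/path#something#xyz"
escaped = escape_extra_hashes(raw)
print(escaped)  # https://riseup.net/some/path#something%23xyz
```

Note that the escaped string differs from the raw concatenation, and RDF
compares IRIs as plain strings, so every producer and consumer of the
data would need to apply the same escaping to arrive at the same ID.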

(3) I was thinking, just sharing the thought, to have 3 variants of
expansion:

    (a) Regular CURIE, expansion may produce invalid IRI
    (b) CURIE can't have a fragment, always produces valid IRI
    (c) Base IRI can't have a fragment, always produces valid IRI

For a given base IRI, it seems to me that in practice, 99% of the time
(if not 100%) it's one of the latter two cases, i.e. either your
XML/RDF/whatever base IRI contains a fragment and your CURIEs get
appended to it (so they probably don't contain a fragment), or your base
IRI has no fragment, and possibly your CURIEs have fragments.

Example for (b):

Base IRI: https://riseup.net/some/path#
CURIE: ru:abc

Example for (c):

Base IRI: https://riseup.net/some/path
CURIE: ru:#abc
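
The three variants in (3) could be sketched in Python roughly as
follows. The `Mode` enum and the `expand` function are hypothetical
names, and the function takes the base IRI and the CURIE suffix directly
(prefix lookup is left out). One assumption goes beyond the list above:
under (c) a suffix containing more than one '#' would still yield an
invalid IRI, so the sketch rejects that too.

```python
from enum import Enum


class Mode(Enum):
    REGULAR = "a"            # (a) no checks, result may be an invalid IRI
    NO_CURIE_FRAGMENT = "b"  # (b) the CURIE suffix must not contain '#'
    NO_BASE_FRAGMENT = "c"   # (c) the base IRI must not contain '#'


def expand(base, suffix, mode):
    """Concatenate base and suffix, enforcing the '#' rule for the given mode."""
    if mode is Mode.NO_CURIE_FRAGMENT and "#" in suffix:
        raise ValueError("variant (b): CURIE suffix must not contain '#'")
    if mode is Mode.NO_BASE_FRAGMENT:
        if "#" in base:
            raise ValueError("variant (c): base IRI must not contain '#'")
        # Assumption not in the original list: one '#' at most in the suffix.
        if suffix.count("#") > 1:
            raise ValueError("variant (c): suffix has more than one '#'")
    return base + suffix


# Example for (b)
print(expand("https://riseup.net/some/path#", "abc", Mode.NO_CURIE_FRAGMENT))
# https://riseup.net/some/path#abc

# Example for (c)
print(expand("https://riseup.net/some/path", "#abc", Mode.NO_BASE_FRAGMENT))
# https://riseup.net/some/path#abc
```

Both examples expand to the same string, which matches the observation
that a given base IRI is normally used with only one of the two styles.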

I'd love to hear thoughts, especially about whether I'm observing
correctly that CURIEs may produce invalid IRIs if both CURIE and base
have fragments :)

-- a.k.



[1]: https://www.w3.org/TR/curie/
Received on Thursday, 3 January 2019 10:49:20 UTC
