Re: How to mitigate accidental/unwelcome IRI expansion? from Josh Tilles on 2016-01-06 (public-linked-json@w3.org from January 2016)

From: Josh Tilles <josh@signafire.com>
Date: Wed, 6 Jan 2016 15:33:34 -0500
To: Linked JSON <public-linked-json@w3.org>
Message-ID: <CAFRY1zqT-YA+DB69DeYFMgSvs8RGQ1D5wXAOCv=CYbjR6ezwMQ@mail.gmail.com>
Over the holidays I came up with a simpler workaround: just avoid using the
@id attribute! By storing the GNIP-provided URIs in a different attribute
(e.g., "gnip:id", where the context maps "gnip" to
"http://example.com/json-ld/3rd-party/gnip#"), I'm effectively opting out of
URI-expansion *without* modifying the AS2 context. The only downside that I
can think of is that uninformed downstream consumers of the data wouldn't
be able to take advantage of many of the documents’ globally-unique
identifiers. But I think even *that* could be mitigated by serving the data
with an RFC6906 profile <http://tools.ietf.org/html/rfc6906> that explains
the design decision to store identifiers in a non-standard attribute.
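
To make the idea concrete, here is a rough sketch in Python. It is not the
real JSON-LD expansion algorithm, only a simplified stand-in for the
prefix-matching step that causes the mangling. The names PREFIXES,
expand_iri, and expand_node are made up for illustration; the "tag" and
"gnip" mappings are the ones discussed in this thread.

```python
# Simplified illustration only: real JSON-LD expansion has many more rules.
# Hypothetical prefix table: "tag" mirrors the AS2 term discussed in this
# thread, and "gnip" is the example namespace proposed above.
PREFIXES = {
    "tag": "http://www.w3.org/ns/activitystreams#tag",
    "gnip": "http://example.com/json-ld/3rd-party/gnip#",
}


def expand_iri(value, prefixes):
    """Naively treat 'prefix:suffix' as a compact IRI when prefix is a known term."""
    prefix, sep, suffix = value.partition(":")
    if sep and not suffix.startswith("//") and prefix in prefixes:
        return prefixes[prefix] + suffix
    return value


def expand_node(node, prefixes):
    """Expand keys and @id values; plain string values are left untouched."""
    out = {}
    for key, val in node.items():
        if key == "@id":
            # The value of @id is an IRI, so it goes through prefix expansion.
            out[key] = expand_iri(val, prefixes)
        else:
            # The key is expanded, but its string value stays a plain literal.
            out[expand_iri(key, prefixes)] = val
    return out


gnip_uri = "tag:search.twitter.com,2005:593895901623496704"

# Using @id: the "tag" scheme collides with the "tag" term and gets mangled.
mangled = expand_node({"@id": gnip_uri}, PREFIXES)

# Using gnip:id: only the key is expanded; the GNIP URI survives as-is.
safe = expand_node({"gnip:id": gnip_uri}, PREFIXES)

print(mangled)
print(safe)
```

In this sketch the value stored under "gnip:id" comes through unchanged
because it is treated as an ordinary string rather than as an IRI.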

What do you think?

On Fri, Dec 18, 2015 at 6:56 PM, Josh Tilles <josh@signafire.com> wrote:

> I forgot to mention something in my previous message: compaction can
> create documents that are vulnerable to this IRI-mangling (the JSON-LD
> playground link
> <http://json-ld.org/playground/#startTab=tab-compacted&json-ld=%5B%7B%22%40type%22%3A%5B%22http%3A%2F%2Fwww.w3.org%2Fns%2Factivitystreams%23Create%22%5D%2C%22http%3A%2F%2Fwww.w3.org%2Fns%2Factivitystreams%23actor%22%3A%5B%7B%22%40id%22%3A%22id%3Atwitter.com%3A277184168%22%2C%22%40type%22%3A%5B%22http%3A%2F%2Fwww.w3.org%2Fns%2Factivitystreams%23Person%22%5D%2C%22http%3A%2F%2Fwww.w3.org%2Fns%2Factivitystreams%23displayName%22%3A%5B%7B%22%40value%22%3A%22Sally%22%7D%5D%7D%5D%2C%22http%3A%2F%2Fwww.w3.org%2Fns%2Factivitystreams%23object%22%3A%5B%7B%22%40type%22%3A%5B%22http%3A%2F%2Fwww.w3.org%2Fns%2Factivitystreams%23Note%22%5D%2C%22http%3A%2F%2Fwww.w3.org%2Fns%2Factivitystreams%23content%22%3A%5B%7B%22%40value%22%3A%22This%20is%20a%20simple%20note%22%7D%5D%7D%5D%2C%22http%3A%2F%2Fwww.w3.org%2Fns%2Factivitystreams%23published%22%3A%5B%7B%22%40type%22%3A%22http%3A%2F%2Fwww.w3.org%2F2001%2FXMLSchema%23dateTime%22%2C%22%40value%22%3A%222015-01-25T12%3A34%3A56Z%22%7D%5D%7D%5D&context=%7B%22%40context%22%3A%22http%3A%2F%2Fwww.w3.org%2Fns%2Factivitystreams%22%7D>).
> I thought this was notable because I *had* thought that a potential
> strategy for dealing with unwelcome IRI-expansion might be to maximize the
> amount of processing that I do using the expanded form, only compacting the
> document when it came time to convey the document elsewhere (e.g., to disk,
> or over the network to a client or database). Unfortunately, though, the
> JSON-LD playground example demonstrates that there is *not* a guarantee
> that compaction is an invertible operation.
>
> On Fri, Dec 18, 2015 at 6:33 PM, Josh Tilles <josh@signafire.com> wrote:
>
>> On Wed, Dec 16, 2015 at 1:43 PM, Josh Tilles <josh@signafire.com> wrote:
>>
>>>
>>> On Wed, Dec 16, 2015 at 1:06 PM, James M Snell <jasnell@gmail.com>
>>> wrote:
>>>
>>>> This is a fundamental design issue with JSON-LD's algorithms and CURIE
>>>> expansion.
>>>
>>> …
>>>
>>>> The end result is that URLs whose schemes overlap
>>>> with @context defined terms are generally incompatible with use of
>>>> JSON-LD.
>>>>
>>> That seems like a valuably-concise definition of the problem. I’d like
>>> to know if others here agree with your assessment.
>>>
>>
>> Assuming James’s framing of the problem is accurate: is there any
>> intention to fix this limitation of JSON-LD? I realize that what “fix”
>> means isn’t necessarily clear yet, so I’ll get the ball rolling. I recall
>> that one of the first things I reached for instinctively was some way to
>> “opt out” of expansion for specific parts of a document, to somehow mark a
>> value as “raw”. Now, I’m not confident (at all!) that that’s the best
>> approach, but I think it’s something to start with.
>>
>> Alternatively, am I incorrect in viewing this limitation as an insidious
>> yet potent flaw of JSON-LD? Like, is it actually more a superficially-ugly
>> oddity than anything genuinely destructive?
>>
>> I look forward to your comments & suggestions,
>> -Josh
>>
>>
>>>
>>>> On Wed, Dec 16, 2015 at 9:08 AM, Robert Sanderson <azaroth42@gmail.com>
>>>> wrote:
>>>> >
>>>> > Josh,
>>>> >
>>>> > I'm afraid I don't have a solution, but could you also post the question to
>>>> > the Social Web WG?
>>>> > We're currently looking to take ActivityStreams to Candidate Recommendation
>>>> > early in the new year, and if this is something that might come up during
>>>> > the request for comments phase, it would be great to discuss it early rather
>>>> > than in last call :)
>>>> >
>>>> > The Social Web list:
>>>> >     https://lists.w3.org/Archives/Public/public-socialweb/
>>>> >
>>>> > Many thanks!
>>>> >
>>>> > Rob
>>>> >
>>>> >
>>>> > On Mon, Dec 14, 2015 at 6:50 PM, Josh Tilles <josh@signafire.com> wrote:
>>>> >>
>>>> >> Dear all,
>>>> >>
>>>> >> When learning Activity Streams 2.0, I discovered that certain @ids were
>>>> >> vulnerable to being mangled during expansion. For example, the absolute IRI
>>>> >> tag:search.twitter.com,2005:593895901623496704 gets expanded to
>>>> >> http://www.w3.org/ns/activitystreams#tagsearch.twitter.com,2005:593895901623496704.
>>>> >> (JSON-LD playground link for complete example)
>>>> >>
>>>> >> Is this a problem that others have come across before? Is there any sort
>>>> >> of standard advice to work around absolute IRIs being mistakenly interpreted
>>>> >> as relative?
>>>> >>
>>>> >> An approach I came up with is to “unmap” the offending terms, like:
>>>> >>
>>>> >> {
>>>> >>   "@context": [
>>>> >>     "http://www.w3.org/ns/activitystreams",
>>>> >>     {"tag": null}
>>>> >>   ],
>>>> >>   "@id": "tag:search.twitter.com,2005:593895901623496704",
>>>> >>   "@type": "Create",
>>>> >>   "url": "http://twitter.com/KidCodo/statuses/347769243409977344",
>>>> >>   "actor": {
>>>> >>     "@context": {"id": null},
>>>> >>     "@id": "id:twitter.com:2993982541",
>>>> >>     "@type": "Person",
>>>> >>     "displayName": "Kid Codo",
>>>> >>     "url": "http://www.twitter.com/KidCodo",
>>>> >>     "image": "https://si0.twimg.com/profile_images/3664410292/1d75c213a572873bf6797c5591475da5_normal.jpeg"
>>>> >>   }
>>>> >> }
>>>> >>
>>>> >> But this seems kludgy, and I could imagine it having unintended
>>>> >> consequences if other parts of the JSON document actually used the tag
>>>> >> property and expected it to expand to
>>>> >> http://www.w3.org/ns/activitystreams#tag. An additional weakness of this
>>>> >> approach is that it relies on a human to determine which IRIs “don’t look
>>>> >> right” by examining expanded documents, and that there’s no guarantee that
>>>> >> other IRIs vulnerable to different prefix-collisions won’t slip in in the
>>>> >> future.
>>>> >>
>>>> >> Please share any comments regarding the above, or advice in general for
>>>> >> dealing with IRIs properly in JSON-LD.
>>>> >>
>>>> >> A pre-emptive & emphatic “thank you” for any guidance you can provide,
>>>> >> -Josh Tilles
>>>> >
>>>> >
>>>> >
>>>> >
>>>> > --
>>>> > Rob Sanderson
>>>> > Information Standards Advocate
>>>> > Digital Library Systems and Services
>>>> > Stanford, CA 94305
>>>>
>>>
>>>
>>
>
Received on Wednesday, 6 January 2016 20:34:04 UTC