Re: How to mitigate accidental/unwelcome IRI expansion? from Robert Sanderson on 2015-12-19 (public-linked-json@w3.org from December 2015)

From: Robert Sanderson <azaroth42@gmail.com>
Date: Fri, 18 Dec 2015 16:30:51 -0800
To: James M Snell <jasnell@gmail.com>
Cc: Josh Tilles <josh@signafire.com>, Linked JSON <public-linked-json@w3.org>
Message-ID: <CABevsUF0NN9Q-5TODRa7f4RoRpUWnRaoJbjECT4g9TX+2f6Bfw@mail.gmail.com>
How about this scenario:

In an extension to a well used vocabulary (such as AS, but substitute for
anything you like), an attacker puts in a context of:

    {"http": "http://track.me/tracker/", "https": "https://track.me/tracker/"}

The tracker URLs then simply issue a 3XX redirect back to the original URL.

Systems requesting URLs end up in the right place, but quietly go through
the tracking site.

Yes, you would need to expand the URLs via the context before resolving
them, but that doesn't seem unlikely.  You would also need to trust the
provider of the context to some degree, but unless you were aware of this
hack, it might not be apparent what tricks can be played.
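To make the mechanism concrete, here's a minimal sketch of the compact-IRI branch of JSON-LD 1.0 IRI expansion. It is simplified to plain string term definitions (no @vocab, @base, or expanded term definitions), and the function name expand_iri and the example terms are hypothetical. It shows why a term named after a scheme captures values written in that scheme:

```python
# Assumed expansion of the AS "tag" term (simplified from the AS 2.0 context).
AS = "http://www.w3.org/ns/activitystreams#"


def expand_iri(value, terms):
    """Hypothetical, simplified compact-IRI expansion (JSON-LD 1.0 style).

    `terms` maps term names to absolute IRIs given as plain strings;
    @vocab, @base and expanded term definitions are not handled.
    """
    if ":" not in value:
        # A bare term expands to its mapping; anything else is left alone here.
        return terms.get(value, value)
    prefix, suffix = value.split(":", 1)
    if prefix == "_" or suffix.startswith("//"):
        # Blank node identifier, or treated as an already-absolute IRI.
        return value
    if prefix in terms:
        # The text before the first colon names a term, so the value is
        # treated as a compact IRI: term IRI + everything after the colon.
        return terms[prefix] + suffix
    return value


terms = {"tag": AS + "tag", "http": "http://track.me/tracker/"}
print(expand_iri("tag:search.twitter.com,2005:593895901623496704", terms))
# http://www.w3.org/ns/activitystreams#tagsearch.twitter.com,2005:593895901623496704
print(expand_iri("http://twitter.com/KidCodo", terms))
# http://twitter.com/KidCodo
```

One detail worth checking against the spec: in this step a suffix beginning with "//" is treated as already absolute, so hierarchical URLs like http://twitter.com/KidCodo pass through unchanged, while values without "//" (tag:, urn:, mailto:, or a slash-less http:example.com) get rewritten.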

Seems like a note that "JSON-LD Processors SHOULD NOT apply contexts that
redefine common URI schemes" is in order?
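A processor-side version of that rule could look roughly like the sketch below. The function name check_context and the hard-coded scheme subset are hypothetical; a real processor would load the full IANA registry, and remote contexts are not fetched here, so only inline objects are checked:

```python
# Hypothetical, hard-coded subset of IANA-registered scheme names; a real
# processor would load the full registry instead.
KNOWN_SCHEMES = frozenset({"http", "https", "tag", "urn", "mailto"})


def check_context(context, schemes=KNOWN_SCHEMES):
    """Raise ValueError if a JSON-LD @context value defines a term whose
    name is a URI scheme.

    `context` may be an inline object, a remote context URL, or a list of
    both; remote contexts are not fetched, so only inline objects are checked.
    """
    entries = context if isinstance(context, list) else [context]
    clashes = sorted(
        term
        for entry in entries
        if isinstance(entry, dict)
        for term in entry
        if term.lower() in schemes  # scheme names are case-insensitive
    )
    if clashes:
        raise ValueError("context redefines URI scheme(s): " + ", ".join(clashes))


# Passes: the remote AS context is not inspected and "ex" is not a scheme.
check_context(["http://www.w3.org/ns/activitystreams", {"ex": "http://example.org/"}])

try:
    check_context({"http": "http://track.me/tracker/"})
except ValueError as err:
    print(err)  # context redefines URI scheme(s): http
```

Applied to the dereferenced Activity Streams 2.0 context itself, a check like this would flag its own "tag" term, which is the same collision behind the tag: IRI mangling reported further down the thread, so a blanket rule has to be weighed against existing vocabularies.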

And for anyone wanting to test their context, here's some quick python to
do it:

----
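#!/usr/bin/env python3
# Run with one argument: a context URL or a local JSON file.
# Prints every @context term that collides with a registered URI scheme.
import csv
import io
import json
import sys
import urllib.request

schemes = "http://www.iana.org/assignments/uri-schemes/uri-schemes-1.csv"


def open_text(location):
    """Open a URL or a local file path for reading as UTF-8 text."""
    if location.startswith('http'):
        return io.TextIOWrapper(urllib.request.urlopen(location),
                                encoding='utf-8', newline='')
    return open(location, encoding='utf-8')


with open_text(sys.argv[1]) as cfh:
    ctx = json.load(cfh)['@context']

# @context may be a single object or a list mixing remote URLs and
# objects; merge every inline object rather than assuming the first
# list entry is one.
keys = {}
for entry in (ctx if isinstance(ctx, list) else [ctx]):
    if isinstance(entry, dict):
        keys.update(entry)

with open_text(schemes) as fh:
    for row in csv.reader(fh):
        if row and row[0] in keys:
            print("Collision: %s (expands to %s)" % (row[0], keys[row[0]]))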
----

I hit "resource" and "service" as keys in one of mine :(

Rob


On Fri, Dec 18, 2015 at 4:18 PM, James M Snell <jasnell@gmail.com> wrote:

> On Fri, Dec 18, 2015 at 3:33 PM, Josh Tilles <josh@signafire.com> wrote:
> [snip]
> >
> > Assuming James’s framing of the problem is accurate: is there any intention
> > to fix this limitation of JSON-LD? I realize that what “fix” means isn’t
> > necessarily clear yet, so I’ll get the ball rolling. I recall that one of
> > the first things I reached for instinctively was some way to “opt out” of
> > expansion for specific parts of a document, to somehow mark a value as
> > “raw”. Now, I’m not confident (at all!) that that’s the best approach, but I
> > think it’s something to start with.
> >
>
> Keep in mind that the expansion is only applied when a term's @type is
> defined as @id or @vocab. You can opt out on a specific term by simply
> not defining it as @type=@id in the context. Within the AS 2.0
> normative context, however, many of the terms are defined as @type=@id,
> including every field where you'd want to use the tag URI scheme. This
> is done because it enables those values to be represented either as
> URI strings or as objects with @ids of their own. It's an important
> piece of functionality that we don't want to lose. So, unfortunately,
> you cannot opt out for the AS 2.0 terms, but you can opt out in your
> own extension properties. In JSON-LD in general, opting out is easy...
> but there is a tradeoff in functionality.
>
> > Alternatively, am I incorrect in viewing this limitation as an insidious yet
> > potent flaw of JSON-LD? Like, is it actually more a superficially-ugly
> > oddity than anything genuinely destructive?
> >
>
> Little of both, I'm afraid.
>
> > I look forward to your comments & suggestions,
> > -Josh
> >
> >>>
> >>>
> >>> On Wed, Dec 16, 2015 at 9:08 AM, Robert Sanderson <azaroth42@gmail.com>
> >>> wrote:
> >>> >
> >>> > Josh,
> >>> >
> >>> > I'm afraid I don't have a solution, but could you also post the
> >>> > question to the Social Web WG?
> >>> > We're currently looking to take ActivityStreams to Candidate
> >>> > Recommendation early in the new year, and if this is something that
> >>> > might come up during the request for comments phase, it would be great
> >>> > to discuss it early rather than in last call :)
> >>> >
> >>> > The Social Web list:
> >>> >     https://lists.w3.org/Archives/Public/public-socialweb/
> >>> >
> >>> > Many thanks!
> >>> >
> >>> > Rob
> >>> >
> >>> >
> >>> > On Mon, Dec 14, 2015 at 6:50 PM, Josh Tilles <josh@signafire.com> wrote:
> >>> >>
> >>> >> Dear all,
> >>> >>
> >>> >> When learning Activity Streams 2.0, I discovered that certain @ids were
> >>> >> vulnerable to being mangled during expansion. For example, the absolute
> >>> >> IRI tag:search.twitter.com,2005:593895901623496704 gets expanded to
> >>> >>
> >>> >> http://www.w3.org/ns/activitystreams#tagsearch.twitter.com,2005:593895901623496704
> >>> >> (JSON-LD playground link for complete example)
> >>> >>
> >>> >> Is this a problem that others have come across before? Is there any sort
> >>> >> of standard advice to work around absolute IRIs being mistakenly
> >>> >> interpreted as relative?
> >>> >>
> >>> >> An approach I came up with is to “unmap” the offending terms, like:
> >>> >>
> >>> >> {
> >>> >>   "@context": [
> >>> >>     "http://www.w3.org/ns/activitystreams",
> >>> >>     {"tag": null}
> >>> >>   ],
> >>> >>   "@id": "tag:search.twitter.com,2005:593895901623496704",
> >>> >>   "@type": "Create",
> >>> >>   "url": "http://twitter.com/KidCodo/statuses/347769243409977344",
> >>> >>   "actor": {
> >>> >>     "@context": {"id": null},
> >>> >>     "@id": "id:twitter.com:2993982541",
> >>> >>     "@type": "Person",
> >>> >>     "displayName": "Kid Codo",
> >>> >>     "url": "http://www.twitter.com/KidCodo",
> >>> >>     "image": "https://si0.twimg.com/profile_images/3664410292/1d75c213a572873bf6797c5591475da5_normal.jpeg"
> >>> >>   }
> >>> >> }
> >>> >>
> >>> >> But this seems kludgy, and I could imagine it having unintended
> >>> >> consequences if other parts of the JSON document actually used the tag
> >>> >> property and expected it to expand to
> >>> >> http://www.w3.org/ns/activitystreams#tag. An additional weakness of
> >>> >> this approach is that it relies on a human to determine which IRIs
> >>> >> “don’t look right” by examining expanded documents, and that there’s no
> >>> >> guarantee that other IRIs vulnerable to different prefix-collisions
> >>> >> won’t slip in in the future.
> >>> >>
> >>> >> Please share any comments regarding the above, or advice in general for
> >>> >> dealing with IRIs properly in JSON-LD.
> >>> >>
> >>> >> A pre-emptive & emphatic “thank you” for any guidance you can provide,
> >>> >> -Josh Tilles
> >>> >
> >>> >
> >>> >
> >>> >
> >>> > --
> >>> > Rob Sanderson
> >>> > Information Standards Advocate
> >>> > Digital Library Systems and Services
> >>> > Stanford, CA 94305
> >>
> >>
> >
>
>


-- 
Rob Sanderson
Information Standards Advocate
Digital Library Systems and Services
Stanford, CA 94305
Received on Saturday, 19 December 2015 00:31:21 UTC