- From: Dave Longley <dlongley@digitalbazaar.com>
- Date: Tue, 09 Jul 2013 14:20:34 -0400
- To: David Booth <david@dbooth.org>
- CC: Markus Lanthaler <markus.lanthaler@gmx.net>, public-linked-json@w3.org
On 07/04/2013 12:28 PM, David Booth wrote:
> On 07/04/2013 04:16 AM, Markus Lanthaler wrote:
>> On Thursday, July 04, 2013 3:54 AM, David Booth wrote:
>>>>> Regarding stability, AFAICT relative IRIs would be nearly as
>>>>> stable as any versioned IRI: the IRI may change if the author
>>>>> decides to version it, but aside from that it is exactly the
>>>>> same every time the data is generated, even if other data
>>>>> elements are added, etc. That is far
>>>>
>>>> I completely disagree. While technically you are right, the whole
>>>> point of using a bnode is to convey it is in fact *not stable*
>>>> and is not intended to be.
>>>
>>> Again, you may think of blank nodes that way if you wish, but that
>>> is not why they were invented.
>>
>> Just out of curiosity, why have they then been invented if not
>> provide a way to express some facts about an "entity" that is
>> unknown?
>
> They were invented to allow an RDF author to indicate that an entity
> is known to exist, and allow facts to be expressed about it. The fact
> that bnodes lack a stable identifier was an (unfortunate) by-product
> -- not the purpose of their invention.
>
>>>> The point is that I don't want them to be stable. I explicitly
>>>> want to prevent that people start to rely on them.
>>>
>>> I suppose that would make sense if your goal is to annoy downstream
>>> consumers of your data, but that's rather anti-social. Making it
>>> hard for others to refer to resources mentioned in your data is
>>> widely viewed as a *negative* -- not a positive -- and it goes
>>> against the philosophy of the web.
>>
>> That might be true.. but exactly the same applies to bnode subjects
>> and objects. Arguably even more so to subjects. So why do you think
>> predicates are so special?
>
> Yes, it does apply to subjects and objects also. Blank node
> predicates are special because they are not a part of standard RDF.
> And they are not a part of standard RDF because enough of the working
> group thought it would not be a good idea to allow blank nodes as
> predicates, just as enough of the working group thought it would not
> be a good idea to allow literals as subjects. That could change, of
> course, but it cannot change anytime soon, because the RDF working
> group charter explicitly states that blank node predicates are out of
> scope.
>
>>>> OK, so what if we would add a "generalizedRDF" flag to the toRDF
>>>> algorithm which, when set to false would filter all quads where a
>>>> bnode is in predicate position? I would prefer the default value
>>>> to be set to true but could, if there's a good argument, also
>>>> live with a false.
>>>>
>>>> Would that address your concerns?
>>>
>>> Well, no. An option for extended RDF would be fine (defaulting to
>>> standard RDF), but discarding triples would not be fine, because it
>>> would involve unnecessary information loss. That would bring us
>>> back to figuring out how to avoid that information loss.
>>> Skolemization would be one way to do it, but the use of relative
>>> URIs seems like a better option because it is so much simpler and
>>> it gives the additional benefits (which I understand you do not see
>>> as benefits) of more stable identifiers that could eventually be
>>> made dereferenceable.
>>
>> You can't have a syntax which sometimes allows bnode predicates and
>> sometimes doesn't. The only option in that case is to raise an error
>> when converting to RDF saying that information may be lost because
>> some generated triples contain bnode predicates. That would be
>> acceptable for me but I fear it won't satisfy you either.
>
> Right. So the other option is for JSON-LD to prohibit blank nodes as
> properties. Authors could simply use relative IRIs instead.

So I don't consider this situation to be all that different from the
one where an author elects not to provide any mappings at all for
certain keys in their JSON. We currently allow this to happen -- and
it's an important use case for at least two reasons:

1. It allows authors to slowly transition over to using JSON-LD --
   mapping only those keys in their data that they are ready to, that
   they are confident will be mapped to the correct URL. Also note that
   JSON developers know nothing about owl:sameAs and we don't need to
   introduce them to another level of complexity right out of the gate.

2. It allows authors to use their APIs both as JSON and as JSON-LD.
   This covers two main uses: preventing existing consumers of JSON
   APIs from being messed with whilst allowing servers to upgrade and
   consolidate code paths, and allowing servers to include data that is
   intended to be "private" (not in a security sense) to one particular
   use of their API (e.g., for an HTML interface to their data) without
   exposing it as valid data otherwise.

The point of all this is that sometimes authors would prefer data to be
"lost" in some scenarios, and not in others. If the above option were
available, it would allow authors to continue this useful practice
whilst having the default behavior produce fully compliant RDF.

For a more concrete example:

Suppose a server has been serving this JSON for a while:

    {
      "foo": "bar",
      "about": {
        "id": "1",
        "name": "Phillip J. Fry"
      },
      "website_status": {
        "editor": {
          "id": "1",
          "changes": 4
        },
        "ad636ee3fb": true
      }
    }

Clients that are consuming this data as JSON really only look at "foo"
and maybe "about", except for the particular website client WC, which
also makes use of "website_status". The author has communicated,
out-of-band, that anything starting with "website_" is unstable data
that should be ignored by consumers of the API.

Now, the author of this data would like to make it consumable as RDF,
so a change is made to include a @context that appropriately maps
"foo", "about", "id", and "name" to URLs/aliases. Now any RDF clients
(that understand JSON-LD) can understand the meaning of those keys.
However, the author still only uses "website_status" on their local
website and doesn't want to have to deal with keeping it stable for any
clients. JSON clients are aware of this but so are RDF clients, as
"website_status" has no meaning to them; it is dropped by JSON-LD
processors. No out-of-band communication is necessary for the RDF
clients.

Now, suppose the author would like to make the "changes" data found in
"website_status" available to RDF clients without changing their
existing JSON structure. They would prefer not to leak indexed hashes of
private information (that appear as hex JSON keys above) as stable
predicates in their data. The meaning of those hash predicates or their
range may change in the future. They'd also prefer not to leak
"website_status". They may decide to update WC so it can consume RDF,
at which time perhaps they'd want access to that information, but
that's not in the plan right now. For now, they'd simply like RDF
clients to take advantage of the "changes" data. Can they do this with
minimal work on their end?

If the author could map any non-specifically-mapped predicate to a
blank node, then the author could easily achieve most of the above
goals. This would allow the deeply-embedded "changes" data to be seen
and output by a JSON-LD processor. If a JSON-LD processor, by default,
dropped blank node predicates, they could achieve even more -- as most
RDF clients would ignore the data that the author would prefer to be
ignored. But if it can't be ignored, that's not so bad because at least
it is only blank node data -- there are not mappings to URLs that the
author really doesn't want.

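To make the "drop blank node predicates by default" idea concrete, here
is a minimal Python sketch. It is not taken from any JSON-LD processor
or from the spec: the Triple type, the to_rdf function, the
generalized_rdf parameter (named after the "generalizedRDF" flag
suggested in the quoted text above), the example.org IRIs, and the
blank node labels are all assumptions made up for illustration. The
triples are roughly what the example JSON might yield if "foo",
"about", "name", and "changes" were mapped to IRIs and every other key
fell back to a blank node.

```python
from typing import List, NamedTuple


class Triple(NamedTuple):
    subject: str
    predicate: str
    object: str


def is_blank_node(term: str) -> bool:
    """Blank node identifiers are written with a "_:" prefix."""
    return term.startswith("_:")


def to_rdf(triples: List[Triple], generalized_rdf: bool = False) -> List[Triple]:
    """Hypothetical output step: keep everything when generalized_rdf is
    True, otherwise drop triples whose predicate is a blank node."""
    if generalized_rdf:
        return list(triples)
    return [t for t in triples if not is_blank_node(t.predicate)]


EX = "http://example.org/"  # assumed base IRI, for illustration only

TRIPLES = [
    Triple("_:b0", EX + "foo", '"bar"'),
    Triple("_:b0", EX + "about", EX + "1"),
    Triple(EX + "1", EX + "name", '"Phillip J. Fry"'),
    # Keys without a specific mapping become blank node predicates.
    Triple("_:b0", "_:website_status", "_:b1"),
    Triple("_:b1", "_:editor", EX + "1"),
    Triple("_:b1", "_:ad636ee3fb", '"true"'),
    # "changes" is mapped, so it stays visible even though it is nested
    # under keys that are not.
    Triple(EX + "1", EX + "changes", '"4"'),
]

standard = to_rdf(TRIPLES)
generalized = to_rdf(TRIPLES, generalized_rdf=True)
print(len(standard), len(generalized))
```

Under these assumptions the default output keeps four triples,
including the "changes" one, while the triples for "website_status",
"editor", and the hash key only appear when the caller opts in.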
If a JSON-LD processor had an option for keeping those blank nodes,
then their potential future plans (updating WC to an RDF client) could
also work out, as they'd know to set the special option to keep the
data they want -- just for their website.

If there is no way to map predicates to blank nodes, then the author
has to consider other options. If the author uses relative URLs, they'd
expose predicates that were never intended to be exposed and that have
semantics that may change. The author wants to be able to innovate and
play with that particular data before (if ever) it is linked to a
stable URL. Instead of engaging in what they would consider data
pollution, the author may instead elect to go through a costly API
upgrade path that may break existing JSON clients.

I think there are use cases where authors simply aren't "ready" to
publish *all* their data or would like to reuse the same APIs for
different purposes. By disallowing blank node predicates we make their
lives more difficult.

Perhaps some of these practices can be described as "anti-web"
(hiding/siloing information), but I think that there are practical uses
for them and that a blind opposition to "anti-web" practices is not a
good policy. This is particularly true for cases where an author is
actually trying to become less "anti-web", but they can't easily get
there because it's all or nothing.

-- 
Dave Longley
CTO
Digital Bazaar, Inc.
Received on Tuesday, 9 July 2013 18:21:00 UTC