Re: when to de-Skolemize; was Re: Official response to RDF-ISSUE-132: JSON-LD/RDF Alignment -- Sub-issue on the relationship between JSON-LD and RDF from David Booth on 2013-06-11 (public-rdf-comments@w3.org from June 2013)

From: David Booth <david@dbooth.org>
Date: Tue, 11 Jun 2013 10:23:36 -0400
To: Sandro Hawke <sandro@w3.org>
CC: Markus Lanthaler <markus.lanthaler@gmx.net>, public-rdf-comments@w3.org
Message-ID: <51B732E8.4090207@dbooth.org>

On 06/11/2013 09:21 AM, Sandro Hawke wrote:
> On 06/11/2013 05:53 AM, Markus Lanthaler wrote:
>> On Tuesday, June 11, 2013 8:11 AM, David Booth wrote:
>>> On 06/11/2013 01:20 AM, Sandro Hawke wrote:
>>>> On 06/11/2013 12:17 AM, David Booth wrote:
>>>>> Below is a specific proposal for resolving the issue that the
>>>>> normative relationship between JSON-LD and RDF is not clear, and the
>>>>> JSON-LD model is not fully aligned with the RDF model.  It clarifies
>>>>> that JSON-LD is a concrete syntax for RDF and ensures complete
>>>>> alignment with RDF while avoiding additional early mentions of RDF in
>>>>> the document.
>>>>>
>>>>> For substantive changes:
>>>>>
>>>>> 1. In RDF conversion algorithms in JSON-LD 1.0 Processing Algorithms
>>>>> and API,
>>>>> http://json-ld.org/spec/latest/json-ld-api/#rdf-conversion-algorithms
>>>>> specify that **when JSON-LD is interpreted as RDF,** (i.e., when the
>>>>> JSON-LD model is converted to the RDF model) skolem IRIs MUST be
>>>>> generated using the well-known URI suffix "json-ld-genid" for any
>>>>> JSON-LD blank node that would otherwise be mapped to an RDF blank node
>>>>> in a position where an RDF blank node is not permitted.  Conversely,
>>>>> when RDF is serialized as JSON-LD (or when an RDF model is converted
>>>>> to a JSON-LD model), skolem IRIs having the well-known URI suffix
>>>>> "json-ld-genid" SHOULD be serialized as JSON-LD blank nodes.  Finally,
>>>>> register the well-known URI suffix "json-ld-genid", in accordance with
>>>>> RFC5785:
>>>>> http://tools.ietf.org/html/rfc5785
>>>>> BACKGROUND NOTE: The existing well-known URI suffix "genid" is for
>>>>> converting to/from RDF blank nodes (in positions where blank nodes are
>>>>> *permitted* in RDF), whereas "json-ld-genid" will be used for
>>>>> *avoiding* blank nodes (in positions where they are not allowed in
>>>>> RDF).
>>>>>
>>>> -0    This is too clever by half, I think.
>>>>
>>>> If we're talking about blank nodes for predicates, well, people will
>>>> just learn not to use them, I expect.  Or they'll start to use them in
>>>> Turtle, too.   And maybe RDFa and RDF/XML using Skolem IDs, but then
>>>> "json" will be a misnomer.     So for this hack, at least call it
>>>> "generalized-rdf-genid" or something like that.
>>>>
>>> A different name makes sense, though something shorter would be nicer,
>>> such as "rdfid".  The practical reason for not using "genid" for this
>>> is because some processors may wish to blindly transform all "genid"
>>> skolem IRIs (back) into RDF blank nodes, and that would cause problems
>>> in places where blank nodes are forbidden.
>> Could you please explain why it is important to be able to distinguish
>> them? Is it for streaming generators that need to know at the first
>> occurrence whether such an IRI should be replaced with a bnode
>> identifier or not?

Yes, to simplify processors or RDF filters that just want to convert 
genid skolem IRIs (back) to blank nodes, so they don't need to worry 
about the context in which the skolem IRI appears.

>
> David, as I think about this more, I also don't see a need to make the
> distinction.
>
> I think the basic rule is that systems SHOULD keep Skolemization a
> private matter (not visible from outside).  But there are lots of cases
> where it's appropriate to leak genids:
>     - when turning stuff into linked data
>     - in APIs where it's a better idiom
>     - when supporting diff/patch
> So it's okay to leak genids, but one should only do it if they're going
> to be useful to someone.

I'm not following why you say skolemization should be a private matter, 
since the whole point of standardizing skolemization and defining the 
genid well-known URI suffix is to allow skolem IRIs to be shared 
publicly.  Can you explain why you think they should not be public?

>
> Of course, it follows from the specs that one should only convert genids
> to blank nodes in roles the specs allow.   So if you're going from a
> generalized RDF which allows blank-node-predicates (eg JSON-LD, N3, or
> RIF) to a standard RDF, then you have to leave those genids as IRIs.

Right, but it is simpler to blindly convert all genids to blank nodes 
than it is to examine the context to determine whether a blank node 
should be generated.  I think it is also slightly misleading to a human 
reader to use genids in positions where blank nodes are not allowed.

So it certainly would not be a show stopper to use genids for this 
purpose, but it does seem cleaner to me to use a different well-known 
suffix.
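
To illustrate the difference, here is a minimal sketch, not a
proposal for any spec text. It assumes terms are plain strings,
blank nodes are written "_:label", and literals and graph names are
ignored. The "rdfid" suffix is only the hypothetical name floated
earlier in this thread; nothing is registered under it.

```python
from typing import NamedTuple

GENID = "/.well-known/genid/"  # suffix registered for RDF skolem IRIs
ALT = "/.well-known/rdfid/"    # hypothetical second suffix from this thread


class Triple(NamedTuple):
    s: str
    p: str
    o: str


def to_bnode(term: str) -> str:
    """Turn a genid skolem IRI into a blank node label; pass others through."""
    if GENID in term:
        return "_:" + term.split(GENID, 1)[1]
    return term


def blind(t: Triple) -> Triple:
    """Convert every genid IRI, without looking at its position."""
    return Triple(to_bnode(t.s), to_bnode(t.p), to_bnode(t.o))


def context_aware(t: Triple) -> Triple:
    """Convert only subject and object; the predicate must stay an IRI."""
    return Triple(to_bnode(t.s), t.p, to_bnode(t.o))


base = "http://example.org"
shared = Triple(base + GENID + "a", base + GENID + "p", base + GENID + "b")
split = Triple(base + GENID + "a", base + ALT + "p", base + GENID + "b")

print(blind(shared))          # predicate becomes a blank node: not standard RDF
print(context_aware(shared))  # safe, but the filter must know each position
print(blind(split))           # safe even though the filter is blind
```

With a single suffix, only the context-aware filter yields standard
RDF.  With a separate suffix for the forbidden positions, the blind
filter is enough.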

David

>
> It seems to me all this advice applies to all genids.
>
>       -- Sandro
>
>>
>>>>> 2. Make any other changes needed to ensure that JSON-LD is a normative
>>>>> concrete syntax for RDF.  (Are any other changes needed?)
>> You should tell us :-P
>>
>>
>>>>> For editorial changes:
>>>>>
>>>>> 3. In section 1
>>>>> https://dvcs.w3.org/hg/json-ld/raw-file/default/spec/latest/json-ld/index.html#introduction
>>>>> make the following editorial change to clarify and move the mention of
>>>>> RDF slightly later in the document.  Delete the sentence: "Developers
>>>>> that require any of the facilities listed above or need to serialize
>>>>> an RDF graph or dataset [RDF11-CONCEPTS] in a JSON-based syntax will
>>>>> find JSON-LD of interest.".  Instead, add the following bullet item to
>>>>> the existing bullet list in section 1.1:
>>>>>
>>>>>   - "Software developers who want to generate or consume Linked
>>>>>     Data, an RDF graph or an RDF Dataset in a JSON syntax."
>> I would be fine with that change but fear that this will just re-open
>> another discussion. That specific sentence was added due to a request
>> by (I think) David Wood.
>>
>>
>>>>> 4. Without adding any earlier mention of RDF than the JSON-LD spec
>>>>> already contains, make other editorial changes as needed to avoid
>>>>> implying that JSON-LD is not necessarily RDF.  (However it is fine to
>>>>> say that JSON-LD does not need to be *processed* as RDF.)  Some
>>>>> examples:
>>>>>   - Change "Converting JSON-LD to RDF" to either "Interpreting JSON-LD
>>>>> as RDF" or "Converting a JSON-LD model to an RDF model".
>>>>>
>>>>>   - Change "Convert to RDF Algorithm" to "Interpret as RDF Algorithm"
>>>>> or "Algorithm for Interpreting JSON-LD as RDF".
>>>>>
>>>>>   - Change "Convert from RDF Algorithm" to "Serialize from RDF
>>>>> Algorithm" or "Algorithm for Serializing RDF as JSON-LD".
>>>>>
>>>>>   - Change "This algorithms converts a JSON-LD document to an RDF
>>>>> dataset" to "This algorithm interprets a JSON-LD document as an RDF
>>>>> dataset".
>>>>>
>>>>>   - Change "This algorithm converts an RDF dataset" to "This algorithm
>>>>> serializes an RDF dataset".
>>>>>
>>>>>   - Change "turning a JSON-LD document" to "interpreting a JSON-LD
>>>>> document as RDF".
>>>>>
>>>>> There are many other instances in the JSON-LD document, and I would be
>>>>> happy to help find and fix them.  Most of them can be found by
>>>>> searching for the verb "convert" and changing it to "interpret" or
>>>>> "serialize". Alternatively you could say "deserialize" instead of
>>>>> "interpret".
>>>>>
>>>>> 5. At the beginning of appendix C insert: "JSON-LD is a _concrete RDF
>>>>> syntax_ as described in [RDF11_CONCEPTS].  Hence, a JSON-LD document
>>>>> is both an RDF document and a JSON document and correspondingly
>>>>> represents both an instance of the RDF data model and an instance of
>>>>> the JSON-LD data model."
>>>>>
>>>> 0 on all these.  They seem harmless but unnecessary to me.
>> I agree with Sandro. Looks like hair splitting to me but if enough people
>> think it is important it's way simpler to just do it than to continue
>> these discussions.
>>
>>
>>>>> 6. In appendix C change the following paragraph in accordance with #1
>>>>> above:
>>>>> [[
>>>>> Summarized these differences mean that JSON-LD is capable of
>>>>> serializing any RDF graph or dataset and most, but not all, JSON-LD
>>>>> documents can be directly transformed to RDF. It is possible to work
>>>>> around this restriction, when converting JSON-LD to RDF, by converting
>>>>> blank nodes used as graph names or properties to IRIs, minting new
>>>>> "Skolem IRIs" as per Replacing Blank Nodes with IRIs of
>>>>> [RDF11-CONCEPTS]. A complete description of the algorithms to convert
>>>>> from RDF to JSON-LD and from JSON-LD to RDF is included in the JSON-LD
>>>>> Processing Algorithms and API specification [JSON-LD-API].
>>>>> ]]
>>>>>
>>>>> to:
>>>>> [[
>>>>> The algorithm for interpreting JSON-LD as RDF is specified in the
>>>>> JSON-LD Processing Algorithms and API specification [JSON-LD-API],
>>>>> which is hereby normatively included by reference.
>>>>> ]]
>>>> That makes sense, but we didn't structure it that way because we feared
>>>> problems with json-ld-api would prevent json-ld from going to REC.
>>>>
>>>> -0.5
>>> But if it isn't structured that way, then I don't see how someone
>>> reading the JSON-LD spec would know that the API spec is intended to
>>> define the normative mapping from JSON-LD syntax to the RDF model.
>>> Would the following wording be better?
>>>
>>> [[
>>> The normative algorithm for interpreting JSON-LD as RDF is specified in
>>> the JSON-LD Processing Algorithms and API specification [JSON-LD-API].
>>> ]]
>> As Sandro already said, we structured it that way to avoid a normative
>> dependency between the API spec and the syntax spec. I think we should
>> keep it the way it is.
>>
>>
>>
>> --
>> Markus Lanthaler
>> @markuslanthaler
>>
>>
>>
>
>
>
>
>
Received on Tuesday, 11 June 2013 14:24:08 UTC