- From: Manu Sporny <msporny@digitalbazaar.com>
- Date: Tue, 18 Jun 2013 13:47:32 -0400
- To: Linked JSON <public-linked-json@w3.org>
- CC: RDF WG <public-rdf-wg@w3.org>
Thanks to Dave Longley for scribing! The minutes from this week's telecon are now available.

http://json-ld.org/minutes/2013-06-18/

Full text of the discussion follows including a link to the audio transcript:

-------------------------------------------------------------------
JSON-LD Community Group Telecon Minutes for 2013-06-18

Agenda:
   http://lists.w3.org/Archives/Public/public-linked-json/2013Jun/0034.html
Topics:
   1. Linked Data introductory text
   2. Skolemization in toRDF() algorithm
   3. ISSUE-265: Media Type Registration
   4. Support for xsd:short and other integer types.
   5. fromRDF() creates nodes for things like rdf:type.
Resolutions:
   1. Adopt proposal 2g and replace the first paragraph in the JSON-LD Syntax Introduction with: Linked Data[LINKED_DATA] is a way to create a network of standards-based machine interpretable data across different documents and Web sites. It allows an application to start at one piece of Linked Data, and follow embedded links to other pieces of Linked Data that are hosted on different sites across the Web.
   2. In the security considerations section, reference RFC4627 and add text explaining that evaluating the data as code can lead to unexpected side effects compromising the security of a system.
   3. Add the following text to the Security Considerations section: When processing JSON-LD documents, links to remote contexts are typically followed automatically, resulting in the transfer of files without the explicit request of the user for each one. If remote contexts are served by third parties, it may allow them to gather usage patterns or similar information leading to privacy concerns. Explain that this can be controlled through effective use of the API.
Chair:
   Manu Sporny
Scribe:
   Dave Longley
Present:
   Dave Longley, Markus Lanthaler, Manu Sporny, David Booth, Gregg Kellogg, Niklas Lindström, Clay Wells
Audio:
   http://json-ld.org/minutes/2013-06-18/audio.ogg

Dave Longley is scribing.
Markus Lanthaler: if we have time, i'd like to discuss the blank nodes as datatypes issue
Markus Lanthaler: https://github.com/json-ld/json-ld.org/issues/257

Topic: Linked Data introductory text

Manu Sporny: http://lists.w3.org/Archives/Public/public-rdf-comments/2013Jun/0054.html
Manu Sporny: david booth just sent out a proposal to the mailing list that i thought was pretty good
Manu Sporny: http://lists.w3.org/Archives/Public/public-rdf-comments/2013Jun/0219.html
David Booth: http://lists.w3.org/Archives/Public/public-rdf-comments/2013Jun/0219.html
Manu Sporny: would you mind going over the proposals, david?
David Booth: proposal 1 is separate that i think we already agreed to ... to include TimBL doc in the references
Manu Sporny: i'm not sure we agreed
David Booth: we can skip over that for the moment
David Booth: i tried to give a range of possibilities, 2a would quote TimBL from his doc w/typos, 2b would do the same but clean up typos, 2c would make that text not be a definition of Linked Data
David Booth: [[
David Booth: Linked Data[LINKED_DATA] is a technique for creating a network of
David Booth: inter-connected machine interpretable data across different documents
David Booth: and Web sites. It allows an application to start at one piece of Linked
David Booth: Data, and follow embedded links to other pieces of Linked Data that are
David Booth: hosted on different sites across the Web.
David Booth: ]]
Gregg Kellogg: http://lists.w3.org/Archives/Public/public-rdf-comments/2013Jun/0215.html
Gregg Kellogg: i thought it was interesting, kingsley chimed in comments yesterday and actually supported the introduction changing to TimBL's version as being consistent with something the [w3c] group might want to do
Gregg Kellogg: we actually started off deviating from that because of Kingsley's vehement objections, now that he seems like he has changed his position that's interesting
Manu Sporny: i think that's a misread, his position is nuanced, i think if the group wants to make a formal declaration of what Linked Data is but he thinks things are further conflated, it would just make the RDF WG make it clear what their position is
Gregg Kellogg: i think this issue about "what is Linked Data" is larger than JSON-LD and we have another linked data spec for the LDP with a different definition is a problem
Gregg Kellogg: in my mind, i would go with 2b or 2c from david
Manu Sporny: i really like proposal 2c
Manu Sporny: i think it accomplishes what david booth wants to see and it doesn't overly complicate the intro
Dave Longley: I like proposal 2c. In order to avoid statements of being unfair, we should link to the Linked Data "definition". [scribe assist by Manu Sporny]
Dave Longley: I'd like to avoid the problem we had before - people accusing us of not being straightforward. [scribe assist by Manu Sporny]
David Booth: do you think there would be big objections to 2b
Manu Sporny: i would object to it
Manu Sporny: The problem I have is with the last part of this statement - "When someone looks up a URI, provide useful information, using the standards (RDF*, SPARQL)" [scribe assist by Manu Sporny]
Manu Sporny: i'll just quickly outline the objections: is the line with RDF, SPARQL, if it just said "using standards"
David Booth: ok, let's not get into that then
Manu Sporny: are there any objections to proposal 2c
David Booth: to be clearer it should say something about standards
Manu Sporny: as long as we're vague about which standards, we can do that
Niklas Lindström: specifically, open standards? [scribe assist by Niklas Lindström]
David Booth: proposal 2d
David Booth: [[
David Booth: Linked Data[LINKED_DATA] is a technique for creating a network of
David Booth: inter-connected, standards-based machine interpretable data across different documents
David Booth: and Web sites. It allows an application to start at one piece of Linked
David Booth: Data, and follow embedded links to other pieces of Linked Data that are
David Booth: hosted on different sites across the Web.
David Booth: ]]
Gregg Kellogg: when we talk about embedding links there are mime types involved
Clay Wells: +1
Manu Sporny: that looks good to me
Manu Sporny: but it would be good to make it less of a mouthful
Gregg Kellogg: maybe move standards-based later on
Manu Sporny: let's wordsmith later
David Booth: proposal 2e
David Booth: [[
David Booth: Linked Data[LINKED_DATA] is a technique for creating a network of
David Booth: machine interpretable data across different documents
David Booth: and Web sites. It allows an application to start at one piece of Linked
David Booth: Data, and follow embedded links to other pieces of Linked Data that are
David Booth: hosted on different sites across the Web.
David Booth: ]]
David Booth: proposal 2f
David Booth: [[
David Booth: Linked Data[LINKED_DATA] is a technique for creating a network of
David Booth: standards-based machine interpretable data across different documents
David Booth: and Web sites. It allows an application to start at one piece of Linked
David Booth: Data, and follow embedded links to other pieces of Linked Data that are
David Booth: hosted on different sites across the Web.
David Booth: ]]
Manu Sporny: any problems with this other than switching out "technique" with "practice" ?
Markus Lanthaler: What we have in the spec today: [scribe assist by Markus Lanthaler]
Markus Lanthaler: These properties allow data published on the Web to work much like Web pages do today. One can start at one piece of Linked Data, and follow the links to other pieces of data that are hosted on different sites across the Web.
Niklas Lindström: .. I'd like s/technique/practise/ (and perhaps "open" or "royalty-free" somewhere; unless that's implied by this spec being from W3C?)
Markus Lanthaler: "These properties allow data published on the Web to work much like Web pages do today."
Clay Wells: method
David Booth: how about s/technique for creating/way to/ ?
David Booth: proposal 2g
David Booth: [[
David Booth: Linked Data[LINKED_DATA] is a way to create a network of
David Booth: standards-based machine interpretable data across different documents
David Booth: and Web sites. It allows an application to start at one piece of Linked
David Booth: Data, and follow embedded links to other pieces of Linked Data that are
David Booth: hosted on different sites across the Web.
David Booth: ]]

PROPOSAL: Adopt proposal 2g and replace the first paragraph in the JSON-LD Syntax Introduction with: Linked Data[LINKED_DATA] is a way to create a network of standards-based machine interpretable data across different documents and Web sites. It allows an application to start at one piece of Linked Data, and follow embedded links to other pieces of Linked Data that are hosted on different sites across the Web.

Gregg Kellogg: +1
Dave Longley: +1
Manu Sporny: +1
Niklas Lindström: +1
Clay Wells: +1
David Booth: +1
Markus Lanthaler: +1 presumed the rest of the RDF WG agrees

RESOLUTION: Adopt proposal 2g and replace the first paragraph in the JSON-LD Syntax Introduction with: Linked Data[LINKED_DATA] is a way to create a network of standards-based machine interpretable data across different documents and Web sites. It allows an application to start at one piece of Linked Data, and follow embedded links to other pieces of Linked Data that are hosted on different sites across the Web.

Topic: Skolemization in toRDF() algorithm

Manu Sporny: http://lists.w3.org/Archives/Public/public-rdf-comments/2013Jun/0159.html
David Booth: so at present, there is some misalignment between the JSON-LD data model and the RDF data model, and that means that when JSON-LD is interpreted as RDF, the results of that interpretation are unpredictable, some implementations may throw away triples that contain blank nodes in "illegal" positions (illegal for the RDF data model)
David Booth: the proposal is to avoid that mismatch when interpreting JSON-LD as RDF, blank nodes that would be illegal MUST be skolemized according to RDF skolemization
David Booth: so the question is about requiring skolemization when interpreting JSON-LD as RDF, the only push back i've seen is requiring clients to do skolemization when the client has no way of minting globally unique IRIs
Markus Lanthaler: http://lists.w3.org/Archives/Public/public-rdf-wg/2013Jun/0112.html
Markus Lanthaler: yes, i brought that up and there were also concerns from Andy
Markus Lanthaler: and he said that consumers should really figure out how to handle that use case
David Booth: i think there was some confusion on the list about which skolemization topic we were talking about
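
For readers unfamiliar with the proposal, the following minimal Python sketch illustrates what skolemizing blank nodes in "illegal" positions could look like. It is an illustration only, not the toRDF() algorithm. The tuple representation of quads, the `skolemize_illegal_bnodes` helper and the `https://example.org` authority are assumptions made for this example; the `/.well-known/genid/` path follows the convention that RDF 1.1 Concepts suggests for Skolem IRIs.

```python
import uuid

# Path convention suggested for Skolem IRIs in RDF 1.1 Concepts.
GENID_PATH = "/.well-known/genid/"


def is_bnode(term):
    """Blank nodes are assumed to be strings of the form '_:label'."""
    return isinstance(term, str) and term.startswith("_:")


def skolemize_illegal_bnodes(quads, authority="https://example.org"):
    """Replace blank nodes used as predicate or graph name with Skolem IRIs.

    quads: iterable of (subject, predicate, object, graph) string tuples,
    where graph is None for the default graph and literals are assumed to
    be written in quoted N-Triples style (so they never start with '_:').
    Returns the rewritten quads and the blank-node-to-IRI mapping.
    """
    quads = list(quads)

    # Collect blank nodes that occur in predicate or graph-name position.
    illegal = set()
    for _s, p, _o, g in quads:
        if is_bnode(p):
            illegal.add(p)
        if is_bnode(g):
            illegal.add(g)

    # Mint one fresh IRI per affected blank node (hypothetical authority).
    mapping = {b: authority + GENID_PATH + uuid.uuid4().hex for b in sorted(illegal)}

    # Replace the affected blank nodes consistently in every position.
    skolemized = [tuple(mapping.get(t, t) for t in quad) for quad in quads]
    return skolemized, mapping
```

Two independent runs of this sketch produce the same quads except for the minted IRIs, and nothing in it can guarantee that a client without its own authority mints globally unique ones, which is the concern raised on the list.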
Markus Lanthaler: http://lists.w3.org/Archives/Public/public-rdf-wg/2013Jun/0112.html
David Booth: one topic was about skolemization within the RDF spec and the other was about JSON-LD requiring it
Markus Lanthaler: andy replied to the minutes where we were discussing that
Manu Sporny: if your graph store actually supports graph labels (bnodes as graph names or properties) that there could be an issue by forcing a processor to go through some hoops that it would not normally need to do
Manu Sporny: in other words, there's no need to skolemize
Manu Sporny: i think we should use a SHOULD, not a MUST, because that gives us interop but still allows application developers to make the decision for what is appropriate for their stack, so the decision can be made closer to the people that it affects
David Booth: if it's just an internal decision that an application is making, then i see that as irrelevant, any application can do whatever it wants
David Booth: the spec is about what JSON-LD means, the point is, if two independent parties process the same document, they should come up with the same set of triples
David Booth: except for the naming of the skolemized IRIs
Manu Sporny: how does that address markus' point that you could generate data that's wrong, that there's a clash
David Booth: if you have a clash then you haven't done proper skolemization
Manu Sporny: the point is that it's impossible to know if you've done proper skolemization as a client
David Booth: i don't know if that's true, there are lots of ways to ensure uniqueness
Markus Lanthaler: they are very simple if you don't have a distributed system
Markus Lanthaler: if we take Dave Longley's JavaScript implementation, which email address should he use? his own?
David Booth: is the data going to be republished?
Markus Lanthaler: the point is that you don't know what the user is going to do with the data, the user knows that, and they can make the best decision
Gregg Kellogg: if i sent JSON-LD to a list in an email, then it's effectively republished
Gregg Kellogg: although, at a point where it's not reasonable to do such a skolemization
Gregg Kellogg: one of the issues i have with forcing skolemization is that it moves a closed world to an open world
David Booth: notes that Gregg is using the term "closed world" in a non-standard way
Gregg Kellogg: once you've skolemized a bnode you've gone from a closed model of data with some security benefits, to an open world and you can amend facts that were intended in a closed fashion
Gregg Kellogg: intended to be stated in a closed fashion
Gregg Kellogg: that's what the payswarm work is about so you can't state things about the graphs later on
Manu Sporny: if you're dealing with a financial system and people can later make statements about your data
Manu Sporny: that they shouldn't be able to do, and if they figured out your skolemization algorithm they could inject data into your graph
Gregg Kellogg: they don't even have to do that, they just have to take a skolemized identifier and make statements about it later
Manu Sporny: i think it would complicate implementations greatly
Manu Sporny: since we say you have to support bnodes in graph and property positions you don't have to deal with skolemization
David Booth: i would categorically object to encouraging features that make it impossible to make statements about things
Manu Sporny: no one is proposing that, i don't think that's on the table
David Booth: i'm talking about making it hard for someone to make statements about a graph
Manu Sporny: i don't follow, how are we going to do that?
David Booth: the comment was that if something is published with a bnode that you can't make statements about it elsewhere
Gregg Kellogg: that's true today about RDF
David Booth: i think that's very anti-web and a negative property
David Booth: there is a whole discussion about draconian gov'ts linking to or making reference to certain statements
Manu Sporny: i think we're having a philosophical discussion at this point, we should focus on a solid proposal
Manu Sporny: there are a number of people that agree with you but let's stick to solid proposals
David Booth: i don't think we have enough convergence yet, i wasn't aware of andy pushing back here, i need to analyze what he means, i need to better understand the use case that would be injured by requiring skolemization
David Booth: there is another solution entirely, which would be for the RDF WG to allow bnodes in those positions
Manu Sporny: peter raised just that this morning
David Booth: my overall objective here is that i think it's critical if two parties deserialize RDF they get the same result, or as close as possible
David Booth: with a minimum loss of information
Markus Lanthaler: but they [aren't] the same result, [are] they?
David Booth: the problem is two different clients taking the same JSON-LD document would produce data that is interpreted in different ways
Manu Sporny: i think that's where we disagree, i think they are entirely different documents, one uses bnodes the other skolemization
David Booth: i'm saying that when a JSON-LD document is interpreted as RDF, skolemized IDs should be generated
Manu Sporny: that would mean that your graph store, if it supports bnodes in these positions, is broken
Manu Sporny: your graph store could interpret things in the correct way, but it will be forced to interpret in a different way
Markus Lanthaler: you are always getting different data since every skolemization will produce different IDs
David Booth: the data is only different in non-important ways
Gregg Kellogg: RDF Datasets allow bnode graph names: https://www.w3.org/2013/meeting/rdf-wg/2013-06-12#resolution_1
David Booth: by unimportant i mean that the data is the same graph and they use all the same IRIs with the exception of the skolemized IRIs
Gregg Kellogg: i think this issue with regards to bnode graph names is moot since RDF datasets can have graph labels that are bnodes
Manu Sporny: We did have a use case - JSON
Gregg Kellogg: i cannot recall if we had a driving use case for allowing bnodes as predicates
Gregg Kellogg: can someone remind me
Dave Longley: RDF Concepts doesn't allow blank nodes in predicate position - perhaps we should say that. [scribe assist by Manu Sporny]
Manu Sporny: I think RDF Concepts basically does that. [scribe assist by Manu Sporny]

Topic: ISSUE-265: Media Type Registration

Manu Sporny: https://github.com/json-ld/json-ld.org/issues/265
Dave Longley: I'm with you on this issue [scribe assist by Clay Wells]
David Booth: i think there's still some technical misunderstanding of skolemization which i will try to better explain on the mailing list
Markus Lanthaler: so the request is for us to be more explicit
Markus Lanthaler: the feedback is that we should be more explicit about why using JavaScript's eval() method is a bad idea, etc.
Manu Sporny: isn't there some security consideration we could link to
Clay Wells: sorry, dbooth, I'm with you on that topic. Thanks!
Markus Lanthaler: we need to reference the JSON security section
Markus Lanthaler: we just need to add a sentence, i think, explaining why eval() is a problem
Markus Lanthaler: we don't have to change much but it would address their concerns

PROPOSAL: In the security considerations section, reference RFC4627 and add text explaining that evaluating the data as code can lead to unexpected side effects compromising the security of a system.

Markus Lanthaler: we also need to address fetching remote contexts automatically that might leak user information, and we should say that processing a JSON-LD document might result in HTTP requests without explicit request by the user, etc.
Niklas Lindström: +1
Gregg Kellogg: +1
Dave Longley: +1
Markus Lanthaler: +1
Clay Wells: +1

RESOLUTION: In the security considerations section, reference RFC4627 and add text explaining that evaluating the data as code can lead to unexpected side effects compromising the security of a system.

David Booth: needs to drop off for another call. Thanks all!

PROPOSAL: In the security considerations section, warn that remote contexts are dereferenced automatically and that usage patterns could be tracked based on the requests.

Niklas Lindström: do we need more specific text, maybe there is something akin to this... somewhere in the depths in the XML specs where you can include external entities, it's much more like CSS, it's a good analogy i think

PROPOSAL: In the security considerations section, warn that remote contexts are dereferenced automatically and that usage patterns could be tracked based on the requests leading to privacy concerns.

Markus Lanthaler: When processing JSON-LD documents, links to remote contexts are typically followed automatically, resulting in the transfer of files without the explicit request of the user for each one. If remote contexts are served by third parties, it may allow them to gather usage patterns or similar information
Gregg Kellogg: caching can help mitigate this problem
Manu Sporny: the problem is that the third party is tracking you, and if they wanted to, they could say "don't cache this"
Gregg Kellogg: the publisher would be the one controlling the cache-control headers
Gregg Kellogg: if they made stable contexts that could be cached for long periods of time that would help
Markus Lanthaler: the problem is that the user is not in control of the publisher or the third party
Markus Lanthaler: if you put schema.org in your context, schema.org would be the third party
Manu Sporny: if you have a proxy in between it can poison the cache, etc.
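
As a rough illustration of how an application could keep remote context fetching under its own control, the Python sketch below resolves context URLs only from a locally supplied dictionary and refuses everything else, so no request ever reaches a third party. The `LocalContextLoader` class, the example URLs and the context content are hypothetical; how such a loader is plugged into a given JSON-LD processor depends on that implementation and is not shown here.

```python
class LocalContextLoader:
    """Resolve context URLs from a local table instead of the network.

    contexts: dict mapping a context URL to its already-parsed JSON
    content. Any URL not present in the table is rejected, so no HTTP
    request is ever made on behalf of the document being processed.
    """

    def __init__(self, contexts):
        # Copy the table so later changes by the caller have no effect.
        self._contexts = dict(contexts)

    def __call__(self, url):
        try:
            return self._contexts[url]
        except KeyError:
            raise ValueError("remote context not allowed: " + url) from None


# Hypothetical usage: one vetted context, everything else refused.
loader = LocalContextLoader({
    "http://example.org/contexts/person.jsonld": {
        "@context": {"name": "http://schema.org/name"}
    }
})
```

The design choice here is an allow-list rather than a cache: a cache still contacts the third party at least once, whereas an allow-list never does.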
Niklas Lindström: this all reminds me of XML catalogs, that is, some mechanism for the processor to control cached versions of contexts so that, for instance, you could pass in a reference to a dictionary of contexts and tell the processor it can only use that
Dave Longley: yeah, we can do that with the API and the remote context callback loading option
Gregg Kellogg: we could mention that you can use that to mitigate this problem
Gregg Kellogg: if we don't have a way to mitigate it could result in further debate or denial, so we should mention this

PROPOSAL: Add the following text to the Security Considerations section: When processing JSON-LD documents, links to remote contexts are typically followed automatically, resulting in the transfer of files without the explicit request of the user for each one. If remote contexts are served by third parties, it may allow them to gather usage patterns or similar information leading to privacy concerns. Explain that this can be mitigated through effective use of the API.

Markus Lanthaler: i don't think this is about implementation guidance
Markus Lanthaler: +1
Dave Longley: +1
Clay Wells: +1
Gregg Kellogg: +1
Niklas Lindström: +1 and rewording "mitigate" to "control"
Manu Sporny: +1

RESOLUTION: Add the following text to the Security Considerations section: When processing JSON-LD documents, links to remote contexts are typically followed automatically, resulting in the transfer of files without the explicit request of the user for each one. If remote contexts are served by third parties, it may allow them to gather usage patterns or similar information leading to privacy concerns. Explain that this can be controlled through effective use of the API.

Topic: Support for xsd:short and other integer types.

Dave Longley: We don't want to eliminate the use case where people want to stay within the range of the systems limitations. [scribe assist by Manu Sporny]
Manu Sporny: We had discussed this before, we don't want to introduce round-tripping issues. [scribe assist by Manu Sporny]
Markus Lanthaler: [scribe missed] [scribe assist by Manu Sporny]
Dave Longley: At some point, the spec said "if there is a fractional part, turn it into a double" - that might be difficult to do in certain languages. You do double-based math and get the wrong result. [scribe assist by Manu Sporny]
Markus Lanthaler: Yes, you run into rounding errors. [scribe assist by Manu Sporny]
Dave Longley: I'm not sure, there are unfortunate cases w/ doubles. I'm less concerned about those than the common use case - if something gets turned into an integer (when it was a double) then that's bad. [scribe assist by Manu Sporny]
Markus Lanthaler: a double like 5.0 will become a 5 xsd:integer when going through RDF round-tripping [scribe assist by Markus Lanthaler]
Dave Longley: Out of all the possible choices we have, we have picked the least sucky approach. We're trying to be pragmatic. [scribe assist by Manu Sporny]
Markus Lanthaler: We keep what we have, and we ask Sandro about his opinion. If he's fine with this, then we can close the issue. [scribe assist by Manu Sporny]
Dave Longley: I think people are going to want to control this via the context (convert to native types) [scribe assist by Manu Sporny]
Dave Longley: And in that way, it's not tied to XSD. [scribe assist by Manu Sporny]
Gregg Kellogg: We could get rid of the runtime flag, not use the context? [scribe assist by Manu Sporny]
Dave Longley: The only issue is that everything would happen in compaction/expansion. [scribe assist by Manu Sporny]
Dave Longley: In fromRDF/toRDF we can have an option to make it more specific... it would only convert the things in the context that are specified. [scribe assist by Manu Sporny]
Niklas Lindström: When you use Turtle, and use the native thing there - most things are turned to decimals. More fine-grained control is needed. [scribe assist by Manu Sporny]
Niklas Lindström: How would we do this? [scribe assist by Manu Sporny]
Dave Longley: We'd have to add something if we wanted to be fine-grained. [scribe assist by Manu Sporny]
Dave Longley: If we wanted something more than we have right now, if we find something that is typed coerced to this type, convert it to an integer or a double - that's the simpler solution w/o adding keywords to JSON-LD. [scribe assist by Manu Sporny]
Niklas Lindström: … {"rating": "xsd:decimal"}, values of rating being JSON Numbers, and always xsd:decimals in RDF
Gregg Kellogg: That the useNativeTypes flag is false by default, that mitigates this to a large degree. [scribe assist by Manu Sporny]
Gregg Kellogg: Developers can specify a wrapper to make this easier. [scribe assist by Manu Sporny]
Markus Lanthaler: When a pattern emerges, we can standardize it in 1.1 [scribe assist by Manu Sporny]
Dave Longley: Yes, let implementations sort it out and we can standardize it later. [scribe assist by Manu Sporny]
Niklas Lindström: RDFLib in Python always exposes a literal as a native number... but uses the added datatype tag to determine the exact datatype tag is very useful. You never really care about the lexical representation in that case. [scribe assist by Manu Sporny]
Niklas Lindström: You can always use the number directly in processing. It would be nice to do that in the default case. [scribe assist by Manu Sporny]

Topic: fromRDF() creates nodes for things like rdf:type.

Niklas Lindström: i brought up this issue a couple of weeks ago, the resulting JSON-LD, flattened JSON-LD from RDF conversion creates nodes for things that haven't been linked to, the most glaring examples being every rdf:type
Manu Sporny: scribe-nods.
Niklas Lindström: except rdf:nil ... those don't turn up
Niklas Lindström: another example would be if you gave someone a homepage, it shows up as a node
Niklas Lindström: this is annoying too because choosing JSON-LD to transmit RDF would increase the size of the data
Niklas Lindström: with unusable nodes
Gregg Kellogg: markus made the point that by doing this you can effectively find all the links in the document
Gregg Kellogg: my own position is that it does make it a little bit more ugly, but i don't know if that effectively matters in the long run
Markus Lanthaler: that's the result of the node map generation algorithm which basically just collects all the nodes in the doc
Niklas Lindström: you don't get all the bnodes, you only get the things that are IRIs
Markus Lanthaler: you get bnodes as well, they are all labeled with blank node identifiers
Markus Lanthaler: every node that appears in the graph appears in the flattened output
Manu Sporny: you could just post-process the output, right?
Niklas Lindström: you could also do the same in the other direction
Markus Lanthaler: i think it's more useful to be able to find all the nodes in a graph
Markus Lanthaler: if you wanted to create a graph you'd just loop
Niklas Lindström: when i connect them i have to create all the links anyway
Markus Lanthaler: you can just enumerate them in a simple way
Manu Sporny: does anyone care enough about this to modify the flattening algorithm to remove these?
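
The post-processing step suggested above could be sketched roughly as follows in Python. This is only an illustration under stated assumptions, not part of any JSON-LD algorithm: it assumes the flattened output is available as a plain list of node objects (dicts), and the `drop_stub_nodes` helper and the example IRIs are hypothetical. A "stub" here means a node object that carries nothing but an `@id`, such as the nodes created for rdf:type values or homepages.

```python
def drop_stub_nodes(flattened):
    """Return the node objects that carry at least one key besides '@id'.

    flattened: list of node objects (dicts) from a flattened JSON-LD
    document. References to the dropped nodes stay intact inside the
    remaining nodes, so no statement is lost.
    """
    return [node for node in flattened if set(node) - {"@id"}]


# Hypothetical flattened output with two stub nodes.
flat = [
    {
        "@id": "http://example.org/alice",
        "@type": ["http://xmlns.com/foaf/0.1/Person"],
        "http://xmlns.com/foaf/0.1/homepage": [{"@id": "http://example.org/home"}],
    },
    {"@id": "http://xmlns.com/foaf/0.1/Person"},
    {"@id": "http://example.org/home"},
]
pruned = drop_stub_nodes(flat)
```

The trade-off is the one raised in the discussion: after pruning, the list can no longer be used to enumerate every node that appears in the graph.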
Niklas Lindström: when i implemented this, it looked much more verbose, i had to do additional work to make the result look uglier
Manu Sporny: is it that much of an issue to create a function to remove the nodes you don't want
Manu Sporny: it seems to me like something someone could do in 10 minutes at most
Niklas Lindström: i don't see the point of making flattening look uglier than it has to
Niklas Lindström: it looks like an artifact of an algorithm that could be changed
Niklas Lindström: it seems like preparing for something that someone might want to use for something else
Niklas Lindström: i used this with connect and i didn't need this structure
Dave Longley: It's already difficult for people to find stuff by subject. [scribe assist by Manu Sporny]
Dave Longley: I want to make sure we're not making things more difficult than they already are, since we don't expose a node map. [scribe assist by Manu Sporny]
Manu Sporny: notes we're out of time.

Discussion about rdf:nil and lists and whether they show up in flattening as well.

Markus Lanthaler: We need to cover this next time: https://github.com/json-ld/json-ld.org/issues/257
Dave Longley: We also need some input on this in the issue tracker: https://github.com/json-ld/json-ld.org/issues/264

-- manu

--
Manu Sporny (skype: msporny, twitter: manusporny, G+: +Manu Sporny)
Founder/CEO - Digital Bazaar, Inc.
blog: Meritora - Web payments commercial launch
http://blog.meritora.com/launch/
Received on Tuesday, 18 June 2013 17:47:57 UTC