Review and Comments for Linked Data Platform FPWD

Hi all,

Many of you know how late I am getting involved in the LDP WG.  Fewer of you know how interested I am in this work.  Apologies for the late review of this document.

This review relates to the Linked Data Platform 1.0 First Public Working Draft dated 25 October 2012 [1].

Caveat 1:  Arnaud asked me last June to provide a summary of the design decisions that went into the development of the Callimachus REST API [2], but I haven't done it.  I believe that most of the value in such a document is implicit in this review.  I have made reference to design decisions and practical implementation experience garnered through Callimachus [3].  It is our intention to ensure that Callimachus becomes compliant with this document (and it appears to be close to it now).

Caveat 2:  I am typing this on an airplane, so apologies for any typos and munged URLs.

Caveat 3:  This review has become lengthy.  I am willing to edit the document to include my non-controversial changes if the editors would like the help and I am willing to summarize what is sure to be a disconnected thread in a subsequent message.   Please let me know.

Sections:
A.  Substantive comments on scope and content
B.  Document style
C.  Minor grammatical suggestions


A.  Substantive comments on scope and content

- What does the LDP WG Charter say about existing implementations?  Can we break them?  Unfortunately, I can't check at the moment. The RDF WG is required to at least minimize breakage. We might want to keep the principle in mind, even though this WG is more free to invent, to avoid causing unnecessary trouble for implementors.

- The Abstract says, "that describe their state using RDF". I suggest changing that to "that describe their state using the RDF data model" in order to avoid confusion regarding current or future RDF (or other non-RDF) syntaxes.  We have seen many people getting confused on this point, from the XML and enterprise communities regarding RDF/XML to the current discussions related to JSON-LD's relationship to RDF.

- What is the official status of the namespace for ldp: (first referenced in Section 2.1, Conventions Used in This Document)?  I don't know W3C's policy on namespace registration, but this should probably be an ISSUE unless the Team Contacts already have a handle on it.

- The first paragraph of Section 4, Linked Data Platform Resource, mentions that LDPRs may contain data regarding a variety of domains.  I agree; however, the statement that such data could be "religious" in nature made me think about logical consistency (specifically, the Christian conception of the Trinity seems to have been formulated in a way that would be logically inconsistent under first-order logic; cf. the solutions to the Sabellian and Arian heresies).  Does the LDP require LDPRs to be logically consistent?  I suggest that this document make no statement about logical consistency within LDPRs, but thought I should mention it in case others disagree.

- I have no objection to requiring HTTP/1.1, as defined in Section 4.1.1, but am curious why the WG thinks it is necessary.

- Regarding Section 4.1.4, can Servers use DNS aliasing to differentiate LDPRs (as with Apache virtual hosts or Callimachus secondary authorities)?  I'd prefer that they can and the document should probably say so in this section.

- I can see where it would be useful in many use cases to have rdf:types served in LDPR representations, but am not sure it is so common that Section 4.1.7 should make this a SHOULD.  Perhaps MAY (or language to the effect of, "it is often useful to") would be more appropriate in the general case?

- Section 4.1.8 references ISSUE-9 (Should properties used in LDPR representations be LDPRs?). I think not, because they may be defined as *part* of an LDPR (as in a schema document using the hash URL pattern to facilitate self-documentation).  Such uses would not seem to match the (non-recursive) definition of an LDPR.

- I am quite nervous about Section 4.1.9's requirement that LDPR representations MUST "use *only* the following standard datatypes" (emphasis mine). I think there are two problems with this.  First, a representation that uses any other datatype becomes a legally served non-LDPR (in accordance with Section 4.1.3) rather than a non-compliant LDPR.  Second, the restriction would seriously limit any desired application extensions (via legitimate RDF and Linked Data extension mechanisms).  This relates to ISSUE-6 (Should LDBP say that any kind of user-defined data type is disallowed), to which I would strongly vote "no". Simple is good, but systematically eliminating existing extension mechanisms seems ill-advised.  I suggest replacing this with something like, "LDPR representations MAY include the following datatypes and MUST limit the use of other datatypes to the minimum required".

- Section 4.1.11 didn't make sense to me until I read Section 4.2. Until then, it sounded like the document was suggesting that LDPR was another RDF format.  Rewording the second sentence to avoid that problem would fix it.

- Section 4.1.1 references ISSUE-22 (Need to normatively reference and recommend JSON-LD). Similarly, Sections 4.2.2 and 4.2.3 relate to ISSUE-23 (Remove application/rdf+xml as a SHOULD).  I suggest that although the focus on Turtle is fine (even preferred) as a MUST, all other current and future RDF syntaxes that are W3C Recommendations should be included as SHOULDs.  That way, LDPR Servers SHOULD accept as many standard RDF formats as possible, presumably by using the small number of commonly used parsers.  Standard RDF formats include or are anticipated to shortly include Turtle, RDF/XML, RDFa, N-Triples and JSON-LD.  There may be more (or fewer) some time in the future, so it would be nice to future-proof this document so we don't need to revisit it for changes to other RDF syntax specifications.  I would also be happy with MAY.

- Section 4.1.13 references ISSUE-2 (Do LDPR versions get managed in a systematic, discoverable way?). I agree that ETags and Last-Modified headers are sufficient, but sometimes wonder whether a Server-defined version number in a header wouldn't be useful to inform a Client just how many updates they may have missed.  James Leigh disagrees with me on that and I am not firm on it.  I leave it to the WG.

- Section 4.1.13 references ISSUE-16 (Redirection of non-information resources to LDPRs). I note that Section 5.3.5.1 says that 303 redirects SHOULD be used, and Section 5.1.3 also suggests 303 redirection for paging.  I like consistency in Client implementations, so 303 redirection would seem the natural choice here as well.

- Sections 4.4.1, 4.5.1, 4.7.1, 5.4.1, 5.6.1 and 5.8.1 all relate to allowable write operations.  I suggest adding the statement, "An LDPR server MAY require a user to be authenticated and authorized before this action is permitted." to each of those sections.

- Section 4.4.1 says that servers MUST ignore certain RDF statements that may be provided by a client.  I understand (and am sympathetic to) the rationale presented, but am concerned that implementors will have a difficult time with this.  Current implementations make use of a small set of commonly used parsers to ingest RDF data, often serializing it directly into a data store during streamed parsing.  Should implementors plan to change their code to monitor each incoming triple for dcterms:modified and dcterms:creator properties with the intention of removing them?  I doubt they will do so (due to the performance hit during an already slow process).  Then again, perhaps I am worrying too much about this.  Reading more carefully, I can see that those terms may be provided by a client but MUST NOT change the server's understanding of the state of the LDPR.  Is that what is intended?

- Section 4.4.1 references ISSUE-11 (Do we need to define server-managed properties or do we leave them to applications?) in relation to an LDPR PUT.  ISSUE-20 (Identifying and naming POSTed resources) would seem to relate.  Callimachus requires such PUT and POST requests to contain a slug to assist with naming.  I suggest that LDPR adopt this mechanism (or something very similar) because failing to do so limits interoperability.  Section 5.4.10 mentions the problems with client differentiation in relation to domain-specific constraints and I agree.  Adopting a mandatory slug for PUTs, POSTs and PATCHes would satisfy ISSUE-20 and allow better guidance in Section 5.4.9 (which currently says "application-specific rules").

- Why should clients be restricted (MUST NOT) from presuming a known set of predicates for a particular server (Section 4.4.4)?  Since this is both impossible to enforce (or even to detect) and servers can throw away any triples they don't like, why not make this a SHOULD NOT?

- What is the intent of Section 4.4.6's allowance for LDPR PUTs without an LDPC?  Doesn't an LDPR server without LDPC support mean that the PUT would go to the root service for processing anyway?  My earlier suggestion regarding the use of slugs to request a name could be used to simplify this, I think.

- In Section 4.4.7, I suggest changing "It is common for LDPR servers to put restrictions on representations" to "LDPR servers MAY put restrictions on representations". The sentence as it stands now may reflect the original Member Submission, but doesn't reflect other use cases in the wild.  I don't think the change loses anything, but it does clarify an LDPR server's available options.

- Sections 4.5.1 and 4.5.2 touch on ISSUE-24 (Should DELETED resources remain deleted?). My personal experience with such "tombstoning" of URIs comes from the two Persistent URL implementations that I have been involved with (the community PURL software at purlz.org that purl.org and others use, and the PURL implementation in Callimachus).  There are certainly some use cases (however uncommon) where tombstoning URIs is a good and even necessary idea.  Therefore, I suggest resolving ISSUE-24 by stating that LDPR servers MAY choose not to recycle DELETED URIs and update Section 4.5.2 to be consistent with that decision.

- Regarding ISSUE-12 (Can HTTP PATCH be used for resource creation?), I suggest not.  That would be ugly and I think we should not do ugly things.

- Regarding ISSUE-17 (changesets as a recommended PATCH format), I suggest yes.  However, they should probably become a separate Note or Rec in parallel.

- Section 4.8 says that the listed RDF vocabularies MUST be used, but the third paragraph of Section 5.1 makes the use of rdfs:member optional.  Those two statements are currently in conflict.  Although Section 5.1 is non-normative, the material in it is repeated in Section 5.2.3 (also marked non-normative).  The cleanest way to fix this is probably by removing the rdfs:member entry from the table in Section 4.8.

- The non-normative Section 5.1.3 says, "LDPCs may support... Paging", but the normative Section 5.3.4 says SHOULD.  These should be brought into alignment.

- The first paragraph of Section 5.1.4 makes a good point that clients may wish to order results themselves.  However, the next paragraph throws away that flexibility in order to allow pagination.  I understand that we want to keep requirements small, but perhaps it is worth discussing whether clients should be able to request ordering based on a particular property's value.  I would imagine that a server could always respond that it doesn't support such behavior (a server MAY implement server-side ordering).

- Section 5.3.7 suggests that a client may request a descending order for pages, but doesn't provide guidance on how the DESC keyword should be passed to the server.

- Section 5.3.7.1 says, "The Linked Data Platform does not define how clients discover LDPCs." Has this been decided by resolution or by default?  I would like to suggest that a mechanism similar to that found in Callimachus would be a very useful addition to this spec: HTTP OPTIONS requests may be made on the root URL of a site and on each container to discover relevant URLs for navigation via Link headers.  Details of our design may be found at [2].  The value is to provide a standard mechanism for resource discovery on a host or service basis that is consistent with HTTP.  I didn't want to raise an issue on this in case it has already been decided by resolution, but suggest that this be a MAY, not a MUST, requirement.

- Section 5.4.8 says that "LDPC servers MUST interpret the null relative URI for the subject of triples... as referring to the entity in the request body". I understand why you would want to do that, but this is another case where implementors can be expected to scream.  You are asking them to introspect each triple during ingest just in case it might contain a null relative URI in the subject position.  Then you are asking them to assign a URI for the resource before the parsing is known to be valid...

- I suggest adding the input documents listed in the LDP WG Charter to the Acknowledgements section.
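
Regarding the Section 4.1.1 / ISSUE-22 point above (Turtle as the MUST, other W3C RDF syntaxes as SHOULD or MAY): a minimal sketch of how a server might pick a response format from the Accept header while keeping Turtle as the default. The list of supported media types is an assumption for illustration, not something the draft specifies.

```python
# Media types this hypothetical server can serve, most preferred first.
# Turtle comes first because the draft makes it the one MUST format.
SUPPORTED = [
    "text/turtle",
    "application/rdf+xml",
    "application/n-triples",
    "application/ld+json",
]


def parse_accept(header):
    """Split an Accept header into (media_range, q) pairs."""
    ranges = []
    for part in header.split(","):
        fields = [f.strip() for f in part.split(";")]
        media = fields[0].lower()
        if not media:
            continue
        q = 1.0
        for param in fields[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0
        ranges.append((media, q))
    return ranges


def specificity(media_range, media_type):
    """Return 2/1/0 for exact, type/* and */* matches, or -1 for no match."""
    if media_range == media_type:
        return 2
    if media_range.endswith("/*") and media_type.startswith(media_range[:-1]):
        return 1
    if media_range == "*/*":
        return 0
    return -1


def choose_type(accept_header):
    """Pick a response media type, or None (a server would answer 406)."""
    if not accept_header:
        return SUPPORTED[0]
    ranges = parse_accept(accept_header)
    best, best_q = None, 0.0
    for media_type in SUPPORTED:
        # The most specific matching range decides this type's q value.
        matching = [(specificity(r, media_type), q) for r, q in ranges]
        matching = [m for m in matching if m[0] >= 0]
        if not matching:
            continue
        q = max(matching)[1]
        if q > best_q:  # strict ">" keeps the earlier (preferred) type on ties
            best, best_q = media_type, q
    return best
```

Adding a future Recommendation would then only mean appending its media type to the list (plus a serializer), which is the kind of future-proofing suggested above.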
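
Regarding the Section 4.4.1 concern above about monitoring incoming triples for dcterms:modified and dcterms:creator: a minimal sketch suggesting the per-triple cost can be a single set lookup in a lazy filter placed between a streaming parser and the store. Representing triples as plain string tuples is an assumption; real parsers emit their own term types.

```python
from typing import Iterable, Iterator, List, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object) as plain strings

DCTERMS = "http://purl.org/dc/terms/"
# The two server-managed predicates named in the comment on Section 4.4.1.
SERVER_MANAGED = frozenset({DCTERMS + "modified", DCTERMS + "creator"})


def drop_server_managed(triples: Iterable[Triple],
                        dropped: List[Triple]) -> Iterator[Triple]:
    """Yield client-supplied triples, diverting server-managed ones.

    Works lazily, so it can wrap a streaming parser's output before the
    triples reach the store; diverted triples are collected in `dropped`.
    """
    for triple in triples:
        if triple[1] in SERVER_MANAGED:
            dropped.append(triple)
            continue
        yield triple
```

Whether diverted triples should be discarded, logged, or merely denied any effect on server state is exactly the question raised above; the sketch only shows that detecting them is cheap.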
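
Regarding the ISSUE-20 suggestion above that clients supply a slug for naming new resources: a minimal sketch of one possible naming rule a server could apply. The sanitizing and collision rules here are hypothetical and are not a description of how Callimachus does it; the percent-decoding follows the Slug header defined by AtomPub (RFC 5023).

```python
import re
import urllib.parse


def mint_member_uri(container_uri, slug, existing):
    """Hypothetical rule: container URI + sanitized slug, made unique.

    The slug is percent-decoded, lowercased, reduced to letters, digits
    and hyphens, and suffixed with -2, -3, ... if the URI is already taken.
    """
    text = urllib.parse.unquote(slug or "").strip().lower()
    name = re.sub(r"[^a-z0-9]+", "-", text).strip("-") or "resource"
    base = container_uri if container_uri.endswith("/") else container_uri + "/"
    candidate = base + name
    n = 2
    while candidate in existing:
        candidate = f"{base}{name}-{n}"
        n += 1
    return candidate
```

A shared rule of this kind is what would let Section 5.4.9 give more concrete guidance than "application-specific rules".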
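
Regarding the ISSUE-24 suggestion above that LDPR servers MAY choose not to recycle DELETEd URIs: a toy in-memory sketch of tombstoning. The class and method names are hypothetical, and answering 409 Conflict when a client tries to re-create a retired URI is one possible policy among several.

```python
from http import HTTPStatus


class TombstoningStore:
    """Toy in-memory store that never recycles a DELETEd URI."""

    def __init__(self):
        self._live = {}           # uri -> representation
        self._tombstones = set()  # URIs that were deleted and stay retired

    def put_new(self, uri, body):
        if uri in self._live or uri in self._tombstones:
            # Refuse to (re)create: the URI is either in use or retired.
            return HTTPStatus.CONFLICT
        self._live[uri] = body
        return HTTPStatus.CREATED

    def get(self, uri):
        if uri in self._live:
            return HTTPStatus.OK, self._live[uri]
        if uri in self._tombstones:
            return HTTPStatus.GONE, None  # 410: existed once, gone for good
        return HTTPStatus.NOT_FOUND, None

    def delete(self, uri):
        if uri in self._tombstones:
            return HTTPStatus.GONE
        if uri not in self._live:
            return HTTPStatus.NOT_FOUND
        del self._live[uri]
        self._tombstones.add(uri)
        return HTTPStatus.NO_CONTENT
```

The 410 Gone response is what lets a client distinguish a retired PURL-style URI from one that never existed.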
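
Regarding the Section 5.1.4 and 5.3.7 points above (client-requested ordering, and how a descending order might be passed to the server): a minimal sketch of server-side ordering combined with paging. The query parameters `sort`, `order` and `page` are purely hypothetical; the draft defines none of them.

```python
from urllib.parse import parse_qs, urlencode


def page_members(members, query, page_size=2):
    """Return (members_on_page, query_for_next_page_or_None).

    Hypothetical query parameters: sort=<property>, order=asc|desc, page=<n>.
    Each member is a dict whose keys include the requested sort property.
    """
    params = parse_qs(query)
    key = params.get("sort", [None])[0]
    descending = params.get("order", ["asc"])[0].lower() == "desc"
    page = int(params.get("page", ["1"])[0])

    items = list(members)
    if key is not None:
        # Server-side ordering on a client-chosen property value.
        items.sort(key=lambda m: m[key], reverse=descending)
    elif descending:
        items.reverse()

    start = (page - 1) * page_size
    chunk = items[start:start + page_size]
    next_query = None
    if start + page_size < len(items):
        # Carry the ordering parameters forward so later pages stay consistent.
        next_params = {name: values[0] for name, values in params.items()}
        next_params["page"] = str(page + 1)
        next_query = urlencode(next_params)
    return chunk, next_query
```

A server that does not support ordering on a given property could simply reject the request, which matches the "server MAY implement server-side ordering" idea above.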
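
Regarding the Section 5.3.7.1 suggestion above (discovering containers via Link headers returned from OPTIONS requests): a minimal sketch of the client side, grouping Link header targets by relation. The relation names in any real deployment are assumptions here, not the ones Callimachus documents at [2]; fetching the header itself could be done with `urllib.request.Request(url, method="OPTIONS")`.

```python
import re
from urllib.parse import urljoin

# One link-value: <target> followed by everything up to the next "<".
LINK_VALUE = re.compile(r"<([^>]*)>([^<]*)")
# One parameter: name=value, with the value optionally double-quoted.
LINK_PARAM = re.compile(r'([\w*-]+)\s*=\s*(?:"([^"]*)"|([^;,\s]+))')


def parse_link_header(value):
    """Return [(target, {param: value})] for an HTTP Link header value.

    Simplification: assumes no "<" appears inside quoted parameter values.
    """
    links = []
    for target, rest in LINK_VALUE.findall(value):
        params = {}
        for name, quoted, bare in LINK_PARAM.findall(rest):
            params[name.lower()] = quoted or bare
        links.append((target, params))
    return links


def links_by_rel(value, base_uri):
    """Group absolute target URIs under each space-separated rel value."""
    grouped = {}
    for target, params in parse_link_header(value):
        for rel in params.get("rel", "").split():
            grouped.setdefault(rel, []).append(urljoin(base_uri, target))
    return grouped
```

Because this reuses plain HTTP headers, a client needs no LDP-specific discovery document.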
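
Regarding the Section 5.4.8 concern above: a minimal sketch of why the new resource's URI must be chosen before parsing, and of rolling that choice back if parsing fails. Resolving the null relative URI against the minted URI yields the URI itself. The `parse` callable is hypothetical and is assumed to return subjects still in relative form; most real parsers instead take the base URI as an argument, which is the same ordering constraint.

```python
from urllib.parse import urljoin


def resolve_against(new_uri, triples):
    """Resolve relative subjects ("" or "#frag") against the minted URI."""
    return [(urljoin(new_uri, s), p, o) for s, p, o in triples]


def create_member(new_uri, parse, reserved):
    """Reserve new_uri, parse the request body, and roll back on failure.

    `parse` is a hypothetical callable returning (subject, predicate, object)
    tuples with relative subjects; it raises ValueError on invalid input.
    """
    reserved.add(new_uri)  # the URI has to exist before the body is parsed
    try:
        triples = resolve_against(new_uri, parse())
    except ValueError:
        reserved.discard(new_uri)  # release the name if the body was bad
        raise
    return triples
```

The reservation step is the cost the comment above warns about: every POST pays for a name before the server knows whether the body is valid.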


B.  Document style

- I found the normative and non-normative guidance presented in Section 3, Conformance, to be confusing.  There are at least two problems with it: it is unclear which content is (for example) a guideline or a note, and some non-normative sections clearly state MAY/MUST requirements.  Please see below for specific details regarding the latter problem.

- Section 5.1.3 contains a couple of "mays" in the second and third paragraphs that read like they should be MAYs.  Section 5.2 has a bunch of MAYs, MUSTs and a SHOULD NOT.  However, both sections are explicitly marked non-normative.  This is at best confusing and at worst going to make trouble when the document is in Last Call.  I suggest that any paragraph or section that uses a term defined in RFC 2119 should be normative.

- Similarly, Section 5.3.2 (which is normative) references Section 5.1.2 (a non-normative section) for details of implementation.  I suggest expanding 5.3.2 to include any normative content and changing that section to refer to 5.1.2 only for an example.


C.  Minor grammatical suggestions

- s/A LDPR/An LDPR/g and s/A LDPC/An LDPC/g ; in English, "an" is used in front of words that start with vowels as they are *pronounced*, not necessarily as they are spelled.  NB: Subjects of the British Crown who may titter with bemusement at an American correcting English usage may proceed at will.  I can take it ;)

- I suggest using "Linked Data" in place of "linked data" throughout (e.g. in the Introduction) because it is a defined term (in Section 2, Terminology).  It is also used that way commonly on the Web as a proper noun.

- In the last paragraph of Section 1, Introduction, s/will depend on/will depend upon/.

- I believe a paragraph break belongs in Section 2, Terminology, in the definition of "Server" after the first sentence (following "sending back responses"). The remaining material seems to refer to both clients and servers.

- The part of Section 4.7.2 that reads, "It is common for LDPR servers to put restrictions on representations" is duplicated from Section 4.4.7 so I suggest that it be removed.

- In the first paragraph of Section 4.8, I suggest defining or referring to the definition of the term "application semantic" and changing "BP resource" to simply "resource".

- In Section 4.8.3, rdfs:member seems only to apply to LDPCs. Should it move to Section 5 or should the Comment field note that it is optionally used for container-member relationships?  This only applies if the rdfs:member entry is not removed from the table, as suggested above.

- In the second paragraph of Section 5.1.1, s/there will be/there may be/.

- In the fourth paragraph of Section 5.1.3, I suggest removing "JohnZSmith". The example should be referred to in a consistent manner.

- Examples 6 and 7 in Section 5.1.3 aren't quite aligned.  Example 6 shows the container has members a3 and a4 that aren't on page 1.  Therefore, Example 7 (which purports to show page 2) should show details for a3 and a4, but it shows member a5 instead.  Further, the paragraph under Example 7 notes that there is only one member shown on page 2, so updating just Example 7 would cause that point to be invalid.  I suggest removing a4 from Example 6 and changing a5 to a3 in Example 7.

- The material in Section 5.2.3 that duplicates the third paragraph of Section 5.1 should be removed ("The membership triples of a container..."). 

- Section 5.3.2 references Section 5.1.2, but gets its title wrong (the word "Only" is missing).

- Section 5.3.4 references Section 5.1.3 in an inconsistent style: s/5.1.3 titled "Paging"/5.1.3 Paging/

- Section 5.3.6.1 extends a highlight color onto the semicolon following "ldp:Page". I suggest that the semicolon should probably be a full stop/period in any case.

- ISSUE-7's title should be edited: s/permittered/permitted/


Whew!

Regards,
Dave

[1] http://www.w3.org/TR/2012/WD-ldp-20121025
[2] http://code.google.com/p/callimachus/wiki/REST_API (If I didn't remember this URI correctly, please go to [3] and select "Documentation"; there is a link to the REST API document near the bottom of that page)
[3] http://callimachusproject.org

Received on Sunday, 28 October 2012 10:23:35 UTC