- From: Carlos Iglesias <contact@carlosiglesias.es>
- Date: Wed, 21 Jan 2015 23:47:18 +0100
- To: Public DWBP WG <public-dwbp-wg@w3.org>
- Message-ID: <CAAa1XzmmJipsT-rPAOziwp6fzCd_G_25tGSdw4B=srceZ5fGWA@mail.gmail.com>
Hello everyone,

Here are the results of my walkthrough of the BPs document. Sorry in advance for the huge email.

There are quite a lot of comments, but I think most of them are not show-stoppers for a first draft publication. The only part that may require some more attention, IMO, is the data preservation section and its associated BPs.

Happy to discuss further anything that is not clear, and also to help edit and update whatever the group thinks should be incorporated into the document.

INTRO

- I think the term "data re-users" may be more appropriate than "data consumers" throughout the document, given that "re-users" are willing to re-use data to do something (analysis, services, products, alternative presentations, whatever), while consumers just consume data passively (using products or services, reading analysis, etc.). IMO re-users are the real beneficiaries of these BPs. Re-users (also sometimes called infomediaries) are a sort of intermediary between publishers and consumers.
- "A basic knowledge of vocabularies and data models would be helpful to better understand some aspects of this document." I would replace this with "A basic knowledge of some specific technologies could also be helpful to better understand certain implementation techniques of this document." In general I would try to keep all the text technologically neutral, with the exception of the content of the implementation sections.

CHALLENGES

- Some are not described in a technologically neutral way and are thus biased, e.g.:
  - "How should URIs be designed and managed for persistence?" (should be "How should IDs be designed and managed for persistence?")
  - "Data Vocabularies How can existing vocabularies be used to provide semantic interoperability? How can a new vocabulary be designed if needed?" (should replace "data vocabularies" with "data models" everywhere)

LIFECYCLE

- The data creation phase is quite confusing, at least the name, because in most cases the data already exists somewhere.
  I think "data preparation" or similar is more appropriate.
- I think that the refinement arrow should connect with the data preparation (creation) phase, not with the publication one.
- I would think twice about including a "data archiving" phase, as it is usually not considered good practice to make things disappear from the web (with some exceptions, maybe).

TEMPLATE

- RFC keywords are currently used not only in the intended outcomes section, as stated in the template description, but also in other places in the document.

METADATA

- "In terms of metadata, the particular implementation method will depend on the format of the dataset distribution, for example, metadata describing a CSV file should be provided in a different way than for an RDF dataset." After reading this several times I still don't understand what we mean by it. Maybe it is just me, but I think it is not really clear.

BP1

- The last implementation point (license and rights) is already included in the first one (DCAT).
- I don't know why we haven't also added JSON or XML as possible (and frequent) implementations for this.

BP2

I have sent a separate email about this.

BP3

- Should keep using "terms" instead of "vocabularies" in the BP description, as in the title, for consistency and technology neutrality.
- The description of the possible implementation is not about using standard terms but about using self-descriptive formats, and that should be part of a different BP; see also my separate email on BP2 and (deleted) BP4. We should focus here on providing a list of well-known reference metadata element sets that are widely used (e.g. dc, dcat, foaf...).

BP4

- Have already discussed this in a separate email.

BP5

- "Search tools must be able to discover datasets." I would say "user agents" or "automated tools" or anything more generic than "search tools".
- What kind of access mechanism is "linked data platform"? How does it differ from a SPARQL endpoint?

BP5-BP6

- Is there any reason not to provide a more complete list of terms in the implementation sections (e.g. all those from DCAT)?

BP7

- In "how to test", using a formal specification (e.g. ISO) should also be a valid option.

DATA IDENTIFICATION

- Remove "Just by adopting a common identification system we are making possible basic identification and comparison processes by the different stakeholders in a reliable way. These will be essential pre-conditions for proper data management and to facilitate reuse." as it duplicates the BP7 content.

BP7

- Remove all the IRIs stuff from the why section to keep it technology neutral.
- Remove IRIs from implementation, as I really doubt we should be recommending IRIs or mnemonics for IDs as a best practice; this needs more discussion. The best practice is usually to keep IDs (and URIs) neutral instead.
- Remove or complete "Apply the design rules" in the test, as it currently means basically nothing.
- Missing link to "HTTP Status codes".

DATA FORMATS

- RDF and JSON examples should be removed from the introduction to keep it technology neutral.

BP8

- Remove the reference to proprietary or non-proprietary formats because (1) it is not in the scope of this BP and (2) it is already covered by another BP.

BP9

- If we are going to include a BP on open standards, I would also include one on open licenses.
  Neither of those is required for having data on the web, but both are good practices for increasing the audience, so they deserve the same treatment.
- Include at least XML as well in the list of open standards provided.

BP10

- "Providing data in more than one format reduces costs incurred in data transformation": we should clarify that this is for data re-users (it in fact increases costs for data producers).

DATA VOCABULARIES

- Should be called data models or anything else more neutral (also in all BP titles and descriptions in this section, with the possible exception of the implementation sections).
- Get rid of (or move to a more appropriate place) all the introductory vocabularies, ontologies and SKOS stuff, as it is not technology neutral at all.

BP11

Same problems as for the analogous "document metadata" BP. The same alternatives suggested there are also valid here.

BP13

The approach to implementation is really weak. It should suggest some minimal versioning policy recommendations (will be looking at that later).

BP15

The why section is not technologically neutral and needs to be rewritten.

HOW TO FIND VOCABULARIES

Has been integrated into BP15 and should be removed here.

HOW TO CHOOSE VOCABULARIES

Should also be removed from here and integrated into BP15.

DATA LICENSES

The intro is not technologically neutral; those references should be removed and only be part of the BP17 possible implementation. Maybe http://theodi.org/guides/publishers-guide-to-the-open-data-rights-statement-vocabulary is more appropriate.

BP17

Looks like the ODI-LICENSING reference is not providing really useful information here.

DATA PROVENANCE

The provenance ontology reference should be removed from the intro, as it is an implementation-only question.

BP18

- "Data provenance is metadata that corresponds to data." I don't really understand this sentence.
- I can't understand the expected outcome either.
- All options in (3) under implementation are in fact machine-readable, not only the first two.

DATA QUALITY

The ZAVERI reference for LOD techniques should be removed from the intro, as it is not tech-neutral.

BP19

Remove the reference to the data quality work from implementation, as it is still work in progress (more appropriate as a note in the meantime).

SENSITIVE DATA

The reference to HTTPS should be removed from the intro, as it is tech-specific.

BP20

The current test looks more like an implementation technique.

BP21

"From a consumer machine usage perspective, the Web HTML file could contain Turtle or JSON-LD (for RDF) or it can be embdedded in the HTML page, again as [JSON-LD], or [HTML-RDFA] or [Microdata]." I don't really understand this: the web HTML file can also be embedded in the HTML page?

DATA ACCESS

Too much content about specific techniques in the intro, IMO.

BP22

I don't think APIs/REST services should be suggested as a good *bulk* download option.

BP23

"Humans should be possible to access data using browser as a client." looks like quite a strange desirable outcome, no? I wouldn't say that's a desirable outcome in itself, more likely a side effect.

BP24

It is somehow already contained in (or a specialization of) BP25.

BP25

- The BP should be more general, something like "Provide timely access to data".
- "Update frequency" looks like a more appropriate term than "update cycle".

BP26

- "Good versioning helps them to determine when to update to a newer version." I don't see how a versioning policy could help with this. The update frequency from BP25 looks much more valuable for that.
- Keeping a track record of changes is the core of BP27 and should be removed from here.

BP27

- I think that "Recommended" is not one of the RFC keywords, no?
- For implementation, we could recommend including references to other versions from each dataset (previous, first, last, next, etc.).

BP28

- Shouldn't be a BP as is, because it is technology-tied (API).
  Looks more like a technique for BP26.
- Implementation should clarify that the difference between V1 and V2 should be in the data model or the functions or collections or similar, not in the data itself. In fact, the same call for V1 and V2 should retrieve the same data (although maybe in a different data model).

DATA PRESERVATION

I feel quite uncomfortable with this section in general. I have some problems trying to understand the underlying principles for these BPs, but overall it looks to be about data archiving, generally speaking, instead of about data persistence, which is indeed the best practice IMO and is also coherent with other BPs in the document (such as versioning). In fact, data archiving looks more like a bad practice to me than a best one.

BP29

I don't really understand the purpose of this BP.

BP30

Same as for BP29, but even more confusing given the use of "coverage", apparently with a different meaning from the one in DCAT, for example.

BP31

The why section should be tech-agnostic.

BP32

As it currently stands, the BP is tech-dependent (only for URIs). It should refer to IDs instead and mention specific technologies only under implementation.

BP33

The reference to the data usage work should not be part of the BP yet because it is work in progress. A note may be more appropriate at this stage.

GENERAL

- Several "how to test" sections are a little bit weak from my auditor perspective (i.e. explained in a way that makes them difficult to test, or not objective enough to ensure that two tests by different people will give similar results), e.g. BP1, BP5, BP13, BP21, BP23, BP25, BP32, BP33. In any case, that's something to review once the content of the BPs is more stable.
- URIs/IRIs are used inconsistently throughout the document. I suggest always using URIs for the sake of consistency and simplicity.

That's all folks!

Best,
CI.

---
Carlos Iglesias.
Open Data Consultant.
+34 687 917 759
contact@carlosiglesias.es
@carlosiglesias
http://es.linkedin.com/in/carlosiglesiasmoro/en
Received on Wednesday, 21 January 2015 22:47:47 UTC