- From: Phil Archer <phila@w3.org>
- Date: Thu, 22 Jan 2015 14:45:30 +0000
- To: contact@carlosiglesias.es, Public DWBP WG <public-dwbp-wg@w3.org>
Some comments and actions from me on this.

On 21/01/2015 22:47, Carlos Iglesias wrote:
> Hello everyone, here are the results of my BPs document walkthrough. Sorry for the huge email in advance. Quite a lot of comments, but I think most of them are non-stoppers for a first draft publication. Maybe the only part that may require some more attention IMO is the data preservation section and its associated BPs.
>
> Happy to discuss further anything that is not clear, and also to help edit and update whatever the group thinks should be incorporated into the document.
>
> INTRO
>
> - I think the term "data re-users" may be more appropriate than "data consumers" throughout the document, given that "re-users" are willing to re-use data to do something (analysis, services, products, alternative presentations, whatever) and consumers just consume data passively (using products or services, reading analyses, etc.). IMO re-users are the real beneficiaries of these BPs. Re-users (also sometimes called infomediaries) are a sort of intermediary between publishers and consumers.

Personally I agree that "re-users" is better than "consumers", but the consensus in the WG so far has been "consumers".

> - "A basic knowledge of vocabularies and data models would be helpful to better understand some aspects of this document." I would replace this with "A basic knowledge of some specific technologies could also be helpful to better understand certain implementation techniques in this document."
>
> In general I would try to keep all the prose technologically neutral, with the exception of the content of the implementation sections.

It depends on your point of view. I like the term "data model", but Bernadette argues that it means something different in the context of an RDB, although I think it's worth thinking about. Vocabularies typically have a UML-like diagram. The important thing about DCAT, for example, is the distinction between a dataset and a distribution. Whether you use DCAT or schema.org terms is less important IMO. But this is a WG discussion point.

> CHALLENGES
>
> - Some are not described in a technologically neutral way and are thus biased, e.g.
>
> "How should URIs be designed and managed for persistence?" (should be "How should IDs be designed and managed for persistence?")

No. ID schemes other than URIs are not on the Web and therefore out of scope. Welcome to W3C.

> "Data Vocabularies
> How can existing vocabularies be used to provide semantic interoperability?
> How can a new vocabulary be designed if needed?"
>
> (should replace "data vocabularies" with "data models" everywhere)

Maybe. See above.

> LIFECYCLE
>
> - The data creation phase is quite confusing, at least the name, because in most cases the data already exists somewhere. I think "data preparation" or similar is more appropriate.
>
> - I think that the refinement arrow should connect with the data preparation (creation) phase, not with the publication one.
>
> - I would think twice about including a "data archiving" phase, as it is usually considered bad practice to make things disappear from the Web (with some exceptions, maybe).

This has been discussed (at TPAC and since, IIRC). We talk only about the Web aspects, such as redirects, proper HTTP response codes etc. Actually I think we can sharpen up that advice, perhaps with 303 and 410 codes etc. But that's a BP I'm currently looking at for a variety of reasons, and I will have a modified version to offer later.
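To illustrate the kind of advice meant here (not proposed text for the document), below is a minimal sketch of checking what an archived dataset URI returns: 410 Gone as a deliberate "withdrawn" signal, 303 See Other as a redirect to some other description. The URI is purely hypothetical and the third-party "requests" library is an assumption of the example.

# Minimal sketch: inspect the HTTP status an archived dataset URI returns.
# Assumes the third-party "requests" library and a hypothetical URI;
# 410 means "gone on purpose", 303 points to another resource.
import requests

DATASET_URI = "http://example.org/dataset/2014-population"  # hypothetical

resp = requests.get(DATASET_URI, allow_redirects=False, timeout=10)

if resp.status_code == 410:
    print("Dataset has been deliberately withdrawn (410 Gone)")
elif resp.status_code == 303:
    print("See other resource:", resp.headers.get("Location"))
elif resp.status_code == 200:
    print("Dataset is still being served directly")
else:
    print("Unexpected response:", resp.status_code)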
> TEMPLATE
>
> - RFC keywords are currently used not only in the intended outcomes section, as the template description states, but also in other places in the document.

+1. This needs to be fixed - I'm on it.

> METADATA
>
> - "In terms of metadata, the particular implementation method will depend on the format of the dataset distribution, for example, metadata describing a CSV file should be provided in a different way than for an RDF dataset."
>
> After reading this several times I still don't understand what we mean by it. Maybe it is just me, but I think it is not really clear.

Given the sentences preceding this one, I don't think it actually adds anything except, it seems, confusion. So I've simply removed it.

> BP1
>
> - The last point of the implementation (licence and rights) is already included in the first one (DCAT).

True. I have removed it.

> - I don't know why we haven't also added JSON or XML as possible (and frequent) implementations for this. That would be useful.

Care to suggest some text and links? (I'm trying to help the editors get this done in a bit of a hurry.) A rough sketch of a DCAT description serialised as Turtle and JSON-LD appears further down, after the BP7 items.

> BP2
>
> I have sent a separate mail for this.
>
> BP3
>
> - Should keep using "terms" instead of "vocabularies" in the BP description, as in the title, for consistency and technology neutrality.
>
> - The description of the possible implementation is not about using standard terms but about using self-descriptive formats, and that should be part of a different BP - see also my separate email on BP2 and (deleted) BP4. We should focus here on providing a list of well-known reference metadata element sets that are widely used (i.e. DC, DCAT, FOAF...).

OK, again, suggested text would be helpful.

> BP4
>
> - Have already discussed this in a separate email.
>
> BP5
>
> - "Search tools must be able to discover datasets." I would say "user agents" or "automated tools" or anything more generic than "search tools".

I agree, user agents it is.

> - What kind of access mechanism is "linked data platform"? What's the difference from a SPARQL endpoint?

Reference to the LDP spec added. Perhaps therefore we should also add refs to SPARQL, SOAP and, less easily, "REST interfaces".

> BP5-BP6
>
> - Is there any reason not to provide a more complete list of terms in the implementation sections? (e.g. all those from DCAT)

Brevity and readability. We have referred to DCAT a lot; what we're highlighting here is that it covers both discovery and admin aspects.

===
Out of time for this pass. I'll return to the e-mail and look at the rest when I can, hopefully later today.
Phil.

> BP7
>
> - In "how to test", using a formal specification (e.g. ISO) should also be a valid option.
>
> DATA IDENTIFICATION
>
> - Remove "Just by adopting a common identification system we are making possible basic identification and comparison processes by the different stakeholders in a reliable way. These will be essential pre-conditions for proper data management and to facilitate reuse." as it is duplicated in the BP7 content.
>
> BP7
>
> - Remove all the IRIs material from the "why" section to keep it technology neutral.
>
> - Remove IRIs from the implementation, as I am really hesitant that we should be recommending IRIs or mnemonics for IDs as best practice; this needs more discussion. Best practice is usually to keep IDs (and URIs) neutral instead.
>
> - Remove or complete "Apply the design rules" from the test, as it basically means nothing as it currently stands.
>
> - Missing link to "HTTP Status codes".
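Picking up the BP1 and BP3 threads above (JSON or XML serialisations, and DCAT's dataset/distribution distinction), here is a minimal sketch of the kind of example that could be offered, assuming the rdflib library (version 6 or later for the bundled JSON-LD serialiser); the URIs, titles and property values are placeholders, not proposed text.

# Minimal DCAT sketch: one dcat:Dataset with one dcat:Distribution,
# serialised both as Turtle and as JSON-LD. Assumes rdflib >= 6 (which
# bundles the JSON-LD serialiser); all URIs and literals are placeholders.
from rdflib import Graph, Literal, Namespace, URIRef
from rdflib.namespace import RDF, DCTERMS

DCAT = Namespace("http://www.w3.org/ns/dcat#")

g = Graph()
g.bind("dcat", DCAT)
g.bind("dct", DCTERMS)

dataset = URIRef("http://example.org/dataset/bus-stops")            # placeholder
distribution = URIRef("http://example.org/dataset/bus-stops.csv")   # placeholder

# The dataset (the abstract thing being described)...
g.add((dataset, RDF.type, DCAT.Dataset))
g.add((dataset, DCTERMS.title, Literal("Bus stops", lang="en")))
g.add((dataset, DCAT.distribution, distribution))

# ...and one concrete distribution of it (a downloadable CSV file).
g.add((distribution, RDF.type, DCAT.Distribution))
g.add((distribution, DCAT.mediaType,
       URIRef("http://www.iana.org/assignments/media-types/text/csv")))
g.add((distribution, DCAT.downloadURL,
       URIRef("http://example.org/files/bus-stops.csv")))

print(g.serialize(format="turtle"))    # Turtle view of the same triples
print(g.serialize(format="json-ld"))   # JSON-LD view, for the BP1 "JSON" request

The same graph could equally be serialised as RDF/XML (format="xml") if an XML example were wanted.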
> DATA FORMATS
>
> - RDF and JSON examples should be removed from the introduction to keep technology neutrality there.
>
> BP8
>
> - Remove the reference to proprietary or non-proprietary formats because (1) it is not in the scope of this BP and (2) it is already covered by another BP.
>
> BP9
>
> - If we are going to include a BP on open standards I would also include one on open licences. Neither is required for having data on the web, but both are good practices for increasing the audience, so they deserve the same treatment.
>
> - Include at least XML in the list of open standards provided.
>
> BP10
>
> - "Providing data in more than one format reduces costs incurred in data transformation": we should clarify that this is for data re-users (it in fact increases costs for data producers).
>
> DATA VOCABULARIES
>
> - Should be called "data models" or anything else more neutral (also in all BP titles and descriptions in this section, possibly with the only exception of the implementation sections).
>
> - Get rid of (or move to another more appropriate place) all the introductory vocabularies, ontologies and SKOS material, as it is not technology neutral at all.
>
> BP11
>
> Same problems as for the analogous "document metadata" BP. The same suggested alternatives are also valid here.
>
> BP13
>
> The implementation approach is really weak. It should suggest some minimal versioning policy recommendations (will be looking at that later).
>
> BP15
>
> The "why" section is not technologically neutral and needs to be rewritten.
>
> HOW TO FIND VOCABULARIES
>
> Has been integrated into BP15 and should be removed here.
>
> HOW TO CHOOSE VOCABULARIES
>
> Should also be removed from here and integrated into BP15.
>
> DATA LICENSES
>
> The intro is not technologically neutral; those references should be removed and kept only as part of BP17's possible implementation. Maybe http://theodi.org/guides/publishers-guide-to-the-open-data-rights-statement-vocabulary is more appropriate.
>
> BP17
>
> It looks like the ODI-LICENSING reference is not providing really useful information here.
>
> DATA PROVENANCE
>
> The provenance ontology reference should be removed from the intro as it is an implementation-only question.
>
> BP18
>
> - "Data provenance is metadata that corresponds to data." I don't really understand this sentence.
>
> - I also can't understand the expected outcome.
>
> - All the options in (3) of the implementation are indeed machine-readable, not only the first two.
>
> DATA QUALITY
>
> The ZAVERI reference for LOD techniques should be removed from the intro for not being tech-neutral.
>
> BP19
>
> Remove the reference to the data quality work from the implementation as it is still work in progress (more appropriate as a note in the meantime).
>
> SENSITIVE DATA
>
> The reference to HTTPS should be removed from the intro for being tech-specific.
>
> BP20
>
> The current test looks more like an implementation technique.
>
> BP21
>
> "From a consumer machine usage perspective, the Web HTML file could contain Turtle or JSON-LD (for RDF) or it can be embedded in the HTML page, again as [JSON-LD], or [HTML-RDFA] or [Microdata]."
>
> I don't really understand this: the web HTML file can also be embedded in the HTML page?
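For reference, the distinction BP21 seems to be drawing is between serving the RDF as a separate Turtle or JSON-LD file linked from the page, and embedding it in the HTML itself as JSON-LD, RDFa or Microdata. Below is a minimal sketch of the embedded JSON-LD case, using only Python's standard library; the context, terms and URIs are placeholders chosen for illustration, not text from the BP document.

# Minimal sketch of "embedded in the HTML page": dataset metadata carried
# inside the page as a JSON-LD <script> block rather than served as a
# separate Turtle/JSON-LD file. Standard library only; values are placeholders.
import json

metadata = {
    "@context": {
        "dcat": "http://www.w3.org/ns/dcat#",
        "dct": "http://purl.org/dc/terms/",
        "title": "dct:title",
        "distribution": "dcat:distribution",
        "mediaType": "dcat:mediaType",
    },
    "@id": "http://example.org/dataset/bus-stops",   # placeholder
    "@type": "dcat:Dataset",
    "title": "Bus stops",
    "distribution": {
        "@type": "dcat:Distribution",
        "mediaType": "text/csv",
    },
}

# Embed the JSON-LD in the page head using the application/ld+json media type.
html_page = """<!DOCTYPE html>
<html>
  <head>
    <title>Bus stops dataset</title>
    <script type="application/ld+json">
{jsonld}
    </script>
  </head>
  <body>
    <h1>Bus stops dataset</h1>
  </body>
</html>""".format(jsonld=json.dumps(metadata, indent=2))

print(html_page)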
> DATA ACCESS
>
> Too much content about the specific techniques in the intro IMO.
>
> BP22
>
> I don't think APIs/REST services can be suggested as a good *bulk* download option.
>
> BP23
>
> "Humans should be possible to access data using browser as a client." looks like a quite strange desirable output, no? I wouldn't say that's a desirable output by itself, more likely a side effect.
>
> BP24
>
> It is somehow already contained in (or a specialisation of) BP25.
>
> BP25
>
> - The BP should be more general, something like "Provide timely access to data".
>
> - "Update frequency" looks like a more appropriate term than "update cycle".
>
> BP26
>
> - "Good versioning helps them to determine when to update to a newer version." I don't see how a versioning policy could help with this. The update frequency from BP25 looks much more valuable for that.
>
> - A track record of changes is the core of BP27 and should be removed from here.
>
> BP27
>
> - I think that "Recommended" is not one of the RFC keywords, no?
>
> - For the implementation we could include a recommendation to include references to other versions from each dataset (previous, first, last, next, etc.).
>
> BP28
>
> - Shouldn't be a BP as is, because it is technology-tied (API). Looks more like a technique for BP26.
>
> - The implementation should clarify that the difference between V1 and V2 should be in the data model or the functions or collections or similar, not in the data itself. In fact the same call on V1 and V2 should retrieve the same data (although maybe in a different data model).
>
> DATA PRESERVATION
>
> I feel quite uncomfortable with this section in general. I have some problems trying to understand the underlying principles of these BPs, but overall it looks to be about data archiving generally speaking rather than about data persistence, which is indeed the best practice IMO and also coherent with other BPs in the document (such as versioning). In fact data archiving looks more like a bad practice to me than a best one.
>
> BP29
>
> I don't really understand the purpose of this BP.
>
> BP30
>
> Same as for BP29, but even more confusing given the use of "coverage", apparently with a different meaning from the one in DCAT, for example.
>
> BP31
>
> The "why" section should be tech-agnostic.
>
> BP32
>
> As it stands the BP is tech-dependent (only for URIs). It should refer to IDs instead and mention specific tech only in the implementation.
>
> BP33
>
> The reference to data usage should not be part of the BP yet because it is work in progress. A note may be more appropriate at this stage.
>
> GENERAL
>
> - Several "how to test" sections are a little bit weak from my auditor perspective (i.e. explained in a way that is difficult to test, or not objective enough to ensure that two different tests by different people will give similar results), e.g. BP1, BP5, BP13, BP21, BP23, BP25, BP32, BP33. In any case, that's something to review once the content of the BPs is more stable.
>
> - URIs/IRIs are used inconsistently around the document. I suggest always using URIs for the sake of consistency and simplicity.
>
> That's all folks!
>
> Best,
> CI.
> ---
>
> Carlos Iglesias.
> Open Data Consultant.
> +34 687 917 759
> contact@carlosiglesias.es
> @carlosiglesias
> http://es.linkedin.com/in/carlosiglesiasmoro/en

--
Phil Archer
W3C Data Activity Lead
http://www.w3.org/2013/data/
http://philarcher.org
+44 (0)7887 767755
@philarcher1
Received on Thursday, 22 January 2015 14:45:12 UTC