Re: GLD WG Best Practices document review.

Thanks Marios,
Very much appreciate your feedback. I apologize that you had to address simple typos; no excuse other than we've all been busy.

Boris, Ghislain & I will coordinate folding in your proposed changes. Thank you again.

Cheers,

Bernadette Hyland, co-chair 
W3C Government Linked Data Working Group
Charter: http://www.w3.org/2011/gld/

On Mar 18, 2013, at 3:08 PM, Marios Meimaris <m.meimaris@medialab.ntua.gr> wrote:

> Hello Bernadette, all,
> 
> as promised in last Thursday's meeting, I have reviewed the Best Practices [1] document and will indicate errors as well as possible shortcomings, and provide suggestions. I will follow the document's structure of sections to go through my comments, and use the notation "SUGGESTION (Suggestion_number)" for future reference on my suggestions and proposals (sorry for the lengthy email).
> 
> 
> 
> Audience
> 
>> Readers of this document are expected to be familiar with fundemental Web technologies
> SUGGESTION (1): Fix typo "fundemental" to "fundamental"
> 
> Scope
> 
>>  Linked Data utilizes the Resource Description Framework (RDF)
> SUGGESTION (2): Add a full stop "."
> 
> --------------------
> 
>> Linked Data can be published by an person or organization behind the firewall or on the public Web.
> SUGGESTION (3): Change "an" to "a"
> 
> 
> 1. Summary of Best Practices
> 
>> Put aside immediate needs of any given application and model the data.
> SUGGESTION (4): The phrase "immediate needs of any given application" is a little bit misleading in the sense that it can make the reader assume that the motivation for publishing a LODset is to power a specific application. 
> Rephrase to "Model the data in an application-independent, objective way in terms of representation" ?
>                                                                                            
> Linked Open Data Lifecycle
> 
> Caption the figures and provide more thorough descriptions (I will propose some).
> 
>    
>> 
>> Hyland et al. propose a Linked Data creation process that consists in the following steps: (1) Identify, (2) Model, (3) Name, (4) Describe, (5) Convert, (6) Publish, and (7) Mantain.
>> Hausenblas et al. propose Linked Data life cycles that consist in the following steps: (1) data awareness, (2) modeling, (3) publishing, (4) discovery, (5) integration, and (6) use cases.
> SUGGESTION (5): I think in this case "consists of" is more appropriate than "consists in"
> 
> --------------------
>> Villazón-terrazas et al. claim that the process of publishing Government Linked Data must have a life cycle, in the same way of Software Engineering, in which every development project has a life cycle. According to our experience this process has an iterative incremental life cycle model, which is based on the continuous improvement and extension of the Government Linked Data resulted from performing several iterations.
> SUGGESTION (6): This description is not consistent with the other two Lifecycle descriptions and it seems as if it is being reused from another document. It is more descriptive than the previous two, and uses "our" in "According to our experience". This should be fixed to something like this:
> "Villazón-terrazas et al. propose a Linked Data life cycle that consists of the following steps: (1) Specify, (2) Model, (3) Generate, (4) Publish and (5) Exploit."
> 
> --------------------
> 
> SUGGESTION (7): Add an introductory paragraph that describes what a Linked Data lifecycle is and why it is needed.
> Something along the lines of:
> "The process of publishing  Government Linked Open Data should be comprised of tractable and manageable steps, forming a life cycle in the same way Software Engineering uses life cycles in development projects. A GLD life cycle should cover all steps from identifying appropriate datasets to actually publishing and maintaining them. In the following paragraph three different life cycle models are presented, however it is evident that they all share common (and sometimes overlapping) characteristics in their constituents. For example, they all identify the need to specify, model and publish data in acceptable LOD formats. In essence, they capture the same tasks that are needed in the process, but provide different boundaries between these tasks."
> 
> SUGGESTION (8): Caption the three images using the following format: "The LD life cycle proposed by..."
> 
> 
> 2. Vocabulary Selection
> 
> Vocabulary Discovery Checklist
> 
>> If you have raw data in CSV, the columns of the tables can be used for the searching process. 
> SUGGESTION (9): Remove this sentence perhaps? Is this really useful? Modellers will usually know about the domain in question and its keywords, and perhaps it is not very helpful to suggest such an approach to people who are assumed to be experts, or at least have a fair amount of tech background.
> 
> --------------------
> SUGGESTION (10): Add a section pointing people to search through relevant scientific publications in Google Scholar, etc.? I know of a lot of domain modellers (myself included) who use scientific publication search engines to identify relevant work, vocabularies, etc.
> --------------------
> 
> Vocabulary Selection Criteria
> 
>>  Ensure vocabularies have permanent URI
>  SUGGESTION (11): change "URI" to "URIs"
> 
> --------------------
>> Vocabularies must be documented
>> What it means: A vocabulary must be documented. This includes the liberal use of labels and comments; tags to language used. Human-readable pages must be provided by the publisher describe the classes and properties, preferably with use cases defined.
> 
> SUGGESTION (12): Rephrase "This includes the liberal use of labels and comments; tags to language used" to  "This includes the liberal use of labels and comments, as well as appropriate language tags."
> 
> SUGGESTION (13): Rephrase "Human-readable pages must be provided by the publisher describe the classes and properties, preferably with use cases defined." to
>         "The publisher must provide human-readable pages that describe the vocabulary, along with its constituent classes and properties. Preferably, easily comprehensible use-cases should be defined and documented."
> 
> SUGGESTION (14): The second and third boxes ("Vocabularies SHOULD be self-descriptive" and "Vocabularies SHOULD be described in more than one language") could be merged and presented as sub-points/specifications of the first box ("Vocabularies MUST be documented")
> 
> --------------------
>> Vocabularies should be used by other data sets
>> What it means: If the vocabulary is used by other authoritative Linked Open Data sets that is helpful. It is in re-use of vocabularies that we achieve the benefits of Linked Open Data. For example: An analysis on the use of vocabularies on the Linke Data cloud reveals that FOAF is reused by more than 55 other vocabularies.
> 
> SUGGESTION (15): Fix typo: "Linke Data" to "Linked Data"
> 
> SUGGESTION (16): Add "Selected vocabularies from third parties should already be in use by other data sets, as this shows that they are already established in the LOD community, and thus better candidates for wider adoption and reuse." before the example.
> --------------------
> 
> Vocabulary Creation
> 
> SUGGESTION (17): Make sub-sections consistent with the previous sub-section ("Vocabulary Selection Criteria"). Order should be the same. Titles should match. E.g.: "Vocabularies MUST (or should?) be documented." instead of "Vocabularies should provide documentations". Also, use suggestion 16.
> --------------------
> 
> Multilingual Vocabularies
>> As a set of ontology + lexicon. This represent the latest trend in the representation of linguistic (multilingual) information...
> SUGGESTION (18): Fix typo "This represent..." to "This represents..."
>     
> --------------------
>     
> 3. URI Construction
> 
>> The following guidance is providing with respect to creating or sometimes called "minting" URIs for vocabularies, concepts, and datasets. 
> SUGGESTION (19):  Rephrase to "The following guidance is provided with the intention to address URI minting, i.e. URI creation for vocabularies, concepts and datasets." 
> 
> --------------------               
> 
> URI Design Principles
> 
>   
>> 
>> The global scope of URIs promotes large-scale "network effects", in order to benefit from the value of Linked Data government and governmental agencies need to identify their resources using URIs.
> SUGGESTION (20): Rephrase to "The global scope of URIs promotes large-scale "network effects". Therefore, in order to benefit from the value of LD, government and governmental agencies need to identify their resources using URIs."
> 
>         
>> This section provides a set of general principles aimed at helping to government stakeholders to define and manage URIs for their resources.
> SUGGESTION (21): Fix typo: remove "to" in "aimed at helping to government stakeholders".
>       
>> Provide at least one machine-readable representation of the resource identified by the URI
>> What it means: In order to enable HTTP URIs to be "dereferenced", data publishers have to set up the neccesary infraestructure elements (e.g. TCP-based HTTP servers) to serve representation...
> SUGGESTION (22): Fix typo "[...]neccesary infraestructure[...]" to "[...]necessary infrastructure[...]" (both words have errors).
> 
>   --------------------              
>> 
>> Compliance with http-range-14
>> The World Wide Web Consortium's (W3C) Technical Architecture Group (TAG) attempted to settle a long standing debate about the use of URL resolution on 15 June 2005. Specifically, they decided: The TAG provides advice to the community that they may mint "http" URIs for any resource provided that they follow this simple rule for the sake of removing ambiguity:
> SUGGESTION (23):  Put following three bullets and remaining text inside box.
>  
> --------------------              
> 
> 4. URI Policy for Persistence
> 
>> A Persistent URL is an address on the World Wide Web that causes a redirection to another Web resource. If a Web resource changes location (and hence URL), a PURL pointing to it can be updated. 
> SUGGESTION (24): Add introduction of acronym: "A Persistent URL (PURL) is an address..."
> 
> --------------------              
> 
> 5. Internationalized Resource Identifiers 
> 
> SUGGESTION (25): Add term "IRI" to LD Glossary.
> --------------------              
> 
>> IRI (RFC 3987) is a new protocol element, that represents a complement to the Uniform Resource Identifier (URI). An IRI is a sequence of characters from the Universal Character Set (Unicode/ISO 10646) that can be therefore be used to mint identifiers that use a wider set of characters than the one defined in RFC 3986.
> SUGGESTION (26): Fix typo "...that can be therefore be used to mint..." to "...that can therefore be used to mint..." (remove excess "be")
> 
>  --------------------             
>> 
>> Althought there exist some standards focused on enabling the use of international characters in Web identifiers,
> SUGGESTION (26): Fix typo "Althought" to "Although"
> 
> --------------------             
> 
>> This section is not meant to be exhaustive and we point the interested audience to An Introduction to Multilingual Web Addresses
> SUGGESTION (27): Add link.
> 
> --------------------             
> 
> 7. Security and Hosting    
> 
>> Describe how you plan to host the data (e.g., cloud, agency data center), implementation timelines
> SUGGESTION (28): Rephrase to "Describe how you plan to host the data (e.g., cloud, agency data center). Provide implementation timelines."
> 
> --------------------             
>> These are typically comprised of several layers, such as physical facility security, network and communications, to considerations of operating system, software, integration and many other elements. 
> SUGGESTION (29): Rephrase to "These are typically comprised of several layers, that can range from physical facility security, network and communications, to considerations of operating system, software, integration and many other elements. "
> 
> --------------------             
> 
> 8. Publishers' "Social Contract"
>  
>> 
>> Here is a summary of best practices that relate to the implicite "social contrac
> SUGGESTION (30): Fix typo "implicite" to "implicit"
> 
> --------------------             
> 
>> Giving due consideration your organization's URI strategy should be one of the first activities your team undertakes as they prepare a         Linked Open Data strategy. 
> SUGGESTION (31): Fix "Giving due consideration your organization's URI strategy...." to "Giving due consideration *to* your organization's URI strategy...."
>     
> 
> 
> That's all for now, sorry again for the lengthy email.
> 
> Kind regards,
> Marios Meimaris

Received on Monday, 18 March 2013 21:22:45 UTC