GLD WG Best Practices document review. from Marios Meimaris on 2013-03-18 (public-gld-wg@w3.org from March 2013)

From: Marios Meimaris <m.meimaris@medialab.ntua.gr>
Date: Mon, 18 Mar 2013 21:08:13 +0200
To: Bernadette Hyland <bhyland@3roundstones.com>, public-gld-wg@w3.org
Message-ID: <5147661D.8070002@medialab.ntua.gr>
Hello Bernadette, all,

as promised in last Thursday's meeting, I have reviewed the Best 
Practices [1] document. Below I point out errors and possible 
shortcomings and offer suggestions. I follow the document's section 
structure and use the notation "SUGGESTION (number)" so that each 
suggestion or proposal can be referred to later (sorry for the lengthy 
email).



*Audience*

> Readers of this document are expected to be familiar with fundemental 
> Web technologies
SUGGESTION (1): Fix typo "/fundemental/" to "/fundamental/"

*Scope*

> Linked Data utilizes the Resource Description Framework (RDF)
SUGGESTION (2): Add a full stop "." at the end of the sentence.

--------------------

> Linked Data can be published by an person or organization behind the 
> firewall or on the public Web.
SUGGESTION (3): Change "/an/" to "/a/"


*1. Summary of Best Practices*

> Put aside immediate needs of any given application and model the data.
SUGGESTION (4): The phrase "/immediate needs of any given application/" 
is somewhat misleading, as it may lead the reader to assume that the 
motivation for publishing a LOD dataset is to power a specific 
application.
Rephrase to "/Model the data in an application-independent, objective 
way in terms of representation/"?

*Linked Open Data Lifecycle*

Caption the figures and provide more thorough descriptions (I will 
propose some).

> Hyland et al. propose a Linked Data creation process that consists in 
> the following steps: (1) Identify, (2) Model, (3) Name, (4) Describe, 
> (5) Convert, (6) Publish, and (7) Mantain.
> Hausenblas et al. propose Linked Data life cycles that consist in the 
> following steps: (1) data awareness, (2) modeling, (3) publishing, (4) 
> discovery, (5) integration, and (6) use cases.
SUGGESTION (5): In both sentences, "/consists of/" is more appropriate 
than "/consists in/". Also fix the typo "/Mantain/" to "/Maintain/".

--------------------
> Villazón-terrazas et al. claim that the process of publishing 
> Government Linked Data must have a life cycle, in the same way of 
> Software Engineering, in which every development project has a life 
> cycle. According to our experience this process has an iterative 
> incremental life cycle model, which is based on the continuous 
> improvement and extension of the Government Linked Data resulted from 
> performing several iterations.
SUGGESTION (6): This description is inconsistent with the other two 
life cycle descriptions and appears to have been reused from another 
document: it is more discursive than the previous two and uses "/our/" 
in "/According to our experience/". It should be changed to something 
like:
"/Villazón-Terrazas et al. propose a Linked Data life cycle that 
consists of the following steps: (1) Specify, (2) Model, (3) Generate, 
(4) Publish, and (5) Exploit./"

--------------------

SUGGESTION (7): Add an introductory paragraph that describes what a 
Linked Data lifecycle is and why it is needed.
Something along the lines of:
"/The process of publishing Government Linked Open Data should consist 
of tractable and manageable steps, forming a life cycle in the same way 
Software Engineering uses life cycles in development projects. A GLD 
life cycle should cover all steps, from identifying appropriate datasets 
to publishing and maintaining them. Three different life cycle models 
are presented below; they share common (and sometimes overlapping) 
characteristics. For example, they all identify the need to specify, 
model and publish data in acceptable LOD formats. In essence, they 
capture the same tasks, but draw different boundaries between them./"

SUGGESTION (8): Caption the three images using the following format: 
"/The LD life cycle proposed by.../"


*2. Vocabulary Selection*

*Vocabulary Discovery Checklist*

> If you have raw data in CSV 
> <https://dvcs.w3.org/hg/gld/raw-file/default/glossary/index.html#csv>, 
> the columns of the tables can be used for the searching process.
SUGGESTION (9): Perhaps remove this sentence? Modellers will usually 
know the domain in question and its keywords, so suggesting this 
approach to readers who are assumed to be experts, or at least to have 
a fair technical background, may not be very helpful.

--------------------
SUGGESTION (10): Add a section pointing readers to relevant scientific 
publications, searchable through Google Scholar and similar engines? 
Many domain modellers I know (myself included) use scientific 
publication search engines to identify relevant work, vocabularies, etc.
--------------------

*Vocabulary Selection Criteria*

>   *  Ensure vocabularies have permanent URI
>
SUGGESTION (11): change "/URI/" to "/URIs/"

--------------------
> *Vocabularies /must/ be documented*
> /What it means:/ A vocabulary /must/ be documented. This includes the 
> liberal use of labels and comments; tags to language used. 
> Human-readable pages must be provided by the publisher describe the 
> classes and properties, preferably with use cases defined.

SUGGESTION (12): Rephrase "/This includes the liberal use of labels and 
comments; tags to language used/" to  "/This includes the liberal use of 
labels and comments/, /as well as appropriate language tags./"
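For illustration only, here is a minimal sketch of labels and comments 
carrying language tags, written as a small Python helper that emits 
Turtle. The helper names "literal" and "labelled_term" and the example 
URI are hypothetical, and the tag check is a simplified pattern rather 
than a full BCP 47 validator.

```python
import re
from typing import Dict

# Simplified language-tag check (e.g. "en", "el", "en-GB"); NOT a full BCP 47 validator.
LANG_TAG = re.compile(r"[A-Za-z]{2,8}(-[A-Za-z0-9]{1,8})*")

RDFS_PREFIX = "@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .\n"


def literal(text: str, lang: str) -> str:
    """Return a Turtle string literal with a language tag, e.g. "Municipality"@en."""
    if not LANG_TAG.fullmatch(lang):
        raise ValueError(f"invalid language tag: {lang!r}")
    # Escape backslashes, double quotes and newlines for a short Turtle string.
    escaped = text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return f'"{escaped}"@{lang}'


def labelled_term(uri: str, labels: Dict[str, str], comments: Dict[str, str]) -> str:
    """Serialize one term's labels and comments (keyed by language tag) as Turtle."""
    statements = [f"rdfs:label {literal(text, lang)}" for lang, text in labels.items()]
    statements += [f"rdfs:comment {literal(text, lang)}" for lang, text in comments.items()]
    if not statements:
        raise ValueError("nothing to serialize")
    return RDFS_PREFIX + f"\n<{uri}>\n    " + " ;\n    ".join(statements) + " .\n"
```

For example, calling labelled_term with the URI 
"http://example.org/vocab#Municipality", the labels 
{"en": "Municipality", "el": "Δήμος"} and the comment 
{"en": "A local administrative unit."} yields one rdfs:label per 
language plus an English rdfs:comment.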

SUGGESTION (13): Rephrase "/Human-readable pages must be provided by the 
publisher describe the classes and properties, preferably with use cases 
defined./" to
         "/The publisher must provide human-readable pages that describe 
the vocabulary, along with its constituent classes and properties./ 
/Preferably, easily comprehensible use-cases should be defined and 
documented./"

SUGGESTION (14): The second and third boxes ("/Vocabularies SHOULD be 
self-descriptive/" and "/Vocabularies SHOULD be described in more than 
one language/") could be merged and presented as 
sub-points/specifications of the first box ("/Vocabularies MUST be 
documented/")

--------------------
> *Vocabularies /should/ be used by other data sets*
> /What it means:/ If the vocabulary is used by other authoritative 
> Linked Open Data sets that is helpful. It is in re-use of vocabularies 
> that we achieve the benefits of Linked Open Data. For example: An 
> analysis on the use of vocabularies 
> <http://stats.lod2.eu/vocabularies> on the Linke Data cloud reveals 
> that FOAF <http://xmlns.com/foaf/0.1> is reused by more than 55 other 
> vocabularies.

SUGGESTION (15): Fix typo: "/Linke Data/" to "/Linked Data/"

SUGGESTION (16): Add "/Selected vocabularies from third parties should 
already be in use by other data sets, as this shows that they are 
established in the LOD community and are thus better candidates for 
wider adoption and reuse./" before the example.
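The reuse analysis cited in the quoted example essentially counts, for 
each vocabulary namespace, how many other vocabularies refer to it. A 
minimal sketch, assuming the references have already been extracted 
into a dictionary; the function name "reuse_counts" is hypothetical and 
the sketch does not reproduce the LOD2 statistics:

```python
from collections import Counter
from typing import Dict, List, Set, Tuple


def reuse_counts(references: Dict[str, Set[str]]) -> List[Tuple[str, int]]:
    """Count, for each referenced namespace, how many *other* vocabularies use it.

    "references" maps a vocabulary namespace to the set of namespaces it refers to.
    Returns (namespace, count) pairs, most reused first.
    """
    counts: Counter = Counter()
    for vocab, used in references.items():
        for namespace in used:
            if namespace != vocab:  # a vocabulary referring to itself is not reuse
                counts[namespace] += 1
    return counts.most_common()
```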
--------------------

*Vocabulary Creation*

SUGGESTION (17): Make the sub-sections consistent with the previous 
sub-section ("Vocabulary Selection Criteria"): the order should be the 
same and the titles should match, e.g. "/Vocabularies MUST (or should?) 
be documented/" instead of "/Vocabularies should provide documentation/". 
Also, apply suggestion 16 here.
--------------------

*Multilingual Vocabularies*
>
>   * As a set of ontology + lexicon. This represent the latest trend in
>     the representation of linguistic (multilingual) information...
>
SUGGESTION (18): Fix typo "/This represent.../" to "/This represents.../"

--------------------

*3. URI Construction*

> The following guidance is providing with respect to creating or 
> sometimes called "minting" URIs for vocabularies, concepts, and datasets.
SUGGESTION (19): Rephrase to "/The following guidance is provided to 
address URI minting, i.e. the creation of URIs for vocabularies, 
concepts and datasets./"

--------------------
*URI Design Principles*

> The global scope of URIs promotes large-scale "network effects", in 
> order to benefit from the value of Linked Data government and 
> governmental agencies need to identify their resources using URIs.
SUGGESTION (20): Rephrase to "The global scope of URIs promotes 
large-scale "network effects". Therefore, in order to benefit from the 
value of LD, government and governmental agencies need to identify their 
resources using URIs."


> This section provides a set of general principles aimed at helping to 
> government stakeholders to define and manage URIs for their resources.
SUGGESTION (21): Fix typo: remove "/to/" in "/aimed at helping to 
government stakeholders/".

> *Provide at least one machine-readable representation of the resource 
> identified by the URI*
> What it means: In order to enable HTTP URIs to be "dereferenced", data 
> publishers have to set up the neccesary infraestructure elements (e.g. 
> TCP-based HTTP servers) to serve representation... 
SUGGESTION (22): Fix typo "/[...]neccesary infraestructure[...]/" to 
"/[...]necessary infrastructure[...]/" (both words have errors).
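Serving a representation of a dereferenced URI usually involves content 
negotiation on the HTTP Accept header. A minimal sketch, assuming a 
publisher that offers Turtle, RDF/XML and an HTML page for every 
resource; the media type list and the function names are hypothetical, 
and in practice a Web server or framework normally does this work:

```python
from typing import List, Optional, Tuple

# Media types this hypothetical publisher can serve, in order of server preference.
AVAILABLE = ["text/turtle", "application/rdf+xml", "text/html"]


def parse_accept(header: str) -> List[Tuple[str, float]]:
    """Split an Accept header into (media_range, q) pairs; q defaults to 1.0."""
    ranges = []
    for part in header.split(","):
        fields = [f.strip() for f in part.split(";")]
        if not fields[0]:
            continue
        q = 1.0
        for param in fields[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # treat a malformed q-value as "not acceptable"
        ranges.append((fields[0].lower(), q))
    return ranges


def specificity(media_range: str, media_type: str) -> int:
    """-1 if the range does not match; otherwise 0 (*/*), 1 (type/*) or 2 (exact)."""
    if media_range == "*/*":
        return 0
    if media_range.endswith("/*"):
        return 1 if media_type.split("/")[0] == media_range[:-2] else -1
    return 2 if media_range == media_type else -1


def choose_representation(accept: Optional[str]) -> Optional[str]:
    """Pick the media type to serve, or None, meaning 406 Not Acceptable."""
    if not accept:
        return AVAILABLE[0]  # no preference expressed: serve the default
    ranges = parse_accept(accept)
    best, best_q = None, 0.0
    for media_type in AVAILABLE:
        # The most specific matching range decides the q-value for this type.
        matches = [(specificity(r, media_type), q) for r, q in ranges]
        matches = [m for m in matches if m[0] >= 0]
        if not matches:
            continue
        q = max(matches)[1]
        if q > best_q:  # strict ">" keeps the server's order on ties
            best, best_q = media_type, q
    return best
```

A real server would also send a "Vary: Accept" response header, so that 
caches keep the different representations apart.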

--------------------
> *Compliance with http-range-14*
> The World Wide Web Consortium's (W3C) Technical Architecture Group 
> (TAG) attempted to settle a long standing debate about the use of URL 
> resolution on 15 June 2005. Specifically, they decided: The TAG 
> provides advice to the community that they may mint "http" URIs for 
> any resource provided that they follow this simple rule for the sake 
> of removing ambiguity:
SUGGESTION (23): Put the following three bullets and the remaining text 
inside the box.
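For reference, the rule itself fits in a few lines. "classify" restates 
what a client may infer from the status code of a GET on an "http" URI, 
per the TAG's 15 June 2005 resolution; "respond" is a hypothetical 
publisher that uses an /id/ versus /doc/ path convention (an assumption, 
not something the document prescribes) to answer 303 for URIs that name 
real-world things:

```python
from typing import Dict, Tuple


def classify(status: int) -> str:
    """What a client may infer from the status code of a GET (httpRange-14)."""
    if 200 <= status < 300:
        return "information resource"
    if status == 303:
        return "any resource"  # e.g. a person, a school, a concept
    if 400 <= status < 500:
        return "unknown"
    return "not covered by the rule"  # e.g. 301/302 redirects, 5xx errors


def respond(path: str) -> Tuple[int, Dict[str, str]]:
    """Hypothetical publisher: /id/... names things, /doc/... names documents about them."""
    if path.startswith("/id/"):
        # 303 See Other: point the client to a document describing the thing.
        return 303, {"Location": "/doc/" + path[len("/id/"):]}
    if path.startswith("/doc/"):
        return 200, {"Content-Type": "text/turtle"}
    return 404, {}
```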

--------------------

*4. URI Policy for Persistence*

> A Persistent URL is an address on the World Wide Web that causes a 
> redirection to another Web resource. If a Web resource changes 
> location (and hence URL), a PURL pointing to it can be updated.
SUGGESTION (24): Add introduction of acronym: "/A Persistent URL (PURL) 
is an address.../"
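The redirection mechanism described in the quoted text could be 
sketched as follows, assuming an in-memory table; "PurlResolver" and its 
methods are hypothetical, and the choice of 302 (Found) as the redirect 
status is an assumption of this sketch:

```python
from typing import Dict, Optional, Tuple


class PurlResolver:
    """Hypothetical PURL table mapping persistent paths to current target URLs."""

    def __init__(self) -> None:
        self._targets: Dict[str, str] = {}

    def register(self, path: str, target: str) -> None:
        if path in self._targets:
            raise ValueError(f"{path} is already registered")
        self._targets[path] = target

    def update(self, path: str, new_target: str) -> None:
        # Only the target moves; the persistent URL held by clients never changes.
        if path not in self._targets:
            raise KeyError(path)
        self._targets[path] = new_target

    def resolve(self, path: str) -> Tuple[int, Optional[str]]:
        """Return (status, Location): a 302 redirect to the current target, or 404."""
        target = self._targets.get(path)
        if target is None:
            return 404, None
        return 302, target
```

Clients keep citing the same path; calling "update" after a dataset 
moves changes only where "resolve" redirects them.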

--------------------

*5. Internationalized Resource Identifiers*

SUGGESTION (25): Add the term "IRI" to the LD Glossary.
--------------------

> IRI (RFC 3987 <http://tools.ietf.org/html/rfc3987>) is a new protocol 
> element, that represents a complement to the Uniform Resource 
> Identifier (URI). An IRI is a sequence of characters from the 
> Universal Character Set (Unicode/ISO 10646) that can be therefore be 
> used to mint identifiers that use a wider set of characters than the 
> one defined in RFC 3986 <http://tools.ietf.org/html/rfc3986>.
SUGGESTION (26): Fix typo "/...that can be therefore be used to 
mint.../" to "/...that can therefore be used to mint.../" (remove excess 
"/be/")
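As an illustration of how IRIs relate to URIs, the mapping in RFC 3987 
(section 3.1) percent-encodes non-ASCII characters as UTF-8 octets. A 
simplified sketch using Python's standard urllib.parse.quote; the 
function name "iri_to_uri" is hypothetical, and the sketch deliberately 
skips converting internationalized host names to Punycode (IDNA):

```python
from urllib.parse import quote

# Characters already legal in a URI (RFC 3986 reserved set, plus "%" so existing
# escapes survive) are kept; ASCII letters, digits and "-._~" are never quoted.
_KEEP = ":/?#[]@!$&'()*+,;=%"


def iri_to_uri(iri: str) -> str:
    """Map an IRI to a URI by percent-encoding non-ASCII characters as UTF-8.

    Simplified: internationalized host names are NOT converted to Punycode.
    """
    return quote(iri, safe=_KEEP, encoding="utf-8")
```

For example, iri_to_uri("http://example.org/café") returns 
"http://example.org/caf%C3%A9".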

--------------------
> Althought there exist some standards focused on enabling the use of 
> international characters in Web identifiers,
SUGGESTION (26): Fix typo "/Althought/" to "/Although/"

--------------------

> This section is not meant to be exhaustive and we point the interested 
> audience to An Introduction to Multilingual Web Addresses
SUGGESTION (27): Add a link to this article.

--------------------
*7. Security and Hosting*

> Describe how you plan to host the data (e.g., cloud, agency data 
> center), implementation timelines
SUGGESTION (28): Rephrase to "/Describe how you plan to host the data 
(e.g., cloud, agency data center). Provide implementation timelines./"

--------------------
> These are typically comprised of several layers, such as physical 
> facility security, network and communications, to considerations of 
> operating system, software, integration and many other elements.
SUGGESTION (29): Rephrase to "/These are typically comprised of several 
layers, which can range from physical facility security, network and 
communications, to considerations of operating system, software, 
integration and many other elements./"

--------------------

*8. Publishers' "Social Contract"*
> Here is a summary of best practices that relate to the implicite 
> "social contrac
SUGGESTION (30): Fix typo "/implicite/" to "/implicit/"

--------------------

> Giving due consideration your organization's URI strategy should be 
> one of the first activities your team undertakes as they prepare a 
> Linked Open Data strategy.
SUGGESTION (31): Fix "/Giving due consideration your organization's URI 
strategy.../" to "/Giving due consideration *to* your organization's 
URI strategy.../"



That's all for now; sorry again for the lengthy email.

Kind regards,
Marios Meimaris
Received on Monday, 18 March 2013 19:09:01 UTC