Re: Request for pre-review of Linked Data Glossary from Dave Reynolds on 2013-03-20 (public-gld-wg@w3.org from March 2013)

From: Dave Reynolds <dave.e.reynolds@gmail.com>
Date: Wed, 20 Mar 2013 11:40:06 +0000
To: Ghislain Atemezing <auguste.atemezing@eurecom.fr>
CC: public-gld-wg@w3.org, Bernadette Hyland <bhyland@3roundstones.com>
Message-ID: <5149A016.2050104@gmail.com>

Hi Ghislain,

On 20/03/13 10:35, Ghislain Atemezing wrote:
> Hi Dave,
> Thanks again for all your suggestions...
> Many of them have been applied to the Glossary; see the current version
> here [1]
>> Suggestions for the remainder ...
>>
>>  > 39. Hash URI Strategy
>>
>> Suggested rewrite:
>>
>> [[[
>> Hash URI Pattern
>>
>> In creating and publishing Linked Data a key design decision is the
>> pattern of URIs to use for the resources in the data. One aspect of that
>> decision is whether to use "hash" URIs (URIs which end in a '#fragid'
>> fragment identifier) or "slash" URIs (no fragment identifier). Hash URIs
>> offer a simple way to separate the URI for the thing from the URL for a
>> data document describing the thing. They are convenient when publishing
>> small files of resources (e.g. small vocabularies) but limit
>> implementation options and extensibility (because the fragment
>> identifier is never seen by the data server).
>> See also [Slash URI Pattern]
>> ]]]
>
> In the last sentence, is there any reference to refer to? Or is it
> widely known?

Which part?

The fact that a server never sees the fragment ID so you can't make use 
of it in serving your response is part of the specs.
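
As a minimal sketch of that point (Python standard library only; the
example URI is hypothetical and not taken from the glossary), a client
splits off the fragment before it builds the HTTP request, so only the
document URL ever reaches the server:

```python
from urllib.parse import urldefrag, urlsplit

# Hypothetical hash URI for a vocabulary term (illustrative only).
term_uri = "http://example.org/vocab#Person"

# A client separates the fragment before dereferencing the URI.
document_url, fragment = urldefrag(term_uri)

# The HTTP request target is built from the path (and query) only.
parts = urlsplit(document_url)
request_target = parts.path or "/"
if parts.query:
    request_target += "?" + parts.query

print(document_url)    # http://example.org/vocab
print(fragment)        # Person
print(request_target)  # /vocab
```

Every term in such a vocabulary resolves to the same document URL, so
the server cannot tailor its response to the individual term.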

I don't have a neat reference which expands on all the consequences of 
that which affect how you can serve responses from e.g. a triple store. 
Though there are a *lot* of emails on the http-range-14 issue which 
touch on the issues of use of fragids :)

I'd be happy with simply omitting the last sentence if it is 
controversial, or with omitting both entries entirely.

>>  > 48. Linked Data
>>
>> Suggested rephrase:
>>
>> [[[
>> Linked data refers to a set of best practices for creating, publishing
>> and announcing structured data on the Web. See [Linked Data Principles].
>> Linked Data typically makes use of the RDF family of standards for data
>> interchange (RDF/XML, Turtle) and query (SPARQL). Linked Data can be
>> published by a person or organization behind the firewall or on the
>> public Web. If Linked Data is published on the public Web, it is
>> generally called Linked Open Data.
>> ]]]
>
> Done, with a slight modification in the 3rd sentence.
> [[Linked Data is *not* the same as RDF, rather Linked Data uses the RDF
> family of standards for data interchange ( RDF/XML, N3, Turtle and
> N-Triples) and query (SPARQL).]]

My rephrase was specifically designed to replace that sentence :)

First there's something about the bolding of *not* that annoys me but 
that's OK, I'm easily annoyed.

Second, N3 is not a standard so shouldn't be in that list of standards.

Thirdly, N-Triples was originally designed for test cases and not as a 
normative format for interchange. That is probably changing (I'm not 
following RDF 1.1) so I guess I don't really object to that being in there.

>>  > 50. Linked Data Principles
>>
>> Suggest deleting the last sentence, viz:
>>
>> [[[
>> Linked Data Principles provide a common API for data on the Web which is
>> more convenient than many separately and differently designed APIs
>> published by individual data suppliers.
>> ]]]
>>
>> I understand what it's saying, and there's some truth in there. But
>> follow-your-nose linked data is not a sufficient API. If it were we
>> wouldn't need the Linked Data Platform and LDA.
>>
>
> I don't understand your point here. Maybe rephrasing what you understand
> and pointing out the difference/link with Linked Data Platform may be
> helpful. WDYT?

I would prefer to simply delete it, but if someone wants to suggest a 
better phrasing then I'd be happy to look at it.

>>  > 55. Machine Readable Data
>>
>> This doesn't seem to actually say what Machine Readable Data is, maybe
>> that's too obvious.
>>
>> The example talks about machine and human readable data from the same
>> page but uses *different* pages (wikipedia v. dbpedia).
>>
>> How about:
>>
>> [[[
>> Machine readable data is data that is available in a format which a
>> machine can usefully interpret and process. For example, if a set of
>> figures is given in a table in a PDF file or an HTML page then it can be
>> transmitted and displayed but can't be easily processed. Screen-scraping
>> techniques may be able to reconstruct the tabular data from the
>> formatted page but they are fragile and inconvenient. For this reason
>> publishing data in a machine readable format qualifies for two-stars on
>> the 5-star scale.
>> ]]]
>
> So, if we rephrase this, do we need to add terms like "Scraping",
> "Screen scraping" or "Data aggregation" ?

I don't think so. You could link to the Wikipedia entry if you want to 
expand "screen scraping": http://en.wikipedia.org/wiki/Data_scraping
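
To illustrate the distinction the suggested text draws, here is a small
sketch, assuming the figures are published as CSV (the data and column
names below are made up for illustration). A machine readable format can
be processed directly with a standard parser, with no screen-scraping of
a formatted page:

```python
import csv
import io

# Hypothetical figures published as CSV (illustrative data only).
published = "region,population\nNorth,1200\nSouth,3400\n"

# A standard parser recovers the table structure directly.
rows = list(csv.DictReader(io.StringIO(published)))
total = sum(int(row["population"]) for row in rows)

print(total)  # 4600
```

The same figures laid out in a PDF table would need fragile,
layout-dependent extraction before this sum could be computed.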

>>  > 76. Raw Data
>>
>> That's very contentious, suggest dropping this entry.
>>
>> [The issue is that to a statistician "raw data" is observed data that
>> has not yet been aggregated, analysed and validated. In that world
>> releasing raw data without extreme care to qualify it is correctly
>> regarded as professional bad practice. Given how much government data
>> falls in the area of statistics then a huge amount of confusion,
>> antagonism and justified horror was caused by the cry of "raw data now".
>> Government statistical authorities go to a lot of trouble and have legal
>> statutory obligations on what data standards have to be met by releases
>> of statistical data.]
>
> Maybe we can just change it "Source Data" as it is in the BP doc? See [2]

I guess "source data" would defuse my issue with this one though the 
entry would still need some improvement - "from the wilderness" doesn't 
seem like an appropriate phrasing.

My preference would be to delete, I don't know who would be looking up 
"source data" in a glossary, but would be OK with a rephrase.

>>  > 90. Schema
>>
>> Given how much people confuse schemas, logical data models and
>> ontologies then I'm not happy with the statement that an ontology is a
>> form of schema.
>>
>> However, I don't know enough about why schema is on this list and what
>> you mean by schema in this context to offer a rephrase.
>>
>> Any chance of just dropping it?
>
> I think it could remain and can justify the different "schemas" (e.g.
> in UML) we have sometimes in the documentation of our vocabulary.

Those aren't schemas. Just because something uses UML (or UML-like) 
notation doesn't make it a schema.

> What about this one:

> [[ Schema:
> A data model that represents the relationships between a set of
> concepts, [e.g. using UML diagram].  Some types of schemas include
> relational database schemas.]]

That doesn't especially click for me, especially the reference to UML.
However, I won't object to it.

Cheers,
Dave
Received on Wednesday, 20 March 2013 11:40:36 UTC