
Re: Request for pre-review of Linked Data Glossary

From: Dave Reynolds <dave.e.reynolds@gmail.com>
Date: Tue, 19 Mar 2013 22:40:11 +0000
Message-ID: <5148E94B.6090108@gmail.com>
To: public-gld-wg@w3.org
Suggestions for the remainder ...

 > 39. Hash URI Strategy

Suggested rewrite:

Hash URI Pattern

In creating and publishing Linked Data a key design decision is the 
pattern of URIs to use for the resources in the data. One aspect of that 
decision is whether to use "hash" URIs (URIs which end in a '#fragid' 
fragment identifier) or "slash" URIs (no fragment identifier). Hash URIs 
offer a simple way to separate the URI for the thing from the URL for a 
data document describing the thing. They are convenient when publishing 
small files of resources (e.g. small vocabularies) but limit 
implementation options and extensibility (because the fragment 
identifier is never seen by the data server).
See also [Slash URI Pattern]

In which case also add:

Slash URI Pattern

In creating and publishing Linked Data a key design decision is the 
pattern of URIs to use for the resources in the data. One aspect of that 
decision is whether to use "hash" URIs (URIs which end in a '#fragid' 
fragment identifier) or "slash" URIs (no fragment identifier). Slash 
URIs provide maximum flexibility since the data server will see the full 
URI when it is dereferenced.
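
[To illustrate the difference between the two patterns, here is a minimal Python sketch of what the data server actually gets to see in each case. The example.org URIs and the helper name request_target are made up for illustration; the only library calls used are urldefrag and urlsplit from the standard urllib.parse module.]

```python
from urllib.parse import urldefrag, urlsplit


def request_target(uri: str) -> str:
    """Return the path (plus query) a client would send to the server.

    The fragment identifier is stripped on the client side before the
    request is made, so it never reaches the data server.
    """
    without_fragment, _fragment = urldefrag(uri)
    parts = urlsplit(without_fragment)
    path = parts.path or "/"
    return path + ("?" + parts.query if parts.query else "")


# Hypothetical URIs, used only for illustration.
hash_uri = "http://example.org/vocab#Person"
slash_uri = "http://example.org/vocab/Person"

print(request_target(hash_uri))   # the server only sees the document path
print(request_target(slash_uri))  # the server sees the full resource path
```

[With the hash URI every term in the vocabulary resolves to the same request, so the server can only return the whole document. With the slash URI the server can decide per resource what to return.]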

 > 43. Internet Engineering Task Force (IETF)



s/had defined/defines and maintains/

 > 44. Inference

Suggested rephrase:

s/To infer something is to create a new relationship./Inference is the 
process of deriving logical conclusions from a set of starting assumptions./
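
[A small sketch of what "deriving logical conclusions from a set of starting assumptions" can look like in practice. This is only an illustration, not a real reasoner: statements are plain Python tuples, the names Alice, Employee, Person and Agent are invented, and the two rules are simplified versions of the RDFS subclass rules.]

```python
def infer(facts):
    """Forward-chain two RDFS-style rules until nothing new is derived.

    Rule 1: (A subClassOf B), (B subClassOf C)  =>  (A subClassOf C)
    Rule 2: (x type A),       (A subClassOf B)  =>  (x type B)
    """
    derived = set(facts)
    while True:
        new = set()
        for (s, p, o) in derived:
            for (s2, p2, o2) in derived:
                # Both rules need a subClassOf statement whose subject
                # matches the object of the first statement.
                if o != s2 or p2 != "subClassOf":
                    continue
                if p == "subClassOf":
                    new.add((s, "subClassOf", o2))
                elif p == "type":
                    new.add((s, "type", o2))
        if new <= derived:
            return derived
        derived |= new


# Starting assumptions (hypothetical data).
facts = {
    ("Alice", "type", "Employee"),
    ("Employee", "subClassOf", "Person"),
    ("Person", "subClassOf", "Agent"),
}

# Statements that were not asserted but follow from the assumptions.
conclusions = infer(facts) - facts
for statement in sorted(conclusions):
    print(statement)
```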

 > 48. Linked Data

Suggested rephrase:

Linked Data refers to a set of best practices for creating, publishing 
and announcing structured data on the Web. See [Linked Data Principles]. 
Linked Data typically makes use of the RDF family of standards for data 
interchange (RDF/XML, Turtle) and query (SPARQL). Linked Data can be 
published by a person or organization behind the firewall or on the 
public Web. If Linked Data is published on the public Web, it is 
generally called Linked Open Data.

 > 50. Linked Data Principles

Suggest deleting the last sentence, viz:

Linked Data Principles provide a common API for data on the Web which is 
more convenient than many separately and differently designed APIs 
published by individual data suppliers.

I understand what it's saying, and there's some truth in there. But 
follow-your-nose linked data is not a sufficient API. If it were we 
wouldn't need the Linked Data Platform and LDA.

 > 52. Linked Open Data Cloud


s/datasets/interconnected datasets/

I think that's a defining feature of the LOD Cloud.

 > Linking Open Data Project

I think this is supposed to be a new entry but is currently a run-on 
paragraph of "Linking Government Data"

 > 55. Machine Readable Data

This doesn't seem to actually say what Machine Readable Data is, maybe 
that's too obvious.

The example talks about machine and human readable data from the same 
page but uses *different* pages (wikipedia v. dbpedia).

How about:

Machine readable data is data that is available in a format which a 
machine can usefully interpret and process. For example, if a set of 
figures is given in a table in a PDF file or an HTML page then it can be 
transmitted and displayed but can't be easily processed. Screen-scraping 
techniques may be able to reconstruct the tabular data from the 
formatted page but they are fragile and inconvenient. For this reason 
publishing data in a machine readable format qualifies for two stars on 
the 5-star scale.

 > 63. Ontology

Suggested rewrite:

An ontology is a formal model of a domain. It describes the types of 
things that exist (classes), the relationships between them (properties) 
and the logical ways those classes and properties can be used together 
(axioms). The OWL (Web Ontology Language) family of languages provides a 
standardized means for expressing and exchanging ontologies. It builds 
upon, and is compatible with, RDF.

 > 66. Open World

s/external work/external world/

 > 69. Persistent Identifier Scheme

Has a spurious ">"

 > 70. Predicate

Suggested rephrase:

The predicate is the second part of an RDF statement and gives the 
property which connects the subject of the statement to the object of 
the statement. Thus in the informal statement "Alice knows Bob", 
"knows" is the predicate which connects "Alice" (the subject of the 
statement) to "Bob" (the object of the statement). The term predicate 
derives from predicate calculus. In RDF we use the terms predicate (for 
the role) and property (for the thing that plays that role) regardless 
of whether the value of the property is a simple literal or some other 

 > 74. Quad Store

Suggest adding:

This notion has been clarified and standardized in SPARQL in the form of 
/RDF Datasets/.
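
[For readers unfamiliar with the idea, a rough sketch of how a pile of quads maps onto an RDF Dataset, i.e. one default graph plus zero or more named graphs. The function name dataset_from_quads, the use of None to mean "default graph" and the ex: names are assumptions made for this illustration, not part of any specification or library.]

```python
from collections import defaultdict


def dataset_from_quads(quads):
    """Group (subject, predicate, object, graph) quads into a dataset.

    Returns (default_graph, named_graphs), where default_graph is a set
    of triples and named_graphs maps a graph name to a set of triples.
    A graph component of None is treated as the default graph (an
    assumption of this sketch).
    """
    default_graph = set()
    named_graphs = defaultdict(set)
    for s, p, o, g in quads:
        if g is None:
            default_graph.add((s, p, o))
        else:
            named_graphs[g].add((s, p, o))
    return default_graph, dict(named_graphs)


# Hypothetical quads, used only for illustration.
quads = [
    ("ex:Alice", "foaf:knows", "ex:Bob", "ex:graph1"),
    ("ex:Bob", "foaf:knows", "ex:Carol", "ex:graph1"),
    ("ex:graph1", "dc:creator", "ex:Dave", None),
]

default_graph, named_graphs = dataset_from_quads(quads)
print(len(default_graph), sorted(named_graphs))
```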

 > 76. Raw Data

That's very contentious, suggest dropping this entry.

[The issue is that to a statistician "raw data" is observed data that 
has not yet been aggregated, analysed and validated. In that world 
releasing raw data without extreme care to qualify it is correctly 
regarded as professional bad practice. Given how much government data 
falls in the area of statistics then a huge amount of confusion, 
antagonism and justified horror was caused by the cry of "raw data now". 
Government statistical authorities go to a lot of trouble and have legal 
statutory obligations on what data standards have to be met by releases 
of statistical data.]

 > 83. RDF-JSON

That reference is *not* a W3C Recommendation (to my great sadness); it is 
an editor's draft which I don't think is progressing anywhere. Instead 
W3C is doing JSON-LD.

Probably the best thing is to drop this entry.

[Though part of me wants it to stay in just as it is :)]

 > 84. RDF Schema

s/schema language/vocabulary language/

 > 90. Schema

Given how often people confuse schemas, logical data models and 
ontologies, I'm not happy with the statement that an ontology is a 
form of schema.

However, I don't know enough about why schema is on this list and what 
you mean by schema in this context to offer a rephrase.

Any chance of just dropping it?

 > 94. Semantic Web Standards


 > 99. SPARQL

The SPARQL 1.1 reference link is broken.

 > 106. Triple

s/a verb, or//

[A predicate is not necessarily a verb-like-thing, it is at least as 
often an adjective-like-thing.]

[Technically a triple is not the "smallest possible RDF graph". An RDF 
graph is defined as a set of statements and the smallest set is the 
empty set. But that probably sounds like splitting hairs so I don't mind 
that bit staying :)]
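
[If it helps, both points can be shown in a few lines of Python. This is just a sketch: the Triple type and the ex:/foaf: names are invented for illustration, and a graph is modelled as a plain frozenset of triples.]

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """One RDF statement: subject, predicate, object."""
    subject: str
    predicate: str
    object: str


# A verb-like predicate ...
knows = Triple("ex:Alice", "foaf:knows", "ex:Bob")
# ... and an adjective-like one, which is at least as common.
colour = Triple("ex:Sky", "ex:colour", "blue")

# A graph is a set of statements, so the smallest graph is the empty
# set, not a single triple.
empty_graph = frozenset()
one_triple_graph = frozenset({knows})

print(len(empty_graph), len(one_triple_graph))
print(empty_graph < one_triple_graph)  # strict subset check
```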

 > 109. Turtle



On 19/03/13 18:04, Dave Reynolds wrote:
> Suggestions for first 40 below, which may duplicate others.
>  > 1. 5 Star Linked Data
> Drop final sentence and mug picture, that's out of place here.
>  > 2. 5 Star Linked Data Diagram
> Drop this entry.
>  > 8. CKAN
> Drop this. I agree with James, you would need to either reference *all*
> relevant software packages or none. None seems better.
>  > 9. Closed World
> s/external work/external world/
>  > 13. Controlled Vocabularies
> Suggested rephrase:
> [[[
> A controlled vocabulary is a selected set of terms that can be used to
> index, tag or describe units of information. By providing a restricted
> and managed set of terms they can be used to reduce ambiguity in
> information systems. Such vocabularies may be unstructured (e.g. code
> lists) or may be organized into increasingly complex knowledge
> organization schemes (taxonomies, thesauri, ontologies). In traditional
> settings the terms in the controlled vocabularies are words or phrases,
> in a linked data setting then they are normally assigned unique
> identifiers (URIs) which in turn link to descriptive phrases.
> ]]]
>  > 17. D2RQ
>  > 18. D2RQ Platform
>  > 19. D2RQ Mapping Language
> Given that there is now a W3C standard for this it seems more
> appropriate to reference that instead. Delete these and insert:
> [[[
> <a href="http://www.w3.org/TR/r2rml/">R2RML</a> (RDB to RDF Mapping
> Language) is a language for expressing customized mappings from
> relational databases to RDF datasets. Such mappings provide the ability
> to view existing relational data in the RDF data model, expressed in a
> structure and target vocabulary of the mapping author's choice.
> ]]]
>  > 20. Database to RDF Queueing
> Drop.
> [Or if not drop then s/Queueing/Querying/ and change the reference to be
> to R2RML.]
>  > 23. Data Market
> This entry makes no sense to me. I wonder if it was supposed to be "Data
> Mart"?
> Suggest drop.
> If you really meant "Data Market" then suggest entry:
> [[[
> A Data Market or Data Marketplace is an online (broker) service to
> enable discovery and access to a large collection of datasets offered by
> a range of data providers. Examples include Infochimps, Azure
> Marketplace and Factual.  Data Marketplaces may include open as well as
> paid-for data, and may offer value added services such as APIs and
> visualizations as well as pure data access.
> ]]]
>  > 24. Data Warehouse
> Hmmm. Possible rewrite:
> [[[
> A Data Warehouse is one approach to data integration in which data from
> various operational data systems is extracted, cleaned, transformed and
> copied to a centralized repository. The centralized repository can then
> be used for data mining or answering analytical queries.
> ]]]
> That rewrite misses out the red-rag-to-a-bull comment on how Linked Data
> is an alternative. The story there is a lot more complex than the
> existing entry suggests. If you really want something about the
> relationship to linked data then that will take rather more work to
> phrase just right.
>  > 27. Description Logic
> Reads a bit OWL 1-ish. Maybe replace:
> [[[
> Two variants of the Web Ontology Language (OWL), specifically OWL Lite
> and OWL DL are based on Description Logic.
> ]]]
> with
> [[[
> The Web Ontology Language (OWL) provides a standards-based way to
> exchange ontologies and includes a Description Logic semantics as well
> as an RDF based semantics.
> ]]]
>  > 28. Descriptor Resource
> Suggest dropping. Doesn't seem like common usage.
>  > 30. Directed Graph
> Suggest s/differentiated/labelled/
>  > 33. Dublin Core Element Set
> I tend to think of DC Elements as referring to 1.0 and DC Terms to 1.1.
> Certainly the page you link to is called "dcmi-terms".
> Suggest s/Element Set/Metadata Terms/ in both title and body.
>  > 37. Free/Libre/Open Source Software
> s/Sour/Source/
> s/Sourceforge is a public repository of such software.//
> [Either mention all or mention none]
>  > 39. Hash URI Strategy
> Urgh. That one needs a rewrite (and to be paired with one on slash URIs).
> Run out of time to suggest something now ...
> Dave
> On 18/03/13 21:34, Bernadette Hyland wrote:
>> Hi,
>> Per our last telecon on Thursday (14-Mar), we agreed to do an internal
>> sanity check on the glossary before moving this to a WG Note.  Later
>> this week I plan to reach out to a number of authors of references on
>> Linked Data for peer review, (i.e., J. Hendler, C. Bizer, T. Heath, D.
>> Wood, M. Zaidman, et al.) once the GLD WG has reviewed.
>> Please consider reviewing the 122 glossary terms [1] prior to this
>> Thursday's call, that would be very helpful.  Please keep in mind the
>> target audience for this LD glossary is Web developers coming up the
>> curve on basic concepts around publishing data on the Web as LOD, it
>> is not intended as an academic reference per se.
>> The editors will take the lead on folding in feedback asap.  Please
>> cut & paste the current & proposed language & reply to this thread.
>> We'd link the terms to other deliverables including the vocabularies
>> and BP docs during the LC process, and encourage cross linking with
>> dependency and liaison groups in due course.
>> Thank you in advance for your time reviewing the LD glossary.
>> Cheers,
>> Bernadette Hyland, co-chair
>> W3C Government Linked Data Working Group
>> Charter: http://www.w3.org/2011/gld/
>> [1] https://dvcs.w3.org/hg/gld/raw-file/default/glossary/index.html
Received on Tuesday, 19 March 2013 22:40:42 UTC
