Re: Linked Data Glossary is published! from John Erickson on 2013-07-01 (public-egov-ig@w3.org from July 2013)

From: John Erickson <olyerickson@gmail.com>
Date: Mon, 1 Jul 2013 11:35:21 -0400
To: KANZAKI Masahide <mkanzaki@gmail.com>
Cc: Bernadette Hyland <bhyland@3roundstones.com>, W3C public GLD WG WG <public-gld-wg@w3.org>, public-ldp-wg@w3.org, Linked Data community <public-lod@w3.org>, egov-ig mailing list <public-egov-ig@w3.org>, HCLS <public-semweb-lifesci@w3.org>
Message-ID: <CAC1Gg8SDYruZ3kgvUV0fEc5N8ywjF7HK5f_2eH+nrV93xHTBdg@mail.gmail.com>
Thanks for your comments, Kanzaki!

You wrote:
> 1. 5 Star ...:
> I'm afraid I don't understand why XML is 2-star (implying proprietary
> format), lower rated than CSV. Does it mean Excel ? (but even Excel uses
> non-proprietary OpenXML now).
> I'd suggest this section should include a link to the TimBL's original
> scheme.

Please note that the formats listed are EXAMPLES...
I think the point w.r.t. spreadsheets is that they are not merely
published using OpenXML, but also using a well-defined document model
that makes the semantics of their content more accessible.

I believe the thinking behind listing XML under two stars is that
data has often been dumped out of some government's data catacombs
using an XML serialization, but with no clear explanation of structure
or semantics. OpenXML has a clearly (and openly) defined structure and a
protocol for interpretation, and thus should get more stars than
"XML." A dumb XML artifact is somewhat more usable than dumping the
data in PDF, but is not as usable or accessible as a CSV, which
usually provides the data in a tabular format wherein the semantics of
each cell are more accessible. This is a sweeping generalization, but
it is often the case.
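
As a rough illustration of that generalization, here is a minimal Python
sketch. The sample data, the tag names (d, r, f1...) and the column names
are all made up for this example and do not come from any real dataset.
Both dumps parse without trouble; the difference is only in how much the
result tells a consumer about what each value means.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical sample data, invented purely for illustration.
CSV_DUMP = "agency,year,budget_usd\nEPA,2012,8400\nNASA,2012,17800\n"
XML_DUMP = (
    "<d>"
    "<r><f1>EPA</f1><f2>2012</f2><f3>8400</f3></r>"
    "<r><f1>NASA</f1><f2>2012</f2><f3>17800</f3></r>"
    "</d>"
)


def read_csv(text):
    # The header row names every column, so each cell arrives labelled.
    return list(csv.DictReader(io.StringIO(text)))


def read_xml(text):
    # Parsing succeeds, but tags such as f1/f2/f3 carry no documented
    # meaning; a consumer has to guess what each field is.
    root = ET.fromstring(text)
    return [{field.tag: field.text for field in record} for record in root]


csv_rows = read_csv(CSV_DUMP)
xml_rows = read_xml(XML_DUMP)

print(csv_rows[0]["budget_usd"])  # value found by a self-describing name
print(xml_rows[0]["f3"])          # same value, but "f3" explains nothing
```

A well-documented XML schema would of course close this gap; the sketch
only covers the "dumb XML artifact" case described above.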

Thus, I think we should distinguish between "plain old XML" and Office
Open XML/OOXML/OpenXML; based on my understanding and what I read <>
OpenXML could be listed as an example three-star format.

> 58. Machine Readable Data:
> I think "without access to proprietary libraries" is not well fit for the
> definition of this widely used term (it could be called "Open Machine
> Readable Data" or something, if this condition is necessary). Moreover, I
> don't believe "PDF and Microsoft Excel" are good counter examples. There are
> many open source PDF libraries, and parsing OpenXML is not very difficult.

* I agree that "Open, Machine Readable Data" or "Openly Machine
Readable Data" is a better fit with our definition than "Machine
Readable." I believe the distinctions are:
** "Machine Readable Data" is readily accessed using available
libraries or protocols
** "Open (or Openly) Machine Readable Data" is readily accessed using
freely-available libraries or protocols
* I think the POINT is that the data should be published in a way
suited for machine consumption. A format should NOT be considered
"machine readable" simply because someone cooked up a hack on
ScraperWiki for getting the data out of an otherwise opaque data dump
on a site.
* On that point, data may be structured within PDF documents in a
variety of ways, depending upon the source program and the code used
to generate the PDF (including such tools as LaTeX).
** IF there were a well-documented and widely-adopted approach for
embedding tabular data in PDFs and IF there were an API for identifying
and pulling out such data from PDFs, then perhaps PDF-published data
could be considered machine readable. NOTE: I am NOT talking about
metadata...
** One could even embed RDF Linked Data in PDFs using such a
technique. If it only existed...

> Probably, what is needed here is a sort of "Machine Readable Structured
> Data", which PDF and Excel data are sometimes not. However, unstructured
> Excel data is not the fault of the format, but the usage of it, IMHO.

I get your meaning, and this *might* be a useful distinction to make.
I'm wondering however if we really want/need an additional entry in
the Glossary.
* The argument against having a separate term is simply that
(arguably) the common case for publishing "machine readable" data *is*
structured data, and adding a special "structured" category merely
confuses adopters.
* The argument for a new term is, if the reason we want "machine
readable data" is because we expect (and usually get) structured data,
then we should specify that what we REALLY want is "machine readable
structured data..." (and explain what that means).
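
To make the "machine readable" vs. "machine readable structured"
distinction concrete, here is a small Python sketch. The function name
looks_tabular and both sample strings are hypothetical, and the check is
only a crude heuristic (a non-empty header plus a consistent column
count), not a definition proposed for the Glossary. Both inputs are
perfectly parseable CSV; only one of them is laid out as a table.

```python
import csv
import io

# Hypothetical samples: same format, different usage.
STRUCTURED = "region,sales\nNorth,10\nSouth,12\n"
HUMAN_LAYOUT = "Quarterly report,,\n\nRegion,Sales,\nNorth,10,\nTotal,,10\n"


def looks_tabular(text):
    """Crude heuristic: named columns and the same width on every row."""
    rows = list(csv.reader(io.StringIO(text)))
    if len(rows) < 2:
        return False
    header = rows[0]
    # Every column needs a non-empty name.
    if any(not name.strip() for name in header):
        return False
    # Every data row must have exactly as many cells as the header.
    return all(len(row) == len(header) for row in rows[1:])


print(looks_tabular(STRUCTURED))
print(looks_tabular(HUMAN_LAYOUT))
```

The second sample fails the check even though any CSV library reads it,
which matches the point that the lack of structure is a matter of how
the format is used rather than of the format itself.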

John

>
> cheers,
>
>
>
> 2013/6/28 Bernadette Hyland <bhyland@3roundstones.com>
>>
>> Hi,
>> On behalf of the editors, I'm pleased to announce the publication of the
>> peer-reviewed Linked Data Glossary published as a W3C Working Group Note
>> effective 27-June-2013.[1]
>>
>> We hope this document serves as a useful glossary containing terms defined
>> and used to describe Linked Data, and its associated vocabularies and best
>> practices for publishing structured data on the Web.
>>
>> The LD Glossary is intended to help foster constructive discussions
>> between the Web 2.0 and 3.0 developer communities, encouraging all of us
>> appreciate the application of different technologies for different use
>> cases.  We hope the glossary serves as a useful starting point in your
>> discussions about data sharing on the Web.
>>
>> Finally, the editors are grateful to David Wood for contributing the
>> initial glossary terms from Linking Government Data, (Springer 2011). The
>> editors wish to also thank members of the Government Linked Data Working
>> Group with special thanks to the reviewers and contributors: Thomas Baker,
>> Hadley Beeman, Richard Cyganiak, Michael Hausenblas, Sandro Hawke, Benedikt
>> Kaempgen, James McKinney, Marios Meimaris, Jindrich Mynarz and Dave Reynolds
>> who diligently iterated the W3C Linked Data Glossary in order to create a
>> foundation of terms upon which to discuss and better describe the Web of
>> Data.  If there is anyone that the editors inadvertently overlooked in this
>> list, please accept our apologies.
>>
>> Thank you one & all!
>>
>> Sincerely,
>> Bernadette Hyland, 3 Round Stones
>> Ghislain Atemezing, EURECOM
>> Michael Pendleton, US Environmental Protection Agency
>> Biplav Srivastava, IBM
>>
>> W3C Government Linked Data Working Group
>> Charter: http://www.w3.org/2011/gld/
>>
>> [1] http://www.w3.org/TR/ld-glossary/
>
>
>
>
> --
> @prefix : <http://www.kanzaki.com/ns/sig#> . <> :from [:name
> "KANZAKI Masahide"; :nick "masaka"; :email "mkanzaki@gmail.com"].



--
John S. Erickson, Ph.D.
Director, Web Science Operations
Tetherless World Constellation (RPI)
<http://tw.rpi.edu> <olyerickson@gmail.com>
Twitter & Skype: olyerickson
Received on Monday, 1 July 2013 15:35:53 UTC