Re: "machine readable"

I think we need to avoid terms that have a common use that differs from our intended meaning in such a way that our intent can be misinterpreted. This is the case with "machine readable data". If we can substitute a term, like "structured data", that is not subject to the same misinterpretation, we should do so. We can clarify further with other words, depending on the context. We also have a duty to provide actionable guidance, and I don't think that is accomplished by saying that data must be machine readable or that it must be suitable for its intended or potential use. Our readers will want to know *how* to make their data suitable and *how* to make it readily readable and processable by machines. Giving it structure is the key.
-Annette

Sent from a keyboard-challenged device

> On Apr 25, 2015, at 6:37 AM, Laufer <laufer@globo.com> wrote:
> 
> Hi All,
> 
> I think that any term used as a substitute for "machine readable" would be just as hard to define.
> 
> Taken to the extreme, all things transferred via the Web are "machine readable" and "machine processable": computers can read and interpret "0"s and "1"s. By the same logic, nothing transferred via the Web is "human readable". A video that can be "read" by a human is first "read" by a computer and then shown to the human. Truly human-readable things are, for example, texts in books.
> 
> When we talk about "human readable" and "machine readable" we are talking about the content, the message. I think we could maintain the term "machine readable". I think people can understand the difference between what is "human readable" and what is "machine readable". What we have to do, and that is the purpose of the glossary, is to clarify the term.
> 
> I think that content is "machine readable" when there is a deliberate intention to make it "machine readable".
> 
> So, if someone embeds RDFa in a web page, there is an intention to make that content "machine readable". In this case, a specific technology and specific vocabularies are being used to establish a conversation with a machine. When that intention is absent, some formats are simply easier than others for machines to interpret.
> 
> Well, glossaries always involve huge discussions. It's all about semantics. And that is what we are trying to facilitate between publishers and consumers.
> 
> 
> Best regards,
> laufer
> 
> 2015-04-25 5:44 GMT-03:00 Christophe Guéret <christophe.gueret@dans.knaw.nl>:
>> Hoi,
>> 
>> Good points! I would also prefer that we settle on something more explicit than "machine readable" and avoid pointing at specific cases as much as possible. Saying that "Excel", or even PDF, is bad could get us into a lot of discussion regarding .xls vs. .xlsx, the metadata in PDFs, the fact that "Excel" is the name of the software and not exactly the name of the file format, etc. I reckon we would rather spare ourselves this. BTW, "human readable" is not that good either, as it really depends on the human you are considering. Someone who is illiterate or blind will have a very hard time with a JSON file without some kind of interface (which would in turn benefit from having machine readable data) to make it readable to them.
>> 
>> Anyhow, for the machine readable thing I was also thinking we could use something like "machine processable"? The fact that machines can ingest the data is essentially the first step of some process. For "good" file formats this first step is easy (e.g. no OCR, no supervised input, ...) and does not depend on the consumer having access to a specific technology. We could stress that aspect, saying that a best practice is to share data that is "easy" to process.
>> 
>> Cheers,
>> Christophe
>> 
>>> On 24 April 2015 at 19:22, João Paulo Almeida <jpalmeida@ieee.org> wrote:
>>> Dear All,
>>> 
>>> I too find the qualification "machine readable" quite problematic.
>>> 
>>> I raised this last year in the scope of the UCR document:
>>> 
>>> 
>>> > About the whole "machine readable" debate, it is of course a different
>>> >story:
>>> > http://www.w3.org/2013/dwbp/track/issues/36
>>> 
>>> > Similar to Makx, I find it hard to live with the sloppy "machine
>>> >readable" qualification...  but, in order to come to a constructive
>>> > suggestion for this other issue as well, perhaps we could say:
>>> 
>>> > R-FormatMachineRead(able)
>>> 
>>> > Metadata should conform to standard formats that aim at facilitating
>>> >automated processing
>>> 
>>> 
>>> 
>>> I would prefer avoiding machine readable. "Structured" could help
>>> informally, but is also not precise.
>>> 
>>> We need to state the quality we want this artifact to have. Perhaps it is
>>> that it is "amenable to automated processing", but this could of course
>>> also be interpreted as vague and too broad.
>>> 
>>> Makx said "Maybe the requirement is rather that data should be published
>>> in formats that are appropriate for its intended or potential use?" This
>>> is the key aspect: making explicit the quality that the artifact needs to
>>> have such that it can be used.
>>> 
>>> I would say that "structure" is needed because we use the structure to
>>> document interpretation rules to establish the semantics of "structured
>>> data". But that is not enough...
>>> 
>>> Regards,
>>> João Paulo
>>> 
>>> 
>>> 
>>> On 24/4/15, 1:47 PM, "Phil Archer" <phila@w3.org> wrote:
>>> 
>>> >Good points, Annette.
>>> >
>>> >I think this is what the 1st and 2nd stars of LOD are getting at.
>>> >
>>> >* Available on the web (whatever format) but with an open licence, to be
>>> >Open Data
>>> >** Available as machine-readable structured data (e.g. excel instead of
>>> >image scan of a table)
>>> >
>>> >Note that here, Excel is included in machine readable. The keyword here
>>> >is structured.
>>> >
>>> >So I think we should focus on the word structured, as you suggest, and
>>> >be very cautious about using the phrase machine readable, perhaps
>>> >avoiding it altogether.
>>> >
>>> >Phil.
>>> >
>>> >On 24/04/2015 15:42, Annette Greiner wrote:
>>> >> Re the definition of machine readable as "Data formats that may be
>>> >>readily parsed by computer programs without access to proprietary
>>> >>libraries. For example, CSV, TSV and RDF formats are machine readable,
>>> >>but PDF and Microsoft Excel are not."
>>> >>
>>> >> I disagree with this definition. All proprietary computer file formats
>>> >>are machine readable. If we want to talk about nonproprietary formats,
>>> >>we should call them nonproprietary formats. If we want to talk about
>>> >>structured data formats, we should call them structured data formats.
>>> >>
>>> >> I just did a search through the BP doc for "machine readable", and I
>>> >>think there are two ways it gets used. In the introduction, it is used
>>> >>in the sense of making it easier for machines to parse and do useful
>>> >>things with data. That could be clarified by changing it to "more
>>> >>readily machine readable" or some such. Elsewhere, it gets used to mean
>>> >>giving structure to the data. In this latter case, which is the
>>> >>majority, I think we should change it to "structured".
>>> >> -Annette
>>> >> --
>>> >> Annette Greiner
>>> >> NERSC Data and Analytics Services
>>> >> Lawrence Berkeley National Laboratory
>>> >> 510-495-2935
>>> >>
>>> >>
>>> >>
>>> >
>>> >--
>>> >
>>> >
>>> >Phil Archer
>>> >W3C Data Activity Lead
>>> >http://www.w3.org/2013/data/
>>> >
>>> >http://philarcher.org
>>> >+44 (0)7887 767755
>>> >@philarcher1
>>> >
>> 
>> 
>> 
>> -- 
>> Onderzoeker
>> DANS, Anna van Saksenlaan 51, 2593 HW Den Haag 
>> +31(0)6 14576494
>> christophe.gueret@dans.knaw.nl
>> 
>> Data Archiving and Networked Services (DANS/KNAW)
>> 
>> 
>> e-Humanities Group (KNAW)
>> 
>> 
>> World Wide Semantic Web community
>> http://worldwidesemanticweb.org/
> 
> 
> 
> -- 

Received on Saturday, 25 April 2015 16:47:33 UTC