- From: Annette Greiner <amgreiner@lbl.gov>
- Date: Thu, 13 Aug 2015 12:47:55 -0700
- To: Makx Dekkers <mail@makxdekkers.com>
- Cc: public-dwbp-wg@w3.org
I think we do need to scope it, but limiting it to tabular data is too restrictive. Even CSV and JSON wouldn’t qualify. If you meant structured data, I think that could work.

- Annette
--
Annette Greiner
NERSC Data and Analytics Services
Lawrence Berkeley National Laboratory
510-495-2935

On Aug 13, 2015, at 12:11 PM, Makx Dekkers <mail@makxdekkers.com> wrote:

> There was a thread back in March 2015 (subject Meaning of publishing Data on the Web) where I proposed to narrow the definition of data, for the scope of this group, to tabular data only.
>
> As far as I remember, that narrowing of scope was rejected.
>
> The problem that we still haven't solved is that different members of this group may have very different opinions on what 'data' is. People from a scientific background may think about observations of natural phenomena, humanists think oral histories, legal people see their legislation and court decisions, financial people think budgets and spending, government people think base registers with information about buildings and people, geo-people think maps, museum people think images and 3D models of art works etc. etc. The use cases at https://www.w3.org/2013/dwbp/wiki/Use_Cases contain many different types of 'data'.
>
> Annette writes "The more we try to cover everything that could in any way be conceived as data, the less specific and helpful our guidance about publishing data becomes." That was exactly the point I was making here https://lists.w3.org/Archives/Public/public-dwbp-wg/2015Mar/0036.html. Oh, and even further back: https://lists.w3.org/Archives/Public/public-dwbp-wg/2014Feb/0029.html.
>
> Makx.
>
>> -----Original Message-----
>> From: Annette Greiner [mailto:amgreiner@lbl.gov]
>> Sent: 13 August 2015 20:30
>> To: Makx Dekkers <mail@makxdekkers.com>
>> Cc: public-dwbp-wg@w3.org
>> Subject: Re: Use machine-readable standardized data formats / Use non-proprietary data formats
>>
>> No, let's throw it out entirely. I strongly disagree with the idea that this group should concern itself with the publication of all kinds of digital resources on the Web. At TPAC we defined the scope to include only best practices that are unique to publishing data on the web. Yes, other kinds of media can be turned into data, but that doesn't mean that our scope must embrace every media type posted on the web. In the end, we end up trying to write best practices for publishing anything on the web, which is clearly beyond our charter. The more we try to cover everything that could in any way be conceived as data, the less specific and helpful our guidance about publishing data becomes. I already worry that we are publishing a BP document with very little that is helpful to people who think of themselves as publishing data on the web. If we can't agree to even that, then I think I am in the wrong working group.
>>
>> Speaking practically, I have no idea what is meant by "original data, in whatever format they have it." In my world, researchers create a variety of data-containing documents, the vast majority of which they would never dream of making public because they know it to be messy, incomplete, preliminary, and useful only to those in their own research group. Scientific data goes through a series of evolutions to make it usable for others, and publishing every incarnation of it would not only preclude publication in a peer-reviewed journal, but it would also litter the web with useless material. Not even the archivists around here want that dross.
>> -Annette
>> --
>> Annette Greiner
>> NERSC Data and Analytics Services
>> Lawrence Berkeley National Laboratory
>> 510-495-2935
>>
>> On Aug 13, 2015, at 4:15 AM, Makx Dekkers <mail@makxdekkers.com> wrote:
>>
>>> All,
>>>
>>> I very much agree with Tomas here.
>>>
>>> I think this group is supposed to give advice to people who have today's data and want to know how to best publish it on the Web, not paint a picture of how the world of data may look in ten or twenty years' time. I think today's data is mostly not ready for that quantum jump. Not catering for today's needs means this group will be writing science fiction. That can be entertaining but is maybe not so useful.
>>>
>>> I also agree with his WordPerfect argument. Publishers should be encouraged to publish original data in whatever format they have it. In addition, the advice should be to provide the data also in additional and higher-starred formats to make it more useful.
>>>
>>> Annette seems to suggest that "documents" are out of scope. I think the outcome of earlier discussions was that the definition of "data" is very broad and includes all kinds of digital resources on the Web. As an example, all the stuff on http://www.legislation.gov.uk/ is text and all of it is on the Web; it has the whole range of issues: formats (PDF, HTML, XML), identification, versioning, archiving, metadata, multilingualism, granularity etc. etc. Let's not throw that out.
>>>
>>> Makx.
>>>
>>>> -----Original Message-----
>>>> From: Manuel.CARRASCO-BENITEZ@ec.europa.eu [mailto:Manuel.CARRASCO-BENITEZ@ec.europa.eu]
>>>> Sent: 13 August 2015 11:19
>>>> To: mark.harrison@gs1.org; amgreiner@lbl.gov
>>>> Cc: phila@w3.org; mark.harrison@cantab.net; public-dwbp-wg@w3.org
>>>> Subject: RE: Use machine-readable standardized data formats / Use non-proprietary data formats
>>>>
>>>> Mark,
>>>>
>>>> Data is a hard problem and this is aiming quite high:
>>>>
>>>> "... the web as an electronic delivery mechanism for structured data in open formats ..."
>>>>
>>>> Other groups address visualisation, etc.
>>>>
>>>> We are the miller group with the objective to produce standardised flour: not over-glamorous, but necessary. Other groups are for bakery, pastry, etc. :-)
>>>>
>>>> Regards
>>>> Tomas
>>>>
>>>> ________________________________________
>>>> From: Mark Harrison [mark.harrison@gs1.org]
>>>> Sent: 13 August 2015 07:51
>>>> To: Annette Greiner; CARRASCO BENITEZ Manuel (DGT)
>>>> Cc: phila@w3.org; Mark Harrison; public-dwbp-wg@w3.org
>>>> Subject: Re: Use machine-readable standardized data formats / Use non-proprietary data formats
>>>>
>>>> Hi Annette,
>>>>
>>>> I completely agree with you that the discussion should be about how to encourage people to move beyond / away from publishing static immutable documents and towards publishing live (data + models + interactive visualisations) on the web that are open, interactive and collaborative and make it as easy as possible for people and machines to retrieve, combine, compare, re-analyse and re-visualise data from multiple sources just as easily as people can use web technology to collaborate on open source software today.
>>>>
>>>> If our focus appears to be primarily on the web as an electronic delivery mechanism for structured data in open formats, we're probably aiming far too low and not giving people enough of a bold vision about what live, interactive, collaborative, mashable data on the web could be like in the future.
>>>>
>>>> There are already some sites such as openspending.org that are making good progress in that direction. There are also toolkits and frameworks such as d3.js that make this vision easier to achieve. We can probably find and critique other examples and comment on the aspects that they do well, as well as aspects where they could improve further. In this way, we can explain the big vision for what 'data on the web' really could be, if done well.
>>>>
>>>> As Erik says, it needs to be webby. That could mean that the raw data and the data transformations and visualisation are all fully interlinked on the web in the finest detail, potentially down to the granularity of each individual datapoint. Furthermore, if we want to find related datasets for comparison, we should be able to easily retrieve those and overlay them within the same live visualisation - or even try modelling or visualising the data in different ways, all interactively and collaboratively on the web.
>>>>
>>>> Even with 5-star linked open data, we can link to existing data but cannot immediately link to future data that has not yet been generated - so instead we also need to provide rich metadata that describes the scope, coverage and granularity of the data well. In future, we might expect that web search engines can not only help us to retrieve datasets and their metadata - but allow us to tweak any of the metadata parameters in order to search for related datasets, e.g. to find similar economic data about a different country or different organisation - or to find related scientific data for a related material - or for the same material studied using a different but related experimental technique, so that we can compare the data easily, without having to spend so much effort tracking down the data, reverse-engineering charts and graphs to extract data, etc.
>>>>
>>>> To some extent, web technology already exists to enable the whole Data Model, View and Controller to all be entirely web-based, resulting in a live, interactive, collaborative space for data sharing and analysis, which has so many advantages over static published documents. My reference to D3.js was one example of such technology. I think it's a good thing to point people to multiple toolkits and frameworks that they can already use to implement the bold vision of truly collaborative, interactive data on the web.
>>>>
>>>> I think we would miss a great opportunity if this group cannot clearly explain to everyone (including any member of the public) what that bold vision for 'data on the web' could be like. It could go far beyond providing datasets via the web.
>>>>
>>>> Some people may take the time to read rather dry documents of best practices and might even understand some of them. Others may understand the vision better if we can point to existing real examples of 'data on the web done very well' and explain which aspects they currently do very well - and what they could do even better. The 'gold standard' is probably a blend of the best aspects of several existing examples.
>>>>
>>>> When everyone can understand how data that is truly live on the web has the potential to greatly increase the efficiency of research and data analysis and generation of new insights in so many different fields, then the best practices documents from this group become a highly relevant and practical step-by-step instruction manual to help everyone achieve that vision.
>>>>
>>>> Best wishes,
>>>>
>>>> - Mark
>>>>
>>>> ________________________________________
>>>> From: Annette Greiner <amgreiner@lbl.gov>
>>>> Sent: 12 August 2015 18:31
>>>> To: Manuel.CARRASCO-BENITEZ@ec.europa.eu
>>>> Cc: phila@w3.org; Mark Harrison; public-dwbp-wg@w3.org
>>>> Subject: Re: Use machine-readable standardized data formats / Use non-proprietary data formats
>>>>
>>>> You're not seriously suggesting people should make data available in WordPerfect format, are you?
>>>> This discussion seems to be wandering into the realm of publishing documents.
>>>>
>>>> --
>>>> Annette Greiner
>>>> NERSC Data and Analytics Services
>>>> Lawrence Berkeley National Laboratory
>>>> 510-495-2935
>>>>
>>>> On Aug 12, 2015, at 7:28 AM, Manuel.CARRASCO-BENITEZ@ec.europa.eu wrote:
>>>>
>>>>> One should have at least the following variants of the resource:
>>>>>
>>>>> - Original     : foo.wp  - WordPerfect 3.0 ~1982, perhaps still processable
>>>>> - Content      : foo.txt - textual, hopefully processable in 100 years
>>>>> - Presentation : foo.tif - TIFF ~1986, perhaps still viewable, might be foo.ps
>>>>>
>>>>> So:
>>>>> - http://example.com/foo - negotiate and give me the best
>>>>> - http://example.com/foo.wp - I can still process WP
>>>>> - http://example.com/foo.txt - I want to process the text, no presentation
>>>>> - http://example.com/foo.tif - I really want to see how the doc looks
>>>>>
>>>>> Regards
>>>>> Tomas
>>>>>
>>>>>> Perhaps the way we can formulate this is to say that some document formats (such as PDF, .doc / .docx and even .xls / .xlsx) are concerned with presentation of information in a particular format or layout and therefore carry a significant amount of typesetting / formatting information overhead in addition to the underlying data. Furthermore, at the time those document-centric formats were developed, ease of access to the underlying data and the unambiguous meaning of specific data fields might not have been the main priority in their design.
>>>>>>
>>>>>> When the main priority is to ensure that the underlying data is available on the web so that others can re-use it, we recommend using simpler data formats such as CSV, TSV, JSON (or better still JSON-LD), RDF or XML.
>>>>
>>>> CONFIDENTIALITY / DISCLAIMER: The contents of this e-mail are confidential and are not to be regarded as a contractual offer or acceptance from GS1 (registered in Belgium).
>>>> If you are not the addressee, or if this has been copied or sent to you in error, you must not use data herein for any purpose, you must delete it, and should inform the sender.
>>>> GS1 disclaims liability for accuracy or completeness, and opinions expressed are those of the author alone.
>>>> GS1 may monitor communications.
>>>> Third party rights acknowledged.
>>>> (c) 2012.
Received on Thursday, 13 August 2015 20:13:04 UTC