
Notes on discussions so far: Documents, Data, Access

From: Larry Masinter <masinter@adobe.com>
Date: Thu, 26 Oct 2017 18:31:57 +0000
To: "public-pdf-open-data@w3.org" <public-pdf-open-data@w3.org>
Message-ID: <3E996DFA-3AF6-4C64-9C7A-3592613E097F@adobe.com>
I put together some notes about documents and data that I’d like to discuss.

This is an outline of discussion points.
We should look at the use cases to see whether this analysis gives a better way to evaluate open publication.


The 5-star ratings confuse modality with format:
     Data, documents, and access methods all have a place.
     Different modalities have different requirements.

I’d suggest another way to evaluate “Open Publication”,
one that separates the modes:

* Documents should be accessible, transportable,
     searchable, translatable, portable, and in an open format.
     Open format is about current and future tools.
      .docx, .pdf, and .xps are all document formats.
     Documents in source format (LaTeX, etc.) often depend on local context;
      PDF can too (e.g., non-embedded fonts).

    Technology is moving rapidly

     Image scans of paper have a special place
        Good and bad scans
        Printouts of spreadsheets: acknowledge that
             paper has dominated work practice for the last few centuries,
                 and still dominates most of current law
             The world is moving slowly to data
           OCR is improving too, but currently
               does poorly on bad scans of paper tables


* Data should be in a reusable data format
      Extra points for
        - documenting the schema
        - using a standard schema
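One way to read “documenting the schema” is to publish, next to the data, a machine-readable description that tools can check rows against. A minimal sketch in Python, under stated assumptions: the `SCHEMA` dict, the `validate_csv` function, and the sample rows are hypothetical; a real publication would more likely use a standard schema format (for example the W3C CSV on the Web metadata vocabulary) rather than an ad hoc dict.

```python
import csv
import io

# Hypothetical documented schema: column name -> function that parses the cell.
SCHEMA = {
    "region": str,
    "year": int,
    "population": int,
}


def validate_csv(text, schema):
    """Return a list of (line_number, message) problems found in CSV text.

    Line numbers assume no quoted field spans several lines.
    """
    problems = []
    reader = csv.DictReader(io.StringIO(text))
    missing = set(schema) - set(reader.fieldnames or [])
    if missing:
        # Line 0 marks a problem with the header as a whole.
        problems.append((0, f"missing columns: {sorted(missing)}"))
        return problems
    # Line 1 is the header, so data rows start at line 2.
    for line_number, row in enumerate(reader, start=2):
        for column, parse in schema.items():
            try:
                parse(row[column])
            except (TypeError, ValueError):
                problems.append(
                    (line_number, f"{column!r} is not {parse.__name__}: {row[column]!r}")
                )
    return problems


# Made-up sample data, for illustration only.
sample = "region,year,population\nNorth,2016,1200\nSouth,2016,n/a\n"
print(validate_csv(sample, SCHEMA))
# prints [(3, "'population' is not int: 'n/a'")]
```

Using a standard schema instead of a private one is what lets third-party tools do this check without reading the publisher’s code.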


* Data needs explanation: hypertext and web applications are great
    - accessibility and multilingual support are important
    - let people download it as data, and also as a document
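To let people download the same data both as data and as a document, one option is to render a single set of records into both forms. A minimal sketch, assuming the records are a list of Python dicts; `to_csv`, `to_html_table`, and the sample record are hypothetical. The table caption and `scope` attributes are there to help accessibility tools.

```python
import csv
import html
import io


def to_csv(rows, columns):
    """Serialize rows (a list of dicts) as CSV text, for download as data."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=columns, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


def to_html_table(rows, columns, caption):
    """Render the same rows as an HTML table, for reading as a document."""
    lines = ["<table>", f"<caption>{html.escape(caption)}</caption>"]
    header = "".join(f'<th scope="col">{html.escape(c)}</th>' for c in columns)
    lines.append(f"<tr>{header}</tr>")
    for row in rows:
        # Escape every cell so data values cannot inject markup.
        cells = "".join(f"<td>{html.escape(str(row[c]))}</td>" for c in columns)
        lines.append(f"<tr>{cells}</tr>")
    lines.append("</table>")
    return "\n".join(lines)


# Hypothetical sample record.
rows = [{"region": "North", "year": 2016, "population": 1200}]
columns = ["region", "year", "population"]
print(to_csv(rows, columns))
print(to_html_table(rows, columns, "Population by region"))
```

Generating both outputs from one source keeps the document and the data from drifting apart.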


* Hybrid forms of document + data are interesting

     PDF with data attachments,
          if the document explains the schema
     HTML with RDFa or microdata
          Use Schema.org?
     Forms and form data (e.g., publishing tax returns in the US, where Form 1040 is the schema)
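For HTML with microdata, the Schema.org vocabulary supplies the item types and property names (here `Dataset`, `name`, and `dateModified`). The sketch below pulls flat `itemprop` values out of a page with Python’s standard `html.parser`. It is a simplification, not the microdata algorithm in the HTML standard: it ignores nested items, `itemref`, and URL-valued properties, and assumes each property element holds plain text. The class name and sample page are hypothetical.

```python
from html.parser import HTMLParser


class FlatMicrodataParser(HTMLParser):
    """Collect itemprop values from a single, non-nested microdata item.

    Simplified: assumes each property element holds plain text (no child
    tags) and ignores nested itemscope, itemref, and URL-valued properties.
    """

    def __init__(self):
        super().__init__()
        self.item_type = None
        self.properties = {}
        self._current_prop = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if "itemscope" in attrs:
            self.item_type = attrs.get("itemtype")
        prop = attrs.get("itemprop")
        if prop is None:
            return
        if "content" in attrs:
            # e.g. <meta itemprop="..." content="...">: the value is the attribute.
            self.properties[prop] = attrs["content"]
        else:
            # Otherwise the value is the element's text, collected until it closes.
            self._current_prop = prop
            self._text = []

    def handle_data(self, data):
        if self._current_prop is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if self._current_prop is not None:
            self.properties[self._current_prop] = "".join(self._text).strip()
            self._current_prop = None


# Hypothetical page fragment using the Schema.org Dataset type.
page = """
<div itemscope itemtype="https://schema.org/Dataset">
  <span itemprop="name">Regional population</span>
  <meta itemprop="dateModified" content="2017-10-26">
</div>
"""
parser = FlatMicrodataParser()
parser.feed(page)
parser.close()
print(parser.item_type, parser.properties)
```

The same page stays readable as a document in a browser while a crawler or portal can read the labeled values as data.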



* None of the data portals I’ve seen care about the 4th and 5th stars.
   They’re about hybrid forms and a dream, but not very practical,
   except HTML with microdata.

Documents can be doctored or edited, even PDFs.
Best practice should be to give people a way to validate them:
    QR code with a URL to the official site?
    Use digital signatures
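A simple form of validation: the publisher posts a SHA-256 digest of the official file on its site (a QR code could carry that URL), and a reader recomputes the digest of the copy they received. A minimal sketch with hypothetical function names and placeholder bytes; a bare digest only shows that the file matches what the site lists, whereas a digital signature (such as a PDF signature) also ties the document to the signer’s key.

```python
import hashlib
import hmac


def sha256_hex(data: bytes) -> str:
    """Hex-encoded SHA-256 digest of a document's bytes."""
    return hashlib.sha256(data).hexdigest()


def matches_published_digest(data: bytes, published_hex: str) -> bool:
    """True if the document's digest equals the digest listed on the official site."""
    # compare_digest is a constant-time comparison; plain == would also do here,
    # since the published digest is public anyway.
    return hmac.compare_digest(sha256_hex(data), published_hex.strip().lower())


# Placeholder bytes standing in for an official PDF.
original = b"%PDF-1.7 placeholder for the official report"
published = sha256_hex(original)  # what the publisher would post on its site
doctored = original.replace(b"official", b"edited")

print(matches_published_digest(original, published))  # True
print(matches_published_digest(doctored, published))  # False
```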

Larry
--
http://LarryMasinter.net


Received on Thursday, 26 October 2017 18:32:26 UTC
