Re: PDF and "CSV for the Web"

Yes, the PDF Fragment syntax has already been extended to support file attachments for exactly this type of use case.

Leonard

On 9/7/16, 12:11 PM, "Larry Masinter" <masinter@adobe.com> wrote:

    I had a good discussion yesterday with Gregg Kellogg (new group member) and I thought I would report it.
        
    Gregg worked on the CSV-on-the-web working group, and in some ways we’re trying to do for PDF what the CSV group did for CSV: find a way to make PDF data five-star open data.
    
    There is all kinds of data one might want to get out of a PDF, but for lots and lots of use cases, the important data is in tables.
    CSV (comma-separated values) is a common, simple way of communicating the values in a table, and it can be read directly into a spreadsheet.
        
    The CSVW group defined a way of representing the metadata you need to know to transform the data in the CSV file into RDF triples.  
    https://www.w3.org/standards/techs/csv
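
To make the idea of "metadata that turns CSV into triples" concrete, here is a minimal Python sketch. It is not the CSVW conversion algorithm: the `METADATA` dictionary is a simplified stand-in for a real CSVW metadata document, and the sample rows, the `example.org` URLs, and the `csv_to_triples` helper are all assumptions made up for illustration.

```python
import csv
import io

# Hypothetical sample data, loosely modeled on a tree inventory.
CSV_TEXT = (
    "GID,on_street,species\n"
    "1,ADDISON AV,Celtis australis\n"
    "2,EMERSON ST,Liquidambar styraciflua\n"
)

# Simplified stand-in for CSVW metadata (an assumption, not the real schema):
# a subject URL template plus one property URL per mapped column.
METADATA = {
    "aboutUrl": "http://example.org/tree/{GID}",
    "columns": {
        "on_street": "http://example.org/vocab#onStreet",
        "species": "http://example.org/vocab#species",
    },
}


def csv_to_triples(csv_text, metadata):
    """Return a list of (subject, predicate, object) tuples, one per mapped cell."""
    triples = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        # Fill the subject template from the row's own values.
        subject = metadata["aboutUrl"].format(**row)
        for column, predicate in metadata["columns"].items():
            triples.append((subject, predicate, row[column]))
    return triples


TRIPLES = csv_to_triples(CSV_TEXT, METADATA)
for triple in TRIPLES:
    print(triple)
```

The point of the sketch is only that the CSV carries the values while the metadata supplies the subject and property identifiers; a real CSVW processor also handles datatypes, URI templates, and much more.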
    
    Gregg developed a Note about embedding CSV inside HTML
    http://www.w3.org/TR/csvw-html/

    for the same kinds of reasons… keep the data with the report that describes it, keep existing workflows which have grown up around having a single file.
    
    So: suppose, for each table in a PDF file with (useful) data in it, we add an attachment of CSV and JSON metadata. (There’s some question of which points to which, or if you could have multiple CSV fragments for one table, and some issue of what URL to use to get to parts of the CSV file, but these seem workable.)
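
One way the pairing might look, sketched in Python under stated assumptions: the metadata points to the CSV through its `url` property, and the metadata file is named after the CSV with a `-metadata.json` suffix (the same pattern as `tree-ops.csv-metadata.json`). The `build_table_attachments` helper and the `table1` identifier are hypothetical, and how such names would resolve inside a PDF is exactly the open question mentioned above.

```python
import json


def build_table_attachments(table_id, csv_text, columns):
    """Return {attachment_name: bytes} for one table: the CSV plus its metadata.

    Hypothetical helper: the attachment names and the choice to have the
    metadata point at the CSV (rather than the reverse) are assumptions.
    """
    csv_name = f"{table_id}.csv"
    metadata_name = f"{csv_name}-metadata.json"
    metadata = {
        "@context": "http://www.w3.org/ns/csvw",
        # Relative reference from the metadata to the CSV attachment.
        "url": csv_name,
        "tableSchema": {
            "columns": [{"name": name, "titles": name} for name in columns]
        },
    }
    return {
        csv_name: csv_text.encode("utf-8"),
        metadata_name: json.dumps(metadata, indent=2).encode("utf-8"),
    }


attachments = build_table_attachments("table1", "a,b\n1,2\n", ["a", "b"])
for name, payload in attachments.items():
    print(name, len(payload), "bytes")
```

The resulting byte strings could then be embedded with whatever PDF library is in use; that step is left out here because it depends on the tool.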
        
    Gregg has some web utilities that do useful things with RDF and CSV. One takes the URI of a CSV file and produces other formats.
    
    The CSVW github repo has a lot of examples and test cases.
    (under http://w3c.github.io/csvw/)
    
    Lots of the samples are based on “Palo Alto trees” (one of the CSVW use cases).
        
    For example
        https://raw.githubusercontent.com/w3c/csvw/gh-pages/examples/tree-ops.csv-metadata.json

        has the metadata for tree-ops. 
        
        I am curious as to whether the PDF fragment identifier syntax could be extended to allow pointers into file attachments. 
        See https://tools.ietf.org/html/draft-hardy-pdf-mime-04.pdf 
    
    We’re going to get together Friday afternoon in San Jose and I hope we can talk about the charter and see how far we can get with the PDFData tools https://github.com/Aiybe/PDFData.

    If you’d like to join, let me know.
    
    Larry
    --
    http://larry.masinter.net

Received on Thursday, 8 September 2016 05:16:09 UTC