Re: Floating Quotable Citations (FQC) from Robert Sanderson on 2013-02-20 (public-openannotation@w3.org from February 2013)

From: Robert Sanderson <azaroth42@gmail.com>
Date: Wed, 20 Feb 2013 11:43:23 -0700
To: David Cuenca <dacuetu@gmail.com>
Cc: public-openannotation@w3.org
Message-ID: <CABevsUFov2MFGXtBwvS3vHA+UEBoGCYgEz2m6kM02Uxe0=xNcQ@mail.gmail.com>

Hi David,

Thanks for sharing the approach!  In Open Annotation terms, it would
be a Selector along the lines of:

_:selector a mw:FQCSelector ;
  mw:fqc "lttia" ;
  mw:start "0,1,5" ;
  mw:end "0,1,0" .

However I'd encourage you to *not* try to do this as a URI Fragment,
as you would be competing with the official specifications of what a
Fragment component of HTML, plain text, XML and PDF resources means.
Within MediaWiki and other conforming implementations you can, of
course, use the query approach.

Some other issues off the top of my head:

* It's hard to determine paragraphs, sentences and words.
-- Paragraphs could be <p>, or <div>, but they might not be.  Perhaps
just <br/><br/> is used to separate the paragraphs.
And that's just HTML, let alone other textual resources.
-- Sentences:  Mr. J. Smith of the U.S.A. took $1.45 from his pocket
... and spent it.   1 sentence or 10?
-- Words: Word splitting is extremely hard in languages written without
spaces between words, such as Chinese or Japanese.

* We stuck with character counting, but even then it's tricky with
normalization routines.  &amp;  -- 1 character or 5?
You have the same issue with length as well.
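
To make both points concrete, here is a minimal Python sketch. It is an
illustration only, not part of Open Annotation or the FQC draft. The
split-on-period rule and the choice of Unicode NFC as the "normalization
routine" are assumptions picked for the example; real tokenizers and
normalizers differ.

```python
import html
import re
import unicodedata

# The example sentence from above.
text = "Mr. J. Smith of the U.S.A. took $1.45 from his pocket ... and spent it."

# Assumed naive rule: a sentence ends at any period followed by whitespace.
naive_sentences = re.split(r"(?<=\.)\s+", text)
print(len(naive_sentences))  # 5 pieces, although a reader sees one sentence
print(text.count("."))       # 10 periods, so "every period ends a sentence" gives 10

# Character counting depends on whether you count before or after decoding.
raw = "&amp;"
decoded = html.unescape(raw)
print(len(raw), len(decoded))  # 5 1

# Unicode normalization (assumed here: NFC) also changes the count.
decomposed = "e\u0301"  # "e" followed by a combining acute accent
composed = unicodedata.normalize("NFC", decomposed)
print(len(decomposed), len(composed))  # 2 1
```

Any offset or length stored in a selector only resolves reliably if the
producer and the consumer agree on which of these forms is being counted.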


Hope that helps!

Rob

On Sun, Feb 17, 2013 at 11:04 PM, David Cuenca <dacuetu@gmail.com> wrote:
> Dear all,
>
> As part of an improvement drive for Wikisource.org, the free library and
> part of the Wikimedia Foundation, we have been thinking of ways to improve
> citations to our text materials and transclusions of the citations as quotes
> into other websites (like Wikipedia). One of our biggest concerns was to
> provide a way to cite text without the need to set anchors. For this we
> developed a new text fragment selector and identification method that we
> called Floating Quotable Citations (FQC). It is still in the drafting phase and
> there is no implementation yet.
>
> The FQC system is based on using a paragraph identification code generated
> with the first letter of each of the first X words of the paragraph (we
> estimate an X between 5 and 15; this needs testing), and then using
> paragraphs, sentences and words as counting units. The method is explained
> in this slideshow:
> https://docs.google.com/presentation/d/1X-Bn_3YC0zrPna08DrzgzeyGNX1k8mC-7UCgqrBIw3A/present#slide=id.p
>
> And the draft v0.4 can be found here:
> https://docs.google.com/file/d/0BzPZCAakZAI3eVF5R1FQcDNJNVE/edit?usp=sharing
>
> We would really appreciate feedback on this. I hope this group is the right
> place to ask, as our paths seem to be very close.
>
> All the best,
> David
>
>
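
The paragraph identification code described in the quoted message (the
first letter of each of the first X words of a paragraph) could be sketched
roughly as follows. This is not taken from the FQC draft: the function name
fqc_paragraph_code, whitespace word splitting, lowercasing, skipping leading
punctuation, and the default X of 5 are all assumptions made for
illustration.

```python
def fqc_paragraph_code(paragraph: str, x: int = 5) -> str:
    """Return the first letter of each of the first x words, lowercased."""
    letters = []
    for word in paragraph.split():  # assumption: words are whitespace-separated
        if len(letters) >= x:
            break
        # Assumption: skip leading punctuation, use the first alphanumeric character.
        first = next((ch for ch in word if ch.isalnum()), None)
        if first is not None:
            letters.append(first.lower())
    return "".join(letters)


sample = "The FQC system is based on using a paragraph identification code"
print(fqc_paragraph_code(sample))  # prints "tfsib"
```

Because the sketch relies on whitespace to find words, it would not work
as written for text in languages that do not separate words with spaces.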

Received on Wednesday, 20 February 2013 18:43:58 UTC