
Cyberspace analogous to set of all sets: URIs & URLs

From: <drewangel@adelphia.net>
Date: Mon, 20 Sep 2004 7:05:33 -0400
To: public-webarch-comments@w3.org <public-webarch-comments@w3.org>
CC: <drewangel@adelphia.net>,<w3c@drewangel.com>
Message-Id: <20040920110533.JVAV9978.mta13.adelphia.net@mail.adelphia.net>

Cyberspace itself should be considered analogous to the set 
of all sets.  
  
Sunday, 19 September 2004 (2004'262.6405): this arrives after the 16 September 2004 deadline to comment.
This subject is too complex for a full response in the short period I had (I saw the notice about two weeks ago), but the following contains my general thoughts. They are that Cyberspace should be considered a practical model of the Universe, although, as with most models, its form is extremely different from the object modeled. My normal interests lie more in the direction of the HTML standards, but I definitely do think the Internet addressing scheme is too small: it should allow for numerically distant categorizations that might be humanly discernible.
 
Basically, the system should be able to log every single human thought, every idea, every photo, every frame of every video. It is not inconceivable that some people will think that even every pixel of every frame, and every sample of every audio clip (i.e. ~50k samples per second), should be addressable through Cyberspace.
 
 
That is a much larger view than is normally taken, but THE SYSTEM should have MORE THAN ENOUGH ROOM to grow in the foreseeable future, and should be manageable as subsets, some of which will be compatible with the current system.
 
 
The world's a big place: it is not going to get smaller.  
 
 
Cyberspace itself, therefore, should perhaps be considered analogous to the set of all sets.
  
 
USING FLOATING POINT URI ADDRESSING  
  
  
In particular, I think the address space should immediately be extended to a larger space "between the dots." As an interim fix, each segment could be allowed to run to six or eight digits without any other gigantic changes.
 
HOWEVER,
if the dot (= decimal point) is not the delimiter used, then floating point numbers can be used, basically solving all troubles with respect to address space size. That would allow machine-limited infinitesimal expansions on the right side of integer addresses, and enable expansions on the left as well. The obvious hindrance is the use of decimal points. The clear choice is to pick a new address delimiter, a glyph without the conflicts of the DOT.
 
 
A good choice might be binary value 00000100 = ASCII code 04 (commonly represented as a diamond glyph).
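As a sketch of the proposal (purely illustrative; the ASCII 04 delimiter and the fractional-segment scheme are my suggestion above, not any existing standard), addresses could then carry floating-point segments, leaving room "between" every pair of existing integer addresses:

```python
# Hypothetical addressing sketch: segments delimited by ASCII 0x04 instead
# of ".", so each segment can itself be a floating-point number, allowing
# infinitesimal expansion to the right of existing integer addresses.
DELIM = "\x04"  # ASCII 04, often rendered as a diamond glyph

def parse_address(addr: str) -> list[float]:
    """Split a delimiter-separated address into numeric segments."""
    return [float(seg) for seg in addr.split(DELIM)]

def format_address(segments: list[float]) -> str:
    """Join numeric segments back into an address string."""
    return DELIM.join(repr(s) for s in segments)

# An address "between" 192 and 193 in the first segment becomes possible:
addr = format_address([192.5, 168.0, 1.25])
print(parse_address(addr))  # [192.5, 168.0, 1.25]
```

Nothing about this round-trips through today's DNS, of course; the point is only that once the dot is freed from delimiter duty, fractional segments parse without ambiguity.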
 
 
The expedient of the decimal point combined with fixed point integer values for internet addresses was settled on even before CRT screens were common in computing. It is time to grow up.
 
 
That there would be many unused addresses is true, as indeed there are now, but in the future things like DNA samples and other testing results will be part of the general database. That means numerous information items could refer to the identically same item; there could even be identically similar reports about the same sample, referring to a potentially identical source, or not. All of that could come from a single drop of spit or a single hair, of which there could be numerous similar items in a single scientific study or legal proceeding.
  
 
 
The future must contain a virtually infinite URL/URI address space, perhaps on the order of 10^50 items or more (common estimates put the number of atoms in the observable Universe nearer 10^80). But it is important, in my view, that the cyberspace model should be able to CONTAIN the universe, logically, even if it is never fully filled with actual atoms. The very idea of SPACE indicates distance between particles, and in any cyberspace there should be room for plenty of space between addresses.

A real problem of viewpoint is that regardless of the large but limited number of atoms, or even particles, that figure is a one-dimensional parameter. It neglects the true nature of the universe, where all those particles change relative to one another every instant, and where the true measure is units of action. That makes the number of particles simply the basic unit of measure for the action of the Universe. Thus when figures like 10^43 are taken as markers for the most of something, the crucial element of the flow of time as a series of actions is neglected. Cyberspace, as a space, must, for the future, be seen more as a model of the universe than merely a static collection of discrete samples of any particular size.
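For scale, the arithmetic on such an address space is simple (the 10^50 and 10^80 figures are the ones discussed above; the comparison to IPv4/IPv6 widths is mine):

```python
import math

# How many bits would an address space of a given size need?
def bits_for(count: int) -> int:
    """Minimum number of bits able to index `count` distinct addresses."""
    return math.ceil(math.log2(count))

print(bits_for(10**50))  # 167 bits -- far beyond IPv4's 32 or IPv6's 128
print(bits_for(10**80))  # 266 bits -- roughly atom-count scale
```

Even the larger figure fits comfortably in a few hundred bits, so the obstacle is notation and convention, not raw capacity.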
 
"The World Wide Web is an information space of interrelated 
resources. This information space is the basis of, and is 
shared by, a number of information systems. Within each of 
these systems, people and software retrieve, create, 
display, analyze, relate, and reason about 
resources." {http://W3.org}  
  
 
   
 
 
OTHER SOMEWHAT UNRELATED COMMENTS written earlier.  
 
 
There are a number of problems involved with the practical use of such a system. When it is assumed that authors should be writing their documents online in real time, there is a danger that incomplete documents may be observed, producing problems of misleading interpretations, copyright and idea theft, and what one might call the glass bathhouse effect: making authors feel they are improperly exposed during their creative processes.
  
 
 
For others, that seems no problem, because there is authoring software that can automatically convert references from relative links to URI forms. But then the author has lost a certain amount of control over the work. For example, many such software tools also automatically "pretty print" the work in progress, burdening its format, before posting the files to the web via FTP.
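The relative-to-absolute conversion such tools perform is standard URI reference resolution. A minimal sketch, with a hypothetical base URL, using Python's standard library:

```python
from urllib.parse import urljoin

# Sketch of what such authoring tools do: rewrite relative links into
# absolute URIs against the page's base URL (base URL is hypothetical).
base = "http://example.org/articles/draft.html"

def absolutize(href: str) -> str:
    """Resolve a relative reference against the page's base URL."""
    return urljoin(base, href)

print(absolutize("../images/fig1.png"))  # http://example.org/images/fig1.png
print(absolutize("#section2"))  # http://example.org/articles/draft.html#section2
```

The loss of control I describe follows directly: once every link is frozen to one base, moving the document means rewriting it.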
  
 
 
In my view automatic pretty printing destroys much of the utility of HTML. It is a "markup language," not a symbolic coding language.
  
 
 
Pretty printing mostly adds white space, either spaces or tabs. Paragraphs were developed as a style over centuries of writing.
  
 
 
However, what is often lost to too many automated formatting tools is the original paragraph design of the text, hidden by <SPAN>s, <FONT SIZE= FACE=> declarations, and what not.
  
 
 
The invention of CSS (style sheets) seems to have failed to make type specifications simpler and less intrusive in actual documents, and instead merely invented some new professions, perhaps called "stylist."
 
 
If one program worked perfectly, everyone would use it, provided it were conveniently inexpensive. The StarOffice/OpenOffice.org office products were headed in the right direction for a while, but seem to have gotten snafued. There is no product that allows tight production of decent HTML text and also does page formatting in frameset contexts. It is insane to have to go back to a "text editor" to fix broken links and "detail" pages (yes, "detail" as in car washing).
 
 
If a document is properly formatted, the inclusion of HTML markup <TAGS> merely makes the document more readable. When the document is burdened by artificial symbols for common textual symbols, and by excessively repetitious font declarations, the underlying text may be lost entirely to examination by ordinary human readers, and by search engines as well. The text is no longer "marked up"; it is altered!
 
  
 
For example, the "&nbsp;" non-breaking space token is one of the most obnoxious devices ever invented. It definitely could have been designed as an encapsulated tag, such as <nb=#>, where the number sign "#" is a parameter specifying the number of spaces, from 0 to any number. When the number of spaces should be 1, no argument would be needed, and thus this tag, <nb>, typically would be two characters shorter than "&nbsp;".
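To illustrate the proposed (entirely hypothetical, non-standard) <nb=#> tag, here is a small converter that collapses runs of "&nbsp;" into the compact form:

```python
import re

# Illustration of the hypothetical <nb=#> tag proposed above (not part of
# any HTML standard): collapse each run of "&nbsp;" into one compact tag.
def compact_nbsp(html: str) -> str:
    def repl(m: re.Match) -> str:
        n = len(m.group(0)) // len("&nbsp;")  # how many entities in the run
        return "<nb>" if n == 1 else f"<nb={n}>"
    return re.sub(r"(?:&nbsp;)+", repl, html)

print(compact_nbsp("a&nbsp;b"))              # a<nb>b
print(compact_nbsp("a&nbsp;&nbsp;&nbsp;b"))  # a<nb=3>b
```

The savings grow with the run length: three entities (18 characters) become six.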
  
 
 
For advanced browsers, even fractional and negative (typeover) spaces might be practical, for example to make accents, math symbols, and strike-through print (although the <strike> tag works fine for most uses, such as legal documents, there are times when one might want double strike-throughs using equal signs, or maybe xxxxx's), and even drop shadows might be feasible. The <nb=#> might need to be augmented by another tag, <sp=#>, to allow similar spacing without the non-breaking feature (that is, one might say, "with wrap-around"). These tags could be used much as "tab stops" are used in other document production tools.
 
 
 
The use of "&nbsp;" ("non-breaking space"), which prevented some search engines from finding the words following it, was evidently a subversion of the basic intention of making documents more easily accessible, and it certainly makes reading HTML more difficult in many instances. It is virtually impossible to attribute these errors of design merely to poor judgement, when there were better and easily defined alternatives.
 
 
 
Similarly, the use of "&quot;" for double quote marks was an egregious offence against all writers of English, and possibly of other languages. Comparing, or being forced to search for, two items such as "egregious offences" versus "&quot;egregious offences&quot;" is obviously never going to work properly in all cases, if only because most people won't even think of it; and yet that is what many users of the World Wide Web are faced with, if only they knew.
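One workaround for the mismatch described above (my suggestion, not anything the Web requires) is to normalize entity references before comparing or indexing, which Python's standard library can do:

```python
import html

# Normalize character entity references before comparing or indexing,
# so "&quot;" in source text matches a literal double quote mark.
def normalize(text: str) -> str:
    return html.unescape(text)

a = '"egregious offences"'
b = "&quot;egregious offences&quot;"
print(normalize(a) == normalize(b))  # True
```

A search engine that indexed the normalized text would find the quoted phrase either way; the burden simply should not have fallen on the reader.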
 
 
 
In many situations the most common method of identifying a title has long been to enclose it in quotes if it is an "article" or "short story," and to italicize it if it is the title of a book. Perhaps the developers of the Web did not have to attend grammar school, being enrolled in University by age 12; but presuming it was technical difficulties, we can say that, regardless of cause, some wrong answers have been published.
 
 
 
Another problem is that some explicitly required parameter arguments demand instantaneous access to the internet (such as the definition statements used for variants of HTML and XHTML). This may be essential for some purposes, but it presupposes 1) that internet access is always available, and 2) that anyone using such documents can be tracked down through their necessity to link to W3.org for specifications.
 
 
 
There has always been a problem between computer programming and general computer document creation. The tendency exists for programmers to presume that end users should be exposed to the same kinds of messages needed during development. The only area where this does not seem to be true is the gaming industry. It certainly is a problem in internet activities.

grumpy@DrewAngel.com
 
 
  
 
  
 
sending ~ Mon 2004'263.16501 @03:58 PDT September 20th
Pacific Daylight Time
 
  
Received on Wednesday, 22 September 2004 02:37:43 UTC

This archive was generated by hypermail 2.3.1 : Tuesday, 6 January 2015 20:26:47 UTC