Saturday, April 5, 2008

Open(ing) ID(=)

The other day I was editing a personal, private portal page I've created for my own use. It has a lot of features I've been experimenting with. One is accelerated login to some services through simplified login forms. Another is search engine forms with a single search field, set up so that just hitting Enter in the field starts the search.

I was checking the proper HTML for internal fragment links. I recalled that a quick reference book I had used the form '<A name="....', while some examples I had on another web page used '<A id="....'. This triggered a chain of thought when I was looking at another page and noticed that 'id="'s were sprinkled all over the place in the HTML. If you don't believe me, just take a look at the source for http://en.wikipedia.org. So could I tack #xxx's onto the end of a URL and have it go to the spot corresponding to an arbitrary piece of HTML in a web page?
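To make the "tack #xxx on the end" idea concrete, here is a minimal sketch of building such an address. The post doesn't say what language the original script used, so Python and the helper name `fragment_url` are assumptions. The function drops whatever fragment a URL already carries and appends a new one. The id value `searchInput` is only an illustrative guess; check the page source for the ids that are actually present.

```python
from urllib.parse import quote, urldefrag


def fragment_url(page_url: str, anchor: str) -> str:
    """Return page_url pointing at the element whose id (or A name) is `anchor`.

    Any fragment already on page_url is dropped first, so repeated
    calls don't pile up '#a#b' suffixes.
    """
    base, _old_fragment = urldefrag(page_url)
    # Percent-encode characters that aren't safe in a fragment (e.g. spaces);
    # ordinary id values like "searchInput" pass through unchanged.
    return base + "#" + quote(anchor, safe="-._~:")


if __name__ == "__main__":
    # Hypothetical id value, used only for illustration.
    print(fragment_url("http://en.wikipedia.org/#top", "searchInput"))
```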

The answer for Lynx was yes, and some checking seemed to indicate that the same held true for Firefox as well. Everything up to this point may seem obvious to someone with more in-depth knowledge of HTML. My next thought, though, was that I'd like to make it easier to bookmark a page with a more specific location added to the address as an HTML fragment.

A few hours of work on the idea turned out a modest-sized script that can be used as an EXTERNAL for Lynx. It strips out the name= and id= fields of a web page and reformats them into #<name> and #<id> fragments on the URL. It then wraps both lists in appropriate HTML and pumps them into Lynx for browsing/bookmarking. When you're finished, just exit Lynx to resume where you left off. An example of what gets kicked out when you run it on http://www.google.com:
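The original script isn't shown in this post. What follows is a rough sketch of the same steps, not the author's code, and it rests on three assumptions. It is written in Python. The parser class `AnchorCollector` and the helpers `fragment_links` and `build_page` are hypothetical names. Lynx is assumed to hand the current URL to the EXTERNAL command as its first argument. The sketch fetches the page, collects every name= and id= value, builds a small HTML page of fragment links, and opens it in Lynx.

```python
import html
import os
import subprocess
import sys
import tempfile
import urllib.request
from html.parser import HTMLParser
from urllib.parse import quote, urldefrag


class AnchorCollector(HTMLParser):
    """Collect every name= and id= attribute value, in document order."""

    def __init__(self):
        super().__init__()
        self.names = []
        self.ids = []

    def handle_starttag(self, tag, attrs):
        # Self-closing tags such as <input ... /> also end up here,
        # because HTMLParser.handle_startendtag calls handle_starttag.
        for attr, value in attrs:
            if not value:
                continue  # skip bare or empty attributes
            if attr == "name":
                self.names.append(value)
            elif attr == "id":
                self.ids.append(value)


def fragment_links(page_url, values):
    """Turn each value into an <li> linking to page_url#value (duplicates dropped)."""
    base, _ = urldefrag(page_url)
    items = []
    for value in dict.fromkeys(values):  # de-duplicate, keeping first-seen order
        href = base + "#" + quote(value, safe="-._~:")
        items.append('<li><a href="%s">#%s</a></li>'
                     % (html.escape(href), html.escape(value)))
    return "\n".join(items)


def build_page(page_url, names, ids):
    """Wrap both fragment lists in a small HTML page for Lynx."""
    return ("<html><head><title>Fragments of %s</title></head><body>\n"
            "<h2>id= fragments</h2>\n<ul>\n%s\n</ul>\n"
            "<h2>name= fragments</h2>\n<ul>\n%s\n</ul>\n"
            "</body></html>\n"
            % (html.escape(page_url),
               fragment_links(page_url, ids),
               fragment_links(page_url, names)))


def main(page_url):
    with urllib.request.urlopen(page_url) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        source = resp.read().decode(charset, errors="replace")
    collector = AnchorCollector()
    collector.feed(source)
    collector.close()
    page = build_page(page_url, collector.names, collector.ids)
    with tempfile.NamedTemporaryFile("w", suffix=".html", delete=False,
                                     encoding="utf-8") as tmp:
        tmp.write(page)
    try:
        # Browse/bookmark the list; quitting Lynx returns to the caller.
        subprocess.run(["lynx", tmp.name])
    finally:
        os.remove(tmp.name)


if __name__ == "__main__":
    if len(sys.argv) == 2:
        main(sys.argv[1])
    else:
        print("usage: %s URL" % sys.argv[0], file=sys.stderr)
```

Names that repeat on a page are listed only once. Radio buttons sharing a single name are a common case. Quitting this nested Lynx drops you back where you were, which matches the behavior described above.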

This may not seem too important on such a deliberately simplified page. It makes it much easier, though, to bookmark something like the search field on http://en.wikipedia.com. One weakness is that most browsers make it relatively hard to bookmark such a specific location in a page, so few people link to these fragments. That makes them probably more subject to change with the whim of the HTML coder and/or their coding tools.

'name=' didn't seem to be noticed by the browsers I checked. It was used in the example in the reference guide I looked at, though, so its handling may change in the future (or may have changed in the past). I left it in the script. Some example pages I checked set 'name=' and 'id=' to the same value on many HTML elements. Glancing through the quick reference on HTML, I found several elements that specifically allowed the 'id=' attribute: <A>, <DIV>, <LAYER>, <PARAM> and <SPAN>. On closer inspection, many elements listed '%coreattrs'. When I looked that up in the guide, it included 'class', 'style' and 'title' besides 'id'. %coreattrs pretty much busts wide open the set of elements that can have 'id' attributes: <FORM>, <UL>, etc. I used this to simplify the HTML in the portal page that triggered this discussion. I rolled the fragments for internal navigation into lists that group related links/forms.
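Because %coreattrs lets 'id' appear on nearly any element, a quick way to see how widely ids are spread across a page is to group them by the element that carries them. The following is a minimal sketch, assuming Python and a hypothetical `ids_by_tag` helper. The sample markup is invented for illustration and isn't taken from the portal page.

```python
from collections import defaultdict
from html.parser import HTMLParser


class IdCollector(HTMLParser):
    """Group id= values by the (lower-cased) element name that carries them."""

    def __init__(self):
        super().__init__()
        self.by_tag = defaultdict(list)

    def handle_starttag(self, tag, attrs):
        # HTMLParser lower-cases tag and attribute names, so <UL ID=...> matches too.
        for attr, value in attrs:
            if attr == "id" and value:
                self.by_tag[tag].append(value)


def ids_by_tag(source):
    parser = IdCollector()
    parser.feed(source)
    parser.close()
    return dict(parser.by_tag)


if __name__ == "__main__":
    # Invented markup: ids on <UL>, <FORM>, <DIV> and <SPAN>, not just <A>.
    sample = """
    <UL id="search">
      <LI><FORM id="wp"><INPUT name="search"></FORM></LI>
    </UL>
    <DIV id="notes"><SPAN id="note1">reminder</SPAN></DIV>
    """
    for tag, ids in sorted(ids_by_tag(sample).items()):
        print(tag, ids)
```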

This is also an example of a subject I've railed about in person on occasion: web pages are subject to different interpretations ("renderings") depending on circumstances and the goals of the 'viewer'. This extreme view provides a list of everything I can think of that could be linked to on a web page.

I've pitched the script temporarily at: http://www.lafn.org/~aw585/microBkMrk

7 April 2008 Addendum:

    A few other points about this topic:
  • Since I suspect that the fragment addresses will be more subject to change than a base address for a page, it may be more important to monitor pages bookmarked/linked to in this manner for changes by whatever means.
  • Conversely, this is yet another view of a web page that can be hashed/'fingerprinted' to monitor the page for changes.
  • It might be useful to dump out a few more attributes, and perhaps the element each ID or NAME belongs to, to provide a more human-readable overview of the page. Many pages are generated by tools that give the author only a very abstract, high-level relation to the HTML. In other words, he never actually saw the HTML, and no thought was given to anyone needing to look at it. This provides one view that summarizes features of the page. There are probably quite a few features that could be added to this tool.
  • From an accessibility/usability viewpoint, this might save repeated thrashing with a page. Someone might fight it out or get help the first time. After that, once they have found a precise spot they can bookmark, link to, or just go to with a #<fragment> address, they can effectively reuse the work of initially understanding how to navigate the page. This isn't necessarily the result of a poorly designed page. In some instances it could merely be due to the nature of the information being presented or asked for. The full implications of this remain to be seen.
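The fingerprinting and more-readable-overview ideas in the list above could be combined roughly as follows. This is a sketch under several assumptions. It uses Python, the helper names `overview` and `fingerprint` are hypothetical, the hash is SHA-256, and 'title' is the one extra attribute kept. Entries are sorted before hashing, so a mere reordering of elements leaves the fingerprint unchanged. Adding, removing or renaming an anchor, or changing its element or title, produces a different fingerprint.

```python
import hashlib
from html.parser import HTMLParser


class Overview(HTMLParser):
    """Record (tag, kind, value, title) for every element with id= or name=."""

    def __init__(self):
        super().__init__()
        self.entries = []

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        title = a.get("title") or ""  # bare attributes come through as None
        for kind in ("id", "name"):
            if a.get(kind):
                self.entries.append((tag, kind, a[kind], title))


def overview(source):
    parser = Overview()
    parser.feed(source)
    parser.close()
    return parser.entries


def fingerprint(entries):
    """SHA-256 over the sorted entries, so element order alone doesn't matter."""
    h = hashlib.sha256()
    for tag, kind, value, title in sorted(entries):
        h.update(("%s\t%s\t%s\t%s\n" % (tag, kind, value, title)).encode("utf-8"))
    return h.hexdigest()


if __name__ == "__main__":
    old = overview('<form id="searchform" title="Search"><input name="q"></form>')
    new = overview('<form id="search-form" title="Search"><input name="q"></form>')
    for tag, kind, value, title in new:
        print("<%s> %s=#%s %s" % (tag, kind, value, title))
    print(fingerprint(old) == fingerprint(new))  # False: the form's id was renamed
```

Storing the previous fingerprint alongside a bookmark would be one cheap way to notice when its fragment may have gone stale.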
