Thursday, June 26, 2008

Lynx HTTP EXTERNAL Menu Script

One thing keeping my friend Larry from getting the most out of my usnatch program and Lynx externals is the awkwardness he has in dealing with the EXTERNAL selection menus. He is blind, and this, coupled with the long, complex URLs (in many cases containing stretches of random hash characters) that are passed to the Lynx EXTERNAL menu for special handling, makes it hard for him to separate out the choices the menu presents. He literally can't wade through the URLs to get to the menu selection options. This is different from many of the other Lynx menus, where simple, brief descriptions of the choices are given, without interjecting the names of temporary files and such into the options to confuse the issue.

To try to deal with this problem, I present a bash script below that replaces a collection of Lynx EXTERNAL entries for http with a single entry that presents a simplified menu. This whole idea is a variation on what I documented in my article Lynx/Kermit Coordination Part I, hosted by the Kermit Center at Columbia University. This script will have to be customized, and a separate one would have to be created for other protocols such as ftp, ssh, irc and such. I've put extensive notes in this script, since I think it makes explanation simpler if the notes are close to the subject instead of before or after in this blog's narrative. Also, the people using it might not be familiar with bash, so I try to explain as much as I can, so they can knowledgeably change it to their needs.

To install this script, first make sure your version works from the command prompt. Then comment out all the

EXTERNAL:http:....

statements in your lynx.cfg file, both any in your home directory and those in /etc/ and its subdirectories, adjust the script to include those possibilities you want in it, and put in a single EXTERNAL statement:

EXTERNAL:http:extern.menu %s:TRUE

Where extern.menu is taken to be the name of the menu script.
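
The commenting-out step can be done by hand in an editor. As a rough sketch, assuming the EXTERNAL lines begin at the start of a line (possibly after blanks), a small sed filter could do the same job. The function name comment_out_http_externals is made up for this illustration, and the location of your lynx.cfg varies from system to system.

```shell
#!/usr/bin/env bash
#  Hypothetical helper, not part of Lynx:
#  reads a lynx.cfg on standard input and writes a copy
#  with every active EXTERNAL:http: line commented out.
comment_out_http_externals() {
  sed 's/^\([[:space:]]*\)\(EXTERNAL:http:\)/\1#\2/'
}

#  Possible usage (the paths are assumptions, adjust to your system):
#    comment_out_http_externals < ~/lynx.cfg > ~/lynx.cfg.new
```

Lines for other protocols, and lines that are already commented out, pass through unchanged, so the result can be compared against the original with diff before it is put in place.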

I've put a copy of this script online at a Yahoo group for Larry's friends, http://groups.yahoo.com/group/larryhsfriends/


#! /usr/bin/env bash
#  #!/usr/bin/bash  -

#  '#! /', env per
#  'bash Cookbook', 1st edition, recipe #15.1, p. 321
#  "Finding bash Portably for #!"
#  http://www.bashcookbook.com
#
#  trailing '-' per
#  'bash Cookbook', 1st edition, recipe #14.2, p. 283
#  "Avoiding Interpreter Spoofing"

#  the first line convention points to the interpreter to be used.

#  Some bash conventions for beginners:
#
#  hash/pound/sharp/octothorpes to the end of line are comments.
#
#  Lines ending exactly with a backslash, '\', are continued
#  with the next line.
#  Trailing whitespace after the backslash can cause errors,
#  so beware of it.
#
#  I urge you to look up aspects of this script,
#  either with man bash, or if you are running bash
#  at your command prompt, with the help command
#  such as 'help case' or 'help select'.

URL="${1}"  ;

PS3='What EXTERNAL action do you want? '
#         -- defaults to '#?'
#            trailing blank desirable for spacing of response
#            from prompt
#            PS3 is the Select Menu Loop prompt in bash

#  Note 'for' statement like syntax of select:

select external_action in       \
      'Quit externals                                              '          \
      'Print the URL'           \
      'USnatch'                 \
      'lynxvt'                  \
      'screened-lynx'           \
      'javascript-links2'       \
      'graphical-links2'        \
      'W3M'                     \
      'links'                   \
      'elinks'                  \
      'Blogspotviewer'          \
      'lynx-noreferer'          \
      'lynx-nofilereferer'      \
      'Privoxy Control Panel'   \
      'Microbookmarker'         \
      'wget'                    \
      'Bug-Me-Not'              \
      'whomis'                  \
      'nspeep'                  \
      'pingvolley'              \
      'lynxtab'                 \
      'lynxtab-blank'           \
      'lynx-blank'              \
      'frys'                    \
      'bash'

#    Note: the last select menu item should not have a trailing
#    '\' to continue on to the next line.
#    I've placed each menu item or quoted phrase on a separate
#    continuation line.
#    At this point, it is normally desirable to have one menu item
#    for each of the case statement stanzas below.
#    They don't need to be in order.
#    There just has to be a menu prompt item
#    and a case "whatever )" that match
#    so the case stanza can be triggered.
#    Be careful in using '*' 'splat' patterns,
#    that you do not unintentionally match more than
#    the desired pattern by mistake.
#    This could accidentally short circuit desired action,
#    keeping a case stanza from being taken when wanted,
#    and causing another to be taken instead.
#    If all the menu items are short, select will try to put
#    them in multiple columns.
#    Only one item needs to be wide, (which can be because of trailing
#    blanks) to trigger single column menu display.
#    In this case, I used the 'Quit' item to do this with,
#    since it is basically matched for the most part with a '*'
#    keeping the line in the case statement a reasonable length.
#    The menu items are only needed to clue the user on the
#    significance of each number chosen,
#    and to tie that to the case stanza.
#    Alternatively, you can keep the menu items in a strict order,
#    and use 'REPLY' for the case variable, in which case
#    each case stanza would be picked on the basis of the
#    number selected from the menus instead of patterns
#    that match the menu item strings.
#    In this case, ordering is critical.

#    Many of these actions are special purpose scripts I have
#    written and are included here just to provide a realistic
#    example.
#    The size is probably excessive for some people.
#    This script will certainly have to be customized to your
#    personal needs.

do

  case ${external_action} in

  #  item between 'case' and 'in' undergoes
  #  several levels of evaluation before
  #  the case statement is finally executed.

  #  Case stanzas start with
  #  string_to_match )
  #  list of actions ;
  #  break  ;   # include a break statement to break out of the
  #             # select menu loop
  #  ;;    #  double semicolons  end actions and stanza
  #

  Q* )
    echo 'Returning to browsing'  ;
    break  ;
    #  Thumb rule in this sort of script is
    #  that a break is needed whenever there
    #  is no exec statement in a case stanza.
    #  This stops the select loop after a decisive action.
    ;;

  P* )
    read -p "The URL in question: ${URL} "  TRASH  ;
    break  ;
    ;;

  USnatch )
    exec    usnatch  ${URL}  -i -p  ;
    #  break is not needed after an exec statement
    #  because this script's process, including
    #  the select loop is replaced
    #  by the action of the exec statement,
    #  ending the select loop.
    #  You could put a break after each exec,
    #  it would probably be excessively cautious,
    #  since it would only be executed under bizarre conditions.
    ;;

  lynxvt )
    #  exec    lynxvt  ${URL}  &  ;
    exec    lynxvt  ${URL}    ;
    ;;

  screened-lynx )
    screen -t 'lynx ...' lynx   \
        -useragent='Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0)'  \
        ${URL}    ;
    break  ;
    ;;

  javascript-links2 )
    screen -t 'jlinks'  \
        links2 -enable-javascript 1 -html-numbered-links 1  \
        ${URL}    ;
    break  ;
    ;;

  graphical-links2 )
    sudo links2 -g -driver svgalib -mode 640x480x256   \
        -enable-javascript 1 -html-numbered-links 1    \
        ${URL}    ;
    break  ;
    ;;

  lynx-noreferer )
    exec  lynx -noreferer=off -tna ${URL}    ;
    ;;

#    start of a typical case stanza:
  lynx-nofilereferer )
    exec  lynx -nofilereferer=off -noreferer=off -tna ${URL}  ;
    ;;
# end of the typical case stanza

  Privoxy* )
    screen -t 'Privoxy Control Panel'   \
      lynx -nofilereferer=off -noreferer=off  \
      -tna 'http://config.privoxy.org'   ;
    break  ;
    ;;

  Microbookmarker )
    exec  microBkMrk ${URL}  ;
    ;;

  Blogspotviewer )
    exec  blogspoter ${URL}  ;
    ;;

  [Ww]3[Mm] )
    exec  w3m ${URL}  ;
    ;;

  links )
    exec  links ${URL}  ;
    ;;

  elinks )
    exec  elinks ${URL}  ;
    ;;

  wget )
    exec  nohup wget  --background -a ~/wget.log -P /mnt/hda8/  ${URL}  ;
    ;;

  Bug-Me-Not )
    screen -t Bug-Me-Not   \
       lynx -cookies "www.bugmenot.com/view.php?url=${URL}"  ;
    break  ;
    ;;

  whomis )
    exec  whomis  ${URL}  ;
    ;;

  nspeep )
    exec  nspeep  ${URL}  ;
    ;;

  pingvolley )
    exec  pingvolley  ${URL}  ;
    ;;

  lynxtab )
    exec  lynxtab  ${URL}  ;
    ;;

  lynxtab-blank )
    exec  lynxtab    ;
    ;;

  lynx-blank )
    exec  env -u HTTPS_PROXY lynx -tna -accept_all_cookies    ${URL}  ;
    ;;

  frys )
    #  to send pdf's straight to a printer
    #  this script has mainly been used for Fry's Electronics
    #  online version of their newspaper ads.
    exec  lprfrys  ${URL}  ;
    ;;

  bash )
    #  this is just to explore the environment that the
    #  externals run in
    exec  bash -i  ;
    ;;

  * )
    # if something unexpected happens,
    # this catchall stanza should simply end the script.
    # '*' matches anything at all, after all other
    # patterns have been given a chance to match.
    # it is customary to include this
    # at the end of bash case statements.
    break  ;
    ;;

  esac     #  'case' backwards marks the end of the case statement

done       #  this done statement marks the end of the select menu loop

exit  ;  #  just to make sure!  :-)
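
The notes in the script mention an alternative: keep the menu items in a strict order and use 'REPLY' as the case variable. A cut-down sketch of that variation follows. The function name menu_demo and the three-item menu are made up for illustration, and the browser stanza only echoes the command it would run instead of using exec.

```shell
#!/usr/bin/env bash
#  Sketch of the REPLY based variation.
#  The stanzas are picked by the number typed, so the order
#  of the menu items and of the numbers must agree.
menu_demo() {
  local url="${1}"
  local item
  local PS3='What EXTERNAL action do you want? '

  select item in        \
      'Quit externals'  \
      'Print the URL'   \
      'W3M'
  do
    case "${REPLY}" in
      1 )
        echo 'Returning to browsing'  ;
        break  ;
        ;;
      2 )
        echo "The URL in question: ${url}"  ;
        break  ;
        ;;
      3 )
        #  the real script would use:  exec w3m "${url}"
        echo "would run: w3m ${url}"  ;
        break  ;
        ;;
      * )
        #  anything else: complain and ask again
        echo "No such choice: ${REPLY}"  ;
        ;;
    esac
  done
}

#  Possible usage:   menu_demo 'http://www.example.com/'
```

With this approach the text of the menu items can be reworded freely without touching the case patterns, at the price of having to renumber the stanzas whenever an item is inserted or removed.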

Thursday, June 19, 2008

RAND, The Corporation and Its Non-shareholders

11 June 2008 I attended another ALoud talk at the Los Angeles Public Library. The topic that night, "Soldiers of Reason: The RAND Corporation and the Rise of the American Empire".

When I was in college, the idea of working for a 'think tank' was very appealing. I read some early book on the phenomenon, and naively thought of think tanks as an extension of the college dormitory all night yak session. The speaker was not totally negative about RAND, but pointed out that they had successes on some studies, failures on others. Most people don't realize that today it is not the little Dutch boy plugging the dike hole that keeps Holland from getting flooded, but a RAND Corp. study.

The speaker grabbed my attention when in the first few minutes of the talk phrases like 'Ayn Rand', 'Milton Friedman', 'Chicago School of Economics' and 'logical expectations' were brought up. The speaker criticized a lot of RAND's studies and results as being flawed by describing people as 'logical actors'. He mentioned the idea, as an example, that Corporations only have a duty to maximize profits for their shareholders.

This may have been an underlying belief at RAND, but to associate such naive ideas with Milton and the 'Chicago School' is doing them a disservice. Economists have long been dealing with the idea of 'externalities', what others might call side effects. Prominently, Ronald Coase of the University of Chicago developed the so-called "Coase theorem" to provide a guide on this topic.

But actually, you can understand the idea by a simple observation. Maximizing the profits of the shareholders may be the most obvious goal of a corporation, but they clearly have others. There are usually more non-shareholders than shareholders, and not provoking them into a lynch mob out to destroy the corporation and its owners is clearly something they have to keep in mind. 19 June 2008

Monday, June 9, 2008

A bash Tool

I was writing a bash script the other day, and got fed up with having to take separate steps to handle a lot of the routine chores needed to make it usable. So, I piled them all together. This bundles together several ideas I've been exposed to in the last week.



#!/usr/bin/env  bash

#     -  See 'man env' and the discussion involving 'env'
#        and invoking Perl
#        in "Programming Perl", by Larry Wall

#   bcmp, Bash "CoMPiler"
#   this program really just runs some housekeeping
#   chores you should do before trying to run a bash script.
#   This is partly inspired by a discussion of software
#   configuration management I read somewhere
#   (source forgotten) discussing the nightly 'build'
#   of some perl scripts, which rather than compiling them
#   they were put through regressions tests to verify they
#   were ready to run.
#   Many of the commands used in this were things I'd come
#   across lately that seemed to fit in with this idea.
#
#   7 June 2008 Dallas E. Legan II

USAGE="${0##*/}   -h | <script>"  ;

THESCRIPT=${1:?"What script file? Usage: ${USAGE}"}  ;

[ -f "${THESCRIPT}" ]  || { echo "Usage: ${USAGE}"  ; exit 1 ; }   ;

set -e    ;

#     - This causes bash to go into 'abort on command error' mode
#       See "Linux Journal", Feb. 2008, p. 10, "Letters"
#       "mkdir Errors Even More Trivial", Ed L. Cashin
#       http://www.linuxjournal.com/article/9957
#       This is also documented in 'man bash' and 'help set'
#       in somewhat obscure, too understated a way.
#       9 July 2008 addition:
#       This is also referenced in
#       the "bash Cookbook",
#       Carl Albing, JP Vossen & Cameron Newham
#       O'Reilly, (C) 2007
#       ISBN-10: 0-596-52678-4
#       ISBN-13: 978-0-596-52678-8
#       http://www.bashcookbook.com/
#       p. 75 - 76
#       Recipe 4.6 "Using Fewer if Statements"

sed  -i 's/ *$//'   ${THESCRIPT}  ;

#      - to strip out trailing blanks, particularly annoying
#        after '\'   'line continuations'
#        '-i' causes 'editing in place'.
#        (Addition 1 July 2008:)
#        I came across another reason for this
#        in the "bash Cookbook",
#        p. 57 - 59,
#        Recipe 3.3 "Preventing Weird Behaviour in a Here-Document"
#        'trap' on page 59 in particular.
#        Basically, trailing spaces on here document delimiters
#        can cause a bad day.
#        --
#        Alternatively, this might be done using
#        'ed', to make it more traditional and portable.


#       For now, I'm leaving out this idea,
#       but you might want to install some commands
#       to verify that the number of
#       '('  == ')'   (outside 'case' statements)
#       '['  == ']'
#       '{'  == '}'
#       Total number of "'" and '"' are even, etc.
#       Separate checks for these might make
#       interpreting the results easier to figure out.

chmod  ugo+x   ${THESCRIPT}   ;

#     - simply to set the permissions to executable

bash -n  ${THESCRIPT}  ;

#     - to do a simple syntax check
#       per the "bash Cookbook",
#       p. 476 - 477
#       Recipe 19.12 "Testing bash Script Syntax"
#       also 'man bash' and 'help set',
#       where this is obscurely documented.

cp ${THESCRIPT}   ~/bin/   ;

#     - Lastly, copy the script to a directory in PATH,
#       this could be any satisfactory location.

#       You might also want to check the script into some
#       version control system at this point,
#       or run some functional/regression tests.

echo  ${THESCRIPT}   seems ready to run    ;
#  9 July 2008 addition:
#  This is an idea Garth told me about while at
#  Rockwell - when it doesn't conflict with the
#  purpose of the program, it's good to have a
#  message telling if it succeeded or not.
#  This was in the mainframe world,
#  and this can conflict with the needs of
#  Unix programs, but a good idea when practical.
#        END OF SCRIPT
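
The bracket counting idea that the script leaves out could be sketched along the following lines. This is only an illustration under simplifying assumptions: the function names count_char and balanced are made up, the check is a crude character count that knows nothing about quoting, comments or 'case' patterns, and only braces and parentheses are tried here.

```shell
#!/usr/bin/env bash
#  count_char <character> <file>
#  prints how many times the character occurs in the file
count_char() {
  tr -cd "${1}" < "${2}" | wc -c | tr -d ' '
}

#  balanced <open> <close> <file>
#  succeeds when both characters occur equally often
balanced() {
  [ "$(count_char "${1}" "${3}")" -eq "$(count_char "${2}" "${3}")" ]
}

#  Possible usage inside a script like bcmp:
#    balanced '{' '}' "${THESCRIPT}"  ||  echo 'unequal number of { and }'
```

Because a 'case' pattern ends with an unmatched ')', the parenthesis check will report a mismatch on perfectly good scripts that use case statements, which is the complication the comments in the script point out.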

Google Panic

This post was originally mailed in to mailto:larryhsfriends@yahoogroups.com, 8 June 2008 --

Yesterday, I got a scare when I logged into GMail with Lynx. There were a bunch of check boxes, one for each message, but no link to actually view each of the messages. This put me in shock, my confidence in Google hanging in mid air just off the edge of the cliff.

I floundered around a bit to verify that I was in their basic html user interface (I was) and then looked at the actual html source code. All the links to view the messages were there, but inspection of the markup language source code showed that there were some missing closing tags with each message, the probable cause of the problem. There was no proliferation of javascript/AJAX as I had feared; it is probably just a mistake by some new CGI programmer.

I sent them some email (that still worked :-) ), explaining the problem, pointing out the exact spot and probable missing tags that belonged on the table/row of each inbox message. I think having their basic html user interface output proper html is important to Google, this is the 'Plan B' not just for text mode and other alternative browsers but the many mobile devices they want to get market penetration on.

So we'll see what happens. Dallas E. Legan II / legan@acm.org / dallas.legan@gmail.com / aw585@lafn.org http://isthereanotherquestion.blogspot.com

18 June 2008 Addenda

    Since the above was blogged, I found a couple of ways around the above problem. I should have saved a sample of the HTML when the problem first occurred to verify that there haven't been any changes in the meantime.
  • For Lynx, ^V toggles between two different HTML parsers.
    • The apparent default parser is 'SortaSGML', which follows strict, formal standards, and is prone to producing bad results when confronted by poorly written markup.
    • The alternate is the 'TagSoup' parser, which will put up with almost any violation of standards. This of course is the one that can deal with the new GMail format.
  • After some experimentation, I found that the problem seemed to be caused by some '<div>' tags. The exact purpose of these tags seemed to be somewhat unclear, especially in major browsers of the time a pocket HTML reference I use was written. A privoxy edit to eliminate them solved the problem:
    s|</?div>||igsx
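
To see what that substitution does without setting up Privoxy, a rough command line approximation with sed can be tried on a saved page. This is a sketch only: the function name strip_bare_divs is made up, and sed works line by line, so it is not an exact stand-in for the Privoxy filter.

```shell
#!/usr/bin/env bash
#  Remove bare <div> and </div> tags, in any mix of upper and lower case.
#  Tags carrying attributes, such as <div class="x">, are left alone,
#  just as the pattern above would leave them.
strip_bare_divs() {
  sed -E 's#</?[Dd][Ii][Vv]>##g'
}

#  Possible usage (saved.html is a hypothetical file name):
#    strip_bare_divs < saved.html | lynx -stdin
```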
    
Dallas E. Legan II / legan@acm.org / dallas.legan@gmail.com / aw585@lafn.org http://isthereanotherquestion.blogspot.com

Tuesday, June 3, 2008

Full Screen Ahead

The last couple of days, Peter posted to uuasc@uuasc.org about trying to launch a program in Linux/Unix, background it, then launch screen, the virtual console manager, and have access to process control from inside screen.

After some short but highly technical explanations on why this was probably not possible, I posted an idea, and Peter responded:

From: Peter .......
To: uuasc@uuasc.org
Date: Tue, 3 Jun 2008 03:31:09 +0000 (UTC)
Subject: Re: Switching to Screen
That is a brilliant idea!

>> ...........


>An ounce of prevention is worth a pound of cure.
>A simpler approach is to launch screen in your *profile
>script, and never leave it without a clearcut reason.  :-)
>(As i discussed in my bash talk at UUASC way back when,
>or last week at IEEE-CS/Lilax).
>
>Regards, Dallas E. Legan II / legan@acm.org / aw585@lafn.org
>..........

Well, to merit such accolades, maybe I ought to fill in on what I mean by this. Typically, I launch screen in my ~/.bash_profile script with the phrase:

screen -R

Which usually does a good job of grabbing any detached screen sessions that may be floating around, and if there aren't any, it launches a new one.
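
A slightly more defensive form of that profile line might look like the sketch below. It assumes that screen sets the STY environment variable inside its sessions, and the function names inside_screen and maybe_start_screen are made up for this illustration.

```shell
#!/usr/bin/env bash
#  Succeeds when the shell is already running inside a screen session.
#  screen exports STY to the processes it starts.
inside_screen() {
  [ -n "${STY:-}" ]
}

#  Launch or reattach screen only when it makes sense:
#  not already inside screen, screen is installed,
#  and standard input is a terminal.
maybe_start_screen() {
  inside_screen  &&  return 0
  command -v screen > /dev/null 2>&1  ||  return 0
  [ -t 0 ]  ||  return 0
  screen -R
}

#  In ~/.bash_profile one might then simply call:
#    maybe_start_screen
```

The STY test keeps a login shell started inside screen from trying to launch screen again.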

Now, what are some 'clearcut reasons' to leave screen?

    I've encountered three instances I can recall.
  • I accidentally cat some file to standard out, it turns out it wasn't really a text file, but a binary, and it messes up the console so badly that 'stty sane', 'reset' etc. fail to clear things up. Sometimes, I've found detaching screen, then trying reset and such will get the tty working properly before you reattach. I seem to recall instances where you might need to log out and then 'kill -HUP 1' to restart init and be really serious about this sort of thing.
  • I'm running some sort of remote console connection with screen at the opposite end. I like to use ssh inside kermit for this sort of thing, and if you are running screen at both ends you have to escape your screen escape sequences to the other end of the remote session. I find it simpler to log in locally on an inittab controlled virtual console, not running screen, and that way I can just skip the escape key to send screen key sequences to the remote end of the session.
  • Mouse responsive console apps like Lynx haven't responded to the rodent in the past for me when run in screen. I recall doing some surfing in the past to investigate this and finding some posting about how screen doesn't handle mouse keys and isn't able to pass them on to the application. You can run Lynx in another inittab controlled console, or, as I do, have a short Lynx EXTERN script that uses startvt to open another console for the text mode browser when you want to use the mouse. There are probably other apps with this problem. I should note that gpm is not one of them; its basic cut and paste operations seem impervious to it.

Until I understood the first two cases above, I used to be much more hardcore about this. I'd actually end my .bash_profile script with a line like:

exec screen -R

As anyone familiar with what the shell/API call exec does knows, this actually replaces the login shell entirely with screen, screen simply dropping in place of bash and taking over the default I/O. The problem with this is that if you want to run something else like kermit and/or ssh, or detach screen to do something like run reset and then reattach, you are out of luck. Ending screen simply logs you out. Another aspect of this, my experience seemed to indicate, was that you need to put all the real work you want done besides launching screen in your .bashrc, the non-login initialization script.

But Wait, There's More

That's not as hardcore as you could get. In the /etc/login.defs file there is a section:


#
# Instead of the real user shell, the program specified by this parameter
# will be launched, although its visible name (argv[0]) will be the shell's.
# The program may do whatever it wants (logging, additional authentification,
# banner, ...) before running the actual shell.
#
# FAKE_SHELL /bin/fakeshell

Yes, you read that correctly. I think a main purpose of this feature is to insert programs like 'script' for auditing or simple minded honeypot applications (and I've heard of intruders being trapped by similar methods). However, I have successfully put in:

FAKE_SHELL /usr/bin/screen

commented out the screen in ~/.bash_profile, and the program in question did run before bash ran its profile script. Capabilities like this and those of the pam_env module start to override the shell startup scripts considerably.

An alternative plan might be to add /usr/bin/screen to /etc/shells, add a line

shell /bin/bash

into /etc/screenrc or ~/.screenrc, and replace your default shell in /etc/passwd with /usr/bin/screen. I have to confess to not experimenting with this to any serious extent.

Hopefully, you've gotten something out of this diatribe. It's mostly pulled from my bash and password talks (with maybe a bit from an improvised section of the 'Greatest Hits' talk) given initially at UUASC meetings. I felt like this was an opportunity to consolidate some of this information.

Sunday, June 1, 2008

Fresh From a Stream

At tonight's San Fernando Valley LUG meeting, our friend Larry was at it again, trying to access media streams from http://www.freetv.com from the text console, preferably from the Lynx browser. After a lot of floundering around, we managed to play a few of the URLs given in descriptive text (not expressed in HTML links) with mplayer.

After arriving home, here are some things I found to explore the site with.

Install Privoxy, and add the following to your /etc/privoxy/user.action file (or perhaps to /etc/privoxy/standard.action if the release of Privoxy you are using doesn't have the user.action file):


{ +filter{freetv} }
.freetv.com

Then add the following to /etc/privoxy/user.filter (or /etc/privoxy/standard.filter if your Privoxy doesn't use a user.filter):


#################################################################################
#
# freetv: Try to convert certain URLs floating in space to actual links
#               05/31/2008 d.e.l.
#
#################################################################################
FILTER: freetv Try to convert certain URLs to automatically launch jlinks

s| (mmst?://[^ <>]*) |<A href="$1">$1</A>|igsx
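
The effect of the filter can be previewed from the command line with a rough sed approximation. This is a sketch under assumptions: the function name linkify_mms is made up, and the sed version matches a stream URL anywhere on a line and in lower case only, rather than reproducing the filter's flags exactly.

```shell
#!/usr/bin/env bash
#  Wrap bare mms:// and mmst:// URLs in anchor tags,
#  roughly as the Privoxy filter above is meant to do.
linkify_mms() {
  sed -E 's#(mmst?://[^ <>]*)#<A href="\1">\1</A>#g'
}

#  Possible usage (page.html is a hypothetical saved page):
#    linkify_mms < page.html > page-with-links.html
```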


Restart privoxy, typically with

sudo /etc/init.d/privoxy restart

Then install this script, which I called mmStreamer, somewhere in a directory in your PATH variable:


#!/bin/bash


URL="${1}"    ;


TURL=${URL#*://}      ;
TURL=${TURL%%[/?]*}   ;
echo 'Domain: '  ${TURL}          ;

ping  -qc 3  ${TURL}    ;
DOMAIN_TEST_RESULT=$?  ;

if [ "${DOMAIN_TEST_RESULT}" -ne 0 ]
then
  echo "The domain ${TURL} seems bogus.  Hit return to continue."  ;
  read  BOGUS  ;
  exit  ;
fi

sudo mplayer -vf-clr "${URL}"    ;



sudo mplayer -vf-clr -playlist "${URL}"   ;

exit  ;

You may need to modify the lines with mplayer to match how you normally invoke mplayer, just make sure one has the -playlist parameter so one or the other may run. And of course, make sure to adjust permissions with chmod ugo+x mmStreamer. There are probably many improvements that can be made to this script, but it is partly a proof of concept script, partly something to get rolling with some quick results.
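
The two parameter expansions that the script uses to pull the host name out of the URL can be tried on their own. In this sketch the function name url_host and the example URL are made up for illustration.

```shell
#!/usr/bin/env bash
#  Print the host part of a URL using the same two
#  parameter expansions as the mmStreamer script.
url_host() {
  local turl="${1}"
  turl="${turl#*://}"       #  drop the shortest leading 'scheme://'
  turl="${turl%%[/?]*}"     #  drop everything from the first '/' or '?' on
  printf '%s\n' "${turl}"
}

#  Possible usage:
#    url_host 'mms://media.example.com/path/show.wmv?x=1'
```

Note that a port number or a user name embedded in the URL would be left in the result, so ping might still fail on such URLs.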

Then, in your lynx.cfg file, add to the EXTERNALs section:

EXTERNAL:mms:mmStreamer %s:TRUE:TRUE

The last 'TRUE' will cause mmStreamer to be run automatically when you activate a link of type mms.

Now, the next time you go to http://www.freetv.com, you should be able to play some of the mms stream URLs, which will now be actual links. Some of them seemed to be mere ads, others seemed to have bona fide content. A few might be missed: I noticed one with the typo of two colons (mms:://), and a Google search of the site turned up a couple of http streams. These seem to be the exceptions that will still have to be handled by hand. A lot seemed to be no longer there, but perhaps a better choice of mplayer switches and settings might tap into them.

    A few things turned up browsing the site.
  • One is that I was reminded of the http://www.la36.org site, with a lot of Los Angeles local content. I didn't have the time to investigate if it is being updated, but I did notice a few videos of past L.A. Public Library / Library Foundation ALoud talks. A few of these are of technical interest, such as Craig "List" Newmark, and one on the Google Book project.
  • I probably should add some capability to deal with http://www.la36.org to usnatch.
  • I was spurred to search Wikipedia for http://www.freetv.com and it turned up http://en.wikipedia.org/wiki/Public_access_stations_%28United_States%29. Perhaps some creative search will turn up a few other pages to explore.
  • http://www.drgenescott.com didn't seem to be active, but it did remind me of the many times I'd passed him on the TV dial in days past, and the many discussions of this archetypal L.A., CA personality.

Goggling Google

This is an extract from what was originally posted on the larryhsfriends@yahoogroups.com mailing list Tuesday 27 May 2008. It's posted here on request of a mutual friend, Charles.

I've deliberately avoided replacing the extravagant URLs with Tinyurl.com abbreviations because these URLs would normally be used only once and to illustrate all the dirty work involved in this branch of screen scraping and accessibility issues.

My blind friend Larry was trying to access a video (to listen to, smart alec), located by going to http://video.google.com,

  • clicking the radio button to limit the search to Google hosted videos,
  • searching on the term 'debate'.
  • Lynx numbered link #69, was an MSNBC debate from October 31, 2007, the hosting page being at
  • http://video.google.com/videoplay?docid=8023519711099005229&q=debate&ei=IYM7SOuXEKDk4AL5473bAw

Dallas:

You are correct that there was no link for a .mp4 on that particular page / video, so they don't currently all have that option. I have to agree with that.

However (grin) by a process too complicated to explain right now, I was able to get a bloated, rather complicated URL for the video in question.

Try this, you'll have to cut and paste it, making sure the whole thing, 4 lines, is on one line, quoted to make sure parts aren't passed into background as bogus batch jobs:

http://vp.video.google.com/videodownload?version=0&secureurl=QwAAAD0W5d-wPcVVeTl7QrGr5zTKridzSGf41ASu20PecohoZi2sOgFpCHjq8L-P4O1pMCjFMcitHUDMkPHMpztjyF2Gfx5zzGZQK-3bM6BN4oWB&sigh=NrWxm3ARGWrIAA7a4p9rRb4PcMs&begin=0&len=6395300&docid=8023519711099005229

Since I stepped through this once, in theory it can be automated. I've decided to try and recreate it to record this for posterity. Looking at the page you cited I noted the suggested, simplified URL for embedding at the bottom of the page in some suggested html:

http://video.google.com/googleplayer.swf?docid=8023519711099005229

I then fed this into gnash with this command:

$ gnash -vr 2 'http://video.google.com/googleplayer.swf?docid=8023519711099005229'

-v is for verbose

-r 2 is to play sound only, no video.

This cranked out about a screen of hard to understand very technical messages, and eventually stalled out, but before that it spit out a message:

17430] 22:13:13: SECURITY: Loading XML file from url: 'http://video.google.com/videofeed?fgvns=1&fai=1&docid=8023519711099005229&hl=undefined'

before I lost patience and hit control c. This seems to have the same hash string, all numeric, as the original URL, so everything up to here could be arrived at by a shortcut, knowing what is crucial in the original URL and the form of this final URL. That is to say, gnash is not essential to get to here!
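
That shortcut could be sketched as follows. The function names docid_of and videofeed_url are made up, the form of the feed URL is simply copied from the gnash message above, and there is no guarantee the site still answers in this form.

```shell
#!/usr/bin/env bash
#  Pull the value of the docid parameter out of a URL.
#  Fails if the URL has no docid parameter.
docid_of() {
  local url="${1}"
  local rest
  case "${url}" in
    *docid=* )
      rest="${url#*docid=}"              #  text after the first 'docid='
      printf '%s\n' "${rest%%[&#]*}"     #  cut at the next '&' or '#'
      ;;
    * )
      return 1
      ;;
  esac
}

#  Build the XML feed URL in the form gnash reported.
videofeed_url() {
  local id
  id="$(docid_of "${1}")"  ||  return 1
  printf 'http://video.google.com/videofeed?fgvns=1&fai=1&docid=%s&hl=undefined\n' "${id}"
}

#  Possible usage:
#    videofeed_url 'http://video.google.com/videoplay?docid=8023519711099005229&q=debate'
```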

I then dumped that URL with the command:

$ lynx -source 'http://video.google.com/videofeed?fgvns=1&fai=1&docid=8023519711099005229&hl=undefined'

and studying the output I noticed a string:

vidurl=http%3A%2F%2Fvideo.google.com%2Fvideoplay%3Fdocid%3D8023519711099005229%26hl%3Den&usg=AL29H2354Bu9OKKGOkt8CkFi7UeioVIIgQ"

and then fed that into a tool I have 'dex' (de-hex encode) to convert the % escaped hex encoded characters back into straight characters:

$ dex <<< 'http%3A%2F%2Fvideo.google.com%2Fvideoplay%3Fdocid%3D8023519711099005229%26hl%3Den&usg=AL29H2354Bu9OKKGOkt8CkFi7UeioVIIgQ'

and it output:

http://video.google.com/videoplay?docid=8023519711099005229&hl=en&usg=AL29H2354Bu9OKKGOkt8CkFi7UeioVIIgQ

I then did another:

$ lynx -source 'http://video.google.com/videoplay?docid=8023519711099005229&hl=en&usg=AL29H2354Bu9OKKGOkt8CkFi7UeioVIIgQ'

and saw in the output of it the string:

videoUrl\x3dhttp://vp.video.google.com/videodownload%3Fversion%3D0%26secureurl%3DQwAAAD0W5d-wPcVVeTl7QrGr5zTKridzSGf41ASu20PecohoZi2sOgFpCHjq8L-P4O1pMCjFMcitHUDMkPHMpztjyF2Gfx5zzGZQK-3bM6BN4oWB%26sigh%3DNrWxm3ARGWrIAA7a4p9rRb4PcMs%26begin%3D0%26len%3D6395300%26docid%3D8023519711099005229\x26

and feeding the string between the \x escaped hex numbers into dex again:

$ dex <<< 'http://vp.video.google.com/videodownload%3Fversion%3D0%26secureurl%3DQwAAAD0W5d-wPcVVeTl7QrGr5zTKridzSGf41ASu20PecohoZi2sOgFpCHjq8L-P4O1pMCjFMcitHUDMkPHMpztjyF2Gfx5zzGZQK-3bM6BN4oWB%26sigh%3DNrWxm3ARGWrIAA7a4p9rRb4PcMs%26begin%3D0%26len%3D6395300%26docid%3D8023519711099005229'

got the final URL:

http://vp.video.google.com/videodownload?version=0&secureurl=QwAAAD0W5d-wPcVVeTl7QrGr5zTKridzSGf41ASu20PecohoZi2sOgFpCHjq8L-P4O1pMCjFMcitHUDMkPHMpztjyF2Gfx5zzGZQK-3bM6BN4oWB&sigh=NrWxm3ARGWrIAA7a4p9rRb4PcMs&begin=0&len=6395300&docid=8023519711099005229

This is ridiculously complicated, and obviously needs to be automated (as it usually is via javascript! :-) ) But it produced a working result and is describable.
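
The percent decoding step does not need a special tool. A minimal bash stand-in is sketched below. It is not the 'dex' program used above, just an approximation under assumptions: the name dex_sketch is made up, the input is expected to contain no backslashes, and '+' signs are left alone.

```shell
#!/usr/bin/env bash
#  Decode %XX escapes on each line of standard input.
#  Every '%' is turned into '\x' and printf's %b format
#  then expands the resulting \xXX sequences.
dex_sketch() {
  local line
  while IFS= read -r line
  do
    printf '%b\n' "${line//%/\\x}"
  done
}

#  Possible usage:
#    dex_sketch <<< 'http%3A%2F%2Fvideo.google.com%2Fvideoplay%3Fdocid%3D8023519711099005229%26hl%3Den'
```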

(after a trip to Food for Less)

However there is an easier way to do this. I just fed the original URL into usnatch:

usnatch 'http://video.google.com/videoplay?docid=8023519711099005229&q=debate&ei=IYM7SOuXEKDk4AL5473bAw' -u

and it output this url, probably by way of scraping it from KeepVid.com:

http://vp.video.google.com/videodownload?version=0&secureurl=twAAAD0W5d-wPcVVeTl7QrGr5zTKridzSGf41ASu20PecohoD0a7UlL0hOryryfecm0kR0Az1TAZjqmcK4Jhzww767-M--b5VXs0aG2FyEksUHG7jZMWLv12yp10ahgVqupjVDS1ehay8IuXr_K5CJYVeSwkYqKv5owxTDiGz7X7xKbrgQVNx-7ue4RTDjur5LWmoqryoLSCkqAgx6UteEa8LIwTCBSDJhB3jzak8cwIcF70G2Np9NfuVtZ8OmJwuiMRsg

which played when I used it with mplayer, making sure to enclose the URL in single quotes after pasting it into the command. So it could have been played from Lynx, and I'm downloading the file from Google right now. The trick is to back up from the page where they want you to view it, and be on top of the video.google.com search results link to run usnatch. Alternatively, from the video page itself you could invoke usnatch on that page's own URL by using the comma key instead of the period key to invoke the usnatch external program. Period calls externals for the currently active link, comma calls externals for the current page.


2008 June 1 Afterword

...And of course the ultimate goal of this long exercise is to include the algorithm described above in usnatch, with the idea of making it less dependent on scraping information from sites like http://KeepVid.com.