Saturday, August 2, 2008

5) dump to printer

Yet another option for looking at PDF files is the use of a special purpose viewing device referred to as a 'printer'. For those of you who may be unclear on the concept, 'printers' use special chemicals to permanently lay down a copy of the image that would normally be seen on the screen onto a thin sheet of organic celluloid material ("paper"). Bulky and cantankerous at best, these quaint devices are capable of handling not only PDF files but, with appropriate device drivers, such data formats as text, jpeg, png, gif etc.

Below is a script for sending a PDF to a 'printer'. It is set to scale the file to maximize its use of the paper, but could easily be modified to throw up a menu of scaling factors; typical choices might be 25%, 33%, 50%, 66%, 75% etc., as sketched below.
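For instance, a sketch of such a menu using bash's select statement, in place of the fixed LPR alias in the script (the percentages are only examples):

select SCALE in 25% 33% 50% 66% 75% 100%
do
  break  ;
done
alias LPR="/usr/bin/lpr -o scaling=${SCALE:-100%}  "  ;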


#!/bin/bash  -


# ############## security pack ##############################

PATH='/sbin:/bin:/usr/sbin:/usr/bin:/usr/X11R6/bin:/usr/local/sbin:/usr/local/bin:/usr/games:'"${HOME}/bin"  ;


hash -r  ;
#  -  Bash Cookbook, 1st ed., #14.5

ulimit  -H -c 0 --  ;
#  -  Bash Cookbook, 1st ed., #14.6

IFS=$' \t\n'  ;
#  -  Bash Cookbook, 1st ed., #14.7

UMASK='002'  ;
umask  $UMASK  ;
#  -  Bash Cookbook, 1st ed., #14.8

\unalias -a
#  -  Bash Cookbook, 1st ed., #14.4

# ############## security pack ##############################

USAGE="$0  -h | file.pdf | file.ps  "  ;

VERSION='$Id'  ;

set -e  ;
shopt -s  nocasematch    expand_aliases  ;

# alias RM='/bin/rm  -f   2> /dev/null '  ;
# alias RMDIR='/bin/rmdir 2> /dev/null '  ;
# alias MKDIR='/bin/mkdir 2> /dev/null '  ;
# alias EZGV='exec   /usr/bin/zgv'  ;
#  alias ZGV='/usr/bin/zgv'  ;
alias LPR='/usr/bin/lpr -o scaling=100%  '  ;


THEPDFFILE=${1}  ;

case  ${THEPDFFILE}  in
[-/][h?]* )
  echo 'Usage: '${USAGE}  ;
  exit  ;
  ;;
[-/]v* )
  echo ${VERSION}  ;
  exit  ;
  ;;
*.ps )
  THETAG=${THEPDFFILE%.ps}  ;
  ;;
*.pdf )
  THETAG=${THEPDFFILE%.pdf}  ;
  ;;
* )
  exit  ;
esac

if [ ! -f ${THEPDFFILE} ]  ; then
   echo 'File needed'  ;
   echo ${USAGE}  ;
   exit  ;
fi

LPR  ${THEPDFFILE}  ;



exit  ;

Friday, August 1, 2008

4) individual embedded images

The following script extracts each individual image embedded in a PDF document and displays them. As documented in the script, it is based on a tool I recently became aware of from a "Tech Tip" in Linux Journal. The tool, pdfimages, extracts the individual images from the PDF document, which are then viewed using zgv. I call this script 'pdfimagesview'.


#!/bin/bash  -


# ############## security pack ##############################

PATH='/sbin:/bin:/usr/sbin:/usr/bin:/usr/X11R6/bin:/usr/local/sbin:/usr/local/bin:/usr/games:'"${HOME}/bin"  ;


hash -r  ;
#  -  Bash Cookbook, 1st ed., #14.5

ulimit  -H -c 0 --  ;
#  -  Bash Cookbook, 1st ed., #14.6

IFS=$' \t\n'  ;
#  -  Bash Cookbook, 1st ed., #14.7

UMASK='002'  ;
umask  $UMASK  ;
#  -  Bash Cookbook, 1st ed., #14.8

\unalias -a
#  -  Bash Cookbook, 1st ed., #14.4

# ############## security pack ##############################

USAGE="$0  -h | file.pdf | file.ps  [ startpage# [ endpage# ] ]"  ;

VERSION='$Id: pdfimagesview,v 1.1 2008/07/08 09:20:30 dallas Exp dallas $'  ;

set -e  ;
shopt -s  nocasematch    expand_aliases  ;

alias RM='/bin/rm  -f   2> /dev/null '  ;
alias RMDIR='/bin/rmdir 2> /dev/null '  ;
alias MKDIR='/bin/mkdir 2> /dev/null '  ;
alias PDFIMAGES='/usr/bin/pdfimages'  ;
alias EZGV='exec   /usr/bin/zgv'  ;
#  alias ZGV='/usr/bin/zgv'  ;

#  pdfimages is from the poppler-utils package,
#  referenced in 'Linux Journal' May, 2008, p. 83, Tech Tip
#  'Extract Images from PDF Files', Matthew Martin.
#  It extracts the images from a pdf file


THEPDFFILE=${1}  ;

case  ${THEPDFFILE}  in
[-/][h?]* )
  echo 'Usage: '${USAGE}  ;
  exit  ;
  ;;
[-/]v* )
  echo ${VERSION}  ;
  exit  ;
  ;;
*.ps )
  THETAG=${THEPDFFILE%.ps}  ;
  ;;
*.pdf )
  THETAG=${THEPDFFILE%.pdf}  ;
  ;;
* )
  exit  ;
esac

if [ ! -f ${THEPDFFILE} ]  ; then
   echo 'File needed'  ;
   echo ${USAGE}  ;
   exit  ;
fi

if [[  ${THETAG} == */* ]] ; then
  THETAG=${THETAG##*/}  ;
  fi  ;

THEDIR1='/tmp/pdfimages'  ;
THEDIR2="${THEDIR1}/${THETAG}"  ;
THEFILES="${THEDIR2}/${THETAG}"  ;

for ii in ${!THEDIR*}
do
if [ ! -d ${!ii}  ] ; then
  RM      ${!ii}  ;
  MKDIR   ${!ii}  ;
fi  ;
done  ;

RM  ${THEDIR2}/*  ;

PDFIMAGES -f ${2:-1} -l ${3:-200}  ${THEPDFFILE} ${THEFILES}  ;

EZGV  --visual ${THEDIR2}/   ;



exit

Tuesday, July 15, 2008

3) One image/page

This is the first method I recall using to look at .pdf files from the command prompt. Basically, it simply uses Ghostscript to convert each page into a graphic, which is then viewed with zgv, an SVGAlib based image viewing tool. When I rethought this a few days ago and set it up as the current script below, I modified it to handle more than one page, and in the process studied capabilities of zgv I had been only vaguely aware of.

    Basically, zgv has two modes of operation
  • Browser Mode, capable of navigating and operating within the file system
  • Viewer Mode
      The viewer mode has two self-explanatory subdivisions
    • Single image mode
    • Slide show mode

    Some of the key underdocumented commands. These are just the ones I think most important among those not covered in the help mode.
  • ? - context sensitive (but incomplete) help mode
  • u - create thumbnails for use in browse mode
  • n/t - un/tag image files
  • N/T - un/tag all image files
  • ^I (TAB) - Kick into slide show of all tagged files
  • Return - Go from browse mode to viewing mode for an individual image
  • Escape - Go from viewer mode back to browse mode
  • Alt-f - show number of image files and tagged image files in the current directory
  • : - show details about image file cursor is on

zgv may not be the GIMP or Photoshop, but it has much more capability than simply throwing an image on a console screen. I call the script below pdfgsview.


#!/bin/bash  -


# ############## security pack ##############################

PATH='/sbin:/bin:/usr/sbin:/usr/bin:/usr/X11R6/bin:/usr/local/sbin:/usr/local/bin:/usr/games:'"${HOME}/bin"  ;


hash -r  ;
#  -  Bash Cookbook, 1st ed., #14.5

ulimit  -H -c 0 --  ;
#  -  Bash Cookbook, 1st ed., #14.6

IFS=$' \t\n'  ;
#  -  Bash Cookbook, 1st ed., #14.7

UMASK='002'  ;
umask  $UMASK  ;
#  -  Bash Cookbook, 1st ed., #14.8

\unalias -a
#  -  Bash Cookbook, 1st ed., #14.4

# ############## security pack ##############################

set -e  ;
shopt -s  nocasematch    expand_aliases  ;

alias RM='/bin/rm -f  2> /dev/null '  ;
alias MKDIR='/bin/mkdir  2> /dev/null '  ;
alias LS='/bin/ls -A  2> /dev/null '  ;
alias WC='/usr/bin/wc -l  2> /dev/null '  ;
alias GS='/usr/bin/gs'  ;
alias EZGV='exec   /usr/bin/zgv'  ;

USAGE="$0  -h | file.pdf | file.ps"  ;

VERSION='$Id: pdfgsview,v 1.3 2008/07/08 11:22:38 dallas Exp dallas $'  ;

THEPDFFILE=${1}  ;

case  ${THEPDFFILE}  in
[-/][h?]* )
  echo ${USAGE}  ;
  exit  ;
  ;;
[-/]v* )
  echo ${VERSION}  ;
  exit  ;
  ;;
*.ps )
  THETAG=${THEPDFFILE%.ps}  ;
  ;;
*.pdf )
  THETAG=${THEPDFFILE%.pdf}  ;
  ;;
* )
  exit  ;
esac


if [ ! -f ${THEPDFFILE} ]  ; then
   echo 'File needed'  ;
   echo ${USAGE}  ;
   exit  ;
fi

if [[  ${THETAG} == */* ]] ; then
  THETAG=${THETAG##*/}  ;
  fi  ;

THEDIR1='/tmp/pdfgsimages'  ;
THEDIR2="${THEDIR1}/${THETAG}"  ;
THEFILES="${THEDIR2}/${THETAG}.%03d.png"  ;

for ii in ${!THEDIR*}
do
if [ ! -d ${!ii}  ] ; then
  RM      ${!ii}  ;
  MKDIR   ${!ii}  ;
fi  ;
done  ;

RM  ${THEDIR2}/*  ;

#  THETEMP="/tmp/${THETEMP##*/}"  ;
RM     ${THEDIR2}/*.png  ;

GS -sDEVICE=png16m -sOutputFile=${THEFILES} -  ${THEPDFFILE}    <<QUIT
QUIT

count=$( LS ${THEDIR2} | WC )  ;

case ${count}  in

1 )
  EZGV    ${THEDIR2}/*.png  ;
  ;;

* )
  EZGV     ${THEDIR2}/  ;
  ;;

esac  ;

exit  ;

The 'security pack' mentioned above is a set of suggested actions to improve script security from the O'Reilly 'bash Cookbook'. There are others in the book, but these I judged 'no-brainers', in that they can simply be included and don't seem to require any understanding or modification to use.

Sunday, July 13, 2008

2) As text

One approach to viewing .pdf files is presented in the O'Reilly book "Linux Desktop Hacks" (http://oreilly.com/catalog/9780596009113/index.html). In the first edition, Hack #53, p. 170, "Display PDF Documents in a Terminal", gives a script that uses pdftohtml and elinks to view the text in pdf documents. Below is my variation on their viewpdf; I call it pdfview.


#!/bin/bash  -


#  per ORA  Linux Desktop Hacks  #53
#  with the following line added to '/etc/mailcap':
#  application/pdf; /usr/local/bin/pdfview '%s'; needsterminal; description=Portable Document Format; nametemplate=%s.pdf
# $Id: pdfview,v 1.5 2008/07/14 02:19:18 root Exp root $


#  pdftohtml -q -noframes -stdout ${1} | elinks -force-html
# does not work: pdftohtml -q -stdout ${1} | elinks -force-html

ABSOLUTE=${1:0:1}  ;

case ${ABSOLUTE} in
/ )
  PREFIX=''  ;
  ;;
* )
  PREFIX="${PWD}/"  ;
  ;;
esac

WORKFILE="$(mktemp -p /tmp XXXXXXXX).html"  ;
while [ -a ${WORKFILE} ]
do
  WORKFILE="$(mktemp -p /tmp XXXXXXXX).html"  ;
done

#     -- apparently pdftohtml requires the file end in '.html'

SOURCEFILE="${PREFIX}${1}"  ;

cd /tmp

pdftohtml  -q -nodrm  "${SOURCEFILE}"   ${WORKFILE##*/}  ;

exec elinks -force-html   ${WORKFILE}  ;

Saturday, July 12, 2008

1) General info

Doing a 'man -k pdf' generated a list of PDF-related tools on my Debian system. One that stood out was pdfinfo.

/usr/bin/pdfinfo -meta -box  <filename>

It seems to generate most of the info you'd want about a PDF, though one exception I noticed was that it says nothing about the number and types of embedded images. Its output also seemed to need some blank lines thrown out to keep all the information together.
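Something like the following (untested; sed simply deletes the blank lines, and the filename is a placeholder) keeps it together:

/usr/bin/pdfinfo -meta -box somefile.pdf | sed '/^ *$/d'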

Thursday, July 10, 2008

Window Beyond a Plateau, Window Into Windows

I recently had reason to use VNC (Virtual Network Computing), and then while using GRML noticed a mention of a VNC client that aroused my curiosity and made me decide to do some googling on the subject. A search of the Debian packages turned up a couple of 'console' based VNC clients that did not require X. These obviously would need some kind of graphical interface; in the case of svncviewer and directvnc, these were SVGAlib and framebuffers respectively.

Not using a framebuffer on my PC, I installed svncviewer and tried using it. At first it didn't seem to work, but studying the error messages it gave, there was some hint that the graphic modes available for it weren't in line with what the VNC server needed. I was puzzled at first, trying various parameters suggested by the svncviewer man pages, but then remembered that there was a configuration file for SVGAlib, /etc/vga/libvga.config in the case of my Debian computer. I edited this, loosening up everything I could as far as the video went to allow the maximum resolutions suggested by the comments in the file.

This worked great; svncviewer began working, connecting to both Linux X desktops and Windows boxes. The act of editing this configuration file jogged my memory that when I'd originally started using SVGAlib on this computer, I was using a monitor already long obsolete when I took possession of it, one which had finally given out and been replaced with something more reasonably up to date. In other words, updating the SVGAlib settings was long overdue.

The most obvious initial problem was that the mouse seemed impossibly difficult to work with on the virtual console used by svncviewer. With adjusting the SVGAlib video settings so fresh in mind, I tinkered with the mouse settings in the config file. Part of the problem was that I'd try to move the mouse slightly and it would jump to the other side of the screen or do other crazy things, only occasionally doing what was desired. This all apparently fell under the description of 'acceleration', from what I could figure out from the configuration file notes. Without going into specifics, which will undoubtedly differ from case to case, I found I could adjust the trip point and amount of 'acceleration' to make the mouse move smoothly; problem solved. These settings solved problems with applications other than svncviewer, notably the links2 browser. Now the mouse was more usable with it as well, and the higher resolution put more of a web page on the screen at one time.

As a test for using this in the real world, there was an MS Windows machine I needed to start using, but real world physical desk space is at a premium. I managed to get the machine connected, reasonably close to my personal desktop workstation, and then get a VNC server installed on it. VNC doesn't seem to do anything to link up the audio I/O, only video, keyboard and mouse/pointer, so I connected it to my workstation with some 99 Cents Only audio patch cords in addition to putting it on the Ethernet LAN. (Anyone know of any 'virtual patchcord' application that could replace these?) On boot up, the VNC connection seemed to start early enough to catch the request for a 'Ctl-Alt-Del' entry to get to the graphical login prompt. This I had to reach over and enter from the Windows machine keyboard, but everything else has seemed to work one way or the other from svncviewer. Some online documents hinted at possible problems for svncviewer connecting to Windows boxes due to some palette problem, but that doesn't seem to happen with the version I have installed. Using everything involved, I've been able to use the Skype client on the Windows machine to place and receive calls, and fire up Firefox on it and set up/handle an account for one of my bills online.

There was some mention online of a bootable Linux floppy disk with svncviewer on it, for an ultra-light X desktop, and I've conducted some experiments with setting svncviewer up on a console in /etc/inittab. (This last might require some addition to the autologin script, putting in some sleep delay to prevent excessive respawning from inittab triggering a temporary lock on the virtual console if the server it's pointed at isn't immediately available, but that seems like a problem that can be dealt with.) At the very worst, svncviewer provides a remote monitor, with occasional need to get to the actual keyboard of the machine you connect to. So far I haven't found F8 or any other keyboard combination to get to any menus or whatever, as was suggested for other VNC viewers, but this can be dealt with. But basically, it works. It has me thinking about setting up an X/GUI server.
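For the record, such an inittab entry, with the sleep delay mentioned, might look something like this (an untested sketch; the tty, the delay and the server address are all examples):

v8:2345:respawn:/bin/sh -c 'sleep 15 ; exec /usr/bin/svncviewer 192.168.1.2:0 </dev/tty8 >/dev/tty8 2>&1'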

Friday, July 4, 2008

A Summer Day's Thoughts on Prague Spring

Being the Fourth of July, it seems particularly appropriate to mention a few updates on my earlier post, http://isthereanotherquestion.blogspot.com/2007/12/streetnoise.html.

A few YouTube videos I'd linked to have been removed, no great loss, since there were several to choose from anyway. But...

A couple more videos have been put up for music off Streetnoise, one hardly qualifying as a "video", but the other is quite different. The creator put some effort into matching up news footage of the time with "Czechoslovakia", documenting a 'loss of independence day'. I thought this was important enough to call attention to in this separate blog notice, devoting a few minutes today to the memory of Alexander Dubcek. He might not be another Milton Friedman or Ludwig von Mises, but he figured out a few decades before Mikhail Gorbachev that 'the system' was not working. For this he belongs in the international pantheon of freedom lovers, along with Jefferson, Adams, Franklin etc.

Wednesday, July 2, 2008

Windows and Cups

These are some notes I came up with a few years ago, for connecting Windows machines to CUPS printer servers. This may have been on my web site or somewhere at one time, and I just wanted to make it available for reference.




Install the CUPS daemon on the printer server computer.
Configure the printer via the web interface at

http://localhost:631/

This is not covered by this document but the web
interface is fairly straightforward.

On the CUPS printer server
*************************

a) On the CUPS printer server
   make some changes to the CUPS config per
   http://www.faqs.org/docs/Linux-mini/Debian-and-Windows-Shared-Printing.html
   on the CUPS server.

   1) in /etc/cups/mime.convs
      uncomment:
      application/octet-stream   application/vnd.cups-raw   0   -

   2) in /etc/cups/mime.types
      uncomment:
      application/octet-stream

   3) In /etc/cups/cupsd.conf per
      http://localhost:631/documentation.html
      This is using the suggestions in the Linux Cookbook,
      by Carla Schroder, ORA, p. 247
      and somewhat more restrictive than the suggestions
      by the mini-Howto referenced above, and needed for
      all clients anyway, whatever the OS the clients run:

      LogLevel info

      Port 631

      Put:

      <Location /printers>
      Order Deny,Allow
      AuthType None
      Deny From All
      Allow From 127.0.0.1
      Allow From 192.168.1.*
      #    -- or whatever is appropriate for the LAN
      </Location>


At the windows end:
*********************

b) get the ethernet cable to it  :-)

c) make sure the Internet Printing Services is installed
   (from Linux Cookbook, by Carla Schroder, ORA p. 249
   This will require the Windows installation CD
   For Windows 95/98 this can be downloaded from
   http://www.microsoft.com/windows98/downloads
   look for "Internet Print Services", wpnpins.exe

   Windows ME is in the Add-on folder on the installation CD

   Windows NT, go to Control Panel -> Network -> Services tab
   -> Add Microsoft TCP/IP Printing

   Windows 2000/XP go to Network and Dial-up Connections ->
   Advanced Menu -> Optional Networking Components ->
   Other Network File and Print Services

   If it acts squirrely when you try to install the driver,
   it may be because the driver is already installed,
   either by default on OS installation/upgrade or perhaps
   someone else took care of it.

d) go to printer/fax setup on the windows machine,
   add a new printer, select network printer, enter the URL
   for example:

   http://192.168.1.1:631/printers/theprintername

   or if DNS resolution is available to the print server

   http://www.somewhere.com:631/printers/theprintername

   and finish up.

   Breakdown of this URL:


         IP address or name of the CUPS printer server
         as needed/appropriate.

         '631' is the port of IPP and http interface to the
         CUPS server on typical CUPS installation

         '/printers/'  the directory the printers appear
         to be in, probably defined by the <Location...>
         statement mentioned above in a.3 .
         This seems to be a typical setting for this printer.

         'theprintername' points at the printer on that server,
         a fairly arbitrary name defined in the CUPS configuration
         of the printer (not covered in this article,
         but just handled at the web interface in
         http://localhost:631/)

         I saw this stuff in a comment at:
         http://www.linuxjournal.com/article/8618

e) select the printer model (HP LaserJet 6P in this case)
   and things are ready to rock&roll....
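Before touching the Windows side at all, the share can be sanity-checked from any machine on the LAN (the address and printer name are just the examples used above):

curl -I http://192.168.1.1:631/printers/theprintername

A '200 OK' response, or the printer's status page showing up in a browser at that URL, should mean the server side is ready.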

Thursday, June 26, 2008

Lynx HTTP EXTERNAL Menu Script

One thing keeping my friend Larry from getting the most out of my usnatch program and Lynx externals is the awkwardness he has in dealing with the EXTERNAL selection menus. He is blind, and this, coupled with the long, complex URLs (in many cases having stretches of random hash characters) that are passed to the Lynx EXTERNAL menu for special handling, makes it hard for him to separate out the choices the menu presents. He literally can't wade through the URLs to get to the menu selection options. This is different from many of the other Lynx menus, where simple brief descriptions of the choices are given, without interjecting the names of temporary files and such into the options to confuse the issues.

To try and deal with this problem, I present a bash script below to replace a collection of Lynx EXTERNAL entries for http with a single entry that presents a simplified menu. This whole idea is a variation on what I documented in my article Lynx/Kermit Coordination Part I, hosted by the Kermit Center at Columbia University. This script will have to be customized, and a separate one would have to be created for other protocols such as ftp, ssh, irc and the like. I've put extensive notes in this script, since I think it makes explanation simpler if the notes are close to the subject instead of before or after in this blog's narrative. Also, the people using it might not be familiar with bash, so I try to explain as much as I can, so they can knowledgeably change it to their needs.

To install this script, first make sure your version works from the command prompt. Then comment out all the

EXTERNAL:http:....

statements in your lynx.cfg file, both any in your home directory and those in /etc/ and its subdirectories, adjust the script to include those possibilities you want in it, and put a single EXTERNAL statement:

EXTERNAL:http:extern.menu %s:TRUE

Where extern.menu is taken to be the name of the menu script.
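For example, the relevant patch of lynx.cfg might end up looking like this (the commented-out entries are hypothetical stand-ins for whatever http EXTERNALs were there before):

# EXTERNAL:http:usnatch %s:TRUE
# EXTERNAL:http:wget %s:TRUE
EXTERNAL:http:extern.menu %s:TRUE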

I've put a copy of this script online at a Yahoo group for Larry's friends, http://groups.yahoo.com/group/larryhsfriends/


#! /usr/bin/env bash
#  #!/usr/bin/bash  -

#  '#! /', env per
#  'bash Cookbook', 1st edition, recipe #15.1, p. 321
#  "Finding bash Portably for #!"
#  http://www.bashcookbook.com
#
#  trailing '-' per
#  'bash Cookbook', 1st edition, recipe #14.2, p. 283
#  "Avoiding Interpreter Spoofing"

#  the first line convention points to the interpreter to be used.

#  Some bash conventions for beginners:
#
#  hash/pound/sharp/octothorpes to the end of line are comments.
#
#  Lines ending exactly with a backslash, '\', are continued
#  with the next line.
#  Trailing whitespace after the backslash can cause errors,
#  so beware of it.
#
#  I urge you to look up aspects of this script,
#  either with man bash, or if you are running bash
#  at your command prompt, with the help command
#  such as 'help case' or 'help select'.

URL="${1}"  ;

PS3='What EXTERNAL action do you want? '
#         -- defaults to '#?'
#            trailing blank desirable for spacing of response
#            from prompt
#            PS3 is the Select Menu Loop prompt in bash

#  Note 'for' statement like syntax of select:

select external_action in       \
      'Quit externals                                              '          \
      'Print the URL'           \
      'USnatch'                 \
      'lynxvt'                  \
      'screened-lynx'           \
      'javascript-links2'       \
      'graphical-links2'        \
      'W3M'                     \
      'links'                   \
      'elinks'                  \
      'Blogspotviewer'          \
      'lynx-noreferer'          \
      'lynx-nofilereferer'      \
      'Privoxy Control Panel'   \
      'Microbookmarker'         \
      'wget'                    \
      'Bug-Me-Not'              \
      'whomis'                  \
      'nspeep'                  \
      'pingvolley'              \
      'lynxtab'                 \
      'lynxtab-blank'           \
      'lynx-blank'              \
      'frys'                    \
      'bash'

#    Note: the last select menu item should not have a trailing
#    '\' continuing it on to the next line.
#    I've placed each menu item or quoted phrase on a separate
#    continuation line.
#    At this point, it is normally desirable to have one menu item
#    for each of the case statement stanzas below.
#    They don't need to be in order.
#    There just has to be a menu prompt item
#    and a case "whatever )" that match
#    so the case stanza can be triggered.
#    Be careful in using '*' 'splat' patterns,
#    that you do not unintentionally match more than
#    the desired pattern by mistake.
#    This could accidentally short circuit desired action,
#    keeping a case stanza from being taken when wanted,
#    and causing another to be taken instead.
#    If all the menu items are short, select will try to put
#    them in multiple columns.
#    Only one item needs to be wide (which can be because of trailing
#    blanks) to trigger single column menu display.
#    In this case, I used the 'Quit' item to do this with,
#    since it is basically matched for the most part with a '*',
#    keeping the line in the case statement a reasonable length.
#    The menu items are only needed to clue the user on the
#    significance of each number chosen,
#    and to tie that to the case stanza.
#    Alternatively, you can keep the menu items in a strict order,
#    and use 'REPLY' for the case variable, in which case
#    each case stanza would be picked on the basis of the
#    number selected from the menus instead of patterns
#    that match the menu item strings.
#    In this case, ordering is critical.

#    Many of these actions are special purpose scripts I have
#    written and are included here just to provide a realistic
#    example.
#    The size is probably excessive for some people.
#    This script will certainly have to be customized to your
#    personal needs.

do

  case ${external_action} in

  #  item between 'case' and 'in' undergoes
  #  several levels of evaluation before
  #  the case statement is finally executed.

  #  Case stanzas start with
  #  string_to_match )
  #  list of actions ;
  #  break  ;   # include a break statement to break out of the
  #             # select menu loop
  #  ;;    #  double semicolons  end actions and stanza
  #

  Q* )
    echo 'Returning to browsing'  ;
    break  ;
    #  Thumb rule in this sort of script is
    #  that a break is needed whenever there
    #  is no exec statement in a case stanza.
    #  This stops the select loop after a decisive action.
    ;;

  P* )
    read -p "The URL in question: ${URL} "  TRASH  ;
    break  ;
    ;;

  USnatch )
    exec    usnatch  ${URL}  -i -p  ;
    #  break is not needed after an exec statement
    #  because this script's process, including
    #  the select loop is replaced
    #  by the action of the exec statement,
    #  ending the select loop.
    #  You could put a break after each exec,
    #  it would probably be excessively cautious,
    #  since it would only be executed under bizarre conditions.
    ;;

  lynxvt )
    #  exec    lynxvt  ${URL}  &  ;
    exec    lynxvt  ${URL}    ;
    ;;

  screened-lynx )
    screen -t 'lynx ...' lynx   \
        -useragent='Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0)'  \
        ${URL}    ;
    break  ;
    ;;

  javascript-links2 )
    screen -t 'jlinks'  \
        links2 -enable-javascript 1 -html-numbered-links 1  \
        ${URL}    ;
    break  ;
    ;;

  graphical-links2 )
    sudo links2 -g -driver svgalib -mode 640x480x256   \
        -enable-javascript 1 -html-numbered-links 1    \
        ${URL}    ;
    break  ;
    ;;

  lynx-noreferer )
    exec  lynx -noreferer=off -tna ${URL}    ;
    ;;

#    start of a typical case stanza:
  lynx-nofilereferer )
    exec  lynx -nofilereferer=off -noreferer=off -tna ${URL}  ;
    ;;
# end of the typical case stanza

  Privoxy* )
    screen -t 'Privoxy Control Panel'   \
      lynx -nofilereferer=off -noreferer=off  \
      -tna 'http\://config.privoxy.org'   ;
    break  ;
    ;;

  Microbookmarker )
    exec  microBkMrk ${URL}  ;
    ;;

  Blogspotviewer )
    exec  blogspoter ${URL}  ;
    ;;

  [Ww]3[Mm] )
    exec  w3m ${URL}  ;
    ;;

  links )
    exec  links ${URL}  ;
    ;;

  elinks )
    exec  elinks ${URL}  ;
    ;;

  wget )
    exec  nohup wget  --background -a ~/wget.log -P /mnt/hda8/  ${URL}  ;
    ;;

  Bug-Me-Not )
    screen -t Bug-Me-Not   \
       lynx -cookies www.bugmenot.com/view.php?url=${URL}  ;
    break  ;
    ;;

  whomis )
    exec  whomis  ${URL}  ;
    ;;

  nspeep )
    exec  nspeep  ${URL}  ;
    ;;

  pingvolley )
    exec  pingvolley  ${URL}  ;
    ;;

  lynxtab )
    exec  lynxtab  ${URL}  ;
    ;;

  lynxtab-blank )
    exec  lynxtab    ;
    ;;

   lynx-blank )
    exec  env -u HTTPS_PROXY lynx -tna -accept_all_cookies    ${URL}  ;
    ;;

  frys )
    #  to send pdf's straight to a printer
    #  this script has mainly been used for Fry's Electronics
    #  online version of their newspaper ads.
    exec  lprfrys  ${URL}  ;
    ;;

  bash )
    #  this is just to explore the environment that the
    #  externals run in
    exec  bash -i  ;
    ;;

  * )
    # if something unexpected happens,
    # this catchall stanza should simply end the script.
    # '*' matches anything at all, after all other
    # patterns have been given a chance to match.
    # it is customary to include this
    # at the end of bash case statements.
    break  ;
    ;;

  esac     #  'case' backwards marks the end of the case statement

done       #  this done statement marks the end of the select menu loop

exit  ;  #  just to make sure!  :-)

Thursday, June 19, 2008

RAND, The Corporation and Its Non-shareholders

11 June 2008 I attended another ALoud talk at the Los Angeles Public Library. The topic that night, "Soldiers of Reason: The RAND Corporation and the Rise of the American Empire".

When I was in college, the idea of working for a 'think tank' was very appealing. I read some early book on the phenomenon, and naively thought of think tanks as an extension of the college dormitory all-night yak session. The speaker was not totally negative about RAND, but pointed out that they had successes on some studies, failures on others. Most people don't realize that today it is not the little Dutch boy plugging the dike hole that keeps Holland from getting flooded, but a RAND Corp. study.

The speaker grabbed my attention when in the first few minutes of the talk phrases like 'Ayn Rand', 'Milton Friedman', 'Chicago School of Economics' and 'logical expectations' were brought up. The speaker criticized a lot of RAND's studies and results as being flawed by describing people as 'logical actors'. He mentioned the idea, as an example, that Corporations only have a duty to maximize profits for their shareholders.

This may have been an underlying belief at RAND, but to associate such naive ideas with Milton Friedman and the 'Chicago School' is doing them a disservice. Economists have long been dealing with the idea of 'externalities', what others might call side effects. Prominently, Ronald Coase of the University of Chicago developed the so-called "Coase's Law" to provide a guide on this topic.

But actually, you can understand the idea by a simple observation. Maximizing the profits of the shareholders may be the most obvious goal of a corporation, but they clearly have others. There are usually more non-shareholders than shareholders, and not provoking them into a lynch mob out to destroy the corporation and its owners is clearly something they have to keep in mind. 19 June 2008

Monday, June 9, 2008

A bash Tool

I was writing a bash script the other day, and got fed up with having to take separate steps to handle a lot of the routine chores needed to make it usable. So, I piled them all together. This bundles together several ideas I'd been exposed to in the last week.



#!/usr/bin/env  bash

#     -  See 'man env' and the discussion involving 'env'
#        and invoking Perl
#        in "Programming Perl", by Larry Wall

#   bcmp, Bash "CoMPiler"
#   this program really just runs some housekeeping
#   chores you should do before trying to run a bash script.
#   This is partly inspired by a discussion of software
#   configuration management I read somewhere
#   (source forgotten) discussing the nightly 'build'
#   of some perl scripts which, rather than being compiled,
#   were put through regression tests to verify they
#   were ready to run.
#   Many of the commands used in this were things I'd come
#   across lately that seemed to fit in with this idea.
#
#   7 June 2008 Dallas E. Legan II

USAGE="${0##*/}   -h | <script>"  ;

THESCRIPT=${1:?"What script file? Usage: ${USAGE}"}  ;

[ -f ${THESCRIPT} ]  || { echo ${USAGE}  ; exit ; }   ;

set -e    ;

#     - This causes bash to go into 'abort on command error' mode
#       See "Linux Journal", Feb. 2008, p. 10, "Letters"
#       "mkdir Errors Even More Trivial", Ed L. Cashin
#       http://www.linuxjournal.com/article/9957
#       This is also documented in 'man bash' and 'help set'
#       in somewhat obscure, too understated a way.
#       9 July 2008 addition:
#       This is also referenced in
#       the "bash Cookbook",
#       Carl Albing, JP Vossen & Cameron Newham
#       O'Reilly, (C) 2007
#       ISBN-10: 0-596-52678-4
#       ISBN-13: 978-0-596-52678-8
#       http://www.bashcookbook.com/
#       p. 75 - 76
#       Recipe 4.6 "Using Fewer if Statements"

sed  -i 's/ *$//'   ${THESCRIPT}  ;

#      - to strip out trailing blanks, particularly annoying
#        after '\'   'line continuations'
#        '-i' causes 'editing in place'.
#        (Addition 1 July 2008:)
#        I came across another reason for this
#        in the "bash Cookbook",
#        p. 57 - 59,
#        Recipe 3.3 "Preventing Weird Behaviour in a Here-Document"
#        'trap' on page 59 in particular.
#        Basically, trailing spaces on here document delimiters
#        can cause a bad day.
#        --
#        Alternatively, this might be done using
#        'ed', to make it more traditional and portable.


#       For now, I'm leaving out this idea,
#       but you might want to install some commands
#       to verify that the number of
#       '('  == ')'   (outside 'case' statements)
#       '['  == ']'
#       '{'  == '}'
#       Total number of "'" and '"' are even, etc.
#       Separate checks for these might make
#       interpreting the results easier to figure out.

chmod  ugo+x   ${THESCRIPT}   ;

#     - simply to set the permissions to executable

bash -n  ${THESCRIPT}  ;

#     - to do a simple syntax check
#       per the "bash Cookbook",
#       p. 476 - 477
#       Recipe 19.12 "Testing bash Script Syntax"
#       also 'man bash' and 'help set',
#       where this is obscurely documented.

cp ${THESCRIPT}   ~/bin/   ;

#     - Lastly, copy the script to a directory in PATH,
#       this could be any satisfactory location.

#       You might also want to check the script into some
#       version control system at this point,
#       or run some functional/regression tests.

echo  ${THESCRIPT}   seems ready to run    ;
#  9 July 2008 addition:
#  This is an idea Garth told me about while at
#  Rockwell - when it doesn't conflict with the
#  purpose of the program, it's good to have a
#  message telling if it succeeded or not.
#  This was in the mainframe world,
#  and this can conflict with the needs of
#  Unix programs, but a good idea when practical.
#        END OF SCRIPT

Google Panic

This post was originally mailed in to mailto:larryhsfriends@yahoogroups.com, 8 June 2008 --

Yesterday, I got a scare when I logged into GMail with Lynx. There were a bunch of check boxes, one for each message, but no link to actually view each of the messages. This put me in shock, my confidence in Google hanging in mid air just off the edge of the cliff.

I floundered around a bit to verify that I was in their basic html user interface (I was) and then looked at the actual html source code. All the links to view the messages were there, but inspection of the markup language source code showed that there were some missing closing tags with each message, the probable cause of the problem. There was no proliferation of javascript/AJAX as I had feared; it was probably just a mistake by some new CGI programmer.

I sent them some email (that still worked :-) ), explaining the problem, pointing out the exact spot and the probable missing tags that belonged on the table/row of each inbox message. I think having their basic html user interface output proper html is important to Google; this is the 'Plan B' not just for text mode and other alternative browsers but for the many mobile devices they want to get market penetration on.

So we'll see what happens. Dallas E. Legan II / legan@acm.org / dallas.legan@gmail.com / aw585@lafn.org http://isthereanotherquestion.blogspot.com

18 June 2008 Addenda

    Since the above was blogged, I found a couple of ways around the above problem. I should have saved a sample of the HTML when the problem first occurred, to verify that there haven't been any changes in the meantime.
    • For Lynx, ^V toggles between two different HTML parsers.
    • The apparent default parser is 'SortaSGML', which follows strict, formal standards, and is prone to producing bad results when confronted by poorly written markup.
    • The alternate parser is 'TagSoup', which will put up with almost any violation of standards. This of course is the one that can deal with the new GMail format.
  • After some experimentation, I found that the problem seemed to be caused by some '<div>' tags. The exact purpose of these tags seemed somewhat unclear, especially as handled by the major browsers current when a pocket HTML reference I use was written. A privoxy edit to eliminate them solved the problem:
    s|</?div>||igsx
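Spelled out in full Privoxy terms (an untested sketch; the filter name and the mail.google.com pattern are my guesses), that edit would look like:

# in /etc/privoxy/user.filter:
FILTER: gmaildiv Strip the div tags that confuse SortaSGML
s|</?div>||igsx

# and in /etc/privoxy/user.action:
{ +filter{gmaildiv} }
.mail.google.com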
    
Dallas E. Legan II / legan@acm.org / dallas.legan@gmail.com / aw585@lafn.org http://isthereanotherquestion.blogspot.com

Tuesday, June 3, 2008

Full Screen Ahead

The last couple of days, Peter posted to uuasc@uuasc.org about trying to launch a program in Linux/Unix, background it, then launch screen, the virtual console manager, and have access to process control from inside screen.

After some short but highly technical explanations of why this was probably not possible, I posted an idea, and Peter responded:

From: Peter .......
To: uuasc@uuasc.org
Date: Tue, 3 Jun 2008 03:31:09 +0000 (UTC)
Subject: Re: Switching to Screen
That is a brilliant idea!

>> ...........


>An ounce of prevention is worth a pound of cure.
>A simpler approach is to launch screen in your *profile
>script, and never leave it without a clearcut reason.  :-)
>(As i discussed in my bash talk at UUASC way back when,
>or last week at IEEE-CS/Lilax).
>
>Regards, Dallas E. Legan II / legan@acm.org / aw585@lafn.org
>..........

Well, to merit such accolades, maybe I ought to fill in on what I mean by this. Typically, I launch screen in my ~/.bash_profile script with the phrase:

screen -R

Which usually does a good job of grabbing any detached screen sessions that may be floating around, and if none are, it launches a new one.
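One refinement, as a sketch: $STY is how screen marks the environment of its own sessions, so testing it guards against nesting, and testing $- skips non-interactive shells:

# in ~/.bash_profile
if [ -z "${STY}" ] && [[ $- == *i* ]] ; then
  screen -R  ;
fi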

Now, what are some 'clearcut reasons' to leave screen?

    I've encountered three instances I can recall.
  • I accidentally cat some file to standard out; it turns out it wasn't really a text file but a binary, and it messes up the console so badly that 'stty sane', 'reset' etc. fail to clear things up. Sometimes, I've found that detaching screen and trying reset and such will get the tty working properly before you reattach. I seem to recall instances where you might need to log out and then 'kill -HUP 1' to restart init to get really serious about this sort of thing.
  • I'm running some sort of remote console connection with screen at the opposite end. I like to use ssh inside kermit for this sort of thing, and if you are running screen at both ends you have to escape your screen escape sequences to the other end of the remote session. I find it simpler to log in locally on an inittab controlled virtual console, not running screen, and that way I can just skip the escape key to send screen key sequences to the remote end of the session.
  • Mouse responsive console apps like Lynx haven't responded to the rodent in the past for me when run in screen. I recall doing some surfing in the past to investigate this, and finding some posting about how screen doesn't handle mouse keys and isn't able to pass them on to the application. You can run Lynx in another inittab controlled console, or, like I do, have a short Lynx EXTERNAL script that uses startvt to open another console for the text mode browser if I want to use the mouse. There are probably other apps with this problem. I should note that gpm is not one of them; its basic cut and paste operations seem impervious to the problem.

Until I understood the first two cases above, I used to be much more hardcore about this. I'd actually end my .bash_profile script with a line like:

exec screen -R

As anyone familiar with what the shell/API call exec does knows, this actually replaces the login shell entirely with screen, screen simply dropping in place of bash and taking over the default I/O. The problem with this is that if you want to run something else like kermit and/or ssh, or detach screen to do something like run reset and then reattach, you are out of luck. Ending screen simply logs you out. Another aspect of this, my experience seemed to indicate, was that you need to put all the real work you want done, besides launching screen, in your .bashrc, non-login initialization script.

But Wait, There's More

That's not as hardcore as you could get. In the /etc/login.defs file there is a section:


#
# Instead of the real user shell, the program specified by this parameter
# will be launched, although its visible name (argv[0]) will be the shell's.
# The program may do whatever it wants (logging, additional authentification,
# banner, ...) before running the actual shell.
#
# FAKE_SHELL /bin/fakeshell

Yes, you read that correctly. I think a main purpose of this feature is to insert programs like 'script' for auditing, or simple-minded honeypot applications (and I've heard of intruders being trapped by similar methods). However, I have successfully put in:

FAKE_SHELL /usr/bin/screen

commented out the screen in ~/.bash_profile, and the program in question did run before bash ran its profile script. Capabilities like this and those of the pam_env module start to override the shell startup scripts considerably.

An alternative plan might be to add /usr/bin/screen to /etc/shells, add a line

shell /bin/bash

into /etc/screenrc or ~/.screenrc, and replace your default shell in /etc/passwd with /usr/bin/screen. I have to confess to not experimenting with this to any serious extent.
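Spelled out (and equally untested), that plan might run:

echo '/usr/bin/screen'  | sudo tee -a /etc/shells
echo 'shell /bin/bash'  >>  ~/.screenrc
chsh -s /usr/bin/screen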

Hopefully, you've gotten something out of this diatribe. It's mostly pulled from my bash and password talks (with maybe a bit from an improvised section of the 'Greatest Hits' talk) given initially at UUASC meetings. I felt like this was an opportunity to consolidate some of this information.

Sunday, June 1, 2008

Fresh From a Stream

At tonight's San Fernando Valley LUG meeting, our friend Larry was at it again, trying to access media streams from http://www.freetv.com from the text console, preferably from the Lynx browser. After a lot of floundering around, we managed to play a few of the URLs given in descriptive text (not expressed in HTML links) with mplayer.

After arriving home, here are some things I found to explore the site with.

Install Privoxy, and add the following to your /etc/privoxy/user.action file (or perhaps /etc/privoxy/standard.action, if the release of Privoxy you are using doesn't have the user.action file):


{ +filter{freetv} }
.freetv.com

Then add the following to /etc/privoxy/user.filter (or /etc/privoxy/standard.filter if your Privoxy doesn't use a user.filter):


#################################################################################
#
# freetv: Try to convert certain URLs floating in space to actual links
#               05/31/2008 d.e.l.
#
#################################################################################
FILTER: freetv Try to convert certain URLs to automatically launch jlinks

s| (mmst?://[^ <>]*) |<A href="$1">$1</A>|igsx


Restart privoxy, typically with

sudo /etc/init.d/privoxy restart

Then install this script, which I called mmStreamer, somewhere in a directory in your PATH variable:


#!/bin/bash


URL="${1}"    ;


TURL=${URL#*://}      ;
TURL=${TURL%%[/?]*}   ;
echo 'Domain: '  ${TURL}          ;

ping  -qc 3  ${TURL}    ;
DOMAIN_TEST_RESULT=$?  ;

if [ ${DOMAIN_TEST_RESULT} -ne 0 ]
then
  echo "The domain ${TURL} seems bogus.  Hit return to continue."  ;
  read  BOGUS  ;
  exit  ;
fi

sudo mplayer -vf-clr "${URL}"    ;



sudo mplayer -vf-clr -playlist "${URL}"   ;

exit  ;

You may need to modify the lines with mplayer to match how you normally invoke mplayer; just make sure one has the -playlist parameter, so that one or the other may run. And of course, make sure to adjust permissions with chmod ugo+x mmStreamer. There are probably many improvements that can be made to this script, but it is partly a proof of concept script, partly something to get rolling with for some quick results.

Then, in the EXTERNALs section of your lynx.cfg file, add:

EXTERNAL:mms:mmStreamer %s:TRUE:TRUE

The last 'TRUE' will cause mmStreamer to be run automatically when you activate a link of type mms.

Now, the next time you go to http://www.freetv.com, you should be able to play some of the mms stream URLs, which will now be actual links. Some of them seemed to be mere ads; others seemed to have bona fide content. A few might be missed: I noticed one with the typo of two colons (mms:://), and a Google search of the site turned up a couple of http streams. These seem to be the exceptions that will still have to be handled by hand, as noted below. A lot seemed no longer there, but perhaps a better choice of mplayer switches and settings might tap into them.
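That double-colon typo could probably be patched in the same freetv filter, with one more substitution placed ahead of the link-making one (untested):

s|mms:://|mms://|igsx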

    A few things turned up browsing the site.
  • One is that I was reminded of the http://www.la36.org site, with a lot of Los Angeles local content. I didn't have the time to investigate whether it is being updated, but I did notice a few videos of past L.A. Public Library / Library Foundation ALoud talks. A few of these are of technical interest, such as Craig "List" Newmark, and one on the Google Book project.
  • I probably should add some capability to deal with http://www.la36.org to usnatch
  • I was spurred to search Wikipedia for http://www.freetv.com and it turned up http://en.wikipedia.org/wiki/Public_access_stations_%28United_States%29. Perhaps some creative search will turn up a few other pages to explore.
  • http://www.drgenescott.com didn't seem to be active, but it did remind me of the many times I'd passed him on the TV dial in days past, and the many discussions of this archetypal L.A., CA personality.

Goggling Google

This is an extract from what was originally posted on the larryhsfriends@yahoogroups.com mailing list Tuesday 27 May 2008. It's posted here on request of a mutual friend, Charles.

I've deliberately avoided replacing the extravagant URLs with Tinyurl.com abbreviations, because these URLs would normally be used only once, and to illustrate all the dirty work involved in this branch of screen scraping and accessibility issues.

My blind friend Larry was trying to access a video (to listen to, smart alec), located by going to http://video.google.com,

  • clicking the radio button to limit the search to Google hosted videos,
  • searching on the term 'debate'.
  • Lynx numbered link #69, was an MSNBC debate from October 31, 2007, the hosting page being at
  • http://video.google.com/videoplay?docid=8023519711099005229&q=debate&ei=IYM7SOuXEKDk4AL5473bAw

Dallas:

You are correct that there was no link for a .mp4 on that particular page / video, so they don't currently all have that option. I have to agree with that.

However (grin) by a process too complicated to explain right now, I was able to get a bloated, rather complicated URL for the video in question.

Try this, you'll have to cut and paste it, making sure the whole thing, 4 lines, is on one line, quoted to make sure parts aren't passed into background as bogus batch jobs:

http://vp.video.google.com/videodownload?version=0&secureurl=QwAAAD0W5d-wPcVVeTl7QrGr5zTKridzSGf41ASu20PecohoZi2sOgFpCHjq8L-P4O1pMCjFMcitHUDMkPHMpztjyF2Gfx5zzGZQK-3bM6BN4oWB&sigh=NrWxm3ARGWrIAA7a4p9rRb4PcMs&begin=0&len=6395300&docid=8023519711099005229

Since I stepped through this once, in theory it can be automated. I've decided to try and recreate it to record this for posterity. Looking at the page you cited I noted the suggested, simplified URL for embedding at the bottom of the page in some suggested html:

http://video.google.com/googleplayer.swf?docid=8023519711099005229

I then fed this into gnash with this command:

$ gnash -vr 2 'http://video.google.com/googleplayer.swf?docid=8023519711099005229'

-v is for verbose

-r 2 is to play sound only, no video.

This cranked out about a screen of hard to understand very technical messages, and eventually stalled out, but before that it spit out a message:

17430] 22:13:13: SECURITY: Loading XML file from url: 'http://video.google.com/videofeed?fgvns=1&fai=1&docid=8023519711099005229&hl=undefined'

before I lost patience and hit control c. This seems to have the same hash string (the all-numeric one from the original URL), so everything up to here could be arrived at in a shortcut manner, knowing what is crucial in the original URL and the form of this final URL. That is to say, gnash is not essential to get to here!

I then dumped that URL with the command:

$ lynx -source 'http://video.google.com/videofeed?fgvns=1&fai=1&docid=8023519711099005229&hl=undefined'

and studying the output I noticed a string:

vidurl=http%3A%2F%2Fvideo.google.com%2Fvideoplay%3Fdocid%3D8023519711099005229%26hl%3Den&usg=AL29H2354Bu9OKKGOkt8CkFi7UeioVIIgQ"

and then fed that into a tool I have, 'dex' (de-hex encode), to convert the % escaped hex encoded characters back into straight characters:

$ dex <<< 'http%3A%2F%2Fvideo.google.com%2Fvideoplay%3Fdocid%3D8023519711099005229%26hl%3Den&usg=AL29H2354Bu9OKKGOkt8CkFi7UeioVIIgQ'

and it output:

http://video.google.com/videoplay?docid=8023519711099005229&hl=en&usg=AL29H2354Bu9OKKGOkt8CkFi7UeioVIIgQ
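dex is a little private tool of mine; a minimal stand-in, assuming only %XX escapes need decoding, might be:

#!/bin/bash
#  read stdin, turning %XX hex escapes back into plain characters
while IFS= read -r line
do
  printf '%b\n' "${line//%/\\x}"  ;
done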

I then did another:

$ lynx -source

'http://video.google.com/videoplay?docid=8023519711099005229&hl=en&usg=AL29H2354Bu9OKKGOkt8CkFi7UeioVIIgQ'

and saw in the output of it the string:

videoUrl\x3dhttp://vp.video.google.com/videodownload%3Fversion%3D0%26secureurl%3DQwAAAD0W5d-wPcVVeTl7QrGr5zTKridzSGf41ASu20PecohoZi2sOgFpCHjq8L-P4O1pMCjFMcitHUDMkPHMpztjyF2Gfx5zzGZQK-3bM6BN4oWB%26sigh%3DNrWxm3ARGWrIAA7a4p9rRb4PcMs%26begin%3D0%26len%3D6395300%26docid%3D8023519711099005229\x26

and feeding the string between the \x escaped hex numbers into dex again:

$ dex <<<

'http://vp.video.google.com/videodownload%3Fversion%3D0%26secureurl%3DQwAAAD0W5d-wPcVVeTl7QrGr5zTKridzSGf41ASu20PecohoZi2sOgFpCHjq8L-P4O1pMCjFMcitHUDMkPHMpztjyF2Gfx5zzGZQK-3bM6BN4oWB%26sigh%3DNrWxm3ARGWrIAA7a4p9rRb4PcMs%26begin%3D0%26len%3D6395300%26docid%3D8023519711099005229'

got the final URL:

http://vp.video.google.com/videodownload?version=0&secureurl=QwAAAD0W5d-wPcVVeTl7QrGr5zTKridzSGf41ASu20PecohoZi2sOgFpCHjq8L-P4O1pMCjFMcitHUDMkPHMpztjyF2Gfx5zzGZQK-3bM6BN4oWB&sigh=NrWxm3ARGWrIAA7a4p9rRb4PcMs&begin=0&len=6395300&docid=8023519711099005229

This is ridiculously complicated, and obviously needs to be automated (as it usually is, via javascript! :-) ) But it produced a working result and is describable.

(after a trip to Food for Less)

However there is an easier way to do this. I just fed the original URL into usnatch:

usnatch 'http://video.google.com/videoplay?docid=8023519711099005229&q=debate&ei=IYM7SOuXEKDk4AL5473bAw' -u

and it output this url, probably by way of scraping it from KeepVid.com:

http://vp.video.google.com/videodownload?version=0&secureurl=twAAAD0W5d-wPcVVeTl7QrGr5zTKridzSGf41ASu20PecohoD0a7UlL0hOryryfecm0kR0Az1TAZjqmcK4Jhzww767-M--b5VXs0aG2FyEksUHG7jZMWLv12yp10ahgVqupjVDS1ehay8IuXr_K5CJYVeSwkYqKv5owxTDiGz7X7xKbrgQVNx-7ue4RTDjur5LWmoqryoLSCkqAgx6UteEa8LIwTCBSDJhB3jzak8cwIcF70G2Np9NfuVtZ8OmJwuiMRsg

which played when I used it with mplayer, making sure to enclose the url in single quotes after pasting it into the command. So it could have been played from Lynx, and I'm downloading the file from Google right now. The trick is to back up from the page where they want you to view it, and be on top of the video.google.com search results link to run usnatch. Alternatively, from that page you could invoke usnatch on the original url by using the comma key instead of the period key to invoke the usnatch external program. Period calls externals for the currently active link, comma calls externals for the current page.


2008 June 1 Afterward

...And of course the ultimate goal of the long exercise is to fold the algorithm described above into usnatch, with the idea of making it less dependent on scraping information from sites like http://KeepVid.com.

Wednesday, May 28, 2008

Truth or Skype?

Recently I had to install Skype in order to be able to contact a job lead in the manner most convenient for them. After my recent successes with SIP VOIP I thought it was time to try Skype, I'd heard much good about it.

I went to their site, located the Linux build I needed, and scanned the hardware requirements. I thought I saw the requirements say that it only needed a 400 MHz CPU and about 128 MB of memory. The installation on my desktop computer went smoothly, a simple dpkg installation of a .deb package. When I finished and tried calling their echo test number, echo123, I was greatly disappointed. The incoming sound was fine, but I'd have to rate what I heard of my voice echoed back as one of the worst audio experiences of my life, that badly distorted.

I installed Skype on another computer, a much more powerful media server, and found the performance was fine. The problem then was with my desktop. I went back to the download page, and noticed that the required CPU speed was 1 GHz, not the 400 MHz I remembered seeing. I searched around some and did find a page on Skype's support site that quoted the speed I remembered seeing. http://support.skype.com/index.php?_a=knowledgebase&_j=questiondetails&_i=184&nav2=Skype%20for%20Linux Whether they changed this after I downloaded the .deb, or whether I had just skimmed surface pages of their web sites looking for the information, I can't recall. I'd heard so much good about Skype I didn't expect any problem, especially after the SIP successes I'd had lately.

Shane on irc.freenode.net suggested some terms to google, and after much floundering I found a couple of pages that gave older versions of Skype's client program.

After an abortive try at 1.4, I found a version 1.3 that seems to work fine. There were some comments that turned up indicating some people found versions after a 1.3 release to be incompatible with some Linux distros. Whether this was because people using Slackware Linux had no other need to update their hardware, or a genuine software incompatibility, I don't know.

I also thought I saw Skype claim their program works over 33.6 Kbps dialup lines. I consider this conclusive evidence that they use some form of audio compression, which could explain the increased need for CPU speed. Probably not a problem on more modern computers, but one form of audio compression, encoding to .mp3, is definitely not a real time process on my computer, which probably supports this explanation of the problem.

Anyway, this might save someone some time.

Monday, May 26, 2008

A Phone Call From Ralf

Today I received a phone call from Ralf, a friend of mine and of some of my other acquaintances. The thing that was unusual about the call was that it was the first I have received using Voice Over Internet Protocol (VOIP) / Session Initiation Protocol (SIP). I've placed many outgoing calls in the past, but never received any till today. The call was made using the services and accounts of the Gizmo Project (http://www.gizmoproject.com) by both of us.

One of the cool things I like about Gizmo's service is that if you are not logged in with it and you receive a call, they will email you a .wav file of any voice message the caller leaves for you, effectively turning your inbound email into an answering machine. I've played these wav files with mplayer while browsing my email's web interface with Lynx. I consider this an indication of keeping things interoperable by sticking with basic, universal protocols and formats.

I'd just gotten outgoing calls using my Gizmo account to work yesterday with the Linphonec client, from the Debian Linux distribution package linphone-nox. This particular client is a text console application that simply presents you with a prompt to type in commands, and does offer a 'help' command to get a list and further details of possible commands, as well as tab completion. In general the commands are fairly straightforward, but I do have a few complaints. It does not respond to 'exit'; you must type in 'quit' to leave the program. There is no command to toggle muting on the microphone, so I have to put it in a wrapper alias to handle this with an 'amixer' command I figured out to work with my particular sound card. This probably varies from soundcard to soundcard. Also there is no direct command to dump the call log (the output of the 'call-log' command) into a file for saving. There may be a switch, '-l ', to handle saving the logfile, but that will need to be folded into the wrapper alias, so it has to be thought out ahead of time.

One notable thing about the call was that it traversed three firewalls, two of them performing NAT (Network Address Translation). There had been some adjustments to these firewalls to handle outgoing calls, but apparently that was sufficient to enable incoming calls as well. NAT is frequently considered the bane of VOIP, but apparently, at least on a limited scale, it can be dealt with.

Anyway, before the day ended I'd received several other calls from Ralf and another friend, further confirming that the combination of software, hardware and protocols provided functioning telephony.

Wednesday, April 30, 2008

Quick and Dirty TV Tuner for VLC

I'm partial to mplayer for most of my video and other media playing, but found myself using VLC a lot recently to verify that a V4L (Video for Linux) compatible 'PVR' (Personal Video Recorder) card had its driver properly installed and was functioning correctly. I was frequently picking 'File / PVR' and entering

pvr:/dev/video0:channel=0:norm=ntsc:size=640x480:frequency=561250

something I find as awkward to type as you probably would too. This sets the PVR card to 'channel' zero (the TV tuner), sets it for the NTSC broadcast format, a typical screen pixel size, and a frequency from the standards that tunes to the local broadcast channel 28 (KCET, Los Angeles, CA). Fumbling around some, I figured out how to save this as a playlist, and have since forgotten that process.

But what I did remember was the format the playlist was in. Using that sample playlist file and the information at http://en.wikipedia.org/wiki/North_American_broadcast_television_frequencies I created a 'TV tuner playlist', stashed online at http://www.lafn.org/~aw585/na.tv.channels.pls
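
For reference, the entries follow the ordinary .pls playlist layout. A trimmed sketch - the real file online has the full channel table, and the channel= input numbers for the composite/S-Video entries are whatever your particular card assigns:

[playlist]
NumberOfEntries=3
File1=pvr:/dev/video0:channel=0:norm=ntsc:size=640x480:frequency=561250
Title1=KCET 28 Los Angeles (actual)
File2=pvr:/dev/video0:channel=1:norm=ntsc:size=640x480
Title2=Composite input
File3=pvr:/dev/video0:channel=0:norm=ntsc:size=640x480:frequency=55250
Title3=Channel 2
Version=2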

The first few entries are what I want for a default channel, with both the 'proper' and 'actual' frequency (perhaps the card or Wikipedia still needs some adjustment), then entries for the S-Video and Composite settings of the PVR card, and then it settles into the standard North American channels in proper order. Using the information online at Wikipedia it can of course be changed to personal/local needs. Someone more knowledgeable about TV technology can probably suggest improvements to the playlist/table presented.

From the VLC control panel you can select 'Settings' / 'Preferences' / 'Playlist', enter the local name of the file as the default playlist ('.vlc/na.tv.channels.pls' in my case, stored in the .vlc subdirectory of my home directory), and save, to get the first entry to start on VLC startup.

This leaves a lot to be desired as an interface, but you just click on the previous/next stream graphics (the double arrows) to move up and down the North American TV bands. It sometimes seems to need double clicks to get past some 'humps' going down the bands past stations, but it is functional as a quick hack!

Tuesday, April 29, 2008

Ubuntu Battle Prep

I've ended up doing several Ubuntu installs lately, and find I have to go through a procedure to get some favorite tools installed when trying to dig out information and solve problems.
  • Edit /etc/apt/sources.list. Make a second copy of each of the default source lines installed, and on the copies change 'http://archive....' to 'http://us.archive....', to give some quick and easy extra depth to the available Ubuntu .deb package archives (see the sed sketch after this list).
  • Immediately 'sudo aptitude update' to ensure access to the latest packages.
  • edit ~/.bashrc, including lines with:
    • set -o vi
    • export IRCSERVER='irc.freenode.net'
    • export IRCNICK='mr_dallas'
    • alias realias='vi ~/.bashrc ; source ~/.bashrc ' ;
  • sudo aptitude install screen lynx-cur tinyirc ckermit gkermit
  • edit /etc/lynx-cur/lynx.cfg, adding 'TEXTFIELDS_NEED_ACTIVATION:TRUE' (this and the lynx options changes below move lynx from klutz to 'thought control' mode for rapid googling of answers).
  • edit /etc/screenrc, adding
    • 'escape ^Oo' # this avoids a lot of conflicts of screen with other programs
    • 'startup_message off' # for less noise when starting screen
    • perhaps beefing up 'defscrollback' to maybe 4096.
  • startup Lynx, go into options ('o')
    • change it to advanced user mode
    • turn on 'vi' keymode
    • For the number pad set 'Links and form fields are numbered'
    • check the box to 'save options to disk' before accepting changes
    • Go to and bookmark my private portal page, where I have many useful links and forms.
This doesn't by any means duplicate my whole environment from my personal machine, but it makes available some simple tools that help quickly hunt for answers and solve problems.
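
The sources.list step can be done in one pass with sed. A sketch, assuming the stock layout where every repository line begins with 'deb' (the -i.bak switch leaves a backup copy):

#  duplicate each 'deb' line, pointing the copy at the us. mirror
sudo sed -i.bak '/^deb/ { p ; s|http://archive\.|http://us.archive.| ; }' /etc/apt/sources.list
sudo aptitude update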

Saturday, April 5, 2008

Open(ing) ID(=)

The other day I was editing a personal, private portal page I've created for my own use. It has a lot of features I've been experimenting with, such as accelerated login to some services by way of simplified login forms, search engine forms with a single search field set up so that just hitting enter in the field starts the search, etc.

I was checking the proper HTML for internal fragment links, recalling that a quick reference book I had gave the form '<A name="....' while some examples I had from another web page used '<A id="....'. This triggered a chain of thought when I was looking at another page and noticed that 'id="'s were sprinkled all over the place in the HTML - if you don't believe me, just take a look at the source for http://en.wikipedia.org. So - could I tack #xxx's onto the end of a URL and have it go to the spot corresponding to an arbitrary piece of HTML in a web page?

The answer for Lynx was yes, and some checking seemed to indicate the same held true for Firefox as well. Everything up to this point may seem obvious to someone with more in-depth knowledge of HTML. My next thought, though, was that I'd like to make it easier to bookmark a page with a more specific location added as an HTML fragment to the address.
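
To make the idea concrete (example.org and the 'content' id here are placeholders of my own, not from any real page): if a page contains an element like

<DIV id="content"> ... </DIV>

then the address

http://example.org/page.html#content

takes the browser straight to that element when the page loads.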

A few hours of work on the idea turned out a modest-sized script that can be used as an EXTERNAL for Lynx. It strips out the name= and id= fields of a web page, reformats them into #<name> and #<id> fragments on the URL, and pumps both lists, wrapped with appropriate HTML, into Lynx for browsing/bookmarking. When you're finished, just exit Lynx to resume where you left off. An example of the sort of thing that gets kicked out when you run it on http://www.google.com:
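
(A reconstruction from memory of the shape of the output, not a verbatim capture - the actual name= and id= values on google.com will differ:)

<HTML><BODY>
<A href="http://www.google.com/#q">http://www.google.com/#q</A><BR>
<A href="http://www.google.com/#gbar">http://www.google.com/#gbar</A><BR>
...
</BODY></HTML>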

This may not seem too important when used on such a deliberately simplified page, but it makes it much easier to bookmark, say, the search field of http://en.wikipedia.org. One weakness is that since most browsers make it relatively hard to bookmark such a specific location in a page, these fragments are probably more subject to change with the whim of the HTML coder and/or their coding tools.

'name=' didn't seem to be noticed by the browsers I checked with, but it was used in the example in the reference guide I looked at, so its treatment may change in the future (or have changed in the past); I left it in the script. Some example pages I checked set 'name=' and 'id=' the same on many HTML elements. Glancing through the quick reference on HTML I found several elements that specifically allowed the 'id=' attribute: <A>, <DIV>, <LAYER>, <PARAM> and <SPAN>. But on closer inspection, I found many elements listed a '%coreattrs', and when I found that in the guide it included, besides 'id': 'class', 'style', and 'title'. %coreattrs pretty much busts wide open the set of elements that can have 'id' attributes - <FORM>, <UL> etc. I used this to simplify the HTML in the portal page that triggered this discussion, rolling fragments for internal navigation into lists that group related links/forms.

This is also an example of a subject I've railed about in person on occasion: that web pages are subject to different interpretations ("renderings") depending on circumstances and the goals of the 'viewer'. This extreme view provides a list of everything I can think of that could be linked to on a web page.

I've pitched the script temporarily at: http://www.lafn.org/~aw585/microBkMrk
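
For the curious, the gist of the approach can be sketched in a few lines of shell. This is not the actual microBkMrk script, just a minimal rendition of the idea; the EXTERNAL line at the end is one plausible way to hook such a script into lynx.cfg.

#!/bin/bash  -
#  sketch only: pull the name=/id= values out of a page and build a
#  clickable list of fragment URLs to browse/bookmark in Lynx
URL=${1}  ;
TMPF=$(mktemp)  ;
{
  echo '<HTML><BODY>'  ;
  lynx -source "${URL}" |
    grep -Eo '(name|id)="[^"]+"' |
    sed -e 's/^[^"]*"//' -e 's/"$//' |
    sort -u |
    while read FRAG ; do
      echo "<A href=\"${URL}#${FRAG}\">${URL}#${FRAG}</A><BR>"  ;
    done  ;
  echo '</BODY></HTML>'  ;
} > ${TMPF}  ;
lynx ${TMPF}  ;
/bin/rm -f  ${TMPF}  ;

#  in lynx.cfg, something like:
#    EXTERNAL:http:microBkMrk %s:TRUE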

7 April 2008 Addendum:

    A few other points about this topic:
  • Since I suspect that fragment addresses will be more subject to change than the base address of a page, it may be more important to monitor pages bookmarked/linked to in this manner for changes, by whatever means.
  • Conversely, this is yet another view of a web page that can be hashed/'fingerprinted' to monitor for changes in the page.
  • It might be useful to dump out a few more attributes, and maybe the element they are part of, for the IDs and NAMEs, to provide a more human-readable overview of the page. Many pages are generated by tools that give the author a very abstract, high-level relation to the HTML - in other words, he never actually saw the HTML, and there was no thought of anyone needing to look at it. This provides one view that summarizes features of the page. There are probably quite a few features that could be added to this tool.
  • From an accessibility/usability viewpoint, this might save repeated thrashing with a page. Someone might fight it out, or get help, the first time, but then by finding a precise spot they can bookmark, link to, or just go to with a #<fragment> address, they can effectively reuse the work of initially figuring out how to navigate the page. This isn't necessarily the result of a poorly designed page; in some instances it could merely be the nature of the information being presented or asked for. The full implications of this remain to be seen.

Tuesday, March 18, 2008

Breaking News

Tonight I finished reading "How to Break Software" by James A. Whittaker, (c) 2003, ISBN 0-201-79619-8. I think the book is/was used as a textbook on software testing at the Florida Institute of Technology.

    There were a couple of things that stuck in my mind from reading the book.
  • Ultimately, and conceptually most accurately, all 'application software' interacts only with the Operating System. It is one job of the Operating System (OS) to provide the application software with access to various forms of I/O and other resources.
  • When this book was written, various special testing applications were used as wedges between the software under test and the Operating System, to supply bogus I/O and environments to the tested software as part of the test program.

The period since this book was written has seen the rise in importance of OS virtualization. Perhaps this has already been worked out, but the ability to dynamically inject I/O reactions and make arbitrary, on-demand changes to the perceived environment of the application/OS system seems a logical development of OS virtualization - by which I mean literally dialing in changes on the fly. This would effectively replace those 'wedge programs' with virtualized OSs, and would also extend the testing methodologies Whittaker documents to prepackaged app/virtual-OS bundles.

Also, I can't recall much in the way of specific web application testing information in the book. Sun proclaimed "The network is the computer", and the client is as much a part of that computer as the server and any intermediate routers. One continually hears about the difficulty of testing web applications against all the possible browsers people out in user land may be using. An ideal client would of course adhere strictly to all web standards, but no one in the real world today is using such an abstraction. An ideal *test* client would be able to dial in the actual behavior of every real-world browser as well as that of the ideal, perfectly standards-compliant browser. Such a test tool would not only be a breakthrough in web testing, but also the ideal chameleonic browser for the user, to deal with all the weird, poorly tested 'web apps' on the Internet.
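
Today about the closest everyday approximation to 'dialing in' a browser is crude header impersonation. A sketch with curl - example.org stands in for whatever web app is under test, and this of course mimics only the User-Agent header, not a browser's actual rendering or scripting behavior:

#  fetch the same page while claiming to be two different browsers
curl -A 'Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)' http://example.org/webapp
curl -A 'Mozilla/5.0 (X11; U; Linux i686) Gecko/20080201 Firefox/2.0' http://example.org/webapp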

Perhaps someone out there knows more about these subjects and can expand on these thoughts.

Friday, March 14, 2008

Let's be Frank

Continuing on about the ALoud talk of 13 March 2008 by Robin Wright: the moderator for the presentation mentioned that "some people think Rock and Roll brought down the Iron Curtain". This triggered a memory.

I was at a party some time in the late 80's. A woman asked if I'd done anything important lately. My response: ~"I saw a Frank Zappa concert". Her response - eyes beginning to bug out, face changing color, a look of total outrage - "Well, I hope you *Rocked Out*....." and a tirade of abuse vented till it ran out of steam.

I want to go on record now, before the world, for all time: I am not only glad I saw Frank Zappa while he was alive, I consider it at least on par with touring the Getty Museum or Rome/the Vatican. I'd say the same for Carlos Santana, Wynton Marsalis and Miles Davis, but those are other articles to be written. I understand there are college classes/symposia studying Frank Sinatra and Madonna - Zappa is at least as worthy.

Producer Tom Wilson thought he'd latched onto an American equivalent of the Rolling Stones - some white guys forming a blues band - when he heard "Trouble Every Day" (about the Watts Riots of the 60's). That idea changed greatly when he heard "Any Way the Wind Blows" and "Who Are the Brain Police". The list of ~"people who have made this possible. Please do not hold it against them" on one of the group's first albums was a cultural education most colleges barely begin to match.

Besides dealing with musicians from Edgard Varese, Zubin Mehta and Pierre Boulez to Johnny "Guitar" Watson, Zappa created satire on the level of Groucho Marx and H.L. Mencken simply by singing about topics from the Illinois Enema Bandit to Senate hearings:

"Government is the entertainment division of the military industrial complex."

(rock journalism is) "people who can't write interviewing people who can't talk for people who can't read."

and on and on....

But just google "Vaclav Havel" and "Frank Zappa", and you'll get hundreds of hits. Zappa's "question authority" outlook, coupled with his free use of traditional and avant-garde elements of Central and Eastern European music, placing them in a context where freedom held sway, provided an inspiring alternative to the "Socialist Realism" the public was expected to buy into during the period between the 60's Czech crisis and the final fall of Communism in Europe. He was labeled "Underground" in the U.S.A. when the public noticed him at all, but he fueled the real Underground in societies where the problems were much more serious than some of the silliness he lampooned.

Nerdistan

One of the things I created this blog for was to discuss some of the "ALoud" talks given at the Los Angeles (City) Public Library, sponsored by the Library Foundation of Los Angeles. Today, Thursday 13 March 2008, I was at a talk by Robin Wright, who recently completed the book 'Dreams and Shadows: The Future of the Middle East'. A podcast of her talk should be up soon; I'll probably post an addendum when it is.

One of her goals in writing the book was to try to find the "Lech Walesas and Nelson Mandelas" of the Middle East. She told some interesting stories about people committed to fighting the injustices of the region, getting out of prison only to return to the same kind of protests they did time for, but perhaps the most interesting aspect was something she hit on early in the talk, referred to by one of the audience questioners as "Nerdistan". In most Western societies, people take it for granted that governments are generally benign, but this has largely been the exception in human history. (You might want to check out http://en.wikipedia.org/wiki/R._J._Rummel) Basically, people in the Middle East are using high technology to bypass government-controlled media and organize public outrage against injustice and corruption. Wright discussed one example of people video recording government thugs trying to intimidate voters, blogging it, and the videos eventually making their way to YouTube. In another case people organized protests by text messages on cell phones. As we've learned with things like the Rodney King video, the act of recording governments in action and showing it to the public can sometimes have explosive effects, for good or bad.

Sun Tzu, in the last chapter of the "Art of War", talks about the importance of information in armed struggle. Perhaps the most titanic historical event of my lifetime, the Cold War, while definitely involving some real deaths by physical struggle, was largely fought at an abstract level, with the Venona decrypts, spy planes, satellites, submarines, signals intelligence, defectors, etc. We're beginning to see the implications of information getting into the hands, and under the control, of the common man - the pen beating the sword into submission.

Thursday, January 3, 2008

WWWsh

  • Meandering back from the County library, thinking about all the Google Search commands, I realized these were just a subset of the http://Yubnub.org commands.
  • This revived the idea of the World Wide Web shell, which Yubnub doesn't quite nail.
  • I'd given thought to making that shell with something like Kermit or CURL.
  • Some experiments with the information from the Yubnub html form indicated that the site makes extensive use of http redirection.
  • Using these tools would also require some tool to render the html when that is appropriate (most of the time), yet leave it open to output raw html when appropriate (sometimes).
  • This points back to a web browser, with some kind of plug in, like some of the Yubnub tools listed on their web site.
  • One of the main characteristics of a shell interface is that it normally returns to the prompt, or the command entry field, when it completes running a command.
  • To make sure that you always get back to the Yubnub/WWW command prompt would require either something done in the client program or having Yubnub proxy all the final "command" output.
  • One way of implementing this idea would be to edit some chunk of html form code into all web pages, and this can be done with Privoxy.
  • And so the following Privoxy filter, in /etc/privoxy/user.filter:

    
    #################################################################################
    #
    # yubnub: Try at a yubnub 'command prompt' on each web page
    #               30/12/2007 d.e.l.
    #
    #################################################################################
    FILTER: yubnub Try at a yubnub 'command prompt' on each web page
    
    
    s|(</body[^>]*>)|\
      <form action="http://www.yubnub.org/parser/parse" \n\
             method="get" name="input_box"> \n\
        <input type="text" name="command" size="55" value=""/> \n\
      </form>  \n\
      $1|ig
    
    #  For a single text input field, Lynx will submit on pressing
    #    return, so no submit button is needed.
    #  W3M, and possibly members of the Links family of browsers, have
    #    a function to force submission of forms.
    #  To use this with other browsers, you may need to include a line
    #    like the following before closing the form with '</form>':
    #
    #   <input type="submit" value="Enter">
    #
                      
    
  • then this in user.action:

     
     { +filter{yubnub} }
     .
      
    
  • And after restarting Privoxy you get a screen that looks like this:

    --
    ,,,                                            Los Angeles Free-Net's Home Page+
       The Los Angeles Free-Net
    
       A volunteer non-profit organization dedicated to bringing people
       together, providing community information, and offering Internet
       access at the lowest possible cost.
    
       [1]____________________________________________________

       [ ... the rest of the page is blank ... ]

    (Form field) Inactive. Use <return> to edit ('x' to submit with no cache).
    --
    

    Where the form field shown is the command prompt/entry field.

  • In lynx, ^E will always put you at the bottom of the current web page, which from now on will be the prompt location.
  • This has the obvious weakness that there must be a '</BODY..' tag for it to work - but then, some apps run inside bash also fail to complete properly and hang.
  • It also turned out that the default settings for Privoxy skip edits on sites with 'wiki' in the address (as one example), but it is also normal for a shell not to present a prompt while an application is running, so this makes some sense.
  • Time to start learning more Yubnub commands - the ones I can think of off the top of my head are 'g ', 'w ' (for a weather report), 'wp ', and 'ls' (for a list of commands).
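
A quick way to confirm the filter is firing - a sketch, assuming Privoxy is listening on its default 127.0.0.1:8118 and that the fetched page has a closing body tag:

    #  fetch a page through the proxy and look for the injected form
    curl -x 127.0.0.1:8118 http://www.lafn.org/ | grep -i yubnub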

Addenda 15 Jan. 2008

I should point out this idea could be used with simpler services, like the standard Google search, www.google.com/ie, Google proxies like www.scroogle.org/scraper.html, etc. It's just that Yubnub includes these as commands; but even if there were no Yubnub, these alone would be enough to justify the idea. Also, I may need to rename this - I noticed that WWWsh is already used by some software of some kind!

Tuesday, January 1, 2008

The Google Project

  • A list can be something between the formlessness of stream of consciousness, and mathematics and computer science.
  • It all started when I realized that Los Angeles Free-Net, for various bureaucratic reasons, no longer gained any benefit from its users having LAFN as their start page. An e-mail and the response confirmed this, so the quest was on to create my own start or "portal" page.
  • One of the things I wanted on this start page was to have the links go as directly to the functionality of the linked pages as possible.
  • I mentioned this to L. at one of the San Fernando Valley LUG's meetings, in conjunction with a stab at doing it with Google. He pointed me to the Scroogle Scraper, http://www.scroogle.org/scraper.html. At first I had this confused with some direct hack on Google's home search page.
  • This list of thoughts got under steam the next day while showering and continued through a walk to the L.A. county library.
  • I started digging around for all the Google search hacks.
  • Did a bit of web surfing on the subject.
  • Located my copy of the O'Reilly "Google Hacks" book.
  • Then it occurred to me: is there a Google Search man page?
  • I stumbled onto the Google help page,
  • But using one of the tips, 'man google' did not turn up a properly formatted, real man page.
  • Getting back to the start page project, one of the goals was to group some links into functional groups like E-mail, Twitter, Web 1.0, Web 2.0 etc.
  • And where did Google belong on here, Web 1.0 or 2.0?
  • And with a brief glimpse, a nanosecond flash of vision from the Stapledonian perspective, I realized that Google was in fact the first Web 2.0 application. With the power of superior indexing and search algorithms, they had effectively shanghaied all preceding Internet activity into providing the user-supplied content for The Google Project, previously known as the World Wide Web.
  • (with their search results, the mother of all mashups.)
  • ...Or maybe it was Linux, which pulled ahead of BSD in mindshare by opening up to user-supplied content.
  • Or maybe Richard Stallman and the FSF?
  • Besides, Google doesn't seem to have a charismatic leader like Linus, RMS or Jimmy Wales.
  • Do they need one?