This is an extract from what was originally posted on the
larryhsfriends@yahoogroups.com
mailing list Tuesday 27 May 2008.
It's posted here at the request of a mutual friend, Charles.
I've deliberately avoided replacing the extravagant URLs with TinyURL.com abbreviations, because these URLs would normally be used only once, and because they illustrate all the dirty work involved in this branch of screen scraping and accessibility.
My blind friend Larry was trying to access a video (to listen to, smart alec),
located by going to http://video.google.com,
- clicking the radio button to limit the search to Google-hosted videos,
- searching on the term 'debate', and
- following Lynx numbered link #69, an MSNBC debate from October 31, 2007,
  whose hosting page is at
  http://video.google.com/videoplay?docid=8023519711099005229&q=debate&ei=IYM7SOuXEKDk4AL5473bAw
Dallas:
You are correct that there was no link for a .mp4 on that particular
page/video, so not all of them currently have that option.
However (grin) by a process too complicated to explain right now,
I was able to get a bloated, rather complicated URL
for the video in question.
Try this. You'll have to cut and paste it, making sure the whole
thing (all 4 lines of it) is on one line and quoted, so the &
characters don't send parts of it into the background as bogus jobs:
http://vp.video.google.com/videodownload?version=0&secureurl=QwAAAD0W5d-wPcVVeTl7QrGr5zTKridzSGf41ASu20PecohoZi2sOgFpCHjq8L-P4O1pMCjFMcitHUDMkPHMpztjyF2Gfx5zzGZQK-3bM6BN4oWB&sigh=NrWxm3ARGWrIAA7a4p9rRb4PcMs&begin=0&len=6395300&docid=8023519711099005229
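Why the quotes matter: an unquoted & ends a shell command and runs the part before it in the background, so the rest of the URL turns into separate, bogus commands. As a small illustration (not part of the original workflow), Python's standard shlex.quote produces a properly quoted argument, and shlex.split confirms a POSIX shell would see the URL as one word; mplayer is only an example player here.

```python
import shlex

# The videodownload URL from above, split only for readability.
url = ("http://vp.video.google.com/videodownload?version=0"
       "&secureurl=QwAAAD0W5d-wPcVVeTl7QrGr5zTKridzSGf41ASu20PecohoZi2sOgFpCHjq8L-P4O1pMCjFMcitHUDMkPHMpztjyF2Gfx5zzGZQK-3bM6BN4oWB"
       "&sigh=NrWxm3ARGWrIAA7a4p9rRb4PcMs&begin=0&len=6395300&docid=8023519711099005229")

# shlex.quote wraps the string in single quotes because it contains
# characters such as '&' and '?' that a POSIX shell treats specially.
quoted = shlex.quote(url)
command = "mplayer " + quoted
print(command)

# Splitting the command the way a shell would gives back exactly two words:
# the program name and the intact URL.
assert shlex.split(command) == ["mplayer", url]
```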
Since I stepped through this once, in theory it can be automated.
I've decided to try to recreate it and record the steps for posterity.
Looking at the page you cited, I noticed a simplified URL in the
suggested HTML for embedding the video, at the bottom of the page:
http://video.google.com/googleplayer.swf?docid=8023519711099005229
I then fed this into gnash with this command:
$ gnash -vr 2 'http://video.google.com/googleplayer.swf?docid=8023519711099005229'
-v is for verbose
-r 2 is to play sound only, no video.
This cranked out about a screen of hard-to-understand, very technical
messages and eventually stalled out, at which point I lost patience and
hit Control-C. Before stalling, though, it spat out this message:
17430] 22:13:13: SECURITY: Loading XML file from url:
'http://video.google.com/videofeed?fgvns=1&fai=1&docid=8023519711099005229&hl=undefined'
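If you do run gnash, its output can be filtered by a script instead of read by eye. A minimal Python sketch, assuming the message always has the shape quoted above (the pattern and the helper name feed_url_from_gnash_log are hypothetical, derived only from this one sample):

```python
import re

# Matches gnash's "Loading XML file from url:" message and captures the
# single-quoted URL that follows, even if it starts on the next line.
LOG_PATTERN = re.compile(r"Loading XML file from url:\s*'([^']+)'")

def feed_url_from_gnash_log(log_text):
    """Return the first URL gnash reported loading as XML, or None."""
    match = LOG_PATTERN.search(log_text)
    return match.group(1) if match else None

# The message as quoted above.
sample = ("17430] 22:13:13: SECURITY: Loading XML file from url:\n"
          "'http://video.google.com/videofeed?fgvns=1&fai=1"
          "&docid=8023519711099005229&hl=undefined'")
print(feed_url_from_gnash_log(sample))
```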
This URL contains the same all-numeric docid string as the original
URL, so everything up to here could be reached by a shortcut, knowing
which part of the original URL is crucial (the docid) and the form of
this final URL. That is to say, gnash is not essential to get this far!
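The shortcut can be sketched in Python: pull the docid out of the original videoplay URL and drop it into the videofeed URL form. This assumes the other parameters (fgvns, fai, hl) can simply be copied from the one URL gnash showed; the function names are hypothetical.

```python
from urllib.parse import parse_qs, urlencode, urlparse

def docid_from_videoplay_url(url):
    """Return the numeric docid from a video.google.com/videoplay URL."""
    return parse_qs(urlparse(url).query)["docid"][0]

def videofeed_url(docid):
    """Build a videofeed URL of the form gnash reported loading.

    fgvns, fai and hl are copied verbatim from the observed URL; whether
    all of them are actually required is untested.
    """
    params = [("fgvns", "1"), ("fai", "1"), ("docid", docid), ("hl", "undefined")]
    return "http://video.google.com/videofeed?" + urlencode(params)

original = ("http://video.google.com/videoplay?docid=8023519711099005229"
            "&q=debate&ei=IYM7SOuXEKDk4AL5473bAw")
print(videofeed_url(docid_from_videoplay_url(original)))
```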
I then dumped that URL with the command:
$ lynx -source 'http://video.google.com/videofeed?fgvns=1&fai=1&docid=8023519711099005229&hl=undefined'
and, studying the output, I noticed this string:
vidurl=http%3A%2F%2Fvideo.google.com%2Fvideoplay%3Fdocid%3D8023519711099005229%26hl%3Den&usg=AL29H2354Bu9OKKGOkt8CkFi7UeioVIIgQ"
I then fed that (minus the vidurl= prefix and the trailing quote) into
a tool I have, 'dex' (de-hex encode), to convert the %-escaped,
hex-encoded characters back into plain characters:
$ dex <<< 'http%3A%2F%2Fvideo.google.com%2Fvideoplay%3Fdocid%3D8023519711099005229%26hl%3Den&usg=AL29H2354Bu9OKKGOkt8CkFi7UeioVIIgQ'
and it output:
http://video.google.com/videoplay?docid=8023519711099005229&hl=en&usg=AL29H2354Bu9OKKGOkt8CkFi7UeioVIIgQ
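Spotting vidurl= in the feed and de-hex-encoding it can be sketched in Python. Here urllib.parse.unquote stands in for dex; I don't know exactly how dex treats edge cases such as '+', so the dex function below is only an approximation. The regex assumes the value ends at the next double quote, as in the sample.

```python
import re
from urllib.parse import unquote

def dex(text):
    """Approximation of 'dex': turn %XX escapes back into plain characters."""
    return unquote(text)

def vidurl_from_feed(feed_source):
    """Find the vidurl= value in the videofeed output and decode it.

    Assumes the value runs up to the next double quote, as in the sample.
    """
    match = re.search(r'vidurl=([^"]+)', feed_source)
    return dex(match.group(1)) if match else None

# The string spotted in the feed output above, trailing quote included.
sample = ('vidurl=http%3A%2F%2Fvideo.google.com%2Fvideoplay%3Fdocid%3D8023519711099005229'
          '%26hl%3Den&usg=AL29H2354Bu9OKKGOkt8CkFi7UeioVIIgQ"')
print(vidurl_from_feed(sample))
```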
I then did another dump:
$ lynx -source 'http://video.google.com/videoplay?docid=8023519711099005229&hl=en&usg=AL29H2354Bu9OKKGOkt8CkFi7UeioVIIgQ'
and saw this string in the output:
videoUrl\x3dhttp://vp.video.google.com/videodownload%3Fversion%3D0%26secureurl%3DQwAAAD0W5d-wPcVVeTl7QrGr5zTKridzSGf41ASu20PecohoZi2sOgFpCHjq8L-P4O1pMCjFMcitHUDMkPHMpztjyF2Gfx5zzGZQK-3bM6BN4oWB%26sigh%3DNrWxm3ARGWrIAA7a4p9rRb4PcMs%26begin%3D0%26len%3D6395300%26docid%3D8023519711099005229\x26
I then fed the part between the \x-escaped hex codes (\x3d is '=' and
\x26 is '&') into dex again:
$ dex <<< 'http://vp.video.google.com/videodownload%3Fversion%3D0%26secureurl%3DQwAAAD0W5d-wPcVVeTl7QrGr5zTKridzSGf41ASu20PecohoZi2sOgFpCHjq8L-P4O1pMCjFMcitHUDMkPHMpztjyF2Gfx5zzGZQK-3bM6BN4oWB%26sigh%3DNrWxm3ARGWrIAA7a4p9rRb4PcMs%26begin%3D0%26len%3D6395300%26docid%3D8023519711099005229'
and got the final URL:
http://vp.video.google.com/videodownload?version=0&secureurl=QwAAAD0W5d-wPcVVeTl7QrGr5zTKridzSGf41ASu20PecohoZi2sOgFpCHjq8L-P4O1pMCjFMcitHUDMkPHMpztjyF2Gfx5zzGZQK-3bM6BN4oWB&sigh=NrWxm3ARGWrIAA7a4p9rRb4PcMs&begin=0&len=6395300&docid=8023519711099005229
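The last extraction can be sketched the same way, assuming the page source contains the literal text videoUrl\x3d...\x26 exactly as shown above (the pattern and the function name are hypothetical, based on this single page):

```python
import re
from urllib.parse import unquote

# In the page's JavaScript, '=' and '&' around the URL appear as the literal
# four-character escapes \x3d and \x26, so match those as plain text.
VIDEO_URL_PATTERN = re.compile(r"videoUrl\\x3d(.*?)\\x26")

def download_url_from_page(page_source):
    """Return the decoded videoUrl value from a videoplay page, or None."""
    match = VIDEO_URL_PATTERN.search(page_source)
    return unquote(match.group(1)) if match else None

# The string seen in the page source above (raw strings keep the backslashes).
sample = (r"videoUrl\x3dhttp://vp.video.google.com/videodownload%3Fversion%3D0"
          r"%26secureurl%3DQwAAAD0W5d-wPcVVeTl7QrGr5zTKridzSGf41ASu20PecohoZi2sOgFpCHjq8L-P4O1pMCjFMcitHUDMkPHMpztjyF2Gfx5zzGZQK-3bM6BN4oWB"
          r"%26sigh%3DNrWxm3ARGWrIAA7a4p9rRb4PcMs%26begin%3D0%26len%3D6395300"
          r"%26docid%3D8023519711099005229\x26")
print(download_url_from_page(sample))
```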
This is ridiculously complicated, and obviously needs to be
automated (as it usually is, via JavaScript! :-) ).
But it produced a working result and is describable.
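Automating it is mostly a matter of chaining the steps above. A minimal Python sketch, assuming the feed and page still contain exactly the strings observed here; the function names are hypothetical, urllib's urlopen stands in for lynx -source, and the endpoints are the 2008 ones, which may no longer respond. The fetch parameter lets the logic be tried against saved pages.

```python
import re
from urllib.parse import parse_qs, unquote, urlencode, urlparse
from urllib.request import urlopen

def lynx_source(url):
    """Rough equivalent of 'lynx -source URL': return the raw page text."""
    with urlopen(url) as response:
        return response.read().decode("utf-8", errors="replace")

def resolve_download_url(videoplay_url, fetch=lynx_source):
    """Chain the manual steps above; return None if a pattern is not found.

    fetch is any function taking a URL and returning page text, so the
    logic can be tried against saved pages instead of the live site.
    """
    # Step 1: the docid is the only crucial part of the original URL.
    docid = parse_qs(urlparse(videoplay_url).query)["docid"][0]

    # Step 2: build the videofeed URL (the one gnash loaded) directly.
    feed_url = "http://video.google.com/videofeed?" + urlencode(
        [("fgvns", "1"), ("fai", "1"), ("docid", docid), ("hl", "undefined")])

    # Step 3: dump the feed and de-hex-encode the vidurl= value.
    match = re.search(r'vidurl=([^"]+)', fetch(feed_url))
    if not match:
        return None
    page_url = unquote(match.group(1))

    # Step 4: dump that page and decode the text between \x3d and \x26.
    match = re.search(r"videoUrl\\x3d(.*?)\\x26", fetch(page_url))
    return unquote(match.group(1)) if match else None
```

Called with the original videoplay URL, this would perform the same two dumps done by hand above, provided the site still served those pages.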
(after a trip to Food for Less)
However, there is an easier way to do this.
I just fed the original URL into usnatch:
$ usnatch 'http://video.google.com/videoplay?docid=8023519711099005229&q=debate&ei=IYM7SOuXEKDk4AL5473bAw' -u
and it output this URL, probably by scraping it from
KeepVid.com:
http://vp.video.google.com/videodownload?version=0&secureurl=twAAAD0W5d-wPcVVeTl7QrGr5zTKridzSGf41ASu20PecohoD0a7UlL0hOryryfecm0kR0Az1TAZjqmcK4Jhzww767-M--b5VXs0aG2FyEksUHG7jZMWLv12yp10ahgVqupjVDS1ehay8IuXr_K5CJYVeSwkYqKv5owxTDiGz7X7xKbrgQVNx-7ue4RTDjur5LWmoqryoLSCkqAgx6UteEa8LIwTCBSDJhB3jzak8cwIcF70G2Np9NfuVtZ8OmJwuiMRsg
which played when I used it with mplayer, making sure to enclose the
URL in single quotes after pasting it into the command.
So it could have been played from Lynx, and I'm downloading the
file from Google right now. The trick is to back up from the
page where they want you to view it, and run usnatch with the
cursor on the video's link in the video.google.com search results.
Alternatively, from the viewing page itself you could run usnatch
on the original URL by pressing the comma key instead of the period
key to invoke the usnatch external program. Period calls externals
for the currently selected link; comma calls externals for the
current page.
2008 June 1 Afterword
...And of course the ultimate goal of this long exercise is to build the algorithm described above into
usnatch, to make it less dependent on scraping information from sites like
http://KeepVid.com.