wget blocked (frustration)
I found that some sites actively block the wget user-agent string to prevent the automated grabbing of web pages. In particular, I found that ctrl-alt-del was blocking it. I tried hitting cad-comic.com many times and couldn't figure out what was wrong until I tested with a user-agent switcher in Firefox and found that the Wget agent string is blocked.
I was furious for the moment, but of course:
---
man wget
/user-agent
---
yielded the solution.
wget --user-agent="opera"
At first I was angry, but then amused that their efforts are thwarted by an option that comes stock with the tool. Before that, I had also found a workaround in "w3m -dump_source", as w3m is not blocked.
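For anyone scripting this outside of wget, here is a minimal Python sketch of the same idea: send a different User-Agent header with the request. The function names build_request and fetch are made up for illustration, and "opera" is just the string from the wget example above. I have not checked what any particular site accepts.

```python
import sys
import urllib.request


def build_request(url, user_agent="opera"):
    """Build a request that sends a custom User-Agent header,
    roughly what wget --user-agent="opera" does."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def fetch(url, user_agent="opera", timeout=30):
    """Download the page and return its body as bytes."""
    request = build_request(url, user_agent)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read()


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python fetch.py URL > page.html
    sys.stdout.buffer.write(fetch(sys.argv[1]))
```

Calling fetch() with a page URL should return the raw HTML, assuming the site only filters on the agent string.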
I thought I'd vent my anger a bit here. :) Hope it helps anyone in the future trying to automate the downloading of webpages...
Consequently, if anyone wants to read ctrl-alt-del or questionablecontent en masse without clicking "next" repeatedly, let me know; I have some scripts that make this convenient. :)
(email me as I often forget to check these nowadays)

I'm pretty sure there are Firefox extensions to automate such processes as well. You might want to look at http://pipes.yahoo.com/pipes/search?r=source%3Acad-comic.com for some inspiration.
--
Andrew
Perhaps use cURL?