wget blocked (frustration)

I found that some sites actively block the wget user-agent string to prevent automated grabbing of web pages. In particular, ctrl-alt-del was blocking it. I tried hitting cad-comic.com many times and couldn't figure out what was wrong until I tested with a user-agent switcher in Firefox and found that the Wget agent string is blocked.

I was furious for a moment, but of course:
---
man wget
/user-agent
---
yielded the solution.

wget --user-agent="opera"

At first I was angry, but then amused that their efforts are thwarted by an option that comes stock with the tool. Before that, I had also found a workaround in "w3m -dump_source", as w3m is not blocked.
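
For anyone who wants to script this, here is a rough sketch of both approaches wrapped as shell functions. It is only an illustration, not the scripts mentioned further down: the names UA, fetch_page and fetch_page_w3m are made up, and the URL in the usage comment is a placeholder.

```shell
#!/bin/sh
# Rough sketch only. UA and both function names are hypothetical.

UA="opera"   # the agent string from the post; swap in whatever the site accepts

# Fetch one page with wget, sending $UA instead of wget's default agent string.
# -q silences progress output, -O - writes the page to stdout.
fetch_page() {
    wget -q --user-agent="$UA" -O - "$1"
}

# Fallback from the post: w3m prints the raw page source with -dump_source.
fetch_page_w3m() {
    w3m -dump_source "$1"
}

# Example (placeholder URL):
#   fetch_page "http://example.com/" > page.html
```

Keeping the agent string in one variable means it only has to be changed in one place if a site starts rejecting it.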

I thought I'd vent my anger a bit here. :) Hope it helps anyone in the future trying to automate the downloading of webpages...

Incidentally, if anyone wants to read ctrl-alt-del or questionablecontent en masse without clicking "next" repeatedly, let me know. I have some scripts that make this convenient. :)
(email me as I often forget to check these nowadays)

aberry@uoguelph.ca's picture

I'm pretty sure there are Firefox extensions to automate such processes as well. You might want to look at http://pipes.yahoo.com/pipes/search?r=source%3Acad-comic.com for some inspiration.

--
Andrew

npresta@uoguelph.ca's picture

Perhaps use cURL?
