A friend asked me for this and it turns out to be a bit trickier than you’d think.
There are plenty of tools for crawling a website and reporting broken links. You can get 90% of this with just wget or curl.
But checking a list of links is a bit trickier.
The basic test is pretty simple:
foreach ($urllist as $url) { if (fopen($url, 'r')) { print "valid"; } }
But it requires allow_url_fopen to be enabled, doesn’t check for redirects, chokes if you’re behind a proxy, etc.
Using curl solves these particular problems, but it requires PHP to be built with libcurl, and it's quite clunky to use:
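Something along these lines (a rough sketch, not the contents of the gist below; the checkUrl name and the 10-second timeout are my own choices):

// Returns true if the URL answers with an HTTP status below 400.
function checkUrl($url) {
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_NOBODY, true);          // HEAD request, skip the body
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);  // follow redirects
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);  // don't echo anything
    curl_setopt($ch, CURLOPT_TIMEOUT, 10);
    curl_exec($ch);
    $code = curl_getinfo($ch, CURLINFO_HTTP_CODE);
    curl_close($ch);
    return $code > 0 && $code < 400;
}

foreach ($urllist as $url) {
    print $url . ': ' . (checkUrl($url) ? 'valid' : 'broken') . "\n";
}

Note that some servers refuse HEAD requests, so you may need to drop CURLOPT_NOBODY and eat the extra bandwidth. Proxies are handled with the usual CURLOPT_PROXY options.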
Anyway, here’s what I came up with: https://gist.github.com/1508261
You’ll still run into URL parsing problems (for example, a URL needs a trailing slash after the hostname; the curl command line handles this fine, but PHP does not). Building the list of URLs is left as an exercise for the reader.
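For the trailing-slash case specifically, here's one way you might normalize a URL before checking it (normalizeUrl is just an illustrative name, not something from the gist):

// Adds the missing "/" to bare-hostname URLs like http://example.com.
// Query strings and fragments are left alone for simplicity.
function normalizeUrl($url) {
    $parts = parse_url($url);
    if (is_array($parts) && !isset($parts['path'])
            && !isset($parts['query']) && !isset($parts['fragment'])) {
        $url .= '/';
    }
    return $url;
}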
If anyone wants it, I can put up a simple web UI wrapper with a text area, a file upload button, or a REST web service for scanning URLs.
Here’s a Perl tool that checks for dead links and more:
http://journalxtra.com/2010/02/how-to-check-for-dead-links/3/