Regex URL Matching
On this site we check for the existence of a URL whenever an entry is updated.
There are two key technologies at work:
- A PHP function that checks whether a URL is valid (thanks to marufit at gmail dot com in the PHP Manual)
- Regex (regular expression) in a preg_replace_callback routine; this one is mine, all mine
function url_exists($url)
{
    // Checks whether a URL actually exists on the Internet.
    $handle = curl_init($url);
    if (false === $handle)
    {
        return false;
    }
    curl_setopt($handle, CURLOPT_HEADER, false);          // do not include headers in the output
    curl_setopt($handle, CURLOPT_FAILONERROR, true);      // treat HTTP status codes >= 400 as failure
    curl_setopt($handle, CURLOPT_NOBODY, true);           // send a HEAD request, skip the body
    curl_setopt($handle, CURLOPT_RETURNTRANSFER, false);  // curl_exec() returns true or false
    $connectable = curl_exec($handle);
    curl_close($handle);
    return $connectable;
}
function aExists($matches)
{
    //
    // Function called by preg_replace_callback.
    //
    // $matches[0] is the complete match,
    // $matches[1] the match for the first subpattern
    // enclosed in '(...)', and so on.
    //
    // Checks whether a regular link exists;
    // something similar is done for img src= as well.
    //
    $srcURL = $matches[3];   // third subpattern: the href value
    if (url_exists($srcURL)) {
        // do something
        return "";
    } else {
        // do something else
        return "";
    }
}
$foo = preg_replace_callback(
    '/(.*?)(<a .*?href=")([^"]*)("[^>]*>)(.*?)(<\/a>)/i',
    "aExists",
    $source_string
);
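
To see how the subpatterns line up, here is a minimal sketch that runs the same pattern over a sample string. It makes a few assumptions that are not part of the routine above: fake_url_exists() is a hypothetical stand-in for url_exists() so the example runs without network access, markDeadLinks() is an illustrative callback name, and replacing a dead link with its text plus " [dead link]" is just one possible "do something else".

```php
<?php
// Hypothetical stand-in for url_exists(): pretends that one URL is dead,
// so the sketch runs without touching the network.
function fake_url_exists(string $url): bool
{
    return $url !== 'http://example.com/missing';
}

// Illustrative callback. With the pattern below:
//   $matches[1] = text before the link
//   $matches[3] = the href value
//   $matches[5] = the link text
function markDeadLinks(array $matches): string
{
    if (fake_url_exists($matches[3])) {
        // Live link: return the complete match unchanged.
        return $matches[0];
    }
    // Dead link: keep the leading text and the link text, drop the tags.
    return $matches[1] . $matches[5] . ' [dead link]';
}

$pattern = '/(.*?)(<a .*?href=")([^"]*)("[^>]*>)(.*?)(<\/a>)/i';
$source_string = 'See <a href="http://example.com/ok">this</a> and '
               . '<a href="http://example.com/missing">that</a>.';

$result = preg_replace_callback($pattern, 'markDeadLinks', $source_string);
echo $result, "\n";
// prints: See <a href="http://example.com/ok">this</a> and that [dead link].
```

Note that whatever the callback returns replaces the complete match, including the leading text captured by the first subpattern, so a callback that wants to keep a link has to return $matches[0] (or rebuild it from the pieces).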
(my thanks to http://centricle.com/tools/html-entities/ for HTML encoding)