Philadelphia Reflections

The musings of a physician who has served the community for over six decades

Regex URL Matching

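On this site we check that every URL in an entry actually exists whenever the entry is updated.

There are two key technologies at work: cURL, which tests whether a URL responds, and preg_replace_callback, which finds each link in the entry and passes it to a checking function.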


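function url_exists($url)
{
    //
    // checks whether a URL actually exists on the Internet;
    // returns true if the request succeeded, false otherwise
    //
    $handle = curl_init($url);
    if (false === $handle) {
        return false;
    }
    curl_setopt($handle, CURLOPT_HEADER, false);          // leave headers out of the output
    curl_setopt($handle, CURLOPT_FAILONERROR, true);      // treat HTTP status 400 and above as failure
    curl_setopt($handle, CURLOPT_NOBODY, true);           // send a HEAD request; skip the body
    curl_setopt($handle, CURLOPT_RETURNTRANSFER, false);  // curl_exec returns true/false, not the response
    $connectable = curl_exec($handle);
    curl_close($handle);
    return $connectable;
}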
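
The function above has no time limit and does not follow redirects, so a slow or relocated page can stall the check or be reported as missing. What follows is a minimal sketch of a stricter variant, not the code this site runs: the names url_exists_strict and status_means_exists, the 10-second timeout, and the limit of 5 redirects are all assumptions chosen for illustration.

```php
<?php
// Hypothetical helper: decide whether an HTTP status code counts as "exists".
// A status of 0 means cURL never received an HTTP response at all.
function status_means_exists(int $status): bool
{
    return $status >= 200 && $status < 400;
}

// Hypothetical stricter variant of url_exists(); timeout and redirect limits are assumptions.
function url_exists_strict(string $url, int $timeoutSeconds = 10): bool
{
    $handle = curl_init($url);
    if (false === $handle) {
        return false;
    }
    curl_setopt($handle, CURLOPT_NOBODY, true);                    // HEAD request, no body
    curl_setopt($handle, CURLOPT_RETURNTRANSFER, true);            // do not echo anything
    curl_setopt($handle, CURLOPT_FOLLOWLOCATION, true);            // follow redirects
    curl_setopt($handle, CURLOPT_MAXREDIRS, 5);                    // but not forever
    curl_setopt($handle, CURLOPT_CONNECTTIMEOUT, $timeoutSeconds); // limit on connecting
    curl_setopt($handle, CURLOPT_TIMEOUT, $timeoutSeconds);        // limit on the whole request
    curl_exec($handle);
    $status = (int) curl_getinfo($handle, CURLINFO_HTTP_CODE);     // status of the final response
    curl_close($handle);
    return status_means_exists($status);
}
```

Some servers refuse HEAD requests even for pages that exist, so a check like this can still report a false negative; retrying with an ordinary GET would be one way to handle that.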


function aExists($matches)
{
    //
    // callback invoked by preg_replace_callback for each link found
    //
    // $matches[0] is the complete match
    // $matches[1] is the match for the first subpattern
    //   enclosed in '(...)', and so on; with the pattern
    //   used below, $matches[3] is the URL inside href="..."
    //
    // checks whether an ordinary link exists;
    // something similar is done for img src= as well
    //
    // whatever string is returned replaces the complete match
    //
    $srcURL = $matches[3];

    if (url_exists($srcURL)) {
        // do something
        return "";
    }
    // do something else
    return "";
}

$foo = preg_replace_callback(
            '/(.*?)(<a .*?href=")([^"]*)("[^>]*>)(.*?)(<\/a>)/i',
            "aExists",
            $source_string);
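
To make the capture groups concrete, here is a small sketch that runs the same pattern over a sample string. So that it works offline, a lookup table stands in for url_exists(); the table, the sample URLs, and the choice to keep live links while reducing dead ones to their plain text are assumptions for illustration, not what this site does.

```php
<?php
// Same pattern as in the post.
$pattern = '/(.*?)(<a .*?href=")([^"]*)("[^>]*>)(.*?)(<\/a>)/i';

// Hypothetical stand-in for url_exists(): only URLs listed here count as live.
$knownGood = ['https://example.com/ok' => true];
$checker = function (string $url) use ($knownGood): bool {
    return isset($knownGood[$url]);
};

$callback = function (array $matches) use ($checker): string {
    // $matches[1] = text before the link   $matches[2] = '<a ... href="'
    // $matches[3] = the URL                $matches[4] = rest of the opening tag
    // $matches[5] = the link text          $matches[6] = '</a>'
    if ($checker($matches[3])) {
        return $matches[0];               // live link: keep the complete match
    }
    return $matches[1] . $matches[5];     // dead link: keep the text, drop the tag
};

$source = 'See <a href="https://example.com/ok">this</a>'
        . ' and <a href="https://example.com/gone">that</a>.';

$result = preg_replace_callback($pattern, $callback, $source);
echo $result, "\n";
// prints: See <a href="https://example.com/ok">this</a> and that.
```

Because the pattern has no s modifier, the dot does not match a newline, so a link whose tag or text is split across lines will not be found; adding s after the closing delimiter is one way to cover that case.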

(my thanks to https://centricle.com/tools/html-entities/ for HTML encoding)

Originally published: Wednesday, December 12, 2007; most-recently modified: Monday, June 04, 2012