Philadelphia Reflections

The musings of a physician who has served the community for over six decades

Related Topics

No topics are associated with this blog

Parsing name-value pair attributes in an HTML tag

Not only do the attributes in an HTML tag come in random order but many are optional

Here's a regex solution:

<?php
function tagAttr($matches) {print_r($matches);}

$string = '<img src="/images/picture.jpg" width="300" class="left" alt="alt keywords" />';

$foo	= preg_replace_callback(
'/<img\b(?>\s+(?:alt="([^"]*)"|class="([^"]*)"|style="([^"]*)"|src="([^"]*)"|height="([^"]*)"|width="([^"]*)")|[^\s>]+|\s+)*>/i',
"tagAttr",
$string);
?>

Produces the following:

Array
(
    [0] => <img src="/images/picture.jpg" width="300" class="left" alt="alt keywords" />
    [1] => alt keywords
    [2] => left
    [3] => 
    [4] => /images/picture.jpg
    [5] => 
    [6] => 300
)

The regex is a series of alternating sequences; so, add href="([^"]*)"| in front of alt="([^"]*)" to select an additional attribute.

$matches[0] is the complete match
$matches[1] is alt=
$matches[2] is class=
$matches[3] is style=
$matches[4] is src=
$matches[5] is height=
$matches[6] is width=

My thanks (a) to Flagrant Badassery for putting me onto the idea and (b) to https://centricle.com/tools/html-entities/ for HTML encoding

Originally published: Wednesday, December 19, 2007; most-recently modified: Monday, June 04, 2012