Ampersand Madness: Convert & to &amp; to prevent XHTML errors
The whole subject of "encoding" gives me a headache.
Encoding In General
The first thing you have to know is: what is HTML encoding ... so look here:
⇒ http://htmlhelp.com/reference/html40/entities/
or here:
⇒ http://www.cookwood.com/html/extras/entities.html
(These are HTML encodings; URL encoding is something else again ... look here:
» http://www.blooberry.com/indexdot/html/topics/urlencoding.htm)
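To make the difference between the two kinds of encoding concrete, here is a minimal sketch using PHP's built-in htmlspecialchars() and rawurlencode(). The sample string is my own invention, purely for illustration.

```php
<?php
// Sample text containing a bare ampersand (illustrative only).
$text = 'Tom & Jerry';

// HTML encoding: the ampersand becomes a character entity.
echo htmlspecialchars($text), "\n";   // Tom &amp; Jerry

// URL encoding: the ampersand and the spaces become percent escapes.
echo rawurlencode($text), "\n";       // Tom%20%26%20Jerry
```

Same character, two completely different disguises, depending on whether it is headed into markup or into a URL.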
Ampersand Encoding and Conversion
Later on, you'll find out that the ampersand is a huge source of XHTML errors because it has to be written
- &amp;
or - &#38;
or - &#x26;
but you will struggle endlessly with how to get the darn thing to stay converted. First of all, content providers feel justifiably justified in including bare naked "&"s wherever they please; second of all, you will find that encoded ampersands get stripped back to their bare naked selves by browsers and other well-meaning sorts.
So, my undying thanks to Michael Ash's Regex Blog for providing the regex pattern in the following bit of PHP code:
$pattern = '/&(?!(?i:\#((x([\dA-F]){1,5})|(104857[0-5]|10485[0-6]\d|1048[0-4]\d\d|104[0-7]\d{3}|10[0-3]\d{4}|0?\d{1,6}))|([A-Za-z\d.]{2,31}));)/i';
// Replace each bare ampersand with its entity form.
$replacement = '&amp;';
$string = preg_replace ( $pattern, $replacement, $string);
I don't know how it can possibly work, and I may yet eat my words, but for the moment it seems to do the trick.
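As far as I can tell, the trick is the negative lookahead, the (?! ... ) part: an ampersand is matched only when it is NOT followed by something shaped like an entity, that is, a # plus up to five hex digits or a decimal number no larger than 1048575, or a name of 2 to 31 letters, digits, or dots, and then a semicolon. Here is a rough sketch that wraps the pattern in a function and tries it on a few strings. The function name encode_bare_ampersands and the sample inputs are my own, not part of the original pattern.

```php
<?php
// Hypothetical helper; the pattern is the same one quoted in the PHP snippet.
function encode_bare_ampersands(string $text): string
{
    $pattern = '/&(?!(?i:\#((x([\dA-F]){1,5})|(104857[0-5]|10485[0-6]\d|1048[0-4]\d\d|104[0-7]\d{3}|10[0-3]\d{4}|0?\d{1,6}))|([A-Za-z\d.]{2,31}));)/i';

    // preg_replace() returns null on error; fall back to the input in that case.
    return preg_replace($pattern, '&amp;', $text) ?? $text;
}

echo encode_bare_ampersands('Tom & Jerry'), "\n";            // Tom &amp; Jerry
echo encode_bare_ampersands('AT&T'), "\n";                   // AT&amp;T
echo encode_bare_ampersands('a=1&b=2'), "\n";                // a=1&amp;b=2
echo encode_bare_ampersands('&amp; &#38; &#x26; &copy;'), "\n"; // unchanged
```

Because ampersands that are already encoded get skipped, running the function twice gives the same result as running it once. One caveat: the pattern only checks the shape of an entity, not whether the name is real, so a bare ampersand that happens to be followed by a word and a semicolon will be left alone.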
≈
Ampersand Encoding In RSS
Another thing: & is the only ampersand encoding form acceptable to both RSS and Atom. So, look at the source of this page and you will find that I use this encoding in the title ... that's because the title goes into the Title field of my RSS and Atom feeds.
≈