PHILADELPHIA REFLECTIONS
Musings of a Philadelphia Physician who has served the community for six decades

Related Topics

Website Development
The website technology supporting Philadelphia Reflections is PHP, MySQL and DHTML. The web hosting service is Internet Planners. Developing this website has provided an opportunity to learn new technology, to try out different techniques for getting noticed by the search engines, and to experience the trials and tribulations of dealing with malicious hackers and spammers, who range from the annoying to the abusive. This collection of articles documents some of our experiences, and we hope that people surfing the web for solutions to problems we've encountered will benefit.

Regular Expressions

Anyone who has used the expression *.doc to search for Word files has used a close relative of Regular Expressions ("regex") without realizing it. Regex arose from mathematical theory and is available in many programming languages; for dealing with large amounts of text, it is often the only practical tool. And yet most people are completely unaware of it.
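To make the connection between a file wildcard and a regex concrete, here is a minimal sketch in JavaScript. The helper name wildcardToRegExp is our own invention for illustration, and it handles only the * and ? wildcards; real file-matching rules differ from one system to another.

```javascript
// Hypothetical helper: translate a simple file wildcard into a RegExp.
// Only '*' (any run of characters) and '?' (any one character) are handled.
function wildcardToRegExp(wildcard) {
  // Escape regex metacharacters other than * and ? so they match literally.
  const escaped = wildcard.replace(/[.+^${}()|[\]\\]/g, "\\$&");
  // Translate the two wildcards into their regex equivalents.
  const body = escaped.replace(/\*/g, ".*").replace(/\?/g, ".");
  // Anchor both ends so the whole file name must match; ignore case.
  return new RegExp("^" + body + "$", "i");
}

const wordFiles = wildcardToRegExp("*.doc");
console.log(wordFiles.source);              // ^.*\.doc$
console.log(wordFiles.test("report.doc"));  // true
console.log(wordFiles.test("report.docx")); // false
```

The wildcard *.doc turns out to be shorthand for the regex ^.*\.doc$, which is already a fair sample of how dense the notation gets.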

Philadelphia Reflections uses regex extensively for two primary purposes: (1) checking input from forms and (2) modifying HTML input during the creation of articles for the site.

Larry Ullman's text PHP and MySQL has a very good introduction to regex in its chapter on security.

The great advantage of regex is that it can identify very complex patterns in a mass of text. The great disadvantage of regex is that it has developed in sort of an underground way and there exist numerous varieties that are essentially incompatible. PHP has offered two families of regex functions: one for the POSIX Extended variety of regex (the ereg functions, which were deprecated in PHP 5.3 and removed in PHP 7) and the other for the Perl-compatible version called PCRE (the preg functions). POSIX is less powerful but far easier to learn. JavaScript offers its own variety of regex, which isn't quite the same as either of the two PHP versions.
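A small illustration of the incompatibility, run in JavaScript. The variable names are ours, chosen for the example. A POSIX-style bracket expression such as [[:digit:]] means "any digit" to PHP's regex functions, but JavaScript reads the same characters as an ordinary character class followed by a literal ], so the pattern quietly matches something else entirely.

```javascript
// POSIX-style bracket expression: JavaScript does not understand it.
// It is parsed as the class [[:digit:] (one of the characters [ : d i g t)
// followed by one or more literal ] characters.
const posixStyle = /^[[:digit:]]+$/;

// Perl-style shorthand for a digit: understood by PCRE and by JavaScript.
const perlStyle = /^\d+$/;

console.log(posixStyle.test("2024")); // false
console.log(perlStyle.test("2024"));  // true
console.log(posixStyle.test("d]"));   // true
```

No error is raised in any of these cases, which is what makes moving a pattern from one variety to another so treacherous.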

References include the Ullman book; the PHP online manual, which has a number of handy tips on regex use in its two supported varieties; and the O'Reilly book Mastering Regular Expressions, which is interesting. Jan Goyvaerts has a very helpful website (http://www.regular-expressions.info/) and a book, Regular Expressions: The Complete Tutorial.

My experience is that this area requires diligent hacking, which may be suboptimal but is unavoidable. For this purpose, Jan Goyvaerts' RegexBuddy is indispensable; you simply must get this program if you hope to make anything of regex.

Here are examples of checking for a valid email address in both JavaScript and PHP:

JavaScript


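// check email
// This fragment sits inside the form's validation function (hence the return).
// trim() is assumed to be a helper defined elsewhere on the page.
// Note: {2,4} rejects top-level domains longer than four letters.

var emailPattern = /^[a-z0-9][a-z0-9_.-]*@[a-z0-9.-]+\.[a-z]{2,4}$/i;
var emailField = document.comment_form.email;

emailField.value = trim(emailField.value);

// Complain only if something other than the "[none]" placeholder was entered
// and it does not look like an email address.
// (The original test, !value == "[none]", was always false, so the check never fired.)
if (emailField.value.length > 0 &&
    emailField.value != "[none]" &&
    !emailPattern.test(emailField.value)) {
	alert("Please enter a valid email address");
	emailField.focus();
	emailField.select();
	problem = "yes";
	return false;
}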
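For trying the pattern outside of a web page, the same check can be sketched as a standalone function. The function name emailProblem and its return convention (null when the input is acceptable, otherwise the message to show) are assumptions made for this example, not part of the site's actual code.

```javascript
// Same pattern as the form check; the i flag makes it case-insensitive.
const emailPattern = /^[a-z0-9][a-z0-9_.-]*@[a-z0-9.-]+\.[a-z]{2,4}$/i;

// Hypothetical helper: returns null if the input is acceptable,
// otherwise the message that would be shown to the user.
function emailProblem(rawValue) {
  const value = rawValue.trim();
  // An empty field and the "[none]" placeholder are both allowed.
  if (value.length === 0 || value === "[none]") return null;
  return emailPattern.test(value) ? null : "Please enter a valid email address";
}

console.log(emailProblem("  jane.doe@example.com ")); // null
console.log(emailProblem("[none]"));                  // null
console.log(emailProblem("jane@@example"));           // Please enter a valid email address
console.log(emailProblem("curator@example.museum"));  // Please enter a valid email address
```

The last case shows a limit of the pattern: it insists on a top-level domain of two to four letters, so a perfectly legitimate address ending in .museum is turned away.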

PHP


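// check email
// eregi() was removed in PHP 7, so this uses the PCRE function preg_match();
// the pattern gains / delimiters and the i (case-insensitive) flag.

$emailpattern = '/^[a-z0-9][a-z0-9_.-]*@[a-z0-9.-]+\.[a-z]{2,4}$/i';

$email = stripslashes(trim($_POST['email']));

// Flag an error only if something other than the "[none]" placeholder was
// entered and it does not look like an email address.
if (strlen($email) > 0 and $email != "[none]" and !preg_match($emailpattern, $email))
	{
	$inputerror		=	TRUE;
	$inputerrormessage	.=	"<br />* An invalid email address was entered";
	}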


Incomprehensible? Yes, absolutely.

Useful? More than you can realize until you are actually faced with the problem of, say, verifying that a user has input a valid email format, or trying to figure out whether a user-input IMG tag is using the correct syntax; or else maybe trying to convert a huge web page from XHTML 1.1 to HTML 4.01 because you've determined that the browser is syntactically crippled.
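As a taste of the last of those problems, here is a rough sketch of one step in converting XHTML to HTML 4.01: turning self-closed tags such as <br /> into their plain form. The function name xhtmlToHtml4 and the list of tags it handles are assumptions for illustration only; a full conversion involves a good deal more than this.

```javascript
// Hypothetical helper: rewrite XHTML-style self-closed tags
// (<br />, <img ... />) in the plain HTML 4.01 form (<br>, <img ...>).
function xhtmlToHtml4(markup) {
  // $1 is the tag name, $2 is any attributes; the trailing " /" is dropped.
  return markup.replace(/<(br|hr|img|input|meta|link)\b([^>]*?)\s*\/>/gi, "<$1$2>");
}

console.log(xhtmlToHtml4('<p>One<br />Two<img src="a.png" alt="A" /></p>'));
// <p>One<br>Two<img src="a.png" alt="A"></p>
```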

And, once you get deep into it, the stuff is actually intriguing and fun.
