Location: United Kingdom

I am a software developer and consultant with more than a quarter of a century of technology change and challenges to draw experience from. While I maintain and exercise some skills from the dark ages of computing, I also enjoy taming new technologies as they turn up – always looking for ways to deliver truly effective software systems to my customers.

Friday, September 21, 2007

Third party advertising on your web site

Disclaimer: one of our company web sites takes Google AdWords advertising on some selected general information pages, but this blog does not show adverts and does not even provide affiliate links to Amazon for the books reviewed or mentioned. There is a reason for this cautious approach. Once you sign up for advertising to support your site’s costs, you automatically lose control of part of the site’s content.

A recent example: when I visit the otherwise excellent User Friendly site to read the latest cartoon strip, on most days the banner advertisement at the top of the page for visitors using a French ISP is for scumware. This would make it very hard to recommend the site to others, as they might be tempted to click on the ad. I raised this issue with the team responsible for the site and got a response. Their view was that the site had to be supported by advertising and that they were not going to be bothered by ads that required the user to actively click on something before anything bad happened to that user’s PC. I accept the need for advertising revenue, but I am not so sure I am comfortable with that moral position.

The same scumware ads turn up from time to time on the Dilbert site as well if you are in France (and presumably if you visit from other parts of the world too). However, there is little chance that either the “User Friendly” or Dilbert sites will be blocked by Google because of dangerous content, but it could happen to you. The Register has a post today that tracks the problems of more than one web site flagged by Google as likely to be actively harmful. Now I would not want that stuff on a site of mine anyway, but allowing a third party to plaster malicious advertising over your headlines just does not make Internet business sense.

What’s in an email address?

Well probably many more different characters than you would think.

A maintenance routine run by a customer choked on a new email address the other day: the format lay outside the range accepted by the regular expression used to validate the entry field on the .NET Windows form. The user logged a fault call and we had a tested update to the validation winging its way back in minutes – not a big deal, but…

Let us apply a little computational thinking to this area.

The validation problem was not strictly a bug. The regular expression being used was intended to help the user catch keying and transcription errors and was thus fairly strict. It probably validates more than 90% of email addresses correctly, but it was never intended to allow the full range of variations permitted by the specification, RFC 2822.
The web site www.Regular-expressions suggests that implementing the full specification would require the following regular expression:
(?:[a-z0-9!#$%&'*+/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&'*+/=?^_`{|}~-]+)*|"(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21\x23-\x5b\x5d-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])*")@(?:(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?|\[(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?|[a-z0-9-]*[a-z0-9]:(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21-\x5a\x53-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])+)\])

and this would trap very few typos because it is much too wide-ranging.
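To illustrate the point, here is a minimal Python sketch that compiles the pattern quoted above and tries it on a few addresses. Python is chosen purely for illustration (the application in question was .NET), and the names are hypothetical. It assumes the pattern is meant to be applied case-insensitively and to the whole input, which is how its lowercase character classes read.

```python
import re

# The RFC 2822-style pattern quoted above, split into local part and domain
# for readability. Joined, it is the same expression character for character.
LOCAL_PART = (
    r"(?:[a-z0-9!#$%&'*+/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&'*+/=?^_`{|}~-]+)*"
    r'|"(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21\x23-\x5b\x5d-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])*")'
)
DOMAIN_PART = (
    r"(?:(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?"
    r"|\[(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}"
    r"(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?|[a-z0-9-]*[a-z0-9]:"
    r"(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21-\x5a\x53-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])+)\])"
)

# Assumption: case-insensitive matching against the whole string.
RFC_STYLE = re.compile(LOCAL_PART + "@" + DOMAIN_PART, re.IGNORECASE)


def matches_rfc_style(address: str) -> bool:
    """True if the whole address matches the wide RFC 2822-style pattern."""
    return RFC_STYLE.fullmatch(address) is not None


for candidate in ["mike@computer.museum", "mike@xomputer.museum", "mike@computermuseum"]:
    print(candidate, matches_rfc_style(candidate))
```

Run as is, this reports True for the first two addresses and False for the third: the mistyped domain in mike@xomputer.museum sails through, and only a structural slip such as the missing dot is caught.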
Out of interest, we are currently using:
^([\w\.\-\&]+)@([\w\-.]+)[\.]((([a-zA-Z]){2,3})|((([a-zA-Z]){2,3})[\.](([a-zA-Z]){2,3})))$
and this covers most requirements for a UK-based business recording internal and customer email addresses. However, it will not accept some email addresses in use (such as mike@computer.museum). No, I didn’t know that museum was a valid TLD (top-level domain) either.
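The same kind of check against the pattern we are currently using shows the trade-off. Again this is a Python sketch for illustration only, with hypothetical names: Python's `re` module is not the .NET regular expression engine, though for this pattern and these sample inputs the two should behave the same.

```python
import re

# The stricter pattern quoted above, unchanged.
STRICT_PATTERN = re.compile(
    r"^([\w\.\-\&]+)@([\w\-.]+)[\.]((([a-zA-Z]){2,3})|((([a-zA-Z]){2,3})[\.](([a-zA-Z]){2,3})))$"
)


def passes_strict_check(address: str) -> bool:
    """True if the whole address matches the strict, UK-business-oriented pattern."""
    return STRICT_PATTERN.fullmatch(address) is not None


samples = [
    "jane.smith@example.co.uk",  # two-part suffix: accepted
    "mike@computer.museum",      # six-letter TLD: rejected
    "john+tag@example.com",      # '+' is legal but not in the local-part class: rejected
    "mike@xomputer.com",         # a typo in the domain still passes
]
for address in samples:
    print(address, passes_strict_check(address))
```

Strictness filters out unusual but legal addresses while still letting a plausible-looking typo through.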
There are positive benefits in implementing validation routines that are technically faulty: being stricter than the specification can make them more effective for daily use. However, such restrictions are only effective when they are applied at the interface between the user and a system. They should never be applied to data that has already entered a system; that data should, after all, have a high level of trust. Also, if it is a customer entering his or her own email address, then it might make sense to use fairly strict validation to warn that the address might not be correct, but it is a pretty dumb move to void a transaction because the customer’s actual email address contains unusual but valid characters.
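One way to apply that policy at the point of entry, sketched in Python with hypothetical names (`EntryResult`, `check_customer_entry`): the strict pattern quoted above only decides whether to show a warning, and the address the customer typed is always kept so the transaction can carry on.

```python
import re
from dataclasses import dataclass
from typing import Optional

# The stricter pattern quoted above; used here only to decide whether to warn.
STRICT_PATTERN = re.compile(
    r"^([\w\.\-\&]+)@([\w\-.]+)[\.]((([a-zA-Z]){2,3})|((([a-zA-Z]){2,3})[\.](([a-zA-Z]){2,3})))$"
)


@dataclass
class EntryResult:
    """Hypothetical result type: the address as entered plus an optional warning."""
    address: str
    warning: Optional[str] = None


def check_customer_entry(raw: str) -> EntryResult:
    """Warn about addresses the strict pattern dislikes; never refuse them."""
    address = raw.strip()
    if STRICT_PATTERN.fullmatch(address) is None:
        return EntryResult(address, "This address looks unusual. Please check it, "
                                    "but you may continue if it is correct.")
    return EntryResult(address)


result = check_customer_entry(" mike@computer.museum ")
print(result.address, "|", result.warning)
```

Whether an empty field should block the form is a separate business decision and is not handled here; an empty entry simply comes back with a warning.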

Take a look at a post by Raganwald explaining just what can go wrong (pretty forthrightly, too).

2 Comments:

Reginald Braithwaite said...

If you are asking someone to retype their email to avoid misspellings, what role does validating their email play?

I would guess that if someone doesn't know their own email and thinks it is "mike@computermuseum", he will type that twice and you will catch it.

But he can still make mistakes like "mike@xomputer.museum", and you won't catch that without active validation, doing a domain lookup or actually sending an email.

So... where is the value here?

9:31 AM  
Mike Griffiths said...

That’s exactly it. Thanks for underlining it, Reg (Raganwald).

The email validation I described at the start of my post was intended to help my customer’s clerical types get their customers’ email addresses right. Validation made sense in that context because any inadvertent “blocking” of a legitimate email address would be detected and fixed promptly.

Rejecting a customer’s email address when he or she enters it directly is insane.

You have to think about what the email address is for.

If the address is crucial to a transaction, then you have to confirm it, not validate it. By that I mean (perhaps) sending the customer an email at the specified address and awaiting a confirming server transaction when it is read and (say) a link in it is clicked. So if you are going to send electronic air tickets, confirm the email address; don’t validate it.
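The confirm-rather-than-validate approach can be sketched as a one-time token. This is a minimal in-memory illustration with hypothetical names (`EmailConfirmations`, `start`, `confirm`); actually sending the email, the link format (the URL below is a placeholder) and expiring stale tokens are all assumptions left out of the sketch.

```python
import secrets
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass
class EmailConfirmations:
    """Hypothetical in-memory store of pending and confirmed addresses."""
    pending: Dict[str, str] = field(default_factory=dict)  # token -> address
    confirmed: Set[str] = field(default_factory=set)

    def start(self, address: str) -> str:
        """Record the address exactly as entered and return a hard-to-guess token."""
        token = secrets.token_urlsafe(16)
        self.pending[token] = address
        return token

    def confirm(self, token: str) -> bool:
        """Called when the emailed link is followed; each token works only once."""
        address = self.pending.pop(token, None)
        if address is None:
            return False
        self.confirmed.add(address)
        return True


store = EmailConfirmations()
token = store.start("mike@computer.museum")
link = f"https://example.com/confirm?token={token}"  # placeholder URL, sent by email
print(store.confirm(token), store.confirm(token))    # second use is refused
```

Nothing here inspects the address's format: if the email arrives and the link is followed, the address demonstrably works.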

If the email address is not crucial to the transaction, then you have to “believe” what is entered – particularly if you ask your customer to enter it twice. It’s not just good manners but sensible business practice.
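Believing what is entered can be as simple as comparing the two entries. A minimal sketch with a hypothetical function name: it trims stray surrounding whitespace, otherwise compares the two entries exactly, and applies no format check at all.

```python
from typing import Optional


def accept_double_entry(first: str, second: str) -> Optional[str]:
    """Return the address to store if both entries agree, otherwise None.

    Only surrounding whitespace is removed. The comparison is otherwise exact,
    because the local part of an address may legitimately be case-sensitive.
    """
    a, b = first.strip(), second.strip()
    if not a or a != b:
        return None
    return a
```

On a mismatch the form would simply ask the customer to retype the address.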

It is also crucial, of course, to ensure that processes that are not customer-facing do not apply their own validation over and above checking that any given address is actually present in the database.
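For back-office processes the check shrinks to presence. A sketch, assuming hypothetical customer records held as dictionaries with an `email` field: the routine selects every record that has an address and makes no judgement about its format.

```python
from typing import Dict, Iterable, List, Optional


def addresses_to_mail(records: Iterable[Dict[str, Optional[str]]]) -> List[str]:
    """Collect stored addresses for a mailing run.

    Data already in the system is trusted: the only test is that an address is
    present (non-blank). No pattern matching is applied here.
    """
    result = []
    for record in records:
        email = (record.get("email") or "").strip()
        if email:
            result.append(email)
    return result
```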

12:55 AM  
