logo

Logic To Compare Names of Two People To See If Same

Is Jake Howlett the same person as Howlett Jake or Jake R Howlett?

Without some unique key, such as social security or NI number, to compare there's no way of knowing. Is there a way to take an educated guess though?

This week I had a request to produce a view that listed the names of people who had requested a free voucher along-side the name of the person who actually redeemed the voucher.

The name of the redeemers came in the form of a CSV file from the voucher suppliers and contained two columns -- one for voucher code (the key which binds the two names) and one for redeemer.

smudgedMy task was simply to produce a "view" so that the company giving the vouchers out (my customer) could look for naughty people.

What they didn't ask for and what I decided to add is some way of highlighting the names that obviously don't match. I did this (see right) partly as a sweetener for the customer but also as a challenge to myself to make an otherwise dull task a little more interesting.

While what I did was in PHP and Flex I've put together a Notes version for you to play with and test the theory.

My PHP code uses the array_intersect() function to count the number of matching items from both names when split on " ". If there is more than 2 intersects then it's considered the same person. If no intersects it's in red. Only one intersect and it's a partial match (orange).

The Notes version (shown below) works in the opposite way in that it counts the number of items left when one array is replaces with the other. The more items left the less of a match it is.

ScreenShot003

The formula for the first column is:

a1:=@Explode(@LowerCase(Redeemer); " ");
a2:=@Explode(@LowerCase(Requestor); " ");
remaining:=@Elements(@Trim(@Replace(a1; a2; "")));
@If(remaining=0; 85; remaining=1; 86; 87)

While the formula for the second column is:

a1:=@Explode(@LowerCase(Redeemer); " ");
a2:=@Explode(@LowerCase(Requestor); " ");
remaining:=@Elements(@Trim(@Replace(a2; a1; "")));
@If(remaining=0; 85; remaining=1; 86; 87)

Notice the subtle difference? The accuracy of the result seems to depend on which way round your replace the lists. Exactly why that is, I can't remember. You guys?

So, the Friday Fun Challenge (should anybody accept it) is to come up with an @Formula for a column to accurately (as is reasonably possible) match two strings to see if they're the name of the same person or not.

Comments

  1. Well of course it does matter which way round.

    @Replace( sourcelist ; fromlist ; tolist )

    If an entry exists in the fromlist, then it doesn't matter if it's in the source list or not. It is simply never used.

    As to a solution. Well when you're messing with names it can be a real pain. Did you ask for their common names, or full names, or nick names or what? My Full name is David Richard Andrew Graham Orson Neil Cotterill. Given name is David Cotterill. Nickname is Dragon Cotterill (try abbreviating my full name!)

    I suspect that what you have got is a good enough solution. Combining the results of both calculations into one value would help instant checking. If each result returns a value of 0, 1 or 2 as to how well it matches (your smiley/neutral/sad faces) then adding the two togther to get a result works.

    Colour coding the lines based on your two results (ie a scale of 0-4, 0 being a complete match, 4 being nothing matches) would be an obvious choice to make the highlights easier to spot. A tag of 3 or 4 (two sad, or one neutral & one sad) is an obvious fraud. 1 or 2 is a possibility (but I certainly wouldn't worry over them) and obviously 0 is full blown match.

    • avatar
    • Jake Howlett
    • Fri 20 Mar 2009 07:10 AM

    Doh!! I knew that as well.

    Good idea to combine the two. I'll have a go at adding a third column in a mo...

  2. Another thought just hit me. Do you strip extra characters like hyphens, commas and apostrophes?

    Aaron O'Donnell vs ODonnell, A

    This would show as a failure on both tests, yet it shouldn't be

    • avatar
    • Ken
    • Fri 20 Mar 2009 07:57 AM

    Jake - have you looked at @Soundex( string ) ?

  3. I'd be interested to see the php code too.

    • avatar
    • Jim Man
    • Fri 20 Mar 2009 02:35 PM

    @Soundex is not very accurate. Having tried it lately, it sounds to me this is going to be more accurate. At least it is more understandable! Soundex results in a somewhat cryptic value.

  4. Why not just combine what you have with:

    a1:=@Explode(@LowerCase(Redeemer); " ");

    a2:=@Explode(@LowerCase(Requestor); " ");

    remaining1:=@Elements(@Trim(@Replace(a1; a2; "")));

    remaining2:=@Elements(@Trim(@Replace(a2; a1; "")));

    @If(remaining1=0 & remaining2 = 0; 85; a1=a2 ; 87 ; 86)

    • avatar
    • CJ
    • Mon 30 Mar 2009 09:53 AM

    You could use @Keywords, which does away with the need for lots of @Trim(@Replace malarky...

    reds := @LowerCase(@Explode(Redeemer; " "));

    reqs := @LowerCase(@Explode(Requestor; " "));

    matches := @Elements(@Keywords(n1; n2));

    @If(matches=0; ":("; matches=1; ":|"; matches > 1; ":)"; ":?")

Your Comments

Name:
E-mail:
(optional)
Website:
(optional)
Comment:


About This Page

Written by Jake Howlett on Fri 20 Mar 2009

Share This Page

# ( ) '

Comments

The most recent comments added:

Skip to the comments or add your own.

You can subscribe to an individual RSS feed of comments on this entry.

Let's Get Social


About This Website

CodeStore is all about web development. Concentrating on Lotus Domino, ASP.NET, Flex, SharePoint and all things internet.

Your host is Jake Howlett who runs his own web development company called Rockall Design and is always on the lookout for new and interesting work to do.

You can find me on Twitter and on Linked In.

Read more about this site »

Elsewhere

Here are the external links posted on the same day.

More links are available in the archive »

More Content