View Single Post

  #2 (permalink)  
Old 05-26-2008
Gordon Burditt
 
Posts: n/a
Default Re: Narrowing down results

>I'm working on a 'quiz' script that matches user preferences to stored
>attributes. I want the user to select their choice of preferences and
>the query to come back with the nearest match to what they need. Let
>me give an example:
>
>There are three preference categories (age, type, and time), and each
>of these contains three choices. There are only around 17 outcomes so
>clearly not every combination of choices is represented, so I
>anticipate that sometimes the 'match' will only be 2 out of 3
>preferences. Below that and I'll return some kind of 'refine your
>choices' error.


I don't know how well this will work, but it does have some uses
in other applications.

Define a "matching score" which is supposed to be a measure of how
well the preference and one of the outcomes matches. For starters,
take the difference between each preference category and the outcome,
square them, and add them up. The lowest matching score is the
best match. This tends to prefer matches where all the categories
are slightly mismatched over a match where all but one of the
categories is a perfect match but that one is *WAY* off. If all
of the categories are such that there is a "match" or "no match"
with no concept of a "slight mismatch" vs. a "large mismatch", the
matching score becomes a count of the number of mismatched categories.
If you want to de-emphasize the category that is *WAY* off, use the
absolute value of the difference rather than the square of the
difference.

You may wish to weight the categories: a mismatch of 1 unit in
'type' is equivalent to how many years different in 'age'? You
might have a category where there's no "near match": either there's
a match or there isn't. In that case, score 1 for a mismatch and
0 for a match. Then weight it if desired.

If this is a dating application then a mismatch of desired sex of
date with the date's sex should probably have a weight of a billion
or more, or you arrange not to even try to match heterosexual men
with other heterosexual men. Religion is problematical, but Methodist
and Baptist are probably a lot closer match than Jewish and Islamic.
If the desired age range was given as 35-40, anyone in that range
is a perfect match, and someone 41 or 34 is a slight mismatch which
could be traded off against personality factors. You might weight
things so a 3 year mismatch in age is equivalent to a 1 point
mismatch on Aunt Gertrude's scale of Neat Freakiness (scores of
1-5).

In SQL, you can select the outcomes and the matching score, order
by the matching score, and perhaps limit it to some of the best
matches.

>I've attempted to write out on paper a diagram/natural language
>version of how this could work, but I imagine there's a better method
>I'm unaware of. My diagram consists of a set of nested queries and php
>arrays which seems inefficient and inaccurate.
>
>Essentially, the user is going to end up with an array of answers at
>the end of the quiz (say, 2, 5, 8) and I want these three variables to
>be checked against the three rows in the table and to return the
>closest match(es). Can anybody point me in the right direction/part of
>the manual for this kind of querying?


You need to decide what a "good match" is (in terms of numbers you
can calculate manually) before trying to write queries. Since you
don't describe at all what you're trying to match against what, and
what constitutes a good match, no one can help you.

Suppose $qage, $qtype, and $qtime are the answers from the quiz,
which get substituted into the query. oage, otype, and otime are
values from the outcomes. You might use a query like:

select outcomeid, oage, otype, otime,
($qage - oage)*($qage - oage)*0.1 +
($qtype - otype)*($qtype - otype)*2.0 +
($qtime - otime)*($qtime - otime)*1.0 as score
from outcomes_table order by score limit 5;

The weights 0.2, 2.0, and 1.0 are something you've determined by
experimenting with the matches or estimated on a theoretical basis.

Reply With Quote