Narrowing down results

This is a discussion on Narrowing down results within the MySQL Database forums, part of the Database Forums category; Hi again, my second post today... development is going quickly! I'm working on a 'quiz' script that matches user ...


Go Back   Usenet Forums > Database Forums > MySQL Database

FAQ Members List Calendar Search Today's Posts Mark Forums Read
  #1 (permalink)  
Old 05-26-2008
Matt
 
Posts: n/a
Default Narrowing down results

Hi again, my second post today... development is going quickly!

I'm working on a 'quiz' script that matches user preferences to stored
attributes. I want the user to select their choice of preferences and
the query to come back with the nearest match to what they need. Let
me give an example:

There are three preference categories (age, type, and time), and each
of these contains three choices. There are only around 17 outcomes so
clearly not every combination of choices is represented, so I
anticipate that sometimes the 'match' will only be 2 out of 3
preferences. Below that and I'll return some kind of 'refine your
choices' error.

I've attempted to write out on paper a diagram/natural language
version of how this could work, but I imagine there's a better method
I'm unaware of. My diagram consists of a set of nested queries and php
arrays which seems inefficient and inaccurate.

Essentially, the user is going to end up with an array of answers at
the end of the quiz (say, 2, 5, 8) and I want these three variables to
be checked against the three rows in the table and to return the
closest match(es). Can anybody point me in the right direction/part of
the manual for this kind of querying?

Thank you,

Matt
Reply With Quote
  #2 (permalink)  
Old 05-26-2008
Gordon Burditt
 
Posts: n/a
Default Re: Narrowing down results

>I'm working on a 'quiz' script that matches user preferences to stored
>attributes. I want the user to select their choice of preferences and
>the query to come back with the nearest match to what they need. Let
>me give an example:
>
>There are three preference categories (age, type, and time), and each
>of these contains three choices. There are only around 17 outcomes so
>clearly not every combination of choices is represented, so I
>anticipate that sometimes the 'match' will only be 2 out of 3
>preferences. Below that and I'll return some kind of 'refine your
>choices' error.


I don't know how well this will work, but it does have some uses
in other applications.

Define a "matching score" which is supposed to be a measure of how
well the preference and one of the outcomes matches. For starters,
take the difference between each preference category and the outcome,
square them, and add them up. The lowest matching score is the
best match. This tends to prefer matches where all the categories
are slightly mismatched over a match where all but one of the
categories is a perfect match but that one is *WAY* off. If all
of the categories are such that there is a "match" or "no match"
with no concept of a "slight mismatch" vs. a "large mismatch", the
matching score becomes a count of the number of mismatched categories.
If you want to de-emphasize the category that is *WAY* off, use the
absolute value of the difference rather than the square of the
difference.

You may wish to weight the categories: a mismatch of 1 unit in
'type' is equivalent to how many years different in 'age'? You
might have a category where there's no "near match": either there's
a match or there isn't. In that case, score 1 for a mismatch and
0 for a match. Then weight it if desired.

If this is a dating application then a mismatch of desired sex of
date with the date's sex should probably have a weight of a billion
or more, or you arrange not to even try to match heterosexual men
with other heterosexual men. Religion is problematical, but Methodist
and Baptist are probably a lot closer match than Jewish and Islamic.
If the desired age range was given as 35-40, anyone in that range
is a perfect match, and someone 41 or 34 is a slight mismatch which
could be traded off against personality factors. You might weight
things so a 3 year mismatch in age is equivalent to a 1 point
mismatch on Aunt Gertrude's scale of Neat Freakiness (scores of
1-5).

In SQL, you can select the outcomes and the matching score, order
by the matching score, and perhaps limit it to some of the best
matches.

>I've attempted to write out on paper a diagram/natural language
>version of how this could work, but I imagine there's a better method
>I'm unaware of. My diagram consists of a set of nested queries and php
>arrays which seems inefficient and inaccurate.
>
>Essentially, the user is going to end up with an array of answers at
>the end of the quiz (say, 2, 5, 8) and I want these three variables to
>be checked against the three rows in the table and to return the
>closest match(es). Can anybody point me in the right direction/part of
>the manual for this kind of querying?


You need to decide what a "good match" is (in terms of numbers you
can calculate manually) before trying to write queries. Since you
don't describe at all what you're trying to match against what, and
what constitutes a good match, no one can help you.

Suppose $qage, $qtype, and $qtime are the answers from the quiz,
which get substituted into the query. oage, otype, and otime are
values from the outcomes. You might use a query like:

select outcomeid, oage, otype, otime,
($qage - oage)*($qage - oage)*0.1 +
($qtype - otype)*($qtype - otype)*2.0 +
($qtime - otime)*($qtime - otime)*1.0 as score
from outcomes_table order by score limit 5;

The weights 0.2, 2.0, and 1.0 are something you've determined by
experimenting with the matches or estimated on a theoretical basis.

Reply With Quote
  #3 (permalink)  
Old 05-26-2008
Matt
 
Posts: n/a
Default Re: Narrowing down results

On May 26, 8:20 pm, gordonb.2c...@burditt.org (Gordon Burditt) wrote:

> You need to decide what a "good match" is (in terms of numbers you
> can calculate manually) before trying to write queries. Since you
> don't describe at all what you're trying to match against what, and
> what constitutes a good match, no one can help you.
>
> Suppose $qage, $qtype, and $qtime are the answers from the quiz,
> which get substituted into the query. oage, otype, and otime are
> values from the outcomes. You might use a query like:
>
> select outcomeid, oage, otype, otime,
> ($qage - oage)*($qage - oage)*0.1 +
> ($qtype - otype)*($qtype - otype)*2.0 +
> ($qtime - otime)*($qtime - otime)*1.0 as score
> from outcomes_table order by score limit 5;
>
> The weights 0.2, 2.0, and 1.0 are something you've determined by
> experimenting with the matches or estimated on a theoretical basis.


Hi Gordon, thank you for your detailed reply. I can see I'm getting
into a fairly complex operation here...

Let me be more specific with my data: it's not for a dating site
(although I can see how it looks that way). Basically, the site is a
homepage for a charity group that run various projects. Some projects
are with kids, some are with the elderly, etc. To make it easier for
newcomers, the quiz I'm producing will hopefully recommend a project
to them based on preferences.

Each of the projects has an associated age group (kids, teens,
elderly), type (outdoor, indoor, environmental) and time (weekdays,
weekends, long-term). There's three questions in the 'quiz' to
determine the user's preference. In this sense, I'm not really sure
how important these factors are in relation to one another. I mean,
that probably comes down to user preference rather than out-and-out
logic like your Methodist/Baptist example.

Obviously with a smallish amount of projects, there might not be a
result that matches the user's preference (there are only 3 projects
with the elderly, and none of these are long-term, for example). I
guess in this case I need to code for low-scoring matches and give
some kind of alternative suggestion (which does exist). I'll
experiment with your code and scoring suggestions and see if I can get
anywhere.

Thank you again,
Matt
Reply With Quote
  #4 (permalink)  
Old 05-26-2008
strawberry
 
Posts: n/a
Default Re: Narrowing down results

On May 26, 8:42 pm, Matt <guitarroman...@gmail.com> wrote:
> On May 26, 8:20 pm, gordonb.2c...@burditt.org (Gordon Burditt) wrote:
>
>
>
> > You need to decide what a "good match" is (in terms of numbers you
> > can calculate manually) before trying to write queries. Since you
> > don't describe at all what you're trying to match against what, and
> > what constitutes a good match, no one can help you.

>
> > Suppose $qage, $qtype, and $qtime are the answers from the quiz,
> > which get substituted into the query. oage, otype, and otime are
> > values from the outcomes. You might use a query like:

>
> > select outcomeid, oage, otype, otime,
> > ($qage - oage)*($qage - oage)*0.1 +
> > ($qtype - otype)*($qtype - otype)*2.0 +
> > ($qtime - otime)*($qtime - otime)*1.0 as score
> > from outcomes_table order by score limit 5;

>
> > The weights 0.2, 2.0, and 1.0 are something you've determined by
> > experimenting with the matches or estimated on a theoretical basis.

>
> Hi Gordon, thank you for your detailed reply. I can see I'm getting
> into a fairly complex operation here...
>
> Let me be more specific with my data: it's not for a dating site
> (although I can see how it looks that way). Basically, the site is a
> homepage for a charity group that run various projects. Some projects
> are with kids, some are with the elderly, etc. To make it easier for
> newcomers, the quiz I'm producing will hopefully recommend a project
> to them based on preferences.
>
> Each of the projects has an associated age group (kids, teens,
> elderly), type (outdoor, indoor, environmental) and time (weekdays,
> weekends, long-term). There's three questions in the 'quiz' to
> determine the user's preference. In this sense, I'm not really sure
> how important these factors are in relation to one another. I mean,
> that probably comes down to user preference rather than out-and-out
> logic like your Methodist/Baptist example.
>
> Obviously with a smallish amount of projects, there might not be a
> result that matches the user's preference (there are only 3 projects
> with the elderly, and none of these are long-term, for example). I
> guess in this case I need to code for low-scoring matches and give
> some kind of alternative suggestion (which does exist). I'll
> experiment with your code and scoring suggestions and see if I can get
> anywhere.
>
> Thank you again,
> Matt


Here's one idea.

From the point of view of a volunteer, projects are defined by the
three attributes of age_group, environment and commitment,
so you may have a table that looks like this:

+------------+-----------+-------------+------------+
| project_id | age_group | environment | commitment |
+------------+-----------+-------------+------------+
| 1 | kids | indooor | weekdays |
| 2 | kids | outdoor | long-term |
| 3 | teens | indoor | weekends |
| 4 | teens | indoor | weekdays |
| 5 | elderly | outdoor | weekdays |
| 6 | elderly | outdoor | weekends |
+------------+-----------+-------------+------------+

It seems to me that volunteers need to be able to prioritise two
things;
firstly, which of those attributes is most important to them, and
secondly, their preferences within those attributes.

Rather than performing some complex mathematics (although that might,
ultimately, be a better method) to filter the resultset,
you could instead just return the entire resultset, ordered according
to the volunteer's preferences.

So, if a volunteer's key selection pressure is:

commitment (weekdays followed by weekends) followed by
age_group (elderly) followed by
environment (prefers outdoors)
then the query might look like this (untested):

SELECT *
FROM projects
ORDER BY
FIELD(commitment,'weekends','weekdays') DESC,
FIELD(age_group,'elderly') DESC,
FIELD(environment,'outdoor') DESC;

Note that I've reversed the FIELD preferences and added DESC
afterwards - I think this makes sense when substituting numeric
values.
Reply With Quote
  #5 (permalink)  
Old 05-27-2008
Matt
 
Posts: n/a
Default Re: Narrowing down results

On May 26, 11:57 pm, strawberry <zac.ca...@gmail.com> wrote:

> Here's one idea.
>
> From the point of view of a volunteer, projects are defined by the
> three attributes of age_group, environment and commitment,
> so you may have a table that looks like this:
>
> +------------+-----------+-------------+------------+
> | project_id | age_group | environment | commitment |
> +------------+-----------+-------------+------------+
> | 1 | kids | indooor | weekdays |
> | 2 | kids | outdoor | long-term |
> | 3 | teens | indoor | weekends |
> | 4 | teens | indoor | weekdays |
> | 5 | elderly | outdoor | weekdays |
> | 6 | elderly | outdoor | weekends |
> +------------+-----------+-------------+------------+
>
> It seems to me that volunteers need to be able to prioritise two
> things;
> firstly, which of those attributes is most important to them, and
> secondly, their preferences within those attributes.
>
> Rather than performing some complex mathematics (although that might,
> ultimately, be a better method) to filter the resultset,
> you could instead just return the entire resultset, ordered according
> to the volunteer's preferences.
>
> So, if a volunteer's key selection pressure is:
>
> commitment (weekdays followed by weekends) followed by
> age_group (elderly) followed by
> environment (prefers outdoors)
> then the query might look like this (untested):
>
> SELECT *
> FROM projects
> ORDER BY
> FIELD(commitment,'weekends','weekdays') DESC,
> FIELD(age_group,'elderly') DESC,
> FIELD(environment,'outdoor') DESC;
>
> Note that I've reversed the FIELD preferences and added DESC
> afterwards - I think this makes sense when substituting numeric
> values.


This is similar to what I was thinking after experimenting with
Gordon's code above. I got the code to work and got back some
(accurate-looking) results, so began considering which factors here
should be given more weight - I think we agree that commitment is the
most important (or least flexible). I'm still in the process of coding
the quiz itself, I wanted to get the query right before I started
shaping the app itself. Thanks for the food for thought! I'll try out
your code too and see how it compares with Gordon's.

Thanks again everyone.
Matt
Reply With Quote
  #6 (permalink)  
Old 05-27-2008
Gordon Burditt
 
Posts: n/a
Default Re: Narrowing down results

>> You need to decide what a "good match" is (in terms of numbers you
>> can calculate manually) before trying to write queries. Since you
>> don't describe at all what you're trying to match against what, and
>> what constitutes a good match, no one can help you.
>>
>> Suppose $qage, $qtype, and $qtime are the answers from the quiz,
>> which get substituted into the query. oage, otype, and otime are
>> values from the outcomes. You might use a query like:
>>
>> select outcomeid, oage, otype, otime,
>> ($qage - oage)*($qage - oage)*0.1 +
>> ($qtype - otype)*($qtype - otype)*2.0 +
>> ($qtime - otime)*($qtime - otime)*1.0 as score
>> from outcomes_table order by score limit 5;
>>
>> The weights 0.2, 2.0, and 1.0 are something you've determined by
>> experimenting with the matches or estimated on a theoretical basis.

>
>Hi Gordon, thank you for your detailed reply. I can see I'm getting
>into a fairly complex operation here...
>
>Let me be more specific with my data: it's not for a dating site
>(although I can see how it looks that way). Basically, the site is a


Well, I knew it wasn't a dating site from all the stuff NOT mentioned,
but it still makes a reasonable analogy anyway.

>homepage for a charity group that run various projects. Some projects
>are with kids, some are with the elderly, etc. To make it easier for
>newcomers, the quiz I'm producing will hopefully recommend a project
>to them based on preferences.


I'll suggest recommending at least two or three (you can offer them
as alternate choices) to avoid turning away people who might not
like the particular project.

>Each of the projects has an associated age group (kids, teens,
>elderly), type (outdoor, indoor, environmental) and time (weekdays,
>weekends, long-term). There's three questions in the 'quiz' to
>determine the user's preference. In this sense, I'm not really sure
>how important these factors are in relation to one another. I mean,
>that probably comes down to user preference rather than out-and-out
>logic like your Methodist/Baptist example.


One problem with trying to apply my approach directly to your quiz
is that I assume that there's some kind of continuum of preferences.
If you want to date someone 30-35, you're more likely to accept
someone 36 than someone 18. Few people will only date others with
odd-numbered ages. With yours, there's really not a continuum. If
they say they want to work with the elderly, you CANNOT assume they
prefer working with teens over kids as a second choice. I can
certainly see a volunteer with control-freak tendancies who could
work with children or the elderly, but not be able to stand teenagers.
There are also single men who are afraid of being accused of child
molestation if they are seen answering the door to a child (or in
some cases, *female* child).

Your quiz has 9 possible results, and given answers, one group is
chosen. If no project for that group exists, then you can vary one
of the values to get 6 other possibilities, but you have no guidance
as to which of the 6 they would prefer.

Suggestion: Ask for second choices that are acceptable. Ask which feature
is most important to them.

Here's an approach I thought of, courtesy of a hospital doing
customer-satisfaction surveys that called:

Rank on a scale of 1 to 5, with 1 being least desirable and 5 being the
most, which you would like doing on a volunteer project?

Work with the elderly?
Work with teenagers?
Work with children?

Work indoors?
Work outdoors?
Work to protect the environment?

Work on weekends?
Work on weekdays?
Work on a project with a long-term commitment?

If someone ranks all three of a category the same, they don't care
that much about that particular feature. If someone ranks an item
in a category with a 1 and another with a 5, obviously they've got
strong preferences here.

Scoring this: Each project has a percentage score for the attributes
above, adding to 100% in each category. In your example, all of
these are 0% or 100%, but you could possibly have a project that
involves getting children and the elderly together that would rank
50%, 0%, and 50% for working with the elderly, teenagers, and kids.
Multiply corresponding percentages with the user preferences, and add
them up. In this case, highest score is the best match for this
particular user.

Note that it doesn't really matter that much how stingy or generous the
user is with handing out preferences. The same sort of information can
be conveyed with a user who hands out an average of '4' vs. an average of
'2'. It's the differences that express preferences. You don't compare
preferences of two different users to each other. If a user leaves out
an answer to a question, you either insist or use the average of the
other answers.

You can also cheat a little and break ties with a fractional-point
score that indicates how desperately the project is in need of
additional help. This also gives guidance to the program for the
wishy-washy who rank everything with a '3'.

Reply With Quote
Reply


Thread Tools
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

vB code is On
Smilies are Off
[IMG] code is Off
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are On



All times are GMT +1. The time now is 05:09 PM.


Powered by vBulletin® Version 3.6.8
Copyright ©2000 - 2008, Jelsoft Enterprises Ltd.
Content Relevant URLs by vBSEO 3.0.0