This is a discussion on Narrowing down results within the MySQL Database forums, part of the Database Forums category; Hi again, my second post today... development is going quickly! I'm working on a 'quiz' script that matches user ...
|
|||||||
| FAQ | Members List | Calendar | Search | Today's Posts | Mark Forums Read |
|
|||
|
Hi again, my second post today... development is going quickly!
I'm working on a 'quiz' script that matches user preferences to stored attributes. I want the user to select their choice of preferences and the query to come back with the nearest match to what they need. Let me give an example: There are three preference categories (age, type, and time), and each of these contains three choices. There are only around 17 outcomes so clearly not every combination of choices is represented, so I anticipate that sometimes the 'match' will only be 2 out of 3 preferences. Below that and I'll return some kind of 'refine your choices' error. I've attempted to write out on paper a diagram/natural language version of how this could work, but I imagine there's a better method I'm unaware of. My diagram consists of a set of nested queries and php arrays which seems inefficient and inaccurate. Essentially, the user is going to end up with an array of answers at the end of the quiz (say, 2, 5, 8) and I want these three variables to be checked against the three rows in the table and to return the closest match(es). Can anybody point me in the right direction/part of the manual for this kind of querying? Thank you, Matt |
|
|||
|
>I'm working on a 'quiz' script that matches user preferences to stored
>attributes. I want the user to select their choice of preferences and >the query to come back with the nearest match to what they need. Let >me give an example: > >There are three preference categories (age, type, and time), and each >of these contains three choices. There are only around 17 outcomes so >clearly not every combination of choices is represented, so I >anticipate that sometimes the 'match' will only be 2 out of 3 >preferences. Below that and I'll return some kind of 'refine your >choices' error. I don't know how well this will work, but it does have some uses in other applications. Define a "matching score" which is supposed to be a measure of how well the preference and one of the outcomes matches. For starters, take the difference between each preference category and the outcome, square them, and add them up. The lowest matching score is the best match. This tends to prefer matches where all the categories are slightly mismatched over a match where all but one of the categories is a perfect match but that one is *WAY* off. If all of the categories are such that there is a "match" or "no match" with no concept of a "slight mismatch" vs. a "large mismatch", the matching score becomes a count of the number of mismatched categories. If you want to de-emphasize the category that is *WAY* off, use the absolute value of the difference rather than the square of the difference. You may wish to weight the categories: a mismatch of 1 unit in 'type' is equivalent to how many years different in 'age'? You might have a category where there's no "near match": either there's a match or there isn't. In that case, score 1 for a mismatch and 0 for a match. Then weight it if desired. If this is a dating application then a mismatch of desired sex of date with the date's sex should probably have a weight of a billion or more, or you arrange not to even try to match heterosexual men with other heterosexual men. Religion is problematical, but Methodist and Baptist are probably a lot closer match than Jewish and Islamic. If the desired age range was given as 35-40, anyone in that range is a perfect match, and someone 41 or 34 is a slight mismatch which could be traded off against personality factors. You might weight things so a 3 year mismatch in age is equivalent to a 1 point mismatch on Aunt Gertrude's scale of Neat Freakiness (scores of 1-5). In SQL, you can select the outcomes and the matching score, order by the matching score, and perhaps limit it to some of the best matches. >I've attempted to write out on paper a diagram/natural language >version of how this could work, but I imagine there's a better method >I'm unaware of. My diagram consists of a set of nested queries and php >arrays which seems inefficient and inaccurate. > >Essentially, the user is going to end up with an array of answers at >the end of the quiz (say, 2, 5, 8) and I want these three variables to >be checked against the three rows in the table and to return the >closest match(es). Can anybody point me in the right direction/part of >the manual for this kind of querying? You need to decide what a "good match" is (in terms of numbers you can calculate manually) before trying to write queries. Since you don't describe at all what you're trying to match against what, and what constitutes a good match, no one can help you. Suppose $qage, $qtype, and $qtime are the answers from the quiz, which get substituted into the query. oage, otype, and otime are values from the outcomes. You might use a query like: select outcomeid, oage, otype, otime, ($qage - oage)*($qage - oage)*0.1 + ($qtype - otype)*($qtype - otype)*2.0 + ($qtime - otime)*($qtime - otime)*1.0 as score from outcomes_table order by score limit 5; The weights 0.2, 2.0, and 1.0 are something you've determined by experimenting with the matches or estimated on a theoretical basis. |
|
|||
|
On May 26, 8:20 pm, gordonb.2c...@burditt.org (Gordon Burditt) wrote:
> You need to decide what a "good match" is (in terms of numbers you > can calculate manually) before trying to write queries. Since you > don't describe at all what you're trying to match against what, and > what constitutes a good match, no one can help you. > > Suppose $qage, $qtype, and $qtime are the answers from the quiz, > which get substituted into the query. oage, otype, and otime are > values from the outcomes. You might use a query like: > > select outcomeid, oage, otype, otime, > ($qage - oage)*($qage - oage)*0.1 + > ($qtype - otype)*($qtype - otype)*2.0 + > ($qtime - otime)*($qtime - otime)*1.0 as score > from outcomes_table order by score limit 5; > > The weights 0.2, 2.0, and 1.0 are something you've determined by > experimenting with the matches or estimated on a theoretical basis. Hi Gordon, thank you for your detailed reply. I can see I'm getting into a fairly complex operation here... Let me be more specific with my data: it's not for a dating site (although I can see how it looks that way). Basically, the site is a homepage for a charity group that run various projects. Some projects are with kids, some are with the elderly, etc. To make it easier for newcomers, the quiz I'm producing will hopefully recommend a project to them based on preferences. Each of the projects has an associated age group (kids, teens, elderly), type (outdoor, indoor, environmental) and time (weekdays, weekends, long-term). There's three questions in the 'quiz' to determine the user's preference. In this sense, I'm not really sure how important these factors are in relation to one another. I mean, that probably comes down to user preference rather than out-and-out logic like your Methodist/Baptist example. Obviously with a smallish amount of projects, there might not be a result that matches the user's preference (there are only 3 projects with the elderly, and none of these are long-term, for example). I guess in this case I need to code for low-scoring matches and give some kind of alternative suggestion (which does exist). I'll experiment with your code and scoring suggestions and see if I can get anywhere. Thank you again, Matt |
|
|||
|
On May 26, 8:42 pm, Matt <guitarroman...@gmail.com> wrote:
> On May 26, 8:20 pm, gordonb.2c...@burditt.org (Gordon Burditt) wrote: > > > > > You need to decide what a "good match" is (in terms of numbers you > > can calculate manually) before trying to write queries. Since you > > don't describe at all what you're trying to match against what, and > > what constitutes a good match, no one can help you. > > > Suppose $qage, $qtype, and $qtime are the answers from the quiz, > > which get substituted into the query. oage, otype, and otime are > > values from the outcomes. You might use a query like: > > > select outcomeid, oage, otype, otime, > > ($qage - oage)*($qage - oage)*0.1 + > > ($qtype - otype)*($qtype - otype)*2.0 + > > ($qtime - otime)*($qtime - otime)*1.0 as score > > from outcomes_table order by score limit 5; > > > The weights 0.2, 2.0, and 1.0 are something you've determined by > > experimenting with the matches or estimated on a theoretical basis. > > Hi Gordon, thank you for your detailed reply. I can see I'm getting > into a fairly complex operation here... > > Let me be more specific with my data: it's not for a dating site > (although I can see how it looks that way). Basically, the site is a > homepage for a charity group that run various projects. Some projects > are with kids, some are with the elderly, etc. To make it easier for > newcomers, the quiz I'm producing will hopefully recommend a project > to them based on preferences. > > Each of the projects has an associated age group (kids, teens, > elderly), type (outdoor, indoor, environmental) and time (weekdays, > weekends, long-term). There's three questions in the 'quiz' to > determine the user's preference. In this sense, I'm not really sure > how important these factors are in relation to one another. I mean, > that probably comes down to user preference rather than out-and-out > logic like your Methodist/Baptist example. > > Obviously with a smallish amount of projects, there might not be a > result that matches the user's preference (there are only 3 projects > with the elderly, and none of these are long-term, for example). I > guess in this case I need to code for low-scoring matches and give > some kind of alternative suggestion (which does exist). I'll > experiment with your code and scoring suggestions and see if I can get > anywhere. > > Thank you again, > Matt Here's one idea. From the point of view of a volunteer, projects are defined by the three attributes of age_group, environment and commitment, so you may have a table that looks like this: +------------+-----------+-------------+------------+ | project_id | age_group | environment | commitment | +------------+-----------+-------------+------------+ | 1 | kids | indooor | weekdays | | 2 | kids | outdoor | long-term | | 3 | teens | indoor | weekends | | 4 | teens | indoor | weekdays | | 5 | elderly | outdoor | weekdays | | 6 | elderly | outdoor | weekends | +------------+-----------+-------------+------------+ It seems to me that volunteers need to be able to prioritise two things; firstly, which of those attributes is most important to them, and secondly, their preferences within those attributes. Rather than performing some complex mathematics (although that might, ultimately, be a better method) to filter the resultset, you could instead just return the entire resultset, ordered according to the volunteer's preferences. So, if a volunteer's key selection pressure is: commitment (weekdays followed by weekends) followed by age_group (elderly) followed by environment (prefers outdoors) then the query might look like this (untested): SELECT * FROM projects ORDER BY FIELD(commitment,'weekends','weekdays') DESC, FIELD(age_group,'elderly') DESC, FIELD(environment,'outdoor') DESC; Note that I've reversed the FIELD preferences and added DESC afterwards - I think this makes sense when substituting numeric values. |
|
|||
|
On May 26, 11:57 pm, strawberry <zac.ca...@gmail.com> wrote:
> Here's one idea. > > From the point of view of a volunteer, projects are defined by the > three attributes of age_group, environment and commitment, > so you may have a table that looks like this: > > +------------+-----------+-------------+------------+ > | project_id | age_group | environment | commitment | > +------------+-----------+-------------+------------+ > | 1 | kids | indooor | weekdays | > | 2 | kids | outdoor | long-term | > | 3 | teens | indoor | weekends | > | 4 | teens | indoor | weekdays | > | 5 | elderly | outdoor | weekdays | > | 6 | elderly | outdoor | weekends | > +------------+-----------+-------------+------------+ > > It seems to me that volunteers need to be able to prioritise two > things; > firstly, which of those attributes is most important to them, and > secondly, their preferences within those attributes. > > Rather than performing some complex mathematics (although that might, > ultimately, be a better method) to filter the resultset, > you could instead just return the entire resultset, ordered according > to the volunteer's preferences. > > So, if a volunteer's key selection pressure is: > > commitment (weekdays followed by weekends) followed by > age_group (elderly) followed by > environment (prefers outdoors) > then the query might look like this (untested): > > SELECT * > FROM projects > ORDER BY > FIELD(commitment,'weekends','weekdays') DESC, > FIELD(age_group,'elderly') DESC, > FIELD(environment,'outdoor') DESC; > > Note that I've reversed the FIELD preferences and added DESC > afterwards - I think this makes sense when substituting numeric > values. This is similar to what I was thinking after experimenting with Gordon's code above. I got the code to work and got back some (accurate-looking) results, so began considering which factors here should be given more weight - I think we agree that commitment is the most important (or least flexible). I'm still in the process of coding the quiz itself, I wanted to get the query right before I started shaping the app itself. Thanks for the food for thought! I'll try out your code too and see how it compares with Gordon's. Thanks again everyone. Matt |
|
|||
|
>> You need to decide what a "good match" is (in terms of numbers you
>> can calculate manually) before trying to write queries. Since you >> don't describe at all what you're trying to match against what, and >> what constitutes a good match, no one can help you. >> >> Suppose $qage, $qtype, and $qtime are the answers from the quiz, >> which get substituted into the query. oage, otype, and otime are >> values from the outcomes. You might use a query like: >> >> select outcomeid, oage, otype, otime, >> ($qage - oage)*($qage - oage)*0.1 + >> ($qtype - otype)*($qtype - otype)*2.0 + >> ($qtime - otime)*($qtime - otime)*1.0 as score >> from outcomes_table order by score limit 5; >> >> The weights 0.2, 2.0, and 1.0 are something you've determined by >> experimenting with the matches or estimated on a theoretical basis. > >Hi Gordon, thank you for your detailed reply. I can see I'm getting >into a fairly complex operation here... > >Let me be more specific with my data: it's not for a dating site >(although I can see how it looks that way). Basically, the site is a Well, I knew it wasn't a dating site from all the stuff NOT mentioned, but it still makes a reasonable analogy anyway. >homepage for a charity group that run various projects. Some projects >are with kids, some are with the elderly, etc. To make it easier for >newcomers, the quiz I'm producing will hopefully recommend a project >to them based on preferences. I'll suggest recommending at least two or three (you can offer them as alternate choices) to avoid turning away people who might not like the particular project. >Each of the projects has an associated age group (kids, teens, >elderly), type (outdoor, indoor, environmental) and time (weekdays, >weekends, long-term). There's three questions in the 'quiz' to >determine the user's preference. In this sense, I'm not really sure >how important these factors are in relation to one another. I mean, >that probably comes down to user preference rather than out-and-out >logic like your Methodist/Baptist example. One problem with trying to apply my approach directly to your quiz is that I assume that there's some kind of continuum of preferences. If you want to date someone 30-35, you're more likely to accept someone 36 than someone 18. Few people will only date others with odd-numbered ages. With yours, there's really not a continuum. If they say they want to work with the elderly, you CANNOT assume they prefer working with teens over kids as a second choice. I can certainly see a volunteer with control-freak tendancies who could work with children or the elderly, but not be able to stand teenagers. There are also single men who are afraid of being accused of child molestation if they are seen answering the door to a child (or in some cases, *female* child). Your quiz has 9 possible results, and given answers, one group is chosen. If no project for that group exists, then you can vary one of the values to get 6 other possibilities, but you have no guidance as to which of the 6 they would prefer. Suggestion: Ask for second choices that are acceptable. Ask which feature is most important to them. Here's an approach I thought of, courtesy of a hospital doing customer-satisfaction surveys that called: Rank on a scale of 1 to 5, with 1 being least desirable and 5 being the most, which you would like doing on a volunteer project? Work with the elderly? Work with teenagers? Work with children? Work indoors? Work outdoors? Work to protect the environment? Work on weekends? Work on weekdays? Work on a project with a long-term commitment? If someone ranks all three of a category the same, they don't care that much about that particular feature. If someone ranks an item in a category with a 1 and another with a 5, obviously they've got strong preferences here. Scoring this: Each project has a percentage score for the attributes above, adding to 100% in each category. In your example, all of these are 0% or 100%, but you could possibly have a project that involves getting children and the elderly together that would rank 50%, 0%, and 50% for working with the elderly, teenagers, and kids. Multiply corresponding percentages with the user preferences, and add them up. In this case, highest score is the best match for this particular user. Note that it doesn't really matter that much how stingy or generous the user is with handing out preferences. The same sort of information can be conveyed with a user who hands out an average of '4' vs. an average of '2'. It's the differences that express preferences. You don't compare preferences of two different users to each other. If a user leaves out an answer to a question, you either insist or use the average of the other answers. You can also cheat a little and break ties with a fractional-point score that indicates how desperately the project is in need of additional help. This also gives guidance to the program for the wishy-washy who rank everything with a '3'. |
![]() |
| Thread Tools | |
| Display Modes | |
|
|