This is a discussion on Short URLs within the alt.comp.lang.php forums, part of the PHP Programming Forums category; I want to implement something like tinyurl.com in PHP. I just wondered is there some useful maths / spec behind ...
|
|||||||
| FAQ | Members List | Calendar | Search | Today's Posts | Mark Forums Read |
|
|||
|
I want to implement something like tinyurl.com in PHP.
I just wondered is there some useful maths / spec behind how to come up with a unique (short) ID for a resource? I would like to avoid a DBMS, so a hash or something? Yes I saw http://freshmeat.net/projects/shortlink/ but that's in Perl, I want PHP. |
|
|||
|
I noticed that Message-ID:
<1128315722.466228.50050@o13g2000cwo.googlegroups. com> from kai.hendry@gmail.com contained the following: >I just wondered is there some useful maths / spec behind how to come up >with a unique (short) ID for a resource? I would like to avoid a DBMS, >so a hash or something? > >Yes I saw http://freshmeat.net/projects/shortlink/ but that's in Perl, >I want PHP. Which also uses a database. You want something that will compress the text of the URL in such a way that you can reverse the algorithm and get the original string back. Interesting challenge. -- Geoff Berrow 0110001001101100010000000110 001101101011011001000110111101100111001011 100110001101101111001011100111010101101011 |
|
|||
|
"Geoff Berrow" <blthecat@ckdog.co.uk> wrote in message
news:kck1k156tch9btrf6r87ej3g276cvrjf51@4ax.com... >I noticed that Message-ID: > <1128315722.466228.50050@o13g2000cwo.googlegroups. com> from > kai.hendry@gmail.com contained the following: > >>I just wondered is there some useful maths / spec behind how to come up >>with a unique (short) ID for a resource? I would like to avoid a DBMS, >>so a hash or something? >> >>Yes I saw http://freshmeat.net/projects/shortlink/ but that's in Perl, >>I want PHP. > > Which also uses a database. You want something that will compress the > text of the URL in such a way that you can reverse the algorithm and get > the original string back. Interesting challenge. Not necessarily, he could mean just a flat file system instead of a dbms. substr(base_convert(time(),10,36), -5); will produce (relatively) short, five-character id's. It simply takes the current timestamp and encodes it to a 36base 0-9 a-z representation. I left out the first digit since it has relatively no effect. If you wanna be really effective, you'll have a simple counter and you'll start from zero and increase each time you add a link. file_put_contents('counter.txt', $ctr = (file_get_contents('counter.txt.')+1)); base_convert($ctr,10,36); That'll keep 'em short. -- Welcome to Usenet! Please leave tolerance, understanding and intelligence at the door. They aren't welcome here. antaatulla.sikanautaa@gmail.com.NOSPAM.invalid |
|
|||
|
"kai.hendry@gmail.com" wrote:
> I want to implement something like tinyurl.com in PHP. > > I just wondered is there some useful maths / spec behind how to come up > with a unique (short) ID for a resource? I would like to avoid a DBMS, > so a hash or something? > > Yes I saw http://freshmeat.net/projects/shortlink/ but that's in Perl, > I want PHP. The only sensible and reliable way of doing this is to use a database. A hash would be no good unless you can find some way of inverting the hash function to retrieve the original URL. Hash functions don't normally work that way. You could of course store the URL table in a flat file, but this would soon become very cumbersome. Upgrade your hosting account and get yourself a mySQL database. The PHP coding for this sort of thing is trivial. -- phil [dot] ronan @ virgin [dot] net http://vzone.virgin.net/phil.ronan/ |
|
|||
|
Philip Ronan wrote:
> "kai.hendry@gmail.com" wrote: > > >>I want to implement something like tinyurl.com in PHP. >> >>I just wondered is there some useful maths / spec behind how to come up >>with a unique (short) ID for a resource? I would like to avoid a DBMS, >>so a hash or something? >> >>Yes I saw http://freshmeat.net/projects/shortlink/ but that's in Perl, >>I want PHP. > > > The only sensible and reliable way of doing this is to use a database. A > hash would be no good unless you can find some way of inverting the hash > function to retrieve the original URL. Hash functions don't normally work > that way. you can not revert a hash functions value. You are just able to find a source value that produces the same hashvalue but that is not always the same. Regards Stefan > > You could of course store the URL table in a flat file, but this would soon > become very cumbersome. > > Upgrade your hosting account and get yourself a mySQL database. The PHP > coding for this sort of thing is trivial. > |
|
|||
|
I noticed that Message-ID:
<9q50f.32435$X52.14844@reader1.news.jippii.net> from Kimmo Laine contained the following: >Not necessarily, he could mean just a flat file system instead of a dbms. He could, but as he suggested (albeit erroneously) a hash, I took it that he was looking for compression rather than a look up table of some sort. -- Geoff Berrow 0110001001101100010000000110 001101101011011001000110111101100111001011 100110001101101111001011100111010101101011 |
|
|||
|
"Geoff Berrow" <blthecat@ckdog.co.uk> kirjoitti
viestissä:h2d2k11pe21jpeitrlpnqv3g1i0ivrvj0o@4ax.c om... >I noticed that Message-ID: > <9q50f.32435$X52.14844@reader1.news.jippii.net> from Kimmo Laine > contained the following: > >>Not necessarily, he could mean just a flat file system instead of a dbms. > > He could, but as he suggested (albeit erroneously) a hash, I took it > that he was looking for compression rather than a look up table of some > sort. Okay, my bad. I have no fucking idea what a hash is :D You know, except the thing a person might smoke, but I figured that's not it... ;) And I also had a feeling that in Java world when you talk about a hash you mean a vector, a 2*n table concisting of key-value pairs... But I'm not sure at all, it's been some time since I've worked with Java... And finally some ASCII humour not completely off topic: # = hash | = pipe #| = hashpipe -- SETI @ Home - Donate your cpu's idle time to science. Further reading at <http://setiweb.ssl.berkeley.edu/> Kimmo Laine <antaatulla.sikanautaa@gmail.com.NOSPAM.invalid> |
|
|||
|
Kimmo Laine wrote:
> > Okay, my bad. I have no fucking idea what a hash is :D You know, except the > thing a person might smoke, but I figured that's not it... ;) > > And I also had a feeling that in Java world when you talk about a hash you > mean a vector, a 2*n table concisting of key-value pairs... But I'm not sure > at all, it's been some time since I've worked with Java... > A hash is the generic name for an algorithm which creates a short string/integer/other data-type based on a given input string. MD5 is the obvious example. A hash-table, on the other hand, (like the one in Java) is a data-structure which splits a set of data up into "bins", each bin corresponding to a particular hash value (remember, hash algorithms have collisions, so multiple keys have the same hash), and containing a linked-list of the original keys. In this way, data lookup is faster. e.g. Suppose the following strings give the following hashes: Key Hash =================== Dog 5 Cat 8 Giraffe 5 Lion 2 Monkey 5 Tiger 8 If they're put into a hash-table, they are stored as follows: +---+ | 2 | --> Lion +---+ | 5 | --> Dog --> Giraffe --> Monkey +---+ | 8 | --> Cat --> Tiger +---+ To look up a value for a given key, the key's hash is first calculated, and then the corresponding bin (which is basically a linked-list) is searched. With a good hash algorithm, this greatly reduces the search time. -- Oli |
|
|||
|
I noticed that Message-ID: <dhrk4k$cib$1@phys-news1.kolumbus.fi> from
Kimmo Laine contained the following: >> He could, but as he suggested (albeit erroneously) a hash, I took it >> that he was looking for compression rather than a look up table of some >> sort. > > >Okay, my bad. I have no fucking idea what a hash is :D You know, except the >thing a person might smoke, but I figured that's not it... ;) Perhaps not :-) A hash involves applying an algorithm to the original data so as to get a shorter value (e.g md5 ) In doing so some data is discarded which means that it is not possible to extract the data (because the same hash could have been produced from more than one input). -- Geoff Berrow 0110001001101100010000000110 001101101011011001000110111101100111001011 100110001101101111001011100111010101101011 |
|
|||
|
"Oli Filth" <catch@olifilth.co.uk> wrote in message
news:1128356529.743465.86520@g44g2000cwa.googlegro ups.com... > Kimmo Laine wrote: >> >> Okay, my bad. I have no fucking idea what a hash is :D You know, except >> the >> thing a person might smoke, but I figured that's not it... ;) >> >> And I also had a feeling that in Java world when you talk about a hash >> you >> mean a vector, a 2*n table concisting of key-value pairs... But I'm not >> sure >> at all, it's been some time since I've worked with Java... >> > > A hash is the generic name for an algorithm which creates a short > string/integer/other data-type based on a given input string. MD5 is > the obvious example. > > A hash-table, on the other hand, (like the one in Java) is a > data-structure which splits a set of data up into "bins", each bin > corresponding to a particular hash value (remember, hash algorithms > have collisions, so multiple keys have the same hash), and containing a > linked-list of the original keys. In this way, data lookup is faster. > > e.g. Suppose the following strings give the following hashes: > > Key Hash > =================== > Dog 5 > Cat 8 > Giraffe 5 > Lion 2 > Monkey 5 > Tiger 8 > > If they're put into a hash-table, they are stored as follows: > > +---+ > | 2 | --> Lion > +---+ > | 5 | --> Dog --> Giraffe --> Monkey > +---+ > | 8 | --> Cat --> Tiger > +---+ > > To look up a value for a given key, the key's hash is first calculated, > and then the corresponding bin (which is basically a linked-list) is > searched. With a good hash algorithm, this greatly reduces the search > time. Thanks both Geoff and Oli for clearing that hash thing. :D -- Welcome to Usenet! Please leave tolerance, understanding and intelligence at the door. They aren't welcome here. antaatulla.sikanautaa@gmail.com.NOSPAM.invalid |