Short URLs

This is a discussion on Short URLs within the alt.comp.lang.php forums, part of the PHP Programming Forums category; I want to implement something like tinyurl.com in PHP. I just wondered is there some useful maths / spec behind ...


Go Back   Usenet Forums > PHP Programming Forums > alt.comp.lang.php

FAQ Members List Calendar Search Today's Posts Mark Forums Read
  #1 (permalink)  
Old 10-03-2005
kai.hendry@gmail.com
 
Posts: n/a
Default Short URLs

I want to implement something like tinyurl.com in PHP.

I just wondered is there some useful maths / spec behind how to come up
with a unique (short) ID for a resource? I would like to avoid a DBMS,
so a hash or something?

Yes I saw http://freshmeat.net/projects/shortlink/ but that's in Perl,
I want PHP.

Reply With Quote
  #2 (permalink)  
Old 10-03-2005
Geoff Berrow
 
Posts: n/a
Default Re: Short URLs

I noticed that Message-ID:
<1128315722.466228.50050@o13g2000cwo.googlegroups. com> from
kai.hendry@gmail.com contained the following:

>I just wondered is there some useful maths / spec behind how to come up
>with a unique (short) ID for a resource? I would like to avoid a DBMS,
>so a hash or something?
>
>Yes I saw http://freshmeat.net/projects/shortlink/ but that's in Perl,
>I want PHP.


Which also uses a database. You want something that will compress the
text of the URL in such a way that you can reverse the algorithm and get
the original string back. Interesting challenge.

--
Geoff Berrow 0110001001101100010000000110
001101101011011001000110111101100111001011
100110001101101111001011100111010101101011
Reply With Quote
  #3 (permalink)  
Old 10-03-2005
Kimmo Laine
 
Posts: n/a
Default Re: Short URLs

"Geoff Berrow" <blthecat@ckdog.co.uk> wrote in message
news:kck1k156tch9btrf6r87ej3g276cvrjf51@4ax.com...
>I noticed that Message-ID:
> <1128315722.466228.50050@o13g2000cwo.googlegroups. com> from
> kai.hendry@gmail.com contained the following:
>
>>I just wondered is there some useful maths / spec behind how to come up
>>with a unique (short) ID for a resource? I would like to avoid a DBMS,
>>so a hash or something?
>>
>>Yes I saw http://freshmeat.net/projects/shortlink/ but that's in Perl,
>>I want PHP.

>
> Which also uses a database. You want something that will compress the
> text of the URL in such a way that you can reverse the algorithm and get
> the original string back. Interesting challenge.



Not necessarily, he could mean just a flat file system instead of a dbms.

substr(base_convert(time(),10,36), -5);
will produce (relatively) short, five-character id's. It simply takes the
current timestamp and encodes it to a 36base 0-9 a-z representation. I left
out the first digit since it has relatively no effect. If you wanna be
really effective, you'll have a simple counter and you'll start from zero
and increase each time you add a link.

file_put_contents('counter.txt', $ctr =
(file_get_contents('counter.txt.')+1));
base_convert($ctr,10,36);

That'll keep 'em short.

--
Welcome to Usenet! Please leave tolerance, understanding
and intelligence at the door. They aren't welcome here.
antaatulla.sikanautaa@gmail.com.NOSPAM.invalid


Reply With Quote
  #4 (permalink)  
Old 10-03-2005
Philip Ronan
 
Posts: n/a
Default Re: Short URLs

"kai.hendry@gmail.com" wrote:

> I want to implement something like tinyurl.com in PHP.
>
> I just wondered is there some useful maths / spec behind how to come up
> with a unique (short) ID for a resource? I would like to avoid a DBMS,
> so a hash or something?
>
> Yes I saw http://freshmeat.net/projects/shortlink/ but that's in Perl,
> I want PHP.


The only sensible and reliable way of doing this is to use a database. A
hash would be no good unless you can find some way of inverting the hash
function to retrieve the original URL. Hash functions don't normally work
that way.

You could of course store the URL table in a flat file, but this would soon
become very cumbersome.

Upgrade your hosting account and get yourself a mySQL database. The PHP
coding for this sort of thing is trivial.

--
phil [dot] ronan @ virgin [dot] net
http://vzone.virgin.net/phil.ronan/


Reply With Quote
  #5 (permalink)  
Old 10-03-2005
Stefan Rybacki
 
Posts: n/a
Default Re: Short URLs

Philip Ronan wrote:
> "kai.hendry@gmail.com" wrote:
>
>
>>I want to implement something like tinyurl.com in PHP.
>>
>>I just wondered is there some useful maths / spec behind how to come up
>>with a unique (short) ID for a resource? I would like to avoid a DBMS,
>>so a hash or something?
>>
>>Yes I saw http://freshmeat.net/projects/shortlink/ but that's in Perl,
>>I want PHP.

>
>
> The only sensible and reliable way of doing this is to use a database. A
> hash would be no good unless you can find some way of inverting the hash
> function to retrieve the original URL. Hash functions don't normally work
> that way.


you can not revert a hash functions value. You are just able to find a source value that
produces the same hashvalue but that is not always the same.

Regards
Stefan

>
> You could of course store the URL table in a flat file, but this would soon
> become very cumbersome.
>
> Upgrade your hosting account and get yourself a mySQL database. The PHP
> coding for this sort of thing is trivial.
>

Reply With Quote
  #6 (permalink)  
Old 10-03-2005
Geoff Berrow
 
Posts: n/a
Default Re: Short URLs

I noticed that Message-ID:
<9q50f.32435$X52.14844@reader1.news.jippii.net> from Kimmo Laine
contained the following:

>Not necessarily, he could mean just a flat file system instead of a dbms.


He could, but as he suggested (albeit erroneously) a hash, I took it
that he was looking for compression rather than a look up table of some
sort.

--
Geoff Berrow 0110001001101100010000000110
001101101011011001000110111101100111001011
100110001101101111001011100111010101101011
Reply With Quote
  #7 (permalink)  
Old 10-03-2005
Kimmo Laine
 
Posts: n/a
Default Re: Short URLs

"Geoff Berrow" <blthecat@ckdog.co.uk> kirjoitti
viestissä:h2d2k11pe21jpeitrlpnqv3g1i0ivrvj0o@4ax.c om...
>I noticed that Message-ID:
> <9q50f.32435$X52.14844@reader1.news.jippii.net> from Kimmo Laine
> contained the following:
>
>>Not necessarily, he could mean just a flat file system instead of a dbms.

>
> He could, but as he suggested (albeit erroneously) a hash, I took it
> that he was looking for compression rather than a look up table of some
> sort.



Okay, my bad. I have no fucking idea what a hash is :D You know, except the
thing a person might smoke, but I figured that's not it... ;)

And I also had a feeling that in Java world when you talk about a hash you
mean a vector, a 2*n table concisting of key-value pairs... But I'm not sure
at all, it's been some time since I've worked with Java...

And finally some ASCII humour not completely off topic:
# = hash
| = pipe
#| = hashpipe

--
SETI @ Home - Donate your cpu's idle time to science.
Further reading at <http://setiweb.ssl.berkeley.edu/>
Kimmo Laine <antaatulla.sikanautaa@gmail.com.NOSPAM.invalid>


Reply With Quote
  #8 (permalink)  
Old 10-03-2005
Oli Filth
 
Posts: n/a
Default Re: Short URLs

Kimmo Laine wrote:
>
> Okay, my bad. I have no fucking idea what a hash is :D You know, except the
> thing a person might smoke, but I figured that's not it... ;)
>
> And I also had a feeling that in Java world when you talk about a hash you
> mean a vector, a 2*n table concisting of key-value pairs... But I'm not sure
> at all, it's been some time since I've worked with Java...
>


A hash is the generic name for an algorithm which creates a short
string/integer/other data-type based on a given input string. MD5 is
the obvious example.

A hash-table, on the other hand, (like the one in Java) is a
data-structure which splits a set of data up into "bins", each bin
corresponding to a particular hash value (remember, hash algorithms
have collisions, so multiple keys have the same hash), and containing a
linked-list of the original keys. In this way, data lookup is faster.

e.g. Suppose the following strings give the following hashes:

Key Hash
===================
Dog 5
Cat 8
Giraffe 5
Lion 2
Monkey 5
Tiger 8

If they're put into a hash-table, they are stored as follows:

+---+
| 2 | --> Lion
+---+
| 5 | --> Dog --> Giraffe --> Monkey
+---+
| 8 | --> Cat --> Tiger
+---+

To look up a value for a given key, the key's hash is first calculated,
and then the corresponding bin (which is basically a linked-list) is
searched. With a good hash algorithm, this greatly reduces the search
time.

--
Oli

Reply With Quote
  #9 (permalink)  
Old 10-03-2005
Geoff Berrow
 
Posts: n/a
Default Re: Short URLs

I noticed that Message-ID: <dhrk4k$cib$1@phys-news1.kolumbus.fi> from
Kimmo Laine contained the following:

>> He could, but as he suggested (albeit erroneously) a hash, I took it
>> that he was looking for compression rather than a look up table of some
>> sort.

>
>
>Okay, my bad. I have no fucking idea what a hash is :D You know, except the
>thing a person might smoke, but I figured that's not it... ;)


Perhaps not :-) A hash involves applying an algorithm to the original
data so as to get a shorter value (e.g md5 ) In doing so some data is
discarded which means that it is not possible to extract the data
(because the same hash could have been produced from more than one
input).

--
Geoff Berrow 0110001001101100010000000110
001101101011011001000110111101100111001011
100110001101101111001011100111010101101011
Reply With Quote
  #10 (permalink)  
Old 10-04-2005
Kimmo Laine
 
Posts: n/a
Default Re: Short URLs

"Oli Filth" <catch@olifilth.co.uk> wrote in message
news:1128356529.743465.86520@g44g2000cwa.googlegro ups.com...
> Kimmo Laine wrote:
>>
>> Okay, my bad. I have no fucking idea what a hash is :D You know, except
>> the
>> thing a person might smoke, but I figured that's not it... ;)
>>
>> And I also had a feeling that in Java world when you talk about a hash
>> you
>> mean a vector, a 2*n table concisting of key-value pairs... But I'm not
>> sure
>> at all, it's been some time since I've worked with Java...
>>

>
> A hash is the generic name for an algorithm which creates a short
> string/integer/other data-type based on a given input string. MD5 is
> the obvious example.
>
> A hash-table, on the other hand, (like the one in Java) is a
> data-structure which splits a set of data up into "bins", each bin
> corresponding to a particular hash value (remember, hash algorithms
> have collisions, so multiple keys have the same hash), and containing a
> linked-list of the original keys. In this way, data lookup is faster.
>
> e.g. Suppose the following strings give the following hashes:
>
> Key Hash
> ===================
> Dog 5
> Cat 8
> Giraffe 5
> Lion 2
> Monkey 5
> Tiger 8
>
> If they're put into a hash-table, they are stored as follows:
>
> +---+
> | 2 | --> Lion
> +---+
> | 5 | --> Dog --> Giraffe --> Monkey
> +---+
> | 8 | --> Cat --> Tiger
> +---+
>
> To look up a value for a given key, the key's hash is first calculated,
> and then the corresponding bin (which is basically a linked-list) is
> searched. With a good hash algorithm, this greatly reduces the search
> time.


Thanks both Geoff and Oli for clearing that hash thing. :D

--
Welcome to Usenet! Please leave tolerance, understanding
and intelligence at the door. They aren't welcome here.
antaatulla.sikanautaa@gmail.com.NOSPAM.invalid


Reply With Quote
Reply
Thread Tools Search this Thread
Search this Thread:

Advanced Search
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are Off
[IMG] code is Off
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are On



All times are GMT +1. The time now is 11:23 PM.


Powered by vBulletin® Version 3.7.3
Copyright ©2000 - 2008, Jelsoft Enterprises Ltd.
Content Relevant URLs by vBSEO 3.0.0