Just Another Kilo Biting Blog | Just another WordPress weblog

Apr/07

6

Mysql selecting random rows with large tables

Have you being asked to write an sql query that retrieves random rows? you you had then you will find it tricky! Database is designed locate specific information from the storage based on a given criteria, it peforms well when query specific into, but random rows is … a totally diffrent story…..

Method 1

The easy approach is to shuffle the result set and pick the first one. This is how:

SELECT * FROM `linkpoll` WHERE `somefield` = 1 ORDER BY RAND() LIMIT 1;

In mysql, SELECT RAND() gives you a random value randing from 0 to 1 (in fraction numbers). Select with ORDER BY RAND() will gives you a random ordered resultset.

This technique works extreamly well when you have very small amount of data, but if you look carefully into the EXPLAIN statement, you will find that it is “Using temporary” & “Using filesort”. This is bad, it slows things down dramatically, it can never scaled up at all.

+----+-------------+----------+------+---------------+------+---------+------+--------+---------------------------------+
| id | select_type | table    | type | possible_keys | key  | key_len | ref  | rows   | Extra                           |
+----+-------------+----------+------+---------------+------+---------+------+--------+---------------------------------+
|  1 | SIMPLE      | linkpoll | ALL  | NULL          | NULL | NULL    | NULL | 130344 | Using temporary; Using filesort |
+----+-------------+----------+------+---------------+------+---------+------+--------+---------------------------------+

Method 2

Later on I discovered a better method, by taking advantages with LIMIT. Similar to a simple query, it retrieve one single row without having to reorder the entire table, like this:

SELECT * FROM `linkpoll` LIMIT ,1;

The down side is that you are required to pre-prepare row count and randomized the before executing this query, well no the big deal right? =p

The EXPLAIN shows:

+----+-------------+----------+------+---------------+------+---------+------+--------+-------+
| id | select_type | table    | type | possible_keys | key  | key_len | ref  | rows   | Extra |
+----+-------------+----------+------+---------------+------+---------+------+--------+-------+
|  1 | SIMPLE      | linkpoll | ALL  | NULL          | NULL | NULL    | NULL | 130344 |       |
+----+-------------+----------+------+---------------+------+---------+------+--------+-------+

This method works well on medium sized table as it does not involve any filesort or temporary tables. But as you can see, it is a table scan, it hardly capable of handling over 100k of rows.

Method 3

Wanna know the ultimate, scalable, random row query statement? Commin soon! =D

No tags

No comments yet.

Leave a Reply

<<

>>

Theme Design by devolux.nh2.me