Tip of the Day : SQL Server Database Design - Twitter Profile and Followers
Generate Random Records / Rows

Generate Random Records

Let's say you have a database of customers and you initiated a contest wherein you determine the winners by randomly selecting 5 customers from your database.  How would you go about selecting 5 random records from your customers table?

NEWID() System Function To The Rescue

In SQL Server, it is quite easy to do this thanks to the NEWID() system function.  The NEWID() system function creates a unique value of type uniqueidentifier.  There's no need to add a new column to your table just to have the ability of randomly selecting records from your table.  All that needs to be done is include the NEWID() system function in the ORDER BY clause when doing your SELECT statement.

Let's use the Customers table in the Northwind database.  To randomly SELECT 5 customers from the table, we simply have to add the ORDER BY NEWID() in the SELECT statement.  Since we are only concerned with 5 records, we also have to include the TOP 5 clause in the statement.  The SELECT statement will look as follows:

SELECT TOP 5 [CustomerID], [CompanyName], [ContactName]
FROM [dbo].[Customers] 
ORDER BY NEWID()

Running this SELECT statement generated the following 5 random Customer records (please note that the results shown here will not match your results even if the same query has been executed):

CustomerID CompanyName                              ContactName                    
---------- ---------------------------------------- ------------------------------ 
WANDK      Die Wandernde Kuh                        Rita Müller
BERGS      Berglunds snabbköp                       Christina Berglund
ROMEY      Romero y tomillo                         Alejandra Camino
OTTIK      Ottilies Käseladen                       Henriette Pfalzheim
AROUT      Around the Horn                          Thomas Hardy

Running the same query again now yields me the following output, keeping in mind that the results shown here will again not be the same results you will get due to the randomness of the NEWID() system function.

CustomerID CompanyName                              ContactName                    
---------- ---------------------------------------- ------------------------------ 
SIMOB      Simons bistro                            Jytte Petersen
QUEDE      Que Delícia                              Bernardo Batista
RICAR      Ricardo Adocicados                       Janete Limeira
VINET      Vins et alcools Chevalier                Paul Henriot
LAMAI      La maison d'Asie                         Annette Roulet

RAND() System Function - Will It Work?

The randomness of the result set generated by the previous SELECT statement was due to the NEWID() system function.  There's another system function that generates a random value when executed, which is the RAND() system function.  If we substitute the NEWID() with the RAND() system function, will it generate random records similar to the above?

Let's put that question to the test by modifying the SELECT statement above and use RAND() instead of NEWID().

SELECT TOP 5 [CustomerID], [CompanyName], [ContactName]
FROM [dbo].[Customers] 
ORDER BY RAND()

Running the above SELECT statement generated the following result set:

CustomerID CompanyName                              ContactName                    
---------- ---------------------------------------- ------------------------------ 
ALFKI      Alfreds Futterkiste                      Maria Anders
ANATR      Ana Trujillo Emparedados y helados       Ana Trujillo
ANTON      Antonio Moreno Taquería                  Antonio Moreno
AROUT      Around the Horn                          Thomas Hardy
BERGS      Berglunds snabbköp                       Christina Berglund

Running it over and over again generated the same result set.  If the RAND() system function is supposed to generate random numbers when executed, how come the result set generated are all the same?  To investigate this further, let's include the result of the RAND() system function as part of the result set.

SELECT TOP 5 RAND() AS [Random], [CustomerID], [CompanyName], [ContactName]
FROM [dbo].[Customers] 
ORDER BY RAND()

Executing this SELECT statement generated the following result set (please note again that the results shown here will be different from the results you will get, particularly to the first column).

Random              CustomerID CompanyName                        ContactName
------------------- ---------- ---------------------------------- ------------------
0.73381230186299884 ALFKI      Alfreds Futterkiste                Maria Anders
0.73381230186299884 ANATR      Ana Trujillo Emparedados y helados Ana Trujillo
0.73381230186299884 ANTON      Antonio Moreno Taquería            Antonio Moreno
0.73381230186299884 AROUT      Around the Horn                    Thomas Hardy
0.73381230186299884 BERGS      Berglunds snabbköp                 Christina Berglund

As can be seen from the output, the output generated by the RAND() system function are all the same.  If the RAND() system function is supposed to generate random values, how come it's generating the same value when executed in a SELECT statement?

The answer lies in the SQL Server's Books Online.  According to Books Online under the RAND functions topic:

"Repetitive invocations of RAND() in a single query will produce the same value."

This is the reason why the same value is returned in the SELECT statement above for the RAND() column.

Conclusion

In conclusion, the only way to generate a random number of records from any table is to combine the TOP clause and the ORDER BY NEWID() in the SELECT statement.