One-to-one mapping of random table entries (Oracle SQL): generating random records for unit tests

A large amount of the work in my department is associating music repertoire in our system to “claiming rights owners,” usually labels, using some tools I maintain. When I make an update or add a feature, I like to generate dummy repertoire associations for unit testing. Instead of manually finding individual records and targets, I find it useful to do it programmatically. This allows me to scale the test to however many instances I want.

Obviously, the first step is to write a query that finds repertoire in our system that meets the criteria of my test, and another query that finds rights owners that meet the criteria of my test. The question is, how best do I combine the two?

To avoid talking too much about the specifics of my work problem, I’ll generalize this by framing it in a purchases database. Let’s pretend we want to generate random product purchases for tests. We want the random purchases to include customers from various cities and also products connected to several brands. Depending on how our database was populated, just taking the first few entries from the CUSTOMERS table might result in them all being from the same region, and similarly taking the first few entries in the PRODUCTS table could result in them all being from the same brand. Not really a problem, but it would be better to have diverse test cases.

The first step is randomly shuffling our datasets. Here’s how you accomplish this in Oracle:

select *
from customers
order by dbms_random.value

If you are using a different RDBMS, dbms_random.value probably doesn’t exist. Pete Freitag wrote a blog post describing the appropriate shuffling methods for different RDBMSs, which you will probably find useful.
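
If you only need a handful of test rows, you can cap the shuffled set. Here is a minimal sketch, assuming the same customers table; the limit of 10 is an arbitrary number chosen for illustration. The shuffle has to sit in an inner query, because within a single query block rownum is assigned before order by is applied:

```sql
-- Shuffle first, then keep the first 10 rows of the shuffled result.
-- The limit (10) is arbitrary; change it to scale the test up or down.
select *
from (
    select *
    from customers
    order by dbms_random.value)
where rownum <= 10
```

On Oracle 12c and later, adding fetch first 10 rows only after the order by should give the same effect without the extra nesting.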

Now, we need to map our shuffled data. Why not just put the shuffled datasets side by side? To do this, we join on the row numbers of our shuffled datasets:

select cust.id customer_id, prod.id product_id from
    -- shuffle customers, then number the shuffled rows
    (select id, rownum joinval from (
        select id
        from customers
        order by dbms_random.value)) cust,
    -- same for products
    (select id, rownum joinval from (
        select id
        from products
        order by dbms_random.value)) prod
where cust.joinval = prod.joinval

Again, this is Oracle style. Depending on your RDBMS of choice, rownum will probably have a different name or may not be supported at all.
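
The same pairing can also be written with the row_number() analytic function instead of nested rownum subqueries. This is a sketch of an alternative under the same assumptions (a customers table and a products table, each with an id column), not a replacement for the query above:

```sql
-- Number each table's rows in random order,
-- then pair up the rows that received the same number.
select cust.id customer_id, prod.id product_id
from
    (select id, row_number() over (order by dbms_random.value) joinval
     from customers) cust,
    (select id, row_number() over (order by dbms_random.value) joinval
     from products) prod
where cust.joinval = prod.joinval
```

Either way, joinval is unique within each shuffled set, so every customer and every product appears at most once, and the result has as many rows as the smaller of the two tables.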

This solution works for me because my analogs of the customers and products tables are large and we have many, many customers and products, so I can feel safe that I’ll get a good mix. If you’re not so lucky or want to be really damn sure that no two customers are from the same city (or that no two products are from the same brand), you can, but things will start to get messy fast. Intuitively, we want something like this:

select b.brand_id, min(p.id)
from products p, xref_products_brands b
where p.id = b.prod_id
group by b.brand_id

But this will always give us the same product ids! We don’t actually want the min(p.id) for each brand, we want a random product id for each brand, so this deduplication method probably doesn’t cut it for us unless we have lots of brands in our system (which we very well may). Since shuffling the data suits my needs I don’t really feel the need to delve into this problem further, but be cognizant of your own needs and data if you decide to use these methods.
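
For anyone who does need it, one way to get a random product per brand instead of the minimum id is to shuffle within each brand and keep the first row. This is a rough, untuned sketch, assuming the same products and xref_products_brands tables as in the query above:

```sql
-- Rank each brand's products in random order,
-- then keep only the top-ranked product per brand.
select brand_id, prod_id
from (
    select b.brand_id,
           p.id prod_id,
           row_number() over (
               partition by b.brand_id
               order by dbms_random.value) rn
    from products p, xref_products_brands b
    where p.id = b.prod_id)
where rn = 1
```

Note that if a product can belong to more than one brand, the same product id could still be picked for two different brands.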
