
Killing N+1 problems using groups

Let's say you have the following data in a .csv file and a users table with id and first_name columns in your database.

Database ID   First name
1             Nick
2             Nick
3             Alex
4             Alex
5             Panos
6             Panos
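
For reference, the raw .csv could look something like this (the exact column headers are an assumption):

id,first_name
1,Nick
2,Nick
3,Alex
4,Alex
5,Panos
6,Panos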

Imagine you want to programmatically update the first_name column of each record with the corresponding data from the CSV.

The dataset is small enough to allow us to run an update query for each row, but if the records were in the thousands this would probably take much longer or even fail.

What we need to do is read the data from the file and then group/transform it into dictionaries, hashes, associative arrays or whatever your preferred programming language offers. Let's go with JSON here.

{
"Nick": [1, 2]
"Alex": [3, 4]
"Panos": [5, 6]
}
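
A minimal Ruby sketch of that grouping step, assuming a users.csv file with id and first_name headers, could look like this:

require "csv"

# Read the file and group the ids by first name.
# The result is a hash like { "Nick" => [1, 2], "Alex" => [3, 4], "Panos" => [5, 6] }
rows = CSV.read("users.csv", headers: true)

groups = rows.group_by { |row| row["first_name"] }
             .transform_values { |group| group.map { |row| row["id"].to_i } }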

Now, instead of updating each record individually, we can bulk update them by group 🎉

UPDATE users SET first_name = 'Nick' WHERE id IN (1, 2);

UPDATE users SET first_name = 'Alex' WHERE id IN (3, 4);

UPDATE users SET first_name = 'Panos' WHERE id IN (5, 6);

If you're using an ORM like Rails' ActiveRecord or Laravel's Eloquent, you would probably first retrieve the records and then update them accordingly.

In Rails you would probably do the following:

data = {
    "Nick" => [1, 2]
    "Alex" => [3, 4]
    "Panos" => [5, 6]
}

data.each do |name, record_ids|
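    # update_all issues a single UPDATE ... WHERE id IN (...) per group and skips model callbacks/validations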
    User.where(id: record_ids).update_all(first_name: name)
end

Having said that, there are ways to squeeze out more performance: reading the file in batches, using Rails' find_in_batches to keep memory usage down, or even skipping the ORM and dropping down to raw SQL.
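
As a rough sketch of the raw SQL route (the file name is an assumption, and the quoting/execute calls are plain ActiveRecord connection methods), streaming the file with CSV.foreach keeps memory usage flat and skips the ORM entirely:

require "csv"

# Stream the file row by row instead of loading it all into memory,
# building the same name => ids grouping as before.
groups = Hash.new { |hash, key| hash[key] = [] }

CSV.foreach("users.csv", headers: true) do |row|
    groups[row["first_name"]] << row["id"].to_i
end

# Run one UPDATE per group directly through the connection.
connection = ActiveRecord::Base.connection

groups.each do |name, ids|
    connection.execute(
        "UPDATE users SET first_name = #{connection.quote(name)} WHERE id IN (#{ids.join(', ')})"
    )
end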
