Skip to content
This repository was archived by the owner on May 4, 2019. It is now read-only.

Commit bddcfdc

Browse files
author
Sai Wong
committed
Post: How to add columns with default values to really large tables
1 parent 6a15d9f commit bddcfdc

File tree

1 file changed

+150
-0
lines changed

1 file changed

+150
-0
lines changed
Lines changed: 150 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,150 @@
1+
---
2+
layout: post
3+
title: Rails Migration - How to add columns with default values to really large tables in Postgres + Rails
4+
author: Sai Wong
5+
summary:
6+
image: http://res.cloudinary.com/wework/image/upload/s--xpIlilub--/c_scale,q_jpegmini:1,w_1000/v1443207604/engineering/shutterstock_294201896.jpg
7+
categories: data
8+
---
9+
10+
We had a fairly simple task of adding a couple of columns to a table for our
11+
Rails app. This is normally a straight forward operation and a boring task at
12+
best but for us, the fun only just started. The table in question was a fairly
13+
large table with lots of reads on it and in the spirit of no down time, this
14+
is the adventure we had.
15+
16+
# The Task
17+
- Add two columns to the notifications table
18+
- Both columns have default values
19+
- Table has 2.2 MM rows!
20+
21+
# Attempt #1
22+
```ruby
23+
class AddPhoneFlagsToNotifications < ActiveRecord::Migration
24+
def change
25+
add_column :notifications, :text_message, :boolean, default: false
26+
add_column :notifications, :call_phone, :boolean, default: false
27+
end
28+
end
29+
```
30+
31+
## Problem
32+
- Migration takes hours!
33+
- The notifications table is locked
34+
- Entire application grinds to a halt
35+
36+
## Reason
37+
- Column creation with default values causes all rows to be touched at the same time
38+
- Updates are a slow operation in Postgres since it has to guarantee consistency
39+
- That guarantee results in whole table locking
40+
41+
## Solution
42+
- Postgres can create null columns extremely fast! Even on a huge table!
43+
- We can split the work to two tasks, creating the columns and populating the default value
44+
45+
# Attempt #2
46+
47+
```ruby
48+
class AddPhoneFlagsToNotifications < ActiveRecord::Migration
49+
def change
50+
add_column :notifications, :text_message, :boolean
51+
add_column :notifications, :call_phone, :boolean
52+
53+
execute <<-SQL
54+
ALTER TABLE notifications
55+
ALTER COLUMN text_message SET DEFAULT false,
56+
ALTER COLUMN call_phone SET DEFAULT false
57+
SQL
58+
59+
last_id = Notification.last.id
60+
batch_size = 10000
61+
(0..last_id).step(batch_size).each do |from_id|
62+
to_id = from_id + batch_size
63+
execute <<-SQL
64+
UPDATE notifications
65+
SET
66+
text_message = false,
67+
call_phone = false
68+
WHERE id BETWEEN #{from_id} AND #{to_id}
69+
SQL
70+
end
71+
end
72+
end
73+
```
74+
75+
## Problem
76+
- Migration takes hours!
77+
- The notifications table is still locked!
78+
- Entire application grinds to a halt
79+
80+
## Reason
81+
- Rails migration tasks are always wrapped in a transaction to allow for rollbacks
82+
- The column adds AND the row updates are in one gigantic transaction!
83+
- Transactions guarantee consistency
84+
- That guarantee results in whole table locking again!
85+
86+
## Solution
87+
- You can disable the transaction handle in Rails migration by calling “disable_ddl_transaction!” in your migration task
88+
- But you have to handle transactions on your own
89+
- We can then run each step in its own transaction
90+
- Add our own error handling to rollback operation
91+
92+
# Attempt #3
93+
94+
```ruby
95+
class AddPhoneFlagsToNotifications < ActiveRecord::Migration
96+
disable_ddl_transaction!
97+
98+
def up
99+
ActiveRecord::Base.transaction do
100+
add_column :notifications, :text_message, :boolean, default: nil
101+
add_column :notifications, :call_phone, :boolean, default: nil
102+
103+
sql = <<-SQL
104+
ALTER TABLE notifications
105+
ALTER COLUMN text_message SET DEFAULT false,
106+
ALTER COLUMN call_phone SET DEFAULT false
107+
SQL
108+
execute(sql)
109+
end
110+
111+
112+
last_id = Notification.last.id
113+
batch_size = 10000
114+
(0..last_id).step(batch_size).each do |from_id|
115+
to_id = from_id + batch_size
116+
ActiveRecord::Base.transaction do
117+
execute <<-SQL
118+
UPDATE notifications
119+
SET
120+
text_message = false,
121+
call_phone = false
122+
WHERE id BETWEEN #{from_id} AND #{to_id}
123+
SQL
124+
end
125+
end
126+
127+
rescue => e
128+
# roll back our work
129+
down
130+
raise e
131+
end
132+
end
133+
```
134+
135+
## Result
136+
- Migration takes hours!
137+
- There is no table locking
138+
- Application is slower due to all the writes to notifications table
139+
- Nothing grinds to a halt
140+
141+
# Takeaways
142+
- Always be mindful of the number of rows affected in the migration
143+
- Be mindful of the transaction size
144+
- Leverage Postgres features
145+
146+
## Possible alternate solution
147+
- Handle NULL case in code to treat as the desired default value
148+
- Add view in database to do mapping for us
149+
- Add trigger to only update rows that are actively queried
150+
- ???

0 commit comments

Comments
 (0)