Renaming a Database Table In-Flight
by Arvind Paul
Most startups run multiple experiments in the early stages of achieving Product-Market fit and Outschool is no exception. A natural engineering consequence is that certain areas of the physical data model become obsolete or in some cases, repurposed. As part of managing technical debt efficiently, we need to consider how to update the database schema on occasion to keep our codebase as self-documenting as possible without disrupting production and external analytics consumers.
Recently, we had an opportunity for one such refactor. While Outschool is primarily a “direct to consumer” marketplace, we also have direct relationships with organizations such as schools, school districts, affiliate partners, companies, and community partners. Some of these partnerships were forged fairly early in our evolution, whereas other use cases were more opportunistic. As a result, we had client data dispersed across three different tables. Schools that enrolled their learners in Outschool classes were stored in the
private_schools table (reflecting our early work with Charter Schools). Schools and districts that supplied teachers and classes were stored in the
educational_orgs table. We also had organization data embedded in the
company_wallets table for employers that offer Outschool funds to their employees as a benefit option.
The above fragmentation posed challenges as we scaled, primarily around reporting. For instance, a school or district could both supply classes and enroll learners, but their data would be spread across multiple tables. Also, in integrating with our CRM, we had multiple sources of truth. As part of an effort to unify the UX for administrators at our corporate partners, we decided to consolidate the different sources of truth into a single, unified data model. The challenge then became how we would accomplish this while maintaining backwards compatibility and without any disruption of service, given that these tables were heavily referenced by foreign keys in transactional tables. Here, I will talk about our strategy for one specific aspect of the migration.
private_schools table had the most data and also had most of the columns we would need in the new, unified table, it made sense to use this table as our starting point. However, we needed a more generic name for this table to reflect its new purpose. So, it was decided to rename the table as
corporate_customers after some debate. (Not easy finding a generic name - naming is hard!) We also decided to add a
type column that would serve to disambiguate existing data from new data that would later be added to the same table.
Outlined below are the steps we undertook to rename the table without breaking existing code paths and without any disruption to our production systems. The steps were executed sequentially in production across multiple days.
Step 1: Create an empty table. We created the
corporate_customers table with a schema identical to the
private_schools table, except that it had a new type column with a default value. We also added nullable foreign keys to all the tables which were already referencing the
Step 2: Migrate data to the new table; replace the existing table with a view. This was accomplished in a single migration that did the following:
- Existing foreign key constraints to the
private_schoolstable were removed without dropping the actual column in the referencing table.
- Triggers were suspended on the
corporate_customerstable in anticipation of the data migration.
private_schoolstable was renamed to
private_schools_copy_in_progress. This was important, to prevent writes to the old table while data was being copied over.
- All data, including the primary key, was copied over from the
private_schools_copy_in_progresstable to the
corporate_customerstable. At this point, the two tables looked identical.
- Triggers were re-enabled on the
- A view called
private_schoolswas created, selecting from the
corporate_customerstable and filtering on the new type column on that table. (From PostgreSQL 9.3 onward, simple views are automatically updatable, meaning that inserts, updates and deletes against the underlying table are supported through the view.) This allows for backward compatibility until all code references to the old table have been updated.
- Foreign key constraints were reestablished on the
private_school_idcolumns, now referencing the
private_schools_copy_in_progresstable was dropped.
At this point, all our code still referenced the
private_schools table. Since this was now an updatable view, all CRUD statements still worked flawlessly. Also, the code continued to update and reference the old foreign keys (
private_school_id); the new foreign keys (
Step 3: Update write queries. Next, we visited all code references where the old primary key to the
private_schools table (
private_school_id) was being written to. We expanded the code to also write to the new foreign key (
Step 4: Copy data to new foreign keys. We copied data from the old to the new foreign key columns. We also enforced not-nullability of these foreign keys where it made sense to do so, knowing that writes would include both the old and the new foreign keys after Step 3.
Step 5: Update read queries. The final piece was to update all code references reading from the old table (
private_schools) or the old foreign keys (
private_school_id) to the new table and foreign keys.
We were able to run the migrations successfully without any disruption. The only additional precaution we took was that we ran the change scripts associated with replacing the existing table with a view during a low-traffic window to minimize the risk of lock contention. We have not yet dropped the
private_schools view or old foreign key columns because of potential dependence of analytics queries on the old schema. When our logs confirm that there is no usage of the old table and columns, they can be safely dropped.
While the above serves as a good generic strategy for renaming a table in-flight, we realized during our post-mortem that we’d missed a trick in this specific instance. The table being renamed featured infrequent writes that we controlled, given that only administrators had access to modify data. Given this, we could have optimized further for read availability by copying the data to the new table without renaming the old. This would have eliminated unavailability of the old table during the data copy window. As engineers, we’re bound to find at least one improvement for the next time!