Make sure your data conversion SQL is easily identifiable/reversible

Praj Basnet
2 min readNov 22, 2020

--

If you work with databases, sooner or later you’ll need to write SQL to “fix” data in a database table (or tables). One key piece of advice is to make sure that the SQL script you come up with is reversible, just in case you need to change things back or make additional changes on the data set that was changed.

Here’s an example of what I mean:

If this was your data conversion script (to flag users as deleted):

update users
set deleted = 1 where id = (
select id from users
where to_be_deleted = 1
and deleted = 0);

Assuming that all deleted users have the flag to_be_deleted set to 1, this script is not reversible. For example, once executed, how would you distinguish the data set you changed (deleted = 1) from all the other users that already have a flag of deleted = 1?

A few options to consider here:

  • If there is a date/time stamp field use that to identify your change date/time (this is probably the most common way).
  • If there is an “updated by” field use that to identify your change by the user who made it.
  • If there is any other column you can use to tag the changes without adversely impacting the data use that (e.g. a comments or notes column).
  • Generate individual update statements for each user id you are updating (use SQL to do this). This is more time consuming to reverse of course but it is still possible to do so.

The key is to think about the scenario, “after running this script, can I identify the data set that was changed easily?”. If not see if you can come up with a reasonable way to do this, it will save you later!

--

--