← All Posts

Your Database is Sacred

Credit: Nils Huber – https://unsplash.com/@nilshuber

The logic looks robust… the code is clean and well-factored… the feature itself works and performs well in QA – we're green for merge!

a typical code review

Are we really ready to merge?

The database is typically the most critical piece of architecture in applications, yet when developing web applications, it's easy to gloss over the importance of reviewing any schema changes during code reviews.

Not everyone needs to be a DBA, with just a little effort, we can achieve a lot. Here are some questions that can provide a steer for code reviews.

Do our changes make sense contextually?

We don't review code changes in isolation without checking how it's integrated into the rest of the application, so we should be doing the same for our schema changes. This means breaking out of the GitHub diff view, and instead viewing the whole schema file.

Without doing this, we can miss opportunities to make simple tweaks that will improve the clarity of our database:

  • we could have duplication – two columns doing very similar things
  • we could have confusing nomenclature – a column may make sense in isolation, but when a second is introduced, it may become confusing, one or both may need renaming
  • it may make sense to break some fields out into their own table (e.g. polymorphism) or even merge denormalise tables back into their respective associations.

Have we used the right data types?

Most RDBMSs support a slew of data types, whatever your poison, you should understand what they are and how they should be configured.

Obviously our goal is to use the most appropriate data types, with their relevant settings (e.g. decimal fields should have precision and scale).

Beyond the basics – RDBMSs like PostgreSQL have some more advanced types, such as arrays or JSONB. If you want to keep your database clean and performant like your code, these are immensely useful, however you shouldn't use them without understanding any impact on performance or querying limitations.

Do we have the correct constraints on the data?

Many people shy away from too much constraint around their database as Rails' validations are much easier to modify as your code evolves, they are easier to test in isolation and if you have validation in both places – you have duplication.

If you're working on a microservices architecture where multiple apps share the same database, it's imperitive you implement a robust set of database constraints as you cannot guarantee that these applications will all share the same validations.

Regardless, I'd recommend you shun that feeling and sacrifice a little DRYness for the ability to sleep at night knowing nobody can accidentally skip validations in their code or via the Rails console and wreck havoc on your data.

Required fields

At the most basic level, required fields should be non-null columns – infact, we should start with fields that way and require justification in making any nullable.

To avoid exceptions with missing data, I've often experienced developers setting default values on non-null columns, but that lets sloppy data land in the database – it's much better to leave the errors and use your application logic to provide user-friendly validation.

def up
  change_column_null :posts, :url_slug, false


To avoid race conditions, for any uniqueness validation you must have a uniqueness constraint (typically as an index) in the database. This will guarantee that no duplicates will ever make it into the table.

def up
  add_index :posts, :url_slug, unique: true

Other validation

If you want to protect your database to the extent you validate data in your app, you can add check constraints. These are fairly straight-forward, and support basically anything you can write to return a boolean in SQL, for example:

ADD CONSTRAINT post_url_slug_must_be_a_slug
CHECK (url_slug ~* '^[a-z0-9]+[\-a-z0-9]*[a-z0-9]+$');

The only limitation is that you'll need to use triggers should you want to reference data outside the table (e.g. performing joins).

For Rails developers, the mv-postgresql and mv-mysql gems provide extensions to natively support a limited set of constraints within migrations. If you need other constraint types, you'll need to switch to the SQL schema format and use ActiveRecord::Base.connection.execute to manually add the constraints.

def up
  validates :posts, :url_slug, format: {
    with: /^[a-z0-9]+[\-a-z0-9]*[a-z0-9]+$/

Referential Integrity

Where you have associations between tables, we can validate the associations at the database-level and not only prevent bad data entering the database but also prevent orphened rows when deleting associated data. To do this you simply need to add add foreign key constraints, again, a feature that should be applied by default.

With Rails 4.2+, you can define these easily in migrations:

add_foreign_key :posts, :users, on_delete: :cascade

Note: the on_delete: :cascade option provides a database-level dependent: :delete_all equivalent.

Unfortunately, for Postgres at least, you cannot add foreign key constraints to polymorphic associations, nor do they work with array columns (despite 2ndQuadrant's efforts).

Can we foresee potential mistakes and prevent them?

If you're dealing with critical data, you'll be wanting to do anything to protect it – not just from malicious attack (I hope you have regular tested backups!), but from human (or developer) error.

Postgres has the concept of rules to aid this:

CREATE RULE prevent_order_deletion

You can also add conditions:

CREATE RULE prevent_cart_deletion
WHERE updated_at > (NOW() - interval '1 month')

In other RDBMSs you can typically use triggers, this would also be beneficial in Postgres as it'll provide the ability to raise exceptions, not just silently swallow the deletion.

CREATE FUNCTION delete_cart () RETURNS trigger AS $$
  IF OLD.updated_at > (NOW() - interval '1 month') THEN
    RAISE EXCEPTION 'Cannot delete carts younger than 1 month.';

CREATE TRIGGER before_delete_cart

Do we have the right indexes?

Credit: Dmitrij Paskevic – https://unsplash.com/zeak

Indexes are absolutely critical for performance. In most cases, provided you have plenty of RAM, it's better to over-index than miss some, as a missing index can mean the difference between even small queries taking milliseconds and minutes.

Assuming you have your index, you'll need to ensure it's the right type. Postgres supports B-trees (the default), common for most RDBMSs, and also GIN and GiST indexes (typically used for array type columns and full-text/geometric data, respectively) – like data types, the key thing here is understanding the different types your RDBMS offers.

For further information on Postgres Indexes, Thoughtbot has a great article on the types available and Heroku has one covering usage.

Adopting schema reviews

For most teams, it's just a case of ensuring your schema is on the list to be reviewed and that the time is available. Most of what's covered here has been pretty basic for many developers, as with a lot of things, it's just usually missing from the process.

Bonus: Cheatsheet

If you use GitHub, you can utilise Pull Request templates to support adoption of reviewing schema changes, here's some Markdown to add to your's:


### Database Review (delete section if there are no changes)

- [ ] Are the new/altered column names clear and understandable?
- [ ] Have we used the most appropriate data types?
- [ ] Are the required fields specified as non-null?
- [ ] Do the unique fields have unique indexes?
- [ ] Do queried fields have appropriate indexes?
- [ ] Do all association fields have foreign key constraints?
- [ ] Are validation constraints in-place for appropriate fields?

Bonus 2: Improved Rails Migrations

If you're developing a Rails application, you can help protect your deployments from broken migrations and database locking by installing the good_migrations and strong_migrations gems, these will enforce good portions.

Photo of Ryan

Hi, I'm Ryan.

I have been working with the web for over 15 years, and I'm Co-Founder & CTO of Shift Commerce, a SaaS ecommerce platform for ambitious retailers.

Running a conference? I'd love to speak — see my speaking page for more.