Wednesday, August 30, 2006

My Database Design Rules

I've been designing and using simple, and complex databases for at least ten years now. These are my rules of thumb database design rules:

Database design:

  1. It is easier to think in terms of Entity Relationships. Entities then become tables. Entity Attributes then become columns. ER can be mapped to classes very easily.
  2. Uses consistent names for fields. If you can, stick with full words or abbreviate very long, but well known columns and tables.
  3. Table names are plurals. Classes, Messages...
  4. Use underscores to separate words.
  5. When naming columns or tales, start with the more general entity, then narrow down. This makes searches and lists much easier. Case Attachments, Case_Notes, User_Roles, User_Groups and so on.
  6. There is no need to prefix tables with tbl or the database name. Have one database or catalog for each application. This is just my personal preference. Since most GUI tools group tables and views, why replicate this? What good would it make to prefix eveything with tbl or v?
  7. Normalize, normalize and normalize even more. However you may need to copy totals, and some user entered or varying data in some tables and denormalize or duplicate. An example is for some critical applications, e.g. financial apps, the lookup data may change, and it may be needed later. E.g. lookup of a location of a terminal may be copied to a transaction record, even though the terminal ID, which contains the location is present in the transaction record. In this case copy the lookup data to the table.
  8. This is related to above point. To make your queries simpler, denormalize some tables by creating some detail views that lookup the master data from the master tables. You can then simplify alot of your queries and reports by not linking the master tables every time you need master data in detail report.
  9. For enterprise applications, it is best to have two columns in every table as: USER_ID and SYSTEM_DATE. Name these two columns anything you want, but use the same name in all tables. I used CREATED_BY and CREATE_DATE sometime. These two columns must be set by triggers and must get the values from the system. If the tables are critical, have the triggers update audit tables (TABLE_NAME_AUDIT) with before and after data for the main table. The point is, these USER_ID and SYSTEM_DATE columns, and the audit tables must be maintained at the database level. This way the application never needs to care about updating them, and the database will contain correct data about each record, no matter what the source of the change is. Come on, even MySQL has triggers now, so there is no excuse.
  10. I had a very strong argument with my Masters Degree DB teacher about this one, and I finally convinced him: If you have many-to-many relationship, you did something wrong. The relationship should be broken down to another entity (table) and then you will see that this link table probably has some more attributes to it. SO... it is not a many-to-many relationship after all. This many-to-many relationship is a new entity.

About Primary Keys:

  1. You must have one column called ID for each table. This must be the primary key for the table, and if you can, have it auto-increment. Even if it does not seem needed at first, it will be later. What if you need to delete or update that single record from the table? you have it's ID.
  2. Use this ID, and only this ID, as link in foreign tables, and prefix the foreign key column with the foreign table name, and an underscore. Always use database constraints for foreign keys. For example, you have one to many relationship between cases and attachments. In Cases, there is ID column. In Attachments, you have ID (Prikey) and Case_ID. The name of the foreign key column must be the singular form of the foreign table.
  3. Never EVER EVER use user entered data as a primary key. If user entered data must be unique, then use a unique key constraint. This way all your links will not be affected if a user enters the wrong "key" and everything must be in sync again. It may seem overkill, but trust me on this.


BuZain said...

Great tips mate. I would have this as a reference whenever I build databases. It would be nice though to give a detailed example to showcase your ideas. It would make it easier for the readers to follow.

Ayman said...

Thanks mate. I'll elaborate on some points in another post.

Shark Hunter said...

Through my readings and experience with Dimensional Modelling, performance seem to rule out many of the brilliant ER concepts such as normalisation. Afterall what you're trying to do is to increase performance without having to construct large or complex queries by linking a number of tables. Other than that may I add that be careful with dates and time fields. Date and time seem to always confuse me so I always use convert function to ensure my comparisions are correct.

Ayman said...

I think at some level, especially starting out from scratch, ER may be the best. You still need to denormalize for some kinds of queries. In my post I mentioned creating denormalized views. For some kinds of queries and analysis, linking tables is not always the best approach. You may need to create summary tables for performance reasons. But generally, for starting out a design, IMHO, ER and normalized data is the best way to start laying out complex databases. Normalized data is the best link to programming in Objects.
I do agree that normalizations may not be the best in some cases, but hey, rules can be broken. :-)

Anonymous said...

I do not support normalizing tables beyond second normal form in any application. It just adds complexity in designing queries and create major performance bottle necks.

Just Google it!