Database schemas
Databases are present in almost all kinds of application that need to store information in some form or the other. Web applications like blogs, cms, social networking sites or business applications etc all have a database along with the code. The database design or schema determines how the tables and their relationships are constructed. Database design is a crucial component in the overall efficiency of a database application.
In this series of posts, I am going to talk about some tips and techniques that can assist in improving the design of your databases in order to make them faster and more efficient and easier to manage. As a disclaimer, these tips are only meant for freelancers/newbies/kids who are making small or medium sized applications every now and then. We wont be talking about optimising a database for a bank or facebook may be, since I do not hold that much expertise. Most of these tips are my personal observations from the past projects that I did over the last decade.
We shall take up each technique as a discrete point and discuss it at length. I would also recommend you to read up topics like database normalisation and indexes which are commonly talked about in database optimisation.
Reduce/eliminate lookup tables
Lookup tables are probably the first things that popup when normalisation is attempted on a database. Yes, that is somewhat breaking down a table into multiple smaller tables for various benefits. But sometimes this goes too far without being noticed. Normalisation would often lead to creating lookup tables, but lookup tables don't always result into normalisation.
For example if you have table called sales, it will have a field called customer_id which is the id of the customer who took the sale. The customer_id in most cases is the primary key from the customers table. Now this is a very basic principle and step of normalisation. The customer data is cleanly stored in a separate table and then the id is used in other tables to record the purchases, payments and reward redemptions etc made by the customer.
For these cases it is fully justified to have lookup tables. But there are cases when lookup tables become an extra burden. Lets take another example.
The table payments records the payments made by the customers
| id | customer_id | date | amount | payment_mode |
The payment_mode field indicates how the payment was made, lets say the values are Cash, Cheque, Card, Paypal, Eft etc. Now going by the rules of normalisation it is tempting to create another lookup table called payment_modes which would have an id field and a name field containing each of the above mentioned payment modes. Then the payment_mode id can be used in the payment_mode table. I remember doing this long ago in one of my first projects.
payment_modes | id | name |----|-------- | 1 | CASH | 2 | CHEQUE | 3 | CARD | 4 | PAYPAL | 5 | EFT | 6 | CREDIT | 7 | STOLEN .......
Actually I thought that my client might need to add additional payment methods later. Now over here its totally un-necessary to create a lookup table like that. The payment mode can be directly written into the payment_mode column in the payments table.
| id | customer_id | date | amount | payment_mode |----|-------------|------|--------|------------- | 1 | 411 | .... | 4000 | CHEQUE
If you are not too comfortable writing the entire thing "CHEQUE" in the payment_mode column, believing that it would waste a lot of space then feel free to make it shorter. Create codes like CSH, CHQ, PYP, EFT. The rest of the logic has to be handled in the application code, so that it displays CSH as Cash and CHQ as Cheque.
Speaking of efficiency, it is almost the same (to the extent you can measure) whether you use a numeric id for the payment mode or a text acronym like CSH, CHQ, PYP, EFT. You can create more codes with 3 characters than you would ever need. Another important thing to note is that this "data set" of "payment modes" will not change much over time. It is theoretically supposed to stay constant. Lookup tables can give users the option to add more entries later if required, but there are clear cases when they wont ever need, or the need would be too sparse.
You might wonder about the efficiency of search (using LIKE or = ) on the column of payment_mode. Just index the column and then the search speed would be same as it would be in case of a numeric value.
So the key things to observe in lookup tables are
1. Is the lookup table just a single column table serving as a mere list. 2. Will the data in it stay nearly constant over the application lifetime. 3. Is it a finite/limited sized set.
If the above points hold for any lookup table, then there are high chances that you can eliminate them from the database and port some logic into the application layer.
There are some other very obvious examples of lookup tables like titles, country codes, language codes etc that do not need any lookup table and have universal codes. So you can just store US in the country column rather than some ID that leads to another table to find the country name.
The Benefits
1. Removing extra tables makes your database lighter.
2. Reduces the number of joins in your queries.
3. Lesser overhead when exporting/importing the tables to other file formats like excel.
That was a short talk about lookup tables. Hope that helps you in some of your projects if you are not already an expert. Will talk about more techniques soon. If you have some feedback then dont forget to comment.
thanks for pointing it out, fixed it.