
Efficient cloud database migration. Prevent a data bomb

“The cloud or out.” A witty slogan once coined by a well-known consulting firm. Admittedly, there is something to it. The cloud offers unlimited scalability and availability, without having to make huge investments. Still, a word of caution is in order: understanding the cost drivers of cloud solutions is essential. Scalability is great, but a spontaneously scaling bill won’t make anyone happy. In this blog, Tino Dudink, DBA Consultant and Senior Database Reliability Engineer, explores the importance of careful planning and data modeling when migrating to a cloud database. Skip that step, and you may well end up dealing with uncontrolled data growth.

Tino Dudink

DBA Consultant and Senior Database Reliability Engineer

Behind the cloud shines the… cost

The cloud offers flexibility and scalability, no doubt about that. But is it always the most cost-effective solution? That remains to be seen. The bill always comes afterwards, and expenses can continue to rise unnoticed.

Whether it’s Infrastructure as a Service (IaaS), Platform as a Service (PaaS) or Database as a Service (DBaaS), all cloud providers offer different levels of service and flexibility, with compute power (CPU), storage (disk space), working memory (RAM), and network (I/O) driving the final cost. Let’s take a closer look at the impact of (cloud) storage use on cost (cloud spend, or OpEx).

User data in the cloud

A database stores a lot of user data that usually deals with dimensions such as customer, time, product, product unit, price and contract. How much space that data takes up depends on your data model. Before we get into cost, I want to illustrate how much influence the data model has. I’ll do that using two examples.

Using date and time fields

Suppose that in a time field you record not only the date and time (hours, minutes, seconds), but also fractions of a second down to the microsecond. Super accurate, of course, but is it always necessary? Maybe if your name is Jac Orie and you are clocking speed skaters, but if you are keeping track of contract dates, it seems a bit excessive to me.

By simplifying this field to just a date, hours and minutes, you can save significantly on storage space. In a database with millions of records, this difference in data size can lead to a reduction in total storage requirements by as much as forty percent!
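As a back-of-the-envelope sketch, here is what that saving looks like for a single time column. The byte sizes are an assumption borrowed from SQL Server, where datetime2(7) (date plus time to sub-microsecond precision) takes 8 bytes and smalldatetime (accurate to the minute) takes 4 bytes; other engines use different sizes, but the principle is the same.

```python
rows = 10_000_000  # a database with millions of records

# Assumed per-row column sizes (SQL Server examples):
bytes_full = rows * 8    # datetime2(7): date + time, sub-microsecond
bytes_minute = rows * 4  # smalldatetime: date + hours/minutes

print(f"full precision:   {bytes_full / 1024**2:.0f} MiB")
print(f"minute precision: {bytes_minute / 1024**2:.0f} MiB")
print(f"saved on this column: {(bytes_full - bytes_minute) / bytes_full:.0%}")
```

On this one column the saving is 50 percent; across a whole table, how far that pushes total storage down depends on what the other columns look like.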

Optimizing text fields

‘Varchar’ is short for ‘variable character’: a data type used to store text of variable length. For example, ‘varchar(50)’ means that the field can contain up to 50 characters. It may seem efficient to use this data type for shorter text fields to keep them flexible.

But beware: it can also lead to unnecessary storage use. Take, for example, a field for zip codes. In the Netherlands, a zip code always consists of six characters (four digits and two letters). By defining this field as ‘varchar(50)’, you reserve far more space than necessary. That can lead to significantly larger storage requirements, both for the data itself and for the indexes in which these fields are included.
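To make that concrete, here is a hypothetical sizing sketch. The half-the-declared-length budget is an assumption modeled on SQL Server’s memory-grant heuristic for variable-length columns; the row count is made up for illustration.

```python
rows = 1_000_000  # hypothetical number of address records

# Assumption: the engine budgets work memory for a variable-length
# column at half its declared maximum, so varchar(50) is budgeted
# at 25 bytes per row even though a Dutch zip code needs only 6.
budget_varchar50 = rows * 25  # budgeted bytes for varchar(50)
actual_char6 = rows * 6       # bytes actually needed for char(6)

print(f"varchar(50) budget: {budget_varchar50 / 1024**2:.1f} MiB")
print(f"char(6) actual:     {actual_char6 / 1024**2:.1f} MiB")
```

The gap only widens once that field also appears in one or more indexes.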

In practice

Let’s run the numbers. Imagine you have a business with a thousand customers – a modest number compared to the big players in the industry; my guess is that the local bookstore already has a thousand customers. Now look at the amount of data these thousand customers generate.

In just one year, these customers can generate a storage requirement of more than twenty gigabytes. Sounds pretty manageable, right? But what if the company is successful and expands its customer base to 100,000 or even 1,000,000 customers? Take a Dutch e-commerce company that focuses on home, cooking and lifestyle products. Not an extremely large player, but one with substantial growth ambitions. If you haven’t looked closely at your data model, the storage requirements will explode.
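The scaling above can be sketched in a few lines, assuming storage grows linearly with the number of customers (the twenty gigabytes per thousand customers per year from the example):

```python
GB_PER_1000_CUSTOMERS = 20  # yearly storage growth from the example

for customers in (1_000, 100_000, 1_000_000):
    gb_per_year = customers / 1_000 * GB_PER_1000_CUSTOMERS
    print(f"{customers:>9,} customers -> {gb_per_year:>8,.0f} GB/year")
```

At a million customers that is already twenty terabytes of new data per year, before you have optimized a single field.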

| Curious about the exact calculation example? Send an email and you will receive it in your mailbox!

Now you may be thinking: nice math example, Tino, but that’s all theory. It won’t be such a big deal, right? Make no mistake, I recently experienced this myself with a customer. If that customer had simply migrated all its data to the cloud without much thought, its database would have grown from a modest 2 terabytes to a gigantic 180 terabytes in three years. Yes, you read that correctly: a growth of 8,900 percent!
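For the skeptics, the percentage is easy to verify:

```python
start_tb, end_tb = 2, 180  # the customer's actual and projected sizes

# Growth as a percentage of the starting size.
growth_pct = (end_tb - start_tb) / start_tb * 100
print(f"from {start_tb} TB to {end_tb} TB is {growth_pct:,.0f}% growth")
```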

Cloud versus on-premise: a strategic trade-off

As you can see, even a small change in the data model – perhaps negligible at first glance – can have a huge impact on storage requirements and thus costs. This is true for both on-premise systems and cloud storage, but with one key difference: in the cloud, there is no natural “brake” on data growth. Whereas with an on-premise system the physical limitation of disk space sets a limit and makes you consciously think about whether or not to invest extra, the cloud offers seemingly unlimited capacity.

Add to that the cost of computing power, memory and the number of transactions. It’s like shopping with a credit card without a limit: before you know it, you’re unexpectedly faced with a big fat bill. If you’re not very alert to that, it can be a business risk.

Prevent a data bomb in the cloud

Clearly, careful planning and data modeling are indispensable to prevent such uncontrolled data growth. This is true on-premise, but even more so when migrating to the cloud. A hybrid solution, combining the benefits of cloud and on-premise storage, can be an effective strategy. It gives you the flexibility of the cloud for scalability where you need it and as a fallback for emergencies, while you retain the on-premise benefits: control over your data and governance.

In short: economy with diligence

In 1987, you were a hero if you got hold of a computer with a 10-megabyte hard drive through a PC-privé project. If you had to build a database in those days, you had no more space than those 10 megabytes. That made you naturally frugal with data fields and data types.

Because of this natural brake, you built the most efficient databases. For the “youth of today”, that storage (disk space) seems unlimited. As a result, you quickly fall into a trap: oversized data fields, inefficient data types and overly wide indexes. Not because you have to, but because you can. Then the law of large numbers kicks in. As you grow – in number of transactions or number of customers – the risk is that your database will eventually balloon to, say, 500 terabytes or more in a relatively short period of time.

Want to know more?

Are you considering migrating to the cloud and wondering what the effect will be on your business processes? OptimaData can provide support in making the right choices, going through a migration readiness program, or mapping cloud and management costs via a benchmark. That way you instantly have a Trusted Advisor behind you for “just in case” and “who you gonna call?”. Feel free to contact us.