Category: Partitioning

June 12th, 2013 by mathias

Wrap-up

This is the last post in this series and I’ll not introduce anything new here, but rather just summarise the changes explained and talk a bit about the value the solution delivers to the organisation.

Let’s first review the situation we faced before implementing the changes.

The cost of writing the log-records to the database was that all the parallell writing from many different sources was such that it introduced severe bottlenecks to the point that the logging feature had to be turned off days at a time. This was not acceptable but rather than shutting down the whole system which would put lives in immediate danger, this was the only option available. Then even if that would have been fast enough, the moving of data was taking over twice the time available and it was fast approaching the point where data written in 24 hours would take more time to move to the historical store for log-data. That would of course have resulted in an ever growing backlog even if the data move was on 24×7. On top of that the data took up 1.5 TB of disk space, costing a lot of money and raising concerns with out ability to move it to EXADATA.

To resolve the issue during business hours of having contention causing a crippling impact on the overall system, we changed the table setup to not have any primary keys, no foreign keys and no indexes. We made the tables partitioned such that we get one partition per day.

To make the move from operational tables to historical tables faster, we opted to have both in the same instance on EXADATA. This allowed us to use partition exchange to swap out the partition from the operational table and swap it into the historical table. This took just a second as all we did was updating some metadata for which table the partition belongs to. Note that this ultra fast operation replaced a process that used to take around 16 hours, for which we had 6.5 and the time it took was expanding as business was growing.

Finally, to reduce the space consumed on disk we used HCC – Hybrid Columnar Compression. This is an EXADATA only feature for compressing data such that columns with repeating values gets a very good compression ratio. We went from 1.5 TB to just over 100 GB. This means that even with no purging of data it would take us over five years to even get back to the amount of storage this used to require.

So in summary

  • During business hours we use 20% of the computing power and even less of the wall clock time it used to take,
  • The time to move data to the historical store was reduced from around 16 hours to less than one second.
  • Disk space requirement was reduced from 1.5 TB to just over 100 GB.

And all of this was done without changing one line of code, in fact there was no rebuild, no configuration change or anything to allow this drastic improvement to work with all the different systems that was writing to these log-tables.

One more thing to point out here is that all these changes was done without using traditional SQL. The fact that it  is an RDBMS does not mean that we have to use SQL to resolve every problem. In fact, SQL is often not the best tool for the job. It is also worth to note that these kinds of optimisations cannot be done by an ORM, it is not what they do. This is what your performance or database architect needs to do for you.

For easy lookup, here are links to the posts in this series.

  1. Introduction
  2. Writing log records
  3. Moving to history tables
  4. Reducing storage requirements
  5. Wrap-up (this post)

Posted in DBA, EXA, Oracle, Partitioning, Performance Tagged with: , , ,

June 5th, 2013 by mathias

Reducing storage requirements

In the last post in this series I talked about how we sped up the move of data from operational to historical tables from around 16 hours down to just seconds. You find that post here.

The last area of concern was the amount of storage this took and would take in the future. As it was currently taking 1.5 TB it would be a fairly large chunk of the available storage and that raised concerns for capacity planning and for availability of space on the EXADATA for other systems we had plans to move there.

We set out to see what we could do to both estimate max disk utilisation this disk space would reach as well as what we could do to minimize the needed disk space. There were two considerations  minimize disk utilisation at the same time as query time should not be worsened. Both these were of course to be achieved without adding a large load to the system, especially not during business hours.

The first attempt was to just compress one of the tables with the traditional table compression. After running the test across the set of tables we worked with, we noticed a compression ratio of 57%. Not bad, not bad at all. However, this was now to be using an EXADATA. One of the technologies that are EXADATA only (to be more technically correct, only available with Oracle branded storage) is HCC. HCC stands for Hybrid Columnar Compression. I will not explain how it is different from normal compression in this post, but as the name indicates the compression is based around columns rather than on rows as traditional compression is. This can achieve even better results, at least that is the theory and the marketing for EXADATA says that this is part of the magic sause of EXADATA. Time to take it out for a spin.

After having set it up for our tables having the same exact content as we had with the normal compression, we had a compression rate of 90%. That is 90% of the needed storage was reduced by using HCC. I tested the different options available for the compression (query high and low as well as archive high and low), and ended up choosing query high. My reasoning there was that the compression rate of query high over query low was improved enough and the processing power needed was well worth it. I got identical results on query high and archive low. It took the same time, resulted in the same size dataset and querying took the same time. I could not tell that they were different in any way. Archive high however  is a different beast. It took about four times the processing power to compress and querying too longer and used more resources too. As this is a dataset I expect the users to want to run more and more queries against when they see that it can be done in a matter of seconds, my choice was easy, query high was easily the best for us.

How do we implement it then? Setting a table to compress query high and then run normal inserts against it is not achieving a lot. There is some savings with it, but it is just marginal compared to what can be achieved. For HCC to kick in, we need direct path writes to occur. As this data is written once and never updated, we can get everything compressed once the processing day is over. Thus, we set up a job to run thirty minutes past midnight which compressed the previous days partition. This is just one line in the job that does the move of the partitions described in the last post in this series.

The compression of  one very active day takes less than two minutes. In fact, the whole job to move and compress has run in less than 15 seconds for each days compression since we took this solution live a while back. That is a time well worth the 90% saving in disk consumption we achieve.

It is worth to note that while HCC is an EXADATA feature not available in most Oracle databases, traditional compression is available. Some forms of it requires licensing, but it is available so while you may not get the same ratio as described in this post you can get a big reduction in disk space consumption using the compression method available to you.

With this part the last piece of the puzzle fell in place and there were no concerns left with the plan for fixing the issues the organisation had with managing this log data. The next post in this serie will summarise and wrap up what was achieved with the changes described in this serie.

Posted in DBA, Oracle, Partitioning, Performance Tagged with: , , ,

May 29th, 2013 by mathias

Moving to history tables

In the last post I talked about how we made the speed of actually writing all those log-records much faster. It has to date been so fast that no a single report of a problem has been filed. you find that post here.

Once the data was written to the log-tables, it had to be moved to the history tables.This was a process that took around 16 hours. It was never allowed to run for that long as it had to be stopped before the business day started.

This move was done from an instance on EXADATA to a database on an old server. Yes, I can hear you all think “AHA! It must be the slow database link.”. That was the leading thought when I started looking at it. And yes it sure was slow, but no it was not having a big impact. The other area that had been tuned and tweaked over and over and over and … was the insert over the database link to the history tables. Sure enough it was taking a long time. However measuring it showed that it only accounted for 20% of the total time. Reducing that would let us save over three hours. While that would be good, where did the rest of the time go?

It went to a place no one had suspected. Nor tweaked Nor had any measurements been made. What else is new?

It was the part of the process that was EXADATA only. It was the delete of the rows that had been moved. Huh? How could this be? Well it turns out that deleting data based on an in-clause was not as fast as one would think (or at least want). The process was based on selecting a set of primary key values and putting them into a temporary table, this table was then used to identify rows to insert into the history table and to delete rows.

Yes, there are quite a few things in this process that one could attempt to optimise  However, no matter what, the speed would probably not be fast enough. If it ended up being, would it handle the projected growth of the business? And is there really no better way than essentially row by row processing?

Sure there is. Sometimes SQL is not the best or only tool at our disposal.

Everything doesn’t have to be done with SQL. ™

We had already removed the indexes, so the delete should now be faster. It was, just barely fast enough. Thus, just with that change we had squeezed into the seemingly unattainable window for this process. But business is growing and we would within weeks be back to tuning and tweaking.

Back to the idea of not using SQL for everything. But first, let’s revisit the approach that led to success with the write speed. What assumptions are made that we can question? Well… Why are we writing this data over a database link to the slowest database we have in our production environment? It has always been that way and yes we’re worried about the impact of doing this on the EXADATA. Both the impact of letting ad-hoc searches be made as well as the impact of storing all this data on the EXADATA. The storage concern is well founded as the log-data takes up close to 1.5 TB and the volume of logs written are increasing.

However, when we question this we all agree that these are assumed problems and assumed solutions to those problems. Based on that a PoC is produced to show what would happen if we could keep the historic log data in the same database instance on the EXADATA.

With the historic tables in the same database, we get a whole new set of tools to use. I build a PoC showing how data can be moved from the operational tables (the one logs are written to) to the historic ones in under a second for the whole days volume. To achieve this I partition the table on range where the partition key is the time when the log was inserted. Next part is to use a technology in the database called exchange partition.

When exchanging a partition, no data is actually moved. The partition with todays data is via exchange partition moved from table A to table B. However, this move is only updating metadata in the database. That is to say that the only change was to specify which table the partition belongs to. The rows in the partition remains in the same exact physical location on disk as they were from the beginning. They are not changed, not read, and not touched in any way.

This is what makes such a move so fast. Even better, it is transactionally safe. If a query started while it belonged to table A, it will be read even though it was moved to another table in the middle of that query. Queries on table A that starts after the move will of course not see the moved data at all.

Sub-second moving data of millions or billions rows is something that cannot be done with SQL no matter how much one tunes the SQL. So again, SQL is not the only tool at your disposal.

With this we proved that the process can be fast enough. I have not discussed it here, but during this process we also showed that the ad-hoc searches was of no concern either. EXADATA smart scan handles the actual ad-hoc queries very well and most of them are actually sub-second response time even with no indexes. This is for 1 billion+ row tables. Yes smart scan is one part of it and storage indexes is another. I will not discuss those in these posts, but take my word for it, when the time they took were presented the concern was immediately forgotten.

In the next post in this series, I will discuss how we dealt with the concern over the amount of disk space we would use now and in the future if we let the historic data stay on the EXADATA.

Posted in DBA, Partitioning, Performance Tagged with: , ,

May 22nd, 2013 by mathias

Writing log records

The last post in this series introduced the problem briefly. You find that post here.

In this post I’ll talk about the changes made to make that writing of log records fast enough. There were 50 million records that was written. Each of them pretty much in its own transaction. Of course the commit activity caused problem, as did log buffer issues. Some of this could be somewhat remedied with configuration.

The big issue though was that the writes themselves took too much time and too often many session ended up in long contention chains. Yes, it would have been great to have the luxury of redesigning the whole logging situation from the ground up. But, as is often the case, the solution was built such that all systems connecting were implemented in such a way that redesigning was not an option. Fixing the performance of this had to be done without requiring code changes to the systems performing the logging. Oh, joy.

So what caused the problem then? For the inserts it was pretty straight forward. Too many transactions making an insert and a commit. This caused indexes to be hotspots where all processes wanted to write at the same spot. Hash-partitioning had been introduced and that had led to less contention but slower performance. As the partitions existed on different parts on the disks the write head had to be constantly moved and that caused slower service times.

What could we do to make a big improvement while not affecting the code? We’re not talking about just 10-20% of improvement on any area in this case, and even more important was to make the performance stable. That is, the most important thing was to ensure that there were no spikes where an insert suddenly to 20 times longer than usual. The contention chains that was occurring made performance spike such that the whole system became unusable.

The solution here turned out to be something so far from advanced technologies as questioning assumptions. The first time I asked “why do we have these indexes”, most people in the room thought I was just joking around. Eventually they realised that I was serious. After an amusing period of silence where I could see them thinking “Do we need to inform him that indexes are needed to enforce uniqueness and to support referential integrity?”, someone went ahead and did just that. OK, now we were on to a productive discussion, as of course that wasn’t what I meant. The followup discussion about why we needed referential integrity and uniqueness for this set of data was very enlightening for everyone. To make a long story short, it was not needed at all. It was there because it had always been there and nobody had questioned the need before.

How come we didn’t need data to be unique? Well, this is log-data. That is it tells us what actions has been performed by the system. If some activity would be reported twice, it really wouldn’t be the end of the world. The possible problem that some activity isn’t logged cannot be handled with defining unique constraints. That is pure system design and nothing I could improve or worsen by removing some indexes.

Thus, the indexes was removed together with the foreign keys (referential integrity).

Sounds simple enough, but did it help? Did it ever! In one month after making the change, there has not been one report of one transaction that was anywhere close to take too long. This simple solution made the logging so fast that it is no longer a concern.

The next post in this series will discuss the solution for moving data to the history tables. This process took around 16 hours and it had to become at least three times as fast. As you’ll see, moving all these rows can be done much faster than so.

Posted in DBA, Partitioning, Performance Tagged with: , , ,

May 15th, 2013 by mathias

Introduction

In my work among other things I tune and tweak solutions for EXADATA. Today I’ll write about a big improvement we achieved with a process that moves data from the operational tables to the ones where the history is stored.

This will not be a technical post. While I talk about using advanced technologies, I will not discuss code or deep details of them in this post.

And yes, when I say post, I mean a series of posts. This will be too long to be a single post. I’ll break it up into an introduction and then a post on each area of improvement.

Let’s first discuss the before situation. This set of tables are logged to during the day. These log records are needed both to investigate how transactions were executed as well as to satisfy legal requirements. It is in a highly regulated industry and for good reason as mistakes could put someone’s life in danger.

In this situation the solutions were writing around 50 million log records per day to five tables. These tables all had a primary key based on a sequence and there was also referential integrity set up. This means that for the indexes, all processes were writing to the same place on disk. The lookup on the referential integrity was also looking at the same place. An attempt to remedy some of this had been made by hash partitioning the tables. The write activity was intense enough during the day that most of the logging had to be turned off as the solution otherwise was too slow. This of course has legal as well as diagnostic implications.

What’s worse is that once all that data was written, it had to be moved to another database where the history is kept. This process was even slower and the estimate for how long it would take to move one days worth of data was 16 hours. It never did run for that long as it was not allowed to run during the day, it had to start after midnight and finish before 7 am. As a result the volume built up every night until logging was turned off for a while and the move then caught up a little every night.

This series will have the following parts:

  1. Introduction (this post)
  2. Writing log records
  3. Moving to history tables
  4. Reducing storage requirements
  5. Wrap-up and summary

The plan is to publish one part each week. Hopefully I’ll have time to publish some more technical posts between the posts in this series.

Posted in DBA, Partitioning, Performance Tagged with: , , ,

May 31st, 2007 by mathias

I'll give myself a quota for no more than one rant a month. However, one of my pet peeves is how Oracle licenses the database. I'l probably return to why I dislike the general model in a future rant, but this on e will be specific to partitioning.

To use partitioning, you have to pay for the Enterprise Edition of the database at $40,000 per CPU and then pay an additional $10,000 for the partitioning option. That is a lot of money if all I really need is SE One (at $5,000 per CPU) with Partitioning. Unfortunately, that is not an option.

I know partitioning used to be a fairly exotic feature. That was however over 10 years ago, when few datbases were really large and few SQL statements would benefit from parallel execution. As the average database has grown to be really large and most database servers now are multi CPU (and core) servers, this should now be treated as core database functionality.

In many cases it doesn't matter as large companies has site wide use of Oracle DB technology. It does, however, matter for smaller companies and startup companies. If you were to build a new solution today and market it initially to small companies or even run it as your own ASP, you would not be able to design with partitioning. Later on, your solution will be created and the cost of rearchitecting for partitioning is unacceptable.

Even worse is when you're taking an existing application and trying to move it downstream where SE One has to be used to meet the budgets those implementations have. Why spend a lot of time of a lot of good resources to implement "manual" partitioning?

I think Oracle would benefit from making partitioning included in all licenses. One thing it would do would tie customers closer to Oracle as the cost of converting partitioning from one DB to another is high due to the lack of standard even for the DDL used. I think it would also remove another reason for customers to consider other databases. I think licensing should be such that customers can scale with the database and get started on what they need. The situation today is such that building small and needing partitioning always requires a discussion of "can we replace Oracle in our technology stack".

By having partitioning as an included feature, companies could design for it when they begin designing new applications and get the use of it as they grow. That helps Oracle attract small companies that hopefully grow into large customers over time. It also helps in not getting people convinced that Oracle doesn't scale as using partitioning was too expensive when they started and now it's too costly (or politically impossible) to change it.

Designing manual partitioning schemes is really a waste of my and other Oracle professionals time. So… Please Oracle, it is time to revisit how licensing partitioning is done. It is a feature we need to put in the core toolbox and get people to make better use of the database with it. I hope 11G comes with a new licensing strategy for this. I believe Oracle would make a lot of money on making this an included feature in all editions of the database.

That's the partitioning rant. I know others have similar ideas on this as I've read some recent blogs on partitioning. I love Oracle's partitioning, but I hate having to use it only for the largest customers. I really dislike having to think of and implement manual partitioning, I know it will n ot scale as well as database driven partitioning and I still have to waste my time on it for financial reasons. Even worse is having to discuss if we should avoid Oracle altogether as we cannot use partitioning and then maybe a cheaper database will work just as well for us. I don't want to have to avoid the best database over financial details.

If one database is the best for the job, then I want a licensing model that allows me to start small and pay more as my need grows (meaning using SE One to start with and grow to EE while I design for and with partitioning the whole time).

Posted in DBA, Licensing, Oracle, Partitioning