Reducing storage requirements
In the last post in this series I talked about how we sped up the move of data from operational to historical tables from around 16 hours down to just seconds. You find that post here.
The last area of concern was the amount of storage this took and would take in the future. As it was currently taking 1.5 TB it would be a fairly large chunk of the available storage and that raised concerns for capacity planning and for availability of space on the EXADATA for other systems we had plans to move there.
We set out to see what we could do to both estimate max disk utilisation this disk space would reach as well as what we could do to minimize the needed disk space. There were two considerations minimize disk utilisation at the same time as query time should not be worsened. Both these were of course to be achieved without adding a large load to the system, especially not during business hours.
The first attempt was to just compress one of the tables with the traditional table compression. After running the test across the set of tables we worked with, we noticed a compression ratio of 57%. Not bad, not bad at all. However, this was now to be using an EXADATA. One of the technologies that are EXADATA only (to be more technically correct, only available with Oracle branded storage) is HCC. HCC stands for Hybrid Columnar Compression. I will not explain how it is different from normal compression in this post, but as the name indicates the compression is based around columns rather than on rows as traditional compression is. This can achieve even better results, at least that is the theory and the marketing for EXADATA says that this is part of the magic sause of EXADATA. Time to take it out for a spin.
After having set it up for our tables having the same exact content as we had with the normal compression, we had a compression rate of 90%. That is 90% of the needed storage was reduced by using HCC. I tested the different options available for the compression (query high and low as well as archive high and low), and ended up choosing query high. My reasoning there was that the compression rate of query high over query low was improved enough and the processing power needed was well worth it. I got identical results on query high and archive low. It took the same time, resulted in the same size dataset and querying took the same time. I could not tell that they were different in any way. Archive high however is a different beast. It took about four times the processing power to compress and querying too longer and used more resources too. As this is a dataset I expect the users to want to run more and more queries against when they see that it can be done in a matter of seconds, my choice was easy, query high was easily the best for us.
How do we implement it then? Setting a table to compress query high and then run normal inserts against it is not achieving a lot. There is some savings with it, but it is just marginal compared to what can be achieved. For HCC to kick in, we need direct path writes to occur. As this data is written once and never updated, we can get everything compressed once the processing day is over. Thus, we set up a job to run thirty minutes past midnight which compressed the previous days partition. This is just one line in the job that does the move of the partitions described in the last post in this series.
The compression of one very active day takes less than two minutes. In fact, the whole job to move and compress has run in less than 15 seconds for each days compression since we took this solution live a while back. That is a time well worth the 90% saving in disk consumption we achieve.
It is worth to note that while HCC is an EXADATA feature not available in most Oracle databases, traditional compression is available. Some forms of it requires licensing, but it is available so while you may not get the same ratio as described in this post you can get a big reduction in disk space consumption using the compression method available to you.
With this part the last piece of the puzzle fell in place and there were no concerns left with the plan for fixing the issues the organisation had with managing this log data. The next post in this serie will summarise and wrap up what was achieved with the changes described in this serie.