Category: Commit

April 23rd, 2007 by mathias

The dreaded ORA-01555 hits again. Snapshot too old is easy to understand, but not always easy to fix. In this article I'll show how the misbehaving process may actually be the process that receives the error.

Snapshot too old is the error we get when the version of a block we need can no longer be produced. This doesn't mean that the database has misplaced the block, but rather that the read-consistent version of it we need is no longer available. It happens when the block is updated, the updating session commits, and the undo for that change is overwritten before we need it. This often occurs in environments where long and short transactions are mixed and undo is not configured correctly.

Let's look at an example of this:

  1. Process A starts reading table XYZ
  2. Process B updates a block in table XYZ
  3. Process B commits.
  4. Process C makes an update to another table.
    Oracle reuses the undo block that held the undo information from process B.
  5. Process A reaches the now updated block in table XYZ.
    The block has been updated, and Oracle looks in undo for the version of it that existed at the time process A issued its statement (step 1).
  6. As no such block is available in undo anymore, process A receives the ORA-01555.

The solution to that is of course to have process B not commit as often, to run process A at a time when this problem will not be triggered, or to configure undo so it is retained long enough for process A to complete its reading of table XYZ. Of these, the first two may not be an option in all cases, and the last requires enough disk space to be available.
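If you go down the path of sizing undo retention, a reasonable starting point is to compare the longest-running queries against the tuned retention. This is just a sketch; it assumes automatic undo management and access to v$undostat, and the 3600 is a placeholder value, not a recommendation:

-- Longest running query (in seconds) per ten-minute interval,
-- compared to the retention Oracle has tuned itself to.
select begin_time
,maxquerylen
,tuned_undoretention
from v$undostat
order by begin_time;

-- If the longest queries exceed the retention, raise it (disk permitting).
alter system set undo_retention = 3600;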

In this post I'll try to see if the process that causes the issue could be the very process that receives the ORA-01555. We typically look at other processes for someone to blame when this occurs, so it's interesting to see if we should perhaps look at the failing process first. After all, that process will be a long running process, and it often performs quite a bit of work that generates undo.

Let's first set up a small undo tablespace.

conn system
create undo tablespace small_undo
datafile '/path/to/the/new/datafile' size 5M
autoextend off;
alter system set undo_tablespace = small_undo;

We now have a small tablespace that will cause generated undo to be overwritten pretty soon after it is committed. The next step is to create a test user and create our test table.
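Before moving on, it can be worth confirming that the instance actually switched. A quick sanity check (undo tablespace names vary per installation):

-- Confirm the active undo tablespace and list all undo tablespaces.
show parameter undo_tablespace

select tablespace_name
,status
from dba_tablespaces
where contents = 'UNDO';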

conn system
create user smallundo identified by smallundo;
alter user smallundo quota unlimited on users;
grant create session, resource to smallundo;
conn smallundo/smallundo
create table test_tb as
select rownum col_a
,rpad('ABC', 80, 'X') col_b
from dual
connect by level <= 1000000
order by dbms_random.random;

create unique index test_tb_ind on test_tb (col_a);

begin
dbms_stats.gather_table_stats(user
,'test_tb'
,cascade=>true);
end;
/

The data is placed in the table in a random order to ensure that when we read it ordered by col_a, the I/O accesses the blocks in a random order.

Let's begin by verifying that we have a test case that will use more undo than the undo tablespace can hold. We do this by updating each row one at a time, like we'll do later, but not committing until all of them are updated. The commit should of course never be reached if the undo tablespace is small enough.

begin
for row in (select /*+ index(test_tb test_tb_ind) */
rowid
,col_a
,col_b
from test_tb
where col_a > 0
order by col_a) loop
update test_tb
set col_b = lower(col_b)
where rowid = row.rowid;
end loop;
commit;
end;
/

This will of course update the table row by row in a random order, changing col_b to its lower case version. If it succeeds it will simply complete, and if it fails we'll get an error to that effect. Here is the output I got.

ORA-30036: unable to extend segment by 8 in undo tablespace
'SMALL_UNDO'
ORA-06512: at line 4

So all the undo needed for this operation does not fit in the undo tablespace. Let's fix that by committing after every update; that should make this work, right?

begin
for row in (select /*+ index(test_tb test_tb_ind) */
rowid
,col_a
,col_b
from test_tb
where col_a > 0
order by col_a) loop
update test_tb
set col_b = lower(col_b)
where rowid = row.rowid;
commit;
end loop;
end;
/

ORA-01555: snapshot too old: rollback segment number 14
with name "_SYSSMU14$" too small
ORA-06512: at line 2

What happened here? Was someone else using the database? No, the database is on my computer and I'm the only one accessing it. I simply updated blocks that my own query still needed, and the undo for those blocks was overwritten before the query accessed them for the last time. The result is that my own updates caused the select to eventually fail.
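If you want to watch the undo consumption while a test like this runs, you can query v$transaction from a second session. This is a sketch; note that with the commit inside the loop each transaction is so short-lived that the row will come and go:

-- Undo blocks and undo records held by the test user's active transactions.
select s.sid
,s.username
,t.used_ublk
,t.used_urec
from v$transaction t inner join v$session s
on s.taddr = t.addr
where s.username = 'SMALLUNDO';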

While this is a very small test case constructed just to prove the problem, we may well have real situations where we contribute to the problem ourselves. The solution is to size the undo tablespace for what we need to process, and to size the transaction to match our unit of work. That does not mean you should never commit; always committing is just as incorrect. Size the transaction for what it needs to ensure integrity, and make sure you consider what a process failure means for restarting. A well sized transaction lets you just restart the process and have it complete whatever processing is left.

Before we're done, we need to restore the undo tablespace so your database is usable for normal processing again. First find the undo tablespaces you have defined, switch back to one of them, and drop the one used in this test.

select tablespace_name
from dba_tablespaces
where contents = 'UNDO';
alter system set undo_tablespace = undotbs1;
drop tablespace small_undo including contents and datafiles;

For the above test case, I used version 10.2.0.2 EE on Solaris 10 x86 running in Parallels on Mac OS X. I note this for future tests, in case Oracle changes something, so it can still be shown when and where this occurred. I believe you'll be able to reproduce this on any version of Oracle available today. If you're not using automatic undo, you'll just have to adjust the setup to use rollback segments instead.

This test case is very close to what Tom has in his book. It's not the exact same code, but there is no doubt it tests and proves the same thing. Tom discusses it more in the context of why some developers may commit too often, while I focus more on showing that it can occur with just one process. An ORA-01555 is not necessarily the result of someone else's poorly written code.

Posted in Commit, Oracle

April 2nd, 2007 by mathias

Too frequent commits can lead to performance problems, but can they also affect how much undo the updates generate?

That very frequent committing causes performance problems has been noticed by almost every Oracle developer with even just a couple of months of experience with the Oracle database. However, if committing can increase the I/O requests issued by the database, then it also impacts the performance of other processes. There are many ways to cause that in Oracle, but for commit processing I usually hear that it is not really a problem, since only the committing process is impacted as long as there is enough processing power available.

The thinking is that a commit is mostly a CPU consuming operation, and if the offending process completes soon enough and your server has more CPU available than it really needs, then what is the problem? If that were true, this would turn into another "Sure, we could have written the code better, but it works now and we have more urgent problems to solve. We'll look at this next time we have to change this code." That is, it will not be changed until it causes enough problems to embarrass the development manager.

Let's look at how frequent commit processing can generate a lot of I/O and reduce the usability of a shared resource. We will look at two cases, just to show how we can force it with the smallest possible table and how it works with a larger one.

Single row, single column table

Let's first create a new schema to use for the test.

conn system
drop user commfreq cascade;
create user commfreq identified by commfreq;
grant create session, resource, select_catalog_role to commfreq;
alter user commfreq quota unlimited on users;
alter user commfreq default tablespace users;
conn commfreq/commfreq
create table test_tb (col_a number not null);
insert into test_tb (col_a) values(0);
commit;
exec dbms_stats.gather_table_stats(user, 'test_tb',cascade=>true)

We now have a schema with one small table and no index. This is the smallest table you can have in Oracle, as it contains just one row with one column.
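As a quick sanity check, the statistics we just gathered confirm how small the table is:

-- Row count, block count, and average row length from the gathered stats.
select num_rows
,blocks
,avg_row_len
from user_tables
where table_name = 'TEST_TB';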

To see how many megabytes of undo the current session has generated, execute this query.

select round(to_number(b.value)/1024/1024,1) "MB UNDO"
from v$statname a inner join v$sesstat b
on a.statistic# = b.statistic#
inner join v$mystat c
on b.sid        = c.sid
where a.name       = 'undo change vector size'
and c.statistic# = 0;

To make this measurement useful we'll create a new session before each test case.

We'll first test how much undo we use if we commit after updating the row 10,000 times.

conn commfreq/commfreq
begin
for i in 1..10000
loop
update test_tb
set col_a = col_a + 1;
end loop;
commit;
end;
/

On my test system this generated 1.1 MB undo.

The only change we need to make is to move the commit into the loop. Then we repeat the test and execute the undo query again.

conn commfreq/commfreq
begin
for i in 1..10000
loop
update test_tb
set col_a = col_a + 1;
commit;
end loop;
end;
/

This time the same updates produced 1.7 MB undo. While this difference may not seem too large at first, it is over 50% more undo.

Remember that this is a small table and while the numbers may seem small, it still shows a significant increase in undo.

While this is interesting, it is not too representative of actual database applications. We'll now take a look at something more realistic to see if the observed behavior changes.

Large table

To set up this test we'll remove our test row and add a column that makes the average row length more realistic. Then we'll insert 1,000,000 rows into the table with a generated value for the new column.

conn commfreq/commfreq
delete from test_tb;
alter table test_tb add col_b varchar2(80) not null;
insert into test_tb
(col_a, col_b)
select rownum
,rpad('ABC', 80, 'X')
from dual
connect by level <= 1000000;
exec dbms_stats.gather_table_stats(user, 'test_tb',cascade=>true)

The first test to perform is similar to the first one in the previous section: we read all rows and update each row, setting col_b to the lower case version of what it contains.

conn commfreq/commfreq
begin
for r in (select col_a from test_tb)
loop
update test_tb
set col_b = lower(col_b)
where col_a = r.col_a;
end loop;
commit;
end;
/

Since we're not committing until all rows have been updated, all changed blocks are written to undo, but the overhead is minimal as there is just one transaction involved.

The whole update of one million rows, row by row, generated 187.2 MB undo.

Our next test is of course to repeat the same update, but this time committing after each update. Since the previous test left col_b in lower case, we now set it back to upper case so that every row is genuinely changed again.

conn commfreq/commfreq
begin
for r in (select col_a from test_tb)
loop
update test_tb
set col_b = upper(col_b)
where col_a = r.col_a;
commit;
end loop;
end;
/

This time the update generated 248.1 MB undo. That is about a third more undo than the single transaction operation produced.

Conclusion

It is clear that committing too frequently causes significantly more undo to be generated. This does of course not mean that you should always commit just once, but you're not reducing undo I/O by committing often. Rather, commit so that your unit of work matches what makes business sense, and if possible so that you don't have to write complex restart logic.

One note about the updates used in this post: if you want to update all rows, or many rows, you'd of course never do it row by row if you have any interest in getting it done fast. I've only done that here to test and show the impact on undo for processes that have to do row by row processing. Many systems do operate that way, and batch processing of data is usually not as easy as just issuing one very large update per table.

Please leave feedback if you've read this. I'm interested in both style and content comments, as well as corrections if you find technical or grammatical errors in my blog posts.

Posted in Commit, I/O, Oracle, Undo