Commit Oracle

Commit and snapshot too old

The dreaded ORA-01455 hits again. Snapshot to old is easy to understand, but not always easy to fix. In this article I’ll show how the misbehaving process may actually be the process that receives the error.

Snapshot too old is the problem we encounter if the block you need to access cannot be found. This doesn’t mean that the block has been misplaced by the database, but rather that the version of it that you need is no longer available in the database. This will happen if the block is updated, and the updating session commits and the block in undo is overwritten before you need to access it. This will often happen in environments where long and short transactions are mixed and your undo is not configured correctly.

Lets look at an example of this:

  1. Process A starts reading table XYZ
  2. Process B updates a block in table XYZ
  3. Process B commits.
  4. Process C makes an update to another table
    Oracle reuses the undo block that held the undo information from process B.
  5. Process A reaches the now updated block in table XYZ.
    The block has been updated and Oracle looks in undo for a version of it that existed at the time process A issued the statement (step 1).
  6. As no such block is available in undo anymore, process be receives the ORA-01455.

The solution to that is of course to have process B not commit as often, run process A at some time where this problem will not be triggered, or set up undo so undo is saved long enough for process A to complete it’s reading of table XYZ. Of these, the two first may not be an option in all cases and the last requires enough disk space to be available.

In this blog I’ll try to see if the process that causes the issue could be the process that receives the 1455. We typically look at other processes for who to blame when this occurs, so it’s interesting to see if we maybe should look at the failing process first. After all, that process will be a long running process and it often performs quite a bit of work that generates undo.

Let’s first set up a small undo tablespace.

conn system
create undo tablespace small_undo
datafile '/path/to/the/new/datafile' size 5M
autoextend off;
alter system set undo_tablespace = small_undo;

We now have a small tablespace that will cause generated undo to be overwritten pretty soon after it is committed. The next step is to create a test user and create our test table.

conn system
create user smallundo identified by smallundo;
alter user smallundo quota unlimited on users;
grant create session, resource to smallundo;

conn smallundo/smallundo
create table test_tb as
select rownum col_a
      ,rpad('ABC', 80, 'X') col_b
  from dual
connect by level <= 1000000
 order by dbms_random.random;

create unique index test_tb_ind on test_tb (col_a);

begin dbms_stats.gather_table_stats(user
                                   ,'test_tb'
                                   ,cascade => true);
end;

commit;

The data is placed in the table in a random order to ensure that we can read it by object_id and thereby get the I/O to access block in a random order.

Lets first begin with verifying that we have a testcase that will use more undo than the undo tablespace can hold. We do this by updating each row one at a time like we’ll do later and we’ll just not commit until all of them are updated. The commit should of course never be reached if the undo tablespace is small enough.

begin
  for row in (select /*+ index(test_tb test_tb_ind) */
                     rowid
                    ,col_a
                    ,col_b
                from test_tb
               where col_a > 0
               order by col_a)
  loop
    update test_tb
       set col_b = lower(col_b)
     where rowid = row.rowid;
  end loop;

  commit;
end;

This will of course update the table row by row in a random order and just change the object_name to the lower case version. If it is successful it will just complete, and if it fails we’ll get an error to that effect. Here is the output I got.

ORA-30036: unable to extend segment by 8 in undo tablespace 'SMALL_UNDO'
ORA-06512: at line 4

So all undo needed from this operation does not fit in the undo. Lets fix it by committing after every update, that should make this work, right?

begin
  for row in (select /*+ index(test_tb test_tb_ind) */
                     rowid
                    ,col_a
                    ,col_b
                from test_tb
               where col_a > 0
               order by col_a) 
  loop
    update test_tb
       set col_b = lower(col_b)
     where rowid = row.rowid;

    commit;
  end loop;
end;

ORA-01555: snapshot too old: rollback segment number 14
with name "_SYSSMU14" too small
ORA-06512: at line 2

What happened here? Was someone else using the database? No, the database is on my computer and I’m the only one accessing it. I just managed to update blocks I needed and then those blocks were overwritten in the undo before the query had accessed that block for the last time. The result is that my own update caused the select to eventually fail.

While this is a very small test case just to prove the problem, we may still have situations where we contribute to the problem. The solution is to size the undo tablespace for what we need to process and size the transaction for what our unit of work is. This does not mean never commit and always commit is just as incorrect. Size it for what your transaction needs to ensure integrity and make sure you consider what a process failure means for restarting. A well sized transaction will result in just restarting the process and it completes what processing it has left.

Before we’re done, we need to restore the undo tablespace so your database is useable for normal processing again. First find the undo tablespaces you have defined, change it back to one of those, and drop the one used in this test.

select tablespace_name
  from dba_tablespaces
 where contents = 'UNDO';

alter system set undo_tablespace = undotbs1;

drop tablespace small_undo including contents and datafiles;

For the above test case, I used version 10.2.0.2 EE on Solaris 10 x86 running in Parallels on Mac OS X. This is just for future tests if Oracle changes something so it still can be proven when and where this occurred. I believe you’ll reproduce this on any version of Oracle that is available today. If you’re not using undo, you’ll just have to adjust the setup to use rollback segments instead.

This test case is very close to what Tom has in his book. It’s not the same exact code, but there is no doubt it is testing and proving the same thing. Tom talks about it more in the context of why some developers may commit too often, while I focus more on showing that it can occur with just one process. A 1455 is not necessarily the result of someone else’s poorly written code.

2 Comments

  1. Hi! Excellent article. Finally an accurate description of what is actually troubling us at our firm. Best regards.

  2. Truper,
    I’m happy my article helped you. Thank you for leaving a comment. I should start blogging more/again. I enjoyed the blogging part, I just didn’t like the pain Blogger gives you when you try to format code. I see now that Blogger has changed something so the code doesn’t format properly in all places anymore.
    Mathias

Leave a Comment

Your email address will not be published. Required fields are marked *

*