The dreaded ORA-01555 strikes again. Snapshot too old is easy to understand, but not always easy to fix. In this article I'll show how the misbehaving process may actually be the very process that receives the error.
Snapshot too old is the error we encounter when a version of a block we need to access cannot be found. This doesn't mean that the database has misplaced the block, but rather that the version of it we need is no longer available. It happens when the block is updated, the updating session commits, and the undo for that change is overwritten before we need to read the block. This is common in environments where long and short transactions are mixed and undo is not configured correctly.
Let's look at an example of this:
- Process A starts reading table XYZ.
- Process B updates a block in table XYZ.
- Process B commits.
- Process C makes an update to another table. Oracle reuses the undo block that held the undo information from process B.
- Process A reaches the now updated block in table XYZ. The block has been changed, so Oracle looks in undo for the version of it that existed at the time process A issued its statement (step 1).
- As no such block version is available in undo anymore, process A receives the ORA-01555.
The solutions are, of course, to have process B commit less often, to run process A at a time when this problem will not be triggered, or to configure undo so it is retained long enough for process A to complete its reading of table XYZ. The first two may not be an option in all cases, and the last requires enough disk space to be available.
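If you suspect the last option, a quick way to inspect the current undo configuration and recent undo consumption is something like the following. This is a sketch assuming automatic undo management on a reasonably modern release; adjust names to your environment.

rem Current undo-related parameters (undo_management,
rem undo_retention, undo_tablespace)
show parameter undo

rem Undo blocks generated and the longest-running query seen,
rem per 10-minute interval
select begin_time
      ,end_time
      ,undoblks
      ,maxquerylen
  from v$undostat
 order by begin_time;

If MAXQUERYLEN regularly exceeds your effective undo retention, long-running queries are at risk of ORA-01555.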
In this blog I'll try to see if the process that causes the issue could be the very process that receives the ORA-01555. When this error occurs we typically look at other processes for someone to blame, so it's interesting to see if we should perhaps look at the failing process first. After all, that process will be a long-running one, and it often performs quite a bit of work that generates undo itself.
Let’s first set up a small undo tablespace.
conn system

create undo tablespace small_undo
  datafile '/path/to/the/new/datafile' size 5M
  autoextend off;

alter system set undo_tablespace = small_undo;
We now have a small undo tablespace that will cause generated undo to be overwritten soon after it is committed. The next step is to create a test user and our test table.
conn system

create user smallundo identified by smallundo;
alter user smallundo quota unlimited on users;
grant create session, resource to smallundo;

conn smallundo/smallundo

create table test_tb as
  select rownum col_a
        ,rpad('ABC', 80, 'X') col_b
    from dual
 connect by level <= 1000000
  order by dbms_random.random;

create unique index test_tb_ind on test_tb (col_a);

begin
  dbms_stats.gather_table_stats(user
                               ,'test_tb'
                               ,cascade => true);
end;
/

commit;
The data is placed in the table in random order so that reading it by col_a forces the I/O to access the blocks in a random order.
Let's begin by verifying that we have a test case that uses more undo than the undo tablespace can hold. We do this by updating each row one at a time, as we'll do later, but without committing until all of them are updated. If the undo tablespace is small enough, the commit should never be reached.
begin
  for row in (select /*+ index(test_tb test_tb_ind) */
                     rowid
                    ,col_a
                    ,col_b
                from test_tb
               where col_a > 0
               order by col_a)
  loop
    update test_tb
       set col_b = lower(col_b)
     where rowid = row.rowid;
  end loop;
  commit;
end;
/
This updates the table row by row in a random order, changing col_b to its lower-case version. If it is successful it will just complete; if it fails we'll get an error to that effect. Here is the output I got.
ORA-30036: unable to extend segment by 8 in undo tablespace 'SMALL_UNDO'
ORA-06512: at line 4
So the undo needed for this operation does not fit in the undo tablespace. Let's fix that by committing after every update; that should make this work, right?
begin
  for row in (select /*+ index(test_tb test_tb_ind) */
                     rowid
                    ,col_a
                    ,col_b
                from test_tb
               where col_a > 0
               order by col_a)
  loop
    update test_tb
       set col_b = lower(col_b)
     where rowid = row.rowid;
    commit;
  end loop;
end;
/

ORA-01555: snapshot too old: rollback segment number 14 with name "_SYSSMU14" too small
ORA-06512: at line 2
What happened here? Was someone else using the database? No, the database is on my computer and I'm the only one accessing it. My own updates overwrote, in undo, block versions that the driving query still needed before it had read those blocks for the last time. The result is that my own update caused the select to eventually fail.
While this is a very small test case constructed just to prove the point, we may well have real situations where we contribute to the problem ourselves. The solution is to size the undo tablespace for what we need to process, and to size the transaction for what our unit of work actually is. This does not mean you should never commit; committing constantly is just as incorrect. Size the transaction for what it needs to ensure integrity, and make sure you consider what a process failure means for restarting. A well-sized transaction lets you simply restart the process and have it complete whatever processing it has left.
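As a sketch of what a restartable unit of work might look like for this example (hypothetical code, not part of the original test): because lower() is idempotent, the data itself tells us which rows are unprocessed, so we can commit in batches and simply rerun the block after a failure.

rem Hypothetical restartable version of the update: commit every
rem 10,000 rows; a rerun after failure picks up the remaining rows.
declare
  c_batch constant pls_integer := 10000;
  l_done  pls_integer;
begin
  loop
    update test_tb
       set col_b = lower(col_b)
     where col_b <> lower(col_b)
       and rownum <= c_batch;
    l_done := sql%rowcount;
    commit;
    exit when l_done = 0;
  end loop;
end;
/

In a real batch job you would more likely track progress with a status column or a driving table, but the principle is the same: each commit leaves the data in a state from which the process can be restarted.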
Before we're done, we need to restore the undo tablespace so your database is usable for normal processing again. First find the undo tablespaces you have defined, switch back to one of those, and drop the one used in this test.
select tablespace_name
  from dba_tablespaces
 where contents = 'UNDO';

alter system set undo_tablespace = undotbs1;

drop tablespace small_undo including contents and datafiles;
For the above test case I used version 10.2.0.2 EE on Solaris 10 x86, running in Parallels on Mac OS X. I note this for future reference, in case Oracle changes the behavior, so it is on record where and when this occurred. I believe you'll be able to reproduce it on any Oracle version available today. If you're not using automatic undo management, you'll just have to adjust the setup to use rollback segments instead.
This test case is very close to one Tom has in his book. It's not the exact same code, but there is no doubt it tests and proves the same thing. Tom discusses it more in the context of why some developers commit too often, while I focus on showing that it can occur with just one process. An ORA-01555 is not necessarily the result of someone else's poorly written code.