The database is working with UNDO_MANAGEMENT=AUTO, which is a default setting and since 10g used very widely, so the administrator may simply forget about rollback segments management. According to the docs with this setting some work parameters relevant to the rollback management are simply ignored (TRANSACTIONS, TRANSACTIONS_PER_ROLLBACK_SEGMENT), while the number of transactions is limited by the undo tablespace size.
So it seems we should not experience this error, but rather session suspension and the error about lack of space in the undo tablespace.
Yet here are are. It is clear from the numbers below, that still there is a space in the undo (up to 4598 undo blocks used
What I could check is this:
select BEGIN_TIME, UNDOTSN, UNDOBLKS, TXNCOUNT, MAXQUERYLEN, MAXQUERYID, MAXCONCURRENCY, ACTIVEBLKS, UNEXPIREDBLKS, EXPIREDBLKS, TUNED_UNDORETENTION from v$undostat where begin_time between to_date('2018-06-25 14', 'yyyy-mm-dd hh24') and to_date('2018-06-25 16', 'yyyy-mm-dd hh24') order by begin_time;
BEGIN_TIME | UNDOBLKS | TXNCOUNT | MAXQUERYLEN | MAXCONCURRENCY | ACTIVEBLKS | UNEXPIREDBLKS | EXPIREDBLKS | TUNED_UNDORETENTION |
2018-06-25 14:39 | 0 | 78 | 1652 | 0 | 160 | 0 | 7168 | 2372 |
2018-06-25 14:49 | 1411 | 91 | 1653 | 2 | 160 | 0 | 7168 | 2373 |
2018-06-25 14:59 | 4598 | 104225 | 1653 | 131 | 160 | 4736 | 5888 | 2373 |
2018-06-25 15:09 | 2169 | 103321 | 933 | 311 | 160 | 6656 | 4352 | 1653 |
2018-06-25 15:19 | 2 | 282 | 1734 | 1 | 160 | 8064 | 2816 | 2454 |
And here few words of wisdom from Jonathan Lewis:
Taking a quick guess:
An (automatic) undo segment has 34 slots in the transaction table (the thing in the undo segment header).
If you and up with a small number N of undo segments then you can't have more than 34 * N concurrent transactions active because each one needs a transaction table slot and the next attempt to start a transaction would see ORA-01554
It seems a little unlikely that this would happan in typical system since Oracle would usually end up stealing an extent from an existing undo segment to use it as the first extent of a new undo segment. But it wouldn't be hard to set up a mechanism that started with a small number of undo segments in a relatively small undo tablespace - started some transactions (then leave them idle forever after the first change - e.g. they might have come in from a remote database) that made it impossible to free any extents, then did a lot of work to allocate all the extents (but still left a little space in allocated extents for continuting transactions), then started lots of transactions until you hit the error.
And Julian Dyke's presentation:
Undo segments are allocated at instance startup.
Undo segments can be added dynamically.
Each undo segment header contains
- Pool of free undo extents
- Set of undo slots
One undo slot is allocated to each transaction.
Undo slot contains list of undo extents.
Extents can migrate from one undo segment to another.
Undo slots are used cyclically:
- remain in header as long as possible
- reduces probability of ORA-01555: Snapshot too old
With all this in mind I would say that 1000 times higher transaction rate is in this particular case the main reason for the error. Apparently the db is not able to extend its concurrent transactions rate capability fast enough to respond to extended demand. Possible action here could be initial setting of _rollback_segment_count to a higher value than 12, which are set here.