APEX SQL

Troubleshooting conversion errors

My friend and Oracle maestro Daniel Ekberg wrote a post about ways to manage errors when converting data. Go read his post if you want to see how the errors you get can be better managed. I think he may be the person that first pointed out the feature to me that I want to show my use of.

One common way to get data into APEX is to load a spreadsheet into a table. But often you get data into datatypes you do not want. dates in text columns and so forth.

If you then want to migrate it to proper columns it is a walk in the park if you have rock solid data quality in the data you load. I did mention it was Excel? So yes, typically the opposite of any kind of data quality.

Let’s say that you have columns that has date entered sometimes as yyyy-mm-dd sometimes followed by a time hh24:mi:ss an sometimes not. People will figure out many ways to mess with the data. So you have dates written as yymmdd sometimes without days and sometimes with just hour and minutes.

For numerical data you may find proper numbers but also things like a dash to show 0, someone may have written Twenty instead of 20. Further when no data it could also have the text “None” or just a dash.

Lets create a table and put some such data into it so we have problematic data to convert.

create table test
  ( id NUMBER
  , saved varchar2(20)
  , num_items varchar2(15));

insert into test (id, saved, num_items) values(1, '2024-02-03 15:10:12', '42');
insert into test (id, saved, num_items) values(1, '2024-02-03 15:10'   , 'Twenty');
insert into test (id, saved, num_items) values(1, '2024-02-03'         , 'None');
insert into test (id, saved, num_items) values(1, '2024-02'            , 'a17');
insert into test (id, saved, num_items) values(1, 'Aug-23'             , '34');
insert into test (id, saved, num_items) values(1, '2024-02-03 15:10:12', '-');

There we have some testdata – very similar in what you can find in an old Excel spreadsheet – to try to convert. Maybe worse than most excel-conversions but all scenarios are found in some Excel probably even within your own organization.

Let’s now attempt a select that will convert the data so it can be inserted into a table with proper datatypes.

select id
     , to_date(saved, 'yyyy-mm-dd hh24:mi:ss')
     , to_number(num_items)
  from test;
ORA-01722: Invalid number
The attempted conversion of a character string for column or expression to a number failed because the character string is not a valid numeric literal. Only numeric fields or character fields containing numeric data can be used in arithmetic functions or expressions.  
Error at Line: 3 Column: 8

Great! Consider a situation where you have thousands of rows. You don’t know which row it is that failed nor which column. Sure it is easy enough in this simple test case but in real applications it is much more complex.

There are many ways people have tried many different ways to manage this in the past. One common solution is to limit the number of rows until you do not get an error and when you then have x rows does not cause an error and x+1 does, then the row that is x+1 is the issue. Very time consuming, tedious and not really using SQL to actually solve the problem.

Let’s turn to validate_conversion, it lets us test the conversion to other datatypes. Instead of the value in the column we get 0 for a failed conversion and 1 for a successful.

select id
     , validate_conversion(saved as date)       r1
     , validate_conversion(num_items as number) r2
  from test;
        ID         R1         R2
---------- ---------- ----------
         1          0          1
         2          0          0
         3          1          0
         4          0          0
         5          0          1
         6          0          0

6 rows selected. 

There is not a single conversion that works. Should we change the date format in the session to ‘yyyy-mm-dd hh24:mi:ss’ the first row would be successful for both columns but the row 3 would fail on both columns.

This works well to begin with when most rows fails, but soon you’ll have most rows work and scrolling to find problematic ones become harder.

Let’s look at a case where many of the cases we have in the testcase has been handled.

select id
     , validate_conversion(case 
                             when length(saved) =  7 then saved || '-01 00:00:00'
                             when length(saved) = 10 then saved || ' 00:00:00'
                             when length(saved) = 16 then saved || ':00'
                             when length(saved) = 19 then saved
                             else                         saved
                           end
                           as date)   r1
     , validate_conversion(case 
                             when num_items    = 'None' then '0'
                             when num_items    = '-'    then '0'
                             when num_items like 'a%'   then substr(num_items,2)
                             else                            num_items
                           end
                           as number) r2
  from test;

Here we have included how we intend to solve the date ssiues leaving only “Aug-23” left in the column saved and only “Twenty” left in the column num_items.

The result is as follows.

        ID         R1         R2
---------- ---------- ----------
         1          1          1
         2          1          0
         3          1          1
         4          1          1
         5          0          1
         6          1          1

AS you can see there are just two rows with 0 for the conversion success. If this was now a table with thousands of rows we want to filter out just those. That of course is not harder than a “where r1 =0 or r2=0”. I prefer using an in-construct when there are many columns as it is more condensed.

with data as (
select id
     , validate_conversion(case 
                             when length(saved) =  7 then saved || '-01 00:00:00'
                             when length(saved) = 10 then saved || ' 00:00:00'
                             when length(saved) = 16 then saved || ':00'
                             when length(saved) = 19 then saved
                             else                         saved
                           end
                           as date)       r1
     , validate_conversion(case 
                             when num_items    = 'None' then '0'
                             when num_items    = '-'    then '0'
                             when num_items like 'a%'   then substr(num_items,2)
                             else                            num_items
                           end
                           as number) r2
  from test
)
 select * from data
 where 0 in (r1,r2)
;

To be able to refer to the result of the conversions in the where-clause as R1 and R2 I have put the SQL in a WITH-construct. If there are many columns that could be the issue this allows just listing them (or their alias) to show just the ones that has an issue.

To me this feature/function has vastly improved the ease of dealing with “invalid number” and such errors when converting between datatypes.

Leave a Comment

Your email address will not be published. Required fields are marked *

*