HANA SQL Query Tuning Tips for Column Engine

SAP HANA Tips

NewDB column engine is best optimized for the most frequent pattern of OLAP queries in the form of single-block SPJG (select, project, join and group by). This document lists up high-cost features that are better to be avoided in formulating SQL queries to get the best out of NewDB column engine.

Note that using high-cost features is fine for small tables or small intermediate results extracted from big tables. Special care is needed only for operations executed over big tables or big intermediate results. Workarounds suggested here are just examples and possible workarounds are different from application to application.

Many tips here suggest avoiding operations that are not natively supported by column engine. Please refer to documentation on SQL query pushdown to column engine to check the operations natively supported by column engine. This document describes the same thing in the opposite direction.

All tables used by examples in this document are column tables.

Expression

CALCULATION

If a calculation that is not supported by column engine is used, the operation based on the calculation is executed on row engine which can lead to large intermediate result transfer between column engine and row engine.

For example, TO_DATE function without format string is supported by column engine, but TO_DATE function with format function is not supported by column engine. If user knows that the source strings can be parsed by TO_DATE function by default, it is better to use TO_DATE function without format string. TO_DATE function can parse strings in the form of 'YYYYMMDD' and 'YYYY-MM-DD' by default. In the example below, date_string is a VARCHAR column.

Example)

Slower query

SELECT * FROM T WHERE TO_DATE(date_string, 'YYYYMMDD') = CURRENT_DATE;

Faster query

SELECT * FROM T WHERE TO_DATE(date_string) = CURRENT_DATE;

Using calculations that are natively supported by column engine is faster than using ones that are not natively supported. However, operations with calculations are still slower than operations without calculations and it is better to avoid calculations at all if possible. Below is an example of avoiding calculation. The transformation is automatically applied by SQL optimizer.

Example)

Slower query

SELECT * FROM T WHERE (CASE WHEN a = 1 THEN 1 ELSE 2 END) = 2;

Faster query

SELECT * FROM T WHERE a <> 1 OR a IS NULL;

If there is no way to avoid calculation, one way to avoid calculation on query processing is adding generated column as the following example. Addition of generated column improves query performance with the expense of increased insertion and update cost. Generated columns are matched automatically by SQL optimizer, but user can also explicitly use them.

Example)

Slower query

SELECT * FROM T WHERE b * c = 10;

Faster query

SELECT * FROM T WHERE bc = 10; 

DDL for faster query

ALTER TABLE T ADD (bc INTEGER GENERATED ALWAYS AS b * c);

Please note that specification of columns is needed on insertion after adding generated column to avoid setting value for generated column as the following example.

Example)

Insertion before adding generated column

INSERT INTO T VALUES (1, 2, 3);

Insertion after adding generated column

INSERT INTO T(a, b, c) VALUES (1, 2, 3);

IMPLICIT TYPE CASTING

System can generate type castings implicitly even if the user didn't explicitly write type casting operation. For example, if there is a comparison between a VARCHAR value and a DATE value, system generates implicit type casting operation that casts the VARCHAR value into a DATE value. Implicit type casting is done from lower precedence type to higher precedence type. You can find type precedence rule in documentation for Data Types.

Such implicit type casting is a kind of calculation and has the same effect to the performance as other calculations. For this reason, it is better to avoid implicit type castings if possible. If two columns are frequently compared by queries, it is better to enforce the two columns have the same data type from the beginning.

One way to avoid the cost of implicit type casting is using explicit type casting on inexpensive part. If a VARCHAR column is compared with a DATE value and the user knows that casting the DATE value into a VARCHAR value produces what the user wants, it is better to save the cost of changing type of the column by changing type of the value.

For example, if the VARCHAR column contains only date values in the form of 'YYYYMMDD', it could be compared with a string generated from a DATE value in the form of 'YYYYMMDD'. In the example below, date_string is a VARCHAR column. Please note that comparison result between strings is different from that between dates in general. They are identical only in some cases.

Example)

Slower query

SELECT * FROM T WHERE date_string < CURRENT_DATE;

Faster query

SELECT * FROM T WHERE date_string < TO_CHAR(CURRENT_DATE, 'YYYYMMDD'); 

If there is no way to avoid implicit type casting, one way to avoid implicit type casting on query processing is adding generated column with the expense of increased insertion and update cost. For example, user can find '1', '1.0' and '1.00' stored in a VARCHAR column using the following queries. In the example below, s is the VARCHAR column.

Example)

Slower query

SELECT * FROM T WHERE s = 1;

Faster query

SELECT * FROM T WHERE n = 1; 

DDL for faster query

ALTER TABLE T ADD (n DECIMAL GENERATED ALWAYS AS s);