Skip to main content

High-performance Lookups in BW Transformations - The use of internal tables vs. SELECTS from the HANA database

In this series we are focusing on implementation methods for lookups where every data record in a table is to be checked. The larger our data packages and lookup tables are, the more important high-performance implementation becomes.

As BW transformations usually involve data packages of 50,000 data records or more, it is not advisable to perform individual SELECT statements within the LOOP in ABAP routines as this would result in database selection for every single data record in the data package. Even if you have a rapid in-memory database such as SAP HANA, this does not automatically mean that millions of data records can be read in an instant. This is in because of the overhead for each instance of access. It is better to save the relevant data with one or a few SELECT statements in internal tables before the transformation LOOP, and then read out these tables using the READ statement. Hashed tables  are best used here.

Whether you then perform multiple SELECT statements per data package (1)…

SELECT
    key_field1
    key_field2
    key_field3

    group_field
    lookup_field

FROM dbTab INTO TABLE itab_group1
FOR ALL ENTRIES IN result_package
WHERE key_field1 = result_package-key_field1
  AND key_field2 = result_package-key_field2

  AND key_field3 = result_package-key_field3
  AND group_field = 'GR1'.

 “If necessary, implement sorting / aggregation / reduction of fields on itab_group1

SELECT

    key_field1
    key_field2
    key_field3 
  
    group_field
    lookup_field

FROM dbTab INTO TABLE itab_group2
FOR ALL ENTRIES IN result_package
WHERE key_field1 = result_package-key_field1
  AND key_field2 = result_package-key_field2

  AND key_field3 = result_package-key_field3
  AND group_field = 'GR2'.

 “If necessary, implement sorting / aggregation / reduction of fields from itab-group1

… or use a SELECT/ENDSELECT loop (2) to populate multiple internal tables is not of significance in terms of speed.

SELECT
    key_field1
    key_field2
    key_field3

    group_field
    lookup_field
FROM dbTab INTO ls_all

FOR ALL ENTRIES IN result_package

WHERE key_field1 = result_package-key_field1
 AND key_field2 = result_package-key_field2

 AND key_field3 = result_package-key_field3
 AND ( group_field = 'GR1'

 OR group_field = 'GR2')

 AND NOT lookup_date = '00000000'.

CASE ls_rest_plus-group_field.

  WHEN 'GR1'.

    IF itab_group1 IS INITIAL.

 "From this point, the field with the group is no longer important => we therefore only transfer the remaining lines

      MOVE-CORRESPONDING ls_all TO ls_group.

    INSERT ls_group INTO TABLE itab_group1.
     
   ELSE.
    READ TABLE itab_group1
      WITH TABLE KEY key_field1 = ls_group-key_field1
         key_field2 = ls_group-key_field2
         key_field3 = ls_group-key_field3
    ASSIGNING <fs_group>.

 IF sy-subrc = 0.

 "If necessary, implement update of line from itab_group1

    ELSE.
      MOVE-CORRESPONDING ls_all TO ls_group.

      INSERT ls_group INTO TABLE itab_group1.

     ENDIF.
     ENDIF.

 WHEN OTHERS.

 "As above, but from itab_group2

 ENDCASE.
 CLEAR ls_all.
 CLEAR ls_group.
 UNASSIGN <fs_group>.
ENDSELECT.

For a DTP with a package size of 200,000 and nearly 16 million data records to load, (1) has taken us 43m 39s and (2) 42m 25s. We have read from a table with nearly 190 million data records and have had to create a total of 5 groups of data in internal tables, which then had to be searched in the transformation LOOP for the lookup. And yes, these loads were implemented on BW on HANA. J

 To compare: were we to read from the database for the 5 lookups within the LOOP, we would need c. 5 hours for a load with the same amount of data.

 The comparative speed of methods (1) and (2) is also explored in the theory (section 4.4.2) that the commonly poor reputation of SELECT/ENDSELECT loops is not justified. While these loops do have certain disadvantages, they are in performance terms not necessarily slower than SELECT INTO statements. They are in fact to be preferred when they allow the development of simpler and clearer programming logic. The disadvantages of SELECT/ENDSELECT loops are less relevant for in-memory databases.

 If you are also careful only to select those data records and fields that you later require, you will generally ensure high-performance transformations. The implementation work is definitely worthwhile!