As a final pillar, data security is a requirement with an essentially "binary" outcome: any shortfall in compliance with GDPR or PCI requirements can lead to a shutdown, or at least a massive restriction, of the platform's usage. The requirements here follow directly from the desire to make data available to everyone (people and processes):
Data requiring protection are needed for customer-centric analyses and processes.
Data must be securely loaded into the data platform, stored there and kept ready for queries.
Ideally, this type of security is based on one central solution rather than on several isolated ones.
Deciding at the outset to do without PII data usually creates a need to establish and use special techniques, which ultimately raise costs. Examples of such techniques are the duplication of tables and schemas for different user groups, or the "manual" implementation of selective encryption and decryption logic to protect access paths, leading to more complex ETL processes and authorization concepts.
The challenge here is that standardized, foundational solutions are currently very rarely implemented, especially for complex system landscapes such as a data platform. Instead, a variety of components are often handled and secured separately, which increases the effort required for implementation and governance.
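To make the contrast between a central solution and per-group table copies more concrete, the following is a minimal sketch rather than a reference implementation. It assumes a single, hypothetical policy table (`POLICY`) that maps protected columns to the roles allowed to read them in clear text, and applies it at read time to one shared copy of the data. All names (`POLICY`, `DEMO_KEY`, `mask`, `apply_policy`, the role and column names) are illustrative assumptions and do not refer to any specific product.

```python
import hashlib
import hmac

# Hypothetical central policy: protected column -> roles allowed to see clear values.
# Columns that are not listed here are treated as unprotected.
POLICY = {
    "email": {"crm_analyst"},
    "card_number": set(),  # no role may read this column in clear text
}

# Assumption: a demo key only. A real platform would fetch this from a key
# management service and would likely use tokenization or encryption instead.
DEMO_KEY = b"demo-key-not-for-production"


def mask(value: str) -> str:
    """Return a stable keyed pseudonym for a value (not reversible)."""
    digest = hmac.new(DEMO_KEY, value.encode("utf-8"), hashlib.sha256).hexdigest()
    return "masked:" + digest[:12]


def apply_policy(row: dict, role: str) -> dict:
    """Return a copy of the row with protected columns masked for this role."""
    result = {}
    for column, value in row.items():
        allowed_roles = POLICY.get(column)
        if allowed_roles is None or role in allowed_roles:
            result[column] = value  # unprotected column or authorized role
        else:
            result[column] = mask(str(value))
    return result


if __name__ == "__main__":
    row = {
        "customer_id": 42,
        "email": "a@example.com",
        "card_number": "4111111111111111",
    }
    print(apply_policy(row, "crm_analyst"))
    print(apply_policy(row, "bi_user"))
```

Because the pseudonym is stable, masked columns can still be used for joins and counts, and a change to the policy takes effect in one place instead of in every duplicated table or ETL job.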
A detailed consideration of these three pillars soon reveals a need for balancing conflicts arising between the three associated goals:
Absolute security of data is possible only if the data is never loaded into the data platform. However, that conflicts with availability.
High query performance can be achieved either by heavily condensing the data or by using only data for very specific processing requirements, both of which reduce the amount of data to be processed. However, this also conflicts with the requirement for data availability.
Very complex protection mechanisms lead to complex processing, reduced availability and, as a consequence, limits on data processing performance.
Availability of data for everyone potentially violates the need-to-know principle concerning data protection and causes bottlenecks in data processing performance.