What’s underneath: Central Services
At this point, we have end-to-end data flow from our source systems to end-users, but we are still somewhat in the dark. Firstly, our users have no overview of the data stored on the platform, so they cannot quickly assess which datasets are relevant for their use case. Secondly, if any of the above components fails, our DevOps team has no way to observe and debug the failure. Thirdly, as we have unfortunately witnessed at too many organizations, infrastructure resources are deployed by hand, which rules out automation and repeatability.
Metadata management in the form of a data catalog remedies the first problem. Automated data discovery across cloud services (datasets in the DWH or persistent layer) keeps technical metadata in sync. This technical metadata can be extended with schematized tags that carry business metadata (for example, PII or GDPR classifications). We aim for a fully managed service with security and governance integration, and an easy-to-navigate UI that provides search and discovery capabilities for a unified view of data, wherever it may be (including support for on-premise datasets).
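To make the idea of schematized tags concrete, here is a minimal in-memory sketch in plain Python. It is not the API of any particular catalog service: the names TagTemplate, Catalog, sync, tag and search, as well as the has_pii and gdpr_category fields, are hypothetical and chosen only for illustration. It shows technical metadata being synced per dataset, business tags being validated against a template, and a simple tag-based search.

```python
from dataclasses import dataclass, field


@dataclass
class TagTemplate:
    """Hypothetical tag schema: maps each field name to its allowed values."""
    name: str
    allowed: dict[str, frozenset[str]]

    def validate(self, tag: dict[str, str]) -> None:
        # Reject fields or values that the template does not define.
        for key, value in tag.items():
            if key not in self.allowed:
                raise ValueError(f"unknown field {key!r} in template {self.name!r}")
            if value not in self.allowed[key]:
                raise ValueError(f"invalid value {value!r} for field {key!r}")


@dataclass
class CatalogEntry:
    dataset: str
    columns: dict[str, str]                               # technical metadata: column -> type
    tags: dict[str, str] = field(default_factory=dict)    # business metadata


class Catalog:
    """Toy stand-in for a managed data catalog (illustration only)."""

    def __init__(self, template: TagTemplate) -> None:
        self.template = template
        self.entries: dict[str, CatalogEntry] = {}

    def sync(self, dataset: str, columns: dict[str, str]) -> None:
        # Upsert technical metadata, as an automated discovery job might.
        entry = self.entries.get(dataset)
        if entry is None:
            self.entries[dataset] = CatalogEntry(dataset, dict(columns))
        else:
            entry.columns = dict(columns)

    def tag(self, dataset: str, **tag: str) -> None:
        # Only tags that conform to the template are attached.
        self.template.validate(tag)
        self.entries[dataset].tags.update(tag)

    def search(self, **criteria: str) -> list[str]:
        # Return the names of datasets whose tags match every criterion.
        return sorted(
            e.dataset
            for e in self.entries.values()
            if all(e.tags.get(k) == v for k, v in criteria.items())
        )
```

A user looking for datasets that contain personal data could then call catalog.search(has_pii="true"). Because tags are validated against the template, a typo such as has_pii="maybe" is rejected when the tag is attached rather than silently polluting search results.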
Nevertheless, it's only a matter of time before something goes south in our platform, a harsh reality for cloud service consumers. Despite the SLAs provided for each service (remember, we've picked a handful of them already), a service might become temporarily unavailable, breaking our end-to-end data processing pipeline and causing issues downstream. For that reason, just as with our on-premise systems, we use monitoring tools for visibility into the performance, uptime, and overall health of the platform. By collecting metrics, events, uptime probes, application instrumentation, etc., we can generate insights via dashboards (pattern detection, exhaustion prediction) and, if needed, report anomalies through alerting. In addition, cloud services generate a multitude of logs: platform-specific logs, user logs (generated by our applications), and security logs (an audit trail of administrative changes and data accesses). For logging, we want features such as log archival, retention and alerting, log-based metrics, custom logs, advanced analytics on generated logs, and third-party integration (exports to SIEM tools).
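As a rough illustration of a log-based metric with alerting, the sketch below counts ERROR entries per fixed time window and flags the windows that reach a threshold. It is plain Python under stated assumptions, not a real monitoring service: the LogEntry shape, the error_counts and alerts helpers, and the threshold semantics are all hypothetical, and timestamps are assumed to be naive UTC datetimes.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime, timedelta

EPOCH = datetime(1970, 1, 1)  # assumption: naive UTC timestamps throughout


@dataclass(frozen=True)
class LogEntry:
    timestamp: datetime
    source: str      # e.g. "platform", "user", or "audit"
    severity: str    # e.g. "INFO", "ERROR"
    message: str


def error_counts(entries: list[LogEntry], window: timedelta) -> Counter:
    """Log-based metric: number of ERROR entries per fixed-size window."""
    counts: Counter = Counter()
    for entry in entries:
        if entry.severity == "ERROR":
            # Truncate the timestamp to the start of its window.
            bucket = EPOCH + ((entry.timestamp - EPOCH) // window) * window
            counts[bucket] += 1
    return counts


def alerts(counts: Counter, threshold: int) -> list[datetime]:
    """Return the start times of windows whose error count reaches the threshold."""
    return sorted(bucket for bucket, n in counts.items() if n >= threshold)
```

In a managed setup, the counting and thresholding would be configured in the logging and monitoring services themselves rather than written by hand. The sketch only shows the underlying logic: derive a numeric time series from raw log entries, then alert on that series.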