What is a Data Lake?
A data lake is a storage repository that uses a flat architecture to hold a huge amount of raw data, kept in its various native formats until analysis is required. It can offer data access to anyone in an organisation more cost-effectively than a data warehouse, for example, and the data can be retained for as long as it is required.
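The idea of keeping raw data in its native format until analysis is required can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the `lake` dictionary stands in for flat object storage, and the object names, fields and the `read_orders` helper are hypothetical rather than part of any real product.

```python
import csv
import io
import json

# Hypothetical flat store: raw payloads kept untouched, keyed by object name.
lake = {
    "orders-2023.csv": b"order_id,amount\n1,19.99\n2,5.00\n",
    "orders-2024.json": b'[{"order_id": 3, "amount": 42.5}]',
}


def read_orders(name: str) -> list:
    """Parse one raw object at read time, picking a parser by its native format."""
    raw = lake[name].decode("utf-8")
    if name.endswith(".csv"):
        rows = csv.DictReader(io.StringIO(raw))
        return [
            {"order_id": int(r["order_id"]), "amount": float(r["amount"])}
            for r in rows
        ]
    if name.endswith(".json"):
        return json.loads(raw)
    raise ValueError(f"no parser registered for {name}")


# Structure is applied only now, when an analysis actually needs the data.
total = sum(row["amount"] for name in lake for row in read_orders(name))
```

Nothing is converted when the data is stored. The cost of interpreting each format is paid only when a question is asked of it.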
What is HANA?
HANA is an in-memory relational database management system created by SAP. It stores and retrieves data as applications need it, performs complex analysis on that data, and offers ETL (Extract, Transform, Load) capability.
An SAP HANA Data Lake
Businesses that already use SAP HANA for enterprise data and need to gain insight from big data scenarios face the challenge of high IT costs, because they have to acquire ever-increasing amounts of in-memory storage. Purging historical data is rarely an appealing or viable option, so offloading it to a more affordable solution that still allows access via SAP HANA can often be the answer.
Vital ‘hot’ data held in memory and ‘warm’ or ‘cold’ (rarely used) data relocated to a data lake can be accessed together via HANA. HANA stores data and performs operations in memory, so it is exceptionally fast but also expensive, which is why a data lake is ideal for situations where historical data access is required only intermittently.
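The split between hot and cold data can be sketched as a toy model in Python. This is not SAP's API or how HANA implements tiering. The `TieredStore` class, its `cutoff` rule and the row fields are all hypothetical, chosen only to show recent rows staying in a fast tier, older rows moving to a cheaper one, and a single query spanning both.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class TieredStore:
    """Toy model: hot rows stay in memory, older rows go to a cheaper cold tier."""

    cutoff: date                              # rows dated before this are "cold"
    hot: list = field(default_factory=list)   # stands in for in-memory storage
    cold: list = field(default_factory=list)  # stands in for the data lake

    def insert(self, row: dict) -> None:
        # Route each row to a tier based on its age.
        tier = self.cold if row["day"] < self.cutoff else self.hot
        tier.append(row)

    def query(self, start: date, end: date) -> list:
        # Touch the cold tier only when the range reaches back past the cutoff.
        sources = [self.hot]
        if start < self.cutoff:
            sources.append(self.cold)
        rows = [r for src in sources for r in src if start <= r["day"] <= end]
        return sorted(rows, key=lambda r: r["day"])


store = TieredStore(cutoff=date(2024, 1, 1))
store.insert({"day": date(2022, 6, 30), "amount": 120})
store.insert({"day": date(2024, 3, 15), "amount": 80})

combined = store.query(date(2022, 1, 1), date(2024, 12, 31))  # spans both tiers
recent = store.query(date(2024, 1, 1), date(2024, 12, 31))    # hot tier only
```

In this sketch the caller uses one `query` method for both tiers, and the slower, cheaper tier is consulted only for the occasional historical request.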