Case Study: Data Lakes, Data Swamps, Data Warehouses – How to Use Them with 3C Data Logic
“Well-managed data provides the foundation for a successful organisation.”
Data held by businesses is growing at an unprecedented rate, driving many organisations to focus on cost-effective and efficient ways to manage and process it. This is key to maximising the value that they can derive from it.
Historically, structured data (data held in databases) has been exported into centralised data warehouses, from which the organisation can then derive insight. However, unstructured data (data held in email, SharePoint, shared drives, etc.) often represents the majority of the data an organisation holds. It has to be managed and curated too, and it is not well suited to a data warehouse.
Data lakes are being heralded as a potential way of tackling this issue, as they can provide a centralised repository for both structured and unstructured data. However, they are a major investment, and it is worth pausing to reflect on what you are trying to achieve. Too often, a poorly managed data lake quickly becomes a ‘data swamp’.
We are often asked how innovative solutions like 3C Data Logic compare to data lake and data warehouse solutions. This article sets out to provide an answer, inform your data strategy and perhaps help you determine an approach to gaining management control and business value from your data that best suits your organisation’s needs.
What is a data lake?
A data lake is a centralised location within the organisation in which to store your data, regardless of its source or format. It can provide a cost-effective, high-volume storage area from which data can be analysed and processed at a later stage in its life cycle.
Importantly, a variety of data processing tools are needed to add structure to, curate and extract value from the data that you are storing in the data lake. Failure to do this will result in the creation of a “data swamp”, an outcome that must be avoided!
What is a data warehouse?
A data warehouse is a centralised store of structured data, typically generated by the departments of an organisation from the databases of the software applications they use. It is generally used for business intelligence purposes, providing reporting and analysis through visualisation tools such as Power BI, Qlik and Tableau.
Because data warehouses are highly structured, they can take time to assemble and, as a result, can lack the flexibility that the modern data-led organisation requires for processing new sources of data. This is especially true of unstructured data, which can often account for around 80% of the data an organisation holds.
Implementation challenges and considerations for data lakes and warehouses
Many data lake and warehouse projects fail because they are IT-led projects with no clear linkage to business objectives or operational processes. They are often conceived without a clear understanding of the outcomes required or the vital need for the quality, integrity and compliance of the data they hold.
The solution to ensuring data is managed to gain the greatest value
3C Data Logic in harmony with your data warehouse and data lake
3C Data Logic can be used to manage, curate, analyse and ensure the compliance of data held in both a data lake and a data warehouse, in the ways that modern data analysts, data scientists and other business data consumers crave. Importantly, 3C Data Logic has been specifically designed for use by social housing organisations and offers two major benefits:
- It has a tightly integrated data dictionary/catalogue that defines what is expected from the data held in any data source. This is vital for data quality assurance purposes.
- Its ability to automate and facilitate data quality and compliance checks ensures the accuracy and completeness of the data held. By identifying where, and by whom, poor data is being allowed to enter the system, it also supports the essential culture change needed by the organisation.
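To make the two points above concrete, here is a minimal Python sketch of how a data dictionary can drive automated quality checks. It is purely illustrative and does not show how 3C Data Logic itself is implemented. The field names (property_id, tenancy_start, weekly_rent, entered_by, notes), the rules attached to them and the function names are all assumptions invented for this example.

```python
from dataclasses import dataclass
from datetime import date
from typing import Any, Callable, Optional


@dataclass(frozen=True)
class FieldRule:
    """One data dictionary entry: what is expected of a single field."""
    name: str
    expected_type: type
    required: bool = True
    check: Optional[Callable[[Any], bool]] = None  # optional extra rule


# Hypothetical dictionary for a tenancy record (illustrative names only).
TENANCY_DICTIONARY = [
    FieldRule("property_id", str, check=lambda v: v.strip() != ""),
    FieldRule("tenancy_start", date, check=lambda v: v <= date.today()),
    FieldRule("weekly_rent", float, check=lambda v: v >= 0),
    FieldRule("entered_by", str),
    FieldRule("notes", str, required=False),
]


def validate(record: dict, dictionary: list) -> list:
    """Return a list of issues; an empty list means the record passes."""
    issues = []
    for rule in dictionary:
        value = record.get(rule.name)
        if value is None:
            if rule.required:
                issues.append(f"{rule.name}: missing")
            continue
        if not isinstance(value, rule.expected_type):
            issues.append(f"{rule.name}: expected {rule.expected_type.__name__}")
            continue
        if rule.check is not None and not rule.check(value):
            issues.append(f"{rule.name}: failed rule")
    return issues


def issues_by_source(records: list, dictionary: list) -> dict:
    """Group issues by whoever entered the record, to show where poor data originates."""
    report = {}
    for record in records:
        problems = validate(record, dictionary)
        if problems:
            source = record.get("entered_by") or "unknown"
            report.setdefault(source, []).extend(problems)
    return report
```

Running validate over every record in a source gives a completeness and accuracy check against the dictionary, while issues_by_source shows which team or system is letting poor data in, which is the kind of feedback that supports culture change.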
In conclusion, 3C Data Logic can either co-exist with a data lake and/or a data warehouse or completely replace them. Its data dictionary/catalogue functionality and its tightly integrated ability to automate and facilitate data quality and compliance checks provide distinct benefits over other data management, compliance and governance solutions.
Cost implications, the time taken to achieve required outcomes, and the return on investment achieved are all important considerations. A review of 3C Data Logic as part of your strategy to become a data-led organisation is highly likely to reveal that costs can be reduced and that risks become far easier to control.