My first official Banks rant is about “data lake”. Anyone who wants to annoy me just has to say “data lake” to me. The term drives me crazy.
Why? Well, because people throw it out there like they understand what it means, and they usually don’t. It’s almost like the term “blockchain”. Companies say they want it, but most don’t know what it means. Proof? When a company puts it in their company name, their stock goes up. For instance, On-Line Plc put it in their name and the stock went up 394%, and an iced tea company changed their name to “Long Blockchain” and their stock went up 275%. You getting my point?
When I first heard “data lake” I had no idea what it was, so I asked. The concept was that streams and rivers of data flow into a central location, creating a “data lake”. The story goes that James Dixon, the CTO of Pentaho, coined the term.
Gartner defines it as “a collection of storage instances of various data assets additional to the originating data sources. These assets are stored in a near-exact, or even exact, copy of the source format. The purpose of a data lake is to present an unrefined view of data to only the most highly skilled analysts, to help them explore their data refinement and analysis techniques independent of any of the system-of-record compromises that may exist in a traditional analytic data store (such as a data mart or data warehouse).”
It’s a concept. It is a way of doing something. It isn’t a product. It is an enabler. It should be a benefit. One truth: a data lake is for analytics needs, and everyone wants to do it differently. So understand what you are getting yourself into… understand what you need… ask questions.