Imagine yourself standing at a lake, staring into the distance. You enjoy the calm and the silence, and hear the water gently lapping at the shore. You close your eyes and hear your own breath. This feels safe, structured and relaxing.
Deep & Desiloed
For a long time, Rabobank’s approach to data was siloed, until the second-largest bank in the Netherlands decided to create a data lake in the cloud. Martijn Groen led this challenging operation.
But who makes sure this lake stays as tranquil as it is? Who prevents it from flooding, becoming polluted or drying out? Who monitors the balance between the lake and its green surroundings? Martijn Groen is the guardian whose job is exactly that, except his lake is filled with data. Zeroes and ones, numbers, charts, graphs: everything that can be seen as data is in this lake. It is a massive pile of useful, and at times useless, knowledge. As Lead IT Manager Enterprise Data Lake, Martijn safeguards Rabobank’s data and the processes around it.
The bank uses its data lake to develop innovations that help people. Groen: “For now we only have internal data. But looking to the future, we would love to be able to add third-party data. Think of farmland suffering from a water shortage: in the future we could analyse data from soil moisture sensors to determine where to water and how much. Of course we still need people to interpret data, but having the computing power and technology to combine human smarts with technological wit will only help move our society forward.”
From warehouse to lake
At first, data was not easily shared between departments. Rabobank had a siloed approach in which each domain had its own data warehouse: risk management had one, as did investments and other specialities. If you wanted to compare numbers or use another department’s data, it became a laborious project. You had to describe precisely which data you needed; it then had to be retrieved from the relevant warehouse, consent had to be given (or not), and security measures had to be taken to ensure safe handling. Only then could you use the data within your own warehouse. The result was copies of data sets scattered across the organization, and once the legal retention period for the data had expired, it was difficult to make sure every copy had been deleted.
Martijn and his teams were tasked with coming up with a scalable, enterprise-wide data solution. Groen: “In previous data projects we encountered challenges with our streaming solutions and data availability, but also with our data governance. We needed the cloud for the scalability of our data solutions. That data needs to be governed to avoid a data swamp, where you can’t find the data you need and can’t fully trust it in terms of compliance and quality.” In the end they opted for an open-source solution that could be tailored to their own needs, combining skills and knowledge from tech providers such as Microsoft, AWS and Google. Together they came up with a solution that met their requirements for customer experience and process improvement.
Black swan
Rabobank operates in 37 countries and has nearly 10 million clients. The bank offers services such as personal and corporate banking, both locally and internationally. Safety and trustworthiness were key in developing the cloud solution. “Clients see their bank as a confidant. They trust it with sensitive data, their money, personal details and investments. So whether it concerns privacy or the legal retention periods for stored data, as a bank you need to uphold that trust. And that means controlling the risks and making sure everything works.”
Open source is not always considered safe. “We were very conscious and scrupulous when it came to security. We needed to know whether or not we could build a wall around it. And we could. We covered all our bases. We have our own server farm and our own firewall, we have put up several roadblocks, and many alarms will be triggered before you get to the gold”, says Martijn. “Of course nothing is 100% safe, but I like to go by the theory of the black swan: some events you just cannot predict. That’s why I expect the unexpected, and why Rabobank secures its data with multiple layers of defence.”
Looking at his cloud environment, Martijn is very happy with his decision. “Unlike buying hardware, with the cloud you pay as you go. If you need to ramp up CPU for a calculation, you just buy it. If you need less, you scale down. Easy as that.” For the people in his team, this means a new mindset and possibly even a new skill set. Working with data at this scale, they need to be very aware not only of security but also of costs. Martijn: “We process enormous amounts of data, so our processes are comprehensive and intensive. Our current lake holds about 1 petabyte of data. To manage this, you need to be conscious of costs, processes, scheduling and desired outcomes. There aren’t many IT professionals with this combined skill set. That’s why companies like WAES help us find people with the right skill set who can easily be integrated into our processes and the team. Internally, we look for the right people to train and bring into the team.”
Finding data
Back to the lake. The warehouses were phased out and all the data was stored in one big pile. Surprisingly, it is now a lot easier to find the right data, as long as you know where to look. Copies of data sets in use are stored in different places, depending on how long or how often they are needed. On the lakeside, a data shop has been built. Whoever needs data fills out a form specifying what they need. Once the order is processed, consent has automatically been given, and the data is then used directly from the lake. So when compliance says data needs to be erased, it can be done at the push of a button.
Structured and unstructured data from every corner of the organization is constantly added to the lake. Data producers (source systems such as Siebel) each get their own place in the lake and must describe thoroughly what kind of data they add. As the source of the data, they provide structure through metadata. Metadata was necessary in the old approach as well, but providing it often proved too complex or time-consuming for the older systems.
Bright future
The future looks bright for the lake manager: copious amounts of data all in one lake, ready to be analysed and used in calculations, growing awareness of the importance of sensible data management, and happy people working on the lakeside. Groen concludes: “We are just at the beginning. The possibilities are endless. Real-time data analysis and use are already starting to happen, and proactive data management is based on insights and ever more data collection. The job will get harder, but it will also get so much better, nicer and more exciting.”
Thanks for reading.
Camilo Parra Gonzalez
Account Manager