Data Lakehouse


Pedro Bonillo
ML and Big Data Consultant

1 year ago

Publication date

Data warehouse vs Data lake

One of the biggest pain points of Big Data systems is supporting the full set of create, read, update and delete (CRUD) operations. Updates and deletes are the expensive ones: updates degrade performance badly, especially in near-real-time (NRT) systems, and many Big Data systems prefer to delete data logically (marking it as deleted) rather than physically remove it. This is why it is so important to understand what the data lakehouse and its architecture are.
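To make the difference between logical and physical deletion concrete, here is a minimal sketch in plain Python. The record layout and the names `deleted_at`, `logical_delete`, `active` and `physical_delete` are assumptions chosen for illustration; they do not come from any particular Big Data product.

```python
from datetime import datetime, timezone


def logical_delete(rows, user_id):
    """Mark matching rows as deleted; nothing is removed from storage."""
    now = datetime.now(timezone.utc).isoformat()
    return [
        {**r, "deleted_at": now}
        if r["user_id"] == user_id and r["deleted_at"] is None
        else r
        for r in rows
    ]


def active(rows):
    """Every reader has to filter out logically deleted rows."""
    return [r for r in rows if r["deleted_at"] is None]


def physical_delete(rows, user_id):
    """Actually drop the rows; on real storage this means rewriting data."""
    return [r for r in rows if r["user_id"] != user_id]


rows = [
    {"user_id": 1, "event": "login", "deleted_at": None},
    {"user_id": 2, "event": "login", "deleted_at": None},
    {"user_id": 1, "event": "purchase", "deleted_at": None},
]

soft = logical_delete(rows, user_id=2)
hard = physical_delete(rows, user_id=2)

print(len(soft), len(active(soft)), len(hard))
```

The logical delete is cheap to write but the data is still stored and every query must remember the filter. The physical delete leaves nothing behind but requires rewriting the stored data.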

This situation has also caused great discomfort among the business intelligence teams that manage a data warehouse. They already have all their reports working, and now they are being asked to bring a data lake and distributed computing into their extract, transform and load (ETL) processes.

It even happens that the BI team comes to see the data engineering team as an enemy, fearing that a new technology will replace them and push them out of their data warehouse comfort zone. The problem is so widely recognized in the industry that books have been written to show that the data warehouse does not contradict the data lake but complements it.

The good news is that this WAR is OVER: why use Databricks if I already have Snowflake, why use S3 if I already have Redshift, why use Hadoop if I already have my data warehouse.

Data lakehouse concept origin

The Databricks team, with Delta Lake, and the Uber team, with Hudi, are the ones who brought the war to an end. Both Delta Lake and Hudi aim to let files stored in S3, Google Cloud Storage, Apache Hadoop (HDFS) and Azure Data Lake Storage support data modifications and deletions efficiently.

This is accomplished by recording file changes in a manifest (Delta Lake calls it a transaction log, Hudi a timeline) and resolving data modifications and deletions against it before the data files are queried. As a result, there is no longer a need to run a data lake and a data warehouse separately: both can coexist in a single technology that supports CRUD (Delta Lake or Hudi). This is how the lakehouse concept was born.
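The idea of resolving changes through a log before reading the files can be sketched with a toy table in plain Python. This is only an illustration of the general mechanism, not the actual file format or API of Delta Lake or Hudi; the class `ToyTable`, the log name `_log.jsonl` and the `part-*.json` file names are all hypothetical.

```python
import json
import tempfile
from pathlib import Path


class ToyTable:
    """Immutable data files plus an append-only log of add/remove actions."""

    def __init__(self, root):
        self.root = Path(root)
        self.root.mkdir(parents=True, exist_ok=True)
        self.log = self.root / "_log.jsonl"  # hypothetical log file name
        self.log.touch()
        self._next_id = 0  # toy counter; assumes a fresh directory

    def _write_file(self, rows):
        # Data files are never modified once written.
        name = f"part-{self._next_id:05d}.json"
        self._next_id += 1
        (self.root / name).write_text(json.dumps(rows))
        return name

    def _commit(self, added, removed):
        # One commit is one appended line listing files added and removed.
        with self.log.open("a") as f:
            f.write(json.dumps({"add": added, "remove": removed}) + "\n")

    def live_files(self):
        # Replay the log to find the files that form the current version.
        live = []
        for line in self.log.read_text().splitlines():
            actions = json.loads(line)
            live = [f for f in live if f not in actions["remove"]]
            live.extend(actions["add"])
        return live

    def read(self):
        # Readers resolve the log first, then touch only the live files.
        rows = []
        for name in self.live_files():
            rows.extend(json.loads((self.root / name).read_text()))
        return rows

    def insert(self, rows):
        self._commit([self._write_file(rows)], [])

    def _rewrite(self, transform):
        # Copy-on-write: only files whose content changes are replaced.
        added, removed = [], []
        for name in self.live_files():
            old = json.loads((self.root / name).read_text())
            new = transform(old)
            if new != old:
                removed.append(name)
                if new:
                    added.append(self._write_file(new))
        if added or removed:
            self._commit(added, removed)

    def update(self, key, value, changes):
        self._rewrite(
            lambda rows: [{**r, **changes} if r[key] == value else r for r in rows]
        )

    def delete(self, key, value):
        self._rewrite(lambda rows: [r for r in rows if r[key] != value])


table = ToyTable(tempfile.mkdtemp())
table.insert([{"id": 1, "city": "Caracas"}, {"id": 2, "city": "Montevideo"}])
table.insert([{"id": 3, "city": "Madrid"}])
table.update("id", 2, {"city": "Punta del Este"})
table.delete("id", 3)
print(table.read())
print(table.live_files())
```

In this sketch an update or delete never edits a file in place: it writes a replacement file and records the swap in the log. The superseded files remain in storage until some separate cleanup removes them, which is why readers must always go through the log rather than list the directory.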

Data lakehouse

A data lakehouse is therefore a data management architecture that combines the benefits of a traditional data warehouse and a data lake. It seeks to merge the ease of access and support for business analytics found in data warehouses with the flexibility and relatively low cost of the data lake.

Businesses face new demands in advanced analytics, a challenge that pushes organizations and people to give their best. The arrival of this new concept only reinforces the need for constant evolution and continuous improvement.

It represents an important leap forward: a single architecture handles all types of data, in both streaming and batch, and allows artificial intelligence models to be integrated alongside complete reporting.

In other words, it represents the emergence of a new reference solution in Advanced Analytics.

