Big Data Methodology for Projects


Pedro Bonillo
ML and Big Data Consultant

Publication date: 2 years ago

A common mistake in software development is to think that a big data methodology gets in the way, that it does not work, and that, since life is chaos, it is best to build software without any plan at all. Projects are then run on the idea of “When the time comes, we’ll see”. The truth is that no quality software product can ignore either the process by which it is made or the product obtained.

Although agile methodologies are in fashion, they are nothing more than a response to the speed with which the market requires software to go into production. But it is important to be able to distinguish between tools, techniques, methods and methodologies.

A tool is something used to do a job, for example Apache Hadoop. A technique is an efficient or effective way to use the tool, for example using Apache Hadoop with a columnar format and writing the files compressed with Snappy. A method is a path, a way to get from one point to another, for example MapReduce with YARN, or Spark on top of Apache Hadoop. A methodology is a set of methods that has been reflected upon after being applied and that, when put into practice under the same premises, yields similar results.

In Big Data, many have tried to use orthodox methodologies, with attitudes such as “now everything is Big Data” or “Big Data does not work for me”. The reality is that a big data methodology must be agile and must start from the selection of a use case that solves a problem that cannot be solved with current tools, techniques, methods and methodologies.

The proposed Big Data methodology for projects is therefore an adaptation of Scrum (an agile, flexible methodology for managing software development) with 4 Sprints: the first two of 2 weeks each, the third of 4 weeks, and the last of 6 weeks, for a total of 12 weeks, with daily reviews (see Figure 1).

Figure 1: Methodology of Big Data projects

Sprint 1: Ingredients Search

In the first sprint, which we will call Ingredients Search, we identify the possible use cases that we seek to improve through Big Data management. This sprint lasts 2 weeks. A use case is selected against at least three criteria: volume (more than a million records), variability (many columns, some with null or NaN values), and speed (queries that traditional relational database systems cannot answer or that take more than 5 minutes to answer).
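The three criteria above can be turned into a simple screening check. The following is a minimal sketch, not part of the methodology itself: the `UseCase` fields, the `MIN_COLUMNS` cut-off and the rule that all three criteria must hold are assumptions made for illustration, since the text only gives the record and query-time thresholds.

```python
from dataclasses import dataclass
from typing import Optional

# Thresholds stated in the text.
MIN_RECORDS = 1_000_000        # volume: more than a million records
MAX_QUERY_SECONDS = 5 * 60     # speed: queries taking more than 5 minutes

# Assumption: the text only says "many columns", so this cut-off is hypothetical.
MIN_COLUMNS = 50


@dataclass
class UseCase:
    """Hypothetical description of a candidate use case."""
    name: str
    record_count: int
    column_count: int
    columns_with_nulls: int
    # Duration of the slowest query on the current relational system;
    # None means the query never returns a result.
    slowest_query_seconds: Optional[float]


def meets_volume(uc: UseCase) -> bool:
    return uc.record_count > MIN_RECORDS


def meets_variability(uc: UseCase) -> bool:
    # Many columns, at least some of them containing null or NaN values.
    return uc.column_count >= MIN_COLUMNS and uc.columns_with_nulls > 0


def meets_speed(uc: UseCase) -> bool:
    return (uc.slowest_query_seconds is None
            or uc.slowest_query_seconds > MAX_QUERY_SECONDS)


def qualifies(uc: UseCase) -> bool:
    """Assumption: a candidate must satisfy all three criteria."""
    return meets_volume(uc) and meets_variability(uc) and meets_speed(uc)
```

For instance, `qualifies(UseCase("card fraud", 5_000_000, 120, 30, None))` returns `True`, while a 20,000-row report answered in four seconds does not qualify.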

Sprint 2: Prepare Food

In the second sprint (Prepare Food), we identify, for each use case described in the previous sprint, the variables that we want to calculate (key performance indicators) or predict (Distillation Tier). This sprint lasts 2 weeks.

Sprint 3: Food Plating and Presentation

The third sprint (Food Plating and Presentation) is about selecting the most relevant use case for the organization, which we will call the Golden Goal: a use case that could not be solved with traditional relational database systems or with the available business intelligence tools, and that can be addressed quickly, at lower cost and with less effort using Big Data.

For this selected use case, it is necessary to obtain the support of senior management, which implies evangelizing them on the new architecture and the management of large volumes of data. It is also necessary to obtain the investment and expense budget needed to execute the next sprint and develop the use case with the necessary Big Data components.

The word “necessary” is emphasized because it is a common mistake to try to implement every component of the Big Data architecture for this use case. This third sprint lasts 4 weeks.

Sprint 4: Food Delivery

The last sprint (Food Delivery) is divided into 3 stages:

Packaging

In this stage, the components of the proposed Big Data architecture needed for the selected use case are installed and configured. The data is extracted, transformed and loaded (according to the defined types of information acquisition: real-time ingestion, micro-batch ingestion, and batch ingestion), then stored and used. Opportunities for improvement, such as increasing sales or optimizing processes, must be identified. This stage lasts 4 weeks.
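To make the difference between the three acquisition types more concrete, here is a minimal sketch of micro-batch ingestion in plain Python. It is an illustration under stated assumptions, not the ingestion mechanism of any particular Big Data component: the `micro_batches` helper is hypothetical, and it groups records by count, whereas real micro-batch systems often group by time window instead.

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def micro_batches(records: Iterable[T], batch_size: int) -> Iterator[List[T]]:
    """Yield consecutive groups of at most `batch_size` records.

    Hypothetical helper: batches are defined by record count only.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    iterator = iter(records)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return  # source exhausted
        yield batch


# Each batch would be handed to the transformation and loading steps.
for batch in micro_batches(range(5), 2):
    print(batch)
```

With `batch_size=1` each record is processed as soon as it arrives, which approximates real-time ingestion; a batch size equal to the whole dataset corresponds to batch ingestion; anything in between is a micro batch.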

Proof of delivery

This second stage consists of verifying that the specifications of the use case were met, in order to correct assumptions in the strategy and refine the Return on Investment. It lasts one week. Between this stage and the next, it is customary to consolidate what was learned and select another use case to which these lessons can be applied in a new cycle of the methodology (sprint 5, Figure 1).

Disposable serviceware waste

Finally, it is necessary to measure the results and get rid of the waste: document everything that worked and discard whatever did not work or could not be applied. This stage lasts 1 week.
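The three stages of this last sprint can be laid out as a small calendar. The sketch below uses the stage names and durations given above; the `stage_calendar` function and the week numbering relative to the start of the sprint are assumptions made for illustration.

```python
from typing import List, Tuple

# Stage names and durations (in weeks) as stated for the last sprint.
SPRINT_4_STAGES: List[Tuple[str, int]] = [
    ("Packaging", 4),
    ("Proof of delivery", 1),
    ("Disposable serviceware waste", 1),
]


def stage_calendar(stages: List[Tuple[str, int]],
                   start_week: int = 1) -> List[Tuple[str, int, int]]:
    """Return (stage, first_week, last_week), counted from the sprint start."""
    calendar = []
    week = start_week
    for name, duration in stages:
        calendar.append((name, week, week + duration - 1))
        week += duration
    return calendar


total_weeks = sum(duration for _, duration in SPRINT_4_STAGES)

for name, first, last in stage_calendar(SPRINT_4_STAGES):
    print(f"{name}: weeks {first}-{last}")
print(f"Sprint total: {total_weeks} weeks")
```

The stage durations add up to the 6 weeks allotted to the last sprint: Packaging occupies weeks 1 to 4 of the sprint, Proof of delivery week 5, and Disposable serviceware waste week 6.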

Final Thoughts

This methodology promotes the adaptation of the organization in an evolutionary and incremental way through the use of agile methods. This is why we suggest iterating through a phase in which the next use case is selected.

Through this methodology we have been able to implement successful use cases in international banks, payment processors for Uber and Google, and fintech companies, demonstrating that a period of twelve weeks, or three months, is enough to implement a successful Big Data use case. As an example, the AWS team implements its well-known Data Lab using a methodology very similar to the one described above.
