Cole MacDonald
February 23, 2022

How We Successfully Scaled Data Processes at Mero

It wasn’t too long ago that my Co-Founder and I were poring over tables of data, manually deriving insights from handwritten SQL statements and pivot tables. Data is challenging to handle when you’re a small team, especially when it’s as core to the product as it is at a company like ours. As we scaled and many of our processes became more formal, we realized that the way we processed data insights had largely stayed the same: teams manually reviewed and evaluated interesting outputs before sending them out to customers. This was fine when Mero had just a few customers and simple algorithms, but as we grew, both the volume of data and the complexity of the algorithms increased. Our data team soon became backlogged with complex requests, and it was clear that we needed to change this process to something more scalable.

Decoupling web development and data science

We have four major teams at Mero: the business team, the web team, the hardware team, and the data team. Typically, the data team worked directly with the business team to deliver insights, passing interesting outputs to the web team when needed. This process worked fine in the early stages, when the number of algorithms included in the web platform was limited and each could easily be translated into any language of choice. However, as we grew and our algorithms became more complex, we at times lacked the checks and balances that ensured that data was consistently accurate and performant. As a result, we quickly found that the data team had become siloed from the web and hardware teams (in other words, the rest of the product!).

Because the data team didn’t have direct visibility into our application’s algorithms, they trusted that the web development team would keep them in check. By the same token, since the algorithms were coming from the data team, the web team assumed that these algorithms would always be working and ready for new features. This isn’t the fault of either team but the nature of startups: things move fast, and you rely on others for their results. In the end, it was clear we had to fundamentally change our data processes to ensure we could properly scale the core of our product.

Improving the speed of development through automation

First, we looked at how we processed data. One of the biggest bottlenecks to analyzing data was that much of the processing was done locally on our data team’s laptops. Moving the processing to the cloud allowed the team to get from raw data to output up to 10x faster. This introduced some overhead but also allowed the team to train models directly against databases, a huge benefit compared to manually downloading the data.

Creating data pipelines

Next, moving the processing to the cloud allowed the team to take the algorithms they were running remotely and quickly turn them into automated data pipelines. Now, instead of packaging up algorithms locally and sending them to the web development team, the data team could be involved in every aspect of how their algorithms would be used, from QA to CI/CD. Furthermore, because the process was automated, we could leverage logging and performance tools to automatically trigger alerts if algorithms ever lagged in performance or data quality.
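To make the alerting idea concrete, here is a minimal sketch in Python of a pipeline step that checks itself. It is an illustration under our own assumptions rather than Mero’s production code: the names run_step and StepResult, the one-second time budget, and the 5% missing-value threshold are all hypothetical.

```python
import logging
import time
from dataclasses import dataclass, field
from typing import Any, Callable, List

logger = logging.getLogger("pipeline")


@dataclass
class StepResult:
    """Outcome of one pipeline step (hypothetical structure)."""
    name: str
    output: List[Any]
    seconds: float
    alerts: List[str] = field(default_factory=list)


def run_step(
    name: str,
    algorithm: Callable[[Any], Any],
    rows: List[Any],
    max_seconds: float = 1.0,      # assumed performance budget
    max_null_ratio: float = 0.05,  # assumed data-quality threshold
) -> StepResult:
    """Run an algorithm over rows, then check runtime and output quality."""
    start = time.perf_counter()
    output = [algorithm(row) for row in rows]
    elapsed = time.perf_counter() - start

    alerts = []

    # Performance check: flag the step if it exceeded its time budget.
    if elapsed > max_seconds:
        alerts.append(f"{name}: took {elapsed:.2f}s (budget {max_seconds:.2f}s)")

    # Data-quality check: flag the step if too many outputs are missing.
    # An empty output is treated as fully missing.
    null_ratio = sum(v is None for v in output) / len(output) if output else 1.0
    if null_ratio > max_null_ratio:
        alerts.append(f"{name}: {null_ratio:.0%} of outputs missing")

    # In a real pipeline this is where a paging or chat alert would fire;
    # here each alert is only written to the log.
    for alert in alerts:
        logger.warning(alert)

    return StepResult(name=name, output=output, seconds=elapsed, alerts=alerts)
```

Calling run_step("occupancy", some_algorithm, rows) returns the outputs together with any alerts raised, so the same check runs on every execution instead of relying on someone to review the results by hand.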

The result: faster value to customers

This process became so integral to Mero that we quickly moved all our algorithms to these data pipelines. Now, all of our data outputs flow not only into our application but also into a data warehouse where we automatically generate custom reports and analytics for our customers. The new process almost completely eliminated the need to create data outputs manually, so the team could focus on identifying new data outputs rather than maintaining old ones. We soon noticed a strong increase not only in data quality but also in the rate at which we were creating innovative new data outputs for customers.

Automating data pipelines created an environment where the data and web development teams needed to work collaboratively on features. The new process naturally broke down barriers between the teams, not only increasing collaboration but also boosting data quality and performance, and ultimately producing more effective algorithms.

As a founder, it’s important to continuously evaluate your team structures and processes while recognizing that what worked for you just a few months ago might turn into a barrier as you scale. One thing we know for certain: we are just scratching the surface when it comes to unique insights related to our customers. With our new data structure that will improve output by more than 10x, we’ve already been able to accelerate many of our data initiatives, bringing new features to our customers’ hands faster than ever before.

Stay tuned: we’ll have lots of exciting feature announcements to share in the coming weeks.
