With 8,000 employees at HubSpot, keeping everyone rowing in the same direction is no small feat. Hundreds of teams rely on knowing which data assets to trust and where to find them. At this scale, it takes three separate analytics engineering teams to build data trust, keep communication flowing, and make sure everyone has what they need to make smart decisions.
Tony Avino's team is one of them. His 11-person analytics engineering team is the trusted source for HubSpot’s product data, which is used across go-to-market and EPD teams to inform strategy and decisions.
We sat down with Tony to chat about data culture, measuring ROI for analytics engineering, governance, and career development strategies for his team.
I would say that a data culture is an organizational mindset. As you think about the different altitudes that exist at a company, you've got individual contributors, management, and execs, all of whom are trying to work towards a North Star goal. Oftentimes there are multiple layers to implementing a data culture; it's not as simple as one team wanting to drive data change, so it really has to be done at an organizational level — and that includes leadership commitment to say, "We must double down on becoming a more data-driven organization, where decisions are backed by data and insights."
To build a data-driven culture, you need four key elements: leadership commitment, data literacy, appropriate tooling, and data accessibility across the organization. From there, you can calibrate on where you’re at in each of those buckets and figure out which one needs more attention.
Yeah! I manage a team of analytics engineers and we have a BSA as well who manages Amplitude for product analytics.
My team’s overarching umbrella domain is product data; we sit at the center of product usage and outcomes data. So if you think about what engineering teams produce as part of our products, all that data gets consumed by my team, gets transformed, and then gets certified or output for functions like marketing, sales, customer success, and other product teams to use.
Within our team, we have two groups:
Product core, which manages a diverse range of product domain data, encompassing areas such as commerce, product limits, feature usage tracking, and CSAT information. This data is crucial in empowering teams to improve product development cycles, optimize pricing and packaging strategies, and monitor key metrics like usage and activation and their impact on retention outcomes.
Product outcomes, which oversees a more specialized area of customer product usage data. This team takes data directly from our product to get a better sense of what a customer would see when they log in. For example, maybe it’s data they see on the number of forms that they’ve submitted, or maybe it’s the number of contacts they have. This team then brings what our customers experience back into our reporting ecosystem so that other teams can consume it.
To answer this, I like to start with the cupboard analogy. The data engineers build a cupboard, and there's nothing in it; the analytics engineers then sort the cups, bowls, and plates so that the analysts, who need “xyz information,” know exactly which part of the cupboard to pull from.
When you think about it from that perspective, the analytics engineer is there to streamline time-to-insight, or decision velocity. It’s two-fold: how do we model the data to be highly accurate, with a process that has high uptime and low latency, and also in a way that produces value for the consumer? And there are a couple of ways you can quantify that.
You can survey your customers. We typically like to do an ENPS score — like an NPS score — where we ask our customers: how easy is it for you to do your job on a day-to-day basis? How easy is it to find the data you need? Do you trust the data?
You can also track time-to-insight, by tracking the time from data request to answer, or how quickly your team can deliver on larger projects, like next month’s forecast or setting up recurring reporting. This type of measurement happens a lot in reporting that goes up to management or board meetings.
I like to position any deliverable that my team outputs as a product. So if you think about a product release, there's not only the communication of “Hey, this thing exists” but then there's the follow-up enablement about “how do I use this thing?”
From a mental model perspective, treating your deliverables as products adds weight, especially as you think about long-term adoption and management. You don't want to put out an asset that nobody uses because they didn't know it existed or didn't know it was being kept up to date.
We aim to get the socialization out to other stakeholders or consumers in a way that drives adoption, highlights the effort behind getting the information, and makes clear that we’re going to treat it as a living, breathing asset and constantly evolve it.
If you bring something new to the ecosystem, you have to manage it. It’s a very real trade-off that you have to make.
From ingestion all the way to insight, there are all these different roles in the analytics ‘value chain.’ So, “AEs within the value chain” refers to the clarification of what roles and responsibilities analytics engineers have in that process. It’s meant to clarify the value that each role should add along the way as part of how data engineers, analytics engineers, and analysts all collaborate and develop together.
The concept of the value chain
A value chain defines the roles, responsibilities, and hand-off points of data engineers, analytics engineers, and analysts from ingestion all the way to insight. This helps maintain data-asset hygiene by clarifying: Who should transform the data? At what point should analytics engineers request support? What should analytics engineers own?
For example, how we think about our value chain is: analytics engineers take the raw data from the data warehouse, stage it, and put any intermediate models in place. Then we output what we call marts: assets that can be used universally across different functions.
And those functions (marketing, sales, customer success) can take the mart, extend from it and utilize it for their own purposes. The key is that we're all pulling from the same mart at the end of the day.
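As a rough illustration of that staging-to-mart flow, here is a minimal sketch in Python; the table names, columns, and metrics are made-up examples, not HubSpot’s actual models.

```python
# Minimal sketch of the staging -> mart pattern described above.
# Table and column names are hypothetical, not HubSpot's actual schema.
import pandas as pd

# "Raw" data as it might land in the warehouse from ingestion.
raw_feature_events = pd.DataFrame({
    "ACCOUNT_ID": [101, 101, 202],
    "EVENT_TS": ["2024-01-03", "2024-01-05", "2024-01-04"],
    "FEATURE": ["forms", "forms", "contacts"],
})

# Staging: standardize names and types -- no business logic yet.
stg_feature_events = (
    raw_feature_events
    .rename(columns=str.lower)
    .assign(event_ts=lambda df: pd.to_datetime(df["event_ts"]))
)

# Mart: an aggregated, certified asset that marketing, sales, and
# customer success can all pull from instead of re-deriving usage.
mart_feature_usage = (
    stg_feature_events
    .groupby(["account_id", "feature"], as_index=False)
    .agg(usage_count=("feature", "size"),
         last_used_at=("event_ts", "max"))
)

print(mart_feature_usage)
```

The point of the pattern is the handoff: the mart is the shared, managed asset, and downstream functions extend from it rather than rebuilding their own versions from raw data.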
This helps maintain data-asset hygiene, versus an analyst adding a single column to a table without our visibility; they’ve now brought in something new that's unmanaged, unmonitored, but exists in this chain.
So that's where these handoff points are really important to clarify, to say, "Hey, we've got you covered from Point A to Point C; proceed from Point C onward as you wish, but make sure you're using the same asset for whatever topic you're reporting out on."
Lineage is an interesting one. There are a lot of great tools that can give some of that oversight into how everything's being used and referenced across the business. The challenge is, how do you govern that?
We do have internal data lineage tools that show us if a source is being used by, let’s say, 30 different teams and if it’s tagged as a non-certified asset. We can then work backwards and say: “Hey, team ABC, I see you're using this thing that's not managed or monitored. Can we move you over to this asset instead?”
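HubSpot’s lineage tooling is internal, but the shape of the check Tony describes is roughly the following; the metadata fields and the team-count threshold here are illustrative assumptions.

```python
# Hypothetical sketch: flag non-certified sources with wide usage so the
# team can reach out and migrate consumers to a certified asset.
# The metadata structure and threshold are illustrative assumptions.
from dataclasses import dataclass

@dataclass
class SourceUsage:
    source_name: str
    certified: bool
    consuming_teams: set[str]

def flag_uncertified_hotspots(usages: list[SourceUsage],
                              min_teams: int = 10) -> list[SourceUsage]:
    """Return non-certified sources consumed by at least `min_teams` teams."""
    return [
        u for u in usages
        if not u.certified and len(u.consuming_teams) >= min_teams
    ]

# Example: the first source would be flagged for outreach.
usages = [
    SourceUsage("raw.forms_events", certified=False,
                consuming_teams={f"team_{i}" for i in range(30)}),
    SourceUsage("mart.feature_usage", certified=True,
                consuming_teams={"marketing", "sales"}),
]
for source in flag_uncertified_hotspots(usages):
    print(f"{source.source_name} is uncertified but used by "
          f"{len(source.consuming_teams)} teams")
```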
The governance piece is more challenging. You may have a process that outlines what teams should do, but setting up systematic controls to make it so that you're not continuously adding junk or going against the grain of how you envision your assets and lineages for the company is a larger project.
Yeah! And not everything needs to be certified right? You can have a sandbox as long as it's a protected sandbox.
I think, in general, the tricky thing with data is that just because you can get a result doesn't mean it's the right result. There’s a constant data hygiene battle that requires thoughtful governance and audits on a regular basis to evaluate: How many duplicate assets do we have? How many assets are not documented? There are multiple layers here, going back to your question on analytics engineering's value.
There's a cost aspect with data warehousing as well. If more and more people add assets into the data warehouse and it goes unchecked, your data warehouse costs are gonna hockey-stick up and to the right. So then it becomes a question of: how can we make sure that we manage our costs so that what we build is extensible, reusable, but also optimized for process?
As an analytics engineering team, it’s part of the responsibility to identify anything that is really costly and come up with a better way to refactor it so that it runs more efficiently and is extensible for others to consume.
I think it's both in the sense of: 1) you need to have process controls and 2) you need to be able to track against your cost.
There's also a culture layer around making sure that individuals who are contributing to a data warehouse are enabled to do so in a way that keeps costs and performance in mind.
And, I think the other side of it is: trust, but verify. I'm a strong advocate for putting data outputs through a process similar to software engineering methodologies. This includes running new assets through a CI/CD process to get peer-reviewed and to check how long they're going to take to run and what the estimated cost is, with things getting flagged and further investigated if they cross a certain threshold.
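As a sketch of what that kind of CI gate might look like in practice; the thresholds, model metadata, and cost estimates below are illustrative assumptions, not HubSpot’s actual pipeline.

```python
# Hypothetical CI-style gate: flag new or changed models whose estimated
# runtime or warehouse cost crosses a review threshold. Fields and
# numbers are illustrative assumptions, not HubSpot's actual pipeline.
from dataclasses import dataclass

@dataclass
class ModelEstimate:
    name: str
    est_runtime_min: float   # estimated run time in minutes
    est_monthly_cost: float  # estimated warehouse cost in dollars

RUNTIME_THRESHOLD_MIN = 30.0
COST_THRESHOLD_USD = 500.0

def review_flags(models: list[ModelEstimate]) -> list[str]:
    """Return human-readable flags for models that need deeper review."""
    flags = []
    for m in models:
        if m.est_runtime_min > RUNTIME_THRESHOLD_MIN:
            flags.append(f"{m.name}: runtime {m.est_runtime_min:.0f} min exceeds "
                         f"{RUNTIME_THRESHOLD_MIN:.0f} min -- investigate before merge")
        if m.est_monthly_cost > COST_THRESHOLD_USD:
            flags.append(f"{m.name}: est. cost ${m.est_monthly_cost:.0f}/mo exceeds "
                         f"${COST_THRESHOLD_USD:.0f}/mo -- investigate before merge")
    return flags

if __name__ == "__main__":
    changed = [
        ModelEstimate("mart_feature_usage", est_runtime_min=12, est_monthly_cost=80),
        ModelEstimate("mart_all_events_unfiltered", est_runtime_min=95, est_monthly_cost=1200),
    ]
    for flag in review_flags(changed):
        print(flag)
    # In CI, a non-empty flag list could fail the build or require sign-off.
```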
If you feel like there's a gap at your company with data literacy, tooling, or data culture, being an advocate for that change is really important. Change doesn't always come from the data leader, and without it a company can stay stagnant. But it is possible to drive or influence change through grassroots efforts.
Yes.