Ryan Buick - Making Data Easier (Jake Peterson)

    8:03PM Jul 15, 2022

    Speakers:

    Keywords: data, work, analytics, company, tools, systems, business, build, problems, security, sas, stack, team, staffing, super, staffing levels, projects, people, modern, run

    All right. So welcome to the Canvas Podcast, where we bring together business and data leaders to talk about how to make data easier for everyone. Today, I'm super excited to have a friend and former coworker from Flexport, Jake Peterson, join the show. He is a former analytics engineering manager at Facebook, a former analytics manager at Flexport, and now the head of data at Vanta. Thanks so much for coming on, Jake.

    Thank you for having me, Ryan. I'm very happy to talk with you.

    Awesome. Well, let's start with maybe a quick overview of yourself.

    Sure. I've been working in this data science and analytics game for probably 16 years now, something like that. I started at a company called Acxiom doing predictive analytics: statistical programming, marketing consultation, and marketing analytics for all kinds of clients in the direct marketing and email marketing space. That's really where I got my chops. From there, I've had a pretty weird career. I ran analytics and started a few startups. Back then, in 2009, a lot of the open source packages for analytics didn't exist. For most of the tooling, you would buy very basic tools or you'd have to roll pretty much all of it yourself. And getting data out of your other systems, your other SaaS cloud systems, was super difficult, because most of them didn't have APIs or any tools. So at the other companies I worked at, we were just like, well, it'd be great if we could have data, we could do all this analytics, but we had access to none of it. So I kind of learned the lesson there that if you don't have any raw data flowing through your systems, there's no analytics to do and there are no operational improvements to make. It's a real challenge. I went to Facebook after that and got in on the quote-unquote big data thing. I was an early data science hire at Facebook in 2011. I don't know if that counts as early; I guess maybe now it does. But the company was a good size: when I joined, it was over 1,000 people. I did a lot of different things there, and built the search data science team and the platform data science teams. Then I had a brief stint in venture capital doing some data science, which is a really long story.
And then Flexport. I'd say, in general, most of my career has been across kind of everything in the analytics and data science space, but it's largely been: how do we take this raw data and repackage it into something that can change and drive the business?

    Awesome. And we ran through a little bit of the different places you've been at. What made you get started with data in the first place? What drew you to it?

    Oh, so I took simulation and statistics in college, and I wasn't really all that interested. My first job out of college was actually a production engineering role, producing emails and websites and stuff like that. I spent a good part of my days writing HTML. It was a nice job, but it really wasn't for me. A lot of the work that I actually got excited about was working with data. We'd have accounts, and I spent a lot of time working with data for clients. Then I connected with our analytics team, which was run by a guy named Todd King, who's now a VP at a hedge fund. He kind of tapped me on the shoulder and said, hey, I think you'd actually like doing this analytical work, and I think you'd be good at it; you should come join me and go do this. And he was totally right. It was a massive improvement for me in terms of quality of life, the challenge of the problems, and the work we were getting and doing. So I happened to just fall into it at the beginning, and then really fell in love with it. Ironically, the thing that really got me interested in working with data and performance was programming in SAS, the statistical analysis software tool. It's very strange, but I have kind of a bad relationship with SAS. There are parts of it that are really abusive and horrible, like the macro language. But there are parts of it that are really incredible and amazingly powerful. The data stuff is really intuitive and really powerful. It's probably the best data transformation language there is, like the DATA step that they have in SAS, which is super, super useful.
And you started to get a feel for the things you could do once you got your hands on SAS and then got your hands on some raw data. We were actually writing recommendation engines in SAS and doing cluster analysis in SAS, and we were getting good results for our clients. So it was a really awesome transition for me. I started to see the light in analytics there, where it's like, oh man, if the conditions are right, if you have the tools there, if you have data coming through, and you have clients or partners that can use the data, you can generate a ton of impact. And sometimes it works almost like magic.

    Totally. No, I think that's a good segue to the next question I had for you, which is making this feel like magic and making data useful, right? So you've been at some pretty operationally intensive companies: Flexport, obviously, and now you're the head of data at Vanta. It might be worth explaining Vanta to the audience as well. But at Flexport we had a term called schlep, right, which is these hairy, nasty problems. So what draws you to those problems? And what type of challenges does that present when it comes to data?

    Yeah, I mean, I don't know if I'm drawn to them. I think there's something deep inside me that looks at situations that are just gnarly and says, oh, I should do that, it will be painful, very painful. I probably have a bad decision function inside. I think part of the fun of working with operational data is that if you actually make your partner operational teams' jobs easier, it's tremendously rewarding. When you land something for them that they really use and that makes their life a lot better, that is one of the better times of the job, right? You land this data tool that solves harder problems. As far as Vanta, would you like me to just explain what Vanta is here?

    Yeah, that'd be great. We love it. We're customers.

    Right. So Vanta is a security monitoring and compliance platform. Traditionally, the way you proved your security, and that you manage consumer data correctly and with the proper respect, was to go to an accounting firm and get a compliance audit done. This is where SOC 2 Type 1 and Type 2 come from, PCI, all that stuff. The process of doing that is extremely painful. The standard was designed by accounting firms, so it's not necessarily suited to our modern cloud infrastructure or cloud software setups. There's tons of room for automation, and Vanta delivers the software to do that. You connect your cloud systems to Vanta, and it provides a single pane of glass to see your security stance and see how you're doing. For a tiny startup, it can turn the VP of engineering into a virtual CISO without a ton of work involved. For me, that's really attractive. Security is one of the areas where I'm trying to enhance my career. I've worked with a bunch of different security teams in the past, and I feel like the tools are just not up to snuff for executing their jobs. So I've got a lot of sympathy for them in that world, and I love the current gig here for that. And as far as generating impact, being on the data analytics team means you can really influence the product. Our product is security monitoring, right? Well, why wouldn't we want to bake analytics best practices into the product? Why wouldn't we want to take the learnings we have internally from running our own Vanta and running our own data practice and push them out into the product? So it's a super great gig for me in that regard.

    Awesome. What is the relationship, and it's something we talk a lot about, the relationship between business and data teams, right? Because those two teams often work together, but maybe they're not working together early enough or often enough. And it seems like, to some business teams, there's a big gap in understanding of what data teams do on a day-to-day basis to make this data actually reasonable and useful. What are some of the best practices that you try to bring to a company for how to work with the business team? What have you seen work best, and what are some things that you try to avoid?

    Yeah, I think it comes back to your whole strategy of how you're going to build your function within the company. I think the most vanilla framing for this, which is out on the internet and you can Google it, is called quote-unquote data maturity. And I actually have data maturity goals at Vanta, where I built a framework as it relates to technology, process, and culture, making sure that all of those loops land in the company with a data lens on them. Because once you run the function, you realize, oh man, I actually need all these things to work. I can build the most amazing system, but if nobody logs in, nothing's going to happen. If I don't train them how to use it, nothing's going to happen. And if the leaders don't reward people for coming up with cool analytical solutions to problems, the system's not going to get better, it's not going to improve, and nothing's going to land. So when you're thinking about working with the business, you have to start with that foundation. One is to assess where everything is and build your foundational assessment, and I think using a data maturity framework is really good for this. Where is the business? Where is everybody? What's going on with everybody? What are all their core constituencies? Then figure out where you're at, figure out where you want to be, and staff that appropriately along the way, because I think there are different levels of where you want to be and how to achieve that in partnership between the data team and the business team. Just as an example, I think the staffing levels you want to think about are in terms of how much investment you're going to make in analytics. And I think the range of that is somewhere between 2% and, like, 13% of total staff, right?
Where 2% of the staff of a company, once you hit a couple hundred people, is kind of the bare minimum for running a data team at the company. If you're at 2% of staff, with your data warehouse and your analysts and your data scientists, you're kind of barely keeping the lights on. You can expect that your data warehouse and your data stack are not running super reliably: you can expect errors and outages regularly, you can expect measurements where nobody knows how they work, all that stuff. So that's the bare-basement staffing level. At 13%, that's the highest level I've ever heard of in the industry. I think that was Stitch Fix's ratio. You'd have to talk to someone at Stitch Fix to get that accurately, but it's my understanding that that was their target ratio of data professionals to the rest of the business. They were officially going after being a data-led company, where their strategy was effectively burning the boats: we don't do a sale unless it goes through the recommendation engine, which is a serious commitment for your staffing of the data analytics profession. So it's basically like, hey, we're building the entire company upon this premise.

    Okay, so now you've got your range of staffing levels, how you interact with the business, and what you can do. Now you've got to decide within that what the click stops are. Do I want to barely keep the lights on? Do I want to do more than keep the lights on, and actually take on projects, like company-level projects? Do I want my partner teams to have embedded analytics? Or do I want to drive my partner teams through analytics? Then you've got to match your staffing levels to those, and you have different models for engaging depending on what staffing you want to do. I think a lot of companies start at barely keeping the lights on. Then they go, okay, actually, we're leaving a lot on the table; let's go to project-level embedding. And they embed on projects: they'll develop a project intake process with business teams and figure that out, and those teams will act as sort of a project-focused data team. From there they go to, okay, look, we're actually still leaving value on the table; let's go to direct embedding: a direct analyst or data scientist, one person per team, for every team we've got. You staff it up that way, and then you're starting to get into the 6 to 10% of staff level, where your process is really easy, right? It's like, everybody's got a guy, and that guy's on your team. And then last is, okay, no, we're actually going to run the whole business through this, which, you know, I think that's the only company I can think of that does it. Even Facebook, which has got tremendous data science and liberal staffing, doesn't do that.

    Yep, at Stitch Fix, the data is the product, right? The recommendations are.

    I don't think I'm telling tales out of school. I think they've talked about it, but, you know, take it with a grain of salt.
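
Editor's sketch: the staffing ratios quoted in this exchange can be turned into rough headcounts. This is only an illustration, assuming the percentages apply to total company staff as stated in the conversation. The tier labels and the `data_headcount` helper are hypothetical names introduced here, not something the speakers describe.

```python
# Percent-of-staff figures quoted in the conversation.
# The tier labels are invented here for illustration.
STAFFING_TIERS = {
    "keep the lights on": (2, 2),
    "direct embedding, one analyst per team": (6, 10),
    "data-led company": (13, 13),
}


def data_headcount(total_staff: int, percent: int) -> int:
    """Return the data-team headcount for a percent of staff, rounded up."""
    # Integer ceiling division avoids floating-point rounding surprises.
    return (total_staff * percent + 99) // 100


def headcount_ranges(total_staff: int) -> dict:
    """Map each tier label to a (low, high) headcount range."""
    return {
        tier: (data_headcount(total_staff, low), data_headcount(total_staff, high))
        for tier, (low, high) in STAFFING_TIERS.items()
    }


if __name__ == "__main__":
    for tier, (low, high) in headcount_ranges(200).items():
        print(f"{tier}: {low} to {high} people")
```

For a 200-person company this works out to about 4 people at the bare minimum, 12 to 20 with direct embedding, and 26 at the data-led level.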

    No, no, that makes sense. So you mentioned the different stages, and I think that totally makes sense. One thing that we talk to a lot of founders and folks on leadership teams about, at the burgeoning Series A stage: you talked about keeping the lights on once you have 2% of the company's headcount aligned to the data work. But what are some of the symptoms that lead to even saying, okay, we need to bring on a data team, or we need to hire a first data person? What does stage zero look like? And what are some signals that founders and leaders should be looking at to say, okay, we should probably think about hiring?

    So yeah, it's a good question. In general, it's that you don't feel confident that your numbers are right, your tools don't have training, you're not sure that people are using them correctly, and your goal setting, goal achievement, and project definitions are almost all narrative-based. And you feel like the narratives are kind of not hitting the mark, necessarily. It's like, this group is telling me this story, that group is telling me that story, but no one's providing a number, and I can't tell what's going to happen. I think that's usually the case where you say, all right, we need someone, we need a team to make this easy. Personally, I think the story about the business and the products and projects you're developing is super important; it's probably the most important thing. But when the stories start conflicting with each other is usually when building a data team really starts to make sense. Because the quantitative side of those decisions is something that, once you have your assumptions, everyone can definitely agree on, and it becomes a really powerful coordination tool. So I think that's roughly the state of things that people generally have a sense of. And to be fair, analytics is kind of a big-company thing. It's not really a five-person startup thing. You shouldn't be hiring a data engineer when you've got five people; it's just too early. It's when you've gotten to the point where you go look at all the systems across the company, because even little startups have tons of SaaS tools. Like, how many SaaS tools do you guys use at Canvas? I bet it's like 20 or 30, right? And you're like, we've got just a few; we've got 10 SaaS tools per person.
And then once all those start generating data, and you're pulling from them and you want to start generating cross-departmental insights, it's like, okay, now it starts to make sense. We need a team. Let's focus on infrastructure. Yeah,

    No, I think we see that a lot, right? It's getting the story straight first, based on objective fact, objective evidence. That makes it a lot easier downstream.

    Yeah, that is really critical for goal setting, right? Because when you're an early company, you still have a lot of creativity left in your company that you have to execute on. There's no success function. Unless you're a hedge fund, there's no success function sitting around that you can just throw a ranking engineer at and be like, oh sweet, I just hacked my ARR, this guy did it. No, it's not like that. There's a lot of storytelling in the market, and you have to go do storytelling internally to make things work. And having that fixed point of real facts on the ground is super unifying, coordinating everyone toward the same goal and heading in the same direction, where ideally you can distribute it. That's the dream in your data system, right: you actually do have this success ranking function, and I can distribute it to the company to go hack on it. And then someone will just independently come up with some amazing result, and we'll be off to the races. Yep.

    So in that situation, when the data team comes in, from a tooling perspective, what's the order of operations that you think about when you come into a company like you did with Vanta? What is the order of operations of problems that you'd look to solve first, that make the most impact?

    Top down. First and foremost, I think you want to set the company metrics, unless there are other burning projects sitting around that need codifying. If you build the company metrics first, in general that helps pull the infrastructure out of it, because you've got a concrete thing you're delivering; you've got something people can use. And the infrastructure and technology decisions get made after that, because you have a sense of what kind of beast you're dealing with, especially as it relates to the size and diversity of the data and the datasets you've got. Say you're an Internet of Things company, and you've got five sensors generating kajillions of data points, but you've only got four or five tables in the data warehouse. You're going to do a very different set of things than Flexport would, right, where Flexport has got a real diversity of datasets and sources, and a very real diversity of transformation needs and team needs. You'd pick very different stacks at those two companies. So you start with the company metrics, top down, and what you want to do, and then start looking at the underlying things that are feeding them, based on data size, data diversity, and the number of systems that you're going to optimize. And then the characteristics of those things, too. So for Vanta, security is pretty deterministic, so the demand for traditional data science is limited to a few places. There are definitely some places where we've got opportunities, and we're working on them. But it's not like we're going to rank-order your policies. No, you're going to need to get your policies

    set in stone.

    Yeah, yeah, you've got to secure your instance, right? So that changes the tools we use there too, because we've got a deterministic business. It's more important that these things be deterministic, versus kind of fuzzy and probabilistic and ranking. It's very different from Facebook, where we had huge data flowing through, and for most of the decisions we'd be making, we'd be fine if we were off by a few percent, right? Or even if we just lost a few percent of log rows or whatever. No big deal, you've got another 100 million. That's not the case in other businesses. So you've got to take those questions into account. Super helpful.

    Talking about the tools, the nature of the business, and the stack that you build out, let's transition to talking about the modern data stack. There's a lot of stuff on Twitter and LinkedIn right now saying the modern data stack is dead, six months after finding out what the modern data stack was. So I'm curious to get your take, now that you've seen a few of these stacks and systems built out: what problems do you think the modern data stack has solved really well? And what problems do you think it has created, or what problems remain for it to solve?

    Great question. Solved really well: so we're talking about the ELT stack, right, which is like Stitch and Fivetran loading into a modern warehouse, transforming with dbt or potentially a different transform tool, and then one of the modern viz tools. The EL side of things that it fixes is pretty amazing. It's a game changer. If I'd had that in 2009, it would have been incredible; we would have been able to do so much more. It's just taken everyone involved a while to evolve to that state, where all the other cloud tools go, no, we have a structured model now, and we can provide that through a data share or through a standardized API. That EL side of it works, I think, tremendously well. Transform is still, I don't think, all that great. I think there's a lot of traditional data warehousing and data engineering that has not been modernized. I think Maxime Beauchemin from Preset talked about this on a podcast a little while ago: the data engineering practice for dimensional modeling and transforming data into metric shape is still not, I think, up to snuff with where the modern data stack ought to be. I think that's the big gap. EL works really well. Cloud warehouses, Snowflake, Starburst, BigQuery, work really well: they're great if you have a small data warehouse, and they just scale up to whatever size you need. So you just pick a commodity partner, and then you don't have to make another decision for a really long time, which is great. I think the visualization tools are kind of mixed; some are great, some not so great. And then transform, I think, is still, like, native data engineering.
And there's still a long way to go. And I think the integration between all these systems and production still has a long way to go. I think a great vision of the world would be: why can't I define my transforms in a language and then not have to worry about all these connections, even though there's not that much work in them currently? I should be able to model out from my production objects, from how I've defined my source application, how the reports and data get generated. I still think there's a lot of opportunity there. I'm surprised that no one's built it. I'm surprised no one's built a modern Ruby on Rails for SaaS, you know what I mean? Full circle: build your application, and here's what comes with your application: auditing, workflow, analytics, defined by populating out these scaffolds. I'm surprised that no one's actually built that framework yet, because most of the techniques are still pretty old, right? We haven't really updated Kimball's dimensional or fact modeling to present data for analytics. All the new things in data science seem to be largely in huge data and deep learning, right? Versus,

    versus the mass market problem.

    Yeah, exactly. Right. I think the tidyverse stuff, what Hadley Wickham has been doing there, like tidy modeling, is really impressive. But even that is still the last piece of a seven-point data value chain, right? And that feels to me like, why hasn't anyone yet figured out how to capture this in, like, two links? That's probably available. Maybe it might just be structural. But to me, instinctually, this seems like there should be some vertical integration of the data value chain.
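
Editor's sketch: the ELT pattern discussed above (land raw data in the warehouse first, then transform it there into metric shape) can be shown in miniature. This is a toy illustration only. It assumes Python's built-in `sqlite3` as a stand-in for a cloud warehouse, and the table names, sample rows, and `run_elt` function are all made up for the example. A real stack would use a loader such as Stitch or Fivetran and a transform tool such as dbt.

```python
import sqlite3

# Hypothetical raw records, standing in for what an extract-load tool lands.
RAW_SIGNUPS = [
    ("2022-07-01", "acme", "trial"),
    ("2022-07-01", "globex", "paid"),
    ("2022-07-02", "initech", "paid"),
]


def run_elt(rows):
    """Load raw rows untransformed, then build a metric table with SQL."""
    conn = sqlite3.connect(":memory:")
    try:
        # Load: land the data as-is in a raw table.
        conn.execute(
            "CREATE TABLE raw_signups (signup_date TEXT, customer TEXT, plan TEXT)"
        )
        conn.executemany("INSERT INTO raw_signups VALUES (?, ?, ?)", rows)

        # Transform: derive a metric-shaped table inside the warehouse.
        conn.execute(
            """
            CREATE TABLE daily_paid_signups AS
            SELECT signup_date, COUNT(*) AS paid_signups
            FROM raw_signups
            WHERE plan = 'paid'
            GROUP BY signup_date
            """
        )
        return conn.execute(
            "SELECT signup_date, paid_signups "
            "FROM daily_paid_signups ORDER BY signup_date"
        ).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for signup_date, paid_signups in run_elt(RAW_SIGNUPS):
        print(signup_date, paid_signups)
```

The point of the ordering is that the raw table stays available, so the metric definition can be changed and rebuilt later without re-extracting from the source system.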

    Yeah, no, I think there are these cycles of rolling up functionality into one tool and then unbundling it. It definitely seems like we're in flux between many systems, watching these best-of-breed tools come, and then seeing what happens with consumers. Maybe it might be easier for some of these tools to be verticalized or rolled up. So no, super interesting. I know you've got to run here, so before we go, thanks so much for coming on. How can people get in touch with you? How can people learn more about Vanta?

    Oh, cool. So for Vanta, I think probably the best way to learn about Vanta is following Christina Cacioppo on Twitter, or going to the Vanta website or blog. For me, I'm not super online. You can find me on LinkedIn; just search Jake Peterson. I actually have no problem with that. I probably should be more public about this. But Ryan, I think you're forcing me out of my grimy hole.

    You're busy. You're busy, man. Well, awesome. Thank you so much, Jake, for coming on. Thank you very much for having me. It was a real pleasure. All right. We'll talk soon.