Alright, welcome to the Canvas podcast, where we bring business and data leaders together to talk about how to make data easier for everyone. I am super excited to have a friend and former coworker at Flexport, Jessica Larson, who has been a data engineer and analytics engineer at top companies like Eaze, Flexport, and now Pinterest, and who also has just literally written the book on Snowflake access control. So Jessica, super excited to have you on. Do you want to start by just telling us a little bit about yourself?
Yeah, no, thanks for having me on. So I started my career on the analytics side. I was, like, an analytics engineer, and I ended up really liking some of the analysis that I was doing. When I started at Flexport, I was a data analyst, and I ended up realizing that that wasn't necessarily my favorite thing to be doing. That was when I switched over to the data engineering side of the house, and that's kind of what I've been doing since then. I was really focused on some systems stuff when I was at Flexport. After that, when I was at Eaze, I was doing a lot of pipelining, a lot of that, you know, really sexy, fun, real-time stuff. You've got a million use cases for analytics data out there. And then at Pinterest, I'm actually a platform engineer. So I'm not really doing any pipelining. I'm actually just building tools for the data engineers who are doing the pipelining, and doing a lot of work with stakeholders all over to make sure that everybody has the things that they need in order to interact with Snowflake. And then, yeah, I just wrote the book. That came out in about March.
Awesome. So tons of questions there. But let's, let's start with the beginning on how you actually got your start in data.
Yeah, so I studied cognitive science, and I did a minor in computer science at Cal. I was really, really interested in the intersection of the human brain and computers. Now, when I was in college, it was more on the actual neuroscience part of that, and I worked in a lab where we were doing some really cool stuff. But after I graduated, I found myself in this weird situation, because I was too technical for a ton of roles that I was interested in, and I wasn't getting those roles because I was being told, "Hey, you're not gonna be happy in this role. It's not technical enough, this is more of a writing thing," whatever. While at the same time, I was applying to software jobs and being told the exact opposite, like, "You're not technical enough for this," which is the most frustrating thing in the whole entire world. Especially because I was literally not even one semester away from having a double major. But, you know, here we are. So data kind of ended up being that thing, right? Because with analytics, as long as you have a pretty good understanding of experimental design, some stats, and the ability to learn SQL, or you've already learned SQL, it was pretty easy for me to get into with my skill set. And then I just really liked it. It's really exciting, it's really fast moving, and it's really, really hard to get bored in data, which is important for me, because I get really bored really easily. So yeah, I found my home in data.
Awesome. And what sort of inspired the change from, you know, starting with the analyst role and thinking about analytics to then going more towards the platform side? We'll of course get to the book, but I'm curious, what made you dive deeper and want to go down the stack?
Yeah, so I think the immediate answer to that is novelty and boredom. When I was on the analytics side, when I was an analyst, I really struggled with this, because I felt like in my day to day I wasn't actually getting any closer to where I wanted to be, and I felt like I was kind of diverging from my career path. I was really bored of SQL. I was like, you know, SQL is not a functionally complete language. You can only spend so much time in SQL before you start going stir crazy, and I was going stir crazy. So I moved over to the data engineering side. And then, yeah, I've kind of just been all over the place, because I do my best work when I'm working on things I've never done before. I need that novelty. I need something new. All of that learning is what keeps me engaged and working to get these projects across the finish line. When I'm working on something that I've done a bunch of times before, it's just really emotionally difficult for me to do.
So, yeah, I get you on that. And I think that's something that we see a lot too, where analysts are day to day in SQL doing a lot of repetitive work. A lot of the work that we're trying to do is shift that SQL work to the business teams, where they can just do it with their spreadsheet skills. So, no, I totally get you. And I think we're seeing a growing trend of data teams going down the stack and trying to create reusable data for the business team. So that makes total sense. So you wrote this book. Tell me about it. What inspired it? What's the response been like so far?
Yeah, so basically the genesis for this is I spoke at the Snowflake conference, not this past one, but the year before, on this new feature, secondary roles, that kind of just fixed a lot of things for us from an access control standpoint. And so my publisher, Jonathan, I guess not my publisher then, he reached out to me and asked me if I had any interest in writing a book on Snowflake. And, you know, I kind of joked to my friends, the only way that you can answer a question like that is, "Yeah, yes," right? You have to take these types of opportunities when they come across. And he had a few suggestions, because there was kind of already a basic Snowflake book, and there was an advanced Snowflake book, and so he was trying to find something that was a little bit more niche and in depth on a particular subtopic of Snowflake. And at Pinterest, we use Snowflake for our most sensitive data, you know, for our HR data, some sales stuff, some financial reporting, etc. So I was just constantly, day in and day out, working on security-related things. And so I thought, okay, yeah, let's do access control. Let's dive deep into that, especially because I think there's a huge need right now with GDPR and CCPA, and hopefully a whole lot more to come that protect people who are in the United States but not in California. Right.
Totally. And when you think about role-based access control, when you think about compliance and security at the companies that you've been at, what are the most common manifestations of this not being done well? What are some of the impacts on the business that maybe business teams can feel, or even customers can feel, if you're not thinking about access control and compliance and security in the right way?
Well, it can fundamentally be a data quality issue, right? Because if you're not locking it down, people can be mutating prod data. You can end up with some, sorry, my cat, you can end up with some dev data showing up in your production instance, and that can mess up any of the models that you have. Primarily, though, when I see this in practice, the actual cause is having access control in some tools but not in all tools. So maybe you have very strict access control in Snowflake, but then when you connect it to Tableau, you use a service account. And so then everybody in Tableau has exactly the same access, and you end up with your business users going in and actually putting in all of these additional controls on Tableau. But how are you making sure that those all match up? That's a pretty big one. And then also, when you look at engineering, right, again, it's all controlled here, but then you're using some service accounts where people have access to all these roles that they shouldn't have access to. And then they're able to, you know, maybe in their DevOps, or in some bastion or something, actually change production data, and usually not intentionally, right? Yeah.
Got it. No, that's super helpful. Are there any other parts of the stack or tools where there are common pitfalls with access control? Because it's obvious to think about the BI tool, and maybe some other front ends, but are there any other tools in the modern data stack that are often overlooked, or things that you've tried to focus on?
A big one is also that downstream transformation piece. So when you're creating the data model, whatever, this person who's creating this table downstream in this tool, are they only able to do that with the data that they're allowed to transform? That is something I see not controlled all the time, and it's hard to find the right tool that actually allows you to enforce that access control at that step.
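The downstream check described here can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `GRANTS` mapping, the role names, and the table names are all hypothetical, and a real system would read grants from the warehouse rather than from a hard-coded dictionary.

```python
from typing import Dict, Iterable, Set

# Hypothetical grants: role name -> tables that role may read.
GRANTS: Dict[str, Set[str]] = {
    "finance_analyst": {"finance.invoices", "sales.orders"},
    "hr_analyst": {"hr.employees"},
}


def unauthorized_sources(
    roles: Iterable[str],
    source_tables: Iterable[str],
    grants: Dict[str, Set[str]] = GRANTS,
) -> Set[str]:
    """Return the source tables that none of the user's roles may read."""
    readable: Set[str] = set()
    for role in roles:
        readable |= grants.get(role, set())
    return set(source_tables) - readable


def can_build_model(
    roles: Iterable[str],
    source_tables: Iterable[str],
    grants: Dict[str, Set[str]] = GRANTS,
) -> bool:
    """A downstream table may be built only if every source is readable."""
    return not unauthorized_sources(roles, source_tables, grants)
```

The idea is simply that the transformation tool compares the inputs of a new model against what the author is allowed to read before it creates the table.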
Got it. Super helpful, of course. So switching gears here a little bit. You've been at some pretty crazy high-growth companies, so you've seen what makes great data teams. I'm curious, from your perspective, what are those traits? What are the things that you look for when you're thinking about joining a new data team or a new company? And what are some things that you've seen that maybe aren't so great about data teams in the way that they work?
So a big thing I look for is how engaged the data team is with the entire rest of the company. Or rather, how much is the entire company engaged with the analytics products? I think one thing that I saw at Flexport, that you probably also saw, was that data really powered everything. We had a huge thirst for all of the assets that we were creating. Sorry, my cat is just, she's throwing a fit right now. She always does.
I can't bring my dog to the office, otherwise, you'd be going nuts in the background.
She always has to be the star. So yeah, I don't know what she's going to try to do now. So, you know, one thing that was super cool at Flexport was that everybody was always reaching out to us about, you know, we want more, we want more, we want more. I think we had a really good reputation internally. There was a lot of trust. And, yeah, it really felt like the rest of that company really cared about us, and were like, "You're valuable. We want all of the things that you can offer us." Now, I would also say that there were some negatives to that, which was, in a lot of ways, we were creating products that I think were actually outside of the scope of analytics. We had a lot of workflow tooling and things like that, that were kind of bridging the product, right, bridging the actual tech behind the company. And I think that's also pretty common to see, especially in some of these fast-growing companies, where you basically have your analytics team working as prototypers. We kind of had a similar thing at Eaze. One of the things I love to talk about, because I just think it was a really interesting project to work on, in 2019, I believe it was, was what we called Vapegate. All of a sudden, we realized that vapes are actually not very good for your lungs. I think, you know, not terribly surprising, but it was very surprising to a lot of people. And so then we saw these crazy changes. So, for example, the city of Santa Cruz, with, like, 24 hours' notice, banned the sale of all vapes in the city. And that is way too quick of a turnaround to actually do something on the production engineering side, right? Like, how could you possibly modify your product for that?
And the way that we had it is we had one depot that serviced a pretty large area, and that's where our menu would be based out of. So we could either turn off vapes for that entire area, but unfortunately that was our most profitable item, our most popular item, or we could find some way to make it so that people within that city couldn't get vapes. And so we ended up solving it on the analytics side. We were already catching these back-end events of somebody checking out their cart, and we were doing stuff with addresses already to make sure that they weren't in schools or government zones, you know, areas where we're not legally allowed to deliver to. I basically added a polygon, forked our process, and said, okay, if it lands in this polygon of the city of Santa Cruz, and there's a vape in the cart, then we need to fire it off to customer service to figure out what the customer wants to do with the order. So there's a lot of, you know, just trying to be scrappy, being able to do things very, very, very quickly. So you see that a lot, which is, I guess, maybe not good or bad, right? Because the whole point of an analytics team is to really scale your org quickly. So maybe it's not so bad. I didn't mind it, because I thought these were really exciting, fun projects to work on, and I frankly love to do things like this. Again, it's the novelty, so I kind of look for that. But if you're really trying to do just very standard analytics, then you probably don't want that. Right.
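The polygon check described above can be sketched as follows. This is a minimal illustration, not the code that was actually used: the ray-casting test is a standard swapped-in technique, and `RESTRICTED_ZONE`, `route_order`, the category labels, and the coordinates are all hypothetical stand-ins (the rectangle is not a real city boundary).

```python
from typing import Iterable, List, Tuple

Point = Tuple[float, float]  # (longitude, latitude)


def point_in_polygon(point: Point, polygon: List[Point]) -> bool:
    """Ray-casting test: count how many polygon edges a ray going
    right from the point crosses; an odd count means inside."""
    x, y = point
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through the point count.
        if (y1 > y) != (y2 > y):
            # x-coordinate where the edge crosses that horizontal line.
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


# Hypothetical rectangular stand-in for a city boundary.
RESTRICTED_ZONE: List[Point] = [
    (-122.08, 36.94),
    (-121.98, 36.94),
    (-121.98, 37.01),
    (-122.08, 37.01),
]


def route_order(
    delivery_point: Point,
    cart_categories: Iterable[str],
    zone: List[Point] = RESTRICTED_ZONE,
    restricted_category: str = "vape",
) -> str:
    """Send the order to customer service only when a restricted item
    is in the cart AND the delivery address falls inside the zone."""
    if restricted_category in set(cart_categories) and point_in_polygon(delivery_point, zone):
        return "customer_service"
    return "fulfillment"
```

Orders outside the polygon, or without the restricted item, continue through the normal path, which matches the "fork" in the checkout event process described in the conversation.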
Yeah, it can be a little chaotic for most. And you said something there that I want to dive into a little bit, which is that you look for how engaged the business is with the data team, and how engaged the data team is. What are some of the ways that you measure that, or how do you think about engagement? Is it, you know, we have analytics on our analytics, right? We can see who's refreshing which models the most, or dashboard consumption. Or is it qualitative? Is it having close alignment with stakeholders, having embedded analysts? What are some of the ways that you think about engagement, and what are some ways that you think about improving it if it's not there?
Yeah. Um, so I don't know that I have any fantastic answer for how to measure it, right? I think you can kind of start to approximate it by looking at how many people are spending time on your BI tool. One metric that actually works pretty nicely, that we would use if we needed to decommission something, is: if something breaks, how long until somebody complains? If it breaks and nobody complains, that's a problem, right? That means people aren't using it. So if you find a dashboard that's been broken for a month or something, you can probably just get rid of the dashboard, because people aren't using it. A big one you see is when people have OKRs that are based on metrics that come from your analytics warehouse or whatever. That's less of the bottom-up that you would see with using dashboards and more of a top-down, right? And so I think if you're in a situation where you don't have a lot of engagement with analytics, I do think that this is something where leadership can kind of come in and say, "Hey, I need all of you guys to be more data driven. So every team has a dashboard for their OKRs," or something like that, that kind of forces everybody back into the data. I'm trying to think. I haven't worked with product analytics too much, but I imagine that you can probably see some of this in product as well. If you come out with some new feature, and everybody hates it, and it just stays up there for a long time, are you really listening to the data there? Right?
Yep. Yep. And it's probably a good time to prune it, or, yeah, sunset a feature if it's not getting used, right? Obviously dashboard usage and consumption is one thing, but what about problems closer to your world, like the data quality, or maybe the modeling or the definitions of data? What are some of the things that you've seen that might be surprising? Like, hey, people aren't using this dashboard, and we think it's maybe just because it's not useful, or these metrics no longer apply, but actually they didn't understand the data that they're looking at, or there's something wrong with the data. What are some insights that you've found there throughout your career that led to actually improving engagement?
Yeah, confusing dashboards is a problem, right? Sometimes we're given so many tools, and then we start to make things so complicated. When you look at a chart on the dashboard, you should immediately be able to understand what it is. It can be frustrating to look at some of these overcomplicated graphs where there are too many things going on. There are bars, and there are lines, and there are dots, and there are different colored bars and all of this. It can be really useful if you are using it to dive down and really look at certain particular things, but overall it's usually just sensory overload. It's really hard for people to be engaged with something that's stressful, right? I think another thing that you can do to truly work on that engagement is just increasing the data literacy of your company. I'm always surprised when I see course catalogs or whatever, you know, back in college, and you see what you can graduate without taking. I think that, in general, we do not do a great job of training people on how to look at data, how to understand data, how to read it, especially when we start to look at things that are probabilistic outcomes, right? Like, this is this particular thing when you look at it and cross it with this, this, this, and this. How do we reason about that, versus something that isn't relativized? Right.
So I think, yeah, just increasing people's understanding of data, and people's understanding of how you control for things. A big one I see is people not having enough controls in their experiments or in the way that they look at numbers. It's like, you're assuming that this is the factor, but it could be any number of other factors, because so much of the time we're just looking at correlations. So it's making sure that we're drawing the right conclusions from what we're seeing.
And from a process perspective, right, we talked a lot about the intersections between business and data teams. You talked about OKRs, and we talked about trying to make engagement higher between the teams. From a process perspective, how do you think about how data and business teams should work together? What's the right level of coordination that you've seen work?
Yeah, I would say I don't think I've ever seen a business team and a data team work together early enough. It seems like the engagement between business and analytics is always just a little bit too late. It needs to be before the business starts to make a decision, or even before they outline the choices that they have. Analytics should be involved to help inform even narrowing it down to those choices, right? It really should just be early and often. I think that you just can't separate analytics from business. One thing we've seen a lot in the past few years is this push for analytics teams to be more business driven and more business focused. But I also think that we need to help our business users understand the nuts and bolts of what we're doing. It's great for them to be able to understand, like, this is how this chart works, and this is how we can reason with this. But I'm a huge proponent of having those conversations, you know, when somebody asks, "Oh, hey, I want this other data from this other data source, because I want to solve XYZ," coming back and saying, "Okay, just so you understand, at a very high level, these are the technical steps that I'm going to take." And what I've found in my career, just the past four years or whatever, very short career so far, is that it immediately pays dividends to make sure that, as much as possible, you are on the same page and you're doing this knowledge sharing. I've had a few stakeholders who would not call themselves technical people who have seen some data issue and been able to go, "Hey, by the way, I just noticed this. I think this might be the problem," right?
And that's extremely valuable, because they know the data. They're the ones who are looking at it all the time. I look at it every once in a while, but I have so much other data that I'm thinking about. And so for them to be able to say, "I think I've identified the problem," is a huge win for everybody. Right? Yeah.
And it's one thing that's always been surprising to me, too. You have business teams and product teams that work together on new features and new functionality in the business. You scope out with the business team, okay, here's the project, here are the requirements, here are the deliverables. And data is brought in at the end to say, okay, well, now we need these dashboards, and you find out logic's been written in the front end, or there are things that can't be tracked. And it's like, you would never do that to an engineering team, bringing them in at the last second after you've all agreed on the scope and the requirements. And it still feels like we're working harder and harder to get data teams that seat at the table at the onset of the project. So, no, I totally agree.
All the time, too. When I did do some product analytics, I would be brought in too late, and they would ask me, "Hey, I have this question that we're trying to figure out. How are people using this new feature? Or are people using this new feature? After we added this, are they clicking on this more or less?" or whatever. But in a lot of these situations, there wasn't any tracking for it. And I had to have that conversation of, like, I maybe can try to approximate it in this way, but maybe it's going to be a terrible approximation. Or even situations where I'm like, there's nothing I can do for you. I don't have the data. It's just not there. Nothing was logged. I can't synthesize data. I can't just make it appear out of nowhere. If you guys didn't start collecting this way before you made the change, then it's totally meaningless, right? Especially when you want to see if a change changes people's behaviors, and you don't track it before you change it. And then I'm like, I can't do this. I don't have the a priori, right? I don't have the before stuff. So, shrug emoji, and I'm like, I can't help you.
Worst case scenario, for sure. Yeah. And then you're spending another cycle basically refactoring and trying to get everyone on the same page and collect the data that you need. So yeah, that's a cautionary tale that I've learned too.
It just happens all the time. It's just a thing that happens. And, you know, as much as we can get ahead of it, it's better if we can, but I think it's probably just something that's going to happen.
Totally. Cool. Alright, so we just came back from the Snowflake conference, which was pretty amazing. So good to see everybody after a couple of years of COVID and seeing everybody just in the Zoom box. One of the topics that I've been seeing flying around is, like, "the modern data stack is dead," you know, all this sort of yellow journalism around the modern data stack and what's wrong with it. I'm curious to get your perspective: what problems do you think the modern data stack has solved? What's been the biggest benefit of it? And what problems remain with the modern data stack, and how do we think about fixing those things? Yeah.
Yeah, it's definitely been interesting, because I feel like it was maybe only six months ago or something when somebody all of a sudden was like, "the modern data stack." And now every single conversation is like, the modern data stack, the modern data stack. And I had a moment a while back where I was like, what is the modern data stack? This is my job, but what are they talking about? Somebody just came up with this buzzword? Yeah, exactly. I'm like, wow, this is embarrassing. But no, I think there are some huge problems that the modern data stack is solving, and I don't think it's dead. But also maybe don't quote me on that, because, you know, maybe it is, I don't know. One of the things that we talk a lot about in data is that the personnel is just the hardest part, right? All of these technical challenges are difficult. The organizational challenges are difficult. But none of those are insurmountable. What is, is not being able to find enough people, and not being able to find the right people for your organization. And probably anybody listening to this is aware of how crazy hard it is to hire the right people for your data team. So I think that's one of the big problems that the modern data stack is solving. There are just, frankly, not enough data engineers, not enough data scientists, not enough data analysts, not enough people who want to do data modeling, right? There's this huge need to make the people that we have way more efficient, and also to offload the less interesting work, the repetitive stuff, the stuff that's just not a huge value add. And that's what we see in the modern data stack, right?
For example, Fivetran. Fivetran basically replaces a few data engineers, which is great, because we can't hire them. We want to, but we can't hire them. It takes so much time to interview candidates, and then by the time you start the search, they start in, like, eight months. Great, you have one that's going to start in eight months, but you need to hire six, right? So I think that's where you see so much value. With Fivetran, you plug it in, and it takes 5, 10 minutes to add any new data source. And you see that with, for example, Immuta or ALTR, some of those for doing your access controls. Again, it's not the most fun thing in the whole entire world, and automating it also makes it a lot safer, because it helps you manage the out-of-your-system thing, right? Like, in system, but out of your data warehouse system. So yeah, I think a lot of it is really just being able to do as much as humanly possible, knowing that you just don't have enough people. And that's just what it is.
And just to wrap up here, what are some of the biggest problems that you think the modern data stack hasn't solved, or maybe has created for companies?
So this is one thing I'm highly critical of. Every time we're doing things in a user interface instead of doing things in code, we're making it very hard to migrate away. We save that time that we would need in order to hire an entire team to do all of these things, but if we decide tomorrow that we want to move to a different tool, we basically have to start from scratch, right? There may be some ways that you can go look at your logs and extract some things using APIs and parse it and be really smart about it. But yeah, that's one reason I'm not a huge proponent of things like Tableau Prep, for example. I think we should have things as code as much as possible, right? You think back to, what was it, two years ago, where it was everything as code: configuration as code, infrastructure as code, security as code, everything as code. And now we're kind of getting away from that with these no-code, low-code solutions, and, unlike Terraform, etc., they're just so hard to move away from. Totally. But I don't know that we want to solve that, right? I also think, if your company is building these things, then it's kind of convenient if it's hard to move away from, so.
Yeah, there's a balance between ease of use and convenience and lock-in. Right, exactly. No, super interesting. Cool. Well, we want to let you run here, but before we do, it's been amazing. Where can people get in touch with you? Where can people find more of your work, maybe read your book?
Yes. So I'm on LinkedIn. I have my personal website, super easy. It's just jessicalarson.com, spelled in the very conventional way, the more popular spelling of both of those names. The best spelling. Exactly, yeah. And then my book is available on Amazon, but it's also available on a bunch of non-Amazon sources, so, you know, Target, Barnes & Noble, etc.
Awesome. Well, Jessica, thanks so much for coming on.