Kirk Marple - Data Leadership Lessons Transcript

4:59PM Oct 13, 2021

Speakers:

Kirk Marple

Anthony Algmin

Keywords:

data

metadata

build

unstructured data

organization

graph

images

people

company

platform

customers

api

media

hierarchy

storage

application

area

business

tagging

talked

Welcome to the Data Leadership Lessons podcast. I'm your host, Anthony J. Algmin. Data is everywhere in our businesses, and it takes leadership to make the most of it. We bring you the people, stories, and lessons to help you become a data leader. Today we welcome Kirk Marple. Kirk is a customer-focused technology leader with over 25 years of experience. He's the CEO and founder of Unstruk Data, a company that is building the industry's leading unstructured data warehouse for automating data preparation via metadata enrichment, integrated compute, and graph-based search. Kirk, welcome to the show.

Oh, thanks for having me. Really excited to have this discussion today.

So like we do with all of our first time guests, why don't you just take a couple minutes and give us a little bit more in depth career story and how you ended up doing what it is that you're doing now?

Yeah, I mean, I can't say it was planned out. I've been a career software engineer since college, working for a few smaller companies and doing everything from image processing to video streaming. I ended up at Microsoft after my master's degree, doing some really interesting work in Microsoft Research on 3D virtual worlds. After that, I decided to start a company in the video transcoding and media management space, and really cut my teeth on metadata management and file-based workflows. We had all your favorite broadcasters and studios as customers, and ended up selling our company after being bootstrapped. The last five years, after selling the company, I've been focused on applying all the learnings from the media and entertainment world to what we just call industry: everything from public works to chemical companies, manufacturing companies, things like that. What I feel is that it's an underserved area of the market. There's a lot of data management, or unstructured data management, that maybe was commonplace back in the day in media and entertainment, but just isn't touched here. And so that's what we're focused on.

Great. And what you're doing with unstructured data now, like you mentioned, goes across all of the different industries. I think it's actually interesting that you came from a place where you were dealing with a lot of media. That's an analogy I use a lot when I'm trying to explain what metadata is to somebody who's just not familiar with it. We talk about pictures, we talk about videos, we talk about how the structuring of metadata allows you to navigate and find what you're looking for, and understand the context in which this thing lives. I'm really interested to learn more about what you're doing in a couple of these areas because, like you mentioned, I also think metadata is an underserved market. I think we are still in many ways in our infancy. I've done over a decade of consulting in the broader data management space, and I can't tell you how many times I've seen organizations try to solve all the metadata: let's create a metadata repository and populate it. And that is a literally never-ending effort. Well, actually, it is an ending effort, because people work really hard on it, realize that it accomplishes nothing, and then give up on it. So have you experienced anything different? Please tell me you've solved this.

Well, if you just look in our world, when you think about video editors and the like, I mean, tracking metadata, there are conferences about this. There's a lot of focus put on this whole area. And to define what metadata is for anybody that's listening: there's technical metadata that's inside the file. It's the GPS location that your phone captured, it's what software update was running on your iPhone, things like that. That's all the data that we can find in the file, everything down to the bit rate of your audio and all that kind of stuff. But then there's the semantic metadata of, okay, what is that file about? That's something I've been dealing with for so many years. In a studio or in a broadcast environment, the technical information is partly interesting, but then it's that taxonomy. It's almost a librarian problem of how you organize the data and think about those taxonomies and tagging and all those kinds of metaphors. Honestly, it's a really interesting space, and it's something I've always kind of fallen into a little bit. It does pull from digital library pieces, and it pulls from pure data management and all those kinds of things. That's what we're trying to bring into this space. It's different enough, but there's also a lot of learnings that we can bring to the industry.

This space, I'm going to have a lot more questions about metadata. But first, I really do like this notion that metadata isn't just attributes of a picture, for example, using this analogy from the media space. It is, to your point, this librarian function. There are hierarchies of information that are useful for navigating at different levels of granularity, so people can enter from a variety of different directions, find what they're looking for, and then ultimately use what they found. That needs a bunch of different perspectives. Thinking about it like a librarian: you may or may not know the name, and you may or may not know the topic, but depending on what it is, you need to find that and start to carve that out. One thing we haven't talked about yet is the notion of data quality. You need to understand what this data of any sort can really effectively be used for, and whether it fits what I want to use it for myself, right? So there's an implied necessity of bringing data quality more towards that metadata equation, because without that, your metadata is missing something.
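The split described above, between technical metadata read out of the file and semantic metadata about what the file means, can be pictured as two separate records attached to one asset. The following is only a minimal sketch under stated assumptions: the class names, field names, and sample values are hypothetical and are not taken from any product discussed in this episode.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set, Tuple


@dataclass
class TechnicalMetadata:
    """Facts read straight out of the file (hypothetical field names)."""
    gps: Optional[Tuple[float, float]] = None   # (latitude, longitude)
    device_software: Optional[str] = None       # e.g. OS version on the phone
    audio_bitrate_kbps: Optional[int] = None


@dataclass
class SemanticMetadata:
    """What the file is about: a taxonomy placement plus free-form tags."""
    taxonomy_path: List[str] = field(default_factory=list)
    tags: Set[str] = field(default_factory=set)


@dataclass
class Asset:
    name: str
    technical: TechnicalMetadata
    semantic: SemanticMetadata = field(default_factory=SemanticMetadata)


def find_by_tag(assets: List[Asset], tag: str) -> List[str]:
    """Return the names of assets carrying the given semantic tag."""
    return [a.name for a in assets if tag in a.semantic.tags]


# Illustrative data only.
photo = Asset("pole_017.jpg", TechnicalMetadata(gps=(47.61, -122.33)))
photo.semantic.taxonomy_path = ["utilities", "power poles"]
photo.semantic.tags.add("inspection")
memo = Asset("memo_02.wav", TechnicalMetadata(audio_bitrate_kbps=128))
```

Calling `find_by_tag([photo, memo], "inspection")` returns only the photo, since the voice memo has technical metadata but has not been tagged yet. That is the librarian problem in miniature: the file's own metadata does not tell you what it is about.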

Yeah, well, even just the storage management piece gets overlooked a lot. I mean, with data lakes and object storage, it's somewhat of a glue. I cut my teeth on NASes and SANs and storage and things like that, and there's not a ton of difference between those models. It gives you a hierarchical file system, it's distributed, and you deal with performance and scalability issues and things like that. But at the end of the day, it's not storing the metadata directly. That's where data catalogs have come into play, and other kinds of layers on top of that. Just putting data into a data lake or into an S3 bucket doesn't give you a lot of that taxonomy. You can maybe name things interestingly, you can put them in folders, but they aren't searchable, they're not explorable, and they're not organized at all without having some other kind of platform on top. And that's really what we're trying to do: provide a layer on top of raw storage that lets you organize your data. Essentially, it's a bit of a search engine, partly also a tagging engine, and it's a way to plug compute in and have it very close to the data. It kind of covers all those ranges.

So, in layman's terms, would it be a more robust and sophisticated tagging and categorization mechanism, like an ontological thing? Or is it a taxonomy of hierarchies, or multiple views into that, and that's what allows you to do that kind of search functionality? Is that how it manifests?

It can be a little of both. If you think of a professional photographer using Adobe Lightroom and things like that, that organization concept is natural. Photographers think about tagging, they think about organization, they're dealing with metadata on their images every day. There's a lot of overlap with that in what we're doing. But we're doing it for 3D, for CAD drawings, for documents, for all of it, across every type of media that you might come across. A manufacturing plant is capturing tons of different types of media, and at scale. So we kind of look at it as: sure, if you have 100 photos, then Google Photos, iPhoto, Lightroom, they're going to work. They're great tools and very easy to use. But once you go to that scale, how do I deal with 100,000 images? How do I deal with images taken over five years at my plant, and even video or 3D too, and put that all in the mix? For us, we always go back to the easy button: how do we provide 80% of the functionality that people need out of the box with no human touch? Just by putting data into us and ingesting it, we'll do all the rest of the work and provide you that kind of metadata exploration structure. We're never going to be everything to everyone, but we believe we can be a catalyst for an organization. We've talked to companies that are literally trying to build things like this, like a power company trying to look at tags on a power pole through OCR or object recognition. Sure, they may want to build something custom for their needs, but they don't have to do all this data management work themselves. They shouldn't have to, and we were finding they were trying to build everything soup to nuts, all the way down the stack.
And that's the kind of place where we believe you just build on top of us, or kind of plug into us, and we can accelerate that process. That's kind of what we're about.

That makes a lot of sense, because it's like thinking about something like the optical recognition, or it could be anything. You could be looking at different kinds of pattern recognition or machine learning algorithms, really of any kind. Ones that run against images are easy to imagine, but they're not the only thing. Any of that stuff is where the competitive advantage is for any of your customers, right? That's where the core business is for them. That's the hard part. That's what they work on every day. You're doing what I like to think of almost as data management plumbing. If you don't do this well enough, it is a competitive disadvantage. But if you do this very well, it's table stakes: it will not make you successful, but not having it will make you unsuccessful. I think that's a really important thing for those leaders out there who are listening to this and are thinking about how we are working with data, and how we are propelling our businesses by leveraging that data and driving better activities from it. Think about that a lot. Think about the things that you're doing poorly that, even if you did them better, would still not necessarily make you successful, because those are areas you don't want to spend a lot of time on. You just need to be able to check that box, get the functionality in place, and move on to the thing that you can do uniquely well. And in this case, just understanding your background and understanding what your business is about, I can promise you, your business is going to do this better than most of your potential clients can do it on their own, because they just can't devote that kind of energy to this layering, to this metadata. This is not easy stuff, either. Anybody going off and just trying to do it as an ad hoc process, as part of a bigger thing, is not going to reach the level they're going to need to be as successful as possible. Would you agree with all that?

Very much. We see ourselves competing more against DIY than against other real vendors, because I believe we're doing something unique. There are obviously parts of it that overlap with other vendors, but mostly what we find is people cobbling together solutions. We were talking to a university that was literally trying to hand-build a solution. They had done it internally from a bunch of different tools, but it only gives them a piece of what they want. If you look at the end result of what they're trying to get to, it's kind of visual inspection across the university, and we're seeing that across a lot of different areas: commercial real estate, universities, all these kinds of things. To me, it's one of those cases where they shouldn't have to be building us. A university should not have to be building out a visual database or something like that. They should be a user, not a builder. That's really what we want to do: we want to let them do their job. They're really smart people, they see a problem, they're trying to solve it. But they should just be able to buy something and focus their efforts in other areas. That's really what we're about. It is just mundane picks-and-shovels kinds of workflows, but I think it accelerates people and lets them focus on the hard problems, like building a bespoke computer vision algorithm, or wrapping an application around APIs that we provide, or something like that. That's really what we're focused on.

You know, in one way, you just gave a great advertisement for why this podcast exists. If I put on my hat of a, you know, business-side manager who's trying to build something, you only have so much time in the day, and it's so exhausting to try to find the right answers to complete that technology stack that you're inclined to just go and build it. You can't deal with one more 'let's have a sales meeting and walk through the deck,' where you actually need three meetings before you can ever see the product. And then when we see the product, we realize, oh wait, that would be great, except for 25 reasons why this would never fit us. And now we've wasted three hours on investigation when we should have just gone and built something. Now, in the context of a podcast, we can say, hey, this is an interesting topic, let's understand this. And as leaders within our organizations, we can start to understand that there is this whole market that many of us, including your host, may not have realized existed. Then we can start to think about the fact that we have a build-versus-buy question in this little part of our entire value stream, and it's something we can be more knowledgeable about as we go into solving that piece for ourselves. Not everybody's going to need something super sophisticated, especially in your early days. But you will quickly realize that this can unlock so much of your energy towards the things that really matter. There are a lot of things in those technology stacks that you're going to want to consider as a leader in your organization, but this is definitely one that, in my opinion, has often been overlooked.

And I think what we can see is the low-code, no-code world of application development. We offer a GraphQL API as part of the platform, so literally, you could build a custom application for your team or your organization just by putting data into our system and using the API to cobble together an easy application for your line of business. That's what we look at: that kind of citizen data analyst at a corporation, or somebody who's technical about their specific problems but doesn't want to, or doesn't have the time or the skill set to, build everything. So we're just trying to fit in the middle of that, not be totally vertical all the way up the stack. We are going to do a couple of vertical examples, but we're not going to be vertical to every industry.

Yeah. And I think, especially as you get into large organizations, a lot of the leadership may have had the technical chops to build it at one point in their careers, but haven't been able to touch code in several years. I can certainly relate to that as well. It's like, I can understand how I would build that, but I don't have the two years to dedicate to building that one thing. I've got to get so much more accomplished. So getting that leverage from people who are experts in that area is definitely something people are going to want to do. I want to come back to something you mentioned, and it was part of the introduction as well: graph-based search. I think knowledge graphs, and the understanding of how to work with them, is an area that a lot of organizations are still catching up to, and some are right there. Can you talk a little bit about graph-based search, why it's important in this area, and what you're doing to drive a product that can leverage it?

Yeah. I think it's a really interesting space. If you look at the progression from SQL to NoSQL to graph databases and things like that, I worked in a related area, but more on the consumer side, around podcast discovery. I was looking into how it relates to information about people, places, and things that are spoken about on a podcast, so I started doing some work on how you can create a graph based on that knowledge extracted from media. Then I started thinking: wow, this applies to this kind of limited domain, so how would it apply to industry? It's somewhat of an indexing scheme. You could build all of this in just a normal database, but if you want to add a field, like another column or something like that, you have to deal with schema migration. What I love about graphs is that they're very dynamic. You can invent a new node type, you can invent a new edge at runtime, in production almost. But then there's also the ability to correlate all that data. An example we talk about a lot with sales and investors is: say I have a tag that says, hey, here's a piece of equipment, ABC 123. I found it in a document. I found it visually in an image with OCR. I spoke that term in a voice memo. And maybe I manually tagged it on something. We can now create edges between all those different points in the media and that tag. That's very hard to represent when that tag didn't exist yesterday and I just extracted it. So we go through an entity extraction and entity enrichment phase, and we basically just keep building out the graph with more data and more relationships, totally dynamically. That's so powerful, because then I can grab that tag and pull on it, see what it's connected to, and start to look at relationships. That is the classic graph benefit. It's always pitched, but once you actually start getting more and more data in there, it really gets interesting to explore. What we've done is what we call the triad of time, geospatial, and metadata. We basically represent the graph in those axes, where we can see things on a map, what's nearby, what's within a geofence; we can do a time range; and then we have the whole graph of metadata and tags. All of those are interrelated. So you can get to that needle in a haystack in your data, essentially by auto-indexing your data, which is what we do, and then update the organization via your own tagging, and drive other applications with it. You could build your own tooling around that. So the graph is really the core heart of our knowledge representation. How we built it is a hybrid. It's a bit of a search index. There is a NoSQL document store in there for JSON storage of metadata. And then there's the graph, which kind of acts as our index. And then our GraphQL API squeezes all that together into one API.
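The equipment tag example above can be sketched as a very small in-memory graph. This is only an illustration under stated assumptions: the `MediaGraph` class, the node IDs, and the edge type names are all hypothetical and say nothing about how the platform discussed here is actually implemented. The point it tries to show is that node and edge types are just strings, so a new one can appear at runtime with no schema migration.

```python
from collections import defaultdict


class MediaGraph:
    """Tiny in-memory property graph. Node and edge types are plain strings,
    so a new type can be introduced at runtime without any schema change."""

    def __init__(self):
        self.nodes = {}                 # node_id -> {"type": ..., other props}
        self.edges = defaultdict(set)   # node_id -> {(edge_type, other_id)}

    def add_node(self, node_id, node_type, **props):
        self.nodes[node_id] = {"type": node_type, **props}

    def add_edge(self, src, edge_type, dst):
        # Store both directions so a tag can be "pulled on" from either end.
        self.edges[src].add((edge_type, dst))
        self.edges[dst].add((edge_type, src))

    def neighbors(self, node_id, edge_type=None):
        """Return sorted IDs of connected nodes, optionally by edge type."""
        return sorted(
            other for etype, other in self.edges[node_id]
            if edge_type is None or etype == edge_type
        )


g = MediaGraph()
g.add_node("tag:ABC123", "equipment_tag")
g.add_node("doc:manual.pdf", "document")
g.add_node("img:pole.jpg", "image")
g.add_node("audio:memo.wav", "voice_memo")

# The same tag was found three different ways in three kinds of media.
g.add_edge("tag:ABC123", "mentioned_in", "doc:manual.pdf")
g.add_edge("tag:ABC123", "seen_via_ocr", "img:pole.jpg")
g.add_edge("tag:ABC123", "spoken_in", "audio:memo.wav")

# A brand-new node type and edge type, invented at runtime.
g.add_node("scan:site.ply", "point_cloud")
g.add_edge("tag:ABC123", "manually_tagged", "scan:site.ply")

related = g.neighbors("tag:ABC123")
```

Here `related` holds all four media items connected to the tag, and `g.neighbors("tag:ABC123", "seen_via_ocr")` narrows that to the image alone. A production system would put a real graph database and search index behind this, but the grab-the-tag-and-pull idea is the same.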

Yeah, it's amazing to me to see how graphs are really starting to take hold in a number of contexts. I've long talked about this notion of highly aligned, loosely coupled architectures. I come from the data space. I built data warehouses for a long time, and they're powerful, and they're still very viable and relevant, especially for operational information, business intelligence, things that don't change very rapidly. But there's an embedding of the schema and the data, and it's all kind of tightly bound. What I love about the graph-based evolution is that it allows those core nuggets of truth that benefit from additional relationship context to get it without added friction. In a data warehouse, it's a lot of added friction to add more context. In graphs, like you said, it's instant. You can literally do it at runtime. It's awesome to see, because I think that's where the future is. I'm still stymied by many of the metadata programs in the enterprise today. Whether we're talking data catalogs or other metadata repository types of things, they're often still reliant on relational databases at the back end. You can deal with it, but it just feels like you're going back in time compared to what the knowledge graph world would allow us to do.

And I mean, my whole company was all built on SQL Server. I have a .NET kind of background, so I come from that world. But I'm really seeing the flexibility with document stores and graphs. As an example, we added a feature to the UI called device type. It's what type of device was used: was it a mobile device? Was it a drone? Was it a robot? How was this image or video captured? It's a new field we invented, and we didn't have to create a new column and do a schema migration or whatever. Essentially, we added this enumerated type, we infer it by doing entity extraction on anything that's ingested, and it just goes into our JSON document. We hoist it up to the search index, and it's in the graph. We can invent new organizational methods almost at runtime. We don't, like, push code, necessarily, for some of these things. I mean, for the UI we do. But that's what I love about the flexibility of it: we start to see patterns and go, wow, this would actually be a great feature in the application, and our developers can just add these things much more dynamically.
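The device type example above, where a new enumerated field is inferred at ingest and simply lands in the JSON document, might look roughly like the sketch below. Everything in it is an assumption made for illustration: the `DeviceType` values, the keyword hints, and the document layout are hypothetical, and real inference would be much richer than substring matching.

```python
import json
from enum import Enum


class DeviceType(str, Enum):
    MOBILE = "mobile"
    DRONE = "drone"
    ROBOT = "robot"
    UNKNOWN = "unknown"


# Hypothetical keyword hints, matched against the camera make and model.
_HINTS = {
    "iphone": DeviceType.MOBILE,
    "pixel": DeviceType.MOBILE,
    "dji": DeviceType.DRONE,
    "spot": DeviceType.ROBOT,
}


def infer_device_type(technical_metadata: dict) -> DeviceType:
    """Guess the capture device from make/model strings in the metadata."""
    make_model = " ".join(
        str(technical_metadata.get(key, "")) for key in ("make", "model")
    ).lower()
    for hint, device_type in _HINTS.items():
        if hint in make_model:
            return device_type
    return DeviceType.UNKNOWN


def enrich(document: dict) -> dict:
    """Return a copy of the metadata document with a device_type field added.
    No column is created and no migration runs: it is one more JSON key."""
    enriched = dict(document)
    technical = document.get("technical", {})
    enriched["device_type"] = infer_device_type(technical).value
    return enriched


old_doc = {"asset_id": "a1", "technical": {"make": "DJI", "model": "Mavic 3"}}
new_doc = enrich(old_doc)
print(json.dumps(new_doc, sort_keys=True))
```

Documents ingested before the field existed stay valid as they are, and documents with no recognizable make or model fall back to `"unknown"`. That tolerance for missing keys is what makes the document-store approach feel like adding fields at runtime.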

Hmm. Yeah. So I want to go back to the beginning, in a way, because Unstruk Data is known for building this unstructured data warehouse, which almost sounds paradoxical, right? What exactly do you mean by unstructured data warehouse?

Yeah, that's a great question. I think it's a way to explore and organize, or in reverse order, to organize and explore data. But there's also storage management at the heart of this. Unstructured data, to me and to our company, is everything from images to audio to video to 3D. It's anything you can basically put on disk. And the ability to then correlate all of those different data types together, with structured data as well, is another goal of our platform. So it's more than just a file system. It's more than just a data lake. It has elements of the data catalog, in that we have metadata management, we have lineage tracking, and we actually have somewhat of a version history as you track metadata. We know where files came from, and we know the artifacts we've generated, like thumbnails and preview versions. So it's a bit of a mix of things, and it is a new area. There aren't other unstructured data warehouses out there, but it's the closest term we could think of that fit. It's about taking raw media-type files and canonicalizing them into some structure that isn't necessarily represented in a table, but is represented in a graph. That is the entity enrichment and extraction phase, and then providing a method for exploring that data. But we also see it as a top-down approach where we have to build a good tool. So we have a product called the Unstruk Lens, which is our front-end application. It gives you that visual exploration and visual inspection process, and it is more for the line-of-business user. So we're not just a platform. We see that as a land-and-expand growth model, where people can start by just dumping data into us, connect their S3 bucket, and use our app, but as they want to start building applications around our ecosystem, they can just talk to our API. It's a playbook I used in the media space, where we had a broadcaster that started to use one of our systems, then two, three, four, five. And then they said, look, we don't really care about your UI anymore. Do you have a REST API to talk to? So they moved to using us headless and just driving us through their workflow system. That's the model I'm looking at: people get a taste of it, start using it, and then it really just gets diversified back into the organization.

Oh, yeah. Especially in large organizations, there are so many applications swimming around. You need to have the UI so that it's an easy access point, but you need the power of API integration on the back end to really integrate deeply. If you are going to be fundamental to the most important data workloads that an organization has, you want to be right alongside them, as close as you can be, all the time. And to do that, you need an API connection point. When it's up to me, I won't even talk to organizations that don't have this at this point. Even if I don't want to use it right away, I just need that flexibility, because at some point down the road, some application may need just a little nugget. And I want people using my application in my business. I don't want to have to spend energy selling them another tool to better use my application. That is an inefficient process as well.

Yep. We were built API first, and we're built platform first. But really, we're also a design-led company, and we have a great head of design who is putting together something we believe is unique as a data exploration tool. So I really am driving us top down and bottom up, where we have to have a really good platform to support a really good UI, and then customers can pick and choose how they want to use us. I think it's that kind of 80/20 rule, where 80% might just get in the door and use us as a line-of-business tool, but for the last 20%, where they need customization, we're already ready for them. We have a plug-in model, we have an API, and you can get webhooks out when things change, stuff like that. So we're made to be integratable.

There's a question I want to circle back to. It was in my mind earlier, and I never got to it. Hopefully I don't alienate my less technical listeners by asking this. I imagine you can work with anything, because you're designed to work with anything, but I think a lot about object storage for a data lake especially. If I'm thinking just straight-up S3, or object storage in other cloud providers, is there any good reason to store that data with some sort of explicit hierarchy? Or is it better in general (and this is a very high-level, general question) to just go completely without a hierarchy? Even in S3, it's basically a bucket with labels. It's a fake hierarchy. It's almost just a throwback to those days of the NASes and the file folder structures, things that are comforting to us but probably don't have any good technical reason anymore. What's easier to work with when you start to think at scale, when you might be looking at a tool like what Unstruk Data has?

I mean, once it gets into us, we don't necessarily care about the hierarchy. We don't even use the hierarchy today. We're just using the technical metadata in the files. We are tracking the original location. We have a concept of a site, and we track what sites files came from as lineage, so we know this image came from this S3 bucket. That's the data-catalog-esque kind of model of it. But yeah, truly, you can have everything flat. Honestly, then it becomes more about backup and restore. If you are partitioning your data by month, year, username, customer name, something like that, some key essentially as your partitioning scheme in the folder or file naming structure, it could just save your butt later, if you need to go back, or you're just trying to delete a user's data or a customer's data. It's way easier to delete a folder at a time than to find a million images across a file system. So it's a balance. We do partition. We actually allocate storage per customer: we spin up a storage account that is like a cache for our customers, and each customer has separate storage, on Azure today, though we can pull from GCP, Azure, and AWS. Internally, we're then partitioning by asset ID. We have a GUID that basically wraps together: here's my master, here's my thumbnail, here's my preview version of videos, all that kind of stuff. So we do have a bit of a light hierarchy, but it's honestly just for ease of deletion. When we go to delete that asset, we just delete that folder and get rid of everything beneath it. So it's not a requirement. It's more of just a nice-to-have.
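The light hierarchy described above, with one prefix per customer and per asset ID so that deleting an asset means deleting one folder, can be sketched with a plain dictionary standing in for an object store. This is a minimal sketch only: the key layout, the artifact file names, and the `acme` customer are hypothetical, and a real object store would list and delete by prefix through its own API rather than through a dict.

```python
import uuid


def asset_key(customer: str, asset_id: str, artifact: str) -> str:
    """Build an object key of the form <customer>/<asset_id>/<artifact>."""
    return f"{customer}/{asset_id}/{artifact}"


def delete_asset(store: dict, customer: str, asset_id: str) -> int:
    """Delete every object under one asset's prefix and return the count."""
    prefix = f"{customer}/{asset_id}/"
    doomed = [key for key in store if key.startswith(prefix)]
    for key in doomed:
        del store[key]
    return len(doomed)


# A plain dict stands in for an object store (key -> bytes).
store = {}
first, second = str(uuid.uuid4()), str(uuid.uuid4())
for asset_id in (first, second):
    for artifact in ("master.mp4", "thumbnail.jpg", "preview.mp4"):
        store[asset_key("acme", asset_id, artifact)] = b""

removed = delete_asset(store, "acme", first)
```

After the call, the master, thumbnail, and preview for the first asset are gone and the second asset is untouched. Nothing had to search the whole store for files that belong together, which is the ease-of-deletion argument for keeping even a fake hierarchy in the key names.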

That makes sense. Is there a minimum viable amount? Is there a point where, once you hit X level as a customer or potential customer, you're really going to need help with this kind of area? Do you see any patterns or rules of thumb for the folks out there who might be potential customers?

Yeah, once you get into the thousands range and above. I mean, we have customers talking about 100 terabytes a year of content, or millions of images. For us, we kind of say 10,000 is maybe around the low bar. But if you're starting to manage in that range of a number of thousands of images or videos or whatever, you're probably going to need a bit more of a professional tool. Sure, you could probably just do this in Google Photos or whatever if you just had 100, or a couple hundred, images that you were trying to figure out, because you can see where everything is on a map, and you could do a little bit of light tagging. But if you're really getting into data exploration, it's like the question of why you need a data warehouse versus a Google Sheet. It's kind of that model: there's a tipping point. And for us, we're focused on scale. It's a serverless architecture, it's all event driven, and it's built to scale, to have millions of media files flowing through at any one time, and to burst and scale out. That's where the graph gets really important, because it gives us the ability to have good, fast search and organization and things like that. We're on Azure today. We're coming out of the gate leveraging a lot of the Azure managed services, but we are looking at potentially making the technology more portable in the future, to GCP, maybe just Kubernetes native, things like that, as the need is there. And we could even use other databases. We could be pluggable on the database front too, if we wanted to plug in Neo4j or TigerGraph or something like that. I mean, you've got to start somewhere. So as we move forward, there's potential to make this more portable with some refactoring.

So, for your clients, are you providing this as a service? If I had a locked-down cloud environment and wanted to apply this, do you offer that as part of the platform, or is this strictly a software-as-a-service type of thing?

The initial rollout is software as a service, but we do partition the data, and it's a consumption-based billing plan, where customers pay by the number of terabytes under management. We are looking at going on Azure Marketplace, so you could basically run this in your own resource group and your own Azure subscription. We're looking at a bunch of different options like that. And I do anticipate, if I look 12 to 24 months down the road, that most likely we're going to have to be portable to other platforms. We've heard about totally air-gapped systems, or just running this all on a Kubernetes cluster. There's a little bit of refactoring we'll probably need to do to get there. But we've got to get out of the gate with something, and that is a SaaS product right now.

No, that makes a lot of sense. And especially in the relatively early days, where, if you're offering it as a SaaS product, you're able to iterate very quickly. There's such a support overhead for more distributed applications and self-installing. And as you get to that, it'll help you get further into the larger enterprises that have a lot more controls around such things. But I would agree, this gives you the best blend today of innovation and marketability and your sales cycles. And even the cost of sale and the resources; you know, in an early company, the cash flows and things like that are obviously very important. Um, you know, while we still have a couple minutes, and we've kind of just started to naturally segue towards this: we have a lot of other entrepreneurs or others that are thinking about building companies, or are building companies, in the space. Can you tell us a little bit about what your experience has been? I know you've gone out and you've raised funding, and you've done this before, you've sold businesses. I have, to be honest, limited experience myself in the VC space or private equity space, and, you know, with companies that are looking for funding. But I have to imagine it's a really difficult thing to get VC folks to understand what the heck you're talking about. It's hard enough for us to understand what we're talking about sometimes, let alone get to a point where we can say, hey, this is something you should invest in, and help them see those market opportunities. Can you tell us some stories about what that has been like in this space? It's so hard even for technical people to wrap their heads around.

It's a really interesting thing. So my first company I bootstrapped. I'd come out of Microsoft; it was right after the stock market crash in, like, 2000. So I basically said, I want to do something new and invest in this new area. It was a very different world. And it was a long slog, way longer than I thought it was going to be, and then I ended up selling it in 2012. This one, I've been working on the back end of this product for about four or five years, just kind of an idea of it. I was thinking about it more as a podcast discovery platform, like a knowledge layer around media, but media from the consumer side. So that was kind of my passion project for a while, and what I ended up having was a lot of interaction and technology that I was able to just dump into the new company when we got funding. And the funding process, really... I was CTO of a company, and a lot of it is who you know. I mean, it's a lot of just luck and the right contacts, and everything kind of came together. And with the horrible side of COVID in the last year, that actually caused some changes at the company. We were going to do some different things with a product that we'd been incubating, and so there was an ability to transition out of this last company. So I had to make a decision: go get another job, or follow my passion. And I had the right contacts at the time to kick off the fundraising process. Surprisingly, I mean, the folks we've met, and I've met, I don't know, 50 or more folks in the last... I guess we started the process in, like, November of last year, so it's getting close to eight to nine months in the process. They're savvy. I mean, the VCs that we've talked to, they're data savvy, they're ML savvy, and they get it. And so, I mean, there was a handful that debated kind of vertical versus horizontal solutions, since we were coming out with a bit of both.
I mean, I'd kind of say we're, like, 60/40, maybe, platform versus app right now. But there's a lot of companies coming out with maybe something sort of similar, but very vertical, like, I'm going to do a solution for insurance companies, or I'm going to do a solution for construction. And that's what I didn't want to do. I wanted to stay flexible. I'm sort of a platform guy by nature. But I knew that we needed to be designed as well. And so, I mean, there's just a very small handful, say, like, 10% of the folks we talked to, who didn't get it. But really, the vast majority got it and were like, this is underserved, and understood that this is the market. We want to sit side by side with the Snowflakes of the world and say, look, we can do this unstructured data side very well; we can then integrate over time and be the best of both worlds. That's what our goal is, to fit a niche. And we're seeing it now with, I mean, feature stores; there was another data catalog which just got released, and a lot of great companies like Fivetran. There's so much great data technology coming out that I feel like we fit right in that mix. But our sales process is also there. Initially it's to the line-of-business users, somebody at a port, somebody at a university, who need a way to visually inspect and manage data. So they're the consumers. But in parallel, we're also looking at a lot of bigger companies that are managing data at scale, that have much bigger things. They maybe don't care about the UI as much; they're just trying to integrate into a workflow. And that is something I'm aware of; we're hiring a new head of go-to-market now.
And so we're really planning for the next... we're getting into that next phase of growth, and saying, look, how do we bifurcate and target that user persona and the kind of bigger enterprise persona as well, because we really see, I mean, the ability to push on both sides of things in parallel.

Yeah. And I mean, it does make sense. It's not like technology startups are a new idea; VCs have had to get pretty sophisticated in how they're able to interpret into their world what you're talking about in your world. They may not necessarily know, you know, some of the technical nitty-gritty differences, but they can put it in a box that allows them to take action on it. And actually, that's a pretty good lesson for a lot of us, because this is literally a kind of tool where, depending on what you're trying to do in your organization, you may say, hey, we have some unstructured data, and we have a whole lot of trouble making sense of it. And putting that in its box, your solution would allow your technology customers to do that. It's a very similar pattern. And it's understandable that we won't necessarily know all the ins and outs of doing it, because that's why we want to purchase it, you know.

And I think, I mean, a lot of it... our first goal is get data in, get data organized. And then it's, what do you want to do with that data? And so the flip side of that is, I want to be notified when there's an anomaly. Tell me when things happen. That's the other part we're going to bring out later this year. And honestly, data collaboration is the only thing we haven't talked about: how do teams and organizations discuss data, share data, comment on data? That's another big part of what we're doing; we're actually baking in collaboration with the data. So I have a data set that I want to share with, I mean, at a port, it might be somebody's manager, but maybe it's in another organization, maybe it's another data scientist I want to share data to and have them build a model on it. But I want to integrate that lineage of comments, I mean, almost like likes, things like that, with that whole flow. So that's something we're very opinionated about. We're burning that in to the product.

If we were sitting in the same room, that one would have gotten a high five, because I've long said, why are we trying to approach metadata in our organizations and get to this crystal clear, totally perfect, sanitized version of truth? I'm like, you know where the best context is? It's in the debate, it's in the nuance. I love the analogy: it's like back in the day when you were in school, and you would take a math test, and you would get to the right answer, but you didn't show your work, and you only got partial credit. That's what we're doing. And we're taking three times longer to try to get to that perfect answer, when really it's the work that's so important. Because sometimes that sanitized answer doesn't take into account my particular need when I'm going to look for an answer. And if you can see that collaboration, if you can see the debate, if you can see the disagreement, have the opinion from the opposing side, right, have that understanding, and let that be okay... I think that's something that we have to do in an organizational sense. And from a cultural sense, in our organizations, we have to become okay with imperfection.

It is, actually. It's funny to me how many organizations out there claim or desire to be agile, or to be iterative in how they are building things, and yet refuse to acknowledge any imperfection. Because to me, that's impossible. It's implied: you have to be able to acknowledge, hey, this isn't perfect, and that's okay, because we're going to make it better. It's not only okay, it's actually essential for this to get better.

It makes total sense. And I think, I mean, even just a visual exploration of data shows interesting things. It's not all about writing algorithms and Python code to look for data. I was even just exploring it as we go through building the product and looking at sample data. I lived in Austin for a while, and I was looking at some pictures that I'd taken, and visually seeing anomalies where I was like, wait a minute, that picture, I know that picture; the data quality is actually wrong, because it should have been over here from a GPS coordinate. And so there's things like that, where we gave the ability to, I mean, pare down the data by time, pare it down geospatially, filter it by... I knew I was using a Samsung camera, so only show me those images that were taken on a Samsung phone. Then display that, look at that result set and visually explore the data, and then see anomalies. And then, oh, I want to comment on that and be like, hey, you over there on that team, you gave me this data, and this doesn't actually meet my expectations. You need a programmatic way to do that, but also a human-in-the-loop way to do that. And so a platform that can look at all those approaches and have that integrated collaboration is super key.
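[Editor's sketch] The kind of filtering described here (pare down by time, narrow to one camera make, then spot images whose GPS position looks wrong) might look roughly like the following. This is only an illustration under stated assumptions, not the product's actual API: the `ImageRecord` fields, the function names, the sample records, and the 50 km threshold are all hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime
from math import asin, cos, radians, sin, sqrt


@dataclass
class ImageRecord:
    # Hypothetical metadata record; field names are illustrative only.
    name: str
    taken_at: datetime
    lat: float
    lon: float
    camera_make: str


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two lat/lon points, in kilometres."""
    dlat = radians(lat2 - lat1)
    dlon = radians(lon2 - lon1)
    a = sin(dlat / 2) ** 2 + cos(radians(lat1)) * cos(radians(lat2)) * sin(dlon / 2) ** 2
    return 2 * 6371.0 * asin(sqrt(a))


def filter_images(records, start, end, camera_make):
    """Keep records inside the time window taken with the given camera make."""
    wanted = camera_make.lower()
    return [
        r for r in records
        if start <= r.taken_at <= end and r.camera_make.lower() == wanted
    ]


def flag_location_outliers(records, center, max_km):
    """Return records farther than max_km from the expected location."""
    return [
        r for r in records
        if haversine_km(r.lat, r.lon, center[0], center[1]) > max_km
    ]


# Made-up sample data: three Samsung photos, one of which carries
# coordinates far from where the set is expected to be (Austin).
AUSTIN = (30.2672, -97.7431)
records = [
    ImageRecord("a.jpg", datetime(2019, 5, 1, 10), 30.27, -97.74, "Samsung"),
    ImageRecord("b.jpg", datetime(2019, 5, 2, 11), 30.30, -97.70, "Samsung"),
    ImageRecord("c.jpg", datetime(2019, 5, 3, 12), 47.61, -122.33, "Samsung"),
    ImageRecord("d.jpg", datetime(2019, 5, 4, 13), 30.25, -97.75, "Apple"),
]

subset = filter_images(records, datetime(2019, 1, 1), datetime(2019, 12, 31), "samsung")
suspects = flag_location_outliers(subset, AUSTIN, max_km=50)
print([r.name for r in suspects])
```

A programmatic check like this only surfaces candidates; the human-in-the-loop step from the conversation, someone looking at the flagged image and commenting on it, would still decide whether the coordinate is actually wrong.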

Yeah, I couldn't agree more. It's so cool to hear you talk about this from your perspective of building this kind of technology, with the knowledge graphs and, you know, the thought behind it of bringing in the collaboration and recognizing that there is, especially on the unstructured data side... I mean, even the structured data side is bad enough. Technical metadata, I think we've covered that okay. But I think about how, even when you have structured information, sometimes that business-side metadata gets jumbled up into unstructured areas within structured information. And being able to extract that and start to understand it, that's where the real gold is, that's where the real value is in a lot of this stuff. And to be able to tackle that head on and provide the tooling, because this isn't something where you can just go and say, hey, we'll come in for an engagement, we'll give you all of the insight about your business from your data. You can't. It's about providing capabilities that an organization can leverage to understand their data and the environment, their assets, that much better, and use them and explore them in ways that they can't do today.

And it is, like, I mean, just look at images. Everybody's familiar with maybe a little bit of metadata on images. But there's so much diversity in unstructured data that I think once you get into it... I've kind of lived almost my whole career doing all this kind of stuff, like looking at TIFF files or looking at MP4 files, and there's so much more information in there that people aren't aware of. And the ability to, I mean, dig in there and go... I mean, does everybody realize there's an altitude captured in every picture you take on your iPhone? You can actually use that to infer, like, was this taken higher in a building or at grade, from that data. And so there's so much more information there that we're looking at providing that discoverability platform on top of. And then, I mean, really just letting people even... I'm really excited about AutoML, I mean, the ability to build new algorithms in, you know, a low-code, easy environment. I know we're never going to get to 100%, but does it let them filter their data? Does it let them organize their data? And even similarity search. I mean, we're talking to partners about similarity search as a service, so we could say, hey, show me things like this. And to me, I mean, if we could do that in the graph realm, and be like, hey, what's tagged like this, or what looks like this, or what sounds like this, it's a perfect fit. And so we're going to look at integrating that later this year as well. So just that whole exploration, and trying to get to that needle in a haystack, is really boundless. I mean, we see there's so many opportunities there that I think it's a super fun area, to be dealing with media, but in a much wider domain than the media entertainment world, which is actually kind of a small market.
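[Editor's sketch] The "show me things like this" idea is commonly implemented by comparing feature vectors (embeddings) with cosine similarity. The snippet below is a minimal sketch of that general technique, assuming each media item has already been reduced to a small numeric vector. The file names, vectors, and function names are made up for illustration, and nothing here reflects how the product or its partners actually implement similarity search.

```python
from math import sqrt


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def most_similar(query, index, top_k=2):
    """Rank every item in the index against the query vector, best match first."""
    scored = [(name, cosine_similarity(query, vec)) for name, vec in index.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Hypothetical three-dimensional embeddings; real ones would come from a
# model and have hundreds of dimensions.
index = {
    "bridge_01.jpg": [0.9, 0.1, 0.0],
    "bridge_02.jpg": [0.8, 0.2, 0.1],
    "crane_01.jpg": [0.1, 0.9, 0.2],
    "meeting.mp3": [0.0, 0.1, 0.9],
}

results = most_similar([1.0, 0.1, 0.0], index)
print([name for name, _ in results])
```

A brute-force scan like this is fine for a handful of items; at the scale discussed in the conversation (millions of images), an approximate nearest-neighbor index would typically replace the linear scan.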

Yeah, this is so exciting. And I can't wait to see where you take this organization. I hope that you'll come back and share some updates down the road, because I think it'll be really cool to see how, you know, you're going to evolve, you're going to have to pivot, you're going to have to think about things and take opportunities. But to understand this space and to learn more about it and how you grow this company going forward, we're really interested in following back up with you. So unfortunately, we're over time. I was like, we're going to get 45 minutes; we're going to go long, and that's okay, because it's been an awesome conversation. But we're going to have to cut it here. So Kirk, thank you so much for being on the show today.

Yeah, thank you. I really appreciate the opportunity. Great discussion.

Absolutely. And thank you all for watching or listening today. You'll find more info in the show notes. Please remember to follow Data Leadership Lessons on YouTube or wherever you get your podcasts. If you enjoy the show, please rate and review and tell others about us. Learn more about data leadership with my book at dataleadershipbook.com, and use promo code ALGMINDL at the DATAVERSITY Online Training Center for 20% off your first purchase. Stay safe during these unusual times, and go make an impact.