Generative AI won’t be building Falcon 9s or new space shuttles just yet (wait a few years!). But can it help with all the work that goes into running an organization that builds the future?
According to Kendall Clark, CEO of Stardog, yes.
Generative AI that democratizes access to data, insight, and knowledge speeds up organizations, and that can help with launching spaceships or anything else. For NASA, a generative AI solution is apparently helping the team do in days what used to take weeks.
AI summary: using generative AI to speed up the enterprise
The script is a conversation between John Koetsier and Kendall Clark, CEO of Stardog, during a technology podcast. The discussion revolves around the role of generative AI in speeding up complex processes within large organizations. Kendall Clark discusses how Stardog leverages generative AI for data management, differentiating their approach from other companies like Salesforce by focusing on on-premise and hybrid multi-cloud environments. He also explains their strategy to prevent hallucinations or errors in AI-generated responses. The conversation concludes with the significance of creativity in AI applications.
AI transcript: generative AI and institutional knowledge systems
John Koetsier: Can generative AI make rockets launch faster? Hello and welcome to TechFirst. My name is John Koetsier. I’ve done a ton of TechFirsts on generative AI. It’s getting pretty good. OpenAI just announced it can see, listen, talk back. What about the enterprise? What about companies, big organizations?
Specifically, what about NASA? A generative AI solution is apparently helping NASA take what used to take weeks down to days.
To dig in, we’re joined by Stardog CEO Kendall Clark. Welcome, Kendall.
Kendall Clark: Hi, John. Thanks for having me.
John Koetsier: Hey, great name, Stardog. That’s an awesome name for a company. My first question is: so is NASA going to Mars next month, thanks to generative AI?
Kendall Clark: When NASA returns to the moon and goes to Mars for the first time is more a function of the U.S. Senate, to be honest, and those budgets and that sort of thing. It’s obviously an expensive endeavor.
But to answer your question, I think generative AI can help speed up a lot of complex things. There’s going to be a bunch of waves, and I think in this first wave most of the interest is among the educated public, or people who are paying attention, right? It’s not everybody.
We should remind ourselves of that. The people who are paying attention are doing so largely because of its impact in what we call the B2C space: help me make an invitation for my children’s seventh birthday party, and make it have dragons and fairies and something else, and out pop these amazing photos.
We’ve all played with that and I love it. I’m addicted to it. Frankly, I can’t make a deck now without having some generative AI images, so much so that my employees now tease me about it. Stardog is a great brand partially because it lends itself so well to that. I have a folder called astronaut dogs on my computer.
That’s got God knows how many variations. I’m obsessed. In the enterprise, I think the first impact from generative AI will be in what we can call question answering: the movement from query writing to question answering. I would say that something like, and I’ll just make this up, 60 percent of the value, more than half of the value that enterprises get from information technology at all, is in the area of answering questions of data.
And the dominant way that’s happened to date is there is some data in a database somewhere. There’s a lot of those, in fact. And someone who’s either very smart, or someone who spent a lot of money on a product like Tableau, maybe, who’s got a BI tool, does something, manipulates an interface in some way.
And that results in a simple to very complex query, often a SQL query. That SQL query goes to a system, gets executed, and an answer comes back, and value is achieved because the answer says false instead of true, or it says 17,000 instead of minus 17,000, or it says John instead of Kendall, or whatever.
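As a rough illustration of the traditional query path described here (this is not something shown in the episode), the sketch below uses Python’s built-in `sqlite3` module. The `ledger` table, the account names, and the helper functions `build_ledger` and `net_position` are all hypothetical, chosen only to show how a hand-written SQL query turns stored rows into an answer such as 17,000 versus minus 17,000.

```python
import sqlite3


def build_ledger() -> sqlite3.Connection:
    """Create a tiny in-memory database with made-up ledger rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE ledger (account TEXT, amount INTEGER)")
    conn.executemany(
        "INSERT INTO ledger (account, amount) VALUES (?, ?)",
        [("ops", 20000), ("ops", -3000), ("r_and_d", -17000)],
    )
    return conn


def net_position(conn: sqlite3.Connection, account: str):
    """Run a hand-written SQL query and return the summed amount."""
    row = conn.execute(
        "SELECT SUM(amount) FROM ledger WHERE account = ?",
        (account,),
    ).fetchone()
    return row[0]  # None if the account has no rows


if __name__ == "__main__":
    conn = build_ledger()
    print(net_position(conn, "ops"))      # sum of the two "ops" rows
    print(net_position(conn, "r_and_d"))  # the single "r_and_d" row
```

In this traditional model, someone has to know the schema and write the `SELECT` statement; that is the step the conversation goes on to say question answering will compress.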
John Koetsier: Or more likely you discover, oh, shoot, that was actually not the right question to ask the data. I should have asked this question. And then you send that question back to the data analyst. The data analyst goes, why did this idiot tell me the wrong question in the first place, and then runs that query, and that takes another round of time.
And then you go back and forth and finally, five or six levels down, you dial in what you think might actually be the real answer you wanted.
Kendall Clark: You’re a hundred percent on it. And what that means is there is space, organizational friction. Now, people were employed because of this friction, to make it work anyway, but there’s space between my intent of wanting to ask a question of the data and the process by which that gets translated into a query, either by a piece of software, by some people, or typically both. Then it gets executed, and then it comes back to me and, oh, it wasn’t quite right. For the folks in your audience who are IT-minded or history-of-IT-minded, this is a little bit what it was like to write code in the sixties.
Or even in the seventies: you’ve got two chances per day to ask questions, to make changes to your code base, because of the compiler and the tool chain and the nature of the languages and the slowness of the computers back then. You start your morning run. It took four hours to compile.
You went and found something else to do. You came back at noon. Oh shit, I forgot a semicolon or something like that, right? Oh, run it again. The day is shot. But now we use these super fast computers, high-level languages, dynamic compilers, stuff like Python. I can ask a million questions.
So the iterative cycle has sped up tremendously for programmers. For generative AI in the enterprise, the first big impact (my prediction is not even a prediction, it’s what I think we’re already seeing) is the movement away from query writing, which is this process we’ve been talking about, which works, but it’s slow and it’s error prone and it’s got a lot of extra people in it.
There’s a lot of people between the data and my intent. That’s all going to get compressed. And so question answering is: the LLM takes the place of all that space, right? And it says, oh, you’re asking what is the test readiness of this subcomponent of the manned space capsule, the return-to-the-moon thing.
And give me the full lineage and traceability of all the... That’s a complex question. That’s literally rocket science, right? And now, if people (not just technical people, but something like everyone, knowledge workers) can interact with the data, I will say directly from their experiential point of view, although there’s obviously a lot of stuff in between, a big stack, that’s going to compress all those cycles, much like the rise of dynamic languages did for programmer productivity.
We’re going to see that in what we used to call general office workers, but now business analysts, knowledge workers, people whose job depends on interrogating the data. With normal technology, interrogating is a metaphor. It’s like a fancy word. In the LLM era, the question-answering era,
interrogating is not a metaphor anymore. We’ve concretized the metaphor. We’ve made it literal. You’re literally like, what about this? What about this? What about this? Firing questions at the data by typing them out. As you say, OpenAI is extending this to this kind of verbal thing. That’s a fun thing, but it’s not going to really make a difference to the answers you get back,
I don’t think. But yeah, that’s the first thing we’re going to see. I think that’s the original nub of your question. So in that sense, it can help everything that NASA does, everything that all of our customers do, everything that everyone who’s engaging this technology does. It can help them make their jobs go faster and better because effectively, I like to say, it democratizes access to data.
John Koetsier: So it’s interesting, because when I got the pitch for you to jump on the podcast, what I immediately thought of was enterprise knowledge management. There have been huge products and projects in enterprise knowledge management that enterprises have had going on for, I want to say decades, and I don’t even think that’s an exaggeration.
What do we know? How do we categorize what we know? How do we put it in a place where people can access it? How do we search it? How do we surface it? And as we were pre-chatting before we started recording, you were saying, hey, that’s not our space. We’re not about that sort of static data, that static knowledge and documents and stuff like that.
We’re about data. Talk about the evolution of knowledge management or how you fit or contrast to it.
Kendall Clark: Yeah, it’s a fair question. I feel like I’m cynical about knowledge management, but that’s the average view, I think. As we were talking before, you signaled some of that yourself.
Let’s start with the fair thing to say. I think the fair thing to say is it has made sense at all points since, let’s say, I don’t know, 1970, for big companies to make some investments in what we should really call library science, because that’s really what it is. I guess I played it for a joke a little bit, but I mean it seriously: librarians serve a really super useful function in our society by organizing knowledge, right?
And I know no one in your audience, or who’s listening to us, who’s under, what would you say, 40? Certainly no one under 30 will know what these next words mean. But remember, you used to be able to go to a library. It was a place, right, in the world. It’s not the mall, and it had knowledge in it.
Primarily in the form of books, but other forms as well. And there were these people there who basically organized that knowledge, and they sat there and waited all day for you to come in and ask a question. And they loved to help you answer that question. That was a way that our society worked, right? It was kind of a socialist vision, frankly, strictly speaking: that knowledge was a common asset that we had created as a civilization.
We should all have equal access to it. You even used to be able to call them on the telephone and say, I’m writing a story about the winter migration of carrier pigeons in Finland, I don’t know, something, whatever. And they’d go, okay, there’s a book about that, come get it. They’d be very excited.
And then you would read it and you would know stuff. The web ruined that, or destroyed it, or changed it, altered it forever. And in some ways what’s happened with the web, what Google has done, is find a way to make the machines do a lot of that work. And so with respect to documents and websites (the web is just a collection of documents,
after all), we mostly self serve. We go into the search bar. This is what everybody knows how to do. That’s replaced that call to a librarian, but it’s the same. We’re satisfying the same human need. So with respect to knowledge management, doing that for large bodies of information inside of a big enterprise, my hat’s off.
It’s on the side of the angels, right? I don’t think we can be cynical about that. What I think we can be cynical about is that there was always this obvious, obvious to me, collision course between what we typically call data management (enterprise data management, ETL, databases, data warehouses) and knowledge management.
Those needed to, as I like to say, be on a collision course, smash together, mutate into some new thing. And I think it’s obviously been the case for the last, say, three or four years that that’s happened. The LLM in particular means, I think, you can’t deny that anymore: this question answering capability we were talking about previously, and then you can extend question answering to all the other traditional jobs to be done in data management, data modeling, data mapping, data quality, discovery, metadata management, inference rules,
and then the traditional realm of data science. Machine learning has eaten all those things, in a way that’s now really accessible to everyone, not just to Google. And that’s going to forever change the practice of data management, just like the web forever changed the practice of
library science and knowledge organization. That’s my non-cynical take on what’s happening.
John Koetsier: It would be really interesting, actually, if you could somehow study and understand what percentage of human knowledge, let’s say, resided in dead trees, and then how that moved to documents, electrons in hard drives, and how that transition is continuing as a greater and greater percentage of our operational knowledge
transitions to maybe more dynamic forms of knowledge that are in databases, that are measuring processes that are ongoing, that are real time in a lot of senses. That’s an interesting transition in what percentage of the world’s knowledge is in different places. Certainly the percentage that is in a live database, one that is growing, that is measuring live activity,
and that you want to query because you want to know the status of that live activity, is growing. And being able to access that easily is really impressive.
Kendall Clark: Okay. So this is a super interesting question. You forgot, or you didn’t mention, I should say, a third important source, which is the knowledge that only resides
in and between people. Yeah, exactly. Knowledge that, for a variety of reasons, people didn’t need to write down, or they haven’t had time to write down, or it’s just too fluid and it doesn’t fit in a database. You don’t really put stuff into a database until it has a particular kind of ossification of form. By which I mean, traditionally what we mean by database is a relational database, a particular data model.
It’s not the most agile, flexible thing. In fact, it’s rigid. Relational databases were typically intended for basically accounting data, and accounting data, whatever else it is, is not dynamic and fluid and creative. The values may change. You have a status of an account that changes. But the rules of structure, right?
The GAAP rules are pretty fixed. It goes back to what, 16th-century Florence? Double-entry accounting. A lot of that stuff is really old and well understood. Then you jump ahead to somebody like NASA, literally trying to do rocket science, get humans back and forth across the solar system, and they’re learning new things every day.
They’re right on the cusp of the boundary, the border between knowledge and ignorance. On the other side of what we know is the black, scary nothingness of ignorance. And if you’re trying to peck away at that, push that border out a little bit every day, you may need different techniques for data management.
What I think is interesting is that you can add to that the fact that, while you say the big historical trend is to go from books to electronic form, all of the growth forecasted for enterprise data in the next 10 years is not in what we call structured data, databases. It’s in semi-structured and unstructured data.
So this conversation 20 years ago was two guys talking, right? This conversation five years ago was a thing you could watch on YouTube. This conversation now, or any time in the future: I push a button at the end, you push a button, out pops a transcript. We stick that into some kind of knowledge platform, and all the entities and everything we mentioned (I mentioned migratory patterns of birds in Finland, we mentioned libraries, you mentioned those accounting rules) all pop up as nodes in a graph, the knowledge graph, right?
And connections between them. Okay, this is a conversation of a different kind, but if we’re doing this for work, this might be work product, right? So that transition about where the knowledge is, that’s what they call in academia the sociology of knowledge, the production of knowledge: thinking about knowledge getting produced like thinking about cars getting produced.
It’s an industry, and there are processes, and there’s inputs and outputs, and you can measure it. There’s been this whole big field in academia, since probably the eighties, studying the output of knowledge. That’s what we were talking about when we were talking about knowledge management and data management colliding and fusing into a new thing: taking a lot of those techniques, a lot of the new algorithmic insights, and helping big companies
manage the data that they produce better. I will say, and I’ll stop with this, I don’t want to give you a filibuster here: it’s interesting to think about companies’ competitive landscape vis-à-vis one another. You take two big global pharmaceutical companies. Maybe the most differentiated assets they have are their data sets.
I mean, you could maybe swap all the people, and the people at one pharma can do the jobs at the other, and there’s some winners and losers at the margins. But the easiest way to destroy a company, as a thought experiment, is to take all their data and swap it with their nearest competitor’s.
So you come to work on Monday and, let’s say, all of GSK’s data belongs to Novo Nordisk and vice versa. You haven’t destroyed anything. Every byte’s preserved, right? You just swapped it. But what happens? They’re destroyed, right? So I think it’s difficult to overemphasize or exaggerate the importance of managing data and managing knowledge. And yeah, generative AI produces text, and all of this knowledge and data management ultimately more or less ends up in text.
Let’s say images are a form of text, right? Close enough. The applicability of these techniques to this area is pretty endless.
John Koetsier: Yeah, it’s an interesting space, and I just came back from Salesforce’s big Dreamforce conference in San Francisco, and they’ve added a ton of generative AI to Tableau, which you already mentioned, other products as well.
Their vision also is that you can query your data in natural language. Anyone can do it. Everyone is a data scientist, all that stuff. And so that’s super powerful. Of course, if you are going to buy Salesforce, I’m pretty sure you’re in for significant charges for each user, and significant challenges there. But it is a compelling vision: that all the data
that a company produces is at your fingertips, that you have control over it, that you have access to it, that you have permissions for it, and you can query it. You can know what’s going on. You’re a sales rep, you’re a sales manager, you’re a product manager, and you can instantly know all this stuff. How does your vision differentiate from that?
Kendall Clark: Well, at a high enough level, it doesn’t. That’s exactly what we want to do. I think there are differences that matter. First off, I would say Stardog’s really focused on financial services, pharma, and manufacturers of a certain size. And unlike Salesforce... now, it’s interesting.
Salesforce is an interesting example because they did not start off as an enterprise data management company. They became an enterprise data management company because strategically they decided to move in that direction, because they make a lot of money and they’ve got, frankly, excess capital they need to deploy. They could have done many things, but moving into data management makes sense because they do control a
strategically critical corporate data asset in terms of the CRM. And that gives you some leverage. And so it’s clear that the acquisition of MuleSoft a couple of years ago, six years, whatever, was a big signal: hey, we’re going to be a data management company.
I think probably the biggest differentiation between our vision and theirs is that Salesforce is really a cloud company. And they’re really best at managing and connecting data that exists in the cloud. But companies still have a lot of data in what we call on prem, not in the cloud. And our focus has always been on that data that either hasn’t gone to the cloud or will never go to the cloud.
So Stardog is a cloud platform, but it also can operate on prem. It’s a Kubernetes platform, which, for technical folks in your audience... Kubernetes basically replaced the Java virtual machine as the enterprise delivery mechanism, the dominant one. But that just means we can operate our platform,
and our customers can operate it, both on prem and in any cloud environment. That just means Stardog can be adjacent to data no matter where it is, not just the part of the data that’s in the cloud. Even if in the end, in the next 10 years, let’s say, 80 percent of all corporate data resides in the cloud, 20 percent of all enterprise data is still a very large amount of data.
And it needs to be connected. What we’re really focused on is connecting data and then making it accessible, with this LLM technology we’ve been discussing, in what everyone calls the hybrid multi-cloud. So that’s the part of the data that’s on prem and hasn’t moved to the cloud yet.
Or again, my favorite statistic: 85 percent of all businesses, irrespective of size, have data assets in more than one cloud. Now for most businesses, that means they have Salesforce and HubSpot, right? Which is fine. Those are different solutions, different clouds, but really, that problem is going to get solved for SMBs and small businesses by those vendors.
But it’s true of big businesses too. Our big banking customers have data everywhere, in every conceivable location and format.
John Koetsier: And it’s not comforting that the financial industry has this data everywhere.
Kendall Clark: That doesn’t mean they’re not controlling it. Here’s what I mean by everywhere.
Globally significant banks are unlike, say, Facebook in one important regard. Facebook is like a teenager of a business, and globally significant banks are like grandparents. They’re, on average, what, 75, 100, 150 years old? So they’ve existed longer than computers have existed, which just means that if you take a cross-sectional slice of a big bank, you’re looking at the archaeology of the last 70 years of IT.
They started with mainframes, or whatever was even before mainframes, boxes of punch cards or whatever. And they’ve got one of everything, and there’s legacy all over the place. They have data everywhere in that sense. That may also not comfort you, which is fair. But that’s a tough problem to solve.
You’ve got a system that’s running. It works. It meets requirements. It’s just old. And you meet some IT people, and their smart view is: don’t mess with that. Leave it alone. It’s running. Why mess with it? And then somebody else, equally smart, says no, we need to modernize that. Nobody’s right or wrong there.
It’s just a hard problem. Not to come on your show and make a brief for banks, but you get my point. They’re in a tough spot when your organization’s 150 years old.
John Koetsier: They are in a tough spot. And that’s why we see the rise of neobanks. But we are straying way far afield, so we’re going to pull it back here.
So you’re building your solutions so that organizations can query their data. That’s great. NASA is using it. Others are using it. How do you solve hallucinations? That’s obviously a challenge with generative AI, and that’s a problem you cannot have in your scenario. It’s a problem I’m okay with if I talk to OpenAI. I can ask, does it pass the sniff test?
I can double check it. Bard, Google’s Bard, just added some double checking of what it says as well. How do you solve hallucinations?
Kendall Clark: Yeah, look, there’s a cheating answer to this question, which is what we’re doing. And then there’s the hard research question of making the LLMs stop hallucinating.
I won’t address that. That’s a research question. It will get solved, I suspect, to some appreciable level of quality, precision, and recall. I think the first thing to say is LLMs are not databases, and they should not be treated like databases. The way we solve it, which is somewhat cheating, is we don’t use LLMs in Stardog as a source of data.
We use the LLM as a mechanism to discern human intent, which is kind of a fancy way of saying: what is the person talking about in their natural language? One of my co-founders, my CTO, was born in Istanbul, so he speaks Turkish and English. And I said to him at some point, how’s the LLM working in Turkish?
As a joke. I knew they worked for many natural languages, but not necessarily for all of them. And he just showed me a demo. This was this summer. It was just straight up Turkish and it just worked. It was amazing. We use the LLM as a way to figure out what the person is talking about,
and translate that into a query or a search or a hybrid query search, or a data modeling piece or a data mapping piece or a rule or something, that then gets executed by our platform. And that cuts out the chance for hallucination. What it means is sometimes the LLM will get the human intent determination wrong.
But then that just means a wrong query. Now we’re just back in the case you described earlier: someone expressed the need for a business question, some other mechanism translated that into some query and didn’t get it right. Frankly, relative to the status quo, that’s no worse. So what happens … you just redo it.
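The pattern described in this answer, using the language model only to work out intent while the data itself comes from a query executed against a real store, can be sketched roughly as follows. This is not Stardog’s API or implementation. The `discern_intent` function is a hypothetical keyword-matching stand-in for an LLM call, and the `readiness` table, component names, and statuses are invented for illustration.

```python
import sqlite3
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Intent:
    """Structured form of what the person is asking about."""
    metric: str
    component: str


def discern_intent(question: str) -> Optional[Intent]:
    """Hypothetical stand-in for an LLM: maps free text to an Intent.

    A real system would call a language model here; this keyword
    matcher only keeps the sketch self-contained and runnable.
    """
    text = question.lower()
    if "readiness" in text:
        for component in ("heat shield", "parachute"):
            if component in text:
                return Intent("test_readiness", component)
    return None  # no crisp intent could be determined


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory table of invented readiness statuses."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE readiness (component TEXT PRIMARY KEY, status TEXT)"
    )
    conn.executemany(
        "INSERT INTO readiness (component, status) VALUES (?, ?)",
        [("heat shield", "ready"), ("parachute", "in test")],
    )
    return conn


def answer(conn: sqlite3.Connection, question: str) -> Optional[str]:
    """Answer from the database only; the 'LLM' never supplies data."""
    intent = discern_intent(question)
    if intent is None:
        return None  # the person rephrases and asks again
    row = conn.execute(
        "SELECT status FROM readiness WHERE component = ?",
        (intent.component,),
    ).fetchone()
    return row[0] if row else None


if __name__ == "__main__":
    conn = build_demo_db()
    print(answer(conn, "What is the test readiness of the heat shield?"))
```

The design point is that a wrong intent produces a wrong or empty query result, which the person can correct by asking again, rather than a fabricated value, because every returned value is read from the table.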
John Koetsier: And I’ve been in that scenario quite frequently, and usually it’s not the data analysts who got it wrong. Usually it’s me who asked the wrong question.
Kendall Clark: Not you specifically, but yes, almost always. Frankly, and I tell the team this all the time, an LLM is not magic. It cannot determine intent when no crisp intent exists yet, but that’s okay.
People often find their way by asking a kind of, frankly, not very good question, and then we ask a slightly better question, and then we iteratively improve, and then we discover what our intent was all along. More likely, we probably create the intent in this iterative process and then retroactively attribute it to ourselves. It’s a psychological thing we do, and that’s fine.
“Oh, what I meant all along was this.” Probably not. It’s what you mean now, and here’s the answer. Fine. Nobody’s throwing rocks at that. It’s just normal human stuff. So in our approach, we don’t ask the LLM any questions where its hallucinations can bother us, right?
And so in the near term, that’s the best solution. This is use-case specific and context relative. That’s what you want to do in a regulated industry where the questions really matter. But if I want another cool picture of an astronaut dog in Midjourney or something, I want that slightly random, quote unquote wrong component, because that’s really the source of creativity.
John Koetsier: Absolutely. And creativity is a wonderful thing. Just not always when you’re querying data from your own company.
Kendall Clark: That’s exactly right.
John Koetsier: Kendall, this was interesting. It went places I didn’t expect it would. Thank you for taking the time.
Kendall Clark: Thanks, John. I appreciate it.