Large language models have dominated the AI conversation, but are small language models (SLMs) actually the future?
In this episode of TechFirst, host John Koetsier sits down with Andy Markus, SVP & Chief Data and AI Officer at AT&T, to unpack how small language models are delivering enterprise-grade accuracy at a fraction of the cost and latency of massive LLMs.
- Get the deepest insights concisely on the TechFirst Substack newsletter
- Subscribe to the TechFirst YouTube channel to never miss an episode
And watch our conversation here:
Andy explains how AT&T uses SLMs for:
- Contract analysis at massive scale
- Network analytics and outage root-cause analysis
- Fraud detection and enterprise knowledge systems
- AI-driven “field coding” and agent-based workflows
They also dive into the rise of agentic AI, how structured “archetypes” replace risky vibe coding, and why the future of software development may be humans supervising autonomous AI systems rather than writing every line of code.
If you’re building AI for real-world, high-scale use cases — especially in enterprise environments — this conversation is essential.
Transcript:
Note: this is a partially AI-generated transcript. It may not be 100% correct. Check the video for exact quotations.
Andy Markus:
We have small language models that are, of course, way cheaper to run—in many cases, about 10% of the cost of large language models. They’re super fast, and the accuracy rates that we’ve achieved with our fine-tuning methodology are about as accurate as the large language models.
John Koetsier:
When it comes to AI, could the future be small? In fact, very small. Hello and welcome to TechFirst. My name is John Koetsier. LLMs have been all the rage for, what, two years now? Large language models, right? What about SLMs—small language models? They’re faster, they’re more efficient, they’re surprisingly effective when well tuned for a specific purpose, and they can run on some pretty minuscule hardware.
To chat, we have someone who’s a former SVP at Time Warner Media. He’s currently SVP and Chief Data and AI Officer at AT&T. His name is Andy Markus. Welcome, Andy. How you doing?
Andy Markus:
Great, great, John. Thanks for having me.
John Koetsier:
Super pumped to have you. I hear you’re getting snow soon. You’re in Atlanta—that doesn’t compute—but hey, I hope you stay warm.
Let’s talk about AI. We’re going to talk about some AI and agent trends in a bit, but we’ll start with SLMs. Maybe kick us off—what is an SLM?
Andy Markus:
Yeah, so small language models—we’re super excited about the future there. You kind of hit it as you led off, which is that the beginning of this new era of AI was all around large language models.
When we started at AT&T, my direction to the team was: let’s make this work, let’s show it works, and then we’ll figure out the cost later. But first of all, we have to get the accuracy that we—and your CFO—
John Koetsier:
—freaked out.
Andy Markus:
Exactly. That’s exactly right. But we did. We proved that it worked. We proved that it could drive value for the use cases that we’re executing. And now it became time to think about, as we scale, how do we get the cost and the latency to the point that we need?
So the path was small language models. We had to kind of get our feet under us on how to accurately fine-tune these small language models with our own data. But surprisingly, we were able to do that pretty quickly. And what we found is that the trade-offs are really great.
Usually, between accuracy, cost, and latency, you have to pick two. We were able to solve all three.
John Koetsier:
Wow.
Andy Markus:
So we have small language models that are way cheaper to run—many cases 10% of the cost of large language models. They’re super fast. And the accuracy rates that we’ve achieved with our fine-tuning methodology are about as accurate as the large language models. For most of our use cases, we can definitely deal with that trade.
John Koetsier:
That’s pretty impressive. Let’s maybe quantify a little bit. How small is small? I’ve talked to some people who are building SLMs that are literally the number of parameters you’d see in a fruit fly’s brain—really, really tiny. Can run on a Raspberry Pi or some other edge device. And of course, LLMs are trillions of parameters—massive, tens of thousands if not hundreds of thousands of GPUs training them and running inference. What kind of parameters are you talking about for “small”?
Andy Markus:
For us, for enterprise-scale use cases, somewhere between four billion and seven billion parameters is kind of an average. It depends on the sparsity of data and what we’re trying to do, but four to seven billion is a good average for us.
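Some rough arithmetic shows why that 4–7 billion range matters for cost and latency. This is a back-of-envelope sketch, not AT&T's numbers: it counts fp16 weights only (2 bytes per parameter), ignores KV cache and activations, and uses 1 trillion parameters as a stand-in figure for a frontier-scale LLM.

```python
def weight_memory_gb(params: float, bytes_per_param: int = 2) -> float:
    """Approximate memory needed just to hold model weights (fp16 = 2 bytes/param)."""
    return params * bytes_per_param / 1e9

# A 7B-parameter SLM vs. a hypothetical 1T-parameter LLM, weights only:
slm = weight_memory_gb(7e9)   # ~14 GB: fits on a single data-center GPU
llm = weight_memory_gb(1e12)  # ~2000 GB: needs a multi-GPU cluster
print(f"SLM: {slm:.0f} GB, LLM: {llm:.0f} GB, ratio: {llm / slm:.0f}x")
```

Weights that fit on one GPU mean no cross-device communication during inference, which is a large part of why the cost and latency gaps are so big.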
John Koetsier:
Cool. You talked about them already being useful and way cheaper to run. Where have you found them useful?
Andy Markus:
We’re finding them useful across so many use cases. For example, we’re doing evaluations of large data repositories where we have transcripts and need to understand intent or discussion points. That’s super useful, and that’s a lot of data.
One interesting use case we have—like most companies—is that we now have all of the contracts across AT&T in a vector store. We have the concept of an enterprise vector store, where we store the data once and it serves many use cases.
If you want to parse out individual clauses from contracts so you can do complex analytics, there’s a lot of data there. Small language models have been really important. Then we’re moving on from there into network analytics, how we manage dispatch across the company—just all the use cases we do. Fraud as well.
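The "store once, serve many use cases" pattern Andy describes can be sketched in a few lines. This is a toy illustration, not AT&T's stack: the bag-of-words "embedding" and the sample clauses are stand-ins for a real neural embedding model and real contract text, but the shape — embed each clause once, then answer many queries by nearest-neighbor search — is the same.

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    """Toy bag-of-words 'embedding'; a real system would use a neural embedding model."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# Store each contract clause once; many use cases query the same store.
store = {
    "clause-12": "customer is entitled to service credits for network outages",
    "clause-34": "either party may terminate with ninety days written notice",
}
vectors = {cid: embed(text) for cid, text in store.items()}

def search(query: str) -> str:
    """Return the id of the stored clause most similar to the query."""
    q = embed(query)
    return max(vectors, key=lambda cid: cosine(q, vectors[cid]))

print(search("credits for outage"))  # → clause-12
```

In the pattern described, an SLM would then read only the few retrieved clauses — a small, cheap context — rather than whole contracts.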
John Koetsier:
Network analytics is a super interesting one, and it’s super topical as well. A major competitor of yours just had a pretty much national outage. If you can have things running at various levels in your network—analyzing where traffic’s coming from, where it’s going, where there might be outages, how to reroute around it—that probably keeps your uptime higher.
Andy Markus:
Yeah, it’s a super complex use case, but it’s one we’re really excited about. Network root-cause analysis—the ability to evaluate all the data sets that an expert network engineer would have to understand—and do it in a way where the handoffs are seamless, the processing is super fast, and you get from the starting point to the conclusion in a fraction of the time. That really helps the team pinpoint the solution instead of trying to figure out the next step.
John Koetsier:
I can only imagine the complexity. You’ve got hundreds of millions of devices connected, all requesting data, moving from zone to zone. When there’s an outage, there must be tens or hundreds of millions of data points to look at. If you can get down to the root cause quickly, that’s impressive.
Andy Markus:
Yeah, and it’s network log data, but also policies and procedures for how the network should work, plus previous issues that have happened. We can evaluate prior root-cause findings to pinpoint issues faster. It’s using all of that together, just like a really advanced network engineer would.
John Koetsier:
I’m guessing it’s also a target-rich environment. There are probably more use cases than you can hit right away. I was chatting with Mark Vange recently—former CTO at Electronic Arts, now at Automate. They took complex contracts that used to take two hours to work through and turned that into an AI agent that got it down to 10 minutes.
You don’t have to understand a form anymore. The form can talk to you and say, “Here’s what I need. Here’s where you put it.” You’re not reading some massive enterprise or government form and trying to figure it out.
Andy Markus:
Yeah, it’s really powerful. There’s a lot to unpack there. You’re right—it’s a target-rich environment—but we focus on things that drive the most value for AT&T and our customers.
We work with our business partners, look at use cases, and do formal business cases. If it has real value, we focus on it. We don’t want to focus on something complex that delivers little value.
We’re focusing on free-cash-flow-impacting use cases. Two years ago, we had a 2x free-cash-flow-impacting ROI. Last year, that jumped to 4x. That’s really meaningful value at the scale of AT&T.
The other thing you mentioned is AI field coding. We’re still doing use cases that drive efficiency and customer benefit, but how we get there now is through AI field coding.
It’s a super agent that understands the instructions, lays out a plan, whether it’s a simple app or a complicated use case tied into existing systems. That super agent handles planning, and then we have what we call archetypes—very function-specific instructions for how we do things at AT&T. If we’re writing SQL, we do it this way, conforming to our standards.
The super agent understands how to call the archetypes to deliver the task. It turns vibe coding—which has negative connotations—into one-shot, two-shot, three-shot runs that do really complicated things efficiently. We’re delivering extremely complicated apps in hours or days that used to take months.
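The planner-plus-archetypes structure can be sketched as a simple dispatch: a planning step decomposes a task, and each subtask is routed to a function-specific instruction set. Everything here is hypothetical — the archetype names, the instruction text, and the keyword-based planner (a stand-in for the LLM planning AT&T describes) — but it shows the shape: the free-form request is constrained into a fixed menu of vetted recipes.

```python
# Hypothetical archetypes; AT&T's actual archetype library is not public.
ARCHETYPES = {
    "sql": "Generate SQL conforming to house standards: ANSI joins, no SELECT *.",
    "etl": "Generate a data pipeline using the approved engineering template.",
}

def super_agent(task: str) -> list[str]:
    """Plan a task as an ordered sequence of archetype invocations.
    Rule-based stand-in for the LLM planner described in the conversation."""
    plan = []
    if "load" in task or "pipeline" in task:
        plan.append("etl")
    if "query" in task or "report" in task:
        plan.append("sql")
    # Each step carries the archetype's mandated instructions, not free-form output.
    return [f"[{name}] {ARCHETYPES[name]}" for name in plan]

for step in super_agent("load the contract data, then build a query for the report"):
    print(step)
```

The point of the design is that the generative model never improvises house style: every subtask lands on a human-written archetype, which is what makes the output conform instead of sprawling like the train-track GIF mentioned later.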
John Koetsier:
It’s out of this world. I’m vibe coding an app myself. Anthropic just released a new tool that was 99.9% AI-coded. Somebody said most of the code building the company right now is written by AI. That’s a crazy, weird, wonderful, sometimes scary world—especially if you’re a software engineer. How do you view that?
Andy Markus:
It’s great. It’s like having extra firepower at your fingertips. The archetypes are super important. You have to be a software engineer to write them accurately. Without them, vibe coding can be a dead end.
As the output converges, you still have to understand whether it’s right. The AI can make mistakes. Archetypes minimize that, but integration still matters—you have to bolt it into existing systems. It just allows us to do a lot more.
John Koetsier:
Exactly. The to-do list never shrinks.
Andy Markus:
Exactly.
John Koetsier:
I saw a GIF recently of a vibe-coded app—it was like a train set where the tracks go everywhere.
Andy Markus:
I saw that. It was brilliant—and accurate for vibe coding. But that’s not AI field coding. Archetypes mandate that the output conforms to instructions, so the train tracks actually connect.
We have about 75 archetypes for data engineering and data science. It’s all AI field coded. The super agent calls the right archetype, and the sub-agent delivers exactly what we’ve instructed.
John Koetsier:
It’s an amazing evolution in programming. Punch cards, machine code, languages, interpreted languages—and now you just talk to the machine.
Andy Markus:
I used to code punch cards.
John Koetsier:
Wow.
Andy Markus:
We still haven’t seen where this is going. The coding stuff is amazing. We’re seeing incredible things from OpenAI. Where we’re focused now is figuring out the right checkpoints for humans in the loop.
We’ve gone from human in the loop to human on the loop, and now we’re figuring out what’s appropriate for true autonomous agentic work.
John Koetsier:
Where else do you see the ability to use AI? Service staff? People on the road?
Andy Markus:
The challenge is how small you can really get and still be productive. For enterprise use cases, I gave you our scale. Others are trying to do advanced things on phones—that’s achievable over time.
Our focus is making sure that when we serve internal clients or customers with AI-driven outcomes, we do it responsibly and deliver the expected results. Responsible AI is core to serving both customers and employees.
John Koetsier:
Where do you see future trends going with agents and SLMs?
Andy Markus:
Agentic AI is a big focus. Early use cases were one-turn generative solutions—valuable, but limited. As problems got bigger, they needed more context.
Agentic AI lets us break big problems into smaller ones and solve them more accurately. Whether it’s coding, or figuring out credits and rebates across AT&T contracts, agents let us handle different functions appropriately.
Right now, humans check many steps. We’re moving toward humans checking final outputs as confidence and accuracy improve. Over time, we’ll move toward more autonomous environments, balancing risk and value.
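The progression from checking every step to checking only final outputs is, mechanically, a confidence gate. A minimal sketch, assuming the agent's output comes with some confidence score (self-reported or from a separate scorer — the source doesn't say how AT&T scores confidence): results above a threshold flow through automatically, the rest are escalated to a human checkpoint, and lowering the threshold over time is the move toward autonomy.

```python
from dataclasses import dataclass

@dataclass
class AgentResult:
    output: str
    confidence: float  # assumed: assigned by the agent or an external scorer

def route(result: AgentResult, threshold: float = 0.9) -> str:
    """Human-on-the-loop gate: auto-approve high-confidence results,
    escalate everything else to human review."""
    return "auto-approve" if result.confidence >= threshold else "human-review"

print(route(AgentResult("apply $40 service credit", 0.97)))   # → auto-approve
print(route(AgentResult("terminate enterprise contract", 0.55)))  # → human-review
```

Balancing risk and value, as Andy puts it, amounts to setting that threshold differently per use case: low-stakes actions can auto-approve early, while high-stakes ones keep the human checkpoint longer.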
John Koetsier:
That’s interesting. I think about image generation and coding—it used to feel like shaking a magic eight ball. You try 10 times and hope one works. It sounds like you’re getting beyond that.
Andy Markus:
Totally. Image generation was classic—hands and feet wrong all the time. But it’s progressed so fast that now you’re really struggling to find issues.
John Koetsier:
Exactly. Super interesting. Thanks for taking the time.
Andy Markus:
Absolutely. Good to talk to you.