AI and the fight for data control

The battle for control of network data and AI-driven network operations is being fought right now.

“The war that’s being fought is really about who gets that data, because that’s the game.” 

“That’s been a big revelation for us; that the telcos are really pushing for an open, level playing field as it relates to AI.”

“We as a telecom industry have been used to managing everything internally. We also realise that that has become a little bit of a bottleneck. OK, a lot of a bottleneck.”

In this article:

  • Andrew Coward, GM of Software Defined Networking, IBM
  • Kailem Anderson, Vice President of Global Products & Delivery for Blue Planet, Ciena
  • Manish Mangal, Telecom CTO, Tech Mahindra
  • Alex Jinsung Choi, Principal Fellow, SoftBank Corp.’s Research Institute of Advanced Technology; AI-RAN Alliance
  • Dr Junlan Feng, Chair and General Manager of the AI and Intelligent Operation R&D Center, China Mobile Research Institute

Network operators can use AI to automate their network operations, from the core to the RAN, with visions evolving as far as zero-touch network operations and new innovations driven by AI integrated into the network. AI starts with data, and a battle is being fought over where that data resides, how it interacts with AI, and which models will work best for telcos. If they get the foundations right, however, telcos could go beyond AI-ops into something that actually helps them innovate more quickly and take advantage of external innovation.

The battle for control

Andrew Coward, IBM, says there are three ways AI capabilities are being offered to the industry. The first comes from the providers of large foundational models, such as the cloud hyperscalers. He likens this to selling you an engine and expecting you to build a car. Only the most well-resourced and staffed telcos can do this – and even then it’s a push. So it’s good for kicking off a billion-dollar consulting business, less good for getting telcos the result they want.

The second source is the big vendors, who add AI capabilities to their existing products. The issue here is that operators then need to manage and orchestrate across a variety of vendor environments – often two or more within the same domain. And although the vendors may talk a good game about being open, Coward says interfaces are in fact becoming more closed and technology more proprietary.

“If you think about the ways we get data – SNMP, configuration, NETCONF and so on – we’re seeing quite a few vendors kind of lock up that ecosystem. So you have to talk through their API, through their NMS, to get information from anything in the stack.

“What they’ve realised is the value is in that data stack, and they want to expose it only through their tools. In parallel, you’ve got some of the cloud providers – all of the cloud providers, really – who are almost giving away AI for free, but capturing all the customers’ data in the process.”

If vendors are making data harder to get at, putting it in the cloud might also be an issue. “If you think about all the telcos that are pushing petabytes of data into the cloud providers, it’s very hard and very expensive to move that data out of those cloud providers to do something with it. The war that’s being fought in these halls this week is really about who gets that data, because that’s the game.”

Not surprisingly, Coward identifies a third approach – what he sees as the sweet spot – and, again not surprisingly, this is the IBM way. The third way is a horizontal approach that can integrate AI technology across a network, but in a way that “brings AI to the data” and not the data to the AI.

“There’s very few vendors that have the technology and the reach to be able to do this. We’re building AI models for each of the domains, one by one. Often with partners, for instance with Juniper on Mist AI. And we operate across all the clouds, bringing the AI to the data, rather than the data to our AI.”

For Coward, the skill is in understanding how telco data is structured and behaves. For example, in his view LLMs are “bad for time series data” – and a lot of telco data is time series. IBM has built open-source models that are time-series focused and much smaller, with a million parameters rather than billions.
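The intuition behind small, time-series-specific models can be seen in a deliberately tiny sketch. This is not IBM’s model – the class name, window size and threshold below are illustrative assumptions – but it shows why a compact per-metric baseline can flag telemetry anomalies without billions of parameters.

```python
from collections import deque

class RollingAnomalyDetector:
    """Flags telemetry samples that deviate sharply from a rolling baseline.

    A deliberately tiny model: it keeps only a fixed window of recent
    samples, so it is cheap enough to run per-metric at network scale.
    """

    def __init__(self, window: int = 60, threshold: float = 3.0):
        self.window = deque(maxlen=window)
        self.threshold = threshold

    def observe(self, value: float) -> bool:
        """Return True if `value` is anomalous relative to the window."""
        anomalous = False
        if len(self.window) >= 10:  # need a minimal baseline first
            mean = sum(self.window) / len(self.window)
            var = sum((x - mean) ** 2 for x in self.window) / len(self.window)
            std = var ** 0.5
            if std > 0 and abs(value - mean) / std > self.threshold:
                anomalous = True
        self.window.append(value)
        return anomalous

# Steady latency readings, then a spike
detector = RollingAnomalyDetector(window=30, threshold=3.0)
readings = [10.0, 10.2, 9.8, 10.1, 9.9, 10.0, 10.3, 9.7, 10.1, 10.0, 9.9, 10.2]
flags = [detector.observe(r) for r in readings]
spike_flag = detector.observe(50.0)
```

Real time-series foundation models are of course far richer, but the economics are similar: state proportional to the window, not to a giant parameter count.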

Telcos want a level, open AI playing field

I personally think, as a software vendor in the OSS space, if you’re not opening up your environment so that telcos and partners can apply their own AI to your data set, you’re going to get left behind

Ciena’s Kailem Anderson says that telcos have got much more serious about AI in just a year.

A year ago, telcos were trying to figure out what AI meant for them. They weren’t really sure what use cases to look for. Gen AI was still new and Agentic AI didn’t exist.

“There’s a big shift from last year to this year. Telcos have decided they want to be in the AI space, developing agents and leveraging and exerting a level of control in their network.”

For Anderson, one of the key realisations has been that telcos do not want a vendor partner to simply implement AI within a capability – they want to leverage that AI for themselves.

“If I’m telling them I’ve got embedded AI in my products, they say, sounds good, but how do I influence the stuff in there? I don’t want it to be a black box.

“And that’s been a really big revelation for us – that the telcos are really pushing for an open, level playing field as it relates to AI.”

Blue Planet’s response to that has been to introduce its Blue Planet AI Studio, which builds on the concept of bring your own AI.

Anderson described AI Studio as an AI development environment where telcos can load agents – which could be fault detection, alarm correlation and so on – and then choose the product or environment for that agent to live in. An external agent could be applied against a Blue Planet data set or source external data, with the Studio acting “like an agent orchestrator”.
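The “agent orchestrator” idea can be sketched minimally. This is not Blue Planet’s API – the class and agent names below are hypothetical – but it illustrates the bring-your-own-AI pattern: the platform routes events to registered agents without caring whether the telco or the vendor supplied them.

```python
from typing import Callable, Dict, List

class AgentOrchestrator:
    """Minimal registry that routes events to externally supplied agents.

    Each agent is just a callable that inspects an event and returns
    findings; the orchestrator fans events out and collects results.
    """

    def __init__(self):
        self._agents: Dict[str, Callable[[dict], List[str]]] = {}

    def register(self, name: str, agent: Callable[[dict], List[str]]) -> None:
        self._agents[name] = agent

    def dispatch(self, event: dict) -> Dict[str, List[str]]:
        return {name: agent(event) for name, agent in self._agents.items()}

# A telco-supplied fault-detection agent and a vendor correlation agent
def fault_agent(event: dict) -> List[str]:
    return ["fault:link-down"] if event.get("status") == "down" else []

def correlation_agent(event: dict) -> List[str]:
    return [f"correlate:{event.get('site', 'unknown')}"]

orchestrator = AgentOrchestrator()
orchestrator.register("fault", fault_agent)
orchestrator.register("correlation", correlation_agent)
result = orchestrator.dispatch({"site": "core-7", "status": "down"})
```

The design point is that the orchestrator owns the data set and the event flow, while the agent logic – the part the telco wants to influence – stays pluggable rather than a black box.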

This has involved a change of thinking. Anderson said that Blue Planet, like a lot of other companies, had been focused on building AI/ML into its products and talking about specific use cases.

“For telcos it’s like, well, that’s great. How do I get a seat at your table? It’s very clear to me now that the telcos want an open, level playing field where they’re leveraging their expertise – because they know their domain better than anyone else.”

“Instead of us touting all the amazing AI that we’ve got – fault detection and all these agents we’ve built – telcos can create their own AI in our platform and deploy it against our data sets, and bring in their own data and then apply those algorithms.

“They really like that concept, because every one of the telcos is building up 1,500–2,000 data scientists in their own environment to figure out whatever use cases are relevant for them. So the ability for them to be able to program AI into the vendor technologies is key.

“I personally think, as a software vendor in the OSS space, if you’re not opening up your environment so that telcos and partners can apply their own AI to your data set, you’re going to get left behind.”

Autonomous operations

Manish Mangal is CTO of Tech Mahindra’s Communications Unit. For him, AI offers a promise of enabling a future operational environment, or OSS, in which networks can be autonomous.

Tech Mahindra is developing a toolkit to bring operators from the OSS stack of today to what OSS needs to look like in the future. That means looking at how AI “plays a larger and larger role as we go forward and becomes more native, rather than sort of a bolt-on.”

Mangal says not only is he clear that 30 to 40% of OPEX can be saved by running autonomous networks (figures he says Tech Mahindra is willing to commit to), but that operators will gain greatly in resiliency too.

“95% of the outages in the network today happen because of human errors during change management. You know, somebody fat-fingered the wrong command. By bringing automation and AI inside the operations, I believe that we’ll actually eliminate a lot of those errors.” AI also brings with it a predictive element, catching anomalies in the network much earlier on. Mangal’s objective is to automate to the extent that 95% of tickets are resolved without human intervention.

Tech Mahindra’s vehicle for this is its Netops.AI platform – which Mangal describes as a digital OSS stack integrating AI.

The company just announced a partnership with AWS and Nvidia, creating its own LLMs that are trained to sit in the operations layer of the network to do alarm correlation, catch anomalies and provide early-warning detection, using these models to predict network issues.

Although IBM’s Coward questioned the suitability of LLMs for time-series telco data, Mangal counters that there are nuances. And he adds that for timestamp data, LLMs are also getting much faster.

“While timestamp data may change in a shorter span, the direction of that doesn’t change dramatically. I do see some LLMs going to SLMs to give us more agility. I envisage a future where we have agents sitting inside the operational environment, and when they discover an anomaly they push it up to LLMs to make sense of it and figure out the trend that needs to be managed.”
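Mangal’s two targets – 95% of tickets resolved without human intervention, with the rest escalated to a larger model – amount to a triage pipeline. A minimal sketch (function names and ticket fields are illustrative assumptions, not Netops.AI’s API):

```python
def triage(tickets, auto_resolvers, escalate):
    """Route each ticket through cheap automated resolvers first; only
    tickets no resolver can handle are escalated (e.g. to a larger
    model or a human queue)."""
    resolved, escalated = [], []
    for ticket in tickets:
        for resolver in auto_resolvers:
            fix = resolver(ticket)
            if fix is not None:
                resolved.append((ticket["id"], fix))
                break
        else:  # no resolver claimed the ticket
            escalated.append(escalate(ticket))
    return resolved, escalated

# A known failure mode gets a canned fix; a novel anomaly is escalated
def restart_resolver(ticket):
    return "restart" if ticket["type"] == "process-hang" else None

resolved, escalated = triage(
    [{"id": 1, "type": "process-hang"}, {"id": 2, "type": "novel-anomaly"}],
    [restart_resolver],
    escalate=lambda t: f"LLM-review:{t['id']}",
)
```

The economics follow from the ordering: the cheap, deterministic resolvers absorb the bulk of the volume, so the expensive model only sees the genuinely novel cases.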

Getting the data right

Structuring that Agentic AI operational environment starts with the data – where the war is being fought, in Coward’s terms.

The “number one step”, according to Mangal, is getting a data lake and telemetry architecture established. All the different agents and toolkits – say, an incident-management tool, alarm correlation, anomaly detection – are built and deployed on top of the data lake.
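In practice, the first job of that data lake layer is normalising differently shaped vendor feeds into one schema. A minimal sketch (the field names and vendor formats below are invented for illustration):

```python
from datetime import datetime, timezone

# The common shape every record must have before it enters the lake
COMMON_SCHEMA = ("ts", "source", "metric", "value")

def normalise_vendor_a(record: dict) -> dict:
    """Vendor A reports epoch-millisecond timestamps and nested fields."""
    return {
        "ts": datetime.fromtimestamp(
            record["time_ms"] / 1000, tz=timezone.utc
        ).isoformat(),
        "source": record["node"]["id"],
        "metric": record["kpi"],
        "value": float(record["val"]),
    }

def normalise_vendor_b(record: dict) -> dict:
    """Vendor B already uses ISO timestamps but different key names."""
    return {
        "ts": record["timestamp"],
        "source": record["element"],
        "metric": record["counter"],
        "value": float(record["reading"]),
    }

# Two records describing the same moment, arriving in different dialects
lake = [
    normalise_vendor_a(
        {"time_ms": 1700000000000, "node": {"id": "rtr-1"}, "kpi": "cpu", "val": "72"}
    ),
    normalise_vendor_b(
        {"timestamp": "2023-11-14T22:13:20+00:00", "element": "rtr-2",
         "counter": "cpu", "reading": 64.5}
    ),
]
```

Once everything lands in the common schema, agents for correlation or anomaly detection can be written once against the lake rather than once per vendor.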

Ciena’s Anderson says, “You can’t ‘AI’ what you can’t see. One of the biggest issues we find is the data is not clean, it’s not normalised.”

As a result, he says, there’s been a trend over the last year for telcos to replace their multiple existing OSS with next-generation inventory systems that act as a single source of truth and hold data normalised for how the OSS systems need to use it. Telcos have talked about doing this for years, but AI has been the tipping point, Anderson said.

“We think that’s because the telcos are starting to understand that if you’ve got clean, normalised data structured against a data model, then that is used as a solid foundation for AI-based use cases,” he said.

US CSP Lumen is a recent poster child for Ciena in this regard, although Anderson said there are four or five others in a similar position that Ciena is not allowed to reference. As for Lumen, it is consolidating 17 inventory systems into one, in part because these systems were not doing what they needed for AI use cases.

“We’re scrubbing the data, we’re normalising it, we’re applying it against the data model, and then providing a fabric to expose the information to whatever systems they want. They could be other OSS, BSS systems. They could be AI agents that need that information.”

For Anderson, this process actually began as telcos started their network automation journeys – needing systems that gave them an up-to-date network state and through which they could understand network changes.

“If you have a consolidated data layer across planning, orchestration and assurance you are pulling from one data source and that maintains a state between those three functions.

“We didn’t develop that specifically for AI. We did it for the concept of autonomous networks, but it benefits this vision of openness and programmability for AI also.”

Softbank’s Alex Jinsung Choi, who is driving forward the AI-RAN Alliance’s concept of using compute in the network to run AI workloads both for and alongside the network, also sees the structure of network data as a live issue.

“AI models depend on high quality data to function effectively,” he told LF Networking’s Open Networking & Edge Summit. “But such datasets are often scarce in telecom networks, making it difficult to deploy AI at scale.”

For AI-RAN use cases, the further problem is that much of the data is in real time, and streaming from multiple network platforms. Privacy and sovereignty laws further limit the availability of data for training.

“There is a sense of urgency. We have to do something,” he says.

He sees benefit in using synthetic data to reduce dependency on that scarce operational data, although that too must be done with care, to avoid models collapsing or becoming biased.
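The synthetic-data idea can be sketched simply: fit a profile from the scarce real traces, then generate perturbed copies, so models see variety without real user data. This is illustrative only (real pipelines would fit far richer statistical or generative models, and the care Choi mentions – guarding against collapse and bias – matters precisely because synthetic data inherits whatever the fitted profile got wrong):

```python
import random

def fit_daily_profile(real_traces):
    """Average per-hour load across the (scarce) real traces."""
    hours = len(real_traces[0])
    return [sum(t[h] for t in real_traces) / len(real_traces) for h in range(hours)]

def synthesise(profile, n_traces, noise=0.05, seed=0):
    """Generate synthetic traces around the fitted profile with small
    multiplicative noise."""
    rng = random.Random(seed)
    return [
        [v * (1 + rng.uniform(-noise, noise)) for v in profile]
        for _ in range(n_traces)
    ]

real = [[10, 40, 80, 50], [12, 38, 76, 54]]   # two scarce real daily traces
profile = fit_daily_profile(real)
synthetic = synthesise(profile, n_traces=100)  # plenty of training variety
```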

What is required for the AI-RAN vision is an overall data-for-AI framework, which creates an automated data pipeline, with AI-driven cleaning and augmentation, to deliver high-quality structured data across a network. The AI-RAN Alliance has created its own “Data for AI” task group to take on some of these challenges.

China Mobile’s Junlan Feng also addressed the data demands of the telco’s AI strategy at the Summit. A commercial general model requires tuning to be domain-specific, so a telco will have to replace quite a big fraction of the general data – and in the network field an operator generates petabytes every day. But a current general model has no sense of what is in there, creating a requirement for a structured foundation model to do augmented analytics, prediction and classification. Even more complex, a Radio Access Network (RAN) doesn’t create neat sequential data: it has spatial and temporal information, and users moving around the network. Representing that as a graph is not as simple as deploying a natural language model.

“How to encode that data is quite a challenge,” Feng says.
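To make the contrast with sequential text concrete, here is a minimal sketch of encoding RAN mobility as a graph (cell names and the handover-log format are invented for illustration): cells become nodes and observed handovers become weighted edges, capturing spatial structure a flat token sequence would lose.

```python
from collections import defaultdict

def build_mobility_graph(handovers):
    """Encode RAN cells as graph nodes and observed handovers as
    weighted edges (weight = number of observed handovers)."""
    graph = defaultdict(lambda: defaultdict(int))
    for src, dst in handovers:
        graph[src][dst] += 1
    return {src: dict(neigh) for src, neigh in graph.items()}

def busiest_neighbour(graph, cell):
    """The most common handover target from `cell`, if any."""
    neigh = graph.get(cell, {})
    return max(neigh, key=neigh.get) if neigh else None

# A small handover log: users moving between three cells
handovers = [("cellA", "cellB"), ("cellA", "cellB"),
             ("cellA", "cellC"), ("cellB", "cellC")]
graph = build_mobility_graph(handovers)
```

A graph-aware model can then reason over this adjacency structure – neighbourhoods, paths, edge weights – which is exactly the information that disappears when the same events are flattened into a sequence for a language model.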

Beyond that, creating telco-specific models is hard because it is hard, legally and commercially, for telcos to share data.

Feng posits a possible solution – that telcos keep hosting data on their own platforms, but create common interfaces. “People could come from outside and work to solve a problem. It’s worth a try,” she says.

The cloud choice

Another key driver for the success of the AI vision will be the extent to which network functions, data, and AI technology itself, can take advantage of cloud native capabilities.

Mangal rejects the notion that a hyperscaler platform necessarily locks customer data in. Tech Mahindra thinks of itself as cloud-agnostic, with partnerships in which it takes cloud APIs and functionalities and uses them as part of Netops.AI.

Rather than be a tool for lock-in, a hyperscaler partnership can actually speed up innovation, in Mangal’s view.

“The idea is that the customers pick and choose whatever cloud platform. It’s a trust journey for sure,” he said. And he’s sure it’s almost impossible for an operator to drive all of their cloud and innovation strategy themselves.

“We as a telecom industry have been used to managing everything internally. We also realise that that has become a little bit of a bottleneck. OK, a lot of a bottleneck. I see more and more telcos getting comfortable with using the public cloud. There’s a reticence around compliance and sovereignty, but other than that the hesitation to use public cloud has gone down dramatically overall.”

Ciena Blue Planet’s Anderson agrees that one underlying pivot has been the transfer of data into the public cloud.

“Three years ago, we probably had zero of our workloads on the public cloud. Today, we have 85%.”

That is being pushed by the telcos, Anderson says. “We have telcos deploying Blue Planet on Azure, GCP, AWS, and we’re starting to see Oracle also play out there, particularly in places like the Middle East.”

While the shift of OSS towards the cloud seems to be becoming mainstream, for the layer below – network management systems and the things that actually touch the network day in and day out – it’s slower.

However, Anderson says, “I think in another three years time, you’ll see a lot of these SDN controllers, NMS, move to the cloud. The telcos aren’t ready at that layer, but definitely they’re transitioning the OSS layer, and that’s where we have to come in and support them in terms of their design. But the complexity for us is we don’t control the cloud of choice, so we’re having to support all the major clouds.”

Anderson’s view that public-cloud OSS deployments are only just gearing up may come as a surprise to some who have been hearing for years about cloud-based OSS instances. But for Anderson there’s an important nuance.

“I would argue that a lot of these companies that say they’re cloud native, they’ve just containerised their whole platform, and they won’t scale out on demand. They’re hugely inefficient in terms of the resources they store. We consume only what we need, because we’re truly cloud elastic.

“There’s a difference between taking a product that’s monolithic, or maybe it’s microservices based, and just deploying it as a big container in the cloud. I would say there’s zero benefit in doing that if you want to inherit the true value of cloud. Now, if you’re containerising all the microservices that you’ve got within your product, within Kubernetes, that’s a huge benefit because you can scale out.”

Overcoming the telco debt for innovation

Eventually, an AI infrastructure that lets telcos feed federated, structured data into multi-modal models – small and large – that work together could help unlock the key challenge facing telcos: making more money by innovating new services faster and more efficiently.

Telcos, Coward says, have built up technology debts. One is in making data available in ways it can be abstracted. Another is that they have done a very poor job of putting APIs into infrastructure.

“There are examples where you call an API to kick off a service, and it takes somebody doing something physical and pushing that back through the API. That might take an hour, which is a really long time for a REST API to be sitting around. That is nuts. One of the acquisitions we did last year is designed to put APIs into standard workflows.”

IBM also offers a low-code/no-code approach to make it easier to integrate new services, compared to scripting processes across domains.

“We’re essentially making it much easier to do the integration in the first place and join the dots. If all the different elements within the network have been autonomised and then abstracted into an API, then it becomes much easier and faster to reassemble things, chaining APIs. We end up rearchitecting the business process, but also simplifying how that business process is represented, so that it can then be sequenced together.”
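The API-chaining idea Coward describes can be sketched as a simple pipeline. This is hypothetical – the step names and fields stand in for REST calls into per-domain systems – but it shows the payoff of abstracting each element behind an API: assembling a new service becomes composing steps, not scripting across domains.

```python
def provision_service(order, steps):
    """Chain per-domain APIs: each step receives the running context and
    returns new fields to merge in; a failure short-circuits the chain."""
    context = dict(order)
    for step in steps:
        result = step(context)
        if result is None:
            return None  # a domain rejected the request
        context.update(result)
    return context

# Hypothetical per-domain steps, each standing in for a REST call
def allocate_port(ctx):
    return {"port": f"{ctx['site']}/ge-0/0/1"}

def configure_vlan(ctx):
    return {"vlan": 100 + ctx["customer_id"] % 900}

def activate(ctx):
    return {"status": "active"}

service = provision_service(
    {"site": "pop-3", "customer_id": 42},
    [allocate_port, configure_vlan, activate],
)
```

Reordering, inserting or swapping a step changes the business process without touching the other domains – the “reassembly” Coward is pointing at.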

For Mangal, too, some telcos are already benefitting from a new environment.

“The hyperscalers built their own networks in a different way to telcos. Innovation in the cloud world – selling IaaS, for example – has seeped into how to sell NaaS. That cross-pollination of knowledge and innovation, by involving a more open ecosystem, has created the opportunity for more players to participate in that innovation cycle.”