Nvidia’s telecoms goals

The AI-native RAN, telco AI Factories, and a 6G Research Programme.

Nvidia held its annual conference, GTC, last week. It’s a huge event. Kicking off with a headline session that showcased the capabilities of its new GPU platform, Blackwell, the event then branched out into over a thousand sessions looking at the many use cases of accelerated computing across industries. One of the industries receiving attention was telecoms.

In a special address, Ronnie Vashishta, Senior Vice President, Telecommunications, said that “telcos are really leaning into AI”, with AI in telecoms falling into three main areas of use.

The first is the use of Gen AI within telcos’ own internal processes, for instance in customer care and in network operations. The second is the creation of “AI factories” – essentially datacentres specialised in running AI workloads. The third area and, according to Vashishta, “the most known way”, is the use of AI in “transforming the RAN.”

For the first use case, the use of Gen AI in telco processes is now a question of “what and how”. Vashishta said Nvidia has built a framework to answer those questions – NeMo. This is a framework for building, customising and deploying Gen AI models, covering data creation, training, retrieval-augmented generation (RAG) and deployment through Nvidia Inference Microservices (NIM). An ecosystem is now forming around these frameworks, with partners starting to deploy to their customers. Vashishta mentioned Amdocs’ amAIz, Kinetica for network intelligence, and ServiceNow, along with TCS’ Twin; Quantiphi, which has a field technical assistant that synthesises large volumes of technical manuals; and Deloitte, which has an LLM for network design.

AI Factories are a way that telcos can “deploy capital” to transform business models, Vashishta said. The demand for infrastructure for training and deployment of Gen AI is “huge and getting larger”. “Why shouldn’t telcos play an important role in that?” he asked.

While admitting that telcos have had mixed or limited success in trying to act as data centre providers, he said that telcos have good attributes for developing AI Factories. They have access to real estate and large power contracts. One key aspect is that they can create ecosystems that are relevant to their geography, acting as sovereign Gen AI providers. Another factor that plays well for telcos, Vashishta said, is that AI Factories will require connectivity between more centralised and local or regionalised instances.

Here, the training of the model might happen in larger centralised clouds (or factories), creating an initial foundational model. But then the use of those models doesn’t have to happen on large datasets – it can happen more locally to the user, which requires a different type of infrastructure. 

Nvidia, Vashishta said, has invested in a reference architecture that will enable telcos to ramp up deployments quickly. Through its Cloud partner programme it has started to deploy around the world, building multi-tenant clouds with the telco, where Gen AI is the largest tenant, with multiple users of that workload.

AI and 5G

Nvidia’s goal for vRAN, Vashishta said, is for it to be another software-defined workload, with no specific hardware or accelerators required. Could it even be a tenant in the AI factories that he sees telcos building?

“Yes it can. And that’s starting to happen now. That is really transformative. Just think about the billions of dollars in a RAN. Now your investment in your AI factory enables that workload to run in that factory. This we call AI and RAN. We’re working with partners in the AI RAN Alliance to flesh out how this infrastructure is deployed and how the two workloads of AI and RAN can be orchestrated to work together.”

Vashishta made the claim that Nvidia often makes, that the RAN is often under-utilised. In his words, “Operators spend billions and utilise that investment 30% of the time. Think about a business district where the RAN workload is high during the day. When people go home you can keep that infrastructure running and use it for AI workloads, creating tokens to generate revenue on the RAN infrastructure. That’s novel and we are talking to some companies and partners about this.”
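The arithmetic behind that argument can be sketched roughly as follows. This is an illustration only: the hourly load profile, the `headroom` reserve and the function names are assumptions made up for this example, not figures or code from Nvidia. The profile is simply chosen so that average utilisation lands close to the 30% Vashishta quotes.

```python
# Hypothetical hourly RAN load for a business-district site, expressed as a
# fraction of provisioned compute. Illustrative values, not measured data,
# chosen so the daily average is close to 30%.
RAN_LOAD = (
    [0.05] * 7                                                   # 00:00-06:59, night
    + [0.3, 0.6, 0.7, 0.7, 0.7, 0.7, 0.7, 0.7, 0.6, 0.4, 0.2]    # 07:00-17:59, working day
    + [0.05] * 6                                                 # 18:00-23:59, evening
)


def average(values):
    """Arithmetic mean of a list of numbers."""
    return sum(values) / len(values)


def spare_capacity(ran_load, headroom=0.1):
    """Fraction of compute left for AI work in each hour.

    `headroom` is an assumed reserve held back for RAN traffic bursts,
    so the RAN workload always keeps priority.
    """
    return [max(0.0, 1.0 - load - headroom) for load in ran_load]


spare = spare_capacity(RAN_LOAD)
print(f"average RAN utilisation: {average(RAN_LOAD):.0%}")
print(f"average capacity free for AI: {average(spare):.0%}")
```

Under these assumptions most of the spare capacity sits in the night and evening hours, which is the pattern the business district example describes.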

6G Research

The third area, using AI in the RAN, is where Nvidia is targeting its efforts with a 6G Research Cloud Platform. Vashishta described an AI native RAN as one where there is a convergence of compute and connectivity, “intelligently connecting intelligence”.

“Running the RAN stack on this compute changes the paradigm – we’ll see AI playing a much more important role – we formed the AI RAN Alliance because of this. You need to support apps over the network and you need AI for the RAN as well. We need to improve cell capacity and spectral efficiency, which is holding back capabilities to deliver at scale. We need to have energy consumption be a factor in how to deliver the RAN, and with a much lower TCO. The network needs to be not best effort but deliver guaranteed latencies, with decisions made autonomously in the network.

“Now take a deep breath. This is six years away. So to develop 6G the tools need to be there, to do the research, write and agree standards, invest and then deploy.

“AI will be the beating heart of 6G networks, with predictive channel models, dynamic resource adaptation, real-time learning. We’re moving towards an environment that is very different to putting out the same signals in any given urban or rural environment.”

This vision underpins Nvidia’s investment in its 6G Research Cloud, its Sionna Neural Radio Framework and its Omniverse RAN Digital Twin. 

The Research Cloud Platform brings these tools together with Nvidia’s Aerial CUDA-accelerated RAN, a software-defined, full-stack RAN. Sionna is a neural radio framework that enables developers to take and test algorithms through the TensorFlow machine learning framework, on which it is built, or into the physical CUDA-accelerated RAN stack.

“We’ve started to accelerate L2 and above using GPUs, with some very good results. We can work with partners up and down the stack, which is very important because some partners have their own algorithms and just want the hardware platform. That’s OK. But we have to develop the full capability because we need to show what our platforms are capable of. L2 we know will be important to us in terms of how we can accelerate the full stack, and as we move into the RIC, where AI is very compatible with the enhancement of overall network performance.”

The Sionna Neural Radio Framework enables AI and ML implementations of some of the most challenging functions of the RAN, described as a PHY layer developed using AI workflows. “Enabling Sionna to be embedded into channel estimation, demodulation and other blocks is unique to us because we have that digitisation of the full stack,” Vashishta said.
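For context, the conventional, hand-designed version of one such block, pilot-based channel estimation, can be sketched in a few lines. This is a generic least-squares estimator written for illustration, not Sionna code or Nvidia’s implementation; the pilot pattern, channel gain and noise level are all made-up assumptions. A neural receiver of the kind described above would replace or augment this step with a learned model.

```python
import cmath
import random


def ls_channel_estimate(received, pilots):
    """Least-squares estimate per pilot subcarrier: h_hat = y / x."""
    return [y / x for y, x in zip(received, pilots)]


random.seed(0)  # fixed seed so the sketch is repeatable
n = 8           # number of pilot subcarriers (hypothetical)

# Known unit-power QPSK pilots (hypothetical pattern).
pilots = [cmath.exp(1j * (cmath.pi / 4 + k * cmath.pi / 2)) for k in range(n)]

# Made-up flat channel: the same complex gain on every subcarrier.
h_true = 0.8 * cmath.exp(1j * 0.3)

# Received pilots: channel gain applied to each pilot, plus Gaussian noise.
noise_std = 0.01
received = [
    h_true * x + complex(random.gauss(0, noise_std), random.gauss(0, noise_std))
    for x in pilots
]

h_hat = ls_channel_estimate(received, pilots)

# For a flat channel, averaging the per-pilot estimates reduces the noise.
h_avg = sum(h_hat) / n
print("estimation error:", abs(h_avg - h_true))
```

The estimator itself is a single division per pilot. The difficulty in practice comes from channels that vary across frequency and time, which is where learned models are being explored.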

The Aerial Omniverse Digital Twin is another of Nvidia’s RAN research tools. It simulates RF in the real world, enabling developers to set parameters within an environment, feed geospatial data into a propagation simulation, bring the results to an AI RAN stack, test an algorithm and go back through that loop again.

These elements are part of Nvidia’s newly announced 6G Developer programme, through which it is enabling access to these capabilities.

“This is a fundamental change in the way research can be conducted for 6G,” Vashishta said.

Building an ARC

In 2023, Nvidia announced the Aerial RAN CoLab Over-the-Air (ARC-OTA), a programmable 5G and 6G wireless full stack that can be deployed in a lab. Anupa Kelkar, Product Manager, Nvidia, said that at launch it was made available on discrete components: an Ampere-class GPU, an Intel Xeon CPU and a ConnectX-6 Dx Mellanox NIC, all of which could be plugged into a Gigabyte server and interoperate over a 7.2 split with commercially available 4T4R radios.

This year Nvidia is launching a managed developer service with Sterling, leveraging a Dell R750 server and the A100X converged accelerator. The card includes an Arm CPU, an Ampere-class GPU, an internal PCIe switch and a ConnectX-6 Dx, so the entire network is hosted on a single card.

“For this year we should also be able to support Grace Hopper,” Kelkar said. Grace Hopper has an Arm CPU (Grace), a Hopper GPU and shared memory – “thereby we expect to eliminate significant latencies in communicating between the GPU and CPU.”

The programme is also working with the OpenAirInterface Software Alliance to introduce features and capabilities beyond R15 this year. These include Small Data Transmission from R17 and a few other low latency capabilities designed to help immersive experiences and “many other features and capabilities that affect the roadmap to R16, 17 and beyond”. From an ecosystem perspective ARC will have Sterling introducing SkyWave, a managed developer service that eases operationalisation of a private 5G network so that developers can focus on research and algorithm development, instead of setting up and managing a network.