Like all good tech companies, Heavy.ai has its idiosyncratic origin story. Its founder, Todd Mostak, was researching the use of Twitter during the Arab Spring for a university research project when he went looking for a way to analyse large volumes of geospatial data points. It also has its list of former names: it was originally founded as MapD, which stood for Massively Parallel Database, then became OmniSci, before settling on its currently vogueish Heavy.ai styling. (Strictly, the company capitalises the whole name, but that’s more than the TMN style guide can handle for now.)
But the core of what it does hasn’t changed since Mostak’s early work: as he explains it on his LinkedIn profile, he discovered that by utilising the “parallelism of GPUs and CPUs”, he could “query, render, and visualise multi-billion row datasets without needing to downsample, index, or pre-aggregate”.
Mostak says that as he went along, “I soon discovered that my pain point was shared by vast swathes of analysts, data scientists, and data practitioners, who yearned for an agile and effortless way to analyse and visually explore the massive datasets that their organisations were accumulating but could not extract value from. Realising that the problem was bigger than just the difficulties I was encountering in my own research drove me to build a company around the core initial technology.”
That method of sorting through huge datasets has most recently been productised into a solution that Heavy.ai offers to governments, energy companies, utilities and telco operators. The company has also partnered with GPU behemoth Nvidia, which holds a stake in Heavy.ai alongside other investors including Google Ventures and Verizon Ventures.
So what is Heavy.ai’s play in the telco and mobile network operator space, and does what it is doing differ from other big data analytics methods?
Talking to TMN, Jennifer Woodford, VP of Customer Success, said, “What we do is that we allow organisations to analyse and visualise large geospatial datasets from multiple data sources. When we talk about large datasets, we’re talking 10 million and up to billions of rows, working with data at a scale that many of the other existing tools – or general business intelligence and reporting tools – aren’t able to handle. That’s the size of the dataset but also the level of complexity; geospatial data has a lot of complex aspects to it – location, time, etc. What we allow organisations to do is to instantaneously visualise that, and to make decisions for their business.”
“What we allow for is those instantaneous results because it is an interactive tool. So whether you’re a network engineer or you’re an executive, you can play with the data – visualise and create different scenarios – and be able to act on that to make data-driven decisions very quickly.”
What does this mean for telcos?
“If you look at what we can do with the types of data that we bring together, we really want to help telcos improve their quality of service so that it positively impacts their customer experience. When you think about the last couple of years, in particular, and how dramatically usage patterns changed during the pandemic, what Verizon, in particular, was able to do was monitor its network and see these large changes in the mobility of their users and of their customers, and [understand] how that impacts the way that they deliver service, what that means in terms of capacity planning.”
Another application is Heavy.rf, a network digital twin platform built on Nvidia’s Omniverse Enterprise that lets telcos create interactive digital twins to plan, build and operate their networks. The modelling combines signal metrics with demographic and customer behaviour data, giving telcos insight into the effects of their planning and operational decisions.
So how does Heavy.ai design and structure its solution to, in its own words, outperform other data analytics solutions?
In Woodford’s words: “We’re using the power of the GPU to be able to handle these large datasets. We also have a CPU-accelerated version of our product, or we leverage CPU acceleration. And what we’re seeing is 100x performance improvements.”
Heavy.ai’s website says that HeavyDB is designed to “keep hot data in GPU memory for the fastest access possible.”
“HeavyDB can query up to billions of rows in milliseconds, and is capable of unprecedented ingestion speeds, making it the ideal SQL engine for the era of big, high-velocity data.”
“Other GPU database systems,” it says, “have taken the approach of storing the data in CPU memory, only moving it to GPU at query time, trading the gains they receive from GPU parallelism with transfer overheads over the PCIe bus.”
“HeavyDB avoids this transfer inefficiency by caching the most recently touched data in High Bandwidth Memory on the GPU, which offers up to 10x the bandwidth of CPU DRAM and far lower latency. HeavyDB is also designed to exploit efficient inter-GPU communication infrastructure such as NVIDIA NVLink when available.”
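Beyond caching “the most recently touched data”, Heavy.ai does not spell out here how HeavyDB decides what stays in GPU memory. As a rough illustration of why that caching matters, the following Python sketch assumes a plain least-recently-used (LRU) policy over fixed-size data chunks and counts how many bytes would have to cross from host memory to the GPU. The class name, chunk ids and sizes are hypothetical; this is a toy model, not HeavyDB’s API or its actual eviction logic.

```python
from collections import OrderedDict


class GpuColumnCache:
    """Toy model of a fixed-capacity fast-memory cache with LRU eviction.

    Hypothetical illustration only: it does not reflect HeavyDB's real
    data structures or eviction policy.
    """

    def __init__(self, capacity_bytes: int) -> None:
        self.capacity_bytes = capacity_bytes
        self.used_bytes = 0
        # chunk id -> size in bytes, ordered from least to most recently used
        self.chunks = OrderedDict()
        # bytes copied in from host memory (stands in for PCIe traffic)
        self.bytes_transferred = 0

    def touch(self, chunk_id: str, size_bytes: int) -> bool:
        """Access a chunk; return True on a cache hit, False on a miss."""
        if chunk_id in self.chunks:
            self.chunks.move_to_end(chunk_id)  # mark as most recently used
            return True
        if size_bytes > self.capacity_bytes:
            raise ValueError("chunk larger than cache capacity")
        # Evict least recently used chunks until the new chunk fits.
        while self.used_bytes + size_bytes > self.capacity_bytes:
            _, evicted_size = self.chunks.popitem(last=False)
            self.used_bytes -= evicted_size
        # Miss: the chunk must be copied in from host memory.
        self.chunks[chunk_id] = size_bytes
        self.used_bytes += size_bytes
        self.bytes_transferred += size_bytes
        return False


# Example: a 300-byte cache and six accesses to 100-byte chunks.
cache = GpuColumnCache(capacity_bytes=300)
hits = [cache.touch(chunk, 100) for chunk in ["a", "b", "a", "c", "a", "d"]]
```

In this toy run the six accesses touch 600 bytes, but only 400 bytes are transferred, because the two repeat accesses to the hot chunk “a” are served from the cache. A system that moved every chunk to the GPU at query time would, in this model, pay the transfer cost for all 600.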
Woodford says another feature gives customers the ability to analyse fast-changing datasets without having to continually re-ingest and update them. It does this by connecting the HeavyDB database to the datasets where they already reside in the customer’s own environment.
“We have HeavyDB, which is the database, but we also have a tool called HeavyConnect, which allows us to leave data in place so customers don’t have to bring the data into our database in order to analyse and visualise it. We connect to a number of different data sources; we connect to Amazon S3, Snowflake, we’re working on some connections to Google BigQuery. And so we allow customers to leave data where it is and then leverage it in our database and render it so that they can make those decisions.”
Heavy.ai is headquartered in San Francisco and has 68 employees. It is entering a telco world that is grappling with how to structure massive datasets for use in network operations and customer-facing applications. Operators moving to distributed cloud platforms, which host cloud-native functions supporting services delivered across multiple access networks, need to interrogate billions of network events across physical, virtual and cloud domains. They use this data to feed orchestrators and service and performance assurance tools, and for customer experience monitoring.
Some operators, Vodafone among them, have partnered with one or more cloud giants to make use of their native analytics capabilities. Other players entering this market have made a virtue of how they ingest, pre-sort and analyse data, to avoid the “boil the ocean” problem when it comes to finding actionable data points. One example is Cardinality, which was acquired by Elisa Polystar. Others are seeking partnerships with hardware and server providers to combine streaming metrics from physical and cloud infrastructure with application-level and service metrics, giving operators a cross-domain view of performance. The EXFO-Intel partnership is one such example.
All of these approaches have to deal with the geographical nature of the data, its time-sensitivity and its constantly changing nature. Heavy.ai is betting that its use of hardware, with GPUs working in parallel with CPUs, can give a speed and performance boost to the tools and libraries that telco data scientists use to explore data interactively and make decisions quickly.