Artificial intelligence is reshaping the car and becoming a key differentiator. Premium buyers are increasingly willing to switch brands for better digital features. To deliver those features, automotive OEMs must balance latency, privacy, and cost. One key decision is where AI models should be run: in the cloud, on the vehicle, or both.
Onboard model execution, known as edge AI, is gaining ground thanks to hardware advances and lightweight models that enable powerful, low-latency inference directly in the vehicle. That shift is already reshaping the market for automotive semiconductors. For example, demand for neural processing units (NPUs) and modular system-on-chip (SoC) architectures is growing fast.
The roles played by OEMs, Tier 1s, and chipmakers are changing, too. The future of the automotive semiconductor value chain is collaborative: There is no one-size-fits-all solution, and success depends on codeveloping flexible platforms across the ecosystem.
Thanks to spectacular technological advances and billions of dollars of investment in R&D and computing capacity, generative artificial intelligence (gen AI) is transforming nearly every industry. For consumers, gen AI tools have become an everyday part of life at home and at work. Now these technologies are on the move again: into the vehicle itself.
A McKinsey consumer survey1 found that in 2024, 38 percent of premium car owners in Germany said they would consider switching brands if the alternative offered a better digital experience. That is more than double the share who expressed the same preference in 2015 (15 percent). Although the figure drops to 13 percent when owners are asked about purchases they have actually made (derived intent), it still shows that better digital capabilities are a significant factor in today’s automotive purchasing decisions.
For automakers and their suppliers, the rush to seize the opportunities presented by advanced AI technologies raises important questions about the selection of appropriate hardware, software, and use-case deployment types. Companies will need to answer these questions in different ways across the vehicle life cycle, from product design and engineering to sales and aftermarket support.
In this article, we will examine the critical decisions that affect one specific aspect of the automotive AI transformation: the execution of AI models (the process known as “inference” in the AI world) for in-vehicle functions. Building, training, and optimizing AI models is a hugely data- and compute-intensive activity, conducted in data centers equipped with large numbers of specialized chips. For the actual execution of the models, automakers can choose from cloud-based approaches that offload the work to remote data centers, edge solutions where the work is done on board the vehicle, or hybrid solutions that combine the two.
Each of these approaches has advantages and disadvantages for automotive OEMs and their technology suppliers. They also have important implications for the specification of in-vehicle compute architectures, along with the enabling semiconductors and software stacks.
Automotive AI use cases
Applications for advanced AI and gen AI are emerging in multiple domains (Exhibit 1). The most complex and safety-critical AI use cases in today’s vehicles are found in advanced driver assistance systems (ADAS) and their enablers. AI models for these applications must be capable of perceiving the driving environment, deciding where intervention is necessary, planning the desired changes to vehicle speed or direction, and sending suitable control signals to vehicle systems. In addition to applications requiring intervention in vehicle control, carmakers have also developed AI technologies to optimize in-cabin safety—for example, by monitoring occupant behavior to spot signs of drowsiness or loss of concentration. Sophisticated AI models are also at the heart of the latest developments for end-to-end (E2E) ADAS. End-to-end ADAS refers to an approach where the entire perception-to-control pipeline is handled by a single deep-learning model, deriving driving actions or decisions directly from raw sensor inputs (such as camera images). This contrasts with traditional rule-based ADAS architectures, which divide tasks like perception, motion planning, and control into discrete, separately optimized components.
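To make the single-model idea concrete, the sketch below maps raw camera pixels directly to control outputs in a few dozen lines of PyTorch. The architecture, input resolution, and two-value output head are purely illustrative assumptions for exposition; production E2E models are vastly larger and are trained on enormous fleets of driving data.

```python
# Toy illustration of the end-to-end concept: ONE network learns the whole
# perception-to-control pipeline, instead of separately engineered
# perception, planning, and control modules. Not a production ADAS model.
import torch
import torch.nn as nn

class TinyEndToEndDriver(nn.Module):
    def __init__(self):
        super().__init__()
        # "Perception" is learned implicitly by the convolutional layers.
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 24, kernel_size=5, stride=2), nn.ReLU(),
            nn.Conv2d(24, 36, kernel_size=5, stride=2), nn.ReLU(),
            nn.Conv2d(36, 48, kernel_size=3, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d((4, 4)),
        )
        # The head regresses driving actions directly from learned features.
        self.head = nn.Sequential(
            nn.Flatten(),
            nn.Linear(48 * 4 * 4, 64), nn.ReLU(),
            nn.Linear(64, 2),  # [steering_angle, throttle]
        )

    def forward(self, camera_frames: torch.Tensor) -> torch.Tensor:
        return self.head(self.backbone(camera_frames))

model = TinyEndToEndDriver()
frame = torch.randn(1, 3, 120, 160)   # one RGB camera frame (toy resolution)
steering, throttle = model(frame)[0]  # raw sensor input -> driving action
```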
A second major area of interest is the application of AI to infotainment and vehicle comfort systems. Key use cases here include voice-controlled assistants that allow natural language interaction with vehicle systems, gesture recognition, and systems that can learn driver preferences and adjust vehicle settings or make personalized recommendations (such as restaurant suggestions based on previous trips or stops) to enhance the user experience.
AI use cases in the powertrain, body, and chassis domain include sophisticated range calculation and energy management technologies for battery electric vehicles (BEVs); optimization of braking, traction control, and adaptive suspension systems; and predictive maintenance technologies. In the connectivity and gateway domain, we particularly see network health monitoring and intrusion detection use cases (Exhibit 1).
Cloud-based approaches are first to market
Many current-generation in-vehicle gen AI applications use cloud-based or hybrid approaches for model execution. For voice assistants, for example, the cloud-based approach involves transmitting either the audio recorded in the cabin or a transcript of it (produced by a simple on-board speech-to-text model) across a 4G or 5G mobile data network for processing in a data center. The data center interprets the request using a large language model (LLM) and sends back the response.
In hybrid AI systems, the processing work is split between vehicle and data center. Simpler tasks, such as a voice request to alter the cabin temperature, may be processed entirely within the vehicle, while more complex voice queries make use of a sophisticated AI system running in the cloud. The hybrid approach reduces the volume of data carried by the mobile network, simplifies model maintenance and optimization, and enables basic functionality without a network connection while still allowing vehicles to access the functionality of large, complex LLMs.
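A minimal sketch of that routing logic might look as follows. Every function and intent name here is a hypothetical placeholder rather than any vendor’s API; the point is simply that the dispatch decision runs on board, so basic commands keep working even offline.

```python
# Hypothetical hybrid dispatcher for an in-vehicle voice assistant:
# simple intents stay on the vehicle; open-ended queries go to a cloud LLM.
from dataclasses import dataclass

@dataclass
class Reply:
    text: str
    served_from: str  # "edge" or "cloud"

# Commands simple enough for the on-board model to handle end to end.
LOCAL_INTENTS = {"set_temperature", "open_window", "volume_up"}

def classify_intent_on_device(transcript: str) -> str:
    # Stand-in for a lightweight on-device intent classifier.
    return "set_temperature" if "temperature" in transcript else "open_question"

def run_local_command(intent: str) -> str:
    return f"Done: handled '{intent}' entirely in the vehicle."

def call_cloud_llm(transcript: str) -> str:
    return "(response generated by a large model in the data center)"

def handle_utterance(transcript: str, network_up: bool) -> Reply:
    intent = classify_intent_on_device(transcript)
    if intent in LOCAL_INTENTS:
        # Simple, latency-sensitive requests never leave the vehicle.
        return Reply(run_local_command(intent), "edge")
    if network_up:
        # Complex queries are offloaded to the large model in the cloud.
        return Reply(call_cloud_llm(transcript), "cloud")
    # Graceful degradation: basic functionality without a connection.
    return Reply("That request needs a network connection.", "edge")

print(handle_utterance("set the temperature to 21 degrees", network_up=False))
```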
Cloud limitations
Cloud-based approaches to AI model execution have helped OEMs integrate advanced technologies quickly, but these methods come with significant challenges. A 2024 McKinsey survey2 of industry stakeholders identified several areas of concern (Exhibit 2).
First, these systems require a reliable 4G or 5G network connection to operate. Such connections are still difficult to guarantee in some regions, and they are always vulnerable to network outages or capacity constraints. The ability to operate offline is a requirement for AI systems in safety-relevant use cases such as ADAS/autonomous driving (AD), and a key driver of customer satisfaction for infotainment and digital communications functions. Offline availability was cited as a key requirement for all AI use cases by 39 percent of stakeholders in our survey. Premium OEMs, in particular, say that offline availability is critical to satisfy the demanding expectations of their customers.
Second, stakeholders are concerned about latency—the time that elapses between a request to the AI system and the availability of its response in the vehicle. For systems that operate in the cloud, latency comprises the time required for in-vehicle data pre- and post-processing, the time for the data center AI model to generate a response, and the round-trip transit time for data sent across the network. Reduced latency was cited as a key requirement by 35 percent of stakeholders.
Twenty percent of stakeholders also expressed concern about data privacy and security, noting that many users of vehicle infotainment systems do not want their personal communications to be transferred to the cloud. Finally, 6 percent of stakeholders told us that the cost of high volumes of network data traffic was a concern, since this usage typically must be covered by the OEM’s contract with telecommunications service providers.
Driving AI execution to the edge
These concerns are driving automotive stakeholders toward the third option for AI: at-the-edge systems that host models locally and execute them within the vehicle. Edge AI eliminates the challenges associated with data traffic costs, network availability, and data privacy.
Execution at the edge can also help address latency issues by eliminating data transmission over the mobile network, although this approach does not always offer a decisive advantage over cloud and hybrid methods. Our analysis of voice-assistance tasks found that pure cloud solutions achieved latencies of 1,000 to 2,200 milliseconds, while edge deployments may offer latencies of 300 to 700 milliseconds.
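The arithmetic behind such figures is easy to decompose. In the sketch below, a single voice-assistant turn is broken into the latency components described earlier; the individual values are assumptions chosen only so that the totals fall within the ranges cited above, not measurements.

```python
# Illustrative latency budgets (milliseconds) for one voice-assistant turn.
# Component values are assumptions for exposition, not measured data.

def total_latency_ms(components: dict[str, float]) -> float:
    return sum(components.values())

cloud = {
    "in_vehicle_preprocessing": 100,   # audio capture, speech-to-text
    "network_round_trip": 250,         # 4G/5G uplink plus downlink
    "data_center_inference": 900,      # large LLM generates the response
    "in_vehicle_postprocessing": 100,  # text-to-speech, UI update
}

edge = {
    "in_vehicle_preprocessing": 100,
    "on_board_inference": 300,         # lightweight model on the vehicle SoC
    "in_vehicle_postprocessing": 100,
}

print(f"cloud: {total_latency_ms(cloud):.0f} ms")  # 1350 ms
print(f"edge:  {total_latency_ms(edge):.0f} ms")   # 500 ms
```

Note that the edge variant does not just remove the network round trip; it also swaps the large data center model for a smaller on-board one, which is why the inference term shrinks as well.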
AI adoption at the edge is not limited to automotive. It is accelerating across a wide range of industries. In the industrial Internet of Things (IoT), edge AI powers predictive maintenance and quality control through local sensor data analysis. Healthcare and medical devices use it in wearables like smartwatches and insulin pumps for real-time diagnostics and monitoring. Meanwhile, smart home electronics and security systems apply AI locally for tasks such as voice and gesture recognition or anomaly detection in video feeds.
Edge AI comes with its own challenges, however. In our research, stakeholders highlighted four areas of particular concern. First, 46 percent of interviewees mentioned resource constraints due to the limited capabilities of the SoC hardware available for in-vehicle applications. Compared with the hardware in a data center, vehicle SoCs have less computational power, limited flash memory for model storage, limited RAM for model execution, and limited bandwidth for interaction with other vehicle systems. Thirty-five percent of interviewees also noted that computationally intensive AI compute loads can lead to high energy consumption, a particular concern for BEVs.
Beyond the hardware limitations, some interviewees also expressed concern about software-related issues. Fifteen percent of them noted that updating large AI models over the air created challenges, especially since such models must be continuously updated to incorporate new information or fix bugs. Four percent of respondents also highlighted the fragmented landscape of software frameworks for AI model execution, with major providers offering multiple, incompatible systems.
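One common mitigation for the over-the-air problem is to transfer only the parts of a model file that actually changed. The sketch below illustrates the idea with content-addressed chunking; the chunk size, hashing scheme, and file layout are illustrative choices, not a description of any OEM’s update system.

```python
# Hypothetical sketch: ship only the changed chunks of a model update,
# identified by comparing per-chunk hashes of the old and new weights file.
import hashlib

CHUNK_SIZE = 1 << 20  # 1 MiB

def chunk_hashes(blob: bytes) -> list[str]:
    return [
        hashlib.sha256(blob[i:i + CHUNK_SIZE]).hexdigest()
        for i in range(0, len(blob), CHUNK_SIZE)
    ]

def chunks_to_download(old_model: bytes, new_model: bytes) -> list[int]:
    old_h, new_h = chunk_hashes(old_model), chunk_hashes(new_model)
    # A chunk must be fetched if it is new or if its hash differs.
    return [i for i, h in enumerate(new_h) if i >= len(old_h) or h != old_h[i]]

old = bytes(8 * CHUNK_SIZE)                 # stand-in for weights already on the car
new = bytearray(old)
new[3 * CHUNK_SIZE] ^= 0xFF                 # an update touches one region of weights
print(chunks_to_download(old, bytes(new)))  # [3] -- fetch 1 MiB instead of 8 MiB
```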
Model size evolution
Edge execution requires AI models small enough to run on the vehicle’s computers (Exhibit 3). Today’s most advanced language and reasoning models consist of hundreds of billions of parameters, making cloud-based inference the only viable option.
Machine-learning researchers have invested significant effort in developing smaller “lightweight” models, however, using advanced pruning and compression techniques (such as quantization) to reduce model size by several orders of magnitude with only a small impact on performance. This area of AI research is progressing extremely rapidly, giving automotive OEMs the opportunity to integrate more advanced capabilities into their edge-AI systems.
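The basic mechanics are straightforward to demonstrate. Below is a minimal PyTorch sketch of two widely used techniques, magnitude pruning and post-training dynamic quantization, applied to a toy model. Production pipelines would add retraining, distillation, and hardware-specific compilation, and unstructured sparsity only reduces storage when paired with sparse formats or hardware support.

```python
# Toy lightweighting pipeline: prune small weights, then quantize to int8.
import torch
import torch.nn as nn
import torch.nn.utils.prune as prune

model = nn.Sequential(nn.Linear(512, 512), nn.ReLU(), nn.Linear(512, 64))

# 1) Zero out the 50 percent smallest-magnitude weights per linear layer.
for module in model:
    if isinstance(module, nn.Linear):
        prune.l1_unstructured(module, name="weight", amount=0.5)
        prune.remove(module, "weight")  # make the sparsity permanent

# 2) Quantize the remaining weights from 32-bit floats to 8-bit integers.
quantized = torch.ao.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8
)

x = torch.randn(1, 512)
print(quantized(x).shape)  # torch.Size([1, 64]): same interface, smaller model
```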
The future of the in-car AI technology stack
Increased edge-AI adoption in automotive applications will require evolution across the vehicle technology stack. Exhibit 4 offers a simplified representation of the main components of such a stack. The software layer of the stack includes a real-time operating system (RTOS) supporting time-critical tasks such as ADAS and AD; a non-real-time OS supporting infotainment and other ancillary functions; AI frameworks3, models, and life cycle management tools; and AI applications. Hardware layers include flash memory for storage; dynamic RAM and high-bandwidth memory (HBM) to support computing; and different types of processing elements.
To support this evolution and overcome the hardware limitations described above, scalable chip architectures will be essential. By enabling modular designs that can accommodate additional processing units or memory as needed, scalable architectures allow OEMs to future-proof their platforms and adapt to the growing demands of AI workloads. This flexibility is particularly valuable in a rapidly changing technological landscape, where new AI models and use cases are constantly emerging. Fast development cycles are therefore more important than ever for bringing AI-defined vehicles to market sooner.
While scalable chip architectures provide the design principles for flexibility and adaptability, heterogeneous integration (enabled by advanced packaging technologies) can offer the physical means to implement these designs effectively, allowing chipmakers to scale performance by adding chiplets and/or upgrading specific components without redesigning the entire chip.
Vehicle SoC architectures suitable for AI execution typically contain four different processor building blocks. The central processing unit (CPU) is used for basic vehicle control tasks, managing deterministic workloads and system-level orchestration. Digital signal processor (DSP) elements are mostly used to perform dedicated data-processing tasks from vehicle sensors. Graphics processing unit (GPU) elements are used for workloads that benefit from parallel computing, including display rendering and some AI inference tasks. Most recently, vehicle SoC designs have also incorporated specialized neural processing units (NPUs), which offer greater compute power and improved energy efficiency for intensive AI tasks through optimized designs.
As AI workloads become more common, and more demanding, we expect demand for NPUs to rise, while demand for GPU capacity for automotive AI loads will grow more slowly. NPUs are expected to gain particular importance for newer neural network architectures, such as transformer-based models. The ongoing evolution of vehicle electrical and electronic architectures is also likely to reduce the need for DSP capabilities in the vehicle’s primary SoC, since sensor fusion for ADAS/AD can increasingly be absorbed into end-to-end models rather than handled by today’s rule-based approaches.
The semiconductor industry is currently pursuing two distinct approaches to providing NPUs: integrating them directly into the SoC or offering them as a separate building block. Many new disruptors and start-ups are pursuing the latter approach.
For OEMs, separate NPUs offer several advantages, including the option to scale processing performance for different products by adding NPUs as required, a shorter time to market, and lower nonrecurring engineering (NRE) effort. The major downside of the approach is the additional integration complexity involved.
In addition to these SoC components, microcontrollers will continue to play a critical role in the broader automotive hardware stack and are becoming increasingly capable of executing lightweight AI models at the edge. Their energy efficiency and cost-effectiveness make them particularly attractive for real-time use cases such as monitoring tasks.
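As one illustration of how a model reaches such constrained hardware, the sketch below exports a toy monitoring model to a fully integer-quantized TensorFlow Lite flatbuffer, the format consumed by microcontroller runtimes such as TensorFlow Lite for Microcontrollers. The model, shapes, and calibration data are placeholders for exposition.

```python
# Export a toy model as a fully int8 TFLite flatbuffer for MCU deployment.
import numpy as np
import tensorflow as tf

# Stand-in for a small real-time monitoring model (e.g., anomaly detection).
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(16,)),
    tf.keras.layers.Dense(8, activation="relu"),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])

def representative_data():
    # Calibration samples so the converter can pick int8 scaling factors.
    for _ in range(100):
        yield [np.random.rand(1, 16).astype(np.float32)]

converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_data
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.inference_input_type = tf.int8
converter.inference_output_type = tf.int8

tflite_model = converter.convert()
print(f"int8 model size: {len(tflite_model)} bytes")  # small enough for flash
```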
Market outlook
The acceleration of in-vehicle AI adoption presents a significant opportunity for semiconductor players. McKinsey analysis suggests that the automotive market for more advanced microcomponents (including microcontrollers [MCUs], microprocessors [MPUs], and SoCs based on node sizes of 20 nanometers or smaller) will grow by 24 percent annually over the remainder of the current decade, reaching $18 billion by 2030 (Exhibit 5).
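For context, a 24 percent compound annual growth rate implies roughly a 3.6-fold expansion over six years. The quick check below back-solves the implied starting market size; the 2024 base year is our assumption, since the analysis above states only the growth rate and the 2030 figure.

```python
# Back-of-the-envelope CAGR check (base year of 2024 is an assumption).
cagr, target_2030, years = 0.24, 18e9, 6
implied_2024_base = target_2030 / (1 + cagr) ** years
print(f"implied 2024 market: ${implied_2024_base / 1e9:.1f} billion")  # ~$5.0 billion
```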
Decision criteria
For OEMs and Tier 1 suppliers, the choice between edge and cloud AI execution remains complex. Selecting an approach requires companies to balance multiple, application-specific criteria. Those criteria include performance requirements such as system output speed, latency, and the ability to scale capacity to take on new AI workloads; operational requirements such as service reliability and availability, long-term maintainability, and data security; and cost and ROI requirements such as time to market, hardware costs, and operating costs through the life of the platform.
As Exhibit 6 shows, cloud and edge approaches have differing strengths and weaknesses in each of these areas. As a result, hybrid approaches to AI are likely to dominate vehicle platforms for the foreseeable future.
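One way to make such a comparison operational is a simple weighted scoring model. The criteria weights and one-to-five scores below are invented for illustration (they are not the values behind Exhibit 6); the point is that a hybrid option tends to avoid the worst score on any single criterion, which is why it narrowly wins here.

```python
# Toy weighted-scoring comparison of AI deployment approaches.
# Weights and scores are illustrative placeholders, not survey data.
CRITERIA_WEIGHTS = {
    "latency": 0.25,
    "scalability": 0.15,
    "offline_reliability": 0.20,
    "maintainability": 0.15,
    "data_security": 0.10,
    "lifetime_cost": 0.15,
}

SCORES = {  # 1 = weak, 5 = strong
    "cloud":  {"latency": 2, "scalability": 5, "offline_reliability": 1,
               "maintainability": 5, "data_security": 2, "lifetime_cost": 3},
    "edge":   {"latency": 5, "scalability": 2, "offline_reliability": 5,
               "maintainability": 2, "data_security": 5, "lifetime_cost": 3},
    "hybrid": {"latency": 4, "scalability": 4, "offline_reliability": 4,
               "maintainability": 4, "data_security": 4, "lifetime_cost": 3},
}

for approach, scores in SCORES.items():
    total = sum(CRITERIA_WEIGHTS[c] * s for c, s in scores.items())
    print(f"{approach:>6}: {total:.2f}")  # cloud 2.85, edge 3.80, hybrid 3.85
```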
For the selection of SoCs and software platforms, OEMs and their suppliers are likely to prioritize a combination of hardware capabilities (including compute performance, energy efficiency, and scalability) and software features (such as flexibility in the choice of AI model types and tooling).
Outlook
Bringing advanced AI use cases into vehicles will require a concerted effort across the entire automotive value chain.
OEMs must continue pushing toward centralized electrical and electronic (E/E) architectures that enable cross-domain functionality and over-the-air (OTA) updates to support continuous feature enhancement. Investments in middleware and hardware abstraction will be essential to promote software component reuse and accelerate development cycles. A strong, empowered E/E architecture function will be critical for the development of future-proof platforms.
Tier 1 suppliers will continue to face pressure from both directions. On one side, OEMs are intensifying their vertical-integration strategies. On the other, semiconductor players are expanding their software capabilities to move beyond chip supply and position themselves as full system providers. To stay relevant, Tier 1s must strengthen their own AI competencies and integration capabilities to maintain their role in the ecosystem.
For semiconductor players, the rise of AI in vehicles brings opportunities and challenges. Chipmakers are shifting from hardware suppliers to solution providers, developing purpose-specific chips for edge workloads while ensuring compatibility with cloud-based tasks for more compute-intensive workloads. To stay competitive, they are expanding software capabilities, enabling flexibility in AI models and OTA updates. We expect to see a continued rise in M&A activity as hardware-centric and semiconductor-focused players seek to expand their software and AI capabilities. Recent deals, including NXP’s acquisitions of TTTech Auto and Kinara, Renesas’ acquisition of Reality AI for edge-AI applications, and Qualcomm’s acquisition of Arriver for building competence in ADAS/AD, underscore this strategic pivot toward building more comprehensive, vertically integrated platforms. Advanced packaging technologies and close collaboration with OEMs, Tier 1s, and hyperscalers are critical to meeting safety, security, and energy-efficiency demands in connected-car ecosystems.

At the same time, the competitive landscape for semiconductor players is being reshaped by a growing cohort of new entrants focused on developing chipsets tailored to specific AI workloads—particularly in the domain of NPUs. On the intellectual property (IP) front, these developments are gaining momentum, though the breadth and maturity of NPU-related IP still lag behind the well-established libraries for traditional x86, ARM, and RISC-V CPU and GPU architectures.
As in recent years, technology companies, including hyperscalers, start-ups, and disruptors, are expected to gain further traction by contributing their AI expertise for both in-vehicle and off-vehicle applications.
Ultimately, all stakeholders—OEMs, Tier 1s, and semiconductor companies alike—must engage in close codevelopment to ensure that the specific demands of automotive AI, such as safety, security, and energy efficiency, are fully addressed. Collaborative development efforts in the semiconductor space are likely to increasingly align with industry-wide standardization initiatives (such as Universal Chiplet Interconnect Express [UCIe]), as well as with research programs and working groups driven by neutral institutions like the Interuniversity Microelectronics Centre (Imec). These partnerships can help establish common frameworks and accelerate innovation across the ecosystem.