The Edge Moves In | Veyago research

Abstract

For most of the modern era of artificial intelligence, capable intelligence has meant the cloud, and the cloud has meant that a person's data leaves the device they hold in order to be understood. That arrangement forced a trade-off that has quietly governed the design of every privacy-conscious product: a tool could be genuinely intelligent or genuinely private, but not both in the same feature. This report argues that the trade-off is now collapsing for an important and growing class of work, and that Apple's announcements at its 2026 Worldwide Developers Conference are best understood not as an isolated product launch but as the consumer-facing inflection point of a broader, multi-vendor migration of intelligence from the data centre to the device. The report situates the WWDC 2026 announcements within the documented industry shift toward on-device inference, and examines both the headline consumer assistant and the less visible but more consequential developer substrate beneath it. It then sets out, with deliberate care, the limits that the excitement tends to obscure: a persistent capability ceiling for frontier reasoning, a small working context, hardware gating that fragments the user base, regional unavailability, and a cloud tier whose reliance on a rival's models complicates the tidy on-device narrative. The central analytical claim is narrow and, the report believes, defensible: the capability-privacy trade-off has not vanished, but for latency-sensitive, privacy-critical, and structured personal-data tasks it has narrowed to the point where a product no longer has to choose, and this changes what software built to run locally can honestly promise. The closing sections draw out the implications for a studio whose products are built on an on-device, privacy-first thesis, with the life-administration tool Kept as the clearest applied case, and set out the open questions the shift leaves genuinely unresolved.

1. Introduction: the trade-off that defined the cloud era

The defining constraint of consumer AI in the years following the arrival of large language models was not capability but location. The models that could do anything genuinely useful were too large to run anywhere but a data centre, which meant that using them required sending one's words, and often one's documents and images, across a network to a server owned by someone else. For most consumer applications this was accepted without much thought, because the utility was immediate and the cost was abstract. But for any product whose entire proposition rested on privacy, the arrangement was a genuine bind, because the very act of making the product intelligent meant surrendering the data it was supposed to protect.

This produced a trade-off that has shaped a great deal of careful software design. A product could be intelligent, by reaching out to a capable model in the cloud, or it could be private, by keeping the data on the device and forgoing that intelligence, but it could not easily be both in the same feature. Privacy-first products tended to resolve the tension by staying simple, offering storage and structure rather than understanding, because understanding was the thing that lived on the far side of the network. The thesis of this report is that this constraint, which felt permanent, was in fact a temporary artifact of where the models happened to run, and that its loosening is now well enough underway to change what privacy-first software can do.

2. The shift to the edge

It would be a mistake to read the events of WWDC 2026 as a single company's announcement, because the migration of intelligence toward the device is a broad and well-documented movement across the industry, and Apple's contribution is better understood as the most visible consumer expression of it than as its origin. The underlying shift is from training to inference, from the one-time work of building models to the continuous work of running them in production, and inference is precisely the workload for which the cloud's disadvantages bite. As coverage of the edge-AI trend has put it, public clouds offer elasticity but impose latency, raise data-privacy concerns, and carry real per-query cost, and these are exactly the pressures pushing inference outward toward local hardware.

What has made the move possible is a convergence of advances that, taken together, let usefully capable models run on devices people already own. A body of recent work describes a focus on what one industry analysis calls the Goldilocks zone of roughly three to thirty billion parameters, models large enough to be genuinely useful yet small enough to run locally, and characterises the move as an architectural inflection point rather than a passing trend, driven by latency, privacy, cost, and user-experience demands that cloud inference cannot satisfy. The enabling techniques are by now well established in the literature: quantization, which reduces the numerical precision of a model and can cut its memory footprint several-fold with limited quality loss, alongside pruning and knowledge distillation, and architectural choices such as mixture-of-experts designs that decouple capability from raw parameter count. On the hardware side, the neural processing units now embedded in consumer silicon, including Apple's Neural Engine, Qualcomm's Hexagon, and Google's Tensor designs, perform the low-precision arithmetic these models need at real-time speeds and low power, with commentary noting that a current flagship phone's neural engine is already fast enough to run a several-billion-parameter model at useful speed.
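
The memory claim about quantization can be made concrete with back-of-envelope arithmetic. The sketch below is illustrative Python and is not drawn from any of the cited sources. It assumes that weights dominate the footprint and ignores activations, the key-value cache, and any quantization metadata, so real figures will be somewhat higher. The function name weight_memory_gb and the sample sizes of 3, 8, and 30 billion parameters, chosen to span the range discussed above, are assumptions made for illustration.

```python
def weight_memory_gb(params_billions: float, bits_per_weight: int) -> float:
    """Approximate memory for model weights only, in gigabytes (10^9 bytes).

    Assumption: every weight is stored at the same precision and nothing
    else (activations, caches, metadata) is counted.
    """
    bytes_total = params_billions * 1e9 * bits_per_weight / 8
    return bytes_total / 1e9


if __name__ == "__main__":
    # Illustrative model sizes spanning the three-to-thirty-billion range.
    for params in (3, 8, 30):
        row = ", ".join(
            f"{bits}-bit: {weight_memory_gb(params, bits):.1f} GB"
            for bits in (16, 8, 4)
        )
        print(f"{params}B parameters -> {row}")
```

Under these assumptions a 3-billion-parameter model needs about 6 GB for its weights at 16-bit precision and about 1.5 GB at 4-bit, a four-fold reduction of the kind the paragraph above describes.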

The economic logic reinforces the technical one. Analysis of the comparative economics observes that cloud inference loses money at scale and is, for now, substantially subsidised by its providers, whereas on-device inference carries essentially zero marginal cost because it runs on hardware the user has already bought, a gap that pushes developers and enterprises to treat local processing as a structural alternative rather than a curiosity. And the movement is genuinely multi-vendor. Google has pushed small models to Android through its on-device tooling and its Gemma model family, framing the elimination of cloud connectivity as a path to faster and more private applications; Microsoft has invested in its own in-house models; and a Meta researcher's 2026 survey of on-device language models lays out the balanced case plainly, noting that data which never leaves the device cannot be breached in transit or logged on a server, that on-device inference can be both fast and always available regardless of connectivity, and, crucially, that the long-standing catch has always been capability. That last point is the honest hinge of this whole report, and it is taken up directly in section five.

3. WWDC 2026 as the consumer inflection point

Against that backdrop, the significance of Apple's 2026 Worldwide Developers Conference is that it placed the on-device-versus-cloud question, and an explicit privacy framing of it, at the centre of the most widely used consumer assistant in the world. The headline was Siri AI, described across the coverage as a ground-up rebuild of the assistant, able to hold genuine back-and-forth conversation, draw on a person's own messages, mail, and photos for personal context, understand what is on the screen, answer current questions from the web, and take action across applications, with a dedicated application that syncs its conversation history across a person's devices through their iCloud account. After two years in which Apple had promised a more capable and personal Siri and repeatedly failed to ship it, the 2026 conference showed the rebuild rather than another roadmap.

The architecture beneath it is the part most relevant to this report. Apple's next-generation foundation models run, by the company's account, both on the device and through its Private Cloud Compute servers, with Apple maintaining that when a request is handled in the cloud the user's data is neither stored nor made accessible to Apple or any third party. The on-device side includes a new model that can understand speech and read both text and images, while the most demanding cloud tier is, in Apple's own description, comparable in quality to frontier models and runs on Nvidia hardware in Google's cloud.

This is also where a research-grade account has to handle a genuine nuance rather than smoothing it over. Apple confirmed that its next-generation foundation models were custom-built in collaboration with Google and its Gemini models, a striking move for a company that has long sold in-house silicon and on-device processing as a core advantage. Much of the press framing reduced this to the shorthand that Siri is now powered by Gemini. Some reports described a custom large model and cited, on the authority of Bloomberg's pre-conference reporting, an arrangement expected to cost Apple on the order of a billion dollars a year, a figure Apple has not confirmed. Apple's own framing is more particular: that the models are custom and co-developed, that they run inside Apple's own orchestration and privacy architecture, and, in subsequent clarification from its software leadership, that Apple uses none of the models Google deploys to its own customers. The accurate reading sits between the two simplifications. The cloud brain owes a real and acknowledged debt to a rival's technology, and the architecture is nonetheless classically Apple in that the company owns the orchestration, the integration, and the privacy story even when the underlying foundation model does not originate entirely in-house. For the purposes of this report, the precise commercial and provenance details matter less than the structural fact that Apple has split intelligence between a local tier and a cloud tier and has made the privacy properties of that split a headline feature.

4. The developer substrate, which matters more

The new Siri is the announcement that drew the attention, but for anyone building software the more consequential half of WWDC 2026 was the developer substrate beneath it, because that is what determines whether third-party applications, and not just Apple's own, can become intelligent on the device. Three pieces of that substrate matter most.

The first is that Apple made its App Intents framework the way applications participate in the new assistant and signalled the deprecation of the older SiriKit approach, which means that an application's content and actions are exposed to Siri and to the system's personal-context search through a route that the developer maintains. The second, and the most important for privacy-first software, is that this personal-context search is backed by a semantic index that operates on the device, so an application can let a user ask natural-language questions about their own data and have those questions answered locally rather than by surrendering the data to a remote service. The third is that Apple substantially expanded its Foundation Models framework, the developer interface to the same on-device model that powers its own intelligence, giving developers a local model with structured-output generation, the ability to read images, a built-in path to local retrieval over a person's own indexed content, and on-device tools for tasks such as text recognition, all running on the device with no data leaving it and no per-query cost. The framework's on-device working context, however, is modest, a point this report returns to among the limits. Alongside it Apple introduced a way for developers to run their own models locally, and built an explicit human-in-the-loop confirmation system into the action framework, so that an assistant-triggered action with meaningful consequences is confirmed by the person rather than performed silently.
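
The human-in-the-loop confirmation idea is a general pattern, and its shape can be sketched independently of any platform. The Python below is a minimal illustration under stated assumptions, not Apple's App Intents API: the names ProposedAction, run_action, perform, and confirm are hypothetical, as is the single consequential flag used to decide when to ask.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class ProposedAction:
    description: str     # shown to the person when confirmation is requested
    consequential: bool  # hypothetical flag: does the action change or send anything?


def run_action(
    action: ProposedAction,
    perform: Callable[[], str],
    confirm: Callable[[str], bool],
) -> str:
    """Perform the action, asking the person first when it is consequential."""
    if action.consequential and not confirm(action.description):
        # The person declined, so nothing is performed.
        return "cancelled"
    return perform()


if __name__ == "__main__":
    cancel = ProposedAction("Cancel the gym membership", consequential=True)
    # A declining confirm callback stands in for the person tapping "No".
    print(run_action(cancel, perform=lambda: "done", confirm=lambda text: False))
```

The design choice worth noting is that the gate sits in front of the action rather than inside it, so an assistant cannot reach the consequential step without passing through the person's decision.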

The reason to dwell on this less glamorous layer is that it is the part that generalises. A capable assistant that only Apple controls is a feature of one company's products. A documented, on-device substrate that any developer can build on is the thing that lets the broader category of privacy-first software stop choosing between intelligence and privacy, because it supplies precisely the capability that previously lived only in the cloud, and supplies it locally.

5. The honest limits

A report that presented only the case for the shift would be advocacy, and the discipline this work tries to hold to requires setting out, without softening, the limits that the announcements and the surrounding enthusiasm tend to obscure. There are five worth naming.

The first and most fundamental is capability, and it is the limit the rest of the industry's own researchers are most candid about. As the Meta survey of on-device models states plainly, the long-standing catch with edge inference is that if a use case demands frontier reasoning, broad world knowledge, or long multi-turn conversation, the cloud remains the better choice, and on-device is most compelling for tasks that are latency-sensitive, privacy-critical, or high-volume rather than for the most demanding cognition. The local models that have made this shift possible sit in the few-billion-parameter range, and they are useful precisely because they are bounded, not because they rival the largest cloud systems.

The second is a concrete expression of the first: the on-device model's working context is small. Apple's on-device framework offers developers a modest token window, far smaller than the long contexts available from cloud models, which means that tasks involving long documents or extended histories must be handled by retrieval and careful budgeting rather than by feeding everything to the model at once. This is a real engineering constraint on what local intelligence can attempt.
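
What "retrieval and careful budgeting" can look like in practice is sketched below in Python. It is a minimal illustration under stated assumptions, not a description of Apple's framework: the chunks are assumed to arrive already ranked by a retrieval step, token cost is approximated crudely by counting whitespace-separated words, and the names estimate_tokens and pack_context, along with the budget figures in the usage example, are hypothetical.

```python
def estimate_tokens(text: str) -> int:
    """Crude stand-in for a real tokenizer: one token per whitespace-separated word."""
    return len(text.split())


def pack_context(ranked_chunks: list[str], budget: int, reserved: int = 0) -> list[str]:
    """Keep the highest-ranked chunks that fit within the remaining budget.

    ranked_chunks must already be ordered most relevant first.
    reserved is the number of tokens held back for the prompt and the reply.
    """
    remaining = budget - reserved
    kept = []
    for chunk in ranked_chunks:
        cost = estimate_tokens(chunk)
        if cost <= remaining:
            kept.append(chunk)
            remaining -= cost
        # Chunks that do not fit are skipped; smaller, lower-ranked ones may still fit.
    return kept


if __name__ == "__main__":
    chunks = [
        "boiler serviced in March",
        "warranty covers parts and labour for two years",
        "invoice attached",
    ]
    # Hypothetical budget of 8 tokens with 2 reserved leaves 6 for context.
    print(pack_context(chunks, budget=8, reserved=2))
```

A production version would use the model's real tokenizer and might summarise oversized chunks rather than drop them, but the constraint is the same: the window is spent deliberately instead of being filled with everything available.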

The third is hardware. Apple's own most capable on-device model, and the most-promoted of Siri's new abilities, require its newest and most powerful silicon, while the broader set of Apple Intelligence and Siri AI features is restricted to a defined and relatively recent range of devices. The practical consequence is a fragmented user base in which the quality of local intelligence depends on which device a person happens to own, and the most impressive capabilities reach only the newest hardware.

The fourth is regional. The new consumer assistant launches as an English-first beta and will not be available in some major markets at launch, with China named explicitly among the exclusions, which means that the consumer-facing half of the shift arrives unevenly across languages and jurisdictions even as the underlying developer capabilities are more broadly available.

The fifth is the nuance already raised in section three, which is also a limit on the purity of the on-device story. Apple's architecture is a hybrid, not a wholesale relocation to the device. Its most capable tier runs in the cloud, leans on a rival's models, and executes on third-party hardware, and the genuinely private, fully local processing applies to a subset of work rather than to everything the assistant can do. The honest characterisation is that intelligence has moved partly, not entirely, to the edge, and that the privacy properties are strongest precisely for the local subset.

6. What it means for software built to run locally

With both the shift and its limits in view, the analytical contribution of this report can be stated narrowly, which is the only way it is defensible. The trade-off between capability and privacy has not disappeared, and for the most demanding cognition the cloud still wins. But for a specific and growing class of work, namely tasks that are latency-sensitive, privacy-critical, and structured around a person's own data, the trade-off has narrowed to the point where a product no longer has to choose. Reading a document and extracting its key facts, answering natural-language questions over a person's own records, turning a messy input into a typed and validated draft, and watching a set of dates and surfacing the right item at the right moment are all tasks that now sit comfortably within what a local model and a local index can do, and all of them previously required the cloud. The thing that changes, then, is the honest promise a privacy-first product can make. It can now offer understanding, and not merely storage, without asking the user to send their most sensitive information anywhere.
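
Of the tasks listed above, watching a set of dates is the simplest to make concrete, and it shows how little of this class of work needs a large model at all. The Python below is a minimal sketch under stated assumptions: the Record type, the due_soon function, the 14-day default window, and the sample records are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class Record:
    title: str
    due: date


def due_soon(records: list[Record], today: date, window_days: int = 14) -> list[Record]:
    """Return records due within the window, soonest first.

    Overdue records are included, since they are the ones most worth surfacing.
    """
    cutoff = today + timedelta(days=window_days)
    return sorted((r for r in records if r.due <= cutoff), key=lambda r: r.due)


if __name__ == "__main__":
    records = [
        Record("Passport renewal", date(2026, 6, 10)),
        Record("Home insurance renewal", date(2026, 9, 1)),
        Record("Appliance warranty expiry", date(2026, 5, 20)),
    ]
    for record in due_soon(records, today=date(2026, 6, 1)):
        print(record.due.isoformat(), record.title)
```

Everything here runs on the device against data that never leaves it; the local model's contribution in such a product would be upstream, in reading the documents that populate the records.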

This is directly material to a studio whose products are built on an on-device, privacy-first thesis, and it bears most obviously on the life-administration tool Kept, whose proposition is a private system of record that keeps a sensitive map of a person's possessions, obligations, and documents on their own device. For most of the cloud era, such a product faced the bind described at the outset, in which making the records intelligent meant surrendering their privacy, and the only safe option was to stay a passive store. The developer substrate announced at WWDC 2026 dissolves that bind for exactly the tasks such a product needs: the local reading of a receipt, the local answering of a question about what one owns or owes, and the local drafting of an administrative action for a person to approve. It does so without compromising the architectural privacy that is the product's reason for existing. The detailed mapping of these capabilities onto that product is the subject of separate applied work and is not repeated here; the point for this report is the more general one, that the shift it describes is what makes a privately intelligent system of record buildable at all, and that the same logic extends across a catalogue built to run locally rather than to harvest data. The broader strategic reading is that privacy-first software has spent years accepting a handicap that was really a property of where models ran, and that the handicap is now lifting for the workloads that matter most to it.

7. Open questions

Several questions the shift raises are genuinely unresolved, and intellectual honesty requires naming them rather than assuming their answers.

The first is whether the capability gap between local and frontier models continues to narrow or settles into a durable tier behind the cloud. The trajectory of small-model quality has been steep, but whether on-device models close the distance for harder tasks or remain a useful tier below the largest systems is unknown, and it determines how much a locally intelligent product can eventually attempt.

The second is whether users actually notice or value the distinction between on-device and cloud processing. The same body of consumer research that documents real privacy concern also documents a persistent willingness to trade privacy for convenience, and it is an open question whether the privacy advantages of local processing translate into adoption or whether users remain indifferent to where their data is handled so long as the result is convenient.

The third is whether the developer ecosystem adopts the local substrate at scale or defaults to cloud APIs for the extra capability they offer. The substrate makes private intelligence possible, but possibility is not adoption, and the pull of more capable cloud models is real, so the degree to which builders actually choose the local path is yet to be seen.

The fourth is whether the privacy guarantees hold up under sustained scrutiny, given that the real architecture is a hybrid and that the strongest guarantees apply only to the local subset of work, which leaves open how the cloud tier's assurances fare under independent examination over time.

The fifth is whether hardware gating produces a lasting two-tier population in which local intelligence is meaningfully better for owners of the newest devices, and what that means for products that aspire to serve everyone rather than only those on recent hardware.

8. Methodology and limitations

This report is an analytical synthesis rather than a work of primary measurement. It draws on Apple's own WWDC 2026 materials and developer sessions, on contemporaneous reporting of the conference from a range of technology outlets, and on a body of industry analysis and academic work concerning the broader move toward on-device inference, and it combines these with original argument about the significance of the shift.

Its claims carry the limitations of that material. Some of the Apple capabilities described had been announced and documented as of mid-2026 but had not yet reached general availability, so specifics should be verified against current platform documentation before being relied upon. The commercial and provenance details of Apple's arrangement with Google rest substantially on secondary reporting and on figures Apple has not confirmed, and are presented here as reported estimates rather than as established fact. The cross-vendor comparisons that situate Apple within the wider movement are directional rather than precise, drawn from analyst commentary and vendor materials that each carry their own framing and interest. The economic claims about cloud and on-device inference reflect a particular moment in a fast-moving market and should be read as characterising a direction rather than fixing a number. And the report's central analytical claim, that the capability-privacy trade-off has narrowed for a class of workloads to the point where a product no longer has to choose, is an argument about significance rather than a measured result, offered as the most defensible reading of the evidence rather than as a demonstrated fact.

Stated plainly, what is well established is that intelligence is migrating, partly and unevenly, from the data centre to the device, that this is a multi-vendor movement with clear technical and economic drivers rather than a single company's initiative, that Apple's WWDC 2026 announcements are its most prominent consumer expression and include a developer substrate that lets third-party software become intelligent locally, and that real limits in capability, context, hardware, and geography persist. What remains argument, and what this report offers as such, is the claim that this shift meaningfully changes what privacy-first software can promise, and that products built to run locally are the natural beneficiaries.

References

Apple, Apple introduces Siri AI, a profoundly more capable and personal assistant, Apple Newsroom, June 2026, https://www.apple.com/newsroom/2026/06/apple-introduces-siri-ai-a-profoundly-more-capable-and-personal-assistant/

Apple, WWDC26 iOS guide, Apple Developer, https://developer.apple.com/wwdc26/guides/ios/

Apple, What's new in the Foundation Models framework, WWDC26 session 241, https://developer.apple.com/videos/play/wwdc2026/241/

Apple, Discover new capabilities in the App Intents framework, WWDC26 session 345, https://developer.apple.com/videos/play/wwdc2026/345/

Apple, Explore advanced App Intents features for Siri and Apple Intelligence, WWDC26 session 343, https://developer.apple.com/videos/play/wwdc2026/343/

Apple, Secure your app: mitigate risks to agentic features, WWDC26 session 347, https://developer.apple.com/videos/play/wwdc2026/347/

TechRepublic, 10 Biggest Apple WWDC 2026 Announcements, https://www.techrepublic.com/article/news-11-biggest-announcements-apple-wwdc-2026/

Popular Science, Everything you need to know about Apple's 2026 WWDC keynote announcements, https://www.popsci.com/gear/apple-wwdc-announcements-2026/

CNBC, WWDC 2026: Apple makes its big Siri AI reveal, https://www.cnbc.com/2026/06/08/apple-wwdc-2026-live-updates.html

TechCrunch, WWDC 2026: Everything announced on Siri AI, iOS 27, Apple Intelligence, and more, https://techcrunch.com/2026/06/09/wwdc-2026-everything-announced-on-siri-ai-os-27-apple-intelligence-and-more/

InfoWorld, Edge AI: the future of AI inference is smarter local compute, January 2026, https://www.infoworld.com/article/4117620/edge-ai-the-future-of-ai-inference-is-smarter-local-compute.html

Semiconductor Engineering, The On-Device LLM Revolution, February 2026, https://semiengineering.com/the-on-device-llm-revolution/

MindStudio, On-Device AI vs Cloud AI: Why the Economics Are Shifting, https://www.mindstudio.ai/blog/on-device-ai-vs-cloud-ai-economics

Vikas Chandra (AI Research, Meta), On-Device LLMs: State of the Union, 2026, https://v-chandra.github.io/on-device-llms/

Edge AI and Vision Alliance (Qualcomm), AI Disruption is Driving Innovation in On-device Inference, https://www.edge-ai-vision.com/2025/02/ai-disruption-is-driving-innovation-in-on-device-inference/

Smaller, Smarter, Closer: The Edge of Collaborative Generative AI, arXiv, https://arxiv.org/pdf/2505.16499