#143: Artificial Intelligence est omnis divisa in partes tres
August 5, 2024: 18 min read -- maybe you can use AI to summarize?
[Generated by Gemini Advanced]
Introduction
The impact of artificial intelligence this year is hard to overstate: it has been the primary driver of global equity markets for some time, led by the nearly pure-play Nvidia, briefly a member of the exclusive $3 trillion market cap club. The sheer amount of effort going into AI has heightened the importance of all the ancillary businesses and fields — the demand for large data centers to run AI models has also catalyzed a new surge of interest in nuclear power, because of the enormous and increasing power demands of cutting-edge GPUs and CPUs. The importance of semiconductor technology, and wafer fabrication in particular, has elevated Taiwan from a mere cog in the tech supply chain to a nearly irreplaceable source of critical technology, and thus raised it from a pawn to a queen on the global political chessboard.
Over the last month or so I’ve been going down the rapidly expanding rabbit hole of the various implications of AI, and thought I’d try to condense and summarize some of my thoughts here — as the field is rapidly maturing and broadening, this note will partly be a summary of the field as I currently (mis)understand it, and partly an invitation to a variety of AI-centric conversations, since brainstorming is perhaps the fastest way of advancing one’s understanding of new technologies.
My previous AI post is here — and a (rather generic!) Google Gemini Advanced summary is below:
The article discusses Artificial Intelligence (AI), particularly focusing on Generative Pre-trained Transformers (GPTs) like ChatGPT. It explores the capabilities and limitations of GPTs, emphasizing their dependence on pre-existing data and inability for logical reasoning. The article delves into the potential impact of AI on various aspects of society, including employment, politics, and even existential risks. It highlights the rapid advancement and adoption of AI, while also raising concerns about its potential misuse and the need for regulation. Additionally, the article touches upon the development of AI in China and compares it with Western AI systems. Overall, the article provides a comprehensive overview of AI, its potential benefits and drawbacks, and the challenges it poses for society.
Proposing three tiers of AI
Since my last AI post, I’ve come to appreciate more of the subtlety of the AI world, and particularly the difference in the tiers – many people think there are just two (general and edge AI), but I hope to make a strong argument for (at least) three.
The differences between the tiers are significant enough that differentiating their dynamics and use cases greatly improves our comprehension of the AI universe. This is particularly true because, at the moment, the world is primarily focused on Tier 1, while the real promise of AI is arguably at the lower tiers, where the players and dynamics are fundamentally dissimilar.
In my mind, the tiers are divided by specific characteristics which determine the appropriate tradeoffs to make in implementation:
Tier 1 is characterized by generalized knowledge and problem solving – the unbounded promise of AI for the widest group of consumers, which includes multi-modality (speech recognition, images, etc) – therefore the tradeoffs are longer latency and less confidentiality.
Tier 2 is characterized by the need for confidentiality and specific tasks, in service of corporate entities for defined corporate meta-tasks – the tradeoffs are specific rather than general applicability, and server-level latency.
Tier 3 is characterized by the need for minimal latency and constant availability, for devices on the edge which need constant and immediate micro-decisions. The tradeoff is lack of general applicability.
Tier 1: Generalist: Hyperscaler Battle Royale
This tier, which dominates most of the current attention in the Gen-AI world, is characterized by:
Instrument: pre-trained multi-modal Large Language Models (LLMs) like ChatGPT / Claude / Gemini / Llama / Perplexity / etc. These generalist models are capable of answering an exceptionally wide variety of questions — as their complexity increases, their scope and answer quality have continued to improve by leaps and bounds. They are becoming multi-modal, i.e. incorporating speech recognition and multiple languages, as well as audio, image, and video capabilities.
Applicability: generalist — the ultra-large user bases of the hyperscalers want the broadest applicability, from answering high school algebra homework to transcribing and translating meeting notes
Training time and cost: in order to cover as wide a set of topics and capabilities as possible, these state of the art models usually have billions to trillions of parameters, requiring training times measured in months or years – hence the need for lots of the fastest processors in large AI server farms, which account for most of the billions being spent to advance AI.
Drivers: hyperscalers — Google, Microsoft, Meta, Amazon, Salesforce, et al. The drivers of this tier are companies with large client bases which aim to be the primary access point for the Internet and knowledge in general, because it’s clear that Google’s PageRank will no longer be the preferred user solution — the new Internet portals will be AI-powered because they will help users find solutions which are more relevant, more comprehensive, and in far less time.
Primary users: broad consumer bases, since the new offerings are an upgrade of the portal experience
Cost to users: although the industry started with, and still hopes for $20/user/month, this expectation is fading fast. Earlier generations of ChatGPT are available for consumers for free (embedded in the Microsoft Bing search engine, among others), and Meta has already enabled the (current state of the art) Llama 3.1 LLM on its Whatsapp messaging system.
Inference latency: faster is better, but immediate responses are not necessary, critical, or economic, so it makes sense to run most of these inquiries on the cloud rather than locally.
Privacy issues: currently marginal – the benefit to consumers appears to far outweigh their need for privacy. Outside of smarter search inquiries, AI-enabled assistants like Siri do not raise critical privacy issues. Conversely, personal assistants become more capable as they memorize more personal data, so they can reflect historical preferences and become ‘smarter.’
Current level of understanding: reasonably high, because early versions of these tools are already being used – most of the focus in AI is on the promise of generalized models for consumers (smart assistants, customer service chatbots, etc) rather than the other tiers.
The dynamics of this tier of investment and interest are a combination of the promise of the capabilities of the LLMs, and the strategic implications of this for hyperscalers. In comparison to Internet 1.0, where there was a high degree of fragmentation in terms of investment into the space, and quite a lot of the investment was coming from startups, this iteration is being led by the tech incumbents (they also happen to be the largest non-oil companies in the world) because this technology represents a credible existential threat or disruptive opportunity to their existing business models.
Put another way, ignoring the area of AI entirely, or even developing a clearly inferior product (AltaVista vs Google Search, for instance), is almost certainly a threat to their futures, both because the technology already appears to represent a dramatic leap forward, and because their hyperscaler competitors are investing heavily in the technology. In addition, most of the hyperscalers have an important advantage in terms of data access, which represents a significant barrier to entry by subscale entrants.
For instance, Microsoft currently generates roughly $49 bn annually from its Office 365 product, but if Google were to offer a superior AI-enabled integrated Workspace, this would represent an important threat to Microsoft’s near monopoly in this area.
Tier 2: Mid-level: Tuned for corporate applications
This tier is the all-important domain of corporate use, but I separate it from the general and edge tiers because I feel there are strong arguments why corporates would want to use something in the middle (should this be called mid-edge?). The argument against widespread corporate use of Tier 1 (hyperscaler) AI is predicated on the need for confidentiality; the argument against primarily using Edge AI is that deployment at the employee device level is unrealistic because of the sheer expense and lack of commensurate benefit, although a few corporate edge applications are sure to emerge (because of the latency issues discussed below).
My current argument for the bulk of corporate AI use is to fine-tune one of the LLM (or non-LLM) AIs – perhaps even a cheaper older generation – and deploy the model at the corporate cloud level, where it can be streamlined and optimized for specific applications while protecting corporate privacy and reducing AI deployment costs. Deploying a state-of-the-art Nvidia server farm would be uneconomic overkill for all but the largest firms.
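To make the fine-tuning argument concrete, here is a deliberately tiny sketch of the principle (the "model", the claim-approval task, and all the numbers are invented; a real deployment would fine-tune an actual LLM): keep the pre-trained layers frozen and train only a small task-specific head on corporate data, which is why fine-tuning costs a fraction of pre-training.

```python
# Stand-in for a pre-trained model: a frozen feature extractor.
# In a real Tier 2 deployment this would be an (older-generation) LLM.
def pretrained_features(x):
    return [x, x * x]                     # frozen; never updated

# Task-specific head: the only parameters fine-tuning touches.
w = [0.0, 0.0]
b = 0.0

def predict(x):
    f = pretrained_features(x)
    return w[0] * f[0] + w[1] * f[1] + b

# Tiny invented "corporate" dataset: approve (1.0) when the claim
# score exceeds 1.0, reject (0.0) otherwise.
data = [(x, 1.0 if x > 1.0 else 0.0) for x in [0.2, 0.5, 0.9, 1.1, 1.5, 2.0]]

lr = 0.05
for _ in range(3000):                     # fine-tune the head only
    for x, y in data:
        f = pretrained_features(x)
        err = predict(x) - y              # squared-error gradient
        w = [wi - lr * err * fi for wi, fi in zip(w, f)]
        b -= lr * err
```

In this toy run, only the two head weights and the bias ever change — the frozen "backbone" is untouched — yet the head learns to separate clearly-approvable from clearly-rejectable scores. That asymmetry (a handful of trained parameters sitting on top of a huge frozen model) is the economic core of the Tier 2 proposition.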
Instrument: fine-tuned LLMs and other (non-LLM) domain specific AIs, which might not be the latest versions; multi-modality will be driven by actual use cases
Applicability: specific to a wide spectrum of corporate applications
Training time and cost: fine-tuning takes far less time than pre-training an LLM because it is focused on very specific corporate tasks (evaluating an insurance claim, or a loan application, for instance), and is starting with a pre-trained generalist model. Corporates, with an eye on ROI, are likely to use far more economic older generation chips and simpler models if they can fulfill the specific missions; they are very unlikely to demand the level of compute power currently demanded by the hyperscalers.
Drivers: corporates who want to improve efficiency / throughput / customer experience; save costs / improve decision making / reduce waste etc.
Primary users: corporate employees, and customers
Costs to users: corporates will bear the costs for their employees, in order to increase productivity and/or functionality; for corporate clients, the (non-zero) cost will be a function of the usefulness of the product offering, and the competitive environment.
Inference latency: similar to Tier 1, faster is better but immediacy is mostly not required
Privacy issues: guarding proprietary corporate information is one of the most critical characteristics of this tier. One can imagine a worst case scenario where a general AI model was trained with one corporate’s data, which could enable competitors to query its detailed strategy and tactics. Instead, user queries can later be fed back to further fine-tune the proprietary model (and this doesn’t have to happen in real time).
Current level of understanding: low, because corporates have yet to deploy AI on a widespread basis – yet this tier, arguably more important than Tier 1 even if it is overlooked, is key to the future of AI. Like the Internet, AI cannot remain just a cool consumer tool; it has to penetrate deeply into corporate implementations.
One of the major disconnects in the AI world at the moment is that everyone seems exclusively focused on Tier 1, both because that is where all the sensational headlines are, and because nearly everyone can access those models (and go “WOW!”). But outside of the hyperscalers, there has been only modest deployment of AI at the corporate level as of yet, because of both privacy and cost issues. JP Morgan is making some tentative moves to give some of its employees AI tools while also respecting corporate confidentiality; some healthcare companies are using AI-powered diagnostic models. And yet this is where the promise of AI will eventually be borne out – corporates will decide to deploy AI after doing a proper cost-benefit analysis; this is not yet an existential issue for them (though in time it may become one, if their competitors can develop a significant AI-generated advantage).
Tier 3: Edge: Low latency specific devices
The dominant reason to drive AI to the edge is latency. I’m not counting pseudo-AI edge devices which query AI servers on the cloud (because cloud latency is higher, fragile, and highly variable). Perhaps the most accessible example here is a Level 5 self-driving car, using a pre-trained local inference engine created by a dedicated supercomputer like Tesla’s Dojo, built around the single task of driving and navigation. Latency issues (and network availability concerns) would make constant queries to any cloud unworkable and unsafe, but economics and size constraints would make it nearly impossible to deploy at the device level without eliminating anything which did not directly relate to the specific task (though a natural language interface might be useful, for example).
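The latency argument is easy to quantify with back-of-the-envelope numbers (the latency figures below are illustrative assumptions, not measurements): a vehicle covers a lot of road while waiting for a cloud round-trip.

```python
# Illustrative latency budgets (assumed, not measured): ~100 ms for a
# cloud round-trip vs ~10 ms for on-device inference.
def distance_during_latency(speed_kmh, latency_ms):
    """Meters a vehicle travels before a decision arrives."""
    speed_m_per_s = speed_kmh * 1000 / 3600
    return speed_m_per_s * latency_ms / 1000

cloud_blind = distance_during_latency(100, 100)  # ~2.8 m with no decision
edge_blind = distance_during_latency(100, 10)    # ~0.28 m
```

At highway speed, a cloud round-trip leaves the car effectively blind for nearly three meters per decision – and unlike on-device inference, that figure degrades unpredictably with network conditions.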
Wearable health devices might come in two flavors – expensive low latency AI devices for monitoring critical health conditions like blood sugar or heart arrhythmia, and cheaper cloud-enabled devices which upload non-critical personal data like steps walked or calories expended to the cloud for general health assessment. In a personal version of self-driving cars, I can also imagine AI-enabled exoskeletons like those made by Cyberdyne, which would not only enhance our physical capabilities, but also use AI and sensor / vision technology to prevent elderly people from falling by constantly calculating center of gravity, moments of inertia, probable coefficients of friction, etc. in real time. Generally this tier will be very device focused – one can imagine most vehicles eventually relying on edge AI-powered autopilots which will be connected to the cloud but which would be fully capable of operating even without connectivity.
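As an illustration of the kind of real-time micro-decision such an exoskeleton would make, here is a toy balance check (the segment masses, positions, and the 1-D simplification are all hypothetical): is the projected center of mass still inside the base of support?

```python
def center_of_mass(masses, positions):
    """1-D center of mass of body segments (kg, meters)."""
    return sum(m * x for m, x in zip(masses, positions)) / sum(masses)

def balanced(com_x, heel_x, toe_x, margin=0.02):
    """Stable if the center-of-mass projection stays inside the base
    of support (heel to toe), minus a small safety margin."""
    return heel_x + margin <= com_x <= toe_x - margin

# Hypothetical segment masses/positions for a wearer leaning forward:
# torso, legs, and a carried load, projected onto the walking axis.
com = center_of_mass([40, 25, 10], [0.05, 0.08, 0.20])
stable = balanced(com, heel_x=0.0, toe_x=0.25)
```

A real device would run a 3-D version of this check hundreds of times per second against live sensor data – exactly the constant, immediate micro-decision loop that rules out a cloud round-trip.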
Instrument: fine-tuned non-LLM domain specific AIs interfacing with edge sensors (IoT)
Applicability: very specific low latency tasks like level 5 self-driving cars
Training time and cost: probably less time than tier 1, more time than tier 2 — because many of these applications will be time critical, the AIs will need large data sets for training until they are certified, but since they will be domain specific, they will take less time than tier 1. Tesla’s Dojo facility to develop self-driving AI is purported to cost $500 mn, but that’s minor compared to what hyperscalers are spending.
Drivers: corporates like Tesla will embed edge AI for consumer applications like self-driving cars; the entire spectrum of land/sea/air/space vehicle guidance could eventually be powered by ultra low latency edge AI.
Primary users: products with embedded edge-AI solutions, the AI will be an integral part of the functionality of the product.
Inference latency: absolutely critical, so these AIs need to make near-optimal decisions without referring to the cloud (but the cloud might still be used to update the model or add context like traffic).
Privacy issues: not as critical as tier 2; and actually, the consumer may have no direct access to the data itself; the corporate might strategically retain all rights to the data in order to train future generations of AIs.
Current level of understanding: modest — the public is gradually becoming aware of Tesla’s self-driving solution, although it is probably less aware that the self-driving function is an actual real-world application of edge AI.
AI’s impact: confusion and debate
There has been much ado about the eventual impact of AI and whether the $400-600 bn raining down on the sector will eventually turn out to be money well spent or not. My thoughts are these:
The current hyperscaler battle is existential: so far I haven’t seen anyone characterize the hyperscaler Battle Royale over AI as an existential competition, rather than one about eventual ROI. In a variety of private conversations, however, the existential argument seems well accepted — and I can’t really explain the disconnect: from The Economist to the major investment banks, no one seems willing to go on record saying that the current LLM war is an existential struggle between the world’s largest tech companies, which will not end well for some of them.
I think the one way I think about it is when we go through a curve like this, the risk of under-investing is dramatically greater than the risk of over-investing for us here, even in scenarios where if it turns out that we are over-investing. – Sundar Pichai, Google
Sundar doesn’t explicitly say that AI is an existential risk / opportunity, but it’s heavily implied.
Within the hyperscalers, Google and Microsoft are somewhat hedged because even if they overspend on AI, their data pools are so vast, they can probably deploy much of that AI infrastructure internally. Amazon, by contrast, is building AI infrastructure capacity based on the promise of enormous future corporate customer needs for AI, which seems to have worried investors enough to drop their stock price nearly 9% on Friday (despite most segments beating expectations). Apple is staying out of the frenzy by partnering with OpenAI, and using AI functionality to drive their upgrade cycle (even though they could easily allow older devices to access the same functions), which is cost-efficient and will probably be annoyingly effective. Salesforce is deploying AI into its CRM universe, but it’s hard to say whether that’s a game changer or merely a rational defense against new AI-powered competitors.
The OpenAI declaration that they are developing an AI-powered search engine is a wooden stake aimed at the heart of Google. That said, with the abundant profits they are throwing off from their cash cows, the hyperscalers can well afford to fritter away a few hundred billion in pursuit of what is clearly a promising technology – but their earnings (and valuation multiples) will get downgraded and this could cause a derating of underperforming hyperscalers. [Actually, the downgrade appeared to happen this week, sadly before I could publish this article, which I’ve been writing for most of July.]
The major economic battlefield is for corporate applications (but it is just beginning): whether AI changes the world or not will be a function of how deeply it penetrates the corporate world, and whether it enables organic growth, or just cost reduction. The average corporate has to focus on P&L — at this early stage, they cannot commit a large chunk of their IT or R&D budget to AI without having a strong sense that it will eventually pay off (and the sooner the better) — not only is that argument hard to make, but the field of AI, happily sponsored by the hyperscalers, is moving so quickly that the high efficiency play is close monitoring, rather than a full commitment, until the benefits become more immediately tangible. What seems clear is that corporates will spend nowhere near what the hyperscalers are spending on AI (partly because of the next point).
Hyperscaler competition is driving prices down: the cutting edge LLMs initially introduced $20/month subscription models, but competition is in the process of bringing that closer to zero (Llama 3.1 is available for free on WhatsApp, for instance). The diversity of well-funded competition should mean that consumers will benefit from hyperscaler largesse, and given their functional similarities, monopolistic pricing (like Microsoft’s monopoly on MS-DOS/Windows, or Nvidia's AI ecosystem) is unlikely to be sustainable. For many specialized corporate applications, previous generation AIs/LLMs will be adequate, at a fraction of the cost of state of the art models.
Many of the AI startups will regret their valuations (or rather their investors will): the enthusiasm over AI is bubbling over into the startup space, where OpenAI, Anthropic, and others have quickly reached unprecedented valuations which will require perfect execution (or a silly acquirer) to justify, in a world where the revenue-generating ability of an LLM is far less than its development cost (because of intense competition).
The LLM development / price war is great for corporates, kind of: the more capable the models, and the less expensive they are, the higher the probability the corporate world will find uses for them, because the general driving force of the corporation is profitability. However, since I suspect a good deal of the corporate applications may not be LLMs (because they will be in very specific verticals), the buildout of those applications might be hindered because the cataract of LLM investment means less effort helping corporates build the focused tools where AI is likely to have the greatest impact. Additionally, corporate investment in AI might impact other spending in IT, so not all of the AI spend will be additive.
Final words
There’s a lot more to say, so I expect to revisit this topic periodically, both to update for the relentless progress of the field, as well as to correct for whatever misguided ideas I may have held.
My timing was close but a little late – the theme in financial markets this past week was very much “are the hyperscalers overspending?” – which drove their stock prices sharply downward. At least investors are starting to understand that this is an existential issue with a negative impact on future results, even if the analysts still don’t seem to get it. That said, I’m more optimistic about corporate applications of AI, even if corporates will spend far less on it than the hyperscalers. I think it will prove immensely valuable in certain specific areas, but more of a ‘nice-to-have’ in the vast majority of the corporate world, and there will be a stronger pattern of return on investment in the corporate Tier 2 than in the hyperscaler Tier 1.
For many applications of Tier 3, AI models will be at the heart of the product – able to react more optimally and far faster than any human, with no loss of concentration or other source of impairment. In the beginning, the AIs will have human supervisors, but as the edge cases are solved and become negligible, hopefully the ratio of supervisors to devices will diminish radically over time.
However, all is not roses and ambrosia – both the actual and perceived impact of AI on jobs, markets, economies, and politics will be volatile and perhaps violent – something I hope to tackle in a later issue. I believe the killer app of AI has already been identified, but it doesn’t reside at the corporate or consumer level – it is one which will pit elites against the masses.
Do favor me with your comments and criticism if at all possible, these are only my current thoughts, and I’m sure I still have much to absorb and learn.