Breakdown of the Generative AI Stack: Where are startups best positioned to win?

Daniel Applewhite
Apr 4, 2023

The companies that win in this space will be the ones with the greatest access to quality data and distribution.

Data is more valuable than ever, and “Big Tech” owns the majority of it. What does this mean for startups building in the space? For VCs that are investing? It means that exit opportunities will look different, and speed to market will be supremely important.

  • Hot Take #1: Startups in this space will see an increasing number of acquisitions instead of IPOs.
  • Hot Take #2: The barrier to creation and the cost to create will decrease significantly, especially in software engineering expenses. There will be less need for expansive engineering teams, demand for junior, entry-level engineers will decline, and the productivity of senior engineers will greatly increase.
  • Hot Take #3: Data repositories, and the companies that own them, will become more valuable as acquisition targets. Platforms like Reddit, Wikipedia, Tumblr, Medium, and Quora all contain massive amounts of user-generated data that can support the training of foundational models.

Generative AI: Where are companies building?

(source) Linus Lee

User-Agents:

  • Still at a nascent stage: general AI needs to improve before a stand-alone virtual assistant can replicate a human and direct user actions across the stack.
  • Includes: responding to emails, logging notes in Salesforce, triggering a Slack message to customer success, etc.

User-facing apps: Face extreme disruption from the incumbents listed above.

  • Well-known user-facing companies are using generative AI to provide customers with function- and/or sector-specific tools.
  • Examples of disruption:
      • Microsoft releases a Notion copycat with access to more data and greater distribution.
      • HubSpot or Salesforce could release a Jasper AI copycat with greater access to data and distribution.

Middleware: Stands to be the most open space for new startups to compete.

Middleware is the space where companies are building solutions to help apply foundational models to category-specific applications and to help tailor value creation to specific use cases. Two areas of Middleware benefit from the tailwinds of massive investments from Incumbents:

  • Adapt: Tools to adapt foundational models to specific use cases through support with prompt engineering, data management, and re-training generalist models on targeted data.
  • Deploy, Optimize, and Monitor: Tools to help companies test and improve model performance.

Platforms: Will be very difficult for new entrants to compete.

  • OpenAI and Google AI are at the forefront, with Anthropic and Hugging Face releasing their own models. I expect this space to be fairly consolidated and owned by these early entrants.
  • Anthropic: An AI safety and research company that’s working to build reliable, interpretable, and steerable AI systems.
  • Hugging Face: Develops tools for building applications using machine learning. Its Transformers library is built for natural language processing applications and its platform allows users to share machine learning models and datasets.

Where Startups can win: Middleware

(source) Foundation Capital does a great job of breaking down this market in detail.

Defining Middleware:

In the context of the Generative AI tech stack, middleware refers to software components that sit between the machine learning models and the end applications. It acts as a bridge between the two and enables them to communicate effectively. Middleware helps with tasks such as data preprocessing, model training, inference, and post-processing, among others. It also ensures that the machine learning models are integrated seamlessly into the end applications.
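As a rough illustration of the "bridge" role described above, the sketch below wraps a model call with a preprocessing step and a post-processing step. Everything here is an assumption made for illustration: `fake_model` is a hypothetical stand-in for a foundation model, and the function names do not correspond to any real product or API.

```python
from typing import Callable


def fake_model(prompt: str) -> str:
    """Hypothetical stand-in for a foundation model call (no real API is used)."""
    return f"  ECHO: {prompt}  "


def preprocess(user_input: str) -> str:
    """Collapse repeated whitespace before the text reaches the model."""
    return " ".join(user_input.split())


def postprocess(raw_output: str) -> str:
    """Trim the raw model output before handing it to the application."""
    return raw_output.strip()


def make_middleware(model: Callable[[str], str]) -> Callable[[str], str]:
    """Wrap a model so the application only deals with clean input and output."""
    def handle(user_input: str) -> str:
        return postprocess(model(preprocess(user_input)))
    return handle


respond = make_middleware(fake_model)
print(respond("  summarize   this   ticket "))
```

The application only ever calls `respond`, so the model behind it could be swapped without touching application code, which is the kind of separation middleware is meant to provide.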

Incumbents are also building in this space, but there are specific areas ripe for startups. Incumbents building middleware for generative AI include:

  1. Hugging Face — Transformers: This company is known for its Natural Language Processing (NLP) tools and has developed a middleware called Transformers. It provides a simple and unified API for using pre-trained models for various NLP tasks.
  2. OpenAI — GPT-3 API: OpenAI has developed a middleware called GPT-3 API, which enables developers to integrate their GPT-3 models into their applications easily.
  3. NVIDIA — TensorRT: NVIDIA has developed a middleware called TensorRT, which optimizes machine learning models for deployment on NVIDIA GPUs. It helps to improve the performance of the models and reduce inference time.
  4. Google & TensorFlow: Google has developed a middleware called TensorFlow, which is a popular open-source machine learning framework. It provides a high-level API for building and deploying machine learning models.

Adapt: Ripe for startups

Adapt refers to the portion of the generative AI stack that enables/supports adapting AI models to specific use cases.

Prompt engineering: Tools for optimizing UX and efficacy

  • Prompts become the tools/language required to extract value from the foundational models.
  • Much like understanding a code base, having access to good prompts, or the talent to create them, provides immediate value to companies leveraging generative AI to build and increase productivity.

Prompt Template and Marketplace Companies:

Prompt Management:

  • HoneyHive and PromptLayer are examples of companies offering businesses tools to track, iterate, and collaborate on prompts to increase effectiveness at a team level.

Prompt Chaining:

  • LangChain is an example of a company in this space. Tangential to prompt management, prompts can be linked/connected to produce different results, and this company offers an interface with pre-built chains and integration tools to get the most out of your prompts.
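The idea of linking prompts can be sketched in a few lines: each step fills a template with the previous step's output and sends it to the model. This is a minimal sketch under stated assumptions, not LangChain's actual interface; `fake_model`, `run_chain`, and the two templates are hypothetical names invented for this example.

```python
from typing import Callable, List


def fake_model(prompt: str) -> str:
    """Hypothetical stand-in for an LLM call; it only wraps the prompt in a tag."""
    return f"<answer to: {prompt}>"


def run_chain(templates: List[str], first_input: str,
              model: Callable[[str], str]) -> str:
    """Fill each template with the previous step's output, then call the model."""
    text = first_input
    for template in templates:
        prompt = template.format(input=text)
        text = model(prompt)
    return text


# Two linked prompts: the summary produced by step one feeds step two.
chain = [
    "Summarize this customer email: {input}",
    "Draft a polite reply based on this summary: {input}",
]
result = run_chain(chain, "My order arrived late.", fake_model)
print(result)
```

With a real model behind `model`, the same loop would let a team reorder or swap templates without changing the surrounding code.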

Data and embedding management: Increase the accuracy of semantic search

  • Connecting foundational models to external and proprietary knowledge bases can be done through vector databases or search APIs. This function will be supremely important as companies seek to leverage their access to proprietary data to differentiate the quality of semantic search.
  • LlamaIndex is an example of a company that allows users to connect external data to a centralized interface.
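To show roughly what a vector lookup does, the sketch below ranks documents by cosine similarity to a query. It is a toy under loud assumptions: `embed` counts words from a tiny hand-picked vocabulary instead of calling a learned embedding model, and the in-memory dictionary stands in for a vector database. None of the names come from LlamaIndex or any other product.

```python
import math
from typing import Dict, List, Tuple


def embed(text: str, vocabulary: List[str]) -> List[float]:
    """Toy embedding: count how often each vocabulary word appears in the text."""
    words = text.lower().split()
    return [float(words.count(term)) for term in vocabulary]


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity; returns 0.0 when either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def search(query: str, index: Dict[str, List[float]],
           vocabulary: List[str], top_k: int = 1) -> List[Tuple[float, str]]:
    """Return the top_k (score, document) pairs most similar to the query."""
    q = embed(query, vocabulary)
    scored = [(cosine(q, vec), doc) for doc, vec in index.items()]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]


VOCAB = ["refund", "invoice", "password", "login"]
DOCS = [
    "refund policy for a late invoice",
    "reset your password from the login page",
]
INDEX = {doc: embed(doc, VOCAB) for doc in DOCS}  # stand-in for a vector store

best = search("i forgot my password", INDEX, VOCAB)
print(best[0][1])
```

In practice the retrieved text would be passed to the foundation model as context, which is where proprietary data starts to differentiate the answers.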

Fine-tuning: Retrain generalist models on targeted data

  • These tools help alter the foundation model’s underlying parameters by retraining it on more targeted data sets.
  • This can be done with highly technical teams or through enterprise SaaS products.
  • With OpenAI, users have to pay a fixed cost to fine-tune their models. Startups like Humanloop and Vellum offer fine-tuning as part of a broader LLM development platform, with tools for data sample selection, data distribution mapping, and post-fine-tuning evaluation.
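Much of the practical work in fine-tuning is preparing the targeted data set. The sketch below turns labeled examples into JSON Lines, one training pair per line. The `prompt` and `completion` field names are an assumption chosen for illustration; each provider documents its own required format, and no fine-tuning API is called here.

```python
import json
from typing import Dict, List


def to_jsonl(examples: List[Dict[str, str]]) -> str:
    """Serialize prompt/completion pairs as JSON Lines, skipping empty pairs."""
    lines = []
    for example in examples:
        prompt = example["prompt"].strip()
        completion = example["completion"].strip()
        if not prompt or not completion:
            continue  # an incomplete pair would only add noise to training
        lines.append(json.dumps({"prompt": prompt, "completion": completion}))
    return "\n".join(lines)


# Hypothetical support-ticket examples; the last one is dropped as incomplete.
examples = [
    {"prompt": "Classify: 'Where is my order?'", "completion": "shipping"},
    {"prompt": "Classify: 'I was charged twice.'", "completion": "billing"},
    {"prompt": "   ", "completion": "billing"},
]
jsonl = to_jsonl(examples)
print(jsonl)
```

Tools for sample selection and post-fine-tuning evaluation, like those mentioned above, sit on either side of a step like this one.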

Deploy, Optimize, and Monitor:

  • Incumbents own this space. OpenAI and Google AI are both taking steps to deploy, optimize, and monitor the performance of their models.
  • The startup taking the lead in this space (at the time of this post) is Arize. Arize allows customers to optimize the performance of foundational models through a vertical ML Observability Platform.
  • My hypothesis is that this space will see a number of acquisitions from well-financed incumbents.
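At its simplest, monitoring a model means recording something about every call. The sketch below is a minimal illustration, assuming a hypothetical `fake_model` and a plain in-memory list as the log; it is not how Arize or any other observability product works internally.

```python
import time
from typing import Callable, Dict, List


def monitored(model: Callable[[str], str],
              log: List[Dict[str, float]]) -> Callable[[str], str]:
    """Wrap a model so each call records input size, output size, and latency."""
    def call(prompt: str) -> str:
        start = time.perf_counter()
        output = model(prompt)
        log.append({
            "prompt_chars": len(prompt),
            "output_chars": len(output),
            "latency_s": time.perf_counter() - start,
        })
        return output
    return call


def fake_model(prompt: str) -> str:
    """Hypothetical stand-in for a deployed model."""
    return prompt.upper()


calls: List[Dict[str, float]] = []
ask = monitored(fake_model, calls)
answer = ask("hello")
print(answer, calls[0]["prompt_chars"], calls[0]["output_chars"])
```

A production platform would add quality metrics, drift detection, and dashboards on top of records like these.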

As founders race to leverage Generative AI, Big Tech is Dominating these Spaces:

This section breaks down where Big Tech is best positioned to win, and where startups will have the most difficulty gaining meaningful market share at the user-facing application level.

Enterprise & B2B Productivity Suite: Ownership across Enterprise and SMB

  • Microsoft: Copilot across platforms, integration into LinkedIn for HR, and Copilot in GitHub. Microsoft’s deal with OpenAI and integration across Microsoft 365 give it a massive moat of information and distribution.
  • Google: Google recently announced the integration of Google AI across the Google Suite, and has long been a pioneer in the space. With troves of data from its search business, Google’s prowess in the advertising space will only continue to grow.
  • Salesforce: When it comes to proprietary datasets, few companies play a more integral role in the internal operations of a growing company. With the ability to aggregate data from various sources (including Slack and other sales tools), I expect Salesforce to increase the value and verticality of its existing product suite.

Creative:

  • Adobe: Adobe’s distribution to enterprise customers in this space is unmatched. The first product (Firefly) maintains the technical excellence that Photoshop provides without the learning curve.

Amazon: Targeting everything, but will own commerce

  • The greatest repository of data on commerce, and ultimate distribution.
  • AWS and Amazon’s expansion outside of commerce enable them to take a broad approach to market capture. “Our approach to generative AI is to invest and innovate across three layers of the generative AI stack to take this technology out of the realm of research and make it available to customers of any size and developers of all skill levels,” said Rob Ferguson, AWS’ Global Head of AI/ML Startups.

Future Questions and Challenges with Generative AI

As new technologies emerge, companies are racing to ensure ownership, but at what cost?

Copyright challenges:

  • With Google and OpenAI training their models on content from across the internet, there is a clear need to define ownership in this space and to create tools that protect and monetize data, so that it is used to train AI models only with permission from its owners.
  • Nvidia and Adobe: Both are taking approaches that pay royalties, setting a precedent for generative AI products that abide by current-day copyright laws.

Ethical challenges:

  • Is the recent petition for “Pausing the training of AI models” rooted in ethical concerns, or just a way for companies to catch up? Regardless, companies are racing forward to compete with Microsoft and OpenAI.
  • Recently, Google and DeepMind began working together to develop Gemini, software intended to compete with OpenAI. (source) In the past, the two entities have worked in parallel, but with speed and urgency more important than ever, the two have decided on a more collaborative approach.

The world is moving at a breakneck pace to leverage this technological breakthrough. As big tech companies continue to dominate the generative AI landscape, the future of AI innovation and its impact on society rests in the hands of those who are willing to challenge the status quo and push for a more equitable AI ecosystem.

This post is one of my many posts on the current state of Generative AI. For a more general take on the market, see “Generative AI Roadmap: Hype vs. Reality”.

Written by Daniel Applewhite

Investing @ Dorm Room Fund, Student @ Harvard Business School