One of our favorite parts of the job is when founders surprise us with ideas we've never thought about.
But, given the time we spend working on and thinking about AI products, we do have some thoughts on where important companies might emerge and what challenges they'll need to solve. To be clear, we don't think you have to work on any of these ideas to get into Embed or to pitch us. We share them here as a way to show our thinking about where opportunity lies and how we evaluate ideas. We'll continue to update this list as we formalize more of our thinking.
A wholesale shift is happening in datacenter workloads. The spotlight shines brightly on Nvidia’s stock price, supported by decades of investment in CUDA, a capacity-restricted supply chain, and an incredible pace of progress. Most would say that buying anything but Nvidia chips today is economically irrational.
New chip efforts are expensive and harrowing, and Nvidia is formidable, but money geysers invite competition. TPUs are clearly working for Google. Is the time right to bet the farm on transformers and dedicate more die area to matrix math? Can chips be reconfigured to reduce interconnect/memory bandwidth requirements? Can there be a real alternative to NVLink? What pockets of workload support a full-stack optimized systems company? What about a focus on latency-bound inference? What would it take to partner with AMD and make it more competitive?
The market, especially its large, sophisticated players, wants a second source.
Foundation models have become increasingly multi-modal; text and images were followed by a whole host of applications from signal processing to video generation. One class of outputs that has proven consistently challenging, however, is generated 3D models, specifically models with high enough fidelity to be used in non-toy applications. Meta’s July ’24 release of 3DGen, their asset creation pipeline that represents objects in view space, in volumetric space, and in texture space, is a step forward in quality. In adjacent research news, Sora surprised the world with videos that suggest some “learned” physics and object attributes from enough tokens.
However, the road ahead is long. Generated 3D assets are still not used in anger, and new interactions and applications based on these capabilities are still to come. As in most domains awaiting breakthroughs, lack of data is the problem. Research into multimodal learning (cleverly inferring 3D data from 2D) matters, as does efficient data generation. New entertainment, education and commerce applications will emerge out of 3D’s ChatGPT moment. Today we’re still in the “tiny 28x28 GANs” era of 3D.
Higher-precision end applications (e.g. construction, manufacturing) remain even further out of reach of zero-shot generation. These generated 3D assets have demanding precision requirements, must be manufacturable, and often have a complex optimization space. These processes are still difficult for experts, let alone for AI models that don't yet have a real understanding of “physics.” Point clouds? Easy. Meshes that make sense in the real world and can be used in an existing CAD pipeline? Harder.
The “high precision” problem remains worth tackling, for a couple of reasons: first, the majority of the revenue of Autodesk, one of the largest CAD players, comes from their engineering & construction division ($1.1 billion in 2021); second, we think it's possible to build assistive tooling that either helps engineers narrow down their design space more efficiently or performs some set of menial tasks for them as a starting point; finally, we're optimistic that combining generative models (e.g. NeRFs, DreamBooth) with work to "clean up" outputs, like simulation for validation and other post-processing, can reduce the burden on zero-shot model output.
Data cataloging and metadata collection have long been Sisyphean tasks. Though often necessary for compliance and efficiency, cataloging is a manual process that is painful for engineering and data teams to maintain, and so the catalog is always incomplete. If proprietary AI applications and models are to be used in anger, enterprises will use much more unstructured data, and will share much more data than ever before with outside vendors. This unstructured data is poorly supported by existing catalog solutions.
It is clear that the current generation of models can already classify data (both structured and unstructured), apply sophisticated policies against data, identify quality issues, and generate metadata automatically. To understand use and lineage of data, models can now understand application logs and interpret user trajectories. They can express that understanding in natural language. Extracting new context from end users can also now be implemented more intelligently, via embedded interfaces and chat. This all requires a new backend and new workflows.
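To make that concrete, here's a rough sketch of what automated metadata drafting could look like. The prompt, the `ColumnMetadata` schema, and the `call_llm` helper are our own illustrative assumptions, not any particular catalog product's API; a real system would validate outputs and route them to a human steward.

```python
import json
from dataclasses import dataclass
from typing import Callable

@dataclass
class ColumnMetadata:
    name: str
    semantic_type: str       # e.g. "email", "postal_code", "free_text"
    contains_pii: bool
    description: str
    suggested_policy: str    # e.g. "mask in non-prod environments"

def draft_column_metadata(
    column_name: str,
    sample_values: list[str],
    call_llm: Callable[[str], str],   # any function wrapping your model of choice
) -> ColumnMetadata:
    prompt = (
        "You are drafting data-catalog metadata for review by a data steward.\n"
        f"Column: {column_name}\n"
        f"Sample values: {sample_values[:20]}\n"
        "Return a JSON object with keys: semantic_type, contains_pii, "
        "description, suggested_policy."
    )
    fields = json.loads(call_llm(prompt))   # a robust version would validate this
    return ColumnMetadata(name=column_name, **fields)
```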
We strongly believe legacy data catalogs will be disrupted wholesale over the coming years. Their replacement will become a keystone in the AI data stack, and a massive enabler of enterprise AI adoption. We’d love to speak to teams who also see the opportunity.
Fragmented industries are characterized by subscale companies that are cheap to start but slow to build distribution in, that require more people to serve more customers, and that are sometimes capped by geography or by niches within a larger market. They often end up underinvested in technology.
Many technologists feel that some of the industries most ripe for disruption by AI (HOAs, BPOs, accounting, dev shops, legacy software) are unlikely to absorb AI tools quickly enough to satisfy their ambitions. Can you own and improve these businesses top-down, rather than sell into them?
Our ambition at Conviction is to see the powerful abundance of technology permeate every part of the economy — not just the easy-to-penetrate verticals. But this model raises many questions: how can one underwrite owning and operating these businesses? What DNA is required, and what are the limits to scale? How much value can AI-based automation drive? We’re optimistic. Many amazing companies (e.g. Palantir, Mandiant) offer a combination of software and people, and these AI-enabled centaurs might be a dominant model in the coming years.
If you've ever had the honor (read: misfortune) of being placed on an on-call rotation, you've experienced the dreaded 2 am PagerDuty alert. Still groggy, you pulled up a dashboard with a bunch of failing services, poorly written logs, and angry messages from your manager. If you were lucky, you had a pretty good idea of what the issue was and a previously written set of steps to resolve it, and you could be back in bed within the hour. If you were unlucky, you'd spend hours debugging in the middle of the night.
We think there's an opportunity for automated root cause analysis to substantially improve incident resolution time, and the experience for engineers. A lightweight "agent" with access to logs and metrics can, to start, retrieve relevant information (e.g. service statuses, past error logs, similar prior incidents) and suggest fixes based on previous resolutions. After an incident, the same agent could provide a "best practice" resolution for future incidents and generate a postmortem. Long term, agents might even be able to automatically fix common recurring issues.
While agents, broadly, still struggle with compounding failure, we think this area might be easier to tackle. For one, existing runbooks, which explain steps to resolve common issues, make bootstrapping much easier. More broadly, a lot of debugging is hypothesizing what might have gone wrong and looking for evidence. A basic information-only agent can already help substantially by testing hypotheses; by the time you roll out of bed, your debugging assistant can already tell you "it's not DNS," "all AWS AZs look good," and one day even "it's not a cascading cache invalidation."
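A minimal sketch of that information-only triage loop, assuming you wire the check functions to your own logging/metrics/status APIs (the checks below are placeholders, not any vendor's SDK):

```python
# The agent doesn't fix anything; it runs cheap checks to rule hypotheses
# in or out before a human arrives.
from dataclasses import dataclass
from typing import Callable

@dataclass
class Hypothesis:
    name: str
    check: Callable[[], bool]   # returns True if the hypothesis is ruled OUT

def triage(hypotheses: list[Hypothesis]) -> list[str]:
    findings = []
    for h in hypotheses:
        try:
            ruled_out = h.check()
            findings.append(f"{'RULED OUT' if ruled_out else 'SUSPECT'}: {h.name}")
        except Exception as exc:          # a failing check is itself a signal
            findings.append(f"CHECK FAILED: {h.name} ({exc})")
    return findings

# Example wiring (the check bodies would query your observability stack):
report = triage([
    Hypothesis("DNS resolution failure", check=lambda: True),
    Hypothesis("AWS AZ degradation", check=lambda: True),
    Hypothesis("Cache invalidation storm", check=lambda: False),
])
print("\n".join(report))
```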
Small businesses today list their products across a variety of different platforms (e.g. Amazon, eBay, Shopify). There is a host of consultancies and digital asset managers that help them manage their digital presence across all of these channels, some of which attempt to help SMBs understand their customers, optimize their advertising spend, and market and price their products more efficiently. There's a lot of potential improvement left on the table in the form of better product photography, descriptions, richer metadata generation, and more innovative ways to present products.
Foundation models enable the at-scale generation of alternative product listings and also therefore A/B testing of variants to improve conversion and potentially optimize for different audiences. With some improvement in technology, we believe it will also be possible to generate videos and other assets that convert better than anything manually generated today. By crawling marketplaces and experimenting with listings, a selling assistant might even be able to provide recommendations for new products to build or untapped audiences.
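As a rough sketch of the mechanics, assuming any text model behind a simple `generate` callable (the prompt and deterministic bucketing scheme are illustrative, not a recommendation for a specific platform):

```python
import hashlib
from typing import Callable

def make_variants(product_facts: str, n: int, generate: Callable[[str], str]) -> list[str]:
    # Each variant is grounded in the same product facts but asked to
    # emphasize a different buyer motivation.
    return [
        generate(
            "Write a product listing title and description for a marketplace.\n"
            f"Facts (do not invent new claims): {product_facts}\n"
            f"Tone/angle #{i + 1}: emphasize a different buyer motivation."
        )
        for i in range(n)
    ]

def assign_variant(visitor_id: str, n_variants: int) -> int:
    # Stable hash so the same visitor always sees the same variant,
    # letting per-variant conversion rates be compared fairly.
    digest = hashlib.sha256(visitor_id.encode()).hexdigest()
    return int(digest, 16) % n_variants
```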
Why isn't there a Copilot-like experience for the rest of your computing yet? Imagine a browser extension that learns your writing style and makes you 10x faster at the thankless slog of email (and anything else you have to author). The ideal experience would be deeply personalized based on everything you've ever written and could author an entire email with just a couple of words of context. A step beyond that might be using the multimodal capabilities of today’s models and all the context you have (screenshots? voice note fragments?) to become a second brain, immediately useful beyond just “recall.” Tinkerers have only just begun to experiment with what multimodality means for input mechanisms. We haven’t seen enough UX that is fundamentally new since multi-touch, but we’re confident more is coming.
While it might seem that incumbents have an unassailable distribution advantage here, Grammarly has shown you can build a large, independent business. Incumbents are also likely to be slow and cautious in launching this, creating space for new entrants. Clever consumer distribution must be part of the play, and speed and privacy will be key, likely necessitating a hybrid local/cloud approach.
Video is a major social, informational, educational, marketing and monitoring medium, and the fastest growing datatype. As video capture and generation inexorably advances, our archaic workflows for understanding and manipulating that video are exposed.
Most of the video in the world is not yet richly and flexibly indexed. As one example, the physical security monitoring industry remains stuck in the past. Organizations and consumers deploy millions of cameras, but the last generation of companies is still moving storage to the cloud and creating seamless networking gateways (a huge improvement), and penetration of sophisticated computer vision remains minimal. Only recently has public safety even embraced license-plate reading, with the rapid adoption of cool companies like Flock Safety.
Hardware and storage should be rethought from the ground up in the age of semantic video understanding, and powerful on-device models. A full-stack security services firm could see more, cost less, and offer a step-function better experience. Other startups will figure out where the valuable video in the world is, and now that feature extraction (prediction!) is incredibly cheap, sell that.
LLMs can now plan against objectives (poorly) and carry on engaging conversations. They can be Sensible, Specific, Interesting and Factual (SSIF). What would a game world populated by AIs who are *actually interesting* be like? We think having their own “lives,” interactions with other agents/humans, and hierarchical memories are obvious elements of “interestingness.” If “The Sims” and the engagement with AI girl/boyfriends, celebrities, and coaches are any indication, the right experience here will be wildly fun.
There’s been a strong focus on agents-for-productivity, but the next generation of entertainment is likely to be driven by social and personalized generation, where social interactions and rich environments are inputs for generations. If one likes to look at pictures of “cats where they shouldn't be,” let's generate them. If one wants an always-available, funny friend who can troll your friends AND can do things in the real world, let’s generate her. In an era where one can increasingly produce any media (images, audio, video, memes), mass personalization feels both clearly dominant and within reach.
The modern entertainment company hasn’t been built yet.
Where is the “Vanta” for Sarbanes-Oxley compliance? This is fundamentally a sophisticated reporting task on internal controls, financial processes and risk (costing millions of dollars per year for every public company). Controls and IT systems testing could be automated by agents that verify user access, segregation of duties, and adherence of both transactions and system configurations to natural language policies.
While the principles of independence, monitoring and executive responsibility should remain — the toil should disappear.
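One narrow control test an agent could run continuously, sketched below; the conflicting role pairs and data shapes are illustrative assumptions, not a SOX prescription:

```python
# Flag segregation-of-duties conflicts in role assignments.
CONFLICTING_ROLES = [
    ("create_vendor", "approve_payment"),
    ("post_journal_entry", "approve_journal_entry"),
    ("modify_payroll", "approve_payroll"),
]

def sod_violations(user_roles: dict[str, set[str]]) -> list[tuple[str, str, str]]:
    """Return (user, role_a, role_b) for every user holding a conflicting pair."""
    violations = []
    for user, roles in user_roles.items():
        for a, b in CONFLICTING_ROLES:
            if a in roles and b in roles:
                violations.append((user, a, b))
    return violations

print(sod_violations({
    "alice": {"create_vendor", "approve_payment"},   # flagged
    "bob": {"post_journal_entry"},                    # clean
}))
```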
Honey and Paribus were companies before their time — saving consumers money by automatically tracking prices and discounts, Honey via a Chrome extension and coupon search, Paribus via inbox scanning, monitoring of price changes, and automated emails to retailers.
Group buying startups in Asia such as Pinduoduo (拼多多) burst onto the scene around 2020, aggregating billions of dollars in demand but also spending huge amounts on supply chain infrastructure and operators, with little retention to show for it.
Agents on the web could continually look for savings, aggregate demand, negotiate deals, make decisions on behalf of buyers, and communicate within existing supply chain channels. A refresh of the commerce landscape is coming.
One thing that gets us to sit up and pay attention is a thoughtful challenge to the conventional wisdom that a market is just “stuck forever.” For example, “we can’t modernize mainframe applications and port them to the cloud,” or “we will never get off Epic EHR,” or “SAP is too ingrained into business processes to ever be replaced.” Never is a long time to be stuck.
Let’s scope down to mainframe apps. What makes them hard to modernize? Low-popularity languages like COBOL/Assembler (or even proprietary ones), code complexity, tight coupling and layered patches, dependency on outdated libraries, business logic mixed with mainframe-specific implementation, data formats that don’t map well to modern database structures, lack of documentation, heavy compliance requirements, and risk aversion in key systems.
Code generation, if we imagine infinite capacity for grunt engineering work, can help with each one of these tasks individually — extracting business logic, translation to modern languages, documentation generation, mapping of schemas. Can it help modernize mainframe apps end-to-end? That could be the key to unlocking the next $100B of cloud spend.
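A sketch of one slice of such a pipeline, assuming a generic code model behind an `ask_model` callable; the prompts are illustrative, and a real pipeline would need far more context (copybooks, JCL, data definitions) plus a compile-and-test gate:

```python
from typing import Callable

def modernize_paragraph(cobol_paragraph: str, ask_model: Callable[[str], str]) -> dict:
    # 1. Extract the business rules in plain English.
    logic = ask_model(
        "Summarize the business rules implemented by this COBOL paragraph, "
        "in plain English, as a numbered list:\n" + cobol_paragraph
    )
    # 2. Translate to a modern language, preserving behavior.
    java = ask_model(
        "Translate this COBOL paragraph to idiomatic Java. Preserve behavior "
        "exactly; do not add features:\n" + cobol_paragraph
    )
    # 3. Generate tests that pin the documented behavior down.
    tests = ask_model(
        "Write JUnit tests that pin down the behavior described here:\n" + logic
    )
    # A real pipeline would compile, run the tests against both versions,
    # and route failures back for another attempt or to a human reviewer.
    return {"business_logic": logic, "translated_code": java, "tests": tests}
```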
A big challenge in government services is accessibility. How easy is it to “navigate” the government systems required to get food/housing assistance, welfare, health insurance, social security, local services, consumer protection, permits, visas and immigration, and pay your taxes on time?
There is no simpler interface than natural language chat that takes context in many forms. Navigation tasks like the above will be within the realm of agent workflows, cleverly implemented. Many billions of dollars are wasted in administration, and it is a worthy challenge to make arcane government sites and unintelligible forms a thing of the past.
A paper from DeepMind last year trained a graph network that predicted more stable structures than the history of materials science research had previously discovered*. Another paper from folks at Berkeley and Lawrence Berkeley National Laboratory built an “autonomous laboratory” that proposes a novel compound, then synthesizes and characterizes it, before beginning the cycle again with an updated proposed structure. Together, the papers demonstrate a really interesting capability in academic settings.
Novel material proposal and synthesis have obvious applications, from reducing the use of toxic materials in battery synthesis to designing more energy-efficient methods to produce already commonly used components. Historically, this capability has been bespoke and owned by manufacturing and mechanical engineering groups within large companies; material and process optimization is so novel yet powerful that it may create a new market serving a diverse set of use cases across industries.
We’re excited to meet founders with deep domain expertise and with insights on what sets of markets might be good early targets.
*Notably, follow-up papers have argued that many of these structures are functionally equivalent to others already present in existing databases. Nonetheless, the result demonstrates interesting capabilities.
Models, however intelligent, still need access to live, reliable information. As much as the world’s knowledge can theoretically be encoded and made available in model weights, a huge amount of the inputs models need changes in real time: current pricing for concert tickets, whether an item is available in store, recent news events about a topic. Access to citable content both builds trust with users and reliably improves the correctness of model outputs.
However, current web content APIs lack the flexibility and feature set required to power large-scale web applications. Consider, for example, what technology would be required to build a clone of ChatGPT with web browsing. While many startups use SerpAPI (or one of its many competitors), there doesn't exist a web search API that has access to page content, parsed outlinks from the page, or even edit history. This set of features is clearly useful for more expansive language model applications, but would also be extremely helpful for many of the personal-assistant-style applications we can think of.
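To illustrate the gap, here's the kind of result shape we'd want such an API to return. This is a hypothetical interface of our own, not a real offering from SerpAPI or anyone else:

```python
from dataclasses import dataclass, field
from datetime import datetime

@dataclass
class PageSnapshot:
    fetched_at: datetime
    content_text: str            # full parsed page text, not just a snippet

@dataclass
class WebResult:
    url: str
    title: str
    snippet: str                 # where today's search APIs typically stop
    content: PageSnapshot        # parsed page content
    outlinks: list[str] = field(default_factory=list)          # parsed links out of the page
    history: list[PageSnapshot] = field(default_factory=list)  # prior crawls / edit history

def search(query: str, with_content: bool = True) -> list[WebResult]:
    """Hypothetical endpoint: search + fetch + parse in one call."""
    raise NotImplementedError("illustrative interface only")
```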
We know building crawlers at scale is really damn hard — which is also why we don’t think every agent company around is likely to do it themselves. Historically, the only companies with anywhere near complete and live updated crawls have either been search engines (Google, Bing, Yandex) or search engine adjacent (Amazon, Facebook). We think there’s an interesting opportunity to bootstrap an index, starting with focused crawls within specific verticals and leveraging signal from customers.
Businesses (small businesses in particular) fail to answer about half of the calls they receive, even though inbound calls are often their most important source of leads. Everyone has experienced this.
Use cases range from home services qualification to informational updates, from restaurant reservations to appointment-booking, from order tracking and stock checks to bill collection. These critical customer experiences are widespread, scoped and transactional. Voice generation quality and LLM capability are approaching the ability to handle many transactional calls. What’s missing is the last mile — distribution, customer journey design, guardrails and workflow automation.
Should this be developer infrastructure, horizontal SMB application, or a rethought full stack vertical solution? You tell us.
A high volume of HR events leads to end-user communication: new hires, exits, role changes, promotions, location changes, manager changes, and payroll/benefits changes. Large companies have hundreds of folks whose jobs are primarily to notify employees of these events, verify documents, answer questions, and update records in HRIS systems, often under the titles of HR Operations, Talent Support Operations, Talent Systems Coordinators, Employee Support Coordinators, Compliance Coordinators, and HR Service Desk.
Whatever the titles, we think these teams can be 10X more efficient — and deliver a dramatically better, faster employee experience. Over the past decade, companies have built “service catalogs” and “service request forms” to digitize their processes, but these still create too much manual operational burden.
The next “intranet” isn't a portal at all, but instead a conversational search box that can intelligently retrieve in-context, localized, access-control-aware answers from enterprise documentation and systems of record (and then accurately update those records). IT and HR processes are tightly intertwined, but HR is particularly poorly served, and the problem only gets harder for increasingly global/hybrid organizations.
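A minimal sketch of the access-control-aware retrieval step, with illustrative data shapes and a generic `answer` callable standing in for whatever model you use:

```python
# Filter candidate documents by the asking employee's entitlements
# *before* anything reaches the model.
from dataclasses import dataclass
from typing import Callable

@dataclass
class Doc:
    text: str
    allowed_groups: set[str]     # e.g. {"all_employees"}, {"hr_ops", "managers_de"}

def answer_hr_question(question: str, docs: list[Doc], user_groups: set[str],
                       answer: Callable[[str], str]) -> str:
    visible = [d for d in docs if d.allowed_groups & user_groups]
    context = "\n\n".join(d.text for d in visible[:10])
    return answer(
        "Answer using ONLY the context below; if it isn't covered, say so "
        "and point to the HR service desk.\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )
```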
A domain populated with process documentation, ever-changing compliance needs, complex policy application, forms, and natural language communication is ripe for attack by LLMs.