Conviction

One of our favorite parts of the jobs is when founders surprise us with ideas we've never thought about.

But, given the time we spend working on and thinking about AI product, we do have some thoughts for where we think important companies might lie and what challenges they'll need to solve. To be clear, we don't think you have to work on any of these ideas to get into Embed or to pitch us. We share them here as a way to show our thinking about where opportunity lies and how we evaluate ideas. We'll continue to update this list as we formalize more of our thinking.

Apply to Embed here!

Use the Recipe

The playbook is here, and access to state-of-the-art reasoning models is wide open. Pick a problem, give the model the same tools humans use to get real work done, and optimize for the outcomes you care about. This is scaling to work of surprising complexity. “AGI” is increasingly an operational challenge of bringing tasks under distribution - and many smart teams can do that, not just the labs. If you're a startup, the opportunity is to run with this recipe and own a domain before anyone else does.

Machine Verified Trades

The shortage of skilled tradespeople isn't from lack of demand, but the years it takes to make a novice productive. Electricians, HVAC techs, solar installers — these jobs require reasoning (mastering diagnosis, compliance, and documentation) as much as physical skill. Employers and countries have never had the patience for training. They want productivity now.

We think there's an opportunity for a full-stack services company that hires minimally trained workers and equips them with AI guidance systems. AR hardware is increasingly commodity. AI can recognize the work context, give step-by-step instructions, flag hazards, check code compliance in real time, and generates certification-grade documentation. Remote experts drop in only when AI confidence is low. Every job is logged and “machine-verified,” creating a new certification standard accepted by insurers and regulators.

Rather than selling tools into an industry slow to change, we think a startup would likely be full stack: hire and equip its own workforce, deliver turnkey projects to customers, and eventually license the platform to larger contractors. There are lots of good starting points (e.g. solar and battery installation). This company would have a mission of making American workers competitive, would train the next-generation workforce, and would fill a massive supply hole today. The real bottleneck is not muscle, but mastery.

The OfficeVerse

RL and multi-agent research has been dominated by games, simulations, constrained benchmarks, and outsourced teams trying to making copies of popular enterprise apps and websites. These environments fail to reflect the messy, ambiguous, and multi-channel communication that defines real-world “knowledge work.”

If we want agents that can reason, collaborate, negotiate, and execute complex objectives in enterprise settings, they need to be trained in worlds that feel like an office. This means multi-agent environments where communication happens across Slack-like channels, Zoom-style video calls, ad-hoc file sharing, project management tools, and evolving datasets — all wrapped in a simulation where every message, task, and decision checkpoint can be logged, replayed, and evaluated.

In the same way flight simulators transformed pilot training, “world sims” for office work will be the proving ground for generalist AI coworkers. These worlds can blend synthetic and real data, simulating entire organizations with realistic incentives, deadlines, and resource constraints. Agents can roleplay everything from software engineers and data analysts to sales reps and compliance officers, complete with the office politics, partial information, and shifting priorities that define human workplaces.

This startup could becomes the standardized environment where AI labs and startups train domain-generalist work agents, with configurable “company cultures,” communication styles, and task distributions. It could “test” rollout of agents. And, of course, it could serve as a training ground for first-party agents.

We've been pitched a lot of people creating RL environments/rewards to sell to the labs recently. That might be a path to revenue. But there's a more interesting general opportunity. Can one build a “sandbox” but a living, evolving simulated economy of work, with realistic temporal dynamics, unexpected events, and a rich API for injecting and extracting information? The next generation of AI coworkers will not graduate from toy problems. They will graduate from The OfficeVerse.

Reasoning Infrastructure for Companies

Large companies are under pressure for “AI transformation,” but their operations still reflect workflows designed for a pre-AI world. Opportunities are missed, AI-aware talent is too thin, pilots stay stuck in narrow lanes, and navigating the tangle of systems, teams, and compliance slows everything down.

We see room for a company that builds the two missing layers most deployments need. First, a live reasoning map of the enterprise — an ontology that continuously learns from systems, documents, and operational logs, mapping entities, workflows, dependencies, and policies. Second, an integration hub —that acts as the single registry and broker for every action the AI can take, whether via MCP, custom APIs, or the standards of tomorrow. The ontology decides what should happen; the hub makes it executable in production.

Instead of starting from zero in each engagement, we think this company starts in a vertical or ships templates: reusable data models, reasoning flows, and action connectors tuned for specific kinds of problems. These might include disruption recovery in supply chains, predictive maintenance for critical assets, or triage for high-risk compliance cases. Each template is adapted to the customer’s systems and policies, but the core stack stays the same — and improves with every deployment.

The bottleneck in enterprise AI isn’t model access, it’s distribution into the messy, high-stakes reality of big companies. The labs have a very good shot at owning the productivity suite and enterprise search. But what about all the other workflows? Being excellent at engaging with large companies, taking risk and owning companies, and at the core, owning the living ontology and integrations— is another path to becoming the operating layer for the AI-powered enterprise.

Vertical Robotics: Data In, Work Out

In robotics, almost every startup has failed. And usually not because of direct US competition. They fail when they run out of money after failing to compete with human labor. The main challenge has been delivering real value to customers.

There is ongoing debate about the “data cake” — whether world models from video, teleoperation/embodied collection, or reinforcement learning will dominate. We believe it is soon possible to deploy robots that can handle “useful-now” tasks with enough environment generalization from day one. If so, each deployment is both a source of revenue and a source of high-quality data.

Healthcare’s last-50-feet logistics is a clear example. Moving supplies, samples, and medications between rooms and departments is repetitive, time-sensitive, and labor-intensive. This is work where a robot’s consistency and attention to detail can shine. Indoor delivery robotics are already spreading quickly in China. Every run could produce valuable data on navigation, handling, and exception management in real hospital environments.

Leaders of full humanoid programs such as Tesla’s Optimus speak of a decade+ timelines. That may be accurate. But we hope to see American robots deployed well before then, capturing datasets, learning error recovery, and steadily improving. The way forward is to get deployed, own the workflows, and control the entire iteration cycle.

A Biological Needle in a Haystack

Roughly 2000 new drugs began a Phase 1 trial in 2024. We believe by the end of the decade, that number will be in the tens of thousands. Molecular design, which traditionally required years of design and relied on human scientific intuition, is rapidly being solved with new models like Chai-2 and Alphafold 3. Yet even as that step of the drug development process becomes less of a bottleneck, each new molecule still has to face the crushing reality of single digit success rates through clinical trials.

We believe there's a large commercial opportunity to develop new methods to “pull forward” clinical risk to pre-clinical settings. As the number of plausible pre-clinical candidates grows, cheaply approximating clinical risk allows biotechs and pharmaceutical companies to better prioritize and allows third parties to acquire mispriced assets. One example of this is Tahoe Therapeutics (a prior Embed participant!) who use the largest single-cell perturbation datasets to build a virtual model of the human cell that better predicts cancer drug performance. We think there are other approaches along these lines that move forms of clinical risk.

Enterprise Vibe Coding

Last year, we saw the rise of “vibe coding.” Tools like Bolt, Lovable, and Replit enabled non technical user to build basic applications that would have required hours from a full time software developer to build and maintain. Usage of these tools has extended far beyond traditional commerce use cases, including everything from wedding registries to little league baseball stat trackers, but has not yet made its way into enterprise use cases.

We think there will be an enterprise application builder that diverges from consumer-focused vibe coding applications. This new platform would have business-oriented features from day one, like access control management and enterprise integrations, and templates for commonly used workflows. We think adoption here can be rapid; the universe of useful enterprise apps is far larger than the traditional set of analytics dashboards.

IT can't fix what IT can't see

The system of record for technology is supposed to be the CMDB (configuration management database), which tracks IT assets, configs and dependencies. It's the basis for change management workflows, incident management, and compliance. Unfortunately, CMDBs suffer from incomplete, outdated, and inconsistent data. Manual updates lead to stale records, duplication, and poor dependency mapping, while naming inconsistencies further degrade accuracy. Surveys show 50% of CMDB data is unreliable, contributing to the more than 60% of IT incidents caused by misconfigurations. The core system for running our other corporate systems, is failing us.

IT should take a page out of the cloud security playbook. There exists an opportunity to use agentless discovery, graph construction, log comprehension, and AI reasoning as the basis for a more robust and proactive picture of the IT environment. Automated ticket resolution, the most exciting innovation in ITSM over the past half-decade, is only a band aid. It's time to go after the root cause. Fixing the CMDB is more ambitious than it sounds — it will reduce the risk and cost (and increase the speed) of updating technology in every enterprise.

Just Go Live

Most of the ideas on this page are descriptions of specific ideas for companies — this one is a little different in that it describes a pattern that seems increasingly true across companies, and partially inspired by the success of thinking models (most recently, Deepseek's R1).

The direction many forms of model improvement are moving is to enable iteration with feedback from environment. For a software engineer, this might be test suite results or compilation errors, for an accountant this might be feedback on a specific tax deduction, etc. A common opinion is that the incumbents in these spaces (e.g. existing IDEs or an accounting firm) should have an enormous advantage as a result of existing access to this data, and will therefore be able to outcompete startups, either on their own or as a partner to the handful of large AI labs.

In practice, we've been surprised by how often the dataset simply doesn't exist in a form that's useful. Even incumbent software products that have logging, didn't have it set up in quite the right way to be valuable for model training. Startups can compete. The most useful action is to just build the first thing that's useful and get into the customer environment — and then iterate. Each progressive release can happen as a “dry run” first, and companies have been able to build the dataset that's useful as they build the company.

As before, but now more than ever, the most important step is to just get live.

The New Datacenter

A wholesale shift is happening in datacenter workloads. The spotlight shines brightly on Nvidia's stock price, supported by decades of investment in CUDA, a capacity-restricted supply chain, and incredible pace of progress. Most would say that buying anything but Nvidia chips today is economically irrational.

New chip efforts are expensive and harrowing, Nvidia is formidable, but money geysers invite competition. TPUs are clearly working for Google. Is the time right to bet the farm on transformers, and focus more area on matrix math? Can chips be reconfigured to reduce interconnect/memory bandwidth requirements? Can there be a real alternative to NVLink? What pockets of workload support a full-stack optimized systems company? What about a focus on latency-bound inference? What would it take to partner with and make AMD more competitive?

The market, especially large, sophisticated players, want a second source.

Understanding and Generating the (3D) World

Foundation models have been increasingly multi-modal; text and images were followed by a whole host of applications from signal processing to video generation. One class of models that has appeared to be consistently challenging, however, is generated 3D models, specifically with high enough fidelity to be used in non-toy applications. Meta's July '24 release of 3DGen, their asset creation pipeline that represents objects in view space, in volumetric space, and in texture space, is a step forward in quality. In adjacent research news, SORA surprised the world with videos that suggest some “learned” physics and object attributes from enough tokens.

However, the road ahead is long. Generated 3D assets are still not used in anger, and new interactions and applications based on these capabilities are still to come. Like in most domains awaiting breakthroughs, lack of data is the problem. Research into multimodal learning (cleverly inferring 3D data from 2D) matters, as does efficient data generation. New entertainment, education and commerce applications will emerge out of 3D's ChatGPT moment. Today we're still in the “tiny 28x28 GANS” era of 3D.

Higher-precision end applications (i.e. construction, manufacturing, etc.) remain ever further out of reach of zero-shot generation. These generated 3D assets have demanding precision requirements, must be manufacturable, and often have a complex optimization space. These are processes that are still difficult for experts, much less AI models that still don't have a real understanding of "physics.” Point clouds, easy. Meshes that make sense in the real world and can be used in an existing CAD pipeline? Harder.

The “high precision” problem remains worth tackling, for a couple reasons: first, the majority of the revenue of AutoDesk, one of the largest CAD players, comes from their engineering & construction division ($1.1 billion in 2021); second, we think it's possible to build assistive tooling that either helps engineers narrow down their design space more efficiently or performs some set of menial tasks for them as a starting point; finally, we're optimistic that combining generative models (i.e. NeRFs, Dreambooth) with work to "clean up", like simulation for validation and other post-processing, can reduce the burden on zero-shot model output.

Finally, Manageable Metadata

Data cataloging and metadata collection have long been a Sisyphean task. Often necessary for compliance and efficiency, this is a manual process, painful for engineering and data teams to maintain, and therefore the catalog is always incomplete. If proprietary AI applications and models are to be used in anger, enterprises will use much more unstructured data, as well as share much more data than ever before with outside vendors. This unstructured data is poorly supported by existing catalog solutions.

It is clear that the current generation of models can already classify data (both structured and unstructured), apply sophisticated policies against data, identify quality issues, and generate metadata automatically. To understand use and lineage of data, models can now understand application logs and interpret user trajectories. They can express that understanding in natural language. Extracting new context from end users can also now be implemented more intelligently, via embedded interfaces and chat. This all requires a new backend and new workflows.

We strongly believe legacy data catalogs will be disrupted wholesale over the coming years. Their replacement will become a keystone in the AI data stack, and a massive enabled of enterprise AI adoption. We'd love to speak to teams who also see the opportunity.

Ownership is the New Sales

Fragmented industries are characterized by subscale companies that are cheap to start, but slow to build distribution in, require more people to serve more customers, are sometimes capped by geography or niches within a larger market. They often end up underinvested in technology.

Many technologists feel that some of the industries most ripe for disruption by AI (HOAs, BPOs, accounting, dev shops, legacy software) are unlikely to absorb AI tools quickly enough to satisfy their ambitions. Can you own and improve these businesses top-down, rather than sell into them?

Our ambition at Conviction is to see the powerful abundance of technology permeate every part of the economy — not just the easy-to-penetrate verticals. But this model raises many questions: how can one underwrite owning and operating these businesses? What DNA is required, and what are the limits to scale? How much value can AI-based automation drive? We're optimistic. Many amazing companies (e.g. Palantir, Mandiant) offer a combination of software and people, and these AI-enabled centaurs might be a dominant model in the coming years.

Automated Root Cause Analysis

If you've ever had the honor (read: misfortune) of being placed on on-call rotation, you've experienced the dreaded 2 am PagerDuty alert. Still groggy, you pulled up a dashboard with a bunch of failing services, poorly written logs, and angry messages from your manager. If you were lucky, you had a pretty good idea for what the issue was, with a previously written set of steps to resolve, you could be back to bed within the hour. Unlucky and you'd spend hours debugging in the middle of the night.

We think there's an opportunity for automated root cause analysis to substantially improve incident resolution time, and the experience for engineers. A lightweight "agent" with access to logs and metrics can, to start, retrieve relevant information (e.g. service statuses, past error logs, similar prior incidents) and suggest fixes based on what previous resolutions were. After an incident, the same agent could be used to provide a "best practice" resolution for future incidents. and generate a post mortem. Long term, agents might even be able to automatically fix common reoccurring issues.

While agents, broadly, still struggle with compounding failure, we think this area might be easier to tackle. For one, existing runbooks, which explain steps to resolve common issues, make bootstrapping much easier. More broadly, a lot of debugging is hypothesizing what might have gone wrong and looking for evidence. A basic information-only agent can already help substantially by testing hypotheses; by the time you roll out of bed, your debugging assistant can already tell you "it's not DNS," "all AWS AZs look good," and one day even "it's not a cascading cache invalidation."

Your Personal Seller

Small businesses today list their products across a variety of different platforms (e.g. Amazon, eBay, Shopify, etc). There exist a host of consultancies and digital asset managers today that help them manage their digital presence across all of these channels, some of which attempt to help SMBs understand their customers, optimize their advertising spend, market and price their products more efficiently. There's a lot of potential improvement left on the table in the form of product photography, descriptions, richer metadata generation, or more innovative ways to present products.

Foundation models enable the at-scale generation of alternative product listings and also therefore A/B testing of variants to improve conversion and potentially optimize for different audiences. With some improvement in technology, we believe it will also be possible to generate videos and other assets that convert better than anything manually generated today. By crawling marketplaces and experimenting with listings, a selling assistant might even be able to provide recommendations for new products to build or untapped audiences.

Next-Gen Autocomplete

Why isn't there a Copilot-like experience for the rest of your computing experience yet? A browser extension that learns your writing style and makes you 10x faster at the thankless slog of email (and anything else you have to author). The ideal experience would be deeply personalized based on everything you've ever written and could author an entire email with just a couple of words of context. A step beyond that might be using the multimodality capabilities of today's models and all the context you have (screenshots? voice note fragments?) to become a second brain, immediately useful beyond just “recall.” Tinkerers have only just begun to experiment with what multimodality means for input mechanisms. We haven't seen enough UX that is fundamentally new since multi-touch, but we're confident more is coming.

While it might seem that incumbents have an unassailable distribution advantage here, Grammarly has shown you can build a large, independent business. Incumbents are also likely to be slow and cautious in launching this, creating space for new entrants. Clever consumer distribution must be part of the play, and speed and privacy will be key, likely necessitating a hybrid local/cloud approach.

The All-Seeing Eye

Video is a major social, informational, educational, marketing and monitoring medium, and the fastest growing datatype. As video capture and generation inexorably advances, our archaic workflows for understanding and manipulating that video are exposed.

Most of the video in the world is not yet richly and flexibly indexed. As one example, the physical security monitoring industry remains stuck in the past. Organizations and consumers deploy millions of cameras, but the last generation of companies is still moving storage to the cloud and creating seamless networking gateways (a huge improvement), and penetration of sophisticated computer vision remains minimal. Only recently has public safety even adopted license-plate reading, with the rapid adoption of cool companies like Flock Safety.

Hardware and storage should be rethought from the ground up in the age of semantic video understanding, and powerful on-device models. A full-stack security services firm could see more, cost less, and offer a step-function better experience. Other startups will figure out where the valuable video in the world is, and now that feature extraction (prediction!) is incredibly cheap, sell that.

The (Smart, Social) Sims

LLMs can now plan against objectives (poorly) and carry on engaging conversations. They can be Sensible, Specific, Interesting and Factual (SSIF). What would a game world populated by AI's who are *actually interesting* be like? We think having their own “lives,” interactions with other agents/humans, and hierarchical memories, are obvious elements of “interestingness.” If “The Sims” and the engagement with AI girl/boyfriends, celebrities, and coaches are any indication, the right experience here will be wildly fun.

There's been a strong focus on agents-for-productivity, but the next generation of entertainment is likely to be driven by social and personalized generation, where social interactions and rich environments are inputs for generations. If one likes to look at pictures of “cats where they shouldn't be,” let's generate them. If one wants an-always available, funny friend who can troll your friends AND can do things in the real world, let's generate her. In an era where one can increasingly produce any media (images, audio, video, memes), mass personalization feels both clearly dominant and within reach.

The modern entertainment company hasn't been built yet.

Knock My SOX (Compliance) Off

Where is the “Vanta” for Sarbanes Oxley compliance? This is fundamentally a sophisticated reporting task on internal controls, financial processes and risk (costing millions of dollars/year for every public company). Controls and IT systems testing could be automated by agents that verify user access, segregation of duties, and adherence of both transactions and system configurations to natural language policies.

While the principles of independence, monitoring and executive responsibility should remain — the toil should disappear.

Save Us Money

Honey and Paribus were companies before their time — saving consumers money by automatically tracking prices and discounts, respectively via a chrome extension and coupon search, and inbox scanning, monitoring of pricing changes and automated emails to retailers.

Group buying startups in Asia such as Pinduoduo (拼多多) burst onto the scene around 2020, aggregating billions of demand but also spending huge amounts on supply chain infrastructure and operators with little retention to show for it.

Agents on the web could continually look for savings, aggregate demand, negotiate deals, make decisions on behalf of buyers, and communicate within existing supply chain channels. A refresh of the commerce landscape is coming.

Loosening the Legacy Grip

One thing that gets us to sit up and pay attention is a thoughtful challenge to the conventional wisdom that a market is just “stuck forever.” For example, “we can't modernize mainframe applications and port them to the cloud” or “we will never get off EPIC EHR,” or “SAP is too ingrained into business process to ever be replaced.” Never is a long time to be stuck.

Let's cope down to mainframe apps. What makes them hard to modernize? Low popularity languages like COBOL/Assembler (or even proprietary ones), code complexity, tight coupling and layered patches, dependency on outdated libraries, business logic mixed with mainframe specific implementation, data formats that don't map to modern database structures well, lack of documentation, heavy compliance requirements, risk aversion in key systems.

Code generation, if we imagine infinite capacity for grunt engineering work, can help with each on of these tasks individually — extracting business logic, translation to modern languages, documentation generation, mapping of schemas. Can it be help modernize mainframe apps end-to-end? That could be the key to unlocking the next $100B of cloud spend.

Accessible Government Services

A big challenge in government services is accessibility. How easy is it to “navigate” the government systems required to get food/housing assistance, welfare, health insurance, social security, local services, consumer protection, permits, visas and immigration, and pay your taxes on time?

There is no simpler interface than natural language chat that takes context in many forms. Navigation tasks like the above will be within the realm of agent workflows, cleverly implemented. Many billions of dollars are wasted in administration, and it is a worthy challenge to make arcane government sites and unintelligible forms a thing of the past.

Many, Many, Materials

A paper from DeepMind last year trained a graph network that predicted more stable structures than the history of materials science research had previously discovered*. Another paper from folks at Berkeley and the Lawrence Berkeley Labs built an “autonomous laboratory” that propose a novel compound, synthesize and characterize it, before beginning the cycle again with an updated proposed structure. Together, the papers demonstrate a really interesting capability in academic settings.

Novel material proposal and synthesis has obvious applications from reducing the use of toxic materials in battery synthesis to designing more energy efficient methods to produce already commonly used components. Historically, this capability has been bespoke and owned by manufacturing and mechanical engineering groups within large companies; this capability of material and process optimization is so novel yet powerful that it may create a new market serving the diverse set of use cases across industries.

We're excited to meet founders with deep domain expertise and with insights on what sets of markets might be good early targets.

*Notably, followup papers have argued many of these structures are functionally equivalent to others already present in existing databases. Nonetheless, the result demonstrates interesting capabilities.

Web Data APIs

Models, however intelligent, still need access to live, reliable information. As much as the world's knowledge can theoretically be encoded and made available in model weights, a huge amount of the inputs models need change in real time: current pricing for concert tickets, whether an item is available in store, recent news events about a topic. Access to citable content is both trust-building for users and reliably improves the correctness of model outputs as well.

However, current web content APIs lack the flexibility & feature set required to power large scale web applications. Consider, for example, what sets of technology would be required to build a clone of ChatGPT with web browsing. While many startups use SerpAPI (or one of it's many competitors), there doesn't exist a web search API that has access to page content, parsed outlinks from the page, or even edit history. This set of features is clearly useful for more expansive language model applications, but would also be extremely helpful for many of the personal assistant style applications we can think of.

We know building crawlers at scale is really damn hard — which is also why we don't think every agent company around is likely to do it themselves. Historically, the only companies with anywhere near complete and live updated crawls have either been search engines (Google, Bing, Yandex) or search engine adjacent (Amazon, Facebook). We think there's an interesting opportunity to bootstrap an index, starting with focused crawls within specific verticals and leveraging signal from customers.

Always Pick Up the Phone

Businesses (in particular small businesses) do not answer about half the calls they receive, but inbound calls are often their most important source of leads. Everyone has experienced this.

Use cases range from home services qualification to informational updates, from restaurant reservations to appointment-booking, from order tracking and stock checks to bill collection. These critical customer experiences are widespread, scoped and transactional. Voice generation quality and LLM capability are approaching the ability to handle many transactional calls. What's missing is the last mile — distribution, customer journey design, guardrails and workflow automation.

Should this be developer infrastructure, horizontal SMB application, or a rethought full stack vertical solution? You tell us.

Autonomous HR (and IT) Helpdesk

A high volume of HR events lead to end-user communication: new hires, exits, role changes, promotions, location changes, manager changes, and payroll/benefits changes. Large companies have hundreds of folks whose jobs are primarily to notify employees of these events, verify documents, answer questions, and update records in HRIS systems, often under the titles of HR Operations, Talent Support Operations, Talent Systems Coordinators, Employee Support Coordinators, Compliance Coordinators, and HR Service Desk.

Whatever the titles, we think these teams can be 10X more efficient — and deliver a dramatically better, faster employee experience. Over the past decade, companies have built “service catalogs” and “service request forms” to digitize their processes, but these still create too much manual operational burden.

The next “intranet” isn't a portal at all, but is instead a conversational search box that can intelligently retrieve in-context, localized, access-control aware answers from enterprise documentation and systems of record (and then, accurately updates those records). IT and HR processes are tightly intertwined, but HR is particularly poorly served, and ever harder for increasingly global/hybrid organizations.

A domain populated with process documentation, ever-changing compliance needs, complex policy application, forms, and natural language communication is ripe for attack by LLMs.

PLAUSIBLE SCHEMES

Quick Links