“Big corporations can’t rely on their internal speed to match the transformation that is happening in the world. As soon as I know a competitor has decided to build something itself, I know it has lost.”
These candid words from Sanofi’s CEO capture one of the most common questions at the forefront of every pharmaceutical company’s mind: whether to build or buy your way into the agentic and generative AI revolutions.
The build illusion: Why building AI feels easier than it is
In life sciences, many teams start with the same instinct. They see a capable large language model, stand up a proof of concept, and feel close to a breakthrough. For most of us, AI prototypes can look magical. A chatbot summarizes visit reports, drafts emails, or answers protocol questions in minutes. The experience is so strong that teams assume production is a short step away.
Unfortunately, the gap is much bigger than it looks.
A recent MIT study found that 95 percent of AI pilots fail, noting that “Only 5% of custom GenAI tools survive the pilot-to-production cliff, while generic chatbots hit 83% adoption for trivial tasks but stall the moment workflows demand context and customization.”
As MIT’s findings show, moving from prototype to production in clinical research means building something validated, compliant, scalable, and integrated into real workflows. That takes far more than clever prompts. It requires domain grounding, continuous monitoring, retraining loops, robust tool orchestration, and evidence that the system is safe and auditable under regulations like GxP, HIPAA, and 21 CFR Part 11.
Many organizations only discover the hidden costs after they have committed. Internal teams often invest for two years, spend millions in sunk cost, and still never reach a dependable clinical grade system. The illusion comes from how easy it is to get an early demo working, and how hard it is to make that demo survive contact with trial reality.
The reality check: Most pilots never scale
In March 2025, IDC reported that 88 percent of observed AI proofs of concept never reached wide-scale deployment: for every 33 pilots launched, only four graduated to production. That attrition rate is not because teams are careless. It is because scaling AI in enterprise settings, especially regulated ones, is fundamentally difficult.
Proofs of concept are still valuable. Experimentation gives teams hands-on insight into what works, where friction arises, and what scaling gaps emerge. It also helps organizations clarify which workflows are truly worth automating. But in clinical research, where stakes are high and competition is intense, learning by doing must not become a multi-year detour.
Seven reasons why clinical-grade agentic AI is hard to build
Teams attempting a do-it-yourself agentic AI stack face recurring challenges that compound over time.

1. Model limitations remain real. Large language models can reason and summarize, but they also hallucinate, obscure their logic, and force tradeoffs among accuracy, speed, and cost. In regulated environments, any unexplainable output can undermine trust quickly.
2. Integration fragility is a daily risk. Agents depend on external trial systems and data tools. Latency, unstable APIs, permissioning mismatches, or cascading errors can cripple performance at scale.
3. Coordination complexity grows with multi-agent workflows. Agents need shared context, synchronized states, and stable handoffs. Without this, different agents produce divergent outputs and reliability breaks down.
4. High-stakes autonomy raises the bar. Even small automation mistakes can affect patient safety or regulatory outcomes. Auditability must match that of human experts, including a clear rationale for each action.
5. Data precision and retrieval are non-negotiable. Trial data is fragmented across systems and often inconsistent. Every retrieval step must be validated against a trusted source, not a best guess. (A minimal sketch of what that validation can look like follows this list.)
6. The security and compliance burden is heavy. A clinical agent must meet GxP, HIPAA, and 21 CFR Part 11 standards so that every action is traceable, secure, and audit-ready. Building those layers internally is a major program, not a feature.
7. Integration complexity deserves its own spotlight. Trial environments rarely run on one system. They involve CTMS, eTMF, IRT, eCOA, EDC, safety systems, document repositories, and CRM tools. Building and validating connectors to each system, then maintaining them through version changes, is a long-tail cost that most internal projects underestimate.
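To make reason five concrete, here is a minimal sketch of retrieval validation, assuming a caller-supplied lookup into the system of record. Every name here is hypothetical; it illustrates the principle, not any particular platform’s API.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class RetrievedFact:
    """A piece of trial context retrieved for an agent, with provenance."""
    value: str
    source_system: str   # e.g. "EDC" or "CTMS" (illustrative labels)
    record_id: str

def validate_retrieval(
    fact: RetrievedFact,
    trusted_lookup: Callable[[str, str], str],
) -> RetrievedFact:
    """Cross-check a retrieved fact against the system of record before
    any agent is allowed to act on it."""
    authoritative = trusted_lookup(fact.source_system, fact.record_id)
    if authoritative != fact.value:
        raise ValueError(
            f"Retrieval mismatch for {fact.source_system}/{fact.record_id}: "
            f"retrieved {fact.value!r}, system of record has {authoritative!r}"
        )
    return fact
```

The point is structural: an agent never acts on retrieved context that has not been reconciled with an authoritative source.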
What it really takes to build, and what buying delivers
A practical way to view the decision is to compare timelines, effort, and risk across the lifecycle.
A DIY build typically requires over 24 months and multiple teams, including product, engineering, DevOps and QA, data science, security, and clinical operations. Each new agent type can add months. Each connector can take months more, especially when domain logic and validation are required. Additionally, DIY teams must build or integrate a policy engine, dynamic filtering, sanitization, audit log infrastructure, and verification workflows. This often becomes months of work before a single agent can run safely.
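As a rough illustration of that policy-and-audit layer, here is a minimal sketch of a pre-execution policy check with audit logging on every attempt. The roles, actions, and policy table are invented for illustration; real policy engines also account for geography, study phase, data classification, and delegation of authority.

```python
import json
import logging
from datetime import datetime, timezone

audit_log = logging.getLogger("agent.audit")

# A toy policy table mapping agent roles to actions they may take
# autonomously. All role and action names here are hypothetical.
POLICIES = {
    "cra_agent": {"summarize_visit_report", "draft_followup_email"},
}

def execute_with_policy(agent_role: str, action: str, payload: dict, handler):
    """Run an agent action only if policy allows it, and audit either way."""
    allowed = action in POLICIES.get(agent_role, set())
    audit_log.info(json.dumps({
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "agent_role": agent_role,
        "action": action,
        "allowed": allowed,
    }))
    if not allowed:
        raise PermissionError(f"{agent_role} is not permitted to perform {action}")
    return handler(payload)
```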
Internal builds must also figure out how to connect securely across a variety of systems, writing intermediary code to translate data between them. This proves challenging for even the most technical organizations.
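A single field mapping hints at why. The sketch below translates a hypothetical CTMS visit payload into a hypothetical eTMF filing record; every field name, status code, and date convention here is invented, but each real connector involves dozens of such mappings, all of which must be validated and then maintained through vendor version changes.

```python
from datetime import datetime

def ctms_visit_to_etmf_record(ctms_visit: dict) -> dict:
    """Translate a (hypothetical) CTMS visit payload into a (hypothetical)
    eTMF filing record."""
    return {
        "site_id": ctms_visit["siteNumber"],
        "visit_date": datetime.strptime(
            ctms_visit["visitDt"], "%m/%d/%Y"   # source uses US-style dates...
        ).strftime("%Y-%m-%d"),                 # ...target expects ISO 8601
        "document_type": "Monitoring Visit Report",
        "status": {"C": "final", "P": "draft"}[ctms_visit["statusCode"]],
    }
```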
Lastly, DIY teams must go beyond technical metrics. They need human feedback loops, drift detection, stress testing, and scenario validation. Many underinvest here until late. The net result is a long time to ROI: a DIY build typically takes about 24 months, with heavy internal lift and high experimentation spend.
Why partnering early pays off
Most organizations go through a build vs. buy phase. Some know immediately they want a vendor partner. Others prefer in-house experimentation first. Both paths can succeed, but they carry different risks and failure points, and can lead to substantially different timelines.
Experimentation builds understanding. Partnering early accelerates time to value, avoids costly missteps, and ensures compliance and integration from day one. In practice, many life sciences organizations look for an enterprise platform partner because clinical grade AI is not the same as general-purpose AI. A tool that boosts office productivity is not automatically safe to run a trial.
Clinical-grade requirements show up everywhere. AI must carry clinical context in every workflow. It must be validated against domain knowledge rather than improvised from public data. It must include built-in compliance, continuous auditability, and operational governance. McKinsey’s January 2025 report on scaling gen AI in life sciences recommends an ecosystem approach and emphasizes partnerships to move faster in a rapidly evolving landscape. The report also points out a practical bottleneck: prompt engineering becomes a specialized skill that blends deep regulatory expertise with technical discipline. Many organizations find this skill mix rare, expensive, and hard to sustain internally.
Who you partner with matters
Buying only works if the partner is the right one. In January 2025 McKinsey highlighted the capabilities that distinguish a clinical-grade partner from a generic AI vendor.
A credible partner brings hybrid skills across clinical domain knowledge and technical AI engineering. It offers your organization all of the following:
- A modular architecture and orchestration so organizations can reuse components and scale reliably.
- Integrated governance, risk, and compliance that’s established early, not as an afterthought.
- Help in solving data readiness and integration across fragmented systems.
- Solutions for real clinical and operational problems rather than “tech for tech’s sake.”
- Support for operating-model and change-management needs.
- Proven experience in GxP environments.
- A modular, open, and flexible framework so the organization can evolve without periodic rebuilds.
What to ask before you adopt
Once an organization leans toward buying or partnering, the next risk is choosing the wrong kind of partner. A short list of practical questions can quickly separate a clinical-grade provider from a generic AI vendor. Check out our previous blog on the six steps you should take when evaluating a digital, AI-enabled trial platform.
One useful starting question is about domain grounding. How does the system ensure that agents are anchored in the organization’s own SOPs, protocols, monitoring plans, and templates, rather than relying on public data or “best guess” completion? Clinical grade AI must show its sources and constraints, not just its fluency.
A second set of questions should focus on validation and auditability. What evidence exists that the agents behave safely in regulated workflows? How are decisions logged, explained, and retrievable for inspections? Can the platform produce a complete audit trail for every action an agent takes, including what data it accessed and why?
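To picture what “a complete audit trail for every action” implies, here is one illustrative shape for a per-action audit record and an inspection export. The fields mirror the questions above; none of this reflects any particular vendor’s schema.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class AgentAuditRecord:
    """One illustrative shape for a per-action audit entry: what ran,
    why it ran, what data it touched, and how it ended."""
    agent_id: str
    action: str
    rationale: str            # the agent's stated reason for acting
    data_accessed: list[str]  # record identifiers, not raw PHI
    outcome: str              # e.g. "completed", "escalated", "halted"
    timestamp: datetime = field(
        default_factory=lambda: datetime.now(timezone.utc)
    )

def inspection_export(records: list[AgentAuditRecord], study_id: str) -> list[dict]:
    """Flatten audit records into a retrievable trail for an inspection."""
    return [
        vars(r) | {"study_id": study_id, "timestamp": r.timestamp.isoformat()}
        for r in records
    ]
```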
Data access and retrieval deserve direct scrutiny. How does the platform handle fragmented trial data across CTMS, eTMF, IRT, eCOA, EDC, safety, and document systems? What retrieval validation or reconciliation steps prevent incorrect context from driving agent actions? If a data source changes, how is drift detected and addressed?
Integration questions often determine real-world success. Which systems are already supported out of the box, and which require custom work? Who maintains connectors over time, and how quickly are new system versions supported? How does the platform handle identity, delegated authority, and role-based access across multiple vendors without creating security gaps?
Governance and risk alignment should be explicit. What policy controls exist to enforce approved behavior in different workflows, geographies, or study types? How are human-in-the-loop reviews implemented, and how easy is it to override or roll back agent actions? Is there a built-in mechanism to halt automation when confidence drops?
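That last question is worth pausing on. A minimal sketch of a confidence gate looks like this, assuming caller-supplied execution and review hooks; the threshold is purely illustrative and would be workflow-specific in practice.

```python
def gate_agent_action(action, confidence, execute, request_review, floor=0.85):
    """Route an action to autonomous execution or human review.

    `execute` and `request_review` are caller-supplied callables;
    the default floor is purely illustrative.
    """
    if confidence < floor:
        # Halt automation: queue the action for human-in-the-loop approval.
        return request_review(action, confidence)
    return execute(action)
```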
Security and privacy should be tested at the same level as any other GxP-critical system. What certifications or external audits back the security model? How are PHI and other sensitive data protected during retrieval, prompting, and storage? What isolation and encryption guarantees exist, and what is the incident response model?
Finally, ask about flexibility and avoidance of lock-in. Is the architecture modular enough to swap models, tools, or workflows without a rewrite? Can the organization bring its own agents or components? Does the partner support open standards for data export, lineage tracking, and long-term portability?
These questions are not meant to slow a decision. They are meant to protect trial integrity while keeping the path to value short.
Addressing vendor lock-in
A common concern about buying is vendor lock-in, especially in workflows dependent on proprietary formats, connectors, and audit chains. Thankfully, modular design and open standards (like those found in Medable’s Agent Studio) enable interoperability and allow components to be plugged in or swapped out as needs change. By unifying fragmented systems through standardized orchestration, organizations gain flexibility rather than lose it. The aim is not to trap customers, but to let them scale on their terms.
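In code terms, the modularity argument reduces to components depending on interfaces rather than implementations. The sketch below is a generic illustration of that principle, not a description of Agent Studio’s internals.

```python
from typing import Protocol

class VisitSummarizer(Protocol):
    """Any component implementing this interface can be swapped in
    without touching the orchestration around it."""
    def summarize(self, report: str) -> str: ...

class VendorModel:
    def summarize(self, report: str) -> str:
        return report[:200]  # stand-in for a real model call

class InHouseModel:
    def summarize(self, report: str) -> str:
        return report.upper()  # stand-in for a different implementation

def run_workflow(model: VisitSummarizer, report: str) -> str:
    # The workflow depends only on the interface, not on either class.
    return model.summarize(report)
```

When orchestration is written against interfaces like this, swapping a model or a connector is a configuration change, not a rewrite.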
Phased and hybrid strategies
The decision does not need to be binary. Many life sciences organizations adopt hybrid strategies. They might begin with a ready-to-deploy CRA agent to capture quick value, then build custom agents for high differentiation areas over time, while keeping orchestration modular. Others may bring their own agents into the platform from the start. The key is to avoid rebuilding the same regulated infrastructure repeatedly.
The smart bottom line
The smartest organizations do not stop at “build or buy?” They ask “how fast can we scale responsibly?” In clinical research, DIY AI is slow, risky, and usually starts two years behind the curve.
A clinical grade platform like Medable’s Agent Studio is built for regulated trials, faster to deploy, and designed to keep R&D and clinical operations teams focused on breakthroughs while agents handle heavy execution.
Clinical grade AI is not just smart. It is validated, regulated, and trusted. When time to patient impact and time to ROI matter, that difference is everything.
Book a demo to see how Medable’s Agent Studio meets your needs, and how easy it is to bring to your organization.