20/02/2026

Evaluating startup promises in Public Innovation Procurement

By Aleksandra Gruszkiewicz, ECLIPSE Project

Public procurement is one of Europe’s strongest market-shaping instruments, yet when it comes to innovation, contracting authorities often default to large incumbents. This preference is understandable because incumbents offer track records, established delivery capacity, strong balance sheets, and predictable performance. At the same time, new entrants, especially startups, are frequently the source of more disruptive, high-impact innovation, precisely because they are less constrained by legacy systems, internal risk aversion, and the operational inertia that can accompany large organisations.

There is, however, a second side to the coin. Startups are more variable in execution capability, financial resilience, governance maturity, and their ability to scale under operational constraints. For public buyers, this variability is amplified by core procurement requirements such as transparency, equal treatment, proportionality, and the need to justify decisions ex post. In practice, even when a startup offers a superior solution, uncertainty about delivery, continuity, and compliance can lead procurers to avoid the perceived risk.

The challenge, then, is not whether to buy from incumbents or startups, but how to unlock the innovation potential of startups while keeping procurement decisions accountable, auditable, and defensible.

A decision-support score for procurers and investors

This is the logic that led us to develop a risk-flagging algorithm. It is not meant to be a selection instrument, nor a substitute for expert evaluation. Its purpose is to support public buyers and investors in comparing solutions in a consistent, auditable way by translating startup uncertainty into a structured maturity signal that can be interpreted reliably.

This approach is formalised through a venture-capital-inspired evaluation model that produces a single compound score: the Company Success Readiness Level (CSRL). CSRL ranges from 0 to 10 and is designed to estimate the likelihood of successful execution under real public-sector conditions, rather than innovation promise in abstract terms. The model assesses five pillars: technology readiness, team readiness, market attractiveness, business readiness, and financial stability.

This matters because delivery risk is rarely only technical.

Our scoring logic is deliberately not a simple average. Indicators are combined through multi-layered weighted logic with non-linear effects, meaning that structural weaknesses such as very low team capacity or weak market readiness cannot be “smoothed out” by one strong dimension. Put plainly, a brilliant technology does not compensate for an organisation that is not equipped to deliver it.
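To make the idea of a non-compensatory score concrete, here is a minimal sketch in Python. It is not the CSRL formula, whose weights and functional form are not disclosed in this article. The function name, the equal weights, the `floor` threshold, and the `headroom` cap are all assumptions chosen only to show how one weak pillar can limit the overall result.

```python
# Illustrative only: NOT the actual CSRL model. Weights, floor, and
# headroom are hypothetical values chosen for the example.
PILLARS = ("technology", "team", "market", "business", "financial")


def illustrative_score(
    pillars: dict[str, float],
    weights: dict[str, float],
    floor: float = 3.0,
    headroom: float = 2.0,
) -> float:
    """Weighted mean of pillar scores, capped when any pillar is very weak."""
    total_weight = sum(weights[p] for p in PILLARS)
    weighted_mean = sum(weights[p] * pillars[p] for p in PILLARS) / total_weight

    # Boundary effect: if the weakest pillar is below the floor, the overall
    # score cannot exceed that pillar's value plus a small headroom.
    weakest = min(pillars[p] for p in PILLARS)
    if weakest < floor:
        return min(weighted_mean, weakest + headroom)
    return weighted_mean


equal_weights = {p: 1.0 for p in PILLARS}
strong_tech_weak_team = {
    "technology": 9.5,
    "team": 1.5,
    "market": 7.0,
    "business": 7.0,
    "financial": 7.0,
}

# A simple average of these values would be 6.4; the cap limits it to 3.5.
capped = illustrative_score(strong_tech_weak_team, equal_weights)
```

In this toy example, the strong technology score raises the simple average to 6.4, but the very low team score holds the capped result at 3.5. That is the behaviour the paragraph above describes.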

It also matters because innovation tenders often attract very different types of solutions that address the same need in different ways. The categories into which these solutions fall carry inherently different execution risks. In practice, procurers are asked to compare these heterogeneous offers under one procedure, one set of deadlines, and one accountability regime. A structured readiness score helps make those differences legible. It does not force unlike-for-like comparison of the solution itself, but it does provide a consistent view of the supplier’s capacity to deliver under constraints, which is often the real source of procurement risk.

Because the assessment is self-reported, we designed the model to reduce strategic over-reporting. It includes a consistency verification layer that cross-checks correlated responses, such as traction claims relative to financial stability, to detect incoherent patterns and reduce the impact of biased inputs. It also applies boundary effects so that critical gaps materially constrain the overall result, which makes the score harder to inflate through isolated strong claims. Finally, the model is recalibrated using anonymised datasets to maintain robustness across sectors and geographies.
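A consistency check of this kind could be sketched as follows. This is an assumption-laden illustration, not the verification layer used in the model. The 0 to 10 response scale, the `max_gap` threshold, and the second indicator pair are hypothetical; only the pairing of traction with financial stability comes from the text above.

```python
# Illustrative only: the real cross-checks and thresholds are not published.
# Pairs of self-reported indicators that would be expected to move together.
CORRELATED_PAIRS = [
    ("traction", "financial_stability"),  # pairing mentioned in the article
    ("team_size", "delivery_capacity"),   # hypothetical additional pair
]


def flag_inconsistencies(
    responses: dict[str, float],
    pairs: list[tuple[str, str]] = CORRELATED_PAIRS,
    max_gap: float = 4.0,
) -> list[tuple[str, str, float]]:
    """Return the indicator pairs whose values diverge by more than max_gap."""
    flags = []
    for first, second in pairs:
        gap = abs(responses[first] - responses[second])
        if gap > max_gap:
            flags.append((first, second, gap))
    return flags


example = {
    "traction": 9.0,             # very strong traction claimed
    "financial_stability": 2.0,  # but weak reported finances
    "team_size": 6.0,
    "delivery_capacity": 5.0,
}
flags = flag_inconsistencies(example)
```

Here only the traction and financial stability pair is flagged, because the two answers differ by 7 points on the assumed scale. A flagged pair would prompt a closer look at the inputs; it would not by itself prove over-reporting.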

The ECLIPSE pilot dataset

To test the model at scale, we used 905 records, corresponding to registered users who initiated the assessment. Of these, 103 records were incomplete and were excluded, leaving 802 completed assessments admitted to analysis. Every one of these 802 companies received a CSRL score between 0 and 10.

The cohort is predominantly Western European, with the largest country representations from Denmark, Austria, Germany, Switzerland, and the Netherlands. The sector mix is broad but concentrated in Software domains (System and Application), where innovation procurement is increasingly active.

Translating CSRL into procurement-relevant risk language

CSRL runs from 0 to 10, with higher values indicating higher delivery readiness and lower execution risk. To make the signal easier to read across stakeholder groups, we translate CSRL into an investment-grade style scale from AAA to D. This is not a credit rating, and it is not a funding recommendation. It is a shared “risk language” that helps procurement teams, auditors, and market actors talk about maturity in a comparable way.

In this mapping, a CSRL between 0 and 1 corresponds to grade D, reflecting an extremely fragile profile and the highest risk. Scores from 1.1 to 4.0 span C to CCC, indicating very high-risk profiles where critical weaknesses are likely to prevent reliable delivery. Scores from 4.1 to 7.0 map to B through BBB, signaling progressively stronger maturity and lower execution risk. Scores from 7.1 to 10 translate into A to AAA, representing the strongest readiness profiles and the lowest risk within this framework. Investors often describe their risk appetite in similar “readiness bands”.
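The mapping can be expressed as a small lookup, sketched below in Python. The article confirms the boundaries for D (0 to 1), BBB (from 6.1), A (7.1 to 8.0), AA (8.1 to 9.0), and AAA (9.1 and above). The sketch assumes that the remaining grades (C, CC, CCC, B, BB) are also one point wide each, and that scores are reported to one decimal place. The function name is hypothetical.

```python
import math

# Ten grades, each assumed to cover one CSRL point:
# D: 0-1.0, C: 1.1-2.0, CC: 2.1-3.0, CCC: 3.1-4.0, B: 4.1-5.0,
# BB: 5.1-6.0, BBB: 6.1-7.0, A: 7.1-8.0, AA: 8.1-9.0, AAA: 9.1-10.
# The C/CC/CCC and B/BB splits are an assumption, not stated in the article.
GRADES = ["D", "C", "CC", "CCC", "B", "BB", "BBB", "A", "AA", "AAA"]


def csrl_to_grade(csrl: float) -> str:
    """Translate a CSRL score (0 to 10) into an AAA-to-D style grade."""
    if not 0.0 <= csrl <= 10.0:
        raise ValueError("CSRL must be between 0 and 10")
    score = round(csrl, 1)  # assumed reporting precision: one decimal
    # Upper bounds are inclusive, so 7.0 is still BBB and 7.1 is A.
    index = max(0, math.ceil(score) - 1)
    return GRADES[index]


sample_grades = [csrl_to_grade(s) for s in (0.0, 1.0, 4.0, 6.1, 7.0, 7.1, 9.1, 10.0)]
```

Treating each upper bound as inclusive matches the way the ranges are written in the text, for example 7.0 falling in the B to BBB group and 7.1 in the A to AAA group.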

How should a public procurer interpret the score in practice?

A high CSRL, and a corresponding investment-grade band of A to AAA, does not mean “award the contract.” It means that, based on the evidence provided, the supplier appears comparatively mature and may require less intensive supervision to deliver. In procurement terms, this aligns with later-stage adoption pathways and larger-scale deployment decisions, where the cost of non-delivery is high and tolerance for execution volatility is low.

Mid-range scores, particularly in the B to BBB range, often correspond to promising innovators whose risk profile can be made manageable through procurement design, especially under pre-commercial procurement logic. In these cases, the score is not a verdict on the innovation. It is a signal that the contracting authority must do more of the risk-management work through the procedure and the contract, because the supplier is still building delivery maturity. This is precisely where phased delivery, evidence-based milestones, and controlled test environments are what make the procurement defensible.

Low scores do not mean “bad innovation.” They frequently signal early-stage fragility and high execution risk. For operational deployment procurement, these profiles are typically too exposed. For exploratory instruments, they can still be relevant, but only if the buyer is explicit that the objective is learning rather than delivery, and that progression depends on the supplier producing verifiable evidence that de-risks execution.

The essential point is that CSRL is not a selection method. It is a risk and potential flag that helps align procurement pathways and contractual safeguards with supplier maturity.

A test bed for PCP: Smarter Italy

To show how this can work in practice, we analysed AgID’s Smarter Italy pre-commercial procurement (PCP) on the “enhancement of cultural tourism destinations.” The procurement is explicitly framed as a PCP, signaling a clear intention: the buyer is not purchasing a finished operational product from day one; it is funding competitive research and development to reduce uncertainty in a structured way. Phase I anticipates multiple winners, preserving competition and learning early, and uses evidence to determine who progresses later.

That architecture is exactly the environment where a readiness score becomes procurement-relevant. In a PCP, the central question is not whether the market can generate ideas. The central question is how to fund promising approaches while keeping delivery risk and public exposure proportionate as the procurement moves closer to real-world conditions.

If the 802 assessed companies had been the applicant pool for a Smarter Italy-type PCP, the most defensible use of CSRL would have been as a phase gate, not as a final award decision. The Phase I gate can then be tuned to the contracting authority’s risk appetite, because feasibility and early prototyping can mean very different things in practice. For some buyers, Phase I is an exploration stage with lower consequences. For others, it is a short runway to a near-operational pilot with real political and service-delivery expectations.

A pragmatic default for Phase I is to set the eligibility band at BBB to AAA, meaning CSRL 6.1 and above. In our pilot dataset, 526 of 802 companies meet this threshold. This preserves strong competition while filtering out the highest-risk tail where execution fragility is most likely to dominate. Under this rule, 276 companies would not progress to Phase I evaluation, not because their ideas lack promise, but because the probability of reliable delivery appears structurally weak even in a staged R&D setting.

Alternatively, if the contracting authority’s risk appetite is lower, because timelines are tight, the use case is safety-critical, integration is complex, or the buyer expects rapid progression toward real-world conditions, Phase I can be restricted to the A to AAA band, meaning CSRL 7.1 and above. In the same dataset, 363 of 802 companies fall into this higher-readiness cohort.

This is more selective, but it makes the pipeline more execution-forward from the start and reduces the supervision burden on the contracting authority, while remaining defensible as a proportional response to higher consequences of non-delivery.

Phase II is where the logic tightens exactly as PCP intends. At this point, the procedure moves from exploration toward stronger proof and higher exposure. The right procurement move is to reassess the Phase I winners because new evidence now exists: performance against milestones, responsiveness, delivery discipline, coherence of claims, team stability, and the ability to run controlled tests. The recommendation then becomes to invite only suppliers in the A to AAA band, meaning CSRL 7.1 and above. In our dataset, 248 companies scored between 7.1 and 8.0 (A), 111 scored between 8.1 and 9.0 (AA) and only 4 scored 9.1 or above (AAA).

This approach does two things public buyers consistently need:

First, it operationalises proportionality across phases: the procurement begins inclusive but not reckless, then becomes more selective as the consequences of failure increase.
Second, it strengthens defensibility: if down-selection is challenged, the contracting authority can show a coherent rationale for why the pathway tightened, grounded in a consistent readiness signal aligned with the objectives of each PCP stage.
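The cohort sizes quoted for each gate can be reconciled with a few lines of arithmetic, sketched here in Python. All input figures come from the pilot dataset described above. The BBB-only count is not stated in the text; it is derived here by subtraction. The variable and function names are illustrative.

```python
# Figures reported for the ECLIPSE pilot dataset.
TOTAL_ASSESSED = 802
BBB_AND_ABOVE = 526                           # default Phase I gate (CSRL 6.1+)
BAND_COUNTS = {"A": 248, "AA": 111, "AAA": 4}  # CSRL 7.1 and above

# Stricter gate: A to AAA only.
a_and_above = sum(BAND_COUNTS.values())

# Derived, not reported in the article: companies in the BBB band alone.
bbb_only = BBB_AND_ABOVE - a_and_above

# Companies filtered out by the default Phase I gate.
below_bbb = TOTAL_ASSESSED - BBB_AND_ABOVE


def share(count: int) -> float:
    """Percentage of the assessed cohort, rounded to one decimal."""
    return round(100 * count / TOTAL_ASSESSED, 1)


print(f"Default gate (BBB+): {BBB_AND_ABOVE} companies, {share(BBB_AND_ABOVE)}%")
print(f"Strict gate (A+):    {a_and_above} companies, {share(a_and_above)}%")
print(f"Filtered out:        {below_bbb} companies, {share(below_bbb)}%")
```

The three band counts sum to 363, which matches the A to AAA cohort quoted earlier. The default gate admits roughly two thirds of the cohort, and the strict gate somewhat under half.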

Where this becomes transformative: the bridge to investors

PCP has an underappreciated side effect. When it is well designed, it does not only buy prototypes; it produces credible evidence of execution under constraints. That evidence changes the quality of the conversation around startups, both inside procurement teams and outside them.

Innovation programmes often hit the same bottleneck after prototyping. A supplier can demonstrate a solution, yet scaling requires capital, partnerships, and operational reinforcement. This is where a Phase II shortlist in the A–AAA band becomes strategically important. It is not “safe,” because nothing in innovation is safe, but it is demonstrably closer to delivery readiness in a language that both procurement and investment communities recognize.

In practice, this means that public buyers can use the PCP process, supported by a structured readiness signal, to turn innovation procurement into an evidence-building mechanism. When a contracting authority can show that a supplier reached a higher-readiness band, supported by self-reporting and by observed delivery evidence generated during the PCP itself, conversations with investors become easier, faster, and more concrete. The procurement file becomes not just defensible; it becomes catalytic for an investor-procurement participatory model.

Conclusion: a practical test bed for public procurers

Public procurers can apply the same discipline investors use, with an important nuance. When the procurement pathway is intended for operational deployment, prioritising applicants with higher maturity signals is a defensible strategy because performance risk is not primarily about the idea, but about the supplier’s capacity to deliver reliably under public-sector constraints. When lower maturity applicants are selected, CSRL does not argue against the decision. Instead, it signals that procurement design must do more work by structuring phased delivery, measurable milestones, controlled testing environments, and proportional risk sharing.

Used this way, CSRL does not pick winners. It makes risk legible across very different solution types, strengthens defensibility, and helps innovation procurement become what it promises to be: a bridge between the public sector’s demand for innovation and the market’s capacity to deliver it.

The ECLIPSE Project

ECLIPSE is a Horizon Europe project that builds bridges between public buyers, solution providers, and investors. Under this workstream, we translate evidence-based, venture-inspired assessment logic into practical tools that can be applied in real pre-commercial procurement and public procurement of innovation contexts. When used strategically, innovation procurement can accelerate the green transition, strengthen digital sovereignty, and improve public services at scale.