Designing for Secure Enclave Collaboration

Supporting Trust in AI Evaluations

Problem Context

As AI systems increasingly shape public life, national security, and critical infrastructure, independent evaluation has become a core governance requirement. Meaningful evaluation often requires access to highly sensitive assets: proprietary models, unreleased datasets, or internal system behavior. The organizations best positioned to conduct oversight are frequently those least able to access these materials.

My early work leading OpenMined’s privacy infrastructure platform PyGrid explored how privacy-enhancing technologies (PETs) could support cross-institutional research without sharing raw data. That work primarily focused on output privacy, particularly Differential Privacy (DP), which protects individuals by obfuscating results. While effective for individual-level protection, DP relies on consistent identification of individuals across datasets—an assumption that breaks down in AI evaluation contexts, where multiple organizations contribute distinct datasets and models under conditions of partial trust.

In these settings, the challenge shifts from protecting outputs to ensuring input privacy and input verification: confirming that agreed-upon assets were used in a computation without revealing them.
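To make input verification concrete, here is a minimal sketch that assumes a simple salted hash commitment. It is an illustration only, not how PyGrid or any particular enclave implements the check, and the function names `commit` and `verify` are hypothetical. An asset owner publishes a commitment during negotiation, and code running inside the enclave later confirms that the asset it loaded matches that commitment.

```python
import hashlib
import hmac
import secrets


def commit(asset: bytes) -> tuple[str, bytes]:
    """Return (commitment, nonce) for an asset.

    The commitment can be shared with the other party during negotiation.
    The asset and the nonce stay with the owner (or inside the enclave).
    """
    nonce = secrets.token_bytes(32)  # random salt so the digest cannot be guessed from the asset
    digest = hashlib.sha256(nonce + asset).hexdigest()
    return digest, nonce


def verify(commitment: str, asset: bytes, nonce: bytes) -> bool:
    """Check that the asset is the one that was committed to."""
    recomputed = hashlib.sha256(nonce + asset).hexdigest()
    # Constant-time comparison avoids leaking how many characters matched.
    return hmac.compare_digest(recomputed, commitment)


if __name__ == "__main__":
    agreed_commitment, nonce = commit(b"agreed-upon evaluation dataset")
    print(verify(agreed_commitment, b"agreed-upon evaluation dataset", nonce))
    print(verify(agreed_commitment, b"a substituted dataset", nonce))
```

In a real enclave deployment this check would sit alongside hardware attestation. The sketch only shows the "agreed-upon asset" half of the problem.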

Diagram of the access problem

The Access Problem: oversight requires access, but the method by which access is granted immediately erodes trust. [Image Credit: Carmen Popa]

Research Question

This case study investigates:

How can two independent organizations jointly govern an AI evaluation—enforcing mutual secrecy, mutual veto power, and shared policy constraints—while still being able to understand, negotiate, and trust the process?

Rather than asking whether enclaves technically work, a question addressed by OpenMined’s output privacy team, my work complemented those efforts by exploring how PyGrid could augment the enclave experience to support the social practices of governance that semi-trusted collaboration requires: coordination, review, interpretation, and shared accountability.

Approach

Network Diagram for AI Eval Use Case: AI evaluation ecosystems span multiple cultures of trust, each with its own rules for enacting oversight.

From 2021 to 2023, I conducted interviews and workshops with organizations across the AI auditing ecosystem, including NGOs, AI labs, policy advisors, and academic researchers. While participants expressed strong interest in secure enclaves and related PETs, a consistent gap emerged: these technologies provide cryptographic assurances (attestations, isolation, certificates) but do little to support how institutions interpret and negotiate trust.

Building on earlier PyGrid work around project objects and request policies, I led co-design sessions with OpenMined’s founder and head of engineering to explore how governance could be operationalized through interface and system design. I prototyped UX flows, object hierarchies, and interaction models that treated AI evaluation as a jointly governed process, not just a secure computation.

Key explorations included:

  • Project objects as shared containers for intent, policy, and negotiation
  • Multi-party voting and polling interfaces to support mutual veto power
  • Abstracted pipeline views that allowed stakeholders to understand evaluation progress without exposing sensitive assets
  • Flexible rulesets that made policy constraints legible to non-technical participants

These speculative interfaces enabled stakeholders to reason about what was being evaluated, why, and under whose authority—even when the computation itself remained opaque.


Multi-Party Voting / Polling Space

A polling prototype in which stakeholders could view project context and conditions, then vote "yea" or "nay" on submitted computations that directly affected their data. Chat messaging and a spectrum visualizing current voting sentiment were intended to help stakeholders negotiate, reason, and reach consensus.
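The mutual-veto rule behind this polling space can be sketched in a few lines. This is a hypothetical illustration, not the prototype's actual code: the class and field names below are assumptions, and the rule modeled is simply that every required party must approve and any single "nay" rejects the request.

```python
from dataclasses import dataclass, field
from enum import Enum


class Vote(Enum):
    YEA = "yea"
    NAY = "nay"


@dataclass
class ComputationRequest:
    """A submitted computation awaiting approval from every affected party."""

    request_id: str
    required_voters: frozenset[str]
    votes: dict[str, Vote] = field(default_factory=dict)

    def cast(self, voter: str, vote: Vote) -> None:
        """Record a vote; only parties named on the request may vote."""
        if voter not in self.required_voters:
            raise ValueError(f"{voter!r} is not a party to request {self.request_id!r}")
        self.votes[voter] = vote

    def status(self) -> str:
        """Mutual veto: one nay rejects, and approval needs a yea from everyone."""
        if any(v is Vote.NAY for v in self.votes.values()):
            return "rejected"
        if set(self.votes) == set(self.required_voters):
            return "approved"
        return "pending"


if __name__ == "__main__":
    request = ComputationRequest("eval-001", frozenset({"model_owner", "data_owner"}))
    request.cast("model_owner", Vote.YEA)
    print(request.status())  # still waiting on data_owner
    request.cast("data_owner", Vote.NAY)
    print(request.status())  # a single nay vetoes the computation
```

Keeping the tally rule this small is deliberate: the negotiation itself (chat, sentiment, context) happens around the vote, while the rule stays simple enough for every party to audit.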


Project Workspace

A project workspace for the evaluator, with spaces to define project intent and the dependencies that would need to be uploaded to the enclave. Cell blocks structured as "tasks" and a dropdown for running the code on mock data first helped streamline the submit-and-triage workflow.


Project Pipeline View

A computation request workspace where non-technical stakeholders could make informed decisions about how a computation would affect their data. It also served as an abstracted view in which the evaluator could see where their computation request stood in the approval process.

Findings

Across both internal testing and external pilot discussions, the dominant sources of friction were not computational, but interpretive and organizational. Even when enclave protections functioned as intended, collaborations struggled due to governance overhead, ambiguity around policy enforcement, and difficulty interpreting low-level security guarantees.

This work reinforced the need for objects of negotiation, such as project objects, that contextualize technical execution within human governance processes. It positioned OpenMined to continue defining how enclave setups could adapt to different governance norms. The main takeaway: secure computation can guarantee secrecy, but without shared frames for interpretation, trust still erodes.

A New Model for Enclave Collaboration: addressing the access problem, this work informed a reframed approach to how conditions were negotiated between evaluators and AI owners [Image Credit: Carmen Popa]


Impact

This research helped bridge theoretical AI governance frameworks with implementable mechanisms for structured transparency. By demonstrating how mutually governed evaluations can occur without exposing proprietary models or sensitive data, this work contributes toward:

  • reducing information asymmetries between developers and evaluators
  • enabling rigorous, privacy-preserving AI audits
  • supporting cross-institutional accountability at scale

The ideas explored here later informed OpenMined’s Secure Enclaves pilot with the AI Safety Institute and Anthropic, demonstrating that jointly governed AI evaluation is on its way to being not only technically feasible but also socially viable.

What I Learned

I learned that the central challenge of secure, cross-organizational AI evaluation is not enforcing confidentiality in computation, but enabling institutions to reason together under shared constraints. Governance, metadata, and interface design are not peripheral to privacy-preserving systems—they determine whether such systems are trusted and adopted at all.

This work solidified my belief that AI accountability infrastructure must be designed as a sociotechnical system, where cryptographic guarantees and human governance evolve together.