Mapping Access Friction in Academic Research

Data Access as Infrastructure

Problem Context

Across academia, access to data is often assumed to be a function of sensitivity: public data is non-sensitive and easy to use, while sensitive data is restricted and hard to access. Yet in practice, researchers across disciplines encounter similar frictions regardless of whether data is public, gated, or restricted. These frictions shape who can participate in research, how collaboration forms, and whether new infrastructure can be adopted organically.

As OpenMined prepared the release of PySyft 0.9, positioned as “the public internet for non-public data,” we sought to understand whether this framing resonated with the academic research community and what structural barriers might shape its adoption. Rather than starting from the technology, my work began with the lived realities of researchers: I interviewed them to uncover how they navigated contemporary data access ecosystems.

Research Question

This case study investigates:

What data-access challenges exist across different segments of the academic research community, and how do these challenges shape researchers’ perceptions of PySyft’s value proposition and readiness for adoption?

The goal was not simply to assess excitement for PySyft's vision, but to understand how institutional processes, collaboration norms, and access pathways influence whether privacy-preserving infrastructure could make an immediate impact on researchers' current workflows.

My Role

As user research lead, I designed and conducted a qualitative research study to assess academic readiness for adopting PySyft 0.9. My responsibilities included:

  • Designing the interview guide and research protocol
  • Conducting 25 semi-structured interviews across disciplines and career stages
  • Guiding and instructing fellow team members on interview techniques
  • Performing thematic analysis with bias checks across all conducted interviews
  • Translating findings into recommendations for product direction, pilot direction, and community strategy

This work was conducted under the supervision of Ronnie Falcon (Chief Product Officer), in collaboration with Osam Kyemenu-Sarsah (Partner Coordinator) and Valerio Maggio (Community Outreach Lead).

Approach

Participants represented a wide range of academic contexts, including early-career researchers, senior faculty, and applied researchers working across public, semi-public, and restricted datasets. Interviews focused on:

  • how researchers locate and access data
  • how collaboration is initiated and sustained
  • where access breaks down
  • where analysis after access breaks down
  • what methods are used to work around challenges

In this way, the backbone of my interviews was structured around process friction: the procedural, legal, and social steps required to move from interest to access to analysis to publication.

Findings

Several patterns emerged consistently across disciplines and seniority levels.

First, data-access challenges were driven less by how sensitive data was and more by how it was governed. Public and restricted datasets often shared the same bottlenecks: unclear ownership, slow approval cycles, opaque requirements, and reliance on personal networks. This revealed a crucial “semi-public” category—data that is nominally accessible, but practically difficult to use.

Second, data interoperability, dataset quality, and missing context about dataset attributes were primary blockers to analysis even when data access had been granted. These challenges would persist even if privacy-enhancing technologies (PETs) unblocked research across private data.

Third, collaboration (as opposed to specialized tooling) was a primary motivator for researchers. Participants expressed willingness to tolerate technical complexity if it enabled meaningful collaboration, but little interest in tools that addressed privacy in isolation from collaborative workflows.

Finally, PySyft’s value proposition resonated most strongly when framed not as a privacy solution, but as infrastructure for coordination—a way to formalize collaboration, clarify expectations, and reduce the informal labor required to access data.

Impact

This research highlighted how inequitable access to data limits participation in scientific discovery and slows progress across fields—not because researchers lack the expertise or motivation, but because access pathways encourage those within institutions to leverage the data on hand rather than forging new cross-collaborations.

By surfacing shared structural barriers across data types, this work informed a renewed focus within PySyft on using privacy-enhancing technologies (PETs) to:

  • Add context and encourage exploration during dataset discovery
  • Further our explorations into collaboration spaces and tooling

This work also informed our partnership with Reddit for Research and our focus on consortia for our NSF NAIRR proposal.

What I Learned

Many of the frictions participants described came down to a lack of clear processes, a lack of clear context around data collection, and difficulty finding datasets that can work with one another. By this point, I had spent a couple of years investigating how PETs could unblock collaboration, so the need for process alignment and clarity was not new to me. If anything, it further validated my earlier findings. What did stand out was a realization, informed by John Gallacher (director of DPUK), that semantic AI combined with universal metadata frameworks might help with the latter two barriers: missing context around data collection and the difficulty of finding compatible datasets. Through these interviews, I came to see metadata encoding as a primary problem set that should be explored in parallel with the problem set of collaboration workflows.