This is a post about complexity: how complex systems break down, how that affects decision making and stakeholders, and how all of this has motivated Ozette's work. Science is self-correcting, but when it intersects with real-world problems of human health and business, the stakes of an uncorrected error are much higher.

A Cautionary Tale & the Role of Forensic Bioinformatics

You may have heard of the Duke research scandal, in which clinical trials proceeded on the basis of fabricated data, ultimately harming patients. The initial scientific claims were fascinating: that gene expression profiles of cancer cell lines (derived from then–cutting-edge high-dimensional microarray technologies) could be used to guide cancer therapies for human subjects. These claims turned out to be incorrect, as did later findings that built upon them.

What's interesting is that the scandal originated in a series of likely unintentional computational errors that snowballed, and it only turned into an apparent cover-up after the errors were brought to the authors' attention. Keith Baggerly and Kevin Coombes published their investigation, an attempt to reproduce the original analyses, in 2009. Their efforts uncovered simple computational errors (off-by-one indexing, mislabeled samples). I was fortunate enough to hear Baggerly lecture on the topic, which is where I first heard the term "Forensic Bioinformatics." It left quite an impression and instilled in me a firm skepticism of lofty claims made without the requisite transparency.
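
To make the off-by-one failure mode concrete, here is a minimal sketch in Python. The gene names, scores, and helper functions are all hypothetical and are not taken from the studies discussed above. It only illustrates how a label list that still carries a header row can shift every lookup by one position without raising any error.

```python
# Hypothetical example data; not taken from the studies discussed above.
rows_with_header = ["gene_id", "GENE_A", "GENE_B", "GENE_C", "GENE_D", "GENE_E"]
scores = [0.1, 2.7, 0.3, 3.1, 0.2]  # one score per gene, no header entry


def top_k_indices(values, k):
    """Return the positions of the k largest values, largest first."""
    return sorted(range(len(values)), key=lambda i: values[i], reverse=True)[:k]


def lookup(labels, indices):
    """Map positions to labels."""
    return [labels[i] for i in indices]


top = top_k_indices(scores, 2)

# Header stripped: labels line up with scores.
correct = lookup(rows_with_header[1:], top)
# Header left in place: every label is shifted by one position.
shifted = lookup(rows_with_header, top)

print("correct:", correct)
print("shifted:", shifted)
```

Both results are lists of plausible gene names, and nothing in the output signals that the second list is wrong. Only a check against the raw data would reveal the shift.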

A Need For Validated Software in Research

The story is an interesting case study of how compounded complexity can mask simple mistakes, because in a complex system no one individual can have a complete understanding of the whole. This compartmentalization is normal in modern science, which is by necessity an interdisciplinary affair. The real human risk is that if we don't understand something, it becomes easy to dismiss it when there is a problem or, more insidiously, to not even recognize it as a problem.

We rely on scientific instruments to generate reliable data, even if we don't understand the engineering of exactly how the instruments work. We rely on software to produce trustworthy analytical results, even if we did not write the software. In the human health space this is critical, and when it breaks down it has very real consequences. Quality assurance processes, engineering practices, system design, formal validation, and regulations are in place to ensure that the systems we rely on work as intended. These are the formal mechanisms that ensure data integrity, accountability, reliability, accuracy, and reproducibility.

Yet software used in research that doesn't touch human subjects, as is the case for much academic research software, is usually not validated. Until relatively recently, academic bioinformatics was quite the Wild West. Software development practices such as version control, code review, and unit testing have become widespread in the field only over the past 10–15 years, driven in part by the reproducibility crisis and an influx of professionally trained software engineers.

Most academic research software today is still not validated off the shelf in the strict regulatory sense. Instead, users who need to apply software for data analysis in human clinical trials must take on the burden of validation themselves. This raises a conundrum: how does an organization verify that a software system it did not write is performing as intended? For "simple" systems this is relatively easy. But what if the system is complex, and the organization doesn't have the expertise to verify it? In such cases validation may not even be viable. This is one reason, I believe, that manual analysis remains the gold standard for cytometry in a clinical trial setting. It is something we have sought to change at Ozette from the outset.

The Deceptive Simplicity of Cytometry

Modern cytometry is a fascinating technology: remarkably high-dimensional and high-throughput, sharing some of the complexity risks of other high-dimensional technologies (microarrays, scRNA-seq). It is an interesting example of a complex system precisely because it is conceptually a very simple assay. Cells are stained with antibodies conjugated to fluorophores, the fluorophores are excited by lasers, and the emitted light is filtered and captured at detectors. The intensity of the signal is proportional to the amount of fluorophore, which is in turn proportional to the amount of the target protein on each cell.

Each of these phenomena is well understood, and a trained technician can set up the instrument, titrate staining, run controls and samples, and analyze the data. It is conceptually simple. But when one digs into the details, there is a lot going on that few individuals can claim to fully understand. Would we expect the lab technician to understand the mathematical underpinnings of unmixing, data transformations, file formats, and downstream statistical modeling? Would we expect the statistician to understand antibody affinity and avidity, titration, or instrument calibration? It's not reasonable to be expert in all things — we rely on each person in the chain to understand their part and do it well.

Yet in flow cytometry there are non-obvious knock-on effects between layers of complexity. If something goes wrong with the instrument, the data processing, or the titration, the result is a readout that is inaccurate, but rarely obviously so. The problem only becomes visible somewhere downstream, sometimes far downstream from the individual responsible for that piece of the system. The layers are tightly coupled, errors have far-reaching causal impact, and the people who rely on the final output usually don't have access to the data, intermediate decisions, and quality diagnostics necessary to evaluate it. As a consequence, decisions are not fully informed. Cytometry can easily deceive the user.
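
A toy illustration of that coupling, using classic two-color compensation as a deliberately simplified stand-in for spectral unmixing. Everything here is an assumption made for illustration: the spillover values, the abundances, and the function names are hypothetical, and this is not a description of Ozette's algorithms. The sketch shows how a small upstream error in one spillover coefficient (for example, from a poor single-stain control) produces a downstream number that is wrong yet entirely plausible.

```python
def simulate(abundance, spill):
    """Detector signals for two fluorophores measured on two detectors.

    spill[i][d] is the signal fluorophore i contributes to detector d
    per unit of abundance.
    """
    a0, a1 = abundance
    return (
        a0 * spill[0][0] + a1 * spill[1][0],
        a0 * spill[0][1] + a1 * spill[1][1],
    )


def unmix_two_color(observed, spill):
    """Recover abundances from detector signals with Cramer's rule (2x2 case)."""
    (s00, s01), (s10, s11) = spill
    d0, d1 = observed
    det = s00 * s11 - s10 * s01
    if det == 0:
        raise ValueError("spillover matrix is singular")
    a0 = (d0 * s11 - s10 * d1) / det
    a1 = (s00 * d1 - s01 * d0) / det
    return a0, a1


# Hypothetical values chosen for illustration only.
true_spill = [[1.0, 0.2], [0.1, 1.0]]
wrong_spill = [[1.0, 0.3], [0.1, 1.0]]  # one coefficient mis-estimated upstream
true_abundance = (100.0, 50.0)

observed = simulate(true_abundance, true_spill)
good = unmix_two_color(observed, true_spill)
bad = unmix_two_color(observed, wrong_spill)

print("with correct spillover:", good)
print("with wrong spillover:  ", bad)
```

With the correct matrix the abundances are recovered. With the mis-estimated coefficient, the second marker comes out roughly 20% too low, with no error raised. A downstream analyst who sees only the final value has no way to tell the two apart.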

Ozette's Approach and Validated Solutions

At Ozette, we work with partners to help them make sense of their complex cytometry data, and we have repeatedly seen systemic complexity affect data quality. The problems typically arise upstream in data generation, and our team is often the first to surface them to our partners. The demand for higher-quality data is part of the motivation for establishing Ozette and the Ozette Lab, and for bringing together experts in assay development, panel design, computation, statistics, and engineering to take a vertically integrated approach to cytometry data generation and analysis.

Ozette's assay and platform use data-driven computational analysis, quality monitoring, and gating to rapidly and robustly identify cell populations. Formal validation of the assay and the underlying algorithm provides the quality assurance controls and regulatory compliance necessary for clinical trial use. To our knowledge, Ozette provides the only validated computational gating solution (Endpoints™), which, combined with our validated 48-color PIP-01 spectral assay, delivers deep phenotyping and immune profiling for clinical trials.

By taking on the data quality component rather than pushing it onto the user, we've built tooling that lets us rapidly identify and diagnose data quality issues before they become pervasive. The impact is that our partners' drug development programs are accelerated, moving faster from phase I to II to III, because the data are reliable, rich, and informative. The unintended consequences of overlooking complexity are manifold, affecting both human health and patient finances. Ozette's commitment to a vertically integrated approach, coupled with a focus on transparency and data-driven solutions, offers a pathway to navigate these challenges, ultimately accelerating drug development and ensuring decisions are based on reliable insights.
