Should I install PyCytoData?
In case you haven’t heard, PyCytoData provides a unified
framework for analyzing CyTOF data in Python. The supported workflow includes File IO, preprocessing, and
dimension reduction and visualization via CytofDR. While we believe in the future of PyCytoData, we recognize
that many people prefer a leaner package without the baggage of an ecosystem. Thus, we made PyCytoData
an optional dependency that only those who need it have to install. In this tutorial, we walk you through
whether you need to install PyCytoData and help you make the best decision.
Decision at a Glance
If you need a TLDR, we created a nifty flowchart for you to follow. At each step, ask yourself the question
and you will arrive at a conclusion. We want to emphasize that currently only the Cytomulate CLI strictly
requires PyCytoData because we don’t want to maintain two branches of the same IO code. For all other
workflows, you can judge the necessity of PyCytoData yourself.
How is PyCytoData optional?
Currently, we need PyCytoData for two functionalities:
The Cytomulate CLI
Outputting to a PyCytoData object
For the former, we require PyCytoData because we rely on its File IO operations so that we don’t have
to reinvent a great module. For the latter, users have the option to use the sample_to_pycytodata method
or just the regular sample method. If PyCytoData is not installed, users can have
NumPy arrays returned instead. As you can see, we do not rely on PyCytoData for the core model and
estimation procedures. Rather, PyCytoData is a convenience feature for those who need it.
In terms of implementation, we check whether PyCytoData is importable at the beginning. If it is not, no
ImportError is raised until absolutely necessary. This means that unless users explicitly call
sample_to_pycytodata without a proper installation, we do not complain! Of course, this implementation
has its downsides: namely, it can surprise those who are not aware of the optional dependency. We will
revisit this in the future.
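The deferred-error pattern described above can be sketched as follows. This is a minimal illustration, not Cytomulate's actual source: ToySimulator, is_installed, HAS_PYCYTODATA, and the has_backend parameter are hypothetical names, nested lists stand in for NumPy arrays so the core path needs nothing beyond the standard library, and the PyCytoData constructor arguments in the optional branch are an assumption about its API.

```python
import importlib.util
import random


def is_installed(module_name: str) -> bool:
    """Return True if a top-level module can be found, without importing it."""
    return importlib.util.find_spec(module_name) is not None


# Checked once, up front. A missing package raises nothing at this point.
HAS_PYCYTODATA = is_installed("PyCytoData")


class ToySimulator:
    """Hypothetical stand-in for a simulator; not Cytomulate's actual class."""

    def __init__(self, n_markers=3, seed=0, has_backend=HAS_PYCYTODATA):
        self.n_markers = n_markers
        self.has_backend = has_backend
        self._rng = random.Random(seed)

    def sample(self, n_samples):
        """Core path: needs no optional dependency."""
        return [
            [self._rng.gauss(0.0, 1.0) for _ in range(self.n_markers)]
            for _ in range(n_samples)
        ]

    def sample_to_pycytodata(self, n_samples):
        """Optional path: the ImportError is deferred until this call."""
        if not self.has_backend:
            raise ImportError(
                "PyCytoData is not installed; install it or call sample() instead."
            )
        # Imported lazily, only when the optional path is actually used.
        # NumPy is assumed to be available here as a PyCytoData dependency,
        # and the constructor keywords below are an assumption about its API.
        import numpy as np
        from PyCytoData import PyCytoData

        channels = [f"Marker{i}" for i in range(self.n_markers)]
        return PyCytoData(
            expression_matrix=np.array(self.sample(n_samples)),
            channels=channels,
        )
```

With has_backend=False, calling sample(5) returns five rows as usual, while sample_to_pycytodata(5) is the only call that raises ImportError. The trade-off is the one noted above: the error surfaces late, at the first call that needs the optional package, rather than at import time.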
What are the added benefits of PyCytoData?
The main reason to use PyCytoData is its convenience. Specifically, it makes it easy to handle batches,
metadata, and expression matrices. For those who want a wrapper around arrays, we believe that
PyCytoData is a good fit. Of course, let’s not forget the File IO capabilities and downstream integration
with CytofDR.
On the note of downstream analyses, PyCytoData is being actively developed. So, there will be some neat
features coming in the future as we add more tools and integrate more libraries.
Should I skip PyCytoData for CytofDR directly?
If you are asking this question, then YES! The CytofDR package provides a more complete set of
tools for benchmarking and performing dimension reduction (DR) on CyTOF data. While PyCytoData offers
workarounds for these functionalities, we recommend using CytofDR directly. However, you are always
welcome to install PyCytoData to manage other aspects of the CyTOF workflow.