Emulation: cytof_data

class cytomulate.emulation.cytof_data.EmulationCytofData(n_batches=1, background_noise_model=None, bead_label=None)[source]

Bases: GeneralCytofData

generate_cell_abundances(use_observed=True, is_random=True)[source]

Generate cell abundances

Generate the cell abundances for all cell types, that is, the number of cells of each cell type. This method supports either observed (data-based) cell abundances or randomly generated cell abundances. In the latter case, each cell type’s probability can be further randomized.

Parameters:
  • use_observed (bool) – Whether the cell abundances should use the observed ones

  • is_random (bool) – When use_observed is False, whether the cell abundance probabilities should be randomly generated. If True, the abundance of each cell type is sampled from a Dirichlet distribution. If False, all cell types have equal probability.

Return type:

None

Note

If you wish to use the default observed cell abundances from the data, it is not necessary to call this method. Otherwise, you should always set use_observed to False.
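The three abundance modes described above can be sketched with NumPy alone. This is a minimal illustration, not cytomulate's implementation: the helper name sketch_cell_abundances, the seed argument, and the choice of a symmetric Dirichlet with all concentration parameters equal to 1 are assumptions made for the example.

```python
import numpy as np


def sketch_cell_abundances(labels, use_observed=True, is_random=True, seed=0):
    """Hypothetical helper (not part of cytomulate).

    Returns a dict mapping each cell type to its abundance probability.
    """
    cell_types, counts = np.unique(labels, return_counts=True)
    n_types = len(cell_types)

    if use_observed:
        # Observed mode: proportions taken directly from the labels.
        probs = counts / counts.sum()
    elif is_random:
        # Random mode: one draw from a Dirichlet distribution.
        # Assumption: symmetric concentration parameters of 1.
        rng = np.random.default_rng(seed)
        probs = rng.dirichlet(np.ones(n_types))
    else:
        # Uniform mode: every cell type is equally likely.
        probs = np.full(n_types, 1.0 / n_types)

    return dict(zip(cell_types.tolist(), probs.tolist()))


labels = np.array(["T", "T", "B", "NK"])
observed = sketch_cell_abundances(labels)
uniform = sketch_cell_abundances(labels, use_observed=False, is_random=False)
randomized = sketch_cell_abundances(labels, use_observed=False, is_random=True)
```

In all three modes the returned probabilities sum to one; only the source of the proportions differs.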

generate_cell_graph(graph_topology='forest', **kwargs)[source]

Generate a cell graph as well as differentiation paths

This method is part of the cellular trajectory simulation used in complex simulations. It generates differentiation paths, which are used at the sampling stage.

Parameters:
  • graph_topology (str) – Type of graph to be generated

  • kwargs – Other parameters used for trajectory generation

Return type:

None
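To make the idea of a forest topology with differentiation paths concrete, the sketch below builds a random forest over a set of cell types and reads off the root-to-node path for each type. It is only an illustration of the concept under stated assumptions: the helper random_forest_paths, its n_trees and seed arguments, and the attachment rule are hypothetical and are not taken from cytomulate.

```python
import random


def random_forest_paths(cell_types, n_trees=2, seed=0):
    """Hypothetical helper (not part of cytomulate).

    Builds a random forest over the cell types and returns the parent map
    together with the root-to-node path of every cell type.
    """
    rng = random.Random(seed)
    order = list(cell_types)
    rng.shuffle(order)
    n_trees = min(n_trees, len(order))

    parent = {}
    for i, node in enumerate(order):
        if i < n_trees:
            # The first n_trees nodes become roots of separate trees.
            parent[node] = None
        else:
            # Attaching only to earlier nodes guarantees there are no cycles.
            parent[node] = rng.choice(order[:i])

    paths = {}
    for node in order:
        path, current = [], node
        while current is not None:
            path.append(current)
            current = parent[current]
        paths[node] = path[::-1]  # reverse so the path runs root -> node
    return parent, paths


parent, paths = random_forest_paths(["HSC", "CLP", "CMP", "T", "B", "Mono"])
```

Each path starts at a root and ends at the cell type itself, which is the kind of structure a sampler could walk along when simulating a trajectory.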

initialize_cell_types(expression_matrix, labels, max_components=9, min_components=1, covariance_types=('full', 'tied', 'diag', 'spherical'))[source]

Initialize cell type models by fitting Gaussian mixtures

This method fits a Gaussian mixture model (GMM) for each cell type according to the parameters specified. When more than one number of components or more than one covariance type is possible, a model selection procedure based on the Bayesian Information Criterion (BIC) is performed. See max_components and covariance_types for details.

Parameters:
  • expression_matrix (np.ndarray) – A matrix containing the expression levels of cell events

  • labels (np.ndarray) – A vector of cell type labels

  • max_components (int) – The maximal number of components for a Gaussian mixture. Used for Gaussian mixture model selection. This must be greater than or equal to min_components. If max_components equals min_components, that exact number will be used for fitting. Otherwise, a model selection procedure based on the Bayesian Information Criterion will be performed.

  • min_components (int) – The minimal number of components for a Gaussian mixture. Used for Gaussian mixture model selection. This must be less than or equal to max_components. See max_components for details on model selection.

  • covariance_types (list or tuple) – The candidate types of covariances used for Gaussian mixture model selection. If only one is specified, no model selection will be performed based on the covariance structure.

Return type:

None
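The BIC-based selection described above can be sketched with scikit-learn's GaussianMixture. This is a minimal sketch of the general technique, not cytomulate's internal code: the helpers select_gmm_by_bic and fit_per_cell_type, the seed argument, and the cap on the number of components by sample count are assumptions made for the example.

```python
import numpy as np
from sklearn.mixture import GaussianMixture


def select_gmm_by_bic(X, min_components=1, max_components=9,
                      covariance_types=("full", "tied", "diag", "spherical"),
                      seed=0):
    """Hypothetical helper: fit every candidate GMM, keep the lowest-BIC one."""
    # Assumption: a mixture cannot have more components than samples.
    upper = min(max_components, X.shape[0])
    best_model, best_bic = None, np.inf
    for cov_type in covariance_types:
        for k in range(min_components, upper + 1):
            gmm = GaussianMixture(n_components=k, covariance_type=cov_type,
                                  random_state=seed).fit(X)
            bic = gmm.bic(X)  # lower BIC indicates a better trade-off
            if bic < best_bic:
                best_model, best_bic = gmm, bic
    return best_model


def fit_per_cell_type(expression_matrix, labels, **kwargs):
    """Hypothetical helper: one selected GMM per cell type label."""
    return {
        cell_type: select_gmm_by_bic(expression_matrix[labels == cell_type], **kwargs)
        for cell_type in np.unique(labels).tolist()
    }


# Synthetic two-marker data for two cell types, for illustration only.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (60, 2)), rng.normal(5, 1, (60, 2))])
y = np.array(["A"] * 60 + ["B"] * 60)
models = fit_per_cell_type(X, y, max_components=2, covariance_types=("full", "diag"))
```

When min_components equals max_components and a single covariance type is given, the loops reduce to a single fit, which matches the behaviour described for those parameters.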