Module: cytof_data_general

class cytomulate.cytof_data_general.GeneralCytofData(n_batches=1, background_noise_model=None)[source]

Bases: object

generate_cell_abundances(is_random=True)[source]

Generate cell abundances.

This method generates the cell abundances for each batch. The probability of each cell type is either sampled at random or fixed so that all cell types are equally likely. See the is_random parameter for details.

Parameters:

is_random (bool) – Whether the cell abundances should be randomly generated. If True, the abundance of each cell type is sampled from a Dirichlet distribution. If False, all cell types have equal probability.

Return type:

None
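The two modes of is_random can be illustrated with a minimal standard-library sketch. The helper name sketch_cell_abundances, the cell type labels, and the choice of a symmetric Dirichlet with all concentration parameters equal to 1 are assumptions made for illustration; the source does not state which Dirichlet parameters the library uses.

```python
import random

def sketch_cell_abundances(cell_types, is_random=True, rng=None):
    """Return a dict mapping each cell type to a probability (hypothetical helper)."""
    rng = rng or random.Random()
    if is_random:
        # Assumption: symmetric Dirichlet(1, ..., 1), drawn by normalizing
        # independent Gamma(1, 1) variates so the values sum to one.
        draws = [rng.gammavariate(1.0, 1.0) for _ in cell_types]
        total = sum(draws)
        return {ct: d / total for ct, d in zip(cell_types, draws)}
    # Fixed case: every cell type gets the same probability.
    return {ct: 1.0 / len(cell_types) for ct in cell_types}

random_abundances = sketch_cell_abundances(["A", "B", "C"], rng=random.Random(0))
equal_abundances = sketch_cell_abundances(["A", "B", "C"], is_random=False)
```

In both cases the probabilities sum to one; only the random case varies from call to call.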

generate_local_batch_effects(variance=0.001)[source]

Generate local batch effects (interaction effects)

We separate main effects from local effects since some methods are designed only to eliminate overall effects. The local effects are interactions between cell types and specific channels.

Parameters:

variance (float) – The variance of the effects

Return type:

None

generate_overall_batch_effects(variance=0.001)[source]

Generate overall batch effects.

This is the main batch effect of the dataset, which does not have an interaction with the channels.

Parameters:

variance (float) – The variance of the effects

Return type:

None
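The difference between overall and local batch effects can be sketched as follows. This is only an illustration of the two shapes of effect: the helper name, the zero-mean normal distribution, and the integer batch keys are assumptions, not the library's documented implementation.

```python
import math
import random

def sketch_batch_effects(n_batches, cell_types, n_channels, variance=0.001, seed=0):
    """Draw hypothetical overall and local batch effects (illustrative only)."""
    rng = random.Random(seed)
    sd = math.sqrt(variance)  # gauss() takes a standard deviation, not a variance
    # Overall effect: a single offset per batch, shared by every channel.
    overall = {b: rng.gauss(0.0, sd) for b in range(n_batches)}
    # Local effect: one offset per (batch, cell type, channel) combination.
    local = {
        b: {ct: [rng.gauss(0.0, sd) for _ in range(n_channels)] for ct in cell_types}
        for b in range(n_batches)
    }
    return overall, local

overall, local = sketch_batch_effects(n_batches=2, cell_types=["A", "B"], n_channels=3)
```

A correction method that removes only a per-batch offset would cancel the overall term here but leave the per-cell-type, per-channel terms in place, which is why the two are generated separately.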

generate_temporal_effects(variance=None, coefficients=None, x=None, y=None, **kwargs)[source]

Generate temporal effect

Parameters:
  • variance (float) – The variance of the end point when using a Brownian bridge or a polynomial

  • coefficients (dict, list or np.ndarray) – The coefficients of the polynomial to be generated, or a dictionary of coefficients of the polynomials to be generated. The coefficients follow the NumPy polynomial convention of rising degrees (i.e. the first is the constant term, the second is the coefficient of the first-degree term, etc.)

  • x (dict or np.ndarray) – The x values used to fit a spline or a dictionary of x values used to fit a spline

  • y (dict or np.ndarray) – The y values used to fit a spline or dictionary of y values used to fit a spline

  • kwargs – Extra parameters for the Brownian bridge method or the spline function

Return type:

None

Note

The coefficients only specify the shape of the polynomial, not the final polynomial itself. The polynomial is shifted and transformed to fit in the [0,1] range.
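To make the rising-degree convention and the note about shape concrete, here is a small standard-library sketch. The helper names are hypothetical, and the min-max rescaling over a grid on [0, 1] is an assumption standing in for whatever shift and transform the library actually applies.

```python
def eval_rising(coefficients, x):
    """Evaluate a polynomial whose coefficients are in rising-degree order."""
    result = 0.0
    # Horner's rule, walking from the highest degree down to the constant.
    for c in reversed(coefficients):
        result = result * x + c
    return result

def rescaled_shape(coefficients, n_points=101):
    """Evaluate on a grid over [0, 1] and min-max rescale the values into [0, 1]."""
    xs = [i / (n_points - 1) for i in range(n_points)]
    ys = [eval_rising(coefficients, x) for x in xs]
    lo, hi = min(ys), max(ys)
    if hi == lo:
        # A constant polynomial has no shape to rescale.
        return xs, [0.0 for _ in ys]
    return xs, [(y - lo) / (hi - lo) for y in ys]

# [1, 2, 3] means 1 + 2*x + 3*x**2 in rising-degree order.
value_at_two = eval_rising([1, 2, 3], 2.0)
xs, ys = rescaled_shape([1, 2, 3])
xs_scaled, ys_scaled = rescaled_shape([10, 20, 30])
```

Under this assumed rescaling, multiplying every coefficient by a positive constant leaves the result unchanged, which is one way to read "only specifies the shape".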

sample(n_samples, cell_abundances=None, beta_alpha=0.4, beta_beta=0.4)[source]

Draw random samples for all batches

Parameters:
  • n_samples (int or list or np.ndarray) – Number of samples for each batch. If an integer is provided, then it will be used for all batches

  • cell_abundances (dict or None) – Either a nested dictionary whose keys are the batches and whose values are dictionaries mapping cell types to cell numbers or probabilities, or a plain dictionary whose keys are the cell labels and whose values are either the actual number of events for each cell type or the probability of each cell type

  • beta_alpha (float, int, or dict) – The alpha parameters of the beta distribution, which should be constrained to the positive reals. Defaults to 0.4.

  • beta_beta (float, int, or dict) – The beta parameters of the beta distribution, which should be constrained to the positive reals. Defaults to 0.4.

Return type:

Tuple[dict, dict, dict, dict]

Returns:

  • expression_matrices (dict) – The dictionary of expression matrices

  • labels (dict) – The dictionary of arrays of the corresponding cell type labels

  • pseudo_time (dict) – The dictionary of arrays of the positions on the differentiation paths

  • children_cell_labels (dict) – The dictionary of descendants toward which the cells are differentiating
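The accepted input shapes for n_samples and cell_abundances can be sketched as a normalization step. Both helper names are hypothetical, and the use of integer batch keys 0, 1, ... is an assumption (suggested by the integer batch index of sample_one_batch); the library's own handling may differ.

```python
def expand_n_samples(n_samples, n_batches):
    """Turn an int or a sequence into one sample size per batch."""
    if isinstance(n_samples, int):
        # A single integer is reused for every batch.
        return [n_samples] * n_batches
    sizes = list(n_samples)
    if len(sizes) != n_batches:
        raise ValueError("need one sample size per batch")
    return sizes

def expand_cell_abundances(cell_abundances, n_batches):
    """Turn a plain {cell_type: value} dict into a nested {batch: {...}} dict."""
    # Nested form: every value is itself a dictionary keyed by cell type.
    if all(isinstance(v, dict) for v in cell_abundances.values()):
        return cell_abundances
    # Plain form: copy the same abundances to every batch.
    return {batch: dict(cell_abundances) for batch in range(n_batches)}

sizes = expand_n_samples(100, n_batches=3)
nested = expand_cell_abundances({"A": 0.3, "B": 0.7}, n_batches=2)
```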

sample_one_batch(n_samples, cell_abundances=None, batch=0, beta_alpha=0.4, beta_beta=1.0)[source]

Draw random samples for one batch

Parameters:
  • n_samples (int) – Number of samples

  • cell_abundances (dict or None) – A dictionary whose keys are the cell labels. The corresponding values should be either the actual number of events for each cell type or the probability of each cell type. If this is not provided, the one stored in the object will be used. Defaults to None.

  • batch (int) – The index of the batch for which we want to draw samples. Defaults to 0.

  • beta_alpha (float or int) – The alpha parameter of the beta distribution, which should be constrained to the positive reals. Defaults to 0.4.

  • beta_beta (float or int) – The beta parameter of the beta distribution, which should be constrained to the positive reals. Defaults to 1.0.

Return type:

Tuple[ndarray, ndarray, ndarray, ndarray]

Returns:

  • expression_matrix (np.ndarray) – The expression matrix

  • labels (np.ndarray) – The array of the corresponding cell type labels

  • pseudo_time (np.ndarray) – The array of the positions on the differentiation paths

  • children_cell_labels (np.ndarray) – The descendants toward which the cells are differentiating
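According to the signatures above, sample defaults to beta_beta=0.4 while sample_one_batch defaults to beta_beta=1.0. The sketch below only illustrates how those two beta parameterizations differ, using the standard library; the helper name is hypothetical and nothing here reproduces how the library consumes the draws.

```python
import random

def beta_draws(alpha, beta, n, seed=0):
    """Draw n values from Beta(alpha, beta); both parameters must be positive."""
    rng = random.Random(seed)
    return [rng.betavariate(alpha, beta) for _ in range(n)]

symmetric = beta_draws(0.4, 0.4, 20000)  # default parameters of sample
skewed = beta_draws(0.4, 1.0, 20000)     # default parameters of sample_one_batch

# The theoretical mean of Beta(a, b) is a / (a + b).
mean_symmetric = sum(symmetric) / len(symmetric)
mean_skewed = sum(skewed) / len(skewed)
```

Beta(0.4, 0.4) is U-shaped and symmetric about 0.5, whereas Beta(0.4, 1.0) concentrates mass near 0, so the two defaults yield noticeably different draws.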

sample_to_pycytodata(n_samples, cell_abundances=None, beta_alpha=0.4, beta_beta=0.4)[source]

Draw random samples for all batches and return a PyCytoData object.

This method is a wrapper for the sample method but provides an interface to return a PyCytoData object.

Parameters:
  • n_samples (int or list or np.ndarray) – Number of samples for each batch. If an integer is provided, then it will be used for all batches

  • cell_abundances (dict or None) – Either a nested dictionary whose keys are the batches and whose values are dictionaries mapping cell types to cell numbers or probabilities, or a plain dictionary whose keys are the cell labels and whose values are either the actual number of events for each cell type or the probability of each cell type

  • beta_alpha (float, int, or dict) – The alpha parameters of the beta distribution

  • beta_beta (float, int, or dict) – The beta parameters of the beta distribution

Returns:

pcd – A PyCytoData object with the simulated data.

Return type:

PyCytoData

Raises:

ImportError – No PyCytoData installation present.

Note

PyCytoData is not compatible with storing pseudo_time and children_cell_labels. If you need this information, use the standard sample method instead.

Note

PyCytoData is an optional dependency. If an ImportError is raised, you need to install the package first. Installation instructions can be found here: https://pycytodata.readthedocs.io/en/latest/installation.html.
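Because the dependency is optional, calling code may want to check for it before choosing between sample and sample_to_pycytodata. A minimal sketch using only the standard library follows; the helper name is hypothetical, and the importable module name "PyCytoData" is an assumption based on the package name.

```python
import importlib.util

def has_module(name):
    """Return True if a top-level module can be located without importing it."""
    return importlib.util.find_spec(name) is not None

# Assumption: the package is importable under the name "PyCytoData".
if has_module("PyCytoData"):
    print("PyCytoData found: sample_to_pycytodata should be usable.")
else:
    print("PyCytoData not found: use sample, or install the package first.")
```

Checking up front avoids catching the ImportError at the call site, though wrapping the call in try/except ImportError works equally well.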