Module: cytof_data_general
- class cytomulate.cytof_data_general.GeneralCytofData(n_batches=1, background_noise_model=None)[source]
Bases: object
- generate_cell_abundances(is_random=True)[source]
Generate cell abundances.
This method generates the cell abundance for each batch. The probability of each cell type can be either random or fixed at equal values. See the is_random parameter for details.
- Parameters:
is_random (bool) – Whether the cell abundances should be randomly generated. If True, the abundance of each cell type is sampled from a Dirichlet distribution. If False, all cell types have equal probability.
- Return type:
None
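The two modes of is_random can be illustrated outside the library with a minimal NumPy sketch. The helper name draw_cell_abundances, the cell type labels, and the symmetric Dirichlet concentration of 1 are assumptions made for illustration; the concentration the library actually uses is not stated here.

```python
import numpy as np


def draw_cell_abundances(cell_types, is_random=True, seed=None):
    """Map each cell type to a probability (hypothetical helper, not part of cytomulate)."""
    rng = np.random.default_rng(seed)
    k = len(cell_types)
    if is_random:
        # Assumption: symmetric Dirichlet with concentration 1 for every cell type.
        probs = rng.dirichlet(np.ones(k))
    else:
        # Fixed mode: every cell type gets the same probability.
        probs = np.full(k, 1.0 / k)
    return dict(zip(cell_types, probs))


random_abundances = draw_cell_abundances(["B", "CD4 T", "NK"], is_random=True, seed=0)
equal_abundances = draw_cell_abundances(["B", "CD4 T", "NK"], is_random=False)
```

In both modes the probabilities sum to 1 across cell types.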
- generate_local_batch_effects(variance=0.001)[source]
Generate local batch effects (interaction effects)
We separate main effects from local effects since some methods are designed only to eliminate overall effects. Local effects are interactions between cell types and specific channels.
- Parameters:
variance (float) – The variance of the effects
- Return type:
None
- generate_overall_batch_effects(variance=0.001)[source]
Generate overall batch effects.
This is the main batch effect of the dataset, which does not have an interaction with the channels.
- Parameters:
variance (float) – The variance of the effects
- Return type:
None
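The difference between overall and local batch effects can be pictured with a small NumPy sketch. It is a conceptual illustration only: the array shapes, the zero-mean normal distribution, and the additive combination are assumptions, not a description of how the library stores or applies the effects.

```python
import numpy as np

rng = np.random.default_rng(0)
n_batches, n_cell_types, n_channels = 3, 4, 5  # illustrative sizes
variance = 0.001                                # same default as the two methods

# Overall (main) effect: one shift per batch, shared by all cell types and channels.
overall = rng.normal(0.0, np.sqrt(variance), size=n_batches)

# Local (interaction) effect: a separate shift for each cell type and channel in each batch.
local = rng.normal(0.0, np.sqrt(variance), size=(n_batches, n_cell_types, n_channels))

# Assumption: the two effects add up to the total shift for each batch, cell type and channel.
total_shift = overall[:, None, None] + local
```

A correction method that only removes a per-batch offset would subtract overall here and leave local untouched, which is the reason for modelling the two separately.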
- generate_temporal_effects(variance=None, coefficients=None, x=None, y=None, **kwargs)[source]
Generate temporal effect
- Parameters:
variance (float) – The variance of the end point if using Brownian bridge or polynomial
coefficients (dict, list or np.ndarray) – The coefficients of the polynomial to be generated or a dictionary of coefficients of the polynomials to be generated. The coefficients follow the NumPy convention of rising degree (i.e. the first is the constant term, the second is the coefficient of the first-degree term, etc.)
x (dict or np.ndarray) – The x values used to fit a spline or a dictionary of x values used to fit a spline
y (dict or np.ndarray) – The y values used to fit a spline or dictionary of y values used to fit a spline
kwargs – Extra parameters for the Brownian bridge method or the spline function
- Return type:
None
Note
The polynomial only specifies the shape, not the actual polynomial. We will shift and transform the polynomial to fit in the [0,1] range.
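The coefficient ordering can be checked with NumPy's polynomial module, which also takes coefficients in rising degree. The sketch below evaluates an illustrative polynomial and then rescales it. The coefficient values are made up, and min-max rescaling is only one possible reading of "shift and transform"; the exact transformation the library applies is not specified here.

```python
import numpy as np
from numpy.polynomial import polynomial as P

# Coefficients in rising degree: 1 + 0*t - 2*t**2 (illustrative values).
coefficients = [1.0, 0.0, -2.0]

# Evaluate the shape on an evenly spaced grid.
t = np.linspace(0.0, 1.0, 101)
shape = P.polyval(t, coefficients)

# Assumption: min-max rescaling as one way to fit the curve into [0, 1].
scaled = (shape - shape.min()) / (shape.max() - shape.min())
```

Because only the shape matters, multiplying all coefficients by a positive constant gives the same scaled curve under this rescaling.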
- sample(n_samples, cell_abundances=None, beta_alpha=0.4, beta_beta=0.4)[source]
Draw random samples for all batches
- Parameters:
n_samples (int or list or np.ndarray) – Number of samples for each batch. If an integer is provided, then it will be used for all batches
cell_abundances (dict or None) – Either a nested dictionary whose keys are the batches and whose values are dictionaries mapping cell types to cell numbers or probabilities, or a plain dictionary whose keys are the cell labels and whose values are either the actual number of events for each cell type or the probability of each cell type
beta_alpha (float, int, or dict) – The alpha parameters of the beta distribution, which should be constrained to the positive reals. Defaults to 0.4.
beta_beta (float, int, or dict) – The beta parameters of the beta distribution, which should be constrained to the positive reals. Defaults to 0.4.
- Return type:
Tuple[dict, dict, dict, dict]
- Returns:
expression_matrices (dict) – The dictionary of expression matrices
labels (dict) – The dictionary of arrays of the corresponding cell type labels
pseudo_time (dict) – The dictionary of arrays of the positions on the differentiation paths
children_cell_labels (dict) – The dictionary of descendants towards which the cells are differentiating
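The accepted input forms for n_samples and cell_abundances can be made concrete with two small helpers. Both helpers are hypothetical and are not part of cytomulate. The sketch also assumes that batches are keyed by integer index and that a plain dictionary is reused for every batch.

```python
def expand_n_samples(n_samples, n_batches):
    """Return one sample count per batch (hypothetical helper)."""
    if isinstance(n_samples, int):
        # A single integer is used for all batches.
        return [n_samples] * n_batches
    counts = [int(n) for n in n_samples]
    if len(counts) != n_batches:
        raise ValueError("need one sample count per batch")
    return counts


def expand_cell_abundances(cell_abundances, n_batches):
    """Return a nested {batch: {cell_type: value}} dictionary (hypothetical helper)."""
    nested = all(isinstance(v, dict) for v in cell_abundances.values())
    if nested:
        return {batch: dict(values) for batch, values in cell_abundances.items()}
    # Assumption: a plain dictionary applies to every batch.
    return {batch: dict(cell_abundances) for batch in range(n_batches)}


plain = {"B": 0.25, "NK": 0.75}                                 # probabilities
nested = {0: {"B": 100, "NK": 300}, 1: {"B": 50, "NK": 50}}     # event counts per batch

per_batch_counts = expand_n_samples(1000, 2)
from_plain = expand_cell_abundances(plain, 2)
from_nested = expand_cell_abundances(nested, 2)
```

The values may be probabilities or event counts in either form; the sketch passes them through unchanged.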
- sample_one_batch(n_samples, cell_abundances=None, batch=0, beta_alpha=0.4, beta_beta=1.0)[source]
Draw random samples for one batch
- Parameters:
n_samples (int) – Number of samples
cell_abundances (dict or None) – A dictionary whose keys are the cell labels. The corresponding values should be either the actual number of events for each cell type or the probability of each cell type. If this is not provided, the one stored in the object will be used. Defaults to None.
batch (int) – The index of the batch for which we want to draw samples. Defaults to 0.
beta_alpha (float or int) – The alpha parameter of the beta distribution, which should be constrained to the positive reals. Defaults to 0.4.
beta_beta (float or int) – The beta parameter of the beta distribution, which should be constrained to the positive reals. Defaults to 1.0.
- Return type:
Tuple[ndarray, ndarray, ndarray, ndarray]
- Returns:
expression_matrix (np.ndarray) – The expression matrix
labels (np.ndarray) – The array of the corresponding cell type labels
pseudo_time (np.ndarray) – The array of the positions on the differentiation paths
children_cell_labels (np.ndarray) – The descendants towards which the cells are differentiating
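The documented defaults differ between the two sampling methods: (0.4, 0.4) for sample and (0.4, 1.0) for sample_one_batch. A Beta(a, b) distribution has support on (0, 1) and mean a / (a + b), so the two settings give noticeably different draws. The sketch below only compares the distributions with NumPy; it makes no claim about how the library uses the draws.

```python
import numpy as np

rng = np.random.default_rng(0)

# Defaults documented for sample_one_batch: alpha=0.4, beta=1.0.
draws_one_batch = rng.beta(0.4, 1.0, size=100_000)

# Defaults documented for sample: alpha=0.4, beta=0.4.
draws_all_batches = rng.beta(0.4, 0.4, size=100_000)

mean_one_batch = draws_one_batch.mean()      # theoretical mean: 0.4 / 1.4
mean_all_batches = draws_all_batches.mean()  # theoretical mean: 0.5
```

With beta=1.0 the mass concentrates near 0, while the symmetric (0.4, 0.4) setting is U-shaped with mass near both 0 and 1.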
- sample_to_pycytodata(n_samples, cell_abundances=None, beta_alpha=0.4, beta_beta=0.4)[source]
Draw random samples for all batches and return a PyCytoData object.
This method is a wrapper for the sample method but provides an interface to return a PyCytoData object.
- Parameters:
n_samples (int or list or np.ndarray) – Number of samples for each batch. If an integer is provided, then it will be used for all batches
cell_abundances (dict or None) – Either a nested dictionary whose keys are the batches and whose values are dictionaries mapping cell types to cell numbers or probabilities, or a plain dictionary whose keys are the cell labels and whose values are either the actual number of events for each cell type or the probability of each cell type
beta_alpha (float, int, or dict) – The alpha parameters of the beta distribution
beta_beta (float, int, or dict) – The beta parameters of the beta distribution
- Returns:
pcd – A PyCytoData object with the simulated data.
- Return type:
PyCytoData
- Raises:
ImportError – No PyCytoData installation present.
Note
PyCytoData is not compatible with storing pseudo_time and children_cell_labels. If you would like this information, use the traditional sample method instead.
Note
PyCytoData is an optional dependency. If an ImportError is raised, you need to install the package first. Tutorials can be found here: https://pycytodata.readthedocs.io/en/latest/installation.html.
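Calling code can guard against the missing optional dependency before deciding which sampling method to use. The following is a minimal sketch: require_optional is a hypothetical helper, and the import name "PyCytoData" is an assumption about how the package is installed.

```python
import importlib


def require_optional(module_name, install_hint):
    """Import an optional dependency or raise ImportError with a hint (hypothetical helper)."""
    try:
        return importlib.import_module(module_name)
    except ImportError as err:
        raise ImportError(f"{module_name} is not installed. {install_hint}") from err


try:
    # Assumption: the package is importable under the name "PyCytoData".
    pycytodata = require_optional(
        "PyCytoData",
        "See https://pycytodata.readthedocs.io/en/latest/installation.html",
    )
    has_pycytodata = True
except ImportError:
    # Fall back to the plain sample method, which needs no extra package.
    pycytodata = None
    has_pycytodata = False
```

When has_pycytodata is False, the sample method remains available and also returns pseudo_time and children_cell_labels.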