ripser.ripser

ripser.ripser(X, maxdim=1, thresh=inf, coeff=2, distance_matrix=False, do_cocycles=False, metric='euclidean', n_perm=None)

Compute persistence diagrams for the data array X. If X is not a distance matrix, it is first converted to one using the chosen metric.

Parameters:
  • X (ndarray (n_samples, n_features)) – A numpy array holding either the data or a distance matrix. Can also be a sparse distance matrix of type scipy.sparse.
  • maxdim (int, optional, default 1) – Maximum homology dimension computed. All dimensions less than or equal to this value are computed; for maxdim=1, H_0 and H_1 are computed.
  • thresh (float, default infinity) – Maximum distance considered when constructing the filtration. If infinity, the entire filtration is computed.
  • coeff (int prime, default 2) – Compute homology with coefficients in the prime field Z/pZ for p=coeff.
  • distance_matrix (bool, default False) – Whether X is a distance matrix. If False, a distance matrix is computed from X using the chosen metric.
  • do_cocycles (bool, default False) – Whether to compute cocycles. If True, representative cocycles are computed and returned under the 'cocycles' key of the result dictionary.
  • metric (string or callable) – The metric to use when calculating distance between instances in a feature array. If metric is a string, it must be one of the options specified in pairwise_distances, including “euclidean”, “manhattan”, or “cosine”. Alternatively, if metric is a callable function, it is called on each pair of instances (rows) and the resulting value recorded. The callable should take two arrays from X as input and return a value indicating the distance between them.
  • n_perm (int) – The number of points to subsample in a “greedy permutation,” that is, a furthest point sampling of the points. These points are used in lieu of the full point cloud for a faster computation, at the expense of some accuracy, which can be bounded as a maximum bottleneck distance to all diagrams on the original point set.
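
The metric and n_perm parameters can be illustrated without the library. The following is a minimal standard-library sketch, not ripser's own implementation: pairwise_distances, manhattan, and greedy_permutation are hypothetical helper names, the starting index 0 is an assumption, and the returned values are only named after the idx_perm, r_cover, and dperm2all result keys to show how those quantities relate.

```python
def pairwise_distances(points, metric):
    """Build a dense symmetric distance matrix by calling metric on each pair of rows."""
    n = len(points)
    dist = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = float(metric(points[i], points[j]))
            dist[i][j] = dist[j][i] = d
    return dist


def manhattan(a, b):
    """Hypothetical callable metric: takes two rows and returns one number."""
    return sum(abs(x - y) for x, y in zip(a, b))


def greedy_permutation(dist, n_perm, start=0):
    """Furthest point sampling on a distance matrix (illustrative only).

    Returns (idx_perm, r_cover, dperm2all), named after the result keys.
    """
    n = len(dist)
    n_perm = min(n_perm, n)
    idx_perm = [start]
    # nearest[k] = distance from point k to its closest selected point
    nearest = list(dist[start])
    while len(idx_perm) < n_perm:
        # Pick the point that is furthest from everything selected so far.
        nxt = max(range(n), key=nearest.__getitem__)
        idx_perm.append(nxt)
        nearest = [min(a, b) for a, b in zip(nearest, dist[nxt])]
    r_cover = max(nearest)  # covering radius of the subsample
    dperm2all = [list(dist[i]) for i in idx_perm]  # shape (n_perm, n_samples)
    return idx_perm, r_cover, dperm2all


points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.0)]
dist = pairwise_distances(points, manhattan)
idx_perm, r_cover, dperm2all = greedy_permutation(dist, n_perm=3)
print(idx_perm, r_cover)  # [0, 4, 1] 1.0
```

Each new point is the one furthest from the current selection, so the covering radius shrinks as n_perm grows and reaches 0 once every point is selected.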
Returns:

  A dictionary holding all of the results of the computation, with the following keys:

  • 'dgms' (list (size maxdim + 1) of ndarray (n_pairs, 2)) – A list of persistence diagrams, one for each dimension up to and including maxdim. Each diagram is an ndarray of size (n_pairs, 2), with the first column holding the birth time and the second column the death time of each pair.
  • 'cocycles' (list (size maxdim + 1) of list of ndarray) – A list of representative cocycles in each dimension. The list in each dimension is parallel to the diagram in that dimension; that is, each entry is a representative cocycle of the corresponding point, expressed as an ndarray(K, d+2), where K is the number of nonzero values of the cocycle and d is the dimension of the cocycle. The first d+1 columns of each array are the vertex indices of a simplex in the (subsampled) point cloud, and the last column is the value of the cocycle at that simplex.
  • 'num_edges' (int) – The number of edges added during the computation.
  • 'dperm2all' (ndarray (n_samples, n_samples), or ndarray (n_perm, n_samples) if n_perm is set) – The distance matrix used in the computation if n_perm is None. Otherwise, the distances from all points in the permutation to all points in the dataset.
  • 'idx_perm' (ndarray (n_perm) if n_perm > 0) – Indices into the original point cloud of the points used as a subsample in the greedy permutation.
  • 'r_cover' (float) – Covering radius of the subsampled points. If n_perm <= 0, the full point cloud was used and this is 0.
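
To make the (n_pairs, 2) birth/death layout concrete, here is a small sketch that computes H_0 pairs of a Rips filtration by hand with a union-find structure. It is an illustration under stated assumptions, not the algorithm ripser uses: h0_diagram is a hypothetical helper, edges of length at most thresh are admitted, and components that are never merged are reported with an infinite death time.

```python
import math


def h0_diagram(dist, thresh=math.inf):
    """Return H_0 persistence pairs as [birth, death] rows (illustrative sketch)."""
    n = len(dist)
    parent = list(range(n))

    def find(i):
        # Follow parent links to the component root, halving the path on the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Edges enter the filtration in order of length; longer ones than thresh are ignored.
    edges = sorted(
        (dist[i][j], i, j)
        for i in range(n)
        for j in range(i + 1, n)
        if dist[i][j] <= thresh
    )
    diagram = []
    for length, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj
            # Every point is born at 0; one component dies when two merge.
            diagram.append([0.0, length])
    # Components that were never merged have no finite death time.
    survivors = n - len(diagram)
    diagram.extend([0.0, math.inf] for _ in range(survivors))
    return diagram


xs = [0.0, 1.0, 3.0, 7.0]  # four points on a line
dist = [[abs(a - b) for b in xs] for a in xs]
print(h0_diagram(dist))              # [[0.0, 1.0], [0.0, 2.0], [0.0, 4.0], [0.0, inf]]
print(h0_diagram(dist, thresh=2.5))  # [[0.0, 1.0], [0.0, 2.0], [0.0, inf], [0.0, inf]]
```

Lowering thresh drops the long edges, so in this sketch pairs that would have died beyond the threshold show up with an infinite death instead.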

Examples

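from ripser import ripser, plot_dgms
from sklearn import datasets

# make_circles returns (points, labels); keep only the point cloud
data = datasets.make_circles(n_samples=110)[0]
# With the default maxdim=1, 'dgms' holds the H_0 and H_1 diagrams
dgms = ripser(data)['dgms']
plot_dgms(dgms)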