ripser.ripser

ripser.ripser(X, maxdim=1, thresh=inf, coeff=2, distance_matrix=False, do_cocycles=False, metric='euclidean', n_perm=None)

Compute persistence diagrams for X.

X can be a data set of points or a distance matrix. When using a data set as X it will be converted to a distance matrix using the metric specified.

Parameters:

X (ndarray (n_samples, n_features)) – A numpy array of either point data or a distance matrix (for a distance matrix, also pass distance_matrix=True). Can also be a sparse distance matrix of type scipy.sparse.
maxdim (int, optional, default 1) – Maximum homology dimension computed. All dimensions less than or equal to this value are computed. For maxdim=1, H_0 and H_1 are computed.
thresh (float, default infinity) – Maximum distance considered when constructing the filtration. If infinity, compute the entire filtration.
coeff (int prime, default 2) – Compute homology with coefficients in the prime field Z/pZ for p=coeff.
distance_matrix (bool, optional, default False) – When True the input matrix X will be considered a distance matrix.
do_cocycles (bool, optional, default False) – When True, representative cocycles are computed and made available under the cocycles key of the returned dictionary.
metric (string or callable, optional, default "euclidean") –
Use this metric to compute distances between rows of X.

"euclidean", "manhattan" and "cosine" are built-in metrics that can be selected by name.

You can also provide a callable. It is called once for each pair of rows in X, with the two rows as arguments.

The computed distances are available in the result dictionary under the key dperm2all.
n_perm (int, optional, default None) – The number of points to subsample in a "greedy permutation," or a furthest point sampling of the points. These points are used in lieu of the full point cloud for a faster computation, at the expense of some accuracy, which can be bounded as a maximum bottleneck distance to all diagrams on the original point set.

Returns:

dict – The result of the computation.

Note

Each list in dgms has a corresponding list in cocycles.

>>> r = ripser(...)

For each dimension d and index k, r['dgms'][d][k] is the bar (birth, death) associated with the representative cocycle r['cocycles'][d][k].

The keys available in the dictionary are:

dgms: list (size maxdim + 1) of ndarray (n_pairs, 2)
For each dimension less than or equal to maxdim, a persistence diagram. Each row of a persistence diagram is a pair (birth time, death time).

cocycles: list (size maxdim + 1) of list of ndarray
For each dimension less than or equal to maxdim, a list of representative cocycles. Each representative cocycle in dimension d is represented as an ndarray of shape (k, d+2). Each nonzero value of the cocycle is laid out in a row: first the d+1 vertex indices of the simplex, then the value of the cocycle on the simplex. The indices of the simplex reference the original point cloud, even if a greedy permutation was used.

num_edges: int
The number of edges added during the computation

dperm2all: ndarray(n_samples, n_samples) or ndarray (n_perm, n_samples) if n_perm
The distance matrix used during the computation. When n_perm is not None, the distance matrix refers only to the subsampled dataset.
idx_perm: ndarray(n_perm) if n_perm > 0
Index into the original point cloud of the points used as a subsample in the greedy permutation
>>> r = ripser(X, n_perm=k)
>>> subsampling = X[r['idx_perm']]
r_cover: float
Covering radius of the subsampled points. If n_perm is None, the full point cloud was used and this is 0.
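
To illustrate what a greedy permutation and its covering radius mean, here is a minimal NumPy sketch of furthest point sampling. It is not ripser's implementation: the function name greedy_permutation, the choice of point 0 as the starting landmark, and the Euclidean metric are all assumptions made for the example.

```python
import numpy as np

def greedy_permutation(X, n_perm):
    """Furthest point sampling (hypothetical helper, Euclidean metric).

    Returns the indices of the sampled points and the covering radius,
    i.e. the largest distance from any point to its nearest sample.
    """
    idx = np.zeros(n_perm, dtype=int)  # idx[0] = 0: arbitrary starting point
    # Distance from every point to its nearest already-chosen sample.
    dist_to_set = np.linalg.norm(X - X[0], axis=1)
    for i in range(1, n_perm):
        # Next sample: the point furthest from all samples chosen so far.
        idx[i] = int(np.argmax(dist_to_set))
        new_dist = np.linalg.norm(X - X[idx[i]], axis=1)
        dist_to_set = np.minimum(dist_to_set, new_dist)
    return idx, float(dist_to_set.max())

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))  # illustrative point cloud
idx_20, r_20 = greedy_permutation(X, 20)
idx_40, r_40 = greedy_permutation(X, 40)
print(r_20, r_40)
```

Adding more samples can only shrink the covering radius, which matches the trade-off described for n_perm: fewer points give a faster computation at the cost of a coarser approximation.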

Examples

from ripser import ripser
from sklearn import datasets
from persim import plot_diagrams

data = datasets.make_circles(n_samples=110)[0]
dgms = ripser(data)['dgms']
plot_diagrams(dgms, show=True)
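
Since each diagram is a plain array of (birth, death) rows, it can also be inspected numerically rather than plotted. The sketch below uses a small hand-written array in place of real output; its values are hypothetical and chosen only to show the pattern.

```python
import numpy as np

# Hypothetical diagram: rows are (birth, death); np.inf marks a class
# that never dies within the filtration.
dgm = np.array([[0.10, 0.15],
                [0.20, 1.40],
                [0.30, np.inf]])

# Drop classes with infinite death before computing lifetimes.
finite = dgm[np.isfinite(dgm[:, 1])]
lifetimes = finite[:, 1] - finite[:, 0]

# The most persistent finite class is the row with the longest lifetime.
most_persistent = finite[np.argmax(lifetimes)]
print(lifetimes)
print(most_persistent)
```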

Raises:

ValueError – If the distance matrix is not square.
ValueError – When using both a greedy permutation and a sparse distance matrix.
ValueError – When n_perm value is bigger than the number of rows in the matrix.
ValueError – When n_perm is non positive.

Warns:

When using a square matrix without toggling `distance_matrix` to True.
When there are more columns than rows (as each row is a different data point).