ripser.ripser

ripser.ripser(X, maxdim=1, thresh=inf, coeff=2, distance_matrix=False, do_cocycles=False, metric='euclidean', n_perm=None)[source]

Compute persistence diagrams for X.
X can be a data set of points or a distance matrix. When X is a data set, it is converted to a distance matrix using the specified metric.
 Parameters
X (ndarray (n_samples, n_features)) – A numpy array of either point-cloud data or a distance matrix (in the latter case, also pass distance_matrix=True). Can also be a sparse distance matrix of type scipy.sparse.
maxdim (int, optional, default 1) – Maximum homology dimension computed. Will compute all dimensions lower than and equal to this value. For 1, H_0 and H_1 will be computed.
thresh (float, default infinity) – Maximum distance considered when constructing the filtration. If infinity, compute the entire filtration.
coeff (int prime, default 2) – Compute homology with coefficients in the prime field Z/pZ for p=coeff.
distance_matrix (bool, optional, default False) – When True the input matrix X will be considered a distance matrix.
do_cocycles (bool, optional, default False) – When True, computed cocycles will be available under the cocycles key of the return dictionary.
metric (string or callable, optional, default "euclidean") –
Use this metric to compute distances between rows of X.
"euclidean", "manhattan", and "cosine" are provided metrics that can be selected by name.
You can also provide a callable function; it will be called once for each pair of rows in X, with the two rows as its arguments.
The computed distance matrix will be available in the result dictionary under the key dperm2all.
n_perm (int, optional, default None) – The number of points to subsample in a "greedy permutation", or furthest point sampling, of the points. These points are used in lieu of the full point cloud for a faster computation, at the expense of some accuracy, which can be bounded as a maximum bottleneck distance to all diagrams on the original point set.
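Since ripser accepts either raw points or a precomputed matrix, here is a minimal sketch of building a distance matrix suitable for distance_matrix=True (assuming scipy is available; the final ripser call is shown only as a comment):

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform

# Illustrative point cloud; any (n_samples, n_features) array works.
X = np.random.default_rng(0).normal(size=(20, 2))

# Build the square, symmetric, zero-diagonal distance matrix that
# ripser expects when distance_matrix=True.
D = squareform(pdist(X, metric='euclidean'))

# The matrix could then be passed as (equivalent to ripser(X)):
#   dgms = ripser(D, distance_matrix=True)['dgms']
```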
 Returns
dict – The result of the computation.
Note

Each list in dgms has a corresponding list in cocycles.

>>> r = ripser(...)

For each dimension d and index k, r['dgms'][d][k] is the persistence pair associated to the representative cocycle r['cocycles'][d][k].

The keys available in the dictionary are:

dgms: list (size maxdim+1) of ndarray (n_pairs, 2)
For each dimension up to and including maxdim, a persistence diagram. Each point of a persistence diagram is a (birth time, death time) pair.

cocycles: list (size maxdim+1) of list of ndarray
For each dimension up to and including maxdim, a list of representative cocycles. Each representative cocycle in dimension d is an ndarray with d+2 columns: each non-zero value of the cocycle is laid out in a row, first the d+1 vertex indices of the simplex and then the value of the cocycle on that simplex. The indices of the simplex reference the original point cloud, even if a greedy permutation was used.
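The row layout of a representative cocycle can be unpacked with plain indexing. A minimal numpy sketch, using a hand-made 1-cocycle array rather than actual ripser output:

```python
import numpy as np

# Hypothetical representative 1-cocycle: each row lists the vertex
# indices of a simplex (here an edge) followed by the cocycle's
# value on that simplex.
cocycle = np.array([
    [0, 3, 1],
    [2, 5, 1],
    [1, 4, -1],
])

simplices = cocycle[:, :-1]  # vertex indices into the original point cloud
values = cocycle[:, -1]      # coefficient on each simplex

print(simplices.tolist())  # [[0, 3], [2, 5], [1, 4]]
print(values.tolist())     # [1, 1, -1]
```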
num_edges: int
The number of edges added during the computation.

dperm2all: ndarray (n_samples, n_samples), or ndarray (n_perm, n_samples) if n_perm is given
The distance matrix used during the computation. When n_perm is not None, the distance matrix only refers to the subsampled dataset.

idx_perm: ndarray (n_perm), if n_perm > 0
Index into the original point cloud of the points used as a subsample in the greedy permutation.

>>> r = ripser(X, n_perm=k)
>>> subsampling = X[r['idx_perm']]

r_cover: float
Covering radius of the subsampled points. If n_perm <= 0, the full point cloud was used and this is 0.
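The greedy permutation behind n_perm, idx_perm, and r_cover can be sketched in a few lines of numpy. This is an illustrative furthest-point-sampling implementation, not ripser's internal one (which may, for instance, pick a different starting point):

```python
import numpy as np

def greedy_permutation(D, n_perm):
    """Furthest point sampling on a square distance matrix D.

    Returns the chosen indices (cf. idx_perm) and the covering
    radius of the subsample (cf. r_cover).
    """
    idx = [0]                      # start from an arbitrary point
    dist_to_sample = D[0].copy()   # distance from each point to the sample
    for _ in range(n_perm - 1):
        nxt = int(np.argmax(dist_to_sample))  # furthest point from the sample
        idx.append(nxt)
        dist_to_sample = np.minimum(dist_to_sample, D[nxt])
    return np.array(idx), float(dist_to_sample.max())

# Points on a line: 0, 1, 2, ..., 9
X = np.arange(10.0)[:, None]
D = np.abs(X - X.T)
idx_perm, r_cover = greedy_permutation(D, 3)
print(idx_perm.tolist(), r_cover)  # [0, 9, 4] 2.0
```

Every point of the original cloud is within r_cover of the subsample, which is what bounds the bottleneck error of the diagrams computed on the subsample.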
Examples

>>> from ripser import ripser
>>> from sklearn import datasets
>>> from persim import plot_diagrams
>>> data = datasets.make_circles(n_samples=110)[0]
>>> dgms = ripser(data)['dgms']
>>> plot_diagrams(dgms, show=True)
 Raises
ValueError – If the distance matrix is not square.
ValueError – When using both a greedy permutation and a sparse distance matrix.
ValueError – When n_perm is larger than the number of rows in the matrix.
ValueError – When n_perm is non-positive.
 Warns
When the input matrix is square but distance_matrix has not been set to True.
When there are more columns than rows (since each row is a separate data point).