DataSelector

class PAsampling.utils.DataSelector(X, save_path, strategies=None, trainig_set_sizes=None, initial_conditions=None, knn=100, mu=3, ratios=None, gamma_FacLocG=1)

Selects data subsets based on various strategies and saves the indices to an HDF5 file.

Parameters:

X : numpy.ndarray

The input data array, of shape (n_points, n_features).

save_path : str

The path where the HDF5 file will be saved.

strategies : list of str, optional

List of strategies to use for data selection. Possible values include ‘DAFPS’, ‘FPS’, ‘RDM’, ‘k-medoids++’, ‘FacilityLocation’, ‘Twinning’, ‘FPS-k-medoids++’, ‘FPS-FacLoc’, ‘FPS-RDM’, ‘FacLoc-G’.

trainig_set_sizes : list of int, optional

List of training set sizes to be used for each strategy.

initial_conditions : list, optional

List of initial conditions for the data selection strategies.

knn : int, optional

Number of nearest neighbors to consider for the DAFPS strategy. Default is 100.

mu : int, optional

Hyperparameter for the DAFPS and FPS-(method) strategies. Default is 3.

ratios : list of float, optional

List of ratios for the Twinning strategy.

gamma_FacLocG : float, optional

Gamma parameter for the FacilityLocation strategy with a Gaussian metric (‘FacLoc-G’). Default is 1.

Returns:

None
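
Example:

To give a sense of what a selection strategy produces, the sketch below implements generic farthest point sampling, the idea behind the ‘FPS’ option, in plain Python. It is an illustration only and is not the PAsampling implementation: the function name farthest_point_sampling and its arguments are hypothetical, points are taken to be tuples of floats, and the Euclidean metric is an assumption. It returns a list of row indices, which is the kind of result the class is documented to save.

```python
import math


def farthest_point_sampling(points, n_select, start_index=0):
    """Return indices of n_select points chosen by greedy farthest point sampling.

    Hypothetical helper for illustration; not part of PAsampling.
    """
    if not 0 < n_select <= len(points):
        raise ValueError("n_select must be between 1 and len(points)")

    selected = [start_index]
    # min_dist[i] is the distance from point i to its nearest selected point.
    min_dist = [math.dist(p, points[start_index]) for p in points]

    while len(selected) < n_select:
        # Pick the point that is farthest from everything selected so far.
        next_index = max(range(len(points)), key=min_dist.__getitem__)
        selected.append(next_index)
        # Update nearest-selected distances with the newly added point.
        for i, p in enumerate(points):
            d = math.dist(p, points[next_index])
            if d < min_dist[i]:
                min_dist[i] = d

    return selected


# Four points on a line; start from index 0 and select three of them.
points = [(0.0, 0.0), (1.0, 0.0), (10.0, 0.0), (5.0, 0.0)]
indices = farthest_point_sampling(points, n_select=3)
```

Each new index maximizes the distance to the points already chosen, so the selected subset spreads across the data instead of clustering in dense regions.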