DataSelector
- class PAsampling.utils.DataSelector(X, save_path, strategies=None, trainig_set_sizes=None, initial_conditions=None, knn=100, mu=3, ratios=None, gamma_FacLocG=1)
Selects data subsets based on various strategies and saves the indices to an HDF5 file.
Parameters:
- X : numpy.ndarray
The input data array of shape (n_points, n_features).
- save_path : str
The path where the HDF5 file will be saved.
- strategies : list of str, optional
List of strategies to use for data selection. Possible values include ‘DAFPS’, ‘FPS’, ‘RDM’, ‘k-medoids++’, ‘FacilityLocation’, ‘Twinning’, ‘FPS-k-medoids++’, ‘FPS-FacLoc’, ‘FPS-RDM’, ‘FacLoc-G’.
- trainig_set_sizes : list of int, optional
List of training set sizes to be used for each strategy.
- initial_conditions : list, optional
List of initial conditions for the data selection strategies.
- knn : int, optional
Number of nearest neighbors to consider for the DAFPS strategy. Default is 100.
- mu : int, optional
Hyperparameter for the DAFPS strategy and the combined FPS-(method) strategies (‘FPS-k-medoids++’, ‘FPS-FacLoc’, ‘FPS-RDM’). Default is 3.
- ratios : list of float, optional
List of ratios for the Twinning strategy.
- gamma_FacLocG : float, optional
Gamma parameter for the FacilityLocation strategy with a Gaussian metric. Default is 1.
Returns:
None
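Example:
To give a feel for what an index-selecting strategy returns, the sketch below implements two of the simplest ideas named above, greedy farthest point sampling and uniform random selection, in plain Python. It is an illustration only and not the PAsampling implementation: the function names `farthest_point_sampling` and `random_sampling`, the `first_index` and `seed` arguments, and the use of Euclidean distance are all assumptions made for this example.

```python
import math
import random


def farthest_point_sampling(points, n_select, first_index=0):
    """Greedy farthest point sampling (hypothetical helper, not library code).

    Returns a list of indices into `points`, in the order they were picked.
    """
    if not 0 < n_select <= len(points):
        raise ValueError("n_select must be between 1 and len(points)")
    selected = [first_index]
    # Distance from every point to its nearest already-selected point.
    min_dist = [math.dist(p, points[first_index]) for p in points]
    while len(selected) < n_select:
        # Pick the point that is farthest from the current selection.
        next_index = max(range(len(points)), key=min_dist.__getitem__)
        selected.append(next_index)
        # Update nearest-selected distances with the newly added point.
        for i, p in enumerate(points):
            d = math.dist(p, points[next_index])
            if d < min_dist[i]:
                min_dist[i] = d
    return selected


def random_sampling(points, n_select, seed=0):
    """Uniform random selection without replacement (hypothetical helper)."""
    rng = random.Random(seed)
    return sorted(rng.sample(range(len(points)), n_select))


# Toy data set: five points in two dimensions, i.e. shape (n_points, n_features).
points = [(0.0, 0.0), (0.1, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]

fps_indices = farthest_point_sampling(points, 3)
rdm_indices = random_sampling(points, 3)
print("FPS indices:", fps_indices)
print("Random indices:", rdm_indices)
```

In this toy case the farthest-point selection spreads over the corners of the unit square and skips the near-duplicate point at index 1, whereas a random draw has no such preference. The `first_index` argument plays a role loosely analogous to an initial condition, though how `initial_conditions` is interpreted by DataSelector is not specified here.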