Introduction
Passive Sampling (PAsampling) is a repository that provides quick access to existing and novel tools and resources for data sampling. The project aims to make data sampling techniques easier to implement and to offer insight into key aspects of data selection in machine learning, with a particular focus on selecting training data to optimize regression model performance. The term "Passive" means that the library focuses mainly on selection approaches that rely solely on data feature representations and involve no active learning procedures, which require iteratively training one or more models. The library also provides tools for building machine learning experiment pipelines.
Features
- PAsampling includes several data sampling methods:
- ML pipeline tools:
Installation
To install the PAsampling package, either install it from PyPI or clone the repository and install it, together with its required dependencies, from source:
Install via PyPI
pip install PAsampling
Install via Git
git clone https://github.com/PaClimaco/PAsampling.git
cd PAsampling
pip install .
Usage
Here is a basic example of how to use PAsampling:
from PAsampling import *
# Example usage: Farthest Point Sampling (FPS) on the QM7 dataset
datasets = DataLoader('./data') # dataset loader for the ./data directory
x, labels = datasets.QM7_dataset()
fps_sampler = FPS() # FPS sampler class
fps_indices = fps_sampler.fit(x, initial_subset=[0], b_samples=100) # Fit FPS to data matrix
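For readers unfamiliar with Farthest Point Sampling, the idea behind the example above can be sketched in plain NumPy. This is a generic illustration of the greedy algorithm, not the code used inside PAsampling. The function name `farthest_point_sampling` and its `n_samples` parameter are hypothetical. Here `n_samples` is the total number of returned indices, including the initial ones, which may differ from what `b_samples` means in the library.

```python
import numpy as np


def farthest_point_sampling(x, initial_subset, n_samples):
    """Greedy farthest point sampling (generic sketch, not PAsampling's code).

    x: array of shape (n_points, n_features).
    initial_subset: list of indices to start from.
    n_samples: total number of indices to return, including the initial ones.
    """
    x = np.asarray(x, dtype=float)
    n_samples = min(n_samples, len(x))  # cannot select more points than exist
    selected = list(initial_subset)

    # Distance from every point to its nearest already-selected point.
    min_dist = np.full(len(x), np.inf)
    for idx in selected:
        min_dist = np.minimum(min_dist, np.linalg.norm(x - x[idx], axis=1))

    while len(selected) < n_samples:
        # Pick the point that is farthest from the current selection.
        nxt = int(np.argmax(min_dist))
        selected.append(nxt)
        min_dist = np.minimum(min_dist, np.linalg.norm(x - x[nxt], axis=1))
    return selected


# Toy data: four points on a line (assumed for illustration only).
points = np.array([[0.0], [1.0], [2.0], [10.0]])
indices = farthest_point_sampling(points, initial_subset=[0], n_samples=3)
print(indices)  # [0, 3, 2]
```

The selection uses only the feature matrix and never looks at labels or a trained model, which is what makes the approach passive in the sense described in the introduction.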
Tutorials
- Explore the tutorials to learn how to use the PAsampling library tools and gain key insights into data sampling in machine learning.
Contributing
We welcome contributions! Please read our contributing guidelines to get started. All contributors will be acknowledged and credited.
Contact
For any questions or inquiries, please contact us at climaco@ins.uni-bonn.de.
Dependencies
Some of the functions implemented in PAsampling are wrappers around functions from the following existing libraries: