Introduction

Passive Sampling (PAsampling) is a repository that provides quick and easy access to existing and novel tools and resources for data sampling. The project aims to simplify the implementation of data sampling techniques and to offer insight into key aspects of data selection in machine learning, with a particular focus on selecting training data to optimize regression model performance. The term “Passive” refers to the library's focus on selection approaches that rely solely on the feature representations of the data and do not involve active learning procedures, which require iteratively training one or more models. The library also provides tools for building machine learning experiment pipelines.

Features

PAsampling includes several data sampling methods and the following ML pipeline tools:
- DataLoader
- DataSelector

Installation

You can install the PAsampling package either from PyPI or by cloning the repository and installing it locally:

Install via PyPI

pip install PAsampling

Install via Git

git clone https://github.com/PaClimaco/PAsampling.git
cd PAsampling
pip install .

Usage

Here is a basic example of how to use PAsampling:

from PAsampling import DataLoader, FPS

# Example usage: Farthest Point Sampling (FPS) on the QM7 dataset
datasets = DataLoader('./data')  # DataLoader instance for the ./data directory
x, labels = datasets.QM7_dataset()  # data matrix and labels
fps_sampler = FPS()  # FPS sampler instance
fps_indices = fps_sampler.fit(x, initial_subset=[0], b_samples=100)  # Fit FPS to the data matrix
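
To give a feel for what Farthest Point Sampling does, here is a minimal NumPy sketch of the greedy procedure. It is an illustration only and not PAsampling's implementation: the function name farthest_point_sampling and the n_samples parameter (taken here to mean the total number of selected points, including the initial subset) are assumptions made for this example, as is the use of Euclidean distance.

```python
import numpy as np


def farthest_point_sampling(x, initial_subset, n_samples):
    """Greedy FPS sketch (hypothetical helper, not the PAsampling API).

    Returns a list of n_samples row indices of x, starting from
    initial_subset and repeatedly adding the point that is farthest
    from the points selected so far.
    """
    x = np.asarray(x, dtype=float)
    if n_samples > len(x):
        raise ValueError("n_samples cannot exceed the number of points")

    selected = list(initial_subset)
    # Distance from each point to its nearest already-selected point.
    min_dist = np.full(len(x), np.inf)
    for idx in selected:
        min_dist = np.minimum(min_dist, np.linalg.norm(x - x[idx], axis=1))

    while len(selected) < n_samples:
        # Pick the point whose nearest selected neighbour is farthest away.
        nxt = int(np.argmax(min_dist))
        selected.append(nxt)
        min_dist = np.minimum(min_dist, np.linalg.norm(x - x[nxt], axis=1))
    return selected


# Synthetic data stands in for a real feature matrix.
rng = np.random.default_rng(0)
points = rng.normal(size=(500, 8))
indices = farthest_point_sampling(points, initial_subset=[0], n_samples=100)
```

Because the selection depends only on the feature matrix, no model is trained during sampling, which is what makes the approach passive. The returned indices can then be used to pick the training subset, for example points[indices].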

Tutorials

Explore the tutorials to learn how to use the PAsampling library tools and gain key insights into data sampling in machine learning.

Contributing

We welcome contributions! Please read our contributing guidelines to get started. All contributors will be acknowledged and credited.

Contact

For any questions or inquiries, please contact us at climaco@ins.uni-bonn.de.

Dependencies

Some of the functions implemented in PAsampling are wrappers around functions from the following existing libraries:

- Apricot (MIT License)
- twinning (Apache 2.0 License)