Introduction

Passive Sampling (PAsampling) is a repository that provides fast, easy access to existing and novel tools and resources for data sampling. The project aims to simplify the implementation of data sampling techniques and to offer insight into key aspects of data selection in machine learning, with a particular focus on selecting training data to optimize regression model performance. The term “Passive” indicates that the library focuses mainly on selection approaches that rely solely on the feature representations of the data and involve no active learning procedure, that is, no iterative training of one or more models. The library also provides tools for building machine learning experiment pipelines.

Features

Installation

To install the PAsampling package, either install it from PyPI or clone the repository and install it from source:

Install via PyPI

pip install PAsampling

Install via Git

git clone https://github.com/PaClimaco/PAsampling.git
cd PAsampling
pip install .

Usage

Here is a basic example of how to use PAsampling:

from PAsampling import *

# Example usage: Farthest Point Sampling (FPS) on the QM7 dataset
datasets = DataLoader('./data')  # DataLoader instance reading from ./data
x, labels = datasets.QM7_dataset()  # data matrix and labels
fps_sampler = FPS()  # FPS sampler class
fps_indices = fps_sampler.fit(x, initial_subset=[0], b_samples=100)  # Fit FPS to the data matrix, starting from index 0
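
For readers unfamiliar with Farthest Point Sampling, the following is a minimal NumPy sketch of the general greedy technique. It is not the PAsampling implementation: the function name `farthest_point_sampling` and the parameter `n_samples` are hypothetical, the Euclidean metric is an assumption, and `n_samples` is taken here to mean the total size of the returned subset, which may differ from how `b_samples` is interpreted by the library.

```python
import numpy as np


def farthest_point_sampling(x, initial_subset, n_samples):
    """Greedy farthest point sampling (illustrative sketch, not PAsampling's code).

    Starting from `initial_subset`, repeatedly add the point whose Euclidean
    distance to its nearest already-selected point is largest.
    """
    x = np.asarray(x, dtype=float)
    selected = list(initial_subset)
    n_samples = min(n_samples, len(x))  # cannot select more points than exist

    # min_dist[i] = distance from point i to its nearest selected point
    min_dist = np.full(len(x), np.inf)
    for idx in selected:
        min_dist = np.minimum(min_dist, np.linalg.norm(x - x[idx], axis=1))

    while len(selected) < n_samples:
        next_idx = int(np.argmax(min_dist))  # farthest point from the current subset
        selected.append(next_idx)
        min_dist = np.minimum(min_dist, np.linalg.norm(x - x[next_idx], axis=1))

    return selected


# Synthetic data stands in for a real feature matrix such as QM7.
rng = np.random.default_rng(0)
points = rng.normal(size=(500, 8))
indices = farthest_point_sampling(points, initial_subset=[0], n_samples=100)
```

Only the feature matrix is used to choose the indices; no labels are read and no model is trained, which is what makes the selection passive.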

Tutorials

Explore the tutorials to learn how to use the PAsampling library tools and gain key insights into data sampling in machine learning.

Contributing

We welcome contributions! Please read our contributing guidelines to get started. All contributors will be acknowledged and credited.

Contact

For any questions or inquiries, please contact us at climaco@ins.uni-bonn.de.

Dependencies

Some of the functions implemented in PAsampling are wrappers around functions from the following existing libraries: