DataLoader

class PAsampling.utils.DataLoader(save_path=None)[source]

A class used to load and preprocess various datasets.

Parameters:

save_path (str, optional): The directory where the datasets will be saved. If None (the default), the current working directory is used.

Methods:

unzip_file(file_path, extract_to='./data'):

Unzips a compressed file (zip or tar) to a specified directory.

download_data(url, save_path):

Downloads data from a specified URL and saves it to a specified path.

QM7_dataset(preprocessing=True):

Downloads and processes the QM7 dataset, with optional preprocessing.

Power_Grid_dataset(normalize=True):

Loads and preprocesses the Power Grid dataset, with optional normalization.

Power_Grid_dataset(normalize=True)[source]

Loads and preprocesses the Power Grid dataset. This method downloads the dataset from the UCI repository if it is not already present, extracts the archive, and loads the data into a pandas DataFrame. It then selects specific features and the target label, optionally normalizes the features, and returns the feature vectors and labels.

Parameters:

normalize (bool): If True, the feature vectors will be normalized using MinMaxScaler. Default is True.

Returns:

tuple: A tuple containing:
  • features (numpy.ndarray): The feature vectors.

  • labels (numpy.ndarray): The target labels.
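The normalization step can be illustrated without downloading anything. The sketch below reproduces in plain NumPy what MinMaxScaler does with its default settings: each feature column is rescaled to the range [0, 1]. The helper name min_max_normalize and the toy array are hypothetical and are not part of PAsampling; the library itself is documented as using MinMaxScaler.

```python
import numpy as np


def min_max_normalize(features):
    """Rescale each column to [0, 1] (hypothetical helper, not part of PAsampling)."""
    features = np.asarray(features, dtype=float)
    col_min = features.min(axis=0)
    col_range = features.max(axis=0) - col_min
    # Guard against division by zero: constant columns are mapped to 0.
    col_range[col_range == 0] = 1.0
    return (features - col_min) / col_range


# Toy feature matrix: 3 samples, 2 features (made-up values).
raw = np.array([[1.0, 10.0],
                [2.0, 20.0],
                [3.0, 40.0]])
scaled = min_max_normalize(raw)
print(scaled)
```

After scaling, every column has minimum 0 and maximum 1, so features with different units contribute on a comparable scale.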

QM7_dataset(preprocessing=True)[source]

Downloads and processes the QM7 dataset.

Parameters:

preprocessing (bool): If True, extracts the upper triangular entries of each matrix in the dataset. If False, reshapes the matrices into vectors. Default is True.

Returns:

tuple: A tuple containing:
  • features (np.ndarray): The processed feature matrix.

  • labels (np.ndarray): The labels corresponding to the feature matrix.
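The two preprocessing modes can be sketched on toy data as follows. This is an illustration under stated assumptions, not the library's implementation: the function names are hypothetical, and whether the diagonal is included in the upper triangular entries is an assumption (it is included here by default).

```python
import numpy as np


def upper_triangle_features(matrices, include_diagonal=True):
    """Keep only the upper triangular entries of each square matrix.

    Hypothetical helper; including the diagonal is an assumption.
    """
    matrices = np.asarray(matrices)
    n = matrices.shape[-1]
    k = 0 if include_diagonal else 1
    rows, cols = np.triu_indices(n, k=k)
    return matrices[:, rows, cols]


def flatten_features(matrices):
    """Reshape each matrix into a single row vector (hypothetical helper)."""
    matrices = np.asarray(matrices)
    return matrices.reshape(matrices.shape[0], -1)


# Two symmetric 3x3 matrices standing in for the dataset (made-up values).
toy = np.array([
    [[1, 2, 3], [2, 4, 5], [3, 5, 6]],
    [[7, 8, 9], [8, 10, 11], [9, 11, 12]],
], dtype=float)

upper = upper_triangle_features(toy)  # shape (2, 6)
flat = flatten_features(toy)          # shape (2, 9)
print(upper.shape, flat.shape)
```

For symmetric matrices, the upper triangle carries the same information as the full matrix with roughly half the entries, which is a common reason to prefer it over plain flattening.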

unzip_file(file_path, extract_to='./data')[source]

Unzips a compressed file (zip or tar) to a specified directory.

Parameters:
  • file_path – Path to the compressed file.

  • extract_to – Directory to extract the files to.
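A minimal sketch of how zip and tar extraction might be handled with the standard library is shown below. The function name extract_archive is hypothetical and the actual implementation in PAsampling may differ; only the parameter names and the './data' default are taken from the signature above.

```python
import os
import tarfile
import tempfile
import zipfile


def extract_archive(file_path, extract_to="./data"):
    """Extract a zip or tar archive (hypothetical sketch, not the library code)."""
    os.makedirs(extract_to, exist_ok=True)
    if zipfile.is_zipfile(file_path):
        with zipfile.ZipFile(file_path) as archive:
            archive.extractall(extract_to)
    elif tarfile.is_tarfile(file_path):
        # Only extract tar archives from sources you trust.
        with tarfile.open(file_path) as archive:
            archive.extractall(extract_to)
    else:
        raise ValueError(f"Unsupported archive format: {file_path}")


# Demonstration on a throwaway zip file created in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    archive_path = os.path.join(tmp, "example.zip")
    with zipfile.ZipFile(archive_path, "w") as archive:
        archive.writestr("hello.txt", "hello")

    target = os.path.join(tmp, "out")
    extract_archive(archive_path, extract_to=target)
    extracted = sorted(os.listdir(target))
    print(extracted)
```

Detecting the format from the file contents rather than the extension keeps the sketch robust to archives with unusual or missing suffixes.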