tree_structured.data_manipulator

Utility methods for splitting a dataset or injecting noise into it.

Module Contents

Classes

DataManipulator

The DataManipulator class provides utility methods for working with datasets.

class tree_structured.data_manipulator.DataManipulator

Bases: noisecut.tree_structured.base_structured_data.BaseStructuredData

The DataManipulator class provides utility methods for working with datasets.

x

Each row of the array x is a binary input.

Type:

{array-like, ndarray, dataframe} of shape (n_data, n_features)

y

Target output values, mapped one-to-one to the rows of the binary input x.

Type:

{array-like, ndarray, dataframe} of shape (n_data,)

n_data

Number of samples.

Type:

int

__set_xy(x: Any, y: Any) -> None

Validate x and y, then assign them to the x and y attributes of the class.

Parameters:
  • x ({array-like, ndarray, dataframe} of shape (n_data, n_features)) – Binary input array.

  • y ({array-like, ndarray, dataframe} of shape (n_data,)) – Target output values, mapped one-to-one to the rows of the binary input x.
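
The checks performed by this private method are not listed here. The following is a minimal sketch of the validation that the documented shapes imply, assuming x must be rectangular, y must have one entry per row of x, and all values must be binary. The helper name validate_xy is hypothetical and is not part of the library.

```python
def validate_xy(x, y):
    """Hypothetical helper: check shapes and binary values, return bool copies."""
    rows = [list(row) for row in x]
    labels = list(y)
    if not rows:
        raise ValueError("x must contain at least one sample")
    n_features = len(rows[0])
    if any(len(row) != n_features for row in rows):
        raise ValueError("every row of x must have the same number of features")
    if len(labels) != len(rows):
        raise ValueError("y must have one entry per row of x")
    # Assumption: only 0/1 (or False/True) count as valid binary values.
    for value in [v for row in rows for v in row] + labels:
        if value not in (0, 1):
            raise ValueError(f"non-binary value found: {value!r}")
    return [[bool(v) for v in row] for row in rows], [bool(v) for v in labels]


x_checked, y_checked = validate_xy([[1, 0], [0, 1]], [1, 0])
```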

static __generate_unique_random_numbers(start: int, end: int, num: int) -> numpy.typing.NDArray[numpy.int_]

Return an array of unique random numbers between start and end.

Parameters:
  • start (int) – Min value of the random numbers.

  • end (int) – Max value of the random numbers.

  • num (int) – Count of random numbers to be generated.

Returns:

Array of unique random numbers between start and end.

Return type:

ndarray of int of shape (num,)
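
Drawing unique values amounts to sampling without replacement. The sketch below illustrates the idea with the standard library only. It is not the library's implementation, the function name is hypothetical, and treating end as inclusive is an assumption that this page does not confirm.

```python
import random


def unique_random_numbers(start: int, end: int, num: int) -> list[int]:
    """Draw `num` distinct integers from start..end (end included by assumption)."""
    population = range(start, end + 1)
    if num > len(population):
        raise ValueError("cannot draw more unique values than the range holds")
    # random.sample draws without replacement, so no value can repeat.
    return random.sample(population, num)


picked = unique_random_numbers(0, 9, 4)
```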

split_data(x: list[bool] | numpy.typing.NDArray[numpy.bool_], y: list[bool] | numpy.typing.NDArray[numpy.bool_], percentage_training_data: float) -> tuple[numpy.typing.NDArray[numpy.bool_], numpy.typing.NDArray[numpy.bool_], numpy.typing.NDArray[numpy.bool_], numpy.typing.NDArray[numpy.bool_]]

Split the dataset x and y into a training set and a test set and return both.

Parameters:
  • x ({array-like, ndarray, dataframe} of shape (n_data, n_features)) – Binary input array.

  • y ({array-like, ndarray, dataframe} of shape (n_data,)) – Target output values, mapped one-to-one to the rows of the binary input x.

  • percentage_training_data (float [0:100]) – Percentage of the data assigned to the training dataset.

Returns:

  • x_train (ndarray of bool of shape (n_training_data, n_features)) – Binary input array for the training dataset.

  • y_train (ndarray of bool of shape (n_training_data,)) – Target output values for the training dataset.

  • x_test (ndarray of bool of shape (n_test_data, n_features)) – Binary input array for the test dataset.

  • y_test (ndarray of bool of shape (n_test_data,)) – Target output values for the test dataset.
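
To make the role of percentage_training_data concrete, here is a minimal standard-library sketch of a percentage-based split. It assumes the training rows are chosen at random without replacement and that the count is rounded to the nearest integer. Neither detail is stated on this page, and the function name and the seed parameter are hypothetical.

```python
import random


def split_by_percentage(x, y, percentage_training_data, seed=None):
    """Split paired rows and labels into (x_train, y_train, x_test, y_test)."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same number of samples")
    if not 0 <= percentage_training_data <= 100:
        raise ValueError("percentage_training_data must lie in [0, 100]")
    n_data = len(x)
    n_train = round(n_data * percentage_training_data / 100)
    rng = random.Random(seed)
    # Choose training indices without replacement; the rest become test data.
    train_idx = set(rng.sample(range(n_data), n_train))
    x_train = [x[i] for i in range(n_data) if i in train_idx]
    y_train = [y[i] for i in range(n_data) if i in train_idx]
    x_test = [x[i] for i in range(n_data) if i not in train_idx]
    y_test = [y[i] for i in range(n_data) if i not in train_idx]
    return x_train, y_train, x_test, y_test


# Eight samples with three binary features; the label copies the first feature.
x_demo = [[bool(i & 4), bool(i & 2), bool(i & 1)] for i in range(8)]
y_demo = [row[0] for row in x_demo]
x_train, y_train, x_test, y_test = split_by_percentage(x_demo, y_demo, 75, seed=0)
```

With 8 samples and 75 percent, 6 rows land in the training set and 2 in the test set, and each label stays paired with its row.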

get_noisy_data(x: list[bool] | numpy.typing.NDArray[numpy.bool_], y: list[bool] | numpy.typing.NDArray[numpy.bool_], percentage_noise: float) -> tuple[numpy.typing.NDArray[numpy.bool_], numpy.typing.NDArray[numpy.bool_]]

Return a noisy dataset obtained by adding noise to the given dataset x and y.

Parameters:
  • x ({array-like, ndarray, dataframe} of shape (n_data, n_features)) – Binary input.

  • y ({array-like, ndarray, dataframe} of shape (n_data,)) – Target output values, mapped one-to-one to the rows of the binary input x.

  • percentage_noise (float [0:100]) – Percentage of noise to add to the dataset.

Returns:

  • x (ndarray of bool of shape (n_data, n_features)) – Noisy binary input.

  • y (ndarray of bool of shape (n_data,)) – Noisy target output.
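
This page does not say how the noise is applied. One common scheme for binary labels, shown below purely as an illustration, flips the target value of a randomly chosen percentage of samples. That this matches get_noisy_data is an assumption, and the function name and the seed parameter are hypothetical.

```python
import random


def flip_labels(y, percentage_noise, seed=None):
    """Return a copy of y with `percentage_noise` percent of the labels inverted."""
    if not 0 <= percentage_noise <= 100:
        raise ValueError("percentage_noise must lie in [0, 100]")
    n_data = len(y)
    n_noisy = round(n_data * percentage_noise / 100)
    rng = random.Random(seed)
    # Assumption: noise means inverting the label of distinct, random samples.
    flip_idx = set(rng.sample(range(n_data), n_noisy))
    return [(not label) if i in flip_idx else bool(label)
            for i, label in enumerate(y)]


y_clean = [True, False, True, True, False, False, True, False, True, False]
y_noisy = flip_labels(y_clean, 20, seed=1)
n_changed = sum(a != b for a, b in zip(y_clean, y_noisy))
```

With 10 labels and 20 percent noise, exactly 2 labels differ between y_clean and y_noisy.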

write_data_in_file(x: list[bool] | numpy.typing.NDArray[numpy.bool_], y: list[bool] | numpy.typing.NDArray[numpy.bool_], file_name: str) -> None

Write the input dataset x and y to the file named file_name.

Parameters:
  • x ({array-like, ndarray, dataframe} of shape (n_data, n_features)) – Binary input.

  • y ({array-like, ndarray, dataframe} of shape (n_data,)) – Target output values, mapped one-to-one to the rows of the binary input x.

  • file_name (str) – Name of the file (with or without a path) in which to store the dataset x and y.
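
The on-disk layout is not specified on this page. The sketch below shows one plausible plain-text layout, assuming one sample per line with the feature bits followed by the label, all written as 0 or 1 and separated by spaces. Both the layout and the function name write_rows are hypothetical and may differ from what write_data_in_file produces.

```python
import os
import tempfile


def write_rows(x, y, file_name):
    """Write one sample per line: the feature bits, then the label."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same number of samples")
    with open(file_name, "w", encoding="utf-8") as handle:
        for row, label in zip(x, y):
            bits = " ".join(str(int(bit)) for bit in row)
            handle.write(f"{bits} {int(label)}\n")


x_rows = [[True, False, True], [False, False, True]]
y_rows = [True, False]
path = os.path.join(tempfile.mkdtemp(), "dataset.txt")
write_rows(x_rows, y_rows, path)

# Read the file back to inspect what was stored.
with open(path, encoding="utf-8") as handle:
    lines = handle.read().splitlines()
```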