`tree_structured.data_manipulator`

Useful methods for splitting a dataset or inserting noise into it.

Module Contents

Classes

DataManipulator

The DataManipulator class provides useful methods for working with datasets.

class tree_structured.data_manipulator.DataManipulator

Bases: noisecut.tree_structured.base_structured_data.BaseStructuredData

The DataManipulator class provides useful methods for working with datasets.

x

Each row of the array x is a binary input.

Type: {array-like, ndarray, dataframe} of shape (n_data, n_features)

y

Target output values, one per row of the binary input x (one-to-one mapping).

Type: {array-like, ndarray, dataframe} of shape (n_data,)

n_data

Number of samples.

Type: int

__set_xy(x: Any, y: Any) → None

Validate x and y and store them in the x and y attributes of the instance.

Parameters:

x ({array-like, ndarray, dataframe} of shape (n_data, n_features)) – Binary input array.
y ({array-like, ndarray, dataframe} of shape (n_data,)) – Target output values, one per row of the binary input x.
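The reference does not list which checks `__set_xy` performs. The sketch below is a hypothetical stand-in, `validate_xy`, assuming "validate" means checking that x is 2-D, y is 1-D, both have n_data rows, and every value is binary; it is not the library's actual implementation.

```python
from __future__ import annotations

from typing import Any

import numpy as np


def validate_xy(x: Any, y: Any) -> tuple[np.ndarray, np.ndarray]:
    """Hypothetical stand-in for the checks DataManipulator.__set_xy might perform."""
    # np.asarray accepts lists, ndarrays and DataFrames alike.
    x_arr = np.asarray(x)
    y_arr = np.asarray(y)
    # x must be 2-D (n_data, n_features); y must be 1-D (n_data,).
    if x_arr.ndim != 2:
        raise ValueError(f"x must be 2-D, got shape {x_arr.shape}")
    if y_arr.ndim != 1:
        raise ValueError(f"y must be 1-D, got shape {y_arr.shape}")
    if x_arr.shape[0] != y_arr.shape[0]:
        raise ValueError("x and y must contain the same number of samples")
    # Only 0/1 (or False/True) values are accepted as binary data.
    if not (np.isin(x_arr, (0, 1)).all() and np.isin(y_arr, (0, 1)).all()):
        raise ValueError("x and y must be binary")
    return x_arr.astype(bool), y_arr.astype(bool)
```

Converting to `bool` at the end matches the boolean array types used in the other method signatures on this page.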

static __generate_unique_random_numbers(start: int, end: int, num: int) → numpy.typing.NDArray[numpy.int_]

Return an array of num unique random integers between start and end.

Parameters:

start (int) – Min value of the random numbers.
end (int) – Max value of the random numbers.
num (int) – Number of random numbers to generate.

Returns:

Array of num unique random numbers between start and end.

Return type:

ndarray of int of shape (num,)
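A minimal sketch of how unique random integers in a range can be drawn with NumPy. It assumes both bounds are inclusive (the reference does not say), and the `seed` parameter is a hypothetical addition for reproducibility; this is not necessarily how the private method is written.

```python
from __future__ import annotations

import numpy as np
import numpy.typing as npt


def generate_unique_random_numbers(
    start: int, end: int, num: int, seed: int | None = None
) -> npt.NDArray[np.int_]:
    """Return `num` distinct random integers from start to end.

    Assumption: both bounds are inclusive; `seed` is hypothetical.
    """
    pool_size = end - start + 1
    if num > pool_size:
        raise ValueError("num is larger than the number of integers available")
    rng = np.random.default_rng(seed)
    # Sampling without replacement is what guarantees uniqueness.
    return rng.choice(np.arange(start, end + 1), size=num, replace=False)
```

Sampling without replacement avoids the retry loop that drawing one number at a time and rejecting duplicates would need.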

split_data(x: list[bool] | numpy.typing.NDArray[numpy.bool_], y: list[bool] | numpy.typing.NDArray[numpy.bool_], percentage_training_data: float) → tuple[numpy.typing.NDArray[numpy.bool_], numpy.typing.NDArray[numpy.bool_], numpy.typing.NDArray[numpy.bool_], numpy.typing.NDArray[numpy.bool_]]

Split the dataset (x, y) into a training set and a test set.

Parameters:

x ({array-like, ndarray, dataframe} of shape (n_data, n_features)) – Binary input array.
y ({array-like, ndarray, dataframe} of shape (n_data,)) – Target output values, one per row of the binary input x.
percentage_training_data (float in [0, 100]) – Percentage of the samples to place in the training set.

Returns:

x_train (ndarray of bool of shape (n_training_data, n_features)) – Binary input array for the training set.
y_train (ndarray of bool of shape (n_training_data,)) – Target output values for the training set.
x_test (ndarray of bool of shape (n_test_data, n_features)) – Binary input array for the test set.
y_test (ndarray of bool of shape (n_test_data,)) – Target output values for the test set.
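The split can be sketched as a random permutation of sample indices followed by a cut at the training percentage. This standalone function is an illustration, not the method's source: rounding the training-set size down and the extra `seed` parameter are both assumptions.

```python
from __future__ import annotations

import numpy as np


def split_data(x, y, percentage_training_data: float, seed: int | None = None):
    """Shuffle (x, y) and split it into training and test sets (illustrative sketch)."""
    x = np.asarray(x, dtype=bool)
    y = np.asarray(y, dtype=bool)
    if not 0 <= percentage_training_data <= 100:
        raise ValueError("percentage_training_data must lie in [0, 100]")
    n_data = x.shape[0]
    # Assumption: the training-set size is rounded down.
    n_train = int(n_data * percentage_training_data / 100)
    rng = np.random.default_rng(seed)
    order = rng.permutation(n_data)  # random order of sample indices
    train_idx, test_idx = order[:n_train], order[n_train:]
    # Indexing x and y with the same indices keeps each input paired with its target.
    return x[train_idx], y[train_idx], x[test_idx], y[test_idx]
```

With 10 samples and `percentage_training_data=70`, this yields 7 training and 3 test samples.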

get_noisy_data(x: list[bool] | numpy.typing.NDArray[numpy.bool_], y: list[bool] | numpy.typing.NDArray[numpy.bool_], percentage_noise: float) → tuple[numpy.typing.NDArray[numpy.bool_], numpy.typing.NDArray[numpy.bool_]]

Return a noisy version of the dataset (x, y), obtained by adding noise to the given data.

Parameters:

x ({array-like, ndarray, dataframe} of shape (n_data, n_features)) – Binary input.
y ({array-like, ndarray, dataframe} of shape (n_data,)) – Target output values, one per row of the binary input x.
percentage_noise (float in [0, 100]) – Percentage of noise to add to the dataset.

Returns:

x (ndarray of bool of shape (n_data, n_features)) – Noisy binary input.
y (ndarray of bool of shape (n_data,)) – Noisy target output.
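The reference does not define the noise model. One common choice for binary targets is label flipping, sketched below under that assumption: percentage_noise % of the samples, chosen without replacement, have their target inverted, and x is returned as an unchanged copy. The `seed` parameter is hypothetical, and the library may add noise differently.

```python
from __future__ import annotations

import numpy as np


def get_noisy_data(x, y, percentage_noise: float, seed: int | None = None):
    """Flip the target of a random percentage_noise % of samples (assumed noise model)."""
    x_noisy = np.array(x, dtype=bool)  # np.array copies, so the caller's x is untouched
    y_noisy = np.array(y, dtype=bool)
    if not 0 <= percentage_noise <= 100:
        raise ValueError("percentage_noise must lie in [0, 100]")
    n_noisy = int(len(y_noisy) * percentage_noise / 100)
    rng = np.random.default_rng(seed)
    # Distinct indices, so no label is flipped twice (which would undo the noise).
    idx = rng.choice(len(y_noisy), size=n_noisy, replace=False)
    y_noisy[idx] = ~y_noisy[idx]
    return x_noisy, y_noisy
```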

write_data_in_file(x: list[bool] | numpy.typing.NDArray[numpy.bool_], y: list[bool] | numpy.typing.NDArray[numpy.bool_], file_name: str) → None

Write the dataset (x, y) to a file named file_name.

Parameters:

x ({array-like, ndarray, dataframe} of shape (n_data, n_features)) – Binary input.
y ({array-like, ndarray, dataframe} of shape (n_data,)) – Target output values, one per row of the binary input x.
file_name (str) – Name of the file, optionally including its path, in which to store the dataset x and y.
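The on-disk format is not documented here. The sketch below assumes a hypothetical plain-text layout, one sample per line with the bits of x followed by y, written as 0/1 and separated by spaces; the real method may use a different layout.

```python
import numpy as np


def write_data_in_file(x, y, file_name: str) -> None:
    """Write one sample per line: x bits then y, as space-separated 0/1 (assumed layout)."""
    x = np.asarray(x, dtype=bool)
    y = np.asarray(y, dtype=bool)
    # column_stack appends y as the last column of x.
    table = np.column_stack([x, y]).astype(int)
    np.savetxt(file_name, table, fmt="%d", delimiter=" ")
```

For example, x = [[1, 0], [0, 1]] and y = [True, False] produce the lines `1 0 1` and `0 1 0`.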
