tree_structured.data_manipulator
Useful methods to split a dataset or insert noise into it.
Module Contents
Classes
DataManipulator class provides some useful methods to work with datasets.
- class tree_structured.data_manipulator.DataManipulator
Bases:
noisecut.tree_structured.base_structured_data.BaseStructuredData
DataManipulator class provides some useful methods to work with datasets.
- x
Each row of the array x is a binary input.
- Type:
{array-like, ndarray, dataframe} of shape (n_data, n_features)
- y
Target output values, in one-to-one correspondence with the rows of the binary input x.
- Type:
{array-like, ndarray, dataframe} of shape (n_data,)
- n_data
Number of samples.
- Type:
int
- __set_xy(x: Any, y: Any) None
Validate x and y, then store them in the x and y attributes of the class.
- Parameters:
x ({array-like, ndarray, dataframe} of shape (n_data, n_features)) –
y ({array-like, ndarray, dataframe} of shape (n_data,)) –
- static __generate_unique_random_numbers(start: int, end: int, num: int) numpy.typing.NDArray[numpy.int_]
Return an array of unique random numbers between start and end.
- Parameters:
start (int) – Min value of the random numbers.
end (int) – Max value of the random numbers.
num (int) – Count of random numbers to be generated.
- Returns:
Array of unique random numbers between start and end.
- Return type:
ndarray of int of shape (num,)
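The idea of drawing unique random numbers from a range can be sketched with the standard library alone. This is a minimal illustration, not the library's implementation: the function name `unique_random_numbers`, the `seed` parameter, the use of a plain list instead of an ndarray, and the choice to treat both `start` and `end` as inclusive are all assumptions.

```python
import random


def unique_random_numbers(start: int, end: int, num: int, seed=None) -> list:
    """Hypothetical helper: draw `num` distinct integers from start..end.

    Assumption: both bounds are inclusive, matching the "min value" and
    "max value" wording of the parameter descriptions.
    """
    if num > end - start + 1:
        raise ValueError("num exceeds the number of distinct values in the range")
    rng = random.Random(seed)
    # random.sample draws without replacement, so every value is unique.
    return rng.sample(range(start, end + 1), num)


picked = unique_random_numbers(0, 9, 4, seed=0)
print(picked)
```

Sampling without replacement guarantees uniqueness directly, which avoids a draw-and-retry loop.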
- split_data(x: list[bool] | numpy.typing.NDArray[numpy.bool_], y: list[bool] | numpy.typing.NDArray[numpy.bool_], percentage_training_data: float) tuple[numpy.typing.NDArray[numpy.bool_], numpy.typing.NDArray[numpy.bool_], numpy.typing.NDArray[numpy.bool_], numpy.typing.NDArray[numpy.bool_]]
Return training and test datasets by splitting the dataset x and y.
- Parameters:
x ({array-like, ndarray, dataframe} of shape (n_data, n_features)) – Binary input array.
y ({array-like, ndarray, dataframe} of shape (n_data,)) – Target output values, in one-to-one correspondence with the rows of the binary input x.
percentage_training_data (float [0:100]) – Percentage of the data to place in the training dataset.
- Returns:
x_train (ndarray of bool of shape (n_training_data, n_features)) – Binary input array for the training dataset.
y_train (ndarray of bool of shape (n_training_data,)) – Target output values for the training dataset.
x_test (ndarray of bool of shape (n_test_data, n_features)) – Binary input array for the test dataset.
y_test (ndarray of bool of shape (n_test_data,)) – Target output values for the test dataset.
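A percentage-based split of this kind can be sketched as follows. This is only an illustration under stated assumptions, not the library's code: the name `split_rows`, the `seed` parameter, the random choice of training rows, and rounding the training size down are all hypothetical, and plain lists stand in for ndarrays. The return order follows the documented one (x_train, y_train, x_test, y_test).

```python
import random


def split_rows(x, y, percentage_training_data, seed=None):
    """Hypothetical split: send a percentage of the rows to the training set."""
    if not 0 <= percentage_training_data <= 100:
        raise ValueError("percentage_training_data must be within [0, 100]")
    if len(x) != len(y):
        raise ValueError("x and y must contain the same number of rows")
    n_data = len(x)
    # Assumption: the training size is rounded down to a whole number of rows.
    n_train = int(n_data * percentage_training_data / 100)
    rng = random.Random(seed)
    train_idx = set(rng.sample(range(n_data), n_train))
    x_train = [x[i] for i in range(n_data) if i in train_idx]
    y_train = [y[i] for i in range(n_data) if i in train_idx]
    x_test = [x[i] for i in range(n_data) if i not in train_idx]
    y_test = [y[i] for i in range(n_data) if i not in train_idx]
    return x_train, y_train, x_test, y_test


# Example data: all 3-bit binary inputs, labelled with their parity.
x = [[bool((i >> b) & 1) for b in range(3)] for i in range(8)]
y = [sum(row) % 2 == 1 for row in x]
x_train, y_train, x_test, y_test = split_rows(x, y, 75, seed=1)
print(len(x_train), len(x_test))
```

Rows and labels are selected with the same index set, so each input stays paired with its target after the split.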
- get_noisy_data(x: list[bool] | numpy.typing.NDArray[numpy.bool_], y: list[bool] | numpy.typing.NDArray[numpy.bool_], percentage_noise: float) tuple[numpy.typing.NDArray[numpy.bool_], numpy.typing.NDArray[numpy.bool_]]
Return a noisy dataset by adding noise to the given dataset x and y.
- Parameters:
x ({array-like, ndarray, dataframe} of shape (n_data, n_features)) – Binary input.
y ({array-like, ndarray, dataframe} of shape (n_data,)) – Target output values, in one-to-one correspondence with the rows of the binary input x.
percentage_noise (float [0:100]) – Percentage of noise to add to the dataset.
- Returns:
x (ndarray of bool of shape (n_data, n_features)) – Noisy binary input.
y (ndarray of bool of shape (n_data,)) – Noisy target output.
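The documentation does not say how noise is defined, so the sketch below picks one common interpretation for binary data: flipping the target label of a randomly chosen percentage of rows while leaving the inputs unchanged. That interpretation, the name `flip_labels`, and the `seed` parameter are assumptions; the library may introduce noise differently.

```python
import random


def flip_labels(y, percentage_noise, seed=None):
    """Hypothetical noise model: invert the label of a percentage of rows."""
    if not 0 <= percentage_noise <= 100:
        raise ValueError("percentage_noise must be within [0, 100]")
    n_data = len(y)
    # Assumption: the number of noisy rows is rounded down.
    n_noisy = int(n_data * percentage_noise / 100)
    rng = random.Random(seed)
    noisy_idx = rng.sample(range(n_data), n_noisy)
    y_noisy = list(y)  # copy, so the caller's labels are not modified
    for i in noisy_idx:
        y_noisy[i] = not y_noisy[i]
    return y_noisy


y_clean = [True, False, True, True, False, False, True, False, True, False]
y_noisy = flip_labels(y_clean, 20, seed=3)
n_changed = sum(a != b for a, b in zip(y_clean, y_noisy))
print(n_changed)
```

Because the flipped indices are drawn without replacement, no label is flipped twice and the number of changed labels matches the requested percentage exactly.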
- write_data_in_file(x: list[bool] | numpy.typing.NDArray[numpy.bool_], y: list[bool] | numpy.typing.NDArray[numpy.bool_], file_name: str) None
Write the input dataset x and y to a file named file_name.
- Parameters:
x ({array-like, ndarray, dataframe} of shape (n_data, n_features)) – Binary input.
y ({array-like, ndarray, dataframe} of shape (n_data,)) – Target output values, in one-to-one correspondence with the rows of the binary input x.
file_name (str) – Name of the file (with or without a path) in which to store the dataset x and y.
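The on-disk format is not described here, so the following sketch simply assumes one: each line holds the feature bits of a row followed by its target, written as 0/1 and separated by spaces. The function name `write_rows` and that layout are hypothetical and may not match what the library actually writes.

```python
import os
import tempfile


def write_rows(x, y, file_name):
    """Hypothetical writer: one row per line, features first, then the target."""
    if len(x) != len(y):
        raise ValueError("x and y must contain the same number of rows")
    with open(file_name, "w", encoding="utf-8") as handle:
        for row, target in zip(x, y):
            bits = [str(int(v)) for v in row] + [str(int(target))]
            handle.write(" ".join(bits) + "\n")


x_rows = [[True, False], [False, False]]
y_rows = [True, False]
out_path = os.path.join(tempfile.mkdtemp(), "dataset.txt")
write_rows(x_rows, y_rows, out_path)

with open(out_path, encoding="utf-8") as handle:
    content = handle.read()
print(content)
```

Keeping each input and its target on the same line preserves the one-to-one pairing when the file is read back.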