model.noisecut_coder

Coder class for implementing NoiseCut method.

ATTENTION :

It is highly recommended to construct the model with the NoiseCut class in noisecut_model.py instead. That class exposes only a limited set of inherited methods and attributes, which protects the remaining attributes from unintended manipulation.

Module Contents

Classes

CoderNoiseCut

NoiseCut model to fit the tree-structured binary dataset.

Metric

Metrics to assess the performance of the model.

PseudoBooleanFunc

Build Pseudo-Boolean function.

class model.noisecut_coder.CoderNoiseCut(n_input_each_box: list[int] | numpy.typing.NDArray[numpy.int_], threshold: float = 0.5)

Bases: noisecut.tree_structured.structured_data.StructuredData

NoiseCut model to fit the tree-structured binary dataset.

The CoderNoiseCut class uses the NoiseCut method to predict the binary functions of the structured model from the training dataset. However, it is highly recommended to construct the model with the NoiseCut class in noisecut_model.py instead. That class exposes only a limited set of inherited methods and attributes, which protects the remaining attributes from unintended manipulation.

Parameters:
  • n_input_each_box ({list, ndarray} of shape (n_box,)) – An array of size n_box (the number of first-layer black boxes) that holds the number of input features of each box. For instance, n_input_each_box=[2, 4, 1] means there are three first-layer black boxes, and box1, box2 and box3 take 2, 4 and 1 input features, respectively.

  • threshold (float in range (0, 1), default=0.5) – The function of the 2nd-layer black box is computed based on this threshold value. Keeping the default value of 0.5 is recommended.

y_number_of_0

Because the NoiseCut method is designed for binary data, only 0 and 1 are valid inputs of the model. To store data efficiently, each binary input is read as a decimal value, which serves as the index into y_number_of_0 and y_number_of_1. If the output for a binary input is 0, y_number_of_0 is increased by 1 at that decimal index.

Type:

ndarray of int of shape (2**dimension,)

y_number_of_1

Because the NoiseCut method is designed for binary data, only 0 and 1 are valid inputs of the model. To store data efficiently, each binary input is read as a decimal value, which serves as the index into y_number_of_0 and y_number_of_1. If the output for a binary input is 1, y_number_of_1 is increased by 1 at that decimal index.

Type:

ndarray of int of shape (2**dimension,)
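The indexing scheme described for y_number_of_0 and y_number_of_1 can be sketched as follows. This is a minimal illustration under stated assumptions, not the library's implementation: the helper name `to_decimal_index` is hypothetical, and the bit order (first feature is the least significant bit) is assumed from the decimal_index_boxes example, where (0, 1) maps to 2.

```python
import numpy as np


def to_decimal_index(bits):
    """Hypothetical helper: read a binary row as an integer.

    Assumption: the first element is the least significant bit.
    """
    bits = np.asarray(bits, dtype=np.int64)
    weights = 2 ** np.arange(bits.size, dtype=np.int64)
    return int(bits @ weights)


dimension = 3
# Illustrative data: three samples share the input [0, 1, 0].
x = np.array([[0, 1, 0], [0, 1, 0], [1, 1, 0], [0, 1, 0]])
y = np.array([1, 0, 1, 1])

# One counter per possible binary input, indexed by its decimal value.
y_number_of_0 = np.zeros(2**dimension, dtype=np.int64)
y_number_of_1 = np.zeros(2**dimension, dtype=np.int64)
for row, label in zip(x, y):
    index = to_decimal_index(row)
    if label == 0:
        y_number_of_0[index] += 1
    else:
        y_number_of_1[index] += 1
```

Storing two counters per possible input keeps memory at 2 * 2**dimension integers regardless of how many samples are added.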

model_fitted

True if model is fitted.

Type:

bool

number_of_0_labels

Counts the target output values equal to 0 for each binary input to the 2nd-layer black box. The outputs of the 1st-layer black boxes form the input to the 2nd-layer black box, and each such binary input is read as a decimal value for indexing number_of_0_labels.

Type:

ndarray of int of shape (2**n_box,)

number_of_1_labels

Counts the target output values equal to 1 for each binary input to the 2nd-layer black box. The outputs of the 1st-layer black boxes form the input to the 2nd-layer black box, and each such binary input is read as a decimal value for indexing number_of_1_labels.

Type:

ndarray of int of shape (2**n_box,)

decimal_index_boxes

Keeps the decimal value of the input features to each first-layer black box. For instance, if n_input_each_box = [2, 4, 1] and a sample has the input features [0, 1, 0, 0, 1, 1, 1], then decimal_index_boxes is [decimal(0, 1), decimal(0, 0, 1, 1), decimal(1)] = [2, 12, 1]. In this example the first feature of each box is read as the least significant bit.

Type:

ndarray of int of shape (n_box,)
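The example above can be reproduced with a short sketch. It is an illustration only, not the library's code; the variable names are chosen here, and the bit order (first feature of a box is the least significant bit) is inferred from the documented result [2, 12, 1].

```python
import numpy as np

n_input_each_box = [2, 4, 1]
sample = np.array([0, 1, 0, 0, 1, 1, 1])

# Split the feature vector at the cumulative box sizes: [0, 1], [0, 0, 1, 1], [1].
split_points = np.cumsum(n_input_each_box)[:-1]
box_inputs = np.split(sample, split_points)

# Assumed bit order: the first feature of a box is the least significant bit.
decimal_index_boxes = np.array(
    [sum(int(bit) << position for position, bit in enumerate(box)) for box in box_inputs]
)
print(decimal_index_boxes)  # [ 2 12  1]
```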

probability_of_being_1

Holds the probability that the target output is 1 for each binary input to the 2nd-layer black box, computed from number_of_0_labels and number_of_1_labels.

Type:

ndarray of shape (2**n_box,)
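One plausible way to derive such a probability from the two counters is the relative frequency of label 1. The sketch below assumes that formula, and it assumes a fallback of 0.5 for inputs that were never observed; neither is confirmed by this page, and the counts are made up for illustration.

```python
import numpy as np

# Illustrative counts for n_box = 2, i.e. 2**2 possible inputs to the 2nd-layer box.
number_of_0_labels = np.array([8, 1, 0, 3])
number_of_1_labels = np.array([2, 9, 0, 3])

total = number_of_0_labels + number_of_1_labels

# Assumption: relative frequency of label 1; unobserved inputs keep the value 0.5.
probability_of_being_1 = np.full(total.shape, 0.5)
np.divide(number_of_1_labels, total, out=probability_of_being_1, where=total > 0)
```

Using `where=total > 0` avoids a division by zero for inputs without any training sample.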

score_of_being_1

Holds a score in a defined range (for instance from 1 to 5) for the target output of a binary input, based on the binary input to the 2nd-layer black box. The score value indicates the probability of the output being 1. For instance, when the score is the maximum of the defined range, the NoiseCut model is more likely to predict the output well.

Notes

There is only one black box in the second layer.

Type:

ndarray of shape (2**n_box)

initialize_w() None

Initialize w.

set_weight_each_box_by_recursion(i_o: int, number_loop_level: int = 0) None

Set weight of each 1st-layer black box by recursion (NoiseCut method).

Parameters:
  • i_o (int) –

    Indicates the order of the nested loops. For instance, when n_box=3 and i_o=0, the nested loops are ordered as:

    for i[0] in range(…):
        for i[1] in range(…):
            for i[2] in range(…):

    and the method sets the weight of the 3rd box, because its loop is the innermost one.

    If i_o=1, the nested loops are ordered as:

    for i[1] in range(…):
        for i[2] in range(…):
            for i[0] in range(…):

    and the method sets the weight of the 1st box.

    If i_o=2, the nested loops are ordered as:

    for i[2] in range(…):
        for i[0] in range(…):
            for i[1] in range(…):

    and the method sets the weight of the 2nd box.

    Hint: (i[0], i[1], i[2]) can be seen as (i, j, k).

  • number_loop_level (int, default=0) –

    Depth of the loop in which the code is currently running. The purpose of this function is to build as many nested loops as there are boxes (n_box). For instance, when n_box=3 and i_o=0, it is similar to building the following loops:

    for i[0] in range(…): -> number_loop_level = 0
        for i[1] in range(…): -> number_loop_level = 1
            for i[2] in range(…): -> number_loop_level = 2

    Hint: (i[0], i[1], i[2]) can be seen as (i, j, k).
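The idea of emulating n_box nested loops by recursion, with i_o rotating the loop order, can be sketched as follows. This is a generic illustration, not the method's actual body: the function `nested_loops`, its `sizes` and `visit` arguments, and the loop ranges are all hypothetical, since the ranges are elided in the description above.

```python
def nested_loops(sizes, i_o, visit, level=0, index=None):
    """Hypothetical helper: emulate len(sizes) nested for-loops by recursion.

    `level` plays the role of number_loop_level; `i_o` rotates the loop order.
    """
    n_box = len(sizes)
    if index is None:
        index = [0] * n_box
    if level == n_box:
        # All loop variables are set: this is the body of the innermost loop.
        visit(tuple(index))
        return
    # Loop variable handled at this depth, e.g. i_o=1 gives the order 1, 2, 0.
    box = (i_o + level) % n_box
    for value in range(sizes[box]):
        index[box] = value
        nested_loops(sizes, i_o, visit, level + 1, index)


visited = []
nested_loops([2, 2, 2], i_o=1, visit=visited.append)
# With i_o=1 the innermost loop runs over i[0], so i[0] changes fastest.
```

Recursion is what allows the nesting depth to follow n_box at run time, which a fixed number of written-out for-loops cannot do.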

count_labels_by_recursion(number_loop_level: int = 0) None

Count labels for each binary input to the 2nd-layer black box.

Count how many target output values are 0 and how many are 1 for each binary input to the 2nd-layer black box. This method sets number_of_0_labels and number_of_1_labels.

Parameters:

number_loop_level (int, default=0) –

Depth of the loop in which the code is currently running. The purpose of this function is to build as many nested loops as there are boxes (n_box). For instance, when n_box=3, it is similar to building the following loops:

for i[0] in range(…): -> number_loop_level = 0
    for i[1] in range(…): -> number_loop_level = 1
        for i[2] in range(…): -> number_loop_level = 2

Hint: (i[0], i[1], i[2]) can be seen as (i, j, k).

coder_set_training_data(x: list[bool] | numpy.typing.NDArray[numpy.bool_], y: list[bool] | numpy.typing.NDArray[numpy.bool_]) None

Set x and y dataset.

Parameters:
  • x ({array-like, ndarray, dataframe} of shape (n_data, dimension)) – Each row of the array x is a binary input.

  • y ({array-like, ndarray, dataframe} of shape (n_data,)) – Target output value in one_to_one mapping of binary input x.

Warns:
  • If some data has already been set, calling `coder_set_training_data()` again deletes the previous data. If you do not intend to delete previous data, use `coder_add_training_data()` instead.

coder_add_training_data(x: list[bool] | numpy.typing.NDArray[numpy.bool_], y: list[bool] | numpy.typing.NDArray[numpy.bool_]) None

Add x and y dataset to the existing dataset.

Parameters:
  • x ({array-like, ndarray, dataframe} of shape (n_data, dimension)) – Each row of the array x is a binary input.

  • y ({array-like, ndarray, dataframe} of shape (n_data,)) – Target output value in one_to_one mapping of binary input x.

Raises:

TypeError – If no data has been set yet. Adding data works only after some data has already been set.

coder_fit(x: list[bool] | numpy.typing.NDArray[numpy.bool_], y: list[bool] | numpy.typing.NDArray[numpy.bool_], with_more_data: bool = False, print_result: bool = False, print_weights: bool = False) None

Fit the model based on the input dataset.

Parameters:
  • x ({array-like, ndarray, dataframe} of shape (n_data, dimension)) – Training data to fit the NoiseCut model. Each row of the array x is a binary input.

  • y ({array-like, ndarray, dataframe} of shape (n_data,)) – Training data output in one_to_one mapping of binary training data input x.

  • with_more_data (bool, default=False) – Whether to fit the model with additional data if the model has been fitted once. If with_more_data=True, x and y are added to the existing data from the previous fitting and the model is fitted again.

  • print_result (bool, default=False) – Whether to print the result of fitting.

  • print_weights (bool, default=False) – Whether to print the weights set for the 1st-layer black boxes (for debugging purposes).

coder_predict(x: list[bool] | numpy.typing.NDArray[numpy.bool_]) numpy.typing.NDArray[numpy.bool_]

Return the predicted output of the NoiseCut model for the binary input x.

Parameters:

x ({array-like, ndarray, dataframe} of shape (n_test_data, dimension)) – Input data. Each row is a binary input for which the binary output is predicted.

Returns:

Predicted output in one_to_one mapping of binary input x.

Return type:

ndarray of bool of shape (n_test_data,)

coder_predict_all() numpy.typing.NDArray[numpy.bool_]

Return all predicted output of the structured system by NoiseCut.

Predict output of the structured system for any possible binary input based on the fitted NoiseCut model. Decimal value of binary input can be seen as an index for the returned array to get the predicted value for that binary input.

For instance, consider a structured system with ‘dimension=5’:

returned_array[ 1] is the predicted output of the fitted NoiseCut model to binary input [1, 0, 0, 0, 0].

returned_array[22] is the predicted output of the fitted NoiseCut model to binary input [0, 1, 1, 0, 1].

Returns:

Predicted Output.

Return type:

ndarray of shape (2**dimension,)
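The mapping between an index of the returned array and its binary input can be sketched by enumerating every possible input. This is an illustration, not the library's code; the bit order is an assumption taken from the first example above, where index 1 corresponds to [1, 0, 0, 0, 0], so column 0 is treated as the least significant bit.

```python
import numpy as np

dimension = 5
indices = np.arange(2**dimension)

# Assumed bit order: column 0 is the least significant bit of the index.
# Row k of all_inputs is the binary input whose decimal value is k.
all_inputs = (indices[:, None] >> np.arange(dimension)) & 1

print(all_inputs[1])  # [1 0 0 0 0]
```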

coder_set_uncertainty_measure(file_path_result: str | None = None) None

Set uncertainty.

Parameters:

file_path_result (str, default=None) – Path of the file in which to save the uncertainty result.

__set_probability_of_being_1() None

Set probability of the target output to be 1.

Set probability of the target output to be 1 based on the binary input to the 2nd-layer black box, which is the output of the 1st-layer black boxes. number_of_0_labels and number_of_1_labels are used to set this measure for each possible binary input to the 2nd-layer black box.

coder_predict_probability_of_being_1(x: list[bool] | numpy.typing.NDArray[numpy.bool_]) numpy.typing.NDArray[numpy.float_]

Return probability of the predicted target output to be 1.

Parameters:

x ({array-like, ndarray, dataframe} of shape (n_samples, dimension)) – Input data. Each row is a binary input for which the probability of the output being 1 is predicted.

Returns:

Probability of the predicted target output to be 1 in one_to_one mapping of binary input x.

Return type:

ndarray of float of shape (n_samples,)

coder_predict_pseudo_boolean_func_coef() numpy.typing.NDArray[numpy.float_]

Return the Pseudo-Boolean coefficients of the 2nd-layer black box.

Return the coefficients of the Pseudo-Boolean function of the 2nd-layer black box, based on the values of probability_of_being_1.

Returns:

Pseudo-Boolean coefficients as an array.

Return type:

ndarray of float of shape (2**n_box,)

__set_score(vector_n_score: list[float] | numpy.typing.NDArray[numpy.float_]) None

Set score based on probability value.

Score is set for each binary input of the 2nd-layer black box based on the probability value.

Parameters:

vector_n_score ({array-like, ndarray}) – Defines how scores are assigned to ranges of probability. The array must start with 0 and end with 1; the elements in between must be unique, lie in the range 0 to 1, and be sorted in ascending order. For instance, with vector_n_score = [0, 0.2, 0.4, 0.6, 0.8, 1], the highest score is assigned to binary inputs of the 2nd-layer black box whose probability lies in the range 0.8 to 1. The lowest score is always 1, and the highest score depends on the length of vector_n_score; in this example it is 5.

coder_predict_score(x: list[bool] | numpy.typing.NDArray[numpy.bool_], vector_n_score: list[float] | numpy.typing.NDArray[numpy.float_], validate: bool = True) numpy.typing.NDArray[numpy.int_]

Return predicted score for each binary input of x.

Parameters:
  • x ({array-like, ndarray, dataframe} of shape (n_samples, dimension)) – Input data for which the score of the output is predicted. Each row of the array x is a binary input.

  • vector_n_score ({array-like, ndarray}) – Defines how scores are assigned to ranges of probability. The array must start with 0 and end with 1; the elements in between must be unique, lie in the range 0 to 1, and be sorted in ascending order. For instance, with vector_n_score = [0, 0.2, 0.4, 0.6, 0.8, 1], the highest score is assigned to binary inputs of the 2nd-layer black box whose probability lies in the range 0.8 to 1. The lowest score is always 1, and the highest score depends on the length of vector_n_score; in this example it is 5.

  • validate (bool, default=True) – Whether the input data needs to be validated. Note: always validate data that has not been validated before.

Returns:

Scores in one_to_one mapping of binary input x.

Return type:

ndarray of int of shape (n_samples,)
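The way vector_n_score turns probabilities into scores can be sketched with a bin lookup. This is a minimal illustration, not the library's implementation; the sample probabilities are made up, and the handling of a probability that falls exactly on an interior edge (assigned to the upper bin here) is an assumption.

```python
import numpy as np

vector_n_score = np.array([0, 0.2, 0.4, 0.6, 0.8, 1])
probabilities = np.array([0.05, 0.35, 0.8, 1.0])  # illustrative values

# The interior edges split [0, 1] into len(vector_n_score) - 1 bins,
# which are scored 1..5 in this example.
# Assumption: a probability equal to an interior edge falls into the upper bin.
interior_edges = vector_n_score[1:-1]
scores = np.searchsorted(interior_edges, probabilities, side="right") + 1
```

With these edges, a probability of 0.05 receives score 1 and probabilities of 0.8 and 1.0 both receive the highest score, 5.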

coder_predict_mortality_of_each_score(x_test: list[bool] | numpy.typing.NDArray[numpy.bool_], y_test: list[bool] | numpy.typing.NDArray[numpy.bool_], vector_n_score: list[float] | numpy.typing.NDArray[numpy.float_], print_mortality: bool = False) tuple[numpy.typing.NDArray[numpy.float_], numpy.typing.NDArray[numpy.int_], numpy.typing.NDArray[numpy.int_]]

Return the percentage of outputs equal to 1 among the test data with each score.

Parameters:
  • x_test ({array-like, ndarray, dataframe} of shape (n_data, dimension)) – Test data input against the fitted NoiseCut model. Each row of the array x_test is a binary input.

  • y_test ({array-like, ndarray, dataframe} of shape (n_data,)) – Predicted output of test data in one_to_one mapping of test data x.

  • vector_n_score ({array-like, ndarray}) – Defines how scores are assigned to ranges of probability. The array must start with 0 and end with 1; the elements in between must be unique, lie in the range 0 to 1, and be sorted in ascending order. For instance, with vector_n_score = [0, 0.2, 0.4, 0.6, 0.8, 1], the highest score is assigned to binary inputs of the 2nd-layer black box whose probability lies in the range 0.8 to 1. The lowest score is always 1, and the highest score depends on the length of vector_n_score; in this example it is 5.

  • print_mortality (bool, default=False) – Whether to print the result of the method.

Returns:

Percentage for each score. For example, returned_array[0] is the percentage for score 1.

Return type:

ndarray of shape (n_score,)

Raises:

RuntimeError – If the model is not fitted.
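The per-score percentage can be sketched as a grouped mean over the test labels. This is an illustration under assumptions, not the method's body: the `scores` and `y_test` values are made up, and leaving scores without any test sample as NaN is a choice made here.

```python
import numpy as np

# Illustrative inputs: one score (1..n_score) and one true label per test sample.
scores = np.array([1, 1, 2, 5, 5, 5, 5])
y_test = np.array([0, 0, 1, 1, 1, 0, 1])
n_score = 5

# Samples per score and samples with output 1 per score (index 0 is unused).
count_per_score = np.bincount(scores, minlength=n_score + 1)[1:]
ones_per_score = np.bincount(scores, weights=y_test, minlength=n_score + 1)[1:]

# Percentage of outputs equal to 1 per score; scores with no samples stay NaN.
percentage = np.full(n_score, np.nan)
np.divide(100 * ones_per_score, count_per_score, out=percentage, where=count_per_score > 0)
```

Here percentage[0] refers to score 1 and percentage[4] to score 5, matching the indexing described in the Returns section.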

coder_print_model() None

Print the model data when the model has been fitted.

class model.noisecut_coder.Metric

Bases: noisecut.tree_structured.base.Base

Metrics to assess the performance of the model.

static set_confusion_matrix(y_test: list[bool] | numpy.typing.NDArray[numpy.bool_], y_predicted: list[bool] | numpy.typing.NDArray[numpy.bool_]) tuple[float, float, float, float]

Set confusion matrix.

Parameters:
  • y_test (ndarray of shape (n_samples,)) – Target output value in one_to_one mapping of binary input x_test.

  • y_predicted (ndarray of shape (n_samples,)) – Predicted output value of the model in one_to_one mapping of binary input x_test.

Returns:

  • accuracy (float)

  • recall (float)

  • precision (float)

  • F1 (float)

Notes

confusion matrix:

              Actual 0    Actual 1
Predicted 0      TN          FN
Predicted 1      FP          TP
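The four returned metrics follow from the confusion matrix entries by their standard definitions, as the sketch below shows. It is an illustration with made-up labels, not the method's implementation, and it does not cover how the library treats a zero denominator.

```python
import numpy as np

y_test = np.array([1, 0, 1, 1, 0, 0, 1, 0], dtype=bool)
y_predicted = np.array([1, 0, 0, 1, 1, 0, 1, 0], dtype=bool)

# Entries of the confusion matrix.
tp = int(np.sum(y_predicted & y_test))    # predicted 1, actual 1
tn = int(np.sum(~y_predicted & ~y_test))  # predicted 0, actual 0
fp = int(np.sum(y_predicted & ~y_test))   # predicted 1, actual 0
fn = int(np.sum(~y_predicted & y_test))   # predicted 0, actual 1

accuracy = (tp + tn) / (tp + tn + fp + fn)
recall = tp / (tp + fn)
precision = tp / (tp + fp)
f1 = 2 * precision * recall / (precision + recall)
```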

class model.noisecut_coder.PseudoBooleanFunc(arity: int)

Bases: noisecut.tree_structured.base_structured_data.BasePseudoBooleanFunc

Build Pseudo-Boolean function.

Parameters:

arity (int) – Arity of the Pseudo-Boolean function.

num

A counter for filling the dict_new.

Type:

int

dict_new

Stores members of each statement in the Pseudo-Boolean function. When n=3 the Pseudo-Boolean function is a0 + a1 X1 + a2 X2 + a3 X3 + a4 X1 X2 + a5 X1 X3 + a6 X2 X3 + a7 X1 X2 X3. So, ‘dict_new’ stores for key 0 an empty array as there is not any Xi for that. It stores for key 4 [1,2] which represents X1 X2.

Type:

dict

coefficient_matrix

To find the coefficients of the Pseudo-Boolean function in the matrix form Ax=y, the matrix A must be built. x is the vector of unknown coefficients and A is a matrix in which each row represents a binary input. For instance, when n=3, each row of the matrix is filled as follows: [1, X1, X2, X3, X1*X2, X1*X3, X2*X3, X1*X2*X3].

Type:

ndarray of shape (2**n, 2**n)

Examples

>>> from noisecut.model.noisecut_coder import PseudoBooleanFunc
>>> pseudo_boolean_func = PseudoBooleanFunc(3)
>>> x = [[0, 0, 0], [1, 0, 0], [0, 1, 0], [1, 1, 0], [0, 0, 1], [1, 0, 1], [0, 1, 1], [1, 1, 1]]
>>> y = [0, 0, 1, 1, 0, 1, 0, 1]
>>> coef = pseudo_boolean_func.get_coef_boolean_func(x, y)
The Boolean Function is:
0.00 + 0.00 X1 + 1.00 X2 + 0.00 X3 + 0.00 X1 X2 + 1.00 X1 X3 + -1.00 X2
X3 + 0.00 X1 X2 X3 +
>>> print(coef)
[ 0.  0.  1.  0.  0.  1. -1.  0.]
>>> pseudo_boolean_func.dict_new
{0: array([], dtype=float64), 1: array([1]), 2: array([2]), 3: array([
3]), 4: array([1, 2]), 5: array([1, 3]), 6: array([2, 3]), 7: array([1,
2, 3])}

__recursion(n_size: int, level: int) None

Fill a specific element of dict_new.

Parameters:
  • n_size (int) – Number of Xi in each statement of the Pseudo-Boolean function.

  • level (int) – Depth of the loop in which the code runs, starting from 0.

__build() None

Fill dict_new as an attribute of the class.

__fill_coef_matrix(x: numpy.typing.NDArray[numpy.bool_]) None

Fill matrix as an attribute of the class.

Parameters:

x (ndarray of bool of shape (2**n, n)) – Binary data input to set a Pseudo-Boolean function for.

Notes

For instance, when n=3, each row of the matrix is filled as below: [1, X1, X2, X3, X1*X2, X1*X3, X2*X3, X1*X2*X3].

get_coef_boolean_func(x: numpy.typing.NDArray[numpy.bool_], y: numpy.typing.NDArray[numpy.float_], print_allowance: bool = True) numpy.typing.NDArray[numpy.float_]

Return the coefficients of the Pseudo-Boolean function.

Parameters:
  • x (ndarray of bool of shape (2**n, n)) – Binary data input to set a Pseudo-Boolean function for.

  • y (ndarray of float of shape (2**n,)) – Data output in one_to_one mapping of binary data input x.

  • print_allowance (bool, default=True) – Whether to print the computed Pseudo-Boolean function.

Returns:

Coefficients of the Pseudo-Boolean function.

Return type:

ndarray of float of shape (2**n,)
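The Ax=y formulation described for coefficient_matrix can be reproduced independently of the class with NumPy. The sketch below is an illustration, not the class's implementation: it builds one column per monomial with `itertools.combinations` and solves the system with `numpy.linalg.solve`. Variables are indexed from 0 in the code, so index 0 stands for X1.

```python
from itertools import combinations

import numpy as np

n = 3
x = np.array(
    [[0, 0, 0], [1, 0, 0], [0, 1, 0], [1, 1, 0],
     [0, 0, 1], [1, 0, 1], [0, 1, 1], [1, 1, 1]]
)
y = np.array([0, 0, 1, 1, 0, 1, 0, 1], dtype=float)

# Monomials ordered by size, then lexicographically:
# (), (0,), (1,), (2,), (0, 1), (0, 2), (1, 2), (0, 1, 2).
monomials = [term for size in range(n + 1) for term in combinations(range(n), size)]

# Column j of A holds the product of the variables in monomial j for each row
# of x; the empty product for the constant term is 1.
A = np.array([[np.prod(row[list(term)]) for term in monomials] for row in x], dtype=float)

coef = np.linalg.solve(A, y)
print(np.round(coef, 2))
```

For the data of the Examples section this yields the coefficients 0, 0, 1, 0, 0, 1, -1, 0, matching the printed function X2 + X1 X3 - X2 X3. The system has a unique solution because x lists all 2**n binary inputs exactly once.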