nodegam#

Subpackages#

Submodules#

nodegam.arch module#

The architecture of the models.

This file includes the NODE (ODSTBlock), NODE-GAM (GAMBlock), and NODE-GAM with attention (GAMAttBlock).

class nodegam.arch.GAMAdditiveMixin#

Bases: object

All functions related to extracting GAM and GA2M graphs from the model.

convert_onehot_vector_to_integers(terms)#

Convert one-hot or multi-hot vectors into a list of integers or tuples.

Parameters

terms (PyTorch tensor) – A one-hot matrix in which each column has only one entry set to 1. Shape: [in_features, uniq_GAM_terms].

Returns

tuple_terms (list) – A list of integers or tuples of all the GAM terms.
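As a rough illustration of this conversion, the sketch below works on a plain nested list rather than a PyTorch tensor. The name `onehot_columns_to_terms` and the sample matrix are hypothetical and not part of nodegam; the sketch only assumes the documented [in_features, uniq_GAM_terms] layout, where a column with one active row stands for a main effect and a column with several active rows stands for an interaction.

```python
def onehot_columns_to_terms(terms):
    """Turn each column of a 0/1 matrix into an int (one active row) or a tuple (several)."""
    n_rows = len(terms)
    n_cols = len(terms[0]) if n_rows else 0
    result = []
    for col in range(n_cols):
        # Collect the feature indices that are switched on in this column.
        active = [row for row in range(n_rows) if terms[row][col] == 1]
        result.append(active[0] if len(active) == 1 else tuple(active))
    return result


# Hypothetical matrix with 3 features (rows) and 3 terms (columns).
matrix = [
    [1, 0, 0],
    [0, 1, 1],
    [0, 0, 1],
]
tuple_terms = onehot_columns_to_terms(matrix)
```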

extract_additive_terms(X, norm_fn=<function GAMAdditiveMixin.<lambda>>, y_mu=0.0, y_std=1.0, device='cpu', batch_size=1024, tol=0.001, purify=True)#

Extract the additive terms in the GAM/GA2M model to plot the graphs.

To extract the main and interaction terms, it runs the model on all possible input values and gets the predicted value of each additive term. It then returns, for each term, a mapping between x and the model’s outputs y in a dataframe.

Parameters
  • X – Input 2d array (pandas). Note that it is the unpreprocessed data.

  • norm_fn – The data preprocessing function (E.g. quantile normalization) before feeding into the model. Inputs: pandas X. Outputs: preprocessed outputs.

  • y_mu – The shift applied to the model outputs: the outputs are multiplied by y_std and then shifted by y_mu. It’s useful in regression problems where the target y is normalized to mean 0 and std 1. Default: 0.0.

  • y_std – The scale applied to the model outputs before they are shifted by y_mu. Default: 1.0.

  • device – Which device to run the model on. Default: ‘cpu’.

  • batch_size – Batch size.

  • tol – The tolerance error for the interaction purification that moves mass from interactions to mains (see the “purification” of the paper).

  • purify – If True, we move all effects of the interactions to main effects.

Returns
  • A pandas table that records all main and interaction terms. The columns include:

  • feat_name – The feature name. E.g. “Hour”.

  • feat_idx – The feature index. E.g. 2.

  • x – The unique values of the feature. E.g. [0.5, 3, 4.7].

  • y – The values of the output. E.g. [-0.2, 0.3, 0.5].

  • importance – The feature importance. It’s calculated as the weighted average of the absolute value of y weighted by the counts of each unique value.

  • counts – The counts of each unique value in the data. E.g. [20, 10, 3].
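The importance column and the y_mu / y_std rescaling can be sketched in a few lines. The helper names `term_importance` and `rescale_outputs` are hypothetical; the sketch only follows the wording above (count-weighted average of |y|, and "multiply by y_std, then shift by y_mu") and is not taken from the nodegam source.

```python
def term_importance(y, counts):
    """Count-weighted average of |y| over the unique values of one term."""
    total = sum(counts)
    return sum(abs(value) * count for value, count in zip(y, counts)) / total


def rescale_outputs(y, y_mu=0.0, y_std=1.0):
    """Map normalized outputs back to the target scale: y * y_std + y_mu."""
    return [value * y_std + y_mu for value in y]


y = [-0.2, 0.3, 0.5]   # example outputs from the column description
counts = [20, 10, 3]   # example counts from the column description
importance = term_importance(y, counts)

# Hypothetical target statistics, chosen only for illustration.
rescaled = rescale_outputs(y, y_mu=10.0, y_std=2.0)
```

With the example values, the importance is (0.2·20 + 0.3·10 + 0.5·3) / 33 = 8.5 / 33.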

get_additive_terms(return_inverse=False)#

Get the additive terms in the GAM/GA2M model.

It returns all the main and interaction effects in the NodeGAM.

Parameters

return_inverse (bool) – If True, it returns the map back from each additive term to the index of trees. It’s useful to check which tree focuses on which feature set.

Returns

tuple_terms (list) – A list of integers or tuples that represents all the additive terms the model learns. E.g. [2, 4, (2, 3), (1, 4)].
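A returned list of this shape can be split into main effects and pairwise interactions with plain Python. The helper `split_terms` is hypothetical and only assumes the documented convention that integers are main effects and tuples are interactions.

```python
def split_terms(tuple_terms):
    """Separate main effects (ints) from interactions (tuples)."""
    mains = [term for term in tuple_terms if isinstance(term, int)]
    interactions = [term for term in tuple_terms if isinstance(term, tuple)]
    return mains, interactions


# The example list from the Returns description.
mains, interactions = split_terms([2, 4, (2, 3), (1, 4)])
```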

run_with_additive_terms(x)#

Run the models but return the outputs of each main and interaction term.

Run the models. But instead of summing all the tree outputs, we return the aggregate outputs under each main or interaction term for each example.

Parameters

x – Inputs to the model. A Pytorch Tensor of [batch_size, in_features].

Returns
  • A tensor with shape [batch_size, num_unique_terms, output_dim] where

  • ‘num_unique_terms’ is the total number of main and interaction effects, and

  • ‘output_dim’ is the output dimension of the model (num_classes).
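The aggregation idea (sum tree outputs per term instead of over all trees) can be sketched for a single example with nested lists. The function `aggregate_by_term` and the tree-to-term mapping are hypothetical; in nodegam the grouping is derived from the trees’ feature selectors, which this sketch does not reproduce.

```python
def aggregate_by_term(tree_outputs, tree_to_term, num_terms):
    """Sum per-tree outputs into per-term outputs for one example.

    tree_outputs: per-tree output vectors, shape [num_trees][output_dim].
    tree_to_term: for each tree, the index of the additive term it belongs to.
    Returns a nested list of shape [num_terms][output_dim].
    """
    output_dim = len(tree_outputs[0])
    per_term = [[0.0] * output_dim for _ in range(num_terms)]
    for tree_idx, output in enumerate(tree_outputs):
        term_idx = tree_to_term[tree_idx]
        for d in range(output_dim):
            per_term[term_idx][d] += output[d]
    return per_term


# Four hypothetical trees with output_dim = 1, grouped into three terms.
tree_outputs = [[0.5], [0.25], [-1.0], [2.0]]
tree_to_term = [0, 1, 0, 2]
per_term = aggregate_by_term(tree_outputs, tree_to_term, num_terms=3)
```

Summing `per_term` over the term axis recovers the sum over all trees, which is what makes the per-term view a decomposition of the usual prediction.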

class nodegam.arch.GAMAttBlock(in_features, num_trees, num_layers, num_classes=1, addi_tree_dim=0, output_dropout=0.0, init_bias=True, add_last_linear=True, last_dropout=0.0, l2_lambda=0.0, l2_interactions=0.0, l1_interactions=0.0, **kwargs)#

Bases: GAMBlock

Node-GAM with attention model.

Initialization of Node-GAM.

Parameters
  • in_features – The input dimension of dataset.

  • num_trees – How many ODST trees in a layer.

  • num_layers – How many layers of trees.

  • num_classes – How many classes to predict. It’s the output dim.

  • addi_tree_dim – Additional dimension for the outputs of each tree. If the value x > 0, each tree outputs a vector of dimension (1 + x).

  • output_dropout – The dropout rate on the output of each tree.

  • init_bias – If set to True, it adds a trainable bias to the output of the model.

  • add_last_linear – If set to True, add a last linear layer to sum outputs of all trees.

  • last_dropout – If add_last_linear is True, it adds dropout on the weights of the last linear layer.

  • l2_lambda – Add a l2 penalty on the outputs of trees.

  • l2_interactions – Penalize the l2 magnitude of the output of trees that have pairwise interactions. Default: 0.

  • l1_interactions – Penalize the l1 magnitude of the output of trees that have pairwise interactions. Default: 0.

  • kwargs (dict) – The arguments for underlying GAM ODST trees.

classmethod add_model_specific_args(parser)#

Add argparse arguments.

create_layers(in_features, num_trees, num_layers, tree_dim, **kwargs)#

Create layers of oblivious trees.

Parameters
  • in_features – The dim of input features.

  • num_trees – The number of trees in a layer.

  • num_layers – The number of layers.

  • tree_dim – The output dimension of each tree.

  • kwargs – The kwargs for initializing GAMAtt ODST trees.

classmethod get_model_specific_rs_hparams()#

Specify the range of hyperparameter search.

class nodegam.arch.GAMBlock(in_features, num_trees, num_layers, num_classes=1, addi_tree_dim=0, output_dropout=0.0, init_bias=True, add_last_linear=True, last_dropout=0.0, l2_lambda=0.0, l2_interactions=0.0, l1_interactions=0.0, **kwargs)#

Bases: GAMAdditiveMixin, ODSTBlock

Node-GAM model.

Initialization of Node-GAM.

Parameters
  • in_features – The input dimension of dataset.

  • num_trees – How many ODST trees in a layer.

  • num_layers – How many layers of trees.

  • num_classes – How many classes to predict. It’s the output dim.

  • addi_tree_dim – Additional dimension for the outputs of each tree. If the value x > 0, each tree outputs a vector of dimension (1 + x).

  • output_dropout – The dropout rate on the output of each tree.

  • init_bias – If set to True, it adds a trainable bias to the output of the model.

  • add_last_linear – If set to True, add a last linear layer to sum outputs of all trees.

  • last_dropout – If add_last_linear is True, it adds dropout on the weights of the last linear layer.

  • l2_lambda – Add a l2 penalty on the outputs of trees.

  • l2_interactions – Penalize the l2 magnitude of the output of trees that have pairwise interactions. Default: 0.

  • l1_interactions – Penalize the l1 magnitude of the output of trees that have pairwise interactions. Default: 0.

  • kwargs (dict) – The arguments for underlying GAM ODST trees.

classmethod add_model_specific_args(parser)#

Add argparse arguments.

classmethod add_model_specific_results(results, args)#

Record some model attributes into the csv result.

calculate_l2_penalty(outputs)#

Calculate the penalty of the trees’ outputs.

It helps regularize the model.

Parameters

outputs – The outputs of trees. A tensor of shape [batch_size, num_trees, tree_dim].

create_layers(in_features, num_trees, num_layers, tree_dim, **kwargs)#

Create layers.

Parameters
  • in_features – The input dimension (feature).

  • num_trees – Number of trees in a layer.

  • num_layers – Number of layers.

  • tree_dim – The dimension of the tree’s output. Usually equal to num of classes.

  • kwargs (dict) – The arguments for underlying GAM ODST trees.

classmethod get_model_specific_rs_hparams()#

Specify the range of hyperparameter search.

classmethod load_model_by_hparams(args, ret_step_callback=False)#

Load the initialized model by its hyperparameters.

Parameters

args – The arguments of the model. Can be passed in as a dictionary or a namespace.

run_with_layers(x, return_fs=False)#

Run the examples through the layers of trees.

Parameters
  • x – The input tensor of shape [batch_size, in_features].

  • return_fs – If True, it returns the feature selectors of each tree.

Returns
  • outputs – The trees’ outputs [batch_size, num_trees, tree_dim].

  • prev_feature_selectors – Only returned when return_fs is True. The feature selectors of each ODST tree, of shape [in_features, num_trees, tree_depth].

class nodegam.arch.ODSTBlock(in_features, num_trees, num_layers, num_classes=1, addi_tree_dim=0, output_dropout=0.0, init_bias=True, add_last_linear=True, last_dropout=0.0, l2_lambda=0.0, **kwargs)#

Bases: Sequential

Original NODE model adapted from https://github.com/Qwicen/node.

Neural Oblivious Decision Ensembles (NODE).

Parameters
  • in_features – The input dimension of dataset.

  • num_trees – How many ODST trees in a layer.

  • num_layers – How many layers of trees.

  • num_classes – How many classes to predict. It’s the output dim.

  • addi_tree_dim – Additional dimension for the outputs of each tree. If the value x > 0, each tree outputs a vector of dimension (1 + x).

  • output_dropout – The dropout rate on the output of each tree.

  • init_bias – If set to True, it adds a trainable bias to the output of the model.

  • add_last_linear – If set to True, add a last linear layer to sum outputs of all trees.

  • last_dropout – If add_last_linear is True, it adds dropout on the weights of the last linear layer.

  • l2_lambda – Add a l2 penalty on the outputs of trees.

  • kwargs – The kwargs for initializing odst trees.

classmethod add_model_specific_args(parser)#

Add argparse arguments.

classmethod add_model_specific_results(results, args)#

Add or modify the output of csv recording.

calculate_l2_penalty(outputs)#

Calculate l2 penalty.

create_layers(in_features, num_trees, num_layers, tree_dim, **kwargs)#

Create layers of oblivious trees.

Parameters
  • in_features – The dim of input features.

  • num_trees – The number of trees in a layer.

  • num_layers – The number of layers.

  • tree_dim – The output dimension of each tree.

  • kwargs – The kwargs for initializing odst trees.

forward(x, return_outputs_penalty=False, feature_masks=None)#

Model prediction.

Parameters
  • x – The input features.

  • return_outputs_penalty – If True, it returns the output l2 penalty.

  • feature_masks – Only used in pretraining. If passed, the outputs of trees belonging to masked features (masks==1) are zeroed. This is like dropping out features directly.

freeze_all_but_lastw()#

classmethod get_model_specific_rs_hparams()#

Specify the range of hyperparameter search.

get_num_trees_assigned_to_each_feature()#

Get the number of trees assigned to each feature per layer.

It’s helpful for logging, e.g. to see how many trees focus on each feature.

Returns

Counts of trees with shape of [num_layers, num_input_features (in_features)].

classmethod load_model_by_hparams(args, ret_step_callback=False)#

Helper function to generate a model instance based on hyperparameters.

Parameters

args – The arguments from argparse. It specifies all hyperparameters.

run_with_layers(x)#

set_bias(y_train)#

Set the bias term for GAM output as logodds of y.

It’s unnecessary to run since we can just use a learnable bias.
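For reference, the log-odds of a binary target can be computed as below. This is a minimal sketch for the binary classification case only; the function name `log_odds_bias` and the `eps` clipping are assumptions for illustration and do not describe how set_bias is implemented.

```python
import math


def log_odds_bias(y_train, eps=1e-6):
    """Log-odds log(p / (1 - p)) of the positive rate p in a 0/1 target."""
    p = sum(y_train) / len(y_train)
    # Clip p away from 0 and 1 so the logarithm stays finite (assumed safeguard).
    p = min(max(p, eps), 1.0 - eps)
    return math.log(p / (1.0 - p))


bias = log_odds_bias([1, 0, 0, 0])      # positive rate 0.25
balanced = log_odds_bias([1, 0, 1, 0])  # positive rate 0.5
```

A balanced target gives a bias of 0, and a rarer positive class gives a more negative bias.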

unfreeze()#

nodegam.data module#

Data preprocessors and functions.

nodegam.data.create_onedrive_directdownload(onedrive_link)#

See https://towardsdatascience.com/how-to-get-onedrive-direct-download-link-ecb52a62fee4.

nodegam.data.download(url, filename, delete_if_interrupted=True, chunk_size=4096)#

Saves the file at url to filename, showing a progress bar.

nodegam.data.download_file_from_google_drive(id, destination)#

See https://stackoverflow.com/questions/38511444/python-download-files-from-google-drive-using-url.

nodegam.data.download_file_from_onedrive(onedrive_link, destination)#

Download file from onedrive.

nodegam.data.fetch_ADULT(path='./data/', fold=0)#

Download Adult Income dataset.

Parameters
  • path – Where the data should be stored.

  • fold – Which data fold to use. Choose from [0, 1, 2, 3, 4].

Returns

data (dict) –

It contains the following keys:
  • ‘X_train’, ‘y_train’, ‘X_valid’, ‘y_valid’, ‘X_test’, ‘y_test’:

    Train/val/test sets. X is a pandas dataframe and y is a numpy array.

  • ‘problem’:

    Which problem type. Either ‘classification’ or ‘regression’.

  • ‘cat_features’:

    Which features are categorical.

  • ‘metric’:

    The optimized metric on this dataset. Set to auc.
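Since every fetch_* function returns a dict with a similar set of keys, a small check before training can catch a malformed dict early. The helper below is hypothetical and uses a stand-in dict instead of calling fetch_ADULT, so it runs without downloading anything; the list of required keys is taken from the keys documented above.

```python
# Keys that every documented fetch_* return value lists.
REQUIRED_KEYS = (
    "X_train", "y_train", "X_valid", "y_valid", "X_test", "y_test", "problem",
)


def missing_dataset_keys(data):
    """Return the required keys that are absent from a dataset dict."""
    return [key for key in REQUIRED_KEYS if key not in data]


# Stand-in for the dict a fetch_* function would return.
fake_data = {key: None for key in REQUIRED_KEYS}
fake_data["problem"] = "classification"
fake_data["metric"] = "auc"

missing = missing_dataset_keys(fake_data)
```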

nodegam.data.fetch_BIKESHARE(path='./data/', fold=0)#

Download Bikeshare dataset from Kaggle.

Parameters
  • path – Where the data should be stored.

  • fold – Which data fold to use. Choose from [0, 1, 2, 3, 4].

Returns

data (dict) –

It contains the following keys:
  • ‘X_train’, ‘y_train’, ‘X_valid’, ‘y_valid’, ‘X_test’, ‘y_test’:

    Train/val/test sets. X is a pandas dataframe and y is a numpy array.

  • ‘problem’:

    Which problem type. Either ‘classification’ or ‘regression’.

nodegam.data.fetch_CHURN(path='./data/', fold=0)#

Download CHURN dataset.

Parameters
  • path – Where the data should be stored.

  • fold – Which data fold to use. Choose from [0, 1, 2, 3, 4].

Returns

data (dict) –

It contains the following keys:
  • ‘X_train’, ‘y_train’, ‘X_valid’, ‘y_valid’, ‘X_test’, ‘y_test’:

    Train/val/test sets. X is a pandas dataframe and y is a numpy array.

  • ‘problem’:

    Which problem type. Either ‘classification’ or ‘regression’.

  • ‘cat_features’:

    Which features are categorical.

  • ‘metric’:

    The optimized metric on this dataset. Set to auc.

nodegam.data.fetch_CLICK(path='./data/', valid_size=100000, validation_seed=None, fold=0)#

Download CLICK dataset from https://www.kaggle.com/slamnz/primer-airlines-delay.

Parameters
  • path – Where the data should be stored.

  • valid_size – Validation size. Default to 100K.

  • validation_seed – The seed. Default to None.

  • fold – Which data fold to use. This is not used since only 1 fold is available.

Returns

data (dict) –

It contains the following keys:
  • ‘X_train’, ‘y_train’, ‘X_valid’, ‘y_valid’, ‘X_test’, ‘y_test’:

    Train/val/test sets. X is a pandas dataframe and y is a numpy array.

  • ‘problem’:

    Which problem type. Either ‘classification’ or ‘regression’.

  • ‘cat_features’:

    Which features are categorical.

nodegam.data.fetch_COMPAS(path='./data/', fold=0)#

Download COMPAS dataset.

Parameters
  • path – Where the data should be stored.

  • fold – Which data fold to use. Choose from [0, 1, 2, 3, 4].

Returns

data (dict) –

It contains the following keys:
  • ‘X_train’, ‘y_train’, ‘X_valid’, ‘y_valid’, ‘X_test’, ‘y_test’:

    Train/val/test sets. X is a pandas dataframe and y is a numpy array.

  • ‘problem’:

    Which problem type. Either ‘classification’ or ‘regression’.

  • ‘cat_features’:

    Which features are categorical.

  • ‘metric’:

    The optimized metric on this dataset. Set to auc.

nodegam.data.fetch_CREDIT(path='./data/', fold=0)#

Download Credit dataset.

Parameters
  • path – Where the data should be stored.

  • fold – Which data fold to use. Choose from [0, 1, 2, 3, 4].

Returns

data (dict) –

It contains the following keys:
  • ‘X_train’, ‘y_train’, ‘X_valid’, ‘y_valid’, ‘X_test’, ‘y_test’:

    Train/val/test sets. X is a pandas dataframe and y is a numpy array.

  • ‘problem’:

    Which problem type. Either ‘classification’ or ‘regression’.

  • ‘cat_features’:

    Which features are categorical.

  • ‘metric’:

    The optimized metric on this dataset. Set to auc.

nodegam.data.fetch_EPSILON(path='./data/', fold=0)#

Download EPSILON dataset from NODE paper.

Parameters
  • path – Where the data should be stored.

  • fold – Which data fold to use. This is not used since only 1 fold is available.

Returns

data (dict) –

It contains the following keys:
  • ‘X_train’, ‘y_train’, ‘X_valid’, ‘y_valid’, ‘X_test’, ‘y_test’:

    Train/val/test sets. X is a pandas dataframe and y is a numpy array.

  • ‘problem’:

    Which problem type. Either ‘classification’ or ‘regression’.

nodegam.data.fetch_HIGGS(path='./data/', test_size=500000, fold=0)#

Download HIGGS dataset from NODE paper.

Parameters
  • path – Where the data should be stored.

  • fold – Which data fold to use. This is not used since only 1 fold is available.

Returns

data (dict) –

It contains the following keys:
  • ‘X_train’, ‘y_train’, ‘X_valid’, ‘y_valid’, ‘X_test’, ‘y_test’:

    Train/val/test sets. X is a pandas dataframe and y is a numpy array.

  • ‘problem’:

    Which problem type. Either ‘classification’ or ‘regression’.

nodegam.data.fetch_MICROSOFT(path='./data/', fold=0)#

Download MICROSOFT dataset from NODE paper.

Parameters
  • path – Where the data should be stored.

  • fold – Which data fold to use. This is not used since only 1 fold is available.

Returns

data (dict) –

It contains the following keys:
  • ‘X_train’, ‘y_train’, ‘X_valid’, ‘y_valid’, ‘X_test’, ‘y_test’:

    Train/val/test sets. X is a pandas dataframe and y is a numpy array.

  • ‘problem’:

    Which problem type. Either ‘classification’ or ‘regression’.

nodegam.data.fetch_MIMIC2(path='./data/', fold=0)#

Download MIMIC2 dataset.

Parameters
  • path – Where the data should be stored.

  • fold – Which data fold to use. Choose from [0, 1, 2, 3, 4].

Returns

data (dict) –

It contains the following keys:
  • ‘X_train’, ‘y_train’, ‘X_valid’, ‘y_valid’, ‘X_test’, ‘y_test’:

    Train/val/test sets. X is a pandas dataframe and y is a numpy array.

  • ‘problem’:

    Which problem type. Either ‘classification’ or ‘regression’.

  • ‘cat_features’:

    Which features are categorical.

  • ‘metric’:

    The optimized metric on this dataset.

nodegam.data.fetch_MIMIC3(path='./data/', fold=0)#

Download MIMIC3 dataset.

It aggregates the values within the first 24-hour window and predicts mortality.

Parameters
  • path – Where the data should be stored.

  • fold – Which data fold to use. Choose from [0, 1, 2, 3, 4].

Returns

data (dict) –

It contains the following keys:
  • ‘X_train’, ‘y_train’, ‘X_valid’, ‘y_valid’, ‘X_test’, ‘y_test’:

    Train/val/test sets. X is a pandas dataframe and y is a numpy array.

  • ‘problem’:

    Which problem type. Either ‘classification’ or ‘regression’.

  • ‘metric’:

    The optimized metric on this dataset. Set to auc.

nodegam.data.fetch_PROTEIN(path='./data/', fold=0)#

Download Protein dataset from NODE paper.

See https://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/multiclass.html#protein.

Parameters
  • path – Where the data should be stored.

  • fold – Which data fold to use. This is not used since only 1 fold is available.

Returns

data (dict) –

It contains the following keys:
  • ‘X_train’, ‘y_train’, ‘X_valid’, ‘y_valid’, ‘X_test’, ‘y_test’:

    Train/val/test data.

  • ‘problem’:

    Which problem type. Either ‘classification’ or ‘regression’.

nodegam.data.fetch_ROSSMANN(path='./data/', fold=0)#

Download Rossmann dataset.

Parameters
  • path – Where the data should be stored.

  • fold – Not used.

Returns

data (dict) –

It contains the following keys:
  • ‘X_train’, ‘y_train’, ‘X_valid’, ‘y_valid’, ‘X_test’, ‘y_test’:

    Train/val/test sets. X is a pandas dataframe and y is a numpy array.

  • ‘problem’:

    Which problem type. Either ‘classification’ or ‘regression’.

  • ‘cat_features’:

    Which features are categorical.

nodegam.data.fetch_SARCOS(path='./data/', fold=0, target_id=None)#

Download Sarcos dataset for multi-task learning.

Parameters
  • path – Where the data should be stored.

  • fold – Data fold.

  • target_id – Which task to return, as an index between 0 and 6. If None, y contains all tasks.

Returns

data (dict) –

It contains the following keys:
  • ‘X_train’, ‘y_train’, ‘X_valid’, ‘y_valid’, ‘X_test’, ‘y_test’:

    Train/val/test sets. X is a pandas dataframe and y is a numpy array.

  • ‘problem’:

    Which problem type. Either ‘classification’ or ‘regression’.

  • ‘cat_features’:

    Which features are categorical.

nodegam.data.fetch_SUPPORT2(path='./data/', fold=0)#

Download Support2 dataset.

Parameters
  • path – Where the data should be stored.

  • fold – Which data fold to use. Choose from [0, 1, 2, 3, 4].

Returns

data (dict) –

It contains the following keys:
  • ‘X_train’, ‘y_train’, ‘X_valid’, ‘y_valid’, ‘X_test’, ‘y_test’:

    Train/val/test sets. X is a pandas dataframe and y is a numpy array.

  • ‘problem’:

    Which problem type. Either ‘classification’ or ‘regression’.

  • ‘cat_features’:

    Which features are categorical.

  • ‘metric’:

    The optimized metric on this dataset. Set to auc.

nodegam.data.fetch_WINE(path='./data/', fold=0)#

Download Wine dataset from Kaggle.

Parameters
  • path – Where the data should be stored.

  • fold – Which data fold to use. Choose from [0, 1, 2, 3, 4].

Returns

data (dict) –

It contains the following keys:
  • ‘X_train’, ‘y_train’, ‘X_valid’, ‘y_valid’, ‘X_test’, ‘y_test’:

    Train/val/test sets. X is a pandas dataframe and y is a numpy array.

  • ‘problem’:

    Which problem type. Either ‘classification’ or ‘regression’.

nodegam.data.fetch_YAHOO(path='./data/', fold=0)#

Download YAHOO dataset from the NODE paper.

Parameters
  • path – Where the data should be stored.

  • fold – Which data fold to use. This is not used since only 1 fold is available.

Returns

data (dict) –

It contains the following keys:
  • ‘X_train’, ‘y_train’, ‘X_valid’, ‘y_valid’, ‘X_test’, ‘y_test’:

    Train/val/test sets. X is a pandas dataframe and y is a numpy array.

  • ‘problem’:

    Which problem type. Either ‘classification’ or ‘regression’.

nodegam.data.fetch_YEAR(path='./data/', test_size=51630, fold=0)#

Download YEAR dataset from NODE paper.

Parameters
  • path – Where the data should be stored.

  • fold – Which data fold to use. This is not used since only 1 fold is available.

Returns

data (dict) –

It contains the following keys:
  • ‘X_train’, ‘y_train’, ‘X_valid’, ‘y_valid’, ‘X_test’, ‘y_test’:

    Train/val/test sets. X is a pandas dataframe and y is a numpy array.

  • ‘problem’:

    Which problem type. Either ‘classification’ or ‘regression’.

nodegam.mypreprocessor module#

The preprocessor that normalizes and imputes the data.

class nodegam.mypreprocessor.MyPreprocessor(random_state=1377, cat_features=None, y_normalize=False, quantile_transform=False, output_distribution='normal', n_quantiles=2000, quantile_noise=0.001)#

Bases: object

Preprocessor does the data preprocessing like input and target normalization.

Parameters
  • random_state – Global random seed for an experiment.

  • cat_features – If passed in, it does the ordinal encoding for these features before other input normalization like quantile transformation. Default: None.

  • y_normalize – If True, it standardizes the targets y by setting the mean and stdev to 0 and 1. Useful in the regression setting.

  • quantile_transform – If True, transforms the features to follow a normal or uniform distribution.

  • output_distribution – Choose between [‘normal’, ‘uniform’]. Data is projected onto this distribution. See the same param of sklearn QuantileTransformer. ‘normal’ is better.

  • n_quantiles – Number of quantiles to estimate the distribution. Default: 2000.

  • quantile_noise – If specified, fits QuantileTransformer on data with added gaussian noise with std = quantile_noise * data.std; this will cause discrete values to be more separable. Please note that this transformation does NOT apply gaussian noise to the resulting data; the noise is only applied when fitting QuantileTransformer.
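The effect of y_normalize can be sketched with the standard library. The helpers `standardize` and `destandardize` are hypothetical, and the use of the population standard deviation is an assumption; the preprocessor may use a different estimator. The returned mean and std play the role of the y_mu and y_std arguments of extract_additive_terms.

```python
import statistics


def standardize(y):
    """Shift and scale y to mean 0 and std 1; also return the statistics used."""
    mu = statistics.mean(y)
    std = statistics.pstdev(y)  # population std (assumed for this sketch)
    return [(value - mu) / std for value in y], mu, std


def destandardize(y_norm, mu, std):
    """Invert standardize: multiply by std, then shift by mu."""
    return [value * std + mu for value in y_norm]


y = [10.0, 12.0, 14.0, 16.0]
y_norm, y_mu, y_std = standardize(y)
restored = destandardize(y_norm, y_mu, y_std)
```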

Example

>>> preprocessor = nodegam.mypreprocessor.MyPreprocessor(
...     cat_features=['ethnicity', 'gender'],
...     y_normalize=True,
...     random_state=1337,
... )
>>> preprocessor.fit(X_train, y_train)
>>> X_train, y_train = preprocessor.transform(X_train, y_train)

fit(X, y)#

Fit the transformer.

Parameters
  • X (pandas dataframe) – Input data.

  • y (numpy array) – target y.

transform(*args)#

Transform the data.

Parameters
  • X (pandas dataframe) – Input data.

  • y (numpy array) – Optional. If passed in, it will do target normalization.

Returns
  • X (pandas dataframe) – Normalized input data.

  • y (numpy array) – Optional. Normalized y.

nodegam.nn_utils module#

Neural Network related utils like Entmax and Modules.

class nodegam.nn_utils.EM15Temp(steps, max_temp=1.0, min_temp=0.01, sample_soft=False)#

Bases: _Temp

EntMax15 with temperature annealing.

Annealing temperature from max to min in log10 space.

Parameters
  • steps – The number of steps to change from max_temp to the min_temp in log10 space.

  • max_temp – The max (initial) temperature.

  • min_temp – The min (final) temperature.

  • sample_soft – If False, the model does a hard operation after the specified steps.
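One way to read "annealing in log10 space" is a linear interpolation between log10(max_temp) and log10(min_temp) over the given number of steps. The sketch below follows that reading; the function name `annealed_temperature` and the linear schedule are assumptions, not the class’s confirmed behaviour.

```python
import math


def annealed_temperature(step, steps, max_temp=1.0, min_temp=0.01):
    """Temperature after `step` updates, interpolated linearly in log10 space."""
    # Fraction of the schedule completed, clamped to [0, 1].
    frac = min(max(step / steps, 0.0), 1.0)
    log_max = math.log10(max_temp)
    log_min = math.log10(min_temp)
    return 10.0 ** (log_max + frac * (log_min - log_max))


start = annealed_temperature(0, steps=100)
middle = annealed_temperature(50, steps=100)
end = annealed_temperature(100, steps=100)
```

With the default temperatures the schedule passes through 1.0, 0.1 and 0.01 at the start, midpoint and end, so each half of the schedule divides the temperature by ten.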

forward_with_tau(logits, dim)#

training: bool#

class nodegam.nn_utils.EMoid15Temp(**kwargs)#

Bases: _Temp

Entmoid with temperature annealing.

Annealing temperature from max to min in log10 space.

Parameters
  • steps – The number of steps to change from max_temp to the min_temp in log10 space.

  • max_temp – The max (initial) temperature.

  • min_temp – The min (final) temperature.

  • sample_soft – If False, the model does a hard operation after the specified steps.

discrete_op(logits, dim=-1)#

forward_with_tau(logits, dim=-1)#

training: bool#

class nodegam.nn_utils.Entmax15Function(*args, **kwargs)#

Bases: Function

Entropy Max (EntMax).

An implementation of exact Entmax with alpha=1.5 (B. Peters, V. Niculae, A. Martins). See https://arxiv.org/abs/1905.05702 for a detailed description. Source: https://github.com/deep-spin/entmax
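For intuition, entmax with alpha=1.5 maps scores z to p_i = max(z_i / 2 - tau, 0)^2, with tau chosen so that the p_i sum to 1. The sketch below finds tau by bisection on a plain list. This is a swapped-in numerical method for illustration, not the exact algorithm this class implements, and the name `entmax15_bisect` is hypothetical.

```python
def entmax15_bisect(z, n_iter=60):
    """Approximate 1.5-entmax of a list of scores via bisection on the threshold tau."""
    half = [value / 2.0 for value in z]
    hi = max(half)   # at tau = hi the total mass is 0
    lo = hi - 1.0    # at tau = lo the largest entry alone already has mass 1
    for _ in range(n_iter):
        tau = (lo + hi) / 2.0
        mass = sum(max(h - tau, 0.0) ** 2 for h in half)
        if mass >= 1.0:
            lo = tau   # too much mass: raise the threshold
        else:
            hi = tau   # too little mass: lower the threshold
    tau = (lo + hi) / 2.0
    p = [max(h - tau, 0.0) ** 2 for h in half]
    total = sum(p)
    # Renormalize to remove the small residual error of the bisection.
    return [value / total for value in p]


uniform = entmax15_bisect([1.0, 1.0, 1.0])
sparse = entmax15_bisect([0.0, 10.0])
```

Unlike softmax, a large enough gap between scores yields exact zeros, as in the second call.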

static backward(ctx, grad_output)#

Defines a formula for differentiating the operation with backward mode automatic differentiation (alias to the vjp function).

This function is to be overridden by all subclasses.

It must accept a context ctx as the first argument, followed by as many outputs as the forward() returned (None will be passed in for non tensor outputs of the forward function), and it should return as many tensors, as there were inputs to forward(). Each argument is the gradient w.r.t the given output, and each returned value should be the gradient w.r.t. the corresponding input. If an input is not a Tensor or is a Tensor not requiring grads, you can just pass None as a gradient for that input.

The context can be used to retrieve tensors saved during the forward pass. It also has an attribute ctx.needs_input_grad as a tuple of booleans representing whether each input needs gradient. E.g., backward() will have ctx.needs_input_grad[0] = True if the first input to forward() needs gradient computed w.r.t. the output.

static forward(ctx, input, dim=-1)#

Performs the operation.

This function is to be overridden by all subclasses.

It must accept a context ctx as the first argument, followed by any number of arguments (tensors or other types).

The context can be used to store arbitrary data that can be then retrieved during the backward pass. Tensors should not be stored directly on ctx (though this is not currently enforced for backward compatibility). Instead, tensors should be saved either with ctx.save_for_backward() if they are intended to be used in backward (equivalently, vjp) or ctx.save_for_forward() if they are intended to be used in jvp.

class nodegam.nn_utils.Entmoid15(*args, **kwargs)#

Bases: Function

A highly optimized equivalent of lambda x: Entmax15([x, 0]).
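Taking the first component of 1.5-entmax on the pair [x, 0] gives a closed form: solving (x/2 - tau)^2 + tau^2 = 1 for tau yields ((x + sqrt(8 - x^2)) / 4)^2 for |x| < 2, saturating at 0 and 1 outside that range. The scalar sketch below follows that derivation; the name `entmoid15_scalar` is hypothetical and this is not the optimized tensor code of the class.

```python
import math


def entmoid15_scalar(x):
    """First component of 1.5-entmax applied to the pair [x, 0]."""
    if x >= 2.0:
        return 1.0   # all mass on x
    if x <= -2.0:
        return 0.0   # all mass on the zero entry
    return ((x + math.sqrt(8.0 - x * x)) / 4.0) ** 2


at_zero = entmoid15_scalar(0.0)
pair_sum = entmoid15_scalar(1.0) + entmoid15_scalar(-1.0)
```

Like the sigmoid, the function satisfies f(x) + f(-x) = 1, but it reaches exactly 0 and 1 at finite inputs.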

static backward(ctx, grad_output)#

Defines a formula for differentiating the operation with backward mode automatic differentiation (alias to the vjp function).

This function is to be overridden by all subclasses.

It must accept a context ctx as the first argument, followed by as many outputs as the forward() returned (None will be passed in for non tensor outputs of the forward function), and it should return as many tensors, as there were inputs to forward(). Each argument is the gradient w.r.t the given output, and each returned value should be the gradient w.r.t. the corresponding input. If an input is not a Tensor or is a Tensor not requiring grads, you can just pass None as a gradient for that input.

The context can be used to retrieve tensors saved during the forward pass. It also has an attribute ctx.needs_input_grad as a tuple of booleans representing whether each input needs gradient. E.g., backward() will have ctx.needs_input_grad[0] = True if the first input to forward() needs gradient computed w.r.t. the output.

static forward(ctx, input)#

Performs the operation.

This function is to be overridden by all subclasses.

It must accept a context ctx as the first argument, followed by any number of arguments (tensors or other types).

The context can be used to store arbitrary data that can be then retrieved during the backward pass. Tensors should not be stored directly on ctx (though this is not currently enforced for backward compatibility). Instead, tensors should be saved either with ctx.save_for_backward() if they are intended to be used in backward (equivalently, vjp) or ctx.save_for_forward() if they are intended to be used in jvp.

class nodegam.nn_utils.GSMTemp(steps, max_temp=1.0, min_temp=0.01, sample_soft=False)#

Bases: _Temp

Gumbel Softmax with temperature annealing.

Annealing temperature from max to min in log10 space.

Parameters
  • steps – The number of steps to change from max_temp to the min_temp in log10 space.

  • max_temp – The max (initial) temperature.

  • min_temp – The min (final) temperature.

  • sample_soft – If False, the model does a hard operation after the specified steps.

forward_with_tau(logits, dim)#

training: bool#

class nodegam.nn_utils.Lambda(func)#

Bases: Module

Initializes internal Module state, shared by both nn.Module and ScriptModule.

forward(*args, **kwargs)#

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

training: bool#

class nodegam.nn_utils.ModuleWithInit#

Bases: Module

Base class for pytorch module with data-aware initializer on first batch.

Initializes internal Module state, shared by both nn.Module and ScriptModule.

initialize(*args, **kwargs)#

Initialize module tensors using the first batch of data.

training: bool#

class nodegam.nn_utils.SMTemp(steps, max_temp=1.0, min_temp=0.01, sample_soft=False)#

Bases: _Temp

Softmax with temperature annealing.

Annealing temperature from max to min in log10 space.

Parameters
  • steps – The number of steps to change from max_temp to the min_temp in log10 space.

  • max_temp – The max (initial) temperature.

  • min_temp – The min (final) temperature.

  • sample_soft – If False, the model does a hard operation after the specified steps.

forward_with_tau(logits, dim)#

training: bool#

class nodegam.nn_utils.SparsemaxFunction(*args, **kwargs)#

Bases: Function

Sparsemax function.

An implementation of sparsemax (Martins & Astudillo, 2016). See DBLP:journals/corr/MartinsA16 for a detailed description.

By Ben Peters and Vlad Niculae.

static backward(ctx, grad_output)#

Defines a formula for differentiating the operation with backward mode automatic differentiation (alias to the vjp function).

This function is to be overridden by all subclasses.

It must accept a context ctx as the first argument, followed by as many outputs as the forward() returned (None will be passed in for non tensor outputs of the forward function), and it should return as many tensors, as there were inputs to forward(). Each argument is the gradient w.r.t the given output, and each returned value should be the gradient w.r.t. the corresponding input. If an input is not a Tensor or is a Tensor not requiring grads, you can just pass None as a gradient for that input.

The context can be used to retrieve tensors saved during the forward pass. It also has an attribute ctx.needs_input_grad as a tuple of booleans representing whether each input needs gradient. E.g., backward() will have ctx.needs_input_grad[0] = True if the first input to forward() needs gradient computed w.r.t. the output.

static forward(ctx, input, dim=-1)#

sparsemax: normalizing sparse transform (a la softmax)

Parameters
  • input – Any dimension.

  • dim – Dimension along which to apply.

Returns

output (Tensor) – Same shape as input.
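
For intuition, the sort-and-threshold form of sparsemax from Martins & Astudillo (2016) can be sketched in pure Python for a single 1-D list of scores. This is only an illustration, not the batched autograd implementation in this module. The name `sparsemax_1d` is hypothetical.

```python
def sparsemax_1d(z):
    """Hypothetical helper: sparsemax of a 1-D list of scores."""
    z_sorted = sorted(z, reverse=True)
    cumsum = 0.0
    k_star, cumsum_star = 0, 0.0
    # Find the largest k such that 1 + k * z_(k) > sum of the top-k scores.
    for k, z_k in enumerate(z_sorted, start=1):
        cumsum += z_k
        if 1.0 + k * z_k > cumsum:
            k_star, cumsum_star = k, cumsum
    # Threshold that makes the surviving entries sum to 1.
    tau = (cumsum_star - 1.0) / k_star
    return [max(z_i - tau, 0.0) for z_i in z]
```

Unlike softmax, the result can contain exact zeros: scores that fall below the threshold receive no probability mass.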

nodegam.nn_utils.entmax15(input, dim=-1)#
nodegam.nn_utils.entmoid15()#
nodegam.nn_utils.my_one_hot(val, dim=-1)#

Make one hot encoding along certain dimension and not just the last dimension.

Parameters
  • val – A pytorch tensor.

  • dim – The dimension.

nodegam.nn_utils.sparsemax(input, dim=-1)#
nodegam.nn_utils.sparsemoid(input)#
nodegam.nn_utils.to_one_hot(y, depth=None)#

Make the target become one-hot encoding.

Takes integer with n dims and converts it to 1-hot representation with n + 1 dims. The n+1’st dimension will have zeros everywhere but at y’th index, where it will be equal to 1.

Parameters
  • y – Input integer (IntTensor, LongTensor or Variable) of any shape.

  • depth (int) – The size of the one hot dimension.

Returns

y_onehot – The onehot encoding of y.
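
The encoding described above can be sketched for the simplest case, a flat list of integer labels. This is a list-based illustration rather than the tensor version in this module. The name `to_one_hot_list` is hypothetical, and using the largest label plus one when depth is omitted is an assumption.

```python
def to_one_hot_list(y, depth=None):
    """Hypothetical list-based analogue of to_one_hot for a 1-D list of ints."""
    # Assumption: when depth is omitted, use the largest label plus one.
    if depth is None:
        depth = max(y) + 1
    # Each label becomes a row that is 1 at the label's index and 0 elsewhere.
    return [[1 if j == label else 0 for j in range(depth)] for label in y]
```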

nodegam.odst module#

Implementation of NODE-GAM layer.

class nodegam.odst.GAMAttODST(in_features, num_trees, tree_dim=1, depth=6, choice_function=<function <lambda>>, bin_function=<built-in method apply of FunctionMeta object>, initialize_response_=<function normal_>, initialize_selection_logits_=<function uniform_>, colsample_bytree=1.0, selectors_detach=True, ga2m=0, prev_in_features=0, dim_att=8, **kwargs)#

Bases: GAM_ODST

A layer of GAM ODST trees with attention mechanism.

Modifies a layer of ODST trees so that each tree depends on at most 1 feature (GAM) or 2 features (GA2M), and adds an attention mechanism between layers.

Parameters
  • in_features – Number of features in the input tensor.

  • num_trees – Number of trees in this layer.

  • tree_dim – Number of response channels in the response of individual tree.

  • depth – Number of splits in every tree.

  • choice_function – f(tensor, dim) -> R_simplex computes feature weights s.t. f(tensor, dim).sum(dim) == 1.

  • bin_function – f(tensor) -> R[0, 1], computes tree leaf weights.

  • initialize_response_ – In-place initializer for tree output tensor.

  • initialize_selection_logits_ – In-place initializer for logits that select features for the tree. Both thresholds and scales are initialized with data-aware init (or .load_state_dict).

  • colsample_bytree – The random proportion of features allowed in each tree. The same argument as in the xgboost package. If less than 1, each tree chooses among only a random fraction of the features. For instance, if colsample_bytree = 0.9, each tree selects among only 90% of the features.

  • selectors_detach – If True, the selector will be detached before passing into the next layer. This saves GPU memory on large datasets (e.g. Epsilon).

  • fs_normalize – If True, normalize the feature selectors so that they sum to 1. In practice, True and False make little difference in performance.

  • ga2m – If set to 1, use GA2M, else use GAM.

  • prev_in_features – The number of previous layers’ outputs.

  • dim_att – The dimension of attention embedding to reduce memory consumption.

  • kwargs – For other old unused arguments for compatibility reasons.

cal_prev_feat_weights(feature_selectors, pfs)#

Calculate the feature weights of the previous trees outputs.

To make sure it’s a GAM or GA2M, the weights should be 0 if the previous tree focuses on a different feature (or set of features) than the current tree, and 1 if they are the same.

Parameters
  • feature_selectors – The current feature selector of this layer.

  • pfs – The previous feature selectors.

Returns

fw – The feature weights for the previous trees’ outputs. Values are between 0 and 1 with shape as [prev_trees_outputs, current_tree_outputs, depth], where depth=1 in GAM and depth=2 in GA2M.

training: bool#
class nodegam.odst.GAM_ODST(in_features, num_trees, tree_dim=1, depth=6, choice_function=<function <lambda>>, bin_function=<built-in method apply of FunctionMeta object>, initialize_response_=<function normal_>, initialize_selection_logits_=<function uniform_>, colsample_bytree=1.0, selectors_detach=True, fs_normalize=True, ga2m=0, **kwargs)#

Bases: ODST

A layer of GAM ODST trees.

Modifies a layer of ODST trees so that each tree depends on at most 1 feature (GAM) or 2 features (GA2M).

Parameters
  • in_features – Number of features in the input tensor.

  • num_trees – Number of trees in this layer.

  • tree_dim – Number of response channels in the response of individual tree.

  • depth – Number of splits in every tree.

  • choice_function – f(tensor, dim) -> R_simplex computes feature weights s.t. f(tensor, dim).sum(dim) == 1.

  • bin_function – f(tensor) -> R[0, 1], computes tree leaf weights.

  • initialize_response_ – In-place initializer for tree output tensor.

  • initialize_selection_logits_ – In-place initializer for logits that select features for the tree. Both thresholds and scales are initialized with data-aware init (or .load_state_dict).

  • colsample_bytree – The random proportion of features allowed in each tree. The same argument as in the xgboost package. If less than 1, each tree chooses among only a random fraction of the features. For instance, if colsample_bytree = 0.9, each tree selects among only 90% of the features.

  • selectors_detach – If True, the selector will be detached before passing into the next layer. This saves GPU memory on large datasets (e.g. Epsilon).

  • fs_normalize – If True, normalize the feature selectors so that they sum to 1. In practice, True and False make little difference in performance.

  • ga2m – If set to 1, use GA2M, else use GAM.

  • kwargs – For other old unused arguments for compatibility reasons.

cal_prev_feat_weights(myfs, pfs)#

Calculate the feature weights of the previous trees outputs.

To make sure it’s a GAM or GA2M, the weights should be 0 if the previous tree focuses on a different feature (or set of features) than the current tree, and 1 if they are the same.

Parameters
  • myfs – The current feature selector of this layer.

  • pfs – The previous feature selectors.

Returns

fw – The feature weights for the previous trees’ outputs. Values are between 0 and 1 with shape as [prev_trees_outputs, current_tree_outputs, depth], where depth=1 in GAM and depth=2 in GA2M.
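
The 0/1 rule described above can be illustrated for the limiting case in which every tree has already committed to a hard choice of features. This is a simplified sketch: the library works with soft selector tensors and returns an extra depth dimension, both of which are ignored here. The name `hard_prev_feat_weights` is hypothetical.

```python
def hard_prev_feat_weights(prev_feature_sets, cur_feature_sets):
    """Hypothetical helper: weight matrix [prev_trees][current_trees].

    An entry is 1.0 when the previous tree and the current tree use exactly
    the same set of features, and 0.0 otherwise.
    """
    return [
        [1.0 if set(prev) == set(cur) else 0.0 for cur in cur_feature_sets]
        for prev in prev_feature_sets
    ]
```

A current tree on features (1, 0) therefore receives input only from previous trees on the same pair, which is what keeps the stacked model a GA2M.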

forward(input, return_feature_selectors=True, prev_feature_selectors=None)#

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

get_feature_selection_values(input, return_fss=False)#

Get the selected features of each tree.

Parameters
  • input – Input data of shape [batch_size, in_features].

  • return_fss – If True, return the feature selectors.

Returns
  • feature_values – The feature input to trees in a batch, with shape [batch_size, num_trees, tree_depth].

  • feature_selectors – (Optional) the feature selectors.

get_num_trees_assigned_to_each_feature()#
initialize(input, return_feature_selectors=True, prev_feature_selectors=None, eps=1e-06)#

Initialize module tensors using the first batch of data.

post_process(feature_selectors)#
training: bool#
class nodegam.odst.ODST(in_features, num_trees, depth=6, tree_dim=1, choice_function=<function <lambda>>, bin_function=<built-in method apply of FunctionMeta object>, initialize_response_=<function normal_>, initialize_selection_logits_=<function uniform_>, threshold_init_beta=1.0, threshold_init_cutoff=1.0, colsample_bytree=1.0, **kwargs)#

Bases: ModuleWithInit

Oblivious Differentiable Sparsemax Trees. http://tinyurl.com/odst-readmore.

One can drop (sic!) this module anywhere instead of nn.Linear

Parameters
  • in_features – Number of features in the input tensor.

  • num_trees – Number of trees in this layer.

  • tree_dim – Number of response channels in the response of individual tree.

  • depth – Number of splits in every tree.

  • flatten_output – If False, returns […, num_trees, tree_dim], by default returns […, num_trees * tree_dim].

  • choice_function – f(tensor, dim) -> R_simplex computes feature weights s.t. f(tensor, dim).sum(dim) == 1.

  • bin_function – f(tensor) -> R[0, 1], computes tree leaf weights.

  • initialize_response_ – In-place initializer for tree output tensor.

  • initialize_selection_logits_ – In-place initializer for logits that select features for the tree. Both thresholds and scales are initialized with data-aware init (or .load_state_dict).

  • threshold_init_beta – Initializes threshold to a q-th quantile of data points where q ~ Beta(:threshold_init_beta:, :threshold_init_beta:). If this param is set to 1, initial thresholds will have the same distribution as data points. If greater than 1 (e.g. 10), thresholds will be closer to median data value. If less than 1 (e.g. 0.1), thresholds will approach min/max data values.

  • threshold_init_cutoff

    Threshold log-temperatures initializer, in (0, inf). By default (1.0), log-temperatures are initialized in such a way that all bin selectors end up in the linear region of sparse-sigmoid. The temperatures are then scaled by this parameter:

    - Setting this value > 1.0 will result in a margin between data points and
        the sparse-sigmoid cutoff value. All points will be between
        (0.5 - 0.5 / threshold_init_cutoff) and (0.5 + 0.5 / threshold_init_cutoff).
    - Setting this value < 1.0 will cause (1 - value) part of data points to end up
        in the flat sparse-sigmoid region. For instance, threshold_init_cutoff = 0.9
        will set 10% of points equal to 0.0 or 1.0.
    

  • colsample_bytree – The random proportion of features allowed in each tree. The same argument as in the xgboost package. If less than 1, each tree chooses among only a random fraction of the features. For instance, if colsample_bytree = 0.9, each tree selects among only 90% of the features.

  • kwargs – For other old unused arguments for compatibility reasons.
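
The two threshold initialization parameters can be illustrated with plain Python. This is a minimal sketch, assuming the threshold is the q-th empirical quantile of one feature's values with q drawn from Beta(threshold_init_beta, threshold_init_beta), and using the range formula quoted for threshold_init_cutoff. Both helper names are hypothetical and the linear quantile interpolation is an assumption.

```python
import random


def sample_initial_threshold(values, threshold_init_beta=1.0, rng=None):
    """Hypothetical helper: pick the q-th quantile of `values`, q ~ Beta(b, b)."""
    rng = rng or random.Random(0)
    q = rng.betavariate(threshold_init_beta, threshold_init_beta)
    ordered = sorted(values)
    # Linear interpolation between the two nearest order statistics.
    pos = q * (len(ordered) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(ordered) - 1)
    return ordered[lo] + (pos - lo) * (ordered[hi] - ordered[lo])


def cutoff_range(threshold_init_cutoff):
    """Range that bin-selector values fall into when the cutoff is above 1.0."""
    half_width = 0.5 / threshold_init_cutoff
    return 0.5 - half_width, 0.5 + half_width
```

A large beta concentrates q near 0.5, so the sampled threshold lands close to the median. With threshold_init_cutoff = 2.0, the range is (0.25, 0.75).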

forward(input)#

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

get_feature_selection_values(input)#

Get the selected features of each tree.

Parameters

input – Input data of shape [batch_size, in_features].

Returns

feature_values – The feature input to trees in a batch with shape as [batch_size, num_trees, tree_depth].

get_feature_selectors()#

Get the feature selectors of each tree of each depth.

Returns

feature_selectors – Tensor of shape [in_features, num_trees, tree_depth]. The values of first dimension sum to 1.
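
The shapes documented for get_feature_selectors() and get_feature_selection_values() fit together as a contraction over in_features. The sketch below spells this out with nested lists. It assumes the feature values are the selector-weighted sum of the inputs, which the source does not state explicitly. The name `select_feature_values` is hypothetical.

```python
def select_feature_values(batch, feature_selectors):
    """Hypothetical helper: contract inputs with selectors over in_features.

    batch:             [batch_size][in_features]
    feature_selectors: [in_features][num_trees][tree_depth]
    returns:           [batch_size][num_trees][tree_depth]
    """
    in_features = len(feature_selectors)
    num_trees = len(feature_selectors[0])
    depth = len(feature_selectors[0][0])
    return [
        [
            [
                # Weighted sum of the row's features for tree n at depth d.
                sum(row[i] * feature_selectors[i][n][d] for i in range(in_features))
                for d in range(depth)
            ]
            for n in range(num_trees)
        ]
        for row in batch
    ]
```

When a selector is one-hot along the first dimension, the weighted sum simply picks out a single input feature.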

initialize(input, eps=1e-06)#

Initialize module tensors using the first batch of data.

training: bool#
nodegam.odst.entmoid15()#

nodegam.recorder module#

A simple recorder to store the model’s training progress.

class nodegam.recorder.Recorder(path)#

Bases: object

A recorder to store the model’s training progress.

Useful to resume training if interrupted by the scheduler. It will reload the record if a previous record exists.

Parameters

path – the path to store the record.

clear()#

Remove the record.

load_record()#

Load the record.

save_record()#

Save the record.

nodegam.sklearn module#

This file implements a higher-level NodeGAM model that can just call fit(X, y).

The goal is to provide a simple interface for users who just want to use it like:

>>> model = NodeGAM()
>>> model.fit(X, y)
class nodegam.sklearn.NodeGAMBase(in_features, cat_features=None, validation_size=0.15, quantile_dist='normal', quantile_noise=0.001, name=None, seed=1377, arch='GAM', ga2m=1, num_classes=1, num_trees=200, num_layers=2, depth=3, addi_tree_dim=0, colsample_bytree=0.5, output_dropout=0, last_dropout=0.3, l2_lambda=0, dim_att=8, n_last_checkpoints=5, batch_size=2048, lr=0.01, lr_warmup_steps=100, lr_decay_steps=300, early_stopping_steps=2000, max_steps=20000, max_time=72000, anneal_steps=2000, report_frequency=100, fp16=0, device='cuda', objective='negative_auc', problem='classification', verbose=1)#

Bases: object

Base class for all NodeGAM.

NodeGAM Base.

fit(X: DataFrame, y: ndarray, X_val: Optional[DataFrame] = None, y_val: Optional[ndarray] = None)#

Train the model.

Parameters
  • X (pandas dataframe) – inputs.

  • y (numpy array) – targets.

  • X_val (pandas dataframe) – if set, instead of splitting validation set from the X, it uses this X as validation set.

  • y_val (numpy array) – if set, uses this as validation y.

Returns
  • train_losses – the training losses of each optimization step.

  • val_metrics – the validation metrics, evaluated every report_frequency steps.

get_GAM_df(all_X: DataFrame, max_n_bins: int = 256)#

Extract the GAM dataframe from the model.

Parameters
  • all_X – all the input data in X.

  • max_n_bins – max number of bins per feature.

Returns

df – a GAM dataframe with each row representing a GAM term.

print(*args)#
visualize(X: DataFrame, max_n_bins: int = 256, show_density: bool = False)#

Visualize the GAM graph.

Parameters
  • X – all the input data.

  • max_n_bins – max number of bins per feature.

  • show_density – if True, show the density of data as red colors in the background in the main effect plot.

Returns
  • fig – the figure.

  • axes – all the subplots.

  • df – the GAM dataframe.

class nodegam.sklearn.NodeGAMClassifier(in_features, cat_features=None, validation_size=0.15, quantile_dist='normal', quantile_noise=0.001, name=None, seed=1377, arch='GAM', ga2m=1, num_classes=1, num_trees=200, num_layers=2, depth=3, addi_tree_dim=0, colsample_bytree=0.5, output_dropout=0, last_dropout=0.3, l2_lambda=0, dim_att=8, n_last_checkpoints=5, batch_size=2048, lr=0.01, lr_warmup_steps=100, lr_decay_steps=300, early_stopping_steps=2000, max_steps=10000, max_time=72000, anneal_steps=2000, report_frequency=100, fp16=0, device='cuda', objective='ce_loss', verbose=1)#

Bases: NodeGAMBase

A NodeGAM Classifier that follows the sklearn interface to train.

Parameters
  • in_features (int) – number of input features.

  • cat_features – the names of the categorical features, matching the columns of X.

  • validation_size – validation size.

  • quantile_dist – choose between [‘normal’, ‘uniform’]. Data is projected onto this distribution. See the flag ‘output_dist’ of sklearn QuantileTransformer.

  • quantile_noise – fits QuantileTransformer on data with added gaussian noise with std = :quantile_noise: * data.std; this will cause discrete values to be more separable. Please note that this transformation does NOT apply gaussian noise to the resulting data, the noise is only applied for QuantileTransformer.fit().

  • name – the model’s name. It’s used to store checkpoints under logs/{name}. If not specified, a temporary name is generated randomly.

  • seed – random seed.

  • arch – choose between [‘GAM’, ‘GAMAtt’]. GAMAtt is the architecture with attention. Often GAMAtt is better in large datasets while GAM is better in smaller ones.

  • ga2m – if 0, only model GAM. If 1, model GA2M.

  • num_classes – number of target classes. If set to 1, it is binary classification. Set to > 2 for multi-class classifications, but the visualization is not available yet for the multi-class setup.

  • num_trees – number of trees per layer.

  • num_layers – number of layers of trees.

  • depth – depth of the tree. Should be at least 2 if ga2m=1.

  • addi_tree_dim – additional dimension of tree’s output. Default: 0.

  • colsample_bytree – the random proportion of features allowed in each tree. The same argument as in xgboost package. If less than 1, for each tree, it will only choose a fraction of features to train.

  • output_dropout – the dropout rate on the output of each tree.

  • last_dropout – the dropout rate on the weight of the last linear layer.

  • l2_lambda – the l2 penalty coefficient on the outputs of trees.

  • dim_att – the dimension of the attention embedding.

  • n_last_checkpoints – number of the most recent checkpoints to average.

  • batch_size – batch size. Should be bigger than 1024.

  • lr – the learning rate.

  • lr_warmup_steps – warm up the learning rate in the first few steps.

  • lr_decay_steps – decrease the learning rate by half if not improving for these steps.

  • early_stopping_steps – early stopping if not improving for k steps.

  • max_steps – maximum number of steps to optimize.

  • max_time – maximum training time in seconds.

  • anneal_steps – temperature annealing steps. After this step, the EntMax becomes Max.

  • report_frequency – report the training progress every this many steps.

  • fp16 – if 1, use fp16 to optimize.

  • device – choose from [‘cpu’, ‘cuda’]. Default: ‘cuda’.

  • objective – the evaluation objective. For binary classification (num_classes=1), choose from [‘ce_loss’, ‘negative_auc’, ‘error_rate’]. If num_classes > 2 (multi-class classifier), only [‘ce_loss’, ‘error_rate’] are allowed.

  • verbose – if 1, print the training progress.

predict(X: DataFrame)#

Predict logits.

Parameters

X (pandas dataframe) – Input.

Returns

logits (numpy array) – logits.

predict_proba(X: DataFrame)#

Predict probability.

Parameters

X – pandas dataframe.

Returns

prob (numpy array) – the probability of 2 classes with shape [N, 2].
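
The relationship between predict() and predict_proba() in the binary case can be sketched as follows. This assumes the two-column probabilities are obtained by passing each logit through a sigmoid, which the source does not spell out. The name `logits_to_two_class_proba` is hypothetical, and plain lists stand in for numpy arrays.

```python
import math


def logits_to_two_class_proba(logits):
    """Hypothetical helper: map binary logits to rows of [P(class 0), P(class 1)]."""
    rows = []
    for z in logits:
        # Numerically stable sigmoid: never call exp() on a large positive number.
        if z >= 0:
            p1 = 1.0 / (1.0 + math.exp(-z))
        else:
            e = math.exp(z)
            p1 = e / (1.0 + e)
        rows.append([1.0 - p1, p1])
    return rows
```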

class nodegam.sklearn.NodeGAMRegressor(in_features, cat_features=None, validation_size=0.15, quantile_dist='normal', quantile_noise=0.001, name=None, seed=1377, arch='GAM', ga2m=1, num_trees=200, num_layers=2, depth=3, addi_tree_dim=0, colsample_bytree=0.5, output_dropout=0, last_dropout=0.3, l2_lambda=0, dim_att=8, n_last_checkpoints=5, batch_size=2048, lr=0.01, lr_warmup_steps=100, lr_decay_steps=600, early_stopping_steps=2000, max_steps=20000, max_time=72000, anneal_steps=2000, report_frequency=100, fp16=0, device='cuda', verbose=1)#

Bases: NodeGAMBase

A NodeGAM Regressor that follows sklearn interface to train.

Parameters
  • in_features (int) – number of input features.

  • cat_features – the names of the categorical features, matching the columns of X.

  • validation_size – validation size.

  • quantile_dist – choose between [‘normal’, ‘uniform’]. Data is projected onto this distribution. See the flag ‘output_dist’ of sklearn QuantileTransformer.

  • quantile_noise – fits QuantileTransformer on data with added gaussian noise with std = :quantile_noise: * data.std; this will cause discrete values to be more separable. Please note that this transformation does NOT apply gaussian noise to the resulting data, the noise is only applied for QuantileTransformer.fit().

  • name – the model’s name. It’s used to store checkpoints under logs/{name}. If not specified, a temporary name is generated randomly.

  • seed – random seed.

  • arch – choose between [‘GAM’, ‘GAMAtt’]. GAMAtt is the architecture with attention. Often GAMAtt is better in large datasets while GAM is better in smaller ones.

  • ga2m – if 0, only model GAM. If 1, model GA2M.

  • num_trees – number of trees per layer.

  • num_layers – number of layers of trees.

  • depth – depth of the tree. Should be at least 2 if ga2m=1.

  • addi_tree_dim – additional dimension of tree’s output. Default: 0.

  • colsample_bytree – the random proportion of features allowed in each tree. The same argument as in xgboost package. If less than 1, for each tree, it will only choose a fraction of features to train.

  • output_dropout – the dropout rate on the output of each tree.

  • last_dropout – the dropout rate on the weight of the last linear layer.

  • l2_lambda – the l2 penalty coefficient on the outputs of trees.

  • dim_att – the dimension of the attention embedding.

  • n_last_checkpoints – number of the most recent checkpoints to average.

  • batch_size – batch size. Should be bigger than 1024.

  • lr – the learning rate.

  • lr_warmup_steps – warm up the learning rate in the first few steps.

  • lr_decay_steps – decrease the learning rate by half if not improving for these steps.

  • early_stopping_steps – early stopping if not improving for k steps.

  • max_steps – maximum number of steps to optimize.

  • max_time – maximum training time in seconds.

  • anneal_steps – temperature annealing steps. After this step, the EntMax becomes Max.

  • report_frequency – report the training progress every this many steps.

  • fp16 – if 1, use fp16 to optimize.

  • device – choose from [‘cpu’, ‘cuda’]. Default: ‘cuda’.

  • verbose – if 1, print the training progress.

predict(X: DataFrame)#

Predict regression.

Parameters

X – pandas dataframe.

Returns

prediction – numpy array.

nodegam.sklearn.entmoid15()#

nodegam.trainer module#

The trainer to optimize the model.

class nodegam.trainer.Trainer(model, experiment_name=None, warm_start=False, Optimizer=<class 'torch.optim.adam.Adam'>, optimizer_params={}, lr=0.01, lr_warmup_steps=-1, verbose=False, n_last_checkpoints=5, step_callbacks=[], fp16=0, problem='classification', pretraining_ratio=0.15, masks_noise=0.1, opt_only_last_layer=False, freeze_steps=0, **kwargs)#

Bases: Module

Trainer.

Parameters
  • model (torch.nn.Module) – the model.

  • experiment_name – a path where all logs and checkpoints are saved.

  • warm_start – when set to True, loads the last checkpoint.

  • Optimizer – function(parameters) -> optimizer. Default: torch.optim.Adam.

  • optimizer_params – parameters used when initializing the optimizer. Usage: Optimizer(**optimizer_params).

  • verbose – when set to True, produces logging information.

  • n_last_checkpoints – the last few checkpoints to do model averaging.

  • step_callbacks – function(step). Will be called after each optimization step.

  • problem – problem type. Chosen from [‘classification’, ‘regression’, ‘pretrain’].

  • pretraining_ratio – the percentage of feature to mask for reconstruction. Between 0 and 1. Only used when problem == ‘pretrain’.
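
The model averaging mentioned for n_last_checkpoints can be illustrated with plain dictionaries. This sketch assumes that averaging means an element-wise mean of each parameter across checkpoints; the source does not describe the procedure, so treat it as an assumption. The name `average_parameter_dicts` is hypothetical, and lists of floats stand in for tensors.

```python
def average_parameter_dicts(checkpoints):
    """Hypothetical helper: element-wise mean of parameter dicts with identical keys."""
    n = len(checkpoints)
    averaged = {}
    for name in checkpoints[0]:
        # Group the i-th element of this parameter across all checkpoints.
        columns = zip(*(ckpt[name] for ckpt in checkpoints))
        averaged[name] = [sum(col) / n for col in columns]
    return averaged
```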

average_checkpoints(tags=None, paths=None, out_tag='avg', out_path=None)#
decrease_lr(ratio=0.1, min_lr=1e-06)#
evaluate_ce_loss(X_test, y_test, device, batch_size=512)#

Evaluate cross entropy loss for binary or multi-class targets.

Parameters
  • X_test – input features.

  • y_test (numpy Int array or torch Long tensor) – the target classes.

Returns

celoss (float) – the average cross entropy loss.

evaluate_classification_error(X_test, y_test, device, batch_size=4096)#

This is for evaluation of one or multi-class classification error rate.

evaluate_mse(X_test, y_test, device, batch_size=4096)#
evaluate_multiple_mse(X_test, y_test, device, batch_size=4096)#
evaluate_negative_auc(X_test, y_test, device, batch_size=4096)#
evaluate_pretrain_loss(X_test, y_test, device, batch_size=4096)#
get_latest_checkpoints(pattern, n_last=None)#
get_latest_file(pattern)#
load_checkpoint(tag=None, path=None, **kwargs)#
mask_input(x_batch)#
pretrain_loss(outputs, masks, targets)#
remove_old_temp_checkpoints(number_ckpts_to_keep=None)#
save_checkpoint(tag=None, path=None, mkdir=True, **kwargs)#
set_lr(lr)#
train_on_batch(*batch, device, update=True)#
training: bool#

nodegam.utils module#

All utilities including minibatches, files, seeds, model storages, and GAM extractions.

class nodegam.utils.Timer(name, remove_start_msg=True)#

Bases: object

A simple timer.

Parameters
  • name – the name of the timer.

  • remove_start_msg – if True, it will remove the start message of running.

Usage:
>>> with Timer('model training'):
...     train()
Run model training.........
Finish model training in 1.3s
nodegam.utils.average_GAM_dfs(all_dfs)#

Take average of GAM dataframes to derive mean and stdev for each term.

Parameters

all_dfs – a list of dataframes.

Returns

df – the averaged dataframe with mean, stdev and the importance.

nodegam.utils.average_GAMs(gam_dirs, **kwargs)#

Take average of GAM models to derive mean and stdev from their model names.

Parameters

gam_dirs – a list of model names. E.g. [‘0603_bikeshare’]. Each model has to be stored under “logs/{name}”.

Returns

df – the averaged dataframe with mean, stdev and the importance.

nodegam.utils.check_numpy(x)#

Makes sure x is a numpy array. If not, converts it to one.

nodegam.utils.extract_GAM_from_NODE(saved_dir, max_n_bins=256, way='blackbox', cache=False, **kwargs)#

Extract the GAM dataframe from the NodeGAM model.

Parameters
  • saved_dir – the saved directory of the NodeGAM.

  • max_n_bins – max number of bins of each feature.

  • way – choose from [‘blackbox’, ‘mine’]. ‘blackbox’ treats the model as a blackbox to extract a GAM dataframe. ‘mine’ applies only to NodeGAM models; it uses the internal knowledge of NodeGAM to extract the GAM/GA2M dataframe.

  • cache – if True, it stores ‘df_cache_bins{max_n_bins}.pkl’ under the saved_dir.

  • kwargs – the additional arguments when calling model.extract_additive_terms().

Returns

df – the GAM dataframe.

nodegam.utils.extract_GAM_from_baselines(saved_dir, max_n_bins=256, **kwargs)#

Extract the dataframe from other GAM baselines like EBM and Spline.

Parameters
  • saved_dir – the saved model’s directory.

  • max_n_bins – the max number of bins for each feature.

Returns

df – the GAM dataframe.

nodegam.utils.extract_GAM_from_saved_dir(saved_dir, max_n_bins=256, **kwargs)#

Extract the GAM dataframe from a saved model directory (either NodeGAM or EBM or Spline).

Parameters
  • saved_dir – the saved directory.

  • max_n_bins – max number of bins for each feature when extracting.

  • kwargs – additional arguments passed into NodeGAM.extract_additive_terms().

Returns

df – a GAM dataframe.

nodegam.utils.free_memory(sleep_time=0.1)#

Black magic function to free torch memory and some jupyter whims.

nodegam.utils.get_gpu_stat(pitem: str, device_id=0)#

Get the GPU stats.

Borrowed from pytorch lightning: https://github.com/PyTorchLightning/PyTorch-Lightning/blob/0.9.0/pytorch_lightning/callbacks/gpu_usage_logger.py#L30-L166

Parameters
  • pitem – the gpu partition.

  • device_id – the device id of gpu.

Returns

gpu_usage – the GPU memory consumption.

nodegam.utils.get_latest_file(pattern)#

Get the latest file matching the pattern.

Parameters

pattern – the glob-style pattern. E.g. ‘*.csv’.
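
A minimal sketch of this kind of lookup, assuming the pattern is a shell-style glob such as '*.csv' and that "latest" means most recently modified. The name `latest_matching_file` is hypothetical, and the library's own criterion for "latest" may differ.

```python
import glob
import os


def latest_matching_file(pattern):
    """Hypothetical helper: newest file (by modification time) matching a glob pattern."""
    matches = glob.glob(pattern)
    if not matches:
        raise FileNotFoundError(f"No files match {pattern!r}")
    return max(matches, key=os.path.getmtime)
```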

nodegam.utils.iterate_minibatches(*tensors, batch_size, shuffle=True, epochs=1, allow_incomplete=True, callback=<function <lambda>>)#

Run the minibatches.

Parameters
  • *tensors – the tensors to run minibatch.

  • batch_size – the batch size.

  • shuffle – if True, shuffle the tensors before each epoch starts.

  • epochs – the number of epochs to iterate minibatches.

  • allow_incomplete – if True, the last batch of each epoch can be smaller than the batch_size.

  • callback – f(list of batch start idxes). Could be useful to change the batch start idxes.

Example

>>> for x, y in iterate_minibatches(X, Y, batch_size=256, shuffle=True, epochs=10):
...     train(x, y)
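
How the shuffle, epochs and allow_incomplete parameters interact can be sketched with a small generator over plain Python sequences. This is an illustration under assumptions, not the library's implementation: it omits the callback parameter, and the name `iterate_minibatches_sketch` is hypothetical.

```python
import random


def iterate_minibatches_sketch(*sequences, batch_size, shuffle=True, epochs=1,
                               allow_incomplete=True, rng=None):
    """Hypothetical list-based analogue of iterate_minibatches."""
    rng = rng or random.Random(0)
    n = len(sequences[0])
    indices = list(range(n))
    for _ in range(epochs):
        if shuffle:
            # Reshuffle the shared index order before each epoch.
            rng.shuffle(indices)
        for start in range(0, n, batch_size):
            batch_idx = indices[start:start + batch_size]
            if len(batch_idx) < batch_size and not allow_incomplete:
                break
            # The same indices are applied to every sequence, keeping rows aligned.
            yield tuple([seq[i] for i in batch_idx] for seq in sequences)
```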
nodegam.utils.load_best_model_from_trained_dir(the_dir)#

Load the best NodeGAM model from a trained directory.

Follow the filenames of checkpoints in ‘main.py’.

Parameters

the_dir – the saved directory.

Returns

model – a pytorch NodeGAM model.

nodegam.utils.load_hparams(the_dir)#

Load the hyperparameters (hparams) from a directory.

nodegam.utils.make_predictions(model_name, X)#

Make predictions of some model.

Parameters
  • model_name – the model name. It’s saved under logs/{model_name}/.

  • X (pandas dataframe) – the input data.

Returns

ret (numpy array) – the prediction on X.

nodegam.utils.md5sum(fname)#

Computes the MD5 checksum of a file.

nodegam.utils.nop_ctx()#
nodegam.utils.output_csv(the_path, data_dict, order=None, delimiter=',')#

Output a csv file from a python dictionary.

If the csv file exists, it outputs another row under this csv file.

Parameters
  • the_path – the filename of the csv file.

  • data_dict – the data dictionary.

  • order – if specified, the columns of the csv follow the specified order. Default: None.

  • delimiter – the delimiter that separates the columns. Default: ‘,’.
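
The append-a-row behaviour described above can be sketched with the standard csv module. This is a minimal illustration, assuming the header is written only when the file does not yet exist. The name `append_csv_row` is hypothetical and is not the library's function.

```python
import csv
import os


def append_csv_row(the_path, data_dict, order=None, delimiter=","):
    """Hypothetical helper: append one row, writing the header only for a new file."""
    # Column order follows `order` when given, else the dict's own key order.
    fieldnames = list(order) if order is not None else list(data_dict)
    is_new_file = not os.path.exists(the_path)
    with open(the_path, "a", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames, delimiter=delimiter)
        if is_new_file:
            writer.writeheader()
        writer.writerow(data_dict)
```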

nodegam.utils.process_in_chunks(function, *args, batch_size, out=None, **kwargs)#

Computes output by applying batch-parallel function to large data tensor in chunks.

Parameters
  • function – a function(*[x[indices, …] for x in args]) -> out[indices, …].

  • args – one or many tensors, each [num_instances, …].

  • batch_size – maximum chunk size processed in one go.

  • out – memory buffer for out, defaults to torch.zeros of appropriate size and type.

Returns

out – the outputs of function(data), computed in a memory-efficient (mini-batch) way.
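
The chunking idea can be sketched with plain lists. This illustration assumes `function` maps aligned chunks of the inputs to a list of per-instance outputs. It leaves out the preallocated `out` buffer and the tensor handling of the real function, and the name `process_in_chunks_sketch` is hypothetical.

```python
def process_in_chunks_sketch(function, *args, batch_size):
    """Hypothetical list-based analogue: apply `function` chunk by chunk."""
    total = len(args[0])
    out = []
    for start in range(0, total, batch_size):
        # Slice every input with the same window so instances stay aligned.
        chunk = [arg[start:start + batch_size] for arg in args]
        out.extend(function(*chunk))
    return out
```

Only one chunk of inputs is passed to `function` at a time, which is what bounds the peak memory of each call.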

nodegam.utils.seed_everything(seed=None) → int#

Seed everything.

It seeds pytorch, numpy and python.random, and sets the PYTHONHASHSEED environment variable. Borrowed from pytorch_lightning.

Parameters

seed – the seed. If None, it generates one.

nodegam.utils.sigmoid_np(x)#

A sigmoid function for numpy array.

Parameters

x – numpy array.

Returns

the sigmoid value.

nodegam.utils.to_float_str(element)#

nodegam.vis_utils module#

Adapted from https://github.com/zzzace2000/GAMs_models/.

Visualization utilities include plotting GAMs and comparing pandas tables.

nodegam.vis_utils.add_new_row(table, series, row_name)#
nodegam.vis_utils.cal_statistics(table, is_metric_higher_better=True, add_ns_baseline=False)#

Calculate the statistics like average, average ranks across scores.

Parameters
  • table – a pandas table with each row as method and column as different datasets.

  • is_metric_higher_better – if True, treat the higher metric as better.

  • add_ns_baseline – if True, add a normalized score to the statistics.

Returns

A pandas table with the summary rows added (such as the average value). It also highlights the best number in red and the worst method in green.

nodegam.vis_utils.extract_mean(s)#

Extract the mean and remove stdev from the content of 0.123 +- 0.234.

Parameters

s – the string with format “mean +- stdev”. E.g. “0.123 +- 0.234”.

Returns

mean – a float number, e.g. 0.123.
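
A minimal sketch of this parsing, assuming the string always uses "+-" as the separator. The name `extract_mean_sketch` is hypothetical.

```python
def extract_mean_sketch(s):
    """Hypothetical helper: return the mean from a 'mean +- stdev' string."""
    # Everything before the first '+-' is the mean; float() ignores the spaces.
    mean_part, _, _ = s.partition("+-")
    return float(mean_part)
```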

nodegam.vis_utils.highlight_min_max(x, is_extract_mean=True)#
nodegam.vis_utils.normalized_score(x, is_metric_higher_better=True, min_value=None)#
nodegam.vis_utils.rank(x, is_metric_higher_better=True, is_extract_mean=True)#
nodegam.vis_utils.vis_GAM_effects(all_dfs, num_cols=4, figsize=None, vertical_margin=2, horizontal_margin=2, sort_by_imp=False, show_density=False, model_names=None, feature_names=None, feature_idxes=None, top_main=-1, top_interactions=-1, only_interactions=False, call_backs=None)#

Visualize main and interaction effects of the GAM model.

Parameters
  • all_dfs – the dictionary of dataframes. The key is the model name and the value is the GAM dataframe of each model.

  • num_cols – number of columns when showing GAM graphs.

  • figsize

    the figure size. If not specified, it uses (width, height) = (4 * num_cols + (num_cols - 1) * horizontal_margin, 3 * num_rows + vertical_margin * (num_rows - 1)).

  • vertical_margin – the vertical margin. Default: 2.

  • horizontal_margin – the horizontal margin. Default: 2.

  • sort_by_imp – if True, sort the figures by the feature importances. Otherwise use the feature default order.

  • show_density – if True, it represents the data density as color red in the background when showing the main effect GAM graph.

  • model_names – if specified, only show the GAM models corresponding to these.

  • feature_names – if specified, only show the GAM graphs corresponding to these names.

  • feature_idxes – if specified, only show the GAM graphs corresponding to these feature index.

  • top_main – if > 0, only show the top k main effects. If -1, show all main effects.

  • top_interactions – if > 0, only show the top k interactions. If -1, show all interactions.

  • only_interactions – if True, hide all the main effect plots and only show interaction terms.

  • call_backs – if specified, it calls this function at the end of plotting the graph. It should be a dict with key as the feature name and the value as a function (lambda ax: f(ax)) that can modify the axis corresponding to that feature. Useful to do feature-specific adjustment.
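
The default figure size given for the figsize parameter can be checked with a few lines of arithmetic. In this sketch num_rows is passed in explicitly; it is not an argument of vis_GAM_effects and is presumably derived from the number of plots, so treat that as an assumption. The name `default_figsize` is hypothetical.

```python
def default_figsize(num_cols, num_rows, horizontal_margin=2, vertical_margin=2):
    """Hypothetical helper: the documented default (width, height)."""
    width = 4 * num_cols + (num_cols - 1) * horizontal_margin
    height = 3 * num_rows + vertical_margin * (num_rows - 1)
    return width, height
```

For example, 4 columns and 3 rows with the default margins give a width of 16 + 6 = 22 and a height of 9 + 4 = 13.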

Returns
  • fig – the figure.

  • axes (numpy array) – all the axes.

Module contents#