trelawney package

Submodules

trelawney.base_explainer module

module that provides the base explainer class from which all future explainers will inherit

class trelawney.base_explainer.BaseExplainer[source]

Bases: abc.ABC

the base explainer class. This is an abstract class, so you will need to define some behaviors when implementing your new explainer. In order to do so, override the following (see the sketch after this list):

  • the fit method, which defines how (if needed) the explainer should be fitted
  • the feature_importance method, which extracts the relative importance of each feature on a dataset globally
  • the explain_local method, which extracts the relative impact of each feature on the final decision for every sample in a dataset
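
A minimal sketch of a custom explainer, using a hypothetical ConstantExplainer that gives every feature the same weight. It is only meant to show which methods to override; the exact requirements of the base class may differ:

from trelawney.base_explainer import BaseExplainer


class ConstantExplainer(BaseExplainer):
    """hypothetical explainer attributing equal importance to every feature"""

    def fit(self, model, x_train, y_train):
        # nothing to train here, just keep a reference to the explained model
        self._model = model
        return self

    def feature_importance(self, x_explain, n_cols=None):
        # same (equal) importance for every column, truncated to n_cols
        cols = list(x_explain.columns)[:n_cols]
        return {col: 1 / x_explain.shape[1] for col in cols}

    def explain_local(self, x_explain, n_cols=None):
        # one {feature: importance} dict per row of x_explain
        return [self.feature_importance(x_explain, n_cols) for _ in range(len(x_explain))]
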
explain_filtered_local(x_explain: Union[pandas.core.series.Series, pandas.core.frame.DataFrame, numpy.ndarray], cols: List[str], n_cols: Optional[int] = None) → List[Dict[str, float]][source]

same as explain_local, but with a filter applied to the features of each explanation

explain_local(x_explain: Union[pandas.core.series.Series, pandas.core.frame.DataFrame, numpy.ndarray], n_cols: Optional[int] = None) → List[Dict[str, float]][source]

explains each individual prediction made on x_explain. BEWARE: this is usually quite slow on large datasets

Parameters:
  • x_explain – the samples to explain
  • n_cols – the number of columns to limit the explanation to
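
For example, reusing the synthetic setup from the LimeExplainer doctest further down this page (the column names and sizes are the same illustrative ones used there):

import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from trelawney.lime_explainer import LimeExplainer

X = pd.DataFrame([np.array(range(100)), np.random.normal(size=100).tolist()], index=['real', 'fake']).T
y = np.array(range(100)) > 50
model = LogisticRegression().fit(X, y)

explainer = LimeExplainer()
explainer.fit(model, X, y)

# one {feature: impact} dict per explained row
explanations = explainer.explain_local(X.head(2), n_cols=2)
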
feature_importance(x_explain: Union[pandas.core.series.Series, pandas.core.frame.DataFrame, numpy.ndarray], n_cols: Optional[int] = None) → Dict[str, float][source]

returns the relative importance of each feature on the predictions of the model (that the explainer was fitted on) for x_explain, globally. The output is a dict with the importance of each column/feature in x_explain (limited to n_cols)

if some importances are negative, the corresponding features are negatively correlated with the output; their absolute value represents the relative importance

Parameters:
  • x_explain – the dataset to explain on
  • n_cols – the maximum number of features to return (ordered by importance)
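
Continuing the sketch above, the returned dict can be ranked by magnitude (remember that a negative value only indicates the direction of the correlation):

# {'real': ..., 'fake': ...}, limited here to 2 features
importances = explainer.feature_importance(X, n_cols=2)
ranked = sorted(importances, key=lambda col: abs(importances[col]), reverse=True)
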
filtered_feature_importance(x_explain: pandas.core.frame.DataFrame, cols: Optional[List[str]], n_cols: Optional[int] = None) → Dict[str, float][source]

same as feature_importance, but with a filter applied first (on the column names)
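
Continuing the same sketch, restricting the importances to a single (illustrative) column before truncating:

# only the 'real' column is kept, then at most n_cols entries are returned
real_only = explainer.filtered_feature_importance(X, cols=['real'], n_cols=1)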

fit(model: sklearn.base.BaseEstimator, x_train: Union[pandas.core.series.Series, pandas.core.frame.DataFrame, numpy.ndarray], y_train: pandas.core.frame.DataFrame)[source]

prepares the explainer by saving all the information it needs and fitting necessary models

Parameters:
  • model – the TRAINED model the explainer will need to shed light on
  • x_train – the dataset the model was trained on originally
  • y_train – the target the model was trained on originally
graph_feature_importance(x_explain: pandas.core.frame.DataFrame, cols: Optional[List[str]] = None, n_cols: Optional[int] = None, irrelevant_cols: Optional[List[str]] = None)[source]
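
No docstring is rendered for this method; a hedged usage sketch based only on the signature above (cols and irrelevant_cols presumably include or exclude specific features from the figure), continuing the same setup:

importance_graph = explainer.graph_feature_importance(X, n_cols=2)
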
graph_local_explanation(x_explain: Union[pandas.core.series.Series, pandas.core.frame.DataFrame, numpy.ndarray], cols: Optional[List[str]] = None, n_cols: Optional[int] = None, info_values: Union[pandas.core.frame.DataFrame, pandas.core.series.Series, None] = None) → plotly.graph_objs._figure.Figure[source]

creates a waterfall plotly figure to represent the influence of each feature on the final decision for a single prediction of the model.

You can filter the columns you want to see in your graph and limit the final number of columns. If you do both, the filter is applied first and at most n_cols of the filtered columns are kept

Parameters:
  • x_explain – the sample to explain; this must be a dataframe with a single row
  • cols – the columns to keep if you want to filter (if None - default) all the columns will be kept
  • n_cols – the number of columns to limit the graph to. (if None - default) all the columns will be kept
Raises:

ValueError – if x_explain doesn’t have the right shape
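
A usage sketch, continuing the setup above and passing a single-row DataFrame as required:

single_row = X.iloc[[0]]  # double brackets keep it a one-row DataFrame
figure = explainer.graph_local_explanation(single_row, n_cols=2)
figure.show()  # plotly Figure, per the return annotation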

trelawney.colors module

trelawney.lime_explainer module

class trelawney.lime_explainer.LimeExplainer(class_names: Optional[List[str]] = None, categorical_features: Optional[List[str]] = None)[source]

Bases: trelawney.base_explainer.BaseExplainer

Lime stands for Local Interpretable Model-agnostic Explanations and is a package based on the original LIME paper. Lime will explain a single prediction of your model by creating a local approximation of your model around said prediction.

>>> import numpy as np
>>> import pandas as pd
>>> from sklearn.linear_model import LogisticRegression
>>> from trelawney.lime_explainer import LimeExplainer
>>> X = pd.DataFrame([np.array(range(100)), np.random.normal(size=100).tolist()], index=['real', 'fake']).T
>>> y = np.array(range(100)) > 50
>>> # training the base model
>>> model = LogisticRegression().fit(X, y)
>>> # creating and fitting the explainer
>>> explainer = LimeExplainer()
>>> explainer.fit(model, X, y)
<trelawney.lime_explainer.LimeExplainer object at ...>
>>> # explaining observation
>>> explanation = explainer.explain_local(pd.DataFrame([[5, 0.1]]))[0]
>>> abs(explanation['real']) > abs(explanation['fake'])
True
explain_local(x_explain: Union[pandas.core.series.Series, pandas.core.frame.DataFrame, numpy.ndarray], n_cols: Optional[int] = None) → List[Dict[str, float]][source]

explains each individual prediction made on x_explain. BEWARE: this is usually quite slow on large datasets

Parameters:
  • x_explain – the samples to explain
  • n_cols – the number of columns to limit the explanation to
feature_importance(x_explain: Union[pandas.core.series.Series, pandas.core.frame.DataFrame, numpy.ndarray], n_cols: Optional[int] = None) → Dict[str, float][source]

returns the relative importance of each feature on the predictions of the model (that the explainer was fitted on) for x_explain, globally. The output is a dict with the importance of each column/feature in x_explain (limited to n_cols)

if some importances are negative, the corresponding features are negatively correlated with the output; their absolute value represents the relative importance

Parameters:
  • x_explain – the dataset to explain on
  • n_cols – the maximum number of features to return (ordered by importance)
fit(model: sklearn.base.BaseEstimator, x_train: Union[pandas.core.series.Series, pandas.core.frame.DataFrame, numpy.ndarray], y_train: pandas.core.frame.DataFrame)[source]

prepares the explainer by saving all the information it needs and fitting necessary models

Parameters:
  • model – the TRAINED model the explainer will need to shed light on
  • x_train – the dataset the model was trained on originally
  • y_train – the target the model was trained on originally

trelawney.logreg_explainer module

Module that provides the LogRegExplainer class, based on the BaseExplainer class

class trelawney.logreg_explainer.LogRegExplainer(class_names: Optional[List[str]] = None, categorical_features: Optional[List[str]] = None)[source]

Bases: trelawney.base_explainer.BaseExplainer

The LogRegExplainer class is composed of 3 methods:

  • fit: get the right model
  • feature_importance (global interpretation)
  • graph_odds_ratio (visualisation of the ranking of the features, based on their odds ratio)
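
A hedged usage sketch, assuming a LogisticRegression trained on a small synthetic dataset (mirroring the LimeExplainer example above):

import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from trelawney.logreg_explainer import LogRegExplainer

X = pd.DataFrame({'real': range(100), 'fake': np.random.normal(size=100)})
y = np.array(range(100)) > 50
model = LogisticRegression().fit(X, y)

explainer = LogRegExplainer()
explainer.fit(model, X, y)

# magnitude of each coefficient, as documented for feature_importance below
coefficient_magnitudes = explainer.feature_importance(X)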

explain_local(x_explain: Union[pandas.core.series.Series, pandas.core.frame.DataFrame, numpy.ndarray], n_cols: Optional[int] = None) → List[Dict[str, float]][source]

returns local relative importance of features for a specific observation.

Parameters:
  • x_explain – the dataset to explain on
  • n_cols – the maximum number of features to return

feature_importance(x_explain: Union[pandas.core.series.Series, pandas.core.frame.DataFrame, numpy.ndarray], n_cols: Optional[int] = None) → Dict[str, float][source]

returns the absolute value (i.e. magnitude) of the coefficient of each feature as a dict.

Parameters:
  • x_explain – the dataset to explain on
  • n_cols – the maximum number of features to return

fit(model: sklearn.base.BaseEstimator, x_train: Union[pandas.core.series.Series, pandas.core.frame.DataFrame, numpy.ndarray], y_train: pandas.core.frame.DataFrame)[source]

prepares the explainer by saving all the information it needs and fitting necessary models

Parameters:
  • model – the TRAINED model the explainer will need to shed light on
  • x_train – the dataset the model was trained on originally
  • y_train – the target the model was trained on originally
graph_odds_ratio(n_cols: Optional[int] = 10, ascending: bool = False, irrelevant_cols: Optional[List[str]] = None) → pandas.core.frame.DataFrame[source]

returns a plot of the top n_cols features, ranked by the magnitude of their odds ratio.

Parameters:
  • n_cols – the number of features to plot
  • ascending – the order in which to rank the magnitudes of the coefficients
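
Continuing the LogRegExplainer sketch above:

# rank the (at most) 10 features with the largest odds-ratio magnitude
odds_ratio_ranking = explainer.graph_odds_ratio(n_cols=10, ascending=False)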

trelawney.shap_explainer module

trelawney.surrogate_explainer module

class trelawney.surrogate_explainer.SurrogateExplainer(surrogate_model: sklearn.base.BaseEstimator, class_names: Optional[List[str]] = None)[source]

Bases: trelawney.base_explainer.BaseExplainer

A surrogate model is a substitution model used to explain the initial model; substitution models are therefore generally simpler than the initial ones. Here, single trees and logistic regressions are used as surrogates.
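
A hedged usage sketch, assuming a shallow DecisionTreeClassifier as the surrogate and reusing the synthetic X, y and trained model from the examples above:

from sklearn.tree import DecisionTreeClassifier
from trelawney.surrogate_explainer import SurrogateExplainer

surrogate = DecisionTreeClassifier(max_depth=3)
explainer = SurrogateExplainer(surrogate_model=surrogate)
explainer.fit(model, X, y)

# how faithfully the surrogate reproduces the initial model on x_train
fidelity = explainer.adequation_score()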

adequation_score(metric: Union[Callable[[numpy.ndarray, numpy.ndarray], float], str] = 'auto')[source]

returns an adequation (fidelity) score between the output of the surrogate and the output of the initial model, based on the x_train set provided.

explain_local(x_explain: Union[pandas.core.series.Series, pandas.core.frame.DataFrame, numpy.ndarray], n_cols: Optional[int] = None) → List[Dict[str, float]][source]

returns local relative importance of features for a specific observation.

Parameters:
  • x_explain – the dataset to explain on
  • n_cols – the maximum number of features to return

feature_importance(x_explain: Union[pandas.core.series.Series, pandas.core.frame.DataFrame, numpy.ndarray], n_cols: Optional[int] = None) → Dict[str, float][source]

returns a relative importance of each feature globally as a dict.

Parameters:
  • x_explain – the dataset to explain on
  • n_cols – the maximum number of features to return

fit(model: sklearn.base.BaseEstimator, x_train: Union[pandas.core.series.Series, pandas.core.frame.DataFrame, numpy.ndarray], y_train: pandas.core.frame.DataFrame)[source]

prepares the explainer by saving all the information it needs and fitting necessary models

Parameters:
  • model – the TRAINED model the explainer will need to shed light on
  • x_train – the dataset the model was trained on originally
  • y_train – the target the model was trained on originally
plot_tree(out_path: str = './tree_viz')[source]

returns the colored plot of the decision tree and saves an image in the working directory.

trelawney.tree_explainer module

Module that provides the TreeExplainer class, based on the BaseExplainer class

class trelawney.tree_explainer.TreeExplainer(class_names: Optional[List[str]] = None)[source]

Bases: trelawney.base_explainer.BaseExplainer

The TreeExplainer class is composed of 4 methods:

  • fit: get the right model
  • feature_importance (global interpretation)
  • explain_local (local interpretation, WIP)
  • plot_tree (full tree visualisation)
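
A hedged usage sketch, assuming the explained model is itself a trained decision tree (reusing the synthetic X and y from the examples above):

from sklearn.tree import DecisionTreeClassifier
from trelawney.tree_explainer import TreeExplainer

tree_model = DecisionTreeClassifier(max_depth=3).fit(X, y)

explainer = TreeExplainer()
explainer.fit(tree_model, X, y)

importances = explainer.feature_importance(X)
explainer.plot_tree(out_path='./tree_viz')  # writes a png of the tree, see plot_tree below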

explain_local(x_explain: Union[pandas.core.series.Series, pandas.core.frame.DataFrame, numpy.ndarray], n_cols: Optional[int] = None) → List[Dict[str, float]][source]

returns local relative importance of features for a specific observation.

Parameters:
  • x_explain – the dataset to explain on
  • n_cols – the maximum number of features to return

feature_importance(x_explain: Union[pandas.core.series.Series, pandas.core.frame.DataFrame, numpy.ndarray], n_cols: Optional[int] = None) → Dict[str, float][source]

returns a relative importance of each feature globally as a dict.

Parameters:
  • x_explain – the dataset to explain on
  • n_cols – the maximum number of features to return

fit(model: sklearn.base.BaseEstimator, x_train: Union[pandas.core.series.Series, pandas.core.frame.DataFrame, numpy.ndarray], y_train: pandas.core.frame.DataFrame)[source]

prepares the explainer by saving all the information it needs and fitting necessary models

Parameters:
  • model – the TRAINED model the explainer will need to shed light on
  • x_train – the dataset the model was trained on originally
  • y_train – the target the model was trained on originally
plot_tree(out_path: str = './tree_viz')[source]

creates a png file of the tree saved in out_path

Parameters:
  • out_path – the path to save the png representation of the tree to

trelawney.trelawney module

Module contents

Top-level package for trelawney.