API

Data & Plotting
class fairensics.data.decision_boundary.DecisionBoundary(colors=('k', 'c', 'm', 'b', 'g', 'r', 'y'), downsampler=PCA(copy=True, iterated_power='auto', n_components=2, random_state=None, svd_solver='auto', tol=0.0, whiten=False))

    Class for plotting decision boundaries against two axes.

    The data may be down-sampled to two dimensions before plotting. The decision boundary plots are generated using a mesh grid and the following procedure:
    1. If necessary, the data is down-sampled to two dimensions.
    2. Minimum and maximum values for each axis are extracted.
    3. A mesh grid is created.
    4. If necessary, the mesh grid is up-sampled again.
    5. Predictions are made on the mesh grid.
    6. Predictions are plotted against the (possibly down-sampled) axes.

    TODO: add option to scale data to [0,1]
    __init__(colors=('k', 'c', 'm', 'b', 'g', 'r', 'y'), downsampler=PCA(copy=True, iterated_power='auto', n_components=2, random_state=None, svd_solver='auto', tol=0.0, whiten=False))

        Parameters:
            colors – iterable of colors to cycle through for the decision boundaries.
            downsampler – object used to down-sample the data points; must implement 'fit_transform' and 'inverse_transform' methods.
    add_boundary(dataset, clf, label='', only_unprotected=True, num_points=100, cmap=None)

        Adds a decision boundary to the current plot.

        If the data set is two-dimensional, the boundary is plotted directly using a mesh grid. Otherwise, a mesh grid is generated on the down-sampled points and up-sampled again for prediction.

        Parameters:
            dataset (BinaryLabelDataset) – the labeled data set.
            clf (object) – the classifier object (must implement a predict function).
            label (str) – the label for the decision boundary.
            only_unprotected (bool) – if true, the classifier only uses the unprotected attributes.
            num_points (int) – number of points in the mesh grid.
            cmap (str) – colormap from matplotlib. If provided, the background of the plot is colored.
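    A minimal usage sketch (assuming SyntheticDataset yields an AIF360-style BinaryLabelDataset whose first two feature columns are the unprotected features; the scikit-learn classifier and matplotlib call are standard):

        import matplotlib.pyplot as plt
        from sklearn.linear_model import LogisticRegression

        from fairensics.data.decision_boundary import DecisionBoundary
        from fairensics.data.synthetic_dataset import SyntheticDataset

        dataset = SyntheticDataset(n_samples=500)

        # Train on the unprotected features only, matching only_unprotected=True.
        X = dataset.features[:, :2]
        y = dataset.labels.ravel()
        clf = LogisticRegression().fit(X, y)

        boundary = DecisionBoundary()
        boundary.add_boundary(dataset, clf, label="logreg", num_points=100)
        plt.show()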
class fairensics.data.synthetic_dataset.SyntheticDataset(n_samples=1000, label_name='label', feature_one_name='feature_1', feature_two_name='feature_2', favorable_label=1, unfavorable_label=0, protected_attribute_name='protected_attribute', privileged_class=1, unprivileged_class=0, sd=1122334455, mu_1=(2, 2), sigma_1=((5, 1), (1, 5)), mu_2=(-2, -2), sigma_2=((10, 1), (1, 3)), initial_discrimination=4.0)

    Synthetic data set with two features and one protected attribute.

    The data set is randomly generated from two Gaussians each time. Both the protected attribute and the label are binary, and the features are numerical.
    __init__(n_samples=1000, label_name='label', feature_one_name='feature_1', feature_two_name='feature_2', favorable_label=1, unfavorable_label=0, protected_attribute_name='protected_attribute', privileged_class=1, unprivileged_class=0, sd=1122334455, mu_1=(2, 2), sigma_1=((5, 1), (1, 5)), mu_2=(-2, -2), sigma_2=((10, 1), (1, 3)), initial_discrimination=4.0)

        Parameters:
            n_samples (int) – the number of samples to generate.
            label_name (str) – name of the column storing the target variable.
            feature_one_name (str) – name of the first unprotected feature.
            feature_two_name (str) – name of the second unprotected feature.
            favorable_label (int) – label considered positive.
            unfavorable_label (int) – label considered negative.
            protected_attribute_name (str) – the name of the protected attribute.
            privileged_class (int) – class of the protected attribute considered positive.
            unprivileged_class (int) – class of the protected attribute considered negative.
            sd (int) – seed for the random generator.
            mu_1 ((float, float)) – mean of the positive group cluster.
            sigma_1 (((float, float), (float, float))) – covariance of the positive group cluster.
            mu_2 ((float, float)) – mean of the negative group cluster.
            sigma_2 (((float, float), (float, float))) – covariance of the negative group cluster.
            initial_discrimination (float) – initial discrimination factor.
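    A generation sketch, overriding only the cluster shape (all arguments shown are from the signature above):

        from fairensics.data.synthetic_dataset import SyntheticDataset

        dataset = SyntheticDataset(
            n_samples=1000,
            sd=42,                                       # seed for reproducibility
            mu_1=(2, 2), sigma_1=((5, 1), (1, 5)),       # positive group cluster
            mu_2=(-2, -2), sigma_2=((10, 1), (1, 3)),    # negative group cluster
            initial_discrimination=4.0,
        )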
Modeling
class fairensics.methods.disparate_impact.AccurateDisparateImpact(loss_function='logreg', warn=True)

    Minimize loss subject to fairness constraints.

    The loss "L" defines whether a logistic regression or a linear SVM is trained.

    Minimize:
        L(w)

    Subject to:
        cov(sensitive_attributes, true_labels, predictions) < sensitive_attrs_to_cov_thresh

    Where:
        predictions – the distance to the decision boundary.
    __init__(loss_function='logreg', warn=True)

        Parameters:
            loss_function (str) – loss function string from utils.LossFunctions.
            warn (bool) – if true, warnings are raised on certain bounds.
    fit(dataset, sensitive_attrs_to_cov_thresh=0, sensitive_attributes=None)

        Fit the model.

        Parameters:
            dataset – AIF360 data set.
            sensitive_attrs_to_cov_thresh (float or dict) – dictionary as returned by _get_cov_thresh_dict(). If a single float is passed, the dict is generated using the _get_cov_thresh_dict() method.
            sensitive_attributes (list(str)) – names of protected attributes to apply constraints to.
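    A minimal fitting sketch (the protected attribute name is the SyntheticDataset default; passing a float generates the covariance threshold dict as described above):

        from fairensics.data.synthetic_dataset import SyntheticDataset
        from fairensics.methods.disparate_impact import AccurateDisparateImpact

        dataset = SyntheticDataset()
        clf = AccurateDisparateImpact(loss_function="logreg")
        clf.fit(
            dataset,
            sensitive_attrs_to_cov_thresh=0.5,
            sensitive_attributes=["protected_attribute"],
        )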
class fairensics.methods.disparate_impact.FairDisparateImpact(loss_function='logreg', warn=True)

    Minimize disparate impact subject to accuracy constraints.

    The loss "L" defines whether a logistic regression or a linear SVM is trained.

    Minimize:
        cov(sensitive_attributes, predictions)

    Subject to:
        L(w) <= (1 - gamma) * L(w*)

    Where:
        L(w*) – the loss of the unconstrained classifier.
        predictions – the distance to the decision boundary.
    __init__(loss_function='logreg', warn=True)

        Parameters:
            loss_function (str) – loss function string from utils.LossFunctions.
            warn (bool) – if true, warnings are raised on certain bounds.
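    A construction sketch. The fit() arguments are not documented in this section; the call below assumes a signature analogous to AccurateDisparateImpact.fit:

        from fairensics.data.synthetic_dataset import SyntheticDataset
        from fairensics.methods.disparate_impact import FairDisparateImpact

        dataset = SyntheticDataset()
        clf = FairDisparateImpact(loss_function="svm_linear")
        clf.fit(dataset)  # assumed signature; see AccurateDisparateImpact.fit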
class fairensics.methods.disparate_mistreatment.DisparateMistreatment(loss_function='logreg', constraint_type=None, take_initial_sol=True, warn=True, tau=0.005, mu=1.2, EPS=1e-06, max_iter=100, max_iter_dccp=50)

    Disparate mistreatment free classifier.

    The loss "L" defines whether a logistic regression or a linear SVM is trained.

    Minimize:
        L(w)

    Subject to:
        cov(sensitive_attributes, predictions) < sensitive_attrs_to_cov_thresh

    Where:
        predictions – the distance to the decision boundary.

    Example:
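        A construction sketch (constraint_type is left at its default; the fit() call assumes a signature analogous to AccurateDisparateImpact.fit, which is not documented in this section):

            from fairensics.data.synthetic_dataset import SyntheticDataset
            from fairensics.methods.disparate_mistreatment import DisparateMistreatment

            dataset = SyntheticDataset()
            clf = DisparateMistreatment(loss_function="logreg", tau=0.005, max_iter_dccp=50)
            clf.fit(dataset)  # assumed signature; see AccurateDisparateImpact.fit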
    __init__(loss_function='logreg', constraint_type=None, take_initial_sol=True, warn=True, tau=0.005, mu=1.2, EPS=1e-06, max_iter=100, max_iter_dccp=50)
class fairensics.methods.preferential_fairness.PreferentialFairness(loss_function='logreg', constraint_type=None, train_multiple=False, lam=None, warn=True, tau=0.5, mu=1.2, EPS=0.0001, max_iter=100, max_iter_dccp=50)

    Trains a separate classifier clf_z for each group z of the protected attribute.

    The loss "L" defines whether a logistic regression or a linear SVM is trained.

    Minimize:
        L(w)

    Subject to:
        sum(predictions_z) > sum(predictions_z')

    Where:
        predictions_z – the predictions using group z's classifier clf_z.
        predictions_z' – the predictions using the classifier clf_z' of group z'.

    Example:
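        A construction sketch (train_multiple=True trains one classifier per protected group; the fit() call assumes a signature analogous to AccurateDisparateImpact.fit, which is not documented in this section):

            from fairensics.data.synthetic_dataset import SyntheticDataset
            from fairensics.methods.preferential_fairness import PreferentialFairness

            dataset = SyntheticDataset()
            clf = PreferentialFairness(loss_function="logreg", train_multiple=True)
            clf.fit(dataset)  # assumed signature; see AccurateDisparateImpact.fit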
    __init__(loss_function='logreg', constraint_type=None, train_multiple=False, lam=None, warn=True, tau=0.5, mu=1.2, EPS=0.0001, max_iter=100, max_iter_dccp=50)

        Parameters:
            loss_function (str) – name of a loss function defined in utils.
            constraint_type (str) – one of the values in _CONS_TYPE.
            train_multiple (bool) – if true, a classifier is trained for each group of the protected attribute.
            lam (dict, optional) – …
            warn (bool) – if true, warnings are raised on certain bounds.
            tau, mu, EPS, max_iter, max_iter_dccp – solver related parameters.
class fairensics.methods.fairness_warnings.FairnessBoundsWarning(raw_dataset, predicted_dataset, privileged_groups=None, unprivileged_groups=None)

    Raises warnings if the classifier misses specified fairness bounds.

    Each bound is checked using AIF360's classification metric if the specified bound is not None.
    DISPARATE_IMPACT_RATIO_BOUND = 0.8

    EO_DIFFERENCE_BOUND = 0.1

    ERROR_DIFFERENCE_BOUND = None

    ERROR_RATIO_BOUND = 0.8

    FNR_DIFFERENCE_BOUND = None

    FNR_RATIO_BOUND = 0.8

    FPR_DIFFERENCE_BOUND = None

    FPR_RATIO_BOUND = 0.8
    __init__(raw_dataset, predicted_dataset, privileged_groups=None, unprivileged_groups=None)

        Parameters:
            raw_dataset (BinaryLabelDataset) – data set with ground-truth labels.
            predicted_dataset (BinaryLabelDataset) – data set after predictions.
            privileged_groups (list(dict)) – privileged groups. Format is a list of dicts where the keys are protected_attribute_names and the values are values in protected_attributes. Each dict element describes a single group.
            unprivileged_groups (list(dict)) – unprivileged groups. Same format as privileged_groups.
class fairensics.methods.fairness_warnings.DataSetSkewedWarning(dataset)

    Raises a warning if the data set is skewed with respect to the protected attributes.

    Checks are only executed if the specified bounds are not None.
    CLASS_LABEL_FRACTION = 0.4

    POSITIVE_NEGATIVE_CLASS_FRACTION = 0.4

    POSITIVE_NEGATIVE_LABEL_FRACTION = 0.4
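    A usage sketch for both warning classes (raw_dataset and predicted_dataset are assumed to be BinaryLabelDataset instances obtained elsewhere; the group format follows the __init__ documentation above; the call that actually executes the checks is not documented in this section):

        from fairensics.methods.fairness_warnings import (
            DataSetSkewedWarning,
            FairnessBoundsWarning,
        )

        # Compare predictions against the default fairness bounds.
        bounds_warning = FairnessBoundsWarning(
            raw_dataset,
            predicted_dataset,
            privileged_groups=[{"protected_attribute": 1}],
            unprivileged_groups=[{"protected_attribute": 0}],
        )

        # Check the raw data set for skew in the protected attributes.
        skew_warning = DataSetSkewedWarning(raw_dataset)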
class fairensics.methods.utils.LossFunctions

    Loss functions for fair-classification.

    This class stores implementations of the loss functions used in fair-classification. Each function can be retrieved by name through the get_loss_function() methods, as either a NumPy or a CVXPY implementation.
    LOSS_NAMES = ['logreg', 'logreg_l1', 'logreg_l2', 'svm_linear']

    NAME_LOG_REG = 'logreg'

    NAME_LOG_REG_L1 = 'logreg_l1'

    NAME_LOG_REG_L2 = 'logreg_l2'

    NAME_SVM_LOSS = 'svm_linear'
    static cvxpy_hinge_loss(w, X, y, num_points=None)

        CVXPY implementation of the hinge loss.

        Parameters:
            w (np.ndarray) – 1D weight vector with shape (n_features,).
            X (np.ndarray) – 2D feature matrix with shape (n_samples, n_features).
            y (np.ndarray) – 1D array of true labels with shape (n_samples,).
            num_points (int) – number of points in X (corresponds to the first dimension of X, n_samples, but some methods pass a different value for scaling).

        Returns:
            the loss.

        Return type:
            float
    static cvxpy_logistic_loss(w, X, y, num_points=None)

        CVXPY implementation of the logistic loss.

        Parameters:
            w (np.ndarray) – 1D weight vector with shape (n_features,).
            X (np.ndarray) – 2D feature matrix with shape (n_samples, n_features).
            y (np.ndarray) – 1D array of true labels with shape (n_samples,).
            num_points (int) – number of points in X (corresponds to the first dimension of X, n_samples, but some methods pass a different value for scaling).

        Returns:
            the loss.

        Return type:
            float
    static cvxpy_logistic_loss_l1(w, X, y, lam=None, num_points=None)

        CVXPY implementation of the L1 regularized logistic loss.

        Parameters:
            w (np.ndarray) – 1D weight vector with shape (n_features,).
            X (np.ndarray) – 2D feature matrix with shape (n_samples, n_features).
            y (np.ndarray) – 1D array of true labels with shape (n_samples,).
            lam (float) – regularization parameter.
            num_points (int) – number of points in X (corresponds to the first dimension of X, n_samples, but some methods pass a different value for scaling).

        Returns:
            the loss.

        Return type:
            float
    static cvxpy_logistic_loss_l2(w, X, y, lam=None, num_points=None)

        CVXPY implementation of the L2 regularized logistic loss.

        Parameters:
            w (np.ndarray) – 1D weight vector with shape (n_features,).
            X (np.ndarray) – 2D feature matrix with shape (n_samples, n_features).
            y (np.ndarray) – 1D array of true labels with shape (n_samples,).
            lam (float) – regularization parameter.
            num_points (int) – number of points in X (corresponds to the first dimension of X, n_samples, but some methods pass a different value for scaling).

        Returns:
            the loss.

        Return type:
            float
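    A sketch of using a CVXPY loss inside an optimization problem (random data; labels are assumed to be encoded as -1/+1, as in fair-classification):

        import cvxpy as cp
        import numpy as np

        from fairensics.methods.utils import LossFunctions

        rng = np.random.RandomState(0)
        X = rng.randn(100, 3)
        y = rng.choice([-1, 1], size=100)

        # The returned CVXPY expression is minimized over the weight vector w.
        w = cp.Variable(X.shape[1])
        loss = LossFunctions.cvxpy_logistic_loss(w, X, y)
        cp.Problem(cp.Minimize(loss)).solve()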
    static hinge_loss(w, X, y)

        NumPy implementation of the hinge loss.

        Parameters:
            w (np.ndarray) – 1D weight vector with shape (n_features,).
            X (np.ndarray) – 2D feature matrix with shape (n_samples, n_features).
            y (np.ndarray) – 1D array of true labels with shape (n_samples,).

        Returns:
            the loss.

        Return type:
            float
    static log_logistic(X)

        log_logistic from the scikit-learn source code (link below).

        Computes the log of the logistic function, log(1 / (1 + e ** -x)).
        Source: https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/utils/extmath.py

        Parameters:
            X (array-like) – shape (M, N), argument to the logistic function.

        Returns:
            out (np.ndarray) – shape (M, N), log of the logistic function at every point in X.
    static logistic_loss(w, X, y, return_arr=False)

        NumPy implementation of the logistic loss.

        This function is adapted from the scikit-learn source code.

        Parameters:
            w (np.ndarray) – 1D weight vector with shape (n_features,).
            X (np.ndarray) – 2D feature matrix with shape (n_samples, n_features).
            y (np.ndarray) – 1D array of true labels with shape (n_samples,).
            return_arr (bool) – if true, an array of per-sample losses is returned; otherwise, their sum.

        Returns:
            the loss.

        Return type:
            float or np.ndarray
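    A sketch of the NumPy losses (random data; labels are assumed to be encoded as -1/+1, as in fair-classification):

        import numpy as np

        from fairensics.methods.utils import LossFunctions

        rng = np.random.RandomState(0)
        w = np.zeros(3)
        X = rng.randn(100, 3)
        y = rng.choice([-1, 1], size=100)

        total = LossFunctions.logistic_loss(w, X, y)                        # summed loss
        per_sample = LossFunctions.logistic_loss(w, X, y, return_arr=True)  # per-sample losses
        hinge = LossFunctions.hinge_loss(w, X, y)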
    static logistic_loss_l1_reg(w, X, y, lam=None)

        NumPy implementation of the L1 regularized logistic loss.
fairensics.methods.utils.get_one_hot_encoding(arr)

    Returns a one-hot encoding of the array arr.

    Parameters:
        arr (np.ndarray) – 1D array of int values.

    Returns:
        Tuple (out_arr, index_dict): out_arr (np.ndarray) is the one-hot encoded matrix, and index_dict (dict) maps each original value to its column in the encoded matrix.
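A short sketch (values follow the documented signature):

    import numpy as np

    from fairensics.methods.utils import get_one_hot_encoding

    arr = np.array([0, 2, 1, 2])
    out_arr, index_dict = get_one_hot_encoding(arr)
    # out_arr: one column per distinct value in arr, one row per element.
    # index_dict: maps each original value to its column in out_arr.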