Discrimination
The module provides methods to perform discrimination analytics. It provides a table containing the Accuracy Ratio (AR), the left (lower) bound of the AR confidence interval, the right (upper) bound of the AR confidence interval, the significance level, the number of observations, and the number of observed defaults. It also provides a Receiver Operating Characteristic (ROC) curve plot.
Accessor
Initialise the DataFrame with the discrimination method. Minimal working example:
df.crm.discrimination(default="DEFAULT", score="SCORE")
Parameters:
Name | Type | Description | Default |
---|---|---|---|
default | str | Defines the default column, typically containing binary values. | required |
score | str | Defines the score column, typically containing Probabilities of Default (PDs) in absolute values. If necessary, grades can be transformed to scores (for example, PDs) via the module "general.py". | required |
alpha | float | Defines the statistical significance level in absolute terms, i.e. a significance level of 10% should be entered as 0.1. | None |
by | str | Defines the grouper to be applied to the DataFrame, for example, "DATE". | None |
Returns:
Type | Description |
---|---|
Discrimination | Returns a "Discrimination" object providing discrimination analytics methods. |
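The alpha parameter sets the significance level used for the AR confidence interval, but this page does not state how the bounds are computed. The following is a minimal sketch under an assumption: it uses the Hanley-McNeil standard error of the AUC with a normal approximation, and converts to AR via AR = 2 * AUC - 1. The function name `ar_confidence_interval` and its arguments are hypothetical and are not part of the module.

```python
from math import sqrt
from statistics import NormalDist
from typing import Tuple


def ar_confidence_interval(
    ar: float, n_defaults: int, n_non_defaults: int, alpha: float
) -> Tuple[float, float]:
    """Two-sided (1 - alpha) interval for the AR.

    Assumption: Hanley-McNeil standard error with a normal approximation.
    This is an illustrative choice, not a confirmed description of the module.
    """
    if n_defaults < 1 or n_non_defaults < 1:
        raise ValueError("need at least one default and one non-default")
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie strictly between 0 and 1")

    auc = (ar + 1.0) / 2.0  # AR = 2 * AUC - 1
    q1 = auc / (2.0 - auc)
    q2 = 2.0 * auc ** 2 / (1.0 + auc)
    var_auc = (
        auc * (1.0 - auc)
        + (n_defaults - 1) * (q1 - auc ** 2)
        + (n_non_defaults - 1) * (q2 - auc ** 2)
    ) / (n_defaults * n_non_defaults)

    se_ar = 2.0 * sqrt(var_auc)  # scaling AUC by 2 scales its standard error by 2
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)  # two-sided normal quantile
    return ar - z * se_ar, ar + z * se_ar


# Inputs taken from the first row of the example output in the Examples section:
# 750 observations, 131 defaults, AR 0.941644, alpha 0.1.
lower, upper = ar_confidence_interval(0.941644, 131, 750 - 131, alpha=0.1)
print(lower, upper)
```

With these inputs the sketch lands within about 0.001 of the bounds shown for 2019-12-31 in the Examples section, which is suggestive but does not confirm the module's method. The sketch does not clip the bounds to the interval [-1, 1].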
Methods
plot(x_axis_label='1 - Specificity / Percentage of Ratings', y_axis_label='Sensitivity / Percentage of Defaults', legend_label=None, score_2=None, legend_label_2=None, legend_loc='lower right', fig_size=(7.5, 7.5), path=None, show=False)
Minimal working example:
df.crm.discrimination(default="DEFAULT", score="SCORE").plot()
Parameters:
Name | Type | Description | Default |
---|---|---|---|
x_axis_label | str | Defines the x-axis label. | '1 - Specificity / Percentage of Ratings' |
y_axis_label | str | Defines the y-axis label. | 'Sensitivity / Percentage of Defaults' |
legend_label | Union[str, list] | Defines the legend label. | None |
score_2 | str | Defines a second score column. | None |
legend_label_2 | Union[str, list] | Defines the legend label of the second score column. | None |
legend_loc | str | Defines the legend location. | 'lower right' |
fig_size | tuple | Defines the figure size. | (7.5, 7.5) |
path | str | Defines the path, including the filename, under which the figure is saved. It should be entered as r"C:\<path>\<filename>.<image_format>". | None |
show | bool | If True, the plot is shown. | False |
Returns:
Type | Description |
---|---|
BytesIO | Returns the ROC curve. |
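To make the default axis labels concrete, the sketch below builds ROC points by hand: each point pairs 1 - specificity (the share of non-defaults at or above a score threshold) with sensitivity (the share of defaults at or above it). It uses only the standard library; the helpers `roc_points` and `area_under_curve` and the toy data are hypothetical illustrations, not the module's implementation.

```python
from typing import List, Sequence, Tuple


def roc_points(
    defaults: Sequence[int], scores: Sequence[float]
) -> List[Tuple[float, float]]:
    """Return (1 - specificity, sensitivity) pairs, one per distinct score."""
    n_bad = sum(defaults)
    n_good = len(defaults) - n_bad
    if n_bad == 0 or n_good == 0:
        raise ValueError("need at least one default and one non-default")

    # Walk through the observations from the riskiest score downwards.
    pairs = sorted(zip(scores, defaults), key=lambda p: p[0], reverse=True)
    points = [(0.0, 0.0)]
    true_pos = false_pos = 0
    i = 0
    while i < len(pairs):
        threshold = pairs[i][0]
        # Tied scores enter together, so they produce a single point.
        while i < len(pairs) and pairs[i][0] == threshold:
            if pairs[i][1] == 1:
                true_pos += 1
            else:
                false_pos += 1
            i += 1
        points.append((false_pos / n_good, true_pos / n_bad))
    return points


def area_under_curve(points: Sequence[Tuple[float, float]]) -> float:
    """Trapezoidal area under the piecewise-linear ROC curve."""
    return sum(
        (x1 - x0) * (y0 + y1) / 2.0
        for (x0, y0), (x1, y1) in zip(points, points[1:])
    )


# Toy data (made up for illustration): 1 marks a default, scores are PDs.
toy_defaults = [0, 0, 1, 0, 1, 1]
toy_scores = [0.001, 0.009, 0.009, 0.04, 0.1, 0.1]
curve = roc_points(toy_defaults, toy_scores)
print(curve)
print(area_under_curve(curve))
```

The curve always starts at (0, 0) and ends at (1, 1); the further it bows towards the top-left corner, the better the score separates defaults from non-defaults.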
table()
Minimal working example:
df.crm.discrimination(default="DEFAULT", score="SCORE").table()
Returns:
Type | Description |
---|---|
DataFrame | Returns a table containing the AR, the left (lower) bound of the AR confidence interval, the right (upper) bound of the AR confidence interval, the significance level, the number of observations, and the number of observed defaults. |
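As a rough illustration of what the AR column measures, the Accuracy Ratio can be written as AR = 2 * AUC - 1, where the AUC is the probability that a randomly chosen default carries a higher score than a randomly chosen non-default (ties counted as one half). The sketch below computes it by brute-force pairwise comparison using only the standard library. The function `accuracy_ratio` and the toy data are hypothetical, and this page does not confirm that the module computes the AR this way.

```python
from typing import Sequence


def accuracy_ratio(defaults: Sequence[int], scores: Sequence[float]) -> float:
    """Return AR = 2 * AUC - 1, with the AUC from pairwise comparisons."""
    bad = [s for d, s in zip(defaults, scores) if d == 1]
    good = [s for d, s in zip(defaults, scores) if d == 0]
    if not bad or not good:
        raise ValueError("need at least one default and one non-default")

    # A pair counts 1 if the default has the higher score, 0.5 on a tie.
    wins = sum((b > g) + 0.5 * (b == g) for b in bad for g in good)
    auc = wins / (len(bad) * len(good))
    return 2.0 * auc - 1.0


# Toy data (made up for illustration): 1 marks a default, scores are PDs.
toy_defaults = [0, 0, 1, 0, 1, 1]
toy_scores = [0.001, 0.009, 0.009, 0.04, 0.1, 0.1]
print(accuracy_ratio(toy_defaults, toy_scores))
```

An AR of 1 means every default scores above every non-default, and an AR near 0 means the score ranks no better than chance. The pairwise loop is quadratic in the number of observations, which is fine for illustration but slow for large portfolios.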
Examples
>>> import credit_risk_modelling as crm
>>> data = crm.load_data.load_data()
>>> data
            DATE    ID  GRADE  GRADE_PD  OVERRIDE  OVERRIDE_PD  DEFAULT
0     2019-12-31    10      B    0.1000         B       0.1000        0
1     2019-12-31   100    BBB    0.0090        BB       0.0400        0
2     2019-12-31  1000    BBB    0.0090       BBB       0.0090        0
3     2019-12-31  1001    BBB    0.0090       BBB       0.0090        0
4     2019-12-31  1003    BBB    0.0090       BBB       0.0090        0
...          ...   ...    ...       ...       ...          ...      ...
4145  2023-12-31   994     AA    0.0010        AA       0.0010        0
4146  2023-12-31   995     AA    0.0010        AA       0.0010        0
4147  2023-12-31   996      A    0.0020         A       0.0020        0
4148  2023-12-31   998      B    0.1000         B       0.1000        0
4149  2023-12-31   999    AAA    0.0002       AAA       0.0002        0

[4150 rows x 7 columns]
>>> (
...     data
...     .crm.discrimination(default="DEFAULT", score="GRADE_PD", alpha=0.1, by="DATE")
...     .table()
... )
           BY   OBS  DEFAULT_OBS        AR  AR_BOUND_LOWER  AR_BOUND_UPPER  ALPHA
0  2019-12-31   750          131  0.941644        0.907460        0.975829    0.1
1  2020-12-31   800          170  0.905444        0.867548        0.943339    0.1
2  2021-12-31   700          150  0.929358        0.894257        0.964458    0.1
3  2022-12-31   900          205  0.932367        0.902949        0.961785    0.1
4  2023-12-31  1000          236  0.921183        0.891655        0.950711    0.1
>>> (
...     data
...     .crm.discrimination(default="DEFAULT", score="GRADE_PD", alpha=0.1, by="DATE")
...     .plot(show=True)
... )