Other
cfg.py
This module contains default configurations that can be adjusted to your own preferences. It contains five sets of constants:
- Grades,
- Probabilities of Default (PDs),
- Colours,
- Date formats, and
- Column names.
load_data.py
The module provides dummy data to be used for examples or testing.
load_data()
Load dummy data to be used for examples or testing. Minimal working example:
df = crm.load_data.load_data()
Returns:
Type | Description |
---|---|
DataFrame | Returns dummy data to be used for examples or testing. |
cleaning.py
The module contains several functions to clean a pandas DataFrame. In particular:
- "change_column_names": Change column names. Column names can be changed to all upper case (default), lower case, or left unchanged.
- "drop_duplicates": Drop duplicates. Duplicates can be dropped (default) or kept.
- "reset_index": Reset index.
- "strip_strings": Strip strings of column names and object column rows.
- "transform_missing_strings_to_nan": Transform missing strings in object column rows to Numpy's np.nan. Missing strings are defined as "", "nan", and "None".
All of these functions can be applied in a single call with the "clean" wrapper function. Minimal working example:
df = crm.clean(df=df)
change_column_names(df, style='upper')
Change column names. Column names can be changed to all upper case (default), lower case, or left unchanged. Minimal working example:
df = crm.change_column_names(df=df)
Parameters:
Name | Type | Description | Default |
---|---|---|---|
df | DataFrame | Defines the pandas DataFrame. | required |
style | str | Defines how the column names are changed. It can be either "upper", "lower", or "unchanged". | 'upper' |
Returns:
Type | Description |
---|---|
DataFrame | Returns the cleaned DataFrame. |
clean(df, style='upper', drop=True)
Clean pandas DataFrame. The function does the following:
- Strips strings of column names and object column rows.
- Transforms missing strings in object column rows to Numpy's np.nan. Missing strings are defined as "", "nan", and "None".
- Changes column names. Column names can be changed to all upper case (default), lower case, or left unchanged.
- Drops duplicates. Duplicates can be dropped (default) or kept.
- Resets the index.
Minimal working example:
df = crm.clean(df=df)
Parameters:
Name | Type | Description | Default |
---|---|---|---|
df | DataFrame | Defines the pandas DataFrame. | required |
style | str | Defines how the column names are changed. It can be either "upper", "lower", or "unchanged". | 'upper' |
drop | bool | Defines whether to drop ("True") or keep ("False") duplicates. | True |
Returns:
Type | Description |
---|---|
DataFrame | Returns the cleaned DataFrame. |
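To make the five steps concrete, here is a minimal sketch in plain pandas. It is not the package's implementation: the helper name `clean_sketch`, the sample data, and the assumption that "strip" means removing leading and trailing whitespace are all illustrative, and the steps simply follow the order of the list above.

```python
import numpy as np
import pandas as pd
from pandas.api.types import is_object_dtype, is_string_dtype


def clean_sketch(df: pd.DataFrame, style: str = "upper", drop: bool = True) -> pd.DataFrame:
    """Hypothetical plain-pandas approximation of the documented steps."""
    out = df.copy()
    # 1. Strip whitespace from column names and from string values in text columns.
    out.columns = [c.strip() if isinstance(c, str) else c for c in out.columns]
    text_cols = [c for c in out.columns if is_object_dtype(out[c]) or is_string_dtype(out[c])]
    for col in text_cols:
        out[col] = out[col].map(lambda v: v.strip() if isinstance(v, str) else v)
        # 2. Turn the missing-string markers "", "nan", and "None" into np.nan.
        out[col] = out[col].replace(["", "nan", "None"], np.nan)
    # 3. Change the column names according to `style` ("unchanged" leaves them alone).
    if style == "upper":
        out.columns = [str(c).upper() for c in out.columns]
    elif style == "lower":
        out.columns = [str(c).lower() for c in out.columns]
    # 4. Drop duplicate rows if requested.
    if drop:
        out = out.drop_duplicates()
    # 5. Reset the index so that it runs from 0 without gaps.
    return out.reset_index(drop=True)


# Illustrative input: padded strings, a duplicate row, and a "None" marker.
raw = pd.DataFrame({" grade ": [" AAA", "AAA", "None"], "exposure": [1.0, 1.0, 2.0]})
cleaned = clean_sketch(raw)
print(cleaned)
```

Stripping before dropping duplicates matters in this sketch: the first two rows only become identical once the padding around "AAA" is removed.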
drop_duplicates(df, drop=True)
Drop duplicates. Duplicates can be dropped (default) or kept. Minimal working example:
df = crm.drop_duplicates(df=df)
Parameters:
Name | Type | Description | Default |
---|---|---|---|
df | DataFrame | Defines the pandas DataFrame. | required |
drop | bool | Defines whether to drop ("True") or keep ("False") duplicates. | True |
Returns:
Type | Description |
---|---|
DataFrame | Returns the cleaned DataFrame. |
reset_index(df)
Reset index. Minimal working example:
df = crm.reset_index(df=df)
Parameters:
Name | Type | Description | Default |
---|---|---|---|
df | DataFrame | Defines the pandas DataFrame. | required |
Returns:
Type | Description |
---|---|
DataFrame | Returns the cleaned DataFrame. |
strip_strings(df)
Strip strings of column names and object column rows. Minimal working example:
df = crm.strip_strings(df=df)
Parameters:
Name | Type | Description | Default |
---|---|---|---|
df | DataFrame | Defines the pandas DataFrame. | required |
Returns:
Type | Description |
---|---|
DataFrame | Returns the cleaned DataFrame. |
transform_missing_strings_to_nan(df)
Transform missing strings in object column rows to Numpy's np.nan. Missing strings are defined as "", "nan", and "None". Minimal working example:
df = crm.transform_missing_strings_to_nan(df=df)
Parameters:
Name | Type | Description | Default |
---|---|---|---|
df | DataFrame | Defines the pandas DataFrame. | required |
Returns:
Type | Description |
---|---|
DataFrame | Returns the cleaned DataFrame. |
docx.py
The module defines a class "WordDocument" that can be used to initialise a Word template, add content to it, and save the rendered document. See the docxtpl documentation (docxtpl.readthedocs.io) for more information.
Examples
>>> import os
>>> from docxtpl import InlineImage
>>> from docx.shared import Mm
>>> import credit_risk_modelling as crm
>>> data
   GRADE  2022-12-31_ABS  2022-12-31_PCT  2023-12-31_ABS  2023-12-31_PCT
0    AAA            70.0        0.077778            85.0           0.085
1     AA            85.0        0.094444           114.0           0.114
2      A           118.0        0.131111           128.0           0.128
3    BBB           133.0        0.147778           130.0           0.130
4     BB           148.0        0.164444           159.0           0.159
5      B           119.0        0.132222           138.0           0.138
6    CCC           108.0        0.120000           117.0           0.117
7      D           119.0        0.132222           129.0           0.129
8    Sum           900.0        1.000000          1000.0           1.000
>>> tbl = (
...     data
...     .set_index("GRADE")
...     .crm.formatting().format_cols(
...         int_cols=["2022-12-31_ABS", "2023-12-31_ABS"],
...         pct_cols=["2022-12-31_PCT", "2023-12-31_PCT"],
...     )
...     .set_axis(["# 2022-12-31", "% 2022-12-31", "# 2023-12-31", "% 2023-12-31"], axis=1)
...     .crm.formatting().df_to_tbl(label="Grade")
... )
>>> fig = (
...     data[data.iloc[:, 0] != "Sum"]
...     .crm.plotting().plot_bar(
...         x="GRADE",
...         y=["2022-12-31_PCT", "2023-12-31_PCT"],
...         x_axis_label="Grade",
...         legend_label=["2022-12-31", "2023-12-31"],
...     )
... )
>>> document = crm.docx.WordDocument(path_word_in=os.path.join(PATH_WORD, r"word_in.docx"))
>>> document.add(dict_item={"date_start": "31 December 2022"})
>>> document.add(dict_item={"date_end": "31 December 2023"})
>>> document.add(dict_item={"tbl": tbl})
>>> document.add(dict_item={"fig": InlineImage(document.doc, fig, width=Mm(150))})
>>> document.save(path_word_out=os.path.join(PATH_WORD, r"word_out.docx"))
Below is an example of a Word template.
Given the data and the code in the example above, the Word template would be rendered to the following output:
general.py
The module contains several general functions that can be used for subsequent analyses.
grade_difference(grade_1, grade_2)
Calculate the difference between two rating grades, grade 1 - grade 2, as an integer. Rating grades can be configured in module "cfg.py". Minimal working example:
df = df.assign(GRADE_DIFFERENCE=df.apply(lambda row: crm.general.grade_difference(grade_1=row["GRADE_1"], grade_2=row["GRADE_2"]), axis=1))
Parameters:
Name | Type | Description | Default |
---|---|---|---|
grade_1 | str | Defines the first rating grade. | required |
grade_2 | str | Defines the second rating grade. | required |
Returns:
Type | Description |
---|---|
Union[int, None] | Returns the difference between two rating grades, grade 1 - grade 2, as an integer. |
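One plausible reading of "grade 1 - grade 2" is the difference between the positions of the two grades on the rating scale. The sketch below illustrates that reading only. The `GRADES` list is a hypothetical scale that mirrors the labels in the docx.py example data, not necessarily the defaults in "cfg.py", and returning `None` for an unknown grade is an assumption suggested by the `Union[int, None]` return type.

```python
from typing import Optional

# Hypothetical grade scale, ordered from best to worst; the real one lives in cfg.py.
GRADES = ["AAA", "AA", "A", "BBB", "BB", "B", "CCC", "D"]


def grade_difference_sketch(grade_1: str, grade_2: str) -> Optional[int]:
    """Return index(grade_1) - index(grade_2), or None if either grade is unknown."""
    if grade_1 not in GRADES or grade_2 not in GRADES:
        return None
    return GRADES.index(grade_1) - GRADES.index(grade_2)


print(grade_difference_sketch("BBB", "AA"))  # positions 3 and 1, so 2
print(grade_difference_sketch("AA", "BBB"))  # positions 1 and 3, so -2
print(grade_difference_sketch("AA", "XYZ"))  # unknown grade, so None
```

Under this reading a positive result means grade 1 is worse than grade 2, because the scale is ordered from best to worst.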
grade_to_index(grade)
Transform rating grade to index (starting from 0). Rating grades can be configured in module "cfg.py". Minimal working example:
df = df.assign(INDEX=df["GRADE"].apply(lambda x: crm.general.grade_to_index(grade=x)))
Parameters:
Name | Type | Description | Default |
---|---|---|---|
grade | str | Defines the rating grade. | required |
Returns:
Type | Description |
---|---|
Union[int, None] | Returns the index of the rating grade (starting from 0). |
grade_to_pd(grade)
Transform rating grade to Probability of Default (PD). Rating grades and PDs can be configured in module "cfg.py". Minimal working example:
df = df.assign(PD=df["GRADE"].apply(lambda x: crm.general.grade_to_pd(grade=x)))
Parameters:
Name | Type | Description | Default |
---|---|---|---|
grade | str | Defines the rating grade. | required |
Returns:
Type | Description |
---|---|
Union[float, None] | Returns the PD of the rating grade. |
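Conceptually this is a lookup from grade to PD. The following sketch shows the idea with a plain dictionary. The PD values are placeholders invented for illustration and are not the defaults from "cfg.py"; the helper name and the `None` result for an unknown grade are likewise assumptions.

```python
from typing import Optional

# Hypothetical grade-to-PD mapping with placeholder values (not the cfg.py defaults).
GRADE_TO_PD = {
    "AAA": 0.0001,
    "AA": 0.0005,
    "A": 0.001,
    "BBB": 0.005,
    "BB": 0.02,
    "B": 0.05,
    "CCC": 0.2,
    "D": 1.0,
}


def grade_to_pd_sketch(grade: str) -> Optional[float]:
    """Look up the PD for a grade; unknown grades yield None."""
    return GRADE_TO_PD.get(grade)


print(grade_to_pd_sketch("BBB"))
print(grade_to_pd_sketch("XYZ"))
```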
index_to_grade(index)
Transform index (starting from 0) to rating grade. Rating grades can be configured in module "cfg.py". Minimal working example:
df = df.assign(GRADE=df["INDEX"].apply(lambda x: crm.general.index_to_grade(index=x)))
Parameters:
Name | Type | Description | Default |
---|---|---|---|
index | int | Defines the index (starting from 0). | required |
Returns:
Type | Description |
---|---|
Union[str, None] | Returns the rating grade. |
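Since "index_to_grade" is the inverse of "grade_to_index", the two can be illustrated together as a round trip. This is a sketch under assumptions: the `GRADES` list is a hypothetical scale mirroring the docx.py example data, and mapping out-of-range or unknown inputs to `None` is inferred from the documented `Union[..., None]` return types.

```python
from typing import Optional

# Hypothetical grade scale; the configured one lives in cfg.py.
GRADES = ["AAA", "AA", "A", "BBB", "BB", "B", "CCC", "D"]


def grade_to_index_sketch(grade: str) -> Optional[int]:
    """Position of the grade on the scale (starting from 0), or None if unknown."""
    return GRADES.index(grade) if grade in GRADES else None


def index_to_grade_sketch(index: int) -> Optional[str]:
    """Grade at the given position, or None if the index is out of range."""
    if 0 <= index < len(GRADES):
        return GRADES[index]
    return None


# Converting an index to a grade and back gives the original index.
for i in range(len(GRADES)):
    assert grade_to_index_sketch(index_to_grade_sketch(i)) == i
```

The explicit range check avoids Python's negative indexing, which would otherwise map -1 to the last grade.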
logit_pd_to_pd(logit_pd)
Transform logit Probability of Default (PD) to PD. PDs can be configured in module "cfg.py". Minimal working example:
df = df.assign(PD=df["LOGIT_PD"].apply(lambda x: crm.general.logit_pd_to_pd(logit_pd=x)))
Parameters:
Name | Type | Description | Default |
---|---|---|---|
logit_pd | float | Defines the logit PD. | required |
Returns:
Type | Description |
---|---|
Union[float, None] | Returns the PD. |
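Assuming the standard definition of the logit, logit(PD) = ln(PD / (1 - PD)), the transformation back to a PD is the logistic function. The sketch below shows that formula; the helper name is hypothetical and the package's own handling of edge cases may differ.

```python
import math


def logit_pd_to_pd_sketch(logit_pd: float) -> float:
    """Inverse logit: PD = 1 / (1 + exp(-logit_pd))."""
    # Branch on the sign so that exp() is only called on non-positive values,
    # which avoids an OverflowError for large negative inputs.
    if logit_pd >= 0:
        return 1.0 / (1.0 + math.exp(-logit_pd))
    z = math.exp(logit_pd)
    return z / (1.0 + z)


print(logit_pd_to_pd_sketch(0.0))  # a logit of 0 corresponds to a PD of 0.5
```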
pd_to_grade(pd)
Transform Probability of Default (PD) to rating grade based on minimum PD threshold. Rating grades and PDs can be configured in module "cfg.py". Minimal working example:
df = df.assign(GRADE=df["PD"].apply(lambda x: crm.general.pd_to_grade(pd=x)))
Parameters:
Name | Type | Description | Default |
---|---|---|---|
pd | float | Defines the PD. | required |
Returns:
Type | Description |
---|---|
Union[str, None] | Returns the rating grade. |
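A mapping "based on minimum PD threshold" can be read as: each grade has a lower PD bound, and a PD receives the grade with the largest bound that does not exceed it. The sketch below implements that reading with a binary search. The thresholds are placeholders invented for illustration, the grade labels mirror the docx.py example data, and the helper name and `None` handling are assumptions rather than the package's behaviour.

```python
from bisect import bisect_right
from typing import Optional

# Hypothetical scale and minimum PD per grade (ascending placeholder values).
GRADES = ["AAA", "AA", "A", "BBB", "BB", "B", "CCC", "D"]
MIN_PDS = [0.0, 0.0003, 0.001, 0.003, 0.01, 0.03, 0.1, 1.0]


def pd_to_grade_sketch(pd_value: float) -> Optional[str]:
    """Return the grade whose minimum PD is the largest one <= pd_value."""
    # Values outside [0, 1], including NaN, fail this check and yield None.
    if not 0.0 <= pd_value <= 1.0:
        return None
    # bisect_right counts the thresholds <= pd_value; the last of them wins.
    return GRADES[bisect_right(MIN_PDS, pd_value) - 1]


print(pd_to_grade_sketch(0.002))  # thresholds 0.0, 0.0003, 0.001 apply, so "A"
print(pd_to_grade_sketch(1.0))    # only a PD of exactly 1.0 reaches "D" here
```

The argument is named `pd_value` in this sketch to avoid shadowing the conventional `pd` alias for pandas.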
pd_to_logit_pd(pd)
Transform Probability of Default (PD) to logit PD. PDs can be configured in module "cfg.py". Minimal working example:
df = df.assign(LOGIT_PD=df["PD"].apply(lambda x: crm.general.pd_to_logit_pd(pd=x)))
Parameters:
Name | Type | Description | Default |
---|---|---|---|
pd | float | Defines the PD. | required |
Returns:
Type | Description |
---|---|
Union[float, None] | Returns the logit PD. |
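Under the standard definition, the logit of a PD is the natural logarithm of its odds, which is only defined for PDs strictly between 0 and 1. The following sketch shows that formula. The helper name is hypothetical, and returning `None` at the boundaries is an assumption suggested by the `Union[float, None]` return type.

```python
import math
from typing import Optional


def pd_to_logit_pd_sketch(pd_value: float) -> Optional[float]:
    """logit(PD) = ln(PD / (1 - PD)), defined only for 0 < PD < 1."""
    # PDs of exactly 0 or 1 (and NaN) have no finite logit, so return None.
    if not 0.0 < pd_value < 1.0:
        return None
    return math.log(pd_value / (1.0 - pd_value))


print(pd_to_logit_pd_sketch(0.5))  # even odds, so the logit is 0.0
print(pd_to_logit_pd_sketch(1.0))  # outside the open interval, so None
```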
set_pandas_options()
Set the pandas display width to "320" and the display maximum columns to "None". Minimal working example:
crm.set_pandas_options()
Returns:
Type | Description |
---|---|
None | Set the pandas display width to "320" and the display maximum columns to "None". |
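The description corresponds to two standard pandas display options. A minimal sketch of the equivalent direct calls, assuming the function does nothing beyond what is documented here:

```python
import pandas as pd

# Allow printed DataFrames to use up to 320 characters per line.
pd.set_option("display.width", 320)
# Print every column instead of eliding the middle ones with "...".
pd.set_option("display.max_columns", None)

print(pd.get_option("display.width"))
```

These settings apply to the running Python session only and can be undone with `pd.reset_option("display.width")` and `pd.reset_option("display.max_columns")`.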