Other

cfg.py

This module contains default configurations that can be adjusted to suit your own preferences. It contains five sets of constants:

Grades,
Probabilities of Default (PDs),
Colours,
Date formats, and
Column names.
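As a rough illustration of what such constants might look like, the sketch below defines one value per group. All names and values (GRADES, PDS, COLOURS, DATE_FORMAT, COLUMN_NAMES) are hypothetical and are not taken from cfg.py; only the grade labels mirror the ones used in the examples in this documentation.

```python
import datetime

# Hypothetical configuration constants; the real cfg.py may use other
# names, other values, and other data structures.
GRADES = ["AAA", "AA", "A", "BBB", "BB", "B", "CCC", "D"]      # best to worst
PDS = [0.0001, 0.0005, 0.001, 0.005, 0.02, 0.08, 0.25, 1.0]    # one PD per grade (illustrative)
COLOURS = ["#1f77b4", "#ff7f0e"]                               # plot colours (illustrative)
DATE_FORMAT = "%Y-%m-%d"                                       # e.g. 2022-12-31
COLUMN_NAMES = {"grade": "GRADE", "pd": "PD"}                  # logical name -> column name

# Basic consistency checks that such a configuration should satisfy.
assert len(GRADES) == len(PDS)
assert PDS == sorted(PDS)

formatted = datetime.date(2022, 12, 31).strftime(DATE_FORMAT)
print(formatted)
```

Keeping grades and PDs in parallel, ordered sequences makes it straightforward to map between a grade, its position on the scale, and its PD.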

load_data.py

The module provides dummy data to be used for examples or testing.

`load_data()`

Load dummy data to be used for examples or testing. Minimal working example:

df = crm.load_data.load_data()

Returns:

Type	Description
`DataFrame`	Returns dummy data to be used for examples or testing.

cleaning.py

The module contains several functions to clean a pandas DataFrame. In particular:

"change_column_names": Change column names. Column names can be changed to all upper case (default), lower case, or left unchanged.
"drop_duplicates": Drop duplicates. Duplicates can be dropped (default) or kept.
"reset_index": Reset index.
"strip_strings": Strip strings in column names and in the rows of object columns.
"transform_missing_strings_to_nan": Transform missing strings in object column rows to NumPy's np.nan. Missing strings are defined as "", "nan", and "None".

All of these functions can be applied in a single call with the "clean" wrapper function. Minimal working example:

df = crm.clean(df=df)

`change_column_names(df, style='upper')`

Change column names. Column names can be changed to all upper case (default), lower case, or left unchanged. Minimal working example:

df = crm.change_column_names(df=df)

Parameters:

Name	Type	Description	Default
`df`	`DataFrame`	Defines the pandas DataFrame.	required
`style`	`str`	Defines how the column names are changed. One of "upper", "lower", or "unchanged".	`'upper'`

Returns:

Type	Description
`DataFrame`	Returns the cleaned DataFrame.

`clean(df, style='upper', drop=True)`

Clean pandas DataFrame. The function does the following:

Strips strings of column names and object column rows.
Transforms missing strings in object column rows to NumPy's np.nan. Missing strings are defined as "", "nan", and "None".
Changes column names. Column names can be changed to all upper case (default), lower case, or left unchanged.
Drops duplicates. Duplicates can be dropped (default) or kept.
Resets the index.

Minimal working example:

df = crm.clean(df=df)

Parameters:

Name	Type	Description	Default
`df`	`DataFrame`	Defines the pandas DataFrame.	required
`style`	`str`	Defines how the column names are changed. One of "upper", "lower", or "unchanged".	`'upper'`
`drop`	`bool`	Defines whether to drop ("True") or keep ("False") duplicates.	`True`

Returns:

Type	Description
`DataFrame`	Returns the cleaned DataFrame.
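The five steps listed above can be sketched with plain pandas as follows. This is an illustrative re-implementation, not the library's code: the function name clean_sketch is hypothetical, "strip" is assumed to mean removing surrounding whitespace, and the index reset is assumed to discard the old index.

```python
import numpy as np
import pandas as pd

# Missing-value strings as listed in the documentation.
MISSING_STRINGS = ["", "nan", "None"]


def clean_sketch(df: pd.DataFrame, style: str = "upper", drop: bool = True) -> pd.DataFrame:
    """Illustrative version of the five documented steps (not the library code)."""
    out = df.copy()

    # 1. Strip column names and string values in object columns
    #    (assumption: "strip" means removing surrounding whitespace).
    out.columns = [str(col).strip() for col in out.columns]
    object_cols = list(out.select_dtypes(include="object").columns)
    for col in object_cols:
        out[col] = out[col].map(lambda v: v.strip() if isinstance(v, str) else v)

    # 2. Replace the missing-value strings with np.nan.
    for col in object_cols:
        out[col] = out[col].mask(out[col].isin(MISSING_STRINGS), np.nan)

    # 3. Change the column-name style.
    if style == "upper":
        out.columns = [col.upper() for col in out.columns]
    elif style == "lower":
        out.columns = [col.lower() for col in out.columns]
    elif style != "unchanged":
        raise ValueError(f"Unknown style: {style!r}")

    # 4. Drop duplicate rows if requested.
    if drop:
        out = out.drop_duplicates()

    # 5. Reset the index (assumption: the old index is discarded).
    return out.reset_index(drop=True)


raw = pd.DataFrame({
    " grade ": pd.Series([" AAA", "AA ", "AA ", "None"], dtype=object),
    "pd": [0.01, 0.02, 0.02, 0.5],
})
cleaned = clean_sketch(raw)
print(cleaned)
```

In this sketch the stripping happens before duplicates are dropped, so rows that differ only by surrounding whitespace collapse into one.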

`drop_duplicates(df, drop=True)`

Drop duplicates. Duplicates can be dropped (default) or kept. Minimal working example:

df = crm.drop_duplicates(df=df)

Parameters:

Name	Type	Description	Default
`df`	`DataFrame`	Defines the pandas DataFrame.	required
`drop`	`bool`	Defines whether to drop ("True") or keep ("False") duplicates.	`True`

Returns:

Type	Description
`DataFrame`	Returns the cleaned DataFrame.

`reset_index(df)`

Reset index. Minimal working example:

df = crm.reset_index(df=df)

Parameters:

Name	Type	Description	Default
`df`	`DataFrame`	Defines the pandas DataFrame.	required

Returns:

Type	Description
`DataFrame`	Returns the cleaned DataFrame.

`strip_strings(df)`

Strip strings of column names and object column rows. Minimal working example:

df = crm.strip_strings(df=df)

Parameters:

Name	Type	Description	Default
`df`	`DataFrame`	Defines the pandas DataFrame.	required

Returns:

Type	Description
`DataFrame`	Returns the cleaned DataFrame.

`transform_missing_strings_to_nan(df)`

Transform missing strings in object column rows to NumPy's np.nan. Missing strings are defined as "", "nan", and "None". Minimal working example:

df = crm.transform_missing_strings_to_nan(df=df)

Parameters:

Name	Type	Description	Default
`df`	`DataFrame`	Defines the pandas DataFrame.	required

Returns:

Type	Description
`DataFrame`	Returns the cleaned DataFrame.
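To make the matching rule concrete, here is a minimal pandas sketch of the behaviour described above. The function name is hypothetical, and exact (whole-value) matching is an assumption; the library's implementation may differ.

```python
import numpy as np
import pandas as pd

# Missing-value strings as listed in the documentation.
MISSING_STRINGS = ["", "nan", "None"]


def missing_strings_to_nan_sketch(df: pd.DataFrame) -> pd.DataFrame:
    """Replace exact matches of the missing strings in object columns with np.nan."""
    out = df.copy()
    for col in out.select_dtypes(include="object").columns:
        # Only whole-value matches are replaced; "nano" is left untouched.
        out[col] = out[col].mask(out[col].isin(MISSING_STRINGS), np.nan)
    return out


example = pd.DataFrame({
    "GRADE": pd.Series(["AAA", "", "nan", "None", "nano"], dtype=object),
    "SCORE": [1.0, 2.0, 3.0, 4.0, 5.0],
})
result = missing_strings_to_nan_sketch(example)
print(result)
```

Numeric columns such as SCORE are not object columns and therefore pass through unchanged.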

docx.py

The module defines a class "WordDocument" that can be used to initialise a Word template, add content, and save it. See docxtpl.readthedocs for more information.

Examples

word data

>>> import os
>>> from docxtpl import InlineImage
>>> from docx.shared import Mm
>>> import credit_risk_modelling as crm
>>> data
  GRADE  2022-12-31_ABS  2022-12-31_PCT  2023-12-31_ABS  2023-12-31_PCT
0   AAA            70.0        0.077778            85.0           0.085
1    AA            85.0        0.094444           114.0           0.114
2     A           118.0        0.131111           128.0           0.128
3   BBB           133.0        0.147778           130.0           0.130
4    BB           148.0        0.164444           159.0           0.159
5     B           119.0        0.132222           138.0           0.138
6   CCC           108.0        0.120000           117.0           0.117
7     D           119.0        0.132222           129.0           0.129
8   Sum           900.0        1.000000          1000.0           1.000

word document

>>> tbl = (
...     data
...     .set_index("GRADE")
...     .crm.formatting().format_cols(
...         int_cols=["2022-12-31_ABS", "2023-12-31_ABS"],
...         pct_cols=["2022-12-31_PCT", "2023-12-31_PCT"]
...     )
...     .set_axis(["# 2022-12-31", "% 2022-12-31", "# 2023-12-31", "% 2023-12-31"], axis=1)
...     .crm.formatting().df_to_tbl(label="Grade")
... )
>>> fig = (
...     data[data.iloc[:, 0] != "Sum"]
...     .crm.plotting().plot_bar(
...         x="GRADE",
...         y=["2022-12-31_PCT", "2023-12-31_PCT"],
...         x_axis_label="Grade",
...         legend_label=["2022-12-31", "2023-12-31"],
...     )
... )
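>>> # PATH_WORD is the directory that holds the Word files; it is assumed to be defined beforehand.
>>> document = crm.docx.WordDocument(path_word_in=os.path.join(PATH_WORD, r"word_in.docx"))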
>>> document.add(dict_item={"date_start": "31 December 2022"})
>>> document.add(dict_item={"date_end": "31 December 2023"})
>>> document.add(dict_item={"tbl": tbl})
>>> document.add(dict_item={"fig": InlineImage(document.doc, fig, width=Mm(150))})
>>> document.save(path_word_out=os.path.join(PATH_WORD, r"word_out.docx"))

Below is an example of a Word template.

[Figure: Word in (the Word template)]

Given the data shown under "word data" and the code shown under "word document" above, the Word template is rendered to the following output:

[Figure: Word out (the rendered document)]

general.py

The module contains several general functions that can be used for subsequent analyses.

`grade_difference(grade_1, grade_2)`

Calculate the difference between two rating grades as an integer, computed as grade 1 minus grade 2. Rating grades can be configured in module "cfg.py". Minimal working example:

df = df.assign(GRADE_DIFFERENCE=df.apply(lambda row: crm.general.grade_difference(grade_1=row["GRADE_1"], grade_2=row["GRADE_2"]), axis=1))

Parameters:

Name	Type	Description	Default
`grade_1`	`str`	Defines the first rating grade.	required
`grade_2`	`str`	Defines the second rating grade.	required

Returns:

Type	Description
`Union[int, None]`	Returns the difference between the two rating grades (grade 1 minus grade 2) as an integer.
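One plausible reading of this difference is the difference between the grades' positions on the rating scale. The sketch below follows that reading; the GRADES list, the function name, and the choice to return None for unknown grades are assumptions made for illustration, not the library's confirmed behaviour.

```python
from typing import Optional

# Assumed rating scale, ordered from best to worst (configurable in practice).
GRADES = ["AAA", "AA", "A", "BBB", "BB", "B", "CCC", "D"]


def grade_difference_sketch(grade_1: str, grade_2: str) -> Optional[int]:
    """Return index(grade_1) - index(grade_2), or None if a grade is unknown."""
    if grade_1 not in GRADES or grade_2 not in GRADES:
        return None
    return GRADES.index(grade_1) - GRADES.index(grade_2)


print(grade_difference_sketch("BBB", "AA"))   # 3 - 1 -> 2
print(grade_difference_sketch("AAA", "D"))    # 0 - 7 -> -7
print(grade_difference_sketch("XYZ", "A"))    # unknown grade -> None
```

Under this convention a positive result means that grade 1 sits further down the scale (is worse) than grade 2.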

`grade_to_index(grade)`

Transform rating grade to index (starting from 0). Rating grades can be configured in module "cfg.py". Minimal working example:

df = df.assign(INDEX=df["GRADE"].apply(lambda x: crm.general.grade_to_index(grade=x)))

Parameters:

Name	Type	Description	Default
`grade`	`str`	Defines the rating grade.	required

Returns:

Type	Description
`Union[int, None]`	Returns the index of the rating grade (starting from 0).

`grade_to_pd(grade)`

Transform rating grade to Probability of Default (PD). Rating grades and PDs can be configured in module "cfg.py". Minimal working example:

df = df.assign(PD=df["GRADE"].apply(lambda x: crm.general.grade_to_pd(grade=x)))

Parameters:

Name	Type	Description	Default
`grade`	`str`	Defines the rating grade.	required

Returns:

Type	Description
`Union[float, None]`	Returns the PD of the rating grade.
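Conceptually this is a lookup from grade to PD. A minimal sketch, assuming a hypothetical mapping GRADE_TO_PD with purely illustrative PD values and assuming that unknown grades yield None:

```python
from typing import Optional

# Hypothetical grade-to-PD mapping; the values are illustrative only.
GRADE_TO_PD = {
    "AAA": 0.0001, "AA": 0.0005, "A": 0.001, "BBB": 0.005,
    "BB": 0.02, "B": 0.08, "CCC": 0.25, "D": 1.0,
}


def grade_to_pd_sketch(grade: str) -> Optional[float]:
    """Look up the PD of a grade; unknown grades give None (assumption)."""
    return GRADE_TO_PD.get(grade)


print(grade_to_pd_sketch("BBB"))   # 0.005
print(grade_to_pd_sketch("XYZ"))   # None
```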

`index_to_grade(index)`

Transform index (starting from 0) to rating grade. Rating grades can be configured in module "cfg.py". Minimal working example:

df = df.assign(GRADE=df["INDEX"].apply(lambda x: crm.general.index_to_grade(index=x)))

Parameters:

Name	Type	Description	Default
`index`	`int`	Defines the index (starting from 0).	required

Returns:

Type	Description
`Union[str, None]`	Returns the rating grade.
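Since grade_to_index and index_to_grade are inverses of each other, a round trip should return the original value. The sketch below illustrates this with a hypothetical GRADES list; the function names and the None result for invalid input are assumptions.

```python
from typing import Optional

# Assumed rating scale, ordered from best to worst.
GRADES = ["AAA", "AA", "A", "BBB", "BB", "B", "CCC", "D"]


def grade_to_index_sketch(grade: str) -> Optional[int]:
    """Position of the grade on the scale (starting from 0), or None if unknown."""
    return GRADES.index(grade) if grade in GRADES else None


def index_to_grade_sketch(index: int) -> Optional[str]:
    """Grade at the given position, or None if the index is out of range."""
    if isinstance(index, int) and 0 <= index < len(GRADES):
        return GRADES[index]
    return None


# Round trip: grade -> index -> grade.
round_trip = [index_to_grade_sketch(grade_to_index_sketch(g)) for g in GRADES]
print(round_trip == GRADES)          # True
print(index_to_grade_sketch(3))      # BBB
print(index_to_grade_sketch(99))     # None
```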

`logit_pd_to_pd(logit_pd)`

Transform logit Probability of Default (PD) to PD. PDs can be configured in module "cfg.py". Minimal working example:

df = df.assign(PD=df["LOGIT_PD"].apply(lambda x: crm.general.logit_pd_to_pd(logit_pd=x)))

Parameters:

Name	Type	Description	Default
`logit_pd`	`float`	Defines the logit PD.	required

Returns:

Type	Description
`Union[float, None]`	Returns the PD.

`pd_to_grade(pd)`

Transform Probability of Default (PD) to rating grade based on minimum PD threshold. Rating grades and PDs can be configured in module "cfg.py". Minimal working example:

df = df.assign(GRADE=df["PD"].apply(lambda x: crm.general.pd_to_grade(pd=x)))

Parameters:

Name	Type	Description	Default
`pd`	`float`	Defines the PD.	required

Returns:

Type	Description
`Union[str, None]`	Returns the rating grade.
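One way to read "minimum PD threshold" is that each grade has a lower PD bound and a PD is assigned to the grade with the largest bound not exceeding it. The sketch below implements that reading with a binary search; the thresholds in MIN_PDS are hypothetical, and the library may define its bands differently.

```python
import bisect
from typing import Optional

# Assumed rating scale with hypothetical minimum PD thresholds (ascending).
GRADES = ["AAA", "AA", "A", "BBB", "BB", "B", "CCC", "D"]
MIN_PDS = [0.0, 0.0003, 0.001, 0.003, 0.01, 0.05, 0.15, 1.0]


def pd_to_grade_sketch(pd_value: float) -> Optional[str]:
    """Map a PD to the grade whose minimum threshold is the largest one <= PD."""
    # NaN fails both comparisons, so it also falls through to None.
    if pd_value is None or not (0.0 <= pd_value <= 1.0):
        return None
    position = bisect.bisect_right(MIN_PDS, pd_value) - 1
    return GRADES[position]


print(pd_to_grade_sketch(0.0))     # AAA
print(pd_to_grade_sketch(0.002))   # A
print(pd_to_grade_sketch(0.003))   # BBB (a PD equal to a threshold enters that grade)
print(pd_to_grade_sketch(1.0))     # D
print(pd_to_grade_sketch(1.5))     # None
```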

`pd_to_logit_pd(pd)`

Transform Probability of Default (PD) to logit PD. PDs can be configured in module "cfg.py". Minimal working example:

df = df.assign(LOGIT_PD=df["PD"].apply(lambda x: crm.general.pd_to_logit_pd(pd=x)))

Parameters:

Name	Type	Description	Default
`pd`	`float`	Defines the PD.	required

Returns:

Type	Description
`Union[float, None]`	Returns the logit PD.
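The logit of a PD is its log-odds, log(PD / (1 - PD)), and the inverse transformation is the logistic function 1 / (1 + exp(-x)). The sketch below shows both directions, matching pd_to_logit_pd and logit_pd_to_pd in spirit. The function names and the None result for values outside the open interval (0, 1) are assumptions, since the library's handling of edge cases is not documented here.

```python
import math
from typing import Optional


def pd_to_logit_pd_sketch(pd_value: float) -> Optional[float]:
    """Log-odds of a PD; undefined at 0 and 1, so those return None (assumption)."""
    if pd_value is None or not (0.0 < pd_value < 1.0):
        return None
    return math.log(pd_value / (1.0 - pd_value))


def logit_pd_to_pd_sketch(logit_pd: float) -> Optional[float]:
    """Logistic function, written to avoid overflow for large negative inputs."""
    if logit_pd is None or math.isnan(logit_pd):
        return None
    if logit_pd >= 0:
        return 1.0 / (1.0 + math.exp(-logit_pd))
    exp_value = math.exp(logit_pd)
    return exp_value / (1.0 + exp_value)


logit = pd_to_logit_pd_sketch(0.02)
recovered = logit_pd_to_pd_sketch(logit)
print(logit, recovered)
print(pd_to_logit_pd_sketch(0.5))   # 0.0
print(logit_pd_to_pd_sketch(0.0))   # 0.5
```

Working on the logit scale is common because it maps the bounded interval (0, 1) onto the whole real line.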

`set_pandas_options()`

Set the pandas display width to "320" and the display maximum columns to "None". Minimal working example:

crm.set_pandas_options()

Returns:

Type	Description
`None`	Returns nothing. As a side effect, the pandas display width is set to "320" and the display maximum columns to "None".
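The two settings described above correspond to the pandas options "display.width" and "display.max_columns". A minimal sketch of an equivalent helper, with a hypothetical function name:

```python
import pandas as pd


def set_pandas_options_sketch() -> None:
    """Widen the console output and show all columns when printing DataFrames."""
    pd.set_option("display.width", 320)
    pd.set_option("display.max_columns", None)  # None means no column limit


set_pandas_options_sketch()
print(pd.get_option("display.width"))
print(pd.get_option("display.max_columns"))
```

With these options, wide DataFrames are printed without their middle columns being elided.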