Frequency

The module provides methods for frequency analytics. It builds on the pandas crosstab function and therefore borrows its parameter names: index, column, and value (pandas itself uses index, columns, and values).

Accessor

Call the frequency method through the crm DataFrame accessor. Minimal working example:

df.crm.frequency(index="GRADE")

The module can create four different frequency tables depending on the exact parameter input:

df.crm.frequency(index="GRADE"): frequency table per index value.
df.crm.frequency(index="GRADE", column="DATE"): frequency table per index value and column value.
df.crm.frequency(index="GRADE", column="DATE", cohort="COHORT"): frequency table per index value and column value, adjusted for the cohort.
df.crm.frequency(index="GRADE", column="DATE", value="EXPOSURE"): pandas crosstab of the value column per index value and column value, aggregated with aggfunc="sum".
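For orientation, the first and the last variant can be approximated with plain pandas. This is a sketch on a small hypothetical DataFrame, not the library's implementation; the _ABS/_PCT column names only mirror the output shown in the examples below.

```python
import pandas as pd

# Hypothetical toy data; column names follow the examples in this section
df = pd.DataFrame(
    {
        "GRADE": ["A", "B", "A", "B", "A"],
        "DATE": pd.to_datetime(
            ["2022-12-31", "2022-12-31", "2023-12-31", "2023-12-31", "2023-12-31"]
        ),
        "EXPOSURE": [100.0, 50.0, 200.0, 25.0, 75.0],
    }
)

# Variant 1 (index only): absolute and relative frequency per GRADE value
counts = df["GRADE"].value_counts().sort_index()
freq = (
    pd.DataFrame({"GRADE_ABS": counts, "GRADE_PCT": counts / counts.sum()})
    .rename_axis("GRADE")
    .reset_index()
)

# Variant 4 (index, column, value): crosstab summing EXPOSURE per GRADE and DATE
exposure = pd.crosstab(
    index=df["GRADE"], columns=df["DATE"], values=df["EXPOSURE"], aggfunc="sum"
)

print(freq)
print(exposure)
```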

If the index is categorical and should include missing values ("NaN"s), the "NaN" category has to be added explicitly, for example:

import numpy as np
import pandas as pd

bins = [-np.inf, 0, 10, np.inf]
labels = ["0", "0-10", ">10"]
# Bin the raw values; missing values stay NaN
df["EXPOSURE_NEW"] = pd.cut(df["EXPOSURE_OLD"], bins=bins, labels=labels, right=False)
# Register "NaN" as an explicit category and assign it to the missing entries
df["EXPOSURE_NEW"] = df["EXPOSURE_NEW"].cat.add_categories("NaN").fillna("NaN")
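A self-contained sketch on a few hypothetical values shows the effect: missing entries end up in their own "NaN" category instead of being dropped, so a frequency count reports them as a separate row. The toy data and the use of fillna on the categorical are illustrative choices, not taken from the library.

```python
import numpy as np
import pandas as pd

# Hypothetical exposures, including one missing value
df = pd.DataFrame({"EXPOSURE_OLD": [-5.0, 3.0, 12.0, np.nan, 7.0]})

bins = [-np.inf, 0, 10, np.inf]
labels = ["0", "0-10", ">10"]

# right=False yields the intervals [-inf, 0), [0, 10), [10, inf)
binned = pd.cut(df["EXPOSURE_OLD"], bins=bins, labels=labels, right=False)

# Add "NaN" as an explicit category and assign it to the missing entry
df["EXPOSURE_NEW"] = binned.cat.add_categories("NaN").fillna("NaN")

# The former missing value is now counted like any other category
print(df["EXPOSURE_NEW"].value_counts().sort_index())
```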

Parameters:

Name	Type	Description	Default
`index`	`str`	Defines the index variable, analogous to index in pandas crosstab, for example, "GRADE".	required
`column`	`str`	Defines the column variable, analogous to columns in pandas crosstab, for example, "DATE".	`None`
`value`	`str`	Defines the value variable, analogous to values in pandas crosstab, for example, "EXPOSURE".	`None`
`cohort`	`str`	Defines the cohort identifier, for example, "COHORT".	`None`

Returns:

Type	Description
`Frequency`	Returns a Frequency object that provides the frequency analytics methods.
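The df.crm.frequency(...) call pattern matches the standard pandas accessor mechanism. The sketch below shows how such an accessor can hand back a helper object, using pandas.api.extensions.register_dataframe_accessor. The accessor name crm_sketch and the classes FrequencySketch and CrmSketchAccessor are hypothetical stand-ins, not the library's code, and only the index-only table is sketched.

```python
import pandas as pd


class FrequencySketch:
    """Hypothetical stand-in for Frequency: stores the inputs for later methods."""

    def __init__(self, df, index, column=None, value=None, cohort=None):
        self.df = df
        self.index = index
        self.column = column
        self.value = value
        self.cohort = cohort

    def table(self):
        # Only the index-only case is sketched: absolute and relative counts
        counts = self.df[self.index].value_counts().sort_index()
        return (
            pd.DataFrame(
                {f"{self.index}_ABS": counts, f"{self.index}_PCT": counts / counts.sum()}
            )
            .rename_axis(self.index)
            .reset_index()
        )


@pd.api.extensions.register_dataframe_accessor("crm_sketch")
class CrmSketchAccessor:
    """Hypothetical accessor, available as df.crm_sketch."""

    def __init__(self, pandas_obj):
        self._obj = pandas_obj

    def frequency(self, index, column=None, value=None, cohort=None):
        # Return a helper object, mirroring df.crm.frequency(...) returning Frequency
        return FrequencySketch(self._obj, index, column, value, cohort)


df = pd.DataFrame({"GRADE": ["A", "B", "A"]})
print(df.crm_sketch.frequency(index="GRADE").table())
```

Returning a separate object keeps the accessor thin and lets all analytics methods share the stored parameters.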

Methods

`table(index_range=None, df_ext=None, sort_by_col=None, sort_by_list=None, sort_asc=True, add_sum=False)`

Minimal working example:

df.crm.frequency(index="GRADE").table()

Parameters:

Name	Type	Description	Default
`index_range`	`list`	Extends the index range. For grades, for example, the index can be extended with missing grades via "constants.GRADES" to get the complete range.	`None`
`df_ext`	`DataFrame`	Adds data (columns) to the resulting DataFrame, so the dimensions of df_ext must match the resulting table. Ideally, the data in df_ext is already given in percentages.	`None`
`sort_by_col`	`str`	Defines the column to sort.	`None`
`sort_by_list`	`list`	Defines the list to sort in case of categorical items which have no intrinsic sorting order.	`None`
`sort_asc`	`bool`	Sorts by the column or list defined above in ascending order if True, in descending order otherwise.	`True`
`add_sum`	`bool`	Adds a sum row to the DataFrame.	`False`

Returns:

Type	Description
`DataFrame`	Returns a frequency table with absolute (_ABS) and relative (_PCT) frequencies for each index value, per column.
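How index_range, sort_by_list, and add_sum might act on an index-only table can be sketched with plain pandas. This is an assumption about the semantics, derived from the parameter descriptions above rather than from the library's source; the helper table_sketch and the GRADES list are hypothetical.

```python
import pandas as pd

# Hypothetical full grade scale, ordered from best to worst
GRADES = ["AAA", "AA", "A", "BBB", "BB", "B", "CCC", "D"]


def table_sketch(series, index_range=None, sort_by_list=None, add_sum=False):
    """Hypothetical helper: absolute/relative frequencies plus post-processing."""
    name = series.name
    counts = series.value_counts()
    if index_range is not None:
        # Extend the index with labels that never occur; they get a count of 0
        counts = counts.reindex(counts.index.union(index_range), fill_value=0)
    if sort_by_list is not None:
        # Order labels by their position in sort_by_list instead of alphabetically
        position = {label: i for i, label in enumerate(sort_by_list)}
        counts = counts.sort_index(key=lambda idx: idx.map(position))
    else:
        counts = counts.sort_index()
    out = pd.DataFrame({f"{name}_ABS": counts, f"{name}_PCT": counts / counts.sum()})
    if add_sum:
        # Append a "Sum" row holding the column totals
        out.loc["Sum"] = out.sum()
    return out.rename_axis(name).reset_index()


grades = pd.Series(["A", "BBB", "A", "D"], name="GRADE")
print(table_sketch(grades, index_range=GRADES, sort_by_list=GRADES, add_sum=True))
```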

Examples

data

>>> import credit_risk_modelling as crm
>>> data = crm.load_data.load_data()
>>> data

           DATE    ID GRADE  GRADE_PD OVERRIDE  OVERRIDE_PD  DEFAULT
0    2019-12-31    10     B    0.1000        B       0.1000        0
1    2019-12-31   100   BBB    0.0090       BB       0.0400        0
2    2019-12-31  1000   BBB    0.0090      BBB       0.0090        0
3    2019-12-31  1001   BBB    0.0090      BBB       0.0090        0
4    2019-12-31  1003   BBB    0.0090      BBB       0.0090        0
...         ...   ...   ...       ...      ...          ...      ...
4145 2023-12-31   994    AA    0.0010       AA       0.0010        0
4146 2023-12-31   995    AA    0.0010       AA       0.0010        0
4147 2023-12-31   996     A    0.0020        A       0.0020        0
4148 2023-12-31   998     B    0.1000        B       0.1000        0
4149 2023-12-31   999   AAA    0.0002      AAA       0.0002        0

[4150 rows x 7 columns]

.table() 1

>>> (
...     data
...     .crm.frequency(index="GRADE")
...     .table(add_sum=True)
... )

  GRADE  GRADE_ABS  GRADE_PCT
0     A      503.0   0.121205
1    AA      368.0   0.088675
2   AAA      273.0   0.065783
3     B      591.0   0.142410
4    BB      715.0   0.172289
5   BBB      735.0   0.177108
6   CCC      473.0   0.113976
7     D      492.0   0.118554
8   Sum     4150.0   1.000000

.table() 2

>>> (
...     data
...     .loc[lambda df: df["DATE"].dt.year.isin([2022, 2023])]
...     .crm.frequency(index="GRADE", column="DATE", cohort="ID")
...     .table(sort_by_list=crm.cfg.GRADES)
... )

  GRADE  2022-12-31_ABS  2022-12-31_PCT  2023-12-31_ABS  2023-12-31_PCT
0   AAA              63        0.076736              69        0.084044
1    AA              79        0.096224              95        0.115713
2     A             109        0.132765             106        0.129111
3   BBB             126        0.153471             106        0.129111
4    BB             135        0.164434             132        0.160780
5     B             107        0.130329             108        0.131547
6   CCC              94        0.114495              99        0.120585
7     D             108        0.131547             106        0.129111
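In the output above, both year columns sum to 821 observations, which is consistent with the cohort adjustment keeping only IDs that are observed at every DATE. That reading is an assumption, not documented behaviour. A plain-pandas sketch of it on hypothetical toy data:

```python
import pandas as pd

# Hypothetical panel: ID 3 is only observed in 2022, ID 4 only in 2023
df = pd.DataFrame(
    {
        "DATE": pd.to_datetime(["2022-12-31"] * 3 + ["2023-12-31"] * 3),
        "ID": [1, 2, 3, 1, 2, 4],
        "GRADE": ["A", "B", "A", "A", "A", "B"],
    }
)

# Assumed cohort rule: keep only IDs present at every date in the data
n_dates = df["DATE"].nunique()
dates_per_id = df.groupby("ID")["DATE"].transform("nunique")
cohort = df[dates_per_id == n_dates]

# Absolute and relative frequency per GRADE (rows) and DATE (columns)
abs_tab = pd.crosstab(cohort["GRADE"], cohort["DATE"])
pct_tab = pd.crosstab(cohort["GRADE"], cohort["DATE"], normalize="columns")
print(abs_tab)
print(pct_tab)
```

Under this rule every date column covers the same set of IDs, so the column totals are equal, as in the example above.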