
Frequency

The module provides methods to perform frequency analytics. It is based on the pandas crosstab function and therefore borrows its parameter names: index, column, and value correspond to the crosstab parameters index, columns, and values.

Accessor

Call the frequency method through the crm accessor of a DataFrame. Minimal working example:

df.crm.frequency(index="GRADE")
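The crm entry point is presumably a pandas DataFrame accessor. As a minimal sketch, assuming registration through pandas' register_dataframe_accessor, the snippet below registers a hypothetical crm_sketch accessor whose frequency method returns a hypothetical, stripped-down Frequency object. None of these names are the library's own code.

```python
import pandas as pd


class Frequency:
    """Hypothetical stand-in for the library's Frequency class."""

    def __init__(self, df, index, column=None, value=None, cohort=None):
        self.df = df
        self.index = index
        self.column = column
        self.value = value
        self.cohort = cohort


@pd.api.extensions.register_dataframe_accessor("crm_sketch")
class CrmSketchAccessor:
    """Hypothetical accessor mimicking the df.crm entry point."""

    def __init__(self, pandas_obj: pd.DataFrame):
        # pandas passes the DataFrame the accessor is called on.
        self._obj = pandas_obj

    def frequency(self, index, column=None, value=None, cohort=None):
        # Bundle the DataFrame with the crosstab-style parameters.
        return Frequency(self._obj, index, column=column, value=value, cohort=cohort)


df = pd.DataFrame({"GRADE": ["A", "B", "A"]})
freq = df.crm_sketch.frequency(index="GRADE")
print(type(freq).__name__, freq.index)  # Frequency GRADE
```

Methods such as table() would then live on the returned object, which keeps the accessor itself thin.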

The module can create four different frequency tables depending on the exact parameter input:

  1. df.crm.frequency(index="GRADE"): frequency table per index value.
  2. df.crm.frequency(index="GRADE", column="DATE"): frequency table per index value and column value.
  3. df.crm.frequency(index="GRADE", column="DATE", cohort="COHORT"): frequency table per index value and column value, with a cohort adjustment based on the cohort identifier.
  4. df.crm.frequency(index="GRADE", column="DATE", value="EXPOSURE"): plain pandas crosstab of index against column, aggregating value with aggfunc="sum".
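As a rough orientation, variants 1, 2, and 4 can be approximated with plain pandas calls. The sketch below runs on made-up data and only produces absolute numbers; it assumes, rather than confirms, that the accessor builds on value_counts and crosstab in this way, and it leaves out the cohort variant.

```python
import pandas as pd

# Made-up data: one row per obligor and reporting date.
df = pd.DataFrame(
    {
        "DATE": ["2022-12-31", "2022-12-31", "2022-12-31", "2023-12-31", "2023-12-31"],
        "GRADE": ["A", "B", "A", "A", "B"],
        "EXPOSURE": [100.0, 50.0, 25.0, 80.0, 40.0],
    }
)

# Variant 1 (index only): number of rows per GRADE.
t1 = df["GRADE"].value_counts().sort_index()

# Variant 2 (index and column): number of rows per GRADE and DATE.
t2 = pd.crosstab(index=df["GRADE"], columns=df["DATE"])

# Variant 4 (index, column, and value): EXPOSURE summed per GRADE and DATE.
t4 = pd.crosstab(
    index=df["GRADE"], columns=df["DATE"], values=df["EXPOSURE"], aggfunc="sum"
)
print(t4)
```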

If the index is categorical and should include missing values, the "NaN" category has to be added explicitly, for example:

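import numpy as np
import pandas as pd
bins = [-np.inf, 0, 10, np.inf]
labels = ["0", "0-10", ">10"]
# Bin the exposure; missing exposures end up as NaN.
df["EXPOSURE_NEW"] = pd.cut(df["EXPOSURE_OLD"], bins=bins, labels=labels, right=False)
# Register "NaN" as an explicit category and assign it to the missing entries.
df["EXPOSURE_NEW"] = df["EXPOSURE_NEW"].cat.add_categories("NaN").fillna("NaN")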
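The explicit category matters because pandas' value_counts drops missing values by default (dropna=True), so unbinned rows would silently disappear from a frequency table. A self-contained sketch with made-up exposures shows the difference:

```python
import numpy as np
import pandas as pd

# Made-up exposures; one value is missing.
df = pd.DataFrame({"EXPOSURE_OLD": [-5.0, 3.0, 12.0, np.nan, 7.0]})

bins = [-np.inf, 0, 10, np.inf]
labels = ["0", "0-10", ">10"]
binned = pd.cut(df["EXPOSURE_OLD"], bins=bins, labels=labels, right=False)

# Without an explicit category, the missing row is not counted.
counts_without = binned.value_counts()

# With an explicit "NaN" category, the missing row is counted.
with_nan = binned.cat.add_categories("NaN").fillna("NaN")
counts_with = with_nan.value_counts()

print(counts_without.sum(), counts_with.sum())  # 4 5
```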

Parameters:

Name    Type  Default   Description
index   str   required  DataFrame column used as the crosstab index, for example, "GRADE".
column  str   None      DataFrame column used as the crosstab columns, for example, "DATE".
value   str   None      DataFrame column used as the crosstab values, for example, "EXPOSURE".
cohort  str   None      DataFrame column holding the cohort identifier, for example, "COHORT".
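The documentation does not spell out the cohort adjustment. In the second example below, both dates add up to the same total (821), which suggests, but does not confirm, that only identifiers present in every column period are kept. The following sketch implements that assumed closed-cohort filter with plain pandas on made-up data; it is an illustration, not the library's implementation.

```python
import pandas as pd

# Made-up panel: ID 3 only appears in 2022, ID 4 only in 2023.
df = pd.DataFrame(
    {
        "ID": [1, 2, 3, 1, 2, 4],
        "DATE": ["2022-12-31"] * 3 + ["2023-12-31"] * 3,
        "GRADE": ["A", "B", "B", "A", "A", "B"],
    }
)

# ASSUMPTION: the cohort adjustment keeps only IDs observed in every period.
n_periods = df["DATE"].nunique()
periods_per_id = df.groupby("ID")["DATE"].nunique()
closed_ids = periods_per_id[periods_per_id == n_periods].index
cohort = df[df["ID"].isin(closed_ids)]

# Counts per GRADE and DATE on the closed cohort; absent combinations are 0.
counts = pd.crosstab(index=cohort["GRADE"], columns=cohort["DATE"])
print(counts)
```

Under this assumption every column has the same total, which makes migrations between grades comparable across dates.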

Returns:

Type       Description
Frequency  An instance of the Frequency class, which provides the frequency analytics methods.

Methods

table(index_range=None, df_ext=None, sort_by_col=None, sort_by_list=None, sort_asc=True, add_sum=False)

Minimal working example:

df.crm.frequency(index="GRADE").table()

Parameters:

Name          Type       Default  Description
index_range   list       None     Extends the index range, for example, to include grades missing from the data. Passing "constants.GRADES" yields the complete grade range.
df_ext        DataFrame  None     Adds columns to the resulting DataFrame, so the dimensions of df_ext must match the frequency table. Ideally, data in df_ext is already given in percentages.
sort_by_col   str        None     Defines the column to sort by.
sort_by_list  list       None     Defines an explicit sort order for categorical items that have no intrinsic order.
sort_asc      bool       True     Sorts by sort_by_col or sort_by_list in ascending (True) or descending (False) order.
add_sum       bool       False    Adds a sum row to the DataFrame.
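The effect of index_range and sort_by_list on a grade scale can be sketched with pandas' reindex and an ordered categorical. GRADES below is a hypothetical stand-in for a complete grade list such as crm.cfg.GRADES, and the counts are made up; this shows the idea, not how table() implements it.

```python
import pandas as pd

# Hypothetical complete grade scale, best to worst.
GRADES = ["AAA", "AA", "A", "BBB", "BB", "B", "CCC", "D"]

# Made-up counts in which several grades are missing.
counts = pd.Series({"A": 5, "BBB": 3, "AA": 2, "D": 1}, name="GRADE_ABS")

# index_range-like step: extend to the full scale, filling missing grades with 0.
# The reindex also puts the rows in the order of GRADES.
table = counts.reindex(GRADES, fill_value=0).rename_axis("GRADE").reset_index()

# sort_by_list / sort_asc-like step: an ordered categorical sorts by list
# position instead of alphabetically; ascending=False reverses the scale.
grade_order = pd.CategoricalDtype(GRADES, ordered=True)
desc = table.assign(GRADE=table["GRADE"].astype(grade_order)).sort_values(
    "GRADE", ascending=False
)
print(desc)
```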

Returns:

Type       Description
DataFrame  A frequency table. Generally, it lists, for each index value, the absolute count (suffix _ABS) and the share of the total (suffix _PCT), separately for each column value if column is given.
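The layout of the first example below (alphabetically sorted index, _ABS and _PCT columns, optional Sum row) can be reproduced with a minimal pandas sketch. It runs on made-up grades and is an assumption about how such a table could be built, not the library's implementation.

```python
import pandas as pd

# Made-up grades; in the examples this would be data["GRADE"].
grades = pd.Series(["A", "B", "A", "AA", "B", "A"], name="GRADE")

abs_counts = grades.value_counts().sort_index()
table = pd.DataFrame(
    {
        "GRADE": abs_counts.index.to_list(),
        "GRADE_ABS": abs_counts.to_numpy(dtype=float),
        # Share of each grade in the total number of rows.
        "GRADE_PCT": (abs_counts / abs_counts.sum()).to_numpy(),
    }
)

# add_sum-like step: append a "Sum" row holding the column totals.
sum_row = pd.DataFrame(
    {
        "GRADE": ["Sum"],
        "GRADE_ABS": [table["GRADE_ABS"].sum()],
        "GRADE_PCT": [table["GRADE_PCT"].sum()],
    }
)
table = pd.concat([table, sum_row], ignore_index=True)
print(table)
```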

Examples

data
>>> import credit_risk_modelling as crm
>>> data = crm.load_data.load_data()
>>> data

           DATE    ID GRADE  GRADE_PD OVERRIDE  OVERRIDE_PD  DEFAULT
0    2019-12-31    10     B    0.1000        B       0.1000        0
1    2019-12-31   100   BBB    0.0090       BB       0.0400        0
2    2019-12-31  1000   BBB    0.0090      BBB       0.0090        0
3    2019-12-31  1001   BBB    0.0090      BBB       0.0090        0
4    2019-12-31  1003   BBB    0.0090      BBB       0.0090        0
...         ...   ...   ...       ...      ...          ...      ...
4145 2023-12-31   994    AA    0.0010       AA       0.0010        0
4146 2023-12-31   995    AA    0.0010       AA       0.0010        0
4147 2023-12-31   996     A    0.0020        A       0.0020        0
4148 2023-12-31   998     B    0.1000        B       0.1000        0
4149 2023-12-31   999   AAA    0.0002      AAA       0.0002        0

[4150 rows x 7 columns]
.table() 1
>>> (
...     data
...     .crm.frequency(index="GRADE")
...     .table(add_sum=True)
... )

  GRADE  GRADE_ABS  GRADE_PCT
0     A      503.0   0.121205
1    AA      368.0   0.088675
2   AAA      273.0   0.065783
3     B      591.0   0.142410
4    BB      715.0   0.172289
5   BBB      735.0   0.177108
6   CCC      473.0   0.113976
7     D      492.0   0.118554
8   Sum     4150.0   1.000000
.table() 2
>>> (
...     data
...     .loc[lambda df: df["DATE"].dt.year.isin([2022, 2023])]
...     .crm.frequency(index="GRADE", column="DATE", cohort="ID")
...     .table(sort_by_list=crm.cfg.GRADES)
... )

  GRADE  2022-12-31_ABS  2022-12-31_PCT  2023-12-31_ABS  2023-12-31_PCT
0   AAA              63        0.076736              69        0.084044
1    AA              79        0.096224              95        0.115713
2     A             109        0.132765             106        0.129111
3   BBB             126        0.153471             106        0.129111
4    BB             135        0.164434             132        0.160780
5     B             107        0.130329             108        0.131547
6   CCC              94        0.114495              99        0.120585
7     D             108        0.131547             106        0.129111