Credit risk application probability of default (PD)

Hello everybody – we do hope this article finds you well.

In the meantime, we have been keeping ourselves busy with the next business case, which we are more than happy to share with you: credit risk application PD, the probability that a loan applicant will default.

You are a financial or fintech institution in the business of granting loans. Distributing money means two important things for you:

  • Allocation of capital with a certain expected ROI;
  • Risk exposure – the probability that your customers do not repay the money granted. Sounds bad, right?

Well, in a perfect world we would eliminate the risk entirely and invest only in profitable applications. In the real world it’s all about balance: being able to discriminate well between paying customers (good) and the ones that will default (bad)… and this is our case for this series.

Let’s lay down the set-up so that we have a good foundation for our exercise: spelling out the business case and creating the workflow to solve it (see the picture below):

Remember the experience-driven steps for tackling this problem from our first article?

If not, no worries – we will outline them again for you (feel free to check our Data Science Series Part 1 article).

1) Understand (and bring to the surface) your pain-point and hence your business case:

Evaluate the through-the-door population as part of our credit-risk acquisition decision engine; find the pivot point where too many defaults break our profitability balance, and integrate this into our strategy.

2) Find the right data for this problem:

Data sources (and hence their availability) should reflect the business case at hand: socio-demographics, product features (maturity, loan amount/principal), external data sources (credit bureau information), and behavioral data (cash flows, days of delay, etc.).

3) Exploratory analysis of your portfolio:

Develop an understanding of the portfolio by exploring it visually across different slices. Discuss potential segmentation and set expectations for the later stages of the modelling exercise.

4) Use advanced predictive analytics to model your pain-point: 

Use Machine Learning techniques to develop a credit applicant’s profile based on historical data (features) that is explicitly related to the event of default/non-default (target). This profile should be: 1) rich in terms of drivers, so that the true multi-dimensional self of the applicant is captured; 2) stable, meaning it is developed and validated in such a manner that it will persist without significant alterations, at least in the near future.

5) Develop strategy and integrate it into the decision engine:

Based on the ML results, and adding a financial dimension (realized cash flows converted into profit/loss), develop a cut-off strategy. This is our profit-driven approach for deciding whether to grant (accept) or not grant (reject) a loan to an applicant.
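To make the idea of a profit-driven cut-off more tangible, here is a minimal sketch in Python. It is an illustration under simplifying assumptions, not the decision engine described in this article: the `Applicant` fields, the `margin` and `lgd` values, and the candidate cut-offs are all hypothetical, and the expected profit of a loan is approximated as (1 - PD) × margin - PD × LGD × principal.

```python
from dataclasses import dataclass


@dataclass
class Applicant:
    pd: float         # model-predicted probability of default
    principal: float  # loan amount requested
    margin: float     # hypothetical profit if the loan is fully repaid
    lgd: float = 1.0  # hypothetical share of principal lost on default


def expected_profit(a: Applicant) -> float:
    """Expected profit of granting one loan under the simplified formula."""
    return (1.0 - a.pd) * a.margin - a.pd * a.lgd * a.principal


def portfolio_profit(applicants, cutoff: float) -> float:
    """Expected profit when every applicant with PD <= cutoff is accepted."""
    return sum(expected_profit(a) for a in applicants if a.pd <= cutoff)


def best_cutoff(applicants, candidates):
    """Pick the candidate PD cut-off with the highest expected profit."""
    return max(candidates, key=lambda c: portfolio_profit(applicants, c))


# Made-up applicants, purely for illustration.
applicants = [
    Applicant(pd=0.02, principal=1000.0, margin=150.0),
    Applicant(pd=0.10, principal=1000.0, margin=150.0),
    Applicant(pd=0.30, principal=1000.0, margin=150.0),
]
print(best_cutoff(applicants, [0.05, 0.15, 0.35]))  # 0.15 with these numbers
```

With these made-up numbers the third applicant has a negative expected profit, so accepting up to a PD of 0.15 beats both the stricter and the looser candidate. In practice the financial inputs would come from realized cash flows rather than a fixed margin.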

6) Bottom-line improvement over time (see picture below for what-if scenario).

We always create a model in such a way that it reflects the live environment in which it will be applied. See below an example of this model applied to previously unseen data (out-of-time data: a sample from a time span not used for model training).

You can see the distribution of the decision (Accept/Reject) based on the selected cut-off, the point on your scoring scale where the good/bad ratio becomes unprofitable (higher PD), and what these customers cost you in terms of principal disbursed.

The rest of the benefits are outlined in the picture itself as a summary of our effort.
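The out-of-time sample mentioned above can be carved out with a simple date-based split. The sketch below is only an illustration: the record layout and the `application_date` field name are assumptions, and a real project would typically also keep a random hold-out within the development period.

```python
from datetime import date


def out_of_time_split(loans, cutoff_date):
    """Split loan records into a development sample and an out-of-time sample.

    Records dated before cutoff_date are used for model training; records
    on or after it are kept aside and only used for validation.
    """
    development = [l for l in loans if l["application_date"] < cutoff_date]
    out_of_time = [l for l in loans if l["application_date"] >= cutoff_date]
    return development, out_of_time


# Made-up records, purely for illustration.
loans = [
    {"id": 1, "application_date": date(2022, 3, 1)},
    {"id": 2, "application_date": date(2022, 9, 15)},
    {"id": 3, "application_date": date(2023, 1, 10)},
]
development, out_of_time = out_of_time_split(loans, date(2023, 1, 1))
print(len(development), len(out_of_time))
```

Because the out-of-time period lies entirely after the training period, it mimics how the model will meet new applicants once it is live.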

Note: selecting your cut-off PD is an exercise that gives you flexibility of choice: it reflects the Good/Bad odds, the profitability dimension and, last but not least, your own business view and future strategy.

Again, we cannot stress enough how important it is that the results, as well as the models, are monitored, fine-tuned and, every so often, redeveloped. The business environment is constantly changing, and so are we, so we need to evolve.

Well, developing PD models is nothing new, but this advanced analytics technique is not as widely adopted in business as it should be. Furthermore, you know us: we are always looking to go deeper, improve the business case and add a nice ML twist that boosts the overall solution. Here is what we have done this time:

1) We have applied our in-house approach for integrating multi-level features into the model in order to improve accuracy and enrich the customer profile (if you haven’t had the chance yet, now is a good time to check the article we wrote on that topic).

2) We have developed an in-house automatic approach for grouping (binning) numeric features based on eXtreme Gradient Boosting. This, along with accounting for the target imbalance, has improved the performance of the model (measured by Gini) by an incredible 40 percent. 🙂
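For readers less familiar with the Gini measure used above: in credit scoring it is usually computed as 2 × AUC - 1, where AUC is the probability that a randomly chosen defaulter receives a higher PD than a randomly chosen good customer. The snippet below is a minimal, dependency-free sketch of that definition. It does not reproduce the in-house binning approach, and the scores and labels are made up.

```python
def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison (ties count as 0.5).

    labels: 1 for default (bad), 0 for non-default (good).
    scores: predicted PD, higher means riskier.
    """
    bads = [s for s, y in zip(scores, labels) if y == 1]
    goods = [s for s, y in zip(scores, labels) if y == 0]
    if not bads or not goods:
        raise ValueError("need at least one good and one bad customer")
    wins = sum(
        1.0 if b > g else 0.5 if b == g else 0.0
        for b in bads
        for g in goods
    )
    return wins / (len(bads) * len(goods))


def gini(scores, labels):
    """Gini coefficient of a scoring model: 2 * AUC - 1."""
    return 2.0 * auc(scores, labels) - 1.0


# Made-up example: one defaulter is ranked below one good customer.
print(gini([0.9, 0.8, 0.3, 0.2], [1, 0, 1, 0]))  # 0.5
```

A Gini of 0 means the model ranks customers no better than chance, while 1 means every defaulter is scored as riskier than every good customer. The pairwise loop is quadratic, so for large portfolios a rank-based implementation would be the more practical choice.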

Now we do not want to bother you any further with technical details – maybe we can write a separate article on them. 

Yours honestly,


With this series, our aim is to increase data science coverage and to make data-driven decisions an integral part of more companies around the globe.

