Lenders have a number of reasons to validate the ability of a forecasting or selection model to differentiate between classes of applicants or leads. A model might attempt to differentiate between creditworthy (“Good”) and non-creditworthy (“Bad”) customers, or between mail prospects with a high or low propensity to respond to an offer. Important reasons for validating models include:
- Compliance with regulatory requirements surrounding credit scoring
- Assurance that the model is performing satisfactorily at the time the model is developed as well as over time
- Assurance that individual model components remain functional and stable over time
The most common statistical measures used for validation of scoring models are the Kolmogorov-Smirnov (K-S) statistic and Divergence.
The K-S statistic is a value ranging from 0 to 100. A value of 0 indicates that there is no difference between two distributions, while a value of 100 indicates that there is no overlap (perfect separation) between them. A Divergence of 0 indicates no difference between the means of two distributions; larger positive values indicate a greater difference between the means.
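To make these definitions concrete, the following sketch computes both measures for two simulated score samples. The sample sizes, means, and standard deviations are purely illustrative, and the Divergence calculation uses one common formulation from the credit scoring literature: the squared difference of the Good and Bad mean scores divided by the average of the two variances.

```python
import numpy as np
from scipy import stats

def ks_statistic(good, bad):
    """Two-sample K-S statistic between Good and Bad score
    distributions, rescaled to the 0-100 range described above."""
    return 100 * stats.ks_2samp(good, bad).statistic

def divergence(good, bad):
    """One common formulation of Divergence: squared difference of
    means divided by the average of the two sample variances."""
    num = (np.mean(good) - np.mean(bad)) ** 2
    den = 0.5 * (np.var(good, ddof=1) + np.var(bad, ddof=1))
    return num / den

# Illustrative samples: Goods score higher than Bads on average.
rng = np.random.default_rng(0)
good = rng.normal(680, 50, 5000)
bad = rng.normal(620, 50, 2000)
print(f"K-S:        {ks_statistic(good, bad):.1f}")   # about 45
print(f"Divergence: {divergence(good, bad):.2f}")     # about 1.4
```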
Applicability
In comparing K-S with Divergence, it first should be noted that the two statistics for validating credit scoring models are not designed to measure the same differences between two populations. Divergence measures the differences between the location of two distributions (that is, the difference between the means of the distributions). K-S, on the other hand, measures differences in the location, scale and dispersion of two population distributions.
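This distinction matters in practice: two score distributions can have identical means, so that Divergence is near zero, yet differ sharply in spread, a difference K-S will still detect. A minimal illustration, using purely hypothetical distributions:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
# Same mean score, very different dispersion.
a = rng.normal(650, 20, 5000)
b = rng.normal(650, 80, 5000)

ks = 100 * stats.ks_2samp(a, b).statistic
div = (a.mean() - b.mean()) ** 2 / (0.5 * (a.var(ddof=1) + b.var(ddof=1)))
print(f"K-S:        {ks:.1f}")    # clearly nonzero (about 29)
print(f"Divergence: {div:.4f}")   # essentially zero
```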
In addition to what it measures, the usefulness of a statistical measure depends upon the conditions that must be satisfied for the measure to be theoretically valid.
From a business or decision-making point of view, it is important to know to what extent a given scoring model really separates “Good” customers from “Bad” customers. This knowledge is particularly important when comparing the effectiveness of one model against another, when determining how many models are required to predict the credit risk for a given portfolio, or when monitoring the change in predictive strength over time.
From a legal point of view, there may be a Regulation B concern. Regulation B states, in part, that: “…To qualify as an empirically derived, demonstrably and statistically sound credit scoring system, the system must be – (iii) developed and validated using accepted statistical principles and methodology, and (iv) periodically revalidated by the use of appropriate statistical principles and methodology and adjusted as necessary to maintain predictive ability.”
The K-S statistic is an accepted statistical method for validating credit scoring models as long as proper sampling techniques are employed. Divergence may not be an accepted method unless other necessary conditions are met.
Testing for Significance
Within the effects-test concept, the courts have defined statistical significance to mean that there is no more than a 1-in-20 chance (that is, the 0.05 level) that a result could have occurred by chance. Thus, when we say that a scoring model separates Good and Bad accounts at a statistically significant level, we must be able to demonstrate that the observed separation had no more than a 1-in-20 chance of occurring randomly.
The properties of the K-S statistic for validating credit scoring models are well known and have been thoroughly explored by statisticians in a large body of readily available literature. It is possible to establish a minimum K-S value that must be obtained in order to claim statistical significance for a given sample. A survey of available literature by the authors has not revealed a corresponding method for testing Divergence for statistical significance.
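As a sketch of how such a minimum K-S value might be established, the standard large-sample approximation for the two-sample K-S test gives a critical value of c(α)·sqrt((n+m)/(nm)), with c(0.05) ≈ 1.358. The function below simply rescales this to the 0-100 convention; the function name and sample sizes are illustrative.

```python
import math

def min_significant_ks(n_good, n_bad, c_alpha=1.358):
    """Approximate minimum two-sample K-S value (0-100 scale) needed
    to reject 'no separation' at the 5% level, using the standard
    large-sample critical value c(0.05) = 1.358."""
    return 100 * c_alpha * math.sqrt((n_good + n_bad) / (n_good * n_bad))

# For example, with 5,000 Goods and 1,000 Bads:
print(f"{min_significant_ks(5000, 1000):.1f}")  # about 4.7
```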
Use as a Decision Tool
Setting legal issues aside and returning to more practical matters, it is important for a decision maker to be able to measure the useful predictive strength of a scoring model. Creditors usually employ a scoring model by selecting a specific cutoff score and approving applicants who score above (or below) that cutoff. But the cutoff may be quite different from the score where the maximum separation between the two distributions occurs.
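One way to see this is to locate the score at which the Good and Bad empirical distributions are farthest apart and compare it with the business cutoff. The sketch below does exactly that, using hypothetical score data; the function name and the example cutoff are assumptions for illustration only.

```python
import numpy as np

def max_separation_score(good, bad):
    """Return the score where the gap between the Good and Bad
    empirical CDFs (the K-S distance) is largest, plus that gap
    on the 0-100 scale."""
    grid = np.sort(np.concatenate([good, bad]))
    cdf_g = np.searchsorted(np.sort(good), grid, side="right") / len(good)
    cdf_b = np.searchsorted(np.sort(bad), grid, side="right") / len(bad)
    gap = np.abs(cdf_b - cdf_g)
    i = np.argmax(gap)
    return grid[i], 100 * gap[i]

rng = np.random.default_rng(2)
good = rng.normal(680, 50, 5000)   # hypothetical Good scores
bad = rng.normal(620, 50, 2000)    # hypothetical Bad scores

score, ks = max_separation_score(good, bad)
print(f"Maximum separation ({ks:.1f}) occurs near score {score:.0f}")
# A lender's actual cutoff (say, 600 chosen to meet volume targets)
# may sit well away from this point of maximum separation.
```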