Statistics

Overview

Table 1 - Measured Data
Metric | Total | Last 30 days | Last 7 days
Loans Lent | # ($) | # ($) | # ($)
Outstanding Loans | # ($) | # ($) | # ($)
Unpaid Loans | # ($) | # ($) | # ($)

Loans Lent is the total number of loans that have not been flagged as deleted by a moderator, together with the total dollar value of their principal.

Outstanding Loans is the number of loans whose principal repayment is less than their principal and that have not been flagged as deleted or unpaid, together with the total dollar value of their principal.

Unpaid Loans is the number of loans flagged as unpaid, together with the total dollar value of their principal.

Time since the first loan is the time elapsed since the first loan in the subreddit was created and picked up by the bot.
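As a rough illustration of how the Table 1 rows follow from these definitions, the sketch below filters a list of loan records. The `Loan` record and its field names are hypothetical, not the LoansBot schema. Two further choices are assumptions: the "Last 30 days" and "Last 7 days" columns are taken to filter on loan creation time, and deleted loans are also left out of the Unpaid Loans count.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Loan:
    # Hypothetical record layout for illustration; not the LoansBot schema.
    borrower: str
    principal: float            # dollars lent
    principal_repayment: float  # dollars of principal repaid so far
    created_at: datetime
    deleted: bool = False       # flagged as deleted by a moderator
    unpaid: bool = False        # flagged as unpaid


def summarize(loans):
    """Return (count, total principal in dollars) for a list of loans."""
    return len(loans), sum(loan.principal for loan in loans)


def measured_data(loans, now, days=None):
    """Compute the Table 1 rows, optionally only for loans created in the last `days` days."""
    if days is not None:
        cutoff = now - timedelta(days=days)
        loans = [loan for loan in loans if loan.created_at >= cutoff]
    lent = [loan for loan in loans if not loan.deleted]
    outstanding = [
        loan for loan in lent
        if not loan.unpaid and loan.principal_repayment < loan.principal
    ]
    # Assumption: deleted loans are also excluded from the unpaid count.
    unpaid = [loan for loan in lent if loan.unpaid]
    return {
        "Loans Lent": summarize(lent),
        "Outstanding Loans": summarize(outstanding),
        "Unpaid Loans": summarize(unpaid),
    }


def time_since_first_loan(loans, now):
    """Time elapsed since the earliest recorded loan."""
    return now - min(loan.created_at for loan in loans)
```

Calling `measured_data` once with `days=None`, once with `days=30`, and once with `days=7` yields the three columns of the table.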


Table 2 - Derived Data
Metric | Confidence | Value
Loan Default Rate | % | % - %
User Default Rate | % | % - %
Recurring User Rate | % | % - %
Recurring User Default Rate | % | % - %
Projected Loan Value | % | $ - $

Loan Default Rate

Measures the likelihood that the next loan will default.

Calculations

Known

Let \(L\) be the number of loans total

Let \(D\) be the number of defaulted loans

Let \(P\) be the number of outstanding (pending) loans

Let \(c\) be the confidence that the true default rate lies within \(i\); \(c = 0.95\)

Assume

The loans so far are a representative sample of future loans, and the sample size is large enough for the Fisher information to be a good estimate of \(-l''\), the negative second derivative of the log-likelihood.

Want

Let \(i\) be the Wald interval of the default rate with confidence \(c\). \(i = (i_0, i_1)\)

Let \(r = \frac{i_1-i_0}{2}\)

Math | Description
\(L^* = L - P\) | Outstanding loans do not provide information on default rates
\(r = 1.96 \frac{1}{\sqrt{I(\hat{\pi})}}\) | Definition of standard error, where \(I(\hat{\pi})\) is the Fisher information and \(1.96\) is the two-sided standard normal quantile for \(c = 0.95\)
\(r = 1.96 \frac{1}{\sqrt{\frac{L^*}{\hat{\pi}(1-\hat{\pi})}}}\) | Substitute the binomial Fisher information \(I(p; n) = \frac{n}{p(1 - p)}\) with \(n = L^*\)
\(r = 1.96 \sqrt{\frac{\frac{D}{L^*}(1 - \frac{D}{L^*})}{L^*}}\) | Algebra, substitute \(\hat{\pi} = \frac{D}{L^*}\)
\(i = (\hat{\pi} - r, \hat{\pi} + r)\) | Definition
\(i = \left(\frac{D}{L^*} - 1.96 \sqrt{\frac{\frac{D}{L^*}(1 - \frac{D}{L^*})}{L^*}}, \frac{D}{L^*} + 1.96 \sqrt{\frac{\frac{D}{L^*}(1 - \frac{D}{L^*})}{L^*}}\right)\) | Substitution
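The final row can be evaluated directly from the counts \(L\), \(D\), and \(P\). The sketch below is a straightforward transcription of that Wald interval; the function name `loan_default_interval` and the example counts are made up for illustration.

```python
import math

Z_95 = 1.96  # two-sided standard normal quantile for c = 0.95


def loan_default_interval(L, D, P, z=Z_95):
    """Wald interval for the loan default rate.

    L: total loans, D: defaulted loans, P: outstanding (pending) loans.
    """
    n = L - P  # L*: outstanding loans carry no information about defaults
    if n <= 0:
        raise ValueError("need at least one loan that is not outstanding")
    p_hat = D / n  # maximum-likelihood estimate of the default rate
    r = z * math.sqrt(p_hat * (1 - p_hat) / n)  # half-width: z standard errors
    return p_hat - r, p_hat + r


# Hypothetical counts: 1100 loans, 50 defaulted, 100 still outstanding.
print(loan_default_interval(1100, 50, 100))  # roughly (0.0365, 0.0635)
```

With small counts the Wald interval can extend below 0 or above 1; the sketch reports it unclipped, exactly as the formula does.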

User Default Rate

Measures the likelihood that the next new user will default.

Calculations

Known

Let \(U_B\) be the number of users with at least one loan as borrower

Let \(U_D\) be the number of users with at least one loan defaulted as borrower

Let \(c\) be the confidence that the true user default rate lies within \(i\); \(c = 0.95\)

Assume

The users so far are a representative sample of future users, and the sample is large enough for Fisher information to be a good estimate of \(-l''\).

Want

Let \(i\) be the Wald interval that the true user default rate lies within with confidence \(c\). \(i = (i_0, i_1)\)

Let \(r = \frac{i_1 - i_0}{2}\)

Math | Description
\(r = 1.96 \frac{1}{\sqrt{I(\hat{\pi})}}\) | Definition of standard error, where \(I(\hat{\pi})\) is the Fisher information
\(r = 1.96 \frac{1}{\sqrt{\frac{U_B}{\hat{\pi}(1 - \hat{\pi})}}}\) | Substitute \(I(p; n) = \frac{n}{p(1 - p)}\) for binomial distributions, where \(n = U_B\)
\(r = 1.96 \sqrt{\frac{\hat{\pi}(1 - \hat{\pi})}{U_B}}\) | Algebra
\(r = 1.96 \sqrt{\frac{\frac{U_D}{U_B}(1 - \frac{U_D}{U_B})}{U_B}}\) | Substitute the maximum-likelihood estimate \(\hat{\pi} = \frac{U_D}{U_B}\)
\(i = \left(\frac{U_D}{U_B} - 1.96 \sqrt{\frac{\frac{U_D}{U_B}(1 - \frac{U_D}{U_B})}{U_B}}, \frac{U_D}{U_B} + 1.96 \sqrt{\frac{\frac{U_D}{U_B}(1 - \frac{U_D}{U_B})}{U_B}}\right)\) | Definition of \(i\) substituting \(\hat{\pi} = \frac{U_D}{U_B}\) and \(r\)
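The only new ingredient compared with the loan default rate is counting users rather than loans. In the sketch below, each borrower appears once in \(U_B\), and once in \(U_D\) if any of their loans is flagged unpaid. The `Loan` tuple and its fields are hypothetical. Ignoring deleted loans is an assumption carried over from the Loans Lent definition.

```python
import math
from collections import namedtuple

# Hypothetical minimal loan record; field names are illustrative only.
Loan = namedtuple("Loan", ["borrower", "unpaid", "deleted"])


def user_default_interval(loans, z=1.96):
    """Return ((i0, i1), U_B, U_D) for the user default rate."""
    # Assumption: loans flagged as deleted are ignored entirely.
    live = [loan for loan in loans if not loan.deleted]
    borrowers = {loan.borrower for loan in live}                   # users counted in U_B
    defaulters = {loan.borrower for loan in live if loan.unpaid}   # users counted in U_D
    u_b, u_d = len(borrowers), len(defaulters)
    p_hat = u_d / u_b
    r = z * math.sqrt(p_hat * (1 - p_hat) / u_b)
    return (p_hat - r, p_hat + r), u_b, u_d
```

Using sets means a borrower with both a repaid loan and a defaulted loan is counted once in each total, which matches the "at least one loan" wording of the definitions.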

Recurring User Rate

Measures the likelihood that the next loan will be made by a user who has already completed a loan. A loan is considered completed if it is not deleted, its principal repayment equals its principal, and it is not marked as unpaid.

Calculations

Known

Let \(L_{RB}\) be the number of loans with a repeat borrower

Let \(L\) be the total number of loans

Let \(P\) be the number of outstanding loans

Let \(c\) be the confidence that the true recurring user rate lies within \(i\); \(c = 0.95\)

Assume

Loans and users so far are a representative sample of future loans and users, and the sample size is large enough for Fisher information to be a good estimate of \(-l''\).

Want

Let \(i\) be the interval the true recurring user rate lies within with \(c\) confidence; \(i = (i_0, i_1)\)

Let \(r = \frac{i_1 - i_0}{2}\)

Math | Description
\(r = 1.96 \frac{1}{\sqrt{I(\hat{\pi})}}\) | Definition of standard error, where \(I(\hat{\pi})\) is the Fisher information
\(r = 1.96 \frac{1}{\sqrt{\frac{L - P}{\frac{L_{RB}}{L - P}\left(1 - \frac{L_{RB}}{L - P}\right)}}}\) | Substitute \(I(p; n) = \frac{n}{p(1-p)}\) for binomial distributions, where \(n = L - P\) and \(p = \frac{L_{RB}}{L - P}\)
\(r = 1.96 \sqrt{\frac{\frac{L_{RB}}{L - P}\left(1 - \frac{L_{RB}}{L - P}\right)}{L - P}}\) | Algebra
\(i = \left(\frac{L_{RB}}{L - P} - 1.96 \sqrt{\frac{\frac{L_{RB}}{L - P}\left(1 - \frac{L_{RB}}{L - P}\right)}{L - P}}, \frac{L_{RB}}{L - P} + 1.96 \sqrt{\frac{\frac{L_{RB}}{L - P}\left(1 - \frac{L_{RB}}{L - P}\right)}{L - P}}\right)\) | Definition of \(i\) and \(r\)
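Counting \(L_{RB}\) requires deciding when a borrower becomes a "repeat borrower". The sketch below makes that decision under stated assumptions. A loan counts toward \(L_{RB}\) if the same borrower has an earlier loan, by creation order, that is now completed. It is counted only among non-outstanding loans, so the ratio to \(L - P\) stays between 0 and 1. The record type and helper names are hypothetical, and deleted loans are assumed to be filtered out beforehand.

```python
import math
from dataclasses import dataclass


@dataclass
class Loan:
    # Hypothetical record; field names are illustrative only. Deleted loans
    # are assumed to have been filtered out already.
    borrower: str
    created: int                # creation order, e.g. a Unix timestamp
    principal: float
    principal_repayment: float
    unpaid: bool = False


def is_completed(loan):
    # Completed: principal repayment equals principal and not marked unpaid.
    return loan.principal_repayment == loan.principal and not loan.unpaid


def is_outstanding(loan):
    return loan.principal_repayment < loan.principal and not loan.unpaid


def recurring_user_interval(loans, z=1.96):
    """Return ((i0, i1), L_RB, L - P) for the recurring user rate."""
    has_completed = set()  # borrowers with an earlier completed loan
    l_rb = 0
    for loan in sorted(loans, key=lambda x: x.created):
        # Assumption: L_RB is counted among non-outstanding loans only.
        if not is_outstanding(loan) and loan.borrower in has_completed:
            l_rb += 1
        if is_completed(loan):
            has_completed.add(loan.borrower)
    n = sum(1 for loan in loans if not is_outstanding(loan))  # L - P
    p_hat = l_rb / n
    r = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return (p_hat - r, p_hat + r), l_rb, n
```

The sketch uses each earlier loan's current status, so it does not check whether that loan was already completed at the moment the later loan was created. A stricter version would need repayment timestamps.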

Recurring User Default Rate

Measures the likelihood that the next loan by a recurring user will default.

Calculations

Known

Let \(L_{RB}\) be the number of completed loans (principal repayment equal to principal, or marked unpaid) with a repeat borrower

Let \(L_{D_{RB}}\) be the number of loans defaulted on by a repeat borrower

Let \(c\) be the confidence that the true recurring user default rate is within \(i\); \(c = 0.95\)

Assume

Loans by recurring users are representative of future loans by recurring users, and the sample size is large enough for Fisher information to be a good estimate of \(-l''\).

Want

Let \(i\) be the interval the true recurring user default rate lies within with \(c\) confidence; \(i = (i_0, i_1)\)

Let \(r = \frac{i_1 - i_0}{2}\)

Math | Description
\(r = 1.96 \frac{1}{\sqrt{I(\hat{\pi})}}\) | Definition of standard error, where \(I(\hat{\pi})\) is the Fisher information
\(r = 1.96 \frac{1}{\sqrt{\frac{L_{RB}}{\frac{L_{D_{RB}}}{L_{RB}}(1 - \frac{L_{D_{RB}}}{L_{RB}})}}}\) | Substitute \(I(p; n) = \frac{n}{p(1-p)}\), \(p = \frac{L_{D_{RB}}}{L_{RB}}, n = L_{RB}\)
\(r = 1.96 \sqrt{\frac{\frac{L_{D_{RB}}}{L_{RB}}(1 - \frac{L_{D_{RB}}}{L_{RB}})}{L_{RB}}}\) | Algebra
\(i = \left(\frac{L_{D_{RB}}}{L_{RB}} - 1.96 \sqrt{\frac{\frac{L_{D_{RB}}}{L_{RB}}(1 - \frac{L_{D_{RB}}}{L_{RB}})}{L_{RB}}}, \frac{L_{D_{RB}}}{L_{RB}} + 1.96 \sqrt{\frac{\frac{L_{D_{RB}}}{L_{RB}}(1 - \frac{L_{D_{RB}}}{L_{RB}})}{L_{RB}}}\right)\) | Definition of \(i\) and \(r\)
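Here the denominator is \(L_{RB}\), the completed loans (repaid in full or marked unpaid) taken out by repeat borrowers. The sketch below counts \(L_{RB}\) and \(L_{D_{RB}}\) and then applies the interval. It treats a borrower as a repeat borrower once an earlier loan of theirs, by creation order, has been repaid in full; that definition, the record type, and the field names are assumptions for illustration.

```python
import math
from dataclasses import dataclass


@dataclass
class Loan:
    # Hypothetical record; field names are illustrative only. Deleted loans
    # are assumed to have been filtered out already.
    borrower: str
    created: int                # creation order, e.g. a Unix timestamp
    principal: float
    principal_repayment: float
    unpaid: bool = False


def recurring_default_interval(loans, z=1.96):
    """Return ((i0, i1), L_D_RB, L_RB) for the recurring user default rate."""
    repaid_before = set()  # borrowers with an earlier fully repaid loan
    l_rb = 0    # completed (repaid or unpaid) loans by repeat borrowers
    l_d_rb = 0  # of those, loans marked unpaid
    for loan in sorted(loans, key=lambda x: x.created):
        repaid = loan.principal_repayment == loan.principal and not loan.unpaid
        if loan.borrower in repaid_before and (repaid or loan.unpaid):
            l_rb += 1
            if loan.unpaid:
                l_d_rb += 1
        if repaid:
            repaid_before.add(loan.borrower)
    p_hat = l_d_rb / l_rb
    r = z * math.sqrt(p_hat * (1 - p_hat) / l_rb)
    return (p_hat - r, p_hat + r), l_d_rb, l_rb
```

On a handful of loans this interval easily dips below 0, which is why the derivation assumes a sample large enough for the normal approximation to hold.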

Projected Loan Value

Measures the projected value of the next loan in the subreddit or, alternatively, the projected average value of all future loans. Because this interval is wide even at a low confidence level (68%), loan principals are probably not normally distributed, which warrants further investigation.

Calculations

Known

Let \(L\) be the number of loans

Let \(m_i\) be the principal of loan \(i\)

Let \(c\) be the confidence that the true average principal lies within \(i\); \(c = 0.68\)

Assume

Current loans are representative of future loans

Want

Let \(i\) be the interval the true average principal lies within with confidence \(c\); \(i = (i_0, i_1)\)

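Math | Description
\(\bar{m} = \sum\limits_{i=1}^{L} \frac{m_i}{L}\) | Definition of mean
\(S^2 = \sum\limits_{i=1}^{L} \frac{(m_i - \bar{m})^2}{L - 1}\) | Sample variance, dividing by \(L - 1\) rather than \(L\)
\(i = (\bar{m} - 1 \sqrt{S^2}, \bar{m} + 1 \sqrt{S^2})\) | One sample standard deviation \(S = \sqrt{S^2}\) either side of the mean (\(z \approx 1\) for \(c = 0.68\)); an interval for the average principal itself would use the standard error \(S/\sqrt{L}\)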
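Python's standard `statistics` module computes both quantities in this table. `statistics.stdev` uses the \(L - 1\) denominator, so it returns \(\sqrt{S^2}\). The function names below are hypothetical. `standard_error` is included only to show the narrower interval that would describe the average principal rather than an individual loan.

```python
import math
import statistics


def projected_loan_value(principals, z=1.0):
    """Return (i0, i1) = mean -/+ z * S, with S the sample standard deviation."""
    m_bar = statistics.mean(principals)
    s = statistics.stdev(principals)  # sqrt(S^2), dividing by L - 1
    return m_bar - z * s, m_bar + z * s


def standard_error(principals):
    """Standard error of the mean, S / sqrt(L), for comparison."""
    return statistics.stdev(principals) / math.sqrt(len(principals))


print(projected_loan_value([100, 200, 300]))  # (100.0, 300.0)
```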

Loan Quantity vs Principal

As Table 2 shows, the projected loan value did not settle into a reasonable interval. A histogram of loan count versus principal may help explain why.

Determining Bin Size

There are several methods for choosing the number of bins. Doane's formula is a reasonable choice here, because it adds a correction for skewness and a non-normal distribution is suspected.

Known

Let \(L\) be the number of loans

Let \(k\) be the number of bins

Let \(m_i\) be the principal of loan \(i\)

Let \(m_{\text{min}}\) be the minimum loan principal, taken to be 0

Let \(m_{\text{max}}\) be the maximum loan principal, taken from the data

Assume

Doane's Formula: \(k = 1 + \log_2(n) + \log_2(1 + \frac{\left|g_1\right|}{\sigma_{g_1}})\) where

  • k - number of bins
  • n - number of data points (\(n \equiv L\))
  • \(g_1\) - sample estimate of the third-moment skewness \(\gamma_1\)
  • \(\sigma_{g_1} = \sqrt{\frac{6(n-2)}{(n+1)(n+3)}}\)

Sample Skewness Estimation: \(\gamma_1 \approx g_1 = \frac{\frac{1}{n} \sum\limits_{i=1}^{n}(x_i - \bar{x})^3}{\left(\frac{1}{n-1}\sum\limits_{i=1}^{n}(x_i-\bar{x})^2\right)^{\frac{3}{2}}}\)

  • \(\bar{x}\) - sample mean (\(\bar{x} \equiv \bar{m}\))
  • \(x_i\) - sample value \(i\) (\(x_i \equiv m_i\))

Want

Let \(h\) be the bin width

Math | Description
\(\bar{m} = \sum\limits_{i=1}^{L} \frac{m_i}{L}\) | Definition of mean
\(g_1 = \frac{\frac{1}{L}\sum\limits_{i=1}^{L}(m_i - \bar{m})^3}{\left(\frac{1}{L-1}\sum\limits_{i=1}^L(m_i-\bar{m})^{2}\right)^{\frac{3}{2}}}\) | Sample skewness estimation
\(\sigma_{g_1} = \sqrt{\frac{6(L-2)}{(L+1)(L+3)}}\) | Definition
\(k = 1 + \log_2(L) + \log_2(1 + \frac{\left|g_1\right|}{\sigma_{g_1}})\) | Doane's formula
\(k = \text{ceil}\left(\frac{m_{\text{max}} - m_{\text{min}}}{h}\right) \approx \frac{m_{\text{max}} - m_{\text{min}}}{h}\) | Relationship between the number of bins and the bin width over the interval \((m_{\text{min}}, m_{\text{max}})\)
\(h = \frac{m_{\text{max}} - m_{\text{min}}}{k}\) | Algebra
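The steps in this table translate line by line into the sketch below. It uses \(m_{\text{min}} = 0\) as stated above. Rounding \(k\) up to a whole number before dividing is an assumption, chosen so that \(\text{ceil}\left(\frac{m_{\text{max}} - m_{\text{min}}}{h}\right)\) gives back \(k\). The function name `doane_bins` is hypothetical.

```python
import math


def doane_bins(principals, m_min=0.0):
    """Return (k, h): Doane's bin count and the matching bin width."""
    L = len(principals)
    if L < 3:
        raise ValueError("Doane's formula needs at least 3 loans")
    m_bar = sum(principals) / L
    dev = [m - m_bar for m in principals]
    s2 = sum(d ** 2 for d in dev) / (L - 1)
    if s2 == 0:
        raise ValueError("all principals are equal; skewness is undefined")
    g1 = (sum(d ** 3 for d in dev) / L) / s2 ** 1.5  # sample skewness
    sigma_g1 = math.sqrt(6 * (L - 2) / ((L + 1) * (L + 3)))
    k_raw = 1 + math.log2(L) + math.log2(1 + abs(g1) / sigma_g1)
    k = math.ceil(k_raw)  # assumption: round up to a whole number of bins
    h = (max(principals) - m_min) / k
    return k, h
```

For a symmetric sample such as `[10, 20, 30, 40]` the skewness term vanishes and \(k = 1 + \log_2 4 = 3\). A right-skewed sample gets extra bins from the \(\log_2(1 + |g_1|/\sigma_{g_1})\) term.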

This graph shows how many loans were created at each dollar amount. Loans over $1050 are excluded, since they stretch the graph beyond what is useful. The graph shows what the average loan looks like and how loans are distributed: the distribution is clearly right-skewed, with most loans at the low end. This confirms that the projected loan value did not work because the distribution of loans is not normal.
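The counts behind such a histogram can be produced from the principals and a bin width \(h\), for example the one given by Doane's formula. The sketch below applies the $1050 cutoff mentioned above. The bin convention is an assumption: half-open bins starting at \(m_{\text{min}} = 0\), with the largest value placed in the last bin. The function name is hypothetical.

```python
import math


def histogram(principals, h, cap=1050.0, m_min=0.0):
    """Return loan counts per bin of width h, ignoring loans above `cap`."""
    kept = [m for m in principals if m <= cap]  # loans over the cap are excluded
    if not kept:
        return []
    n_bins = max(1, math.ceil((max(kept) - m_min) / h))
    counts = [0] * n_bins
    for m in kept:
        # Half-open bins [m_min + b*h, m_min + (b+1)*h); the maximum goes in the last bin.
        b = min(int((m - m_min) // h), n_bins - 1)
        counts[b] += 1
    return counts


print(histogram([5, 15, 25, 25, 2000], 10))  # [1, 1, 2]
```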

Loans Fulfilled Over Time

This graph shows the number of loans fulfilled over time and is another measure of the subreddit's health. A loan counts as fulfilled as soon as it is marked in the LoansBot database. If the number of loans fulfilled over time is increasing, more users are joining the subreddit, existing users are borrowing more often, or both.
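Below is one possible way to build the series behind this graph: bucket the dates on which loans were marked in the database by calendar month and keep a running total. Monthly buckets and the function name are assumptions for illustration. The source does not say what time resolution the graph uses.

```python
from collections import Counter
from datetime import date
from itertools import accumulate


def loans_per_month(fulfilled_dates):
    """Return (months, monthly counts, cumulative counts), sorted by month.

    Months with no fulfilled loans are omitted.
    """
    counts = Counter((d.year, d.month) for d in fulfilled_dates)
    months = sorted(counts)
    monthly = [counts[m] for m in months]
    return months, monthly, list(accumulate(monthly))


months, monthly, cumulative = loans_per_month(
    [date(2024, 1, 5), date(2024, 1, 20), date(2024, 3, 1)]
)
print(months, monthly, cumulative)  # [(2024, 1), (2024, 3)] [2, 1] [2, 3]
```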