Table 1 - Measured Data

Metric | Total | Last 30 days | Last 7 days |
---|---|---|---|
Loans Lent | # ($) | # ($) | # ($) |
Outstanding Loans | # ($) | # ($) | # ($) |
Unpaid Loans | # ($) | # ($) | # ($) |

Loans Lent is the number of loans that have not been flagged as deleted by a moderator, together with the total dollar value of their principal.

Outstanding Loans is the number of loans whose principal repayment is less than their principal and that have not been flagged as deleted or unpaid, together with the total dollar value of their principal.

Unpaid Loans is the number of loans flagged as unpaid, together with the total dollar value of their principal.

Time since the first loan is the interval of time since the first loan was created in the subreddit and was picked up by the bot.
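The three loan counts defined for Table 1 can be sketched in a few lines. This is only an illustration: the `Loan` record and its field names (`principal_cents`, `principal_repayment_cents`, `deleted`, `unpaid`) are hypothetical and are not taken from the LoansBot database schema.

```python
from dataclasses import dataclass


@dataclass
class Loan:
    # Hypothetical record; field names are assumptions for illustration only.
    principal_cents: int
    principal_repayment_cents: int
    deleted: bool = False
    unpaid: bool = False


def count_and_value(group):
    """Return (number of loans, total principal in cents) for a group."""
    return len(group), sum(loan.principal_cents for loan in group)


def measured_data(loans):
    """Compute the three Table 1 metrics from a list of Loan records."""
    # Loans Lent: everything not flagged as deleted.
    lent = [l for l in loans if not l.deleted]
    # Outstanding: partially repaid, and neither deleted nor unpaid.
    outstanding = [
        l for l in loans
        if not l.deleted
        and not l.unpaid
        and l.principal_repayment_cents < l.principal_cents
    ]
    # Unpaid: flagged as unpaid (the text names no other condition).
    unpaid = [l for l in loans if l.unpaid]
    return {
        "lent": count_and_value(lent),
        "outstanding": count_and_value(outstanding),
        "unpaid": count_and_value(unpaid),
    }
```

The "Last 30 days" and "Last 7 days" columns would apply the same function to loans filtered by creation time, which this sketch leaves out.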

Table 2 - Derived Data

Metric | Confidence | Value |
---|---|---|
Loan Default Rate | % | % - % |
User Default Rate | % | % - % |
Recurring User Rate | % | % - % |
Recurring User Default Rate | % | % - % |
Projected Loan Value | % | $ - $ |

Loan Default Rate measures the likelihood that the next loan will default.

Let \(L\) be the total number of loans

Let \(D\) be the number of defaulted loans

Let \(P\) be the number of outstanding (pending) loans

Let \(c\) be the confidence that the true default rate lies within \(i\); \(c = 0.95\)

Assumption: the loans so far are a representative sample of future loans, and the sample is large enough for the Fisher information to be a good estimate of \(-l''\) (the negated second derivative of the log-likelihood).

Let \(i\) be the Wald interval of the default rate with confidence \(c\). \(i = (i_0, i_1)\)

Let \(r = \frac{i_1-i_0}{2}\)

Math | Description |
---|---|
\(L^* = L - P\) | Outstanding loans do not provide information on default rates |
\(r = 1.96 \frac{1}{\sqrt{I(\hat{\pi})}}\) | 1.96 standard errors for \(c = 0.95\); the standard error is \(\frac{1}{\sqrt{I(\hat{\pi})}}\), where \(I(\hat{\pi})\) is the Fisher information |
\(r = 1.96 \frac{1}{\sqrt{\frac{L^*}{\hat{\pi}(1-\hat{\pi})}}}\) | Substitute Fisher information \(I(p; n) = \frac{n}{p(1 - p)}\) for binomial distributions, where \(n = L^*\) |
\(r = 1.96 \sqrt{\frac{\frac{D}{L^*}(1 - \frac{D}{L^*})}{L^*}}\) | Algebra, substitute \(\hat{\pi} = \frac{D}{L^*}\) |
\(i = (\hat{\pi} - r, \hat{\pi} + r)\) | Definition |
\(i = \left(\frac{D}{L^*} - 1.96 \sqrt{\frac{\frac{D}{L^*}(1 - \frac{D}{L^*})}{L^*}}, \frac{D}{L^*} + 1.96 \sqrt{\frac{\frac{D}{L^*}(1 - \frac{D}{L^*})}{L^*}}\right)\) | Substitution |
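The derivation above translates almost directly into code. The following is a minimal sketch, not the bot's actual implementation; the function names `wald_interval` and `loan_default_interval` are made up for illustration.

```python
import math


def wald_interval(successes, trials, z=1.96):
    """Wald interval for a binomial proportion.

    z = 1.96 corresponds to c = 0.95. The result is not clamped to
    [0, 1], so small samples can give a negative lower bound.
    """
    if trials <= 0:
        raise ValueError("need at least one trial")
    p_hat = successes / trials
    r = z * math.sqrt(p_hat * (1 - p_hat) / trials)
    return p_hat - r, p_hat + r


def loan_default_interval(total_loans, defaulted, outstanding):
    """Loan default rate interval using L* = L - P as the sample size."""
    finished = total_loans - outstanding  # L* in the derivation
    return wald_interval(defaulted, finished)
```

For example, with hypothetical counts `L = 1100`, `D = 50`, and `P = 100`, the sample size is `L* = 1000` and the point estimate is 5%.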

User Default Rate measures the likelihood that the next new user will default.

Let \(U_B\) be the number of users with at least one loan as borrower

Let \(U_D\) be the number of users with at least one defaulted loan as borrower

Let \(c\) be the confidence that the true user default rate lies within \(i\); \(c = 0.95\)

Assumption: the users so far are a representative sample of future users, and the sample is large enough for the Fisher information to be a good estimate of \(-l''\).

Let \(i\) be the Wald interval that the true user default rate lies within with confidence \(c\). \(i = (i_0, i_1)\)

Let \(r = \frac{i_1 - i_0}{2}\)

Math | Description |
---|---|
\(r = 1.96 \frac{1}{\sqrt{I(\hat{\pi})}}\) | 1.96 standard errors for \(c = 0.95\); the standard error is \(\frac{1}{\sqrt{I(\hat{\pi})}}\), where \(I(\hat{\pi})\) is the Fisher information |
\(r = 1.96 \frac{1}{\sqrt{\frac{U_B}{\hat{\pi}(1 - \hat{\pi})}}}\) | Substitute \(I(p; n) = \frac{n}{p(1 - p)}\) for binomial distributions, where \(n = U_B\) |
\(r = 1.96 \sqrt{\frac{\hat{\pi}(1 - \hat{\pi})}{U_B}}\) | Algebra |
\(r = 1.96 \sqrt{\frac{\frac{U_D}{U_B}(1 - \frac{U_D}{U_B})}{U_B}}\) | Substitute \(\hat{\pi} = \frac{U_D}{U_B}\) for binomial distributions |
\(i = \left(\frac{U_D}{U_B} - 1.96 \sqrt{\frac{\frac{U_D}{U_B}(1 - \frac{U_D}{U_B})}{U_B}}, \frac{U_D}{U_B} + 1.96 \sqrt{\frac{\frac{U_D}{U_B}(1 - \frac{U_D}{U_B})}{U_B}}\right)\) | Definition of \(i\), substituting \(\hat{\pi} = \frac{U_D}{U_B}\) and \(r\) |
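The user-level counts differ from the loan-level ones because each borrower is counted once, however many loans they took. A rough sketch of deriving \(U_B\) and \(U_D\) from loan records follows; the `(borrower, defaulted)` pair layout is an assumption made for this example, not the bot's data model.

```python
import math


def user_default_interval(loans, z=1.96):
    """Return (U_B, U_D, Wald interval) from (borrower, defaulted) pairs."""
    borrowers = set()   # users with at least one loan as borrower
    defaulters = set()  # users with at least one defaulted loan
    for borrower, defaulted in loans:
        borrowers.add(borrower)
        if defaulted:
            defaulters.add(borrower)

    u_b, u_d = len(borrowers), len(defaulters)
    if u_b == 0:
        raise ValueError("no borrowers in the sample")

    p_hat = u_d / u_b
    r = z * math.sqrt(p_hat * (1 - p_hat) / u_b)
    return u_b, u_d, (p_hat - r, p_hat + r)
```

With only a handful of users the interval is very wide and can extend below zero, which is the situation the large-sample assumption is meant to rule out.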

Recurring User Rate measures the likelihood that the next loan will be taken out by a user who has already *completed* a loan. A loan is considered completed if it is not deleted, its principal repayment equals its principal, and it is not marked as unpaid.

Let \(L_{RB}\) be the number of loans with a repeated borrower

Let \(L\) be the total number of loans

Let \(P\) be the number of outstanding loans

Let \(c\) be the confidence that the true recurring user rate lies within \(i\); \(c = 0.95\)

Assumption: loans and users so far are a representative sample of future loans and users, and the sample size is large enough for the Fisher information to be a good estimate of \(-l''\).

Let \(i\) be the interval the true recurring user rate lies within with \(c\) confidence; \(i = (i_0, i_1)\)

Let \(r = \frac{i_1 - i_0}{2}\)

Math | Description |
---|---|
\(r = 1.96 \frac{1}{\sqrt{I(\hat{\pi})}}\) | 1.96 standard errors for \(c = 0.95\); the standard error is \(\frac{1}{\sqrt{I(\hat{\pi})}}\), where \(I(\hat{\pi})\) is the Fisher information |
\(r = 1.96 \frac{1}{\sqrt{\frac{L - P}{\frac{L_{RB}}{L - P}\left(1 - \frac{L_{RB}}{L - P}\right)}}}\) | Substitute \(I(p; n) = \frac{n}{p(1-p)}\) for binomial distributions, where \(n = L - P, p = \frac{L_{RB}}{L - P}\) |
\(r = 1.96 \sqrt{\frac{\frac{L_{RB}}{L - P}\left(1 - \frac{L_{RB}}{L - P}\right)}{L - P}}\) | Algebra |
\(i = \left(\frac{L_{RB}}{L - P} - 1.96 \sqrt{\frac{\frac{L_{RB}}{L - P}\left(1 - \frac{L_{RB}}{L - P}\right)}{L - P}}, \frac{L_{RB}}{L - P} + 1.96 \sqrt{\frac{\frac{L_{RB}}{L - P}\left(1 - \frac{L_{RB}}{L - P}\right)}{L - P}}\right)\) | Definition of \(i\) and \(r\) |
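Counting \(L_{RB}\) is the only non-obvious input to this interval. One possible reading, sketched below, walks the loans in creation order and counts a non-outstanding loan as "repeated" when its borrower already has an earlier completed loan. Both that reading and the `(borrower, status)` layout with statuses `"completed"`, `"unpaid"`, and `"outstanding"` are assumptions for illustration; the bot may count differently.

```python
def count_recurring(loans):
    """Return (L_RB, L - P) from (borrower, status) pairs in creation order.

    Deleted loans are assumed to have been filtered out already.
    """
    completed_before = set()  # borrowers with an earlier completed loan
    finished = 0              # L - P: loans that are no longer outstanding
    repeated = 0              # L_RB: finished loans by a returning borrower

    for borrower, status in loans:
        if status != "outstanding":
            finished += 1
            if borrower in completed_before:
                repeated += 1
        if status == "completed":
            completed_before.add(borrower)

    return repeated, finished
```

The point estimate is then `repeated / finished`, and the interval follows from the same Wald formula used for the other rates.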

Recurring User Default Rate measures the likelihood that the next loan by a recurring user will default.

Let \(L_{RB}\) be the number of completed loans (principal repayment equal to principal, or marked unpaid) with a repeated borrower

Let \(L_{D_{RB}}\) be the number of loans defaulted on by a repeated borrower

Let \(c\) be the confidence that the true recurring user default rate is within \(i\); \(c = 0.95\)

Assumption: loans by recurring users so far are representative of future loans by recurring users, and the sample size is large enough for the Fisher information to be a good estimate of \(-l''\).

Let \(i\) be the interval the true recurring user default rate lies within with \(c\) confidence; \(i = (i_0, i_1)\)

Let \(r = \frac{i_1 - i_0}{2}\)

Math | Description |
---|---|
\(r = 1.96 \frac{1}{\sqrt{I(\hat{\pi})}}\) | 1.96 standard errors for \(c = 0.95\); the standard error is \(\frac{1}{\sqrt{I(\hat{\pi})}}\), where \(I(\hat{\pi})\) is the Fisher information |
\(r = 1.96 \frac{1}{\sqrt{\frac{L_{RB}}{\frac{L_{D_{RB}}}{L_{RB}}(1 - \frac{L_{D_{RB}}}{L_{RB}})}}}\) | Substitute \(I(p; n) = \frac{n}{p(1-p)}\), \(p = \frac{L_{D_{RB}}}{L_{RB}}, n = L_{RB}\) |
\(r = 1.96 \sqrt{\frac{\frac{L_{D_{RB}}}{L_{RB}}(1 - \frac{L_{D_{RB}}}{L_{RB}})}{L_{RB}}}\) | Algebra |
\(i = \left(\frac{L_{D_{RB}}}{L_{RB}} - 1.96 \sqrt{\frac{\frac{L_{D_{RB}}}{L_{RB}}(1 - \frac{L_{D_{RB}}}{L_{RB}})}{L_{RB}}}, \frac{L_{D_{RB}}}{L_{RB}} + 1.96 \sqrt{\frac{\frac{L_{D_{RB}}}{L_{RB}}(1 - \frac{L_{D_{RB}}}{L_{RB}})}{L_{RB}}}\right)\) | Definition of \(i\) and \(r\) |
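As a worked example of the final formula, suppose \(L_{RB} = 400\) and \(L_{D_{RB}} = 20\). These numbers are hypothetical and chosen only to make the arithmetic easy to follow; they are not measurements from the subreddit.

Math | Description |
---|---|
\(\hat{\pi} = \frac{20}{400} = 0.05\) | Point estimate |
\(r = 1.96 \sqrt{\frac{0.05 (1 - 0.05)}{400}} \approx 0.0214\) | Half-width |
\(i \approx (0.0286, 0.0714)\) | Interval, about 2.9% to 7.1% |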

Projected Loan Value measures the projected value of the next loan in the subreddit or, alternatively, the projected average value of all future loans. Since this range is large even at a low confidence level (68%), the principal of loan requests is probably **not** normally distributed, which warrants further investigation.

Let \(L\) be the number of loans

Let \(m_i\) be the principal of loan \(i\)

Let \(c\) be the confidence that the true average principal lies within \(i\); \(c = 0.68\)

Assumption: current loans are representative of future loans.

Let \(i\) be the interval the true average principal lies within with confidence \(c\); \(i = (i_0, i_1)\)

Math | Description |
---|---|
\(\bar{m} = \sum\limits_{i=1}^{L} \frac{m_i}{L}\) | Definition of mean |
\(S^2 = \sum\limits_{i=1}^{L} \frac{(m_i - \bar{m})^2}{L - 1}\) | Variance adjusted for sample rather than population |
\(i = (\bar{m} - 1 \cdot \sqrt{S^2}, \bar{m} + 1 \cdot \sqrt{S^2})\) | One sample standard deviation on either side of the mean, which covers about 68% of values if principals are normally distributed |
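The same calculation can be sketched with Python's standard `statistics` module, whose `stdev` uses the \(L - 1\) denominator shown above. The function name `projected_loan_value` is made up for this example.

```python
import statistics


def projected_loan_value(principals):
    """Return (mean - S, mean + S) for a list of loan principals."""
    if len(principals) < 2:
        raise ValueError("need at least two loans to estimate variance")
    mean = statistics.fmean(principals)
    s = statistics.stdev(principals)  # sample standard deviation (L - 1)
    return mean - s, mean + s


# A few small loans plus one large one (made-up values): the lower
# bound goes negative, which no real principal can be. This is the
# kind of unreasonable interval a skewed distribution produces.
skewed_low, skewed_high = projected_loan_value([50, 50, 100, 100, 2000])
```

A negative lower bound is a quick sign that the normality assumption behind the 68% figure does not hold for the data.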

As Table 2 shows, the projected loan value did not settle into a reasonable interval. At this point it is worth plotting a histogram of the number of loans against principal.

There are several methods for determining bin size. In this case Doane's formula is a reasonable choice, since a non-normal distribution is suspected.

Let \(L\) be the number of loans

Let \(k\) be the number of bins

Let \(m_i\) be the principal of loan \(i\)

Let \(m_{\text{min}}\) be the minimum loan principal, fixed at 0

Let \(m_{\text{max}}\) be the maximum loan principal, taken from the data

Doane's Formula: \(k = 1 + \log_2(n) + \log_2(1 + \frac{\left|g_1\right|}{\sigma_{g_1}})\) where

- \(k\) - number of bins
- \(n\) - number of data points (\(n \equiv L\))
- \(g_1\) - estimated third-moment skewness \(\gamma_1\)
- \(\sigma_{g_1} = \sqrt{\frac{6(n-2)}{(n+1)(n+3)}}\)

Sample Skewness Estimation: \(\gamma_1 \approx g_1 = \frac{\frac{1}{n} \sum\limits_{i=1}^{n}(x_i - \bar{x})^3}{\left(\frac{1}{n-1}\sum\limits_{i=1}^{n}(x_i-\bar{x})^2\right)^{\frac{3}{2}}}\)

- \(\bar{x}\) - sample mean (\(\bar{x} \equiv \bar{m}\))
- \(x_i\) - sample value \(i\) (\(x_i \equiv m_i\))

Let \(h\) be the bin width

Math | Description |
---|---|
\(\bar{m} = \sum\limits_{i=1}^{L} \frac{m_i}{L}\) | Definition of mean |
\(g_1 = \frac{\frac{1}{L}\sum\limits_{i=1}^{L}(m_i - \bar{m})^3}{\left(\frac{1}{L-1}\sum\limits_{i=1}^L(m_i-\bar{m})^{2}\right)^{\frac{3}{2}}}\) | Sample Skewness Estimation |
\(\sigma_{g_1} = \sqrt{\frac{6(L-2)}{(L+1)(L+3)}}\) | Definition |
\(k = 1 + \log_2(L) + \log_2(1 + \frac{\lvert g_1 \rvert}{\sigma_{g_1}})\) | Doane's Formula |
\(k = \text{ceil}\left(\frac{m_{\text{max}} - m_{\text{min}}}{h}\right) \approx \frac{m_{\text{max}} - m_{\text{min}}}{h}\) | Relationship between bin count and bin width over an interval \((m_{\text{min}},m_{\text{max}})\) |
\(h = \frac{m_{\text{max}} - m_{\text{min}}}{k}\) | Algebra |
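The bin calculation above can be sketched as follows. Rounding \(k\) up to a whole number of bins is an assumption of this sketch (the derivation leaves \(k\) real-valued), and the function names are made up for illustration.

```python
import math


def doane_bins(values):
    """Number of histogram bins by Doane's formula, rounded up."""
    n = len(values)
    if n < 3:
        raise ValueError("need at least three values")
    mean = sum(values) / n
    # Third central moment (1/n) over sample variance (1/(n-1)) ** 1.5.
    m3 = sum((x - mean) ** 3 for x in values) / n
    s2 = sum((x - mean) ** 2 for x in values) / (n - 1)
    if s2 == 0:
        raise ValueError("all values are identical; skewness is undefined")
    g1 = m3 / s2 ** 1.5
    sigma_g1 = math.sqrt(6 * (n - 2) / ((n + 1) * (n + 3)))
    k = 1 + math.log2(n) + math.log2(1 + abs(g1) / sigma_g1)
    return math.ceil(k)


def bin_width(values, k, m_min=0.0):
    """Bin width h = (m_max - m_min) / k, with m_min fixed at 0 by default."""
    return (max(values) - m_min) / k
```

For symmetric data the skewness term vanishes and the result reduces to \(1 + \log_2(n)\); skewed data adds bins on top of that.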

This graph shows how many loans are created at each dollar amount. Loans over $1050 are ignored, as they stretch the graph out beyond what is useful. The graph helps determine what an average loan looks like, as well as the distribution of loans, which is clearly right-skewed (the majority of the loans are on the left). This confirms that the projected loan value did not work because the distribution of loans is not normal.

This graph shows the number of loans that were fulfilled over time, and is another measure of the health of the subreddit. A loan is fulfilled as soon as it is marked in the LoansBot database. If the number of loans fulfilled over time is increasing, more users are coming into the subreddit and/or existing users are making loans more often.

Besides everything labeled WIP, these haven't even been started:

- Large loans vs small loans default rate
- Long loans vs short loans default rate
- More average loan information, Median loan information