Benford's Rule


Of the forensic tools available to the fraud investigator, Benford’s Rule, sometimes called Benford’s Law, is the most fascinating. It predicts how often each digit should appear in the numbers of a naturally occurring data set, so departures from that pattern stand out as anomalies. When used properly, it allows the fact-finder to identify suspect values in a given data set. For example, it is particularly suitable for identifying vendors that submit fraudulent invoices. Using Benford’s Rule, nonconforming invoice amounts are flagged as systematic errors that corrupt the data set. These fraudulent invoices reveal themselves because their amounts (dollar values) do not follow the predictable Benford-type distribution shown by the other invoice amounts in the sample.
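The vendor-screening idea can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the article's method: the function names, the invoice data, and the choice of mean absolute deviation (MAD) between observed and expected first-digit proportions as the ranking score are all hypothetical choices made here. Vendors whose invoice amounts deviate most from the Benford proportions are listed first for follow-up.

```python
import math
from collections import defaultdict

# Benford first-digit probabilities, P(d) = log10(d + 1) - log10(d), for d = 1..9.
BENFORD = {d: math.log10(d + 1) - math.log10(d) for d in range(1, 10)}


def leading_digit(amount: float) -> int:
    """Return the first significant digit of a nonzero amount.

    Scientific notation always puts exactly one nonzero digit before the
    decimal point, which sidesteps floating-point edge cases near powers of ten.
    """
    return int(f"{abs(amount):e}"[0])


def mad_from_benford(amounts: list[float]) -> float:
    """Mean absolute deviation between observed and Benford first-digit proportions."""
    digits = [leading_digit(a) for a in amounts if a != 0]
    if not digits:
        raise ValueError("need at least one nonzero amount")
    n = len(digits)
    observed = {d: digits.count(d) / n for d in range(1, 10)}
    return sum(abs(observed[d] - BENFORD[d]) for d in range(1, 10)) / 9


def rank_vendors(invoices: list[tuple[str, float]]) -> list[tuple[str, float]]:
    """Group (vendor, amount) pairs by vendor; sort vendors by MAD, worst first."""
    by_vendor: dict[str, list[float]] = defaultdict(list)
    for vendor, amount in invoices:
        by_vendor[vendor].append(amount)
    scores = [(vendor, mad_from_benford(amts)) for vendor, amts in by_vendor.items()]
    return sorted(scores, key=lambda item: item[1], reverse=True)


# Hypothetical invoice data for two made-up vendors.
invoices = [
    ("Acme", 1250.00), ("Acme", 180.40), ("Acme", 23.10), ("Acme", 3400.00), ("Acme", 112.75),
    ("Shady", 8900.00), ("Shady", 7450.00), ("Shady", 9120.00), ("Shady", 8800.00), ("Shady", 7990.00),
]
ranking = rank_vendors(invoices)
for vendor, score in ranking:
    print(f"{vendor}: MAD = {score:.4f}")
```

Five invoices per vendor is far too few for a meaningful test; the tiny data set only keeps the example readable. In practice each vendor's history would need enough records for the predicted proportions to assert themselves.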

The law was first published in 1881 by Newcomb in The American Journal of Mathematics. It describes a simple but powerful mathematical phenomenon that predicts the distribution of digits in numbers and can be used for “pointing suspicion at frauds, embezzlers, tax-evaders, sloppy accountants, and even computer bugs.” Frank Benford, who rediscovered this phenomenon, was a physicist at General Electric Company who in 1938 recognized that the numbers in certain non-normally distributed data sets (i.e., those that are highly skewed, not uniform in distribution, and not bell-shaped) followed the distribution defined by:

Benford’s Rule:
P(n)=log(n+1)-log(n)

(where P(n) is the probability that n is the first digit, for n = 1 through 9; the logarithms are to base 10; and a probability is a number between 0 and 1, where 0 means impossibility and 1 means absolute certainty.)
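Two properties of the formula are easy to check by hand. It can be written as a single logarithm, and because the nine terms telescope, the probabilities add up to exactly 1, as any probability distribution must:

```latex
P(n) = \log_{10}(n+1) - \log_{10}(n) = \log_{10}\!\left(1 + \frac{1}{n}\right), \qquad n = 1, 2, \dots, 9

P(1) = \log_{10} 2 \approx 0.3010, \qquad P(9) = \log_{10}\frac{10}{9} \approx 0.0458

\sum_{n=1}^{9} P(n) = \sum_{n=1}^{9} \left[\log_{10}(n+1) - \log_{10}(n)\right] = \log_{10} 10 - \log_{10} 1 = 1
```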

The sample of numbers in a Benford-type data set should be large enough to give the predicted proportions a chance to assert themselves, and must be free of artificial limits; that is, the numbers should be allowed to take any value they please. A critical feature of the Benford distribution is that it is scale-invariant: it does not depend on the choice of units. It is also base-invariant, meaning an analogous law holds for numbers written in bases other than 10. It can therefore be used to evaluate data sets measured in any unit, whether square miles, dollars, tons, gallons, or invoice amounts, and of any magnitude. Benford’s Rule does not offer a mathematical solution to a problem; rather, it compares the digit frequencies observed in a data set against the probability distribution each digit should follow. This predictive tool is called digital frequency analysis, or DFA. The numbers could be data in a set that measures mass or volume, or all the numbers on the front page of the Wall Street Journal. Benford manually analyzed more than 20,000 numbers drawn from 20 data sets of various categories and found that these seemingly disparate numbers followed the same first-digit probability pattern. Subsequent analyses of thousands more data sets by other researchers have confirmed Benford’s distribution.
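Scale invariance can be demonstrated numerically. The sketch below is a synthetic illustration, not Benford's data: it assumes a hypothetical population whose values are spread evenly on a logarithmic scale over six decades (1 to 1,000,000), a standard textbook example of a Benford-conforming set, then "changes units" by multiplying every value by 3.785 (roughly liters per U.S. gallon). The first-digit frequencies before and after conversion should both stay close to the Benford probabilities.

```python
import math
import random

# Benford first-digit probabilities for d = 1..9.
BENFORD = [math.log10(d + 1) - math.log10(d) for d in range(1, 10)]


def leading_digit(x: float) -> int:
    # Scientific notation puts exactly one nonzero digit before the point.
    return int(f"{abs(x):e}"[0])


def first_digit_freqs(values: list[float]) -> list[float]:
    """Observed proportion of values whose first digit is 1, 2, ..., 9."""
    digits = [leading_digit(v) for v in values if v != 0]
    return [digits.count(d) / len(digits) for d in range(1, 10)]


rng = random.Random(1938)  # fixed seed so the run is reproducible
# Hypothetical data: log10 of each value is uniform on [0, 6).
sample = [10 ** rng.uniform(0, 6) for _ in range(100_000)]
# Convert units: multiply every value by the same arbitrary factor.
rescaled = [v * 3.785 for v in sample]

before = first_digit_freqs(sample)
after = first_digit_freqs(rescaled)
for d in range(1, 10):
    print(f"{d}: Benford={BENFORD[d - 1]:.4f}  before={before[d - 1]:.4f}  after={after[d - 1]:.4f}")
```

Multiplying by a constant only shifts each value's logarithm by the same amount, so a population spread evenly over whole decades on a log scale keeps its first-digit pattern; that is the intuition behind the unit-independence described above.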

Most people intuitively believe that, in a naturally occurring data set, each digit from one through nine has an equal probability of being the first digit. Benford’s distribution shows that this intuition is incorrect. It is the fraudster’s belief that digits should occur equally often, and the practice of inventing numbers to match, that creates data sets which arouse suspicion of fraud. A database with as few as ten numbers can begin to suggest a trend toward honesty or fraud, but the larger the sample (the bigger the database), the more closely the digits of naturally occurring data should follow Benford’s Rule. Nonconformity does not prove fraud, but it does indicate that a data set may be partially or entirely unnatural (fraudulent).
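One common way to quantify "how closely" a data set follows Benford's Rule is Pearson's chi-square goodness-of-fit test. The article does not name a specific test; this is a swapped-in, widely used technique, and the counts below are hypothetical. The sketch contrasts a fabricated set in which every first digit is equally common with a set whose counts track the Benford proportions.

```python
import math

# Expected Benford first-digit proportions for d = 1..9.
BENFORD = [math.log10(d + 1) - math.log10(d) for d in range(1, 10)]

# 95th percentile of the chi-square distribution with 8 degrees of freedom
# (9 digit categories minus 1). Larger statistics suggest nonconformity.
CRITICAL_8DF_5PCT = 15.507


def chi_square_first_digits(counts: list[int]) -> float:
    """Pearson chi-square statistic for observed first-digit counts (digits 1..9)."""
    n = sum(counts)
    return sum((obs - n * p) ** 2 / (n * p) for obs, p in zip(counts, BENFORD))


uniform_counts = [100] * 9  # hypothetical fabricated set: each first digit equally common
benford_counts = [301, 176, 125, 97, 79, 67, 58, 51, 46]  # hypothetical conforming set of 1,000 values

for label, counts in (("fabricated", uniform_counts), ("conforming", benford_counts)):
    stat = chi_square_first_digits(counts)
    verdict = "suspicious" if stat > CRITICAL_8DF_5PCT else "consistent with Benford"
    print(f"{label}: chi-square = {stat:.2f} ({verdict})")
```

Because the statistic grows in proportion to the number of records for the same proportional deviation, a given departure from Benford's Rule becomes more significant as the database grows. Conversely, the usual rule of thumb that each expected count be at least 5 means a ten-number sample is too small for this test to be reliable.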

Benford’s Rule is so powerful in its ability to detect fraud that even the IRS uses it to detect tax cheats. The idea underlying the method is to analyze the deductions in tax returns to determine whether they roughly match the frequencies and ratios predicted by Benford’s Rule. The reasoning is that when a database (the taxpayer’s return) contains systematic deviations from the expected distribution, the creator of the database (the taxpayer) may have knowingly or unknowingly entered false data into the set. According to Browne, “The income tax agencies of several nations and several states are using detection software based on Benford’s Rule, as are a score of large companies and accounting businesses.”

For a data set that meets the criteria described above, Benford’s Rule predicts the following first-digit probabilities (in percent):

First digit   Probability (%)
1             30.10
2             17.61
3             12.49
4              9.69
5              7.92
6              6.69
7              5.80
8              5.12
9              4.58
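The percentages follow directly from P(n) = log(n+1) − log(n). The short Python sketch below (the function name `benford_first_digit` is a hypothetical choice for illustration) recomputes them from the formula and should print the same nine values, rounded to two decimals.

```python
import math


def benford_first_digit(n: int) -> float:
    """Benford's Rule for the first digit: P(n) = log10(n + 1) - log10(n)."""
    if not 1 <= n <= 9:
        raise ValueError("first digit must be between 1 and 9")
    return math.log10(n + 1) - math.log10(n)


for n in range(1, 10):
    print(f"{n} = {100 * benford_first_digit(n):.2f}")
```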

What fraudster would intuitively know that the first digits of fabricated numbers should appear with these frequencies? Moreover, whether or not the numbers in a database are normally distributed, their third digits tend to be almost evenly distributed, so each of the ten digits (0 through 9) should appear very close to ten percent of the time. Analysis of the third digit can therefore provide an independent clue as to whether the numbers have been manipulated. Together, the first- and third-digit analyses provide a very strong indication of whether the numbers are naturally occurring or have been manipulated.

For even further analysis, one could examine the second-digit frequency distribution as well. In the second-digit analysis, zero should appear about 12 percent of the time and nine about 8.5 percent of the time. The pair one-zero (10) should be the most frequent first two digits of any number (occurring about 4.14 percent of the time), and the pair nine-nine (99) should be the least frequent (occurring only about 0.44 percent of the time).
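The second-digit, third-digit, and first-two-digit figures can be derived from the same logarithmic rule. The sketch below uses a standard extension that the article does not spell out: the probability that a number's first two digits form m is log10(1 + 1/m), and the probability that a later digit equals d is found by summing that kind of term over every possible string of preceding digits. The function names are hypothetical.

```python
import math


def first_two_digits(m: int) -> float:
    """Benford probability that a number's first two digits are m (10 <= m <= 99)."""
    if not 10 <= m <= 99:
        raise ValueError("m must be a two-digit number")
    return math.log10(1 + 1 / m)


def digit_at_position(position: int, d: int) -> float:
    """Benford probability that the digit in `position` (2 = second, 3 = third, ...)
    equals d. Sums log10(1 + 1/(10*prefix + d)) over every possible prefix of
    preceding digits: 1..9 for the second digit, 10..99 for the third, and so on."""
    if position < 2 or not 0 <= d <= 9:
        raise ValueError("position must be >= 2 and d must be 0..9")
    lo, hi = 10 ** (position - 2), 10 ** (position - 1)
    return sum(math.log10(1 + 1 / (10 * prefix + d)) for prefix in range(lo, hi))


print("digit  second (%)  third (%)")
for d in range(10):
    print(f"{d:>5}  {100 * digit_at_position(2, d):10.2f}  {100 * digit_at_position(3, d):9.2f}")
print(f"first two digits 10: {100 * first_two_digits(10):.3f}%")
print(f"first two digits 99: {100 * first_two_digits(99):.3f}%")
```

The third-digit column comes out very close to 10 percent for every digit, which is why deviations there are a useful independent signal, while the second-digit column still leans noticeably toward small digits.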
