[Popular Science] - Benford's Law: Python Implementation
Li Yongle's video: "Was Taobao's 'Double 11' sales figure of 268.4 billion faked? Testing it with Benford's law"
Benford's law states that, in base b, the probability that a number begins with the digit n is
$$P_n = \log_b\left(\frac{n+1}{n}\right) = \log_b\left(1 + \frac{1}{n}\right), \qquad n = 1, 2, \dots, b-1$$
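Summing the formula over every possible leading digit shows it is a proper probability distribution, because the logarithms telescope; setting b = 10 and n = 1 gives the 30.1% that heads the table below.

```latex
\sum_{n=1}^{b-1} P_n
  = \sum_{n=1}^{b-1} \left[\log_b(n+1) - \log_b n\right]
  = \log_b b - \log_b 1 = 1,
\qquad
P_1 = \log_{10} 2 \approx 0.301 \quad (b = 10)
```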
<Example> In base 10, the leading digits of naturally occurring, unordered data appear with the following probabilities:
First digit | Probability
1 | 30.1%
2 | 17.6%
3 | 12.5%
4 | 9.7%
5 | 7.9%
6 | 6.7%
7 | 5.8%
8 | 5.1%
9 | 4.6%
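The table follows directly from the formula. A minimal sketch that recomputes it; the helper name `benford_probabilities` is hypothetical, and it takes any base b, not just 10:

```python
import math


def benford_probabilities(base=10):
    """Benford's law: P(n) = log_b((n + 1) / n) for leading digits n = 1 .. base - 1."""
    return {n: math.log(1 + 1 / n, base) for n in range(1, base)}


# Reproduce the base-10 table (values rounded to 0.1%)
for digit, p in benford_probabilities(10).items():
    print(f"{digit} | {p:.1%}")
```

Calling `benford_probabilities(16)` gives the corresponding table for hexadecimal data; the values always add up to 1.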
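It can be used to check whether data has been falsified, but the data must meet two conditions:
- The numbers occur naturally rather than being assigned or chosen by people
- The values span several orders of magnitude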
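A small illustration of the second condition, using made-up inputs rather than real data: the powers of two 2^1 to 2^1000 span about 300 orders of magnitude, while integers from 150 to 199 (think body heights in cm) span almost none. The helper `leading_digit_shares` is hypothetical and assumes positive integers.

```python
from collections import Counter


def leading_digit_shares(numbers):
    """Share of each leading digit 1-9 among positive integers (hypothetical helper)."""
    counts = Counter(int(str(n)[0]) for n in numbers if n > 0)
    total = sum(counts.values())
    return {d: counts[d] / total for d in range(1, 10)}


# Spans about 300 orders of magnitude: 2**1 .. 2**1000
powers_of_two = leading_digit_shares(2 ** k for k in range(1, 1001))
# Narrow range, like body heights in cm: 150 .. 199
heights = leading_digit_shares(range(150, 200))

print(f"2**k:    leading 1 = {powers_of_two[1]:.1%}")  # 30.1%, matching Benford
print(f"heights: leading 1 = {heights[1]:.1%}")        # 100.0%, every value starts with 1
```

The powers of two hit Benford's 30.1% exactly here, because each time the digit count grows by one, exactly one power of two starts with 1.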
Python implementation
------------------------------------------------------------------------------------------------------------
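import math

import matplotlib.pyplot as plt
import pandas as pd


def first_digit(x):
    """Return the leading nonzero digit (1-9) of x, or None for 0, NaN or inf."""
    x = abs(x)
    if x == 0 or not math.isfinite(x):
        return None
    # Scale x into [1, 10); its integer part is then the first digit.
    # Works for ints, floats and negative numbers alike.
    m = x / 10 ** math.floor(math.log10(x))
    if m >= 10:  # correct for log10 rounding near exact powers of 10
        m /= 10
    elif m < 1:
        m *= 10
    return int(m)


def Benfort_Law_Test(number):
    # Leading digit of every nonzero value
    results = [d for d in (first_digit(x) for x in number) if d is not None]
    # Count each digit; reindex so digits that never occur still get a 0 row
    df = pd.Series(results).value_counts().reindex(range(1, 10), fill_value=0)
    df = df.to_frame('count')
    df['number'] = df.index
    df['1st_no_p'] = df['count'] / df['count'].sum()
    # Benford's expected probability log10((n + 1) / n)
    df['benford_p'] = df['number'].apply(lambda n: math.log10((n + 1) / n))

    plt.bar(df['number'], df['1st_no_p'], label='Data_P', width=0.5)
    plt.plot(df['number'], df['benford_p'], label='Benford_Law', color='r')
    plt.legend()
    plt.title("Benford's Law Test")
    plt.xlabel('First digit')
    plt.ylabel('Proportion')
    plt.show()  # parentheses needed: plt.show alone never calls the function
    return df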
------------------------------------------------------------------------------------------------------------
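The function above shows the comparison as a chart. To get the numbers without a plot window (in a script or a test, say), one option is a sketch that returns observed and expected shares per digit and sums up the gap as a mean absolute deviation (MAD). MAD is an extra summary that the code above does not use, and `benford_table`, `mean_absolute_deviation` and `fibonacci` are hypothetical names. The sketch assumes positive integers. Fibonacci numbers are a standard example of a sequence that follows Benford's law; the integers 1 to 9999 are a uniform counterexample.

```python
import math
from collections import Counter


def benford_table(numbers):
    """Observed leading-digit shares vs Benford's expected shares (no plotting).

    Assumes positive integers; zeros and negatives are skipped.
    """
    counts = Counter(int(str(n)[0]) for n in numbers if n > 0)
    total = sum(counts.values())
    return {d: (counts[d] / total, math.log10(1 + 1 / d)) for d in range(1, 10)}


def mean_absolute_deviation(table):
    """Average gap |observed - expected| over the nine digits."""
    return sum(abs(obs - exp) for obs, exp in table.values()) / len(table)


def fibonacci(count):
    """Yield the first `count` Fibonacci numbers: 1, 1, 2, 3, 5, ..."""
    a, b = 1, 1
    for _ in range(count):
        yield a
        a, b = b, a + b


fib_mad = mean_absolute_deviation(benford_table(fibonacci(1000)))
uniform_mad = mean_absolute_deviation(benford_table(range(1, 10000)))
print(f"Fibonacci: MAD = {fib_mad:.4f}")      # small: close to Benford
print(f"1..9999:   MAD = {uniform_mad:.4f}")  # each digit leads 1111 times
```

A small MAD means the data sits close to the red Benford curve. A large one means the bars visibly depart from it, as with the uniform range, where every digit leads equally often.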
<Results>
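- By product sales amount, the data fits Benford's law well
- By product sales quantity, it does not fit
- Does randomly generated data also follow the law? Running the code below three times, the leading digit 1 came up about 35% of the time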
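------------------------------------------------------------------------------------------------------------
import math
import random

import pandas as pd

# Ratio of two random numbers whose magnitudes range from 10^1 to 10^9.
# Uses Benfort_Law_Test defined above.
df2 = []
for i in range(100000):
    a = random.random() * 10 ** random.randint(1, 9)
    b = random.random() * 10 ** random.randint(1, 9)
    c = math.floor(a / b)
    if c >= 1:  # ratios below 1 floor to 0, which has no leading digit
        df2.append(c)
df2 = pd.DataFrame(df2, columns=["randm"])
Benfort_Law_Test(df2['randm'])
------------------------------------------------------------------------------------------------------------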
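To rerun the random experiment without plotting, here is a headless sketch of the same construction. It counts leading digits directly and prints the share of 1 next to Benford's value. The fixed `random.seed(0)` is an assumption added so reruns match. The share should land near the roughly 35% reported above, somewhat higher than Benford's 30.1%.

```python
import math
import random
from collections import Counter

random.seed(0)  # fixed seed so reruns give the same share

first_digits = Counter()
for _ in range(100000):
    # Same construction as above: ratio of two random numbers of random magnitude
    a = random.random() * 10 ** random.randint(1, 9)
    b = random.random() * 10 ** random.randint(1, 9)
    if b == 0:  # random.random() can return 0.0, though almost never
        continue
    c = math.floor(a / b)
    if c >= 1:
        first_digits[int(str(c)[0])] += 1

share_of_1 = first_digits[1] / sum(first_digits.values())
print(f"leading digit 1: {share_of_1:.1%} (Benford: {math.log10(2):.1%})")
```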