I am trying to convert a simple linear regression program from Python to FB (FreeBASIC) as part of my effort to create an ML linear regression library for FB.
The intention is to make an ML linear regression library in FB. Once we (hopefully) have a working linear regression program in FB, that is just the first step; the next is to turn it into a library (libLINREG) with a .bi header file and a libLINREG.a file, so nobody using FB has to code this headache over and over when adding linear regression calculations to their data or program.
Source of the Python code: https://machinelearningmastery.com/impl ... ch-python/
For more details, please see this post: viewtopic.php?f=14&t=28901
Project repository on GitHub: https://github.com/ronblue/libLINREG
Any help or contribution is welcome; this is an open-source community project...
Here is the forum thread where we are converting/translating the code from Python to FB: viewtopic.php?f=2&t=28908
UPDATE 09/12/2020: thanks to the community's help, we have another ML library in FreeBASIC: KNN algorithm analysis on CSV datasets...
The repository (open source, MIT license) is here: https://github.com/ronblue/FB_KNN_lib
UPDATE 16/11/2020: THE PROJECT HAS BEEN SUCCESSFULLY COMPLETED. IN THE PROJECT REPOSITORY YOU'LL FIND A WORKING LIBRARY WITH DOCUMENTATION. THANKS FOR ALL OF YOUR HELP <3
Update (15/11/2020):
- My teacher and I succeeded in combining dodicat's linear regression code example with Paul Doe's "load csv" code (which reads dataset CSV files) into working code.
- The next step is learning how to make an FB library out of the code, with a .bi header file and a .a library file.
- The resulting working code, with the example dataset.csv, is posted below.
Update (31/10/2020):
I'm taking private lessons in FreeBASIC programming with my old teacher, who taught me and helped me in the past when I started learning to program. We are studying FB OOP and UDTs with the aim of completing this project: a) a program that does simple linear regression, and b) a linear regression library in FreeBASIC. The lessons aim to teach me advanced FB topics so that I'll know what I'm doing and be able to complete projects like this without asking for help or relying on someone else, making me more independent and self-reliant in my progress coding in FB...
As for Tourist Trap's attempt to convert the Python code, I guess it didn't work out in the end; I really don't know where that stands... Anyway, I will attempt to code such a program as soon as possible, and if I succeed I'll post it in the forum...
Update (23/10/2020), version 0.0.2.1:
- Paul Doe posted working code that reproduces the results of the Python code (including plotting them as a graph), thereby proving that a linear regression program in FB is possible. I invited Paul Doe to become a collaborator on the project and hope he will accept. The most important part of Paul Doe's code is the math functions (mean, variance, covariance, coefficients), which I hope will be at the core of the library in the end...
- The old code from the scrapped attempts to convert the Python code is now in the "source/ARCHIVE" folder.
- It looks like there is no option but to use UDTs and OOP to produce a result equivalent to the Python code.
- Ownership of the project's repository has been transferred to Tourist Trap.
Update (22/10/2020), version 0.0.2:
- Tourist Trap joined the project as a collaborator.
- We decided to start from scratch and ditch the main_program.bas code that used OOP to load the CSV file...
- Now we are converting/translating the Python code one-to-one; the code version is now 0.0.2.
Version 0.01 (21/10/2020):
- Checked that the Python code example works (it does); see "py_example/py_linear_example".
- The dataset CSV in "datasets\dataset.csv" holds the Swedish insurance data; it will serve as the test and train data.
- Source code is in "source\Load_csv.bas": an attempt to convert the Python program into working FB code (so far the CSV data is loaded into the program, and the Python code for finding the means, the variances and the covariance of the data has been converted).
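For reference, these are the quantities being ported, written the way the Python tutorial code computes them. Note that its "variance" and "covariance" are plain sums of squared or multiplied deviations, not divided by n; a 1/n factor would cancel in the slope anyway.

```latex
\begin{aligned}
\bar{x} &= \frac{1}{n}\sum_{i=1}^{n} x_i, \qquad \bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i \\
\operatorname{var}(x) &= \sum_{i=1}^{n} (x_i - \bar{x})^2, \qquad
\operatorname{cov}(x, y) = \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y}) \\
b_1 &= \frac{\operatorname{cov}(x, y)}{\operatorname{var}(x)}, \qquad b_0 = \bar{y} - b_1 \bar{x} \\
\hat{y}_i &= b_0 + b_1 x_i, \qquad
\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n} (\hat{y}_i - y_i)^2}
\end{aligned}
```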
Update (21/10/2020):
- Today I tried to convert more of the Python code to FB and add it to the .bas source, with partial success. Help from others is needed.
- It seems that the Python code is more accurate than the FB code on the simple test.csv file.
- Found a VB code example of simple linear regression here: https://www.centerspace.net/examples/nm ... xample.php; however, I believe I should continue converting the Python code...
test.csv file:
Code:
1, 1
2, 3
4, 3
3, 2
5, 5
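When checking an FB port against the Python tutorial, it helps to know the expected numbers for test.csv. The following is a small, independent Python sketch using the same formulas as the tutorial; the helper names `fit` and `rmse` are hypothetical and not part of the tutorial or libLINREG.

```python
# Reference values for test.csv, computed with the same formulas as the Python tutorial:
# b1 = sum of (x - mean_x)(y - mean_y) / sum of (x - mean_x)^2, b0 = mean_y - b1 * mean_x
from math import sqrt

# the rows of test.csv, inlined so the example is self-contained
points = [(1, 1), (2, 3), (4, 3), (3, 2), (5, 5)]

def fit(points):
    """Return (b0, b1) for the least-squares line y = b0 + b1 * x."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    covar = sum((x - mean_x) * (y - mean_y) for x, y in points)
    var_x = sum((x - mean_x) ** 2 for x, _ in points)
    b1 = covar / var_x
    b0 = mean_y - b1 * mean_x
    return b0, b1

def rmse(points, b0, b1):
    """Root mean squared error of the line over the same points."""
    squared = sum((b0 + b1 * x - y) ** 2 for x, y in points)
    return sqrt(squared / len(points))

b0, b1 = fit(points)
predictions = [b0 + b1 * x for x, _ in points]
print(f"b0={b0:.3f} b1={b1:.3f}")           # b0=0.400 b1=0.800
print([round(p, 3) for p in predictions])    # [1.2, 2.0, 3.6, 2.8, 4.4]
print(f"RMSE: {rmse(points, b0, b1):.3f}")   # RMSE: 0.693
```

An FB version that prints a different slope, intercept or RMSE for these five rows is a sign that a loop bound, a data type or a divisor differs from the Python code.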
dataset.csv file:
Code:
108,392.5
19,46.2
13,15.7
124,422.2
40,119.4
57,170.9
23,56.9
14,77.5
45,214
10,65.3
5,20.9
48,248.1
11,23.5
23,39.6
7,48.8
2,6.6
24,134.9
6,50.9
3,4.4
23,113
6,14.8
9,48.7
9,52.1
3,13.2
29,103.9
7,77.5
4,11.8
20,98.1
7,27.9
4,38.1
0,0
25,69.2
6,14.6
5,40.3
22,161.5
11,57.2
61,217.6
12,58.1
4,12.6
16,59.6
13,89.9
60,202.4
41,181.3
37,152.8
55,162.8
41,73.4
11,21.3
27,92.6
8,76.1
3,39.9
17,142.1
13,93
13,31.9
15,32.1
8,55.6
29,133.3
30,194.5
24,137.9
9,87.4
31,209.8
14,95.5
53,244.6
26,187.5
Code:
#include once "file.bi"

' One (x, y) data point
TYPE ListPair
    As Double x, y
End TYPE

' APPEND TO the ListPair array the ListPair item
SUB pAPPEND(arr() AS ListPair, Item AS ListPair)
    REDIM PRESERVE arr(LBOUND(arr) TO UBOUND(arr) + 1) AS ListPair
    arr(UBOUND(arr)) = Item
END SUB

' Read "x,y" pairs from a CSV file and append them to p()
SUB loadDataset(byref path as const string, p() AS ListPair)
    'dim as ListPairTable t
    'Dim As ListPair p()
    if( fileExists( path ) ) then
        dim as long f = freeFile()
        open path for input as f
        do while( not eof( f ) )
            dim as ListPair d
            input #f, d.x
            input #f, d.y
            PAPPEND p(), d
        LOOP
        CLOSE #f
    end if
end SUB
' Mean of the x values (.x) and mean of the y values (.y)
Function mean(p() As ListPair) As ListPair
    Dim As ListPair pt
    For n As Long = Lbound(p) To Ubound(p)
        pt.x += p(n).x
        pt.y += p(n).y
    Next n
    Var sz = (Ubound(p) - Lbound(p) + 1)
    Return Type(pt.x / sz, pt.y / sz)
End Function

' Slope of the regression line: covariance(x, y) / variance(x)
Function Gradient(p() As ListPair) As Double
    Dim As Double CoVariance, Variance
    Dim As ListPair m = mean(p())
    For n As Long = Lbound(p) To Ubound(p)
        CoVariance += (p(n).x - m.x) * (p(n).y - m.y)
        Variance += (p(n).x - m.x) ^ 2
    Next n
    Return CoVariance / Variance
End Function

' Intercept: the regression line passes through (mean x, mean y)
Function intercept(p() As ListPair, grad As Double) As Double
    Var m = mean(p())
    Return m.y - grad * m.x
End Function

' Fills res() with the predictions m*x + c and returns the root mean square error
Function RMSerror(p() As ListPair, m As Double, c As Double, res() As Double) As Double
    Dim As Double acc
    Redim res(Lbound(p) To Ubound(p))
    For n As Long = Lbound(p) To Ubound(p)
        res(n) = m * p(n).x + c
        acc += (p(n).y - res(n)) ^ 2
    Next n
    acc /= (Ubound(p) - Lbound(p) + 1)
    Return Sqr(acc)
End Function
' Returns the minimum (.x) and maximum (.y) of the x values (flag = "x") or of the y values
Function minmax(p() As ListPair, flag As String = "x") As ListPair 'for plotting
    Dim As ListPair result
    Dim As Double d(Lbound(p) To Ubound(p))
    For n As Long = Lbound(d) To Ubound(d)
        If flag = "x" Then d(n) = p(n).x Else d(n) = p(n).y
    Next
    ' sort ascending, then take the first and last elements
    For n1 As Long = Lbound(d) To Ubound(d) - 1
        For n2 As Long = n1 + 1 To Ubound(d)
            If d(n1) > d(n2) Then Swap d(n1), d(n2)
        Next
    Next
    Return Type(d(Lbound(d)), d(Ubound(d)))
End Function

' Draws each data point, its prediction (predictions are joined into the regression line)
' and a vertical residual line from the point to its prediction
Sub plot(p() As ListPair, pred() As Double, xres As Integer, yres As Integer)
    #define map(a,b,x,c,d) ((d)-(c))*((x)-(a))/((b)-(a))+(c)
    #define xmap(z) map(minx,maxx,z,k,(xres-k))
    #define ymap(z) map(miny,maxy,z,k,(yres-k))
    Var minx = minmax(p(), "x").x, maxx = minmax(p(), "x").y
    Var miny = minmax(p(), "y").x, maxy = minmax(p(), "y").y
    Var k = 100   ' margin around the plot area
    Line(k, k)-(xres - k, yres - k), 8, b
    Dim As Double lxpos, lypos
    For n As Long = Lbound(p) To Ubound(p)
        Circle(xmap(p(n).x), ymap(p(n).y)), 5, 15, , , , f
        Circle(xmap(p(n).x), ymap(pred(n))), 5, 5, , , , f
        If n > Lbound(p) Then Line(xmap(p(n).x), ymap(pred(n)))-(lxpos, lypos), 5
        Line(xmap(p(n).x), ymap(p(n).y))-(xmap(p(n).x), ymap(pred(n))), 8
        lxpos = xmap(p(n).x)
        lypos = ymap(pred(n))
    Next n
End Sub
' Fits y = M*x + C, prints the line, the predictions and the RMSE, then plots after a key press
Sub GetRegressionLineAndShow(p() As ListPair, xres As Integer, yres As Integer)
    Var M = Gradient(p())   'get the gradient and intercept
    Var C = intercept(p(), M)
    Redim As Double predictions()
    'get the regression line points (predictions) and root mean square error
    Dim As Double e = RMSerror(p(), M, C, predictions())
    COLOR 5
    'y=Mx+C
    Print "Regression line: y = "; M; "*x"; Iif(Sgn(C) = 1, " +", ""); C
    PRINT
    PRINT "Predictions"
    For n As Long = Lbound(predictions) To Ubound(predictions)
        Print predictions(n); " ";
    Next
    Print
    Color 8
    Print "RMSE: "; e
    SLEEP
    CLS
    PLOT(p(), predictions(), xres, yres)
End Sub

SCREEN 20
'SCREENRES 1000,950
Dim As Integer xres, yres
Screeninfo xres, yres
Window(0, 0)-(xres, yres)   ' y axis points up, origin at the bottom left
REDIM p(any) AS ListPair
loadDataset("D:\repo\FB_libLINREG\datasets\dataset.csv", p())
GetRegressionLineAndShow(p(), xres, yres)
Sleep
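loadDataset above reads each line as two comma-separated numbers with INPUT #. To cross-check the FB results in Python, a comparable loader can be written with the standard `csv` module. This is only a sketch; the function name `load_pairs` is hypothetical, and unlike the FB routine it explicitly skips blank lines.

```python
import csv
import io

def load_pairs(text):
    """Parse "x,y" lines (the dataset.csv / test.csv format) into (float, float) tuples.

    Blank lines are skipped; spaces around the numbers (as in "1, 1") are accepted
    because float() ignores surrounding whitespace.
    """
    pairs = []
    for row in csv.reader(io.StringIO(text)):
        if not row:
            continue  # csv.reader yields an empty list for an empty line
        x, y = (float(field) for field in row[:2])
        pairs.append((x, y))
    return pairs

# the first rows of dataset.csv, inlined for the example (with a trailing blank line)
sample = "108,392.5\n19,46.2\n13,15.7\n\n"
print(load_pairs(sample))  # [(108.0, 392.5), (19.0, 46.2), (13.0, 15.7)]
```

With a real file, the text would come from `open("dataset.csv", newline="").read()`, with the path adjusted to wherever the dataset lives.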
Paul Doe's code, which reproduces the results of the Python code:
Code:
#include once "fbgfx.bi"

const as double _
    MIN_DBL = 4.940656458412465E-324, _
    MAX_DBL = 1.797693134862316E+308

enum Colors
    White = rgba( 255, 255, 255, 255 )
    Black = rgba( 0, 0, 0, 255 )
    Red = rgba( 205, 80, 80, 255 )
    LightBlue = rgba( 130, 182, 208, 255 )
    LightGray = rgba( 214, 214, 214, 0 )
end enum

type Rect
    declare constructor()
    declare constructor( _
        byval as double, byval as double, byval as double, byval as double )
    as double x, y, w, h
end type

constructor Rect() : end constructor

constructor Rect( _
    byval nX as double, byval nY as double, _
    byval nW as double, byval nH as double )
    x = nX : y = nY
    w = nW : h = nH
end constructor
''' Linear regression stuff
type Values
    declare operator cast() as string
    declare operator []( byval as integer ) byref as double
    as double _values( any )
    as integer count
end type

operator Values.cast() as string
    dim as string s
    for i as integer = 0 to count - 1
        s += str( _values( i ) ) + iif( i < count - 1, ",", chr( 13, 10 ) )
    next
    return( s )
end operator

operator Values.[]( byval index as integer ) byref as double
    return( _values( index ) )
end operator

type Dataset
    declare operator cast() as string
    declare operator []( byval as integer ) byref as Values
    as Values _values( any )
    as integer count
end type

operator Dataset.cast() as string
    dim as string s = ""
    for i as integer = 0 to count - 1
        s += _values( i )
    next
    return( s )
end operator

operator Dataset.[]( byval index as integer ) byref as Values
    return( _values( index ) )
end operator

function add overload( byref ds as Values, byval v as double ) byref as Values
    ds.count += 1
    redim preserve ds._values( 0 to ds.count - 1 )
    ds._values( ds.count - 1 ) = v
    return( ds )
end function

function add( byref ds as Dataset, byref v as Values ) byref as Dataset
    ds.count += 1
    redim preserve ds._values( 0 to ds.count - 1 )
    ds._values( ds.count - 1 ) = v
    return( ds )
end function
type Coefs
    declare constructor()
    declare constructor( byval as double, byval as double )
    declare operator cast() as string
    as double b0, b1
end type

constructor Coefs() : end constructor

constructor Coefs( byval cB0 as double, byval cB1 as double )
    b0 = cB0 : b1 = cB1
end constructor

operator Coefs.cast() as string
    return( "B0=" & b0 & ",B1=" & b1 )
end operator

private function max overload( byval a as double, byval b as double ) as double
    return( iif( a > b, a, b ) )
end function

' Largest value in ds; starts from the most negative double so all-negative data works too
' (MIN_DBL is the smallest *positive* double, so it cannot be the starting value)
function max( byref ds as Values ) as double
    dim as double value = -MAX_DBL
    for i as integer = 0 to ds.count - 1
        value = iif( ds[ i ] > value, ds[ i ], value )
    next
    return( value )
end function

private function min overload( byval a as double, byval b as double ) as double
    return( iif( a < b, a, b ) )
end function

' Smallest value in ds
function min( byref ds as Values ) as double
    dim as double value = MAX_DBL
    for i as integer = 0 to ds.count - 1
        value = iif( ds[ i ] < value, ds[ i ], value )
    next
    return( value )
end function
function mean( byref x as Values ) as double
    dim as double sum = 0.0d
    for i as integer = 0 to x.count - 1
        sum += x[ i ]
    next
    return( sum / x.count )
end function

function variance( byref x as Values, byval mean_x as double ) as double
    dim as double sum = 0.0d
    for i as integer = 0 to x.count - 1
        sum += ( x[ i ] - mean_x ) ^ 2
    next
    return( sum )
end function

function covariance( _
    byref x as Values, byval mean_x as double, _
    byref y as Values, byval mean_y as double ) as double
    dim as double covar = 0.0d
    for i as integer = 0 to min( x.count, y.count ) - 1
        covar += ( x[ i ] - mean_x ) * ( y[ i ] - mean_y )
    next
    return( covar )
end function

function coefficients( byref x as Values, byref y as Values ) as Coefs
    dim as double _
        mean_x = mean( x ), mean_y = mean( y ), _
        b1 = covariance( x, mean_x, y, mean_y ) / variance( x, mean_x ), _
        b0 = mean_y - b1 * mean_x
    return( Coefs( b0, b1 ) )
end function

function rmse_metric( byref actual as Values, byref predicted as Values ) as double
    dim as double sum_error = 0.0d
    for i as integer = 0 to actual.count - 1
        sum_error += ( predicted[ i ] - actual[ i ] ) ^ 2
    next
    return( sqr( sum_error / actual.count ) )
end function

type as function( byref as Dataset, byref as Values ) as Values _
    Algorithm

function evaluate_algorithm( _
    byref ds as Dataset, byval algorithm_func as Algorithm ) as Values
    dim as Values test_set = ds[ 0 ]
    return( algorithm_func( ds, test_set ) )
end function

function simple_linear_regression( _
    byref train as Dataset, byref test as Values ) as Values
    dim as Values predictions
    var c = coefficients( train[ 0 ], train[ 1 ] )
    for i as integer = 0 to test.count - 1
        add( predictions, c.b0 + c.b1 * test[ i ] )
    next
    return( predictions )
end function
'''
''' Visualization stuff
private function remap( _
    byval x as double, _
    byval start1 as double, _
    byval end1 as double, _
    byval start2 as double, _
    byval end2 as double ) _
    as double
    return( ( x - start1 ) * _
        ( end2 - start2 ) / ( end1 - start1 ) + start2 )
end function

sub drawRect( byref r as Rect, byval c as ulong )
    line( r.x, r.y ) - ( r.x + r.w - 1, r.y + r.h - 1 ), c, b
end sub

sub plot overload( _
    byref r as Rect, _
    byref xA as Values, byref yA as Values, _
    byval minX as double, byval maxX as double, _
    byval minY as double, byval maxY as double, _
    byval c as ulong )
    for i as integer = 0 to min( xA.count, yA.count ) - 1
        dim as double _
            x = remap( xA[ i ], minX, maxX, r.x, r.x + r.w - 1 ), _
            y = remap( yA[ i ], minY, maxY, r.y + r.h - 1, r.y )
        line( x - 5, y - 5 ) - ( x + 5, y + 5 ), c, bf
    next
end sub

sub plotLine( _
    byref r as Rect, _
    byref xA as Values, byref yA as Values, _
    byval minX as double, byval maxX as double, _
    byval minY as double, byval maxY as double, _
    byval c as ulong )
    for i as integer = 0 to min( xA.count, yA.count ) - 1
        if( i > 0 ) then
            dim as double _
                x1 = remap( xA[ i - 1 ], minX, maxX, r.x, r.x + r.w - 1 ), _
                y1 = remap( yA[ i - 1 ], minY, maxY, r.y + r.h - 1, r.y ), _
                x2 = remap( xA[ i ], minX, maxX, r.x, r.x + r.w - 1 ), _
                y2 = remap( yA[ i ], minY, maxY, r.y + r.h - 1, r.y )
            line( x1, y1 ) - ( x2, y2 ), c
        end if
    next
end sub
/'
    Test code
'/
dim as Values x, y
x = add( add( add( add( add( x, 1 ), 2 ), 3 ), 4 ), 5 )
y = add( add( add( add( add( y, 1 ), 3 ), 2 ), 3 ), 5 )

/'
    The dataset used for this example assumes x values in the 0 index, and
    y values in the 1 index.
'/
dim as Dataset ds
ds = add( add( ds, x ), y )

dim as integer _
    wW = 800, wH = 600, margin = 30

' 32-bit window; flags such as Fb.GFX_ALPHA_PRIMITIVES go in the 5th argument
' of ScreenRes (the 4th argument is the number of pages)
screenRes( wW, wH, 32 )
windowTitle( "Linear regression tutorial" )
color( Black, White )
cls()

var r = Rect( margin, margin, wW - margin * 2, wH - margin * 2 )
var predicted = evaluate_algorithm( ds, @simple_linear_regression )

dim as double _
    minX = 0, maxX = max( x ) + 1, _
    minY = 0, maxY = max( y ) + 1

minY = min( minY, min( predicted ) )
maxY = max( maxY, max( predicted ) )

drawRect( r, LightGray )
plot( r, x, y, minX, maxX, minY, maxY, LightBlue )
plot( r, x, predicted, minX, maxX, minY, maxY, Red )
plotLine( r, x, predicted, minX, maxX, minY, maxY, Black )

sleep()
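Both plotting routines rely on the same linear mapping from data coordinates to screen coordinates (dodicat's `map` macro and Paul Doe's `remap`). In Paul Doe's version the y target range is passed reversed (bottom edge first) because screen y grows downward. A Python transcription of that formula, with the 800x600 window and 30-pixel margin from the test code and an assumed data range of [0, 6] (maxX = max(x) + 1 for x = 1..5):

```python
def remap(x, start1, end1, start2, end2):
    """Linearly map x from [start1, end1] onto [start2, end2] (same formula as remap / map)."""
    return (x - start1) * (end2 - start2) / (end1 - start1) + start2

# plot area of the test code: Rect(30, 30, 740, 540)
left, width = 30, 740
top, height = 30, 540

# x axis: data minimum goes to the left edge, maximum to the right edge
print(remap(0, 0, 6, left, left + width - 1))   # 30.0
print(remap(6, 0, 6, left, left + width - 1))   # 769.0

# y axis: target range reversed, so the data minimum lands on the bottom edge
print(remap(0, 0, 6, top + height - 1, top))    # 569.0
print(remap(6, 0, 6, top + height - 1, top))    # 30.0
```

dodicat's version does not need the reversal because `Window(0,0)-(xres,yres)` already makes y point upward.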
Update: this code has been moved to the source\ARCHIVE folder. It was my first attempt to convert the Python code to FB; its development has halted and we are starting from scratch:
Code:
#include once "file.bi"
#include "string.bi"

/'
    Number of claims
    Total payment for all the claims in thousands of Swedish Kronor
    for geographical zones in Sweden
'/
type InsuranceData
    as double _
        numberOfClaims, _
        totalPayment
end type

type InsuranceTable
    declare operator []( byval as uinteger ) byref as InsuranceData
    as InsuranceData row( any )
    as uinteger count
end type

TYPE COEFFICI
    AS DOUBLE _
        b0, _
        b1
END TYPE

operator InsuranceTable.[]( byval index as uinteger ) byref as InsuranceData
    return( row( index ) )
end operator

sub add overload( byref t as InsuranceTable, byref d as InsuranceData )
    t.count += 1
    redim preserve t.row( 0 to t.count - 1 )
    t.row( t.count - 1 ) = d
end sub

function loadDataset( byref path as const string ) as InsuranceTable
    dim as InsuranceTable t
    if( fileExists( path ) ) then
        dim as long f = freeFile()
        open path for input as f
        do while( not eof( f ) )
            dim as InsuranceData d
            input #f, d.numberOfClaims
            input #f, d.totalPayment
            add( t, d )
        loop
        close #f
    end if
    return( t )
end function

'SUB iAppend(arr() AS DOUBLE, item AS DOUBLE)
'    REDIM PRESERVE arr(LBOUND(arr) TO UBOUND(arr) +1)
'    arr(UBOUND(arr)) = item
'END SUB

' Append item to the end of a dynamic array
SUB iAppend(arr() AS DOUBLE, item AS DOUBLE)
    dim as integer lbnd = LBOUND(arr), ubnd = UBOUND(arr)
    REDIM PRESERVE arr(lbnd TO ubnd + 1)
    arr(ubnd + 1) = item
END SUB

' Calculate the sum of a list of numbers (every element, in double precision)
function sum(x() as double) as double
    dim as double result = 0
    for i as integer = lbound(x) to ubound(x)
        result = result + x(i)
    next i
    return result
end FUNCTION

' Sum of the deviations from mean2 (not used below)
function sum2(x() as DOUBLE, mean2 AS DOUBLE) as double
    dim as double result = 0
    for i as integer = lbound(x) to ubound(x)
        result = result + x(i) - mean2
    next i
    return result
end FUNCTION

' Calculate the mean value of a list of numbers
function mean(x() as double) as double
    return sum(x()) / cdbl(ubound(x) - lbound(x) + 1)
end FUNCTION

' Calculate the variance of a list of numbers
' (sum of squared deviations, as in the Python code)
function variance(values() AS double, BYVAL means AS DOUBLE) AS DOUBLE
    DIM resalt AS DOUBLE = 0
    FOR i AS INTEGER = LBOUND(values) TO UBOUND(values)
        resalt = resalt + (values(i) - means) * (values(i) - means)
    NEXT i
    Return resalt
END FUNCTION

' Calculate the covariance between x and y
FUNCTION covariance(x() as double, mean_x as double, y() as double, mean_y as double) as Double
    dim covar as Double
    for i as integer = lbound(x) to ubound(x)
        covar += (x(i) - mean_x) * (y(i) - mean_y)
    next
    return covar
end FUNCTION

' Calculate coefficients (b0 = intercept, b1 = slope)
FUNCTION COEFFICIENTSb0 (x() AS DOUBLE, mean_x AS DOUBLE, y() AS DOUBLE, mean_y AS DOUBLE) AS DOUBLE
    DIM c AS COEFFICI
    mean_x = MEAN(x())
    mean_y = MEAN(y())
    WITH c
        .b1 = COVARIANCE(x(), mean_x, y(), mean_y) / VARIANCE(x(), mean_x)
        .b0 = mean_y - .b1 * mean_x
    END WITH
    RETURN c.b0
END FUNCTION

FUNCTION COEFFICIENTSb1 (x() AS DOUBLE, mean_x AS DOUBLE, y() AS DOUBLE, mean_y AS DOUBLE) AS DOUBLE
    DIM c AS COEFFICI
    mean_x = MEAN(x())
    mean_y = MEAN(y())
    WITH c
        .b1 = COVARIANCE(x(), mean_x, y(), mean_y) / VARIANCE(x(), mean_x)
        .b0 = mean_y - .b1 * mean_x
    END WITH
    RETURN c.b1
END FUNCTION

' Calculate root mean squared error
FUNCTION rmse_meteric(actual() AS DOUBLE, predicted() AS DOUBLE) AS DOUBLE
    DIM sum_error AS DOUBLE = 0.0
    FOR i AS INTEGER = LBOUND(actual) TO UBOUND(actual)
        sum_error += (predicted(i) - actual(i)) ^ 2
    NEXT i
    RETURN SQR(sum_error / (UBOUND(actual) - LBOUND(actual) + 1))
END FUNCTION

REDIM SHARED test_set_x(any) AS DOUBLE
REDIM SHARED test_set_y(any) AS DOUBLE

TYPE function_type AS FUNCTION(() As DOUBLE, () AS DOUBLE, () AS DOUBLE, () AS DOUBLE) As DOUBLE

' Unfinished in this attempt: the evaluation step was never completed
FUNCTION elvaluate_algo(x() AS DOUBLE, y() AS DOUBLE, BYVAL algorithem AS function_type) AS DOUBLE
    FOR i AS INTEGER = LBOUND(x) TO UBOUND(x)
        IAPPEND test_set_x(), x(i)
        IAPPEND test_set_y(), y(i)
    NEXT
    REDIM actual(0) AS DOUBLE
    DIM AS DOUBLE predicted = algorithem(x(), y(), test_set_x(), test_set_y())
    FOR i AS INTEGER = LBOUND(y) TO UBOUND(y)
    NEXT
END FUNCTION

' Unfinished in this attempt: the regression itself was never implemented here
FUNCTION simple_linear_regression(train() AS DOUBLE, test() AS DOUBLE) AS DOUBLE
    REDIM prediction(0) AS DOUBLE
END FUNCTION

var t = loadDataset( "D:\repo\FB_libLINREG\datasets\test.csv" )

' start empty, so iAppend does not leave a spurious 0 in the first element
REDIM SHARED x(any) AS DOUBLE
REDIM SHARED y(any) AS DOUBLE
DIM AS DOUBLE mean_x, mean_y, covar

for i as integer = 0 to t.count - 1
    with t[ i ]
        IAPPEND x(), CDBL(.numberOfClaims)
        IAPPEND y(), CDBL(.totalPayment)
    end WITH
    ' print each row together with the running means so far
    WITH t[ i ]
        ? .numberOfClaims, "means:", format(MEAN(x()), "0.00"), .totalPayment, "means: ", format(MEAN(y()), "0.00")
    END WITH
NEXT

? "covariance: ", format(COVARIANCE(x(), mean(x()), y(), mean(y())), "0.00")
mean_x = MEAN(x())
mean_y = MEAN(y())
covar = COVARIANCE(x(), mean_x, y(), mean_y)
? "X mean:", FORMAT(mean_x, "0.00"), "Y mean:", FORMAT(mean_y, "0.00"), "COVARIANCE:", FORMAT(covar, "0.00")
? "VARIANCE x:", FORMAT(VARIANCE(x(), mean_x), "0.00"), "VARIANCE y:", FORMAT(VARIANCE(y(), mean_y), "0.00")
? "COEFFICIENTS:", "b0: " & FORMAT(COEFFICIENTSB0(x(), mean_x, y(), mean_y), "0.00"), "b1: " & FORMAT(COEFFICIENTSB1(x(), mean_x, y(), mean_y), "0.00")
sleep()
Code:
# Standalone simple linear regression example
from math import sqrt

# Calculate root mean squared error
def rmse_metric(actual, predicted):
    sum_error = 0.0
    for i in range(len(actual)):
        prediction_error = predicted[i] - actual[i]
        sum_error += (prediction_error ** 2)
    mean_error = sum_error / float(len(actual))
    return sqrt(mean_error)

# Evaluate regression algorithm on training dataset
def evaluate_algorithm(dataset, algorithm):
    test_set = list()
    for row in dataset:
        row_copy = list(row)
        row_copy[-1] = None
        test_set.append(row_copy)
    predicted = algorithm(dataset, test_set)
    print(predicted)
    actual = [row[-1] for row in dataset]
    rmse = rmse_metric(actual, predicted)
    return rmse

# Calculate the mean value of a list of numbers
def mean(values):
    return sum(values) / float(len(values))

# Calculate covariance between x and y
def covariance(x, mean_x, y, mean_y):
    covar = 0.0
    for i in range(len(x)):
        covar += (x[i] - mean_x) * (y[i] - mean_y)
    return covar

# Calculate the variance of a list of numbers
def variance(values, mean):
    return sum([(x-mean)**2 for x in values])

# Calculate coefficients
def coefficients(dataset):
    x = [row[0] for row in dataset]
    y = [row[1] for row in dataset]
    x_mean, y_mean = mean(x), mean(y)
    b1 = covariance(x, x_mean, y, y_mean) / variance(x, x_mean)
    b0 = y_mean - b1 * x_mean
    return [b0, b1]

# Simple linear regression algorithm
def simple_linear_regression(train, test):
    predictions = list()
    b0, b1 = coefficients(train)
    for row in test:
        yhat = b0 + b1 * row[0]
        predictions.append(yhat)
    return predictions

# Test simple linear regression
dataset = [[1, 1], [2, 3], [4, 3], [3, 2], [5, 5]]
rmse = evaluate_algorithm(dataset, simple_linear_regression)
print('RMSE: %.3f' % (rmse))
ron77