I have fit regularized Lasso models in scikit-learn on polynomial features of four degrees (1, 3, 7, 11), generated with PolynomialFeatures from my training data. I produced predictions for 100 evenly spaced points on the interval [0, 20] and stored them in a NumPy array. My task is to return the R^2 score for each Lasso model against a new 'gold standard' test set generated from the true underlying cubic polynomial without noise. (The original data-generating model, on which the code below is based, includes a noise term.) I need to build this new test set by evaluating the noise-free underlying function t^3/20 - t^2 - t
at each of 100 evenly spaced points on the interval [0, 20], and ultimately select the degree whose R^2 indicates the best fit to that function. Here is my code so far:
degs = (1, 3, 7, 11)
las_r2 = []
preds = np.zeros((4, 100))
for i, deg in enumerate(degs):
    poly = PolynomialFeatures(degree=deg)
    X_poly = poly.fit_transform(X_train)
    linlasso = Lasso(alpha=0.01, max_iter=10000).fit(X_poly, y_train)
    # Predict on 100 evenly spaced points in [0, 20]; poly is already fit, so use transform()
    y_poly = linlasso.predict(poly.transform(np.linspace(0, 20, 100).reshape(-1, 1)))
    preds[i, :] = y_poly  # y_poly is already 1-D, no transpose needed
    X_test_poly = poly.transform(X_test)
    las_r2.append(linlasso.score(X_test_poly, y_test))
answer = max(las_r2)  # las_r2 is a plain list, so max(las_r2), not las_r2.max()
What I don't know is how to incorporate the "gold standard" function from the paragraph above into my for-loop.
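One way to sketch this: evaluate the noise-free function t^3/20 - t^2 - t on the same 100-point grid the predictions use, then score each model's grid predictions against that array with sklearn.metrics.r2_score instead of against y_test. The snippet below is a minimal self-contained sketch; the synthetic X_train/y_train (and the noise scale) are stand-ins, since the original training data is not shown.

```python
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.metrics import r2_score
from sklearn.preprocessing import PolynomialFeatures

def true_fn(t):
    # Noise-free underlying cubic from the problem statement
    return t ** 3 / 20 - t ** 2 - t

# Stand-in training data (assumption: the real X_train/y_train come from
# this cubic plus noise; 15 points and scale=5 are arbitrary choices here)
rng = np.random.default_rng(0)
X_train = rng.uniform(0, 20, 15).reshape(-1, 1)
y_train = true_fn(X_train.ravel()) + rng.normal(scale=5, size=15)

t = np.linspace(0, 20, 100)   # evaluation grid
y_gold = true_fn(t)           # 'gold standard' targets, no noise

degs = (1, 3, 7, 11)
las_r2 = []
for deg in degs:
    poly = PolynomialFeatures(degree=deg)
    X_poly = poly.fit_transform(X_train)
    linlasso = Lasso(alpha=0.01, max_iter=10000).fit(X_poly, y_train)
    y_pred = linlasso.predict(poly.transform(t.reshape(-1, 1)))
    # Score against the noise-free targets instead of y_test
    las_r2.append(r2_score(y_gold, y_pred))

best_deg = degs[int(np.argmax(las_r2))]
```

With the noise-free cubic as the target, the degree selected by argmax is the one whose fitted curve tracks the true function best on [0, 20].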