Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Hyperparameter optimization for kriging model #388

Open
chenr86 opened this issue Jul 27, 2022 · 6 comments
Open

Hyperparameter optimization for kriging model #388

chenr86 opened this issue Jul 27, 2022 · 6 comments

Comments

@chenr86
Copy link

chenr86 commented Jul 27, 2022

Hello,

I am using surrogates.jl v6.3.0 to build kriging for a dataset with 500 elements. There are 24 input variables and 1 output. But I found the accuracy is terrible. There is serious overfitting with the model.

using CSV 
using DataFrames
using Plots
using Surrogates
using SurrogatesPolyChaos
using StatsPlots
using Statistics
default()
using Dates
using LinearAlgebra
using Lathe.preprocess: TrainTestSplit

df = DataFrame(CSV.File("result.csv"))
train, test = TrainTestSplit(df, .8)
dim = 24

x_train = train[:, 1:dim]
x_train = values.(eachrow(x_train))
y_train = train[:, end]
# y_train = values.(eachrow(y_train))

x_test = test[:, 1:dim]
x_test = values.(eachrow(x_test))
y_test = test[:, end]

lower_bound = [
    200, 200, 200, 200, 200, 200, 
    200, 200, 200, 200, 200, 200,
    180, 180, 180, 180, 180, 180,
    300, 300, 300, 300, 300, 300
]
upper_bound = [
    230, 230, 230, 230, 230, 230,
    275, 275, 275, 275, 275, 275,
    200, 200, 200, 200, 200, 200,
    350, 350, 350, 350, 350, 350
]
p = ones(dim) * 2
theta = [0.5 / max(1e-6 * norm(upper_bound .- lower_bound), std(x_i[i] for x_i in x_train))^p[i] for i in 1:length(x_train[1])]
mymodel = Kriging(x_train, y_train, lower_bound, upper_bound, p=p, theta=theta)

# Prediction
ys_test = mymodel.(x_test)
ys_train = mymodel.(x_train)

# Model assessment criteria
function mae(x, y)
    return sum(abs.(x - y)) / length(x)
end

function mape(x, y)
    return sum(abs.(x - y)/y) / length(x)
end

function rmse(x, y)
    return sqrt(sum(((x - y).^2) / length(x)))
end

function mse(x, y)
    return sum(((x - y).^2) / length(x))
end

function r2(x,y)
    sse = sum((x - y).^2)
    sst = sum((y .- mean(y)).^2)
    return 1 - sse / sst
end

println("      ASSESSMENT CRITERIA            TRAIN              TEST")
println("      Mean Absolute Error       ", mae(ys_train, y_train), "      ", mae(ys_test, y_test))
println("Mean Absolute Percentage Error  ", mape(ys_train, y_train), "      ", mape(ys_test, y_test))
println("    Root Mean Square Error      ", rmse(ys_train, y_train), "      ", rmse(ys_test, y_test))
println("       Mean Square Error        ", mse(ys_train, y_train), "      ", mse(ys_test, y_test))
println("           R Square             ", r2(ys_train, y_train), "      ", r2(ys_test, y_test))

According to #368, I calculate the loss function, Inf is obtained!

function kriging_logpdf(params, x, y, lb, ub)
    d = length(params) ÷ 2
    theta = params[1:d]
    p = params[d+1:end]
    surr = Kriging(x, y, lb, ub; p, theta)

    n = length(y)
    y_minus_1μ = y - ones(length(y), 1) * surr.mu
    Rinv = surr.inverse_of_R

    term1 = only(-(y_minus_1μ' * surr.inverse_of_R * y_minus_1μ) / 2 / surr.sigma)
    term2 = -log((2π * surr.sigma)^(n/2) / sqrt(det(Rinv)))

    logpdf = term1 + term2
    return logpdf
end

loss_func = params -> -kriging_logpdf(params, x_train, y_train, lower_bound, upper_bound)

loss_func([theta; p])

I don't know how can I improve my model, as p and theta are already the values suggested.

@vikram-s-narayan
Copy link
Contributor

As per this paper:

"Since there is no analytical solution for estimating the hyperparameters θ, it is necessary to
use numerical optimization to find the hyperparameters θ that maximize the likelihood function.
This step is the most challenging in the construction of the kriging model. This is because, as
previously mentioned, this estimation involves maximizing the likelihood function, which is often
multimodal [Mardia and Watkins, 1989]. Maximizing this function becomes prohibitive for high-
dimensional problems (d > 10) due to the cost of computing the determinant of the correlation
matrix and the high number of evaluation needed for optimizing a high-dimensional multimodal
problem."

@archermarx
Copy link
Contributor

Try p = 1.99 * ones(dim) instead of p = 2 * ones(dim). This will give better numerical stability (your covariance matrix is singular here, so its determinant is zero and that is why the logpdf is Inf).

@jacktang
Copy link

@archermarx why 1.99 here? could you please explain more? Thanks!

@archermarx
Copy link
Contributor

When p = 2.0, the correlation function is nearly flat for points that are nearby, which causes ill conditioning in the correlation matrix. If p is just slightly less than 2.0, then the correlation function has a small slope so even points that are nearby are not counted as being effectively the same. An update I'm working on will change the default from 2.0 to 1.99

@chenr86
Copy link
Author

chenr86 commented Aug 15, 2022

I have tried p = 1.99 * ones(dim). Still, the model is overfitting and there is nearly no improvement. The predictive coefficient R squared in training set is 1 while in test set is 0.00186.

@archermarx
Copy link
Contributor

Does your loss function still produce Inf?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants