Conversation

@snurkabill

No description provided.

@mihaelacr
Owner

Hi! Thanks for the contribution!

Your commit and pull request say 'test set normalization based on training set', but I do not see the gaussianNormalization function that takes two parameters ever used.

Also, overloading does not work like this in Python: you cannot define two functions with the same name; you have to use default arguments. Please see a discussion here.
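
To illustrate the point about default arguments, here is a minimal sketch. The name gaussianNormalization comes from this pull request, but the parameters (`values`, `mean`, `std`) and the body are assumptions made for illustration, not the code in the repository. In Python a second `def` with the same name simply replaces the first, so a single function with optional parameters covers both call styles:

```python
from statistics import fmean, pstdev


def gaussianNormalization(values, mean=None, std=None):
    """Normalize a list of numbers to zero mean and unit variance.

    Hypothetical signature: if mean/std are omitted they are computed
    from `values`; otherwise the supplied statistics are used.
    """
    if mean is None:
        mean = fmean(values)
    if std is None:
        std = pstdev(values)
    if std == 0:
        # Avoid division by zero for constant inputs.
        std = 1.0
    return [(v - mean) / std for v in values]


# One-argument call: statistics come from the data itself.
self_normalized = gaussianNormalization([1.0, 2.0, 3.0])

# Three-argument call: statistics are supplied by the caller.
externally_normalized = gaussianNormalization([1.0, 3.0], mean=0.0, std=2.0)
```

Both calls resolve to the same function, so nothing is silently overwritten.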

Did you try running the code with your change? Do you see any improvements?

@snurkabill
Author

Hi,

I must admit that I didn't run those changes. I have some local changes that work, and I've just tried to put them together.

First of all, I just wanted to discuss those changes; I will make a proper PR later. My point here is that the test set's normalization should be based on attributes obtained from the training set.

Normalization itself is done in the normalizeData function.

@snurkabill
Author

Btw, I suspect that none of my PRs will work the first time. I really want to know your opinion :)

@snurkabill
Author

OK, scale() is renamed back to what it was before.

Motivation for scaling testing data based on training set:

  • When data arrives online, we can't normalize the test set using its own parameters because we don't have all the data yet; we need to use the already measured parameters.
  • The model is trained on normalized data, so we should apply that same normalization to the test data. If the test data are somehow different (shifted, etc.), then the training set was sampled wrongly.
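
The idea in the two points above can be sketched roughly as follows. The helper names `fit_normalization` and `apply_normalization` are hypothetical and are not the functions in this repository; the sketch only assumes that per-feature mean and standard deviation are measured on the training set and then reused unchanged for the test set:

```python
from statistics import fmean, pstdev


def fit_normalization(train_rows):
    """Measure per-feature mean and standard deviation on the training set."""
    columns = list(zip(*train_rows))
    means = [fmean(col) for col in columns]
    # Fall back to 1.0 for constant features to avoid division by zero.
    stds = [pstdev(col) or 1.0 for col in columns]
    return means, stds


def apply_normalization(rows, means, stds):
    """Normalize any rows (training or test) with previously measured statistics."""
    return [
        [(x - m) / s for x, m, s in zip(row, means, stds)]
        for row in rows
    ]


train = [[1.0, 10.0], [3.0, 30.0]]
test = [[2.0, 40.0]]

# Statistics come from the training set only.
means, stds = fit_normalization(train)

train_norm = apply_normalization(train, means, stds)
# The test set reuses the training statistics, so it works even when
# test samples arrive one at a time.
test_norm = apply_normalization(test, means, stds)
```

With this split, a test sample that lies outside the training range (the second feature here) shows up as a large normalized value instead of being rescaled away.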
