
Thoughts on Interface Design #4

Open · Non-Contradiction opened this issue May 14, 2018 · 6 comments

Non-Contradiction (Owner) commented May 14, 2018

Dimensions

The package needs to deal with

  • derivatives and second-order (or higher-order) derivatives of R1 -> R1 functions
  • gradients and Hessians of Rn (n > 1) -> R1 functions
  • Jacobians of Rn (n > 1) -> Rm (m > 1) functions

The package will provide separate functions for these, such as deriv for derivatives and grad for gradients.

Users are expected to understand the difference and use the corresponding function for the dimensionality of their problem.

Maybe we can include error messages like "grad is not for R1 -> R1 functions, maybe you want to use deriv?" when the user calls grad on a scalar function of one variable (a sketch of such a check follows below).
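A minimal sketch of what such a check might look like (the body and names here are purely illustrative, not the actual implementation):

grad <- function(func, x, ...) {
    # illustrative guard only: reject scalar-input, scalar-output functions
    if (length(x) == 1 && length(func(x, ...)) == 1) {
        stop("grad is not for R1 -> R1 functions, maybe you want to use deriv?")
    }
    # ... the actual gradient computation would go here
}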

Arguments

The arguments to the interface functions will look like:
grad(func, x = NULL, mode = c("forward", "reverse"), ...)
where

  • func is the original function.
  • x is the point at which the gradient is calculated; it should correspond to the first argument of func that is not matched by .... If it is NULL, a gradient function is returned instead (see the example below).
  • mode is whether forward- or reverse-mode automatic differentiation is used.
  • ... contains further arguments passed on to func.
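For illustration, calls against this proposed signature might look like the following (the function f and its arguments are made up for the example):

f <- function(a, b) sum((a - b)^2)

# gradient of f with respect to its first argument at a = c(1, 2, 3),
# with b supplied through ...
grad(f, x = c(1, 2, 3), mode = "forward", b = c(0, 1, 0))

# with x = NULL, a gradient function is returned instead and can be called later
g <- grad(f, mode = "reverse", b = c(0, 1, 0))
g(c(1, 2, 3))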

Update

After wrapping the APIs of ForwardDiff and ReverseDiff, I have some new ideas about the arguments to the interface functions. They might look like the following (see the usage sketch after the argument list):
grad(func, x = NULL, mode = c("forward", "reverse"), xsize = x, use_tape = FALSE, use_compiled_tape = c("NO", "YES", "FORCE"), ...)
where

  • func is the original function.
  • x is the point at which the gradient is calculated, or a list of inputs with respect to which the gradient is taken. If x is NULL, a gradient function is returned instead.
  • mode is whether forward- or reverse-mode automatic differentiation is used.
  • xsize is a vector of the same size as x, or a list whose elements have the same sizes as the corresponding elements of x. If x is not given but xsize is, the return value is a more performant gradient function specialized for inputs of that size.
  • use_tape and use_compiled_tape control whether a tape, or a compiled tape, is used in reverse-mode automatic differentiation. They only take effect when mode is "reverse", x is not given and xsize is given, because that is the only situation where grad can return a much more performant gradient function specialized for the input size xsize and the function func. Note that the resulting function is faster with a compiled tape, but compiling can take substantial time depending on the function and the size of the input. Setting use_compiled_tape to "YES" performs a check to estimate whether compiling the tape would take a very long time; if so, it falls back to an uncompiled tape and issues a warning. Setting use_compiled_tape to "FORCE" compiles the tape regardless (see the sketch after this list).
  • ... contains further arguments passed on to func.
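A sketch of how the updated signature might be used under these proposed (not yet implemented) semantics; the function and values are illustrative:

f <- function(x) sum(sin(x))

# one-off gradient at a point
grad(f, x = rep(0.5, 10), mode = "reverse")

# build a gradient function specialized for length-10 inputs,
# taping (and possibly compiling the tape) up front
g <- grad(f, mode = "reverse", xsize = rep(0, 10),
          use_tape = TRUE, use_compiled_tape = "YES")
g(runif(10))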
Non-Contradiction added this to the First phase milestone on May 14, 2018
nashjc (Collaborator) commented Jun 6, 2018

If possible, it would be better if grad() could recognize that x is a scalar and call the appropriate routine. The grad() function of numDeriv seems to work OK. However, recognizing this might be quite difficult.

Example:

# scalar test function and its symbolic derivative for comparison
tfn <- function(x) {x^2 - x^3}
efn <- expression(x^2 - x^3)
dfn <- D(efn, "x")
dfn

library(numDeriv)

# scalar inputs: numDeriv's grad() agrees with the symbolic derivative
x <- 2
grad(tfn, x)
eval(dfn)
x <- 1
grad(tfn, x)
eval(dfn)
x <- -1
grad(tfn, x)
eval(dfn)
x <- -3
grad(tfn, x)
eval(dfn)

cat("\nNOW VECTORS\n")

# vector inputs: grad() treats tfn as vectorized and differentiates element-wise
x <- c(2, 2)
grad(tfn, x)
eval(dfn)
x <- c(1, 2, 3)
grad(tfn, x)
eval(dfn)

Non-Contradiction (Owner, Author) commented:

It seems to be possible: just call forward.derivative when x is a scalar and forward.grad or reverse.grad when x is a vector.

Should we do something similar for higher orders, like a hypothetical hessian? So if x is a scalar and f(x) is a scalar too, just return the second-order derivative? And if x is a scalar and f(x) is a vector, return the Jacobian of the first derivative?
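A minimal sketch of that dispatch idea, assuming functions named roughly as above exist (purely illustrative):

grad <- function(func, x, mode = c("forward", "reverse"), ...) {
    mode <- match.arg(mode)
    if (length(x) == 1) {
        # scalar input: fall back to the plain derivative
        forward.derivative(func, x, ...)
    } else if (mode == "forward") {
        forward.grad(func, x, ...)
    } else {
        reverse.grad(func, x, ...)
    }
}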

nashjc (Collaborator) commented Jun 6, 2018

If possible, we should make our code "just work", because most R users don't even know what a gradient is. Seriously, I have had quite advanced researchers outside the math/stat field ask me that. Moreover, we may call these functions from within optim, optimr or other functions that users don't actually see.

Non-Contradiction (Owner, Author) commented:

Yes, I think it's totally reasonable to do this, and we can even make deriv and grad the same function. And it's also reasonable to make hessian work for scalar x and scalar f(x).

But having it return the Jacobian of the first derivative when x is a scalar and f(x) is a vector seems a little weird to me, and it may not be a good idea because it can cause silent mistakes.

So maybe we should just do what the first paragraph of this comment describes: make deriv and grad the same function, and make hessian work for scalar x and scalar f(x)?
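For reference, numDeriv's hessian already accepts a scalar x with scalar output and returns a 1 x 1 matrix, which is roughly the behavior suggested here:

library(numDeriv)
f <- function(x) x^3 - x^2
hessian(f, 2)   # second derivative is 6*x - 2, so about 10, returned as a 1 x 1 matrix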

Non-Contradiction (Owner, Author) commented:

BTW, I updated the first post with some new thoughts on the arguments of the high-level interface APIs.

The default options are for the most general case, without any assumptions. But in cases like optim and optimr, it may be more reasonable to use different defaults, particularly with xsize set.

Is there any mechanism for optim and optimr to handle such things? Or should we have specific versions for these packages?
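As one illustration of how this could fit together, a prebuilt gradient function could be passed straight to optim's gr argument; note the grad() call below uses the proposed, not yet final, interface:

f  <- function(x) sum((x - 1)^2)

# hypothetical: specialize a reverse-mode gradient function for length-5 inputs
gf <- grad(f, mode = "reverse", xsize = rep(0, 5), use_tape = TRUE)

optim(par = rep(0, 5), fn = f, gr = gf, method = "BFGS")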

Non-Contradiction (Owner, Author) commented:

Currently, there are functions deriv, grad, jacobian and hessian. The function interface is similar to the one in the update of the first post.

All of deriv, grad, jacobian and hessian can deal with functions whose input is either a scalar or a vector of length greater than 1.

deriv and grad are for first-order derivatives of functions with scalar output (in fact, deriv and grad are the same function),
and jacobian is for first-order derivatives of functions with vector output of length greater than or equal to 1.
jacobian can therefore handle every function that deriv and grad can, but note that jacobian always returns the result as a Jacobian matrix.

hessian is for second-order derivatives of functions with scalar output.
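To illustrate the grad/jacobian distinction with numDeriv, which has analogous functions: a scalar-output function gives a plain vector from grad but a one-row matrix from jacobian.

library(numDeriv)
f <- function(x) sum(x^2)      # Rn -> R1

grad(f, c(1, 2, 3))            # length-3 gradient vector: 2 4 6
jacobian(f, c(1, 2, 3))        # the same values, but as a 1 x 3 Jacobian matrix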
