
Thoughts on Interface Design #4

Open · Non-Contradiction opened this issue May 14, 2018 · 6 comments

Non-Contradiction (Owner) commented May 14, 2018

Dimensions

The package needs to deal with

  • derivatives and second-order (or higher-order) derivatives of R1 -> R1 functions
  • gradients and Hessians of Rn (n > 1) -> R1 functions
  • Jacobians of Rn (n > 1) -> Rm (m > 1) functions

The package will provide separate functions for these, such as deriv for derivatives and grad for gradients.

Users are expected to understand the difference and use the corresponding function for the dimensionality of their problem.

Maybe we can include error messages like "grad is not for R1 -> R1 functions, maybe you want to use deriv?" when the user calls grad on a scalar function of one variable (a sketch of such a check follows below).
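A minimal sketch of what such a check might look like (the body and names here are purely illustrative, not the actual implementation):

grad <- function(func, x, ...) {
    # illustrative guard only: reject scalar-input, scalar-output functions
    if (length(x) == 1 && length(func(x, ...)) == 1) {
        stop("grad is not for R1 -> R1 functions, maybe you want to use deriv?")
    }
    # ... the actual gradient computation would go here
}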

Arguments

The arguments to the interface functions will look like:
grad(func, x = NULL, mode = c("forward", "reverse"), ...)
where

  • func is the original function.
  • x is the point at which the gradient is calculated; it should correspond to the first argument of func that is not matched by .... If it is NULL, a gradient function is returned instead (see the example below).
  • mode is whether forward- or reverse-mode automatic differentiation is used.
  • ... contains further arguments passed on to func.
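For illustration, calls against this proposed signature might look like the following (the function f and its arguments are made up for the example):

f <- function(a, b) sum((a - b)^2)

# gradient of f with respect to its first argument at a = c(1, 2, 3),
# with b supplied through ...
grad(f, x = c(1, 2, 3), mode = "forward", b = c(0, 1, 0))

# with x = NULL, a gradient function is returned instead and can be called later
g <- grad(f, mode = "reverse", b = c(0, 1, 0))
g(c(1, 2, 3))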

Update

After wrapping the APIs of ForwardDiff and ReverseDiff, I have some new ideas about the arguments to the interface functions. They might look like the following (see the usage sketch after the argument list):
grad(func, x = NULL, mode = c("forward", "reverse"), xsize = x, use_tape = FALSE, use_compiled_tape = c("NO", "YES", "FORCE"), ...)
where

  • func is the original function.
  • x is the point at which the gradient is calculated, or a list of inputs with respect to which the gradient is taken. If x is NULL, a gradient function is returned instead.
  • mode is whether forward- or reverse-mode automatic differentiation is used.
  • xsize is a vector of the same size as x, or a list whose elements have the same sizes as the corresponding elements of x. If x is not given but xsize is, the return value is a more performant gradient function specialized for inputs of that size.
  • use_tape and use_compiled_tape control whether a tape, or a compiled tape, is used in reverse-mode automatic differentiation. They only take effect when mode is "reverse", x is not given and xsize is given, because that is the only situation where grad can return a much more performant gradient function specialized for the input size xsize and the function func. Note that the resulting function is faster with a compiled tape, but compiling can take substantial time depending on the function and the size of the input. Setting use_compiled_tape to "YES" performs a check to estimate whether compiling the tape would take a very long time; if so, it falls back to an uncompiled tape and issues a warning. Setting use_compiled_tape to "FORCE" compiles the tape regardless (see the sketch after this list).
  • ... contains further arguments passed on to func.
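A sketch of how the updated signature might be used under these proposed (not yet implemented) semantics; the function and values are illustrative:

f <- function(x) sum(sin(x))

# one-off gradient at a point
grad(f, x = rep(0.5, 10), mode = "reverse")

# build a gradient function specialized for length-10 inputs,
# taping (and possibly compiling the tape) up front
g <- grad(f, mode = "reverse", xsize = rep(0, 10),
          use_tape = TRUE, use_compiled_tape = "YES")
g(runif(10))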
Non-Contradiction added this to the First phase milestone on May 14, 2018
nashjc (Collaborator) commented Jun 6, 2018

If possible, it would be better if grad() could recognize that x is a scalar and call the appropriate routine. The grad() function of numDeriv seems to work OK. However, recognizing this might be quite difficult.

Example:

# scalar test function and its symbolic derivative for comparison
tfn <- function(x) {x^2 - x^3}
efn <- expression(x^2 - x^3)
dfn <- D(efn, "x")
dfn

library(numDeriv)

# scalar inputs: numDeriv's grad() agrees with the symbolic derivative
x <- 2
grad(tfn, x)
eval(dfn)
x <- 1
grad(tfn, x)
eval(dfn)
x <- -1
grad(tfn, x)
eval(dfn)
x <- -3
grad(tfn, x)
eval(dfn)

cat("\nNOW VECTORS\n")

# vector inputs: grad() treats tfn as vectorized and differentiates element-wise
x <- c(2, 2)
grad(tfn, x)
eval(dfn)
x <- c(1, 2, 3)
grad(tfn, x)
eval(dfn)

Non-Contradiction (Owner, Author) commented:

It seems to be possible: just call forward.derivative when x is a scalar and forward.grad or reverse.grad when x is a vector.

Should we do something similar for higher orders, like a hypothetical hessian? So if x is a scalar and f(x) is a scalar too, just return the second-order derivative? And if x is a scalar and f(x) is a vector, return the Jacobian of the first derivative?
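A minimal sketch of that dispatch idea, assuming functions named roughly as above exist (purely illustrative):

grad <- function(func, x, mode = c("forward", "reverse"), ...) {
    mode <- match.arg(mode)
    if (length(x) == 1) {
        # scalar input: fall back to the plain derivative
        forward.derivative(func, x, ...)
    } else if (mode == "forward") {
        forward.grad(func, x, ...)
    } else {
        reverse.grad(func, x, ...)
    }
}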

nashjc (Collaborator) commented Jun 6, 2018

If possible, we should make our code "just work", because most R users don't even know what a gradient is. Seriously, I have had quite advanced researchers outside the math/stat field ask me that. Moreover, we may call these functions from within optim, optimr or other functions that users don't actually see.

Non-Contradiction (Owner, Author) commented:

Yes, I think it's totally reasonable to do this, and we can even make deriv and grad the same function. And it's also reasonable to make hessian work for scalar x and scalar f(x).

But having it return the Jacobian of the first derivative when x is a scalar and f(x) is a vector seems a little weird to me, and it may not be a good idea because it can cause silent mistakes.

So maybe we should just do what the first paragraph of this comment describes: make deriv and grad the same function, and make hessian work for scalar x and scalar f(x)?
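For reference, numDeriv's hessian already accepts a scalar x with scalar output and returns a 1 x 1 matrix, which is roughly the behavior suggested here:

library(numDeriv)
f <- function(x) x^3 - x^2
hessian(f, 2)   # second derivative is 6*x - 2, so about 10, returned as a 1 x 1 matrix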

Non-Contradiction (Owner, Author) commented:

BTW, I updated the first post with some new thoughts on the arguments of the high-level interface APIs.

The default options are for the most general case, without any assumptions. But in cases like optim and optimr, it may be more reasonable to use different defaults, particularly with xsize set.

Is there any mechanism for optim and optimr to handle such things? Or should we have specific versions for these packages?
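As one illustration of how this could fit together, a prebuilt gradient function could be passed straight to optim's gr argument; note the grad() call below uses the proposed, not yet final, interface:

f  <- function(x) sum((x - 1)^2)

# hypothetical: specialize a reverse-mode gradient function for length-5 inputs
gf <- grad(f, mode = "reverse", xsize = rep(0, 5), use_tape = TRUE)

optim(par = rep(0, 5), fn = f, gr = gf, method = "BFGS")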

Non-Contradiction (Owner, Author) commented:

Currently, there are functions deriv, grad, jacobian and hessian. The function interface is similar to the one in the update of the first post.

All of deriv, grad, jacobian and hessian can deal with functions whose input is either a scalar or a vector of length greater than 1.

deriv and grad are for first-order derivatives of functions with scalar output (in fact, deriv and grad are the same function),
and jacobian is for first-order derivatives of functions with vector output of length greater than or equal to 1.
jacobian can therefore handle every function that deriv and grad can, but note that jacobian always returns the result as a Jacobian matrix.

hessian is for second-order derivatives of functions with scalar output.
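To illustrate the grad/jacobian distinction with numDeriv, which has analogous functions: a scalar-output function gives a plain vector from grad but a one-row matrix from jacobian.

library(numDeriv)
f <- function(x) sum(x^2)      # Rn -> R1

grad(f, c(1, 2, 3))            # length-3 gradient vector: 2 4 6
jacobian(f, c(1, 2, 3))        # the same values, but as a 1 x 3 Jacobian matrix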
