# Optimisers.jl

## Installation: OptimizationOptimisers.jl

To use this package, install the OptimizationOptimisers package:

```julia
import Pkg; Pkg.add("OptimizationOptimisers")
```
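
The Optimisers.jl rules are first-order, gradient-based methods, so a `solve` call needs an automatic-differentiation backend attached to the `OptimizationFunction` and an explicit iteration budget via `maxiters`. Below is a minimal usage sketch; the Rosenbrock objective and the choice of `AutoZygote()` (which assumes Zygote is installed) are illustrative, not requirements:

```julia
using Optimization, OptimizationOptimisers, Zygote

# Rosenbrock test objective; p carries the (a, b) parameters.
rosenbrock(u, p) = (p[1] - u[1])^2 + p[2] * (u[2] - u[1]^2)^2

u0 = zeros(2)      # initial guess
p = [1.0, 100.0]   # standard Rosenbrock parameters

# Attach an AD backend so gradients are available to the optimizer.
optf = OptimizationFunction(rosenbrock, Optimization.AutoZygote())
prob = OptimizationProblem(optf, u0, p)

# These methods run a fixed number of steps, so maxiters is required.
sol = solve(prob, Optimisers.Adam(0.05), maxiters = 1000)
```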

## Local Unconstrained Optimizers

- [`Optimisers.Descent`](https://fluxml.ai/Optimisers.jl/dev/api/#Optimisers.Descent): **Classic gradient descent optimizer with learning rate**

  * `solve(problem, Descent(η))`
  * `η` is the learning rate
  * Defaults:
    * `η = 0.1`
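
  A quick sketch (step size chosen arbitrarily), reusing the `prob` from the installation example above:

  ```julia
  # 1000 fixed-size gradient steps; Descent has no adaptivity.
  sol = solve(prob, Optimisers.Descent(0.05), maxiters = 1000)
  ```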

- [`Optimisers.Momentum`](https://fluxml.ai/Optimisers.jl/dev/api/#Optimisers.Momentum): **Classic gradient descent optimizer with learning rate and momentum**

  * `solve(problem, Momentum(η, ρ))`
  * `η` is the learning rate
  * `ρ` is the momentum
  * Defaults:
    * `η = 0.01`
    * `ρ = 0.9`
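
  A sketch with the two scalar hyperparameters spelled out, again reusing `prob` from the installation example (the values shown are the defaults):

  ```julia
  # Learning rate 0.01 with momentum 0.9.
  sol = solve(prob, Optimisers.Momentum(0.01, 0.9), maxiters = 1000)
  ```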

- [`Optimisers.Nesterov`](https://fluxml.ai/Optimisers.jl/dev/api/#Optimisers.Nesterov): **Gradient descent optimizer with learning rate and Nesterov momentum**

  * `solve(problem, Nesterov(η, ρ))`
  * `η` is the learning rate
  * `ρ` is the Nesterov momentum
  * Defaults:
    * `η = 0.01`
    * `ρ = 0.9`

- [`Optimisers.RMSProp`](https://fluxml.ai/Optimisers.jl/dev/api/#Optimisers.RMSProp): **RMSProp optimizer**

  * `solve(problem, RMSProp(η, ρ))`
  * `η` is the learning rate
  * `ρ` is the momentum
  * Defaults:
    * `η = 0.001`
    * `ρ = 0.9`

- [`Optimisers.Adam`](https://fluxml.ai/Optimisers.jl/dev/api/#Optimisers.Adam): **Adam optimizer**

  * `solve(problem, Adam(η, β::Tuple))`
  * `η` is the learning rate
  * `β::Tuple` is the pair of momentum decay rates
  * Defaults:
    * `η = 0.001`
    * `β::Tuple = (0.9, 0.999)`
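
  A sketch with explicit hyperparameters, reusing `prob` from above (the values are illustrative):

  ```julia
  # Larger learning rate with the default momentum decay pair.
  sol = solve(prob, Optimisers.Adam(0.01, (0.9, 0.999)), maxiters = 1000)
  ```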

- [`Optimisers.RAdam`](https://fluxml.ai/Optimisers.jl/dev/api/#Optimisers.RAdam): **Rectified Adam optimizer**

  * `solve(problem, RAdam(η, β::Tuple))`
  * `η` is the learning rate
  * `β::Tuple` is the pair of momentum decay rates
  * Defaults:
    * `η = 0.001`
    * `β::Tuple = (0.9, 0.999)`

- [`Optimisers.OAdam`](https://fluxml.ai/Optimisers.jl/dev/api/#Optimisers.OAdam): **Optimistic Adam optimizer**

  * `solve(problem, OAdam(η, β::Tuple))`
  * `η` is the learning rate
  * `β::Tuple` is the pair of momentum decay rates
  * Defaults:
    * `η = 0.001`
    * `β::Tuple = (0.5, 0.999)`

- [`Optimisers.AdaMax`](https://fluxml.ai/Optimisers.jl/dev/api/#Optimisers.AdaMax): **AdaMax optimizer**

  * `solve(problem, AdaMax(η, β::Tuple))`
  * `η` is the learning rate
  * `β::Tuple` is the pair of momentum decay rates
  * Defaults:
    * `η = 0.001`
    * `β::Tuple = (0.9, 0.999)`

- [`Optimisers.ADAGrad`](https://fluxml.ai/Optimisers.jl/dev/api/#Optimisers.ADAGrad): **ADAGrad optimizer**

  * `solve(problem, ADAGrad(η))`
  * `η` is the learning rate
  * Defaults:
    * `η = 0.1`

- [`Optimisers.ADADelta`](https://fluxml.ai/Optimisers.jl/dev/api/#Optimisers.ADADelta): **ADADelta optimizer**

  * `solve(problem, ADADelta(ρ))`
  * `ρ` is the gradient decay factor
  * Defaults:
    * `ρ = 0.9`
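
  ADADelta derives its effective step size from accumulated gradient statistics, so only `ρ` is exposed. A sketch, reusing `prob` from the installation example:

  ```julia
  # Only the gradient decay factor is passed; no learning rate needed.
  sol = solve(prob, Optimisers.ADADelta(0.9), maxiters = 1000)
  ```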

- [`Optimisers.AMSGrad`](https://fluxml.ai/Optimisers.jl/dev/api/#Optimisers.AMSGrad): **AMSGrad optimizer**

  * `solve(problem, AMSGrad(η, β::Tuple))`
  * `η` is the learning rate
  * `β::Tuple` is the pair of momentum decay rates
  * Defaults:
    * `η = 0.001`
    * `β::Tuple = (0.9, 0.999)`

- [`Optimisers.NAdam`](https://fluxml.ai/Optimisers.jl/dev/api/#Optimisers.NAdam): **Nesterov variant of the Adam optimizer**

  * `solve(problem, NAdam(η, β::Tuple))`
  * `η` is the learning rate
  * `β::Tuple` is the pair of momentum decay rates
  * Defaults:
    * `η = 0.001`
    * `β::Tuple = (0.9, 0.999)`

- [`Optimisers.AdamW`](https://fluxml.ai/Optimisers.jl/dev/api/#Optimisers.AdamW): **AdamW optimizer**

  * `solve(problem, AdamW(η, β::Tuple, decay))`
  * `η` is the learning rate
  * `β::Tuple` is the pair of momentum decay rates
  * `decay` is the decay applied to the weights
  * Defaults:
    * `η = 0.001`
    * `β::Tuple = (0.9, 0.999)`
    * `decay = 0`
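
  A sketch turning on weight decay, assuming the positional order `(η, β, decay)` from the listing above and the `prob` defined earlier:

  ```julia
  # Adam update plus weight decay of 0.01 on the iterate.
  sol = solve(prob, Optimisers.AdamW(0.001, (0.9, 0.999), 0.01), maxiters = 1000)
  ```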

- [`Optimisers.ADABelief`](https://fluxml.ai/Optimisers.jl/dev/api/#Optimisers.ADABelief): **ADABelief variant of Adam**

  * `solve(problem, ADABelief(η, β::Tuple))`
  * `η` is the learning rate
  * `β::Tuple` is the pair of momentum decay rates
  * Defaults:
    * `η = 0.001`
    * `β::Tuple = (0.9, 0.999)`
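
All of these rules go through the same `solve` interface, so switching methods is a one-argument change. As a final sketch, progress can be monitored through Optimization.jl's callback mechanism; this assumes the `prob` from the installation example and the two-argument `(state, objective)` callback convention, where returning `true` halts the run early:

```julia
# Log the objective each iteration; return false to keep going.
callback = function (state, l)
    println("objective: ", l)
    return false
end

sol = solve(prob, Optimisers.AMSGrad(), maxiters = 200, callback = callback)
```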