|
186 | 186 | "data_normal.n_obs"
|
187 | 187 | ]
|
188 | 188 | },
|
189 |
| - { |
190 |
| - "cell_type": "markdown", |
191 |
| - "metadata": { |
192 |
| - "pycharm": { |
193 |
| - "name": "#%% md\n" |
194 |
| - } |
195 |
| - }, |
196 |
| - "source": [ |
197 |
| - "## Data\n", |
198 |
| - "\n", |
199 |
| - "This component in general plays a minor role in zfit: it is mostly to provide a unified interface for data.\n", |
200 |
| - "\n", |
201 |
| - "Preprocessing is therefore not part of zfit and should be done beforehand. Python offers many great possibilities to do so (e.g. Pandas).\n", |
202 |
| - "\n", |
203 |
| - "zfit `Data` can load data from various sources, most notably from Numpy, Pandas DataFrame, TensorFlow Tensor and ROOT (using uproot). It is also possible, for convenience, to convert it directly `to_pandas`. The constructors are named `from_numpy`, `from_root` etc." |
204 |
| - ] |
205 |
| - }, |
206 |
| - { |
207 |
| - "cell_type": "code", |
208 |
| - "execution_count": null, |
209 |
| - "metadata": { |
210 |
| - "pycharm": { |
211 |
| - "name": "#%%\n" |
212 |
| - } |
213 |
| - }, |
214 |
| - "outputs": [], |
215 |
| - "source": [ |
216 |
| - "import matplotlib.pyplot as plt\n", |
217 |
| - "import numpy as np\n", |
218 |
| - "import zfit\n", |
219 |
| - "# znp is a subset of numpy functions with a numpy interface but actually using the zfit backend (currently TF)\n", |
220 |
| - "import zfit.z.numpy as znp\n", |
221 |
| - "from zfit import z" |
222 |
| - ] |
223 |
| - }, |
224 |
| - { |
225 |
| - "cell_type": "markdown", |
226 |
| - "metadata": { |
227 |
| - "pycharm": { |
228 |
| - "name": "#%% md\n" |
229 |
| - } |
230 |
| - }, |
231 |
| - "source": [ |
232 |
| - "A `Data` needs not only the data itself but also the observables: the human readable string identifiers of the axes (corresponding to \"columns\" of a Pandas DataFrame). It is convenient to define the `Space` not only with the observable but also with a limit: this can directly be re-used as the normalization range in the PDF.\n", |
233 |
| - "\n", |
234 |
| - "First, let's define our observables" |
235 |
| - ] |
236 |
| - }, |
237 |
| - { |
238 |
| - "cell_type": "code", |
239 |
| - "execution_count": null, |
240 |
| - "metadata": { |
241 |
| - "pycharm": { |
242 |
| - "name": "#%%\n" |
243 |
| - } |
244 |
| - }, |
245 |
| - "outputs": [], |
246 |
| - "source": [ |
247 |
| - "obs = zfit.Space('obs1', (-5, 10))" |
248 |
| - ] |
249 |
| - }, |
250 |
| - { |
251 |
| - "cell_type": "markdown", |
252 |
| - "metadata": { |
253 |
| - "pycharm": { |
254 |
| - "name": "#%% md\n" |
255 |
| - } |
256 |
| - }, |
257 |
| - "source": [ |
258 |
| - "This `Space` has limits. Next to the effect of handling the observables, we can also play with the limits: multiple `Spaces` can be added to provide disconnected ranges. More importantly, `Space` offers functionality:\n", |
259 |
| - "- limit1d: return the lower and upper limit in the 1 dimensional case (raises an error otherwise)\n", |
260 |
| - "- rect_limits: return the n dimensional limits\n", |
261 |
| - "- area(): calculate the area (e.g. distance between upper and lower)\n", |
262 |
| - "- inside(): return a boolean Tensor corresponding to whether the value is _inside_ the `Space`\n", |
263 |
| - "- filter(): filter the input values to only return the one inside" |
264 |
| - ] |
265 |
| - }, |
266 |
| - { |
267 |
| - "cell_type": "code", |
268 |
| - "execution_count": null, |
269 |
| - "metadata": { |
270 |
| - "pycharm": { |
271 |
| - "name": "#%%\n" |
272 |
| - } |
273 |
| - }, |
274 |
| - "outputs": [], |
275 |
| - "source": [ |
276 |
| - "size_normal = 10000\n", |
277 |
| - "data_normal_np = np.random.normal(size=size_normal, scale=2)\n", |
278 |
| - "\n", |
279 |
| - "data_normal = zfit.Data.from_numpy(obs=obs, array=data_normal_np)" |
280 |
| - ] |
281 |
| - }, |
282 |
| - { |
283 |
| - "cell_type": "markdown", |
284 |
| - "metadata": { |
285 |
| - "pycharm": { |
286 |
| - "name": "#%% md\n" |
287 |
| - } |
288 |
| - }, |
289 |
| - "source": [ |
290 |
| - "The main functionality is\n", |
291 |
| - "- nevents: attribute that returns the number of events in the object\n", |
292 |
| - "- data_range: a `Space` that defines the limits of the data; if outside, the data will be cut\n", |
293 |
| - "- n_obs: defines the number of dimensions in the dataset\n", |
294 |
| - "- with_obs: returns a subset of the dataset with only the given obs\n", |
295 |
| - "- weights: event based weights\n", |
296 |
| - "\n", |
297 |
| - "Furthermore, `value` returns a Tensor with shape `(nevents, n_obs)`.\n", |
298 |
| - "\n", |
299 |
| - "To retrieve values, in general `z.unstack_x(data)` should be used; this returns a single Tensor with shape `(nevents,)` or a list of tensors if `n_obs` is larger than 1." |
300 |
| - ] |
301 |
| - }, |
302 |
| - { |
303 |
| - "cell_type": "code", |
304 |
| - "execution_count": null, |
305 |
| - "metadata": { |
306 |
| - "pycharm": { |
307 |
| - "name": "#%%\n" |
308 |
| - } |
309 |
| - }, |
310 |
| - "outputs": [], |
311 |
| - "source": [ |
312 |
| - "print(\n", |
313 |
| - " f\"We have {data_normal.nevents} events in our dataset with the minimum of {np.min(data_normal.unstack_x())}\") # remember! The obs cut out some of the data" |
314 |
| - ] |
315 |
| - }, |
316 |
| - { |
317 |
| - "cell_type": "code", |
318 |
| - "execution_count": null, |
319 |
| - "metadata": { |
320 |
| - "pycharm": { |
321 |
| - "name": "#%%\n" |
322 |
| - } |
323 |
| - }, |
324 |
| - "outputs": [], |
325 |
| - "source": [ |
326 |
| - "data_normal.n_obs" |
327 |
| - ] |
328 |
| - }, |
329 | 189 | {
|
330 | 190 | "cell_type": "markdown",
|
331 | 191 | "metadata": {
|
|
362 | 222 | "A `Parameter` (there are different kinds actually, more on that later) takes the following arguments as input:\n",
|
363 | 223 | "`Parameter(human readable name, initial value[, lower limit, upper limit])` where the limits are recommended but not mandatory. Furthermore, a `step_size` can be given; it should be of the order of the expected uncertainty, and setting it can help a lot, e.g. for large yields or small values. Also, a `floating` argument is supported, indicating whether the parameter is allowed to float in the fit or not (just omitting the limits does _not_ make a parameter constant).\n",
|
364 | 224 | "\n",
|
365 |
| - "Parameters have a unique name. This serves as the identifier for e.g. fit results. However, a parameter _cannot_ be retrieved by its string identifier (its name); the object itself should be used. In places where a parameter maps to something, the object itself is needed, not its name." |
| 225 | + "The name of the parameter identifies it; therefore, while multiple parameters with the same name can exist, they cannot exist inside the same model/loss/function, as they would be ambiguous." |
366 | 226 | ]
|
367 | 227 | },
|
368 | 228 | {
|
|
376 | 236 | "outputs": [],
|
377 | 237 | "source": [
|
378 | 238 | "mu = zfit.Parameter('mu', 1, -3, 3, step_size=0.2)\n",
|
| 239 | + "another_mu = zfit.Parameter('mu', 2, -3, 3, step_size=0.2)\n", |
379 | 240 | "sigma_num = zfit.Parameter('sigma42', 1, 0.1, 10, floating=False)"
|
380 | 241 | ]
|
381 | 242 | },
|
|
412 | 273 | "name": "#%% md\n"
|
413 | 274 | }
|
414 | 275 | },
|
415 |
| - "source": [ |
416 |
| - "*PITFALL NOTEBOOKS: since the parameters have a unique name, a second parameter with the same name cannot be created; the behavior is undefined and therefore it raises an error.\n", |
417 |
| - "While this does not pose a problem in a normal Python script, it does in a Jupyter-like notebook, since it is common practice to \"rerun\" a cell in an attempt to \"reset\" things. Bear in mind that this does not make sense from a logical point of view: the parameter already exists. Best practice: write a small wrapper, do not rerun the parameter creation cell, or simply rerun the notebook (restart kernel & run all). For further details, have a look at the discussion and arguments [here](https://github.com/zfit/zfit/issues/186)*" |
418 |
| - ] |
| 276 | + "source": [] |
419 | 277 | },
|
420 | 278 | {
|
421 | 279 | "cell_type": "markdown",
|
|
779 | 637 | "\n",
|
780 | 638 | " nbins = 50\n",
|
781 | 639 | "\n",
|
782 |
| - " lower, upper = data.v1.limits\n", |
| 640 | + " lower, upper = data.space.v1.limits\n", |
783 | 641 | " x = znp.linspace(lower, upper, num=1000) # np.linspace also works\n",
|
784 | 642 | " y = model.pdf(x) * size_normal / nbins * data.data_range.area()\n",
|
785 | 643 | " y *= scale\n",
|
|
863 | 721 | },
|
864 | 722 | "outputs": [],
|
865 | 723 | "source": [
|
866 |
| - "mass_obs = zfit.Space('mass', (0, 1000))" |
| 724 | + "mass_obs = zfit.Space('mass', 0, 1000)" |
867 | 725 | ]
|
868 | 726 | },
|
869 | 727 | {
|
|
897 | 755 | "source": [
|
898 | 756 | "# combinatorial background\n",
|
899 | 757 | "\n",
|
900 |
| - "lam = zfit.Parameter('lambda', -0.01, -0.05, -0.001)\n", |
| 758 | + "lam = zfit.Parameter('lambda', -0.01, -0.05, -0.00001)\n", |
901 | 759 | "comb_bkg = zfit.pdf.Exponential(lam, obs=mass_obs)"
|
902 | 760 | ]
|
903 | 761 | },
|
|
1426 | 1284 | "outputs": [],
|
1427 | 1285 | "source": [
|
1428 | 1286 | "values = z.unstack_x(data)\n",
|
1429 |
| - "obs_right_tail = zfit.Space('mass', (700, 1000))\n", |
| 1287 | + "obs_right_tail = zfit.Space('mass', (550, 1000))\n", |
1430 | 1288 | "data_tail = zfit.Data.from_tensor(obs=obs_right_tail, tensor=values)\n",
|
1431 |
| - "with comb_bkg.set_norm_range(obs_right_tail):\n", |
1432 |
| - " nll_tail = zfit.loss.UnbinnedNLL(comb_bkg, data_tail)\n", |
1433 |
| - " minimizer.minimize(nll_tail)" |
| 1289 | + "comb_bkg_right = comb_bkg.to_truncated(limits=obs_right_tail) # this gets the normalization right\n", |
| 1290 | + "nll_tail = zfit.loss.UnbinnedNLL(comb_bkg_right, data_tail)\n", |
| 1291 | + "result_sideband = minimizer.minimize(nll_tail)\n", |
| 1292 | + "print(result_sideband)" |
1434 | 1293 | ]
|
1435 | 1294 | },
|
1436 | 1295 | {
|
|
1617 | 1476 | },
|
1618 | 1477 | "outputs": [],
|
1619 | 1478 | "source": [
|
1620 |
| - "result.hesse(method='minuit_hesse', name='hesse')" |
| 1479 | + "result.hesse(method='minuit_hesse', name='hesse') # these are the default values" |
1621 | 1480 | ]
|
1622 | 1481 | },
|
1623 | 1482 | {
|
|
1654 | 1513 | },
|
1655 | 1514 | "outputs": [],
|
1656 | 1515 | "source": [
|
1657 |
| - "print(result.params)" |
| 1516 | + "print(result)" |
1658 | 1517 | ]
|
1659 | 1518 | },
|
1660 | 1519 | {
|
|
1707 | 1566 | },
|
1708 | 1567 | "outputs": [],
|
1709 | 1568 | "source": [
|
1710 |
| - "print(result.params)" |
| 1569 | + "print(result)" |
1711 | 1570 | ]
|
1712 | 1571 | },
|
1713 | 1572 | {
|
|
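The sideband hunk above replaces the `set_norm_range` context manager with `comb_bkg.to_truncated(limits=obs_right_tail)` and notes that this "gets the normalization right". The following is a minimal sketch of why the normalization range matters, written in plain Python rather than with zfit. The helper names `exp_pdf` and `integrate` are hypothetical and not part of zfit; the value of `lam` and the two ranges are taken from the diff. It only illustrates that an exponential normalized over the full range does not integrate to 1 over the tail, whereas one normalized over the tail does.

```python
import math


def exp_pdf(x, lam, lower, upper):
    """Exponential shape exp(lam * x), normalized to integrate to 1 over [lower, upper]."""
    norm = (math.exp(lam * upper) - math.exp(lam * lower)) / lam
    return math.exp(lam * x) / norm


def integrate(func, lower, upper, steps=20_000):
    """Midpoint-rule approximation of the integral of func over [lower, upper]."""
    width = (upper - lower) / steps
    return sum(func(lower + (i + 0.5) * width) for i in range(steps)) * width


lam = -0.01             # initial value of 'lambda' in the diff
full = (0.0, 1000.0)    # limits of mass_obs
tail = (550.0, 1000.0)  # limits of obs_right_tail

# Integral over the tail when the PDF is normalized over the full range
area_full_norm = integrate(lambda x: exp_pdf(x, lam, *full), *tail)
# Integral over the tail when the PDF is normalized over the tail itself
area_tail_norm = integrate(lambda x: exp_pdf(x, lam, *tail), *tail)

print(f"normalized over full range: {area_full_norm:.5f}")
print(f"normalized over tail:       {area_tail_norm:.5f}")
```

A likelihood built from tail-only data assumes the PDF integrates to 1 over that tail, which is the property the truncated PDF is meant to provide.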