
Commit c4ce783

fix: intro long

1 parent 198ba2b commit c4ce783

File tree

2 files changed: +28 -310 lines changed

_website/tutorials/introduction/Introduction_long.ipynb: +14 -155
@@ -186,146 +186,6 @@
  "data_normal.n_obs"
  ]
  },
- {
- "cell_type": "markdown",
- "metadata": {
- "pycharm": {
- "name": "#%% md\n"
- }
- },
- "source": [
- "## Data\n",
- "\n",
- "This component in general plays a minor role in zfit: it is mostly to provide a unified interface for data.\n",
- "\n",
- "Preprocessing is therefore not part of zfit and should be done beforehand. Python offers many great possibilities to do so (e.g. Pandas).\n",
- "\n",
- "zfit `Data` can load data from various sources, most notably from Numpy, Pandas DataFrame, TensorFlow Tensor and ROOT (using uproot). It is also possible, for convenience, to convert it directly `to_pandas`. The constructors are named `from_numpy`, `from_root` etc."
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "metadata": {
- "pycharm": {
- "name": "#%%\n"
- }
- },
- "outputs": [],
- "source": [
- "import matplotlib.pyplot as plt\n",
- "import numpy as np\n",
- "import zfit\n",
- "# znp is a subset of numpy functions with a numpy interface but using actually the zfit backend (currently TF)\n",
- "import zfit.z.numpy as znp\n",
- "from zfit import z"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {
- "pycharm": {
- "name": "#%% md\n"
- }
- },
- "source": [
- "A `Data` needs not only the data itself but also the observables: the human readable string identifiers of the axes (corresponding to \"columns\" of a Pandas DataFrame). It is convenient to define the `Space` not only with the observable but also with a limit: this can directly be re-used as the normalization range in the PDF.\n",
- "\n",
- "First, let's define our observables"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "metadata": {
- "pycharm": {
- "name": "#%%\n"
- }
- },
- "outputs": [],
- "source": [
- "obs = zfit.Space('obs1', (-5, 10))"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {
- "pycharm": {
- "name": "#%% md\n"
- }
- },
- "source": [
- "This `Space` has limits. Next to the effect of handling the observables, we can also play with the limits: multiple `Spaces` can be added to provide disconnected ranges. More importantly, `Space` offers functionality:\n",
- "- limit1d: return the lower and upper limit in the 1 dimensional case (raises an error otherwise)\n",
- "- rect_limits: return the n dimensional limits\n",
- "- area(): calculate the area (e.g. distance between upper and lower)\n",
- "- inside(): return a boolean Tensor corresponding to whether the value is _inside_ the `Space`\n",
- "- filter(): filter the input values to only return the one inside"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "metadata": {
- "pycharm": {
- "name": "#%%\n"
- }
- },
- "outputs": [],
- "source": [
- "size_normal = 10000\n",
- "data_normal_np = np.random.normal(size=size_normal, scale=2)\n",
- "\n",
- "data_normal = zfit.Data.from_numpy(obs=obs, array=data_normal_np)"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {
- "pycharm": {
- "name": "#%% md\n"
- }
- },
- "source": [
- "The main functionality is\n",
- "- nevents: attribute that returns the number of events in the object\n",
- "- data_range: a `Space` that defines the limits of the data; if outside, the data will be cut\n",
- "- n_obs: defines the number of dimensions in the dataset\n",
- "- with_obs: returns a subset of the dataset with only the given obs\n",
- "- weights: event based weights\n",
- "\n",
- "Furthermore, `value` returns a Tensor with shape `(nevents, n_obs)`.\n",
- "\n",
- "To retrieve values, in general `z.unstack_x(data)` should be used; this returns a single Tensor with shape (nevents) or a list of tensors if `n_obs` is larger then 1."
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "metadata": {
- "pycharm": {
- "name": "#%%\n"
- }
- },
- "outputs": [],
- "source": [
- "print(\n",
- " f\"We have {data_normal.nevents} events in our dataset with the minimum of {np.min(data_normal.unstack_x())}\") # remember! The obs cut out some of the data"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "metadata": {
- "pycharm": {
- "name": "#%%\n"
- }
- },
- "outputs": [],
- "source": [
- "data_normal.n_obs"
- ]
- },
  {
  "cell_type": "markdown",
  "metadata": {
@@ -362,7 +222,7 @@
  "A `Parameter` (there are different kinds actually, more on that later) takes the following arguments as input:\n",
  "`Parameter(human readable name, initial value[, lower limit, upper limit])` where the limits are recommended but not mandatory. Furthermore, `step_size` can be given (which is useful to be around the given uncertainty, e.g. for large yields or small values it can help a lot to set this). Also, a `floating` argument is supported, indicating whether the parameter is allowed to float in the fit or not (just omitting the limits does _not_ make a parameter constant).\n",
  "\n",
- "Parameters have a unique name. This is served as the identifier for e.g. fit results. However, a parameter _cannot_ be retrieved by its string identifier (its name) but the object itself should be used. In places where a parameter maps to something, the object itself is needed, not its name."
+ "The name of the parameter identifies it; therefore, while multiple parameters with the same name can exist, they cannot exist inside the same model/loss/function, as they would be ambiguous."
  ]
  },
  {
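
A small sketch of the constructor signature described in the context lines above (the names here are placeholders, not part of the notebook):

import zfit

mu_demo = zfit.Parameter('mu_demo', 1.0, -3, 3, step_size=0.01)   # with limits and an explicit step size
sigma_fixed = zfit.Parameter('sigma_fixed', 1.3, floating=False)  # no limits; fixed explicitly via floating
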
@@ -376,6 +236,7 @@
  "outputs": [],
  "source": [
  "mu = zfit.Parameter('mu', 1, -3, 3, step_size=0.2)\n",
+ "another_mu = zfit.Parameter('mu', 2, -3, 3, step_size=0.2)\n",
  "sigma_num = zfit.Parameter('sigma42', 1, 0.1, 10, floating=False)"
  ]
  },
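
Following the updated wording above, the two `mu` parameters are distinct objects that merely share a name. A hedged sketch of keeping them in separate models (the observable and sigma here are placeholders, not part of the notebook):

import zfit

mu = zfit.Parameter('mu', 1, -3, 3, step_size=0.2)          # as in the cell above
another_mu = zfit.Parameter('mu', 2, -3, 3, step_size=0.2)  # same name, distinct object
obs_demo = zfit.Space('obs_demo', (-5, 10))
sigma_demo = zfit.Parameter('sigma_demo', 1.0, 0.1, 10)
gauss_a = zfit.pdf.Gauss(mu=mu, sigma=sigma_demo, obs=obs_demo)          # uses the first 'mu'
gauss_b = zfit.pdf.Gauss(mu=another_mu, sigma=sigma_demo, obs=obs_demo)  # uses the second 'mu'
# putting both 'mu' parameters into one model/loss would be ambiguous and, per the text above, is not allowed
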
@@ -412,10 +273,7 @@
  "name": "#%% md\n"
  }
  },
- "source": [
- "*PITFALL NOTEBOOKS: since the parameters have a unique name, a second parameter with the same name cannot be created; the behavior is undefined and therefore it raises an error.\n",
- "While this does not pose a problem in a normal Python script, it does in a Jupyter-like notebook, since it is an often practice to \"rerun\" a cell as an attempt to \"reset\" things. Bear in mind that this does not make sense, from a logic point of view. The parameter already exists. Best practice: write a small wrapper, do not rerun the parameter creation cell or simply rerun the notebook (restart kernel & run all). For further details, have a look at the discussion and arguments [here](https://github.com/zfit/zfit/issues/186)*"
- ]
+ "source": []
  },
  {
  "cell_type": "markdown",
@@ -779,7 +637,7 @@
  "\n",
  " nbins = 50\n",
  "\n",
- " lower, upper = data.v1.limits\n",
+ " lower, upper = data.space.v1.limits\n",
  " x = znp.linspace(lower, upper, num=1000) # np.linspace also works\n",
  " y = model.pdf(x) * size_normal / nbins * data.data_range.area()\n",
  " y *= scale\n",
@@ -863,7 +721,7 @@
  },
  "outputs": [],
  "source": [
- "mass_obs = zfit.Space('mass', (0, 1000))"
+ "mass_obs = zfit.Space('mass', 0, 1000)"
  ]
  },
  {
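
Both spellings of the `Space` constructor appear in this hunk; as far as the diff suggests, they define the same observable and range:

import zfit

mass_obs = zfit.Space('mass', 0, 1000)    # limits as plain positional arguments (new cell)
mass_obs = zfit.Space('mass', (0, 1000))  # limits as a tuple (old cell)
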
@@ -897,7 +755,7 @@
  "source": [
  "# combinatorial background\n",
  "\n",
- "lam = zfit.Parameter('lambda', -0.01, -0.05, -0.001)\n",
+ "lam = zfit.Parameter('lambda', -0.01, -0.05, -0.00001)\n",
  "comb_bkg = zfit.pdf.Exponential(lam, obs=mass_obs)"
  ]
  },
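
The widened upper limit lets `lambda` get much closer to zero, i.e. an almost flat exponential. A quick sketch of the effect (not part of the notebook; the parameter name and test values are placeholders):

import numpy as np
import zfit

mass_obs = zfit.Space('mass', (0, 1000))
lam_demo = zfit.Parameter('lambda_demo', -0.00002, -0.05, -0.00001)  # slope close to the new upper limit
expo_demo = zfit.pdf.Exponential(lam_demo, obs=mass_obs)
# with a slope this close to zero the exponential is nearly flat over the range
print(expo_demo.pdf(np.array([[100.0], [900.0]])))
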
@@ -1426,11 +1284,12 @@
  "outputs": [],
  "source": [
  "values = z.unstack_x(data)\n",
- "obs_right_tail = zfit.Space('mass', (700, 1000))\n",
+ "obs_right_tail = zfit.Space('mass', (550, 1000))\n",
  "data_tail = zfit.Data.from_tensor(obs=obs_right_tail, tensor=values)\n",
- "with comb_bkg.set_norm_range(obs_right_tail):\n",
- " nll_tail = zfit.loss.UnbinnedNLL(comb_bkg, data_tail)\n",
- " minimizer.minimize(nll_tail)"
+ "comb_bkg_right = comb_bkg.to_truncated(limits=obs_right_tail) # this gets the normalization right\n",
+ "nll_tail = zfit.loss.UnbinnedNLL(comb_bkg_right, data_tail)\n",
+ "result_sideband = minimizer.minimize(nll_tail)\n",
+ "print(result_sideband)"
  ]
  },
  {
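
A self-contained sketch of the updated sideband fit; the observable, background PDF and toy dataset below are stand-ins for the notebook's objects, while the `to_truncated` call is the one introduced in this hunk:

import numpy as np
import zfit
from zfit import z

# stand-ins for the notebook's mass observable, background PDF and combined dataset
mass_obs = zfit.Space('mass', (0, 1000))
lam = zfit.Parameter('lambda_sideband', -0.02, -0.05, -0.00001)
comb_bkg = zfit.pdf.Exponential(lam, obs=mass_obs)
data = zfit.Data.from_numpy(obs=mass_obs, array=np.random.exponential(scale=100, size=10000))

minimizer = zfit.minimize.Minuit()
values = z.unstack_x(data)                                     # raw mass values of the full dataset
obs_right_tail = zfit.Space('mass', (550, 1000))               # right sideband only
data_tail = zfit.Data.from_tensor(obs=obs_right_tail, tensor=values)
comb_bkg_right = comb_bkg.to_truncated(limits=obs_right_tail)  # renormalizes the PDF to the sideband
nll_tail = zfit.loss.UnbinnedNLL(comb_bkg_right, data_tail)
result_sideband = minimizer.minimize(nll_tail)
print(result_sideband)
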
@@ -1617,7 +1476,7 @@
  },
  "outputs": [],
  "source": [
- "result.hesse(method='minuit_hesse', name='hesse')"
+ "result.hesse(method='minuit_hesse', name='hesse') # these are the default values"
  ]
  },
  {
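
For context, `hesse` computes parabolic uncertainties on a converged fit result. A brief self-contained sketch (the toy fit and the additional `errors` call are illustrations added here, not part of the commit):

import numpy as np
import zfit

obs_fit = zfit.Space('x', (-5, 5))
mu_fit = zfit.Parameter('mu_fit', 0.5, -2, 2)
sigma_fit = zfit.Parameter('sigma_fit', 1.2, 0.1, 5)
gauss_fit = zfit.pdf.Gauss(mu=mu_fit, sigma=sigma_fit, obs=obs_fit)
data_fit = zfit.Data.from_numpy(obs=obs_fit, array=np.random.normal(0.0, 1.0, size=5000))

nll = zfit.loss.UnbinnedNLL(model=gauss_fit, data=data_fit)
result = zfit.minimize.Minuit().minimize(nll)

result.hesse(method='minuit_hesse', name='hesse')  # parabolic uncertainties; the defaults, per the note above
result.errors(name='errors')                       # asymmetric, profile-likelihood based uncertainties
print(result)                                      # the printout includes the computed uncertainties
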
@@ -1654,7 +1513,7 @@
  },
  "outputs": [],
  "source": [
- "print(result.params)"
+ "print(result)"
  ]
  },
  {
@@ -1707,7 +1566,7 @@
  },
  "outputs": [],
  "source": [
- "print(result.params)"
+ "print(result)"
  ]
  },
  {

Comments (0)