
General feedback after teaching #21

Open · tiagosousagarcia opened this issue Feb 21, 2024 · 4 comments

@tiagosousagarcia

We ran a workshop with this lesson yesterday at Newcastle University, and I thought it would be a good idea to share some feedback from our experience.

First of all, many thanks for all your hard work on the lesson; it is a great resource and it proved very popular with learners. It goes without saying, but what follows is meant as helpful feedback rather than harsh criticism. With that in mind, here are some of the things I noticed:

  • Timings should be more generous -- although the amount of code is relatively small and problem-free, we had lots of questions from the audience from the very start. On average, I think each chapter took about one hour, with the exception of the very first one for which 30 minutes was adequate;
  • In general, more exercise suggestions would be useful, at least one per episode;
  • Instructors might need a bit more support in the later episodes, particularly in episode 4, on the reasons for implementing that particular model architecture (i.e., why those numbers of layers, why those kernel sizes, is this a starting point to be improved, borrowed from a popular model, completely original...);
  • In general, places where models can be tweaked to improve (or simply alter) their performance could be signposted better -- that was a common question from learners;
  • Although the term 'normalization' is used in episode 2 (under image pre-processing) and is generally used for this stage in ML, it was pointed out during the lesson that the actual operation being performed is 'standardization', so a note could be added about that distinction; some further explanation on why we downsample and normalise image data (and what effects they could have on model performance) could also be helpful for instructors;
  • Reusing variable names should be avoided, particularly when transforming the structure or shape of the dataset. An eagle-eyed helper caught a particularly problematic case: in episode 4, we add a new dimension to the dataset
dataset = dataset[..., np.newaxis]
labels = labels[..., np.newaxis]

If that cell needs to be re-run for any reason, the resulting dataset will gain more and more dimensions, which might not be picked up until much later in the lesson (in our case, during model training).
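
To illustrate, here is a minimal sketch of the failure mode and one re-run-safe alternative (the array shape is hypothetical, not the lesson's actual data):

import numpy as np

dataset = np.zeros((700, 256, 256))  # hypothetical shape, for illustration only

# Each re-run of this line appends another axis:
# (700, 256, 256) -> (700, 256, 256, 1) -> (700, 256, 256, 1, 1) -> ...
dataset = dataset[..., np.newaxis]

# A defensive alternative that is safe to re-run: only add the channel
# axis when it is not already present
if dataset.ndim == 3:
    dataset = dataset[..., np.newaxis]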

Finally, a couple of specific problems/questions with code:

  • Link to data downloads a corrupted file (but you already know that: chest_xrays.zip cannot be downloaded from the setup page #8);
  • In the data augmentation section: shouldn't the options for ImageDataGenerator have different values? I.e., having everything at zero will simply create duplicates of existing images, no?
  • In the data augmentation section: shouldn't this
augmented_images = [val_generator[0][0][0] for i in range(batch_size)]
be
augmented_images = [val_generator[i][0][0] for i in range(batch_size)]
otherwise we plot 5 identical images.
  • In the model architecture section there's a discrepancy between the comment explanation and the function used:
# Global max pooling reduces dimensions back to the input size
x = GlobalAveragePooling2D()(x)

That's it! Hope this is useful, and thanks again for your great work!

@tompollard (Collaborator)

Thanks for this very helpful feedback @tiagosousagarcia (and I'm sorry that it has taken so long to respond). I'm working on addressing the points that you have raised, and will post updates in this thread.

@tompollard (Collaborator)

Some quick updates:

> In general, more exercise suggestions would be useful, at least one per episode;

I have added several exercises (e.g. in #35). More to come.

> Although the term 'normalization' is used in episode 2 (under image pre-processing) and is generally used for this stage in ML, it was pointed out during the lesson that the actual operation being performed is 'standardization', so a note could be added about that distinction; some further explanation on why we downsample and normalise image data (and what effects they could have on model performance) could also be helpful for instructors;

Good catch! I have replaced "normalization" with "standardization" and added some explanatory text. I'll add more on this later.
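
For anyone following along, the distinction in a minimal NumPy sketch (values are illustrative only, not the lesson's code):

import numpy as np

images = np.random.rand(10, 256, 256) * 255  # hypothetical pixel data

# Normalization (min-max scaling): rescales values into the range [0, 1]
normalized = (images - images.min()) / (images.max() - images.min())

# Standardization (z-scoring): rescales values to zero mean and unit variance
standardized = (images - images.mean()) / images.std()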

> In the data augmentation section: shouldn't the options for ImageDataGenerator have different values? I.e., having everything at zero will simply create duplicates of existing images, no?

I have added an exercise to make it clear that the participants are expected to tweak the augmentation parameters themselves.
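
For example, the tweaked settings might look something like this (parameter values here are illustrative, not the lesson's actual settings):

from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Non-zero arguments produce genuinely transformed images rather than duplicates
datagen = ImageDataGenerator(
    rotation_range=10,       # random rotations of up to 10 degrees
    width_shift_range=0.1,   # horizontal shifts of up to 10% of image width
    height_shift_range=0.1,  # vertical shifts of up to 10% of image height
    zoom_range=0.1,          # random zoom of up to 10%
    horizontal_flip=True,    # randomly mirror images left-right
)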

> In the data augmentation section: shouldn't this
> augmented_images = [val_generator[0][0][0] for i in range(batch_size)]
> be
> augmented_images = [val_generator[i][0][0] for i in range(batch_size)]
> otherwise we plot 5 identical images.

It looks strange, but I think the current code is correct. We want to call val_generator on the same image each time, so we can see the effect on that image (rather than plotting the effect of val_generator on 5 different images).
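
Annotated for clarity (assuming val_generator and batch_size are defined as in the lesson):

# Each access to val_generator[0] re-applies the random transforms, so this
# collects batch_size different augmentations of the same source image:
augmented_images = [val_generator[0][0][0] for i in range(batch_size)]

# By contrast, indexing with i would give one augmentation each of
# batch_size different source images:
# augmented_images = [val_generator[i][0][0] for i in range(batch_size)]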

> In the model architecture section there's a discrepancy between the comment explanation and the function used:
> # Global max pooling reduces dimensions back to the input size
> x = GlobalAveragePooling2D()(x)

Good catch, now fixed.
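
Presumably the corrected pairing reads along these lines (a sketch with a hypothetical input shape; the lesson's exact code may differ):

from tensorflow.keras import Input
from tensorflow.keras.layers import Conv2D, GlobalAveragePooling2D

inputs = Input(shape=(256, 256, 1))  # hypothetical input shape
x = Conv2D(8, kernel_size=3, activation="relu")(inputs)

# Global average pooling averages each feature map down to a single value,
# reducing (height, width, channels) to (channels,)
x = GlobalAveragePooling2D()(x)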

Commits referencing this issue (pushed May 21, 2025):

f7d0ad6 (main): tidying. avoid reuse of variable name. ref #21.
912f92b (main): More generous timing. Ref #21

@tompollard
Copy link
Collaborator

> link to data downloads a corrupted file (but you already know that #8)

This was fixed in #32, though I have kept an issue open (#31) because I think there is a better solution than the one implemented.

> Timings should be more generous -- although the amount of code is relatively small and problem-free, we had lots of questions from the audience from the very start. On average, I think each chapter took about one hour, with the exception of the very first one for which 30 minutes was adequate;

Good point, I think I'd just left the default times in the template. I have updated the sections to allow for more time (now mostly 40min teaching, 20min exercises).

> Reusing variable names should be avoided, particularly when transforming the structure or shape of the dataset. An eagle-eyed helper caught a particularly problematic case: in episode 4, we add a new dimension to the dataset
> dataset = dataset[..., np.newaxis]
> labels = labels[..., np.newaxis]
> If that cell needs to be re-run for any reason, the resulting dataset will gain more and more dimensions, which might not be picked up until much later in the lesson (in our case, during model training).

I've been caught by this before! Fixed in f7d0ad6.
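
For readers of the thread, the safe pattern is to assign the result to a new name so the cell can be re-run without side effects (a sketch, not the exact committed code):

# dataset and labels as loaded earlier in the lesson
dataset_expanded = dataset[..., np.newaxis]
labels_expanded = labels[..., np.newaxis]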

Commits referencing this issue (pushed May 21, 2025):

5fa007b (main): Add explanation for architecture. Ref #21.
8575443 (main): Add guidance on where models can be tweaked to improve or alter their performance. Ref #21.

@tompollard (Collaborator)

@tiagosousagarcia Thanks again for this feedback - extremely helpful! I think I have addressed most of your points. If you have an opportunity, please could you take a look at the updated materials and let me know if further work is needed?
