
<!--

#TODO
how do I add equations to the .mdx?
working on some schemes for the definition of image
-->

## Definition of Image

An image is a visual representation of something. More formally, an image is an n-dimensional function. Typically, we consider it to be two-dimensional $(n=2)$ and denote it $F(X, Y)$, where $X$ and $Y$ are spatial coordinates. The amplitude of $F$ at a pair of coordinates $(x_1, y_1)$ is the intensity or gray level of the image at that point. Typically, we refer to the coordinate pair $(x_1, y_1)$ as a pixel (picture element).
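
To make this concrete, here is a minimal sketch (using NumPy, with made-up intensity values) of a tiny grayscale image treated as a function $F(X, Y)$:

```python
import numpy as np

# A tiny 4x4 grayscale image as a 2D function F(X, Y).
# The intensity values are made up for illustration.
F = np.array([
    [  0,  50, 100, 150],
    [ 30,  80, 130, 180],
    [ 60, 110, 160, 210],
    [ 90, 140, 190, 240],
], dtype=np.uint8)

x1, y1 = 2, 3          # a coordinate pair (x1, y1)
print(F[x1, y1])       # gray level of the pixel at (x1, y1) -> 210
print(F.shape)         # spatial extent of the image -> (4, 4)
```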

Most images are generated by a combination of an illumination source and the reflection or absorption of energy. Although images are discrete, the processes involved in assembling them are continuous. The image generation process will be discussed in greater detail in the following sections: technical aspects of imaging and how we imaged everything.

What matters to us right now is that the value of $F$ at specific coordinates holds a physical meaning, and that the function $F(X, Y)$ is characterized by two components: the amount of illumination from the source and the amount of illumination reflected by the objects in the scene. Intensity images are also constrained in their values, since the function is typically non-negative and finite. Sometimes, when working with images, we might come across images that have values smaller than zero. This is sometimes done to signal to the algorithm not to look at that part of the image. You can think about it as a sticker you put on an old photograph with your ex in it.
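
As a hedged sketch of that "sticker" idea, the snippet below marks a region with a negative sentinel value so that an algorithm knows to ignore it. The sentinel value -1 is an illustrative convention, not a standard:

```python
import numpy as np

# Mark a region of the image with a negative sentinel so that
# downstream code can skip it. -1 is an arbitrary choice here.
image = np.random.randint(0, 256, size=(8, 8)).astype(np.int16)

image[2:5, 2:5] = -1        # "cover" a 3x3 region we want ignored

valid = image >= 0          # boolean mask of pixels that still count
print(image[valid].mean())  # statistics computed only on valid pixels
```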

A different type of image is the volumetric or 3D image. 3D images have three dimensions and are therefore described by a function $F(X, Y, Z)$. Most of our reasoning still applies, with the only difference being that the triplet $(x_1, y_1, z_1)$ is called a voxel.

Sometimes people will refer to different image channels. Fear not! Image channels are the different color components that make up an image. In reference to $F(X, Y)$, this means we will have one $F$ for each color component.

The most common color model is RGB (Red, Green, Blue). Three colors, three channels, three $F(X, Y)$ functions. In this model, the true color of a coordinate $(x, y)$ is given by the combination of the three different $F(x, y)$ values at that same location. Combined, these three channels describe a wide spectrum of colors.

If you look only at the $F(x, y)$ of one color, we call it the gray level or the intensity. The intensity values for each channel typically range from 0 to 255, where 0 represents no intensity and 255 represents maximum intensity. If you are in a different color system, these values might change, so it is important to understand where your data comes from.
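
Here is a small sketch of the RGB model: one $F(X, Y)$ per channel, stacked along the last axis. The $(H, W, 3)$ layout is a common convention, though libraries differ:

```python
import numpy as np

# A random 4x4 RGB image: three F(X, Y) functions stacked along axis -1.
rgb = np.random.randint(0, 256, size=(4, 4, 3), dtype=np.uint8)

red, green, blue = rgb[..., 0], rgb[..., 1], rgb[..., 2]  # three F(X, Y)

x, y = 1, 2
print(rgb[x, y])   # the true color at (x, y): one value per channel
print(red[x, y])   # the gray level / intensity of the red channel alone
```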

On occasion, you may also hear that people are working with 4D or 5D images. This terminology is mostly used by people in the biomedical field and by microscopists. Again, fear not! This naming came about from people who image volumetric data over time, with different channels, or with different imaging modalities (e.g., a photograph and an X-ray). The idea is that each new source of information becomes an extra dimension.
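
As a sketch of the "each new source of information is an extra dimension" idea, here is a hypothetical 5D microscopy stack; the axis order $(T, Z, C, Y, X)$ is one common convention, not a universal standard:

```python
import numpy as np

# A hypothetical 5D stack: time, depth, channels, height, width.
T, Z, C, Y, X = 10, 5, 3, 64, 64
stack = np.zeros((T, Z, C, Y, X), dtype=np.uint16)

print(stack.ndim)   # 5 -> a "5D image" in the sense used above
```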


But how are images represented in the computer? Most commonly by matrices. It is easy to picture an image as a 2D numerical array, which is an advantage because arrays are easy to deal with in computers. Apart from differences in memory footprint and read performance, most image formats end up in your script as a NumPy array. Seeing matrices as images helps in understanding some of the processes in convolutional neural networks and in image preprocessing, which we will see in later chapters. For instance, aligning an image might be decomposed into a mix of rotation and shearing operations, much like a typical linear algebra problem.
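
A minimal sketch of that linear-algebra view, rotating and then shearing a single pixel coordinate with $2 \times 2$ matrices (real alignment also resamples intensities; the angle and shear factor here are arbitrary):

```python
import numpy as np

# Rotate and shear a pixel coordinate; alignment pipelines compose
# such transforms, then resample the image intensities accordingly.
theta = np.deg2rad(30)
rotation = np.array([
    [np.cos(theta), -np.sin(theta)],
    [np.sin(theta),  np.cos(theta)],
])
shear = np.array([
    [1.0, 0.2],   # 0.2 is an arbitrary, illustrative shear factor
    [0.0, 1.0],
])

point = np.array([10.0, 5.0])      # a pixel coordinate (x, y)
print(shear @ rotation @ point)    # the coordinate after rotation, then shear
```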

Alternatively, images might be represented as graphs, where each node is a pixel coordinate and the edges connect neighboring coordinates. Take a moment to let that sink in: this means that the algorithms and models that are used for graphs can also be used for images. The inverse can also be true; you might be able to transform a graph into an image and analyze it as if it were a picture.
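
Here is a small sketch of that graph view, assuming the networkx package is available: nodes are pixel coordinates and edges connect 4-neighbors.

```python
import networkx as nx

# One node per pixel coordinate of a 3x3 image; edges join 4-neighbors.
height, width = 3, 3
graph = nx.grid_2d_graph(height, width)

print(graph.number_of_nodes())        # 9 pixels
print(list(graph.neighbors((1, 1))))  # 4-connected neighbors of (1, 1)
```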

As you can see, we propose a rather flexible definition of image that can accommodate different ways of acquiring visual data.


<!--

#TODO
trying to check that our definition matches the one from the video team; I posted on Discord and am awaiting their reply
-->

## Difference between Image and Video


If you have been paying attention, you might have guessed that videos are a visual representation of images with a time component attached. For 2D image acquisition, you can add a time component such that the function becomes $F(X, Y, T)$.

Images can naturally have a hidden time component, but images and videos differ in how they sample this temporal information. An image is a static representation of a scene captured at a single point in time, while a video is a sequence of images played back at a rate that creates the illusion of motion. This rate is what we call frames per second (fps).
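
As a minimal sketch, a grayscale video can be realized as an array of shape $(T, H, W)$; the frame rate and sizes below are arbitrary assumptions:

```python
import numpy as np

# A 2-second grayscale "video" at an assumed 24 fps: F(X, Y, T) stored
# as an array of shape (T, H, W), i.e. a stack of frames over time.
fps, seconds, H, W = 24, 2, 48, 64
video = np.zeros((fps * seconds, H, W), dtype=np.uint8)

frame_0 = video[0]    # a single image: one moment in time
print(video.shape)    # (48, 48, 64): 48 frames of 48x64 pixels
```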

### Key Differences

| | Feature | Image | Video |
|---|------------------------|-------------------------------------------------|-----------------------------------------------------|
| 1 | Type | Single moment in time | Sequence of images over time |
| 2 | Data Representation    | Typically a 2D array of pixels                    | Typically a 3D array of frames                       |
| 3 | File types             | JPEG, PNG, RAW, etc.                              | MP4, AVI, MOV, etc.                                  |
| 4 | Data Augmentation | Flipping, rotating, cropping | Temporal jittering, speed variations, occlusion |
| 5 | Feature Extraction | Edges, textures, colors | Edges, textures, colors, optical flow, trajectories |
| 6 | Learning Models | CNNs | RNNs, 3D CNNs |
| 7 | Machine Learning Tasks | Image classification, Segmentation, Object Detection | Video action recognition, temporal modeling, tracking |
| 8 | Computational Cost | Less expensive | More expensive |
| 9 | Applications | Facial recognition for security access control | Sign language interpretation for live communication |

We will discuss video-specific tasks in the Video chapter.



<!--

#TODO
this part could be improved, e.g., compare images to audio and to more tabular data points
-->


## Images vs. Other Forms of Data

In tabular data, dimensionality is usually defined by the number of features (columns) describing one data point. In image data, dimensionality usually refers to the cardinality of your combined dimensions: because of the spatial relations in the data, each pixel effectively adds another dimension to the data point. Features that describe your image are usually generated by traditional preprocessing or learned through deep learning methods. Thus, feature extraction in an image involves different algorithms, discussed in detail in the next chapter of this unit.
Moreover, tabular data often requires the handling of missing values, the encoding of categorical variables, and the re-scaling of numerical features. The analogous processes for image data are image resizing, normalization, and data augmentation. We call these processes pre-processing, and we will discuss them in greater detail in the chapter "pre-processing for computer vision".
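
As a hedged sketch of that analogy, the snippet below "resizes" an image by crude subsampling and normalizes it to the $[0, 1]$ range; production code would use a proper library resize with interpolation:

```python
import numpy as np

# A made-up 256x256 grayscale image, "resized" by subsampling and
# normalized to [0, 1], the image analogue of re-scaling tabular features.
image = np.random.randint(0, 256, size=(256, 256)).astype(np.float32)

resized = image[::2, ::2]     # crude "resize" to 128x128 by subsampling
normalized = resized / 255.0  # rescale intensities to [0, 1]

print(normalized.shape, float(normalized.min()), float(normalized.max()))
```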