Binary classification (schizophrenia vs. normal) of adolescent EEG signals, obtained from an open-access dataset.
In this project, several time- and frequency-domain feature extraction methods (DWT, STFT and CWT) are applied to the EEG signals in order to obtain better classification performance. A test accuracy of 94% is achieved both by the DWT-MLP method, which uses spectral features, and by the CWT-CNN method.
Oguzhan Memis, January 2025
- Files in this repository
- Dataset description
- Code Organization
- Considerations
- Reference
The code file is "eeg_schizophrenia.py" and the dataset file is "dataset_text.zip", which contains two folders. There is also a saved model file, "dwt_mlp_model_96.h5", which can be imported and used for the DWT method. The relevant instructions are noted in the later sections.
An EEG dataset containing 2 classes of EEG signals captured from adolescents.
-Classes: Normal (39 subjects) and Schizophrenia (45 subjects).
-Properties:
16 channels × 128 samples per second × 60 seconds of measurement for each subject.
Voltages are recorded in microvolts (µV, 10^-6 V),
so the signal amplitudes vary roughly from -2000 to +2000 µV.
-Orientation:
Signals are stacked vertically in the text files, ordered by channel number (1 to 16).
The length of one signal is 128 × 60 = 7680 samples,
so each text file contains 16 × 7680 = 122880 samples in a single column.
Source of the dataset: Moscow State University, 2005.
Original article of the dataset: Borisov et al., 2005, Physiology (Q4).
A recent article that uses this dataset: Bagherzadeh & Shalbaf, 2024, Cognitive Neuroscience (Q2).
The code is divided into separate cells with #%% markers.
RUN EACH CELL ONE BY ONE, CONSECUTIVELY.
The cells are as follows:
1) Importing the data
2) Filtering stage (includes time and frequency plots)
3:
3.1) Visualization of all the healthy EEG channels together
3.2) Visualization of all the patient EEG channels together
4) Feature examinations (many statistical features computed on the signals)
5) Further explorations: Correlation matrix, and Recurrence plot
6) Multi-level Decomposition by DWT (examination)
7:
7.1) DWT Feature Extraction and Data Transformation
7.2) SVM Grid-search
7.3) SVM cross-validation
7.4) MLP model
7.5) Optional part: save the best model
7.6) MLP k-fold cross-validation
7.7) Leave One Out CV on the MLP
8:
8.1) STFT-Feature extraction method
8.2) STFT-MLP
8.3) STFT-SVM (Grid-search)
9:
9.1) STFT Data Transformation
9.2) STFT - CNN
10:
10.1) CWT Data Transformation
10.2) CWT - CNN
10.3) CNN k-fold cross-validation
10.4) Leave One Out CV on the CNN
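The Leave-One-Out cross-validation steps (cells 7.7 and 10.4) evaluate one held-out subject per fold. A minimal sketch with scikit-learn, assuming the flattened DWT feature matrix of 84 subjects × 400 features described below (random data and an RBF SVM stand in for the repository's actual models):

```python
import numpy as np
from sklearn.model_selection import LeaveOneOut, cross_val_score
from sklearn.svm import SVC

# Synthetic stand-in data: 84 subjects x 400 flattened DWT features,
# labels 0 = Normal (39 subjects), 1 = Schizophrenia (45 subjects).
rng = np.random.default_rng(0)
X = rng.standard_normal((84, 400))
y = np.array([0] * 39 + [1] * 45)

# One fold per subject: train on 83, test on the held-out one.
scores = cross_val_score(SVC(kernel="rbf", C=1.0), X, y, cv=LeaveOneOut())
print(len(scores))   # 84 folds
```

With real features, `scores.mean()` gives the subject-level LOOCV accuracy.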
Before running the classification models, run the related data transformation / feature extraction cells,
and check the input size (for the deep learning models).
The DWT feature extraction method produces a dataset of size (84, 16, 25);
the data of every subject are then flattened into 16 × 25 = 400 features.
Use different wavelets for the SVM and MLP models, such as 'bior2.8' and 'bior3.3' for the SVM.
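The 25 values per channel suggest a few statistics per DWT sub-band. The repository uses PyWavelets biorthogonal wavelets; as a self-contained illustration, here is a hedged sketch with a hand-rolled Haar DWT and 5 statistics × 5 sub-bands (the actual wavelet, level count and statistics in cell 7.1 may differ):

```python
import numpy as np

def haar_dwt_features(sig, levels=4):
    """Sketch: 4-level Haar DWT -> 4 detail bands + 1 approximation,
    then 5 statistics per band = 25 features per channel."""
    bands = []
    a = sig.astype(float)
    for _ in range(levels):
        if len(a) % 2:                          # pad to even length
            a = np.append(a, a[-1])
        d = (a[0::2] - a[1::2]) / np.sqrt(2.0)  # detail coefficients
        a = (a[0::2] + a[1::2]) / np.sqrt(2.0)  # approximation
        bands.append(d)
    bands.append(a)                             # final approximation band
    feats = []
    for b in bands:
        feats += [b.mean(), b.std(), np.abs(b).max(),
                  np.mean(b ** 2), np.median(np.abs(b))]
    return np.array(feats)

sig = np.random.randn(7680)      # one 60 s EEG channel
f = haar_dwt_features(sig)
print(f.shape)                   # (25,)
```

Applying this to all 16 channels of all 84 subjects yields the (84, 16, 25) array described above.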
The first STFT feature extraction method produces a dataset of size (84, 16, 325).
It uses a downsampled and flattened STFT;
the data of every subject are then flattened into 16 × 325 = 5200 features.
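A magnitude STFT of one channel can be sketched with windowed FFT frames. This is an illustrative NumPy version with example window/hop sizes; the notebook's actual parameters and downsampling reduce each channel to the 325 values mentioned above:

```python
import numpy as np

def stft_mag(sig, nperseg=1024, hop=384):
    """Magnitude STFT via Hann-windowed FFT frames (no padding).
    Returns an array of shape (freq_bins, n_frames)."""
    win = np.hanning(nperseg)
    n_frames = (len(sig) - nperseg) // hop + 1
    frames = np.stack([sig[i * hop : i * hop + nperseg] * win
                       for i in range(n_frames)])
    return np.abs(np.fft.rfft(frames, axis=1)).T

sig = np.random.randn(7680)      # one 60 s EEG channel
S = stft_mag(sig)
print(S.shape)                   # (513, 18) with these example parameters
```

Downsampling and flattening such a matrix per channel, then concatenating the 16 channels, gives the 5200-dimensional feature vector per subject.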
In the second STFT method, the spectrograms of the signals are not flattened,
and a dataset of size (84, 16, 513, 21) is obtained.
The CNN model takes 16-channel 513×21 matrices as input.
In the last, CWT method, scalograms of the signals (downsampled along one axis) are collected
into the resultant dataset, which has a size of (84, 16, 60, 1920).
The CNN model takes 16-channel 60×1920 matrices as input.
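The 60 × 1920 shape corresponds to 60 wavelet scales over a time axis downsampled by 4 (7680 → 1920). A hedged NumPy sketch of a Morlet scalogram with those dimensions (the notebook likely uses a wavelet library; the wavelet choice and normalization here are illustrative):

```python
import numpy as np

def morlet_scalogram(sig, scales, w0=6.0, downsample=4):
    """Crude Morlet CWT: convolve the signal with scaled wavelets,
    take magnitudes, and downsample the time axis."""
    out = np.empty((len(scales), len(sig) // downsample))
    for i, s in enumerate(scales):
        t = np.arange(-4 * s, 4 * s + 1)          # wavelet support
        wavelet = (np.exp(1j * w0 * t / s)        # complex oscillation
                   * np.exp(-0.5 * (t / s) ** 2)  # Gaussian envelope
                   / np.sqrt(s))
        coeffs = np.convolve(sig, wavelet, mode="same")
        out[i] = np.abs(coeffs)[::downsample]
    return out

sig = np.random.randn(7680)          # one 60 s EEG channel
scales = np.arange(1, 61)            # 60 scales, matching (84, 16, 60, 1920)
scalogram = morlet_scalogram(sig, scales)
print(scalogram.shape)               # (60, 1920)
```

Stacking the 16 channel scalograms per subject gives the (16, 60, 1920) CNN input.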
All the MLP models are built with Keras,
and all the CNN models are built with PyTorch (using the GPU).
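As a minimal sketch of such a PyTorch model (the repository's actual architectures in cells 9.2 and 10.2 may differ; the layer sizes here are illustrative), a CNN taking the 16-channel spectrogram stacks could look like:

```python
import torch
import torch.nn as nn

class EEGCNN(nn.Module):
    """Toy 2-class CNN for (batch, 16, H, W) spectrogram/scalogram stacks."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(16, 32, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(32, 64, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),   # collapse to (batch, 64, 1, 1)
            nn.Flatten(),
            nn.Linear(64, 2),          # logits for Normal / Schizophrenia
        )

    def forward(self, x):
        return self.net(x)

model = EEGCNN()
out = model(torch.randn(4, 16, 513, 21))   # a batch of 4 STFT stacks
print(out.shape)                           # torch.Size([4, 2])
```

The same module accepts the 60×1920 CWT inputs, since the adaptive pooling removes the spatial-size dependence.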
Please cite this repository with the name of its owner, Oğuzhan Memiş, and a link to the repository. Also, don't forget to cite the dataset owners, Borisov et al. (2005).
Contacts and suggestions are welcome: [email protected]