Initially, four classes of features (time-domain, nonlinear, frequency-based and entropy-based) are identified in the feature extraction stage. Twenty-one feature algorithms are used across these four classes, and 41 feature parameters are obtained from these algorithms. The feature values are computed with software developed in the MATLAB programming language. The obtained features are presented in Table 2.

## 3.2.1. Time-domain features

Statistical measures: In this stage, the statistical properties of EEG signals are obtained. Short descriptions of the properties are provided in Table 3.

Number of zero crossings (ZC): Zero crossing is a term commonly used in electronics, mathematics and image processing. The number of zero crossings in the EEG is believed to change during seizure activity [31]. ZC is calculated by counting the number of times that the time-domain signal crosses zero within a given window. A zero crossing at sample $n$ is defined as $\{x_{n-1} < 0 \text{ and } x_n > 0\}$ or $\{x_{n-1} > 0 \text{ and } x_n < 0\}$ or $\{x_{n-1} \neq 0 \text{ and } x_n = 0\}$.
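The counting rule can be illustrated with a minimal Python sketch. It assumes a crossing is counted whenever two consecutive samples have opposite signs or the signal arrives exactly at zero from a nonzero value; the function name `zero_crossings` is hypothetical and is not part of the MATLAB software used in the study.

```python
def zero_crossings(window):
    """Count zero crossings in one window of samples.

    Assumed rule: a crossing occurs at sample n when the signal goes
    from negative to positive, from positive to negative, or from a
    nonzero value to exactly zero.
    """
    count = 0
    for prev, cur in zip(window, window[1:]):
        if (prev < 0 and cur > 0) or (prev > 0 and cur < 0) or (prev != 0 and cur == 0):
            count += 1
    return count
```

For example, `zero_crossings([1, -1, 2, 0, 0, -3])` counts three crossings: two sign changes and one arrival at zero.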

Hjorth parameters: Three quantitative descriptors of the EEG, namely activity, mobility and complexity, are presented in [32]. These parameters can be related to the variance of the signal and of its first and second derivatives within a segment. If the variance of the original signal is $\sigma_0^2$, then the variance of the $i$-th derivative can be denoted by $\sigma_i^2$. Table 2 lists the Hjorth parameters calculated from these variances.

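Table 2. Hjorth parameters ($\sigma_0^2$ is the variance of the signal and $\sigma_i^2$ is the variance of its $i$-th derivative)

| Feature name | Formula |
|---|---|
| Activity (HA) | $\sigma_0^2$ |
| Mobility (HM) | $\sigma_1 / \sigma_0$ |
| Complexity (HC) | $\dfrac{\sigma_2 / \sigma_1}{\sigma_1 / \sigma_0}$ |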
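As an illustration of how the three Hjorth parameters can be obtained from a sampled segment, the following sketch uses the standard definitions (activity as the signal variance, mobility as the ratio of standard deviations of the first derivative and the signal, complexity as the mobility of the first derivative divided by the mobility of the signal). It assumes that derivatives are approximated by first differences and that population variance is used; the helper names are hypothetical.

```python
from statistics import pvariance


def first_difference(x):
    """Approximate the derivative of a sampled signal by first differences."""
    return [b - a for a, b in zip(x, x[1:])]


def hjorth_parameters(x):
    """Return (activity, mobility, complexity) for a segment x.

    Requires at least three samples and a non-constant signal,
    otherwise a variance in a denominator is zero.
    """
    d1 = first_difference(x)
    d2 = first_difference(d1)
    var0 = pvariance(x)    # variance of the signal
    var1 = pvariance(d1)   # variance of the first derivative
    var2 = pvariance(d2)   # variance of the second derivative
    activity = var0
    mobility = (var1 / var0) ** 0.5
    complexity = (var2 / var1) ** 0.5 / mobility
    return activity, mobility, complexity
```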

## 3.2.2. Nonlinear features

Petrosian fractal dimension (PFD): One of the simplest and fastest methods for estimating the fractal dimension [33], PFD is based on the sign changes in the derivative of the signal. For discrete signals, the derivative can be defined as the difference between two consecutive samples [33]. PFD can be estimated by the following expression:

$$\mathrm{PFD} = \frac{\log_{10} N}{\log_{10} N + \log_{10}\left(\dfrac{N}{N + 0.4\, N_{\delta}}\right)} \tag{8}$$

where $N$ is the number of samples in the signal and $N_{\delta}$ is the number of sign changes in the signal derivative.
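A minimal sketch of the Petrosian estimate is given below. It assumes the commonly used form $\log_{10} N / (\log_{10} N + \log_{10}(N / (N + 0.4 N_{\delta})))$, with the derivative taken as the difference of consecutive samples; the function name is hypothetical.

```python
from math import log10


def petrosian_fd(x):
    """Estimate the Petrosian fractal dimension of a sampled signal x."""
    n = len(x)
    # Discrete derivative: difference of consecutive samples.
    diff = [b - a for a, b in zip(x, x[1:])]
    # Number of sign changes in the derivative.
    n_delta = sum(1 for a, b in zip(diff, diff[1:]) if a * b < 0)
    return log10(n) / (log10(n) + log10(n / (n + 0.4 * n_delta)))
```

A monotonic signal has no sign changes in its derivative and gives a value of 1, while a rapidly alternating signal gives a value above 1.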

Mean Teager energy (MTE): MTE, first proposed in [34], is defined as

$$\mathrm{MTE} = \frac{1}{N} \sum_{m=k-N+3}^{k} \left( x_{m-1}^{2} - x_{m}\, x_{m-2} \right) \tag{9}$$

where $x$ is an EEG time series, $N$ is the window length and $k$ is the last sample in the epoch.

Mean energy (ME): Seizures are characterized by increased signal energy, and hence mean energy is a good indicator of seizures. ME is defined as

$$\mathrm{ME} = \frac{1}{N} \sum_{m=k-N+1}^{k} x_{m}^{2} \tag{10}$$

where $x$ is an EEG time series, $N$ is the window length and $k$ is the last sample in the epoch.

Mean curve length (CL): CL [36] is an approximation of Katz's fractal dimension [37] and is used as an effective measure for seizure detection from EEG. CL is defined as

$$\mathrm{CL} = \frac{1}{N} \sum_{m=k-N+2}^{k} \left| x_{m-1} - x_{m} \right| \tag{11}$$

where $x$ is an EEG time series, $N$ is the window length and $k$ is the last sample in the epoch.
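The three window-based measures (MTE, ME and CL) can be sketched together. The code assumes the usual definitions: MTE averages $x_{m-1}^2 - x_m x_{m-2}$, ME averages $x_m^2$, and CL averages $|x_{m-1} - x_m|$, each divided by the window length $N$. The argument `w` is assumed to hold the last $N$ samples of the epoch, and all function names are hypothetical.

```python
def mean_teager_energy(w):
    """Mean Teager energy of a window w (needs at least 3 samples)."""
    n = len(w)
    return sum(w[i - 1] ** 2 - w[i] * w[i - 2] for i in range(2, n)) / n


def mean_energy(w):
    """Mean of the squared samples in the window."""
    return sum(v * v for v in w) / len(w)


def mean_curve_length(w):
    """Mean absolute difference between consecutive samples in the window."""
    return sum(abs(w[i - 1] - w[i]) for i in range(1, len(w))) / len(w)
```

For the window `[1, 2, 3, 4]` these functions give 0.5, 7.5 and 0.75, respectively.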

## 3.2.3. Frequency and time-frequency-based features

Wigner-Ville distribution (WV): A distribution that displays time and frequency information on the same plane, WV can be successfully used in many applications that involve non-stationary signals [38]. The calculation of WV is shown in Equation 13.

$$W(t, f) = \int_{-\infty}^{\infty} x\left(t + \frac{\tau}{2}\right) x^{*}\left(t - \frac{\tau}{2}\right) e^{-j 2 \pi f \tau}\, d\tau \tag{13}$$

Here, $t$ is time, $\tau$ is the time lag, $x^{*}$ represents the complex conjugate of $x$ and $f$ is the frequency. The features are derived from the time-frequency plane obtained after applying the Wigner-Ville transform to the signal. The highest frequency values corresponding to the time values in the plane are used for this process. In this study, the function consisting of these frequency values is modeled by a third-order polynomial and the coefficients of this polynomial are used as features [39]. In this manner, four features (WV-1, WV-2, WV-3, WV-4), consisting of the coefficients of this polynomial, are obtained.

Discrete wavelet transform and wavelet coefficients: A spectral analysis technique used in the analysis of non-stationary signals [8, 40], the discrete wavelet transform (DWT) provides time-frequency representations of signals by using long time windows at low frequencies and short time windows at high frequencies. This leads to good time-frequency localization. The DWT of a signal $x(t)$ is the integral of the signal multiplied by scaled and shifted versions of a wavelet function $\psi$ and is defined by

$$\mathrm{DWT}(j, k) = \frac{1}{\sqrt{2^{j}}} \int_{-\infty}^{\infty} x(t)\, \psi\left(\frac{t - 2^{j} k}{2^{j}}\right) dt \tag{14}$$

DWT analyzes the signal at different resolutions by decomposing it into approximation and detail coefficients [41]. The downsampled outputs of the first high-pass filter (g[·]) and the first low-pass filter (h[·]) form the detail D1 and approximation A1 sub-bands, respectively. The A1 approximation band is decomposed again and the process is continued as seen in Figure 2.

Figure 2. Sub-band decomposition of the DWT implementation; g[n] is the high-pass filter and h[n] is the low-pass filter. The input x[n] is passed through g[n] and h[n], each followed by downsampling by 2, to give D1 and A1; A1 is decomposed in the same way into D2 and A2, and A2 into D3 and A3.

EEG signals are decomposed into sub-bands by using the DWT with the Daubechies wavelet of order 4 (db4) because it yields good results in the classification of EEG segments [9]. In this study, EEG signals are decomposed up to level 5 using the db4 wavelet. At the end of the process, five detail coefficient signals (D1-D5) and one approximation coefficient signal (A5) are extracted. The feature vectors calculated for the frequency bands A5 and D3-D5 are used for the classification of EEG signals. For example, Figure 3 shows the approximation (A5) and details (D1-D5) of an Awake EEG signal, and Figure 4 shows the approximation (A5) and details (D1-D5) of an N-REM2 EEG signal.

The extracted wavelet coefficients provide a compact representation that shows the energy distribution of the EEG signal in time and frequency. Statistics over the set of wavelet coefficients are then used to reduce the dimensionality of the extracted feature vectors [7]. To represent the time-frequency distribution of the EEG signals, the following statistical features are employed:

(1) Mean of the absolute values of the coefficients in each sub-band (D3-1, D4-1, D5-1, and A5-1).

(2) Average power of the wavelet coefficients in each sub-band (D3-2, D4-2, D5-2, and A5-2).

(3) Standard deviation of the coefficients in each sub-band (D3-3, D4-3, D5-3, and A5-3).

(4) Ratio of the absolute mean values of adjacent sub-bands (D3-4, D4-4, D5-4, and A5-4).

Features 1 and 2 represent the frequency distribution of the signal, whereas features 3 and 4 show the amount of change in the frequency distribution. Sixteen features are obtained in this manner.
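The four sub-band statistics can be sketched as follows, given the coefficients of each sub-band as plain lists (for instance, the output of a level-5 db4 decomposition computed elsewhere). Two points are assumptions: the population standard deviation is used, and the ratio feature is taken as the mean absolute value of one sub-band divided by that of a neighbouring sub-band, since the exact pairing is not specified here. All function names are hypothetical.

```python
from statistics import pstdev


def mean_abs(coeffs):
    """Feature 1: mean of the absolute values of the coefficients."""
    return sum(abs(c) for c in coeffs) / len(coeffs)


def average_power(coeffs):
    """Feature 2: average power (mean of squared coefficients)."""
    return sum(c * c for c in coeffs) / len(coeffs)


def std_dev(coeffs):
    """Feature 3: standard deviation of the coefficients (population form)."""
    return pstdev(coeffs)


def adjacent_ratio(coeffs, neighbour_coeffs):
    """Feature 4: ratio of mean absolute values of two adjacent sub-bands."""
    return mean_abs(coeffs) / mean_abs(neighbour_coeffs)
```

Applying the four functions to each of the sub-bands D3, D4, D5 and A5 yields the 16 wavelet features.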

Figure 3. EEG signal belonging to the Awake stage, with a 30-second epoch from PSG recordings

Figure 4. EEG signal belonging to N-REM stage 2, with a 30-second epoch from PSG recordings

## 3.2.4. Entropy-based features

Spectral entropy (SpEn): SpEn is a measure of signal regularity. A pure sine wave has an entropy of zero, whereas uncorrelated white noise has an entropy of one. In contrast to ordinary entropy, SpEn is computed from the probabilities of the components of the signal's power spectrum. Data with a uniform probability distribution have higher entropy [42].

SpEn is based on the power spectrum of EEG waves and describes the irregularity of the signal spectrum. The entropy of the power spectrum is denoted by $H_{sp}$ and defined as

$$H_{sp} = -\sum_{f=f_l}^{f_h} P_f \log P_f \tag{15}$$

where $P_f$ is the power density over a defined frequency band of the signal, $f_l$ and $f_h$ are the lower and upper frequencies, and the power is normalized such that $\sum_f P_f = 1$ [42]. $H_{sp}$ is also used in the normalized form

$$\mathrm{SpEn} = \frac{H_{sp}}{\log N_f} \tag{16}$$

where $N_f$ is the number of frequencies within the defined band. In this work the frequency band is specified as [0, 50] Hz.
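A minimal sketch of the normalized spectral entropy is shown below. It assumes the power spectrum values inside the chosen band (for example 0 to 50 Hz) have already been computed, that the entropy is the Shannon form $-\sum P_f \log P_f$ over the normalized powers, and that normalization divides by the logarithm of the number of frequency bins. The function name is hypothetical.

```python
from math import log


def spectral_entropy(power):
    """Normalized spectral entropy of the power values in one band.

    `power` holds non-negative power values for at least two
    frequency bins; the result lies between 0 and 1.
    """
    total = sum(power)
    probs = [p / total for p in power]  # normalize so the powers sum to 1
    h = -sum(p * log(p) for p in probs if p > 0)
    return h / log(len(power))
```

A spectrum with all power in one bin (a pure sine wave) gives 0, and a flat spectrum (white noise) gives 1.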

Renyi entropy (RE): Renyi entropy, introduced by Alfred Renyi [43], is a special case of spectral entropy based on the concept of the generalized entropy of a probability distribution. Equation (17) gives the definition of RE.

$$\mathrm{RE}(\alpha) = \frac{1}{1 - \alpha} \log \sum_{i} p_i^{\alpha} \tag{17}$$

where $\alpha$ is the order of the entropy ($\alpha \geq 0$, $\alpha \neq 1$) and $p_i$ are the probabilities of the distribution. RE is calculated with a fixed order $\alpha$ in this study.
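The Renyi entropy of a discrete distribution can be sketched as follows, assuming the standard form $\frac{1}{1-\alpha}\log\sum_i p_i^{\alpha}$ with the natural logarithm. The order used in the usage note ($\alpha = 2$) is only an illustrative choice, not the value used in the study, and the function name is hypothetical.

```python
from math import log


def renyi_entropy(probs, alpha):
    """Renyi entropy of order alpha for probabilities that sum to 1."""
    if alpha == 1:
        # The order-1 case is the Shannon limit and needs a separate formula.
        raise ValueError("alpha must differ from 1")
    return log(sum(p ** alpha for p in probs if p > 0)) / (1 - alpha)
```

For a uniform distribution over four outcomes, `renyi_entropy([0.25] * 4, 2)` equals $\log 4$, which is the maximum for four outcomes.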

## FEATURE SELECTION ALGORITHMS

Feature selection, an important part of pattern recognition and machine learning, decreases computation cost and increases classification performance. Finding a suitable representation of the data from all features is an important problem in machine learning and data mining. Not all original features are useful for classification or regression tasks, since some features are irrelevant, redundant or noisy in the distribution of the data set and decrease classification performance. Feature selection should therefore be used in classification or regression problems so that classification performance is enhanced and the computation cost of classifiers is decreased (Cao, Shen, Sun, Yang, & Chen, 2007). The FCBF, t-test, ReliefF, Fisher score and mRMR algorithms, all efficient feature selection algorithms, were preferred in the current study. Brief information about these algorithms is provided below:

## Fast Correlation-Based Filter (FCBF)

The Fast Correlation-Based Filter, which is based on the relevance of features and the redundancy among them, is an efficient feature selection algorithm (Figure 5). FCBF comprises two steps: selecting relevant features and removing redundant ones from the subset selected in the first step [49]. To evaluate feature relevance and feature redundancy, FCBF uses the symmetrical uncertainty (SU) measure, which is the information gain of a random variable $X$ provided by another random variable $Y$, normalized by the sum of their entropy values, i.e.,

$$SU(X, Y) = 2 \left[ \frac{IG(X \mid Y)}{H(X) + H(Y)} \right] \tag{19}$$

$$IG(X \mid Y) = H(X) - H(X \mid Y) \tag{20}$$

In Equation 19, $X$ and $Y$ represent a vector pair composed of any two features or of a feature and a class label. Here, $IG(X \mid Y)$ is the information gain of $X$ after observing variable $Y$ [49]. The entropies of variables $X$ and $Y$ are $H(X)$ and $H(Y)$, respectively, with $H(X) = -\sum_i P(x_i) \log_2 P(x_i)$, where $P(x_i)$ is the probability that $X$ takes the value $x_i$. The entropy of $X$ after observing values of another variable $Y$ is defined as

$$H(X \mid Y) = -\sum_{j} P(y_j) \sum_{i} P(x_i \mid y_j) \log_2 P(x_i \mid y_j) \tag{21}$$

where $P(y_j)$ are the prior probabilities for all values of $Y$ and $P(x_i \mid y_j)$ are the posterior probabilities of $X$ given the values of $Y$ [49].

An SU value of 1 indicates perfect correlation between features, whereas a value of 0 indicates that they are independent. The relevance of the $i$-th feature with respect to the class $c$ can be defined by $SU(i, c)$, and the redundancy between two features indexed by $i$ and $j$ by $SU(i, j)$. FCBF initially chooses all features with relevance values higher than a predefined threshold (between 0 and 1). It then removes the redundant features among the selected ones, namely those that have approximate Markov blankets in the remaining features. For two relevant features indexed by $i$ and $j$, the $j$-th feature is defined to form an approximate Markov blanket for the $i$-th feature if and only if $SU(j, c) \geq SU(i, c)$ and $SU(i, j) \geq SU(i, c)$.
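The SU measure and the approximate Markov blanket test can be sketched for discrete (or already discretized) variables. The sketch assumes the standard definitions $SU(X, Y) = 2\,IG(X \mid Y) / (H(X) + H(Y))$ and $IG(X \mid Y) = H(X) - H(X \mid Y)$ with base-2 logarithms, and estimates probabilities by relative frequencies. The function names are hypothetical and do not reproduce a full FCBF implementation.

```python
from collections import Counter
from math import log2


def entropy(values):
    """Shannon entropy (bits) of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in Counter(values).values())


def conditional_entropy(x, y):
    """H(X | Y): entropy of x within each value of y, weighted by P(y)."""
    n = len(x)
    h = 0.0
    for y_val, y_count in Counter(y).items():
        x_given_y = [xi for xi, yi in zip(x, y) if yi == y_val]
        h += (y_count / n) * entropy(x_given_y)
    return h


def symmetrical_uncertainty(x, y):
    """SU(X, Y) in [0, 1]; returns 0 when both variables are constant."""
    hx, hy = entropy(x), entropy(y)
    if hx + hy == 0:
        return 0.0
    info_gain = hx - conditional_entropy(x, y)
    return 2.0 * info_gain / (hx + hy)


def forms_approx_markov_blanket(su_jc, su_ic, su_ij):
    """True if feature j forms an approximate Markov blanket for feature i."""
    return su_jc >= su_ic and su_ij >= su_ic
```

Two identical variables give an SU of 1, and two independent variables give an SU of 0.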

Decision trees (C4.5): Decision trees have been successfully used in solving problems related to machine learning and classifier systems, and they are the most commonly used data mining technique. Decision trees work by developing a series of "if-then" rules. Each rule assigns an observation to one segment of the tree, at which point another "if-then" rule is applied. The initial segment contains the entire data set and forms the root node of the decision tree [52]. Unlike neural networks and regression, decision trees do not work with interval data. Decision trees work with nominal outcomes that have more than two possible results and with ordinal outcome variables [52].

C4.5 decision tree learning is a method used for learning discrete-valued classification functions, in which the learned function is represented by a C4.5 decision tree [29]. The aim of C4.5 decision tree learning is to partition the data recursively into subgroups (see [29] for more information on C4.5 decision tree learning).

Multilayer Perceptron Neural Network (FFNN): In this study, a three-layer multilayer perceptron feed-forward neural network architecture is used and trained with the error backpropagation algorithm. FFNN is a non-parametric artificial neural network technique that performs various identification and prediction tasks [53-55]. A multilayer neural network includes an input layer, an output layer and one or more hidden layers. The hidden-layer neurons and the output-layer neurons use nonlinear sigmoid activation functions. The classification accuracy of the applied FFNN model is examined with respect to the number of hidden neurons, and the minimum number of neurons that gave the best results is identified as 5. In this system, the seven inputs (Table 4) are features and the two outputs indicate the two classes (epilepsy and normal).

Radial Basis Network (RBF): This is a different approach that treats the problem as curve fitting in a multidimensional space. Radial basis functions have been used in the solution of multivariable problems in numerical analysis and in the design and development of artificial neural networks (ANNs). RBF networks are composed of an input layer, a hidden layer and an output layer. The transformation from the input layer to the hidden layer is a fixed nonlinear transformation with radial basis activation functions, whereas the transformation from the hidden layer to the output layer is linear. The free parameters that can be adjusted in an RBF network are the center vectors, the widths of the radial functions and the output-layer weights. Detailed information about the implementation of the RBF and FFNN structures can be found in the neural network toolbox section of the MATLAB documentation [56].

## 3.5.4. k-Fold cross-validation

This method was developed to improve on the holdout method, which is the simplest form of cross-validation. In k-fold cross-validation, the given data set is subdivided into k subsets and the simple holdout method is repeated k times [59]. In each repetition, one of the k subsets is used as the test set and the other (k - 1) subsets are combined to form the training set. After the k trials are completed in this manner, the mean error across all k trials is calculated. The most important advantage of this method is that it matters little how the data are divided: every data point appears in the test set exactly once and in the training set (k - 1) times. As k increases, the variance of the resulting estimate decreases. However, the method has the disadvantage that the training algorithm must be rerun from scratch k times, so the evaluation requires k times as much computation. A variant of this method is to randomly divide the data into a test set and a training set k times. The advantage of this variant is that the size of each test set and the number of trials can be chosen independently [59]. 10-fold cross-validation was used in this study.
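The splitting scheme can be illustrated with a short sketch that generates the training and test indices for each of the k trials. It assumes samples are assigned to folds in an interleaved fashion without shuffling, which is one of several valid ways to partition the data; the function name `k_fold_indices` is hypothetical.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, test_indices) for each of the k folds.

    Sample i is placed in fold i % k, so every sample is tested
    exactly once and used for training in the other k - 1 trials.
    """
    folds = [list(range(i, n_samples, k)) for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test
```

With `k_fold_indices(n, 10)`, a classifier would be trained and evaluated on each of the ten splits, and the ten error values averaged to obtain the 10-fold estimate.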