Besides the video, here is also a copy-paste from the documentation describing the four normalization methods.
Four methods are provided here for normalizing data (these methods are also explained in the attached Example Process):
z_transformation: This is also called statistical normalization. Its purpose is to rescale the data so that it has mean = 0 and variance = 1.
The formula of statistical normalization is Z = (X - u) / s.
You take the attribute values as a vector X, subtract the mean of the attribute values, u, and divide the difference by the standard deviation, s. The resulting vector Z has zero mean and unit variance. The shape of the distribution does not change, so Z is normally distributed only if X was.
If X is normally distributed, Z follows the standard Normal distribution, N(0,1). Note that the range of the standard Normal distribution is not [0,1]: it is unbounded (minus infinity to plus infinity), but about 99.7% of the values fall between -3 and +3.
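The formula Z = (X - u) / s can be sketched in a few lines of plain Python. This is only an illustration, not RapidMiner's implementation: the function name `z_transform` is made up here, and the use of the population standard deviation is an assumption (the documentation above does not say whether the sample or the population formula is used).

```python
import statistics

def z_transform(values):
    """Rescale values to zero mean and unit variance: Z = (X - u) / s."""
    mean = statistics.fmean(values)
    # Assumption: population standard deviation (divide by n, not n - 1).
    std = statistics.pstdev(values)
    if std == 0:
        raise ValueError("all values are equal; z-scores are undefined")
    return [(x - mean) / std for x in values]

# Mean is 5 and population standard deviation is 2 for this sample.
scores = z_transform([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])
print(scores)  # [-1.5, -0.5, -0.5, -0.5, 0.0, 0.0, 1.0, 2.0]
```

Note that the result is not confined to [0,1]: values below the mean become negative.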
range_transformation: When this method is selected, two other parameters (min, max) appear in the Parameter View. Range transformation rescales all attribute values into the specified range [min, max], whose bounds are set with the min and max parameters respectively.
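A minimal sketch of range transformation, assuming the usual linear min-max rescaling (the documentation above does not spell out the formula). The name `range_transform` and its parameters `new_min` and `new_max` are hypothetical stand-ins for the operator's min and max parameters.

```python
def range_transform(values, new_min=0.0, new_max=1.0):
    """Linearly rescale values so the smallest maps to new_min and the largest to new_max."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("all values are equal; the range is zero")
    return [new_min + (x - lo) / (hi - lo) * (new_max - new_min) for x in values]

scaled = range_transform([10.0, 15.0, 20.0], new_min=0.0, new_max=1.0)
print(scaled)  # [0.0, 0.5, 1.0]
```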
proportion_transformation: Each attribute value is normalized as a proportion of the total sum of the respective attribute, i.e. each attribute value is divided by the sum of all values of that attribute.
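As a small illustration, proportion transformation amounts to dividing every value by the column total. The function name `proportion_transform` is invented for this sketch.

```python
def proportion_transform(values):
    """Divide each value by the sum of all values of the attribute."""
    total = sum(values)
    if total == 0:
        raise ValueError("the values sum to zero; proportions are undefined")
    return [x / total for x in values]

shares = proportion_transform([1.0, 3.0, 4.0])
print(shares)  # [0.125, 0.375, 0.5]
```

After the transformation the values of the attribute sum to 1.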
interquartile_range: Normalization is performed using the interquartile range. The plain range is the difference between the largest and the smallest value in the data set.
Since the range takes into account only two values from the entire data set, it may be heavily influenced by outliers in the data.
Therefore another criterion, the interquartile range, is commonly used. It is the distance between the 25th and 75th percentiles (Q3 - Q1), i.e. essentially the range of the middle 50% of the data. Because it uses only the middle 50%, the interquartile range is far less affected by outliers or extreme values.
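The difference between the range and the interquartile range is easy to see on a sample with one outlier. The sketch below uses Python's `statistics.quantiles`. The scaling step, (x - median) / IQR, is an assumption: it is a common robust scaling, but the documentation above does not state the exact formula the operator applies. The name `iqr_transform` is hypothetical.

```python
import statistics

def iqr_transform(values):
    """Center on the median and scale by the interquartile range (assumed formula)."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    if iqr == 0:
        raise ValueError("interquartile range is zero")
    return [(x - median) / iqr for x in values]

data = [1, 2, 3, 4, 5, 6, 7, 8, 100]  # 100 is an outlier
q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")

print(max(data) - min(data))  # 99: the range is dominated by the outlier
print(q3 - q1)                # 4.0: the IQR ignores it
robust = iqr_transform(data)
```

The outlier still gets a large scaled value, but it no longer compresses the rest of the data into a tiny interval.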
Discretize by Frequency (RapidMiner Studio Core)
This operator converts the selected numerical attributes into nominal attributes by discretizing each numerical attribute into a user-specified number of bins. Bins of equal frequency are generated automatically, so the value ranges of different bins may vary.
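Equal-frequency binning can be sketched as follows: sort the values, then give each bin roughly the same number of examples. This is a simplified illustration, not the operator's actual algorithm. The function name and the bin labels (`range1`, `range2`, ...) are made up here, and in this sketch identical values may land in different bins.

```python
def discretize_by_frequency(values, number_of_bins):
    """Assign each value a nominal bin label so that bins hold about equally many values."""
    n = len(values)
    # Indices of the values in ascending order of value.
    order = sorted(range(n), key=lambda i: values[i])
    labels = [None] * n
    for rank, index in enumerate(order):
        bin_index = rank * number_of_bins // n
        labels[index] = f"range{bin_index + 1}"
    return labels

labels = discretize_by_frequency([5, 1, 9, 3, 7, 11], number_of_bins=3)
print(labels)  # ['range2', 'range1', 'range3', 'range1', 'range2', 'range3']
```

Each bin holds two values here, while the bins cover different value ranges whenever the data is unevenly spread.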
Map
This operator maps specified values of selected attributes to new values. It can be applied to both numerical and nominal attributes.
It can be used to replace nominal values (e.g. replace the value 'green' by the value 'green_color') as well as numerical values (e.g. replace all values '3' by '-1').
However, a single use of this operator can map attributes of only one type.
A single mapping can be specified using the parameters replace what and replace by, as in the Replace operator.
Multiple mappings can be specified through the value mappings parameter. Additionally, the operator allows defining a default mapping.
The operator lets you select the attributes in which mappings are made. It also lets you specify a regular expression instead of a literal value: attribute values of the selected attributes that match this regular expression are mapped by the specified value mapping.
Please go through the parameters and the Example Process to develop a better understanding of this operator.
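The behaviour described above (value mappings, a default mapping, optional regular expressions) can be imitated for nominal values with a short Python function. All names here (`map_values`, `value_mappings`, `default`, `use_regex`) are hypothetical, and matching the whole value with `re.fullmatch` is an assumption about how the regular expression is applied.

```python
import re

def map_values(values, value_mappings, default=None, use_regex=False):
    """Replace each value by its mapped value; unmatched values get `default` if given."""
    result = []
    for value in values:
        for old, new in value_mappings.items():
            # Assumption: a regular expression must match the entire value.
            matched = re.fullmatch(old, value) if use_regex else value == old
            if matched:
                result.append(new)
                break
        else:
            # No mapping matched: keep the value unless a default is defined.
            result.append(value if default is None else default)
    return result

colors = ["green", "grey", "red"]
print(map_values(colors, {"green": "green_color"}))
# ['green_color', 'grey', 'red']
print(map_values(colors, {"gr.*": "g_color"}, use_regex=True))
# ['g_color', 'g_color', 'red']
print(map_values(colors, {"green": "green_color"}, default="other"))
# ['green_color', 'other', 'other']
```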