Determining Composition of Grain Mixtures by Texture Classification Based on Feature Distributions

Timo Ojala, Matti Pietikäinen, and Jarkko Nisula
Department of Electrical Engineering, University of Oulu, FIN-90570 OULU, FINLAND

Abstract

Texture analysis has many areas of potential application in industry. The problem of determining the composition of grain mixtures by texture analysis was recently studied by Kjell, who obtained promising results using all nine Laws' 3x3 features simultaneously and an ordinary feature vector classifier. In this paper the performance of texture classification based on feature distributions is evaluated on this problem. The results obtained are compared to those obtained with a feature vector classifier. The use of distributions of gray level differences as texture measures is also considered.

Keywords: machine vision, visual inspection, texture analysis, feature distribution

1. Introduction

Texture analysis has many areas of potential application in industry. The proposed applications include, for example, the detection and identification of surface defects on metal surfaces, textiles or semiconductor wafers, the assessment of carpet wear or wheat hardness, and the determination of grain sizes of various materials in the process industry. A recent survey of applications of texture analysis in industry is presented by Pietikäinen and Ojala11.

Kjell proposed that the composition of grain mixtures can be determined by texture classification when different compositions are seen as different textures and the classification into discrete classes is considered as a measurement event6. The accuracy of the measurement depends heavily on the number of classes and on the discriminative power of the texture features. Kjell examined the performance of Laws' texture energy measures and ordinary feature vector based classification using images of eleven different mixtures of rice and barley as test material.
Kjell got promising results when using all nine Laws' features at the same time: about 61 percent of the samples were classified into the correct classes, and the misclassified samples were close to their own classes on the diagonal of the confusion matrix. A sample size of 128x128 pixels was used.

Recently, we introduced a set of new measures for texture classification based on center-symmetric auto-correlation, using Kullback discrimination of sample and prototype distributions, and conducted an extensive comparative study of texture measures with classification based on feature distributions3,10. By using two standard sets of test images we showed that very good texture discrimination can be obtained by using simple texture measures and classification based on distributions of feature values. The very good results obtained in these studies suggest that distributions of feature values should be used instead of single values. The choice of proper texture features for classification is extremely important. In our experiments, simple texture measures based on gray level differences and local binary patterns outperformed Laws' features and center-symmetric covariance measures. Earlier, Ohanian and Dubes used the same image data to study the performance of four types of features: Markov Random Field parameters, Gabor multi-channel features, fractal-based features and co-occurrence features9.

In this paper the proposed approach is applied to Kjell's problem in order to evaluate its efficiency in this kind of application. The results obtained with a conventional feature vector classifier and with a distribution classifier are compared using Laws' texture measures. The use of distributions of gray level differences as texture measures is also considered.

2. Test Material and Arrangements

In order to have test material comparable to that used by Kjell, we prepared eleven different mixtures of rice and barley grain.
The two grains were rather similar in size; rice was a little more elongated than barley, and barley had a wider range of gray levels than rice. Four RGB images of each mixture were taken using a SONY 3 CCD DXC-755 camera, and the material was mixed before every image. The images were 488x512 pixels in size, and the horizontal and vertical resolutions were 0.54 mm and 0.35 mm, respectively. The RGB images were converted to gray level intensity images using formula (1), and the image size was changed to 512x512 by bilinear interpolation in order to have square pixels.
I = 0.299R + 0.587G + 0.114B    (1)
The average gray level of images that contained only barley was roughly 127. As the portion of white rice increased, the average gray level also rose, reaching about 153 for images containing only rice. Four of the test images are shown in Fig. 1.
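For readers implementing the preprocessing, the conversion of formula (1) can be sketched in a few lines of NumPy. The function name and the array handling below are our own illustrative choices, not part of the original setup.

```python
import numpy as np

def rgb_to_intensity(rgb):
    """Convert an RGB image (H x W x 3) to a gray level intensity
    image using formula (1): I = 0.299R + 0.587G + 0.114B."""
    rgb = np.asarray(rgb, dtype=np.float64)
    weights = np.array([0.299, 0.587, 0.114])
    # Weighted sum over the last (channel) axis.
    return rgb @ weights
```

Since the weights sum to 1.0, a uniform RGB value maps to the same intensity value, which preserves the overall brightness scale of the images.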
Figure 1. Rice 100% (a). Barley 100% (b). Rice 70%, barley 30% (c). Rice 30%, barley 70% (d).
3. Classification Principles

Most approaches to texture classification quantify texture measures by single values (means, variances, etc.), which are then concatenated into a feature vector. The feature vector is fed to an ordinary statistical pattern recognition procedure or a neural network to perform classification. In this way much of the important information contained in the whole distributions of feature values might be lost. There are earlier results which indicate that the whole distributions of feature values provide very discriminative information about textures4,13,14. Recently, Ojala et al.10 conducted an extensive comparative study of texture measures with classification based on feature distributions. By using a standard set of test images they showed that very good texture discrimination can be obtained by using simple texture measures and a classification principle based on a comparison of sample and prototype distributions of feature values. In the experiments of this study we compared the results obtained with both of these classification principles.

For feature vector based classification a nonparametric k-nearest neighbor (kNN) classifier was used2. It constructs the decision rule directly, without making any assumptions about the class conditional densities. The class of a feature vector is selected to be the class of the majority among its k nearest neighbors. The basic motivation for this is that patterns which lie close to each other in the feature space are likely to belong to the same class. In the experiments, k = 3 turned out to be a reasonable choice. The sample data was split into training and test sets using the well-known leave-one-out method1. The classifier was designed by choosing all but one sample for inclusion in the design set, and the single remaining sample was then classified. This procedure was repeated for all samples, and the classification error rate was determined as the percentage of misclassified samples out of the total number of samples.

A log-likelihood-ratio test, the G test, was used for comparing feature distributions4,5,10,12. The G test is closely related to Kullback's minimum cross-entropy principle7. The value of the computed G statistic indicates the probability that the two sample distributions come from the same population: the higher the value, the lower the probability that the two samples are from the same population. For a goodness-of-fit test the G statistic is

G = 2 \sum_{i=1}^{n} s_i \log ( s_i / m_i )    (2)
where s and m are the sample and model distributions, n is the number of bins, and s_i, m_i are the respective sample and model probabilities at bin i.

In our experiments a single model distribution for every class was not used. Every sample was in its turn classified using the other samples as models; hence the leave-one-out approach was applied. The model samples were ordered according to their probability of coming from the same population as the test sample. This probability was measured by a two-way test of independence:

G = 2 \left[ \sum_{s,m} \sum_{i=1}^{n} f_i \log f_i - \sum_{i=1}^{n} \left( \sum_{s,m} f_i \right) \log \left( \sum_{s,m} f_i \right) - \sum_{s,m} \left( \sum_{i=1}^{n} f_i \right) \log \left( \sum_{i=1}^{n} f_i \right) + \left( \sum_{s,m} \sum_{i=1}^{n} f_i \right) \log \left( \sum_{s,m} \sum_{i=1}^{n} f_i \right) \right]    (3)
where s, m are the two texture samples (test sample and model), n is the number of bins, and f_i is the frequency at bin i. For a detailed derivation of the formula, see Sokal and Rohlf12. After the model samples were ordered, the test sample was classified using the k-nearest neighbor principle, i.e. the test sample was assigned to the class of the majority among its k nearest models.

The feature space was quantized by adding together the feature distributions obtained for every single model image into a total distribution, which was divided into N bins having an equal number of entries. Hence, the cut values of the bins of the histograms corresponded to 100/N percentiles of the combined data (e.g., 3.125 for N = 32). Deriving the cut values from the total distribution and allocating every bin the same amount of the combined data guarantees that the highest resolution of the quantization is used where the number of entries is largest, and vice versa.

4. Texture Measures Used

4.1 Laws' Texture Measures

The "texture energy measures" developed by Laws8 have been widely used in texture analysis. Laws' properties are derived from three simple vectors of length 3, L3 = (1,2,1), E3 = (-1,0,1) and S3 = (-1,2,-1), which represent the one-dimensional operations of center-weighted local averaging, symmetric first differencing (edge detection), and second differencing (spot detection), respectively. If we now multiply the column vectors of length 3 by row vectors of the same length, we obtain Laws' 3x3 masks. The eight zero-sum 3x3 masks (i.e. all but L3L3) are shown in Figure 2.

L3E3          E3L3          E3E3          E3S3
-1  0  1     -1 -2 -1      1  0 -1      1 -2  1
-2  0  2      0  0  0      0  0  0      0  0  0
-1  0  1      1  2  1     -1  0  1     -1  2 -1

S3L3          S3E3          S3S3          L3S3
-1 -2 -1      1  0 -1      1 -2  1     -1  2 -1
 2  4  2     -2  0  2     -2  4 -2     -2  4 -2
-1 -2 -1      1  0 -1      1 -2  1     -1  2 -1

Figure 2. Eight Laws' masks.
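The outer-product construction of the masks, together with a variance ("texture energy") feature computed from the mask responses, can be sketched as follows. The function names are our own, and this is an illustrative sketch rather than the original implementation.

```python
import numpy as np

# The three 1-D vectors from which Laws' 3x3 masks are built.
L3 = np.array([1, 2, 1])     # center-weighted local averaging
E3 = np.array([-1, 0, 1])    # symmetric first differencing (edge)
S3 = np.array([-1, 2, -1])   # second differencing (spot)

def laws_masks():
    """Return the eight zero-sum 3x3 masks (all pairs except L3L3)."""
    vectors = {'L3': L3, 'E3': E3, 'S3': S3}
    masks = {}
    for a, col in vectors.items():
        for b, row in vectors.items():
            if (a, b) != ('L3', 'L3'):
                # Column vector times row vector (outer product).
                masks[a + b] = np.outer(col, row)
    return masks

def laws_feature(image, mask):
    """Variance of the 3x3 mask responses over the image.

    The sliding-window product below computes the cross-correlation
    response; for these masks (symmetric or antisymmetric under 180
    degree rotation) the variance equals that of the convolution."""
    img = np.asarray(image, dtype=np.float64)
    windows = np.lib.stride_tricks.sliding_window_view(img, (3, 3))
    response = (windows * mask).sum(axis=(-2, -1))
    return response.var()
```

For example, `laws_masks()['E3L3']` reproduces the E3L3 mask of Figure 2, and since every mask sums to zero, the responses (and hence the feature) vanish on a constant image.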
To use these masks for describing the texture in a (sub)image, we convolve them with the image and use statistics of the convolution result as textural features. Laws concluded that the most useful statistics are the variances of the convolution results. In this study we used the variances as features for the feature vector classifier and the whole distributions of mask responses for the classification based on feature distributions.

4.2 Gray Level Difference Method

A class of simple image properties that can be used for texture analysis are first-order statistics of local property values, i.e., the means, variances, etc. In particular, a class of local properties based on absolute differences between pairs of gray levels or of average gray levels has sometimes been used15. For any given displacement d = (dx,dy), where dx and dy are integers, let f'(x,y) = |f(x,y) - f(x+dx,y+dy)|. Let P' be the probability density function of f'. If the image has m gray levels, P' has the form of an m-dimensional vector whose ith component is the probability that f'(x,y) will have value i. P' can easily be computed by counting the number of times each value of f'(x,y) occurs. For a small d the difference histograms peak near zero, while for a larger d they are more spread out. Commonly, several scalar measures are derived from gray level difference histograms, such as contrast, angular second moment, entropy, mean, and inverse difference moment. Ojala et al. obtained very good results by using distributions of gray level differences in classification10. In the present study we used the rotation invariant DIFF4 feature distribution, which was computed by considering differences in all four principal directions with a displacement of 1 or 2.

5. Experimental Results

Two different types of experiments were carried out. Firstly, the performance of Laws' texture energy measures was determined using ordinary kNN classification based on feature vectors.
Secondly, distribution based classification experiments were performed for the best-performing Laws' feature (E3L3) and for the gray scale difference feature DIFF4 with histograms of 32 and 256 bins. In all cases, k = 3 was used for nearest-neighbor selection (3-NN classification). The effects of sample size and image preprocessing were also examined. The samples were obtained by dividing the original 512x512 images into non-overlapping subimages, resulting in 176 (11 texture classes x 4 images in each class x 4 subimages in an image), 704, 2816 and 11264 samples in total for sample sizes of 256x256, 128x128, 64x64 and 32x32 pixels, respectively. Histogram equalization was performed prior to feature extraction to remove the effects of unequal brightness and contrast. It was applied to the whole 512x512 test images instead of the separate samples.

Tables 1 and 2 show the results. The numbers in the tables denote the percentages of misclassified samples. The misclassification rate does not reveal how close the misclassified samples are to their correct classes, but it gives enough information to decide which features are suitable for this kind of application. More detailed information can be extracted from confusion matrices.

Table 1 contains the results for single Laws' features with feature vector based classification. With a sample size of 256x256 pixels reasonable results were obtained; the total error for Laws' E3L3 feature was 36.36%. Applying histogram equalization to the images decreases the discriminative power of Laws' features significantly, as shown in the lower part of Table 1 (EQ 128x128 and EQ 256x256). Using all eight Laws' measures simultaneously did not provide much better results (36.93% for the original 256x256 samples).

Table 1. Feature vector based classification

sample size    E3E3   E3L3   E3S3   L3E3   L3S3   S3E3   S3L3   S3S3   All Laws
128x128       58.66  55.97  58.81  57.39  58.95  60.09  61.79  63.07     51.70
256x256       42.61  36.36  46.02  45.45  48.86  46.59  44.89  50.57     36.93
EQ 128x128    85.23  84.38  90.34  82.67  83.10  88.64  88.07  88.07     61.93
EQ 256x256    78.98  72.16  89.77  72.16  82.95  85.80  82.95  83.52     52.84
Table 2. Distribution based classification

                      E3L3                     DIFF4
                                        32 bins         256 bins
sample size    32 bins  256 bins      d=1    d=2       d=1    d=2
128x128         43.61    46.73      38.21  38.35     43.61  42.61
256x256         26.14    26.14      28.41  30.11     30.68  34.66
EQ 32x32          -        -        43.47  63.22      7.28  14.58
EQ 64x64          -        -        22.83  37.82      0.11   0.32
EQ 128x128      58.81    57.67      13.07  21.16      0.00   0.00
EQ 256x256      46.59    38.64      11.93  22.73      0.00   0.00
The results for distributions of feature values are summarized in Table 2. For Laws' E3L3 feature the misclassification rate was 26.14% for 256x256 samples, with histograms of both 32 and 256 bins. The gain from using the whole feature distribution instead of the bare variance of the distribution was about 10 percentage points. The discriminative power of Laws' measures seems to be mostly contained in the variances of the feature distributions10. For this reason the whole distributions do not provide as much additional information for these measures as for some others.

The results for DIFF4 are tabulated for two different quantizations (32 and 256 bins) and displacements (d = 1, 2). Misclassification rates for the original images were comparable to those obtained with E3L3, but histogram equalization significantly improved the performance of DIFF4. The reason for this is that DIFF4 is not invariant with respect to gray scale variations, which means that the textures to be analyzed should be gray scale corrected in order to have gray level differences of equal scale in all textures. For example, using a 128x128 pixel sample size and distributions with 256 bins, the error for DIFF4 was reduced from 43.61% to 0.00%. Even with 64x64 samples the classification error was as small as 0.11%, meaning that only 3 out of all 2816 samples were misclassified. All of these misclassified samples were assigned to a class neighboring their correct class, which means that the measurement error of the composition for these samples was 10%.

The confusion matrix for DIFF4 with 32 bins and histogram equalized samples of size 128x128 is shown in Fig. 3. Although the classification error is over 10%, only 11 of the 92 misclassified samples are not assigned to a class neighboring the correct label. The sample size 32x32 appeared to be too small for this kind of measurement purpose. Even if the total error rate was only 12.50% for d = 1, many samples were classified far away from their correct classes.
One reason for this might be that with a sample size as small as 32x32 the composition of an individual sample does not necessarily match the nominal composition of the mixture.

In order to test the robustness of the proposed approach based on DIFF4 feature distributions, experiments with three different values of k (k = 1, 3, 5) and with two image resolutions (the original 512x512 images and 256x256 images obtained by bilinear interpolation from the original images) were carried out. A sample size of 64x64 was used. Table 3 presents the results. It can be seen that distributions with 256 bins provide robust performance with respect to variations of k, displacement d, and image resolution, achieving error rates of 0.64% or less in all cases. Quantization of the histograms into 32 bins seems to decrease the performance of classification significantly for DIFF4, but not necessarily as much for some other features, as seen in the case of Laws' measures (Table 2). Similar results were obtained in our earlier research10,11. A reason for this is that the great majority of absolute gray level differences are peaked near zero, and important information may be destroyed by using too coarse a quantization.

Table 3. Robustness tests for DIFF4 with sample size 64x64

                      32 bins          256 bins
image size (k)       d=1     d=2      d=1    d=2
512x512 (k = 1)     22.43   42.40    0.11   0.64
512x512 (k = 3)     22.83   37.82    0.11   0.32
512x512 (k = 5)     22.02   36.08    0.11   0.32
256x256 (k = 1)     36.79   55.11    0.43   0.00
256x256 (k = 3)     32.81   51.28    0.00   0.00
256x256 (k = 5)     34.52   50.71    0.00   0.14
We also did distribution classification experiments with other texture measures used in our recent comparative study10, but it turned out that the performance of the classification based on gray scale difference histograms was superior in this application.
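To make the distribution classification procedure of Sections 3 and 4.2 concrete, the following sketch implements the DIFF4 differences, the equal-frequency binning, the two-way G statistic of Eq. (3), and the k-nearest-model decision. All function names and the minimal edge-case handling are our own assumptions, not the authors' original code.

```python
import numpy as np

def diff4_values(img, d=1):
    """Absolute gray level differences in the four principal
    directions (0, 45, 90, 135 degrees) with displacement d."""
    img = np.asarray(img, dtype=np.int32)
    diffs = [
        np.abs(img[:, d:] - img[:, :-d]).ravel(),     # horizontal
        np.abs(img[d:, :] - img[:-d, :]).ravel(),     # vertical
        np.abs(img[d:, d:] - img[:-d, :-d]).ravel(),  # diagonal
        np.abs(img[d:, :-d] - img[:-d, d:]).ravel(),  # anti-diagonal
    ]
    return np.concatenate(diffs)

def equal_frequency_cuts(all_values, n_bins=32):
    """Bin cut values at the 100/N percentiles of the combined
    feature data, so every bin holds an equal share of entries."""
    qs = np.linspace(0, 100, n_bins + 1)[1:-1]
    return np.percentile(all_values, qs)

def histogram(values, cuts):
    """Bin feature values using the precomputed cut values."""
    n_bins = len(cuts) + 1
    idx = np.searchsorted(cuts, values, side='right')
    return np.bincount(idx, minlength=n_bins)

def g_statistic(f_s, f_m):
    """Two-way log-likelihood-ratio (G) test of independence for two
    binned samples, Eq. (3); a lower G means more similar samples."""
    f = np.stack([f_s, f_m]).astype(np.float64)
    def xlogx(x):
        x = x[x > 0]              # 0 log 0 is taken as 0
        return (x * np.log(x)).sum()
    return 2 * (xlogx(f) - xlogx(f.sum(axis=0))
                - xlogx(f.sum(axis=1)) + xlogx(np.array([f.sum()])))

def classify(test_hist, model_hists, model_labels, k=3):
    """Order the models by G and take the majority among the k nearest."""
    g = [g_statistic(test_hist, m) for m in model_hists]
    nearest = np.argsort(g)[:k]
    labels = [model_labels[i] for i in nearest]
    return max(set(labels), key=labels.count)
```

In a leave-one-out run, each sample histogram would in turn play the role of `test_hist` while all remaining sample histograms serve as the models.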
              assigned class
true class  b000 b010 b020 b030 b040 b050 b060 b070 b080 b090 b100  error
            r100 r090 r080 r070 r060 r050 r040 r030 r020 r010 r000    %
b000r100     63    1    .    .    .    .    .    .    .    .    .    1.6
b010r090      .   64    .    .    .    .    .    .    .    .    .    0.0
b020r080      .    1   59    4    .    .    .    .    .    .    .    7.8
b030r070      .    .    3   60    1    .    .    .    .    .    .    6.2
b040r060      .    .    .    .   60    3    1    .    .    .    .    6.2
b050r050      .    .    .    .    7   53    4    .    .    .    .   17.2
b060r040      .    .    .    .    3    8   49    4    .    .    .   23.4
b070r030      .    .    .    .    .    1    7   55    .    1    .   14.1
b080r020      .    .    .    .    .    .    .    2   42   19    1   34.4
b090r010      .    .    .    .    .    .    .    .   12   49    3   23.4
b100r000      .    .    .    .    .    .    .    .    4    2   58    9.4

Total error 13.07 %

Figure 3. Confusion matrix for DIFF4 with histogram equalized samples of size 128x128. The symbols bXXXrYYY refer to the relative proportions of barley and rice in the mixture; for example, mixture b030r070 contains 30% barley and 70% rice. No samples were assigned to the "undefined" class.
6. Conclusion

Our recent research results suggest that the whole distributions of texture feature values should be used instead of single values in texture classification. In this paper the practical problem of determining the composition of eleven different mixtures of rice and barley was studied. The results we obtained demonstrate that classification based on feature distributions also performs very well in this problem. It was also shown that texture measures based on gray level differences outperform the Laws' measures used by Kjell for this problem.
References

1. P.A. Devijver and J. Kittler, Pattern Recognition: A Statistical Approach. Prentice-Hall, London, 1982.
2. R. Duda and P. Hart, Pattern Classification and Scene Analysis. John Wiley & Sons, New York, 1973.
3. D. Harwood, T. Ojala, M. Pietikäinen, S. Kelman and L.S. Davis, "Texture classification by center-symmetric auto-correlation, using Kullback discrimination of distributions," Pattern Recognition Letters 16 (1995) 1-10.
4. D. Harwood, M. Subbarao and L.S. Davis, "Texture classification by local rank correlation," Computer Vision, Graphics, and Image Processing 32 (1985) 404-411.
5. M. Kendall and A. Stuart, The Advanced Theory of Statistics, Vol. 2. Macmillan Publishing Co., New York, 1979.
6. B. Kjell, "Determining composition of grain mixtures using texture energy operators," SPIE Vol. 1825 Intelligent Robots and Computer Vision XI, 1992, pp. 395-400.
7. S. Kullback, Information Theory and Statistics. Dover, New York, 1968.
8. K.I. Laws, "Textured image segmentation," Report 940, Image Processing Institute, Univ. of Southern California, 1980.
9. P.P. Ohanian and R.C. Dubes, "Performance evaluation for four classes of textural features," Pattern Recognition 25 (1992) 819-833.
10. T. Ojala, M. Pietikäinen and D. Harwood, "Performance evaluation of texture measures with classification based on Kullback discrimination of distributions," Proc. 12th International Conference on Pattern Recognition, Vol. I, Jerusalem, Israel, 1994, pp. 582-585.
11. M. Pietikäinen and T. Ojala, "Texture analysis in industrial applications," in Advances in Image Processing, Multimedia and Machine Vision, ed. J.L.C. Sanz, Springer-Verlag, in press.
12. R.R. Sokal and F.J. Rohlf, Introduction to Biostatistics. W.H. Freeman and Co., New York, 1987.
13. M. Unser, "Sum and difference histograms for texture classification," IEEE Transactions on Pattern Analysis and Machine Intelligence 8 (1986) 118-125.
14. A.L. Vickers and J.W. Modestino, "A maximum likelihood approach to texture classification," IEEE Transactions on Pattern Analysis and Machine Intelligence 4 (1982) 61-68.
15. J. Weszka, C. Dyer and A. Rosenfeld, "A comparative study of texture measures for terrain classification," IEEE Transactions on Systems, Man, and Cybernetics 6 (1976) 269-285.