                  

                   Neural Network Mapping and Classification Training Data Files  


I   Mapping Training Data Files Description

1. FMTRAIN.DAT : (5 Inputs, 1 Output, 1024 Training Patterns, 61 K unzipped) 
    This training file is used to train a neural network to perform demodulation of an FM (frequency modulation) signal containing a sinusoidal message. The data are generated from the equation 
                        r(n) =  Camp * cos[2* PI* n* Cfreq + Mamp *sin(2* PI* n* Mfreq )] 
where Camp = carrier amplitude, Mamp = message amplitude, Cfreq = normalized carrier frequency, and Mfreq = normalized message frequency. In this data set, Camp = 0.5, Cfreq = 0.1012878, Mfreq = 0.01106328, and Mamp = 5. The five inputs are r(n-2), r(n-1), r(n), r(n+1), and r(n+2). The output is cos(2* PI* n* Mfreq). In each consecutive pattern, n is incremented by 1. 
For more details,  see 
K. Rohani and M.T. Manry, "The Design of Multi-Layer Perceptrons using Building Blocks," Proc. of IJCNN 91, Seattle WA., pp. II-497 to II-502. 
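
The pattern layout described above can be sketched in Python with NumPy. This is only an illustration of the stated equation and constants, not the program that produced FMTRAIN.DAT; the function name make_fm_patterns and the starting index n_start = 0 are assumptions, since the description does not give the first value of n.

```python
import numpy as np

def make_fm_patterns(num_patterns=1024, camp=0.5, cfreq=0.1012878,
                     mfreq=0.01106328, mamp=5.0, n_start=0):
    """Build (inputs, outputs) following the FMTRAIN.DAT description.

    n_start is an assumption: the description does not state the first n.
    """
    def r(n):
        # r(n) = Camp * cos[2*PI*n*Cfreq + Mamp*sin(2*PI*n*Mfreq)]
        return camp * np.cos(2 * np.pi * n * cfreq
                             + mamp * np.sin(2 * np.pi * n * mfreq))

    n = np.arange(n_start, n_start + num_patterns)
    # Five inputs per pattern: r(n-2), r(n-1), r(n), r(n+1), r(n+2).
    inputs = np.stack([r(n + k) for k in (-2, -1, 0, 1, 2)], axis=1)
    # Desired output: cos(2*PI*n*Mfreq).
    outputs = np.cos(2 * np.pi * n * mfreq)
    return inputs, outputs

x, y = make_fm_patterns()
print(x.shape, y.shape)
```

Because n advances by 1 per pattern, consecutive rows overlap in four of their five inputs. The target is proportional to the message-dependent part of the derivative of the phase with respect to n, which is why recovering it amounts to FM demodulation.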

fmtrain.dat (winzipped version) 
fmtest.dat  (winzipped version)   

2. TWOD.TRA : ( 8 Inputs , 7 Outputs, 1768 Training Patterns, 244 K unzipped) 
        This training file is used for inverting surface scattering parameters from back scattering measurements of an inhomogeneous, irregular layer above a homogeneous half space, where both interfaces are randomly rough. The parameters to be inverted are the effective permittivity of the surface, the normalized rms height, the normalized surface correlation length, the optical depth, and the single scattering albedo of the layer. 
    The training data file contains 1768 patterns. The inputs are eight theoretical values of the back scattering coefficient, at V and H polarization and four incident angles. The outputs are the corresponding values of permittivity, upper surface height, lower surface height, normalized upper surface correlation length, normalized lower surface correlation length, optical depth, and single scattering albedo, which have a joint uniform pdf. 
For more details,  see 
M.S. Dawson, A.K. Fung, and M.T. Manry, "Surface parameter retrieval using fast learning neural networks," Remote Sensing Reviews, 1993, Vol. 7(1), pp. 1-18. 

M.S. Dawson, J. Olvera, A.K. Fung, and M.T. Manry, "Inversion of surface parameters using fast learning neural networks," Proc. of IGARSS'92, Houston, Texas, May 1992, Vol. II, pp. 910-912. 

    A testing version of the data file, TWOD.TST, is also available (138K). 
 twod.tra  (winzipped version) 
 twod.tst  (winzipped version) 

This file was generated by Mike Dawson while he worked for Prof. Adrian Fung at the University of Texas at Arlington. Dr. Dawson currently works at Raytheon E-Systems in Garland, Texas. 

3.  SINGLE2.TRA : (16 Inputs, 3 Outputs, 10,000 Training Patterns, 1.6M) 
        This training data file consists of 16 inputs and 3 outputs and represents the training set for inversion of the surface permittivity, the normalized surface rms roughness, and the surface correlation length found in back scattering models from randomly rough dielectric surfaces. The first 8 inputs represent the simulated back scattering coefficient measured at 10, 30, 50, and 70 degrees at both vertical and horizontal polarization. The remaining 8 are various combinations of ratios of the original eight values. These ratios correspond to those used in several empirical retrieval algorithms. 

For more details,  see 

A.K. Fung, Z. Li, and K.S. Chen, "Back scattering from a Randomly Rough Dielectric Surface," IEEE Trans. Geo. and Remote Sensing, Vol. 30, No. 2, March 1992. 

A.K. Fung, Microwave Scattering and Emission Models and Their Applications, Artech House, 1994. 

 single2.tra (winzipped version) 

This file was generated by Mike Dawson while he worked for Prof. Adrian Fung at the University of Texas at Arlington. Dr. Dawson currently works at Raytheon E-Systems in Garland, Texas. 

4.  OH7.TRA : (20 Inputs, 3 Outputs, 15,000 Training Patterns, 3.1 M) 
        This data set is given in Y. Oh, K. Sarabandi, and F.T. Ulaby, "An Empirical Model and an Inversion Technique for Radar Scattering from Bare Soil Surfaces," IEEE Trans. on Geoscience and Remote Sensing, pp. 370-381, 1992. The training set contains VV and HH polarization measurements at L band (30, 40 deg), C band (10, 30, 40, 50, 60 deg), and X band (30, 40, 50 deg), which gives 10 band/angle combinations at 2 polarizations, or 20 inputs. The corresponding unknowns are the rms surface height, the surface correlation length, and the volumetric soil moisture content in g / cubic cm. 
 oh7.tra (winzipped version) 
  

5.  POW12TRN : ( 12 Inputs, 1 Output, 1414 Training Patterns, 299K) 
        This training file was generated using data obtained from TU Electric Company in Texas. The first ten input features are the power loads, in megawatts, over the last ten minutes for the entire TU Electric utility, which covers a large part of north Texas. The output is the power load fifteen minutes in the future from the current time. All powers were originally sampled every fraction of a second and then averaged over 1 minute to reduce noise. For more details, see 

K. Liu, S. Subbarayan, R.R. Shoults, M.T. Manry, C. Kwan, F.L. Lewis, and J. Naccarino, "Comparison of Very Short-Term Load Forecasting Techniques," IEEE Transactions on Power Systems, vol. 11, no. 2, May 1996, pp. 877-882. 

M.T. Manry, R. Shoults, and J. Naccarino, "An Automated System for Developing Neural Network Short Term Load Forecasters," Proceedings of the 58th American Power Conference, Chicago, Ill., April 9-11, 1996, vol. 1, pp. 237-241. 
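
The windowing described for POW12TRN can be sketched as follows. This is a hedged illustration only: the function names, the samples_per_minute parameter, and the synthetic demo series are hypothetical, the original sampling rate is not given, and only the ten load-history inputs described above are built (the remaining two inputs are not described here).

```python
import numpy as np

def minute_average(samples, samples_per_minute):
    """Average raw samples over 1-minute blocks to reduce noise.

    samples_per_minute is hypothetical; the original rate is not stated.
    """
    samples = np.asarray(samples, dtype=float)
    whole = (len(samples) // samples_per_minute) * samples_per_minute
    return samples[:whole].reshape(-1, samples_per_minute).mean(axis=1)

def make_load_patterns(load_mw, history=10, horizon=15):
    """Pair the last `history` minutes of load with the load `horizon` minutes ahead."""
    load_mw = np.asarray(load_mw, dtype=float)
    inputs, outputs = [], []
    # t is the index of the "current" minute.
    for t in range(history - 1, len(load_mw) - horizon):
        inputs.append(load_mw[t - history + 1:t + 1])  # last ten minutes
        outputs.append(load_mw[t + horizon])           # fifteen minutes ahead
    return np.array(inputs), np.array(outputs)

# Synthetic 1-minute series used only to show the shapes.
demo_load = np.arange(100.0)
x, y = make_load_patterns(demo_load)
print(x.shape, y.shape)
```

Each row of x holds the ten most recent 1-minute loads, oldest first, and the matching entry of y is the load fifteen minutes after the newest input.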

A testing version POW12TST (299 K) is also available for download. 
 pow12trn.zip (winzipped version) 
 pow12tst.zip (winzipped version)   (*Temporarily unavailable) 
  

 6. MAT.TRN: (4 Inputs, 4 Outputs, 2000 Training Patterns, 644K) 
        This training file provides the data set for inversion of random two-by-two matrices. Each pattern consists of 4 input features and 4 output features. The input features, which are uniformly distributed between 0 and 1, are the elements of a matrix, and the four output features are the elements of the corresponding inverse matrix. The determinants of the input matrices are constrained to be between 0.3 and 2. 
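
A data set of this kind can be sketched with rejection sampling. The sampling method, the row-major ordering of the four elements, the seed, and the function name make_matrix_patterns are all assumptions; the description does not say how the original file was produced.

```python
import numpy as np

def make_matrix_patterns(num_patterns=2000, det_min=0.3, det_max=2.0, seed=0):
    """Draw 2x2 matrices with entries uniform on [0, 1] and keep those whose
    determinant lies in [det_min, det_max] (rejection sampling, an assumption)."""
    rng = np.random.default_rng(seed)
    inputs, outputs = [], []
    while len(inputs) < num_patterns:
        m = rng.uniform(0.0, 1.0, size=(2, 2))
        det = m[0, 0] * m[1, 1] - m[0, 1] * m[1, 0]
        if det_min <= det <= det_max:
            inputs.append(m.ravel())                  # 4 inputs, row-major (assumed)
            outputs.append(np.linalg.inv(m).ravel())  # 4 outputs: inverse elements
    return np.array(inputs), np.array(outputs)

x, y = make_matrix_patterns(num_patterns=5)
print(x.shape, y.shape)
```

With all entries between 0 and 1 the determinant cannot exceed 1, so in this sketch only the lower bound of 0.3 ever rejects a sample. That bound keeps the matrices well away from singular, which limits the size of the inverse elements the network has to learn.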

mattrn.zip (winzipped version) 
mattst.zip (winzipped version) 



II  Classification Training Data Files Description 

6.  GRNG.TRN : (16 Inputs, Class Id, 800 Training Patterns, 196K) 
        The geometric shape recognition data file covers four geometric shapes: ellipse, triangle, quadrilateral, and pentagon. Each shape is represented by a matrix of size 64*64. For each shape, 200 training patterns were generated using different degrees of deformation. The deformations included rotation, scaling, translation, and oblique distortions. The feature set is ring-wedge energy (RNG) and has 16 features. For more information on the data file, see 

H.C. Yau and M.T. Manry, "Iterative Improvement of a Nearest Neighbor Classifier," Neural Networks, Vol. 4, pp. 517-524, 1991. 
 grng.tra (winzipped version) 

  7.  GONGTRN.TRA: ( 16 Inputs, Class Id, 3000 Training Patterns, 780K) 
        The raw data consist of images of hand-printed numerals collected from 3,000 people by the Internal Revenue Service. We randomly chose 300 characters from each class to generate the 3,000-character training data. Images are 32 by 24 binary matrices. An image scaling algorithm is used to remove size variation in the characters. The feature set contains 16 elements. The 10 classes correspond to the 10 Arabic numerals. For more details concerning the features, see 

W. Gong, H.C. Yau, and M.T. Manry, "Non-Gaussian Feature Analyses Using a Neural Network," Progress in Neural Networks, vol. 2, 1994, pp. 253-269. 

A testing version GONGTST is also available (780K) for download. 
 gongtrn.tra  (winzipped version) 
 gongtst.tst  (winzipped version) (*Temporarily unavailable) 
  

8.  COMF18.TRA : ( 18 Inputs, Class Id, 12,392 Training Patterns, 3.8M) 
        The training data file is generated from segmented images. Each segmented region is separately histogram equalized to 20 levels. Then the joint probability density of pairs of pixels separated by a given distance and a given direction is estimated. We use 0, 90, 180, and 270 degrees for the directions and 1, 3, and 5 pixels for the separations. The density estimates are computed for each classification window. For each separation, the co-occurrences for the four directions are folded together to form a triangular matrix. From each of the resulting three matrices, six features are computed: angular second moment, contrast, entropy, correlation, and the sums of the main diagonal and the first off diagonal. This results in 18 features for each classification window. For more details concerning the features, see 

R.R. Bailey, E.J. Pettit, R.T. Borochoff, M.T. Manry, and X. Jiang, "Automatic Recognition of USGS Land Use/Cover Categories Using Statistical and Neural Network Classifiers," Proceedings of SPIE OE/Aerospace and Remote Sensing, April 12-16, 1993, Orlando Florida. 
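
The co-occurrence features described above can be sketched for a single classification window as follows. This is an illustration under stated assumptions, not the code used to build COMF18.TRA: the window is assumed to be already equalized to integer levels 0 to 19, the four directions are folded by summing them into one symmetric matrix (equivalent in content to a triangular matrix), the feature formulas are the commonly used co-occurrence definitions, and all function names are hypothetical.

```python
import numpy as np

LEVELS = 20              # gray levels after histogram equalization
SEPARATIONS = (1, 3, 5)  # pixel separations

def cooccurrence(window, d, levels=LEVELS):
    """Pair probabilities at distance d, summed over 0, 90, 180, 270 degrees."""
    counts = np.zeros((levels, levels))
    rows, cols = window.shape
    for dr, dc in ((0, d), (-d, 0), (0, -d), (d, 0)):
        # Region where both a pixel and its shifted partner are inside the window.
        r0, r1 = max(0, -dr), min(rows, rows - dr)
        c0, c1 = max(0, -dc), min(cols, cols - dc)
        a = window[r0:r1, c0:c1]
        b = window[r0 + dr:r1 + dr, c0 + dc:c1 + dc]
        np.add.at(counts, (a.ravel(), b.ravel()), 1)
    return counts / counts.sum()

def texture_features(p):
    """Six features from one symmetric co-occurrence matrix (assumed formulas)."""
    i, j = np.indices(p.shape)
    asm = np.sum(p ** 2)                  # angular second moment
    contrast = np.sum((i - j) ** 2 * p)
    nz = p[p > 0]
    entropy = -np.sum(nz * np.log(nz))
    mu_i, mu_j = np.sum(i * p), np.sum(j * p)
    var_i = np.sum((i - mu_i) ** 2 * p)
    var_j = np.sum((j - mu_j) ** 2 * p)
    denom = np.sqrt(var_i * var_j)
    corr = np.sum((i - mu_i) * (j - mu_j) * p) / denom if denom > 0 else 0.0
    diag0 = np.trace(p)                   # sum of the main diagonal
    # First off diagonal of the folded matrix: both sides of the symmetric one.
    diag1 = np.trace(p, offset=1) + np.trace(p, offset=-1)
    return [asm, contrast, entropy, corr, diag0, diag1]

def window_features(window):
    """18 features: six per separation, for separations of 1, 3, and 5 pixels."""
    feats = []
    for d in SEPARATIONS:
        feats.extend(texture_features(cooccurrence(window, d)))
    return np.array(feats)

# Random window used only to show the output size.
demo_window = np.random.default_rng(0).integers(0, LEVELS, size=(16, 16))
print(window_features(demo_window).shape)
```

The histogram equalization step and the segmentation into regions are not shown. The window must be an integer array at least 6 pixels on a side so that the 5-pixel separation has pairs to count.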

        Four regions of land use/cover types were identified in the images per Level I of the U.S. Geological Survey Land Use/Land Cover Classification System: urban areas, fields or open grassy land, trees (forested land), and water (lakes or rivers). 
comf18.tra  (winzipped version) 

