Final Project: Neural Computing, Prof. Betancourt



You will use the back-propagation code that you developed in Project #1 to solve the
classification problem of handwritten digits. The data needed for training and testing can
be found on the class Web page, under the name data32.tar.gz.
You should download this file to your account, unzip it (use gunzip on a Unix system),
and untar it (use tar -xvf data32.tar on a Unix system). This should create a
sub-directory named data32 containing 3 kinds of files:



1) Data files, containing images of digits.  They have names like  d0000_97. Each
file contains about 120 binary images, each one corresponding to a single digit. There
are 20 such files, for a total of over 2000 images.


2) Classification files, which contain the target information. These are text files whose
first entry is the number of 4-digit images in the binary file of the same name. The
following entries tell you the digits that the images represent. Each classification file has
the same name as its data file, with the extension .cls added. For the example above, the
classification file is named d0000_97.cls.


3) A Matlab .m file named imagez.m. This contains Matlab source code which will allow you
to view the images. To use it, simply start Matlab (type matlab in a window), then
at the prompt, type

			       imagez('d0000_97', 32, 32)

where the first argument is the name of the file that you want to view, and the second and
third arguments give the image size (in pixels). The window should display the image of a
digit, and the title on top tells you the corresponding target. Ignore the first digit;
the second digit in the title tells you which digit the image shows. After a pause, you can
hit the space bar to see the next image in the file, and so on. Typically, there are over 100
images of 4 digits each in one binary file.
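
As a rough illustration of the classification-file layout described in item 2, the sketch
below reads the text of a .cls file into a count and a list of labels. It is written in Python
only for compactness; the same logic carries over to your Matlab code. The function name
parse_cls is hypothetical, and the sketch ASSUMES that the entries are whitespace-separated
integers. Check this against one of the provided .cls files before relying on it.

```python
def parse_cls(text):
    """Parse the text of a .cls file into (count, labels).

    ASSUMPTION: entries are whitespace-separated integers, the first
    being the image count and the rest being the target labels.
    """
    entries = [int(token) for token in text.split()]
    if not entries:
        raise ValueError("classification file is empty")
    count = entries[0]    # first entry: number of images in the data file
    labels = entries[1:]  # remaining entries: the targets
    return count, labels


if __name__ == "__main__":
    # Made-up example content, not taken from the real data set.
    count, labels = parse_cls("3\n7 0 4\n")
    print(count, labels)
```

Comparing count with the number of labels actually read is a cheap sanity check to run on
every file before training.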

HOW TO PROCEED

1.- You must choose your input features. The images have too many pixels to be used directly
as input features. You must find a way to decrease the dimensionality of the input vectors.
To find out if a particular choice is good or not, you will need to train and test the
network using those features.

2.- Split your data into 2 sets: the training set and the testing set. The first one is
used to train the network (find the weights); the second one is used to test the performance
of the network on an independent data set and to make decisions such as the network
architecture, when to stop a training run, and the choice of training parameters.

3.- All these choices will require extensive running of your code, so you must start working
on it now.

4.- You must provide a final report with a listing of your code and a description of how you
pre-processed the data, how you trained the network, how you chose the network architecture
and training parameters, and what results you obtained on your testing set.
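
Steps 1 and 2 above can be sketched as follows. This is only one possible choice, not a
required method: it reduces a 32 x 32 binary image to 64 features by averaging non-overlapping
4 x 4 blocks, and it splits a list of samples into training and testing sets after a seeded
shuffle. The sketch is in Python for compactness and is meant to be ported to your Matlab
code. The names block_average and split_train_test, the block size of 4, and the 25% testing
fraction are all assumptions made for illustration. The image is assumed to be already loaded
as a list of rows of 0/1 values.

```python
import random


def block_average(image, block=4):
    """Average non-overlapping block x block tiles of a square image.

    `image` is a list of rows of 0/1 pixel values. A 32 x 32 image with
    block=4 yields 8 * 8 = 64 features, each in the range [0, 1].
    """
    size = len(image)
    if size % block != 0 or any(len(row) != size for row in image):
        raise ValueError("image must be square with side divisible by block")
    features = []
    for r in range(0, size, block):
        for c in range(0, size, block):
            total = sum(image[r + i][c + j]
                        for i in range(block) for j in range(block))
            features.append(total / (block * block))
    return features


def split_train_test(samples, test_fraction=0.25, seed=0):
    """Shuffle a copy of `samples` and return (train, test) lists."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)  # seeded, so the split is repeatable
    n_test = int(round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


if __name__ == "__main__":
    blank = [[0] * 32 for _ in range(32)]
    print(len(block_average(blank)))           # number of features
    train, test = split_train_test(range(20))
    print(len(train), len(test))
```

Whether block averaging is a good feature choice can only be judged by training and testing
the network with it, as step 1 says.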





DUE DATE: The report is DUE on Tuesday, May 8. We will have oral presentations of your
reports on May 8 and May 10. On the final day of classes, we will go to the Sun Lab, where
I will give you a new data file of the same form as the training data. I
will NOT give you the classification file. You will run your code and produce an output
file named 'your_name.final', which contains the results of your classification of the images
contained in this new data file. The format of this file should be exactly the same as that
of all the other .cls files: an ASCII file whose first entry is the number of images,
followed by the classification (digit number) corresponding to each of the images in
the input file. I will then compare your results with the correct ones in
the .cls file. Your final grade will depend on the following things:

				     First Project 30 %

				     Final Report 40 %

				Classification Results 30 %

					 GOOD LUCK



RULES: The same rules apply for the final project. There is a 50% deduction if the report is
late (after the BEGINNING of class on May 8). You need to have your code ready to run in the
Sun Lab (even if you work at home, you can save your M files, as well as your trained
network, and load them in the Lab). Your code must automatically produce the output
classification file in the correct format (.cls file). I will run a program during the
final class to compute your results, so your format must be correct. BE SURE TO TEST your
code BEFORE the final class and check that it produces the correct output in the proper format.
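
As a minimal sketch of producing the output file automatically, the Python code below writes
the number of images followed by one classification per image, as plain ASCII. It is an
illustration to be ported to your Matlab code, and the names format_cls and write_cls are
hypothetical. The exact whitespace layout is an ASSUMPTION (one entry per line); open one of
the provided .cls files and match its layout exactly.

```python
def format_cls(labels):
    """Return .cls-style text: the image count, then one label per line.

    ASSUMPTION: one integer entry per line. Adjust the separator to
    match the provided .cls files.
    """
    values = [int(label) for label in labels]
    lines = [str(len(values))] + [str(value) for value in values]
    return "\n".join(lines) + "\n"


def write_cls(path, labels):
    """Write the classifications to `path` as an ASCII file."""
    with open(path, "w", encoding="ascii") as handle:
        handle.write(format_cls(labels))


if __name__ == "__main__":
    # Made-up classifications, written to the file name required above.
    write_cls("your_name.final", [3, 1, 4])
```

Running this on one of the training data files and comparing the result with its .cls file
is a simple way to test the format before the final class.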