Ethan Garofolo and Dr. William Barrett, Computer Science
Purpose
As digital image libraries continue to grow, classifying such a large volume of content grows ever more difficult. At present, human users catalog images by giving them descriptive filenames and/or labels in image header files, hoping that the names will still make sense and describe the images in weeks, months, or years to come. This is a painfully slow and often inaccurate process. In areas such as national defense, where digital images are a critical part of intelligence gathering, the shortcomings of existing methods can cost human lives. The original purpose of my research was to devise a software solution for classifying images based on image content (i.e., the objects depicted in an image, such as a car, a slice of pizza, or anything else), allowing images to be searched in a more natural, high-level way. A user would simply upload a group of photos to a computer, and the software would classify the images based on their content. However, the nature of the project changed as I began the work. I originally wanted to feed a body of images into the system, have the system give names to the various objects depicted in the images, and then perform text-based search on the resulting image database. That was, to put it mildly, a lofty goal.
As stated above, the system was intended to replace human beings in the work of analyzing images and classifying them based on content. However, when an individual approaches the task of analyzing images and naming objects, that individual brings a lifetime of experience and memory to bear on the task. That is to say, a human does not classify the contents of a photo from visual input alone; that cannot be done. A car is a car not because it has four wheels and doors; it is a car because that is what such collections of wheels and doors have been named. Furthermore, a triangular object in the desert is probably a pyramid and not a slice of pizza, because we do not find slices of pizza in the desert. I could not hope for my system to suddenly be endowed with a human being’s memories and knowledge of the world.
Upon realizing this, I switched from trying to classify images to simply comparing them. My new goal was to allow search of an image database based on a graphical query. Just as Google searches through its indexes and retrieves documents with words similar to those entered by a user in a text search, my system takes graphical queries and searches its body of images for images containing similar objects.
Solution Description
Dr. Barrett supplied me with a software tool called OBIE, which segments images and hierarchically arranges their constituent parts. For example, given an image of a slice of pizza, OBIE would separate the slice from the background and associate the slices of pepperoni as sub-objects of the original slice. OBIE also maps the pixels of an image to the distinct sub-objects, with varying degrees of success (OBIE sometimes crashes on simple bitmap images with large sections of uniform color, or creates sub-objects but does not assign any pixels to them). I modified OBIE to output the hierarchy and pixel mappings to files for my system to consume. I gave my system the highly original name of Orca Project; it reconstructs the image hierarchies and pixel mappings for use in comparing images. I based image similarity on the number of sub-object children, sub-object shape, and average sub-object color. Using Orca Project, a user can mouse over an object in an image and drag a selected sub-object to the tab labeled “Search Results” to initiate a search for images containing similar objects. This method requires refinement.
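To illustrate the idea, the following is a minimal sketch (not the actual Orca Project code) of a similarity score over the three attributes named above: number of sub-object children, shape, and average color. The `SubObject` class, the weights, and the scoring functions are all illustrative assumptions.

```python
# Hypothetical sketch of a sub-object similarity score based on child count,
# shape label, and average color. All names and weights are illustrative.

from dataclasses import dataclass, field

@dataclass
class SubObject:
    shape: str                       # e.g. "ellipse" or "rectangle"
    avg_color: tuple                 # (R, G, B), each 0-255
    children: list = field(default_factory=list)

def color_similarity(c1, c2):
    """1.0 for identical colors, falling toward 0.0 as they diverge."""
    dist = sum((a - b) ** 2 for a, b in zip(c1, c2)) ** 0.5
    max_dist = (3 * 255 ** 2) ** 0.5    # farthest-apart RGB colors
    return 1.0 - dist / max_dist

def similarity(a: SubObject, b: SubObject) -> float:
    """Weighted score in [0, 1] over child count, shape, and color."""
    child_score = 1.0 if len(a.children) == len(b.children) else 0.0
    shape_score = 1.0 if a.shape == b.shape else 0.0
    color_score = color_similarity(a.avg_color, b.avg_color)
    # Illustrative weights; the report notes better weightings are open work.
    return 0.25 * child_score + 0.35 * shape_score + 0.40 * color_score
```

Two identical sub-objects score 1.0; a red ellipse against a blue rectangle scores low, dominated by the color and shape mismatches.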
Results and Future Research
At a basic level, the system worked; for example, it could distinguish a slice of pizza from a pencil. The results were quite encouraging and support this hierarchical method of comparing image data. However, as the saying goes in the computer world, “garbage in, garbage out.” Orca Project was limited in several ways.
First, image segmentation is itself a very difficult problem, and OBIE was not entirely reliable. For example, specular highlights on metallic objects were often counted as separate sub-objects, which was not appropriate in the context of this project. I believe that even at the image segmentation level, the knowledge and life experience of a human being provide an edge over a computer. This creates a chicken-and-egg dilemma: image segmentation is required to classify image content, but knowing the image content beforehand would improve the quality of the segmentation. It would be interesting to investigate attaching a knowledge database to an image segmentation routine.
The second limitation was the shape recognition engine, which was primitive and naive at best. I implemented a cookie-cutter approach: I supposed each sub-object to be a given shape, then checked whether the layout of the pixels assigned to that sub-object matched the assumed shape. At present, the system checks only for ellipses and rectangles. Shape recognition is, of course, also a hard problem, and worthy of a master’s thesis, as Dr. Barrett informed me.
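One simple way to realize such a cookie-cutter test, sketched here under my own assumptions rather than taken from the actual engine, is to compare how fully a sub-object's pixels cover their bounding box: a filled rectangle covers roughly all of it, a filled ellipse roughly π/4 of it.

```python
# Hedged sketch of a "cookie-cutter" shape test: assume the sub-object is a
# filled region, then decide between rectangle and ellipse by how much of
# the bounding box its pixels cover. The decision boundary is illustrative.

import math

def classify_shape(pixels):
    """pixels: list of (x, y) tuples belonging to one sub-object.
    Returns "rectangle" or "ellipse"."""
    xs = [x for x, _ in pixels]
    ys = [y for _, y in pixels]
    w = max(xs) - min(xs) + 1
    h = max(ys) - min(ys) + 1
    coverage = len(pixels) / (w * h)
    # A filled rectangle covers ~1.0 of its box, a filled ellipse ~pi/4;
    # split the difference as a crude threshold.
    return "rectangle" if coverage > (1.0 + math.pi / 4) / 2 else "ellipse"
```

This only distinguishes the two shapes the report mentions, and it would misjudge hollow or irregular regions, which matches the "primitive and naive" self-assessment above.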
Third, my comparison method is based on graph matching, which is an NP-complete problem. While the simple images I used in my research did not run into the challenges associated with that level of complexity, it is reasonable to assume that a naive method of graph comparison will not necessarily outperform human classification. I also used a simple system of scoring similarity, and it remains to be seen whether better means of weighting sub-object attributes would produce better results.
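The cost of naive matching can be seen in a small sketch. Assuming (purely for illustration) that each object is a tree of labeled sub-objects, comparing two trees by trying every pairing of their children is factorial in the branching factor, which hints at why general graph matching does not scale.

```python
# Illustrative sketch (not the actual Orca Project code) of naive
# hierarchical matching: trees are (label, [children]) tuples, and every
# possible child pairing is tried, so cost explodes with branching factor.

from itertools import permutations

def tree_similarity(a, b):
    """Score two (label, children) trees in [0, 1]."""
    label_score = 1.0 if a[0] == b[0] else 0.0
    ca, cb = a[1], b[1]
    if not ca or not cb:
        # No children to pair on one side: match only if both are leaves.
        child_score = 1.0 if len(ca) == len(cb) else 0.0
    else:
        small, big = (ca, cb) if len(ca) <= len(cb) else (cb, ca)
        best = 0.0
        # Brute force: every ordered selection of |small| children from big.
        for perm in permutations(big, len(small)):
            s = sum(tree_similarity(x, y) for x, y in zip(small, perm))
            best = max(best, s / len(small))
        child_score = best
    return 0.5 * label_score + 0.5 * child_score
```

A pizza-like tree (an ellipse with two ellipse sub-objects, standing in for pepperoni) matches itself perfectly and scores zero against a lone rectangle, echoing the pizza-versus-pencil result above.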
Lastly, research is needed on how to index images properly for fast searches. Whereas text is easily indexed on words, it is unknown what the graphical counterpart of a word would be in an image.