What is this all about?
Folksonomy is all about so-called 'social tagging'. Users, rather than moderators or administrators, assign keywords to the content they create on the web. That way, with little effort, every piece of content can be categorized automatically.
Why should content be categorized?
Every web application creator wants to keep users on the site as long as possible. One way to achieve this is to show each user related content. For example, someone looking at a photo of a Ferrari may want to see other photographs of Ferraris.
The same applies to any type of content: blog articles, music files, videos and others.
Why keywords?
To categorize content automatically, a computer needs to translate it into its own 'language'. It can't, for example, listen to music and simply find similar tracks. Keywords (tags) are the easiest way to give a computer the necessary knowledge about a piece of content. Since providing a few words about the content takes a user little effort, they are a very good choice.
There are also many ways to steer users towards keywords that fit the content. In this sample application the number of tags assigned to a photo is limited. There is also an auto-suggestion system that helps the user enter a tag in the correct form.
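The two safeguards mentioned above can be sketched roughly as follows. This is only an illustration, not the application's actual code: the names MAX_TAGS, suggest and accept_tags are hypothetical, and the limit of 5 is an assumed value, since the real limit is not stated here.

```python
MAX_TAGS = 5  # assumed value for illustration; the actual limit is not stated


def suggest(prefix, known_tags, limit=5):
    """Return known tags that start with the typed prefix (case-insensitive)."""
    p = prefix.strip().lower()
    if not p:
        return []
    return sorted(t for t in known_tags if t.lower().startswith(p))[:limit]


def accept_tags(tags, max_tags=MAX_TAGS):
    """Trim whitespace, drop empty and duplicate tags, and enforce the tag limit."""
    cleaned = []
    for tag in tags:
        tag = tag.strip()
        if tag and tag not in cleaned:
            cleaned.append(tag)
    if len(cleaned) > max_tags:
        raise ValueError(f"at most {max_tags} tags are allowed")
    return cleaned


known = ["Cat", "Dog", "Car", "Ferrari", "Red", "Blue"]
print(suggest("ca", known))                    # ['Car', 'Cat']
print(accept_tags([" Car", "Ferrari", "Car"]))  # ['Car', 'Ferrari']
```

Suggesting existing tags nudges users towards spellings that are already in the dictionary, which keeps the dictionary from growing with near-duplicates.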
What about categorization?
It's a bit complicated, but continue if it doesn't scare you ;)
Once content has tags assigned, similar content can be found in many ways.
In this application content is categorized by an ART-1 neural network. This network, invented by G.A. Carpenter and S. Grossberg, was designed based on observations of how the brain processes information. It uses unsupervised learning to classify input vectors and assigns each of them to a cluster.
How does it work in this case?
Since every photo has its keywords assigned, it is easy to transform it into an input pattern that an ART-1 network can understand. A dictionary is built from the set of all tags, and every photo is transformed into a binary vector based on that dictionary. Consider this example:
Let's assume that the following tags were defined in the whole web application:
'Cat', 'Dog', 'Car', 'Ferrari', 'Red', 'Blue'
This means that every photo in the application carries only tags from the list above.
Let's take the photo of a Ferrari:
And assume that the user assigned the following tags to it: 'Car', 'Ferrari', 'Red'
Now, based on the given dictionary, the system transforms it into the vector [0, 0, 1, 1, 1, 0], since the tags 'Cat', 'Dog' and 'Blue' are not present on the photo.
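The transformation described above can be written in a few lines. This is a minimal sketch; the names DICTIONARY and encode are chosen for illustration and are not taken from the application.

```python
# Dictionary of all tags known to the application; the order fixes the vector positions.
DICTIONARY = ["Cat", "Dog", "Car", "Ferrari", "Red", "Blue"]


def encode(tags, dictionary=DICTIONARY):
    """Return a binary vector with 1 wherever a dictionary tag is present on the photo."""
    present = set(tags)
    return [1 if tag in present else 0 for tag in dictionary]


ferrari_vector = encode(["Car", "Ferrari", "Red"])
print(ferrari_vector)  # [0, 0, 1, 1, 1, 0]
```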
The ART-1 neural network takes each such vector and classifies it, assigning the corresponding photo to a cluster.
From this moment on, every photo that could be categorized has its related content assigned, and that content can be presented to the user.
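To give a feel for the classification step, here is a heavily simplified, single-pass sketch of ART-1 style clustering with fast learning. It is not the implementation used by this application: the function name art1_cluster and the parameters vigilance and beta are assumptions made for the example, and a full ART-1 network has more machinery than is shown here.

```python
def art1_cluster(vectors, vigilance=0.5, beta=1.0):
    """Assign each binary vector to a cluster; return (labels, prototypes)."""
    prototypes = []  # one binary prototype vector per cluster
    labels = []
    for vec in vectors:
        ones = sum(vec)
        if ones == 0:
            labels.append(None)  # no known tags: this photo cannot be categorized
            continue

        def choice(j):
            # Choice function: overlap with prototype j, normalized by its size.
            overlap = sum(a & b for a, b in zip(vec, prototypes[j]))
            return overlap / (beta + sum(prototypes[j]))

        # Try existing clusters from the best match to the worst.
        for j in sorted(range(len(prototypes)), key=choice, reverse=True):
            overlap = [a & b for a, b in zip(vec, prototypes[j])]
            # Vigilance test: enough of the input must be covered by the prototype.
            if sum(overlap) / ones >= vigilance:
                prototypes[j] = overlap  # fast learning: keep only the shared tags
                labels.append(j)
                break
        else:
            # No existing cluster passed the test, so a new one is created.
            prototypes.append(list(vec))
            labels.append(len(prototypes) - 1)
    return labels, prototypes


photos = [
    [0, 0, 1, 1, 1, 0],  # Car, Ferrari, Red
    [0, 0, 1, 1, 0, 0],  # Car, Ferrari
    [1, 0, 0, 0, 0, 0],  # Cat
    [1, 0, 0, 0, 0, 1],  # Cat, Blue
]
labels, prototypes = art1_cluster(photos)
print(labels)  # [0, 0, 1, 1]
```

Photos that share a label are each other's related content. Raising the vigilance value makes clusters stricter and therefore smaller.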
But what happens when a user adds a new tag?
Since the input size of an ART-1 network is constant, this is a small problem. The easiest solution is to rebuild the neural network periodically, so that it restructures itself to fit the new dictionary. Note that in big systems with many users the tag dictionary is likely to be large enough that the ART-1 network may not need retraining very often, because the dictionary may already contain most of the keywords that users want to assign to their content.
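The effect of a new tag on the input vectors can be illustrated as below. The helper names build_dictionary and encode, as well as the file names, are made up for this example. It assumes that tags missing from the current dictionary are simply ignored until the next rebuild, which is one possible policy and not necessarily the one used here.

```python
def build_dictionary(photo_tags):
    """Collect every tag used by any photo into a sorted list."""
    return sorted({tag for tags in photo_tags.values() for tag in tags})


def encode(tags, dictionary):
    """Binary vector over the dictionary; tags outside the dictionary are ignored."""
    present = set(tags)
    return [1 if tag in present else 0 for tag in dictionary]


photo_tags = {"ferrari.jpg": ["Car", "Ferrari", "Red"], "kitten.jpg": ["Cat"]}
dictionary = build_dictionary(photo_tags)
print(dictionary)  # ['Car', 'Cat', 'Ferrari', 'Red']

# A new upload introduces the unseen tag 'Yellow'.
new_tags = ["Car", "Yellow"]
before = encode(new_tags, dictionary)
print(before)  # [1, 0, 0, 0]  ('Yellow' has no position yet)

# Periodic rebuild: the dictionary grows, so every vector gets a new length
# and the network has to be retrained on the re-encoded photos.
photo_tags["taxi.jpg"] = new_tags
dictionary = build_dictionary(photo_tags)
after = encode(new_tags, dictionary)
print(after)  # [1, 0, 0, 0, 1]
```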
What's so special about this solution?
Speed. Once the ART-1 network is trained, it can be loaded into memory and classify content in real time. When a user uploads new content and assigns tags to it, it can immediately be turned into a vector and classified by the ART network. There are many configuration options, so the classification can be tuned to the requirements of the application. For instance, one can set the number of common tags that content must share to be assigned to the same cluster.
Another big advantage is the quality of the classification. With this method all related content is guaranteed to have two, three or even more common tags (depending on the configuration). In this application the limit is set to two common tags. If two photos don't have at least two keywords in common, they can't 'fall' into the same cluster. If the tags provided are accurate, content marked as 'related' really is related. Just take a look at the results of classifying 9345 photos with this method.
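The 'minimum number of common tags' rule can be expressed as a simple check on two binary vectors. This is a sketch of the criterion only, with hypothetical function names; how the application wires it into the network is not described here.

```python
def common_tags(a, b):
    """Count the dictionary positions where both binary vectors contain a 1."""
    return sum(x & y for x, y in zip(a, b))


def may_share_cluster(a, b, min_common=2):
    """True if the two photos have at least min_common tags in common."""
    return common_tags(a, b) >= min_common


# Dictionary order: Cat, Dog, Car, Ferrari, Red, Blue
red_ferrari = [0, 0, 1, 1, 1, 0]  # Car, Ferrari, Red
red_car = [0, 0, 1, 0, 1, 0]      # Car, Red
blue_car = [0, 0, 1, 0, 0, 1]     # Car, Blue

print(may_share_cluster(red_ferrari, red_car))   # True  (Car and Red are shared)
print(may_share_cluster(red_ferrari, blue_car))  # False (only Car is shared)
```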
The last, but not least, advantage is full automation. No administrator needs to work with the tags: there is no need to map tags, clean the database, create subcategories or do a lot of other work to improve the search for related content.
What are the drawbacks?
Memory usage. Unfortunately, all of the above advantages are offset by huge memory usage. For this sample application memory is not an issue, since the current ART-1 implementation requires about 100 MB to classify 9000 photos. However, for a production system with a lot of user content the memory requirements would grow very large. This solution's memory usage grows rapidly with the size of the tag dictionary and the number of clusters, and for a web site like Flickr it may need somewhere around 40 GB of RAM (in the optimistic case) in order to classify most of the content.
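A rough back-of-the-envelope estimate shows where the memory goes. It uses the dictionary size and cluster count from the statistics at the end of this page, and assumes that the network stores two weight matrices (bottom-up and top-down) with 4 bytes per weight. Both assumptions are guesses about the implementation, so treat the result as an order-of-magnitude check only.

```python
def art1_weight_bytes(input_size, clusters, matrices=2, bytes_per_weight=4):
    """Estimated size of the weight matrices: input_size x clusters values each."""
    return input_size * clusters * matrices * bytes_per_weight


demo = art1_weight_bytes(3976, 3246)  # dictionary size, number of clusters
print(demo)                  # 103248768
print(round(demo / 1e6, 1))  # 103.2 (MB), close to the reported memory occupancy

# Doubling both the dictionary and the number of clusters quadruples the memory.
print(art1_weight_bytes(2 * 3976, 2 * 3246) // demo)  # 4
```

Under these assumptions the weights alone account for most of the measured memory, and the cost scales with the product of dictionary size and cluster count.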
Remarks and summary:
Most of the photos are taken from Flickr.com. Thanks to their API, creating the database wasn't very hard. However, Flickr doesn't impose suitable limits on tagging. For example, there is no upper limit on the number of tags (or it is very high). What's more, there are no rules about the tags themselves: they can be very long, contain many strange characters and so on.
For this reason the database is a bit messy. If the application contained only photos uploaded via its own upload form, the tags table would be cleaner and the results should look better. Since the upload form controls and limits tags in the ways mentioned earlier, more appropriate keywords would be assigned to each photo.
Nevertheless, this demonstration shows that ART-1 based folksonomy works as expected, although its memory requirements rule it out for production systems. The whole project shows one direction in which social classification might go in order to simplify the processes involved in clustering. Since many things in this solution can still be improved and optimized, a more effective solution may be introduced in the future.
ART-1 memory occupancy: 110 MB
Size of a dictionary (input size): 3976
Total number of clusters (output size): 3246
Minimum number of common tags: 2
Number of photos classified: 9348
Generated on: 18-04-2019 12:27 (UTC time)