
Saturday, February 28, 2009

Paper Review: A Comparison of Document Clustering Techniques

This paper was written by Steinbach, Karypis, and Kumar of the University of Minnesota and published at the KDD Workshop on Text Mining in 2000.

This paper presents the results of an experimental study of two main approaches to document clustering, agglomerative hierarchical clustering and K-means (standard K-means and bisecting K-means).

Example of Hierarchical Agglomerative Clustering (HAC)

The two basic approaches to generating a hierarchical clustering are agglomerative and divisive. The paper evaluated agglomerative techniques in the comparison. It then described the agglomerative clustering algorithm, the K-means algorithm, and the bisecting K-means algorithm in detail.
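
To make the bisecting idea concrete, here is a minimal sketch in plain Python. It is not the authors' implementation. It assumes small dense vectors, Euclidean distance, and the sum of squared errors as the score for picking the best trial split, whereas the paper works with document vectors and similarity-based measures. The function names (`two_means`, `bisecting_kmeans`) and the choice to always split the largest cluster are my own for illustration.

```python
import random


def mean(points):
    """Component-wise mean (centroid) of a non-empty list of vectors."""
    dim = len(points[0])
    return [sum(p[d] for p in points) / len(points) for d in range(dim)]


def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def two_means(points, rng, iters=20):
    """Split points into two groups with basic K-means (K = 2)."""
    centroids = rng.sample(points, 2)  # two random points as starting centroids
    groups = ([], [])
    for _ in range(iters):
        groups = ([], [])
        for p in points:
            nearer = 0 if sq_dist(p, centroids[0]) <= sq_dist(p, centroids[1]) else 1
            groups[nearer].append(p)
        if not groups[0] or not groups[1]:
            break  # degenerate split, the caller discards it
        centroids = [mean(groups[0]), mean(groups[1])]
    return groups


def sse(group):
    """Sum of squared distances from each point to the group centroid."""
    center = mean(group)
    return sum(sq_dist(p, center) for p in group)


def bisecting_kmeans(points, k, trials=5, seed=0):
    """Repeatedly bisect the largest cluster until there are k clusters."""
    rng = random.Random(seed)
    clusters = [list(points)]
    while len(clusters) < k:
        clusters.sort(key=len)
        target = clusters.pop()  # assumption: always split the largest cluster
        if len(target) < 2:
            clusters.append(target)
            break  # nothing left to split
        best = None
        for _ in range(trials):  # keep the best of several trial bisections
            left, right = two_means(target, rng)
            if not left or not right:
                continue
            cost = sse(left) + sse(right)
            if best is None or cost < best[0]:
                best = (cost, left, right)
        if best is None:
            clusters.append(target)
            break
        clusters.extend(best[1:])
    return clusters
```

Calling `bisecting_kmeans(points, 3)` on a list of coordinate lists returns a list of three clusters, each itself a list of points. Because every step only runs 2-means on one cluster, the cost of each bisection stays small.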


Visualization of the K-means algorithm

Three evaluation metrics are used in the experiments: two external quality measures, entropy and F-measure, and one internal quality measure, overall similarity. The paper described each measure in detail.
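
As a rough illustration of the two external measures, the sketch below computes them from a list of true class labels and a parallel list of cluster assignments. It follows the commonly used definitions: entropy is the size-weighted average of each cluster's class entropy, and F-measure is the class-size-weighted best F score over clusters. The function names and the use of the natural logarithm are my assumptions, not something taken from the paper.

```python
import math
from collections import Counter


def clustering_entropy(labels, clusters):
    """Size-weighted average class entropy of the clusters (lower is better)."""
    n = len(labels)
    members = {}
    for label, cluster in zip(labels, clusters):
        members.setdefault(cluster, []).append(label)
    total = 0.0
    for cluster_labels in members.values():
        size = len(cluster_labels)
        counts = Counter(cluster_labels)
        entropy = -sum((c / size) * math.log(c / size) for c in counts.values())
        total += (size / n) * entropy
    return total


def clustering_f_measure(labels, clusters):
    """Class-size-weighted best F score over all clusters (higher is better)."""
    n = len(labels)
    class_sizes = Counter(labels)
    cluster_sizes = Counter(clusters)
    joint = Counter(zip(labels, clusters))
    total = 0.0
    for cls, n_i in class_sizes.items():
        best = 0.0
        for clu, n_j in cluster_sizes.items():
            n_ij = joint[(cls, clu)]
            if n_ij == 0:
                continue
            recall = n_ij / n_i
            precision = n_ij / n_j
            best = max(best, 2 * recall * precision / (recall + precision))
        total += (n_i / n) * best
    return total
```

A perfect clustering gives an entropy of 0 and an F-measure of 1, while a clustering that mixes two equally sized classes evenly gives an entropy of log 2 and an F-measure of 0.5.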

Eight data sets were used in the experiments: 5 from TREC, 2 from Reuters-21578, and 1 from WebACE. The performance of three agglomerative hierarchical techniques, Intra-Cluster Similarity Technique (IST), Centroid Similarity Technique (CST), and UPGMA, was compared using F-measure and entropy. UPGMA was the best performing hierarchical technique overall, so its performance was then compared against standard K-means and bisecting K-means. The performance of bisecting K-means with refinement and of hierarchical clustering with refinement was also included in the comparison. In the experiments, the authors used many runs of the regular K-means algorithm and also used incremental updating of centroids.

Experimental results show that the bisecting K-means technique is better than the standard K-means approach and as good as or better than the hierarchical approaches on the three evaluation metrics mentioned. Also, the time complexity of bisecting K-means is linear in the number of documents, which makes it very attractive.

The authors argued that the agglomerative hierarchical clustering didn’t do well because nearest neighbors of documents often belong to different classes. K-means and bisecting K-means algorithms do better because they rely on a more global approach. They also believe that bisecting K-means does better than standard K-means because it produces relatively uniformly sized clusters.


Video of the Day:

The Honest $10000 SPAM

Even though miracles do happen, don't click on suspicious links or give out your bank information. The Nigerian connection at the end of the video is simply hilarious!!

Friday, February 27, 2009

Joy of Life: Volume 1 Chapter 5

Volume One: The City by the Sea
-- written by Maoni


Chapter 5: Staggering Pillow

Although Fan Xian had the body of a four-year-old, inside he had the soul of a mature person. The bloodshed and massacre on his first day in this world left deep imprints in his mind. For that reason, he always felt a sense of unease in his heart, knowing that his mysterious pedigree would one day bring serious trouble to him.
Trouble had finally found him today.
Since his sneak attack ended in vain, it was no good to repeat the old trick. While sobbing pitifully with the intent to baffle the late night visitor, Fan Xian worked his brain hard for a method of escape.
“If I cry for help, the man would surely kill me in no time. But the visitor is not moving. He must be really confused when I randomly called him ‘Daddy’,” he thought to himself.
“Daddy, Daddy…,” he immediately cried out, taking full advantage of his very young age.
“Stop pretending, Young Master Fan,” the man said in an indifferent tone. “You are really smart. Even at such a young age, you already know how to protect yourself. But you probably also know very well that I am not the Count, his Excellency.”
After these words, the man tightened his grip on the dagger and advanced toward the four-year-old Fan Xian.
Fan Xian kept the innocent look and tears streamed down his face, but he felt a strong throb in his chest.
“Then who are you, uncle?” he said with a sob.
“Your father sent me to check on you. So don’t scream, okay?”
The late night visitor’s eyes both had small tints of brown and looked somewhat ugly. The crinkles by the corners of his eyes clearly gave away his age. The tone of his words directly reminded Fan Xian of a perverted grandpa who invites little girls to go watch goldfish with him[1].
Fan Xian did not reveal any of those thoughts and still put on a flawless act of a four-year-old kid, showing a bit of fright, a bit of surprise, and a little annoyance.
“You are not Daddy!”
Then as though he had not seen the dagger in the man’s hand, he turned his tiny buttocks and climbed up the big bed.
“I don’t even know what Daddy looks like,” he muttered.
The man walked toward the bedside with a sinister smile. All of a sudden, the little boy on the bed turned his head around and gazed in the direction behind the man with sheer joy. “Mommy!” he called out.
……
……
This was a very clumsy attempt to trick someone. If it were done by any other person, the man would not have fallen for it. He was, after all, a great master who owned an entire laboratory in the Capital City.
However, since it was done by a four-year-old boy, the man simply believed him. As soon as he heard Fan Xian calling for Mommy, his eyes opened wide in shock, and he jerked his head around and looked over his shoulder.
Naturally, behind him were only the tightly-shut door and the dense shade of the night.
“Wham!” a brittle cracking sound suddenly resonated in the bedroom. The late night visitor collapsed to the floor, blood all over his head.
Holding the remaining half of the porcelain “pillow” in his hands, Fan Xian looked at the man lying on the floor, heart still fluttering with fear. He felt the weight of the remaining “pillow” and then clenched his teeth. Raising his arms high, he swung the porcelain headrest at the back of the man’s head with all his might.
This time the sound was somewhat muffled because of the force behind the blow. Even if this late night visitor were a first-class master in martial arts, this staggering “pillow” blow would have kept him unconscious for a good while.
……
……
“Is everything alright?” The servant girl’s voice arose from the outside.
“Everything is fine, Sis! I broke a tea cup. You can clean it up tomorrow.”
“I better clean it up today. You might step on the broken pieces and hurt your feet.”
“You can do it tomorrow!”
When the servant girl recognized the unusually bad temper in the voice of the always gentle and lovely Young Master, she held her tongue and said nothing.
Fan Xian walked over to the closet and pulled out a quilt used in the winter. Holding tight onto the quilt cover, he ripped hard and made himself some cloth strips. Twisting the cloth strips together, he firmly tied up the unconscious man on the floor. Only then did he notice that the clothes on his back had been soaked by cold sweat.
Fear quickly welled up in his heart. In both his previous life and the present one, this was the first time he had ever tried to kill someone, even though he was still unsure whether he had killed the man or not.
“That was too risky,” he thought. “If the man were indeed a kung fu master, my first strike would surely have gotten me killed.”
He reached underneath the cloth mask of the man and felt. The man was still breathing. For some unknown reason, Fan Xian suddenly felt a strong urge to kill the man. A cold shudder awoke him. He suddenly realized that ever since his rebirth, he seemed to have morphed into someone much more firm and resolute. He didn’t even hesitate for a second when he ruthlessly attacked the man.
He didn’t realize that deep inside the heart of the child named Fan Xian, he considered himself as someone who had already died once. Therefore, the new life in this world was especially precious to him, and he would not allow anyone to take it away from him.
As the old saying goes, “You never know what you have until you lose it.” The truth was just that simple.
Fan Xian held the dagger in his hand and pondered over and over. In the end he still couldn’t make himself kill the unconscious man on the floor. Suddenly he thought of someone and a smile crept up his face. Silently pushing the door open, he ran to the back court and crawled out from a hole in the wall used by dogs to get in and out. Soon he was standing outside of the small grocery store on the street corner right across from the Count’s Manor.
……
……
“Thump! Thump!” He gently knocked on the door of the small grocery store. The sounds were very light and didn’t travel far even in the quiet night of Port Danzhou. But Fan Xian knew that the man inside would hear the knock with no problem. Even though the man had always pretended not to know him over the last four years, at this critical moment he was the only person in this world Fan Xian could trust.
“Who is it?”
A flat, emotionless voice came from inside the grocery store.
“He hasn’t changed at all,” Fan Xian thought, “just like that year outside of the Capital City, following a prescribed pattern in both his speaking and his acts.” He rolled his eyes and then replied softly, “I am Fan Xian.”
As expected, the wooden door of the grocery store opened quietly, and the blind youngster stood at the doorway like a ghost, which actually caught Fan Xian by surprise.
Fan Xian stared at the man who had sent him to Port Danzhou, at his face and the black cloth covering his eyes, neither of which seemed to have changed in the slightest. He couldn’t help wondering, “Doesn’t this man ever get old?”





[1] “Inviting little girls to go watch goldfish” is a popular phrase used in China to refer to child molesters.


Now support the author Maoni by clicking this link, and support the translator Lanny by following my blog and leaving comments! :)

Video of the Day:


Got an iPhone? Got a blender? Ever wondered if your iPhone would blend in your blender? Well, find out from this video! The funny thing is that every Saturday, I play soccer right across the street from this company called Blendtec!

Thursday, February 26, 2009

Robot of the Day: Geminoid-F, the Female Android Twin

Not too long ago, in a previous post, we talked about Geminoid, Dr. Ishiguro's "twin brother" robot, which was built in 2006. Four years later, in April 2010, Dr. Ishiguro unveiled his latest creation: a female android named Geminoid-F (F stands for female), modeled after an unnamed woman in her 20s (photo credit Yoshikazu Tsuno/AFP), according to this article at IEEE Spectrum.

The female robot is really the product of a joint effort among Osaka University, ATR Laboratory, and Kokoro Co., a small Japanese firm specializing in building androids. Compared to the old male model, this new, more advanced model has the following improvements:
  • It can exhibit facial expressions much more naturally.
  • It only has 12 actuators (the male model had 50).
  • Air servo valves and the control system are now embedded in the robot's body, so only a small external compressor is needed (the male model had a large external box for compressors and valves).
  • The tele-operation system uses facial recognition software, so the operator doesn't have to wear any sensors at all.

Similar to its male predecessor, Geminoid-F cannot walk and has only limited movement in its arms and legs. Most of the actuators are located around the neck and face. The main purpose of the robot is tele-presence, where an operator sits in front of a camera and the robot mimics the person's facial expressions and lip/neck movements. One possible application of the robot is to support remote companionship. Dr. Ishiguro plans to test the robot in hospitals.


With a price tag of US $110,000 per copy, such a robot might not be very attractive to consumers, even to people who seriously long for a twin brother or sister. However, the research team accomplished at least two things:
  • advanced the technology of natural expression for a robot, and
  • generated ample interest from the public to pay more attention to robotics technology.
That leaves me with only one question: Who is he going to duplicate next time?


Picture of the Day:

Adeline will have a dance recital tomorrow. But she just lost a tooth for the first time in her life! So here I present you:

The pretty, little dancer with a tooth missing!


Wednesday, February 25, 2009

Random Thoughts: Adventure in Japan -- Part 3

This is the last installment of the series. To read the previous two installments, follow the links below:

Adventure in Japan -- Part 1

Adventure in Japan -- Part 2

Osaka is the second largest city in Japan, with over 20 million people, and the country's commercial capital. It was also the base for Toyotomi Hideyoshi in his successful unification of Japan during the sixteenth century. These different roles resulted in a city that mixes modern technology with historical heritage, making Osaka a unique place where traditional culture thrives side by side with present-day lifestyles. Here you can find skyscrapers (e.g., Business Innovation Center Osaka, where the HRI 2010 conference was held) alongside sixteenth-century castles (e.g., Osaka Castle right in the middle of the city), and highly sophisticated robots (e.g., the D+ ropop robot, designed and made in Osaka) together with women wearing the traditional Japanese garment, the kimono, waiting at the subway station.


Left: Business Innovation Center Osaka. Right: Osaka Castle in the middle of the city


Left: D+ ropop robot representing modern beauty and advanced technology. Right: Woman in traditional Kimono waiting at the subway station.

During our visit, we stayed at the City Plaza Osaka hotel (left below) right at the heart of downtown, where we could see the crowded city landscape right from the hotel window (right below). The building has a very modern look from the outside. However, the oval-shaped top portion actually contains a traditional Japanese spa, where people bathe together completely naked (they do separate men from women).


Left: City Plaza Osaka Hotel. Right: City landscape view from our hotel room.

Due to our busy schedule, we only had half a day to look around the city before we flew back to the US, so in the morning, I was forced to take on the Japanese subway system all by myself so I could complete the mission of getting my wife some famous Japanese cosmetics. I had two hours to do it, and I pulled it off even though I almost got on a train going in the opposite direction and had to run in pouring rain in random directions.

Left: People riding subway train on a Saturday morning. Right: Street view of downtown Osaka.


Left: Saturday shoppers at the shopping district. Middle: people enjoying traditional Japanese food (in the cylinder). Right: Female customers shopping for Kimono.


Left: Cosmetics store. Right: Sushi shop along the street.

Later in the morning, we visited the famous Osaka Castle. Since the castle is located right in the middle of the city, we simply walked over. It was raining at the time, but that didn't stop us from a nice spring field trip. The ancient looking castle with a moat surrounding it was right next to the very modern looking Osaka City Museum, making a very sharp visual contrast.

Left: Osaka City Museum, right across the street from the castle. Right: Moat of the Osaka Castle


Students making spring field trips on Saturday. Left: Elementary school kids heading to unknown location at the subway station. Right: Middle school students visiting the Osaka Castle

The Osaka Castle park covers approximately 15 acres of ground. Due to limited time (and we really didn't want to miss our plane), we walked straight to the main building in the castle and then walked straight back to the hotel. I wish we could have spared a bit more time because there were already beautiful cherry blossoms in other parts of the park.


Looking at city landscape from the top of the Osaka Castle (can you see the cherry blossoms?)

The main castle has been turned into a museum showing many historical artifacts and documents dating back to the sixteenth century. Most levels of the building prohibit photography, so I don't really have a lot to show you here except the two below.


Left: Miniature figures depicting an ancient battle. Right: Battle helmets worn by ancient warlords.

It was quite fortunate that such a famous historical site was within walking distance of our hotel, so we were actually able to see traditional and historical sites during the trip. I hope one day I can bring my family to visit the beautiful city again, so my wife can go shop for cosmetics while I do real sightseeing. :)


Video of the Day:

The beautiful Osaka Castle


Tuesday, February 24, 2009

Paper Review: Text Categorization with Support Vector Machines: Learning with Many Relevant Features

This paper was written by Thorsten Joachims from the University of Dortmund. It was published at ECML-98, the 10th European Conference on Machine Learning. It has been cited 3632 times according to Google Scholar. Very influential paper indeed!

This paper provides both theoretical and empirical evidence that SVMs are great for text categorization.

When performing text classification, the first step is to transform documents into feature vectors. Each distinct word becomes a feature and the frequency of the word in the document is the value of the feature, resulting in a very high-dimensional feature space. Then, the information gain criterion can be used to select a subset of features. The final step is to scale the dimensions of the feature vector by their inverse document frequency.
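
The word-count and inverse-document-frequency steps can be sketched in a few lines of plain Python. This is only an illustration under my own assumptions: whitespace tokenization, lowercasing, the common `log(N / df)` form of the inverse document frequency, and no feature selection step. The function name `tfidf_vectors` is hypothetical.

```python
import math
from collections import Counter


def tfidf_vectors(documents):
    """Turn raw text documents into term-frequency vectors scaled by IDF."""
    tokenized = [doc.lower().split() for doc in documents]
    # One feature per distinct word, in a fixed (sorted) order.
    vocabulary = sorted({word for tokens in tokenized for word in tokens})
    n_docs = len(tokenized)
    # Document frequency: in how many documents does each word appear?
    doc_freq = Counter(word for tokens in tokenized for word in set(tokens))
    idf = {word: math.log(n_docs / doc_freq[word]) for word in vocabulary}
    vectors = []
    for tokens in tokenized:
        term_freq = Counter(tokens)
        vectors.append([term_freq[word] * idf[word] for word in vocabulary])
    return vocabulary, vectors
```

Even on a toy corpus most entries of each vector are zero, and a word that appears in every document gets a weight of zero. On a real corpus the vocabulary easily reaches tens of thousands of words, which is where the very high-dimensional, sparse feature space comes from.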

SVMs are universal learners. They can learn linear threshold functions and, by using the kernel trick, non-linear functions as well. One great property of SVMs is that their ability to learn can be independent of the dimensionality of the feature space, because the resulting classifier is determined by the support vectors rather than by the number of features. According to the paper, SVMs also do not require manual parameter tuning.




Left: Group of dots in different colors on 2D plane.
Right: Boundaries identified using SVM to group colored dots.

SVMs work well for text categorization because of the following properties of text: 1) High dimensional input space. SVMs do not depend on the number of features and only use support vectors, which helps prevent overfitting. 2) Few irrelevant features. Aggressive feature selection may result in loss of information, whereas SVMs can work with a very high dimensional feature space. 3) Document vectors are sparse. SVMs work well on problems with dense concepts and sparse instances. 4) Most text categorization problems are linearly separable. Together, these properties make up the theoretical evidence.

The paper used two test collections: the “ModApte” split of the Reuters-21578 database and the Ohsumed corpus. SVMs are compared with Naïve Bayes, Rocchio, kNN, and the C4.5 decision tree. The precision/recall breakeven point is used as the measure of performance. Experimental results show that SVMs had robust performance improvements over the other algorithms. Training SVMs is slow but classification is very fast. SVMs eliminate the need for feature selection and do not require any manual parameter tuning.
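
For readers unfamiliar with the precision/recall breakeven point, here is one simple way to compute it for a single category, assuming each document has a classifier score and a 0/1 relevance flag. If there are R relevant documents and we accept exactly the R highest-scoring ones, precision and recall share the same denominator and are therefore equal. The function name is hypothetical, and the paper's exact procedure for locating the breakeven point may differ from this sketch.

```python
def breakeven_point(scores, relevant):
    """Precision (= recall) when exactly as many documents are accepted
    as there are relevant documents."""
    n_relevant = sum(relevant)
    if n_relevant == 0:
        raise ValueError("need at least one relevant document")
    # Rank documents from highest to lowest classifier score.
    ranked = sorted(zip(scores, relevant), key=lambda pair: pair[0], reverse=True)
    hits = sum(flag for _, flag in ranked[:n_relevant])
    return hits / n_relevant
```

For example, with scores `[0.9, 0.8, 0.7, 0.6, 0.5]` and relevance flags `[1, 0, 1, 1, 0]`, three documents are relevant and two of the top three are hits, so the breakeven point is 2/3.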

Video of the Day:

I am not a member of the Mormon church, but I still found this story very moving. It was told by the former LDS Church President Gordon Hinckley. Hope you enjoy it! And may God bless us all if there is a God.