Leave me comments so I know people are actually reading my blogs! Thanks!

Saturday, February 28, 2009

Paper Review: A Comparison of Document Clustering Techniques

This paper was written by Steinbach, Karypis, and Kumar of the University of Minnesota and published at the KDD Workshop on Text Mining in 2000.

This paper presents the results of an experimental study of two main approaches to document clustering, agglomerative hierarchical clustering and K-means (standard K-means and bisecting K-means).

Example of Hierarchical Agglomerative Clustering (HAC)

The two basic approaches to generating a hierarchical clustering are agglomerative and divisive. The paper evaluated agglomerative techniques in the comparison. It then described the agglomerative clustering algorithm, the K-means algorithm, and the bisecting K-means algorithm in detail.

Visualization of the K-means algorithm

Three evaluation metrics are used in the experiments: two external quality measures, entropy and F-measure, and one internal quality measure, overall similarity. The paper described each measure in detail.
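To make the two external measures concrete, here is a minimal Python sketch of how entropy and F-measure are commonly defined for a clustering checked against known class labels. This is my own illustration, not code from the paper: the function names are made up, and the base-2 logarithm is an assumption. Lower entropy is better, and higher F-measure is better.

```python
import math
from collections import Counter

def clustering_entropy(labels, clusters):
    """Size-weighted average of per-cluster class entropy (lower is better)."""
    n = len(labels)
    total = 0.0
    for c in set(clusters):
        # Class labels of the documents that landed in cluster c
        members = [lab for lab, k in zip(labels, clusters) if k == c]
        counts = Counter(members)
        e = -sum((m / len(members)) * math.log2(m / len(members))
                 for m in counts.values())
        total += len(members) / n * e
    return total

def clustering_f_measure(labels, clusters):
    """For each class take the best F over all clusters, then weight by class size."""
    n = len(labels)
    class_sizes = Counter(labels)
    cluster_sizes = Counter(clusters)
    joint = Counter(zip(labels, clusters))  # n_ij: class i documents in cluster j
    total = 0.0
    for cls, n_i in class_sizes.items():
        best = 0.0
        for clu, n_j in cluster_sizes.items():
            n_ij = joint.get((cls, clu), 0)
            if n_ij == 0:
                continue
            recall = n_ij / n_i
            precision = n_ij / n_j
            best = max(best, 2 * recall * precision / (recall + precision))
        total += n_i / n * best
    return total
```

For example, a clustering that reproduces the classes exactly gets entropy 0 and F-measure 1, while a clustering that splits two equal classes evenly across two clusters gets entropy 1 and F-measure 0.5.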

Eight data sets were used in the experiments: 5 from TREC, 2 from Reuters-21578, and 1 from WebACE. The performance of three agglomerative hierarchical techniques, Intra-Cluster Similarity Technique (IST), Centroid Similarity Technique (CST), and UPGMA, was compared using F-measure and entropy. UPGMA was the best performing hierarchical technique overall, so its performance was then compared against standard K-means and bisecting K-means. Bisecting K-means with refinement and hierarchical clustering with refinement were also included in the comparison. In the experiments, the authors used many runs of the regular K-means algorithm and also used incremental updating of centroids.

Experimental results show that the bisecting K-means technique is better than the standard K-means approach and as good as or better than the hierarchical approaches on the three evaluation metrics mentioned. Also, the time complexity of bisecting K-means is linear in the number of documents, which makes it very attractive.
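To make the bisecting idea concrete, here is a minimal Python sketch. It is not the authors' code. I am assuming Euclidean distance on small dense vectors (the paper works with similarity between document vectors), always splitting the largest cluster, and keeping the best of a few trial splits by sum of squared errors. The names `bisecting_kmeans`, `trials`, and `seed` are mine.

```python
import random

def _mean(points):
    dim = len(points[0])
    return [sum(p[d] for p in points) / len(points) for d in range(dim)]

def _sqdist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def _two_means(points, rng, iters=20):
    """Plain K-means with K=2, started from two randomly sampled points."""
    centers = rng.sample(points, 2)
    groups = ([], [])
    for _ in range(iters):
        groups = ([], [])
        for p in points:
            idx = 0 if _sqdist(p, centers[0]) <= _sqdist(p, centers[1]) else 1
            groups[idx].append(p)
        if not groups[0] or not groups[1]:
            break  # degenerate split; the caller discards it
        centers = [_mean(groups[0]), _mean(groups[1])]
    return groups

def _sse(group):
    c = _mean(group)
    return sum(_sqdist(p, c) for p in group)

def bisecting_kmeans(points, k, trials=5, seed=0):
    """Start with one cluster and keep bisecting the largest until there are k."""
    rng = random.Random(seed)
    clusters = [list(points)]
    while len(clusters) < k:
        clusters.sort(key=len)
        target = clusters.pop()  # largest cluster (an assumed selection rule)
        if len(target) < 2:
            clusters.append(target)
            break
        best = None
        for _ in range(trials):
            a, b = _two_means(target, rng)
            if not a or not b:
                continue
            cost = _sse(a) + _sse(b)
            if best is None or cost < best[0]:
                best = (cost, a, b)
        if best is None:
            clusters.append(target)
            break
        clusters.extend([best[1], best[2]])
    return clusters
```

Calling `bisecting_kmeans(points, 2)` on two well-separated groups of points should return those two groups. Each bisection only runs 2-means on one cluster, which is where the speed advantage over agglomerative methods comes from.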

The authors argued that the agglomerative hierarchical clustering didn’t do well because nearest neighbors of documents often belong to different classes. K-means and bisecting K-means algorithms do better because they rely on a more global approach. They also believe that bisecting K-means does better than standard K-means because it produces relatively uniformly sized clusters.

Video of the Day:

The Honest $10000 SPAM

Even though miracles happen, still, don't click on suspicious links or give out your bank information. The Nigerian connection at the end of the video is simply hilarious!!

Friday, February 27, 2009

Joy of Life: Volume 1 Chapter 5

Volume One: The City by the Sea
-- written by Maoni

Chapter 5: Staggering Pillow

Although Fan Xian had the body of a four-year-old, inside he had the soul of a mature person. The bloodshed and massacre on his first day in this world left deep imprints in his mind. For that reason, he always felt a sense of unease in his heart, knowing that his mysterious pedigree would one day bring serious trouble to him.
Trouble had finally found him today.
Since his sneak attack ended in vain, it was no good to repeat the old trick. While sobbing pitifully with the intent to baffle the late night visitor, Fan Xian worked his brain hard for a method of escape.
“If I cry for help, the man would surely kill me in no time. But the visitor is not moving. He must be really confused when I randomly called him ‘Daddy’,” he thought to himself.
“Daddy, Daddy…,” he immediately cried out, taking full advantage of his very young age.
“Stop pretending, Young Master Fan,” the man said in an indifferent tone. “You are really smart. Even at such a young age, you already know how to protect yourself. But you probably also know very well that I am not the Count, his Excellency.”
After these words, the man tightened his grip on the dagger and advanced toward the four-year-old Fan Xian.
Fan Xian kept the innocent look and tears streamed down his face, but he felt a strong throb in his chest.
“Then who are you, uncle?” he said with a sob.
“Your father sent me to check on you. So don’t scream, okay?”
The late night visitor’s two eyes both had small tints of brown and looked somewhat ugly. The crinkles by the corners of his eyes clearly gave away his age. The tone of his words directly reminded Fan Xian of a pervert grandpa who invites little girls to go watch goldfish with him[1].
Fan Xian did not reveal any of those thoughts and still put out a flawless act of a four-year-old kid, showing a bit of fright, a bit of surprise, and a little annoyance.
“You are not Daddy!”
Then as though he had not seen the dagger in the man’s hand, he turned his tiny buttocks and climbed up the big bed.
“I don’t even know what Daddy looks like,” he muttered.
The man walked toward the bedside with a sinister smile. All of a sudden, the little boy on the bed turned his head around and gazed in the direction behind the man with sheer joy. “Mommy!” he called out.
This was a very clumsy attempt to trick someone. If it were done by any other person, the man would not have fallen for it. He was, after all, a great master who owned an entire laboratory in the Capital City.
However, since it was done by a four-year-old boy, the man simply believed in him, and as soon as he heard Fan Xian’s calling for Mommy, his eyes opened wide in shock, and he jerked his head around and looked over his shoulder.
Naturally, behind him were only the tightly-shut door and the dense shade of the night.
“Wham!” a brittle cracking sound suddenly resonated in the bedroom. The late night visitor collapsed to the floor, blood all over his head.
Holding the remaining half of the porcelain “pillow” in his hands, Fan Xian looked at the man lying on the floor, heart still fluttering with fear. He felt the weight of the remaining “pillow” and then clenched his teeth. Raising his arms high, he swung the porcelain headrest at the back of the man’s head with all his might.
This time the sound was somewhat muffled because of the strength carried. Even if this late night visitor were a first-class master in martial arts, this staggering “pillow” blow would have kept him unconscious for a good while.
“Is everything alright?” The servant girl’s voice arose from the outside.
“Everything is fine, Sis! I broke a tea cup. You can clean it up tomorrow.”
“I better clean it up today. You might step on the broken pieces and hurt your feet.”
“You can do it tomorrow!”
When the servant girl recognized the unusual bad temper in the voice of the always gentle and lovely Young Master, she held her tongue and said nothing.
Fan Xian walked to the closet and pulled out a quilt used in the winter. Holding tight onto the quilt cover, he ripped hard and made himself some cloth strips. Twisting the cloth strips together, he tied up the unconscious man on the floor firmly. Only then did he notice that the clothes on his back had been soaked by cold sweat.
Fear quickly welled up in his heart – both in his previous life and the present life, this was the first time he ever tried to kill someone, even though he was still unsure if he did kill the man or not.
“That was too risky,” he thought, “if the man was indeed a Kung Fu master, my first strike would for sure have gotten me killed.”
He reached underneath the cloth mask of the man and felt. The man was still breathing. For some unknown reason, Fan Xian suddenly felt a strong urge to kill the man. A cold shudder awoke him. He suddenly realized that ever since his rebirth, he seemed to have morphed into someone much more firm and resolute. He didn’t even hesitate for a second when he ruthlessly attacked the man.
He didn’t realize that deep inside the heart of the child named Fan Xian, he considered himself as someone who had already died once. Therefore, the new life in this world was especially precious to him, and he would not allow anyone to take it away from him.
As the old saying goes, “You never know what you have until you lose it.” The truth was just that simple.
Fan Xian held the dagger in his hand and pondered over and over. In the end he still couldn’t make himself kill the unconscious man on the floor. Suddenly he thought of someone and a smile crept up his face. Silently pushing the door open, he ran to the back court and crawled out from a hole in the wall used by dogs to get in and out. Soon he was standing outside of the small grocery store on the street corner right across from the Count’s Manor.
“Thump! Thump!” he gently knocked on the door of the small grocery store. The sounds were very light and didn’t travel far even in the quiet night of Port Danzhou. But Fan Xian knew that the man inside would hear the knock with no problem. Even though the man always pretended to not know him for the last four years, at the critical moment, this man was the only person in this world Fan Xian could trust.
“Who is it?”
A flat, emotionless voice came from inside the grocery store.
“He hasn’t changed at all,” Fan Xian thought, “just like that year outside of the Capital City, following a prescribed pattern in both his speaking and his acts.” He rolled his eyes and then replied softly, “I am Fan Xian.”
As expected, the wooden door of the grocery store opened quietly, and the blind youngster stood at the doorway like a ghost, which actually caught Fan Xian by surprise.
Fan Xian stared at the man who sent him to Port Danzhou and the face and the black cloth covering his eyes, which didn’t seem to have changed even the slightest bit. He couldn’t help wondering, “Doesn’t this man ever get old?”

[1] “Inviting little girls to go watch goldfish” is a popular phrase used in China to refer to pervert child molesters.

Now support the author Maoni by clicking this link, and support the translator Lanny by following my blog and leaving comments! :)

Video of the Day:

Got an iPhone? Got a blender? Ever wondered if your iPhone would blend in your blender? Well, find out from this video! The funny thing is that every Saturday, I play soccer right across the street from this company called Blendtec!

Thursday, February 26, 2009

Robot of the Day: Geminoid-F, the Female Android Twin

Not too long ago, we talked about Geminoid, Dr. Ishiguro's "twin brother" robot, in a previous post. That robot was built in 2006. Four years later in April 2010, Dr. Ishiguro unveiled his latest creation: a female android named Geminoid-F (F stands for female), built after an unnamed female model in her 20s (photo credit Yoshikazu Tsuno/AFP), according to this article at IEEE Spectrum.

The female robot is really the product of a joint effort among Osaka University, ATR Laboratory, and Kokoro Co., a small Japanese firm specializing in building androids. Compared to the old male model, this new, more advanced model has the following improvements:
  • It can exhibit facial expressions much more naturally.
  • It only has 12 actuators (male model had 50)
  • Air servo valves and the control system are not embedded into the robot's body, and there is only a small external compressor (male model had a large external box for compressors and valves.)
  • The tele-operation system is using facial recognition software, so the operator doesn't have to wear any sensor at all.

Similar to its male predecessor, Geminoid-F cannot walk and has only limited movement in its arms and legs. Most of the actuators are located around the neck and face. The main purpose of the robot is tele-presence, where an operator sits in front of a camera and the robot mimics the person's facial expressions and lip/neck movements. One possible application of the robot is to support remote companionship. Dr. Ishiguro plans to test the robot in hospitals.

With a price tag of US $110,000 per copy, such a robot might not be very attractive to consumers, even to people who seriously long for a twin brother or sister. However, the research team accomplished at least two things:
  • advanced the technology of natural expression for a robot, and
  • generated ample interest from the public to pay more attention to robotics technology
That leaves me with only one question: Who is he going to duplicate next time?

Picture of the Day:

Adeline will have a dance recital tomorrow. But she just lost a tooth for the first time in her life! So here I present you:

The pretty, little dancer with a tooth missing!

Wednesday, February 25, 2009

Random Thoughts: Adventure in Japan -- Part 3

This is the last installment of the series. To read the previous two installments, follow the links below:

Adventure in Japan -- Part 1

Adventure in Japan -- Part 2

Osaka is the second largest city in Japan, with over 20 million people, and the country's commercial capital. It was also the base for Toyotomi Hideyoshi in his successful unification of Japan during the sixteenth century. These different functions and roles resulted in a city that mixes modern technology with historical heritage, making Osaka a unique city where traditional culture thrives side by side with present-day lifestyle. Here you can find skyscrapers (e.g., Business Innovation Center Osaka, where the HRI 2010 conference was held) alongside sixteenth century castles (e.g., Osaka Castle right in the middle of the city), and highly sophisticated robots (e.g., D+ ropop robot, designed and made in Osaka) together with women wearing the traditional Japanese clothing, Kimono, waiting at the subway station.

Left: Business Innovation Center Osaka. Right: Osaka Castle in the middle of the city

Left: D+ ropop robot representing modern beauty and advanced technology. Right: Woman in traditional Kimono waiting at the subway station.

During our visit, we stayed at the City Plaza Osaka hotel (left below) right at the heart of downtown, where we could see the crowded city landscape right from the hotel window (right below). The building has a very modern look from the outside. However, the oval shaped top portion actually contains a traditional Japanese spa, where people bathe together completely naked (they do separate men from women).

Left: City Plaza Osaka Hotel. Right: City landscape view from our hotel room.

Due to our busy schedule, we only had half a day to look around the city before we flew back to the US, so in the morning, I was forced to take on the Japanese subway system all by myself so I could complete the mission of getting my wife some famous Japanese cosmetics. I had two hours to do it, and I pulled it off even though I almost got on a train going the opposite direction and had to run in pouring rain in random directions.

Left: People riding subway train on a Saturday morning. Right: Street view of downtown Osaka.

Left: Saturday shoppers at the shopping district. Middle: people enjoying traditional Japanese food (in the cylinder). Right: Female customers shopping for Kimono.

Left: Cosmetics store. Right: Sushi shop along the street.

Later in the morning, we visited the famous Osaka Castle. Since the castle is located right in the middle of the city, we simply walked over. It was raining at the time, but that didn't stop us from a nice spring field trip. The ancient looking castle with a moat surrounding it was right next to the very modern looking Osaka City Museum, making a very sharp visual contrast.

Left: Osaka City Museum right across the street from the castle. Right: Moat of the Osaka Castle

Students making spring field trips on Saturday. Left: Elementary school kids heading to unknown location at the subway station. Right: Middle school students visiting the Osaka Castle

The Osaka Castle park covers approximately 15 acres of ground. Due to limited time (and we really didn't want to miss our plane), we walked straight to the main building in the castle and then walked straight back to the hotel. I wish we could have spared a bit more time because there were already beautiful cherry blossoms in other parts of the park.

Looking at city landscape from the top of the Osaka Castle (can you see the cherry blossoms?)

The main castle has been turned into a museum showing many historical artifacts and documents dating back to the sixteenth century. Most levels of the building prohibit photography, so I don't really have a lot to show you here except the two below.

Left: Miniature figures depicting an ancient battle. Right: Battle helmets worn by ancient warlords.

It was quite fortunate that such a famous historical site was within walking distance of our hotel, so we were actually able to see traditional and historical sites during the trip. I hope one day I can bring my family to visit the beautiful city again, so my wife can go shop for cosmetics while I do real sightseeing. :)

Video of the Day:

The beautiful Osaka Castle

Tuesday, February 24, 2009

Paper Review: Text Categorization with Support Vector Machines: Learning with Many Relevant Features

This paper was written by Thorsten Joachims from the University of Dortmund. It was published at ECML-98, the 10th European Conference on Machine Learning. It has been cited 3632 times according to Google Scholar. Very influential paper indeed!

This paper provides both theoretical and empirical evidence that SVMs are great for text categorization.

When performing text classification, the first step is to transform documents. Each distinct word becomes a feature and the frequency of the word in the document is the value of the feature, resulting in a very high-dimensional feature space. Then, the information gain criterion can be used to select a subset of features. The final step is to scale the dimensions of the feature vector with their inverse document frequency.
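The representation described above can be sketched in a few lines of Python. This is a minimal illustration under my own assumptions, not the paper's exact weighting: tokens are split on whitespace, the term frequency is a raw count, the inverse document frequency is log(N / df), and the information gain selection step is left out. The function name `tfidf_vectors` is hypothetical.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Turn raw documents into sparse {word: tf * idf} feature vectors."""
    tokenized = [doc.lower().split() for doc in docs]
    n = len(tokenized)
    # Document frequency: in how many documents does each word appear?
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)  # raw word counts in this document
        vectors.append({t: c * math.log(n / df[t]) for t, c in tf.items()})
    return vectors
```

With this weighting, a word that appears in every document gets weight 0, while a word concentrated in few documents gets a larger weight. Each vector only stores the words that occur in its document, which is why document vectors are so sparse even though the feature space is huge.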

SVMs are very universal learners. They can learn linear threshold functions and can also learn non-linear functions by using the kernel trick. One great property of SVMs is that their ability to learn is independent of the dimensionality of the feature space, because SVMs use support vectors. SVMs also do not require parameter tuning.

Left: Group of dots in different colors on 2D plane.
Right: Boundaries identified using SVM to group colored dots.

SVMs work well for text categorization because of the following properties of text: 1) High dimensional input space. SVMs do not depend on the number of features and only use support vectors. This prevents overfitting. 2) Few irrelevant features. Aggressive feature selection may result in loss of information. SVMs can work with very high dimensional feature space. 3) Document vectors are sparse. SVMs work well with problems with dense concepts and sparse instances. 4) Most text categorization problems are linearly separable. This is the theoretical evidence.

The paper used two test collections: the “ModApte” split of the Reuters-21578 database and the Ohsumed corpus. SVMs are compared with Naïve Bayes, Rocchio, kNN, and the C4.5 decision tree. The precision/recall breakeven point is used as the measure of performance. Experimental results show that SVMs had robust performance improvements over the other algorithms. Training SVMs is slow but classification is very fast. SVMs eliminate the need for feature selection and do not require any parameter tuning.
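The precision/recall breakeven point is the value at which precision equals recall. One common shortcut, which I am assuming here rather than taking from the paper, is to rank documents by classifier score and mark the top R as positive, where R is the number of truly positive documents. Precision and recall then share the same denominator and are equal. The function name `breakeven_point` is mine.

```python
def breakeven_point(scores, labels):
    """Precision (= recall) when exactly as many documents are predicted
    positive as there are true positives. Labels are 1 (positive) or 0."""
    n_pos = sum(labels)
    if n_pos == 0:
        raise ValueError("need at least one positive example")
    # Rank documents from highest to lowest classifier score
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    true_pos = sum(label for _, label in ranked[:n_pos])
    return true_pos / n_pos
```

For instance, with scores `[0.9, 0.8, 0.3, 0.1]` and labels `[1, 0, 1, 0]`, the top two documents contain one true positive out of two, so the breakeven value is 0.5.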

Video of the Day:

I am not a member of the Mormon church, but I still found this story very moving. It was told by the former LDS Church President Gordon Hinckley. Hope you enjoy it! And may God bless us all if there is a God.

Monday, February 23, 2009

AI and Robots: VEX Robotics Competition World Championship

Two days from today, between April 22 and 24, the 2009-2010 VEX Robotics Competition World Championship will be held at the Dallas Convention Center, where over 3000 contestants from 14 countries around the world will meet and fight their guts out (correction, fight their robots' guts out).

It is interesting that I only heard about this competition a few days ago from my wife because she is actually working on arranging hotel and travel for the Chinese team. Therefore, I looked it up and hence, today's blog post. :)

The main sponsor of the competition is a company called Vex Robotics Design System, which makes and sells robotics kits to hobbyists and young students. At the beginning of the season each year, the organizer announces a new challenge, and students around the world can then form teams to compete in this world-wide competition using robots built from, of course, the VEX robotics kit. Contestants are mostly middle school and high school students, although even elementary school students can compete. These teams compete against each other at the local and regional level until finalists are determined, who then compete in the world championship. The competition is presently in its third season. The challenge for the 2008-2009 season was called Elevation Challenge, and the new one for the 2009-2010 season is called Clean Sweep Challenge.

The video below is from last year's world championship, also held at the Dallas Convention Center.

This year's challenge is the Clean Sweep Challenge, where two teams, each using two robots, are divided into two courts, and the goal is to rack up as many points as possible in a fixed time by pushing, shoveling, throwing, and dumping balls out of the team's own court and into the opponent's court. In the first 20 seconds of the game, the robots play autonomously by running programs written by the contestants. For the remaining duration of the game, each robot is teleoperated by the contestants using a remote control. Each team is free to design the robots any way they like, and the only constraint is that the size of each robot cannot exceed a certain limit (read the detailed description of the rules). The video below is the game animation describing the game in detail.

Since each team has to fight all the way from local to international, there are plenty of videos of games played in different cities and regions. The video below shows a game played by team number 8888 from La Salle High School in the semifinals (probably at the country level). You can probably see that during the first 20 seconds, the robots looked very dumb and didn't really do much. This is probably due to the difficulty for pre-college-level students to master and implement advanced AI algorithms and techniques. However, the students still have to put in a lot of effort designing and implementing these robots from a mechanical engineering perspective. Still though, it would be so nice if we could see people designing fully autonomous robots (or robots with supervisory control) to compete in such interesting games.

I wish all the contestants the best of luck in the upcoming world competition. I am sure they will all have a ton of fun and hopefully many of them will grow up into sincere roboticists.

Video of the Day:

The street magician!

Sunday, February 22, 2009

Full Moon Crescent Saber: Chapter 1 (3)

The creek was as clear as crystal.
Ding Peng walked along the creek, walking very quickly.
Of course he needed to hurry back. He still had many things to do. The morning sun had been rising high gradually. He suddenly felt very hungry, deadly hungry.
Today could very well be the most important day in his entire life. The moment that would decide his fate was right around the corner. But what was he doing? He was looking for an old man in bright red robe for a naked girl on an empty stomach like a fool.
If anyone else had told him of such a fool, he would never have believed it.
The only thing real was that the girl was amazingly beautiful. Furthermore, she also possessed a very special temperament that made it impossible and unbearable for someone to reject her requests.
If there actually exist men capable of saying “no” to the face of this girl, such men must be very rare.
Fortunately, the creek was not very long.
There was indeed an old tree at the end of the creek, and indeed two men playing the game of Go there. One of them was indeed an old man in a bright red robe. Ding Peng heaved a secret sigh and then walked toward them in large strides. Reaching his hand forward, he tried to mess up the present game.
He was indeed an obedient man. But as he reached forward with his hand, his foot suddenly tripped. There was a hole in the ground, and he had stepped into the hole.
Fortunately the hole was not very big and he didn’t fall. Unfortunately, just when he drew his foot out of the hole, his other foot also tripped. Turned out there was a rope circle on the ground, and he happened to step into it. The rope circle immediately tightened.
Since his other foot was still in the air, as soon as the rope circle tightened, he lost his balance.
Even more unfortunately, the rope circle was tied to a tree branch. The tree branch had been bent to the ground. When the rope circle moved, the tree branch immediately shot upward and also swung him upward.
Most unfortunately, as he was swung upward, he happened to bump into another tree branch, and the branch happened to poke him right on an acupoint around his waist, which could easily immobilize a person even when poked lightly. Therefore, Ding Peng found himself hanging upside down like a stupid fish hanging from a fishing pole.
The hole in the ground, the rope circle, and the tree branch – did someone deliberately set up the trap?
When the girl told him to come here, did she intend for him to fall prey to this trap? The two of them obviously didn’t hold any grudges against each other. Why would she do such a terrible thing to him?
The two men under the tree concentrated on their ongoing Go game without ever sparing a glance at him, as though they had no idea that someone came and was now hanging from the tree upside down.
The two must be true Go enthusiasts.
All Go enthusiasts hate interruptions when they are playing.
Maybe they only laid the trap to prevent others from disturbing them. They didn’t do it specifically for him.
Of course the girl would have no idea about such a trap.
At that thought, Ding Peng felt slightly better inside.
“Excuse me, misters! Will you let me down please?” he asked calmly.
But the two Go players didn’t hear him at all. Ding Peng repeated three times, but they ignored him completely as though they didn’t hear a word he said. Ding Peng began to lose his calm.
“Hey…,” he shouted.
He only had the chance to call out that one word. The word required him to open his mouth, but instantly, something flew over and blocked his mouth, something stinking, soft, sticky, and reeking. Ding Peng couldn’t tell if it was mud or something much worse than mud. The thing came from a tree branch on the opposite side. A little monkey wearing a red dress and riding on the branch was actually laughing at him with its mouth stretched wide. Things thrown by a monkey cannot be anything good! He’d be very lucky if it were only mud. Ding Peng nearly fainted from anger. After years of hardship and struggle, when he could almost feel the edge of success, then this happened.
Now support the translator Lanny by following my blog and leaving comments! :)

Picture of the Day:

Adeline does the Kung-Fu Panda at BYU CS Building!
Did you know that Jason Turner, a BYU alumnus currently working at DreamWorks Animation, personally built the computer model for "Po," the panda who stars in the movie?

Saturday, February 21, 2009

Robot of the Day: Tetris-Bot, Lego Robot Playing Tetris

Remember the Rubik's Cube solving robots in a previous post? Well, as robots are gradually taking on our world, they are also taking on more and more of our games, and this time, it's Tetris -- one of the most popular video games in the world -- hmm, this really reminds me of those long, sleepless nights of a poor college student!

Pointing a web cam at a computer screen, hooking it to a Lego Mindstorms NXT robot, and setting the robot next to a keyboard, Branislov Kisacanin successfully created a Tetris-Bot that's capable of playing Tetris all by itself. Although Branislov claims that this was an educational project for his kids, chances are, he had a lot more fun than his kids.

The setup really had three pieces. The first piece is a camera capturing video of a computer screen running the game Tetris. A digital signal processing board then processes the video and determines how the falling piece should be moved. The DSP board then tells the NXT robot what to do using LED lights. Then the NXT robot uses its three fingers (hands) to punch three keys on a keyboard to move left, move right, or rotate. Although the robot is capable of punching 3 keystrokes per second, it moves at a much slower pace.

The creator Branislov must have had a strong engineering background, judging from his choice of a DSP board for signal processing. If I were to create such a robot, I'd probably use a computer to perform the computer vision task. Recognizing the Tetris pieces and their orientation is not a very difficult task because of the color simplicity. Then the program just has to use a data structure to represent the state of the game and then choose moves that will maximize a certain utility (defined by the programmer). The video below demos the capability of the Tetris-Bot. The actual robot doesn't appear until 1:48, so skip forward if you want to hurry.
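Here is a rough Python sketch of the "data structure plus utility" idea I have in mind. It has nothing to do with how the actual Tetris-Bot works. Everything in it is an assumption for illustration: the board is just a list of column heights, each piece orientation is a list of (bottom offset, height) pairs per column, line clearing is ignored, and the utility simply penalizes tall stacks and covered holes with a made-up `hole_weight`.

```python
def drop(heights, shape, x):
    """Drop a piece with its left edge at column x.
    Returns the new column heights and the number of empty cells it covers."""
    # The piece rests at the lowest base where no column overlaps the stack
    base = max(heights[x + i] - off for i, (off, _) in enumerate(shape))
    new_heights = list(heights)
    holes = 0
    for i, (off, h) in enumerate(shape):
        holes += base + off - heights[x + i]  # gap left under this column
        new_heights[x + i] = base + off + h
    return new_heights, holes

def utility(new_heights, holes, hole_weight=3.0):
    """Higher is better: keep the stack low and avoid burying empty cells."""
    return -max(new_heights) - hole_weight * holes

def best_move(heights, orientations):
    """Try every orientation and column, return (orientation index, column)."""
    best = None
    for r, shape in enumerate(orientations):
        for x in range(len(heights) - len(shape) + 1):
            new_heights, holes = drop(heights, shape, x)
            score = utility(new_heights, holes)
            if best is None or score > best[0]:
                best = (score, r, x)
    return best[1], best[2]
```

For example, with column heights `[2, 2, 0, 0]` and a square piece given as `[[(0, 2), (0, 2)]]`, `best_move` picks column 2, which fills the gap on the right instead of stacking on top. A smarter utility could also reward completed rows or look ahead at the next piece.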

Tetris-Bot here plays like a novice player. My guess is that it will probably be stuck on level 1 forever because of its physical constraints. What would be really nice and fun is to implement some kind of learning algorithm so the robot actually learns what strategies to play from its own experiences and then does some advanced planning by thinking about what to do based on the pieces shown ahead of time. If the algorithm can adjust its parameters (such as threshold values on when to get rid of rows quickly vs. when to wait for a long stick), then the Tetris-Bot would look a lot smarter and more intelligent.

This is yet another example of the kind of robots you can build at home in your free time using a commercially available robotics kit. I know what I am getting for my kids' birthday -- I am very serious about my kids' education! Aren't you?

So if robots are doing our work and playing our games for us, what is left for humans to do? Well, I can think of at least three things:
  • build better robots
  • blog about robots, and
  • work on my translation projects
Wait, aren't I doing these already? :) That is, of course, until we have robots that build better robots, robots that blog about robots, and robots that can translate better than I do ... and I am sure glad I won't live long enough to see that day!

Video of the Day:

This is excellent engineering too: OK Go - This Too Shall Pass

Friday, February 20, 2009

Random Thoughts: Adventure in Japan -- Part 2

Adventure in Japan - Part 1

It has been a while since I returned from Osaka, Japan, but I thought I'd share a bit more of my experience in Japan for people who would like to visit Japan one day. Let me start off with some traveling tips.
  • For a lot of people (61 countries and regions to be exact) including US citizens, visiting Japan for non-paid activities for 90 days or less does not require a visa. Just buy a plane ticket and go. It's that easy!

  • There are several ways to get Japanese Yen. You can get it from your local bank before the trip. However, be aware that you have to pre-order, and it might take them up to 5 business days to get the money ready for you. They also charge a service fee ($10 for US Bank) for the exchange (from US Dollar to Yen or later from Yen to US Dollar after you return). This option works well if you exchange large amounts of money. A more convenient way for a short term visitor is to get Japanese Yen from ATMs at the Japanese airport. You will be charged about 3% for the exchange plus the ATM fee (probably $2). This option is better for small amounts.

  • Before visiting Japan, I was told that most places in Japan would take credit cards such as American Express or Visa. After visiting Japan, I learned the hard way that this is not true. Japanese businesses mostly don't take credit cards. Even McDonald's in downtown Osaka refused to take any credit cards.

  • Power outlets in Japan are different from those in North America. North America has polarized outlets (one big slot, one small). Japan has non-polarized outlets (both slots small). Also, they have only two holes, not three. If you have polarized plugs, then you need an adapter. The hotel might loan you one for free.

  • Standard voltage in Japan is 100V. Make sure your devices can operate at 100V. If not, you need a transformer.
For the rest of the blog post, I'll focus on one single topic: Japanese Food.

The conference provided free lunch every day in the form of a very traditional style of Japanese food: Bento Box. According to Wikipedia:
Bento (弁当) is a single-portion takeout or home-packed meal common in Japanese cuisine. A traditional bento consists of rice, fish or meat, and one or more pickled or cooked vegetables, usually in a box-shaped container.
The three pictures below show the three kinds of bento box lunches I was fortunate to try out. Each bento box contained a great variety of things, including rice, sea food, and lots of pickled things. Everything in a bento box is served cold, removing the need to heat up things using a microwave. I must confess that although the bento boxes looked very colorful and pretty, cold rice and too much pickled meat/vegetables just didn't quite agree with me. And I must mention that all the beautiful wooden boxes were properly recycled to save trees!

Because of the generosity of the HRI 2010 conference organizers (they covered most of the meals) and my very busy schedule, I only had the chance to visit one traditional Japanese restaurant during the trip. The picture below on the left shows the front of the small restaurant in downtown Osaka named Money House. The picture on the right shows the hallway inside, just wide enough for one person, a typical setup for traditional Japanese restaurants.

Since Japan is entirely made up of islands, it was not surprising to see lots of seafood dishes on the menu. Since a friend in our dinner group was an American who had lived in Japan for 8 years, he took charge of all the ordering, and we got to experience some interesting food. For example, deep-fried squid (left), octopus balls (middle), and of course, raw fish (right). The first two actually tasted great despite the weirdness; however, I shied away from the raw fish, because I don't ever eat raw meat (e.g., a rare steak).

Some other dishes were very similar to Chinese dishes, such as dumplings, stir-fried clams, and boiled green soybeans.

There were dishes that tasted very American too, such as the big Chicken Nugget shown below. Alcohol is also a big part of Japanese culture (see all those bottles in the middle picture), and I wonder how many people in Japan drink and drive. The dinner was great! There is only one thing I'd like to complain about, though: why were all the dishes served on such small plates? See the stack of small plates in the last picture? We are a bunch of hungry grad students and I am not kidding when I say we can eat a lot!

For a group of 13 people, the dinner cost per person was 3000 yen (roughly $35 USD), quite expensive by American standards, but it was well worth it. How often does one get the chance to eat a real authentic Japanese dinner? And by the way, they did not take credit cards. :)

The easiest way to put a baby to sleep is to give him classical music!

Thursday, February 19, 2009

Paper Review: Using Maximum Entropy for Text Classification

This paper was written by Kamal Nigam, John Lafferty, and Andrew McCallum, all from Carnegie Mellon University. It was presented at the IJCAI-99 workshop on machine learning for information filtering.

This paper talks about the use of maximum entropy techniques for text classification and compares the performance to that of naïve Bayes.

Maximum entropy is a general technique for estimating probability distributions from data. The main principle in maximum entropy is that when nothing is known, the distribution should be as uniform as possible, that is, have maximal entropy. In text classification scenarios, maximum entropy estimates the conditional distribution of the class label given a document. The paper uses word counts as features.
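To make the idea concrete, here is a minimal Python sketch of the exponential form a maximum entropy classifier takes. It is only an illustration, not the paper's code: the function name `maxent_probabilities`, the `(word, class)` weight dictionary, and the use of raw word counts as feature values are all my own assumptions.

```python
import math

def maxent_probabilities(word_counts, weights, classes):
    """Return P(class | document) under an exponential (maximum entropy) model.

    word_counts: dict mapping word -> count in the document
    weights:     dict mapping (word, class) -> weight (hypothetical layout)
    classes:     list of class labels
    """
    # Score each class as a weighted sum of the document's word counts.
    scores = {
        c: sum(weights.get((w, c), 0.0) * n for w, n in word_counts.items())
        for c in classes
    }
    # Subtract the largest score before exponentiating for numerical stability.
    top = max(scores.values())
    exps = {c: math.exp(s - top) for c, s in scores.items()}
    z = sum(exps.values())  # normalizer, so the probabilities sum to 1
    return {c: e / z for c, e in exps.items()}
```

With an empty `weights` dictionary every class gets the same probability, which matches the "as uniform as possible" principle when nothing is known.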

Training data is used to set constraints on the conditional distribution. Maximum entropy first identifies a set of feature functions that will be useful for classification, then, for each feature, measures its expected value over the training data and takes this to be a constraint for the model distribution.
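The constraint side can be sketched in a few lines as well. The hypothetical helper below (the name and the data layout are my own assumptions, not from the paper) averages each (word, class) feature over a labeled training set. These averages are the targets that the model's own expected feature values are required to match.

```python
from collections import defaultdict

def empirical_feature_expectations(training_data):
    """Average value of each (word, class) feature over the training set.

    training_data: list of (word_counts, label) pairs, where word_counts
    is a dict mapping word -> count. The feature (word, label) takes the
    value of the word's count in documents with that label, and 0 elsewhere.
    """
    totals = defaultdict(float)
    for word_counts, label in training_data:
        for word, count in word_counts.items():
            totals[(word, label)] += count
    n = len(training_data)
    return {feature: total / n for feature, total in totals.items()}
```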

Improved Iterative Scaling (IIS) is a hillclimbing algorithm for calculating the parameters of a maximum entropy classifier given a set of constraints. It performs hillclimbing in parameter log likelihood space. At each step IIS finds an incrementally more likely set of parameters and converges to the globally optimal set of parameters.

Maximum entropy can suffer from overfitting, and introducing a prior on the model can reduce overfitting and improve performance. To integrate a prior into maximum entropy, the paper proposes using maximum a posteriori estimation for the exponential model instead of maximum likelihood estimation. A Gaussian prior is used in all the experiments.
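Here is a rough sketch of what training with a Gaussian prior might look like. Please note that this is not Improved Iterative Scaling: I swapped in plain gradient ascent because it is much shorter to write down and climbs the same log posterior. The names `predict` and `train_map`, the prior variance `sigma2`, the learning rate, and the step count are all assumptions I made for illustration.

```python
import math

def predict(word_counts, weights, classes):
    """P(class | document) for an exponential model with (word, class) weights."""
    scores = {
        c: sum(weights.get((w, c), 0.0) * n for w, n in word_counts.items())
        for c in classes
    }
    top = max(scores.values())
    exps = {c: math.exp(s - top) for c, s in scores.items()}
    z = sum(exps.values())
    return {c: e / z for c, e in exps.items()}

def train_map(data, classes, sigma2=1.0, rate=0.1, steps=200):
    """Gradient ascent on log likelihood plus a zero-mean Gaussian log prior."""
    weights = {}
    for _ in range(steps):
        gradient = {}
        for word_counts, label in data:
            probs = predict(word_counts, weights, classes)
            for w, n in word_counts.items():
                for c in classes:
                    observed = n if c == label else 0.0  # empirical feature value
                    expected = probs[c] * n              # model's expected value
                    gradient[(w, c)] = gradient.get((w, c), 0.0) + observed - expected
        for feature, g in gradient.items():
            old = weights.get(feature, 0.0)
            # The "- old / sigma2" term comes from the Gaussian prior and
            # pulls every weight back toward zero.
            weights[feature] = old + rate * (g - old / sigma2)
    return weights
```

A smaller `sigma2` means a tighter prior and smaller weights, which is how the prior fights overfitting. For example, `train_map([({"ball": 2}, "sports"), ({"vote": 2}, "politics")], ["sports", "politics"])` learns a positive weight for `("ball", "sports")`.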

One good thing about maximum entropy is that it does not suffer from any independence assumptions.

The paper used three data sets to compare the performance of maximum entropy to naïve Bayes. The three data sets are WebKB, Industry Sector, and Newsgroups. On the WebKB data set, maximum entropy was able to reduce classification error by more than 40%. On the other two data sets, maximum entropy overfitted and performed worse than naïve Bayes.

Video of the Day:

Liu Qian performing magic tricks at the Chinese New Year Show. Can you figure out how he did the tricks?

Wednesday, February 18, 2009

Full Moon Crescent Saber: Chapter 1 (2)

The girl was young and tender.
Ding Peng felt as if he could no longer breathe, and his heart pounded three times faster than usual.
He had never come so close to a girl.
That was not to say that there were no young girls in his hometown, or that he had never seen any.
He always tried very hard to be abstinent and had used numerous methods to do so: shoving snow into his pants, soaking his head in the creek, pricking himself in the leg with a needle, running, mountain climbing, doing cartwheels….
Before obtaining his fame, he would not allow such things to distract him. He would not let anything waste his strength.
But now, all of a sudden, he saw a naked woman, a young, beautiful, naked woman.
With snow white skin, firm breasts, slender and sleek legs….
It took him all his strength to turn his head away, but the woman ran to him and held him in her arms, begging while gasping.
“Help me! You must save me!”
She was so close to him. Her breaths were warm and sweet. He could even hear her heartbeats.
His mouth was so dry that he couldn’t even utter a single word.
The girl had realized the change in his body, and her face reddened. Trying her best to cover herself up with her hands, she asked.
“You…eh…can…can you take off your clothes and lend it to me?”
Although the robe was the only clothes he had, he took it off without hesitation. The girl calmed down a bit after draping his robe over herself.
“Thanks!” she said earnestly.
Ding Peng finally calmed down a bit himself and could finally speak out.
“Is there someone chasing you?”
The girl nodded, and tears quickly welled up in her eyes.
“This place is out of the way and hard to find. Even if someone comes for you, you don’t have to be afraid,” said Ding Peng.
He was a man, born with the instinct to protect women, not to mention such a beautiful girl. He held her hands in his.
“As long as me and my sword are here, you don’t have to be afraid.”
“Thank you,” the girl said gently, feeling reassured.
She seemed to have said those words before. Then she looked downward and closed her mouth.
Ding Peng didn’t know what to say.
He was going to ask, “Why are you running? Who’s after you? Why are they chasing you?”
But he forgot to ask, and she didn’t say.
Though she had draped the robe over herself, such a short robe simply could not cover a fully grown girl entirely.
A girl like her had too many inviting places on her body.
His heart was still thumping, only too rapidly.
After a long while he finally noticed that her eyes were fixed on his packet of beef stew.
This meal could very well be his last meal, for he only had one copper penny left.
However, he said without a second thought, “These foods are clean. Why don’t you have some?”
“Thanks!” the girl said again.
“Help yourself!” replied Ding Peng.
The girl really helped herself promptly.
Ding Peng could never have imagined how such a beautiful girl could eat like a horse.
She must have been hungry for a long time and suffered deeply.
He could even picture in his mind the kind of tragedy the girl had endured.
A lonely girl, stripped of her clothes by a bunch of villains, locked down in a cellar, without any food. After quite some struggle, she finally managed to escape.
As he imagined the scenes in his mind, she had almost eaten up all his belongings.
She finished off all the beef and bean curds. She even ate all the steamed buns. All that was left were no more than a dozen peanuts.
Even she herself was somewhat embarrassed. “You can have these,” she said in an almost inaudible voice as she pushed the peanuts over.
Ding Peng smiled.
He really wanted to cry, but somehow he couldn’t help but smile.
The girl also smiled, her face blushing, as red as a pretty flower in the sunshine.
A smile not only can make people happy, but also can shorten the distance between two persons.
They were both more relaxed by now, and the girl finally told her story.
Ding Peng’s imagination was actually not too far from what she told.
The girl had indeed been kidnapped by a bunch of villains. She had been stripped of her clothes and locked in a cellar. For several days, she didn’t eat anything. Those villains thought she was too hungry to move about, so they became careless, and she took the opportunity and escaped.
“I am so lucky to have run into you!” She found words too pale to express her gratitude toward him.
“Where are they? I’ll go with you to find them!” Ding Peng asked as he rubbed the hilt of his sword.
“You cannot go!” the girl gasped.
“Why?” asked Ding Peng.
“There are some things I cannot say, but I promise I’ll tell you later,” the girl said with hesitation.
It seemed that the story was more profound than what had surfaced. If she couldn’t tell, he wouldn’t ask.
“I need to find a person, and then I’ll be alright,” the girl said again.
“Who are you looking for?”
“An elder of mine. He is over seventy years old, but still likes to wear bright red clothes. If you see him, you’ll definitely recognize him.”
“Would you find him for me?” the girl lifted her head and asked gently, her beautiful eyes filled with pleading.
Ding Peng of course couldn’t go. He indeed couldn’t go, and he really shouldn’t go.
The fight that would decide his fate for the rest of his life was less than two hours away.
He was still hungry, and he hadn’t practiced his sword moves. He must cultivate his mood and retain his strength so he could face Liu Ruosong. How could he just go and find an old man he had never met before for the sake of a girl who was a stranger to him?
Yet he simply couldn’t let the word “no” out of his mouth. It was really no easy task to say “no” to the face of a beautiful girl. It would really require a great deal of courage and a lot of nerve. A man can only learn how to say “no” after going through many painful experiences.
“Where can I find this old gentleman?” Ding Peng sighed in his heart and finally asked.
“You will help me find him?” The girl’s eyes brightened.
Ding Peng had no choice but to nod. The girl jumped up and hugged him.
“You are such a nice guy! I’ll never forget you!”
Ding Peng knew that in the rest of his life, it would be very difficult to forget this girl as well.
“If you follow the creek and go up, you’ll see an old tree with very strange shapes at the end of the creek. He is always there playing the game of Go when the weather is good.”
Today’s weather was very good indeed.
“Once you see him, it is very important that you mess up his game board first. That’s the only way he would listen to you and then follow you over!”
Aren’t all board game enthusiasts like that? Even if the sky is falling, they’d still finish their present game first.
“I’ll wait here. Whether you find him or not, please hurry back.”

Now support the translator Lanny by following my blog and leaving comments! :)

Picture of the Day:

My daughter turned 5 recently, but I only have candles for one 6 and two 4s (I know, I am a cheapskate). How many different ways can you find to get 5 by adding math operators to these three numbers? Come on guys! You should at least be able to come up with the four obvious ones!! And there are a lot more. If you come up with one, write it down as a comment, so other people can focus on the unsolved ones...
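If you get stuck, brute force is one way to hunt for the plain-arithmetic answers. The sketch below is only that: it assumes just +, -, *, / and parentheses, so it ignores fancier operators such as square roots, factorials, or exponents and will not find everything. The names `OPS` and `solutions` are ones I made up.

```python
from fractions import Fraction
from itertools import permutations, product

# Only the four basic operators; division by zero yields None.
OPS = {
    "+": lambda a, b: a + b,
    "-": lambda a, b: a - b,
    "*": lambda a, b: a * b,
    "/": lambda a, b: a / b if b != 0 else None,
}

def solutions(numbers, target):
    """Try every ordering, operator pair, and grouping of three numbers."""
    found = set()
    for a, b, c in set(permutations(numbers)):
        for op1, op2 in product(OPS, repeat=2):
            # Grouping 1: (a op1 b) op2 c
            left = OPS[op1](Fraction(a), Fraction(b))
            if left is not None and OPS[op2](left, Fraction(c)) == target:
                found.add(f"({a} {op1} {b}) {op2} {c}")
            # Grouping 2: a op1 (b op2 c)
            right = OPS[op2](Fraction(b), Fraction(c))
            if right is not None and OPS[op1](Fraction(a), right) == target:
                found.add(f"{a} {op1} ({b} {op2} {c})")
    return sorted(found)
```

Calling `solutions([6, 4, 4], 5)` returns a list of expression strings. Run it only after you have tried the puzzle yourself!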

Tuesday, February 17, 2009

Paper Review: Detecting Spam Web Pages through Content Analysis

This paper was written by Ntoulas (UCLA) et al. (Microsoft Research) and was published at the 15th International Conference on World Wide Web, 2006.

This paper continues the work of two other papers on detecting spam web pages by the same group of authors. It focuses on content analysis as opposed to links. The authors propose 10 heuristics and investigate how well these heuristics correlate with spam web pages using a dataset of 17,168 pages. These heuristics/metrics are then combined as features with 28 others to build a training dataset, so machine learning classifiers can be used to classify spam web pages. Of the several classifiers tried, the C4.5 decision tree algorithm performed the best, so bagging and boosting are used to improve its performance, and the results are reported in terms of accuracy and the precision recall matrix.

The main contributions of this reference paper include a detailed analysis of the 10 proposed heuristics and the idea of using machine learning classifiers to combine them in the specific spam web page detection application. Taking advantage of the large web page collection (over 105 million) and a good-sized labeled dataset (17,168 pages), the paper is able to show some nice statistical properties of web documents (spam or non-spam) and good performance of existing classification methods when using these properties as features of a training set.
Not being an expert in the IR field, I cannot tell which of the proposed 10 heuristics are novel ideas with respect to spam web page detection. However, fraction of visible content and compression ratio seem to be very creative ideas and look very promising. Using each heuristic by itself does not produce good performance, so the paper combined them into a multi-dimensional feature space. Note here that this method has been used in many research domains with various applications.
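To get a feel for the compression ratio heuristic, here is a tiny sketch using Python's standard `zlib` module. I am assuming the ratio is the uncompressed size divided by the compressed size; the function name `compression_ratio` and the sample strings are mine, and the paper's exact compressor and preprocessing may well differ.

```python
import zlib

def compression_ratio(text):
    """Uncompressed size divided by compressed size (assumed definition)."""
    raw = text.encode("utf-8")
    if not raw:
        return 0.0  # avoid dividing by the size of an empty payload
    return len(raw) / len(zlib.compress(raw))

# A page stuffed with repeated keywords compresses far better than normal prose.
stuffed = "cheap pills " * 200
normal = "The quick brown fox jumps over the lazy dog while I sip green tea in Osaka."
print(compression_ratio(stuffed) > compression_ratio(normal))
```

The intuition is that keyword-stuffed pages are highly redundant, so an unusually high ratio is a hint (not proof) of spam. A very short page can even come out with a ratio below 1.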

One common question IR researchers tend to ask is: how good is your dataset? In section 2, the paper did a good job acknowledging the biases of the document collection and then further provided good justifications. This makes the paper more sincere and convincing. The paper also did a good job explaining things clearly. For instance, in section 4.8, the example provided made it very easy to distinguish “Fraction of page drawn from globally popular words” from “Fraction of globally popular words”. Another example is in section 4.6 when the paper explained how some pages inflated during compression. I specifically liked how the authors explained the concepts of bagging and boosting briefly in this paper. They could have simply directed the readers to the references, but the brief introduction dramatically improves the experience for those readers who have not worked with such concepts (or are rusty on them such as in my case).
Although well-written, the paper still has some drawbacks and limitations. Firstly, section 6, related work, should really have been placed right after the introduction. That way, readers can get a better picture of how this problem has been tackled in the IR community and also easily see how this paper differs. Also, this section gives a good definition of “content spam”, and it makes much more sense to talk about possible solutions after we have a clear definition.

Secondly, in section 3, the paper talks about 80% of all pages (as a result of uniform random sampling) being manually classified. I strongly suspect that is not what the authors meant to say. Manually classifying 80% of over 105 million pages would take A LONG TIME, period! Apparently this collection is not the same as the DS dataset mentioned in section 4, because the DS dataset only contained pages in English. So what is this collection? It is apparently a larger labeled dataset than the DS dataset. In Figures 6, 8, 10, and 11, we see the line graph touching the x-axis, possibly because there was not enough data. Using the English portion of this larger labeled dataset might have produced better graphs. Another thing I’d like to mention here is that labeling a web page as spam is a “subjective classification” (at least for me it is). Naturally, I’d think the large data collection was labeled under a divide-and-conquer approach, so that each document was only looked at by one evaluator. If this is true, then the subjectivity of each evaluator plays an important role in the label. A better approach would have been to have multiple evaluators work on the same set of web pages and label each page by majority vote to minimize each evaluator’s subjectivity.
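The majority vote idea is simple enough to sketch. This is my own illustration, not something from the paper: the function name `majority_label` and the choice to return `None` on a tie (so the page can be sent to another evaluator) are assumptions.

```python
from collections import Counter

def majority_label(votes):
    """Return the label most evaluators chose, or None if there is no clear winner."""
    ranked = Counter(votes).most_common()
    if not ranked:
        return None  # nobody labeled this page
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return None  # tie between the top two labels
    return ranked[0][0]

print(majority_label(["spam", "spam", "non-spam"]))  # prints: spam
```

Using an odd number of evaluators per page avoids most ties when there are only two labels.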

Thirdly, when building the training set, the proposed 10 heuristics are combined with 28 other features before applying the classifier. I think it would be better to compare results of using only these 10 features, using only those original 28 features, and using all features combined. That way, we can better evaluate how well these additional 10 heuristics contributed to the improvement of the classifiers.

Additionally, in section 4.1, the paper says “there is a clear correlation between word count and prevalence of spam” according to Figure 4. I failed to see the correlation.

Lastly, the experiment results are only for English web pages. Since the analysis in section 3 (Figure 3) clearly indicates that French and German web pages contained bigger portions of spam web pages, it would be great to see how the proposed solution works with those languages. I understand the difficulty of working with other languages, but it would really improve the paper even if only some very initial experiments were performed and results reported.

There are other minor problems with the paper as well. For example, for each heuristic, the paper reported the mode, median, and mean. I think it is also necessary to provide the variance (or standard deviation) because it is an important descriptor of a distribution. I would also suggest using a much lighter color so that the line graph is more readable for the portions where it overlaps with the bar graph. Dr. Snell once said that we should always print out our papers in black and white to make sure they look okay, and I am a strong believer in that! Also, in section 4.3, the authors meant to say the horizontal axis represents the average “word length” within a page instead of “number of words”.
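To see why the mode, median, and mean alone can hide the spread, here is a small sketch with Python's standard `statistics` module. The two samples are numbers I made up for illustration; they are not from the paper.

```python
import statistics

def describe(values):
    """Summarize a sample with the three reported statistics plus the spread."""
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "mode": statistics.mode(values),
        "stdev": statistics.pstdev(values),  # population standard deviation
    }

narrow = describe([4, 5, 5, 5, 6])
wide = describe([1, 5, 5, 5, 9])
# Same mean, median, and mode, but very different standard deviations.
print(narrow)
print(wide)
```

Both samples share a mean, median, and mode of 5, yet the second is spread out much more widely, and only the standard deviation shows it.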

I think it’s worth mentioning that the authors did an awesome job in the conclusions and future work section. Detecting web spam is really like an “arms race” between the spam filter designers and spammers. As new technologies are developed to filter spam, spammers will always work hard to come up with ways to break the filtering technology. This is an ongoing battle and degradation of the classification performance over time is simply unavoidable.

This is a well-written paper that showed excellent performance, and I certainly enjoyed reading it. I’d like to end this report with a quote directly from the paper which is so well said:

“Victory does not require perfection, just a rate of detection that alters the economic balance for a would-be spammer. It is our hope that continued research on this front can make effective spam more expensive than genuine content.”

I just learned recently that Superman's father is the Godfather!