

Leave me comments so I know people are actually reading my blogs! Thanks!

Saturday, January 17, 2009

Paper Review: The Music Notepad

This paper was written by researchers at Brown University and published at UIST '98.

Music can be notated with a conventional WIMP interface (windows, icons, menus, and point-and-click), like the ones used in popular software synthesizers and editing tools (e.g., Cakewalk). However, the paper-and-pencil user model is very different and, because of its simplicity, more desirable. This paper presents a system that lets musicians create music notation directly with a stylus on a Tablet PC.

 

The system described in this paper follows some previous work by Buxton, but adds more features. The notation system allows the drawing of note symbols, beams, accidentals, clefs, and key signatures. Editing includes region selection (lasso), copying, pasting, and deleting (with a scribble or text-editing-style delete gesture). The user can also assign instruments and view more of the music score using a perspective-wall metaphor.




The authors developed an alternative method for entering notes by "scribbling in" a notehead. This is different from Buxton's gestures (which led to poor user experiences). It allows accurate placement of symbols because the average position of the scribble is used, and it also feels natural because that is how people write notes on paper. However, this method could be slower than point-and-click, and it does not convey the note duration. The video below shows how the system works.
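Just to make the averaging idea concrete, here is a minimal, hypothetical sketch in Python (my own illustration, not code from the paper): the stylus samples of a scribble are averaged, and the result is snapped to the nearest staff line or space. The staff geometry values are made up.

# Hypothetical sketch of placing a notehead from a scribble gesture.
# Assumes stylus samples arrive as (x, y) pixel coordinates and that the
# staff has a known top-line position and line spacing; not from the paper.

def place_notehead(points, staff_top_y=100.0, line_spacing=10.0):
    """Average the scribble samples, then snap to the nearest staff position."""
    if not points:
        raise ValueError("empty scribble")
    mean_x = sum(x for x, _ in points) / len(points)
    mean_y = sum(y for _, y in points) / len(points)
    # Staff positions (lines and spaces) are half a line spacing apart.
    step = line_spacing / 2.0
    snapped_y = staff_top_y + round((mean_y - staff_top_y) / step) * step
    return mean_x, snapped_y

if __name__ == "__main__":
    scribble = [(50.2, 131.0), (51.0, 133.5), (52.4, 129.8), (53.1, 132.2)]
    print(place_notehead(scribble))  # x stays near 51.7, y snaps to 130.0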



To evaluate the system, the authors asked some users to try it and then conducted informal interviews.

What's great about this paper is that it is the first to use gesture recognition to tackle the problem described. The weak spot of the paper is its evaluation: if a more formal user study had been performed to measure specific aspects of user performance, comparing the old and new systems, the results would have been more convincing. On a side note, the paper mentions estimating the probability of posted tokens; I wish it had discussed in more detail how that probability is calculated.

You can follow this link to read more about this project at Brown University.

In my humble opinion, a good UI is one that involves a minimal amount of learning, training, and practicing. To the user, it almost seems that all the design choices are natural and logical conclusions (based on the normal experiences of a standard user, within a certain profession or era). There might be better and more efficient ways (e.g., I can type a lot faster than I can write, and my handwriting is ugly); however, it might take a lot of training and practice to achieve that efficiency. In such cases, the best thing to do is probably to give the user options so he/she can pick the way he/she wants. Some incentives (with proper tutorials and demos) might help persuade the user to move toward the more efficient method, so he/she will endure the (maybe painful or dull) training and practice for higher efficiency. The important point is to let the user make the decision himself/herself. A forceful push toward the new method will only generate resentment (e.g., Windows Vista).



A user judges a solution based on how easy it is to use, not how great the designer thinks it is.



Friday, January 16, 2009

AI and Robots: StarCraft AI Competition to be held at AIIDE 2010

The Sixth Artificial Intelligence for Interactive Digital Entertainment Conference (AIIDE 2010), one of the conferences organized by the Association for the Advancement of Artificial Intelligence (AAAI), will be held in October 2010 at Stanford University (as always). The organizers have recently announced that they will be hosting a StarCraft AI Competition at the conference. AI researchers all over the world will have the chance to let their AI systems compete on a Real-Time Strategy (RTS) platform, and the final matches will be held live at the conference.

The idea of having AI agents compete with each other in gaming environments is nothing new. In fact, in one of the AI classes I took at BYU, we had to program agents to compete against other teams in BZFlag, a capture-the-flag game played with tanks. The winning team got an automatic A for the class. That was certainly a lot of fun: even though we didn't win the end-of-semester competition (because of a bug that occasionally confused our agents between home base and enemy base, doh!), we, as human players, had a hard time beating the agents we created ourselves.

In 2007, I went to the AAAI conference held in Vancouver, BC. At that conference, there were two live AI competitions. One was the General Game Playing Competition, where AI agents compete in games they have never played before (all they know at competition time is the game logic). The winning agent then played a game of Pacman against a real human player and was able to force a tie! The other was the Computer Poker Competition, where the winning agents challenged two professional Vegas poker players with real money on the table ($50,000). Although the professional poker players narrowly defeated the poker-playing software, the two of them felt as if they were playing against a real human.

Here is what makes this StarCraft AI Competition unique:
  • StarCraft is a very popular game with a commercial rendering engine and beautiful graphics.
  • It is a Real-Time Strategy (RTS) game, where the player controls many units at the same time and has to manage gameplay strategies at both the macro and micro level.
The following video shows the kind of gameplay one would expect to see in StarCraft. Make sure you watch the HQ version in full-screen mode to really appreciate the beautiful real-time graphics rendering.


Follow this link to get more info about how to use the Brood War API to write bots that work with the StarCraft game engine. If I weren't buried in papers Piled Higher and Deeper, I would probably be writing some agents for fun!

There are, of course, other commercial game engines used for AI and robotics research. For example, the game engine behind the very popular first-person shooter Unreal Tournament has been turned into USARSim (Unified System for Automation and Robot Simulation), a high-fidelity simulation of robots and environments.


Now my question is: when will EA Sports ever release APIs for their FIFA 2010 video game, so I can write software agents that play the game of soccer like real professionals (at least graphically)?



Picture of the Day:


 
BYU Computer Science Department Building
(See that big Y on the mountain?)

Thursday, January 15, 2009

Robot of the Day: Aida, Your Driving Companion

[Don't get confused with the dates. You'll find that I frequently travel back and forth through time -- in my blog. :) ]


Aida is a robot built by Mikey Siegel from the MIT Media Lab for a research project at Audi. It is supposed to be a driving companion, something to be installed in your car!

During the summer of 2009, when I was doing an internship at the Intelligent Robotics Group (IRG) at NASA Ames, I met Mikey for the first time. He was on his way to the Audi research center in the heart of sunny Silicon Valley to present the robot he had built for them, but decided to stop at NASA Ames first to show us the robot, because he used to be an intern at the IRG himself.

The purpose of the robot is to experiment with the idea of using a robot to influence people's driving behavior. The researchers hope to use the robot's movement (really just neck movement), its different facial expressions, and its speech to encourage people to drive more safely. This requires the robot to be able to communicate with humans using many social cues, which is exactly the research topic of the Personal Robots Group at MIT, led by Dr. Cynthia Breazeal, Mikey's advisor.

According to Mikey, the robot was built within a three-day period (I assume he didn't get much sleep), which caused all our jaws to drop. The lovely head was printed on a 3D printer, and he machined all the mechanical parts himself. However, to be fair to the other members of his lab, he added that the neck design was copied from another project, the animated eye and mouth movements were created by a friend (if I remember correctly, someone from Pixar), and the software control was a mixture of modules previously developed at MIT and open-source libraries such as OpenCV.

When Mikey demoed the robot to us, Aida was able to recognize faces. It became excited when it was surrounded by many people and acted bored when it was left alone. The animated emoticons projected onto the plastic face from the back of the head made the robot look very cute, and the smooth neck movement made it almost appear "alive". At that time, the only sensor it had was a video camera mounted on the base (not moving with the neck or head), but eventually Aida will be equipped with more eyes (cameras) and ears (microphones), so it can better sense the world around it.




Having a cute robot interacting with people in their cars sounds very cool; however, I am not so sure it is such a great idea.

First of all, could the moving robot distract the driver with its cute winks? I couldn't help but remember those signs next to bus drivers I used to see when I was a young kid: "Do not talk to the driver!" These days, when many states are making it illegal to talk on a cell phone while driving, what would they think of a robot that not only talks to the driver, but also tries to get the driver to look at it?

Secondly, don't you get annoyed sometimes when your better half keeps criticizing your driving skills (or is that just me)? Now imagine a robot nagging constantly right next to your ear like your dear grandma, telling you that you are driving too fast or that you hit the brakes too hard. Especially after you rear-end someone, I am sure a nagging robot saying "Told you! Told you not to follow so closely!" would be the last thing you want.... (Disclaimer: I have never rear-ended anyone!)

On the other hand, for those LA solo commuters who regularly get stuck in traffic for hours (I was recently stuck in LA traffic for hours, so I know!), Aida would make a great driving companion! And I certainly wouldn't mind such a cute robot making conversation with me while my car drives itself to my intended destination!

Video of the Day:

If you were there at the Liverpool Street Station on January 15, 2009, would you have joined in?

Tuesday, January 13, 2009

AI and Robots: High School Students Register With Their Faces

In a previous post we discussed the challenges facing facial recognition applications and what people have had to do (or chosen to do) to get by (or bypass them). Does that mean the technology is not ready for the real world? Today we'll look at a case where it is used in a real-world environment and is actually working quite well.

At the City of Ely Community College in the UK, sixth-form students now check in and out of the school register using their faces. The facial recognition technology is provided by Aurora, and the college is one of the first schools in the UK to trial the new technology with its students.

So how does the technology work? The scanning station is equipped with infra-red lights and a regular video camera. Each infra-red sensor actually has two parts: an emitter and a receiver. The emitter shoots out a series of infra-red signals, and the receiver detects the infra-red light reflected back by objects in front of the sensor (a simple example would be the auto-flushing toilets in public restrooms). By analyzing the strength and pattern of the received signals, the sensor can estimate how far the object is from it. This allows the scanner to create a range (depth) image of the object in front of it, so the resulting image is a 3D surface, unlike a regular 2D image from a camera.
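As a toy illustration of what a range image is (my own simplification, not Aurora's actual processing): if you pretend that the reflected intensity falls off with the square of the distance and that surface reflectivity is uniform, an infra-red intensity image can be turned into a rough relative depth map.

import numpy as np

# Toy illustration only: convert an IR intensity image into a rough *relative*
# depth map by assuming intensity ~ 1 / distance^2 for a uniform surface.
# Real depth sensors use calibrated models, structured light, or time of
# flight; this just shows the idea of a range (depth) image.

def intensity_to_relative_depth(intensity, eps=1e-6):
    intensity = np.asarray(intensity, dtype=float)
    depth = 1.0 / np.sqrt(intensity + eps)   # closer objects reflect more light
    return depth / depth.max()               # normalize to [0, 1]

if __name__ == "__main__":
    ir_frame = np.array([[0.9, 0.8, 0.2],
                         [0.7, 0.6, 0.1],
                         [0.3, 0.2, 0.05]])
    print(intensity_to_relative_depth(ir_frame).round(2))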

Combining this 3D surface with the 2D image taken from the video camera, features are extracted from the entire data set, and each set of features is tagged with a student ID (we know whose face it is because each student has to be scanned once at the very beginning so the data can be stored in the database). At scan time, it is then a simple machine-learning classification problem, and I suspect they probably just used nearest neighbor to match the features to an individual student. You can click the image below to see a video of this from the original news article.

Click image to see video.
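To make that nearest-neighbor guess concrete, here is a minimal sketch of how such a matcher could work (purely my own illustration; the feature vectors and student IDs are made up, and this is not Aurora's code): each enrolled student contributes a stored feature vector built from the 2D and 3D data, and a new scan is matched to the closest stored vector.

import numpy as np

# Hypothetical nearest-neighbor matcher for the face-registration scenario.
# Enrollment data (feature vectors and student IDs) is invented for illustration.

enrolled = {
    "student_001": np.array([0.12, 0.80, 0.33, 0.45]),
    "student_002": np.array([0.90, 0.15, 0.60, 0.10]),
    "student_003": np.array([0.40, 0.42, 0.95, 0.70]),
}

def identify(scan_features):
    """Return the enrolled student whose stored features are closest (Euclidean)."""
    scan = np.asarray(scan_features, dtype=float)
    return min(enrolled, key=lambda sid: np.linalg.norm(enrolled[sid] - scan))

if __name__ == "__main__":
    new_scan = [0.11, 0.78, 0.35, 0.44]
    print(identify(new_scan))  # -> student_001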
So how do people like this high-tech face recognition system? Principal Richard Barker said:
With this new registration technology, we are hoping to free up our teachers' time and allow them to spend it on what they are meant to be doing, which is teaching.

As for the students, they love the idea of taking responsibility for their own registration and using Mission Impossible-style systems.


So why did this specific application turn out to be a success? That's the question we really should be asking. I think we have to attribute the success to the following factors:
  • It combines a 3D depth image with a 2D image, which allows the creation of many features (and some of them get the job done).
  • The college has a relatively small number of sixth-form students. Classification becomes easier when you don't have to recognize a face out of millions of faces (as in the airport security check case).
  • The student is also required to enter a PIN, which further improves accuracy. I guess the facial recognition technology is really there to prevent students from signing other people in and out.
  • Most importantly, the consequence of an error is very low. What if a face is not recognized correctly? The worst that could happen is an erroneous record in the register. It's not as if the student would be flagged as a terrorist at an airport, which could have severe consequences.
I certainly hope to see more and more successful facial recognition applications out there, so people can focus on what they enjoy doing instead of what they have to do.

Picture of the Day:

I think this would make a perfect picture for today.
Here I present: Lanny in 3D





Monday, January 12, 2009

AI and Robots: No Smile Allowed, When Technology Is Not Good Enough.

Since I've been struggling with my hand recognition application, which tackles a far easier problem than face recognition, I thought I'd discuss facial recognition applications some more.

In a previous post, I talked about how the facial recognition currently built into laptops can easily be hacked. Today we'll talk about another real application of facial recognition, and specifically, what people do when the technology fails.

About 20 states in the US use facial recognition technology with driver's licenses. To fight identity fraud, one standard procedure at the DMV is for an employee to look at a person's old photo to see whether it matches the person seeking a new license. Using facial recognition technology, this step can be automated to improve efficiency, and the technology also, supposedly, can detect facial features that are not easy for a human to recognize, thus improving detection accuracy.

The Indiana Bureau of Motor Vehicles recently rolled out a new set of rules governing how people must be photographed for their driver's license photos. Unfortunately, Indiana drivers are no longer allowed to smile: smiling is now taboo, alongside glasses and hats.

What's going on here? It turns out the new restrictions are in place because, according to BMV officials, smiling can distort the facial features measured by the facial recognition software.

It is very interesting to see the kinds of restrictions placed on users when the technology should have done the job. Here's something that would surely improve the accuracy of the facial recognition even more: how about requiring all drivers, men and women, to get a crew cut and be clean-shaven?

I simply can't resist showing the picture below, which is part of the grooming standards in BYU's Honor Code, which I am openly opposed to.


Facial recognition technology was also tested at airports in the hope of detecting terrorists, but it failed miserably, as expected.

"According to a story by the Boston Globe, the security firm which conducted the tests was unable to calibrate the equipment without running into one of two rather serious problems. When it's set to a sensitive level, it 'catches' world + dog. When it's set to a looser level, pretty much any idiot can escape detection by tilting his head or wearing eyeglasses."


The most popular facial recognition algorithm used today is the SVM (Support Vector Machine), because of its good performance on real-world data. The video below demonstrates how well the algorithm works (also using Gabor wavelets).
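For a rough idea of what an SVM-based classifier looks like in code, here is a minimal sketch using scikit-learn's SVC (purely illustrative: the random feature vectors stand in for something like Gabor-wavelet responses, and this is not the system shown in the video).

import numpy as np
from sklearn.svm import SVC

# Illustrative only: train an SVM to tell two people apart from
# pre-extracted feature vectors (imagine Gabor-wavelet responses).
rng = np.random.default_rng(0)
person_a = rng.normal(loc=0.0, scale=0.3, size=(20, 8))   # 20 samples, 8 features
person_b = rng.normal(loc=1.0, scale=0.3, size=(20, 8))

X = np.vstack([person_a, person_b])
y = np.array([0] * 20 + [1] * 20)                          # 0 = person A, 1 = person B

clf = SVC(kernel="rbf", C=1.0, gamma="scale")
clf.fit(X, y)

new_face = rng.normal(loc=1.0, scale=0.3, size=(1, 8))     # resembles person B
print(clf.predict(new_face))                                # expected: [1], i.e. person B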




Anyway, I think facial recognition technology still has a long way to go before it is useful in serious applications. Frankly, I am no good at facial recognition myself; a lot of the time, I rely on hairstyle and glasses to help me remember people's faces. However, I don't think it is a good idea to impose lots of restrictions on the user just because the technology is not good enough. That's my 2 cents.

Newton Moment: when you do things that normal people consider silly, simply because you are too focused on thinking about your research.

Exceeding your wife's tolerance threshold for the number of Newton Moments per day can have serious consequences.



Video of the Day:
Try detecting this face!