Asymmetric Computing: March 2009

Wednesday, March 25, 2009

OnLive

A company called OnLive has been in the news lately with its announcement of a new model of gaming, including online game purchases. The model is intriguing because it ties our gaming experience to cloud computing. Essentially, we would play via streaming video, with the game rendered on a remote "cloud" computer somewhere. This has many advantages, chiefly that we would no longer need noisy, expensive, top-of-the-line hardware to play fancy games. Instead, THEY own the fancy hardware, render the game for us, and stream it to our browser or to our TV via a presumably inexpensive console.

Will this work? There are many challenges, and latency is the biggest one. Controller events must be captured and sent to the server, where they affect the gameplay; the resulting change on screen is then streamed back to you. If this round-trip latency is much longer than your brain's control-loop time, it may be noticeable. Existing multiplayer games suffer from this as "lag," and it is very annoying, so it may be a problem here too. The LA Times quotes an OnLive exec as saying they want to bring that latency down to 1 millisecond. While they may be able to use prediction and other techniques to reduce perceived latency, actual packet transfer time is bounded by the speed of light. Light travels roughly 186,000 miles per second, or 186 miles per millisecond. Because the signal has to go out and come back, a 1 ms round trip puts the server at most about 93 miles away. That is the best case, of course: there is processing overhead, and signals in optical fiber propagate at only about two-thirds the speed of light. Therefore, unless they place servers in every city, 1 ms doesn't make much sense. Then again, perhaps the newspaper misquoted or misunderstood, and I am just interpreting it wrong.
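The speed-of-light limit above is simple arithmetic, so here is a minimal Python sketch of it. It assumes about 186 miles per millisecond in vacuum and an approximate velocity factor of 2/3 for optical fiber; the function name is hypothetical, and the calculation ignores routing, queuing, video encoding, and rendering time, so real limits are tighter. A round-trip budget has to cover the trip out and back, so only half of it is available in each direction.

```python
# Best-case speed of light in vacuum, in miles per millisecond (about 186,000 mi/s).
SPEED_OF_LIGHT_MI_PER_MS = 186.0

# Assumed velocity factor for optical fiber (signals travel at roughly 2/3 c).
FIBER_VELOCITY_FACTOR = 2.0 / 3.0


def max_server_distance_miles(round_trip_ms, velocity_factor=1.0):
    """Farthest a server can be if signal propagation alone must fit in round_trip_ms.

    Half the budget is spent going out and half coming back.
    Processing, queuing, encoding, and rendering time are ignored.
    """
    one_way_ms = round_trip_ms / 2.0
    return SPEED_OF_LIGHT_MI_PER_MS * velocity_factor * one_way_ms


if __name__ == "__main__":
    for budget_ms in (1, 10, 50):
        vacuum = max_server_distance_miles(budget_ms)
        fiber = max_server_distance_miles(budget_ms, FIBER_VELOCITY_FACTOR)
        print(f"{budget_ms:>3} ms round trip: <= {vacuum:,.0f} mi (vacuum), "
              f"<= {fiber:,.0f} mi (fiber)")
```

For a 1 ms round trip this gives about 93 miles in vacuum and about 62 miles over fiber, which is why a 1 ms target would require servers very close to every player.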

I wish OnLive well and look forward to seeing how it works out.

Sunday, March 15, 2009

Cluster and CUDA troubles abound

Running this cluster for my class is more trouble than it's worth, and I won't do it again. Research cluster, yes; teaching cluster, no.

Yesterday, one of the students ran the frontend node out of memory. Of course, that meant I couldn't log in to reboot it, so I had to go to campus, find a building entrance that was open on the weekend and not closed for construction, and press the reset switch. Very annoying. I am disappointed that Linux is that fragile and that no mechanism was in place to prevent this sort of problem. Surely killing the runaway program and freeing its memory should have solved it, but obviously that didn't happen.
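One mechanism that does exist on Linux is a per-process resource limit. The following is a minimal sketch, assuming a Unix system where Python's `resource` module is available: it runs a command with its virtual address space (`RLIMIT_AS`) capped, which is the same limit bash's `ulimit -v` sets. The helper name `run_with_memory_cap` and the 1 GiB cap are illustrative assumptions, not what was configured on this cluster.

```python
import resource
import subprocess
import sys


def run_with_memory_cap(cmd, max_bytes):
    """Run cmd with its virtual address space limited to max_bytes (Unix only).

    The limit is applied in the child after fork() and before exec(), so the
    calling process keeps its own limits. preexec_fn is not safe to use from
    a multithreaded parent, which is fine for a small launcher script.
    """
    def apply_limit():
        resource.setrlimit(resource.RLIMIT_AS, (max_bytes, max_bytes))

    return subprocess.run(cmd, preexec_fn=apply_limit)


if __name__ == "__main__":
    one_gib = 1024 ** 3
    # A program that tries to grab 4 GiB fails with MemoryError instead of
    # pushing the whole node out of memory.
    greedy = [sys.executable, "-c", "bytearray(4 * 1024 ** 3)"]
    result = run_with_memory_cap(greedy, one_gib)
    print("greedy program exit status:", result.returncode)
```

Because `RLIMIT_AS` limits virtual rather than resident memory, a cap that is too tight can break programs that reserve large address ranges they never touch, so the value would need some tuning.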

An even more annoying problem involves the Tesla cards: it looks like the drivers keep crashing. After a while, programs suddenly can no longer open the /dev entries for the NVIDIA cards. The students may well be running buggy programs, but that is to be expected; I suspect the driver is not recovering properly after one of those programs crashes. Even running nvidia-smi doesn't fix it. A reboot does, but I discovered that "rmmod nvidia" removes the driver module and clears the problem, after which nvidia-smi causes the driver to reload and reinitialize. It's still bloody annoying, because only root can do it, and I'm not giving the students root...
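The rmmod-then-nvidia-smi recovery could in principle be automated by a job running as root, so the students would never need root themselves. This is a minimal sketch under several assumptions: that the card device nodes match `/dev/nvidia[0-9]*`, that failing to open them is the symptom to watch for, and that the rmmod/nvidia-smi sequence from this post is the right fix. The function names are hypothetical, and the default is a dry run that only reports what it would do.

```python
import glob
import os
import subprocess

# The recovery sequence from the post: unload the driver module, then let
# nvidia-smi reload and reinitialize it. Both steps need root.
RECOVERY_COMMANDS = [["rmmod", "nvidia"], ["nvidia-smi"]]


def devices_openable(paths):
    """Return True if every device path can be opened for reading and writing."""
    if not paths:
        # Finding no device nodes at all is treated as a failure too.
        return False
    for path in paths:
        try:
            fd = os.open(path, os.O_RDWR)
        except OSError:
            return False
        os.close(fd)
    return True


def check_and_recover(paths=None, dry_run=True):
    """Run the recovery commands if the GPU device nodes cannot be opened.

    Returns the commands that were run (or, in dry-run mode, would be run).
    """
    if paths is None:
        paths = sorted(glob.glob("/dev/nvidia[0-9]*"))
    if devices_openable(paths):
        return []
    if not dry_run:
        for cmd in RECOVERY_COMMANDS:
            # check=True stops early, e.g. if rmmod fails because a process
            # still holds the module open.
            subprocess.run(cmd, check=True)
    return RECOVERY_COMMANDS


if __name__ == "__main__":
    needed = check_and_recover()
    print("recovery needed:", needed if needed else "no")
```

Run from root's crontab with `dry_run=False`, something like this could clear the problem without a reboot, though it would be worth confirming first that unloading the module is safe while other jobs might be using the cards.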

Wednesday, March 11, 2009

New cluster at UCI

Though it isn't an asymmetric system, there is a new cluster at UCI for researchers and grad students. See the following link for details:

http://news.nacs.uci.edu/2009/02/new-computing-cluster/

This is good news and seems to be a nice resource for the UCI HPC community.

Monday, March 9, 2009

Interesting parallel computing blog

My wife pointed out an interesting parallel computing blog:

http://perilsofparallel.blogspot.com/

It's well worth checking out: the long posts are clearly written and helpful. I particularly like the car analogy for multicore processors (are two 62 MPH cars better than one 120 MPH car?). The author, Greg Pfister, has wide interests, from cloud computing to Larrabee.

Wednesday, March 4, 2009

Cluster Stable

The cluster has been stable for several days now with a Core 2 Quad frontend node. I don't know what was causing the Phenom frontend node to fail, but since it is my personal machine, I'll take it home and diagnose it after the quarter ends. For now, the students can get their parallel programs working on the cluster and finish their projects. Yay!
