Top 100 SF Restaurants Redux

The 2007 San Francisco Chronicle Top 100 Restaurant list was just published today, so I have revised my interactive chart for exploring various aspects of the top entries in San Francisco. I did this previously for the 2006 list using IBM’s Many Eyes, analyzing the SF entries and evaluating the relationships among the rating categories.

Click on the figure to the right to be taken to the website where you can easily plot data on the top SF restaurants however you choose. I’ve chosen to plot Food Rating on the horizontal axis, Price on the vertical axis, and Overall Quality as symbol size. After you click on the figure, you can hover your mouse over each data point to see which restaurant it corresponds to.

As with last year’s list, Ton Kiang continues to be the best valued restaurant in San Francisco (near the right-bottom: high food quality, low price), with Chow, Range, and Delfina among the next best valued.

One wonders why Kokkari is still in the Top 100 given how it sticks out from the crowd (it’s the tiny dot near the left-top: low food quality, high price). Kokkari used to be a top restaurant in the city, and I wonder if its continued Top 100 presence is simply due to lethargy in updating the list. I noticed that every new restaurant that I added to the list this year had solid 3 ratings across the board while Kokkari has mediocre ratings in each category—which is painfully obvious using the Many Eyes plot—except in atmosphere. There are 73 restaurants in San Francisco with an Overall Quality rating of 3 stars or more—why does the Top 100 list only 52 of them? And why do Hog Island Oyster Company and Tartine Bakery, both entries in the Top 100 Restaurant list, not have any Overall Quality rating at all from the SF Chronicle? Is it because they do not really qualify as restaurants?

As with last time, I did a correlational analysis of the restaurants to see what was responsible for the high ratings, and to see what higher prices bought a customer and what, if anything, is associated with good service.

Rest Corr 2007

The data above (see my previous post for an explanation of what this analysis means) shows how each category rating is related to every other. The closer a value's magnitude is to 1, the more the two categories are related; the closer it is to 0, the less the two categories are related. A negative value means that one rating tends to fall as the other rises.

Many of the same relationships hold from last year. The Overall Quality score was predominantly determined by Food Rating (correlation 0.88). This reflects the criteria that chief restaurant critic Michael Bauer uses to come up with his Overall Quality score (what the SF Chronicle refers to as a restaurant’s rating).

Noisiness was more correlated with the overall score this year (-0.35 this year vs -0.18 last year). Paying more will now get you slightly better service (0.36 this year vs 0.22 last year), although a correlation of 0.36 is still pathetically small—I’d like to see this number well above 0.5. Last year, the only thing that paying more seemed to get you was better atmosphere, but this year the Price is as correlated with Atmosphere (0.6) as it is with Overall Quality (0.56).

Note: I have not considered levels of statistical significance in this analysis, nor have I considered partial correlations which would be a more accurate but more time-consuming approach to this analysis.
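For readers curious about the partial correlations mentioned in the note, here is a minimal sketch of the standard first-order formula, which estimates how two ratings are related once a third rating is held fixed. The function name and the three input values are hypothetical placeholders for illustration; they are not numbers from the Chronicle data, and this is not the analysis used in this post.

```python
from math import sqrt


def partial_correlation(r_xy, r_xz, r_yz):
    """First-order partial correlation of x and y, controlling for z.

    Takes the three ordinary (Pearson) correlations among x, y and z.
    """
    denom = sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))
    if denom == 0:
        raise ValueError("undefined when x or y is perfectly correlated with z")
    return (r_xy - r_xz * r_yz) / denom


# Hypothetical inputs (NOT taken from the restaurant ratings):
r_price_service = 0.4   # Price vs Service
r_price_food = 0.5      # Price vs Food
r_service_food = 0.6    # Service vs Food

# Price vs Service with Food held fixed.
adjusted = partial_correlation(r_price_service, r_price_food, r_service_food)
print(round(adjusted, 3))
```

If the adjusted value is much smaller than the raw correlation, most of the apparent relationship between the two ratings can be attributed to the third one.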

Data Sharing Article in Nature

The weekly science journal Nature just published an article on online data sharing that quotes me. My comments are from an e-mail exchange that I had with their Senior Reporter Declan Butler about the potential of new online data sharing sites such as Swivel and IBM’s Many Eyes. I’ve posted about Many Eyes before.

According to Declan’s e-mail to me, some scientists are already using these new tools to share sequence and microarray data. The potential value from scientists openly sharing their data is huge, possibly akin to the value provided by open-source software development. More people exploring data is always a good thing, and someone could discover meaningful information in data that the original owner/researcher missed. Or one's interests might differ from those of the original owner/researcher, and thus one could analyze the data in a different way that is meaningful to questions not investigated by the original researcher. In a scientific publication, the author can't produce every possible permutation of the data that the readers might want, so letting the "reader" explore the data themselves through online accessibility has value. As Edward Tufte says in his book Visual Explanations,

When assessing evidence, it is helpful to see a full data matrix, all observations for all variables, those private numbers from which the public displays are constructed. No telling what will turn up.

(Thanks to Squaring the Globe blog for providing this quote.)

Anyone who has tried to obtain the raw data behind published research, however, knows that it can be difficult to get for many reasons: researchers have difficulty retrieving the data from media that are no longer used, they do not have the time to search for and provide the data in an understandable format, or they simply do not want to lose any perceived advantage in pursuing future funding.

I’ve thought that a way around this is for NIH (or whatever the funding organization is) to require that all data from NIH-funded research be submitted to the NIH and be made publicly available. There are many difficulties with this proposal, of course, not the least of which is ensuring that others know how to read and interpret the data. The potential for misinterpretation would be huge. One possible solution to this would be to make available only data associated with a publication that details the methods and procedures of the data collection. This could become a policy that the publishing journal mandates rather than the funding organization.

I’ve been told that a proposal was made within the NIH to do just this several years ago for a discipline that is data-heavy, but the scientists in that field shot down the idea for several reasons, one of which was that they didn’t want any errors in their own data analysis discovered. Whatever the reasons, published figures and tables have been the primary form of information transmission of data for hundreds of years. With today’s electronic tools, there is no reason to limit our data sharing ability to techniques developed centuries ago.

Data Visualization on Steroids

I’ve talked about Hans Rosling’s presentation at TedTalks before, but given my recent post on IBM’s new data visualization site and my newfound ability to embed video on my blog, I’m going to promote this amazing talk once again.

Rosling talks about world health, but in the process gives a master class on data visualization. You’ll never look at Excel’s chart plotting function the same way again.

At one point (around 13 minutes into the video), Rosling displays five-dimensional data on a two-dimensional figure by using the x- and y-axes, data symbol size, data symbol color, and a trail that shows how the data changes over time. Wow.

Rosling has been invited back to speak at this year’s TedTalks. Another speaker of interest to readers of this blog is MIT Media Lab’s design simplicity guru John Maeda. I’ll see you all there (kidding). Any TedTalks speakers who would like to speak at a research center in Berkeley, e-mail me (not kidding, although not delusionally optimistic either).

San Francisco Dreams, Fast Food Nightmares

The IBM Visual Communication Lab has created an interesting website called Many Eyes that allows people to get an intuitive understanding of data through different visualization techniques. The site provides a wide array of formats for data display, including the recently developed format of Tree Maps that I’ve found useful for analyzing my hard disk content using SequoiaView.

The Many Eyes service is free and easy to use. One can both upload data and explore visualizations of data others have uploaded. I’ll show you two that I’ve created.

FA(S)T FOOD FUN
Someone uploaded to Many Eyes the nutritional content of items from McDonald’s menu, so I decided to create a chart that I thought would highlight the good, bad and ugly. Using a scatterplot, the horizontal axis shows trans fat content, the vertical axis shows saturated fat content, and the size of each data symbol shows the cholesterol content. Click on the figure to the right to go to the visualization I created that allows you to see which data points are for which products and to create your own visualizations from this data set.

Plotting the trans fat, saturated fat, and cholesterol of the McDonald’s menu in this way makes several facts obvious:

  • The Deluxe Breakfasts should come with a defibrillator. They are the large-sized data points in the top right that max out all unhealthy categories.
  • Beware of products that proclaim low trans fat. Look for the small dot at the top left which is the Double Quarter Pounder with Cheese. It has very low trans fat and very low cholesterol, but more saturated fat than any other product they sell.
  • There are some products with small-size data points in the lower left corner indicating healthy goodness. Next time that I’m stuck at an airport and the only place to eat is McDonald’s, I’ll be eating an Asian Salad with Grilled Chicken or the Premium Grilled Chicken Sandwich. Well…more likely I’ll wait until I get home and use the visualization described below to decide where to eat.

SAN FRANCISCO TREATS
I love eating out, so I uploaded all of the ratings data for San Francisco restaurants from the San Francisco Chronicle’s Top 100 Bay Area Restaurants. Each restaurant was given ratings for Overall Quality, Food, Service, Price, Atmosphere, and Noise Level. Higher is better for all ratings except for Price where higher means more expensive, and Noise Level where higher means noisier (unless, of course, you like deafening crowds).

I created a visualization that readily displays the Overall Quality, Food and Price ratings. Click on the figure to the right to see which restaurants correspond to which data point. Restaurants further to the right in the plot have better food, ones higher up are pricier, and ones with larger data point sizes have better overall ratings. (Note that I had to add some randomness to each data point so that individual restaurants could be seen. Otherwise too many data points fell on top of each other, making individual restaurants impossible to see, as exhibited in my first attempt to visualize this data.)
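Adding randomness to overlapping points is usually called jittering. A minimal sketch of the idea, assuming a small uniform offset is added to each rating before plotting; the function name, the offset size, and the toy values are all hypothetical and are not the settings used for the chart:

```python
import random


def jitter(values, amount=0.1, seed=None):
    """Return a copy of values with a uniform offset in [-amount, amount] added to each."""
    rng = random.Random(seed)  # a seed makes the jitter reproducible
    return [v + rng.uniform(-amount, amount) for v in values]


# Toy Food ratings (made up): three restaurants share the same score,
# so without jitter their data points would sit exactly on top of each other.
food = [3.0, 3.0, 3.0, 2.5]
food_jittered = jitter(food, amount=0.1, seed=1)
print(food_jittered)
```

The offset should stay well below the spacing between rating levels (half a star here), so that a jittered point is never mistaken for a neighboring rating.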

Comparing different aspects of restaurants using the Chronicle website is difficult and the advantages that some restaurants have over others are not obvious. This Many Eyes chart makes certain facts about these top restaurants very clear:

  • La Folie and Fleur de Lys reign supreme. Personally, I’m a little surprised by this because my one visit to La Folie wasn’t nearly as impressive an experience as at Gary Danko or Masa’s.
  • Ton Kiang provides the best value of all restaurants according to the Chronicle ratings. You’ll find its data point in the lower right: a high Food rating and a low Price rating. From personal experience, I can also add that it ranks high on the Sidewalk Waiting rating for the long waits on Sunday mornings to get in for dim sum.
  • What are Kokkari and House of Prime Rib doing on this list? They are the data points to the far left indicating low Food ratings. A quick change of the axes using the Many Eyes tools shows that they are also low on Service but have decent Atmosphere ratings. High prices, mediocre food, poor service and good atmosphere: not a great combination for Top 100 restaurants.

I also manipulated the data axes a little to pull out interesting information. The figure on the right shows Food rating along the horizontal axis and Overall Quality along the vertical axis (again, click on the figure to explore the individual data points). You’ll notice that the Food rating has a strong relationship to the Overall Quality rating. If the horizontal axis is changed to service, price, or atmosphere (go ahead, click on the figure and change the axes yourself), you’ll find that these categories are not so strongly related to the Overall Quality rating, indicating how strongly food quality impacts the overall rating of a restaurant (as well it should).

If you pay more at one of these restaurants, are you more likely to get better food? Nope. The figure on the right shows that there is a slight trend toward getting better food the more that you pay, but not much. A similar plot of Price versus Service indicates a similar disconnection. What you mostly pay for, according to this data, is ambiance: plotting Price against Atmosphere does show that the Atmosphere rating tends to increase as the Price rating increases.

To understand these relationships more precisely, I decided to do some statistical analysis on my own (not an ability available on the IBM website). The chart below shows the correlation matrix for the restaurant ratings. Correlations range from -1 to 1: a 0 means that two factors are uncorrelated, a 1 means that they are perfectly correlated, and a negative value means that one factor tends to fall as the other rises. Each number compares the two categories named by its row and column.

Ratings Correlation

Some interesting insights from the correlation matrix are:

  • The rating category most correlated with Overall Quality is Food, with a correlation of 0.89.
  • Price is more correlated with Atmosphere (0.6) than Food (0.45). This means that by paying more, you have a better chance of increasing the look of the restaurant than the quality of the food.
  • I’m very surprised by how little Price is correlated with Service (0.22). Paying more appears to have little effect on the quality of the wait staff. This may be due to the expectations of the reviewer: higher-priced restaurants might have gotten penalized more in the Service rating for service faux pas (such as not providing clean utensils between courses) than lower-priced restaurants.
  • The noise level of a restaurant had next to no impact (-0.18) on the overall rating assigned to the restaurant.
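A correlation matrix like the one discussed above can be computed with a few lines of code. The sketch below assumes the Pearson correlation coefficient, which is the usual choice but is not confirmed as the method used here. The function names and the toy ratings are made up for illustration; they are not the Chronicle data.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant column")
    return cov / sqrt(var_x * var_y)


def correlation_matrix(columns):
    """Map each pair of column names to the correlation of their values."""
    names = list(columns)
    return {a: {b: pearson(columns[a], columns[b]) for b in names} for a in names}


# Made-up toy ratings for five imaginary restaurants (NOT the real list).
toy_ratings = {
    "Food": [3.0, 2.5, 3.5, 2.0, 4.0],
    "Price": [3, 2, 4, 3, 4],
    "Noise": [4, 3, 2, 4, 1],
}

matrix = correlation_matrix(toy_ratings)
for row in toy_ratings:
    cells = "  ".join(f"{matrix[row][col]:+.2f}" for col in toy_ratings)
    print(row.ljust(6), cells)
```

Each row and column of the printed table is one rating category, the diagonal is always 1, and the matrix is symmetric, so only the values on one side of the diagonal need to be read.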

That’s all I’ve got to say on San Francisco restaurants for now.

The service provided by Many Eyes is an interesting one and demonstrates how plotting data in the proper way can quickly pull out relationships and interesting features of a dataset. I’m sure that Edward Tufte would approve. I look forward to IBM allowing people to embed these visualization applets on their own websites, which would allow the SF Chronicle to provide this service directly on its site.