The key to understanding the main ideas of this article is knowing its purpose. The authors are two researchers in the Intelligent Systems Program at the University of Pittsburgh, which suggests that the research is meant to help develop artificial intelligence systems that can understand natural human speech.
One of the major problems faced by researchers working on speech recognition is that the meaning of a phrase may vary with the speaker's tone of voice. For example, consider the phrase "Yeah, right" in two different contexts:
1. "Let's go to Luigi's for lunch. They have an all-you-can-eat pizza deal." "Yeah, right."
2. "So your next car will be a Ferrari?" "Yeah, right. When I win the lottery."
In the first case the phrase signals assent, and in the second it signals incredulity. While humans can easily distinguish between the two, machines have great difficulty doing so.
The paper suggests that prosodic cues can be used to distinguish between humorous and serious uses of the same phrase. The authors argue that prosodic elements such as pitch, intensity, and tempo are a better basis for this than the lexical cues favored by other researchers. They use the comedy "Friends" as a test case because a laugh track follows comments intended to be funny, which makes it easy to separate humorous comments from serious ones.
The paper analyzes acoustic data from the show and concludes that:
"[W]e found that humorous turns tend to have higher tempo, smaller internal silence, and higher peak, range and standard deviation for pitch and energy, compared to non-humorous turns."
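To make the quoted features concrete, here is a minimal Python sketch of how such per-turn measurements might be computed and how a turn could be labeled from the laugh track. It is an illustration only, not the authors' code: the `Turn` class, its field names, the syllables-per-second definition of tempo, and the silence-fraction definition of internal silence are all assumptions made for this example.

```python
from dataclasses import dataclass
from statistics import pstdev
from typing import Dict, List


@dataclass
class Turn:
    """One speaker turn. Every field is a hypothetical input for this sketch."""
    pitch_hz: List[float]        # pitch samples taken across the turn
    energy_db: List[float]       # energy (intensity) samples across the turn
    duration_s: float            # total length of the turn in seconds
    silence_s: float             # total silence inside the turn in seconds
    syllables: int               # syllable count, used here as a tempo proxy
    followed_by_laughter: bool   # True if the laugh track follows the turn


def summarize(values: List[float]) -> Dict[str, float]:
    """Return the peak, range, and (population) standard deviation."""
    return {
        "peak": max(values),
        "range": max(values) - min(values),
        "std": pstdev(values),
    }


def turn_features(turn: Turn) -> Dict[str, float]:
    """Compute the prosodic features named in the quotation for one turn."""
    features: Dict[str, float] = {}
    for name, samples in (("pitch", turn.pitch_hz), ("energy", turn.energy_db)):
        for stat, value in summarize(samples).items():
            features[f"{name}_{stat}"] = value
    # Assumed definition: tempo as syllables per second.
    features["tempo"] = turn.syllables / turn.duration_s
    # Assumed definition: internal silence as the silent fraction of the turn.
    features["internal_silence"] = turn.silence_s / turn.duration_s
    return features


def label(turn: Turn) -> str:
    """Label a turn as humorous when laughter follows it."""
    return "humorous" if turn.followed_by_laughter else "non-humorous"
```

With features like these computed for every turn, the humorous and non-humorous groups can be compared feature by feature, which is the kind of comparison the quoted finding summarizes.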