IIT Bombay researchers survey different approaches to automatically detect sarcastic texts using computers
The Internet is the world’s largest ‘suggestion box’. Haven’t we all looked at the reviews left by millions before we buy a product online or decide to watch a movie? For online marketplaces, these reviews can boost or dent revenues. Hence, companies scramble to ‘understand’ the sentiment of their customers using tools that can read millions of such reviews each day and detect if the customer is happy, angry or disappointed. One sentiment that such tools have found difficult to comprehend is sarcasm. A team led by Prof. Pushpak Bhattacharyya of IIT Bombay has analysed various ways of detecting sentiment, especially sarcasm, in online text using a computer.
“Large organisations (political, commercial, etc.) use social media to understand what people think of them. This can also be applied to text such as employee appraisals, feedback forms in hotels, etc.”, explains Dr Aditya Joshi, an author of the research published in ACM Computing Surveys. Online reviews come in various forms. Some are straightforward, clearly saying if the customer is happy or not. But there are some like this: “Absolutely incredible! This dye turned my hair the exact shade of red I’ve always wanted!” At first glance, it might appear to be a positive statement, unless this was a review for black hair dye. A potential buyer who knows that extra piece of information will be able to decipher the sarcasm in the review. Now, what if a computer is trying to analyse the sentiment? Can it detect the sarcasm? It can, says the current study, which sheds light on the different approaches used for detecting sarcasm in a body of text.
“Sarcasm has been traditionally known to be a pain in the neck for many sentiment analysis systems. Therefore, a focused research on detection of sarcasm becomes useful,” says Dr Joshi, talking about the motivation behind this study. Dr Joshi, Prof. Pushpak Bhattacharyya and Dr Mark Carman have co-authored a book, Investigations in Computational Sarcasm, which provides a holistic view of past work in computational sarcasm and the challenges and opportunities that lie ahead.
Approaches to detecting sarcasm
Just like humans, computers also ‘learn’ to detect sarcasm by looking at various examples of it. One way to help computers learn is to ‘train’ them using large-scale data from Twitter, where hashtags like #sarcasm and #not indicate sarcastic tweets. But Twitter is only one mode of communication; other websites without helpful hashtags may also contain sarcastic text.
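The hashtag idea can be sketched in a few lines of Python. This is only an illustration, not the code used in any of the surveyed studies: the function names label_tweet and build_dataset are made up for this example, and the only markers it knows are the two hashtags mentioned above.

```python
import re

# Hashtags the article names as markers of sarcastic tweets.
SARCASM_TAGS = {"#sarcasm", "#not"}
HASHTAG = re.compile(r"#\w+")


def label_tweet(tweet):
    """Return (text_without_marker_tags, is_sarcastic) for one tweet."""
    tags = {tag.lower() for tag in HASHTAG.findall(tweet)}
    is_sarcastic = bool(tags & SARCASM_TAGS)

    # Drop only the marker hashtags, so that a model trained on this data
    # has to learn from the wording rather than from the label itself.
    def keep_or_drop(match):
        return "" if match.group().lower() in SARCASM_TAGS else match.group()

    cleaned = HASHTAG.sub(keep_or_drop, tweet)
    return " ".join(cleaned.split()), is_sarcastic


def build_dataset(tweets):
    """Turn raw tweets into (text, label) training pairs."""
    return [label_tweet(tweet) for tweet in tweets]
```

For instance, label_tweet("I love Mondays #sarcasm") gives the text "I love Mondays" with a sarcastic label, while a tweet tagged only #cricket is kept unchanged and labelled non-sarcastic.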
Therefore, one of the early approaches in this field was to identify sarcastic patterns, such as positive verbs paired with negative phrases, as in “I love being awake at 4 am with a headache!”, and then use them as features for a statistical classifier, a function that learns to identify sarcastic text from a data set of examples. One of the studies described in the paper reports that a logistic regression classifier, which combines such features into a probability that a statement is sarcastic, is accurate about 81% of the time.
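A toy version of this pattern-plus-classifier approach might look as follows. It is a sketch under heavy assumptions: the word lists, the six training sentences and the hand-written logistic regression are all invented for illustration, and it makes no attempt to reproduce the 81% figure reported in the study.

```python
import math

# Tiny illustrative lexicons (assumptions, not taken from the study).
POSITIVE_VERBS = {"love", "enjoy", "adore"}
NEGATIVE_PHRASES = ("headache", "awake at 4 am", "being ignored", "stuck in traffic")


def features(text):
    """Map a sentence to three 0/1 pattern features."""
    lowered = text.lower()
    words = set(lowered.replace("!", " ").replace(".", " ").replace(",", " ").split())
    pos = 1.0 if words & POSITIVE_VERBS else 0.0
    neg = 1.0 if any(phrase in lowered for phrase in NEGATIVE_PHRASES) else 0.0
    # The third feature fires only when a positive verb meets a negative situation.
    return [pos, neg, pos * neg]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train(examples, epochs=2000, lr=0.5):
    """Fit logistic regression weights with plain stochastic gradient descent."""
    weights, bias = [0.0, 0.0, 0.0], 0.0
    for _ in range(epochs):
        for text, label in examples:
            x = features(text)
            p = sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))
            err = p - label  # gradient of the log loss with respect to the score
            weights = [w - lr * err * xi for w, xi in zip(weights, x)]
            bias -= lr * err
    return weights, bias


def predict_proba(model, text):
    """Probability that the text is sarcastic, according to the trained model."""
    weights, bias = model
    return sigmoid(bias + sum(w * xi for w, xi in zip(weights, features(text))))


# Made-up training data: 1 = sarcastic, 0 = not sarcastic.
TRAIN = [
    ("I love being awake at 4 am with a headache!", 1),
    ("I enjoy being stuck in traffic.", 1),
    ("I love this phone.", 0),
    ("I enjoy long walks.", 0),
    ("I have a headache.", 0),
    ("The bus was late.", 0),
]
model = train(TRAIN)
```

After training, predict_proba(model, "I adore being ignored!") comes out above one half, while a plainly positive sentence such as "I love this book." stays below it. Real systems use far richer features and much larger data sets; the point here is only that the classifier learns how much weight each pattern deserves.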
Another method of obtaining samples is using human annotators to manually classify text as sarcastic or not, especially where the text is long and identifiers like hashtags are not present. This, however, presents a unique problem considering that annotators come from different backgrounds. “One of our papers talks about the role of cultures in detecting sarcasm. We compared Americans and Indians and how their perception of what constitutes sarcasm may differ,” explains Dr Joshi. The statement “It’s sunny outside and I am at work. Yay” was considered sarcastic by American annotators but non-sarcastic by their Indian counterparts due to India’s climate.
As we well know, sometimes even people struggle to understand the tone of emails and texts, or to tell if someone is being sincere when they exclaim, “I love your hair!” Then how does a computer understand the tone? Certain statements are obvious in their causticity: “Being ignored is the best feeling ever!” is rarely, if at all, to be taken at face value. On the other hand, the sentence, “I enjoy working on math problems all weekend!” could be construed as sarcastic by people who despise mathematics, but as a true statement by those who love the subject. In many instances, context is the key. Just as humans need a backstory, so do computers, and the recent trend in computational research is to make use of just that. In the case of tweets, the ones that are not specifically identified by the author as sarcastic require some backstory, i.e. the remarks that came before and after the target text. This context gives an idea of the author’s past sentiment regarding the topic and helps the computer judge whether the author was being sarcastic when tweeting “Politicians are never wrong #politics”.
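One simple way to picture the use of context is to compare the surface sentiment of the target tweet with the sentiment of the same author’s earlier remarks on the topic, and to flag a clash between the two. The sketch below does exactly that and nothing more; the word lists and the function names polarity and looks_sarcastic are assumptions made for this example, not a method taken from the surveyed papers.

```python
import re

# Tiny illustrative sentiment lexicons (assumptions for this sketch).
POSITIVE = ("love", "great", "best", "never wrong")
NEGATIVE = ("hate", "corrupt", "liars", "worst", "useless")


def count_hits(text, phrases):
    """Count lexicon entries that occur in the text as whole words or phrases."""
    lowered = text.lower()
    return sum(
        bool(re.search(r"\b" + re.escape(phrase) + r"\b", lowered))
        for phrase in phrases
    )


def polarity(text):
    """Return +1, 0 or -1 for positive, neutral or negative surface sentiment."""
    score = count_hits(text, POSITIVE) - count_hits(text, NEGATIVE)
    return (score > 0) - (score < 0)


def looks_sarcastic(target, history):
    """Flag the target if its polarity is opposite to the author's past average."""
    past = [polarity(remark) for remark in history]
    if not past:
        return False  # no backstory, so no basis for a contrast
    mean_past = sum(past) / len(past)
    return polarity(target) * mean_past < 0
```

With a history such as "Politicians are corrupt liars #politics" and "These useless politicians again", the tweet "Politicians are never wrong #politics" is flagged, because its positive surface contradicts the author’s earlier negative remarks. With no history at all, the function declines to flag anything.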
Sarcasm comes in flavours
Other directions that sarcasm detection has begun to take are identifying the different types of sarcasm, differentiating between irony and sarcasm, and incorporating language- and culture-specific traits in the evaluation process. Although much of the existing research deals with English, the same methods can be used for other languages. “Since machine learning methods are based on extracting appropriate indicators of sarcasm, the methods hold, given that corresponding dataset in the respective language or culture is available. However, a (single) universal sarcasm classifier would be a myth,” states Dr Joshi. The dataset or the features of the classifier would have to be modified based on the language and culture of the area. “Mapping sarcasm detection approaches across cultures and languages is one possible way forward,” he adds.
However, in training systems to understand sarcasm, is there a possibility that these systems use sarcasm on us as well? Not automatically, says Dr Joshi, but they can be programmed to be sarcastic in the right situation. Siri, Cortana, or digital assistants on shopping portals are chatbots. They are computer programs that talk/act like humans. “Bots can likely be trained to be sarcastic, but they cannot be sarcastic all the time. So, if chatbots are mere assistants, it is important for software architects and others to evaluate situations in which a chatbot needs to be sarcastic! You wouldn't want a chatbot to be sarcastic in a formal setting. However, if an angry customer is rude to a chatbot for a long time, the bot might decide to pull out a sarcastic response. As you can clearly see, *when* a bot can be sarcastic is very similar to when a human can be sarcastic.”
It is all well and good for a sentiment detection system to tell if a statement is positive or negative. However, when the same system is able to comprehend sarcasm as well, it shows a 4% improvement in accurately identifying the emotion behind the text. “Advances in sarcasm detection are key missing pieces of the sentiment analysis puzzle. Sentiment analysis papers would merely shrug their shoulders when asked about their ability to handle sentiment in sarcastic sentences. With sarcasm detection, sentiment analysis systems can cover a larger ground,” signs off Dr Joshi.
Prof. P Bhattacharyya