The amount of textual data on the Web has grown rapidly in the last few years, creating content of massive scale that constitutes fertile ground for Sentiment Analysis. In particular, social networks represent an emerging and challenging sector where people's natural language expressions can be easily reported through short but meaningful text messages. This unprecedented volume of content needs to be analyzed efficiently and effectively to create actionable knowledge for decision-making processes. A key piece of information that can be grasped from social environments relates to the polarity of text messages, i.e. the sentiment (positive, negative or neutral) that the messages convey. However, most work on polarity classification considers text as the only information from which to infer sentiment, without taking into account that social networks are actually networked environments. A representation of real-world data in which instances are considered homogeneous, independent and identically distributed (i.i.d.) leads to a substantial loss of information and to the introduction of a statistical bias. For this reason, the combination of content and relationships is a core task of the recent literature on Sentiment Analysis, where friendships are usually investigated to model the principle of homophily (contact among similar people occurs at a higher rate than among dissimilar people). However, paired with the assumption of homophily, constructuralism explains how social relationships evolve via dynamic and continuous interactions as the knowledge and behavior that two actors share increase. Similarity among users on the basis of constructuralism appears to be a much more powerful force than interpersonal influence within the friendship network. As a first contribution, this Ph.D. thesis proposes the Approval Network as a novel graph representation to jointly model homophily and constructuralism, which is intended to better represent contagion on social networks. Starting from the classical state-of-the-art methodologies where only text is used to infer the polarity of social network messages, this thesis presents novel Probabilistic Relational Models at the user, document and aspect levels, which integrate structural information to improve classification performance. The integration is particularly useful when textual features do not provide sufficient or explicit information to infer sentiment (e.g., "I agree!"). The experimental investigations reveal that incorporating network information through approval relations can lead to statistically significant improvements over the performance of complex learning approaches based only on textual features.

Pozzi, F. A. (2015). Probabilistic Relational Models for Sentiment Analysis in Social Networks. (Doctoral thesis, Università degli Studi di Milano-Bicocca, 2015).

Probabilistic Relational Models for Sentiment Analysis in Social Networks

POZZI, FEDERICO ALBERTO
2015

TISATO, FRANCESCO
Sentiment Analysis; Opinion Mining; Social Network Analysis; Social Networks; Probabilistic Relational Models
INF/01 - INFORMATICA
English
12 Feb 2015
Scuola di dottorato di Scienze
INFORMATICA - 22R
27
2013/2014
open
Files in this item:
PhD_unimib_700928.pdf

open access

Description: Doctoral thesis
Attachment type: Doctoral thesis
Size: 9.48 MB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/10281/65709