An increasing number of people tend to convey their opinions through multiple modalities. For the purpose of opinion mining, sentiment classification based on multimodal data has become a major focus. In this work, we propose a novel Multimodal Interactive and Fusion Graph Convolutional Network to handle both texts and images for document-level multimodal sentiment analysis (DLMSA). The image caption is introduced as an auxiliary signal and is aligned with the image to enhance semantic delivery.
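The abstract does not specify the alignment objective between captions and images; as a minimal sketch, assuming both are encoded into a shared embedding space, one plausible formulation is a cosine-similarity alignment loss (the function name and tensor shapes below are hypothetical):

```python
import torch
import torch.nn.functional as F

def caption_image_alignment_loss(caption_emb: torch.Tensor,
                                 image_emb: torch.Tensor) -> torch.Tensor:
    """Pull each generated caption toward its source image in a shared space.

    caption_emb, image_emb: (batch, dim) embeddings from the caption and
    image encoders, assumed to be paired row-by-row.
    """
    caption_emb = F.normalize(caption_emb, dim=-1)
    image_emb = F.normalize(image_emb, dim=-1)
    # Maximize cosine similarity between each paired caption/image embedding.
    return (1.0 - (caption_emb * image_emb).sum(dim=-1)).mean()
```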
Then, a graph is constructed with the sentences and the generated image captions as nodes. Through graph learning, long-distance dependencies can be captured while visual noise can be filtered out. Specifically, a cross-modal graph convolutional network is built for multimodal information fusion.
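The abstract does not detail the propagation rule of the cross-modal graph convolutional network; a minimal sketch of one graph-convolution step over a document graph mixing textual and visual nodes, under the assumption of mean-pooled neighbor aggregation, might look as follows (class and parameter names are illustrative, not the paper's):

```python
import torch
import torch.nn as nn

class CrossModalGCNLayer(nn.Module):
    """One graph-convolution step over a document graph whose nodes mix
    sentence embeddings and image/caption embeddings."""

    def __init__(self, dim: int):
        super().__init__()
        self.linear = nn.Linear(dim, dim)

    def forward(self, node_feats: torch.Tensor, adj: torch.Tensor) -> torch.Tensor:
        # node_feats: (num_nodes, dim) stacked sentence and visual nodes.
        # adj: (num_nodes, num_nodes) adjacency linking sentences to each
        # other and to the images/captions they co-occur with.
        deg = adj.sum(dim=-1, keepdim=True).clamp(min=1.0)
        neighbors = (adj @ node_feats) / deg  # mean aggregation over neighbors
        return torch.relu(self.linear(neighbors))
```

Stacking several such layers is one way the long-distance dependencies mentioned above could be propagated across the document graph, with the adjacency structure controlling how much influence noisy visual nodes exert on sentence nodes.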
Extensive experiments are conducted on a multimodal dataset from Yelp. The experimental results show that our model achieves satisfactory performance on DLMSA tasks.