Computer Vision Analysis of Visual Narratives on X

This project was funded by BMBF project 16DKWN027b Climate Visions (2025).

Katharina Prasse, Marcel Kleinmann, Inken Adam, Kerstin Beckersjürgen, Andreas Edte,
Jona Frroku, Timotheus Gumpp, Steffen Jung
Data & Web Science Group, University of Mannheim, Mannheim, Germany
katharina.prasse@uni-mannheim.de

Isaac Bravo, Stefanie Walter
School of Governance, Technical University of Munich, Munich, Germany

Margret Keuper
University of Mannheim and Max Planck Institute for Informatics, Saarland Informatics Campus, Mannheim and Saarbrücken, Germany

Abstract

Climate change is one of the most pressing challenges of the 21st century, sparking widespread discourse across social media platforms. Activists, policymakers, and researchers seek to understand public sentiment and narratives while access to social media data has become increasingly restricted in the post-API era. In this study, we analyze a dataset of climate change-related tweets from X (formerly Twitter) shared in 2019, containing 730k tweets along with the shared images. Our approach integrates statistical analysis, image classification, object detection, and sentiment analysis to explore visual narratives in climate discourse. Additionally, we introduce a graphical user interface (GUI) to facilitate interactive data exploration. Our findings reveal key themes in climate communication, highlight sentiment divergence between images and text, and underscore the strengths and limitations of foundation models in analyzing social media imagery. By releasing our code and tools, we aim to support future research on the intersection of climate change, social media, and computer vision.

Index Terms:

climate change, social media, computer vision, visual narratives

I Introduction

Climate change is a defining challenge of the 21st century. Public discourse on this topic spans political debates, activism, and scientific communication, with social media playing a central role in shaping narratives. Platforms like X (formerly Twitter) provide a dynamic space where individuals, organizations, and policymakers share opinions, mobilize support, and react to events in real time. Understanding these discussions is crucial for stakeholders aiming to engage with the public. While textual discourse analysis on social media is well established, the role of images remains understudied. Visual content, ranging from protest photos and scientific infographics to memes and news reports, influences perception and engagement [28, 13, 11]. Within the social sciences, scholars have analyzed climate change imagery through thematic framing [3, 5, 11, 17], often relying on manual annotation [19, 20, 25]. Recent advancements have made large-scale visual analysis more accessible, enabling automated classification, sentiment analysis, and object detection.

In this work, we explore climate change discourse on X (formerly Twitter) through a computer vision lens. Using a dataset of 730k climate-related tweets from 2019, we analyze their images alongside their metadata to uncover key trends in image content, sentiment, and engagement.

Our main contributions include:

  1. An image, text, and multi-modal analysis of climate change-related social media content, combining image classification, object detection, and sentiment analysis.

  2. The application of foundation models (e.g., Gemini, Moondream) alongside traditional deep learning architectures (e.g., ResNet, ViT, GroundingDINO) to assess their performance in visual climate discourse analysis.

  3. An interactive graphical user interface (GUI) for exploring tweet data and model predictions, supporting deeper qualitative analysis.

By making our code and tools publicly available, we aim to facilitate further research at the intersection of climate change, social media, and computer vision.

II Related Work

Analyzing climate change narratives on social media involves both textual and visual framing. In the social sciences, framing theory is widely used to understand how climate issues are presented and interpreted [9]. Traditionally, frames have been studied in text-based content, but research on visual frames has recently gained traction, e.g., [5, 19, 21].

II-A Visual Framing in Climate Change Communication

Schäfer et al. [25] categorize visual frames into two types: first, formal/stylistic frames, which focus on image composition and aesthetic properties; second, content-oriented frames, which analyze the subject matter and depicted themes. Most prior studies have focused on content-oriented frames [5, 19, 21, 20, 17, 3, 11, 24, 22], using manual annotation to classify climate-related images. For example, Born et al. [3] explore how polar bears serve as an icon for climate change, while McGarry et al. [17] analyze wildfire imagery. However, manual annotation is time-consuming and subjective, limiting the scalability of such studies. Recent work has sought to automate visual frame detection through unsupervised clustering [22], offering a more scalable alternative.

II-B Computer Vision for Social Media Analysis

Advancements in computer vision and foundation models have enabled large-scale analysis of visual content. Vision-language models (VLMs) can now generate image descriptions, detect objects, and analyze sentiment with increasing accuracy [10]. Prior research has leveraged deep learning to study climate-related images, using models such as ResNet and ViT for image classification of environmental topics [12, 7], DETR and GroundingDINO for object detection in climate-related imagery [4, 16], and RoBERTa and LENS for sentiment analysis, bridging textual and visual understanding [1, 26]. Despite these advancements, sentiment analysis for images remains challenging, as models struggle with context-dependent sentiment (e.g., sarcasm, irony, or contrasting text-image relationships).

III Data

We accessed data from X (formerly Twitter) while the academic API granted free access to academia. We applied general selection criteria: a tweet must contain either ”climate change”, ”climatechange”, or ”#climatechange” and must have been published in 2019. In total, we downloaded 730k tweets that contain at least one visual, along with the corresponding tweet data consisting of tweet text, date, author id, and user reactions (e.g., likes, retweets, shares). We disregarded tweets without images.

We created subsets of the image data for increased efficiency and sustainability of the analysis. The first subset contains the 5k most liked tweets, whereas the second subset contains a random selection of 10k tweets to represent the whole dataset. For sentiment analysis, we created a third, labeled subset containing 500 randomly sampled images from the first subset that were manually labeled for their text and image sentiment. The annotations were given by two independent annotators, and divergent labels were resolved through discussion.
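For illustration, the subset construction can be sketched with pandas; the file name and column names (text, created_at, like_count) below are hypothetical placeholders for the actual tweet schema, not the released format.

```python
import pandas as pd

# Hypothetical schema: one row per tweet with at least one image.
tweets = pd.read_csv("climate_tweets_2019.csv")  # illustrative file name

# Selection criteria: keyword match and publication year 2019.
keywords = ["climate change", "climatechange", "#climatechange"]
mask = tweets["text"].str.lower().apply(
    lambda t: any(k in t for k in keywords)
) & (pd.to_datetime(tweets["created_at"]).dt.year == 2019)
tweets = tweets[mask]

# Subset 1: the 5k most liked tweets.
top_5k = tweets.nlargest(5000, "like_count")

# Subset 2: a random sample of 10k tweets representing the whole data.
random_10k = tweets.sample(n=10_000, random_state=42)

# Subset 3: 500 tweets from subset 1, to be labeled manually for sentiment.
to_label = top_5k.sample(n=500, random_state=42)
```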

IV Methods

The selection of methods is extensive, ranging from task-specific models to general-purpose foundation models.

IV-A Frequency Analysis

We provide a general overview using frequency analysis of the tweets. This includes word clouds of the tweets’ hashtags to visualize the main themes present. Moreover, we investigate overall emoji usage, tweet dates, and user engagement through visualizations (e.g., histograms or line graphs).
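A minimal sketch of this frequency analysis is given below; the wordcloud and emoji packages are illustrative choices rather than our exact tooling, and the tweets DataFrame is assumed from the Data section sketch.

```python
import re
from collections import Counter

import emoji                      # pip install "emoji>=2.0"
from wordcloud import WordCloud   # pip install wordcloud

texts = tweets["text"].tolist()   # DataFrame from the Data section sketch

# Hashtag frequencies for the word cloud.
hashtags = Counter(
    tag.lower() for t in texts for tag in re.findall(r"#\w+", t)
)
cloud = WordCloud(width=1200, height=600).generate_from_frequencies(hashtags)
cloud.to_file("hashtag_cloud.png")

# Emoji usage: overall counts and average emojis per tweet.
emoji_counts = Counter(
    ch for t in texts for ch in t if ch in emoji.EMOJI_DATA
)
print(emoji_counts.most_common(10))
print(sum(emoji_counts.values()) / len(texts), "emojis per tweet")
```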

IV-B Foundation models

We use two foundation models for image classification and sentiment analysis: (1) Gemini Pro [10], whose architecture is based on the Transformer decoder, optimized for both text and image decoding. Post-training, fine-tuning, and reinforcement learning from human feedback further refine its capabilities. (2) Moondream [18] is a compact, open-source computer vision model developed by M87 Labs, designed specifically for answering questions about images. With only 1.6 billion parameters, Moondream is lightweight and optimized for deployment on a wide range of devices, including mobile phones and edge devices like the Raspberry Pi.
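The sketch below shows how Moondream can be queried about a single image, assuming the publicly released moondream2 checkpoint on the Hugging Face Hub; the entry points differ between model revisions, so this interface should be read as illustrative rather than definitive.

```python
from PIL import Image
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the open-source Moondream VLM from the Hugging Face Hub.
# NOTE: the calls below follow an earlier moondream2 revision; newer
# revisions expose a different interface (e.g., model.query(...)).
model = AutoModelForCausalLM.from_pretrained(
    "vikhyat/moondream2", trust_remote_code=True
)
tokenizer = AutoTokenizer.from_pretrained("vikhyat/moondream2")

image = Image.open("tweet_image.jpg")
enc = model.encode_image(image)
answer = model.answer_question(
    enc,
    "What is the sentiment of the image: positive, neutral, or negative?",
    tokenizer,
)
print(answer)
```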

IV-C Image Classification

To detect the general topic of an image, we use image classification: (1) ResNet [12] models, a common CNN architecture, are employed. More specifically, we use ResNet-18, which offers a good balance between performance and computational efficiency, and ResNet-50, which provides greater depth and a higher capacity for capturing complex features. (2) Vision transformers (ViTs), i.e., ViT-Base-Patch16-224 [7], leverage the transformer architecture originally designed for natural language processing tasks. ViTs split an image into patches and process them as a sequence, allowing the model to capture long-range dependencies and global context.
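As a minimal example, ViT-Base-Patch16-224 can be applied to a tweet image via the Hugging Face transformers library; the generic ImageNet checkpoint is an illustrative stand-in for our classification setup.

```python
import torch
from PIL import Image
from transformers import ViTForImageClassification, ViTImageProcessor

# ViT-Base-Patch16-224 fine-tuned on ImageNet-1k; its 1000 generic
# classes serve here as a coarse proxy for the image topic.
processor = ViTImageProcessor.from_pretrained("google/vit-base-patch16-224")
model = ViTForImageClassification.from_pretrained("google/vit-base-patch16-224")

image = Image.open("tweet_image.jpg").convert("RGB")
inputs = processor(images=image, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits
print(model.config.id2label[logits.argmax(-1).item()])
```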

IV-D Object Detection

We use two models for more intricate image understanding: (1) DETR (DetrForObjectDetection) [4] utilizes a transformer architecture to detect and locate objects within an image. Image features are extracted using a ResNet-50 backbone pretrained on ImageNet, after which the model is trained on MS COCO [15]. (2) GroundingDINO [16] is an open-set object detector that combines the encoder-decoder model DINO with large-scale, refined pre-training.
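A sketch of DETR inference with the transformers library follows; the facebook/detr-resnet-50 checkpoint matches the ResNet-50/MS COCO setup described above, while the confidence threshold is an arbitrary illustrative choice.

```python
import torch
from PIL import Image
from transformers import DetrForObjectDetection, DetrImageProcessor

processor = DetrImageProcessor.from_pretrained("facebook/detr-resnet-50")
model = DetrForObjectDetection.from_pretrained("facebook/detr-resnet-50")

image = Image.open("tweet_image.jpg").convert("RGB")
inputs = processor(images=image, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# Keep detections above a confidence threshold, rescaled to image size.
target_sizes = torch.tensor([image.size[::-1]])
results = processor.post_process_object_detection(
    outputs, target_sizes=target_sizes, threshold=0.7
)[0]
for score, label in zip(results["scores"], results["labels"]):
    print(model.config.id2label[label.item()], f"{score:.2f}")
```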

IV-E Sentiment Analysis

For the sentiment analysis task, we selected three possible labels, ”positive”, ”neutral”, and ”negative”, in line with the majority of works in this field: (1) We employ the text-only model RoBERTa (Robustly Optimized BERT Pre-training Approach), which is an advancement over BERT [14]. While it maintains a similar architecture, RoBERTa employs dynamic masking during pre-training instead of static masking. Additionally, it is pre-trained only on masked language modeling and not on next-sentence prediction. (2) The vision model LENS (Large Language Models ENhanced to See) [1] is a modular approach designed to integrate vision capabilities into any off-the-shelf large language model (LLM) without requiring additional multi-modal training or data. It works by utilizing small, independent vision modules to generate detailed textual descriptions of images, which are then processed by the LLM to perform tasks like object recognition and visual question answering (VQA).
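As an illustration, a RoBERTa sentiment classifier with the three labels can be run through the transformers pipeline; the Twitter-tuned checkpoint named below is an assumption, not necessarily the variant used in our experiments.

```python
from transformers import pipeline

# A Twitter-tuned RoBERTa sentiment checkpoint with the labels
# positive/neutral/negative; an illustrative choice of model.
classifier = pipeline(
    "sentiment-analysis",
    model="cardiffnlp/twitter-roberta-base-sentiment-latest",
)
print(classifier("We can still turn the tide on climate change!"))
# e.g. [{'label': 'positive', 'score': 0.98}]
```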

V Analysis

We report frequency analysis and sentiment analysis in the main paper and report image classification and object detection results in Appendix B.

V-A Frequency Analysis

The most prominent hashtags include #ClimateChange, #ClimateAction, #ClimateEmergency, #GlobalWarming, #ClimateCrisis, #ClimateStrike, #Sustainability, and #Environment, as shown in Figure 1. This indicates that 2019’s Twitter discourse was shaped by climate activism. None of the frequent climate deniers’ hashtags show up, i.e., #climatescam, #climatehoax, and #climatecult [29]. It appears that in 2019, climate change deniers did not use the mainstream hashtags (e.g., #climatechange) in the X discourse.

[Figure 1: word cloud of the most frequent hashtags]

Similar trends become apparent when analysing the emojis tweeted (compare Figure 2). The most popular emojis include representations of the Earth from the African/European perspective. Other perspectives of the Earth are also frequent, with the American version coming in second and the Asian/Australian version coming in fourth. Other frequently used emojis, such as the pointing finger, the camera, and the tree, reinforce messages of urgency, visual documentation, and environmental focus. Within the 500 most popular tweets, a tweet contains on average 0.085 emojis.

[Figure 2: most frequently used emojis]

Figure 3 illustrates the tweet volume throughout 2019, displaying daily variations with a recurring pattern of highs and lows across the year. A noticeable peak occurs in early February, potentially linked to significant weather events such as a record heatwave in Australia or an extreme cold spell followed by above-average temperatures in the US. Moreover, this is the month in which the global school strike took place and Fridays for Future gained a lot of attention (compare [19]). The overall tweet volume remains fairly steady month to month with the exception of September, which might be linked to the UN Climate Action Summit 2019 (compare [19]). The first half of May and the second half of December are the least active times with respect to the climate change discourse.

[Figure 3: tweet volume over the course of 2019]

In 2019, the engagement with posted tweets also varies over time, as Figure 4 shows. Of all metrics, like count consistently leads in volume, with several notable spikes across the year. Retweet count exhibits a similar pattern, showing a significant rise alongside like count in August 2019. The reply count appears to lag behind the other two metrics, which can intuitively be explained by individuals first needing to see a tweet before they can react to it.

[Figure 4: user engagement metrics over time]

V-B Sentiment Analysis

Model predictions were generally over-positive, as all models’ sentiment prediction distributions are shifted towards the positive label compared to the ground truth (compare Table I). Moreover, the divergence between image and text sentiment already becomes apparent. Text sentiment prediction had an accuracy of 67%, while image sentiment performance was slightly worse at 56% (Gemini), 42% (LENS), and 60% (Moondream).

TABLE I: Sentiment label distributions on the 500 manually labeled tweets.

                 positive   neutral   negative
Text GT               130       198        172
RoBERTa [26]          143       213        141
Image GT              122       229        149
Gemini [10]*          212       115        151
LENS [1]              293        19        188
Moondream [18]        183       172        142
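Accuracy and label distributions as in Table I reduce to a few lines of counting; the label lists below are tiny stand-ins for the 500 annotated tweets, not our actual data.

```python
from collections import Counter

def accuracy(gt, pred):
    """Fraction of exact label matches between ground truth and predictions."""
    return sum(g == p for g, p in zip(gt, pred)) / len(gt)

# Tiny illustrative stand-ins for the 500-tweet annotations.
gt_image = ["positive", "neutral", "negative", "neutral"]
pred_model = ["positive", "positive", "negative", "neutral"]

print(Counter(pred_model))                       # label distribution, as in Table I
print(f"accuracy: {accuracy(gt_image, pred_model):.0%}")
```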

Over all experiments, the misclassified examples were mostly one step away from the true label, e.g., ”neutral” instead of ”positive”. A reason for the misclassification seems to be that the model does not always recognize sarcasm, e.g., the following tweet text:

Socialists want to do Bad People things to you, like give you a free education and pay for your healthcare and have you not die from climate change, by using Bad People policies a la such wastelands [as] Europe and Canada.

is negative according to RoBERTa, while it merely points out good actions the socialist party wants. Thirty images were misclassified as positive instead of neutral by all vision models, with examples shown in Figure 5. This shows that neutral content is particularly challenging.

[Figure 5: example images misclassified as positive instead of neutral by all vision models]

Sentiment divergence is an expected phenomenon on social media. The combination of contrasting positive images with negative text is frequently used to provoke thought or to highlight contradictions between appearance and reality, as shown in Figure 6. In 31 instances, a positive image sentiment was paired with a negative caption. Notably, many tweets involved prominent figures and politicians such as Prince William and Kate Middleton or Greta Thunberg, where the accompanying text often criticizes the individuals, accusing them of hypocrisy among other things. Similarly, nature is often depicted in an idyllic state, while accompanying captions frequently criticize human actions that have led to its degradation. For example, tweets questioning ethical or life decisions might pair an innocent or joyful image with a critical or pessimistic caption, such as a discussion on the morality of bringing a child into a world faced with significant challenges.

[Figure 6: examples of tweets with divergent image and text sentiment; top row: positive images with negative captions, bottom row: negative images with positive captions]

The bottom row of Figure 6 shows tweets where a negative image sentiment is paired with a positive text sentiment. In these examples, images depicting environmental degradation, such as melting icebergs or deforestation, are accompanied by captions that express hope, resilience, or a call to action. Additionally, another prominent category in this subset involves images of demonstrations or protests, where the image sentiment is often negative, reflecting the tension inherent in such events. However, the accompanying captions frequently convey messages of optimism and solidarity, which offer hope for change despite the challenging circumstances depicted.
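Identifying such divergent pairs reduces to a simple filter over the annotated subset. A minimal sketch, assuming a hypothetical labeled DataFrame with illustrative image_sentiment and text_sentiment columns:

```python
import pandas as pd

# Hypothetical annotated subset; column names are illustrative.
labeled = pd.DataFrame({
    "image_sentiment": ["positive", "negative", "neutral"],
    "text_sentiment":  ["negative", "positive", "neutral"],
})

# Positive images paired with negative captions (31 cases in our subset).
pos_img_neg_txt = labeled[
    (labeled["image_sentiment"] == "positive")
    & (labeled["text_sentiment"] == "negative")
]

# The reverse pairing: negative images with positive captions.
neg_img_pos_txt = labeled[
    (labeled["image_sentiment"] == "negative")
    & (labeled["text_sentiment"] == "positive")
]
print(len(pos_img_neg_txt), len(neg_img_pos_txt))
```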

VI Interactive GUI

To better interpret visual narratives, we have created a Graphical User Interface (GUI), which provides a large preview of the tweet’s image along with the most important tweet information and the corresponding results. Given that we want to make the code open source, we selected a framework which is free and provides stable long-term versions. We also preferred a widely adopted framework with plentiful resources and tutorials, as this allows further development by the community. Finally, we opted for the Vue [27] framework due to its intuitive syntax, simplicity, and favorable learning curve. For our front-end and design, we used plain CSS [6] supplemented with some Bootstrap [2]. Further details can be found in Appendix C.

VII Discussion and Conclusion

We analysed 730k climate change-related tweets using image classification, object detection, and sentiment analysis. Frequency analysis indicates that X’s discourse aligns with related events and can thus provide insights into their perception. Automated sentiment analysis achieved a maximum accuracy of 67%, underscoring the need for further model improvement and additional sentiment datasets. A more detailed differentiation between emotions would allow for additional insights. The tweets with divergent image and text sentiment are especially interesting in our opinion, since in this case the use of multi-modal models is less effective than in tweets where both modalities carry the same sentiment. This highlights the need for an interactive GUI to facilitate deeper evaluation of model predictions. By making our code and tools open source, we aim to lay the groundwork for advancing AI-driven climate communication research.

References

  • [1] William Berrios, Gautam Mittal, Tristan Thrush, Douwe Kiela, and Amanpreet Singh. Towards language models that can see: Computer vision through the lens of natural language. arXiv preprint arXiv:2306.16410, 2023.
  • [2] Bootstrap documentation. https://getbootstrap.com/.
  • [3] Dorothea Born. Bearing witness? Polar bears as icons for climate change communication in National Geographic. Environmental Communication, 13(5), 2019.
  • [4] Nicolas Carion, Francisco Massa, Gabriel Synnaeve, Nicolas Usunier, Alexander Kirillov, and Sergey Zagoruyko. End-to-end object detection with transformers. In European Conference on Computer Vision, pages 213–229. Springer, 2020.
  • [5] Andreu Casas and Nora Webb Williams. Images that matter: Online protests and the mobilizing role of pictures. Political Research Quarterly, 72(2), 2019.
  • [6] Cascading Style Sheets. https://www.w3.org/Style/CSS/.
  • [7] Alexey Dosovitskiy, Lucas Beyer, Alexander Kolesnikov, Dirk Weissenborn, Xiaohua Zhai, Thomas Unterthiner, Mostafa Dehghani, Matthias Minderer, Georg Heigold, Sylvain Gelly, et al. An image is worth 16x16 words: Transformers for image recognition at scale. In International Conference on Learning Representations, 2020.
  • [8] Flask documentation. https://flask.palletsprojects.com/en/stable/.
  • [9] Erving Goffman. Frame analysis: An essay on the organization of experience. Harvard University Press, 1974.
  • [10] Google. Gemini. https://gemini.google.com/app.
  • [11] Sylvia Hayes and Saffron O’Neill. The Greta effect: Visualising climate protest in UK media and the Getty Images collections. Global Environmental Change, 71, 2021.
  • [12] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 770–778, 2016.
  • [13] Michael Johann, Lukas Höhnle, and Jana Dombrowski. Fridays for Future and Mondays for memes: How climate crisis memes mobilize social media users. Media and Communication, 11(3), 2023.
  • [14] Wenxiong Liao, Bi Zeng, Xiuwen Yin, and Pengfei Wei. An improved aspect-category sentiment analysis model for text sentiment analysis based on RoBERTa. Applied Intelligence, 51:3522–3533, 2021.
  • [15] Tsung-Yi Lin, Michael Maire, Serge Belongie, James Hays, Pietro Perona, Deva Ramanan, Piotr Dollár, and C. Lawrence Zitnick. Microsoft COCO: Common objects in context. In Computer Vision–ECCV 2014: 13th European Conference, Zurich, Switzerland, September 6-12, 2014, Proceedings, Part V 13, pages 740–755. Springer, 2014.
  • [16] Shilong Liu, Zhaoyang Zeng, Tianhe Ren, Feng Li, Hao Zhang, Jie Yang, Qing Jiang, Chunyuan Li, Jianwei Yang, Hang Su, et al. Grounding DINO: Marrying DINO with grounded pre-training for open-set object detection. In European Conference on Computer Vision, pages 38–55. Springer, 2025.
  • [17] Aidan McGarry and Emiliano Treré. Fire as an aesthetic resource in climate change communication: Exploring the visual discourse of the California wildfires on Twitter/X. Visual Studies, 2024.
  • [18] moondream.ai. Moondream. https://github.com/vikhyat/moondream.git.
  • [19] Angelina Mooseder, Cornelia Brantner, Rodrigo Zamith, and Jürgen Pfeffer. (Social) media logics and visualizing climate change: 10 years of #climatechange images on Twitter. Social Media + Society, 9(1), 2023.
  • [20] Saffron O’Neill, Sylvia Hayes, Nadine Strauß, Marie-Noëlle Doutreix, Katharine Steentjes, Joshua Ettinger, Ned Westwood, and James Painter. Visual portrayals of fun in the sun in European news outlets misrepresent heatwave risks. The Geographical Journal, 189(1), 2023.
  • [21] Saffron O’Neill. More than meets the eye: A longitudinal analysis of climate change imagery in the print media. Climatic Change, 163(1), 2020.
  • [22] Katharina Prasse, Isaac Bravo, Stefanie Walter, and Margret Keuper. I spy with my little eye: A minimum cost multicut investigation of dataset frames. In Winter Conference on Applications of Computer Vision, 2025.
  • [23] Python pandas documentation. https://pandas.pydata.org/.
  • [24] Stacy Rebich-Hespanha, Ronald E. Rice, Daniel R. Montello, Sean Retzloff, Sandrine Tien, and João P. Hespanha. Image themes and frames in US print news stories about climate change. Environmental Communication, 9(4), 2015.
  • [25] Mike S. Schäfer and Saffron O’Neill. Frame analysis in climate change communication: Approaches for assessing journalists’ minds, online communication and media portrayals. In Matthew Nisbet, Shirley Ho, Ezra Markowitz, Saffron O’Neill, Mike S. Schäfer, and Jagadish Thaker, editors, Oxford Encyclopedia of Climate Change Communication. Oxford University Press, New York, 2017.
  • [26] Kian Long Tan, Chin Poo Lee, Kalaiarasi Sonai Muthu Anbananthen, and Kian Ming Lim. RoBERTa-LSTM: A hybrid model for sentiment analysis with transformer and recurrent neural network. IEEE Access, 10:21517–21525, 2022.
  • [27] Vue documentation. https://vuejs.org/guide/introduction.html.
  • [28] Susie Wang, Adam Corner, Daniel Chapman, and Ezra Markowitz. Public engagement with climate imagery in a changing digital landscape. WIREs Climate Change, 9(2), 2018.
  • [29] Elizabeth Zak. The colors of a #climatescam: An exploration of anti-climate change graphs on Twitter. JIS, 7(1), 2023.

Appendix A Gemini prompt

For this model, we used a return structure which allowed us to further interpret the sentiment assessment by analysing the confidence score and the model’s explanation. However, this prompting technique did not work for all tested models, as the majority merely returned the sentiment regardless of the requested JSON return structure.

What is the sentiment of the image? Please classify the image’s sentiment into positive, negative, and neutral and provide a confidence score in [0,1] as well as a short explanation. Structure your answer in the same json format as this example and do not add any additional information: ”sentiment”: ”XX”, ”confidence”: XX, ”explanation”: ”XX”

GEMINI prompt.
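Because most models ignored the requested JSON structure, responses are best parsed defensively. The helper below is a sketch of such a fallback, not the exact parsing used in our pipeline.

```python
import json

def parse_sentiment(raw: str) -> dict:
    """Parse the model's JSON reply; fall back to a bare-label heuristic
    for models that ignore the requested return structure."""
    try:
        return json.loads(raw)
    except json.JSONDecodeError:
        for label in ("positive", "negative", "neutral"):
            if label in raw.lower():
                return {"sentiment": label, "confidence": None, "explanation": raw}
        return {"sentiment": "unknown", "confidence": None, "explanation": raw}

# Example: a reply that is valid JSON parses directly.
print(parse_sentiment('{"sentiment": "negative", "confidence": 0.9, "explanation": "wildfire"}'))
# A bare-label reply is still recovered.
print(parse_sentiment("The sentiment is clearly positive."))
```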

Appendix B Image Classification and Object Detection

We focus our report on object detection, as this task is less discussed in other works. In Figure 7, we highlight the most frequently detected objects. While news-worthy terms (e.g., glacier, fire, planet) and objects common in public discourse (e.g., podium, microphone, flag) are included, the data also reveals everyday objects (e.g., book, TV, cat, dog). This indicates that individuals share not only public content but also their own climate change experiences, even in the most popular subset.

[Figure 7: most frequently detected objects]
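The frequency ranking underlying Figure 7 can be obtained by aggregating per-image detections; the detections list below is a toy stand-in for the actual detector output.

```python
from collections import Counter

# Per-image label lists, e.g. collected from the DETR sketch in Section IV-D.
detections = [["podium", "microphone"], ["glacier"], ["podium", "flag"]]

object_counts = Counter(label for labels in detections for label in labels)
for obj, n in object_counts.most_common(20):
    print(f"{obj:12s} {n}")
```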

Appendix C GUI

We host both the front-end and back-end on a dedicated Linux server. In order to manage large amounts of data within reasonable processing times, we implemented a Python Flask [8] back-end. We chose Flask since it allows users to easily upload output files as Excel documents, which can then be processed with Python’s Pandas library [23]; it thus provides great flexibility for data aggregation, reformatting, and custom function applications.
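A minimal sketch of this upload path; the /upload endpoint name is hypothetical, and reading Excel files additionally requires the openpyxl engine.

```python
import pandas as pd
from flask import Flask, request

app = Flask(__name__)

@app.route("/upload", methods=["POST"])  # illustrative endpoint name
def upload():
    # Excel output files are parsed into a DataFrame for aggregation
    # and reformatting before being served to the front-end.
    df = pd.read_excel(request.files["file"])  # needs: pip install openpyxl
    return df.to_json(orient="records")
```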

C-A Filters and HTTP-Requests

To be able to apply text-based filters to the tweets, we came up with a modular filter system, where each column has a checkbox and an adjacent input field. Upon selecting the desired column(s) and filling out their text-based inputs, the inputs are concatenated with the column names into an HTTP request that the back-end serves. The back-end then responds with a list of only the entries that match the filter criteria. To avoid downloading too much data when there are many search results, a second HTTP request is triggered to fetch only the information of the currently shown picture and display it in the GUI. The same idea applies to the images themselves: out of the entry list corresponding to the search criteria, only the current nine previews are downloaded and displayed.
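The filter endpoint can be sketched as follows; the route name and the convention of matching each query parameter against the column of the same name are illustrative assumptions, not our exact API.

```python
import pandas as pd
from flask import Flask, jsonify, request

app = Flask(__name__)

# Hypothetical tweet table loaded at startup; columns are illustrative.
df = pd.DataFrame({"author": ["greta", "bbc"], "text": ["strike now", "report"]})

@app.route("/tweets")  # illustrative route
def filtered_tweets():
    # Each selected column arrives as a query parameter,
    # e.g. /tweets?author=greta&text=strike
    result = df
    for column, value in request.args.items():
        result = result[result[column].astype(str).str.contains(value, case=False)]
    # Return only the matching entry list; images are fetched by a
    # second request for the currently shown previews.
    return jsonify(result.to_dict(orient="records"))
```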

C-B Design

Regarding the appearance, we aimed to develop a neutral, clean, and intuitive GUI focused on analyzing our findings.To achieve this, we adopted a simple and structured design that presents our results objectively.The main analysis page is organized vertically, featuring images, navigation elements, results, and filtering options in that order.Additionally, we prioritized a pleasant user experience by creating supplementary pages such as the Home and About pages.These pages contribute to a professional website feel, enhancing users’ perception of our tool and incentivizing them to stay on our page for longer.Furthermore, the FAQ section provides guidance for users and can be easily updated with future additions and thus adapted to user feedback.

