Affordances for expressing collective identities -
The case of the 2024 French parliamentary elections
(incomplete)

Team Members

  • Tom Willaert (Vrije Universiteit Brussel) - Facilitator

  • Eckehard Olbrich (Max Planck Institute for Mathematics in the Sciences) - Facilitator

  • Carlo Santagiustina (SciencesPo - médialab) - Facilitator (subproject 5)

  • Nora Zech (Leipzig University) - Facilitator (subproject 2)

  • Jan Babnik (IRRIS Institute) - Facilitator (subproject 2)

  • Lingyu Li (University of Amsterdam) - Researcher

  • Shuyu Zhang (Delft University of Technology) - Researcher (subproject 3)

  • Marina Loureiro (Federal University of Rio de Janeiro) - Researcher

  • Brogan Latil (University of Amsterdam) - Researcher (subproject 2)

  • Michelle Stewart (University of Quebec at Montreal) - Researcher

  • Brittany Elizabeth Zelada (University of Amsterdam) - Researcher

  • Tahereh Aboofazeli (University of Cologne & (MGK) University of Siegen) - Researcher

  • Deborah Nyangulu (University of Bremen) - Researcher (subproject 2)

  • Andrea Elena Febres Medina (Politecnico of Milan) - Design Facilitator

  • Elif Bozkurt (University of Amsterdam) - Researcher

  • Sara Nuta (University of Amsterdam) - Researcher


1. Introduction

In this project we studied how users express collective identities on social media, in particular in the context of the French legislative elections of June-July 2024. Our main data set was collected from X (see section 3). Some subprojects also looked at TikTok (subprojects 1, 3 and 5) and Instagram (subproject 1).

The project is part of the HORIZON Europe project “Social Media for Democracy (SoMe4Dem) – understanding the causal mechanisms of digital citizenship”. The SoMe4Dem project studies the impact of social media along three dimensions: participation, polarisation and trust by developing democracy theory, providing evidence for social media impact, performing experiments and building models to reveal causal mechanisms, and studying practices to improve digital citizenship.

As part of this project we want to understand how the technical and social affordances of social media platforms relate to the different functions of the public sphere in liberal democracies, such as information, deliberation and the forming of collective actors. The formation and expression of collective identities is part of the latter and the focus of this project.


The starting point of the analysis was the identification of political communities in the retweet network. Beginning with the seminal work by Conover et al. (2011), it has been shown that people are more likely to share content the more it corresponds to their own beliefs. In political debates, clusters in the retweet network can therefore often be interpreted as clusters of accounts with a similar political stance. Conover et al. (2011) showed this for Democrats and Republicans in the US context, Gaumont et al. (2018) for the 2017 French presidential elections, and Gaisbauer et al. (2021) for the 2019 Saxon state elections in Germany. In a next step we compared how people in the different communities use affordances for identity expression such as user biographies, emojis in screen names, or profile pictures.

More specific aspects were studied in several subprojects. The first subproject studied how climate activists and feminists express their identities and how feminism and climate change are addressed in the different communities. The second subproject asked to what extent users express political identities. The third subproject asked to what extent and how users use geographical information in their bios, for instance to express regional identities, and whether there are systematic differences between rural and urban areas. The fourth subproject presented a case study comparing the self-presentation of a journalist across X, Instagram and TikTok.

In the fifth subproject we used a seeded Structural Topic Model to map differences in bio self-description patterns reflecting the political identities of two major communities ("Rassemblement National" and "Socialistes") previously identified through the retweet network.

2. Research Questions

Our main research questions were: How do social media users express their identities on different platforms? Which identities are expressed? Are there systematic differences between platforms? What role do these identities play in political debates on these platforms, and to what extent do they contribute to polarization?

3. Main data set and communities in the retweet network

3.1. Main data set

Data were collected using the Twitter/X API (through a DSA request) on the morning of July 8th using the query:

--academic -o legislatives04.csv "(#NouveauFrontPopulaire OR #NFP OR #RassemblementNational OR #RN OR @RNational_off OR #Renaissance OR @Renaissance) (\"pouvoir d'achat\" OR climat OR retraite OR logement OR santé OR immigration OR migration OR migrant OR \"services publics\" OR emploi OR république OR inégalité OR justice OR gaza OR israel OR ukraine OR russie OR islam) lang:fr"

Figure 1:

Figure 2: Number of tweets per hour, differentiated by retweets, replies, quotes and regular tweets

Number of tweets in total | 326319
Number of retweets | 286204 (87.7%)
Number of quotes | 6044 (1.8%)
Number of replies | 24527 (7.5%)
Number of original tweets (no replies) | 10039 (3%)
Number of accounts | 88937

Table 1: Basic properties of the main data set

3.2. Creating and analyzing the retweet network

To identify the political camps in a dataset collected on X, we create a force-directed layout of the retweet network and assume that people are more likely to retweet content that is "closer" to their political opinion. This interpretation of retweet network visualizations has often been used intuitively. For a formal justification see Gaisbauer et al. (2023).

The retweet network and the community detection were computed using the twitterexplorer. For each retweet, an edge was created from the retweeting account to the retweeted account. If an account retweeted several tweets of another account, the corresponding count was represented as an edge weight. For the visualization the largest connected component was selected. To reduce the computational burden, all accounts that only retweet a single account were also removed ("soft aggregation").
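The construction steps described above can be illustrated with a minimal sketch in plain Python. It is not the twitterexplorer implementation. The function names and the toy data are hypothetical, and the sketch assumes that "soft aggregation" only removes accounts that retweet a single account and are never retweeted themselves. That second condition is not stated above; it is an assumption made here so that heavily retweeted accounts stay in the network.

```python
from collections import Counter, defaultdict


def build_edges(retweets):
    """Map (retweeting account, retweeted account) to the number of retweets."""
    return Counter(retweets)


def largest_component(edges):
    """Return the accounts in the largest connected component (direction ignored)."""
    neighbours = defaultdict(set)
    for source, target in edges:
        neighbours[source].add(target)
        neighbours[target].add(source)
    visited, best = set(), set()
    for start in neighbours:
        if start in visited:
            continue
        component, stack = {start}, [start]
        while stack:
            for nxt in neighbours[stack.pop()]:
                if nxt not in component:
                    component.add(nxt)
                    stack.append(nxt)
        visited |= component
        if len(component) > len(best):
            best = component
    return best


def soft_aggregate(edges):
    """Drop accounts that retweet exactly one account and are never retweeted.

    The "never retweeted" condition is an assumption of this sketch.
    """
    targets = defaultdict(set)
    retweeted = set()
    for source, target in edges:
        targets[source].add(target)
        retweeted.add(target)
    removable = {a for a, t in targets.items() if len(t) == 1 and a not in retweeted}
    return {e: w for e, w in edges.items() if e[0] not in removable}


# Toy data: one (retweeting account, retweeted account) pair per retweet.
retweets = [("a", "x"), ("a", "x"), ("b", "x"), ("b", "y"), ("c", "y"), ("d", "e")]
edges = build_edges(retweets)
giant = largest_component(edges)
giant_edges = {e: w for e, w in edges.items() if e[0] in giant and e[1] in giant}
reduced = soft_aggregate(giant_edges)
```

In the toy data, accounts "d" and "e" fall outside the largest component, and "a" and "c" are removed by the aggregation step, leaving only the edges from "b".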

Number of accounts in the retweet network | 79926
Number of accounts in the giant component | 79231
Number of accounts after soft aggregation | 32847

Table 2: Basic properties of the retweet network.

The community detection was performed by modularity maximization using the Louvain algorithm on the retweet network after soft aggregation.
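To make the objective of this step concrete, the following sketch computes the modularity of a given partition, which is the quantity the Louvain algorithm tries to maximize. It does not implement the Louvain search itself. For simplicity it treats the network as undirected and weighted, which is an assumption of this sketch; the function name and the toy graph are hypothetical.

```python
from collections import defaultdict


def modularity(weighted_edges, community_of):
    """Modularity Q of a partition of an undirected weighted graph.

    Q = sum over communities c of  L_c / m - (d_c / (2 m))^2,
    where m is the total edge weight, L_c the weight of edges inside c,
    and d_c the summed weighted degree of the nodes in c.
    """
    m = sum(weighted_edges.values())
    internal = defaultdict(float)
    degree = defaultdict(float)
    for (u, v), w in weighted_edges.items():
        cu, cv = community_of[u], community_of[v]
        degree[cu] += w
        degree[cv] += w
        if cu == cv:
            internal[cu] += w
    return sum(internal[c] / m - (degree[c] / (2 * m)) ** 2 for c in degree)


# Toy graph: two triangles joined by a single edge.
edges = {(1, 2): 1, (2, 3): 1, (1, 3): 1, (4, 5): 1, (5, 6): 1, (4, 6): 1, (3, 4): 1}
two_groups = {1: "A", 2: "A", 3: "A", 4: "B", 5: "B", 6: "B"}
one_group = {node: "A" for node in range(1, 7)}
q_two = modularity(edges, two_groups)
q_one = modularity(edges, one_group)
```

Splitting the toy graph into its two triangles gives a higher modularity than putting all nodes into one community, which is why a modularity-maximizing method would prefer the split.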

Community label | Number of users | Most retweeted account | # of retweets | Profile picture
La France Insoumise | 4890 | FranceInsoumis | 7915 | (image)
Mélenchon | 7757 | JLMelenchon | 11403 | (image)
Socialistes | 5793 | faureolivier | 3358 | (image)
Macron / Renaissance | 4307 | Renaissance | 2679 | (image)
Rassemblement National | 9725 | RNational_off | 28060 | (image)

Table 3: The 5 largest clusters in the retweet network, detected by modularity maximization (Louvain algorithm).

The Louvain method detected 37 communities for the accounts in the retweet network. Fig. 2 shows a spatial representation of the retweet network. Here one sees that the three communities from the Nouveau Front Populaire (NFP) form only one spatial cluster, which indicates that they are strongly connected.

Figure 2: Visualization of the retweet network using a force directed layout (ForceAtlas2 from Gephi). The edges are not shown. Different colors indicate different communities. The size of the nodes indicates the number of retweets of this account. Also shown are the emojis most frequently used in the user bio by accounts in the respective communities.

3.3. Summarizing the users’ identity cues for top communities

Social media user profile bios play a vital role in conveying aspects of individual identity. These brief descriptions offer glimpses into personal interests, values, and self-perception. For researchers, these bios are a valuable resource, providing subtle identity cues that aid in the exploration of social behaviors and self-expressed identity patterns. Consequently, the content and composition of user bios hold considerable importance in the study of user identities.

Bio textual content for top communities

Table 4: Word clouds of most frequent words in user bio, by community (La France Insoumise, Mélenchon, Socialistes, Macron / Renaissance, Rassemblement National).

The word clouds of the most frequent bio words for the five largest communities identified with the Louvain method (La France Insoumise, Mélenchon, Socialistes, Macron/Renaissance, and Rassemblement National) offer a starting point for comparing their declared identities and discourses. They allow us to observe how diverse the identified communities are and to what degree their bio content matches the labels that were assigned to them based only on the retweet network.

The word cloud of La France Insoumise is dominated by terms related to activism within the corresponding left-wing party. Users identify with the party by using terms in their bio such as "LFI", "soutien" (support), "militant", "fan", "monde" (world), "insoumis", "politique" (politics), and "gauche" (left). These words suggest an emphasis on political activism and a strong and coherent political identity, with words like "militant" and "insoumis(e)" reflecting their commitment to the LFI party. The appearance of "monde" (world) suggests a global or international orientation of these profiles, while phrases like "nouveau front populaire" highlight the self-identification of these users with the French left coalition. The group's identity also appears to be rooted in opposition to the extreme right ("lutte", "justice", "sociale", "paix", "fachos") and in a desire to create a broad, inclusive leftist political movement.

In the word cloud of the so-called Mélenchon community, prominent words such as "vie" (life), "temps" (time) and "aime" (like) suggest that users in this group are more self-focused in their bios. As in the previous community, many frequent words reveal a dominance of leftist political views ("gauche", "féministe"). However, this group appears to be less homogeneous and less partisan in terms of party affiliation. The emphasis on "palestine" and "freepalestine" underscores a strong political position on the conflict in Gaza and an identity focus on declaring solidarity with Palestinian resistance. The presence of words related to famous French and European football teams (such as "PSG", "PSGinside", "teamOM", "realmadrid") suggests that many users in this group are football supporters, probably from urban and suburban milieus, for whom support for a club is an identity-defining passion.

For the Socialistes, the word cloud is also characterized by left and center-left activism, revealed through words like "militant", "politique", "gauche", and "féministe". Terms like "maire" (mayor) and "secrétaire" (secretary) indicate a structured and organized political identity, with a focus on formal roles and responsibilities within political parties. The prominence of words like "membre" (member) and "conseiller" (councilor) also highlights the importance of formal membership and participation in political activity, which aligns with the Socialist Party's traditional focus on institutional politics and governance. Additionally, the terms "écologie" and "ecologiste" reflect a strong interest in environmental issues within this group.

The word cloud of the Macron/Renaissance community stands out with terms such as "extrêmes" (extremes), "politique" (politics), "renaissance", "anti", and "laïcité" (secularism). This suggests a strong focus on a centrist and moderate political identity, with a particular emphasis on opposing extremism from both the left and the right, as shown by the presence of the word "ni" (neither). The term "renaissance" aligns with the party's branding and its vision of renewal, while words like "laïcité" and "anti" signal a defense of secularism and a stance against radical ideologies. The presence of other party names also suggests that this group of users explicitly constructs and communicates its political identity in opposition to those of the RN and LFI.

Finally, the word cloud of the Rassemblement National is dominated by nationalist identity cues: "la france", "pays" (country), "français" (French), and "droite" (right). These terms highlight a strong emphasis on patriotism and a right-wing identity. The frequent use of "France" and "français" also underscores the group's main focus on nationalism and sovereignty, while "patriote" reflects a deep connection to patriotic values.
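The term frequencies behind word clouds of this kind can be approximated with a short sketch. It is an illustration only, not the procedure used to produce the clouds above: the tokenization rule, the very short stop word list, the function name and the toy bios are all assumptions.

```python
import re
from collections import Counter, defaultdict

# Hypothetical, deliberately short stop word list; a real analysis needs a fuller one.
STOPWORDS = {"de", "la", "le", "les", "et", "des", "du", "en", "un", "une", "pour"}


def top_bio_terms(users, n=10):
    """Return the n most frequent bio terms for each community.

    `users` is an iterable of (community label, bio text or None) pairs.
    """
    counts = defaultdict(Counter)
    for community, bio in users:
        if not bio:
            continue  # accounts without a bio contribute no terms
        tokens = re.findall(r"\w+", bio.lower())
        counts[community].update(t for t in tokens if t not in STOPWORDS)
    return {c: counter.most_common(n) for c, counter in counts.items()}


# Toy data with invented bios.
users = [
    ("LFI", "Militant LFI, insoumis"),
    ("LFI", "militant de gauche"),
    ("RN", "Patriote français"),
    ("RN", None),
]
top = top_bio_terms(users, n=3)
```

Comparing the resulting lists across communities gives a rough, reproducible counterpart to reading the word clouds by eye.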

Bio hashtags for top communities

Hashtags on social media uniquely facilitate community self-identification and collective expression. By searching and aggregating posts under a common tag, hashtags create communities centered around shared interests and political beliefs. They enable individuals to align themselves with broader movements and discussions, making personal identities visible within a larger social context. For researchers, hashtags are essential in tracing the formation and dynamics of these online communities, providing a window into the landscape of group identities and the dynamics of political belonging.

Table 5: Word clouds of most frequent hashtags used in user bio, by community (La France Insoumise, Mélenchon, Socialistes, Macron, Rassemblement National).

The top hashtag clouds for the five largest communities of users provide additional insights into how these communities express their identities and priorities on social media, through their bio.

The hashtag cloud for La France Insoumise prominently features terms related to the party's identity and political goals. The dominant hashtags include #FrontPopulaire, #UnionPopulaire, and #LFI, which indicate a strong alignment with the party’s branding and vision of a unified leftist movement. Hashtags like #NUPES (New Ecological and Social People's Union) and #FranceInsoumise reinforce this sense of collective leftist political identity. The presence of #FreePalestine and #AvenirEnCommun reflects the community's support for international causes and its focus on social justice and “anti-imperialism”.

The hashtag cloud for the Mélenchon community shares similarities with the LFI group but also reveals distinct elements. #FreePalestine is a key theme, indicating a very strong focus on the Palestinian cause. Other prominent hashtags like #FrontPopulaire, #LFI, and #UnionPopulaire suggest a strong association with the broader leftist movement in France. The presence of hashtags like #TeamOM (Olympique de Marseille) and #TeamPSG (Paris Saint-Germain) is coherent with the word cloud and highlights a notable interest in football.

In the Socialistes community, the hashtag cloud is dominated by #FrontPopulaire, which signifies a connection with the broader left-wing coalition. The prominence of #EELV (Europe Écologie Les Verts) and #Ecologie indicates, as also suggested by the word cloud, a strong emphasis on environmental issues, aligning with the Socialist Party's focus on green politics. #NUPES and #UnionPopulaire further underscore this group's participation in leftist alliances. The inclusion of hashtags like #Paris and #Politique suggests a localized, perhaps also more institutional, engagement with politics.

The Macron/Renaissance community's hashtag cloud is characterized by #Renaissance, #LaRépubliqueEnMarche and #Ensemble, which reflect the party's branding and its emphasis on renewal and progress. The #StandWithUkraine and #Ukraine hashtags are prominent, indicating a strong stance on international solidarity in the context of the Ukraine conflict. The hashtags #Anti and #NiDroiteNiGauche (neither right nor left) highlight a centrist positioning that rejects and jointly opposes the two extremes of the political spectrum. This hashtag cloud reflects a community that emphasizes moderation, pro-European sentiment, and a rejection of polarizing ideologies.

The hashtag cloud for the Rassemblement National (RN) community is dominated by terms like #JambonBeurre, #TeamPatriotes, and #RN, reflecting a strongly traditionalist and nationalist right-wing identity. In particular, the recurring #JambonBeurre hashtag likely serves as a cultural identity marker within this group, pointing to a community identity deeply invested in nationalistic, anti-globalist, and patriotic values. #Reconquete and #Frexit are also prominent, indicating a focus on reclaiming national sovereignty and a potential desire for France to leave the European Union, while #UnionDesDroites suggests an emphasis on unifying various right-wing factions.

In summary, each group's identity is shaped by a distinct blend of bio discourses, identity markers and focus, reflecting its ideological position in the broader context of the French legislative elections. The word clouds reveal how distinct these five groups are in the focus and content they use to communicate their identities. The La France Insoumise community centers on activism and leftist perspectives, as do users in the Mélenchon community, who however appear to be more self-focused and more sensitive to the conflict in Palestine. Users in the Socialistes group emphasize formal roles and environmental concerns. Users in the Macron/Renaissance community promote moderation against the extremes and secularism. Users in the Rassemblement National community are rooted in nationalism and right-wing patriotic values.

Emojis in the username field for top communities

Emojis in usernames, such as country flags, offer a distinctive, visual way of conveying salient identity cues and affiliation with partisan groups. These small Unicode symbols can signal political stances at a glance. For researchers, emojis provide subtle yet rich data points, helping to decode how users represent themselves and connect with others. The strategic use of emojis in usernames thus plays a significant role in the study of digital identity.
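As a rough illustration of how emojis could be pulled out of a username field, the sketch below uses only the Python standard library. It is a heuristic and an assumption of this example, not the method used in the project: it treats every character in the Unicode category "So" (Symbol, other) as an emoji and joins pairs of regional indicator symbols into flags. It will therefore also pick up a few non-emoji symbols and will split multi-character sequences such as the rainbow flag.

```python
import unicodedata
from collections import Counter


def is_regional_indicator(ch):
    """Regional indicator symbols combine in pairs to form country flags."""
    return 0x1F1E6 <= ord(ch) <= 0x1F1FF


def extract_emojis(text):
    """Heuristically extract emojis: flag pairs plus 'Symbol, other' characters."""
    found, i = [], 0
    while i < len(text):
        ch = text[i]
        nxt = text[i + 1] if i + 1 < len(text) else ""
        if is_regional_indicator(ch) and nxt and is_regional_indicator(nxt):
            found.append(ch + nxt)  # keep the two-character flag together
            i += 2
            continue
        if unicodedata.category(ch) == "So":
            found.append(ch)
        i += 1
    return found


# Invented usernames: a French flag, then a sunflower and a raised fist.
names = ["Jean \U0001F1EB\U0001F1F7", "Marie \U0001F33B\u270A", "Paul"]
emoji_counts = Counter(e for name in names for e in extract_emojis(name))
```

Counting the extracted emojis per community then gives a simple table of the most frequent identity markers in usernames.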

Profile Pictures

In the following, the three left (NFP) communities were treated as a single community. For each of the three resulting communities (left, centrist, right), a subset of fewer than 1000 accounts was selected by applying a threshold on the number of followers. It turned out that setting the minimum number of followers to 5000 for the left and the right, and to 2000 for the centrist community, left us in all three cases with slightly fewer than 1000 accounts. The profile pictures of these accounts were then embedded using PixPlot in 4CAT. Similar clusters could be identified in all embeddings: (1) female faces, (2) male faces, (3) flags, logos, and symbols, and (4) animals, flowers or landscapes. The only category that showed systematic differences was category (3), with respect to the flags. While in the RN cluster the French flag was the most prevalent flag, accounts from the centrist cluster showed the Ukrainian flag, and on the left a significant number of accounts had the "stop genocide" sign in the colors of the Palestinian flag in their profile picture.

Figs. 3a to 3c show examples of all four categories for the three communities. Note that the 52 pictures for each category correspond to a single spatial region in the corresponding PixPlot embedding, i.e. they are neighboring pictures in the two-dimensional embedding.


Figure 3a: 52 profile pictures for each of the 4 categories for the left cluster, from top to bottom: (1) female faces, (2) male faces, (3) graphical elements, (4) photos (e.g. animals)


Figure 3b: 52 profile pictures for each of the 4 categories for the centrist cluster, from top to bottom: (1) female faces, (2) male faces, (3) graphical elements, (4) photos (e.g. animals)

Figure 3c: 52 profile pictures for each of the 4 categories for the right cluster, from top to bottom: (1) female faces, (2) male faces, (3) graphical elements, (4) photos (e.g. animals)

4. Subprojects

Sub-project 1: Climate and Feminism in the 2024 Legislative Elections in France

Our group decided to explore how each political community (cluster) in our legislatives04 database talked about feminism and climate. We began by brainstorming a list of terms in French that might signal interest in feminism or climate issues. We then used 4CAT to filter our main database for user bios containing these terms (creating one list for each term). We created word clouds for each issue and a PixPlot visualization of user profile images.

Image 1. Looking at the bios of users who identify themselves as climate activists, we can see an association with the Left, signalled by words like ‘gauche’. We also see the abbreviations of specific parties that make up the NFP alliance (LFI, EELV) as well as social categories and professions, such as LGBT and ‘profs’. Given our “climate”-focused query, it is perhaps not surprising that environmental emojis appear prominently, including the green heart, the world and the sunflower.

Image 2. Word cloud of the bios of users that identify themselves as feminists.

After performing a number of analyses on both of these sub-datasets (climate issues and feminism), we decided to focus our energies on the climate subset with the time remaining. This decision was also influenced by the notable finding that there was not a single user in Community 1 (Rassemblement National) whose bio contained one of our “feminism” terms.

We then began to explore the ways in which the same communities used platform-specific affordances to talk about climate on Twitter, Instagram and TikTok. We used 4CAT to find the top hashtags associated with our climate terms for each community cluster. We also plotted these in graphs using Google Sheets.
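The hashtag step was carried out in 4CAT; the following is only a minimal stand-alone sketch of the same idea, in which posts mentioning a climate term are kept and their hashtags are counted per community. The term list, function name and toy posts are hypothetical and do not reproduce the project's actual query terms.

```python
import re
from collections import Counter, defaultdict

# Hypothetical climate terms; the project's own term list is not reproduced here.
CLIMATE_TERMS = ("climat", "écologie")
HASHTAG = re.compile(r"#\w+")


def top_hashtags_by_community(posts, terms=CLIMATE_TERMS, n=10):
    """Count hashtags in posts that mention at least one of the given terms.

    `posts` is an iterable of (community label, post text) pairs.
    """
    counts = defaultdict(Counter)
    for community, text in posts:
        lowered = text.lower()
        if any(term in lowered for term in terms):
            counts[community].update(HASHTAG.findall(lowered))
    return {c: counter.most_common(n) for c, counter in counts.items()}


# Toy data with invented posts.
posts = [
    ("NFP", "Urgence climat #NFP #climat"),
    ("NFP", "Le climat compte #NFP"),
    ("RN", "Immigration #RN"),
    ("RN", "Climat et énergie #nucléaire"),
]
climate_hashtags = top_hashtags_by_community(posts)
```

The per-community lists produced this way are the kind of data that can then be plotted as bar charts in a spreadsheet.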

Image 3. Twitter Legislatives 2024 Hashtags.

Image 4. TikTok Legislatives 2024 Hashtags.

We also noted the importance of emojis for identification on Twitter and music for identification on TikTok. After attempting to explore Instagram in similar ways, the limits of its search functions led us to focus upon Twitter and TikTok.

Instagram

Following these analyses, we created visualizations – a pixplot (4CAT) visualization and an image wall (4CAT) of content on Instagram related to the keywords “Legislatives 2024”, “Rassemblement national”, and “Nouveau front populaire”. We scraped the data using Zeeschuimer, imported it into 4CAT and filtered by the appropriate dates: 7 June-7 July 2024. We made one image wall of images connected to user avatars and one linked to post images. Party, union, and NGO logos are the most important elements visible in the image wall of author avatar images.

Image 4. Post Images: Legislatives 2024

In the images associated with the posts, the content is, unsurprisingly, quite varied. It prominently includes images of party leaders, graphs displaying poll and election results, news articles, photos and videos about the elections, and some political cartoons. The information, however, was too varied and too visual to analyze the political content in the time afforded. This suggests the importance of tools that can describe and analyze mixed image and text content.

Image 5. Author Avatars: Legislatives 2024

TikTok

For TikTok, we tried to query for “Legislatives2024” and then filter for “climate” to analyze patterns of identification via shared “original sound”. The results were somewhat disappointing (very few shared songs in our dataset), but we did notice one song shared by 3 RN (Community 1) supporters. We then went back to TikTok and searched for this song, noting that almost all of the returned posts were patriotic and the majority seemed to support the RN. The song in question, F.R.A.N.C.E, is by actress and pop singer Candice Parise. Since 2018, Parise has been the official singer of the Paris firefighters and sings at a variety of official events. The song was released and performed by Parise at the national parade for Bastille Day (14 July) in 2022 along with the national anthem, La Marseillaise. To give a sense of the song’s strong national pride, here are the original lyrics (Songwriter: Thierry Sforza, 2022).

France

Tes victoires flottent toujours plus haut

France

Ton histoire et ton nom claquent comme un drapeau

C'est la flamme du soldat

Qui ravive la Foi

Ta patrie est partout là où rayonne la

France

Ton flambeau se brandit par fierté

France

C'est Hugo et De Gaulle sur les Champs Élysées

C'est le sang du sillon

Qui redore ton blason

6 lettres qui scintillent sur tout l'horizon

France

C'est le « F » révolté

De la FRATERNITE

C'est le « R » espérance

De toutes les RESISTANCES

C'est le « A » constellé

Des plus fortes AMITIES

C'est le « N » infini

D'une NATION unie

C'est le « C » qui présage

De la force du COURAGE

C'est le « E » engagé

De notre EGALITE

Pour des siècles de légendes

Trois couleurs en offrande

Par le Bleu, par le Blanc, par le Rouge

Nos esprits et nos cœurs se retrouvent

Si belle et rebelle

Fidèles et Fou d'elle

C'est la tienne, c'est la mienne

Éternelle et on l'aime

France

C'est Marianne qui sait défier le pire

France

C'est une âme qui vient nous sauver jusqu'à périr

Dans le feu dans les flammes

Elle défend Notre Dame

Et se donne et se bat pour les peuples en sou

France

Tes médailles n'ont jamais de revers

France

Ta bataille c'est le sacrifice, tous solidaires

C'est servir en écho

Les étoiles d'un maillot

Et gagner quand le jour d'y croire est arrivé

France

C'est le « F » révolté

De la FRATERNITE

C'est le « R » espérance

De toutes les RESISTANCES

C'est le « A » constellé

Des plus fortes AMITIES

C'est le « N » infini

D'une NATION unie

C'est le « C » qui présage

De la force du COURAGE

C'est le « E » engagé

De notre EGALITE

Pour des siècles de légendes

Trois couleurs en offrande

Par le Bleu, par le Blanc, par le Rouge

Nos esprits et nos cœurs se retrouvent

Si belle et rebelle

Fidèles et Fou d'elle

C'est la tienne, c'est la mienne

Éternelle et on l'aime

France

Et Français, écoutez bien ce cri

France

Le mien c'est : Liberté et c'est là mon pays

Since then, the song F.R.A.N.C.E. seems to have been reclaimed as something of a newer anthem and was clearly repurposed in this way by RN supporters during the period under observation (7 June - 7 July 2024). We scraped the posts using this sound with Zeeschuimer and used 4CAT to filter for our time period. We then plotted the images accompanying each post that used “F.R.A.N.C.E” with PixPlot and made an image wall, which shows a sea of French flags along with some rather violent warnings about what might happen if France does not vote for the RN.

Image 6. TikTok Users: Original Sound F.R.A.N.C.E. (by Candice Parise)

Similarly, we noticed that emojis are used on Twitter for positive (for) and negative (anti) identification. We analyzed emoji use for each of the 5 main communities, both in the bios and in the body of the tweets. In the bios of Community 1 (Rassemblement National), the French flag is the most frequent emoji (n=131), followed by support for Israel (n=40) and identification with a Christian cross (n=29). The Russian flag follows (n=23), with the rainbow flag next in frequency (n=15). Other climate and nature icons and religious icons appear less often but support the other themes. For the bios, we noted the importance of flags, especially the French flag and especially in Community 1. Communities 2, 3 and 4 use peace symbols, the cross, the radioactive sign, the raised fist, the seedling, some animals such as the elephant and the turtle, and some sports such as climbing, cycling and swimming. In these communities, which are associated with the NFP, we also see more nature-related emojis such as sunflowers, green branches, trees, and planet earth, as well as rainbows, often associated with the LGBTQIA+ community, and hearts in many different colors. Community 4 uses the purple circle, which for some users signifies “the struggle of Jewish and Arab-Palestinian citizens of #Israel, standing together for peace, equality, and social justice; standing against the war, the Occupation and the escalation of hostilities”.


For the tweets, Communities 2 and 3 use the following emojis: ballot box, planet, raised fist, sunflower (supporting climate consciousness), seedling, euro banknote or flag (supporting the E.U.), rainbow flag (LGBTQIA+), handshake, and green heart. In the RN cluster, the red ball emoji, a warning sign, appears most frequently. In that community, we also see the use of the ‘vomit’ emoji and the red X to critique or identify against other groups or issues. Religious symbols also appear in this group (the cross, the church), as does some identification with Israel (the yellow ribbon, the star of David, the Israeli flag). The Russian-Ukrainian conflict plays out mostly in terms of flag icons, with support for Russia in this community appearing via the Russian flag. Finally, Communities 2, 3 and 4 (NFP) and 5 (Macroniste) use the Palestinian flag emoji, which appears to express support for Palestine.

In the table here we specifically sort emojis for the climate conversation:

Community label

top emojis

La France Insoumise

Mélenchon

Socialistes

Macron

Rassemblement National

Discussion

We would argue that platform affordances do affect the ways in which users express political identities online. But we would add that different political communities develop some specific, shared forms of expression. These are not so much unique modes of political expression as they are tendencies within groups. For example, the choice of platform may itself be considered a shared form of expression for certain political communities. As Image 3 and Image 4 indicate, Twitter seemed to be the privileged site for discussions around the NFP, whereas the RN was more prominent on TikTok.

Following the methods for analyzing TikTok’s “sound linking” affordances described by Geboers and Pilipets (2024), we noted that RN partisans use shared songs to express political identity, in this case the patriotic song F.R.A.N.C.E by Candice Parise. We found that partisans of the RN on TikTok were more easily identified via the sharing of this original song than by keywords.

Emojis continue to be a significant form of personal, social, and political identification online. But here too, communities and platforms seem to share specific tendencies in expressing partisanship. In our analyses of the 5 clusters, we noted that the Rassemblement National cluster tended to use emojis to express opposition (the red “warning” button) or disfavor (the vomit emoji) more than the NFP clusters, whose dominant emojis expressed support for ecology (the sunflower, the green heart, trees, branches) or Palestine (the Palestinian flag, the inverted red triangle, the purple circle). That said, some emojis lend themselves more to expressing support and others to opposition. Some emojis acquire platform-specific meaning (the purple circle was discussed and adopted on X.com, see https://x.com/omdimbeyachad/status/1394285539545780227?lang=en), and others take on more general meaning. The sharing of emojis remains a fundamental means of self and group identification, a cross-platform shared set of codes. In user bios in particular, flags are a prominent means of indicating support, with the RN cluster expressing support for Israel (the Israeli flag, the yellow ribbon) and Russia (the Russian flag). In addition, expressions of support and identification can at the same time work as expressions of antagonism or disidentification. For example, in the RN cluster, we see some accounts expressing identification with French food and culture via emojis of a baguette, wine, or ham-related icons (bacon, pigs). Here too, hashtags signal identification, as the popular French sandwich of ham and butter on baguette (#jambonbeurre) co-occurs with other strong French national identity terms (#françaisdesouche). These icons and hashtags express patriotic identification with France while also signaling antagonism towards halal codes, Islam and multiculturalism.


Finally, an analysis of user bios can occasionally be revealing in terms of which key terms are absent. We queried each cluster to explore which keywords appear in user bios as self-descriptions. We found no user bios in the RN cluster related to feminism specifically, though some accounts did include the rainbow-LGBTQ flag.

Sub-project 2: How the user biography affords expression of political vs. non-political identities

The X users of the French 2024 Election issue space express their identity through the affordance of user biographies in myriad ways. As such, we chose the following research question as our guide:

How does the X user biography afford expressions of collective identity?

To answer this question, we concentrated on the modes of biography use and how they map onto the political community clusters present in the legislatives04 dataset.

Our process began by filtering users into four distinct groups of biography use:

1. Expressing no explicit partisan, ideological, or otherwise political preference

  • Inductively constructed a list of political terms related to the French election issue space and filtered the data to exclude bios that contain such terms (see appendix A for the query terms).

2. Expressing party preference

  • Filtered using the above query list.

3. No biography

  • Filtered by “NA” (users with no biography are coded as “NA” in the dataset).

4. Ideological gray area

  • Derived from the remainder of users not sorted into the other three groups, where the biographies contain ideologically charged expressions of identity that are not specific to a French political party (e.g., feminism, climate activism, Palestinian genocide, etc.).

Originally we attempted to use several LLMs (e.g., ChatGPT, Llama) to automate the filtering process, but when comparing the output to the dataset this method proved inaccurate. As such, we manually constructed a list of political terms (n=216), mentioned above, beginning with terms most relevant to the election issue space and inductively crawling the data for other political expressions. This list was created to demarcate explicitly partisan user biographies from those that express more demure ideological identities and those that express none. While we recognize that all expressions of identity are capable of signifying politics, and moreover that identity is always already situated in the political, we chose to read these biographies with a pragmatic lens, taking users’ expressions of identity at their face value. This was done to avoid over-speculation and to make this space mappable; indeed, we are more concerned with how these users express their identities and less concerned with who these users realistically are.

Beginning with an initial list of political terms, we attempted to use 4CAT to filter the ‘legislatives04’ dataset by term but, for reasons unclear to us, it only filtered certain terms while ignoring others. We then moved to a manual filtering process where the dataset was filtered by each term, and further encountered terms were added to our list. While laborious, this iterative filtering process allowed us to construct an exhaustive list of political terms and ensured greater accuracy in filtering such a massive dataset. We also noted the use of political emojis in user bios, which we additionally filtered for (see Appendix A). These emojis correspond with French political parties, movements, and other ideological affiliations.

Through this filtering process, we successfully demarcated the four modes of biography use indicated above. This allowed us to perform several descriptive analyses of user biography practices.
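The four-way sorting described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the manual procedure the team followed: the term lists below are short hypothetical stand-ins for the actual query list in Appendix A, the split into "party" and "ideology" terms is an assumption about how groups 2 and 4 could be told apart, and the names `classify_bio` and `compile_terms` are invented for this example.

```python
import re

# Hypothetical stand-ins; the project's actual query list (n=216) is in Appendix A.
PARTY_TERMS = ["rassemblement national", "rn", "lfi", "france insoumise", "renaissance"]
IDEOLOGY_TERMS = ["féministe", "écologiste", "antifa"]


def compile_terms(terms):
    """Build one case-insensitive pattern that matches any term as a whole word."""
    alternatives = "|".join(re.escape(term) for term in terms)
    # The lookarounds stop short terms such as "rn" from matching inside "journaliste".
    return re.compile(rf"(?<!\w)(?:{alternatives})(?!\w)", re.IGNORECASE)


PARTY_RE = compile_terms(PARTY_TERMS)
IDEOLOGY_RE = compile_terms(IDEOLOGY_TERMS)


def classify_bio(bio):
    """Assign a biography to one of the four use-types."""
    if bio is None or bio.strip() in ("", "NA"):
        return "no_biography"          # group 3: coded as "NA" in the dataset
    if PARTY_RE.search(bio):
        return "party_preference"      # group 2
    if IDEOLOGY_RE.search(bio):
        return "gray_area"             # group 4
    return "non_political"             # group 1


for bio in ["NA", "Militant RN, fier de mon pays", "Féministe et écologiste",
            "Journaliste, papa de deux enfants"]:
    print(repr(bio), "->", classify_bio(bio))
```

Whole-word matching matters for short party acronyms; a plain substring search for "rn" would flag every bio containing "journaliste".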

Results

Using our filtered biography dataset, we performed an analysis concerning the following subquestion: What are the frequencies of the four biography practices across the French political communities?

Here, we compared our four biography-use clusters to the political community clusters identified in the ‘userswithcommunities’ dataset. This mapped user biography practices onto each political community to visualize each community’s use of the biography affordance.

Figure x. Distribution of biography use-type across political communities. Dataset: Legislatives 04; Method: Manual filtering by selected keywords (see appendix no. XY), relating results to retweet networks and political communities in the main dataset (Legislatives 04); From left to right: La France Insoumise, Mélenchon, Socialists, Macron, Rassemblement National.

Pictured above are the five political communities identified in the ‘userswithcommunities’ dataset, with the left-most party on the far left and the right-most party on the far right. The height of each bar represents the number of users sorted into each biography use-type cluster.
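The mapping of use-types onto communities amounts to a cross-tabulation. A minimal Python sketch follows, assuming each user record carries a `community` and a `use_type` field (both field names are hypothetical, as are the toy records). The within-community shares are an addition for illustration; the figure itself reports raw counts.

```python
from collections import Counter


def count_use_types(users):
    """Count users per (community, use_type) pair: the bar heights in the figure."""
    return Counter((user["community"], user["use_type"]) for user in users)


def within_community_shares(counts):
    """Convert raw counts into shares of each community's total.

    Shares make communities of different sizes comparable.
    """
    totals = Counter()
    for (community, _use_type), n in counts.items():
        totals[community] += n
    return {key: n / totals[key[0]] for key, n in counts.items()}


# Toy records, invented for illustration only.
users = [
    {"community": "RN", "use_type": "no_biography"},
    {"community": "RN", "use_type": "non_political"},
    {"community": "RN", "use_type": "no_biography"},
    {"community": "LFI", "use_type": "gray_area"},
]
counts = count_use_types(users)
shares = within_community_shares(counts)
for key in sorted(counts):
    print(key, counts[key], round(shares[key], 2))
```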

From the data, the majority of users in this issue space do not express their identity through the biography affordance (indicated in yellow above). This cluster in particular invites over-speculation, since our knowledge of these accounts is limited both by the data provided and by the lack of textual content in each account. Further analysis should consider whether these accounts belong to authentic individuals or might consist of bot activity. While significantly present in each political community, the non-political use-type is most prevalent in the far-right RN political community.

The non-political use-type, pictured in purple, consists mainly of popular media quotes, humor, and personal descriptions (e.g., occupation, religion, gender, familial role, hobby, pets). While these expressions inherently reflect an implicit personal political identity, we read these biographies with a pragmatic lens, taking the users’ self-expression at face value. Each political community has a significant number of the non-political use-type, indicating that biographies are commonly used to express a non-political identity regardless of political preference.

The gray area use-type, fittingly pictured in gray, indicates biographies used to express an ideological position but not a specific political affiliation. Specifically, users claim their identity through signifiers like ‘feminist’, ‘environmentalist’, and even ‘islamophobe’, as well as quotations from ideological figures such as Noam Chomsky. This use-type is present across all five political communities but consistently ranks third. It occurs most often in the Mélenchon political community, suggesting that some left-leaning users prefer to express an ideological identity over a non-political or strictly partisan one.

Lastly, users least often use the biography affordance to signify their party affiliation, pictured above in blue. Those who do so the most belong to the RN political community, but even there it occurs significantly less often than the other three use-types.

This analysis suggests that the personal biography affordance on X is used fairly consistently across French political communities to express identity. Most users do not use their biography or use it to express a non-political identity. While we cannot say that these use-type distributions speak for the X population at large, this finding serves as a reference point for an extrapolative investigation of the biography affordance outside of this issue space.

Sub-project 3: Exploring the Affordances of Geo-Information on X and TikTok: A Comparative Analysis of Urban and Rural Areas in the French 2024 legislative elections

Objective

The objective of this research is to explore and compare the affordances of geo-information on social media platforms X and TikTok in the context of the French 2024 legislative elections, with a particular focus on urban and rural areas. The study aims to:

  1. Investigate how geo-information is utilized on X and TikTok by users in both urban and rural settings during the legislative elections.

  2. Compare the nature and content of political discourse across urban and rural areas on these platforms to identify differences and similarities in political engagement and themes.

  3. Evaluate the affordances of X and TikTok in facilitating geo-information dissemination and political discussions, considering their unique features and user interactions.

Through this comparative analysis, the study seeks to contribute to a deeper understanding of the role of social media in modern electoral processes and its implications for democratic participation in different geographic contexts.

Method

This analysis is based on the preliminary work and methodology developed at SciencesPo for the SoMe4Dem project, and uses mixed methods to analyse platform affordances, focusing on the geo-information of X and TikTok.

Case study & results: The comparison of users’ bio descriptions by user location (urban and rural areas in France) on X

By comparing the bio descriptions of users located in urban and rural France, we can see that during the election period the categories of urban users who posted tweets differed from those of rural users. For example, users who introduced themselves as journalists only appeared among urban users, while users who introduced themselves as railway workers, chefs, etc. only appeared among rural users. Urban users mentioned policies, culture, bands, citizens, freedom, ecology, health, justice, power, and equality more often, while rural users mentioned retirement, children, education, future, teachers, youth, etc. more often.

Figure. Network of users’ bio description and places (Urban and Rural)

Figure. Word clouds of top-200 keywords of users’ bio description compared with urban and rural area

  2. The comparison of hashtags by user location (urban and rural areas in France) on X.

Comparing the hashtags of urban and rural users shows that urban users posted more hashtags supporting left-wing parties than right-wing parties, while rural users posted hashtags supporting the left and the right in roughly equal proportion. Specifically, urban users are more concerned with health, war issues such as Russia and Palestine, immigration, taxes, the middle class, retirement, justice, housing and climate, etc. Rural users are more focused on war issues such as Russia and Palestine, crime, justice, housing and retirement, etc. The comparison shows that urban users pay more attention to health, climate, taxes and the middle class than rural users.

Figure. Network of hashtags and places (Left is Urban and Right is Rural)

Figure. Word clouds of top-100 keywords of hashtags compared with urban and rural area

  3. The comparison of content mentioning particular cities and places in urban and rural areas of France on X (election events city image).

By filtering tweets that mention geographic information and studying the topics and keywords of tweets about different locations, we can generate an election event city image. Nine large cities in urban areas of France and six cities or regions in rural areas of France were selected as examples.
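The filtering step can be sketched as follows in Python. This is a minimal illustration, not the pipeline actually used: the place list is a subset of the places named in this section, the stopword list is a tiny hypothetical one, and the function name `keywords_by_place` is invented for the example.

```python
import re
from collections import Counter, defaultdict

# Subset of the places discussed in this section; extend as needed.
PLACES = ["Paris", "Lyon", "Nice", "Var"]
# Tiny hypothetical stopword list; a real analysis would use a full French list.
STOPWORDS = {"les", "des", "une", "pour", "dans", "sur", "avec", "que", "qui"}


def keywords_by_place(tweets, places=PLACES, top_n=5):
    """Return the most frequent keywords in tweets that mention each place."""
    place_patterns = {
        place: re.compile(rf"(?<!\w){re.escape(place)}(?!\w)", re.IGNORECASE)
        for place in places
    }
    counts = defaultdict(Counter)
    for text in tweets:
        # Lowercased word tokens, dropping stopwords and very short tokens.
        tokens = [
            tok for tok in re.findall(r"\w+", text.lower())
            if len(tok) > 2 and tok not in STOPWORDS
        ]
        for place, pattern in place_patterns.items():
            if pattern.search(text):
                # Do not count the place name itself as a keyword.
                counts[place].update(t for t in tokens if t != place.lower())
    return {place: c.most_common(top_n) for place, c in counts.items()}


# Toy tweets, invented for illustration only.
tweets = [
    "Manifestation à Paris pour la liberté",
    "Paris vote, la liberté avant tout",
    "Rien à signaler à Lyon",
]
result = keywords_by_place(tweets)
print(result)
```

The per-place counts are the kind of input a word cloud would be drawn from. Ambiguous names (for example "Nice" or "Centre") would need extra disambiguation that this sketch does not attempt.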

For the urban areas of France:

Paris: The word cloud emphasizes "liberté" (freedom), "compte" (account), and "pays" (country). Other significant words include "secrétaire" (secretary), "libertés" (freedoms), "bloque" (block), and "justice". There are references to political figures and groups like "lfi" (La France Insoumise) and "francesouvnie" (a variant of France Insoumise). Themes of justice and freedom are prominent.

Marseille: Major words include "pays" (country), "compte" (account), "indépendant" (independent), and "libre" (free). "liberté" (freedom) and "secrétaire" (secretary) are also significant. The presence of words like "rn" (Rassemblement National) and "nouveauflfrontpopulaire" (New Popular Front) suggests political diversity. Other frequent terms include "conseiller" (counselor) and "soutien" (support).

Lyon: The most prominent words are "liberté" (freedom), "compte" (account), "pays" (country), and "humain" (human). Other notable terms are "conseiller" (counselor), "bloque" (block), and "media". The word cloud focuses on freedom, human aspects, and media. Terms like "lfi" (La France Insoumise) and "conseiller" (counselor) are repeated, indicating political discourse.

Toulouse: Dominant words include "conseiller" (counselor), "humaniste" (humanist), "liberté" (freedom), and "pays" (country). Other significant words are "oligarchie" (oligarchy), "histoire" (history), and "droite" (right-wing). The word cloud emphasizes counseling, humanism, and freedom. There are mentions of political ideologies like "patriote" (patriot) and "frontpopulaire" (popular front).

Nantes: Prominent words are "monde" (world), "pays" (country), "liberté" (freedom), and "conseiller" (counselor). Other significant terms include "massé" (mass), "guerre" (war), and "lfi" (La France Insoumise). The word cloud highlights global and national themes, freedom, and counseling. There are mentions of political groups and terms like "insoumise" (defiant) and "pcf" (French Communist Party).

Lille: Key words include "être" (being), "insoumis" (defiant), "compte" (account), "liberté" (freedom), and "monde" (world). Other notable terms are "politique" (politics), "journaliste" (journalist), and "jamais" (never). The word cloud focuses on being, defiance, freedom, and political engagement. Terms like "insoumis" (defiant) and "politique" (politics) indicate a strong political discourse.

Montpellier: Major words include "adjoint" (deputy), "maire" (mayor), "culture", "radio", and "conseiller" (counselor). Other significant terms are "besoindeurope" (need for Europe), "monde" (world), and "municipal". The focus is on local governance with terms like "maire" (mayor) and "adjoint" (deputy). "culture" and "radio" suggest an emphasis on media and cultural topics. "besoindeurope" indicates discussions around European issues.

Nice: Prominent words are "nationalité" (nationality), "immigration", "identité" (identity), "remplacement" (replacement), and "religieuse" (religious). Other notable terms are "invasion", "morale" (moral), "action", and "mass". There is a strong focus on issues related to immigration, identity, and national concerns. The presence of words like "invasion" and "remplacement" suggests discussions around perceived threats and changes to national identity. "morale" and "action" highlight ethical and proactive elements of the discourse.

Rennes: The most prominent words include "actualité" (news), "regarde" (look/watch), "jo", "compte" (account), and "conseillère" (counselor). Other significant terms are "mobilités" (mobilities), "ville" (city), and "forces". The focus is on current events and media consumption with "actualité" and "regarde". "jo" might refer to discussions around sports or other significant events. "compte" and "conseillère" suggest political engagement and local governance.

Figure. Urban cities/places: (Paris, Marseille, Lyon, Toulouse, Nantes, Lille, Montpellier, Nice, Rennes)

For the rural area of France:

Bretagne: Prominent words include "liberté" (freedom), "monde" (world), "toute" (all), "compte" (account), and "conseiller" (counselor). Other significant terms are "jo", "suivez" (follow), "municipale" (municipal), and "réflexion" (reflection). The focus here is on freedom, local governance, and media engagement. Terms like "liberté" and "monde" suggest broad discussions on freedom and global issues. "compte" and "conseiller" highlight local political involvement.

Aquitaine: Major words include "pro", "candidate", "partagée" (shared), "macron", and "emmerde". Other significant terms are "wef", "faire" (to do/make), "liberté", and "monde". The word cloud reflects a polarized political atmosphere with terms like "pro" and "emmerde" (a strong term implying annoyance or trouble). Discussions around "macron" and "candidate" indicate a focus on specific political figures. "liberté" and "monde" again highlight freedom and global concerns.

Midi-Pyrénées: Prominent words are "liberté", "monde", "faire", "france", and "politique" (politics). Other significant terms include "actualités" (news), "2024", "conseillère", and "front populaire". The focus is on freedom, politics, and current events. "liberté" and "monde" are central themes, while "faire" suggests action or involvement. "2024" and "conseillère" indicate discussions around upcoming elections and local political figures.

Centre: Key words include "compte", "conseiller", "aime" (love/like), "faire", and "tous" (all). Other notable terms are "exp" (likely experience), "fan", "justice", and "rétrouvez" (find/meet again). The word cloud emphasizes personal engagement and local governance with "compte" and "conseiller". "aime" and "faire" suggest a focus on positive action and involvement. "justice" indicates discussions around fairness and law.

Alsace: Major words are "monde", "vie" (life), "tout" (all), "plus" (more), and "fait" (done/made). Other significant terms include "anti", "culture", "pays" (country), and "patriote". The focus is on broader life issues and actions, with "monde", "vie", and "tout" being prominent. "anti" indicates opposition or resistance themes, while "culture" and "pays" highlight discussions around heritage and national identity.

Var: Key words include "politique", "être" (to be), "humain" (human), "fier" (proud), and "rn" (likely referring to Rassemblement National). Other notable terms are "liberté", "france", "compte", and "extrêmes" (extremes). The word cloud emphasizes politics, identity, and human concerns. "politique" and "être" suggest a focus on political involvement and personal identity. "humain" and "fier" highlight themes of human dignity and pride. "rn" indicates discussions around the Rassemblement National party.

Figure. Rural cities/places: (Bretagne, Aquitaine, Midi-Pyrénées, Centre, Alsace, Var)

  4. The comparison of content mentioning urban and rural areas in France on X and TikTok.

Since TikTok does not provide data on users' locations, we cannot compare the geographic distribution of users on X and TikTok. We therefore chose another strategy to compare the platforms' affordances. By searching posts on both platforms with the same search terms, we filtered and extracted posts that mention the names of cities and rural areas in France, in order to study each platform's affordances for geo-information at the content level. The search returned only 236 TikTok posts containing geographical names, fewer than on X. Because TikTok is video-based, people rarely post location-related information on the platform. However, subsequent research could obtain geographic information by analyzing the locations where these videos were shot.

Comparing the keywords of X and TikTok posts in urban areas suggests that TikTok encourages imitation and engagement with the algorithm. TikTok's algorithmic culture is less about content than about stimulating viral transmission, which leads to a high concentration of keywords such as paris, manifestation, immigration and fyp (For You page), while the keyword distribution on X is more even, covering a broader range of keywords such as policy, journalists, and elections.

In rural areas, the TikTok sample is much smaller than in cities, and its keywords are distributed more evenly. The X posts focus on policies, freedom, participation, legislation, etc., while the TikTok posts focus on voting, elections, immigration, neighborhoods, the north, etc.

Figure. Word clouds of keywords in urban area on X and TikTok

Figure. Word clouds of keywords in rural area on X and TikTok

Sub-project 4: How do Journalists present themselves on TikTok, Instagram, and X (Twitter) in reference to social media affordances: case study approach

Objectives

Building on the X dataset from the SoMe4Dem project Affordances for Expressing Collective Identities: The Case of the 2024 French Parliamentary Elections, we observed that journalists constitute one of the most active groups within this dataset, with over 2,000 journalist accounts identified. Given the crucial role of journalists in disseminating information and their impact on democratic society, their self-presentation on social media offers insights into their identity, warranting further investigation. Our project specifically focuses on examining how journalists digitally present themselves across various platforms, with particular attention to the affordances offered by different social media platforms.

The concept of affordances, originally defined by Gibson as what an environment "offers" or "provides" to actors (Gibson, 1979, p. 127), is central to our analysis. In this research, we adopt a relational definition of social media affordances, which considers "how," "for whom," and "under what circumstances" technologies afford specific usages (Davis, 2020). Through the lens of technical affordances, we aim to explore how various social media platforms shape how journalists present themselves online, with a specific case study of the freelance journalist Charles Baudry. Charles Baudry is one of the most active journalists in the X dataset and is also present on TikTok and Instagram, where not all journalists from the dataset have accounts.

Methodology

Identifying journalist Charles Baudry as a case study proved to be a lengthy process, providing valuable insights into how journalists present their online personas across different platforms. We began by analyzing the original Twitter dataset to identify the top 10 most active journalists discussing the French election.

Next, we used the tool Zeeschuimer to collect data from TikTok and Instagram by searching hashtags related to the French election. Zeeschuimer’s limitations, particularly its inability to retrieve bio information, prevented us from identifying which accounts on the ‘#legislatives2024’ hashtag page were associated with journalists.

To address this, we manually searched for the names of the top 20 most active journalists from the Twitter dataset on TikTok and Instagram. Despite extensive efforts, we found only three journalists, including Clément Lanot and Pierre Jovanovic, who had not posted on TikTok during the period from 1 to 10 July 2024. Given these challenges, we shifted our focus from analyzing journalists' group behavior to examining the online presentation of a single journalist, Charles Baudry, across the three platforms. We collected a total of 10 posts each from X, TikTok, and Instagram. We conducted qualitative content analysis and visual analysis of his posts and profile pages (screenshotted on 10 August), gaining insights into his online presentation.

Findings and discussion

The data-collecting process above also highlights that journalists primarily use Twitter as their main social media platform for information dissemination, in contrast to TikTok and Instagram. This difference underscores the varying affordances and cultures of different social media platforms. Twitter, known for its text-based content, real-time updates, and professional networking, contrasts with TikTok and Instagram, which are more visually oriented and focused on entertainment and personal content.

Profiles serve as a primary stage for journalists to present their professional identities. We observed that Charles Baudry's profile images on Twitter and TikTok convey a more professional identity, often featuring cameras that indicate a connection to the media industry. In contrast, his Instagram profile showcases more casual poses, with the individual looking down against a simple white or blue background. This aligns with Instagram's typical use as a platform for personal, lifestyle, and entertainment content rather than professional work. Additionally, the Twitter profile header provides ample space for Charles Baudry to highlight his professional background, prominently displaying cameras and microphones at the center and emphasizing his media roles.

We highlight how journalists digitally present themselves on various platforms and pinpoint the affordances of news production and news consumption. The basis of our research is to demonstrate how journalistic content covering a political event like the 2024 legislative elections in France contributes to the polarization of public discourse on the recent election results. The spread of an explicit political position by a journalist contributes to polarization among digital communities, which has real-world implications. Even when journalists have a social media account, they tend to control the information they provide, and therefore the extent and type of online persona they want to present. Curating an online persona through the strategic management of online relationships can be referred to as self-branding (Gandini, 2016), which allows users to achieve influence and visibility that subsequently transfers into the offline world (Page, 2012). In our dataset, we found variation in self-branding among journalist users and observed how information about the elections was transmitted.

We found varied self-presentation and self-branding of Charles Baudry’s journalistic persona across X, TikTok, and Instagram. His journalistic presentation is predominant on X and TikTok, in comparison to Instagram, where he presents a more personable display. To highlight this, we compiled image walls with the 4CAT tool of the top posts from each platform to observe similarities and differences among the content posted on his social media pages.

Each image wall shows how a journalist like Charles uses social media platforms to report on current events such as political elections and to keep his followers up to date. The image walls are presented below.

Source: 4CAT: Charles Baudry (@CharlesBaudry, Instagram)

The image wall above shows how Instagram is still used as a ‘personable’ platform: Charles states his profession in his bio, but the content is mostly personal photos of his life, and only recently has content related to his profession as a journalist, and to the legislative elections in France, appeared.

Source: 4CAT: Charles Baudry (@Charlesbaudryoff, TikTok)

The content of this image wall reflects a surprising, growing trend of TikTok being used as a news source among social media users. With casual features such as adding music, emoticons, and viral dances, TikTok has become a platform where information can be easily created and shared among its users, in a sense creating a different form of public discourse, particularly around politics.

Source: 4CAT: (@CharlesBaudry, X)

The image wall built from the original X dataset reinforces how X (Twitter) has been a predominant platform where journalists post and share information with their audiences in real time. As shown above, Charles posts the majority of his news on his X page, keeping organized documentation of the legislative elections and any updates about the elections and related protests. The ability to create threads in real time on X proves effective for news production when sharing breaking news with news consumers, which makes Baudry’s profile on X appear more professional than personable, unlike his Instagram page.

Another difference we observed on profile pages is that different platforms emphasize different metrics, which may influence journalists' visibility and branding strategies. Twitter and Instagram prioritize metrics such as following, followers, and posts, while TikTok highlights the number of likes a user receives (Charles Baudry has received 977.8K likes). This emphasis on likes suggests a higher emotional affordance on TikTok, potentially shaping how journalists engage with their audience by encouraging content that receives immediate positive feedback. Journalists can leverage these metrics to showcase their popularity and social capital, which is crucial for personal branding. In addition, the widespread practice of linking accounts across multiple social media platforms enhances Charles Baudry's reach and allows him to maintain a cohesive presence across networks, underscoring the promotional functions of these platforms.

In terms of posts on different social media platforms, Charles Baudry utilizes hashtags on TikTok to leverage the platform’s algorithm for improved visibility, a strategy less evident on Twitter and Instagram. For example, on July 8th, he posted several videos about damage during the rally at Place de la République on both TikTok and Twitter. However, he used more hashtags on TikTok compared to Twitter. In addition to hashtags related to the French election, he included hashtags such as #fyp, #pourtoi, #fypシ゚, and #sinformersurTikTok.

The hashtags #fyp, #pourtoi, and #fypシ゚ refer to TikTok's 'For You' page, which curates content based on user preferences and algorithmic ranking. By using these hashtags, content creators aim to optimize their posts' visibility and reach. The hashtag #sinformersurtiktok, meaning 'getting informed on TikTok,' indicates his attempt to position himself as a news provider specifically on TikTok. This practice highlights how journalists can leverage technical affordances to effectively brand their content and establish their identity as information deliverers.

Sub-project 5: A Structural Topic Model of users’ bios enriched with Louvain communities

Objective

Analyze the differences in topic-specific feature probabilities in users’ self-descriptions between two distinct communities (identified in Section 3):

  • Cluster 1: Right | “Rassemblement National”

  • Cluster 3: Left | “Socialistes”

for specific topics and issues of interest.

This should reveal how each group's language use and focus in their biographies diverge, possibly reflecting political identities and views.

By examining these variations through a Structural Topic Model, we wish to gain insights into the linguistic markers and thematic emphases that characterize, at the aggregate level, the identities of the two selected political groups. Although the case study focuses on differences between two specific clusters, the model allows comparison of all the communities of users identified in Section 3.

Method

This analysis is based on the preliminary work and methodology developed at SciencesPo for the SoMe4Dem project, by Carlo Santagiustina in collaboration with Pedro Ramaciotti, first described here:

https://drive.google.com/file/d/1p3pzcsdR_4DgBT9alkPXXtVth6KY-Vt-/view?usp=sharing

The methodology builds on the work by Santagiustina & Warglien (2022) and consists in estimating a seeded Structural Topic Model (Roberts et al., 2013) of user biographies, using both words and their associations (i.e., dependency relations) as features. The aim is to extract identity cues contained in users’ bios and cluster them into topics, and to map similarities and differences across specific dimensions of interest, such as users’ countries or communities.

Unlike the original work (see presentation link above), we here use Louvain communities based on the retweet network (see Section 3) as covariates in the STM, affecting both the contents and propensities of the different topics.

Ingredients

Data

  • Users’ biographies: to infer users’ identities we employ the user_descrition column of the 2024 Legislative elections dataset (see description of the dataset in Section 2), filtering out all users whose bio has fewer than three tokens in the final feature dictionary.

  • French lexicon of seed words related to the online communication of identities through bios (translation in French and adaptation from earlier versions developed at SciencesPo for the SoMe4Dem project).

The multilingual lexicon (work in progress) is available at this link:

https://docs.google.com/spreadsheets/d/1a1WvVAb1PGLkpQYndUiLsVzFpYG2dIlnjeiOnXMyCBI/edit?usp=sharing

Tools

Pipeline

Figure 4: Visual representation of the pipeline

Step 0: Lexicon preparation and translation

Objective: Develop a French ontology (lexicon of words and phrases by topic and category) to be used to seed the STM

  1. Define the categories

  2. Identify key words and phrases (for example, for the age category, words like "age", "years old", "birthday", "born", etc.).

  3. Create Regular Expressions for each pattern (for example, for "age": ^age[d]?$; for "born": ^born$|^born_in$|^born_the$|^born->in$|^born->the$).

  4. Add translations of these words and phrases for working with French data.

  5. Document the Lexicon in a Google sheet table containing the following columns:

    • word/phrase: The original word or phrase in English.

    • regex_seed: The RegEx pattern for the original word or phrase in English.

    • word/phrase_fr: Word or phrase translated into French.

    • regex_seed_fr: RegEx pattern for the translated word or phrase.

    • topic_label: The specific topic label.

    • topic_category: The broader category the topic belongs to.
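
As a minimal sketch of how such seed patterns could be applied, the following Python snippet matches the two regular expressions quoted above against a list of candidate features. The row layout mirrors the lexicon columns, but the topic label "Age", the function name and the toy feature list are assumptions made for illustration; the project itself works in R.

```python
import re

# Hypothetical excerpt of the lexicon: (word/phrase, regex_seed, topic_label).
# The two regexes are the ones quoted in step 3; the label "Age" is assumed.
LEXICON_ROWS = [
    ("age", r"^age[d]?$", "Age"),
    ("born", r"^born$|^born_in$|^born_the$|^born->in$|^born->the$", "Age"),
]

def seed_features(features, rows=LEXICON_ROWS):
    """Return {topic_label: [features matched by any seed regex of that topic]}."""
    seeds = {}
    for _word, regex_seed, topic_label in rows:
        pattern = re.compile(regex_seed)
        hits = [f for f in features if pattern.search(f)]
        seeds.setdefault(topic_label, []).extend(hits)
    return seeds

# Toy feature list: plain words, an n-gram ("born_in") and a dependency
# relation ("born->the"), plus two near-misses that must not match.
features = ["age", "aged", "agenda", "born", "born_in", "born->the", "reborn"]
seeds = seed_features(features)
print(seeds)
```

Because every alternative is anchored with ^ and $, near-misses such as "agenda" or "reborn" are not picked up as seeds.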

Step 1: Raw data preprocessing

Objective: Clean the text data for NLP and analysis.

  1. Text Cleaning: Use Regular Expressions (RegEx) to preprocess the text.

    • Remove URLs, emojis, and special or otherwise unusual characters.
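
The cleaning step can be sketched in Python as follows. The exact patterns used in the project are not given here, so the three regular expressions below are assumptions: in particular, "special and strange characters" is interpreted as anything that is not a letter, digit, underscore, whitespace, apostrophe or hyphen.

```python
import re

# Assumed patterns; the project's actual RegEx rules are not documented here.
URL_RE = re.compile(r"https?://\S+|www\.\S+")
# Everything except word characters, whitespace, apostrophes and hyphens.
# This also removes emojis, flag symbols and variation selectors.
NON_TEXT_RE = re.compile(r"[^\w\s'’-]")
SPACES_RE = re.compile(r"\s+")

def clean_bio(text):
    """Strip URLs, emojis and special characters, then normalise whitespace."""
    text = URL_RE.sub(" ", text)
    text = NON_TEXT_RE.sub(" ", text)
    return SPACES_RE.sub(" ", text).strip()

# Made-up bio used only to exercise the function.
cleaned = clean_bio("Fière d'être française 🇫🇷 | maman ❤️ https://example.org/moi")
print(cleaned)
```

Note that accented letters are kept, which matters for French text. A stricter or looser definition of "special characters" would only require changing NON_TEXT_RE.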

Step 2: NLP

Objective: Tokenize text and extract features using NLP algorithms.

  1. NLP Pipeline: Use spaCy's fr_core_news_md model from R to process the text:

    • Tokenize sentences and words.

    • Extract Part-Of-Speech (POS) tags.

    • Extract and consolidate dependency relations (Dep.Rel.).

    • Extract and consolidate named entities (NE).
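
To illustrate what consolidating dependency relations into features might look like, here is a small Python sketch. It does not call spaCy: it takes an already parsed bio as a list of (token, head index) pairs and emits one feature per word plus one "head->child" feature per dependency arc. The "head->child" notation is an assumption inferred from seed patterns such as born->in, and the toy parse is hand-written rather than actual parser output.

```python
def bio_features(tokens):
    """Turn a parsed bio into word features plus 'head->child' features.

    tokens: list of (text, head_index) pairs, where head_index is the
    position of the token's syntactic head. The root points to itself,
    which is the convention spaCy uses.
    """
    words = [text.lower() for text, _head in tokens]
    dep_rels = [
        f"{words[head]}->{words[i]}"
        for i, (_text, head) in enumerate(tokens)
        if head != i  # skip the root, which has no incoming arc
    ]
    return words + dep_rels

# Hand-written toy parse of "born in Lyon" (hypothetical, not spaCy output):
# "born" is the root, "in" depends on "born", "Lyon" depends on "in".
toy_parse = [("born", 0), ("in", 0), ("Lyon", 1)]
features = bio_features(toy_parse)
print(features)
```

Features of this kind can then be matched against the seed lexicon in the same way as plain words.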

Step 3: DFM construction and metadata injection

  1. Convert spaCy output into a DFM: Convert the tokenized text into a sparse document-feature matrix (DFM) containing feature counts:

    • Each row represents a user description and each column represents a feature (i.e., a word, a named entity or a dependency relation).

  2. Add users’ metadata to the rows of the DFM using the docvars() function in quanteda.
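
The structure produced by this step can be sketched in plain Python. This is only an illustration of the data layout (sparse rows of feature counts, with metadata aligned to the rows); it is not quanteda, and the user ids, features and community values are hypothetical.

```python
from collections import Counter

def build_dfm(docs):
    """docs: {doc_id: [feature, ...]} -> (row_ids, vocabulary, sparse_rows).

    Each sparse row maps a column index to the count of that feature,
    so absent features take no storage.
    """
    vocab = sorted({f for feats in docs.values() for f in feats})
    col = {f: j for j, f in enumerate(vocab)}
    row_ids = list(docs)
    rows = [{col[f]: n for f, n in Counter(docs[d]).items()} for d in row_ids]
    return row_ids, vocab, rows

# Hypothetical features for two user bios, and their metadata.
docs = {"u1": ["born", "in", "born->in"], "u2": ["born", "born"]}
docvars = {"u1": {"community": 1}, "u2": {"community": 3}}

row_ids, vocab, rows = build_dfm(docs)
# Keep metadata in the same order as the DFM rows, one record per document.
metadata = [docvars[d] for d in row_ids]
print(vocab, rows, metadata)
```

Keeping the metadata aligned with the rows is what later allows the community to enter the topic model as a document-level covariate.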

Step 4: Estimating the Seeded Structural Topic Model

Objective: Estimate a Structural Topic Model (STM) to analyze identity topics as a function of user communities.

  1. Set up and estimate the STM: Use the stm package in R to estimate the seeded structural topic model, using the constructed lexicon to seed topics and the Louvain communities as a covariate affecting both topic contents and topic proportions.
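
For readers unfamiliar with the model, the standard STM generative process (as formulated by Roberts et al.) can be written as below. The notation is assumed here for illustration: d indexes bios, k topics, n token positions and v features; X_d holds the covariates affecting topic proportions and y_d is the content covariate, which in this pipeline would be the user's Louvain community. The seeding of topics with the lexicon is not shown, since its exact implementation is not described in this report.

```latex
% Standard STM generative process; notation assumed for illustration.
\begin{align*}
\theta_d \mid X_d, \gamma, \Sigma
  &\sim \mathrm{LogisticNormal}(X_d \gamma,\ \Sigma) \\
\beta_{d,k,v}
  &\propto \exp\!\left( m_v + \kappa^{(t)}_{k,v}
     + \kappa^{(c)}_{y_d,v} + \kappa^{(i)}_{y_d,k,v} \right) \\
z_{d,n} \mid \theta_d
  &\sim \mathrm{Multinomial}(\theta_d) \\
w_{d,n} \mid z_{d,n}, \beta_d
  &\sim \mathrm{Multinomial}\!\left(\beta_{d,\,z_{d,n}}\right)
\end{align*}
```

Here \theta_d are the topic proportions of bio d, m_v is the baseline log-frequency of feature v, and the \kappa terms are deviations from that baseline due to the topic, the community, and their interaction. The first line is where the community affects topic proportions; the second is where it affects topic contents.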

Step 4+: Analyzing results

Objective: Interpret the results from the STM.

  1. Topic Distributions: Examine the topic distributions across documents.

    • Identify prevalent topics in bios, aggregated by community or any other variable of interest.

  2. Correlation Matrix: Analyze correlations between topics.

    • Identify topics that frequently co-occur or that inhibit each other.

  3. Covariate Effects: Study the effects of metadata (e.g., community) on topic proportions and content.

    • Understand how different communities influence topic prevalence.
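
The first two analyses can be sketched in Python, assuming they are computed from the estimated document-topic proportions (one row per bio, one column per topic). The proportions and community labels below are made-up numbers, not model estimates, and the function names are hypothetical.

```python
from math import sqrt

def mean_by_group(theta, groups):
    """Average the rows of theta (topic proportions) within each group."""
    sums, counts = {}, {}
    for row, g in zip(theta, groups):
        acc = sums.setdefault(g, [0.0] * len(row))
        for k, value in enumerate(row):
            acc[k] += value
        counts[g] = counts.get(g, 0) + 1
    return {g: [s / counts[g] for s in acc] for g, acc in sums.items()}

def pearson(x, y):
    """Pearson correlation of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def topic_correlations(theta):
    """Topic-by-topic correlation matrix across documents."""
    cols = list(zip(*theta))
    return [[pearson(a, b) for b in cols] for a in cols]

# Made-up proportions for four bios and three topics (rows sum to one).
theta = [[0.7, 0.2, 0.1], [0.6, 0.2, 0.2], [0.2, 0.6, 0.2], [0.1, 0.6, 0.3]]
groups = [1, 1, 3, 3]  # hypothetical community labels

means = mean_by_group(theta, groups)
corr = topic_correlations(theta)
print(means, corr[0][1])
```

Because the proportions of each bio sum to one, a higher share of one topic necessarily lowers the share of others, which tends to push correlations of this kind towards negative values.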

Step 4++: Visualization

Objective: Visualize the STM outcomes for better understanding of the results.

  1. Topic Distributions Visualization: Use bar plots or heatmaps to display topic distributions for the different communities.

  2. Correlation Matrix Visualization: Create a network visualization of the topic correlation matrix.

  3. Covariate Effects Visualization: Plot difference word clouds to show the influence of covariates on topic contents (distribution of features in topics).
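
The quantities behind a difference word cloud can be sketched as follows. For each feature of a topic, the horizontal position is taken to be the difference in feature probability between the two communities, and the size is taken to be the average of the two probabilities. The averaging is an assumption about what "overall feature probability" means, and the feature names and probabilities are made up.

```python
def difference_layout(p_left, p_right):
    """Return {feature: (x, size)} for a two-group difference word cloud.

    x < 0: more probable in the left-hand group; x > 0: more probable in
    the right-hand group; x == 0: equally probable in both.
    size: mean of the two probabilities (assumed definition).
    """
    layout = {}
    for f in sorted(set(p_left) | set(p_right)):
        a, b = p_left.get(f, 0.0), p_right.get(f, 0.0)
        layout[f] = (b - a, (a + b) / 2)
    return layout

# Made-up within-topic feature probabilities for two communities.
p_left = {"feat_a": 0.06, "feat_b": 0.02, "feat_c": 0.03}
p_right = {"feat_a": 0.01, "feat_b": 0.05, "feat_c": 0.03}

layout = difference_layout(p_left, p_right)
for feature, (x, size) in layout.items():
    print(feature, round(x, 3), round(size, 3))
```

These coordinates can then be passed to any plotting library, with the vertical dashed line of the figure drawn at x = 0.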

Case study & results: the French 2024 legislative elections

We have applied the aforementioned methodology to the dataset presented in Section 2.

A summary of the main results and findings follows.

Correlations between topics

Figure 5: Topic correlation network. Edges in blue represent positively correlated topics and (dashed) edges in red represent negatively correlated topics. Edge width is proportional to the absolute value of the correlation between two topics.

The network above illustrates the correlation between topics, with negative correlations shown in red dashed lines and positive correlations in blue continuous lines. Edge widths are proportional to the absolute value of the correlation between two nodes. Nodes represent topics and their size is proportional to the average topic propensity across all communities. The network reveals that most correlations between topics are negative.

Among the most negatively correlated pairs are Politics and political Events with Crafts, and Hobbies with Education: General. This suggests that individuals who focus on political events or education in their bio are less likely to mention crafts and hobbies there, and points to a clear difference between groups of people who are politically and educationally engaged and those who self-identify with crafting and hobby activities.

The residual topic, which is there precisely to capture content not covered by the seeded topics, is positively correlated with many topics, such as Outdoors & Wildlife and Photography. This suggests that there may be other, neglected identity dimensions related to these seeded topics that our lexicon does not cover, such as outdoor sports like climbing or sailing.

Among the few positively correlated pairs, we find Ethnicity and Location: Country, as well as Ethnicity and Citizenship: migrant. The first suggests that users who self-identify with their country in their bio are more likely to mention their ethnicity (see the Ethnicity topic word cloud below), reflecting a strong sense of national and ethnic identity. The second suggests that those who state their ethnicity are more likely to mention their status as migrants, indicating a connection between ethnic self-identification and the experience of migration or of holding citizenship in a different country. Discussions of ethnicity in bios might therefore include geographic elements and migratory context.

Politics and political Events is positively correlated with Location: Urban, showing that people who discuss politics and related political events in their bio are also more likely to state that they live in urban areas or to discuss related issues, and vice versa.

Location: Urban and Employment Status: Entrepreneur are also positively correlated. This indicates that individuals who identify as residents of urban areas tend to refer to an entrepreneurial identity more than others do, and vice versa.

Employment status: Self-Employed is positively correlated with Media & Journalism, suggesting that a relevant subset of the users discussing the French legislative elections may be both journalists and freelancers. Media & Journalism is also positively correlated with Identity: Political, showing that users who mention Media & Journalism in their bio are more likely to also state their political views in their profiles.

Several topics related to Family & relationships, such as Family & relationships: married and Family & relationships: parents and children, are positively correlated, showing, as expected, that individuals who mention being married often also mention their roles as parents in their profiles. This indicates a strong link between different aspects of familial relationships in the identities communicated online.

Topic-specific differences between the two communities (Left vs. Right)

Figure 6a: Topic-specific differences in feature distributions between the Left (Cluster 3: "Socialistes") and the Right (Cluster 1: "Rassemblement National") for the seeded topic "Politics and Political Events". Feature size is proportional to the overall feature probability. Feature color and x-axis position depend on the difference in feature probability between the two communities. On the left are features that are more probable for users in the Left community, on the right are features that are more probable for users in the Right community. At the center (vertical dashed line) are features that are equiprobable for both communities.

The word cloud for the topic Politics and Political Events reveals distinct thematic focuses for the left and right clusters. The left emphasizes themes of unity and progressive movements with terms like «nouveaufrontpopulaire» and «nouveau front populaire» ("new popular front"), «écologiste» ("environmentalist"), «populaire» ("popular"), and «fédérale» ("federal"). Conversely, the right cluster is heavily centered on the extreme right party (RN) and its leader, Jordan Bardella, as evidenced by words like «rn», «rassemblement national» ("national rally"), «rnational_off» (referring to the official profile of the RN) and «j_bardella» (referring to the official profile of Bardella). Both camps, however, use generic terms related to the 2024 and 2022 legislative campaigns with similar frequencies.

Figure 6b: Topic-specific differences in feature distributions between the Left (Cluster 3: "Socialistes") and the Right (Cluster 1: "Rassemblement National") for the seeded topic "Environmentalism". Feature size is proportional to the overall feature probability. Feature color and x-axis position depend on the difference in feature probability between the two communities. On the left are features that are more probable for users in the Left community, on the right are features that are more probable for users in the Right community. At the center (vertical dashed line) are features that are equiprobable for both communities.

The word cloud for the topic Environmentalism also shows some relevant identity differences between the two groups. The right centers more on natural beauty and emotional connection with nature, with words like «terre» ("earth"), «nature», «amoureux» ("lover" - masculine), «amoureuse» ("lover" - feminine), and «beauté» ("beauty"). In contrast, the left focuses more on urgent environmental issues and advocacy, with terms like «climat» ("climate"), «environnement» ("environment"), «biodiversité» ("biodiversity"), «protection», «planète» ("planet"), «eau» ("water"), «respect», and «humain» ("human").

Figure 6c: Topic-specific differences in feature distributions between the Left (Cluster 3: "Socialistes") and the Right (Cluster 1: "Rassemblement National") for the seeded topic "Ethnicity". Feature size is proportional to the overall feature probability. Feature color and x-axis position depend on the difference in feature probability between the two communities. On the left are features that are more probable for users in the Left community, on the right are features that are more probable for users in the Right community. At the center (vertical dashed line) are features that are equiprobable for both communities.

For the topic Ethnicity, the left focuses more on diverse backgrounds and multicultural aspects, with terms like «européen» ("European"), «de France» ("of France"), «adoption», «sang» ("blood"), «origine» ("origin"), «hors» ("outside"), and «français étranger» ("foreign French"). The right, by contrast, centers more on national pride and identity, with words like «français» ("French"), «être français» ("being French"), «patrie» ("homeland"), «amour» ("love"), «fier» ("proud"), and «honneur» ("honor").

Figure 6d: Topic-specific differences in feature distributions between the Left (Cluster 3: "Socialistes") and the Right (Cluster 1: "Rassemblement National") for the seeded topic "Job & career". Feature size is proportional to the overall feature probability. Feature color and x-axis position depend on the difference in feature probability between the two communities. On the left are features that are more probable for users in the Left community, on the right are features that are more probable for users in the Right community. At the center (vertical dashed line) are features that are equiprobable for both communities.

The topic Job & Career reveals a clear difference between the two groups. The left focuses more on the perspective of employees, with terms like «emploi» ("employment"), «rh» ("human resources"), «droit-social» ("labor law"), «dialogue» ("dialogue"), and «mixité» ("diversity"), indicating a strong focus on inclusion and labor rights. In contrast, the right centers more on perspectives related to talent, with words like «merite» ("merit"), «talent» ("talent"), «travaille» ("work"), «carrière» ("career"), «produit» ("product"), and «services» ("services"), suggesting a focus on the dynamics of hiring, skills, and company viewpoints. The left cluster is more employee-centric, while the right cluster is more firm-centric.

Figure 6e: Topic-specific differences in feature distributions between the Left (Cluster 3: "Socialistes") and the Right (Cluster 1: "Rassemblement National") for the seeded topic "Economy & Business". Feature size is proportional to the overall feature probability. Feature color and x-axis position depend on the difference in feature probability between the two communities. On the left are features that are more probable for users in the Left community, on the right are features that are more probable for users in the Right community. At the center (vertical dashed line) are features that are equiprobable for both communities.

The word cloud for the topic Economy & Business highlights clear identitarian differences between the two camps in relation to business and economic affairs. The right focuses more on conservative and market-oriented perspectives, with terms like «conservateur» ("conservative"), «libéral» ("liberal"), «passionné» ("passionate"), and «bourse» ("stock market"). In contrast, the left centers more on generic economic issues, energy markets and sustainable practices, with words like «économie» ("economy"), «fait» ("fact") and «énergie» ("energy").

Figure 6f: Topic-specific differences in feature distributions between the Left (Cluster 3: "Socialistes") and the Right (Cluster 1: "Rassemblement National") for the seeded topic "Science & Research". Feature size is proportional to the overall feature probability. Feature color and x-axis position depend on the difference in feature probability between the two communities. On the left are features that are more probable for users in the Left community, on the right are features that are more probable for users in the Right community. At the center (vertical dashed line) are features that are equiprobable for both communities.

The word cloud for the topic Science & Research shows some clear epistemic differences between the two groups with respect to their beliefs about the nature of science and research. The left focuses more on academic and teaching aspects, with terms like «sciences» ("sciences"), «recherche» ("research"), «enseignement» ("teaching"), «humaines» ("humanities"), and «sociales» ("social sciences"). In contrast, the right centers more on existential and epistemic perspectives, with words like «vérité» ("truth"), «âme» ("soul"), «mensonge» ("lie"), «ailleurs» ("elsewhere"), and «consciences» ("consciousness").

Figure 6g: Topic-specific differences in feature distributions between the Left (Cluster 3: "Socialistes") and the Right (Cluster 1: "Rassemblement National") for the seeded topic "Gender: woman". Feature size is proportional to the overall feature probability. Feature color and x-axis position depend on the difference in feature probability between the two communities. On the left are features that are more probable for users in the Left community, on the right are features that are more probable for users in the Right community. At the center (vertical dashed line) are features that are equiprobable for both communities.

The word cloud for the topic Gender: Woman focuses on womanhood, being a woman, and being free («être-libre», «libre»). The left focuses more on identity and liberation, with terms like «être» ("being"), «femme» ("woman"), «être-libre» ("being free"), and «sage» ("wise"). In contrast, the right centers more on pride and national identity, with words like «fière» ("proud" - feminine), «française» ("French" - feminine), «mieux» ("better"), «libre» ("free"), and «chemin» ("path").

Figure 6h: Topic-specific differences in feature distributions between the Left (Cluster 3: "Socialistes") and the Right (Cluster 1: "Rassemblement National") for the seeded topic "Gender: man". Feature size is proportional to the overall feature probability. Feature color and x-axis position depend on the difference in feature probability between the two communities. On the left are features that are more probable for users in the Left community, on the right are features that are more probable for users in the Right community. At the center (vertical dashed line) are features that are equiprobable for both communities.

The word cloud for the topic Gender: Man highlights some very clear differences between how the two communities represent manhood in their biographies. The left focuses more on social and intellectual aspects, with terms like «devenir» ("becoming"), «consciences» ("consciousnesses"), «révolte» ("revolt"), «commun» ("common"), «chaînes» ("chains"), and «pensées» ("thoughts"). In contrast, the right centers more on traditional and identity-related perspectives, with words like «chrétien» ("Christian"), «blanc» ("white"), «catho» ("Catholic"), «ennemi» ("enemy"), «de Gaulle», «male» ("male"), «male blanc» ("white male"), and «hétéro» ("heterosexual").

Findings

Our analysis of the biographies of X users who tweeted about the 2024 French legislative elections reveals a pronounced divergence of identity cues in users’ biographies for the left (“Socialistes” | cluster 3) and right (“Rassemblement National” | cluster 1) communities identified in Section 3. These differences in communicated identity appear to be consistent across numerous topics, reflecting deep political and cultural divides between the two communities. The left (cluster 3) community consistently emphasizes themes of unity, progressive movements, and climate advocacy in their bios, as seen in terms related to collective action, environmental protection, and social inclusion. Similarly, their approach to ethnicity celebrates multiculturalism and Europeanism.

In contrast, the right (cluster 1) is marked by a strong focus on national identity, tradition, and conservative values. This is evident in its emphasis on the extreme right party (RN) and national pride in politics. Environmental issues are viewed by the right cluster through a lens of natural beauty and emotional connection to nature, rather than urgency and activism. Ethnic identity appears to be rooted in national pride, while job and career discussions highlight merit, talent, and company viewpoints, reflecting the dominance of employer-centric views. Gender discussions for the right cluster also emphasize traditional values and national pride, portraying manhood with a focus on religious and racial identity cues.

These clear differences underscore the broad ideological divide between the left's progressive and inclusive views and the right's conservative and patriotic identity, as well as between their online narratives. As this study shows, in the context of the 2024 French legislative elections these differences can be inferred by analyzing the bios of X users who tweeted about key electoral issues and parties during the election campaign.

5. Discussion

Recent elections such as the European elections (June 2024) and the ensuing legislative elections in France (June-July 2024), both of which were heavily debated on social media such as X, have raised the question of how exactly social media relate to liberal democracies and democratic processes. One way of approaching this rather complex question is to examine the extent to which social media might fulfill or support the functions typically ascribed to the 'public sphere'. These functions comprise, for instance, 1) providing news and information, 2) providing a space for deliberation or decision-making, 3) supporting the organization of collective action, and 4) forming collective identities. With populism and identity politics on the rise, the question of how social media contribute to the formation of collective identities becomes all the more prominent.

In this summer school project, we invoked the concept of social media 'affordances' to investigate how the features of social media might allow users to construct and shape collective identities around the time of the French elections. Specifically, we set out to investigate how social media users express their identities on different platforms, notably looking at which identities are expressed, and whether there are any systematic differences between platforms. Through an exploratory analysis, we thus aimed to uncover the roles these identities play in political debates, and the extent to which they might contribute to polarization. As a case in point, we zoomed in on the recent French legislative election (June-July 2024).

The starting point for our analysis was a dataset of ca. 326,000 tweets, from which we constructed a retweet network. Using the Louvain community detection algorithm, we identified five clusters in this network. For each of the five communities, we then examined key affordances and the corresponding identities these might help users construct. We were able to identify a number of community-distinguishing uses of the affordances explored, each of which might warrant further investigation (see Conclusions below).

Profile pictures

Profile pictures are an obvious way to express an identity. However, we found that users use profile pictures only to a small extent to express political identities. Instead, we found the same types of profile pictures in all political clusters: (1) portraits, (2) animals, flowers or landscapes, and (3) flags, logos, and symbols. Only the last category showed systematic differences between the political clusters. While the French flag was the most prevalent flag in the RN cluster, accounts from the centrist cluster showed the Ukrainian flag, and on the left a significant number of accounts had the “stop genocide” sign in the colors of the Palestinian flag in their profile.

Emojis

Emojis are used to express political affiliations in several ways. They appear in user bios, in users' screen names, and in the content of the tweets themselves. Prominent examples of emojis connected to specific collective identities were again flags, but also other symbols connected to political parties or movements.

User biographies

One subproject studied the extent to which users use their biographies to express their political identity in general and party affiliation in particular. The overall finding was that, across the political camps, most users do not do so at all: they either have no biography or do not express their political orientation in it. For the subset of users who do express their political affiliation, however, we found, using a seeded topic model, that coherent identity cues are used within the political clusters, with a strong focus on national identity, tradition, and conservative values on the right and on climate advocacy, social inclusion and Europeanism on the left.

6. Conclusions

By way of conclusion, we highlight the main methodological observations that come from our present investigation, and envisage a number of avenues for future research.

One key methodological point to be raised here is that the study of platform affordances benefits from a comparative perspective, in which the possibilities offered by one platform are foregrounded in relation to those of other platforms. Such a comparison, in turn, depends heavily on the availability of comparable data across platforms. Unfortunately, obtaining such data proves to be challenging. Indeed, while the focus of the project reported here was mainly on X, we had originally intended to widen the scope of our analysis to include comparable platforms such as Instagram or Tiktok, but we were restricted in the amounts and types of data that could be obtained from those platforms within the scope of a summer school project. Our excursions into other platforms have therefore mainly taken the form of explorations to be elaborated in future work. It should likewise be noted that we were only able to collect the relevant data from X through a Digital Services Act (DSA) request, which, although it allowed us to conduct basic research, does not afford the flexibility that researchers enjoyed when the Twitter API could still be used for research purposes with relatively few restrictions. These developments point towards a wider issue in the field of digital methods research, namely that researchers' access to social media data is becoming more and more restricted. We are, however, hopeful that further implementation and potential expansion of the DSA might mitigate this development. We also expect that more traditional web-scraping methods might again take center stage as platform APIs themselves become less suitable for scientific purposes.

The exploratory analysis reported here opens up some further pathways for empirical analysis. In our present research design, we started from a structural clustering of X accounts based on retweets, and then compared affordances such as the use of emojis between the different structural clusters. In doing so, we have clearly demonstrated the significance of some of these affordances for collective identity construction on social media in the context of deliberative democracy. In order to further investigate and pinpoint the role of each affordance in constructing specific communities, a future, alternative design could be implemented in which we do not start from retweets, but rather build alternative clusters based on each of the affordances discussed here. One might for instance imagine clustering the data by the use of specific flags or emojis, and then comparing the resulting clusters with the aforementioned retweet network to examine differences and similarities. This might yield a more fine-grained perspective on our main research question.

7. References

  • Conover, M. D., Gonçalves, B., Ratkiewicz, J., Flammini, A., & Menczer, F. (2011, October). Predicting the political alignment of twitter users. In 2011 IEEE third international conference on privacy, security, risk and trust and 2011 IEEE third international conference on social computing (pp. 192-199). IEEE.

  • Gaisbauer, F., Pournaki, A., Banisch, S., & Olbrich, E. (2021). Ideological differences in engagement in public debate on twitter. PLoS One, 16(3), e0249241.

  • Gaisbauer, F., Pournaki, A., Banisch, S., & Olbrich, E. (2023). Grounding force-directed network layouts with latent space models. Journal of Computational Social Science, 6(2), 707-739.

  • Gaumont N, Panahi M, Chavalarias D (2018) Reconstruction of the socio-semantic dynamics of political activist Twitter networks—Method and application to the 2017 French presidential election. PLOS ONE 13(9): e0201879. https://doi.org/10.1371/journal.pone.0201879

  • Roberts, M. E., Stewart, B. M., Tingley, D., & Airoldi, E. M. (2013, December). The structural topic model and applied social science. In Advances in neural information processing systems workshop on topic models: computation, application, and evaluation (Vol. 4, No. 1, pp. 1-20).

  • Roberts, M. E., Stewart, B. M., & Tingley, D. (2019). stm: An R package for structural topic models. Journal of Statistical Software, 91, 1-40.

  • Santagiustina, C. R. M. A., & Warglien, M. (2022). The architecture of partisan debates: The online controversy on the no-deal Brexit. PLoS ONE, 17(6), e0270236.

Topic revision: r11 - 10 Sep 2024, EckehardOlbrich