
APIcalypse Now: Redefining data access regimes in the face of the Digital Services Act

Martin Trans (facilitator, UvA), Davide Beraldo (facilitator, UvA), Luca Draisci (information designer, DensityDesign), Leila Afsahi, Madeline Brennan, Vanessa Goldschmidt, Tereza Grendarova, Hannah Hamilton, Melis Keskin, Markella Papasokratous, Giovanni Rossetti, Madelyn Webb, Jade Williams, Honglan Xu


1. Introduction

Research on social media platforms and search engines has faced data access challenges since the so-called 'APIcalypse' (Bruns, 2019). During this period, major platforms such as Meta and X (formerly Twitter) placed ever more restrictions on their Application Programming Interfaces (APIs). In this ‘post-API’ scenario, researchers turned to alternative data collection methods, such as scraping and data donations. With Article 40 of the Digital Services Act (DSA) about to be enforced, however, some platforms designated by the European Commission (2023) as 'Very Large Online Platforms' (VLOPs) and 'Very Large Online Search Engines' (VLOSEs) are updating their transparency policies. For example, some are setting up new data access modalities for researchers, possibly including new APIs. This winter school project investigates these shifts, focusing on Instagram, Facebook, TikTok, and X/Twitter (whose API still carries the earlier name). The project aims to evaluate how feasible it is to obtain data from these platforms in the context of the upcoming DSA. In doing so, it seeks to audit these new provisions and to guide scholars through the changing dynamics of digital research. This is particularly relevant given ongoing API limitations and the anticipatory actions that social media platforms are taking ahead of the DSA's enforcement.

2. Research Questions

The project asks the central research question: How effective is it for researchers to negotiate data access with social media platforms in the context of the upcoming DSA, vis-à-vis existing methods of data collection (i.e., scraping or deprecated APIs)?

To answer this overall question, we are asking the following sub-questions:

RQ1) Who is eligible to apply for research access according to the different platforms?
RQ2) What is a researcher expected to prepare before submitting application forms?
RQ3) What restrictions on research are set out in social media platforms' Terms of Service?
RQ4) What types of data can researchers obtain through social media platforms' APIs, in light of the upcoming DSA?
RQ5) How do the answers to the questions above compare with existing methods of data collection that are not governed by the platforms?

3. Methodology

To answer these questions, we conducted a document analysis (Bowen, 2009) of the platforms’ documentation and policy documents. For each platform, we examined the research-related sections of its website, its Terms of Service (ToS) and API documentation, and other relevant material such as community guidelines and developer forums. We organised the work in groups, one per platform. Each group collected, discussed, and summarised its findings question by question, on a rolling basis. Work in progress was reviewed in periodic collective discussions.

4. Findings

Eligibility (RQ1)

TikTok To be eligible to apply for TikTok’s Research API, individuals must meet specific criteria set by TikTok. Applicants must demonstrate an academic background and expertise relevant to the research area specified in their application. They must either be a Qualified Research Partner or be properly authorised to act on behalf of one. They must also have no conflicts of interest in using the services and must commit to using the data for non-commercial purposes, in line with the Research API ToS. Eligible applicants must be employed by a non-profit academic institution in the U.S. or Europe, and only researchers located in these regions may apply. A well-defined research proposal is a key requirement. Eligibility is assessed against applicable laws and regulations, as well as the scope and nature of the proposed research.

Meta Access to the Meta Content Library is managed by the University of Michigan's Social Media Archive (SOMAR). Applicants must be researchers affiliated with an academic or non-profit organisation focused on scientific or public-interest research. They must demonstrate ethics committee approval. U.S. institutions are specifically directed to use an Institutional Review Board (IRB), which may include contracting with an external IRB. The database prioritises entities such as universities, hospitals, and government departments, and it excludes non-profits and think tanks that do not meet these criteria. Only researchers who meet Meta's defined standards can access the API, a restriction intended to keep the focus on ethical, institutionally supported research.

X / Twitter X's website, particularly in the Transparency section, outlines clear criteria for who qualifies as a researcher eligible to apply. Eligible individuals must have a primary institutional affiliation with an academic, journalistic, nonprofit, or civil society research organisation. Applications are open to individuals at the master's or Ph.D. level or above; those at the undergraduate level are not currently eligible. Applicants should have prior experience and relevant skills for data-driven analysis, as applicable datasets are mainly shared in JSON format and require technical analysis skills. A qualifying application must present a specific public interest research use case for the data, defined as non-commercial research intended for journalistic, academic, or nonprofit/civil society purposes. Additionally, applicants must adhere to industry-standard privacy and security measures for data management and sign a data use agreement. Conversely, X specifies who does not qualify as a researcher: undergraduate students; individuals with primary institutional affiliations in industry or government sectors; those without a primary affiliation in academia, journalism, nonprofit, or civil society research organisations; and anyone planning to share X data with government or other entities.
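X states that datasets are mainly shared as JSON and require technical skills to analyse. X does not document the format of datasets provided under the DSA. The sketch below therefore assumes records shaped like a standard X API v2 search response: a `data` array of posts, an `includes.users` array, and a `meta` object. The sample payload is invented for illustration. The sketch shows the minimum processing this implies: flattening nested post objects into rows and resolving each `author_id` to a username.

```python
import json
from typing import Any

# Invented sample, ASSUMED to resemble an X API v2 search response
# (data / includes.users / meta). The DSA dataset format is undocumented.
SAMPLE = """
{
  "data": [
    {"id": "1", "text": "first post", "author_id": "10",
     "created_at": "2024-01-05T12:00:00.000Z",
     "public_metrics": {"retweet_count": 2, "reply_count": 1, "like_count": 5, "quote_count": 0}},
    {"id": "2", "text": "second post", "author_id": "11",
     "created_at": "2024-01-06T08:30:00.000Z",
     "public_metrics": {"retweet_count": 0, "reply_count": 0, "like_count": 1, "quote_count": 0}}
  ],
  "includes": {"users": [
    {"id": "10", "username": "alice"},
    {"id": "11", "username": "bob"}
  ]},
  "meta": {"result_count": 2}
}
"""


def flatten(payload: dict[str, Any]) -> list[dict[str, Any]]:
    """Turn nested post objects into flat rows, resolving author_id to a username."""
    # Build a lookup table from the expanded user objects.
    users = {u["id"]: u["username"] for u in payload.get("includes", {}).get("users", [])}
    rows = []
    for post in payload.get("data", []):
        metrics = post.get("public_metrics", {})
        rows.append({
            "id": post["id"],
            "created_at": post.get("created_at"),
            "username": users.get(post.get("author_id")),
            "text": post.get("text"),
            **metrics,  # spreads retweet_count, reply_count, like_count, quote_count
        })
    return rows


rows = flatten(json.loads(SAMPLE))
for row in rows:
    print(row["username"], row["like_count"])
```

Each row can then be loaded into a spreadsheet or data frame for analysis.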

Preparation (RQ2)

TikTok Researchers must create a TikTok developer account to gain access to the application form. The form is divided into three main sections: ‘Principal Researcher Information’, ‘Researcher’s academic background’, and ‘Research Proposal’. The first two sections cover the eligibility expectations described in the previous section of this report. The third section is what researchers mostly need to prepare before submitting the form. It asks them to specify the research topic category the project falls under, for example ‘Human Rights’, ‘Algorithm’, ‘Mental Health’, or ‘Consumer trends’. In effect, researchers applying for access to TikTok’s API need a finished research proposal. This includes a description of the research design, hypotheses, expected outputs, a summary of the literature review, and a list of references. The form also asks for the requested start and end dates of access to TikTok’s API.

Meta Researchers seeking access to the SOMAR Meta database must navigate a comprehensive application process, overseen by SOMAR staff through what is termed an Independent Application Review (SOMAR, 2023). This involves detailing research ethics, project specifics, and the researchers' qualifications, among other requirements. Notably, the process mandates reapplication for significant project alterations and emphasises project-based approval over individual credentials. Communication from SOMAR about application status is promised within 4-6 weeks, a timeline that potentially conflicts with the Digital Services Act's stipulation for a 15-day response window for data access requests, raising questions about compliance. Applicants are expected to provide extensive details, including their data analysis expertise, project funding, team members, research objectives, and the necessity and impact of the requested data. A key part of the application is demonstrating how the research might address systemic risks within the EU. Given the platform’s novelty, there is little visibility into the approval or rejection criteria, but the depth of information requested suggests a rigorous screening process where insufficiently detailed or unsatisfactory responses could lead to denial.

X / Twitter Before applying for X API access under Article 40 of the DSA, researchers are expected to prepare several documents and statements:
- a description of the research;
- detailed evidence of non-commercial use;
- disclosure of all funding sources, both direct and indirect;
- a description of the required data and its purpose;
- a proposed timeframe with justification;
- acceptance of data security and confidentiality requirements.

Under certain circumstances, X welcomes applications from researchers with diverse backgrounds and methodologies, particularly those focusing on data-driven analysis of content moderation. On closer inspection, however, the Developer Platform website provides few details on research prerequisites. This raises concerns about whether the necessary information is accessible, specifically in relation to the upcoming DSA. The application form is hosted as a Google Doc, and online resources for researchers seeking access to X's data through this route appear to be severely limited or nonexistent. The lack of detailed guidance, and of any documented case of successful access, raises further concerns.

Restrictions (RQ3)

TikTok TikTok’s ToS specify various restrictions intended to ensure ethical and responsible research practices:
- Researchers may not retrieve TikTok data or content through any means other than the Research API. Scraping and other extraction techniques are explicitly prohibited.
- Without TikTok’s written authorisation, researchers may not distribute, modify, or sell TikTok data, nor give unauthorised third parties access to it.
- Creating a false identity on the TikTok Research Site is forbidden.
- Researchers must keep request volumes reasonable and comply with TikTok's policies. Activity considered excessive or abusive is prohibited.
- Researchers must refresh their TikTok Research API data at least every 15 (fifteen) days and delete any data that is no longer available at the time of each refresh.
- Researchers must provide TikTok with a copy of any publication that includes the research outputs at least 7 days before publication. TikTok reviews it to identify and remove any private Personal Data of users.
- Upon publication, TikTok has free and unlimited access to and use of the researcher’s outputs.
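The sketch below makes the two time-bound obligations concrete. It computes the latest refresh date under the 15-day rule and the latest date for sending TikTok a copy of a publication under the 7-day rule. It also prunes stored items that a refresh no longer returns. The function names are hypothetical, and the date arithmetic reflects our reading of the ToS. It does not represent any TikTok tooling.

```python
from datetime import date, timedelta

# Intervals as we read them from the Research API ToS (assumption, not TikTok code).
REFRESH_INTERVAL = timedelta(days=15)      # refresh data at least every 15 days
PREPUBLICATION_NOTICE = timedelta(days=7)  # share publication at least 7 days ahead


def next_refresh_due(last_refresh: date) -> date:
    """Hypothetical helper: latest date by which stored data must be re-collected."""
    return last_refresh + REFRESH_INTERVAL


def latest_submission_to_tiktok(publication: date) -> date:
    """Hypothetical helper: latest date to send TikTok a copy for review."""
    return publication - PREPUBLICATION_NOTICE


def prune_unavailable(stored_ids: set[str], still_available: set[str]) -> set[str]:
    """Keep only items the latest refresh still returns; everything else must be deleted."""
    return stored_ids & still_available


refresh_by = next_refresh_due(date(2024, 1, 10))
send_by = latest_submission_to_tiktok(date(2024, 3, 1))
kept = prune_unavailable({"v1", "v2", "v3"}, {"v1", "v3", "v9"})
print(refresh_by, send_by, sorted(kept))  # 2024-01-25 2024-02-23 ['v1', 'v3']
```

Under this reading, a stable archive is impossible. Any item deleted or made private by its creator disappears from the researcher's dataset at the next refresh.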

Meta Assessing how practical Meta's terms are for academic research shows that studies must be designed to comply with them. Violations of the ToS, community guidelines, and other relevant policies lead to accounts being disabled. Compliance may therefore force researchers to change their methodological approach or research strategy. Researchers must also pay closer attention to the ephemerality of posts that violate policies or relate to health crises, since such posts may be taken down. They have only a limited time frame in which to study, analyse, and publish such potentially violating content, and they must do so with heightened sensitivity. The restrictions embedded in these terms significantly shape methodological choices in digital data studies and can challenge current research methodologies, especially quantitative ones. Qualitative research mostly deals with relatively small data sets analysed in depth (Fossey et al., 2002), so complying with the ToS may be easier. For quantitative research, however, large-scale data analysis and extensive data collection may conflict with ToS policies (Mohajan, 2020). As a result, researchers are constrained in both data scraping and data access.

X / Twitter X’s various ToS policies set out restrictions for distinct use cases of API access. In general, X offers several subscription levels, ranging from a drastically limited Free tier through Basic and Pro to Enterprise. The general ToS mandates compliance with X's rules and requires non-commercial use. X also prohibits sharing X data with government entities or other actors. Importantly, various sections of different policies carry disclaimers such as: ‘Note: This Section does not apply to researchers with X API access via Art. 40 of the EU Digital Services Act (2022) (“DSA”), who are instead subject to the procedures and restrictions set forth in the DSA and the Developer Agreement’ (X, n.d.). No further information is given. The only DSA-specific developer terms are found in the Developer Agreement from November 2023, which states:

You may not disclose, reproduce, license, or otherwise distribute the Licensed Material [X content] (including any derivatives thereof) that you retrieve through the X API to any person or entity outside the persons within your organization necessary to perform the research, unless (i) the information is disclosed to the Digital Services Coordinator or other party specifically permitted by the DSA pursuant to the “vetted researcher” status and procedures described in Article 40, or (ii) disclosure is required by law. (X, 2023, sec. III-H)

Data types (RQ4)

TikTok According to TikTok’s official website, the TikTok Research API provides video, comment, and user data to researchers who have been granted access. Each category has inclusion criteria, and any item that meets them can be accessed through the API. Videos must be set to public by a creator aged 18 or over. They must also have been shared in the USA, Europe, or the rest of the world, but not in Canada. For such videos, TikTok provides more detailed information: share, like, comment, and view counts, as well as video ID, creation time, username, and region code. User data is obtainable for any user aged 18 or over with a public account. This includes following, follower, like, and video counts, as well as display name, bio description, and avatar URL. Researchers can also see whether a user is verified.
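The video inclusion criteria can be read as a filter that the platform applies before researchers see anything. The toy model below expresses that rule. The `Video` record and its field names are hypothetical and do not reflect TikTok's schema. Researchers cannot observe attributes such as creator age themselves.

```python
from dataclasses import dataclass


@dataclass
class Video:
    # Hypothetical record for illustration; not TikTok's actual data model.
    video_id: str
    is_public: bool
    creator_age: int
    region_code: str


EXCLUDED_REGIONS = {"CA"}  # Canada, per the criteria described above


def is_obtainable(video: Video) -> bool:
    """Public video, creator aged 18 or over, not shared in Canada."""
    return (
        video.is_public
        and video.creator_age >= 18
        and video.region_code not in EXCLUDED_REGIONS
    )


sample = [
    Video("a", True, 25, "NL"),
    Video("b", True, 17, "US"),   # creator under 18
    Video("c", True, 30, "CA"),   # shared in Canada
    Video("d", False, 40, "DE"),  # not public
]
obtainable = [v.video_id for v in sample if is_obtainable(v)]
print(obtainable)  # ['a']
```

The point of the model is that any API-based sample is already filtered. Content by minors or from Canada cannot be studied through this route.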

Meta The data accessible through Meta's research platform covers a broad spectrum of public-facing information from Facebook and Instagram, as well as election study and advertisement data:
1) Meta Content Library: Offers real-time data on public Facebook and Instagram content. For Facebook, it includes posts on public pages, groups, and events, with details of post metrics, content, and associated media. Instagram data covers posts from public creator and business accounts, with metrics such as likes, comments, and views, as well as media details.
2) U.S. 2020 Facebook and Instagram Election Study: Provides detailed analysis of political news engagement among US users, tracking political news URLs and measuring potential, exposed, and engaged audiences.
3) Meta Ad Library: A publicly accessible dataset of all ads on Meta's platforms, detailing ad placement, content, target demographics, and funding sources. It does not provide information on specific ad targeting strategies.
4) Geographical scope limitations: Access to data is restricted in certain regions, which limits the comprehensiveness of research on global societal trends and behaviours.

The research platform offers two interfaces for data access. A graphical user interface provides more visual, user-friendly interaction, and a programmatic API serves computational researchers familiar with R or Python. There are, however, notable constraints:
- data cannot be downloaded for external analysis;
- data must be accessed in a ‘clean room’ environment;
- queries are capped at a budget of 500,000 data records per 7-day period;
- all Content Library API research output is deleted every 30 days.

These factors reflect Meta's emphasis on user privacy, but they also limit research flexibility and data portability.
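The query limit can be pictured as a rolling budget. The sketch below assumes that the limit is 500,000 records per rolling 7-day window, counted by records returned. Meta's actual accounting may differ. The `RollingRecordBudget` class is hypothetical and serves only to plan how a large collection would need to be spread over time.

```python
from collections import deque
from datetime import datetime, timedelta


class RollingRecordBudget:
    """Hypothetical tracker of records retrieved within a rolling window.

    ASSUMPTION: 500,000 records per rolling 7-day window; Meta's real
    accounting rules may differ.
    """

    def __init__(self, limit: int = 500_000, window: timedelta = timedelta(days=7)):
        self.limit = limit
        self.window = window
        self._log: deque[tuple[datetime, int]] = deque()  # (query time, records returned)

    def _expire(self, now: datetime) -> None:
        # Drop queries that have fallen out of the rolling window.
        while self._log and now - self._log[0][0] >= self.window:
            self._log.popleft()

    def used(self, now: datetime) -> int:
        """Records counted against the budget at time `now`."""
        self._expire(now)
        return sum(n for _, n in self._log)

    def try_spend(self, now: datetime, records: int) -> bool:
        """Log a query if it fits in the remaining budget; return whether it did."""
        if self.used(now) + records > self.limit:
            return False
        self._log.append((now, records))
        return True


start = datetime(2024, 1, 1)
budget = RollingRecordBudget()
first = budget.try_spend(start, 400_000)                       # fits
second = budget.try_spend(start + timedelta(days=3), 150_000)  # would exceed 500,000
third = budget.try_spend(start + timedelta(days=7), 150_000)   # first query has left the window
print(first, second, third)  # True False True
```

Under this assumption, collecting 5,000,000 records would take at least ten 7-day windows (70 days). That is longer than the 30-day cycle after which API research output is deleted.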

X / Twitter Little clear information is available about exactly what access researchers applying under Article 40 of the DSA will receive. X promotes its paid subscription tiers as the ideal entry point for research with the platform’s data. Whether DSA-enabled access would give qualifying researchers any increased, free access to v2 endpoints remains a matter of speculation. The relevant links are either broken (returning a 404 error) or redirect to the (paid) v2 developer API documentation. The developer terms of service and agreement instead state that ‘... your access and use of the Licensed Material [X content] is limited solely to performing research that contributes to the detection, identification and understanding of systemic risks in the European Union and only to the extent necessary for X to comply with its obligations under the DSA’ (X, 2023, sec. III-H). This suggests that X aims to provide only the minimum data needed to comply with the DSA, both in quantity (e.g., number of tweets) and in quality (richness of metadata), without explicitly stating what that means.

Comparison (RQ5)

TikTok Scraping tools such as Zeeschuimer capture some data that cannot be obtained through the API, such as whether a video is an ad, a duet, or part of a challenge. Other data can be obtained with both methods. This includes author details, follower count, and username, as well as TikTok-specific information such as timestamps, creation dates, music IDs, like, comment, share, and view counts, and hashtags.

A significant difference concerns how data is gathered. With scraping, a researcher can only collect data by running a specific search query on TikTok. The API, by contrast, supports many query parameters:
- a keyword in the video description;
- user ID;
- creation time;
- the region where the video was created;
- share, like, and comment counts;
- the music used in a video;
- hashtags in the description;
- the effects used;
- the ‘voice to text’ transcript;
- the ID of the playlist the video belongs to.

The API is also more flexible because it lets a researcher gather data from a specific period of time. Scraping, in contrast, only captures what is available on the platform at the moment of collection. Overall, the API offers far richer query affordances than the search function presented to ordinary users. Lastly, comments can be obtained easily through the API, whereas they are harder to obtain through scraping tools (e.g., Zeeschuimer does not offer this).

Scraping tools involve no eligibility criteria, preparation requirements, or application forms, and anyone with access to them can use them regardless of academic qualifications or experience. Crucially, however, their use is not in line with TikTok's ToS. API methods require formal approval from the platform and adherence to its guidelines. Zeeschuimer, by contrast, operates as a browser extension that lets users capture data without any formal application process. As noted in the restrictions on API use, TikTok explicitly prohibits unauthorised data scraping due to privacy concerns and the potential misuse of user information.
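The sketch below illustrates these query affordances. It builds, but does not send, a request that combines a keyword, region codes, and a date range. The following details reflect our reading of TikTok's public Research API documentation in early 2024 and should be checked against the current docs:
- the endpoint URL;
- the body keys `query`, `start_date`, `end_date`, and `max_count`;
- the field names;
- the 30-day maximum date range.

`build_video_query` is a hypothetical helper, and the token is a placeholder.

```python
import json
from datetime import date, timedelta
from urllib.parse import urlencode
from urllib.request import Request

# ASSUMPTION: endpoint and field names as we read TikTok's Research API docs (early 2024).
ENDPOINT = "https://open.tiktokapis.com/v2/research/video/query/"
FIELDS = ["id", "create_time", "username", "region_code", "video_description",
          "like_count", "comment_count", "share_count", "view_count",
          "hashtag_names", "music_id"]


def build_video_query(keyword: str, region_codes: list[str], start: date, end: date,
                      token: str, max_count: int = 100) -> Request:
    """Hypothetical helper: build (but do not send) a keyword + region query over a date range."""
    # ASSUMPTION: the documented maximum span between start and end is 30 days.
    if not timedelta(0) <= end - start <= timedelta(days=30):
        raise ValueError("date range must span 0-30 days")
    body = {
        "query": {"and": [
            {"operation": "EQ", "field_name": "keyword", "field_values": [keyword]},
            {"operation": "IN", "field_name": "region_code", "field_values": region_codes},
        ]},
        "start_date": start.strftime("%Y%m%d"),
        "end_date": end.strftime("%Y%m%d"),
        "max_count": max_count,
    }
    url = f"{ENDPOINT}?{urlencode({'fields': ','.join(FIELDS)})}"
    return Request(url, data=json.dumps(body).encode("utf-8"), method="POST",
                   headers={"Authorization": f"Bearer {token}",
                            "Content-Type": "application/json"})


req = build_video_query("election", ["NL", "DE"], date(2024, 1, 1), date(2024, 1, 28),
                        token="<client-access-token>")
body = json.loads(req.data)
print(req.get_method(), body["start_date"], body["end_date"])  # POST 20240101 20240128
```

A single request of this kind expresses a structured combination of criteria that the in-app search box cannot. Scraping, by contrast, is limited to whatever one search query shows at the moment of collection.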

Meta Before the Meta Content Library and API were introduced, CrowdTangle was researchers' primary tool for accessing Facebook data. Acquired by Facebook in 2016, CrowdTangle enabled keyword and phrase searches in public posts and facilitated the curation of public groups and pages. It is still operating, but Facebook's lack of maintenance suggests it may soon be phased out. It has also shifted from a broadly accessible tool to one limited to researchers, journalists, and approved partners, although approval remains user-based rather than project-based. Comparing the data access provided by the Meta Content Library with that of CrowdTangle reveals limitations in the new system. Meta claims that the Content Library offers comprehensive access to public content across Facebook and Instagram. Yet CrowdTangle provides additional insights, such as information on the pages and groups posting specific content, including their categorisation, likes, and descriptions. Unlike CrowdTangle, the Meta Content Library neither supports quoted-phrase searches nor allows data to be downloaded for offline analysis. This restricts research flexibility and analysis options.

X / Twitter The lack of a dedicated research API (or any documentation thereof) makes the comparison with established methods of data collection inapplicable.

5. Discussion

Scrutinising platforms’ provisions for data access in light of the upcoming DSA reveals several challenges and limitations for academic and independent research. Platforms seem to address the issue largely at the level of discourse, and their current efforts appear mostly performative.
For Meta, getting access to the Content Library is an extremely arduous process that requires researchers to have a high degree of institutionally granted professionalism. Despite being marketed as ‘new’, the Content Library and API actually offer less data than CrowdTangle does in its current state. They also prohibit downloading data locally, which CrowdTangle allows. If Meta’s Content Library and API provide researchers with less information than previous tools while requiring a far more stringent and limiting application process, Meta’s promise of comprehensive data access rings hollow. Privacy considerations could be one reason Meta restricts access to data. There are certainly confidentiality reasons for giving researchers access only to data on groups and pages rather than on users. However, Meta has a long history of antagonism towards researchers (Murgia et al., 2021), which gives us reason to be sceptical about its commitment to transparency. As this report shows, the reality falls short of Meta's promised ‘comprehensive access’ (Clegg, 2023). Moreover, it is unclear whether any applications for API access have been submitted yet, and there is no information on how many will be rejected or for what reasons.

The ToS of TikTok’s Research API give TikTok the right to review research before anything is published. Researchers must also provide very detailed preparation documents, which makes the application process more demanding. In addition, researchers must refresh, or re-collect, any data every 15 days. This may help TikTok avoid being confronted with historically problematic content and may dissuade researchers from creating archives. Two further requirements compound these issues. Researchers must ‘notify [TikTok] without undue delay upon becoming aware of any content that is, or you believe to be, illegal, fraudulent, or violates TikTok Community Guidelines’ (TikTok, 2023, sec. III-3-D). They must also ‘provide TikTok with a copy of any publications pertaining to or containing the results and findings of the Research outputs, and any supporting information, at least seven (7) days before publication’ (TikTok, 2023, sec. III-3-F-ii). Taken together, these terms potentially give TikTok the power to interfere unduly with independent research. They could also conflict with institutional data retention policies and leave independent researchers vulnerable to pushback (Venkatagiri, 2023).

With regard to X, its documentation and ToS contain (at the time of writing) a large number of broken links, which creates uncertainty about the level of access. The actual terms for conducting DSA-governed research are unclear, which leaves researchers with little information to go on. Contrary to what the website seems to communicate, a dedicated research API does not yet seem to exist, as all the relevant links redirect to the standard (paid) Twitter API v2. For now, X’s research API application procedure appears merely performative, as no public records show that any application has in fact been approved.

Overall, these findings underscore a broader trend of ‘performative compliance’ with the upcoming DSA. The limited data output and more restrictive access provided by research-oriented APIs (when existing) compared to commercially available ones, coupled with the absence of data enabling the investigation of algorithmic personalization, constitute significant obstacles to transparency. These barriers hinder the ability to conduct timely research as events unfold, potentially pointing to a deliberate strategy by platforms to maintain control over how their data is used and interpreted in research contexts. The introduction of terms granting platforms increasing control over research findings and dissemination further exemplifies this trend, raising concerns about the autonomy and integrity of research in the digital age.

6. Conclusions

The findings of this winter school project shed light on data access on Facebook and Instagram (both owned by Meta), TikTok, and X (formerly Twitter), with a particular focus on research-oriented APIs under the upcoming DSA. The report examines eligibility criteria, what is expected of researchers, the restrictions set out in the platforms' ToS, and the nature of the data obtainable through their APIs. It also compares web scraping with the applicable APIs, revealing the advantages and limitations of each approach. The research uncovers challenges and limitations in accessing and using social media data for (DSA-governed) research. It highlights the need for careful attention to platform-specific guidelines and to their potential impact on the integrity of research findings.

As the digital landscape continues to grow, researchers and platforms must collaborate in creating frameworks that promote innovation while upholding ethical standards. The provisions that platforms currently have in place are unlikely to meet the standards set out by the DSA. Hopefully, future developments will bridge this gap, and Internet researchers, legal scholars, and digital rights activists are watching them closely.

7. References

Bowen, G. A. (2009). Document Analysis as a Qualitative Research Method. Qualitative Research Journal, 9(2), 27–40. https://doi.org/10.3316/QRJ0902027

Bruns, A. (2019). Filter Bubble. Internet Policy Review, 8(4), 1–14. https://doi.org/10.14763/2019.4.1426

Clegg, N. (2023, August 22). New features and additional transparency measures as the Digital Services Act comes into effect. Meta. Accessed January 10, 2024 from https://about.fb.com/news/2023/08/new-features-and-additional-transparency-measures-as-the-digital-services-act-comes-into-effect

European Commission. (2023, December 22). DSA: Very Large Online Platforms and Search Engines. Retrieved from https://digital-strategy.ec.europa.eu/en/policies/dsa-vlops.

Fossey, E., Harvey, C., McDermott, F., & Davidson, L. (2002). Understanding and Evaluating Qualitative Research. Australian and New Zealand Journal of Psychiatry, 36(6), 717–732. https://doi.org/10.1046/j.1440-1614.2002.01100.x

Mohajan, H. K. (2020). Quantitative Research: A Successful Investigation in Natural and Social Sciences. Journal of Economic Development, Environment and People, 9(4), 50–79. https://doi.org/10.26458/jedep.v9i4.679

Murgia, M., Criddle, C., & Murphy, H. (2021, December 6). Investigating Facebook: A Fractious Relationship with Academia. Financial Times. Accessed January 15, 2024 from https://www.ft.com/content/1f409239-9e4a-4988-b6fa-cad4dbe7c344

SOMAR. (2023). SOMAR InfoReady Application Guide for Meta Content Library and Content Library API. Google Docs. Accessed January 11, 2024 from https://docs.google.com/document/d/1iN4KOvFaYGZro23cB4j1FveouXMBcZnKl-yTUyx6fCg/edit?usp=embed_facebook

TikTok. (2023). TikTok Research API Terms of Service. Accessed January 10, 2024 from https://www.tiktok.com/legal/page/global/terms-of-service-research-api/en.

Venkatagiri, S. (2023). Researcher beware: Four red flags with the TikTok API’s terms of service [Substack newsletter]. Technomoral. Accessed January 11, 2024 from https://technomoral.substack.com/p/researcher-beware-four-red-flags.

X. (n.d.). Developer Policy – X Developers. Accessed January 9, 2024 from https://developer.twitter.com/en/developer-terms/policy

X. (2023). Developer Agreement and Policy. Accessed January 10, 2024 from https://developer.twitter.com/en/more/developer-terms/agreement-and-policy.

Topic revision: r2 - 20 Feb 2024, MartinTrans


Copyright © by the contributing authors. All material on this collaboration platform is the property of the contributing authors.