Open Source Intelligence on Budgets for Bits: An Analysis of EU Funding Allocation

Team Members

Anvee Tara, Bastian August, Brogan Latil, Fieke Jansen, Furkan Dabaniyasti, Gizem Brasser, Jasmin Shahbazi, Maxigas, Meret Baumgartner, Niels ten Oever, Sarah Vorndran, Zuza Warso

All cleaned datasets used in this project are openly available here: https://drive.google.com/drive/folders/1pUjTzBF8IdWIq0aGfK75w4yA88kEuhJt?usp=drive_link

The poster can be found here: https://drive.google.com/file/d/1BwnfKkqiz2PyZRgpQf35JI1Kv5MYGr9h/view?usp=share_link

Summary of Key Findings

EU research and innovation funding for digital technologies in the Horizon Europe programme (Cluster 4) prioritizes advanced technologies developed by the industry over those that more directly benefit the public and address their needs. Only a small portion of the total funding goes to alternatives or to corporate digital solutions that people use in their everyday lives.

Across the different destinations, there is a mission of catching-up, regaining control over materials and technologies, and fostering national and international ecosystems.

All six of the destinations under Cluster 4 share a techno-optimist centre of gravity, with technologies and infrastructures as solutions to the mission-driven goals at the periphery of the network.

Of the 25 projects receiving maximum funding from the EU, only one is intended for public use, while the remaining 24 are intended to target industry.

40 percent of EC funding in Cluster 4 goes to private entities.

1. Introduction

This project arose from an interest in how public funds for research and innovation in digital technologies are allocated in the European Union (EU). Technological innovation is often attributed to corporations, but as Mazzucato shows, many significant innovations have been at least partially funded and incubated by governments. Mazzucato introduced the term “entrepreneurial state” to describe how the major technological advances in contemporary history were made possible by governments’ structural financial support and risk-taking.

The concept of the "entrepreneurial state" challenges the conventional view that innovation and economic success are solely the result of private sector activity and free market dynamics. This is illustrated by the example of the Department of Defence, Defense Advanced Research Projects Agency (DARPA), but also the Department of Energy, and the National Science Foundation making significant investments in internet protocols, search algorithms, GPS technology, microprocessors, LCD displays and touch screens through applied research grants, early-stage finance, and strategic procurement, without which these technologies might never have emerged.

Another relevant concept employed by Mazzucato is that of mission-driven innovation policy. According to Mazzucato, missions are about using innovation to address a challenge by solving a bold, significant problem. Missions should address major societal objectives that will be important across Europe. The archetypical historical mission was NASA putting a man on the moon and bringing him back safely. But the problems Europe and the world are currently facing are more social and wicked than going to the moon, which was mainly a technical achievement. Mazzucato explains that:

“Apollo was inspirational, and much can be learned about the importance of setting clear goals, while allowing bottom-up experimentation to contribute to the overall success, but when we think of selecting EU missions today it is necessary to frame missions with a clearer societal relevance. While a purely technological mission may be appropriate for an innovation agency (e.g. in the case of space this would include NASA or ESA), at the EU level, we must be more ambitious in making the link to societal impact.”

The state’s capacity to shape markets and steer innovation has been under-researched. To address this knowledge gap, we mobilized Open Source Intelligence methods to dive into the European Union's funding allocation for digital technology research and innovation. We explored the explicit and implicit values and priorities behind funding allocation decisions to determine what drives them and what mission (if any) they are trying to accomplish.

The explicit priorities were discernible through the analysis of policy documents and public statements, whereas the implicit values, though less overt, were inferred from the allocation of funds. Funding decisions provided insights into what is deemed worthy of support and revealed how the concept of digital innovation is perceived and prioritized. By analyzing these decisions, we gained a better understanding of the criteria and values that guide investments in new technologies, as well as the strategic directions and trends favored by the EU funding bodies. The concepts of the entrepreneurial state and mission-oriented innovation policy formed the theoretical foundation of our investigation.

2. Initial Data Sets

The research focuses on Horizon Europe (HE) funding, as it is Europe’s flagship research and innovation program. HE has set aside 95.5 billion euros for innovation funding, with 15.3 billion euros allocated to Cluster 4 "Digital, Industry, and Space" projects between 2021 and 2027 (Fig. 0).

Figure 0. Horizon Europe’s funding structure

To analyze the EU's investment in research and innovation in digital technologies we queried the following datasets:

  • Strategic plans and work programmes for HE Cluster 4, which outline funding priorities, calls for proposals, and specific research areas.

  • The project database for HE Cluster 4, containing information on all funded projects, including their objectives, budgets, and outcomes.

  • The organizations dataset for HE Cluster 4, containing information on all recipients of HE funding, including their role in the projects, type of institution, and funding amount granted.

These datasets were downloaded from the open source European Data portal. They are listed by work package below, together with supplementary information found on the same portal.

Work packages 1, 2, and 4 began with the ‘HORIZON Projects’ dataset, filtered for projects under Cluster 4. Work package 3 began with the ‘HORIZON Organisations’ dataset, also filtered for projects under Cluster 4, to focus on the information of funding recipients missing in the Project file.

To supplement their discrete analyses, each work package also used the following data:

WP1:

  • Official EU taxonomy.

  • HE Work Programmes CL4.

WP2:

  • Datasets were created using the following documents:

    • Horizon Europe Strategic Plan (2021-2024).

    • Horizon Europe Work Programme 2021-2022: 7. Digital, Industry and Space.

    • Horizon Europe Work Programme 2023-2025: 7. Digital, Industry and Space.

  • The Strategic Plan was manually scraped, and the Work Programmes were automatically scraped.

WP3:

  • Many of the columns of the organization.csv dataset use internal codes, such as ‘ecContribution’, ‘netECcontribution’, ‘totalcost’, etc., and were translated using the EU Funding and Tenders Portal.

  • Activity type abbreviations (e.g., PRC, REC, HES, PUB, and OTH) were decoded using the ‘Organization activity type’ dataset.

WP4:

  • More information on each project was accessed using the CORDIS URL structure by appending the project ID to the base URL.

Each work package cleaned and transformed their initial datasets to suit their research questions and analyses. Those edited datasets and those called in scripts are available in Annex 2. Further details on these datasets and the processes of transforming can be found in the Methodology section.

3. Research Questions

Our work focused on exploring investments under Cluster 4 of the Horizon Europe programme. Cluster 4 is dedicated to “Digital, Industry, and Space,” which provides funding for research and innovation projects in these three areas.

Our research was guided by the following research questions:

  1. What are the explicit and implicit priorities and values reflected in the EU’s funding decisions for digital technologies under Cluster 4?

  2. What are the domains and types of digital technologies that the EU funding allocation favors? How are budget allocations distributed across different technology domains such as robotics, AI, 5G (and beyond), Internet of Things, quantum technologies, etc.?

  3. Which entities (companies, universities, etc.) are the main recipients of the technology infrastructures funds of Horizon Europe?

  4. Do funding decisions prioritize projects aligned with industry priorities over those promoting open-source solutions, the public interest, and fundamental rights?

  5. Is there a bias in EU funding decisions towards technologies perceived as driving significant societal change?

The initial broad research questions were refined into more specific queries for each work package.

RQ WP1: Which digital technologies does the EU support under Cluster 4? What technologies are supported under each destination?

RQ WP2: What are the EU’s objectives and high-level goals for Horizon Europe’s Cluster 4? What are the explicitly stated overarching goals? What are the expected outcomes?

RQ WP3: Which actors are benefiting the most from Horizon Europe Cluster 4? What types of actors receive funding? Where are they from?

RQ WP4: Which projects received the most funding? What do they have in common? Whom do they benefit?

4. Methodology/approach

We divided our work into the following four work packages (WPs), each focused on a particular aspect of our research questions:

  • WP1: Which digital technologies does the EU support through the selected funding instruments?

  • WP2: Unpacking EU research agenda: how are goals translated into technologies?

  • WP3: Who are the main actors receiving EU funding?

  • WP4: What are the largest funded technology projects (examples of what they do, what are their characteristics)?

Below, we provide a description of each work package, outlining its specific research questions, methodology, and datasets that were used.

WP1

RQ1: Which digital technologies does the EU support through the selected funding instruments?

Prompt design

Each HE project had a short explanation of the objective the project is looking to achieve, including the problem they are trying to address and how. Initially, we opted for Natural Language Processing (NLP) to look for keywords in these objectives that could be used to categorise which technologies are being used for the projects. Due to our lack of knowledge on the projects, the use of synonyms, and typographical errors, it was challenging to extract keywords that efficiently query the documents. For this reason we asked ChatGPT to extract which technologies are being mentioned in each objective.

Topics vocabulary

We used ChatGPT to transform the table of contents for Cluster 4 into a vocabulary for the various topics in all calls. The following prompt extracted the vocabulary through GPT-4 on July 2nd 2024:

I need you to transform this information in a table with the following structure: column 1: "Topic" containing "HORIZON-CL4-2021-TWIN-TRANSITION-01-01" column 2: "Description" containing "AI enhanced robotics systems for smart manufacturing (AI, Data and Robotics - Made in Europe Partnerships) (IA)". When you are ready I will provide you with a PDF containing the information to be transformed.

Extracting technologies

Attempt #1

In order to extract technologies from the objective of each project, we used the following prompt:

Extract all technologies related to [TOPIC DESCRIPTION] in the following text: [PROJECT OBJECTIVE]. Separate each term with a comma.

The prompt uses the topic description as a scope to filter out irrelevant terms by providing a more detailed vocabulary. With this prompt, we get a result similar to this:

id | plainText
101058523 | Laser Beam, AI-enhanced optical beam shaping module, multispectral in-line process monitoring and control system application, PBF-LB/M, additive manufacturing, AI-based solutions, photonics, green manufacturing, laser-based technologies.

The advantage of this prompt is that it employs an inductive, bottom-up approach. However, it also extracts many irrelevant terms that would require an extensive manual cleaning of the data set.

Attempt #2

To limit the technologies extracted, we used the taxonomy available at euroSciVoc to formulate the following prompt:

"Based on this text: [PROJECT OBJECTIVE], categorize the text according to these categories: [‘from euroSciVoc database']. Do not introduce new categories, populate the column with one or more categories, separated by comma."

With this prompt, we get an output which successfully populates consistent categories:

['additive manufacturing', 'industrial biotechnology']

['additive manufacturing', 'nanotechnology', 'sustainability sciences', 'manufacturing engineering', 'composites']

['additive manufacturing', 'sustainability sciences', 'biosensors', 'biofuels', 'recycling', 'manufacturing engineering', 'industrial biotechnology']

['additive manufacturing', 'sustainability sciences', 'industrial biotechnology', 'manufacturing engineering', 'sustainable agriculture']

['additive manufacturing', 'sustainability sciences', 'remanufacturing']

Still, we did not proceed as it turned out to be an unsustainable prompt due to its lengthy nature and excessive use of tokens.

Attempt #3

Ultimately, we used a shorter list of technology categories to classify the data. These technologies are listed by the European Commission (EC) as goals for Cluster 4. We enriched the categories with an additional set of currently popular technologies: 5G, 6G, extended reality, augmented reality, virtual reality, metaverse, blockchain.

[‘manufacturing technologies’, ‘key digital technologies including quantum technologies’, ‘emerging enabling technologies’, ‘advanced materials’, ‘artificial intelligence’, ‘robotics’, ‘next generation internet’, ‘advanced computing and big data’, ‘circular industries’, ‘low carbon and clean industries’, ‘space including earth observation’, ‘5G’, ‘6G’, ‘extended reality’, ‘augmented reality’, ‘virtual reality’, ‘metaverse’, ‘blockchain’]

We used the same ChatGPT prompt as in Attempt #2, but with this shorter list instead. This resulted in a more consistent output, although a small number of records contained new categories invented by ChatGPT.

We asked ChatGPT to assign multiple tech categories to each project. This affects how frequencies and funding are summed: they are organized by tech category rather than by project, so a project with several categories is counted once under each of them.
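The effect of multi-category labelling on the totals can be sketched as follows. This is a minimal illustration and not the notebook's code: the allowed set is an abbreviated copy of the Attempt #3 list, while the record layout, the function name `funding_by_category` and the sample figures are hypothetical.

```python
from collections import defaultdict

# Abbreviated copy of the Attempt #3 category list; the full list would be used in practice.
ALLOWED = {"artificial intelligence", "robotics", "advanced materials", "5G", "6G"}


def funding_by_category(projects):
    """Count projects and sum funding per technology category.

    `projects` is an iterable of (funding, categories) pairs (hypothetical layout).
    Categories outside ALLOWED, e.g. ones invented by the model, are dropped.
    """
    counts = defaultdict(int)
    totals = defaultdict(float)
    for funding, categories in projects:
        for category in set(categories) & ALLOWED:
            counts[category] += 1
            totals[category] += funding
    return dict(counts), dict(totals)


# Made-up sample: the first project carries two valid categories.
sample = [
    (1_000_000.0, ["artificial intelligence", "robotics"]),
    (500_000.0, ["robotics", "smart dust"]),  # "smart dust" is not in ALLOWED
]
counts, totals = funding_by_category(sample)
```

In this sample the two projects received 1.5 million in total, but the per-category sums add up to 2.5 million because the first project is credited under both of its categories. Per-category figures should therefore be read as relative weights, not as a partition of the budget.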

Data manipulation

We used Python to clean and manipulate the project dataset by merging it with other data scraped from the PDFs for the Horizon Europe work programmes. The full methodology is explained in the Jupyter Notebook on GitHub.

WP2

RQ2: Unpacking EU research agenda: how are goals translated into technologies?

In order to uncover the EU’s agenda for Cluster 4, three documents were consulted: Horizon Europe’s Strategic Plan 2021 - 2024, and Horizon Europe’s work programmes (WP) for Cluster 4 ranging between 2021-2022 and 2023-2025. We created datasets by extracting funding calls’ expected outcomes from the work programmes and matching them to the Strategic Plan’s expected impacts.

Preparing datasets

The first dataset was created to make explicit the EU Strategic Plan’s six expected impacts, otherwise referred to as destinations, as well as Horizon’s WP’s justifications of the impacts. Each destination in Cluster 4 has a shortened title (see Table 1). With the destinations established, the two WPs addressing Cluster 4 were reviewed to assess to what extent certain overarching goals of Horizon Europe may have been prioritized.

Table 1. Destination Titles (long to short form)

Destination (title) | Destination (short form title)
Climate Neutral, Circular and Digitised Production | TWIN TRANSITION
Increased Autonomy in Key Strategic Value Chains for Resilient Industry | RESILIENCE
World-leading Data and Computing Technologies | DATA
Digital & Emerging Technologies for Competitiveness and Fit for the Green Deal | DIGITAL EMERGING
Open Strategic Autonomy in Developing, Deploying and Using Global Space-Based Infrastructures, Services, Applications and Data | SPACE
A Human-Centred and Ethical Development of Digital and Industrial Technologies | HUMAN

The WPs have several “calls” proposed to meet the six destinations. Each call has several “topics” which set the expectations for the funded projects. Each topic has one or more “expected outcomes” (see fig. 1 for the structure of the WPs). Therefore, the dataset also includes the main “expected outcomes” of each destination, which we obtained from a secondary dataset that we created.

Figure 1. Diagram of the structure of the WPs

A secondary dataset was created by scraping the two WP PDFs, matching all “expected outcomes” to their respective “topics”. This allows us to quantify which words, and by extension goals and ideals, were repeated. This was done through the following command line code:

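# Extract each topic block: from its identifier up to (not including) the first following "Scope"
grep -Poz 'HORIZON-CL4(-.*?):(.|\n)*?(?=Scope)' other_source.txt > result.csv

# Append a pipe separator after topic identifiers with a one-word destination
sed -i -E 's/(HORIZON-CL4-[[:digit:]]*-[[:alpha:]]*-[[:digit:]]*-[[:digit:]]*)/\1 \|/g' result.csv

# Same for identifiers with a two-word destination (e.g. HORIZON-CL4-2021-TWIN-TRANSITION-01-01)
sed -i -E 's/(HORIZON-CL4-[[:digit:]]*-[[:alpha:]]*-[[:alpha:]]*-[[:digit:]]*-[[:digit:]]*)/\1 \|/g' result.csv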

While the code managed to scrape the PDF’s topics, it was not perfect in matching up the expected outcomes with topics, therefore requiring manual cleaning of the dataset.
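For readers who prefer to stay in Python, the same extraction idea can be approximated with the standard `re` module. This is a sketch and not the script used in the project: the pattern is slightly stricter than the grep expression (the identifier may not contain whitespace or colons), and the sample text is a placeholder built around a topic identifier quoted earlier in this report.

```python
import re

# Topic identifier up to the first colon, then everything (lazily) up to "Scope".
TOPIC_BLOCK = re.compile(r"(HORIZON-CL4(?:-[^\s:]+)+):(.*?)(?=Scope)", re.DOTALL)


def extract_topics(text):
    """Return (topic_id, text_before_scope) pairs, with whitespace collapsed."""
    return [
        (match.group(1), " ".join(match.group(2).split()))
        for match in TOPIC_BLOCK.finditer(text)
    ]


# Placeholder input; real input would be the text dumped from the work programme PDFs.
sample = (
    "HORIZON-CL4-2021-TWIN-TRANSITION-01-01: AI enhanced robotics systems\n"
    "Expected Outcome: placeholder outcome text.\n"
    "Scope: placeholder scope text.\n"
)
pairs = extract_topics(sample)
```

Like the shell version, this relies on the word "Scope" closing every topic block, so any topic whose text mentions "Scope" earlier would be cut short and still needs manual checking.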

Analytical method

With the complete dataset, we were interested to understand how the expected outcomes per destination may reveal visions for Horizon Europe, and the extent to which they match with the EU’s expected impacts. To this end, a semantic network-map was created to document which words from the expected outcomes overlap with all of the destinations. However, the words were often vague. To perform a refined linguistic analysis of the words, we added the expected outcomes per destination to Voyant Tools. This allowed us to better map the vision of the EU Horizon project by interrogating collocations of words, rather than singular occurrences. After qualitatively close-reading selected word collocations, we put the linguistic data into Gephi for mapping. To retain the relation between words, we included the word collocations and their frequencies under each destination.
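The step from collocations to a network file can be illustrated with a small sketch. It does not reproduce Voyant Tools, which was used for the actual analysis; it substitutes a plain count of adjacent word pairs, and the function name `collocation_edges`, the row layout and the sample sentences are assumptions made for illustration.

```python
import re
from collections import Counter


def collocation_edges(outcomes_by_destination, min_count=1):
    """Count adjacent word pairs per destination and return edge rows.

    Each row is (destination, "word1 word2", frequency), a layout that can be
    written to CSV and imported into a network tool as a weighted edge list.
    """
    rows = []
    for destination, text in outcomes_by_destination.items():
        # Lower-case words, keeping hyphenated compounds such as "human-robot" whole.
        words = re.findall(r"[a-z]+(?:-[a-z]+)*", text.lower())
        pairs = Counter(zip(words, words[1:]))
        for (first, second), count in sorted(pairs.items()):
            if count >= min_count:
                rows.append((destination, f"{first} {second}", count))
    return rows


# Made-up snippets, loosely echoing the vocabulary of the two destinations.
sample = {
    "HUMAN": "human values and human needs",
    "DIGITAL EMERGING": "human-robot collaboration and human-robot interactions",
}
edges = collocation_edges(sample)
```

Keeping the destination in every row preserves the relation between a collocation and the destination it came from, which is what allows the same word to be mapped differently per destination.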

WP3

RQ3: Which actors are benefiting the most from Horizon Europe Cluster 4? What types of actors receive funding? Where are they from?

Preparing datasets

We primarily used the organization.csv dataset for WP3. We also looked at the yearly datasets for 2021-2023 from the Financial Transparency System (FTS) of the EC (downloadable here), but decided not to use the FTS data because of discrepancies between the two datasets. We tested our methods on a 5G sample of cluster 4 and adapted our methodology to some of the issues we found in the dataset before applying it to the entire cluster. This iterative design allowed us to become familiar with the dataset and to discover that companies like Ericsson and Thales appeared in the dataset under different names because of their subsidiaries. The code and datasets referred to in this section can be found on GitHub.

Preparing organization.csv

Getting familiar with the dataset: first, we got familiar with the organization.csv dataset and the meaning of its different columns. The following are some explanations, definitions and observations of these:

  • While every row has a name (of the organisation), the shortName is not always given.

  • activityType denotes the type of organisation. The abbreviations have the following meanings (with the percentage of rows carrying each abbreviation):
    • HES: Higher or Secondary Education Establishments (34.2%)
    • REC: Research Organisations (22.7%)
    • PRC: Private for-profit entities (excluding Higher or Secondary Education Establishments) (29.8%)
    • PUB: Public bodies (excluding Research Organisations and Secondary or Higher Education Establishments) (0.05%)
    • OTH: Other (0.08%). NOTE: the organisations categorised as "OTH" included associations, membership organisations and NGOs.
  • role represents the organisation's position in the internal project hierarchy. The roles go from coordinator, associated partner, participant, to third party. Third parties include construction companies, PR firms, certification centres, and engineering consultants. The internal hierarchy also seems to be denoted by order, with coordinators always being assigned order=1. For the other three roles, we did not find a clear correlation between order and role, so we decided not to use order.

  • We were not able to find out what rcn means.

  • The amount of money per organisation-project pair is given as ecContribution, netEcContribution and totalCost. We arrived at the following definitions of these columns:
    • ecContribution: the total money this organisation receives for a specific project, including the budget for third parties.
    • netEcContribution: the total money this organisation keeps for a specific project after distributing money to other organisations, excluding the budget for third parties and including EC money it receives from another organisation in the project. If ecContribution is lower than netEcContribution, the organisation received EC money from another organisation; if it is higher, the organisation gave EC money to another organisation.
    • totalCost: the total amount of money invested in the project for this organisation. This includes the EC contribution and other costs not covered by EU funding (in kind or third-party funding). NOTE: for some projects, totalCost is missing or set to zero although ecContribution and/or netEcContribution are higher than zero. This seemed to be the case primarily for projects related to space or chemical engineering.

Preparing organization.csv: to be able to draw conclusions about cluster 4 from the organization.csv dataset, we took the following steps:
  • Extract cluster 4 projects: using the topic_part column in project.csv, we kept only the rows of organization.csv that belong to cluster 4 projects (see cluster4_filter.py). The result of this step was organization_cluster4.csv.

  • Clean up: see cleanup_organisations.py. The result of this step was organizations_cluster4_headorgs.csv.

  • Merge subsidiaries: some companies were in the dataset under different names because of different subsidiaries. We merged these into the name of their parent company by first creating a list (head_orgs.txt) of parent companies based on analysis of the data set and the 50 biggest companies in Europe from Wikipedia. Then, we replaced all names that contained these company names with the name of the parent company. Per parent company, the subsidiaries which we replaced can be found in merged_orgs.json.

  • Prepare costs: we formatted the values in the ecContribution, netEcContribution and totalCost columns so they are all represented as integers or floats with decimal point notation. We also added a column diffContribution = totalCost - netEcContribution, which is equal to the amount of money not covered by EC contribution (third party or in kind). If totalCost is 0, diffContribution is therefore negative. For the rows for which this was the case, diffContribution was set to None.

  • Add continent: based on the country code provided in the country column, we added a column which contains the code of the continent that country is in.
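The subsidiary merge and the cost preparation described in the steps above can be sketched as follows. This is not the code from the project's scripts: the function names are hypothetical, the substring match is a simplification, and the decimal-comma handling assumes that raw values carry no thousands separators.

```python
def parent_name(name, head_orgs):
    """Return the parent company if its name occurs in `name`, else `name` unchanged."""
    upper = name.upper()
    for parent in head_orgs:
        if parent.upper() in upper:
            return parent
    return name


def parse_amount(raw):
    """Parse a cost string into a float, accepting a decimal comma; empty means None."""
    raw = (raw or "").strip()
    if not raw:
        return None
    return float(raw.replace(",", "."))


def diff_contribution(total_cost, net_ec):
    """totalCost - netEcContribution, or None when totalCost is missing or zero."""
    if not total_cost or net_ec is None:
        return None
    return total_cost - net_ec


# Parent names taken from the report; the subsidiary name is made up.
head_orgs = ["Ericsson", "Thales"]
merged = parent_name("THALES EXAMPLE SUBSIDIARY SAS", head_orgs)
diff = diff_contribution(parse_amount("2000"), parse_amount("1500,5"))
```

A plain substring match can produce false positives when a short company name appears inside an unrelated organisation's name, which is one reason to keep a record of every replacement, as merged_orgs.json does.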

Linking with technologies: using the technology keywords linked to projects by WP1, we created datasets that link organizations to the technologies they receive most funding for.

  • Extract most funded technology (see link_tech.py), results in organizations_cluster4_tech.csv.
    • Link datasets: using the projectID column, we added the technology keywords from the taxonomyObjective from tech_project_categorized_v2.csv to each row of organizations_cluster4_headorgs.csv.
    • Cluster keywords: as some of the keywords in taxonomyObjective are very similar (e.g. circular industries, sustainable innovation, renewable technology), we decided to cluster some of these keywords into bigger categories (see cluster_tech_keywords.json).
    • Extract most funded keyword: we added the biggest_key column, which contains the keyword that was associated with the most funding (netEcContribution) for that row's organisation. If multiple keywords have the same amount of total funding, one of these keywords is arbitrarily chosen (this only happens for organisations that participate in a small number of projects).
  • Extract most funded technology for the top 10 funded organisations: we did everything we did for the previous step, but only did it for the 10 most funded organisations (highest sum of netEcContribution, see top_10_tech.json) (see link_tech_top10.py), which resulted in organizations_cluster4_tech_top10.csv.
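The "most funded keyword" step can be illustrated with a short sketch. It assumes that a row's netEcContribution is credited in full to every keyword attached to the project, which the report does not state explicitly; the function name `biggest_keyword` and the sample rows are hypothetical.

```python
from collections import defaultdict


def biggest_keyword(rows):
    """Map each organisation to the keyword with the highest summed funding.

    `rows` holds (organisation, net_ec_contribution, keywords) tuples; a row's
    funding is credited in full to every keyword of its project (an assumption).
    """
    sums = defaultdict(lambda: defaultdict(float))
    for organisation, funding, keywords in rows:
        for keyword in keywords:
            sums[organisation][keyword] += funding
    # max() keeps the first keyword encountered when totals tie.
    return {org: max(per_kw, key=per_kw.get) for org, per_kw in sums.items()}


# Made-up rows for two organisations.
sample = [
    ("Org A", 300.0, ["robotics"]),
    ("Org A", 200.0, ["artificial intelligence", "robotics"]),
    ("Org B", 100.0, ["advanced materials"]),
]
result = biggest_keyword(sample)
```

Here "Org A" ends up with robotics (500.0 against 200.0 for artificial intelligence). Ties are resolved by encounter order in this sketch, which corresponds to the arbitrary choice mentioned above.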

NOTE: we found out that there are 9 projects from cluster 4 that are included in project.csv but not in organization.csv. All these projects started in 2024 and organization.csv was not updated after April 2024. To find this out, we used check_projectIDs.py.

Preparing and linking FTS data sets

  1. Extract Horizon Europe Projects: we filtered each of the three FTS data sets such that they only contained Horizon Europe projects (see fts_filter_horizon.py). From this, we got the YEAR_FTS_dataset_en_HORISON.csv datasets (2021, 2022, 2023).

  2. Compare VAT numbers: to compare the data in the two different data sets (organisation and FTS), we extracted the VAT numbers that appear in one but not the other (see check_VATs.py) to unq_vats.txt. There were 4051 out of 34410 VAT numbers (11.8%) that appeared in one data set but not the other. Because of this discrepancy, we decided not to use the FTS data set.
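The VAT comparison amounts to a symmetric difference between two sets. The sketch below is illustrative and not check_VATs.py itself; it assumes that the reported percentage is taken over all distinct VAT numbers in both datasets, and the sample VAT numbers are made up.

```python
def vat_mismatch(org_vats, fts_vats):
    """Return VAT numbers found in exactly one set, and their share of all VAT numbers."""
    org, fts = set(org_vats), set(fts_vats)
    only_one = org ^ fts  # symmetric difference: in one dataset but not the other
    total = org | fts     # every distinct VAT number seen in either dataset
    share = len(only_one) / len(total) if total else 0.0
    return only_one, share


# Made-up VAT numbers: two of four distinct values appear in only one set.
only_one, share = vat_mismatch({"AA1", "BB2", "CC3"}, {"BB2", "CC3", "DD4"})

# The figures reported above: 4051 mismatches out of 34410 VAT numbers.
reported_share = round(4051 / 34410 * 100, 1)
```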

Analytical method

We analysed the organization.csv dataset through descriptive statistics to account for the descriptive nature of our research questions. Here, we clarified the dominant recipients of funding, distribution of actors per role, and frequency of actors’ participation. We then visualised these analyses using Tableau to produce a treemap of dominant funding recipients and a geographic map of dominant funding recipients by nation. This map was converted into a circle-packing visualisation using RawGraphs 2.0 to enrich its details. Our use of visualisations aims to simplify large, complex datasets not only to the reader’s benefit but to our own; we assume a visual orientation in our analyses to draw out and give shape to any narratives that may be lost in the statistics themselves.

WP4

RQ4: What are the largest funded technology projects (examples of what they do, what are their characteristics)?

To conduct our research, we began by downloading the spreadsheet of Horizon projects from the European Data Portal. Using Google Sheets, we filtered the data to focus specifically on Cluster 4 projects. We applied a filter with the criteria “Text Contains: cl4” in the "Mastercall" column (Column M) to identify relevant projects, which resulted in 775 identified projects. These projects were then sorted by the "EC Max Contribution," representing the maximum amount of funding provided by the European Union for each project, from largest to smallest. For our research, we selected the top 25 projects in Cluster 4.
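The spreadsheet steps can also be expressed in a few lines of Python. This is a sketch of the same filter-and-sort logic and not what was actually run: the dictionary keys `masterCall` and `ecMaxContribution` are assumed column names, the match is made case-insensitive here, and the sample rows are invented.

```python
def top_cluster4(projects, n=25):
    """Keep rows whose master call contains "cl4" and return the n best-funded ones."""
    cluster4 = [p for p in projects if "cl4" in p["masterCall"].lower()]
    return sorted(cluster4, key=lambda p: p["ecMaxContribution"], reverse=True)[:n]


# Invented rows: one of the three does not belong to Cluster 4.
sample = [
    {"id": "1", "masterCall": "HORIZON-CL4-2021-SPACE-01", "ecMaxContribution": 5.0},
    {"id": "2", "masterCall": "HORIZON-CL5-2021-D1-01", "ecMaxContribution": 9.0},
    {"id": "3", "masterCall": "HORIZON-CL4-2022-HUMAN-01", "ecMaxContribution": 7.0},
]
top = top_cluster4(sample, n=2)
```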

We then undertook a qualitative analysis of the top 10 EU-funded projects, which were divided between two researchers. We accessed more information on each project using the CORDIS URL structure by appending the project ID to the base URL. We noted key characteristics and keywords related to each project and analyzed project links and related websites. Notes were taken for each project to facilitate further analysis.
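Building the project page address is a simple string operation. In the sketch below the base URL is an assumption, since the report does not spell it out; the project ID is the one shown in the WP1 example output.

```python
# Assumed base URL; the report does not state it explicitly.
CORDIS_BASE = "https://cordis.europa.eu/project/id/"


def cordis_url(project_id):
    """Append the project ID to the base URL."""
    return f"{CORDIS_BASE}{project_id}"


url = cordis_url(101058523)
```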

Following this, we manually coded and categorized the top 25 projects using several criteria. The projects were categorized into sectors such as Space, Open Access, Green Industry, and Quantum. Additionally, we identified the primary beneficiaries or target groups for each project, categorizing them into Industry and Public. To calculate each project's share of EU funding, we first cleaned the "EC Max Contribution" data. We then summed the EC contribution for all 775 projects and divided each project's contribution by that total. The spreadsheet with the Top 25 projects and their analysis is available online.
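The share of funding per project can be sketched as each project's contribution divided by the sum over all projects. This is our reading of the calculation and not the spreadsheet formula itself; the function name and the sample amounts are hypothetical.

```python
def funding_shares(contributions):
    """Return each project's fraction of the summed EC contribution."""
    total = sum(contributions.values())
    if total == 0:
        return {project: 0.0 for project in contributions}
    return {project: amount / total for project, amount in contributions.items()}


# Invented amounts for three projects.
shares = funding_shares({"P1": 50.0, "P2": 30.0, "P3": 20.0})
```

Note that the denominator is the total over all Cluster 4 projects, not only the top 25, so the shares of the top 25 do not add up to one.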

5. Findings

WP1

Figure 2. Treemap diagram displaying the number of projects for each technology category and destination title.

As depicted above (fig. 2), projects referring to artificial intelligence technology form the biggest group within Cluster 4, followed by projects using advanced materials, manufacturing technologies, advanced computing and big data, and circular industries. Projects using quantum technologies, emerging enabling technologies, 5G & 6G, and blockchain form the smallest groups. Most projects using quantum or emerging enabling technologies fall under Digital & Emerging Technologies for Competitiveness and Fit for the Green Deal. IoT technologies, on the other hand, are most used by projects falling under World-leading Data and Computing Technologies. Finally, blockchain is most used by projects falling under A Human-Centered and Ethical Development of Digital and Industrial Technologies.

Figure 3. Treemap diagram displaying the total amount of funding for each technology category and destination title.

Figure 4: Treemap diagram displaying the number of projects for each destination title and technology category.

Looking at the technologies grouped per destination title (see fig. 4), we can see that advanced computing and big data are the technologies most used for three destination titles: Climate Neutral, Circular and Digitised Production (together with manufacturing technologies); Digital & Emerging Technologies for Competitiveness (together with artificial intelligence); and Increased Autonomy in Key Strategic Value Chains for Resilient Industry. Artificial intelligence is also the technology most used by projects in A Human-Centered and Ethical Development of Digital and Industrial Technologies and World-leading Data and Computing Technologies. Finally, space including earth observation is the technology most used under Open Strategic Autonomy in Developing, Deploying and Using Global Space-Based Infrastructures, Services, Applications and Data. Across destination titles, most projects fall under the destination Climate Neutral, Circular and Digitised Production, followed by Increased Autonomy in Key Strategic Value Chains for Resilient Industry.

Comparing the number of projects with the relative amount of funding received per destination title, we can see that A Human-Centered and Ethical Development of Digital and Industrial Technologies; Digital & Emerging Technologies for Competitiveness; and Increased Autonomy in Key Strategic Value Chains for Resilient Industry take up a smaller share of the funding than of the project count, meaning that projects under these destination titles received below-average funding. Projects in Climate Neutral, Circular and Digitised Production; Open Strategic Autonomy in Developing, Deploying and Using Global Space-Based Infrastructures, Services, Applications and Data; and World-leading Data and Computing Technologies, on the other hand, received above-average funding.

Figure 5a: Treemap diagram displaying the total amount of funding for each destination title and technology category.

Figure 5b: Alluvial graph of mentions of technology category per destination title.

These findings are also substantiated by the alluvial graph showing the number of projects per technology category and destination title (fig. 5b). From this graph, we can also conclude that artificial intelligence and advanced computing and big data are the technologies used in the largest number of projects in cluster 4, while the fewest projects use quantum technologies, emerging enabling technologies, 5G & 6G and next generation internet. Among the destination titles, Climate Neutral, Circular and Digitised Production and A Human-Centered and Ethical Development of Digital and Industrial Technologies do indeed have the largest number of projects. Furthermore, artificial intelligence and advanced computing and big data together make up almost half of the technologies mentioned by projects under Digital & Emerging Technologies for Competitiveness, Climate Neutral, Circular and Digitised Production, World-leading Data and Computing Technologies, and Increased Autonomy in Key Strategic Value Chains for Resilient Industry.

WP2

Close-reading of word frequencies

Through a deductive analysis of the six expected impacts of Cluster 4, as set out in both Horizon Europe’s Strategic Plan 2021-2024 and the two WPs covering 2021-2025, we condensed these impacts into the following overarching goals:

(i) leadership in the green and digital (twin) transition;

(ii) autonomy and resilience in industrial sectors;

(iii) secure ‘data’ economy;

(iv) autonomy in digital emergence;

(v) autonomy in space;

(vi) ethical and human-centered developments

Cluster 4 is therefore expected to contribute to an autonomous and ethical digital and green transition for economic security and social cohesion for Europe.

After analysing the instances of frequent words in the expected outcomes of Cluster 4, it became clear that many words were used with different goals and orientations. For example, the word “human” appeared 28 times in the expected outcomes of the destination “HUMAN” and 26 times in the destination “DIGITAL EMERGING.” By inferring context through a close-reading, we found that “human” was employed differently in the two destinations (see Table 2).

Table 2: Examples of occurrences of the word “human” in destinations “HUMAN” and “DIGITAL EMERGING” in WP 2021-2022 and WP 2023-2025

Destination: “HUMAN” | Destination: “DIGITAL EMERGING”
“The development process and system behavior of the technologies should explicitly acknowledge human values and needs and thereby enable social inclusion and environmentally friendly innovation.” | “Smooth and trustworthy (including safety and reliability) human-robot collaboration through advanced reactivity and mutual understanding, and human-centric automated adaptation of robots in human-robot interactions.”
“European ecosystem of top internet innovators, with the capacity to set the course of the Internet evolution according to a human-centric approach.” | “Achieve substantial “next step autonomy” in robots, undertaking non-repetitive tasks in realistic settings, including Human-Robot interactions.”
“Truly mixed human-AI initiatives for human empowerment.” | “Systems able to demonstrate beyond human performance in complex tasks, with high impact in key sectors, that show extended levels of adaptation and flexibility.”
“Create an active network and cross disciplinary communities on digital humanism bringing together ICT experts, ethnologists, sociologists and experts in fundamental rights Help defining and strengthening EU’s approach to a human-centered digital transformation through cross-disciplinary, world class foundational and application oriented research.” | “Robots with greatly improved intrinsically safe and efficient human-centric human-robot and robot-environment/objects physical interaction capabilities, at natural human speed or more.”

Although both destinations use “human” with human-centric developments of technology as a priority, destination “DIGITAL EMERGING” primarily uses “human” as a means of extending and improving human capabilities. The word “human” in destination “HUMAN” reiterates the importance of funding projects focused on human values, ethics and social cohesion in technological developments.
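The counting and close-reading steps described above can be approximated with a short script. This is a sketch under stated assumptions: the two excerpts are taken from Table 2 and stand in for the full expected-outcome texts, and the helper names `word_frequencies` and `keyword_in_context` are hypothetical.

```python
import re
from collections import Counter

# Short excerpts from Table 2, standing in for the full expected-outcome texts.
expected_outcomes = {
    "HUMAN": "Truly mixed human-AI initiatives for human empowerment.",
    "DIGITAL EMERGING": "Smooth and trustworthy human-robot collaboration.",
}


def word_frequencies(text):
    """Count lower-cased alphabetic tokens; hyphenated compounds are split."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def keyword_in_context(text, keyword, width=30):
    """Return a snippet of up to `width` characters either side of each whole-word match."""
    snippets = []
    for match in re.finditer(rf"\b{re.escape(keyword)}\b", text, flags=re.IGNORECASE):
        start = max(match.start() - width, 0)
        end = min(match.end() + width, len(text))
        snippets.append(text[start:end])
    return snippets


for destination, text in expected_outcomes.items():
    count = word_frequencies(text)["human"]
    print(destination, count, keyword_in_context(text, "human"))
```

The frequency count only flags where a word is prominent; the snippets are what the close-reading works from, since the same count can hide very different uses of the word.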

Network analysis

Figure 6: Semantic network-map of the most frequent words of the expected outcomes overlapping multiple destinations.




Catch-up and Go-beyond:

As suggested by the close-reading of HE’s Strategic Plan, autonomy stands at the forefront of the funding calls related to SPACE. The destination topic Open Strategic Autonomy in Developing, Deploying and Using Global Space-Based Infrastructures, Services, Applications and Data is on a mission to achieve independence. Terms like dependence, observation, enhance, availability, benefit, and objective are used, expressing a desire for control over available infrastructures to break out of dependence. This objective is framed as enhancement and benefit, as it allows for the observation of earthly activities from a territory beyond the EU's immediate reach. Upon close-reading, this sentiment is often mentioned in tandem with oversight and even foresight of geological phenomena. This is justified as a mitigation of risks that come from climate change and competition over resources.

Environmentalism and Industry:

SPACE terms bleed into the environmentalism cluster, which circles the TWIN TRANSITION topic: Climate Neutral, Circular and Digitised Production. It is important to note that the previous close-reading also found a desire for leadership within the green and digital transition, which is drowned out by the more frequent words captured in the network. This hub is saturated with climate jargon, such as footprint, waste, sustainability, recycling, and consumption. This illustrates how climate buzzwords have been normalized within the twin transition narrative. Some of these terms, such as carbon footprint, are promoted by industry players and shift the responsibility for sustainability to individuals. This comes with criticisms of the marketization of carbon off-setting projects, which are said to enable green-washing and colonial patterns of responsibility. This narrative justifies the funding of innovations that promise efficiency to mitigate risks rather than tackling the root problem.

There is a visible proximity between the TWIN TRANSITION and the RESILIENCE (Increased Autonomy in Key Strategic Value Chains for Resilient Industry) topic. We observe a desire to regain autonomy over value chain networks through the acceleration, exploration, and integration of initiatives. This is mentioned alongside the material side of value chains, namely exploitation as an act and mining as an industry. Above the cluster, we observe industry terms such as manufacturing, value, and innovative. To build a resilient industry throughout the digital-green transition, then, means to respond to climate problems with industry solutions.

Governance, Ecosystems, and Corporate Chit-Chat:

The cluster on governance closely links to the ecosystems cluster, alluding to the way in which the EU would like to govern the HUMAN (Human-Centred and Ethical Development of Digital and Industrial Technologies) topic. Important European values describe leadership as achieved through international, regional, and national collaboration, and partnership where needed. Ecosystems as a cluster describes the creation of communities, networks, synergies, and programmes through said human-centred developments of technologies. Implicitly, the co-occurrence of companies (private) and public next to these rather multilateral collaborations speaks to the multistakeholder model currently fronted by the EU in its emerging digital strategy.

When moving towards the topic DIGITAL EMERGING (Digital & Emerging Technologies for Competitiveness and Fit for the Green Deal), the European mission of competitiveness is framed as driven by a political economy of security. When close-reading instances of security, we primarily find the word pertaining to secure EU systems, such as the creation of secure networks and secure access, rather than stating explicit military values.

Missions around DATA (World-leading Data and Computing Technologies) burst with corporate chit-chat, such as appealing to stakeholders through roadmaps and scientific inquiry. This corporate chatter is sandwiched between the terms secure and leader/ship, which flow into challenges that are linked to cloud and edge computing as frequently mentioned innovations. This describes the desire to align with stakeholder interests by acquiring leadership in secure cloud and edge computing.

Gravitational Pull of Techno-solutionism:

Techno-optimism, or rather techno-solutionism, brings together all six destinations in the centre of the network: infrastructure, quantum, technology, capabilities, services, systems, markets, improve, efficiency, competitiveness. Meanwhile, the clusters surrounding the six destinations are freckled with directions pertaining to climate and material challenges, value chain and territorial autonomy, and ecosystem leadership. Techno-solutionism can be seen as the glue that keeps all six topics together, a centre of gravity, if you will.

The network distribution mirrors how innovation strategies involve the framing of new missions in relation to challenges that surround it. In this case the challenges are risks which can be mitigated by the promised efficiency of tech-solutions. Overall, this network shows how the EU’s mission-oriented investments aim to catch-up and go-beyond dependence by regaining control over material realities and innovation while fostering international and national ecosystems.
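The structure of such a network, in which words shared by several destinations end up near the centre, can be illustrated with a small sketch. It does not reproduce the tool or layout behind Figure 6; the word sets below are hypothetical samples and `shared_word_edges` is our own helper name.

```python
from collections import Counter, defaultdict

# Hypothetical frequent words per destination label.
frequent_words = {
    "SPACE": {"autonomy", "infrastructure", "observation"},
    "RESILIENCE": {"autonomy", "mining", "infrastructure"},
    "DATA": {"cloud", "infrastructure", "secure"},
}


def shared_word_edges(words_by_destination, min_destinations=2):
    """Return sorted (word, destination) edges for words used in several destinations."""
    destinations_by_word = defaultdict(set)
    for destination, words in words_by_destination.items():
        for word in words:
            destinations_by_word[word].add(destination)
    return sorted(
        (word, destination)
        for word, destinations in destinations_by_word.items()
        if len(destinations) >= min_destinations
        for destination in destinations
    )


edges = shared_word_edges(frequent_words)

# A word's degree is the number of destinations it connects to.
degree = Counter(word for word, _ in edges)
for word, n in degree.most_common():
    print(word, n)
```

Words with the highest degree connect the most destinations and would be pulled towards the middle of a force-directed layout, while words tied to a single destination stay at the periphery.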

WP3

Top Receivers

Figure 7. Treemap diagram displaying how much funding the organisations in cluster 4 receive.

The first step of our analysis was to describe the dominant funding recipients in the dataset (fig. 7). The dominant recipient is a German research organisation, Fraunhofer Gesellschaft, which receives almost twice as much funding as the second greatest recipient, Teknologian Tutkimuskeskus. While we initially hypothesized that private companies would receive the most money, this analysis indicates that, among individual recipients, research institutions significantly outcompete all other activity types, as also depicted in Figure 8 (below).

Figure 8. Treemap diagram showing the ten organisations that received the most funding in cluster 4, the amount of funding they received, and the technology category that was used most across the projects the organisation was part of.

As depicted above (fig. 8), each entity is shown with the technology category of the projects it most often contributes to. The dominant recipient, Fraunhofer, most often contributes to advanced materials projects, while the second greatest recipient, Teknologian, most often contributes to sustainability projects. Among the ten dominant entities, advanced materials receives the most funding, closely followed by sustainability. The space and internet domains are less represented in the top ten, indicating that their counterparts dominate the types of projects funded by HE.

Funding by country

Figure 9. Bar graph showing the amounts of funding per country.






The majority of the HE recipients are located within the EU (fig. 9). However, the distribution of these funds among EU countries is highly concentrated; the dominant national recipient is Germany, followed by Spain, Italy, France, and Greece. The bottom five recipient countries are, from greatest to least, Chile, Guinea, Georgia, South Korea, and Brazil. The countries of the actors receiving the most funding all fall within the EU, while those receiving the least all lie outside of it. This indicates that HE funding is primarily allocated to European actors.

Horizon Fund net contributions compared to total project cost

Figure 10. Bar graph showing the sum of netEcContribution and totalCost for each activityType.

This chart (fig. 10) depicts the difference between the netEcContribution – meaning the HE’s contribution – and the totalCost of the project corresponding to the actors’ activity types – meaning the category of the institution. This difference between netEcContribution and totalCost reveals how reliant each activity type is on the HE and if they have greater self or alternative funding pipelines; however, this analysis does not reveal the characteristics of those other pipelines.

Private companies receive the most funding while also having the most self or alternative funding power, with a nearly 200 million euro difference. Research institutions receive the second greatest amount of funding, but also appear to rely more on the HE to fund their total project cost. Universities receive the third greatest amount of funding which nearly accounts for their total project cost, indicating that they rely almost solely on the HE for their projects. ‘Other’ types of institutions – including industry federations, committees, and consortiums – receive the fourth greatest amount of funding, which also nearly matches their significantly smaller total project cost and indicates a sole reliance on the Horizon Fund. Lastly, public bodies – like municipal offices – receive the least amount of funding which accounts for less than half of their total project cost.
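The aggregation behind this comparison can be sketched as follows. The column names netEcContribution, totalCost and activityType come from the dataset as described above; everything else is an assumption made for illustration, including the sample rows, the comma-delimited layout, the activity-type codes other than PRC, and the function name `funding_gap_by_activity_type`.

```python
import csv
import io
from collections import defaultdict

# Hypothetical sample standing in for the real organisation file.
SAMPLE = """activityType,netEcContribution,totalCost
PRC,400000,700000
PRC,100000,200000
HES,300000,300000
PUB,50000,120000
"""


def funding_gap_by_activity_type(handle):
    """Sum netEcContribution and totalCost per activityType and derive the gap."""
    sums = defaultdict(lambda: {"netEcContribution": 0.0, "totalCost": 0.0})
    for row in csv.DictReader(handle):
        entry = sums[row["activityType"]]
        entry["netEcContribution"] += float(row["netEcContribution"])
        entry["totalCost"] += float(row["totalCost"])

    result = {}
    for activity_type, entry in sums.items():
        gap = entry["totalCost"] - entry["netEcContribution"]
        # Share of the total cost covered by the EC; None when no cost is recorded.
        covered = (
            entry["netEcContribution"] / entry["totalCost"]
            if entry["totalCost"]
            else None
        )
        result[activity_type] = {**entry, "gap": gap, "covered": covered}
    return result


gaps = funding_gap_by_activity_type(io.StringIO(SAMPLE))
for activity_type, values in gaps.items():
    print(activity_type, values["gap"], values["covered"])
```

A large gap signals self or alternative funding, while a covered share close to 1 signals near-total reliance on HE. Rows with a missing totalCost would need to be filtered out first, since they would otherwise distort both measures.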

In essence, private industries receive by far the greatest amount of funding while also demonstrating the greatest self or alternative funding power. At the same time, the allocation of funds to these types of entities demonstrates how technological innovation is not simply driven by private companies and free market dynamics but emerges through state funded partnerships between entities of different types.

Peculiarities in the data set

As alluded to in the Methodology section, we noticed some inconsistencies in the primary dataset. Firstly, for some projects the totalCost is not given or is set to zero, although the netEcContribution is greater than zero. This should not be possible, as the totalCost is a sum that includes the netEcContribution. The projects for which this was the case were primarily aerospace or chemical engineering projects, so one possible explanation is that this information would reveal trade secrets. Secondly, some big companies have subsidiaries located around the EU that receive funding. Within Cluster 4 alone, for example, six different subsidiaries of Ericsson and ten subsidiaries of Thales received funding. More concretely, Ericsson Telecomunicazioni SPA is located in Italy, while its parent company Ericsson is located in Sweden, so the funding received by Ericsson Telecomunicazioni SPA counts towards the total received by Italian rather than Swedish companies. Thirdly, the categorization used by the EC for the activityType does not include a category for military organisations. For example, Ministries of Defence are categorised as public institutions, and companies that have some variation of "defence" in their name are categorised as PRC (private organisations). Finally, we found some discrepancies between the different files in the dataset. For example, Project.csv contains nine projects from Cluster 4 that are not in organization.csv, which seems to be because the files were last edited at different points in time.
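Two of these checks are easy to automate. The sketch below assumes the records have already been loaded into Python dictionaries and lists; the project ids and amounts are hypothetical, as are the function names.

```python
def missing_total_cost(projects):
    """Ids of projects whose totalCost is absent or zero although netEcContribution is positive."""
    return [
        p["id"]
        for p in projects
        if p["netEcContribution"] > 0 and not p.get("totalCost")
    ]


def projects_without_organisations(project_ids, organisation_project_ids):
    """Ids present in the project file but absent from the organisation file."""
    return sorted(set(project_ids) - set(organisation_project_ids))


# Hypothetical records for illustration only.
projects = [
    {"id": "P1", "netEcContribution": 1_000_000.0, "totalCost": 1_500_000.0},
    {"id": "P2", "netEcContribution": 2_000_000.0, "totalCost": 0.0},
    {"id": "P3", "netEcContribution": 500_000.0, "totalCost": None},
]
organisation_project_ids = ["P1", "P2"]

print(missing_total_cost(projects))
print(projects_without_organisations(
    [p["id"] for p in projects], organisation_project_ids
))
```

Running checks like these before aggregating makes it clear which projects have to be excluded from, or flagged in, any comparison of netEcContribution and totalCost.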

WP4

Figure 11. Bar graph showing the ecMaxContribution of the 25 projects receiving the most funding grouped by sector.

We created a final graph by categorising the top 25 projects in Cluster 4 by sector (Green Industry, Space, Quantum and Open Source, fig. 11) and by target group (Public or Industry, fig. 12). We then created a sunburst diagram by copying the dataset of the top 25 projects into Raw Graphs. The hierarchy starts with the target group, continues to the sector, and ends with the acronym of each individual project. The layers are coloured by sector and sized by ecMaxContribution. The sunburst diagram thus displays the top 25 projects sorted within sectors and target groups. For example, the biggest project, SALTO, belongs to the space sector and targets industry, with the highest ecMaxContribution at 39 million euros. While 24 of the 25 projects benefit industry, only one benefits the public. Although not significantly bigger than the others, it is the second biggest project and goes towards a Commons Fund. The biggest projects with the most EU funding are notably located within the sectors of space, quantum computing and communication, and green industry (sustainable and recycling technology). These projects are all industry-specific and not aimed at public interests.

Figure 12. Sunburst diagram showing the ecMaxContribution of the 25 projects receiving the most funding grouped by sector, grouped by target group.





The above graph displays the top 25 EU-funded projects in Cluster 4 by sector. The four sectors are Open Source (5.7%), Quantum (35.8%), Green Industry (31.8%) and Space (26%). The top 25 projects receive about 10.6 percent (470,136,748.46 Euro) of the total EU funding for Cluster 4, which covers 775 projects with total funding of 4,444,228,526.68 Euro. Only one of the 25 projects directly benefits the public, whereas the others are targeted at industry. In the above graph, each project is represented by its acronym in the outermost ring. The top-funded project is SALTO, which receives 39 million euros. SALTO is a project dedicated to the production of a reusable space launcher for rockets.
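The share of Cluster 4 funding held by the top 25 projects can be recomputed directly from the two totals quoted above. The snippet is a simple arithmetic check; the constant and function names are our own.

```python
TOP_25_FUNDING = 470_136_748.46   # euro, as quoted above
TOTAL_FUNDING = 4_444_228_526.68  # euro, as quoted above


def share_percent(part, whole):
    """Return `part` as a percentage of `whole`."""
    return 100 * part / whole


print(f"{share_percent(TOP_25_FUNDING, TOTAL_FUNDING):.1f} percent")
```

The result depends on which total is used as the denominator, so the same check should be rerun whenever the underlying dataset is updated.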

Through a close-reading of the top 10 project descriptions, we found that many projects emphasise the competitiveness of the European Union in these fields with global powers such as the US or China. This was especially observed in the Space and Quantum categories. Furthermore, in the Quantum projects, gains in speed and security seem to be the most emphasised goals. Other patterns emerge from a qualitative close-reading of the project objectives. Among the Quantum projects, an overwhelming number of descriptions used the word revolutionise or revolutionary. We also found that Quantum projects were vaguely worded and ambiguous, in contrast to the green industry or space projects. Beyond Green Industry and Space, there are many investments in sustainable technology and technology for recycling materials. The second biggest project consists of a fund for open source technology and knowledge. This fund is divided into small sub-projects, whereas the other projects require many participants to collaborate. Thus, there is a difference in the type of projects funded in Cluster 4: while some are research projects or impact-based projects, others are funds that are then redistributed.

6. Discussion

We based our research on Mazzucato’s insight into the role of government in driving technological innovation. This runs counter to the conventional wisdom that it is the private sector, particularly start-ups, that plays the role of driver of innovation. We were interested in exploring the assumptions about the role of digital technologies by looking at how these assumptions are expressed in Horizon Europe’s programme. Beyond this explicit narrative, we aimed to look at implicit values as ascertained through funding allocations. We investigated whether there is a clear mission that could explain the decisions on how the EU spends money on digital technologies.

The analysis of the Horizon Europe project documents about Cluster 4 pointed to the following key findings. First, EU research and innovation funding in this Cluster prioritizes advanced technologies developed by the industry over those that more directly benefit the public and address their needs. Only a small portion of the total funding goes to alternatives to corporate digital solutions that people use in their everyday lives. Moreover, of the 25 projects receiving maximum funding from the EU, only one is intended for public use, while the remaining 24 are intended to target industry. While these findings can be explained, to some extent, by the scope of the Cluster, which is “Digital, Industry, and Space”, the imbalance between the three fields and in particular who directly benefits from the project (industry or the public) is significant. Secondly, across the different destinations, there is a mission of catching-up, regaining control over materials and technologies, and fostering national and international ecosystems. Moreover, all six of the destinations under Cluster 4 share a techno-optimist center of gravity, with technologies and infrastructures as solutions to the goals at the periphery of the network. Finally, our analysis revealed that 40 percent of EC funding in Cluster 4 goes to private entities, which confirms that the private sector benefits significantly from public funding.

In the context of the ‘entrepreneurial state’, Rancière’s concepts of ‘consensus’ and ‘dissensus’ offer an additional approach to reading the funding protocols of HE. Where technological innovation is largely assumed to take place in the free market logic of unrestrained competition, our data corroborates Mazzucato’s claim that the contemporary state plays a primary, yet underacknowledged, role in facilitating technological advancement. Rancière posits that the contemporary neoliberal regime prefers seamless agreement, or consensus, rather than contestation and friction between civil and institutional subjects and entities. Through its funding structure, HE brings together entities that, at most, compete with and, at least, are stratified from one another to promise Europe’s future through concentrated innovation power. This indicates an orientation towards innovation through consensus, where all parties are swayed to participate through the promise of mutual benefit rather than the assumed rationality of free market competition and disagreement. HE mediates innovation through consensus – facilitating partnerships between disparate entities – to solidify and concentrate its promise of the future while simultaneously mitigating the risks associated with free market temporalities.

Another useful theoretical framing for our analysis is Beck’s ‘Risk Society’. The ‘risk society’ arose when modernization, as in the development and employment of technologies, led to questions of managing political and economic risks. Unknown effects and threats of rapid development and change need to be assessed in order to calculate and minimise risks on a global level. These risk assessments include consequences on a social, political and economic level. Although the global threats described as ‘hazards’ by Beck need to be managed, the development of technologies also includes the management of risks by global actors staying competitive with others. Something that is first considered unpolitical can, due to its social, political, or economic consequences, become political and enter the public space and debate. Managing these risks includes a ‘reorganisation of power and authority’. Risk management can be interpreted as applying what is known, avoiding as much as is unknown, and constantly reassessing and adapting to threats that cannot be imagined at this point. Falling behind in the race of technological development becomes a threat for nations. The only way to manage the risk of being left behind is to hop onto the train of development. The EU invests sizable amounts of money in areas that supposedly allow it to stay competitive with hegemonic global actors such as the US and China. Horizon fund projects in Cluster 4 are often described with the mission-driven goal of being competitive within those fields. This narrative could be seen when looking at the space sector. When it comes to accessing space and reducing the environmental damage of space travel, investments in projects that construct a reusable space launcher are first and foremost developed to stay competitive (e.g. Project SALTO). It appears that the EU’s geopolitical power is tied to a desire for economic grandeur, or at least control of the infrastructures that facilitate economic growth.

Framing the EU as a risk society illustrates Mazzucato's hypothesis that the entrepreneurial state needs to formulate mission-driven innovation policy through directions. These directions could be described as responses to risks, such as a lack of autonomy, leadership, and control over territory and material realities. EU investment procedures aim to secure and repair its credibility as an entrepreneurial state by responding to (non-European) corporate control over infrastructures. For instance, investments into quantum computing reveal that funding missions are driven by security, managing the risk of dependence on third parties. The EU also expresses an interest in green tech alongside its digital transition, aiming to facilitate a cross-sectoral Single Market. While this could be seen as a flirtation with sustainability, it is rather an attempt to escape value chain and resource dependency during impending conflicts and climate emergencies. In fact, green tech is often unsustainable as it extracts resources and works towards increasing efficiency, which in turn increases industry output. If we look at the EU’s future, hydrogen is mentioned as a way to green energy leadership. This feeds into the greening of the steel industry, enabling potential economic power within a lost industry. It appears that the EU’s risk of losing the race on the next technology pushes the values of autonomy and leadership to the front and sustainability to the back.

We observe that innovation funding aims to make things less bad rather than attacking the root of problems, favouring investments into technologies that manage risks instead of solving them. The HE funding calls respond to risks by investing into technological centers, which pushes risks to the peripheries of the EU and beyond. EU risk assessment procedures frequently include an exploration of scenarios similar to how the insurance industry operates when evaluating future risks. This approach limits solutions to the hypothetical, yet sensible scenario (i.e. the risk of economic fall-back and dependence) rather than making space for solutions that serve people and the public. EU and HE narratives create a monopoly of consensus on what progress or innovation is, structuring the sensible within the EU around the desirability of advanced (“new”) technologies. In contrast, we argue that the EU must start funding directions that are driven by sufficiency rather than efficiency, in order to support and maintain technologies that address the needs of the people and planet.

7. Conclusion

Our analysis of both the narrative documents and the funding allocation data revealed assumptions about how the EU institutions view the role of technology in society in general, ideas about specific technologies and what they can achieve, as well as how this is operationalized.

At the same time, we need to keep in mind that Cluster 4 within the Horizon Europe program is one element of a much broader funding process. In interpreting the results, we have kept in mind that funding under the Horizon programs is specifically dedicated to research and innovation projects. Therefore, the conclusions we could draw were limited to this particular object of funding, which by definition (“research and innovation”) favours high-profile, "disruptive" innovations (such as AI) over incremental or less "glamorous" technological advances and the potentially more important repair, maintenance, and development of existing infrastructure that already serves the public interest.

To better understand how the EU is shaping digital infrastructure through fund allocation, we also need to look at other programs currently funding digital technologies in Europe. The most important in terms of scope and size are the Digital Europe and Connecting Europe Facility-Digital programs. This conclusion points to the need for further research that examines these different policy instruments.

8. References

Beck, Ulrich. 1992. Risk Society: Towards a New Modernity. Translated by Mark Ritter. London: Sage.

Directorate-General for Research and Innovation. ‘In the Horizon Dashboard, There Are Some Metrics Which I Don’t Quite Understand. Can You Please Elaborate?’ EU Funding & Tenders Portal, 20 July 2022. https://ec.europa.eu/info/funding-tenders/opportunities/portal/screen/support/faq;keywords=/11024.

European Commission. “EU Hydrogen Strategy under the EU Green Deal.” EU Hydrogen Strategy under the EU Green Deal, September 11, 2023. https://observatory.clean-hydrogen.europa.eu/eu-policy/eu-hydrogen-strategy-under-eu-green-deal.

European Commission. “European Multi-Stakeholder Platform on ICT Standardisation.” Shaping Europe’s digital future, June 13, 2024. https://digital-strategy.ec.europa.eu/en/policies/multi-stakeholder-platform-ict-standardisation.

European Commission. 2021. Horizon Europe Strategic Plan (2021-2024). LU: Publications Office. https://data.europa.eu/doi/10.2777/083753.

European Commission. “In the Horizon Dashboard, there are some metrics which I don't quite understand. Can you please elaborate?” European Union Funding and Tenders Portal. 2022. https://ec.europa.eu/info/funding-tenders/opportunities/portal/screen/support/faq;keywords=/11024.

European Commission. Organization activity type. 2018. Distributed by The Community Research and Development Information Service. https://data.europa.eu/data/datasets/cordisref-data?locale=en.

European Union. “CORDIS - EU Research Projects under HORIZON EUROPE (2021-2027).” European Data, June 28, 2023. https://data.europa.eu/data/datasets/cordis-eu-research-projects-under-horizon-europe-2021-2027?locale=en.

‘List of Largest Companies in Europe by Revenue’. In Wikipedia, 25 May 2024. https://en.wikipedia.org/w/index.php?title=List_of_largest_companies_in_Europe_by_revenue&oldid=1225622463.

Paterson, Matthew, and Johannes Stripple. “My Space: Governing Individuals’ Carbon Emissions.” Environment and Planning D: Society and Space 28, no. 2 (January 1, 2010): 341–62. https://doi.org/10.1068/d4109.

Publications Office of the European Union. “Horizon Europe Work Programme 2021-2022: 7. Digital, Industry and Space.” 2021. European Commission. https://ec.europa.eu/info/funding-tenders/opportunities/docs/2021-2027/horizon/wp-call/2021-2022/wp-7-digital-industry-and-space_horizon-2021-2022_en.pdf.

Publications Office of the European Union. “Horizon Europe Work Programme 2023-2025: 7. Digital, Industry and Space.” 2023. European Commission. https://ec.europa.eu/info/funding-tenders/opportunities/docs/2021-2027/horizon/wp-call/2023-2024/wp-7-digital-industry-and-space_horizon-2023-2024_en.pdf.

Publications Office of the European Union. “Open Superconducting Quantum Computers (OpenSuperQPlus).” 2023. Distributed by The Community Research and Development Information Service. https://cordis.europa.eu/project/id/101113946.

Publications Office of the European Union. Organizations. 2023. Distributed by The Community Research and Development Information Service. https://data.europa.eu/data/datasets/cordis-eu-research-projects-under-horizon-europe-2021-2027?locale=en.

Rancière, Jacques. Dissensus: On Politics and Aesthetics. London: Bloomsbury Publishing Plc, 2010. https://doi.org/10.5040/9781474249966.

Rancière, Jacques. The politics of aesthetics: The distribution of the sensible. Edited by Gabriel Rockhill. London, UK: Bloomsbury Academic, 2022.

Annex 1

Prompt design

Topics vocabulary

We used ChatGPT to transform the table of contents for Cluster 4 into a vocabulary for the various topics in all calls.

We got the vocabulary with the following prompt:

I need you to transform this information in a table with the following structure: column 1: "Topic" containing "HORIZON-CL4-2021-TWIN-TRANSITION-01-01" column 2: "Description" containing "AI enhanced robotics systems for smart manufacturing (AI, Data and Robotics - Made in Europe Partnerships) (IA)". When you are ready I will provide you with a PDF containing the information to be transformed.

We ran this prompt with GPT-4 on July 2nd 2024.

Extracting technologies

Attempt #1

In order to extract technologies from the objective of each project, we used the following prompt:

Extract all technologies related to [TOPIC DESCRIPTION] in the following text: [PROJECT OBJECTIVE]. Separate each term with a comma.

The prompt uses the topic description as a scope to filter out irrelevant terms by providing a more detailed vocabulary.

With this prompt, we get a result similar to this:

id | plainText
101058523 | Laser Beam, AI-enhanced optical beam shaping module, multispectral in-line process monitoring and control system application, PBF-LB/M, additive manufacturing, AI-based solutions, photonics, green manufacturing, laser-based technologies.

Pros: bottom-up approach

Cons: many irrelevant terms
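
As a minimal sketch of how the Attempt #1 prompt could be filled in for each project and how the comma-separated reply could be split into terms, assuming Python: the names build_extraction_prompt and parse_terms are hypothetical and not part of the project's actual tooling, and the step that sends the prompt to the model is deliberately left out.

```python
# Template copied from the Attempt #1 prompt; the two placeholders stand for
# [TOPIC DESCRIPTION] and [PROJECT OBJECTIVE].
PROMPT_TEMPLATE = (
    "Extract all technologies related to {topic_description} "
    "in the following text: {project_objective}. "
    "Separate each term with a comma."
)


def build_extraction_prompt(topic_description: str, project_objective: str) -> str:
    """Fill the template for one project (hypothetical helper)."""
    return PROMPT_TEMPLATE.format(
        topic_description=topic_description.strip(),
        project_objective=project_objective.strip(),
    )


def parse_terms(response: str) -> list[str]:
    """Split a comma-separated reply into terms, dropping empty entries,
    a trailing full stop, and case-insensitive duplicates (order is kept)."""
    seen = set()
    terms = []
    for raw in response.split(","):
        term = raw.strip().rstrip(".").strip()
        key = term.lower()
        if term and key not in seen:
            seen.add(key)
            terms.append(term)
    return terms


if __name__ == "__main__":
    reply = (
        "Laser Beam, AI-enhanced optical beam shaping module, "
        "additive manufacturing, photonics, laser-based technologies."
    )
    print(parse_terms(reply))
```

Splitting on commas keeps the bottom-up character of this attempt: every term the model returns is retained, so irrelevant terms still have to be filtered in a later step.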

Attempt #2

Using the EuroSciVoc taxonomy, we extracted technologies with the following prompt:

"Based on this text: [PROJECT OBJECTIVE], categorize the text according to these categories: ['energy and fuels', 'ecosystems', 'climatic changes', 'hydrogen energy', 'sustainability sciences', 'evolutionary biology', 'DNA', 'super resolution microscopy', 'nano-materials', 'nanotechnology', 'tissue engineering', 'stem cells', 'physiology', 'particle physics', 'statistical mechanics', 'statistics and probability', 'coronaviruses', 'sensors', 'alcohols', 'additive manufacturing', 'photocatalysis', 'coating and films', 'antibiotic resistance', 'cell metabolism', 'oncology', 'spintronics', 'data processing', 'computational intelligence', 'enzymes', 'comparative morphology', 'ornithology', 'drug discovery', 'melanoma', 'microfluidics', 'biosensors', 'fibers', 'drug resistance', 'arboriculture', 'forestry', 'databases', 'electric batteries', 'diagnostic imaging', 'virology', 'proteins', 'HIV', 'nutrition', 'electrochemistry', 'cell biology', 'microscopy', 'cardiovascular diseases', 'gene therapy', 'genomes', 'phycology', 'epigenetics', 'molecular biology', 'genetic engineering', 'lipids', 'inflammatory diseases', 'immunology', 'computational science', 'microbiology', 'homeostasis', 'proteomics', 'bacteriology', 'biological interactions', 'artificial intelligence', 'mathematics', 'synthetic biology', 'big data', 'dark matter', 'fermentation', 'inorganic compounds', 'alkaline earth metals', 'mycology', 'grains and oilseeds', 'planets', 'ecology', 'mutation', 'RNA', 'transition metals', 'metallurgy', 'acoustics', 'machine learning', 'genetics', 'nucleic acids', 'cardiology', 'prostate cancer', 'radiology', 'pathophysiology', 'developmental biology', 'wind power', 'desalination', 'public health', 'spectroscopy', 'wastewater treatment processes', 'pollution', 'ebola', 'vaccines', 'renewable energy', 'industrial biotechnology', 'electrolysis', 'biocatalysis', 'soil sciences', 'agriculture', 'simulation software', 'fisheries', 'waste treatment processes', 'freshwater 
ecosystems', 'plant protection', 'food safety', 'animal husbandry', 'infectious diseases', 'sustainable agriculture', 'hepatology', 'obesity', 'cardiac arrhythmia', 'personalized medicine', 'condensed matter physics', 'fermions', 'gluons', 'algebra', 'polymer sciences', 'crystals', 'soft matter physics', 'multiphysics', 'mathematical model', 'antibiotics', 'animal feed', 'textiles', 'arteriosclerosis', 'laboratory samples analysis', 'electromagnetism and electronics', 'coal', 'asthma', 'air pollution engineering', 'obstetrics', 'reverse osmosis', 'catalysis', 'nanomedicine', 'molecular and chemical physics', 'geometry', 'pathology', 'mass spectrometry', 'neurobiology', 'pandemics', 'invertebrate zoology', 'spacecraft', 'composites', 'multiple sclerosis', 'quantum field theory', 'laser physics', 'electrocatalysis', 'agricultural genetics', 'amines', 'biomaterials', 'solar energy', 'thermochemistry', 'natural language processing', 'cell signaling', 'aldehydes', 'polyurethane', 'manufacturing engineering', 'bioplastics', 'biomass', 'psychiatry', 'parkinson', 'radio technology', 'ultrasound', 'mathematical logic', 'algebraic topology', 'operator algebra', 'confocal microscopy', 'superconductivity', 'crystallography', 'biophysics', 'halogens', 'metabolic engineering', 'structural biology', 'recycling', 'electric energy', 'solid-state physics', 'gravitational waves', 'neutron stars', 'nuclear physics', 'lung cancer', 'biomolecules', 'internet of things', 'volcanology', 'leukemia', 'satellite technology', 'land-based treatment', 'x-ray astronomy', 'galaxy evolution', 'particle accelerator', 'geochemistry', 'cells technologies', 'carbohydrates', 'organic reactions', 'organometallic chemistry', 'leptons', 'colors', 'mobile phones', 'combinatorics', 'fetal medicine', 'embryology', 'alkali metals', 'graphene', 'optics', 'quantum optics', 'photons', 'virtual reality', 'cereals', 'quarks', 'knowledge engineering', 'alzheimer', 'medicinal chemistry', 'drinking water treatment 
processes', 'biosensing', 'botany', 'epidemics prevention', 'entomology', 'immunotherapy', 'reinforcement learning', 'food allergy', 'signal processing', 'hydrocarbons', 'arithmetics', 'autoimmune diseases', 'organic acids', 'pharmaceutical drugs', 'mammalogy', 'robotics', 'ontology', 'natural sciences', 'fibre optics', 'paleoecology', 'radar', 'hydrology', 'nanophotonics', 'pharmacokinetics', 'magnetic resonance imaging', 'conformal field theory', 'post-transition metals', 'thermodynamic engineering', 'ethnolichenology', 'nuclear medicine', 'bose-einstein condensates', 'cavity optomechanics', 'quantum computers', 'aliphatic compounds', 'health care services', 'malaria', 'epidemiology', 'neurology', 'clinical neurology', 'quantum gases', 'heuristic programming', 'topology', 'two-dimensional nanostructures', 'fuel cells', 'sport and fitness sciences', 'optical sensors', 'ethology', 'woodworking', 'pancreatic cancer', 'el niño', 'troposphere', 'noble gases', 'partial differential equations', 'antivirals', 'software development', 'cognitive neuroscience', 'analytical chemistry', 'atmospheric sciences', 'biological sciences', 'palaeontology', 'ophthalmology', 'smart cities', 'radio astronomy', 'architecture engineering', 'construction engineering', 'chromosomes', 'medical genetics', 'inflammatory bowel disease', 'biosphera', 'medical and health sciences', 'computational fluid dynamics', 'geothermal energy', 'chemical sciences', 'internet', 'optical networks', 'transplantation', 'dietetics', 'nucleotides', 'nuclear engineering', 'natural gas', 'deep learning', 'absorption spectroscopy', 'telecommunications', 'bladder cancer', 'retinopathy', 'allergology', 'stroke', 'functional analysis', 'asteroids', 'optoelectronics', 'organ on a chip', 'liquid crystals', 'cryptography', 'limnology', 'computer and information sciences', 'discrete mathematics', 'giant planets', 'breast cancer', 'waste management', 'water management', 'paleoclimatology', 'glaciology', 'graph theory', 
'primatology', 'data mining', 'influenza', 'ocean engineering', 'aeronautical engineering', 'volatile organic compounds', 'bioremediation', 'biocomposites', 'distillation', 'solar thermal', 'natural satellites', 'black holes', 'anatomy and morphology', 'dendrochronology', 'engineering and technology', 'ceramics', 'organic chemistry', 'vascular diseases', 'evolutionary ecology', 'nanoelectronics', 'posttraumatic stress disorder', 'neutrinos', 'astrophysics', 'cosmochemistry', 'supernova', 'implants', 'fluid statics', 'calorimetry', 'thermodynamics', 'root crops', 'pulsed lasers', 'photovoltaic', 'biodiversity conservation', 'electromagnetism', 'multidrug resistance', 'ketones', 'fruit growing', 'vegetable growing', 'quantum physics', 'nanoelectromechanical systems', 'theoretical physics', 'relativistic mechanics', 'semiconductivity', 'pattern recognition', 'mathematical physics', 'dynamical systems', 'L-functions', 'bioreactors', 'irrigation', 'metalloids', 'biofuels', 'emission spectroscopy', 'bioinorganic chemistry', 'infrared astronomy', 'histology', 'aircraft', 'soft robotics', 'cell polarity', 'sleep disorders', 'epilepsy', 'food technology', 'big bang', 'observational astronomy', 'ultrafast lasers', 'biochemistry', 'heredity', 'physiotherapy', 'software', 'rotorcraft', 'sustainable building', 'structural health monitoring', 'freshwater biology', 'telecommunications networks', 'colorectal cancer', 'marine biology', 'electrophoresis', 'supervised learning', 'photochemistry', 'nanocrystals', 'reproductive biology', 'physical geography', 'radiation chemistry', 'ecohydrology', 'atmospheric circulation', 'mining and mineral processing', 'rheumatology', 'space exploration', 'computer vision', 'structural engineering', 'liver cancer', 'mathematical analysis', 'numerical analysis', 'pharmacology and pharmacy', 'childbirth', 'physical sciences', 'celestial mechanics', 'nanocomposites', 'muscular dystrophies', 'classical mechanics', 'sea vessels', 'autonomous vehicles', 
'biogeochemistry', 'dementia', 'differential equations', 'hepatitis B', 'eHealth', 'medical biotechnology', 'algebraic geometry', 'lithology', 'commutative algebra', 'string theory', 'energy conversion', 'electron microscopy', 'legumes', 'pneumology', 'electrical engineering', 'gastroenterology', 'nuclear fusion', 'nuclear waste management', 'fluid dynamics', 'vehicle engineering', 'smart sensors', 'physical chemistry', 'environmental sciences', 'other engineering and technologies', 'physical cosmology', 'amyotrophic lateral sclerosis', 'atomic physics', 'remote sensing', 'diabetes', 'kidney diseases', 'natural disasters', 'planetary geology', 'tuberculosis', 'geology', 'scanning tunneling microscopy', 'drones', 'aerospace engineering', 'solid mechanics', 'viticulture', 'isotope geochemistry', 'agricultural biotechnology', 'environmental biotechnology', 'data networks', 'oceanography', 'nonlinear optics', 'aromatic compounds', 'data science', 'mineralogy', 'exoplanetology', 'clinical medicine', 'relational databases', 'transport layer', 'chemical engineering', 'materials engineering', 'home automation', 'polylactic acid', 'video games', 'viral genomes', 'prime numbers', 'heterocyclic compounds', 'surgical procedures', 'software applications', 'health sciences', 'sedimentology', 'hydrogeology', 'hematology', 'cerebrovascular diseases', 'bayesian statistics', 'schizophrenia', 'anxiety disorders', 'bioelectrochemistry', 'plant breeding', 'geographic information systems', 'toxicology', 'food packaging', 'RNA viruses', 'molecular spintronics', 'zoonosis', 'seismology', 'radio frequency', 'control systems', 'ichthyology', 'protein folding', 'electric power transmission', 'meteorology', 'urology', 'speech-language pathology', 'surgery', 'autonomous robots', 'carbon capture engineering', 'expert systems', 'business intelligence', 'urban horticulture', 'wearable medical technology ', 'reproductive medicine', 'bioleaching', 'paediatrics', 'dairy', 'computed tomography', 
'microelectronics', 'data protection', 'domestic animals', 'wave power', 'anaesthesiology', 'occupational health', 'petroleum', 'microtechnology', 'automotive engineering', 'sexual health', 'access control', 'peripheral vascular disease', '5G', 'mechanical engineering', 'venereal diseases', 'orthodontics', 'dermatology', 'fodder', 'plasma physics', 'separation technologies', 'computer security', 'horticulture', 'immunisation', 'naval engineering', 'dental implantology', 'cetology', 'operating systems', 'transfer learning', 'mobile network', 'critical care medicine', 'nephrology', 'analogue electronics', 'electrodialysis', 'heat engineering', 'power engineering', 'diagnostic technologies', 'drive by wire', 'supercomputers', 'DNA viruses', 'emergency medicine', 'computer processors', 'chemical process engineering', 'oilseeds', 'other agricultural sciences', 'hydrometeorology', 'bluetooth', '4G', 'electroporation', 'drug safety', 'paediatric cardiology', 'medical engineering', 'computational neuroscience', 'liquid fuels', 'piezoelectrics', 'system software', 'surgical specialties', 'gynaecology', 'otorhinolaryngology', 'quantum chemistry', 'orthopaedics', 'synthetic dyes', 'nuclear energy', 'skin cancer', 'glaucoma', 'LiFi', 'artificial bone', 'civil engineering', 'health care sciences', 'agronomy', 'electric power generation', 'compost', 'water supply systems', 'cartography', 'social biomedical sciences', 'medical laboratory technology', 'head and neck cancer', 'polyhydroxyalkanoates', 'microwave technology', 'invasive species', 'parasitology', 'astrochemistry', 'speleology', 'higgs bosons', 'molecular genetics', 'graft versus host disease', 'non-relational databases', 'plate tectonics', 'history of medicine', 'astronomy', 'computer hardware', 'solar physics', 'electronic engineering', 'extragalactic astronomy', 'swarm robotics', 'drainage basins', 'mesoscopic physics', 'game theory', 'nuclear decay', 'ozone depletion', 'comets', 'climatology', 'molecular 
engineering', 'complex analysis', 'control engineering', 'lab on a chip', 'unsupervised learning', 'geotechnics', 'geophysics', 'electric power distribution', 'biochemical engineering', 'nano-processes', 'drug allergy', 'eukaryotic genomes', 'meteorites', 'fluorescence lifetime imaging', 'squamous cell carcinoma', 'atmospheric pressure', 'plant cloning', 'geochronology', 'combined heat and power', 'white dwarfs', 'postnatal care', 'subtractive manufacturing', 'veterinary sciences', 'molecular evolution', 'semantic web', 'inorganic chemistry', 'mechatronics', 'molecular neuroscience', 'andrology', 'physical oceanography', 'applied mathematics', 'prokaryotic genomes', 'biological behavioural sciences', 'earth and related environmental sciences', 'solar astronomy', 'solar radiation', 'atmospheric turbulence', 'nursing', 'fluid mechanics', 'earthquake engineering', 'silviculture', 'lubrication', 'microseisms', 'duchenne muscular dystrophy', 'optical astronomy', 'silicene', 'symplectic topology', 'network security', 'digital electronics', 'linear algebra', 'remanufacturing', 'fourier analysis', 'product engineering', 'internet access', 'world wide web', 'environmental engineering', 'x-ray radiography', 'geomorphology', 'coastal ecosystems', 'apidology', 'functional equations', 'information engineering', 'carbon fibers', 'galactic astronomy', 'palynology', 'animal and dairy science', 'climatic zones', 'nuclear fission', 'dendroclimatology', 'gerontology', 'obsessive-compulsive disorder', 'robotic surgery', 'malicious software', 'clinical microbiology', 'family planning', 'water treatment processes', 'satellite radio', 'data exchange', 'hydroelectricity', 'isotope hydrology', 'metamorphic petrology', 'igneous petrology', 'petrography', 'ocean chemistry', 'stellar astronomy', 'tribology', 'port and harbor engineering', 'fossil energy', 'fixed wireless network', 'agricultural sciences', 'protozoology', 'adverse drug reactions', 'cognitive radio', 'coastal and estuarine 
hydraulics', 'hydraulic engineering', 'endocrinology', 'forensic sciences', 'agriculture', ' forestry', ' and fisheries', 'periodontics', 'zoology', 'astronautical engineering', 'coastal geography', 'edaphology', 'functional morphology', 'bioartificial liver', 'other medical sciences', 'web accessibility', 'odontology', 'cervical cancer', 'landscape ecology', 'behavioural ecology', 'cytology', 'super-Earths', 'gas giants', 'hepatitis C', 'computational creativity', 'industrial crops', 'airport engineering', 'sedimentary petrology', 'cognitive robots', 'tidal energy', 'seismic loading', 'mobile radio', 'compressed sensing', 'hybrid energy', 'transportation engineering', 'synthetic fuels', 'natural resources management', 'continuous glucose monitors', 'asteroseismology', 'concentrated solar power', 'geological engineering', 'phytoremediation', 'renal dialysis', 'virus mutation', 'hydroinformatics', 'tropical medicine', 'knot theory', 'petrology', 'modeling of diseases spread', 'marine energy', 'basic medicine', 'substance abuse', 'apiculture', 'diabetic nephropathy', 'device drivers', 'highway engineering', 'logarithmic functions', 'dendrology', 'fiber-optic network', 'urban engineering', 'meteors', 'sustainable architecture', 'applied mechanics', 'bioprocessing technologies', 'planetary sciences', 'web development', 'radiochemistry', 'ultraviolet lasers', 'internet protocols', 'WiFi', 'marker assisted selection', 'north atlantic oscillation', 'railroad engineering']. Do not introduce new categories, populate the column with one or more categories, separated by comma."

With this prompt, we get results such as:

['additive manufacturing', 'industrial biotechnology']

['additive manufacturing', 'nanotechnology', 'sustainability sciences', 'manufacturing engineering', 'composites']

['additive manufacturing', 'sustainability sciences', 'biosensors', 'biofuels', 'recycling', 'manufacturing engineering', 'industrial biotechnology']

['additive manufacturing', 'sustainability sciences', 'industrial biotechnology', 'manufacturing engineering', 'sustainable agriculture']

['additive manufacturing', 'sustainability sciences', 'remanufacturing']

Pros: consistent categories

Cons: the prompt is very long and requires many tokens
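
The trade-off noted above can be illustrated with a small sketch, assuming Python. It builds the Attempt #2 prompt from a category list and measures how many characters the fixed part of the prompt adds to every request. The function names are hypothetical, and character count is used only as a rough proxy for size; it is not a token count.

```python
def build_categorization_prompt(project_objective, categories):
    """Assemble the Attempt #2 prompt for one project (hypothetical helper).

    The category list is rendered as a Python-style list literal, matching
    the format shown in the prompt above.
    """
    return (
        f"Based on this text: {project_objective}, categorize the text "
        f"according to these categories: {list(categories)!r}. "
        "Do not introduce new categories, populate the column with one or "
        "more categories, separated by comma."
    )


def prompt_overhead_chars(categories):
    """Characters added to every request regardless of the project text."""
    return len(build_categorization_prompt("", categories))


if __name__ == "__main__":
    # Two tiny illustrative lists; the real taxonomy is far longer.
    short_list = ["additive manufacturing", "robotics"]
    longer_list = short_list + ["nanotechnology", "photonics", "biosensors"]
    print(prompt_overhead_chars(short_list))
    print(prompt_overhead_chars(longer_list))
```

Because the full category list is repeated in every request, this overhead is paid once per project, which is why a shorter list reduces cost across the whole dataset.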

Attempt #3

We decided to use a shorter list of technology categories to classify the data. These technologies are listed by the EC as goals for CL4. We also added the following categories, which were of particular interest to us: 5G, 6G, extended reality, augmented reality, virtual reality, metaverse, blockchain.

['manufacturing technologies', 'key digital technologies including quantum technologies', 'emerging enabling technologies', 'advanced materials', 'artificial intelligence', 'robotics', 'next generation internet', 'advanced computing and big data', 'circular industries', 'low carbon and clean industries', 'space including earth observation', '5G', '6G', 'extended reality', 'augmented reality', 'virtual reality', 'metaverse', 'blockchain']

We used the same ChatGPT prompt as in Attempt #2, but with the shorter list. This resulted in more consistent output, although a small number of records contained new categories made up by ChatGPT.
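
One way to catch such made-up categories is to check each reply against the allowed list. The following is a minimal sketch, assuming Python and assuming the reply is a Python-style list literal with straight quotes, as in the Attempt #2 results. The function name split_known_and_invented and the label 'digital twins' in the example are hypothetical and are not taken from the project's data.

```python
import ast

# The Attempt #3 category list, as given above.
ALLOWED_CATEGORIES = [
    "manufacturing technologies",
    "key digital technologies including quantum technologies",
    "emerging enabling technologies",
    "advanced materials",
    "artificial intelligence",
    "robotics",
    "next generation internet",
    "advanced computing and big data",
    "circular industries",
    "low carbon and clean industries",
    "space including earth observation",
    "5G", "6G", "extended reality", "augmented reality",
    "virtual reality", "metaverse", "blockchain",
]


def split_known_and_invented(response, allowed=ALLOWED_CATEGORIES):
    """Separate allowed categories from labels that are not in the list.

    Assumes `response` is a list literal such as "['robotics', '5G']";
    ast.literal_eval raises an error for anything else (for example a
    reply written with curly quotes).
    """
    parsed = ast.literal_eval(response.strip())
    allowed_by_key = {c.lower(): c for c in allowed}
    known, invented = [], []
    for item in parsed:
        label = str(item).strip()
        canonical = allowed_by_key.get(label.lower())
        if canonical is None:
            invented.append(label)      # not in the list: flag for review
        elif canonical not in known:
            known.append(canonical)     # keep the list's own spelling
    return known, invented


if __name__ == "__main__":
    # 'digital twins' is a hypothetical invented label, for illustration only.
    reply = "['Artificial Intelligence', 'robotics', 'digital twins']"
    print(split_known_and_invented(reply))
```

Returning the invented labels separately, rather than silently dropping them, leaves it to the researcher to decide whether a record should be re-run or corrected by hand.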

Annex 2

All cleaned datasets used in this project are openly available here: https://drive.google.com/drive/folders/1pUjTzBF8IdWIq0aGfK75w4yA88kEuhJt?usp=drive_link

Topic revision: r2 - 24 Jul 2024, ZuzaWarso