Open access peer-reviewed chapter

Perspective Chapter: Open Science Rejuvenation with AI – The Past, Present and Future Dimensions

Written By

Mayukh Sarkar and Sruti Biswas

Submitted: 19 July 2023 Reviewed: 27 July 2023 Published: 01 December 2023

DOI: 10.5772/intechopen.1003267

From the Edited Volume

Open-Source Horizons - Challenges and Opportunities for Collaboration and Innovation

Laura M. Castro

Abstract

Open Science began with a vision of advancing scientific knowledge through availability, accessibility, reusability, and transparency, so as to democratise the complete research cycle across all sectors of society irrespective of class or community. It has since coalesced with various strands of the “Open movement” and extended its reach from STEM subjects to the whole universe of disciplines. The advent of Artificial Intelligence (AI), with machine learning (ML) and its specialisations such as deep learning (DL), reinforcement learning (RL) and genetic algorithms (GA), has produced intelligent, expert, and decision support systems that are transforming contemporary technologies. By providing a powerful discovery engine for analysis, retrieval and transfer of data, hypothesis/metrics generation, and the assessment of research originality, AI opens up new opportunities in the domain of Open Science while eroding the commercial interests of enterprises. The chapter therefore portrays the symbiosis of Open Science and AI against its historical antecedents and progressive evolution, examines the AI drivers (ML, DL, RL, and GA) and enablers (natural language processing, computer vision, ontology and knowledge graph) practicable in Open Science, and evaluates recent Open Science and AI amendments of global confederations.

Keywords

  • Open Science
  • artificial intelligence (AI)
  • machine learning (ML)
  • deep learning (DL)
  • reinforcement learning (RL)
  • genetic algorithms (GA)

1. Introduction

A close reading of encyclopaedic, dictionary and scholarly definitions shows that the term “Science”, derived from the Latin word “Scientia”, meaning “knowledge”, refers to the holistic human endeavour to understand the natural and other unexplained phenomena of the universe through a distinctive logical method comprising hypothesis construction, experimentation, observation, analysis and the derivation of results; the specific methods are diverse and have evolved over time in order to establish theory. As Sagan [1] rightly mentioned, “A central idea of science is that to understand complex issues (or even simple ones), we must try to free our minds of dogma and to guarantee the freedom to publish, to contradict and to experiment. Arguments from authority are unacceptable.” In the modern world of data and information, the notion of “Openness” is a fascinating one: “Open means anyone can freely access, use, modify, and share for any purpose (subject, at most, to requirements that preserve provenance and openness)”, as concretely defined by the Open Knowledge Foundation [2] from the perspective of data and content. Nevertheless, in reality, it is a multifaceted term associated with the global “Open movement” beyond any context-based definition. Merging these two distinct concepts, Open Science emerged as a new field of study and became an inextricable component of the “Open movement”, which began when scientific journals were first published in the seventeenth century [3]. The Facilitate Open Science Training for European Research project has defined “Open Science” as “the practice of science in such a way that others can collaborate and contribute, where research data, lab notes and other research processes are freely available, under terms that enable reuse, redistribution and reproduction of the research and its underlying data and methods” [4]. According to the European Commission [5], “Open Science is a system change allowing for better science through open and collaborative ways of producing and sharing knowledge and data, as early as possible in the research process, and for communicating and sharing results.” According to UNESCO’s [6] draft recommendation, “Open Science [is] an inclusive construct that combines various movements and practices aiming to make multilingual scientific knowledge openly available, accessible and reusable for everyone, to increase scientific collaborations and sharing of information for the benefits of science and society, and to open the processes of scientific knowledge creation, evaluation and communication to societal actors beyond the traditional scientific community.”

Although the Open Science movement began with the goal of promoting scientific knowledge in the modern STEM (science, technology, engineering, and mathematics) subjects, it has since expanded to encompass a wide range of fields, all under the umbrella term “Open Science.” Drawing on the standard definitions, Open Science thus refers to a noble and evolving set of ideas aimed at advancing scientific knowledge, from the realm of STEM subjects to the universe of disciplines, with the values of availability, accessibility, reusability, and transparency, in order to democratise the complete research cycle across all sectors of society, beyond borders, paywalls and intellectual patterns, irrespective of class or community. It has successively coalesced with various strands of the “Open movement”, including Open Access, Open Source, Open Data/FAIR Data/Open Knowledge, Open Research/Methodology, Open Scholarships, Open Peer Review (which also includes Open Identities and Open Interactions), Open Metrics/Impact, Open policies, and Open Educational Resources.

To understand Artificial Intelligence (AI), let us first consider the concept of intelligence (of humans or other sentient creatures), which refers to the “mental quality that consists of the abilities to learn from experience, adapt to new situations, understand and handle abstract concepts, and use knowledge to manipulate one’s environment” [7]. According to Poole and Mackworth [8], AI is the field that studies the synthesis and analysis of “computational agents” (whose activities are described in terms of human or computer-based computation) that “act intelligently” (that is, an agent that gets smarter as it goes along, takes into account both the short-term and long-term effects of its actions when making decisions, adapts to different situations and circumstances, and selects the most appropriate options). AI is thus an evolving agent system that simulates human intelligence in machines partially or entirely, or even goes beyond the intellectual capability of the human brain, in order to acquire and apply the cognitive functions mentioned at the start of this paragraph, thereby unleashing new possibilities in various domains.

The advent of AI, with machine learning (ML) and its specialisations such as deep learning (DL), reinforcement learning (RL) and genetic algorithms (GA), has produced intelligent, expert, and decision support systems that are transforming contemporary technologies. By providing a powerful discovery engine for analysis, retrieval and transfer of data, hypothesis/metrics generation, and the assessment of research originality, AI opens up new opportunities in the domain of Open Science while also transforming the commercial interests of enterprises. The rapid development of AI research and its growing application in Open Science call for a reform of policy frameworks in line with global AI policies. The chapter therefore portrays the symbiosis of Open Science and AI against its historical antecedents and progressive evolution, examines the AI drivers (ML, DL, RL, and GA) practicable in Open Science, and evaluates recent Open Science and AI amendments of global confederations (European Commission, UNESCO, COS, OECD). This should help policymakers to recommend a holistic policy guideline for combining the two notions while avoiding the ethical issues related to them.

2. Open Science with AI: the beginning

The origin of AI’s application to Open Science is difficult to pinpoint. Although Open Science and AI each developed over a lengthy period, they only came together in the last two or three decades, as reflected in the “Guerilla Open Access Manifesto” [9]. In the same year, the concept of digital shadow libraries was born and spread across the globe when “Library Genesis” or “LibGen”, an initiative of Russian scientists, went online [10]. In 2011, the Kazakhstani programmer Alexandra Elbakyan, Aaron Swartz’s worthy successor, pushed the Open Access campaign to the next level by breaching the paywall shield, extending his reinterpretation of what access to academic literature means, and creating Sci-Hub [11] (an online application written in PHP), which soon introduced new databases, servers and multiple mirrors able to interlink with the LibGen databases. Despite several obstacles, such as legal actions, hacker attacks, frequent blocking by ISPs/operators, and changing domains, both applications are still functioning; e.g., an analysis of Sci-Hub’s logs showed 28 million download requests from September 1, 2015, to February 29, 2016 [12], and more than 1.2 million new records were added to the LibGen database between January 2008 and April 2014 [13]. According to Houle’s [14] analysis of 2750 random samples from 55 databases, the full-text retrieval rates of Sci-Hub and LibGen were 70% and 69%, respectively, in April 2017. This trend continues into 2023, with academics all over the world using Sci-Hub. A particularly notable example is that, among 6632 medical students from six Latin American countries (Argentina, Bolivia, Chile, Colombia, Paraguay, and Peru), 10.3% used Sci-Hub at least once a week to consult scientific journals [15], a view echoed by Ajani et al. [16], who describe it as “a blessing in disguise to library users.” In addition, according to data generated by Sci-Hub [17] over the past month, the top five nations in terms of downloads are China (45.65 M), the United States (27.28 M), Brazil (7.67 M), India (2.79 M), and Russia (2.79 M).

Besides the guerilla movement for OA to information (recall Aaron Swartz’s [9] famous quote “information is power”), another silent revolution emphasised data (as the mathematician Clive Humby [18] put it, “data is the new oil”). It took the form of ArnetMiner, an intelligent system for the academic community and an application of AI to Open Science that extracts and mines scholarly social networks in five major steps: (i) auto-extraction of researchers’ profiles and their publication data through web-mining, (ii) integrating them using a name identifier, (iii) storing and indexing the data using MySQL and an inverted file index, respectively, (iv) data modelling, and (v) executing search results [19].
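The inverted file index mentioned in step (iii) can be illustrated with a minimal sketch. It is not ArnetMiner's actual implementation: the sample records and the names `tokenize`, `build_inverted_index` and `search` are hypothetical, chosen only to show how such an index maps each term to the records containing it, so that a query is answered by intersecting sets.

```python
from collections import defaultdict


def tokenize(text):
    """Lowercase a string and split it into alphanumeric terms."""
    cleaned = "".join(ch.lower() if ch.isalnum() else " " for ch in text)
    return cleaned.split()


def build_inverted_index(records):
    """Map each term to the set of record ids whose text contains it."""
    index = defaultdict(set)
    for record_id, text in records.items():
        for term in tokenize(text):
            index[term].add(record_id)
    return index


def search(index, query):
    """Return the ids of records that contain every term of the query."""
    terms = tokenize(query)
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


# Hypothetical publication titles, for illustration only.
records = {
    1: "Mining academic social networks",
    2: "Deep learning for citation analysis",
    3: "Social network analysis with deep learning",
}
index = build_inverted_index(records)
print(sorted(search(index, "deep learning")))  # [2, 3]
print(sorted(search(index, "social")))         # [1, 3]
```

A production system would add stemming, ranking and persistent storage; the sketch only shows why lookups by term are fast once the index exists.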

3. Open Science with AI: the present

The present paradigm of Open Science revolves around AI-based applications that align with Open Science goals (publishing and disseminating research ideas efficiently, removing linguistic and sharing hindrances, encouraging transparency/originality in research, and supporting collaborative networks) and that assist academics worldwide through various intelligent tools. Each of these goals can be served by AI. For the first, auto-summarisation, identifying and predicting emerging areas of research interest (data science and data analytics), generating new book ideas (natural language generation, NLG), estimating book performance (predicting the potential of a book before it is written), checking language/grammar, editing manuscripts, converting keywords to sentences and sentences to keywords, and managing references make it easier for the scholarly community to publish and communicate research. Machine-generated books/literature, auto-translation, workflow suggestions (recommendation systems), text-to-speech and video transcription (NLG), and the sharing of best practices help remove linguistic and sharing barriers. Plagiarism checking, research integrity checks, bias recognition, natural language processing, and sentiment analysis promote transparency. Finally, AI supports collaborative networks by identifying potential authors (BAIT), peer reviewers, editors and editorial board members, matching topics and stakeholders using recommendation systems, and identifying specific communities or researcher groups online using sentiment analysis.

In tandem with this, preprints and author services driven by the futuristic vision of Open Science are becoming more prominent on enterprise/community-oriented platforms, which has significantly increased their visibility. Research Square (https://www.researchsquare.com/) and its subdivision American Journal Experts (https://www.aje.com/#) are among the pioneering enterprises that have acquired expertise in providing the services described earlier without violating ethical norms. Furthermore, IntechOpen (https://www.intechopen.com/), a book and journal publishing house, is an example of a significant industrial initiative in support of Open Science: it adheres to the OA/Open Science principles (recognised by the Budapest Initiative, International Association of STM Publishers; ALPSP; COPE; CC; Crossref; OASPA) and is indexed in prominent platforms (WoS, Scopus, BKCI, BIOSIS Previews, Zoological Record etc.). Another OA journal publishing platform, Frontiers Media (https://www.frontiersin.org/), also combines Open Science with AI, using a state-of-the-art platform for peer review, semantic algorithms, an extensive reviewer database with article-level metrics and researcher profiles for high visibility, as well as a streamlined production workflow and a digital editorial office. Because of this trend towards greater openness and access to research ideas among researchers, modern enterprises have reshaped their business interests to keep up with the changing environment. For example, most leading publishing houses, such as Elsevier, Emerald, Sage, Springer, Taylor & Francis and Wiley, have started offering an OA publishing option to academics, which can itself be seen as a sign of change in favour of the open science movement.

eLife Sciences Publications Ltd. (https://elifesciences.org/), an online non-profit OA journal publishing house for life sciences and biomedical research, has developed a platform for research communication that attempts to speed up the discovery process by using AI. Since 2017 it has designed two noteworthy projects, ScienceBeam [20, 21] and PeerScout [22]. Built with Apache Beam and TensorFlow, ScienceBeam seeks to unlock the immense store of scientific knowledge held in PDF files by generating a complete XML document using computer vision and natural language processing. PeerScout uses an ML-trained API and NLP to locate relevant peer reviewers in an existing pool, based on the characteristics of the articles they have reviewed and authored. Besides these two landmarks, NLP and general ML approaches have been used in other projects to analyse the context and sentiment of citations. The technology and data science team builds its products in an open-source environment and makes them available on GitHub under permissive open-source licences [23].

Wang [24] describes a groundbreaking investigation in which a Microsoft research group set out to scour the Web for research artefacts and to pull together current scholarly information, represented as a web-scale heterogeneous entity graph/knowledge graph (known as the Microsoft Academic Graph or MAG), using AI-powered agents trained in natural language understanding and reinforcement learning [25, 26]. This research enabled Microsoft to launch an analytics, discovery and auto-distribution service called Microsoft Academic Services (MAS), integrating MAG, an Azure Storage account, the Microsoft Academic Knowledge API and the Microsoft Academic Knowledge Exploration Service (MAKES). In 2017, an improved research version was released under the ODC-BY licence. Known as the Open Academic Graph (OAG), it interlinks MAG and ArnetMiner in a large-scale linked graph (a dataset of 0.7 billion entities and 2 billion relationships) intended for investigating citation networks, collaboration and content analysis, and for finding out how diverse academic graphs interact with one another. The developers presented a unified LinKG framework with three linking modules (a. venue name matching and sequence encoding, b. hashing and convolutional neural networks, and c. heterogeneous graph attention networks) for three entities (a. venue, b. paper and c. author) that can efficiently handle the challenge of building a large-scale linked entity graph [27]. The project is still under development (OAG v.2.1 was released in November 2020): a new affiliation entity has been added to the previous entities, and 16,384, 29,948, 119,384,813, and 1,829,385 linking relations have been generated between the two graphs for the affiliation, venue, paper and author entities, respectively.

The following subsections discuss the integration of Open Science with AI under two distinct terms, “drivers” and “enablers”, which represent different aspects influencing the development, adoption, and growth of AI technologies. AI drivers are the major factors that trigger the advancement, application, and integration of AI technologies. AI enablers, on the other hand, are the factors, essentials, or resources that create an environment conducive to the development, use, successful implementation and growth of AI technologies.

3.1 AI drivers for Open Science

Since the dawn of the new millennium, AI research has seen multiple rounds of rapid advancement, and even the subfields of each major field have become specialised areas. Since the founding of AI at the Dartmouth College workshop in 1956, the field has grown richer in theory and in learning algorithms [28]. It is not easy to trace every AI breakthrough from its foundations while the expansion of Open Science is itself in progress. With this limitation in mind, we consider only the AI drivers practicable in Open Science. We mention here a few that have been used, or are still in the experimental pipeline, to contribute to Open Science and to reach beyond its goals.

3.1.1 Machine learning (ML)

Machine learning (ML) refers to computational algorithms and approaches that identify a hypothesis from a vast and complex space of possibilities, at the cost of losing only minor data points, by simulating the intelligence of humans or other sentient creatures and adapting to the environment [29]. This is done by recognising patterns in the input data/entities through several learning mechanisms: supervised learning (each training sample/characteristic of the input data is linked with its known categorisation label), unsupervised learning (the algorithm determines its own path from the unlabelled training data and improves with each attempt) and semi-supervised learning (which uses both labelled and unlabelled data, where the labelled part helps in learning the unlabelled part). ML algorithms are further divided into distinct classes depending on the type of input, the learning process and the kind of model [30].

3.1.2 Deep learning (DL)

Deep learning (DL) is a subset of ML that relies on artificial neural networks (ANN) to learn several representations simultaneously. DL systems are “a class of multi-layered networks capable of automatically learning meaningful hierarchical representations from various structured and unstructured data” at the cutting edge of ML innovation [31]. DL advances have made it possible to construct novel representations, extract information, and draw inferences from complex data sources such as photos, videos, texts, speech, time series, and other discrete events. Different ANN algorithms, such as the perceptron neural network (PNN), back propagation (BP), self-organising network (SON), self-organising map (SOM), and learning vector quantisation (LVQ), work at the back end of a DL structure, whose common algorithms include the restricted Boltzmann machine (RBM), deep belief network (DBN), convolutional neural network (CNN), and stacked auto-encoder (S-AE) [32].

3.1.3 Reinforcement learning (RL)

Reinforcement learning (RL), a specific flavour of ML, deals with the state of an AI-powered agent system at distinct time points and examines situations in which the predictions made by a hypothesis influence the generation of new data points, without access to labelled ones. It differs from traditional ML techniques (which use supervised and unsupervised learning mechanisms): the agent must be able to sense the environment and act on the problem as a whole, giving equal importance to its goal-directed objective and to the uncertainty of the environment [33]. Tabular TD(0), tabular TD(λ), TD(λ) with linear function approximation, every-visit Monte-Carlo, gradient temporal difference (GTD2), least-squares temporal difference (LSTD), least-squares policy evaluation (LSPE), PAC-MDP, and SARSA(λ) with linear function approximation are some effective RL algorithms [34].
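To make the first algorithm in this list concrete, the following is a minimal sketch of tabular TD(0) policy evaluation. The task (a five-state random walk), the step size, the number of episodes and the function names are assumptions made for illustration, not details taken from the cited sources. After each transition, the value estimate of the current state is moved a small step towards the observed reward plus the discounted estimate of the next state.

```python
import random


def td0_update(values, state, reward, next_state, alpha, gamma):
    """Apply one tabular TD(0) update to values[state] in place."""
    td_error = reward + gamma * values[next_state] - values[state]
    values[state] += alpha * td_error
    return td_error


def evaluate_random_walk(episodes=2000, alpha=0.05, gamma=1.0, seed=0):
    """Estimate state values of a 5-state random walk with TD(0).

    States 1..5 are non-terminal; states 0 and 6 are terminal. Every step
    moves left or right with equal probability. Reaching state 6 yields a
    reward of 1; every other transition yields a reward of 0.
    """
    rng = random.Random(seed)
    values = [0.0] + [0.5] * 5 + [0.0]  # terminal values stay at 0
    for _ in range(episodes):
        state = 3  # every episode starts in the middle
        while state not in (0, 6):
            next_state = state + rng.choice((-1, 1))
            reward = 1.0 if next_state == 6 else 0.0
            td0_update(values, state, reward, next_state, alpha, gamma)
            state = next_state
    return values


values = evaluate_random_walk()
# For this task the true values of states 1..5 are 1/6, 2/6, ..., 5/6,
# so the estimates should rise from left to right.
print([round(v, 2) for v in values[1:6]])
```

No labelled examples are involved: the agent learns only from the rewards its own moves produce, which is the distinction from supervised learning drawn in the paragraph above.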

3.1.4 Genetic algorithms (GA)

Using evolutionary modelling, genetic algorithms (GA) create and alter (by artificial evolution) a search procedure, combining heuristic and metaheuristic search, based on natural selection and genetic mechanisms; the cycle of initialisation, crossover, mutation, fitness computation, selection, and termination implements Darwin’s principle of survival of the fittest [35, 36]. GA is essentially a variant of “evolutionary algorithms” (which perform complex optimisation or neural network learning, replacing a solution with a preferable one while retaining the flexibility to evolve in different situations) and of “evolutionary intelligence” (the ability to extend and integrate existing intelligence to allow for new forms). It is part of the broader field of “evolutionary computation”, which focuses on resolving encoding issues, handling constraints and introducing DNA computing [37, 38, 39].
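The cycle described above can be sketched on a toy problem. The example below is an illustrative assumption rather than a method from the cited works: it evolves bit strings to maximise the number of 1-bits (the "OneMax" problem), and the operator choices (tournament selection, one-point crossover, bit-flip mutation, elitism) and all parameter values are picked only to keep the sketch short.

```python
import random


def fitness(individual):
    """OneMax fitness: the number of 1-bits in the bit string."""
    return sum(individual)


def tournament_select(population, rng, k=3):
    """Return the fittest of k randomly drawn individuals."""
    return max(rng.sample(population, k), key=fitness)


def crossover(parent_a, parent_b, rng):
    """One-point crossover producing a single child."""
    point = rng.randint(1, len(parent_a) - 1)
    return parent_a[:point] + parent_b[point:]


def mutate(individual, rng, rate):
    """Flip each bit independently with probability `rate`."""
    return [1 - bit if rng.random() < rate else bit for bit in individual]


def run_ga(n_bits=20, pop_size=30, generations=100, rate=0.02, seed=0):
    rng = random.Random(seed)
    # Initialisation: a random population of bit strings.
    population = [[rng.randint(0, 1) for _ in range(n_bits)]
                  for _ in range(pop_size)]
    history = []  # best fitness seen at the start of each generation
    for _ in range(generations):
        best = max(population, key=fitness)  # fitness computation
        history.append(fitness(best))
        if fitness(best) == n_bits:  # termination: optimum reached
            break
        next_population = [best]  # elitism: the current best survives
        while len(next_population) < pop_size:
            parent_a = tournament_select(population, rng)  # selection
            parent_b = tournament_select(population, rng)
            child = crossover(parent_a, parent_b, rng)     # crossover
            next_population.append(mutate(child, rng, rate))  # mutation
        population = next_population
    return max(population, key=fitness), history


best, history = run_ga()
print(fitness(best), len(history))
```

Because the best individual is always copied into the next generation, the recorded best fitness never decreases, which is a simple way to observe "survival of the fittest" in the run.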

3.2 AI enablers for Open Science

In addition to the above-mentioned AI drivers, some enablers can help make Open Science future-proof. We go over a few of the key enablers below:

3.2.1 Natural language processing (NLP)

In computer science and linguistics, natural language processing (NLP) refers to machines’ ability to recognise and understand human language, with wide-ranging applications such as machine translation, speech recognition, grammatical and semantic analysis, sentiment analysis, text summarisation, information extraction, text and audio generation, text-to-speech and speech-to-text conversion, question-answering systems, dialogue systems, chatbots, and voicebots [40]. In mid-2017, researchers at Google introduced the “Transformer”, a novel architecture that, by combining three mechanisms (an encoder-decoder framework, attention, and transfer learning), became a benchmark in NLP research. The two popular transformer families, Generative Pretrained Transformer (GPT) and Bidirectional Encoder Representations from Transformers (BERT), together with later variants such as GPT-Neo, GPT-J and DeBERTa, are now in the mainstream [41].
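Of the mechanisms named above, attention is the easiest to show in a few lines. The sketch below implements scaled dot-product attention, the form used in the Transformer architecture, in plain Python; the toy vectors and function names are assumptions for illustration, and real models add learned projections, multiple heads and batched tensor operations. Each query is compared with every key, the scores are turned into weights with a softmax, and the output is the weighted average of the values.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of numbers."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries, keys, values):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V."""
    d_k = len(keys[0])
    outputs = []
    for q in queries:
        # Similarity of this query to every key, scaled by sqrt(d_k).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k)
                  for k in keys]
        weights = softmax(scores)
        # Weighted average of the value vectors.
        outputs.append([sum(w * v[j] for w, v in zip(weights, values))
                        for j in range(len(values[0]))])
    return outputs


# Toy example: both keys are identical, so the query attends to them
# equally and the output is the plain average of the two values.
out = attention(queries=[[1.0, 0.0]],
                keys=[[1.0, 1.0], [1.0, 1.0]],
                values=[[0.0, 2.0], [4.0, 6.0]])
print(out)  # [[2.0, 4.0]]
```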

3.2.2 Computer vision (CV)

The domain of computer vision (CV) deals with studying and developing algorithms that enable computers to identify, process, analyse, and interpret digital objects with visual properties, such as images and videos. As rightly mentioned by Bekhit [42], “if AI enables computers to think, computer vision enables them to see, observe, and understand.” The methodology is carried out in several stages using a range of techniques, namely (i) image preprocessing, which includes grayscale manipulation, edge enhancement and detection, noise removal, image restoration, and interpolation, (ii) image segmentation, (iii) image processing, which includes feature extraction, texture analysis, and pattern recognition, and (iv) image classification. Currently, the field is rapidly switching from conventional algorithms such as Scale-Invariant Feature Transform (SIFT) and Speeded-Up Robust Features (SURF) to deep learning and augmented reality-based computer vision.
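Two of the preprocessing techniques in stage (i), grayscale conversion and edge detection, can be sketched without any imaging library. The code below is a simplified illustration under stated assumptions: the image is a synthetic list of RGB tuples, the grayscale weights are the common ITU-R BT.601 luma coefficients, edges are found with the Sobel operator, and border pixels are simply left at zero.

```python
def to_grayscale(rgb_image):
    """Convert rows of (r, g, b) pixels to luminance (BT.601 weights)."""
    return [[0.299 * r + 0.587 * g + 0.114 * b for (r, g, b) in row]
            for row in rgb_image]


SOBEL_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
SOBEL_Y = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]


def sobel_magnitude(gray):
    """Gradient magnitude at each interior pixel (borders stay at 0)."""
    height, width = len(gray), len(gray[0])
    out = [[0.0] * width for _ in range(height)]
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            gx = gy = 0.0
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    pixel = gray[y + dy][x + dx]
                    gx += SOBEL_X[dy + 1][dx + 1] * pixel
                    gy += SOBEL_Y[dy + 1][dx + 1] * pixel
            out[y][x] = (gx * gx + gy * gy) ** 0.5
    return out


# Synthetic 5x6 image: black left half, white right half (a vertical edge).
image = [[(0, 0, 0)] * 3 + [(255, 255, 255)] * 3 for _ in range(5)]
edges = sobel_magnitude(to_grayscale(image))

# The response is large only in the two columns next to the boundary.
print([round(value) for value in edges[2]])
```

Hand-built filters of this kind are what conventional pipelines rely on; deep learning approaches learn comparable filters from data instead.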

3.2.3 Ontology

In Computer Science and AI research, the term “Ontology” follows a standard definition, “an ontology is a formal, explicit specification of a shared conceptualization”, where each facet has its unique attributes [43]. To date, ontologies have been categorised into four major types depending on their intent and level of granularity: Top-level/foundational ontology, Domain ontology, Task ontology and Application ontology [44]. As a whole, ontology research empowers the knowledge/ontology engineering branch by allowing practitioners to discover how concepts exist, how they are linked together, and how they can be used to reduce overall system costs by enhancing efficiency or quality. Ontology combined with ML contributes to software engineering, data analysis (capturing the complexities between entities and relationships using automated information extraction), the building of recommender systems/portals or ontology-extended browsers, and computational support for data.

3.2.4 Knowledge graph (KG)

The idea of a knowledge graph (KG) is receiving much recognition from the global AI research community as well as from leading enterprises. According to Bagchi [45], “Knowledge graphs is the go-to solution for populating, reasoning and visualising knowledge domains in recent semantic information systems”; more specifically, a KG is “a graph of data intended to accumulate and convey knowledge of the real world, whose nodes represent entities of interest and whose edges represent relations between these entities” [46]. KG has become a specialised interdisciplinary domain of study that draws on graph theory, mathematical logic and reasoning, human-machine interaction, knowledge and data representation, and cognitive and semantic modelling, and it constantly evolves, using ML and DL algorithms, to achieve more than representation. As the field is still under development, its definitions are contentious. From the perspective of construction, researchers have so far explored two types of KG: (i) “data level KGs” or “entity graphs (EGs)” and (ii) “schema level KGs” or “entity type graphs (ETGs)” [47].
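The quoted definition, nodes as entities and edges as relations, maps directly onto a store of (subject, relation, object) triples. The following minimal sketch assumes a tiny, hypothetical scholarly graph; the class `KnowledgeGraph`, its methods, and the identifiers such as `author:A` are invented for illustration and do not describe any particular KG system.

```python
from collections import defaultdict


class KnowledgeGraph:
    """A tiny in-memory graph of (subject, relation, object) triples."""

    def __init__(self):
        # subject -> set of (relation, object) edges leaving that node
        self._outgoing = defaultdict(set)

    def add(self, subject, relation, obj):
        """Record one edge labelled `relation` from subject to obj."""
        self._outgoing[subject].add((relation, obj))

    def objects(self, subject, relation):
        """Entities reachable from `subject` over `relation` edges."""
        edges = self._outgoing.get(subject, set())
        return {o for (r, o) in edges if r == relation}

    def coauthors(self, author):
        """Other entities that wrote at least one paper `author` wrote."""
        papers = self.objects(author, "wrote")
        found = set()
        for other, edges in self._outgoing.items():
            if other == author:
                continue
            if any(r == "wrote" and o in papers for (r, o) in edges):
                found.add(other)
        return found


# Hypothetical entities and relations, for illustration only.
kg = KnowledgeGraph()
kg.add("author:A", "wrote", "paper:1")
kg.add("author:B", "wrote", "paper:1")
kg.add("author:B", "wrote", "paper:2")
kg.add("author:C", "wrote", "paper:2")
kg.add("paper:1", "published_in", "venue:X")

print(sorted(kg.coauthors("author:B")))  # ['author:A', 'author:C']
print(sorted(kg.coauthors("author:A")))  # ['author:B']
```

Facts about individual entities, as stored here, correspond to the data-level view; a schema-level graph would instead relate the entity types themselves (for example, that authors write papers and papers appear in venues).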

4. Open Science with AI: the promising future

Different schools of thought (Public School, Democratic School, Pragmatic School, Infrastructure School, and Measurement School) have viewed and defined open science from different perspectives [48]. Its core foundation, however, lies in formulating robust standards and sustainable policy frameworks in order to build the desired infrastructure for academic freedom, collaboration and practice, and to overcome the challenges of legal, technological, cultural, and ideological transformation. At the 41st General Conference of the United Nations Educational, Scientific and Cultural Organisation [6], the advisory committee issued a draft recommendation on Open Science comprising a consistent definition of open science, four core values, six guiding principles, and action guidance under seven key areas. Following the “online information meeting on Implementation of the UNESCO Recommendation on Open Science” [49], ad hoc working groups highlighted the role of Open Science in scientific prosperity, especially as a critical accelerator for the implementation of all of the Sustainable Development Goals (SDGs) [50] and for bridging the science, technology, and innovation gaps between and within countries. They identified four key pillars (Open scientific knowledge, Open Science infrastructure, Open engagement of societal actors, and Open dialogue with other knowledge systems) and addressed the challenge of achieving them. They are also developing a series of supporting tools (Open Science Toolkit+Open Science Info App) [51, 52], technical briefs, fact sheets, and guidelines that are easy to use, reuse, expand, and update and are accessible to all. AI plays a game-changing role in attaining all four key pillars and in the strategic implementation of the recommendations, including monitoring and analysing progress and, most importantly, making the technical aspirations a reality.

The European Commission [5] monitors trends, data and indicators related to global Open Science advances and works with two expert groups (the Open Science Policy Platform and the expert group on indicators). Open Science policy has now become an integral part of the Horizon Europe Programme of 2021 for research and innovation, which continues to develop the Open Science policy framework under the section dubbed “8 ambitions of the EU’s open science policy”, also reflected in the League of European Research Universities’ “eight pillars of Open Science” [53], as correctly identified by Bagchi [54]. All eight ambitions and the UN SDGs are components of Horizon Europe, including Research Infrastructure with the European Open Science Cloud (EOSC), the Marie Sklodowska-Curie Actions, and the Open Research Europe (ORE) publishing platform, all of which have AI implications. In this regard, OpenAIRE (https://www.openaire.eu/), a socio-technical infrastructure/legal entity, supports EC and European Open Science mandates by taking care of policy alignment through the network of National Open Access Desks (NOADs) and by building several cutting-edge services on the Open Science Graph in the back end, which connects worldwide infrastructures and networks to disseminate open research findings [55]. The Novel EOSC Services for Emerging Atmosphere, Underwater and Space Challenges [56], an ambitious initiative offering AI services for Open Science, was introduced to develop additional applications and resolve existing issues within EOSC. IntelComp’s (https://intelcomp.eu/) Cloud Platform delivers AI-based services, or “Policy Intelligence”, to the EU’s public administrators and policymakers for data- and evidence-driven policy creation in Science, Technology, and Innovation (STI). Under the leadership of its partner, the “Spanish Secretary of State for Digitalization and Artificial Intelligence” (SEDIA), the project examines the co-development and use of the Platform for AI R&D intelligence gathering and the industry’s AI credentials.

The Center for Open Science (COS), which aims to reduce waste and to discover knowledge, solutions and cures for the world’s most pressing needs, offers the Open Science Framework (OSF) [57], an open-source and freely accessible project management tool for open, reproducible, and trustworthy research practices at all stages of the research life cycle. The OSF infrastructure supports cultural shifts by enabling rigour and transparency across the research life cycle. It has mastered the art of reproducibility and preregistration, the process of developing research questions and an analysis strategy before viewing the study’s findings [58]. Besides these, it also offers cloud-based archival solutions, preprint vaults, integration of local repositories, collection management through customisable filtering, and taxonomies with robust discovery and retrieval. The validity of AI discoveries depends on reproducible experiments and on Open Science with the FAIR principles [59]; distributing data, software, and other scientific resources in public repositories under liberal licences would be advantageous for the AI research community.

The Organisation for Economic Co-operation and Development [60] revised its policy framework to incorporate new technologies and guiding principles, which are reflected in seven key areas: data governance for trust; technical standards and practices; incentives and rewards; responsibility, ownership and stewardship; sustainable infrastructure; human capital; and international cooperation for access to research data. Under the January 2021 revision, the instrument, entitled “Recommendation of the Council concerning Access to Research Data from Public Funding”, has an expanded scope that covers research data, metadata, bespoke algorithms, workflows, models, software and code.

5. Discussion

The synergy between AI and open science has the potential to transform the way scientific research is conducted, disseminated, and applied since it draws from both the cognitive learning processes of individuals and the learning processes of enterprises during open innovation.

Open science and AI can work together in the following ways:

  1. Data analysis and interpretation: AI can assist in processing and analysing large datasets quickly and efficiently. Researchers can use AI algorithms to identify patterns, trends, and correlations within vast amounts of data, leading to faster insights and discoveries.

  2. Automated experimentation: AI-powered bots and systems can conduct experiments autonomously, increasing the efficiency of data collection and minimising human errors. Researchers can focus more on designing experiments and interpreting results.

  3. Text and literature analysis: AI-driven NLP can sift through vast amounts of scientific literature, extracting relevant information, summarising articles, and identifying gaps in research. This enhances researchers’ ability to stay updated and build on existing knowledge.

  4. Collaboration and crowd science: Open science promotes collaboration among researchers globally. AI-powered platforms can facilitate sharing of data, methodologies, and findings, enabling researchers to collaborate on a larger scale.

  5. Reproducibility and transparency: AI can help ensure the reproducibility of research results by providing detailed documentation of analysis methods and codes used in studies. This enhances transparency and trust in research outcomes.

  6. Data sharing and accessibility: AI can aid in organising and structuring research data, making it more accessible and reusable. This contributes to the overall integrity and validity of scientific findings.

  7. Education and outreach: AI can facilitate the creation of interactive educational materials, simulations, and virtual labs, making scientific concepts more engaging and accessible to learners of all ages.
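To make item 5 of the list above more concrete, the documentation of analysis methods and code can be captured in a small machine-readable record that travels with the shared data. The following is only a minimal sketch in Python under stated assumptions: the function name `build_provenance_record` and its field names are hypothetical choices made for illustration and are not part of OSF or of any other tool cited in this chapter.

```python
import hashlib
import json
import platform
import sys


def build_provenance_record(data: bytes, parameters: dict, seed: int) -> dict:
    """Return a small record describing one analysis run (hypothetical schema)."""
    return {
        # A SHA-256 digest lets others confirm they hold the same input data.
        "data_sha256": hashlib.sha256(data).hexdigest(),
        # Parameters and random seed needed to repeat the analysis.
        "parameters": parameters,
        "random_seed": seed,
        # Software environment in which the run was executed.
        "python_version": sys.version.split()[0],
        "platform": platform.system(),
    }


if __name__ == "__main__":
    # Illustrative in-memory data; a real study would read its dataset file.
    example_data = b"sample_id,value\n1,0.42\n2,0.58\n"
    record = build_provenance_record(
        example_data, {"model": "linear", "alpha": 0.1}, seed=42
    )
    print(json.dumps(record, indent=2, sort_keys=True))
```

A record of this kind could be deposited next to the dataset in a public repository, so that a later reader can check that the data are unchanged and rerun the analysis with the same settings.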

AI-powered tools have the potential to both positively and negatively impact the quality of science. While AI can enhance various aspects of scientific research and discovery, it can also introduce challenges and concerns that need to be carefully managed. For instance, the attributes mentioned above can positively affect the quality of science in terms of efficiency, speed, data analysis, pattern recognition, reproducibility, and hypothesis generation. But it is necessary to take into account the potential negative impacts and concerns:

  1. Bias and fairness: Biases in the training data might be inherited by AI algorithms and result in unreliable predictions. If not properly addressed, these biases can affect the quality and fairness of research results.

  2. Data overfitting: AI models, if not carefully designed and validated, can overfit the training data, leading to results that do not generalise well to new data. This can undermine the validity of research findings.

  3. Lack of interpretability: Deep learning models, in particular, can be challenging to interpret. This can make it difficult to understand the reasoning behind AI-generated results, potentially affecting the trustworthiness of those results.

  4. Data privacy and security: AI often requires access to large, sensitive datasets. Ensuring the privacy and security of this data while still conducting meaningful research can be challenging.

  5. Dependency on technology: Overreliance on AI-powered tools might reduce the emphasis on critical thinking and traditional scientific methodologies, potentially undermining research rigour. For example, AI-powered text generators (OpenAI’s GPT-3 and ChatGPT, Microsoft’s XiaoIce) and image generators (OpenAI’s DALL-E, Google’s DeepDream, NVIDIA’s StyleGAN and StyleGAN2, DeepMind’s BigGAN) have become so prevalent that enterprises have begun to prioritise them over human judgement.

  6. Loss of serendipity: AI may prioritise known patterns and trends, potentially reducing the likelihood of serendipitous discoveries that arise from creative thinking and unexpected observations.

  7. Ethical considerations: The use of AI in research raises ethical concerns related to consent, transparency, and the potential for unintended consequences. Owing to the nature of algorithmic transparency and accountability, such issues may take the form of digital divides (exacerbating existing disparities in access to information, resources, and opportunities), invisibilisation of certain groups (reinforcing dominant norms and excluding marginalised voices and perspectives), and discrimination against minorities (stemming from bias present in training data). Therefore, ensuring ethical research practices is crucial.
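The overfitting concern in item 2 of the list above can be illustrated with a toy experiment. The sketch below is an assumption-laden example rather than a method from this chapter: it uses synthetic data, and the helper names (`make_data`, `predict_1nn`, `accuracy`) are hypothetical. A one-nearest-neighbour classifier memorises noisy training labels, so it scores perfectly on the data it has seen, while a held-out set gives a more honest estimate.

```python
import random


def make_data(n, noise, rng):
    """Points x in [0, 1); the true label is x > 0.5, flipped with probability `noise`."""
    data = []
    for _ in range(n):
        x = rng.random()
        label = x > 0.5
        if rng.random() < noise:
            label = not label  # inject label noise
        data.append((x, label))
    return data


def predict_1nn(train, x):
    """Memorising model: copy the label of the closest training point."""
    return min(train, key=lambda point: abs(point[0] - x))[1]


def accuracy(train, data):
    """Fraction of points in `data` whose label the 1-NN model reproduces."""
    hits = sum(predict_1nn(train, x) == y for x, y in data)
    return hits / len(data)


rng = random.Random(0)  # fixed seed so the run is repeatable
train_set = make_data(200, noise=0.3, rng=rng)
test_set = make_data(200, noise=0.3, rng=rng)

train_acc = accuracy(train_set, train_set)  # evaluated on the memorised data
test_acc = accuracy(train_set, test_set)    # evaluated on unseen data
print(f"training accuracy: {train_acc:.2f}")
print(f"held-out accuracy: {test_acc:.2f}")
```

The gap between the two printed figures is the point of the example: reporting only the training score would overstate how well the model generalises, which is why validation on held-out data matters for the validity of research findings.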

It is vital to remember that the degree to which AI improves scientific quality is determined not solely by the technology itself but also by how researchers, institutions, and the broader scientific community use and integrate AI into their workflows. To fully leverage the synergy between AI and open science, it is essential to strike a balance between automation and human oversight, adopt rigorous validation practices, promote transparency, standardise practices, and continuously address ethical, data privacy, and bias-related challenges, thereby mitigating the potential negative impacts of AI and harnessing its positive potential. Nevertheless, the combined power of AI and open science has the potential to accelerate the pace of discovery, democratise access to knowledge, and foster a more collaborative and innovative scientific community.

6. Conclusions

Open Science, as the pristine engine of prosperity (viewed as better science), has developed and improved through continuous research, with technological advances making it a reality, and it moves towards the wisdom that Homo sapiens culture must acquire for its survival. The last two decades have seen a staggering increase in data, which, in tandem with the world’s changing technology, particularly in the form of AI, poses the greatest challenge of our time on the one hand and, on the other, opens up new potential to build a continuous knowledge base. Throughout the study, we have shown numerous use case scenarios of how the symbiosis of Open Science and AI accelerates the discovery of knowledge, collaboration, and solutions. We have also addressed the potential threats and ethical concerns, which require a multi-faceted approach involving cooperation among governments, tech companies, researchers, ethicists, and affected communities to ensure that AI technologies are developed and deployed in ways that respect human rights, promote fairness, and avoid exacerbating existing inequalities. It deserves special mention that AI plays a significant role in executing Open Science practices. Because these two distinct notions, each with multifarious attributes, are amalgamating, future work must recommend holistic policy guidelines to avert the ethical issues related to them.

Acknowledgments

This work is the result of a personal research endeavour.

Conflict of interest

The authors declare no conflict of interest.

References

  1. Sagan C. Billions and Billions: Thoughts on Life and Death at the Brink of the Millennium. New York: Ballantine Books; 1997
  2. Open Knowledge Foundation. The Open Definition [Internet]. 2022. Available from: https://opendefinition.org/
  3. Hanwell MD. What is Open Science [Internet]. 2022. Available from: https://opensource.com/resources/open-science#:~:text=Open%20science%20arguably%20began%20in,such%20as%20the%20Royal%20Society
  4. FOSTER. Open Science Definition [Internet]. 2022. Available from: https://www.fosteropenscience.eu/foster-taxonomy/open-science-definition
  5. European Commission. Open Science [Internet]. 2022. Available from: https://ec.europa.eu/info/sites/default/files/research_and_innovation/knowledge_publications_tools_and_data/documents/ec_rtd_factsheet-open-science_2019.pdf
  6. UNESCO. Draft Recommendation on Open Science [Internet]. 2021. Available from: https://unesdoc.unesco.org/ark:/48223/pf0000378841
  7. Britannica. Human Intelligence [Internet]. 2022. Available from: https://www.britannica.com/science/human-intelligence-psychology
  8. Poole DL, Mackworth AK. Artificial Intelligence: Foundations of Computational Agents. 2nd ed. Cambridge: Cambridge University Press; 2017
  9. Swartz A. Guerilla Open Access Manifesto [Internet]. 2008. Available from: https://archive.org/download/GuerillaOpenAccessManifesto/Goamjuly2008.pdf
  10. Bodó B. The genesis of library genesis: The birth of a global scholarly shadow library. In: Karaganis J, editor. Shadow Libraries: Access to Knowledge in Global Higher Education. Cambridge: The MIT Press; 2018. pp. 25-51
  11. Sci-Hub. Elbakyan [Internet]. 2023. Available from: https://sci-hub.se/alexandra#works
  12. Elbakyan A, Bohannon J. Data from: Who’s downloading pirated papers? Everyone [dataset]. Dryad. 2021. DOI: 10.5061/dryad.q447c
  13. Bodó B. Library genesis in numbers: Mapping the underground flow of knowledge. In: Karaganis J, editor. Shadow Libraries: Access to Knowledge in Global Higher Education. Cambridge: The MIT Press; 2018. pp. 53-77
  14. Houle L. Sci-Hub and LibGen: What if... why not? In: IFLA World Library and Information Congress 2017 – Wrocław, Poland – Libraries. Solidarity. Society. Gdansk. 2020. Available from: http://library.ifla.org/id/eprint/1892/1/S12-2017-houle-en.pdf
  15. Valladares-Garrido MJ et al. Association between the use of Sci-Hub and consultation of scientific journals by medical students from six Latin American countries: A secondary analysis. Heliyon. 2023;9(e17868):1-11. DOI: 10.1016/j.heliyon.2023.e17868
  16. Ajani YA, Tella A, Okere S. Access to full-text documents in libraries via Sci-Hub: A blessing in disguise to library users. Library Hi Tech News. 2023:1-4. DOI: 10.1108/LHTN-03-2023-0053 [Ahead-of-print]
  17. Sci-Hub. Stats [Internet]. 2023. Available from: https://sci-hub.se/stats
  18. Palmer M. Data Is the New Oil [Internet]. 2006. Available from: https://ana.blogs.com/maestros/2006/11/data_is_the_new.html
  19. Tang J, Zhang J, Yao L, Li J, Zhang L, Su Z. ArnetMiner: Extraction and mining of academic social networks. In: Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. New York: Association for Computing Machinery; 2008. pp. 990-998. DOI: 10.1145/1401890.1402008
  20. Ecer D, Maciocci G. ScienceBeam - Using Computer Vision to Extract PDF Data [Internet]. 2017. Available from: https://elifesciences.org/labs/5b56aff6/sciencebeam-using-computer-vision-to-extract-pdf-data
  21. GitHub. elifesciences/sciencebeam-parser [Internet]. Available from: https://github.com/elifesciences/sciencebeam-parser
  22. GitHub. elifesciences/peerscout [Internet]. Available from: https://github.com/elifesciences/peerscout/
  23. Ecer D, Shannon P. AI for automation and influence in open science publishing. In: Implementing AI. London: Artificial Intelligence Conference; 2018. Available from: https://conferences.oreilly.com/artificial-intelligence/ai-eu-2018/public/schedule/detail/70119.html
  24. Wang K. Opportunities in open science with AI. Frontiers in Big Data. 2019;2(26):1-4. DOI: 10.3389/fdata.2019.00026
  25. Sinha A, Shen Z, Song Y, Ma H, Eide D, Hsu B, Wang K. An overview of Microsoft academic service (MAS) and applications. In: Proceedings of the 24th International Conference on World Wide Web. New York: Association for Computing Machinery; 2015. pp. 243-246. DOI: 10.1145/2740908.2742839
  26. Wang K, Shen Z, Huang C, Wu C, Eide D, Dong Y, et al. A review of Microsoft academic services for science of science studies. Frontiers in Big Data. 2019;2(45):1-16. DOI: 10.3389/fdata.2019.00045
  27. Zhang F, Liu X, Tang J, Dong Y, Yao P, Zhang J, et al. OAG: Toward linking large-scale heterogeneous entity graphs. In: Proceedings of the Twenty-Fifth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. New York: Association for Computing Machinery; 2019. pp. 2585-2595. DOI: 10.1145/3292500.3330785
  28. McCarthy J. What Is Artificial Intelligence? [Internet]. 2007. Available from: http://jmc.stanford.edu/articles/whatisai/whatisai.pdf
  29. Kubat M. An Introduction to Machine Learning. 3rd ed. Cham: Springer; 2021
  30. El Naqa I, Murphy MJ. What is machine learning? In: El Naqa I, Li R, Murphy MJ, editors. Machine Learning in Radiation Oncology: Theory and Applications. Cham: Springer; 2015
  31. Wani A, Khoshgoftaar TM, Palade V. Deep Learning Applications. Vol. 2. Singapore: Springer; 2021
  32. Zhang C, Lu Y. Study on artificial intelligence: The state of the art and future prospects. Journal of Industrial Information Integration. 2021;23:100224. DOI: 10.1016/j.jii.2021.100224
  33. Sutton RS, Barto AG. Reinforcement Learning: An Introduction. 2nd ed. Cambridge: MIT Press; 2018
  34. Szepesvári C. Algorithms for Reinforcement Learning [Internet]. 2019. Available from: https://sites.ualberta.ca/~szepesva/papers/RLAlgsInMDPs.pdf
  35. Gridin I. Learning Genetic Algorithms with Python. New Delhi: BPB Publications; 2021
  36. Kramer O. Genetic Algorithm Essentials. Cham: Springer; 2017
  37. Kotyrba M, Volna E, Habiballa H, Czyz J. The influence of genetic algorithms on learning possibilities of artificial neural networks. Computers. 2022;11(5):70. DOI: 10.3390/computers11050070
  38. Tao J, Zhang R, Zhu Y. DNA Computing Based Genetic Algorithm: Applications in Industrial Process Modeling and Control. Singapore: Springer; 2020
  39. Yu X, Gen M. Introduction to Evolutionary Algorithms. London: Springer-Verlag; 2010
  40. Patel AA, Arasanipalai AU. Applied Natural Language Processing in the Enterprise: Teaching Machines to Read, Write, and Understand. Beijing: O’Reilly; 2021
  41. Tunstall L, von Werra L, Wolf T. Natural Language Processing with Transformers: Building Language Applications with Hugging Face. Beijing: O’Reilly; 2022
  42. Bekhit AF. Computer Vision and Augmented Reality in iOS: OpenCV and ARKit Applications. New York: Apress; 2022
  43. Studer R, Benjamins R, Fensel D. Knowledge engineering: Principles and methods. Data & Knowledge Engineering. 1998;25(1–2):161-197. DOI: 10.1016/S0169-023X(97)00056-6
  44. Staab S, Studer R, editors. Handbook on Ontologies. 2nd ed. Dordrecht: Springer; 2008
  45. Bagchi M. A large-scale, knowledge-intensive domain-development methodology. Knowledge Organization. 2021;48(1):8-23. DOI: 10.5771/0943-7444-2021-1-8
  46. Hogan A et al. Knowledge Graphs [arXiv]. 2021. Available from: https://arxiv.org/pdf/2003.02320.pdf
  47. Giunchiglia F, Bocca S, Fumagalli M, Bagchi M, Zamboni A. iTelos - Building Reusable Knowledge Graphs [arXiv]. 2021. Available from: https://arxiv.org/pdf/2105.09418.pdf
  48. Fecher B, Friesike S. Open science: One term, five schools of thought. In: RatSWD Working Paper. Vol. 218. 2014. Available from: https://www.econstor.eu/bitstream/10419/75332/1/746340028.pdf
  49. UNESCO. Online Information Meeting on Implementation of the UNESCO Recommendation on Open Science [Internet]. 2022. Available from: https://www.youtube.com/watch?v=Yw9U4mwGVTE
  50. United Nations. Sustainable Development Goals [Internet]. Available from: https://sdgs.un.org/goals
  51. UNESCO. Open Science Toolkit [Internet]. Available from: https://www.unesco.org/en/open-science/toolkit
  52. UNESCO. UNESCO index of Open Science Knowledge Sharing Platforms [Internet]. Available from: https://www.unesco.org/en/open-science/knowledge-sharing
  53. LERU. Open science and its role in universities: A roadmap for cultural change. Advice Paper, 24. 2018. pp. 1-32. Available from: https://www.leru.org/files/LERU-AP24-Open-Science-full-paper.pdf
  54. Bagchi M. Open science for an open future. In: Madalli DP, Prasad ARD, editors. Proceedings of the International Conference on Exploring the Horizons of Library and Information Sciences: From Libraries to Knowledge Hub. Bangalore: Documentation Research and Training Centre, Indian Statistical Institute; 2018. pp. 422-431
  55. Manola N, Rettberg N, Manghi P, Mertens M, Schmidt B, Steiner T, et al. Achieving Open Science in the European Open Science cloud: Setting out OpenAIRE’s vision and contribution to EOSC. OpenAIRE MAKE. 2019. DOI: 10.5281/zenodo.3610132
  56. NEANIAS. AI Services for Open Science [Internet]. 2021. Available from: https://www.neanias.eu/images/neanias/Articles/202102_WP4_AI_Services_for_Open_Science.pdf
  57. Center for Open Science. OSF [Internet]. Available from: https://osf.io/4znzp/
  58. Nosek BA, Ebersole CR, DeHaven AC, Mellor DT. The preregistration revolution. PNAS. 2018;115(11):2600-2606. DOI: 10.1073/pnas.1708274114
  59. GO FAIR. FAIR Principles [Internet]. Available from: https://www.go-fair.org/fair-principles/
  60. OECD. Recommendation of the council concerning access to research data from public funding, OECD/LEGAL/0347. 2022. Available from: https://legalinstruments.oecd.org/api/print?ids=159&lang=en
