Open access peer-reviewed chapter - ONLINE FIRST

Paper Recommender System Using Big Data Tools

Written By

Nasrin Jokar, Mehra Esfandiari, Shima Aghamirzadeh and Hossein Hatami

Submitted: 09 October 2022 Reviewed: 23 November 2022 Published: 26 December 2022

DOI: 10.5772/intechopen.109136


From the Edited Volume

Optimization Algorithms - Classics and Recent Advances [Working Title]

Dr. Mykhaylo I. Andriychuk and Dr. Ali Sadollah


Abstract

To face the problem of information overload, digital libraries, like other businesses, have turned to recommender systems, trying to personalize recommendations by using the textual information of papers: title, abstract, keywords, publisher, author, and similar items. Since the volume of papers grows day by day and conventional recommender systems cannot cover this huge volume and process papers according to users' tastes, big data tools are needed to cover and process this volume quickly; by running parallel processing, they can offer relevant recommendations. In this chapter, research in the field of text-aware (context-aware) recommender systems for scientific papers and paper recommender systems in general is reviewed.

Keywords

  • paper recommender system
  • personalization
  • textual information
  • big data
  • extracting user interests

1. Introduction

Scientists usually rely on sharing their research results to gain peer recognition and research and funding opportunities. The most common approach to sharing knowledge is publication. In the past decades, academic publishing has changed significantly as the electronic format replaced print. The advent of the Internet made it possible for electronic publishers to publish more research in each area; hence, the distribution of publications has generally become simpler, and the spread of the Web has pushed publication volumes even higher. Researchers currently share their research through the Web, journals, papers, conferences, and presentations. Three types of websites have been identified for the distribution and publication of scientific papers: personal, institutional, and social. Personal and institutional websites are created by scholars and academic institutions, respectively, to provide academic information. Social websites allow researchers to submit papers and research results themselves; examples are LinkedIn, Academia.edu, ResearchGate, and Google Scholar [1].

Many researchers publish scientific papers without restrictions of time and place. Users find themselves facing hundreds of thousands of scientific research papers and choose a sample of them according to their tastes and needs. But when users are confronted with such a huge amount of information, a selection problem arises, and searching this space takes a lot of time. Recommender systems are effective tools for directing and guiding the user through a huge number of possible choices toward the useful and preferred option. Recommender systems for research papers have become increasingly popular: in the last 14 years, more than 170 research papers, patents, web pages, and similar documents have been published in this area. Recommender systems are useful applications for research papers; for example, they help researchers keep up with their research field [2].

Nowadays, there are many approaches for recommending scientific papers [3], most of which are based on two algorithms, content-based (CB) filtering and collaborative filtering (CF); there are also two other approaches, the knowledge-based (KB) algorithm and hybrid methods, which can combine the other three. The main idea of CB methods in paper recommender systems is that the user's interests and reading priorities are identified from the content of the scientific papers the user has previously studied; the focus is thus on finding similarities between items, that is, between scientific papers. Collaborative filtering methods, in contrast, extract and use the feedback of similar users without analyzing content, focusing on the similarity between users. These approaches have been introduced and evaluated in more than 170 research papers, as well as patents, lectures, and blogs [3].

Paper recommender systems are text-based or citation-based and return results by keywords. Text-based recommendations are usually made using specific keywords provided to search engines such as CiteSeer and Google Scholar [4]. Citation-based recommendation aims to help authors select the most relevant papers to cite from a potentially large number of sources, where citation relevance depends on the content of the paper. For citation recommendation, the context around the citation bracket ("[ ]") is used: a certain number of words before and after the citation are taken, and the recommendation engine fills the predefined placeholder with a small set of candidate sources [5]. In citation recommendation, the language of citation contexts and the content of papers (documents) are quite different in nature, and a translation model can be a bridge across this gap.

Web-based recommender systems attract great interest because personalized services are an important factor for shopping malls and Internet service providers. Researchers and academics save their own and others' research information on the Internet, share it with each other, and eventually organize it. In references [6, 7], a personalized recommendation system is introduced that increases the accuracy of finding research papers; it represents a specific, feasible, and applicable domain for developing recommender systems.

Given that users in a system usually share papers from different domains and have specific preferences for specific domains, reference [8] assumes that user interests can be extracted from their behavior on the site, calculated as they visit pages. By collecting and analyzing user behavior, the authors approximated user interests and created a personalized recommendation model that selects candidate papers for recommendation. Their experiments demonstrated that the behavior-based recommendation model improves recommendation accuracy. If users have just registered, are inactive, or have little behavioral information, the system cannot learn their preferences. Moreover, to implement user-centered personal recommendation, the system should track user behavior to collect information including the user's searches, field of work, published papers, and friends or people associated with them in this field.

Context-aware recommender systems try to provide each user with suggestions based on performance, personal tastes, and user behavior, depending on the field in which they are used, in order to help in the decision-making process. The contextual information extracted for digital libraries (DLs) is classified into three main groups: users, documents, and environment [9]. Users' contextual information includes user profile, user type, purpose, activity/work, knowledge/skills, social network and home page information, logs, information behavior, and level of information need; user type means, for example, student, faculty member, or researcher, and level of information need means whether the user needs the information urgently or not. Each paper has special features that distinguish it from other papers, and these features can be considered the paper's contextual information: bibliographic information, citations between papers, and paper popularity. Environmental contextual information includes the location of items in the library or geographical location, time, and type of service.

The bibliographic information of a paper includes title, abstract, International Standard Serial Number (ISSN), keywords, author name, publisher, publication time, paper classification, main text, format, language, URL, and the paper's rank status [9]. Studies [2, 10, 11, 12] used the title, abstract, keywords, and main text of papers to find the closest match between the user's query and the dataset; studies [7, 13, 14, 15] used the text of the paper for recommendation; studies [7, 8] used keywords for recommendation; and study [16] used paper titles to create paper profiles for categorization, also using the author's name for categorization. However, since no researcher has access to all the papers published worldwide, this last feature is used only rarely; Sun et al. [13] used this method.

Citation contextual information between papers includes the papers cited by a given paper; studies [4, 14, 17, 18] used citations between papers in their proposed methods. Popularity contextual information includes the most popular papers of the week/month/year; only study [5] used this feature. In existing datasets such information is very rare, so this feature cannot be exploited much. Since the dataset used here is limited to a few journals, only some bibliographic contextual information of the papers has been used.


Since the volume of papers is increasing day by day and conventional recommender systems cannot cover this huge volume and process papers according to users' tastes, we need big data tools to cover and process this volume quickly; these tools offer relevant recommendations by applying parallel processing (the map-reduce programming model).


2. A review on paper recommender systems

In this section, we review studies in the field of text-aware (context-aware) recommender systems for scientific papers and paper recommender systems in general. These studies include a systematic review of text-aware recommender systems, recommending papers to cite in a draft paper, personalized recommendation, extracting and analyzing user interests and behavior for personalized recommendation, extracting the domain of papers, and paper recommender systems built on big data platforms, as summarized in Table 1 and explained in the following subsections.

Method | Used by | Description
Text-aware recommender systems | [10] | The contextual information of papers and users is used to make recommendations
Citing in draft papers | [4, 10, 11, 14, 19, 20] | Citation recommendation systems generate a list of relevant papers to cite in a particular text
Personalized recommendations | [6, 7, 16, 21, 22] | Personalized recommendation systems based on the query and the text of papers offer recommendations that increase the accuracy of finding research papers
Extracting user interests and behavior | [8, 22, 23] | These systems extract the user's interests from the user's behavior and make recommendations according to those interests
Extracting the domain of papers | [10, 13, 24, 25, 28, 29, 30, 31] | Extracting the domain of papers and creating paper profiles for recommendation
Big data platform | [30, 31] | Implemented on big data platforms, which are suitable for systems with huge data

Table 1.

Overview of recommender systems surveyed.

2.1 Text-aware recommender systems

Due to the significant increase of data in DLs, a recommender system is necessary as a suitable tool to facilitate and accelerate dynamic information processing. In recent years, recommender systems have used descriptions of users' situational information, such as location, time, and work, in order to make more relevant and personalized recommendations; to predict accurate recommendations for users in a specific domain such as DLs, it is essential to understand and exploit the users' relevant context, which leads to intelligent recommendation.

Reference [10] is a systematic review of research on text-aware recommender systems. The purpose of that study was to conduct a literature review on recommender systems for academic DLs in order to:

  1. Identify the textual (contextual) information adopted to build recommenders in academic DLs.

  2. Identify the methods that have been used to adopt textual information for making recommendations in academic DLs.

  3. Discover how the relevance of textual information in recommender systems for the academic domain has previously been understood and used by researchers.

The main contribution of this research is to review past studies in order to discover the textual information that influences recommendations in the domain of academic DLs. The authors reviewed 82 papers published from 2001 to 2013 to identify the contextual information and the methods used to develop scientific recommendations. They conducted the review in three phases: the first phase, planning the review; the second phase, conducting the review; and the third phase, reporting the review. Finally, the results of their investigation showed that the contextual or textual information extracted for recommender systems in academic DLs is classified into three main groups: the user, the documents, and the environment. The user's contextual information indicates the user's current status, such as user profile, type of user, purpose, activity/work, prior knowledge/skills, social network and homepage information, logs, information behavior (searching, reading, etc.), and the level of information need (urgent or not). Each document has special characteristics that distinguish it from other documents, such as bibliographic information (title, abstract, ISSN, etc.), citations, and popularity (number of visits or downloads of a paper per month or year); these characteristics are adopted as the documents' contextual information. The environmental contextual information provides a set of information about the user's situation, such as location, time, and type of service, especially when the user's status changes dynamically and frequently.

To answer the second question, they divided the many methods that have been successfully applied to produce recommendations into four categories: collaborative filtering (CF), CB, KB, and other methods (data mining, cluster analysis, etc.). To answer the third question, they identified five ways of relating context to recommendations, including researchers referring to the contextual information used in past work, as well as other researchers' definitions of context.

2.2 Cite to draft papers

When a writer is writing or preparing to write a research paper, he usually has the appropriate sources in mind; however, there are likely to be sources the writer has missed that should be cited. A good citation recommender system will thus not only improve the paper but also improve the efficiency and quality of literature search in general. In this section, we examine some examples of this type of recommender system.

In reference [26], in order to recommend papers to cite in a draft paper, a set of candidate documents for ranking is first created in three stages. In the first stage, the 100 papers most similar to the draft paper are selected as the set R. In the second stage, all the papers referenced in R are added to the collection (the cited papers are found and added). In the third stage, all the papers that reference some paper in R are added to the collection. Finally, the documents in the resulting collection are ranked using features such as publication year, text similarity, same author, co-citation pairs, the Katz graph distance measure, and citation count. According to the test results, the authors identified the following features as most important in the final ranking for better system performance: author, number of citations, publication date, title text, and the Katz measure.
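
As an illustration, the following minimal Java sketch reproduces the three-stage candidate-set construction described above; the Paper class, the corpus, and the similarity function are hypothetical stand-ins, not the implementation of [26].

```java
import java.util.*;

/** Sketch of the three-stage candidate-set construction of [26].
 *  Paper, similarity() and the corpus are hypothetical stand-ins. */
class CandidateSetBuilder {
    static class Paper {
        String id;
        Set<String> references = new HashSet<>(); // ids of papers this paper cites
    }

    /** Stage 1-3: the 100 papers most similar to the draft (set R), plus
     *  papers cited by R, plus papers that cite some paper in R. */
    static Set<Paper> buildCandidates(Paper draft, List<Paper> corpus,
                                      Map<String, Paper> byId) {
        // Stage 1: the 100 most similar papers form the set R.
        List<Paper> ranked = new ArrayList<>(corpus);
        ranked.sort((a, b) -> Double.compare(similarity(draft, b), similarity(draft, a)));
        List<Paper> r = ranked.subList(0, Math.min(100, ranked.size()));

        Set<Paper> candidates = new LinkedHashSet<>(r);
        // Stage 2: add every paper referenced by a member of R.
        for (Paper p : r)
            for (String ref : p.references)
                if (byId.containsKey(ref)) candidates.add(byId.get(ref));
        // Stage 3: add every paper that references some member of R.
        Set<String> rIds = new HashSet<>();
        for (Paper p : r) rIds.add(p.id);
        for (Paper p : corpus)
            for (String ref : p.references)
                if (rIds.contains(ref)) { candidates.add(p); break; }
        return candidates;
    }

    /** Placeholder text similarity; [26] combines text features with
     *  citation features (e.g. the Katz measure) in the final ranking. */
    static double similarity(Paper a, Paper b) { return 0.0; }
}
```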

Research [10] provided a context-aware citation recommendation for a draft paper, given the inadequacy of graph-based link prediction techniques and natural language processing analysis for this task. The problem is formulated as follows: d represents a document and D the collection of documents; a context c is the set of words around the bracket where a citation is to be placed; the global context of d consists of its title and abstract; and a local context is the text surrounding a citation in d. If document d1 cites document d2, the local context of this citation is called an out-link context of d1 and, with respect to document d2, an in-link context. Given a draft paper d without a bibliography, a global recommendation is a ranked list of citations from the set D recommended as candidates for the bibliography of d. Given an out-link local context c* in d, a local recommendation is a ranked list of citations from the set D recommended as candidates for the placeholder "[ ]" associated with c*.

Their recommendation framework [10] can take two types of input: a manuscript d1, from which the global context (e.g., abstract and title) and all out-link local contexts are extracted and processed, or just an out-link local context c*. For each citation placeholder associated with an out-link local context, the system ranks the references by their relevance and returns the top k papers for the bibliography.

The authors of [10] use two groups of methods to retrieve papers from their candidate collection (the CiteSeer library papers).

  1. Global text-based methods, in which the local context is not considered; for a manuscript query d1, documents can be retrieved using the following methods:

    1. The top N references whose abstracts and titles are most similar to d1, called GN.

    2. Documents that share authors with d1, called the Author method.

    3. Papers referenced by documents already placed in the candidate collection by some other method (for example, GN or Author), called CitHop.

    4. Documents written by authors whose papers are already in the candidate collection generated by some other method, called AuthHop.

    The above methods are slow because they rely on the textual content of the documents, and they may retrieve papers that are not relevant and appropriate (many ideas do not appear in the abstract).

  2. Context-aware methods, which help improve the coverage of recommendations by considering the local context in the manuscript query. For a manuscript d1 and each local context c*, two methods were introduced:

    1. The top N documents whose in-link contexts are most similar to c*, called LN.

    2. Documents containing out-link contexts most similar to c* (this method builds on the documents retrieved by LN), called LCN.

The above six methods can be combined. A non-parametric probabilistic model was used to measure the association between a draft document and a citation candidate and to rank the candidates.

Given the manuscript document d1, the authors used formula (1) to rank the documents recommended for the placeholder associated with context c*; if the citation context c* is given without d1, formula (1) is likewise applied directly to c*.

Sim(c*, d2) = (1/k²) · Trace(T_{d2} · c* c*ᵀ)    (1)

When an author writes a paper, he uses words and expressions whose original source he may have forgotten. If the recommender system can provide candidate papers containing the required information about such usages, writing the paper becomes easier. In research [14], the citation recommendation problem was improved with a translation model. A translation model is basically used to translate one language into another; it is also used in document retrieval involving heterogeneous content (for example, cross-language retrieval). In citation recommendation, citation contexts and the content of papers (documents) are quite different in nature, and the translation model can bridge this gap. This method actually works better than the state-of-the-art methods and has better performance.

Many researchers have studied the application of CF algorithms in citation recommendation for papers. However, CF algorithms have limitations such as sparse data and scalability, and in citation recommendation tasks, citation graphs tend to be noisy and sparse, potentially with missing citations due to citation errors or space limitations. To deal with these problems, the authors of reference [4] used the Singular Value Decomposition (SVD) method to build a reliable citation recommendation system and address the limitations of memory-based CF algorithms. SVD is a popular method for identifying latent semantic factors, in which association patterns in the data can be identified more easily than in the original space.
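
To make the idea concrete, the following is a minimal sketch of latent-factor scoring with a truncated SVD, assuming the Apache Commons Math library; the tiny citation matrix and the rank k are illustrative only, not the setup of [4].

```java
import org.apache.commons.math3.linear.*;

/** Sketch: latent-factor citation scoring via truncated SVD, in the spirit
 *  of the SVD-based approach of [4]. The tiny 0/1 citing-paper x cited-paper
 *  matrix and the rank k are illustrative only. */
public class SvdCitationSketch {
    public static void main(String[] args) {
        // Rows: citing papers, columns: candidate references (1 = cited).
        double[][] citations = {
            {1, 1, 0, 0},
            {1, 0, 1, 0},
            {0, 1, 1, 1},
        };
        RealMatrix m = MatrixUtils.createRealMatrix(citations);
        SingularValueDecomposition svd = new SingularValueDecomposition(m);

        int k = 2; // keep the k strongest latent factors
        RealMatrix u = svd.getU().getSubMatrix(0, m.getRowDimension() - 1, 0, k - 1);
        RealMatrix s = svd.getS().getSubMatrix(0, k - 1, 0, k - 1);
        RealMatrix v = svd.getV().getSubMatrix(0, m.getColumnDimension() - 1, 0, k - 1);

        // Low-rank reconstruction: smoothed scores for unseen citations.
        RealMatrix scores = u.multiply(s).multiply(v.transpose());
        System.out.println("Predicted score of paper 0 citing reference 2: "
                + scores.getEntry(0, 2));
    }
}
```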

People face the problem of information overload on a daily basis. As people publish more information on the World Wide Web, this problem worsens day by day, making it difficult to find the information they need. By recommending items to users based on previously stated user preferences, recommender systems guide users and control the complexity of their information space. Research [18] applied the CF algorithm to the domain of research papers, working with a dataset as a rating matrix, where the rows of the matrix represent users and the columns represent items. To apply the CF algorithm to research papers, the citation network must be mapped onto a CF rating matrix; there are different ways to create such a matrix from the citation network between research papers.

One solution is to treat the citations as items in the matrix while the papers act as the users that "rate" them; this approach runs into setup problems in the citation network. An alternative approach is to consider the authors of papers as users and the cited references as items: in this author matrix, the references an author has used receive ratings. The problem with this method is that an author may have written several papers in different fields during his career, which can mislead the recommender [18].

The recommendation algorithm proposed in [18] takes a basket of citations as input and returns a ranked list of citations as output. In the experiment, CF algorithms such as co-citation matching, user-item CF, item-item CF, and naive Bayes classification were compared against non-CF algorithms such as local citation-graph search and keyword search. The evaluation was performed both offline and online. In the offline evaluation, the Bayesian recommender achieved the best ranking results, while in terms of paper coverage the user-user and user-item algorithms performed best. In the online test, where results were judged as related or unrelated papers, the Bayesian method, local citation-graph search, and keyword search performed better, but for novelty of the recommended papers the user-user and user-item algorithms performed better. Overall, CF algorithms can make recommendations for any paper and are suitable for finding papers to read, while the baseline methods are suitable for finding related work.

Citations are very important for the credibility of scholarly papers, and proper references support the claims in one's work. However, as the number of research publications grows, researchers may find it difficult to locate suitable and necessary references, so a citation recommendation engine can recommend a complete list of suitable, relevant references to authors writing a paper. Citation recommendation models are divided into two categories: global recommendation, which recommends a list of references for a whole draft paper, and local recommendation, which recommends references for a specific place in the paper. Reference [11] focused on local, context-based recommendation. A citation context c is defined as a sequence of words that appear around a specific citation; usually a citation context contains words that describe or summarize the cited paper, so the semantics of cited documents should be close to the citation context. The authors of [11] use a neural network model to estimate the probability of citing a paper given the citation context. This method ensures that words used to cite similar documents have high semantic similarity. To evaluate the model, a general test was performed on the CiteSeer dataset, and it performed better than the state-of-the-art methods.

2.3 Personalized recommendation

Interest in web-based recommender systems is high because personalized service is an important factor for shopping malls and Internet service providers. Researchers and academics store their own and others' research information on the Internet, share it, and eventually organize it. In research [6, 7], a personalized recommendation system was presented that increased the accuracy of finding research papers; such a system can be feasible and applicable for developing a recommender system for a specific domain.

The proposed system architecture [6, 7] consists of a user interface, crawler, transformer, extractor, filter and keyword finder, profile manager, user profile, and database. The user interface interacts with users, who supply input topics for the crawler and receive output for each topic. The crawler fetches papers from Google Scholar, stores the URL of each paper in one place, and converts them all to a specific format. The extractor extracts the title, keywords, abstract, and main body of each research paper in text form; if a paper has no keywords, keywords are extracted from its title. The filtering process selects suitable papers for users from among the stored papers, according to the topic selected by the user, through the user's profile, which stores the user's personal preferences as saved XML forms.

The suggested keyword-extraction algorithm first removes unnecessary words such as "or", "of", "is", etc.; then the keywords in the title and introduction, the keywords specified for the paper by the author, and the keywords on the first page of the paper are extracted. If there are no keywords, a few words can be selected from the title, and formula (2) checks the degree of reflection between the title and the keywords:

Reflex = #(Keywords in Title) / #(Title Terms)    (2)

A user profile is updated with each user click on a research paper: each time a paper is selected, the frequencies of the domain, topic, and profile keywords are incremented, and the rate of each occurrence is recalculated to reflect the user's profile. Cosine similarity is used to calculate similarity, which is also generally used to address the cold-start problem. For evaluation, SAT, the ratio of the total number of correct and incorrect research papers to the total number of research papers stored for a certain subject, and ACC, the ratio of correct research papers to the number of recommended research papers, were used, with significant results.
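
The profile-matching step can be illustrated with a small sketch, assuming keyword-frequency maps for the profile and the paper; the names and frequencies below are illustrative.

```java
import java.util.*;

/** Sketch: matching a user profile against a paper using cosine
 *  similarity, as in [6, 7]. Keyword frequencies are illustrative;
 *  in the real system they come from clicks on papers. */
class ProfileMatcher {
    /** Cosine similarity of two keyword-frequency vectors. */
    static double cosine(Map<String, Integer> a, Map<String, Integer> b) {
        double dot = 0, na = 0, nb = 0;
        for (Map.Entry<String, Integer> e : a.entrySet()) {
            na += e.getValue() * (double) e.getValue();
            Integer w = b.get(e.getKey());
            if (w != null) dot += e.getValue() * (double) w;
        }
        for (int w : b.values()) nb += (double) w * w;
        return (na == 0 || nb == 0) ? 0 : dot / (Math.sqrt(na) * Math.sqrt(nb));
    }

    public static void main(String[] args) {
        // Profile keyword frequencies, incremented on every paper click.
        Map<String, Integer> profile = new HashMap<>(Map.of("data", 5, "mining", 4, "hadoop", 2));
        Map<String, Integer> paper = new HashMap<>(Map.of("data", 2, "mining", 1, "network", 3));
        System.out.printf("similarity = %.3f%n", cosine(profile, paper));
    }
}
```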

In research [16], a small set of categories is provided for each user as context for each query the user submits, based on the user's search history. The method is as follows: (1) a strategy is used to collect the user's search history, (2) a user profile is created based on the search history and a general profile, and (3) appropriate categories for each user query are chosen based on the user profile and the general profile.

To personalize search, the user's search history must first be built as a tree of search records: the root of each search record is a query, and each query has one or more related categories. Then a user profile showing the user's interests is created; finally, a matrix is created for the user's history and profile, and a general profile is built from these two matrices. In addition to the three matrices, general knowledge applicable to all users is also used. Personalization is performed by mapping a user query to a set of categories reflecting the user's intent, based on the user profile and the general profile. The mapping is as follows:

  1. The similarity between the user's query and the categories representing the user's interests is calculated.

  2. The categories are ranked by similarity in descending order.

  3. Finally, the top three categories are shown to the user, together with a button that reveals the next three categories.

In study [21], a hybrid method in the individual prediction stage and an aggregation method based on the ER rule in the group aggregation stage are proposed for improved group article recommendation (GPRAH_ER). The framework of the proposed approach consists of three steps. The first step is data acquisition: the original data are obtained by a web crawler, as in Jokar et al.'s [30] paper, which is used to retrieve all researchers, articles, and groups in scholarly social networks (SSNs); records of article collections, researcher-group relationships, article content, and other information are gathered together, and then noise removal, word segmentation, and stop-word removal are performed on the obtained data. In the second step, individual prediction is performed based on probabilistic matrix factorization (PMF), a powerful method for modeling data associated with pairwise relationships and one of the common model-based CF methods, incorporating secondary information from articles and group information. The decomposition is performed on the researcher-article rating matrix R_{M×N} together with the researcher-group matrix G_{M×L} to obtain the researcher latent feature matrix U ∈ R_{M×K} and the article latent feature matrix V ∈ R_{K×N}; Ui and Vj refer to the latent feature vectors of the ith researcher and the jth article, respectively. To ensure that articles with similar content obtain similar feature vectors during matrix factorization, the content similarity between articles is calculated with content-based filtering (CBF), in which algorithms recommend suitable items to users based on item descriptions and user preferences, and is then included in the PMF method. Finally, in the third step, the group aggregation method is applied: for each group, the final predicted ratings are obtained through the evidential reasoning (ER) approach, the predicted ratings over the articles are sorted in descending order, and the top-ranked articles are recommended to the group. ER is a general approach for multi-criteria decision analysis that can be considered a probabilistic approach, using an integrated structure to model different types of uncertainty and make full use of all the data generated by users. The predicted researcher-article matrix R̃_{M×N} = {r̃ij} is obtained by the combined method; through R̃_{M×N}, the prediction values of the group members for the articles (i.e., r̃ij) can be read off. Then the aggregation process, which merges the prediction values of all members into a final recommendation list for each group by the ER rule, is performed: for each user Ui in group Gk, his prediction values across all articles can be taken as evidence for the "truth" of the prediction that an article matches the group's taste.
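
A toy sketch of the PMF factorization step may clarify the decomposition: it performs plain stochastic gradient descent on the observed entries of a small rating matrix, omits the group matrix and the CBF content similarity of the full method, and uses illustrative dimensions and hyper-parameters throughout.

```java
import java.util.Random;

/** Toy sketch of the PMF step of [21]: factorize the researcher-article
 *  rating matrix R (M x N) into latent matrices U (M x K) and V (K x N)
 *  by stochastic gradient descent on observed entries. Group information
 *  and content-based similarity from the full method are omitted. */
class PmfSketch {
    public static void main(String[] args) {
        double[][] r = { {5, 3, 0}, {4, 0, 1}, {0, 2, 4} }; // 0 = unobserved
        int m = 3, n = 3, k = 2;
        double lr = 0.01, reg = 0.05;
        double[][] u = new double[m][k], v = new double[k][n];
        Random rnd = new Random(42);
        for (int i = 0; i < m; i++) for (int f = 0; f < k; f++) u[i][f] = rnd.nextGaussian() * 0.1;
        for (int f = 0; f < k; f++) for (int j = 0; j < n; j++) v[f][j] = rnd.nextGaussian() * 0.1;

        for (int epoch = 0; epoch < 2000; epoch++)
            for (int i = 0; i < m; i++)
                for (int j = 0; j < n; j++) {
                    if (r[i][j] == 0) continue;          // skip unobserved ratings
                    double pred = 0;
                    for (int f = 0; f < k; f++) pred += u[i][f] * v[f][j];
                    double err = r[i][j] - pred;
                    for (int f = 0; f < k; f++) {        // gradient step with L2 regularization
                        double uf = u[i][f];
                        u[i][f] += lr * (err * v[f][j] - reg * uf);
                        v[f][j] += lr * (err * uf - reg * v[f][j]);
                    }
                }
        // Predicted rating of researcher 0 for the unobserved article 2.
        double pred = 0;
        for (int f = 0; f < k; f++) pred += u[0][f] * v[f][2];
        System.out.printf("predicted r(0,2) = %.2f%n", pred);
    }
}
```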

2.4 Extracting user interests and behavior

In a system, users often share papers from different domains, with specific preferences for specific domains. In research [8], it is assumed that the user's interests can be extracted from their behavior on the site, calculated automatically as they visit pages. The authors approximated user interests by collecting and analyzing user behavior and developed a personalized recommendation model that selects candidate papers for recommendation. Their experiments showed that the behavior-based recommendation model increases recommendation accuracy. The system has little behavioral information about newly registered or inactive users and cannot learn their preferences, so the authors optimize their proposed model for this case.

To implement user-centered personal recommendation, the system needs to track the user's behavior to collect information and to discover the features that reflect the user; the preferences with high influence in the model should be selected. The authors [8] select user preferences from the site structure and the existing data records. Active user operations include publishing a paper, marking a paper as interesting, rating a paper, commenting, and tagging a paper. They [8] extract the title, abstract, and keywords of the papers of interest to each user and assemble them into one file. They define the recommendation task as a triple (Di, Dx, U), where U refers to the user, Dx to the collection of papers the user is viewing, and Di to the collection of papers to recommend to the user.

In this work [23], a user intention model based on deep sequential topic analysis is proposed. This model predicts user intent based on the topics of interest. The proposed approach has two steps: (1) user preference learning and (2) user intent prediction. In the first step, the user's preferences are learned implicitly: the method gathers preferences by considering the user's click data, where, in a research paper recommender system, the click data points to a set of papers. A paper has several properties; the proposed method considers the latent features of papers, their "topics", as the user's preference, that is, it tries to capture the user's preference in terms of topics of interest. To extract the topics of the papers, a hybrid topic model is proposed, combining the Latent Dirichlet Allocation (LDA) method and the Word2Vec method. LDA, a topic modeling technique, is used to calculate the probability distribution of the words of a paper over several predefined topics. A measure of word-to-word correlation is needed to justify the correct identification of paper topics; for this, the Word2Vec word embedding technique is used, which helps group semantically and syntactically similar words within a specific topic. The work proposes a method for combining LDA and Word2Vec to obtain the full word-topic distribution. In the second step, the user profile is divided into two types: (1) static and (2) dynamic. A static profile contains user information (such as age and gender) that does not need to change; generally, this information is provided by the user himself. In contrast, the dynamic profile is automatically generated by the recommender system, and the features it contains change over time. Since the proposed approach captures a user's topics of interest, which may change with context or time, the profile is dynamic: it is updated accordingly, and its size and variety also increase. Conventional user profile modeling approaches are used to obtain all historical sequences (long-term and short-term) of user-item interaction. The proposed approach uses a deep sequential topic analysis technique to predict a user's future favorite topic from his past sequence of preferred topics. Specifically, a variant of the Recurrent Neural Network (RNN), the Long Short-Term Memory (LSTM), is used to combine long-term and short-term topic interest to predict future topic interest. The model considers the time context, that is, the time difference between the clicks on two consecutive papers viewed by the user; in addition, a few external attributes such as "likes" (whether the user likes the paper or not) and "session number" are considered.
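
A greatly simplified stand-in may clarify how the time context can be folded into a dynamic topic profile. Note that the actual model in [23] is an LSTM over topic sequences; this sketch only applies an exponential decay, and the topics, decay factor, and time gaps are illustrative.

```java
import java.util.*;

/** Simplified stand-in for the dynamic profile of [23]: maintain a user's
 *  topic-interest distribution from a click sequence, down-weighting old
 *  clicks exponentially by the time gap between consecutive clicks. The
 *  actual paper uses an LSTM over the topic sequence; this toy version
 *  only illustrates how recency can be folded into the profile. */
class DynamicTopicProfile {
    private final Map<String, Double> weights = new HashMap<>();
    private final double decayPerHour = 0.9; // illustrative decay factor

    /** Record a click on a paper with the given topic; hoursSinceLast is
     *  the time gap to the previous click (the "time context"). */
    void click(String topic, double hoursSinceLast) {
        double decay = Math.pow(decayPerHour, hoursSinceLast);
        weights.replaceAll((t, w) -> w * decay);          // fade old interests
        weights.merge(topic, 1.0, Double::sum);           // reinforce this topic
    }

    /** The topic currently predicted as the user's favorite. */
    String predictedTopic() {
        return weights.entrySet().stream()
                .max(Map.Entry.comparingByValue())
                .map(Map.Entry::getKey).orElse(null);
    }

    public static void main(String[] args) {
        DynamicTopicProfile p = new DynamicTopicProfile();
        p.click("data mining", 0);
        p.click("data mining", 1);
        p.click("networks", 30);   // long gap: older interests fade
        p.click("networks", 1);
        System.out.println("next favorite topic: " + p.predictedTopic());
    }
}
```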

2.5 Extracting the domain of papers

Text analysis is the process of analyzing unstructured textual data with the aim of extracting meaningful information. This information is used to identify the domain of each paper, where a domain refers to a specific branch of scientific knowledge or a scientific field. For example, computer science has several domains such as data mining, networking, operating systems, etc., and the data mining domain in turn has several sub-domains such as pattern recognition, machine learning, statistics, etc. Any given paper is basically a set of words; when a researcher reads the paper, he may be able to discover its domain and sub-domain, but this ability depends on the researcher's prior knowledge of the domain and sub-domain.

In reference [27], disambiguation of prepositions using a supervised learning method is used to extract the domain of each paper. The authors describe a method for extracting meaning from the title, keywords, and abstract of research papers in the field of computing.

To identify the domain of each paper, there are textual methods such as the use of words, stop words, sentences, clauses, phrases, n-grams, prepositions, prepositions with a sense of intention, interesting phrases, derivatives, word scope, disambiguation of prepositions, etc. Some of these items are defined as follows:

  1. Sentence: a sequence of words that is complete in itself, contains a subject and a predicate, and conveys a statement; it consists of a main clause and one or more subordinate clauses.

  2. Clause: a unit of grammatical organization that includes an entity and a proposition.

  3. Preposition: a governing word that usually comes before a noun or pronoun and expresses its relation to another word or element in the clause.

  4. Preposition with a sense of intention: a preposition that is followed by a phrase indicating a specific purpose or action.

  5. Interesting phrase: a phrase that begins after a preposition with a sense of intention and ends before the next preposition in the sentence or at the end of the sentence.

  6. Disambiguation of prepositions: determining the sense a preposition conveys in a particular context; semantics, the branch of linguistics dealing with the meaning of words and phrases in context, underlies this task.

In the proposed method [27], the title of a paper, which can be a sentence or a phrase, is used: the title is scanned to find prepositions with a sense of intention, and the interesting phrases that follow such prepositions are extracted. In the next step, interesting phrases that have one or more words in common with the keywords are saved as derivatives. Each derivative is then labeled and classified as domain or non-domain, which forms the dataset of training data as an n-gram list; the test data are then classified using the training data. Text mining is a growing field that involves automatic knowledge extraction from natural language. Natural language processing (NLP) provides essential techniques for text mining; preposition sense disambiguation is an NLP method by which researchers extract the important "sense" conveyed by prepositions in the text.
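
The scanning step can be sketched as follows; the preposition lists are abbreviated and the example title is invented.

```java
import java.util.*;

/** Sketch of the phrase extraction described in [27]: scan a title for
 *  prepositions with a sense of intention ("for", "to", "towards") and
 *  keep the phrase up to the next preposition or the end of the title.
 *  The preposition lists are abbreviated for illustration. */
class InterestingPhraseExtractor {
    static final Set<String> INTENT = Set.of("for", "to", "towards");
    static final Set<String> OTHER = Set.of("of", "in", "on", "with", "by", "from", "at");

    static List<String> extract(String title) {
        String[] words = title.toLowerCase().split("\\s+");
        List<String> phrases = new ArrayList<>();
        for (int i = 0; i < words.length; i++) {
            if (!INTENT.contains(words[i])) continue;
            StringBuilder phrase = new StringBuilder();
            for (int j = i + 1; j < words.length
                    && !INTENT.contains(words[j]) && !OTHER.contains(words[j]); j++) {
                if (phrase.length() > 0) phrase.append(' ');
                phrase.append(words[j]);
            }
            if (phrase.length() > 0) phrases.add(phrase.toString());
        }
        return phrases;
    }

    public static void main(String[] args) {
        System.out.println(extract(
            "A supervised method for domain extraction from research papers"));
        // -> [domain extraction] (the phrase after "for", cut at "from")
    }
}
```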

In study [28], the important senses of prepositions were extracted from the text together with some interesting phrases, which were then classified using a naive Bayes classifier. The domain of a scientific paper is its main topic, a branch of a scientific discipline; for example, the computer discipline has domains such as networking, parallel computing, theory of computation, data mining, etc. Each domain can in turn have sub-domains, considered second-level domains; for example, the second-level domains of data mining include NLP, pattern recognition, and so on. The method used in [28] extracts the root or context of each paper in the form of its domain; interesting phrases, based on their position in the vicinity of certain prepositions, are deduced using the results of preposition disambiguation.

The authors of the paper [28] used interesting phrases for their research; they compiled a complete list of prepositions after reviewing several English textbooks. Each preposition in this set is defined as pi, with all other prepositions defined as p0:

pi = {for, to, towards}

A complement C is an expression derived from the arrangement of pi and p0 in a clause, and E is defined as the end of a clause:

pi C p0,   p0 C pi,   pi C E

The authors of [28] used the combination of titles and keywords to extract interesting phrases: the titles of papers tend to be unique and are good proxies for domain recognition, while phrases from keywords are usually more widely used and serve as tags for identifying the domain. The interesting phrases common to the title and the keywords are then retained as derivatives for retrieval. For classification, a small amount of input data is labeled and each derivative is classified as "domain" or "not domain" using a Bayesian algorithm.

In research [29], the title and abstract were used to summarize the paper; the method identifies frequent phrases in the titles of research papers to discover trends in the research topics of a field. First, phrases are extracted by removing punctuation marks and stop words. These phrases may recur in many similar lines, so their sub-phrases must be extracted to determine how many times each sub-phrase is repeated in the title and abstract. The proposed method was implemented in the Java and R languages for various conferences and journals, with significant results.
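
A minimal sketch of the sub-phrase counting, with an abbreviated stop-word list and invented titles:

```java
import java.util.*;

/** Sketch of the frequent-phrase discovery of [29]: strip punctuation and
 *  stop words from titles, then count how often every contiguous
 *  sub-phrase (n-gram) occurs across the collection. */
class FrequentPhrases {
    static final Set<String> STOP =
        Set.of("a", "an", "the", "of", "for", "and", "in", "on", "with");

    static Map<String, Integer> countSubPhrases(List<String> titles, int maxLen) {
        Map<String, Integer> counts = new HashMap<>();
        for (String title : titles) {
            String clean = title.toLowerCase().replaceAll("[^a-z0-9 ]", " ");
            List<String> words = new ArrayList<>();
            for (String w : clean.split("\\s+"))
                if (!w.isEmpty() && !STOP.contains(w)) words.add(w);
            for (int len = 1; len <= maxLen; len++)      // all sub-phrases up to maxLen words
                for (int i = 0; i + len <= words.size(); i++)
                    counts.merge(String.join(" ", words.subList(i, i + len)), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> titles = List.of(
            "Deep learning for paper recommendation",
            "Paper recommendation with deep learning");
        countSubPhrases(titles, 2).entrySet().stream()
            .filter(e -> e.getValue() > 1)               // keep phrases seen more than once
            .forEach(e -> System.out.println(e.getKey() + " -> " + e.getValue()));
    }
}
```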

The authors of reference [13] proposed a personalized recommendation system based on an interest-mining model, which effectively examines the characteristics of newly published papers and assigns them to the appropriate group. Personalized recommendation technology is mainly divided into three categories: rule-based filtering, content-based filtering, and collaborative filtering.

Rule-based filtering requires users to provide information about their interests; the interest model is created and maintained by the users themselves. The advantage of this method is that it reflects the user's interests accurately, but such systems may perform poorly in availability and scalability because the users bear the burden of modeling. In content-based filtering, the system automatically collects the user's history information; the problem with this method is building the domain ontology, because it requires a large amount of training text for clustering. Collaborative filtering is usually based on KNN, which is very popular, but it currently faces challenges such as sparse data and scalability; growth in the number of users and items leads to computational complexity, and it also suffers from the cold-start problem. Because of these problems, the authors developed a personalized recommendation method that uses classification algorithms from machine learning (a hierarchical Bayesian model for content-based recommendation).

The work in [25] was done in three stages. In the first stage, the text of the papers was pre-processed (removing noise and stop words). The second stage was the semantic operation, in which Gibbs sampling was used to discover knowledge and extract meanings; the data were then modeled using LDA (Latent Dirichlet Allocation). LDA assumes that each word in a document is generated in two steps: first, assuming that each document has its own topic distribution, a topic is randomly selected based on the document's topic distribution; next, assuming that each topic has its own word distribution, a word is randomly drawn from the word distribution of the topic selected in the previous step. By repeating these two steps word by word, a document is created.
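
The two-step generative process can be sketched directly; the topic and word distributions below are hand-made toys, not distributions fitted by Gibbs sampling.

```java
import java.util.*;

/** Sketch of the two-step LDA generative process described above: for each
 *  word position, draw a topic from the document's topic distribution,
 *  then draw a word from that topic's word distribution. */
class LdaGenerativeSketch {
    /** Draw an index from a discrete probability distribution. */
    static int sample(double[] probs, Random rnd) {
        double u = rnd.nextDouble(), cum = 0;
        for (int i = 0; i < probs.length; i++) { cum += probs[i]; if (u < cum) return i; }
        return probs.length - 1;
    }

    public static void main(String[] args) {
        Random rnd = new Random(7);
        double[] docTopics = {0.7, 0.3};                 // this document's topic mixture
        String[][] vocab = { {"data", "mining", "pattern"}, {"network", "routing", "protocol"} };
        double[][] topicWords = { {0.5, 0.3, 0.2}, {0.4, 0.4, 0.2} };

        StringBuilder doc = new StringBuilder();
        for (int pos = 0; pos < 8; pos++) {
            int z = sample(docTopics, rnd);              // step 1: choose a topic
            int w = sample(topicWords[z], rnd);          // step 2: choose a word from it
            doc.append(vocab[z][w]).append(' ');
        }
        System.out.println(doc);
    }
}
```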

Basically, LDA reduces the high dimensionality of the researchers' data from words to topics, based on the co-occurrence of words in the same document, similar to cluster analysis or principal component analysis. Under the LDA algorithm, the set of documents is defined as the scholar context, and the words are assigned to topics. Finally, by considering the marginal probability of generating each researcher's article in the collection, the probability of the corpus-building process is determined.

In the third stage, after semantic and topic extraction, keywords with high rank and high probability are generated as burst topics. These topics are recommended as data science research areas (Figure 1) [25].

Figure 1.

The research framework for recommending research fields and semantic mining from scholars' papers [25].

2.6 A recommender system in the Big Data platform

2.6.1 Hadoop platform

Reference [30] presented a method based on the contextual information of papers on the Hadoop platform. The authors collected a dataset of conference papers from 2013 to 2015 from the IEEE digital library in the three domains of computer, electrical, and mechanical engineering, using a crawler that received the URL of each paper as input and produced as output an XML file containing fields such as the title, abstract, and keywords of the paper; the keywords section included the keywords the author chose for the paper, the terms IEEE assigned, the INSPEC controlled index terms, and the INSPEC non-controlled index terms.

Their proposed system included two steps: (1) identifying the scope of papers and (2) recommending similar papers.

  1. Identification of the scope of the papers

    Each paper profile contains sentences and words from the title, abstract, and keywords. These words must be converted into a form the processor can understand, so they are converted to codes. The sentences contain words such as "is", "are", "when", etc., which are called stop words and appear in all papers; these words must be removed. After removing them, the remaining words are stemmed, then weighted and converted to codes. The keywords, the other terms specified in the profile, and the words that appear more than twice in the abstract are considered the paper's domain vector. For 80% of the papers, the domains were used for training; for the remaining 20%, the domain was detected using the similarity between the domain vector of the new paper and the domain vectors of the dataset papers, and each paper was stored in the corresponding domain.

    1. Removing stop words and finding roots

      First, we create a class named RemoveStopWord, which includes three methods named deleteStopWord, stem, and main; the function of each is explained below.

      deleteStopWord:

      This method first receives the list of stop words as a collection and converts this collection into an array list.

      stem:

      This method is an implementation of Porter's algorithm, which takes a word as input and returns the root of that word as output. The algorithm can be added to Java projects as a library.

      main:

      This method consists of two parts:

      1. Creating the list of stop words

        In this part, the list of stop words is read from the input file and a set of stop words is created.

      2. Converting the dataset of words into an equivalent dataset of word roots

        The system dataset file, whose collection and construction were described at the beginning of the chapter, is received as input. This function converts the dataset file into a dataset containing the roots of the words used in it.

    2. Weighting words based on tf-idf

      This part is implemented by the TFID class, which uses map and reduce methods to implement the MapReduce model (explained in the previous chapter); a minimal MapReduce sketch is given at the end of this subsection. Since word weighting is based on n-grams, two helper functions are needed, defined as follows. In that research, the researchers set n = 2: the helper receives a sentence of words from each paper profile, splits the sentence into two-word sequences (bigrams), and returns them as output.

      Map stage of TFID production

      Each line of the new dataset has a line number, which is treated as the record number in the map stage, and the content section (title words, abstract, and keywords) is treated as the value.

      The output of this step is key-value pairs: the key is the extracted bigram (the TFID column) and the value is the partial tf-idf count (currently set to 1), so that the real tf-idf value can be computed in the reduce stage by summing the ones corresponding to each bigram.

      Reduce stage of TFID production

      In this stage, the input of the program is the output of the map stage; the reducer computes the tf-idf of the bigrams of the textual part of each paper (title, abstract, and keywords) and emits them as output.

      If the weighted papers belong to the 80% training collection, step III is executed next; otherwise the papers belong to the test collection and step IV is executed.

    3. Utility matrix

      After weighting all the textual parts of the papers, these weights, along with the bigrams and the ID of each paper, are entered into a utility matrix created as key-value intermediate Hadoop files: the key, or row number, corresponds to the paper's ID, the value, or column, holds the bigrams, and the numbers inside the matrix are the tf-idf weight of each bigram in that paper and its domain. This 80% is the training part of the dataset.

    4. Test set

      We considered 20% of the dataset as the test set, in which the domain of each paper is recognized based on the similarity between the paper's keywords and the domains of the papers already specified and stored in the utility matrix. First, the paper profiles go back through steps I and II to remove stop words, stem, and weight the words; then, in step III, since these profiles belong to the test set, they proceed to step V for domain detection.

    5. Domain recognition

      To detect the domain of a paper, we first compute the similarity between the vector obtained from weighting its words and the domains of the papers stored in the utility matrix (the bigrams in the columns of the matrix are treated as the terms of each domain) using cosine similarity; where the similarity is high, the domain of the paper is determined and the paper is stored in that domain. To optimize the system, this step is merged with step II.

      All the previous steps were related to building the dataset; now we enter the recommender part of the system. If the incoming user is new and is using this system for the first time, he goes to the next step; otherwise, he goes to step II of part 2.

      The system randomly suggests some papers from each domain, and by choosing one of them, the user's working domain is determined and saved by the system in the user's profile. Then it proceeds to the next step.

  2. Building a recommender system and recommending papers

    For this part, a class named Recommender has been created, which uses four other classes named MapperJoin, MapperRecommender, ReducerJoin, and Reducer for the map-reduce steps. Their operation, inputs, and outputs are discussed below.

    This part consists of map and reduce operations that take place in two separate stages and is responsible for setting up the Hadoop driver.

    1. MapperRecommender map

      This class performs a simple mapping operation so that the desired operation can be carried out in a suitable form in the reduce phase; its input is the TFID table, which includes the bigrams of the title, abstract, and keywords.

    2. Reducer

      This step receives the output of the previous map stage and produces output that makes it possible to check the similarity between the papers entered as input and the papers in the dataset; these papers can then be recommended to the user who imported the original papers.

    3. MapperJoin map

      This mapping performs the mapping operation needed to check the similarity between papers. At this stage, the goal is to group identical bigrams (the same key) so that papers with the same bigram are put together; this is necessary for suggesting papers, because papers that share a bigram share words in the title, abstract, or keywords, or share a key domain.

    4. ReducerJoin reduce

      At this stage, similar papers are suggested using cosine similarity, based on the degree of similarity between the papers that the user chose the first time (or chose as his working domain) and the papers in the dataset.

      The recommended paper, along with its domain, is saved in the user's profile for further recommendations, and the work of the recommender system is finished.

      Finally, the researchers [30] used the two criteria of user satisfaction (SAT) and accuracy (ACC) provided in reference [7] for evaluation, shown in formula (3):

SAT = Number of correct research papers / Total number of recommended papers

ACC = (Recommended by SAT + Not recommended by UNSAT) / Total number of papers saved for a specific subject    (3)

According to formula (3), SAT shows the ratio of correct research papers, i.e., those consistent with the user's domain, to the number of recommended papers, and ACC shows the proportion of correct or incorrect research papers consistent with the user's domain to the total number of papers saved for a specific domain.
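
The map and reduce stages described in steps I and II above can be sketched as a minimal Hadoop job; the stop-word list is abbreviated, the stemmer is a crude stand-in for the Porter jar used in [30], and the idf factor and the utility matrix are omitted.

```java
import java.io.IOException;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

/** Minimal MapReduce sketch of steps I-II above: the mapper removes stop
 *  words, applies a crude stemming stand-in (the real system uses the
 *  Porter stemmer jar), and emits (bigram, 1) for the textual part of each
 *  paper profile; the reducer sums the ones into term frequencies. */
public class PaperBigramTf {
    public static class ProfileMapper extends Mapper<Object, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private static final Set<String> STOP =
            new HashSet<>(Arrays.asList("is", "are", "when", "the", "a", "of", "and"));

        /** Crude suffix-stripping stand-in for Porter stemming. */
        private static String stem(String w) {
            for (String s : new String[] {"ing", "ed", "es", "s"})
                if (w.length() > s.length() + 2 && w.endsWith(s))
                    return w.substring(0, w.length() - s.length());
            return w;
        }

        @Override
        protected void map(Object key, Text value, Context ctx)
                throws IOException, InterruptedException {
            // value: one paper profile line (title, abstract and keywords).
            String prev = null;
            for (String raw : value.toString().toLowerCase().split("\\W+")) {
                if (raw.isEmpty() || STOP.contains(raw)) continue;
                String w = stem(raw);
                if (prev != null) ctx.write(new Text(prev + " " + w), ONE);
                prev = w;
            }
        }
    }

    public static class SumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        protected void reduce(Text key, Iterable<IntWritable> values, Context ctx)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable v : values) sum += v.get();
            ctx.write(key, new IntWritable(sum));
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "paper bigram tf");
        job.setJarByClass(PaperBigramTf.class);
        job.setMapperClass(ProfileMapper.class);
        job.setCombinerClass(SumReducer.class);
        job.setReducerClass(SumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```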

2.6.2 Mahout platform

In reference [31], the implementation is done in four phases: the first phase extracts ideas from the paper, the second phase extracts the author's working area from the written paper, the third phase creates a profile for the author (user), and the fourth phase recommends papers of interest to the user. These four phases are explained below.

  1. Phase one: extracting ideas from the paper

    In this part, two methods are used to obtain the idea of the paper: in the first, the title of the paper is considered, and in the second, the abstract and title of the paper are considered for extracting the idea of the paper.

    1. Extracting the idea of the paper from the abstract

      In the following code, the abstract part of the paper is used to extract ideas using the dataset containing the paper’s specifications in XML form, and the cue words are removed using the RemovecueWords function, and then the idea of the paper is extracted using the getIdea function. The following shows how these functions work.

      RemovecueWords:

      A class named RemovecueWords is created, which first receives a list of cue words as a set, finds the sentences of the abstract that contain cue words, removes those words from the sentences, and sends the sentences without cue words to the output.
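A minimal sketch of such a class is shown below, assuming a simple regex sentence split and a lower-cased cue-word set; it illustrates the described behavior and is not the authors' code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

public class RemovecueWords {
  private final Set<String> cueWords;

  public RemovecueWords(Set<String> cueWords) { this.cueWords = cueWords; }

  /** Returns the abstract sentences that contained a cue word, with those words stripped. */
  public List<String> apply(String abstractText) {
    List<String> out = new ArrayList<>();
    for (String sentence : abstractText.split("(?<=[.!?])\\s+")) { // naive sentence split
      boolean hasCue = false;
      StringBuilder sb = new StringBuilder();
      for (String w : sentence.split("\\s+")) {
        if (cueWords.contains(w.toLowerCase())) { hasCue = true; continue; } // drop cue word
        sb.append(w).append(' ');
      }
      if (hasCue) out.add(sb.toString().trim());
    }
    return out;
  }
}
```

For example, `new RemovecueWords(Set.of("propose", "present")).apply(abstractText)` would keep only the sentences that announce the paper's contribution, minus the announcing words; the cue-word choices here are assumptions.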

      getIdea:

      The getIdea class first applies part-of-speech (POS) tagging to the words extracted in the previous section and searches for ideas as runs of consecutive words carrying noun and adjective tags. For tagging, the Stanford MaxentTagger is used, which is added to the Java project as a jar file; a configuration (model) file is provided to MaxentTagger as input.
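The sketch below shows one way this step could look with the Stanford MaxentTagger, whose tagString method returns tokens in word_TAG form; the model path is an assumption standing in for the configuration file the authors supply to the tagger.

```java
import edu.stanford.nlp.tagger.maxent.MaxentTagger;
import java.util.ArrayList;
import java.util.List;

public class GetIdea {
  // Assumed model path; the authors load an equivalent configuration file.
  private final MaxentTagger tagger =
      new MaxentTagger("models/english-left3words-distsim.tagger");

  /** Returns runs of consecutive adjective/noun tokens as candidate ideas. */
  public List<String> extract(String sentence) {
    List<String> ideas = new ArrayList<>();
    StringBuilder run = new StringBuilder();
    for (String tok : tagger.tagString(sentence).split("\\s+")) { // tokens look like "word_TAG"
      int sep = tok.lastIndexOf('_');
      String word = tok.substring(0, sep), tag = tok.substring(sep + 1);
      if (tag.startsWith("JJ") || tag.startsWith("NN")) { // adjective or noun: extend the run
        run.append(word).append(' ');
      } else {
        if (run.length() > 0) ideas.add(run.toString().trim());
        run.setLength(0);
      }
    }
    if (run.length() > 0) ideas.add(run.toString().trim());
    return ideas;
  }
}
```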

    2. Extracting the idea of the paper from the body of the paper

      At this stage, since the purpose of extracting ideas is to compute the similarity of the title with the body of the paper, the title and body must be extracted and given a series of pre-processing steps. In this part, the text of the paper is used (the same text that contains the existing references). In general, the sentences kept are those similar to the title, because the title contains the important, key terms, and sentences containing those terms are considered similar. Using TF, the words with the highest frequency are found, that is, the sentences most similar to the title are extracted. The title part of the paper is extracted from the input files for further processing.

      Remove common words:

      In this section, a list of the words that are repeated most often in the text of the paper is formed, and the sentences that contain these words are deleted by this function. Then the similarity of each sentence from the previous step to the title is computed, and any sentence with the highest similarity that also reaches the configured threshold is recorded as a paper idea in the user's profile.
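A minimal sketch of this filtering and scoring, under assumed inputs (a pre-split sentence list and a common-word set) and an assumed similarity threshold:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class TitleSimilarity {
  /** Drops sentences containing common words, then keeps sentences whose
      share of title terms reaches the threshold (e.g., 0.3, an assumption). */
  public static List<String> extractIdeas(String title, List<String> sentences,
                                          Set<String> commonWords, double threshold) {
    Set<String> titleTerms = new HashSet<>(Arrays.asList(title.toLowerCase().split("\\W+")));
    List<String> ideas = new ArrayList<>();
    for (String s : sentences) {
      List<String> terms = Arrays.asList(s.toLowerCase().split("\\W+"));
      if (!Collections.disjoint(terms, commonWords)) continue; // sentence deleted
      long shared = terms.stream().filter(titleTerms::contains).count();
      if (!terms.isEmpty() && (double) shared / terms.size() >= threshold)
        ideas.add(s); // recorded as a paper idea in the user's profile
    }
    return ideas;
  }
}
```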

  2. The second phase: extracting the topic and scope of the paper

    At this stage, two methods are used to extract the topic of the paper: the first uses the title of the paper and the references section, and the second uses the abstract and body of the paper.

    1. Extracting the scope of work from the title and references

      In this part, we need to stem the words, remove the stop words, and split the text at punctuation marks; the terms that appear with high frequency between punctuation marks after this pre-processing are stored as the topic in HDFS and in the user profile.

      1. Title

        Finding stop words and stemming

        First, we create a class called Remove stop word, which includes three parts: LinkedHashSet, deleteStopWord, and stem, described as follows:

        1. LinkedHashSet:

          This part builds the set of stop words.

        2. deleteStopWord:

          This part takes the stop words from the previous section as input, puts them in an array, splits each sentence at the stop words, and finally removes the stop words.

        3. Stem:

          This part implements Porter's algorithm: it takes the sentences split in the previous section, receives their words as input, and outputs their roots. The stemming operation is performed through the following code using the Porter stemmer jar file.
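A minimal sketch of the class described above; the stop-word list is illustrative, and the stem method is a simplified suffix stripper standing in for the Porter stemmer jar the authors use.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.Set;

public class RemoveStopWord {
  // LinkedHashSet keeps the stop words in insertion order, as in the text;
  // the particular words here are assumptions.
  private final Set<String> stopWords =
      new LinkedHashSet<>(Arrays.asList("a", "an", "the", "of", "in", "for"));

  /** Breaks a sentence at the stop words and drops them (may yield empty fragments). */
  public String[] deleteStopWord(String sentence) {
    String pattern = "\\b(" + String.join("|", stopWords) + ")\\b\\s*";
    return sentence.toLowerCase().split(pattern);
  }

  /** Simplified suffix stripping, a stand-in for the Porter stemmer jar. */
  public String stem(String word) {
    return word.replaceAll("(ing|ed|es|s)$", "");
  }
}
```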

      2. References

        In addition to the work of the previous part, punctuation marks are used for the references. We receive the punctuation marks in an array, break each reference at the punctuation marks, stem the resulting words, and finally remove the marks. At the end, after stemming the words of the broken sentences, we find the frequency of these words in the abstract and take the words with the highest frequency as the author's work area.
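A minimal sketch of this reference processing, with an assumed punctuation set and the same simplified stemming stand-in as above:

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class ReferenceTerms {
  /** Splits each reference at punctuation, stems the words, and counts the roots;
      the highest-frequency roots become the author's work area. */
  public static Map<String, Integer> termFrequencies(List<String> references) {
    Map<String, Integer> freq = new HashMap<>();
    for (String ref : references) {
      for (String fragment : ref.split("[,;:.()\\[\\]\"]")) { // break at punctuation (assumed set)
        for (String word : fragment.trim().toLowerCase().split("\\s+")) {
          if (word.isEmpty()) continue;
          String root = word.replaceAll("(ing|ed|es|s)$", ""); // stand-in for the Porter jar
          freq.merge(root, 1, Integer::sum);
        }
      }
    }
    return freq;
  }
}
```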

    2. Extracting the work area from the abstract and text of the paper

      This method finds the words with the highest frequency in the body and the abstract, or the words that most often occur together; at the end, it deletes near-duplicate words and merges the remaining words.
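One plausible reading of this step, sketched below with an assumed minimum count of 2: terms are counted separately in the abstract and the body, and only terms frequent in both are kept as the work area.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;

public class WorkAreaFromText {
  private static Map<String, Long> counts(String text) {
    return Arrays.stream(text.toLowerCase().split("\\W+"))
        .filter(t -> !t.isEmpty())
        .collect(Collectors.groupingBy(t -> t, Collectors.counting()));
  }

  /** Keeps terms that appear at least twice in both the abstract and the body. */
  public static Set<String> workArea(String abstractText, String bodyText) {
    Map<String, Long> a = counts(abstractText), b = counts(bodyText);
    Set<String> area = new HashSet<>();
    for (String t : a.keySet())
      if (a.get(t) >= 2 && b.getOrDefault(t, 0L) >= 2) area.add(t);
    return area; // near-duplicate terms would still need merging, as the text notes
  }
}
```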

  3. The third phase: creating a user profile

    Based on the information obtained in the previous sections and the fields in the dataset, such as the title, abstract, keywords, scope, and idea of the paper, a feature vector is formed to create the profile.
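A minimal sketch of assembling such a feature vector, assuming simple term counts over the concatenated fields (the weighting scheme is an assumption):

```java
import java.util.HashMap;
import java.util.Map;

public class UserProfile {
  /** Builds a term-count feature vector over the given fields
      (e.g., title, abstract, keywords, scope, idea). */
  public static Map<String, Double> featureVector(String... fields) {
    Map<String, Double> vec = new HashMap<>();
    for (String field : fields)
      for (String term : field.toLowerCase().split("\\W+"))
        if (!term.isEmpty()) vec.merge(term, 1.0, Double::sum);
    return vec;
  }
}
```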

  4. The fourth phase: building a recommender system and recommending papers

At first, cosine similarity over the feature vectors built in the previous part is used to detect the similarity of each feature in the author's profile with the candidate paper. The Mahout platform is used to build the recommender system because it is simple to implement with and most of the needed library functions are ready-made; a sketch of the recommendation step is shown below.

For example, the author's field of work is extracted by taking the title of the paper, finding the stop word in it, breaking the title into two sentences at that stop word, stemming the words of each sentence, and finally finding the frequency of each word in the abstract; the highest-frequency words are registered as the author's field of work in the author's profile. The most similar paper is then presented to the user according to the feature vector in the user's profile, using cosine similarity. Finally, the researchers used the recall and DCG criteria for evaluation; compared with other research, the proposed method delivered acceptable performance.
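The sketch below illustrates the Mahout side of this step, not the authors' exact code: the profile and a candidate paper are sparse vectors over a shared term index, and Mahout's CosineDistanceMeasure (which returns 1 minus the cosine similarity) scores the candidate. The vector dimension and the term-to-index mapping are assumptions.

```java
import org.apache.mahout.common.distance.CosineDistanceMeasure;
import org.apache.mahout.math.RandomAccessSparseVector;
import org.apache.mahout.math.Vector;

public class MahoutRecommend {
  /** Cosine similarity via Mahout's distance measure (distance = 1 - similarity). */
  public static double similarity(Vector profile, Vector paper) {
    return 1.0 - new CosineDistanceMeasure().distance(profile, paper);
  }

  public static void main(String[] args) {
    Vector profile = new RandomAccessSparseVector(10_000); // assumed term-space size
    Vector paper   = new RandomAccessSparseVector(10_000);
    profile.setQuick(42, 3.0);  // weight of some shared term in the author's profile
    paper.setQuick(42, 1.0);    // the same term appearing in a candidate paper
    System.out.println(similarity(profile, paper)); // 1.0: the vectors are parallel
  }
}
```

In practice, each candidate paper's vector would be scored against the profile and the highest-similarity papers recommended.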

3. Conclusion

With the ever-increasing amount of data in digital libraries, they, like other businesses, have taken advantage of recommender systems and try to personalize recommendations by using the textual information of papers. This chapter has examined text-aware recommender systems: recommending papers to cite in draft papers, personalized recommendation, extracting user interests and behavior and analyzing them for personalization, extracting the scope of papers, and paper recommender systems built on big data platforms. The recommender systems reviewed in this chapter used the textual information of papers to create paper profiles, from which it is easy to recognize the scope of new papers and the work area of each user, and to provide satisfactory recommendations matched to the user's scope and interests. Some of the research also built profiles for users from their papers and recommended papers using the collaborative filtering (CF) method.

References

  1. Yu M-C, Jim Wu Y-C, Alhalabi W, Kao H-Y, Wu W-H. ResearchGate: an effective altmetric indicator for active researchers? Computers in Human Behavior. 2016;55:1001-1006. DOI: 10.1016/j.chb.2015.11.007
  2. Beel J, Langer S, Genzmehr M, Gipp B, Breitinger C, Nürnberger A. Research paper recommender system evaluation: a quantitative literature survey. In: Proceedings of the International Workshop on Reproducibility and Replication in Recommender Systems Evaluation. 2013. pp. 15-22. DOI: 10.1145/2532508.2532512
  3. Beel J, Gipp B, Langer S, Breitinger C. Paper recommender systems: a literature survey. International Journal on Digital Libraries. 2016;17(4):305-338. DOI: 10.1007/s00799-015-0156-0
  4. Caragea C, Silvescu A, Mitra P, Lee Giles C. Can’t see the forest for the trees? A citation recommendation system. In: Proceedings of the 13th ACM/IEEE-CS Joint Conference on Digital Libraries. 2013. pp. 111-114. DOI: 10.1145/2467696.2467743
  5. Rokach L, Mitra P, Kataria S, Huang W, Giles L. A supervised learning method for context-aware citation recommendation in a large corpus. INVITED SPEAKER: Analyzing the Performance of Top-K Retrieval Algorithms. 2013
  6. Hong K, Jeon H, Jeon C. UserProfile-based personalized research paper recommendation system. In: 2012 8th International Conference on Computing and Networking Technology (INC, ICCIS and ICMIC). IEEE; 2012. pp. 134-138
  7. Hong K, Jeon H, Jeon C. Personalized research paper recommendation system using keyword extraction based on userprofile. Journal of Convergence Information Technology. 2013;8(16):106
  8. Wang Y, Liu J, Dong XL, Liu T, Huang YL. Personalized paper recommendation based on user historical behavior. In: CCF International Conference on Natural Language Processing and Chinese Computing. Berlin, Heidelberg: Springer; 2012. DOI: 10.1007/978-3-642-34456-5_1
  9. Champiri ZD, Shahamiri SR, Salim SSB. A systematic review of scholar context-aware recommender systems. Expert Systems with Applications. 2015;42(3):1743-1758. DOI: 10.1016/j.eswa.2014.09.017
  10. He Q, Pei J, Kifer D, Mitra P, Giles L. Context-aware citation recommendation. In: Proceedings of the 19th International Conference on World Wide Web. 2010. pp. 421-430. DOI: 10.1145/1772690.1772734
  11. Huang W, Wu Z, Liang C, Mitra P, Lee Giles C. A neural probabilistic model for context based citation recommendation. In: Twenty-Ninth AAAI Conference on Artificial Intelligence. 2015. DOI: 10.1609/aaai.v29i1.9528
  12. Rokach L, Mitra P, Kataria S, Huang W, Giles L. A supervised learning method for context-aware citation recommendation in a large corpus. INVITED SPEAKER: Analyzing the Performance of Top-K Retrieval Algorithms. 1978
  13. Sun Y, Ni W, Men R. A personalized paper recommendation approach based on web paper mining and reviewer’s interest modeling. In: 2009 International Conference on Research Challenges in Computer Science. IEEE; 2009. pp. 49-52. DOI: 10.1109/ICRCCS.2009.76
  14. Lu Y, He J, Shan D, Yan H. Recommending citations with translation model. In: Proceedings of the 20th ACM International Conference on Information and Knowledge Management. 2011. pp. 2017-2020. DOI: 10.1145/2063576.2063879
  15. Lee J, Lee K, Kim JG. Personalized academic research paper recommendation system. arXiv preprint arXiv:1304.5457. 2013. DOI: 10.48550/arXiv.1304.5457
  16. Liu F, Yu C, Meng W. Personalized web search by mapping user queries to categories. In: Proceedings of the Eleventh International Conference on Information and Knowledge Management. 2002. pp. 558-565. DOI: 10.1145/584792.584884
  17. Huang W, Kataria S, Caragea C, Mitra P, Lee Giles C, Rokach L. Recommending citations: translating papers into references. In: Proceedings of the 21st ACM International Conference on Information and Knowledge Management. 2012. pp. 1910-1914. DOI: 10.1145/2396761.2398542
  18. McNee SM, Albert I, Cosley D, Gopalkrishnan P, Lam SK, Rashid AM, et al. On the recommending of citations for research papers. In: Proceedings of the 2002 ACM Conference on Computer Supported Cooperative Work. 2002. pp. 116-125. DOI: 10.1145/587078.587096
  19. Medić Z, Šnajder J. An empirical study of the design choices for local citation recommendation systems. Expert Systems with Applications. 2022;200:116852. DOI: 10.1016/j.eswa.2022.116852
  20. Ali Z, Kefalas P, Muhammad K, Ali B, Imran M. Deep learning in citation recommendation models survey. Expert Systems with Applications. 2020;162:113790. DOI: 10.1016/j.eswa.2020.113790
  21. Wang G, Wang HR, Yang Y, Xu DL, Yang JB, Yue F. Group article recommendation based on ER rule in Scientific Social Networks. Applied Soft Computing. 2021;110:107631. DOI: 10.1016/j.asoc.2021.107631
  22. Chaudhuri A, Sarma M, Samanta D. SHARE: Designing multiple criteria-based personalized research paper recommendation system. Information Sciences. 2022. DOI: 10.1016/j.ins.2022.09.064
  23. Chaudhuri A, Samanta D, Sarma M. Modeling user behaviour in research paper recommendation system. arXiv preprint arXiv:2107.07831. 2021. DOI: 10.48550/arXiv.2107.07831
  24. Patel K, Caragea C, Wu J, Giles CL. Keyphrase extraction in scholarly digital library search engines. In: International Conference on Web Services. Cham: Springer; 2020. pp. 179-196. DOI: 10.1007/978-3-030-59618-7_12
  25. Jelodar H, Wang Y, Xiao G, Rabbani M, Zhao R, Ayobi S, et al. Recommendation system based on semantic scholar mining and topic modeling on conference publications. Soft Computing. 2021;25(5):3675-3696. DOI: 10.1007/s00500-020-05397-3
  26. Strohman T, Bruce Croft W, Jensen D. Recommending citations for academic papers. In: Proceedings of the 30th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. 2007. pp. 705-706. DOI: 10.1145/1277741.1277868
  27. Lakhanpal S, Gupta A, Agrawal R. Towards extracting domains from research publications. In: MAICS. 2015. pp. 117-120
  28. Lakhanpal S, Gupta A, Agrawal R. Discover trending domains using fusion of supervised machine learning with natural language processing. In: 2015 18th International Conference on Information Fusion (Fusion). IEEE; 2015. pp. 893-900
  29. Lakhanpal S, Gupta A, Agrawal R. On discovering most frequent research trends in a scientific discipline using a text mining technique. In: Proceedings of the 2014 ACM Southeast Regional Conference. 2014. pp. 1-4. DOI: 10.1145/2638404.2638528
  30. Jokar N, Honarvar AR, Esfandiari K. A contextual information based scholary paper recommender system using big data platform. Journal of Fundamental and Applied Sciences. 2016;8(2):914-924. DOI: 10.4314/jfas.v8i2s.144
  31. Aghamirzad S, Honarvar AR, Jokar N. A paper recommender system based on user’s profile in big data scholarly. Journal of Fundamental and Applied Sciences. 2016;8(2):941-955. DOI: 10.4314/jfas.v8i2s.150
