Text mining approaches applied to patents: A scoping review protocol

Homa Arshadi; Maryam Okhovati; Zohre Zahedi; Maryam Asharfi

Nov 21, 2022

DOI

dx.doi.org/10.17504/protocols.io.6qpvr43kzgmk/v1

Homa Arshadi¹,
Maryam Okhovati²,
Zohre Zahedi³,
Maryam Asharfi⁴

¹Ph.D. Candidate in Medical Library and Information Science, Student Research Committee, School of Management and Medical Information Sciences, Kerman University of Medical Sciences, Kerman, Iran.;
²Associate Prof., Medical Library and Information Sciences Department, School of Management and Medical Information Science, Kerman University of Medical Sciences, Kerman, Iran.;
³Assistant Professor of Information Science, Department of Information Science, Faculty of Humanities Persian Gulf University, Bushehr, Iran. Research, Centre for Science & Technology Studies (CWTS), Leiden University, The Netherlands;
⁴Assistant Professor, Department of Industrial Engineering and Management Systems, Amirkabir University of Technology (Tehran Polytechnic), Tehran, Iran

Protocol Citation: Homa Arshadi, Maryam Okhovati, Zohre Zahedi, Maryam Asharfi 2022. Text mining approaches applied to patents: A scoping review protocol. protocols.io https://dx.doi.org/10.17504/protocols.io.6qpvr43kzgmk/v1

Manuscript citation:

Arshadi, H., Okhovati, M., Zahedi, Z., & Ashrafi, M. (2022, November 16). Text mining approaches applied to patents: A scoping review protocol. https://doi.org/10.17605/OSF.IO/YNGHW

License: This is an open access protocol distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited

Protocol status: Working

We use this protocol and it's working

Created: November 16, 2022

Last Modified: November 21, 2022

Protocol Integer ID: 72810

Keywords: Patent, Text mining, Text mining Techniques, Patent mining, Scoping review

Funders Acknowledgement:

Kerman University of Medical Sciences

Grant ID: 40100010

Abstract

This study will map the current state of text mining approaches applied to patents. It will be of value to policy makers and researchers by allowing them to find the latest efforts in patent text mining. Further outcomes will be the identification of the different text mining approaches and of the most frequently used data sources, metadata and subject areas in different application areas of patents.
Web of Science Core Collection (Clarivate Analytics), Scopus, IEEE Xplore Digital Library, and the ACM Guide to Computing Literature will be searched to identify published studies. We will also search the reference lists of included papers. Keywords were identified from the IEEE Thesaurus, free-text terms, expert opinion, and relevant systematic reviews.
After running the search syntax in each database, the results of all searches will be exported to EndNote 20. Duplicate papers will be removed. The remaining papers will be imported into Rayyan for screening. Two independent members of the research team will screen the titles and abstracts of all papers against the inclusion and exclusion criteria. Subsequently, the full text of all potentially relevant papers will be assessed independently by two reviewers. In the case of conflicts, the reviewers will discuss them and then consult the third author to reach a consensus.
Key findings relevant to the review will be charted from the included studies using a data extraction tool developed in Excel by the members of the review team. Two review team members will extract data independently, and discrepancies will be resolved by consulting a third expert.

Attachments

Patent text mining. ...

107KB

Title

Text mining approaches applied to patents: A scoping review protocol

Original language title

English

Stage of review at time of this submission

 
Review stage	Started	Completed
Preliminary searches	Yes	Yes
Piloting of the study selection process	Yes	Yes
Formal screening of search results against eligibility criteria	Yes	Yes
Data extraction	No	No
Data analysis	No	No

This review is part of a Ph.D. research project approved by the ethics committee of Kerman University of Medical Sciences (No. 40100010), which will be carried out with the financial support of the Vice Chancellor for Research and Technology of the university. The funding source had no involvement in the study process.

Review question(s)
RQ1. Which text mining techniques are most frequently used by researchers in patent mining?
RQ2. Which data sources are most often used for text mining of patents?
RQ3. Which patent metadata fields (claims, abstract, title or description) are most frequently used for text mining?
RQ4. In which subject areas is text mining most often applied to patents?
RQ5. What sample sizes are most commonly selected by researchers when applying text mining techniques to patents?

Search keywords/terms:
Text mining terms (combined with OR, and then combined with AND (patent OR patents)): (text* AND (analy* OR mining OR categorization OR classification OR cluster* OR extract* OR preprocessing OR processing OR transformation)), (data AND mining), "document classification", "document cluster*", "document summarization", "machine learning", "keyword extraction", "keyword discovery", "keyword retrieval", ((information OR knowledge) AND (extract* OR discovery OR retrieval)), "Latent Dirichlet Allocation", LDA, "Latent Semantic Analysis", LSA, "Natural Language Processing", NLP, "content analysis", "topic extraction", "topic model*", "unstructured text", "unsupervised learning", "Vector Space Model", "VSM", "support vector machines", "naive bayes classifier", "association rules", "k-nearest neighbor", "neural networks", "decision trees"
Patent-specific terms (combined with OR): "patent analy*", "patent mining", "patent cluster*", "patent map*", "patent roadmap*", "patent network", "patent visualization", "patent visualisation", patentometric*, "patent classification*", "patent retrieval"
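As an illustration of how the two term groups combine, the following Python sketch assembles a query string of the same shape as the Web of Science syntax in Appendix 1. It is a minimal sketch and not part of the protocol: the helper names (quote, or_group) are hypothetical and the term lists are abbreviated, so the full lists above should be used for a real search.

```python
def quote(phrase):
    """Wrap a multi-word phrase in double quotes for exact-phrase matching."""
    return f'"{phrase}"'


def or_group(terms):
    """Join terms with OR inside a single pair of parentheses."""
    return "(" + " OR ".join(terms) + ")"


# Abbreviated, illustrative subsets of the term lists given above.
method_terms = [
    "(text* AND (analy* OR mining OR classification OR cluster*))",
    "(data AND mining)",
    quote("machine learning"),
    quote("topic model*"),
    "LDA",
]
patent_terms = ["patent", "patents"]
patent_phrases = [quote("patent mining"), quote("patent analy*"), "patentometric*"]

# Set #1: any text mining term AND any patent term, both in the topic field (TS).
set_1 = f"TS={or_group(method_terms)} AND TS={or_group(patent_terms)}"
# Set #2: any patent-specific phrase in the topic field.
set_2 = f"TS={or_group(patent_phrases)}"
# Final search: records matching either set.
query = f"({set_1}) OR ({set_2})"

print(query)
```

Building the string from lists keeps the parentheses balanced and makes it easier to adapt the same terms to the field tags of the other databases.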

The literature search will not be limited by year of publication or geographic area; the language will be limited to English.

URL to search strategy
We will upload this protocol to the Open Science Framework (OSF).
We give permission for this file to be made publicly available:
Yes

Condition or domain being studied
Studies that apply text mining to unstructured patent data (such as the title, abstract, description, or claims).

Participants/Population
English-language original articles and conference papers that meet the eligibility criteria of this review will be included.

Intervention(s), Exposure(s)
This scoping review does not have an intervention or exposure group.

Comparator(s)/Control
This scoping review does not have a comparator or control group.

Types of study to be included
Text mining techniques can be used to extract information from structured or unstructured data. In this study, our focus will be on the unstructured text of patents. The following inclusion and exclusion criteria will apply:
Inclusion criteria:
-Studies that apply text mining to unstructured patent data (such as the title, abstract, description, or claims);
-Peer-reviewed papers published in the selected databases and in English;
-Publication types: original articles and conference proceedings;
-Full access to the document.
Exclusion criteria:
-All publications that do not meet the inclusion criteria;
-Papers that mine only structured patent data;
-Non-English results, which will be removed during the review process;
-Articles that focus only on citation analysis of patents;
-Secondary and tertiary studies, such as reviews, meta-analyses and surveys;
-Editorials, meeting abstracts, book reviews, books, book chapters, cover letters, and commentaries;
-Duplicate publications and retracted publications.
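
The criteria above can be read as a sequence of checks applied to each record. The Python sketch below is illustrative only and is not part of the protocol: the field names (language, doc_type, retracted, mines_unstructured_text, full_text_available) are hypothetical, and their values would still be recorded by the reviewers' own judgment during screening. Studies of structured data only, citation-only analyses, and secondary studies are assumed to be captured by the mines_unstructured_text and doc_type fields.

```python
# Document types admitted by the inclusion criteria (labels are assumptions).
ALLOWED_TYPES = {"article", "conference paper"}


def screen(record):
    """Return (eligible, reason) for one record, checking criteria in order."""
    if record.get("language") != "English":
        return False, "non-English"
    if record.get("doc_type") not in ALLOWED_TYPES:
        return False, "document type not eligible"
    if record.get("retracted"):
        return False, "retracted"
    if not record.get("mines_unstructured_text"):
        return False, "no text mining of unstructured patent text"
    if not record.get("full_text_available"):
        return False, "full text not accessible"
    return True, "included"


# Hypothetical record as a reviewer might code it.
example = {
    "language": "English",
    "doc_type": "article",
    "retracted": False,
    "mines_unstructured_text": True,
    "full_text_available": True,
}
print(screen(example))
```

Returning the reason alongside the decision makes it straightforward to report exclusion counts per criterion in a flow diagram.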

Context
Identification of studies that use text mining methods on unstructured patent data (such as the title, abstract, description, or claims).

Primary outcome(s)
This study will map the current state of text mining approaches applied to patents. It will be of value to policy makers and researchers by allowing them to find the latest efforts in patent text mining.

Secondary outcomes
The secondary outcomes will be the identification of the different text mining approaches and of the most frequently used data sources, metadata and subject areas in different application areas of patents.

Data extraction (selection and coding)
After running the search syntax in each database, the results of all searches will be exported to EndNote 20. Duplicate papers will be removed. The remaining papers will be imported into Rayyan for screening. Two independent members of the research team will screen the titles and abstracts of all papers against the inclusion and exclusion criteria. Subsequently, the full text of all potentially relevant papers will be assessed independently by two reviewers. In the case of conflicts, the reviewers will discuss them and then consult the third author to reach a consensus.
Key findings relevant to the review will be charted from the included studies using a data extraction tool developed in Excel by the members of the review team. The following data will be extracted: title, author, publication year, journal, text mining techniques, tools or applications, data sources, patent metadata (claims, abstract, title or description), subject area, sample size and the purpose of the study. Two review team members will extract data independently, and discrepancies will be resolved by consulting a third expert.
The results of the search and the study inclusion process will be reported following the principles of PRISMA-ScR (Preferred Reporting Items for Systematic reviews and Meta-Analyses extension for Scoping Reviews).
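
The duplicate removal and conflict identification steps described above can be sketched in Python as follows. This is a minimal sketch under stated assumptions and not the procedure used by EndNote or Rayyan, which apply their own matching rules: records are assumed to be dictionaries with hypothetical doi and title fields, and screening decisions are assumed to be stored per reviewer as a mapping from record ID to a decision label. A record with a DOI and a copy of it without one would not be matched by this simple key.

```python
import re


def record_key(record):
    """Return a matching key: the lowercased DOI if present, else a normalized title."""
    doi = (record.get("doi") or "").strip().lower()
    if doi:
        return ("doi", doi)
    title = re.sub(r"[^a-z0-9]+", " ", record.get("title", "").lower()).strip()
    return ("title", title)


def deduplicate(records):
    """Keep the first record seen for each key, preserving input order."""
    seen = set()
    unique = []
    for record in records:
        key = record_key(record)
        if key not in seen:
            seen.add(key)
            unique.append(record)
    return unique


def find_conflicts(decisions_a, decisions_b):
    """Return the sorted IDs on which the two reviewers' decisions differ."""
    return sorted(
        rid for rid in decisions_a
        if rid in decisions_b and decisions_a[rid] != decisions_b[rid]
    )


# Hypothetical records exported from two databases.
records = [
    {"id": 1, "doi": "10.1/ABC", "title": "Patent mining"},
    {"id": 2, "doi": "10.1/abc", "title": "Patent Mining."},
    {"id": 3, "doi": "", "title": "Topic models for patents"},
    {"id": 4, "doi": None, "title": "Topic Models for Patents!"},
]
unique_records = deduplicate(records)

# Hypothetical title/abstract decisions by two reviewers.
reviewer_a = {1: "include", 3: "exclude"}
reviewer_b = {1: "include", 3: "include"}
conflicts = find_conflicts(reviewer_a, reviewer_b)
```

The IDs returned by find_conflicts are the records that would go to discussion and, where necessary, to the third author.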

Strategy for data synthesis
The extracted data will be categorized by type of patent text mining technique and by patent data source. Key features will also be summarized, including a comparison of how frequently the different patent metadata fields are used, the most common subject areas in this topic, and sample size.
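
A frequency comparison of this kind can be sketched with Python's standard library. The example below is an assumption about how the charted data might be laid out, not the review team's actual tool: each study is a dictionary whose fields (techniques, metadata) hold lists of values, and all values shown are invented for illustration.

```python
from collections import Counter


def tabulate(studies, field):
    """Count the studies reporting each value of `field`.

    A study may list several values; each value is counted once per study.
    Results are sorted by descending count, then alphabetically.
    """
    counts = Counter()
    for study in studies:
        counts.update(set(study.get(field, [])))
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


# Invented example rows standing in for the extraction sheet.
studies = [
    {"techniques": ["LDA", "SVM"], "metadata": ["abstract", "claims"]},
    {"techniques": ["LDA"], "metadata": ["abstract"]},
    {"techniques": ["k-means"], "metadata": ["title", "abstract", "claims"]},
]

for value, n in tabulate(studies, "metadata"):
    print(f"{value}: {n}")
```

The same function can be applied to any charted field, such as data source or subject area, to produce the frequency tables for the synthesis.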

Analysis of subgroups or subsets
There will be no analysis of subgroups or subsets.

Type and method of review
Scoping review

Country
Iran

Current review status
On-going

Any additional information
Appendix 1: Search syntax for this scoping review in WoS:
 
 
Set	Search syntax
#1	(TS=(((text* AND (analy* OR mining OR categorization OR classification OR cluster* OR extract* OR preprocessing OR processing OR transformation)) OR (data AND mining) OR "document classification" OR "document cluster*" OR "document summarization" OR "machine learning" OR "keyword extraction" OR "keyword discovery" OR "keyword retrieval" OR ((information OR knowledge) AND (extract* OR discovery OR retrieval)) OR "Latent Dirichlet Allocation" OR LDA OR "Latent Semantic Analysis" OR LSA OR "Natural Language Processing" OR NLP OR "content analysis" OR "topic extraction" OR "topic model*" OR "unstructured text" OR "unsupervised learning" OR "Vector Space Model" OR "VSM" OR "support vector machines" OR "naive bayes classifier" OR "association rules" OR "k-nearest neighbor" OR "neural networks" OR "decision trees"))) AND TS=(patent OR patents)
#2	TS=(("patent analy*" OR "patent mining" OR "patent cluster*" OR "patent map*" OR "patent roadmap*" OR "patent network" OR "patent visualization" OR "patent visualisation" OR patentometric* OR "patent classification*" OR "patent retrieval"))
#1 OR #2	Limited to: Article, Conference paper

Details of final report/publication(s).
It will be published in a peer-reviewed journal.
