Deep Knowledge Search

Enhancing search engine capabilities by implementing Deep Knowledge Search, which improves precision and accuracy in retrieving relevant, in-depth knowledge.

April 5, 2024 | Article

In the age of information, the act of searching on the internet spans various categories, including shopping deals, news or academic research. Despite advancements, current search engines like Google face significant challenges in delivering perfectly relevant results. These issues include ambiguity with generic keywords, limitations with overly specific keywords, inconsistent variations in search results, and difficulty isolating high-quality content. Traditional search engines primarily focus on retrieving whole documents, which may not always serve the nuanced needs of researchers seeking specific knowledge phrases. This distinction highlights the difference between publication search, which aims to retrieve full-length documents, and knowledge search, which focuses on extracting specific terminologies and concepts from these publications.  

This white paper explores the critical issues faced by researchers in obtaining high-quality search results and introduces an innovative solution in the form of Deep Knowledge Search designed to enhance the efficiency of knowledge discovery. 

Deep Knowledge Search offers a transformative approach to these problems. It extracts and presents highly relevant knowledge phrases from a vast corpus of publications. This method not only enhances the precision of search results but also significantly accelerates the research process by providing researchers with comprehensive key vocabulary relevant to their queries. It offers a practical answer to the limitations of current search engines, enabling faster and more efficient knowledge acquisition and application.

Problems with Search Results

 

Search engines strive to deliver relevant results based on the keywords provided in a search query. However, achieving perfect relevance in the publications shown in the search results is inherently complex. The efficacy of the results often depends on the specificity and choice of the keywords used. Several challenges emerge from this dynamic: 

  1. Ambiguity with Generic Keywords: Broad or generic keywords can result in a deluge of results, many of which might be irrelevant. For example, searching for “apple” can yield results related to the fruit, the technology company, or even references in literature. 
  2. Limitations with Specific Keywords: On the other hand, very specific keywords can either pinpoint exactly what the user is looking for or result in very few or no results at all, especially if the phrasing doesn’t match indexed content. 
  3. Inconsistent Variation: Merely tweaking or changing a few words in the query might not lead to significantly varied results. The core of the content deemed relevant by the search algorithm might remain consistent. 
  4. Overlap in Slight Variations: Two slightly different search queries might yield a significant number of common results, making it hard for users to explore different aspects of a topic. 
  5. Keyword Selection Challenges: Crafting the optimal search query is both an art and a science. It can be daunting for many users, leading to inefficient searches and increased search time. 
  6. Display Limitations: Despite a search engine indicating that there are millions, if not billions, of results, users might only be able to view or scroll through a limited number, often capped at a few hundred. 
  7. Result Redundancy: A considerable number of results might be strikingly similar, if not near duplicates, reducing the diversity of information presented to the user. 
  8. Semantic Similarity: Different web pages might offer content that, while worded differently, is semantically very similar, leading to repetitive information. 
  9. Difficulty Isolating Quality Content: Amongst the sea of search results, only a handful, perhaps 5 to 20, might be of genuine quality and high relevance. Sifting through to find these gems can be laborious for users. 
  10. Varying Domains and Relevance Levels: Results might span across different domains or levels of complexity, which might not align with the user’s intent. For instance, a high school student looking for basic biology information might be overwhelmed with advanced, collegiate-level articles. 
  11. Popularity Over Precision: Often, search results are ranked based on popularity metrics. While this can sometimes yield useful results, it might also prioritize generic, broad-level content over niche, precise information, especially when the search query is generic. 
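The overlap described in items 4 and 7 can be quantified. The following is a minimal sketch, assuming the results for each query are available as lists of URLs; the function name and the sample URLs are hypothetical and serve only as illustration.

```python
def jaccard_overlap(results_a, results_b):
    """Return |A intersect B| / |A union B| for two lists of result URLs."""
    set_a, set_b = set(results_a), set(results_b)
    if not set_a and not set_b:
        return 0.0  # two empty result lists share nothing
    return len(set_a & set_b) / len(set_a | set_b)


# Hypothetical result lists for two slightly different queries.
results_query_1 = ["site-a/intro", "site-b/guide", "site-c/forecasting", "site-d/examples"]
results_query_2 = ["site-a/intro", "site-b/guide", "site-c/forecasting", "site-e/tutorial"]

overlap = jaccard_overlap(results_query_1, results_query_2)
print(f"Overlap between the two result lists: {overlap:.2f}")
```

A value close to 1 would indicate that rewording the query barely changed what the user sees.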

In our patent application, in the section titled “An Example of Current Results Quality”, we demonstrated that the quality of search results, presented as a list of publications, is not good enough from the researcher’s perspective. That example is reproduced below with additional details.

An Example of Current Results Quality

 

For example, when we search for “time series” in Google, it reports a huge number of results (about 8.28 billion), as shown below.

Even though such a huge number is reported, a user can see only about 200 results at most by scrolling through the search pages.

Further, most results fall into the following areas.

  1. Time series definition 
  2. Time series analysis 
  3. Time series forecasting 
  4. Time series examples 
  5. Guide to time series 
  6. Suggested vocabulary for additional searches, as shown below

As seen above, all the results cover very generic, introductory, or beginner-level areas. These results are not very useful for a researcher who wants to expand knowledge at a faster pace. Discovering new vocabulary in the domain is still a challenging, cumbersome, and time-consuming task. One must manually go through each search result and spend time identifying useful vocabulary.

Researchers Search Needs

 

We use the term “researchers” broadly. It can mean any of the following.

  1. Scientists involved in research activities
  2. Students who are pursuing knowledge on the internet
  3. Software developers who are interested in developing new technologies
  4. Tech support engineers
  5. Innovators who are interested in developing new solutions to real-life business problems, etc.

When researchers search, they want quality results that help them to:

  1. Expand their knowledge.
  2. Use existing knowledge as much as possible to quickly develop solutions to research problems.
  3. Avoid rediscovering existing knowledge through expensive and time-consuming research work.
  4. Improve the chances of success in developing the right solutions.
  5. Reduce the risk of failed outcomes that is inherent in research efforts.
  6. Become experts in a knowledge area instead of remaining beginners.

    As we move from an industrial society to a knowledge-based society, quality search results have become increasingly important.

    Google Scholar (https://scholar.google.com) is a search engine that returns results only from papers published in journals or conferences, rather than from every website that contains details related to the search query. These results are mostly useful for academics involved in research. Google Scholar may not be very useful for the broad set of researchers defined in the previous section. Even for academic researchers, the results may not be satisfactory, as they depend heavily on the search query. To get good-quality results, researchers may spend many weeks going through many papers to refine their search queries.

    Difference between Publication Search and Knowledge Search

     

    Searching for relevant publications and searching for relevant knowledge phrases in a publication corpus are different text mining tasks, each with its own methodologies and goals. The differences between the two tasks are listed below.

     

    | Serial No. | Searching Set of Publications | Searching Set of Knowledge Phrases |
    |---|---|---|
    | 1 | When we search for publications, we are typically trying to retrieve whole documents, such as papers, articles, and reports, that are most relevant to a given query. | When searching for knowledge phrases, we are trying to extract specific terminologies, concepts, or nuggets of information present in the publication corpus. |
    | 2 | The result of the search is a list of full-length articles or papers that you then need to read, skim, or analyze further to extract pertinent information. | The result of the search is a collection of phrases that encapsulate the main ideas or knowledge components in the corpus. |
    | 3 | Searching for relevant publications often involves search algorithms that factor in metadata (like titles, abstracts, authors, citations) and full-text content. | Extraction of key phrases involves natural language processing (NLP) and text mining techniques, such as tokenization, part-of-speech tagging, named entity recognition, and more. |
    | 4 | The goal is to find complete works that offer comprehensive insights, methodologies, results, or discussions about a particular subject. | The aim is to quickly identify and perhaps aggregate the main concepts, technologies, or findings across multiple publications. |
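To make the knowledge-phrase side of this comparison concrete, here is a minimal sketch of phrase extraction using only the Python standard library. It is not the method used by Deep Knowledge Search, which the paper does not describe. It assumes a simple regular-expression tokenizer, a small stopword list used in place of part-of-speech tagging, and a score equal to the number of documents in which a phrase appears. All names and the sample corpus are hypothetical.

```python
import re
from collections import Counter

# Assumption: a tiny stopword list stands in for part-of-speech filtering.
STOPWORDS = {"a", "an", "and", "are", "for", "in", "is", "of", "on", "the", "to", "with"}


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def candidate_phrases(tokens, max_len=3):
    """Yield n-grams of 2..max_len tokens that contain no stopwords."""
    for n in range(2, max_len + 1):
        for i in range(len(tokens) - n + 1):
            gram = tokens[i:i + n]
            if not any(tok in STOPWORDS for tok in gram):
                yield " ".join(gram)


def score_phrases(documents, max_len=3):
    """Score each candidate phrase by the number of documents containing it."""
    counts = Counter()
    for doc in documents:
        # set() ensures a phrase is counted once per document.
        counts.update(set(candidate_phrases(tokenize(doc), max_len)))
    return counts


# Hypothetical mini-corpus of paper titles.
corpus = [
    "Multivariate time series forecasting with neural networks.",
    "Anomaly detection in multivariate time series.",
    "A survey of time series forecasting.",
]
scores = score_phrases(corpus)
for phrase, score in scores.most_common(5):
    print(score, phrase)
```

A production system would likely replace the stopword filter with real part-of-speech tagging and named entity recognition, as listed in row 3 of the table.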

     

     

    Our Innovation Ideas for Researchers’ Search for Knowledge

     

    Our idea is to provide comprehensive key vocabulary in the area a researcher is searching. The key vocabulary can in turn be used to search for highly relevant results. For example, if a search for “time series” returns the following key vocabulary, researchers can greatly enhance their knowledge.

    | Univariate Time Series    | Multivariate Time Series       | Time Series Forecasting               |
    | Components of Time Series | Time Series Analysis           | Time Series Classification            |
    | Time Series Clustering    | Time Series Anomaly Detection  | High Dimensional Time Series          |
    | Bayesian Time Series      | Financial Time Series          | Probabilistic Time Series Forecasting |
    | Stochastic Time Series    | Nonlinear Time Series          | Medical Time Series                   |

     

    This kind of vocabulary helps the researcher not only gain good knowledge of the domain but also accelerate the development of solutions to complex real-life problems.

    To the best of our knowledge, no solution available in the market or described in academic research papers provides this kind of comprehensive vocabulary for the domain of a search query.

    Presenting Knowledge Phrases

     

    We have developed a full solution based on our innovation, which provides highly relevant knowledge phrases based on the user’s search query. The solution presents the knowledge phrases as follows.

    1. The 100+ top-scored knowledge phrases are displayed as a word cloud. (Figure 1)
    2. Font sizes in the word cloud are based on the scores of the knowledge phrases. (Figure 1)
    3. The 200+ top-scored knowledge phrases are displayed in tabular form. (Figure 1)
    4. The knowledge phrases in the table are in descending order of score. (Figure 1)
    5. Each knowledge phrase in the table is clickable. (Figure 1)
    6. Clicking a knowledge phrase in the table runs a Google search for it and displays the results in a separate browser tab. (Figure 2)
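The presentation steps above can be sketched in a few lines. This is an illustration under stated assumptions, not the actual implementation: the linear mapping from score to font size, the point-size range, the function names, and the sample scores are all hypothetical. The only external detail relied on is the familiar `https://www.google.com/search?q=` URL form.

```python
from urllib.parse import quote_plus


def font_size(score, min_score, max_score, min_pt=10, max_pt=48):
    """Map a score linearly onto a font size in points (assumed scaling)."""
    if max_score == min_score:
        return max_pt  # all phrases scored equally
    fraction = (score - min_score) / (max_score - min_score)
    return round(min_pt + fraction * (max_pt - min_pt))


def google_search_url(phrase):
    """Build the URL that a clickable table cell would open in a new tab."""
    return "https://www.google.com/search?q=" + quote_plus(phrase)


def build_display_rows(phrase_scores, top_n=200):
    """Return the top_n phrases in descending score order with display data."""
    ranked = sorted(phrase_scores.items(), key=lambda item: item[1], reverse=True)[:top_n]
    if not ranked:
        return []
    values = [score for _, score in ranked]
    lowest, highest = min(values), max(values)
    return [
        {
            "phrase": phrase,
            "score": score,
            "font_pt": font_size(score, lowest, highest),
            "url": google_search_url(phrase),
        }
        for phrase, score in ranked
    ]


# Hypothetical scores for three knowledge phrases.
sample_scores = {
    "univariate time series": 50,
    "time series forecasting": 90,
    "bayesian time series": 10,
}
rows = build_display_rows(sample_scores)
for row in rows:
    print(row["score"], row["font_pt"], row["phrase"], row["url"])
```

The same ranked rows can feed both views: the word cloud uses `font_pt`, and the table uses the row order and `url`.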

    Search Results of a Knowledge Phrase

     

    Demonstration of the Innovation to the Researchers

     

    We believe that this innovation is highly useful to researchers. We have demonstrated its usefulness by extracting knowledge phrases from more than 350 thousand research papers in the Deep Learning area. Below are the results for various search queries relevant to the Deep Learning area.

    About Neural Networks

    About CNN

    About Classification 

    About Clustering

    About Time Series

    About Text Classification

    About generative models

    About Language Models