Explore Similar Documents

Explore Similar Documents uses machine-learning algorithms and natural-language processing to suggest patent and non-patent literature references relevant to your topic of interest.

  1. Select the Include Text Types option(s). Tip: Claims is selected by default; select Abstract as well for the most comprehensive results.
  2. In each field, enter phrases, sentences, or whole paragraphs describing key concepts, functionality, or value associated with your topic of interest.

      • Each entry field requires a minimum of 200 characters. The maximum is 10,000 characters for Claims and 5,000 characters for Abstract. The average length of a claim is between 200 and 500 characters.
      • All user-entered text is handled as plain text.
      • Using line breaks does not affect similarity processing.
  3. If needed, select a Priority Date. Note: The default is the current date to ensure the broadest timeline. To limit the time period for prior art analysis, select a different date in the past.
  4. Click the Get Similar Documents button to view and explore the results.
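
The length limits in step 2 can be checked before text is pasted into the fields. The following is a minimal sketch only, not part of STNext: the function name `check_field_length` and the `MAX_CHARS` table are hypothetical, and the numbers are simply the limits stated above.

```python
# Hypothetical pre-check of entry text; the limits are taken from step 2 above.
MIN_CHARS = 200
MAX_CHARS = {"Claims": 10_000, "Abstract": 5_000}


def check_field_length(field: str, text: str) -> str:
    """Return 'ok', 'too short', or 'too long' for the text of one field."""
    if field not in MAX_CHARS:
        raise ValueError(f"unknown field: {field}")
    length = len(text)
    if length < MIN_CHARS:
        return "too short"
    if length > MAX_CHARS[field]:
        return "too long"
    return "ok"
```

For example, a 6,000-character description fits the Claims field but would exceed the Abstract limit.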



How does Similar Documents work?

This feature processes your input text and uses four different algorithm streams to find patents and non-patent literature relevant to your text. It then ranks these documents by how similar they are to your text. The technology works much like our Get Prior Art Analysis feature, which is launched from a specific patent number. In this case, you supply the abstract and/or claims that form the basis for the analysis.

Will Similar Documents record my search text to improve itself?

No. Our algorithms are trained on the millions of documents in our corpus. We do not use user search text to inform the technology. Your data is kept secure, just like any search query entered on the command line in STNext.

Can I get results for any text input?

As long as you supply the minimum amount of text, you will get a set of matching documents. The more specific and complete your description, the better your set of matching documents.

How many results will I get?

You will get up to 100 patent documents and up to 100 journal documents for your similarity analysis. These are presented as two lists (one of patents and one of journals), each ranked from most relevant to least relevant.

How is text treated differently in the abstract and claims fields? If you enter text into the claims section, will it only be searched against the claims of other documents?

The text that you enter in the Abstract and Claims fields is treated like the text of a draft patent. Using keyword analysis and natural language processing, the similarity engine processes each field of text and evaluates the overall meaning of what you have entered. It then uses this information to search our corpus of known documents for the best matches to the concepts you have described.

What does the highlighting mean?

The highlighting seen on the far right of the viewer screen displays distinctive terms that the artificial intelligence engine has identified from your input text. These are not the same as hit highlighting in a traditional STNext search, but they do provide insight into why these documents have been selected to match your input text.

What is the corpus that is being searched? When is this updated?

Documents that are part of the CAPlus corpus are available to be matched with your text. These include patents, journal articles, reviews, etc. These documents are scheduled to be updated monthly in 2023.

How does the priority date affect my analysis?

The priority date acts as a filter. If you want to see anything that has been published that may be relevant to your text, set the priority date to today. If you only want to see results before a certain date, set the priority date to that date. Any results relevant to your analysis with priority dates after that date will not be shown in your list.
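
The filtering behavior can be pictured with a short sketch. This is an illustration under assumptions, not the actual implementation: the record layout (a dictionary with a `priority_date` key) and the function name `filter_by_priority_date` are hypothetical.

```python
from datetime import date


def filter_by_priority_date(results, cutoff):
    """Keep only results whose priority date is on or before the cutoff.

    `results` is assumed to be a list of dicts with a `priority_date` key
    holding a datetime.date; this layout is hypothetical.
    """
    return [r for r in results if r["priority_date"] <= cutoff]


# Hypothetical sample records, for illustration only.
sample = [
    {"id": "doc-1", "priority_date": date(2019, 5, 1)},
    {"id": "doc-2", "priority_date": date(2022, 3, 15)},
]
kept = filter_by_priority_date(sample, date(2020, 1, 1))
```

With a cutoff of 1 January 2020, only the first sample record remains; with the cutoff set to today, both are kept.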

Why do I see different results when I run the same analysis later?

Our corpus is updated regularly. If you run an analysis on a given day, you could see different results from the same analysis the next day because new documents have been added to the pool of potential matches. You will get up to 100 patent and 100 journal documents for your similarity analysis. If a new document is deemed a better match than some of your earlier results, a result from the bottom of your list is removed so that the total stays at 200 documents.
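
The effect of a capped, ranked list can be sketched as follows. The scores, document IDs, and the function name `top_ranked` are hypothetical, and the real ranking method is not described here; the sketch only shows how a better-matching newcomer pushes the lowest-ranked result out of a fixed-size list.

```python
import heapq


def top_ranked(scored_docs, limit=100):
    """Return the `limit` highest-scoring (doc_id, score) pairs, best first."""
    return heapq.nlargest(limit, scored_docs, key=lambda pair: pair[1])


# Hypothetical scores; a limit of 3 is used to keep the example short.
earlier = [("A", 0.90), ("B", 0.80), ("C", 0.70)]
later = top_ranked(earlier + [("D", 0.85)], limit=3)
```

Here the newly added document D ranks second, and C, previously at the bottom of the list, no longer appears.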

What are the details that I am seeing in the right-hand panel? Where is this data coming from?

The information in the right-hand panel is intended to provide you with a preview of information (title, abstract, claims, dates, accession numbers) about the potentially similar document. It is pulled from multiple sources including but not limited to CAS indexed data. For authoritative CAPlus data on a particular document, please select that document for crossover into the CAPlus database of your choice in your session window.

Sometimes when I do a crossover, the document titles in CAPlus are different from what I saw in the viewer. Why is this?

The enhanced CAPlus titles that you may be accustomed to are not yet available in the viewer. Accession numbers will match between the viewer and CAPlus.

Can I enter my command line query?

Yes, but it is not recommended. All entered characters are read as plain text, so a command line query may not produce optimal results. If you have specific query terms that describe a concept, it may be productive to write a few sentences using those terms and use them as your analysis input.

Can I enter non-English text?

Only English is supported at this time.

Can I copy and paste text into the fields?

Yes. The pasted content will be truncated at 10,000 characters for Claims and 5,000 characters for Abstract.

What if I have a patent number?

If you have a specific patent to use as a basis for similarity, we recommend retrieving that patent using the command line and then using the Get Prior Art Analysis menu option for that patent ID.

What if I have a CAS Registry Number?

CAS Registry Numbers (RNs) can be used as part of the text entries, but the RN will be treated as plain text. If you know the name of a key substance, we recommend including the chemical name (and/or trade name, brand name, generic name) as well. Note: Get Similar Documents finds CAS RNs if they are written out in other documents; however, RNs in user-entered text are not recognized as "indexed" and are not matched up with other indexed entities (chemical names, brand names, trade names, etc.).

What if I don’t care about Prior Art Date?

If you do not wish to specify a prior art date, leave the default date (i.e., the current date) in place. All documents contained in CAplus will then be considered.