May 9, 2025: CAS Registry and Derwent Chemistry Resource (DCR) Extended by the International Chemical Identifier (InChI)

The International Chemical Identifier, InChI, provides textually encoded molecular information to facilitate the search for substances. InChIs, also referred to as InChI codes, and the InChIKeys generated from them are now available for the vast majority of CAS Registry and DCR substance records.

InChIKeys are widely used in cheminformatics. They are suitable for cross-database searches and for searching freely available sources via web browsers. In STNext, InChIKeys can be used to merge substance information from different databases or to identify duplicates in multifile structural answer sets. InChIs have been implemented in the STN databases DCR, CAS Registry, REAXYSFILESUB, and PS to allow convenient transfer between the databases.
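The duplicate check described above can be sketched outside STNext as a simple grouping step. This is a minimal illustration only: the record layout, the source names `database_a` and `database_b`, and the record IDs are hypothetical, and the keys are example InChIKeys (ethanol and L-alanine), not output from any STN database.

```python
from collections import defaultdict

def merge_by_inchikey(records):
    """Group (source, id) pairs by the full 27-character InChIKey."""
    merged = defaultdict(list)
    for rec in records:
        merged[rec["inchikey"]].append((rec["source"], rec["id"]))
    return dict(merged)

# Hypothetical answer-set records exported from two different sources.
records = [
    {"source": "database_a", "id": "A-1", "inchikey": "LFQSCWFLJHTTHZ-UHFFFAOYSA-N"},
    {"source": "database_b", "id": "B-7", "inchikey": "LFQSCWFLJHTTHZ-UHFFFAOYSA-N"},
    {"source": "database_a", "id": "A-2", "inchikey": "QNAYBMKLOCPYGJ-REOHZIBUSA-N"},
]

merged = merge_by_inchikey(records)

# Keys that occur in more than one record are candidate duplicates.
duplicates = {key: hits for key, hits in merged.items() if len(hits) > 1}
print(duplicates)
```

Grouping on the complete key treats only identical structures as duplicates; grouping on a shorter prefix would also merge stereoisomers or charged forms.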

The InChI Principle

InChI is a structure-based unique chemical identifier, developed by IUPAC and the InChI Trust. The related software and algorithms are open source. The structures are converted into a unique, machine-readable character string that can then be used in printed and electronic data sources to represent, store and search for chemical structures.

An InChI is a text string of variable length composed of segments (layers) separated by delimiters. InChIKeys are a condensed, hashed representation of the InChI code. They have a fixed length of 27 characters and were initially developed to facilitate web searches for chemical compounds.
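Because an InChIKey has a fixed shape, a quick format check can catch truncated or mistyped keys before they are used in a search. The sketch below assumes the layout of 14 uppercase letters, a hyphen, 10 uppercase letters, a hyphen, and one final uppercase letter; it checks the shape only and cannot tell whether a key belongs to a real substance. The function name is illustrative.

```python
import re

# Shape only: 14 letters, hyphen, 10 letters, hyphen, 1 letter (27 characters).
INCHIKEY_SHAPE = re.compile(r"[A-Z]{14}-[A-Z]{10}-[A-Z]")

def is_wellformed_inchikey(key: str) -> bool:
    """Return True if the string has the 27-character InChIKey layout."""
    return INCHIKEY_SHAPE.fullmatch(key) is not None

print(is_wellformed_inchikey("LFQSCWFLJHTTHZ-UHFFFAOYSA-N"))  # complete key
print(is_wellformed_inchikey("LFQSCWFLJHTTHZ-UHFFFAOYSA"))    # final block missing
```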

The identifiers describe chemical substances in terms of layers of information: the core structure defined by atoms and their bond connectivity, tautomeric information, isotope information, stereochemistry, and electronic charge information.
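The layered structure of an InChI string can be made visible by splitting it at the `/` delimiter. The following is a rough sketch, not a full parser: it assumes the string starts with `InChI=`, that the first segment is the version and the second the molecular formula, and that every later segment begins with a one-letter layer prefix. The prefix labels in `LAYER_NAMES` reflect common InChI documentation and cover only some layers; the input is the InChI of L-alanine, used as an example.

```python
# Partial, illustrative mapping of one-letter layer prefixes to their meaning.
LAYER_NAMES = {
    "c": "connectivity",
    "h": "hydrogen atoms",
    "q": "charge",
    "p": "protonation",
    "b": "double-bond stereochemistry",
    "t": "tetrahedral stereochemistry",
    "m": "stereo parity (enantiomer flag)",
    "s": "stereo type",
    "i": "isotopes",
}

def split_inchi_layers(inchi: str):
    """Split an InChI into (version, formula, [(prefix, content), ...])."""
    if not inchi.startswith("InChI="):
        raise ValueError("not an InChI string")
    segments = inchi[len("InChI="):].split("/")
    version = segments[0]
    formula = segments[1] if len(segments) > 1 else ""
    layers = [(seg[0], seg[1:]) for seg in segments[2:] if seg]
    return version, formula, layers

version, formula, layers = split_inchi_layers(
    "InChI=1S/C3H7NO2/c1-2(4)3(5)6/h2H,4H2,1H3,(H,5,6)/t2-/m0/s1"
)
print(version, formula)
for prefix, content in layers:
    print(prefix, LAYER_NAMES.get(prefix, "other"), content)
```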

Layers of the InChIKey String

[Figure: layers of the InChIKey string (2025-05-09-01.png)]

The following example shows InChI (INCH), InChIKey (INKY), and structure.

[Figure: example InChI (INCH), InChIKey (INKY), and structure (2025-05-09-04.png)]

STNext’s InChI Implementation

InChIs (INCH) and InChIKeys (INKY) are both available as custom display formats and are included in the ALL, IALL, MAX, IMAX, STD, and ISTD formats of CAS Registry and DCR.

InChIKeys (INKY) are searchable and can be used with the SELECT, ANALYZE, and SORT commands. Right truncation and character masking are supported in the /INKY search field. INKY is also included in the basic index (/BI) and in the field availability (/FA) search.

The InChIKeys are indexed in the /INKY search field in three different lengths so that structures can be searched at different degrees of precision. A search for the first 14 characters retrieves all substances with the same atom skeleton; the answer set contains all stereoisomers, tautomers, isotopically labeled forms, and charged forms of the molecule. In contrast, a search for the complete string finds only the explicit structure, for example, the exact isomer of a stereo compound.

Layers | String | Degree of Precision
First 14 characters | AAAAAAAAAAAAAA | atoms and connectivity
First 25 characters | AAAAAAAAAAAAAA-BBBBBBBBFV | atoms and connectivity, tautomeric and isotope information, stereochemistry
Entire string, 27 characters | AAAAAAAAAAAAAA-BBBBBBBBFV-P | atoms and connectivity, tautomeric and isotope information, stereochemistry, and electronic charge information
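The three index lengths in the table correspond to simple prefixes of the full key. The sketch below derives them and compares two keys at a chosen level of precision. The level names `skeleton`, `without_charge`, and `exact` are labels invented for this example, not STNext terms, and the two keys are example InChIKeys for L-alanine and D-alanine used purely as inputs.

```python
# Prefix lengths taken from the table above; the level names are illustrative.
PRECISION_LENGTHS = {"skeleton": 14, "without_charge": 25, "exact": 27}

def inchikey_search_terms(key: str) -> dict:
    """Return the 14-, 25-, and 27-character forms of an InChIKey."""
    return {level: key[:n] for level, n in PRECISION_LENGTHS.items()}

def same_at_level(key_a: str, key_b: str, level: str) -> bool:
    """Compare two InChIKeys at the given degree of precision."""
    n = PRECISION_LENGTHS[level]
    return key_a[:n] == key_b[:n]

l_alanine = "QNAYBMKLOCPYGJ-REOHZIBUSA-N"
d_alanine = "QNAYBMKLOCPYGJ-UWTATZPHSA-N"

print(inchikey_search_terms(l_alanine))
print(same_at_level(l_alanine, d_alanine, "skeleton"))  # same first block
print(same_at_level(l_alanine, d_alanine, "exact"))     # second block differs
```

Two stereoisomers share the first block and therefore match at the 14-character level, while the longer forms tell them apart.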

InChI codes (INCH) can be used in the STNext structure editor to generate structure diagrams.

InChIs can also be used in the STN databases CAS Registry, REAXYSFILESUB, and PS. In PS and REAXYSFILESUB, only the complete 27-character InChIKey is currently indexed. Additional indexing of the 14- and 25-character strings, as already available in DCR and CAS Registry, is planned for the medium term to enable searches at different degrees of precision.