XPATH Examples for STNext XML Reports


Overview

STNext now provides XML report output. XML (eXtensible Markup Language) is an industry-standard format for representing data, with well-established conventions for organizing, searching, and sharing content. It is a common input format for post-processing and analysis tools such as KNIME and Pipeline Pilot, and for electronic lab notebooks.

In STNext, any piece of content that could previously be represented in a standard, enhanced, or table report can now be represented as a data element in an XML report:

_STNext-XPATH-1.png

_STNext-XPATH-2.png

XML Data Elements

In STNext, all XML data elements (Corporate Source, Document Language, Chemical Names, etc.) will contain at least three fundamental pieces of information:

A field-code tag corresponding with the display field of a given answer.

A field-description tag describing the type of content being represented.

A value tag containing the actual data from STNext.

Example:

<fieldCode>LA</fieldCode>

<fieldDescription>Language</fieldDescription>

<values>

<value> Japanese

</value>

</values>
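
A minimal sketch of reading these three pieces with Python's standard-library xml.etree.ElementTree. The enclosing <field> element is an assumption inferred from the XPath expressions used later in this article (.../fields/field/values/value), and the sample string is illustrative rather than actual STNext output.

```python
import xml.etree.ElementTree as ET

# Illustrative fragment; the <field> wrapper is assumed, not copied from a real report.
SAMPLE = """
<field>
  <fieldCode>LA</fieldCode>
  <fieldDescription>Language</fieldDescription>
  <values>
    <value> Japanese
    </value>
  </values>
</field>
"""

def describe_field(field):
    """Return (field code, field description, list of whitespace-stripped values)."""
    code = field.findtext("fieldCode")
    description = field.findtext("fieldDescription")
    values = [(v.text or "").strip() for v in field.findall("values/value")]
    return code, description, values

field = ET.fromstring(SAMPLE)
print(describe_field(field))
```

Values are stripped because, as in the example above, the report text may carry leading spaces and trailing line breaks inside the value tag.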

XML Tags and XPath Expressions

A benefit of XML is the ability to navigate and isolate specific pieces of information using XPath expressions (XML Path Language). XPath expressions allow for easy navigation of XML files in the same way URLs allow for easy navigation of websites.

Field-Description Tags and XPath Expressions

STNext’s field-description tags allow for easy identification of data elements across files. For example, the field-description tag “Patent Assignee/Corporate Source” can be used to identify a common category of content even when the field-code tags differ, as with CS in MEDLINE and PA in CAplus.

MEDLINE

<fieldCode>CS</fieldCode>

<fieldDescription>Patent Assignee/Corporate Source</fieldDescription>

<values>

<value> Department of Bio-Medical Engineering, School of Engineering, Tokai University, Isehara, Kanagawa, Japan.

</value>

</values>

CAplus

<fieldCode>PA</fieldCode>

<fieldDescription>Patent Assignee/Corporate Source</fieldDescription>

<values>

<value> Toray Research Center Inc., Japan

</value>

</values>
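
The idea can be sketched in Python with xml.etree.ElementTree, which supports the [child='text'] predicate form. The document nesting (stnAnswers/answer/fields/field) is taken from the XPath expressions in this article. The sample data is shortened and illustrative, and values_by_description is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Illustrative two-answer report: same description, different field codes.
SAMPLE = """
<stnAnswers>
  <answer><fields><field>
    <fieldCode>CS</fieldCode>
    <fieldDescription>Patent Assignee/Corporate Source</fieldDescription>
    <values><value> Tokai University, Isehara, Kanagawa, Japan. </value></values>
  </field></fields></answer>
  <answer><fields><field>
    <fieldCode>PA</fieldCode>
    <fieldDescription>Patent Assignee/Corporate Source</fieldDescription>
    <values><value> Toray Research Center Inc., Japan </value></values>
  </field></fields></answer>
</stnAnswers>
"""

def values_by_description(root, description):
    """Collect (fieldCode, value) pairs for every field with the given description."""
    pairs = []
    path = f"./answer/fields/field[fieldDescription='{description}']"
    for field in root.findall(path):
        code = field.findtext("fieldCode")
        for value in field.findall("values/value"):
            pairs.append((code, (value.text or "").strip()))
    return pairs

root = ET.fromstring(SAMPLE)
for code, text in values_by_description(root, "Patent Assignee/Corporate Source"):
    print(code, "->", text)
```

Both the CS and PA fields are returned by the single description-based query, which is the point of matching on the field-description tag.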

Field-Code Tags and XPath Expression Examples

Field-code tags are more compact than field-description tags. They are effective for exploring content within one specific database, or when common identifiers are used across all databases (for example, patent identifiers like PRAI, PPPI, etc.).

The examples provided below further illustrate how to work with different types of XML output from STNext using XPath expressions:

Example 1: Corporate Sources

Query used to generate report:

  • Fil cap chemcat; s aldrich?/co; D 1 5000 CO
  • Fil cap medline biosis embase; s "Department of Integrative Bioscience and Biomedical Engineering, Graduate School of Science and engineering, Waseda University"?/cs; d 1-9 cs

Information desired:

Extract all the corporate sources from the XML report across different content files.

XPATH

/stnAnswers/answer/fields/field[fieldDescription="Patent Assignee/Corporate Source"]/values/value

Result

Element='<value>Department of Integrative Bioscience and Biomedical Engineering, Graduate
                        School of Science and Engineering, Waseda University, Tokyo, 162-8480,
                        Japan</value>'
Element='<value>Department of Integrative Bioscience and Biomedical Engineering, Graduate
                        School of Science and Engineering, Waseda University, Honjo, Saitama,
                        367-0035, Japan</value>'
Element='<value>Department of Integrative Bioscience and Biomedical Engineering, Graduate
                        School of Science and Engineering, Waseda University, Saitama, 367-0035,
                        Japan</value>'
Element='<value>   Department of Computational Biology and Medical Sciences, Graduate School
     of Frontier Sciences, The University of Tokyo, 5-1-5, Kashiwanoha,
     Kashiwa-shi, Chiba 277-8562, Japan. ; Department of Life Science, Rikkyo
     University, Tokyo, 171-8501, Japan. ; Department of Integrative
                        Bioscience and Biomedical Engineering, Graduate School of Science and
                        Engineering, Waseda University, Tokyo, Shinjuku 162-8480, Japan.
</value>'
Element='<value>   Department of Computational Biology and Medical Sciences, Graduate School
     of Frontier Sciences, The University of Tokyo, Kashiwa, Chiba, 277-8562,
     Japan. ; Department of Molecular Biophysics and Biochemistry, Yale
     University, New Haven, CT, 06520, USA. ; Laboratory for Cell-Free Protein
     Synthesis, RIKEN Center for Biosystems Dynamics Research (BDR), Suita,
     Osaka, 565-0874, Japan. <link href="mailto:yshimizu@riken.jp">mailto:yshimizu@riken.jp</link>; Department of Chemistry and
     Biomolecular Science, Faculty of Engineering, Gifu University, Gifu,
     501-1193, Japan. ; Department of Integrative Bioscience and Biomedical
                        Engineering, Graduate School of Science and Engineering, Waseda
                        University, Tokyo, Shinjuku, 162-8480, Japan.
 
                    </value>'
Element='<value>   Laboratory for Cell-Free Protein Synthesis, RIKEN Center for Biosystems
     Dynamics Research (BDR), Suita, Osaka, 565-0874, Japan. <link href="mailto:yshimizu@riken.jp">
                        mailto:yshimizu@riken.jp</link>;
     Department of Computational Biology and Medical Sciences, Graduate School
     of Frontier Sciences, The University of Tokyo, Kashiwa, Chiba, 277-8562,
     Japan. ; Department of Molecular Biophysics and Biochemistry, Yale
     University, New Haven, CT, 06520, USA. ; GeneFrontier Corporation,
     Kashiwa, Chiba, 277-0005, Japan. ; Department of Integrative Bioscience
                        and Biomedical Engineering, Graduate School of Science and Engineering,
                        Waseda University, Shinjuku, Tokyo, 162-8480, Japan.
 
                    </value>'
Element='<value>Department of Integrative Bioscience and Biomedical Engineering, Graduate
                        School of Science and Engineering, Waseda University, 2-2 Wakamatsu-cho,
                        Shinjuku-ku, Tokyo 162-8480, Japan. ; Department of Environmental and
     Life Sciences, Toyohashi University of Technology, Tempaku-cho, Toyohashi,
     Aichi 441-8580, Japan.
</value>'
Element='<value>Department of Integrative Bioscience and Biomedical Engineering, Graduate
                        School of Science and Engineering, Waseda University, Waseda Research
                        Park, Honjo, Saitama 367-0035, Japan. 
                        <link href="mailto:s0830258@ipe.tsukuba.ac.jp">mailto:s0830258@ipe.tsukuba.ac.jp</link>
                    </value>'
Element='<value>Department of Integrative Bioscience and Biomedical Engineering, Graduate


Note: To get just the text without the <value> tags, append the text() node test to the XPATH:

/stnAnswers/answer/fields/field[fieldDescription="Patent Assignee/Corporate Source"]/values/value/text()
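
As the results above show, a single value can hold several affiliations separated by semicolons, with irregular whitespace and embedded <link> elements. The following is a rough sketch of one way to clean such a value in Python. Splitting on ";" is an assumption based only on the sample output shown here, and the names and addresses in the sample are placeholders.

```python
import xml.etree.ElementTree as ET

# Placeholder data shaped like the corporate-source values shown above.
SAMPLE = """<value>   Laboratory for Cell-Free Protein Synthesis, Example Institute, Suita,
     Osaka, Japan. <link href="mailto:someone@example.org">mailto:someone@example.org</link>;
     Department of Life Science, Example University,
                        Tokyo, Japan.
</value>"""

def affiliations(value):
    """Split one <value> element into whitespace-normalized affiliation strings."""
    # value.text plus each child's tail are the direct text nodes, i.e. what
    # value/text() selects; text nested inside <link> is left out.
    raw = (value.text or "") + "".join(child.tail or "" for child in value)
    parts = (" ".join(part.split()) for part in raw.split(";"))
    return [part for part in parts if part]

for item in affiliations(ET.fromstring(SAMPLE)):
    print(item)
```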


Example 2: Single-Component Structures

Query used to generate report:

Fil reg chemcat; s 14028-44-5/rn; d 1-4

Information desired:

Extract all the single component structure images from different content files.

XPATH

/stnAnswers/answer/fields/field[fieldCode/text()="STR"]/values/value/image

To get just single component images:

/stnAnswers/answer/fields/field/values/value/image[starts-with(text(), 'image')]

Result

Element='<image type="file">image1.png</image>'
Element='<image type="file">image2.jpg</image>'
Element='<image type="file">image3.jpg</image>'
Element='<image type="file">image4.jpg</image>'
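
A sketch of the same selection in Python. The standard-library xml.etree.ElementTree does not implement XPath functions such as starts-with(), so the prefix check is done in Python instead. The sample document is illustrative, and structure_images is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Illustrative report with two structure (STR) fields.
SAMPLE = """
<stnAnswers><answer><fields>
  <field>
    <fieldCode>STR</fieldCode>
    <values><value><image type="file">image1.png</image></value></values>
  </field>
  <field>
    <fieldCode>STR</fieldCode>
    <values><value><image type="file">image2.jpg</image></value></values>
  </field>
</fields></answer></stnAnswers>
"""

def structure_images(root, prefix="image"):
    """Return image file names from STR fields whose name starts with prefix."""
    names = []
    path = "./answer/fields/field[fieldCode='STR']/values/value/image"
    for image in root.findall(path):
        name = (image.text or "").strip()
        if name.startswith(prefix):  # stands in for starts-with(text(), 'image')
            names.append(name)
    return names

print(structure_images(ET.fromstring(SAMPLE)))
```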


Example 3: PatentPak

Query used to generate report:

Fil cap; s PPAK/FA AND PDF/FA AND RESINS AND POLYMER; D 1-5

Information desired:

Extract all PatentPak links.

XPATH

/stnAnswers/answer/fields/field[fieldDescription="PatentPak Patent Information"]/values/value/link

Result

<link href="https://patentpak-test.cas.org/STN/patents/c1yXxmR9m-5prlzfPAS9CX9jtkmazhDk9ksue-5hlvmJ7BnpTB5hD6P3tJwuzsSfVS0P1h3bHjhGRKO4CFIF2gn2kxiL4_40YQFENEEhizdQl0_hiVB46-cOBoLYbJZP/pdf/full?v=1&amp;s=PT&amp;f=528&amp;n=C">PDF</link>
<link href="https://patentpak-test.cas.org/STN/patents/c1yXxmR9m-5prlzfPAS9CX9jtkmazhDk9ksue-5hlvmJ7BnpTB5hD6P3tJwuzsSfVS0P1h3bHjhGRKO4CFIF2gn2kxiL4_40YQFENEEhizdQl0_hiVB46-cOBoLYbJZP/pdf/marked-full?v=1&amp;s=PT&amp;f=528&amp;n=C">PDF+</link>
<link href="https://patentpak-test.cas.org/STN/patents/viewer?d=c1yXxmR9m-5prlzfPAS9CX9jtkmazhDk9ksue-5hlvmJ7BnpTB5hD6P3tJwuzsSfVS0P1h3bHjhGRKO4CFIF2gn2kxiL4_40YQFENEEhizdQl0_hiVB46-cOBoLYbJZP&amp;v=1&amp;s=PT&amp;f=528&amp;n=C">Interactive</link>
<link href="https://patentpak-test.cas.org/STN/patents/c1yXxmR9m-5prlzfPAS9CX9jtkmazhDk9ksue-5hlvlu-2IiXlTNbxDX7cVyrlJckCQSFqsR12C4_meqUQc0d7V4911sHngRgxllDlnZaF-ajUH2-hrFT5chBFGE5lBO/pdf/full?v=1&amp;s=PT&amp;f=528&amp;n=C">PDF</link>
<link href="https://patentpak-test.cas.org/STN/patents/c1yXxmR9m-5prlzfPAS9CX9jtkmazhDk9ksue-5hlvlu-2IiXlTNbxDX7cVyrlJckCQSFqsR12C4_meqUQc0d7V4911sHngRgxllDlnZaF-ajUH2-hrFT5chBFGE5lBO/pdf/marked-full?v=1&amp;s=PT&amp;f=528&amp;n=C">PDF+</link>
<link href="https://patentpak-test.cas.org/STN/patents/viewer?d=c1yXxmR9m-5prlzfPAS9CX9jtkmazhDk9ksue-5hlvlu-2IiXlTNbxDX7cVyrlJckCQSFqsR12C4_meqUQc0d7V4911sHngRgxllDlnZaF-ajUH2-hrFT5chBFGE5lBO&amp;v=1&amp;s=PT&amp;f=528&amp;n=C">Interactive</link>
<link href="https://patentpak-test.cas.org/STN/patents/c1yXxmR9m-5prlzfPAS9CX9jtkmazhDk9ksue-5hlvnw1pxzRRA5u5HoDRcwQA-b9PaQHrMNPkl68Dbu5aRlc2ripwhDMdGJLBFmHMWjcO3mk9NzGPrTrYXi3FtVrjmc/pdf/full?v=1&amp;s=PT&amp;f=528&amp;n=C">PDF</link>
<link href="https://patentpak-test.cas.org/STN/patents/c1yXxmR9m-5prlzfPAS9CX9jtkmazhDk9ksue-5hlvnw1pxzRRA5u5HoDRcwQA-b9PaQHrMNPkl68Dbu5aRlc2ripwhDMdGJLBFmHMWjcO3mk9NzGPrTrYXi3FtVrjmc/pdf/marked-full?v=1&amp;s=PT&amp;f=528&amp;n=C">PDF+</link>
<link href="https://patentpak-test.cas.org/STN/patents/viewer?d=c1yXxmR9m-5prlzfPAS9CX9jtkmazhDk9ksue-5hlvnw1pxzRRA5u5HoDRcwQA-b9PaQHrMNPkl68Dbu5aRlc2ripwhDMdGJLBFmHMWjcO3mk9NzGPrTrYXi3FtVrjmc&amp;v=1&amp;s=PT&amp;f=528&amp;n=C">Interactive</link>
<link href="https://patentpak-test.cas.org/STN/patents/c1yXxmR9m-5prlzfPAS9CX9jtkmazhDk9ksue-5hlvniXGCrH2fnQB0zoSAeAJMbD8enl9yZgsPGKmDbvxi2RbijMLHFHxa3m7yPA2z8qWcXMumBTGKyPVS8HTsOZfdx/pdf/full?v=1&amp;s=PT&amp;f=528&amp;n=C">PDF</link>
<link href="https://patentpak-test.cas.org/STN/patents/c1yXxmR9m-5prlzfPAS9CX9jtkmazhDk9ksue-5hlvniXGCrH2fnQB0zoSAeAJMbD8enl9yZgsPGKmDbvxi2RbijMLHFHxa3m7yPA2z8qWcXMumBTGKyPVS8HTsOZfdx/pdf/marked-full?v=1&amp;s=PT&amp;f=528&amp;n=C">PDF+</link>
<link href="https://patentpak-test.cas.org/STN/patents/viewer?d=c1yXxmR9m-5prlzfPAS9CX9jtkmazhDk9ksue-5hlvniXGCrH2fnQB0zoSAeAJMbD8enl9yZgsPGKmDbvxi2RbijMLHFHxa3m7yPA2z8qWcXMumBTGKyPVS8HTsOZfdx&amp;v=1&amp;s=PT&amp;f=528&amp;n=C">Interactive</link>
<link href="https://patentpak-test.cas.org/STN/patents/c1yXxmR9m-5prlzfPAS9CX9jtkmazhDk9ksue-5hlvk6eZOK4W6hBf19Ykt_mdbL/pdf/marked-full?v=1&amp;s=PT&amp;f=528&amp;n=C">PDF+</link>
<link href="https://patentpak-test.cas.org/STN/patents/viewer?d=c1yXxmR9m-5prlzfPAS9CX9jtkmazhDk9ksue-5hlvk6eZOK4W6hBf19Ykt_mdbL&amp;v=1&amp;s=PT&amp;f=528&amp;n=C">Interactive</link>
<link href="https://patentpak-test.cas.org/STN/patents/c1yXxmR9m-5prlzfPAS9CX9jtkmazhDk9ksue-5hlvlvnQZiNyj8ghzK-kLyWlA9cpwfhGMvvGTdb67B-kPo4XsPP5QOpY-F5yGumJzncFeUarNxtEey6Jm7qmbm3BgM/pdf/full?v=1&amp;s=PT&amp;f=528&amp;n=C">PDF</link>
<link href="https://patentpak-test.cas.org/STN/patents/c1yXxmR9m-5prlzfPAS9CX9jtkmazhDk9ksue-5hlvlvnQZiNyj8ghzK-kLyWlA9cpwfhGMvvGTdb67B-kPo4XsPP5QOpY-F5yGumJzncFeUarNxtEey6Jm7qmbm3BgM/pdf/marked-full?v=1&amp;s=PT&amp;f=528&amp;n=C">PDF+</link>
<link href="https://patentpak-test.cas.org/STN/patents/viewer?d=c1yXxmR9m-5prlzfPAS9CX9jtkmazhDk9ksue-5hlvlvnQZiNyj8ghzK-kLyWlA9cpwfhGMvvGTdb67B-kPo4XsPP5QOpY-F5yGumJzncFeUarNxtEey6Jm7qmbm3BgM&amp;v=1&amp;s=PT&amp;f=528&amp;n=C">Interactive</link>



Example 4: Full-Text Links

Query used to generate report:

Fil medline caplus toxcenter; s 10.1001/2012.JAMA.10368/DOI; D 1-3

Information desired:

Extract all full-text links from the XML report across databases.

XPATH

/stnAnswers/answer/fields/field/values/value/link[contains(text(), 'Full-text')]

Note: This XPath expression uses the text inside the link element to distinguish full-text links from other links that may be present in a report (PatentPak, etc.).

Result

<link href="http://chemport-test.cas.org/cgi-bin/cp_sdcgi?vUHEWsjdgD3EFvU51tSZMIzHNq5@IiJhbc3JFm_AZ75OGdnbm50YLwWjvK5UtjMYPnUkcAnuAI1RPW7yGePFIEU@DWdDn_iy88ydril1UtMAH_n3f4WLA7tEUBN65bcqaqO_yWM1YFUWLHGxNnPYLzV5I3l@hUGsX7QdK3Gw6L6_">Full-text</link>
<link href="http://chemport-test.cas.org/cgi-bin/cp_sdcgi?7aQYaOYPTYzl_h7myF1AkdxQsV5wRAkWAImzc0MZYyJbaeEVHMmrowJYOM573GTLNzVTneLvT8rcG77S9@cuiFLrr5se06wjCfjzyeuCW17X4cH@zmSqiZClOF6ZzDGhO3pibWyXX1VG7IXFB7sjVjYJ1JIPaPTxPHtyQeraFw0Y0B">Full-text</link>
<link href="http://chemport-test.cas.org/cgi-bin/cp_sdcgi?Xckqpo2tFTZzs8QmJTJszKOkTkkqgcmF8_FM@hoguxYjmRyqzWzehaG2awTQpqfiIsvwCL4ZSzN_zfrEnDYTppmo6zXRU0RYMc1N0kFRn1h8QS5rtpsFNiK9xEepCHGOhssMJ0j1F5i800QXIWWJTC4pEaUYoooqXpauyxgRcPHaoEoO">Full-text</link>
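
A sketch of collecting the link targets (href attributes) in Python. Since xml.etree.ElementTree has no contains() function, the label test is written as a Python substring check. The URLs are placeholders, and link_targets is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Illustrative report with one full-text link and one unrelated link.
SAMPLE = """
<stnAnswers><answer><fields><field>
  <values><value>
    <link href="https://example.org/fulltext?id=1&amp;src=a">Full-text</link>
    <link href="https://example.org/patent.pdf">PDF</link>
  </value></values>
</field></fields></answer></stnAnswers>
"""

def link_targets(root, label):
    """Return the href of every link whose text contains label."""
    links = root.findall("./answer/fields/field/values/value/link")
    # The parser unescapes &amp; in attribute values, so hrefs come back as plain URLs.
    return [link.get("href") for link in links if label in (link.text or "")]

print(link_targets(ET.fromstring(SAMPLE), "Full-text"))
```

The same helper could be pointed at other labels seen in this article, such as "PDF+" or "Interactive".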



Example 5: Patent Family Information

Query used to generate report:

Fil cap wpindex inpadoc; s WO 2020055170/Pn; d 1-3

Information desired:

Extract all patent family information from the XML report.

XPATH

/stnAnswers/answer/fields/field[fieldCode="PPPI"]/values/value

Result

Element='<value>

     PATENT NO.          KIND  DATE      LANGUAGE   PatentPak

     ---------------     ----  --------  ---------- ------------------------

     WO 2020055170        A1   20200319  Korean     <link href="https://patentpak-test.cas.org/STN/patents/IqbX-J-73M7elE2noMeePXP6opC5sETtd1-kv78eXDJERGt1MdiSRbRsT9xpRxG_NUK3HJMff2A7SkuMxzd2Yw2jORML1dsX9y8t9rS8QIq1sR99XL5sKKVKUlIihuek/pdf/full?v=1&amp;s=PT&amp;f=528&amp;n=C">PDF</link> | <link href="https://patentpak-test.cas.org/STN/patents/IqbX-J-73M7elE2noMeePXP6opC5sETtd1-kv78eXDJERGt1MdiSRbRsT9xpRxG_NUK3HJMff2A7SkuMxzd2Yw2jORML1dsX9y8t9rS8QIq1sR99XL5sKKVKUlIihuek/pdf/marked-full?v=1&amp;s=PT&amp;f=528&amp;n=C">PDF+</link> | <link href="https://patentpak-test.cas.org/STN/patents/viewer?d=IqbX-J-73M7elE2noMeePXP6opC5sETtd1-kv78eXDJERGt1MdiSRbRsT9xpRxG_NUK3HJMff2A7SkuMxzd2Yw2jORML1dsX9y8t9rS8QIq1sR99XL5sKKVKUlIihuek&amp;v=1&amp;s=PT&amp;f=528&amp;n=C">Interactive</link>

     KR 2020030469        A    20200320  Korean     <link href="https://patentpak-test.cas.org/STN/patents/IqbX-J-73M7elE2noMeePXP6opC5sETtd1-kv78eXDISgRID1cXE8dsqeyDrUSQuCmCpOhCiGzvGfdJ9VJnSpl2TuiuKa77AWGafbniR7QscqWsJsmd6U67_l4g2LPgx/pdf/full?v=1&amp;s=PT&amp;f=528&amp;n=C">PDF</link>

     CN 112805002         A    20210514  Chinese    <link href="https://patentpak-test.cas.org/STN/patents/IqbX-J-73M7elE2noMeePXP6opC5sETtd1-kv78eXDJqQ2nDpsrjVuBvepWiU55DpgcL5jFz_U3joqa5dUeG3k39m4grjygfNsU2SsH11X6qAqB8afrHUd-USng-l-tR/pdf/full?v=1&amp;s=PT&amp;f=528&amp;n=C">PDF</link>

     EP 3827829           A1   20210602  English    <link href="https://patentpak-test.cas.org/STN/patents/IqbX-J-73M7elE2noMeePXP6opC5sETtd1-kv78eXDK-y3qbBeu0M7U0KMv_dCRAZ_m1mzGzhpLwI5oZqi-xs8Ub1_OsiEXdG7cHgdJwOHuuDz5ts_PPfuD_MIhClidf/pdf/full?v=1&amp;s=PT&amp;f=528&amp;n=C">PDF</link>

</value>'

If we use the XPATH:

/stnAnswers/answer/fields/field[fieldCode="PPPI"]/values/value/text()

We get the following result:

Text='
     PATENT NO.          KIND  DATE      LANGUAGE   PatentPak
     ---------------     ----  --------  ---------- ------------------------
     WO 2020055170        A1   20200319  Korean     '
Text=' | '
Text=' | '
Text='
     KR 2020030469        A    20200320  Korean     '
Text='
     CN 112805002         A    20210514  Chinese    '
Text='
     EP 3827829           A1   20210602  English    '
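
The text nodes are split wherever a <link> element interrupts the value, which is why the single table comes back in several pieces. The sketch below shows one possible way to rejoin those pieces in Python and read the table rows. It assumes the column layout shown in this example (columns separated by two or more spaces, data rows starting with a two-letter country code); the URLs are placeholders and the helper names are hypothetical.

```python
import re
import xml.etree.ElementTree as ET

# Shortened sample shaped like the PPPI value above.
SAMPLE = """<value>
     PATENT NO.          KIND  DATE      LANGUAGE   PatentPak
     ---------------     ----  --------  ---------- ------------------------
     WO 2020055170        A1   20200319  Korean     <link href="https://example.org/a">PDF</link> | <link href="https://example.org/b">PDF+</link>
     KR 2020030469        A    20200320  Korean     <link href="https://example.org/c">PDF</link>
</value>"""

def text_nodes(value):
    """Direct text children of <value>, the same nodes value/text() selects."""
    nodes = [value.text] + [child.tail for child in value]
    return [node for node in nodes if node]

def patent_rows(value):
    """Return (patent no., kind, date, language) tuples from the table text."""
    rows = []
    for line in "".join(text_nodes(value)).splitlines():
        line = line.strip()
        # Data rows start with a two-letter country code followed by a number.
        if re.match(r"[A-Z]{2} \d", line):
            rows.append(tuple(re.split(r"\s{2,}", line)[:4]))
    return rows

for row in patent_rows(ET.fromstring(SAMPLE)):
    print(row)
```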


Example 6: Chemical Names

Query used to generate report:

Fil reg chemcat; s phenol/cn; d 1-3

Information desired:

Extract chemical names from different content files.

XPATH

/stnAnswers/answer/fields/field[fieldCode="CN"]/values/value/text()

Result

Text='Phenol  (CA INDEX NAME)

OTHER NAMES:

'

Text='   2-Allphenol

'

Text='   Benzenol

'

Text='   Carbolic acid

'

Text='   ENT 1814

'

Text='   Hydroxybenzene

'

Text='   Monohydroxybenzene

'

Text='   Monophenol

'

Text='   NSC 36808

'

Text='   Oxybenzene

'

Text='   Phenic acid

'

Text='   Phenyl alcohol

'

Text='   Phenyl hydrate

'

Text='   Phenyl hydroxide

'

Text='   Phenylic acid

'

Text='   Phenylic alcohol

'

If we use the XPATH:

/stnAnswers/answer/fields/field[fieldCode="CN"]/values/value

We get the following result:

Element='<value>Phenol  (CA INDEX NAME)

OTHER NAMES:

</value>'

Element='<value>   2-Allphenol

</value>'

Element='<value>   Benzenol

</value>'

Element='<value>   Carbolic acid

</value>'

Element='<value>   ENT 1814

</value>'

Element='<value>   Hydroxybenzene

</value>'

Element='<value>   Monohydroxybenzene

</value>'

Element='<value>   Monophenol

</value>'

Element='<value>   NSC 36808

</value>'

Element='<value>   Oxybenzene

</value>'

Element='<value>   Phenic acid

</value>'

Element='<value>   Phenyl alcohol

</value>'

Element='<value>   Phenyl hydrate

</value>'
