Overview
STNext now provides XML report output. XML (eXtensible Markup Language) is an industry-standard format for representing data, with well-established conventions for organizing, searching, and sharing content. XML is a common input format for post-processing and analysis tools such as KNIME, Pipeline Pilot, or electronic lab notebooks.
In STNext, any piece of content that could previously be represented in a standard, enhanced, or table report can now be represented as a data element in an XML report.
XML Data Elements
In STNext, all XML data elements (Corporate Source, Document Language, Chemical Names, etc.) will contain at least three fundamental pieces of information:
A field-code tag corresponding with the display field of a given answer.
A field-description tag describing the type of content being represented.
A value tag containing the actual data from STNext.
Example:
<fieldCode>LA</fieldCode>
<fieldDescription>Language</fieldDescription>
<values>
<value> Japanese
</value>
</values>
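As a minimal sketch, the three pieces of a data element can be read with Python's standard xml.etree.ElementTree module. The enclosing <field> element is an assumption based on the XPath expressions used later in this document, the helper name read_field is hypothetical, and the sample string is illustrative rather than actual STNext output.

```python
import xml.etree.ElementTree as ET

# Illustrative data element; the <field> wrapper is an assumption based on
# the /stnAnswers/answer/fields/field paths used elsewhere in this document.
SAMPLE_FIELD = """
<field>
  <fieldCode>LA</fieldCode>
  <fieldDescription>Language</fieldDescription>
  <values>
    <value> Japanese
    </value>
  </values>
</field>
"""


def read_field(field_xml):
    """Return (field code, field description, list of cleaned values)."""
    field = ET.fromstring(field_xml)
    code = field.findtext("fieldCode")
    description = field.findtext("fieldDescription")
    # Values carry leading spaces and trailing newlines, so strip them.
    values = [(v.text or "").strip() for v in field.findall("values/value")]
    return code, description, values


print(read_field(SAMPLE_FIELD))
```

Stripping whitespace is a design choice for downstream tools; keep the raw text if the original spacing matters.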
XML Tags and XPath Expressions
A benefit of XML is the ability to locate and isolate specific pieces of information using XPath (XML Path Language) expressions. XPath expressions allow for easy navigation of XML files in the same way that URLs allow for easy navigation of websites.
Field-Description Tags and XPath Expressions
STNext’s field-description tags allow for easy identification of data elements across files. For example, the field-description tag “Patent Assignee/Corporate Source” can be used to identify common categories of content even when the field-code tags are different.
MEDLINE
<fieldCode>CS</fieldCode>
<fieldDescription>Patent Assignee/Corporate Source</fieldDescription>
<values>
<value> Department of Bio-Medical Engineering, School of Engineering, Tokai University, Isehara, Kanagawa, Japan.
</value>
</values>
CAplus
<fieldCode>PA</fieldCode>
<fieldDescription>Patent Assignee/Corporate Source</fieldDescription>
<values>
<value> Toray Research Center Inc., Japan
</value>
</values>
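The idea of matching on the field description rather than the field code can be sketched in Python as follows. This is an illustration under assumptions: the stnAnswers/answer/fields/field nesting is taken from the XPath expressions in this document, and the report string is a hand-built sample that mirrors the MEDLINE and CAplus snippets, not real output.

```python
import xml.etree.ElementTree as ET

# Illustrative report: element names follow the XPath expressions in this
# document; the two records mirror the MEDLINE and CAplus snippets.
REPORT = """
<stnAnswers>
  <answer><fields><field>
    <fieldCode>CS</fieldCode>
    <fieldDescription>Patent Assignee/Corporate Source</fieldDescription>
    <values><value> Department of Bio-Medical Engineering, School of Engineering, Tokai University, Isehara, Kanagawa, Japan.
    </value></values>
  </field></fields></answer>
  <answer><fields><field>
    <fieldCode>PA</fieldCode>
    <fieldDescription>Patent Assignee/Corporate Source</fieldDescription>
    <values><value> Toray Research Center Inc., Japan
    </value></values>
  </field></fields></answer>
</stnAnswers>
"""

root = ET.fromstring(REPORT)

# ElementTree paths are relative to the root element, so the leading
# /stnAnswers step of the full XPath expression is written as "." here.
path = "./answer/fields/field[fieldDescription='Patent Assignee/Corporate Source']"

# Group the cleaned values by the field code each database uses.
sources = {}
for field in root.findall(path):
    code = field.findtext("fieldCode")
    sources[code] = [(v.text or "").strip() for v in field.findall("values/value")]

print(sources)
```

Both records are returned by the same query even though one uses the field code CS and the other PA.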
Field-Code Tags and XPath Expression Examples
Field-code tags are more compact than field-description tags and are effective for exploring content within one specific database or when common identifiers are used across all databases (for example, patent identifiers like PRAI, PPPI, etc.).
The examples provided below further illustrate how to work with different types of XML output from STNext using XPath expressions:
- Example 1: Corporate sources
- Example 2: Single-component structure images
- Example 3: PatentPak links
- Example 4: Full-text links
- Example 5: Patent family information
- Example 6: Chemical names
Example 1: Corporate Sources
Query used to generate report:
- Fil cap chemcat; s aldrich?/co; D 1 5000 CO
- Fil cap medline biosis embase; s "Department of Integrative Bioscience and Biomedical Engineering, Graduate School of Science and engineering, Waseda University"?/cs; d 1-9 cs
Information desired:
Extract all the corporate sources from the XML report across different content files.
XPATH
/stnAnswers/answer/fields/field[fieldDescription="Patent Assignee/Corporate Source"]/values/value
Result
Element='<value>Department of Integrative Bioscience and Biomedical Engineering, Graduate
School of Science and Engineering, Waseda University, Tokyo, 162-8480,
Japan</value>'
Element='<value>Department of Integrative Bioscience and Biomedical Engineering, Graduate
School of Science and Engineering, Waseda University, Honjo, Saitama,
367-0035, Japan</value>'
Element='<value>Department of Integrative Bioscience and Biomedical Engineering, Graduate
School of Science and Engineering, Waseda University, Saitama, 367-0035,
Japan</value>'
Element='<value> Department of Computational Biology and Medical Sciences, Graduate School
of Frontier Sciences, The University of Tokyo, 5-1-5, Kashiwanoha,
Kashiwa-shi, Chiba 277-8562, Japan. ; Department of Life Science, Rikkyo
University, Tokyo, 171-8501, Japan. ; Department of Integrative
Bioscience and Biomedical Engineering, Graduate School of Science and
Engineering, Waseda University, Tokyo, Shinjuku 162-8480, Japan.
</value>'
Element='<value> Department of Computational Biology and Medical Sciences, Graduate School
of Frontier Sciences, The University of Tokyo, Kashiwa, Chiba, 277-8562,
Japan. ; Department of Molecular Biophysics and Biochemistry, Yale
University, New Haven, CT, 06520, USA. ; Laboratory for Cell-Free Protein
Synthesis, RIKEN Center for Biosystems Dynamics Research (BDR), Suita,
Osaka, 565-0874, Japan. <link href="mailto:yshimizu@riken.jp">mailto:yshimizu@riken.jp</link>; Department of Chemistry and
Biomolecular Science, Faculty of Engineering, Gifu University, Gifu,
501-1193, Japan. ; Department of Integrative Bioscience and Biomedical
Engineering, Graduate School of Science and Engineering, Waseda
University, Tokyo, Shinjuku, 162-8480, Japan.
</value>'
Element='<value> Laboratory for Cell-Free Protein Synthesis, RIKEN Center for Biosystems
Dynamics Research (BDR), Suita, Osaka, 565-0874, Japan. <link href="mailto:yshimizu@riken.jp">
mailto:yshimizu@riken.jp</link>;
Department of Computational Biology and Medical Sciences, Graduate School
of Frontier Sciences, The University of Tokyo, Kashiwa, Chiba, 277-8562,
Japan. ; Department of Molecular Biophysics and Biochemistry, Yale
University, New Haven, CT, 06520, USA. ; GeneFrontier Corporation,
Kashiwa, Chiba, 277-0005, Japan. ; Department of Integrative Bioscience
and Biomedical Engineering, Graduate School of Science and Engineering,
Waseda University, Shinjuku, Tokyo, 162-8480, Japan.
</value>'
Element='<value>Department of Integrative Bioscience and Biomedical Engineering, Graduate
School of Science and Engineering, Waseda University, 2-2 Wakamatsu-cho,
Shinjuku-ku, Tokyo 162-8480, Japan. ; Department of Environmental and
Life Sciences, Toyohashi University of Technology, Tempaku-cho, Toyohashi,
Aichi 441-8580, Japan.
</value>'
Element='<value>Department of Integrative Bioscience and Biomedical Engineering, Graduate
School of Science and Engineering, Waseda University, Waseda Research
Park, Honjo, Saitama 367-0035, Japan.
<link href="mailto:s0830258@ipe.tsukuba.ac.jp">mailto:s0830258@ipe.tsukuba.ac.jp</link>
</value>'
Element='<value>Department of Integrative Bioscience and Biomedical Engineering, Graduate
Note: To get only the text, without the <value> tags, append the text() node test to the XPath expression:
/stnAnswers/answer/fields/field[fieldDescription="Patent Assignee/Corporate Source"]/values/value/text()
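Some corporate-source values contain an embedded <link> element, so text() returns only the text nodes that sit directly inside <value>. The sketch below, using Python's standard library, contrasts that with collecting all nested text. The sample value is abridged and illustrative, and the helper names direct_text and all_text are hypothetical.

```python
import xml.etree.ElementTree as ET

# Illustrative value with an embedded <link>; abridged, not real output.
VALUE = ET.fromstring(
    '<value> RIKEN Center for Biosystems Dynamics Research (BDR), Suita, '
    'Osaka, 565-0874, Japan. <link href="mailto:yshimizu@riken.jp">'
    'mailto:yshimizu@riken.jp</link>; Gifu University, Gifu, 501-1193, Japan.\n'
    '</value>'
)


def direct_text(elem):
    """Join the text nodes that are direct children of elem, like text()."""
    parts = [elem.text or ""]
    # Text that follows a child element is stored on that child's tail.
    parts.extend(child.tail or "" for child in elem)
    return " ".join("".join(parts).split())


def all_text(elem):
    """Join every text node under elem, including text inside <link>."""
    return " ".join("".join(elem.itertext()).split())


print(direct_text(VALUE))
print(all_text(VALUE))
```

Which variant is appropriate depends on whether the e-mail link text should be kept as part of the corporate source.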
Example 2: Single-Component Structures
Query used to generate report:
Fil reg chemcat; s 14028-44-5/rn; d 1-4
Information desired:
Extract all the single-component structure images from different content files.
XPATH
/stnAnswers/answer/fields/field[fieldCode/text()="STR"]/values/value/image
To get only single-component images:
/stnAnswers/answer/fields/field/values/value/image[starts-with(text(), 'image')]
Result
Element='<image type="file">image1.png</image>'
Element='<image type="file">image2.jpg</image>'
Element='<image type="file">image3.jpg</image>'
Element='<image type="file">image4.jpg</image>'
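Python's built-in ElementTree module does not implement starts-with(), so one possible approach is to select the <image> elements by path and apply the prefix test in Python. The report string below is a hand-built sample that follows the element names used in this example.

```python
import xml.etree.ElementTree as ET

# Illustrative report; the STR field and <image type="file"> elements
# follow the result shown for this example.
REPORT = """
<stnAnswers>
  <answer><fields><field>
    <fieldCode>STR</fieldCode>
    <values><value><image type="file">image1.png</image></value></values>
  </field></fields></answer>
  <answer><fields><field>
    <fieldCode>STR</fieldCode>
    <values><value><image type="file">image2.jpg</image></value></values>
  </field></fields></answer>
</stnAnswers>
"""

root = ET.fromstring(REPORT)
images = root.findall("./answer/fields/field[fieldCode='STR']/values/value/image")

# Emulate starts-with(text(), 'image') with a plain string test.
file_names = [
    img.text.strip()
    for img in images
    if (img.text or "").strip().startswith("image")
]

print(file_names)
```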
Example 3: PatentPak
Query used to generate report:
Fil cap; s PPAK/FA AND PDF/FA AND RESINS AND POLYMER; D 1-5
Information desired:
Extract all PatentPak links.
XPATH
/stnAnswers/answer/fields/field[fieldDescription="PatentPak Patent Information"]/values/value/link
Result
<link href="https://patentpak-test.cas.org/STN/patents/c1yXxmR9m-5prlzfPAS9CX9jtkmazhDk9ksue-5hlvmJ7BnpTB5hD6P3tJwuzsSfVS0P1h3bHjhGRKO4CFIF2gn2kxiL4_40YQFENEEhizdQl0_hiVB46-cOBoLYbJZP/pdf/full?v=1&s=PT&f=528&n=C">PDF</link>
<link href="https://patentpak-test.cas.org/STN/patents/c1yXxmR9m-5prlzfPAS9CX9jtkmazhDk9ksue-5hlvmJ7BnpTB5hD6P3tJwuzsSfVS0P1h3bHjhGRKO4CFIF2gn2kxiL4_40YQFENEEhizdQl0_hiVB46-cOBoLYbJZP/pdf/marked-full?v=1&s=PT&f=528&n=C">PDF+</link>
<link href="https://patentpak-test.cas.org/STN/patents/viewer?d=c1yXxmR9m-5prlzfPAS9CX9jtkmazhDk9ksue-5hlvmJ7BnpTB5hD6P3tJwuzsSfVS0P1h3bHjhGRKO4CFIF2gn2kxiL4_40YQFENEEhizdQl0_hiVB46-cOBoLYbJZP&v=1&s=PT&f=528&n=C">Interactive</link>
<link href="https://patentpak-test.cas.org/STN/patents/c1yXxmR9m-5prlzfPAS9CX9jtkmazhDk9ksue-5hlvlu-2IiXlTNbxDX7cVyrlJckCQSFqsR12C4_meqUQc0d7V4911sHngRgxllDlnZaF-ajUH2-hrFT5chBFGE5lBO/pdf/full?v=1&s=PT&f=528&n=C">PDF</link>
<link href="https://patentpak-test.cas.org/STN/patents/c1yXxmR9m-5prlzfPAS9CX9jtkmazhDk9ksue-5hlvlu-2IiXlTNbxDX7cVyrlJckCQSFqsR12C4_meqUQc0d7V4911sHngRgxllDlnZaF-ajUH2-hrFT5chBFGE5lBO/pdf/marked-full?v=1&s=PT&f=528&n=C">PDF+</link>
<link href="https://patentpak-test.cas.org/STN/patents/viewer?d=c1yXxmR9m-5prlzfPAS9CX9jtkmazhDk9ksue-5hlvlu-2IiXlTNbxDX7cVyrlJckCQSFqsR12C4_meqUQc0d7V4911sHngRgxllDlnZaF-ajUH2-hrFT5chBFGE5lBO&v=1&s=PT&f=528&n=C">Interactive</link>
<link href="https://patentpak-test.cas.org/STN/patents/c1yXxmR9m-5prlzfPAS9CX9jtkmazhDk9ksue-5hlvnw1pxzRRA5u5HoDRcwQA-b9PaQHrMNPkl68Dbu5aRlc2ripwhDMdGJLBFmHMWjcO3mk9NzGPrTrYXi3FtVrjmc/pdf/full?v=1&s=PT&f=528&n=C">PDF</link>
<link href="https://patentpak-test.cas.org/STN/patents/c1yXxmR9m-5prlzfPAS9CX9jtkmazhDk9ksue-5hlvnw1pxzRRA5u5HoDRcwQA-b9PaQHrMNPkl68Dbu5aRlc2ripwhDMdGJLBFmHMWjcO3mk9NzGPrTrYXi3FtVrjmc/pdf/marked-full?v=1&s=PT&f=528&n=C">PDF+</link>
<link href="https://patentpak-test.cas.org/STN/patents/viewer?d=c1yXxmR9m-5prlzfPAS9CX9jtkmazhDk9ksue-5hlvnw1pxzRRA5u5HoDRcwQA-b9PaQHrMNPkl68Dbu5aRlc2ripwhDMdGJLBFmHMWjcO3mk9NzGPrTrYXi3FtVrjmc&v=1&s=PT&f=528&n=C">Interactive</link>
<link href="https://patentpak-test.cas.org/STN/patents/c1yXxmR9m-5prlzfPAS9CX9jtkmazhDk9ksue-5hlvniXGCrH2fnQB0zoSAeAJMbD8enl9yZgsPGKmDbvxi2RbijMLHFHxa3m7yPA2z8qWcXMumBTGKyPVS8HTsOZfdx/pdf/full?v=1&s=PT&f=528&n=C">PDF</link>
<link href="https://patentpak-test.cas.org/STN/patents/c1yXxmR9m-5prlzfPAS9CX9jtkmazhDk9ksue-5hlvniXGCrH2fnQB0zoSAeAJMbD8enl9yZgsPGKmDbvxi2RbijMLHFHxa3m7yPA2z8qWcXMumBTGKyPVS8HTsOZfdx/pdf/marked-full?v=1&s=PT&f=528&n=C">PDF+</link>
<link href="https://patentpak-test.cas.org/STN/patents/viewer?d=c1yXxmR9m-5prlzfPAS9CX9jtkmazhDk9ksue-5hlvniXGCrH2fnQB0zoSAeAJMbD8enl9yZgsPGKmDbvxi2RbijMLHFHxa3m7yPA2z8qWcXMumBTGKyPVS8HTsOZfdx&v=1&s=PT&f=528&n=C">Interactive</link>
<link href="https://patentpak-test.cas.org/STN/patents/c1yXxmR9m-5prlzfPAS9CX9jtkmazhDk9ksue-5hlvk6eZOK4W6hBf19Ykt_mdbL/pdf/marked-full?v=1&s=PT&f=528&n=C">PDF+</link>
<link href="https://patentpak-test.cas.org/STN/patents/viewer?d=c1yXxmR9m-5prlzfPAS9CX9jtkmazhDk9ksue-5hlvk6eZOK4W6hBf19Ykt_mdbL&v=1&s=PT&f=528&n=C">Interactive</link>
<link href="https://patentpak-test.cas.org/STN/patents/c1yXxmR9m-5prlzfPAS9CX9jtkmazhDk9ksue-5hlvlvnQZiNyj8ghzK-kLyWlA9cpwfhGMvvGTdb67B-kPo4XsPP5QOpY-F5yGumJzncFeUarNxtEey6Jm7qmbm3BgM/pdf/full?v=1&s=PT&f=528&n=C">PDF</link>
<link href="https://patentpak-test.cas.org/STN/patents/c1yXxmR9m-5prlzfPAS9CX9jtkmazhDk9ksue-5hlvlvnQZiNyj8ghzK-kLyWlA9cpwfhGMvvGTdb67B-kPo4XsPP5QOpY-F5yGumJzncFeUarNxtEey6Jm7qmbm3BgM/pdf/marked-full?v=1&s=PT&f=528&n=C">PDF+</link>
<link href="https://patentpak-test.cas.org/STN/patents/viewer?d=c1yXxmR9m-5prlzfPAS9CX9jtkmazhDk9ksue-5hlvlvnQZiNyj8ghzK-kLyWlA9cpwfhGMvvGTdb67B-kPo4XsPP5QOpY-F5yGumJzncFeUarNxtEey6Jm7qmbm3BgM&v=1&s=PT&f=528&n=C">Interactive</link>
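For downstream processing it is often more useful to have each link's label and target than the raw element. A minimal sketch with Python's standard library follows; the URLs are short placeholders on example.com rather than real PatentPak addresses, and the report string is illustrative.

```python
import xml.etree.ElementTree as ET

# Illustrative report; the href values are placeholders, not real links.
REPORT = """
<stnAnswers><answer><fields><field>
  <fieldDescription>PatentPak Patent Information</fieldDescription>
  <values><value>
    <link href="https://example.com/patents/abc/pdf/full">PDF</link> |
    <link href="https://example.com/patents/abc/pdf/marked-full">PDF+</link> |
    <link href="https://example.com/patents/viewer?d=abc">Interactive</link>
  </value></values>
</field></fields></answer></stnAnswers>
"""

root = ET.fromstring(REPORT)
path = (
    "./answer/fields/field[fieldDescription='PatentPak Patent Information']"
    "/values/value/link"
)

# Pair each link label (PDF, PDF+, Interactive) with its href attribute.
patentpak_links = [(link.text, link.get("href")) for link in root.findall(path)]

for label, href in patentpak_links:
    print(label, href)
```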
Example 4: Full-Text Links
Query used to generate report:
Fil medline caplus toxcenter; s 10.1001/2012.JAMA.10368/DOI; D 1-3
Information desired:
Extract all full-text links from the XML report across databases.
XPATH
/stnAnswers/answer/fields/field/values/value/link[contains(text(), 'Full-text')]
Note: This XPath expression uses the text inside the link element to distinguish between full-text links and additional links that may be present in a report (PatentPak, etc.).
Result
<link href="http://chemport-test.cas.org/cgi-bin/cp_sdcgi?vUHEWsjdgD3EFvU51tSZMIzHNq5@IiJhbc3JFm_AZ75OGdnbm50YLwWjvK5UtjMYPnUkcAnuAI1RPW7yGePFIEU@DWdDn_iy88ydril1UtMAH_n3f4WLA7tEUBN65bcqaqO_yWM1YFUWLHGxNnPYLzV5I3l@hUGsX7QdK3Gw6L6_">Full-text</link>
<link href="http://chemport-test.cas.org/cgi-bin/cp_sdcgi?7aQYaOYPTYzl_h7myF1AkdxQsV5wRAkWAImzc0MZYyJbaeEVHMmrowJYOM573GTLNzVTneLvT8rcG77S9@cuiFLrr5se06wjCfjzyeuCW17X4cH@zmSqiZClOF6ZzDGhO3pibWyXX1VG7IXFB7sjVjYJ1JIPaPTxPHtyQeraFw0Y0B">Full-text</link>
<link href="http://chemport-test.cas.org/cgi-bin/cp_sdcgi?Xckqpo2tFTZzs8QmJTJszKOkTkkqgcmF8_FM@hoguxYjmRyqzWzehaG2awTQpqfiIsvwCL4ZSzN_zfrEnDYTppmo6zXRU0RYMc1N0kFRn1h8QS5rtpsFNiK9xEepCHGOhssMJ0j1F5i800QXIWWJTC4pEaUYoooqXpauyxgRcPHaoEoO">Full-text</link>
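The same label-based filtering can be sketched in Python. Because the standard ElementTree module does not implement contains(), the test on the link text is done in Python. The helper name links_containing is hypothetical and the URLs are placeholders.

```python
import xml.etree.ElementTree as ET

# Illustrative report mixing a full-text link with a PatentPak link; the
# URLs are placeholders, not real STNext output.
REPORT = """
<stnAnswers>
  <answer><fields>
    <field><values><value>
      <link href="https://example.com/fulltext/1">Full-text</link>
    </value></values></field>
    <field><values><value>
      <link href="https://example.com/patents/1/pdf/full">PDF</link>
    </value></values></field>
  </fields></answer>
</stnAnswers>
"""


def links_containing(root, label):
    """Return href values of <link> elements whose text contains label."""
    links = root.findall("./answer/fields/field/values/value/link")
    return [link.get("href") for link in links if label in (link.text or "")]


root = ET.fromstring(REPORT)
print(links_containing(root, "Full-text"))
```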
Example 5: Patent Family Information
Query used to generate report:
Fil cap wpindex inpadoc; s WO 2020055170/Pn; d 1-3
Information desired:
Extract all patent family information from the XML report.
XPATH
/stnAnswers/answer/fields/field[fieldCode="PPPI"]/values/value
Result
Element='<value>
PATENT NO. KIND DATE LANGUAGE PatentPak
--------------- ---- -------- ---------- ------------------------
WO 2020055170 A1 20200319 Korean <link href="https://patentpak-test.cas.org/STN/patents/IqbX-J-73M7elE2noMeePXP6opC5sETtd1-kv78eXDJERGt1MdiSRbRsT9xpRxG_NUK3HJMff2A7SkuMxzd2Yw2jORML1dsX9y8t9rS8QIq1sR99XL5sKKVKUlIihuek/pdf/full?v=1&s=PT&f=528&n=C">PDF</link> | <link href="https://patentpak-test.cas.org/STN/patents/IqbX-J-73M7elE2noMeePXP6opC5sETtd1-kv78eXDJERGt1MdiSRbRsT9xpRxG_NUK3HJMff2A7SkuMxzd2Yw2jORML1dsX9y8t9rS8QIq1sR99XL5sKKVKUlIihuek/pdf/marked-full?v=1&s=PT&f=528&n=C">PDF+</link> | <link href="https://patentpak-test.cas.org/STN/patents/viewer?d=IqbX-J-73M7elE2noMeePXP6opC5sETtd1-kv78eXDJERGt1MdiSRbRsT9xpRxG_NUK3HJMff2A7SkuMxzd2Yw2jORML1dsX9y8t9rS8QIq1sR99XL5sKKVKUlIihuek&v=1&s=PT&f=528&n=C">Interactive</link>
KR 2020030469 A 20200320 Korean <link href="https://patentpak-test.cas.org/STN/patents/IqbX-J-73M7elE2noMeePXP6opC5sETtd1-kv78eXDISgRID1cXE8dsqeyDrUSQuCmCpOhCiGzvGfdJ9VJnSpl2TuiuKa77AWGafbniR7QscqWsJsmd6U67_l4g2LPgx/pdf/full?v=1&s=PT&f=528&n=C">PDF</link>
CN 112805002 A 20210514 Chinese <link href="https://patentpak-test.cas.org/STN/patents/IqbX-J-73M7elE2noMeePXP6opC5sETtd1-kv78eXDJqQ2nDpsrjVuBvepWiU55DpgcL5jFz_U3joqa5dUeG3k39m4grjygfNsU2SsH11X6qAqB8afrHUd-USng-l-tR/pdf/full?v=1&s=PT&f=528&n=C">PDF</link>
EP 3827829 A1 20210602 English <link href="https://patentpak-test.cas.org/STN/patents/IqbX-J-73M7elE2noMeePXP6opC5sETtd1-kv78eXDK-y3qbBeu0M7U0KMv_dCRAZ_m1mzGzhpLwI5oZqi-xs8Ub1_OsiEXdG7cHgdJwOHuuDz5ts_PPfuD_MIhClidf/pdf/full?v=1&s=PT&f=528&n=C">PDF</link>
</value>'
If we append text() to the XPath expression:
/stnAnswers/answer/fields/field[fieldCode="PPPI"]/values/value/text()
We get the following result:
Text='
PATENT NO. KIND DATE LANGUAGE PatentPak
--------------- ---- -------- ---------- ------------------------
WO 2020055170 A1 20200319 Korean '
Text=' | '
Text=' | '
Text='
KR 2020030469 A 20200320 Korean '
Text='
CN 112805002 A 20210514 Chinese '
Text='
EP 3827829 A1 20210602 English '
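Because the patent family is delivered as a preformatted text table inside a single <value>, one possible way to recover the rows is to join all nested text and match each data line with a regular expression. The sketch below assumes whitespace-separated columns in the order shown in the header (patent number, kind, date, language); the sample value and its URLs are illustrative placeholders.

```python
import re
import xml.etree.ElementTree as ET

# Illustrative PPPI value; the column layout follows the header shown in
# this example and the href values are placeholders.
VALUE = ET.fromstring(
    "<value>\n"
    "PATENT NO.      KIND DATE     LANGUAGE   PatentPak\n"
    "--------------- ---- -------- ---------- ------------------------\n"
    'WO 2020055170   A1   20200319 Korean     <link href="https://example.com/wo/full">PDF</link>'
    ' | <link href="https://example.com/wo/marked-full">PDF+</link>\n'
    'KR 2020030469   A    20200320 Korean     <link href="https://example.com/kr/full">PDF</link>\n'
    "</value>"
)

# Country code and number, kind code, 8-digit date, language.
ROW = re.compile(r"^([A-Z]{2} \d+)\s+(\w+)\s+(\d{8})\s+(\w+)", re.MULTILINE)

# itertext() includes the link labels, which the pattern simply ignores.
text = "".join(VALUE.itertext())
rows = ROW.findall(text)

for patent_no, kind, date, language in rows:
    print(patent_no, kind, date, language)
```

The header and separator lines do not match the pattern, so only the data rows are returned.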
Example 6: Chemical Names
Query used to generate report:
Fil reg chemcat; s phenol/cn; d 1-3
Information desired:
Extract chemical names from different content files.
XPATH
/stnAnswers/answer/fields/field[fieldCode="CN"]/values/value/text()
Result
Text='Phenol (CA INDEX NAME)
OTHER NAMES:
'
Text=' 2-Allphenol
'
Text=' Benzenol
'
Text=' Carbolic acid
'
Text=' ENT 1814
'
Text=' Hydroxybenzene
'
Text=' Monohydroxybenzene
'
Text=' Monophenol
'
Text=' NSC 36808
'
Text=' Oxybenzene
'
Text=' Phenic acid
'
Text=' Phenyl alcohol
'
Text=' Phenyl hydrate
'
Text=' Phenyl hydroxide
'
Text=' Phenylic acid
'
Text=' Phenylic alcohol
'
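The text nodes above still carry leading spaces, trailing newlines, and the "OTHER NAMES:" label. A minimal Python sketch for turning them into a clean list of names is shown below. The report string is a shortened, hand-built sample, and treating "OTHER NAMES:" as a label to discard is an assumption about how the names will be used.

```python
import xml.etree.ElementTree as ET

# Illustrative, shortened CN field modeled on the result for this example.
REPORT = """
<stnAnswers><answer><fields><field>
  <fieldCode>CN</fieldCode>
  <values>
    <value>Phenol (CA INDEX NAME)
OTHER NAMES:
</value>
    <value> Benzenol
</value>
    <value> Carbolic acid
</value>
  </values>
</field></fields></answer></stnAnswers>
"""

root = ET.fromstring(REPORT)

names = []
for value in root.findall("./answer/fields/field[fieldCode='CN']/values/value"):
    for line in (value.text or "").splitlines():
        line = line.strip()
        # Skip blank lines and the section label, keep actual names.
        if line and line != "OTHER NAMES:":
            names.append(line)

print(names)
```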
If we use the XPATH:
/stnAnswers/answer/fields/field[fieldCode="CN"]/values/value
We get the following result:
Element='<value>Phenol (CA INDEX NAME)
OTHER NAMES:
</value>'
Element='<value> 2-Allphenol
</value>'
Element='<value> Benzenol
</value>'
Element='<value> Carbolic acid
</value>'
Element='<value> ENT 1814
</value>'
Element='<value> Hydroxybenzene
</value>'
Element='<value> Monohydroxybenzene
</value>'
Element='<value> Monophenol
</value>'
Element='<value> NSC 36808
</value>'
Element='<value> Oxybenzene
</value>'
Element='<value> Phenic acid
</value>'
Element='<value> Phenyl alcohol
</value>'
Element='<value> Phenyl hydrate
</value>'