Sunday, July 30, 2017

Wikidata visualizes SMILES strings with John Mayfield's CDK Depict

SVG depiction of D-ribulose.
Wikidata is building up a curated collection of information about chemicals. A lot of the data originates from Wikipedia, but active users are augmenting this information. Of particular interest in this respect is Sebastian's PubChem ID curation work (he can use a few helping hands!). Followers of my blog know that I am using Wikidata as a source of compound ID mapping data for BridgeDb.

Each chemical can have one or two associated SMILES strings: a canonical SMILES, which excludes any chirality, and an isomeric SMILES, which does include chirality. Because statement values can be linked to a formatter URL, Wikidata often shows a value as a link. For example, the EPA CompTox Dashboard identifiers link to that database. Kopiersperre used this approach to link SMILES strings to John Mayfield's CDK Depict.
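
To make the formatter URL idea concrete, here is a minimal sketch in Python. It assumes the Wikidata convention that a formatter URL contains a `$1` placeholder for the statement value. The template host (`depict.example.org`), its path and the function name `expand_formatter_url` are hypothetical and only serve as illustration; they are not the actual CDK Depict address.

```python
from urllib.parse import quote

# Hypothetical formatter URL template; the host and path are placeholders,
# not the real CDK Depict service. "$1" marks where the value goes.
DEPICT_TEMPLATE = "https://depict.example.org/depict/bow/svg?smi=$1"


def expand_formatter_url(template: str, value: str) -> str:
    """Replace the $1 placeholder with the URL-encoded statement value."""
    # SMILES strings contain characters such as [, ], @, ( and = that
    # must be percent-encoded before they can be used in a query string.
    return template.replace("$1", quote(value, safe=""))


if __name__ == "__main__":
    # An isomeric SMILES with one stereocentre, used as example input.
    print(expand_formatter_url(DEPICT_TEMPLATE, "N[C@@H](C)C(=O)O"))
```

The encoding step matters for SMILES in particular, because brackets and `@` signs carry the chirality information and would otherwise be mangled in the link.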

Until two weeks ago, the formatter URL for both the canonical and isomeric SMILES was the same. I changed that, so that when an isomeric SMILES is depicted, it shows the perceived R,S (CIP) annotation as well. That should help further curation of Wikidata and Wikipedia content.
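
The difference between the two links can be sketched as follows. This is an illustration under assumptions, not the formatter URLs as stored in Wikidata: the endpoint `depict.example.org` is a placeholder, the helper `depict_url` is hypothetical, and the query parameter `annotate=cip` is assumed here as the way to request CIP labels from the depiction service.

```python
from urllib.parse import urlencode

# Hypothetical depiction endpoint; not the actual service address.
DEPICT_ENDPOINT = "https://depict.example.org/depict/bow/svg"


def depict_url(smiles: str, isomeric: bool) -> str:
    """Build a depiction link, asking for R,S labels only for isomeric SMILES."""
    params = {"smi": smiles}
    if isomeric:
        # Assumed parameter name and value for requesting CIP annotation.
        params["annotate"] = "cip"
    # urlencode percent-encodes the SMILES and joins the parameters with "&".
    return f"{DEPICT_ENDPOINT}?{urlencode(params)}"


if __name__ == "__main__":
    print(depict_url("CC(N)C(=O)O", isomeric=False))
    print(depict_url("N[C@@H](C)C(=O)O", isomeric=True))
```

A canonical SMILES has no stereocentres to label, so only the isomeric variant gets the extra parameter.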