RInChI Extended Toolkit

This module contains functions additional to those officially distributed by the InChI Trust. It provides a range of tools and programs for manipulating RInChIs, which are concise, machine-readable reaction identifiers.


Licensed under the Apache License, Version 2.0 (the “License”); you may not use this file except in compliance with the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an “AS IS” BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.


Authors:

  • C.H.G. Allen 2012
  • N.A. Parker 2013
  • Hammond 2014
  • D.F. Hampshire 2016-17

RInChI Object Orientated Atom Class Module

This module contains the Atom class and associated functions

Modifications:

  • Hammond 2014
  • D.F. Hampshire 2017

    Restructuring and changes as documented in Project Report

class rinchi_tools.atom.Atom(index=None)

Bases: object

A class containing a brief description of an atom, for use as nodes in a graph describing a molecule

get_attached_edges()

Get the edges attached to this atom.

Returns: The edges attached to the atom.

get_hybridisation()

Gets the atom hybridisation. Only defined for C atoms but still useful

Returns: None, or a string signalling the hybridisation, e.g. “sp2”
get_valence()

Get the valence as determined by counting the number of bonds.

Returns: The number of bonds

RInChI Conversion Module

This module provides a variety of functions for the interconversion of RInChIs, Molfiles, RXNfiles and more.

Modifications:

  • C.H.G. Allen 2012

  • N.A. Parker 2013

    Minor additional material added (specifically, .rxn to mol file agent conversion and subsequent amendments for agents in the .rxn to RInChI converter). Added support to the rxn2rinchi function for non-standard .rxn files containing reaction agents specified separately from the reactants and products.

  • Hammond 2014

    Extended support for non-standard .rxn files to the rdf parsing functions. Modified all .rxn handling functions to no longer discard reaction data in the $DTYPE/$DATUM format, instead optionally returning them.

  • D.F. Hampshire 2016

    Removed functions now included in the source v1.00 software (commands that interface with RInChI). Similar Python functionality can be found in the rinchi_lib.py interfacing file. Some functions are now modified to use this rinchi_lib.py interface. Major restructuring across the library means functions have been extensively moved to and from elsewhere.

rinchi_tools.conversion.create_csv_from_directory(root_dir, outname, return_rauxinfo=False, return_longkey=False, return_shortkey=False, return_webkey=False)

Iterate recursively over all rdf files in the given folder and combine them into a single .csv database.

Parameters:
  • root_dir – The directory to search
  • outname – Output file name parameter
  • return_rauxinfo – Include RAuxInfo in the result
  • return_longkey – Include Long key in the result
  • return_shortkey – Include the Short key in the result
  • return_webkey – Include the Web key in the result
Raises:

IndexError – File failed to be recognised for importing

rinchi_tools.conversion.rdf_to_csv(rdf, outfile='rinchi', return_rauxinfo=False, return_longkey=False, return_shortkey=False, return_webkey=False)

Convert an RD file to a CSV file containing RInChIs and other optional parameters

Parameters:
  • rdf – The RD file as a text block
  • outfile – Optional output file name parameter
  • return_rauxinfo – Include RAuxInfo in the result
  • return_longkey – Include Long key in the result
  • return_shortkey – Include the Short key in the result
  • return_webkey – Include the Web key in the result
Returns:

The name of the CSV file created with the requested fields

rinchi_tools.conversion.rdf_to_csv_append(rdf, csv_file, existing_keys=None)

Append values from an RD file to an existing CSV file

Parameters:
  • rdf – The RD file as a text block
  • csv_file – the CSV file path
  • existing_keys – The keys already existing in the CSV file
rinchi_tools.conversion.rdf_to_rinchis(rdf, start=0, stop=0, force_equilibrium=False, return_rauxinfos=False, return_longkeys=False, return_shortkeys=False, return_webkeys=False, return_rinchis=True, columns=None)

Convert an RDFile to a list of RInChIs.

Parameters:
  • rdf – The contents of an RDFile as a string.
  • start – The index of the RXN entry within the RDFile at which to start converting. If set at default value (0), conversion begins from the first RXN entry.
  • stop – The index of the RXN entry within the RDFile at which to stop converting. If set at default value (0), conversion does not stop until the end of the file is reached.
  • force_equilibrium – Whether to set the direction flags explicitly to equilibrium
  • return_rauxinfos – If True, generates and returns RAuxInfo for each generated RInChI.
  • return_longkeys – If True, generates and returns Long-RInChIKeys for each generated RInChI.
  • return_shortkeys – If True, generates and returns Short-RInChIKeys for each generated RInChI.
  • return_webkeys – If True, generates and returns Web-RInChIKeys for each generated RInChI.
  • return_rinchis – Return the rinchi. Defaults to True
  • columns – The data to return may instead be given as a list of headers.
Returns:

List of dicts of reaction data as defined above. The data types are the keys for each dict
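A list of dicts in this shape maps naturally onto a CSV table. The following is a minimal sketch of that step only, assuming placeholder records and a hypothetical helper `records_to_csv_text`; it does not call the library and is not the implementation used by rdf_to_csv.

```python
import csv
import io

# Placeholder records shaped like the documented return value: one dict per
# reaction, keyed by data type. The values are dummy strings, not real RInChIs.
records = [
    {"rinchi": "RINCHI_PLACEHOLDER_1", "longkey": "LONGKEY_PLACEHOLDER_1"},
    {"rinchi": "RINCHI_PLACEHOLDER_2", "longkey": "LONGKEY_PLACEHOLDER_2"},
]


def records_to_csv_text(rows, columns):
    """Serialise a list of dicts to CSV text with a header row (hypothetical helper)."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=columns, extrasaction="ignore")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


csv_text = records_to_csv_text(records, ["rinchi", "longkey"])
```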

rinchi_tools.conversion.rinchi_to_file(data, rxnout=True)

Takes a file object or a multi-line string and returns a list of output file text blocks (RXN or RDF)

Parameters:
  • data – The string of a file input or a file object.
  • rxnout – Return a reaction file. Otherwise, return an RD file
Returns:

A list of RXN or RD file text blocks

rinchi_tools.conversion.rinchis_to_keys(data, longkey=False, shortkey=False, webkey=False, inc_rinchi=False, inc_rauxinfo=False)

Converts a list of rinchis in a flat file into a dictionary of RInChIs and keys

Parameters:
  • inc_rauxinfo – Include the RAuxInfo in the result
  • data – The data string or file object to parse
  • longkey – Whether to include the longkey
  • shortkey – Whether to include the shortkey
  • webkey – Whether to include the webkey
  • inc_rinchi – Whether to include the original rinchi
Returns:

A list of dictionaries containing the data produced, with the property name as the key, like so:

{‘rinchi’: [DATA], ‘rauxinfo’: [DATA], ... }

rinchi_tools.conversion.rxn_to_rinchi(rxn_text, ret_rauxinfo=False, longkey=False, shortkey=False, webkey=False, force_equilibrium=False)

Convert a RXN to a dictionary of calculated data.

Parameters:
  • rxn_text – The RXN text as a string
  • ret_rauxinfo – Return RAuxInfo
  • longkey – Return the Long Key
  • shortkey – Return the Short Key
  • webkey – Return the Web Key
  • force_equilibrium – Force the output direction to be an equilibrium
Returns:

A dictionary of data with the property name as the key, like so:

{‘rinchi’: [DATA], ‘rauxinfo’: [DATA], ... }

RInChI Database Module

Provides tools for converting, creating, and removing from SQL databases

Modifications:

  • Hammond 2014
  • D.F. Hampshire 2017

    Python 3 restructuring and new function addition. Significantly modularised the existing code.

rinchi_tools.database.compare_fingerprints(search_term, db_filename, table_name)

Search the db for the top 10 closest matches to a RInChI by the fingerprinting method. Results are sent to stdout.

Parameters:
  • search_term – A RInChI or Long-RInChIKey to search with
  • db_filename – the db containing the fingerprints
  • table_name – The table containing the RInChI fingerprints
rinchi_tools.database.convert_v02_v03(db_filename, table_name, v02_rinchi=False, v02_rauxinfo=False, v03_rinchi=False, v03_rauxinfo=False, v03_longkey=False, v03_shortkey=False, v03_webkey=False)

Converts a db of v02 rinchis into a db of v03 rinchis and associated information. N.B. keys for v02 are not required, as new keys must be generated for the db. Because of the nature of this problem, this is achieved by creating a new db for the processed data and then transferring it back to the original.

Parameters:
  • db_filename – The db filename to which the changes should be made. The new db is added as a table.
  • table_name – the name for the new v03 rinchi table.
  • v02_rinchi – The name of the v02 rinchi column. Defaults to False (No RInChI in db).
  • v02_rauxinfo – The name of the v02 rauxinfo column. Defaults to False (No rauxinfos in db).
  • v03_rinchi – The name of the v03 new rinchi column. Defaults to False (No rinchi column will be created).
  • v03_rauxinfo – The name of the v03 new rauxinfo column. Defaults to False (No rauxinfo column will be created).
  • v03_longkey – The name of the v03 new longkey column. Defaults to False (No longkey column will be created).
  • v03_shortkey – The name of the v03 new shortkey column. Defaults to False (No shortkey column will be created).
  • v03_webkey – The name of the v03 new webkey column. Defaults to False (No webkey column will be created).
rinchi_tools.database.csv_to_sql(csv_name, db_filename, table_name)

Creates an SQL db from the values in a CSV file, or appends them to an existing db

Parameters:
  • csv_name – The CSV filename
  • db_filename – The SQLite3 db
  • table_name – The name of the table to create or append
rinchi_tools.database.gen_rauxinfo(db_filename, table_name)

Updates a table in a db to give rauxinfos where the column is null

Parameters:
  • db_filename – Database filename
  • table_name – name of table
rinchi_tools.database.rdf_to_sql(rdfile, db_filename, table_name, columns=None)

Creates or adds to an SQLite db the contents of a given RDFile.

Parameters:
  • rdfile – The RD file to add to the db
  • db_filename – The file name of the SQLite db
  • table_name – The name of the table to create or append
  • columns – The columns to add. If None, the default is [rinchi,rauxinfo,longkey,shortkey,webkey]
rinchi_tools.database.recall_fingerprints(lkey, db_filename, table_name)

Recall a fingerprint from the db

Parameters:
  • lkey – The long key to search for
  • db_filename – The db filename
  • table_name – The table name which stores the fingerprints
Returns:

A numpy array containing the reaction fingerprint as stored in the reaction db

rinchi_tools.database.search_for_roles(db, table_name, reactant_subs=None, product_subs=None, agent_subs=None, limit=200)

Searches for reactions containing the given substructures in particular roles

rinchi_tools.database.search_for_roles_advanced(db, table_name, reactant_subs=None, product_subs=None, agent_subs=None, changing_subs=None, exclusive=False, unique=True, limit=200)

Searches for reactions with a particular functionality

rinchi_tools.database.search_master(search_term, db=None, table_name=None, is_sql_db=False, hyb=None, val=None, rings=None, formula=None, reactant=False, product=False, agent=False, number=1000, keytype=None, ring_type=None, isotopic=None)

Search for a string within a RInChI database. Includes all options.

Parameters:
  • ring_type
  • isotopic
  • db
  • is_sql_db
  • number – Maximum number of initial results
  • search_term – The term to search for
  • table_name – the table to search in
  • reactant – Search for InChIs in the reactants
  • product – Search for InChIs in the products
  • agent – Search for InChIs in the agents
  • keytype – The type of key to look for. If not found, then the function will check if the search term is a key, and try to parse the key regardless. Otherwise, it looks in the RInChIs.
  • All args following are dicts of the format {property: count, property2: count2, ...}
  • hyb – The hybridisation changes(s) desired
  • val – The valence change(s) desired
  • rings – The ring change(s) desired
  • formula – The formula change(s) desired
Returns:

A dictionary of lists where an inchi was found

rinchi_tools.database.search_rinchis(search_term, db=None, table_name=None, is_sql_db=False, hyb=None, val=None, rings=None, formula=None, ringelements=None, isotopic=None, reactant=False, product=False, agent=False, number=1000)

Search for an InChI within a RInChI database. Includes all options.

Parameters:
  • db
  • is_sql_db
  • number
  • search_term – The term to search for
  • table_name – the table to search in
  • All args following are dicts of the format {property: count, property2: count2, ...}
  • hyb – The hybridisation changes(s) desired
  • val – The valence change(s) desired
  • rings – The ring change(s) desired
  • formula – The formula change(s) desired
  • reactant – Search for InChIs in the reactants
  • product – Search for InChIs in the products
  • agent – Search for InChIs in the agents
  • ringelements
  • isotopic
Returns:

A dictionary of lists where an inchi was found

rinchi_tools.database.sql_key_to_rinchi(key, db_filename, table_name, keytype='L', column=None)

Returns the RInChI matching the given RInChI key (the Long key by default) for a given database

Parameters:
  • key – The key to search for
  • db_filename – The database in which to search
  • table_name – The table in which to search for the key
  • keytype – The key type to search for. Defaults to the long key
  • column – Optional column to look for the key in.
Raises:

ValueError – The keytype argument must be one of “L” , “S” or “W”

Returns:

the corresponding RInChI
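A key lookup of this kind can be sketched with the standard sqlite3 module. This is an illustration under stated assumptions, not the library's code: the column names follow the default column list documented for rdf_to_sql, while the helper `lookup_rinchi`, the mapping from key type letter to column, and the table contents are hypothetical.

```python
import sqlite3

# Column names follow the default column list documented for rdf_to_sql;
# the mapping from keytype letter to column is an assumption.
KEY_COLUMNS = {"L": "longkey", "S": "shortkey", "W": "webkey"}


def lookup_rinchi(connection, table_name, key, keytype="L"):
    """Return the rinchi stored alongside `key`, or None if absent."""
    if keytype not in KEY_COLUMNS:
        raise ValueError('keytype must be one of "L", "S" or "W"')
    if not table_name.isidentifier():
        # Table names cannot be bound as SQL parameters, so validate them.
        raise ValueError("table_name must be a plain identifier")
    query = "SELECT rinchi FROM {} WHERE {} = ?".format(
        table_name, KEY_COLUMNS[keytype]
    )
    row = connection.execute(query, (key,)).fetchone()
    return row[0] if row else None


# In-memory database holding one placeholder row.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE rinchis (rinchi TEXT, longkey TEXT)")
conn.execute(
    "INSERT INTO rinchis VALUES (?, ?)", ("RINCHI_PLACEHOLDER", "KEY_PLACEHOLDER")
)
conn.commit()
found = lookup_rinchi(conn, "rinchis", "KEY_PLACEHOLDER")
```

The key value is always passed as a bound parameter; only the validated table and column names are formatted into the query text.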

rinchi_tools.database.update_fingerprints(db_filename, table_name, fingerprint_table_name)

NOT CURRENTLY WORKING. NEEDS UPDATING TO USE MULTITHREADING FOR USABLE PERFORMANCE

Calculates the reaction fingerprint as defined in the Reaction class, and stores it in the given db in a compressed form

Parameters:
  • db_filename – the db filename to update
  • table_name – The table containing the RInChIs
  • fingerprint_table_name – The table to contain the fingerprint

RInChI Substructure Matching Module

This module contains the matcher for matching molecules.

Modifications:

  • D.F. Hampshire 2017
class rinchi_tools.matcher.Backup(matcher_object)

Bases: object

Stores the backed up mappings

backup()

Backs up this iteration of the mapping

depth()

The depth of the iterations

restore()

Restores the previous mapping in the event of a failed mapping

class rinchi_tools.matcher.Matcher(sub, master)

Bases: object

Implementation of the VF2 algorithm for matching one graph as a subgraph of another.

Uses the python set implementation widely for best performance.

bonds_compatible(mapping)

Checks if the bonds to the atoms in the mapping are compatible

count_compatable(mapping)

Checks that the terminal sets as computed from the mapping have the appropriate bond counts.

Also sets terminal sets for next iteration to avoid unnecessary repeated computation.

gen_possible_mappings()

The function P(s) which generates the mappings to be tested for the particular current mapping M(s)

gen_test_state(mapping)

Generates a test state for testing criteria.

get_backup_mappings()

Get the trial mappings of the atoms in the event that no terminal mappings are found.

The inclusion of the min is fundamental to quick execution of the script

static get_terminal_atoms(atoms_mapped_set, molecule)

Gets the set of atoms in a molecule that are not in the current mapping but are branches of the current mapping

get_terminal_mappings()

Gets the mappings based on terminal atoms

is_compatible(mapping)
Checks if:
  1. The atom mapping has the correct atom
  2. Checks that other things

Returns a list of compatible atoms.

is_covering()

Checks if all the atoms are mapped from the sublist in the lists

is_sub()

Returns True if a subgraph of G1 is isomorphic to G2.

master_to_sub(index)

Converts a master graph index to the sub graph index

match()

Extends the isomorphism mapping, and acts as the iterating function in the VF2 algorithm.

This function is called recursively to determine if a complete isomorphism can be found between sub and master. It cleans up the class variables after each recursive call. If an isomorphism is found, we return the mapping.
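The recursive extend-and-backtrack idea can be illustrated with a much simpler matcher. The sketch below is not VF2 and not the Matcher class: it omits the terminal-set pruning described above, and the function name and the dict-of-sets graph representation are assumptions made for the example.

```python
def find_subgraph_mapping(sub, master, mapping=None):
    """Return a dict mapping each sub node to a distinct master node such that
    every sub edge is also a master edge, or None if no such mapping exists.

    Both graphs are dicts of node -> set of neighbouring nodes.
    """
    if mapping is None:
        mapping = {}
    if len(mapping) == len(sub):
        return dict(mapping)  # every sub node is mapped: covering state
    # Pick the next unmapped sub node (deterministic order for repeatability).
    node = next(n for n in sorted(sub) if n not in mapping)
    used = set(mapping.values())
    for candidate in sorted(master):
        if candidate in used:
            continue
        # Feasibility: each already-mapped neighbour of `node` must map to a
        # neighbour of `candidate` in the master graph.
        if all(mapping[nb] in master[candidate] for nb in sub[node] if nb in mapping):
            mapping[node] = candidate
            result = find_subgraph_mapping(sub, master, mapping)
            if result is not None:
                return result
            del mapping[node]  # backtrack, restoring the previous state
    return None


# A three-node chain is a subgraph of a four-node ring; a triangle is not.
chain = {1: {2}, 2: {1, 3}, 3: {2}}
ring = {1: {2, 4}, 2: {1, 3}, 3: {2, 4}, 4: {1, 3}}
triangle = {1: {2, 3}, 2: {1, 3}, 3: {1, 2}}
chain_mapping = find_subgraph_mapping(chain, ring)
triangle_mapping = find_subgraph_mapping(triangle, ring)
```

A real substructure matcher would also compare element and bond types at each step, which this sketch leaves out.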

new_state(mapping)

Generates a new state from a mapping

sub_count()

The number of unique matches found in the molecule

sub_count_unique()
sub_to_master(index)

Converts a sub graph index to the master index

RInChI Object Orientated Molecule Class Module

This module contains the Molecule class and associated functions

Modifications:

  • Hammond 2014
  • D.F. Hampshire 2017

    Significant restructuring of the class to gain more consistent and less verbose code.

class rinchi_tools.molecule.Molecule(inchi)

Bases: object

A class containing a molecule as defined by an inchi. Contains functions for generating edge lists and node edge tables describing molecular graphs, and functions that use molecular graphs to calculate information about the molecules - ring sizes, atom hybridisation, contained functional groups etc.

Get the shortest path between the start and finish nodes

Adapted from http://eddmann.com/posts/depth-first-search-and-breadth-first-search-in-python/, accessed 06/11/2014

Parameters:
  • graph – an unweighted, undirected vertex-edge graph as a list
  • start – the starting node
  • finish – the finishing node as
Returns:

The shortest path as a list
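A breadth-first search for the shortest path can be sketched as follows. The graph is taken here to be a dict mapping each node to its neighbours, which is an assumption; the library's own graph representation and function name may differ.

```python
from collections import deque


def bfs_shortest_path(graph, start, finish):
    """Return one shortest path from start to finish as a list of nodes,
    or None if finish is unreachable.

    `graph` maps node -> iterable of neighbours (unweighted, undirected).
    """
    queue = deque([[start]])
    visited = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == finish:
            return path
        for neighbour in graph[node]:
            if neighbour not in visited:
                visited.add(neighbour)
                queue.append(path + [neighbour])
    return None


# Six-membered ring (nodes 1-6) with a substituent (node 7) on node 1.
graph = {1: [2, 6, 7], 2: [1, 3], 3: [2, 4], 4: [3, 5], 5: [4, 6], 6: [5, 1], 7: [1]}
path = bfs_shortest_path(graph, 7, 4)
```

Because nodes are explored in order of distance from the start, the first path that reaches the finish node is a shortest one.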

calculate_edges(edge_list=None)

Sets the node-edge graph as a dict.

Parameters: edge_list – A molecular graph as a list of edges. If no list is passed, the function sets the atoms for its own instance.
calculate_rings()

Sets the ring count property which contains the ring sizes in the format { ring size : number of rings present, ...}

calculate_rings_by_atoms()

Count the rings by atom list, e.g. “CCCCCN” will return the number of pyridine fragments in the molecule.

Returns: The number of rings
chemical_formula_to_dict()

Get the chemical formula as a dict

Returns: A dict with elements as keys and number of atoms as value
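Turning a formula string into element counts can be sketched with a regular expression. This is an illustration only, assuming a flat formula without brackets, charges or leading multipliers; the function name `formula_to_counts` is hypothetical.

```python
import re
from collections import Counter


def formula_to_counts(formula):
    """Parse a flat formula such as "C2H6O" into a Counter of element counts.

    Brackets, charges and leading multipliers are not handled.
    """
    counts = Counter()
    # An element symbol is one capital letter plus an optional lower-case
    # letter, followed by an optional count (a missing count means 1).
    for element, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] += int(number) if number else 1
    return counts


ethanol = formula_to_counts("C2H6O")
chloroethane = formula_to_counts("C2H5Cl")
```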
static composite_to_simple(inchi)

Splits an inchi with multiple disconnected components into a list of connected inchis

Modified 2017 by D. Hampshire to split the formula of multiple identical components; cf. http://www.inchi-trust.org/technical-faq/#5.6

Parameters: inchi – An inchi (usually composite)
Returns: A list of simple inchis within the composite inchi argument
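The formula part of this split can be sketched on its own. The example assumes a formula layer in which components are separated by "." and identical components carry a leading multiplier, as in "2CH4.H2O"; the function name is hypothetical, and the other InChI layers, which need matching treatment, are not covered.

```python
import re


def split_composite_formula(formula_layer):
    """Expand a composite formula layer into one formula per component,
    e.g. "2CH4.H2O" becomes ["CH4", "CH4", "H2O"]."""
    components = []
    for part in formula_layer.split("."):
        # A leading integer, if present, is the number of identical components.
        multiplier, formula = re.match(r"(\d*)(.*)", part).groups()
        components.extend([formula] * (int(multiplier) if multiplier else 1))
    return components


parts = split_composite_formula("2CH4.H2O")
```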
count_centres(wd=False, sp2=True, sp3=True)

Counts the stereocentres contained within an inchi

Parameters:
  • wd – Whether or not the stereocentre must be well-defined to be counted.
  • sp2 – Count sp2 centres
  • sp3 – Count sp3 centres
Returns:

stereocentres – The number of stereocentres
stereo_mols – The number of molecules with stereocentres

count_rings()

Count the number of rings in an InChI.

Returns: ring_count – The number of rings in the InChI.
count_sp2(wd=False)

Count the number of sp2 stereocentres.

Parameters: wd – Whether or not the stereocentre must be well-defined to be counted.
Returns: sp2_centre_count – The number of sp2 stereocentres in the structure.
count_sp3(wd=False, enantio=False)

Count the number of sp3 stereocentres in a molecule.

Parameters:
  • wd – Whether or not the stereocentre must be well-defined to be counted.
  • enantio – Whether or not the structure must be enantiopure to be counted.
Returns:

The number of sp3 stereocentres in the structure.

Performs a DFS over the molecular graph of a given Molecule object, returning a list of edges that form a spanning tree (tree edges), and a list of the edges that would cyclise this spanning tree (back edges)

The number of back edges returned is equal to the number of rings that can be described in the molecule

Parameters: start – Set which atom should be the starting node
Returns: tree_edges – A list of tree edges. back_edges – A list of back edges. The list length is equal to the smallest number of cycles that can describe the cycle space of the molecular graph.
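The split into tree edges and back edges can be sketched with an iterative depth-first search. The function name and the dict-based graph are assumptions made for this example; it is not the library's implementation.

```python
def dfs_tree_and_back_edges(graph, start):
    """Split the edges reachable from `start` into tree edges and back edges.

    `graph` maps node -> iterable of neighbours (undirected).
    """
    visited = {start}
    seen_edges = set()
    tree_edges, back_edges = [], []
    # Each stack entry keeps a node and an iterator over its neighbours, so
    # the scan resumes where it left off after returning from a child.
    stack = [(start, iter(sorted(graph[start])))]
    while stack:
        node, neighbours = stack[-1]
        advanced = False
        for nb in neighbours:
            edge = frozenset((node, nb))
            if edge in seen_edges:
                continue
            seen_edges.add(edge)
            if nb in visited:
                back_edges.append((node, nb))  # closes a cycle
            else:
                visited.add(nb)
                tree_edges.append((node, nb))
                stack.append((nb, iter(sorted(graph[nb]))))
                advanced = True
                break
        if not advanced:
            stack.pop()
    return tree_edges, back_edges


# Two four-membered rings sharing the 3-4 edge: 6 nodes, 7 edges.
fused = {1: [2, 4], 2: [1, 3], 3: [2, 4, 6], 4: [1, 3, 5], 5: [4, 6], 6: [3, 5]}
tree_edges, back_edges = dfs_tree_and_back_edges(fused, 1)
```

For a connected graph the spanning tree has one edge fewer than there are nodes, so the two back edges found here correspond to the two independent rings.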
static edge_list_to_atoms_spanned(edge_list)

Takes an edge list and returns a list of atoms spanned

Parameters: edge_list – An edge list
Returns: A list of all the keys for the atoms which are spanned by the edge list.
edge_list_to_vector(subset)

Converts an edge list to a vector in the {0, 1}^N vector space spanned by the edges of the molecule

Parameters: subset – The vector subset to use
Returns: The vector stored as a list.
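The edge-vector encoding can be illustrated with two small helpers. The names `edges_to_bits` and `bits_to_edges` are hypothetical, and unlike the method above they take the full edge list of the molecule as an explicit argument.

```python
def edges_to_bits(all_edges, subset):
    """Return a 0/1 list with one entry per edge in `all_edges`,
    set to 1 where that edge occurs in `subset` (in either orientation)."""
    chosen = {frozenset(edge) for edge in subset}
    return [1 if frozenset(edge) in chosen else 0 for edge in all_edges]


def bits_to_edges(all_edges, vector):
    """Inverse operation: pick out the edges whose entry is non-zero."""
    return [edge for edge, bit in zip(all_edges, vector) if bit]


molecule_edges = [(1, 2), (2, 3), (3, 1), (3, 4)]
ring = [(2, 1), (3, 2), (1, 3)]  # the three-membered ring, opposite orientation
ring_vector = edges_to_bits(molecule_edges, ring)
ring_edges = bits_to_edges(molecule_edges, ring_vector)
```

In this encoding, adding two cycles modulo 2 (a bitwise XOR of their vectors) gives the edges that belong to exactly one of them.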
static edges_to_atoms(ls)

Sets the node-edge graph as a dict.

Parameters: ls – A molecular graph as a list of edges. If no list is passed, the function sets the atoms for its own instance.
find_initial_ring_set()

For every edge in the molecule, find the smallest ring it is a part of and add it to a list. NEEDS REIMPLEMENTATION.

Returns: A list of all minimal rings, sorted by the number of edges they contain
find_initial_ring_set_trial()

For every edge in the molecule, find the smallest ring it is a part of and add it to a list. TRIAL REIMPLEMENTATION, NOT YET WORKING.

Returns: A list of all minimal rings, sorted by the number of edges they contain
find_linearly_independent(cycles)

Given a list of candidate cycles, sorted by size, this function attempts to find the smallest, linearly independent basis of cycles that spans the entire cycle space of the molecular graph - the Minimum Cycle Basis.

Parameters: cycles – A list of candidate cycles sorted by size
Returns: None
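Selecting linearly independent cycles amounts to Gaussian elimination over GF(2) on the edge vectors. The sketch below shows that idea under stated assumptions: cycles are given as equal-length, non-empty 0/1 lists already sorted by size, the function name is hypothetical, and unlike the method above it returns the selected vectors.

```python
def select_independent_cycles(cycle_vectors):
    """Greedily keep each cycle vector that is linearly independent, over
    GF(2), of the vectors kept before it.

    Vectors are equal-length 0/1 lists, assumed to be pre-sorted by cycle size.
    """
    basis = {}  # leading bit position -> reduced vector stored as an int
    selected = []
    for vector in cycle_vectors:
        value = int("".join(map(str, vector)), 2)
        # Reduce by XOR-ing with the basis vector that shares the leading bit.
        while value:
            pivot = value.bit_length() - 1
            if pivot not in basis:
                basis[pivot] = value
                selected.append(vector)
                break
            value ^= basis[pivot]
        # If value reached zero, the vector was a combination of earlier ones.
    return selected


a = [1, 1, 0]
b = [0, 1, 1]
c = [1, 0, 1]  # equals a XOR b, so it adds nothing new
independent = select_independent_cycles([a, b, c])
```

Processing the candidates in order of increasing size is what steers the greedy selection towards small rings.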
find_rings_from_back_edges()

Accepts output from the depth_first_search algorithm, returns a list of all rings within the molecule.

Will NOT find a minimum cycle basis, but can be used to find an initial cycle set when performing the Horton Algorithm (see elsewhere)

find_shortest_path(graph, start, end, path=None)

Recursively iterates over the entire molecular graph, yielding the shortest path between two points

Adapted from https://www.python.org/doc/essays/graphs/, accessed 15/10/2014

Parameters:
  • graph – an unweighted, undirected vertex-edge graph as a list
  • start – the starting node as a number
  • end – the finishing node as a number
  • path – latest iteration of the path
Returns:

The shortest path as a list of indices

generate_edge_list()

Takes the connective layer of an inchi and returns the molecular graph as an edge list, parsing it directly using re.

Returns: edges – A list containing the edges of the molecular graph
get_formula()

Get chemical empirical formula

Returns: The chemical formula stored as a Counter
get_hybrid_count()

Calculate the hybridisation of each atom

Returns: A Counter object containing the hybridisation of the atoms
get_ring_count()

Get the ring count

Returns: A Counter object containing the number of rings of each size
get_ring_count_inc_elements()

Count the rings of a molecule. Result includes the elements of the ring.

Returns: A Counter containing the number of rings of each size and the elements contained by a ring
get_valence_count()

Calculates the valences of each atom in the Molecule

Returns: A Counter object containing the valences of the atoms
has_isotopic_layer()

Does the molecule inchi have an isotopic layer?

Returns: A boolean value
inchi_to_chemical_formula()

Converts an InChI to a chemical formula

Returns: The chemical formula of the Molecule as a string
inchi_to_layer(l)

Get a particular layer of the InChI

Parameters: l – The layer of the InChI to retrieve
Returns: The InChI layer desired
initialize()

Initialises the molecule

static new(inchi)

Creates a list of new Molecule objects. Safer than Molecule() due to composite InChI implications.

Parameters: inchi – An InChI string
Returns: A list of Molecule objects.
static path_to_cycle_edge_list(path)

Converts a cycle described by an ordered list of nodes to an edge list

Parameters: path – The path of the cycle stored as an ordered list
Returns: The edge list
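A minimal sketch of this conversion, assuming the path lists each ring node exactly once (without repeating the first node at the end); the function name is hypothetical.

```python
def path_to_cycle_edges(path):
    """Turn an ordered node list [a, b, c] into [(a, b), (b, c), (c, a)]."""
    # The modulo wraps the last node back round to the first, closing the ring.
    return [(path[i], path[(i + 1) % len(path)]) for i in range(len(path))]


cycle_edges = path_to_cycle_edges([1, 2, 3])
```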
set_atomic_hydrogen()

Takes the molecular graph and the inchi, and sets the number of protons attached to each atom.

Requires initialised atoms.

set_atoms()

Sets the atom objects, each an instance of the Atom class, with their appropriate indexes and elements.

vector_to_edge_list(vector)

Takes an edge vector and returns an edge list

Parameters: vector – An edge vector stored in an iterable
Returns: The edge list

RInChI Object Orientated Reaction Class Module

This module contains the Reaction class and associated functions

Modifications:

  • Hammond 2014
  • D.F. Hampshire 2017

    Significant restructuring of the class to gain more consistent and less verbose code.

class rinchi_tools.reaction.Reaction(rinchi)

Bases: object

This class defines a reaction, as defined by a RInChI. Molecule objects are created from all component InChIs, and the member functions of the class can be used to analyse various parameters that may be changing across the reaction

calculate_reaction_fingerprint(fingerprint_size=1024)

Calculates a reaction fingerprint for a given reaction. Uses a 1024 bit fingerprint by default

Method of Daniel M. Lowe (2015)

This function generates fingerprints for individual molecules using obabel. It could easily be modified to use other software packages, e.g. RDKit, if desired.

Parameters: fingerprint_size – The length of the fingerprint to be generated.
change_across_reaction(func, *args)

Calculates the total change in a parameter across the reaction, using a Molecule class function, and returns a Python Counter object

Parameters:
  • func – The class function to calculate the parameter, which returns a Counter object
  • args – Args if required for the function
Returns:

the change in the parameter
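The bookkeeping can be sketched with collections.Counter. This is an illustration with hypothetical data and a hypothetical function name, not the method's actual code. It uses Counter.subtract rather than the `-` operator, because `-` silently discards negative results and a decrease across the reaction must be kept.

```python
from collections import Counter


def change_across(reactant_counts, product_counts):
    """Sum the per-molecule Counters on each side and return products minus
    reactants, keeping negative entries and dropping zero entries."""
    total = Counter()
    for counts in product_counts:
        total.update(counts)
    for counts in reactant_counts:
        total.subtract(counts)
    return Counter({key: value for key, value in total.items() if value != 0})


# Hypothetical hybridisation counts for one reactant and one product.
reactants = [Counter({"sp2": 2, "sp3": 1})]
products = [Counter({"sp3": 3})]
change = change_across(reactants, products)
```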

detect_reaction(hyb_i=None, val_i=None, rings_i=None, formula_i=None, isotopic=False, ring_elements=None)

Detect if a reaction satisfies certain conditions. Allows searching for reactions based on ring changes, valence changes, formula changes, hybridisation of C atom changes.

Parameters:
  • All args are dicts of the format {property: count, property2: count2, ...}
  • hyb_i – The hybridisation change(s) desired
  • val_i – The valence change(s) desired
  • rings_i – The ring change(s) desired
  • formula_i – The formula change(s) desired
  • isotopic – Whether to look for reactions involving an isotopic InChI
  • ring_elements – Look for a ring in the reaction
Returns:

True if the given reaction satisfies all the conditions, otherwise False.

generate_svg_image(outname)

Outputs the reactants, products, and agents as SVG files in the current directory with the given filename

Parameters: outname – The name of the file to output the SVG image
has_isotopic_inchi()
has_ring(ring)
has_substructures(reactant_subs=None, product_subs=None, agent_subs=None, exclusive=True, rct_disappears=True, pdt_appears=True)

Detects whether the reaction contains the given substructures

Parameters:
  • reactant_subs – List of reactant inchis
  • product_subs – List of product inchis
  • agent_subs – List of agent inchis
  • exclusive – Match one functionality per molecule of reactant
  • rct_disappears – Only match if substructures not in products
  • pdt_appears – Only match if substructures not in reactants
Returns:

Boolean, whether the substructures are contained

has_substructures_by_populations(reactant_subs=None, product_subs=None, agent_subs=None, changing_subs=None, exclusive=False, unique=True)

Detects whether the reaction contains the given substructures, taking their populations into account

Parameters:
  • reactant_subs – Dictionary of reactant inchis and their populations in the layer
  • product_subs – Dictionary of product inchis and their populations in the layer
  • agent_subs – Dictionary of agent inchis and their populations in the layer
  • changing_subs – Dictionary of inchi changes in populations
  • exclusive – Match one functionality per molecule of reactant
  • unique – Prevent matching the same atoms
Returns:

Boolean, whether the substructures are contained

is_agent(inchi)

Determine whether the reaction is catalytic in a particular chemical

Parameters: inchi – An InChI string specifying a molecule
Returns: True or False (Boolean)
is_balanced()

Determine if a reaction is balanced

Returns: True if balanced, False otherwise.
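How the library decides balance is not documented here. One plausible check, shown purely as an assumption, compares the total element counts on each side of the reaction; the function name and the explicit listing of each molecule once per occurrence are choices made for this sketch.

```python
from collections import Counter


def element_counts_balance(reactant_formulae, product_formulae):
    """Compare total element counts on both sides of a reaction.

    Each formula is a dict or Counter of element -> atom count, listed once
    per molecule taking part.
    """
    left, right = Counter(), Counter()
    for formula in reactant_formulae:
        left.update(formula)
    for formula in product_formulae:
        right.update(formula)
    return left == right


# CH4 + 2 O2 -> CO2 + 2 H2O, with each molecule listed explicitly.
methane, oxygen = {"C": 1, "H": 4}, {"O": 2}
carbon_dioxide, water = {"C": 1, "O": 2}, {"H": 2, "O": 1}
balanced = element_counts_balance(
    [methane, oxygen, oxygen], [carbon_dioxide, water, water]
)
unbalanced = element_counts_balance(
    [methane, oxygen, oxygen], [carbon_dioxide, water]
)
```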
longkey()

Set longkey if not already set, then return longkey

static present_in_layer(layer, inchi)

Checks if an InChI is present in a layer

Parameters:
  • layer – A reaction layer
  • inchi – an Inchi
Returns:

Returns the RInChI if the inchi is present, otherwise returns None.

present_in_reaction(func)

Tests if a molecule is present in the reaction

Parameters: func – A function of a Molecule object that returns True if a given condition is satisfied
Returns: If the function returns True for any InChI, the parent RInChI is returned
ring_change()

Determine how the number of rings changes in a reaction. Old method

Returns: A Counter containing the changes across the reaction.
shortkey()

Set shortkey if not already set, then return shortkey

stereo_change(wd=False, sp2=True, sp3=True)

Determine whether a reaction creates or destroys stereochemistry. Old method.

Parameters:
  • wd – Whether only well-defined stereocentres count.
  • sp2 – Whether to count sp2 stereocentres.
  • sp3 – Whether to count sp3 stereocentres.
Returns:

The number of stereocentres created by a reaction stored as a value in a dictionary

webkey()

Set webkey if not already set, then return webkey

RInChI C Library Interface Module

This module provides functions defining how RInChIs and RAuxInfos are constructed from InChIs and reaction data. It also interfaces with the RInChI v1.00 software as provided by the InChI trust.

This file is based on that provided with the official v1.00 RInChI software release, but with modifications to ensure Python 3 compatibility. Documentation was adapted from the official v1.00 release document.

Modifications:

  • D.F. Hampshire 2017
class rinchi_tools.rinchi_lib.RInChI(lib_path='/home/dh493/Documents/rinchi03-extended/rinchi_tools/libs/librinchi.so.1.0.0')

Bases: object

The RInChI class interfaces the C class in the librinchi library

file_text_from_rinchi(rinchi_string, rinchi_auxinfo, output_format)

Reconstructs (or attempts to reconstruct) RD or RXN file from RInChI string and RAuxInfo

Parameters:
  • rinchi_string – The RInChI string to convert
  • rinchi_auxinfo – The RAuxInfo to convert (optional, recommended)
  • output_format – “RD” or “RXN”
Returns:

The text block for the file

inchis_from_rinchi(rinchi_string, rinchi_auxinfo='')

Splits an RInChI string and optional RAuxInfo into components.

Parameters:
  • rinchi_string – A RInChI string
  • rinchi_auxinfo – RAuxInfo string. May be blank but may not be NULL.
Raises:

Exception – RInChI format related errors

Returns:

A dictionary of the data returned. The structure is as below:

{‘Direction’: [direction character],
‘No-Structures’: [list of no-structures],
‘Reactants’: [list of inchis & auxinfos],
‘Products’: [list of inchis & auxinfos],
‘Agents’: [list of inchis & auxinfos]}

Each Reactant, Product, and Agent list contains a set of (InChI, AuxInfo) tuples. The No-Structures list contains No-Structure counts for Reactants, Products, and Agents.

rinchi_errorcheck(return_code)

Specifies Python error handling behavior

Parameters: return_code – The return code from the C library
rinchi_from_file_text(input_format, rxnfile_data, force_equilibrium=False)

Generates RInChI string and RAuxInfo from supplied RD or RXN file text.

Parameters:
  • input_format – “AUTO”, “RD” or “RXN” (with “AUTO” as default value)
  • rxnfile_data – text block of RD or RXN file data
  • force_equilibrium (bool) – Force interpretation of reaction as equilibrium reaction
Returns:

tuple pair of the RInChI and RAuxInfo generated

rinchikey_from_file_text(input_format, file_text, key_type, force_equilibrium=False)

Generates RInChI key of supplied RD or RXN file text.

Parameters:
  • input_format – “RD” or “RXN”
  • file_text – text block of RD or RXN file data
  • key_type – 1 letter controlling the type of key generated: “L” for the Long-RInChIKey, “S” for the Short key (Short-RInChIKey), “W” for the Web key (Web-RInChIKey)
  • force_equilibrium (bool) – Force interpretation of reaction as equilibrium reaction
Returns:

a RInChIKey

rinchikey_from_rinchi(rinchi_string, key_type)

Generates RInChI key of the supplied RInChI string.

Parameters:
  • rinchi_string – A RInChI string
  • key_type – 1 letter controlling the type of key generated with “L” for the Long-RInChIKey, “S” for the Short key (Short-RInChIKey), “W” for the Web key (Web-RInChIKey)
Returns:

the RInChIKey

class rinchi_tools.rinchi_lib.StringHandler

Bases: object

Enables seamless use with Python 3 by converting to ASCII within the argument objects

classmethod from_param(value)

Performs the conversion
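As an illustration of the general pattern, ctypes calls a from_param class method on each entry of a foreign function's argtypes before passing the argument to C. The sketch below assumes that is the mechanism in play; AsciiParam is a hypothetical stand-in and not the library's StringHandler.

```python
class AsciiParam:
    """Hypothetical argument converter for use in a ctypes argtypes list."""

    @classmethod
    def from_param(cls, value):
        # Bytes are already acceptable to C functions taking char pointers.
        if isinstance(value, bytes):
            return value
        # Python 3 str objects are encoded to ASCII bytes first.
        return value.encode("ascii")


converted = AsciiParam.from_param("RInChI=1.00.1S/")
```

A converter like this would typically be listed in a function's argtypes, so callers can pass ordinary Python 3 strings without encoding them by hand.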

RInChI Tools Module

This module provides functions defining how RInChIs and RAuxInfos are constructed from InChIs and reaction data. It also interfaces with the RInChI v1.00 software as provided by the InChI Trust.

Modifications:

  • C.H.G. Allen 2012
  • D.F. Hampshire 2016
rinchi_tools.tools.add(rinchis)

Combines a list of RInChIs into one combined RInChI.

N.B. As stoichiometry is not represented in the input, this is an approximate addition.

Substances from RInChIs are sorted into one of four “pots”:

  • “Used” contains substances which have acted as a reagent, and have not yet been created again as a product.

  • “Made” contains substances which have been created as a product of a step, and have yet to be used again.

  • “Present” contains substances which have been present during a step, but have not yet been used up, or substances which have been used as a reagent, and later regenerated as a product.

  • “Intermediates” contains substances which have been created as a product, and later used as a reagent.

Each RInChI is considered in turn:

The reactants are considered:
  • If novel, add to “used”.
  • If in “used”, remain in “used”.
  • If in “made”, move to “intermediates”.
  • If in “present”, move to “used”.
  • If in “intermediates”, remain in “intermediates”.
The products are considered:
  • If novel, add to “made”.
  • If in “used”, move to “present”.
  • If in “made”, remain in “made”.
  • If in “present”, remain in “present”.
  • If in “intermediates”, move to “made”.
The extras are considered:
  • If novel, add to “present”.
The pots are then emptied into the following output receptacles:
  • “Used” -> LHS InChIs
  • “Made” -> RHS InChIs
  • “Present” -> BHS InChIs
  • “Intermediates” -> discarded

Finally, the RInChI is constructed in the usual way and returned.

Parameters:rinchis – A list of RInChIs, representing a sequence of reactions making up one overall process. The order of this list is important, as each RInChI is interpreted as a step in the overall process. They must also have a clearly defined direction.
Returns:A RInChI representing the overall process.
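The pot rules above can be sketched on plain substance labels. This is a simplified illustration, not the library's implementation: combine_steps is a hypothetical name, each step is given as a (reactants, products, extras) tuple of labels rather than as a RInChI, and building the final RInChI is left out.

```python
def combine_steps(steps):
    """Apply the pot rules to a sequence of (reactants, products, extras).

    Returns the (used, made, present) sets; intermediates are discarded.
    """
    used, made, present, intermediates = set(), set(), set(), set()

    for reactants, products, extras in steps:
        for substance in reactants:
            if substance in made:
                made.remove(substance)
                intermediates.add(substance)
            elif substance in present:
                present.remove(substance)
                used.add(substance)
            elif substance not in intermediates:
                used.add(substance)  # novel, or already in "used"

        for substance in products:
            if substance in used:
                used.remove(substance)
                present.add(substance)
            elif substance in intermediates:
                intermediates.remove(substance)
                made.add(substance)
            elif substance not in present:
                made.add(substance)  # novel, or already in "made"

        for substance in extras:
            pots = (used, made, present, intermediates)
            if not any(substance in pot for pot in pots):
                present.add(substance)  # only novel extras are recorded

    return used, made, present


# Two steps: A + cat -> B (in solvent), then B -> C + cat.
steps = [
    (["A", "cat"], ["B"], ["solvent"]),
    (["B"], ["C", "cat"], []),
]
used, made, present = combine_steps(steps)
```

In this example B ends up as a discarded intermediate, while the regenerated catalyst moves from “used” to “present”.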
rinchi_tools.tools.build_rauxinfo(l2_auxinfo, l3_auxinfo, l4_auxinfo)

Takes 3 sets of AuxInfos and converts them into a RAuxInfo. N.B. The order of the entries in each list is presumed to correspond to the order of the InChIs in the RInChI.

Parameters:
  • l2_auxinfo – List of layer 2 AuxInfos
  • l3_auxinfo – List of layer 3 AuxInfos
  • l4_auxinfo – List of layer 4 AuxInfos
Returns:

An RAuxInfo

rinchi_tools.tools.build_rinchi(l2_inchis=None, l3_inchis=None, l4_inchis=None, direction='', u_struct='')

Build a RInChI from the specified InChIs and reaction data.

RInChI Builder takes three groups of InChIs, and additional reaction data (currently limited to directionality information), and returns a RInChI.

The first three arguments are groups of InChIs saved as strings within an iterable (e.g. a list, set, tuple). Any or all of these may be omitted. All InChIs must be of the same version number. If a chemical which cannot be described by an InChI is desired within the RInChI, it should be added to the u_struct argument detailed below.

Parameters:
  • l2_inchis – Chemicals in the second layer of the RInChI
  • l3_inchis – Chemicals in the third layer of the RInChI
  • l4_inchis – Chemicals in the fourth layer of the RInChI. These are the substances present at both the start and the end of the reaction (e.g. catalysts, solvents), referred to simply as “agents”.
  • direction – This must be “+”, “-” or “=”. “+” means that the l2_inchis are the reactants, and the l3_inchis the products; “-” means the opposite; and “=” means the l2_inchis and l3_inchis are in equilibrium.
  • u_struct – Defines the number of unknown structures in each layer. This must be a list of the form [#2, #3, #4] where #2 is the number of unknown structures in layer 2, #3 is the number in layer 3, etc.
Returns:

The RInChI made from the input InChIs and reaction data.

Raises:

VersionError – The input InChIs are not of the same version.
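The requirement that all InChIs share one version can be checked before any building takes place. The sketch below is one way such a check might look; VersionError is redefined locally as a stand-in, and inchi_version and check_same_version are hypothetical helpers, not functions of rinchi_tools.

```python
class VersionError(Exception):
    """Local stand-in for the VersionError named in the documentation."""


def inchi_version(inchi):
    """Return the version token of an InChI, e.g. '1S' for 'InChI=1S/...'."""
    prefix, _, _rest = inchi.partition("/")
    if not prefix.startswith("InChI="):
        raise ValueError("not an InChI: %r" % inchi)
    return prefix[len("InChI="):]


def check_same_version(*inchi_groups):
    """Raise VersionError unless every InChI in every group shares a version.

    Returns the shared version token, or None when no InChIs were given
    (the None case is an assumption of this sketch).
    """
    versions = {inchi_version(i) for group in inchi_groups for i in group}
    if len(versions) > 1:
        raise VersionError("mixed InChI versions: %s" % sorted(versions))
    return versions.pop() if versions else None


shared = check_same_version(
    ["InChI=1S/CH4/h1H4"],  # layer 2
    ["InChI=1S/H2O/h1H2"],  # layer 3
)
```

Collecting the versions into a set makes the check independent of which layer a mismatched InChI appears in.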

rinchi_tools.tools.build_rinchi_rauxinfo(l2_input=None, l3_input=None, l4_input=None, direction='', u_struct='')

Build a RInChI and RAuxInfo from the specified InChIs and reaction data.

RInChI Builder takes three groups of InChIs, and additional reaction data, and returns a RInChI and RAuxInfo.

The first three arguments are (InChI, AuxInfo) pairs stored as tuples within an iterable (e.g. a list, set, tuple). Any or all of these may be omitted. All InChIs must be of the same version number. If a chemical which cannot be described by an InChI is desired within the RInChI, it should be added to the u_struct argument detailed below.

Parameters:
  • u_struct – Defines the number of unknown structures in each layer. This must be a list of the form [#2, #3, #4] where #2 is the number of unknown structures in layer 2, #3 is the number in layer 3, etc.
  • l2_input – Chemicals in the second layer of the RInChI
  • l3_input – Chemicals in the third layer of the RInChI
  • l4_input – Chemicals in the fourth layer of the RInChI. These are the substances present at both the start and the end of the reaction (e.g. catalysts, solvents), referred to simply as “agents”.
  • direction – This must be “+”, “-” or “=”. “+” means that the LHS are the reactants, and the RHS the products; “-” means the opposite; and “=” means the LHS and RHS are in equilibrium.
Returns:

The RInChI and RAuxInfo made from the input InChIs and reaction data.

Raises:

VersionError – The input InChIs are not of the same version.

rinchi_tools.tools.dedupe_rinchi(rinchi, rauxinfo='')

Removes duplicate InChI entries from the RInChI

Parameters:
  • rinchi – A RInChI string
  • rauxinfo – Optional RAuxInfo
Returns:

A RInChI and RAuxInfo tuple
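The core of deduplication can be sketched on a single layer of (InChI, AuxInfo) pairs. This is an illustration only: dedupe_pairs is a hypothetical helper, and keeping the first occurrence of each InChI is an assumption rather than documented behaviour of dedupe_rinchi, which works on whole RInChI strings.

```python
def dedupe_pairs(pairs):
    """Drop pairs whose InChI has already been seen, keeping first occurrences."""
    seen = set()
    unique = []
    for inchi, auxinfo in pairs:
        if inchi not in seen:
            seen.add(inchi)
            unique.append((inchi, auxinfo))
    return unique


layer = [
    ("InChI=1S/H2O/h1H2", "aux-1"),  # placeholder AuxInfo labels
    ("InChI=1S/CH4/h1H4", "aux-2"),
    ("InChI=1S/H2O/h1H2", "aux-3"),  # duplicate InChI, dropped
]
deduped = dedupe_pairs(layer)
```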

rinchi_tools.tools.generate_rauxinfo(rinchi)

Create RAuxInfo for a RInChI using the InChI conversion function.

Parameters:rinchi – The RInChI of which to create the RAuxInfo.
Returns:The RAuxInfo of the RInChI.
rinchi_tools.tools.inchi_2_auxinfo(inchi)

Run the InChI software on an InChI to generate AuxInfo.

The function saves the InChI to a temporary file, and runs the inchi-1 program on this tempfile as a subprocess. The AuxInfo will not include 2D coordinates, but an AuxInfo of some kind is required for the InChI software to convert an InChI to an SDFile.

Parameters:inchi – An InChI from which to generate AuxInfo.
Returns:The InChI’s AuxInfo (will not contain 2D coordinates).
rinchi_tools.tools.process_stats(rinchis, mostcommon=None)

Takes an iterable of RInChIs and collects statistics about them into counters.

Parameters:
  • rinchis – An iterable of RInChIs
  • mostcommon – Return only the most common items
Returns:

Dictionary of counters containing the information.

rinchi_tools.tools.remove_stereo(inchi)

Removes stereochemistry from an InChI

Parameters:inchi – an InChI as a string
Returns:an InChI
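One way to picture the operation: in an InChI the stereochemical information sits in the /b, /t, /m and /s layers, so dropping those layers leaves a stereo-free identifier. The sketch below assumes that simple layer filtering is sufficient; strip_stereo_layers is a hypothetical name and may differ from how remove_stereo is actually implemented.

```python
# Prefixes of the InChI layers that carry stereochemistry.
STEREO_LAYER_PREFIXES = ("b", "t", "m", "s")


def strip_stereo_layers(inchi):
    """Return the InChI with its /b, /t, /m and /s layers removed."""
    header, formula, *layers = inchi.split("/")
    kept = [
        layer for layer in layers
        if not layer.startswith(STEREO_LAYER_PREFIXES)
    ]
    # The header ('InChI=1S') and the formula layer are always kept.
    return "/".join([header, formula] + kept)


l_alanine = "InChI=1S/C3H7NO2/c1-2(4)3(5)6/h2H,4H2,1H3,(H,5,6)/t2-/m0/s1"
flat = strip_stereo_layers(l_alanine)
```

The formula layer is unpacked separately so that it is never mistaken for a stereo layer.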
rinchi_tools.tools.rinchi_to_dict_list(data)

Takes a text block or file object and parses it into a list of dictionaries of RInChI entries

Parameters:data – The text block or file object to parse
Returns:A list of dictionaries, one for each RInChI entry
rinchi_tools.tools.split_rinchi(rinchi)

Returns the inchis without RAuxInfo, each in lists, together with the direction character and the no_structs list

Parameters:rinchi – A RInChI String
Returns:A tuple containing
rct_inchis:
List of reactant inchis
pdt_inchis:
List of product inchis
agt_inchis:
List of agent inchis
direction:
The direction character
no_structs:
A list of the numbers of unknown structures in each layer
rinchi_tools.tools.split_rinchi_inc_auxinfo(rinchi, rinchi_auxinfo)

Returns the inchi and auxinfo pairs, each in lists, the direction character, and a list of unknown structures.

Parameters:
  • rinchi – A RInChI String
  • rinchi_auxinfo – The corresponding RAuxInfo
Returns:

A tuple containing

rct_inchis:

List of reactant inchi and auxinfo pairs

pdt_inchis:

List of product inchi and auxinfo pairs

agt_inchis:

List of agent inchi and auxinfo pairs

direction:

The direction character

no_structs:

A list of the numbers of unknown structures in each layer

rinchi_tools.tools.split_rinchi_only_auxinfo(rinchi, rinchi_auxinfo)

Returns the AuxInfos of the reactants, products and agents, each in lists

Parameters:
  • rinchi – A RInChI String
  • rinchi_auxinfo – The corresponding RAuxInfo
Returns:

A tuple containing

rct_inchis_auxinfo:

List of reactant AuxInfos

pdt_inchis_auxinfo:

List of product AuxInfos

agt_inchis_auxinfo:

List of agent AuxInfos

RInChI Utilities Module

This module provides functions that perform various tasks not specific to RInChIs.

Modifications:

  • D.F. Hampshire 2016
class rinchi_tools.utils.Hashable(val)

Bases: object

Make an object hashable for counting. Used to count counters

class rinchi_tools.utils.Spinner(delay=None)

Bases: object

A spinner which shows during a long process.

busy = False
delay = 0.1
static spinning_cursor()
start()

Starts the spinner

stop()

Stops the spinner

rinchi_tools.utils.call_command(args, debug=False)

Run a command as a subprocess and return the output

Parameters:
  • args – The command to execute as a string
  • debug – Debug the command
Returns:

The output of the command and the error code

rinchi_tools.utils.consolidate(items)

Check that all non-empty items in an iterable are identical

Parameters:items – the iterable
Raises:ValueError – Items are not all identical
Returns:the value of all the items in the list
Return type:value
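A minimal sketch of the behaviour described for consolidate, assuming that “non-empty” means truthy and that None is returned when every item is empty (both are assumptions); consolidate_sketch is a hypothetical name, not the library function.

```python
def consolidate_sketch(items):
    """Return the single value shared by all non-empty items.

    Raises ValueError if two non-empty items differ.
    """
    value = None
    found = False
    for item in items:
        if not item:
            continue  # empty items are ignored
        if not found:
            value, found = item, True
        elif item != value:
            raise ValueError("items are not all identical")
    return value


version = consolidate_sketch(["1S", "", "1S", None])
```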
rinchi_tools.utils.construct_output_text(data, header_order=False)

Turns a list of dicts, a dict, or a dict of lists into a single string of data

Parameters:
  • data – The data variable
  • header_order – Optional list of keys for the dictionaries. The list can contain non present keys.
Returns:

The output as a text block

rinchi_tools.utils.counter_to_print_string(counter, name)

Formats counter for printing

Parameters:
  • counter – The Counter object
  • name – Name of the data stored in the counter
rinchi_tools.utils.create_output_file(output_path, default_extension, create_out_dir=True)

Creates an output file

Parameters:
  • output_path – the path of the file to create
  • default_extension – the extension to use for the file
  • create_out_dir – Create an output directory
Returns:

A tuple containing a file object and the path of the file object.

rinchi_tools.utils.output(text, output_path=False, default_extension=False)

Simple output wrapper to print or write outputs.

Parameters:
  • text – text input
  • output_path – Specifies the filename for the output file
  • default_extension – Specifies the file extension to use if output_path does not include one
rinchi_tools.utils.read_input_file(input_path, filetype_check=False, return_file_object=False)

Reads an input path into a string

Parameters:
  • input_path – The path of the file to open
  • filetype_check – Check type of file
  • return_file_object – Return a file object instead of a string
Returns:

A multi-line string or a file object

rinchi_tools.utils.string_to_dict(string)

Converts a string of form ‘a=1,b=2,c=3’ to a dictionary of form {a:1,b:2,c:3}
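The conversion can be sketched as below. Keeping the values as strings and skipping empty segments are assumptions, since the documentation does not say how values are typed; parse_pairs is a hypothetical name, not the library function.

```python
def parse_pairs(string):
    """Convert 'a=1,b=2,c=3' into {'a': '1', 'b': '2', 'c': '3'}."""
    result = {}
    for pair in string.split(","):
        if not pair.strip():
            continue  # ignore empty segments such as a trailing comma
        key, _, value = pair.partition("=")
        result[key.strip()] = value.strip()
    return result


parsed = parse_pairs("a=1,b=2,c=3")
```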

Version 0.02 RInChIKey Generation Library Module

This module provides functions to create Long- and Short-RInChIKeys from RInChIs.

The supplied implementation of the inchi_2_inchikey function uses the InChIKey creation algorithm from OASA, a free python library for the manipulation of chemical formats, now stored permanently in the v02_inchi_key.py module.

Modifications:

  • C.H.G. Allen 2012

  • D.F. Hampshire 2016

    Modified for Python3 compatibility

rinchi_tools.v02_rinchi_key.rinchi_2_longkey(rinchi)

Create Long-RInChIKey from a RInChI.

Parameters:rinchi – The RInChI from which to create the Long-RInChIKey
Returns:The Long-RInChIKey of the RInChI
rinchi_tools.v02_rinchi_key.rinchi_2_shortkey(rinchi)

Create a Short-RInChIKey from a RInChI.

Parameters:rinchi – The RInChI from which to create the Short-RInChIKey
Returns:The Short-RInChIKey of the RInChI

RInChI v0.02 to 1.00 Conversion Module

Modifications:

  • D.F. Hampshire 2016
rinchi_tools.v02_tools.convert_all(rinchi, rauxinfo)

Convert a v0.02 RInChI & RAuxInfo into a v1.00 RInChI & RAuxInfo.

Parameters:
  • rinchi – A RInChI of version 0.02.
  • rauxinfo – A RAuxInfo of version 0.02.
Returns:

A tuple containing

rinchi:

A RInChI of version 1.00.

rauxinfo:

A RAuxInfo of version 1.00.

rinchi_tools.v02_tools.convert_rauxinfo(rauxinfo)

Convert a v0.02 RAuxInfo into a v1.00 RAuxInfo.

Parameters:rauxinfo – A RAuxInfo of version 0.02.
Returns:A RAuxInfo of version 1.00.
rinchi_tools.v02_tools.convert_rinchi(rinchi)

Convert a v0.02 RInChI into a v1.00 RInChI.

Parameters:rinchi – A RInChI of version 0.02.
Returns:A RInChI of version 1.00.
rinchi_tools.v02_tools.generate_rauxinfo(rinchi)

Create RAuxInfo for a RInChI using a conversion function.

Parameters:rinchi – The RInChI of which to create the RAuxInfo.
Returns:The RAuxInfo of the RInChI
Raises:VersionError – If the generated AuxInfos are not of the same version.