RInChI Extended Toolkit¶
This module contains functions additional to those officially distributed by the InChI Trust. It provides a range of tools and programs to manipulate RInChIs, which are concise, machine-readable reaction identifiers.
Licensed under the Apache License, Version 2.0 (the “License”); you may not use this file except in compliance with the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an “AS IS” BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.
Authors:
- C.H.G. Allen 2012
- N.A. Parker 2013
- Hammond 2014
- D.F. Hampshire 2016-17
RInChI Object Orientated Atom Class Module¶
This module contains the Atom class and associated functions
Modifications:
- Hammond 2014
- Hampshire 2017
Restructuring and changes as documented in Project Report
-
class
rinchi_tools.atom.
Atom
(index=None)¶ Bases:
object
A class containing a brief description of an atom, for use as nodes in a graph describing a molecule
-
get_attached_edges
()¶ Get the edges attached to this atom.
Returns: The edges attached to the atom.
-
get_hybridisation
()¶ Gets the atom hybridisation. Only defined for C atoms but still useful
Returns: None or a string signalling the hybridisation e.g. “sp2”
-
get_valence
()¶ Get the valence as determined by counting the number of bonds.
Returns: Number of bonds
-
RInChI Conversion Module¶
This module provides a variety of functions for the interconversion of RInChIs, Molfiles, RXNfiles and more.
Modifications:
C.H.G. Allen 2012
N.A. Parker 2013
Minor additional material added (specifically, .rxn to mol file agent conversion and subsequent amendments for agents in the .rxn to RInChI converter). Added support to the rxn2rinchi function for non-standard .rxn files containing reaction agents specified separately from the reactants and products.
- Hammond 2014
Extended support for non-standard .rxn files to the rdf parsing functions. Modified all .rxn handling functions so that they no longer discard reaction data in the $DTYPE/$DATUM format and instead optionally return it.
D.F. Hampshire 2016
Removed functions now included in the v1.00 source software (commands that interface with RInChI). Similar Python functionality can be found in the rinchi_lib.py interfacing file. Some functions are now modified to use this rinchi_lib.py interface. Major restructuring across the library means that functions have been extensively moved to and from elsewhere.
-
rinchi_tools.conversion.
create_csv_from_directory
(root_dir, outname, return_rauxinfo=False, return_longkey=False, return_shortkey=False, return_webkey=False)¶ Iterate recursively over all rdf files in the given folder and combine them into a single .csv database.
Parameters: - root_dir – The directory to search
- outname – Output file name parameter
- return_rauxinfo – Include RAuxInfo in the result
- return_longkey – Include Long key in the result
- return_shortkey – Include the Short key in the result
- return_webkey – Include the Web key in the result
Raises: IndexError
– File failed to be recognised for importing
-
rinchi_tools.conversion.
rdf_to_csv
(rdf, outfile='rinchi', return_rauxinfo=False, return_longkey=False, return_shortkey=False, return_webkey=False)¶ Convert an RD file to a CSV file containing RInChIs and other optional parameters
Parameters: - rdf – The RD file as a text block
- outfile – Optional output file name parameter
- return_rauxinfo – Include RAuxInfo in the result
- return_longkey – Include Long key in the result
- return_shortkey – Include the Short key in the result
- return_webkey – Include the Web key in the result
Returns: The name of the CSV file created with the requested fields
-
rinchi_tools.conversion.
rdf_to_csv_append
(rdf, csv_file, existing_keys=None)¶ Append an existing CSV file with values from an RD file
Parameters: - rdf – The RD file as a text block
- csv_file – the CSV file path
- existing_keys – The keys already existing in the CSV file
-
rinchi_tools.conversion.
rdf_to_rinchis
(rdf, start=0, stop=0, force_equilibrium=False, return_rauxinfos=False, return_longkeys=False, return_shortkeys=False, return_webkeys=False, return_rinchis=True, columns=None)¶ Convert an RDFile to a list of RInChIs.
Parameters: - rdf – The contents of an RDFile as a string.
- start – The index of the RXN entry within the RDFile at which to start converting. If set at default value (0), conversion begins from the first RXN entry.
- stop – The index of the RXN entry within the RDFile at which to stop converting. If set at default value (0), conversion does not stop until the end of the file is reached.
- force_equilibrium – Whether to set the direction flags explicitly to equilibrium
- return_rauxinfos – If True, generates and returns RAuxInfo for each generated RInChI.
- return_longkeys – If True, generates and returns Long-RInChIKeys for each generated RInChI.
- return_shortkeys – If True, generates and returns Short-RInChIKeys for each generated RInChI.
- return_webkeys – If True, generates and returns Web-RInChIKeys for each generated RInChI.
- return_rinchis – Return the rinchi. Defaults to True
- columns – the data to return may be given as list of headers instead.
Returns: List of dicts of reaction data as defined above. The data types are the keys for each dict
-
rinchi_tools.conversion.
rinchi_to_file
(data, rxnout=True)¶ Takes a file object or a multi-line string and returns a list of output file text blocks (RXN or RDF)
Parameters: - data – The string of a file input or a file object.
- rxnout – Return a reaction file. Otherwise, return an RD file
Returns: A list of RXN or RD file text blocks
-
rinchi_tools.conversion.
rinchis_to_keys
(data, longkey=False, shortkey=False, webkey=False, inc_rinchi=False, inc_rauxinfo=False)¶ Converts a list of rinchis in a flat file into a dictionary of RInChIs and keys
Parameters: - inc_rauxinfo – Include the RAuxInfo in the result
- data – The data string or file object to parse
- longkey – Whether to include the longkey
- shortkey – Whether to include the shortkey
- webkey – Whether to include the webkey
- inc_rinchi – Whether to include the original rinchi
Returns: A list of dictionaries containing the data produced, with the property name as the key, like so:
{‘rinchi’: [DATA], ‘rauxinfo’: [DATA], ... }
-
rinchi_tools.conversion.
rxn_to_rinchi
(rxn_text, ret_rauxinfo=False, longkey=False, shortkey=False, webkey=False, force_equilibrium=False)¶ Convert a RXN to a dictionary of calculated data.
Parameters: - rxn_text – The RXN text as a string
- ret_rauxinfo – Return RAuxInfo
- longkey – Return the Long Key
- shortkey – Return the Short Key
- webkey – Return the Web Key
- force_equilibrium – Force the output direction to be an equilibrium
Returns: A dictionary of data with the property name as the key, like so:
{‘rinchi’: [DATA], ‘rauxinfo’: [DATA], ... }
RInChI Database Module¶
Provides tools for converting, creating, and removing from SQL databases
Modifications:
- Hammond 2014
- Hampshire 2017
Python 3 restructuring and new function addition. Significantly modularised the existing code
-
rinchi_tools.database.
compare_fingerprints
(search_term, db_filename, table_name)¶ Search db for the top 10 closest matches to a RInChI by fingerprinting method. The results are sent to stdout.
Parameters: - search_term – A RInChI or Long-RInChIKey to search with
- db_filename – the db containing the fingerprints
- table_name – The table containing the RInChI fingerprints
-
rinchi_tools.database.
convert_v02_v03
(db_filename, table_name, v02_rinchi=False, v02_rauxinfo=False, v03_rinchi=False, v03_rauxinfo=False, v03_longkey=False, v03_shortkey=False, v03_webkey=False)¶ Converts a db of v02 rinchis into a db of v03 rinchis and associated information. N.B keys for v02 are not required as new keys must be generated for the db. Because of the nature of this problem, this is achieved by creating a new db for the processed data and then transferring back to the original
Parameters: - db_filename – The db filename to which the changes should be made. The new db is added as a table.
- table_name – the name for the new v03 rinchi table.
- v02_rinchi – The name of the v02 rinchi column. Defaults to False (No RInChI in db).
- v02_rauxinfo – The name of the v02 rauxinfo column. Defaults to False (No rauxinfos in db).
- v03_rinchi – The name of the v03 new rinchi column. Defaults to False (No rinchi column will be created).
- v03_rauxinfo – The name of the v03 new rauxinfo column. Defaults to False (No rauxinfo column will be created).
- v03_longkey – The name of the v03 new longkey column. Defaults to False (No longkey column will be created).
- v03_shortkey – The name of the v03 new shortkey column. Defaults to False (No shortkey column will be created).
- v03_webkey – The name of the v03 new webkey column. Defaults to False (No webkey column will be created).
-
rinchi_tools.database.
csv_to_sql
(csv_name, db_filename, table_name)¶ Creates or appends an SQL db with values from a CSV file
Parameters: - csv_name – The CSV filename
- db_filename – The SQLite3 db
- table_name – The name of the table to create or append
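The create-or-append behaviour described above can be illustrated with the standard library alone. This is a minimal sketch, not the module's implementation: the helper name csv_rows_to_table, the use of the CSV header row as column names, and the sample data are all assumptions made for illustration.

```python
import csv
import io
import sqlite3


def csv_rows_to_table(csv_file, connection, table_name):
    """Create table_name if it is missing, then append every CSV row to it.

    Hypothetical helper: assumes the first CSV row holds the column names.
    """
    reader = csv.reader(csv_file)
    header = next(reader)
    # Identifiers cannot be bound as SQL parameters, so quote them explicitly.
    columns = ", ".join('"{}"'.format(name.replace('"', '""')) for name in header)
    table = '"{}"'.format(table_name.replace('"', '""'))
    placeholders = ", ".join("?" for _ in header)
    connection.execute("CREATE TABLE IF NOT EXISTS {} ({})".format(table, columns))
    connection.executemany(
        "INSERT INTO {} VALUES ({})".format(table, placeholders), reader
    )
    connection.commit()


# Illustrative data only; these strings are placeholders, not real identifiers.
connection = sqlite3.connect(":memory:")
sample = io.StringIO("rinchi,longkey\nRInChI=example-1,Long-RInChIKey=example-1\n")
csv_rows_to_table(sample, connection, "rinchis")
rows = connection.execute('SELECT rinchi, longkey FROM "rinchis"').fetchall()
```

Calling the helper a second time with the same table name appends rows instead of recreating the table, because of CREATE TABLE IF NOT EXISTS.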
-
rinchi_tools.database.
gen_rauxinfo
(db_filename, table_name)¶ Updates a table in a db to give rauxinfos where the column is null
Parameters: - db_filename – Database filename
- table_name – name of table
-
rinchi_tools.database.
rdf_to_sql
(rdfile, db_filename, table_name, columns=None)¶ Creates or adds to an SQLite db the contents of a given RDFile.
Parameters: - rdfile – The RD file to add to the db
- db_filename – The file name of the SQLite db
- table_name – The name of the table to create or append
- columns – The columns to add. If None, the default is [rinchi,rauxinfo,longkey,shortkey,webkey]
-
rinchi_tools.database.
recall_fingerprints
(lkey, db_filename, table_name)¶ Recall a fingerprint from the db
Parameters: - lkey – The long key to search for
- db_filename – The db filename
- table_name – The table name which stores the fingerprints
Returns: A numpy array containing the reaction fingerprint as stored in the reaction db
-
rinchi_tools.database.
search_for_roles
(db, table_name, reactant_subs=None, product_subs=None, agent_subs=None, limit=200)¶ Searches for reactions with substructures in particular roles
-
rinchi_tools.database.
search_for_roles_advanced
(db, table_name, reactant_subs=None, product_subs=None, agent_subs=None, changing_subs=None, exclusive=False, unique=True, limit=200)¶ Searches for reactions containing a particular functionality
-
rinchi_tools.database.
search_master
(search_term, db=None, table_name=None, is_sql_db=False, hyb=None, val=None, rings=None, formula=None, reactant=False, product=False, agent=False, number=1000, keytype=None, ring_type=None, isotopic=None)¶ Search for a string within a RInChI database. Includes all options.
Parameters: - ring_type –
- isotopic –
- db –
- is_sql_db –
- number – Maximum number of initial results
- search_term – The term to search for
- table_name – the table to search in
- reactant – Search for InChIs in the reactants
- product – Search for InChIs in the products
- agent – Search for InChIs in the agents
- keytype – The type of key to look for. If not found, then the function will check if the search term is a key, and try to parse the key regardless. Otherwise, it assumes it should look in the RInChIs.
- All args following are dicts of the format {property: count, property2: count2, ...}
- hyb – The hybridisation changes(s) desired
- val – The valence change(s) desired
- rings – The ring change(s) desired
- formula – The formula change(s) desired
Returns: A dictionary of lists where an inchi was found
-
rinchi_tools.database.
search_rinchis
(search_term, db=None, table_name=None, is_sql_db=False, hyb=None, val=None, rings=None, formula=None, ringelements=None, isotopic=None, reactant=False, product=False, agent=False, number=1000)¶ Search for an InChI within a RInChI database. Includes all options
Parameters: - db –
- is_sql_db –
- number –
- search_term – The term to search for
- table_name – the table to search in
- All args following are dicts of the format {property: count, property2: count2, ...}
- hyb – The hybridisation changes(s) desired
- val – The valence change(s) desired
- rings – The ring change(s) desired
- formula – The formula change(s) desired
- reactant – Search for InChIs in the reactants
- product – Search for InChIs in the products
- agent – Search for InChIs in the agents
- ringelements –
- isotopic –
Returns: A dictionary of lists where an inchi was found
-
rinchi_tools.database.
sql_key_to_rinchi
(key, db_filename, table_name, keytype='L', column=None)¶ Returns the RInChI matching the given Long RInChI key for a given database
Parameters: - key – The key to search for
- db_filename – The database in which to search
- table_name – The table in which to search for the key
- keytype – The key type to search for. Defaults to the long key
- column – Optional column to look for the key in.
Raises: ValueError
– The keytype argument must be one of “L”, “S” or “W”
Returns: the corresponding RInChI
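The key lookup and its ValueError can be sketched as follows. This is an illustration under stated assumptions: the helper name lookup_rinchi is hypothetical, and the column names longkey, shortkey and webkey are assumed from the default column list given for rdf_to_sql; the real function also accepts an explicit column argument, which is omitted here.

```python
import sqlite3

# Assumed mapping from key type to column name (taken from the rdf_to_sql defaults).
KEY_COLUMNS = {"L": "longkey", "S": "shortkey", "W": "webkey"}


def lookup_rinchi(connection, table_name, key, keytype="L"):
    """Return the RInChI stored alongside key, or None if the key is absent."""
    if keytype not in KEY_COLUMNS:
        raise ValueError('keytype must be one of "L", "S" or "W"')
    # Table and column names are quoted; the key itself is bound as a parameter.
    query = 'SELECT rinchi FROM "{}" WHERE "{}" = ?'.format(
        table_name, KEY_COLUMNS[keytype]
    )
    row = connection.execute(query, (key,)).fetchone()
    return row[0] if row else None


# Placeholder strings stand in for real RInChIs and keys.
connection = sqlite3.connect(":memory:")
connection.execute("CREATE TABLE rinchis (rinchi, longkey, shortkey, webkey)")
connection.execute(
    "INSERT INTO rinchis VALUES (?, ?, ?, ?)",
    ("RInChI=example", "Long-key-example", "Short-key-example", "Web-key-example"),
)
found = lookup_rinchi(connection, "rinchis", "Short-key-example", keytype="S")
```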
-
rinchi_tools.database.
update_fingerprints
(db_filename, table_name, fingerprint_table_name)¶ NOT CURRENTLY WORKING. NEEDS UPDATING TO USE MULTITHREADING FOR USABLE PERFORMANCE
Calculates the reaction fingerprint as defined in the Reaction class, and stores it in the given db in a compressed form
Parameters: - db_filename – the db filename to update
- table_name – The table containing the RInChIs
- fingerprint_table_name – The table to contain the fingerprint
RInChI Substructure Matching Module¶
This module contains the matcher for matching molecules.
Modifications:
- Hampshire 2017
-
class
rinchi_tools.matcher.
Backup
(matcher_object)¶ Bases:
object
Stores the backed up mappings
-
backup
()¶ Backs up this iteration of the mapping
-
depth
()¶ The depth of the iterations
-
restore
()¶ Restores the previous mapping in the event of a failed mapping
-
-
class
rinchi_tools.matcher.
Matcher
(sub, master)¶ Bases:
object
Implementation of the VF2 algorithm for matching one graph as a subgraph of another.
made using this site.
Uses the python set implementation widely for best performance.
-
bonds_compatible
(mapping)¶ Checks if the bonds to the atoms in the mapping are compatible
-
count_compatable
(mapping)¶ Checks that the terminal sets as computed from the mapping have the appropriate bond counts.
Also sets terminal sets for next iteration to avoid unnecessary repeated computation.
-
gen_possible_mappings
()¶ The function P(s) which generates the mappings to be tested for the particular current mapping M(s)
-
gen_test_state
(mapping)¶ Generates a test state for testing criteria.
-
get_backup_mappings
()¶ Get the trial mappings of the atoms in the event that no terminal mappings are found.
The inclusion of the min is fundamental to quick execution of the script
-
static
get_terminal_atoms
(atoms_mapped_set, molecule)¶ Gets the set of atoms in a molecule that are not in the current mapping but are branches of the current mapping
-
get_terminal_mappings
()¶ Gets the mappings based on terminal atoms
-
is_compatible
(mapping)¶ - Checks if:
- The atom mapping has the correct atom
- Checks that other things
Returns a list of compatible atoms.
-
is_covering
()¶ Checks if all the atoms are mapped from the sublist in the lists
-
is_sub
()¶ Returns True if a subgraph of G1 is isomorphic to G2.
-
master_to_sub
(index)¶ Converts a master graph index to the sub graph index
-
match
()¶ Extends the isomorphism mapping, and acts as the iterating function in the VF2 algorithm.
This function is called recursively to determine if a complete isomorphism can be found between sub and master. It cleans up the class variables after each recursive call. If an isomorphism is found, we return the mapping.
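The recursive extend-and-restore pattern described for match can be illustrated with a much simpler backtracking search. This is not VF2 itself and not the Matcher implementation: it omits the terminal sets and bond-count pruning, and the graph representation (a dict of node to element and neighbour set) and the function name are assumptions for the sketch.

```python
def find_subgraph_mapping(sub, master, mapping=None):
    """Return a dict mapping sub nodes to master nodes, or None if no match.

    sub and master are dicts of node -> (element, set of neighbour nodes).
    """
    mapping = {} if mapping is None else mapping
    if len(mapping) == len(sub):
        return dict(mapping)  # every sub node is mapped, so the mapping is covering
    # Pick the next unmapped sub node (VF2 would prefer nodes in the terminal sets).
    node = next(n for n in sub if n not in mapping)
    element, neighbours = sub[node]
    used = set(mapping.values())
    for candidate, (cand_element, cand_neighbours) in master.items():
        if candidate in used or cand_element != element:
            continue
        # Every already-mapped neighbour in sub must also be bonded in master.
        if all(mapping[n] in cand_neighbours for n in neighbours if n in mapping):
            mapping[node] = candidate
            result = find_subgraph_mapping(sub, master, mapping)
            if result is not None:
                return result
            del mapping[node]  # restore the previous state after a failed branch
    return None


# A C-O fragment matched against the heavy atoms of ethanol (C-C-O).
fragment = {1: ("C", {2}), 2: ("O", {1})}
ethanol = {1: ("C", {2}), 2: ("C", {1, 3}), 3: ("O", {2})}
mapping = find_subgraph_mapping(fragment, ethanol)
```

The first attempt maps the fragment carbon to atom 1 of ethanol, fails to place the oxygen, and backtracks, which is the role the Backup class plays in the real matcher.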
-
new_state
(mapping)¶ Generates a new state from a mapping
-
sub_count
()¶ The number of unique matches found in the molecule
-
sub_count_unique
()¶
-
sub_to_master
(index)¶ Converts a sub graph index to the master index
-
RInChI Object Orientated Molecule Class Module¶
This module contains the Molecule class and associated functions
Modifications:
- Hammond 2014
- Hampshire 2017
Significant restructuring of the class to gain more consistent and less verbose code.
-
class
rinchi_tools.molecule.
Molecule
(inchi)¶ Bases:
object
A class containing a molecule as defined by an inchi. Contains functions for generating edge lists and node edge tables describing molecular graphs, and functions that use molecular graphs to calculate information about the molecules - ring sizes, atom hybridisation, contained functional groups etc.
-
static
breadth_first_search
(graph, start, finish)¶ Get the shortest path between the start and finish nodes
Adapted from http://eddmann.com/posts/depth-first-search-and-breadth-first-search-in-python/, accessed 06/11/2014
Parameters: - graph – an unweighted, undirected vertex-edge graph as a list
- start – the starting node
- finish – the finishing node as
Returns: The shortest path as a list
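A breadth-first shortest-path search of this kind can be sketched as below. The adjacency-dict graph representation and the function name shortest_path are assumptions for illustration; the method itself documents its graph argument only as a list.

```python
from collections import deque


def shortest_path(graph, start, finish):
    """Breadth-first search over an adjacency mapping; returns a list of nodes."""
    queue = deque([[start]])
    visited = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == finish:
            return path
        # Sorting the neighbours makes the result deterministic when paths tie.
        for neighbour in sorted(graph[node]):
            if neighbour not in visited:
                visited.add(neighbour)
                queue.append(path + [neighbour])
    return None  # finish is not reachable from start


# A six-membered ring, numbered 1 to 6.
ring = {1: {2, 6}, 2: {1, 3}, 3: {2, 4}, 4: {3, 5}, 5: {4, 6}, 6: {5, 1}}
path = shortest_path(ring, 1, 5)
```

Because the graph is unweighted, the first path that reaches the finishing node is guaranteed to be one of the shortest.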
-
calculate_edges
(edge_list=None)¶ Sets the node-edge graph as a dict.
Parameters: edge_list – A molecular graph as a list of edges. If no list is passed, the function sets the atoms for its own instance.
-
calculate_rings
()¶ Sets the ring count property which contains the ring sizes in the format { ring size : number of rings present, ...}
-
calculate_rings_by_atoms
()¶ Count the rings by atom list, e.g. “CCCCCN” will return the number of pyridine fragments in the molecule.
Returns: number of rings
-
chemical_formula_to_dict
()¶ Get the chemical formula as a dict
Returns: A dict with elements as keys and number of atoms as value
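Turning a formula string into element counts can be sketched with a regular expression. This is a minimal illustration, not the method's code: the function name is hypothetical, and the sketch ignores leading multipliers and charges.

```python
import re
from collections import Counter


def formula_to_counter(formula):
    """Count atoms per element in a formula string such as "C2H6O"."""
    counts = Counter()
    # An element symbol is one capital letter plus an optional lowercase letter,
    # followed by an optional count (a missing count means one atom).
    for element, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] += int(number) if number else 1
    return counts


ethanol_counts = formula_to_counter("C2H6O")
chlorobenzene_counts = formula_to_counter("C6H5Cl")
```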
-
static
composite_to_simple
(inchi)¶ Splits an inchi with multiple disconnected components into a list of connected inchis
# Modified 2017 D Hampshire to split formula of multiple identical components # cf. http://www.inchi-trust.org/technical-faq/#5.6
Parameters: inchi – An inchi (usually composite)
Returns: A list of simple inchis within the composite inchi argument
-
count_centres
(wd=False, sp2=True, sp3=True)¶ Counts the centres contained within an inchi
Parameters: - wd – Whether or not the stereocentre must be well-defined to be counted.
- sp2 – Count sp2 centres
- sp3 – Count sp3 centres
Returns: - stereocentres – The number of stereocentres
- stereo_mols – The number of molecules with stereocentres
-
count_rings
()¶ Count the number of rings in an InChI.
Returns: The number of rings in the InChI. Return type: ring_count
-
count_sp2
(wd=False)¶ Count the number of sp2 stereocentres.
Parameters: wd – Whether or not the stereocentre must be well-defined to be counted. Returns: The number of sp2 stereocentres in the structure. Return type: sp2_centre_count
-
count_sp3
(wd=False, enantio=False)¶ Count the number of sp3 stereocentres in a molecule.
Parameters: - wd – Whether or not the stereocentre must be well-defined to be counted.
- enantio – Whether or not the structure must be enantiopure to be counted.
Returns: The number of sp3 stereocentres in the structure.
-
depth_first_search
(start=1)¶ Performs a DFS over the molecular graph of a given Molecule object, returning a list of edges that form a spanning tree (tree edges), and a list of the edges that would cyclise this spanning tree (back edges)
The number of back edges returned is equal to the number of rings that can be described in the molecule
Parameters: start – Set which atom should be the starting node
Returns: - tree_edges – A list of tree edges.
- back_edges – A list of back edges. The list length is equal to the smallest number of cycles that can describe the cycle space of the molecular graph
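The split into tree edges and back edges can be sketched as follows. The standalone function and the adjacency-dict input are assumptions for illustration; the real method works on the Molecule's own graph and takes only a start atom.

```python
def dfs_edges(graph, start):
    """Return (tree_edges, back_edges) for a connected undirected graph."""
    visited = {start}
    seen_edges = set()  # undirected edges already classified
    tree_edges, back_edges = [], []

    def visit(node):
        for neighbour in sorted(graph[node]):
            edge = frozenset((node, neighbour))
            if edge in seen_edges:
                continue
            seen_edges.add(edge)
            if neighbour in visited:
                back_edges.append((node, neighbour))  # this edge closes a ring
            else:
                visited.add(neighbour)
                tree_edges.append((node, neighbour))
                visit(neighbour)

    visit(start)
    return tree_edges, back_edges


# A six-membered ring has six edges and six atoms, so exactly one back edge.
ring = {1: {2, 6}, 2: {1, 3}, 3: {2, 4}, 4: {3, 5}, 5: {4, 6}, 6: {5, 1}}
tree_edges, back_edges = dfs_edges(ring, 1)
```

For a connected graph the number of back edges is the edge count minus the atom count plus one, which is why it equals the number of rings needed to describe the molecule.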
-
static
edge_list_to_atoms_spanned
(edge_list)¶ Takes an edge list and returns a list of atoms spanned
Parameters: edge_list – An edge list Returns: A list of all the keys for the atoms which are spanned by the edge list.
-
edge_list_to_vector
(subset)¶ Converts an edge list to a vector in the (0, 1)^N vector space spanned by the edges of the molecule
Parameters: subset – The vector subset to use Returns: The vector stored as a list.
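The edge-vector representation can be sketched as a list of 0/1 flags, one per edge of the molecule. The helper names and the explicit all_edges argument are assumptions; the real methods presumably use the edge list stored on the Molecule.

```python
def edges_to_vector(all_edges, subset):
    """Flag which of the molecule's edges appear in subset (direction ignored)."""
    wanted = {frozenset(edge) for edge in subset}
    return [1 if frozenset(edge) in wanted else 0 for edge in all_edges]


def vector_to_edges(all_edges, vector):
    """Inverse operation: keep the edges whose flag is set."""
    return [edge for edge, bit in zip(all_edges, vector) if bit]


# A three-membered ring (atoms 1 to 3) with one substituent (atom 4).
all_edges = [(1, 2), (2, 3), (3, 1), (3, 4)]
ring_vector = edges_to_vector(all_edges, [(2, 1), (3, 2), (1, 3)])
ring_edges = vector_to_edges(all_edges, ring_vector)
```

Representing cycles this way lets two cycles be combined by adding their vectors modulo 2, which is the operation the cycle-basis search relies on.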
-
static
edges_to_atoms
(ls)¶ Sets the node-edge graph as a dict.
Parameters: ls – A molecular graph as a list of edges. If no list is passed, the function sets the atoms for its own instance.
-
find_initial_ring_set
()¶ For every edge in the molecule, find the smallest ring it is a part of, add it to a list NEEDS REIMPLEMENTATION
Returns: list of all minimal rings, sorted by the number of edges they contain
-
find_initial_ring_set_trial
()¶ For every edge in the molecule, find the smallest ring it is a part of, add it to a list TRIAL REIMPLEMENTATION, NOT YET WORKING
Returns: list of all minimal rings, sorted by the number of edges they contain
-
find_linearly_independent
(cycles)¶ Given a list of candidate cycles, sorted by size, this function attempts to find the smallest, linearly independent basis of cycles that spans the entire cycle space of the molecular graph - the Minimum Cycle Basis.
Parameters: cycles – list of candidate cycles sorted by size Returns: None
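Selecting a linearly independent set of cycles can be sketched as Gaussian elimination over GF(2), where adding two edge vectors is a bitwise XOR. This is an illustration of the general technique rather than the method's code: the function name is hypothetical, it takes 0/1 edge vectors as input, and it returns the kept cycles, whereas the documented method returns None.

```python
def independent_cycles(cycle_vectors):
    """Greedily keep cycles that are linearly independent over GF(2).

    Each cycle is a non-empty list of 0/1 edge flags. The input is assumed
    to be sorted by size already, so the kept cycles favour small rings.
    """
    basis = {}  # leading bit position -> reduced bitmask with that leading bit
    kept = []
    for vector in cycle_vectors:
        reduced = int("".join(str(bit) for bit in vector), 2)
        while reduced:
            lead = reduced.bit_length() - 1
            if lead not in basis:
                basis[lead] = reduced  # a new direction in the cycle space
                kept.append(vector)
                break
            reduced ^= basis[lead]  # addition in GF(2) is XOR
        # If reduced reaches zero, the cycle is a sum of cycles already kept.
    return kept


# Two fused rings sharing one edge, plus the outer ring (the sum of the two).
ring_a = [1, 1, 1, 0, 0]
ring_b = [0, 0, 1, 1, 1]
outer = [1, 1, 0, 1, 1]
basis_cycles = independent_cycles([ring_a, ring_b, outer])
```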
-
find_rings_from_back_edges
()¶ Accepts output from the depth_first_search algorithm, returns a list of all rings within the molecule.
Will NOT find a minimum cycle basis, but can be used to find an initial cycle set when performing the Horton Algorithm (see elsewhere)
-
find_shortest_path
(graph, start, end, path=None)¶ Recursively iterates over the entire molecular graph, yielding the shortest path between two points
Adapted from https://www.python.org/doc/essays/graphs/, accessed 15/10/2014
Parameters: - graph – an unweighted, undirected vertex-edge graph as a list
- start – the starting node as a number
- end – the finishing node as a number
- path – latest iteration of the path
Returns: The shortest path as a list of indices
-
generate_edge_list
()¶ Takes the connective layer of an inchi and returns the molecular graph as an edge list, parsing it directly using re.
Returns: A list containing the edges of the molecular graph Return type: edges
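Parsing a connectivity layer with re can be sketched as below. This is a simplified illustration with a hypothetical function name: it handles a single component with chains, parenthesised branches and ring closures, and does not handle multi-component layers (";" separators or "*" multipliers).

```python
import re


def connection_layer_to_edges(layer):
    """Convert a connectivity string such as "1-4(2)3" to a list of edges."""
    edges = []
    branch_points = []  # stack of atoms at which a branch was opened
    previous = None
    for token in re.findall(r"\d+|[(),]", layer):
        if token == "(":
            branch_points.append(previous)
        elif token == ",":
            previous = branch_points[-1]  # start a sibling branch
        elif token == ")":
            previous = branch_points.pop()  # resume the main chain
        else:
            atom = int(token)
            if previous is not None:
                edges.append((previous, atom))
            previous = atom
    return edges


chain = connection_layer_to_edges("1-2-3")
branched = connection_layer_to_edges("1-5(2,3)4")
ring = connection_layer_to_edges("1-2-4-6-5-3-1")
```

A ring closure needs no special handling here: the repeated atom number at the end of the ring string simply produces the closing edge.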
-
get_formula
()¶ Get chemical empirical formula
Returns: Chemical formula stored as a counter
-
get_hybrid_count
()¶ Calculate the hybridisation of each atom
Returns: A Counter object containing the hybridisation of the atoms
-
get_ring_count
()¶ Get the ring count
Returns: a Counter object containing the number of rings of each size
-
get_ring_count_inc_elements
()¶ Count the rings of a molecule. Result includes the elements of the ring.
Returns: a Counter containing the number of rings of each size and the elements contained by a ring
-
get_valence_count
()¶ Calculates the valences of each atom in the Molecule
Returns: A Counter object containing the valences of the atoms
-
has_isotopic_layer
()¶ Does the molecule inchi have an isotopic layer?
Returns: A boolean value
-
inchi_to_chemical_formula
()¶ Converts an Inchi to a Chemical formula
Returns: The Chemical Formula of the Molecule as a string
-
inchi_to_layer
(l)¶ Get a particular layer of the InChI
Parameters: l – The layer of the InChI to retrieve Returns: The InChI layer desired
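Layer retrieval can be sketched by splitting the InChI on "/" and matching the layer prefix. The function name and the choice to return the layer without its prefix letter are assumptions; the method's exact return format is not stated here.

```python
def get_layer(inchi, prefix):
    """Return the layer starting with prefix (e.g. "c" or "h"), or None."""
    # Segment 0 is the version ("InChI=1S") and segment 1 is the formula,
    # so prefixed layers start at segment 2.
    for segment in inchi.split("/")[2:]:
        if segment.startswith(prefix):
            return segment[len(prefix):]
    return None


ethanol = "InChI=1S/C2H6O/c1-2-3/h3H,2H2,1H3"
connectivity = get_layer(ethanol, "c")
hydrogens = get_layer(ethanol, "h")
```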
-
initialize
()¶ Initialises the molecule
-
static
new
(inchi)¶ Creates a list of new Molecule objects. Safer than Molecule() due to composite InChI implications.
Parameters: inchi – An InChI string Returns: list of Molecule objects.
-
static
path_to_cycle_edge_list
(path)¶ Converts a cycle described by an ordered list of nodes to an edge list
Parameters: path – The path of the cycle stored as an ordered list Returns: The edge list
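The conversion from an ordered node path to a closed edge list can be sketched in one line. The function name is hypothetical, and the sketch assumes the path does not repeat the starting node at its end.

```python
def cycle_edges(path):
    """Pair each node with its successor, wrapping round to close the cycle."""
    return [(path[i], path[(i + 1) % len(path)]) for i in range(len(path))]


triangle = cycle_edges([1, 2, 3])
```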
-
set_atomic_hydrogen
()¶ Takes the molecular graph and the inchi, and sets the number of protons attached to each atom.
Requires initialised atoms.
-
set_atoms
()¶ Sets the atom objects, with their appropriate indexes and elements, as instances of the Atom class.
-
vector_to_edge_list
(vector)¶ Takes an edge vector and returns an edge list
Parameters: vector – an edge vector stored in an iterable Returns: The edge list
-
RInChI Object Orientated Reaction Class Module¶
This module contains the Reaction class and associated functions
Modifications:
- Hammond 2014
- Hampshire 2017
Significant restructuring of the class to gain more consistent and less verbose code.
-
class
rinchi_tools.reaction.
Reaction
(rinchi)¶ Bases:
object
This class defines a reaction, as defined by a RInChI. Molecule objects are created from all component InChIs, and the member functions of the class can be used to analyse various parameters that may be changing across the reaction
-
calculate_reaction_fingerprint
(fingerprint_size=1024)¶ Calculates a reaction fingerprint for a given reaction. Uses a 1024 bit fingerprint by default
Method of Daniel M. Lowe (2015)
This function generates fingerprints for individual molecules using obabel. It could easily be modified to use other software packages, e.g. RDKit, if desired
Parameters: fingerprint_size – The length of the fingerprint to be generated.
-
change_across_reaction
(func, *args)¶ Calculates the total change in a parameter across a reaction, using a Molecule class function, and returns a Python Counter object
Parameters: - func – The class function to calculate the parameter, which returns a Counter object
- args – Args if required for the function
Returns: the change in the parameter
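The change in a counted property across a reaction can be sketched with Counter arithmetic. This is an illustration under assumptions: the function name is hypothetical, and it takes precomputed per-molecule Counters rather than calling a Molecule method as the real function does.

```python
from collections import Counter


def change_across(reactant_counts, product_counts):
    """Total of the product Counters minus the total of the reactant Counters."""
    change = Counter()
    for counts in product_counts:
        change.update(counts)
    for counts in reactant_counts:
        change.subtract(counts)  # subtract keeps negative values, unlike "-"
    # Drop unchanged properties but keep negative values (losses).
    return Counter({key: value for key, value in change.items() if value != 0})


# Hybridisation of carbon atoms in a Diels-Alder reaction:
# butadiene (4 sp2) + ethene (2 sp2) -> cyclohexene (2 sp2, 4 sp3).
hybrid_change = change_across(
    [Counter(sp2=4), Counter(sp2=2)],
    [Counter(sp2=2, sp3=4)],
)
```

Counter.subtract is used instead of the "-" operator because the operator discards zero and negative counts, and a loss across the reaction is exactly what a negative count records.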
-
detect_reaction
(hyb_i=None, val_i=None, rings_i=None, formula_i=None, isotopic=False, ring_elements=None)¶ Detect if a reaction satisfies certain conditions. Allows searching for reactions based on ring changes, valence changes, formula changes, hybridisation of C atom changes.
Parameters: - All args are dicts of the format {property: count, property2: count2, ...}
- hyb_i – The hybridisation change(s) desired
- val_i – The valence change(s) desired
- rings_i – The ring change(s) desired
- formula_i – The formula change(s) desired
- isotopic – Whether to look for reactions involving an isotopic InChI
- ring_elements – Look for a ring in the reaction
Returns: True if the given reaction satisfies all the conditions, otherwise False.
-
generate_svg_image
(outname)¶ Outputs the reactants, products, and agents as SVG files in the current directory with the given filename
Parameters: outname – the name of the file to output the SVG image
-
has_isotopic_inchi
()¶
-
has_ring
(ring)¶
-
has_substructures
(reactant_subs=None, product_subs=None, agent_subs=None, exclusive=True, rct_disappears=True, pdt_appears=True)¶ Detects if the reaction contains the given substructures
Parameters: - reactant_subs – List of reactant inchis
- product_subs – List of product inchis
- agent_subs – List of agent inchis
- exclusive – Match one functionality per molecule of reactant
- rct_disappears – Only match if substructures not in products
- pdt_appears – Only match if substructures not in reactants
Returns: Boolean, whether the substructures are contained
-
has_substructures_by_populations
(reactant_subs=None, product_subs=None, agent_subs=None, changing_subs=None, exclusive=False, unique=True)¶ Detects if the reaction contains the given substructures in the given populations
Parameters: - reactant_subs – Dictionary of reactant inchis and their populations in the layer
- product_subs – Dictionary of product inchis and their populations in the layer
- agent_subs – Dictionary of agent inchis and their populations in the layer
- changing_subs – Dictionary of inchi changes in populations
- exclusive – Match one functionality per molecule of reactant
- unique – Prevent matching the same atoms
Returns: Boolean, whether the substructures are contained
-
is_agent
(inchi)¶ Determine whether the reaction is catalytic in a particular chemical
Parameters: inchi – An InChI string specifying a molecule
Returns: True or False (Boolean)
-
is_balanced
()¶ Determine if a reaction is balanced
Returns: True if Balanced, False otherwise.
-
longkey
()¶ Set longkey if not already set, then return longkey
-
static
present_in_layer
(layer, inchi)¶ Checks if an InChI is present in a layer
Parameters: - layer – A reaction layer
- inchi – an Inchi
Returns: Returns the RInChI if the inchi is present, otherwise returns None.
-
present_in_reaction
(func)¶ Tests if a molecule is present in the reaction
Parameters: func – function of a Molecule object that returns True if a given condition is satisfied Returns: If the function returns true for any InChI, the parent RInChI is returned
-
ring_change
()¶ Determine how the number of rings changes in a reaction. Old method
Returns: A counter containing the changes across the reaction.
-
shortkey
()¶ Set shortkey if not already set, then return shortkey
-
stereo_change
(wd=False, sp2=True, sp3=True)¶ Determine whether a reaction creates or destroys stereochemistry. Old method
Parameters: - wd – Whether only well-defined stereocentres count.
- sp2 – Whether to count sp2 stereocentres.
- sp3 – Whether to count sp3 stereocentres.
Returns: The number of stereocentres created by a reaction stored as a value in a dictionary
-
webkey
()¶ Set webkey if not already set, then return webkey
-
RInChI C Library Interface Module¶
This module provides functions defining how RInChIs and RAuxInfos are constructed from InChIs and reaction data. It also interfaces with the RInChI v1.00 software as provided by the InChI trust.
This file is based on that provided with the official v1.00 RInChI software release, but with modifications to ensure Python 3 compatibility. Documentation was adapted from the official v1.00 release document.
Modifications:
- Hampshire 2017
-
class
rinchi_tools.rinchi_lib.
RInChI
(lib_path='/home/dh493/Documents/rinchi03-extended/rinchi_tools/libs/librinchi.so.1.0.0')¶ Bases:
object
The RInChI class interfaces the C class in the librinchi library
-
file_text_from_rinchi
(rinchi_string, rinchi_auxinfo, output_format)¶ Reconstructs (or attempts to reconstruct) RD or RXN file from RInChI string and RAuxInfo
Parameters: - rinchi_string – The RInChI string to convert
- rinchi_auxinfo – The RAuxInfo to convert (optional, recommended)
- output_format – “RD” or “RXN”
Returns: The text block for the file
-
inchis_from_rinchi
(rinchi_string, rinchi_auxinfo='')¶ Splits an RInChI string and optional RAuxInfo into components.
Parameters: - rinchi_string – A RInChI string
- rinchi_auxinfo – RAuxInfo string. May be blank but may not be NULL.
Raises: Exception
– RInChI format related errors
Returns: A dictionary of the data returned. The structure is as below:
{‘Direction’: [direction character], ‘No-Structures’: [list of no-structures], ‘Reactants’: [list of inchis & auxinfos], ‘Products’: [list of inchis & auxinfos], ‘Agents’: [list of inchis & auxinfos]}
Each Reactant, Product, and Agent list contains a set of (InChI, AuxInfo) tuples. The No-Structures list contains No-Structure counts for Reactants, Products, and Agents.
-
rinchi_errorcheck
(return_code)¶ Specifies Python error handling behavior
Parameters: return_code – the return code from the C library
-
rinchi_from_file_text
(input_format, rxnfile_data, force_equilibrium=False)¶ Generates RInChI string and RAuxInfo from supplied RD or RXN file text.
Parameters: - input_format – “AUTO”, “RD” or “RXN” (with “AUTO” as default value)
- rxnfile_data – text block of RD or RXN file data
- force_equilibrium (bool) – Force interpretation of reaction as equilibrium reaction
Returns: tuple pair of the RInChI and RAuxInfo generated
-
rinchikey_from_file_text
(input_format, file_text, key_type, force_equilibrium=False)¶ Generates RInChI key of supplied RD or RXN file text.
Parameters: - input_format – “RD” or “RXN”
- file_text – text block of RD or RXN file data
- key_type – 1 letter controlling the type of key generated; “L” for the Long-RInChIKey, “S” for the Short key (Short-RInChIKey), “W” for the Web key (Web-RInChIKey)
- force_equilibrium (bool) – Force interpretation of reaction as equilibrium reaction
Returns: a RInChIKey
-
rinchikey_from_rinchi
(rinchi_string, key_type)¶ Generates the RInChIKey of the supplied RInChI string.
Parameters: - rinchi_string – A RInChI string
- key_type – 1 letter controlling the type of key generated with “L” for the Long-RInChIKey, “S” for the Short key (Short-RInChIKey), “W” for the Web key (Web-RInChIKey)
Returns: the RInChIKey
-
RInChI Tools Module¶
This module provides functions defining how RInChIs and RAuxInfos are constructed from InChIs and reaction data. It also interfaces with the RInChI v1.00 software as provided by the InChI Trust.
Modifications:
- C.H.G. Allen 2012
- D.F. Hampshire 2016
-
rinchi_tools.tools.
add
(rinchis)¶ Combines a list of RInChIs into one combined RInChI.
N.B. As stoichiometry is not represented in the input, this is an approximate addition.
Substances from RInChIs are sorted into one of four “pots”:
- “Used” contains substances which have acted as a reagent, and have not yet been created again as a product.
- “Made” contains substances which have been created as a product of a step, and have yet to be used again.
- “Present” contains substances which have been present during a step, but have not yet been used up, or substances which have been used as a reagent, and later regenerated as a product.
- “Intermediates” contains substances which have been created as a product, and later used as a reagent.
Each RInChI is considered in turn:
- The reactants are considered:
- If novel, add to “used”.
- If in “used”, remain in “used”.
- If in “made”, move to “intermediates”.
- If in “present”, move to “used”.
- If in “intermediates”, remain in “intermediates”.
- The products are considered:
- If novel, add to “made”.
- If in “used”, move to “present”.
- If in “made”, remain in “made”.
- If in “present”, remain in “present”.
- If in “intermediates”, move to “made”.
- The extras are considered:
- If novel, add to “present”.
- The pots are then emptied into the following output receptacles:
- “Used” -> LHS InChIs
- “Made” -> RHS InChIs
- “Present” -> BHS InChIs
- “Intermediates” -> discarded
Finally, the RInChI is constructed in the usual way and returned.
Parameters: rinchis – A list of RInChIs, representing a sequence of reactions making up one overall process. The order of this list is important, as each RInChI is interpreted as a step in the overall process. They must also have a clearly defined direction.
Returns: A RInChI representing the overall process.
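The pot-sorting rules described for add can be sketched in plain Python as follows. This is only an illustration of the bookkeeping, not the library's implementation: it assumes each step has already been split into reactant, product and extra identifiers, and the name combine_steps is hypothetical. Building the final RInChI from the three returned sets is left out.

```python
def combine_steps(steps):
    """Sort substances from an ordered list of steps into the four pots.

    `steps` is a list of (reactants, products, extras) tuples, each an
    iterable of substance identifiers (e.g. InChI strings).
    Returns (used, made, present), i.e. the LHS, RHS and BHS substances;
    intermediates are discarded.
    """
    used, made, present, intermediates = set(), set(), set(), set()

    for reactants, products, extras in steps:
        for substance in reactants:
            if substance in made:
                made.discard(substance)
                intermediates.add(substance)
            elif substance in present:
                present.discard(substance)
                used.add(substance)
            elif substance not in used and substance not in intermediates:
                used.add(substance)  # novel substance
            # already in "used" or "intermediates": stays where it is

        for substance in products:
            if substance in used:
                used.discard(substance)
                present.add(substance)
            elif substance in intermediates:
                intermediates.discard(substance)
                made.add(substance)
            elif substance not in made and substance not in present:
                made.add(substance)  # novel substance
            # already in "made" or "present": stays where it is

        for substance in extras:
            known = used | made | present | intermediates
            if substance not in known:
                present.add(substance)  # only novel extras are recorded

    return used, made, present


# Two-step process: A + B -> C (with a catalyst), then C -> D.
lhs, rhs, bhs = combine_steps([
    ({"A", "B"}, {"C"}, {"cat"}),
    ({"C"}, {"D"}, set()),
])
```

In this two-step example, C is made in the first step and consumed in the second, so it ends up in the discarded “Intermediates” pot.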
-
rinchi_tools.tools.
build_rauxinfo
(l2_auxinfo, l3_auxinfo, l4_auxinfo)¶ Takes 3 sets of AuxInfos and converts them into a RAuxInfo. N.B. The order of the entries in each list is presumed to correspond to the order of the InChIs in the RInChI.
Parameters: - l2_auxinfo – List of layer 2 AuxInfos
- l3_auxinfo – List of layer 3 AuxInfos
- l4_auxinfo – List of layer 4 AuxInfos
Returns: An RAuxInfo
-
rinchi_tools.tools.
build_rinchi
(l2_inchis=None, l3_inchis=None, l4_inchis=None, direction='', u_struct='')¶ Build a RInChI from the specified InChIs and reaction data.
RInChI Builder takes three groups of InChIs, and additional reaction data (currently limited to directionality information), and returns a RInChI.
The first three arguments are groups of InChIs saved as strings within an iterable (e.g. a list, set, tuple). Any or all of these may be omitted. All InChIs must be of the same version number. If a chemical which cannot be described by an InChI is desired within the RInChI, it should be added to the u_struct argument detailed below.
Parameters: - l2_inchis – Chemicals in the second layer of the RInChI
- l3_inchis – Chemicals in the third layer of the RInChI
- l4_inchis – Chemicals in the fourth layer of a RInChI. It refers to the substances present at the start and end of the reaction (e.g. catalysts, solvents), only referred to as “agents”.
- direction – This must be “+”, “-” or “=”. “+” means that the l2_inchis are the reactants, and the l3_inchis the products; “-” means the opposite; and “=” means the l2_inchis and l3_inchis are in equilibrium.
- u_struct – Defines the number of unknown structures in each layer. This must be a list of the form [#2, #3, #4] where #2 is the number of unknown structures in layer 2, #3 is the number in layer 3, etc.
Returns: The RInChI made from the input InChIs and reaction data.
Raises: VersionError
– The input InChIs are not of the same version.
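The constraints on the arguments (a direction character, a three-element u_struct list, and a single InChI version across all groups) can be checked before building. The sketch below is one hedged way to do that, not the library's own code: check_build_inputs and inchi_version are hypothetical helpers, and VersionError is redefined locally as a stand-in so the example runs on its own. It assumes the version is the token between "InChI=" and the first "/", and it accepts the empty direction string because that is the default in the signature.

```python
class VersionError(Exception):
    """Local stand-in for the library's VersionError (assumption)."""


def inchi_version(inchi):
    """Return the version token of an InChI, e.g. '1S' for 'InChI=1S/CH4/h1H4'."""
    prefix = "InChI="
    if not inchi.startswith(prefix):
        raise ValueError("not an InChI: %r" % (inchi,))
    return inchi[len(prefix):].split("/", 1)[0]


def check_build_inputs(l2_inchis=(), l3_inchis=(), l4_inchis=(),
                       direction="", u_struct=""):
    """Validate build arguments; return the shared InChI version or None."""
    if direction not in ("", "+", "-", "="):
        raise ValueError("direction must be '+', '-' or '='")
    if u_struct and len(u_struct) != 3:
        raise ValueError("u_struct must have the form [#2, #3, #4]")

    # Collect the version token of every InChI in every layer.
    versions = {
        inchi_version(inchi)
        for group in (l2_inchis, l3_inchis, l4_inchis)
        for inchi in group
    }
    if len(versions) > 1:
        raise VersionError("The input InChIs are not of the same version.")
    return versions.pop() if versions else None


version = check_build_inputs(
    l2_inchis=["InChI=1S/CH4/h1H4"],
    l3_inchis=["InChI=1S/H2O/h1H2"],
    direction="+",
    u_struct=[0, 0, 0],
)
```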
-
rinchi_tools.tools.
build_rinchi_rauxinfo
(l2_input=None, l3_input=None, l4_input=None, direction='', u_struct='')¶ Build a RInChI and RAuxInfo from the specified InChIs and reaction data.
RInChI Builder takes three groups of InChIs, and additional reaction data, and returns a RInChI and RAuxInfo.
The first three arguments are InChI and AuxInfo pairs (as tuples) within an iterable (e.g. a list, set, tuple). Any or all of these may be omitted. All InChIs must be of the same version number. If a chemical which cannot be described by an InChI is desired within the RInChI, it should be added to the u_struct argument detailed below.
Parameters: - u_struct – Defines the number of unknown structures in each layer. This must be a list of the form [#2,#3, #4] where #2 is the number of unknown reactants in layer 2, #3 is number in layer 3 etc.
- l2_input – Chemicals in the second layer of the RInChI
- l3_input – Chemicals in the third layer of the RInChI
- l4_input – Chemicals in the fourth layer of a RInChI. It refers to the substances present at the start and end of the reaction (e.g. catalysts, solvents), only referred to as “agents”.
- direction – This must be “+”, “-” or “=”. “+” means that the LHS are the reactants, and the RHS the products; “-” means the opposite; and “=” means the LHS and RHS are in equilibrium.
Returns: The RInChI and RAuxInfo made from the input InChIs and reaction data.
Raises: VersionError
– The input InChIs are not of the same version.
-
rinchi_tools.tools.
dedupe_rinchi
(rinchi, rauxinfo='')¶ Removes duplicate InChI entries from the RInChI
Parameters: - rinchi – A RInChI string
- rauxinfo – Optional RAuxInfo
Returns: A RInChI and RAuxInfo tuple
-
rinchi_tools.tools.
generate_rauxinfo
(rinchi)¶ Create RAuxInfo for a RInChI using the InChI conversion function.
Parameters: rinchi – The RInChI for which to create the RAuxInfo.
Returns: The RAuxInfo of the RInChI.
-
rinchi_tools.tools.
inchi_2_auxinfo
(inchi)¶ Run the InChI software on an InChI to generate AuxInfo.
The function saves the InChI to a temporary file, and runs the inchi-1 program on this tempfile as a subprocess. The AuxInfo will not include 2D coordinates, but an AuxInfo of some kind is required for the InChI software to convert an InChI to an SDFile.
Parameters: inchi – An InChI from which to generate AuxInfo.
Returns: The InChI’s AuxInfo (will not contain 2D coordinates).
-
rinchi_tools.tools.
process_stats
(rinchis, mostcommon=None)¶ Takes an iterable of RInChIs and gathers statistics about them.
Parameters: - rinchis – An iterable of RInChIs
- mostcommon – Return only the most common items
Returns: Dictionary of counters containing the information.
-
rinchi_tools.tools.
remove_stereo
(inchi)¶ Removes stereochemistry from an InChI
Parameters: inchi – an InChI as a string
Returns: an InChI
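One way to picture stereochemistry removal is as deleting the stereo layers of the InChI string. The sketch below does this with a regular expression; it is an assumption about the approach, not the library's implementation, and strip_stereo_layers is a hypothetical name. It treats /b, /t, /m and /s as the stereo layers and does not attempt to handle every edge case (for example, it does not re-standardise the resulting string).

```python
import re

# Stereo layers of an InChI: /b (double bond), /t (tetrahedral),
# /m (inversion flag) and /s (stereo type). Each runs to the next "/".
_STEREO_LAYER = re.compile(r"/[btms][^/]*")


def strip_stereo_layers(inchi):
    """Return the InChI with its /b, /t, /m and /s layers removed."""
    return _STEREO_LAYER.sub("", inchi)


# L-alanine, which carries tetrahedral stereo layers.
l_alanine = "InChI=1S/C3H7NO2/c1-2(4)3(5)6/h2H,4H2,1H3,(H,5,6)/t2-/m0/s1"
flat_alanine = strip_stereo_layers(l_alanine)
```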
-
rinchi_tools.tools.
rinchi_to_dict_list
(data)¶ Takes a text block or file object and parses it into a list of dictionaries of RInChI entries
Parameters: data – The text block or file object to parse
Returns: A list of dictionaries, one for each entry
-
rinchi_tools.tools.
split_rinchi
(rinchi)¶ Returns the InChIs without RAuxInfo, each in lists, together with the direction character and the no_structs list
Parameters: rinchi – A RInChI string
Returns: A tuple containing:
- rct_inchis: List of reactant InChIs
- pdt_inchis: List of product InChIs
- agt_inchis: List of agent InChIs
- direction: The direction character
- no_structs: A list of the numbers of unknown structures in each layer
-
rinchi_tools.tools.
split_rinchi_inc_auxinfo
(rinchi, rinchi_auxinfo)¶ Returns the InChI and AuxInfo pairs, each in lists, the direction character, and a list of the numbers of unknown structures.
Parameters: - rinchi – A RInChI string
- rinchi_auxinfo – The corresponding RAuxInfo
Returns: A tuple containing:
- rct_inchis: List of reactant InChI and AuxInfo pairs
- pdt_inchis: List of product InChI and AuxInfo pairs
- agt_inchis: List of agent InChI and AuxInfo pairs
- direction: The direction character
- no_structs: A list of the numbers of unknown structures in each layer
-
rinchi_tools.tools.
split_rinchi_only_auxinfo
(rinchi, rinchi_auxinfo)¶ Returns the AuxInfos contained in the RAuxInfo, each in lists
Parameters: - rinchi – A RInChI string
- rinchi_auxinfo – The corresponding RAuxInfo
Returns: A tuple containing:
- rct_inchis_auxinfo: List of reactant AuxInfos
- pdt_inchis_auxinfo: List of product AuxInfos
- agt_inchis_auxinfo: List of agent AuxInfos
RInChI Utilities Module¶
This module provides functions that perform various non RInChI specific tasks.
Modifications:
- D.F. Hampshire 2016
-
class
rinchi_tools.utils.
Hashable
(val)¶ Bases:
object
Make an object hashable for counting. Used to count counters
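To show why such a wrapper is useful, here is a minimal sketch of counting Counter objects, which are not hashable by themselves. It is not the library's class: HashableSketch is a hypothetical name, and the sketch assumes the wrapped value is a mapping whose items are hashable.

```python
from collections import Counter


class HashableSketch:
    """Wrap a mapping (e.g. a Counter) so it can be used as a dict key."""

    def __init__(self, val):
        self.val = val

    def __hash__(self):
        # A frozenset of items is order-independent and hashable.
        return hash(frozenset(self.val.items()))

    def __eq__(self, other):
        return isinstance(other, HashableSketch) and self.val == other.val


# Count how often each distinct Counter occurs ("counting counters").
tallies = Counter(
    HashableSketch(c)
    for c in [Counter("aab"), Counter("aba"), Counter("abc")]
)
```

Here Counter("aab") and Counter("aba") hold the same counts, so they are tallied under one key.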
-
class
rinchi_tools.utils.
Spinner
(delay=None)¶ Bases:
object
A spinner which shows during a long process.
-
busy
= False¶
-
delay
= 0.1¶
-
static
spinning_cursor
()¶
-
start
()¶ Starts the spinner
-
stop
()¶ Stops the spinner
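A spinner with this interface is typically driven by a background thread. The sketch below is an assumption about how such a class could be put together, using the documented attribute and method names (busy, delay, spinning_cursor, start, stop); SpinnerSketch and its internals are hypothetical and are not taken from the library.

```python
import itertools
import sys
import threading
import time


class SpinnerSketch:
    """Show a rotating cursor on stdout while a long task runs."""

    busy = False
    delay = 0.1

    def __init__(self, delay=None):
        if delay is not None:
            self.delay = delay
        self._thread = None

    @staticmethod
    def spinning_cursor():
        # Endless iterator over the four cursor frames.
        return itertools.cycle("|/-\\")

    def _spin(self):
        for frame in self.spinning_cursor():
            if not self.busy:
                break
            sys.stdout.write(frame + "\b")  # draw frame, then step back
            sys.stdout.flush()
            time.sleep(self.delay)

    def start(self):
        self.busy = True
        self._thread = threading.Thread(target=self._spin, daemon=True)
        self._thread.start()

    def stop(self):
        self.busy = False
        if self._thread is not None:
            self._thread.join()
            self._thread = None


spinner = SpinnerSketch(delay=0.01)
spinner.start()
time.sleep(0.05)  # stands in for the long-running work
spinner.stop()
```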
-
-
rinchi_tools.utils.
call_command
(args, debug=False)¶ Run a command as a subprocess and return the output
Parameters: - args – The command to execute as a string
- debug – Debug the command
Returns: The output of the command and its error code
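For orientation, a wrapper of this kind can be written with the standard subprocess module. The following is a sketch under assumptions, not the library's code: run_command is a hypothetical name, the command string is split with shlex (POSIX-style quoting), and the return value is taken to be a (stdout text, return code) pair.

```python
import shlex
import subprocess
import sys


def run_command(args, debug=False):
    """Run a command given as a string; return (stdout text, return code)."""
    argv = shlex.split(args)
    if debug:
        print("Running:", argv)
    completed = subprocess.run(
        argv,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,  # decode output to str
    )
    return completed.stdout, completed.returncode


# Example: ask the current Python interpreter to print a number.
output, code = run_command(shlex.quote(sys.executable) + ' -c "print(42)"')
```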
-
rinchi_tools.utils.
consolidate
(items)¶ Check that all non-empty items in an iterable are identical
Parameters: items – the iterable
Raises: ValueError
– Items are not all identical
Returns: the value shared by all the items in the list
Return type: value
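The behaviour described for consolidate can be sketched in a few lines. This is an illustration rather than the library's implementation: consolidate_sketch is a hypothetical name, "non-empty" is taken to mean truthy, and returning None when every item is empty is an assumption the documentation does not confirm.

```python
def consolidate_sketch(items):
    """Return the single value shared by all non-empty items.

    Raises ValueError if two non-empty items differ.
    """
    non_empty = [item for item in items if item]
    if not non_empty:
        return None  # assumption: nothing to consolidate
    first = non_empty[0]
    if any(item != first for item in non_empty[1:]):
        raise ValueError("Items are not all identical")
    return first


shared = consolidate_sketch(["1S", "", "1S", None])
```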
-
rinchi_tools.utils.
construct_output_text
(data, header_order=False)¶ Turns a list of dicts, a dict, or a dict of lists into a single string of data
Parameters: - data – The data variable
- header_order – Optional list of keys for the dictionaries. The list can contain keys that are not present.
Returns: The output as a text block
-
rinchi_tools.utils.
counter_to_print_string
(counter, name)¶ Formats counter for printing
Parameters: - counter – The Counter object
- name – Name of the data stored in the counter
-
rinchi_tools.utils.
create_output_file
(output_path, default_extension, create_out_dir=True)¶ Creates an output file
Parameters: - output_path – the path of the file to create
- default_extension – the extension to use for the file
- create_out_dir – Create an output directory
Returns: A tuple containing a file object and the path of the file object.
-
rinchi_tools.utils.
output
(text, output_path=False, default_extension=False)¶ Simple output wrapper to print or write outputs.
Parameters: - text – text input
- output_path – Specifies the filename for the output file
- default_extension – specifies the file extension to use if the output name has none
-
rinchi_tools.utils.
read_input_file
(input_path, filetype_check=False, return_file_object=False)¶ Reads an input path into a string
Parameters: - input_path – The path of the file to open
- filetype_check – Check type of file
- return_file_object – Return a file object instead of a string
Returns: A multi-line string or a file object
-
rinchi_tools.utils.
string_to_dict
(string)¶ Converts a string of form ‘a=1,b=2,c=3’ to a dictionary of form {a:1,b:2,c:3}
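A conversion of this form can be sketched as below. It is not the library's code: string_to_dict_sketch is a hypothetical name, and keeping both keys and values as strings is an assumption, since the documentation does not say whether values are converted to numbers.

```python
def string_to_dict_sketch(string):
    """Convert 'a=1,b=2,c=3' into {'a': '1', 'b': '2', 'c': '3'}."""
    result = {}
    for pair in string.split(","):
        if not pair.strip():
            continue  # skip empty segments such as a trailing comma
        key, _, value = pair.partition("=")
        result[key.strip()] = value.strip()
    return result


parsed = string_to_dict_sketch("a=1,b=2,c=3")
```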
Version 0.02 RInChIKey Generation Library Module¶
This module provides functions to create Long- and Short-RInChIKeys from RInChIs.
The supplied implementation of the inchi_2_inchikey function uses the InChIKey creation algorithm from OASA, a free python library for the manipulation of chemical formats, now stored permanently in the v02_inchi_key.py module.
Modifications:
C.H.G. Allen 2012
D.F. Hampshire 2016
Modified for Python3 compatibility
-
rinchi_tools.v02_rinchi_key.
rinchi_2_longkey
(rinchi)¶ Create Long-RInChIKey from a RInChI.
Parameters: rinchi – The RInChI from which to create the Long-RInChIKey.
Returns: The Long-RInChIKey of the RInChI.
-
rinchi_tools.v02_rinchi_key.
rinchi_2_shortkey
(rinchi)¶ Create a Short-RInChIKey from a RInChI.
Parameters: rinchi – The RInChI from which to create the Short-RInChIKey
Returns: The Short-RInChIKey of the RInChI
RInChI v0.02 to 1.00 Conversion Module¶
Modifications:
- D.F. Hampshire 2016
-
rinchi_tools.v02_tools.
convert_all
(rinchi, rauxinfo)¶ Convert a v0.02 RInChI & RAuxInfo into a v1.00 RInChI & RAuxInfo.
Parameters: - rinchi – A RInChI of version 0.02.
- rauxinfo – A RAuxInfo of version 0.02.
Returns: A tuple containing:
- rinchi: A RInChI of version 1.00.
- rauxinfo: A RAuxInfo of version 1.00.
-
rinchi_tools.v02_tools.
convert_rauxinfo
(rauxinfo)¶ Convert a v0.02 RAuxInfo into a v1.00 RAuxInfo.
Parameters: rauxinfo – A RAuxInfo of version 0.02.
Returns: A RAuxInfo of version 1.00.
-
rinchi_tools.v02_tools.
convert_rinchi
(rinchi)¶ Convert a v0.02 RInChI into a v1.00 RInChI.
Parameters: rinchi – A RInChI of version 0.02.
Returns: A RInChI of version 1.00.
-
rinchi_tools.v02_tools.
generate_rauxinfo
(rinchi)¶ Create RAuxInfo for a RInChI using a conversion function.
Parameters: rinchi – The RInChI for which to create the RAuxInfo.
Returns: The RAuxInfo of the RInChI
Raises: VersionError
– If the generated AuxInfos are not of the same version.