QSAR

Suprapto van Plaosan
Sep 20, 2023

QSAR (Quantitative Structure-Activity Relationship) is a computational approach to predict the biological activity of a compound based on its chemical structure. It uses mathematical and statistical methods to build a model from a set of compounds with known activities and then applies that model to predict the activity of new compounds.

QSAR models are typically built using a large dataset of compounds, their chemical structures and their corresponding activities. The chemical structures are usually represented as numerical descriptors that describe the chemical and physical properties of the compounds. The activities are usually represented as a numerical value, such as an IC50 (half-maximal inhibitory concentration) for a bioactivity assay.
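As a minimal sketch of the modeling step, the Python code below converts an IC50 to the logarithmic pIC50 scale (pIC50 = -log10 of the IC50 in mol/L, a common convention in QSAR work) and fits a one-descriptor linear model by ordinary least squares. The function names and the toy data are illustrative assumptions, not part of any QSAR package; the activities are generated from an exact straight line so the fit can be checked. Real QSAR models usually combine many descriptors and may use more flexible learning methods.

```python
import math

def pic50(ic50_molar: float) -> float:
    """Convert an IC50 given in mol/L to pIC50 = -log10(IC50)."""
    return -math.log10(ic50_molar)

def fit_linear(x: list[float], y: list[float]) -> tuple[float, float]:
    """Ordinary least squares fit of y = slope * x + intercept (one descriptor)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

def predict(model: tuple[float, float], x_new: float) -> float:
    """Apply the fitted line to the descriptor value of a new compound."""
    slope, intercept = model
    return slope * x_new + intercept

# Synthetic data: one hypothetical descriptor, activities lying exactly on y = 0.5x + 4.
descriptor = [1.0, 2.0, 3.0, 4.0]
activity = [4.5, 5.0, 5.5, 6.0]
model = fit_linear(descriptor, activity)
print(model)                # fitted slope and intercept
print(predict(model, 5.0))  # predicted activity for an untested compound
print(pic50(1e-6))          # an IC50 of 1 micromolar is a pIC50 of about 6
```

Working on the logarithmic scale is the usual choice because activities spanning several orders of magnitude then vary roughly linearly with structural changes.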

QSAR models can be used to predict a wide range of biological activities, such as toxicity, enzyme inhibition, receptor binding and drug efficacy. They can also be used to identify important structural features that are responsible for the activity of a compound, and to generate hypotheses about the mechanisms of action of the compounds.

QSAR models are widely used in drug discovery and development, as they allow for the prediction of the activity of new compounds before they are synthesized or tested experimentally. However, it’s important to note that the results of QSAR analysis are based on computational models, and they need to be validated experimentally to confirm their accuracy.

Converting a chemical structure to computational data involves the representation of the chemical structure in a format that can be used by computational software.

One common way of representing a chemical structure computationally is through molecular descriptors or fingerprints. These are numerical values that describe different properties of the chemical structure, such as its shape, size, charge, and chemical functionality. There are many different types of molecular descriptors available, such as topological indices, electrotopological state indices, and molecular dynamics descriptors.

Another way to represent a chemical structure computationally is through 3D coordinates, which can be obtained experimentally through X-ray crystallography or Nuclear Magnetic Resonance (NMR) spectroscopy, or generated computationally with conformer-generation and energy-minimization methods. These coordinates can be used for molecular modeling and simulation studies such as molecular docking or molecular dynamics.

The choice of the representation will depend on the computational method that will be applied, as well as the characteristics of the chemical structure. For example, if the goal is to compare the similarity between different compounds, molecular fingerprints would be useful, while if the goal is to predict the binding mode of a small molecule to a protein, the 3D coordinates would be more appropriate.

It’s important to note that once the chemical structure is converted to computational data, it can be used in various computational methods such as QSAR, molecular dynamics, molecular docking, and virtual screening.

Molecular Descriptors

Molecular descriptors are numerical values that describe different properties of a chemical structure. These properties can include physical properties, such as size, shape and charge, as well as chemical properties, such as the distribution of functional groups and the presence of certain chemical moieties. Molecular descriptors represent chemical structures in a numerical format that can be used by computational methods such as QSAR modeling, molecular dynamics, molecular docking, and virtual screening.

There are many different types of molecular descriptors available, each of which describes a different aspect of a chemical structure. Some common types of molecular descriptors include:

  1. Topological indices: These descriptors describe the connectivity of atoms in a molecule. Examples include the Wiener index and the Randic index.
  2. Electrotopological state indices (E-state indices): These descriptors combine electronic and topological information for each atom. They are computed from the molecular graph, using each atom’s valence electrons and its position in the structure, and are related to chemical reactivity.
  3. Fingerprints: These descriptors describe the presence of certain substructures or functional groups in a molecule. Examples include ECFPs (extended-connectivity fingerprints) and MACCS (Molecular ACCess System) keys.
  4. Molecular Dynamics Descriptors: These descriptors describe the shape and dynamic properties of a molecule, typically computed from molecular dynamics simulations. They include the radius of gyration, the asphericity, and the anisotropy.
  5. Physico-Chemical properties: These descriptors include lipophilicity, polarizability, and polar surface area.
  6. Quantum-Chemical Descriptors: These descriptors are calculated from quantum-chemical calculations and are related to the electronic properties of the molecule.

Topological indices

Topological indices are molecular descriptors that describe the connectivity of atoms in a chemical structure. They are computed from the molecular graph, in which atoms are vertices and bonds are edges (usually with hydrogen atoms omitted), so they capture which atoms are bonded to which but not the 3D geometry. Topological indices are used to represent chemical structures in a numerical format for computational methods such as QSAR modeling and virtual screening.

Some common types of topological indices include:

  1. Wiener index: It is defined as the sum of the shortest path lengths (in bonds) between all pairs of atoms in the hydrogen-suppressed molecular graph.
  2. Randic index: Also known as the connectivity index, it is defined as the sum over all bonds of (δi × δj)^(-1/2), where δi and δj are the degrees (numbers of heavy-atom neighbors) of the two bonded atoms.
  3. Zagreb index: The first Zagreb index is defined as the sum of the squares of the degrees of all atoms in a molecule.
  4. Balaban index (J): It is defined as J = m/(μ + 1) × Σ (Di × Dj)^(-1/2), summed over all bonds i-j, where m is the number of bonds, μ is the number of rings (the cyclomatic number), and Di is the distance sum of atom i (the sum of its shortest path lengths to all other atoms).
  5. Hyper-Wiener index: It is defined as half the sum, over all pairs of atoms i and j, of d + d^2, where d is the shortest path length between i and j.
  6. Szeged index: It is defined as the sum over all bonds i-j of the product ni × nj, where ni is the number of atoms closer to i than to j, and nj is the number of atoms closer to j than to i.
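Topological indices can be computed directly from the molecular graph. The Python sketch below (standard library only) represents a hydrogen-suppressed graph as an adjacency list, an input format assumed here for illustration (cheminformatics toolkits build this graph from a SMILES string or a structure file), computes all shortest path lengths by breadth-first search, and evaluates each index with its standard graph-theoretical definition. It assumes a single connected molecule with at least one bond; the function names are illustrative.

```python
from collections import deque
from itertools import combinations
from math import sqrt

def distance_matrix(adj):
    """All-pairs shortest path lengths (in bonds), one breadth-first search per atom."""
    n = len(adj)
    dist = [[0] * n for _ in range(n)]
    for source in range(n):
        seen = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in seen:
                    seen[v] = seen[u] + 1
                    queue.append(v)
        for target, d in seen.items():
            dist[source][target] = d
    return dist

def bonds(adj):
    """Each bond (edge) listed once, as a pair (i, j) with i < j."""
    return [(i, j) for i in range(len(adj)) for j in adj[i] if i < j]

def wiener(dist):
    return sum(dist[i][j] for i, j in combinations(range(len(dist)), 2))

def hyper_wiener(dist):
    return sum(dist[i][j] + dist[i][j] ** 2
               for i, j in combinations(range(len(dist)), 2)) / 2

def randic(adj):
    # Degree = number of heavy-atom neighbours in the hydrogen-suppressed graph.
    return sum(1 / sqrt(len(adj[i]) * len(adj[j])) for i, j in bonds(adj))

def zagreb_m1(adj):
    return sum(len(neighbours) ** 2 for neighbours in adj)

def balaban_j(adj, dist):
    m = len(bonds(adj))
    mu = m - len(adj) + 1              # cyclomatic number (ring count) of a connected graph
    dsum = [sum(row) for row in dist]  # distance sum of each atom
    return m / (mu + 1) * sum(1 / sqrt(dsum[i] * dsum[j]) for i, j in bonds(adj))

def szeged(adj, dist):
    atoms = range(len(adj))
    total = 0
    for i, j in bonds(adj):
        n_i = sum(1 for k in atoms if dist[k][i] < dist[k][j])  # atoms closer to i
        n_j = sum(1 for k in atoms if dist[k][j] < dist[k][i])  # atoms closer to j
        total += n_i * n_j
    return total

# n-butane, carbon skeleton C0-C1-C2-C3 (hydrogens suppressed).
butane = [[1], [0, 2], [1, 3], [2]]
d = distance_matrix(butane)
print(wiener(d), hyper_wiener(d), zagreb_m1(butane), szeged(butane, d))
print(round(randic(butane), 4), round(balaban_j(butane, d), 4))
```

For n-butane this gives a Wiener index of 10, a hyper-Wiener index of 15, a first Zagreb index of 10, a Szeged index of 10 (for acyclic graphs the Szeged and Wiener indices coincide), a Randic index of about 1.914, and a Balaban J of about 1.975.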

Topological indices are used to compare the similarity between different chemical structures and to identify structural features that are important for a particular biological activity. They can also be used to predict properties such as the melting point and boiling point of a compound. However, it’s important to note that predictions based on topological indices come from computational models, and they need to be validated experimentally to confirm their accuracy.

Electrotopological state indices

Electrotopological state indices (E-state indices) are molecular descriptors that encode the electronic character of each atom together with its topological environment in the molecule, and they are related to chemical reactivity. E-state indices are used to represent chemical structures in a numerical format for computational methods such as QSAR modeling and virtual screening.

E-state indices are calculated from the hydrogen-suppressed molecular graph, not from quantum-chemical electron densities. In the Kier-Hall formulation, each atom i first receives an intrinsic state I(i) = ((2/N)^2 × δv + 1)/δ, where N is the principal quantum number of the atom, δ is its number of heavy-atom neighbors, and δv is its number of valence electrons minus its number of attached hydrogens. The E-state value of atom i is then S(i) = I(i) + Σ (I(i) - I(j))/r(i,j)^2, summed over all other atoms j, where r(i,j) is the topological distance between i and j plus one. The most common E-state descriptors include:

  1. Atom E-state values: The value S(i) computed for each individual atom in the molecule.
  2. Atom-type E-state indices: Sums of the E-state values of all atoms of a given type (for example, all -OH oxygens), which give a fixed-length descriptor vector suitable for QSAR.
  3. Hydrogen E-state indices (HE-state): An analogous index computed for the hydrogen atoms attached to each heavy atom, reflecting their polarity.
  4. Bond E-state indices: An extension of the formalism in which bonds, rather than atoms, are the units being characterized.
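As a sketch of the Kier-Hall E-state calculation (an intrinsic state per atom plus distance-weighted perturbations from all other atoms, with r equal to the topological distance plus one), the Python code below computes atom E-state values for a hydrogen-suppressed graph. It is a simplified illustration: it only handles second-row atoms such as C, N, and O (principal quantum number 2) in molecules with at least two heavy atoms, and its input format, a list of (valence electrons, attached hydrogens) pairs plus an adjacency list, is an assumption. Toolkits such as RDKit include their own E-state implementations, which also handle heavier atoms.

```python
from collections import deque

def topological_distances(adj, source):
    """Shortest path length (in bonds) from source to every atom, by breadth-first search."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def estate_values(atoms, adj, principal_n=2):
    """Atom E-state values S_i = I_i + sum_j (I_i - I_j) / r_ij**2.

    atoms: list of (valence_electrons, attached_hydrogens) per heavy atom.
    adj:   adjacency list of the hydrogen-suppressed graph (assumed connected).
    principal_n: principal quantum number; 2 is valid only for second-row atoms.
    """
    intrinsic = []
    for i, (z_valence, n_h) in enumerate(atoms):
        delta = len(adj[i])        # number of heavy-atom neighbours
        delta_v = z_valence - n_h  # valence electrons minus attached hydrogens
        intrinsic.append(((2 / principal_n) ** 2 * delta_v + 1) / delta)
    values = []
    for i in range(len(atoms)):
        dist = topological_distances(adj, i)
        perturbation = sum(
            (intrinsic[i] - intrinsic[j]) / (dist[j] + 1) ** 2  # r_ij = distance + 1
            for j in range(len(atoms)) if j != i
        )
        values.append(intrinsic[i] + perturbation)
    return values

# Ethanol, CH3-CH2-OH: heavy atoms C0, C1, O2.
ethanol_atoms = [(4, 3), (4, 2), (6, 1)]
ethanol_adj = [[1], [0, 2], [1]]
print([round(s, 4) for s in estate_values(ethanol_atoms, ethanol_adj)])
```

For ethanol this gives about 1.681 for the methyl carbon, 0.25 for the methylene carbon, and 7.569 for the oxygen, which carries the largest E-state value. Because the perturbation terms cancel in pairs, the E-state values always sum to the sum of the intrinsic states. An atom-type index such as the one for -OH oxygens is then the sum of S over all atoms of that type.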

E-state indices are used to predict properties such as the acidity, basicity, and reactivity of a compound. They can also be used to compare the similarity between different chemical structures and to identify structural features that are important for a particular biological activity. However, it’s important to note that predictions based on E-state indices come from computational models, and they need to be validated experimentally to confirm their accuracy.

Fingerprints

Fingerprints are a type of molecular descriptor that describes the presence of certain chemical moieties or functional groups in a chemical structure. They are used to represent chemical structures in a numerical format for computational methods such as QSAR modeling, similarity searching, and virtual screening.

Fingerprints are typically created by converting a chemical structure into a binary vector (bit string), where each bit records the presence or absence of a specific structural feature in the molecule. In structural-key fingerprints, each bit corresponds to a predefined substructure; in hashed fingerprints, features enumerated from the structure (such as linear paths, circular atom environments, or atom pairs) are mapped onto bit positions by a hash function, so a single bit can be shared by several features. Different fingerprint types therefore capture different aspects of a chemical structure.

Some common types of fingerprints include:

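  1. Extended-Connectivity Fingerprints (ECFPs): Encode circular atom environments, i.e., each atom together with its neighbors out to a chosen radius, hashed into a fixed-length bit vector (ECFP4, for example, covers environments with a diameter of four bonds).
  2. MACCS keys (Molecular ACCess System): Structural keys in which each bit corresponds to a predefined substructure; the commonly used public set contains 166 keys.
  3. Daylight fingerprints: Hashed path-based fingerprints that enumerate linear paths of atoms and bonds up to a fixed length (seven bonds by default).
  4. Morgan fingerprints: Circular fingerprints based on the Morgan algorithm, as implemented in RDKit; with the default atom invariants they are closely analogous to ECFPs.
  5. Atom pair fingerprints: Encode every pair of atoms by their atom types and the shortest-path (topological) distance between them.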
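To make the hashing idea concrete, here is a simplified atom pair fingerprint in Python. It is a sketch under stated assumptions: atom types are reduced to the element symbol plus the heavy-atom degree (published atom pair fingerprints also encode properties such as the number of pi electrons), and each (type, type, distance) triple is hashed with SHA-1 from the standard library and folded onto a fixed number of bits. The function names and the 2048-bit length are illustrative choices; for real work a toolkit implementation such as RDKit's should be preferred.

```python
import hashlib
from collections import deque

def bfs_distances(adj, source):
    """Topological distance (in bonds) from source to each reachable atom."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def atom_pair_bits(elements, adj, n_bits=2048):
    """Set of 'on' bit positions of a simplified, folded atom pair fingerprint."""
    types = [f"{el}{len(adj[i])}" for i, el in enumerate(elements)]  # e.g. 'C2'
    bits = set()
    for i in range(len(elements)):
        dist = bfs_distances(adj, i)
        for j in range(i + 1, len(elements)):
            if j not in dist:
                continue  # atoms in disconnected fragments have no path
            a, b = sorted((types[i], types[j]))  # make the pair order-independent
            feature = f"{a}|{b}|{dist[j]}".encode()
            digest = hashlib.sha1(feature).digest()
            bits.add(int.from_bytes(digest[:4], "big") % n_bits)  # fold onto n_bits
    return bits

# Ethanol (C-C-O) written in two different atom orders gives the same fingerprint.
fp1 = atom_pair_bits(["C", "C", "O"], [[1], [0, 2], [1]])
fp2 = atom_pair_bits(["O", "C", "C"], [[1], [0, 2], [1]])
print(fp1 == fp2)
```

A cryptographic hash is used instead of Python's built-in `hash()` because string hashing is randomized per process, which would make fingerprints irreproducible between runs. Folding onto a fixed number of bits means unrelated features can occasionally collide on the same bit.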

Fingerprints are widely used to compare the similarity between different chemical structures and to identify structural features that are important for a particular biological activity. They are also used in virtual screening to identify potential drug candidates based on their similarity to known active compounds. However, it’s important to note that predictions based on fingerprints come from computational models, and they need to be validated experimentally to confirm their accuracy.
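Similarity between two fingerprints is most often measured with the Tanimoto (Jaccard) coefficient: the number of bits set in both fingerprints divided by the number of bits set in either. The sketch below represents each fingerprint as a Python set of "on" bit positions (an assumption for readability; toolkits use packed bit vectors) and ranks a small library against a query, which is the core of similarity-based virtual screening. The compound names and bit positions are hypothetical placeholders.

```python
def tanimoto(fp_a: set[int], fp_b: set[int]) -> float:
    """Tanimoto coefficient |A & B| / |A | B| for fingerprints given as sets of on-bits."""
    union = fp_a | fp_b
    if not union:
        return 0.0  # convention chosen here for two empty fingerprints
    return len(fp_a & fp_b) / len(union)

def rank_by_similarity(query: set[int], library: dict[str, set[int]]) -> list[tuple[str, float]]:
    """Sort library compounds by Tanimoto similarity to the query, most similar first."""
    scores = [(name, tanimoto(query, fp)) for name, fp in library.items()]
    return sorted(scores, key=lambda item: item[1], reverse=True)

# Hypothetical fingerprints (bit positions), for illustration only.
query = {1, 5, 9, 12}
library = {"cmpd_A": {1, 5, 9, 12}, "cmpd_B": {1, 5, 30}, "cmpd_C": {40, 41}}
for name, score in rank_by_similarity(query, library):
    print(name, round(score, 3))
```

Here cmpd_A matches the query exactly (1.0), cmpd_B shares two of five distinct bits (0.4), and cmpd_C shares none (0.0). In a screening campaign, the top-ranked compounds would be the candidates sent for experimental testing.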

Molecular Dynamics Descriptors

Molecular dynamics (MD) descriptors are a type of molecular descriptor that describe the dynamic properties of a chemical structure. They are calculated from molecular dynamics simulations and are used as numerical inputs to computational methods such as QSAR modeling and virtual screening.

Some common types of MD descriptors include:

  1. Radius of gyration: It is the measure of the size of a molecule, calculated as the root mean square distance of all atoms from the center of mass of the molecule.
  2. Asphericity: It is a measure of how much the shape of a molecule deviates from a perfect sphere, computed from the eigenvalues λ1 ≤ λ2 ≤ λ3 of the gyration tensor as b = λ3 - (λ1 + λ2)/2, which is zero for a spherically symmetric mass distribution.
  3. Anisotropy: The relative shape anisotropy is a normalized shape measure, κ² = 1 - 3(λ1λ2 + λ2λ3 + λ3λ1)/(λ1 + λ2 + λ3)², which is 0 when all three eigenvalues are equal and 1 when all atoms lie on a straight line.
  4. RMSF (Root Mean Square Fluctuation): It is a measure of the flexibility of a molecule, calculated for each atom as the root mean square deviation of its position from its average position in the simulation.
  5. Order parameter: It is a measure of orientational order, commonly defined as S = ⟨(3cos²θ - 1)/2⟩, where θ is the angle between a bond vector or molecular axis and a reference direction, averaged over the simulation.
  6. Principal Component Analysis (PCA): It is a statistical method that identifies the dominant directions of motion of a molecule, computed as the eigenvectors of the covariance matrix of the atom positions after the simulation frames have been superimposed to remove overall translation and rotation.

MD descriptors can be used to predict properties such as solubility, stability, and protein-ligand binding, and to identify structural features that are important for a particular biological activity. However, it’s important to note that predictions based on MD descriptors come from computational models, and they need to be validated experimentally to confirm their accuracy.

Physico-Chemical properties

Physico-chemical properties are a type of molecular descriptor that describe the physical and chemical properties of a chemical structure. They are calculated or estimated from the chemical structure of the molecule and are used to represent chemical structures in a numerical format for computational methods such as QSAR modeling, molecular docking, and virtual screening.

Some common types of physico-chemical properties include:

  1. Lipophilicity: It is a measure of a molecule’s affinity for a nonpolar, lipid-like environment relative to water, usually expressed as log P, the logarithm of the partition coefficient of the molecule between n-octanol and water. Computational methods estimate log P from the structure, for example by summing atom or fragment contributions.
  2. Polarizability: It is a measure of how easily the electron cloud of a molecule is distorted by an external electric field; it can be estimated from atomic or fragment contributions or calculated quantum-chemically.
  3. Polar surface area (PSA): It is the surface area contributed by polar atoms, mainly nitrogen and oxygen and the hydrogens attached to them. The topological PSA (TPSA) approximates it by summing tabulated fragment contributions, without requiring 3D coordinates.
  4. Hydrogen bond donors: It is a measure of the ability of a molecule to donate hydrogen bonds, in the simplest (Lipinski-style) convention counted as the number of N-H and O-H bonds.
  5. Hydrogen bond acceptors: It is a measure of the ability of a molecule to accept hydrogen bonds, in the simplest (Lipinski-style) convention counted as the number of nitrogen and oxygen atoms.
  6. Molecular weight: It is the mass of a molecule, calculated as the sum of the atomic weights of all atoms in the molecule.
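The counting descriptors in this list can be computed from a simple atom list. The Python sketch below assumes a hypothetical input format in which each heavy atom is given as an (element, attached hydrogens) pair. It sums standard atomic weights for the molecular weight, counts hydrogens on N and O as donors and N and O atoms as acceptors (the simple Lipinski-style convention), and computes log P from a pair of equilibrium concentrations. Estimating log P or polar surface area from structure alone requires fragment-contribution tables, which are not included here.

```python
import math

# Standard atomic weights (g/mol) for the elements handled in this sketch.
ATOMIC_WEIGHT = {"H": 1.008, "C": 12.011, "N": 14.007, "O": 15.999}

def molecular_weight(atoms):
    """atoms: list of (element, attached_hydrogens) for each heavy atom."""
    return sum(ATOMIC_WEIGHT[el] + n_h * ATOMIC_WEIGHT["H"] for el, n_h in atoms)

def h_bond_donors(atoms):
    """Lipinski-style count: hydrogens attached to nitrogen or oxygen."""
    return sum(n_h for el, n_h in atoms if el in ("N", "O"))

def h_bond_acceptors(atoms):
    """Lipinski-style count: nitrogen and oxygen atoms."""
    return sum(1 for el, _ in atoms if el in ("N", "O"))

def log_p(conc_octanol, conc_water):
    """log10 of the octanol/water partition coefficient from equilibrium concentrations."""
    return math.log10(conc_octanol / conc_water)

ethanol = [("C", 3), ("C", 2), ("O", 1)]
print(round(molecular_weight(ethanol), 3), h_bond_donors(ethanol), h_bond_acceptors(ethanol))
```

For ethanol this gives a molecular weight of about 46.069 g/mol, one donor, and one acceptor. A compound that is 100 times more concentrated in octanol than in water at equilibrium has a log P of 2.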

These descriptors can be used to predict properties such as solubility, stability, and protein-ligand binding, and to identify structural features that are important for a particular biological activity. However, it’s important to note that predictions based on physico-chemical descriptors come from computational models, and they need to be validated experimentally to confirm their accuracy.

Quantum-Chemical Descriptors

Quantum-chemical descriptors are a type of molecular descriptor that describe the electronic properties of a chemical structure. They are calculated using quantum-chemical calculations and are used as numerical inputs to computational methods such as QSAR modeling and virtual screening.

Some common types of quantum-chemical descriptors include:

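  1. Molecular orbitals (MOs): They are one-electron wavefunctions obtained by approximately solving the Schrödinger equation for a molecule, for example with Hartree-Fock or density functional theory; their energies and coefficients are used as descriptors.
  2. Electronic density of states (EDOS): It is the distribution of the electronic states of a molecule over energy, i.e., the number of electronic states per unit energy interval; for a molecule, it is usually obtained by broadening the discrete orbital energies.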
  3. HOMO (highest occupied molecular orbital): It is the highest-energy occupied molecular orbital of a molecule.
  4. LUMO (lowest unoccupied molecular orbital): It is the lowest-energy unoccupied molecular orbital of a molecule.
  5. Ionization potential (IP): It is the energy required to remove an electron from a molecule.
  6. Electron affinity (EA): It is the energy released when an electron is added to a molecule.
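Once a quantum-chemical calculation has produced orbital energies, several of the descriptors above follow from simple arithmetic. The Python sketch below computes the HOMO-LUMO gap, the Koopmans' theorem estimates IP ≈ -E(HOMO) and EA ≈ -E(LUMO) (approximations justified in Hartree-Fock theory; more accurate values come from energy differences between the neutral molecule and its ions), and a Gaussian-broadened density of states. It assumes a closed-shell molecule with orbital energies in hartree sorted in ascending order; the orbital energies and the broadening width are made-up placeholders, not results for any real molecule.

```python
import math

HARTREE_TO_EV = 27.211386245988  # CODATA 2018 value

def frontier_descriptors(orbital_energies_hartree, n_occupied):
    """HOMO/LUMO energies, gap, and Koopmans' estimates of IP and EA, all in eV.

    orbital_energies_hartree: orbital energies sorted ascending (closed-shell assumption).
    n_occupied: number of doubly occupied orbitals.
    """
    e_homo = orbital_energies_hartree[n_occupied - 1] * HARTREE_TO_EV
    e_lumo = orbital_energies_hartree[n_occupied] * HARTREE_TO_EV
    return {
        "homo": e_homo,
        "lumo": e_lumo,
        "gap": e_lumo - e_homo,
        "ip_koopmans": -e_homo,
        "ea_koopmans": -e_lumo,
    }

def density_of_states(energy, orbital_energies, sigma=0.1):
    """Gaussian-broadened DOS at `energy` (same units as orbital_energies and sigma)."""
    norm = 1.0 / (sigma * math.sqrt(2 * math.pi))
    return sum(norm * math.exp(-((energy - e) ** 2) / (2 * sigma ** 2)) for e in orbital_energies)

# Placeholder orbital energies (hartree), with two occupied orbitals.
energies = [-0.60, -0.35, 0.05, 0.20]
desc = frontier_descriptors(energies, n_occupied=2)
print({k: round(v, 3) for k, v in desc.items()})
```

Each Gaussian in the density of states integrates to one, so the area under the broadened curve equals the number of orbitals included; the width sigma only controls how smooth the resulting curve looks.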

Quantum-chemical descriptors can be used to predict properties such as the acidity, basicity, and reactivity of a compound. They can also be used to compare the similarity between different chemical structures and to identify structural features that are important for a particular biological activity. However, it’s important to note that predictions based on quantum-chemical descriptors come from computational models, and they need to be validated experimentally to confirm their accuracy.
