This example illustrates how to use MATLAB® together with RDKit for molecular data processing and analysis, focusing on lipophilicity (LogP) and aqueous solubility (LogS) data. These parameters are crucial in molecular design and drug discovery because they strongly affect a compound's pharmacokinetics and bioavailability. Understanding and optimizing these properties can enhance a drug's therapeutic efficacy through improved absorption, distribution, and interaction with biological targets.
This example demonstrates how to import a database containing such data in .csv format. The data, as reported in Ref. [1], can be accessed via the link provided in that paper.
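As a rough illustration of the import step, the sketch below parses a small in-memory CSV using only Python's standard library. It is not the code from the Live Script. The column names (smiles, logP, logS) and the numeric values are placeholders invented for illustration, so check the header of the actual file from Ref. [1] before adapting it.

```python
import csv
import io

# Placeholder data: the column names and numbers below are made up for
# illustration and are NOT taken from the dataset in Ref. [1].
SAMPLE_CSV = """smiles,logP,logS
CCO,0.1,0.5
c1ccccc1,2.0,-1.5
CCCCCCCC,5.0,-5.0
"""


def load_records(text):
    """Parse CSV text into a list of dicts with numeric logP/logS fields."""
    records = []
    for row in csv.DictReader(io.StringIO(text)):
        records.append({
            "smiles": row["smiles"],
            "logP": float(row["logP"]),
            "logS": float(row["logS"]),
        })
    return records


records = load_records(SAMPLE_CSV)
print(len(records), "molecules loaded")
```

To read a file on disk instead, the same `csv.DictReader` can be applied to an open file handle in place of the `io.StringIO` wrapper.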
The example also shows how to visualize any selected molecule from the SMILES strings in the loaded database, along with its corresponding LogP and LogS values. The database is then partitioned based on the normalized LogP and LogS values, as shown in the figure above.
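One possible reading of the partitioning step is sketched below in plain Python. It assumes min-max normalization and a threshold of 0.5 on each axis, which yields up to four groups. Both choices are assumptions made for illustration, as are the record layout and the numeric values; the Live Script may normalize and split the data differently.

```python
def min_max_normalize(values):
    """Rescale values linearly to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values identical: map everything to 0 to avoid dividing by zero.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def partition(records, threshold=0.5):
    """Group SMILES strings by normalized logP and logS.

    ASSUMPTION: a single threshold on each normalized axis; this is an
    illustrative scheme, not necessarily the one used in the Live Script.
    """
    norm_p = min_max_normalize([r["logP"] for r in records])
    norm_s = min_max_normalize([r["logS"] for r in records])
    bins = {}
    for rec, p, s in zip(records, norm_p, norm_s):
        key = (
            "high_logP" if p >= threshold else "low_logP",
            "high_logS" if s >= threshold else "low_logS",
        )
        bins.setdefault(key, []).append(rec["smiles"])
    return bins


# Placeholder records: the values are invented, not measured data.
records = [
    {"smiles": "CCO", "logP": 0.0, "logS": 1.0},
    {"smiles": "c1ccccc1", "logP": 2.0, "logS": -1.0},
    {"smiles": "CCCCCCCC", "logP": 4.0, "logS": -3.0},
]

bins = partition(records)
for key, members in bins.items():
    print(key, members)
```

Normalizing first puts LogP and LogS on a common scale, so one threshold can be applied to both axes even though their raw ranges differ.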
Check out the accompanying video to see how this code works in action!
To run this example, use this MATLAB Live Script:
Import_Visualize_and_Partition_Molecular_Datasets.mlx
MathWorks Products (https://www.mathworks.com)
Set up your Python environment by following the instructions in the guide on the Python webpage. Make sure to give MATLAB the path to your Python executable, and check which versions of Python are compatible with your MATLAB release. This allows MATLAB to build a working interface to Python.
This example uses some functions from RDKit. RDKit can be installed on Linux, Windows, and macOS by following its installation instructions, for example with pip install rdkit.
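Before switching to MATLAB, it can help to confirm what the Python side looks like. The short script below is a minimal sketch using only the standard library. It reports the interpreter path (the value you would typically pass to MATLAB, for instance through the Version option of pyenv), the Python version to compare against the MATLAB compatibility table, and whether the rdkit package can be found. The helper name rdkit_available is hypothetical.

```python
import importlib.util
import sys


def rdkit_available():
    """Return True if the rdkit package can be located by this interpreter."""
    return importlib.util.find_spec("rdkit") is not None


# Path of the running interpreter; this is the Python "address" MATLAB needs.
print("Python executable:", sys.executable)

# Major.minor version, to compare with the versions your MATLAB release supports.
print(f"Python version: {sys.version_info.major}.{sys.version_info.minor}")

# False here usually means RDKit was installed into a different environment.
print("RDKit importable:", rdkit_available())
```

Run the script with the same interpreter you intend to use from MATLAB, since a different environment may give a different result.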
Follow the steps provided in this MATLAB Live Script:
Import_Visualize_and_Partition_Molecular_Datasets.mlx
Try the exercises provided at the end of this example:
Investigate Correlation: Is there a correlation between the similarities in molecular structures and their corresponding LogP and LogS values?
Analyze Partitioned Data: Load each of the newly saved partitioned datasets and visualize 10 sample molecules from each one. Do you observe any similarities among them?
[1] Oliver Wieder et al., "Improved Lipophilicity and Aqueous Solubility Prediction with Composite Graph Neural Networks", Molecules 2021, 26, 6185.
The license is available in the License.txt file in this GitHub repository.
Copyright 2024 The MathWorks, Inc.