NMR Prediction

July 22, 2008

False Negatives and False Positives are Waiting...

Derek Lowe at In the Pipeline had a great post the other day about the dangers of not quality-checking those fine-looking starting compounds for your project. Chemistry happens, and yes, so do mistakes.

In fact, it appears that Derek has been on a personal QC kick as of late.

I Can Has Ugly Molecules?

Oops.

I thought this would once again be a good opportunity to provide you with a link to a poster Sergey Golotvin presented at ENC 2008 entitled, "Validating the Quality of Large Collections of NMR Spectra Automatically".

Long story short, 15,000 1H NMR spectra from the Aldrich collection were evaluated fully automatically, and the software was able to confirm 88% of the collection as having chemical structures consistent with their respective spectra. In addition, 4% were flagged by the software as inconsistent. A closer, manual look at those flagged spectra revealed that there were indeed some truly wrong structures (or incorrect tautomers) in the collection.

This was evaluating the 1H NMR data only. Using additional 2D experiments, such as HSQC, will likely improve these results.

This is just one example of a check an organization can build into its process, for instance as additional QC of its registration database.

Is it perfect? Absolutely not. There are perhaps a few more false positives (incorrect structures) in there that the software didn't catch, and of course the software produced some false negatives as well. These are annoying because presumably someone has to look over them manually, only to realize that they were the right structure all along. But at least this doesn't involve manually poring over 15,000 spectra!

We continue to run these datasets, and we actually have a consortium of several NMR experts in the industry, which we call ASCI (Automated Structure Confirmation Initiative), where we are testing and validating this technology in the real pharmaceutical world. The goal is to identify the common areas where false negatives and false positives occur and to address them with algorithms.

Will we ever solve all the problems, especially in the world of novel chemistry? Of course not, and for that matter there are some existing problems that appear to be too hard to solve.

That being said, what is an acceptable rate of false positives and false negatives for automated software verification of the registered compounds in a library?

Interested in hearing your thoughts.

May 01, 2008

Automated Structure Verification by 1H NMR Only

I've blogged several times about the progress and applications of automated structure verification with the help of ACD/Labs software.

There are really two main approaches right now:

  1. Combined verification, which automatically verifies the correspondence between a proposed chemical structure and its 1D 1H and 2D HSQC spectra. I've blogged about the publications and applications of this method previously.
  2. 1H-only verification. Of course, the first approach is preferable from an accuracy standpoint, as the additional information gained from the HSQC spectrum increases the selectivity and specificity of the results.
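To illustrate why the HSQC adds selectivity, here is a toy sketch of the combined idea. It is not ACD/Labs' algorithm: it simply assumes that every predicted (1H, 13C) correlation for a protonated carbon must be matched by a distinct experimental HSQC cross peak, using hypothetical tolerances of 0.3 ppm (1H) and 3 ppm (13C) and greedy matching. The function names and shift values are illustrative only.

```python
from typing import List, Tuple

# One HSQC correlation: (1H shift in ppm, 13C shift in ppm).
Peak = Tuple[float, float]

def hsqc_consistent(predicted: List[Peak], observed: List[Peak],
                    h_tol: float = 0.3, c_tol: float = 3.0) -> bool:
    """True if every predicted C-H correlation matches a distinct observed
    cross peak within the (assumed) 1H and 13C tolerances.

    Greedy one-to-one matching: good enough for a sketch, not optimal."""
    unused = list(observed)
    for h_pred, c_pred in predicted:
        hits = [pk for pk in unused
                if abs(pk[0] - h_pred) <= h_tol and abs(pk[1] - c_pred) <= c_tol]
        if not hits:
            return False

        # Closest hit, with each axis scaled by its own tolerance.
        def score(pk: Peak) -> float:
            return abs(pk[0] - h_pred) / h_tol + abs(pk[1] - c_pred) / c_tol

        unused.remove(min(hits, key=score))
    return True

def combined_verdict(passes_1h: bool, passes_hsqc: bool) -> str:
    """Combined verification: a structure must pass both checks."""
    return "consistent" if passes_1h and passes_hsqc else "flag for review"

# Illustrative values only: observed cross peaks and predictions for two candidates.
observed = [(4.12, 60.4), (2.04, 21.0), (1.26, 14.2)]
candidate_a = [(4.10, 61.0), (2.01, 20.8), (1.24, 14.0)]
candidate_b = [(3.70, 51.5), (2.01, 20.8), (1.24, 14.0)]

print(combined_verdict(True, hsqc_consistent(candidate_a, observed)))  # consistent
print(combined_verdict(True, hsqc_consistent(candidate_b, observed)))  # flag for review
```

Both candidates are assumed to pass the 1H check (True is passed in directly); only the extra 13C dimension separates them, which is the intuition behind the higher selectivity of the combined approach.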

However, I always get questions on how well we perform on 1H NMR only because in some organizations and environments it is simply not feasible to always run an HSQC in tandem with a routine 1H NMR analysis.

As mentioned in previous posts, we've already conducted a validation study on this approach and it was published in a 2006 article in MRC. We've continued to investigate and validate this approach and we recently presented our latest results using 1D 1H NMR data only at ENC 2008.

The poster highlighted a study on the automatic evaluation of over 15,000 compounds and their spectra from the Aldrich NMR Spectra Database.

The results of this study revealed that the software was able to confirm 88% of all spectra as consistent and flagged less than 5% as inconsistent.

[Figure: summary of the Aldrich verification results]

One of the more interesting discoveries in this study was that it revealed some truly wrong structures in the Aldrich NMR database.

More information on these as well as shortcomings in prediction, processing, and analysis are provided in the poster that can be downloaded:

http://www.acdlabs.com/download/publ/2008/enc08_aldrich.pdf

January 30, 2008

What's New in Version 11- Improved 1D NMR Structure Verification Accuracy

Automated structure verification using ACD/Labs software compares the chemical shifts, intensities, and multiplicities of the signals in an experimental NMR spectrum with those of the NMR spectrum predicted for a proposed structure.

Naturally, in order for this process to be effective the chemical shift prediction, multiplet characterization, and integration measurements must be accurate. This process is described in detail in a 2006 publication in MRC.
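The comparison described above can be sketched as a toy scoring function. This is not the algorithm from the MRC paper; it simply assumes that each predicted proton environment is a (shift, number of protons, multiplicity) triple and checks it against experimental multiplets using a hypothetical 0.3 ppm shift tolerance and a ±0.5 H integration tolerance. The class, function, and shift values are illustrative.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Multiplet:
    shift: float        # chemical shift of the multiplet centre, ppm
    n_h: float          # integral, in protons
    multiplicity: str   # "s", "d", "t", "q", ... or "m" for unresolved

def verify(predicted: List[Multiplet], experimental: List[Multiplet],
           shift_tol: float = 0.3, int_tol: float = 0.5) -> List[str]:
    """Compare predicted and experimental multiplets.

    Returns a list of problems; an empty list means 'consistent'."""
    problems = []
    # The total integral should match the proton count of the structure.
    total_pred = sum(m.n_h for m in predicted)
    total_exp = sum(m.n_h for m in experimental)
    if abs(total_pred - total_exp) > int_tol:
        problems.append(f"total integral {total_exp:g}H, expected {total_pred:g}H")
    unused = list(experimental)
    for p in predicted:
        near = [e for e in unused if abs(e.shift - p.shift) <= shift_tol]
        if not near:
            problems.append(f"no multiplet near {p.shift:.2f} ppm")
            continue
        e = min(near, key=lambda m: abs(m.shift - p.shift))  # closest in shift
        unused.remove(e)
        if abs(e.n_h - p.n_h) > int_tol:
            problems.append(f"{e.shift:.2f} ppm integrates to {e.n_h:g}H, expected {p.n_h:g}H")
        # An unresolved "m" is treated as compatible with any pattern.
        if "m" not in (e.multiplicity, p.multiplicity) and e.multiplicity != p.multiplicity:
            problems.append(f"{e.shift:.2f} ppm is a {e.multiplicity}, expected {p.multiplicity}")
    return problems

# Illustrative shifts: an experimental spectrum and predictions for two isomers.
experimental = [Multiplet(4.12, 2, "q"), Multiplet(2.04, 3, "s"), Multiplet(1.26, 3, "t")]
ethyl_acetate = [Multiplet(4.08, 2, "q"), Multiplet(2.00, 3, "s"), Multiplet(1.22, 3, "t")]
methyl_propanoate = [Multiplet(3.67, 3, "s"), Multiplet(2.32, 2, "q"), Multiplet(1.14, 3, "t")]

print(verify(ethyl_acetate, experimental))      # []
print(verify(methyl_propanoate, experimental))  # three problems
```

Both candidates are C4H8O2 isomers, so the total integral alone cannot tell them apart; it is the shift, integration, and multiplicity of the individual multiplets that flag the wrong one. This is also why accurate multiplet analysis matters so much to the result.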

As mentioned in the previous post, in version 11 the automated multiplet analysis algorithm in ACD/Labs software has been significantly improved.

The purpose of this post is to show how much the improvements to the automated multiplet analysis algorithm in version 11 impact the performance of automated verification of 1H NMR spectra.

The following results compare the verification improvements in version 11 based on the multiplet analysis enhancements ONLY. In version 11 we made several other improvements in NMR prediction and analysis, and I will get to them in a future post. For now:

For this study, two different data sets each consisting of the 1H NMR spectra of 30 samples and the correct chemical structures were automatically processed, analyzed, and evaluated by the software.

Test Set 1- A set of 30 spectra and their respective chemical structures with reasonably good signal-to-noise:

[Figures: verification results for the good signal-to-noise set, version 10 vs. version 11]

Test Set 2- A set of 30 spectra and their respective chemical structures with lower signal-to-noise:

[Figures: verification results for the lower signal-to-noise set, version 10 vs. version 11]

As you can clearly see, the improvements in multiplet analysis in version 11 heavily impact the performance of automated verification of 1H NMR data in both datasets. Based on the datasets used for this study, the software (employing the version 11 multiplet analysis algorithm) was able to correctly confirm the consistency between the proposed chemical structure and the experimental spectrum ~60% of the time in both datasets.

For a real world application of this system, check out a previous post that described Anthony Macherone's workflow at ASDI (and a link to his presentation).

The next question is, "If you use more data, how well does this system perform?"

A future post will describe a study that highlights the latest performance statistics of a "combined verification" approach that can automatically identify correct and incorrect chemical structures based on their 1D 1H and 2D HSQC NMR data. This system was described in a 2007 publication in MRC.

Stay Tuned.

January 09, 2008

ACD/1D NMR Assistant Part 2- Assigning NMR Data

The movie below (if you receive this by email you will have to come to the blog) highlights the unique Multiplet Assignment Preview available in ACD/1D NMR Assistant.

This feature helps users evaluate potential assignments by considering the chemical shifts, multiplet properties, and integration values of experimental multiplets in the spectrum.

Just another reason to embrace NMR software. A piece of paper can't do this:

If you can't get the above video to work, or if you want a bigger version, click here to watch the demonstration.

The above example is just for illustration purposes. The best way to evaluate the software on your own data is to go to the ACD/Labs website and request a free trial.

Stay Tuned....next up, the structure verification algorithm in ACD/1D NMR Assistant!

December 07, 2007

Comparing NMR Prediction Approaches

A new article is now available on JCIM ASAP entitled,

"Toward More Reliable 13C and 1H Chemical Shift Prediction: A Systematic Comparison of Neural-Network and Least-Squares Regression Based Approaches"
10.1021/ci700256n

A very interesting read for those interested in the performance of NMR predictions.

Some really nice conclusions on both the speed and accuracy of different NMR prediction approaches using our very own ACD/CNMR Predictor.

As Tony is an author of this piece, head on over to his blog for his synopsis on the work as well as the abstract.

November 14, 2007

New Product Time! Introducing ACD/1D NMR Assistant

I apologize for not posting much lately, but things have been pretty busy in the ACD/Labs NMR world.

I have alluded to this moment in some previous posts for a while now, and I am happy to unveil a new NMR product to be offered by ACD/Labs.

ACD/1D NMR Assistant.

So first and foremost, what is it?

I have talked a lot about how synthetic chemists currently use ACD/Labs software and blogged about the benefits and key features in the software.

The bottom line for us was that, while we had been successful in selling ACD/1D and 2D NMR Processor to chemists and students in industry and academic institutions, we believed there was still quite a bit of work to be done to design a tool for the chemist.

The truth is, ACD/1D NMR Processor has been around for quite some time. From the very beginning, we were developing a product with the NMR Spectroscopist in mind. Naturally, during the evolution of this product many sophisticated and advanced features have been added, and as a result the software can sometimes be viewed as bloated and overly complex for a novice or non-expert user.

In recent years we have worked hard to keep adding advanced features while also trying to simplify things. At the end of the day, we decided to go in a different direction and create a separate product. This way we can continue to develop ACD/1D NMR Processor with the Spectroscopist in mind, and build ACD/1D NMR Assistant with the synthetic chemist in mind.

So as a result, we developed ACD/1D NMR Assistant and it is now finished and available.

How is ACD/1D NMR Assistant different from ACD/1D NMR Processor?

  1. Ease of Use- ACD/1D NMR Assistant includes all of the features available in ACD/1D NMR Processor; we have just de-emphasized some of them in an effort to greatly simplify the toolbars and interface. I believe that we reduced the learning curve significantly. For example, upon file import an FID is automatically Fourier transformed, phase corrected, and baseline corrected. In addition, the software looks for solvent and water signals automatically and darkens them out. Because it includes Shortcut Mode from NMR Processor, users can peak pick, integrate, and characterize multiplets with one simple click and drag over each multiplet.
  2. Assignment Assistance- The big improvement in ACD/1D NMR Assistant is that the software can now give the user feedback on potential assignments. When a structure is proposed, users can hover over a multiplet of interest and the software provides real-time feedback on the best assignments by highlighting atoms in the structure with a green, yellow, and red color scheme.
  3. Structure Verification- Whether users simply want to check their own assignments or want the software's feedback on the consistency between a proposed structure and an experimental spectrum, the software provides this capability with one simple button click. If the software deems a spectrum-structure pair inconsistent, it points directly to which part (or parts) of the spectrum should be examined more closely and what the specific problem is.
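The green/yellow/red assignment feedback in item 2 can be illustrated with a toy sketch: for the multiplet under the cursor, rank candidate atoms by how close their predicted shift is, and colour each one. The thresholds (0.2 and 0.5 ppm), the rule that a proton-count mismatch is always red, and the atom labels are all assumptions for illustration, not how ACD/1D NMR Assistant actually decides.

```python
from typing import Dict, List, Tuple

def rank_assignments(exp_shift: float, exp_n_h: int,
                     candidates: Dict[str, Tuple[float, int]],
                     green: float = 0.2, yellow: float = 0.5) -> List[Tuple[str, str]]:
    """Colour candidate atoms for one experimental multiplet.

    candidates maps an atom label to (predicted shift in ppm, attached H count).
    An atom whose H count differs from the multiplet integral is always red."""
    ranked = []
    for atom, (pred_shift, n_h) in candidates.items():
        delta = abs(pred_shift - exp_shift)
        if n_h != exp_n_h or delta > yellow:
            colour = "red"
        elif delta <= green:
            colour = "green"
        else:
            colour = "yellow"
        ranked.append((delta, atom, colour))
    ranked.sort()  # best match (smallest shift difference) first
    return [(atom, colour) for _, atom, colour in ranked]

# Hypothetical hover over a 2H multiplet at 4.12 ppm with three candidate atoms.
candidates = {"C2": (4.08, 2), "C5": (3.80, 2), "C1": (2.00, 3)}
print(rank_assignments(4.12, 2, candidates))
# [('C2', 'green'), ('C5', 'yellow'), ('C1', 'red')]
```

Combining shift distance with integration in this way mirrors the idea that a good assignment must agree on more than one property of the multiplet.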

The ultimate motivation behind this product was the simple idea that "an NMR spectrum is a means to an end."

While there are several software packages that can process NMR data and print out spectra for evaluation, there is nothing available that really helps users evaluate and interpret an NMR spectrum and its relationship with a chemical structure.

That is until now, IMHO.

I think the development of ACD/1D NMR Assistant changes all that.

One more thing...

This product wasn't just built by a few software-oriented people within ACD/Labs. We have been speaking with current customers and chemists for years on how we can improve our offering. I personally have spent a very considerable portion of the last 2 years speaking with chemists, students, and spectroscopists who support chemists, discussing what was lacking in our current offerings and how we can improve our software for these users. Further, I spoke with people who have never used ACD/Labs software and discussed their desires and expectations for an NMR package that would suit their needs.

Finally, after the first stage of new development we sent the software to groups of chemists in 5 different pharmaceutical organizations in the US and had them evaluate the software. These groups ranged from chemists who currently used our 1D NMR Processor, to chemists who had never used it. In doing this we hoped we would be able to appropriately gauge the acceptance and learning curve of the new software. This wasn't beta testing...this was MARKET testing. We are very grateful to these chemists who provided valuable feedback that was implemented in the final version of the software. They were able to point out some very obvious things that our software-focused minds simply overlooked for years.

Following these evaluations, we took the feedback we received and performed another round of development + evaluation to optimize our offering.

And through those exercises, we arrive at today: a finished product in hand that I am very excited about.

Over the next few weeks, I will be highlighting different features and workflows in the software to educate you on how it works and what it does.

For those of you who want a sneak peek right now, go over to the ACD/Labs website and view the movie that shows the software in action. If you like what you see, just fill out the form at the bottom of the page and you can get a free evaluation copy to try yourself.

Click here to go to the 1D NMR Assistant page.

I hope you like it!

October 26, 2007

Fringe Benefits and Knowledge Management

Last week I blogged about Phil Keyes' and Anthony Macherone's applications of NMR software towards automated structure confirmation.

A few months back, I pointed you to Steve Coombes' workflow when working with ACD/Structure Elucidator.

Phil had a very nice section in his presentation about the "fringe benefits" he was able to derive outside of the main goal of the project, "Automated Structure Verification".

Specifically, Phil pointed to a couple of fringe benefits:

1) A spectral database is grown as a result of the automated structure confirmation. This database is fully searchable and can be used as a resource within the company. Building the database is part of the workflow, so no extra work needs to be done.

2) The software provides an assignment starting point. In running the verification algorithm, the software automatically attempts to assign multiplets in the 1D and 2D spectra, provides feedback on the quality of those assignments, and makes them easy to edit:

[Figure: slide from Phil Keyes' presentation]

Anthony Macherone also mentioned automatically storing data in a searchable database as an additional benefit to conducting automated structure confirmation in his presentation.

In a different application, Steve Coombes spoke a lot about the additional benefits he receives from ACD/Structure Elucidator.

In this presentation Steve really stresses the knowledge management angle from Structure Elucidator. Sure, the software can help elucidate the chemical structure of unknowns, but it also supports the ability to store the knowledge you gain from working on your data.

In Steve's opinion, this is what separates ACD/Labs software from many other packages out there: the "ability to extract the information and knowledge for further use."

It's not just the ability to build databases with structures and spectra. The key is the ability to assign that data electronically and store it in a searchable database. That's knowledge.

And of course by retaining that knowledge through electronic assignments, you can share that knowledge with the software by training the predictions and improving elucidation and verification performance. 
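As a rough illustration of why stored assignments are more useful than stored spectra alone, here is a minimal sketch using Python's built-in sqlite3 module: each assigned shift is linked to a compound and an atom, so you can ask questions like "which compounds have an assigned CH3 between 2.0 and 2.1 ppm?" The schema and the find_by_shift helper are hypothetical and far simpler than a real ACD/Labs database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE compound (id INTEGER PRIMARY KEY, name TEXT, smiles TEXT);
CREATE TABLE assignment (
    compound_id INTEGER REFERENCES compound(id),
    atom_label TEXT,   -- which atom in the structure
    group_type TEXT,   -- e.g. 'CH3', 'CH2'
    shift_ppm REAL,    -- assigned 1H chemical shift
    solvent TEXT
);
""")

# One illustrative compound with its (illustrative) electronic assignments.
conn.execute("INSERT INTO compound VALUES (1, 'ethyl acetate', 'CCOC(C)=O')")
conn.executemany(
    "INSERT INTO assignment VALUES (?, ?, ?, ?, ?)",
    [(1, "C1", "CH3", 1.26, "CDCl3"),
     (1, "C2", "CH2", 4.12, "CDCl3"),
     (1, "C5", "CH3", 2.04, "CDCl3")],
)

def find_by_shift(group_type, low, high):
    """Compounds with an assigned group of the given type in [low, high] ppm."""
    rows = conn.execute(
        """SELECT c.name, a.atom_label, a.shift_ppm
           FROM assignment a JOIN compound c ON c.id = a.compound_id
           WHERE a.group_type = ? AND a.shift_ppm BETWEEN ? AND ?
           ORDER BY a.shift_ppm""",
        (group_type, low, high),
    )
    return rows.fetchall()

print(find_by_shift("CH3", 2.0, 2.1))  # [('ethyl acetate', 'C5', 2.04)]
```

Because each shift is tied to an atom rather than just to a peak list, the same records could in principle also serve as training data for predictions, which is the point made above.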

I'd like to thank these guys for teaching me a nice "marketing" lesson. It's not always about the main application of the software. Always be on the lookout for "fringe benefits."

October 25, 2007

How Accurate are Experimental Chemical Shifts?

Several months ago, I asked, "How Accurate Should NMR Predictions Be?"

Today, I ask how accurate and consistent are actual experimental chemical shifts?

In many ways, this post probably should have preceded the one I linked above because in reality, before a discussion about prediction accuracy can begin, the topic of experimental accuracy needs to be addressed.

Experimental accuracy matters from two perspectives. First, it is important for the chemical shift database used to produce the predictions (hence ACD/Labs' Purgatory Database). Second, it matters when judging the accuracy of a predicted chemical shift against an experimental one: if the two disagree, how can we determine where the inaccuracy occurred?

In the process of producing an experimental NMR spectrum, there are many variables that can affect a chemical shift that are not always carefully controlled. They include, but are certainly not limited to:

  • Concentration of the sample
  • Temperature of the probe
  • Equilibration time in the probe
  • Solvent type
  • Residual water content of the solvent
  • pH of the sample (if aqueous media)
  • Digitization of the spectrum
  • Shimming and phasing inaccuracies
  • Choice of reference standard

Many of these factors can significantly affect the chemical shifts of the peaks in a spectrum. How much is sometimes hard to measure, but as an example, consider the range of entries in our database for the shift of the methyl protons of toluene. All of the following values have been published in peer-reviewed journals.

[Table: reported 1H chemical shifts of the toluene methyl protons, one entry per reference below]

Of course, the deviations in these shifts arise primarily because each chemical shift was recorded in a different solvent. The reason for including 8 sources for toluene in our database is so that we can attempt to take the solvent into account when solvent-specific prediction is performed.

But as mentioned, solvent is not the only variable that can affect how an experimental chemical shift is recorded.
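Two of the variables listed above, the choice of reference and the digitization of the spectrum, are easy to put numbers on. The sketch below uses the standard definition of the chemical shift, δ(ppm) = (ν − ν_ref) / ν₀ with ν in Hz and ν₀ (the spectrometer frequency) in MHz, plus the digital resolution (sweep width divided by the number of points). The 300 MHz frequency, 12 ppm sweep width, 32,768 points, and 3 Hz referencing error are illustrative assumptions, not the conditions behind the references below.

```python
def hz_to_ppm(nu_hz: float, nu_ref_hz: float, spectrometer_mhz: float) -> float:
    """Chemical shift: offset from the reference in Hz divided by the
    spectrometer frequency in MHz gives ppm."""
    return (nu_hz - nu_ref_hz) / spectrometer_mhz

def digital_resolution_ppm(sweep_width_ppm: float, n_points: int) -> float:
    """Spacing between adjacent points of the spectrum, in ppm."""
    return sweep_width_ppm / n_points

# A signal 705 Hz above the reference on a 300 MHz instrument:
print(hz_to_ppm(705.0, 0.0, 300.0))           # 2.35

# A 12 ppm sweep width digitized into 32,768 points:
print(f"{digital_resolution_ppm(12.0, 32768):.5f} ppm per point")  # 0.00037 ppm per point

# Mis-setting the reference by 3 Hz moves every shift by 0.01 ppm at 300 MHz:
print(hz_to_ppm(705.0, 3.0, 300.0))           # 2.34
```

In this example a referencing error is far larger than the digitization limit, which is one reason the same compound can appear in a database with slightly different shifts even before solvent effects are considered.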

Thoughts?

References:

1. Prog. Nucl. Magn. Reson. Spectrosc., 1996, v. 28, p. 161 (Toluene-d8)

2. Zh. Org. Khim., v. 12, p. 275 (Tetrahydrofuran-d8)

3. J. Org. Chem., 1997, v. 62, p. 7512 (Chloroform-d; 300 MHz; 24 °C)

4. J. Org. Chem., 1997, v. 62, p. 7512 (Acetone-d6; 300 MHz; 24 °C)

5. J. Org. Chem., 1997, v. 62, p. 7512 (Dimethylsulfoxide-d6; 300 MHz; 24 °C)

6. J. Org. Chem., 1997, v. 62, p. 7512 (Benzene-d6; 300 MHz; 24 °C)

7. J. Org. Chem., 1997, v. 62, p. 7512 (Acetonitrile-d3; 300 MHz; 24 °C)

8. J. Org. Chem., 1997, v. 62, p. 7512 (Methanol-d4; 300 MHz; 24 °C)

 

October 18, 2007

Applications of Automated Structure Verification with NMR Software- Part 2

Yesterday I blogged about how Phil Keyes has applied automated structure verification at Lexicon Pharmaceuticals to help validate compound registrations in an open access environment.

Links to the latest performance statistics of our automated structure verification solution for both 1D 1H and combined 1D 1H and 2D HSQC structure verification can be found in the previous post.

As promised, today I will highlight the application of automated structure verification that Anthony Macherone has employed at ASDI.

Anthony works in a high-throughput environment where more than 1000 compounds are directed to 1D 1H NMR analysis per week. Based on this workload, he has implemented a very nice workflow in his laboratory. In his presentation, Anthony mentioned that in his line of work, the ultimate goals are to:

  1. Maximize instrument efficiency
  2. Maximize throughput
  3. Be cost effective

Sounds like some pretty good goals to me. How Anthony is able to achieve this is of course the really interesting part.

Anthony describes his workflow in three phases, the pre-game, middle-game, and end-game. In the pre-game he uses proprietary software (not ACD/Labs) to screen the compounds and "bin" them into appropriate analytical techniques. In doing so he does not have to run a full battery of analytical data on every compound that is screened. In the middle-game, he automates the sample preparation and acquisition using well-plates and the help of robots.

The end-game is where Anthony employs ACD/Labs software. Once the data is acquired, he applies a custom macro to automatically:

  1. Attach chemical structures to appropriate FID files
  2. Process the data (FT, phasing, baseline correction, and integration)
  3. Run the ACD/Labs automated structure verification algorithm (Provide a red light/green light data assessment)
  4. Store the data in a searchable database
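Step 2 of the macro can be sketched with NumPy on a synthetic FID. This is a generic textbook pipeline, not Anthony's macro or ACD/Labs' processing: a Fourier transform, a brute-force zero-order phase search that minimizes negative intensity, a crude constant baseline estimated as the spectrum's median, and integration by summing points in a window. The sweep width, number of points, signal frequencies, amplitudes, and phase error are made-up values.

```python
import numpy as np

SW_HZ = 4000.0  # spectral width (assumed)
N = 8192        # number of complex points (assumed)

def simulate_fid(freqs_hz, amps, t2=0.1, phase_error=0.7):
    """Synthetic FID: decaying complex exponentials plus a zero-order phase error."""
    t = np.arange(N) / SW_HZ
    fid = sum(a * np.exp(2j * np.pi * f * t) for f, a in zip(freqs_hz, amps))
    return fid * np.exp(-t / t2) * np.exp(1j * phase_error)

def process(fid):
    """FT, zero-order autophase, and a crude constant baseline correction."""
    fid = fid.copy()
    fid[0] *= 0.5  # halve the first point to avoid a constant baseline offset
    spec = np.fft.fftshift(np.fft.fft(fid))
    freqs = np.fft.fftshift(np.fft.fftfreq(N, d=1.0 / SW_HZ))

    # Zero-order phase: try angles and keep the one whose real (absorption)
    # spectrum has the least negative intensity (neg_area <= 0, closest to 0 wins).
    angles = np.linspace(-np.pi, np.pi, 721)
    neg_area = [np.minimum((spec * np.exp(-1j * a)).real, 0).sum() for a in angles]
    best = angles[int(np.argmax(neg_area))]
    real = (spec * np.exp(-1j * best)).real

    # Baseline: subtract the median, since most points contain no signal.
    real = real - np.median(real)
    return freqs, real

def integrate(freqs, real, centre_hz, half_width_hz=50.0):
    """Sum the spectrum over a window around one signal."""
    window = np.abs(freqs - centre_hz) <= half_width_hz
    return real[window].sum()

fid = simulate_fid([500.0, -750.0], [3.0, 2.0])
freqs, real = process(fid)
print(freqs[np.argmax(real)])  # 500.0
print(integrate(freqs, real, 500.0) / integrate(freqs, real, -750.0))  # close to 1.5
```

The integral ratio recovers the 3:2 amplitude ratio built into the synthetic data; real spectra need more robust phasing (including first-order phase) and baseline models than this sketch uses.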

Following the data acquisition and analysis, Anthony only needs to manually evaluate the ambiguous or questionable results (i.e., red light data).

Make sure to check out Anthony's presentation for more details regarding the advantages of these phases, time-savings, accuracy, etc.:

Anthony Macherone- High-Throughput NMR Analysis: The End Game

Again, I would like to thank both Phil Keyes and Anthony Macherone for sharing their applications at our New Jersey User Meeting last week.

October 17, 2007

Applications of Automated Structure Verification with NMR Software- Part 1

Several posts back I pointed you to a couple of articles ACD/Labs was involved in with regard to automated structure verification.

I have pointed to these articles but spent little time talking about them. I will now.

For those new to this idea, it involves using software to automatically confirm the consistency between a chemical structure and an NMR spectrum using NMR prediction. Lee Griffiths from AstraZeneca has done excellent work over the years in this field. Lee was kind enough to present at our European User's Meeting last year to share a summary of his approach to automated structure verification using 1D 1H and 13C, and 2D HSQC data. This presentation can be downloaded here.

In addition, by doing a simple search for "Griffiths" on the Magnetic Resonance in Chemistry webpage, you'll find a whole bunch of relevant articles.

We initially published a validation of the performance of automated structure verification using just 1D 1H NMR data. More recently, we published again to compare that with the performance of a combined verification approach using 1D 1H and 2D HSQC data.

As a result of these and other studies, much of the focus of late by ACD/Labs has been on the performance of automated structure verification using 1D 1H and 2D HSQC NMR data.

These publications along with posters we presented at SMASH and ENC on this topic should give you a general idea about the performance and accuracy of this approach.

I am not going to discuss the performance of this approach today but rather focus on the real-world applications and performance in an industrial setting.

Last Thursday I was in New Brunswick, New Jersey at our New Jersey User's Meeting where I was blown away by two terrific presentations by our guest speakers, Phil Keyes from Lexicon Pharmaceuticals and Anthony Macherone from ASDI.

Two different applications in two different environments. I'll talk about Phil's today, and Anthony's tomorrow.  Phil's is interesting as he is setting up a really cool system to significantly improve how analytical data is handled in an open access environment, and further to validate Lexicon's compound registration database.

In my opinion, the crucial thing to point out here is the evolution of an open access environment from a more traditional analytical services setup. It used to be that NMR Spectroscopists would run and handle all the analytical data for the compounds a chemist produced, verify their structures, and give them the thumbs up or thumbs down. In this environment, spectroscopists got a look at the data for every compound entering the registration database. In an open access environment this is no longer the case. While NMR spectroscopists certainly still see lots of this data, and will likely eventually see a compound's data during its pharmaceutical R&D life cycle, the reality is that some incorrectly or questionably verified structures will end up in a company's registration database and go on for further testing. Somewhere along the way in the evolution toward open access NMR, it became OK for compounds to be registered without being approved by an analytical expert. Of course, these aren't being registered blindly: chemists approve them, and in most cases they are more than qualified to do so and do a good job. However, I have yet to talk to an NMR spectroscopist who has NOT seen compounds registered incorrectly.

Of course, my point is not to pick on chemists here. Sometimes these mistakes are unavoidable and the data LOOKS right. Sometimes there is nothing in the 1H NMR spectrum or the LC-MS that suggests anything different is present. The key is to better identify when these instances arise in the registration database. Can an automated structure verification solution with NMR software replace and outperform a chemist's QC for good in an open access environment? No, not right now anyway.

However, the key statement is in Phil's presentation:

"Integrating a system to perform automated compound verification provides value by highlighting compounds for which structural data is complex and subject to interpretation."

Sure, there are going to be false positives and false negatives with an automated approach. The question is, if 50 out of 1000 compounds registered by chemists are incorrect, is there value in automated software highlighting 40 of them?

False negatives can be annoying because they force the spectroscopist to do unnecessary work on a sample that was correct all along. But other times they might point out the need to run more experiments to prove that it is indeed the right structure. Ideally ALL of the data would be manually evaluated, but in the age of open access NMR, where chemists outnumber spectroscopists 100:1 in some organizations, this is clearly no longer feasible. But is there a balance here? While it isn't feasible to manually evaluate the data for, say, 1000 compounds, would it be feasible to manually evaluate the 300 of those 1000 samples that the software has highlighted as complex or subject to interpretation?
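To make the trade-off concrete, here is a small worked calculation that combines the two hypothetical numbers above into one scenario (an assumption; the post poses them as separate questions): 1000 registrations, 50 truly incorrect, and the software flags 300 for review, 40 of which are incorrect. Following this blog's usage, a "false positive" is an incorrect structure the software passes and a "false negative" is a correct structure it flags. The triage_stats helper and its key names are illustrative.

```python
def triage_stats(total: int, incorrect: int, flagged: int, flagged_incorrect: int) -> dict:
    """Summarize a verification run using this blog's terminology:
    false positive = incorrect structure the software passes,
    false negative = correct structure the software flags."""
    correct = total - incorrect
    false_negatives = flagged - flagged_incorrect
    return {
        "errors_caught_pct": 100 * flagged_incorrect / incorrect,
        "false_positives": incorrect - flagged_incorrect,
        "false_negatives": false_negatives,
        "false_negative_rate_pct": 100 * false_negatives / correct,
        "spectra_not_reviewed": total - flagged,
        "flagged_that_are_wrong_pct": 100 * flagged_incorrect / flagged,
    }

# Hypothetical scenario combining the numbers above.
for key, value in triage_stats(1000, 50, 300, 40).items():
    print(f"{key}: {value:.1f}" if isinstance(value, float) else f"{key}: {value}")
```

Under these assumptions, reviewing 300 flagged spectra instead of all 1000 catches 80% of the incorrect structures, while 10 incorrect structures (false positives) go through unreviewed and 260 correct ones (false negatives, about 27% of the correct compounds) get a manual look. Whether that balance is acceptable is exactly the question posed above.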

Phil's and Anthony's presentations will be available on the ACD/Labs website shortly, but as my readers, you get advance access to them.

Phil Keyes- Validating Compound Registrations with Automated NMR Verification in Open Access

For those who want to do some advance reading for tomorrow's blog entry:

Anthony Macherone- High-Throughput NMR Analysis: The End Game