Approximate Matching in the RxNorm API

Background

Apart from approximate matching, the RxNorm API has long offered exact-string and “normalized” matching through the findRxcuiByString function. Both kinds of match find RxNorm concepts by name. The normalized search can also find strings that diverge in certain ways from the query.

In September 2011, an approximate string-match function called approxMatch was added to the RxNorm API. It grew out of earlier work described in a paper presented at the 2011 AMIA Annual Symposium. Approximate matching is less restrictive than normalized searching and has correspondingly lower precision.

In May 2013, an enhanced getApproximateMatch superseded approxMatch to provide additional output control and information.

In January 2022, the RxNorm API's internal heuristics supporting getApproximateMatch were replaced with off-the-shelf search software to improve service reliability. The most visible change was that scores no longer had an absolute meaning. The principles for selecting best matches remained the same.

The following paragraphs describe the details of the current getApproximateMatch function in the RxNorm API.

Purpose

The approximate match function finds the strings in the RxNorm data set that are most similar to the input string.

Approximate matching is useful for strings for which an exact or normalized string match returns no results. For example, the RxNorm API's normalized match, findRxcuiByString, cannot map the following strings to any concept:

NORMALIZED SEARCH DOES NOT FIND THESE THINGS…    …BECAUSE OF
ACCUPRIL 20 MG TAB TABLET                        extra word
HYDROCHLOROT 50 MG TABLET                        unknown abbreviation
Rantidine 15 ML Syrup Oral                       misspelled word

In contrast to the normalized match, the approximate match function will identify the strings that most closely match the input string.

The approximate match algorithm allows greater divergence between the query and the found strings than the normalized match, so the approximate-match results should be regarded as candidates for manual review.
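
As an illustration, a request for approximate matches could be built as in the sketch below. The endpoint path (`approximateTerm.json`) and the `term` and `maxEntries` parameter names are assumptions based on the public RxNorm REST interface and should be checked against the current API documentation. `build_approx_url` is a hypothetical helper, not part of the API.

```python
from urllib.parse import urlencode

# Assumed REST endpoint for getApproximateMatch; verify against the API docs.
BASE_URL = "https://rxnav.nlm.nih.gov/REST/approximateTerm.json"


def build_approx_url(term: str, max_entries: int = 5) -> str:
    """Build a request URL asking for up to max_entries approximate matches."""
    query = urlencode({"term": term, "maxEntries": max_entries})
    return f"{BASE_URL}?{query}"


# The three strings that the normalized match fails to map.
for term in (
    "ACCUPRIL 20 MG TAB TABLET",
    "HYDROCHLOROT 50 MG TABLET",
    "Rantidine 15 ML Syrup Oral",
):
    print(build_approx_url(term))
```

The resulting URL can be fetched with any HTTP client. The returned candidates should then be reviewed manually, as noted above.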

Details

Approximate-match results are derived principally from the intersection of words (or “tokens”) that appear in both the user's search term and strings from the RxNorm STR field. Approximate matching also applies some drug-centric adjustments. To overcome insignificant differences between strings that are equivalent in practice, approximate-match tokens are prepared in a process that removes sources of discrepancies that are usually not meaningful in drug names:

  • Word order does not matter. For example, the query "codeine caffeine" finds RxAUI 12351156, "acetaminophen / caffeine / codeine".
  • Case does not matter: uppercase letters are lowercased. For example, the query "amiODARONE" finds RxAUI 12259630, "amiodarone Injectable Solution".
  • Stemming chops off suffixes that might be English inflections. For example, both "extended" and "extend" are tokenized as "extend", so the query "alfuzosin extend tab 10 mg" finds RxAUI 6350313, "alfuzosin Extended Release Oral Tablet".
  • Punctuation is generally treated as space.
  • Abbreviations are expanded, e.g., "hctz" to "hydrochlorothiazide", so the query "olmesartan hctz tablet 40-12.5 mg" finds RxAUI 2071431, "HYDROCHLOROTHIAZIDE 12.5MG/OLMESARTAN 40MG TAB" from VANDF. See list of abbreviations.
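
The token preparation steps listed above can be sketched roughly as follows. This is a simplified illustration, not the service's actual implementation: the abbreviation table, the suffix list standing in for a real stemmer, and the rule that keeps a decimal point between digits are all assumptions made for the example.

```python
import re

# Hypothetical abbreviation table; the real list is maintained by the service.
ABBREVIATIONS = {"hctz": "hydrochlorothiazide"}

# Toy suffix list standing in for a real English stemmer.
SUFFIXES = ("ing", "ed", "s")


def stem(word: str) -> str:
    """Strip one common English suffix, keeping at least three letters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def tokenize(text: str) -> frozenset:
    """Lowercase, treat punctuation as space, expand abbreviations, stem.

    Returning a set discards word order (and duplicates).
    """
    # Replace punctuation with spaces, but keep a period between two digits
    # so that "12.5" survives as one token (an assumption of this sketch).
    cleaned = re.sub(r"(?<!\d)\.|\.(?!\d)|[^a-z0-9.]", " ", text.lower())
    words = (ABBREVIATIONS.get(w, w) for w in cleaned.split())
    return frozenset(stem(w) for w in words)


query = tokenize("alfuzosin extend tab 10 mg")
target = tokenize("alfuzosin Extended Release Oral Tablet")
print(sorted(query & target))
```

Under these assumptions, the shared tokens for the alfuzosin example include "alfuzosin" and "extend", which is the kind of overlap that the match score is built on.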

Drug name matches are favored over non-drug name matches:

  • Non-drug words – specifically numbers, route and frequency of administration, and dose form words – cannot be the sole basis for a match, unless the query consists of nothing but such words. For example, the query "{10 (prednisone 10 mg oral tablet)} pack 3 po qd x 3d" is composed entirely of non-drug words except for prednisone. Therefore, only prednisone matches are returned, even though RxNorm contains many more strings that might match "10 mg oral tablet pack". (Detecting phrases for route and frequency of administration is an exception to the rule that word order does not matter.)
  • Drug words, if present in the query (without spelling enhancement), must also be present to some degree in every result string. If the user's query includes words from any RxNorm ingredient (TTY=IN) or brand (TTY=BN) string, then at least one of those IN/BN query words must also appear in every result.
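
The two rules above can be pictured as a filter applied to each candidate string. The sketch below is illustrative only: the word lists are hypothetical stand-ins for vocabulary the service derives from RxNorm, and route and frequency phrases are treated as single words rather than detected as phrases.

```python
import re

# Hypothetical vocabularies for illustration.
DRUG_WORDS = {"prednisone", "lisinopril"}  # words from TTY=IN or TTY=BN strings
NON_DRUG_WORDS = {"mg", "oral", "tablet", "pack", "po", "qd", "x", "3d"}


def words(text: str) -> set:
    """Lowercase the text and split it into alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def is_non_drug(word: str) -> bool:
    """Numbers and route, frequency, or dose form words count as non-drug."""
    return word.isdigit() or word in NON_DRUG_WORDS


def may_match(query: str, candidate: str) -> bool:
    """Apply the two 'drug names are favored' rules to one candidate."""
    q, c = words(query), words(candidate)
    shared = q & c
    if not shared:
        return False
    # Rule 1: non-drug words cannot be the sole basis for a match,
    # unless the query consists of nothing but such words.
    query_all_non_drug = all(is_non_drug(w) for w in q)
    if not query_all_non_drug and all(is_non_drug(w) for w in shared):
        return False
    # Rule 2: if the query contains IN/BN words, at least one of them
    # must also appear in the candidate.
    drug_in_query = q & DRUG_WORDS
    if drug_in_query and not (drug_in_query & c):
        return False
    return True


QUERY = "{10 (prednisone 10 mg oral tablet)} pack 3 po qd x 3d"
print(may_match(QUERY, "prednisone 10 MG Oral Tablet"))
print(may_match(QUERY, "lisinopril 10 MG Oral Tablet"))
```

With these toy lists, the prednisone string passes the filter and the lisinopril string does not, even though the latter shares "10 mg oral tablet" with the query.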

Spelling correction is attempted:

  • Spelling variants are considered, so the query "lipitorr oral tablet" finds RxAUI 1004558, "Lipitor 10 MG Oral Tablet". Approximate matching augments the user's query with words (drawn from RxNorm) that differ slightly from each user-specified word of at least 5 letters. Spelling variations are added even if the user-specified word exists in RxNorm, because in a query with re-spellable words that appear in various combinations in various sets of strings, other factors will govern the final result selection. However, the speculative additional words are given less weight than the user-specified word.
  • Prefixes are extended, so the query "palmitoyllysylvalyldiaminobut" finds RxAUI 12253768, "palmitoyllysylvalyldiaminobutyroylthreonine". Extensions are used only if the original word stem is at least 3 letters long and can be extended in no more than 3 ways. Candidate word extensions are given low weight that further diminishes the more letters they add to the user-specified word.
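
A rough sketch of how a query word might be augmented with spelling variants and prefix extensions follows. It uses Python's `difflib` as a stand-in for whatever similarity measure the service actually applies. The vocabulary, the similarity cutoff, and all weights are assumptions chosen for illustration, and word length is used in place of stem length for simplicity.

```python
import difflib

# Hypothetical vocabulary standing in for words drawn from RxNorm.
VOCABULARY = [
    "lipitor",
    "lisinopril",
    "tablet",
    "palmitoyllysylvalyldiaminobutyroylthreonine",
]

VARIANT_WEIGHT = 0.5    # assumed: less than the user-specified word's 1.0
EXTENSION_WEIGHT = 0.3  # assumed: low base weight for prefix extensions


def spelling_variants(word, vocabulary, min_len=5):
    """Return {variant: weight} for vocabulary words that differ slightly."""
    if len(word) < min_len:
        return {}
    close = difflib.get_close_matches(word, vocabulary, n=5, cutoff=0.8)
    return {v: VARIANT_WEIGHT for v in close if v != word}


def prefix_extensions(word, vocabulary, min_len=3, max_ways=3):
    """Return {extension: weight}, or nothing if the prefix is too ambiguous."""
    if len(word) < min_len:
        return {}
    longer = [v for v in vocabulary if v.startswith(word) and v != word]
    if not longer or len(longer) > max_ways:
        return {}
    # The weight diminishes as more letters are added to the user's word.
    return {v: EXTENSION_WEIGHT / (len(v) - len(word)) for v in longer}


print(spelling_variants("lipitorr", VOCABULARY))
print(prefix_extensions("palmitoyllysylvalyldiaminobut", VOCABULARY))
```

In a fuller version, the user-specified word would be kept at full weight alongside these lower-weight additions, so that the original spelling still dominates when it matches.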

In an effort to avoid surprise results:

  • Duplicate words in RxNorm STRs do not improve match score. For example, "mg" may occur dozens of times in the name of a multi-ingredient tablet, but such a tablet should not necessarily be the best match for any query that mentions "mg". To achieve this, a string is de-duplicated before being included in the search engine's index.
  • Spelling suggestions are English words, mostly. Search engines, like the one that underlies approximate match, regard rare words as a powerful indicator of match quality. Usually, a rare and unexpected spelling variant is harmless because the strings where it appears do not fit the context of the rest of the query. However, in combination with the effects of the history of English, the composition of RxNorm, and spelling suggestions, the magnetism of rare words would cause approximate match to rank foreign-language product names higher than product names the user had specified. As a workaround until a better solution can be found, an effort is made not to mine spelling variations from foreign-language synonyms.
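
The de-duplication step described in the first item above can be sketched as follows. This is a minimal illustration that assumes whitespace tokenization. `dedupe_tokens` is a hypothetical helper, and the sample string is invented rather than taken from RxNorm.

```python
def dedupe_tokens(text: str) -> str:
    """Keep only the first occurrence of each word, preserving order."""
    # dict.fromkeys keeps insertion order, so it works as an ordered set.
    return " ".join(dict.fromkeys(text.lower().split()))


# An invented multi-ingredient name in which "mg" repeats.
name = "drugA 10 MG / drugB 20 MG / drugC 5 MG Oral Tablet"
print(dedupe_tokens(name))
```

Indexing the de-duplicated form means a repeated word such as "mg" counts once per string, so term frequency alone cannot push a long multi-ingredient name to the top.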