The ten papers selected for this supplement are extended versions of the original papers presented at the 2010 SIG. Iyer, S. V., Harpaz, R., LePendu, P., Bauer-Mehren, A., Shah, N. H. Automated detection of off-label drug use. Because of the intrinsic noisiness of high-throughput measurements, statistical methods have been central to this effort. We explain how the integration and validation steps spring from a Bayesian description of network uncertainty, and conclude by describing an important near-term milestone for systems biology: the construction of a set of rich reference networks for key model organisms. The Stanford Tissue Microarray Database (TMAD) is a repository of data serving a consortium of pathologists and biomedical researchers. Laurence Baker. Risk of angioedema associated with levetiracetam compared with phenytoin: Findings of the Observational Health Data Sciences and Informatics research network. Interpretation errors related to the GO annotation file format. The information explosion in biology makes it difficult for researchers to stay abreast of current biomedical knowledge and to make sense of the massive amounts of online information. Integration and publication of heterogeneous text-mined relationships on the Semantic Web. Monogenic disease amino acid substitutions (AAS) predicted to be nondeleterious by SIFT were characterized by a significant enrichment for inherited AAS within solvent-accessible residues, regions of intrinsic protein disorder, and an association with the loss or gain of various posttranslational modifications. Using natural language processing to construct a metastatic breast cancer cohort from linked cancer registry and electronic medical records data. Profiling risk factors for chronic uveitis in juvenile idiopathic arthritis: a new model for EHR-based research. Shah, N. H., LePendu, P., Bauer-Mehren, A., Ghebremariam, Y. T., Iyer, S. V., Marcus, J., Nead, K. T., Cooke, J. P., Leeper, N. J. Payne, P. O., Shah, N. 
H., Tenenbaum, J. D., Mangravite, L., Altman, R. B., Dunker, A. K., Hunter, L., Ritchie, M. D., Murray, T., Klein, T. E. Addressing vital sign alarm fatigue using personalized alarm thresholds. gene-expression assays) result in long lists of "significant genes." Observational Health Data Sciences and Informatics (OHDSI) has built on learnings from the Observational Medical Outcomes Partnership to turn methods research and insights into a suite of applications and exploration tools that move the field closer to the ultimate goal of generating evidence about all aspects of healthcare to serve the needs of patients, clinicians and all other decision-makers around the world. Use of allergy medications and terms describing allergic conditions were independently associated with chronic uveitis. As biomedical scientists begin to recognize the many different ways ontologies enable biomedical research, they will drive the emergence of new computer applications that will help them exploit the wealth of research data now at their fingertips. Of 15 semantic groups in the UMLS, seven groups accounted for 92.08% of term occurrences in Mayo data. Confronted with rapidly accumulating data, researchers currently lack the software tools to undertake the required information integration tasks. We present HyQue, a Semantic Web tool for querying scientific knowledge bases with the purpose of evaluating user-submitted hypotheses. Prior studies have shown that evidence of healthcare-seeking intent in Internet searches correlates well with healthcare resource utilization. Web of Science ID: 000306925000007. Our study shows that not all events are equally detectable, suggesting that specific events might be monitored more effectively using other data sources. In addition, evidence from the trials frequently rests on narrow patient-inclusion criteria and thus may not generalize well to real clinical situations. NIGAM H. SHAH, MBBS, PhD.
Callahan, A., Steinberg, E., Fries, J. The authors have developed a data-mining method for systematic, automated detection of adverse drug events (ADEs) from electronic medical records. This method uses the text from 9.5 million clinical notes, along with prior knowledge of drug usages and known ADEs, as inputs. –Nigam Shah, Associate Professor of Biomedical Informatics and of Biomedical Data Science. We used two datasets: 35,000 patients (5000 depressed) from the Palo Alto Medical Foundation and 5651 patients treated for depression from the Group Health Research Institute. Our models are able to predict a future diagnosis of depression up to 12 months in advance (area under the receiver operating characteristic curve (AUC) 0.70-0.80). Callahan, A., Abeyruwan, S. W., Al-Ali, H., Sakurai, K., Ferguson, A. R., Popovich, P. G., Shah, N. H., Visser, U., Bixby, J. L., Lemmon, V. P. Comparing high-dimensional confounder control methods for rapid cohort studies from electronic health records. Assessing the accuracy of automatic speech recognition for psychotherapy. Our goals were: 1. analyze the frequency and syntactic distribution of Metathesaurus terms in MEDLINE; 2. create a filtered UMLS Metathesaurus based on the MEDLINE analysis; 3. augment the UMLS Metathesaurus so that each term is associated with metadata on its MEDLINE frequency and syntactic distribution statistics. However, this annotation process cannot be easily automated and often requires expert curators. This report summarizes the findings (n = 6) and recommendations (n = 15) from the policy meeting, which were clustered into 3 broad areas: (1) policies governing data access for research and personalization of care; (2) policy and research needs for evolving data interpretation and knowledge representation; and (3) policy and research needs to ensure data integrity and preservation. 
Thus, BioPortal not only provides investigators, clinicians, and developers 'one-stop shopping' to programmatically access biomedical ontologies, but also provides support to integrate data from a variety of biomedical resources. Coulet, A., Garten, Y., Dumontier, M., Altman, R. B., Musen, M. A., Shah, N. H. NCBO Resource Index: Ontology-Based Search and Mining of Biomedical Resources. DOI: 10.1016/j.jbi.2010.08.005; Web of Science ID: 000285036700017; PubMed Central ID: PMC2991587. We develop novel methods to learn from patient-level health data including structured health encounter records, clinical notes, insurance claims, diagnostic imaging, and clinical trial data. Baseline depression severity was the strongest predictor of treatment response for medication and psychotherapy. It is possible to use EHR data to predict a diagnosis of depression up to 12 months in advance and to differentiate between extreme baseline levels of depression. We hypothesize that drug-disease co-occurrences, extracted from ontology-based annotations of the clinical notes, can be examined for statistical enrichment and used for drug safety surveillance. BioPortal enables community participation in the evaluation and evolution of ontology content by providing features to add mappings between terms, to add comments linked to specific ontology terms and to provide ontology reviews. In this paper, we quantify the trade-off between speed and linguistic understanding across text processing systems. A., Shah, N. H. Linking insurance claims across time to characterize treatment, monitoring, and end-of-life care in metastatic breast cancer. Pain, nausea, and vomiting co-occur in 35% of all ED encounter notes. The text-mining methods we describe can be applied to automatically review free-text clinician notes to detect unplanned episodes of care mentioned in these notes. 
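The drug-disease co-occurrence hypothesis can be made concrete with a 2×2 contingency table per pair. The sketch below is a minimal illustration, not the authors' pipeline: it assumes each patient's notes have already been reduced to a set of annotated concept strings (the toy `patients` data and the function names are hypothetical), counts co-occurrences, and reports the ratio of observed to expected co-occurrence under independence.

```python
from typing import Iterable, Set, Tuple


def contingency(patients: Iterable[Set[str]], drug: str, disease: str) -> Tuple[int, int, int, int]:
    """Return (a, b, c, d) counts for a drug/disease pair across patients.

    a: drug and disease, b: drug only, c: disease only, d: neither.
    """
    a = b = c = d = 0
    for concepts in patients:
        has_drug = drug in concepts
        has_disease = disease in concepts
        if has_drug and has_disease:
            a += 1
        elif has_drug:
            b += 1
        elif has_disease:
            c += 1
        else:
            d += 1
    return a, b, c, d


def observed_to_expected(a: int, b: int, c: int, d: int) -> float:
    """Observed co-occurrences divided by those expected if drug and disease were independent."""
    n = a + b + c + d
    if n == 0:
        return float("nan")
    expected = (a + b) * (a + c) / n
    return a / expected if expected else float("nan")


# Hypothetical toy data: one set of annotated concepts per patient.
patients = [
    {"drug_x", "myocardial infarction"},
    {"drug_x", "myocardial infarction"},
    {"drug_x"},
    {"myocardial infarction"},
    {"hypertension"},
    {"hypertension"},
]
counts = contingency(patients, "drug_x", "myocardial infarction")
print(counts)                                     # (2, 1, 1, 2)
print(round(observed_to_expected(*counts), 3))    # 1.333
```

A ratio above 1 only flags a pair for closer review; a real surveillance pipeline would also test significance and adjust for confounders.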
We show that network-based approaches can be used for constructing patient cohorts as well as for analyzing differences in outcomes, compare them with standard methods, and discuss the advantages they offer. Recent retrospective cohorts and large database studies have raised concern that the use of proton pump inhibitors (PPIs) is associated with increased cardiovascular (CV) risk. Adverse drug events cause substantial morbidity and mortality and are often discovered after a drug comes to market. New opportunities have emerged to harness data sources that have not been used within the traditional framework. Because we can quantify adverse event risks using the clinical notes, we can now rank-order off-label drug uses and prioritize the search for their adverse event profiles, which helps us address patient safety. A formal concept analysis and semantic query expansion cooperation to refine health outcomes of interest. In comparison, OWL is a Semantic Web language, and is supported by the World Wide Web Consortium together with integral query languages, rule languages and distributed infrastructure for information interchange. The FDA estimates about 1 … We applied term extraction tools on the clinical notes of a million patients to compile a database of statistically significant patterns of drug use. DOI: 10.1186/2041-1480-4-S1-I1. We find that PPIs elevate plasma asymmetric dimethylarginine (ADMA) levels and reduce nitric oxide levels and endothelium-dependent vasodilation in a murine model and ex vivo human tissues. Woei-Jyh Lee, Louiqa Raschid, Padmini Srinivasan, Nigam Shah, Daniel Rubin, and Natasha Noy. Medical School Office Building X229. Consequently, developing medications to promote β-cell regeneration is a priority. 
We also found that H2 blockers, an alternate treatment for GERD, were not associated with increased cardiovascular risk; had they been in place, such pharmacovigilance algorithms could have flagged this risk as early as the year 2000. Consistent with our pre-clinical findings that PPIs may adversely impact vascular function, our data-mining study supports the association of PPI exposure with risk for MI in the general population. Tomczak, A., Mortensen, J. M., Winnenburg, R., Liu, C., Alessi, D. T., Swamy, V., Vallania, F., Lofgren, S., Haynes, W., Shah, N. H., Musen, M. A., Khatri, P. U-Index, a dataset and an impact metric for informatics tools and databases. Research on Gun Violence vs Other Causes of Death. The service makes a decision based on three criteria. Using annotations from controlled vocabularies to find meaningful associations. We also identified a subset of CHF patients who were prescribed Cilostazol despite its black-box warning, and found that it did not increase mortality in this high-risk group of patients. This proof-of-principle study shows the potential of text analytics to mine clinical data warehouses to uncover 'natural experiments' such as the use of Cilostazol in CHF patients. 2015. Nigam Shah: Learning from past patient data to provide better care. To determine the biological relevance of a lengthy gene list, the usual solution is to perform enrichment analysis with the Gene Ontology (GO). Response to letters regarding article, "Unexpected effect of proton pump inhibitors: elevation of the cardiovascular risk factor asymmetric dimethylarginine". We have developed methods to detect population-level off-label usage using computationally efficient annotation of free text from clinical notes to generate features encoding empirical information about drug-disease mentions. Our approach provides the basis for a data-driven ontology alignment by mapping annotations of experimental data. 
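Enrichment analysis of a gene list against the GO usually reduces, for each term, to a one-sided hypergeometric test: given a background of annotated genes, how surprising is it to see at least k term-annotated genes in the "significant" list? The sketch below is a minimal standard-library illustration of that test; the argument names and the example counts are assumptions chosen for clarity, not values from any study.

```python
from math import comb


def hypergeom_sf(k: int, N: int, K: int, n: int) -> float:
    """P(X >= k) for X ~ Hypergeometric(N, K, n).

    N: annotated genes in the background
    K: background genes annotated with the GO term
    n: genes in the "significant" list
    k: genes in the list annotated with the term
    """
    upper = min(K, n)
    total = comb(N, n)
    # Sum the probability mass from k up to the largest possible overlap.
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / total


# Toy example: 20 background genes, 5 carry the term, a list of 4 genes, 3 of them annotated.
p = hypergeom_sf(3, 20, 5, 4)
print(round(p, 4))   # 0.032
```

Because one such test is run for every GO term, the resulting p-values are normally corrected for multiple testing (for example with Bonferroni or false discovery rate control) before a term is called enriched.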
Traditionally, phenotypes have been discovered by intuition, experience in practice, and advancements in basic science, but these approaches are often heuristic and labor-intensive, and can take decades to produce actionable knowledge. Factors such as acute kidney disorder and liver disorder were predictive of first-line therapy choices. First, we analyze data from the 2010 Healthcare Cost and Utilization Project National Inpatient Sample (NIS), which contains upwards of 8 million hospitalization records consisting of administrative codes and demographic information. Bauer-Mehren, A., LePendu, P., Iyer, S. V., Harpaz, R., Leeper, N. J., Shah, N. H. Chapter 9: Analyses Using Disease Ontologies. Mining the pharmacogenomics literature: a survey of the state of the art. Researchers in biomedical informatics use ontologies and terminologies to annotate their data in order to facilitate data integration and translational discoveries. Liu, V. X., Bates, D. W., Wiens, J., Shah, N. H. Early Detection of Adverse Drug Reactions in Social Health Networks: A Natural Language Processing Pipeline for Signal Detection. Electronic health records (EHR) represent a rich and relatively untapped resource for characterizing the true nature of clinical practice and for quantifying the degree of inter-relatedness of medical entities such as drugs, diseases, procedures and devices. Such a method requires specifying an initial set of seed concepts, identified by their concept unique identifiers (CUIs). Learning Effective Treatment Pathways for Type-2 Diabetes from a clinical data warehouse. Repurposing cAMP-Modulating Medications to Promote beta-Cell Replication. Shah, N. H., Bhatia, N., Jonquet, C., Rubin, D., Chiang, A. P., Musen, M. A. BioPortal: ontologies and data resources with the click of a mouse. Jaewon Yang, Julian McAuley, Jure Leskovec, Paea LePendu, Nigam Shah. 
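Starting from seed concepts and expanding them can be illustrated as a traversal of a concept hierarchy: from the seed CUIs, collect every descendant reachable through parent-to-child links. This is a minimal sketch that assumes the hierarchy is already loaded as an in-memory dictionary. The identifiers are hypothetical placeholders, not real UMLS CUIs, and the cited work (which pairs query expansion with formal concept analysis) may follow other relations as well.

```python
from collections import deque
from typing import Dict, Iterable, List, Set


def expand_seeds(seeds: Iterable[str], children: Dict[str, List[str]]) -> Set[str]:
    """Return the seed concepts plus all their descendants, visited breadth-first."""
    expanded: Set[str] = set()
    queue = deque(seeds)
    while queue:
        cui = queue.popleft()
        if cui in expanded:
            continue  # already visited; also protects against cycles in the hierarchy
        expanded.add(cui)
        queue.extend(children.get(cui, []))
    return expanded


# Hypothetical parent -> children map with placeholder identifiers.
children = {
    "C_SEED": ["C_CHILD_1", "C_CHILD_2"],
    "C_CHILD_1": ["C_GRANDCHILD"],
    "C_OTHER": ["C_UNRELATED"],
}
print(sorted(expand_seeds(["C_SEED"], children)))
# ['C_CHILD_1', 'C_CHILD_2', 'C_GRANDCHILD', 'C_SEED']
```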
A significant fraction of UMLS users access the Metathesaurus through a MySQL installation and the Perl programming language. Image processing researchers can extract images and scores for training and testing classification algorithms. Characterizing treatment pathways at scale using the OHDSI network. Hripcsak, G., Duke, J. D., Shah, N. H., Reich, C. G., Huser, V., Schuemie, M. J., Suchard, M. A., Park, R. W., Wong, I. C., Rijnbeek, P. R., van der Lei, J., Pratt, N., Norén, G. N., Li, Y. C., Stang, P. E., Madigan, D., Ryan, P. B. Analyzing Information Seeking and Drug-Safety Alert Response by Health Care Professionals as New Methods for Surveillance. Observational Health Data Sciences and Informatics (OHDSI): Opportunities for Observational Researchers. Accordingly, mirtazapine, an α2-adrenergic receptor antagonist and antidepressant, prevents NE-dependent suppression of β-cell replication. With increasing adoption of electronic health records (EHRs), there is an opportunity to use the free-text portion of EHRs for pharmacovigilance. DOI: 10.1136/amiajnl-2011-000523; Web of Science ID: 000300768100010; PubMed Central ID: PMC3277625. Advanced statistical methods used to analyze high-throughput data (e.g. The Unstructured Information Management Architecture (UIMA) framework and web services are emerging as useful tools for integrating biomedical text mining tools. Next we demonstrated that norepinephrine (NE), a physiologic suppressor of cAMP synthesis in β-cells, impairs β-cell replication via activation of α2-adrenergic receptors. Feasibility of Prioritizing Drug-Drug-Event Associations Found in Electronic Health Records. However, these data are stored in different formats and on different platforms. It allows users to manage microarray data and data-related biological information over the Internet using a web browser. 
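The access pattern described above is a relational installation of the Metathesaurus queried from a script. The sketch below shows the kind of string-to-concept lookup such a script performs. It uses Python's built-in `sqlite3` instead of MySQL and Perl, and creates an in-memory table named after the Metathesaurus concept-names file, MRCONSO, with only four of its columns (CUI, LAT, SAB, STR). The rows and source labels are toy placeholders, not real UMLS content, and the function is a hypothetical helper, not the API of any existing module.

```python
import sqlite3
from typing import List


def cuis_for_string(conn: sqlite3.Connection, term: str, lang: str = "ENG") -> List[str]:
    """Return distinct CUIs whose concept name matches `term` case-insensitively."""
    rows = conn.execute(
        "SELECT DISTINCT CUI FROM MRCONSO "
        "WHERE LAT = ? AND LOWER(STR) = LOWER(?) ORDER BY CUI",
        (lang, term),
    )
    return [cui for (cui,) in rows]


conn = sqlite3.connect(":memory:")
# Only a subset of the real MRCONSO columns, filled with illustrative toy rows.
conn.execute("CREATE TABLE MRCONSO (CUI TEXT, LAT TEXT, SAB TEXT, STR TEXT)")
conn.executemany(
    "INSERT INTO MRCONSO VALUES (?, ?, ?, ?)",
    [
        ("C_TOY_1", "ENG", "SRC_A", "Heart attack"),
        ("C_TOY_1", "ENG", "SRC_B", "Myocardial infarction"),
        ("C_TOY_2", "ENG", "SRC_A", "Hypertension"),
    ],
)
print(cuis_for_string(conn, "myocardial infarction"))   # ['C_TOY_1']
```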
We can also ask "Which disease (or class of diseases) is over-represented in my set of interesting genes or proteins?" It is emerging as a tool to leverage underutilized data sources that can improve pharmacovigilance, including the objective of adverse drug event (ADE) detection and assessment. For example, by annotating known protein mutations with disease terms from the ontologies in BioPortal, Mort et al. To date, there have not been comparisons of the different semantic-similarity approaches on a single ontology. Treatment Patterns for Chronic Comorbid Conditions in Patients With Cancer Using a Large-Scale Observational Data Network. Given the ubiquitous nature of mobile Internet search, we hypothesized that analyzing geo-tagged mobile search logs could enable us to machine-learn predictors of future patient visits. DOI: 10.1186/2041-1480-2-S1-S3. Predicting the need for a reduced drug dose, at first prescription. Once populated with relationships, PHARE-KB (i) can be visualized in the form of a biological network to guide human tasks such as database curation and (ii) can be queried programmatically to guide bioinformatics applications such as the prediction of molecular interactions. Integrating these data will enable us to accelerate the pace of medical discoveries by providing scientists with a unified view of this diverse information. A Comprehensive Analysis of Five Million UMLS Metathesaurus Terms Using Eighteen Million MEDLINE Citations. Using temporal patterns in medical records to discern adverse drug events from indications. Study subjects were assigned to receive a PPI (Prevacid; 30 mg) or a placebo pill once daily for 4 weeks. Whether the general population might also be at risk has not been addressed. Plasma asymmetrical dimethylarginine (ADMA) is an endogenous inhibitor of nitric oxide synthase. 
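The idea of using temporal patterns to separate adverse events from indications can be illustrated with a deliberately simple rule. A condition first recorded before a patient's first exposure to a drug is more plausibly an indication, and one first recorded on or after that date is a candidate adverse event. This sketch assumes dated first mentions have already been extracted from the record; the function name and example timeline are hypothetical, and the cited work may rely on richer temporal features.

```python
from datetime import date
from typing import Dict


def label_conditions(drug_start: date, first_seen: Dict[str, date]) -> Dict[str, str]:
    """Label each condition by when it was first recorded relative to the drug start date.

    Conditions first seen before the start date are labeled possible indications;
    those first seen on or after it are labeled candidate adverse events.
    """
    return {
        condition: "possible indication" if seen < drug_start else "candidate adverse event"
        for condition, seen in first_seen.items()
    }


# Hypothetical patient timeline.
labels = label_conditions(
    date(2010, 3, 1),
    {"atrial fibrillation": date(2009, 11, 5), "bleeding": date(2010, 6, 20)},
)
print(labels)
# {'atrial fibrillation': 'possible indication', 'bleeding': 'candidate adverse event'}
```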
DOI: 10.1136/amiajnl-2013-001612; PubMed Central ID: PMC3932451. Callahan, A., Pernek, I., Stiglic, G., Leskovec, J., Strasberg, H. R., Shah, N. H. Proton Pump Inhibitor Usage and the Risk of Myocardial Infarction in the General Population. Agarwal, V., Podchiyska, T., Banda, J. M., Goel, V., Leung, T. I., Minty, E. P., Sweeney, T. E., Gyang, E., Shah, N. H. Postmarket Surveillance of Point-of-Care Glucose Meters through Analysis of Electronic Medical Records. We show that these methods flag adverse events early (in most cases before an official alert), allow filtering of spurious signals by adjusting for potential confounding, and compile prevalence information. The seven papers and the commentary selected for this supplement span a wide range of topics including: web-based querying over multiple ontologies, integration of data, annotating patent records, NCBO Web services, ontology developments for probabilistic reasoning and for physiological processes, and analysis of the progress of annotation and structural GO changes. The system's indexing workflow processes the text metadata of diverse resources such as datasets from GEO and ArrayExpress to annotate and index them with concepts from appropriate ontologies. Sung, L., Corbin, C., Steinberg, E., Vettese, E., Campigotto, A., Lecce, L., Tomlinson, G. A., Shah, N. Deep phenotyping of 34,128 adult patients hospitalised with COVID-19 in an international network study. Researchers face the challenge of collecting, evaluating and integrating large amounts of diverse information to compose and evaluate a hypothesis. The classifier is applied to 2,362,950 possible drug-disorder pairs, composed of 1602 unique drugs and 1475 unique disorders for which we had data, resulting in 240 high-confidence, well-supported drug-adverse event (AE) associations. 
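One standard way to adjust a drug-event association for a potential confounder is to stratify on it and pool the stratum-specific 2×2 tables with the Mantel-Haenszel odds ratio. The source does not say which adjustment the cited methods use, so this is offered only as a generic sketch of the technique. The strata counts are invented to show a spurious crude signal that disappears after adjustment.

```python
from typing import Iterable, Tuple

# (a, b, c, d) = exposed with event, exposed without event,
#                unexposed with event, unexposed without event
Table = Tuple[int, int, int, int]


def crude_or(table: Table) -> float:
    """Unadjusted odds ratio a*d / (b*c) for a single 2x2 table."""
    a, b, c, d = table
    return (a * d) / (b * c)


def mantel_haenszel_or(strata: Iterable[Table]) -> float:
    """Mantel-Haenszel pooled odds ratio: sum(a*d/n) / sum(b*c/n) over strata."""
    num = den = 0.0
    for a, b, c, d in strata:
        n = a + b + c + d
        if n == 0:
            continue  # an empty stratum contributes nothing
        num += a * d / n
        den += b * c / n
    return num / den if den else float("nan")


# Invented strata (e.g., two levels of a confounder); within each stratum the OR is 1.
strata = [(40, 60, 8, 12), (2, 18, 10, 90)]
pooled_table = tuple(sum(col) for col in zip(*strata))  # (42, 78, 18, 102)
print(round(crude_or(pooled_table), 2))   # 3.05
print(mantel_haenszel_or(strata))         # 1.0
```

Here the crude odds ratio suggests a threefold association, but it arises only because exposure and the event are both more common in one stratum; the stratified estimate of 1.0 reflects the absence of association within each stratum.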
We have deployed the mapping and ontology-driven querying tools at the TMAD site for general use. We have demonstrated that we can effectively map the diagnosis-related terms describing a sample in TMAD to the NCI Thesaurus (NCI-T). Features intrinsic to Metathesaurus terms (well-formedness, length and language) generalise easily across clinical institutions, but term frequencies should be adapted with caution. By mining the medical records of over 110 million patients, we examine the extent to which Mendelian variation contributes to complex disease risk. Agarwal, V., Han, L., Madan, I., Saluja, S., Shidham, A., Shah, N. H. LEARNING ATTRIBUTES OF DISEASE PROGRESSION FROM TRAJECTORIES OF SPARSE LAB VALUES. Our extracted relationships have 70-87.7% precision and involve not only key pharmacogenomic (PGx) entities such as genes, drugs, and phenotypes (e.g., VKORC1, warfarin, clotting disorder), but also critical entities that are frequently modified by these key entities (e.g., VKORC1 polymorphism, warfarin response, clotting disorder treatment). Yet, the vast majority of electronic health data lies embedded within the free text of clinical notes and is not gathered into centralized repositories. After a 2-week washout period, participants were crossed over to receive the alternate treatment for the ensuing 4 weeks. 2015. The result is an expanding family of ontologies designed to be interoperable and logically well formed and to incorporate accurate representations of biological reality. Accurate stratification of wounds for risk of slow healing may help guide treatment and referral decisions. Disease status is often summarized by repeated recordings of one or more physiological measures. The meeting outcome underscored the need to address a number of important policy and technical considerations in order to realize the potential of personalized or precision medicine in actual clinical contexts. 
An ontology-neutral framework for enrichment analysis. This work provides new mechanistic insights into cAMP-dependent growth regulation of β-cells and highlights the potential of commonly prescribed medications to influence β-cell growth. Musen, M. A., Shah, N. H., Noy, N. F., Dai, B. Y., Dorf, M., Griffith, N., Buntrok, J., Jonquet, C., Montegut, M. J., Rubin, D. L. A system for ontology-based annotation of biomedical data. Rates of Co-infection Between SARS-CoV-2 and Other Respiratory Pathogens. This large amount of pathway data in a computable form will support visualization, analysis and biological discovery. Recently, several studies in patients with acute coronary syndrome have raised the concern that use of PPIs in these patients may increase their risk of major adverse cardiovascular events. The Effectiveness of Multitask Learning for Phenotyping with Electronic Health Records Data. Noy, N. F., Shah, N. H., Whetzel, P. L., Dai, B., Dorf, M., Griffith, N., Jonquet, C., Rubin, D. L., Storey, M., Chute, C. G., Musen, M. A. Ontology-driven indexing of public datasets for translational bioinformatics. Some researchers have recently used structured databases of patient medical records and health insurance claims, going beyond the current paradigm of using spontaneous reporting systems like AERS, to detect drug-safety signals. Of concern, this adverse mechanism is also likely to extend to the general population using PPIs. We use the formalism of finite model theory in this work. Development and utility assessment of a machine learning bloodstream infection classifier in pediatric patients receiving cancer treatments. Our approach achieves a discrimination accuracy of 0.85, measured as the area under the receiver operating characteristic curve (AUC), for the reference set of well-established ADEs and an AUC of 0.68 for the reference set of recently labeled ADEs. 
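AUC values such as 0.85 and 0.68 can be interpreted directly as a probability. An AUC is the chance that a randomly chosen positive (for example, a known ADE pair) receives a higher score than a randomly chosen negative, with ties counting one half. The sketch below computes AUC exactly that way, using the pairwise Mann-Whitney formulation. The scores are made up for illustration.

```python
from typing import Sequence


def auc(pos_scores: Sequence[float], neg_scores: Sequence[float]) -> float:
    """Area under the ROC curve via pairwise (Mann-Whitney) comparisons.

    Runs in O(len(pos) * len(neg)) time, which is fine for illustration but
    not for very large reference sets.
    """
    if not pos_scores or not neg_scores:
        raise ValueError("need at least one positive and one negative score")
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a correct ordering
    return wins / (len(pos_scores) * len(neg_scores))


print(round(auc([0.9, 0.8, 0.4], [0.7, 0.3, 0.4]), 3))   # 0.833
```

An AUC of 0.5 corresponds to random ordering, so 0.68 on recently labeled ADEs indicates a weaker, but still better-than-chance, separation than 0.85 on well-established ones.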
The lack of standard reporting for experiment variables and results also makes experiment replicability a significant challenge. DOI: 10.1093/jamia/ocv102. Spinal cord injury (SCI) experimental methods, data and domain knowledge are locked in the largely unstructured text of scientific publications, making large-scale integration with existing bioinformatics resources and subsequent analysis infeasible. Our analysis clearly reveals elevated risks for myocardial infarction in rheumatoid arthritis patients taking Vioxx (odds ratio 2.06) before 2005. Our results show that it is possible to apply annotation analysis methods for testing hypotheses about drug safety using electronic medical records. Wittkop, T., Teravest, E., Evani, U. S., Fleisch, K. M., Berman, A. E., Powell, C., Shah, N. H., Mooney, S. D. Practice-based evidence: profiling the safety of cilostazol by text-mining of clinical notes. B., Zhang, L., Zhuk, O., Prieto-Alhambra, D., Ryan, P. Occurrence and Timing of Subsequent SARS-CoV-2 RT-PCR Positivity Among Initially Negative Patients. We are developing software tools for computer-aided hypothesis design and evaluation, and we would like our tools to take advantage of the information stored in these repositories. We have previously developed methods to map text annotations of tissue microarrays to concepts in the NCI Thesaurus and SNOMED-CT. Leeper, N. J., Bauer-Mehren, A., Iyer, S. V., LePendu, P., Olson, C., Shah, N. H. Mining Biomedical Ontologies and Data Using RDF Hypergraphs. This dataset can be leveraged to quantitatively assess comorbidity, drug-drug, and drug-disease patterns for a range of clinical, epidemiological, and financial applications. Here we describe a preliminary analysis of search logs from healthcare professionals as a source for detecting adverse drug events. Lee, W., Shah, N., Sundlass, K., Musen, M. UMLS-Query: a perl module for querying the UMLS. 
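An odds ratio such as the 2.06 reported for Vioxx compares the odds of the event among exposed patients with the odds among unexposed patients. The sketch below computes an odds ratio from a 2×2 table, together with an approximate 95% confidence interval on the log scale (Woolf's method). It is a generic illustration rather than the study's analysis, and the counts are invented.

```python
import math
from typing import Tuple


def odds_ratio_ci(a: int, b: int, c: int, d: int, z: float = 1.96) -> Tuple[float, float, float]:
    """Odds ratio with a Woolf (log-scale) confidence interval.

    a: exposed with event, b: exposed without event,
    c: unexposed with event, d: unexposed without event.
    All four cells must be positive.
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("all cells must be positive (consider a continuity correction)")
    or_ = (a * d) / (b * c)
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # standard error of log(OR)
    lo = math.exp(math.log(or_) - z * se)
    hi = math.exp(math.log(or_) + z * se)
    return or_, lo, hi


# Invented counts: 30 of 1000 exposed and 15 of 1000 unexposed patients had the event.
or_, lo, hi = odds_ratio_ci(30, 970, 15, 985)
print(f"OR={or_:.2f} (95% CI {lo:.2f}-{hi:.2f})")
```

An interval that excludes 1 indicates an association unlikely to be explained by sampling noise alone. It does not rule out confounding, which still needs to be addressed separately.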
We trained a highly accurate predictive model that detects novel off-label uses among 1,602 unique drugs and 1,472 unique indications. Web of Science ID: 000265602500002; PubMed Central ID: PMC2646250. We demonstrate that our approach enables ontology-based querying and integration of tissue and gene expression microarray data. Web of Science ID: 000275419900014; DOI: 10.1186/2041-1480-1-S1-I1; Web of Science ID: 000297613200031. Sitagliptin was the most effective second-line therapy, and as effective as metformin as a first-line therapy. Callahan, Alison, Igor Pernek, Gregor Stiglic, Jure Leskovec, Howard R. Strasberg and Nigam H. Shah. To accelerate phenotype discovery, researchers have used machine learning to find patterns in electronic health records, but have often been thwarted by missing data, sparsity, and data heterogeneity. Kim, D., Quinn, J., Pinsky, B., Shah, N. H., Brown, I. Based on clinical observation, we hypothesized that allergic conditions are associated with chronic uveitis in juvenile idiopathic arthritis patients. This study is a retrospective cohort study using Stanford's clinical data warehouse containing data from Lucile Packard Children's Hospital from 2000-2011 to analyze patient characteristics associated with chronic uveitis in a large juvenile idiopathic arthritis cohort. To validate our results, we evaluate our performance on a gold standard of 1698 drug-drug interactions (DDIs) curated from existing knowledge bases, as well as against DDI associations signaled directly from FAERS using established methods. Our method achieves good performance, as measured by our gold standard (area under the receiver operating characteristic (ROC) curve >80%), on two independent EHR datasets, and the performance is comparable to that of signaling DDIs from FAERS.
