Extracting noun phrases for all of MEDLINE
Published in: | Proceedings. AMIA Symposium. - 1998. - (1999) from: 23., pages 671-5 |
---|---|
Main author: | |
Other authors: | , , |
Format: | Article |
Language: | English |
Published: | 1999 |
Access to the parent work: | Proceedings. AMIA Symposium |
Subjects: | Journal Article; Research Support, U.S. Gov't, Non-P.H.S. |
Summary: | A natural language parser that could extract noun phrases for all medical texts would be of great utility in analyzing content for information retrieval. We discuss the extraction of noun phrases from MEDLINE, using a general parser not tuned specifically for any medical domain. The noun phrase extractor is made up of three modules: tokenization, part-of-speech tagging, and noun phrase identification. Using our program, we extracted noun phrases from the entire MEDLINE collection, encompassing 9.3 million abstracts. Over 270 million noun phrases were generated, of which 45 million were unique. The quality of these phrases was evaluated by examining all phrases from a sample collection of abstracts. The precision and recall of the phrases from our general parser compared favorably with those from three other parsers we had previously evaluated. We are continuing to improve our parser and evaluate our claim that a generic parser can effectively extract all the different phrases across the entire medical literature. |
---|---|
Description: | Date Completed: 01.02.2000; Date Revised: 13.11.2018; Published: Print; Citation Status: MEDLINE |
ISSN: | 1531-605X |