José Alejandro Reyes Ortiz, Beatriz Adriana González Beltrán, Mireya Tovar Vidal



The automatic processing of clinical texts has gained relevance in recent years because large amounts of unstructured electronic information are generated daily. Such processing can support clinical decision making, for example when establishing a treatment or reaching a diagnosis. This paper presents a supervised approach to classifying clinical reports with the Support Vector Machine (SVM) algorithm, using linguistic information from the texts to support the diagnosis of four types of cancer: digestive (stomach), lung, breast and skin cancer. The contribution of linguistic information, namely the use of verbs, nouns and adjectives, is evaluated on the set of clinical reports. The evaluation results are promising and provide a reference point for processing clinical texts in support of clinical diagnosis.

Keywords: cancer diagnosis support, linguistic features, natural language processing, text classification.
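
The approach summarized in the abstract (keep only verbs, nouns and adjectives, then train an SVM over the four cancer classes) can be illustrated with a minimal sketch. This is not the authors' implementation: the toy reports, the tiny part-of-speech lexicon and every function name below are hypothetical, and a Pegasos-style subgradient trainer for a linear SVM stands in for whatever SVM library the paper actually used. A real pipeline would rely on a proper part-of-speech tagger and real clinical reports.

```python
import random
from collections import Counter

# Hypothetical toy part-of-speech lexicon (assumption); a real system would use a POS tagger.
POS_LEXICON = {}
for tag, words in {
    "NOUN": "patient cough lung nodule skin lesion mole breast lump pain ulcer bleeding",
    "ADJ": "persistent chronic pigmented irregular palpable gastric",
    "VERB": "reports shows",
    "OTHER": "the a with on in and",
}.items():
    for word in words.split():
        POS_LEXICON[word] = tag

KEEP = {"NOUN", "VERB", "ADJ"}  # linguistic categories named in the abstract


def features(text, keep=KEEP):
    """Bag of words restricted to tokens whose POS tag is in `keep`."""
    return Counter(t for t in text.lower().split() if POS_LEXICON.get(t) in keep)


def score(weights, x):
    """Linear decision value w . x for a sparse feature vector."""
    return sum(weights.get(f, 0.0) * v for f, v in x.items())


def train_linear_svm(samples, labels, epochs=50, lam=0.01, seed=0):
    """Binary linear SVM (labels +1/-1) via Pegasos-style hinge-loss subgradient steps."""
    rng = random.Random(seed)
    weights, t = {}, 0
    order = list(range(len(samples)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            t += 1
            x, y = samples[i], labels[i]
            violated = y * score(weights, x) < 1.0  # margin check with current weights
            for f in weights:                        # regularization shrink: factor 1 - eta*lam
                weights[f] *= 1.0 - 1.0 / t
            if violated:                             # hinge-loss subgradient step
                eta = 1.0 / (lam * t)
                for f, v in x.items():
                    weights[f] = weights.get(f, 0.0) + eta * y * v
    return weights


def train_one_vs_rest(texts, labels, **kwargs):
    """One binary SVM per class; each treats its class as +1 and the rest as -1."""
    vectors = [features(t) for t in texts]
    return {
        c: train_linear_svm(vectors, [1 if l == c else -1 for l in labels], **kwargs)
        for c in sorted(set(labels))
    }


def predict(models, text):
    """Return the class whose SVM gives the highest decision value."""
    x = features(text)
    return max(models, key=lambda c: score(models[c], x))


# Invented toy reports (assumption), two per cancer class.
TEXTS = [
    "The patient reports a persistent cough", "Lung nodule with chronic cough",
    "The patient shows a pigmented skin lesion", "Irregular mole on the skin",
    "Palpable breast lump", "The patient reports breast pain",
    "Gastric ulcer with bleeding", "The patient reports gastric pain",
]
LABELS = ["lung", "lung", "skin", "skin", "breast", "breast", "digestive", "digestive"]

models = train_one_vs_rest(TEXTS, LABELS)
print(predict(models, "chronic cough"))
print(predict(models, "pigmented lesion on the skin"))
```

Passing a different `keep` set to `features` (for example only `{"NOUN"}`) gives a simple way to compare the contribution of each linguistic category, which is the kind of evaluation the abstract describes.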

Full text: pp. 1347-1361 (PDF)


