Bicocca Open Archive

DoSSIER at MedVidQA 2022: Text-based Approaches to Medical Video Answer Localization Problem

Kusa, W; Peikos, G; Espitia, O; Hanbury, A; Pasi, G

2022

Abstract

This paper describes our contribution to the Answer Localization track of the MedVidQA 2022 Shared Task. We propose two answer localization approaches that use only textual information extracted from the video. In particular, our approaches exploit the text extracted from the video's transcripts, along with the text displayed in the video's frames, to create a set of features that represents the video's textual information. We then employ four different models to measure the similarity between a video segment and the corresponding question, and two different methods to obtain the start and end times of the identified answer. One method is based on a random forest regressor, whereas the other uses an unsupervised peak detection model to detect the answer's start time. Our findings suggest that, for this task, leveraging only text-related features (transmitted either verbally or visually) and using a small amount of training data lead to significant improvements over the benchmark Video Span Localization model, which is based on deep neural networks.
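The pipeline outlined in the abstract (score each video segment against the question, then turn the scores into a time span) can be illustrated with a minimal sketch. This is not the authors' implementation: a plain bag-of-words cosine similarity stands in for the four similarity models, a simple arg-max with neighbour expansion stands in for the unsupervised peak detection model, and the segment data, the function names, and the `keep_ratio` parameter are all hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def score_segments(question, segments):
    """Score every segment's text (transcript or on-screen) against the question."""
    q = Counter(tokenize(question))
    return [cosine(q, Counter(tokenize(s["text"]))) for s in segments]


def localize_by_peak(question, segments, keep_ratio=0.8):
    """Pick the best-scoring segment and extend the span over adjacent
    segments whose score is at least keep_ratio times the peak score.
    keep_ratio is an illustrative assumption, not a value from the paper."""
    if not segments:
        return None
    scores = score_segments(question, segments)
    peak = max(range(len(scores)), key=scores.__getitem__)
    threshold = keep_ratio * scores[peak]
    lo = hi = peak
    while lo > 0 and 0 < scores[lo - 1] >= threshold:
        lo -= 1
    while hi < len(scores) - 1 and 0 < scores[hi + 1] >= threshold:
        hi += 1
    return segments[lo]["start"], segments[hi]["end"]


# Hypothetical transcript segments with start and end times in seconds.
example_segments = [
    {"start": 0.0, "end": 10.0,
     "text": "welcome to our channel today we talk about shoulder health"},
    {"start": 10.0, "end": 20.0,
     "text": "first stretch your shoulder gently across the chest"},
    {"start": 20.0, "end": 30.0,
     "text": "hold the shoulder stretch for thirty seconds"},
    {"start": 30.0, "end": 40.0,
     "text": "thanks for watching please subscribe"},
]
example_question = "how to stretch the shoulder"

print(localize_by_peak(example_question, example_segments))
```

In this toy example the two middle segments share the most vocabulary with the question, so the predicted span covers them and leaves out the introduction and the sign-off. A supervised alternative, such as the random forest regressor mentioned in the abstract, would instead learn the start and end times from features like these scores.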

Contribution type	paper
Keywords	NLP, answer localization
Content language	English
Conference name	21st Workshop on Biomedical Language Processing, BioNLP 2022 at the Association for Computational Linguistics Conference, ACL 2022 - 26 May 2022
Conference year	2022
Proceedings title	Proceedings of the Annual Meeting of the Association for Computational Linguistics
Proceedings volume ISBN	9781955917278
Series	PROCEEDINGS OF THE CONFERENCE - ASSOCIATION FOR COMPUTATIONAL LINGUISTICS. MEETING
Publication date	2022
First page	432
Last page	440
DOI	https://dx.doi.org/10.18653/v1/2022.bionlp-1.43
Alternative URL	https://aclanthology.org/2022.bionlp-1.43
Full text	open
Citation	Kusa, W., Peikos, G., Espitia, O., Hanbury, A., Pasi, G. (2022). DoSSIER at MedVidQA 2022: Text-based Approaches to Medical Video Answer Localization Problem. In Proceedings of the Annual Meeting of the Association for Computational Linguistics (pp. 432-440). Association for Computational Linguistics [10.18653/v1/2022.bionlp-1.43].
Appears in types	02 - Conference contribution

Files in this item:

File	Access	Attachment type	License	Size	Format
10281-441079_VoR.pdf	open access	Publisher’s Version (Version of Record, VoR)	Creative Commons	728.5 kB	Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/10281/441079
