Image Captioning using a Developed Architecture of Deep Neural Networks
Type: Thesis
Degree level: Master's
Title: Image Captioning using a Developed Architecture of Deep Neural Networks
Presenter: Zahra Famil Sattari
Supervisor: Dr. Hassan Khotanlou
Advisors:
Examiners: Dr. Mansouri, Dr. Mohammadi
Time and date of presentation: 16/2/2022
Place of presentation: Computer Department
Abstract: Image caption generation is an interdisciplinary research field at the intersection of computer vision and natural language processing that has attracted much attention. In recent years it has found applications in many areas, such as medical diagnosis, image indexing, and linking images to text. Generating a caption requires identifying the important objects in an image, their properties, and how they relate to each other, and then producing sentences that are semantically and syntactically correct. Reported results show that perceiving an image the way a human does remains a difficult task for a machine, yet advances in artificial intelligence have paved the way for progress. Most proposed methods for caption generation follow the encoder-decoder framework, in which each word is generated from the image features and the previously generated words. Given the results obtained so far, there is still considerable room for improving evaluation scores and caption quality. A further challenge is that most existing methods concentrate on the recurrent, caption-generating part of the network and neglect the effect of the extracted features. The proposed method also uses the encoder-decoder framework: the encoder employs ResNet to extract global features, and the decoder consists of three main components: an Attention-LSTM, a Language-LSTM, and an attention layer. The attention mechanism uses local evidence to better represent features and support reasoning during caption generation. The method improved the METEOR and ROUGE evaluation metrics.
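To make the described pipeline concrete, the following is a minimal PyTorch sketch of such an encoder-decoder: a ResNet backbone producing spatial and global image features, and a decoder combining an Attention-LSTM, an additive attention layer, and a Language-LSTM. Layer sizes, the vocabulary size, and the exact wiring are illustrative assumptions, not the thesis implementation.

    # Minimal sketch (not the thesis code) of a ResNet encoder with a
    # two-LSTM, attention-based decoder. All dimensions are assumptions.
    import torch
    import torch.nn as nn
    import torchvision.models as models

    class ResNetEncoder(nn.Module):
        """Extracts a grid of local features plus a global feature from ResNet-101."""
        def __init__(self):
            super().__init__()
            resnet = models.resnet101(weights=None)  # pretrained weights assumed in practice
            self.backbone = nn.Sequential(*list(resnet.children())[:-2])  # drop pool + fc

        def forward(self, images):                    # images: (B, 3, H, W)
            fmap = self.backbone(images)              # (B, 2048, h, w)
            feats = fmap.flatten(2).transpose(1, 2)   # (B, h*w, 2048) local features
            global_feat = feats.mean(dim=1)           # (B, 2048) global feature
            return feats, global_feat

    class Attention(nn.Module):
        """Additive attention over spatial features, conditioned on the hidden state."""
        def __init__(self, feat_dim, hid_dim, att_dim):
            super().__init__()
            self.feat_proj = nn.Linear(feat_dim, att_dim)
            self.hid_proj = nn.Linear(hid_dim, att_dim)
            self.score = nn.Linear(att_dim, 1)

        def forward(self, feats, hidden):
            e = self.score(torch.tanh(self.feat_proj(feats) + self.hid_proj(hidden).unsqueeze(1)))
            alpha = torch.softmax(e, dim=1)           # (B, h*w, 1) attention weights
            return (alpha * feats).sum(dim=1)         # (B, feat_dim) attended context

    class TwoLSTMDecoder(nn.Module):
        """Attention-LSTM decides where to look; Language-LSTM generates the next word."""
        def __init__(self, vocab_size, feat_dim=2048, emb_dim=512, hid_dim=512, att_dim=512):
            super().__init__()
            self.embed = nn.Embedding(vocab_size, emb_dim)
            self.att_lstm = nn.LSTMCell(hid_dim + feat_dim + emb_dim, hid_dim)
            self.attention = Attention(feat_dim, hid_dim, att_dim)
            self.lang_lstm = nn.LSTMCell(feat_dim + hid_dim, hid_dim)
            self.out = nn.Linear(hid_dim, vocab_size)

        def forward(self, feats, global_feat, captions):
            B, T = captions.shape
            h1 = c1 = h2 = c2 = feats.new_zeros(B, self.out.in_features)
            logits = []
            for t in range(T):
                word = self.embed(captions[:, t])
                h1, c1 = self.att_lstm(torch.cat([h2, global_feat, word], dim=1), (h1, c1))
                context = self.attention(feats, h1)
                h2, c2 = self.lang_lstm(torch.cat([context, h1], dim=1), (h2, c2))
                logits.append(self.out(h2))
            return torch.stack(logits, dim=1)         # (B, T, vocab_size) next-word scores

    # Illustrative forward pass with random data.
    encoder, decoder = ResNetEncoder(), TwoLSTMDecoder(vocab_size=10000)
    images = torch.randn(2, 3, 224, 224)
    captions = torch.randint(0, 10000, (2, 12))
    feats, global_feat = encoder(images)
    scores = decoder(feats, global_feat, captions)

In this sketch the Attention-LSTM attends over the spatial ResNet features at each time step, and the Language-LSTM turns the attended context into word scores; training and beam-search decoding are omitted.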