diff --git a/isicg_msc/abstract.tex b/isicg_msc/abstract.tex new file mode 100644 index 0000000..b394968 --- /dev/null +++ b/isicg_msc/abstract.tex @@ -0,0 +1,25 @@
+
+\thispagestyle{empty}
+
+\begin{center}
+\Large
+{\bf \textgreek{ΕΥΧΑΡΙΣΤΙΕΣ}}\\[15mm]
+\end{center}
+
+\textgreek{ Θα ήθελα να εκφράσω τις ευχαριστίες μου καταρχήν προς τον επιβλέποντα της διπλωματικής μου εργασίας κ. Κεσίδη Αναστάσιο για τα εποικοδομητικά του σχόλια και τον ορθό τρόπο συγγραφής που μου επέδειξε καθώς και για την ειλικρινέστατη συνεργασία που είχαμε.
+Επίσης, οφείλω πολλά στο φίλο και μέντορά μου Δρ. Ευάγγελο Σπύρου, που βρισκόταν πάντα κοντά μου τα τελευταία αυτά χρόνια, που με την καθοδήγηση και την αμέριστη βοήθειά του, έφτασα στο σημείο που βρίσκομαι σήμερα.
+Ευχαριστώ την οικογένειά μου που με τη στήριξη και την υπομονή τους με βοήθησαν να γίνω ένας ολοκληρωμένος άνθρωπος. Τέλος τους φίλους μου, που καταφέραμε να κρατηθούμε όλα αυτά τα χρόνια και στέκονται δίπλα μου στις δύσκολες στιγμές όταν τους έχω ανάγκη.\\ \\Σταύρος Νιάφας\\
+Σεπτέμβριος 2016
+}
+
+\newpage
+
+\begin{center}
+\Large
+{\bf Abstract}\\[15mm]
+\end{center}
+
+During the last few years, the production of digital content has been increasing continuously. Digital cameras have been integrated into computers, mobile phones and tablets and have become an integral part of many daily activities. As a result, the research fields of digital image processing and computer vision have benefited from this content availability and many new research topics have arisen. The goal of this Thesis is to tackle the problem of building recognition, i.e., given a query image of a specific building, to retrieve images of the same building within a database. To this end, we choose to follow a traditional content-based image retrieval approach. We first extract visual features and then, by imposing geometrical constraints on them, we estimate a measure of similarity between two given images. We address both aspects of the aforementioned problem, i.e., detection and retrieval. Moreover, we construct a novel building database, consisting of a set of views of a large number of heterogeneous buildings, captured under several lighting conditions. We use this dataset to evaluate several setups of the proposed approach. Finally, we create a fully functional web-based image retrieval platform, using state-of-the-art technologies, whose purpose is to facilitate experiments while also serving demonstration and educational purposes.
+
+
+\vspace{50mm}
diff --git a/isicg_msc/conclusions.tex b/isicg_msc/conclusions.tex new file mode 100644 index 0000000..cb1db8b --- /dev/null +++ b/isicg_msc/conclusions.tex @@ -0,0 +1,15 @@
+\chapter{Conclusions}\label{conclusions}
+
+In this work we explored the problem of information retrieval in the field of image processing. We studied two widely used image descriptors that have been applied in several applications, and we evaluated them under several setups within the scope of our building retrieval framework.
+We also proposed a novel building database featuring a number of buildings with architectural variations, captured under different illumination conditions and from individual viewpoints.
+The goal of this work was to construct a challenging set of experiments in order to extensively evaluate the aforementioned methods. Feature extraction and image matching methodologies were used to build a ground truth of experimental results, from which knowledge was extracted through specific measures.
+More specifically, we carried out the experimental methodology in two individual scenarios, detection and retrieval. Moreover, we conducted each scenario as a twofold process, defining two subsets of experiments: a) a handpicked subset of 90 photos, which are all the captures of 6 different buildings, and b) a handpicked subset of 60 photos, which are the frontal views of all buildings. Furthermore, concerning the descriptors, each of the aforementioned subsets was evaluated with two sets of input parameters.
+In the case of detection, for the first subset of experiments, SURF features with the default settings achieved the highest performance. The peak value of the above measures was also recorded at an inlier threshold one rank higher. On the other hand, in the case of retrieval, SIFT features with the default settings proved slightly more appropriate.
+Overall, SURF features completed the experiments faster in terms of throughput time.
+Although the Vyronas database proved to be a challenging dataset, the proposed system recorded remarkable results even when we contaminated the dataset with 1000 and 5000 images from the Oxford buildings dataset. The reduction in performance was acceptable and we did not observe any extreme case in either scenario.
+The experimental results led to interesting visual data that are used for demonstration through the proposed web platform. The RetBul platform is built on state-of-the-art open-source technologies, optimized to achieve the best possible performance.
+
+
+\section{Future Work}
+
+In this thesis, the feature extraction techniques have proven reliable; nevertheless, the exploration and use of other blob-based descriptors could prove vital. Moreover, testing more parameters of each descriptor extraction method is a significant step towards the improvement of the overall system's performance. On the other hand, the current work can be further extended by involving a bag-of-visual-words framework. The problem then is to properly quantize the descriptors in order to construct an efficient visual vocabulary. Moreover, a weighting scheme, e.g. tf-idf, could also be incorporated in order to take into account the appearance frequencies of the visual words. It is clear that describing an image by a set of visual words would significantly decrease both the memory requirements and the computational effort of the retrieval process, especially if applied in the proposed web application.
\ No newline at end of file
diff --git a/isicg_msc/doc.tex b/isicg_msc/doc.tex new file mode 100644 index 0000000..3c140ba --- /dev/null +++ b/isicg_msc/doc.tex @@ -0,0 +1,123 @@
+\documentclass[a4paper,12pt,twoside]{report}
+
+%\usepackage{textcomp}
+% FONTS
+\usepackage[utf8x]{inputenc}
+\usepackage[greek,english]{babel}
+
+% package to handle graphics
+\usepackage{graphicx,subfigure}
+% package to handle multiple figures in a minipage
+\usepackage{subfigure}
+% package to extend math capabilities
+\usepackage{mathtools}
+\usepackage{amsmath}
+\usepackage{chngcntr}
+%\usepackage{amsmath,amssymb}
+%package to activate XeTeX font manager
+
+\usepackage{float}
+\usepackage{caption}
+\usepackage{subcaption}
+\usepackage{subfloat}
+\usepackage{rotating}
+\usepackage{hhline}
+\usepackage{multirow}
+\usepackage{tabularx}
+\usepackage{enumerate}
+\usepackage{setspace}
+
+\onehalfspacing
+\usepackage[margin = 2cm]{geometry}
+
+%\parindent=0in % Enable this line if you do not want new paragraphs to be indented.
+ + +% HEADINGS +%\usepackage{sectsty} +\usepackage[normalem]{ulem} + +\usepackage{url} +\usepackage{fancybox} +\usepackage{fancyhdr} +\usepackage{titlesec} + +\renewcommand{\labelitemii}{$\bullet$} +\counterwithout{figure}{chapter} +\counterwithout{table}{chapter} +\setcounter{secnumdepth}{3} + +\begin{document} + + \noindent + \begin{minipage}[c]{0.2\textwidth} %% b or t, default is c + \includegraphics[height=4\baselineskip]{attachments/pictures/athens-logo.jpg} + \end{minipage}% + \begin{minipage}[c][2cm]{0.6\textwidth} + \centering\bfseries\large + \textsc{\Large MASTER ISICG/TIM } \vfill + \end{minipage}% + \begin{minipage}[c]{0.1\textwidth} + \includegraphics[height=2\baselineskip]{attachments/pictures/unilim-logo.png} + \end{minipage} + + + \begin{center} + + \vspace*{1.5cm} + + \textsc{\Large\textbf{ \textgreek{ Πληροφορική, Σύνθεση Εικόνων, Σχεδιασμός Γραφικών, Τεχνολογίες Διαδικτύου και Πολυμέσων} }} \\ + + \vspace*{0.5cm} + + \textsc{\Large\textbf{Informatique, Synthèse d'Images, Conception Graphique, Technologies d’Internet et de Multimédia}} + + \vspace*{4cm} + + \textsc{\LARGE\textbf{\textit{ Image Retrieval Platform for building recognition in urban environments} }} \\ + + \vspace*{0.5cm} + \textsc{\large\textbf{ \textit{Stavros N. Niafas} }} \\ + + \vspace*{3cm} + + \textsc{\large\textbf{Supervisor}}\\ + \textsc{\large Assoc. Prof. Anastasios Kesidis + } + + \vspace*{7cm} + + \textsc{\normalsize{Athens 2016}} + + \end{center} + + \thispagestyle{empty} + + \newgeometry{ + top=1.5in, + bottom=1.5in, + outer=1in, + inner=1in, + } + + +\thispagestyle{empty} + + + +\tableofcontents +\listoffigures +\listoftables + +\include{abstract} +\include{intro} +\include{methodology} +\include{ransac} +\include{experiments} +\include{web_platform} +% \include{discussion} +\include{conclusions} + +\bibliography{docBib} +\bibliographystyle{plain} +\end{document} \ No newline at end of file diff --git a/isicg_msc/docBib.bib b/isicg_msc/docBib.bib new file mode 100644 index 0000000..4e50684 --- /dev/null +++ b/isicg_msc/docBib.bib @@ -0,0 +1,433 @@ +@inproceedings{spyrou2012homography, + title={Homography-based orientation estimation for capsule endoscope tracking}, + author={Spyrou, Evaggelos and Iakovidis, Dimitris K}, + booktitle={Imaging Systems and Techniques (IST), 2012 IEEE International Conference on}, + pages={101--105}, + year={2012}, + organization={IEEE} +} + +@inproceedings{spyrou2013panoramic, + title={Panoramic Visual Summaries for Efficient Reading of Capsule Endoscopy Videos}, + author={Spyrou, Evaggelos and Diamantis, Dimitris and Iakovidis, Dimitris K}, + booktitle={Semantic and Social Media Adaptation and Personalization (SMAP), 2013 8th International Workshop on}, + pages={41--46}, + year={2013}, + organization={IEEE} +} + +@article{zheng2012detection, + title={Detection of lesions during capsule endoscopy: physician performance is disappointing}, + author={Zheng, YuanPu and Hawkins, Lauren and Wolff, Jordan and Goloubeva, Olga and Goldberg, Eric}, + journal={The American journal of gastroenterology}, + volume={107}, + number={4}, + pages={554--560}, + year={2012}, + publisher={Nature Publishing Group} +} + + + +@incollection{bay2006surf, + title={Surf: Speeded up robust features}, + author={Bay, Herbert and Tuytelaars, Tinne and Van Gool, Luc}, + booktitle={Computer Vision--ECCV 2006}, + pages={404--417}, + year={2006}, + publisher={Springer} +} + +@inproceedings{lowe1999object, + title={Object recognition from local scale-invariant features}, + author={Lowe, David G}, + 
booktitle={Computer vision, 1999. The proceedings of the seventh IEEE international conference on},
+  volume={2},
+  pages={1150--1157},
+  year={1999},
+  organization={IEEE}
+}
+
+@inproceedings{viola2001rapid,
+  title={Rapid object detection using a boosted cascade of simple features},
+  author={Viola, Paul and Jones, Michael},
+  booktitle={Computer Vision and Pattern Recognition, 2001. CVPR 2001. Proceedings of the 2001 IEEE Computer Society Conference on},
+  volume={1},
+  pages={I--511},
+  year={2001},
+  organization={IEEE}
+}
+
+@misc{pedersen2011,
+  title={SURF: Feature detection and description},
+  author={Pedersen, Jacob Toft},
+  %booktitle={Study SURF: Feature detection \& description},
+  %volume={1},
+  %pages={I--511},
+  year={2011},
+  %organization={IEEE}
+}
+
+@misc{scottsmith,
+  author = {Scott Smith},
+  year = {2011},
+  title = {Speeded-Up Robust Features},
+  howpublished = {\url{http://www.sci.utah.edu/~fletcher/CS7960/slides/Scott.pdf}}
+}
+
+@article{tomasi1994good,
+  title={Good features to track},
+  author={Tomasi, Carlo and Shi, Jianbo},
+  journal={CVPR94},
+  volume={600},
+  pages={593--593},
+  year={1994}
+}
+
+@misc{changcorner,
+  author = {Chang Shu},
+  year = {2006},
+  title = {Corner Detection},
+  howpublished = {\url{http://people.scs.carleton.ca/~c_shu/Courses/comp4900d/notes/lect9_corner.pdf}},
+  pages={12--19}
+}
+
+@article{lowe2004distinctive,
+  title={Distinctive image features from scale-invariant keypoints},
+  author={Lowe, David G},
+  journal={International journal of computer vision},
+  volume={60},
+  number={2},
+  pages={91--110},
+  year={2004},
+  publisher={Springer}
+}
+
+@misc{rostenpoints,
+  author = {Edward Rosten and Tom Drummond},
+  year = {2012},
+  title = {Fusing points and lines for high performance real-time tracking},
+  howpublished = {\url{http://www.edwardrosten.com/work/rosten_2005_tracking_presentation.pdf}},
+  pages={16--23}
+}
+
+@article{bradski2000opencv,
+  title={The OpenCV library},
+  author={Bradski, Gary and others},
+  journal={Doctor Dobbs Journal},
+  volume={25},
+  number={11},
+  pages={120--126},
+  year={2000},
+  publisher={M AND T PUBLISHING INC}
+}
+
+@article{mysql2004mysql,
+  title={MySQL database server},
+  author={MySQL, AB},
+  journal={Internet WWW page, at URL: http://www.mysql.com (last accessed /1/00)},
+  year={2004}
+}
+
+@article{otwell2015laravel,
+  title={Laravel: the PHP framework for web artisans},
+  author={Otwell, Taylor},
+  journal={[Online]. Available: http://laravel.com/docs/5.1},
+  year={2015}
+}
+
+@misc{debian,
+  title = {Debian Linux},
+  howpublished = {\url{https://www.debian.org/distrib/}}
+}
+
+@misc{cyclades,
+  title = {Cyclades},
+  howpublished = {\url{https://okeanos.grnet.gr/services/cyclades/}}
+}
+
+@misc{okeanos,
+  title = {Okeanos},
+  howpublished = {\url{https://okeanos.grnet.gr}}
+}
+
+@misc{oxford,
+  title = {The Oxford Buildings Dataset},
+  note = "The dataset is available at http://www.robots.ox.ac.uk/~vgg/data/oxbuildings/ (last visit, June
2016)", + url = "http://www.robots.ox.ac.uk/~vgg/data/oxbuildings/" +} + +@article{koukis2013okeanos, + title={\~{} okeanos: Building a Cloud, Cluster by Cluster.}, + author={Koukis, Vangelis and Venetsanopoulos, Constantinos and Koziris, Nectarios}, + journal={IEEE internet computing}, + volume={17}, + number={3}, + year={2013} +} + + +@article{se2002mobile, + title={Mobile robot localization and mapping with uncertainty using scale-invariant visual landmarks}, + author={Se, Stephen and Lowe, David and Little, Jim}, + journal={The international Journal of robotics Research}, + volume={21}, + number={8}, + pages={735--758}, + year={2002}, + publisher={SAGE Publications} +} + +@inproceedings{kato1992database, + title={Database architecture for content-based image retrieval}, + author={Kato, Toshikazu}, + booktitle={SPIE/IS\&T 1992 symposium on electronic imaging: science and technology}, + pages={112--123}, + year={1992}, + organization={International Society for Optics and Photonics} +} + +@inproceedings{agarwal2009building, + title={Building rome in a day}, + author={Agarwal, Sameer and Snavely, Noah and Simon, Ian and Seitz, Steven M and Szeliski, Richard}, + booktitle={2009 IEEE 12th international conference on computer vision}, + pages={72--79}, + year={2009}, + organization={IEEE} +} + +@article{glander2009abstract, + title={Abstract representations for interactive visualization of virtual 3D city models}, + author={Glander, Tassilo and D{\"o}llner, J{\"u}rgen}, + journal={Computers, Environment and Urban Systems}, + volume={33}, + number={5}, + pages={375--387}, + year={2009}, + publisher={Elsevier} +} + +@article{chapelle1999support, + title={Support vector machines for histogram-based image classification}, + author={Chapelle, Olivier and Haffner, Patrick and Vapnik, Vladimir N}, + journal={IEEE transactions on Neural Networks}, + volume={10}, + number={5}, + pages={1055--1064}, + year={1999}, + publisher={IEEE} +} + +@article{rui1999image, + title={Image retrieval: Current techniques, promising directions, and open issues}, + author={Rui, Yong and Huang, Thomas S and Chang, Shih-Fu}, + journal={Journal of visual communication and image representation}, + volume={10}, + number={1}, + pages={39--62}, + year={1999}, + publisher={Elsevier} +} + +@article{flickner1995query, + title={Query by image and video content: The QBIC system}, + author={Flickner, Myron and Sawhney, Harpreet and Niblack, Wayne and Ashley, Jonathan and Huang, Qian and Dom, Byron and Gorkani, Monika and Hafner, Jim and Lee, Denis and Petkovic, Dragutin and others}, + journal={Computer}, + volume={28}, + number={9}, + pages={23--32}, + year={1995}, + publisher={IEEE} +} + +@article{gudivada1995content, + title={Content based image retrieval systems}, + author={Gudivada, Venkat N and Raghavan, Vijay V}, + journal={Computer}, + volume={28}, + number={9}, + pages={18--22}, + year={1995}, + publisher={IEEE} +} + + +@article{spyrou2009concept, + title={Concept detection and keyframe extraction using a visual thesaurus}, + author={Spyrou, Evaggelos and Tolias, Giorgos and Mylonas, Phivos and Avrithis, Yannis}, + journal={Multimedia Tools and Applications}, + volume={41}, + number={3}, + pages={337--373}, + year={2009}, + publisher={Springer} +} + +@article{spyrou2015comparative, + title={Comparative assessment of feature extraction methods for visual odometry in wireless capsule endoscopy}, + author={Spyrou, Evaggelos and Iakovidis, Dimitris K and Niafas, Stavros and Koulaouzidis, Anastasios}, + journal={Computers in 
biology and medicine}, + volume={65}, + pages={297--307}, + year={2015}, + publisher={Pergamon} +} + +@incollection{lindeberg1994linear, + title={Linear scale-space I: Basic theory}, + author={Lindeberg, Tony and ter Haar Romeny, Bart M}, + booktitle={Geometry-Driven Diffusion in Computer Vision}, + pages={1--38}, + year={1994}, + publisher={Springer} +} + +@inproceedings{neubeck2006efficient, + title={Efficient non-maximum suppression}, + author={Neubeck, Alexander and Van Gool, Luc}, + booktitle={18th International Conference on Pattern Recognition (ICPR'06)}, + volume={3}, + pages={850--855}, + year={2006}, + organization={IEEE} +} + +@article{fischler1981random, + title={Random sample consensus: a paradigm for model fitting with applications to image analysis and automated cartography}, + author={Fischler, Martin A and Bolles, Robert C}, + journal={Communications of the ACM}, + volume={24}, + number={6}, + pages={381--395}, + year={1981}, + publisher={ACM} +} + +@book{hartley2003multiple, + title={Multiple view geometry in computer vision}, + author={Hartley, Richard and Zisserman, Andrew}, + year={2003}, + publisher={Cambridge university press} +} + +@article{kalantidis2011viral, + title={Viral: Visual image retrieval and localization}, + author={Kalantidis, Yannis and Tolias, Giorgos and Avrithis, Yannis and Phinikettos, Marios and Spyrou, Evaggelos and Mylonas, Phivos and Kollias, Stefanos}, + journal={Multimedia Tools and Applications}, + volume={51}, + number={2}, + pages={555--592}, + year={2011}, + publisher={Springer US} +} + +@inproceedings{wang2011local, + title={Local intensity order pattern for feature description}, + author={Wang, Zhenhua and Fan, Bin and Wu, Fuchao}, + booktitle={2011 International Conference on Computer Vision}, + pages={603--610}, + year={2011}, + organization={IEEE} +} + +@inproceedings{harris1988combined, + title={A combined corner and edge detector.}, + author={Harris, Chris and Stephens, Mike}, + booktitle={Alvey vision conference}, + volume={15}, + pages={50}, + year={1988}, + organization={Citeseer} +} + +@article{matas2004robust, + title={Robust wide-baseline stereo from maximally stable extremal regions}, + author={Matas, Jiri and Chum, Ondrej and Urban, Martin and Pajdla, Tom{\'a}s}, + journal={Image and vision computing}, + volume={22}, + number={10}, + pages={761--767}, + year={2004}, + publisher={Elsevier} +} + +@inproceedings{shi1994good, + title={Good features to track}, + author={Shi, Jianbo and Tomasi, Carlo}, + booktitle={Computer Vision and Pattern Recognition, 1994. 
Proceedings CVPR'94., 1994 IEEE Computer Society Conference on},
+  pages={593--600},
+  year={1994},
+  organization={IEEE}
+}
+
+@inproceedings{alahi2012freak,
+  title={Freak: Fast retina keypoint},
+  author={Alahi, Alexandre and Ortiz, Raphael and Vandergheynst, Pierre},
+  booktitle={Computer vision and pattern recognition (CVPR), 2012 IEEE conference on},
+  pages={510--517},
+  year={2012},
+  organization={IEEE}
+}
+
+@inproceedings{figat2014performance,
+  title={Performance evaluation of binary descriptors of local features},
+  author={Figat, Jan and Kornuta, Tomasz and Kasprzak, W{\l}odzimierz},
+  booktitle={International Conference on Computer Vision and Graphics},
+  pages={187--194},
+  year={2014},
+  organization={Springer}
+}
+
+@incollection{hassaballah2016image,
+  title={Image Features Detection, Description and Matching},
+  author={Hassaballah, M and Abdelmgeid, Aly Amin and Alshazly, Hammam A},
+  booktitle={Image Feature Detectors and Descriptors},
+  pages={11--45},
+  year={2016},
+  publisher={Springer}
+}
+
+@article{mikolajczyk2005performance,
+  title={A performance evaluation of local descriptors},
+  author={Mikolajczyk, Krystian and Schmid, Cordelia},
+  journal={IEEE transactions on pattern analysis and machine intelligence},
+  volume={27},
+  number={10},
+  pages={1615--1630},
+  year={2005},
+  publisher={IEEE}
+}
+
+@inproceedings{kasutani2001mpeg,
+  title={The MPEG-7 color layout descriptor: a compact image feature description for high-speed image/video segment retrieval},
+  author={Kasutani, Eiji and Yamada, Akio},
+  booktitle={Image Processing, 2001. Proceedings. 2001 International Conference on},
+  volume={1},
+  pages={674--677},
+  year={2001},
+  organization={IEEE}
+}
+
diff --git a/isicg_msc/experiments.tex b/isicg_msc/experiments.tex new file mode 100644 index 0000000..f2b7108 --- /dev/null +++ b/isicg_msc/experiments.tex @@ -0,0 +1,904 @@
+\chapter{Experiments}\label{experiments}
+
+\section{Introduction}
+
+% Here, Stavros, you should say in general terms that we will present the experiments whose purpose is the evaluation etc. Then present the sections, i.e., that in 4.2 we present a new database, Vyronas; in 4.3 the methodology with which the experiments were carried out is described; in 4.4 the results are presented etc.; in 4.5 we extend the experiments to the Oxford dataset; and finally in 4.6 we present a discussion of the results.
+% Consequently, the paragraph you have here about Oxford etc. should be moved to the beginning of 4.5.
+There exists a wide variety of data sets available on the Internet that can be used as a
+benchmark by researchers in the field of image retrieval. In this chapter we present a series of structured experiments in order to evaluate the techniques presented in the previous chapter. More specifically, in Sec.~\ref{vyronas_db} we propose a new dataset, the Vyronas database, with a well-defined structure. Sec.~\ref{evaluation} then introduces the evaluation protocol, while in Sec.~\ref{plots} the experimental results are presented.
+In addition, Sec.~\ref{oxfordsec} elaborates on the extension of our proposed dataset and evaluation methodology with the Oxford buildings dataset, and finally in Sec.~\ref{exp_discussion} we present an extensive discussion of the experimental evaluation.
+
+
+\section{The Vyronas Database}\label{vyronas_db}
+%As it has already been mentioned, for the sake of the evaluation of the proposed platform we have created a new building database.
+In this thesis a new building database named ``Vyronas database'' is proposed.
+%This
+The database comprises 900 photos of 60 buildings in the area of Vyronas, Athens, Greece.
+Fig.~\ref{fig:map_vyronas} illustrates a map of the locations of all these buildings.
+% Moreover in Fig.~\ref{fig:total_seq} we illustrate the frontal face of all buildings.
+%As one may easily observe, we have selected various styles of buildings dated between approx. 1970 till today.
+The database consists of urban buildings with a variety of architectural styles, numbers of floors, construction ages, colors, etc.
+For each building we took a series of 15 photos, from 5 viewpoints and under 3 illumination conditions.
+All photos were taken between April and June 2016. More specifically, we took photos during:
+\newpage
+\begin{enumerate}
+    \item morning (approx. between 10:00AM -- 2:00PM),
+    \item afternoon (approx. 5:00PM -- 7:00PM),
+    \item cloudy days (spanning several times of day, within a period of 7 days).
+\end{enumerate}
+
+\begin{figure}[ht!]
+    \centering
+    \includegraphics[scale=0.45]{attachments/pictures/map_vyronas.png}
+    \caption{Region of Vyronas, Athens, Greece. The red marks denote the locations of the buildings.}
+    \label{fig:map_vyronas}
+\end{figure}
+
+
+Figs.~\ref{fig:building_seq} and \ref{fig:building2_seq} illustrate the sets of photos taken from 2 different buildings.
+We should note that all photos were taken using the same consumer camera, i.e., a Nikon Coolpix P80 (10.1 Mpixels).
+We set the JPEG quality to the best available and the resolution to 2736$\times$3648 px. During the acquisition of the dataset we met various impediments that delayed the process, such as narrow roads or obstructing objects (trees, parked cars) that made it difficult to capture the right viewing angles. In addition, in some awkward cases we were noticed by pedestrians or residents while capturing photos of the buildings.
+
+
+The database is freely available for non-commercial use. Specifically, it can be found on the well-known Flickr~\footnote{\url{https://www.flickr.com/photos/139433384@N07/}} website, which facilitates image browsing, and is also provided as a compressed file containing all photos and the appropriate annotations~\footnote{\url{http://retbul.sniafas.eu/}}.
+
+%Among our initial goals was to provide a dataset that would be used for building retrieval, thus we have made it public by a) uploading to the well-known Flickr website, in order to facilitate browsing~\footnote{\url{https://www.flickr.com/photos/139433384@N07/}} and b) providing a compressed file containing all photos and the appropriate annotation~\footnote{\url{http://retbul.sniafas.eu/todo}}.
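+As an indicative illustration of how the published archive can be consumed programmatically, the following minimal Python sketch groups the photos by building. It assumes a flat directory of JPEG files following the \texttt{<building id>-<photo index>.jpg} naming convention used for the image files in this document; the directory name is only an example and no particular annotation format is assumed.
+\begin{verbatim}
+# Minimal sketch: group Vyronas database photos by building id.
+# Assumes files named "<building>-<index>.jpg" (e.g. 22-1.jpg ... 22-15.jpg)
+# inside a local folder; the folder name "vyronas_db" is only an example.
+import os
+from collections import defaultdict
+
+def group_by_building(root="vyronas_db"):
+    buildings = defaultdict(list)
+    for name in sorted(os.listdir(root)):
+        base, ext = os.path.splitext(name)
+        if ext.lower() != ".jpg" or "-" not in base:
+            continue
+        building_id, photo_idx = base.split("-", 1)
+        buildings[int(building_id)].append(os.path.join(root, name))
+    return buildings
+
+if __name__ == "__main__":
+    groups = group_by_building()
+    print(len(groups), "buildings,",
+          sum(len(v) for v in groups.values()), "photos in total")
+\end{verbatim}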
+
+\begin{figure} %sample of a building
+    \centering
+    \subfigure{\includegraphics[width=20mm]{attachments/images/single_house2/22-1.jpg}}
+    \subfigure{\includegraphics[width=20mm]{attachments/images/single_house2/22-2.jpg}}
+    \subfigure{\includegraphics[width=20mm]{attachments/images/single_house2/22-3.jpg}}
+    \subfigure{\includegraphics[width=20mm]{attachments/images/single_house2/22-4.jpg}}
+    \subfigure{\includegraphics[width=20mm]{attachments/images/single_house2/22-5.jpg}}
+
+    \subfigure{\includegraphics[width=20mm]{attachments/images/single_house2/22-6.jpg}}
+    \subfigure{\includegraphics[width=20mm]{attachments/images/single_house2/22-7.jpg}}
+    \subfigure{\includegraphics[width=20mm]{attachments/images/single_house2/22-8.jpg}}
+    \subfigure{\includegraphics[width=20mm]{attachments/images/single_house2/22-9.jpg}}
+    \subfigure{\includegraphics[width=20mm]{attachments/images/single_house2/22-10.jpg}}
+
+    \subfigure{\includegraphics[width=20mm]{attachments/images/single_house2/22-11.jpg}}
+    \subfigure{\includegraphics[width=20mm]{attachments/images/single_house2/22-12.jpg}}
+    \subfigure{\includegraphics[width=20mm]{attachments/images/single_house2/22-13.jpg}}
+    \subfigure{\includegraphics[width=20mm]{attachments/images/single_house2/22-14.jpg}}
+    \subfigure{\includegraphics[width=20mm]{attachments/images/single_house2/22-15.jpg}}
+
+    \caption{A sample building of the database, including all 15 captures with their visual content changes.}
+    \label{fig:building2_seq}
+\end{figure}
+
+
+\section{Evaluation Protocol}\label{evaluation}
+
+For all experiments we chose not to use the full image size, since, as has been shown, e.g., in~\cite{kalantidis2011viral}, medium image sizes are sufficient for efficient retrieval.
+Thus, all images have been resized to a resolution of 480$\times$640 pixels. From each image we extracted SIFT and SURF features, each with two individual sets of parameters, in order to determine the optimal setting.
+The performance of a region descriptor is measured by the matching criterion, i.e., how well the descriptor represents a scene region. This is measured by comparing the number of corresponding regions
+obtained with the ground truth and the number of correctly matched regions. Matches are the nearest neighbors in the descriptor space~\cite{mikolajczyk2005performance}.
+In this case, two regions of interest are matched if the Euclidean distance between their descriptors $Descriptor_a$ and $Descriptor_b$ is below a threshold $\tau = 0.75$.
+% As we mention in Section~\ref{descriptors} we evaluate the problem with the following variations:
+For each image dataset, two different evaluation scenarios are applied:
+
+% \begin{itemize}
+%     \item a \textit{detection} problem: the purpose on the given query to the system is to define the number of inlier threshold in order to get the best possible F Measure performance.
+%     \item a \textit{retrieval} problem: the purpose on the given query to the system is to define the number of inlier threshold in order to get the best possible MAP performance.
+% \end{itemize}
+\begin{itemize}
+    \item \textit{Detection} scenario: Evaluate the system considering it as a detection problem. For a given query the system returns the set of all images that are relevant to this query. The criterion used is the number of matching inliers. The evaluation is based on the precision and recall measures and aims to define the inlier threshold that provides the best performance in terms of the F-Measure.
+
+    \item \textit{Retrieval} scenario: Evaluate the system considering it as a retrieval problem. For a given query the system calculates a relevance score for each image in the dataset and returns a ranked list of all images based on their score. Again, the criterion used is the number of matching inliers and the aim is to define the inlier threshold that provides the best performance. The evaluation is based on the Mean Average Precision (MAP) measure.
+\end{itemize}
+
+For the SIFT~\footnote{\url{http://docs.opencv.org/3.1.0/d5/d3c/classcv_1_1xfeatures2d_1_1SIFT.html}} algorithm we used the following sets of parameters:
+
+\begin{enumerate}[(a)]
+    \item Default values
+    \item Default values + contrastThreshold : 0.08
+\end{enumerate}
+
+while for the SURF~\footnote{\url{http://docs.opencv.org/3.1.0/d5/df7/classcv_1_1xfeatures2d_1_1SURF.html}} features we used:
+\begin{enumerate}[(a)]
+    \item Default values.
+    \item Default values + Upright : true (U-SURF)
+\end{enumerate}
+% \begin{enumerate}[(a)]
+%     \item Number of features : \textbf{auto}, OctaveLayers : \textbf{auto}, contrastThreshold : 0.04, edgeThreshold : \textbf{auto}, sigma : \textbf{auto}
+%     \item Number of features : \textbf{auto}, OctaveLayers : \textbf{auto}, contrastThreshold : 0.08, edgeThreshold : \textbf{auto}, sigma : \textbf{auto}
+% \end{enumerate}
+% while for the SURF features we used:
+% \begin{enumerate}[(a)]
+%     \item HessianThreshold : 200, nOctaves : 3, nOctaveLayers : \textbf{auto}, extended : \textbf{false} ,upright : \textbf{false}
+%     \item HessianThreshold : 200, nOctaves : 3, nOctaveLayers : \textbf{auto}, extended : \textbf{false} ,upright : \textbf{true}
+% \end{enumerate}
+Concerning the SIFT features, we chose to alter the ``contrast threshold'' parameter, as it is used to filter out weak features in semi-uniform (low-contrast) regions. The larger the threshold, the fewer features are produced by the detector.
+For the SURF features, we chose to alter the ``upright'' parameter, as it switches the computation of each feature's orientation on or off.
+
+We store all features of every image in the local filesystem.
+For the evaluation of the system, two main experiments with different image query subsets are defined, each one serving a different purpose:
+\begin{enumerate}
+    \item a handpicked subset consisting of 90 photos, which are all the photos of 6 different buildings. With this experiment we try to investigate how well the images related to a building are retrieved when single photos from different views of this building are used as queries.
+    For this purpose the 6 buildings are selected to be as visually heterogeneous as possible.
+    This subset is illustrated in Fig.~\ref{fig:exp2_seq}.
+    \item a handpicked subset consisting of 60 photos that illustrate the frontal view of all buildings.
+    Thus, for each building a single query photo is applied. The experiment's purpose is to examine how well the frontal view of a house can be used in order to retrieve all the images of a given building. This subset is illustrated in Fig.~\ref{fig:total_seq}.
+\end{enumerate}
+%We did not use all images as queries, but rather we selected two subsets:
+% \begin{enumerate}
+%     \item a handpicked set consisting of \textbf{90} photos which are actually all photos from 6 buildings. We have chosen these buildings so as to be as heterogeneous as possible. This set is illustrated in Fig.~\ref{fig:exp2_seq}.
+%     \item a handpicked set consisting of \textbf{60} photos that illustrate the frontal faces of all buildings taken either in morning or in noon. The choice of which photo to keep each time was empirical, i.e., we kept the one that seemed to be the ``best'' to an experienced human observer. This set is illustrated in Fig.~\ref{fig:total_seq}.
+% \end{enumerate}
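+As an indicative illustration of the above settings, the following minimal Python sketch (using the OpenCV bindings cited in the footnotes above) shows how the four detector configurations can be instantiated and how two images can be compared through descriptor matching and inlier counting. The nearest-neighbour matching is implemented here as Lowe's ratio test at $0.75$, which is one possible reading of the threshold $\tau$ mentioned above, and the RANSAC reprojection error of $5.0$ pixels is an illustrative choice rather than a value prescribed by this work.
+\begin{verbatim}
+# Minimal sketch of the matching pipeline (OpenCV 3.x with xfeatures2d).
+# The ratio-test value (0.75) and the RANSAC reprojection error (5.0) are
+# illustrative assumptions, not values fixed by this chapter.
+import cv2
+import numpy as np
+
+detectors = {
+    "sift_a": cv2.xfeatures2d.SIFT_create(),                        # defaults
+    "sift_b": cv2.xfeatures2d.SIFT_create(contrastThreshold=0.08),  # setting (b)
+    "surf_a": cv2.xfeatures2d.SURF_create(),                        # defaults
+    "surf_b": cv2.xfeatures2d.SURF_create(upright=True),            # U-SURF
+}
+
+def count_inliers(img1, img2, detector):
+    """Match two grayscale images and return the number of RANSAC inliers."""
+    kp1, des1 = detector.detectAndCompute(img1, None)
+    kp2, des2 = detector.detectAndCompute(img2, None)
+    if des1 is None or des2 is None:
+        return 0
+    matcher = cv2.BFMatcher(cv2.NORM_L2)
+    good = [m for m, n in matcher.knnMatch(des1, des2, k=2)
+            if m.distance < 0.75 * n.distance]          # ratio test
+    if len(good) < 4:                                   # homography needs >= 4
+        return 0
+    src = np.float32([kp1[m.queryIdx].pt for m in good]).reshape(-1, 1, 2)
+    dst = np.float32([kp2[m.trainIdx].pt for m in good]).reshape(-1, 1, 2)
+    H, mask = cv2.findHomography(src, dst, cv2.RANSAC, 5.0)
+    return int(mask.sum()) if mask is not None else 0
+
+query = cv2.imread("query.jpg", cv2.IMREAD_GRAYSCALE)        # already resized
+candidate = cv2.imread("candidate.jpg", cv2.IMREAD_GRAYSCALE)
+print(count_inliers(query, candidate, detectors["surf_a"]))
+\end{verbatim}
+A candidate image would then be considered relevant to the query (detection scenario) or ranked against the other database images (retrieval scenario) according to this inlier count, as described above.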
+For the evaluation of the results we use the following well-known metrics, namely \textit{Precision, Recall and F-Measure}: %precision, recall, f-measure fractions
+\begin{enumerate}
+    \item Precision
+    \begin{equation}\label{eq:precision}
+        \textit{P} = \frac{\text{\# of relevant building retrieved images}}{\text{\# of total retrieved images}}
+    \end{equation}
+
+    \item Recall
+    \begin{equation}\label{eq:recall}
+        \textit{R} = \frac{\text{\# of relevant building retrieved images}}{\text{\# of total relevant images}}
+    \end{equation}
+    \item F-Measure
+    \begin{equation}\label{eq:fmeasure}
+        \textit{F} = \frac{2\cdot P\cdot R}{P+R}
+    \end{equation}
+\end{enumerate}
+
+\textit{Precision} (Eq.~\ref{eq:precision}) is the number of \textit{relevant} retrieved images with respect to the total number of
+retrieved images, while \textit{Recall} (Eq.~\ref{eq:recall}) is the number of \textit{relevant} retrieved images with respect to the
+total number of available images of the same building.
+The \textit{F-Measure} (Eq.~\ref{eq:fmeasure})
+% is a metric of each subset's experiment accuracy and is defined as the weighted
+provides a weighted harmonic mean of the precision and recall of the experiment.
+In Section~\ref{plots} that follows, the reported F-Measure denotes the highest harmonic mean of Precision and Recall at each inlier threshold.
+
+We should note that, in each case, the query image itself is removed from the results, as it is obviously returned with the highest score.
+%\textbf{TODO map fmeasure-best}
+
+For each experiment subset, we also present in Sections~\ref{detection_exp1}, \ref{detection_exp2}, \ref{retrieval_exp1} and \ref{retrieval_exp2} the following types of figures:\\
+For the Detection Scenario in Section~\ref{detection_scenario}:\\
+\begin{itemize}
+    \item Figure with 6 sub-images, one for each of the 6 buildings. Each sub-image depicts the query (view) that gives the best F-Measure performance, with the corresponding F-Measure and Precision values given in its sublabel. (experiment 1)
+    \item Plot of the building id number versus the F-Measure value for each descriptor setting, at the peak inlier threshold. (experiment 1)
+    \item Bar plot, for each descriptor setting at the peak inlier threshold, of the building id number versus the mean F-Measure value over all of its queries. (experiment 2)
+
+\end{itemize}
+
+For the Retrieval Scenario in Section~\ref{retrieval_scenario}: \\
+\begin{itemize}
+    \item Plot of the building id number versus the mean Average Precision value for each descriptor setting, at the peak inlier threshold. (experiment 2)
+    \item Plot of the 11-point Precision-Recall curve over all the results. (experiments 1, 2)
+    \item Bar plot, for each descriptor setting at the peak inlier threshold, of the building id number versus the mean Average Precision value over all of its queries. (experiment 2)
+
+\end{itemize}
+
+\begin{figure}[ht!]
%exp1 building figures + \centering + \subfigure{\includegraphics[width=9mm]{attachments/images/exp2_buildings/3/3-1.jpg}} + \subfigure{\includegraphics[width=9mm]{attachments/images/exp2_buildings/3/3-2.jpg}} + \subfigure{\includegraphics[width=9mm]{attachments/images/exp2_buildings/3/3-3.jpg}} + \subfigure{\includegraphics[width=9mm]{attachments/images/exp2_buildings/3/3-4.jpg}} + \subfigure{\includegraphics[width=9mm]{attachments/images/exp2_buildings/3/3-5.jpg}} + \subfigure{\includegraphics[width=9mm]{attachments/images/exp2_buildings/3/3-6.jpg}} + \subfigure{\includegraphics[width=9mm]{attachments/images/exp2_buildings/3/3-7.jpg}} + \subfigure{\includegraphics[width=9mm]{attachments/images/exp2_buildings/3/3-8.jpg}} + \subfigure{\includegraphics[width=9mm]{attachments/images/exp2_buildings/3/3-9.jpg}} + \subfigure{\includegraphics[width=9mm]{attachments/images/exp2_buildings/3/3-9.jpg}} + \subfigure{\includegraphics[width=9mm]{attachments/images/exp2_buildings/3/3-11.jpg}} + \subfigure{\includegraphics[width=9mm]{attachments/images/exp2_buildings/3/3-12.jpg}} + \subfigure{\includegraphics[width=9mm]{attachments/images/exp2_buildings/3/3-13.jpg}} + \subfigure{\includegraphics[width=9mm]{attachments/images/exp2_buildings/3/3-14.jpg}} + \subfigure{\includegraphics[width=9mm]{attachments/images/exp2_buildings/3/3-15.jpg}} + + \subfigure{\includegraphics[width=9mm]{attachments/images/exp2_buildings/14/14-1.jpg}} + \subfigure{\includegraphics[width=9mm]{attachments/images/exp2_buildings/14/14-2.jpg}} + \subfigure{\includegraphics[width=9mm]{attachments/images/exp2_buildings/14/14-3.jpg}} + \subfigure{\includegraphics[width=9mm]{attachments/images/exp2_buildings/14/14-4.jpg}} + \subfigure{\includegraphics[width=9mm]{attachments/images/exp2_buildings/14/14-5.jpg}} + \subfigure{\includegraphics[width=9mm]{attachments/images/exp2_buildings/14/14-6.jpg}} + \subfigure{\includegraphics[width=9mm]{attachments/images/exp2_buildings/14/14-7.jpg}} + \subfigure{\includegraphics[width=9mm]{attachments/images/exp2_buildings/14/14-8.jpg}} + \subfigure{\includegraphics[width=9mm]{attachments/images/exp2_buildings/14/14-9.jpg}} + \subfigure{\includegraphics[width=9mm]{attachments/images/exp2_buildings/14/14-9.jpg}} + \subfigure{\includegraphics[width=9mm]{attachments/images/exp2_buildings/14/14-11.jpg}} + \subfigure{\includegraphics[width=9mm]{attachments/images/exp2_buildings/14/14-12.jpg}} + \subfigure{\includegraphics[width=9mm]{attachments/images/exp2_buildings/14/14-13.jpg}} + \subfigure{\includegraphics[width=9mm]{attachments/images/exp2_buildings/14/14-14.jpg}} + \subfigure{\includegraphics[width=9mm]{attachments/images/exp2_buildings/14/14-15.jpg}} + + \subfigure{\includegraphics[width=9mm]{attachments/images/exp2_buildings/16/16-1.jpg}} + \subfigure{\includegraphics[width=9mm]{attachments/images/exp2_buildings/16/16-2.jpg}} + \subfigure{\includegraphics[width=9mm]{attachments/images/exp2_buildings/16/16-3.jpg}} + \subfigure{\includegraphics[width=9mm]{attachments/images/exp2_buildings/16/16-4.jpg}} + \subfigure{\includegraphics[width=9mm]{attachments/images/exp2_buildings/16/16-5.jpg}} + \subfigure{\includegraphics[width=9mm]{attachments/images/exp2_buildings/16/16-6.jpg}} + \subfigure{\includegraphics[width=9mm]{attachments/images/exp2_buildings/16/16-7.jpg}} + \subfigure{\includegraphics[width=9mm]{attachments/images/exp2_buildings/16/16-8.jpg}} + \subfigure{\includegraphics[width=9mm]{attachments/images/exp2_buildings/16/16-9.jpg}} + 
\subfigure{\includegraphics[width=9mm]{attachments/images/exp2_buildings/16/16-9.jpg}} + \subfigure{\includegraphics[width=9mm]{attachments/images/exp2_buildings/16/16-11.jpg}} + \subfigure{\includegraphics[width=9mm]{attachments/images/exp2_buildings/16/16-12.jpg}} + \subfigure{\includegraphics[width=9mm]{attachments/images/exp2_buildings/16/16-13.jpg}} + \subfigure{\includegraphics[width=9mm]{attachments/images/exp2_buildings/16/16-14.jpg}} + \subfigure{\includegraphics[width=9mm]{attachments/images/exp2_buildings/16/16-15.jpg}} + + \subfigure{\includegraphics[width=9mm]{attachments/images/exp2_buildings/23/23-1.jpg}} + \subfigure{\includegraphics[width=9mm]{attachments/images/exp2_buildings/23/23-2.jpg}} + \subfigure{\includegraphics[width=9mm]{attachments/images/exp2_buildings/23/23-3.jpg}} + \subfigure{\includegraphics[width=9mm]{attachments/images/exp2_buildings/23/23-4.jpg}} + \subfigure{\includegraphics[width=9mm]{attachments/images/exp2_buildings/23/23-5.jpg}} + \subfigure{\includegraphics[width=9mm]{attachments/images/exp2_buildings/23/23-6.jpg}} + \subfigure{\includegraphics[width=9mm]{attachments/images/exp2_buildings/23/23-7.jpg}} + \subfigure{\includegraphics[width=9mm]{attachments/images/exp2_buildings/23/23-8.jpg}} + \subfigure{\includegraphics[width=9mm]{attachments/images/exp2_buildings/23/23-9.jpg}} + \subfigure{\includegraphics[width=9mm]{attachments/images/exp2_buildings/23/23-9.jpg}} + \subfigure{\includegraphics[width=9mm]{attachments/images/exp2_buildings/23/23-11.jpg}} + \subfigure{\includegraphics[width=9mm]{attachments/images/exp2_buildings/23/23-12.jpg}} + \subfigure{\includegraphics[width=9mm]{attachments/images/exp2_buildings/23/23-13.jpg}} + \subfigure{\includegraphics[width=9mm]{attachments/images/exp2_buildings/23/23-14.jpg}} + \subfigure{\includegraphics[width=9mm]{attachments/images/exp2_buildings/23/23-15.jpg}} + + \subfigure{\includegraphics[width=9mm]{attachments/images/single_house/41-1.jpg}} + \subfigure{\includegraphics[width=9mm]{attachments/images/single_house/41-2.jpg}} + \subfigure{\includegraphics[width=9mm]{attachments/images/single_house/41-3.jpg}} + \subfigure{\includegraphics[width=9mm]{attachments/images/single_house/41-4.jpg}} + \subfigure{\includegraphics[width=9mm]{attachments/images/single_house/41-5.jpg}} + \subfigure{\includegraphics[width=9mm]{attachments/images/single_house/41-6.jpg}} + \subfigure{\includegraphics[width=9mm]{attachments/images/single_house/41-7.jpg}} + \subfigure{\includegraphics[width=9mm]{attachments/images/single_house/41-8.jpg}} + \subfigure{\includegraphics[width=9mm]{attachments/images/single_house/41-9.jpg}} + \subfigure{\includegraphics[width=9mm]{attachments/images/single_house/41-9.jpg}} + \subfigure{\includegraphics[width=9mm]{attachments/images/single_house/41-11.jpg}} + \subfigure{\includegraphics[width=9mm]{attachments/images/single_house/41-12.jpg}} + \subfigure{\includegraphics[width=9mm]{attachments/images/single_house/41-13.jpg}} + \subfigure{\includegraphics[width=9mm]{attachments/images/single_house/41-14.jpg}} + \subfigure{\includegraphics[width=9mm]{attachments/images/single_house/41-15.jpg}} + + \subfigure{\includegraphics[width=9mm]{attachments/images/exp2_buildings/62/62-1.jpg}} + \subfigure{\includegraphics[width=9mm]{attachments/images/exp2_buildings/62/62-2.jpg}} + \subfigure{\includegraphics[width=9mm]{attachments/images/exp2_buildings/62/62-3.jpg}} + \subfigure{\includegraphics[width=9mm]{attachments/images/exp2_buildings/62/62-4.jpg}} + 
\subfigure{\includegraphics[width=9mm]{attachments/images/exp2_buildings/62/62-5.jpg}} + \subfigure{\includegraphics[width=9mm]{attachments/images/exp2_buildings/62/62-6.jpg}} + \subfigure{\includegraphics[width=9mm]{attachments/images/exp2_buildings/62/62-7.jpg}} + \subfigure{\includegraphics[width=9mm]{attachments/images/exp2_buildings/62/62-8.jpg}} + \subfigure{\includegraphics[width=9mm]{attachments/images/exp2_buildings/62/62-9.jpg}} + \subfigure{\includegraphics[width=9mm]{attachments/images/exp2_buildings/62/62-9.jpg}} + \subfigure{\includegraphics[width=9mm]{attachments/images/exp2_buildings/62/62-11.jpg}} + \subfigure{\includegraphics[width=9mm]{attachments/images/exp2_buildings/62/62-12.jpg}} + \subfigure{\includegraphics[width=9mm]{attachments/images/exp2_buildings/62/62-13.jpg}} + \subfigure{\includegraphics[width=9mm]{attachments/images/exp2_buildings/62/62-14.jpg}} + \subfigure{\includegraphics[width=9mm]{attachments/images/exp2_buildings/62/62-15.jpg}} + + \caption{The 1st subset consists of 6 buildings with 15 images each.} + \label{fig:exp2_seq} +\end{figure} +\newpage +\begin{figure}[ht!] %exp2 building figures + \centering + \subfigure{\includegraphics[width=10mm]{attachments/images/total_faces/1.jpg}} + \subfigure{\includegraphics[width=10mm]{attachments/images/total_faces/2.jpg}} + \subfigure{\includegraphics[width=10mm]{attachments/images/total_faces/3.jpg}} + \subfigure{\includegraphics[width=10mm]{attachments/images/total_faces/4.jpg}} + \subfigure{\includegraphics[width=10mm]{attachments/images/total_faces/5.jpg}} + \subfigure{\includegraphics[width=10mm]{attachments/images/total_faces/6.jpg}} + \subfigure{\includegraphics[width=10mm]{attachments/images/total_faces/7.jpg}} + \subfigure{\includegraphics[width=10mm]{attachments/images/total_faces/8.jpg}} + \subfigure{\includegraphics[width=10mm]{attachments/images/total_faces/9.jpg}} + \subfigure{\includegraphics[width=10mm]{attachments/images/total_faces/10.jpg}} + + \subfigure{\includegraphics[width=10mm]{attachments/images/total_faces/11.jpg}} + \subfigure{\includegraphics[width=10mm]{attachments/images/total_faces/12.jpg}} + \subfigure{\includegraphics[width=10mm]{attachments/images/total_faces/13.jpg}} + \subfigure{\includegraphics[width=10mm]{attachments/images/total_faces/14.jpg}} + \subfigure{\includegraphics[width=10mm]{attachments/images/total_faces/15.jpg}} + \subfigure{\includegraphics[width=10mm]{attachments/images/total_faces/16.jpg}} + \subfigure{\includegraphics[width=10mm]{attachments/images/total_faces/17.jpg}} + \subfigure{\includegraphics[width=10mm]{attachments/images/total_faces/18.jpg}} + \subfigure{\includegraphics[width=10mm]{attachments/images/total_faces/19.jpg}} + \subfigure{\includegraphics[width=10mm]{attachments/images/total_faces/20.jpg}} + + \subfigure{\includegraphics[width=10mm]{attachments/images/total_faces/21.jpg}} + \subfigure{\includegraphics[width=10mm]{attachments/images/total_faces/22.jpg}} + \subfigure{\includegraphics[width=10mm]{attachments/images/total_faces/23.jpg}} + \subfigure{\includegraphics[width=10mm]{attachments/images/total_faces/24.jpg}} + \subfigure{\includegraphics[width=10mm]{attachments/images/total_faces/25.jpg}} + \subfigure{\includegraphics[width=10mm]{attachments/images/total_faces/26.jpg}} + \subfigure{\includegraphics[width=10mm]{attachments/images/total_faces/27.jpg}} + \subfigure{\includegraphics[width=10mm]{attachments/images/total_faces/28.jpg}} + \subfigure{\includegraphics[width=10mm]{attachments/images/total_faces/29.jpg}} + 
\subfigure{\includegraphics[width=10mm]{attachments/images/total_faces/30.jpg}} + + \subfigure{\includegraphics[width=10mm]{attachments/images/total_faces/31.jpg}} + \subfigure{\includegraphics[width=10mm]{attachments/images/total_faces/32.jpg}} + \subfigure{\includegraphics[width=10mm]{attachments/images/total_faces/33.jpg}} + \subfigure{\includegraphics[width=10mm]{attachments/images/total_faces/34.jpg}} + \subfigure{\includegraphics[width=10mm]{attachments/images/total_faces/35.jpg}} + \subfigure{\includegraphics[width=10mm]{attachments/images/total_faces/36.jpg}} + \subfigure{\includegraphics[width=10mm]{attachments/images/total_faces/37.jpg}} + \subfigure{\includegraphics[width=10mm]{attachments/images/total_faces/38.jpg}} + \subfigure{\includegraphics[width=10mm]{attachments/images/total_faces/39.jpg}} + \subfigure{\includegraphics[width=10mm]{attachments/images/total_faces/40.jpg}} + + \subfigure{\includegraphics[width=10mm]{attachments/images/total_faces/41.jpg}} + \subfigure{\includegraphics[width=10mm]{attachments/images/total_faces/42.jpg}} + \subfigure{\includegraphics[width=10mm]{attachments/images/total_faces/43.jpg}} + \subfigure{\includegraphics[width=10mm]{attachments/images/total_faces/44.jpg}} + \subfigure{\includegraphics[width=10mm]{attachments/images/total_faces/45.jpg}} + \subfigure{\includegraphics[width=10mm]{attachments/images/total_faces/46.jpg}} + \subfigure{\includegraphics[width=10mm]{attachments/images/total_faces/47.jpg}} + \subfigure{\includegraphics[width=10mm]{attachments/images/total_faces/48.jpg}} + \subfigure{\includegraphics[width=10mm]{attachments/images/total_faces/49.jpg}} + \subfigure{\includegraphics[width=10mm]{attachments/images/total_faces/50.jpg}} + + \subfigure{\includegraphics[width=10mm]{attachments/images/total_faces/51.jpg}} + \subfigure{\includegraphics[width=10mm]{attachments/images/total_faces/52.jpg}} + \subfigure{\includegraphics[width=10mm]{attachments/images/total_faces/53.jpg}} + \subfigure{\includegraphics[width=10mm]{attachments/images/total_faces/54.jpg}} + \subfigure{\includegraphics[width=10mm]{attachments/images/total_faces/55.jpg}} + \subfigure{\includegraphics[width=10mm]{attachments/images/total_faces/56.jpg}} + \subfigure{\includegraphics[width=10mm]{attachments/images/total_faces/57.jpg}} + \subfigure{\includegraphics[width=10mm]{attachments/images/total_faces/58.jpg}} + \subfigure{\includegraphics[width=10mm]{attachments/images/total_faces/59.jpg}} + \subfigure{\includegraphics[width=10mm]{attachments/images/total_faces/60.jpg}} + + + \caption{The 2nd subset consists of the front views from all 60 buildings.} + \label{fig:total_seq} +\end{figure} + + \newgeometry{ + top=0.8in, + bottom=1.0in, + outer=0.7in, + inner=0.7in, + } +\section{Plots}\label{plots} +% We present the extensive evaluation protocol over the next section. +% The two individual subsets of experiments are denoted as ``Experiment 1'' and ``Experiment 2'' in sections~\ref{exp1},~\ref{exp2} respectively. + This section presents the experimental results for both evaluation scenarios when applied to the Vyronas database. + \subsection{Detection Scenario}\label{detection_scenario} %% exp1 + \subsubsection{Experiment 1}\label{detection_exp1} + + Starting with the 1st experiment that uses as queries all the 15 views from 6 buildings. 
+
+  Fig.~\ref{fig:exp1_sift_a} depicts the precision, recall and F-Measure
+  when the SIFT descriptor is used with the default parameters. It is clear that, for each building, the 15 views used as queries did not provide the same performance. That is, some views are more capable of retrieving the relevant building images than others. Fig.~\ref{fig:exp1_sift(a)_f} shows for each building the query that provided the best performance in terms of F-Measure.
+
+  \begin{figure}[hb!]
+    \centering
+    \includegraphics[scale=0.8]{attachments/plots/exp1/results/exp1_sift(a).png}
+    \caption{Precision, Recall, F-Measure plot lines for the $1^\text{st}$ experiment using the SIFT descriptor with default parameters.}
+    \label{fig:exp1_sift_a}
+  \end{figure}
+  \newpage
+
+%   \newgeometry{
+%   top=0.8in,
+%   bottom=1.0in,
+%   outer=0.7in,
+%   inner=0.7in,
+%   }
+  \begin{figure}[ht!] %exp1 building figures
+    \centering
+    \subfigure[\scriptsize{F-Measure: 0.667}]{\includegraphics[width=30mm]{attachments/plots/6subimages/exp1/sift(a)/3.jpg}}
+    \subfigure[\scriptsize{F-Measure: 0.727}]{\includegraphics[width=30mm]{attachments/plots/6subimages/exp1/sift(a)/14.jpg}}
+    \subfigure[\scriptsize{F-Measure: 0.667}]{\includegraphics[width=30mm]{attachments/plots/6subimages/exp1/sift(a)/16.jpg}}
+
+    \subfigure[\scriptsize{F-Measure: 0.667}]{\includegraphics[width=30mm]{attachments/plots/6subimages/exp1/sift(a)/23.jpg}}
+    \subfigure[\scriptsize{F-Measure: 0.727}]{\includegraphics[width=30mm]{attachments/plots/6subimages/exp1/sift(a)/41.jpg}}
+    \subfigure[\scriptsize{F-Measure: 0.750}]{\includegraphics[width=30mm]{attachments/plots/6subimages/exp1/sift(a)/62.jpg}}
+    \caption{The query that provided the best F-Measure results, for each one of the six buildings for the $1^\text{st}$ experiment using the SIFT descriptor with default parameters.}
+    \label{fig:exp1_sift(a)_f}
+  \end{figure}
+
+  In Figs.~\ref{fig:exp1_sift_b} and~\ref{fig:exp1_sift(b)_f} the same results are presented using SIFT descriptors with contrastThreshold$=$0.08. It can be seen that the best performance occurs for an inlier threshold equal to 8 instead of 9, with a slightly lower F-Measure value, though. Comparing Figs.~\ref{fig:exp1_sift(a)_f} and~\ref{fig:exp1_sift(b)_f} we can see that only in two cases (i.e., buildings (d) and (f)) the query providing the best results remains the same.
+
+  \newpage
+
+  \begin{figure}[ht!]
%SIFT1 b
+    \centering
+    \includegraphics[scale=0.8]{attachments/plots/exp1/results/exp1_sift(b).png}
+    \caption{Precision, Recall, F-Measure plot lines for the $1^\text{st}$ experiment using the SIFT descriptor with contrastThreshold$=$0.08.}
+    \label{fig:exp1_sift_b}
+  \end{figure}
+
+  \begin{figure}[H] %exp1 6subimages sift
+    \centering
+    \subfigure[\scriptsize{F-Measure: 0.636}]{\includegraphics[width=30mm]{attachments/plots/6subimages/exp1/sift(b)/3.jpg}}
+    \subfigure[\scriptsize{F-Measure: 0.621}]{\includegraphics[width=30mm]{attachments/plots/6subimages/exp1/sift(b)/14.jpg}}
+    \subfigure[\scriptsize{F-Measure: 0.609}]{\includegraphics[width=30mm]{attachments/plots/6subimages/exp1/sift(b)/16.jpg}}
+
+    \subfigure[\scriptsize{F-Measure: 0.560}]{\includegraphics[width=30mm]{attachments/plots/6subimages/exp1/sift(b)/23.jpg}}
+    \subfigure[\scriptsize{F-Measure: 0.783}]{\includegraphics[width=30mm]{attachments/plots/6subimages/exp1/sift(b)/41.jpg}}
+    \subfigure[\scriptsize{F-Measure: 0.621}]{\includegraphics[width=30mm]{attachments/plots/6subimages/exp1/sift(b)/62.jpg}}
+    \caption{The query that provided the best F-Measure results, for each one of the six buildings for the $1^\text{st}$ experiment using the SIFT descriptor with contrastThreshold$=$0.08.}
+    \label{fig:exp1_sift(b)_f}
+  \end{figure}
+  \newpage
+
+  We proceed likewise with the SURF descriptor. Fig.~\ref{fig:exp1_surf_a} depicts the precision, recall and F-Measure for the default parameters. In Fig.~\ref{fig:exp1_surf(a)_f} it can be seen that only the best views of buildings (b) and (d) differ compared with Fig.~\ref{fig:exp1_sift(a)_f}, while building (f) keeps the same view and the highest F-Measure performance.
+  \begin{figure}[ht!] %exp1 surfa
+    \centering
+    \includegraphics[scale=0.8]{attachments/plots/exp1/results/exp1_surf(a).png}
+    \caption{Precision, Recall, F-Measure plot lines for the $1^\text{st}$ experiment using the SURF descriptor with default parameters.}
+    \label{fig:exp1_surf_a}
+  \end{figure}
+  \newpage
+  \begin{figure}[ht!] %exp1 6subimage figures
+    \centering
+    \subfigure[\scriptsize{F-Measure: 0.667}]{\includegraphics[width=30mm]{attachments/plots/6subimages/exp1/surf(a)/3.jpg}}
+    \subfigure[\scriptsize{F-Measure: 0.667}]{\includegraphics[width=30mm]{attachments/plots/6subimages/exp1/surf(a)/14.jpg}}
+    \subfigure[\scriptsize{F-Measure: 0.600}]{\includegraphics[width=30mm]{attachments/plots/6subimages/exp1/surf(a)/16.jpg}}
+
+    \subfigure[\scriptsize{F-Measure: 0.727}]{\includegraphics[width=30mm]{attachments/plots/6subimages/exp1/surf(a)/23.jpg}}
+    \subfigure[\scriptsize{F-Measure: 0.783}]{\includegraphics[width=30mm]{attachments/plots/6subimages/exp1/surf(a)/41.jpg}}
+    \subfigure[\scriptsize{F-Measure: 0.880}]{\includegraphics[width=30mm]{attachments/plots/6subimages/exp1/surf(a)/62.jpg}}
+    \caption{The query that provided the best F-Measure results, for each one of the six buildings for the $1^\text{st}$ experiment using the SURF descriptor with default parameters.}
+    \label{fig:exp1_surf(a)_f}
+  \end{figure}
+
+  In Figs.~\ref{fig:exp1_surf_b} and~\ref{fig:exp1_surf(b)_f} the same results are presented using the SURF descriptor with the upright parameter enabled.
+  It can be seen that the best performance occurs for an inlier threshold equal to 9 instead of 10, with a slightly lower F-Measure value, though. Comparing Figs.~\ref{fig:exp1_surf(a)_f} and~\ref{fig:exp1_surf(b)_f} we can see that only
+  in three cases (i.e.,
buildings (b), (c) and (d)) is the best view different, while there
+  are two queries (i.e., buildings (e) and (f)) with the highest F-Measure performance.
+  \newpage
+
+  \begin{figure}[ht!] %exp1 surfb
+    \centering
+    \includegraphics[scale=0.8]{attachments/plots/exp1/results/exp1_surf(b).png}
+    \caption{Precision, Recall, F-Measure plot lines for the $1^\text{st}$ experiment using the SURF descriptor with the upright parameter enabled.}
+    \label{fig:exp1_surf_b}
+  \end{figure}
+
+  \begin{figure}[H]
+    \centering
+    \subfigure[\scriptsize{F-Measure: 0.636}]{\includegraphics[width=30mm]{attachments/plots/6subimages/exp1/surf(b)/3.jpg}}
+    \subfigure[\scriptsize{F-Measure: 0.571}]{\includegraphics[width=30mm]{attachments/plots/6subimages/exp1/surf(b)/14.jpg}}
+    \subfigure[\scriptsize{F-Measure: 0.696}]{\includegraphics[width=30mm]{attachments/plots/6subimages/exp1/surf(b)/16.jpg}}
+
+    \subfigure[\scriptsize{F-Measure: 0.545}]{\includegraphics[width=30mm]{attachments/plots/6subimages/exp1/surf(b)/23.jpg}}
+    \subfigure[\scriptsize{F-Measure: 0.783}]{\includegraphics[width=30mm]{attachments/plots/6subimages/exp1/surf(b)/41.jpg}}
+    \subfigure[\scriptsize{F-Measure: 0.783}]{\includegraphics[width=30mm]{attachments/plots/6subimages/exp1/surf(b)/62.jpg}}
+    \caption{The query that provided the best F-Measure results, for each one of the six buildings for the $1^\text{st}$ experiment using the SURF descriptor with the upright parameter enabled.}
+    \label{fig:exp1_surf(b)_f}
+  \end{figure}
+  \newpage
+
+  Fig.~\ref{fig:exp1_bestfm} summarizes the highest F-Measure values recorded for each one of the six buildings. It turns out that in most of the cases the default SURF features provide the best results. It is also evident that while in some cases (i.e., buildings 3 and 41) the four variants of SIFT and SURF provide approximately similar results, there are other cases, like building 62, where the choice of keypoint detector is critical, with SURF providing substantially better results.
+  \begin{figure}[ht!] %% exp1 fmeasure
+    \centering
+    \includegraphics[scale=0.9]{attachments/plots/exp1/fmeasure/bestfm.png}
+    \caption{Each point represents, for each building, the query with the highest F-Measure value at the best inlier threshold.}
+    \label{fig:exp1_bestfm}
+  \end{figure}
+
+  \newpage
+  \subsubsection{Experiment 2}\label{detection_exp2} %%% detection - exp2 sift
+  We continue with the 2nd experiment, which uses as queries the front views of all 60 buildings. Fig.~\ref{fig:exp2_sift_a} depicts the Precision, Recall and F-Measure when the SIFT descriptor is used with the default parameters.
+  Obviously, the 60 views used as queries provide different performances. Figs.~\ref{fig:exp2_sifta_fm} and~\ref{fig:exp2_siftb_fm} show, for each building, the corresponding F-Measure performance of the SIFT descriptor with the default and the altered parameter values, respectively.
+  It is clear that, also in Experiment 2, SIFT with default parameters achieves higher performance in terms of the Recall and F-Measure metrics than with the contrastThreshold setting, while its Precision is slightly lower for inlier thresholds below 9.
+
+  \begin{figure}[hb!]
+    \centering
+    \includegraphics[scale=0.8]{attachments/plots/exp2/results/exp2_sift(a).png}
+    \caption{Precision, Recall, F-Measure plot lines for the $2^\text{nd}$ experiment using the SIFT descriptor with default parameters.}
+    \label{fig:exp2_sift_a}
+  \end{figure}
+
+  \begin{figure}[ht!]
%%% f measure sift + \centering + \includegraphics[scale=0.8]{attachments/plots/exp2/fmeasure/fmeasuresift(a).png} + \caption{F-Measure value performance, for the $2^\text{nd}$ experiment using SIFT descriptor in default setting.} + \label{fig:exp2_sifta_fm} + \end{figure} + + \begin{figure}[H] + \centering + \includegraphics[scale=0.8]{attachments/plots/exp2/results/exp2_sift(b).png} + \caption{Precision, Recall, F-Measure plot lines for the $2^\text{nd}$ experiment using the SIFT descriptor with ContrastThreshold=0.08.} + \label{fig:exp2_sift_b} + \end{figure} + + \begin{figure}[H] + \centering + \includegraphics[scale=0.8]{attachments/plots/exp2/fmeasure/fmeasuresift(b).png} + \caption{F-Measure value performance, for the $2^\text{nd}$ experiment using SIFT descriptor with contrastThreshold=0.08.} + \label{fig:exp2_siftb_fm} + \end{figure} + + We proceed accordingly for the SURF descriptor. We can observe, in contrast to the experiment 1, that the (b) setting with upright enabled, performs better than the default setting. Comparing Figs.~\ref{fig:exp2_surf_a} and~\ref{fig:exp2_surf_b} the best performance occurs to the latest, along with the inlier's threshold equal to 11 instead of 10. + + \begin{figure}[H] %% detection - exp2 surf + \centering + \includegraphics[scale=0.8]{attachments/plots/exp2/results/exp2_surf(a).png} + \caption{ Precision, Recall, F-Measure plot lines for the $2^\text{nd}$ experiment using SURF descriptor with default parameters.} + \label{fig:exp2_surf_a} + \end{figure} + + \begin{figure}[H] %%% fmeasure surf %%% + \centering + \includegraphics[scale=0.8]{attachments/plots/exp2/fmeasure/fmeasuresurf(a).png} + \caption{F-Measure value performance for individual building for overall queries across the highest, using SURF descriptor in default setting.} + \label{fig:exp2_surfa_fm} + \end{figure} + + \begin{figure}[ht!] + \centering + \includegraphics[scale=0.8]{attachments/plots/exp2/results/exp2_surf(b).png} + \caption{ Precision, Recall, F-Measure plot lines for the $2^\text{nd}$ experiment, using SURF descriptor with upright parameter enabled.} + \label{fig:exp2_surf_b} + \end{figure} + + \begin{figure}[H] + \centering + \includegraphics[scale=0.8]{attachments/plots/exp2/fmeasure/fmeasuresurf(b).png} + \caption{F-Measure value performance for individual building for overall queries across the highest using SURF descriptor with upright parameter enabled.} + \label{fig:exp2_surfb_fm} + \end{figure} + \newpage + \subsection{Retrieval Scenario}\label{retrieval_scenario} %% retrieval_scenario + + In case of Retrieval Scenario, for experiment 1, Fig.~\ref{fig:exp1_bestmap} shows the mAP per building in each of descriptor settings. SIFT descriptor in contrastThreshold setting indicates the lowest results overall along with the inlier's threshold. + SIFT in default setting provides higher performance than SURF with upright enabled when both of them are measured in equal inlier's threshold. + Fig.~\ref{fig:exp1_elp} depicts the overall Precision vs Recall 11 points curve where the SURF in default settings performs rather higher in all spectrum in comparison with the remaining descriptors. It is clear that both of the descriptors in default settings perform better than the altered settings. + + + \subsubsection{Experiment 1}\label{retrieval_exp1} + + \begin{figure}[ht!] 
%exp1 map + \centering + \includegraphics[scale=0.9]{attachments/plots/exp1/map/map.png} + \caption{Each point represents mean average precision value per house in the highest inliers peak in overall descriptor's performance.} + \label{fig:exp1_bestmap} + \end{figure} + \newpage + \begin{figure}[ht!] %exp1 map + \centering + \includegraphics[scale=0.9]{attachments/plots/exp1/elp/exp1_epr.png} + \caption{11 points Precision-Recall curve for 1st experiment subset.} + \label{fig:exp1_elp} + \end{figure} + + \newpage + \subsubsection{Experiment 2}\label{retrieval_exp2} + + Following in experiment 2, a series of average precision plots are depicted for all queries + in each of any descriptor setting. + Closing with the 11 point Precision vs Recall curve, where the SURF with upright parameter enabled setting aside with SURF descriptor overall, measured with best performance. + \begin{figure}[ht!] %%% map sift %%% + \centering + \includegraphics[scale=0.8]{attachments/plots/exp2/map/mapsift(a).png} + \caption{Average Precision value performance for individual building for overall queries across the highest of SIFT descriptor in default setting.} + \label{fig:exp2_sifta_map} + \end{figure} + \newpage + \begin{figure}[ht!] + \centering + \includegraphics[scale=0.8]{attachments/plots/exp2/map/mapsift(b).png} + \caption{Average Precision Measure value performance for individual building for overall queries across the highest of SIFT descriptor in contrastThreshold$=$0.08.} + \label{fig:exp2_siftb_map} + \end{figure} + \begin{figure}[H] %%% map surf %%% + \centering + \includegraphics[scale=0.8]{attachments/plots/exp2/map/mapsurf(a).png} + \caption{Average Precision value performance for individual building for overall queries across the highest of SURF descriptor in default setting.} + \label{fig:exp2_surfa_map} + \end{figure} + + \begin{figure}[t!] %exp1 map + \centering + \includegraphics[scale=0.8]{attachments/plots/exp2/map/mapsurf(b).png} + \caption{Average Precision Measure value performance for individual building for overall queries across the highest of SURF descriptor in contrastThreshold$=$0.08.} + \label{fig:exp2_surfb_map} + \end{figure} + + \begin{figure}[H] + \centering + \includegraphics[scale=0.9]{attachments/plots/exp2/elp/exp2_epr.png} + \caption{11 points Precision-Recall curve for 2nd experiment subset.} + \label{fig:exp1_bestmap} + \end{figure} + + \subsection{Throughput Evaluation} + + Closing the section of Vyronas database evaluation, we provide an series of throughput time tables in each of experiment, regarding the corresponded descriptor setting. 
+ + \begin{table}[H]%% epx 1 sift bench + \centering + \large\begin{tabular}{|c|c|c|} + \hline + \textbf{Subset} & \textbf{Total Throughput time} & \textbf{Mean Throughput time per query}\\ \hline + (a) & $59577.50$s & $661.97$s \\ \hline + (b) & $47104.00$s & $523.38$s \\ \cline{1-3} + \end{tabular} + \caption{Total and Mean throughput time for (a) and (b) subsets of parameters of SIFT descriptor in ``Experiment 1''} + \label{table:exp1_sift_bench} + \end{table} + + + \begin{table}[H] + \centering + \large\begin{tabular}{|c|c|c|} + \hline + \textbf{Subset} & \textbf{Total Throughput time} & \textbf{Mean Throughput time per query}\\ \hline + (a) & $96900.00$s & $1076.67$s \\ + (b) & $41229.75$s & $458.11$s \\ \cline{1-3} + \end{tabular} + \caption{Total and Mean throughput time for (a) and (b) subsets of parameters of SURF descriptor in ``Experiment 1''} + \label{table:exp1_surf_bench} + \end{table} + + \begin{table}[H] %% bench table exp2 sift + \centering + \large\begin{tabular}{|c|c|c|} + \hline + \textbf{Subset} & \textbf{Total Throughput time} & \textbf{Mean Throughput time per query}\\ \hline + (a) & $46422.50$s & $515.81$s \\ + (b) & $28355.00$s & $315.06$s \\ \cline{1-3} + \end{tabular} + \caption{Total and Mean throughput time for (a) and (b) subsets of parameters of SIFT descriptor in ``Experiment 2''} + \label{table:exp2_sift_bench} + \end{table} + + \begin{table}[H] + \centering + \large\begin{tabular}{|c|c|c|} + \hline + \textbf{Subset} & \textbf{Total Throughput time} & \textbf{Mean Throughput time per query}\\ \hline + (a) & $49020.00$s & $817.00$s \\ + (b) & $38159.25$s & $635.99$s \\ \cline{1-3} + \end{tabular} + \caption{Total and Mean throughput time for (a) and (b) subsets of parameters of SURF descriptor in ``Experiment 2''} + \label{table:exp2_surf_bench} + \end{table} + + \newpage + + \section{Oxford Buildings}\label{oxfordsec} + + One popular and widely used for performance evaluation + of detectors and descriptors is the standard Oxford dataset~\cite{oxford}. + The dataset consists of image sets with different geometric and photometric transformations (viewpoint change, scale change, image rotation, image blur, illumination change, and JPEG compression) + and with different scene types (structured and textured scenes). + As an extend to our evaluation pool, we chose to add two subsets of 1000 and 5000 images each, from the oxford buildings dataset. + We use the scheme of experiments subset's as elaborated in Section~\ref{vyronas_db} and extended the database with the Oxford buildings dataset. + In order to eliminate redundant experiment cycles for time consuming purposes, we used only the + best proved subset of input parameters for each descriptor, through the evaluation process of the afforementioned subsets in Sections~\ref{retrieval_exp1},~\ref{retrieval_exp2}. + + Thus, in the next Sections~\ref{detection_scenario_ox},~\ref{retrieval_scenario_ox}, we present the aforementioned metrics of Section~\ref{vyronas_db} for the following descriptor settings: + + \begin{table}[H] + \centering + \large\begin{tabular}{|c|c|c|} + \hline + & SIFT & SURF\\ \hline + Experiment 1 & \textbf{a} & \textbf{a} \\ + Experiment 2 & \textbf{a} & \textbf{b} \\ \cline{1-3} + \end{tabular} + \caption{Total and Mean throughput time for (a) and (b) subsets of parameters of SURF descriptor in ``Experiment 2''} + \label{table:exp2_surf_bench} + \end{table} + + We intentionally omitted the throughput time tables due to the very large number of dataset. 
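+
+For reference, the following sketch indicates how the two parameter settings per descriptor, i.e., settings ``(a)'' and ``(b)'', could be instantiated; it is a minimal illustration assuming an OpenCV build that includes the non-free contrib module for SURF, and the function and variable names are ours, not part of the actual evaluation code.
+
+\begin{verbatim}
+# Sketch of the descriptor settings used in the experiments (assumes
+# OpenCV with the contrib "xfeatures2d" module available for SURF).
+import cv2
+
+def make_descriptor(name, setting):
+    """Return a detector/descriptor for setting 'a' (default) or 'b'."""
+    if name == "SIFT":
+        if setting == "a":
+            return cv2.SIFT_create()                    # default parameters
+        return cv2.SIFT_create(contrastThreshold=0.08)  # setting (b)
+    if name == "SURF":
+        surf = cv2.xfeatures2d.SURF_create()            # default parameters
+        if setting == "b":
+            surf.setUpright(True)                       # upright variant
+        return surf
+    raise ValueError(name)
+
+img = cv2.imread("query.jpg", cv2.IMREAD_GRAYSCALE)
+for name in ("SIFT", "SURF"):
+    for setting in ("a", "b"):
+        kps, desc = make_descriptor(name, setting).detectAndCompute(img, None)
+        print(name, setting, len(kps), desc.shape)
+\end{verbatim}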
+ \newpage + \subsection{Detection Scenario}\label{detection_scenario_ox} %% retrieval_scenario + Starting with the applied afforementioned settings, we extract more discreet measurements, + for each case scenarios. Fig.~\ref{fig:oxf_exp1_sifta_1k} depicts the Precision, Recall and F-Measure + in 1k Oxford database when the SIFT descriptor is used in default settings and provided the highest + F-Measure in 9 inlier's threshold while in Fig.~\ref{fig:oxf_exp1_surfa_1k}, SURF descriptor provides the highest F-Measure in 10 along with a slightly higher performance. + + \subsubsection{Oxford 1K}\label{det_ox_1k} + + \begin{figure}[ht!] + \centering + \includegraphics[scale=0.8]{attachments/plots/exp3/1/results/exp3(1)_1k_sift(a).png} + \caption{Precision, Recall, F-Measure plot lines for the $1^\text{st}$ experiment in Oxford 1k using the SIFT descriptor with default parameters.} + \label{fig:oxf_exp1_sifta_1k} + \end{figure} + \newpage + \begin{figure}[ht!] + \centering + \includegraphics[scale=0.8]{attachments/plots/exp3/1/results/exp3(1)_1k_surf(a).png} + \caption{Precision, Recall, F-Measure plot lines for the $1^\text{st}$ experiment in Oxford 1k using the SURF descriptor with default parameters.} + \label{fig:oxf_exp1_surfa_1k} + \end{figure} + + \begin{figure}[H] + \centering + \includegraphics[scale=0.8]{attachments/plots/exp3/2/results/exp3(2)_1k_sift(a).png} + \caption{Precision, Recall, F-Measure plot lines for the $2^\text{nd}$ experiment in Oxford 1k using the SIFT descriptor with default parameters.} + \label{fig:exp3_sift_a1k} + \end{figure} + \begin{figure}[ht!] + \centering + \includegraphics[scale=0.8]{attachments/plots/exp3/2/results/exp3(2)_1k_surf(b).png} + \caption{Precision, Recall, F-Measure plot lines for the $2^\text{nd}$ experiment in Oxford 1k using the SURF descriptor with upright enabled.} + \label{fig:exp3_surf_b1k} + \end{figure} + + + \begin{figure}[H] + \centering + \includegraphics[scale=0.8]{attachments/plots/exp3/2/fmeasure/fmeasure_sift_1K.png} + \caption{F-Measure mean value performance for individual building from overall queries, for the 2nd experiment in Oxford 1k buildings of SIFT descriptor in default settings.} + \label{fig:oxf_exp2_fmeasure_sift_1k} + \end{figure} + + \begin{figure}[H] + \centering + \includegraphics[scale=0.8]{attachments/plots/exp3/2/fmeasure/fmeasure_surf_1K.png} + \caption{F-Measure mean value performance for individual building from overall queries, for the 2nd experiment in Oxford 1k buildings of SURF descriptor with + upright parameter enabled.} + \label{fig:oxf_exp2_fmeasure_surf_1k} + \end{figure} + + \subsubsection{Oxford 5K}\label{det_ox_5k} + Following the same structure for the 5k Oxford database, for the case of the second experiment, + Fig.~\ref{fig:exp3_sift_a5k} the best F-Measure performance occurs for an inlier's threshold equal to 10 while in Fig.~\ref{fig:exp3_surf_b5k} SURF measured in 11 inlier's threshold along with one point of percentage higher. + \newpage + \begin{figure}[ht!] 
+ \centering + \includegraphics[scale=0.8]{attachments/plots/exp3/1/results/exp3(1)_5k_sift(a).png} + \caption{Precision, Recall, F-Measure plot lines for the $1^\text{st}$ experiment in Oxford 5k using the SIFT descriptor with default parameters.} + \label{fig:oxf_exp1_sifta_1k} + \end{figure} + \begin{figure}[H] + \centering + \includegraphics[scale=0.8]{attachments/plots/exp3/1/results/exp3(1)_5k_surf(a).png} + \caption{Precision, Recall, F-Measure plot lines for the $1^\text{st}$ experiment in Oxford 5k using the SURF descriptor with default parameters.} + \label{fig:oxf_exp1_surfa_1k} + \end{figure} + + \begin{figure}[H] + \centering + \includegraphics[scale=0.8]{attachments/plots/exp3/2/results/exp3(2)_5k_sift(a).png} + \caption{Precision, Recall, F-Measure plot lines for the $2^\text{nd}$ experiment in Oxford 5k using the SIFT descriptor with default parameters.} + \label{fig:exp3_sift_a5k} + \end{figure} + \begin{figure}[H] + \centering + \includegraphics[scale=0.8]{attachments/plots/exp3/2/results/exp3(2)_5k_surf(b).png} + \caption{Precision, Recall, F-Measure plot lines for the $2^\text{nd}$ experiment in Oxford 5k using the SURF descriptor with upright parameter enabled.} + \label{fig:exp3_surf_b5k} + \end{figure} + + \begin{figure}[H] + \centering + \includegraphics[scale=0.8]{attachments/plots/exp3/2/fmeasure/fmeasure_sift_5K.png} + \caption{F-Measure mean value performance for individual building from overall queries, for the 2nd experiment in Oxford 5k of SIFT descriptor with default settings.} + \label{fig:oxf_exp2_fmeasure_sift_5k} + \end{figure} + + \begin{figure}[H] + \centering + \includegraphics[scale=0.8]{attachments/plots/exp3/2/fmeasure/fmeasure_surf_5K.png} + \caption{F-Measure mean value performance for individual building from overall queries, for the 2nd experiment in Oxford 5k of SURF descriptor with upright enabled.} + \label{fig:oxf_exp2_fmeasure_surf_5k} + \end{figure} + + An aggregate figure for both of oxford experiments in 1k and 5k depicting the highest F-Measure per building, is shown in Fig.~\ref{fig:oxf_exp1_fmeasure}. + Concerning the 1k, measurements keep a balance between the two descriptors. + While in 5k, SURF descriptor seem to outrun the SIFT. + \begin{figure}[H] + \centering + \includegraphics[scale=0.8]{attachments/plots/exp3/1/fmeasure/exp3(1)_fmeasure.png} + \caption{Each point represents the query with the highest F-Measure value in the highest inliers peak in overall descriptor's performance for 1st subset of experiments in Oxford buildings.} + \label{fig:oxf_exp1_fmeasure} + \end{figure} + \newpage + \subsection{Retrieval Scenario}\label{retrieval_scenario_ox} %% retrieval_scenario + + In the case of retrieval scenario, studying Figs.~\ref{fig:oxf_exp1_epr_1k} and~\ref{fig:oxf_exp2_epr_1k}, it can be seen that for both of two experiments, SURF descriptor + outruns in all the Recall spectrum. + \subsubsection{Oxford 1K}\label{ret_ox_1k} + + \begin{figure}[ht!] + \centering + \includegraphics[scale=0.8]{attachments/plots/exp3/1/elp/exp3(1)_epr_1k.png} + \caption{11 points Precision-Recall curve for 1st subset of experiments of Oxford 1K buildings.} + \label{fig:oxf_exp1_epr_1k} + \end{figure} + \newpage + \begin{figure}[ht!] 
+ \centering + \includegraphics[scale=0.8]{attachments/plots/exp3/2/elp/exp3(2)_epr_1k.png} + \caption{11 points Precision-Recall curve for 2nd subset of experiments of Oxford 1K buildings.} + \label{fig:oxf_exp2_epr_1k} + \end{figure} + + \begin{figure}[H] + \centering + \includegraphics[scale=0.8]{attachments/plots/exp3/2/map/mapsift_1K.png} + \caption{Average Precision value performance for individual building for overall queries in 2nd subset of experiments in Oxford 1K of SIFT descriptor in default setting.} + \label{fig:oxf_exp2_map_sift_1k} + \end{figure} + + \begin{figure}[ht!] + \centering + \includegraphics[scale=0.8]{attachments/plots/exp3/2/map/mapsurf_1K.png} + \caption{Average Precision value performance for individual building for overall queries in 2nd subset of experiments in Oxford 1K of SURF descriptor with upright enabled.} + \label{fig:oxf_exp2_map_surf_1k} + \end{figure} + + \subsubsection{Oxford 5K}\label{ret_ox_5k} + + \begin{figure}[H] + \centering + \includegraphics[scale=0.8]{attachments/plots/exp3/1/elp/exp3(1)_epr_5k.png} + \caption{11 points Precision-Recall curve for 1st subset of experiments of Oxford 5K buildings.} + \label{fig:oxf_exp1_epr_5k} + \end{figure} + + \begin{figure}[H] + \centering + \includegraphics[scale=0.8]{attachments/plots/exp3/2/elp/exp3(2)_epr_5k.png} + \caption{11 points Precision-Recall curve for 2nd subset of experiments of Oxford 5K buildings.} + \label{fig:oxf_exp2_epr_5k} + \end{figure} + + \begin{figure}[H] + \centering + \includegraphics[scale=0.8]{attachments/plots/exp3/2/map/mapsift_5K.png} + \caption{Average Precision Measure value performance for individual building for overall queries across the highest of SIFT descriptor in default setting in 2nd subset of experiments for Oxford buildings.} + \label{fig:oxf_exp2_map_sift_5k} + \end{figure} + + \begin{figure}[ht!] + \centering + \includegraphics[scale=0.8]{attachments/plots/exp3/2/map/mapsurf_5K.png} + \caption{Average Precision Measure value performance for individual building for overall queries across the highest of SURF descriptor with upright enabled setting in 2nd subset of experiments for Oxford buildings.} + \label{fig:oxf_exp2_map_surf_5k} + \end{figure} + + \begin{figure}[H] + \centering + \includegraphics[scale=0.8]{attachments/plots/exp3/1/map/exp3(1)_map.png} + \caption{Each point represents mean average precision value per house in the highest inliers peak in overall descriptor's performance for 1st subset of experiments in Oxford buildings.} + \label{fig:oxf_exp1_map} + \end{figure} + +\section{Discussion}\label{exp_discussion} +We evaluated the aforementioned approaches using two individual settings of input values in both of the descriptors. We should mark that the experiments that used the Oxford dataset have been tackled similarly to the 1st and 2nd Experiment (Sections~\ref{detection_scenario},~\ref{retrieval_scenario}). +%The results are obtained and summarized for section~\ref{exp1} (Experiment 1) in Figs.~\ref{fig:exp1_sift_a}-\ref{table:exp1_surf_bench} where is depicted the performance in mean Precision, Recall and F Measure metrics for each of the descriptor individual setting, a set of six sub images depicting the best performed query, two line plot figures with the best f measure performance at each building best query, in the top peak inlier threshold equally with a mean average precision plot.We close with throughput benchmark tables. 
+%In section~\ref{exp2}(Experiment 2) Figs.~\ref{fig:exp1_sift_a},~\ref{table:exp2_surf_bench} depict the performance in mean Precision, Recall and F Measure metrics for each of the descriptor individual setting, couple of bar plots in each individual setting the .We close with throughput benchmark tables.
+
+For both descriptors and for the 1st subset of experiments, we may observe that the default setting (setting ``(a)'') of parameters has led to higher performance in the Recall and F-Measure metrics. Also, the peak value of the F-Measure, which is used to select an appropriate inlier threshold, was one rank higher compared to setting ``(b)''. On the other hand, in terms of Precision, setting ``(b)'' has provided slightly increased values of the inlier threshold for both descriptors.
+
+In the 2nd subset of experiments and for the case of SIFT features, we did not observe a significant difference when compared to the 1st subset. Both settings of parameters showed similar performance to the 1st subset, slightly increased since the query images consisted of the best view of each building.
+
+
+On the other hand, SURF features performed significantly differently compared to the 1st subset concerning the Recall and F-Measure metrics, while the Precision line follows the same track.
+More specifically, setting ``(b)'' of the parameters has led to higher performance in the Recall and F-Measure metrics, with the peak inlier value of the F-Measure one rank higher.
+In addition, this setting proved the best in the 2nd subset, combining fine performance, a high peak inlier threshold and a competitive throughput time, as shown in Tables~\ref{table:exp2_sift_bench} and~\ref{table:exp2_surf_bench}, not only compared to the ``(a)'' setting but also to the SIFT descriptor, which is remarkable.
+
+
+Tables~\ref{table:exp1_sift_bench}, \ref{table:exp1_surf_bench}, \ref{table:exp2_sift_bench} and \ref{table:exp2_surf_bench} report the overall throughput time for each experiment subset, respectively.
+As expected, the throughput time is much lower for the ``(b)'' setting of parameters in every set of experiments and for both descriptors, referring to Section~\ref{evaluation}.
+
+In addition, in Section~\ref{evaluation} we provide the rest of the plots accordingly.
+For the first case of experiments, Figs.~\ref{fig:exp1_bestfm} and \ref{fig:exp1_bestmap} illustrate the highest F-Measure value and the highest mean average Precision value of a certain query per building, at each peak inlier threshold, respectively.
+According to Fig.~\ref{fig:exp1_bestfm}, the SIFT descriptor in the ``(b)'' setting shows the most stable performance, around 0.6, among all the others, while SURF in the ``(a)'' setting provides the highest performance.
+On the other hand, in Fig.~\ref{fig:exp1_bestmap} SIFT in the ``(a)'' setting provides great stability at high performance concerning the Precision.
+We should note that the escalation of the inlier rank for each descriptor has not affected the F-Measure performance much, whereas for the case of mAP the escalation within the same descriptor is really rapid.
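+
+To make the detection-scenario measures concrete, the following sketch shows how Precision, Recall and F-Measure could be derived from the RANSAC inlier counts at a given inlier threshold; it is a minimal illustration under our assumptions (a mapping from database images to inlier counts and a set of relevant images per query) and not the actual evaluation code.
+
+\begin{verbatim}
+# Sketch: detection metrics for one query at a given inlier threshold.
+# 'inliers' maps database image id -> number of RANSAC inliers w.r.t. the query.
+# 'relevant' is the set of image ids depicting the same building (assumed input).
+def detection_metrics(inliers, relevant, threshold):
+    detected = {img for img, n in inliers.items() if n >= threshold}
+    tp = len(detected & relevant)
+    precision = tp / len(detected) if detected else 0.0
+    recall = tp / len(relevant) if relevant else 0.0
+    f_measure = (2 * precision * recall / (precision + recall)
+                 if precision + recall > 0 else 0.0)
+    return precision, recall, f_measure
+
+# Sweeping the threshold reproduces plot lines such as those shown above.
+for t in range(1, 21):
+    p, r, f = detection_metrics({"a": 12, "b": 7, "c": 3}, {"a", "b"}, t)
+\end{verbatim}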
+
+As for the second case of experiments, the results are depicted in Figs.~\ref{fig:exp2_sifta_fm}-\ref{fig:exp2_siftb_map} and \ref{fig:exp2_surfa_fm}-\ref{fig:exp2_surfb_map}. We observe higher performance for the ``(a)'' setting of the SIFT descriptor, while for SURF we did not observe significant differences apart from the ``(b)'' setting, where a cut-off occurs for the lowest-performing buildings, one of which does not satisfy the inlier threshold at all.
+
+% As it has been previously mentioned, we have also implemented a web platform, which shall be described in detail in the next section~\ref{web}. Using the platform presented in Chapter~\ref{web} and more specifically its ``offline'' section, one may easily reproduce the aforementioned experiments. To tune the platform so as to increase user experience, in terms of ``optimizing'' the returned results, we have used the aforementioned conclusions, to select descriptor settings and an appropriate value of the inlier threshold, for the case of detection.
+% % Of course, since experiments have already been offline executed, we chose settings that maximize performance,
+% Moreover, since the experiments have been performed offline, the settings that maximize performance are applied, although they required more execution time. On the other hand and for the ``online'' section, i.e., the one that users are allowed to upload their own images, real time experiments take place, thus the faster (in terms of execution time) setting has been adopted.
+
+
+In terms of the performance of SIFT versus SURF, the latter descriptor produced higher values in the Recall and F-Measure metrics, while SIFT was able to perform slightly higher in Precision for the middle to lowest inlier threshold values and can be considered rather faster, in the respective settings, in throughput time.
+
+Finally, the SURF descriptor can be considered a better option than SIFT in applications where the retrieved images are expected to lie in the same visual neighborhood of features as the query image, judging by the results
+in the Recall and F-Measure metrics.
+On the other hand, the SIFT descriptor can be considered a better option in cases where
+the retrieved results are not expected to be near that visual neighborhood.
+
+This conclusion is evaluated in Sec.~\ref{oxfordsec} on a larger dataset, where we used the aforementioned dataset and contaminated it with 1000 and 5000 Oxford buildings images.
+Regarding the Oxford experiments, for coherence and because of the time-consuming experiments, we chose to use the best setting of parameters from the aforementioned discussion.
+Both descriptors have proven significantly robust, despite the increase of the ``noise'' data. As expected, SIFT shows fine and distinct performance versus SURF in the Precision metric, while its F-Measure peak value occurs at a lower inlier threshold.
+SURF provides higher performance values in Recall, while its F-Measure peak value is measured two ranks higher than that of SIFT (e.g. Figs.~\ref{fig:exp3_sift_a1k},~\ref{fig:exp3_surf_b1k}, where the best F-Measure of SIFT is 0.55 out of 1 at an inlier threshold of 9, while the best F-Measure of SURF is 0.62 out of 1 at an inlier threshold of 11).
+% We should note that the differences between the 1K and 5K sections stands only in the SURF features that seems
+
+% Closing, both of descriptors have proven reliable in every aspect of experiments.
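+
+For completeness, the retrieval measures discussed above, i.e., the average precision and the 11-point interpolated Precision--Recall curve, could be computed along the lines of the following sketch; the ranking is assumed to be a list of database image ids ordered by decreasing inlier count, and all names are illustrative.
+
+\begin{verbatim}
+# Sketch: average precision and 11-point interpolated precision for one query.
+def average_precision(ranking, relevant):
+    hits, precisions = 0, []
+    for i, img in enumerate(ranking, start=1):
+        if img in relevant:
+            hits += 1
+            precisions.append(hits / i)   # precision at each relevant hit
+    return sum(precisions) / len(relevant) if relevant else 0.0
+
+def eleven_point_curve(ranking, relevant):
+    pr = []                               # (recall, precision) after each rank
+    hits = 0
+    for i, img in enumerate(ranking, start=1):
+        hits += img in relevant
+        pr.append((hits / len(relevant), hits / i))
+    levels = [0.1 * k for k in range(11)]
+    return [max((p for r, p in pr if r >= level), default=0.0)
+            for level in levels]
+
+# mAP is the mean of average_precision over all queries of an experiment.
+\end{verbatim}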
\ No newline at end of file diff --git a/isicg_msc/intro.tex b/isicg_msc/intro.tex new file mode 100644 index 0000000..cceba93 --- /dev/null +++ b/isicg_msc/intro.tex @@ -0,0 +1,30 @@ +\chapter{Introduction} +\section{Motivation} +The digital revolution has brought revolutionary changes to many aspects of everyday life. Amongst them, of significant importance are digital cameras, which nowadays have also been integrated to personal computers, smartphones, tablets etc., thus have become interdependent to many daily activities. Accordingly, extremely large amounts of digital multimedia content are being produced every moment and even shared within the WWW. + +During the last two decades, the research fields of digital image processing and computer vision have benefited the most from the aforementioned facts and many new research areas have arisen. Amongst them, we could mention multimedia analysis, indexing and retrieval, feature extraction and matching, content representation classification, detection and recognition etc. + +Image retrieval consists of the problem of searching for digital images in large databases. Related research can be classified into two types: text-based image retrieval and content-based image retrieval~\cite{rui1999image}. Text-based image retrieval refers to an image retrieval framework, where images first are annotated manually and text-based Database Management Systems (DBMS) are utilized to perform the retrieval. In response to the rapid increase of the size of image collection, the amount of labor which required for manual annotation was exacerbated while the human perception was subjected to difficulties in order to percieve image discrimination. + +In order to overcome these difficulties, Content-Based Image Retrieval (CBIR)~\cite{gudivada1995content} or Query By Image Content (QBIC)~\cite{flickner1995query} has been proposed. In CBIR, images are automatically annotated with their own visual content by feature extraction process. + +The problem addressed in this thesis is building recognition in urban environments. We may formulate this problem more formally as: ``Given a query image of a specific building, retrieve all images depicting the same building, from a given database.'' Building recognition is motivated by several applications, amongst which we should mention real-time robot localization and visual navigation~\cite{se2002mobile}, architectural design~\cite{kato1992database}, 3D city reconstruction~\cite{agarwal2009building} and visualization~\cite{glander2009abstract}. + +In this work we choose to tackle the aforementioned problem as a typical visual retrieval approach. We shall follow the generic approach of content-based image retrieval, however we will adapt it to the special needs of the given problem and the issues that may arise. More specifically, let us consider two typical photos depicting the same building. Even a small change in viewpoint corresponds to a geometric transformation and may cause severe variations in the visual content. Similarly, when the lighting conditions change (e.g., photos taken during the day vs. photos taken during the night), visual content changes more dramatically. Should we consider typical photos taken within an urban environment, partial occlusion (e.g., due to pedestrians, vehicles etc.) may also distort the visual content. Of course, in real-life cases, the aforementioned issues may arise simultaneously. + +Thus, it is crucial to select and apply techniques that would be able to overcome these difficulties. 
These techniques should extract features that are robust to the aforementioned problems, i.e. viewpoint variations, illumination changes and partial occlusions. These features should match in such a way that a high matching score is provided for two images depicting the same building and a low one for any two other images. There exist several techniques that comply with these requirements. We shall analyze them in Chapter~\ref{features}.
+
+The building recognition application that we propose and develop within this thesis consists of feature extraction, representation, matching and selection. This way we calculate a matching score between any two given images and we are able to create and evaluate a retrieval scheme. We also build a web application, which in brief provides the following functionalities:
+\begin{itemize}
+ \item Offline experimental results, obtained by querying frontal views of random buildings from the database with the selected image descriptor.
+ \item Online experiments, where the user can query with a random pair of images from the provided database, or can upload/select an image
+ to/from the user-defined database in order to query against the frontal building views of the available dataset.
+\end{itemize}
+
+For the sake of the evaluation, we also introduce a new dataset consisting of 60 buildings photographed in the urban area of Vyronas, Athens, Greece. Photos have been taken of each building from a predefined set of angles and under different lighting conditions. Upon the completion and presentation of this thesis, we plan to make this dataset public to the research community. Using this dataset we perform an extensive evaluation of the selected techniques, using appropriately selected measures, and discuss the results.
+
+\section{Structure}
+The remainder of this thesis is structured as follows:
+In Chapter~\ref{features} we introduce the feature extraction techniques that are utilized in our methodology, along with other popular feature extraction methods. Following that, Chapter~\ref{matching} elaborates on the matching techniques, along with the matching and homography estimation method. We continue in Chapter~\ref{experiments}, where we present our evaluation protocol and methodology, with an extensive set of figures depicting
+the entire behavior of our platform. In Chapter~\ref{web} we present a fully functional web platform, implemented for the sake of a more integrated experience throughout the evaluation and experiment process.
+Finally, in Chapter~\ref{conclusions} we present the conclusions as well as a discussion on future extensions of the present work.
\ No newline at end of file
diff --git a/isicg_msc/methodology.tex b/isicg_msc/methodology.tex
new file mode 100644
index 0000000..a9c28b9
--- /dev/null
+++ b/isicg_msc/methodology.tex
@@ -0,0 +1,201 @@
+\chapter{Feature Extraction}\label{features}
+
+\section{Introduction}
+
+In this chapter we describe the local feature extraction techniques that are utilized in our framework, along with other popular feature extraction methods. Within our framework, interest points are automatically detected in an image. Then, feature vectors are computed in predefined neighborhoods of these interest points. In the search step, each of the feature vectors extracted from a query image votes scores to matched reference features whose feature vectors are similar to the query feature vector. Our local feature-based image retrieval system involves two important processes: local feature extraction and image representation.
In local feature extraction, certain local features are extracted from an image. And then, in image representation, these local features are integrated or aggregated into a vector representation in order to calculate similarity between images. +Following, we present an example of these representations, in each of the image descriptor methods in discreet coloured figures. +%along with image matching results which we investigate thoroughly in Section~\ref{matching}. + +\section{Image description by visual features}\label{descriptors} +Typically, the first step in the majority of image-related computer vision problems is the extraction of a set of visual features, which shall be used for the representation of the visual content that is depicted in the image that will be processed and/or analyzed. Simple problems may be easily tackled by extracting either global (i.e., from the whole image), or local (i.e. from image patches). In the first case a description extraction scheme considers all image pixels and features a \textit{single} vector that describes the whole image. In the latter, descriptions are extracted from image regions (patches, i.e., subsets of the whole image's pixels), that are defined either manually e.g., by imposing a grid or automatically e.g., by a segmentation or a clustering process. Although the extraction using either of these approaches has been proven effective in tasks such as global image classification~\cite{chapelle1999support}, and visual concept detection~\cite{spyrou2009concept}, there exist several reasons for which they are not appropriate for the problem at hand. + +First of all, global features are prone to serious changes, due to e.g. small changes of viewpoint, zooming, changes of illumination, contrast, etc. In Fig.~\ref{fig:building_seq} we present a set of photos similar to those of the dataset which shall be used in our experiments. The same building is depicted in all these photos, however we have on purpose caused the aforementioned changes, which as it can been clearly seen cause serious changes to the visual content. Although one may argue that a human observer may still be able to recognize the building into all these photos, it is obvious that a global descriptor may easily fail, since the visual content of the images changes dramatically, which affects the extracted features. + + +While almost the same stand for local features, these also face an additional problem, which is the way patches are extracted. The most trivial way to extract image patches is by imposing a rectangular grid~\cite{kasutani2001mpeg}. Then, each cell of this grid is a patch, used for feature extraction. Obviously, this approach is naive since even the smallest effects of viewpoint change (e.g., small horizontal translation of the photographer) may dramatically change the content of each grid. A more advanced approach is to apply a segmentation or pixel clustering algorithm and extract a description from each segment/cluster~\cite{spyrou2009concept}. However, it is well-known that image segmentation remains still an unsolved problem and valid solutions always require a set of retreats in the process. Thus it is possible that patches which have resulted upon such a process and from two different viewpoints of the same scene may be significantly different in terms of e.g., shape, size and even visual content. + +To successfully overcome such issues, in this thesis, we choose to work on an approach that is based on extracted \textit{interest} (salient) points/keypoints. 
These points are locally extracted from images using algorithms that are robust to several transformations, illumination changes and contrast. Typically, these algorithms are divided into two parts. The first part extracts a set of keypoints, along with a region that surrounds them. The second part extracts a description of the visual properties of this region. The goal of these algorithms is to provide the same keypoints even under several distortions as the aforementioned. In practice, this is +%does not happen, +partially achieved, however a subset of the points, between the two images do actually match thus using an appropriate keypoint selection and matching strategy. +This subset is adequate to provide a similarity/dissimilarity measure. We should note that the problem at hand is tackled herein with the following variations: +\begin{itemize} + \item a \textit{detection} problem: given a query the system should return a ranked list of all relevant photos, i.e., those depicting the same building + \item a \textit{retrieval} problem: given a query, the system should return a ranked list of all photos of the database +\end{itemize} + +\begin{figure} + \centering + \subfigure{\includegraphics[width=20mm]{attachments/images/single_house/41-1.jpg}} + \subfigure{\includegraphics[width=20mm]{attachments/images/single_house/41-2.jpg}} + \subfigure{\includegraphics[width=20mm]{attachments/images/single_house/41-3.jpg}} + \subfigure{\includegraphics[width=20mm]{attachments/images/single_house/41-4.jpg}} + \subfigure{\includegraphics[width=20mm]{attachments/images/single_house/41-5.jpg}} + + \subfigure{\includegraphics[width=20mm]{attachments/images/single_house/41-6.jpg}} + \subfigure{\includegraphics[width=20mm]{attachments/images/single_house/41-7.jpg}} + \subfigure{\includegraphics[width=20mm]{attachments/images/single_house/41-8.jpg}} + \subfigure{\includegraphics[width=20mm]{attachments/images/single_house/41-9.jpg}} + \subfigure{\includegraphics[width=20mm]{attachments/images/single_house/41-10.jpg}} + + \subfigure{\includegraphics[width=20mm]{attachments/images/single_house/41-11.jpg}} + \subfigure{\includegraphics[width=20mm]{attachments/images/single_house/41-12.jpg}} + \subfigure{\includegraphics[width=20mm]{attachments/images/single_house/41-13.jpg}} + \subfigure{\includegraphics[width=20mm]{attachments/images/single_house/41-14.jpg}} + \subfigure{\includegraphics[width=20mm]{attachments/images/single_house/41-15.jpg}} + + \caption{Images of the same building depicting several changes in viewpoint, zooming, illumination and contrast.} + \label{fig:building_seq} +\end{figure} + +%In the remaining of this Section we shall describe in detail two interest point extraction approaches +In the following we describe several keypoint detection approaches paying more attention to the Scale Invariant Feature Transform (SIFT) and the Speeded-Up Robust Features (SURF). We also provide a discussion on the advantages and disadvantages of each method, which will be further demonstrated in Section~\ref{experiments} of the experimental results. +%which shall also be demonstrated in Section~\ref{experiments} which describes the experiments performed and the results achieved. 
+
+\section{The SIFT descriptor}
+
+
+The Scale-Invariant Feature Transform (SIFT)~\cite{lowe2004distinctive} descriptor is one of the most widely used
+feature descriptors, for tasks such as pattern recognition, image registration, semantic
+image analysis, etc., as it has been proven to achieve high
+repeatability and distinctiveness. It is sometimes combined with other
+detectors (e.g. the Harris/Hessian-Affine detectors) as well
+as the SIFT detector. In the SIFT descriptor, the orientation
+of a local region $(x, y, s)$ is estimated before description as
+follows. Firstly, the gradient magnitude $m(x, y)$ and orientation $\theta(x, y)$ are computed using pixel differences:
+ \begin{align}
+ m(x,y) &= \left( \left(L(x+1,y) - L(x-1,y)\right)^2 + \left(L(x, y+1) - L(x, y-1)\right)^2 \right)^\frac{1}{2} , \\
+ \theta(x,y) &= \tan^{-1} \left( \frac{ L(x, y+1) - L(x, y-1)}{L(x+1,y) - L(x-1,y)}\right)
+ \end{align}
+where $L(x, y)$ denotes the intensity at $(x, y)$ in the image $I$,
+smoothed by a Gaussian with the scale parameter corresponding to the detected region.
+Then, an orientation histogram is formed from the gradient orientations of sample
+pixels within the feature region; the orientation histogram
+has 36 bins covering the 360 degree range of orientations.
+Each pixel votes a score of the gradient magnitude $m(x, y)$,
+weighted by a Gaussian window, to the bin corresponding to orientation $\theta(x, y)$. The highest peak in the histogram is detected, which corresponds to the dominant direction of local gradients. If any, the other local peaks that are within 75\% of the highest peak are used to create local features with those orientations~\cite{lowe2004distinctive}.
+
+After the assignment of the orientation, the SIFT descriptors are computed for normalized image patches.
+The descriptor is represented by a 3D histogram of gradient location and orientation, where location is quantized into a $4\times 4$ location grid and the orientation is quantized into eight bins, resulting in a 128-dimensional descriptor. For each of the sample pixels, the gradient magnitude $m(x, y)$, weighted
+by a Gaussian window, is voted to the bin corresponding to $(x, y)$ and $\theta(x, y)$, similarly to the orientation estimation. In order to handle small shifts, a soft voting is adopted, where
+scores weighted by trilinear interpolation are additionally
+voted to seven neighbor bins (eight bins in total).
+Finally, the feature vector is $l_2$ normalized to reduce the effects of illumination changes.
+
+We should note
+that within this work and for each extracted keypoint we only use its position and its description, i.e., scale and orientation are discarded, since they are of no practical use within the presented framework.
+
+Examples of the interest points from which the SIFT features are extracted are illustrated in Fig.~\ref{fig:sift_features}, where we use two images of our dataset that depict the same building. The position of each of the interest points is the center of each coloured circle, i.e. the starting point of the coloured line.
+
+%% TO matching section
+% Fig.~\ref{fig:sift_matches} illustrates both images from Fig.~\ref{fig:sift_features}, where the centers of the strongest interest features identified as inliers are depicted with green dots. Correspondences between inliers are drawn in purple lines while the rest of the unmatched features are depicted with red marks. Upon careful observation, one should easily spot correspondences between features.
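+
+The gradient magnitude and orientation computation described above can be illustrated with a few lines of NumPy; the following is only a didactic sketch of the pixel-difference computation on a Gaussian-smoothed image and not the SIFT implementation used in our experiments (the file name and smoothing scale are arbitrary assumptions).
+
+\begin{verbatim}
+# Sketch: gradient magnitude and orientation from pixel differences,
+# computed on a Gaussian-smoothed image L (didactic illustration only).
+import cv2
+import numpy as np
+
+img = cv2.imread("view.jpg", cv2.IMREAD_GRAYSCALE).astype(np.float32)
+L = cv2.GaussianBlur(img, (0, 0), sigmaX=1.6)   # smoothing scale: assumption
+
+dx = L[1:-1, 2:] - L[1:-1, :-2]                 # L(x+1,y) - L(x-1,y)
+dy = L[2:, 1:-1] - L[:-2, 1:-1]                 # L(x,y+1) - L(x,y-1)
+
+m = np.sqrt(dx ** 2 + dy ** 2)                  # gradient magnitude m(x,y)
+theta = np.arctan2(dy, dx)                      # gradient orientation theta(x,y)
+
+# A 36-bin orientation histogram, weighted by the magnitude, as used for
+# the dominant-orientation assignment over a keypoint region.
+hist, _ = np.histogram(theta, bins=36, range=(-np.pi, np.pi), weights=m)
+\end{verbatim}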
+ +\begin{figure} + \centering + \subfigure{\includegraphics[width=60mm]{attachments/images/sift_features/sift_keypoints1.jpg}} + \qquad + \subfigure{\includegraphics[width=60mm]{attachments/images/sift_features/sift_keypoints2.jpg}} + + \caption{SIFT features. Interest point extraction.} + \label{fig:sift_features} +\end{figure} + +% \begin{figure}[h!] +% \centering +% \includegraphics[scale=0.4]{attachments/images/sift_features/sift_match.jpg} +% \caption{Correspondences between the query image (left) and a similar image (right). Local (SIFT) features identified as inliers are depicted in green dots. Correspondences between inliers are drawn in purple lines.} +% \label{fig:sift_matches} +% \end{figure} + + +\section{The SURF descriptor} +The SURF features~\cite{bay2006surf} have been partially inspired by the SIFT and during the last decade have been successfully adopted in many computer vision related problems, used both to extract a set of keypoints and the visual content of the derived regions surrounding them. It has been shown in many works, e.g., in~\cite{spyrou2015comparative} that they may achieve comparable performance yet requiring less computational time. + +The SURF interest point extraction algorithm proceeds in two distinct steps. The first step is applied in order to detect local interest points, whereas the second step extracts a descriptor from the area surrounding each of them. In order to detect local interest points, the SURF algorithm adopts a fast approximation of the Hessian matrix that exploits integral images. This approximation is responsible for the gain in time. Extracted interest points are the local maxima of the Hessian matrix determinant. This blob response maxima process is carried out on several octaves of a Gaussian scale-space. Moreover, the correct scale of each point is automatically selected also using the Hessian determinant, as introduced in~\cite{lindeberg1994linear}. For exact point localization, an efficient non-maximum suppression algorithm is used at a $3\times 3\times 3$ intra-scale neighborhood~\cite{neubeck2006efficient}. + +The SURF descriptor captures the intensity content distribution around the points detected with the aforementioned process. The first-order Haar wavelet responses are computed with the use of integral images, resulting in a 64-dimensional feature vector. In order to achieve the desired property of rotation invariance, a dominant orientation is determined. This dominant orientation is the direction that maximizes the sum of the Haar wavelet responses in a sliding window of size $\pi /3$, around the neighborhood of each interest point. For the computation of each descriptor, a square area with side equal to $20\times s$ is used, around the corresponding interest point. This area is then divided into $4\times 4$ blocks, with $s$ denoting the interest point scale. This allows the descriptor to be scale invariant. For each one of the 16 blocks, four values are extracted, corresponding to the sum of the $x, y, |x|$ and $|y|$ first-order Haar wavelet responses in a $5\times 5$ grid in the block. To make the descriptor robust to contrast changes, the computed descriptor vector is finally turned into a unit vector. +%\textbf{[\textgreek{Εκανα ότι αλλαγές ήθελα αφού έβαλα τις εικόνες. πες μου για τα χρωματα των φιτσουρς. 
ειναι καστομ μειντ.}]} +Examples of the interest points from which the SURF features are extracted are illustrated in Fig.~\ref{fig:surf_features}, where we use two images of our dataset that depict the same building. Scale is denoted by the size of the red circle, while orientation, is denoted by the radius of the circle, i.e. the green line. The position of each of the interest points is the center +of the red circle, i.e. the starting point of the green line. + +%% TO matching section +% Fig.~\ref{fig:surf_matches} illustrates both images from Fig.~\ref{fig:surf_features}, where the centers of the strongest interest features identified as inliers are depicted with green dots.Correspondences between inliers are drawn in blue lines while the rest of the unmatched features are depicted with red marks.(i.e., scales and orientations have been omitted) for clarity of presentation. +% Upon careful observation, one should easily spot correspondences between features. + +%\textbf{ πες μου για τα χρωματα των features, πρασινο κοκκινο εδω. είναι custom made (μετάφραση από c++ απο github) γιατι δεν γινόταν αλλιώς. επίσης ειχα δοκιμάσει παρα πολλά χρωματα εκείνη τη μερα και το κίτρινο-πρασινο που εγραφες δεν φαινόταν καλα} +\begin{figure} + \centering + \subfigure{\includegraphics[width=60mm]{attachments/images/surf_features/surf_keypoints1.jpg}} + \qquad + \subfigure{\includegraphics[width=60mm]{attachments/images/surf_features/surf_keypoints2.jpg}} + + \caption{SURF features. Interest point extraction.} + \label{fig:surf_features} +\end{figure} + +% \begin{figure}[h!] +% \centering +% \includegraphics[scale=0.4]{attachments/images/surf_features/surf_match.jpg} +% \caption{Correspondences between the query image (left) and a similar image (right). Local (SURF) features identified as inliers are depicted in green dots. Correspondences between inliers are drawn in blue lines.} +% \label{fig:surf_matches} +% \end{figure} + +\section{Discussion on features} + +There exist many other features that are able to work in our framework. In this subsection we shall attempt to briefly present the most important, based on their popularity and the availability of implementations. + +Local Intensity Order Pattern (LIOP) features~\cite{wang2011local} have been recently proposed and form a local image descriptor, based on the concept of ``local order pattern''. To understand this notion, let us consider a pixel. This pixel has a set of neighbors. These neighbors are sorted by increasing intensity and this way a local order pattern occurs. LIOP are based on the intuitive principle that the relative order of pixel intensities remains unchanged when intensity changes are monotonic. This is the case in typical illumination changes as those of our problem. The algorithm is then applied on a blurred version of the image at an effort to eliminate as much noise as possible. An affine covariant region detector, such as the Harris-Affine detector~\cite{harris1988combined} is then used to localize key-points and their neighborhoods, which are then normalized to circular, fixed-size regions, while their orientations are discarded. At the next step, noise is removed with Gaussian smoothing, resulting to the so called ``local patch''. All pixels in this patch are then sorted by their intensity values and then the patch is equally quantized into $B$ ordinal bins, to compensate for rotation changes. 
The local intensity order patterns (LIOPs) are constructed using the intensity order of all sampled neighboring points, thus exploiting the local information while providing a rotation-invariant description. Finally, the descriptor is formed by accumulating and concatenating the LIOPs of points in each ordinal bin. +The Maximally Stable Extremal Regions (MSER) features~\cite{matas2004robust} have been proposed by Matas et al. and introduced the notion of extremal regions. They are invariant to affine +transformations, covariant +to adjacency preserving +transformations and show stability and scale invariance. They +have been very popular for fast and efficient blob detection. +The MSER algorithm involves local image binarization using +a predefined set of thresholding values. This way it constructs +a set of local intensity minima. This set grows continuously, +until regions corresponding to two local minima become +adjacent and subsequently merge. The set of maximal regions +is then defined as the set of all connected components that +result from the consecutive thresholdings. Using the inverse of +images, the set of minimal regions is constructed. All regions +are enumerated and finally intensity levels that are local +minima of the rate of change of the area function are selected +as thresholds producing maximally stable extremal regions. + +Good Features to Track (GFtT) have been proposed by Shi +and Tomasi~\cite{shi1994good} and have been a modification of the Harris +corner detector~\cite{harris1988combined} in terms of its scoring function, which has +been shown to significantly improve its results. More +specifically, GFtT define a function in order to express the +notion that corners may be defined as image regions with large +intensity variation across all directions and then maximize it +by applying Taylor Expansion, so as to decide whether an +image window contains a corner, an edge or depicts a flat +region. Since this algorithm only provides a method for keypoint selection, one may used the description extraction of his/her choice in order to capture +the low-level visual properties of the regions surrounding the +extracted corner points. + +FREAK (Fast REtinA Keypoint)~\cite{alahi2012freak} is a binary visual +descriptor, i.e. it provides a description in a binary vector +form. In general, binary descriptors combine fast extraction +and matching times, typically following the same approach: +they use a predefined sampling pattern, a set of sampling pairs +and an orientation compensation method, in order to provide +invariance on rotation. In most cases scaling invariance is +handled by the corresponding key-point detection algorithm. +The FREAK algorithm proposes a circular sampling grid +inspired by the distribution of the receptive fields over the +retina. Sampling points have higher density near the keypoint, +while their density drops exponentially. As for the sampling +pairs, it adopts a learning strategy, i.e., using a set of keypoints, +non-correlated sampling pairs among the set of all +possible pairs of the sampling grid have been selected. A +cascade approach is used for matching and orientation +compensation is performed on a predefined set of 45 +symmetric sampling pairs, by selecting the one with the +largest gradient. 
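+
+Several of the alternatives discussed above are readily available in OpenCV; the short sketch below indicates how they could be instantiated (assuming a build that includes the contrib modules for FREAK), purely as an illustration of their availability rather than as part of our pipeline.
+
+\begin{verbatim}
+# Sketch: instantiating some alternative detectors/descriptors in OpenCV.
+import cv2
+
+img = cv2.imread("view.jpg", cv2.IMREAD_GRAYSCALE)
+
+# MSER: maximally stable extremal regions
+regions, boxes = cv2.MSER_create().detectRegions(img)
+
+# Good Features to Track (Shi-Tomasi corners); a descriptor of one's
+# choice must then be computed on the returned corner locations.
+corners = cv2.goodFeaturesToTrack(img, maxCorners=500,
+                                  qualityLevel=0.01, minDistance=7)
+
+# FREAK: a binary descriptor computed on externally detected keypoints
+keypoints = cv2.FastFeatureDetector_create().detect(img)
+keypoints, descriptors = cv2.xfeatures2d.FREAK_create().compute(img, keypoints)
+\end{verbatim}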
+ +\section{Discussion} + +It is clear that the selection of the SIFT and SURF low-level feature extraction schemes combines speed (though our application may not be characterized as a real-time one) with robustness to scale, rotation and contrast changes. However, the extraction of the SURF features is rather fast, thus the necessary amount of time is small, when compared to other similar approaches. We should note that this makes them effective for the retrieval of buildings, even if speed is not a priority in the presented application. Moreover, their robustness to image variations ensures that the transformation between two images may always be calculated under certain visual changes, as in the case of the problem under investigation, which is the main reason we use them in our approach. + In the section~\ref{imgmatching} we will provide a further elaboration along with discussion about matching that led us to this selection. \ No newline at end of file diff --git a/isicg_msc/ransac.tex b/isicg_msc/ransac.tex new file mode 100644 index 0000000..5174686 --- /dev/null +++ b/isicg_msc/ransac.tex @@ -0,0 +1,116 @@ +\chapter{Matching}\label{matching} + + +\section{Introduction} + +Image matching can be applied to number of applications that require the functionality of identifying and searching of matching images. +Major challenges for matching images can be considered the illumination variation, viewpoint change or the scale differences that may cause decorrelation between the images. +Given a set of interest points extracted from all images, the goal of the matching stage is to find geometrically consistent feature matches between all images through a defined process which is elaborated extensively in the following Section~\ref{imgmatching}. +Matching process extracts a large number of salient (tentative) points, that is considered as noise and have to be eliminated. +Statistically robust methods like RANSAC applies geometric constraints while is able to select the best model in presence of noise, i.e., the outliers. +% This proceeds as follows. +% First, we find a set of candidate feature matches using an approximate nearest neighbour algorithm. Then we are fine matches using an outlier rejection procedure based on the noise statistics of correct/incorrect matches. + + +\section{Keypoint matching principles}\label{imgmatching} + +Feature matching, +%or more generally ``image matching'', +plays an important role in many computer vision +applications such as image registration, camera calibration and object recognition, and denotes the task of establishing correspondences between two images of the same +scene/object. A common approach to image matching consists of detecting a set of interest points each associated with image descriptors from image data. Once the features and their descriptors have been extracted from two or more images, the next step is to establish some preliminary feature matches between these images as illustrated in Fig.~\ref{fig:bf_matches}. +A match between the pair of interest points $(p, q)$ is accepted only if (i) $p$ is the best match for $q$ in relation to all the other points in the first image and (ii) $q$ is the best match for $p$ in relation to all the other points in the second image. In this context, it is very important to devise an efficient algorithm to perform this +matching process as quickly as possible. The nearest-neighbor matching in the feature +space of the image descriptors in Euclidean norm can be used for matching vector-based features. 
+ +However, in practice, the optimal nearest neighbor algorithm and its +parameters depend on the dataset characteristics. Furthermore, to suppress matching candidates for which the correspondence may be regarded as ambiguous, the ratio +between the distances to the nearest and the next nearest image descriptor is required to be less than some threshold. In our case the ratio of 0.75 has been chosen. +The typical solution in the case of our system which establishes matching in a large dataset is to replace the linear search with an approximate matching algorithm that can offer speedups of several +orders of magnitude over the linear search. This is, at the cost that some of the nearest neighbors returned are approximate neighbors, but usually close in distance to the +exact neighbors. +Generally, the performance of matching methods based on interest points depends +on both the properties of the underlying interest points and the choice of associated +image descriptors. Furthermore, selecting a detector and a descriptor that addresses the image degradation +is very important. + +\begin{figure}[htp!] + \centering + \includegraphics[scale=0.3]{attachments/images/ransac/bf_match.jpg} + \caption{Tentative correspondences between SIFT descriptors} + \label{fig:bf_matches} +\end{figure} + +For example, if there is no scale change present, a corner detector +that does not handle scale is highly desirable; while, if image contains a higher level +of distortion, such as scale and rotation, the more computationally intensive SURF feature detector and descriptor is a adequate choice in that case~\cite{hassaballah2016image}. +In the area of feature matching, it must be noticed that the binary descriptors (e.g., FREAK or MSER) are generally faster and typically used for finding point correspondences +between images, but they are less accurate than vector-based descriptors~\cite{figat2014performance}. Statistically robust methods like RANSAC can be used to filter outliers in matched feature +sets while estimating the geometric transformation or fundamental matrix, which is useful in feature matching for image registration and object recognition applications and we will discuss in the next section. + + + +\section{The RANSAC algorithm} +In Figs.~7(a),~7(b) we illustrated a pair of images depicting the same building under two different viewpoints and the sets of keypoints extracted from each. Upon careful investigation of these points, we argue that a human observer may easily identify some point correspondences, i.e., keypoints that are extracted from the same part of the scene, while in most practical cases, partially due to the huge number of features, it is tedious to identify the set of all correspondences. Obviously, in some cases it is not even feasible. + + \begin{figure}[htp!] + \centering + \subfigure[]{\label{fig:view1}\includegraphics[width=60mm]{attachments/images/ransac/view1.jpg}} + \subfigure[]{\label{fig:view2}\includegraphics[width=60mm]{attachments/images/ransac/view2.jpg}} + \caption{Query images depicting the same building from two different viewpoints.} + \label{fig:demo_features} + \end{figure} + +Let us consider the case of slowly moving camera, that continuously captures photos. In this sequence of photos we would observe that a) they +would contain several similar visual features; and b) some of these +features (ideally a large subset) seems to ``move'' in the same way, i.e., they follow the same geometric transformation. 
These features will be denoted as the ``inliers'' of the feature set, as shown in Fig.~\ref{fig:inliers}. On the contrary, the remaining features will be called ``outliers.'' We expect that a pair of images that depict the same building should contain a large number of inliers, while a pair depicting different buildings would contain mainly outliers. We should note that both inlier and outlier pairs are composed of visually similar keypoints.
+
+\begin{figure}[htp!]
+ \centering
+ \includegraphics[scale=0.3]{attachments/pictures/inliers.png}
+ \caption{Similar visual features which follow the same geometric transformation, denoted as ``inliers''.}
+ \label{fig:inliers}
+\end{figure}
+
+For the estimation and maximization of the set of inliers, one could apply, e.g., a brute-force method. However, such methods are not computationally efficient in terms of the time needed. We choose to use the RANdom SAmple Consensus (RANSAC) algorithm~\cite{fischler1981random}. RANSAC is able to select the best model in the presence of noise, i.e., the outliers. Strictly speaking, the best model is selected with a user-defined probability $P_R$, which is typically set close to 1. A small value of $P_R$ leads to shorter processing time, however, the extracted model may not be close to the optimal. A large value of $P_R$ ``guarantees'' that the extracted model is the optimal or close to it.
+
+In our case, the model we wish to extract using RANSAC is the geometric transformation between keypoints.
+%(thus and of keypoints since they are also pixels of the image).
+Thus, inliers are visually matching features between
+consecutive images that follow this transformation,
+%while outliers are visually matching features that do not follow this transformation.
+while the remaining ones are considered outliers.
+A homography~\cite{hartley2003multiple} is a perspective transform that maps any given point $x_i$ of a given image to a corresponding point $x_i'$ of another. Given the set of correspondences of points of interest between two consecutive frames, i.e., the pairs $x_i\leftrightarrow x_i'$, we are able to define the homography matrix $H$ through the relation $x_i' = H\cdot x_i$. In homogeneous coordinates, $H$ is a $3\times 3$ matrix defined up to scale, so it has 8 degrees of freedom and can be estimated from 4 point correspondences.
+
+The estimation of an image transformation using RANSAC originates from the task of stereoscopic camera calibration~\cite{hartley2003multiple}, where the images captured by the two cameras typically (i.e., in the case of most stereoscopic cameras, as shown in Fig.~\ref{fig:stereo}) differ only by a perspective transform. This quite simple idea can be extended. In our approach, instead of two images taken by a stereoscopic camera system, we consider the case of a single camera moving slowly. This way approximately the same ``scene'' is captured, but from a slightly different viewpoint. When the variation of the viewpoint is not very high, a small number of false correspondences is typically expected. However, in our case, many false correspondences are often introduced due to the equipment (e.g., sensor noise), the compression (e.g., JPEG artifacts), and also similarities in the details of different buildings.
+
+\begin{figure}[htp!]
+ \centering
+ \includegraphics[scale=0.5]{attachments/pictures/stereo.png}
+ \caption{Images captured by a stereoscopic camera can easily be considered as two different views of the same object.}
+ \label{fig:stereo}
+\end{figure}
+
+
+The aforementioned observations justify the choice of RANSAC within the presented approach.
RANSAC is well-known for its ability to correctly extract an estimated model, even in the presence of a large number of outliers (noise). To clarify things, we describe the RANSAC-based approach of this work for a pair of images:
+
+\begin{enumerate}
+ \item We select invariant keypoints/regions and extract appropriate
+ visual descriptors from each image.
+ \item We then extract a set of visually matching points/regions. These will be referred to as ``tentative matches''. Similarity is calculated using an appropriate distance function for each descriptor and a predefined threshold $T_c$. We attempt to improve the quality of the matches by adopting the nearest neighbor ratio strategy~\cite{hartley2003multiple}. This way a tentative match is used only if the nearest neighbors of the keypoints in both images also match (i.e., are also tentative matches). We should note herein that the location of these points is not taken into account within this matching strategy
+ and also that a given point of one frame may match more than one point of the other frame.
+ \item After a predefined number of trials, RANSAC selects the largest subset of the aforementioned tentative matches that conform to the same geometric transformation, i.e., to the same homography $H$. At each trial RANSAC randomly selects a quadruplet of points (since a homography may be described using exactly 4 points) and then identifies the tentative matches that support the corresponding transformation. The goal is to select the largest such subset at the end of the trials. We should note that this subset is optimal with the aforementioned (user-defined) probability $P_R$ and denotes the set of inliers. Thus, the remaining matches are considered to be the outliers.
+\end{enumerate}
+
+This way we exploit the main advantage of the RANSAC algorithm, which is its ability to select the optimal model in the presence of a significantly large number of outliers, without a brute-force (extensive) matching approach. We should emphasize that RANSAC relies heavily on the tentative correspondences that constitute its input, while the accuracy of the selected model depends on the user-defined probability $P_R$. It is common to select a large value of $P_R$, thus leading to results that are as accurate as possible. A minimal implementation sketch of this estimation step is provided at the end of the section.
+
+In Fig.~\ref{fig:bf_matches} we illustrate a visual example of the tentative correspondences between the SIFT descriptors of the two images of Fig.~\ref{fig:demo_features}. We should observe that a) their number is significantly larger compared to the number of inliers, illustrated in Fig.~\ref{fig:sift_matches}; and b) the selection of the set of inliers is not a trivial task for the human observer. %however in some cases it is obvious to extract a subset of them.
+
+Fig.~\ref{fig:sift_matches} illustrates both images from Fig.~\ref{fig:demo_features}, where the centers of the strongest interest features identified as inliers are depicted with green dots. Correspondences between inliers are drawn in purple lines, while the rest of the unmatched features are depicted with red marks. Upon careful observation, one should easily spot correspondences between features.
+
+\begin{figure}[h!]
+ \centering
+ \includegraphics[scale=0.4]{attachments/images/ransac/sift_match.jpg}
+ \caption{Correspondences between the query image (left) and a similar image (right). Local (SIFT) features identified as inliers are depicted in green dots. Correspondences between inliers are drawn in purple lines.}
+ \label{fig:sift_matches}
+\end{figure}
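+
+To make the above procedure concrete, the following minimal Python sketch robustly estimates the homography with RANSAC and separates inliers from outliers. It assumes OpenCV and the keypoints and tentative matches of the earlier matching sketch; the reprojection threshold is only an illustrative placeholder and the sketch does not reproduce the exact parameters of our implementation.
+
+\begin{verbatim}
+import numpy as np
+import cv2
+
+# kp1, kp2 and matches are the keypoints and tentative matches of the
+# earlier sketch (at least 4 tentative matches are assumed).
+src_pts = np.float32([kp1[m.queryIdx].pt for m in matches]).reshape(-1, 1, 2)
+dst_pts = np.float32([kp2[m.trainIdx].pt for m in matches]).reshape(-1, 1, 2)
+
+# Robustly estimate the homography H (x' = H x) with RANSAC.
+# 5.0 is the reprojection error threshold in pixels; the confidence
+# parameter of findHomography plays the role of P_R and defaults to 0.995.
+H, mask = cv2.findHomography(src_pts, dst_pts, cv2.RANSAC, 5.0)
+
+# The mask flags each tentative match as an inlier (1) or an outlier (0).
+inliers = [m for m, keep in zip(matches, mask.ravel()) if keep]
+print('Tentative matches: %d, inliers: %d' % (len(matches), len(inliers)))
+\end{verbatim}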
diff --git a/isicg_msc/web_platform.tex b/isicg_msc/web_platform.tex
new file mode 100644
index 0000000..f1db04b
--- /dev/null
+++ b/isicg_msc/web_platform.tex
@@ -0,0 +1,255 @@
+\chapter{Web Platform}\label{web}
+\section{Introduction}
+
+% In this thesis, apart from the theoretical aspect of image retrieval, we also present a practical aspect by implementing a web platform application, namely \textit{RetBul} (Building Retrieval).
+In this chapter we present RetBul (Building Retrieval), a web platform for off-line and on-line image retrieval evaluation. Its goal is twofold: a) to provide more integrated and coherent results concerning the experiment and evaluation process, and b) to act as a demo application for this work. Since the domain of our research lies in images and image retrieval, we believe that such a system is vital for assessing the approach, either by a casual user or by an experienced researcher.
+
+In Section~\ref{architecture} we present the architecture of the web platform, in Section~\ref{retbulmethod} we elaborate on the methodology of the proposed application, and we close in Section~\ref{walkthrough} with a short walkthrough of the application.
+
+% As it has been previously mentioned, we have also implemented a web platform, which shall be described in detail in the next section~\ref{web}.
+Using the platform presented in this Chapter, and more specifically its ``offline'' section, one may easily reproduce the aforementioned experiments.
+To tune the platform so as to improve the user experience, in terms of ``optimizing'' the returned results, we have used the conclusions of Section~\ref{exp_discussion} to select descriptor settings and an appropriate value of the inlier threshold for the case of detection.
+% Of course, since experiments have already been offline executed, we chose settings that maximize performance,
+Moreover, since the experiments have been performed offline, the settings that maximize performance are applied, although they require more execution time.\\
+On the other hand, for the ``online'' section, i.e., the one in which users are allowed to upload their own images, experiments take place in real time, thus the faster (in terms of execution time) setting has been adopted.
+
+\newpage
+\section{Architecture}\label{architecture}
+
+The platform is hosted on the IaaS service \textit{Okeanos}~\cite{okeanos}, which provides a virtual compute and network service, as shown in Fig.~\ref{fig:okeanos}.
+Infrastructure as a Service (IaaS) refers to online services that abstract the user from the details of the infrastructure, such as physical computing resources, location, data partitioning, security, etc.
+
+Okeanos is a GRNET (Greek Research and Technology Network)\footnote{\url{https://okeanos.grnet.gr/home/}} project, available to the academic and research community in order to promote academic, educational and research aims. Its distributed computing and network resources can be used to host services or for experimental purposes.
+
+The virtualization service of Okeanos is called Cyclades~\cite{cyclades}, on which we use our own virtual machine running the Debian GNU/Linux server distribution~\cite{debian}, with 2 processing cores at 2GHz, 6GB of RAM and 100GB of storage, as shown in Fig.~\ref{fig:cyclades}.
+
+\begin{figure}
+ \centering
+ \subfigure[]{\label{fig:okeanos}\includegraphics[width=75mm]{attachments/pictures/okeanos.png}}
+ \subfigure[]{\label{fig:cyclades}\includegraphics[width=75mm]{attachments/pictures/cyclades.png}}
+ \caption{Okeanos home page and Cyclades control panel.}
+\end{figure}
+Concerning the architectural modules of our platform, the core development and image processing services utilize the well-known \textit{OpenCV}~\cite{bradski2000opencv} framework, while the web and user interface are implemented with the \textit{Laravel}~\cite{otwell2015laravel} MVC framework. Finally, the dataset is stored in a MySQL~\cite{mysql2004mysql} relational database.
+
+OpenCV (Open Source Computer Vision) is a library of programming functions for real-time computer vision. It uses a BSD license and hence it is free for both academic and commercial use. It has C++, C, Python and Java (Android) interfaces and supports Windows, Linux, Android, iOS and Mac OS. It contains more than 2500 optimized algorithms.
+
+Laravel is a ``full stack'' MVC framework, written in PHP, capable of handling everything from web serving to database management right down to plain HTML generation. Interaction with Laravel is made through \textit{Artisan}, a command-line utility that generates and manages the Laravel project environment and can be used to generate skeleton code and database schema stubs.
+
+At a higher level of abstraction, the Laravel framework follows the Model-View-Controller (MVC) architectural pattern, which enforces a separation of the ``business logic'' from the input and presentation logic associated with a graphical user interface (GUI), as shown in Fig.~\ref{fig:mvc}. The MVC pattern is very popular in the web development space and, as mentioned, consists of the following components:
+\begin{itemize}
+ \item \textit{Model} -- The domain that the software is built around. Models represent real-world items such as a
+ product, a person or, in our case, a dataset record consisting of imageId, imagePath, building class, etc. Models are typically persistent and are stored in a database.
+ \item \textit{View} -- Usually the resulting markup that the framework renders to the browser, such as the HTML representation of the platform. The view layer is responsible for generating a user interface, normally based on data in the model.
+ \item \textit{Controller} -- The component that links the model with the view, responsible for handling the user input related to the business logic. It usually performs input processing and validation (the query image), it can update or react to the model's state (results retrieved from the dataset), and it also sends commands to its associated view, changing the view's presentation (rendering the results as a web page).
+\end{itemize}
+
+\begin{figure}[h!]
+ \centering
+ \includegraphics[scale=0.07]{attachments/pictures/mvc.png}
+ \caption{MVC architecture components.}
+ \label{fig:mvc}
+ \end{figure}
+
+
+\section{Method}\label{retbulmethod}
+
+\textit{RetBul} offers a friendly user interface while embedding a variety of features. The platform is designed to support offline and online interaction with the experiments, while it can also be used as an educational tool to foster the concepts of retrieval and detection.\\
+
+In offline mode (Sec.~\ref{offline}), by querying a single image, a series of montaged images (query-train image) is retrieved in descending ranking order, according to the number of matched correspondences (inliers).
The platform distinguishes between retrieval- and detection-oriented results, which we elaborate on extensively in Section~\ref{walkthrough}.
+It should be noted that the offline results have been pre-computed using a manually created ground truth over the whole database of 900 images of 60 buildings.
+
+In online mode (Sec.~\ref{online}), a pair of experiment scenarios can be executed in real time by the user:
+ \begin{enumerate}
+ \item Query a pair of images from the database; the pair is processed in real time using the selected descriptor and a montaged image of the strongest matching tentative points is returned.
+ \item Upload or select a user-defined image, which is processed against the 60 most representative building images of the RetBul database.
+ \end{enumerate}
+
+
+\section{Walkthrough}\label{walkthrough}
+
+In this section we present a walkthrough of the implemented web platform. All the aforementioned experimental methods can be accessed through the RetBul application, which can be found at the following URL: \url{http://retbul.sniafas.eu}.
+The welcome screen of the online application is illustrated in Fig.~\ref{fig:retbul}.
+
+\begin{figure}[h]
+ \centering
+ \includegraphics[scale=0.35]{attachments/pictures/retbul.png}
+ \caption{RetBul home page.}
+ \label{fig:retbul}
+\end{figure}
+
+\newpage
+
+\subsection{Offline Experiments}\label{offline}
+
+
+As mentioned before, in the offline section (Fig.~\ref{fig:offline-blade}) a randomly proposed image can be selected together with a descriptor method and, as a response, a series of montaged images is retrieved and rendered in order of decreasing similarity, according to:
+\begin{itemize}
+ \item captures retrieved from the same building (see Fig.~\ref{fig:offline_class_results}), and
+ \item captures retrieved from the total number of buildings (see Fig.~\ref{fig:offline_total_results}).
+\end{itemize}
+
+Entering the home page, there are two ways to reach the offline experiments: either choosing the ``Try now!'' button (Fig.~\ref{fig:trynow}) or selecting ``Offline'' from the navigation menu (Fig.~\ref{fig:offlinenav}).
+
+
+
+\begin{figure}[ht!]
+ \centering
+ \includegraphics[scale=0.35]{attachments/pictures/offline_button.png}
+ \caption{Navigate to offline experiments from the home screen.}
+ \label{fig:trynow}
+\end{figure}
+
+\begin{figure}[H]
+ \centering
+ \includegraphics[scale=0.4]{attachments/pictures/offline_nav.png}
+ \caption{Navigate to offline experiments through the navigation panel.}
+ \label{fig:offlinenav}
+\end{figure}
+
+The offline section proposes a randomly chosen image among the best frontal views of the buildings. Selecting one of the
+available descriptors and clicking on the ``Go'' button triggers the corresponding ``offline'' experiment (Fig.~\ref{fig:offline-blade}).
+In this case, the result page is presented, and in the center of the screen two buttons, ``Class Results''
+and ``Total Results'', trigger the following result views.
+``Class Results'' represents the detection scenario (Fig.~\ref{fig:offline_class_results}), since only results from the same building appear.
+``Total Results'' represents the retrieval scenario (Fig.~\ref{fig:offline_total_results}), where the results derive from the total number of buildings.
+
+
+In the ``Total Results'' view, only the images that are matched with at least $8$ inlier correspondences are presented to the user, in order to preserve coherence and consistency; a minimal sketch of this ranking and filtering step is given below.
+
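+
+As an illustration of this ranking and filtering step, the following minimal Python sketch sorts the database images by their number of inliers and keeps only those reaching the threshold. It reuses the matching and RANSAC steps of Chapter~\ref{matching}; the actual platform implements this logic within its Laravel controllers and OpenCV services, so the sketch is only indicative and its helper function and parameter values are illustrative placeholders.
+
+\begin{verbatim}
+import numpy as np
+import cv2
+
+INLIER_THRESHOLD = 8   # minimum number of inliers shown in "Total Results"
+
+sift = cv2.SIFT_create()
+bf = cv2.BFMatcher(cv2.NORM_L2, crossCheck=True)
+
+def count_inliers(query_path, db_path):
+    """Number of RANSAC inliers between two images (cf. the Matching chapter)."""
+    img1 = cv2.imread(query_path, cv2.IMREAD_GRAYSCALE)
+    img2 = cv2.imread(db_path, cv2.IMREAD_GRAYSCALE)
+    kp1, des1 = sift.detectAndCompute(img1, None)
+    kp2, des2 = sift.detectAndCompute(img2, None)
+    matches = bf.match(des1, des2)
+    if len(matches) < 4:       # a homography needs at least 4 point pairs
+        return 0
+    src = np.float32([kp1[m.queryIdx].pt for m in matches]).reshape(-1, 1, 2)
+    dst = np.float32([kp2[m.trainIdx].pt for m in matches]).reshape(-1, 1, 2)
+    _, mask = cv2.findHomography(src, dst, cv2.RANSAC, 5.0)
+    return 0 if mask is None else int(mask.sum())
+
+def rank_results(query_path, database_paths):
+    """Rank database images by inlier count, keeping only those above threshold."""
+    scored = [(db, count_inliers(query_path, db)) for db in database_paths]
+    kept = [(db, n) for db, n in scored if n >= INLIER_THRESHOLD]
+    return sorted(kept, key=lambda item: item[1], reverse=True)
+\end{verbatim}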
+\begin{figure}[ht!]
+ \centering
+ \includegraphics[scale=0.35]{attachments/pictures/offline-blade.png}
+ \caption{Offline section. Images shown are randomly selected from the entire dataset.}
+ \label{fig:offline-blade}
+\end{figure}
+
+
+\begin{figure}[htp!]
+ \centering
+ \includegraphics[scale=0.35]{attachments/pictures/offline_class_results.png}
+ \caption{Offline Section. Retrieved images of the same building.}
+ \label{fig:offline_class_results}
+
+\end{figure}
+
+\newpage
+
+\begin{figure}[htp!]
+ \centering
+ \includegraphics[scale=0.35]{attachments/pictures/offline_total_results.png}
+ \caption{Offline Section. Retrieved images across all database buildings.}
+ \label{fig:offline_total_results}
+\end{figure}
+
+\subsection{Online Experiments}\label{online}
+
+In the online section, interaction with the web platform takes place in real time through a variety of available experiments.
+Fig.~\ref{fig:online_pair} illustrates the panel of the online image pair experiment.
+The pair experiment enables a user-defined experiment over any pair of images of the Vyronas database.
+The platform validates that a pair has been selected and executes the query with the chosen descriptor, i.e., SIFT or SURF.
+The results are shown in Fig.~\ref{fig:online_pair_features}, which depicts the extracted features, while Fig.~\ref{fig:online_pair_matching} illustrates the estimated correspondences of the selected images.
+
+
+
+\begin{figure}[htp!]
+ \centering
+ \includegraphics[scale=0.35]{attachments/pictures/online_pair.png}
+ \caption{Online Section. Selecting a pair of images.}
+ \label{fig:online_pair}
+\end{figure}
+
+\newpage
+
+\begin{figure}[htp!]
+ \centering
+ \includegraphics[scale=0.5]{attachments/pictures/online_pair_features.png}
+ \caption{Online Section. Extracted features from the pair experiment.}
+ \label{fig:online_pair_features}
+\end{figure}
+
+\begin{figure}[ht!]
+ \centering
+ \includegraphics[scale=0.5]{attachments/pictures/online_pair_matching.png}
+ \caption{Online Section. Estimated homography in the pair experiment.}
+ \label{fig:online_pair_matching}
+\end{figure}
+
+Alternatively, to provide a more challenging user experience, RetBul is capable of handling
+experiments with uploaded images.
+In order to obtain proper results, the uploaded query image should be in portrait orientation and up to 2MB, so as to match the orientation of the ground truth database.
+
+The aforementioned use case is part of the online experiment, where the user selects an uploaded image along with a descriptor method, as shown in Fig.~\ref{fig:online_upload}.
+Next, the query image is matched against the handpicked set of $60$ photos, the
+frontal views of all buildings in our database, as mentioned in Sections~\ref{exp_discussion} and~\ref{evaluation}.
+
+Fig.~\ref{fig:online_results} illustrates the retrieved images
+%with the estimated homographies in decreasing ranking
+along with the matched keypoints, ranked in decreasing order
+according to the number of inliers. Along with the rendered results, a table showing the
+calculated results is also provided, as shown in Fig.~\ref{fig:online_table}.
+
+
+\begin{figure}[ht!]
+ \centering
+ \includegraphics[scale=0.4]{attachments/pictures/online-blade-upload.png}
+ \caption{Online Section. Select or upload user-defined visual queries.}
+ \label{fig:online_upload}
+\end{figure}
+
+\begin{figure}[H]
+ \centering
+ \includegraphics[scale=0.5]{attachments/pictures/online_table.png}
+ \caption{Online Section.
Table with calculated results.}
+ \label{fig:online_table}
+\end{figure}
+
+\newpage
+\begin{figure}[ht!]
+ \centering
+ \includegraphics[scale=0.4]{attachments/pictures/online_total_results.png}
+ \caption{Online Section. Ranked results according to the number of inliers. The blue lines correspond to matched keypoints.}
+ \label{fig:online_results}
+\end{figure}
+
+
+Closing, we present a demonstration contrasting a good result with a poor one, for both the detection and retrieval scenarios.
+
+\textbf{\large{Detection Scenario}}\\
+Fig.~\ref{fig:poor_det} demonstrates an example with poor detection performance. Specifically, the first 9 results are shown, of which only the first 6 have a number of inliers greater than or equal to 9. In contrast, Fig.~\ref{fig:fine_det} depicts a case with good performance, where all of the top 9 retrieved images are characterized by a high number of inliers.
+
+
+%a poor result is depicting a sample of the 9 first retrieved results.
+%Only the first 6 can be proven liable with inliers equals to 9, while the rest 3 results are equal to 8 inliers.\\
+%In the contrary the sample of Fig.~\ref{fig:fine_det} depicts a fine results of retrieved buildings,
+%while till the latest can be considered liable.
+
+\begin{figure}[ht!]
+ \centering
+ \includegraphics[scale=0.4]{attachments/pictures/poor_det.png}
+ \caption{Demonstration of a detection scenario with poor results.}
+ \label{fig:poor_det}
+\end{figure}
+\begin{figure}[H]
+ \centering
+ \includegraphics[scale=0.4]{attachments/pictures/fine_det.png}
+ \caption{Demonstration of a detection scenario with fine results.}
+ \label{fig:fine_det}
+\end{figure}
+
+\textbf{\large{Retrieval Scenario}}\\
+Fig.~\ref{fig:poor_ret}, for the same building as before, also shows poor results in the retrieval scenario,
+where only the first 4 retrieved buildings are similar to the query. On the contrary, the sample of Fig.~\ref{fig:fine_ret} depicts a fine result, where all of the first 9 retrieved buildings are similar to the query.
+
+\begin{figure}[H]
+ \centering
+ \includegraphics[scale=0.4]{attachments/pictures/poor_ret.png}
+ \caption{Demonstration of a retrieval scenario with poor results.}
+ \label{fig:poor_ret}
+\end{figure}
+\begin{figure}[ht!]
+ \centering
+ \includegraphics[scale=0.4]{attachments/pictures/fine_ret.png}
+ \caption{Demonstration of a retrieval scenario with fine results.}
+ \label{fig:fine_ret}
+\end{figure}