Merge pull request #88 from CogSciUOS/ToBeMaster

updated report to final version
CogSciUOS · Nov 27, 2020 · 1362b4b
2 parents cc5fa0a + 1bd7dbc
commit 1362b4b

Showing 10 changed files with 516 additions and 444 deletions.
diff --git a/documentation/report/Appendices/Appendix_B.tex b/documentation/report/Appendices/Appendix_B.tex
@@ -12,7 +12,18 @@ \subsection{Supplementary Information}
 \subsubsection{File moving service}
 \label{subsec:FileService}
 
-In this section it is in detail described how the service of filemoving is constructed. As Windows is used as the operating system of the sorting machine, the development was done with the .NET framework\footnote{For further information on the .NET framework, see \url{https://docs.microsoft.com/en-us/dotnet/framework/get-started/overview}} in the programming language C\#. The package provided is called Topshelf.\footnote{For further information on Topshelf, see \url{https://github.com/Topshelf/Topshelf}} Topshelf is a service hosting framework for building Windows services using .NET. With the package, it is possible to develop a console application in the development phase, compile it as a service, and install it later via the console. Previously, it was not possible to debug services during the development phase. The function of the service is based on the FileSystemWatcher object from the System.IO namespace.\footnote{For further information on the SystemFileWatcher, see \url{https://docs.microsoft.com/en-us/dotnet/api/system.io.filesystemwatcher?view=netframework-4.8}} In the main program, a list of files in the source folder is kept. Files that are older than one hour are moved to the target folder on the external drive. The selected files are moved by a function that is called, when an event is triggered. The event is triggered by the FileSystemWatcher after subscribing to different flags. Shortly after initialization, the service was adjusted because removing the images from the C disk straight away caused the sorting program to stop. The problem is solved by keeping the most recent 1000 images and moving older images to the external disk.
+This section describes in detail how the file moving service is constructed. As Windows is used as the operating system of the sorting machine, the service was developed with the .NET framework\footnote{For further information on the .NET framework, see \url{https://docs.microsoft.com/en-us/dotnet/framework/get-started/overview} (visited on 04/24/2020)} in the programming language C\#. The package used is called Topshelf.\footnote{For further information on Topshelf, see \url{https://github.com/Topshelf/Topshelf} (visited on 04/24/2020)} Topshelf is a service hosting framework for building Windows services using .NET. With the package, it is possible to develop a console application in the development phase, compile it as a service, and install it later via the console. Previously, it was not possible to debug services during the development phase. The service is based on the FileSystemWatcher class from the System.IO namespace.\footnote{For further information on the FileSystemWatcher, see \url{https://docs.microsoft.com/en-us/dotnet/api/system.io.filesystemwatcher?view=netframework-4.8} (visited on 04/24/2020)} The main program keeps a list of the files in the source folder. Files that are older than one hour are moved to the target folder on the external drive. The selected files are moved by a function that is called when an event is triggered. The event is raised by the FileSystemWatcher after the service has subscribed to different flags. Shortly after initialization, the service was adjusted because removing the images from the C disk straight away caused the sorting program to stop. The problem was solved by keeping the most recent 1000 images and moving older images to the external disk.
+
+\subsubsection{Benefits of Data Set Creation with TensorFlow}
+\label{subsec:BenefitsDataSet}
+
+This section introduces \texttt{TFRecord}, TensorFlow's own binary storage format. The format was chosen because of the large amount of data collected, and in order to shorten training times and avoid loss of information.
+
+The file format is optimized for image and text data. These are stored as tuples that always consist of a file and its label. In our case, the difference in reading time is significant because the data is stored on the network and not on an SSD in the local PC. The serialized file format allows the data to be streamed over the network efficiently. This storage format also facilitates mixing and matching data sets and network architectures. Another advantage is that the files are portable across systems.
+
+Working with these files simplifies the subsequent image transformations. With the \mbox{\texttt{tf.data}} API, complex input pipelines are built from simple and reusable components, even for large data sets. The preferred pipeline for our asparagus project can apply complex transformations to the images and combine them into stacks for training, testing, and validation in arbitrary ratios. A data set can be changed, e.g.\ by using different labels or by applying transformations such as mapping, repeating, batching, and many others.
+
+Besides the functional transformations of the input pipeline described above, which are provided by \mbox{\texttt{tf.data.Dataset}}, an iterator gives sequential access to the elements in the data set. The iterator keeps track of the current position and allows the next element to be retrieved as a tuple of tensors. Initializable iterators are initialized explicitly, and different parameters can be passed when the iteration is started. This is especially handy when searching for the right parameters in parallel.
 
 \newpage
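
The retention rule described for the file moving service (keep the most recent 1000 images in the source folder and move older ones to the external disk) can be illustrated with a minimal sketch. This is not the service's actual code: the report describes a C\# implementation based on Topshelf and FileSystemWatcher, whereas the sketch below uses Python's standard library and a plain function call instead of file system events. The function name `move_older_files`, its parameters, and the use of the modification time to decide which files are "most recent" are assumptions made for illustration only.

```python
import shutil
from pathlib import Path
from typing import List


def move_older_files(source: Path, target: Path, keep: int = 1000) -> List[Path]:
    """Keep the `keep` most recently modified files in `source`
    and move all older files to `target` (hypothetical helper)."""
    target.mkdir(parents=True, exist_ok=True)

    # Sort files from newest to oldest by modification time (an assumption;
    # the original service may use a different notion of "recent").
    files = sorted(
        (p for p in source.iterdir() if p.is_file()),
        key=lambda p: p.stat().st_mtime,
        reverse=True,
    )

    moved = []
    # Everything after the first `keep` entries is older and gets moved.
    for path in files[keep:]:
        destination = target / path.name
        shutil.move(str(path), str(destination))
        moved.append(destination)
    return moved
```

In the real service, a function of this kind would be called from the event handler that the FileSystemWatcher triggers, rather than being invoked directly.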