This thesis analysed research repositories by first presenting the state of the general scientific endeavour, namely its challenges (e.g., the reproducibility crisis) and new workflows (e.g., \glsfirst{oa} or the emergence of the \glsfirst{ntro}). Chapter \ref{ch:evolution} considers the \glsfirst{ir} as a starting point and outlines how this type of digital library needs to evolve in order to adapt to the new realities, either by implementing new functionality or by adopting a distributed model in which distinct systems interoperate. Both paths serve the common vision of the scientific community: research repositories able to preserve, disseminate and link any type of output, no matter its structural or semantic characteristics. A detailed description of a newer type of repository, focused on research data, is included in the same chapter; examining both types of repositories provides an overview of the new requirements and the existing features relevant to novel repository solutions.
The new generation of repositories is defined by the implementation of complex and innovative workflows and thus requires both ingenuity and inspiration from other domains in order to develop the requisite functionality. Software engineering is one such domain, and this work presents three original contributions employing concepts from this discipline:
\begin{enumerate}
\item Chapter \ref{ch:blockchain} presents a research data licensing and reuse monitoring system implemented using blockchains. This system uses smart contracts in order to formalise the authoring, publication, dissemination and reuse of datasets, ensuring both their fair usage and that the original authors receive credit for the outputs. The motivation for such a system stems from the desire to reconcile the perceived risks of data sharing (e.g., the conditioning of academic success on publication volume and impact) with the need for accelerated dissemination of research. While the practical implementation of such a contract, using technologies such as Ethereum and Solidity, is not trivial, repositories could easily implement the underlying workflow in a user-friendly manner, with minimal impact on computing and administrative resources.
\item Chapter \ref{ch:rdf} presents a novel bibliographical data model based on \glsfirst{rdf}, characterised by flexibility and facilitation of interoperability. It considers an existing repository solution, Figshare, for which an \gls{rdf} system is devised in order to replace the current relational database model, enhancing its support for both standard bibliographic record schemas and custom user-defined metadata fields. Moreover, it proposes a workflow based on \glsfirst{xslt}, which can provide extensive flexibility in terms of dissemination output formats.
\item Chapter \ref{ch:migration} introduces SLAM, the Stateful Library Analysis and Migration framework, an \glsfirst{etl} pipeline for migrating bibliographical records across repositories. This type of tool is of utmost importance when replacing outdated systems with new repository solutions, an endeavour which requires the transfer of all existing bibliographic records with no data loss and with minimal disruption to end users. To this end, SLAM employs a metadata analysis module implemented using the Elasticsearch stack, a state machine for recording and replaying migration steps, and a modular architecture which allows connecting to various repositories and other external systems. This implementation was validated by running five distinct migrations over the course of fourteen months.
\end{enumerate}
In summary, this thesis makes the following four main contributions:
\begin{itemize}
\item Provides an in-depth exploration of the digital libraries ecosystem, focusing on institutional and data repositories, and investigating the requirements and implementation of the next generation of solutions.
\item Proposes a solution for managing the licensing and reuse of repository records, implemented using smart contracts and blockchain technologies.
\item Presents a means of advancing the architecture of an existing repository solution, Figshare, that employs \gls{rdf} in order to implement a more flexible and interoperable system.
\item Introduces an \gls{etl} framework for migrating bibliographic records across repositories, which leverages software engineering patterns and processes in order to deliver a reliable and feature-rich tool to repository administrators.
\end{itemize}
All research results and other contents of the thesis have been included in the following peer-reviewed publications authored by the candidate:
\begin{enumerate}
\item Adrian-Tudor P\u{a}nescu, Tibor \v{S}imko and Christine Vanoirbeek. ``Targeted Annotation of Scientific Literature and Data Resources in Invenio Digital Libraries''. In: \emph{Proceedings of Open Repositories} (2014). \texttt{URL}: \url{http://hdl.handle.net/10024/97585}.
\item Adrian-Tudor P\u{a}nescu and Vasile Manta. ``Current Issues In Research Output Management''. In: \emph{Buletinul Institutului Politehnic din Ia\c{s}i, Automatic Control and Computer Science Section} (Dec. 2016).
\item Adrian-Tudor P\u{a}nescu and Vasile Manta. ``RDF-based workflows for the figshare research data repository''. In: \emph{Proceedings of the 21st International Conference on System Theory, Control and Computing (ICSTCC)} (2017), pp. 860--865. \texttt{DOI}: \href{https://doi.org/10.1109/ICSTCC.2017.8107145}{10.1109/ICSTCC.2017.8107145}.
\item Adrian-Tudor P\u{a}nescu and Vasile Manta. ``Smart Contracts for Research Data Rights Management over the Ethereum Blockchain Network''. In: \emph{Science \& Technology Libraries} 37.3 (Jul. 2018), pp. 235--245.\\\texttt{DOI}: \href{https://doi.org/10.1080/0194262X.2018.1474838}{10.1080/0194262X.2018.1474838}.
\item Christopher Frederick Isambard Blumzon and Adrian-Tudor P\u{a}nescu. ``Data Storage''. In: \emph{Good Research Practice in Non-Clinical Pharmacology and\\Biomedicine}. Ed. by Anton Bespalov, Martin C. Michel, and Thomas Steckler. 2020, pp. 277--297. \texttt{DOI}: \href{https://doi.org/10.1007/164\_2019\_288}{10.1007/164\_2019\_288}.
\item Adrian-Tudor P\u{a}nescu, Teodora-Elena Grosu and Vasile Manta. ``SLAM: An ETL System for Performing Digital Library Migrations''. Accepted for publication in \emph{Information Technology and Libraries}. \texttt{URL}: \url{https://ejournals.bc.edu/index.php/ital}.
\end{enumerate}
When considering future research directions, it is important to note that the results in this thesis constitute only a few of the building blocks that new repository solutions will require. At the same time, apart from the knowledge transfer from other domains, these results also point towards a possibly distributed architecture for repositories, similar to the design of \emph{microservices}. Such a structure could ensure that each functionality provided by a repository is handled by the party best suited for it, both at the technological and human resource levels.
Moreover, the results in this thesis only scratch the surface of what certain novel technologies can achieve in the world of research repositories. In \cite{dsbc} the authors list five different areas in which blockchain technologies could be adapted to solve repository issues, of which Chapter \ref{ch:blockchain} tackles only one. Linked data solutions, similar to the one presented in Chapter \ref{ch:rdf}, show considerable potential for ensuring unhindered (meta)data flow between the various systems in the research ecosystem (for a review of its complexity see \cite{101}).
This work cannot be in any way prescriptive on the design of the next generation of research repositories. For example, the recent \gls{covid} epidemic has made the case for the fast publication, dissemination and review of science~\cite{cochran}, which might directly conflict with the more intricate workflows presented in this thesis. At the same time, the fact that current \glspl{ir} need to evolve in order to support \glspl{ntro} has become an accepted truth, and thus various options for either transforming existing solutions or building new ones need to be considered.
To conclude, it is of utmost importance that research on repositories continues and that this research is fully connected to the procedural and social realities of the scientific enterprise. The reproducibility crisis and the rising impact of preprints have demonstrated that repositories can no longer act as an auxiliary component of the research life cycle but must become an integral part of it, providing a stage for the dissemination of science and a platform upon which both \emph{humans} and \emph{machines} can discover new ways of interacting with content, as envisioned by the \gls{fair} principles. Since the publication of the first scientific article by the Royal Society in 1665, the open science movement has been one of the most important developments in the world of research, and the repository, as a system, has a real opportunity to become the central component in the technological infrastructure required for achieving its goals. Moreover, as a subset of libraries, repositories play a key role in preserving scientific research, an integral part of a civilisation's heritage, and thus deserve a high degree of attention when analysing the technological means for properly achieving their functions.