\section{Refinement}
We discuss three refinements for our proposed design.
\subsection{Merge Intermediate Results}\label{sec:merge}
As described in Section \ref{sec:utilize_cache}, computation servers maintain a list
of the intermediate results they have computed. A computation server can therefore
merge intermediate results by merging their names. A data requester can then send an
{\sc interest} with the merged name, whose {\sc data} packet has not yet been cached.
The {\sc interest} will eventually reach the computation server, which can fetch the
unmerged intermediate results from the cache and combine them into a single merged
intermediate result. If some of the intermediate results no longer exist in the
cache, the computation server re-computes them. Once a merged name replaces the
unmerged ones, the unmerged intermediate results will no longer be requested, and
they will be evicted from the cache when their TTLs expire or when their space is
needed for new data. By merging intermediate results, we keep the number of cache
entries small.
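
A minimal sketch of this merging logic on the computation server is given below (in Python). The ``+''-separated name encoding and the helper callbacks (\texttt{recompute}, \texttt{combine}) are illustrative assumptions; actual packet handling is left to the NDN forwarding machinery.
\begin{verbatim}
# Sketch of merged-Interest handling on a computation server.
# The "+"-separated name encoding is an assumption for illustration.

def split_merged_name(merged_name):
    return merged_name.split("+")

def handle_merged_interest(merged_name, cache, recompute, combine):
    parts = []
    for name in split_merged_name(merged_name):
        data = cache.get(name)      # try the cached intermediate result
        if data is None:
            data = recompute(name)  # re-compute parts evicted from cache
        parts.append(data)
    return combine(parts)           # merged result to publish as DATA

# Toy usage with numeric intermediate results.
cache = {"/op/sum/part1": 3}
print(handle_merged_interest(
    "/op/sum/part1+/op/sum/part2", cache,
    recompute=lambda name: 4,       # stand-in for real re-computation
    combine=sum))                   # prints 7
\end{verbatim}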
\subsection{Robustness \& Load Balance} \label{sec:robust}
A computation server becomes a single point of failure if it is the only server
assigned to an operation. Moreover, if the operation is computation-intensive or
very popular among applications, that server can easily be overloaded. It is
therefore desirable to assign multiple computation servers to one operation. To
achieve this, all computation servers assigned to an operation announce the same
prefix for it; locality information can also be embedded in the name to improve
routing efficiency.
Which computation server receives a given {\sc interest} is decided by the NDN
routers. As a result, a computation server may not know the results computed by
another server performing the same operation, which would affect the correctness
of the metadata fetching described in Section \ref{sec:utilize_cache}. To avoid
this problem, we use SYNC, an NDN communication protocol that synchronizes name
sets among multiple parties, to synchronize the name sets of intermediate results
across the computation servers assigned to the same operation.
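
The sketch below (in Python) illustrates how each computation server could maintain its name set and merge updates delivered by the sync protocol; the \texttt{publish\_update} callback is a hypothetical stand-in for an actual SYNC implementation.
\begin{verbatim}
# Sketch of name-set synchronization between computation servers.
# The transport (publish_update) is a placeholder for a real NDN sync
# protocol; here two servers are wired back-to-back for illustration.

class NameSetSync:
    def __init__(self, publish_update):
        self.names = set()                 # known intermediate-result names
        self.publish_update = publish_update

    def add_local_result(self, name):
        # Called after this server computes a new intermediate result.
        self.names.add(name)
        self.publish_update(name)          # announce the name to peers

    def on_remote_update(self, name):
        # Called when the sync protocol delivers a peer's new name.
        self.names.add(name)

server_a = NameSetSync(lambda n: server_b.on_remote_update(n))
server_b = NameSetSync(lambda n: server_a.on_remote_update(n))
server_a.add_local_result("/op/sum/part1")
print(server_b.names)                      # {'/op/sum/part1'}
\end{verbatim}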
\subsection{Storage Management} \label{sec:gg}
To save storage resources and increase the effectiveness of caching, our NDN-based solution must remove useless intermediate results from the cache while keeping the most popular ones.
We adopt a soft-state storage-management scheme with two types of storage.
The first is non-persistent storage, provided by the caches of NDN routers and the local storage of computation servers.
As {\sc data} packets traverse the NDN network, they are cached in NDN routers.
They are evicted from the cache only in two cases:
1) the {\sc data} must be evicted to make room for other data;
2) the {\sc data} has expired.
Under a Least Frequently Used (LFU) policy, the first case usually indicates that the {\sc data} is not popular and should be removed.
In other words, the lifetime of the {\sc data} is determined by its popularity.
In the second case, the TTL of the {\sc data} determines its lifetime.
As mentioned before, the TTL of a {\sc data} packet can be set according to the popularity statistics maintained by computation servers.
In both cases, the non-persistent storage in NDN routers reflects the data's popularity.
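
As an illustration only, the mapping from popularity statistics to a TTL could look like the following sketch; the logarithmic form and the bounds are assumptions, not values prescribed by our design.
\begin{verbatim}
# Illustrative mapping from popularity statistics to a DATA TTL.
# The logarithmic form and the bounds are arbitrary assumptions.
import math

def ttl_from_popularity(request_count, base_ttl=10.0,
                        min_ttl=1.0, max_ttl=600.0):
    ttl = base_ttl * math.log2(1 + request_count)
    return min(max(ttl, min_ttl), max_ttl)   # clamp to [min_ttl, max_ttl]

print(ttl_from_popularity(1))     # rarely requested: short TTL
print(ttl_from_popularity(1000))  # popular: longer TTL (capped)
\end{verbatim}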
Unlike the router caches, the local storage of computation servers reflects the freshness of data.
Window-based storage guarantees that only the most recent data is kept.
The window is longer than the TTL of cached data, so that when data are evicted from a router cache,
the {\sc interest} can still be satisfied by the computation server's local storage without re-computation.
If there is no hit in the computation server's local storage, re-computation is triggered.
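
A minimal sketch of such window-based local storage is shown below; the interface is hypothetical, and the window length would be chosen larger than the TTL used for cached {\sc data}.
\begin{verbatim}
# Sketch of window-based local storage on a computation server.
# Only results produced within the last `window` seconds are kept.
import time

class WindowStorage:
    def __init__(self, window_seconds):
        self.window = window_seconds
        self.entries = {}                   # name -> (timestamp, result)

    def put(self, name, result):
        self.entries[name] = (time.time(), result)
        self._evict_expired()

    def get(self, name, recompute):
        self._evict_expired()
        if name in self.entries:
            return self.entries[name][1]    # hit: no re-computation needed
        result = recompute(name)            # miss: trigger re-computation
        self.put(name, result)
        return result

    def _evict_expired(self):
        cutoff = time.time() - self.window
        for name in list(self.entries):
            if self.entries[name][0] < cutoff:
                del self.entries[name]      # fell outside the window

store = WindowStorage(window_seconds=60.0)
print(store.get("/op/sum/part1", recompute=lambda n: 42))  # 42 (recomputed)
print(store.get("/op/sum/part1", recompute=lambda n: 0))   # 42 (window hit)
\end{verbatim}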
The second type, persistent storage, is used for long-term popular data; Repo in NDN provides such a facility.
Since computation servers have complete information about the popularity, lifetime, and size of data, they can decide which data should be stored in Repo.
Once data is stored in Repo, it remains there until it is explicitly removed by the computation servers.
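
Which data to promote to Repo could be decided along the lines of the following sketch; the popularity threshold and the capacity budget are illustrative parameters, not values prescribed by the design.
\begin{verbatim}
# Sketch of selecting long-term popular data for persistent storage in
# Repo, using the popularity and size bookkeeping of the computation
# server. The threshold and budget are illustrative assumptions.

def select_for_repo(stats, min_requests, budget_bytes):
    # stats: list of (name, request_count, size_bytes)
    popular = [s for s in stats if s[1] >= min_requests]
    popular.sort(key=lambda s: s[1], reverse=True)   # most popular first
    selected, used = [], 0
    for name, _, size in popular:
        if used + size <= budget_bytes:
            selected.append(name)
            used += size
    return selected

print(select_for_repo(
    [("/op/sum/part1", 120, 1000), ("/op/sum/part2", 5, 2000)],
    min_requests=50, budget_bytes=10000))   # ['/op/sum/part1']
\end{verbatim}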