Commit 73dcbe1

committed
test
1 parent f68c26b commit 73dcbe1

4 files changed

+67
-4
lines changed

pgr_conclusion.tex

+15
@@ -1 +1,16 @@
\section{Comparison}
\subsection{Data Naming}
In Nectar, the rewriting procedure runs at compile time, so the naming of intermediate results (IRs) is largely language specific. In our system, by contrast, we leverage Named Data Networking: IRs are claimed and retrieved by name, a plaintext string whose format is universally known to all servers and is independent of any programming language. For example, an intermediate result computed in Java on one server can be fetched by name from another server running C\#. Our solution therefore gives collaborating servers flexibility in their choice of language.
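As a concrete sketch, such a language-neutral name could be built as follows. The `/ir/<app>/<function>/<input>[/<part>]` scheme and the helper below are hypothetical illustrations, not a fixed design; any plaintext format agreed on by all servers would do:

```python
def ir_name(app, function, input_id, part=None):
    """Build a language-neutral, NDN-style name for an intermediate result.

    The /ir/<app>/<function>/<input>[/<part>] scheme is a hypothetical
    example; the point is only that the name is a plain string whose
    format every server understands, regardless of language.
    """
    components = ["ir", app, function, input_id]
    if part is not None:
        components.append(part)
    return "/" + "/".join(components)

# A Java producer and a C# consumer only need to agree on the string:
name = ir_name("wordcount", "map", "log-2013-04", part="1-100")
# name == "/ir/wordcount/map/log-2013-04/1-100"
```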
\subsection{Program Decomposition}
In Nectar, rewriting is executed only once, when the computation request is created. In our system, program decomposition happens multiple times, accompanying the step-by-step splitting of interest names: every time the computing task is trimmed and ``bounced'' to another server, decomposition runs again. This step-by-step approach lets the server at each step dynamically change how the remaining task is further decomposed, so an intermediate server can decide on further execution at finer granularity, according to the computing, network, and other conditions it observes at that moment. For example, suppose a request reaches a server after several steps, and the server finds a job already on its queue that will produce the needed data very soon. Using a delay-estimation algorithm that compares waiting locally against issuing further requests into the network, the server can exploit this flexibility to improve system performance.
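The per-step decision could be sketched as below. The function name and the two delay estimators are hypothetical placeholders for whatever estimation algorithm a server would actually run:

```python
def next_action(name, job_list, est_local_wait, est_network_delay):
    """Decide, at one decomposition step, whether to wait for a local job
    that will produce the named IR or to trim the interest name and
    "bounce" the remaining sub-computation to another server.

    est_local_wait / est_network_delay stand in for some delay-estimation
    component; their internals are outside this sketch.
    """
    if name in job_list and est_local_wait(name) <= est_network_delay(name):
        return "wait-local"
    return "forward"

# A job producing the IR finishes in ~5 time units; fetching over the
# network would take ~40, so the server waits locally.
act = next_action("/ir/wordcount/map/log-2013-04",
                  {"/ir/wordcount/map/log-2013-04"},
                  est_local_wait=lambda n: 5,
                  est_network_delay=lambda n: 40)
# act == "wait-local"
```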
\subsection{IR Matching}
In Nectar, the rewriting procedure executes entirely locally: the requesting server determines what to ask for, and a centralized server helps it determine whom to ask. First, such a centralized design risks single-point failure and concentrates the workload on the requesting server and the server keeping the fingerprints. Second, as with program decomposition, it is the intermediate server that best knows what kind of intermediate results it can provide. In Nectar, this knowledge is exposed by registering fingerprints at the centralized server; but in realistic, complicated computing situations, for example highly time-intensive computations, the delay in this reporting can easily cause a miss for an intermediate result that actually exists. The root cause is the mismatch between the knowledge the requesting server can obtain and the actual state of the server providing the result. By letting the providing server itself decide the IR-matching question, our solution avoids this mismatch and can perform better: only the one providing the service knows what it can provide best.
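One plausible way for the providing server to do this matching locally is longest-prefix matching over the names of the IRs it currently holds. This is an illustrative sketch with made-up names, not Nectar's or NDN's actual matching logic:

```python
def match_ir(interest_name, local_irs):
    """Match an incoming interest against the IRs this server holds,
    returning the longest name that is a component-wise prefix of the
    interest (or an exact match), else None.

    Because the provider consults its own, always-current store, there is
    no stale-fingerprint window as in a centralized registry.
    """
    best = None
    for ir in local_irs:
        # Component-wise prefix: exact name, or interest continues after "/".
        if interest_name == ir or interest_name.startswith(ir + "/"):
            if best is None or len(ir) > len(best):
                best = ir
    return best

irs = {"/ir/wordcount", "/ir/wordcount/map"}
best = match_ir("/ir/wordcount/map/log-2013-04", irs)
# best == "/ir/wordcount/map"
```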
\subsection{IR Granularity}
In Nectar's design, a claim (an AddEntry, in Nectar's terminology) is determined at compile time. At that stage, the granularity of each data unit, indexed by one fingerprint, is fixed and can never be changed. In our design there are no two separate entities (Nectar's fingerprint server and computation server): the computation server itself keeps the intermediate results, so it can manage them freely according to the dynamic situation and provide a more precise, more suitable granularity, without the extra communication cost two separate entities would incur if Nectar tried to improve on this issue. Keeping the functionality in one place removes both the mismatch between two entities and the overhead of coordinating them. For example, if the computation server observes that most requests ask for data in the range 1-300, while it stores that data at granularity 100 (entries 1-100, 101-200, and 201-300), it can aggregate them into a single coarser entry for 1-300. This reduces the number of communication steps: without aggregation, for each 1-300 request the server would return a name list of three names (1-100, 101-200, 201-300), and the requester would have to send interests, fetch the three pieces one by one, and aggregate them locally. Conversely, if the server observes that the frequently requested granularity is smaller than its stored entries, and the cost of splitting an entry is low enough compared with the benefit, it can split the entry and likewise improve performance.
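The 1-300 aggregation example can be sketched as follows, with IR entries modeled as (start, end) ranges. The merge condition below is a hypothetical simplification that only fires when existing entries exactly tile the requested range:

```python
def aggregate(entries, requested):
    """Merge adjacent IR entries when their union exactly covers a
    frequently requested range, so a single name answers the request.

    entries: list of (start, end) ranges held by the computation server.
    requested: the (start, end) range clients keep asking for.
    """
    lo, hi = requested
    covered = sorted(e for e in entries if lo <= e[0] and e[1] <= hi)
    # Merge only if the selected entries tile [lo, hi] with no gaps.
    if (covered and covered[0][0] == lo and covered[-1][1] == hi
            and all(a[1] + 1 == b[0] for a, b in zip(covered, covered[1:]))):
        entries = [e for e in entries if e not in covered] + [requested]
    return sorted(entries)

print(aggregate([(1, 100), (101, 200), (201, 300)], (1, 300)))
# → [(1, 300)]
```

With the coarser entry in place, a 1-300 request is answered with one name instead of a three-name list plus three separate fetches.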
\subsection{IR Storage}
Nectar tackles the storage issue by adding an extra mechanism, a garbage collector running on the fingerprint server, which maintains hard state. In our NDN solution, all intermediate results are kept either at the server that generated them or in the caches of network routers. Nectar's problem is that, to provide good performance and avoid single-point failure, it must keep multiple copies in the network, and synchronization then becomes a problem whenever a copy is modified, as garbage collection does. Our NDN solution uses the network's native caching service and introduces no extra infrastructure or software service: when a piece of data is no longer needed, the soft state in every router simply fades away as the data reaches its expiration lifetime. Moreover, the computation server, as the actual owner of the data, can freely decide whether to keep it, estimating the re-computation cost at a finer granularity and along more dimensions, such as real-time parameters.
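A toy model of this soft-state behavior, using hypothetical names and an explicit clock in place of a real NDN router cache:

```python
import time


class SoftCache:
    """Toy model of NDN router soft state: each cached IR carries a
    freshness lifetime and silently disappears once it expires, so no
    explicit garbage-collection protocol is needed."""

    def __init__(self):
        self._store = {}  # name -> (data, expiry_time)

    def put(self, name, data, lifetime, now=None):
        now = time.time() if now is None else now
        self._store[name] = (data, now + lifetime)

    def get(self, name, now=None):
        now = time.time() if now is None else now
        entry = self._store.get(name)
        if entry is None or now >= entry[1]:
            self._store.pop(name, None)  # expired: the soft state just fades
            return None
        return entry[0]


c = SoftCache()
c.put("/ir/wordcount/map/1-300", b"result-bytes", lifetime=10, now=0)
assert c.get("/ir/wordcount/map/1-300", now=5) == b"result-bytes"   # fresh
assert c.get("/ir/wordcount/map/1-300", now=20) is None             # expired
```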
\section{Conclusion}
In this work we propose an intermediate computation result sharing mechanism based on Named Data Networking for the data center network scenario. Nectar forms its intermediate computing data units in the initial compilation phase, extracts the computed data from its generator, and manages all intermediate results at an extra, special, centralized server. Our system poses two questions to this design. First, why form the intermediate results as early as the compilation stage, when the data will only be used later, during others' unpredictable executions? Second, why turn one entity into two, that is, extract intermediate results away from their generators, which have the most complete knowledge about the data, including its time cost and more complex properties such as real-time behavior, without a convincing benefit? We therefore tackle both problems in the opposite way through the nature of Named Data Networking. Our current work focuses mainly on how to decompose intermediate results and how to address them by name. In future work we will turn to how to store and fetch them, making better use of NDN infrastructure such as router caches. Furthermore, experiments and evaluation are needed to measure the performance improvement of our system in a real data center.

pgr_nectar.tex

+3-3
@@ -24,9 +24,9 @@ \section{Nectar's solutions}
 First, in order for rewriter to rewrite the programs, the
 programs must be written in the certain language that is
 understandable to the rewriter. In fact, Nectar can only support the programs that were
-written in the C# while many programs ran in data centers do not
-adopt C# or Microsoft's DryadLINQ. Also, as mentioned in the Nectar's paper,
-even if a program is written in the C#, if it invokes any external library written in
+written in the C\# while many programs ran in data centers do not
+adopt C\# or Microsoft's DryadLINQ. Also, as mentioned in the Nectar's paper,
+even if a program is written in the C\#, if it invokes any external library written in
 other languages, these parts of computations cannot benefit
 from the intermediate results because rewriter cannot understand them.

pgr_related.tex

+47
@@ -1 +1,48 @@
\section{Related Work}
Among data center work in traditional IP networks, quite a few existing systems focus on reducing redundant computation via caching, such as DryadInc \cite{Isard:2007:DDD:1272996.1273005}, Comet \cite{He:2010:CBS:1807128.1807139}, the stateful bulk processing system of \cite{Logothetis:2010:SBP:1807128.1807138}, and Nectar \cite{gunda2010nectar}. Among these we choose Nectar as our main reference and comparison target, since it attempts the most comprehensive solution to the automatic management of data and computation in the data center. To make the comparison direct and straightforward, we borrow the data center use cases of incremental computing and sub-computation directly from Nectar, aiming to show that an NDN solution can do better than Nectar, which to us represents the state of the art in centralized, special-server-based solutions over traditional IP.

In NDN, or content-centric networking (CCN) as a wider scope, no published work currently targets this problem in the data center scenario. The most closely related work, if we consider any appearance of the data center scenario in CCN research, is \cite{lee2010greening}, where the authors use the data center as one CCN use case to illustrate the energy-saving potential of CCN; from motivation to evaluation, our work is entirely different from theirs. The community's main concerns are still fundamental problems such as general naming mechanisms, data security, and routing, rather than a problem as specific as intermediate result sharing in data centers. The upside is that our ideas are relatively fresh; the downside is that, with so many fundamental problems still open and the potential of NDN unknown, work focusing on this particular problem is like a building on unstable ground, so its contribution and value depend to a large extent on the success of NDN. Even if NDN were ultimately judged unsuitable for the general Internet as a whole, however, an individual data center is comparatively small in scale and centrally controlled by a single company or organization; NDN could still be promising there if our solution shows attractive performance improvements over traditional, special-server-based IP solutions. Precedents such as optical circuit switches, which are gaining popularity and acceptance in data centers, give us reason to believe our work can still contribute if the solution proves promising enough, even if NDN loses the bigger campaign.

On content-centric naming, one of the key issues we address in this scenario, there are quite a few related works on the structure and rules of names, such as \cite{ghodsi2011naming} and \cite{primes}, but their main motivation is security or scalability at the scale of the whole Internet, which is very different from ours. There are also existing studies on CCN caching; for example, \cite{carofiglio2011modeling} develops an analytical model for the performance evaluation of content transfer in CCN that allows explicit characterization of steady-state dynamics. Our work could draw inspiration from it in the network topology design, and thereby arrive at a better topology and caching strategy.

progress_report.tex

+2-1
@@ -19,5 +19,6 @@
 \input{pgr_refine}
 \input{pgr_related}
 \input{pgr_conclusion}
-
+\bibliographystyle{abbrv}
+\bibliography{proposal}
 \end{document}
