Commit 5c344bc

committed
Slide update
1 parent 7278504 commit 5c344bc

3 files changed

+496
-14
lines changed

index.Rmd

Lines changed: 133 additions & 2 deletions
@@ -512,18 +512,149 @@ Like MPI (BSP) on unstructured graphs!
512512
*** =right
513513
![](assets/img/graphx.png)
514514

515+
--- &twocol
516+
517+
## Spark Graphs - GraphX
518+
519+
*** =left
520+
521+
Maddeningly, graph algorithms are not yet fully available from Python - in particular, Pregel.
522+
523+
You can try to mock up communication along the edges of an unstructured mesh, but it is unbelievably slow.
524+
525+
Still, it gives a hint of what's possible.
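
To make the idea of communicating along mesh edges concrete, here is a minimal pure-Python sketch of Pregel-style (BSP) supersteps. It is not the GraphX or PySpark API; the `mesh` dictionary and the `pregel_min_label` function are hypothetical names invented for illustration. Each superstep has a communication phase (send labels along edges) and a compute phase (keep the smallest label seen).

```python
# Hypothetical toy mesh: vertex id -> neighbouring vertex ids (undirected edges).
mesh = {
    0: [1, 2],
    1: [0, 2],
    2: [0, 1, 3],
    3: [2],
    4: [5],
    5: [4],
}

def pregel_min_label(graph, max_supersteps=50):
    """Label each vertex with the smallest vertex id reachable from it."""
    labels = {v: v for v in graph}
    active = set(graph)  # vertices whose label changed in the last superstep
    for _ in range(max_supersteps):
        if not active:
            break
        # Communication phase: active vertices send their label along every edge.
        inbox = {}
        for v in active:
            for nbr in graph[v]:
                inbox.setdefault(nbr, []).append(labels[v])
        # Compute phase: a vertex adopts the smallest label it received, if smaller.
        active = set()
        for v, msgs in inbox.items():
            best = min(msgs)
            if best < labels[v]:
                labels[v] = best
                active.add(v)
    return labels

labels = pregel_min_label(mesh)
```

Vertices that end up with the same label belong to the same connected piece of the mesh. The loop stops as soon as a superstep changes nothing.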
526+
527+
*** =right
528+
![](assets/img/unstructured-mesh.png)
529+
530+
Notebook: Spark 3 - Unstructured Mesh
531+
532+
---
533+
534+
## Spark's Capabilities
535+
536+
For data analysis, Spark is already there - like parallel R without the headaches and with a growing number of packages.
537+
538+
Lots of typical statistics + machine learning.
539+
540+
For traditional high performance computing, seems a little funny so far: ScaLAPACK-style distributed block matrices are there, with things like PCA, but not linear solves!
541+
542+
Graph support will enable a lot of really interesting applications (Spark 2.x - this year?)
543+
544+
Very easy to set up a local Spark install on your laptop.
545+
546+
---
547+
548+
## Spark Cons
549+
550+
JVM-based (Scala), which means C/Python interoperability is always fraught.
551+
552+
Not much support for high-performance interconnects (although that's coming from third parties - [HiBD group at OSU](http://hibd.cse.ohio-state.edu))
553+
554+
Very little explicit support for multicore yet, which leaves some performance on the table.
555+
515556
--- .segue .dark
516557

517558
## Tensorflow: http://tensorflow.org
518559

560+
--- &twocol
561+
562+
## A Quick Intro to TensorFlow
563+
564+
*** =left
565+
566+
TensorFlow is an open-source library for numerical computation with dataflow graphs, where the data is always in the form of tensors (n-d arrays).
567+
568+
From Google, who uses it for machine learning.
569+
570+
Heavy number crunching, can use GPUs or CPUs, and will distribute tasks of a complex workflow across resources.
571+
572+
(Current version only has initial support for distributed; taking longer to de-google the distributed part than anticipated)
573+
574+
*** =right
575+
![](assets/img/tensors_flowing.gif)
576+
577+
http://www.tensorflow.org
578+
579+
--- &twocol
580+
581+
## TensorFlow Graphs
582+
583+
*** =left
584+
585+
As an example of how a computation is set up, here is a linear regression.
586+
587+
Linear regression is already built in, and doesn't need to be iterative, but this example is quite general and shows how it works.
588+
589+
Variables are explicitly introduced to the TensorFlow runtime, and a series of transformations on the
590+
variables are defined.
591+
592+
When the entire flowgraph is set up, the system can be run.
593+
594+
The integration of TensorFlow tensors and NumPy arrays is very nice.
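
The define-then-run pattern described above can be sketched in a few lines of plain Python. This is a toy deferred-evaluation graph, not the TensorFlow API: `Node`, `Variable`, `constant`, `mean`, and `run` are hypothetical names, the gradients are written out by hand rather than derived automatically, and the data are made up (points on the line y = 2x + 1).

```python
class Node:
    """A deferred operation: nothing is computed until run() is called."""
    def __init__(self, fn, *inputs):
        self.fn, self.inputs = fn, inputs
    def __add__(self, other):
        return Node(lambda a, b: a + b, self, other)
    def __sub__(self, other):
        return Node(lambda a, b: a - b, self, other)
    def __mul__(self, other):
        return Node(lambda a, b: a * b, self, other)

class Variable(Node):
    """Mutable state that is explicitly introduced to the graph."""
    def __init__(self, value):
        super().__init__(None)
        self.value = value

def constant(v):
    return Node(lambda: v)

def mean(nodes):
    return Node(lambda *vals: sum(vals) / len(vals), *nodes)

def run(node):
    """Evaluate a node by recursively evaluating its inputs."""
    if isinstance(node, Variable):
        return node.value
    return node.fn(*(run(i) for i in node.inputs))

# Made-up data on the line y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [2.0 * x + 1.0 for x in xs]

# 1. Introduce variables and define transformations; nothing runs yet.
w, b = Variable(0.0), Variable(0.0)
errors = [w * constant(x) + b - constant(y) for x, y in zip(xs, ys)]
grad_w = mean([e * constant(x) for e, x in zip(errors, xs)])
grad_b = mean(errors)

# 2. With the graph set up, run it repeatedly (plain gradient descent).
learning_rate = 0.05
for _ in range(2000):
    gw, gb = run(grad_w), run(grad_b)  # evaluate both before updating
    w.value -= learning_rate * gw
    b.value -= learning_rate * gb
```

The graph is built once and evaluated many times; only the values held by `w` and `b` change between runs.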
595+
596+
597+
*** =right
598+
![](assets/img/tf_regression_code.png)
599+
![](assets/img/tf_regression_fit.png)
600+
601+
--- &twocol
602+
603+
## TensorFlow Mandelbrot
604+
605+
*** =left
606+
607+
All sorts of computations on regular arrays can be performed.
608+
609+
Some computations can be split across GPUs, or (eventually) even nodes.
610+
611+
All are multi-threaded.
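
As a rough illustration of the kind of regular-array computation meant here, this pure-Python sketch runs the Mandelbrot escape-time iteration with one whole-grid update per step. It does not use TensorFlow; the function name `mandelbrot_counts` and the grid bounds and sizes are assumptions chosen for illustration.

```python
def mandelbrot_counts(width=31, height=21, max_iter=50):
    """Count, per grid point, how many z <- z*z + c updates run while |z| <= 2."""
    # Regular grid of complex points c covering [-2, 1] x [-1, 1].
    cs = [[complex(-2 + 3 * i / (width - 1), -1 + 2 * j / (height - 1))
           for i in range(width)] for j in range(height)]
    zs = [[0j] * width for _ in range(height)]
    counts = [[0] * width for _ in range(height)]
    for _ in range(max_iter):
        # One sweep over the whole grid per iteration; every point is
        # independent, which is what makes this easy to parallelise.
        for j in range(height):
            for i in range(width):
                z = zs[j][i]
                if abs(z) <= 2:
                    zs[j][i] = z * z + cs[j][i]
                    counts[j][i] += 1
    return counts

counts = mandelbrot_counts()
```

Points that reach `max_iter` are treated as inside the set; low counts mark points that escape quickly.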
612+
613+
*** =right
614+
![](assets/img/tf_mandelbrot.png)
615+
616+
--- &twocol
617+
618+
## TensorFlow Wave Equation
619+
620+
*** =left
621+
622+
All sorts of computations on regular arrays can be performed.
623+
624+
Some computations can be split across GPUs, or (eventually) even nodes.
625+
626+
All are multi-threaded.
627+
628+
*** =right
629+
![](assets/img/tf_wave_eqn.png)
630+
631+
519632
---
520633

521-
## An intro to TensorFlow
634+
## TensorFlow: Caveats
635+
636+
* Tensors-only means limited support for, e.g., unstructured meshes or hash tables (bioinformatics)
637+
* Distribution of work remains limited and manual (but is expected to improve - Google uses this)
638+
639+
## TensorFlow: Pros
640+
641+
* C++ - interfacing is much simpler than
642+
* Fast
643+
* GPU, CPU support, not unreasonable to expect Phi support shortly
644+
* Great for data processing, image processing, or computations on n-d arrays
522645

523646
--- .segue .dark
524647

525648
## Conclusions
526649

527650
---
528651

529-
foo bar baz
652+
## Building an Execution Plan for a Better Tomorrow
653+
654+
All of the approaches we've seen implicitly or explicitly constructed dataflow graphs to describe where data needs to move.
655+
656+
Optimizations can then be built on top of that graph to improve data flow and movement; those optimizations often leave room for improvement.
657+
658+
These approaches are extremely promising, and already completely usable at scale for some sorts of tasks.
659+
660+
None will replace MPI yet, but any of them can make some work much more productive and reduce time-to-science.
