*** =right

--- &twocol

## Spark Graphs - GraphX

*** =left

Maddeningly, graph algorithms are not yet fully available from Python; in particular, Pregel is missing.

We can try to mock up communication along the edges of an unstructured mesh, but it is unbelievably slow.

Still, it gives a hint of what's possible.
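The following sketch shows what mocking up communication along edges looks like. It is written in plain Python rather than PySpark so it runs anywhere, and the mesh, the averaging update and the name `superstep` are hypothetical. Each superstep sends every vertex's value along its edges, combines the messages per destination vertex, and then updates. In PySpark these steps would map roughly onto `flatMap` over an edge RDD, `reduceByKey`, and a `join` with the vertex RDD. Each superstep therefore likely costs at least one shuffle.

```python
from collections import defaultdict

# Hypothetical unstructured mesh: undirected edges stored in both directions
edges = [(0, 1), (1, 0), (1, 2), (2, 1), (2, 3), (3, 2), (0, 2), (2, 0)]
values = {0: 1.0, 1: 0.0, 2: 0.0, 3: 0.0}


def superstep(values, edges):
    """One BSP-style superstep: send along edges, combine per vertex, update."""
    # 1. "flatMap": every edge (src, dst) carries src's current value to dst
    messages = [(dst, values[src]) for src, dst in edges]
    # 2. "reduceByKey": sum and count the incoming messages per destination
    sums, counts = defaultdict(float), defaultdict(int)
    for dst, msg in messages:
        sums[dst] += msg
        counts[dst] += 1
    # 3. "join" with the old values: new value = mean of self and neighbours
    return {v: (values[v] + sums[v]) / (1 + counts[v]) for v in values}


for _ in range(50):
    values = superstep(values, edges)
print(values)
```

After enough supersteps every vertex converges to the same weighted average of the initial values. This makes a handy correctness check for a distributed version.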
*** =right

Notebook: Spark 3 - Unstructured Mesh

---

## Spark's Capabilities

For data analysis, Spark is already there: it is like parallel R without the headaches, with a growing collection of packages.

Lots of typical statistics and machine learning.

For traditional high performance computing, it seems a little funny so far: ScaLAPACK-style distributed block matrices are there, with things like PCA, but not linear solves!

Graph support will enable a lot of really interesting applications (Spark 2.x, this year?)

Very easy to set up a local Spark install on your laptop.

---

## Spark Cons

Being JVM-based (Scala) means C/Python interoperability is always fraught.

Not much support for high-performance interconnects (although that's coming from third parties, e.g. the [HiBD group at OSU](http://hibd.cse.ohio-state.edu))

Very little explicit support for multicore yet, which leaves some performance on the table.

--- .segue .dark

## TensorFlow: http://tensorflow.org

--- &twocol

## A Quick Intro to TensorFlow

*** =left

TensorFlow is an open-source library for numerical computation using dataflow graphs, where the data is always in the form of tensors (n-d arrays).

It comes from Google, which uses it for machine learning.

Heavy number crunching: it can use GPUs or CPUs, and will distribute the tasks of a complex workflow across resources.

(The current version has only initial support for distributed execution; de-Googling the distributed part is taking longer than anticipated.)
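The following toy sketch makes "dataflow graph" concrete. It is written in plain Python/NumPy and is *not* TensorFlow's API; the names `Node`, `placeholder`, `constant` and `run` are hypothetical. Building the graph only records operations. Nothing is computed until `run` is handed values for the inputs. TensorFlow's runtime plays the role of `run` here, but it also decides where each operation executes.

```python
import numpy as np


class Node:
    """A node in a toy dataflow graph: an operation plus its input nodes."""

    def __init__(self, op, inputs=(), name=None):
        self.op, self.inputs, self.name = op, inputs, name

    # Operators build new graph nodes instead of computing values
    def __add__(self, other):
        return Node(np.add, (self, other))

    def __matmul__(self, other):
        return Node(np.matmul, (self, other))


def placeholder(name):
    """Input node whose value is only supplied when the graph is run."""
    return Node(None, name=name)


def constant(value):
    """Node that produces a fixed array."""
    return Node(lambda: np.asarray(value))


def run(node, feed, cache=None):
    """Evaluate `node` by recursively evaluating its inputs, each at most once."""
    cache = {} if cache is None else cache
    if node not in cache:
        if node.op is None:  # placeholder: take the fed value
            cache[node] = np.asarray(feed[node.name])
        else:
            args = [run(i, feed, cache) for i in node.inputs]
            cache[node] = node.op(*args)
    return cache[node]


# Build y = W @ x + b: this only records the graph, nothing is computed yet
x = placeholder("x")
W = constant([[1.0, 2.0], [3.0, 4.0]])
b = constant([0.5, -0.5])
y = W @ x + b

# Now execute it, feeding a NumPy-compatible value for the placeholder
print(run(y, {"x": [1.0, 1.0]}))
```

The same graph can be rerun with different feeds without being rebuilt. That separation of "describe" from "execute" is what lets a runtime optimize and distribute the work.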
*** =right

http://www.tensorflow.org

--- &twocol

## TensorFlow Graphs

*** =left

As an example of how a computation is set up, here is a linear regression.

Linear regression is already built in, and doesn't need to be iterative, but this example is quite general and shows how it works.

Variables are explicitly introduced to the TensorFlow runtime, and a series of transformations on the variables is defined.

When the entire flowgraph is set up, the system can be run.

The integration of TensorFlow tensors and NumPy arrays is very nice.
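The following NumPy sketch follows the same structure. It assumes synthetic data from y = 3x + 2 plus noise and is not the notebook's TensorFlow code. Parameters are declared first, then the prediction, loss and gradient are defined in terms of them, and only then is the loop run. TensorFlow would derive the gradient from the graph automatically; here it is written out by hand.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data from y = 3x + 2 plus noise (assumed for illustration)
x_data = rng.uniform(-1.0, 1.0, size=200)
y_data = 3.0 * x_data + 2.0 + rng.normal(scale=0.1, size=200)

# "Variables": the parameters the training loop will update
w, b = 0.0, 0.0
learning_rate = 0.5


# "Transformations": prediction, mean-squared-error loss, and its gradient
def predict(w, b, x):
    return w * x + b


def loss(w, b):
    return np.mean((predict(w, b, x_data) - y_data) ** 2)


def gradients(w, b):
    residual = predict(w, b, x_data) - y_data
    return 2.0 * np.mean(residual * x_data), 2.0 * np.mean(residual)


# "Run": repeated gradient-descent steps
for step in range(500):
    dw, db = gradients(w, b)
    w -= learning_rate * dw
    b -= learning_rate * db

print(f"w = {w:.3f}, b = {b:.3f}, loss = {loss(w, b):.4f}")
```

Because least squares has a closed form, `np.polyfit(x_data, y_data, 1)` gives the same answer directly. This is why linear regression doesn't actually need to be iterative.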
*** =right


--- &twocol

## TensorFlow Mandelbrot

*** =left

All sorts of computations on regular arrays can be performed.

Some computations can be split across GPUs, or (eventually) even nodes.

All are multi-threaded.
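Mandelbrot is a natural fit for this kind of computation, because every pixel runs the same elementwise update z → z² + c independently. The following NumPy sketch uses that array-at-a-time formulation. The grid extents, resolution, escape radius and iteration count are arbitrary choices, not taken from the notebook.

```python
import numpy as np


def mandelbrot_counts(xmin=-2.0, xmax=1.0, ymin=-1.3, ymax=1.3,
                      width=300, height=260, max_iter=200):
    """Per grid point, count the iterations of z -> z**2 + c that stay bounded."""
    # One complex constant c per pixel; a complex mgrid step means "this many points"
    y, x = np.mgrid[ymin:ymax:height * 1j, xmin:xmax:width * 1j]
    c = x + 1j * y
    z = np.zeros_like(c)
    counts = np.zeros(c.shape, dtype=np.int32)
    for _ in range(max_iter):
        z = z * z + c             # the same update applied to the whole array
        bounded = np.abs(z) < 4.0
        counts += bounded         # still-bounded pixels gain one iteration
        z[~bounded] = 4.0         # pin escaped points so they stay escaped without overflow
    return counts


counts = mandelbrot_counts()
```

`counts` can be displayed with, for example, matplotlib's `imshow`. Points inside the set reach `max_iter`.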
*** =right

--- &twocol

## TensorFlow Wave Equation

*** =left

All sorts of computations on regular arrays can be performed.

Some computations can be split across GPUs, or (eventually) even nodes.

All are multi-threaded.
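The wave equation is another stencil-on-a-grid computation: each time step applies a discrete Laplacian to the whole array. The following minimal NumPy sketch uses leapfrog time stepping with zero (fixed) boundaries. The grid size, Courant number and initial Gaussian bump are assumptions, not the notebook's setup.

```python
import numpy as np


def laplacian(u):
    """5-point discrete Laplacian (unit grid spacing) on interior points; edges stay 0."""
    lap = np.zeros_like(u)
    lap[1:-1, 1:-1] = (u[:-2, 1:-1] + u[2:, 1:-1] +
                       u[1:-1, :-2] + u[1:-1, 2:] - 4.0 * u[1:-1, 1:-1])
    return lap


def simulate_wave(n=101, steps=200, courant=0.5):
    """Leapfrog stepping of u_tt = c^2 (u_xx + u_yy) with u = 0 on the boundary."""
    y, x = np.mgrid[0:n, 0:n]
    centre = (n - 1) / 2
    u = np.exp(-((x - centre) ** 2 + (y - centre) ** 2) / 20.0)  # initial bump
    u[0, :] = u[-1, :] = u[:, 0] = u[:, -1] = 0.0
    u_prev = u.copy()            # zero initial velocity (simple first-order start)
    r2 = courant ** 2            # (c * dt / dx) ** 2; 2D stability needs courant <= 1/sqrt(2)
    for _ in range(steps):
        u_next = 2.0 * u - u_prev + r2 * laplacian(u)
        u_prev, u = u, u_next
    return u


u = simulate_wave()
```

With `courant` above 1/√2 this 2D scheme becomes unstable, which is a quick way to see the CFL limit in action.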
*** =right

---

## TensorFlow: Caveats

* Tensors-only means limited support for, e.g., unstructured meshes and hash tables (bioinformatics)
* Distribution of work remains limited and manual (but is expected to improve; Google uses this)

## TensorFlow: Pros

* C++ core, so interfacing is much simpler than with JVM-based Spark
* Fast
* GPU and CPU support; not unreasonable to expect Xeon Phi support shortly
* Great for data processing, image processing, or computations on n-d arrays

--- .segue .dark

## Conclusions

---

## Building an Execution Plan for a Better Tomorrow

All of the approaches we've seen implicitly or explicitly construct dataflow graphs to describe where data needs to move.

Optimizations can then be built on top of that to improve data flow and movement; the optimization often still leaves room for improvement.

These approaches are extremely promising, and already completely usable at scale for some sorts of tasks.

None will replace MPI yet, but each has the opportunity to make some work much more productive and reduce time-to-science.