Merge pull request #7 from pedropark99/test-intro
Update README and restructure the intro of the book
Showing 36 changed files with 1,245 additions and 3,258 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -17,4 +17,6 @@ Chapters/metastore_db | |
Chapters/*.html | ||
Chapters/*/* | ||
|
||
Scripts/__pycache__/ | ||
Scripts/__pycache__/ | ||
|
||
index_files/ |
This file was deleted.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1 +1,39 @@ | ||
# Introd-pyspark | ||
# Introd-pyspark | ||
|
||
<a href="https://pedro-faria.netlify.app/publications/book/introd-pyspark/en/"><img src="Cover/cover1.png" width="250" height="366" class="cover" align="right"/></a> An open and introductory book for the Python API of Apache Spark. The book "Introduction to pyspark" provides a quick introduction to the `pyspark` Python package, which is the Python API of Apache Spark. | ||
|
||
|
||
|
||
With `pyspark` you can use the Python language to write Spark applications and run them on a Spark cluster in a scalable and elegant way. This book focuses on teaching the fundamentals of `pyspark` and how to use it for big data analysis. | ||
|
||
Some of the main subjects discussed in the book are: | ||
|
||
- How does an Apache Spark application work? | |
- What are Spark DataFrames? | ||
- How to transform and model your Spark DataFrame. | ||
- How to import data into Apache Spark. | ||
- How to work with SQL inside pyspark. | ||
- Tools for manipulating specific data types (e.g. strings, dates and datetimes). | ||
- How to use window functions. | ||
|
||
|
||
## About the author | ||
|
||
Pedro Duarte Faria has a bachelor's degree in Economics from the Federal University of Ouro Preto, Brazil. Currently, he is a Data Engineer at Blip, and an Associate Developer for Apache Spark 3.0 certified by Databricks. | ||
|
||
The author has more than 3 years of experience in the data analysis market. He has developed data pipelines, reports and analyses for research institutions and some of the largest companies in the Brazilian financial sector, such as BMG Bank, Sodexo and Pan Bank, and has worked with databases that exceed a billion rows. | ||
|
||
Furthermore, Pedro specializes in the R programming language and has given several lectures and courses about it at graduate centers (such as PPEA-UFOP) and at federal and state organizations (such as FJP-MG). As a researcher, he has experience in the field of Science, Technology and Innovation Economics. | ||
|
||
Personal Website: <https://pedro-faria.netlify.app/> | ||
|
||
Twitter: [@PedroPark9](https://twitter.com/PedroPark9) | ||
|
||
Mastodon: [@pedropark99@fosstodon.org](https://fosstodon.org/@pedropark99) | ||
|
||
|
||
## License | ||
|
||
Copyright © 2024 Pedro Duarte Faria. This book is licensed under the Creative Commons Attribution 4.0 International Public License (CC BY 4.0). | ||
|
||
<a rel="license" href="http://creativecommons.org/licenses/by/4.0/"><img alt="Creative Commons License" style="border-width:0" src="https://i.creativecommons.org/l/by/4.0/88x31.png" /></a> |