
Commit 95955ef

[Edit] Python: matplotlib - .scatter() (#6646)
* [Edit] Python (Pillow): pillow
* Update pillow.md
* [Edit] Python: matplotlib - .scatter()
* Add files via upload
* Update content/matplotlib/concepts/pyplot/terms/scatter/scatter.md
* Update content/matplotlib/concepts/pyplot/terms/scatter/scatter.md
* Update content/matplotlib/concepts/pyplot/terms/scatter/scatter.md

---------
1 parent 9c72fac commit 95955ef


4 files changed

+163
-30
lines changed

@@ -1,70 +1,203 @@
11
---
22
Title: '.scatter()'
3-
Description: 'Creates a scatter plot of x vs. y values.'
3+
Description: 'Creates scatter plots to visualize relationships between variables.'
44
Subjects:
55
- 'Data Science'
66
- 'Data Visualization'
77
Tags:
8-
- 'Graphs'
9-
- 'Libraries'
8+
- 'Charts'
109
- 'Matplotlib'
1110
CatalogContent:
1211
- 'learn-python-3'
1312
- 'paths/data-science'
1413
---
1514

16-
The **`.scatter()`** method in the matplotlib library is used to draw a scatter plot, showing a relationship between variables.
15+
The **`.scatter()`** method in Matplotlib creates scatter plots to visualize relationships between numerical variables. Scatter plots display the values of two variables as points on a Cartesian coordinate system, helping to identify correlations, patterns, and outliers in the data. This makes them a common tool in data analysis for exploring how one variable tends to change as another changes.
16+
17+
Scatter plots are widely used in statistics, scientific research, and data science to examine the relationship between paired data. They're particularly useful for detecting trends, clusters, and anomalies that might not be apparent in tabular data.
1718

1819
## Syntax
1920

2021
```pseudo
21-
matplotlib.pyplot.scatter(x, y, s, c, marker, cmap, norm, vmin, vmax, alpha, linewidths, edgecolors, plotnonfinite)
22+
matplotlib.pyplot.scatter(x, y, s=None, c=None, marker=None, cmap=None, norm=None, vmin=None, vmax=None, alpha=None, linewidths=None, edgecolors=None, plotnonfinite=False, data=None, **kwargs)
2223
```
2324

24-
Both 'x' and 'y' parameters are required, and represent float or array-like objects. Other parameters are optional and modify plot features like marker size and/or color.
25+
**Parameters:**
26+
27+
- `x, y`: Arrays or list-like objects representing the data point coordinates
28+
- `s`: Marker size, given as an area in points squared (default: `None`, which is interpreted as `rcParams['lines.markersize'] ** 2`)
29+
- `c`: Marker color; can be a single color, a sequence of colors, or a sequence of numbers that are mapped to colors using `cmap` and `norm` (default: `None`)
30+
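- `marker`: Marker style, given as a shorthand code such as `'o'` (circle), `'s'` (square), or `'^'` (triangle), or as a `MarkerStyle` instance (default: `None`, which uses `rcParams['scatter.marker']`, `'o'` by default)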
31+
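- `cmap`: `Colormap` name or instance used to map numeric `c` values to colors; ignored when `c` already specifies colors (default: `None`, which uses `rcParams['image.cmap']`, `'viridis'` by default)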
32+
- `norm`: `Normalize` instance that scales numeric `c` values into the 0 to 1 range before the colormap is applied (default: `None`, which means linear scaling)
33+
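- `vmin`, `vmax`: Data values mapped to the bottom and top of the colormap when `c` is numeric; they should not be combined with a `norm` instance (default: `None`, meaning the minimum and maximum of `c`)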
34+
- `alpha`: Marker transparency as a float, where 0 is fully transparent and 1 is fully opaque (default: `None`)
35+
- `linewidths`: Width of marker borders (default: `None`)
36+
- `edgecolors`: Colors of marker borders (default: `None`, which uses `rcParams['scatter.edgecolors']`, `'face'` by default, so edges match the marker fill)
37+
- `plotnonfinite`: Whether to draw points whose `c` value is non-finite (`inf`, `-inf`, or `nan`); when `True`, those points are drawn with the colormap's "bad" color (default: `False`)
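The list above describes how `c`, `cmap`, `vmin`, and `vmax` interact when `c` holds numbers. The following minimal sketch, using made-up data and a hypothetical `temperature` variable, narrows the color range with `vmin` and `vmax` and also sets `marker`, `s`, `edgecolors`, and `linewidths`:

```py
import matplotlib.pyplot as plt
import numpy as np

# Made-up data: positions plus a hypothetical numeric "temperature" for each point
x = np.arange(5)
y = np.array([3, 1, 4, 1, 5])
temperature = np.array([0.0, 2.5, 5.0, 7.5, 10.0])

plt.figure(figsize=(6, 4))

# Numeric c values are normalized between vmin and vmax, then looked up in cmap
sc = plt.scatter(x, y,
                 c=temperature,
                 cmap='plasma',
                 vmin=2.5,
                 vmax=7.5,
                 marker='s',          # Square markers
                 s=80,                # Marker area in points squared
                 edgecolors='black',  # Black marker borders
                 linewidths=1)        # Border width in points

plt.colorbar(sc, label='Temperature (hypothetical)')
plt.show()
```

Values below `vmin` or above `vmax` are drawn with the lowest or highest colormap color by default, so `vmin` and `vmax` are a quick way to focus the color scale on a range of interest.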
2538

26-
`.scatter()` takes the following arguments:
39+
**Return value:**
2740

28-
- `x` and `y`: Positional arguments of type float or array.
29-
- `s`: A float or an array (of size equal to x or y) specifying marker size.
30-
- `c`: An array or list specifying marker color.
31-
- `marker`: Sets the marker style, specified with a shorthand code (e.g. ".": point, "o": circle) or an instance of the class.
32-
- `cmap`: A Colormap instance used to map scalar data to colors. (Default: "viridis")
33-
- `norm`: Normalization method used to scale scalar data to a range of (0 to 1) before mapping. Linear scaling is default.
34-
- `vmin` and `vmax`: Sets the data range for the colormap (if norm is not specified).
35-
- `alpha`: Sets the transparency value of the markers - range between 0 (transparent) and 1 (opaque).
36-
- `linewidths`: Sets the linewidth of the marker edge.
37-
- `edgecolors`: Sets the edge color of the marker.
38-
- `plotnonfinite`: Boolean value determining whether to plot nonfinite (`inf`, `-inf`, `nan`) values. Default is `False`.
41+
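The method returns a `matplotlib.collections.PathCollection` object containing all of the plotted markers. Storing it makes it possible to pass it to `plt.colorbar()` or to change the markers after they are drawn.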
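Because `.scatter()` returns a `PathCollection`, the plotted markers can be inspected and changed after the call. A minimal sketch with illustrative values:

```py
import matplotlib.pyplot as plt
from matplotlib.collections import PathCollection

plt.figure(figsize=(6, 4))

# Keep a reference to the object returned by .scatter()
points = plt.scatter([1, 2, 3], [4, 5, 6], s=50)

print(isinstance(points, PathCollection))  # True
print(points.get_offsets())  # One (x, y) row per marker

# Update the existing markers without calling .scatter() again
points.set_sizes([20, 80, 200])  # A different area for each marker
points.set_facecolor('orange')

plt.show()
```

Methods such as `set_sizes()` and `set_facecolor()` modify the existing collection in place, which avoids re-plotting when only the markers' appearance changes.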
3942

40-
## Examples
43+
## Example 1: Creating a Basic Scatter Plot
4144

42-
Examples below demonstrate the use of `.scatter()` to plot values and vary marker properties.
45+
This example demonstrates how to create a basic scatter plot with Matplotlib, visualizing the relationship between two variables:
4346

4447
```py
4548
import matplotlib.pyplot as plt
49+
import numpy as np
50+
51+
# Generate random data for demonstration
52+
np.random.seed(42) # For reproducibility
53+
x = np.random.rand(50) * 10 # 50 random values between 0 and 10
54+
y = 2 * x + 1 + np.random.randn(50) # Linear relationship with some noise
55+
56+
# Create a scatter plot
57+
plt.figure(figsize=(8, 6)) # Set figure size
58+
plt.scatter(x, y) # Create the scatter plot
59+
60+
# Add labels and title
61+
plt.xlabel('X-axis')
62+
plt.ylabel('Y-axis')
63+
plt.title('Basic Scatter Plot Example')
64+
65+
# Add a grid for better readability
66+
plt.grid(True, linestyle='--', alpha=0.7)
67+
68+
# Display the plot
69+
plt.show()
70+
```
71+
72+
This code creates a scatter plot showing the relationship between randomly generated `x` and `y` values, where `y` has a linear relationship with `x` plus some random noise. The plot displays 50 data points, each represented by a circle marker.
4673

47-
x1 = [5, 13, 21, 28, 31, 34, 39, 44, 49]
48-
y1 = [14, 28, 44, 56, 67, 53, 47, 30, 11]
74+
![Scatter plot showing a linear relationship with noise between two variables, with 50 blue circular markers scattered around a diagonal trend, grid lines, and labeled axes](https://raw.githubusercontent.com/Codecademy/docs/main/media/matplot-scatter-output-1.png)
4975

50-
plt.scatter(x1, y1)
76+
## Example 2: Customizing Scatter Plots with Size, Color, and Transparency
77+
78+
This example shows how to customize scatter plots by varying marker size, color, and transparency based on additional data dimensions:
79+
80+
```py
81+
import matplotlib.pyplot as plt
82+
import numpy as np
83+
84+
# Generate sample data
85+
np.random.seed(0)
86+
x = np.random.rand(100) * 10
87+
y = np.random.rand(100) * 10
88+
sizes = np.random.rand(100) * 500 # Varying marker sizes
89+
colors = np.random.rand(100) # Values for colormapping
90+
91+
# Create a scatter plot with customized appearance
92+
plt.figure(figsize=(10, 8))
93+
scatter = plt.scatter(x, y,
94+
s=sizes, # Set marker sizes
95+
c=colors, # Set colors
96+
cmap='viridis', # Choose colormap
97+
alpha=0.6, # Set transparency
98+
edgecolors='black', # Add black edges to markers
99+
linewidths=0.5) # Set edge width
100+
101+
# Add labels, title, and grid
102+
plt.xlabel('Feature 1')
103+
plt.ylabel('Feature 2')
104+
plt.title('Scatter Plot with Size and Color Variation')
105+
plt.grid(True, linestyle='--', alpha=0.3)
106+
107+
# Add a colorbar to show the mapping of colors
108+
plt.colorbar(scatter, label='Color Value')
109+
110+
plt.tight_layout()
51111
plt.show()
52112
```
53113

54-
Output:
114+
This example creates a more advanced scatter plot where:
115+
116+
- The size of each marker varies based on the `sizes` array
117+
- The color of each marker is determined by the `colors` array and the `'viridis'` colormap
118+
- Markers have partial transparency (`alpha=0.6`) and thin black edges
119+
- A colorbar is added to explain what the colors represent
55120

56-
![Output of matplotlib.pyplot.scatter() function example](https://raw.githubusercontent.com/Codecademy/docs/main/media/matplotlib-scatter-1.png)
121+
![Colorful scatter plot of 100 points with varying marker sizes and a Viridis colormap, semi-transparent circles with black edges, colorbar on the side, and labeled axes.](https://raw.githubusercontent.com/Codecademy/docs/main/media/matplot-scatter-output-2.png)
122+
123+
## Example 3: Using Scatter Plots for Real-World Data Analysis
124+
125+
This example applies scatter plots to a typical real-world analysis task: comparing the relationship between height and weight in two groups. The values below are a small hand-made sample rather than a real dataset:
57126

58127
```py
59128
import matplotlib.pyplot as plt
129+
import numpy as np
130+
131+
# Sample height (cm) and weight (kg) data for two groups
132+
# Group 1 (e.g., males)
133+
heights_1 = np.array([170, 175, 180, 165, 160, 185, 190, 175, 180, 185])
134+
weights_1 = np.array([68, 72, 78, 65, 60, 85, 90, 75, 77, 85])
135+
136+
# Group 2 (e.g., females)
137+
heights_2 = np.array([160, 165, 170, 155, 150, 160, 165, 155, 170, 160])
138+
weights_2 = np.array([55, 58, 62, 53, 50, 58, 62, 51, 63, 56])
139+
140+
plt.figure(figsize=(10, 6))
60141

61-
x2 = [11, 22, 33, 44, 55]
62-
y2 = [11, 22, 33, 44, 55]
142+
# Create scatter plots for both groups with different colors and labels
143+
plt.scatter(heights_1, weights_1, c='blue', label='Group 1', alpha=0.7, s=100)
144+
plt.scatter(heights_2, weights_2, c='red', label='Group 2', alpha=0.7, s=100)
145+
146+
# Calculate and plot trendlines (best fit lines)
147+
z1 = np.polyfit(heights_1, weights_1, 1)
148+
p1 = np.poly1d(z1)
149+
h1_sorted = np.sort(heights_1)  # Sort x-values so the dashed line is drawn left to right
plt.plot(h1_sorted, p1(h1_sorted), "b--", alpha=0.8)
150+
151+
z2 = np.polyfit(heights_2, weights_2, 1)
152+
p2 = np.poly1d(z2)
153+
h2_sorted = np.sort(heights_2)  # Sort x-values so the dashed line is drawn left to right
plt.plot(h2_sorted, p2(h2_sorted), "r--", alpha=0.8)
154+
155+
# Add labels, title, and legend
156+
plt.xlabel('Height (cm)')
157+
plt.ylabel('Weight (kg)')
158+
plt.title('Height vs. Weight Comparison Between Groups')
159+
plt.legend()
160+
161+
# Add grid and adjust layout
162+
plt.grid(True, linestyle='--', alpha=0.4)
163+
plt.tight_layout()
63164

64-
plt.scatter(x2, y2, s=150, c='#88c988', linewidth=3, marker='p' , edgecolor='#175E17', alpha=0.75)
65165
plt.show()
66166
```
67167

68-
Output:
168+
This example visualizes the relationship between height and weight for two different groups, possibly representing males and females. Key features include:
169+
170+
- Different colors to distinguish between the two groups
171+
- Semi-transparent markers for better visibility when points overlap
172+
- Least-squares trend lines (fitted with `np.polyfit()`) showing the linear relationship for each group
173+
- Appropriate labels, title, and legend to make the plot informative
174+
- A grid to help with reading values off the chart
175+
176+
![Scatter plot comparing height and weight for two groups, with blue and red markers representing each group, semi-transparent circles, trendlines in dashed style, and labeled axes and legend.](https://raw.githubusercontent.com/Codecademy/docs/main/media/matplot-scatter-output-3.png)
177+
178+
## Frequently Asked Questions
179+
180+
### 1. How do I create a scatter plot with different colors for different categories?
181+
182+
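To color points by category, either call `.scatter()` once per category with its own `color` and `label`, or convert the categories to integer codes and pass those codes to `c` together with a `cmap`. The `c` parameter expects colors or numbers, so arbitrary category labels such as `'A'` or `'B'` cannot be passed to it directly. Calling `.scatter()` once per category has the added benefit of giving each category its own entry in `plt.legend()`.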
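A minimal sketch of the one-call-per-category approach, assuming a small hypothetical dataset with a `category` array and a hand-picked `palette` dictionary:

```py
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical data: each point belongs to one of three categories
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([2.0, 3.5, 1.0, 4.0, 2.5, 5.0])
category = np.array(['A', 'B', 'A', 'C', 'B', 'C'])

# Manually assign one color per category
palette = {'A': 'tab:blue', 'B': 'tab:orange', 'C': 'tab:green'}

plt.figure(figsize=(6, 4))

# One .scatter() call per category, so each category gets its own legend entry
for name, color in palette.items():
    mask = category == name  # Boolean mask selecting this category's points
    plt.scatter(x[mask], y[mask], color=color, label=name)

plt.legend(title='Category')
plt.show()
```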
183+
184+
### 2. Can I adjust the size of the markers in a scatter plot?
185+
186+
Yes, you can adjust the marker size using the `s` parameter. This parameter accepts a single value for uniform size or an array of values for varying sizes. Note that the values represent the area of the marker in points squared.
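Because `s` is an area, the square root of `s` is comparable to the `markersize` used by `plt.plot()`, and doubling a marker's width takes four times the `s` value. A minimal sketch with illustrative widths:

```py
import matplotlib.pyplot as plt

# Desired marker widths, in points
diameters = [5, 10, 20]

# s expects an area, so square each width: [25, 100, 400]
areas = [d ** 2 for d in diameters]

plt.figure(figsize=(6, 2))
sc = plt.scatter([1, 2, 3], [1, 1, 1], s=areas)
plt.xlim(0, 4)
plt.show()
```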
187+
188+
### 3. How do I add a colorbar to my scatter plot?
189+
190+
To add a colorbar, store the scatter plot object that's returned when you call `plt.scatter()`, then pass this object to `plt.colorbar()`. For example:
191+
192+
```py
193+
scatter = plt.scatter(x, y, c=colors, cmap='viridis')
194+
plt.colorbar(scatter, label='Color Value')
195+
```
196+
197+
### 4. Can I create bubble charts with Matplotlib's scatter function?
198+
199+
Yes, a bubble chart is essentially a scatter plot where the marker size varies according to a third variable. Use the `s` parameter to set the marker sizes based on your third variable.
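A minimal bubble-chart sketch, assuming a hypothetical `population` variable that is rescaled so the largest bubble has an area of 1,000 points squared:

```py
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical data: positions plus a third variable to encode as bubble size
x = np.array([1, 2, 3, 4])
y = np.array([10, 20, 15, 25])
population = np.array([1_000, 5_000, 20_000, 50_000])

# Scale areas in proportion to population; the largest bubble gets 1,000 points squared
sizes = population / population.max() * 1000

plt.figure(figsize=(6, 4))
sc = plt.scatter(x, y, s=sizes, alpha=0.5, edgecolors='black')
plt.xlabel('X')
plt.ylabel('Y')
plt.title('Bubble Chart: Marker Area Encodes Population')
plt.xlim(0, 5)
plt.ylim(0, 30)
plt.show()
```

Because `s` sets the area, a value four times larger produces a bubble twice as wide: the `20_000` point is drawn twice as wide as the `5_000` point.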
200+
201+
### 5. How do I control transparency in scatter plots?
69202

70-
![Output of matplotlib.pyplot.scatter() function example 2](https://raw.githubusercontent.com/Codecademy/docs/main/media/matplotlib-scatter-2.png)
203+
Use the `alpha` parameter to control transparency. The value should be between 0 (completely transparent) and 1 (completely opaque). This is particularly useful when dealing with overlapping points.
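A minimal sketch with 2,000 randomly generated, heavily overlapping points, where a low `alpha` lets denser regions show up darker:

```py
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical dense data: 2,000 correlated points that overlap heavily
rng = np.random.default_rng(0)
x = rng.normal(size=2000)
y = x + rng.normal(scale=0.5, size=2000)

plt.figure(figsize=(6, 4))

# With a low alpha, areas where many markers overlap build up a darker color
sc = plt.scatter(x, y, s=10, alpha=0.2)
plt.show()
```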

media/matplot-scatter-output-1.png

33.9 KB

media/matplot-scatter-output-2.png

101 KB

media/matplot-scatter-output-3.png

56.3 KB
