|
1 | 1 | ---
|
2 | 2 | Title: '.scatter()'
|
3 |
| -Description: 'Creates a scatter plot of x vs. y values.' |
| 3 | +Description: 'Creates scatter plots to visualize relationships between variables.' |
4 | 4 | Subjects:
|
5 | 5 | - 'Data Science'
|
6 | 6 | - 'Data Visualization'
|
7 | 7 | Tags:
|
8 |
| - - 'Graphs' |
9 |
| - - 'Libraries' |
| 8 | + - 'Charts' |
10 | 9 | - 'Matplotlib'
|
11 | 10 | CatalogContent:
|
12 | 11 | - 'learn-python-3'
|
13 | 12 | - 'paths/data-science'
|
14 | 13 | ---
|
15 | 14 |
|
16 |
| -The **`.scatter()`** method in the matplotlib library is used to draw a scatter plot, showing a relationship between variables. |
| 15 | +The **`.scatter()`** method in Matplotlib creates scatter plots to visualize relationships between numerical variables. A scatter plot displays the values of two variables as points on a Cartesian coordinate system, which helps to identify correlations, patterns, and outliers in the data and to explore how changes in one variable relate to changes in another. |
| 16 | + |
| 17 | +Scatter plots are widely used in statistics, scientific research, and data science to examine the relationship between paired data. They're particularly useful for detecting trends, clusters, and anomalies that might not be apparent in tabular data. |
17 | 18 |
|
18 | 19 | ## Syntax
|
19 | 20 |
|
20 | 21 | ```pseudo
|
21 |
| -matplotlib.pyplot.scatter(x, y, s, c, marker, cmap, norm, vmin, vmax, alpha, linewidths, edgecolors, plotnonfinite) |
| 22 | +matplotlib.pyplot.scatter(x, y, s=None, c=None, marker=None, cmap=None, norm=None, vmin=None, vmax=None, alpha=None, linewidths=None, edgecolors=None, plotnonfinite=False, data=None, **kwargs) |
22 | 23 | ```
|
23 | 24 |
|
24 |
| -Both 'x' and 'y' parameters are required, and represent float or array-like objects. Other parameters are optional and modify plot features like marker size and/or color. |
| 25 | +**Parameters:** |
| 26 | + |
| 27 | +- `x`, `y`: Floats or array-like objects giving the data point coordinates (required) |
| 28 | +- `s`: Marker size in points^2, as a single value or an array with one value per point (default: `None`, which is interpreted as `rcParams['lines.markersize'] ** 2`) |
| 29 | +- `c`: Marker color; can be a single color, a sequence of colors, or a sequence of numbers to be mapped to colors using `cmap` and `norm` (default: `None`) |
| 30 | +- `marker`: Marker style (default: `None`, which is interpreted as `rcParams['scatter.marker']`, `'o'` unless changed) |
| 31 | +- `cmap`: `Colormap` name or `Colormap` instance used to map numeric values in `c` to colors (default: `None`, which is interpreted as `rcParams['image.cmap']`, `'viridis'` unless changed) |
| 32 | +- `norm`: Normalization used to scale numeric values in `c` to the `Colormap` range (default: `None`, which means linear scaling) |
| 33 | +- `vmin`, `vmax`: Minimum and maximum data values covered by the `Colormap` when `c` is numeric and no explicit `norm` is given |
| 34 | +- `alpha`: Float between 0 (transparent) and 1 (opaque) for the blending value (default: `None`) |
| 35 | +- `linewidths`: Width of marker borders (default: `None`) |
| 36 | +- `edgecolors`: Colors of marker borders (default: `None`, which is interpreted as `rcParams['scatter.edgecolors']`, `'face'` unless changed, so the edge matches the face color) |
| 37 | +- `plotnonfinite`: Boolean indicating whether to plot points with non-finite `c` (`inf`, `-inf`, or `nan`) (default: `False`) |
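
The interaction between `c`, `cmap`, `vmin`, and `vmax` can be sketched as follows. The data values are made up for illustration, and the `vmax` of 50 is an arbitrary choice to show clipping:

```py
import matplotlib.pyplot as plt

# Hypothetical numeric values to be mapped to colors
x = [0, 1, 2, 3, 4]
values = [0, 25, 50, 75, 100]

plt.figure()
# The colormap spans 0 to 50, so 75 and 100 get the same top color as 50
pts = plt.scatter(x, x, c=values, cmap='viridis', vmin=0, vmax=50)

# The color limits applied to the markers can be read back
limits = pts.get_clim()
```

Calling `plt.show()` afterwards displays the figure.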
25 | 38 |
|
26 |
| -`.scatter()` takes the following arguments: |
| 39 | +**Return value:** |
27 | 40 |
|
28 |
| -- `x` and `y`: Positional arguments of type float or array. |
29 |
| -- `s`: A float or an array (of size equal to x or y) specifying marker size. |
30 |
| -- `c`: An array or list specifying marker color. |
31 |
| -- `marker`: Sets the marker style, specified with a shorthand code (e.g. ".": point, "o": circle) or an instance of the class. |
32 |
| -- `cmap`: A Colormap instance used to map scalar data to colors. (Default: "viridis") |
33 |
| -- `norm`: Normalization method used to scale scalar data to a range of (0 to 1) before mapping. Linear scaling is default. |
34 |
| -- `vmin` and `vmax`: Sets the data range for the colormap (if norm is not specified). |
35 |
| -- `alpha`: Sets the transparency value of the markers - range between 0 (transparent) and 1 (opaque). |
36 |
| -- `linewidths`: Sets the linewidth of the marker edge. |
37 |
| -- `edgecolors`: Sets the edge color of the marker. |
38 |
| -- `plotnonfinite`: Boolean value determining whether to plot nonfinite (`inf`, `-inf`, `nan`) values. Default is `False`. |
| 41 | +The method returns a `PathCollection` object. |
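
As a minimal sketch of what can be done with the returned object (the coordinates are placeholder values), the `PathCollection` can be inspected and modified after the plot is created:

```py
import matplotlib.pyplot as plt
from matplotlib.collections import PathCollection

plt.figure()
points = plt.scatter([1, 2, 3], [4, 5, 6], s=40)

# The return value is a PathCollection holding all markers
is_collection = isinstance(points, PathCollection)

# get_offsets() returns the (x, y) position of every marker
offsets = points.get_offsets()

# Properties can still be changed after the call to scatter()
points.set_alpha(0.5)
```

Keeping a reference to this object is also what makes it possible to pass it to functions such as `plt.colorbar()`.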
39 | 42 |
|
40 |
| -## Examples |
| 43 | +## Example 1: Creating a Basic Scatter Plot |
41 | 44 |
|
42 |
| -Examples below demonstrate the use of `.scatter()` to plot values and vary marker properties. |
| 45 | +This example demonstrates how to create a basic scatter plot with Matplotlib, visualizing the relationship between two variables: |
43 | 46 |
|
44 | 47 | ```py
|
45 | 48 | import matplotlib.pyplot as plt
|
| 49 | +import numpy as np |
| 50 | + |
| 51 | +# Generate random data for demonstration |
| 52 | +np.random.seed(42) # For reproducibility |
| 53 | +x = np.random.rand(50) * 10 # 50 random values between 0 and 10 |
| 54 | +y = 2 * x + 1 + np.random.randn(50) # Linear relationship with some noise |
| 55 | + |
| 56 | +# Create a scatter plot |
| 57 | +plt.figure(figsize=(8, 6)) # Set figure size |
| 58 | +plt.scatter(x, y) # Create the scatter plot |
| 59 | + |
| 60 | +# Add labels and title |
| 61 | +plt.xlabel('X-axis') |
| 62 | +plt.ylabel('Y-axis') |
| 63 | +plt.title('Basic Scatter Plot Example') |
| 64 | + |
| 65 | +# Add a grid for better readability |
| 66 | +plt.grid(True, linestyle='--', alpha=0.7) |
| 67 | + |
| 68 | +# Display the plot |
| 69 | +plt.show() |
| 70 | +``` |
| 71 | + |
| 72 | +This code creates a scatter plot showing the relationship between randomly generated `x` and `y` values, where `y` has a linear relationship with `x` plus some random noise. The plot displays 50 data points, each represented by a circle marker. |
46 | 73 |
|
47 |
| -x1 = [5, 13, 21, 28, 31, 34, 39, 44, 49] |
48 |
| -y1 = [14, 28, 44, 56, 67, 53, 47, 30, 11] |
| 74 | + |
49 | 75 |
|
50 |
| -plt.scatter(x1, y1) |
| 76 | +## Example 2: Customizing Scatter Plots with Size, Color, and Transparency |
| 77 | + |
| 78 | +This example shows how to customize scatter plots by varying marker size and color based on additional data dimensions and by applying a uniform transparency: |
| 79 | + |
| 80 | +```py |
| 81 | +import matplotlib.pyplot as plt |
| 82 | +import numpy as np |
| 83 | + |
| 84 | +# Generate sample data |
| 85 | +np.random.seed(0) |
| 86 | +x = np.random.rand(100) * 10 |
| 87 | +y = np.random.rand(100) * 10 |
| 88 | +sizes = np.random.rand(100) * 500 # Varying marker sizes |
| 89 | +colors = np.random.rand(100) # Values for colormapping |
| 90 | + |
| 91 | +# Create a scatter plot with customized appearance |
| 92 | +plt.figure(figsize=(10, 8)) |
| 93 | +scatter = plt.scatter(x, y, |
| 94 | + s=sizes, # Set marker sizes |
| 95 | + c=colors, # Set colors |
| 96 | + cmap='viridis', # Choose colormap |
| 97 | + alpha=0.6, # Set transparency |
| 98 | + edgecolors='black', # Add black edges to markers |
| 99 | + linewidths=0.5) # Set edge width |
| 100 | + |
| 101 | +# Add labels, title, and grid |
| 102 | +plt.xlabel('Feature 1') |
| 103 | +plt.ylabel('Feature 2') |
| 104 | +plt.title('Scatter Plot with Size and Color Variation') |
| 105 | +plt.grid(True, linestyle='--', alpha=0.3) |
| 106 | + |
| 107 | +# Add a colorbar to show the mapping of colors |
| 108 | +plt.colorbar(scatter, label='Color Value') |
| 109 | + |
| 110 | +plt.tight_layout() |
51 | 111 | plt.show()
|
52 | 112 | ```
|
53 | 113 |
|
54 |
| -Output: |
| 114 | +This example creates a more advanced scatter plot where: |
| 115 | + |
| 116 | +- The size of each marker varies based on the `sizes` array |
| 117 | +- The color of each marker is determined by the `colors` array and the 'viridis' colormap |
| 118 | +- Markers have partial transparency (`alpha=0.6`) and thin black edges |
| 119 | +- A colorbar is added to explain what the colors represent |
55 | 120 |
|
56 |
| - |
| 121 | + |
| 122 | + |
| 123 | +## Example 3: Using Scatter Plots for Real-world Data Analysis |
| 124 | + |
| 125 | +This example demonstrates how scatter plots can support real-world data analysis, using illustrative height and weight values for two groups: |
57 | 126 |
|
58 | 127 | ```py
|
59 | 128 | import matplotlib.pyplot as plt
|
| 129 | +import numpy as np |
| 130 | + |
| 131 | +# Sample height (cm) and weight (kg) data for two groups |
| 132 | +# Group 1 (e.g., males) |
| 133 | +heights_1 = np.array([170, 175, 180, 165, 160, 185, 190, 175, 180, 185]) |
| 134 | +weights_1 = np.array([68, 72, 78, 65, 60, 85, 90, 75, 77, 85]) |
| 135 | + |
| 136 | +# Group 2 (e.g., females) |
| 137 | +heights_2 = np.array([160, 165, 170, 155, 150, 160, 165, 155, 170, 160]) |
| 138 | +weights_2 = np.array([55, 58, 62, 53, 50, 58, 62, 51, 63, 56]) |
| 139 | + |
| 140 | +plt.figure(figsize=(10, 6)) |
60 | 141 |
|
61 |
| -x2 = [11, 22, 33, 44, 55] |
62 |
| -y2 = [11, 22, 33, 44, 55] |
| 142 | +# Create scatter plots for both groups with different colors and labels |
| 143 | +plt.scatter(heights_1, weights_1, c='blue', label='Group 1', alpha=0.7, s=100) |
| 144 | +plt.scatter(heights_2, weights_2, c='red', label='Group 2', alpha=0.7, s=100) |
| 145 | + |
| 146 | +# Calculate and plot trendlines (best fit lines) |
| 147 | +z1 = np.polyfit(heights_1, weights_1, 1) |
| 148 | +p1 = np.poly1d(z1) |
| 149 | +plt.plot(heights_1, p1(heights_1), "b--", alpha=0.8) |
| 150 | + |
| 151 | +z2 = np.polyfit(heights_2, weights_2, 1) |
| 152 | +p2 = np.poly1d(z2) |
| 153 | +plt.plot(heights_2, p2(heights_2), "r--", alpha=0.8) |
| 154 | + |
| 155 | +# Add labels, title, and legend |
| 156 | +plt.xlabel('Height (cm)') |
| 157 | +plt.ylabel('Weight (kg)') |
| 158 | +plt.title('Height vs. Weight Comparison Between Groups') |
| 159 | +plt.legend() |
| 160 | + |
| 161 | +# Add grid and adjust layout |
| 162 | +plt.grid(True, linestyle='--', alpha=0.4) |
| 163 | +plt.tight_layout() |
63 | 164 |
|
64 |
| -plt.scatter(x2, y2, s=150, c='#88c988', linewidth=3, marker='p' , edgecolor='#175E17', alpha=0.75) |
65 | 165 | plt.show()
|
66 | 166 | ```
|
67 | 167 |
|
68 |
| -Output: |
| 168 | +This example visualizes the relationship between height and weight for two different groups, possibly representing males and females. Key features include: |
| 169 | + |
| 170 | +- Different colors to distinguish between the two groups |
| 171 | +- Semi-transparent markers for better visibility when points overlap |
| 172 | +- Trend lines showing the linear relationship for each group |
| 173 | +- Appropriate labels, title, and legend to make the plot informative |
| 174 | +- A grid to help with reading values off the chart |
| 175 | + |
| 176 | + |
| 177 | + |
| 178 | +## Frequently Asked Questions |
| 179 | + |
| 180 | +### 1. How do I create a scatter plot with different colors for different categories? |
| 181 | + |
| 182 | +To create a scatter plot with different colors for different categories, either call `.scatter()` once per category with its own color and `label`, or pass the `c` parameter a list containing one color per point. A `Colormap` set with `cmap` only applies when `c` holds numeric values, so categorical labels must first be converted to numeric codes or mapped to colors manually. |
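
One possible way to assign colors manually is sketched here. The data, the category labels, and the `color_map` dictionary are all hypothetical:

```py
import matplotlib.pyplot as plt

# Hypothetical sample data: each point carries a category label
x = [1, 2, 3, 4, 5, 6]
y = [2, 4, 1, 5, 3, 6]
labels = ['a', 'b', 'a', 'c', 'b', 'c']

# Manually assign one color per category
color_map = {'a': 'tab:blue', 'b': 'tab:orange', 'c': 'tab:green'}

plt.figure()
for category, color in color_map.items():
    # Select the points that belong to this category
    xs = [xi for xi, lab in zip(x, labels) if lab == category]
    ys = [yi for yi, lab in zip(y, labels) if lab == category]
    plt.scatter(xs, ys, c=color, label=category)

# One legend entry is created per scatter() call
legend = plt.legend(title='Category')
```

Plotting each category separately keeps the legend in sync with the colors, at the cost of one `.scatter()` call per category.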
| 183 | + |
| 184 | +### 2. Can I adjust the size of the markers in a scatter plot? |
| 185 | + |
| 186 | +Yes, you can adjust the marker size using the `s` parameter. This parameter accepts a single value for uniform size or an array of values for varying sizes. Note that the values represent the area of the marker in points squared. |
| 187 | + |
| 188 | +### 3. How do I add a colorbar to my scatter plot? |
| 189 | + |
| 190 | +To add a colorbar, store the scatter plot object that's returned when you call `plt.scatter()`, then pass this object to `plt.colorbar()`. For example: |
| 191 | + |
| 192 | +```py |
| 193 | +scatter = plt.scatter(x, y, c=colors, cmap='viridis') |
| 194 | +plt.colorbar(scatter, label='Color Value') |
| 195 | +``` |
| 196 | + |
| 197 | +### 4. Can I create bubble charts with Matplotlib's scatter function? |
| 198 | + |
| 199 | +Yes, a bubble chart is essentially a scatter plot where the marker size varies according to a third variable. Use the `s` parameter to set the marker sizes based on your third variable. |
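
A minimal bubble chart sketch follows. The `population` values are invented, and the scaling factor of 20 is an arbitrary choice to make the bubbles visible:

```py
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical data: the third variable controls bubble size
x = np.array([1, 2, 3, 4])
y = np.array([10, 20, 15, 30])
population = np.array([5, 20, 10, 40])

# s is an area in points squared, so the bubble area grows
# in proportion to the third variable
sizes = population * 20

plt.figure()
bubbles = plt.scatter(x, y, s=sizes, alpha=0.5, edgecolors='black')
plt.xlabel('x')
plt.ylabel('y')
```

Because `s` sets the marker area rather than its diameter, scaling the variable linearly keeps the visual area of each bubble proportional to the value it represents.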
| 200 | + |
| 201 | +### 5. How do I control transparency in scatter plots? |
69 | 202 |
|
70 |
| - |
| 203 | +Use the `alpha` parameter to control transparency. The value should be between 0 (completely transparent) and 1 (completely opaque). This is particularly useful when dealing with overlapping points. |