add swarm plot #5087

suterberg opened this issue Mar 14, 2025 · 8 comments
Labels: feature (something new), P3 (backlog)

Comments

@suterberg

I can't find a swarm plot in Plotly, but one can be built from a scatter plot like this:

import numpy as np
import pandas as pd
import plotly.express as px

np.random.seed(1)
y0 = np.random.randn(50) - 1

# Bin each value down to the nearest multiple of 0.1.
mm = pd.DataFrame({"org": y0})
mm["cut"] = np.floor(mm["org"] * 10) / 10

cc = pd.DataFrame(columns=["x", "y"])
width = 0.08  # spacing between stacked points in the same bin
for x, group in mm.groupby("cut"):
    ls = len(group)
    # Center the stack on y=0. For both odd and even counts the first
    # point sits (ls - 1) / 2 steps below zero.
    org = -(ls - 1) / 2
    for i in range(ls):
        cc.loc[len(cc)] = [x, (i + org) * width]

fig = px.scatter(cc, x="x", y="y", range_y=[-2, 2])
# Alternative: fig = px.strip(cc, x="x", y="y", range_y=[-2, 2])

fig.show()

This is an example of how to process and transform the data for a swarm plot.
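
The centering step in the loop above can be isolated as a small helper. This is only a sketch, not part of the original snippet: `centered_offsets` is a hypothetical name, and it assumes the even and odd branches are meant to do the same thing, namely start the stack `(count - 1) / 2` steps below zero.

```python
def centered_offsets(count, width=0.08):
    """Return `count` offsets spaced `width` apart and centered on zero."""
    start = -(count - 1) / 2  # same starting point for odd and even counts
    return [(i + start) * width for i in range(count)]


print(centered_offsets(3, width=1))  # [-1.0, 0.0, 1.0]
print(centered_offsets(4, width=1))  # [-1.5, -0.5, 0.5, 1.5]
```

With an odd count the middle point lands exactly on zero; with an even count the two middle points straddle it.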

@gvwilson changed the title from "Swarm plot" to "add swarm plot" on Mar 17, 2025
@gvwilson added the feature (something new) and P3 (backlog) labels on Mar 17, 2025
@gvwilson
Contributor

Thanks @suterberg. Have you posted this to our forums at https://community.plotly.com/ as well? You may get a faster response there.

@rl-utility-man
Contributor

rl-utility-man commented Apr 12, 2025

Would a version of this example -- with e.g. more comments, a static data set, and support for more than one category -- be welcome in the scatterplot documentation? Would that be an efficient way to largely address the concern?

@suterberg
Author

  1. Start with one-dimensional data.
  2. Bin the data with an interval of 0.1.
  3. Group the data by bin with groupby.
  4. Iterate over the groups and, based on the number of values in each group, add two-dimensional points to build the two-dimensional data cc.
  5. Draw the figure.
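
The steps above can also be sketched without the explicit loop. This is one possible vectorized reading of them, not the code from the issue: `swarm_points`, `bin_width`, and `spacing` are hypothetical names, and the plotting step is left as a comment so the block only needs NumPy and pandas.

```python
import numpy as np
import pandas as pd


def swarm_points(values, bin_width=0.1, spacing=0.08):
    """Return a DataFrame with one (x, y) row per input value."""
    # Step 1: one-dimensional input.
    points = pd.DataFrame({"value": np.asarray(values, dtype=float)})
    # Step 2: bin each value down to a multiple of bin_width.
    points["x"] = np.floor(points["value"] / bin_width) * bin_width
    # Step 3: group the values by bin.
    groups = points.groupby("x")["value"]
    # Step 4: spread the members of each bin symmetrically around y=0.
    rank = groups.cumcount()          # 0, 1, 2, ... within each bin
    size = groups.transform("size")   # number of values in the bin
    points["y"] = (rank - (size - 1) / 2) * spacing
    # Step 5 (not run here): px.scatter(points, x="x", y="y")
    return points


print(swarm_points([0.11, 0.12, 0.13, 0.25]))
```

The first three values share a bin and are stacked at three evenly spaced heights, while the last value sits alone at y=0.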

@rl-utility-man
Contributor

rl-utility-man commented Apr 13, 2025

@suterberg Cool! To get a sense of where this might go, you could look at a couple examples of documentation pull requests that @SimaRaha and I have submitted: #4994 is most of the way through the review process and is pretty routine. #4983 is a bit atypical, but has some deep similarities to the swarm plot. For example, both #4983 and the swarm plot example above harness scatter to build a new graph type. Both would add documentation rather than adding a new function to the library. The changed files tabs on the pull requests show you exactly what we're submitting -- it's a combination of code like the above and terse commentary -- and the conversation tab shows the documentation expectations checklist; there's also a contributing guide here:
https://github.com/plotly/plotly.py/blob/main/CONTRIBUTING.md
An easy way to start a documentation PR is to click "suggest an edit to this page" in the Plotly documentation. We should likely get a reaction from @LiamConnors to #4983 in the coming weeks that will be instructive about whether and how to pursue the swarm plot as a documentation pull request.

I hope that is helpful and apologize if some of that was obvious. Thank you for starting this constructive conversation!

@suterberg
Author

suterberg commented Apr 13, 2025

import numpy as np
import pandas as pd
import plotly.express as px


def swarm(
    data_frame=None,
    pitch_x=0.1,
    pitch_y=0.1,
    point_size=16,
    range_x=None,
    range_y=None,
):
    # Fall back to random demo data when no values are passed in.
    if data_frame is None:
        np.random.seed(1)
        data_frame = pd.DataFrame({"org": np.random.randn(50) - 1})
    else:
        data_frame = pd.DataFrame({"org": data_frame})
    if range_x is None:
        range_x = [
            np.floor((data_frame["org"].min() - pitch_x) / pitch_x) * pitch_x,
            np.floor((data_frame["org"].max() + pitch_x) / pitch_x) * pitch_x,
        ]

    # Bin each value down to the nearest multiple of pitch_x.
    data_frame["cut"] = np.floor(data_frame["org"] / pitch_x) * pitch_x

    draw_data = pd.DataFrame(columns=["x", "y", "value"])

    for x, group in data_frame.groupby("cut"):
        nums = len(group)
        # Offset of the first point so the stack is centered on y=0;
        # this covers both odd and even group sizes.
        org = -(nums - 1) / 2
        values = group["org"].sort_values().to_list()
        for i, val in enumerate(values):
            draw_data.loc[len(draw_data)] = [x, (i + org) * pitch_y, val]

    if range_y is None:
        range_y = [draw_data["y"].min() - pitch_y, draw_data["y"].max() + pitch_y]
    fig = px.scatter(
        draw_data,
        x="x",
        y="y",
        range_x=range_x,
        range_y=range_y,
        hover_data=["value"],  # show the original value on hover
    )
    fig.update_traces(marker_size=point_size)
    fig.show()


if __name__ == "__main__":
    swarm(pitch_x=0.1, range_y=[-0.6, 0.6])

This may be what you want, but I don't think it is very meaningful.

@rl-utility-man
Contributor

rl-utility-man commented Apr 16, 2025

I'm busy with some other things right now, but plan to either make code based on your examples into a documentation pull request or convince a collaborator to do so. I'll revise into something that makes sense to me and that is well commented, but will tag you on the PR -- so you should get an email about it -- and welcome your comments on it either in the pull request or here. If someone else beats me to it, that's terrific too. I plan to support multiple categories; I think I can pull much of what I need from #4983. Many thanks!
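
As a rough idea of what multi-category support could look like, the sketch below stacks each category around its own integer position. It is an illustration only: it is not taken from #4983 or from any planned pull request, and `swarm_by_category` and its parameters are hypothetical names.

```python
import numpy as np
import pandas as pd


def swarm_by_category(frame, value_col, category_col, bin_width=0.1, spacing=0.05):
    """Add a 'bin' and a 'position' column for a per-category swarm."""
    out = frame[[category_col, value_col]].copy()
    # Bin each value down to a multiple of bin_width.
    out["bin"] = np.floor(out[value_col] / bin_width) * bin_width
    # Give each category an integer center, in order of first appearance.
    out["center"] = pd.factorize(out[category_col])[0]
    # Spread the points that share a category and a bin around that center.
    groups = out.groupby([category_col, "bin"])[value_col]
    offset = (groups.cumcount() - (groups.transform("size") - 1) / 2) * spacing
    out["position"] = out["center"] + offset
    return out


demo = pd.DataFrame(
    {
        "species": ["a", "a", "b", "b", "b"],
        "length": [5.01, 5.02, 5.03, 5.04, 6.05],
    }
)
print(swarm_by_category(demo, "length", "species", spacing=0.1))
```

The result could then be passed to something like `px.scatter(result, x="bin", y="position", color="species")`. The spacing has to stay small enough that neighboring categories do not overlap, which this sketch does not check.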

@suterberg
Author

suterberg commented Apr 18, 2025

from decimal import ROUND_FLOOR, ROUND_HALF_DOWN, ROUND_HALF_UP, Decimal, getcontext
from typing import Literal

import numpy as np
import pandas as pd
import plotly.express as px

# Set the decimal precision
getcontext().prec = 10

# Map each round_rule option to a decimal rounding mode.
ROUNDING_MODES = {
    "floor": ROUND_FLOOR,
    "hf_up": ROUND_HALF_UP,
    "hf_down": ROUND_HALF_DOWN,
}


def swarm(
    data_frame=None,
    pitch_x=0.1,
    pitch_y=0.1,
    point_size=16,
    range_x=None,
    range_y=None,
    round_rule: Literal["floor", "hf_up", "hf_down"] = "hf_up",
):
    # Fall back to random demo data when no values are passed in.
    if data_frame is None:
        np.random.seed(1)
        data_frame = pd.DataFrame({"org": np.random.randn(50) - 1})
    else:
        data_frame = pd.DataFrame({"org": data_frame})

    if range_x is None:
        range_x = [
            np.floor((data_frame["org"].min() - 2 * pitch_x) / pitch_x) * pitch_x,
            np.floor((data_frame["org"].max() + 2 * pitch_x) / pitch_x) * pitch_x,
        ]

    # Snap each value to a multiple of pitch_x. Decimal arithmetic avoids
    # float artifacts at bin edges; unknown rules fall back to ROUND_HALF_UP.
    rule = ROUNDING_MODES.get(round_rule, ROUND_HALF_UP)
    pitch = Decimal(f"{pitch_x}")
    # Convert back to float so the x axis stays numeric in the figure.
    data_frame["cut"] = [
        float((Decimal(f"{val}") / pitch).quantize(Decimal("0"), rule) * pitch)
        for val in data_frame["org"]
    ]
    draw_data = pd.DataFrame(columns=["x", "y", "value"])

    for x, group in data_frame.groupby("cut"):
        nums = len(group)
        # Offset of the first point so the stack is centered on y=0;
        # this covers both odd and even group sizes.
        org = -(nums - 1) / 2
        values = group["org"].sort_values().to_list()
        for i, val in enumerate(values):
            draw_data.loc[len(draw_data)] = [x, (i + org) * pitch_y, val]

    if range_y is None:
        range_y = [draw_data["y"].min() - pitch_y, draw_data["y"].max() + pitch_y]
    fig = px.scatter(
        draw_data,
        x="x",
        y="y",
        range_x=range_x,
        range_y=range_y,
        hover_data=["value"],
    )
    fig.update_traces(
        marker_size=point_size,
        # customdata holds the hover_data columns; index 0 is "value".
        hovertemplate="<b>value</b>: %{customdata[0]}",
    )
    fig.show()


if __name__ == "__main__":
    from sklearn.datasets import load_iris

    data = load_iris()
    feature_names = [
        "Sepal length",
        "Sepal width",
        "Petal length",
        "Petal width",
    ]
    X_df = pd.DataFrame(data.data, columns=feature_names)
    swarm(X_df["Sepal length"], pitch_x=0.2, pitch_y=0.05, range_y=[-0.8, 0.8])
    swarm(X_df["Sepal length"], pitch_x=0.1, pitch_y=0.03, range_y=[-0.4, 0.4])
    swarm(pitch_x=0.1, range_y=[-0.6, 0.6])
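
To see how the three rounding rules differ, the binning step can be tried on its own. The helper below is a standalone sketch using only the standard-library `decimal` module; `snap_to_bin` is a hypothetical name and is not part of the code above.

```python
from decimal import ROUND_FLOOR, ROUND_HALF_DOWN, ROUND_HALF_UP, Decimal


def snap_to_bin(value, pitch, rounding):
    """Snap `value` to a multiple of `pitch` using a decimal rounding mode."""
    step = Decimal(str(pitch))
    # Number of whole steps, rounded according to the chosen mode.
    count = (Decimal(str(value)) / step).quantize(Decimal("0"), rounding=rounding)
    return count * step


# 5.25 lies exactly halfway between the 5.2 and 5.3 bins.
print(snap_to_bin(5.25, 0.1, ROUND_FLOOR))      # 5.2
print(snap_to_bin(5.25, 0.1, ROUND_HALF_UP))    # 5.3
print(snap_to_bin(5.25, 0.1, ROUND_HALF_DOWN))  # 5.2
```

The rules only disagree with each other at exact halfway points, but `ROUND_FLOOR` also differs from both half rules for values in the upper half of a bin: 5.27 goes to 5.2 under `ROUND_FLOOR` and to 5.3 under the other two.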
