Merge pull request #6697 from segmentio/lizkane222-patch-18

pwseg · web-flow · commit 561591faaff8 · 2024-06-24T20:09:08.000-05:00
Update Predictions Data Requirements
diff --git a/src/unify/Traits/predictions/index.md b/src/unify/Traits/predictions/index.md
@@ -54,16 +54,46 @@ When you build a Custom Predictive Goal, you'll first need to select a cohort, o
 
 The target event is the Segment event that you want to predict. In creating a prediction, Segment determines the likelihood of the user performing the target event. Segment lets you include up to two target events and an event property in your prediction.
 
-#### Selecting events (optional)
+### Access and data requirements
 
-Some customers want to specifically include or exclude events that get fed into the model. For example, if you track different events from an EU storefront compared to a US storefront and you only want to make predictions using data from the US, you could unselect the events from the EU space. This step is optional, Segment only recommends using it if you have a clear reason in mind for removing events from becoming a factor in the model.
+In machine learning, better data leads to better predictions. Because Segment prioritizes trust and performance, Segment has a number of data checks to ensure that each prediction is reliable and of high quality. Segment provides guidance in the UI before you create a trait, but some checks only occur during model training. If a trait fails, you’ll see an error message and description in the UI. 
+
+This section lists Segment's access and data requirements, service limits, and best practices for Predictions.
+
+#### Definitions
+
+- **Feature Window**: The past time period that contains the data used for model training.
+- **Target Window**: The time horizon for which you want to make the prediction. You can select this in the UI for each trait.
+- **Target Event**: The event whose likelihood you want to predict for each customer.
+
+For example, to predict a customer's propensity to purchase over the next 30 days, set the Target Window to 30 days and the Target Event to `Order Completed` (or the relevant purchase event that you track).
+
+#### Predictions access requirements
 
+To access Predictions, you must:
 
-#### Data requirements
+- Track more than 1 event type, but fewer than 5,000 event types. Segment counts event types as the distinct events seen across all users in an Engage Space within the past 15 days.
+  - If you currently track more than 5,000 distinct events, reduce the number of tracked events below this limit and wait around 15 days before creating your first prediction.
+  - Events become inactive if they've not been sent to an Engage Space within the past 15 days.
+- To prevent events from reaching your Engage Space, modify your event payloads to set `integrations.Personas` to `false`.
+  - For more information on using the integrations object, see [Spec: Common Fields](/docs/connections/spec/common/#context:~:text=In%20more%20detail%20these%20common%20fields,Destinations%20field%20docs%20for%20more%20details.), [Integrations](https://segment.com/docs/connections/spec/common/#context:~:text=Kotlin-,Integrations,be%20sent%20to%20rest%20of%20the%20destinations%20that%20can%20accept%20it.,-Timestamps), and [Filtering with the Integrations object](https://segment.com/docs/guides/filtering-data/#filtering-with-the-integrations-object).
+  - Analytics.js example: `analytics.track("Button Clicked", {button:"submit form"}, {"integrations":{"Personas":false}})`
 
-Segment doesn't enforce data requirements for predictions. In machine learning, however, data quality and quantity are critical. Segment recommends that you make predictions for at least 50,000 users and choose a target event that at least 5,000 users have performed in the last 30 days. 
+#### Successful trait computation
 
-You can create predictions outside of these suggestions, but your results may vary.
+This table lists the requirements for a trait to compute successfully:
+
+| Requirement                      | Details                                                                                     |
+|----------------------------------|---------------------------------------------------------------------------------------------|
+| Event Types                  | Track at least 5 different event types in the Feature Window.                               |
+| Historical Data              | Ensure these 5 events have data spanning 1.5 times the length of the Target Window. For example, to predict a purchase propensity over the next 60 days, at least 90 days of historical data is required. |
+| Subset Audience (if applicable) | Ensure the audience contains more than 1 non-anonymous user.                                 |
+| User Limit                   | Ensure that you are making a prediction for fewer than 100 million users. If you track more than 100 million users in your space, define a smaller audience in the **Make a Prediction For** section of the custom predictions builder. |
+| User Activity                | At least 100 users performing the Target Event and at least 100 users not performing the Target Event. |
+
+#### Selecting events (optional)
+
+Some customers want to specifically include or exclude events that get fed into the model. For example, if you track different events from an EU storefront compared to a US storefront and you only want to make predictions using data from the US, you could unselect the events from the EU space. This step is optional. Segment only recommends using it if you have a clear reason for excluding events from the model.
 
 > info "Predictive Traits and anonymous events"
 > Predictive Traits are limited to non-anonymous events, which means you'll need to include an additional `external_id` other than `anonymousId` in the targeted events. If you want to create Predictive Traits based on anonymous events, reach out to your CSM with your use case for creating an anonymous Predictive Trait and the conditions for the trait.
@@ -96,7 +126,6 @@ Churn predictions are only made for eligible customers. In the previous example,
 
 Segment then uses this criteria to build the prediction and create specific percentile cohorts. You can then use these cohorts to target customers with retention flows, promo codes, or one-off email and SMS campaigns.
 
-
 ## Use cases
 
 For use cases and information on how Segment builds predictions, read [Using Predictions](/docs/unify/traits/predictions/using-predictions/).
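
Note on the "Successful trait computation" table added in this change: the thresholds can be expressed as a small pre-flight check, which may help when reviewing whether the numbers in the table are self-consistent. This is only a sketch under stated assumptions. The function names and the shape of the `stats` object are hypothetical and are not part of any Segment API; the thresholds are copied from the table, and the counts would have to come from your own data.

```javascript
// Hypothetical helper names; nothing here calls a Segment API.

// Historical data must span 1.5 times the length of the Target Window.
function requiredHistoryDays(targetWindowDays) {
  return Math.ceil(targetWindowDays * 1.5);
}

// Returns a list of unmet requirements; an empty list means every check passed.
// `stats` is an assumed, caller-supplied summary of the Engage Space.
function checkTraitRequirements(stats) {
  const problems = [];
  const neededDays = requiredHistoryDays(stats.targetWindowDays);

  if (stats.eventTypesInFeatureWindow < 5) {
    problems.push("Track at least 5 different event types in the Feature Window.");
  }
  if (stats.historyDays < neededDays) {
    problems.push(`At least ${neededDays} days of historical data is required.`);
  }
  // Only applies when a subset audience is used, so the field is optional.
  if (stats.subsetNonAnonymousUsers !== undefined && stats.subsetNonAnonymousUsers <= 1) {
    problems.push("The subset audience must contain more than 1 non-anonymous user.");
  }
  if (stats.audienceUsers >= 100 * 1000 * 1000) {
    problems.push("Make the prediction for fewer than 100 million users.");
  }
  if (stats.usersWithTargetEvent < 100 || stats.usersWithoutTargetEvent < 100) {
    problems.push("At least 100 users must perform the Target Event and at least 100 must not.");
  }
  return problems;
}

// Example: a 60-day Target Window with 120 days of history (illustrative numbers).
const problems = checkTraitRequirements({
  eventTypesInFeatureWindow: 12,
  targetWindowDays: 60,
  historyDays: 120,
  audienceUsers: 250000,
  usersWithTargetEvent: 4000,
  usersWithoutTargetEvent: 246000,
});
console.log(problems.length === 0 ? "All checks passed" : problems.join("\n"));
```

The check for the 1.5x rule reproduces the table's own example: a 60-day Target Window requires at least 90 days of historical data.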