Skip to content

Alternate AIP-193 recommendation #45

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 20 commits into
base: main
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from 6 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
107 changes: 107 additions & 0 deletions aip/general/0193/aip.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,107 @@
# Errors

Error handling is an important part of designing simple and intuitive APIs.
Consistent error handling allows developers to know how to expect to receive
errors, and to reduce boilerplate by having common error-handling logic, rather
than being expected to constantly add verbose error handling everywhere.

## Guidance
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

since there already is protobuf-level guidance, why not include protobuf? or does it translate? https://google.aip.dev/193.

I'm thinking about how to reconcile this proposal with existing practices in google.aip.dev (which, as the only public repository up until now, has served as the base of other forks IIUC).


Services **must** clearly distinguish successful responses from error responses
by using appropriate HTTP codes:
- Informational responses **must** use HTTP status codes between 100 and 199.
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Allowing 100 to 199 status code is per RFC 7807

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not entirely sure what an "informational response" is - and I'd rather not have to go to the HTTP RFCs to find out. Does this range indicate success or failure?

Copy link
Author

@saurabhsahni saurabhsahni Jan 11, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These may not indicate success or failure. These are often used to indicate intermediate status of an HTTP request. For instance, a server that supports HTTP/2 can respond with 101 Switching Protocols upgrading a HTTP/1.1 connection to a HTTP/2 connection: https://datatracker.ietf.org/doc/html/rfc7540#section-3.2

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Adding clarification

- Successful responses **must** use HTTP status codes between 200 and 399.
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are 3xx responses really successful in the context of APIs? 304 could be considered successful, but anything else really suggests further action.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point. 3XX should mean redirection. Updating this

- Errors indicating a problem with the user's request **must** use HTTP status
codes between 400 and 499.
- Errors indicating a problem with the server's handling of an valid request
**must** use HTTP status codes between 500 and 599.

### Structure

Error responses **should** conform to the following interface:

```typescript
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why typescript when the existing aip.dev is using protobuf / OpenAPI? https://aip-dev.github.io/aip.dev/136.

interface Error {
//A machine-readable code indicating the type of error (like `name_too_long`). This value is parseable for programmatic error handling.
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Super-nit: // without a space after it makes me wince :)

code: string;
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

code: string instead of type?: string, format: uri-reference

  • There's a need to return a machine-readable error code beyond HTTP status. While technically the type parameter returns a unique URI reference for a given problem, comparing URI references programmatically doesn't seem an ideal experience.
  • Now any string would qualify as a uri-reference, it’s confusing to recommend something as a uri-reference, if we think most API providers should just return error codes.
  • To ensure we’re compliant with RFC 7807, an alternative name that’s used frequently by several APIs could be “code” instead of “type”.
  • This one field could be set as required.
  • Example usage: Apple, Stripe, Github, Microsoft, Facebook

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should probably specify that strings should be comparable using ordinal comparisons - not case-insensitively, for example.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Adding clarification below. One thing to note is the recommendation is to use only lowercase letters here.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Per discussion, consensus was to to stick with type?: string without format: uri-reference because comparing URI references programmatically isn't an ideal experience. @dret we were wondering if there has been a discussion around dropping format: uri-reference from RFC 7807?


//A human readable description of the problem. Should not change from occurrence to occurrence (except for localization).
title?: string

//The HTTP status code between 100 and 500
status?: integer

//A human-readable explanation specific to this occurrence of the problem
detail?: string
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@shwoodard @lukesneeringer message is more standard name for this field. Though RFC 7807 has this as detail.
@lukesneeringer @shwoodard we should drop detail from the recommendation because adding explanation specific to an occurrence of the problem may lead to developers parsing this string.


//A unique identifier that identifies the specific occurrence of the problem. Can be provided to the API owner for debugging purposes.
id?: string
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

id?: string instead of instance?: string, format: uri-reference

  • More on “instance”:
    • When the "instance" URI is dereferenceable, the problem details object can be fetched from it. It might also return information about the problem occurrence in other formats through use of proactive content negotiation (see [HTTP], Section 12.5.1).
    • When the "instance" URI is not dereferencable, it serves as a unique identifier for the problem occurrence that may be of significance to the server, but is opaque to the client.
  • Practical use cases of returning additional problem details for a specific occurrence through a URL seem limited.
  • To represent a unique identifier for an error, while being compliant with RFC 7807, an alternative name could be “id” (Example usage: Apple, Facebook)

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alternate options:

  • errorInstanceId
  • traceId
  • requestId
  • occurrenceId

Consensus was to use occurrenceId. id, errorId could be confused as an id unique for this error type instead of the specific occurrence. `

}
```

- The `title` field is intended for consumption by humans, and therefore
**may** change, even within a single version.
- The `code` field is intended to have code written against it, and therefore
**must not** change. Values for this field should be 0-63 characters, and use
only lower-case letters, numbers, and the `-` character.



### Messages

Error messages **should** help a reasonably technical user understand and
resolve the issue, and **should not** assume that the user is an expert in the
particular API. Additionally, error messages **must not** assume that the user
will know anything about its underlying implementation.

Error messages **should** be brief but actionable. Any extra information
**should** be provided in a `details` field. If even more information is
necessary, the service **should** provide a link where a reader can get more
information or ask questions to help resolve the issue.

### Localization

Error messages **must** be in American English. If a localized error message is
also required, the service **should** provide the following structure within
its `details`:

```typescript
interface LocalizedMessage {
// The locale for this error message.
// Follows the spec defined at http://www.rfc-editor.org/rfc/bcp/bcp47.txt.
// Examples: 'en-US', 'de-CH', 'es-MX'
locale: string;

// The localized error message in the above locale.
message: string;
}
```

### Partial errors

APIs **should not** support partial errors. Partial errors add significant
complexity for users, because they usually sidestep the use of error codes, or
move those error codes into the response message, where the user must write
specialized error handling logic to address the problem.

However, occasionally partial errors are unavoidable, particularly in bulk
operations where it would be hostile to users to fail an entire large request
because of a problem with a single entry.

Methods that require partial errors **should** use long-running operations (as
described in AIP-151), and the method **should** put partial failure
information in the metadata message. The errors themselves **must** still be
represented with an error object.

## Further reading

- For which error codes to retry, see AIP-194.

## Changelog

- **2020-09-02**: Refactored errors AIP to be more generic.
- **2020-01-22**: Added a reference to the `ErrorInfo` message in gRPC.
- **2019-10-14**: Added guidance restricting error message mutability to if
there is a machine-readable identifier present.
- **2019-09-23**: Added guidance about error message strings being able to
change.
7 changes: 7 additions & 0 deletions aip/general/0193/aip.yaml
Original file line number Diff line number Diff line change
@@ -0,0 +1,7 @@
---
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to change anything in this file?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it looks like google.aip.dev moves this front-matter to the file itself (which I tend to agree with)

https://raw.githubusercontent.com/aip-dev/google.aip.dev/master/aip/general/0001.md.

Is there an outstanding issue to move this to the more succinct format?

id: 193
state: approved
created: 2019-07-26
placement:
category: polish
order: 30