Seeking Developer Feedback: Go Driver Error Improvements

Isabella_Siu · November 18, 2020, 9:07pm

Hi, the team would like user perspective on one of the things we’re working on: error improvements. Currently, writing mgo’s IsDup helper with the driver would look something like this:

func IsDuplicateKeyError(err error) bool {
	switch e := err.(type) {
	case mongo.CommandError:
		return e.Code == 11000 || e.Code == 11001 || e.Code == 12582 || 
(e.Code == 16460 && strings.Contains(e.Error(), " E11000 "))
	case mongo.WriteException:
		if len(e.WriteErrors) > 0 {
			for _, we := range e.WriteErrors {
				if we.Code == 11000 || we.Code == 11001 || we.Code == 12582 || 
(we.Code == 16460 && strings.Contains(we.Error(), " E11000 ")) {
					return true
				}
			}
		}
	case mongo.BulkWriteException:
		if len(e.WriteErrors) > 0 {
			for _, we := range e.WriteErrors {
				if we.Code == 11000 || we.Code == 11001 || we.Code == 12582 || 
(we.Code == 16460 && strings.Contains(we.Error(), " E11000 ")) {
					return true
				}
			}
		}
	default:
		return false
	}
}

Tentatively, the plan is to add an interface with some helpers that CommandError, WriteException, and BulkWriteException will implement:

type ServerError interface {
    error
    HasErrorCode(int) bool
    HasErrorLabel(string) bool
    Unwrap() error
}

Which would turn that DuplicateKey helper function into this:

func IsDuplicateKeyError(err error) bool {
	if e, ok := err.(mongo.ServerError); ok {
		return e.HasErrorCode(11000) || e.HasErrorCode(11001) || e.HasErrorCode(12582) || 
			(e.HasErrorCode(16460) && strings.Contains(e.Error(), " E11000 "))
	}
	return false
}

IsDuplicateKeyError would also be one of the standalone helpers added for commonly checked errors:

IsTimeout(error) bool
IsDuplicateKeyError(error) bool

ex.

Var err error
if mongo.IsTimeout(err) {
	 // handle the error
}

Would this solution serve your needs and are there other helpers that would be nice to have?

Alessandro_Sanino · November 19, 2020, 7:29am

Hello @Isabella_Siu and thanks for your work on the golang driver

My first feedback would be: if you plan to use error codes in this way, probably is better to enumerate them uysing golang iota feature

type ErrorCode uint8
const (
    ErrorCodeType1 ErrorCode = iota
    ErrorCodeType2 ErrorCode = iota
)

and change the interface to the following

type ServerError interface {
    error
    HasErrorCode(ErrorCode) bool
    HasErrorLabel(string) bool
    Unwrap() error
}

same reasoning can also be applied to the ErrorLabels, if they can be enumerated in some way, but without iota

type ErrorLabel string

var (
    ErrorLabelType1 ErrorLabel = ErrorLabel("just an example type")
    ErrorLabelType2 ErrorLabel = ErrorLabel("just another example type")
)

hope it helps

There is a whole literature in how to do the IsError “pattern” in golang and usually it’s similar to the way you are proposing to implement, so I have no more feedback.

Just do it for all errors (including for example ErrNoDocuments) to be consistent with current codebase

Again, thanks for your work

TopherGopher · November 20, 2020, 9:54pm

@Isabella_Siu -
This is coming along nicely - thank you so much for sharing the proposal with us.
I love the idea of extending the error types to have .HasErrorCode() and .HasErrorLabel(), but I personally am with @Alessandro_Sanino in that I’d also like iota enumerated types so that performing error checking is really light and fast.

Taking a look at the new IsDup error function, I can see that while this proposal simplifies some logic, at the end of the day, it doesn’t seem to solve the fundamental issue. This line in particular is what’s concerning me (e.HasErrorCode(16460) && strings.Contains(e.Error(), " E11000 ")). If there is one error code in the error type and another code in the string, then it doesn’t seem like it’s simple to determine what the actual error is.

Maybe something that might help is to create an internal map in mongo-go-driver. Extrapolate certain error codes to categories. So rather than having an IsDup function, the error could contain an additional field/accessor, ErrorCategory(), in this case it would be set to an iota int const of DuplicateKey, which is set by the driver when the result comes back from the database (as opposed to having to call it after every insert). I would argue that there are quite a few errors that could be categorized.

Having fine-grained access to the specific code/error is nice, but since there’s no documentation on what the individual error codes mean outside of the golang driver (e.g. DuckDuckGo & Google). Having the low level error code just typically isn’t directly helpful.

Possible map

type ErrorCategory int {
  ErrDuplicateKey iota
  ErrDuplicateIndex
}
errCategoryMap := map[int]int{
  11000: ErrDuplicateKey,
  11001: ErrDuplicateKey,
   ...
}

Isabella_Siu · November 23, 2020, 5:20pm

Hi, @Alessandro_Sanino and @TopherGopher thanks for your feedback!

The reason that the error code and labels aren’t iota enumerated is that they’re parsed from the server errors, which the driver doesn’t have control over, and this allows us the flexibility we need to handle changes in that. The driver API has stricter constraints than the server error codes, for example, the server can stop including an error code fairly easily, but we would be stuck with the iota value until the next major release. If you want to take a look, the server error codes are listed here.

This is why IsDuplicateKeyError ends up having complex logic. Between major server versions the way the server changed how it signals that, so the function needs to catch all the different ways different versions return it. We’re currently only planning to add standalone helpers for error checking that requires more complex logic and not things that can be checked with equality or by calling the HasErrorCode() or HasErrorLabel() functions.

Specifically in regards to the idea of error categories, our feeling is that any category that could be handled that way either is already handled by the server labels and HasErrorLabel() or could be handled with a standalone helper. If you have ideas for any additional error helpers, we’d be happy to consider them.

TopherGopher · December 1, 2020, 9:22pm

The problem comes down to lack of knowledge of the error codes themselves. I googled around for a long time trying to find the reference. I appreciate you linking it.
Now if each error in that document had a corresponding detail - or there was a table on the mongo doc site - it might be more feasible for me to know which all errors to check for given each server version in order to maintain compatibility in a library for both old and new versions of mongo.

Let me give you an example:
How am I as an end user supposed to know things like 16460 isn’t a duplicate key error unless it has E11000 in the string for mongo versions X-Y? 16460 isn’t listed in the document linked (nor is 11001/12582), so what does it mean when I get that code and E11000 isn’t in the string? The IsDup helper helps with this case, but are there other errors where similar operations has to be done?

I see in the linked dictionary table, that there is already the beginnings of a notion of categories. Couldn’t we expand on that? So when an error comes back from the API, that mapping would come into play.

Commonly, errors are used for two purposes - either to make some sort of informed/automatic programmatic decision or to provide a message to a user. Having to check for multiple codes for various common use cases gets cumbersome with larger code bases.
By setting this category using a map on the way in, that avoids all the look-up logic later.

Just to run through a few to give you an idea of what I would love to check for (these names are totally negotiable):

- ErrCategoryBadInputParameter:
    - InvalidLength
    - InvalidBSON
    - InvalidPath
    - InvalidRoleModification
    - InvalidIdField
    - InvalidDBRef
    - InvalidOptions
    - InvalidNamespace
    - InvalidReplicaSetConfig
    - InvalidSyncSource
    - InvalidSSLConfiguration
    - InvalidPipelineOperator
    - InvalidViewDefinition
    - InvalidIndexSpecificationOption
    - InvalidUUID
    - InvalidResumeToken
    - InvalidSeedList
    - InvalidTopologyType
    - InvalidHeartBeatFrequency
    - InvalidServerType

- ErrCategoryConflict:
    - WriteConflict
    - ConflictingOperationInProgress
    - ObjectIsBusy
    - TenantMigrationConflict
    - PreparedTransactionInProgress
    - DuplicateKey

- ErrCategorySpecification:
    - DatabaseDifferCase
    - KeyNotFound
    - IndexOptionsConflict
    - IndexKeySpecsConflict
    - IndexAlreadyExists
    - IndexNotFound
    - NamespaceNotFound

- ErrCategoryTooLarge:
    - ExceededMemoryLimit
    - QueryExceededMemoryLimitNoDiskUseAllowed
    - OutOfDiskSpace
    - BSONObjectTooLarge
    - OBSOLETE_KeyTooLong

ErrCategoryBadConnection:
- DNSHostNotFound
- DNSProtocolError
- NodeNotFound
- ReplicaSetNotFound
- NetworkTimeout

Aborted:
- TransactionExceededLifetimeLimitSeconds
- PeriodicJobIsStopped

TopherGopher · December 2, 2020, 9:52pm

I think a use-case might be helpful.
Say I’m getting this error back from the DB (The Code is 2) when I attempt to run a $push using UpdateOne():

multiple write errors: [{write errors: [{The field 'myArray' must be an array but is of type null in document {_id: ObjectId('5fc80a1650cd0c9ecb7cc8fb')}}]}, {<nil>}]

In this case, I want to programmatically retry using a $set rather than $push and the operation should succeed.
The best way I know how to check for this:

	// Attempt a $push query into a null array - error occurs
	mErr := mongo.WriteException{}
	if isMongoErr := errors.As(err, &mErr); isMongoErr {
		for _, wErr := range mErr.WriteErrors {
			if wErr.Code == 2 && strings.Contains(wErr.Message, "must be an array but is of type null") {
				// Perform a $set rather than $push
			}
		}
	}

Does this new error proposal simplify that logic at all? That’s one of the most common use-cases in our code-base.
I realize I could resolve that particular issue using custom registries instead, but if we use a new registry, there’s a LOT of testing we would need to do, thus this logic instead.

Isabella_Siu · December 3, 2020, 9:28pm

Hi @TopherGopher!

Three more things that are being added to this API are ServerError.HasMessage(string) to check for a message substring, ServerError.HasErrorCodeWithMessage(int, string) to check for a message substring on errors with a specific code, and a standalone IsNetworkError(error) to check if errors are network errors.

The goal of the standalone helpers is to handle all cases where multiple error representations benefit from being linked, so a user wouldn’t be expected to know all the historical ways IsDuplicateKey error could be represented. It’s only Duplicate Key errors right now, but any errors in the future require more complex logic, they would also get helpers.

We don’t want to set up our own go driver error categories because setting up our own categories requires us to maintain them on fairly unstable server error codes. While we’re happy to add standalone helpers for checking for useful groups of errors, I’m not sure of the benefit of these specific categories. Many of the errors in each category wouldn’t occur together, and wouldn’t necessarily want to be handled the same way. I’m also not sure of the benefit of knowing that an error is, for example, a bad input parameter without knowing what’s wrong with which parameter. Do you have a specific use case you’re thinking of for these?

As an additional note, there is a ticket to add the error codes to the documentation, which should make them easier to find, but as of right now, that server file is the best source.

For the case that you’re describing, you could instead do:

 if mErr, ok := err.(mongo.ServerError); ok {
     	If mErr.HasErrorCodeWithMessage(2, "must be an array but is of type null") {
     		// Perform a $set rather than $push
     	}
 }

For your specific use case, the issue is that nil slices get marshaled to BSON null values, so myArray in your collection may either be an array or null. You could avoid mixing null and array by using a custom registry or aggregation updates. To work around this with a custom registry, you would use a registry with a SliceCodec with SetEncodeNilAsArray(true). Alternatively, to work around this with aggregation updates, you would use aggregation to conditionally append or set.

For example, on a collection that contains {x: null} and {x: [1,2]}, you could append or set x to [3] with the following:

 var pipeline = mongo.Pipeline{
     bson.D{
 		{"$set", bson.D{
 			{"x", bson.D{
 	    		{"$concatArrays", bson.A{
 	    	        bson.D{{"$ifNull", bson.A{"$x", bson.A{}}}},
 	    	        bson.A{3},
 	            }},
 	        }},
 	    }},
 	  },
 }
 res, err := coll.UpdateMany(context.Background(), bson.D{}, pipeline)

TopherGopher · December 3, 2020, 10:35pm

Heya back @Isabella_Siu -
You’re right, HasErrorCodeWithMessage definitely simplifies this logic, but question - will the code always be 2 across all mongo server versions for a nil insert? Is it a different code say if I try to $push to a datetime for example? How can I as a developer go about figuring that out? If I could answer the question of “have I handled all possible incarnations of this error?” then I would have more confidence in the error handling.

Those were just example groupings and could easily be changed, but let me walk you through my thoughts on general handling of a few of those categories and maybe you’ll follow where I’m coming from:

Things in ErrCategoryBadInputParameter, in our code base, we would completely give up, not retry. A user needs to input something new or a developer needs to change a value - the queries will not succeed (I think). If someone wanted more fine-grained control, they could still check for the underlying error rather than using a category.
ErrCategoryConflict - maybe remove DuplicateKey from there, but I believe that all the others might succeed if retried, correct? I could be wrong there.
ErrCategorySpecification would imply me dropping or renaming a value
ErrCategoryBadConnection I would probably do a retry with exponential back-off in an error handler.

Maybe I don’t understand the meaning behind the errors themselves, which is possibly why I’m grouping them together incorrectly, but maybe that just demonstrates that it’s hard to find documentation and understand what these errors mean, and therefore, how to handle them.

Ahhh - I didn’t know that nil helper made it into master yet - I’ve been using the old godoc site rather than gopkg, which doesn’t pick up that package, but I found it now on the new doc site. I see the SliceCodec you’re talking about and the Array option - very cool to know that’s available now

Unfortunately, if we update the registry, then that change will affect literally hundreds of queries, thus the reason we’re gun-shy to use a registry. Makes us nervous to do something widespread in our library without a lot of testing and we have not had time for that lately.

That aggregation query is really nifty and exactly what we’re after. I’ll have to give it a go. I truly appreciate you crafting that because it’s something we’ve been struggling with for a while.

Isabella_Siu · December 8, 2020, 10:58pm

Hi @TopherGopher !

To clarify, it’s not common for error codes to change between server versions, and if it happens with any more error codes, we will add helpers for them. We do agree that it should be easier to check what error codes correspond to what behavior, and you can follow that ticket here. For better or worse, testing a scenario with a live server is the most accurate way to identify error codes. The server, not the driver, controls what error codes are returned for what operations.

Overall, we, as the driver, can’t provide error categories for all the errors because we don’t necessarily know all of them. Having the groupings would somewhat imply that users would sometimes want to handle all the errors in a grouping the same, and as a lot of these errors wouldn’t be handled programmatically, that isn’t true for many of them. For example the things in BadInputParameter and Specification, would be fixed by changing the actual code, where knowing what category of error it is doesn’t provide any additional benefit. Errors that are because of bad connections will be grouped together with IsNetworkError helper, so those should be handled. Though it should be noted that the driver will retry on its own for certain errors that we know benefit from being retried, including network errors.

And you’re welcome! I’m glad that that was helpful.

TopherGopher · December 9, 2020, 11:56pm

After chatting it out with you and given that ticket ^ for making errors easier, all of the helpers your team has proposed will be perfect (imho).
IsTimeout(), IsNetworkError() and IsDuplicateKeyError() are all awesome sounding.

The idea that all the mongo errors will conform to a common mongo.ServerError interface rocks. I think HasErrorLabel and HasErrorCode paired with that common interface will radically simplify our error checking hunks. Thank you for this thoughtful design and help in understanding it.

Dustin_Currie · December 15, 2020, 2:11am

This all looks great. Thanks for all the hardwork! The one improvement I’d suggest is producing the error code and label from the interface.

type ServerError interface {
  error
  ErrorCode() int
  ErrorLabel() string
  Unwrap() error
}

The helper methods will be what most people use. But in special situations, no one is going to loop through all the possible error codes/labels to see which one the error “has”, they will end up parsing the string, which is a no-no. Better to just produce the error and label.

Dustin_Currie · December 15, 2020, 2:12am

Also, Dave Cheney would suggest making this interface private and just having public helper methods.

https://dave.cheney.net/2016/04/27/dont-just-check-errors-handle-them-gracefully

But I do remember that the reasons for this were a bit arcane.

Isabella_Siu · December 17, 2020, 4:41pm

Hi @Dustin_Currie!

The reason we didn’t decide to do return functions is that the errors that ServerError wraps often have multiple labels and multiple codes, for example, a WriteException can contain multiple write errors, each with their own code. When users are trying to programmatically handle errors, they usually have something specific in mind, and the “has” functions allow us to abstract away checking the internals. If you just want to see what the error is, it’s best to print it.

While the idea of just having public helpers is interesting, the current interface design is set up to echo the existing design choices in the go library. Though I agree, users will likely use the standalone helpers much more than the interface.

Dustin_Currie · December 17, 2020, 7:46pm

That makes sense. The current strategy looks good to me. After reminding myself of the suggestions in that Dave Cheney post, I think the way you guys have done it is far more clear. Thanks!