We have a field in mongo that is stored as a string in old data and as a uint in new data, so I tried to support both by implementing UnmarshalBSONValue like this:
type MachineScoreType uint8

func (m *MachineScoreType) UnmarshalBSONValue(t bsontype.Type, b []byte) (err error) {
	if len(b) == 0 {
		*m = NOT_SET
		return nil
	}
	if len(b) == 5 && b[0] == 1 && b[1] == 0 && b[2] == 0 && b[3] == 0 {
		// Empty string case - bizarre what mongo passes us, but this is the empty string
		*m = NOT_SET
		return nil
	}
	if (b[0] == 0x0003 || b[0] == 0x000a || b[0] == 0x000b || b[0] == 0x000c || b[0] == 0x000d || b[0] == 0x000e) && len(b) > 4 {
		// If this is a string machine score, it will be preceded by a few NUL/whitespace chars
		// e.g. ' offTopicScore'
		// It also has a trailing space
		// bytes hex before trimming: 0E0000006F6666546F70696353636F726500
		s := string(b[4 : len(b)-1])
		*m = MachineScoreTypeFromString(s)
	} else {
		// Otherwise, this is a uint8
		*m = MachineScoreType(binary.LittleEndian.Uint16(b))
	}
	// Now we should have a uint8 - check to see if it's valid
	if !m.IsValid() {
		*m = INVALID_SCORE
		return errors.WithStack(fmt.Errorf("'%s' (%X) is not a valid score type. valid choices: [%s]", string(b), b, strings.Join(machineScoreStrings, ", ")))
	}
	return nil
}
This works, but the fact that a range of control characters can appear before a string (e.g. anything from 0x000a to 0x000e) is confusing. With UnmarshalJSON, a string is simply prefixed and suffixed with a ".
First question - can you help me understand why there are different NUL/control chars before a string?
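For reference, the BSON specification lays out a string value as a 4-byte little-endian int32 length (which counts the trailing NUL), then the UTF-8 bytes, then a single 0x00 terminator. Read that way, the leading bytes are a length prefix rather than control characters, and the 5-byte empty-string case above is a length of 1 followed by the NUL. The sketch below decodes the hex dump quoted in the code comment using only the standard library; decodeBSONString is a hypothetical helper name, not part of the driver.

```go
package main

import (
	"encoding/binary"
	"encoding/hex"
	"fmt"
)

// decodeBSONString parses a raw BSON string value: a little-endian int32
// length (which includes the trailing NUL), the UTF-8 bytes, then 0x00.
// Hypothetical helper for illustration; not part of the MongoDB driver.
func decodeBSONString(b []byte) (string, error) {
	if len(b) < 5 {
		return "", fmt.Errorf("bson string too short: %d bytes", len(b))
	}
	n := int(binary.LittleEndian.Uint32(b[:4]))
	if n < 1 || n != len(b)-4 {
		return "", fmt.Errorf("length prefix %d does not match %d payload bytes", n, len(b)-4)
	}
	if b[len(b)-1] != 0x00 {
		return "", fmt.Errorf("missing NUL terminator")
	}
	return string(b[4 : len(b)-1]), nil
}

func main() {
	// The hex dump from the comment in the question.
	raw, err := hex.DecodeString("0E0000006F6666546F70696353636F726500")
	if err != nil {
		panic(err)
	}
	s, err := decodeBSONString(raw)
	if err != nil {
		panic(err)
	}
	// 0x0E is 14: the 13 bytes of "offTopicScore" plus the NUL terminator.
	fmt.Printf("prefix=%d value=%q\n", binary.LittleEndian.Uint32(raw[:4]), s)
	// prints: prefix=14 value="offTopicScore"
}
```

Under this reading, the first byte varies only because the strings have different lengths: a 9-character name would start with 0x0A, a 13-character name with 0x0E.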
Second and most important question - how should I be doing this in a more standard way, decoding the value as either a uint8 or a string? (Note - I realize mongo stores uint16 rather than uint8, but I just want to be clear about what we’re ultimately trying to get from the function.)
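One possible direction, offered as a sketch rather than the driver's recommended pattern, is to branch on the type argument that UnmarshalBSONValue already receives instead of inspecting the first byte. The example uses only the standard library so it runs on its own. The type bytes 0x02, 0x10 and 0x12 come from the BSON specification (string, int32, int64; BSON has no 16-bit integer type). NotSet, scoreByName and decodeScore are hypothetical stand-ins for the real NOT_SET, MachineScoreTypeFromString and method body.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// BSON element type bytes as defined in the BSON specification.
const (
	typeString byte = 0x02
	typeInt32  byte = 0x10
	typeInt64  byte = 0x12
)

type MachineScoreType uint8

// Hypothetical stand-ins for NOT_SET and MachineScoreTypeFromString.
const NotSet MachineScoreType = 0

var scoreByName = map[string]MachineScoreType{"offTopicScore": 1}

// decodeScore picks the decoding strategy from the BSON type byte.
func decodeScore(t byte, b []byte) (MachineScoreType, error) {
	switch t {
	case typeString:
		// int32 length prefix (includes the NUL), bytes, NUL terminator.
		if len(b) < 5 || int(binary.LittleEndian.Uint32(b)) != len(b)-4 {
			return NotSet, fmt.Errorf("malformed BSON string: % X", b)
		}
		name := string(b[4 : len(b)-1])
		if name == "" {
			return NotSet, nil
		}
		s, ok := scoreByName[name]
		if !ok {
			return NotSet, fmt.Errorf("unknown score name %q", name)
		}
		return s, nil
	case typeInt32:
		if len(b) != 4 {
			return NotSet, fmt.Errorf("int32 needs 4 bytes, got %d", len(b))
		}
		return fromInt(int64(int32(binary.LittleEndian.Uint32(b))))
	case typeInt64:
		if len(b) != 8 {
			return NotSet, fmt.Errorf("int64 needs 8 bytes, got %d", len(b))
		}
		return fromInt(int64(binary.LittleEndian.Uint64(b)))
	default:
		return NotSet, fmt.Errorf("unsupported BSON type 0x%02X", t)
	}
}

// fromInt rejects values that do not fit in a uint8.
func fromInt(v int64) (MachineScoreType, error) {
	if v < 0 || v > 255 {
		return NotSet, fmt.Errorf("score %d out of uint8 range", v)
	}
	return MachineScoreType(v), nil
}

func main() {
	str := append([]byte{0x0E, 0, 0, 0}, append([]byte("offTopicScore"), 0)...)
	fromString, err := decodeScore(typeString, str)
	fmt.Println(fromString, err)

	fromNumber, err := decodeScore(typeInt32, []byte{1, 0, 0, 0})
	fmt.Println(fromNumber, err)
}
```

Inside the real method, the same switch could compare against bsontype.String, bsontype.Int32 and bsontype.Int64. Wrapping the arguments in bson.RawValue{Type: t, Value: b} and calling StringValueOK or Int32OK may avoid the manual byte handling altogether, though that is worth checking against the driver version in use.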