Hey everyone! I’ve been lurking on the forums for a while, and I’m curious about the best ways to troubleshoot Device Sync or triggers. I’ve been trying some new things at my job using Device Sync, and I keep getting weird errors or problems, so I’d like to learn some more general ways to approach troubleshooting. The documentation doesn’t have anything on troubleshooting problems and errors, so any advice that helps me become more independent with this product line would be greatly appreciated! Thank you in advance!!!
EDIT NOTE: Everyone is going to have a different approach; these are just some approaches that worked for me. Also note that some of the “extra” I always did is just something I did. If others in TS do it too, that’s awesome and amazing, but it shouldn’t be an expectation put on the TS department. I just liked to go the extra mile when troubleshooting and resolving issues.
There’s a lot more detail in my blog for troubleshooting problems with Realm/Device Sync/App Services, as well as more details in something else I’m writing. But the below is just stuff I do and did to aid the process.
@UrDataGirl is there any specific type of issue you’d like addressed in how to troubleshoot?
So this comes from 2 years on the Technical Services side. This is how I personally went about troubleshooting various issues:
- Triggers and Functions
– I would copy and paste the trigger and function and modify the variables and parameters to fit my test clusters, and see specifically what they are trying to do with it, and identify its actual behavior. From there I’d look at what the expected result vs actual result was and debug the scripts accordingly.
– For Authentications (bread and butter for me for 12+ years, handling that sort of stuff in InfoSecDevOps work), I would verify the certificates being called, the headers in place, etc. Sometimes you get wacky stuff when a third-party service is involved.
– Remember that functions largely run on Node.js 10, so verify that whatever you’re calling is supported in Node.js 10 to start with, or you’ll waste a lot of time debugging something that should work but doesn’t because you’re on the wrong version. Recently this was found to be a leading cause of JWT authentication failures: JWT 9 doesn’t support Node.js 10, and only JWT 8 works with Atlas functions right now, so you have to look for things like that.
– Libraries that should be supported don’t always work that well. Take NodeFetch: you can’t use NodeFetch 3, only 2 will work, and even then it’s terribly slow, as is Axios. So to troubleshoot libraries you sometimes really have to play whack-a-mole to find what works best. It’s usually a lot of guesswork and testing to find the formula that works for your use case and is efficient.
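The version pitfalls above can be checked mechanically before you start debugging. This is a minimal sketch, not an official compatibility matrix: the limits come only from what is reported in this thread, and it assumes that “JWT” and “NodeFetch” refer to the `jsonwebtoken` and `node-fetch` npm packages. The helper names are made up for illustration, and the version parsing is deliberately simplistic (it just takes the first number in the spec).

```python
import re

# Highest major versions reported as working in this thread. These are
# assumptions taken from the discussion, not an official support list, and the
# package names are a guess at what "JWT" and "NodeFetch" refer to.
MAX_WORKING_MAJOR = {
    "jsonwebtoken": 8,
    "node-fetch": 2,
}


def major_version(spec):
    """Return the first integer found in a version spec such as '^9.0.0', or None."""
    match = re.search(r"\d+", spec)
    return int(match.group()) if match else None


def flag_dependencies(dependencies):
    """List dependencies whose major version exceeds what the thread reports as working."""
    warnings = []
    for name, spec in sorted(dependencies.items()):
        limit = MAX_WORKING_MAJOR.get(name)
        major = major_version(spec)
        if limit is not None and major is not None and major > limit:
            warnings.append(f"{name} {spec}: thread reports only major version {limit} works")
    return warnings


# Example input: the "dependencies" section of a function's package.json.
deps = {"jsonwebtoken": "^9.0.0", "node-fetch": "2.6.7", "axios": "1.4.0"}
for warning in flag_dependencies(deps):
    print(warning)
```

Anything the check flags is only a candidate cause; you still need to test the downgraded version in your own function.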
- Logs are amazing; the more logs the better.
– My coworkers would use Splunk and other services to look at people’s stuff, but I usually cheated: constantly running the same queries over and over felt stupid to me, so I automated the queries and outputs. I would route the customer’s Splunk logs (as a customer you can route your logs with the Data API) to a dashboard I built that funneled results to an Atlas test cluster I had. From there I just put in the customer’s Realm App ID from the URL of the app to populate their logs, indexed the error messages, and did a quick sort to group all the errors that were the same thing. Then I forwarded that to another tab in my dashboard that showed the data trends for the errors, so I could see exactly what things looked like and compare them to the Atlas system metrics.
– As a customer, you can do something similar:
— Run the Data API to forward logs to a central collection in your Atlas cluster, then connect the BI Connector to Tableau, Flask, etc., feed that into a dashboard, and run the data models, sorts, and so on for a very similar effect. The only issue is that the Splunk logs Support sees may have more detail than what you can see, so for any errors that are unclear or don’t make sense, definitely open a ticket for Support to look at.
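The “index the error messages and do a quick sort” step can be sketched in a few lines once the logs are in a collection you can read. This is a rough sketch under assumptions: the `error` field name and the sample entries are made up for illustration and are not the real App Services log schema, so adjust them to whatever your forwarded logs actually contain.

```python
import re
from collections import Counter


def normalize(message):
    """Mask 24-character hex ids and numbers so repeats of the same error group together."""
    message = re.sub(r"\b[0-9a-f]{24}\b", "<id>", message)
    return re.sub(r"\d+", "<n>", message)


def group_errors(log_entries):
    """Count normalized error messages, most frequent first."""
    counts = Counter(
        normalize(entry["error"])
        for entry in log_entries
        if entry.get("error")
    )
    return counts.most_common()


# Made-up sample entries; the field names are hypothetical.
logs = [
    {"type": "TRIGGER", "error": "timeout after 90 seconds"},
    {"type": "SYNC", "error": "document 64b7f0c2a1d3e4f5a6b7c8d9 not found"},
    {"type": "TRIGGER", "error": "timeout after 120 seconds"},
    {"type": "FUNCTION", "error": None},
]

for message, count in group_errors(logs):
    print(count, message)
```

Grouping on a normalized message is a design choice: it trades some detail for a short list of distinct problems, which is usually what you want before comparing against system metrics.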
- Device Sync Issues
– Depending on the error message and behavior, I would look at the core issue. For iOS issues, say, I would always ask for TestFlight logs whenever possible. Crashlytics and TestFlight give so much more detail about what is happening in an app than the Device Sync logs do for anything client side. Those are almost always gold.
– Having language-specific knowledge for your SDK of choice is a must (I know 11 languages and am very fluent in 5 of them). Otherwise you’re just going to look like a fool when the issue is a functional problem with the code itself and you’re not seeing it because you don’t know the language.
— Easy case example:
---- I had a Swift SDK issue where the customer was about to abandon the product because they couldn’t make it work without corruption issues and odd errors coming up. The way I resolved it was by identifying that it wasn’t any one issue but multiple issues: resource conflicts between dependencies, mismatched dependency versions needed from one package to the next, and then threading issues with the pointers.
---- Knocking things out one at a time was far above and beyond what anyone in TS is required to do, so don’t mention this on any surveys etc. TS doesn’t care for the recognition or need it; they just want to help you be successful with your app. (And they’ll get scolded for it. Not like I care tbh, because I did my job, and the customer is still a customer and is happy with Realm. I still talk to them, as they found me on LinkedIn.) The resolution came down to threading and timing the services so they shared and engaged resources as needed, and isolating dependency versions between services that needed one version vs another. After that, all the problems associated with Realm (Device Sync today) crashing went away. But helping the customer in this example was only possible because of my knowledge of Swift.
- Look at the WHOLE environment.
– Don’t be a fool and waste time with tunnel vision on one thing, because that issue can just be a symptom of other issues.
— Great case example: I worked with a customer who had issues with Axios for almost 2 years, and no one had actually addressed the root cause. It turned out to be performance issues with Axios; by moving to a different service they gained the functionality and speed they wanted. That wouldn’t have been possible to find had I not taken a step back and walked through their entire environment to map out what is supposed to do what and how everything was connected.
– Look at all dependencies, all SDKs, and all APIs in use in the ecosystem that Atlas or Realm is interacting with. The issue may not even be related to Device Sync and may be something in Atlas, or it could be something not related to either.
— Great case example:
---- I handled a customer who was using PostgreSQL, which was routing data to production systems running CNC machines etc. It would forward data to a middleware translation service that converted JSON to the file types the CNC machines used, and then converted the results back to JSON. Without knowing this service existed, it would have been impossible to determine why the Realm data on the clients was displaying as corrupted. I spent 2 weeks with the customer walking them through rebuilding the middleware service so that the Device Sync app collecting the data and controlling the CNC machines was properly taking in the data from the middleware service.
----- This goes back to knowing the language of the SDK and taking a step back to see the WHOLE environment, not just focusing on one part. The amount of engineering resources that would have been wasted had the middleware not been given a deeper look would have been enormous.
@UrDataGirl Another thing is USE CASE
- Don’t use Realm (Device Sync) for anything server side; only use it client side. Easy rule of thumb: if it’s a Driver, it’s backend. If it’s an SDK, it’s client side.
– Great case example: Node.js SDK vs Node.js Driver.
— SDK needs to open and close connections for each thing it does.
— Driver does not open and close a connection for each thing it does.
---- The behaviors described lead to very different results when you’re running an application, from both a security standpoint and a raw performance standpoint. These services are separated for very good reason, and the split should be kept: SDK == Client Side, and Driver == Back End.
Assessing whether the use case fits the product is a big deal. It’s actually the very first thing I consider when I look at a Realm issue: whether it fits Device Sync’s use case criteria or should be a Core/Atlas use case. You’d be surprised how many people mix the two up. Several times I’ve worked with customers to migrate from a MongoDB Driver to a Realm SDK, and vice versa, because they were trying to use the service for the wrong thing, and the education on which is for what is ambiguous at best in the literature. So always, always verify that the use case meets the criteria for client vs backend, and whether an SDK or a Driver is in play.
These are just some examples and case examples, but these are the main things that I personally do and have done to troubleshoot Realm (Device Sync). If you would like, you’re welcome to present specific issues you’re experiencing, and I am more than happy to walk you through how we can troubleshoot them.
GraphQL is its own troubleshooting and support category. You’re going to have to use a lot of whack-a-mole tactics to determine what will make your GraphQL work, or whether you need to spin up an Apollo GraphQL server and have that handle your GraphQL stuff.
For instance, GeoJSON is supported by MongoDB, and it’s supported by Apollo GraphQL, but it’s not supported in Atlas GraphQL or Realm.
And Atlas GraphQL doesn’t support custom scalars, so you can’t use enum scalars either. So when you use GraphQL in Atlas, you need to not only understand GraphQL but also take the time to get acquainted with the limitations of Atlas GraphQL services and what needs to be implemented by third-party services.
Device Sync, Realm, and the Apollo GraphQL mobile clients can all work together on the mobile device just fine, and interface between Atlas and an Apollo GraphQL server all together very, very well. And the GraphQL Client is very performant so you’re not really causing much tech debt with it at all if you know how to use GraphQL.
But that’s something to consider too, based on whatever you’re troubleshooting: what’s a limitation and what isn’t.
@UrDataGirl In response to your below questions.
I’m always down for extra work, and GeoJSON is common for geographic coordinate data.
The main reason for choosing MongoDB, particularly 5.0 and above, is the time-series data support it offers. Having Realm not support GeoJSON is crippling for people who want to use Atlas for their mobile apps and use GeoJSON for all sorts of use cases: transportation apps, delivery apps, plotting coordinates to get to a particular location, or geotracking an asset.
Lots of use cases, but the typical workaround is just implementing the Apollo GraphQL client in your app, translating whatever you need to React Native if it’s not already a React app, and connecting MongoDB to an Apollo GraphQL server. Then you get all of the time-series data and the link to your mobile app.
I actually ran through this in an interview a few weeks ago for a trucking fleet as a proof of concept. I failed the test due to running out of time, but still finished it 38 minutes past the time limit (the interviewer wanted to see how it works and how to do this).
Realm handled everything else in the app except for the time-series data and the GeoJSON. That was all routed from Atlas to Apollo spun up in AWS. If I didn’t have to set up that extra stuff I would have made time, but anyway, that’s the general idea.
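For reference, the kind of GeoJSON value being discussed looks like this. It is a small sketch, assuming a hypothetical truck-position document (the `truck_id` and `location` field names are made up). The one detail that is not an assumption is the coordinate order: GeoJSON points are `[longitude, latitude]`, which is an easy thing to get backwards when debugging location data.

```python
def geojson_point(longitude, latitude):
    """Build a GeoJSON Point; GeoJSON orders coordinates as [longitude, latitude]."""
    if not -180 <= longitude <= 180:
        raise ValueError(f"longitude out of range: {longitude}")
    if not -90 <= latitude <= 90:
        raise ValueError(f"latitude out of range: {latitude}")
    return {"type": "Point", "coordinates": [longitude, latitude]}


# Hypothetical truck position document; the field names are made up.
position = {"truck_id": "T-17", "location": geojson_point(-73.9857, 40.7484)}
print(position)
```

The range checks only catch swapped coordinates when the swap produces an impossible latitude, so they are a sanity check rather than a full validation.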
As far as JWT SSO troubleshooting goes, the most common issue lately is the fact that Functions are Node 10, like I mentioned above. Make sure you’re not using JWT 9, because it’s not supported right now; otherwise verify your certificate and headers with your other services.
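When verifying headers, it helps to look at what the token itself claims before blaming the trigger. A JWT is three base64url segments separated by dots, so the header and payload can be read with the standard library alone. This is a minimal inspection sketch: it does not verify the signature, and the token below is built locally with placeholder values (`my-app-id`, the `exp` timestamp) purely for illustration.

```python
import base64
import json
import time


def decode_segment(segment):
    """Decode one base64url JWT segment, restoring the padding JWTs strip off."""
    padded = segment + "=" * (-len(segment) % 4)
    return json.loads(base64.urlsafe_b64decode(padded))


def encode_segment(data):
    """Encode a dict as an unpadded base64url segment (only used to build the demo token)."""
    raw = json.dumps(data).encode()
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode()


def inspect_jwt(token, now=None):
    """Report algorithm, audience, and expiry of a JWT WITHOUT verifying its signature."""
    header_b64, payload_b64, _signature = token.split(".")
    header = decode_segment(header_b64)
    payload = decode_segment(payload_b64)
    now = time.time() if now is None else now
    exp = payload.get("exp")
    return {
        "alg": header.get("alg"),
        "aud": payload.get("aud"),
        "expired": exp is not None and exp < now,
    }


# Demo token with placeholder claims and a fake signature segment.
token = ".".join([
    encode_segment({"alg": "RS256", "typ": "JWT"}),
    encode_segment({"aud": "my-app-id", "exp": 1700000000}),
    "signature-placeholder",
])
print(inspect_jwt(token, now=1800000000))
```

If the algorithm, audience, or expiry is not what your provider configuration expects, that is worth fixing before digging into library versions.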
You can search my account on here and find my AAD and IAM tutorials for interfacing Realm with them and see if those help. If you still have problems definitely let me know and I don’t mind jumping on a discord call or something, you’re always welcome to ping me.
Regarding the dashboard stuff, yeah, it’ll definitely all work. The Data API makes log forwarding really easy and simple. I especially like routing via Python, as I have a bunch of things built with Python that go to Tkinter etc., with a lot of things handled with Pandas and TensorFlow to make stuff much easier to find and determine.
If you want I can give you a general breakdown anytime how to implement a dashboard like that, the Data API can also interface with Splunk if your company or agency uses it, too.
@UrDataGirl In response to How to troubleshoot Device Sync? - #7 by UrDataGirl
They didn’t ask for it. I mentioned it several times in meetings and no one cared to want it. Even in screen shares showing how easy it was to just click buttons and get the outputs from them, nobody wanted the dashboard for themselves. -Insert shrugs here-
@UrDataGirl regarding How to troubleshoot Device Sync? - #8 by UrDataGirl
It’s a way of getting around post limits, but that’s fine. If you want, we can probably just build an example out tonight? JLMK. But yeah, if you’d like to move this over there for a more fluid conversation, you’re welcome to message anytime.
Thank you for such a great response! Hey, do you do consulting by chance? Can I message you on the other place? (^_^) Also, one error I have is that a trigger that handles my SSO using JWT for my mobile app keeps failing and turning off. How would you troubleshoot that, and what would you use GeoJSON for? Thx in advance!
Can you also walk thru more of this dashboard to troubleshoot like can we run it to a NOC or SOC?
Ok yeah that might be it I’m on jwt 9 so I just have to downgrade it I guess? I’ll work on that and maybe reach out to you but thank you so much for your information because it’s hard to find straight forward stuff like this.
Hey wait why didn’t your coworkers use the dashboard too? @Brock
Editing posts instead of replying is weird but I like weird haha. I’m baffled that nobody else wants something like that but I’m going to reach out in discord and move this over thank you again! @Brock