Slow, poor performance aggregating a time series collection

I have per second data in a time series collection like this:

{
  TimeStamp: ISODate("2023-05-10T22:34:52.875Z"),
  Heater2DutyC: 0,
  CellPressure: 0,
  _id: ObjectId("645c1e066b393fd1d7f1e9f8")
  // ...more fields following the same pattern
}

I’m pulling data out with an aggregation pipeline using a $bucket stage, like this:


{
  "$bucket": {
    "groupBy": "$TimeStamp",
    "boundaries": [
      ISODate("2023-05-24T21:44:09.222Z"),
      ISODate("2023-05-24T21:44:10.222Z"),
      ISODate("2023-05-24T21:44:11.222Z"),
      ISODate("2023-05-24T21:44:12.222Z"),
      ISODate("2023-05-24T21:44:13.222Z"),
      ISODate("2023-05-24T21:44:14.222Z"),
      ISODate("2023-05-24T21:44:15.222Z"),
      ISODate("2023-05-24T21:44:16.222Z"),
      ISODate("2023-05-24T21:44:17.222Z"),
      ISODate("2023-05-24T21:44:18.222Z"),
      ISODate("2023-05-24T21:44:19.222Z"),
      ISODate("2023-05-24T21:44:20.222Z"),
      ISODate("2023-05-24T21:44:21.222Z"),
      ISODate("2023-05-24T21:44:22.222Z"),
      ISODate("2023-05-24T21:44:23.222Z"),
      ISODate("2023-05-24T21:44:24.222Z"),
      ISODate("2023-05-24T21:44:25.222Z"),
      ISODate("2023-05-24T21:44:26.222Z"),
      ISODate("2023-05-24T21:44:27.222Z"),
      ISODate("2023-05-24T21:44:28.222Z"),
      ISODate("2023-05-24T21:44:29.222Z"),
      ISODate("2023-05-24T21:44:30.222Z"),
      ISODate("2023-05-24T21:44:31.222Z"),
      ISODate("2023-05-24T21:44:32.222Z"),
      ISODate("2023-05-24T21:44:33.222Z"),
      ISODate("2023-05-24T21:44:34.222Z"),
      ISODate("2023-05-24T21:44:35.222Z"),
      ISODate("2023-05-24T21:44:36.222Z"),
      ISODate("2023-05-24T21:44:37.222Z"),
      ISODate("2023-05-24T21:44:38.222Z"),
      ISODate("2023-05-24T21:44:39.222Z"),
      ISODate("2023-05-24T21:44:40.222Z"),
      ISODate("2023-05-24T21:44:41.222Z"),
      ISODate("2023-05-24T21:44:42.222Z"),
      ISODate("2023-05-24T21:44:43.222Z"),
      ISODate("2023-05-24T21:44:44.222Z"),
      ISODate("2023-05-24T21:44:45.222Z"),
      ISODate("2023-05-24T21:44:46.222Z"),
      ISODate("2023-05-24T21:44:47.222Z"),
      ISODate("2023-05-24T21:44:48.222Z"),
      ISODate("2023-05-24T21:44:49.222Z"),
      ISODate("2023-05-24T21:44:50.222Z"),
      ISODate("2023-05-24T21:44:51.222Z"),
      ISODate("2023-05-24T21:44:52.222Z"),
      ISODate("2023-05-24T21:44:53.222Z"),
      ISODate("2023-05-24T21:44:54.222Z"),
      ISODate("2023-05-24T21:44:55.222Z"),
      ISODate("2023-05-24T21:44:56.222Z"),
      ISODate("2023-05-24T21:44:57.222Z"),
      ISODate("2023-05-24T21:44:58.222Z"),
      ISODate("2023-05-24T21:44:59.222Z"),
      ISODate("2023-05-24T21:45:00.222Z"),
      ISODate("2023-05-24T21:45:01.222Z"),
      ISODate("2023-05-24T21:45:02.222Z"),
      ISODate("2023-05-24T21:45:03.222Z"),
      ISODate("2023-05-24T21:45:04.222Z"),
      ISODate("2023-05-24T21:45:05.222Z"),
      ISODate("2023-05-24T21:45:06.222Z"),
      ISODate("2023-05-24T21:45:07.222Z"),
      ISODate("2023-05-24T21:45:08.222Z"),
      ISODate("2023-05-24T21:45:09.222Z"),
      ISODate("2023-05-24T21:45:10.222Z")
    ],
    "default": "overflow",
    "output": {
      "count": {
        "$sum": 1
      },
      "Average": {
        "$avg": "$ConcentrationNO2"
      },
      "TimeStamp": {
        "$push": "$TimeStamp"
      }
    }
  }
}

Except I can have up to 400 aggregations.
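For reference, I don’t write those boundaries out by hand; I generate the array in code. Roughly, the helper does this (a JavaScript sketch of the logic, not my actual C# code):

```javascript
// Build bucketCount + 1 consecutive boundaries, each stepSeconds apart,
// starting at `start`. N buckets need N + 1 boundaries.
function getBoundaries(start, stepSeconds, bucketCount) {
  const boundaries = [];
  for (let i = 0; i <= bucketCount; i++) {
    boundaries.push(new Date(start.getTime() + i * stepSeconds * 1000));
  }
  return boundaries;
}

// 60 one-second buckets, as in the pipeline above.
const boundaries = getBoundaries(new Date("2023-05-24T21:44:09.222Z"), 1, 60);
```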

The problem is that in my tests (I’m hoping to switch from MS SQL Server), data retrieval is slower in MongoDB than in SQL Server. This is especially noticeable as the number of buckets increases.

My MongoDB collection is defined as a time series with seconds granularity (the data arrives at one-second intervals, give or take a few ms).

{
  "name": "crdsperiodic",
  "type": "timeseries",
  "options": {
    "expireAfterSeconds": 15552000,
    "timeseries": {
      "timeField": "TimeStamp",
      "granularity": "seconds",
      "bucketMaxSpanSeconds": 3600
    }
  }
}

Any idea what’s going on here?

Chris

Hello 👋 @Chris_Swainson,

Thank you for reaching out to the MongoDB Community forums!

Based on the aggregation pipeline you shared, it seems that you are trying to bucket the data into consecutive one-second intervals. However, to better understand the issue, could you please provide more details about your requirements and the expected output result?

It would be helpful if you could share the expected output example document. Additionally, please let us know which version of MongoDB you are using and the deployment environment (e.g., on-prem, MongoDB Atlas, or local).

Regarding your statement about having up to 400 aggregations, could you please clarify what exactly you mean? Any additional context or examples would be helpful.

Could you please give an example of the scenario that you have in mind?

Also, it would be helpful to know the volume of data you are working with and any indexes you have set up.

To read more, please refer to the MongoDB Time Series documentation.

Best regards,
Kushagra

Hi Kushagra,

Thanks for getting back to me promptly.

Version:

{
  version: '6.0.5',
  gitVersion: 'c9a99c120371d4d4c52cbb15dac34a36ce8d3b1d',
  targetMinOS: 'Windows 7/Windows Server 2008 R2',
  modules: [],
  allocator: 'tcmalloc',
  javascriptEngine: 'mozjs',
  sysInfo: 'deprecated',
  versionArray: [ 6, 0, 5, 0 ],
  openssl: { running: 'Windows SChannel' },
  buildEnvironment: {
    distmod: 'windows',
    distarch: 'x86_64',
    cc: 'cl: Microsoft (R) C/C++ Optimizing Compiler Version 19.31.31107 for x64',
    ccflags: '/nologo /WX /FImongo/platform/basic.h /fp:strict /EHsc /W3 /wd4068 /wd4244 /wd4267 /wd4290 /wd4351 /wd4355 /wd4373 /wd4800 /wd4251 /wd4291 /we4013 /we4099 /we4930 /errorReport:none /MD /O2 /Oy- /bigobj /utf-8 /permissive- /Zc:__cplusplus /Zc:sizedDealloc /volatile:iso /diagnostics:caret /std:c++17 /Gw /Gy /Zc:inline',
    cxx: 'cl: Microsoft (R) C/C++ Optimizing Compiler Version 19.31.31107 for x64',
    cxxflags: '/TP',
    linkflags: '/nologo /DEBUG /INCREMENTAL:NO /LARGEADDRESSAWARE /OPT:REF',
    target_arch: 'x86_64',
    target_os: 'windows',
    cppdefines: 'SAFEINT_USE_INTRINSICS 0 PCRE_STATIC NDEBUG BOOST_ALL_NO_LIB _UNICODE UNICODE _SILENCE_CXX17_ALLOCATOR_VOID_DEPRECATION_WARNING _SILENCE_CXX17_OLD_ALLOCATOR_MEMBERS_DEPRECATION_WARNING _SILENCE_CXX17_CODECVT_HEADER_DEPRECATION_WARNING _SILENCE_ALL_CXX20_DEPRECATION_WARNINGS _CONSOLE _CRT_SECURE_NO_WARNINGS _ENABLE_EXTENDED_ALIGNED_STORAGE _SCL_SECURE_NO_WARNINGS _WIN32_WINNT 0x0A00 BOOST_USE_WINAPI_VERSION 0x0A00 NTDDI_VERSION 0x0A000000 BOOST_THREAD_VERSION 5 BOOST_THREAD_USES_DATETIME BOOST_SYSTEM_NO_DEPRECATED BOOST_MATH_NO_LONG_DOUBLE_MATH_FUNCTIONS BOOST_ENABLE_ASSERT_DEBUG_HANDLER BOOST_LOG_NO_SHORTHAND_NAMES BOOST_LOG_USE_NATIVE_SYSLOG BOOST_LOG_WITHOUT_THREAD_ATTR ABSL_FORCE_ALIGNED_ACCESS'
  },
  bits: 64,
  debug: false,
  maxBsonObjectSize: 16777216,
  storageEngines: [ 'devnull', 'ephemeralForTest', 'wiredTiger' ],
  ok: 1
}

It’s running locally. Same machine as SQL.

Basically I’m storing per-second time series data, and I have to pull out data for different time periods: sometimes up to one week, but normally just a few hours. However, I only ever need a maximum of 400 data points. So if the timespan is 4,000 seconds, each retrieved value becomes an average over 10 seconds; if the timespan is 40,000 seconds, each data point becomes an average over 100 seconds, and so on. Note that if the timespan is 1 minute, all 60 per-second data points are pulled out (since 60 is under the 400 maximum). In SQL Server I had a query that did all this averaging, and everything was stored in one table.
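To spell out the arithmetic (a sketch of the sizing rule, not code from my app):

```javascript
// Given a requested timespan in seconds and a cap of 400 output points,
// each returned point averages over ceil(spanSeconds / 400) seconds,
// with a floor of 1 second since the raw data is per-second.
function bucketWidthSeconds(spanSeconds, maxPoints = 400) {
  return Math.max(1, Math.ceil(spanSeconds / maxPoints));
}
```

So a 4,000-second span gives 10-second buckets, a 40,000-second span gives 100-second buckets, and a 60-second span stays at 1-second resolution.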

As for indexes, this is all I have (just the default time field from the time series options):

"timeseries": {
      "timeField": "TimeStamp",

I’ve looked through the documentation but nothing really strikes me as being helpful.

I hope that further info helps.

Chris

Hello @Chris_Swainson,

Thank you for providing the information about your current setup. To better understand the problem and provide relevant suggestions, it would be helpful if you could share the schema or table structure of your SQL database, along with some example data, since it sounds like you’re more familiar with the SQL solution than with MongoDB.

Additionally, it would be helpful if you could provide the SQL query you are currently using to retrieve the desired results.

Having this information will allow us to understand your issue more effectively and provide you with appropriate recommendations.

Best regards,
Kushagra

Sure, here is the table:

CREATE TABLE [dbo].[HighFreq](
	[TimeStamp] [datetime] NOT NULL,
	[Heater1Temp] [float] NULL,
	[Heater2Temp] [float] NULL,
	[Heater1DutyC] [float] NULL,
	[Heater2DutyC] [float] NULL,
	[ChassisPressure] [float] NULL
	--more variables here following same pattern
PRIMARY KEY CLUSTERED 
(
	[TimeStamp] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON, OPTIMIZE_FOR_SEQUENTIAL_KEY = OFF) ON [PRIMARY]
) ON [PRIMARY]
GO

This is a SQL query that does the averaging I require.

SQL = "SELECT Average, TimeStamp FROM ( " +
    "SELECT " +
    "ROW_NUMBER() OVER ( " +
           "ORDER BY [TimeStamp]) AS Rownum, " +
    $"AVG({sVariable}) " +
    $"OVER (ORDER BY [TimeStamp] ROWS BETWEEN {mod / 2} PRECEDING AND {mod / 2} FOLLOWING) " +
    "AS Average, [TimeStamp] FROM [Aether].[dbo].[HighFreq] WHERE [TimeStamp] " +
    $">= '{tmpStartDate}' AND [TimeStamp] <= '{tmpEndDate}' " +
    ") AS " +
    "firstQuery " +
    "WHERE " +
    $"Rownum % {mod} = 0 " +
    "ORDER BY TimeStamp DESC";

Also, I should note that in MongoDB I’m doing two sorts. Given that time series data is stored in time order, I’m wondering whether these are really required:

var matchStage = new BsonDocument
{
    {
        "$match", new BsonDocument
        {
            {"TimeStamp", new BsonDocument
                {
                    {"$gte", new DateTime(2023, 1, 1, 0, 0, 0, DateTimeKind.Utc)},
                    {"$lte", new DateTime(2023, 3, 31, 23, 59, 59, DateTimeKind.Utc)}
                }
            }
        }
    }
};

var sortStage = new BsonDocument
{
    {
        "$sort", new BsonDocument
        {
            {"TimeStamp", 1}
        }
    }
};

var bucketStage = new BsonDocument
{
    {
        "$bucket", new BsonDocument
        {
            {"groupBy", "$TimeStamp"},
            {"boundaries", new BsonArray(GetBoundaries(X))},
            {"default", "overflow"},
            {
                "output", new BsonDocument
                {
                    {"count", new BsonDocument {{"$sum", 1}}},
                    {"Average", new BsonDocument {{"$avg", "$Heater1Temp"}}},
                    {"TimeStamp", new BsonDocument {{"$push", "$TimeStamp"}}}
                }
            }
        }
    }
};

var projectStage = new BsonDocument
{
    {
        "$project", new BsonDocument
        {
            {"_id", 0},
            {"TimeStamp", new BsonDocument {{"$first", "$TimeStamp"}}},
            {"Average", 1}
        }
    }
};

var finalSortStage = new BsonDocument
{
    {
        "$sort", new BsonDocument
        {
            {"TimeStamp", -1}
        }
    }
};

var pipeline = new[]
{
    matchStage,
    sortStage,
    bucketStage,
    projectStage,
    finalSortStage
};

var result = collection.Aggregate<BsonDocument>(pipeline).ToList();
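One thing I’ve been meaning to try, though I haven’t benchmarked it yet: replacing $bucket with a $group on a $dateTrunc key. That avoids shipping a boundaries array of up to 400 elements to the server, and $dateTrunc (MongoDB 5.0+, so my 6.0.5 qualifies) takes a binSize directly. A mongosh-style sketch for 10-second buckets (field names from my schema; untested against my data):

```javascript
// Alternative to $bucket: group on a truncated timestamp instead of
// explicit boundaries. binSize/unit would come from the 400-point rule.
const binSize = 10;
const pipeline = [
  { $match: { TimeStamp: { $gte: new Date("2023-05-24T21:44:09.222Z"),
                           $lt:  new Date("2023-05-24T21:45:10.222Z") } } },
  { $group: { _id: { $dateTrunc: { date: "$TimeStamp", unit: "second", binSize } },
              Average: { $avg: "$Heater1Temp" },
              count: { $sum: 1 } } },
  { $sort: { _id: -1 } },
  { $project: { _id: 0, TimeStamp: "$_id", Average: 1, count: 1 } }
];
// Then: db.crdsperiodic.aggregate(pipeline)
```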

Hope that helps.

Chris