AML Network Analysis with $graphLookup

Use cases: Single View

Industries: Financial Services

Products: MongoDB Atlas

Financial criminals operate in networks. Money laundering, fraud rings, and sanctions evasion rely on webs of shell companies, proxy directors, and layered transaction paths. Compliance teams need to map and analyze the relationship network around a suspect entity.

Traditional approaches run into two bottlenecks:

  • Expensive queries: Relational databases use recursive Common Table Expressions (CTEs). These queries become expensive beyond two or three hops.

  • Constant network round-trips: Client-side graph processing pulls data into the application layer. The client runs a new query for each hop.

In this solution, you:

  • Build a network analysis engine that uses MongoDB operators to avoid these bottlenecks.

  • Run multi-hop graph traversal, shortest-path detection, centrality scoring, community detection, and risk propagation directly inside MongoDB’s aggregation framework.

This solution focuses on the network analysis engine and the MongoDB operators that power it. To implement the overall architecture with MongoDB Atlas, FastAPI, and Next.js, see the earlier Solutions Library entry, Financial Crime Mitigation.

  • Store entities and relationships in MongoDB alongside search indexes, vector embeddings, and change streams.

  • Avoid synchronizing data between two systems, managing extra infrastructure, and incurring cross-system latency because you keep graph workloads in MongoDB.

  • Use MongoDB’s $graphLookup operator to run breadth-first search with standard indexed queries at each hop, so performance depends on your indexes instead of a separate graph engine.

  • For the 1-to-5-hop traversals common in compliance investigations, this approach matches dedicated graph databases while querying the same data your application already reads and writes.

Note

In compliance investigations, you typically trace a few steps out from a subject. Immediate counterparties and first-layer intermediaries sit 1–2 hops away, while shell structures and money-movement paths usually fall within 3–5 hops. Beyond that, links grow less meaningful and harder to explain, so most practical investigations stay in this 1–5-hop range.

The network analysis engine sits between the FastAPI service layer and MongoDB Atlas and processes graph operations through the aggregation pipeline.


Figure 1. Agentic Investigation Pipeline

Dual $graphLookup Pattern

Money laundering networks involve bidirectional flows. An entity can source funds in one relationship and receive them in another.

A single $graphLookup traverses edges in one direction only. This solution executes two parallel lookups — forward and reverse — and merges the results into a unified network graph.

pipeline = [
    {"$match": {"entityId": center_entity_id}},
    # Forward traversal: follow source → target edges
    {"$graphLookup": {
        "from": "relationships",
        "startWith": "$entityId",
        "connectFromField": "target.entityId",
        "connectToField": "source.entityId",
        "as": "forward_relationships",
        "maxDepth": max_depth - 1,
        "restrictSearchWithMatch": {
            "active": True,
            "confidence": {"$gte": min_confidence}
        }
    }},
    # Reverse traversal: follow target → source edges
    {"$graphLookup": {
        "from": "relationships",
        "startWith": "$entityId",
        "connectFromField": "source.entityId",
        "connectToField": "target.entityId",
        "as": "reverse_relationships",
        "maxDepth": max_depth - 1,
        "restrictSearchWithMatch": {
            "active": True,
            "confidence": {"$gte": min_confidence}
        }
    }},
    # Merge both directions
    {"$project": {
        "entityId": 1,
        "all_relationships": {
            "$concatArrays": [
                "$forward_relationships",
                "$reverse_relationships"
            ]
        }
    }},
    {"$unwind": "$all_relationships"},
    {"$replaceRoot": {"newRoot": "$all_relationships"}},
    {"$limit": max_relationships}
]

Figure 2. Dual $graphLookup — Bidirectional Network Discovery

The application receives the complete network graph in a single round-trip. Both traversals run in one aggregation pipeline, which removes the N+1 query problem of issuing a separate follow-up query for each document returned in the initial query.
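Because the forward and reverse lookups are concatenated, the same relationship can appear twice when the network contains a cycle. The following is a minimal client-side sketch of deduplicating the merged result, assuming each document carries the relationshipId field from the relationship schema. The function names dedupe_relationships and node_ids are hypothetical and do not come from the repository.

```python
def dedupe_relationships(docs, key="relationshipId"):
    """Keep the first occurrence of each relationship, preserving order."""
    seen = set()
    unique = []
    for doc in docs:
        if doc[key] not in seen:
            seen.add(doc[key])
            unique.append(doc)
    return unique


def node_ids(docs):
    """Collect every entityId that appears as a source or a target."""
    ids = set()
    for doc in docs:
        ids.add(doc["source"]["entityId"])
        ids.add(doc["target"]["entityId"])
    return ids


# Hypothetical merged output: REL_1 was found by both traversals.
merged = [
    {"relationshipId": "REL_1", "source": {"entityId": "A"}, "target": {"entityId": "B"}},
    {"relationshipId": "REL_2", "source": {"entityId": "B"}, "target": {"entityId": "A"}},
    {"relationshipId": "REL_1", "source": {"entityId": "A"}, "target": {"entityId": "B"}},
]
unique = dedupe_relationships(merged)
```

Note that the pipeline applies $limit before any client-side deduplication, so the number of unique relationships can be lower than max_relationships.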

How $graphLookup Traverses the Graph

$graphLookup performs a breadth-first search (BFS) in discrete waves:

  • Seed: Evaluate startWith against the input document. Array values seed the frontier simultaneously (multi-root BFS).

  • Query: Construct { connectToField: { $in: [frontier_values] }}, merged with any restrictSearchWithMatch filter. Execute as a standard indexed query against the from collection.

  • Expand: For each matched document not in the visited set, add it to results, extract connectFromField values, and push them into the next frontier.

  • Repeat: Increment depth. Return to the Query step until the frontier empties or maxDepth is reached.

  • Assemble: Place all accumulated documents into an array under the as field.

Cycle detection is automatic. An internal visited set prevents infinite loops in cyclic graphs (A → B → C → A), which are common in money laundering structures. Each document appears in results exactly once.
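The waves above can be illustrated with a small in-memory model. This is a toy sketch in plain Python, not how the server implements $graphLookup. The helper names get_path and graph_lookup are hypothetical, and the visited set is keyed on relationshipId here as an assumption for the example.

```python
from typing import Any


def get_path(doc: dict, dotted: str) -> Any:
    """Read a dotted field path such as 'source.entityId' from a nested dict."""
    for key in dotted.split("."):
        doc = doc[key]
    return doc


def graph_lookup(edges, start, connect_from, connect_to, max_depth=None):
    """Toy breadth-first traversal that mirrors the Seed/Query/Expand/Repeat waves."""
    # Seed: a scalar or an array of values starts the frontier.
    frontier = set(start) if isinstance(start, (list, set, tuple)) else {start}
    visited, results, depth = set(), [], 0
    while frontier and (max_depth is None or depth <= max_depth):
        next_frontier = set()
        for edge in edges:
            # Query: connect_to must match a frontier value.
            if get_path(edge, connect_to) not in frontier:
                continue
            # Expand: skip documents already in the visited set (cycle protection).
            if edge["relationshipId"] in visited:
                continue
            visited.add(edge["relationshipId"])
            results.append({**edge, "depth": depth})
            next_frontier.add(get_path(edge, connect_from))
        # Repeat: move to the next wave.
        frontier, depth = next_frontier, depth + 1
    return results


EDGES = [
    {"relationshipId": "R1", "source": {"entityId": "A"}, "target": {"entityId": "B"}},
    {"relationshipId": "R2", "source": {"entityId": "B"}, "target": {"entityId": "C"}},
    {"relationshipId": "R3", "source": {"entityId": "C"}, "target": {"entityId": "A"}},  # closes the cycle
]
forward = graph_lookup(EDGES, "A", connect_from="target.entityId", connect_to="source.entityId")
```

The traversal terminates even though A → B → C → A is a cycle, and each relationship is returned once with the depth at which it was first reached.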

The restrictSearchWithMatch Advantage

restrictSearchWithMatch pushes filter criteria into the traversal itself, not as a post-filter. MongoDB prunes dead branches during traversal rather than discovering the full graph and filtering afterward. For large networks, this can reduce the working set by an order of magnitude.


Figure 3. Filtering During Traversal vs. Post-Filtering
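The difference between filtering during traversal and filtering afterward can be shown with a toy example. This sketch uses plain tuples of (source, target, confidence) and a hypothetical traverse helper. It models the behavior described above and is not the server's implementation.

```python
def traverse(edges, start, keep=lambda edge: True):
    """Forward BFS over (source, target, confidence) tuples.

    `keep` plays the role of restrictSearchWithMatch: edges that fail it are
    never followed, so nothing behind them is explored.
    """
    frontier, seen, found = {start}, set(), []
    while frontier:
        next_frontier = set()
        for edge in edges:
            source, target, _confidence = edge
            if source in frontier and edge not in seen and keep(edge):
                seen.add(edge)
                found.append(edge)
                next_frontier.add(target)
        frontier = next_frontier
    return found


EDGES = [
    ("A", "B", 0.95),
    ("B", "C", 0.40),  # weak link
    ("C", "D", 0.90),  # strong, but only reachable through the weak link
    ("C", "E", 0.85),
]


def high_confidence(edge):
    return edge[2] >= 0.8


pruned = traverse(EDGES, "A", keep=high_confidence)
post_filtered = [edge for edge in traverse(EDGES, "A") if high_confidence(edge)]
```

Here the pruned traversal stops at the weak B → C link and touches one edge, while the post-filtered version walks all four edges and still returns C → D and C → E, which are reachable only through a low-confidence hop.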

The solution uses two collections: entities and relationships. They follow an adjacency list pattern. Edge metadata (confidence, evidence, verification status) lives as first-class fields on the relationship document.

Relationship Schema

{
    "relationshipId": "REL_8910",
    "source": {
        "entityId": "ENT_123",
        "entityType": "individual"
    },
    "target": {
        "entityId": "ENT_456",
        "entityType": "organization"
    },
    "type": "beneficial_owner_of",
    "direction": "directed",
    "strength": 0.85,
    "confidence": 0.95,
    "active": true,
    "verified": true,
    "evidence": [
        {
            "evidence_type": "corporate_registry",
            "confidence": 0.95,
            "source": "Companies House UK"
        }
    ],
    "datasource": "KYC_onboarding"
}
  • Nested source/target references: The source.entityId and target.entityId structure maps directly to $graphLookup's connectFromField and connectToField parameters. It also preserves entity type metadata at the edge level without requiring a join.

  • Separate strength and confidence fields: Strength captures how inherently strong the relationship is, such as a UBO holding 90% ownership versus a distant business associate. Confidence captures trust in the data point, such as a link verified by two independent sources versus one inferred from a shared address. Risk propagation uses both values differently.

  • active boolean for soft-delete: AML systems require audit trails. Delete a relationship by setting active: false rather than removing the document. This flag also serves as a traversal filter in restrictSearchWithMatch.

Clone the repository and follow the setup instructions in the GitHub README:

git clone https://github.com/mongodb-industry-solutions/fsi-aml-fraud-detection.git
cd fsi-aml-fraud-detection/aml-backend
poetry install
poetry run uvicorn main:app --host 0.0.0.0 --port 8001 --reload

The following subsections walk through the six core graph operations implemented in the NetworkRepository.

1
  • $graphLookup issues a { connectToField: { $in: [frontier] } } query at every BFS wave. Index connectToField on both traversal directions:

db.relationships.createIndex({
    "source.entityId": 1,
    "active": 1,
    "confidence": -1
});
db.relationships.createIndex({
    "target.entityId": 1,
    "active": 1,
    "confidence": -1
});

Index advantage is strongest at 1–3 hops and diminishes as depth increases. The 1-to-5-hop traversals typical of AML investigations sit in or near that sweet spot.
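If you create the indexes from the Python backend rather than the shell, the same definitions can be expressed as lists of (field, direction) pairs, which is the form PyMongo's Collection.create_index accepts. The sketch below assumes `relationships` is a PyMongo collection. The names TRAVERSAL_INDEXES and ensure_traversal_indexes are hypothetical.

```python
ASCENDING, DESCENDING = 1, -1  # same values as pymongo.ASCENDING / pymongo.DESCENDING

# One compound index per traversal direction, mirroring the shell commands above.
TRAVERSAL_INDEXES = [
    [("source.entityId", ASCENDING), ("active", ASCENDING), ("confidence", DESCENDING)],
    [("target.entityId", ASCENDING), ("active", ASCENDING), ("confidence", DESCENDING)],
]


def ensure_traversal_indexes(relationships):
    """Create both indexes and return whatever create_index returns for each.

    `relationships` is assumed to be a PyMongo Collection (for example
    client["mydb"]["relationships"]); any object with a compatible
    create_index method works.
    """
    return [relationships.create_index(keys) for keys in TRAVERSAL_INDEXES]
```

Creating an index that already exists with the same definition is a no-op in MongoDB, so a helper like this can run at application startup.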

2
  • Use the dual $graphLookup pipeline described in Reference Architectures to build complete bidirectional networks in a single aggregation.

  • The /network/{entity_id} endpoint exposes this operation with configurable depth, confidence threshold, and maximum node count.

3
  • Determine whether a low-risk customer connects to a sanctioned entity — and reconstruct the exact chain.

  • Use depthField to annotate each discovered relationship with its hop distance:

pipeline = [
    {"$match": {"entityId": source_entity_id}},
    # Forward paths: follow source → target edges
    {"$graphLookup": {
        "from": "relationships",
        "startWith": "$entityId",
        "connectFromField": "target.entityId",
        "connectToField": "source.entityId",
        "as": "forward_paths",
        "maxDepth": max_depth - 1,
        "depthField": "depth"
    }},
    # Reverse paths: follow target → source edges
    {"$graphLookup": {
        "from": "relationships",
        "startWith": "$entityId",
        "connectFromField": "source.entityId",
        "connectToField": "target.entityId",
        "as": "reverse_paths",
        "maxDepth": max_depth - 1,
        "depthField": "depth"
    }},
    {"$project": {
        "all_paths": {"$concatArrays": ["$forward_paths", "$reverse_paths"]}
    }},
    {"$unwind": "$all_paths"},
    # Keep only relationships that touch the target entity
    {"$match": {"$or": [
        {"all_paths.source.entityId": target_entity_id},
        {"all_paths.target.entityId": target_entity_id}
    ]}},
    # The smallest depth is the shortest hop distance
    {"$sort": {"all_paths.depth": 1}},
    {"$limit": 1}
]

The result identifies the shortest depth. A second bounded $graphLookup at that depth reconstructs the full relationship chain.
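The reconstruction query itself is not reproduced here. As an illustration of what reconstructing the chain means, the following sketch rebuilds one shortest directed chain client-side from relationship documents that have already been fetched. It is an alternative written for this example, not the repository's second $graphLookup, and the function name shortest_chain is hypothetical.

```python
from collections import deque


def shortest_chain(relationships, start_id, goal_id):
    """Return the relationship documents along one shortest source → target chain.

    Returns an empty list when start and goal are the same entity, and None
    when no directed chain exists.
    """
    outgoing = {}
    for rel in relationships:
        outgoing.setdefault(rel["source"]["entityId"], []).append(rel)

    reached_by = {start_id: None}  # entityId -> relationship used to reach it
    queue = deque([start_id])
    while queue:
        node = queue.popleft()
        if node == goal_id:
            chain = []
            while reached_by[node] is not None:
                rel = reached_by[node]
                chain.append(rel)
                node = rel["source"]["entityId"]
            return chain[::-1]
        for rel in outgoing.get(node, []):
            nxt = rel["target"]["entityId"]
            if nxt not in reached_by:
                reached_by[nxt] = rel
                queue.append(nxt)
    return None


def _rel(rel_id, source, target):
    return {"relationshipId": rel_id, "source": {"entityId": source}, "target": {"entityId": target}}


RELS = [_rel("R1", "A", "B"), _rel("R2", "B", "C"), _rel("R3", "A", "D"),
        _rel("R4", "D", "E"), _rel("R5", "E", "C")]
chain = shortest_chain(RELS, "A", "C")
```

Breadth-first search guarantees that the first time the goal is dequeued, the recorded chain uses the fewest hops.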

4
  • Use $facet to run five analyses on the same entity set in a single pipeline: risk distribution, entity type breakdown, hub detection, prominence scoring, and basic metrics. The following excerpt shows three of the five facets:

stats_pipeline = [
    {"$match": {"entityId": {"$in": network_entity_ids}}},
    {"$addFields": {
        "connection_count": {"$size": {"$ifNull": ["$connected_entities", []]}}
    }},
    {"$facet": {
        "basic_stats": [{"$group": {
            "_id": None,
            "total_nodes": {"$sum": 1},
            "avg_risk_score": {"$avg": "$riskAssessment.overall.score"},
            "max_risk_score": {"$max": "$riskAssessment.overall.score"}
        }}],
        "risk_distribution": [
            {"$group": {"_id": "$riskAssessment.overall.level", "count": {"$sum": 1}}},
            {"$sort": {"_id": 1}}
        ],
        "hub_entities": [
            {"$match": {"connection_count": {"$gte": 2}}},
            {"$sort": {"connection_count": -1}},
            {"$limit": 5},
            {"$project": {"entityId": 1, "name": 1, "connection_count": 1}}
        ]
    }}
]

Without $facet, each analysis requires a separate pipeline. $facet passes the same input documents to every sub-pipeline within one stage and returns all results in a single response. This executes in 2–5 milliseconds.
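A $facet stage emits a single document whose fields are arrays, one per sub-pipeline. The sketch below shows one way to flatten that document for an API response. The function name summarize_facets and the sample values (risk levels, entity name) are hypothetical and chosen only for illustration.

```python
def summarize_facets(facet_doc):
    """Flatten the single document produced by the $facet stage."""
    # basic_stats groups on _id: None, so it holds at most one document
    # (and none at all when the $match stage finds no entities).
    basic = facet_doc["basic_stats"][0] if facet_doc["basic_stats"] else {}
    return {
        "total_nodes": basic.get("total_nodes", 0),
        "avg_risk_score": basic.get("avg_risk_score"),
        "max_risk_score": basic.get("max_risk_score"),
        "risk_distribution": {row["_id"]: row["count"] for row in facet_doc["risk_distribution"]},
        "hub_entity_ids": [row["entityId"] for row in facet_doc["hub_entities"]],
    }


# Hypothetical $facet output for a three-entity network.
sample = {
    "basic_stats": [{"_id": None, "total_nodes": 3, "avg_risk_score": 55.0, "max_risk_score": 90}],
    "risk_distribution": [{"_id": "high", "count": 1}, {"_id": "low", "count": 2}],
    "hub_entities": [{"entityId": "ENT_123", "name": "Example Holdings", "connection_count": 4}],
}
summary = summarize_facets(sample)
```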

5
  • Identify the entities that connect the most suspicious actors.

  • The centrality pipeline uses $facet to aggregate outgoing and incoming connections separately, then merges and scores.

  • The $switch operator assigns risk weights to each relationship type directly inside the aggregation:

"outgoing_risk_weighted": {
    "$sum": {"$multiply": [
        "$confidence",
        {"$switch": {
            "branches": [
                {"case": {"$in": ["$type", [
                    "confirmed_same_entity",
                    "business_associate_suspected"
                ]]}, "then": 0.9},
                {"case": {"$in": ["$type", [
                    "director_of", "ubo_of",
                    "parent_of_subsidiary"
                ]]}, "then": 0.7},
                {"case": {"$in": ["$type", [
                    "household_member",
                    "professional_colleague_public"
                ]]}, "then": 0.3}
            ],
            "default": 0.5
        }}
    ]}
}
  • A confirmed_same_entity connection contributes far more to risk-weighted centrality than a household_member link.

  • The final composite score blends normalized degree centrality (40%), average confidence weight (30%), and risk-weighted centrality (30%) — all computed server-side.
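To make the arithmetic concrete, here is a client-side sketch of the same scoring logic. The solution computes these values inside the aggregation. This Python version only illustrates the weights. Normalizing degree and risk-weighted centrality by the network maximum is an assumption, and the function names are hypothetical.

```python
# Weights copied from the $switch branches above; 0.5 is the default branch.
TYPE_WEIGHTS = {
    "confirmed_same_entity": 0.9,
    "business_associate_suspected": 0.9,
    "director_of": 0.7,
    "ubo_of": 0.7,
    "parent_of_subsidiary": 0.7,
    "household_member": 0.3,
    "professional_colleague_public": 0.3,
}
DEFAULT_WEIGHT = 0.5


def risk_weighted_sum(relationships):
    """Sum of confidence * type weight, as in the outgoing_risk_weighted accumulator."""
    return sum(
        rel["confidence"] * TYPE_WEIGHTS.get(rel["type"], DEFAULT_WEIGHT)
        for rel in relationships
    )


def composite_centrality(degree, max_degree, avg_confidence, risk_weighted, max_risk_weighted):
    """Blend degree (40%), average confidence (30%), and risk-weighted centrality (30%).

    Assumption: degree and risk-weighted values are scaled to 0..1 by dividing
    by the largest value observed in the network.
    """
    norm_degree = degree / max_degree if max_degree else 0.0
    norm_risk = risk_weighted / max_risk_weighted if max_risk_weighted else 0.0
    return 0.4 * norm_degree + 0.3 * avg_confidence + 0.3 * norm_risk


score = composite_centrality(degree=5, max_degree=10, avg_confidence=0.8,
                             risk_weighted=2.0, max_risk_weighted=4.0)
```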

6
  • Community detection builds an adjacency map via aggregation, filtering on confidence >= 0.7 to draw community boundaries around high-confidence relationships only:

adjacency_pipeline = [
    {"$match": {
        "$or": [
            {"source.entityId": {"$in": entity_ids}},
            {"target.entityId": {"$in": entity_ids}}
        ],
        "active": True,
        "confidence": {"$gte": 0.7}
    }},
    {"$group": {
        "_id": "$source.entityId",
        "connections": {"$addToSet": "$target.entityId"}
    }}
]
  • $addToSet deduplicates connections automatically — two entities sharing both shared_address and business_associate relationships appear as a single connection.
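The text does not say which algorithm turns the adjacency map into communities. As one simple possibility, the sketch below groups entities into connected components, treating each relationship as undirected. The choice of connected components and the function name communities are assumptions for illustration, not the repository's method.

```python
def communities(adjacency_docs):
    """Group entities into connected components.

    `adjacency_docs` has the shape produced by adjacency_pipeline:
    [{"_id": <source entityId>, "connections": [<target entityId>, ...]}, ...]
    """
    neighbors = {}
    for doc in adjacency_docs:
        source = doc["_id"]
        neighbors.setdefault(source, set())
        for target in doc["connections"]:
            # Record the edge in both directions so components ignore direction.
            neighbors[source].add(target)
            neighbors.setdefault(target, set()).add(source)

    assigned, groups = set(), []
    for node in sorted(neighbors):
        if node in assigned:
            continue
        stack, group = [node], set()
        while stack:
            current = stack.pop()
            if current in group:
                continue
            group.add(current)
            stack.extend(neighbors[current] - group)
        assigned |= group
        groups.append(sorted(group))
    return groups


adjacency = [
    {"_id": "A", "connections": ["B"]},
    {"_id": "B", "connections": ["C"]},
    {"_id": "D", "connections": ["E"]},
]
groups = communities(adjacency)
```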

Risk propagation applies exponential decay through the relationship chain after $graphLookup discovers the network:

propagated_risk = (
    parent_entity_risk
    * propagation_factor       # decay per hop (default: 0.5)
    * relationship_confidence  # trust level for this edge
    * type_risk_weight         # domain-specific weight
)

A sanctioned entity's shell company (high confidence, high-risk relationship type) receives nearly the full risk score. Their accountant's social media connection receives almost nothing. The traversal is breadth-first, depth-limited to 3 hops, and stops when the propagated score drops below a configurable threshold.
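The following sketch applies the formula above across a small in-memory network, using the breadth-first order, the 3-hop limit, and a score threshold as described. The flat edge shape, the function name propagate_risk, the threshold value, and the sample type weights are all assumptions made for this example.

```python
from collections import deque


def propagate_risk(edges, source_id, source_risk, type_weights,
                   propagation_factor=0.5, max_hops=3, min_score=1.0,
                   default_weight=0.5):
    """Breadth-first risk propagation with per-hop decay.

    `edges` is a list of dicts with "source", "target", "confidence", "type".
    Returns {entityId: best propagated score}, including the source itself.
    """
    scores = {source_id: source_risk}
    queue = deque([(source_id, 0)])
    while queue:
        node, hops = queue.popleft()
        if hops == max_hops:
            continue  # depth limit reached
        for edge in edges:
            if edge["source"] != node:
                continue
            score = (scores[node] * propagation_factor * edge["confidence"]
                     * type_weights.get(edge["type"], default_weight))
            target = edge["target"]
            # Stop below the threshold, and never overwrite a higher score.
            if score < min_score or score <= scores.get(target, 0.0):
                continue
            scores[target] = score
            queue.append((target, hops + 1))
    return scores


# Hypothetical weights and network.
WEIGHTS = {"ubo_of": 0.9, "professional_colleague_public": 0.3, "social": 0.3}
NETWORK = [
    {"source": "sanctioned", "target": "shell", "confidence": 0.95, "type": "ubo_of"},
    {"source": "sanctioned", "target": "accountant", "confidence": 0.9,
     "type": "professional_colleague_public"},
    {"source": "accountant", "target": "friend", "confidence": 0.3, "type": "social"},
]
risk = propagate_risk(NETWORK, "sanctioned", 100.0, WEIGHTS)
```

In this example the accountant's contact falls below the threshold and is never scored, while the shell company keeps a much larger share of the original score than the accountant does.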

  • Use dual $graphLookup for bidirectional network discovery: Run two parallel lookups, forward and reverse, and merge the results with $concatArrays to capture the full network in one aggregation.

  • Push filters into the traversal with restrictSearchWithMatch: Prune dead branches during BFS instead of filtering the full graph afterward to reduce the working set for large networks.

  • Create compound indexes on connectToField and your traversal filters: $graphLookup issues a $match/$in query at every BFS wave, so indexing connectToField together with active and confidence prevents collection scans at each hop.

  • Compute network analytics server‑side with $facet and $switch: Run risk distribution, hub detection, and centrality scoring in parallel sub‑pipelines, and use $switch to assign domain‑specific risk weights to relationship types inside the aggregation.

  • Apply exponential decay for risk propagation: Combine $graphLookup-based network discovery with a per‑hop decay formula that includes relationship confidence and type so the model automatically distinguishes high‑risk structural connections from low‑risk social links.

  • Luis Pazmiño

  • Mehar Grewal

  • Andrea Alaman Calderon
