Blog Post: Join performance in MongoDB 3.2 using $lookup

One of the key tenets of MongoDB schema design is to account for the absence of server-side joins. Data is joined all the time inside application code, of course, but traditionally there has been no way to perform joins within the server itself.

This changed in 3.2 with the introduction of the $lookup operator within the aggregation framework. $lookup performs the equivalent of a left outer join; that is, it retrieves matching data from another collection and returns an empty array if no match is found.

Here’s an example using the MongoDB version of the Sakila dataset that I converted from MySQL in an earlier post:

    db.films.aggregate([
        {$match: {"Actors.First name": "CHRISTIAN",
                  "Actors.Last name": "GABLE"}},
        {$lookup: {
            from: "customers",
            as: "customerData",
            localField: "_id",
            foreignField: "Rentals.filmId"
        }},
        {$unwind: "$customerData"},
        {$project: {"Title": 1,
                    "FirstName": "$customerData.First Name",
                    "LastName": "$customerData.Last Name"}}
    ])

What we’re doing here is finding all customers who have ever hired a film starring “Christian Gable”. We start by finding those films in the films collection (the $match stage), then use $lookup to retrieve customer data. The films collection embeds actors in the “Actors” array; the customers collection embeds films that have been hired in the "Rentals" array. The result of the join contains all the customers who have borrowed the movie, returned as an array, so we use the $unwind operator to “flatten” them out.
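To make the join semantics concrete, here is a minimal sketch in plain JavaScript (runnable in Node.js) that models what $lookup followed by $unwind does. It is only an illustration of the behaviour, not how MongoDB implements the stage, and the sample documents, the `lookup` and `unwind` helpers and the `rentedFilm` predicate are all invented for this example.

```javascript
// Illustrative model only: NOT MongoDB's implementation of $lookup.
// The sample documents below are made up for this sketch.
const films = [
  { _id: 1, Title: "FILM A" },
  { _id: 2, Title: "FILM B" },
];
const customers = [
  { _id: 10, "First Name": "ANN", Rentals: [{ filmId: 1 }, { filmId: 3 }] },
  { _id: 11, "First Name": "BOB", Rentals: [{ filmId: 1 }] },
];

// Left outer join: every input document is kept, and its matches
// (possibly none) are attached as an array under the `as` field.
function lookup(localDocs, foreignDocs, localField, matchesForeign, as) {
  return localDocs.map((doc) => ({
    ...doc,
    [as]: foreignDocs.filter((f) => matchesForeign(f, doc[localField])),
  }));
}

// Models foreignField "Rentals.filmId": a customer matches if ANY
// element of its Rentals array carries the given filmId.
const rentedFilm = (customer, filmId) =>
  customer.Rentals.some((r) => r.filmId === filmId);

// Default $unwind: one output document per array element;
// documents whose array is empty produce no output at all.
function unwind(docs, field) {
  return docs.flatMap((doc) => doc[field].map((v) => ({ ...doc, [field]: v })));
}

const joined = lookup(films, customers, "_id", rentedFilm, "customerData");
const flattened = unwind(joined, "customerData");

console.log(joined.map((d) => d.customerData.length)); // [ 2, 0 ]
console.log(flattened.length); // 2
```

Note that the film with no matching customers survives the join with an empty array, and only disappears once the array is unwound.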
The resulting output looks like this:

    { "_id" : 1, "Title" : "ACADEMY DINOSAUR", "FirstName" : "SUSAN", "LastName" : "WILSON" }
    { "_id" : 1, "Title" : "ACADEMY DINOSAUR", "FirstName" : "REBECCA", "LastName" : "SCOTT" }
    { "_id" : 1, "Title" : "ACADEMY DINOSAUR", "FirstName" : "DEBRA", "LastName" : "NELSON" }
    { "_id" : 1, "Title" : "ACADEMY DINOSAUR", "FirstName" : "MARIE", "LastName" : "TURNER" }
    { "_id" : 1, "Title" : "ACADEMY DINOSAUR", "FirstName" : "TINA", "LastName" : "SIMMONS" }

One thing we need to be careful about here is join performance. The $lookup stage is going to be executed once for each document returned by our $match condition. There is (AFAIK) no equivalent of a hash join or sort-merge join possible here, so we need to make sure that we've used an index.

Unfortunately, the explain() command doesn’t help us. It tells us only whether we used an index to perform the initial $match, but doesn't show whether we used an index within the $lookup. Here's the explain output from the operation above (it is long, so feel free to skip ahead):

    > db.films.explain().aggregate([
        {$match: {"Actors.First name": "CHRISTIAN",
                  "Actors.Last name": "GABLE"}},
        {$lookup: {
            from: "customers",
            as: "customerData",
            localField: "_id",
            foreignField: "Rentals.filmId"
        }},
        {$unwind: "$customerData"},
        {$project: {"Title": 1,
                    "FirstName": "$customerData.First Name",
                    "LastName": "$customerData.Last Name"}}
    ])
    {
      "waitedMS" : NumberLong(0),
      "stages" : [
        {
          "$cursor" : {
            "query" : {
              "Actors.First name" : "CHRISTIAN",
              "Actors.Last name" : "GABLE"
            },
            "fields" : {
              "Title" : 1,
              "customerData.First Name" : 1,
              "customerData.Last Name" : 1,
              "_id" : 1
            },
            "queryPlanner" : {
              "plannerVersion" : 1,
              "namespace" : "sakila.films",
              "indexFilterSet" : false,
              "parsedQuery" : {
                "$and" : [
                  {
                    "Actors.First name" : {
                      "$eq" : "CHRISTIAN"
                    }
                  },
                  {
                    "Actors.Last name" : {
                      "$eq" : "GABLE"
                    }
                  }
                ]
              },
              "winningPlan" : {
                "stage" : "COLLSCAN",
                "filter" : {
                  "$and" : [
                    {
                      "Actors.First name" : {
                        "$eq" : "CHRISTIAN"
                      }
                    },
                    {
                      "Actors.Last name" : {
                        "$eq" : "GABLE"
                      }
                    }
                  ]
                },
                "direction" : "forward"
              },
              "rejectedPlans" : [ ]
            }
          }
        },
        {
          "$lookup" : {
            "from" : "customers",
            "as" : "customerData",
            "localField" : "_id",
            "foreignField" : "Rentals.filmId",
            "unwinding" : {
              "preserveNullAndEmptyArrays" : false
            }
          }
        },
        {
          "$project" : {
            "Title" : true,
            "FirstName" : "$customerData.First Name",
            "LastName" : "$customerData.Last Name"
          }
        }
      ],
      "ok" : 1
    }

However, we can see the queries created by the $lookup stage if we enable profiling. For instance, if we turn profiling on, we can see that a full collection scan of customers is generated for every film document that is joined.

These “nested” collection scans are bad news. In a benchmark in which I joined two collections using $lookup with and without an index, the unindexed $lookup degraded steeply as the number of documents to be joined increased.

The solution is obvious: always create an index on the foreignField attributes in a $lookup, unless the collections are of trivial size.
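For readers who want to repeat the profiling check, here is a rough sketch of one way to do it from the mongo shell. The helper name `profileLookupQueries` is my own invention, it assumes the shell's `db` handle points at the database holding `films` and `customers`, and it relies on the server recording the queries issued by $lookup in the profile collection, which may differ between server versions.

```javascript
// Sketch only: a hypothetical helper, not part of MongoDB.
// `db` is the mongo shell's database handle (assumed to be the sakila database).
function profileLookupQueries(db, pipeline) {
  // Namespace of the collection on the "foreign" side of the join.
  var foreignNs = db.getName() + ".customers";

  db.setProfilingLevel(2); // level 2 records every operation
  try {
    // Exhaust the cursor so the whole pipeline actually runs.
    db.films.aggregate(pipeline).toArray();
  } finally {
    db.setProfilingLevel(0); // always switch profiling back off
  }

  // Return the most recent profile entries against the joined collection.
  var cursor = db.system.profile.find({ ns: foreignNs });
  return cursor.sort({ ts: -1 }).limit(5).toArray();
}
```

In the shell you would call `profileLookupQueries(db, [...])` with the pipeline shown earlier and inspect the returned documents for signs of a collection scan, such as a COLLSCAN stage or a large number of documents examined. The exact field names in profile entries vary by server version, so treat this as a starting point rather than a recipe.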
MongoDB Inc. is putting a lot of new features into the aggregation framework: they clearly intend to create a very powerful and flexible capability that matches, and maybe even exceeds, what can be done with SQL. Indeed, the aggregation framework seems poised to become a dataflow language similar to Pig. Anyone wanting to do serious work in MongoDB should make sure they are very comfortable with aggregate. If you use $lookup to perform joins in aggregate, make sure there is an index on the foreignField attribute.
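For the example in this post, creating that index might look like the sketch below. The helper name `ensureLookupIndex` is hypothetical, and it assumes the shell's `db` handle points at the database containing the `customers` collection; `createIndex` and `getIndexes` are the standard shell collection methods.

```javascript
// Sketch only: a hypothetical helper for the mongo shell.
// `db` is assumed to be the database that contains the customers collection.
function ensureLookupIndex(db) {
  // Index the foreignField used by the $lookup ("Rentals.filmId").
  // Because Rentals is an array, this will be a multikey index.
  db.customers.createIndex({ "Rentals.filmId": 1 });

  // Return the collection's index definitions so the caller can confirm it exists.
  return db.customers.getIndexes();
}
```

Calling `ensureLookupIndex(db)` in the shell and then re-running the aggregation with profiling enabled is a reasonable way to confirm that the queries against customers no longer scan the whole collection.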
