Quick Start: Golang & MongoDB - A Quick Look at GridFS

Alain Mbuku

#GridFS #golang
Quick Start Go and MongoDB

This post continues the Getting Started with Go and MongoDB series, which uses the official Go driver. In this tutorial, we’re going to look at how to use MongoDB's GridFS feature to store files larger than 16 MB, the maximum size of a single BSON document. Some use cases for GridFS are:

  • If your filesystem limits the number of files in a directory, you can use GridFS to store as many files as needed.
  • When you want to access information from portions of large files without having to load files into memory, you can use GridFS to recall sections of files without reading the entire file into memory.
  • When you want to keep your files and metadata automatically synced and deployed across a number of systems and facilities, you can use GridFS. When using geographically distributed replica sets, MongoDB can distribute files and their metadata automatically to a number of mongod instances and facilities.

For more information about GridFS, head over to this page.
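To make the second use case more concrete, here is a small sketch of the arithmetic behind reading only part of a file. It assumes the GridFS default chunk size of 255 KiB; the function names chunkCount and chunkRange are made up for this illustration and are not part of the driver.

```go
package main

import "fmt"

// defaultChunkSize is the GridFS default chunk size: 255 KiB.
const defaultChunkSize = 255 * 1024

// chunkCount returns how many chunks a file of the given size occupies.
func chunkCount(fileSize, chunkSize int64) int64 {
	return (fileSize + chunkSize - 1) / chunkSize
}

// chunkRange returns the first and last chunk index that hold the bytes
// [offset, offset+length) of a file. It assumes length > 0.
func chunkRange(offset, length, chunkSize int64) (first, last int64) {
	first = offset / chunkSize
	last = (offset + length - 1) / chunkSize
	return first, last
}

func main() {
	const fileSize = 100 * 1024 * 1024 // a hypothetical 100 MiB file
	fmt.Println("chunks in file:", chunkCount(fileSize, defaultChunkSize))

	// Reading 1 MiB starting at the 50 MiB mark touches only a few chunks.
	first, last := chunkRange(50*1024*1024, 1024*1024, defaultChunkSize)
	fmt.Printf("chunks needed: %d through %d\n", first, last)
}
```

Only the chunks in that range have to be fetched, which is why a section of a large file can be served without loading the rest.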

Project Tooling

To follow along with this tutorial, you should know the basics of Go (the Go programming language) and MongoDB CRUD operations. For tooling, the minimum requirements are:

Sign up for MongoDB Atlas today and get started with a Free M0 tier.

Assuming that the development environment is already set up, let's start by installing the necessary dependencies (if this is the first post you’re reading, I’d recommend checking out Nic’s first Quick Start post):

Project folder structure (mongo-gridfs):

  • main.go
  • testfile.zip (you can use any file larger than 16 MB; in this tutorial, we use the VS Code binary, renamed to testfile.zip)

Get dependencies

go get go.mongodb.org/mongo-driver/mongo
go get go.mongodb.org/mongo-driver/mongo/gridfs

Since the text editor used in this tutorial is VS Code, the rest of the dependencies will be imported automatically by the VS Code Go extension. VS Code is not the only editor that can be used. There are plenty of alternative text editors and IDEs to choose from, such as Atom, GoLand by JetBrains, and Sublime Text; feel free to choose the one you feel comfortable with.

The next step is to connect to the cluster. Let’s create a function that initiates the connection. It goes in main.go.

func InitiateMongoClient() *mongo.Client {
    uri := "mongodb://localhost:27017"
    opts := options.Client().ApplyURI(uri).SetMaxPoolSize(5)
    client, err := mongo.Connect(context.Background(), opts)
    if err != nil {
        // Without a client nothing else can work, so stop here
        // instead of returning a nil client to the caller.
        log.Fatal(err)
    }
    return client
}

Notice that the URI above points to localhost. If you have an Atlas account, you can replace it with the connection string provided in the Atlas portal.
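If you switch between a local server and Atlas often, one option is to read the connection string from the environment rather than editing the code. The sketch below is only a suggestion: the variable name MONGODB_URI and the helper mongoURI are assumptions made for this example, not something the driver requires.

```go
package main

import (
	"fmt"
	"os"
)

// mongoURI returns the connection string found under MONGODB_URI
// (a variable name chosen for this sketch), or the local default
// when that variable is empty or unset. The lookup function is a
// parameter so that os.Getenv can be swapped out.
func mongoURI(lookup func(string) string) string {
	if uri := lookup("MONGODB_URI"); uri != "" {
		return uri
	}
	return "mongodb://localhost:27017"
}

func main() {
	// Note: a real connection string may contain credentials,
	// so avoid printing it outside of a demo like this one.
	fmt.Println("connecting to:", mongoURI(os.Getenv))
}
```

The returned string could then be passed to ApplyURI in place of the hard-coded value.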

First, we'll read the file that will be uploaded. Let’s create a function called UploadFile and add the following lines:

func UploadFile(file, filename string) {

    data, err := ioutil.ReadFile(file)
    if err != nil {
        log.Fatal(err)
    }
    ...
}

The UploadFile function could get by with a single parameter, but as we progress it will become clear why it takes two.

Database Connection

Let’s connect to the database named myfiles and create a GridFS bucket on it with the NewBucket function by adding the following lines to the UploadFile function. If myfiles does not exist, MongoDB creates it when the first file is written.

conn := InitiateMongoClient()
bucket, err := gridfs.NewBucket(
    conn.Database("myfiles"),
)
if err != nil {
    // log.Fatal prints the error and exits with status 1,
    // so no separate os.Exit call is needed.
    log.Fatal(err)
}

File Streaming

Before we move forward, let’s talk briefly about file streaming. Put simply, it’s the process of taking a file, cutting it into chunks, and serializing each chunk. To better understand I/O streams, check out this link.
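To get a feel for the idea, here is a minimal sketch using only the standard library. It copies from a reader to a writer one fixed-size chunk at a time; copyInChunks and roundTrip are hypothetical helpers written for this post, and the in-memory buffer stands in for whatever destination you stream to.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
	"strings"
)

// copyInChunks reads src in pieces of at most chunkSize bytes and writes
// each piece to dst, so the whole input never sits in memory at once.
// It returns how many chunks were written and the total number of bytes.
// It assumes chunkSize > 0.
func copyInChunks(dst io.Writer, src io.Reader, chunkSize int) (int, int64, error) {
	buf := make([]byte, chunkSize)
	chunks, total := 0, int64(0)
	for {
		n, readErr := io.ReadFull(src, buf)
		if n > 0 {
			if _, writeErr := dst.Write(buf[:n]); writeErr != nil {
				return chunks, total, writeErr
			}
			chunks++
			total += int64(n)
		}
		// io.EOF: nothing left; io.ErrUnexpectedEOF: a short final chunk.
		if readErr == io.EOF || readErr == io.ErrUnexpectedEOF {
			return chunks, total, nil
		}
		if readErr != nil {
			return chunks, total, readErr
		}
	}
}

// roundTrip streams input through copyInChunks into an in-memory buffer
// and returns the chunk count plus whatever arrived on the other side.
func roundTrip(input string, chunkSize int) (int, string, error) {
	var dst bytes.Buffer
	chunks, _, err := copyInChunks(&dst, strings.NewReader(input), chunkSize)
	return chunks, dst.String(), err
}

func main() {
	chunks, out, err := roundTrip("hello gridfs streaming", 8)
	if err != nil {
		fmt.Println("copy failed:", err)
		return
	}
	fmt.Printf("%d chunks, received %q\n", chunks, out)
}
```

Because the destination is just an io.Writer, the same loop works for any type with a Write method.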

Now we need to open the upload stream, which sets the file name and other metadata. Recall that the UploadFile function takes two parameters: file is the path of the file to read, and filename is the name under which it is stored in the database. Let's open the stream with the OpenUploadStream function.

uploadStream, err := bucket.OpenUploadStream(
    filename, // the name under which the file is saved in the database
)
if err != nil {
    log.Fatal(err)
}
defer uploadStream.Close()

// Write returns the number of bytes written.
fileSize, err := uploadStream.Write(data)
if err != nil {
    log.Fatal(err)
}
log.Printf("Write file to DB was successful. File size: %d bytes\n", fileSize)

GridFS Files

When a file is uploaded to MongoDB, two collections are automatically created in the database: fs.files and fs.chunks. Metadata such as the file size, file name, and upload date is stored in the fs.files collection, and the file content is split into multiple chunks stored in fs.chunks, each of which references the ID of its fs.files document.
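The relationship between the two collections can be sketched with plain Go structs. This is a simplified model and not the driver's own types: fileDoc and chunkDoc are names invented here, an int stands in for the ObjectID, and only the main fields (filename, length, chunkSize in fs.files; files_id, n, data in fs.chunks) are included.

```go
package main

import "fmt"

// fileDoc mirrors the main fields of a document in fs.files.
type fileDoc struct {
	ID        int    // stands in for the ObjectID MongoDB would generate
	Filename  string // "filename"
	Length    int64  // "length": total size in bytes
	ChunkSize int    // "chunkSize": size of every chunk except possibly the last
}

// chunkDoc mirrors the main fields of a document in fs.chunks.
type chunkDoc struct {
	FilesID int    // "files_id": points back at fileDoc.ID
	N       int    // "n": position of this chunk in the file, starting at 0
	Data    []byte // "data": the bytes of this chunk
}

// split builds one fs.files-style record and the matching fs.chunks-style
// records for data. It assumes chunkSize > 0.
func split(id int, name string, data []byte, chunkSize int) (fileDoc, []chunkDoc) {
	file := fileDoc{ID: id, Filename: name, Length: int64(len(data)), ChunkSize: chunkSize}
	var chunks []chunkDoc
	for n, start := 0, 0; start < len(data); n, start = n+1, start+chunkSize {
		end := start + chunkSize
		if end > len(data) {
			end = len(data)
		}
		chunks = append(chunks, chunkDoc{FilesID: id, N: n, Data: data[start:end]})
	}
	return file, chunks
}

// join reassembles the content, assuming chunks are already ordered by N.
func join(chunks []chunkDoc) []byte {
	var out []byte
	for _, c := range chunks {
		out = append(out, c.Data...)
	}
	return out
}

func main() {
	file, chunks := split(1, "testfile.zip", []byte("0123456789"), 4)
	fmt.Printf("%s: %d bytes in %d chunks\n", file.Filename, file.Length, len(chunks))
	fmt.Printf("reassembled: %s\n", join(chunks))
}
```

Downloading a file amounts to finding its fs.files document, fetching the chunks with the matching files_id, and concatenating them in order of n.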

The next thing is to create the download function to make sure that we can, in fact, retrieve the file exactly as it was uploaded. This one is fairly straightforward. The Go driver provides a way to fetch files either by file name or by ObjectID.

Below is a function to query and download the file using the file name.

 
func DownloadFile(fileName string) {

	conn := InitiateMongoClient()

	// For CRUD operations, here is an example
	db := conn.Database("myfiles")
	fsFiles := db.Collection("fs.files")
	ctx, cancel := context.WithTimeout(context.Background(), 10*time.Second)
	defer cancel() // release the timeout's resources when we return
	var results bson.M
	err := fsFiles.FindOne(ctx, bson.M{}).Decode(&results)
	if err != nil {
		log.Fatal(err)
	}
	// you can print out the result
	fmt.Println(results)

	bucket, err := gridfs.NewBucket(
		db,
	)
	if err != nil {
		log.Fatal(err)
	}
	var buf bytes.Buffer
	// DownloadToStreamByName returns the number of bytes written to buf.
	size, err := bucket.DownloadToStreamByName(fileName, &buf)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Printf("Downloaded file size: %v bytes\n", size)
	if err := ioutil.WriteFile(fileName, buf.Bytes(), 0600); err != nil {
		log.Fatal(err)
	}
}

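Something to note here: DownloadToStreamByName takes the file name and an io.Writer to write the content to (here, a pointer to a bytes.Buffer), and it returns the number of bytes written.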
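Since the goal of the download is to get back the same file we uploaded, it can be worth comparing checksums of the original and the downloaded copy. The sketch below uses only the standard library; fileChecksum and writeTemp are helpers written for this post, and the two temporary files are stand-ins for the real original and download.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"io"
	"log"
	"os"
)

// fileChecksum streams the file at path through SHA-256 and returns the
// digest as a hex string, without loading the whole file into memory.
func fileChecksum(path string) (string, error) {
	f, err := os.Open(path)
	if err != nil {
		return "", err
	}
	defer f.Close()

	h := sha256.New()
	if _, err := io.Copy(h, f); err != nil {
		return "", err
	}
	return hex.EncodeToString(h.Sum(nil)), nil
}

// writeTemp creates a temporary file holding content and returns its path.
// It only exists so this sketch can run without a real upload and download.
func writeTemp(content string) (string, error) {
	f, err := os.CreateTemp("", "gridfs-sketch-*")
	if err != nil {
		return "", err
	}
	defer f.Close()
	if _, err := f.WriteString(content); err != nil {
		return "", err
	}
	return f.Name(), nil
}

func main() {
	// Stand-ins for the original file and the copy written by DownloadFile.
	original, err := writeTemp("pretend this is testfile.zip")
	if err != nil {
		log.Fatal(err)
	}
	defer os.Remove(original)
	downloaded, err := writeTemp("pretend this is testfile.zip")
	if err != nil {
		log.Fatal(err)
	}
	defer os.Remove(downloaded)

	a, errA := fileChecksum(original)
	b, errB := fileChecksum(downloaded)
	if errA != nil || errB != nil {
		fmt.Println("checksum failed:", errA, errB)
		return
	}
	fmt.Println("files match:", a == b)
}
```

In a real run you would pass the path of the uploaded file and the path of the downloaded one, saved under different names so the download does not overwrite the original.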

Full main.go

To wrap things up, here is the full project:


package main

import (
    "bytes"
    "context"
    "fmt"
    "io/ioutil"
    "log"
    "os"
    "path"
    "time"

    "go.mongodb.org/mongo-driver/bson"
    "go.mongodb.org/mongo-driver/mongo"
    "go.mongodb.org/mongo-driver/mongo/gridfs"
    "go.mongodb.org/mongo-driver/mongo/options"
)

func InitiateMongoClient() *mongo.Client {
    uri := "mongodb://localhost:27017"
    opts := options.Client().ApplyURI(uri).SetMaxPoolSize(5)
    client, err := mongo.Connect(context.Background(), opts)
    if err != nil {
        // Without a client nothing else can work, so stop here
        // instead of returning a nil client to the caller.
        log.Fatal(err)
    }
    return client
}

func UploadFile(file, filename string) {

    data, err := ioutil.ReadFile(file)
    if err != nil {
        log.Fatal(err)
    }
    conn := InitiateMongoClient()
    bucket, err := gridfs.NewBucket(
        conn.Database("myfiles"),
    )
    if err != nil {
        // log.Fatal prints the error and exits with status 1.
        log.Fatal(err)
    }
    uploadStream, err := bucket.OpenUploadStream(
        filename,
    )
    if err != nil {
        log.Fatal(err)
    }
    defer uploadStream.Close()

    // Write returns the number of bytes written.
    fileSize, err := uploadStream.Write(data)
    if err != nil {
        log.Fatal(err)
    }
    log.Printf("Write file to DB was successful. File size: %d bytes\n", fileSize)
}

func DownloadFile(fileName string) {
    conn := InitiateMongoClient()

    // For CRUD operations, here is an example
    db := conn.Database("myfiles")
    fsFiles := db.Collection("fs.files")
    ctx, cancel := context.WithTimeout(context.Background(), 10*time.Second)
    defer cancel() // release the timeout's resources when we return
    var results bson.M
    err := fsFiles.FindOne(ctx, bson.M{}).Decode(&results)
    if err != nil {
        log.Fatal(err)
    }
    // you can print out the results
    fmt.Println(results)

    bucket, err := gridfs.NewBucket(
        db,
    )
    if err != nil {
        log.Fatal(err)
    }
    var buf bytes.Buffer
    // DownloadToStreamByName returns the number of bytes written to buf.
    size, err := bucket.DownloadToStreamByName(fileName, &buf)
    if err != nil {
        log.Fatal(err)
    }
    fmt.Printf("Downloaded file size: %v bytes\n", size)
    if err := ioutil.WriteFile(fileName, buf.Bytes(), 0600); err != nil {
        log.Fatal(err)
    }
}

func main() {
    // Expect the path of the file to upload as the first argument.
    if len(os.Args) < 2 {
        log.Fatal("usage: go run main.go <file>")
    }
    file := os.Args[1] // e.g. testfile.zip
    filename := path.Base(file)
    UploadFile(file, filename)
    // To download instead, comment out the UploadFile call above and uncomment the line below.
    //DownloadFile(filename)
}

To test it out, run the following command from the terminal (make sure you are in the directory that contains main.go and testfile.zip):

go run main.go testfile.zip

Go GridFS sample script run command

We can also check this in the GUI with MongoDB Compass. You can download it here. The image below shows the metadata: the ID, the file length, the upload date, and the file name. You could also add a content type.

MongoDB Compass file confirmation
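On the content type mentioned above: the Go standard library can guess one from the first bytes of a file. Here is a minimal sketch; detectContentType is a thin wrapper written for this post, and how you attach the result to the stored file (for example, in its metadata document) is left as an assumption.

```go
package main

import (
	"fmt"
	"net/http"
)

// detectContentType guesses the MIME type from the leading bytes of a file.
// http.DetectContentType considers at most the first 512 bytes and falls
// back to "application/octet-stream" when it cannot tell.
func detectContentType(head []byte) string {
	return http.DetectContentType(head)
}

func main() {
	zipHeader := []byte("PK\x03\x04") // the magic bytes a .zip archive starts with
	fmt.Println(detectContentType(zipHeader))
}
```

In UploadFile, the data slice returned by ioutil.ReadFile could be passed to this function directly, since only its first bytes are inspected.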

Conclusion

In this tutorial, we learned how to store files in MongoDB which are larger than 16 MB using the GridFS service with the official MongoDB Go driver. We also visualized the uploaded file using MongoDB Compass.

GridFS is very useful for streaming services, temporarily serving large files, and similar workloads. With GridFS the possibilities are limitless. Next up, we will learn how to perform client-side encryption using Go's AES package before loading the file into the database.

Link to source: GitHub repo