MongoDB is known as a scalable, document-oriented database. It provides easy scalability, sharding, failover and many more cool features. Moreover, replication and master-slave configuration are relatively easy to set up.
Start the MongoDB server:
./mongod
The /data/db folder must exist and be writable, and port 27017 must be available for mongod to use. Mongo also enables a web interface for status and administrative information.
MongoDB uses the admin database for administrative tasks such as authentication. There is also the local database, whose data is stored locally only and is not distributed. Then there is the config database, which stores the configuration information for the database server.
MongoDB uses collections, which correspond to tables in the relational model, and documents, which correspond to rows. Even though no schema is enforced, it is definitely important to keep each document's model consistent with the others. Storing each kind of document in its own collection, in a hierarchy, is essential as well. For example, even though you can store any kind of document within a single collection, it doesn't make much sense, because later on you will have to sort them out while querying the collection.
In my opinion the document-based model fits the object-oriented model better than the relational model does. The fit is not perfect, but thinking about a document as an object is much more sensible.
File = {
"FileName": "Users.csv",
"FileSize": 1024
};
This is our File document. Entries within this document can be thought of as key-value pairs, or as variables and their values.
If you are in the mongo shell:
# db.files.insert(File);
This command inserts the File document into the files collection. Whenever a document is added to a collection, it gets a document id: the "_id" key, which holds an object id.
# db.files.find();
The find command is executed on a collection and returns all the documents within that collection. find(predicate) also takes a predicate and returns only the documents that satisfy it.
Ex: db.files.find({"FileName": "MongoDB"});
There is also a findOne() method, which returns a single document.
The update(predicate, data) method takes two arguments: a predicate that selects the documents to update, and the data we want to update them with.
The remove() method is used to remove all the documents from a collection, while remove(predicate) removes only the documents that satisfy the predicate.
# db.files.remove({"FileName": "Mysql"});
MongoDB has 6 basic data types, which are as follows: null, boolean, numeric (double), string (UTF-8), array (object array) and object. Moreover, MongoDB supports embedded documents, i.e. a document within another document.
MongoDB produces a globally unique object id for each document, which I think is a great feature. You don't have to worry about synchronizing ids across multiple servers.
The object id is built as a hierarchy: timestamp + machine id + process id + increment. Including all this information in the object id ensures that it is globally unique and that there won't be any collisions. In my experience, generating unique ids such as GUIDs is computationally expensive. Therefore, MongoDB lets clients generate the object id themselves and inject it into their documents. This takes the load of that expensive operation off the database.
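The layout above can be sketched in plain JavaScript. This is only an illustration of how the four parts can be packed into one id. The field widths (4 + 3 + 2 + 3 bytes), the helper names and the sample values are assumptions made for this example, and it is not the code any driver actually runs.

```javascript
// Sketch of the object id layout described above (not a driver API).
// Assumed widths: 4-byte timestamp + 3-byte machine id
// + 2-byte process id + 3-byte increment = 12 bytes.
function toHex(value, bytes) {
  // Zero-pad to a fixed width and keep only the lowest `bytes` bytes.
  return value.toString(16).padStart(bytes * 2, "0").slice(-bytes * 2);
}

function makeObjectId(seconds, machineId, processId, counter) {
  return (
    toHex(seconds, 4) +
    toHex(machineId, 3) +
    toHex(processId, 2) +
    toHex(counter, 3)
  );
}

// Hypothetical sample values, chosen only for illustration.
const idA = makeObjectId(1262304000, 0xabcdef, 4242, 1);
const idB = makeObjectId(1262304000, 0xabcdef, 4242, 2);
console.log(idA, idB);
```

Two ids generated in the same second, on the same machine and in the same process still differ, because the increment part changes.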
MongoDB supports batch inserts besides the regular inserts in which you insert documents one by one. A batch insert reduces the communication between the client and the server, so it is way faster than inserting one document at a time. One drawback of inserting into MongoDB is that there is no data validation, which means you can insert malformed or malicious data into your database, and if you do the validation yourself, operations will be slower.
# db.foo.insert({"hello": "world"});
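To make the batch insert argument concrete, here is a small sketch that counts client/server round trips. FakeServer is a hypothetical stand-in written for this post, not a real driver, and it assumes one network message per insert call.

```javascript
// Sketch: count client/server round trips for single vs batch inserts.
// FakeServer is a hypothetical stand-in, not a real driver.
class FakeServer {
  constructor() {
    this.roundTrips = 0;
    this.docs = [];
  }
  insert(docOrDocs) {
    this.roundTrips += 1; // one message per call, however many documents
    this.docs.push(...[].concat(docOrDocs)); // accepts one document or an array
  }
}

const docs = Array.from({ length: 100 }, (_, i) => ({ n: i }));

const oneByOne = new FakeServer();
for (const doc of docs) {
  oneByOne.insert(doc);
}

const batched = new FakeServer();
batched.insert(docs);

console.log(oneByOne.roundTrips, batched.roundTrips);
```

Both servers end up with the same documents, but the batched one needed a single round trip.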
Removing documents from a collection is very simple:
# db.foo.remove();
This command clears the content of the collection but doesn't clear the indexes. Weird, right?
The remove(predicate) method takes one or more predicates and removes the items that satisfy those conditions.
Clearing a collection by removing all of its elements is computationally expensive. Dropping the collection, however, is very fast.
Updates are a very interesting topic in databases and concurrency. Here is how MongoDB handles them: the last update wins. There is no notion of transactions or locking, so whichever update executes last is the one that stays. Updates to a single document are atomic.
Update has a special operator, $set, which is used to add a property to a document. In other words, you can change the schema of the document while updating it with this operator. Likewise, you can remove a key from the document, and so change its schema, with the $unset operator.
The $inc operator increments a key that holds an integer. This is very useful for counters, analytics etc.
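As a rough illustration of what these three operators do to a document, here is a sketch that applies them to a plain JavaScript object. applyUpdate is a hypothetical helper written for this post and the Owner and Downloads keys are made up. It only mimics the effect of the operators and is not how the server implements them.

```javascript
// Sketch: apply $set, $unset and $inc to a plain object, mimicking
// what the operators described above do to a stored document.
function applyUpdate(doc, update) {
  const result = { ...doc }; // shallow copy; the input is left untouched
  for (const [key, value] of Object.entries(update.$set || {})) {
    result[key] = value; // add the key or overwrite it
  }
  for (const key of Object.keys(update.$unset || {})) {
    delete result[key]; // drop the key entirely
  }
  for (const [key, amount] of Object.entries(update.$inc || {})) {
    result[key] = (result[key] || 0) + amount; // missing keys start at 0
  }
  return result;
}

const file = { FileName: "Users.csv", FileSize: 1024 };
const updated = applyUpdate(file, {
  $set: { Owner: "firat" },
  $unset: { FileSize: 1 },
  $inc: { Downloads: 1 },
});
console.log(updated);
```

After the update the document has gained Owner and Downloads and lost FileSize, which is exactly the kind of schema change described above.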
MongoDB doesn't support transactions, which is considered a missing feature. But look at one of the most popular database engines: MySQL's MyISAM engine doesn't support transactions either, and it is used very widely.
MongoDB uses the BSON data format, which is binary JSON. Even though BSON takes more space than JSON, it has certain benefits. One is performance: processing BSON is more efficient than processing JSON. Another is that every client knows how to convert BSON to its own data format for processing.
MongoDB uses dynamic querying and in-place updates. In-place updates are very efficient and are done with lazy writes: because accessing and writing to disk is very slow, and MongoDB uses memory-mapped files to store data, updating in place can increase MongoDB's performance. The downside is that the data MongoDB is updating might be stale for a very short time. CouchDB, by contrast, uses MVCC, which keeps multiple versions of the data from different users. With MVCC, the data is guaranteed to be up to date.
Storing binary data in MongoDB is also easy. MongoDB supports binary data storage up to 4MB, and if you need to store bigger files, MongoDB partitions the file for you and stores it in chunks. Then, when you run range queries, you can get the chunks of the data and merge them or use them as you need.
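The chunking idea can be sketched as follows. This is an assumption-laden toy: CHUNK_SIZE, splitIntoChunks and mergeChunks are names invented for this example, the chunk size is tiny so the output is easy to read, and it relies on Node's Buffer rather than on anything MongoDB provides.

```javascript
// Sketch: split binary data into fixed-size chunks and merge them back.
// CHUNK_SIZE is a hypothetical value picked for this example.
const CHUNK_SIZE = 4;

function splitIntoChunks(data, chunkSize) {
  const chunks = [];
  for (let offset = 0; offset < data.length; offset += chunkSize) {
    // n records the chunk's position so the file can be reassembled in order.
    chunks.push({ n: chunks.length, data: data.subarray(offset, offset + chunkSize) });
  }
  return chunks;
}

function mergeChunks(chunks) {
  const ordered = [...chunks].sort((a, b) => a.n - b.n);
  return Buffer.concat(ordered.map((chunk) => chunk.data));
}

const original = Buffer.from("binary payload");
const chunks = splitIntoChunks(original, CHUNK_SIZE);
const restored = mergeChunks(chunks);
console.log(chunks.length, restored.toString());
```

Because every chunk carries its position, a range of chunks can be fetched on its own and still be put back in the right order.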
MongoDB supports manual and auto sharding. Sharding is basically partitioning your data sets across a set of servers. With manual sharding, you are responsible for deciding which server the data goes to, as well as for merging the results of your queries. With auto sharding, MongoDB takes care of everything for you, and you have the impression of talking to a single server. Although this has not been implemented yet, it promises great flexibility for scaling.
With the following commands you can get some information about your database and environment.
One of the great features of MongoDB is its schemaless database design: you don't need to define a schema as you do with relational database management systems. This is a great strength while you are developing your applications. The first benefit is coping with changing requirements and specifications, the second is coping with the changing nature of your data, and another is being able to store different data for the same data types.
MongoDB uses collections to store documents. A collection is a similar notion to a table in relational database terminology. Collections are containers for documents. They are dynamic: they grow as you add more documents to them. There are also capped collections, which are fixed-size collections that hold a limited amount of documents.
The data types in MongoDB are the following: string, integer, double, boolean, min/max keys, arrays, timestamp, object, ObjectId, null, symbol, date, regular expressions, JavaScript code and binary data.
ObjectId is a special type. MongoDB creates an ObjectId automatically for every document. An ObjectId is a 12-byte value, usually shown as a 24-character hex string, which is built from a timestamp, a machine id, etc. This guarantees that ObjectIds are unique across all the servers. Moreover, in a sharded, clustered environment, managing auto-generated primary key ids is very difficult, and auto-generated ObjectIds take care of the document ids. This is a great feature in my opinion.
You can use indexes in MongoDB to increase your read and query efficiency. However, keep in mind that adding indexes to a collection increases your read speed but might affect your write speed. You will take a performance hit on your collection if you write a lot of data and read seldom. This is due to the data structures used by the indexes, i.e. B-trees, which have to be updated on writes.
With MongoDB, unlike relational databases, you don't have to create databases or collections up front. If a database or collection doesn't exist, MongoDB will create it for you when you execute your code.
In order to view current databases use the following:
# show dbs
Then if you want to use a specific database:
# use myDb
Once you are in a database context, you can view the collections as follows:
# show collections
Once you switch to a database, the current database is referred to as db.
While working with MongoDB collections, you will often search for documents. find() is used to find documents, and it also takes predicates to match the documents being searched for.
# db.items.find({Foo : "bar"});
This returns the items whose Foo attribute is "bar".
While searching, you can limit or skip documents as well.
# db.items.find().limit(10);
# db.items.find().skip(20);
You can also sort your results:
# db.items.find().sort({Foo : 1});
One of the nice features is that you can chain these function calls and make the query fluent.
# db.items.find().sort({Foo : 1}).limit(10).skip(20);
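Here is a small in-memory sketch of how such a chained call can work. QuerySketch is a hypothetical class written for this post, not a MongoDB class. It assumes the options are applied as sort, then skip, then limit, regardless of the order in which they were chained, which is my understanding of how the shell's cursor behaves.

```javascript
// Sketch: a tiny in-memory stand-in for a chained query.
// QuerySketch is a hypothetical helper, not a MongoDB class.
class QuerySketch {
  constructor(docs) {
    this.docs = docs;
    this.sortSpec = null;
    this.skipCount = 0;
    this.limitCount = Infinity;
  }
  // Each method records its option and returns `this`, enabling chaining.
  sort(spec) { this.sortSpec = spec; return this; }
  skip(n) { this.skipCount = n; return this; }
  limit(n) { this.limitCount = n; return this; }

  toArray() {
    const out = [...this.docs];
    if (this.sortSpec) {
      const [key, direction] = Object.entries(this.sortSpec)[0];
      out.sort((a, b) => (a[key] < b[key] ? -1 : a[key] > b[key] ? 1 : 0) * direction);
    }
    // Applied as sort, then skip, then limit, whatever order they were chained in.
    return out.slice(this.skipCount, this.skipCount + this.limitCount);
  }
}

const items = [5, 3, 1, 4, 2].map((n) => ({ Foo: n }));
const page = new QuerySketch(items).sort({ Foo: 1 }).limit(2).skip(1).toArray();
console.log(page);
```

Returning `this` from every method is what makes the fluent style possible; nothing is computed until the results are actually requested.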
The notion of capped collections in MongoDB comes in handy when you want documents to keep their natural insertion order. When you insert documents into a regular collection, they are not guaranteed to stay in order, possibly because data can be relocated as the collection outgrows its allocated space. A capped collection is a fixed-size collection, just like a fixed-size circular buffer: once the capped collection is full, new items cause the oldest items to be purged. You can define the size of a capped collection. Capped collections guarantee the insertion order. Updating a document within a capped collection is possible if and only if the document size remains the same, and you cannot delete a document from a capped collection. To do either of these, you will have to re-create your capped collection and insert the documents again.
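The circular buffer behaviour can be sketched in a few lines. CappedSketch is a hypothetical class made up for this post. It caps the collection by document count to keep the example short, whereas a real capped collection is sized in bytes.

```javascript
// Sketch: a capped collection modelled as a fixed-size buffer.
// CappedSketch is a hypothetical class; this toy version counts
// documents instead of bytes.
class CappedSketch {
  constructor(maxDocs) {
    this.maxDocs = maxDocs;
    this.docs = [];
  }
  insert(doc) {
    this.docs.push(doc);
    if (this.docs.length > this.maxDocs) {
      this.docs.shift(); // the oldest document is purged to make room
    }
  }
  find() {
    return [...this.docs]; // always in insertion order
  }
}

const log = new CappedSketch(3);
for (let i = 1; i <= 5; i++) {
  log.insert({ seq: i });
}
console.log(log.find());
```

After five inserts into a collection capped at three documents, only the last three remain, still in the order they were inserted.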
To find out more about your collection, you can invoke the validate() method on it.
As stated, you can use the find() method to query your documents. The findOne() method can also be used when you want a single document.
MongoDB aggregation methods are very easy to use. For aggregation you can use the count, distinct and group functions.
The count() method is used to find the number of documents in a collection, as follows:
# db.users.count();
or with a predicate:
# db.users.find({Age : 30}).count();
distinct() is used to return the distinct, unique values of a key within a collection.
# db.users.distinct("Age");
You can also pass predicates to the distinct method.
Moreover, MongoDB provides many conditional operators such as $gt, $gte, $lt and $lte for filtering your results.
# db.users.find({Age : {$lt : 21}})
The $ne operator matches documents whose key is not equal to a given value.
# db.users.find({Name : {$ne : "firat"}})
The $in operator matches documents whose value is in the given array. There is also the $nin operator to find documents whose value is not in the array.
# db.users.find({Name : {$in : ["foo","firat","bar"]}})
This call finds documents whose name is foo or firat or bar.
# db.users.find({Name : {$nin : ["foo","firat","bar"]}})
This call finds documents whose name is not foo, firat or bar.
The counterpart of $in is $all, which matches documents whose array contains all of the listed values.
The $size operator lets you match documents by the number of elements in an array.
# db.users.find({Accounts : {$size:2}});
This returns the users with 2 accounts.
Because MongoDB is schemaless, the key you search on might not be present in every document. You can use the $exists operator to find the documents that have a specified key, as follows:
# db.users.find({DateOfBirth : {$exists :true}})
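To show how these operators read as predicates, here is a sketch that evaluates them against plain JavaScript objects. The matches helper and the sample users are made up for this post. It covers only the operators mentioned above and makes no claim to reproduce every detail of the server's matching rules.

```javascript
// Sketch: evaluate the conditional operators above against plain objects.
// `matches` is a hypothetical helper written for this example.
const operators = {
  $gt: (v, arg) => v > arg,
  $gte: (v, arg) => v >= arg,
  $lt: (v, arg) => v < arg,
  $lte: (v, arg) => v <= arg,
  $ne: (v, arg) => v !== arg,
  $in: (v, arg) => arg.includes(v),
  $nin: (v, arg) => !arg.includes(v),
  $all: (v, arg) => Array.isArray(v) && arg.every((x) => v.includes(x)),
  $size: (v, arg) => Array.isArray(v) && v.length === arg,
  $exists: (v, arg) => (v !== undefined) === arg,
};

function matches(doc, query) {
  return Object.entries(query).every(([key, condition]) => {
    const value = doc[key];
    if (condition !== null && typeof condition === "object" && !Array.isArray(condition)) {
      // An object of operators: every operator must hold for this key.
      return Object.entries(condition).every(([op, arg]) => operators[op](value, arg));
    }
    return value === condition; // plain equality, as in find({Foo : "bar"})
  });
}

const users = [
  { Name: "firat", Age: 30, Accounts: ["a", "b"] },
  { Name: "foo", Age: 18, Accounts: ["a"], DateOfBirth: "1992-01-01" },
  { Name: "baz", Age: 20, Accounts: [] },
];
const names = (query) => users.filter((u) => matches(u, query)).map((u) => u.Name);

console.log(names({ Age: { $lt: 21 } }));
console.log(names({ Name: { $nin: ["foo", "firat", "bar"] } }));
console.log(names({ Accounts: { $size: 2 } }));
console.log(names({ DateOfBirth: { $exists: true } }));
```

Each query is just data, so operators can be combined on the same key, for example { Age: { $gte: 18, $lt: 21 } }.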
MongoDB querying is lazy. Querying the database returns a cursor, not the results. You get the results lazily, as you iterate through the result set. This is a nice feature.
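The lazy behaviour can be imitated with a JavaScript generator. This is only an analogy: cursorSketch and the fetched counter are names invented for this post, and a real cursor fetches documents from the server in batches rather than one at a time.

```javascript
// Sketch: a generator standing in for a lazy cursor.
// `fetched` counts how many documents were actually pulled.
let fetched = 0;

function* cursorSketch(docs, predicate) {
  for (const doc of docs) {
    fetched += 1; // work happens only when the caller asks for more
    if (predicate(doc)) {
      yield doc;
    }
  }
}

const people = [{ Age: 30 }, { Age: 18 }, { Age: 30 }, { Age: 45 }];
const cursor = cursorSketch(people, (doc) => doc.Age === 30);

const fetchedBeforeIteration = fetched; // creating the cursor did no work
const first = cursor.next().value;      // pulls only until the first match
const fetchedAfterFirst = fetched;
console.log(fetchedBeforeIteration, first, fetchedAfterFirst);
```

Creating the cursor touches no documents at all; they are examined only as the caller iterates.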
MongoDB supports upserts: an update run with the upsert option either inserts a new document or updates an existing one.
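The insert-or-update decision can be sketched on an in-memory array. The upsert function below is a hypothetical helper written for this post, with made-up Url and Views keys. It assumes equality-only predicates and merges the new data into the matching document, which is a simplification of what the server does.

```javascript
// Sketch: upsert semantics on an in-memory array of documents.
// `upsert` is a hypothetical helper; it only handles equality predicates
// and merges `data` into the matching document.
function upsert(collection, predicate, data) {
  const existing = collection.find((doc) =>
    Object.entries(predicate).every(([key, value]) => doc[key] === value)
  );
  if (existing) {
    Object.assign(existing, data); // update the current document
    return "updated";
  }
  collection.push({ ...predicate, ...data }); // insert a new one
  return "inserted";
}

const pageViews = [];
const firstCall = upsert(pageViews, { Url: "/home" }, { Views: 1 });
const secondCall = upsert(pageViews, { Url: "/home" }, { Views: 2 });
console.log(firstCall, secondCall, pageViews);
```

The caller never has to check whether the document exists first, which saves a round trip and avoids a race between the check and the write.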
Moreover, there is an $inc operator that increments a value, which comes in handy for analytics.
The $set operator sets a value and the $unset operator removes a particular key. There are also $push and $pop operators which, just like a stack, let you add elements to or remove elements from an array in the document.
In order to remove documents from a collection:
# db.users.remove({"Username" : "firat"})