Introduction to Couchbase - Part 2

December 27, 2012 dev nosql

In this post, I’ll talk about some technical details and terminology of Couchbase. The official documentation is very comprehensive and I highly recommend taking a look at it: http://www.couchbase.com/docs/

Installation

First of all I recommend you check the supported OS list here. I tried to install it on Windows 8 but turns out it’s not supported yet. Then I installed it on Windows Server 2008 R2 and a Ubuntu Server 12.10. You can find Linux installation instructions here.

Installation is quite easy. There are a few things that need to paid attention though.

File locations: Actually this step is very easy, just accept the default location. But Couchbase recommends storing document and index data on different disks to get the best performance,
Memory Size: First node in the cluster determines the quota and that value is inherited to the following nodes. To update it, on the management console, select Data Buckets and click on the arrow on the left of the bucket name. Then by clicking on Edit you can change this value.
Bucket Type: memcached and Couchbase bucket types are significantly different so you have to choose carefully. memcached buckets don’t support persistence nor replication. They are meant to be an in-memory caching solution.
Bucket Name: During setup you cannot change the name of the default bucket. Couchbase recommends to use it for testing purposes only. So it’s best to create your own bucket for the actual data once the installation is over.
Flush: This is a very dangerous operation. It allows you delete all the data in a bucket. Default is disabled and I’d recommend to keep it that way.

Basic concepts

A couch database is called a bucket.
A document is a self-contained piece of data. It is a JSON object. A row in a RDBMS would be stored in a document with all the data it’s related to. (i.e: A customer record may contain a list of orders). This approach is called Single-Document approach and the document is called an aggregate. More about it in Modeling Documents section later in this post. A new feature that came with v2.0 is these records can be indexed and queried.
vBucket is short for “Virtual Bucket” and they work functionally equivalent to database shards in traditional relational databases. Good news is that Couchbase will automatically manage vBuckets.
XDCR stands for Cross Data Center Replication. It’s a very cool feature that can be used in a multiple of scenarios such as spreading data geographically or creating an active offsite backup.

Modelling Documents: has-many vs. belongs-to

The way we model data should depend on the structure and nature of the data. There are two approaches when modelling the data. has-many means storing all the child records with the parent. For example a standard Customer – Order relation could be expressed like this:

{
    "id" : 123,
    "name": Valued,
    "surname": "Customer"
    "orders": [ "order1", "order2", "order3" ]
}
{
    "id": "order1",
    "orderDate": "2012-12-20",
    "status": "sent"
}

The Customer stores the IDs of the orders. This method can be problematic if the parent (Customer in this example) is updated frequently. As orders can be accessed via customer this will effect the overall query performance. belongs-to approach suggests approaching it from the other direction. If we modeled the above example with belongs-to approach we would come up with something like this:

{
    "id" : 123,
    "name": Valued,
    "surname": "Customer"
}
{
    "id": "order1",
    "orderDate": "2012-12-20",
    "status": "sent",
    "customerId": 123
}
{
    "id": "order2",
    "orderDate": "2012-12-10",
    "status": "pending",
    "customerId": 123
 }

This is preferable to avoid contention. With this method we need to use indexing to be able query all orders by customerId. has-many approach performs better because a multiple-retrieve query is faster than indexing and querying.

Backup and Restore

Before diving into playing with the data it’s always a good practice to backup the original data. Couchbase provides 2 options to accomplish these tasks:

Good ol’ file copy Copy the data files stored under the default path (which is “C:\Program Files\couchbase\server\var\lib\couchbase\data” for Windows). The disadvantage of this method is that it can only be restored to offline nodes in an identical cluster environment. Also database is not compressed.
cbbackup / cbrestore These tools can be found in the bin folder.

Couchbase_Backup

I think a slight disadvantage is that you have to specify password in clear text in the command line. I was expecting just providing –p parameter would end up it asking me the password after I enter the command. Instead I got an error saying the password cannot be empty.

Couchbase_Restore

Advantages are that it allows a backup to be restored onto a different size and configuration. Also it compresses the data so it’s disk-space friendly.

Tip: When specifying the backup path to cbrestore make sure to remove the trailing backslash from the path. In the next instalment of this series I’ll post a sample application using the Beer sample database that is shipped with Couchbase 2.0

Raspberry Pi - Installation and Setup

December 19, 2012 hobby raspberry_pi

It is world famous now. It is a dirt cheap ARM-based computer running Linux. Just bought one for myself. I installed Raspbian Wheezy which can be downloaded from here: http://www.raspberrypi.org/downloads. It is the recommended download for newbies so I went straight to it. I used Win32DiskImager and formatted an SD card. Installed it on the Raspberry Pi and it was good to go.

I definitely recommend buying a case which makes it a lot more fun to play with it. I also bought a 3.5” display. I think small screen goes well with the small device. If I’m going to plug something in to my 23” LED monitor, I’d prefer it to be my desktop. The display I bought can be found on Amazon. It doesn’t come with a power supply so you also have to buy a 12V – 2A DC power supply. I also needed a male – male RCA cable to connect the display to the Pi.

The result is the smallest computer I have ever had:

Raspberry Pi

I hope I can do something useful with it too.

Introduction to Couchbase – Part 1

December 16, 2012 dev nosql

Couch is one of move most popular databases in NOSQL movement. When I first started playing around with Couch I was a bit confused by the naming. I had thought there was one product but turns out there are two actually.

CouchDB vs. Couchbase

Apache CouchDB was created by Damien Katz who then started a company called CouchOne Inc. After some time they decided to merge their company with Membase Inc. which developed another open-source distributed key-value database called Membase. They merged the two products so that it would use Membase as storage backend and some portions were rewritten. The end result was called Couchbase. So even though it’s based on Apache CouchDB it’s a different product and is being developed by a different company. But it’s still open source and licensed under the Apache 2.0 license.

Which one to use?

They serve different needs. Couchbase has a built-in memcached-based caching technology whereas Apache CouchDB is a disk-based database. Therefore Couchbase is better suited low latency requirements. Couchbase has built-in replication which allows data to be spread across all the nodes in the cluster automatically. Apache CouchDB supports peer-to-peer replication. I find auto-replication feature of Couchbase marvellous and it’s extremely easy to manage. When you create a new node it can be a new cluster on its own or it can be added to an existing cluster. Adding it to a cluster consists of just providing the IP address/hostname and administrator credentials of a machine in that cluster and the rest is automagically taken care of. I’m using Couchbase in my test applications.

What’s new in Couchbase 2.0

Couchbase released a new major version recently. Highlights of the new features are:

Cross Data-Center Replication (XDCR) enhancements
2 cool sample buckets (beer-sample and gamesim-sample)
A new REST-API
New command-line tools
Querying views during rebalance

In the next post I’ll go into more technical details.

Playground for the mind

It's all about the journey, not the destination