
NoSQL Options for Java Developers


In comparing NoSQL databases for Java development, we look at all the options with a focus on MongoDB, Redis, Cassandra, Neo4j, and PostgreSQL + JSON.
The Java community is one I know and love, so even though a NoSQL database is rarely tied to a language, I’m writing this article for you Java developers around the world. In this article, I’ll show you several options for NoSQL databases. After exploring all the options, I’ll narrow the choices down to the top five based on Indeed Jobs, GitHub stars, and Stack Overflow tags. Then I’ll let you know if they’re supported by Spring Data and Spring Boot.
NoSQL databases have helped many web-scale companies achieve high scalability through eventual consistency. Because many NoSQL databases are distributed across several machines and replicate writes with some latency, they often guarantee only that all replicas will eventually become consistent. Eventually consistent services are often called BASE (basically available, soft state, eventual consistency) services, in contrast to services that provide the traditional ACID (atomicity, consistency, isolation, durability) guarantees.
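To make eventual consistency concrete, here is a toy Java model. It is not how any particular database implements replication. Each replica accepts writes locally and stamps them with a logical timestamp, and a later background sync merges replicas using a last-write-wins rule. The class and method names (`Replica`, `write`, `read`, `mergeFrom`) are hypothetical. Before the sync, the two replicas return different values for the same key; after it, they agree.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical toy model of eventual consistency with last-write-wins registers; not a real database API. */
public class EventualConsistencyDemo {

    /** A value tagged with the logical time and the replica that produced it. */
    record Versioned(String value, long timestamp, String origin) {}

    static class Replica {
        final String name;
        final Map<String, Versioned> data = new HashMap<>();

        Replica(String name) { this.name = name; }

        /** Accepts a write locally, without coordinating with other replicas. */
        void write(String key, String value, long timestamp) {
            data.put(key, new Versioned(value, timestamp, name));
        }

        String read(String key) {
            Versioned v = data.get(key);
            return v == null ? null : v.value();
        }

        /** Background sync: pulls the other replica's state and keeps the newest version of each key. */
        void mergeFrom(Replica other) {
            for (Map.Entry<String, Versioned> e : other.data.entrySet()) {
                data.merge(e.getKey(), e.getValue(), EventualConsistencyDemo::newer);
            }
        }
    }

    /** Last write wins; equal timestamps are broken by replica name so every replica picks the same winner. */
    static Versioned newer(Versioned a, Versioned b) {
        if (a.timestamp() != b.timestamp()) {
            return a.timestamp() > b.timestamp() ? a : b;
        }
        return a.origin().compareTo(b.origin()) >= 0 ? a : b;
    }

    public static void main(String[] args) {
        Replica east = new Replica("east");
        Replica west = new Replica("west");

        east.write("user:42:email", "old@example.com", 1);
        west.mergeFrom(east);                              // the first write replicates
        west.write("user:42:email", "new@example.com", 2); // a later write lands only on west

        System.out.println("before sync: east=" + east.read("user:42:email")
                + ", west=" + west.read("user:42:email"));

        east.mergeFrom(west); // sync in both directions
        west.mergeFrom(east);

        System.out.println("after sync:  east=" + east.read("user:42:email")
                + ", west=" + west.read("user:42:email"));
    }
}
```

Last-write-wins is only one way to resolve conflicts, and it silently discards the older of two concurrent writes; real systems may use other strategies.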
Defining the top five can be difficult. Many folks have attempted to do this recently; see the Research and Notes section at the end of this article for reference.
In mid-August, I told my followers on Twitter that I was writing this article. I asked for good/bad stories about NoSQL databases and received a number of options people wanted me to include.
I received many suggestions, listed in alphabetical order below:
People also mentioned Hibernate OGM (JPA for NoSQL) and NoSQLUnit as tools to help access and test NoSQL databases.
Note that I didn’t receive any requests for CouchDB, HBase, Elasticsearch, or Solr. CouchDB and Couchbase are often confused because of their similar names, but they’re quite different. Since CouchDB is a document store, I included it in my rankings. I also added HBase, since it is mentioned by ITBusinessEdge, KDnuggets, and DB-Engines (in the Research and Notes section). I didn’t include Elasticsearch or Solr because I believe those aren’t often used as the primary data store.
I looked at Indeed Jobs, GitHub stars, Stack Overflow tags, and Docker pulls to build a system for ranking the top five NoSQL databases.
I searched on Indeed Jobs without a location and found very few surprises, save for Amazon’s DynamoDB showing up as a top contender.
I searched and found the top five NoSQL options by GitHub stars are Redis, MongoDB, ArangoDB, Neo4j, and Cassandra.
You can use Tim Qian’s star-history project to see the star growth of these five.
I searched Stack Overflow for each option's tag and found that MongoDB and PostgreSQL are the most popular, followed by Neo4j, Cassandra, and Redis.
I searched Docker Hub for images and found pull counts of 10M+ for a few options, 5M+ for Neo4j, and 1M+ for many others. FaunaDB and JetBrains Xodus don’t seem to have images available.
After gathering this information, I decided these stats weren’t relevant enough to include in my ranking, for two reasons: the numbers aren’t exact, and not every option has an “official” image.
I created a matrix to combine jobs, stars, and tags, awarding 1 to 5 points based on each option's rank in each category. An option that didn’t make the top five in a category received zero points for it. The resulting top five (MongoDB, Redis, Cassandra, Neo4j, and PostgreSQL) are in the table below.
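The scoring matrix described above can be sketched in a few lines of Java. Two assumptions: first place in a category earns 5 points and fifth place earns 1, and only the GitHub-star and Stack Overflow-tag rankings quoted earlier are filled in, because the Indeed Jobs ordering isn't listed in this section. The totals this sketch prints therefore cover two of the three categories and are not the article's final table. The class and method names are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

/** Hypothetical sketch of a 5-4-3-2-1 scoring matrix; options outside a category's top five get 0 for it. */
public class NoSqlRanking {

    /** Adds one category's points: position 0 earns 5 points, position 4 earns 1. */
    static void addCategory(Map<String, Integer> scores, List<String> topFive) {
        for (int i = 0; i < topFive.size() && i < 5; i++) {
            scores.merge(topFive.get(i), 5 - i, Integer::sum);
        }
    }

    /** Sums every category and returns the options sorted by total points, highest first. */
    static Map<String, Integer> rank(List<List<String>> categories) {
        Map<String, Integer> scores = new LinkedHashMap<>();
        for (List<String> category : categories) {
            addCategory(scores, category);
        }
        return scores.entrySet().stream()
                .sorted(Map.Entry.<String, Integer>comparingByValue().reversed())
                .collect(Collectors.toMap(Map.Entry::getKey, Map.Entry::getValue,
                        (a, b) -> a, LinkedHashMap::new));
    }

    public static void main(String[] args) {
        List<String> githubStars = List.of("Redis", "MongoDB", "ArangoDB", "Neo4j", "Cassandra");
        List<String> stackOverflowTags = List.of("MongoDB", "PostgreSQL", "Neo4j", "Cassandra", "Redis");
        // The Indeed Jobs ranking would be a third list here; it is not reproduced in this section.
        rank(List.of(githubStars, stackOverflowTags))
                .forEach((db, points) -> System.out.println(db + ": " + points));
    }
}
```

With only these two categories, MongoDB totals 9 points and Redis 6; the job-listing points would be added as a third list.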
If you look at DB-Engines Ranking for their top five options, you’ll find PostgreSQL, MongoDB, Cassandra, Redis, and HBase.
Will you look at that: four of our top five (MongoDB, Redis, Cassandra, and PostgreSQL) also appear in DB-Engines’ top five!
Since the results are so close, I’ll use mine as the top five. Below is an overview of each one, along with information about its Spring Boot support.
You might ask, “Why Spring Boot?” My answer is simple: because Spring Boot adoption is high. According to RedMonk’s recent look at Java frameworks, Spring Boot adoption grew 76% between September 2016 and June 2017.
And things haven’t slowed down since June: Spring Boot’s Maven downloads in August 2017 were 22.2 million.
MongoDB was founded in 2007 by the folks behind DoubleClick, ShopWiki, and Gilt Groupe. Its code on GitHub uses the Apache and GNU AGPL licenses. Its many large customers include Adobe, eBay, and eHarmony.
Redis stands for REmote DIctionary Server and was started by Salvatore Sanfilippo. It was initially released on April 10, 2009. According to redis.io, Redis is a BSD-licensed in-memory data structure store that can be used as a database, cache, and message broker. Well-known companies using Redis include Twitter, GitHub, Snapchat, and Craigslist.
Cassandra is “a distributed storage system for managing structured data that is designed to scale to a very large size across many commodity servers, with no single point of failure” (from “Cassandra – A structured storage system on a P2P Network” on the Facebook Engineering blog). It was initially developed at Facebook to power its Inbox Search feature. Its creators, Avinash Lakshman (one of the authors of Amazon’s Dynamo) and Prashant Malik, released it as an open-source project in July 2008. In March 2009 it became an Apache Incubator project, and it graduated to a top-level project in February 2010.
Beyond Facebook, Cassandra helps a number of other companies achieve web scale, and its homepage cites some impressive scalability numbers.
One of the largest production deployments is Apple’s, with over 75,000 nodes storing over 10 PB of data. Other large Cassandra installations include Netflix (2,500 nodes, 420 TB, over 1 trillion requests per day), Chinese search engine Easou (270 nodes, 300 TB, over 800 million requests per day), and eBay (over 100 nodes, 250 TB).
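Part of how Cassandra spreads data across many commodity servers without a single point of failure is that each node owns a range of a hash ring, so any node can work out which replicas hold a given key. The Java sketch below is a simplified consistent-hashing ring, not Cassandra's code. It assumes `String.hashCode()` as a stand-in for Cassandra's Murmur3 partitioner and one token per node instead of virtual nodes, and the `HashRing` class is hypothetical. A key belongs to the first node clockwise from its hash, and the next nodes clockwise hold the additional replicas, similar in spirit to Cassandra's SimpleStrategy.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical, simplified consistent-hashing ring; not Cassandra's partitioner. */
public class HashRing {
    private final TreeMap<Integer, String> ring = new TreeMap<>();

    /** Places a node on the ring at the hash of its name (one token per node; collisions ignored for brevity). */
    public void addNode(String node) {
        ring.put(token(node), node);
    }

    public void removeNode(String node) {
        ring.remove(token(node));
    }

    /** Returns the nodes that store a key: the owning node plus the next nodes clockwise. */
    public List<String> replicasFor(String key, int replicationFactor) {
        List<String> replicas = new ArrayList<>();
        if (ring.isEmpty()) {
            return replicas;
        }
        int count = Math.min(replicationFactor, ring.size());
        // First token at or after the key's hash; wrap around to the start of the ring if there is none.
        Map.Entry<Integer, String> entry = ring.ceilingEntry(token(key));
        if (entry == null) {
            entry = ring.firstEntry();
        }
        while (replicas.size() < count) {
            replicas.add(entry.getValue());
            entry = ring.higherEntry(entry.getKey());
            if (entry == null) {
                entry = ring.firstEntry();
            }
        }
        return replicas;
    }

    /** Stand-in hash function; Cassandra's default partitioner uses Murmur3 instead. */
    private static int token(String s) {
        return s.hashCode();
    }

    public static void main(String[] args) {
        HashRing ring = new HashRing();
        for (String node : List.of("node-a", "node-b", "node-c", "node-d")) {
            ring.addNode(node);
        }
        System.out.println("user:42 -> " + ring.replicasFor("user:42", 3));

        // Removing a node only reassigns the ranges it held; the key is still served by the remaining nodes.
        ring.removeNode("node-b");
        System.out.println("user:42 -> " + ring.replicasFor("user:42", 3));
    }
}
```

Because every node can compute the same placement from the ring, there is no central coordinator to fail; Cassandra adds virtual nodes and configurable replication strategies on top of this basic idea.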
Neo4j is available as a GPLv3-licensed “community edition,” with some extensions licensed under the Affero GPL. The community edition is limited to running on one node and does not include clustering support or hot backups. Neo4j’s “enterprise edition” has scale-out capabilities, an in-memory page cache, and hot backups. A 30-day trial is available, but no pricing is published.
Neo4j is best known as a graph database, in which everything is stored as a node, an edge, or an attribute. Version 1.0 was released in February 2010, and Neo4j, Inc. has developed the database since its beginning. Its large customers include Walmart, Airbnb, Monsanto, and eBay.
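The graph model described above (nodes and edges, each carrying attributes) can be illustrated with a small in-memory Java sketch. It is hypothetical and is not Neo4j's storage engine or Java API; with Neo4j itself you would normally express the same traversal in its Cypher query language, for example `MATCH (a:Person {name: 'Alice'})-[:KNOWS]->(friend) RETURN friend.name`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical in-memory property graph: nodes and typed, directed edges, each with attributes. */
public class PropertyGraph {
    record Node(long id, String label, Map<String, Object> properties) {}
    record Edge(Node from, String type, Node to, Map<String, Object> properties) {}

    private final List<Node> nodes = new ArrayList<>();
    private final List<Edge> edges = new ArrayList<>();

    Node addNode(String label, Map<String, Object> properties) {
        Node node = new Node(nodes.size(), label, properties);
        nodes.add(node);
        return node;
    }

    void addEdge(Node from, String type, Node to, Map<String, Object> properties) {
        edges.add(new Edge(from, type, to, properties));
    }

    /** Follows outgoing edges of one type, similar in spirit to (a)-[:TYPE]->(b) in Cypher. */
    List<Node> neighbors(Node start, String type) {
        List<Node> result = new ArrayList<>();
        for (Edge edge : edges) { // linear scan keeps the sketch short
            if (edge.from().equals(start) && edge.type().equals(type)) {
                result.add(edge.to());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        PropertyGraph graph = new PropertyGraph();
        Node alice = graph.addNode("Person", Map.of("name", "Alice"));
        Node bob = graph.addNode("Person", Map.of("name", "Bob"));
        Node carol = graph.addNode("Person", Map.of("name", "Carol"));
        graph.addEdge(alice, "KNOWS", bob, Map.of("since", 2015));
        graph.addEdge(alice, "KNOWS", carol, Map.of("since", 2019));

        for (Node friend : graph.neighbors(alice, "KNOWS")) {
            System.out.println("Alice knows " + friend.properties().get("name"));
        }
    }
}
```

The linear scan in `neighbors` is for brevity only; graph databases such as Neo4j instead link each node directly to its relationships, so a traversal does not have to scan every edge.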
PostgreSQL is a traditional relational database management system (RDBMS) that offers NoSQL-style document storage through its native JSON support (added in version 9.2). Version 9.4 added a binary JSON type, JSONB, which can be indexed.
In a June 2017 blog post, Leigh Halliday explains how to unleash the power of storing JSON in Postgres and shows how to use it with Ruby on Rails. A blog post from Umair Shahid shows how to process PostgreSQL JSON and JSONB data in Java.
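For Java developers, storing and querying JSONB needs nothing beyond plain JDBC (`java.sql`). The sketch below is not taken from Umair Shahid's post; it assumes a hypothetical `customers` table with a `jsonb` column named `data`, hypothetical connection settings, and the PostgreSQL JDBC driver on the classpath. The `->>` operator extracts a field as text, `@>` tests JSONB containment (a query a GIN index can serve), and `CAST(? AS jsonb)` keeps the driver from sending the parameter as plain varchar.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.ArrayList;
import java.util.List;

/** Sketch of storing and querying JSONB in PostgreSQL over plain JDBC; the schema is hypothetical. */
public class JsonbExample {
    static final String CREATE_TABLE =
            "CREATE TABLE IF NOT EXISTS customers (id serial PRIMARY KEY, data jsonb NOT NULL)";
    // A GIN index lets containment (@>) queries avoid a full table scan; IF NOT EXISTS here needs PostgreSQL 9.5+.
    static final String CREATE_INDEX =
            "CREATE INDEX IF NOT EXISTS customers_data_idx ON customers USING GIN (data)";
    static final String INSERT =
            "INSERT INTO customers (data) VALUES (CAST(? AS jsonb))";
    // ->> returns a field as text; @> checks that the stored document contains the given JSON.
    static final String FIND_NAMES_BY_CITY =
            "SELECT data->>'name' FROM customers WHERE data @> CAST(? AS jsonb)";

    static void insert(Connection conn, String json) throws SQLException {
        try (PreparedStatement ps = conn.prepareStatement(INSERT)) {
            ps.setString(1, json);
            ps.executeUpdate();
        }
    }

    static List<String> findNamesByCity(Connection conn, String city) throws SQLException {
        List<String> names = new ArrayList<>();
        try (PreparedStatement ps = conn.prepareStatement(FIND_NAMES_BY_CITY)) {
            // Naive JSON building for brevity; real code should use a JSON library so values are escaped.
            ps.setString(1, "{\"city\": \"" + city + "\"}");
            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next()) {
                    names.add(rs.getString(1));
                }
            }
        }
        return names;
    }

    public static void main(String[] args) throws SQLException {
        // Hypothetical connection settings; adjust for your environment.
        String url = "jdbc:postgresql://localhost:5432/demo";
        try (Connection conn = DriverManager.getConnection(url, "demo", "demo");
             Statement st = conn.createStatement()) {
            st.execute(CREATE_TABLE);
            st.execute(CREATE_INDEX);
            insert(conn, "{\"name\": \"Ada\", \"city\": \"London\", \"tags\": [\"vip\"]}");
            System.out.println(findNamesByCity(conn, "London"));
        }
    }
}
```

Running `main` requires a reachable PostgreSQL instance; the query methods take a `Connection`, so they can also be reused with a connection pool.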
I’m not sure that PostgreSQL and its JSON support should be included as a recommended NoSQL option. However, it likely makes sense if you’re already using PostgreSQL and want a more free-flowing data schema.
