Nov 08 2009

Predictions on the future of NoSQL

Category: Ideas, Uncategorized
Aleksander Kmetec @ 12:02 pm

These days new NoSQL databases are springing up faster than URL shorteners. Even incomplete lists are likely to mention over 50 of them, most of which few people have ever heard of. And although they’re all being lumped together under the same name, they are so different from each other that nobody has yet found a name which describes them for what they are instead of for what they’re not.

Nobody knows for sure what the future of NoSQL will be like, which way the development is most likely to head and who the winners will be, but we can still try and make some predictions. Here are mine:

Several subgroups will emerge
This is not so much a prediction as an observation of already visible patterns. At least two main groups will emerge from the NoSQL movement: networked data structure servers (key/value stores, queues, …) and databases for working with structured data.

The data structure branch will remain very diverse
Typical software in this category is rather minimalistic, both in terms of functionality and in terms of code size. Because of this, the barrier to entry for new players is rather low, and so is the price users pay for switching between competing implementations.
Various players will likely specialize in some technological niche, and they will continue to be used mainly as a means of speeding up applications and not as fundamental building blocks.

Relational databases will fight back
Some databases already have support for storing, manipulating and indexing structured data in the form of XML. I have a feeling that JSON support can’t be far away. For most users this will be enough to stay with the established players in the database field instead of choosing a strictly document oriented database.

Document oriented databases will morph into graph databases
Implementing cross document referencing will take them half way there. Pressure from the relational databases, as described above, will push them the rest of the way.
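
To make the "half way there" part concrete, here is a minimal sketch in Python. It assumes a hypothetical document store in which each document lists the ids of the documents it references in a `refs` field; the field name, the ids and the `reachable` helper are all made up for illustration and do not come from any real database. Once such references exist, following them is already a graph traversal:

```python
from collections import deque

# Hypothetical documents: each one may point at other documents by id
# through a "refs" field (an assumption made for this sketch).
documents = {
    "post:1": {"title": "Predictions on the future of NoSQL",
               "refs": ["author:1", "tag:nosql"]},
    "author:1": {"name": "Aleksander", "refs": []},
    "tag:nosql": {"label": "NoSQL", "refs": ["tag:databases"]},
    "tag:databases": {"label": "Databases", "refs": []},
}


def reachable(docs, start):
    """Return the ids reachable from `start` by following refs, breadth-first."""
    seen = [start]
    queue = deque([start])
    while queue:
        current = queue.popleft()
        for target in docs[current].get("refs", []):
            # Skip dangling references and documents we already visited.
            if target in docs and target not in seen:
                seen.append(target)
                queue.append(target)
    return seen


print(reachable(documents, "post:1"))
# ['post:1', 'author:1', 'tag:nosql', 'tag:databases']
```

The documents are the nodes and the references are the edges. What this sketch still lacks compared to a graph database is typed edges and a way to query across them without writing the traversal by hand.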

SPARQL will become the query language of choice
SPARQL is a query language for RDF[1]; in other words: a language for querying graphs. It supports querying multiple data sources at the same time (federated queries) and there are projects underway to make it work with Hadoop clusters.
I’m not saying that alternative methods of querying will disappear completely! I’m just trying to say that the key players most likely to be used by the average developer (the next generation graph database equivalents of MySQL and PostgreSQL and similar) will end up standardizing on SPARQL instead of inventing yet another language.
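
For readers who have never seen it, the core idea behind "a language for querying graphs" can be sketched in a few lines of Python. This is only a toy illustration of triple pattern matching, not how any real SPARQL engine is implemented; the data, the `match` and `query` helpers and the `?variable` convention for plain strings are all assumptions made for this example.

```python
# Triples are (subject, predicate, object). In a pattern, a string that
# starts with "?" is a variable; anything else must match exactly.
triples = [
    ("alice", "knows", "bob"),
    ("bob", "knows", "carol"),
    ("alice", "worksAt", "acme"),
    ("carol", "worksAt", "acme"),
]


def match(pattern, triple, binding):
    """Extend `binding` so that `pattern` matches `triple`; None on mismatch."""
    result = dict(binding)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):
            # Bind the variable, or check it against its earlier binding.
            if result.setdefault(term, value) != value:
                return None
        elif term != value:
            return None
    return result


def query(patterns, data):
    """Return every variable binding that satisfies all patterns (a join)."""
    bindings = [{}]
    for pattern in patterns:
        extended = []
        for binding in bindings:
            for triple in data:
                candidate = match(pattern, triple, binding)
                if candidate is not None:
                    extended.append(candidate)
        bindings = extended
    return bindings


# Roughly what a SPARQL query along these lines would ask for:
#   SELECT ?x ?y ?org
#   WHERE { :alice :knows ?x . ?x :knows ?y . ?y :worksAt ?org }
print(query([("alice", "knows", "?x"),
             ("?x", "knows", "?y"),
             ("?y", "worksAt", "?org")], triples))
# [{'?x': 'bob', '?y': 'carol', '?org': 'acme'}]
```

Each pattern shares variables with the others, so the query is effectively a chain of joins over the graph's edges.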

Software will gain weight
Reading about NoSQL databases gives me a feeling of deja-vu. Most of it reads almost exactly like articles about MySQL from the beginning of this decade: “We’re better than the competition because we don’t have transactions/triggers/datatype checking/guaranteed consistency/fulltext search/…”. MySQL now has all of those features and NoSQL databases will follow the same path. Most users will start hitting walls due to lack of features, not due to performance issues, and when that happens having features will become more important than being lean.

The cycle will repeat itself
After a decade or so a new class of players claiming that their lack of features is their strength will emerge once again.

  1. RDF is an extremely simple format wrapped in a metric buttload of mystery and misunderstanding. But more about that some other time.


12 Responses to “Predictions on the future of NoSQL”

  1. dwight_mongodb

    will be interesting to see if the document-oriented databases get more graph-like over time or not. it is harder to shard a graph database, that is the main challenge, and true horizontal scalability is a key feature of this new space.

  2. Aleksander Kmetec

    I never said it’s going to be a quick and easy transition. ;-) Anyway, I wish you lots of success with tackling that problem if my predictions happen to come true one day.

    I have to admit, though, that the main reason I’d like to see document oriented dbs become more graph-like is not that I think there’s anything wrong with them, but that I’d like to see diversity in the graph db space. I spent the last 3+ years working with a graph db emulated on top of a relational db and you can probably imagine it was a less than ideal experience.

  3. Rod

    I, too, would like to see more graph-like data stores (e.g. triplestores) but I’m afraid NoSQL efforts are actually moving in the opposite direction. While most key-value stores introduce flat structure and redundancy in order not to join anything and scale better, triplestores are exactly the opposite (no redundancy, join like crazy). I, for one, have no idea how to find compromise here but I sure would love to see it happen.

    I mean, something that scales as good as Big Table but can be queried with SPARQL would be… a dream come true :)

  4. Aleksander Kmetec

    The HEART project (RDF store built on top of Hadoop) looks very interesting, but judging by the project blog and SVN history the development appears to have stalled. :(
    http://heart.korea.ac.kr/trac/

  5. Matjaž Lipuš

    Looking at trends moving to WOA I would say there is something on it. RESTful databases such as MongoDB, CouchDB and Persevere are simple to query and fast.

    I won’t predict what the future of NoSQL is, but it is obvious that relational databases need some freshness.

    For those of us who still love academic experiments, here are some links: http://www.openrdf.org http://sparql.sourceforge.net http://razor.occams.info/code/semweb
    But as Aleksander already mentioned, a lot of those projects have gone stale or are “taking an indefinite hiatus”.

  6. Sam Johnston

    +1 Insightful. FWIW I refer to them all as “structured storage” now, as distinct from “raw storage” like Amazon EBS (at least in the context of cloud computing).

    Sam

  7. Emil Eifrem

    Rod –

    I actually disagree that most NOSQL stores are moving in the opposite direction from graphs. I think there’s two main focuses in the NOSQL space today: projects oriented around scaling to SIZE and projects oriented around scaling to COMPLEXITY.

    Scaling to size then means coping with large volumes of relatively simple data (for example username / password) and the key-value stores and bigtable clones live here.

    Scaling to complexity means coping with data that is semi-structured and that is connected. Here’s where the document databases and graph databases live.

    Scaling to size gets a lot of attention because scaling to hundreds and thousands of machines is very sexy. But for the majority of the use cases out there that don’t need Google or Amazon scale, coping with complexity through a rich data model that can easily represent most or all domains is much more important.

    And in this group we can certainly see that there’s a move towards more graph-like structures. Couch was the pioneering document database that set the ground and already Mongo added references to their data model. Now recently Riak came along and took the next step with its support for links.

    In a graph database like http://neo4j.org (disclaimer: /me is involved) then relationships — how two nodes are related to one another — are first class citizens. They have a mandatory type (KNOWS, OWNS, CONTAINED_IN, etc) and an arbitrary amount of key-value properties. Our experience is that it makes modeling any domain a WHOLE lot easier and over time I think other models will start adding similar features.


    Emil Eifrem
    http://twitter.com/emileifrem
    http://neo4j.org

  8. Peter Neubauer - Neo4J

    Very good thoughts Aleksander!
    Yes, we are thinking along the same lines – NOSQL does not mean to abandon “enterprise” features. Then, of course we need techniques to handle scaling. Both to size and to data density which are both increasing. Graphs are the most capable data models, but of course the hardest to scale in a generic way.

    However, the concrete form and domain of the data has great potential for graph sharding optimization, as the example of document stores shows – collecting related information in a non-generic shard – the document.

    Cheers,

    /peter

  9. G

    The other misunderstanding about RDF is that it is a cheminformatics file type that has been around an extended period of time. The more complicated cousin to SDF, parsers are harder to come by.

  10. todd

    Emil,

    OO databases have had similar modeling primitives and architectures for quite some time. What makes now different? Is separating behaviour from data the key difference or is there something else?

  11. rob

    Regarding JSON support and XML handling: these are not natural bed-fellows for the relational model, but take, for example, something like M/DBX (http://bit.ly/zljNa). Here an underlying NoSQL database (GT.M: http://bit.ly/2fS2jy) allows JSON to be mapped to and from XML and persisted as a persistent XML DOM.

  12. Reedo

    For document oriented stuff, I’d have thought XQuery would be a better fit than SPARQL.