{"id":749,"date":"2018-10-01T21:14:56","date_gmt":"2018-10-01T19:14:56","guid":{"rendered":"https:\/\/craftcoders.app\/?p=749"},"modified":"2024-08-14T14:27:53","modified_gmt":"2024-08-14T12:27:53","slug":"a-deep-dive-into-apache-cassandra-part-1-data-structure","status":"publish","type":"post","link":"https:\/\/craftcoders.app\/a-deep-dive-into-apache-cassandra-part-1-data-structure\/","title":{"rendered":"A deep dive into Apache Cassandra &#8211; Part 1: Data Structure (was not continued)"},"content":{"rendered":"\r\n<p>Hey guys,<br \/><br \/><\/p>\r\n\r\n\r\n\r\n<p>during my studies I had to analyze the NoSQL database Cassandra as a possible replacement for a regular relational database.<br \/>During my research I dove really deep into the architecture and the data model of Cassandra, and I figured that you may profit from it, maybe for your own evaluation of Cassandra or just out of personal curiosity.<\/p>\r\n\r\n\r\n<hr class=\"wp-block-separator\" \/>\r\n\r\n\r\n<p>I will separate this huge topic into several posts and make a little series out of it. I don&#8217;t know yet how many parts the series will contain, but I will try to keep every post as cohesive and understandable as possible.<\/p>\r\n\r\n\r\n\r\n<p>Please forgive me, as I have to introduce at least a couple of terms and concepts that I won&#8217;t be able to describe thoroughly in this post. But don&#8217;t worry, I will cover them in an upcoming one.<br \/><br \/><\/p>\r\n\r\n\r\n\r\n<h2 class=\"wp-block-heading\">What is Cassandra?<\/h2>\r\n\r\n\r\n\r\n<p>Cassandra is an open source, wide-column NoSQL database whose data model is based on <a href=\"https:\/\/static.googleusercontent.com\/media\/research.google.com\/en\/\/archive\/bigtable-osdi06.pdf\">Bigtable<\/a> by Google and whose distributed architecture is based on <a href=\"http:\/\/www.cs.bu.edu\/~jappavoo\/jappavoo.github.com\/451\/papers\/dynamo.pdf\">Dynamo<\/a> by Amazon. 
It was originally developed at Facebook; later Cassandra became an Apache project and is now one of the top-level projects at Apache. Cassandra is based on the idea of a decentralized, distributed system without a single point of failure and is designed for high data throughput and high availability.<br \/><br \/><\/p>\r\n\r\n\r\n\r\n<h2 class=\"wp-block-heading\">Cassandra&#8217;s Data Structure<\/h2>\r\n\r\n\r\n\r\n<p>I decided to begin my series with Cassandra&#8217;s data structure because it is a good introduction to the general ideas behind Cassandra and a good foundation for future posts about the Cassandra Query Language and Cassandra&#8217;s distributed nature.<br \/><br \/><\/p>\r\n\r\n\r\n\r\n<p>I will try to give you an overview of how data is stored in Cassandra and show you some similarities to and differences from a relational database, so let&#8217;s get right to it.<br \/><br \/><\/p>\r\n\r\n\r\n\r\n<h3 class=\"wp-block-heading\">Columns, Rows and Tables<\/h3>\r\n\r\n\r\n\r\n<p>The basic component in Cassandra&#8217;s data structure is the <strong>column<\/strong>, which classically consists of a key\/value pair (the column name and its value). Individual columns are combined into a <strong>row<\/strong>, which is uniquely identified by a <strong>primary key<\/strong>. A row consists of one or more columns plus the primary key, which can itself consist of one or more columns. To group individual rows describing the same kind of entity into a logical unit, Cassandra defines <strong>tables<\/strong>, which are containers for similar data in row format, equivalent to relations in relational databases.<br \/><br \/><\/p>\r\n\r\n\r\n\r\n<figure class=\"wp-block-image\"><img decoding=\"async\" class=\"wp-image-750\" src=\"https:\/\/craftcoders.app\/wp-content\/uploads\/2018\/10\/cassandra_row.png\" alt=\"\" \/>\r\n<figcaption>the row data structure in Cassandra<\/figcaption>\r\n<\/figure>\r\n\r\n\r\n\r\n<p>However, there is a remarkable difference from the tables in relational databases. 
If individual columns of a row are not set when writing to the database, Cassandra does not store a null placeholder for them; the column is simply not stored for that row at all. This is a storage space optimization, and it gives the data model of tables similarities to a multidimensional array or a nested map.<br \/><br \/><\/p>\r\n\r\n\r\n\r\n<figure class=\"wp-block-image\"><img decoding=\"async\" class=\"wp-image-751\" src=\"https:\/\/craftcoders.app\/wp-content\/uploads\/2018\/10\/wide_rows.png\" alt=\"\" \/>\r\n<figcaption>table consisting of skinny rows<\/figcaption>\r\n<\/figure>\r\n\r\n\r\n\r\n<h3 class=\"wp-block-heading\">Skinny and Wide Rows<\/h3>\r\n\r\n\r\n\r\n<p>Another special feature of the tables in Cassandra is the distinction between skinny and wide rows. So far I have only described skinny rows, i.e. rows whose primary key has no clustering columns and whose partitions contain only a few entries, in most cases just one entry per partition.<br \/><br \/><\/p>\r\n\r\n\r\n\r\n<p>You can imagine a partition as an isolated storage unit within Cassandra: all rows that share the same partition key value are stored together in one partition. During a write or read operation the value of the partition key (the first part of the primary key) gets hashed. The resulting hash value, called a token, determines where the partition lives inside the Cassandra installation, as every node is responsible for certain ranges of hash values. I will dedicate a whole blog post to the underlying storage engine of Cassandra, so this little explanation has to suffice for now.<br \/><br \/><\/p>\r\n\r\n\r\n\r\n<p>Wide rows typically have a significant number of entries per partition. 
These wide rows are identified by a composite primary key, consisting of a partition key and one or more clustering columns.<br \/><br \/><\/p>\r\n\r\n\r\n\r\n<figure class=\"wp-block-image\"><img decoding=\"async\" class=\"wp-image-752\" src=\"https:\/\/craftcoders.app\/wp-content\/uploads\/2018\/10\/wide_rows2.png\" alt=\"\" \/>\r\n<figcaption>table consisting of wide rows<\/figcaption>\r\n<\/figure>\r\n\r\n\r\n\r\n<p><br \/>When using wide rows you have to pay attention to the hard limit of two billion entries (more precisely cells, i.e. individual column values) per partition. When storing the measured values of a sensor, for example, this limit can be reached quite fast, and after reaching it no more values can be stored in this partition.<\/p>\r\n\r\n\r\n\r\n<p><br \/>The partition key can consist of one or more columns, just like the primary key. Therefore, to stay with the example of the sensor data, it makes sense to select the partition key according to several criteria. Instead of simply partitioning by, for example, a <strong>sensor_id<\/strong>, which depending on the amount of incoming measurement data would sooner or later exceed the limit of two billion entries per partition, you can add the date of the measurement to the partition key. If you combine the sensor_id with the date of the measurement, the data is written to a new partition every day. Of course you can make this coarser or finer as you wish (hourly, daily, weekly, monthly).<br \/><br \/><\/p>\r\n\r\n\r\n\r\n<p>The clustering columns are needed to sort data within a partition. 
A primary key without additional clustering columns is at the same time the partition key.<\/p>\r\n\r\n\r\n\r\n<p>Several tables are collected into a <strong>keyspace<\/strong>, which is the equivalent of a database in relational database systems.<br \/><br \/><\/p>\r\n\r\n\r\n\r\n<h3 class=\"wp-block-heading\">Summary<\/h3>\r\n\r\n\r\n\r\n<p>To summarize, the basic data structures are<br \/><br \/><\/p>\r\n\r\n\r\n\r\n<ul>\r\n<li>the <strong>column<\/strong>, consisting of a key\/value pair,<\/li>\r\n<li>the <strong>row<\/strong>, which is a container for columns that belong together, identified by a primary key,<\/li>\r\n<li>the <strong>table<\/strong>, which is a container for rows and<\/li>\r\n<li>the <strong>keyspace<\/strong>, which is a container for tables.<\/li>\r\n<\/ul>\r\n\r\n\r\n\r\n<p>I hope I was able to give you a rough overview of the data structure Cassandra uses. The next post in this series will be about the Cassandra Query Language (CQL), in which I will give you some more concrete examples of how the data structure affects data manipulation.<\/p>\r\n\r\n\r\n\r\n<p>Cheers,<\/p>\r\n\r\n\r\n\r\n<p>Leon<\/p>\r\n","protected":false},"excerpt":{"rendered":"<p>Hey guys, during my studies I had to analyze the NoSQL database Cassandra as a possible replacement for a regular relational database.During my research I dove really deep into the architecture and the data model of Cassandra and I figured that someone may profit from my previous research, maybe for your own evaluation process of Cassandra or just personal curiosity. 
&#8230; <a href=\"https:\/\/craftcoders.app\/a-deep-dive-into-apache-cassandra-part-1-data-structure\/\" class=\"more-link\">Read More<\/a><\/p>\n","protected":false},"author":7,"featured_media":2340,"comment_status":"closed","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"inline_featured_image":false,"footnotes":""},"categories":[111,108,109],"tags":[],"acf":[],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v22.7 - https:\/\/yoast.com\/wordpress\/plugins\/seo\/ -->\n<title>A deep dive into Apache Cassandra - Part 1: Data Structure (was not continued) - CraftCoders.app<\/title>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/craftcoders.app\/a-deep-dive-into-apache-cassandra-part-1-data-structure\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"A deep dive into Apache Cassandra - Part 1: Data Structure (was not continued) - CraftCoders.app\" \/>\n<meta property=\"og:description\" content=\"Hey guys, during my studies I had to analyze the NoSQL database Cassandra as a possible replacement for a regular relational database.During my research I dove really deep into the architecture and the data model of Cassandra and I figured that someone may profit from my previous research, maybe for your own evaluation process of Cassandra or just personal curiosity. ... 
Read More\" \/>\n<meta property=\"og:url\" content=\"https:\/\/craftcoders.app\/a-deep-dive-into-apache-cassandra-part-1-data-structure\/\" \/>\n<meta property=\"og:site_name\" content=\"CraftCoders.app\" \/>\n<meta property=\"article:published_time\" content=\"2018-10-01T19:14:56+00:00\" \/>\n<meta property=\"article:modified_time\" content=\"2024-08-14T12:27:53+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/craftcoders.app\/wp-content\/uploads\/2018\/10\/Cassandra_Stratford_Gallery.jpg\" \/>\n\t<meta property=\"og:image:width\" content=\"575\" \/>\n\t<meta property=\"og:image:height\" content=\"478\" \/>\n\t<meta property=\"og:image:type\" content=\"image\/jpeg\" \/>\n<meta name=\"author\" content=\"Leon Gottschick\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"Leon Gottschick\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"5 minutes\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\/\/schema.org\",\"@graph\":[{\"@type\":\"WebPage\",\"@id\":\"https:\/\/craftcoders.app\/a-deep-dive-into-apache-cassandra-part-1-data-structure\/\",\"url\":\"https:\/\/craftcoders.app\/a-deep-dive-into-apache-cassandra-part-1-data-structure\/\",\"name\":\"A deep dive into Apache Cassandra - Part 1: Data Structure (was not continued) - 
CraftCoders.app\",\"isPartOf\":{\"@id\":\"https:\/\/craftcoders.app\/#website\"},\"primaryImageOfPage\":{\"@id\":\"https:\/\/craftcoders.app\/a-deep-dive-into-apache-cassandra-part-1-data-structure\/#primaryimage\"},\"image\":{\"@id\":\"https:\/\/craftcoders.app\/a-deep-dive-into-apache-cassandra-part-1-data-structure\/#primaryimage\"},\"thumbnailUrl\":\"https:\/\/craftcoders.app\/wp-content\/uploads\/2018\/10\/Cassandra_Stratford_Gallery.jpg\",\"datePublished\":\"2018-10-01T19:14:56+00:00\",\"dateModified\":\"2024-08-14T12:27:53+00:00\",\"author\":{\"@id\":\"https:\/\/craftcoders.app\/#\/schema\/person\/5ece49c0f004fa0def05eabd3e877f46\"},\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\/\/craftcoders.app\/a-deep-dive-into-apache-cassandra-part-1-data-structure\/\"]}]},{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/craftcoders.app\/a-deep-dive-into-apache-cassandra-part-1-data-structure\/#primaryimage\",\"url\":\"https:\/\/craftcoders.app\/wp-content\/uploads\/2018\/10\/Cassandra_Stratford_Gallery.jpg\",\"contentUrl\":\"https:\/\/craftcoders.app\/wp-content\/uploads\/2018\/10\/Cassandra_Stratford_Gallery.jpg\",\"width\":575,\"height\":478,\"caption\":\"cmp3.10.3.1Lq3 0xf5fffab1\"},{\"@type\":\"WebSite\",\"@id\":\"https:\/\/craftcoders.app\/#website\",\"url\":\"https:\/\/craftcoders.app\/\",\"name\":\"CraftCoders.app\",\"description\":\"Jira and Confluence apps\",\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\/\/craftcoders.app\/?s={search_term_string}\"},\"query-input\":\"required name=search_term_string\"}],\"inLanguage\":\"en-US\"},{\"@type\":\"Person\",\"@id\":\"https:\/\/craftcoders.app\/#\/schema\/person\/5ece49c0f004fa0def05eabd3e877f46\",\"name\":\"Leon 
Gottschick\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/craftcoders.app\/#\/schema\/person\/image\/\",\"url\":\"https:\/\/secure.gravatar.com\/avatar\/a3d3933320fa0864064ffa224e524439?s=96&d=mm&r=g\",\"contentUrl\":\"https:\/\/secure.gravatar.com\/avatar\/a3d3933320fa0864064ffa224e524439?s=96&d=mm&r=g\",\"caption\":\"Leon Gottschick\"},\"url\":\"https:\/\/craftcoders.app\/author\/leon\/\"}]}<\/script>\n<!-- \/ Yoast SEO plugin. -->","yoast_head_json":{"title":"A deep dive into Apache Cassandra - Part 1: Data Structure (was not continued) - CraftCoders.app","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/craftcoders.app\/a-deep-dive-into-apache-cassandra-part-1-data-structure\/","og_locale":"en_US","og_type":"article","og_title":"A deep dive into Apache Cassandra - Part 1: Data Structure (was not continued) - CraftCoders.app","og_description":"Hey guys, during my studies I had to analyze the NoSQL database Cassandra as a possible replacement for a regular relational database.During my research I dove really deep into the architecture and the data model of Cassandra and I figured that someone may profit from my previous research, maybe for your own evaluation process of Cassandra or just personal curiosity. ... Read More","og_url":"https:\/\/craftcoders.app\/a-deep-dive-into-apache-cassandra-part-1-data-structure\/","og_site_name":"CraftCoders.app","article_published_time":"2018-10-01T19:14:56+00:00","article_modified_time":"2024-08-14T12:27:53+00:00","og_image":[{"width":575,"height":478,"url":"https:\/\/craftcoders.app\/wp-content\/uploads\/2018\/10\/Cassandra_Stratford_Gallery.jpg","type":"image\/jpeg"}],"author":"Leon Gottschick","twitter_card":"summary_large_image","twitter_misc":{"Written by":"Leon Gottschick","Est. 
reading time":"5 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"WebPage","@id":"https:\/\/craftcoders.app\/a-deep-dive-into-apache-cassandra-part-1-data-structure\/","url":"https:\/\/craftcoders.app\/a-deep-dive-into-apache-cassandra-part-1-data-structure\/","name":"A deep dive into Apache Cassandra - Part 1: Data Structure (was not continued) - CraftCoders.app","isPartOf":{"@id":"https:\/\/craftcoders.app\/#website"},"primaryImageOfPage":{"@id":"https:\/\/craftcoders.app\/a-deep-dive-into-apache-cassandra-part-1-data-structure\/#primaryimage"},"image":{"@id":"https:\/\/craftcoders.app\/a-deep-dive-into-apache-cassandra-part-1-data-structure\/#primaryimage"},"thumbnailUrl":"https:\/\/craftcoders.app\/wp-content\/uploads\/2018\/10\/Cassandra_Stratford_Gallery.jpg","datePublished":"2018-10-01T19:14:56+00:00","dateModified":"2024-08-14T12:27:53+00:00","author":{"@id":"https:\/\/craftcoders.app\/#\/schema\/person\/5ece49c0f004fa0def05eabd3e877f46"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/craftcoders.app\/a-deep-dive-into-apache-cassandra-part-1-data-structure\/"]}]},{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/craftcoders.app\/a-deep-dive-into-apache-cassandra-part-1-data-structure\/#primaryimage","url":"https:\/\/craftcoders.app\/wp-content\/uploads\/2018\/10\/Cassandra_Stratford_Gallery.jpg","contentUrl":"https:\/\/craftcoders.app\/wp-content\/uploads\/2018\/10\/Cassandra_Stratford_Gallery.jpg","width":575,"height":478,"caption":"cmp3.10.3.1Lq3 0xf5fffab1"},{"@type":"WebSite","@id":"https:\/\/craftcoders.app\/#website","url":"https:\/\/craftcoders.app\/","name":"CraftCoders.app","description":"Jira and Confluence apps","potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/craftcoders.app\/?s={search_term_string}"},"query-input":"required 
name=search_term_string"}],"inLanguage":"en-US"},{"@type":"Person","@id":"https:\/\/craftcoders.app\/#\/schema\/person\/5ece49c0f004fa0def05eabd3e877f46","name":"Leon Gottschick","image":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/craftcoders.app\/#\/schema\/person\/image\/","url":"https:\/\/secure.gravatar.com\/avatar\/a3d3933320fa0864064ffa224e524439?s=96&d=mm&r=g","contentUrl":"https:\/\/secure.gravatar.com\/avatar\/a3d3933320fa0864064ffa224e524439?s=96&d=mm&r=g","caption":"Leon Gottschick"},"url":"https:\/\/craftcoders.app\/author\/leon\/"}]}},"_links":{"self":[{"href":"https:\/\/craftcoders.app\/wp-json\/wp\/v2\/posts\/749"}],"collection":[{"href":"https:\/\/craftcoders.app\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/craftcoders.app\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/craftcoders.app\/wp-json\/wp\/v2\/users\/7"}],"replies":[{"embeddable":true,"href":"https:\/\/craftcoders.app\/wp-json\/wp\/v2\/comments?post=749"}],"version-history":[{"count":2,"href":"https:\/\/craftcoders.app\/wp-json\/wp\/v2\/posts\/749\/revisions"}],"predecessor-version":[{"id":2341,"href":"https:\/\/craftcoders.app\/wp-json\/wp\/v2\/posts\/749\/revisions\/2341"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/craftcoders.app\/wp-json\/wp\/v2\/media\/2340"}],"wp:attachment":[{"href":"https:\/\/craftcoders.app\/wp-json\/wp\/v2\/media?parent=749"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/craftcoders.app\/wp-json\/wp\/v2\/categories?post=749"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/craftcoders.app\/wp-json\/wp\/v2\/tags?post=749"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}