Difference between RDD , DF and DS in Spark

Knoldus Blogs

In this blog, I try to cover the differences between RDD, DF, and DS. Many of you may be a little confused about RDD, DF, and DS, so don't worry: after this blog, everything will be clear.

With the Spark 2.0 release, there are three types of data abstractions that Spark officially provides: RDD, DataFrame, and Dataset.

So let's start with some discussion about them.

Resilient Distributed Datasets (RDDs) – An RDD is a fault-tolerant collection of elements that can be operated on in parallel.
With an RDD, we can perform operations on data across the different nodes of the same cluster in parallel, which helps improve performance.

How can we create an RDD?

The Spark context (sc) helps to create an RDD in Spark. It can create an RDD from –

  1. an external storage system such as HDFS, HBase, or any data source offering a Hadoop InputFormat.
  2. parallelizing an…
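The two creation paths above can be sketched as follows. This is a minimal local example, not from the original post; the HDFS path shown in the comment is hypothetical.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object RddCreation {
  def main(args: Array[String]): Unit = {
    // A local SparkContext for illustration; in a cluster you would not use local[*]
    val conf = new SparkConf().setAppName("rdd-creation").setMaster("local[*]")
    val sc   = new SparkContext(conf)

    // 1. From an external storage system (the path here is hypothetical):
    // val fromFile = sc.textFile("hdfs://namenode:8020/data/input.txt")

    // 2. By parallelizing an existing collection in the driver program:
    val fromCollection = sc.parallelize(Seq(1, 2, 3, 4, 5))

    // Each partition's elements are doubled and summed in parallel
    println(fromCollection.map(_ * 2).reduce(_ + _)) // prints 30

    sc.stop()
  }
}
```

Operations like `map` run on the partitions distributed across the cluster's nodes, which is where the parallelism comes from.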

View original post 659 more words

Automatic Deployment Of Lagom Service On ConductR On DCOS

Knoldus Blogs

In our previous blog, we saw how to deploy a Lagom service on ConductR – https://blog.knoldus.com/2017/05/25/deploying-the-lagom-service-on-conductr/

In this blog, we will create a script to deploy your Lagom service on ConductR running on top of DCOS.

Two types of automatic deployment can be done –

  1. Deploying from scratch (stop the current Lagom bundle and deploy the new one)
  2. Rolling/incremental deployment (overriding the already running Lagom bundle)

Note – Currently, automatic deployment on ConductR running on DCOS requires the Enterprise version of the DCOS cluster setup rather than the open-source one. Another option is to disable authentication.

  • Deploying From Scratch –

In this approach, one needs to stop and unload the running bundle and then run the script below.
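The stop/unload/load/run sequence might look like this minimal sketch. The bundle name and zip path are hypothetical placeholders, and this is not the original post's script (that is behind the "View original post" link); it falls back to a dry run when the ConductR CLI is not installed.

```shell
#!/usr/bin/env sh
# Hypothetical bundle name and path; substitute your own.
BUNDLE_NAME="my-lagom-service"
BUNDLE_ZIP="target/bundle/my-lagom-service-v1.0.zip"

# Fall back to a dry run (echo the commands) when the ConductR CLI is absent.
CONDUCT="conduct"
command -v conduct >/dev/null 2>&1 || CONDUCT="echo conduct"

# 1. Stop and unload the currently running bundle
$CONDUCT stop   "$BUNDLE_NAME"
$CONDUCT unload "$BUNDLE_NAME"

# 2. Load and run the freshly built bundle
$CONDUCT load "$BUNDLE_ZIP"
$CONDUCT run  "$BUNDLE_NAME"
```

In a CI pipeline, this script would run after `sbt bundle:dist` produces the new bundle zip (that build step is also an assumption, not from the excerpt above).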

The script and its details are as follows –

View original post 360 more words