The stable build of Pentaho Community Edition (CE) 5.0 has been released, and many new features have made it into this build. I'm particularly keen to try out the enhancements to web services deployment via Carte. More details can be found in the Pentaho 5.0 release notes.
Insights on Java, Big Data, Search, Cloud, Algorithms, Data Science, Machine Learning...
Sunday, November 24, 2013
Saturday, November 16, 2013
Pentaho Clusters
Pentaho provides the option to scale out Kettle transformations via Pentaho clusters. Setting up a Pentaho cluster, including an elastic/dynamic one, is fairly straightforward. The 1-2-3 of it is:
1. Start the Carte Instances
There are two kinds of instances: Masters and Slaves. At least one instance must act as the dedicated Master, which is responsible for managing and distributing transformations/steps to the slaves, handling fail-over/restart, and communicating with the slaves.
The Carte instances use a config file for details such as the Master's hostname/IP and port. For sample config files, take a look at the pwd folder in your default Pentaho installation (/data-integration/pwd).
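As a rough sketch of what such config files contain, the script below writes one master config and two slave configs for localhost. The element names (slave_config, masters, report_to_masters, slaveserver, master) and the default cluster/cluster credentials are modeled on the samples in data-integration/pwd as I remember them, so compare them against the files shipped with your version before relying on this. The file names and the HOST, MASTER_PORT, SLAVE_COUNT, CARTE_USER and CARTE_PASS variables are my own, hypothetical choices.

```bash
#!/usr/bin/env bash
# Hypothetical helper: writes one master and SLAVE_COUNT slave Carte configs.
# Element names are modeled on the samples in data-integration/pwd (assumption);
# check them against the samples in your own installation.
set -euo pipefail

HOST="${HOST:-localhost}"
MASTER_PORT="${MASTER_PORT:-8080}"
SLAVE_COUNT="${SLAVE_COUNT:-2}"
CARTE_USER="${CARTE_USER:-cluster}"   # assumed default Carte user
CARTE_PASS="${CARTE_PASS:-cluster}"   # assumed default Carte password

# Master config: describes this instance and flags it as the master.
cat > "carte-config-master-${MASTER_PORT}.xml" <<EOF
<slave_config>
  <slaveserver>
    <name>master1</name>
    <hostname>${HOST}</hostname>
    <port>${MASTER_PORT}</port>
    <master>Y</master>
  </slaveserver>
</slave_config>
EOF

# Slave configs: each lists the master, asks to report to it,
# and describes itself on the next free port after the master.
for i in $(seq 1 "$SLAVE_COUNT"); do
  port=$((MASTER_PORT + i))
  cat > "carte-config-${port}.xml" <<EOF
<slave_config>
  <masters>
    <slaveserver>
      <name>master1</name>
      <hostname>${HOST}</hostname>
      <port>${MASTER_PORT}</port>
      <username>${CARTE_USER}</username>
      <password>${CARTE_PASS}</password>
      <master>Y</master>
    </slaveserver>
  </masters>
  <report_to_masters>Y</report_to_masters>
  <slaveserver>
    <name>slave${i}-${port}</name>
    <hostname>${HOST}</hostname>
    <port>${port}</port>
    <username>${CARTE_USER}</username>
    <password>${CARTE_PASS}</password>
    <master>N</master>
  </slaveserver>
</slave_config>
EOF
done
```

Each instance could then be started with its own file, e.g. ./carte.sh carte-config-master-8080.xml for the master and ./carte.sh carte-config-8081.xml for the first slave. As I understand it, report_to_masters set to Y is what lets a slave announce itself to the master, which matters for the dynamic clusters in step 2.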
For example, with the defaults, a cluster can be started on localhost by running each of these in its own terminal:
./carte.sh localhost 8080    # master
./carte.sh localhost 8081    # slave1
./carte.sh localhost 8082    # slave2, and so on
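Once the instances are started, a quick way to confirm that each one is answering is to request its status page. The sketch below assumes Carte's /kettle/status/ path and the default cluster/cluster credentials, and needs curl installed; CARTE_HOST, CARTE_PORTS, CARTE_AUTH and carte_up are hypothetical names.

```bash
#!/usr/bin/env bash
# Sketch: check which Carte instances answer on their status page.
# Assumes the /kettle/status/ path and default credentials (cluster:cluster).
CARTE_HOST="${CARTE_HOST:-localhost}"
CARTE_PORTS="${CARTE_PORTS:-8080 8081 8082}"
CARTE_AUTH="${CARTE_AUTH:-cluster:cluster}"

carte_up() {
  # $1 = port; succeeds only if the status page returns a non-error HTTP response
  curl -fsS --max-time 5 -u "$CARTE_AUTH" \
    "http://${CARTE_HOST}:$1/kettle/status/" > /dev/null 2>&1
}

for port in $CARTE_PORTS; do
  if carte_up "$port"; then
    echo "port ${port}: up"
  else
    echo "port ${port}: not reachable"
  fi
done
```

Using curl's -f flag makes an HTTP error (such as a 401 from wrong credentials) count as "not reachable", so a port that is listening but rejecting the login is flagged too.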
2. Set up Cluster & Server Information using Spoon (GUI)
In the left-hand panel of the Spoon GUI, switch to the View tab (next to the Design tab).
Click on 'Slave Servers' to add new Slave servers (host, port, name, etc.). Make sure to check the 'is_the_master' checkbox for the Master server.
Next, click 'Kettle Cluster Schemas' and use 'Select Slave servers' to choose the slave servers. To be able to add/remove slave servers dynamically, also check the 'Dynamic Cluster' checkbox.
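As I understand it, with 'Dynamic Cluster' checked the slaves are taken from those that have registered with the master, rather than only from the fixed list. To see which slaves the master currently knows about, something like the sketch below could be used. It assumes the master exposes a /kettle/getSlaves/ servlet and uses the default cluster/cluster credentials; MASTER_URL, CARTE_AUTH and list_registered_slaves are hypothetical names, and curl must be installed.

```bash
#!/usr/bin/env bash
# Sketch: ask the master which slaves have registered with it.
# Assumes a /kettle/getSlaves/ servlet and default credentials (cluster:cluster).
MASTER_URL="${MASTER_URL:-http://localhost:8080}"
CARTE_AUTH="${CARTE_AUTH:-cluster:cluster}"

list_registered_slaves() {
  # Prints the master's response body; fails if the master is unreachable
  # or returns an HTTP error.
  curl -fsS --max-time 5 -u "$CARTE_AUTH" "${MASTER_URL}/kettle/getSlaves/"
}

if ! list_registered_slaves; then
  echo "Could not reach the master at ${MASTER_URL}" >&2
fi
```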
3. Mark Transformation Steps to Execute in Cluster Mode
Right-click the step that should run in cluster mode, select Clustering, and then select the cluster schema. A symbol (CxN) now appears next to the step, indicating that it will be executed in clustered mode.
The cluster settings will look similar to the left panel of the image below. The right panel shows a transformation with two steps (Random and Replace in String) running in clustered mode.