Iris MultiClass Classification
The following code shows how TransmogrifAI can be used to perform multiclass classification on the Iris dataset. The code for this example can be found here, and the data here.
Define Features
import com.salesforce.op.features.FeatureBuilder
import com.salesforce.op.features.types._

val id = FeatureBuilder.Integral[Iris].extract(_.getID.toIntegral).asPredictor
val sepalLength = FeatureBuilder.Real[Iris].extract(_.getSepalLength.toReal).asPredictor
val sepalWidth = FeatureBuilder.Real[Iris].extract(_.getSepalWidth.toReal).asPredictor
val petalLength = FeatureBuilder.Real[Iris].extract(_.getPetalLength.toReal).asPredictor
val petalWidth = FeatureBuilder.Real[Iris].extract(_.getPetalWidth.toReal).asPredictor
val irisClass = FeatureBuilder.Text[Iris].extract(_.getClass$.toText).asResponse
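Each feature above is extracted from a typed Iris record through its getters (getSepalLength, getClass$, and so on). As a rough illustration only, the plain-Scala sketch below shows how one comma-separated row of a UCI-style iris.data file (for example 5.1,3.5,1.4,0.2,Iris-setosa) might map onto such a record. The IrisRow case class and IrisParser helper are hypothetical, and numbering rows by their position in the file is an assumption, not necessarily how the example's reader assigns IDs.

```scala
// Hypothetical plain-Scala stand-in for the example's Iris record type
final case class IrisRow(
  id: Int,
  sepalLength: Double,
  sepalWidth: Double,
  petalLength: Double,
  petalWidth: Double,
  irisClass: String
)

object IrisParser {
  // Parses one line such as "5.1,3.5,1.4,0.2,Iris-setosa".
  // The id is assumed to come from the row's position in the file.
  // Returns None for blank or malformed lines.
  def parse(line: String, id: Int): Option[IrisRow] =
    line.trim.split(",").map(_.trim) match {
      case Array(sl, sw, pl, pw, cls) if cls.nonEmpty =>
        try Some(IrisRow(id, sl.toDouble, sw.toDouble, pl.toDouble, pw.toDouble, cls))
        catch { case _: NumberFormatException => None }
      case _ => None
    }

  // Skips blank lines, then numbers the remaining rows from 0 in file order
  def parseAll(lines: Seq[String]): Seq[IrisRow] =
    lines.filter(_.trim.nonEmpty).zipWithIndex.flatMap { case (line, i) => parse(line, i) }
}
```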
Feature Engineering
val labels = irisClass.indexed()
val features = Seq(sepalLength, sepalWidth, petalLength, petalWidth).transmogrify()
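Here indexed() turns the text response into a numeric label, and transmogrify() combines the four Real predictors into a single feature vector. The sketch below imitates both steps in plain Scala to make the idea concrete. It assumes frequency-ordered indexing (the most frequent class gets 0.0, ties broken alphabetically), in the style of Spark's StringIndexer, and assumes each real-valued column is mean-imputed and followed by a null-indicator column. The FeatureSketch helpers are hypothetical, and TransmogrifAI's actual vector layout and metadata may differ.

```scala
// Hypothetical sketch of label indexing and numeric vectorization; not TransmogrifAI's implementation
object FeatureSketch {
  // Assigns indices by descending label frequency, breaking ties alphabetically,
  // so the most common label maps to 0.0 (assumed ordering)
  def fitLabelIndex(labels: Seq[String]): Map[String, Double] =
    labels
      .groupBy(identity)
      .map { case (label, occurrences) => (label, occurrences.size) }
      .toSeq
      .sortBy { case (label, count) => (-count, label) }
      .zipWithIndex
      .map { case ((label, _), idx) => label -> idx.toDouble }
      .toMap

  // Per-column mean over non-missing values (0.0 if a column is entirely missing)
  def fitMeans(rows: Seq[Seq[Option[Double]]]): Seq[Double] =
    rows.transpose.map { column =>
      val present = column.flatten
      if (present.isEmpty) 0.0 else present.sum / present.size
    }

  // For each column, emits the value (mean-filled if missing) followed by a 1.0/0.0 null indicator
  def vectorize(row: Seq[Option[Double]], means: Seq[Double]): Vector[Double] =
    row.zip(means).flatMap {
      case (Some(v), _) => Vector(v, 0.0)
      case (None, mean) => Vector(mean, 1.0)
    }.toVector
}
```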
Modeling & Evaluation
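import com.salesforce.op._
import com.salesforce.op.evaluators.Evaluators
import com.salesforce.op.stages.impl.classification.MultiClassificationModelSelector
import com.salesforce.op.stages.impl.tuning.DataCutter

// Select a model by cross-validation, reserving 20% of the data as a test set
// (randomSeed is defined elsewhere in the example source)
val pred = MultiClassificationModelSelector
  .withCrossValidation(splitter = Some(DataCutter(reserveTestFraction = 0.2, seed = randomSeed)), seed = randomSeed)
  .setInput(labels, features).getOutput()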
private val evaluator = Evaluators.MultiClassification.f1()
  .setLabelCol(labels)
  .setPredictionCol(pred)
private val wf = new OpWorkflow().setResultFeatures(pred, labels)
// irisReader is defined elsewhere in the example source;
// the same reader is used for training, scoring, and evaluation
def runner(opParams: OpParams): OpWorkflowRunner =
  new OpWorkflowRunner(
    workflow = wf,
    trainingReader = irisReader,
    scoringReader = irisReader,
    evaluationReader = Option(irisReader),
    evaluator = Option(evaluator),
    featureToComputeUpTo = Option(features)
  )
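The model selector uses DataCutter(reserveTestFraction = 0.2, ...) to hold out 20% of the data for testing and cross-validates candidate models on the rest. The plain-Scala sketch below illustrates only that data flow: a seeded holdout split followed by round-robin assignment of the remaining rows to k folds. The SplitSketch helpers, the shuffle-based split, and the fold assignment are assumptions for illustration; TransmogrifAI's own splitting (DataCutter may, for instance, also filter label categories) can work differently.

```scala
import scala.util.Random

// Hypothetical illustration of a holdout split followed by k-fold cross-validation
object SplitSketch {
  // Shuffles row indices with a fixed seed and reserves a fraction as the test set.
  // Returns (trainingIndices, testIndices).
  def holdoutSplit(n: Int, reserveTestFraction: Double, seed: Long): (Vector[Int], Vector[Int]) = {
    require(reserveTestFraction >= 0.0 && reserveTestFraction < 1.0, "fraction must be in [0, 1)")
    val shuffled = new Random(seed).shuffle((0 until n).toVector)
    val testSize = math.round(n * reserveTestFraction).toInt
    (shuffled.drop(testSize), shuffled.take(testSize))
  }

  // Deals training indices round-robin into k folds; each fold is the validation set once.
  // Returns one (fitIndices, validationIndices) pair per fold.
  def kFolds(train: Vector[Int], k: Int): Seq[(Vector[Int], Vector[Int])] = {
    require(k >= 2 && k <= train.size, "need 2 <= k <= number of training rows")
    val foldOf = train.zipWithIndex.map { case (row, pos) => row -> pos % k }
    (0 until k).map { fold =>
      val validation = foldOf.collect { case (row, f) if f == fold => row }
      val fit = foldOf.collect { case (row, f) if f != fold => row }
      (fit, validation)
    }
  }
}
```

With 150 rows and reserveTestFraction = 0.2, this sketch keeps 120 rows for cross-validation and 30 for the final test; the fixed seed makes the split reproducible across runs.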
You can run the code with the following commands. First build the project, then train, score, and evaluate:
cd helloworld
./gradlew compileTestScala installDist
Train
./gradlew -q sparkSubmit -Dmain=com.salesforce.hw.iris.OpIris -Dargs="\
--run-type=train \
--model-location=/tmp/iris-model \
--read-location Iris=`pwd`/src/main/resources/IrisDataset/iris.data"
Score
./gradlew -q sparkSubmit -Dmain=com.salesforce.hw.iris.OpIris -Dargs="\
--run-type=score \
--model-location=/tmp/iris-model \
--read-location Iris=`pwd`/src/main/resources/IrisDataset/bezdekIris.data \
--write-location=/tmp/iris-scores"
Evaluate
./gradlew -q sparkSubmit -Dmain=com.salesforce.hw.iris.OpIris -Dargs="\
--run-type=evaluate \
--model-location=/tmp/iris-model \
--metrics-location=/tmp/iris-metrics \
--read-location Iris=`pwd`/src/main/resources/IrisDataset/bezdekIris.data \
--write-location=/tmp/iris-eval"
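The evaluate run applies the Evaluators.MultiClassification.f1() evaluator defined in the modeling section, with --metrics-location pointing at where the metrics go. To show what a multiclass F1 score summarizes, the sketch below computes per-class F1 and averages it weighted by each class's support (its count among the true labels). Support weighting is an assumption, modeled on Spark's MulticlassMetrics.weightedFMeasure; the F1Sketch helper is hypothetical and not TransmogrifAI's evaluator code.

```scala
// Hypothetical sketch of support-weighted multiclass F1
object F1Sketch {
  // Per-class F1 = 2 * precision * recall / (precision + recall), taken as 0.0 when undefined
  def classF1(truth: Seq[Double], predicted: Seq[Double], label: Double): Double = {
    val pairs = truth.zip(predicted)
    val tp = pairs.count { case (t, p) => t == label && p == label }
    val fp = pairs.count { case (t, p) => t != label && p == label }
    val fn = pairs.count { case (t, p) => t == label && p != label }
    val precision = if (tp + fp == 0) 0.0 else tp.toDouble / (tp + fp)
    val recall = if (tp + fn == 0) 0.0 else tp.toDouble / (tp + fn)
    if (precision + recall == 0.0) 0.0 else 2 * precision * recall / (precision + recall)
  }

  // Averages per-class F1, weighting each class by its share of the true labels
  def weightedF1(truth: Seq[Double], predicted: Seq[Double]): Double = {
    require(truth.size == predicted.size && truth.nonEmpty, "need equal-length, non-empty inputs")
    truth.groupBy(identity).map { case (label, occurrences) =>
      classF1(truth, predicted, label) * occurrences.size / truth.size
    }.sum
  }
}
```

For example, with true labels 0, 0, 1, 1, 2, 2 and predictions 0, 1, 1, 1, 2, 2, class 0 has F1 = 2/3, class 1 has F1 = 0.8, class 2 has F1 = 1.0, and each class carries weight 1/3.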