Iris MultiClass Classification

The following code illustrates how TransmogrifAI can be used to classify multiple classes over the Iris dataset. This example is very similar to the Titanic Binary Classification example, so look over that example first if you have not already. The code for this example can be found here, and the data here.

Data Schema

case class Iris
(
  id: Int,
  sepalLength: Double,
  sepalWidth: Double,
  petalLength: Double,
  petalWidth: Double,
  irisClass: String
)

Define Features

// The four measurements are the predictors; each is read from the matching field of the Iris case class
val sepalLength = FeatureBuilder.Real[Iris].extract(_.sepalLength.toReal).asPredictor
val sepalWidth = FeatureBuilder.Real[Iris].extract(_.sepalWidth.toReal).asPredictor
val petalLength = FeatureBuilder.Real[Iris].extract(_.petalLength.toReal).asPredictor
val petalWidth = FeatureBuilder.Real[Iris].extract(_.petalWidth.toReal).asPredictor
// The species name is the response (the value to predict)
val irisClass = FeatureBuilder.Text[Iris].extract(_.irisClass.toText).asResponse

Feature Engineering

val features = Seq(sepalLength, sepalWidth, petalLength, petalWidth).transmogrify()
val label = irisClass.indexed()
val checkedFeatures = label.sanityCheck(features, removeBadFeatures = true)
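
The call to indexed() on irisClass turns the text label into a numeric one, which the model selector needs. The sketch below shows the general idea in plain Scala, assuming a frequency-ordered scheme in which the most common label gets index 0.0. The object LabelIndexSketch, the helper fitIndex, and the sample labels are hypothetical and are not part of TransmogrifAI; the library's exact ordering and handling of unseen labels may differ.

```scala
object LabelIndexSketch {
  // Hypothetical helper: most frequent label gets 0.0, the next 1.0, and so on.
  // Ties are broken alphabetically so the result is deterministic.
  def fitIndex(labels: Seq[String]): Map[String, Double] =
    labels.groupBy(identity).toSeq
      .map { case (label, occurrences) => (label, occurrences.size) }
      .sortBy { case (label, count) => (-count, label) }
      .zipWithIndex
      .map { case ((label, _), i) => label -> i.toDouble }
      .toMap

  // Illustrative labels only, not read from the dataset
  val labels: Seq[String] = Seq(
    "Iris-setosa", "Iris-virginica", "Iris-setosa",
    "Iris-versicolor", "Iris-virginica", "Iris-setosa")

  val index: Map[String, Double] = fitIndex(labels)

  // Each text label replaced by its numeric index
  val encoded: Seq[Double] = labels.map(index)

  def main(args: Array[String]): Unit = {
    println(index)
    println(encoded)
  }
}
```

With these sample labels, "Iris-setosa" appears three times and is mapped to 0.0, "Iris-virginica" (twice) to 1.0, and "Iris-versicolor" (once) to 2.0.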

Modeling & Evaluation

In MultiClass Classification, we use the MultiClassificationModelSelector to select the model to train. Here the candidate model types are restricted to Logistic Regression. You can find more information on model selection here.

val prediction = MultiClassificationModelSelector
  .withTrainValidationSplit(
    modelTypesToUse = Seq(OpLogisticRegression))
  .setInput(label, checkedFeatures).getOutput()

val evaluator = Evaluators.MultiClassification()
  .setLabelCol(label)
  .setPredictionCol(prediction)

// dataReader is the reader for the Iris data; its definition is not shown in this walkthrough
val workflow = new OpWorkflow()
  .setResultFeatures(prediction, label)
  .setReader(dataReader)

val model = workflow.train()

Results

As in the binary case, we can find the contribution of each feature to the model. In MultiClass Classification, however, ModelInsights reports one contribution per feature for each class. The code below takes the maximum absolute value of these per-class contributions as the overall contribution of a feature and sorts the features by it in descending order.

val modelInsights = model.modelInsights(prediction)
// Flatten the raw features into the derived features the model was actually trained on
val modelFeatures = modelInsights.features.flatMap(feature => feature.derivedFeatures)
// For each derived feature, keep the largest absolute contribution across all classes
val featureContributions = modelFeatures.map(feature => (feature.derivedFeatureName,
  feature.contribution.map(contribution => math.abs(contribution))
    .foldLeft(0.0) { (max, contribution) => math.max(max, contribution) }))
// Sort by overall contribution, largest first
val sortedContributions = featureContributions.sortBy(contribution => -contribution._2)

val (scores, metrics) = model.scoreAndEvaluate(evaluator = evaluator)
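
The max-of-absolute-values aggregation used for the feature contributions above can be tried without Spark. The following is a minimal sketch in plain Scala; the object ContributionSketch, the helper maxAbs, and the contribution numbers are all made up for illustration and do not come from a trained model or from the TransmogrifAI API.

```scala
object ContributionSketch {
  // Hypothetical per-class contributions: one value per class for each feature
  val perClass: Seq[(String, Seq[Double])] = Seq(
    "sepalLength" -> Seq(0.1, -0.2, 0.05),
    "petalLength" -> Seq(0.9, -1.4, 0.2),
    "petalWidth"  -> Seq(-0.3, 0.5, 1.1)
  )

  // Largest absolute value in the sequence; 0.0 when the sequence is empty
  def maxAbs(contributions: Seq[Double]): Double =
    contributions.foldLeft(0.0) { (max, c) => math.max(max, math.abs(c)) }

  // Overall contribution per feature, sorted from largest to smallest
  val ranked: Seq[(String, Double)] =
    perClass
      .map { case (name, contributions) => name -> maxAbs(contributions) }
      .sortBy { case (_, score) => -score }

  def main(args: Array[String]): Unit =
    ranked.foreach { case (name, score) => println(s"$name: $score") }
}
```

Taking the absolute value first means a feature that pushes strongly against one class ranks as high as one that pushes strongly toward it. In this sample, petalLength ranks first because of its -1.4 contribution to the second class.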

You can run the code using the following command:

cd helloworld
./gradlew compileTestScala installDist
./gradlew -q sparkSubmit -Dmain=com.salesforce.hw.OpIrisSimple -Dargs="\
`pwd`/src/main/resources/IrisDataset/iris.csv"