Example 2: Multiclass Classification on Glass Identification Dataset

This example demonstrates the low-level API of the Neuro-Fuzzy Toolbox on the Glass Identification dataset, a nine-feature, six-class classification benchmark. With nine inputs, rule_reduced_ANFIS is a better fit than classical ANFIS. A grid-partitioned ANFIS creates one rule for every combination of input membership functions, so its rule count grows exponentially with the number of features. The rule base of rule_reduced_ANFIS, by contrast, is sized independently of the input dimension.
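To put numbers on the combinatorial growth, the short sketch below counts rules for a grid-partitioned ANFIS with nine inputs. It assumes classical ANFIS uses the standard grid partition (every combination of one membership function per input becomes a rule); `grid_anfis_rules` is a hypothetical helper, not part of the toolbox. Even two membership functions per input give 2^9 = 512 rules, and five give 1,953,125, whereas the rule_reduced_ANFIS model in this example starts from 5 rules (num_mfs=5).

```python
def grid_anfis_rules(n_inputs, mfs_per_input):
    """Rule count of a grid-partitioned ANFIS: one rule per MF combination."""
    return mfs_per_input ** n_inputs


n_inputs = 9  # features in the Glass Identification dataset
for m in (2, 3, 5):
    print(f"{m} MFs per input -> {grid_anfis_rules(n_inputs, m):,} rules")
```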

The example combines an initial gradient-based training phase with a custom greedy rule-growing procedure that iteratively expands the rule base by targeting the worst-performing class at each step. This is the same workflow described in the Custom Training section.

Imports and reproducibility

from ucimlrepo import fetch_ucirepo

from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler, LabelEncoder
from sklearn.metrics import (
    confusion_matrix, f1_score, precision_score,
    recall_score, accuracy_score, classification_report
)

import torch
import torch.nn as nn
import torch.utils.data as data
import numpy as np
import random

import neuro_fuzzy_toolbox as nft

SEED = 0
random.seed(SEED)
np.random.seed(SEED)
torch.manual_seed(SEED)

Data

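The original class labels (1, 2, 3, 5, 6, and 7; the dataset contains no samples of type 4) are re-encoded with LabelEncoder into contiguous integer indices 0 to 5, as required by CrossEntropyLoss. The data are first split 70/30 into a training pool and a test set, and 20% of the pool is then held out for validation, giving roughly 56% training, 14% validation, and 30% test data. Both splits are stratified by class.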

glass_identification = fetch_ucirepo(id=42)

X = glass_identification.data.features
y = glass_identification.data.targets

le = LabelEncoder()
y.loc[:, 'Type_of_glass'] = le.fit_transform(y['Type_of_glass'])
y = y.astype('int64')

x_train, x_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=SEED
)
x_train, x_val, y_train, y_val = train_test_split(
    x_train, y_train, test_size=0.2, stratify=y_train, random_state=SEED
)

scaler = MinMaxScaler(feature_range=(0, 1))

x_train = torch.from_numpy(scaler.fit_transform(x_train)).to(torch.float32)
x_val   = torch.from_numpy(scaler.transform(x_val)).to(torch.float32)
x_test  = torch.from_numpy(scaler.transform(x_test)).to(torch.float32)

y_train = torch.from_numpy(y_train.values).squeeze()
y_val   = torch.from_numpy(y_val.values).squeeze()
y_test  = torch.from_numpy(y_test.values).squeeze()
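As a sanity check on the preprocessing, the sketch below reproduces, in plain Python, the label mapping LabelEncoder produces (it sorts the unique labels and maps each to its index) and the sample counts of the two stratified splits. It assumes the 214 instances of the UCI dataset and scikit-learn's rule for a float test_size (the test split is rounded up, the remainder goes to training); `split_sizes` is a hypothetical helper. The resulting 65 test samples match the support column of the classification reports further down.

```python
import math

# Labels present in the UCI Glass Identification data (type 4 has no samples).
raw_labels = [1, 2, 3, 5, 6, 7]
# LabelEncoder sorts the unique labels and maps each one to its position.
encoding = {label: idx for idx, label in enumerate(sorted(set(raw_labels)))}
print(encoding)  # {1: 0, 2: 1, 3: 2, 5: 3, 6: 4, 7: 5}


def split_sizes(n_samples, test_size):
    # A float test_size is rounded up; the training side gets the remainder.
    n_test = math.ceil(test_size * n_samples)
    return n_samples - n_test, n_test


n_total = 214                                 # instances in the dataset
n_pool, n_test = split_sizes(n_total, 0.3)    # first split: train pool / test
n_train, n_val = split_sizes(n_pool, 0.2)     # second split: train / validation
for name, n in (("train", n_train), ("val", n_val), ("test", n_test)):
    print(f"{name:>5}: {n:3d} samples ({n / n_total:.1%})")
```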

DataLoaders

generator = torch.Generator()
generator.manual_seed(SEED)

train_loader = data.DataLoader(
    data.TensorDataset(x_train, y_train),
    batch_size=8, shuffle=True, generator=generator
)
val_loader = data.DataLoader(
    data.TensorDataset(x_val, y_val),
    batch_size=8, shuffle=False
)

Model

A rule_reduced_ANFIS model is instantiated with 5 initial rules, GeneralizedBell_MF membership functions, and a softmax output layer for six-class classification. The custom rule-growing procedure will expand the rule base dynamically during training.

features = X.columns.tolist()

model = nft.rule_reduced_ANFIS(
    input_size=x_train.shape[1],
    num_mfs=5,  # 5 rules initially (rule-reduced model)
    outputs=6,
    membership_function=nft.GeneralizedBell_MF(),
    output_type='softmax',
    features=features
)
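GeneralizedBell_MF is used here as a black box. For orientation, the textbook generalized bell function from the ANFIS literature is mu(x) = 1 / (1 + |(x - c) / a|^(2b)), with center c, half-width a (mu = 0.5 at x = c ± a), and slope b. The NumPy sketch below evaluates that standard form; whether the toolbox uses exactly this parameterization (for example, how the stds later passed to add_rules relate to the width a) is an assumption not confirmed by this example.

```python
import numpy as np


def generalized_bell(x, a, b, c):
    """Textbook generalized bell MF: 1 / (1 + |(x - c) / a| ** (2 * b))."""
    return 1.0 / (1.0 + np.abs((x - c) / a) ** (2 * b))


# Inputs are MinMax-scaled to [0, 1], so a bell centered at 0.5 is a natural example.
x = np.array([0.25, 0.5, 0.75, 1.0])
memberships = generalized_bell(x, a=0.25, b=2.0, c=0.5)
print(memberships)  # 0.5 at c - a, 1.0 at c, 0.5 at c + a, 1/17 at c + 2a
```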

Initial training

The model is first trained with the Basic Optimizer Training Algorithm to establish a reasonable baseline before the rule-growing procedure begins.

trainer = nft.Basic_optimizer_training_algorithm(
    epochs=5000,
    loss_function=nn.CrossEntropyLoss(),
    optimizer=torch.optim.AdamW,
    optimizer_params={'lr': 1e-3, 'weight_decay': 1e-2},
    early_stopping=nft.EarlyStopping(patience=60)
)

trainer(model, train_loader, val_loader)

Initial evaluation

pred = model.predict(x_test)

acc        = accuracy_score(y_test, pred)
prec       = precision_score(y_test, pred, average='weighted', zero_division=0)
recall     = recall_score(y_test, pred, average='weighted', zero_division=0)
f1         = f1_score(y_test, pred, average='weighted', zero_division=0)
conf_matrix = confusion_matrix(y_test, pred)
class_rep  = classification_report(y_test, pred)

print("Accuracy:", acc)
print("Precision:", prec)
print("Recall:", recall)
print("F1 score:", f1, "\n")

print("Confusion Matrix:")
print(conf_matrix, "\n")

print("Classification Report:")
print(class_rep)

Accuracy: 0.5846153846153846
Precision: 0.6485067873303167
Recall: 0.5846153846153846
F1 score: 0.5797903356799868

Confusion Matrix:
[[ 8  6  6  0  1  0]
 [ 2 19  1  1  0  0]
 [ 0  4  1  0  0  0]
 [ 0  3  0  1  0  0]
 [ 0  0  0  0  2  1]
 [ 0  2  0  0  0  7]]

Classification Report:
              precision    recall  f1-score   support

           0       0.80      0.38      0.52        21
           1       0.56      0.83      0.67        23
           2       0.12      0.20      0.15         5
           3       0.50      0.25      0.33         4
           4       0.67      0.67      0.67         3
           5       0.88      0.78      0.82         9

    accuracy                           0.58        65
   macro avg       0.59      0.52      0.53        65
weighted avg       0.65      0.58      0.58        65
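To make the averaged metrics concrete, the NumPy sketch below recomputes them from the confusion matrix alone. It reproduces the printed Accuracy (0.5846), the weighted Precision (0.6485), and the report's macro-averaged recall (0.52). It also shows why the Recall line always equals the Accuracy line: weighting each class recall TP_c / n_c by that class's share n_c / N of the test set gives the sum of TP_c over N, which is exactly accuracy. The macro average, which weights every class equally, is the more informative summary for small classes such as 2 and 3.

```python
import numpy as np

# Initial test-set confusion matrix from the output above (rows = true, cols = predicted).
cm = np.array([
    [ 8,  6,  6,  0,  1,  0],
    [ 2, 19,  1,  1,  0,  0],
    [ 0,  4,  1,  0,  0,  0],
    [ 0,  3,  0,  1,  0,  0],
    [ 0,  0,  0,  0,  2,  1],
    [ 0,  2,  0,  0,  0,  7],
])

tp = np.diag(cm)            # correctly classified samples per class
support = cm.sum(axis=1)    # true samples per class (row sums)
predicted = cm.sum(axis=0)  # samples predicted as each class (column sums)

recall = tp / support
# Mirror zero_division=0: a class that is never predicted gets precision 0.
precision = np.divide(tp, predicted, out=np.zeros(len(tp)), where=predicted > 0)
weights = support / support.sum()

accuracy = tp.sum() / support.sum()
macro_recall = recall.mean()
weighted_recall = (weights * recall).sum()
weighted_precision = (weights * precision).sum()

print(f"accuracy           {accuracy:.4f}")
print(f"macro recall       {macro_recall:.4f}")
print(f"weighted recall    {weighted_recall:.4f}")
print(f"weighted precision {weighted_precision:.4f}")
```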

Custom strategy: greedy rule-growing

The greedy rule-growing procedure iteratively attempts to expand the rule base. At each step, it computes per-class recall on the training set and adds a new rule centered on a randomly chosen training sample from the class with the lowest recall. Only the new rule's parameters are then fine-tuned, so the rest of the model is left unchanged. If the resulting validation loss is lower than the reference value best_loss, the rule is retained, a global readaptation step fine-tunes all parameters, and best_loss is updated to the validation loss after that step. Otherwise, the rule is removed, which returns the model to its previous state. The procedure stops after max_failed_attempts consecutive rejected rules.

Helper function

loss_function = nn.CrossEntropyLoss()

def val_loss(model):
    with torch.no_grad():
        return sum(
            loss_function(model(xb), yb) for xb, yb in val_loader
        ) / len(val_loader)
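The val_loss helper averages the per-batch mean losses. With the splits above, the validation set holds 30 samples, so batch_size=8 yields batches of 8, 8, 8, and 6 samples, and each sample in the last batch carries slightly more weight than the others. The difference is usually small, but the exact validation loss weights each batch mean by its batch size. The framework-agnostic sketch below contrasts the two averages; the per-batch values are made up for illustration and are not results from this run, and both function names are hypothetical.

```python
def mean_of_batch_means(batches):
    """What val_loss computes: the unweighted average of per-batch mean losses."""
    return sum(mean for mean, _ in batches) / len(batches)


def sample_weighted_mean(batches):
    """Exact mean loss over all samples: weight each batch mean by its size."""
    total_loss = sum(mean * size for mean, size in batches)
    total_samples = sum(size for _, size in batches)
    return total_loss / total_samples


# Hypothetical (batch_mean_loss, batch_size) pairs: 30 samples in batches of 8.
batches = [(0.8, 8), (0.8, 8), (0.8, 8), (1.2, 6)]
print(f"{mean_of_batch_means(batches):.3f}")   # 0.900
print(f"{sample_weighted_mean(batches):.3f}")  # 0.880
```

In PyTorch, the weighted form amounts to accumulating `loss_function(model(xb), yb).item() * len(yb)` over val_loader and dividing by `len(val_loader.dataset)`.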

Hyperparameters

max_failed_attempts      = 5

single_adaptation_lr     = 0.005
single_adaptation_epochs = 500
single_patience          = 30

global_adaptation_lr     = 0.001
global_adaptation_epochs = 1000
global_patience          = 60

Rule-growing loop

failed_attempts = 0
best_loss = val_loss(model)
print(f"Initial val loss: {best_loss:.4f} | Rules: {model.rules}")
print("=" * 60)

while failed_attempts < max_failed_attempts:

    # Identify the worst-recall class
    with torch.no_grad():
        pred_train = model.predict(x_train)
    recalls = recall_score(
        y_train.numpy(), pred_train.numpy(), average=None, zero_division=0
    )
    worst_class = int(recalls.argmin())
    print(f"Recalls per class: {[f'{r:.2f}' for r in recalls]}")
    print(f"Worst class: {worst_class} (recall={recalls[worst_class]:.2f})")

    # Add a rule centered on a sample from the worst class
    class_indices = (y_train == worst_class).nonzero(as_tuple=True)[0]
    idx   = class_indices[torch.randint(0, len(class_indices), (1,))]
    means = x_train[idx].to(torch.float32)
    stds  = torch.full_like(means, 0.25)
    model.add_rules(means, stds)
    print(f"Rule added. Total rules: {model.rules}")

    # Fine-tune only the new rule's parameters
    new_params = [
        model.get_premises_as_parameters_list()[-1],
        model.get_consequents_as_parameters_list()[-1]
    ]
    opt_new = torch.optim.AdamW(
        new_params, lr=single_adaptation_lr, weight_decay=0.01
    )
    best_single_loss  = val_loss(model)
    patience_counter  = 0

    for epoch in range(single_adaptation_epochs):
        for xb, yb in train_loader:
            opt_new.zero_grad()
            loss_function(model(xb), yb).backward()
            opt_new.step()
        current = val_loss(model)
        if current < best_single_loss:
            best_single_loss = current
            patience_counter = 0
        else:
            patience_counter += 1
        if patience_counter >= single_patience:
            print(f"  Single adaptation stopped at epoch {epoch + 1}"
                  f" | val loss: {current:.4f}")
            break

    val_after_single = val_loss(model)
    print(f"Val loss after single adaptation: {val_after_single:.4f}"
          f" (before: {best_loss:.4f})")

    # Retain or discard the new rule
    if val_after_single < best_loss:
        print("Rule RETAINED. Running global readaptation...")
        opt_all = torch.optim.AdamW(
            model.parameters(), lr=global_adaptation_lr, weight_decay=0.01
        )
        best_global_loss = val_after_single
        patience_counter = 0

        for epoch in range(global_adaptation_epochs):
            for xb, yb in train_loader:
                opt_all.zero_grad()
                loss_function(model(xb), yb).backward()
                opt_all.step()
            current = val_loss(model)
            if current < best_global_loss:
                best_global_loss = current
                patience_counter = 0
            else:
                patience_counter += 1
            if patience_counter >= global_patience:
                print(f"  Global adaptation stopped at epoch {epoch + 1}"
                      f" | val loss: {current:.4f}")
                break

        best_loss       = val_loss(model)
        failed_attempts = 0
        print(f"Val loss after global adaptation: {best_loss:.4f}")
    else:
        model.remove_rules([model.rules - 1])
        failed_attempts += 1
        print(f"Rule DISCARDED. Failed attempts:"
              f" {failed_attempts}/{max_failed_attempts}")

    print(f"Rules: {model.rules} | Best val loss: {best_loss:.4f}")
    print("-" * 60)

print(f"\nFinal number of rules: {model.rules}")

Initial val loss: 0.8744 | Rules: 5
============================================================
Recalls per class: ['0.69', '0.79', '0.40', '1.00', '0.80', '0.81']
Worst class: 2 (recall=0.40)
Rule added. Total rules: 6
  Single adaptation stopped at epoch 33 | val loss: 0.8744
Val loss after single adaptation: 0.8744 (before: 0.8744)
Rule RETAINED. Running global readaptation...
  Global adaptation stopped at epoch 62 | val loss: 0.9219
Val loss after global adaptation: 0.9219
Rules: 6 | Best val loss: 0.9219
------------------------------------------------------------
Recalls per class: ['0.74', '0.71', '0.70', '1.00', '1.00', '0.81']
Worst class: 2 (recall=0.70)
...
...
...
Rules: 7 | Best val loss: 0.9153
------------------------------------------------------------

Final number of rules: 7
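In the log above, the first global readaptation stops at epoch 62 with a validation loss of 0.9219, higher than the 0.8744 it started from, and that value becomes the new best_loss because neither adaptation loop restores the best parameters it has seen. Later rules are then accepted against the weaker reference, and the run ends at 0.9153, still above the initial 0.8744. A common remedy, which is not part of the procedure shown here, is to checkpoint the parameters whenever validation loss improves and roll back when patience runs out. The sketch below is framework-agnostic; fit_with_restore and its callback arguments are hypothetical names, and the toy "model" is just an epoch counter indexing a made-up loss curve.

```python
def fit_with_restore(train_epoch, evaluate, get_state, set_state,
                     max_epochs, patience):
    """Early stopping that rolls back to the best checkpoint seen.

    train_epoch(): runs one pass over the training data.
    evaluate(): returns the current validation loss.
    get_state() / set_state(state): snapshot and restore the parameters.
    """
    best_loss = evaluate()
    best_state = get_state()
    bad_epochs = 0
    for _ in range(max_epochs):
        train_epoch()
        current = evaluate()
        if current < best_loss:
            best_loss, best_state, bad_epochs = current, get_state(), 0
        else:
            bad_epochs += 1
            if bad_epochs >= patience:
                break
    set_state(best_state)  # undo the epochs that did not improve
    return best_loss


# Toy demonstration: the "model" is an epoch counter indexing a loss curve.
loss_curve = [1.0, 0.8, 0.7, 0.75, 0.9, 0.95]
toy = {"epoch": 0}

best = fit_with_restore(
    train_epoch=lambda: toy.update(epoch=toy["epoch"] + 1),
    evaluate=lambda: loss_curve[toy["epoch"]],
    get_state=lambda: dict(toy),
    set_state=toy.update,
    max_epochs=5,
    patience=2,
)
print(best, toy)  # 0.7 {'epoch': 2}
```

With the model from this example, get_state could be `lambda: copy.deepcopy(model.state_dict())` and set_state could be `model.load_state_dict`, with train_epoch wrapping the existing inner loop over train_loader. This assumes every trainable tensor of the rule_reduced_ANFIS model is registered with the module so that state_dict captures it, and the checkpoint should only be restored within one adaptation phase, since adding or removing a rule changes the set of parameters. The accept/reject decisions, and therefore the logged numbers, would differ from the run shown above.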

Final evaluation

pred = model.predict(x_test)

acc        = accuracy_score(y_test, pred)
prec       = precision_score(y_test, pred, average='weighted', zero_division=0)
recall     = recall_score(y_test, pred, average='weighted', zero_division=0)
f1         = f1_score(y_test, pred, average='weighted', zero_division=0)
conf_matrix = confusion_matrix(y_test, pred)
class_rep  = classification_report(y_test, pred)

print("Accuracy:", acc)
print("Precision:", prec)
print("Recall:", recall)
print("F1 score:", f1, "\n")

print("Confusion Matrix:")
print(conf_matrix, "\n")

print("Classification Report:")
print(class_rep)

Accuracy: 0.6153846153846154
Precision: 0.6494505494505495
Recall: 0.6153846153846154
F1 score: 0.622604365590791

Confusion Matrix:
[[12  3  5  0  1  0]
 [ 3 17  2  1  0  0]
 [ 1  3  1  0  0  0]
 [ 0  3  0  1  0  0]
 [ 0  0  0  0  2  1]
 [ 0  2  0  0  0  7]]

Classification Report:
              precision    recall  f1-score   support

           0       0.75      0.57      0.65        21
           1       0.61      0.74      0.67        23
           2       0.12      0.20      0.15         5
           3       0.50      0.25      0.33         4
           4       0.67      0.67      0.67         3
           5       0.88      0.78      0.82         9

    accuracy                           0.62        65
   macro avg       0.59      0.53      0.55        65
weighted avg       0.65      0.62      0.62        65
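Comparing the two confusion matrices class by class shows where the gain comes from. The NumPy sketch below recomputes per-class test recall from the matrices printed above. The two additional correct predictions are the net of four more correctly classified class 0 samples and two fewer correct class 1 samples. Class 2, which the logged iterations selected as the worst class on the training set, keeps a test recall of 0.20 (1 of 5). With test supports this small, a single sample moves a class recall by 0.04 to 0.33, so these per-class changes are indicative only.

```python
import numpy as np

# Test-set confusion matrices copied from the two evaluations (rows = true class).
initial_cm = np.array([
    [ 8,  6,  6,  0,  1,  0],
    [ 2, 19,  1,  1,  0,  0],
    [ 0,  4,  1,  0,  0,  0],
    [ 0,  3,  0,  1,  0,  0],
    [ 0,  0,  0,  0,  2,  1],
    [ 0,  2,  0,  0,  0,  7],
])
final_cm = np.array([
    [12,  3,  5,  0,  1,  0],
    [ 3, 17,  2,  1,  0,  0],
    [ 1,  3,  1,  0,  0,  0],
    [ 0,  3,  0,  1,  0,  0],
    [ 0,  0,  0,  0,  2,  1],
    [ 0,  2,  0,  0,  0,  7],
])

support = initial_cm.sum(axis=1)  # the test set is the same in both evaluations
assert (support == final_cm.sum(axis=1)).all()

gained = np.diag(final_cm) - np.diag(initial_cm)  # change in correct predictions per class
for cls in range(len(support)):
    before = initial_cm[cls, cls] / support[cls]
    after = final_cm[cls, cls] / support[cls]
    print(f"class {cls}: recall {before:.2f} -> {after:.2f} ({int(gained[cls]):+d} correct)")

print(f"accuracy: {np.trace(initial_cm) / support.sum():.4f} -> "
      f"{np.trace(final_cm) / support.sum():.4f}")
```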

Note

The built-in SONFIS algorithm provides a self-organizing alternative to this custom procedure, encapsulating rule growing, splitting, and pruning within a single training loop operating directly on rule_reduced_ANFIS models.