Loading AIminify models

This page explains how to properly load compressed models after using AIminify, covering both PyTorch and TensorFlow frameworks.

Understanding AIminify's saving strategy

AIminify automatically determines the best saving method for your compressed model based on its type and the compression techniques applied. The loading method you use depends on how the model was saved.

PyTorch model loading

Saving strategy decision tree

AIminify follows this hierarchy when saving PyTorch models:
1. Try torch.save()        → Load with torch.load()
2. If fails: torch.jit.trace()  → Load with torch.jit.load()  
3. If fails: torch.jit.script() → Load with torch.jit.load()
4. If all fail: Manual export required (e.g., ONNX)
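
The hierarchy above amounts to a try-in-order fallback chain. The sketch below illustrates that idea only; it is not AIminify's implementation. `save_with_fallback` and `pytorch_strategies` are hypothetical names introduced here, and the helper returns the name of the loader that matches the first save method that succeeds.

```python
from typing import Callable, List, Sequence, Tuple

Strategy = Tuple[str, Callable[[], None]]


def save_with_fallback(strategies: Sequence[Strategy]) -> str:
    """Run each (loader_name, save_fn) pair in order.

    Returns the loader name paired with the first save_fn that succeeds.
    Raises RuntimeError if every strategy fails.
    """
    errors: List[str] = []
    for loader_name, save_fn in strategies:
        try:
            save_fn()
            return loader_name
        except Exception as exc:  # any failure moves on to the next strategy
            errors.append(f"{loader_name}: {exc}")
    raise RuntimeError(
        "All saving strategies failed; manual export required (e.g., ONNX). "
        + "; ".join(errors)
    )


def pytorch_strategies(model, example_input, path) -> List[Strategy]:
    """Hypothetical strategy list mirroring steps 1-3 of the decision tree."""
    import torch  # imported lazily so save_with_fallback stays framework-free

    return [
        ("torch.load", lambda: torch.save(model, path)),
        ("torch.jit.load", lambda: torch.jit.trace(model, example_input).save(path)),
        ("torch.jit.load", lambda: torch.jit.script(model).save(path)),
    ]
```

Assuming a model and an example input are available, `save_with_fallback(pytorch_strategies(model, example_input, "model.pt"))` would return either `"torch.load"` or `"torch.jit.load"`, which is the same information AIminify reports in its log message.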

When to Use Each Loading Method

torch.load() - Standard PyTorch Models

Use when:
- Model was saved with torch.save()
- Non-quantized models
- Models that didn't require JIT compilation
- Log message shows: "model can be loaded with torch.load()"

torch.jit.load() - JIT compiled models

Use when:
- Model was saved with torch.jit.trace() or torch.jit.script()
- In most cases, for both non-quantized and quantized models (this is the most common case)
- Models that couldn't be saved with standard torch.save()
- Log message shows: "model can be loaded with torch.jit.load()"

Example:

import torch
import torchvision.models as models

from aiminify import minify, save_model

# Compress model (non-quantized)
model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
compressed_model, _ = minify(
    model,
    quantization=False,
    compression_strength=3,
)

# Save model
save_model(
    compressed_model,
    input_shape=(1, 3, 224, 224),
    path="compressed_model.pt",
)
# The following log will be printed:
# 2025-08-01 11:15:04,785 - aiminify [INFO] storing model as pt format with filename compressed_model.pt, model can be loaded with torch.jit.load(compressed_model.pt)

# Load model with the method named in the log message
loaded_model = torch.jit.load("compressed_model.pt")
loaded_model.eval()

# Inference
with torch.no_grad():
    sample_input = torch.randn(1, 3, 224, 224)
    output = loaded_model(sample_input)
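
Since the log line names the loader to use, a script can read the choice from it instead of guessing. The sketch below assumes the log text keeps the wording shown in the example above ("model can be loaded with ..."); `loader_from_log` is a hypothetical helper, not part of AIminify.

```python
import re
from typing import Optional

# Matches "torch.load" or "torch.jit.load" after the phrase used in the log line.
_LOADER_PATTERN = re.compile(r"model can be loaded with (torch(?:\.jit)?\.load)\b")


def loader_from_log(log_line: str) -> Optional[str]:
    """Return 'torch.load' or 'torch.jit.load' if the log line names one, else None."""
    match = _LOADER_PATTERN.search(log_line)
    return match.group(1) if match else None
```

If the wording of the log message changes in a later AIminify release, the pattern would need to be adjusted accordingly.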

Robust loading function

For maximum compatibility, use this robust loading approach:

import torch

def load_aiminify_model(model_path):
    """
    Robust function to load AIminify compressed PyTorch models.
    Tries both loading methods automatically.
    """
    try:
        # Try standard PyTorch loading first.
        # weights_only=False is needed on recent PyTorch versions to unpickle
        # a full model object; only use it for files you trust.
        model = torch.load(model_path, map_location='cpu', weights_only=False)
        print("Model loaded successfully with torch.load()")
        return model
    except Exception as e:
        print(f"torch.load() failed: {e}")
        try:
            # Fallback to JIT loading
            model = torch.jit.load(model_path, map_location='cpu')
            print("Model loaded successfully with torch.jit.load()")
            return model
        except Exception as e2:
            raise RuntimeError(f"Failed to load model with both methods. "
                               f"torch.load() error: {e}, torch.jit.load() error: {e2}")

# Usage
model = load_aiminify_model("compressed_model.pt")
model.eval()

Model type examples

1. Pruned only (no quantization)

# Compression
compressed_model, _ = minify(
    model,
    compression_strength=3,
    quantization=False,  # No quantization
    fine_tune=True,
)
save_model(
    compressed_model,
    input_shape=(1, 3, 224, 224),
    path="pruned_model.pt",
)

# Loading - use the method named in the save_model() log (torch.jit.load() here)
loaded_model = torch.jit.load("pruned_model.pt")

2. Quantized models

# Compression with quantization
compressed_model, _ = minify(
    model,
    compression_strength=3,
    quantization=True,  # Quantization enabled
    training_generator=train_loader,  # Required for quantization
)
save_model(compressed_model, input_shape=(1, 3, 224, 224), path="quantized_model.pt")

# Loading - Usually torch.jit.load()
loaded_model = torch.jit.load("quantized_model.pt")
﻿
TensorFlow model loading

TensorFlow saving strategy

AIminify uses different formats based on model type:
- Standard Models: .keras format → Load with tf.keras.models.load_model()
- Quantized Models: .tflite format → Load with tf.lite.Interpreter()

Standard TensorFlow models (.keras)

import tensorflow as tf

from aiminify import minify, save_model

# Compress TensorFlow model (no quantization)
model = tf.keras.applications.ResNet50(weights='imagenet')
compressed_model, _ = minify(
    model,
    quantization=False,
    compression_strength=3,
)

# Save model (.keras format)
save_model(
    compressed_model,
    path="compressed_model.keras",
    input_shape=(224, 224, 3),
)

# Load model
loaded_model = tf.keras.models.load_model("compressed_model.keras")

# Inference
sample_input = tf.random.normal((1, 224, 224, 3))
output = loaded_model(sample_input)

Quantized TensorFlow models (.tflite)

import tensorflow as tf

from aiminify import minify, save_model

# Compress with quantization
model = tf.keras.applications.ResNet50(weights='imagenet')
compressed_model, _ = minify(
    model,
    quantization=True,
    compression_strength=3,
)

# Save model (.tflite format)
save_model(
    compressed_model,
    path="quantized_model.tflite",
    input_shape=(224, 224, 3),
)

# Load quantized model
interpreter = tf.lite.Interpreter(model_path="quantized_model.tflite")
interpreter.allocate_tensors()

# Get input and output tensors
input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()

# Inference (the input must match the dtype and shape reported in input_details)
sample_input = tf.random.normal((1, 224, 224, 3)).numpy()
interpreter.set_tensor(input_details[0]['index'], sample_input)
interpreter.invoke()
output = interpreter.get_tensor(output_details[0]['index'])

Robust TensorFlow loading function

import tensorflow as tf

def load_aiminify_tf_model(model_path):
    """
    Load AIminify compressed TensorFlow models based on file extension.
    Returns the loaded object and a tag ('keras' or 'tflite').
    """
    if model_path.endswith('.keras') or model_path.endswith('.h5'):
        # Standard Keras model
        model = tf.keras.models.load_model(model_path)
        print(f"Keras model loaded from {model_path}")
        return model, 'keras'

    elif model_path.endswith('.tflite'):
        # Quantized TFLite model
        interpreter = tf.lite.Interpreter(model_path=model_path)
        interpreter.allocate_tensors()
        print(f"TFLite model loaded from {model_path}")
        return interpreter, 'tflite'

    else:
        raise ValueError(f"Unsupported file format: {model_path}")

# Usage
model, model_type = load_aiminify_tf_model("compressed_model.keras")
sample_input = tf.random.normal((1, 224, 224, 3))

if model_type == 'keras':
    # Standard inference
    output = model(sample_input)
elif model_type == 'tflite':
    # TFLite inference
    input_details = model.get_input_details()
    output_details = model.get_output_details()
    model.set_tensor(input_details[0]['index'], sample_input.numpy())
    model.invoke()
    output = model.get_tensor(output_details[0]['index'])
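
The two inference branches above can be wrapped in one function so that calling code does not need to care which format was loaded. This is a minimal sketch under the assumption that the model has a single input and a single output; `run_inference` is a hypothetical helper name. It only relies on the Keras call interface and on the `tf.lite.Interpreter` methods already used on this page.

```python
def run_inference(model, model_type, sample_input):
    """Run one forward pass on a Keras model or a TFLite interpreter.

    model_type is the tag ('keras' or 'tflite') that identifies the format.
    """
    if model_type == "keras":
        return model(sample_input)

    if model_type == "tflite":
        # TFLite expects a NumPy array, so convert framework tensors if needed.
        if hasattr(sample_input, "numpy"):
            sample_input = sample_input.numpy()
        input_index = model.get_input_details()[0]["index"]
        output_index = model.get_output_details()[0]["index"]
        model.set_tensor(input_index, sample_input)
        model.invoke()
        return model.get_tensor(output_index)

    raise ValueError(f"Unknown model type: {model_type}")
```

Models with several inputs or outputs would need to loop over all entries of the details lists instead of using only index 0.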
﻿
Complete end-to-end examples

PyTorch complete workflow

import torch
import torchvision.transforms as transforms
from torchvision.datasets import CIFAR10
from torch.utils.data import DataLoader
from aiminify import minify, save_model

# 1. Load and prepare model
model = torch.hub.load(
    'pytorch/vision',
    'resnet18',
    pretrained=True,
)
model.eval()

# 2. Prepare data for compression
transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5)),
])

# Sample dataset for calibration
dataset = CIFAR10(
    root='./data',
    train=True,
    download=True,
    transform=transform,
)
dataloader = DataLoader(
    dataset,
    batch_size=32,
    shuffle=True,
)

# 3. Compress model
compressed_model, feedback = minify(
    model,
    compression_strength=3,
    quantization=True,  # This will likely require JIT loading
    training_generator=dataloader,
    fine_tune=True,
    verbose=1,
)

# 4. Save compressed model
save_model(
    compressed_model,
    input_shape=(1, 3, 32, 32),
    path="compressed_resnet18.pt",
)
print("Check the log message above to see which loading method to use!")

# 5. Load compressed model (robust method)
def load_aiminify_model(model_path):
    try:
        # weights_only=False allows unpickling a full model on recent PyTorch
        # versions; only use it for files you trust.
        return torch.load(model_path, map_location='cpu', weights_only=False)
    except Exception:
        # Fall back to JIT loading for traced or scripted models
        return torch.jit.load(model_path, map_location='cpu')

loaded_model = load_aiminify_model("compressed_resnet18.pt")
loaded_model.eval()

# 6. Inference
with torch.no_grad():
    sample_input = torch.randn(1, 3, 32, 32)
    output = loaded_model(sample_input)
    predicted_class = torch.argmax(output, dim=1)
    print(f"Predicted class: {predicted_class.item()}")

TensorFlow complete workflow

import tensorflow as tf
from aiminify import minify, save_model

# 1. Load and prepare model
model = tf.keras.applications.MobileNetV2(
    input_shape=(224, 224, 3),
    weights='imagenet',
    include_top=True,
)

# 2. Prepare sample data
sample_data = tf.random.normal((100, 224, 224, 3))

# 3. Compress model
compressed_model, feedback = minify(
    model,
    compression_strength=3,
    quantization=True,  # This will create .tflite file
    verbose=1,
)

# 4. Save compressed model
model_path = "compressed_mobilenet.tflite"
save_model(compressed_model, path=model_path)

# 5. Load compressed model (branch on the saved file's extension)
sample_input = tf.random.normal((1, 224, 224, 3))

if model_path.endswith('.tflite'):
    # Quantized model
    interpreter = tf.lite.Interpreter(model_path=model_path)
    interpreter.allocate_tensors()

    input_details = interpreter.get_input_details()
    output_details = interpreter.get_output_details()

    # 6. Inference
    interpreter.set_tensor(input_details[0]['index'], sample_input.numpy())
    interpreter.invoke()
    output = interpreter.get_tensor(output_details[0]['index'])

else:
    # Standard model
    loaded_model = tf.keras.models.load_model(model_path)
    output = loaded_model(sample_input)

print(f"Model output shape: {output.shape}")
﻿
Troubleshooting common issues

PyTorch issues

Problem: torch.load() fails with serialization error
Solution: The model was likely saved with JIT. Use torch.jit.load() instead.

Problem: JIT model gives different results than original
Solution: This is expected for quantized models. Small numerical differences are normal.

Problem: Model file is very large after compression
Solution: Check if quantization was applied. Non-quantized models may not reduce file size significantly.
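
To judge whether a file is "very large", it helps to compare it against the original model saved to disk. The sketch below uses only the standard library; `size_report` and the file paths passed to it are hypothetical and assume you have saved both the original and the compressed model yourself.

```python
import os


def size_report(original_path, compressed_path):
    """Return both file sizes in MB and the compressed/original size ratio."""
    original_bytes = os.path.getsize(original_path)
    compressed_bytes = os.path.getsize(compressed_path)
    return {
        "original_mb": original_bytes / 1e6,
        "compressed_mb": compressed_bytes / 1e6,
        "ratio": compressed_bytes / original_bytes,
    }
```

A ratio close to 1.0 suggests the file size was barely reduced, which matches the note above about non-quantized models.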

TensorFlow issues

Problem: .keras file won't load
Solution: Check that your TensorFlow version is compatible with the one used to save the model, or try the .h5 format instead.

Problem: TFLite model gives errors during inference
Solution: Ensure input data type and shape match the model's expectations. TFLite models often require specific data types.
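
One way to catch dtype and shape mismatches early is to check the input against the entry returned by `interpreter.get_input_details()` before calling `set_tensor()`. The sketch below is an illustration, not part of AIminify: `prepare_tflite_input` is a hypothetical helper, and it assumes the details dictionary carries the usual `shape`, `dtype`, and `quantization` (scale, zero point) keys.

```python
import numpy as np


def prepare_tflite_input(array, input_detail):
    """Validate the shape and convert the array to the dtype the interpreter expects."""
    array = np.asarray(array)
    expected_shape = tuple(int(dim) for dim in input_detail["shape"])
    if array.shape != expected_shape:
        raise ValueError(f"Expected input shape {expected_shape}, got {array.shape}")

    expected_dtype = np.dtype(input_detail["dtype"])
    scale, zero_point = input_detail.get("quantization", (0.0, 0))
    if np.issubdtype(expected_dtype, np.integer) and scale != 0:
        # Integer-quantized input: map real values onto the quantized grid
        # and clip to the representable range of the target dtype.
        limits = np.iinfo(expected_dtype)
        array = np.clip(np.round(array / scale + zero_point), limits.min, limits.max)
    return array.astype(expected_dtype)
```

Typical use would be `interpreter.set_tensor(detail["index"], prepare_tflite_input(x, detail))` with `detail = interpreter.get_input_details()[0]`.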

General issues

Problem: Don't know which loading method to use
Solution: Check the console output from save_model(). It explicitly tells you which loading method to use.

Problem: Model performance is significantly degraded
Solution: Try lower compression_strength values or disable quantization for better accuracy retention.
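
Trying lower strengths can be done systematically by sweeping over candidate values and keeping the strongest one that stays within an accuracy budget. The sketch below is framework-free: `pick_compression_strength` is a hypothetical helper, the caller supplies `compress_fn` and `evaluate_fn`, and the list of candidate strengths is an assumption because the valid range of `compression_strength` is not stated on this page.

```python
def pick_compression_strength(strengths, compress_fn, evaluate_fn,
                              baseline_score, max_drop):
    """Return (strength, score) for the highest strength whose score stays
    within max_drop of baseline_score, or None if no candidate qualifies."""
    best = None
    for strength in sorted(strengths):
        score = evaluate_fn(compress_fn(strength))
        if baseline_score - score <= max_drop:
            best = (strength, score)
    return best


# Possible wiring (assumption; evaluate() is your own accuracy function):
#   compress_fn = lambda s: minify(model, compression_strength=s, quantization=False)[0]
#   evaluate_fn = lambda m: evaluate(m, validation_data)
```

The sweep compresses the model once per candidate, so it can be slow for large models; a short list of candidates keeps the cost manageable.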
﻿
Best practices

- Always check the log output from save_model() - it tells you exactly which loading method to use
- Use the robust loading functions provided above for maximum compatibility
- Test inference immediately after loading to ensure the model works correctly
- Save the original model before compression for comparison
- Use appropriate compression settings - start with lower compression strengths for critical applications
- For production deployment, prefer the explicit loading method shown in logs over robust fallback functions
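
Testing inference after loading and comparing against the original model can be combined into one check on the same input. The sketch below assumes both outputs have been converted to NumPy arrays (for example with `.numpy()` on a CPU tensor); `compare_outputs` and the default tolerance are hypothetical choices, not AIminify recommendations.

```python
import numpy as np


def compare_outputs(reference, candidate, atol=1e-2):
    """Compare two model outputs produced from the same input.

    Reports the largest absolute difference, whether it is within atol,
    and whether the top-1 prediction along the last axis is unchanged.
    """
    reference = np.asarray(reference, dtype=np.float64)
    candidate = np.asarray(candidate, dtype=np.float64)
    if reference.shape != candidate.shape:
        raise ValueError(f"Shape mismatch: {reference.shape} vs {candidate.shape}")

    max_abs_diff = float(np.max(np.abs(reference - candidate)))
    same_top1 = bool(np.array_equal(reference.argmax(axis=-1),
                                    candidate.argmax(axis=-1)))
    return {
        "max_abs_diff": max_abs_diff,
        "within_tolerance": max_abs_diff <= atol,
        "same_top1": same_top1,
    }
```

For quantized models the raw difference is expected to be nonzero, so the top-1 agreement is often the more useful signal.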

Summary

- PyTorch: Use torch.load() for standard models, torch.jit.load() for quantized/JIT models
- TensorFlow: Use tf.keras.models.load_model() for .keras files, tf.lite.Interpreter() for .tflite files
- Always check the console output from AIminify to know which method to use
- Use the robust loading functions when in doubt