Troubleshooting

This guide provides solutions for common issues you might encounter when using the PaiTIENT Secure Model Service.

General Troubleshooting Process

When encountering issues with the PaiTIENT Secure Model Service, follow these general steps:

  1. Check Service Status: Verify that the service is operational
  2. Validate Credentials: Ensure your API keys are valid and properly configured
  3. Review Logs: Examine logs for error details
  4. Check Documentation: Refer to documentation for configuration requirements
  5. Contact Support: If the issue persists, contact PaiTIENT support

Common Issues

Authentication Problems

API Key Issues

Issue: "Authentication failed" or "Invalid API key" errors.

Solutions:

  1. Verify that your API key and client ID are correct
  2. Ensure environment variables are properly set
  3. Check that your key has not expired
  4. Verify you're using the correct key for your environment (development vs. production)
python
# Python example: Verify API key configuration
import os
from paitient_secure_model import Client

# Report whether each variable is set (values are never printed, for security)
print(f"PAITIENT_API_KEY set: {'PAITIENT_API_KEY' in os.environ}")
print(f"PAITIENT_CLIENT_ID set: {'PAITIENT_CLIENT_ID' in os.environ}")

# Test authentication
try:
    client = Client()
    # A simple API call to test authentication
    deployments = client.list_deployments(limit=1)
    print("Authentication successful")
except Exception as e:
    print(f"Authentication failed: {e}")
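
A variable can be set and still be unusable, for example when it is empty or carries a trailing newline copied from a secrets file. The sketch below checks for those cases using only the standard library. The variable names come from the example above; `check_credentials_env` is a hypothetical helper for illustration, not part of the SDK.

```python
import os

def check_credentials_env(names, env=None):
    """Return a dict mapping each variable name to a short diagnosis.

    Only the status is reported; the values themselves are never returned.
    """
    env = os.environ if env is None else env
    report = {}
    for name in names:
        value = env.get(name)
        if value is None:
            report[name] = "missing"
        elif value.strip() == "":
            report[name] = "empty"
        elif value != value.strip():
            report[name] = "has leading/trailing whitespace"
        else:
            report[name] = "ok"
    return report

for name, status in check_credentials_env(
    ["PAITIENT_API_KEY", "PAITIENT_CLIENT_ID"]
).items():
    print(f"{name}: {status}")
```

Any status other than `ok` is worth fixing before investigating key expiry or environment mismatches.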

Permission Issues

Issue: "Permission denied" or "Unauthorized" errors.

Solutions:

  1. Verify your API key has the required permissions
  2. Check role assignments for your account
  3. Request necessary permissions from your administrator

Deployment Issues

Deployment Failures

Issue: Deployments fail with errors.

Common Causes and Solutions:

  1. Resource Constraints:

    • Select a different instance type
    • Check your quota limits
    • Scale down other deployments to free resources
  2. Configuration Errors:

    • Verify the model name is correct
    • Check that configuration parameters are valid
    • Ensure VPC and security group settings are correct
  3. Network Issues:

    • Verify network connectivity
    • Check firewall and security group rules
    • Ensure proper access to required resources
python
# Python example: Debugging deployment
from paitient_secure_model import Client
from paitient_secure_model.exceptions import DeploymentError

client = Client()

try:
    deployment = client.create_deployment(
        model_name="ZimaBlueAI/HuatuoGPT-o1-8B",
        deployment_name="clinical-assistant"
    )
    print(f"Deployment created: {deployment.id}")
except DeploymentError as e:
    print(f"Deployment failed: {e}")
    
    # Get detailed error logs
    if hasattr(e, 'deployment_id') and e.deployment_id:
        logs = client.get_deployment_logs(
            deployment_id=e.deployment_id,
            limit=10,
            filter="level=error"
        )
        print("Error logs:")
        for log in logs:
            print(f"  {log.message}")

Deployment Timeouts

Issue: Deployments take too long or time out.

Solutions:

  1. Allow extra time for large models, which take longer to deploy
  2. Check for resource contention in your environment
  3. Verify network connectivity and bandwidth
  4. Try deploying during off-peak hours
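
When a deployment is slow rather than failed, it can help to poll its status against an explicit deadline instead of waiting indefinitely. The following is a minimal sketch of such a loop. `get_status` is any zero-argument callable you supply, and the state names used in the usage note are assumptions; this guide does not specify how the service reports deployment status.

```python
import time

def wait_for_status(get_status, done_states, timeout_s=1800.0, interval_s=15.0,
                    clock=time.monotonic, sleep=time.sleep):
    """Poll get_status() until it returns a value in done_states.

    Raises TimeoutError if the deadline passes first. `clock` and `sleep`
    are injectable so the loop can be tested without real waiting.
    """
    deadline = clock() + timeout_s
    while True:
        status = get_status()
        if status in done_states:
            return status
        remaining = deadline - clock()
        if remaining <= 0:
            raise TimeoutError(f"still {status!r} after {timeout_s} seconds")
        # Never sleep past the deadline
        sleep(min(interval_s, remaining))
```

A call might look like `wait_for_status(fetch_status, {"running", "failed"})`, where `fetch_status` is a hypothetical function that returns the current state as a string. A generous `timeout_s` for large models avoids treating a slow but healthy deployment as a failure.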

Inference Issues

High Latency

Issue: Inference requests take too long to complete.

Solutions:

  1. Right-size your deployment:

    • Use a more powerful instance type
    • Add more replicas for higher throughput
  2. Optimize your prompts:

    • Reduce prompt length
    • Simplify complex instructions
    • Limit max tokens in responses
  3. Network optimization:

    • Use the endpoint in the closest geographic region
    • Ensure sufficient network bandwidth
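
Before changing instance types or prompts, it is worth measuring latency so that each change can be compared against a baseline. The sketch below times any callable and reports the median, 95th percentile, and maximum. `measure_latency` is a hypothetical helper; pass it a function that wraps your own inference request.

```python
import statistics
import time

def measure_latency(call, runs=20):
    """Time `call()` `runs` times and return summary statistics in seconds."""
    if runs < 2:
        raise ValueError("runs must be at least 2 to compute percentiles")
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        call()
        samples.append(time.perf_counter() - start)
    # quantiles(n=20) returns 19 cut points; the last is the 95th percentile
    cuts = statistics.quantiles(samples, n=20)
    return {
        "runs": runs,
        "median_s": statistics.median(samples),
        "p95_s": cuts[-1],
        "max_s": max(samples),
    }
```

Comparing the median with the 95th percentile helps separate consistently slow responses (often prompt length or instance size) from occasional spikes (often network or contention).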

Out of Memory Errors

Issue: "Out of memory" errors during inference.

Solutions:

  1. Reduce batch size
  2. Shorten input prompts
  3. Limit maximum response length
  4. Use a larger instance type
  5. Quantize the model to reduce memory usage
python
# Python example: Optimized inference settings
from paitient_secure_model import Client

client = Client()

# Using memory-optimized settings
response = client.generate_text(
    deployment_id="dep_12345abcde",
    prompt="What are the potential side effects of metformin?",
    max_tokens=100,              # Limit output length
    temperature=0.7,
    stream=False                 # Non-streaming may be more memory efficient
)

Content Filtering Issues

Issue: Responses are being filtered or blocked.

Solutions:

  1. Review your prompt for potentially sensitive content
  2. Adjust content filtering settings if appropriate
  3. Rephrase prompts to avoid triggering filters
  4. For healthcare contexts, ensure you're using models and deployments with appropriate clinical content settings

Fine-tuning Issues

Training Failures

Issue: Fine-tuning jobs fail.

Solutions:

  1. Dataset issues:

    • Verify dataset format and structure
    • Check for data quality issues
    • Ensure dataset size is appropriate
  2. Resource constraints:

    • Check quota limits
    • Use a different instance type
  3. Configuration errors:

    • Verify hyperparameters are valid
    • Check base model compatibility
python
# Python example: Validating fine-tuning dataset
import json
from paitient_secure_model import Client
from paitient_secure_model.validation import validate_fine_tuning_dataset

client = Client()

# Load and validate dataset
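with open("training_data.jsonl", "r", encoding="utf-8") as f:
    # Skip blank lines, which would otherwise raise json.JSONDecodeError
    dataset = [json.loads(line) for line in f if line.strip()]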

# Validate dataset
validation_result = validate_fine_tuning_dataset(dataset)

if validation_result["valid"]:
    print("Dataset is valid for fine-tuning")
else:
    print("Dataset validation failed:")
    for error in validation_result["errors"]:
        print(f"  {error}")
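
If the file itself cannot be parsed, it helps to know which lines are malformed before running any validator. The following standard-library sketch reports problems by line number. The required keys `prompt` and `completion` are assumptions for illustration; substitute the field names your fine-tuning format actually uses.

```python
import json

def check_jsonl(lines, required_keys=("prompt", "completion")):
    """Return a list of (line_number, problem) tuples for a JSONL dataset.

    `required_keys` is a placeholder; it is not taken from the service's
    documented dataset format.
    """
    problems = []
    for number, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # Blank lines are skipped rather than reported
        try:
            record = json.loads(line)
        except json.JSONDecodeError as exc:
            problems.append((number, f"invalid JSON: {exc.msg}"))
            continue
        if not isinstance(record, dict):
            problems.append((number, "record is not a JSON object"))
            continue
        missing = [key for key in required_keys if key not in record]
        if missing:
            problems.append((number, f"missing keys: {', '.join(missing)}"))
    return problems
```

The function accepts any iterable of strings, so an open file handle can be passed directly. An empty result means every non-blank line parsed and contained the expected keys.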

Poor Fine-tuned Model Performance

Issue: Fine-tuned model performs worse than expected.

Solutions:

  1. Dataset quality:

    • Improve example quality
    • Add more diverse examples
    • Ensure examples align with your use case
  2. Training parameters:

    • Adjust learning rate
    • Try different batch sizes
    • Modify number of training epochs
  3. Model selection:

    • Try a different base model
    • Use a model pre-trained on a similar domain

SDK and API Issues

SDK Installation Problems

Issue: SDK installation fails.

Solutions:

  1. Check Python/Node.js version compatibility
  2. Update package managers (pip, npm)
  3. Install required dependencies first
  4. Try with a virtual environment
bash
# Create a fresh virtual environment
python -m venv paitient_env
source paitient_env/bin/activate  # On Windows: paitient_env\Scripts\activate

# Install with specific version
pip install paitient-secure-model==1.2.3

API Connection Issues

Issue: Cannot connect to API endpoints.

Solutions:

  1. Network connectivity:

    • Check internet connection
    • Verify firewall settings
    • Ensure VPN is not blocking access
  2. Endpoint configuration:

    • Verify endpoint URL is correct
    • Check region settings
    • Ensure endpoint is operational
python
# Python example: Testing API connectivity
import requests
import os

# Test basic connectivity to the API
api_endpoint = os.environ.get("PAITIENT_ENDPOINT", "https://api.paitient.com")
api_key = os.environ.get("PAITIENT_API_KEY")

try:
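    response = requests.get(
        f"{api_endpoint}/health",
        headers={"Authorization": f"Bearer {api_key}"},
        timeout=10,  # Fail fast instead of hanging on an unreachable host
    )
    print(f"API connectivity test: {response.status_code}")
    # Print the raw body; a non-JSON reply should not look like a connection failure
    print(f"Response: {response.text}")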
except Exception as e:
    print(f"API connection failed: {e}")
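
If the HTTP request fails, a lower-level check can show whether the problem is DNS resolution, a blocked port, or something above the network layer. The sketch below attempts a plain TCP connection using only the standard library. `can_connect` is a hypothetical helper; it does not perform a TLS handshake or authenticate.

```python
import socket
from urllib.parse import urlparse

def can_connect(url, timeout_s=5.0):
    """Return (ok, detail) for a plain TCP connection to the URL's host and port."""
    parsed = urlparse(url)
    host = parsed.hostname
    if host is None:
        return False, f"could not parse a host from {url!r}"
    port = parsed.port or (443 if parsed.scheme == "https" else 80)
    try:
        with socket.create_connection((host, port), timeout=timeout_s):
            return True, f"connected to {host}:{port}"
    except socket.gaierror as exc:
        return False, f"DNS lookup failed for {host}: {exc}"
    except OSError as exc:  # Includes timeouts and refused connections
        return False, f"could not reach {host}:{port}: {exc}"
```

A DNS failure points to resolver or VPN configuration, while a timeout or refused connection usually points to a firewall or security group rule. If the TCP connection succeeds but the HTTP request still fails, check the endpoint URL and credentials.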

Rate Limiting

Issue: "Rate limit exceeded" errors.

Solutions:

  1. Implement exponential backoff retry logic
  2. Reduce request frequency
  3. Request rate limit increases for your account
  4. Distribute requests more evenly over time
python
# Python example: Handling rate limits with retry
import time
from paitient_secure_model import Client
from paitient_secure_model.exceptions import RateLimitError

client = Client()

def generate_with_retry(deployment_id, prompt, max_retries=5):
    for attempt in range(max_retries):
        try:
            return client.generate_text(
                deployment_id=deployment_id,
                prompt=prompt
            )
        except RateLimitError as e:
            if attempt < max_retries - 1:
                # Use the server-suggested delay if present, else exponential backoff
                retry_after = getattr(e, 'retry_after', None) or 2 ** attempt
                print(f"Rate limit exceeded. Retrying in {retry_after} seconds...")
                time.sleep(retry_after)
            else:
                raise
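
When many clients hit a rate limit at once, pure exponential backoff can cause them all to retry at the same moment. Adding random jitter spreads the retries out. The sketch below computes a capped backoff schedule with optional "full jitter"; `backoff_delays` and its parameters are illustrative and not part of the SDK.

```python
import random

def backoff_delays(max_retries, base_s=1.0, cap_s=30.0, jitter=True, rng=None):
    """Return the wait in seconds before each retry.

    Delays double on each attempt up to `cap_s`. With jitter enabled, each
    delay is drawn uniformly from [0, delay] so that clients do not retry
    in lockstep.
    """
    rng = rng or random.Random()
    delays = []
    for attempt in range(max_retries):
        delay = min(cap_s, base_s * (2 ** attempt))
        if jitter:
            delay = rng.uniform(0, delay)
        delays.append(delay)
    return delays
```

The cap keeps late retries from waiting unreasonably long, and the schedule can be substituted for the `2 ** attempt` fallback in a retry loop like the one above.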

Troubleshooting by Component

CLI Troubleshooting

For CLI-specific issues, refer to the CLI Troubleshooting guide.

Python SDK Troubleshooting

For Python SDK-specific issues:

  1. Ensure you're using the latest SDK version
  2. Check compatibility with your Python version
  3. Review import statements and method signatures
  4. Enable debug logging through Python's logging module and review the output
python
# Enable detailed logging for the SDK
import logging
logging.basicConfig(level=logging.DEBUG)
logging.getLogger('paitient_secure_model').setLevel(logging.DEBUG)
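
Setting the root logger to DEBUG also turns on verbose output from every other library in the process. A narrower alternative is to attach a handler to the SDK's logger only. The sketch below assumes the SDK logs under the `paitient_secure_model` name used above; `enable_sdk_debug_logging` is a hypothetical helper.

```python
import logging
import sys

def enable_sdk_debug_logging(logger_name="paitient_secure_model", stream=None):
    """Send DEBUG records from one logger (and its children) to a stream."""
    logger = logging.getLogger(logger_name)
    logger.setLevel(logging.DEBUG)
    handler = logging.StreamHandler(stream if stream is not None else sys.stderr)
    handler.setFormatter(
        logging.Formatter("%(asctime)s %(name)s %(levelname)s %(message)s")
    )
    logger.addHandler(handler)
    # Stop records here so they are not printed twice by root handlers
    logger.propagate = False
    return handler
```

The returned handler can be removed later with `logging.getLogger("paitient_secure_model").removeHandler(handler)` once the issue has been captured.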

Node.js SDK Troubleshooting

For Node.js SDK-specific issues:

  1. Verify you're using the latest SDK version
  2. Check compatibility with your Node.js version
  3. Inspect promise chains and async/await usage
  4. Enable debug logging
javascript
// Enable debug logging
const { PaiTIENTClient } = require('paitient-secure-model');

const client = new PaiTIENTClient({
  apiKey: process.env.PAITIENT_API_KEY,
  clientId: process.env.PAITIENT_CLIENT_ID,
  debug: true  // Enable debug logging
});

Encryption Service Issues

For encryption-related issues:

  1. Verify encryption keys are properly configured
  2. Check for key expiration or rotation issues
  3. Ensure proper IAM permissions for encryption operations
  4. Validate encrypted model access patterns

Kubernetes Deployment Issues

For Kubernetes-specific deployment issues:

  1. Check pod status and events
  2. Verify resource quotas and limits
  3. Inspect container logs for errors
  4. Validate network policies and service connections

Gathering Information for Support

When contacting PaiTIENT support, provide the following information:

  1. Error messages and stack traces
  2. Request IDs for failed operations
  3. SDK/CLI version you're using
  4. Deployment IDs for relevant deployments
  5. Steps to reproduce the issue
  6. Recent changes that might have triggered the problem
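
Version and platform details are easy to collect programmatically and save a round trip with support. The sketch below gathers them using only the standard library. The package name is taken from the installation example in this guide; `collect_support_info` is a hypothetical helper, and it deliberately includes no credentials.

```python
import json
import platform
from importlib import metadata

def collect_support_info(package="paitient-secure-model"):
    """Gather environment details that are safe to share (no credentials)."""
    try:
        sdk_version = metadata.version(package)
    except metadata.PackageNotFoundError:
        sdk_version = "not installed"
    return {
        "sdk_package": package,
        "sdk_version": sdk_version,
        "python_version": platform.python_version(),
        "platform": platform.platform(),
    }

print(json.dumps(collect_support_info(), indent=2))
```

Attach the output alongside the error messages, request IDs, and deployment IDs listed above.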

Preventative Measures

Implement these practices to prevent common issues:

  1. Monitoring and alerting: Set up proactive monitoring
  2. Regular testing: Test deployments and queries regularly
  3. Version pinning: Pin SDK versions in production
  4. Graceful degradation: Implement fallback mechanisms
  5. Load testing: Verify performance under load before production use

Released under the MIT License.