Troubleshooting
This guide provides solutions for common issues you might encounter when using the PaiTIENT Secure Model Service.
General Troubleshooting Process
When encountering issues with the PaiTIENT Secure Model Service, follow these general steps:
- Check Service Status: Verify that the service is operational
- Validate Credentials: Ensure your API keys are valid and properly configured
- Review Logs: Examine logs for error details
- Check Documentation: Refer to documentation for configuration requirements
- Contact Support: If the issue persists, contact PaiTIENT support
Common Issues
Authentication Problems
API Key Issues
Issue: "Authentication failed" or "Invalid API key" errors.
Solutions:
- Verify that your API key and client ID are correct
- Ensure environment variables are properly set
- Check that your key has not expired
- Verify you're using the correct key for your environment (development vs. production)
# Python example: Verify API key configuration
import os
from paitient_secure_model import Client
# Report whether each variable is set, without printing its value
print(f"PAITIENT_API_KEY set: {'PAITIENT_API_KEY' in os.environ}")
print(f"PAITIENT_CLIENT_ID set: {'PAITIENT_CLIENT_ID' in os.environ}")
# Test authentication
try:
    client = Client()
    # A simple API call to test authentication
    deployments = client.list_deployments(limit=1)
    print("Authentication successful")
except Exception as e:
    print(f"Authentication failed: {e}")
Permission Issues
Issue: "Permission denied" or "Unauthorized" errors.
Solutions:
- Verify your API key has the required permissions
- Check role assignments for your account
- Request necessary permissions from your administrator
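When the error text is ambiguous, the HTTP status code (if your client exposes it) usually separates authentication failures from permission failures. The sketch below maps standard HTTP status codes to the sections of this guide. The function name `classify_status` is illustrative and not part of the PaiTIENT SDK, and it is an assumption that the service returns exactly these codes.

```python
def classify_status(status_code: int) -> str:
    """Map an HTTP status code to a troubleshooting category.

    Illustrative helper, not part of the PaiTIENT SDK.
    """
    if status_code == 401:
        return "authentication"  # missing, invalid, or expired credentials
    if status_code == 403:
        return "permission"      # credentials accepted, but access is not allowed
    if status_code == 429:
        return "rate_limit"      # too many requests
    if 500 <= status_code <= 599:
        return "service"         # server-side problem; check service status
    return "other"
```

Note that HTTP 401 carries the reason phrase "Unauthorized" even though it signals an authentication failure; 403 "Forbidden" is the permission case.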
Deployment Issues
Deployment Failures
Issue: Deployments fail with errors.
Common Causes and Solutions:
Resource Constraints:
- Select a different instance type
- Check your quota limits
- Scale down other deployments to free resources
Configuration Errors:
- Verify the model name is correct
- Check that configuration parameters are valid
- Ensure VPC and security group settings are correct
Network Issues:
- Verify network connectivity
- Check firewall and security group rules
- Ensure proper access to required resources
# Python example: Debugging deployment
from paitient_secure_model import Client
from paitient_secure_model.exceptions import DeploymentError
client = Client()
try:
    deployment = client.create_deployment(
        model_name="ZimaBlueAI/HuatuoGPT-o1-8B",
        deployment_name="clinical-assistant"
    )
    print(f"Deployment created: {deployment.id}")
except DeploymentError as e:
    print(f"Deployment failed: {e}")
    # Get detailed error logs
    if hasattr(e, 'deployment_id') and e.deployment_id:
        logs = client.get_deployment_logs(
            deployment_id=e.deployment_id,
            limit=10,
            filter="level=error"
        )
        print("Error logs:")
        for log in logs:
            print(f"  {log.message}")
Deployment Timeouts
Issue: Deployments take too long or time out.
Solutions:
- Allow more time for large models, which take longer to deploy
- Check for resource contention in your environment
- Verify network connectivity and bandwidth
- Try deploying during off-peak hours
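To tell a slow deployment from a stuck one, it can help to poll the deployment status against an explicit deadline rather than waiting indefinitely. The following is a minimal sketch: `wait_for_status` is a hypothetical helper, the status strings "ready" and "failed" are assumptions, and `get_status` stands in for whatever call returns the current deployment status in your setup.

```python
import time
from typing import Callable, Tuple


def wait_for_status(get_status: Callable[[], str],
                    ready: str = "ready",
                    failed: Tuple[str, ...] = ("failed",),
                    timeout_s: float = 1800.0,
                    interval_s: float = 15.0) -> str:
    """Poll get_status() until it reports a terminal state or the deadline passes.

    The status names are placeholders; substitute the values your service returns.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        status = get_status()
        if status == ready or status in failed:
            return status
        # Stop if the next poll would land after the deadline
        if time.monotonic() + interval_s > deadline:
            raise TimeoutError(f"deployment still '{status}' after {timeout_s} s")
        time.sleep(interval_s)
```

A generous `timeout_s` for large models, combined with a clear error when it expires, makes it easier to decide when to start checking for resource contention or network problems.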
Inference Issues
High Latency
Issue: Inference requests take too long to complete.
Solutions:
Right-size your deployment:
- Use a more powerful instance type
- Add more replicas for higher throughput
Optimize your prompts:
- Reduce prompt length
- Simplify complex instructions
- Limit max tokens in responses
Network optimization:
- Use the endpoint in the closest geographic region
- Ensure sufficient network bandwidth
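Before changing instance types or prompts, it is worth measuring latency so that you can confirm whether a change helped. The sketch below times any callable and reports the median, the 95th percentile (nearest-rank method), and the maximum. `measure_latency` is a hypothetical helper; in practice `call` would wrap your own inference request.

```python
import math
import statistics
import time
from typing import Callable, Dict


def measure_latency(call: Callable[[], object], runs: int = 20) -> Dict[str, float]:
    """Time repeated invocations of call() and summarize the results in seconds."""
    if runs < 1:
        raise ValueError("runs must be at least 1")
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        call()
        samples.append(time.perf_counter() - start)
    samples.sort()
    # Nearest-rank 95th percentile: the smallest sample with at least 95% of samples at or below it
    p95_index = math.ceil(0.95 * len(samples)) - 1
    return {
        "median_s": statistics.median(samples),
        "p95_s": samples[p95_index],
        "max_s": samples[-1],
    }
```

Comparing the median with the 95th percentile helps separate consistently slow responses (often prompt length or instance size) from occasional spikes (often contention or network issues).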
Out of Memory Errors
Issue: "Out of memory" errors during inference.
Solutions:
- Reduce batch size
- Shorten input prompts
- Limit maximum response length
- Use a larger instance type
- Quantize the model to reduce memory usage
# Python example: Optimized inference settings
from paitient_secure_model import Client
client = Client()
# Using memory-optimized settings
response = client.generate_text(
    deployment_id="dep_12345abcde",
    prompt="What are the potential side effects of metformin?",
    max_tokens=100,  # Limit output length
    temperature=0.7,
    stream=False  # Non-streaming may be more memory efficient
)
Content Filtering Issues
Issue: Responses are being filtered or blocked.
Solutions:
- Review your prompt for potentially sensitive content
- Adjust content filtering settings if appropriate
- Rephrase prompts to avoid triggering filters
- For healthcare contexts, ensure you're using models and deployments with appropriate clinical content settings
Fine-tuning Issues
Training Failures
Issue: Fine-tuning jobs fail.
Solutions:
Dataset issues:
- Verify dataset format and structure
- Check for data quality issues
- Ensure dataset size is appropriate
Resource constraints:
- Check quota limits
- Use a different instance type
Configuration errors:
- Verify hyperparameters are valid
- Check base model compatibility
# Python example: Validating fine-tuning dataset
import json
from paitient_secure_model import Client
from paitient_secure_model.validation import validate_fine_tuning_dataset
client = Client()
# Load the dataset, skipping blank lines that would break json.loads
with open("training_data.jsonl", "r") as f:
    dataset = [json.loads(line) for line in f if line.strip()]
# Validate dataset
validation_result = validate_fine_tuning_dataset(dataset)
if validation_result["valid"]:
    print("Dataset is valid for fine-tuning")
else:
    print("Dataset validation failed:")
    for error in validation_result["errors"]:
        print(f"  {error}")
Poor Fine-tuned Model Performance
Issue: Fine-tuned model performs worse than expected.
Solutions:
Dataset quality:
- Improve example quality
- Add more diverse examples
- Ensure examples align with your use case
Training parameters:
- Adjust learning rate
- Try different batch sizes
- Modify number of training epochs
Model selection:
- Try a different base model
- Use a model pre-trained on a similar domain
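A quick first check on dataset quality is to count exact duplicates and examples with empty fields, since both tend to degrade a fine-tuned model. The sketch below works on JSONL lines and assumes each record has `prompt` and `completion` fields; those field names and the `dataset_quality_report` helper are hypothetical, so adjust them to your dataset format.

```python
import json
from collections import Counter
from typing import Dict, Iterable


def dataset_quality_report(lines: Iterable[str],
                           prompt_key: str = "prompt",
                           completion_key: str = "completion") -> Dict[str, int]:
    """Count records, records with an empty field, and exact duplicate pairs."""
    records = [json.loads(line) for line in lines if line.strip()]
    empty_fields = 0
    seen = Counter()
    for record in records:
        prompt = str(record.get(prompt_key, "")).strip()
        completion = str(record.get(completion_key, "")).strip()
        if not prompt or not completion:
            empty_fields += 1
        seen[(prompt, completion)] += 1
    # Every occurrence beyond the first counts as a duplicate
    duplicates = sum(count - 1 for count in seen.values() if count > 1)
    return {"total": len(records), "empty_fields": empty_fields, "duplicates": duplicates}
```

To run it against a file, pass the open file object, for example `dataset_quality_report(open("training_data.jsonl"))`.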
SDK and API Issues
SDK Installation Problems
Issue: SDK installation fails.
Solutions:
- Check Python/Node.js version compatibility
- Update package managers (pip, npm)
- Install required dependencies first
- Try with a virtual environment
# Create a fresh virtual environment
python -m venv paitient_env
source paitient_env/bin/activate # On Windows: paitient_env\Scripts\activate
# Install with specific version
pip install paitient-secure-model==1.2.3
API Connection Issues
Issue: Cannot connect to API endpoints.
Solutions:
Network connectivity:
- Check internet connection
- Verify firewall settings
- Ensure VPN is not blocking access
Endpoint configuration:
- Verify endpoint URL is correct
- Check region settings
- Ensure endpoint is operational
# Python example: Testing API connectivity
import requests
import os
# Test basic connectivity to the API
api_endpoint = os.environ.get("PAITIENT_ENDPOINT", "https://api.paitient.com")
api_key = os.environ.get("PAITIENT_API_KEY")
try:
    response = requests.get(
        f"{api_endpoint}/health",
        headers={"Authorization": f"Bearer {api_key}"},
        timeout=10  # Fail fast instead of hanging on an unreachable endpoint
    )
    print(f"API connectivity test: {response.status_code}")
    print(f"Response: {response.json()}")
except Exception as e:
    print(f"API connection failed: {e}")
Rate Limiting
Issue: "Rate limit exceeded" errors.
Solutions:
- Implement exponential backoff retry logic
- Reduce request frequency
- Request rate limit increases for your account
- Distribute requests more evenly over time
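One way to reduce request frequency and spread requests evenly is a small client-side throttle that enforces a minimum interval between calls. This is a minimal sketch: the `Throttle` class is hypothetical, it is not thread-safe, and the right rate depends on the limits that apply to your account.

```python
import time


class Throttle:
    """Enforce a minimum interval between calls (single-threaded use only)."""

    def __init__(self, max_per_second: float):
        if max_per_second <= 0:
            raise ValueError("max_per_second must be positive")
        self._min_interval = 1.0 / max_per_second
        self._next_allowed = time.monotonic()

    def wait(self) -> float:
        """Sleep until the next call is allowed; return the time slept in seconds."""
        now = time.monotonic()
        delay = max(0.0, self._next_allowed - now)
        if delay > 0:
            time.sleep(delay)
        # Schedule the next slot relative to when this call was allowed to proceed
        self._next_allowed = max(now, self._next_allowed) + self._min_interval
        return delay
```

Call `throttle.wait()` immediately before each request. Throttling lowers how often rate limits are hit, but retry logic is still needed for the cases that get through.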
# Python example: Handling rate limits with retry
import time
from paitient_secure_model import Client
from paitient_secure_model.exceptions import RateLimitError
client = Client()
def generate_with_retry(deployment_id, prompt, max_retries=5):
    for attempt in range(max_retries):
        try:
            return client.generate_text(
                deployment_id=deployment_id,
                prompt=prompt
            )
        except RateLimitError as e:
            if attempt < max_retries - 1:
                # Use the retry-after value if the error provides one,
                # otherwise fall back to exponential backoff
                retry_after = getattr(e, 'retry_after', None) or 2 ** attempt
                print(f"Rate limit exceeded. Retrying in {retry_after} seconds...")
                time.sleep(retry_after)
            else:
                raise
Troubleshooting by Component
CLI Troubleshooting
For CLI-specific issues, refer to the CLI Troubleshooting guide.
Python SDK Troubleshooting
For Python SDK-specific issues:
- Ensure you're using the latest SDK version
- Check compatibility with your Python version
- Review import statements and method signatures
- Enable debug logging through the SDK's logging capabilities and review the output
# Enable detailed logging for the SDK
import logging
logging.basicConfig(level=logging.DEBUG)
logging.getLogger('paitient_secure_model').setLevel(logging.DEBUG)
Node.js SDK Troubleshooting
For Node.js SDK-specific issues:
- Verify you're using the latest SDK version
- Check compatibility with your Node.js version
- Inspect promise chains and async/await usage
- Enable debug logging
// Enable debug logging
const { PaiTIENTClient } = require('paitient-secure-model');
const client = new PaiTIENTClient({
  apiKey: process.env.PAITIENT_API_KEY,
  clientId: process.env.PAITIENT_CLIENT_ID,
  debug: true // Enable debug logging
});
Encryption Service Issues
For encryption-related issues:
- Verify encryption keys are properly configured
- Check for key expiration or rotation issues
- Ensure proper IAM permissions for encryption operations
- Validate encrypted model access patterns
Kubernetes Deployment Issues
For Kubernetes-specific deployment issues:
- Check pod status and events
- Verify resource quotas and limits
- Inspect container logs for errors
- Validate network policies and service connections
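The checks above map onto standard `kubectl` commands. The sketch below only builds and prints the commands so you can review them before running anything. It assumes `kubectl` is installed and configured for your cluster; the namespace `paitient` and pod name `model-server-0` are placeholders, not names defined by the service.

```python
import shlex
from typing import List


def kubectl_diagnostics(namespace: str, pod: str) -> List[List[str]]:
    """Return kubectl commands covering pod status, events, quotas, logs, and policies."""
    ns = ["-n", namespace]
    return [
        ["kubectl", "get", "pods", *ns, "-o", "wide"],                  # pod status
        ["kubectl", "describe", "pod", pod, *ns],                       # pod details and events
        ["kubectl", "get", "events", *ns, "--sort-by=.lastTimestamp"],  # recent namespace events
        ["kubectl", "describe", "resourcequota", *ns],                  # quotas and usage
        ["kubectl", "logs", pod, *ns, "--tail=100"],                    # recent container logs
        ["kubectl", "get", "networkpolicy", *ns],                       # network policies
    ]


# Placeholder namespace and pod name; replace with your own
for command in kubectl_diagnostics("paitient", "model-server-0"):
    print(shlex.join(command))
```

To execute a command from Python instead of printing it, pass the argument list to `subprocess.run(command, check=False)`.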
Gathering Information for Support
When contacting PaiTIENT support, provide the following information:
- Error messages and stack traces
- Request IDs for failed operations
- SDK/CLI version you're using
- Deployment IDs for relevant deployments
- Steps to reproduce the issue
- Recent changes that might have triggered the problem
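Part of this information can be collected automatically. The sketch below gathers the installed SDK version and basic environment details using only the Python standard library. `collect_support_info` is a hypothetical helper, and the package name is taken from the `pip install` command earlier in this guide.

```python
import json
import platform
from importlib import metadata
from typing import Dict


def collect_support_info(package: str = "paitient-secure-model") -> Dict[str, str]:
    """Gather version and environment details to include in a support request."""
    try:
        sdk_version = metadata.version(package)
    except metadata.PackageNotFoundError:
        sdk_version = "not installed"
    return {
        "sdk_package": package,
        "sdk_version": sdk_version,
        "python_version": platform.python_version(),
        "platform": platform.platform(),
    }


print(json.dumps(collect_support_info(), indent=2))
```

Error messages, request IDs, deployment IDs, and reproduction steps still need to be added by hand. Do not include API keys in the report.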
Preventative Measures
Implement these practices to prevent common issues:
- Monitoring and alerting: Set up proactive monitoring
- Regular testing: Test deployments and queries regularly
- Version pinning: Pin SDK versions in production
- Graceful degradation: Implement fallback mechanisms
- Load testing: Verify performance under load before production use
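As an illustration of graceful degradation, the sketch below wraps a primary call with a fallback that runs only when the primary raises. `call_with_fallback` is a hypothetical helper; what counts as a suitable fallback (a cached answer, a smaller deployment, a static message) depends on your application.

```python
import logging
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")
logger = logging.getLogger(__name__)


def call_with_fallback(primary: Callable[[], T],
                       fallback: Callable[[], T],
                       handled: Tuple[Type[BaseException], ...] = (Exception,)) -> T:
    """Return primary(); if it raises one of the handled exceptions, return fallback()."""
    try:
        return primary()
    except handled as exc:
        # Log the failure so that degraded responses remain visible in monitoring
        logger.warning("Primary call failed (%s); using fallback", exc)
        return fallback()
```

Narrowing `handled` to specific exception types keeps programming errors from being silently masked by the fallback.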
Next Steps
- Review Security Best Practices
- Learn about Python SDK
- Explore Node.js SDK
- Understand Deployment Options