Python SDK: Text Generation
This guide covers the text generation capabilities of the PaiTIENT Secure Model Service Python SDK, which let you securely generate text from deployed models in a HIPAA- and SOC 2-compliant environment.
Basic Text Generation
Generate text from a deployed model:
from paitient_secure_model import Client
# Initialize client
client = Client()
# Generate text
response = client.generate_text(
    deployment_id="dep_12345abcde",
    prompt="What are the potential side effects of metformin?"
)
print(response.text)
Generation Options
Text Parameters
Customize text generation with various parameters:
# Generation with parameters
response = client.generate_text(
    deployment_id="dep_12345abcde",
    prompt="What are the potential side effects of metformin?",
    max_tokens=500,           # Maximum length of the generated text
    temperature=0.7,          # Controls randomness (0.0-1.0)
    top_p=0.95,               # Nucleus sampling parameter
    top_k=50,                 # Top-k sampling parameter
    stop=["\n\n", "END"],     # Stop sequences
    repetition_penalty=1.1,   # Penalize repeated tokens
    presence_penalty=0.0,     # Penalize tokens that have already appeared at least once
    frequency_penalty=0.0     # Penalize tokens in proportion to how often they have appeared
)
System Messages
Use system messages to control the model's behavior:
# Generation with system message
response = client.generate_text(
    deployment_id="dep_12345abcde",
    prompt="What are the potential side effects of metformin?",
    system_message="You are a helpful medical assistant providing accurate information to healthcare professionals. Always include references to clinical guidelines and mention important warnings."
)
Context Management
Maintain conversation context:
# Conversation with context
conversation = client.create_conversation(deployment_id="dep_12345abcde")
# First message
response1 = conversation.generate_text(
    prompt="What are the potential side effects of metformin?"
)
print("Response 1:", response1.text)
# Follow-up question (context is automatically maintained)
response2 = conversation.generate_text(
    prompt="What about patients with kidney disease?"
)
print("Response 2:", response2.text)
# Another follow-up
response3 = conversation.generate_text(
    prompt="Are there any alternatives for these patients?"
)
print("Response 3:", response3.text)
Advanced Generation Features
Streaming Responses
Stream the response as it's generated:
# Stream the response
for chunk in client.generate_text_stream(
    deployment_id="dep_12345abcde",
    prompt="Write a detailed summary of diabetes management techniques.",
    max_tokens=1000
):
    print(chunk.text, end="", flush=True)
Batch Processing
Process multiple prompts efficiently:
# Define a list of prompts
prompts = [
    "What are the symptoms of hypertension?",
    "What are common treatments for type 2 diabetes?",
    "Explain the mechanism of action for statins."
]
# Process in batch
results = client.generate_text_batch(
    deployment_id="dep_12345abcde",
    prompts=prompts,
    max_tokens=300
)
for i, result in enumerate(results):
    print(f"Prompt {i+1}: {prompts[i]}")
    print(f"Response: {result.text}")
    print()
Function Calling
Define functions that the model can call:
from paitient_secure_model import Client
from paitient_secure_model.functions import FunctionDefinition
# Initialize client
client = Client()
# Define functions
functions = [
    FunctionDefinition(
        name="search_medication_interactions",
        description="Search for potential interactions between medications",
        parameters={
            "type": "object",
            "properties": {
                "medications": {
                    "type": "array",
                    "items": {"type": "string"},
                    "description": "List of medications to check for interactions"
                }
            },
            "required": ["medications"]
        }
    ),
    FunctionDefinition(
        name="calculate_dosage",
        description="Calculate medication dosage based on patient parameters",
        parameters={
            "type": "object",
            "properties": {
                "medication": {"type": "string", "description": "Medication name"},
                "weight_kg": {"type": "number", "description": "Patient weight in kg"},
                "age_years": {"type": "number", "description": "Patient age in years"},
                "kidney_function": {"type": "string", "description": "Kidney function (normal, impaired, severe)"}
            },
            "required": ["medication", "weight_kg", "age_years"]
        }
    )
]
# Generate text with function calling
response = client.generate_text(
    deployment_id="dep_12345abcde",
    prompt="Check for interactions between metformin, lisinopril, and simvastatin.",
    functions=functions,
    function_call="auto"  # Options: "auto", "none", or {"name": "specific_function"}
)
# Check if the model decided to call a function
if response.function_call:
    function_name = response.function_call.name
    function_args = response.function_call.arguments
    print(f"Model called function: {function_name}")
    print(f"Arguments: {function_args}")
    # Here you would actually execute the function with the provided arguments
    # and then potentially continue the conversation with the result
    if function_name == "search_medication_interactions":
        # Simulate function execution
        result = {"interactions": [
            {"medications": ["metformin", "lisinopril"], "severity": "low", "description": "..."},
            {"medications": ["metformin", "simvastatin"], "severity": "moderate", "description": "..."}
        ]}
        # Continue the conversation with the function result
        follow_up = client.generate_text(
            deployment_id="dep_12345abcde",
            prompt="What should I do about these interactions?",
            function_results=[{
                "name": function_name,
                "arguments": function_args,
                "result": result
            }]
        )
        print("Follow-up:", follow_up.text)
else:
    print("Model response:", response.text)
Security Controls
Apply security controls to text generation:
from paitient_secure_model import Client
from paitient_secure_model.security import SecuritySettings, DataFiltering
# Initialize client
client = Client()
# Generate text with security controls
response = client.generate_text(
    deployment_id="dep_12345abcde",
    prompt="Patient with diabetes and hypertension, currently taking metformin and lisinopril.",
    security_settings=SecuritySettings(
        data_filtering=DataFiltering(
            detect_pii=True,             # Detect personally identifiable information
            redact_phi=True,             # Redact protected health information
            content_filtering="strict",  # Apply strict content filtering
            detect_toxic_content=True    # Detect potentially harmful content
        )
    )
)
Response Analysis
Response Metadata
Access additional information about the generation:
# Analyze response metadata
response = client.generate_text(
    deployment_id="dep_12345abcde",
    prompt="What are the potential side effects of metformin?",
    max_tokens=500,
    return_metadata=True
)
print("Response:", response.text)
print("Token count:", response.usage.total_tokens)
print("Prompt tokens:", response.usage.prompt_tokens)
print("Completion tokens:", response.usage.completion_tokens)
print("Finish reason:", response.finish_reason)
print("Model used:", response.model)
print("Created at:", response.created_at)Content Analysis
Analyze the generated content:
# Analyze generated content
analysis = client.analyze_text(
    text=response.text,
    analyses=["toxicity", "factuality", "bias", "medical_accuracy"]
)
print("Toxicity score:", analysis.toxicity)
print("Factuality score:", analysis.factuality)
print("Bias score:", analysis.bias)
print("Medical accuracy score:", analysis.medical_accuracy)Generation Control
Rate Limiting
Implement rate limiting for your application:
from paitient_secure_model import Client
from paitient_secure_model.rate_limit import RateLimiter
# Initialize client with rate limiter
client = Client()
limiter = RateLimiter(
    requests_per_minute=60,
    burst_size=10
)
# Generate text with rate limiting
try:
    with limiter:
        response = client.generate_text(
            deployment_id="dep_12345abcde",
            prompt="What are the potential side effects of metformin?"
        )
        print(response.text)
except Exception as e:
    print(f"Rate limit exceeded: {e}")
Timeout Control
Control timeouts for requests:
# Generate text with timeout control
try:
    response = client.generate_text(
        deployment_id="dep_12345abcde",
        prompt="Write a detailed analysis of current diabetes management guidelines.",
        max_tokens=2000,
        timeout=30.0  # Timeout in seconds
    )
    print(response.text)
except TimeoutError:
    print("Request timed out. Try again with fewer tokens or a simpler prompt.")
Advanced Usage
Async Support
Use the async client to run multiple requests concurrently:
import asyncio
from paitient_secure_model import AsyncClient
async def generate_responses():
    # Initialize async client
    client = AsyncClient()
    # Generate multiple responses concurrently
    prompts = [
        "What are the symptoms of hypertension?",
        "What are common treatments for type 2 diabetes?",
        "Explain the mechanism of action for statins."
    ]
    tasks = [client.generate_text(
        deployment_id="dep_12345abcde",
        prompt=prompt,
        max_tokens=300
    ) for prompt in prompts]
    responses = await asyncio.gather(*tasks)
    for i, response in enumerate(responses):
        print(f"Prompt: {prompts[i]}")
        print(f"Response: {response.text}")
        print()
# Run the async function
asyncio.run(generate_responses())
Custom Models
Generate text from custom fine-tuned models:
# Get fine-tuned model ID
fine_tuning_job = client.get_fine_tuning_job("ft_12345abcde")
fine_tuned_model = fine_tuning_job.fine_tuned_model
# Deploy the fine-tuned model
deployment = client.create_deployment(
    model_name=fine_tuned_model,
    deployment_name="custom-medical-assistant"
)
# Wait for deployment to complete
deployment.wait_until_ready()
# Generate text from custom model
response = client.generate_text(
    deployment_id=deployment.id,
    prompt="What are the potential side effects of metformin?"
)
print(response.text)
Multi-tenant Usage
For applications serving multiple clients:
# Generate text in multi-tenant context
response = client.generate_text(
    deployment_id="dep_12345abcde",
    prompt="What are the potential side effects of metformin?",
    tenant_id="tenant_12345",  # Ensures strong isolation
    user_id="user_67890"       # For audit and attribution
)
Error Handling
Implement robust error handling:
from paitient_secure_model import Client
from paitient_secure_model.exceptions import (
    GenerationError,
    ResourceNotFoundError,
    RateLimitError,
    ContentFilterError,
    InvalidParameterError
)
client = Client()
try:
    response = client.generate_text(
        deployment_id="dep_12345abcde",
        prompt="What are the potential side effects of metformin?",
        max_tokens=500
    )
    print(response.text)
except ResourceNotFoundError:
    print("Deployment not found. Check your deployment ID.")
except RateLimitError as e:
    print(f"Rate limit exceeded. Retry after {e.retry_after} seconds.")
except ContentFilterError as e:
    print(f"Content filtered: {e.reason}")
except InvalidParameterError as e:
    print(f"Invalid parameter: {e}")
except GenerationError as e:
    print(f"Generation failed: {e}")
    print(f"Request ID for troubleshooting: {e.request_id}")
Best Practices
Prompt Engineering
Follow these best practices for effective prompts:
- Be Specific: Provide clear, detailed instructions
- Establish Context: Include relevant background information
- Structure Output: Specify desired format and structure
- Use Examples: Include examples for complex tasks
- Iterate: Refine prompts based on results
Example of a well-structured prompt:
prompt = """
You are a clinical assistant helping a healthcare provider.
Provide information about metformin for diabetes management.
Please include:
1. Common side effects and their frequency
2. Contraindications
3. Recommended dosage adjustments for patients with renal impairment
4. Drug interactions to be aware of
Format the response with clear headings and bullet points.
Cite relevant clinical guidelines where appropriate.
"""
response = client.generate_text(
    deployment_id="dep_12345abcde",
    prompt=prompt,
    max_tokens=800
)
Performance Optimization
Optimize text generation performance:
- Batch Requests: Use batch API for multiple prompts
- Stream Long Responses: Use streaming for better UX
- Optimize Tokens: Keep prompts concise
- Cache Common Responses: Implement response caching (see the sketch after this list)
- Right-size Parameters: Adjust max_tokens to actual needs
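As a caching illustration, a simple in-memory cache keyed on the full request parameters avoids repeated calls for identical requests. A minimal sketch, assuming the generate_text API shown above; the cached_generate helper and _cache store are illustrative, not part of the SDK:
import hashlib
import json

_cache = {}

def cached_generate(client, **kwargs):
    # Illustrative helper: key the cache on all request parameters so
    # different prompts or settings never collide.
    key = hashlib.sha256(json.dumps(kwargs, sort_keys=True).encode()).hexdigest()
    if key not in _cache:
        # Note: for PHI-bearing workloads, cached responses must follow
        # the same data-handling policies as live ones.
        _cache[key] = client.generate_text(**kwargs)
    return _cache[key]

response = cached_generate(
    client,
    deployment_id="dep_12345abcde",
    prompt="What are the potential side effects of metformin?",
    max_tokens=300
)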
Security
Ensure secure text generation:
- Sanitize Inputs: Validate and clean user inputs (see the sketch after this list)
- Enable Content Filtering: Prevent harmful outputs
- Use PII/PHI Detection: Protect sensitive information
- Audit Generations: Track and review outputs
- Implement Rate Limiting: Prevent abuse
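To illustrate the first point, a lightweight validation layer can reject empty or oversized inputs and strip control characters before anything reaches the model. A minimal sketch; the sanitize_prompt helper, the length limit, and the specific checks are illustrative, not SDK requirements:
import re

MAX_PROMPT_CHARS = 4000  # Illustrative limit; size to your deployment

def sanitize_prompt(raw: str) -> str:
    # Reject empty input and enforce a length ceiling before sending
    # the prompt to the model.
    cleaned = raw.strip()
    if not cleaned:
        raise ValueError("Prompt must not be empty.")
    if len(cleaned) > MAX_PROMPT_CHARS:
        raise ValueError(f"Prompt exceeds {MAX_PROMPT_CHARS} characters.")
    # Strip non-printable control characters that can hide injection payloads.
    return re.sub(r"[\x00-\x08\x0b\x0c\x0e-\x1f\x7f]", "", cleaned)

user_supplied_text = "  What are the potential side effects of metformin?  "  # e.g. from a form field
response = client.generate_text(
    deployment_id="dep_12345abcde",
    prompt=sanitize_prompt(user_supplied_text)
)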
Next Steps
- Learn about Deployment
- Explore Fine-tuning
- Understand Security Best Practices
- Review Troubleshooting