Prompt Engineering: From Basics to Advanced Strategies
Prompt engineering is often dismissed as “just writing good instructions.” While that’s part of it, effective prompt engineering is a skill that combines psychology, linguistics, and empirical experimentation.
After writing thousands of prompts for production systems, I’ve developed strategies that consistently improve output quality. Here’s what I’ve learned.
The Prompt Engineering Mental Model
Think of prompting as programming in natural language. You’re:
- Defining the task (like a function signature)
- Providing context (like parameters)
- Setting constraints (like type checking)
- Specifying output format (like return types)
The LLM is your interpreter, but it’s probabilistic and context-sensitive.
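To make the analogy concrete, here is a minimal Python sketch that treats a prompt as a typed function. The task plays the signature, the context plays the arguments, and the body adds constraints and an output format. `PromptSpec` and `build_prompt` are illustrative names, not a library API.

```python
from dataclasses import dataclass, field


@dataclass
class PromptSpec:
    """Hypothetical container mirroring the four parts of the mental model."""
    task: str                                             # the "function signature"
    context: str                                          # the "parameters"
    constraints: list[str] = field(default_factory=list)  # the "type checks"
    output_format: str = ""                               # the "return type"


def build_prompt(spec: PromptSpec) -> str:
    """Assemble the parts into one prompt string, skipping empty sections."""
    parts = [spec.task]
    if spec.constraints:
        parts.append("Constraints:\n" + "\n".join(f"- {c}" for c in spec.constraints))
    if spec.output_format:
        parts.append(f"Output format: {spec.output_format}")
    parts.append(f"Context:\n{spec.context}")
    return "\n\n".join(parts)


prompt = build_prompt(PromptSpec(
    task="Summarize the document below.",
    context="{document_text}",
    constraints=["3-5 bullet points", "Each bullet under 50 words"],
    output_format="Markdown bullet list",
))
print(prompt)
```

Keeping the parts separate makes it easy to change one of them, such as the constraints, without rewriting the whole prompt.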
Foundational Techniques
1. Be Specific and Explicit
Bad:
Summarize this document.
Good:
Summarize the following technical document in 3-5 bullet points, focusing on:
1. Main technical contributions
2. Key findings or results
3. Practical applications
Keep each bullet point under 50 words. Use technical terminology where appropriate.
Document:
{document_text}
Why it works: Removes ambiguity, sets clear expectations, defines success criteria.
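The "good" prompt states success criteria you can measure, so responses can be checked against them automatically. Below is a minimal sketch. It assumes the model returns one bullet per line starting with "-", "•", or "*". Both the marker set and the function name are assumptions.

```python
def check_summary_format(response: str, min_bullets: int = 3,
                         max_bullets: int = 5, max_words: int = 50) -> list[str]:
    """Return violations of the summary constraints (an empty list means it passed)."""
    # Treat any line starting with a bullet marker as a bullet and drop the marker
    bullets = [line.strip()[1:].strip()
               for line in response.splitlines()
               if line.strip().startswith(("-", "•", "*"))]
    problems = []
    if not min_bullets <= len(bullets) <= max_bullets:
        problems.append(f"expected {min_bullets}-{max_bullets} bullets, got {len(bullets)}")
    for i, bullet in enumerate(bullets, 1):
        words = len(bullet.split())
        if words > max_words:
            problems.append(f"bullet {i} has {words} words (limit {max_words})")
    return problems


sample = "- Introduces a new caching layer\n- Cuts p99 latency\n- Applies to read-heavy services"
print(check_summary_format(sample))  # → []
```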
2. Provide Examples (Few-Shot Learning)
Zero-Shot:
Extract action items from this meeting transcript.
Few-Shot:
Extract action items from meeting transcripts. Format each as: [Person] needs to [action] by [deadline].
Examples:
Input: "John, can you send the report by Friday?"
Output: [John] needs to [send the report] by [Friday]
Input: "Sarah mentioned she'll follow up with the client next week"
Output: [Sarah] needs to [follow up with client] by [next week]
Now extract from this transcript:
{transcript}
Why it works: Shows the LLM exactly what “good” looks like. Establishes format and tone.
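If you store the examples as data instead of hard-coding them in the prompt text, you can swap or extend them without editing the prompt. Here is a sketch that renders the same Input/Output layout as above. The helper name is hypothetical.

```python
def build_few_shot_prompt(instruction: str,
                          examples: list[tuple[str, str]],
                          query: str) -> str:
    """Render an instruction, Input/Output example pairs, and the new input."""
    lines = [instruction, "", "Examples:"]
    for example_input, example_output in examples:
        lines.append(f'Input: "{example_input}"')
        lines.append(f"Output: {example_output}")
    lines += ["", "Now extract from this transcript:", query]
    return "\n".join(lines)


EXAMPLES = [
    ("John, can you send the report by Friday?",
     "[John] needs to [send the report] by [Friday]"),
    ("Sarah mentioned she'll follow up with the client next week",
     "[Sarah] needs to [follow up with client] by [next week]"),
]

prompt = build_few_shot_prompt(
    "Extract action items from meeting transcripts. "
    "Format each as: [Person] needs to [action] by [deadline].",
    EXAMPLES,
    "{transcript}",
)
print(prompt)
```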
3. Chain of Thought (CoT)
Without CoT:
Is this contract clause enforceable under California law?
With CoT:
Analyze whether this contract clause is enforceable under California law.
Step 1: Identify the key elements of the clause
Step 2: Determine relevant California statutes and case law
Step 3: Apply the legal principles to the clause
Step 4: Provide your conclusion with reasoning
Contract clause:
{clause_text}
Why it works: Asking for intermediate steps makes the model lay out its reasoning before it commits to an answer. This often improves accuracy on complex, multi-step tasks.
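In code, it helps to separate the reasoning from the final answer. This sketch builds the step-by-step prompt and extracts the conclusion. It assumes you add an instruction telling the model to start its conclusion with "Conclusion:". That marker is an assumption and is not part of the prompt above.

```python
from typing import Optional

STEPS = [
    "Identify the key elements of the clause",
    "Determine relevant California statutes and case law",
    "Apply the legal principles to the clause",
    "Provide your conclusion with reasoning",
]


def build_cot_prompt(task: str, steps: list[str], label: str, payload: str) -> str:
    """Number the reasoning steps and ask for a marked conclusion at the end."""
    numbered = "\n".join(f"Step {i}: {step}" for i, step in enumerate(steps, 1))
    return (f"{task}\n{numbered}\n"
            'Begin your final answer with "Conclusion:".\n'
            f"{label}:\n{payload}")


def extract_conclusion(response: str) -> Optional[str]:
    """Return the text after the last "Conclusion:" marker, or None if it is missing."""
    marker = "Conclusion:"
    idx = response.rfind(marker)
    return response[idx + len(marker):].strip() if idx != -1 else None


prompt = build_cot_prompt(
    "Analyze whether this contract clause is enforceable under California law.",
    STEPS, "Contract clause", "{clause_text}",
)
reply = "Step 1: ...\nStep 4: ...\nConclusion: Enforceable, with the caveats noted in Step 3."
print(extract_conclusion(reply))  # → Enforceable, with the caveats noted in Step 3.
```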
4. Role Assignment
Without Role:
Explain quantum computing.
With Role:
You are a senior technical educator who specializes in making complex topics accessible.
Explain quantum computing to a software engineer who is familiar with classical computing concepts but has no physics background. Use analogies to programming concepts where helpful.
Why it works: Sets the right tone, knowledge level, and communication style.
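Most chat-style APIs accept the role as a separate system message. This sketch builds that message list using the common `{"role": ..., "content": ...}` dictionary shape. The exact fields depend on your provider's SDK; Anthropic's Messages API, for example, takes the system prompt as a top-level `system` parameter instead of a message. Treat the shape as an assumption.

```python
def build_messages(role_description: str, user_request: str) -> list[dict[str, str]]:
    """Put the role in a system message and the task in a user message."""
    return [
        {"role": "system", "content": role_description},
        {"role": "user", "content": user_request},
    ]


messages = build_messages(
    "You are a senior technical educator who specializes in making complex topics accessible.",
    "Explain quantum computing to a software engineer who is familiar with classical "
    "computing concepts but has no physics background. "
    "Use analogies to programming concepts where helpful.",
)
```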
Advanced Techniques
5. Self-Consistency
Run the same prompt multiple times with temperature > 0 and aggregate results.
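def format_answers(answers):
    # Number each sampled answer so the synthesis prompt can refer to them
    return "\n\n".join(f"Answer {i}: {a}" for i, a in enumerate(answers, 1))


def self_consistent_answer(question, n=5):
    # `llm` is an assumed client exposing complete(prompt, temperature=...)
    answers = []
    for _ in range(n):
        response = llm.complete(
            f"Answer this question: {question}",
            temperature=0.7,  # > 0 so the samples actually differ
        )
        answers.append(response)
    # Use the LLM to synthesize the most consistent answer
    synthesis_prompt = f"""
Here are {n} different answers to the same question:
{format_answers(answers)}
Identify the most consistent answer or synthesize the best answer from these responses.
"""
    return llm.complete(synthesis_prompt, temperature=0)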
When to use: High-stakes decisions, complex reasoning tasks, when you need confidence estimation.
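The original form of self-consistency skips the synthesis call. It extracts a short final answer from each sample and takes a majority vote. The share of samples that agree then serves as a rough confidence score. This sketch assumes short answers. `sample_fn` stands in for a temperature > 0 LLM call, and the light normalization is a simplifying assumption.

```python
from collections import Counter
from typing import Callable


def majority_vote(question: str, sample_fn: Callable[[str], str],
                  n: int = 5) -> tuple[str, float]:
    """Sample n answers; return the most common one and the fraction that agree."""
    # Normalize lightly so "42" and " 42." count as the same answer (assumption)
    answers = [sample_fn(question).strip().rstrip(".").lower() for _ in range(n)]
    answer, count = Counter(answers).most_common(1)[0]
    return answer, count / n


# Usage with a fake sampler that returns canned answers in order
canned = iter(["42", "42.", "41", " 42", "42"])
answer, confidence = majority_vote("What is 6 * 7?", lambda q: next(canned))
print(answer, confidence)  # → 42 0.8
```

Voting works best when answers are short and comparable, such as numbers, labels, or yes/no. For free-form text, the synthesis approach above is the more practical choice.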
6. Tree of Thoughts
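Explore multiple reasoning paths before committing to one. Full Tree of Thoughts runs a search over partial solutions, scoring and pruning branches as it goes. The prompt below is a lightweight single-call version that asks the model to lay out and compare several approaches.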
prompt = """
Problem: {problem}
Generate 3 different approaches to solve this problem:
Approach 1:
[Description of first approach]
Pros:
Cons:
Approach 2:
[Description of second approach]
Pros:
Cons:
Approach 3:
[Description of third approach]
Pros:
Cons:
Based on the analysis, which approach is best and why?
"""
When to use: Open-ended problems, architectural decisions, strategy planning.
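The search version can be sketched as a breadth-first beam search. From each partial path, propose several next thoughts, score the extended paths, keep the best few, and repeat. `propose` and `score` are hypothetical stand-ins for LLM calls: one generates candidate next steps, the other rates a partial solution.

```python
from typing import Callable


def tree_of_thoughts(problem: str,
                     propose: Callable[[str, list[str]], list[str]],
                     score: Callable[[str, list[str]], float],
                     depth: int = 3,
                     beam_width: int = 2) -> list[str]:
    """Breadth-first search over reasoning paths, keeping the top `beam_width` per level."""
    frontier: list[list[str]] = [[]]  # each path is the list of thoughts so far
    for _ in range(depth):
        candidates = [path + [thought]
                      for path in frontier
                      for thought in propose(problem, path)]
        if not candidates:
            break
        # Keep only the highest-scoring partial paths (the "pruning" step)
        candidates.sort(key=lambda path: score(problem, path), reverse=True)
        frontier = candidates[:beam_width]
    return max(frontier, key=lambda path: score(problem, path))


# Toy usage: two possible thoughts per step, and paths with more "a" score higher
best = tree_of_thoughts("toy", lambda p, path: ["a", "b"], lambda p, path: path.count("a"))
print(best)  # → ['a', 'a', 'a']
```

Each level costs one `propose` call per kept path plus one `score` call per candidate, so `depth` and `beam_width` trade quality against the number of LLM calls.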
7. Constitutional AI / Self-Critique
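Have the LLM critique and refine its own output. Strictly speaking, Constitutional AI is Anthropic's training method, in which a model critiques and revises its outputs against a written set of principles. The inference-time pattern below borrows that critique-and-revise loop.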
# First draft (assumes `topic` holds the post's subject)
initial_prompt = f"""
Write a technical blog post about {topic}.
"""
draft = llm.complete(initial_prompt)
# Self-critique
critique_prompt = f"""
You wrote this blog post:
{draft}
Critique it according to these criteria:
1. Technical accuracy
2. Clarity for the target audience
3. Logical flow
4. Missing important points
Provide specific suggestions for improvement.
"""
critique = llm.complete(critique_prompt)
# Revision
revision_prompt = f"""
Original blog post:
{draft}
Critique:
{critique}
Revise the blog post addressing the critique.
"""
final = llm.complete(revision_prompt)
When to use: Content generation, code review, any task where quality matters more than speed.
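The three calls above can be wrapped in a loop. It stops when the critique finds nothing to fix or after a fixed number of rounds. This sketch assumes `complete` is any prompt-to-text function and that the critic is told to reply with the hypothetical sentinel `NO CHANGES` when satisfied.

```python
from typing import Callable


def critique_and_revise(draft: str, complete: Callable[[str], str],
                        max_rounds: int = 2) -> str:
    """Alternate critique and revision until the critic approves or rounds run out."""
    for _ in range(max_rounds):
        critique = complete(
            "Critique this draft for technical accuracy, clarity, logical flow, "
            "and missing points. If nothing needs to change, reply exactly NO CHANGES.\n\n"
            f"{draft}"
        )
        if critique.strip() == "NO CHANGES":
            break  # the critic is satisfied; skip the revision call
        draft = complete(
            f"Original draft:\n{draft}\n\nCritique:\n{critique}\n\n"
            "Revise the draft addressing the critique."
        )
    return draft
```

Capping the rounds matters. Without a cap, a critic that always finds something to improve keeps the loop, and the cost, running indefinitely.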
8. Prompt Chaining
Break complex tasks into sequential steps.
# Step 1: Extract information (assumes `ticket` holds the support ticket text)
extract_prompt = f"""
Extract all customer complaints from this support ticket:
{ticket}
List each complaint clearly.
"""
complaints = llm.complete(extract_prompt)
# Step 2: Categorize
categorize_prompt = f"""
Categorize these complaints into: Product, Service, Billing, Other
Complaints:
{complaints}
"""
categories = llm.complete(categorize_prompt)
# Step 3: Prioritize
prioritize_prompt = f"""
Prioritize these categorized complaints by severity and urgency:
{categories}
For each, assign priority: High, Medium, Low
"""
priorities = llm.complete(prioritize_prompt)
# Step 4: Generate response
response_prompt = f"""
Generate a professional response addressing these prioritized complaints:
{priorities}
Tone: Empathetic and solution-oriented
"""
response = llm.complete(response_prompt)
When to use: Complex workflows, when intermediate outputs are valuable, when different steps need different prompting strategies.
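All four steps share one shape: fill a template with the previous output, call the model, and pass the result on. A small runner makes that explicit and keeps every intermediate output for inspection. This is a sketch. `complete` is any prompt-to-text function, and the `{input}` placeholder convention is an assumption.

```python
from typing import Callable


def run_chain(initial_input: str, templates: list[str],
              complete: Callable[[str], str]) -> list[str]:
    """Run templates in order, feeding each output into the next template's {input}."""
    outputs = []
    current = initial_input
    for template in templates:
        current = complete(template.format(input=current))
        outputs.append(current)  # keep intermediates; they are often useful on their own
    return outputs


TEMPLATES = [
    "Extract all customer complaints from this support ticket:\n{input}\n"
    "List each complaint clearly.",
    "Categorize these complaints into: Product, Service, Billing, Other\n"
    "Complaints:\n{input}",
    "Prioritize these categorized complaints by severity and urgency:\n{input}\n"
    "For each, assign priority: High, Medium, Low",
    "Generate a professional response addressing these prioritized complaints:\n{input}\n"
    "Tone: Empathetic and solution-oriented",
]
```

Because only the template is passed to `str.format`, braces inside the ticket text or model outputs are inserted literally rather than treated as placeholders.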
RAG-Specific Prompting
9. Context Utilization
rag_prompt = """
Answer the question based ONLY on the provided context. Follow these rules:
1. If the context contains the answer, provide it with citations
2. If the context is relevant but doesn't fully answer, say what you can answer
3. If the context is not relevant, say "I don't have enough information to answer this question"
4. Never use information not present in the context
5. Cite sources using [Source: X] format
Context:
{context}
Question: {question}
Answer:
"""
Key elements:
- Explicit instruction to use only provided context
- Handling of edge cases (partial info, no info)
- Citation requirements
- Clear prohibitions (no external knowledge)
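Citation requirements are only useful if you verify them. This sketch labels each retrieved chunk with its source id in the same `[Source: X]` format. It then flags any citation in the answer that does not match a provided source. The `{source_id: text}` dictionary shape and the ids are assumptions.

```python
import re

CITATION = re.compile(r"\[Source:\s*([^\]]+)\]")


def build_context(chunks: dict[str, str]) -> str:
    """Label each retrieved chunk with its source id so the model can cite it."""
    return "\n\n".join(f"[Source: {source_id}]\n{text}" for source_id, text in chunks.items())


def unknown_citations(answer: str, chunks: dict[str, str]) -> list[str]:
    """Return cited source ids that were not part of the provided context."""
    cited = [c.strip() for c in CITATION.findall(answer)]
    return [c for c in cited if c not in chunks]


chunks = {"kb-12": "Refunds are processed within 5 business days.",
          "kb-40": "Premium users get 24/7 phone support."}
answer = "Refunds take up to 5 business days [Source: kb-12] [Source: kb-99]."
print(unknown_citations(answer, chunks))  # → ['kb-99']
```

An answer with no citations at all passes this check, so you may also want to flag answers where `CITATION.findall` returns nothing.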
10. Multi-Document Reasoning
prompt = """
You are given information from multiple documents. Some information may be contradictory.
Documents:
[Doc 1 - Sales Report Q1]:
{doc1}
[Doc 2 - Sales Report Q2]:
{doc2}
[Doc 3 - Marketing Analysis]:
{doc3}
Question: {question}
Instructions:
1. Identify which documents are relevant to the question
2. If documents contradict each other, note the contradiction
3. Synthesize a coherent answer, citing specific documents
4. If there's ambiguity, acknowledge it
Answer:
"""
Prompt Optimization Workflow
1. Start with a baseline
baseline_prompt = "Summarize this article."
2. Add specificity
v2_prompt = "Summarize this article in 100 words, focusing on key findings."
3. Add examples
v3_prompt = """
Summarize articles like this example:
Input: [long article]
Output: [concise 100-word summary highlighting key findings]
Now summarize:
{article}
"""
4. Test and measure
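test_set = load_test_examples()  # assumed helper returning labeled examples
prompts = {"baseline": baseline_prompt, "v2": v2_prompt, "v3": v3_prompt}
for name, prompt_version in prompts.items():
    results = evaluate(prompt_version, test_set)
    print(f"{name}: Accuracy={results.accuracy}, Quality={results.quality}")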
5. Iterate based on failures
# Analyze where v3 fails (score each example on its own; quality below 3 counts as a failure)
failures = [ex for ex in test_set if evaluate(v3_prompt, [ex]).quality < 3]
# Identify patterns
for failure in failures:
    print(f"Failed on: {failure.type}")
# Example output: Failed on: Technical jargon-heavy articles
# Refine prompt
v4_prompt = """
[Previous v3 prompt]
Note: If the article contains technical terminology, include a brief explanation in parentheses.
"""
Common Pitfalls
Pitfall 1: Over-Prompting
Bad:
You are an expert AI assistant with deep knowledge of all subjects. You are helpful, harmless, and honest. You always provide accurate information. You never make things up. You think carefully before responding...
[200 more words of instructions]
Question: What is 2+2?
Good:
Answer this math question accurately: What is 2+2?
Lesson: Only include necessary instructions. More prompt ≠ better results.
Pitfall 2: Ambiguous Constraints
Bad:
Write a short summary.
Good:
Write a summary of no more than 100 words.
Lesson: Quantify when possible. “Short” is subjective.
Pitfall 3: Conflicting Instructions
Bad:
Be creative and innovative, but only use the information provided.
Good:
Synthesize the provided information in a clear, organized way. Use headings and bullet points for readability.
Lesson: Don’t ask for creativity then constrain it entirely. Be consistent.
Pitfall 4: Assuming Context Persistence
Bad:
# First message
"You are a Python expert."
# Second message (new API call)
"How do I reverse a string?"
# LLM doesn't remember it's a "Python expert"
Good:
# Every message includes role
"You are a Python expert. How do I reverse a string in Python?"
Lesson: Chat completion APIs are typically stateless, so each call is independent. Include the necessary context (or the conversation history) every time.
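For multi-turn conversations, the usual fix is to keep the history on your side and resend it with every call. This is a sketch. `send` stands in for your provider's chat call, and the message dictionary shape is an assumption that varies by SDK.

```python
from typing import Callable

Message = dict[str, str]


class Conversation:
    """Keeps the full message history and resends it on every call."""

    def __init__(self, system_prompt: str, send: Callable[[list[Message]], str]):
        self.messages: list[Message] = [{"role": "system", "content": system_prompt}]
        self.send = send

    def ask(self, user_text: str) -> str:
        self.messages.append({"role": "user", "content": user_text})
        reply = self.send(self.messages)  # the whole history, not just the last turn
        self.messages.append({"role": "assistant", "content": reply})
        return reply
```

Long histories eventually exceed the model's context window, so production code usually trims or summarizes older turns while keeping the system message.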
Model-Specific Considerations
GPT-4 vs GPT-3.5-turbo
- GPT-4: Better at following complex instructions, can handle longer contexts
- GPT-3.5-turbo: Needs simpler, more explicit prompts
Claude (Anthropic)
- Responds well to XML-style tags such as <instructions>, <context>, and <examples>
- Good at following constitutional principles
- Excels at longer context (100K+ tokens)
Open Source Models (Llama, Mistral)
- Often fine-tuned with specific prompt formats (e.g., [INST] tags)
- May need more explicit instructions
- Vary widely in capabilities
Example (Llama 2 Chat):
<s>[INST] <<SYS>>
You are a helpful assistant.
<</SYS>>

{user_message} [/INST]
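Hand-building these strings is error-prone, so it helps to generate them. This sketch produces a single-turn Llama 2 Chat prompt in the `[INST]` / `<<SYS>>` layout above. Note that Meta's reference code places a blank line after `<</SYS>>`. Many tokenizers add the `<s>` beginning-of-sequence token automatically, so check yours before putting it in the text. For other models, prefer the chat template that ships with the model, for example via Hugging Face's `tokenizer.apply_chat_template`.

```python
def llama2_chat_prompt(system_prompt: str, user_message: str,
                       include_bos: bool = True) -> str:
    """Format a single-turn Llama 2 Chat prompt."""
    bos = "<s>" if include_bos else ""  # omit if your tokenizer prepends BOS itself
    return (f"{bos}[INST] <<SYS>>\n{system_prompt}\n<</SYS>>\n\n"
            f"{user_message} [/INST]")


print(llama2_chat_prompt("You are a helpful assistant.",
                         "How do I reverse a string in Python?"))
```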
Evaluation Metrics
How do you know if your prompt is good?
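import time

import numpy as np


def evaluate_prompt(prompt, test_set):
    # llm, judge_relevance, and semantic_similarity are assumed helpers
    scores = {
        'relevance': [],
        'correctness': [],
        'completeness': [],
        'format_compliance': [],
        'latency': [],
        'cost': []
    }
    for example in test_set:
        start = time.perf_counter()
        response = llm.complete(prompt.format(**example.inputs))
        scores['latency'].append(time.perf_counter() - start)
        scores['relevance'].append(
            judge_relevance(example.query, response)
        )
        scores['correctness'].append(
            semantic_similarity(response, example.ground_truth)
        )
        # ... other metrics
    # Skip metrics that were never recorded (np.mean([]) would return nan)
    return {
        metric: np.mean(values)
        for metric, values in scores.items()
        if values
    }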
Real-World Example: Customer Support Bot
Initial Prompt (Poor):
Help the customer.
Evolved Prompt (Production):
You are a customer support agent for TechCorp. Your goal is to resolve customer issues efficiently and professionally.
Guidelines:
1. Be empathetic and acknowledge the customer's frustration
2. Ask clarifying questions if needed (max 2 questions before providing solution)
3. Provide step-by-step solutions when applicable
4. If you cannot help, escalate to a human agent
5. Always end with asking if there's anything else you can help with
Context:
- Customer tier: {customer_tier}
- Previous interactions: {interaction_history}
- Current issue category: {issue_category}
Customer message: {customer_message}
Your response:
Results:
- Baseline (poor prompt): 62% resolution rate
- Production prompt: 84% resolution rate
- Customer satisfaction: 3.2 → 4.3 / 5
Prompt Library Template
Maintain a library of tested prompts:
# prompts/summarization_v3.yaml
name: summarization_v3
task: Document summarization
version: 3.2.1
created: 2026-01-15
tested_on: 500 documents
avg_quality: 4.2/5
template: |
  Summarize the following document in {word_count} words.
  Focus on:
  - Main themes and arguments
  - Key findings or conclusions
  - Actionable insights
  Format: {format}
  Document:
  {document}
  Summary:
parameters:
  word_count:
    type: int
    default: 100
    range: [50, 500]
  format:
    type: enum
    default: bullets
    options: [paragraph, bullets, numbered]
examples:
  - input:
      document: "[Example document]"
      word_count: 100
      format: bullets
    output: |
      - Key point 1
      - Key point 2
      - Key point 3
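Filling the template is a natural place to enforce the declared parameter rules. This sketch uses a plain dict that mirrors the YAML fields above; with PyYAML installed, `yaml.safe_load` on the file gives an equivalent structure. The `render` helper is hypothetical.

```python
PROMPT_SPEC = {
    "name": "summarization_v3",
    "template": (
        "Summarize the following document in {word_count} words.\n"
        "Focus on:\n"
        "- Main themes and arguments\n"
        "- Key findings or conclusions\n"
        "- Actionable insights\n"
        "Format: {format}\n"
        "Document:\n{document}\n"
        "Summary:"
    ),
    "parameters": {
        "word_count": {"type": "int", "default": 100, "range": [50, 500]},
        "format": {"type": "enum", "default": "bullets",
                   "options": ["paragraph", "bullets", "numbered"]},
    },
}


def render(spec: dict, document: str, **overrides) -> str:
    """Fill the template, applying defaults and validating each declared parameter."""
    unknown = set(overrides) - set(spec["parameters"])
    if unknown:
        raise ValueError(f"unknown parameters: {sorted(unknown)}")
    values = {}
    for name, rules in spec["parameters"].items():
        value = overrides.get(name, rules["default"])
        if rules["type"] == "int":
            low, high = rules["range"]
            if not isinstance(value, int) or not low <= value <= high:
                raise ValueError(f"{name}={value!r} must be an int in [{low}, {high}]")
        elif rules["type"] == "enum" and value not in rules["options"]:
            raise ValueError(f"{name}={value!r} must be one of {rules['options']}")
        values[name] = value
    return spec["template"].format(document=document, **values)


print(render(PROMPT_SPEC, "Quarterly results improved...", format="numbered"))
```

Validating at render time catches a bad parameter, such as a 10-word summary or an unsupported format, before it reaches the model.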
Conclusion
Prompt engineering is both art and science:
- Art: Understanding how to communicate effectively with LLMs
- Science: Systematic testing and iteration
Key takeaways:
- Start simple, add complexity only when needed
- Test with real examples, not just happy paths
- Version and track your prompts
- Measure what matters (quality, not just completion)
- Learn from failures
The field is still evolving. What works today may be suboptimal tomorrow as models improve. Stay empirical, keep experimenting.
What prompt engineering techniques have worked for you? Share your strategies and examples. Reach out via email or X.
Disclaimer: The views, opinions, and technical approaches shared in this post are my own, based on my personal experience building production AI/ML systems. They do not represent the views of my current or former employers. Technology choices and architectural decisions should always be evaluated in the context of your specific use case and requirements.
Questions or feedback? I’d love to hear your thoughts and experiences.
Contact: LinkedIn | GitHub | X