
Position Overview
We're seeking an AI/ML Research Engineer to lead experimental research and optimization of our LLM-powered systems. This role combines hands-on experimentation, data analysis, and strategic research to continuously improve system performance and guide product direction. You'll run experiments, analyze system behavior, optimize prompts and architectures, and present findings that directly influence our technical roadmap. This is a unique opportunity to apply research methodologies to a production system with real-world impact.
Location: Remote
Experience Level: Mid to Senior (5+ years in AI/ML)
Commitment: Full-time
Core Responsibilities
Experimentation & Research (40%)
Design and execute experiments to evaluate system performance across different configurations
Run comparative studies on LLM providers, models, prompt strategies, and analysis techniques
Conduct ablation studies to understand which components contribute most to accuracy
Test new approaches to code analysis, vulnerability detection, and validation
Benchmark system performance against ground truth datasets and established baselines
Explore emerging LLM capabilities and techniques (reasoning models, tool use, multi-agent systems)
System Analysis & Optimization (30%)
Analyze system logs to understand module usage patterns and LLM behavior
Identify bottlenecks, failure modes, and areas for improvement
Compare system outputs against ground truth datasets
Evaluate precision, recall, F1-scores, and other metrics across different scenarios
Optimize prompt engineering for better accuracy and cost-efficiency
Fine-tune analysis pipelines based on empirical evidence
Monitor token usage and cost patterns to improve efficiency
Research Communication & Strategy (20%)
Document experiments with clear methodologies, results, and insights
Prepare regular research reports with actionable recommendations
Present findings to technical and product teams on a weekly or biweekly cadence
Translate research insights into concrete product improvements
Maintain research documentation and experiment logs
Contribute to technical blog posts and research publications
Tool & Infrastructure Development (10%)
Build experiment harnesses and evaluation frameworks
Develop analysis tools for log processing and metrics extraction
Create visualization dashboards for system performance
Automate benchmark runs and result collection
Improve observability and instrumentation
Required Technical Skills
AI/ML & LLM Expertise (Required)
LLM Experience: Hands-on work with GPT, Claude, Gemini, or similar models
Prompt Engineering: Advanced techniques for optimizing LLM outputs
Evaluation Methodology: Designing experiments, A/B testing, statistical analysis
Metrics & Analysis: Precision, recall, F1-score, ROC curves, confusion matrices
RAG Systems: Understanding of retrieval-augmented generation and vector search
AI/ML Concepts: Understanding of temperature, sampling strategies, context windows, and model behavior
Python & Data Analysis (Required)
Python: 3+ years of professional Python development
Data Analysis: pandas, numpy, scipy for data manipulation and statistical analysis
Visualization: matplotlib, seaborn, plotly for creating insightful charts and dashboards
Notebooks: Jupyter for exploratory analysis and experiment documentation
Statistical Analysis: Hypothesis testing, confidence intervals, significance testing
Research & Experimentation
Scientific Method: Designing controlled experiments with clear hypotheses
Benchmarking: Experience running systematic evaluations and comparisons
Metrics Design: Creating meaningful evaluation criteria
Data Collection: Instrumenting systems for observability and data capture
Analysis Frameworks: Building reproducible experiment pipelines
Software Engineering Fundamentals
Version Control: Git for code and experiment tracking
Scripting: Bash, Python for automation
CLI Tools: Working with command-line interfaces and log analysis
Testing: Understanding of test frameworks and validation approaches
Documentation: Clear technical writing and documentation
Nice to Have
Machine Learning: Experience with scikit-learn, embeddings, clustering
Code Analysis: Familiarity with AST parsing, static analysis, or code understanding
Smart Contracts: Understanding of Solidity, blockchain, or security concepts
Academic Research: Published papers or conference presentations
Data Engineering: Experience with data pipelines and large-scale data processing
Domain Knowledge
Code Analysis & Security (Preferred)
Understanding of static analysis and code quality tools
Familiarity with software vulnerabilities and security patterns
Knowledge of how code analysis systems work
Experience with developer tooling or IDE features
Research Methodology
Experimental design and hypothesis testing
Statistical analysis and significance testing
Comparative evaluation methodologies
Reproducible research practices
Technical writing and presentation skills
LLM Applications
Multi-agent systems and orchestration
Function/tool calling and structured outputs
Chain-of-thought and reasoning strategies
Error analysis and failure mode identification
Cost optimization and efficiency techniques
Preferred Qualifications
Experience
3+ years in AI/ML, research, or applied science roles
Experience with LLM applications in production or research settings
Background in experimental research or data science
Prior work in developer tools, code analysis, or security (strong plus)
Experience with academic research or industry research labs (plus)
Technical Accomplishments
Designed and executed systematic research studies
Built evaluation frameworks or benchmarking systems
Published technical blog posts, papers, or presentations
Experience optimizing LLM-based systems for production
Contributions to open-source ML/AI projects