
Position Overview
We're seeking an AI/ML Research Engineer to lead experimental research and optimization of our LLM-powered systems. This role combines hands-on experimentation, data analysis, and strategic research to continuously improve system performance and guide product direction. You'll run experiments, analyze system behavior, optimize prompts and architectures, and present findings that directly influence our technical roadmap. This is a unique opportunity to apply research methodologies to a production system with real-world impact.
Location: Remote
Experience Level: Mid to Senior (5+ years in AI/ML)
Commitment: Full-time
Core Responsibilities
Experimentation & Research (40%)
Design and execute experiments to evaluate system performance across different configurations
Run comparative studies on LLM providers, models, prompt strategies, and analysis techniques
Conduct ablation studies to understand which components contribute most to accuracy
Test new approaches to code analysis, vulnerability detection, and validation
Benchmark system performance against ground truth datasets and established baselines
Explore emerging LLM capabilities and techniques (reasoning models, tool use, multi-agent systems)
System Analysis & Optimization (30%)
Analyze system logs to understand module usage patterns and LLM behavior
Identify bottlenecks, failure modes, and areas for improvement
Compare system outputs against ground truth datasets
Evaluate precision, recall, F1-scores, and other metrics across different scenarios
Optimize prompt engineering for better accuracy and cost-efficiency
Fine-tune analysis pipelines based on empirical evidence
Monitor token usage and cost patterns to improve efficiency
Research Communication & Strategy (20%)
Document experiments with clear methodologies, results, and insights
Prepare regular research reports with actionable recommendations
Present findings to technical and product teams on a weekly or biweekly cadence
Translate research insights into concrete product improvements
Maintain research documentation and experiment logs
Contribute to technical blog posts and research publications
Tool & Infrastructure Development (10%)
Build experiment harnesses and evaluation frameworks
Develop analysis tools for log processing and metrics extraction
Create visualization dashboards for system performance
Automate benchmark runs and result collection
Improve observability and instrumentation
Required Technical Skills
AI/ML & LLM Expertise (Required)
LLM Experience: Hands-on work with GPT, Claude, Gemini, or similar models
Prompt Engineering: Advanced techniques for optimizing LLM outputs
Evaluation Methodology: Designing experiments, A/B testing, statistical analysis
Metrics & Analysis: Precision, recall, F1-score, ROC curves, confusion matrices
RAG Systems: Understanding of retrieval-augmented generation and vector search
AI/ML Concepts: Understanding of temperature, sampling strategies, context windows, and model behavior
Python & Data Analysis (Required)
Python: 3+ years of professional Python development
Data Analysis: pandas, numpy, scipy for data manipulation and statistical analysis
Visualization: matplotlib, seaborn, plotly for creating insightful charts and dashboards
Notebooks: Jupyter for exploratory analysis and experiment documentation
Statistical Analysis: Hypothesis testing, confidence intervals, significance testing
Research & Experimentation
Scientific Method: Designing controlled experiments with clear hypotheses
Benchmarking: Experience running systematic evaluations and comparisons
Metrics Design: Creating meaningful evaluation criteria
Data Collection: Instrumenting systems for observability and data capture
Analysis Frameworks: Building reproducible experiment pipelines
Software Engineering Fundamentals
Version Control: Git for code and experiment tracking
Scripting: Bash, Python for automation
CLI Tools: Working with command-line interfaces and log analysis
Testing: Understanding of test frameworks and validation approaches
Documentation: Clear technical writing and documentation
Nice to Have
Machine Learning: Experience with scikit-learn, embeddings, clustering
Code Analysis: Familiarity with AST parsing, static analysis, or code understanding
Smart Contracts: Understanding of Solidity, blockchain, or security concepts
Academic Research: Published papers or conference presentations
Data Engineering: Experience with data pipelines and large-scale data processing
Domain Knowledge
Code Analysis & Security (Preferred)
Understanding of static analysis and code quality tools
Familiarity with software vulnerabilities and security patterns
Knowledge of how code analysis systems work
Experience with developer tooling or IDE features
Research Methodology
Experimental design and hypothesis testing
Statistical analysis and significance testing
Comparative evaluation methodologies
Reproducible research practices
Technical writing and presentation skills
LLM Applications
Multi-agent systems and orchestration
Function/tool calling and structured outputs
Chain-of-thought and reasoning strategies
Error analysis and failure mode identification
Cost optimization and efficiency techniques
Preferred Qualifications
Experience
3+ years in AI/ML, research, or applied science roles
Experience with LLM applications in production or research settings
Background in experimental research or data science
Prior work in developer tools, code analysis, or security (strong plus)
Experience with academic research or industry research labs (plus)
Technical Accomplishments
Designed and executed systematic research studies
Built evaluation frameworks or benchmarking systems
Published technical blog posts, papers, or presentations
Experience optimizing LLM-based systems for production
Contributions to open-source ML/AI projects