
AI/ML Research Engineer - Applied Research & Experimentation

Cyfrin

cyfrin.io

Position Overview

We're seeking an AI/ML Research Engineer to lead experimental research and optimization of our LLM-powered systems. This role combines hands-on experimentation, data analysis, and strategic research to continuously improve system performance and guide product direction. You'll run experiments, analyze system behavior, optimize prompts and architectures, and present findings that directly influence our technical roadmap. This is a unique opportunity to apply research methodologies to a production system with real-world impact.

Location: Remote

Experience Level: Mid to Senior (5+ years in AI/ML)

Commitment: Full-time

Core Responsibilities

Experimentation & Research (40%)

  • Design and execute experiments to evaluate system performance across different configurations

  • Run comparative studies on LLM providers, models, prompt strategies, and analysis techniques

  • Conduct ablation studies to understand which components contribute most to accuracy

  • Test new approaches to code analysis, vulnerability detection, and validation

  • Benchmark system performance against datasets and baselines

  • Explore emerging LLM capabilities and techniques (reasoning models, tool use, multi-agent systems)

System Analysis & Optimization (30%)

  • Analyze system logs to understand module usage patterns and LLM behavior

  • Identify bottlenecks, failure modes, and areas for improvement

  • Compare system outputs against ground truth datasets

  • Evaluate precision, recall, F1-scores, and other metrics across different scenarios

  • Optimize prompt engineering for better accuracy and cost-efficiency

  • Fine-tune analysis pipelines based on empirical evidence

  • Monitor token usage and cost patterns to improve efficiency

Research Communication & Strategy (20%)

  • Document experiments with clear methodologies, results, and insights

  • Prepare regular research reports with actionable recommendations

  • Present findings to technical and product teams on a weekly or biweekly basis

  • Translate research insights into concrete product improvements

  • Maintain research documentation and experiment logs

  • Contribute to technical blog posts and research publications

Tool & Infrastructure Development (10%)

  • Build experiment harnesses and evaluation frameworks

  • Develop analysis tools for log processing and metrics extraction

  • Create visualization dashboards for system performance

  • Automate benchmark runs and result collection

  • Improve observability and instrumentation

Required Technical Skills

AI/ML & LLM Expertise (Required)

  • LLM Experience: Hands-on work with GPT, Claude, Gemini, or similar models

  • Prompt Engineering: Advanced techniques for optimizing LLM outputs

  • Evaluation Methodology: Designing experiments, A/B testing, statistical analysis

  • Metrics & Analysis: Precision, recall, F1-score, ROC curves, confusion matrices (see the sketch after this list)

  • RAG Systems: Understanding of retrieval-augmented generation and vector search

  • AI/ML Concepts: Understanding of temperature, sampling strategies, context windows, and model behavior
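
To give a concrete feel for this kind of evaluation work, here is a minimal sketch (hypothetical labels; scikit-learn assumed available, as mentioned later in this posting) of scoring a system's vulnerability predictions against a ground-truth set:

    # Minimal sketch: score predicted findings against a labeled benchmark.
    # Labels here are hypothetical; a real run would load them from system
    # output logs and a curated ground-truth dataset.
    from sklearn.metrics import precision_score, recall_score, f1_score, confusion_matrix

    # 1 = flagged as vulnerable, 0 = not flagged
    ground_truth = [1, 0, 1, 1, 0, 0, 1, 0, 1, 0]
    predictions  = [1, 0, 0, 1, 0, 1, 1, 0, 1, 0]

    print("precision:", precision_score(ground_truth, predictions))
    print("recall:   ", recall_score(ground_truth, predictions))
    print("f1:       ", f1_score(ground_truth, predictions))
    print("confusion matrix:\n", confusion_matrix(ground_truth, predictions))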

Python & Data Analysis (Required)

  • Python: 3+ years of professional Python development

  • Data Analysis: pandas, numpy, scipy for data manipulation and statistical analysis

  • Visualization: matplotlib, seaborn, plotly for creating insightful charts and dashboards

  • Notebooks: Jupyter for exploratory analysis and experiment documentation

  • Statistical Analysis: Hypothesis testing, confidence intervals, significance testing
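
For the statistical side, a minimal sketch (hypothetical counts; scipy assumed, as listed above) of checking whether one prompt variant genuinely outperforms another rather than differing by chance:

    # Minimal sketch: is prompt variant B significantly better than variant A?
    # Counts are hypothetical; in practice they come from benchmark runs.
    from scipy.stats import fisher_exact

    # Rows: variant A, variant B. Columns: findings confirmed, findings missed.
    table = [[42, 18],
             [53, 7]]
    statistic, p_value = fisher_exact(table, alternative="two-sided")
    print(f"p = {p_value:.4f}  (a small p suggests the difference is not just noise)")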

Research & Experimentation

  • Scientific Method: Designing controlled experiments with clear hypotheses

  • Benchmarking: Experience running systematic evaluations and comparisons

  • Metrics Design: Creating meaningful evaluation criteria

  • Data Collection: Instrumenting systems for observability and data capture

  • Analysis Frameworks: Building reproducible experiment pipelines
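
As an illustration of what a reproducible experiment record can look like, here is a minimal sketch using only the Python standard library (the field names are illustrative, not an existing internal schema):

    # Minimal sketch: every run stores its configuration next to its metrics,
    # with a deterministic ID so results can be traced and re-run later.
    import datetime
    import hashlib
    import json
    from dataclasses import dataclass, asdict

    @dataclass(frozen=True)
    class ExperimentConfig:
        model: str
        prompt_version: str
        temperature: float
        dataset: str

    def run_id(cfg: ExperimentConfig) -> str:
        payload = json.dumps(asdict(cfg), sort_keys=True).encode()
        return hashlib.sha256(payload).hexdigest()[:12]

    cfg = ExperimentConfig(model="example-model", prompt_version="v3",
                           temperature=0.0, dataset="benchmark-v1")
    record = {
        "run_id": run_id(cfg),
        "config": asdict(cfg),
        "timestamp": datetime.datetime.now(datetime.timezone.utc).isoformat(),
        "metrics": {"precision": 0.81, "recall": 0.74, "f1": 0.77},  # placeholders
    }
    print(json.dumps(record, indent=2))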

Software Engineering Fundamentals

  • Version Control: Git for code and experiment tracking

  • Scripting: Bash, Python for automation

  • CLI Tools: Comfortable working with command-line interfaces and log-analysis workflows

  • Testing: Understanding of test frameworks and validation approaches

  • Documentation: Clear technical writing and documentation

Nice to Have

  • Machine Learning: Experience with scikit-learn, embeddings, clustering

  • Code Analysis: Familiarity with AST parsing, static analysis, or code understanding

  • Smart Contracts: Understanding of Solidity, blockchain, or security concepts

  • Academic Research: Published papers or conference presentations

  • Data Engineering: Experience with data pipelines and large-scale data processing

Domain Knowledge

Code Analysis & Security (Preferred)

  • Understanding of static analysis and code quality tools

  • Familiarity with software vulnerabilities and security patterns

  • Knowledge of how code analysis systems work

  • Experience with developer tooling or IDE features

Research Methodology

  • Experimental design and hypothesis testing

  • Statistical analysis and significance testing

  • Comparative evaluation methodologies

  • Reproducible research practices

  • Technical writing and presentation skills

LLM Applications

  • Multi-agent systems and orchestration

  • Function/tool calling and structured outputs

  • Chain-of-thought and reasoning strategies

  • Error analysis and failure mode identification

  • Cost optimization and efficiency techniques

Preferred Qualifications

Experience

  • 3+ years in AI/ML, research, or applied science roles

  • Experience with LLM applications in production or research settings

  • Background in experimental research or data science

  • Prior work in developer tools, code analysis, or security (strong plus)

  • Experience with academic research or industry research labs (plus)

Technical Accomplishments

  • Designed and executed systematic research studies

  • Built evaluation frameworks or benchmarking systems

  • Published technical blog posts, papers, or presentations

  • Experience optimizing LLM-based systems for production

  • Contributions to open-source ML/AI projects
