RAG-LLM Evaluation & Test Automation for Beginners

English | MP4 | AVC 1280×720 | AAC 44KHz 2ch | 57 lectures (8h 39m) | 4.00 GB

Understand, Evaluate & Test RAG-LLMs (AI-based Systems) from Scratch using the RAGAS-Python-Pytest Framework

LLMs are everywhere! Every business is building its own custom AI-based RAG-LLM to improve customer service. But how do engineers test them? Unlike traditional software, AI-based systems need a special evaluation methodology.

This course starts from the ground up, explaining the architecture of how AI systems (LLMs) work behind the scenes. Then, it dives deep into LLM evaluation metrics.

This course shows you how to use the RAGAS framework effectively to evaluate LLM metrics through scripted examples. You will then use Pytest assertions to check metric scores against benchmarks and design a robust LLM test/evaluation automation framework.
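The assertion pattern described above can be sketched as a plain Pytest-style test. This is only a sketch: `get_metric_scores` is a stub standing in for a real `ragas.evaluate` call (which needs an OpenAI-backed LLM wrapper), and the thresholds are illustrative, not official benchmarks.

```python
# Sketch of asserting RAGAS-style metric scores with Pytest.
# NOTE: scores are stubbed; a real test would call ragas.evaluate() with
# an LLM wrapper (e.g. OpenAI via Langchain) and a single-turn sample.

THRESHOLDS = {"faithfulness": 0.80, "context_precision": 0.70}  # illustrative benchmarks

def get_metric_scores(sample: dict) -> dict:
    """Stub standing in for a real ragas.evaluate() call."""
    return {"faithfulness": 0.92, "context_precision": 0.85}

def test_rag_response_meets_benchmarks():
    sample = {
        "user_input": "What is the refund window?",
        "retrieved_contexts": ["Refunds are accepted within 30 days of purchase."],
        "response": "You can request a refund within 30 days of purchase.",
    }
    scores = get_metric_scores(sample)
    for metric, threshold in THRESHOLDS.items():
        assert scores[metric] >= threshold, (
            f"{metric} score {scores[metric]:.2f} is below benchmark {threshold:.2f}"
        )
```

Keeping the thresholds in one dictionary makes it easy to tighten or relax benchmarks per metric without touching test logic.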

What will you learn from the course?

  • High-level overview of Large Language Models (LLMs)
  • Understand how custom LLMs are built using the Retrieval-Augmented Generation (RAG) architecture
  • Common benchmarks/metrics used in evaluating RAG-based LLMs
  • Introduction to the RAGAS evaluation framework for evaluating/testing LLMs
  • Practical script generation to automate and assert the metric scores of LLMs
  • Automate scenarios such as single-turn and multi-turn interactions with LLMs using the RAGAS framework
  • Generate test data for evaluating LLM metrics using the RAGAS framework

By the end of the course, you will be able to create a RAGAS-Pytest evaluation framework to assert the metrics of RAG-based (custom) LLMs.

Table of Contents

Introduction to AI Concepts – LLMs & RAG LLMs
1 What this course offers, FAQs – Must Watch
2 Course outcome – Setting the stage of expectations
3 Introduction to Artificial Intelligence and LLMs – How they work
4 Overview of popular LLMs and challenges with these general LLMs
5 What is Retrieval-Augmented Generation (RAG)? Understand its architecture
6 End-to-end flow in the RAG architecture and its key advantages

Understand RAG (Retrieval-Augmented Generation) – LLM Architecture with a Use Case
7 Misconceptions – Why RAG LLMs? Can't we solve the problem with traditional methods?
8 Optional – Overview of how the code looks when building RAG LLM applications

Getting Started with Practice LLMs and the Approach to Evaluate & Test
9 Course resources download
10 Demo of practice RAG LLMs to evaluate and write test automation scripts
11 Understanding the implementation of the practice RAG LLMs to understand context
12 Understand conversational LLM scenarios and how they apply to the RAG architecture
13 Understand the metric benchmarks for the document retrieval system in LLMs

Setup Python & Pytest Environment with RAGAS LLM Evaluation Package Libraries
14 Install and set the path of Python on Windows
15 Install and set the path of Python on macOS
16 Install RAGAS Framework packages and setup the LLM Test project
17 Python & Pytest Basics – Where to find them in the tutorial

Programmatic solution to evaluate LLM Metrics with Langchain and RAGAS Libraries
18 Making a connection with OpenAI using the Langchain framework for RAGAS
19 End to end – Evaluate an LLM for the Context Precision metric with single-turn test data
20 Metrics document download
21 Communicate with LLMs using an API POST call to dynamically get responses
22 Evaluate an LLM for the Context Recall metric with a RAGAS Pytest example

Optimize LLM Evaluation Tests with Pytest Fixtures & Parameterization Techniques
23 Build Pytest fixtures to isolate OpenAI and LLM wrapper common utils from tests
24 Introduction to Pytest parameterization fixtures to drive test data externally
25 Reusable utils to isolate LLM API calls and keep tests focused on metric logic
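The fixture-and-parameterization pattern this section covers can be sketched roughly as below. Everything here is a hypothetical placeholder: the inline JSON stands in for the course's external test-data file, and the `llm_client` fixture stands in for the real OpenAI/LLM wrapper util.

```python
import json

import pytest

# Hypothetical external test data; in practice this is loaded from a JSON file.
TEST_DATA = json.loads("""
[
  {"question": "What is RAG?", "threshold": 0.80},
  {"question": "How are metrics asserted?", "threshold": 0.70}
]
""")

@pytest.fixture(scope="session")
def llm_client():
    """Session-scoped stub isolating the LLM/API call from the tests (placeholder)."""
    return lambda question: f"stub answer for: {question}"

@pytest.mark.parametrize("case", TEST_DATA)
def test_each_question(llm_client, case):
    # One test run per entry in TEST_DATA, each reusing the shared client.
    answer = llm_client(case["question"])
    # A real test would compute a RAGAS metric here and compare it to case["threshold"].
    assert answer.startswith("stub answer")
```

The session-scoped fixture means the (expensive) client is built once and shared, while `parametrize` keeps the test body focused purely on metric logic.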

Evaluate LLM Core Metrics and the Importance of EvaluationDataset in the RAGAS Framework
26 Understand the LLM Faithfulness and Response Relevance metrics conceptually
27 Build an LLM evaluation script to test Faithfulness benchmarks using RAGAS
28 Reading test data from an external JSON file into LLM evaluation scripts
29 Understand how metrics are used at different places in the RAG LLM architecture
30 Factual Correctness – Build a single test to evaluate multiple LLM metrics

Upload LLM Evaluation Results & Test the LLM for Multi-turn Conversational Chat History
31 Understand EvaluationDataset and how it helps in evaluating multiple metrics
32 Upload the LLM metric evaluation results into the RAGAS dashboard portal visually
33 How to evaluate a RAG LLM with multi-turn conversational chat history
34 Build an LLM evaluation test which can evaluate multi-turn conversations – example

Create Test Data Dynamically to Evaluate the LLM & Generate a Rubrics Evaluation Score
35 How to create test data using the RAGAS framework to evaluate an LLM
36 Load the external docs into Langchain utils to analyze and extract test data
37 Install and configure the NLTK package to scan the LLM documents & generate tests
38 Generate rubrics-based criteria scoring to evaluate the quality of LLM responses

Conclusion and next steps!
39 Slide recap of concepts learned from the course
40 Bonus Lecture

Optional – Learn Python Fundamentals with examples
41 Python Hello World program with basics
42 Datatypes in Python and how to get the type at run time
43 The List datatype and its operations for manipulation
44 Tuple and Dictionary data types in Python with examples
45 If-else conditions in Python with working examples
46 How to create dictionaries at run time and add data to them
47 How loops work in Python and the importance of code indentation
48 Programming examples using for loops – 1
49 Programming examples using while loops – 2
50 What are functions and how to use them in Python
51 OOP principles – classes and objects in Python
52 What is a constructor and its role in object-oriented programming
53 Inheritance concepts with examples in Python
54 Strings and their functions in Python

Optional – Overview of Pytest Framework basics with examples
55 What are Pytest fixtures and how do they help in enhancing tests
56 Understand scopes in Pytest fixtures with examples
57 Setup and teardown using Pytest fixtures with the yield keyword
