{"id":703,"date":"2025-02-12T21:19:01","date_gmt":"2025-02-12T21:19:01","guid":{"rendered":"https:\/\/airewolucja.pl\/?p=703"},"modified":"2025-02-12T21:19:01","modified_gmt":"2025-02-12T21:19:01","slug":"ai-alliance-rag-pipeline-with-data-prep-kit-milvus-llama-workshop","status":"publish","type":"post","link":"https:\/\/airewolucja.pl\/?p=703","title":{"rendered":"AI Alliance: RAG pipeline with Data Prep Kit + Milvus + Llama (Workshop)"},"content":{"rendered":"<div class=\"event-var ql-editor\">\n<p>Whether you are building RAG (Retrieval-Augmented Generation) applications or fine-tuning a model, a significant portion of your time will be dedicated to data wrangling (cleaning, de-duping, removing markup, etc.).<\/p>\n<p><strong>Data Prep Kit<\/strong> (DPK) can help you with data wrangling.<\/p>\n<p>Noteworthy features of DPK include:<\/p>\n<ul>\n<li>de-duping documents (exact dedupe and fuzzy dedupe),<\/li>\n<li>handling documents and code,<\/li>\n<li>language detection (spoken languages and programming languages),<\/li>\n<li>malware detection, and<\/li>\n<li>creating embeddings.<\/li>\n<\/ul>\n<p>In this workshop, we will demonstrate implementing an end-to-end RAG pipeline using open source technologies:<\/p>\n<ul>\n<li>Data Prep Kit for processing documents<\/li>\n<li>Milvus as the vector database<\/li>\n<li>Llama 3 as the LLM<\/li>\n<\/ul>\n<p><strong>Session Type<\/strong><\/p>\n<p>Hands-on workshop<\/p>\n<p><strong>Audience<\/strong><\/p>\n<p>LLM app developers, data scientists, data engineers<\/p>\n<p><strong>Technical Level<\/strong><\/p>\n<p>Beginner &#8211; Intermediate<\/p>\n<p><strong>Prerequisites<\/strong><\/p>\n<p>A Python development environment is strongly recommended for this workshop. 
Step-by-step instructions for setting up the environment will be provided.<\/p>\n<p><strong>Industry<\/strong><\/p>\n<p>Cross industry<\/p>\n<p><strong>Agenda<\/strong><\/p>\n<ul>\n<li>Welcome &amp; introductions (5&#8242;)<\/li>\n<li>About the AI Alliance &amp; how you can get involved (5&#8242;)<\/li>\n<li>Workshop: \u201cRAG pipeline with Data Prep Kit + Milvus + Llama\u201d (60&#8242;)<\/li>\n<li>Q&amp;A<\/li>\n<li>Closing<\/li>\n<\/ul>\n<p><strong>About the instructor<\/strong><\/p>\n<p>Sujee Maniyam\u00a0(AI Engineer, Developer Advocate @ Node51) is an expert in Generative AI, Machine Learning, Deep Learning, Big Data, Distributed Systems, and Cloud technologies. He is passionate about developer education and fostering community engagement. Sujee has led numerous training sessions, hackathons, and workshops. He is also an author, open source contributor, and frequent speaker at conferences and meetups.<\/p>\n<p><strong>About the AI Alliance<\/strong><\/p>\n<p>The\u00a0AI Alliance\u00a0is an international community of researchers, developers, and organizational leaders committed to supporting and enhancing open innovation across the AI technology landscape to accelerate progress, improve safety, security, and trust in AI, and maximize benefits to people and society everywhere. 
Members of the AI Alliance believe that open innovation is essential to develop and achieve safe and responsible AI that benefits society rather than a select few big players.<\/p>\n<\/div>\n<div>\n<div class=\"event-var fw-bold\"><\/div>\n<div class=\"event-label\">Event type: Meetup<\/div>\n<div class=\"event-label\">Category: IT<\/div>\n<div class=\"event-label\">Topics: Big Data, Open Source, AI\/ML<\/div>\n<div class=\"event-label\">Date: 13.02.2025 (Thursday)<\/div>\n<div class=\"event-label\">Time: 18:00<\/div>\n<div class=\"event-label\">Language: English<\/div>\n<div class=\"event-label\">Admission: Free<\/div>\n<div class=\"event-label\">City: Online<\/div>\n<div>\n<div class=\"event-label\">Venue: Online Event<\/div>\n<\/div>\n<div class=\"event-label\"><\/div>\n<div class=\"event-var fw-bold\"><\/div>\n<\/div>\n","protected":false},"excerpt":{"rendered":"<p>Whether you are building RAG (Retrieval-Augmented Generation) applications or fine-tuning a model, a significant portion of your time will be dedicated to data wrangling (cleaning, de-duping, removing markup, etc.). Data Prep Kit can help you with data wrangling. 
Noteworthy features of DPK include: de-duping documents (exact dedupe and fuzzy dedupe), handling documents and code, language detection (spoken languages and programming [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":704,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[24],"tags":[],"class_list":["post-703","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-wydarzenia"],"_links":{"self":[{"href":"https:\/\/airewolucja.pl\/index.php?rest_route=\/wp\/v2\/posts\/703","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/airewolucja.pl\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/airewolucja.pl\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/airewolucja.pl\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/airewolucja.pl\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=703"}],"version-history":[{"count":2,"href":"https:\/\/airewolucja.pl\/index.php?rest_route=\/wp\/v2\/posts\/703\/revisions"}],"predecessor-version":[{"id":706,"href":"https:\/\/airewolucja.pl\/index.php?rest_route=\/wp\/v2\/posts\/703\/revisions\/706"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/airewolucja.pl\/index.php?rest_route=\/wp\/v2\/media\/704"}],"wp:attachment":[{"href":"https:\/\/airewolucja.pl\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=703"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/airewolucja.pl\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=703"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/airewolucja.pl\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=703"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}