<?xml version="1.0"?>
<oembed><version>1.0</version><provider_name>Maker Faire</provider_name><provider_url>https://makerfaire.com</provider_url><author_name>rio</author_name><author_url>https://makerfaire.com/author/rio/</author_url><title>Embodied AI Agent with a real robotic platform - Maker Faire</title><type>rich</type><width>600</width><height>338</height><html>&lt;blockquote class="wp-embedded-content" data-secret="cfcJkqYjh2"&gt;&lt;a href="https://makerfaire.com/yearbook/projects/embodied-ai-agent-with-a-real-robotic-platform-2023/"&gt;Embodied AI Agent with a real robotic platform&lt;/a&gt;&lt;/blockquote&gt;&lt;iframe sandbox="allow-scripts" security="restricted" src="https://makerfaire.com/yearbook/projects/embodied-ai-agent-with-a-real-robotic-platform-2023/embed/#?secret=cfcJkqYjh2" width="600" height="338" title="&#x201C;Embodied AI Agent with a real robotic platform&#x201D; &#x2014; Maker Faire" data-secret="cfcJkqYjh2" frameborder="0" marginwidth="0" marginheight="0" scrolling="no" class="wp-embedded-content"&gt;&lt;/iframe&gt;&lt;script type="text/javascript"&gt;
/* &lt;![CDATA[ */
/*! This file is auto-generated */
!function(d,l){"use strict";l.querySelector&amp;&amp;d.addEventListener&amp;&amp;"undefined"!=typeof URL&amp;&amp;(d.wp=d.wp||{},d.wp.receiveEmbedMessage||(d.wp.receiveEmbedMessage=function(e){var t=e.data;if((t||t.secret||t.message||t.value)&amp;&amp;!/[^a-zA-Z0-9]/.test(t.secret)){for(var s,r,n,a=l.querySelectorAll('iframe[data-secret="'+t.secret+'"]'),o=l.querySelectorAll('blockquote[data-secret="'+t.secret+'"]'),c=new RegExp("^https?:$","i"),i=0;i&lt;o.length;i++)o[i].style.display="none";for(i=0;i&lt;a.length;i++)s=a[i],e.source===s.contentWindow&amp;&amp;(s.removeAttribute("style"),"height"===t.message?(1e3&lt;(r=parseInt(t.value,10))?r=1e3:~~r&lt;200&amp;&amp;(r=200),s.height=r):"link"===t.message&amp;&amp;(r=new URL(s.getAttribute("src")),n=new URL(t.value),c.test(n.protocol))&amp;&amp;n.host===r.host&amp;&amp;l.activeElement===s&amp;&amp;(d.top.location.href=t.value))}},d.addEventListener("message",d.wp.receiveEmbedMessage,!1),l.addEventListener("DOMContentLoaded",function(){for(var e,t,s=l.querySelectorAll("iframe.wp-embedded-content"),r=0;r&lt;s.length;r++)(t=(e=s[r]).getAttribute("data-secret"))||(t=Math.random().toString(36).substring(2,12),e.src+="#?secret="+t,e.setAttribute("data-secret",t)),e.contentWindow.postMessage({message:"ready",secret:t},"*")},!1)))}(window,document);
//# sourceURL=https://makerfaire.com/wp-includes/js/wp-embed.min.js
/* ]]&gt; */
&lt;/script&gt;
</html><thumbnail_url>https://makerfaire.com/wp-content/uploads/2024/03/embodied-ai-agent-with-a-real-robotic-platform-1024x541.png</thumbnail_url><thumbnail_width>1024</thumbnail_width><thumbnail_height>541</thumbnail_height><description>The talk will focus on a real implementation of an Embodied AI agent. We will start with an overview of the Machine Learning models covered within Reply R&amp;D, namely DINOv2 for object detection (https://dinov2.metademolab.com/) and PaLM-E (https://palm-e.github.io/) as a starting point for Visual Language Models (VLMs), which can generalize across a large number of tasks that require multimodal input (both images and text). We will then focus on a robotic agent, SPOT by Boston Dynamics: its architecture, its potential, and its stock sensors. This gives us the basis for an implementation of Embodied AI Agents controlled entirely by voice in natural language. We will show an orchestrator that receives voice commands in natural language as input, controls a robotic agent such as SPOT by Boston Dynamics, and uses the Machine Learning models needed to complete the individual tasks within the episode initiated by the user. We will then show current developments in the use of Visual Language Models, such as RT-2 (https://robotics-transformer2.github.io/) for robotic agents and LINGO-1 (https://wayve.ai/) for autonomous driving.</description></oembed>
