YT – MCP vs API: Simplifying AI Agent Integration with External Data – IBM Technology
– LLM needs to communicate with external data sources to be truly useful
– Anthropic came up with the Model Context Protocol – “standardizes how applications provide context to LLMs”
– MCP is like a USB for your AI applications. USB can be used regardless of manufacturer.
– Each MCP Host Client opens a JSON-RPC 2.0 session using the MCP protocol, and that connects to MCP servers (e.g. database, email storage)
– MCP does … 1) provides a standardized way to retrieve contextual data for AI agents 2) enables AI agents to use tools (web search, API call, calculation, db lookup)
– MCP server primitives … 1) tools: discrete actions or functions the AI can call 2) resources: read-only data items the server can provide e.g. db schema, file content 3) prompt templates
– AI agents can “query an MCP server to discover what primitives are available and invoke those capabilities in a uniform way”
– A RESTful API communicates over HTTP using methods such as GET, POST, PUT, DELETE
– MCP supports dynamic self-discovery – a client can ask “what can you do?” (see the sketch after these notes)
– An MCP server is often also a wrapper around an API. Many of the tools are APIs; in that case MCP is an AI-friendly interface to those APIs.
– MCP servers have been deployed for file systems, Docker, Spotify, Google Maps, etc.
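For intuition, here is a rough sketch of what that self-discovery exchange could look like as JSON-RPC 2.0 payloads, built in Python. The tools/list and tools/call method names follow the MCP spec, but the tool name and its arguments are made up for illustration.

import json

# JSON-RPC 2.0 request asking an MCP server which tools it exposes
discover = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/list",
}

# Invoking one of the discovered tools in a uniform way
# ("db_lookup" and its arguments are hypothetical)
invoke = {
    "jsonrpc": "2.0",
    "id": 2,
    "method": "tools/call",
    "params": {
        "name": "db_lookup",
        "arguments": {"query": "SELECT count(*) FROM orders"},
    },
}

print(json.dumps(discover, indent=2))
print(json.dumps(invoke, indent=2))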
YT – Building Production RAG Over Complex Documents
– Decided to stop after 28 min. I believe if a PDF is important enough, it should have its own bespoke parser, not a universal parser.
– QA system for company
– how to get high-response quality
– naive RAG … glorified search
– todo1) improve data quality
– todo2) improve query quality
– RAG is only as good as your data
– need to make sure parameters are fine tuned
– Parse, Chunk, Index (a toy sketch of this flow follows after these notes)
– Complex documents, e.g. tables, charts, images, page numbers -> page-level chunking has limits
– Llama Parse – AAPL 10-K example / figure check test
– Advanced Indexing: nodes will be indexed by the LLM
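As a reminder of the moving parts, a toy parse -> chunk -> index -> retrieve sketch in plain Python. It stands in for the real stack (Llama Parse, embeddings, a vector store): fixed-size chunking and keyword overlap are used purely for illustration, not as a recommendation.

# Toy parse -> chunk -> index -> retrieve pipeline.
# Keyword overlap replaces embedding similarity purely for illustration.

def chunk(text: str, size: int = 500, overlap: int = 50) -> list[str]:
    """Naive fixed-size chunking with overlap."""
    chunks, start = [], 0
    while start < len(text):
        chunks.append(text[start:start + size])
        start += size - overlap
    return chunks

def score(query: str, chunk_text: str) -> int:
    """Crude relevance score: number of shared lowercase tokens."""
    return len(set(query.lower().split()) & set(chunk_text.lower().split()))

def retrieve(query: str, index: list[str], k: int = 3) -> list[str]:
    return sorted(index, key=lambda c: score(query, c), reverse=True)[:k]

if __name__ == "__main__":
    document = "Revenue grew 10% year over year. " * 100  # stand-in for parsed text
    index = chunk(document)
    print(retrieve("How did revenue grow?", index, k=2))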
Got curious how PayPal works. This video is pretty good. Alternatives back in 2000: 1) Mailing a check. Slow, with potential for loss or fraud. 2) Bank wire. Need to disclose bank account details, which one may not want to. Domestic wires took 1-2 days, longer for international. Digital banking was less accessible. High fees. 3) Punching in credit card numbers. Not facilitated for all sellers by eBay (Grok). Hence PayPal was the best P2P wire solution. Make a PayPal account, link a debit/credit card or bank account, and PayPal would handle the transaction. Since eBay provides a messaging function between buyer and seller, they can wire via PayPal. No need for PayPal to be supported natively on eBay – although it was supported, facilitated, then bought out by eBay later on. No need to exchange card/bank details. What could the banks have done? Coordinate hundreds of international banks to start including email as a unique identifier, meaning having to contact all existing clients to provide an email & verify it. Be agile enough to make the email wire system supported by websites in a seamless, integrated manner – e.g. when bank email transfers via eBay become a thing, be collectively agile enough to introduce more features such as automatic payment requests for the ease of users. Deal with a blow in revenue from traditional wires. In other words, impossible. Banks were not in a position to support this. Visa/Mastercard may have missed out. Grok says they focused on merchant-to-consumer transactions, requiring merchant accounts, which sellers on eBay often lacked. Ultimately they failed to provide a user-friendly way to do P2P transactions. Also, banks and credit card companies had clearer regulations already in place, meaning greater opportunity cost. Startups like PayPal and X.com could afford to take more risks.
In March 2001, PayPal had yet to make a profit but our revenues were growing 100% year-over-year. When I projected our future cash flows, I found that 75% of the company’s present value would come from profits generated in 2011 and beyond— hard to believe for a company that had been in business for only 27 months. But even that turned out to be an underestimation. Today, PayPal continues to grow at about 15% annually, and the discount rate is lower than a decade ago. It now appears that most of the company’s value will come from 2020 and beyond.
For this, I will try a similar exercise with AAPL.
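Before touching real filings, a back-of-the-envelope version of the arithmetic behind the quote: with a constant growth rate g and discount rate r, what share of present value sits beyond year N. The g and r below are placeholders, not PayPal or AAPL figures.

# For a company whose cash flows grow at rate g and are discounted at rate r,
# how much of today's value comes from cash flows beyond year N?
# The numbers are placeholders, not PayPal or AAPL figures.

def pv_share_beyond(n_years: int, g: float, r: float, horizon: int = 100) -> float:
    pv = [(1 + g) ** t / (1 + r) ** t for t in range(1, horizon + 1)]
    return sum(pv[n_years:]) / sum(pv)

if __name__ == "__main__":
    # Constant growth already shows the tail dominating when g is close to r.
    print(f"Share of PV from year 11 onward: {pv_share_beyond(10, g=0.15, r=0.20):.0%}")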
I’ve decided to work on the processing of PDFs. I get many, regularly, as part of my work. First, I’ll turn one PDF into orderly datasets.
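A first-pass sketch of what that could look like, assuming pdfplumber for text/table extraction and pandas for writing the datasets; "report.pdf" and the header handling are placeholders, since the whole point is that each important PDF will need its own bespoke treatment.

# First pass at turning one PDF into orderly datasets.
# Assumes pdfplumber + pandas (pip install pdfplumber pandas);
# "report.pdf" is a placeholder file name.
import pdfplumber
import pandas as pd

rows = []       # one record per page of plain text
tables = []     # every table found, as a DataFrame

with pdfplumber.open("report.pdf") as pdf:
    for page_no, page in enumerate(pdf.pages, start=1):
        rows.append({"page": page_no, "text": page.extract_text() or ""})
        for t in page.extract_tables():
            if len(t) > 1:
                # treating the first row as the header is a guess;
                # real PDFs need bespoke handling
                tables.append(pd.DataFrame(t[1:], columns=t[0]))

pd.DataFrame(rows).to_csv("pages.csv", index=False)
for i, df in enumerate(tables):
    df.to_csv(f"table_{i}.csv", index=False)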
Ideas: 1) betterment of the bot – better logs, a Slack connection, no relay conditions (word limit, special code at the end); 2) a recap study of terms and processes; 3) hosting a super simple website that displays the visit count. Leaning towards option three. Already worried about what would happen to the billing cost if I get attacked. Once I begin, I’ll find a way. Grok will find a way.
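For a sense of scale, option three (assuming it refers to the visit-count website) could be roughly this small with Flask; the count lives in a local file, and nothing here addresses the billing-if-attacked worry.

# Rough sketch of option three: a one-page site showing its own visit count.
# Assumes Flask (pip install flask); the count is kept in a local file,
# so this is a toy, not something hardened against abuse.
from pathlib import Path
from flask import Flask

app = Flask(__name__)
COUNT_FILE = Path("visits.txt")

@app.route("/")
def index():
    count = int(COUNT_FILE.read_text()) + 1 if COUNT_FILE.exists() else 1
    COUNT_FILE.write_text(str(count))
    return f"This page has been visited {count} times."

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=8000)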
Recap of DigitalOcean Scrib Bot Setup – fully written by Grok
- Initial Updates
We updated the Droplet to ensure the system had the latest packages and security updates.
apt update
apt upgrade -y
What This Did: Fetched the latest package lists and upgraded all installed packages.
- Download Python, pip, and Other Packages
We ensured Python 3 was available, installed pip, set up a virtual environment, and installed necessary Python libraries (tweepy, feedparser, python-dotenv, requests, beautifulsoup4).
python3 --version
apt install python3-pip python3-venv -y
python3 -m venv venv
source venv/bin/activate
pip install tweepy feedparser python-dotenv requests beautifulsoup4
pip list | grep -E 'tweepy|feedparser|python-dotenv|requests|beautifulsoup4'
What This Did: Confirmed Python 3, installed pip and venv, created a virtual environment, and installed libraries for X API, RSS parsing, environment variables, HTTP requests, and web scraping.
- New Directory and the Files Included
We created a directory for the bot’s files, set up environment variables, and created multiple script versions.
mkdir scrib-bot
cd scrib-bot
python3 -m venv venv
source venv/bin/activate
nano .env
nano scrib_to_x_vps.py
nano scrib_to_x_vps_ver2.py
nano scrib_to_x_vps_ver3.py
ls
Files Included: venv/ (virtual environment), .env (X API keys), scrib_to_x_vps.py (initial script), scrib_to_x_vps_ver2.py (tweets only text), scrib_to_x_vps_ver3.py (added persistence), scrib_to_x_ver3.log (log file), last_entry_id_ver3.txt (stores last entry ID).
What This Did: Organized files in /root/scrib-bot and iterated on the script for improved functionality.
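For reference, this is roughly how the .env keys get loaded inside the scripts with python-dotenv; the variable names below are assumptions, not necessarily the ones used in scrib_to_x_vps_ver3.py.

# Sketch of loading the X API credentials from .env with python-dotenv.
# The variable names are assumptions; match them to whatever the actual
# .env in /root/scrib-bot uses.
#
# .env (never commit this file):
#   X_API_KEY=...
#   X_API_SECRET=...
#   X_ACCESS_TOKEN=...
#   X_ACCESS_TOKEN_SECRET=...
import os
from dotenv import load_dotenv

load_dotenv()  # reads .env from the current directory

API_KEY = os.getenv("X_API_KEY")
API_SECRET = os.getenv("X_API_SECRET")
ACCESS_TOKEN = os.getenv("X_ACCESS_TOKEN")
ACCESS_TOKEN_SECRET = os.getenv("X_ACCESS_TOKEN_SECRET")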
- What Is tmux?
tmux is a terminal multiplexer that lets your script run in a detachable session, continuing even after closing the Droplet Console.
Commands:
apt update
apt install tmux -y
tmux new -s scribt-bot
tmux rename-session -t scribt-bot scrib-bot
cd /root/scrib-bot
source venv/bin/activate
python3 scrib_to_x_vps_ver3.py
Inside tmux, detach with: Ctrl+B, release, then D
tmux list-sessions
tmux attach -t scrib-bot
What tmux Does: Runs your script in a background session (scrib-bot), allowing detachment and reattachment. Your script is currently running in this session.
The previous one tweeted the link. I just want the text. Checking if this is any better.
Created a Droplet. Ubuntu 24.10. Updated packages. Set up venv. Downloaded packages, including Tweepy. Running the bot now. It should theoretically relay this tweet on X.
Time to deploy a bot. Grok suggests I should use DigitalOcean. Did some more research – like why wouldn’t I be using AWS, Azure, or GCP. Too inexperienced to understand everything. The gist is that DigitalOcean would provide a decent cloud experience, affordably, with fewer complications. Trusting Grok’s judgement, I shall deploy a bot running 24/7 on DigitalOcean.
Checked out the X API. The free access will support up to 500 post writes per month. Plenty. This guy has a good tutorial too. Very useful page. But most useful is Grok. Sign up for the developer account. Check. Renamed the default project name. Unnecessary, but check. Generate API Key and Secret – the username and password for the app. Check. Post using Python with the Tweepy module. Check.
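That last step boils down to roughly this; the credentials are placeholders for the key/secret and access token generated in the developer portal.

# Minimal Tweepy post, roughly what "Post using Python with the Tweepy
# module" involved. Credentials are placeholders; free-tier v2 access
# allows a limited number of post writes per month.
import tweepy

client = tweepy.Client(
    consumer_key="API_KEY",
    consumer_secret="API_SECRET",
    access_token="ACCESS_TOKEN",
    access_token_secret="ACCESS_TOKEN_SECRET",
)

response = client.create_tweet(text="Hello from the scrib bot test.")
print(response.data)  # e.g. {'id': '...', 'text': '...'}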
First of the two prerequisites for the bot to work. New scrib -> RSS feed -> extract URLs -> compare with the previous URL list -> identify the new URL -> navigate to the new URL to extract content. It works.
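Roughly what that pipeline looks like with the packages installed earlier (feedparser, requests, beautifulsoup4); the feed URL and the .entry-content selector are placeholders for whatever the blog actually uses.

# Sketch of the prerequisite pipeline: RSS feed -> URLs -> diff against the
# previously seen URLs -> fetch the new post -> extract its text.
# FEED_URL and the "entry-content" selector are placeholders.
from pathlib import Path

import feedparser
import requests
from bs4 import BeautifulSoup

FEED_URL = "https://example.com/feed/"   # placeholder
SEEN_FILE = Path("seen_urls.txt")

def new_urls() -> list[str]:
    seen = set(SEEN_FILE.read_text().splitlines()) if SEEN_FILE.exists() else set()
    current = [e.link for e in feedparser.parse(FEED_URL).entries]
    fresh = [u for u in current if u not in seen]
    SEEN_FILE.write_text("\n".join(seen | set(current)))
    return fresh

def extract_text(url: str) -> str:
    soup = BeautifulSoup(requests.get(url, timeout=30).text, "html.parser")
    body = soup.select_one(".entry-content")  # placeholder selector
    return body.get_text(strip=True) if body else ""

if __name__ == "__main__":
    for url in new_urls():
        print(url, extract_text(url)[:200])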
Thinking of how I can mirror posts between this blog and X. I don’t want an external service. Grokked it. One: add a WordPress hook in PHP that triggers a custom function equipped with X API credentials to post directly to X – I don’t feel comfortable hard-coding credentials in the PHP code. Two: use a polling bot to monitor the RSS feed; when a new post is created, a Python bot on a server posts the contents to X using the X API. Leaning towards option two. Haven’t deployed servers before, so good experience. Also, this task shouldn’t be too expensive. I’ll have to start by testing the RSS feeds and the X API first. If these don’t work, it’s a pipeline for nothing. Worst case, can I deploy agents using Selenium? Or can ChatGPT handle this?