OpenSource LLM Applications on Serverless Cloud
Think OpenSource LLM. Think Serverless.
What you could take away by the end of this session:
OpenSource LLMs vs Paid LLMs
Own Cloud hosted LLM vs Serverless Pay-as-you-go LLM APIs
Note: the examples below use the Mistral AI Instruct model.
Let us see how the intermingling of two concepts - Serverless + Open Source LLMs - helps you build demo-able PoC LLM applications at minimal cost.
#LLMOps
#MLOps
#AWSLambda
#LLMonServerless
#OpenSourceLLMs
What is RAG, and how does it improve LLM accuracy?
Retrieval augmented generation, or RAG, is an architectural approach that can improve the efficacy of large language model (LLM) applications by leveraging custom data.
Source: Databricks
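The retrieve-then-generate flow behind RAG can be sketched in a few lines. This is a toy illustration, not the architecture from the diagrams: the word-overlap retriever and the `llm()` stub are hypothetical stand-ins for a real vector search and a real model call.

```python
# Minimal RAG sketch: retrieve relevant context, then augment the prompt with it.

docs = [
    "AWS Lambda's /tmp directory is the only writable path.",
    "Mistral AI Instruct is an open-source LLM.",
    "RAG retrieves custom data to ground LLM answers.",
]

def retrieve(query: str, k: int = 1) -> list[str]:
    """Toy retriever: rank docs by word overlap with the query."""
    q = set(query.lower().split())
    scored = sorted(docs, key=lambda d: len(q & set(d.lower().split())), reverse=True)
    return scored[:k]

def llm(prompt: str) -> str:
    """Stub standing in for a real model call."""
    return f"(model answer grounded in: {prompt!r})"

def rag_answer(query: str) -> str:
    # The retrieved chunks are stuffed into the prompt, grounding the answer.
    context = "\n".join(retrieve(query))
    prompt = f"Answer using only this context:\n{context}\n\nQuestion: {query}"
    return llm(prompt)
```

In a real application, `retrieve` would query a vector database and `llm` would call the hosted open-source model; the prompt-stuffing step is the part that stays the same.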
How does LLM work?
Source: AnyScale Blog: a-comprehensive-guide-for-building-rag-based-llm-applications
How is a Vector DB created
Source: AnyScale Blog: a-comprehensive-guide-for-building-rag-based-llm-applications
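The index-building side (chunk, embed, store) can be sketched as well. This is a rough sketch under stated assumptions: the hashed bag-of-words `embed` is a stand-in for a real embedding model, and a plain NumPy matrix stands in for a real vector database.

```python
import numpy as np

def chunk(text: str, size: int = 40) -> list[str]:
    """Split text into fixed-size character chunks."""
    return [text[i:i + size] for i in range(0, len(text), size)]

def embed(text: str, dim: int = 64) -> np.ndarray:
    """Stand-in embedding: hashed bag-of-words, L2-normalized."""
    v = np.zeros(dim)
    for word in text.lower().split():
        v[hash(word) % dim] += 1.0
    return v / (np.linalg.norm(v) or 1.0)

# Build the "vector DB": one matrix of embeddings plus the chunk texts.
corpus = "RAG retrieves custom data. The retrieved chunks ground the LLM's answer."
chunks = chunk(corpus)
index = np.stack([embed(c) for c in chunks])

def search(query: str, k: int = 1) -> list[str]:
    sims = index @ embed(query)  # cosine similarity, since vectors are normalized
    return [chunks[i] for i in np.argsort(sims)[::-1][:k]]
```

A real pipeline swaps `embed` for an embedding model and `index` for one of the vector databases discussed below; the chunk-embed-store shape is unchanged.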
Detour: If you wish to use other Vector databases
Source: Data Quarry Blog: Vector databases - What makes each one different?
MLOps Concepts:
Set `cache_dir` as well. Otherwise, models get downloaded every time the Docker container is created.

```python
import os
os.environ['HF_HOME'] = '/tmp/model'  # the only `write-able` dir in AWS Lambda = `/tmp`

from transformers import AutoTokenizer, AutoModelForTokenClassification

...

your_model = "ab-ai/pii_model"
tokenizer = AutoTokenizer.from_pretrained(your_model, cache_dir='/tmp/model')
ner_model = AutoModelForTokenClassification.from_pretrained(your_model, cache_dir='/tmp/model')
```
AWS Concepts:
`aws cli` is your friend for shortening deployments, especially for Serverless. `aws cli` plus an OpenAPI spec makes the setup replicable.

## AWS Lambda ARM Architecture Costs (assuming you have used up all your free tier)
Number of requests: 50 per day * (730 hours in a month / 24 hours in a day) = 1,520.83 per month
Duration of each request (assumed): 120,000 ms
Amount of memory allocated: 10,240 MB x 0.0009765625 GB in a MB = 10 GB
Amount of ephemeral storage allocated: 5,120 MB x 0.0009765625 GB in a MB = 5 GB
Pricing calculations
1,520.83 requests x 120,000 ms x 0.001 ms to sec conversion factor = 182,499.60 total compute (seconds)
10 GB x 182,499.60 seconds = 1,824,996.00 total compute (GB-s)
1,824,996.00 GB-s x 0.0000133334 USD = 24.33 USD (monthly compute charges)
1,520.83 requests x 0.0000002 USD = 0.00 USD (monthly request charges)
5 GB - 0.5 GB (no additional charge) = 4.50 GB billable ephemeral storage per function
4.50 GB x 182,499.60 seconds = 821,248.20 total storage (GB-s)
821,248.20 GB-s x 0.0000000352 USD = 0.0289 USD (monthly ephemeral storage charges)
24.33 USD + 0.0289 USD = 24.36 USD
Lambda costs - Without Free Tier (monthly): 24.36 USD
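The arithmetic above can be reproduced in a few lines of Python, using the same assumptions (120,000 ms per request, ARM per-GB-s rates from the calculation above):

```python
# Reproduce the AWS Lambda (ARM) monthly cost estimate above.
requests = 50 * (730 / 24)       # 50 requests/day for a month ~= 1,520.83
duration_s = 120_000 / 1000      # 120,000 ms assumed per request
memory_gb = 10240 / 1024         # 10 GB allocated memory
storage_gb = 5120 / 1024         # 5 GB allocated ephemeral storage

compute_s = requests * duration_s                            # total compute seconds
compute_cost = memory_gb * compute_s * 0.0000133334          # USD per GB-s (ARM)
request_cost = requests * 0.0000002                          # USD per request
storage_cost = (storage_gb - 0.5) * compute_s * 0.0000000352 # first 0.5 GB is free

total = compute_cost + request_cost + storage_cost
print(f"{total:.2f} USD")  # prints "24.36 USD"
```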
Compare with always-on EC2:
c5.large (minimal CPU) EC2 instance running throughout the month, cost = 60 USD
g4dn.xlarge (minimal GPU) EC2 instance running throughout the month, cost = 420 USD

Finally, the LLM Concepts:
Models are like wines and these LLM frameworks are like bottles. What matters is the wine more than the bottle, but getting used to how the wines are stored in the bottles helps.
Next Steps for the author:
For Phi3-Mini-RAG: Sentence Transformers
Sources:
Next Steps for the reader:
Get familiar with `aws cli` and OpenAPI.
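As a starting point, here is a sketch of the `aws cli` loop for deploying a container-image Lambda like the ones discussed above. Every name below (account ID, region, repo, function name) is a placeholder - substitute your own.

```shell
# Authenticate Docker against your ECR registry (placeholder account/region).
aws ecr get-login-password --region us-east-1 \
  | docker login --username AWS --password-stdin 123456789012.dkr.ecr.us-east-1.amazonaws.com

# Build, tag, and push the image containing your model-serving code.
docker build -t my-llm-fn .
docker tag my-llm-fn:latest 123456789012.dkr.ecr.us-east-1.amazonaws.com/my-llm-fn:latest
docker push 123456789012.dkr.ecr.us-east-1.amazonaws.com/my-llm-fn:latest

# Point the Lambda function at the new image.
aws lambda update-function-code \
  --function-name my-llm-fn \
  --image-uri 123456789012.dkr.ecr.us-east-1.amazonaws.com/my-llm-fn:latest
```

Scripting these three steps is what makes redeploying after each model or prompt tweak fast enough for PoC iteration.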