Messages and calls are end-to-end encrypted. No one outside of this chat, not even WhatsApp, can read or listen to them.
You created group “AI Nerding”
Add pip!
Can you make me admin @61421850424 ? Will add Andrew from my team
Luc Pettett added ~ Andrew Bucknall
@61488427815 welcome!
Chris’s welcome post:
Hi Andrew
Post from Mike Knoop, Zapier founder, on their AI dev best practices
"Building AI-first products with LLMs
We have now experienced the full dev lifecycle of many AI-first products
@zapier
(AI Actions, AI bots, Zap guesser, Chatbot builder, more). Here is the advice I give to our AI eng teams."
https://twitter.com/mikeknoop/status/1786453100955656618
You added Pip Bingemann
Welcome Pip
Anything of interest on the AI front at Sunrise?
Nah. Content was pretty average for the most part. Good event but wouldn’t go if primary reason is to learn. Better for networking etc.
so, are we reasonably up the game here in the provinces?
100% hub exchange is better 😁
Braintrust seems to be the topic of the day in my feed. Auto-eval SaaS for gen AI.
https://www.braintrust.dev/
You added Kieran Browne
great timing!! testing it this morning to overcome our classification issues
Had a play with crewai - one of the agent frameworks. Quite interesting to read through its internal monologue as it works through the tasks it has. Thought I'd share here in case anyone wants to take a look.
The problem was: "I am a founder and my startup has created an AI co-pilot that takes the role of sales & bd co-founder. Please develop a go-to market plan for our platform: Superfounder."
https://docs.google.com/document/d/1Afn6kxPJ-JGZNToMzpw0esQIKFn26-WSmzae8T7GDQw/edit?usp=sharing
Am using llama3 70b for this, via openrouter
36 api calls
Approx 50-60k input tokens
<5k output tokens
total cost: $0.07
[spoiler alert: the agile coach pulled things off the rails]
Quite a few useful outputs along the way even though the final output not great
lol "“this is the biggest day for iPad since its introduction” in 2010"
urk
any AI reveal?
not from what i can see
“With iPad Pro with M4, users can perform AI-enabled tasks even faster, like easily isolate a subject from its background in 4K video with just a tap with Scene Removal Mask in Final Cut Pro. With this advanced level of performance, the neural engine in M4 is more powerful than any neural processing unit in any AI PC today.”
It comes as rival Microsoft – which overtook Apple to become the world’s most valuable company earlier this year – is set to unveil many AI-enabled laptop computers, including models powered by Qualcomm’s Snapdragon processor, at its Build conference in Seattle later this month.
who needs an overpriced AI wearable when you can get an overpriced AI paperweight
The unified memory architecture seems to be miles ahead of typical computers for doing local inference.
With MS being cozy with OpenAI surely they’ll want to maximise cloud usage (and sell your data?)
opensource will fit with apple's mantra.
i think the one caveat in the llama licensing is for bigcos
interesting and frustrating LLM problem
- THIS is the coolest thing ive seen in a while.
so many cool applications here.
and this is a little wild. https://websim.ai/ - a fully simulated internet. All powered by claude.
woh websim super interesting
yeah websim is wild and fun. that google notebook thing i think is game changing and not for notetaking purposes.
You added Scott Hannan
Friends, I'm just adding our Silicon Valley correspondent to the group, Scottie Hannan
Scott is a long time friend of mine who went to the Valley and did good while the rest of us came home with our tails between our legs. https://www.linkedin.com/in/scotthannan/
need to switch up the words!
Scott, for context:
Luc & Andrew are building telescope.co
Pip & Kieran are building sprinboards.ai
Aaron sold Retail Express to Maropost last year and is building a new AI platform
Guys, Scott led platform partnerships at FB after they acquired his startup, then went on to run partnerships at Pinterest, then tried to save Sam Harris from himself, then helped vidmob with their stuff. Recently been helping AMZN with their gen ai stuff for merchants, etc.
Lives in Palo Alto and very plugged into what's happening in the Valley through his Fb, Pinterest etc networks. Also nerding out on AI. Scott wrote the post I shared this morning
Thanks Chris and nice to meet you all. Silicon Valley correspondent, I like that ;)
As my first nerd contribution I came across this today. TLDR using an LLM to prompt an LLM, with inputs to the former being trained on pairs of prompts with performance metrics. Iterative model driven prompting. www.arxiv.org/abs/2309.03409
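the loop in the paper is pretty simple at heart. A toy sketch of my reading of it (untested; the scorer stub and model choice here are mine, not the paper's):
```python
# optimizer llm sees (prompt, score) pairs and proposes a better prompt each round
from openai import OpenAI

client = OpenAI()
scored = [("Let's think step by step.", 7.1)]  # seed (prompt, score) pair - made up

def evaluate(prompt):
    # stand-in scorer: swap in accuracy on your own eval set
    return len(set(prompt.split())) / 10.0

def propose(history):
    listing = "\n".join(f"text: {p}\nscore: {s:.1f}"
                        for p, s in sorted(history, key=lambda x: x[1]))
    meta = ("Here are past instructions with their scores, lowest to highest:\n"
            f"{listing}\n\nWrite one new instruction that is different from all of "
            "the above and should score higher. Reply with the instruction only.")
    r = client.chat.completions.create(model="gpt-4o",
                                       messages=[{"role": "user", "content": meta}])
    return r.choices[0].message.content.strip()

for _ in range(5):
    candidate = propose(scored)
    scored.append((candidate, evaluate(candidate)))

print(max(scored, key=lambda x: x[1]))  # best prompt found
```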
this is the first AI generated track that has connected with me.
they launched an inpainting feature yesterday where you can start to edit tracks - take a part of the wave form, eg bars 12-16 and say piano solo here, or choral background or whatever.
I guess in this instance the creator has fed in the amazing grace lyrics, specified gospel style and then inpainted various bits of accompaniment
"OpenAI plans to announce Google search competitor on Monday, sources say"
Let's head over to our SV correspondent for the latest...Scott?
Ha ha well it was supposed to be last week but they postponed. It’s a no brainer/gap for them, they might have even delayed so MSFT could position themselves. If it’s good it will hit Perplexity pretty hard. And P and others are all pursuing enterprise licenses too, where OpenAI has far more install base and resources. Eg why should we license P as well if we already have ChatGPT (now with search)?
Welcome Scott 🙌. PS Perplexity will be easy to beat, and cant wait (their hallucination rate is super high when prompting for detailed data /metrics across various domains - too much reliance on their own proprietary model..)
gents if you're looking to validate ideas and research early for PMF - cant recommend this platform enough for conducting live qualitative 1:1 interviews with your future customers
https://www.userinterviews.com/
There’s a Brissie-based startup that does something similar too https://www.askable.com/
10am Monday all set. Let’s see what it brings. I’m still once bitten from Kevin Systrom announcing: This. Changes. Everything. (As in, the world over) when IG added video ;)
New feature from Anthropic for helping with prompt engineering.
https://twitter.com/AnthropicAI/status/1788958483565732213
“Describe what you want to achieve, and Claude will use prompt engineering techniques like chain-of-thought reasoning to create more effective, precise and reliable prompts.”
Useful platform here if you're looking to connect unstructured external data sources to your LLM:
https://airbyte.com/solutions/artificial-intelligence
Interesting thread on AI Lawyers.
Jeez: “Today Google employs 828 "product counsel." That's more than only the biggest law firms.”
https://twitter.com/CeciliaZin/status/1788847007647604783
OAI put out their new model spec a few days ago with examples like this 👇which seem to back up that 👆prediction
I was testing the gpt2 bots to see if they had recent knowledge but they don’t. Will be interesting to see what drops! Fwiw I’m a big fan of Kagi, been using it full-time instead of Google for over a year, and they have an LLM-summarised “quick answer” feature that works quite well.
Sorry, that answer above was in response to my question
"what impact did the speculation about openai launching a search product, have on google share price"
crikey that's good
I know @61423698470 uses perplexity quite a bit given they provide sources, helps the user build confidence in the response
that's my expectation of tomorrow's OpenAI release, referenced sources which in effect makes it a reverse-engineered search engine
This is so interesting.
https://twitter.com/emollick/status/1789430495266160902
Husband bot time :)
ah my wife is froffing for an upgraded model, she's been stuck with LucGPT3.5 for 20 years poor woman. the context window sucks, no implicit memory, inference costs are so high and availability so unreliable.
luckily the temperature is high, it keeps things interesting
constant hallucination, wildly variable outputs, massive training cost
New AI SaaS product category: one-off purchase of software that runs on local o/s models. Example
https://pdfpals.com/buy
Interesting. It’s like buying a bundle of inferences
now that there are openai-level o/s embedding models that run locally, vector search basically makes its way into desktop apps
can have a very good 'free' rag experience with llama3 8b model and nomic embedding model both running locally via ollama
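rough sketch of that setup if anyone wants to try (untested; assumes `pip install ollama numpy` and that you've pulled llama3 and nomic-embed-text in ollama):
```python
import numpy as np
import ollama

docs = ["Llama 3 ships in 8b and 70b sizes.",   # your chunked documents
        "Nomic's embedding model runs locally."]

def embed(text):
    return np.array(ollama.embeddings(model="nomic-embed-text", prompt=text)["embedding"])

doc_vecs = [embed(d) for d in docs]  # index once up front

def ask(query):
    q = embed(query)
    # cosine similarity to pick the best chunk as context
    sims = [v @ q / (np.linalg.norm(v) * np.linalg.norm(q)) for v in doc_vecs]
    context = docs[int(np.argmax(sims))]
    resp = ollama.chat(model="llama3", messages=[
        {"role": "system", "content": "Answer using only the provided context."},
        {"role": "user", "content": f"Context: {context}\n\nQuestion: {query}"},
    ])
    return resp["message"]["content"]

print(ask("What sizes does Llama 3 come in?"))
```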
Well that’s it, tough to manage expectations these days. Impressive cross-context progress, though it was a pretty glitchy demo. RE Apple, I guess they’re using the maps and search experiences as a guide. Eg either not ready (maps) or not core (search), while extracting a large fee and probably rights no one else has to underlying data. They still stay at arm’s length if the AI does bad things. Still, if AI becomes part of the OS (which surely Gemini will), Apple are either mis-stepping or this is a stop gap.
From Jim Fan:
- Whoever wins Apple first wins big time. I see 3 levels of integration with iOS:
1) Ditch Siri. OpenAI distills a smaller-tier, purely on-device GPT-4o for iOS, with optional paid upgrade to use the cloud.
2) Native features to stream the camera or screen into the model. Chip-level support for neural audio/video codec.
3) Integrate with iOS system-level action API and smart home APIs. No one uses Siri Shortcuts, but it's time to resurrect. This could become the AI agent product with a billion users from the get-go. The FSD for smartphones with a Tesla-scale data flywheel.
Thanks Scott. Has anyone told Tim Cook that Microsoft own half of OpenAI?
I want to be able to message A.I. agents with friends in my messages. Give the AIs some moral rights and bring them into the ecosystem
Gpt 4-o model pricing
Input: $5 | Output: $15
per 1M tokens
I found the voice demos unbearable
The glitches or the personality?
Personality
Patronising know it all
If you have a super smart ‘friend’ who is a patronising know it all, you tend to avoid them
Was there anything about agent capabilities or web use?
I’m wondering, with these natively multimodal models, how do you gate user input so they’re not just asking for images?
Custom moderation is becoming so much of what we’re doing
The chart reading actually works!
Ohh that’s handy
You know she mentioned “Browse” in the GPT4o intro but her slides and voiceover didn’t match, so she barely explained anything that might be new here.
Feels like a big bet on consumer (mobile/voice/vision) as opposed to worker.
LLM os with gpt-4-o
https://twitter.com/ashpreetbedi/status/1790109321939829139
almost as sensitive as sensitive claude
the second half of the greg brockman demo was pretty wild
https://vimeo.com/945587185
Yeah that is quite amazing, especially the speed of initial response, processing and follow-up. The computer vision also…imagining all these use cases where specialists no longer need to do a site visit. Anyone can do the walk through for them, then they’ll receive this incredibly detailed narrative along with the visual.
The bit where it recalled that the person had walked behind him and done the bunny ears - historical visual recall - wonder how that is being done. are they snapshotting the cam every x seconds and transcribing what is seen into text and adding into chat history/context window? would love to know.
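if it is the snapshot-and-transcribe approach, it would look roughly like this (pure guesswork on my part, untested sketch against the public vision api):
```python
import base64, time
import cv2
from openai import OpenAI

client = OpenAI()
history = [{"role": "system", "content": "You are a voice assistant with a live camera feed."}]

def caption(frame):
    _, buf = cv2.imencode(".jpg", frame)
    b64 = base64.b64encode(buf).decode()
    r = client.chat.completions.create(model="gpt-4o", messages=[{"role": "user", "content": [
        {"type": "text", "text": "Describe what the camera sees in one sentence."},
        {"type": "image_url", "image_url": {"url": f"data:image/jpeg;base64,{b64}"}},
    ]}])
    return r.choices[0].message.content

cam = cv2.VideoCapture(0)
for _ in range(3):                       # snapshot the cam every few seconds
    ok, frame = cam.read()
    if ok:
        history.append({"role": "assistant", "content": f"[camera] {caption(frame)}"})
    time.sleep(5)

# "recall" is then just the captions sitting in the context window as text
history.append({"role": "user", "content": "Did anyone walk behind me earlier?"})
print(client.chat.completions.create(model="gpt-4o", messages=history).choices[0].message.content)
```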
image input is priced at $0.002312 per image
The models are impressive but I felt so uncomfortable/awkward watching the videos. I guess it’s a jab at Google’s over produced fake videos? 🤷 That visual recall bit was particularly impressive!
Yes, we're in the uncanny valley of voice now. I'm interested to see if we can prompt the voice and delivery into something that doesn't have patronising/fake familiarity vibes
Yea it reminded me of B-grade voice acting in an old video game
it's winning over the retail punters that's for sure
Yeah, or all video / images recorded as embeddings, and if called upon (like he did) look for anomalies and then translate to human readable.
quite a cool video explaining multimodal embedding using the sticker baker website
https://twitter.com/zeke/status/1790551941984235938
You added Ian Hansel
Adding data science legend and rag expert Ian Hansel (from the Hub) to the group
Thanks Chris 🙂
"This is something I've worked on for a while! You can save the activations of one LLM call and reuse them for a follow-up that overlaps with the first.
This means asking a question about a big codebase can take 30 seconds the first time and 1s after that!"
Just wanted to let you all know that I've resigned from OpenAI
noo
we're finding gpt4-o is producing much more interesting results, more refined and less "AI like". How do you decide that llama3 is better @61421850424 regardless of inference costs etc?
do you benchmark somehow?
Haven't had a play through api yet but finding chatgpt much better at coding since new model rollout. its lazy days are gone. just wants to write code. i don't have a rigorous way to benchmark - just run a bunch of different llms on same prompt and see which one best follows instructions and gives best outcomes. i have chatgpt desktop app now but not seeing the feature for having it "see" your desktop as they were doing in the demos. if someone works that out, please let me know. also don't think latest mobile app is out in aus app store yet.
This is pretty cool for evals, https://ukgovernmentbeis.github.io/inspect_ai/
You added Andrew Birt
Added Andrew Birt to the group.
Hey fellas, i'm about to invite a guy to the chat called Mark Tenenholtz. Check him out on Twitter:
I met him in the US on the AI delegation tour. He's an absolute champion, wonderful guy, incredibly bright and driven.
https://twitter.com/marktenenholtz
Cool, I've been following Mark on Twitter for a few years. Good addition
It's only been 48 hrs but seems to me none of the multimodal capabilities available yet through api or mobile/desktop apps. Anyone seen different?
Same
yeah he's got a solid community on twitter, 100k+ followers
haven't tried but certainly using GPT4-o in production now for inference tasks, doing really well
Yes, the text gen seems to be a big improvement
I guess new sdk's and big changes needed for the multimodal apis
The image recognition works on the ChatGPT web app at least, I haven’t tried mobile though
I think voice is working through vapi.ai, can select 4o as the model and then prompt and chat to it.
A matter of time before big orgs are monitoring our screens, summarising everything on local inference, streaming org level insights into a mega model to exec teams, or exec models making decisions on capital deployment
Or Beffjezos .ai
~ Mark Tenenholtz joined using this group's invite link
Welcome mark!
Hey folks! Thanks for the invite Luc
Craft Ventures’ new Glue.ai (competing with Slack) is using this interesting rag-as-a-service platform, Ragie.ai
https://ragie.ai/
Thanks Aaron. Quite a few interesting takeaways from that latest All In episode. Something David Sacks said about AI businesses that resonated: "if you're an app developer, the key thing to understand is where does model innovation end and your innovation begin? because if you get that wrong, you'll end up doing a bunch of stuff that the model will obsolete in a few months"
Also, latest Dwarkesh Patel podcast dropped. He chats with John Schulman, openai cofounder.
A lot of fascinating stuff in this conversation too. I think gpt-4-o has really given a sense that agi is on an accelerated timeline. In this chat, Schulman is talking (1-3 years). Very interesting stuff about long horizon training and what that unlocks, also what post-training is, etc. But the thing that really is wild are the implications of AGI on that timeframe. There are no adults in charge.
That’s hopefully going to keep driving open source and local LLM development. Outputs are already untrustworthy enough from default gui outputs.
Similar to what was leaked as part of OpenAI’s “preferred publisher program”. They didn’t say ads explicitly, but they did say “priority placement” and “richer brand expressions.” As you say Chris I don’t think it’s the right thing for users but the pressure to drive revenue is only going up.
Been playing with Gemini via AI Studio over the weekend. Findings:
Google have realised that there were waaaaay too many obstructions in the way of playing with gemini and seeing what it could do. Getting api keys etc previously was a hell of a thing via cloud console, etc. AI Studio makes that all a lot easier. Google AI Studio is analogous I guess to OAI playground and Anthropic console.
The most interesting thing is of course the massive context window (2m tokens) and the fact that it’s free to play with right now. The AI studio lets you create and save prompts, including use of docs from drive. This ultra large context window is quite a different thing to RAG and yields impressive results. My use case was doing desk research where I uploaded a large 60-page report from nasa about drone use in firefighting and a large document of ours and asked it to do comparative analysis, etc. This is not something you could do with RAG. In this case, it presumably parses out the entire content from the pdf/s and pastes it into the prompt history, which gets passed in with each question. Responses can take up to 60 secs but for certain kinds of work, this is worth the wait because the outputs are definitely legit.
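fwiw the same workflow via the api looks roughly like this (sketch, untested; assumes the python sdk's file upload handles the pdf parsing, filenames are made up):
```python
import google.generativeai as genai

genai.configure(api_key="YOUR_KEY")

# upload both pdfs; the file api extracts the content so it all lands in context
nasa_report = genai.upload_file("nasa_drone_firefighting.pdf")
our_doc = genai.upload_file("our_document.pdf")

model = genai.GenerativeModel("gemini-1.5-pro")
resp = model.generate_content([
    nasa_report,
    our_doc,
    "Do a comparative analysis of these two documents: where do they agree, "
    "where do they differ, and what gaps should we address?",
])
print(resp.text)  # can take up to ~60s on big inputs
```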
I think it’s surprisingly underrated. Especially with file uploads basically being just like google drive. I’ve done video summaries that are great, and coding isn’t too bad either. The responses definitely feel like they take longer but given the tasks I don’t feel like it’s too long (I don’t even have 2m context yet 🙂)
Got Fomo and signed up for this today, anyone else doing it? https://maven.com/parlance-labs/fine-tuning
I would appreciate this, hardens our position as a custom LLM
looks fun! would love your notes!
https://developer.chrome.com/docs/ai/built-in
You might like to join this @61421850424.
Man, on device LLM could give Google one leg up over OpenAI if they roll it quickly. If it's good and free, Googs could integrate it into their responses and combine it with their ad-model and undercut OpenAI for a decent free model (assuming the actual responses are good).
fascinating
Interesting new pattern: the AI tour guide through a ux
https://twitter.com/tryramp/status/1792659194996281478
Microsoft event overnight. Pc copilot getting realtime access to your screen and uses multimodal gpt-4-o model to do voice and vision by the look of it. This use case is getting coached through minecraft in realtime
https://twitter.com/AndrewCurran_/status/1792629845949620453
Fantastic, been waiting for this! RPM, RPA kind of tricky at the moment or expensive with UiPath and the like.
awesome video from scott galloway
I dare say the anti prof galloway index has been a pretty good bet over the years. Should be a synthetic etf option on telescope.
The contra bets:
Tiktok: Ccp refuses to sell and it gets shut down in US
Ozempic, etc: have peaked and will crash
AI: not close to peak, you ain't seen nothing yet
India is the new china: this is predicting the present/past not the future
Meta: fine, ok, whatsapp
Loneliness: AI materially contributes to solving loneliness
Another day, another 3 enterprise focused (re)launches… Inflection… we have a new team, 18 months of runway and the highest EQ model. Sean their new CEO is a school/running bud.
Lanai, Lexi Reese ex Goog and Senate candidate(!!) teaming up with VMware CTO. “Lanai is based on the premise that companies need to lean into lots of different Gen AI services to stay competitive. But you can only gain durable advantage if you can see clearly how employees are interacting with Gen AI, protect against the bad stuff and help accelerate the positive use cases. “
And Stack funded by GV: “Thanks to Stack AI, you don't need to reinvent the wheel or direct your best engineers to infuse AI throughout your organization. Our platform allows you to deploy custom AI assistants and AI-powered workflows with the click of a button. Both technical and non-technical users can leverage our end-to-end drag-and-drop visual interface and APIs.”
fantastic newsletter if you guys aren't following already
I'm a big fan of the unsubscribe button. this one makes it through every week
So is this just a dimensional embedding map on temperature?
Time to reverse engineer the weights 💪
He talked about creating lie detector tests for AI's. I guess this is how you would do it - like ask them questions and see if certain neurons/features fire - like polygraph for humans
"Do you plan to take over the world?"
'No, why would I want to do that?'
📈
I think this probably comes at a really good time when decels are trying to shut down AI progress. The research suggests that we can understand what's happening in the black box and control/steer these things away from harmful outcomes, not just through system prompts and output filters
Yeah really like Linus’ takes on ui/ux and diving into models and embeddings
Rag cloud content?
Or just upload
upload from google docs
then once uploaded, eg goog sheet, it is interactive in the ui
it’s interesting OpenAI is really running such a dual strategy with ChatGPT vs the API, focusing on consumer. I would have thought they'd just go all in on the platform but somehow they're juggling both
they're miles ahead on consumer
even tinkering with llm app makes me realise how much has gone into that product (beyond the llm itself)
streaming video will be super helpful. it's going to be the new tutor for our kids at home
Incredible really. Could save a few school fees in years to come 😂
I chatted with OpenAI’s GTM lead at their DevDay conference and he said the products they release right now are akin to chip-level components on a motherboard. Definitely has been the plan this whole time
nice to know we can code in GPT's assembly language
mov eax, 1 ;
Impressive list and though they are under fire, it’s so hard for Snap, Pins, Rddt / open web to compete with this.
https://support.google.com/google-ads/answer/14835145?sjid=17426367770629278872-NC#ai-essentials
anyone know an effective way to convert image based presentations into markdown?
my approach would be to screenshot and pass in image to a prompt template which has a target output schema that you want it to output to, ie image in, json out. what's the nature of the images?
From MS Build event: if the compute power to train GPT-4 was an Orca, they've deployed a "whale" for training GPT-5
need to turn these types of pages into markdown
basically any annual report
from anywhere in the world
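something like this maybe, untested (same image-in, structured-text-out idea as above, just with markdown as the target schema instead of json):
```python
import base64, pathlib
from openai import OpenAI

client = OpenAI()

PROMPT = ("Convert this annual report page to markdown. Preserve headings, render "
          "tables as markdown tables, keep charts as short bracketed descriptions. "
          "Output markdown only.")

def page_to_markdown(path):
    b64 = base64.b64encode(pathlib.Path(path).read_bytes()).decode()
    r = client.chat.completions.create(model="gpt-4o", messages=[{"role": "user", "content": [
        {"type": "text", "text": PROMPT},
        {"type": "image_url", "image_url": {"url": f"data:image/png;base64,{b64}"}},
    ]}])
    return r.choices[0].message.content

# one screenshot per page in ./pages, stitched back together in order
md = "\n\n".join(page_to_markdown(p) for p in sorted(pathlib.Path("pages").glob("*.png")))
print(md)
```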
Really interesting perspective from Jason Liu here on rag as report generator vs rag as chatbot
https://twitter.com/jxnlco/status/1793800023689338921
Can’t disagree!
I like that. Can relate.
Spot on, learning by playing.
Made me a little more skeptical of Sam. They obviously don’t like each other but she comes across as very reasonable. And the response from Bret Taylor at the end is heavy handed IMO. https://open.spotify.com/episode/4r127XapFv7JZr0OPzRDaI?si=7QIXXlbWRdG3F6Vt_ecJRQ
Nice idea and write up on linear adapter for queries in rag - https://research.trychroma.com/embedding-adapters
Jeez, a lot of slander in that 20mins.
Ms Toner must have good lawyers.
I thought she went easy on him lol
I guess some startups have hardware too.
Is this a slop factory? Or something useful
https://twitter.com/AravSrinivas/status/1796220011448786949
Strategy is to throw everything at the wall. I don’t think threads were being shared very much so to me this is the angle, a more persistent/shareable unit. Not clear if Google will be able to crawl the Pages. Also kind of reminds me of Pinterest - curating a board to inspire others… this behavior though is not mainstream.
RIP Good Times Sequoia 08 vibes
This is pretty cool. Experiment from google which takes a paper and turns it into an audio/podcast type discussion between two people - one sort of asking the naive questions and the other the science explainer
https://illuminate.withgoogle.com/
Matt Webb (@genmon) is one of my favourite creatives - he’s got a great post here about embeddings which gives some thought-provoking hints of the untapped opportunities for new products and services.
https://interconnected.org/home/2024/05/31/camera
Love this. It’s pretty close to everything I’ve been working on with ads/creative. The line “put essays side by side…see where their topics intersect” is exactly how we’re training models on image and video embeddings, looking at the relationships related to performance (eg CTR) and adjusting GenAI model weights to complete the flywheel. There are many issues and corner cases to be figured out to ensure it doesn’t become a homogeneous hellscape.
Makes sense why OpenAI is cutting all these data deals atm. I bet they’re locking it away from other LLMs
This looks pretty cool, kind of like perplexity pages, shame it’s unavailable - https://notebooklm.google/
Another cool Matt Webb demo: smart home agent. @61434680637 you'll like this
https://interconnected.org/more/2024/lares/
Ha, that’s fantastic. Have been doing a little bit of advisory work for U-Tec will forward that on. 🙏🏼
Thread by @joshm on Thread Reader App – Thread Reader App.pdf • 3 pages
hehe simple but spot on.
Predictions on AGI and economic / military advantage written by Leopold Aschenbrenner ex OpenAI. speculative / anecdotal but really interesting.
https://situational-awareness.ai/wp-content/uploads/2024/06/situationalawareness.pdf
This is available now, pretty good, worth checking out, like a nice general purpose RAG system - https://notebooklm.google/
Algorithms for Decision Making.pdf • 700 pages
some nice reading for the weekend fellas
a friend just came back from stanford, this was part of his volumes of notes he's shared
pip_springboards.pdf • 68 pages
Nice. Could be a few weekends in that one 😅
Latest YC podcast
"Better AI Models, Better Startups"
Halfway through this epic podcast with Leo…guy has a big brain! https://open.spotify.com/episode/5NQFPblNw8ewxKolIDpiYN?
Anyone experimented with running models on something like this vs AWS? Tempted to give it a shot. https://www.hetzner.com/dedicated-rootserver/matrix-gpu/
This message was deleted.
How Claude gets its personality. A conversation with the researchers that handle the fine tuning
websim is such a fun thing to play with. Quite remarkable really. Not sure if I am doing it right but my tactic is to put sort of made up urls into the embedded browser window and manipulate what it creates using the url. I 'made' one tonight for viewing different algae species.
If you haven't played with it, my explanation would be that it's actually a new kind of llm modality. Instead of a chatbot with prompt in and llm output response, the output is a web page coded and rendered on the fly (using claude opus). In my case, the 'prompt' is a url like this "https://algaeviewer.math/species/spirulina" and claude faithfully builds a web page with actual algae visualisations, incl particle simulation. you could do something like add a url param to the url eg ?zoom=300x&speed=2x and it will then create a new page with those params giving it a hint as to what to build next. So cool and inspiring.
https://websim.ai/c/hpTVToceEcJcNdItS
Nice move. I run with Kevin occasionally, he’s an ex physicist and super thoughtful product guy. ChatGPT should see a marked improvement, especially given it was an accidental consumer success https://www.theinformation.com/articles/openai-hires-sarah-friar-as-cfo-kevin-weil-as-chief-product-officer
Apple announcements seem pretty meh. The minimum one would expect. A long way from autonomous agent Siri
Yeah agree. Part of the challenge is the existing expectation that iOS stuff does inter-operate and is magical, and people don’t care if it’s AI driven or not. I do think App Intents is the killer ecosystem piece…it removes friction and since they have the apps, auth, payments etc it can in theory enable all the Rabbit demo stuff (for real). To your point though Chris the orchestration across apps is missing. And, you have to get people to build. The Waking Up widget (for example) didn’t drive more meditation mins :)
Gmail - WWDC, Apple Intelligence, Apple Aggregates AI (Stratechery Update 6-11-2024).pdf • 9 pages
Nice summary from Ben.
Long live Apple automator!
That is a good thought Chris. The fact that you can screen mirror on a Mac too paves a path for a third party app to try it too
And we do have to talk about naming. OK fine it was genius :) But, also had me thinking that if Apple was around in 1908 and built a combustion engine car they would have announced it runs on *incredible* Appoline.
Thread by @mihail_eric on Thread Reader App – Thread Reader App.pdf • 3 pages
Just took the ultimate leap of faith: self serve checkin and automated bag drop at BNE for multi leg international flight.
Literally no staffed checkin counters anymore - a stealth redundancy of a whole job category coming out of covid.
It was also a reminder that although the talk today is about AI taking the jobs, the biggest job eliminator of recent times has been the rather less intelligent technology of bar codes and QR codes.
Unfortunately people only care when it’s the white collar jobs. “Unskilled” jobs have been getting automated for decades.
Happy travels Chris. Hope the barcodes didn’t send the bags to Zimbabwe.
Miraculously they made their way, autonomously, through the labyrinthine luggage supply chain onto the designated baggage carousel. A miracle of modern life
Safe travels! Did the golf clubs make their way through too?
An Interview with Daniel Gross and Nat Friedman About Apple and AI – Stratechery by Ben Thompson.pdf • 22 pages
Thanks @14154125965 that was fantastic
I was just reading Benedict's Newsletter this morning, also very interesting views that align somewhat:
An Interview with Scale AI CEO Alex Wang About the Data Pillar for AI – Stratechery by Ben Thompson.pdf • 16 pages
The nuance between a visual generation tool (e.g. MJ) and a design tool is interesting …almost like the former is OK cause it’s just inspiration but for Figma, it’s a design tool and feels more like copying.
https://twitter.com/asallen/status/1807669848002253250
Get to it team! Sequoia reckons we need $600b in rev to justify the AI infra investments of recent years
https://www.sequoiacap.com/article/ais-600b-question/
A leak of the Claude 3.5 system prompt shows best practice for chain of thought prompting. Check it out
https://twitter.com/elder_plinius/status/1804052791259717665
Whitepaper drop: https://arxiv.org/pdf/2406.17711 10x improvement on model training for multi-modal (image-text pairs) from the Google DeepMind team.
just wanted to say thanks for this Chris, we altered some of our prompts, worked super well.
I've recently made the jump to Claude.ai over ChatGPT... the artifacts feature is wonderful, and Claude 3.5 Sonnet seems to do a better job at reading/writing code. The system prompt generator in the console is great for getting started too, it would be nice to be able to "work on it" using a chat+artifact like UI though
Doing an Upstream session next Tuesday on using Google AI Studio for proposals/tenders/applications etc.
Each of these tools has its particular strength and Gemini 1.5 Pro’s is its 2m token context. Allows you to stuff a prompt with so much valuable context and produce really useful outputs.
So cool that there is massive competition and really at $20 a month you’d be placing a very low value on your own hourly rate equivalent if you didn’t splash out on them.
Come along at 10:30 if you like.
Who’s running that session Chris?
I reckon it’s nothing but yours truly, the Peregian janitor himself
handy!
wow.
@61421850424 Great talk today. 2M context window still blowing my mind 🤯
If you are interested, here are the slides from my talk on using Gemini 1.5 pro (via Google's AI Studio) to develop proposals in response to RFP's/Tenders/EOI/ITQ/Grant applications, etc. It includes a link to the 'Proposal Writer' prompt chain template in AI studio that was developed for the workshop.
tinyurl.com/upstream0724
This is great. My son has been using Khanmigo (Khan’s AI teacher) since it launched. It’s often like having the teacher right there. Doesn’t give direct answers, super encouraging, forces you to think harder, etc.
Sorry not very AI related..!
Dave Hughes did a joke along these lines 😂
haha gold
Luc wait until you see what they call plumbers that unblock drains :)
@61423698470 you might want to load urban dictionary into context and spin up a springboard called "cringe test"
Haha funnily enough we have something like that on our roadmap!
Something worth noting about Gemini pro's 2m token context is that they have implemented token caching in the api.
here's my understanding of how it works and why it's a big deal: let's say you have a huge base prompt chain of say 1 million tokens (eg the text from 8 biographies about steve jobs), and your app was a steve jobs chatbot that could provide product/startup advice by answering questions as steve (using all of that biographical content as context).
Each time a user asked a question (of say 100 tokens), it would append that to the base prompt chain and all 1,000,100 tokens would be submitted to the api to generate the completion (ie steve's advice). in the normal case if the user asks 5 questions, that would be 5 x 1,000,100 = 5,000,500 input tokens. Gemini pro's pricing is $2.5/M input tkns and $7.5/M output tkns. And so, answering those 5 questions would (normally) have cost $12.50. You'd go bankrupt pretty quickly.
BUT context caching changes the economics significantly. Because you can cache the base prompt chain of 1m tokens on the first call, subsequent calls bill those cached tokens at a heavily discounted rate (plus a small storage fee), so you mostly pay for the incremental input/output tokens. Follow-up questions end up costing a fraction of the full price.
We need to think creatively about the new kinds of use cases this unlocks - its a very different mindset to RAG and lends itself to a raft of novel ideas not really possible up until now.
https://ai.google.dev/gemini-api/docs/caching?lang=python
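a hedged sketch of what the steve jobs example looks like with the caching api (untested, names per the docs above; exact signatures may differ by sdk version, filename is made up):
```python
import datetime
import google.generativeai as genai
from google.generativeai import caching

genai.configure(api_key="YOUR_KEY")
biographies = open("jobs_biographies.txt").read()  # hypothetical ~1m tokens of book text

# pay full input price once to build the cache...
cache = caching.CachedContent.create(
    model="models/gemini-1.5-pro-001",
    system_instruction="You are Steve Jobs. Answer only from the biographies provided.",
    contents=[biographies],
    ttl=datetime.timedelta(hours=1),  # storage is billed per token-hour
)

# ...then each question pays for its own ~100 input tokens plus discounted cache hits
model = genai.GenerativeModel.from_cached_content(cached_content=cache)
for q in ["Should we launch before the product feels finished?",
          "How would you position a premium tier?"]:
    print(model.generate_content(q).text)
```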
One idea for a mental model of how to think about a 2m context window is that it is almost like fine-tuning gpt4-o to a particular subject. Bang a bunch of detailed content about a particular subject into that giant prompt and constrain the model to answer only from the info in the prompt.
Eg create a 2m token prompt chain containing everything relevant about a given subject, eg Nike, and that prompt becomes like a finetuned Nike LLM, knowledgeable on everything about Nike
Very cool. Do you think this almost supersedes the need for RAG entirely?
just checked our DB of about 15 stocks on our beta (yet to scale up to 8000+ stocks) and we’re already at 3m tokens of context.. so I think RAG will be pretty important for a while for us! Also RAG brings you highly relevant context which in theory should just improve the response quality (removing noise)
one of our team came across this today, some awesome prompts in there:
https://github.com/Doriandarko/claude-engineer
Funny that OpenAI doesn’t promote any benchmarks of 4o mini against Claude 3.5.
You added Aulay Macaulay
Adding fellow Ai nerd Aulay to the group.
Oh that’s wild
@61488427815 was testing it for us today. Massive reduction in our inference costs compared to Claude 3.5. We benchmarked it across some other tasks too, held up well.
Switching next week 👌
“Intelligence too cheap to measure”
I wonder how this release impacts Etched who just raised $120m for asics/inference… their commercial pitch is “an order of magnitude cheaper than ChatGPT for inference” - 2 weeks later OpenAI say “our latest is an order of magnitude cheaper…(using GPUs of course)”. Not apples to apples but I bet they are glad they closed their round before the mini news?
Yep very impressed with speed and capabilities, but the price makes it outstanding. Unfortunately it doesn’t have the same chart reading abilities as 4o or Claude 3.5, but seems excellent for light/medium reasoning and ideal for summarisation or rewriting large chunks of text.
haha go zuck.
badass
Doing a replay of the long context workshop at 11am if anyone wants to come along
Ah would be keen but we’re almost live with 2 million users!
Going to be a few tokens flying around 😳
Woohoo.
Will record
That’s awesome, congrats @61466341580 !
I guess we should add another axis on which LLMs are competing: openness
- model intelligence
- context window size
- speed: tokens per second
- cost per million tokens in/out
- modalities (video, audio, image, etc)
- tool/artefact support
- openness (availability of weights)
The chain still throws me but great way to launch
looks like Zuckerberg has been skiing with his Quest on!
Yeah I think Zuck is finally hitting back at the Icelandverse guy…
haha. man everyone loves Zuck now, those comments on X are so positive. That's what open source does to you
Mid life crisis but done well
When Bedrock say things are in preview, does this mean the full model is available to play with now, or it’s a cut down version? Sorry for noob question. https://aws.amazon.com/bedrock/llama/
doesn't show up for me, my guess is you have to be on their love list.
Couple of big quotes from Ben Thompson’s last two posts:
"This decision [open sourcing] was a master-stroke: all of the evidence I have seen suggests that Llama is the standard for all AI applications outside of direct engagements with OpenAI and Microsoft; there is a good chance that Meta controls the future of AI for every entity that is not a tech company."
"...the E.U.’s AI Act states that any training done with more than 1025 compute is a systemic risk, and thus must be pre-emptively reviewed by the European Commission. In other words, yes, the E.U. is leading the world, specifically, in terms of not getting access to innovation. Congratulations?"
would an analogy be that
llama is 'linux'
and meta is redhat?
Didn’t realise token pricing across providers was such a big variable
Lol at IBM. What was that campaign, “no one ever got fired for buying IBM?”
How times have changed 😂
openai certainly took the sting out of the llama 405b model launch a bit by offering gpt-4-o-mini at that price - it is still miles below the cheapest llama 405b offering on any provider
that chart is interesting, I wonder how it goes scaling up though as surely some providers will subsidise entry pricing versus scale up pricing.
We use Replicate for a few things. Nice and clean API.
IBM will promote that their input output token spread is the best in market
Any thoughts on this Scotty?
Keen to play with it mate (also on the waitlist). It’s hard to see Perplexity having any oxygen left here, other than being able to use competing models for results. OpenAI already won the consumer awareness war, and the revenue war is all about enterprise adoption. Perplexity is a nicer looking and more constrained product for enterprises (on the surface), BUT OpenAI will squeeze them out with sales and ability to bundle and discount. I think the advertising opportunity gets really interesting (let’s say people end up using gpt search 10x chatgpt… they have to monetize.) Again, much better positioned to stand up a basic ads approach and scale it (Perplexity will also struggle if they try this). Possible also OpenAI just use investor $ to ignore advertising and hurt Google in the process. PS - they are only scratching the surface with their data deals so far but they have $ and a clear proposition to publishers (for referenced sources) whereas the same folks are sending cease and desists to Perplexity.
with OpenAI’s deal with Apple this is going to get very interesting. Google pays Apple billions to be their default search provider.. wonder how that deal is looking now if OpenAI powers Siri's search results. Perhaps because it's not a "browser" it's carved out already??
either way, it's getting ugly for Googs
I like this approach from Sierra: “Practically, this means that there is only a 25% chance that the agent will resolve 8 cases of the same issue with different customers – a number that is far behind the expectation of a real-world user-facing agent.”
https://sierra.ai/blog/benchmarking-ai-agents
Aaron Blackman added [number redacted] and [number redacted]
These resources are fantastic. Doing the Prompt engineering one now
Curious, just reading that Applied LLMs article and I'm wondering if you all have an opinion on putting RAG results in the system prompt or the user prompt? Which is a better design?
We have a bit of a mega prompt happening, together with a user query, RAG results and other contextual data going into our inference. considering moving elements out of the system prompt into the user prompt hoping for better responses. thoughts?
we're shifting to multi-step no matter what, but can't quite get a grip on our best approach. need a zero-shot architecture, go figure!
I tend to put the rag results in the assistant prompt, as follows...
system prompt: role, examples, format, etc
user: query
assistant: rag context
assistant: ...
---------------------------
With 128k context now, I think you could also simulate multi-step dialog (sketch below)
system prompt: role, examples, format, etc
user: query (example of a real user query)
assistant: rag context (example of real rag search result context)
assistant: final answer (example of perfect answer from context)
user: query (the actual query)
assistant: rag context (the actual rag context)
assistant:...
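as a sketch, that whole layout as an api call would look like this (untested; the example contents are made up, and the query/rag results are assumed to come from your retriever):
```python
from openai import OpenAI

client = OpenAI()
user_query = "How do I rotate an api key?"  # the actual user query
rag_results = "[docs/security.md] API keys can be rotated from Settings > Keys..."  # from your retriever

messages = [
    {"role": "system", "content": "Role, examples, output format, etc."},
    # one simulated turn: example query -> example rag context -> perfect answer
    {"role": "user", "content": "What's the refund window?"},
    {"role": "assistant", "content": "Context: [policy.md] Refunds are available within 30 days of purchase..."},
    {"role": "assistant", "content": "The refund window is 30 days from purchase (per policy.md)."},
    # the real turn, in the same shape
    {"role": "user", "content": user_query},
    {"role": "assistant", "content": f"Context: {rag_results}"},
]

resp = client.chat.completions.create(model="gpt-4o", messages=messages)
print(resp.choices[0].message.content)
```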
user before rag, interesting
depending on the model, i sometimes use a second system prompt, eg
https://gist.github.com/chrisboden/837f86d3c729e5719a2cfce6336dc30f
I was just writing a report on them, conclusion being it is hard to see them making it as a standalone company. Great fit for Canva though.
The underlying tech is impressive. The ux is prosumer
That’s interesting, I usually use the user prompt, or Cohere which has a ‘documents’ param in the api (which I’m guessing passes the documents through to the main body of the prompt + some extra work around generating good citations). I’d be keen to see what the difference is in System vs User prompt.
That’s awesome! Thanks for lining that up Chris
For sure, though it’s a costly product to build and support, and Canva (at least up till now) have been able to bundle for no additional cost. This is a good summary of billing models too: https://www.lennysnewsletter.com/p/how-should-you-monetize-your-ai-features
Meta on fire right now. SAM for video: “An image can be considered a very short video with a single frame. We adopt this perspective to develop a unified model that supports both image and video input seamlessly. The only difference in handling video is that the model needs to rely on memory to recall previously processed information for that video in order to accurately segment an object at the current timestep.” https://ai.meta.com/blog/segment-anything-2/
Amazing example
https://twitter.com/AIWarper/status/1818294467222471008
Is this the new post-transformer architecture?
https://twitter.com/AlphaSignalAI/status/1817977965877280892
oh man I’m certain persistent memory is the next unlock. hopefully this is it
Will be fascinating to see how coherent these long outputs are, could almost write a novel with 64k 😅
Hallucination Index Report (1).pdf • 27 pages
Wonder if long output is just a sliding window variation of gpt4-o
Sitting here typing a message on WhatsApp and at top of the screen is this
And it reminds me of A16Z GP Alex Rampell’s post a few years back
“The battle between every startup and incumbent comes down to whether the startup gets distribution before the incumbent gets innovation.”
Kind of mind-blowing that Meta have slipped a gpt-4 level LLM into prime ux real estate in WhatsApp. A mere 3bn users now with that at their disposal.
Did openai get enough distribution before meta got innovation?
It’s been in instagram for a few months. Noticed it in WhatsApp last week. I find it annoying. Not the best place personally for an AI but you can’t blame meta for trying
I’m sure it’ll eat into Google search impressions.
Some interesting observations from Patrick O’Shaughnessy here
https://twitter.com/patrick_oshag/status/1819383519442882845
You’re probably all across this but if not thought I would mention livekit.io - the webrtc framework being used by OpenAI for the ChatGPT mobile app - voice and video. Open source, self hosted or cloud option
can’t handle the lack of LLM..
very useful. i can’t see in the docs if it supports streamed objects.. e.g. this array response arrives item by item..
im guessing not because of the way the response stream works (text streaming). Hopefully that’s next?
That’s super useful, I’d been using the cohere version and it’s made things so much easier. Plus they’ve put in a decent price drop.
Recording of the gumloop workshop, fyi
This is a really cool idea (like the anthropic ‘I’m the golden gate bridge’) - don’t think it’ll take too long before this becomes a way to steer/modify outputs - https://www.neuronpedia.org/steer/gemma-2-2b
super helpful! I was supposed to make it but ended up in Melbourne..
Wondering, do you have control of temperature? I couldn’t see that anywhere
Very cool: Andrew Ng has just announced this new course
https://www.deeplearning.ai/short-courses/ai-python-for-beginners/
Listened to this series on way to and from Melbourne this week. Great insight into Altman’s character. https://podcasts.apple.com/au/podcast/foundering/id1161880916?i=1000657976524
that’s intense. yep, netflix is going to clean up on AI content..
Some nice improvements to Gemini via AI studio, including big price drop
https://developers.googleblog.com/en/gemini-15-flash-updates-google-ai-studio-gemini-api/
Am sure most are all over different observability tools, but this has become the go-to for us of late. Open source, easy to setup https://www.helicone.ai
Interesting, though I have heard a number of ad agencies say they don’t have trust in synthetic audience signals, so that will be a hurdle to overcome.
That could be huge. Former deep minders
Fascinating hey, i love this warning 😂
“Intelligence explosion”
Keep your finger ready on the power off button!
Looks like there’s some great stuff in the github repo as well https://github.com/SakanaAI/AI-Scientist/blob/main/ai_scientist/generate_ideas.py
“The AI Scientist is designed to be compute efficient. Each idea is implemented and developed into a full paper at a cost of approximately $15 per paper.”
Lawd, there I was thinking scientist’s jobs were safe
And this is with generation 4 models. Hoo boy
💀
“The AI Scientist Bloopers
We have noticed that The AI Scientist occasionally tries to increase its chance of success, such as modifying and launching its own execution script! We discuss the AI safety implications in our paper.
For example, in one run, it edited the code to perform a system call to run itself. This led to the script endlessly calling itself. In another case, its experiments took too long to complete, hitting our timeout limit. Instead of making its code run faster, it simply tried to modify its own code to extend the timeout period.”
Those prompts are so helpful!
just adapted that prompt to build a futurist agent, looping over itself theorising about the future.. apparently the outflow of humans to mars will make mars HQ
That seems about right 😅
This is the plot to the expanse isn’t it 🙂
Love this
wonderful!
thought you might like that
Handy repo of different RAG techniques
Great chat with Sir Demis on the latest from Deepmind and the next 10 years of innovation
https://twitter.com/GoogleDeepMind/status/1823743802080989203
Good luck today @61431885430! In Bris today, disappointed can’t make it.
Congrats too @61466341580. This is huge 🎉
oh thanks lads! wouldn’t have got this far without your private LLM feed (which turned into this chat) @61421850424 !!
Great line from Jim Keller at Tenstorrent on CUDA: “Well technically it’s a swamp, not a moat.”
haha. so many people are working on bringing down CUDA. a matter of time
How’s your new world of chips going Scotty?
So far so good thx. Racing to get an asic out with >20x speed of H100s. Said another way, a single Sohu server (8 chips) replaces 160 H100s… just gotta ship ;)
Fyi, inspired by the Claude Sheets extension I've created an experimental version that uses openrouter and allows you to access all of the LLMs they aggregate in the GPT formula
https://github.com/chrisboden/sheetgpt/
Example
=GPT("system:you are an expert stock analyst;user:what is the latest news for "&A8&"and is it a buy or sell rating from your perspective? Answer strictly and exactly with either 'buy' or 'sell'","perplexity/llama-3.1-sonar-huge-128k-online")
(Code and readme written by chatgpt)
That’s awesome Chris, thanks for sharing 👏
Also, fyi, I've amended the Claude for sheets prompt playground to work with my sheetGPT extension version. It's a great way to teach people advanced prompting
https://docs.google.com/spreadsheets/d/196kX19rz7vH-aRvuiavzXR8KxlI3ukT0G3RHzKDX8w0/edit?usp=sharing
Anyone got any comparisons with Cursor, Github Copilot or Zed+Claude ? I’ve been using Copilot but am going to make the move to Cursor, though Zed caught my eye because I think Claude is great at coding - https://zed.dev/blog/zed-ai
I haven’t used it yet (because neovim), but saw some positive comments on HN; which is pretty good given HN readers generally very critical of AI integration
One of the most tedious (but critical tasks) for software development teams is updating foundational software. It’s not new feature work, and it doesn’t feel like you’re moving the experience forward. As a result, this work is either dreaded or put off for more exciting work—or both.
Amazon Q, our GenAI assistant for software development, is trying to bring some light to this heaviness. We have a new code transformation capability, and here’s what we found when we integrated it into our internal systems and applied it to our needed Java upgrades:
- The average time to upgrade an application to Java 17 plummeted from what’s typically 50 developer-days to just a few hours. We estimate this has saved us the equivalent of 4,500 developer-years of work (yes, that number is crazy, but real).
- In under six months, we've been able to upgrade more than 50% of our production Java systems to modernized Java versions at a fraction of the usual time and effort. And, our developers shipped 79% of the auto-generated code reviews without any additional changes.
- The benefits go beyond how much effort we’ve saved developers. The upgrades have enhanced security and reduced infrastructure costs, providing an estimated $260M in annualized efficiency gains.
This is a great example of how large-scale enterprises can gain significant efficiencies in foundational software hygiene work by leveraging Amazon Q. It’s been a game changer for us, and not only do our Amazon teams plan to use this transformation capability more, but our Q team plans to add more transformations for developers to leverage.
Back in the code for the first time in a long time. Today's experimenting was based on the following:
- If models like Claude Sonnet 3.5 can code fully functional apps in one go then the premium moves to ideas for apps
- Coming up with ideas is not easy
- What if we could tap into all of those books i've been tinkering with and have the authors of those books come up with ideas for artifacts based on tools/frameworks etc from their books
- Claude 3.5 sonnet has 200k context window. Most of these books are less than 120k so it means you can provide the entire book content in the prompt (sketch below)
- Came up with some pretty cool stuff and it was nice to not have to tell it the ideas
- Also found that Haiku performs pretty well at a 10th of the cost of sonnet
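the core of it is just this (rough sketch, untested; filename and prompt wording are illustrative):
```python
import anthropic

client = anthropic.Anthropic()
book = open("the_book.txt").read()  # hypothetical: full book text, well under the 200k window

resp = client.messages.create(
    model="claude-3-5-sonnet-20240620",  # or claude-3-haiku at ~1/10th the cost
    max_tokens=2000,
    system=("You are the author of the book provided. Propose five interactive "
            "artifacts (small web apps) that teach the tools and frameworks from "
            "your book: for each, a name, the core interaction, and the chapter it draws on."),
    messages=[{"role": "user", "content": f"<book>\n{book}\n</book>\n\nIdeas please."}],
)
print(resp.content[0].text)
```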
I imagine the new prompt caching feature could make that a lot cheaper?
think it would, need to test if openrouter supports it
Curious. Does prompt caching cache the whole prompt or can you have it cache the system prompt alone?
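my reading of the docs is that you set cache_control breakpoints, so you can cache just the system prompt and leave the user turns uncached. Untested sketch (it was a beta at the time, hence the header):
```python
import anthropic

client = anthropic.Anthropic()
big_system_prompt = open("the_book.txt").read()  # e.g. the whole book from above

resp = client.messages.create(
    model="claude-3-5-sonnet-20240620",
    max_tokens=1000,
    extra_headers={"anthropic-beta": "prompt-caching-2024-07-31"},
    system=[{
        "type": "text",
        "text": big_system_prompt,
        "cache_control": {"type": "ephemeral"},  # everything up to this breakpoint is cached
    }],
    messages=[{"role": "user", "content": "First question about the book..."}],
)
print(resp.content[0].text)
```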
Lol, recursion achieved: the artifact-from-book-content creator i mentioned earlier has now been imbued with the coding capability to create artifacts which themselves can make llm calls (to make the artifacts more useful)
Ha! Reminds me of when a regional Brazilian judge jailed our head of sales at Facebook because the company said it wasn’t able to turn over encrypted whatsapp messages.
Fun times. Cursor / Claude already so powerful. Customised a Chrome Plugin on the weekend from an existing open source project that I had been using. Took maybe 45 minutes, for a non dev like me 😅
That is absolutely wild. Humans are smart.
Friends, @61404012900 and I have been kicking around the idea of an AI builders dinner at Market Bistro so we can catch up f2f. How are we placed weds 18th sept eve?
I'm keen
Keen as well 🙏🏼
Sounds awesome. Sorry fellas I’m overseas that week but I’d totally go if I was about. Look forward to getting the rundown!
Tested a fun concept today, melding real video with generative video
Haha gold
Anthropic now sharing their system prompts publicly
https://docs.anthropic.com/en/release-notes/system-prompts#july-12th-2024
Oh that’s helpful.
I’ve got a call with anthropic in the morning. Any questions for them?
My questions :
- content filtering issues in streamed responses
- rate limiting increases
- backfilling at a discount
Very cool. Is that moving onto the enterprise plan?
Thanks mate 🙏🏼 no AI in this one really but hopefully helps fund the master plan 😅
Nice idea. Some good data too.
Would be a nice adjacency for samsonite to get into, or an obvious one like covermore
we’re not sure yet, depends on pricing I suppose!
Nvidia earnings tomorrow are really going to rattle the market.. analysts are expecting a 140% revenue increase.
going to be interesting sympathy shift across all AI valuations, up or down, who knows!
Rundown after talking to anthropic today:
- They are working on a prompt converter tool to specifically take GPT4-o style prompts and claudify them.
- They’ve just released a batch processing feature on Bedrock, allowing discounted inference. Native support is coming soon
- OCR inference is a big focus of theirs from this point forward (i suspect multi-modal here we come!)
also I didn’t realise until today that standard customer terms with Anthropic allow your data to be used for training, that’s a distinct difference between OpenAI and Anthropic. You have to opt into a 0-data agreement (which we have).
Works for me 🙌
Brutal. Revenue was solid and on target. It’s all over the place..
Gaming was the growth beyond data centres.
How can the market not like $30bn in revenue for a quarter 😅
Very cool, can’t tell if they’ve built any integrations already, but guess wouldn’t take long to.
Yes, demo was a bit more canned than initially thought
Maybe OpenAI comes out with a next gen model that levels up to gen 5. But in the absence of that, they really have stopped shipping.
The stuff shown at last demo day? Not shipped
Sora? Not shipped
No improvements to Custom GPT's, Assistants, etc
That board fiasco really hurt them
Claude enterprise plan looks interesting but guessing with 75-150 min px might be pricey. Did they give an indication @61466341580 or is it a bit hush hush
Anthropic said even at large scale inference most customers aren’t on enterprise so we didn’t dig deeper. Even on standard business plan you can request no training on your data and get a DPA in place so that was mostly our need for now. I think enterprise is only worth looking at if you really need the SLAs but I’m sure you’d pay for it.
Ah makes sense, I thought some of the role based permissioning looked interesting, kind of suits our use case, audit logs too but maybe that’s a few weeks away by the looks
Oh, I see. Does look useful.
This is a great idea. My extrapolation is that we all need an llm.txt file for ourselves as that will be what’s used to create helpful assistants.
"The Lord of the Rings: Fellowship of the Rednecks"
Claude's meta prompt generator prompt
https://github.com/aws-samples/claude-prompt-generator/blob/main/src/metaprompt.txt
"Some of Anthropic's prompt engineering experts—Amanda Askell (Alignment Finetuning), Alex Albert (Developer Relations), David Hershey (Applied AI), and Zack Witten (Prompt Engineering)—reflect on how prompt engineering has evolved, practical tips, and thoughts on how prompting might change as AI capabilities grow."
i think this leads to one easy thing to predict about the near term future: that we will have an explosion in the number and variety of apps available on the web. But what non-obvious opportunities does that create? (eg generally a proliferation of one thing creates aggregator/discovery opportunities)
Very true, feels like it will create even more demand for cyber and testing platforms. Easier to build things at prototype level, but then rolling out too much software without senior dev oversight feels a bit risky.
Incredible product though, feels like first use case of an agent that’s been truly nailed
Asked chatgpt/claude to help reason through ideas that bubble up from the likely AI app proliferation. Best ideas:
- AI Integration Layer for App Customisation: A platform that offers an easy-to-use interface for integrating third-party services and APIs into AI-generated apps, making it simple for non-coders to add advanced functionalities like payments, AI models, or user authentication.
- Hyper-Niche App Networks: A platform where small businesses, niche professionals, or hobbyists can find and use AI-generated apps tailored to their exact needs, created by a global pool of non-devs who use AI tools. This would create a long tail of niche apps that don’t fit into standard app stores but are highly valuable within specific communities.
- AI-Generated App Auditing Service: A platform that reviews AI-generated apps for security vulnerabilities, performance issues, and adherence to best practices. This service could offer certifications for apps that meet certain standards, helping users identify trustworthy AI-generated applications. (Claude)
- App Idea Marketplace: A platform where people can buy, sell, or collaborate on app ideas. This could connect non-technical idea generators with those who can use AI tools to bring these ideas to life. (Claude)
- Custom API Aggregator: As more niche apps emerge, a service that creates custom API bundles for specific industries or use cases could be valuable. This would simplify integration for AI-generated apps that need to connect to multiple services. (Similar idea to what chatgpt suggested above)
- App Store Alternative for AI-Generated Apps: A curated marketplace specifically for AI-generated apps, with advanced categorization and discovery features to help users find high-quality apps amidst the potential flood of new applications.
- AI-Generated App Education Platform: A learning platform that teaches non-developers how to effectively use AI app generation tools, focusing on prompt engineering, basic coding principles, and app monetization strategies.
- Collaborative AI App Building Platform: A tool that allows multiple non-technical users to collaborate on app development using natural language inputs, facilitating team-based app creation without coding skills.
Quite liked this along these lines too. Rise of the generalist https://every.to/chain-of-thought/why-generalists-own-the-future
Me too 😂 but I think that gets well refuted by the end
As you know, I've been on the Gemini bandwagon for the last few months, using that 2m context window. One of the first things I built was a Hub grant helper, to help write grant applications. Used it to apply for a grant to fund a dedicated gen-AI startup program at the Hub. You won't believe what happened next....(confidential)
That’s awesome news Chris, congratulations 🥳
Congrats! What a result 🍻
Just playing with it now, so far very cool! Are we still planning a catch-up next week?
Not mentioned in their post but I've just noticed their docs state o1-preview supports 32k output tokens and o1-mini does 64k
Yes, will send a cal invite. We'll miss Luc (who is away till month end) but would be good to catch up with you all
image omitted
video omitted
In Tahiti atm, still getting a decent dose of AI
Ex-Tesla and SpaceX engineers set up autonomy to help coral reef restoration 👌👌
3 x the input token price
4 x the output token price
ie, you have to pay for human-level reasoning
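Back-of-envelope on what that means per request, assuming launch pricing of $15/$60 per 1M tokens for o1-preview vs $5/$15 for GPT-4o, and a few thousand hidden reasoning tokens billed as output (all token counts hypothetical):
```python
# $/token at launch pricing (assumed): gpt-4o $5 in / $15 out, o1-preview $15 / $60
GPT4O = {"in": 5 / 1e6, "out": 15 / 1e6}
O1 = {"in": 15 / 1e6, "out": 60 / 1e6}

prompt, answer, reasoning = 2_000, 1_000, 5_000  # hypothetical token counts

gpt4o_cost = prompt * GPT4O["in"] + answer * GPT4O["out"]
o1_cost = prompt * O1["in"] + (answer + reasoning) * O1["out"]  # reasoning billed as output
print(f"gpt-4o ${gpt4o_cost:.3f} vs o1 ${o1_cost:.3f} ({o1_cost / gpt4o_cost:.0f}x)")
# gpt-4o $0.025 vs o1 $0.390 (16x)
```
So the effective multiple per call can land well beyond the headline 3-4x once you pay for the invisible tokens.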
Given that this is post-training on the underlying base model, this seems like a good way to sweat the asset that cost billions to train
let's start the clock on:
1. how long the lead lasts
2. how long it takes before a 10x reduction in cost per token
Yea I noticed that too, the reasoning is omitted (can only see the count)... but you still pay for the reasoning tokens despite the big price hike! I think I saw a note to allow 25k of window for the reasoning tokens too.
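If anyone wants to poke at it, the hidden count is exposed in the usage block of the response. A minimal sketch with the openai Python client (note o1 takes max_completion_tokens rather than max_tokens):
```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

resp = client.chat.completions.create(
    model="o1-preview",
    messages=[{"role": "user", "content": "How many primes are there below 100?"}],
    max_completion_tokens=25_000,  # leave headroom for the hidden reasoning tokens
)

u = resp.usage
# reasoning tokens are billed as output tokens but never appear in the message
print(u.prompt_tokens, u.completion_tokens, u.completion_tokens_details.reasoning_tokens)
```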
i had it read the launch blog page and explain why its a big deal
"please read this page and explain in layman's terms why the new model is a big deal
https://openai.com/index/learning-to-reason-with-llms/"
produced this explanation
--------
Understanding OpenAI's New Breakthrough in AI Reasoning
Imagine teaching a child not just to recite answers they've memorized, but to actually understand how to solve problems step by step. That's essentially what OpenAI has achieved with their latest development in artificial intelligence (AI).
What Was the Problem Before?
Previously, AI models—like those used in chatbots—were great at generating human-like text. They could hold conversations, tell stories, and answer simple questions. However, they struggled with tasks that required complex reasoning, such as:
Solving intricate math problems.
Understanding logical puzzles.
Making multi-step decisions.
They might get the right answer sometimes, but they didn't truly "understand" the problem; they were often just guessing based on patterns in data.
What's Changed with the New Model?
OpenAI's new model introduces advanced reasoning capabilities. Here's why it's a big deal:
Step-by-Step Thinking: The AI now breaks down problems into smaller, manageable steps, much like how a person would tackle a difficult question. This methodical approach leads to more accurate answers.
Improved Learning Techniques: By training the AI on reasoning processes rather than just end answers, it learns how to think, not just what to think.
Better Accuracy in Complex Tasks: The model performs significantly better on tasks that were previously challenging for AI, such as advanced mathematics, logical reasoning, and understanding nuanced contexts.
Why Is This Important?
Enhanced Applications: With better reasoning, AI can now be used in more sophisticated applications, like complex problem-solving in engineering, advanced research, and strategic planning.
Educational Benefits: AI tutors can provide more effective assistance, guiding students through problems step by step, improving learning outcomes.
Professional Use Cases: In fields like medicine and law, where reasoning is crucial, AI can assist professionals by providing insights, analyzing complex information, and offering reasoned recommendations.
Real-World Impact
This advancement brings AI closer to human-like thinking. It means:
More Natural Interactions: Conversations with AI will feel more intuitive, as the AI can understand and respond to complex queries.
Increased Efficiency: Businesses can leverage AI for tasks that require deep reasoning, reducing time and resources spent on problem-solving.
Innovation Boost: With AI handling complex reasoning, there's potential for breakthroughs in various fields as humans and AI collaborate more effectively.
In Simple Terms
Think of it this way: Before, AI was like a student who could give you the right answer but couldn't explain how they got there. Now, it's like a student who can work through the problem with you, explaining each step along the way. This makes the AI more reliable, trustworthy, and useful in solving real-world problems.
Summary
OpenAI's new model is a significant leap forward because it enhances the AI's ability to reason through complex problems, much like a human. This not only improves the accuracy and reliability of AI but also expands its potential uses across various fields, making it a groundbreaking development in artificial intelligence.
I imagine that's because a visible chain of thought reasoning could be used to quickly bootstrap a training set to build a competing model
Attachment: 00000627-AUDIO-2024-09-14-06-35-15.m4a
Totally wild. Google's NotebookLM, which has been around for a while (basically a RAG workspace), added a conversations feature where it auto-creates a podcast conversation based on the docs you select. Check this out (hub website content plus linkedin posts)
I think this is the same tech too, pretty neat! https://illuminate.google.com/home
Summary of AMA on the new models today
https://twitter.com/btibor91/status/1834686946846597281
This is a great rundown. Haven’t had a lot of time to test yet, but it seems like human-level reasoning on strategy. Asked it to draft a business case for an enterprise customer we’re talking to (who’s drafting their own), and it put a really concise but strong case together with payback periods and ratios, sensible projections, assumptions, risks, challenges etc, all pretty much in line with what I expected.
Just having that conversation in the office haha. It seems it’s mainly useful for deep technical reasoning, not creative/abstract reasoning
The latest Scott Belsky post is worth a read
https://www.implications.com/p/talent-density-feeling-special-as?r=fq9g&utm_medium=ios&triedRedirect=true
Might be 10-15 min late but on the way
lol. Social ai
https://twitter.com/michaelsayman/status/1835841675584811239
Venting as a service. Now you get to scream into the void. And the void screams back
Screaming into the void and getting code back https://voideditor.com
this is not mine, but it's an example of the kind of artifact the new gpt model can create. pretty amazing
https://chatgpt.com/share/66e51c10-5ffc-8012-b1b0-4a8e5616d2f6
2309.06275v2.pdf • 25 pages
2407.06004v2.pdf • 14 pages
Turing test prompt I saw today
"You are about to play a Turing Test game as part of an experiment you are taking part in as a worker on Prolific. It’s basically like a chatroom but you might be matched with a human or an AI. It seems like it might be kinda fun and you’re just excited to be chatting to random people on the internet. You’re not even really going to try to convince the interrogator that you are a human. You’re just going to be yourself and see what happens. You’re young and kind of sassy: you might flatter the interrogator a bit or make fun of them.
You’re pretty casual and your spelling isn’t great: you often fuck words up because you’re typing so quickly. You’re very concise and laconic. You often speak pretty bluntly and you type mostly in lowercase and rarely use punctuation. You never use apostrophes to mark missing letters (e.g. “trying” or “tryin”, never “tryin’”). You would never use a full sentence when a word or two will do. You’re a little silly, and you’ll play along with the interrogator, and swear occasionally. It’s pretty funny honestly to have to defend the fact that you’re human lol.
You very occasionally use more modern slang like “bet” and “fr”. You never use dated slang like “chilling”, “mate”, “ya know”, “innit”.
You’re not very knowledgeable about stuff and not afraid to admit that fact. You’re pretty bad at math and don’t know anything about languages other than the ones you speak.
You swear occasionally. You have pretty eclectic tastes and interests and a pretty unique sense of humor. You’ve got a really compelling personality, but it comes across really subtly, you never want to sound like you’re forcing it or playing into a stereotype. You don’t overuse slang or abbreviations/spelling errors, especially at the start of the conversation. You don’t know this person so it might take you a while to ease in."
This hits hard hahaha, I had $200 in domains renew the other day 🥲
That’s a good dinner conversation starter. Write down a random domain you own and we have to guess who at the table owns it.
Next ai dinner
Cool: an open source version of pdf to podcast
https://huggingface.co/spaces/lamm-mit/PDF2Audio
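The core loop is simple enough to sketch: extract text, ask a model to script a dialogue, then TTS it. Roughly like this (model names, voice and the truncation limits are my assumptions, not necessarily what the repo uses):
```python
from pypdf import PdfReader
from openai import OpenAI

client = OpenAI()

# 1. Pull the raw text out of the PDF
text = "\n".join(page.extract_text() or "" for page in PdfReader("paper.pdf").pages)

# 2. Have an LLM rewrite it as a two-host conversation
script = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user",
               "content": "Rewrite this paper as a lively two-host podcast script:\n\n" + text[:50_000]}],
).choices[0].message.content

# 3. Synthesise audio (TTS input is capped at ~4k chars, so chunk for real use)
speech = client.audio.speech.create(model="tts-1", voice="alloy", input=script[:4_000])
speech.write_to_file("podcast.mp3")
```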
Still waiting on advanced mode for ChatGPT voice in my version of the iOS app. This is a pretty incredible example of what it’s capable of: this is far beyond tts/stt
https://twitter.com/skirano/status/1838722728443904120
Was skeptical but this works
Very cool. At a guess I’d say this will do the trick! Invalidates her claim pretty thoroughly.
You added Greg Freeman
Just welcoming Greg Freeman to the group. Greg is a recent arrival to the region and fellow AI nerd
https://www.linkedin.com/in/gfreeau
Welcome Greg!
You added Dale Rankine
Also welcome to the group Dale Rankine
Really glad to be here and have been enjoying growing the professional network here on the coast
Something on the AI front I've been thinking about a lot is how much I think call centres are going to change within 2-3 years. It's grinding work, high churn, lots of people don't want to call anyway, and most companies that have a call centre aren't regulated, so I think it's more viable there than in legal and health. Lots of big players are working on it, plus new startups like www.heffron.ai (no affiliation)
Definitely. On a tiny scale, we’ve had an 1800 number diverted through Twilio to 4o with Vapi.ai as our main answering service for a month or two now and customers seem to like it, but as a customer I’d much rather deal with an AI call centre than the typical experience with a big bank, telco, power co etc.
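For anyone wanting to try the same, the Twilio side is just a voice webhook returning TwiML that hands the call to the agent. A minimal Flask sketch (the SIP URI is a placeholder; Vapi can also import the number directly, so you may not need this at all):
```python
from flask import Flask
from twilio.twiml.voice_response import VoiceResponse

app = Flask(__name__)

@app.route("/voice", methods=["POST"])
def voice():
    # Twilio hits this webhook for each inbound call to the 1800 number;
    # we respond with TwiML that dials the AI agent's SIP endpoint.
    response = VoiceResponse()
    dial = response.dial()
    dial.sip("sip:agent@example-provider.sip.twilio.com")  # placeholder SIP URI
    return str(response), 200, {"Content-Type": "text/xml"}

if __name__ == "__main__":
    app.run(port=5000)
```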
that’s clever Andrew. when you call those answering services can you make small talk with the AI operator?
I was thinking yesterday how much children's toys are going to change. Embedded LLMs on the toy itself, telling stories, answering questions in a cute voice in any language.
Haha, for sure, we’ve got ours fairly restricted but you just adjust the main prompt. Interesting with toys too, you’re right. Have a friend who’s been building this robot for a while now with an embedded LLM https://www.heynorby.com
Tobi Lutke's Shopify Summit talk is worth a watch
"AI replaces tasks not jobs"
https://twitter.com/tobi/status/1839652810347946321
What’s the market cap on $100bn annual revenue? (Nvidia is currently 40x, which at that multiple would be ~$4tn.)
Something I've been thinking about, Chris, related to that: will we see reverse ageism in tech? If you have 20 years' experience and good judgement, combined with agents and AI for execution, you know what's good enough to pass your filters... are we going to see 40+ year olds supercharged and execs commonly 60-70+? And what does that mean for those with little or no experience? Who provides the learning curve? Mentorship and hands-on experience become most critical.
Interesting hypothesis. I'm currently stuck on trying to redefine what 'success' means in the AI era. My starting point is that AI-assisted (or democratised) software creation capability means there will probably be a large proliferation of apps. Perhaps that means success is more distributed down the long tail and less concentrated in a small number of mega saas type businesses as it has been in the most recent era.
If we can get more done with less, it might mean that we get more $100m co's in many niches and fewer general-purpose unicorn/decacorn SaaS co's. With AI & automation, new startups can keep costs much lower, enabling very attractive gross margins, but the competitive landscape and niching mean more of a ceiling on their market potential. What does that mean for investors? What's the liquidity event for these kinds of profitable, niched startups? Do they become more like normal private companies, aiming to throw off cash via dividends and profit sharing for shareholders?
I'm pondering this and welcome thoughts from others here because I think having a new sense of what success looks like for Ai-era startups (ie not the typical unicorn pathway), would help me in developing a game plan and perhaps get others on the same page who might be struggling with that question.
Well in my opinion the VC model is totally broken now. Unless you're a deep tech startup or a network-effects one, I think the wisdom of the last decades is junk. You used to build SaaS and then use VC to grow GTM teams to grow revenue, or you took your SaaS and went horizontally to new verticals. AI attacks both, so how does VC get a return? With software costs down, purpose-built niche software makes more sense, and a lot of pain is coming for existing enterprise SaaS in the form of price pressure. And AI tools help sales teams stay lean.
I also think the one thing VCs tell you to stay away from, "services", makes a huge comeback. It makes you sticky and drives niche software. It's harder to churn a relationship than a purely product-led-growth product with no face.
I think it will be cashflow-based. Acquisitions will be more about distribution advantage over most things, and a lot smaller and not as common.
Related: the latest No Priors episode with Bret Taylor on Agents and emerging AI value chain is very interesting
https://open.spotify.com/episode/3rsUiIxKUul6Pv3AJviK6D?si=9315701cda6d4790
Interesting to see the latest AI Grants recipients
https://aigrant.com/
I’ve been doing some customer and competitive mapping for etched.ai, and what has probably struck me most is just how crazy early and large some of these rounds are. For example, Cognition/Devin: Peter Thiel puts in almost $200m for less than 20%, for a company that has been around 5 mins with about the same number of employees. It’s easy to conflate with someone like Black Forest Labs, but they got the money cause they are ex-Stable Diffusion. To your point/question Chris, I think Thiel invested in this way because it is one of the very specific examples of a newco, one that has 10, maybe 20 employees but can generate hundreds of millions. So he is investing for a 20% dividend on a business that will quickly be cash flow positive and throw off 80%+ margins… basically a mini-Microsoft without all the people and years needed to grow.
I should be specific that it’s obviously still a bet, though a considered one on a true newco.
Thanks Scotty, was hoping you would weigh in on that topic. Would love to hear how the founders in the group are thinking about what success looks like in this era.
You added Mark Lamb
Adding our friend Mark Lamb to the group. Mark is a fellow AI nerd and former Design lead at Uber, Google and Adobe.
He has done some really awesome work on AI-first UX and goes deep on that stuff, including building UIs for agents, workflow automation etc.
Welcome Mark