
Gemini 1.5

Barely a week after announcing Gemini 1.0 Ultra, its most powerful AI chatbot to date, Google DeepMind announced Gemini 1.5, an apparently massive upgrade.


Context Length

Unless you are an AI professional or a fanatical hobbyist, you may not be entirely clear about what context length is. The simple way to think of it is as the total number of words in the interaction between a user and the AI. It isn't really measured in 'words' but in 'tokens', the numbers that encode the text, but for everyday purposes 'words' is good enough. A token represents roughly four characters - letters, spaces, punctuation, etc. - and there are generally fewer words than tokens because longer words are split across several tokens, but you will get my drift. Basically I think you can work on the basis that 1,000 tokens is about 800 words, give or take.
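If you want that rule of thumb in code, a back-of-envelope version is easy. The sketch below only applies the approximation above; a real tokenizer (OpenAI's tiktoken library, for instance) counts tokens exactly, and the characters-per-token figure varies by language and model.

```python
# Back-of-envelope conversions using the rule of thumb above:
# ~4 characters per token, ~0.8 words per token. Ballpark only;
# a real tokenizer (e.g. tiktoken) gives exact counts.

def estimate_tokens(text: str) -> int:
    """Roughly estimate a text's token count from its character length."""
    return round(len(text) / 4)

def tokens_to_words(tokens: int) -> int:
    """Convert a token budget into an approximate word count."""
    return round(tokens * 0.8)

print(tokens_to_words(1000))  # -> 800, the 'give or take' figure above
```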


So suppose you want to interact with an AI and do three things:

  • upload a document

  • ask the AI to read, summarise and comment on the document

  • ask questions about the document

If the document is 1,500 words and your 'context length' is 4,000, then after the AI has read the document and you have explained what you want it to do, only around 2,000 words/tokens will be left. Your conversation can then be only 2,000 words long, and that count includes what you say and ask as well as what the AI says.


If your document is much longer, say 3,500 words, then after the AI has read it only 500 words/tokens are left in the context length; barely enough for a decent conversation.


If your document is 6,000 words and the context limit is 4,000, the AI won't be able to read it at all. You are dead!
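The arithmetic behind those three scenarios is easy to sketch. The snippet below is only an illustration, treating words and tokens as interchangeable in the way the text does; the 4,000 figure is the hypothetical limit from the example, not any particular model's.

```python
# A sketch of the context-budget arithmetic above. Words and tokens are
# treated as interchangeable, as in the text; your instructions and the
# AI's replies also draw from whatever remains.
CONTEXT_LIMIT = 4000

for doc_words in (1500, 3500, 6000):
    remaining = CONTEXT_LIMIT - doc_words
    if remaining <= 0:
        print(f"{doc_words}-word document: too big, the AI can't read it at all")
    else:
        print(f"{doc_words}-word document: ~{remaining} words left for the conversation")
```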


That is why longer context lengths matter. Once we had only about 2,048 tokens (2K); then 4,096 (4K); then 32,768 (32K); then 128K; then Claude 2.1 from Anthropic offered a whopping 200K. Claude could read a book of 150,000 words and still have space for a conversation. But that is only one fairly long book. What if you wanted to upload the complete works of some author, or your own life's work? It would probably be too long.


Gemini 1.5 promises to change all that. With a standard 128K context length it is already 'a beast', comparable to GPT-4 Turbo, but it promises much more: at first 1 million tokens of context length; eventually 10 million. That is a game-changer: you could upload everything most authors have written all at once and have a long, detailed conversation about it afterwards.
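Using the same rule of thumb as before, each jump in context length translates into roughly the amounts of readable text below. This is a hedged back-of-envelope sketch, not a set of official capacity figures.

```python
# Roughly how much text fits in each context window, using the
# ~0.8 words-per-token rule of thumb from earlier. Order-of-magnitude
# estimates only.
windows = {
    "2K": 2_048,
    "4K": 4_096,
    "32K": 32_768,
    "128K": 128_000,
    "200K (Claude 2.1)": 200_000,
    "1M (Gemini 1.5)": 1_000_000,
    "10M (promised)": 10_000_000,
}

for name, tokens in windows.items():
    print(f"{name:>18}: ~{int(tokens * 0.8):,} words")
```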


But that is not all. Not by a long chalk!


Multimodal Operation

So far I have only mentioned text, but Gemini 1.5 does far more than text. It can analyse images, video and audio files, extracting information from them well enough to pass the 'needle-in-a-haystack' test: retrieving a single specific detail buried anywhere in an enormous input. It can translate, learn new languages, and interpret rough hand-drawn sketches only vaguely recognisable for what they are. If you were writing a PhD on Kant, you could upload the entire Kant corpus and then some, ask it innumerable questions, and get authoritative, cross-referenced answers.


Education

Now we can start to think seriously about a pan-curricular personal tutor, because the entire syllabus content of secondary education can be uploaded and everything a student does can be cross-referenced against it to create a comprehensive personal profile. The RIDE-AI server will store that profile, update it, write reports based on it, and generally monitor the educational journey of every student in a way that no teacher has ever been able to do.


Cost

Apart from the cost of the hardware needed to run such systems, there is a much more serious question about access costs. Users are charged on the basis of the total token consumption of their interaction, and that is where something new needs to happen before using 10 million tokens becomes affordable.


OpenAI charge (at 21 February 2024 prices) $0.01 per 1,000 tokens of input and $0.03 per 1,000 tokens of output. So uploading 1,000,000 tokens at that price costs 1,000 x $0.01 = $10; 10 million tokens costs $100. Responses from the AI in those quantities cost three times as much. That is a lot of money if we are doing a lot of it! And of course there are questions about reloading data, and how much that costs, which I won't go into here.
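Here is that sum as a small calculator. It is a sketch only: the rates are the ones quoted above and will certainly have changed by the time you read this.

```python
# The cost sum above in code. Rates are the 21 February 2024 prices
# quoted in the text; substitute current ones before relying on this.
INPUT_PER_1K = 0.01   # dollars per 1,000 input tokens
OUTPUT_PER_1K = 0.03  # dollars per 1,000 output tokens

def cost(input_tokens: int, output_tokens: int = 0) -> float:
    """Dollar cost of one interaction at the rates above."""
    return (input_tokens / 1000) * INPUT_PER_1K + (output_tokens / 1000) * OUTPUT_PER_1K

print(cost(1_000_000))     # $10.00 to upload a million tokens
print(cost(10_000_000))    # $100.00 for ten million
print(cost(0, 1_000_000))  # $30.00 for a million tokens of replies
```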


Accelerated Change

By the time you read this it will be out of date. A week is a long time in AI. OpenAI have just announced Sora, their text-to-video engine, and it seems likely to overwhelm traditional film-making. The only question is whose imagination is big enough to capitalise on these extraordinary developments from an educational perspective. Or any other.
