#25: Force vs Flow (Multi-Modal Thinking)

Authentic Value Generation With Multi-Modal Systems

Breaking News This Week: Gemini (Natively Multi-Modal AI)

Google’s long-awaited ChatGPT competitor was released earlier this week (long-awaited in AI terms; it’s only been a few months).

Google Gemini has entered the AI race, multi-modally.

Google is an “AI-first” company that has been building towards this moment for over 10 years. They may have launched late relative to the popular success of ChatGPT, but I have no doubt they were biding their time, preparing to enter the game properly.

Notably, Gemini has already been shown to outperform GPT-4 across almost all categories. Here are some of the benchmarks (note that some of the benchmarks differ between the two models):

[Image: Gemini vs GPT-4 benchmark comparison (source: Google)]

Perhaps even more notably, for the first time that I am aware of, Gemini scored higher than human experts on the MMLU (Massive Multitask Language Understanding) benchmark.

The MMLU spans 57 subjects, including math, physics, history, law, medicine, and ethics, testing both world knowledge and problem-solving ability.

Expanding on last week’s discussion of the hype around OpenAI’s Q*, a model rumoured to be capable of reasoning: Gemini has reasoning built in.

In other words, Gemini can use reasoning to “think” before answering your questions, so that it does not just spit out the first thing that comes “to mind” but instead takes time to process its response and craft a better answer than its first impression.

Even more exciting (to me at least), Gemini is the first AI model that has been natively trained multi-modally.

Multi-modality: code, image, video, and audio. Four simple words, but literally new worlds of potential to be woven between them.

Are you ready?

Recalibrating Recap

Welcome to Recalibrating! My name is Callum (@_wanderloots)

Join me each week as I learn to better my life in every way possible, reflecting and recalibrating along the way to keep from getting too lost.

Thanks for sharing the journey with me ✨

Last week, we touched on what it means to be original in the Age of AI and how writing can be used to identify our self amongst the noise of others.

This week, we are going to continue by discussing the importance of viewing life multi-modally and how AI can be used to augment our originality.

At the end, I provide a series of steps on how you can begin building your data systems to prepare for the upcoming AI innovations.

The Bigger Picture (Why you should care)

Source: Google – Figure 2 | Gemini supports interleaved sequences of text, image, audio, and video as inputs (illustrated by tokens of different colors in the input sequence)

Gemini is the first consumer model I am aware of that has been natively built to be multi-modal.

Natively multi-modal means that it is not just text (or code), not just vision (image and video), and not just audio, but a blend of all three.

While other models are capable of multi-modality (e.g., GPT-4 with Vision), those models were trained separately on different modes (image, audio, text) and then blended together afterwards.

In contrast, Gemini was multi-modally pre-trained from the beginning and multi-modally fine-tuned after that.

In other words, Gemini was built for multiple sensory inputs from the beginning, allowing for a more holistic approach to taking in input data and crafting a well-reasoned response.
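To make the distinction concrete, here is a toy sketch (purely illustrative, not Gemini’s actual architecture; the modality labels and placeholder tokens are my own assumptions) contrasting the single interleaved sequence a natively multi-modal model sees with the split-by-modality view a late-fusion system works from:

```python
# Toy sketch: interleaved multi-modal input vs. late-fusion buckets.
# Labels and placeholder tokens are illustrative, not a real model's format.

# Natively multi-modal: one interleaved sequence, consumed by one model,
# preserving the order in which the modalities appear.
interleaved_input = [
    ("text", "What is happening in this clip?"),
    ("image", "<frame_001>"),
    ("audio", "<waveform_chunk>"),
    ("video", "<frame_002..030>"),
    ("text", "Answer in one sentence."),
]

def modalities_in_order(sequence):
    """Return the modality order exactly as a single model would see it."""
    return [modality for modality, _ in sequence]

def split_by_modality(sequence):
    """Late fusion: route each modality to its own bucket/model first,
    losing the original interleaved ordering in the process."""
    buckets = {}
    for modality, content in sequence:
        buckets.setdefault(modality, []).append(content)
    return buckets

print(modalities_in_order(interleaved_input))
print(sorted(split_by_modality(interleaved_input)))
```

The point of the sketch: the interleaved view keeps the relationships between neighbouring pieces of text, image, audio, and video; the split view has to reconstruct those relationships after the fact.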

But why does that matter? Who cares if an AI model is multi-modal?

To answer that, I want you to start thinking bigger about where the world of AI could be headed.

Even more importantly, I want you to imagine your own AI world and what that looks like. Does it involve only text, or is it a multi-sensory world?

My guess is the latter.

Multi-Modal House Building Analogy

To explain the value of a multi-modal system, I thought it would be helpful to use a housing analogy to compare building an AI model to building houses.

Ironically, I used ChatGPT to provide this analogy (with my own edits):

Developing a multi-modal AI system is like constructing a house. When built from scratch, integrating electricity, plumbing, and design simultaneously creates a seamless, efficient structure. Similarly, training an AI model with video, audio, and image data from the outset ensures a cohesive understanding of their relationships, like the integrated components of a house.

Conversely, training separate models for each modality is like assembling pre-made house sections before combining them. This approach may lead to mismatches and inefficiencies, similar to combining separately trained AI models, which could struggle with the intricate relationships between different data types.

Training an AI system multi-modally from the start fosters a deeper comprehension of data interplay, resulting in a more robust and efficient AI, much like the harmonious design of an intricately built house.

In other words, building the model from the ground up with multi-modality creates a stronger and more versatile model that can receive multi-modal inputs with ease.

And this is only Gemini version 1.0. Imagine where this technology will be in 1 year, 5 years, 10 years 🤯.

That said, right now, I am unable to test the capabilities of Gemini (through Google Bard) since Google and Canada are in a dispute over the value of digital information presented as “news”.

Thankfully, this fight seems to be coming to an end.

Regardless of whether Gemini performs better than whatever the next hyped AI model is, this shift towards multi-modality provides an arrow pointing to the future of AI systems and where they are headed.

A calibration of the compass we have been discussing over the last 24 weeks.

But… and this is a big but… why should we care about multi-modality?

Perhaps an even better question, how can we prepare ourselves for a world that will continue to be disrupted by AI innovations?

Force vs Flow (Single Modal vs Multi-Modal)

Now that you have a better understanding of this week’s changes to the AI world, let’s take a step back to review our own humanity and how we can humanly leverage these new tools to increase value and better our own experience.

Over the last few weeks, I have been explaining the concept of self-actualization as it relates to the cognitive and aesthetic needs of Maslow’s Hierarchy. As part of this discussion, I have noted the failure of our logic-based society to value creativity as it should.

I used the comparison of logic and creativity to illustrate that our society tends to favour structure over flexibility and conformity over creativity: thinking inside the box, the way things have “always” been done.

Unfortunately, forcing creative energy into a box is not a natural fit. If an object does not align properly with its container, forcing it to fit will likely damage both the object and the container.

The point of creativity is that it is flexible and can go its own way. Creativity needs to flow.

In a way, creativity is like a non-Newtonian fluid (remember that weird fluid that becomes a solid when put on a speaker?).

In physics, a non-Newtonian fluid is a substance that does not obey Newton’s law of viscosity: its behaviour changes depending on the amount of force applied to it.

In other words, if a large force is applied quickly (high impact), the substance behaves like a solid. Once the force is removed, it returns to behaving like a liquid.
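For the physics-curious, the simplest way to capture this is the power-law model of fluid viscosity, where resistance scales with how quickly force is applied. A minimal numerical sketch, with illustrative constants rather than measurements of any real fluid:

```python
# Power-law (Ostwald-de Waele) model of a non-Newtonian fluid:
# apparent viscosity = K * shear_rate**(n - 1).
# K and n below are illustrative values, not measured properties.

def apparent_viscosity(shear_rate, k=1.0, n=1.8):
    """For a shear-thickening fluid (n > 1), viscosity RISES with force."""
    return k * shear_rate ** (n - 1)

gentle = apparent_viscosity(0.1)    # slow, gentle force: flows easily
sudden = apparent_viscosity(100.0)  # sudden, high-impact force: stiffens

print(gentle < sudden)  # True: more force, more resistance
```

When n > 1 (shear-thickening, like the cornstarch mix on the speaker), pushing harder makes the fluid stiffer; remove the force and it flows again.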

Trying to force creativity into conformity is what causes people to break. Force creates a solid that is unyielding, unable to flow in its natural state. It is only once the conformist force is removed that we can relax into our creativity and begin to flow.

Unfortunately, as long as that force is applied, it becomes difficult, if not impossible, to relax into our natural form.

Self vs Other

We have a human desire to fit in, to be accepted. Rather than fight to hold onto our own individuality, we often allow ourselves to be put into boxes that others build for us, even if we don’t fit properly.

The thought of standing out introduces anxiety, an innate fear that we will be alone or left behind. That we will not be valued by others.

This feeling is natural, a part of our evolutionary psychology. I talk more about this when discussing Story and Community in Level 3 of Maslow’s Hierarchy.

Unfortunately, operating in a state of anxiety can lead to burnout, but so can operating in a manner that goes against our own values (burnout or boreout). Either we allow ourselves to be forced into the box and operate out of tune with our own signal, or we fight against the force and break along the way.

Lose-lose.

Over time, whether through force or acquiescence, people begin conforming to the expectations of others so that they fit in their box rather than be ejected from the system entirely. They become single-modal.

In a way, much of our society operates as a world trying to use a single mode in an AI system. It might be good in specific use cases, but it’s not very flexible when it comes to adapting to change and modernization (an increasingly important skill).

Humans are more than one mode. We have 5 main senses and many other subtle senses we do not think about. Humans are multi-modal.

Trying to force our unique value and perspective as individuals into a single mode box isn’t good for anyone in the long term, even if it brings economic value to corporations in the short term. Inauthenticity is good for no one.

There has to be a better way.

Shifting Societal Systems

Thankfully, AI systems are becoming more than one mode. They consider a variety of communication methods so that they can gather the bigger picture of an individual and society. They can help us tell our stories in a more natural way. Natural self-expression.

Sure, we can tell our stories through text (I love reading), but there is something special about hearing the story told through the voice of the storyteller. Even better, when that voice is accompanied by visuals, we can send our imagination even deeper.

Rather than being forced into the same box as everyone else, we can build AI systems that allow for personalized translation between others. My box doesn’t have to be the same shape as your box and AI can still help map between the two. I can speak my language and you can speak yours, and if there is misunderstanding, we can use AI to help translate.

I foresee AI systems operating as middleware, an intermediary between individuals, corporations, governments, and society at large. Translators.

However, to leverage these tools effectively, we need to forget much of what we thought we knew about the operation of our knowledge worker society.

Our minds are much more complex than operating through text alone, which is why we use so many forms of non-verbal communication (gestures, art, music, movement, etc.)

Each of us has a different balance of modality, a different form of communicating across various situations.

We need to develop methods of generating and communicating value that take into account the unique form of each person.

While this may have been difficult to do at scale in the past, AI is about to change everything we thought we knew about value.

Authenticity

As I mentioned last week, it can be difficult to identify our original self. Our own signal of self can be lost amidst the noise of others.

I believe part of this difficulty comes from operating in an inauthentic way (for much of our lives). When someone does something that bothers us, we keep quiet. If someone asks us to do something we don’t want to do, we say yes out of a fear of letting them down.

People-pleasing, imposter syndrome, self-suppression, anxiety, depression, and burnout are just a few of the mental-health struggles that can cause this inauthentic mode of operating.

In other words, our daily lives put us out of practice of being ourselves. There are so many forms of anti-flow that we are no longer able to find flow for ourselves.

Part of the reason is that many people only find flow through their work (if they are lucky). But finding flow for others, doing what they are supposed to do, drains psychic energy (mental capacity) to the point that finding flow in their own personal time becomes a burden. When we constantly prioritize others over ourselves, we lose our signal. Over time, it fades to almost nothing.

But, it’s not too late (it’s never too late) to change things. There are steps you can take to resuscitate the signal of the self to find it amongst the noise.

(1) Find Flow

Step 1 is both easy and extremely difficult: get started and find flow.

Flow will present itself to you when you look for it. It’s that state where you are so fully engrossed in the task at hand that time seems to have no meaning. Autotelic experiences (activities done for their own sake) are one of the prime ways to find flow.

Some examples of how I find flow (in no particular order):

  • photography
  • editing
  • reading
  • cooking
  • hiking
  • travelling
  • cleaning
  • writing
  • playing an instrument
  • conversation with others about topics that interest me

I will do an entire entry on flow at some point, as it is one of my favourite things to talk about. For now, I highly recommend reading Finding Flow: The Psychology of Engagement with Everyday Life by Mihaly Csikszentmihalyi (the person who coined the term flow).

Follow your intuition. Find your flow.

(2) Self-Reflect

Step 2 is at the same time easier and more difficult than the first step: reflect.

Once you have found flow, reflect on what allowed you to find it. Put these reflections into writing, film yourself speaking, record a voice memo on your phone.

The medium by which you reflect does not matter – what matters is taking the time to reflect and expressing that reflection through your own voice.

I recommend writing or speaking in full sentences rather than bullet points, though bullet points are fine if you are short on time (I suggest making time for it).

As I noted last week, writing forms a mirror to the ego, a reflection of the self.

If you are struggling to find the signal of yourself amidst the noise of others, reflective writing is, in my opinion, the best place to start.

If you don’t know how to begin, remember that this is for you, not for anyone else. Describe how you felt when you found flow, what helped you find it, how you felt afterwards. Capture the emotion of the experience and your thoughts surrounding it.

Take a deep breath (or 10) and allow your creativity to flow into the words you use to describe your own experience.

Words are far more powerful than you realize.

(3) Self-Analyze

Step 3 is by far the hardest step: analysis.

After finding flow a few times, in a few different ways, it is time to take a step back.

Reflecting in the moment is helpful to capture the essence of our experience, but the true power comes from analysis.

Taking a look across a longer period of time (e.g., a week or month of writing) can be extremely valuable for identifying patterns in your thinking. Current self looking at your past self to help your future self.

This form of analyzing your own thinking is called metacognition, a topic I discuss more in-depth in other entries.

Metacognition will allow you to begin seeing the patterns of your own behaviour, enabling a mindful assessment of what worked, what didn’t, and how you can increase what worked and decrease what didn’t.
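As a loose sketch of what this pattern-spotting could look like once your reflections live in a digital system (the entry format, dates, and activity names here are hypothetical, not a prescribed journal structure):

```python
# A minimal sketch of analyzing a month of dated reflections:
# tally which activities did and did not lead to flow.
# Entry format and activity names are hypothetical examples.
from collections import Counter

reflections = [
    {"date": "2023-12-01", "activity": "writing", "found_flow": True},
    {"date": "2023-12-03", "activity": "email",   "found_flow": False},
    {"date": "2023-12-05", "activity": "hiking",  "found_flow": True},
    {"date": "2023-12-09", "activity": "writing", "found_flow": True},
]

def flow_patterns(entries):
    """Count how often each activity led to flow vs. did not."""
    worked, didnt = Counter(), Counter()
    for entry in entries:
        (worked if entry["found_flow"] else didnt)[entry["activity"]] += 1
    return worked, didnt

worked, didnt = flow_patterns(reflections)
print(worked.most_common(1))  # the activity that most reliably produced flow
```

The mechanics matter far less than the habit: any tool that lets your current self review your past self across weeks of entries will surface the same patterns.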

Over time, you will notice yourself finding flow more frequently and more effortlessly. Practice improves the likelihood of flow, and each experience teaches you how to find it again. It’s a self-reinforcing cycle.

That said, self-analysis can be exhausting, especially when reviewing what didn’t work. We begin to notice problems with ourselves that may have been hidden before, making it difficult to face that self.

Thankfully, this self-analysis is not something that will need to be done alone forever.

We are going to build our own AI system to copilot our reflections and analysis.

Multi-modally.

Generate Multi-Modal Data For Our CoPilots

The reason I am so excited about multi-modal AI systems such as Gemini is that we are finally going to be able to have true multi-modal inputs.

Imagine, for a moment, the future.

✨ ✨ ✨

It’s 5 years from now, and thanks to excellent advice you got in a newsletter you happened to come across, you now have a data hoard of self-reflective and analytical journals. These journals are in various modes: hand-written, drawings, mind maps, photos, videos, and voice notes.

Your personalized AI copilot has access to this data – no one else does. This copilot, your personal assistant, is able to use this massive log of personal data to craft unique middleware for you to interact with the internet and your employer.

As the saying goes, you know that garbage in = garbage out. Accordingly, you have been carefully crafting what this AI system has access to so it can understand your wants and desires, augmenting your value.

You crafted this data hoard by taking the time to identify what you truly value in life, who you truly are. Your True Self.

Given that you based your data on your authentic self, your AI copilot understands how best to help you out throughout your life, in a way that is natural to you. Therapist, business advisor, creative brainstormer, critic, marketer, lawyer, accountant, and more, this AI partner is able to advise you on everything you need to build the best personal brand possible, maximizing the effectiveness of your creations.

✨ ✨ ✨

That’s enough imagining for now.

Take some time to think about what this AI system would look like for you and how you can begin calibrating it now.

Recalibrating is not just the name of this newsletter. It is a philosophy for the future of our augmented reality. A world built with ever-changing data systems that help us augment who we are as humans, leveraging AI to help us each build the world of our dreams.

What world would you build?

Note: this step-by-step process is a sneak peek at what I will be discussing with my paid subscribers, teaching them to build a digital mind to help augment their self-awareness and creativity so they are ready for an AI-assisted world.

Next week

Next week, we will continue exploring authenticity and intuition, bringing in some of the latest neuroscientific studies on how we can use this exploration to permanently reduce our stress and improve our engagement with life.

Now that we have begun discussing the building of our personal AI system, some of my content will be paywalled for paid subscribers only. This paywalled content will include specific protocols and tutorials on how to begin preparing for AI personal assistants, Q&A with me, and voice commentary providing additional context to what I am writing.

If you are confused or frustrated by this paywall, please see my entries on building a sustainable creation system.

I would love feedback on how I can make this content more helpful 🫡

Stay tuned ✨

P.S. If you are interested in learning how I build my digital mind (second brain) to help me process information and identify patterns to solve my problems, please consider upgrading your subscription to paid. Your support means more than you know 😌 ✨

If you are not interested in a paid subscription but would like to show your support, please consider buying me a coffee to help keep my energy levels up as I write more ☕️ 📝
