TRANSCRIPT
00:09.920 --> 00:14.056
Thank you for coming today. Appreciate it. Hope this is going to be entertaining.
00:14.168 --> 00:17.640
This presentation is going to be pretty high level, right,
00:17.680 --> 00:21.336
about strategy, right? So as we just heard, things are early.
00:21.408 --> 00:24.856
It's early for compliance, it's early for regulation, and
00:24.928 --> 00:28.540
it's still very early for figuring out what the right approach is
00:28.580 --> 00:31.244
going to be for managing
00:31.332 --> 00:36.524
anything related to data compliance when you create LLMs
00:36.572 --> 00:40.252
or anything adjacent to them. So I think
00:40.276 --> 00:43.228
it's the right time to step back and think:
00:43.284 --> 00:46.588
if we're going to develop LLMs at scale,
00:46.684 --> 00:49.900
if we're really going to start developing
00:49.940 --> 00:53.644
not just one or two, but have 50 teams, 100 teams,
00:53.692 --> 00:56.652
each of them building on top of gen AI models,
00:56.796 --> 01:00.664
what do we think is really going to happen with data management,
01:00.752 --> 01:04.360
with compliance, and what can we do now to
01:04.400 --> 01:08.232
prepare for this inevitable headache? So let's be clear, it's going to
01:08.256 --> 01:11.960
be a headache. Any time a new computing platform comes
01:12.000 --> 01:15.720
along, it's been a headache. We can go through
01:15.760 --> 01:19.540
examples here, and we're already seeing some of these headaches
01:20.080 --> 01:23.624
in the headlines. Some specific
01:23.672 --> 01:27.512
things that are unique to and emergent with LLMs are
01:27.536 --> 01:30.872
listed here. Very small, undetectable bugs that lead
01:30.896 --> 01:35.124
to a very large blast radius. Legal and compliance
01:35.172 --> 01:38.308
issues, very high amounts of liability, and the
01:38.484 --> 01:42.404
need for clarity on who's actually liable for what when multiple parties
01:42.452 --> 01:45.604
are involved. And just the general question of how we develop
01:45.692 --> 01:49.668
quickly without having to have a meeting every
01:49.724 --> 01:53.252
week about whether we're following the rules and doing the right thing. These are
01:53.276 --> 01:56.852
things that get in our way as AI developers, and I think we
01:56.876 --> 01:59.918
need to step back and ask what we can do today
01:59.974 --> 02:03.582
to keep from getting sidelined later. I think that's the
02:03.606 --> 02:06.942
right activity, and that's where we should
02:06.966 --> 02:10.542
be investing time today. So I'll make the case that AI development is
02:10.566 --> 02:13.918
very different from traditional development. That's okay.
02:14.014 --> 02:18.490
This has happened several times over the last couple of decades, Web 2.0 being one.
02:19.030 --> 02:22.542
Every time a new computing paradigm has come
02:22.566 --> 02:26.610
along, we've had to adapt. And my thesis here is
02:26.650 --> 02:29.842
that every time we've needed to adapt, there's been an attempt
02:29.906 --> 02:33.410
to lock things down. We thought that that was the right thing to
02:33.450 --> 02:37.470
do. So let's go back to Web 2.0. Imagine it's 1998.
02:38.330 --> 02:41.730
Imagine: oh my gosh, we're going to release software. How many times?
02:41.770 --> 02:45.410
We're going to release once a month. That's crazy, right? How could
02:45.450 --> 02:48.578
we possibly release software once a month? This sounds
02:48.634 --> 02:52.184
nuts. What about the compliance? What about the rules?
02:52.232 --> 02:56.008
What about all of these things that we need to follow? I remember doing
02:56.064 --> 02:59.512
this; I worked for a big-box retailer. The solution
02:59.576 --> 03:02.264
was: you know what we need to do? We need to fill out a paper
03:02.312 --> 03:05.208
form, and we need to fax it, and then we need to get a
03:05.264 --> 03:08.340
sign-off from a VP, and that's how we're going to do a release.
03:08.640 --> 03:11.992
Yep. And then what happened after that? What happened after that was, well,
03:12.016 --> 03:15.864
actually, this is a pain in the neck. We can't release software with these encumbrances.
03:15.992 --> 03:19.080
This is too much. So you know what we need? We need to OCR these
03:19.120 --> 03:22.606
forms. That's what we should be buying. We should totally
03:22.638 --> 03:27.166
digitize these forms, and then we can just process
03:27.318 --> 03:30.654
signature forms at scale. That sounds like a great idea.
03:30.782 --> 03:34.062
Was it a great idea? No, it was not a great idea. That's not
03:34.086 --> 03:37.518
what happened. That's not the world we live in today, where we can do thousands
03:37.534 --> 03:41.582
of releases per week, albeit small ones, with pipelines.
03:41.726 --> 03:45.262
It's just not what happened, and it's never been what happened any
03:45.286 --> 03:48.340
time a new paradigm comes along.
03:48.960 --> 03:52.504
So I always think, hey, that's what happened with Web 2.0.
03:52.592 --> 03:55.752
And then we had cloud. We had the cloud revolution, and we said,
03:55.776 --> 03:59.352
all right, we need to make very small incremental updates, and
03:59.456 --> 04:02.920
what we've landed on is not
04:02.960 --> 04:05.640
"Hey, you know what we need to do? We need to do security reviews once
04:05.680 --> 04:09.064
a day." That was the knee-jerk reaction: how are
04:09.072 --> 04:12.248
we going to make so many releases without enough
04:12.304 --> 04:15.512
compliance and reviews in place? And in the beginning it
04:15.536 --> 04:18.296
was, hey, we just need to do more reviews, we need to do more compliance
04:18.328 --> 04:21.828
reviews, we need to embed security professionals within
04:21.884 --> 04:24.340
your team, and they need to look at stuff once a day. Again,
04:24.380 --> 04:27.844
is that the world we live in today? No, it's not.
04:27.932 --> 04:31.652
We've automated those things and built them into our development pipelines,
04:31.796 --> 04:35.124
which required a fundamentally different approach to releasing software.
04:35.252 --> 04:38.820
In that case, it was pipelines, CI/CD pipelines, and
04:38.940 --> 04:42.468
we moved on. And now we live in a world where
04:42.604 --> 04:46.002
we as developers can release. So I'd make the
04:46.026 --> 04:49.666
case that for AI development, we have yet another
04:49.738 --> 04:53.714
problem. In a very simplified version of it,
04:53.722 --> 04:56.834
we have more moving pieces than we previously had.
04:57.002 --> 05:00.082
In addition to the actual software, the LLM, we have the
05:00.106 --> 05:03.602
training data, and we possibly have RAG data. We might
05:03.626 --> 05:07.474
have other data that we need to power these applications. We have context,
05:07.522 --> 05:11.362
we have the configuration, and there are probably a few other sources,
05:11.506 --> 05:15.244
each of which typically lives in its own little silo
05:15.292 --> 05:18.492
today. So one person might own some of this. Another person might
05:18.516 --> 05:21.560
own some of that. You might have to ask somebody for this training set.
05:22.580 --> 05:26.220
This does not look like normal software development. What that means
05:26.260 --> 05:29.836
is we can expect that we're going to need to change how we do software development
05:29.908 --> 05:33.004
again. The world just gets a lot more complicated. The development landscape
05:33.052 --> 05:35.740
gets more complicated. Okay, so,
05:35.860 --> 05:39.004
I'm here representing
05:39.132 --> 05:43.388
our product. I'm the CTO for PROVE AI. This is our philosophy.
05:43.484 --> 05:47.454
It's designed to help organizations de-risk AI,
05:47.502 --> 05:51.006
simplify compliance, and unify governance. So the idea is
05:51.158 --> 05:54.318
we don't really know what's going to happen in the future. We don't know what's
05:54.334 --> 05:58.126
going to happen with compliance. But what we can be sure of
05:58.278 --> 06:01.502
is that the
06:01.526 --> 06:04.958
thing you can do now is get your data stored in the right place
06:05.094 --> 06:08.414
in a way that will set you up for success later and, most importantly,
06:08.462 --> 06:11.742
not get in the way of your development teams. That's something you can
06:11.766 --> 06:15.272
act on today, and you're really not wasting
06:15.336 --> 06:17.816
anything. You're going to have to do this anyway. You're going to have to store
06:17.848 --> 06:21.512
your observability data somewhere, your compliance data somewhere. Why not think
06:21.536 --> 06:24.020
this through now, with the use cases of the future in mind?
06:24.400 --> 06:27.608
Get this data written to the right type of place. Get your event streams written
06:27.624 --> 06:30.856
to the right place so you don't have to waste time later redoing
06:30.888 --> 06:34.184
everything. That is what our product
06:34.272 --> 06:37.848
provides. We are in early access. As you can imagine,
06:37.944 --> 06:41.288
this is an emergent type of use case. We're not embarrassed
06:41.304 --> 06:45.052
by that. It's early, and that's fine. And we
06:45.076 --> 06:48.844
just think that it's worth the investment to do that now. We use a
06:48.852 --> 06:52.428
decentralized ledger as the source of record behind it.
06:52.564 --> 06:56.156
So what is this magic data store I'm talking about for PROVE AI?
06:56.188 --> 06:59.788
It's a decentralized ledger, for a number of reasons. We think that
06:59.844 --> 07:03.980
there's a very legitimate set of use cases that
07:04.020 --> 07:07.692
are very well suited to a distributed
07:07.836 --> 07:11.388
ledger as your data store for these types of compliance-related events.
07:11.484 --> 07:14.860
Even if it's not the actual compliance data, it might be hashes of that
07:14.900 --> 07:18.172
data: something that allows you to prove authenticity,
07:18.316 --> 07:21.692
have it be tamper-proof, and allow you to do all of
07:21.716 --> 07:25.292
these use cases, which I'll go through right now, that
07:25.316 --> 07:28.780
we expect to be important in the future. All right, so now
07:28.900 --> 07:32.220
again, what I promised today was that we'd go through a few
07:32.340 --> 07:35.612
forward-looking use cases. These are food for thought. These are like,
07:35.716 --> 07:38.716
yeah, will your organization have to do this?
07:38.828 --> 07:42.472
We think so. I think so. And you might say,
07:42.656 --> 07:48.104
if I'm really going to need to solve these types of problems, then maybe
07:48.152 --> 07:51.640
adopting something that's ledger based is actually a good idea.
07:51.720 --> 07:54.792
So I'll go through a few sort of
07:54.816 --> 07:58.232
tenets or use cases that we think are going to be important.
07:58.416 --> 08:01.560
So one is this notion of being always compliant and
08:01.600 --> 08:05.816
observable for AI and any sort of gen AI
08:05.928 --> 08:10.154
activity. Collect everything, observe everything,
08:10.312 --> 08:13.718
and put it in one place. Most important to this use case
08:13.774 --> 08:17.798
is this: let's assume that we're not going to have only one MLOps tool.
08:17.854 --> 08:21.478
Let's assume you're going to do AI development at scale. You're going to have
08:21.534 --> 08:24.662
50 teams, or even 10 teams. They're probably not
08:24.686 --> 08:28.502
going to want to use all of the same tooling, just as history has
08:28.526 --> 08:32.166
told us. Are we all going to standardize on one web application container?
08:32.198 --> 08:35.782
No. Has that ever happened? It's never happened. So should
08:35.806 --> 08:39.130
we expect it to happen with AI development? No, we shouldn't.
08:39.290 --> 08:42.650
So we can prepare for this inevitability now. And then we say,
08:42.690 --> 08:46.522
okay, if we expect our teams to use the tools that they
08:46.546 --> 08:50.042
want to use to be successful, we can therefore assume that
08:50.066 --> 08:52.910
there's probably going to be multiple MLOps tools in the mix.
08:54.210 --> 08:58.042
Let's also assume now we need to be always compliant. So this
08:58.066 --> 09:01.722
is not a notion where a quarterly review is good enough, or a weekly review is good
09:01.746 --> 09:04.970
enough. Things are going to change so quickly, and each little
09:05.010 --> 09:08.906
change may have a big impact compliance-wise.
09:09.018 --> 09:12.410
So I should basically react to that by saying my
09:12.450 --> 09:15.850
goal as a business, as a CTO, as a CIO, is I want to be
09:15.890 --> 09:19.706
always compliant. I want to basically say everything that's
09:19.738 --> 09:23.402
logged is already logged and transparent. I don't need to make a
09:23.426 --> 09:26.618
phone call, I don't need to upload artifacts, because that's
09:26.634 --> 09:30.026
going to be too slow. I don't have time, every time somebody
09:30.058 --> 09:33.498
wants to push a new model version, to go upload a bunch of artifacts
09:33.514 --> 09:36.712
proving that I'm doing the right thing with my data.
09:36.736 --> 09:40.488
And again, you don't have to wait for compliance to move on this.
09:40.544 --> 09:44.040
As technology professionals, our job is to do the right thing.
09:44.080 --> 09:46.840
It's not to comply with the law, it's to do the right thing:
09:46.880 --> 09:50.232
do the right thing for our customers, do the right thing for
09:50.256 --> 09:53.704
our business. So there's no reason to wait for compliance
09:53.752 --> 09:57.192
in this case. The right thing to do is to say,
09:57.216 --> 10:00.440
yeah, this is going to be too slow. Why don't I get everything written to
10:00.480 --> 10:03.858
a place that's transparent, observable, and provably correct,
10:03.954 --> 10:07.506
provably accurate, so that I don't have to field requests
10:07.538 --> 10:11.042
for information, I don't have to upload stuff, and I don't have to keep
10:11.146 --> 10:14.390
being bombarded with additional requests for stuff?
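To make "provably correct" and "hashes of that data" concrete, here is a minimal Python sketch of a hash-chained, append-only event log. It is an illustration under stated assumptions, not PROVE AI's implementation: the `HashChainedLog` class, the event field names, and the choice of SHA-256 over canonical JSON are all hypothetical. Each entry commits to the hash of the entry before it, so editing any past entry breaks verification, and storing only a hash of the training data keeps the raw data out of the log.

```python
import hashlib
import json
from dataclasses import dataclass, field
from typing import List

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def entry_hash(payload: dict, prev_hash: str) -> str:
    # Canonical JSON (sorted keys, compact separators) so the same payload
    # always serializes, and therefore hashes, the same way.
    body = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + body).encode("utf-8")).hexdigest()


@dataclass
class HashChainedLog:
    """Hypothetical append-only log where each entry commits to its predecessor."""
    entries: List[dict] = field(default_factory=list)

    def append(self, payload: dict) -> str:
        prev = self.entries[-1]["hash"] if self.entries else GENESIS
        h = entry_hash(payload, prev)
        self.entries.append({"payload": payload, "prev": prev, "hash": h})
        return h

    def verify(self) -> bool:
        # Recompute every hash from the start; an edited payload or a
        # reordered entry makes a recomputed hash disagree with the stored one.
        prev = GENESIS
        for e in self.entries:
            if e["prev"] != prev or entry_hash(e["payload"], prev) != e["hash"]:
                return False
            prev = e["hash"]
        return True


log = HashChainedLog()
log.append({
    "event": "model_push",
    "model": "support-bot",  # hypothetical model name
    "version": "1.4.2",
    # Store a hash of the training data, not the data itself.
    "training_data_sha256": hashlib.sha256(b"training rows ...").hexdigest(),
})
log.append({"event": "eval_run", "model": "support-bot", "bias_check": "passed"})
print(log.verify())  # True

log.entries[0]["payload"]["version"] = "1.4.3"  # rewrite history
print(log.verify())  # False
```

A chain kept by a single party only detects tampering if its latest hash is also recorded somewhere that party cannot rewrite; periodically anchoring that head hash to a shared ledger is one way to provide that guarantee.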
10:14.970 --> 10:19.314
Okay, so observable
10:19.362 --> 10:22.530
and compliant, I think, are the two sides of the coin we're talking about here.
10:22.570 --> 10:25.842
But the idea is that the game is not going
10:25.866 --> 10:29.058
to be about looking at stuff once a month and
10:29.114 --> 10:33.266
having a review. It's going to be about it being available and online
10:33.338 --> 10:36.974
at all times. Completely hands-off. So you, as a technology
10:37.062 --> 10:40.702
professional, don't want to get involved. Somebody asks, are you guys compliant,
10:40.766 --> 10:43.870
are you guys doing the right thing with model bias and drift?
10:44.030 --> 10:47.422
You say, I show my work. I put all of our work here.
10:47.526 --> 10:51.150
It's observable if you want to go look at it. This is why we make
10:51.190 --> 10:54.814
all of our information available, maybe internally, maybe externally.
10:54.862 --> 10:58.670
It doesn't matter. The point is it shouldn't create additional work for us as
10:58.710 --> 11:02.472
technologists. All right, so I'd like
11:02.496 --> 11:06.248
to talk about a very real use case here.
11:06.304 --> 11:10.020
Let's say I'm a heavy equipment manufacturer. Let's say I make
11:10.400 --> 11:13.752
some sort of robotic devices that do some sort of
11:13.776 --> 11:17.480
industrial application. And let's say that these devices now have
11:17.520 --> 11:21.896
embedded firmware with some sort of AI model built
11:21.928 --> 11:25.288
into it. That's typical. You can imagine all sorts
11:25.304 --> 11:28.020
of X-ray machines with gen AI models at this point.
11:28.480 --> 11:32.056
It's not a far-fetched use case. Industrial is something we've
11:32.088 --> 11:35.992
seen interest in. What happens now, though? Let's say
11:36.016 --> 11:39.512
you have a product line of 50 different teams now trying to
11:39.536 --> 11:42.984
build AI into their products. And let's say that now
11:43.072 --> 11:46.552
the maintenance requirements on these things change in
11:46.576 --> 11:49.736
reaction to your exact models. As a developer,
11:49.768 --> 11:53.480
I'm not going to have time to update my maintenance procedures every
11:53.520 --> 11:56.392
time I deploy tweaks to the model. I'm not going to have time to go
11:56.416 --> 11:59.824
through a compliance and safety review. The idea
11:59.872 --> 12:02.544
here is that I just want to automate all of this. Every time I make
12:02.552 --> 12:06.512
a change, possibly with an AI copilot of sorts, I would
12:06.536 --> 12:09.700
like to have an AI maybe just generate documentation and maintenance
12:10.040 --> 12:13.740
material for me. If that's the vision of what I want to do, well, then
12:14.040 --> 12:17.680
what happens when your automated documentation says something
12:17.720 --> 12:21.860
terribly wrong about a maintenance procedure? Massive liability,
12:22.280 --> 12:25.744
massive risk to public safety. How do you get around this
12:25.912 --> 12:29.222
without having to have really deep reviews every time you
12:29.246 --> 12:32.822
make a single change? We can't live in that world. So what you
12:32.846 --> 12:36.102
can do in this case is adopt the notion of,
12:36.126 --> 12:39.158
look, I show my work. We're just going to keep developing, we're going to keep
12:39.214 --> 12:42.806
incrementally updating stuff. I need a way to share this information with
12:42.878 --> 12:46.054
third parties, my customers, about exactly what I have done
12:46.142 --> 12:50.214
to these models and exactly what the impact is, so that collectively
12:50.262 --> 12:53.500
we can make a decision about where liability sits
12:53.590 --> 12:56.528
with this sort of automated maintenance material
12:56.584 --> 13:00.608
and have a very clear sense of roles and responsibilities
13:00.704 --> 13:03.740
to avoid safety issues and legal issues.
13:04.200 --> 13:07.680
So there are lots of
13:07.800 --> 13:11.152
complicated things here. Next I wanted to
13:11.176 --> 13:14.820
talk about another use case: secure third-party verifiability.
13:16.040 --> 13:19.776
When we talk about compliance
13:19.808 --> 13:23.328
and safety, we typically talk about it as internal.
13:23.424 --> 13:26.896
But now let's think about the use case where,
13:26.968 --> 13:30.528
say, I'm a software vendor today and I want to see your SOC
13:30.584 --> 13:34.064
2 compliance. A very typical type of thing.
13:34.232 --> 13:37.180
Maybe I develop software in the Web 2.0 world.
13:38.040 --> 13:42.180
I want to go and embed a vendor's piece of software into my solution.
13:42.520 --> 13:46.112
Typical thing to do: show me your compliance, I want to see your certificates,
13:46.176 --> 13:49.618
because I can only do business with other organizations that have a high
13:49.674 --> 13:52.738
degree of compliance.
13:52.834 --> 13:56.546
Normal. Okay, now what happens in the
13:56.618 --> 14:00.306
LLM world? What happens when I'm trying to license LLMs? What happens
14:00.338 --> 14:03.538
when I'm trying to share this type of information with third
14:03.594 --> 14:06.950
parties, when there are noticeably more moving parts?
14:07.450 --> 14:10.610
It gets more complicated. Again, the "show your
14:10.650 --> 14:13.530
work" aspect here is probably the best solution.
14:13.650 --> 14:17.002
Instead of me saying I'll give you a certificate and we'll do an
14:17.026 --> 14:20.794
audit yearly, it's probably going to be, I'm just going to
14:20.802 --> 14:23.642
give you an event stream. I'm going to write an event stream of what we
14:23.666 --> 14:27.242
do, who did it, and how it happened. I'm going
14:27.266 --> 14:30.762
to show that I'm always compliant, and I'm going to share this
14:30.786 --> 14:34.042
information with you in a secure way that doesn't require us to even
14:34.066 --> 14:37.274
have a phone call or an email exchange. Personally,
14:37.282 --> 14:40.602
we believe that
14:40.626 --> 14:43.702
that's the inevitable way this nets out. And in this
14:43.726 --> 14:47.622
case, writing this type of information to a decentralized ledger makes a whole lot
14:47.646 --> 14:50.694
of sense. An example could be in healthcare.
14:50.742 --> 14:54.278
Let's say I buy from a provider
14:54.294 --> 14:58.010
of medical chatbots that provide some sort of medical advice to patients,
14:58.430 --> 15:01.622
and you can imagine there are all sorts of
15:01.646 --> 15:04.982
liability issues there. So let's say I say, hey, I expect you
15:05.006 --> 15:08.552
as an LLM provider, as a chatbot provider, to be continually
15:08.616 --> 15:12.152
updating and improving the quality of your service. On the other hand,
15:12.336 --> 15:15.768
we can't have a two-month gap where you fell out of compliance on
15:15.824 --> 15:19.300
something and I was unaware of it and you were unaware of it.
15:19.840 --> 15:23.032
We want a record, a tamper-proof record, a place to point to
15:23.056 --> 15:26.216
where we can say everything you've done with your model
15:26.248 --> 15:29.432
since we did our initial review is written here. I can see it, you can
15:29.456 --> 15:32.660
see it. Regulators can even see it if they need to.
15:32.960 --> 15:35.982
It's a much better and much lighter way of
15:36.056 --> 15:39.186
proceeding than the notion of quarterly
15:39.218 --> 15:43.074
or yearly audits, which are not going to catch things quickly enough to avoid
15:43.202 --> 15:46.750
potentially very damaging types of issues.
15:47.930 --> 15:51.362
So again, a lot of this
15:51.386 --> 15:55.090
is forward-looking, so I'll just keep going through some additional use
15:55.130 --> 15:58.578
cases here. Another one is around journaled model outcomes:
15:58.674 --> 16:02.018
the actual predictions, the outcomes of a model.
16:02.074 --> 16:05.202
Now you can imagine the problem. Let's say I'm an
16:05.226 --> 16:08.230
AI-as-a-service provider. I provide a model,
16:08.570 --> 16:12.210
it makes predictions, and that's all my business does.
16:12.250 --> 16:16.322
And then there are other applications that build on my predictions downstream
16:16.386 --> 16:20.270
to build out a full solution. All right, so what happens when
16:20.650 --> 16:24.258
there's an issue? A customer encounters some sort of problem.
16:24.394 --> 16:28.248
Let's take autonomous driving. Let's say I provide a model that
16:28.354 --> 16:32.508
can detect objects and do visual processing on objects,
16:32.524 --> 16:36.092
and let's say I'm an automaker that consumes those models
16:36.236 --> 16:39.404
and I build a self-driving application
16:39.452 --> 16:42.668
off of them. Okay, well, what happens when the car goes
16:42.684 --> 16:45.900
ahead and hits something it shouldn't hit? A very bad situation,
16:45.980 --> 16:49.964
a very bad outcome, and I want to know exactly
16:50.012 --> 16:53.962
where the problem was. It can't be a two-month inquiry about
16:54.156 --> 16:57.570
was this a bug in your model? Was it a bug in my software?
16:58.470 --> 17:01.790
You can't get lawyers involved in that. It's going to take too long. You're going to
17:01.830 --> 17:05.358
be offline for way too long. What you can do, though, is journal
17:05.414 --> 17:08.910
your predictions. If I'm an AI-as-a-service provider, I can journal those
17:08.950 --> 17:12.398
predictions to a decentralized ledger. I can say,
17:12.534 --> 17:15.902
again, I show my work. This is what our software did. And if there's any
17:15.926 --> 17:19.422
discrepancy about what actually happened,
17:19.526 --> 17:22.962
what actually happened under the hood of
17:22.986 --> 17:27.378
this, it's not something I have to go and request logs for and get
17:27.514 --> 17:30.722
legal involved. We don't want to live in that type of world.
17:30.826 --> 17:33.762
And that is still the world we live in today with a lot of our
17:33.786 --> 17:37.394
cloud providers. I rely on a cloud provider. I expect it to
17:37.402 --> 17:40.194
do something. Hey, there was a bug. Hey, it caused me damage.
17:40.322 --> 17:43.922
What do I do? I can't go look through their logs. I can request their
17:43.946 --> 17:47.202
logs. I can request a root cause analysis, I can ask for
17:47.226 --> 17:50.994
a bunch of stuff. If I'm not satisfied, I can go get lawyers
17:51.042 --> 17:54.498
involved. It's a slow process. And the only reason we've put up with that
17:54.554 --> 17:57.954
for so long is that the blast radius for these types of things has been easier
17:58.002 --> 18:01.346
to mitigate. It's much harder to mitigate in the gen AI world,
18:01.418 --> 18:04.150
which is why we think a different approach is necessary.
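The prediction-journaling idea can be sketched in a few lines of Python. This is a hypothetical illustration, not a description of any product's API: `PredictionJournal`, its record fields, and the per-batch digest are assumptions. Each record ties a prediction to a model version and a SHA-256 hash of the input (so raw camera frames or user data need not be stored), and a per-record hash lets either party check later that the entry was not edited.

```python
import hashlib
import json
import time
from typing import List, Optional


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


class PredictionJournal:
    """Hypothetical prediction journal for an AI-as-a-service provider."""

    def __init__(self, model_id: str, model_version: str):
        self.model_id = model_id
        self.model_version = model_version
        self.records: List[dict] = []

    @staticmethod
    def _record_hash(body: dict) -> str:
        return sha256_hex(json.dumps(body, sort_keys=True).encode("utf-8"))

    def record(self, input_bytes: bytes, prediction: dict,
               ts: Optional[float] = None) -> dict:
        body = {
            "model_id": self.model_id,
            "model_version": self.model_version,
            "input_sha256": sha256_hex(input_bytes),  # hash only, not the raw input
            "prediction": prediction,
            "ts": time.time() if ts is None else ts,
        }
        rec = dict(body, record_sha256=self._record_hash(body))
        self.records.append(rec)
        return rec

    def find(self, input_bytes: bytes) -> List[dict]:
        # Look up what the model said for a given input, e.g. during a dispute.
        h = sha256_hex(input_bytes)
        return [r for r in self.records if r["input_sha256"] == h]

    @classmethod
    def verify(cls, rec: dict) -> bool:
        # Recompute the record hash from every field except the stored hash.
        body = {k: v for k, v in rec.items() if k != "record_sha256"}
        return cls._record_hash(body) == rec["record_sha256"]

    def batch_digest(self) -> str:
        # One digest over all record hashes: the value you might publish to a ledger.
        return sha256_hex("".join(r["record_sha256"] for r in self.records).encode("utf-8"))


journal = PredictionJournal("object-detector", "2.3.1")  # hypothetical model
frame = b"<camera frame bytes>"
journal.record(frame, {"label": "pedestrian", "confidence": 0.91}, ts=1700000000.0)
match = journal.find(frame)[0]
print(match["model_version"], match["prediction"]["label"])  # 2.3.1 pedestrian
print(PredictionJournal.verify(match))  # True
```

In a setup like this, the provider would publish each `batch_digest()` (or each record hash) to the ledger, so the downstream consumer can check a disputed prediction against a value neither side can quietly change.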
18:04.810 --> 18:08.530
All right. Automation types of use cases. Again, I like to say
18:08.570 --> 18:12.178
automate. We're not talking about workflows
18:12.194 --> 18:15.772
here. Workflows imply people are going to be in
18:15.796 --> 18:19.276
the mix. We think the vision here is peopleless
18:19.348 --> 18:23.132
automation. So chatbots could be an
18:23.156 --> 18:26.920
interesting example of a use case here, where we say,
18:27.300 --> 18:31.116
if I deploy just a simple customer service type
18:31.148 --> 18:33.400
of chatbot here,
18:34.740 --> 18:39.212
there's a whole bunch of issues. How
18:39.236 --> 18:43.120
do we figure out when something goes awry with one of these,
18:43.300 --> 18:46.328
with one of these chatbots that we purchase? How do we go ahead and
18:46.384 --> 18:49.816
trigger some actions based on it that make sense? How do we fully
18:49.848 --> 18:53.800
automate a recovery? How do we fully streamline this
18:53.840 --> 18:57.384
whole process? Another version: let's say I release a new version of
18:57.392 --> 19:00.616
a chatbot, which maybe I want to do weekly. How do I educate
19:00.648 --> 19:04.440
the customer service teams who work alongside these chatbots about
19:04.480 --> 19:07.336
changes? It's a very real problem I've had and I've seen.
19:07.408 --> 19:11.272
You say, hey, we're going to release a feature. I need
19:11.296 --> 19:14.392
to inform the customer support team. We have to do our training,
19:14.456 --> 19:17.880
we have to do our documentation, we have to make sure they're aware that things
19:17.920 --> 19:21.480
are going to change. Okay, but what happens if your changes are now
19:21.520 --> 19:25.032
sourced by an LLM? Then what? Are
19:25.056 --> 19:28.216
you going to slow everything down and say, wait a minute, the actual human beings
19:28.248 --> 19:31.752
who do customer support need a couple of days to
19:31.776 --> 19:34.660
look through these changes and make sure they understand things?
19:35.680 --> 19:39.092
I don't think that's going to work; it's going to be too slow. So what we
19:39.116 --> 19:42.740
think is that automation has to be part of this.
19:42.780 --> 19:46.116
Every time stuff is changing in your event stream,
19:46.148 --> 19:49.892
with your model development, you're going to need to find a way to automate these
19:49.916 --> 19:53.572
events into meaningful actions that help the
19:53.596 --> 19:56.820
people who use it be aware of what just changed without having
19:56.860 --> 19:59.760
to go click a button, review this,
20:00.140 --> 20:03.572
touch this. Again, it's another plug for a decentralized ledger,
20:03.636 --> 20:07.144
because with decentralized ledgers you can build applications directly on
20:07.152 --> 20:10.680
the event stream in a secure, tamper-proof way. So it's a use
20:10.720 --> 20:13.752
case, and we think it's an important one. All right,
20:13.776 --> 20:17.540
and the last of the forward-looking use cases I'll cover here
20:17.840 --> 20:21.048
is around models and tokenized data
20:21.104 --> 20:24.776
sets. So let's briefly talk about the exchange of data,
20:24.848 --> 20:27.704
the tokenization of data, the ownership of data.
20:27.872 --> 20:31.800
So suppose I want to
20:31.840 --> 20:36.008
go ahead and share my information or my proprietary data
20:36.064 --> 20:40.360
sets with third parties. Again, these are real
20:40.480 --> 20:44.744
use cases; I've tried to do this in the past. You get into heavyweight licensing
20:44.792 --> 20:48.248
deals. It's difficult. It's a very difficult problem
20:48.384 --> 20:52.260
to solve. How do you give somebody your data without losing control of it?
20:52.880 --> 20:56.072
It's again a good use case for a decentralized
20:56.136 --> 21:00.312
ledger. You can put something, perhaps tokenized, on
21:00.336 --> 21:03.514
the ledger that says: this is the cryptographic hash of the
21:03.522 --> 21:07.034
data set that we have and/or the rows, and here's the event stream of the
21:07.042 --> 21:10.810
data as I change it, so you don't lose ownership of something even
21:10.850 --> 21:14.138
though the data set is continually changing. So again,
21:14.194 --> 21:17.466
how else would you do this? I think that's what keeps coming
21:17.538 --> 21:21.002
back to us at PROVE AI: we know we're going to need to
21:21.026 --> 21:24.426
exchange data, and we know we need to change this data continually.
21:24.538 --> 21:27.610
How on earth are you going to share this with third parties without it making
21:27.650 --> 21:30.730
a mess? How are you going to prove that you had ownership of this?
21:30.770 --> 21:33.932
How are you going to prove that each of these little change sets is
21:33.956 --> 21:37.692
stuff that you initiated, not some other third party, knowing full well that
21:37.716 --> 21:41.280
you may be liable for whatever happens downstream?
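One common way to publish "the cryptographic hash of the data set and/or the rows" is a Merkle tree: a single root hash commits to every row, and a short inclusion proof shows that a given row belongs to a given version. The sketch below is illustrative only; SHA-256, the 0x00/0x01 leaf and node prefixes, duplicating the last node on odd levels, the example rows, and the actor ID are all choices made here, not details from the talk.

```python
import hashlib
from typing import List, Tuple


def _h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def leaf(row: str) -> bytes:
    # Prefix 0x00 for leaves and 0x01 for internal nodes so the two can't be confused.
    return _h(b"\x00" + row.encode("utf-8"))


def node(left: bytes, right: bytes) -> bytes:
    return _h(b"\x01" + left + right)


def _pad(level: List[bytes]) -> List[bytes]:
    # Simplification: duplicate the last node when a level has odd length.
    return level + [level[-1]] if len(level) % 2 else level


def merkle_root(rows: List[str]) -> bytes:
    if not rows:
        return _h(b"")
    level = [leaf(r) for r in rows]
    while len(level) > 1:
        level = _pad(level)
        level = [node(level[i], level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]


def inclusion_proof(rows: List[str], index: int) -> List[Tuple[bytes, bool]]:
    """Sibling hashes from leaf to root; the bool says the sibling sits on the right."""
    level = [leaf(r) for r in rows]
    proof = []
    while len(level) > 1:
        level = _pad(level)
        sibling = index ^ 1
        proof.append((level[sibling], sibling > index))
        level = [node(level[i], level[i + 1]) for i in range(0, len(level), 2)]
        index //= 2
    return proof


def verify_inclusion(row: str, proof: List[Tuple[bytes, bool]], root: bytes) -> bool:
    acc = leaf(row)
    for sibling, sibling_on_right in proof:
        acc = node(acc, sibling) if sibling_on_right else node(sibling, acc)
    return acc == root


# Two versions of a hypothetical data set; one row changes between them.
v1 = ["id=1,price=10", "id=2,price=12", "id=3,price=9"]
v2 = v1[:2] + ["id=3,price=11"]
change_event = {
    "actor": "data-team@vendor.example",  # hypothetical actor ID
    "old_root": merkle_root(v1).hex(),
    "new_root": merkle_root(v2).hex(),
}
proof = inclusion_proof(v2, 2)
print(verify_inclusion("id=3,price=11", proof, merkle_root(v2)))  # True
print(verify_inclusion("id=3,price=9", proof, merkle_root(v2)))   # False
```

Recording a change event with the old and new roots plus the actor is one way to represent each "little change set" on a ledger without publishing the rows themselves. Duplicating the last node is a simplification; production Merkle designs usually handle odd levels more carefully.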
21:41.780 --> 21:44.860
Again, this is probably one of the most forward-looking use cases out there.
21:44.900 --> 21:48.572
But again, starting today, our thesis at
21:48.596 --> 21:52.520
PROVE AI is that all of these things are likely
21:53.300 --> 21:56.588
to come back on us and cause a problem in the near future, near future
21:56.644 --> 22:00.118
being one to two years. And if they do, and I'm
22:00.134 --> 22:04.118
struggling with compliance today, how should I think about AI?
22:04.294 --> 22:08.054
We really think that the right way is to start with your data, start with
22:08.062 --> 22:11.638
your event collection, and start thinking about these sort of third-party
22:11.694 --> 22:15.094
use cases, these sharing use cases, these types of
22:15.102 --> 22:18.854
issues where you need real-time event streams that sit outside your multiple MLOps
22:18.902 --> 22:22.566
tools. Start there, and then continue on
22:22.718 --> 22:26.502
and build out the reactions to it exactly as
22:26.526 --> 22:30.236
you need. So if you're interested, come by;
22:30.308 --> 22:33.996
we're at booth 435. Again, we're an
22:34.068 --> 22:37.644
early access product, and we have a few initial customers.
22:37.692 --> 22:41.100
But as you heard from all of these use cases, it is early, and
22:41.140 --> 22:45.100
that's okay. As with any of the past big technology
22:45.220 --> 22:48.492
trends, the first step was typically to get your
22:48.516 --> 22:52.124
observability, your eventing, and your data strategy correct
22:52.172 --> 22:56.152
first, or you're going to make a headache for yourself later. And all
22:56.176 --> 22:59.352
things being equal, if you have to write and store this data anyway, why not
22:59.376 --> 23:04.424
choose the right type of storage and the right sort of long-term philosophy
23:04.472 --> 23:08.808
here? Because there's really no downside to it; there's only potential upside.
23:08.984 --> 23:12.488
So with that, I think that's about time. Thank you. Great,
23:12.544 --> 23:13.220
thank you.
23:16.000 --> 23:18.360
We don't have a lot of time for questions, but we do have a break.
23:18.400 --> 23:21.736
So we have a little bit of a buffer. So one common element
23:21.768 --> 23:24.712
for a lot of gen AI platforms, whether people are building for
23:24.736 --> 23:28.750
internal use or combining them with, for example, ChatGPT
23:28.830 --> 23:32.206
or Claude or whatever, is that they all learn from user
23:32.238 --> 23:35.886
feedback. So essentially the LLM is learning and improving, and the vector
23:35.918 --> 23:39.294
databases are adjusting based on the users
23:39.342 --> 23:43.534
actually interacting with the data set that's there. So obviously
23:43.582 --> 23:47.198
one could sort of be retraining the model on
23:47.254 --> 23:51.182
new data, but that doesn't necessarily take into account the training that
23:51.206 --> 23:55.130
occurs as a result of somebody saying, now wait, that wasn't actually the best
23:55.170 --> 23:58.522
Chinese restaurant on the Upper West Side. You should have also looked at
23:58.546 --> 24:02.362
these Chinese restaurants. Well, at that point the LLM should be
24:02.386 --> 24:06.698
learning from the user feedback. So how does one document
24:06.794 --> 24:10.874
changes to the overall corpus of information and potential relevance
24:10.922 --> 24:15.162
of results when it's coming from user feedback? So that's exactly.
24:15.226 --> 24:19.632
Yeah, that's a great example of the problem here. So as
24:19.656 --> 24:23.344
for the AIs themselves, and whatever update
24:23.392 --> 24:26.960
mechanism is in there, we imagine a case
24:27.000 --> 24:31.184
where they're writing these events to the decentralized
24:31.232 --> 24:34.896
ledger, basically to tie back who exactly was the actor.
24:35.008 --> 24:38.448
Can we cryptographically be sure of who made this change?
24:38.584 --> 24:42.016
That's another benefit of a decentralized ledger. And then
24:42.088 --> 24:45.392
can you figure out exactly what point in time it was? And most importantly, could you
24:45.416 --> 24:49.216
replay? Could you basically use a decentralized ledger as,
24:49.368 --> 24:53.008
like, a database ledger? So if there is a problem, and there's a change
24:53.064 --> 24:56.192
where somehow the AIs came
24:56.216 --> 25:00.416
to the wrong conclusion, could you roll back and replay in reverse
25:00.528 --> 25:03.696
back to a known good state? Right. So, yes, you can.
25:03.768 --> 25:07.488
And you can do all of these things with something like a decentralized ledger.
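The replay-and-roll-back answer maps onto event sourcing. Below is a minimal Python sketch under explicit assumptions: the event shape, the actor names, and the restaurant keys are hypothetical, and HMAC with shared per-actor keys stands in for the public-key signatures a real ledger would use to attribute changes. State is rebuilt by replaying authenticated events forward, and rolled back by applying each newer event's inverse in reverse order.

```python
import hashlib
import hmac
import json
from dataclasses import dataclass
from typing import Dict, List, Optional

# Hypothetical per-actor keys. HMAC is a stand-in: a real ledger would use
# public-key signatures so that verifiers never hold a signing key.
ACTOR_KEYS: Dict[str, bytes] = {
    "ml-engineer-alice": b"demo-key-1",
    "feedback-pipeline": b"demo-key-2",
}


@dataclass(frozen=True)
class Event:
    seq: int
    actor: str
    key: str              # e.g. a retrieval entry whose relevance weight changes
    old: Optional[float]  # value before the change (None = key did not exist)
    new: float
    tag: str              # HMAC over the fields above


def _mac(actor: str, seq: int, key: str, old: Optional[float], new: float) -> str:
    msg = json.dumps([seq, actor, key, old, new]).encode("utf-8")
    return hmac.new(ACTOR_KEYS[actor], msg, hashlib.sha256).hexdigest()


def make_event(seq: int, actor: str, key: str, old: Optional[float], new: float) -> Event:
    return Event(seq, actor, key, old, new, _mac(actor, seq, key, old, new))


def authentic(e: Event) -> bool:
    return hmac.compare_digest(_mac(e.actor, e.seq, e.key, e.old, e.new), e.tag)


def replay(events: List[Event], upto_seq: int) -> Dict[str, float]:
    """Rebuild state by applying events (assumed sorted by seq) up to upto_seq."""
    state: Dict[str, float] = {}
    for e in events:
        if e.seq > upto_seq:
            break
        if not authentic(e):
            raise ValueError(f"event {e.seq} failed authentication")
        state[e.key] = e.new
    return state


def roll_back(state: Dict[str, float], events: List[Event], to_seq: int) -> Dict[str, float]:
    """Undo every event after to_seq by applying its inverse, newest first."""
    state = dict(state)
    for e in reversed(events):
        if e.seq <= to_seq:
            break
        if e.old is None:
            state.pop(e.key, None)
        else:
            state[e.key] = e.old
    return state


events = [
    make_event(1, "ml-engineer-alice", "restaurant:golden-dragon", None, 0.80),
    # A user says "that wasn't the best one", and the pipeline down-weights it.
    make_event(2, "feedback-pipeline", "restaurant:golden-dragon", 0.80, 0.20),
    make_event(3, "feedback-pipeline", "restaurant:jade-garden", None, 0.75),
]
current = replay(events, upto_seq=3)
known_good = roll_back(current, events, to_seq=1)
print(known_good == replay(events, upto_seq=1))  # True
print(sorted({e.actor for e in events if e.seq > 1}))  # ['feedback-pipeline']
```

Storing the "old" value in each event is what makes reverse replay possible without snapshots; the authenticated actor field is what answers "who made this change?"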
25:07.664 --> 25:10.640
Great. Thanks so much, and we'll see you guys back here shortly.