Welcome, all, to the second panel of the first day. This panel will aim to discuss the current status of natural language processing techniques. In the context of the web, we know they can be used to generate textual descriptions for images, but also for other visual media presented on web pages. Today we will focus our discussion on, or start to consider, aspects such as text adaptation and how to provide understandable text that better meets web users' needs in different contexts of use, and also what our future perspectives are for natural language processing to support web accessibility.

So I'm glad to welcome back Michael, Shivam and Amy, and also to welcome Shaomei Wu from AImpower, who agreed to join us on the second panel of the day. Welcome back, everyone, and welcome, Shaomei. For your first intervention I ask you just to briefly introduce yourself; your three other co-panelists already did that on the previous panel, so there is no need to reintroduce yourselves.

But I will start by thinking once again about quality. We go back to the quality topic, but now the quality of machine-generated descriptions, no longer from the perspective of image processing but from the perspective of natural language generation, right? So how can we improve the quality of these machine-generated descriptions, especially taking into account the personalized preferences of users? And I will start with you, Shaomei.

Hello. Thank you all for having me here today. My name is Shaomei Wu, and right now I'm the founder and CEO of AImpower.org, a tech non-profit that researches and co-creates empowering technology for otherwise marginalized users.
First of all, I want to share that I do have a stutter, so you may hear more pauses when I talk.

Before AImpower I was a research scientist at Facebook, leading a lot of research and product work on accessibility, inclusion and equity. One of the products that I shipped is automatic alt text, which provides short, machine-generated descriptions of images on Facebook and Instagram to screen reader users in real time.

So when it comes to the quality of automatic alt text, or other similar systems, we saw two big areas of development that we wanted to pursue. The first one is accuracy, which I think we talked a lot about in the last panel as well. But I want to talk a bit more about the second one, which is the richness of those descriptions.

To be honest, the alt texts that we generated were quite limited. A lot of users say they are more like teasers, telling you, oh yeah, five people smiling, pizza, indoor, but no more than that: no more than what kind of pizza, or what kind of indoor environment. Is it at home? Is it a restaurant? Our users really want to access all of the richness of what someone who has eyesight can see and access.

One particular area that users really want to know more about is people: who they are, how they look... race, gender, and even how attractive they are, because that is something really socially salient. So that was a big challenge for us
when we were designing our system, because, you know, how can we share those kinds of attributes in a way that is both accurate and socially conscious? We actually opted not to show the race and gender of the people being photographed, which we got a lot of complaints about. But how to convey those attributes in a socially respectful way is something we should really work on, and now I can see a few ways to make that better. For example, considering the relationship between the people in the photo and you: if they are friends, then we can just tell you their names and other things about those people. Another thing is to give people progressive details, so there is some option that allows the consumer of those alt texts to request more details beyond what our systems provide by default. So I will stop here and let others talk.

Thank you, Shaomei. Shivam, your thoughts on how we can improve the quality of machine-generated descriptions?

Okay, so this is a two-part thing. When it comes to technical implementation, how you design your models, how you train them, and who the stakeholders are in designing a particular model all matter a great deal in getting quality machine-generated descriptions. Now, when we take into account users' personalized preferences, there are two parts... First, let's take an example. I am a person who knows Spanish, and my model, a very famous model, gives descriptions in English. Now that model, or however that model is consumed, say you're using an API to consume it, should take into account the user's personalized language preference and produce the output based on that as well.
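A minimal sketch of that idea: leave the captioning model untouched and adapt only its output as a post-processing step. The `translate` callable below is a placeholder for whatever translation model or service a platform already has, not a specific API.

```python
from typing import Callable


def localize_description(description: str,
                         user_language: str,
                         translate: Callable[[str, str], str]) -> str:
    """Adapt a machine-generated description to the user's preferred language.

    `translate` is assumed to map (text, target_language) to translated text;
    it stands in for whatever translation model or service is available.
    """
    if user_language.lower().startswith("en"):
        return description  # the captioning model already outputs English
    return translate(description, user_language)


# Hypothetical usage: the captioning model is untouched; only its output
# is post-processed according to the user's stored preference.
# alt_text = localize_description("Five people smiling, pizza, indoor",
#                                 user_language="es",
#                                 translate=my_translation_service)
```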
So this capability of a model to prepare output in multiple formats and multiple languages is something that can be looked into; that is one way the quality of machine-generated descriptions increases. You don't need to retrain the complete model separately: what you can do is just add post-processing scripts for your models, and that can help. It's not much effort compared with changing the model training, but it's a very simple solution that can work very well.

The other thing is how you prepare quality data. You should carefully categorize it, strictly if needed. Say your input data contains blurred images and all sorts of things: you need to carefully prepare your model training data, and based on that data your descriptions will be a bit clearer, and the target population will also be factored in when you look into how you post-process your data for a certain group of people. So that's how I see it.

Thank you. And Amy, do you want to share your experiences?

Sure. So, a couple of ways that I've seen that I think are promising for using NLP to improve quality. One thing I've started seeing recently is people starting to consider the context around the image that's going to be described, to maybe create a description that's more helpful. So imagine someone writes a post on Twitter and they couple that post with an image. Considering the post and the image together might inform models on how to create something that's more informative. For instance, if I posted a picture of myself snowboarding and I said I learned a new trick, then it might be important for the description to tell me what the trick was; whereas if I said I just went on vacation, the exact trick might not matter as much. So I think the idea of using language understanding to get more information about the context before making a prediction is promising.
Another way I've seen it used to maybe improve quality kind of goes back to the other answers that were given: maybe you can use question answering about the image to gain more information when you need it.

One thing I've also thought about is seeing if users could give examples of their preferences about descriptions in natural language. So, here is an example of a description; maybe we can copy the style of this description when we're applying it to other descriptions. Maybe I like to hear about the costumes someone wears in a video, and I wish that future descriptions would include more information about that rather than summarizing it.

And then finally, one other way I've seen NLP used to improve quality is based on summarization. There can be times when there's more to describe than time you have to describe it. Especially in videos, there's often a really small amount of time to describe without overlapping the other audio. So one way you can use NLP to improve quality is by trying to summarize those descriptions so they fit within the time you have and don't decrease the experience of people trying to watch the video and hear the audio at the same time.

Yeah, that's definitely a good use for NLP. Michael, still on this topic, I would like to have your perspective on initiatives from WAI that might assist users in providing their preferences so that eventually models can use those, or anything that might be ongoing in that regard.

Okay. First of all, just to give the disclaimer for anybody new to this session: I'm not a machine learning professional; I'm speaking from the perspective of my work in the Web Accessibility Initiative. I do want to talk briefly; I think the other panelists covered almost everything I would have said. One thing, based on my knowledge of how machine learning generally works today: our models tend to be focused on one particular ability, and it's not universal.
Future models will have more abilities combined. But, you know, there may be one model that can recognize that this is a human and here are their attributes, another one that can say this is this particular human, and yet another one that can say this human plus that human equals this relationship. All of that information, I believe, is separate right now. So the ability for models to share contexts, I think, is going to be a part of the solution that we need.

As for what I can speak to in the Web Accessibility Initiative: we are only beginning to explore what AI and accessibility means, and this symposium is a part of that process. We have a practice of doing research papers, sort of literature reviews, and then proposing accessibility user requirements. That would be something we could be working on to start gathering this information. From there we decide what to do, whether the content goes into guidelines or into new technologies or whatever. But I think most of the resources around AI would fit into new resources in those categories.

Okay, great. Thanks. I would like now to move on to addressing something that was basically the core of Jutta's keynote: discrimination bias, or any other type of bias. Here I'm also looking at something that Antonio Gambabari entered in the Q&A for the previous panel, but I think it fits very well into this topic. It brought up the use of large language models, LLMs, which are currently getting a lot of traction and a lot of spotlight. Do you think these LLMs can open up new avenues, as Antonio Gambabari was mentioning, for reducing the different types of bias that we see as a result of the use of AI-trained models? Shivam, do you want to go first this time?

Yeah, sure, sure.
So this is a question that has been close to my heart as well: how can we address social bias in largely any model? As part of industry I have seen a lot of ML model training and how the output comes out. ML model results reflect the data that they have and how social attitudes are represented within that data. Most of the publicly available data used to train models carries a degree of bias, because most of the data generated on the Internet is created by the people who can consume it, right? Someone who doesn't even know what the Internet is cannot create data there. So most of the data that is available to train a model is built out of that. That's how you see bias in one way.

Another example I can give is that you will see street violence, homelessness, drug overdose and all those things overrepresented in text that discusses mental illness, although the two are not the same; but you will find this kind of representation in ML outputs. So how can we address this? There is a novel way: a human-in-the-loop, or human feedback loop, on existing models, where you can provide feedback to the already existing model that this sort of output is not correct, this would be a correct version, this would be a good version. Some human interface is needed for that.

The other aspect of it is the representational training of ML models. The underlying data of the models is the main source of the issue here. So you need to correctly source your data, and keep it up to date, so you're not only representing one section of data. For example, say you have a larger society. This society can include underprivileged people, overprivileged people, and other groups that together make up society. You cannot just take the data from one section of society, train the model, and say it gives you a complete picture of the whole; there are separate sections which are underrepresented. That's what has been happening with models right from the start of ML, as you can see.
Now, what we can also do to mitigate this is to create inclusive buckets: the developers or designers of ML models can be given inclusiveness training. You can make them aware of what is happening and how we can mitigate it. So all the people who are involved in ML development, and there is still a lot of data extraction going on, can be trained for inclusiveness.

There are multiple tools that help us do that. If you are creating a model, you can test it; Google, for example, has a lot of tools around AI fairness that help in analyzing how your models perform when it comes to inclusive outputs on your data. You also need to do thorough testing of your models before you release them, to check that all the outputs are properly aligned and properly represented: all the sections of the population the model is intended to serve should be represented well. Your testing should cover that for any model you're creating, because we're no longer at the stage where AI and ML are just starting off; it's quite mature right now, and we are seeing a lot of breakthrough technology, so we can do this going forward. I guess this can be a solution.

Okay, thank you, Shivam. Shaomei, can we have your input on how we can address that social bias, or other types of bias?

Yeah. On this I want to go back to what I talked about before, in particular on sensitive social identities of the people in the photos. I don't see a way for current machine learning systems to accurately come up with those labels.
I think the key issue here is that a lot of those systems really assume fixed and definite social categorizations, such as race and gender. So maybe we should think beyond machine learning systems and find a way to attribute people's race respectfully, by giving agency to those being photographed and described. For example, a lot of people now specify their pronouns in their social media bios, and I think that kind of information should, or could, be made use of as one way of describing the gender of somebody in a photo.

Another direction that we have been exploring is to just describe appearances instead of identities: for example, to describe skin tone, hairstyle and outfit instead of assigning a race or gender label to somebody. But I don't think any of those solutions can really address the real cause of the problem, so I don't really have a very good answer on this. Maybe the alternative is to think of ways to convey and share who we are without relying so much on images as we do today. How can we convey the information that we want to share online in a way that is not so visual-centric? I think that's a bigger question, the way I see it.

Thank you. Thank you, Shaomei.
And Amy, you're next.

I think the prior answers mostly covered the things I was going to mention. I loved Shaomei's answer about describing ourselves in ways that don't rely on visual information, and giving agency to people to add their own identities that they want to be shared. I will say that I think that depends on the context: you might want to share different parts of your identity in different contexts, if it's important to you, or you might not. So I think even things that give end users agency might have a lot of subtlety in how they would be applied in different cases.

And I like the idea of describing aspects of appearance. I think one challenge with that is that you might be trading off between the aspects of appearance you're describing and efficiency: maybe the person is not going to get the information as quickly as a sighted person would perceive it, just because audio occurs over time. So I think it's an extremely difficult challenge. And in some cases it can matter: I can imagine that, seeing a photograph of the leadership of a company, you might want to know some quick details about the demographics of who's leading it.

One thing that I've noticed that is sort of related to this is that I sometimes have people describe videos, and there can be a lot of differences in which aspects they describe. Even when they do describe someone's appearance, the way they describe it based on who is in front of them can differ, based on biases that people have. If people see a woman, they might describe her differently than they would describe a man; they might focus on different aspects of appearance. So I think even approaches that go towards describing aspects of appearance will have to be very carefully designed. It really feels like a challenging problem.
Yeah. Thank you so much, Amy. Michael, any thoughts on this? And I would add something here especially for you, ... do you see any future role for accessibility guidelines in contributing to preventing bias in machine-learning-generated descriptions, or whatever else results from these models?

My answer to that question could be longer than my prepared answers, so let's see where we go. I just want to add a couple of thoughts to what the others have been saying. First, to categorize bias: we're talking so far mainly about bias in recognition, that is, are there biases in how machine learning recognizes objects, people, contexts, etc. In that respect, one thing that magnifies the challenge in the accessibility context is that the sample size of people with disabilities can be smaller in various training sets. So there is a risk that images of people with disabilities in a training set, or contexts that are important for them, like wheelchair ramps or something, will be excluded as outliers, or will be less well recognized by the AI than images of other people are. That's just another dimension to the aspects of bias that we need to look at.

But then we also need to look at our own bias in the application of this. We've talked a few times during the session about the risk of relying on machine-generated descriptions and captions as being good enough, whereas content that has more of a mainstream audience might also have automated captions and descriptions but get more curation, more quality assurance. So that kind of bias could creep in, and that can magnify the impact of disability bias, because it can cause people to be excluded from the fora from which people are often recruited to be part of training sets, etc. So, again, the ethical principles for machine learning speak to that.
And I think that, by identifying this, we may have some content that we need to add to that.

So, moving on to what we can do about it: I do believe that it's within the scope of the Web Accessibility Initiative, or the W3C, to provide guidance in some form about how AI and accessibility should work together, addressing many of these things. Typically this sort of thing would be a Working Group Note, which means it is a formal document published by the W3C that has had a certain level of review. There are even opportunities for versions that have had more review and sign-off. So I think that's one thing we might like to do.

I'll also talk briefly about the work that we're doing on the Web Content Accessibility Guidelines 3.0, sorry, the W3C Accessibility Guidelines 3, or WCAG 3. It's a substantial re-envisioning, and it's been a core requirement from the beginning that we wanted to address equity in the approach, in the guidelines: how are we going to make sure that they're equitable for people with disabilities? We've been exploring that in specific ways within the working group, really unpacking it to understand the relationship of equity and accessibility and bias in the other direction. We're connecting that with other work W3C has been doing to make itself a more equitable organization. So this is to say that I believe WCAG 3 will also have some structure built in, and supporting resources, addressing issues of bias specifically. Now, these are hopes, not promises, but that's the direction from activities like this.

Thank you so much. And yes, those are really some exciting avenues that we hope can come to fruition in the near future. So, I guess, a final question for everyone: I would like to know a bit about your future perspectives on the use of natural language processing in the field of accessibility. And I'll start with you this time, Amy.
Yeah, so I think this is a really exciting area. One shift I've noticed recently among people in NLP who I talk to is that, as the models get better at creating fluent text that looks reasonable, a lot of people are becoming more interested in what the actual applications of this are and how we can build tools that actually support those applications, rather than relying on automated metrics that might not capture people's experiences. So I wanted to note that that's a direction I find exciting.

A couple of things I think could be promising, and I've kind of mentioned them before in my other responses: as we gain the ability to describe more and more about the image, I think NLP can provide a really good opportunity to personalize those descriptions based on the person and what they want, as well as the context. If you think about walking into a room, there's so much you could possibly describe. If we can make it easier for people to get the information that they're looking for quickly from their media, that would be a great improvement. Combining computer vision to recognize things in the underlying image and using something like NLP to summarize that description, I think, is promising and exciting.

One other way I'm excited about it is in its opportunities to maybe help people with their own description tasks. When we have humans working on descriptions, it's really hard; novices sometimes have a hard time remembering and applying the guidelines that exist. Maybe we could rewrite people's descriptions of videos to be more in line with how an expert would write them, by making them more concise or changing the grammar a bit so that it fits what people are expecting from the guidelines. Or we might alert people to aspects of their own descriptions that could be changed a little to perhaps reduce something like bias in the description.
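A rough sketch of that kind of authoring aid, assuming only a generic instruction-following language model behind a `generate(prompt)` callable (a placeholder, not any particular product or API):

```python
from typing import Callable

GUIDELINES = (
    "Rewrite this description to be concise and objective, keep it in the "
    "present tense, and do not add any detail that is not in the original."
)


def polish_description(draft: str, generate: Callable[[str], str]) -> str:
    """Bring a novice-written description closer to expert description guidelines."""
    prompt = f"{GUIDELINES}\n\nOriginal description:\n{draft}\n\nRevised description:"
    return generate(prompt).strip()


def flag_possible_bias(draft: str, generate: Callable[[str], str]) -> str:
    """Ask the model to point out wording that may reflect bias, such as
    appearance remarks made only about some groups of people."""
    prompt = ("List any phrases in this description that could reflect bias "
              "or unequal treatment, and suggest neutral alternatives:\n\n" + draft)
    return generate(prompt)
```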
So I think there are really lots of exciting opportunities in terms of authoring descriptions as well as making those end descriptions a little bit better. Yeah.

Great. Thanks a lot. Shivam?

Yeah, so I see it more now than earlier, because now the models, the engines, are quite advanced. So I see a good context-aware solution that gives you faster processing and efficient output, and that works on text, video as well as audio. I see this becoming a reality. A good use case I have been following is academic textbooks and academic assignments, right? There are multiple charts, bar charts, graphs, and all the associated data. If some of these models or technologies can create a better understanding of those things, it would help a lot of people who have difficulty understanding them just by reading, or where good quality descriptions of those charts and bars are absent. I see this happening in the next few years: better description generation.

And as a closing comment, I would say there are different types of consumers of media: some can easily read but not comprehend, some comprehend easily but have difficulty consuming it visually. In that sense, the coming NLP technologies will help designers provide contextual descriptions as outputs. In simple terms, if you give me a simple, efficient, faster output for a piece of media and it's correct, then that will be the pinnacle of what I see for NLP. And this goes for natural language processing, understanding as well as generation; they are all key technologies.

Thank you so much. Exciting times ahead, definitely. Michael, do you want to share your vision?

So, based on my knowledge of how machine learning works in the present day, the tools tend to be more focused on specific abilities, which means that the context is a bit isolated.
So I think I'm speaking as a person working in the field, identifying a need, rather than describing something that is necessarily a technological potential. But the Internet of Things uses APIs to exchange data between different types of devices, and if we could model some structure like that, so that these tools could share contexts with each other and negotiate a better group description, I think that might be an opportunity for an early evolution of this field. In the long term, of course, I think tools will emerge with a greater sense of context built in, but that'll probably be, you know, another tier slash singularity or whatever. So yeah, that's my view on the near-term future, based on my knowledge.

Yeah, good suggestions to look at as well. And Shaomei?

Yeah. So, looking into the future, I can see two areas that I think have a lot of potential. The first one is from the technology perspective, where I agree with Michael: I can see a lot of gain in incorporating the context surrounding photos, by taking advantage of recent progress in deep learning models that have multimodal representation spaces. So we can embed both the image and the text and tags surrounding it, and then add metadata such as the author or the time when the photo was taken or posted. A lot of those can be drawn into one big representation space that provides a lot more than just the visual information alone. So I think that's a big technology breakthrough that we can see in the near-term future.
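A very rough sketch of that joint-representation idea, assuming separate image and text encoders already exist (the encoder names in the comments are placeholders), with concatenation standing in for whatever learned fusion a real system would use:

```python
import numpy as np


def fuse_context(image_vec: np.ndarray,
                 text_vec: np.ndarray,
                 metadata_vec: np.ndarray) -> np.ndarray:
    """Combine image, surrounding-text, and metadata embeddings into a single
    context vector that a description generator can condition on.

    Concatenation is the simplest possible fusion; a real system would learn
    the combination (e.g., with cross-attention), but the point is the same:
    the description is conditioned on more than the pixels alone.
    """
    return np.concatenate([image_vec, text_vec, metadata_vec])


# Hypothetical usage with stand-in encoders:
# img = image_encoder(photo)                          # vision-model embedding
# txt = text_encoder(post_text + " " + hashtags)      # surrounding post and tags
# meta = text_encoder(f"by {author} at {timestamp}")  # photo metadata
# context = fuse_context(img, txt, meta)              # fed to the caption decoder
```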
But the second thing, and the more important one to me, is the use case perspective. Right now, when we think about and talk about media accessibility, we mostly think about the consumption case: how do I help people who cannot see to consume photos posted by others, mostly by sighted folks. But what I think is equally important, and largely overlooked, are the media creation use cases: how can we support people with visual impairments to create and share photos and videos? In my own work on these use cases, I saw there is such a gap in what current technology can do. For example, a lot of modern AI models really fail when it comes to processing photos taken by people with visual impairments, because they are just not the same kind of photos that were used to train those models. So there's a huge gap between the fundamentals of those models and what they need to do.

And then second, there is a lot of need for more personalized and aesthetic support. After I take ten selfies, I want to find the one I want to post to share who I am, and that is something we cannot do at all today. We can tell you, okay, you have ten photos that contain your face, but then how can we get models that can really represent somebody's taste and somebody's aesthetics? I think that's another interesting future development that I want to see. So that's all.

Thank you so much, Shaomei. And I think we only have 4 minutes more.
So I won't risk another question, because we need to end at the top of the hour. I will take the opportunity to once again thank our panelists, and I hope everyone enjoyed it as much as I did. These were really interesting and very optimistic perspectives, so we can see that it's not just the risky, or risk-enabling, outputs that AI can have. It's nice to have these perspectives. So thank you once again, Shaomei, Shivam, Amy and Michael, it was brilliant to have you here, and thanks to everyone who attended.