1 00:00:02,040 --> 00:00:04,480 The topic for this panel 2 00:00:05,680 --> 00:00:08,800 will be computer vision for media accessibility. 3 00:00:09,040 --> 00:00:11,920 So here we aim to foster 4 00:00:11,920 --> 00:00:15,880 a discussion on the current state of computer vision techniques 5 00:00:16,000 --> 00:00:20,200 and focus on image recognition and identification 6 00:00:20,200 --> 00:00:24,840 and recognition of elements and text in web images and media. 7 00:00:25,480 --> 00:00:28,560 And considering all the different usage 8 00:00:28,800 --> 00:00:31,960 scenarios that 9 00:00:31,960 --> 00:00:34,160 emerge on the web. 10 00:00:35,080 --> 00:00:38,760 And so we'll be looking here at aspects 11 00:00:38,960 --> 00:00:41,560 like how can we improve quality, and 12 00:00:42,920 --> 00:00:45,000 how do we define quality for this, 13 00:00:45,840 --> 00:00:49,600 the quality and accuracy of current computer vision techniques, 14 00:00:49,960 --> 00:00:55,360 and what are the opportunities and what are the future directions 15 00:00:56,560 --> 00:00:59,280 in this domain? So 16 00:00:59,280 --> 00:01:01,240 we'll be joined 17 00:01:01,240 --> 00:01:04,920 by three panelists for this first panel: 18 00:01:05,280 --> 00:01:09,160 Amy Pavel from the University of Texas, 19 00:01:09,640 --> 00:01:12,600 and Shivam Singh from mavQ, 20 00:01:13,200 --> 00:01:19,400 and Michael Cooper from the W3C. 21 00:01:19,400 --> 00:01:21,800 Okay, great. 22 00:01:23,320 --> 00:01:27,080 Everyone's online and sharing their videos. 23 00:01:27,080 --> 00:01:29,600 So thank you all for agreeing to join. 24 00:01:30,080 --> 00:01:33,880 I will ask you, before your first intervention, 25 00:01:33,880 --> 00:01:37,720 to just give a brief introduction to yourself, 26 00:01:37,720 --> 00:01:41,760 so let people know who you are and what you're doing. 27 00:01:42,400 --> 00:01:45,960 And I would like to start on 28 00:01:47,000 --> 00:01:49,280 one of the issues of quality. 29 00:01:49,280 --> 00:01:53,600 And as I was saying, so how do we define quality 30 00:01:54,280 --> 00:01:56,360 here? And here 31 00:01:57,520 --> 00:02:01,960 I was looking at aspects such as how do we, 32 00:02:02,560 --> 00:02:06,800 or how can we, train AI models 33 00:02:07,920 --> 00:02:11,520 that are able to identify aspects in an image 34 00:02:12,120 --> 00:02:15,800 such as identity, emotion and appearance, 35 00:02:15,800 --> 00:02:20,680 which are particularly relevant for personal images. 36 00:02:20,920 --> 00:02:22,600 So how can we 37 00:02:23,760 --> 00:02:27,800 get AI to do what we humans can do? 38 00:02:27,800 --> 00:02:30,360 And I'll start with you, Amy. 39 00:02:32,320 --> 00:02:33,200 Excellent. 40 00:02:33,200 --> 00:02:36,640 Thank you so much. So my name is Amy Pavel. 41 00:02:36,640 --> 00:02:40,040 I am an assistant professor at UT Austin 42 00:02:40,280 --> 00:02:42,920 in the computer science department, and I'm super excited to be here 43 00:02:43,440 --> 00:02:46,520 because a big part of my research is exploring how to 44 00:02:47,040 --> 00:02:49,920 create better descriptions for online media. 45 00:02:49,920 --> 00:02:54,640 And so I work everywhere from social media, like describing 46 00:02:54,640 --> 00:02:59,760 images on Twitter, as well as new forms of online media like GIFs and memes. 47 00:03:00,280 --> 00:03:02,080 And I've also worked a little bit on videos.
48 00:03:02,080 --> 00:03:07,680 So both educational videos like making the descriptions for lectures 49 00:03:08,040 --> 00:03:10,640 as well as entertainment videos. 50 00:03:10,640 --> 00:03:14,920 So improving the accessibility of user-generated videos, like YouTube videos, 51 00:03:14,920 --> 00:03:16,000 for instance. 52 00:03:16,080 --> 00:03:19,280 So I think this question you bring up is really important, 53 00:03:19,480 --> 00:03:21,600 and I typically think about it in two ways. 54 00:03:21,600 --> 00:03:25,120 So I think about what does our computer understand 55 00:03:25,120 --> 00:03:28,560 about an image and then how do we express 56 00:03:29,200 --> 00:03:33,880 what the computer understands about an image or other form of media? 57 00:03:34,400 --> 00:03:37,480 And so I think that we're getting better and better at 58 00:03:38,760 --> 00:03:39,400 computers 59 00:03:39,400 --> 00:03:42,400 that can understand more of the underlying image. 60 00:03:42,600 --> 00:03:46,400 For instance, if we think about something like emotion, 61 00:03:46,960 --> 00:03:50,880 we've gotten a lot better at determining exact landmarks on the face 62 00:03:51,040 --> 00:03:53,440 and how they move, for instance, 63 00:03:54,040 --> 00:03:57,400 or we might be able to describe something specific about a person. 64 00:03:57,960 --> 00:04:00,520 So if you look at me in this image, 65 00:04:01,000 --> 00:04:07,000 I have brown hair tied back into a bun and a black turtleneck on, and 66 00:04:07,000 --> 00:04:12,720 this is the type of thing we might be able to understand using automated systems. 67 00:04:13,200 --> 00:04:16,800 However, the second question is kind of how do we describe what we know 68 00:04:17,320 --> 00:04:18,160 about an image? 69 00:04:18,160 --> 00:04:22,680 And if I give you all of the information about my facial landmarks 70 00:04:22,680 --> 00:04:27,160 and what I'm wearing for every context, that might not be super useful. 71 00:04:27,160 --> 00:04:31,520 And so a lot of what I think about is sort of how we can best describe 72 00:04:32,320 --> 00:04:35,760 and what people might want to know about an image 73 00:04:35,760 --> 00:04:39,640 given its context and the background of the user. 74 00:04:40,800 --> 00:04:44,680 So just briefly on that point, I usually think about 75 00:04:44,960 --> 00:04:48,640 who is viewing this image and what might they want to get out of it 76 00:04:48,840 --> 00:04:52,560 and also who's creating it and what did they intend to communicate. 77 00:04:53,440 --> 00:04:57,160 So these two questions, I think, give us interesting ideas about what 78 00:04:57,160 --> 00:05:02,200 data we could use to train to create better descriptions based on the context. 79 00:05:03,200 --> 00:05:07,760 So, for example, we might use descriptions 80 00:05:07,760 --> 00:05:11,720 that are actually given by people to describe their own images 81 00:05:11,720 --> 00:05:16,880 or their identities or aspects that they've shown in videos in the past. 82 00:05:16,880 --> 00:05:19,240 On the other hand, we might improve, 83 00:05:21,040 --> 00:05:22,960 so we might use a bunch of different methods 84 00:05:22,960 --> 00:05:28,720 and improve our ability to select a method based on the context of the image. 85 00:05:28,720 --> 00:05:32,200 So for instance, when I worked on Twitter images, we would run things 86 00:05:32,200 --> 00:05:37,040 like captioning to describe the image, like an image of a note.
87 00:05:37,200 --> 00:05:40,840 It might just say note, but we also ran like OCR to automatically 88 00:05:40,840 --> 00:05:44,600 extract the text and tried to pick the best strategy to give people, 89 00:05:45,200 --> 00:05:49,360 you know, what we thought might be the best amount of information given the image. 90 00:05:49,560 --> 00:05:50,800 So that's my initial... 91 00:05:50,800 --> 00:05:54,640 I'm sure more aspects of this will come up as we have a conversation, 92 00:05:54,640 --> 00:05:57,480 but I just wanted to give that as the first part of my answer. 93 00:05:57,480 --> 00:05:57,720 Yeah. 94 00:05:58,840 --> 00:05:59,440 Okay. 95 00:05:59,440 --> 00:06:00,720 Thank you so much. 96 00:06:01,120 --> 00:06:05,360 So Shivam, do you want to go next? 97 00:06:05,360 --> 00:06:07,840 Yeah, sure. Hi everyone, I am Shivam. 98 00:06:08,120 --> 00:06:13,120 I lead the document-based products at mavQ India 99 00:06:13,640 --> 00:06:16,600 and I'm super excited to be here in front of all of you. 100 00:06:17,000 --> 00:06:21,440 So the question here is how should we train models that are capable 101 00:06:21,440 --> 00:06:25,600 of identifying aspects like identity, emotion and appearance in personal images? 102 00:06:26,000 --> 00:06:28,240 So this is a two-part answer. 103 00:06:29,560 --> 00:06:33,320 I'm from more of a technical background, so I'll get a bit technical. 104 00:06:33,400 --> 00:06:36,680 The first part is preparing a diverse dataset. 105 00:06:36,680 --> 00:06:38,200 So that is the first point. 106 00:06:38,200 --> 00:06:42,320 So most of our available quality data is sourced from publicly available 107 00:06:42,320 --> 00:06:43,480 data, right? 108 00:06:43,480 --> 00:06:47,400 So we can carefully plan and prepare the data before training our models 109 00:06:47,840 --> 00:06:51,720 to include the peripheral data, the surrounding environment. 110 00:06:51,720 --> 00:06:56,560 Like in an image, there can be a subject and there can be a lot of peripheral data. 111 00:06:57,000 --> 00:07:01,720 So if we train the algorithm to take care of that data as well, 112 00:07:02,480 --> 00:07:04,960 that will be helpful in getting a better output. 113 00:07:06,160 --> 00:07:08,560 For example, you have 114 00:07:08,560 --> 00:07:11,760 the subject's gesture, its relation with the environment, 115 00:07:11,760 --> 00:07:17,800 and linking emotion to its external manifestation on the subject. 116 00:07:18,280 --> 00:07:22,120 Now this will give us a more inclusive output, 117 00:07:22,920 --> 00:07:26,040 and if you have a subject who is a user, a person, 118 00:07:26,040 --> 00:07:29,040 then it will give you better emotion, identity and appearance. 119 00:07:29,440 --> 00:07:33,880 And there should be some thought on how we can have a diverse dataset, 120 00:07:34,240 --> 00:07:38,200 but it all depends on the availability of data. 121 00:07:38,640 --> 00:07:42,600 Now the second part of it would be fine-tuning your model based on 122 00:07:43,240 --> 00:07:44,160 personal preferences. 123 00:07:44,160 --> 00:07:47,440 Let's say you have a better, bigger model, right? 124 00:07:47,760 --> 00:07:51,600 And you can use that as a general model and then you can fine-tune that 125 00:07:51,680 --> 00:07:57,160 based on small-scale trainings and smaller datasets. 126 00:07:57,160 --> 00:08:00,040 And you can continuously fine-tune it to get a better result.
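A rough sketch of the fine-tuning step Shivam describes here: adapting a general captioning model to a handful of personally preferred descriptions. It assumes the Hugging Face transformers BLIP captioning checkpoint; the image files, example pairs and hyperparameters are illustrative placeholders, not part of any tooling discussed on the panel.

```python
# Sketch: fine-tune a general image-captioning model on a small set of
# personally preferred descriptions (assumes torch, transformers, Pillow).
import torch
from PIL import Image
from transformers import BlipProcessor, BlipForConditionalGeneration

checkpoint = "Salesforce/blip-image-captioning-base"
processor = BlipProcessor.from_pretrained(checkpoint)
model = BlipForConditionalGeneration.from_pretrained(checkpoint)
optimizer = torch.optim.AdamW(model.parameters(), lr=5e-6)

# Hypothetical small dataset: images paired with the descriptions
# the person actually wants for them.
personal_pairs = [
    ("garden.jpg", "My grandmother's rose garden in the evening light"),
    ("my_cane.jpg", "My white cane leaning against a wooden chair"),
]

model.train()
for epoch in range(3):                      # a few passes suffice for tiny data
    for path, description in personal_pairs:
        image = Image.open(path).convert("RGB")
        inputs = processor(images=image, text=description, return_tensors="pt")
        loss = model(**inputs, labels=inputs["input_ids"]).loss
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()

model.save_pretrained("captioner-personalized")   # the personalized model
```

In practice the corrections a person makes to generated captions would feed back in as new training pairs, which is the human-in-the-loop part Shivam turns to next.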
127 00:08:00,480 --> 00:08:04,880 Now the fine-tuning is kind of a human-in-the-loop feature 128 00:08:05,240 --> 00:08:10,360 where every time you get data, you can expect some feedback on your data 129 00:08:10,360 --> 00:08:13,760 and then produce a better output from it. 130 00:08:13,760 --> 00:08:18,320 So that's something which is a bit of... includes 131 00:08:18,360 --> 00:08:22,760 a bit of human intervention. So yeah, that's how I see 132 00:08:22,760 --> 00:08:26,720 how we can train models. 133 00:08:26,720 --> 00:08:27,520 Okay, thank you, 134 00:08:27,520 --> 00:08:35,440 Shivam. Uh, Michael. So, 135 00:08:36,520 --> 00:08:37,760 Michael Cooper, I 136 00:08:37,760 --> 00:08:41,880 work with the Web Accessibility Initiative and I'm speaking 137 00:08:42,240 --> 00:08:45,920 specifically from my role there, I'm not a machine learning professional, 138 00:08:45,920 --> 00:08:51,400 so I'm not speaking about technology so much as some considerations 139 00:08:51,400 --> 00:08:53,480 for accessibility that I'm aware of. 140 00:08:54,600 --> 00:08:58,360 So in terms of improving quality of descriptions, 141 00:08:59,080 --> 00:09:03,160 the other two speakers spoke about, you know, technically how we do it. 142 00:09:04,400 --> 00:09:06,920 I think we might be able to give advice on that. 143 00:09:06,920 --> 00:09:11,040 Some of what needs to be done, for instance: machine learning 144 00:09:12,120 --> 00:09:13,200 should... its output 145 00:09:13,200 --> 00:09:16,920 should be able to conform to the Media Accessibility User Requirements 146 00:09:17,840 --> 00:09:19,600 and the cognitive accessibility guidance, 147 00:09:19,600 --> 00:09:23,320 for instance, as sources of 148 00:09:23,600 --> 00:09:26,000 information about what would be useful to users. 149 00:09:27,200 --> 00:09:29,240 And I'm also thinking of 150 00:09:30,480 --> 00:09:33,480 machine learning more broadly in terms 151 00:09:33,480 --> 00:09:37,280 of what tools might be used in these different circumstances 152 00:09:37,280 --> 00:09:41,440 and in particular contexts as a potential assistive technology. 153 00:09:42,880 --> 00:09:45,080 And so 154 00:09:46,160 --> 00:09:48,400 the question for accessibility there is not just 155 00:09:48,400 --> 00:09:52,120 what is the description of this image, but what is the description of this image 156 00:09:52,120 --> 00:09:56,920 in this page, for me, for the purpose I'm seeking. 157 00:09:57,640 --> 00:10:01,720 So tools can get context from HTML semantics, 158 00:10:02,000 --> 00:10:05,320 accessibility semantics like ARIA, and adapt to the technology. 159 00:10:05,920 --> 00:10:10,840 They can also generate their own context from machine learning algorithms. 160 00:10:10,840 --> 00:10:13,400 But I think there is going to be a need 161 00:10:13,880 --> 00:10:16,040 to have a way to communicate 162 00:10:17,040 --> 00:10:20,240 user preferences to machine learning, whether that is added 163 00:10:20,240 --> 00:10:23,000 to the semantics or something else. And, 164 00:10:25,640 --> 00:10:26,160 let's see. 165 00:10:26,160 --> 00:10:29,520 So just a couple of closing notes on that. 166 00:10:31,000 --> 00:10:33,840 Users need to be involved in the design and training process. 167 00:10:33,840 --> 00:10:37,120 That's sort of an aphorism that needs to be repeated. 168 00:10:38,560 --> 00:10:42,360 So we have to pay attention to that as we're looking at improving it.
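To make the context point concrete, here is a small sketch of what a describing tool could collect from the HTML and ARIA semantics Michael mentions before generating anything; it assumes the beautifulsoup4 library, and the set of fields gathered is an illustrative choice rather than an existing API.

```python
# Sketch: gather page context for an image from HTML and ARIA semantics,
# so a description model (or a human) can tailor the description.
from bs4 import BeautifulSoup

def image_context(html: str, img_index: int = 0) -> dict:
    soup = BeautifulSoup(html, "html.parser")
    img = soup.find_all("img")[img_index]
    context = {
        "alt": img.get("alt", ""),        # author-provided text, if any
        "role": img.get("role", ""),      # e.g. role="presentation" means decorative
        "nearest_heading": "",
        "figcaption": "",
        "long_description": "",
    }
    heading = img.find_previous(["h1", "h2", "h3", "h4", "h5", "h6"])
    if heading:
        context["nearest_heading"] = heading.get_text(strip=True)
    figure = img.find_parent("figure")
    if figure and figure.find("figcaption"):
        context["figcaption"] = figure.find("figcaption").get_text(strip=True)
    described_by = img.get("aria-describedby")   # richer description elsewhere in the page
    if described_by:
        target = soup.find(id=described_by)
        if target:
            context["long_description"] = target.get_text(strip=True)
    return context
```

A record of user preferences (verbosity, how the person wants identity described) could be passed to the model alongside this context; as Michael notes, there is no standard semantic for communicating that yet.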
169 00:10:43,120 --> 00:10:46,880 And I would also note that while this session is mainly focused on 170 00:10:47,960 --> 00:10:50,080 images and media, 171 00:10:50,080 --> 00:10:53,560 virtual and augmented reality has a lot of the same problems 172 00:10:53,560 --> 00:11:00,160 and solutions that we should be looking at. 173 00:11:00,160 --> 00:11:01,720 Okay, thank you 174 00:11:01,720 --> 00:11:08,320 to the three of you for starting this discussion. 175 00:11:08,320 --> 00:11:14,360 One of the things that, I guess, was mentioned by all of you, 176 00:11:14,480 --> 00:11:17,760 in different ways, is 177 00:11:17,760 --> 00:11:21,080 the role of the end user. 178 00:11:21,160 --> 00:11:24,200 And in fact, 179 00:11:24,200 --> 00:11:26,680 I guess both 180 00:11:26,680 --> 00:11:30,240 users were mentioned: the one that's viewing 181 00:11:30,640 --> 00:11:36,360 or acquiring the image or the description of the image, 182 00:11:36,360 --> 00:11:41,160 but also the one that's creating or sharing the image. 183 00:11:41,680 --> 00:11:44,640 And for that 184 00:11:44,640 --> 00:11:49,360 one, there is the responsibility of generating a description. 185 00:11:49,720 --> 00:11:53,440 And of course, we know that most people don't do that. 186 00:11:53,920 --> 00:11:57,080 So that's why we also need this AI-based 187 00:11:57,080 --> 00:11:59,440 assistance to take on that role. 188 00:12:00,160 --> 00:12:05,560 But this leads me to another aspect: if we have 189 00:12:06,840 --> 00:12:09,160 an AI-based system that 190 00:12:09,680 --> 00:12:14,480 is capable of assisting both the content creator and the content consumer, 191 00:12:14,480 --> 00:12:19,360 how can this impact the agency of end users? 192 00:12:19,360 --> 00:12:23,600 So will end users feel that this is no longer their responsibility 193 00:12:23,600 --> 00:12:27,520 because there’s a tool that can do this for them? 194 00:12:28,600 --> 00:12:32,000 Or if we explore this as something that, 195 00:12:32,080 --> 00:12:36,120 now looking at this from the content producer perspective, 196 00:12:36,640 --> 00:12:41,360 if we see these tools as something that helps someone generate 197 00:12:41,360 --> 00:12:44,960 a description, would this 198 00:12:45,560 --> 00:12:49,520 producer just start relying on 199 00:12:49,880 --> 00:12:52,560 the output from the AI? And here, thinking 200 00:12:52,560 --> 00:12:55,240 about what Jutta was 201 00:12:55,600 --> 00:12:58,640 introducing earlier today, 202 00:12:59,440 --> 00:13:02,480 and she mentioned this as an organizational monoculture, 203 00:13:02,480 --> 00:13:05,960 can we also think about a description monoculture 204 00:13:05,960 --> 00:13:08,800 in which all descriptions would start 205 00:13:09,560 --> 00:13:12,160 conveying the same kind of information? So 206 00:13:13,880 --> 00:13:15,760 what are your perspectives on 207 00:13:15,760 --> 00:13:20,080 this, on the impact that this has on the agency of end users? 208 00:13:20,120 --> 00:13:23,640 And I'll start with you, Shivam, now. 209 00:13:23,640 --> 00:13:24,760 Awesome, awesome. 210 00:13:24,760 --> 00:13:27,440 So it is quite a good question. 211 00:13:27,440 --> 00:13:33,000 So let's say we are basically talking about the quality of our output 212 00:13:33,000 --> 00:13:35,800 based on the end user, the agency of the end user. 213 00:13:36,160 --> 00:13:41,120 Now the quality of these descriptions and captions depends on how end users consume them.
214 00:13:41,120 --> 00:13:44,440 For example, most of the models currently provide high-level 215 00:13:44,440 --> 00:13:47,080 and grammatically correct captions in English, 216 00:13:47,560 --> 00:13:51,040 but that would not be true for captions generated in the native language of the end user, 217 00:13:51,200 --> 00:13:54,920 because there might not be enough of a dataset to train 218 00:13:54,920 --> 00:13:55,960 our model. 219 00:13:55,960 --> 00:13:59,760 Now the premise of the training restricts the diversity of generated captions 220 00:14:00,600 --> 00:14:04,600 and the use cases of what an AI model can comprehend. 221 00:14:04,920 --> 00:14:07,120 And then there is the caption 222 00:14:07,120 --> 00:14:11,320 which includes diverse text, like an email, a date, or 223 00:14:11,360 --> 00:14:14,880 correctly explaining graphs, which has been a really big problem 224 00:14:15,360 --> 00:14:19,400 until now. And once any translational AI 225 00:14:19,520 --> 00:14:23,480 is employed, its output often becomes an input for another model. 226 00:14:23,480 --> 00:14:27,400 So for example, you can have two different models, one of them specialized 227 00:14:27,680 --> 00:14:28,840 and one general. 228 00:14:28,840 --> 00:14:32,480 Now the output of your general model can become an input 229 00:14:32,480 --> 00:14:35,000 for the specialized model and then you can refine it. 230 00:14:35,000 --> 00:14:37,560 This is how we are achieving it now. 231 00:14:38,640 --> 00:14:39,800 Then the thing is, 232 00:14:39,800 --> 00:14:44,080 the caption-generating AI consumes a very large amount of data to curate content. 233 00:14:44,080 --> 00:14:46,920 And then, in many cases of live caption generation, 234 00:14:47,560 --> 00:14:51,400 the AI should put in context the earlier events or earlier input as well. 235 00:14:51,400 --> 00:14:54,800 Now this is true in the context of a conversation, 236 00:14:54,800 --> 00:15:00,000 but this can also be applied where you have live caption generation. 237 00:15:00,440 --> 00:15:03,960 So you have to put some context there and then you have to generate the captions. 238 00:15:04,360 --> 00:15:07,320 Now we have matured in this capability, right? 239 00:15:07,560 --> 00:15:09,640 But this is more complex than simple image-to-text caption 240 00:15:09,640 --> 00:15:13,800 generation; the speed, the attention, the handling of peripheral data 241 00:15:13,840 --> 00:15:15,360 are very much necessary, 242 00:15:15,360 --> 00:15:17,120 and we have these great partnerships in interpreting, 243 00:15:17,120 --> 00:15:20,280 and we are looking forward to having a better solution where 244 00:15:20,280 --> 00:15:25,480 end users are really satisfied with what they're getting. 245 00:15:25,480 --> 00:15:26,040 Thanks. 246 00:15:27,200 --> 00:15:28,120 Michael, 247 00:15:28,200 --> 00:15:30,440 what about the perspective from 248 00:15:32,120 --> 00:15:34,760 the end user or the agency of end users 249 00:15:34,760 --> 00:15:37,160 from your point of view, from, 250 00:15:37,800 --> 00:15:40,480 I guess, the 251 00:15:41,640 --> 00:15:45,400 Web Accessibility Initiative and that role, in 252 00:15:45,400 --> 00:15:47,400 how can we 253 00:15:49,240 --> 00:15:50,360 guide 254 00:15:51,240 --> 00:15:53,000 technical creators 255 00:15:53,000 --> 00:15:55,880 to ensure that end users retain 256 00:15:57,280 --> 00:15:59,480 autonomy 257 00:15:59,920 --> 00:16:02,680 when creating this kind of content?
258 00:16:05,280 --> 00:16:11,440 So you know first I would 259 00:16:12,080 --> 00:16:15,800 you know, look at you know, what are the ways in which, 260 00:16:16,640 --> 00:16:21,760 you know, machine learning generated descriptions and captions 261 00:16:21,760 --> 00:16:28,000 increase user agency and then there's ways that they decrease it as well. 262 00:16:28,000 --> 00:16:31,080 So you know, for instance, 263 00:16:31,080 --> 00:16:32,520 although 264 00:16:33,680 --> 00:16:35,800 we would prefer that authors provide 265 00:16:35,800 --> 00:16:39,720 these these features, if they don't, providing them 266 00:16:39,720 --> 00:16:43,480 via machine learning will help the user access the page 267 00:16:44,200 --> 00:16:47,960 and, you know, give them the agency they were looking for in their task. 268 00:16:49,480 --> 00:16:50,280 You know, the 269 00:16:50,280 --> 00:16:54,040 you know, the descriptions don't have to be perfect to provide that agency. 270 00:16:54,800 --> 00:16:58,600 That said, it's frustrating when they're not good enough. 271 00:16:58,600 --> 00:17:00,640 They can often mislead users 272 00:17:02,200 --> 00:17:03,720 and cause them 273 00:17:03,720 --> 00:17:07,520 to not get what they were looking for, spend time, etc. 274 00:17:08,960 --> 00:17:10,560 So, you know, that's 275 00:17:10,560 --> 00:17:13,080 a way that this can be a risk for users. 276 00:17:13,840 --> 00:17:16,640 And, you know, as you mentioned, 277 00:17:16,640 --> 00:17:21,520 there's likely to be a tendency for content developers to say, 278 00:17:21,520 --> 00:17:23,920 well, machine descriptions are there, so 279 00:17:24,920 --> 00:17:27,920 we don't need to worry about it 280 00:17:29,080 --> 00:17:30,720 now. So, you know, 281 00:17:30,720 --> 00:17:34,560 I think those are simply considerations that we 282 00:17:34,960 --> 00:17:38,080 you'll have to pay attention 283 00:17:38,080 --> 00:17:40,960 to in our advocacy 284 00:17:41,880 --> 00:17:43,960 in education work in the field 285 00:17:44,560 --> 00:17:47,920 also in documenting 286 00:17:49,080 --> 00:17:51,360 best practices for machine learning. 287 00:17:51,920 --> 00:17:56,920 For instance, the W3C has a publication called Ethical Principles 288 00:17:57,280 --> 00:18:01,720 for Web Machine Learning that, you know, you know, talk about 289 00:18:01,880 --> 00:18:05,320 they address accessibility considerations, among others. 290 00:18:06,320 --> 00:18:07,560 And, you know, it's 291 00:18:07,560 --> 00:18:12,000 possible that, you know, the industry might want 292 00:18:12,040 --> 00:18:17,600 a documented set of ethical principles or code of contact 293 00:18:18,280 --> 00:18:22,040 conduct that industry organizations sign on to saying here's 294 00:18:23,200 --> 00:18:26,600 here's accessibility ethics in machine learning that the 295 00:18:26,720 --> 00:18:30,080 you know, in addition to other ethics we are paying attention to. 296 00:18:30,360 --> 00:18:34,880 So those could be ways that we can support the growth of user agency in the end, 297 00:18:34,880 --> 00:18:39,360 the end of this, yeah. 298 00:18:39,360 --> 00:18:40,200 Thanks. 299 00:18:40,200 --> 00:18:44,560 Thank you for that perspective and for raising awareness 300 00:18:44,760 --> 00:18:46,200 to that kind of information. 301 00:18:46,200 --> 00:18:49,000 That's the WAI group is 302 00:18:50,240 --> 00:18:51,960 is making available. 303 00:18:51,960 --> 00:18:55,120 I think that's that's really important for everyone else to know. 
304 00:18:55,880 --> 00:18:59,240 So, Amy, what's your take on this, 305 00:18:59,760 --> 00:19:04,360 the impact that these tools can have on the agency of end users? 306 00:19:05,280 --> 00:19:05,600 Yeah. 307 00:19:05,600 --> 00:19:10,360 So I might answer this briefly from the sort of content creator side. 308 00:19:10,360 --> 00:19:12,760 So say you are out to make a description. 309 00:19:12,760 --> 00:19:14,440 How could we use A.I. 310 00:19:14,440 --> 00:19:16,560 to improve the description, 311 00:19:16,600 --> 00:19:19,720 to improve the quality of descriptions and the efficiency, 312 00:19:20,120 --> 00:19:22,440 rather than sacrificing one for the other? 313 00:19:22,960 --> 00:19:24,520 So I'll start with... 314 00:19:24,520 --> 00:19:25,760 I've worked on tools 315 00:19:25,760 --> 00:19:29,520 a lot in this space, and so I'll kind of start with what hasn't worked in the past 316 00:19:29,720 --> 00:19:32,920 and then share some possibilities on things that work a little bit better. 317 00:19:33,760 --> 00:19:36,880 So one thing that I've worked on for quite a while 318 00:19:36,880 --> 00:19:40,960 has been creating user-generated descriptions of videos. 319 00:19:41,680 --> 00:19:46,800 Video descriptions currently appear mostly in highly produced TV and film, 320 00:19:46,800 --> 00:19:48,760 and they're quite difficult 321 00:19:48,760 --> 00:19:51,080 to produce yourself because they're sort of an art form. 322 00:19:51,080 --> 00:19:54,640 You have to fit these descriptions within the dialog. 323 00:19:54,760 --> 00:19:56,720 They're really hard to make. 324 00:19:56,720 --> 00:20:00,080 So one thing we worked on was some tools to make it easier 325 00:20:00,080 --> 00:20:04,160 for people to create video descriptions by using A.I. 326 00:20:04,560 --> 00:20:08,960 So what didn't work was automatically generating these descriptions. 327 00:20:09,160 --> 00:20:12,520 The descriptions were often uninteresting and they didn't 328 00:20:12,520 --> 00:20:15,520 provide quite the depth that 329 00:20:15,520 --> 00:20:18,520 the original content creator had included in 330 00:20:18,640 --> 00:20:20,160 the visual information in the scene. 331 00:20:20,160 --> 00:20:21,680 So if the scene was really simple, 332 00:20:21,680 --> 00:20:23,680 like just a house and a tree, sure, it might get it. 333 00:20:24,240 --> 00:20:29,280 But if it was something that was domain-specific or had something 334 00:20:29,280 --> 00:20:32,280 extra to it that you might want to share, it was completely missing. 335 00:20:32,520 --> 00:20:34,520 And so one thing we looked at is how we could 336 00:20:34,520 --> 00:20:38,840 identify areas where people could add descriptions, or silences, or 337 00:20:38,840 --> 00:20:42,120 how we could identify things that weren't already described in the narration. 338 00:20:42,280 --> 00:20:46,600 So, at this point the narration of the video 339 00:20:47,360 --> 00:20:50,320 is talking about something completely unrelated to the visual content, 340 00:20:50,440 --> 00:20:52,800 so people might be missing out on that visual content.
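A rough sketch of the gap-finding idea Amy describes: flagging on-screen text that the narration never mentions, so an author knows where a description is probably needed. The OCR lines and the narration transcript are assumed to come from existing tools, and the word-overlap check is deliberately naive.

```python
# Sketch: flag slide or scene text that the spoken narration does not cover.
import re

def tokens(text: str) -> set:
    return set(re.findall(r"[a-z0-9']+", text.lower()))

def undescribed_text(ocr_lines: list[str], narration: str, overlap: float = 0.5) -> list[str]:
    spoken = tokens(narration)
    flagged = []
    for line in ocr_lines:
        words = tokens(line)
        if not words:
            continue
        covered = len(words & spoken) / len(words)
        if covered < overlap:          # most of this text was never spoken aloud
            flagged.append(line)
    return flagged

# Illustrative inputs: OCR output from one lecture slide plus its narration.
slide_text = ["Gradient descent update rule", "Learning rate = 0.01"]
narration = "So here we minimize the loss step by step using gradient descent."
print(undescribed_text(slide_text, narration))   # -> ['Learning rate = 0.01']
```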
341 00:20:53,040 --> 00:20:56,040 So rather than trying to like, automatically generate descriptions, 342 00:20:56,040 --> 00:21:00,840 I think promising approach can be to identify places 343 00:21:00,840 --> 00:21:05,800 where people could put in descriptions or if they write a description, identify 344 00:21:06,040 --> 00:21:08,840 parts of the image that that description doesn't cover yet. 345 00:21:09,040 --> 00:21:12,280 So I think there's kind of some cool opportunities to use 346 00:21:12,400 --> 00:21:17,040 AI in kind of unexpected ways to help people create better descriptions. 347 00:21:17,040 --> 00:21:19,840 And then I'll briefly address the end user part. 348 00:21:20,960 --> 00:21:22,880 You know, if if the user's lacking. 349 00:21:22,880 --> 00:21:26,080 And so the person using the captions or the descriptions, 350 00:21:26,080 --> 00:21:30,040 if they're lacking information that can decrease their ability 351 00:21:30,040 --> 00:21:32,880 to have agency and responding to that information. 352 00:21:32,880 --> 00:21:33,280 Right. 353 00:21:33,280 --> 00:21:35,400 But if you give them all of the information 354 00:21:35,400 --> 00:21:39,080 you in one big piece of alt text, then you might not be giving people 355 00:21:39,080 --> 00:21:41,360 much agency over what they're what they're hearing. 356 00:21:41,360 --> 00:21:43,440 You're probably not matching with the cognitive 357 00:21:43,440 --> 00:21:46,480 accessibility guidelines that Michael... Michael mentioned. 358 00:21:47,200 --> 00:21:50,200 And so I've experimented with some ways to try to like 359 00:21:51,240 --> 00:21:55,840 maybe help people use get agency over automated descriptions. 360 00:21:55,840 --> 00:22:00,120 The one thing we've played with a little bit is, you know, asking 361 00:22:00,640 --> 00:22:03,200 basically alerting people to the fact that there's a mismatch 362 00:22:03,200 --> 00:22:04,720 between the audio and visual. 363 00:22:04,720 --> 00:22:07,040 For instance, in listening to a lecture, 364 00:22:07,040 --> 00:22:11,080 hey, the lecturer hasn't talked about this piece of text that's on their slide. 365 00:22:11,480 --> 00:22:12,640 Would you like to hear more about it? 366 00:22:12,640 --> 00:22:15,720 And then people can optionally hear a little bit more about it. 367 00:22:15,880 --> 00:22:18,840 And that's, you know, something like OCR, which automatically detects 368 00:22:18,840 --> 00:22:20,120 text, works quite well. 369 00:22:20,120 --> 00:22:23,280 So I think there's these opportunities that you don't want to overwhelm people 370 00:22:23,280 --> 00:22:25,800 with information when they're doing a task that's not related. 371 00:22:25,800 --> 00:22:27,440 But there are some cool opportunities, 372 00:22:27,440 --> 00:22:31,240 I think, to like give people control over when they get more information. 373 00:22:31,240 --> 00:22:36,520 Yeah, Okay. 374 00:22:37,240 --> 00:22:39,680 Just and thanks for that, Amy. Also, 375 00:22:41,320 --> 00:22:44,200 just before moving to the next question 376 00:22:44,200 --> 00:22:47,880 that I had here, Matt Campbell 377 00:22:49,120 --> 00:22:51,680 asked a follow up question on this. 378 00:22:52,240 --> 00:22:57,480 So and it's about what you just mentioned, Michael So you mentioned 379 00:22:57,480 --> 00:23:01,640 that descriptions not being good enough are a risk for user agency. 
380 00:23:01,920 --> 00:23:06,240 And what Matt is asking is how much this can be 381 00:23:06,240 --> 00:23:10,960 mitigated by just tagging the descriptions as automatically generated. 382 00:23:10,960 --> 00:23:13,720 So, to 383 00:23:15,480 --> 00:23:18,160 give a perspective on this, and also Amy, 384 00:23:18,160 --> 00:23:22,120 if you want to. 385 00:23:22,120 --> 00:23:24,280 I'll try to give a quick answer. 386 00:23:25,360 --> 00:23:30,520 So the ARIA technology, the Accessible Rich Internet Applications 387 00:23:30,520 --> 00:23:37,040 technology, enhances HTML with the ability to point to a description 388 00:23:37,040 --> 00:23:40,880 elsewhere in the HTML document rather than providing a simple alt text, 389 00:23:41,200 --> 00:23:45,080 and that gives you the rich HTML capability. 390 00:23:45,680 --> 00:23:49,840 So we have that. Now, in terms of identifying it as a machine 391 00:23:50,320 --> 00:23:53,840 generated description, we don't have a semantic for that, 392 00:23:53,840 --> 00:23:57,240 but you know, that's the sort of thing that would get added to ARIA 393 00:23:57,240 --> 00:24:01,640 if the use cases were emerging. 394 00:24:01,640 --> 00:24:02,080 Yeah. 395 00:24:02,080 --> 00:24:06,280 So I'm happy to also answer this question. 396 00:24:06,280 --> 00:24:09,920 Well, maybe, I was looking at Matt's other question, which is kind of related, 397 00:24:09,920 --> 00:24:10,600 I think. So: 398 00:24:10,600 --> 00:24:14,240 are there other alternatives that are richer than alt text alone? 399 00:24:15,000 --> 00:24:17,160 One thing we've looked at a little bit is this. 400 00:24:18,000 --> 00:24:21,240 I've worked a little bit on the accessibility of complex scientific 401 00:24:21,240 --> 00:24:25,400 images, and what you end up with are these complex multipart diagrams 402 00:24:25,400 --> 00:24:28,240 that, if you try to describe in one single, 403 00:24:29,080 --> 00:24:31,760 you know, alt text field, it performs quite badly. 404 00:24:31,760 --> 00:24:35,200 So we're kind of starting to see, oh, could we automatically 405 00:24:35,800 --> 00:24:39,080 break that big piece of alt text down into a hierarchy 406 00:24:39,480 --> 00:24:42,160 to match the image, so that maybe people can more flexibly 407 00:24:42,960 --> 00:24:45,640 explore it, basically an HTML version 408 00:24:46,360 --> 00:24:49,640 that sort of captures the structure of the image that people could explore. 409 00:24:49,640 --> 00:24:50,760 So kind of trying 410 00:24:50,760 --> 00:24:54,400 to think about some other ways to present all the information that currently gets 411 00:24:54,400 --> 00:24:58,360 relegated sometimes to a single alt text into something that's a little more rich. 412 00:24:58,360 --> 00:25:05,680 Yeah. 413 00:25:05,680 --> 00:25:08,440 Carlos, you're on mute. Sorry. Thanks. 414 00:25:09,760 --> 00:25:12,080 Uh, and 415 00:25:12,080 --> 00:25:15,280 what I was saying is that since we have been coming 416 00:25:15,280 --> 00:25:20,200 back always to the topic of, or the concept of, quality, 417 00:25:21,040 --> 00:25:24,040 also when questioned by Mark, 418 00:25:24,400 --> 00:25:26,440 Mark Urban, I think, 419 00:25:27,120 --> 00:25:29,760 it would be rather interesting to know 420 00:25:29,760 --> 00:25:32,560 what your take on this is. So, 421 00:25:33,080 --> 00:25:38,440 is there a documented metric that measures the quality of an image description? 422 00:25:38,800 --> 00:25:41,680 And if there is,
423 00:25:42,280 --> 00:25:48,160 what would be the most important priorities for defining quality? 424 00:25:49,080 --> 00:25:52,760 Amy, do you want to go first? 425 00:25:52,760 --> 00:25:55,480 This is a hard question for me because I think the answer is no. 426 00:25:55,960 --> 00:25:59,680 But it's a really good question 427 00:25:59,680 --> 00:26:03,520 and something that we constantly sort of battle with. 428 00:26:04,120 --> 00:26:07,760 So we kind of use in our work, you know, a four-point description scale: 429 00:26:07,760 --> 00:26:11,720 no description, like literally nothing; one where there's 430 00:26:11,720 --> 00:26:14,800 something in the description field, but it's in no way related; 431 00:26:15,400 --> 00:26:19,000 there is something related to the image, but it's missing some key points; 432 00:26:19,000 --> 00:26:22,000 and, this covers most of the key points in the image. And we've kind of been 433 00:26:22,000 --> 00:26:27,480 using this, but what those values mean depends a lot on the domain and what 434 00:26:28,640 --> 00:26:30,840 task the person is using the image for. 435 00:26:30,840 --> 00:26:33,960 But it's been, you know, we've used this in a couple of papers 436 00:26:33,960 --> 00:26:38,120 and it's just been like a way for us to, you know, make progress on this problem. 437 00:26:38,120 --> 00:26:41,600 And we've also, for each domain we're working in, kind of tried to inform 438 00:26:41,600 --> 00:26:44,640 it based on existing guidelines, as well as, you know, 439 00:26:44,640 --> 00:26:48,400 literally the existing W3C guidelines, as well as 440 00:26:48,400 --> 00:26:51,720 what users have told us that is specific to that domain. 441 00:26:51,880 --> 00:26:53,200 But I don't know of a good one. 442 00:26:53,200 --> 00:26:54,600 And that's something that 443 00:26:54,600 --> 00:26:58,360 we just sort of worked around, but I think it would be great to have more 444 00:26:58,600 --> 00:27:00,520 efforts on that in the future. 445 00:27:00,520 --> 00:27:05,320 Yeah, definitely something that's been more qualitative than quantitative. 446 00:27:05,320 --> 00:27:06,040 Definitely. 447 00:27:06,040 --> 00:27:08,720 That's what you just described. 448 00:27:08,720 --> 00:27:10,000 It's a good way to start. 449 00:27:10,000 --> 00:27:14,280 So Shivam, what's your take on the quality of image descriptions? 450 00:27:15,160 --> 00:27:19,360 Sure. So I guess when we come to an industry 451 00:27:19,760 --> 00:27:22,440 setup, right, we have certain evaluation tools. 452 00:27:23,360 --> 00:27:26,840 We evaluate our models as well as some of the outputs; there's 453 00:27:26,840 --> 00:27:28,760 rigorous testing that goes on. 454 00:27:28,760 --> 00:27:32,200 But there's no set of metrics that we have. 455 00:27:32,640 --> 00:27:36,440 But certainly we have some rules: we have the W3C guidelines, we have 456 00:27:36,920 --> 00:27:39,720 some other guidelines as well that are in place. 457 00:27:40,440 --> 00:27:41,480 They are not set rules, 458 00:27:41,480 --> 00:27:47,120 but yeah, we have those as a yardstick and we can really only test based on that. 459 00:27:47,400 --> 00:27:49,400 So there can be some work done there. 460 00:27:49,400 --> 00:27:54,880 But yeah, certainly this is what we have currently. 461 00:27:54,880 --> 00:27:56,680 Okay. Okay. Yeah.
And Michael 462 00:27:57,680 --> 00:28:01,000 Amy just mentioned in her answer 463 00:28:01,400 --> 00:28:06,040 looking also at the definitions that the W3C provides. 464 00:28:06,040 --> 00:28:08,200 So do you want to add something on 465 00:28:08,200 --> 00:28:13,240 how we can measure the quality of image descriptions? 466 00:28:13,240 --> 00:28:16,440 The only thing I would really add to what she said is this. 467 00:28:16,560 --> 00:28:19,960 So we produce resources like Understanding WCAG, 468 00:28:21,080 --> 00:28:23,440 understanding the Web Content Accessibility Guidelines, which 469 00:28:24,440 --> 00:28:28,280 go into, when you're writing image descriptions, what are the considerations? 470 00:28:28,280 --> 00:28:30,640 How would you make a good one? 471 00:28:30,720 --> 00:28:33,320 And one of the big challenges I think for machine 472 00:28:33,320 --> 00:28:37,480 learning in particular is that the quality, 473 00:28:38,120 --> 00:28:41,880 the appropriate description for an image will depend very much on its context. 474 00:28:42,360 --> 00:28:45,400 We describe several different contexts in the guide, 475 00:28:45,400 --> 00:28:49,520 in the support materials, and yeah, 476 00:28:49,720 --> 00:28:53,320 the right description for one is the wrong one for another. 477 00:28:53,320 --> 00:28:56,440 So sorting that out I think is going to be one of the big challenges 478 00:28:56,680 --> 00:29:00,280 beyond what others have said. 479 00:29:00,280 --> 00:29:01,480 Yeah, definitely. 480 00:29:01,480 --> 00:29:05,840 I have to agree with you. Apparently we're losing 481 00:29:06,400 --> 00:29:09,960 Shivam intermittently, but okay, he is back now. 482 00:29:10,520 --> 00:29:10,840 Okay. 483 00:29:10,840 --> 00:29:14,800 And I'm going to combine two questions 484 00:29:14,800 --> 00:29:17,480 that we have here in the Q&A. 485 00:29:18,240 --> 00:29:21,160 The one from Jan Benjamin 486 00:29:21,160 --> 00:29:23,440 and the other one from Wilco Fiers. 487 00:29:23,440 --> 00:29:26,480 So this is more about 488 00:29:28,800 --> 00:29:30,760 qualifying images 489 00:29:30,760 --> 00:29:34,280 than really generating descriptions for the image. 490 00:29:34,680 --> 00:29:36,760 So Jan asks 491 00:29:38,320 --> 00:29:40,880 if AI can differentiate between, 492 00:29:41,320 --> 00:29:44,080 for example, functional and decorative images, 493 00:29:44,080 --> 00:29:47,680 instead of generating a description, just differentiating between 494 00:29:49,160 --> 00:29:51,800 an image that needs a description and one that doesn't? 495 00:29:51,800 --> 00:29:54,400 And Wilco asks 496 00:29:55,400 --> 00:29:59,680 if it's viable to spot images where automated captions 497 00:29:59,920 --> 00:30:04,920 will likely be sufficient, so that content authors can focus on those 498 00:30:04,960 --> 00:30:07,400 and leave the AI 499 00:30:07,960 --> 00:30:11,520 to caption, to describe, the others that might be easier for them. 500 00:30:11,520 --> 00:30:15,520 So, Amy, would you go first? 501 00:30:16,440 --> 00:30:17,760 Sure. Yeah. 502 00:30:17,760 --> 00:30:19,880 So I love both of these questions. 503 00:30:20,400 --> 00:30:24,040 So I would say to Jan's question, I don't think, 504 00:30:24,040 --> 00:30:27,640 you know, I guess when the question is, can AI do this? 505 00:30:27,880 --> 00:30:32,320 You know, we've tried this a little bit, for slide presentations. 506 00:30:32,320 --> 00:30:33,640 And the answer is yes.
507 00:30:33,640 --> 00:30:35,640 To some extent. It's going to fail in some places. 508 00:30:35,640 --> 00:30:38,200 But just to give you kind of an idea of how, 509 00:30:38,760 --> 00:30:42,440 you know, AI could maybe help detect decorative 510 00:30:42,440 --> 00:30:44,680 versus more informative images: 511 00:30:44,800 --> 00:30:47,720 in the context of a slide presentation, you know, 512 00:30:47,720 --> 00:30:51,520 informative images might be more complex, they might be more related 513 00:30:51,520 --> 00:30:56,080 to the content on the rest of the slide and in the narration. Informative images, 514 00:30:56,720 --> 00:31:00,560 they might be larger on the screen, whereas decorative images in slides 515 00:31:01,000 --> 00:31:04,720 might be, you know, little decorations on the side. 516 00:31:04,720 --> 00:31:08,680 They might be logos, or emojis, or less 517 00:31:08,680 --> 00:31:12,240 related to the content on the screen. 518 00:31:12,240 --> 00:31:13,360 So what we found out is 519 00:31:13,360 --> 00:31:17,160 we can do a decent job at this, but it will fail in some cases, 520 00:31:17,160 --> 00:31:17,720 like, 521 00:31:17,720 --> 00:31:21,320 you know, maybe an image is included, but there's no other information about it. 522 00:31:21,680 --> 00:31:23,720 And so it's tricky. 523 00:31:23,720 --> 00:31:27,080 I think in doing this, you would want to be overly inclusive 524 00:31:27,080 --> 00:31:29,440 about the images that you identify as informative, 525 00:31:30,280 --> 00:31:33,760 so that maybe you could help content authors make sure that they at least 526 00:31:33,760 --> 00:31:35,160 review most of the images. 527 00:31:36,400 --> 00:31:37,120 And then I would say 528 00:31:37,120 --> 00:31:41,040 to Wilco: yeah, I think that's a great idea. 529 00:31:41,040 --> 00:31:43,040 We've tried it a little bit on Twitter. 530 00:31:43,040 --> 00:31:46,120 So one time we ran basically a bunch of different 531 00:31:47,280 --> 00:31:49,360 AI methods to try to describe images on Twitter. 532 00:31:49,880 --> 00:31:53,680 And so for each image we would try to run captioning, OCR, 533 00:31:54,360 --> 00:31:57,800 we did this URL tracing to see if we could find a caption elsewhere 534 00:31:57,800 --> 00:32:02,480 on the web, and basically if all of those had low confidence, or 535 00:32:03,200 --> 00:32:06,760 they didn't return anything, then we kind of automatically sent 536 00:32:07,000 --> 00:32:11,800 the image on to get human-written descriptions. And another thing 537 00:32:11,800 --> 00:32:15,760 we explored is users optionally retrieving that description. 538 00:32:15,760 --> 00:32:16,840 So I think it's possible. 539 00:32:16,840 --> 00:32:19,960 I think that there are subtleties there 540 00:32:19,960 --> 00:32:21,680 that would be really difficult to handle automatically. 541 00:32:21,680 --> 00:32:25,360 But at least that was a way, given how many images were on Twitter without 542 00:32:25,680 --> 00:32:28,360 a description, for us to filter out the ones 543 00:32:28,360 --> 00:32:31,360 where we definitely needed to get more information from a human. 544 00:32:31,360 --> 00:32:33,760 Yeah, great. 545 00:32:34,080 --> 00:32:37,800 Thanks for sharing those experiences. Shivam...
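A minimal sketch of the triage Amy outlines above for Twitter images: run several automatic describers and queue the image for a human only when none of them is confident. The describer callables and the 0.6 threshold are placeholders rather than a published pipeline.

```python
# Sketch: pick the most confident automatic description, or escalate to a human.
from dataclasses import dataclass
from typing import Callable, Optional

@dataclass
class Candidate:
    source: str        # "caption", "ocr", "url-trace", ...
    text: str
    confidence: float  # 0.0 - 1.0, as reported by the underlying method

def describe_or_escalate(
    image_path: str,
    describers: list[Callable[[str], Optional[Candidate]]],
    threshold: float = 0.6,
) -> Candidate:
    candidates = [c for d in describers if (c := d(image_path)) is not None]
    best = max(candidates, key=lambda c: c.confidence, default=None)
    if best is None or best.confidence < threshold:
        # Nothing confident enough: queue the image for a human-written description.
        return Candidate("human-queue", "Sent for human description", 0.0)
    return best
```

Read the other way around, the same routing speaks to Wilco's question: images where one method clears the threshold comfortably can be left to automation, and author attention goes to the rest.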
546 00:32:39,800 --> 00:32:41,600 Yeah, I guess 547 00:32:41,600 --> 00:32:44,760 I have had an encounter with this scenario where 548 00:32:45,920 --> 00:32:48,040 I had to get descriptions of images that 549 00:32:48,040 --> 00:32:51,760 most likely would not get a very sufficient machine description. 550 00:32:52,080 --> 00:32:55,680 So there are always tools that can do that for you 551 00:32:55,680 --> 00:32:57,440 on websites. 552 00:32:57,440 --> 00:32:59,960 I think there are multiple plugins that you can use. 553 00:33:00,360 --> 00:33:04,320 You can get certain descriptions, and people can put certain human descriptions 554 00:33:04,320 --> 00:33:05,080 out there 555 00:33:05,920 --> 00:33:09,040 to mark them, to spot them in a scalable manner. 556 00:33:09,040 --> 00:33:12,080 It sometimes doesn't become scalable, and that's the whole issue. 557 00:33:12,320 --> 00:33:13,280 You can have a tool, 558 00:33:13,280 --> 00:33:17,120 but it might not be scalable for every user out there, every website out there. 559 00:33:17,120 --> 00:33:19,600 So this can be done. 560 00:33:19,600 --> 00:33:23,400 But yeah, again, there are some places where it can be used and some where it can't. 561 00:33:24,400 --> 00:33:27,520 So certainly the technology is the answer; how to scale 562 00:33:27,520 --> 00:33:30,760 it is the question. 563 00:33:31,360 --> 00:33:32,560 Great, thanks, Shivam. 564 00:33:34,120 --> 00:33:36,480 Michael, do you have any input on this? 565 00:33:37,280 --> 00:33:38,920 No, not on this one. Yeah. 566 00:33:38,920 --> 00:33:40,000 Okay. 567 00:33:40,000 --> 00:33:46,440 Um, that takes me back to one question that I had here. 568 00:33:46,440 --> 00:33:50,000 Uh, and I think this is an opportunity 569 00:33:50,320 --> 00:33:52,840 to go back there, and I will start with you, Michael. 570 00:33:53,400 --> 00:33:55,520 It's 571 00:33:56,080 --> 00:33:59,400 going in a different direction from where we have been going so far, 572 00:33:59,800 --> 00:34:02,000 but how do you think 573 00:34:03,000 --> 00:34:05,200 we need to deal with 574 00:34:05,440 --> 00:34:08,920 legal, copyright and responsibility issues 575 00:34:08,920 --> 00:34:11,720 when generating descriptions 576 00:34:12,120 --> 00:34:16,560 with AI-based models? 577 00:34:16,840 --> 00:34:21,560 How do we tackle that? 578 00:34:21,560 --> 00:34:22,520 Yeah. 579 00:34:22,800 --> 00:34:23,200 Okay. 580 00:34:23,200 --> 00:34:26,440 So, you know, again, I'm not speaking 581 00:34:26,440 --> 00:34:29,320 as a legal professional, but the issues that I know: 582 00:34:30,880 --> 00:34:36,160 in general, at least for accessibility, there is often a fair use right, 583 00:34:36,160 --> 00:34:38,080 the right to transform content. 584 00:34:38,080 --> 00:34:43,000 But to circle back to that, 585 00:34:43,000 --> 00:34:47,080 you know, that's our priority, but that's my first answer. 586 00:34:47,080 --> 00:34:49,080 But then there are issues around accuracy. 587 00:34:50,800 --> 00:34:52,600 So, you know, if 588 00:34:52,600 --> 00:34:55,520 a machine has generated a caption 589 00:34:55,640 --> 00:34:59,480 or description, you know, how accurate is that description? 590 00:34:59,520 --> 00:35:01,840 Who knows how accurate it is? 591 00:35:01,840 --> 00:35:05,920 And also, publishing it, especially with potential inaccuracies, 592 00:35:06,320 --> 00:35:08,760 can bring on, you know, liability consequences, 593 00:35:09,040 --> 00:35:12,040 even if allowing that publication
594 00:35:12,040 --> 00:35:15,160 is otherwise very useful. 595 00:35:15,160 --> 00:35:18,320 So another challenge is 596 00:35:19,520 --> 00:35:21,080 meeting requirements. 597 00:35:21,080 --> 00:35:26,280 You know, the accuracy might be pretty high, but still not quite right. 598 00:35:26,280 --> 00:35:29,600 If it's a legal document, it might not be sufficient. 599 00:35:29,880 --> 00:35:34,360 So the accuracy of these kinds of descriptions 600 00:35:34,360 --> 00:35:35,560 is going to be a big, 601 00:35:35,560 --> 00:35:38,560 you know, legal challenge, I think, from a bunch of different directions. 602 00:35:39,880 --> 00:35:42,920 You know, of course there is the benefit, the reason to do it, 603 00:35:43,120 --> 00:35:46,360 and this can still be better than nothing for many users, 604 00:35:47,000 --> 00:35:50,080 you know, who get used to some of the inaccuracies. 605 00:35:50,720 --> 00:35:52,760 And it does provide scalability, 606 00:35:53,120 --> 00:35:57,000 you know, given how image- and video-focused our web has become. 607 00:35:58,600 --> 00:35:59,880 So I would 608 00:35:59,880 --> 00:36:00,680 highlight one of 609 00:36:00,680 --> 00:36:04,520 the ethical principles from the web machine learning ethics document, which is that 610 00:36:04,520 --> 00:36:08,680 it should be clear that the content is machine generated. 611 00:36:09,040 --> 00:36:12,320 That allows many actors to evaluate 612 00:36:13,280 --> 00:36:17,760 it. And then, you know, circling back to fair use, 613 00:36:18,400 --> 00:36:22,960 I think who is doing the generating or publishing 614 00:36:23,160 --> 00:36:27,920 of machine learning content will probably impact that. If it's a user 615 00:36:27,920 --> 00:36:31,960 agent or assistive technology, it probably is covered by fair use. 616 00:36:32,800 --> 00:36:35,400 And if the content producer is doing it, 617 00:36:36,200 --> 00:36:40,920 you know, they probably are declaring fair use for themselves. 618 00:36:41,320 --> 00:36:45,400 But the responsibility for accuracy will be higher for them 619 00:36:46,000 --> 00:36:48,400 because they are now the publisher. 620 00:36:49,040 --> 00:36:52,480 And then there are, you know, third-party agents of various 621 00:36:52,480 --> 00:36:54,880 sorts, accessibility remediation tools, 622 00:36:56,440 --> 00:36:58,720 other sorts, 623 00:36:58,720 --> 00:37:04,640 where I assume it's a legal Wild West. 624 00:37:04,640 --> 00:37:05,720 Yeah, definitely. 625 00:37:05,720 --> 00:37:09,960 And to make it worse, I guess there are many Wild Wests, 626 00:37:09,960 --> 00:37:14,080 because every country, every region might have different 627 00:37:14,240 --> 00:37:15,600 legal constraints there. 628 00:37:16,600 --> 00:37:17,480 Shivam, 629 00:37:18,120 --> 00:37:19,720 any take on this? 630 00:37:19,960 --> 00:37:20,560 Yeah. 631 00:37:20,560 --> 00:37:23,960 So I have a more holistic than technical view on this. 632 00:37:24,160 --> 00:37:27,880 This is an ongoing issue in a lot of countries now. 633 00:37:28,360 --> 00:37:31,200 So you see, almost all publicly available datasets, right... 634 00:37:31,680 --> 00:37:35,600 these are data that are associated with some or other form of copyright. 635 00:37:35,760 --> 00:37:36,200 Right.
636 00:37:36,400 --> 00:37:39,640 And there is no framework for most of what 637 00:37:40,280 --> 00:37:43,360 deals with the legality of AI-generated captions, I mean, 638 00:37:43,360 --> 00:37:46,880 there is no written law anywhere currently; it might come later, 639 00:37:47,200 --> 00:37:52,280 maybe in the US first. So this is one complexity, and there is another complexity 640 00:37:52,280 --> 00:37:55,640 also, the owning of AI-generated data... who would own that data, right? 641 00:37:55,640 --> 00:37:59,000 I mean, if it's machine-generated data, who would be owning it, 642 00:37:59,040 --> 00:38:03,080 the industry that has built that model, or the dataset that has been 643 00:38:03,640 --> 00:38:05,200 gathered from different data sources? 644 00:38:05,200 --> 00:38:07,480 Now, this is a very complex challenge. 645 00:38:07,480 --> 00:38:10,480 The other part of it is, how would you fix the responsibility? 646 00:38:10,840 --> 00:38:13,600 With that in mind, it depends on the end user of the 647 00:38:13,600 --> 00:38:14,840 ML model, when you use that, 648 00:38:15,840 --> 00:38:17,280 in what context are you using it? 649 00:38:17,280 --> 00:38:20,200 I mean, for example, some of the models are used in 650 00:38:20,400 --> 00:38:21,400 academia, right? 651 00:38:21,400 --> 00:38:24,520 These are just for research and development purposes. 652 00:38:24,760 --> 00:38:28,440 There is no way where you can 653 00:38:28,480 --> 00:38:31,640 fix the responsibility for an ML output on academia. 654 00:38:31,640 --> 00:38:32,080 Right. 655 00:38:32,160 --> 00:38:34,960 So this plays out in two ways: 656 00:38:35,520 --> 00:38:38,200 there is how you're sourcing the data. 657 00:38:38,520 --> 00:38:42,280 Either you have to get clarity on the data, where it is coming from: 658 00:38:42,280 --> 00:38:46,320 you gather your data based on written sources, 659 00:38:46,320 --> 00:38:49,360 you have a mutual understanding between the data generator or 660 00:38:49,360 --> 00:38:51,840 creator and you, and then you train on the data. 661 00:38:51,960 --> 00:38:55,080 But that gives you a complexity where you have very little data 662 00:38:55,280 --> 00:38:57,880 and there is a large input needed for training your model. 663 00:38:58,120 --> 00:39:01,640 So yeah, these are the complexities currently, but it all depends on 664 00:39:01,640 --> 00:39:04,560 where the ML model or its output is being used, 665 00:39:05,120 --> 00:39:07,160 and that's where the fair use policy comes in. 666 00:39:09,560 --> 00:39:13,040 Context all the way, in all scenarios, right? 667 00:39:14,160 --> 00:39:16,600 Amy? Yeah, 668 00:39:16,600 --> 00:39:21,200 so I am not as familiar with, kind of, the legal and copyright side of this, 669 00:39:21,200 --> 00:39:26,080 but I do think, you know, oftentimes I do think about the responsibility 670 00:39:26,080 --> 00:39:28,440 aspects of the captions that we're generating, 671 00:39:28,440 --> 00:39:31,960 especially when we're doing these kinds of new forms of it, 672 00:39:32,000 --> 00:39:34,840 where we're generating things for user-generated media. 673 00:39:35,000 --> 00:39:37,120 And I think this goes back more to the 674 00:39:38,120 --> 00:39:41,200 potential harms brought up in the keynote.
675 00:39:41,480 --> 00:39:45,880 So so for instance, like I guess one thing I often am thinking about is like 676 00:39:46,280 --> 00:39:50,920 when are errors not that big of a deal and when are they a bigger deal? 677 00:39:50,920 --> 00:39:54,640 And then, you know, kind of trade looking at their risks and trade offs 678 00:39:54,640 --> 00:39:59,000 in terms of like who like who's receiving the image and who's or who's 679 00:39:59,000 --> 00:40:02,840 getting identified by the the tool and who is receiving the image. 680 00:40:03,720 --> 00:40:08,800 So, for instance, if I misidentified my shirt as dark blue 681 00:40:08,800 --> 00:40:12,600 instead of black, this error is unlikely to be as harmful to me, 682 00:40:12,920 --> 00:40:15,120 but for some people might experience 683 00:40:15,640 --> 00:40:18,400 misgendering them with image classification to be harmful. 684 00:40:18,600 --> 00:40:21,280 And so I guess two ways I've seen with dealing with this. 685 00:40:22,120 --> 00:40:26,320 You know, not to say that either of them is good right now. 686 00:40:26,640 --> 00:40:29,760 So one is like I think a lot of tools actually back off 687 00:40:29,760 --> 00:40:32,640 to saying person instead of woman or man. 688 00:40:33,280 --> 00:40:37,160 And another way that you could imagine doing it is also like describing 689 00:40:37,480 --> 00:40:41,200 physical characteristics of the person that are less subjective. 690 00:40:41,400 --> 00:40:46,120 And a final way you might imagine doing it is like take... is considering people's 691 00:40:46,120 --> 00:40:49,480 own identifications of how they would like to be described, 692 00:40:49,840 --> 00:40:51,680 and sometimes that varies in different contexts. 693 00:40:51,680 --> 00:40:54,000 So I think that's itself a hard problem. 694 00:40:54,000 --> 00:40:56,920 But yeah, I don't have much to say on the legal or copyright side. 695 00:40:56,920 --> 00:40:58,120 I just wanted to bring up that. 696 00:40:58,120 --> 00:41:00,440 That's something that's come up in my work before. Yeah. 697 00:41:01,520 --> 00:41:02,080 Okay. 698 00:41:02,120 --> 00:41:03,440 Thank you so much. 699 00:41:03,440 --> 00:41:06,400 I think we're almost at the end. 700 00:41:06,400 --> 00:41:11,960 We have less than 10 minutes, but and questions keep coming, which is great. 701 00:41:11,960 --> 00:41:16,360 So you will have the opportunity, I guess, to to try to answer somewhat, 702 00:41:16,360 --> 00:41:20,560 some of them offline if you if you wish to, But I'll still take another one. 703 00:41:20,720 --> 00:41:24,240 The last one that we have here from Antonio Gambabari, 704 00:41:24,760 --> 00:41:27,320 and I think it's 705 00:41:27,320 --> 00:41:31,640 that the question is how do you envision the challenges of explainable A.I. 706 00:41:31,640 --> 00:41:34,360 initiatives in the context of image recognition? 707 00:41:34,360 --> 00:41:34,840 Right. 708 00:41:34,880 --> 00:41:38,400 And I think this relates to several of the aspects 709 00:41:38,680 --> 00:41:42,320 that we've dealt with, with the uncertainty of images 710 00:41:42,320 --> 00:41:48,120 and how do we convey that to users even just by labeling 711 00:41:48,120 --> 00:41:52,600 something as automatically generated would be a way to convey that. 712 00:41:52,960 --> 00:41:56,200 But do you think that explainable A.I. 713 00:41:56,200 --> 00:42:00,080 initiatives have the potential to improve this kind of 714 00:42:02,520 --> 00:42:04,720 augmented context for the user? 
715 00:42:04,720 --> 00:42:08,240 And where the description came from? 716 00:42:08,680 --> 00:42:12,040 And this time, I'll start with you, Shivam. 717 00:42:12,040 --> 00:42:15,400 I think yes, and it is a good point. 718 00:42:15,400 --> 00:42:18,760 Explainable AI initiatives deal with how 719 00:42:19,960 --> 00:42:23,880 metadata can help the end user know the context of 720 00:42:23,880 --> 00:42:27,680 what is being generated, any quantitative score on any of the models. 721 00:42:27,720 --> 00:42:33,040 It is supported by a lot of data that goes beyond your training data. 722 00:42:33,880 --> 00:42:37,720 There is a distinction, though: for whatever you are getting 723 00:42:37,720 --> 00:42:41,200 as an output, right, there are multiple layers of training. 724 00:42:41,200 --> 00:42:43,320 If you look into the training, there are multiple layers of it. 725 00:42:43,320 --> 00:42:46,960 So for how that decision has been made by an AI, it can give you 726 00:42:46,960 --> 00:42:49,720 a certain level of metadata, but not all. 727 00:42:50,080 --> 00:42:53,800 So yeah, it can augment the user, but that won't be the complete solution. 728 00:42:53,800 --> 00:42:57,880 But that's how I see it. 729 00:42:57,880 --> 00:42:58,320 Amy, 730 00:42:59,600 --> 00:43:01,120 any thoughts on this? 731 00:43:01,120 --> 00:43:03,480 Yeah, so that's a good question. 732 00:43:03,480 --> 00:43:05,840 I don't know. 733 00:43:05,840 --> 00:43:10,000 So, thinking about some things that I've seen: 734 00:43:11,040 --> 00:43:13,880 one thing I would think about a little bit in this, 735 00:43:13,960 --> 00:43:16,520 and have had to think about before, is sort of 736 00:43:17,040 --> 00:43:20,880 the tradeoff between receiving information efficiently 737 00:43:20,880 --> 00:43:24,400 and explaining where you got all of that information from. 738 00:43:25,120 --> 00:43:28,320 And I think both are important, and I think 739 00:43:29,080 --> 00:43:31,600 what my experience has been is that users 740 00:43:31,600 --> 00:43:35,280 are used to certain types of errors and can recover from them quickly. 741 00:43:35,400 --> 00:43:37,560 So for instance, 742 00:43:37,600 --> 00:43:40,600 when a user is reviewing their own content, for example 743 00:43:40,600 --> 00:43:45,400 pictures or video they took, and they hear something described as a leash, 744 00:43:45,400 --> 00:43:47,920 I have had the experience of users being like, oh no, that's my cane. 745 00:43:48,040 --> 00:43:50,120 It always calls my cane a leash. 746 00:43:50,120 --> 00:43:53,800 So I think in some cases people can get used 747 00:43:53,800 --> 00:43:58,080 to identifying the errors for the known unknowns. 748 00:43:58,080 --> 00:44:00,520 This is just a wrong identification, and I'm used to it. 749 00:44:00,640 --> 00:44:04,160 And I do think it's harder to recover from errors that are unknown unknowns. 750 00:44:04,160 --> 00:44:07,240 You don't have any other context about it, so you're not sure what else it would be.
751 00:44:07,360 --> 00:44:11,320 And I think maybe in those cases where users haven't identified it before, 752 00:44:11,840 --> 00:44:15,800 that confidence information would be extra important. And so yeah, 753 00:44:15,800 --> 00:44:17,920 I'm not really sure what the answer is, but I think that 754 00:44:18,040 --> 00:44:22,880 considering the balance between what's important to know 755 00:44:22,880 --> 00:44:26,760 more information about will be a tricky design question as well as 756 00:44:27,920 --> 00:44:31,160 a question of how to develop the technology. 757 00:44:31,280 --> 00:44:31,880 Okay, great. 758 00:44:31,880 --> 00:44:32,440 Thanks. 759 00:44:32,440 --> 00:44:35,840 And Michael, any input on this one? 760 00:44:36,520 --> 00:44:39,640 So I would just add to all that that, you know, 761 00:44:39,640 --> 00:44:44,440 this again falls into the question of ethics and transparency, 762 00:44:44,440 --> 00:44:47,920 and explainability is one of the sections of the machine learning ethics document; 763 00:44:48,560 --> 00:44:51,360 it is intended to cover several aspects of this. 764 00:44:51,480 --> 00:44:54,080 You should know how the machine learning was built. 765 00:44:54,080 --> 00:44:56,560 It should be auditable for various issues. 766 00:44:56,800 --> 00:45:00,920 These ethics are probably less specific to some of the use cases 767 00:45:00,920 --> 00:45:04,920 we're discussing in this symposium, so there might be room for adding 768 00:45:05,000 --> 00:45:08,440 to this section of the document. 769 00:45:08,440 --> 00:45:09,400 Yeah. Yeah. 770 00:45:09,400 --> 00:45:11,840 I think that might be a good idea. 771 00:45:11,840 --> 00:45:15,040 And I'll take just the final one, 772 00:45:16,360 --> 00:45:19,760 and I'll go back to the topic with one from Matt, 773 00:45:19,840 --> 00:45:23,120 because it's something that we have touched upon before, 774 00:45:23,680 --> 00:45:26,440 and I'll start with you, Michael, here, because 775 00:45:26,440 --> 00:45:30,400 you were mentioning this in the scope of ARIA. 776 00:45:30,880 --> 00:45:36,640 And so it's the question about having richer alternatives to the image 777 00:45:36,640 --> 00:45:40,600 description, to the standard alt text, which is usually short. 778 00:45:41,080 --> 00:45:44,440 What are your thoughts on the usefulness 779 00:45:44,440 --> 00:45:47,720 of having richer descriptions 780 00:45:48,200 --> 00:45:54,640 for image alternatives? 781 00:45:54,640 --> 00:45:56,000 Oh, let’s see, 782 00:45:58,240 --> 00:45:58,960 as far as the 783 00:45:58,960 --> 00:46:01,640 general idea, in terms of the usefulness of 784 00:46:02,240 --> 00:46:08,680 making use of richer descriptions: 785 00:46:11,920 --> 00:46:12,880 for very simple 786 00:46:12,880 --> 00:46:15,720 images, you know, sort of the way the web 787 00:46:16,320 --> 00:46:19,680 started, where images were largely providing small 788 00:46:19,680 --> 00:46:21,280 functional roles, the alt attribute 789 00:46:21,280 --> 00:46:23,840 was probably sufficient for many of those cases. 790 00:46:23,840 --> 00:46:28,720 Images are being used nowadays for a variety of purposes. 791 00:46:29,680 --> 00:46:33,480 Some of them are reducible to an alt like "photo of my dog", 792 00:46:33,480 --> 00:46:35,760 but that's not really providing the experience.
793 00:46:35,760 --> 00:46:39,560 So, you know, there's definitely 794 00:46:40,920 --> 00:46:44,040 a need for richer 795 00:46:45,840 --> 00:46:51,120 and longer alternatives, ones that can have structure, 796 00:46:51,120 --> 00:46:54,440 that you can skim depending on the context, ones 797 00:46:54,440 --> 00:46:57,880 that can provide links to the necessary bits of alternative data. 798 00:46:58,440 --> 00:46:59,640 There was also 799 00:46:59,640 --> 00:47:01,840 a question about images and charts. 800 00:47:01,840 --> 00:47:06,280 Often the description for a chart is much more structured semantically 801 00:47:06,520 --> 00:47:09,000 than one for other kinds of images, and there 802 00:47:09,160 --> 00:47:12,640 you really want to be able to take advantage of rich text markup. So 803 00:47:13,920 --> 00:47:15,760 I believe that 804 00:47:15,760 --> 00:47:18,840 assistive technologies are supporting 805 00:47:18,840 --> 00:47:22,360 rich text descriptions whenever they're available. 806 00:47:23,360 --> 00:47:27,640 So it's a question of getting people to use them more. 807 00:47:27,640 --> 00:47:31,840 And of course, for machine learning generally, we would rather have 808 00:47:31,840 --> 00:47:36,400 richer rather than less rich output. 809 00:47:36,400 --> 00:47:37,240 Okay. Yeah. 810 00:47:37,240 --> 00:47:45,120 And following up on that, for Shivam and for Amy: by having richer 811 00:47:45,280 --> 00:47:48,160 and longer descriptions, 812 00:47:48,160 --> 00:47:52,480 are we increasing the chances that 813 00:47:52,920 --> 00:47:56,320 AI generated descriptions will mess up, 814 00:47:56,800 --> 00:48:00,040 or isn't that a risk? 815 00:48:00,040 --> 00:48:02,520 Who wants to start? 816 00:48:02,520 --> 00:48:06,880 Amy? Sure. I think, yeah, I agree 817 00:48:06,880 --> 00:48:10,360 that oftentimes the more details 818 00:48:10,360 --> 00:48:13,840 that you get, the more 819 00:48:13,840 --> 00:48:16,080 opportunities there are for errors. 820 00:48:16,400 --> 00:48:19,080 I think one way that we've explored this 821 00:48:19,080 --> 00:48:23,120 a little bit, for 822 00:48:23,600 --> 00:48:27,440 very informative images that maybe a lot of people will see, 823 00:48:27,840 --> 00:48:30,120 is thinking about how we could combine 824 00:48:31,320 --> 00:48:32,280 automated tools 825 00:48:32,280 --> 00:48:35,480 with human written descriptions 826 00:48:35,480 --> 00:48:38,520 to hopefully make some of the descriptions better. 827 00:48:38,520 --> 00:48:42,480 So maybe automated tools could help automatically extract 828 00:48:42,480 --> 00:48:46,480 the structure of the image, and then humans could go in to write 829 00:48:47,200 --> 00:48:50,480 more detail about the parts of the images that are really unlikely 830 00:48:50,480 --> 00:48:54,520 to be fully described by the computer. 831 00:48:54,520 --> 00:48:57,600 So I think for now, the way 832 00:48:57,600 --> 00:49:00,880 I've been thinking about those more complex images is often: 833 00:49:00,880 --> 00:49:04,040 how are we going to help humans create descriptions 834 00:49:04,960 --> 00:49:07,240 more efficiently while still maintaining really 835 00:49:07,240 --> 00:49:10,600 high quality, rather than thinking about how to do it fully automatically?
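A minimal sketch of the semi-automatic workflow Amy describes here, assuming a hypothetical captioning helper: an automated pass proposes a structured draft, and the low-confidence parts are routed to a human describer rather than published as-is. All names, scores, and the file path below are illustrative assumptions, not a real API or anything the panelists built.

// Sketch of a semi-automatic description workflow (illustrative only):
// an automated pass proposes a structured draft, and low-confidence parts
// are flagged for a human describer to fill in.

interface DraftRegion {
  label: string;        // machine-proposed label, e.g. "trend line, rising"
  confidence: number;   // 0..1 score from the hypothetical model
  humanDetail?: string; // to be filled in by the human describer
}

interface DraftDescription {
  summary: string;
  regions: DraftRegion[];
}

// Stand-in for a vision model; a real system would call a captioning service here.
function proposeDraft(imagePath: string): DraftDescription {
  return {
    summary: `Automatically generated draft for ${imagePath}`,
    regions: [
      { label: "chart title: Sales 2023", confidence: 0.92 },
      { label: "trend line, rising", confidence: 0.41 },
    ],
  };
}

// Anything below the threshold goes to a human for detail and correction.
function regionsNeedingReview(draft: DraftDescription, threshold = 0.6): DraftRegion[] {
  return draft.regions.filter((r) => r.confidence < threshold);
}

const draft = proposeDraft("sales-2023.png");
console.log("Send to human describer:", regionsNeedingReview(draft).map((r) => r.label));

The design point is simply that the machine output is treated as a draft and a triage signal, not as the final description.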
836 00:49:10,600 --> 00:49:13,800 That's just based on the images I've looked at in the past year. 837 00:49:15,240 --> 00:49:18,120 OK, thanks. And Shivam, any input? 838 00:49:18,840 --> 00:49:24,160 Yeah, I think the inspiration behind the question would be to give a structure 839 00:49:24,160 --> 00:49:29,800 to the output, to the alt of images, so a structured output 840 00:49:29,960 --> 00:49:33,400 makes more sense, and then we have a fallback, right. So you... 841 00:49:35,040 --> 00:49:35,680 you can 842 00:49:35,680 --> 00:49:40,800 provide more information alongside an output, but the output itself 843 00:49:40,840 --> 00:49:43,600 should remain actually shorter and more explainable. 844 00:49:43,920 --> 00:49:47,200 It may be grammatically more correct; that would make more sense to the end user, 845 00:49:47,520 --> 00:49:50,280 and they might have another option to explain it further. 846 00:49:50,520 --> 00:49:54,600 It's not like you just have a string generated out of an image, right? 847 00:49:55,600 --> 00:49:57,320 When it's read out to a screen, right, 848 00:49:57,320 --> 00:50:00,880 your screen reader should read it concisely, short and brief. 849 00:50:00,880 --> 00:50:04,680 And for more description, there should be some additional 850 00:50:04,680 --> 00:50:05,840 data that can be supplied to it. 851 00:50:05,840 --> 00:50:08,440 And there are multiple ways we can do this. 852 00:50:08,800 --> 00:50:14,080 But the alt description should remain concise and grammatically correct, 853 00:50:14,200 --> 00:50:16,320 so that screen readers can read it. 854 00:50:16,320 --> 00:50:19,080 That's how I see it. 855 00:50:19,400 --> 00:50:20,920 Okay. Thank you so much. 856 00:50:20,920 --> 00:50:26,200 And I want to thank the three of you once more for agreeing to take part 857 00:50:26,200 --> 00:50:30,000 in this panel, and also for agreeing to take part in the next panel. 858 00:50:30,480 --> 00:50:35,840 So as we can see, media accessibility is really a rich topic, and 859 00:50:36,480 --> 00:50:38,920 definitely computer generated descriptions 860 00:50:39,280 --> 00:50:42,560 are also linked with natural language processing, 861 00:50:42,560 --> 00:50:45,440 which will be the topic for the next panel 862 00:50:46,040 --> 00:50:48,520 in just under 10 minutes. 863 00:50:48,520 --> 00:50:53,360 So we'll have a coffee break now, and I hope everyone's enjoying it, 864 00:50:53,360 --> 00:51:00,840 and we'll be back at ten past the hour.
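As a closing illustration of the pattern Michael and Shivam describe in this last exchange, keeping the alt attribute short while linking a longer, structured description, here is a minimal sketch using aria-describedby. The image file, element IDs, and description text are made up for the example; the only point taken from the discussion is that assistive technologies can pick up richer, structured markup when it is available.

// Sketch of "short alt plus richer linked description" (illustrative only).
// The alt stays concise; a longer, structured description is linked with
// aria-describedby so assistive technologies can surface it on demand.

const img = document.createElement("img");
img.src = "sales-2023.png";                              // hypothetical chart image
img.alt = "Line chart of 2023 sales, rising overall.";   // short, readable alt
img.setAttribute("aria-describedby", "sales-2023-desc"); // link to the long description

const longDesc = document.createElement("div");
longDesc.id = "sales-2023-desc";
// The extended description can use real structure (headings, lists, a table)
// rather than one long flat string.
longDesc.innerHTML = `
  <h2>Extended description</h2>
  <ul>
    <li>X axis: months, January to December 2023.</li>
    <li>Y axis: units sold, 0 to 5,000.</li>
    <li>Sales rise steadily, with a dip in August.</li>
  </ul>`;

document.body.append(img, longDesc);

A screen reader would announce the concise alt first and can then surface the extended, skimmable description, which is roughly the split between a short readable alt and supplementary detail that Shivam argues for above.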