CUNY 2020 Virtual Workshop, 3.23.2020
1:54PM Mar 24, 2020
Shohini & William
Noah & William
Noah, Patty, William
Patricia (Patty) Reeder
Patricia & William
Patricia & Noah
Brielle Stark & Brian MacWhinney
Brian MacWhinney & William Matchin
Okay, hello, everyone. It's about 10 o'clock. So I think we should get started here. So just to start, my name is William Matchin.
This workshop started after CUNY 2020 at UMass Amherst unfortunately had to be cancelled due to the coronavirus pandemic. Very fortunately for all of us, and I think many of you attended, they transitioned the conference to a virtual conference using Zoom, and it was actually, I think, fantastic. Thanks to all the organizers for getting that going in such a short time; it was really surprisingly good. One of the things that was good about it was that it allowed junior scientists, who benefit greatly from having these presentations to put on their CV and from networking, to at least get some of that opportunity. But one of the things that came up during the conference, just in talking informally to attendees, was a concern about data collection. Opportunities for data collection are of course extremely problematic, mostly halted right now, due to the social distancing that everyone is doing. And so there was a lot of interest in, sorry, my family's being distracting here, there's a lot of interest in getting remote or online data collection going. And I think it's actually quite fortunate that we're at this point right now, because there are a lot of tools available that people have been using for some time in linguistics and psycholinguistics. So I think this is a good opportunity for us to learn about them, become familiar with them, and basically start using those tools as much as we can. So, at any rate, for today: this is the schedule that I was able to cobble together. Just to let you all know, it was basically cobbled together in the last two days, so I'm extremely grateful to everyone who reached out to me and offered to help or to present. And also, thank you so much to all of you for showing up and basically helping to make this a worthwhile workshop.
So basically what we'll be doing is starting the presentations with, I hope, Shohini Bhattasali. Shohini, if you're here, can you please chat me, just to make sure that we're on schedule for today. Shohini is a researcher at the University of Maryland, and she has been doing this interesting work in neuroimaging, which is one of the things that is very difficult right now for all of us that do neuroimaging. You know, I am a cognitive neuroscientist; I study language in the brain and typically use fMRI and lesion-symptom mapping in aphasia. And right now, of course, even with the remote tools that we're going to be discussing, it's going to be essentially impossible to collect neuroimaging data. So one thing that we wanted to make sure to add was the opportunity to access publicly available datasets. And fortunately, like I said, more and more of this is becoming available. She's been doing some pretty interesting work, and they've posted their datasets online. So her presentation will focus on discussing those datasets: basically, what they are (there's both fMRI and EEG), how to access them, and how to understand the data. Then following that, we're going to have a nice long, almost two-hour workshop by Noah Nelson and his colleagues at FindingFive. Again, you know, I do neuroimaging; I'm not really familiar with any of these tools. It's actually one of the great opportunities of organizing workshops: you can organize one without knowing anything yourself about what you're organizing, by getting other people that do know what they're doing to come and teach you. So Noah has been working on FindingFive, which is really a great platform for online linguistic and psycholinguistic experiments, and they've got a lot of nice features.
And they've focused a lot on making this tool accessible and easy for those of us that are not as savvy with programming or familiar with the tools, so we can get experiments up and running quickly and integrate with Amazon Mechanical Turk, which Noah will talk about in detail.
So that's going to be a great workshop. It's going to be almost two hours and hands-on, so you'll have an opportunity to access their system and try to get your own experiment up and running. And while that's happening, if you need help, feel free to use the Q&A document. I posted at the beginning of the chat links to three documents online that you can edit. One is a Google document for Q&A. Basically, if you have any kind of question (I'm going to keep you all muted), what I'm going to do is go to the question document. I have some wonderful volunteers that offered to help: Nick Wang at UConn and a colleague at UC Davis, and I also got some help from Lena Karlosskya and Jennifer Arnold, who offered to monitor the Q&A document. Basically, we'll look at the Q&A document, and when we see questions and the appropriate time comes, we'll unmute you so that you can ask your question and get a response. If you want to have a more private video chat, and that might be particularly helpful for the FindingFive tutorial, you can go to the Excel spreadsheet, the open Zoom meetings spreadsheet. In the first tab, "Monday virtual workshop," you can just put your name and a link to a Zoom meeting or something like Skype. Oh, okay, someone just asked if I can post the documents again; I'm going to send those in the chat right now.
So let's see.
And please bear with me, because I'm still getting used to Zoom myself. By the way, Zoom is really fantastic for meetings. I've been using Skype for all this time and, you know, really, Skype is awful; Zoom is great. So it's really a great resource. At any rate, this Excel spreadsheet is for Zoom signups: just go put your name and a link to a Zoom meeting, a Skype meeting, or any other kind of video chat.
Maximum participants... does anyone know how to add more participants? I didn't realize that there was some kind of limit to the number of people that can join.
If anyone knows how to add more than 100, could you add that to the chat? So if you know people that are trying to get in... okay, wow, that's crazy. Apparently I have to pay more money to get more than 100 people. So consider yourselves the lucky ones who were able to log in. And I think I've got our presenters all logged in for this first session, so that's good. But this will all be recorded and posted later, so if you weren't able to make it, sorry to the people that are listening in the future, but we had you in mind at this moment in our lives. Okay. So like I said, in the FindingFive workshop you'll be able to log into their system and get an experiment up and running. If you want some one-on-one help, feel free to post a link to your video chat, so that we can get someone to come and talk to you and give you any help that you might need. And then there's one more link, to the resources document; that's the third document that I sent out. Basically, this is just a link to all the materials that we discuss, a list of things that you can find online. And if you have anything to add, please feel free to add to and edit that document; that's not a problem at all. I just wanted this to be a resource that's available to all of you, so that you can communicate with each other and share any resources or experience that you're willing to share. So then we're going to have lunch from around 12:15 to 12:45. This is mostly just to give you guys a break, to tune out for a little bit. And then at 12:45 we'll come back with a short presentation by Florian Schwarz, who is going to talk about PCIbex, which is a sort of alternative to FindingFive. From what I've been told, it's very similar in terms of how to use it and the programming that's involved.
But I think it offers a lot of other customizable features. In particular, he mentioned there's a kind of beta version of online eye tracking using webcams, and I was particularly interested in that, because that hasn't been developed as much as other methods. So that could be interesting. Then we'll have some discussion by Brielle Stark and Joshua, I forgot his last name, but basically, they'll be talking about some online databases for neuroimaging data, as well as data from atypical populations, particularly aphasia, traumatic brain injury, and dementia. For those of us that do work in those areas, there are these great publicly available, or sort of semi-publicly available, datasets. So, let's see. Yeah, and then following that, we're just going to have a short panel discussion, and the idea with that panel discussion, let's see if I can go to my presentation here... yeah, the idea of the panel was simply to point out that something that might be very useful for us right now in the short term is getting data from another person that you may or may not know. Maybe there's data that's published but hasn't been publicly released; still, a lot of researchers are perfectly happy to share that data with you. So we just wanted to discuss a little bit how to navigate that. Brielle Stark has also done that a few times already, so she's going to be discussing some of that, and I've got some other people too that can jump in here and discuss their experiences with getting data from people. That could be helpful. Okay. With that said, I'd like to turn it over to Shohini. And so what I'm going to do is... okay, great, we've already got that going. I think I need to unmute you.
Are you unmuted, Shohini?
No, you're not. Okay, I will unmute you. Okay, you should be unmuted.
Yep. Hi, can everyone see my screen? I can see your screen fine, so I assume that it will work for everyone. Okay, so today I'm going to be talking about the Alice datasets.
So, these are, we just want to point out, two kind of newly released datasets that you can use if you're working from home right now or unable to collect any kind of new data for your experiments. These are two parallel naturalistic EEG and fMRI datasets, based on chapter one of Alice's Adventures in Wonderland. As you can see in the figure, all the participants heard the same chapter, so we have the time point of every word in the story, and depending on the modality, fMRI or EEG, we measure the BOLD signal or the electrophysiological signal. Overall, in this chapter, there were 2,129 words and 84 sentences, which are on average, you know, 25-ish words long, and the stimulus is about 12 and a half minutes. In terms of how you can access them: they're both publicly available now. One's shared on the University of Michigan website (the link is posted here), and the other one is shared on the OpenNeuro repository, and both are shared under a Creative Commons license, so you can use them as long as you attribute us. I'm going to go over a bit of the data, but if you want more details, you can check out our forthcoming paper, which is a data paper going really into the details of data collection, the scanning protocols, and everything, if you're interested. So, in terms of data collection: for EEG, we had 49 participants. There were different analyses done, and the paper describes whether, in certain cases, certain participants were excluded because there was too much movement or they had a really low score on a quiz. They were recorded at 500 Hz from 61 active electrodes, and this data collection was done with Jon Brennan and his students at the University of Michigan.
And for the fMRI, we had 26 participants, on a 3 Tesla scanner with a 32-channel head coil at Cornell University. This was collected with John Hale and some other students; we collected this dataset while I was at Cornell. The study design was the same: the participants came in and, you know, there was no explicit task, they just had to listen to the entirety of that audiobook chapter. Once they heard the audio section, they all completed a multiple-choice questionnaire at the end, and they all did fairly well. Some of the EEG participants didn't do as well, and that's also noted in the paper, so you can take a look if you think that, you know, some people weren't paying attention; that's also given in the paper. And along with the actual datasets that we released, we also released the timestamp of every word in the story and also some of the predictors. So we already have some published analyses based on these datasets. Last year, Jon Brennan and John Hale had an EEG paper in PLOS ONE about how hierarchical structure guides rapid linguistic prediction during naturalistic listening. For that one, they basically found that it's hierarchical structure rather than sequential information that guides prediction, and all the predictors they used for it have been released with the dataset, in case you're interested in replicating it or, you know, trying to extend the analysis further. Then another example of an existing analysis would be a Brain and Language paper from 2016, with Jon Brennan, John Hale, and some other people, where they were trying to look at how linguistically rich grammars fare; those were found to be correlated with activity in temporal but not frontal regions. And all the predictors used in that analysis have also been released, so they're available too.
And then one of our other colleagues recently did a study using the fMRI dataset to test predictors based on the distributional hypothesis, confirming a meaning-related role for the anterior temporal lobe. So, that's just to give you a sense of the space of research questions that has been asked, but there are definitely lots of further questions you can ask and, you know, new hypotheses that you could test, because this dataset is not task-based. So for anything that you're interested in, you can go and test your new predictor on it. Also, a lot of the prior analyses were done in SPM in MATLAB; if you're interested in trying it out in a Python environment, for EEG there's MNE, and for fMRI there's Nilearn. So if you want to try a different environment and replicate the analyses, you know, that's also another possibility. So, yeah.
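Since both releases ship the timestamp of every word, a first look at the materials can start from that table. Here is a minimal Python sketch of parsing such a word-onset file; note that the column names and timing values are invented for illustration and are not the datasets' actual format:

```python
import csv
import io

# Toy stand-in for a released word-timestamp table (illustrative values,
# NOT the actual Alice annotations or column headers).
raw = """word,onset_s,offset_s
Alice,0.50,0.92
was,0.92,1.10
beginning,1.10,1.65
to,1.65,1.78
get,1.78,2.05
"""

def load_word_timestamps(text):
    """Parse a CSV of (word, onset, offset) rows into a list of dicts."""
    rows = []
    for row in csv.DictReader(io.StringIO(text)):
        rows.append({
            "word": row["word"],
            "onset": float(row["onset_s"]),
            "offset": float(row["offset_s"]),
        })
    return rows

words = load_word_timestamps(raw)
print(len(words))                        # number of word tokens parsed
print(words[0]["word"], words[0]["onset"])
```

From a structure like this, you can build any word-level predictor you like and align it to the EEG or fMRI recordings.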
If anyone has any questions about the datasets, or if there's anything that's come up in the chat, I'd be happy to answer those.
Okay, so thank you, Shohini. That's great. Um, if people have questions, like I said, please try to use the Google Doc.
So again, I can send out that link here because I think some people have joined now so I'm going to just copy that link and put it into the chat.
And a question: so this is both fMRI and EEG, right, Shohini? Yeah, it's both, and there were different participants, but they heard the exact same stimuli at the same rate.
So, for people that are getting started and just want to do a first pass at this: this is all naturalistic comprehension, right? It's a naturalistic kind of story, and this is very different, I think, from what many of us are used to doing, which is a controlled experiment. So can you just give some thoughts on how you would start approaching that? Is it necessary to use the same kind of computational parsing approach that you used?
No, I mean, the simplest thing would be, so the timestamp of every word is given. If you want to just do something like nouns versus verbs, you could just annotate them; you could have a binary, like 0/1, sort of predictor. For fMRI, for example, if you just did a typical GLM analysis and you wanted to compare the effect of nouns versus verbs, you could just annotate all the nouns and all the verbs and do a contrast like that, which would be comparable to the more classic controlled designs. We did more gradient predictors based on different kinds of NLP stuff, but you can start with a very simple predictor and see, you know, how that works out. And then you can maybe extend that to further parsing metrics or anything else that you're interested in.
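As a concrete sketch of that simple noun-versus-verb idea: given word onsets and a part-of-speech annotation (both invented here; the real datasets provide the onsets, and the POS coding would be your own), you can bin words into scans to get two GLM-style regressors. This uses a bare per-TR count with an assumed TR; a real pipeline would also convolve these with a hemodynamic response function:

```python
# Illustrative word onsets (seconds) with hand-annotated POS tags;
# these values are made up, not from the Alice dataset.
words = [
    ("Alice", 0.5, "noun"),
    ("was", 0.9, "other"),
    ("beginning", 1.1, "verb"),
    ("to", 1.7, "other"),
    ("get", 1.8, "verb"),
    ("bank", 4.2, "noun"),
]

TR = 2.0        # repetition time in seconds (assumed; check the data paper)
N_SCANS = 4     # number of volumes in this toy example

def stick_regressor(words, pos_tag, tr, n_scans):
    """Count words with the given POS tag falling inside each TR bin."""
    reg = [0] * n_scans
    for _, onset, tag in words:
        scan = int(onset // tr)
        if tag == pos_tag and scan < n_scans:
            reg[scan] += 1
    return reg

nouns = stick_regressor(words, "noun", TR, N_SCANS)
verbs = stick_regressor(words, "verb", TR, N_SCANS)
print(nouns)  # [1, 0, 1, 0]
print(verbs)  # [2, 0, 0, 0]
```

These two columns would then go into the design matrix, and the noun-minus-verb contrast compares their fitted weights, which is the binary-predictor version of the classic controlled comparison.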
Okay, I'm going to take a look at the Google Doc. So, one person is asking: let's see, is there metadata on participants' linguistic background? Are they monolingual, do they know a second language, etc.?
So, I'm pretty sure for the EEG one too, but for the fMRI one, we made a note of when participants were bilingual or multilingual, but we didn't make a note of the exact languages they spoke. As long as they were self-reported native English speakers, they were included in the study. So we ended up with some Korean-English and Hindi-English bilingual speakers, and we just made a note that they spoke another language.
Okay, so they're all native English speakers, possibly bilingual, and is it noted in the dataset that they're bilingual? In the metadata, yeah. If you go to the repository, there is a README file with the metadata describing all the participants. We even did a little audio check to make sure they could understand the story properly and that the decibel level at which the story was played was similar across all participants, and the decibel level is also noted for all the participants. We didn't use that value, or the quiz comprehension results, in our study, but how well they did on the quiz and what volume the story was played at are all noted in the metadata.
Okay, that's fantastic. Okay, so that behavioral data could all be used. Yeah.
So that's great. One person asked: is the audio that participants listened to also available, like the stimulus? Yes, we included it with the dataset, and we also send a link from, what is it called, LibriVox, which has tons of audiobooks as free downloads. If you go to that website, you can also download it directly from there, but we also include the link in our repository. It's an audiobook, for free.
It's great. If anyone has more questions, if you're just tuning in, please go to the Google Doc that I just sent a link to.
You can add your questions there. So, maybe you can just talk for a minute about what you've done already, like some of the results that you guys have gotten with this dataset.
Sure. So I gave a couple of examples here; I'm trying to think of some other stuff that people have done. So, you presented at CUNY, right? It was your poster, and so...
That's the second paper you've listed there, the fMRI paper, and you used that same dataset, and then what you did is you added regressors based on properties of the verbs, based on selectional restrictions, right?
Well, William, that's actually a different dataset. I collected that one for my dissertation, so it's based on The Little Prince, English version, and that's not publicly available yet; that was a project I was involved in before this Alice in Wonderland one. We're hoping to release it in the future, because that's actually cross-linguistic work. I collected the English dataset, and a colleague who's listed on the screen collected the Mandarin version of The Little Prince, so she has fMRI data for that. And currently in France, there's a group collecting the French data for The Little Prince. So the goal was to do a cross-linguistic fMRI study. We haven't finished scanning, so we haven't released that dataset; down the road, that's the goal, to release it as a kind of cross-linguistic parallel fMRI dataset. That's not out there yet. But my recent work has been based on it. That one is actually longer, one and a half hours, a similar naturalistic stimulus, and all of my recent work on argument structure, looking at non-compositional expressions, has used that dataset. And right now I'm looking at things like topical surprisal and how to incorporate contextual information. But yeah, that's the thing: I've been using that same dataset now for, I think, over two years, on all sorts of questions. So this kind of naturalistic stimulus definitely lends itself to that kind of flexibility.
That's great. Okay, we got a couple more questions here. So one person is asking is the code used for the NLP models also available if they want to implement the analysis in a different pipeline.
So the predictors are available; the code for how we derived the predictors isn't available, but if you reach out to me or John, we can probably help you out with that. For example, for the Brain and Language paper, they calculated bigram and trigram surprisal, and all of those values for each word are already available. So if you just want to replicate it with a different pipeline, you can just run it; we have it all in a spreadsheet, so that's available. If you want to know specifically how we did it, it's described in the paper, but you can also reach out to me or John, and we'd be happy to help out and answer any questions.
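For anyone who wants to recompute rather than reuse the released values, bigram surprisal is just -log2 P(word | previous word) under some language model. A toy maximum-likelihood version from raw counts might look like this; real analyses train the model on a large external corpus and use smoothing, whereas this unsmoothed toy would fail on unseen bigrams:

```python
import math
from collections import Counter

def bigram_surprisal(tokens):
    """Per-token surprisal, -log2 P(w_i | w_{i-1}), from MLE bigram counts.

    The first token gets unigram surprisal. No smoothing, so this is a toy
    estimator: a bigram never seen in `tokens` would cause a zero division.
    """
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    total = len(tokens)

    surprisals = [-math.log2(unigrams[tokens[0]] / total)]
    for prev, cur in zip(tokens, tokens[1:]):
        p = bigrams[(prev, cur)] / unigrams[prev]
        surprisals.append(-math.log2(p))
    return surprisals

tokens = "the cat sat on the mat".split()
s = bigram_surprisal(tokens)
print(s)
```

Each resulting value can then be attached to the corresponding word timestamp and used as a continuous regressor, exactly like the released surprisal columns.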
Okay, that's great. And then there's another question here: as far as I remember, the audio was slowed down, so it may not match what was in the audiobook from LibriVox. Is that correct?
Yes, it was slowed down by 20%. I think we have the slowed-down version included; we just slowed it down in Praat. We didn't do anything super fancy with the slowing, it was just done in Praat, so you could download the original and do it yourself, or the slowed-down version is uploaded.
Okay, that's fantastic.
So, just want to let everyone know: these are great questions. If you wouldn't mind adding your name, that's helpful as well, if you feel comfortable with that. This is sort of a rushed session here, but if there were more time, we could actually, you know, unmute you so you can ask your question yourself, along with any kind of follow-ups or clarifications.
If anyone else has questions, please post them right now on the Google Doc. If not, then we're going to transition to the FindingFive tutorial.
Yeah, and people can definitely reach out to me. You can also read the data paper that I mentioned; it's in the schedule that William sent out, with a link to it, and it's also on my website. If you want to read the data paper, it has links to all the datasets, and it describes all the datasets too. And you can also email me, just email@example.com, and I'll leave my contact information in the Google Doc if people have questions after this. Okay, that's fantastic. Thank you so much for doing this at the last minute; that's really helpful. And I just want to let everyone know that in the afternoon we're going to have a presentation by Joshua, and he is going to talk generally about different publicly available online neuroimaging databases and his experiences accessing and using them. I'm sure that one of the biggest issues with these publicly available datasets is trying to understand what people did, exactly how they did it, and how to find things that are actually of interest to you and usable for the questions you want to answer. So that will be covered in the afternoon session. Okay, so now I would like to transition
to Noah Nelson. He, I believe, got his PhD not long ago from the University of Arizona, and he is one of the representatives of the FindingFive platform. Okay, so I'm going to go ahead and turn it over to Noah. Let's see if I can find you on the list. Okay, we're going to try to unmute you. Okay, you are now unmuted.
Hi, can you all hear me? Yes, I can hear you great; your audio quality is good. Um, I don't see your video, though. Yeah, is that something... let me see here, I think I can... there we go. Yes, it's working now. While I'm in control of the mic, I can guarantee you guys that Noah's beard is far bigger than the picture on the FindingFive website, so you should congratulate him on the accomplishment.
This is just during coronavirus; this is just the quarantine beard, okay. But anyway, thank you.
So please let me know if you need anything, and we can, you know, get resources out to people if they need to access anything online.
Okay, great. Yeah, thank you. Um, before I dig in, I just want to say thank you to William and everyone else who set this up this is a really great thing that must have taken a decent amount of effort to put together so thanks for that. I am going to try to share my screen here let's see if I can
At the bottom, in the middle, there should be a share button. If you click the arrow, or sorry, if you click it, you can share your whole computer screen or just a single program; I was able to just share PowerPoint.
Yeah, sorry. Okay. So I might have to leave and come back; it looks like my computer won't let me share until I restart Zoom. Okay, that's all right.
Will I be able to get back in? Yes, I was able to upgrade the plan; Zoom is apparently doing very well for itself. I upgraded my plan, and then it automatically allowed more people to join. Yeah.
All right. In that case I will be right back.
Okay. Yeah, sounds good. Yeah.
So now we'll be entering the very brief stand-up portion of the meeting, where we try to entertain you for a minute until Noah gets back.
If anyone has any really good linguistics jokes, please feel free to share them on the resources page.
I'm trying to think of any off the top of my head; there's really none coming to mind. Oh yeah, we had some jokes during CUNY about dog semantics. So the question was, you know, what do dogs mean? What are their logical semantic representations, if they have any, right? So this is interesting... cue the Jeopardy music. No, no Jeopardy. I mean, I could try to... okay, Noah is back. All right, you guys are spared the continuation of the rambling, so I will now see if I can unmute Noah.
Yes, I can unmute you. Okay, take it away, Noah.
Okay. So I'm hoping you guys are seeing my slides.
And we can see it perfectly fine; it's a full-screen presentation, and your video is at the top here. Great, great. Okay. So, all right, thank you for the introduction. I'm Noah Nelson, and like William said, I just recently graduated.
Just a little bit about how FindingFive works: essentially, if you have some idea for a study, when you build it on the FindingFive platform, you write some code in a very simplified coding language that has a pretty low learning curve. The code itself is modeled off of the actual design of behavioral experiments, so we use terms like stimuli, responses, trial templates, blocks, procedure, etc., just to try to make this all more accessible and familiar to us all. And once you've coded it, you can launch a session of it to begin recruiting participants. Essentially, you code the same experiment once, and then you just select whether you want to launch it on the FindingFive platform or through Mechanical Turk. And as long as you have your Amazon Web Services account all set up and ready to go, we essentially convert the study to the format it needs to be in to run on Mechanical Turk for you. So, real briefly, before I get into actually showing you guys FindingFive itself, I want to talk just a little bit about the experiment structure in FindingFive and highlight the parallels with a normal behavioral experiment. I don't have to explain to anyone here what a typical behavioral experiment might contain, but I just want to highlight that the components of a FindingFive study are meant to mirror them explicitly. One of the few differences you'll notice is that we have a component of a FindingFive study that we call trial templates. This is basically so that you don't have to hard-code every single trial individually: you make a template and say, I want these stimuli distributed with these responses to make trials in this block. That's just to try to make things a little simpler and easier for our users.
Just to briefly cover the things that are involved here: stimuli are just anything that you can present to participants. What we refer to as stimuli can include things like text, audio files, images, or videos; pretty much, you name it. Responses are essentially any opportunities for data collection that we make possible. These include things like free response in text or audio: as long as participants give permission for their browser to access their microphone, you can record audio from them. You can have them do free-response text, and we have choice responses that they can click or use key presses for, rating scales, etc. And then our trial templates are, like I mentioned, just templates for how to pair responses and stimuli together. Then we have a procedure section, which is what you might imagine: you essentially say which trial templates you want to put into which blocks, and what order you want those blocks in for your participants to see. So essentially, you're defining these different components and configuring them together, and that's really all there is to it. This is obviously a comprehensive logic, and some studies don't necessarily need every component at maximum complexity, but we sort of compel you to use it anyway so that all studies have the same underlying fundamental logic. So what I want to do right now is demonstrate some demos of a couple of features. Let me see if I tap that share and change to...
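To make that four-part logic concrete, the sketch below expresses a tiny study as plain Python data. This is not FindingFive's actual study grammar (FindingFive defines studies in its own JSON-based format documented on their site); the names and fields here are invented purely to show how stimuli, responses, trial templates, and the procedure reference one another:

```python
# Illustrative only: mirrors the logic of a FindingFive study
# (stimuli -> responses -> trial templates -> procedure),
# not its real syntax.
study = {
    "stimuli": {
        "greeting": {"type": "text", "content": "Hello!"},
        "dog_pic": {"type": "image", "content": "dog.png"},
    },
    "responses": {
        "like_scale": {"type": "rating", "scale": [1, 2, 3, 4, 5]},
    },
    "trial_templates": {
        "rate_item": {"stimuli": ["greeting", "dog_pic"],
                      "responses": ["like_scale"]},
    },
    "procedure": {
        "blocks": [{"name": "main", "trial_templates": ["rate_item"]}],
    },
}

def validate(study):
    """Check that templates and blocks only reference defined components."""
    for name, tpl in study["trial_templates"].items():
        for s in tpl["stimuli"]:
            assert s in study["stimuli"], f"{name}: unknown stimulus {s}"
        for r in tpl["responses"]:
            assert r in study["responses"], f"{name}: unknown response {r}"
    for block in study["procedure"]["blocks"]:
        for t in block["trial_templates"]:
            assert t in study["trial_templates"], f"unknown template {t}"
    return True

print(validate(study))  # True
```

The point is just the referential structure: trial templates point at named stimuli and responses, and the procedure points at named trial templates, which is why you never have to hard-code individual trials.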
Sorry, my Zoom thing is in the way; how do I change tabs?
There we go. Okay,
so this is our website and our landing page. And as you can see, one of our big focuses is to try and make the entire experiment process something that you can handle directly just on our platform. There are a lot of alternatives out there where you might like DIY alternatives, for example, where you can maybe code your study in one place, recruit participants somewhere else and kind of run it over there. And we try to make it all sort of more of an all in one experience. And I encourage you guys to kind of visit the website and take a look and poke around, I think. I think we made a special case. So we are typically by invite only, so that we can kind of control who's coming in and starting to use our product. Not that we want to limit it from anyone but we do things like we manually verify that you are actually a legitimate researcher from a legitimate institution. For example. Right now, I think we've kind of waived the invitation code portion. So I think if you guys tried to sign up You should be able to make an account and get into finding five, you won't be able to launch a study on the platform or through Mechanical Turk until we do verify your researcher status, which we have to do manually. So that will happen later.
But just kind of a heads up there. Well,
I guess maybe zoom slowing down the internet.
Or maybe it's because we're all rushing in to sign up. Maybe that's why
I know there are some weird interactions sometimes between Zoom and web browsers or stuff like that. So hopefully, it's just the internet. Yeah.
Also, my personal Internet's probably not the greatest here at my house. So.
Okay, so right now I'm signed into a sort of administrative account that we have, research at FindingFive, but you can see that all researchers on FindingFive are also potentially participants. So what you're seeing right now is the landing page that a participant might see if they come to FindingFive. There are some studies being offered for credit at the University of Arizona that are kind of dominating the main thread here, but as you can see, there are some other ones from other researchers, and it looks like not that many are active right now. But in our research page, just to show you guys the demos, right: this is where you would see any studies that you've made through our interface. You can see we have a button to create a new study. When you actually create a study (I'm going to open this tokenized text tutorial for researchers), this is sort of what the materials look like that you actually code up. You can see we have an organization of trial templates and procedure, and there's kind of a tab to go back and forth between the different interfaces to work on that. And over on the right side of the screen, we have a list of our stimuli, which is searchable. You can add new ones manually, where you give them a name and define some attributes of that stimulus, like what kind of stimulus it is and what its content is. You can also just define a whole bunch of stimuli in a CSV file and upload them directly, which makes it really convenient to do lots of stimuli all at once. And of course, you can download and export your stimulus definitions, say for purposes of sharing your materials with another researcher, for instance. And we also have responses as another thing, and it's the same sort of interface that we have here.
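For the batch-upload route, a minimal stimulus CSV might look something like the sketch below. This is hypothetical: the exact column headers FindingFive expects should be checked against its study grammar reference, and the stimulus names and URLs here are made up for illustration.

```csv
name,type,content
sentence_1,text,"The horse raced past the barn fell."
dog_photo,image,"https://example.com/stimuli/dog.png"
alert_tone,audio,"https://example.com/stimuli/tone.ogg"
```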
So what I want to do is I want to actually
show you the preview. So when you're working on a study on FindingFive, at any point, provided that your code compiles correctly, you can preview that study. And so we'll get a look at what this particular demo tutorial is about, and that's the tokenized text stimulus, which is a special kind of stimulus that we created to facilitate, like, self-paced reading, for example, and other
other studies of that kind of nature.
So, in this particular case, since this has a self-paced reading component, there are keyboard interactions. And a lot of people don't know this, but Apple's Safari web browser will not let you do that in full-screen mode. So we have this nice little warning about that, but that's all this is saying. Just a heads up that if you happen to know that your participants like to use Safari for some reason, this is something to be aware of. Another thing that we can point out: we always have a landing page here with some instructions, just sort of general FindingFive instructions for participants when they take a study. And, for example, you know, we require that you have a consent form in order to be able to run a study. It's sort of up to the participant to actually read it, of course, and they have to actually give their consent to begin the study. This, of course, is a preview, so it's not really a real study, but it kind of gives you a sense of what would be happening.
Just to demonstrate this particular stimulus type, since this is just one of the many kinds of stimuli that we have: this little tutorial will help us explore a variety of ways to create engaging studies with the tokenized text stimulus. So as you can see here, a tokenized text stimulus is just a string of text that is broken up into tokens. It's designed to present text content in an incremental manner, with or without interactions from participants. So in this case, it didn't require any interaction from me; this was automatically paced, right? And I think that was pretty clear. So when we have this sort of automatic presentation, obviously, we allow you to adjust the speed of the stimulus. So we have a case with a slower speed and a case with a faster speed to kind of show you what that looks like. But we also have different presentation modes. So the plain mode just presents the text on the screen as you've just seen.
The Mask mode actually
can do like masking of the actual tokens as they appear. In this case, we have a backwards mask. So they're all hidden after they're displayed.
It looks like we have other chat questions.
Yeah, I'm trying to monitor the chat, but I might not do the best job. Okay, yeah. And so, in addition to masking, we also have singleton displays, where it's one word at a time and everything else is hidden. And of course, we don't have to display these automatically. So in this case, this is self-paced, so I'm going to pace it myself. FindingFive takes care of the nitty-gritty details, so that you can focus on research.
So reaction times for,
for that whole process are recorded for you automatically. So you can implement a sort of self-paced reading of tokenized text stimuli under any of the presentation modes (plain, masked, or singleton) depending on your study needs. And these are just some of the core functions of the tokenized text stimulus; obviously we have more that it can do, and so you can go to help.findingfive.com to look at our API documentation, and specifically our study specification grammar reference there, where we talk about how to define different types of stimuli and what they can do. Okay, I'm going to skip this part, maybe, since I don't really need to give myself feedback. Okay.
Just as a brief little overview, to give you the basics: what we saw was essentially two blocks that were defined in here to consist of various trial templates. And we have things like cover trials for instructions prior to trial templates, trial templates themselves, and trials to come at the end of a block. In this case it's pretty simple, because this is just a demo in a tutorial, but there are a lot of different things you can do in the procedure with, say, ordering the trial templates, and you can order stimuli and responses within the trial templates as well. What I would like to do now is, I think, hand over the presentation to my colleague, Patricia Reeder.
And, William, if you can unmute her and Patti, are you
ready to do a demo for the audio? Yes, I am. Before we do that, do we want to answer some of the questions in the chat? Yeah, if we have questions to answer I would love to. Great. Hi, everybody. Yes. So there are a number of questions that we saw. I think one of the first ones that occurred to me
was about timing precision. So this one question says: what level of precision do you have for reaction time responses, compared to, for example, PsychoJS, etc.? Since for many studies, a lag of several milliseconds could be too much.
Yeah, so our sensitivity is on the order of milliseconds.
Great. Okay, another question: is FindingFive open to entering into third-party data processing agreements with universities? One person said they asked this because their institution, the Arctic University of Norway, requires this for GDPR reasons.
I think we would be open to it, we would have to discuss specifics. Obviously, that's not something that we've done yet. But I don't see why we wouldn't be open to such a thing.
Okay, um, another question: is pseudorandomization of stimuli available, for instance a Latin square structure? And then they mentioned that this thing seems amazing.
Ah, thanks. So our view is, we don't really feel like we need to just automatically do a Latin square for you. I think most researchers are pretty capable of designing such a thing.
And pseudorandomization can be a little tricky to automate. There are some features that we make available, though. Just to give a sense: true randomization is easy, and we do that quite well. But we do have, let's see, I think it's in the trial templates (I'm showing the help documentation right now, by the way), stimulus patterns. So we do have a pseudorandom stimulus pattern that does random reordering of stimulus presentation, but subject to certain constraints. So you can define attributes to apply constraints and say, you know, I want at least this many of a certain attribute in a row, but at most this many, and we'll do randomization within those constraints. Does that answer the question, I hope?
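The constraint idea described here, at most so many items sharing an attribute value in a row, can be illustrated with a simple rejection-sampling shuffle. This is a conceptual Python sketch of pseudorandomization in general, not FindingFive's actual algorithm, and the function and field names are ours:

```python
import random

def pseudorandomize(items, attribute, max_run=2, max_tries=10000):
    """Shuffle `items` (dicts) until no more than `max_run` items
    sharing the same value of `attribute` appear consecutively.
    Conceptual sketch only, not FindingFive's implementation."""
    for _ in range(max_tries):
        order = items[:]
        random.shuffle(order)
        run, ok = 1, True
        for prev, cur in zip(order, order[1:]):
            run = run + 1 if prev[attribute] == cur[attribute] else 1
            if run > max_run:
                ok = False
                break
        if ok:
            return order
    raise RuntimeError("no valid ordering found within max_tries")

# Example: never more than 2 'filler' or 'target' trials in a row.
trials = [{"cond": "filler"}] * 4 + [{"cond": "target"}] * 4
order = pseudorandomize(trials, "cond", max_run=2)
```

Rejection sampling is easy to reason about but can be slow when constraints are tight, which is presumably part of why a platform exposes a small set of vetted constraint types rather than arbitrary ones.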
I think we'll find out. A few more questions; feel free to follow up on the Google Doc.
One person asked if it's free. And Shohini has been doing a great job, as well as Patty, of answering on the Google Doc, so thank you guys so much for doing that online. But it looks like the question was answered that it is free. Is that correct? Yeah. Wow, I should have mentioned that. Yeah, FindingFive is free. And that's one of the... I mean, there are definitely other platforms out there that do much of what we do, but you typically have to pay for them.
Ours is free. Since this is entirely a volunteer-run operation, just to help us with server costs and stuff like that, we do have premium plans available with some perks, but all the core functionality and everything you need to get started is totally free.
And just to jump in, on the Google Doc I listed the webpage where you can learn about the differences between membership tiers. So basic covers everything you could possibly want to do on FindingFive, basically; premium membership will just bump up that ability a bit more.
It's great. Okay, so
let's see: how much customizability is there in the various presentation modes? For example, underlining words, different fonts, bolding, etc., or non-Latin scripts?
Really, let me show you what a tokenized text stimulus actually looks like. So, in here we have this long, well, not that long, but somewhat long definition. So you can see we have a definition of what the character is that we're using to mask; in this case it's being
forward masked and backward masked.
We have it set to being self-paced. The content is the actual string of text itself, and by default, you can put any UTF-8 viable text in here, and by default it splits based on spaces. But you can specify other characters if you want; you can actually use a regular expression to break up the text as well. That's another feature that we allow for. And then basically, we have parameters for things like the size of the font.
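In outline, the tokenized text definition being walked through here looks something like the following JSON sketch. The field names are approximations of what's described on screen, not the authoritative spelling; check the study specification grammar reference at help.findingfive.com before copying any of this.

```json
{
  "sentence_1": {
    "type": "tokenized_text",
    "content": "The horse raced past the barn fell.",
    "mode": "self-paced",
    "masking": "backward",
    "mask_character": "_",
    "delimiter": " "
  }
}
```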
I believe, actually, now that I think about it, I'm really glad somebody asked this: I've never tried messing with the font itself. But we're essentially accessing the CSS based on properties that you give. So it looks like we handle size by default; I don't know about the font face, but that would be something that would be pretty easy for us to help implement. I think what you could do, at the very least, because this should work, because we take HTML tags, is things like span, style equals, and define a CSS style, right? For anyone who knows anything about HTML,
That kind of thing, I guess it would have to be like this. Right? And you could, you could put style specifications in there.
But you might have to learn a little bit of HTML to make that work. Otherwise, if you needed a specific font and we didn't have a way for you to do that, you'd reach out to us, and we would be happy to implement something like that that's definitely generalizable and usable for a lot of people. Right, so I think that's one thing you wanted to underscore: the responsiveness of your team. If there's something that they want that's not built in, they can reach out to you, and you guys are flexible and responsive in trying to adapt. Yeah, absolutely.
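For instance, the span workaround described above would look something like this inside a stimulus's text content. The span and style syntax is standard HTML/CSS rather than anything FindingFive-specific, but whether a given CSS property survives on the platform is something to confirm with the team.

```html
The horse raced past the
<span style="font-family: 'Courier New', monospace; text-decoration: underline;">barn</span>
fell.
```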
FindingFive is as good as it is right now because people have used it and made suggestions, and so we are very on board with that approach to continuing to improve it. That's fantastic. I'm sure you're going to be inundated in the next few weeks and months, you know, with all of a sudden all the people in the world doing eye tracking, or sorry, self-paced reading, all of a sudden being on FindingFive. But that's a different problem. Okay. And so, one person had asked before about the turnaround; I think that Shohini had responded to that. Like, how quickly, when people are asking questions or asking for changes, do you get back? What's the kind of timeframe?
Yeah, I should probably be trying to glance at this document too. I'm not sure what Shohini said, but we try to be really fast. So, you know, taking email for example: when someone sends an email to the research help address at findingfive.com, it goes directly to our phones, and one of us will usually answer within 24 hours with at least a preliminary response. Sometimes our response might be, oh, this sounds like you are going to need a new feature; we're having a meeting tomorrow, let us talk about it, we'll get back to you. But
we usually at least let you know that we're paying attention and that we care about your issue as fast as we can. Right, right. Much better than Verizon on your call.
I hope so.
So someone asked a specific question about the variability in RTs. Like, what is the variability in the reaction times that you collect?
Which, you know, I don't know; Ting would be more familiar with those kinds of details. I know that in my field as a linguist, with people I've worked with, I had kind of learned that you always want to use a button box rather than a keyboard, because keyboards aren't as responsive and things like that. But having talked to Ting about this, apparently a lot of those studies were done with much older keyboards, and keyboards have improved a lot. So he's pretty sure that the difference is very small. I couldn't give you a number, though; he's way more knowledgeable about that kind of stuff than I am. So if that's a question that's important to you, I encourage you to send an email to us, either at the researcher help email or at feedback at findingfive.com, and I'm sure Ting would be more than happy to answer you. That's great. We have a ton of questions, so there's definitely more for me to ask. I don't know if you want to come back to questions or just keep going. Let's come back to it. So we have two other demos that we want to show, and I think there's always a chance that some of these questions will be kind of implicitly answered through the process of going through those demos. So
why don't we hand it over to Patty? That's great. Yeah. So I just want to let everyone that's listening know: your questions, keep adding them to the Google Doc. You know, we're going to try to answer every question during the live chat here, but even if we don't get to it, we're going to make sure that we get some responses into the Google Doc from the FindingFive team. So your questions will be answered regardless of whether we get to them during the live session here. Okay, so anyways, take it away, guys. Right. So the purpose of us going through these demos is to show you some of the stimulus and response types that we think might be most useful for this audience. So Noah just showed you tokenized text.
I'm going to try to share my screen as well. Somebody in the Google Chat just asked about recording of participants voice or other kinds of audio. And we can do that. So let me just share my screen here.
Okay, so what I'm showing you right now, oops, sorry. Earlier Noah was showing you our API, our study specification grammar reference, in our help documents. And we do have a ton of different types of stimuli and response types. So one of the response types is audio. The demo I'm going to be showing you is incredibly simple; we'll just be recording a participant's audio response to a query on the screen. So you can see here, this is the researcher panel that Noah was showing you earlier. We again have a very simple procedure with just one block. Our trials there, I think there are only going to be two, so it's a fixed order of trials. And in our trial template, we are setting up some instructions for the participant, and then just two basic ways of recording audio responses. So let's dive in to
previewing the study. So let me actually go back to Zoom.
So now you should be seeing what this will look like for participants. Again, just like Noah said, there'll be a consent form here; it's up to the participant to read the consent form, and we'll go ahead and participate.
Okay, I agree to the terms. I'm going to begin.
Okay, thank you for checking out this demo study on the audio recording feature.
Recording participants on FindingFive can be achieved by using the audio response type. Once FindingFive detects the presence of an audio response in a study, it will attempt to request microphone permission from participants before the study starts. If FindingFive detects no audio recording equipment on a participant's computer, it will prevent the participant from starting the study at all. So here we go. The audio response creates a simple recording interface that participants can control on their own. So it allows participants to record themselves, review the recording, and confirm if they're satisfied with it. So I'm going to go ahead and record my speech right now; you can see the volume bar is moving in accordance with the volume of my voice. And I'll say I'm done with this trial as the participant. Now I can listen to what I just recorded. I'm not exactly sure if through Zoom you're going to be able to hear this, but let's give it a shot.
Yeah, I don't think we can hear that. You can? Okay. Sorry, I was listening to my voice through my speaker. Does it sound wonderful? It sounds strange, as listening to one's own voice can be.
But it is good quality. The quality is fantastic. Yes, yeah. All right. Another feature of the audio recording, which can be particularly helpful, is changing the padding on the recordings. So you might have a participant
click the record button, and then hit the stop button, but they're continuing to speak. So if you anticipate that this could potentially be an issue, you might want to build in some padding on this response type. So for example, in this particular trial, we're going to be recording my voice here for a couple of seconds; as the participant, I'm going to hit stop, but in fact the recording continues for an additional 200 milliseconds, 500 milliseconds, whatever you want it to be. So the audio file that you will get when you collect your data is actually going to be slightly longer than what the participant marked by hitting the stop button. Audio responses are recorded in stereo, compressed at a bit rate of 64K. The fidelity is good enough for human speech. Each file you receive is an OGG file, similar to MP3; you should be able to convert this if you need to, using any number of free tools, or open it up in a tool like Audacity.
And again, our API has a lot of details on this. All right.
let me go back over here.
All right, Noah, would you like to come back at this point in time?
We are. Oh, there we go. Okay.
Do you have anything you'd like to add? I guess I can really quickly just highlight, let me share this here: when you look at the API for audio responses, you can see one of the things you can change is the padding. Again, the default is 500 milliseconds; after the participant has hit stop, it will continue to record.
Yeah, I mean, one thing I would add, if I'm remembering correctly: we put an implicit upper limit on the padding, because if you had a typo where you put in a really large padding, you would essentially be recording somebody without their consent, really, because they would think that they were no longer being recorded when they were. So we did put an upper limit on that; I can't remember what it is, but it's implicit. But 500 is usually, in our experience, a good number. We realized there were a lot of issues with the data getting cut off that were happening to a lot of researchers, and once we put that in, played with the number, and suggested that number, those problems basically went away. So
And just to show how simple this is: when you build an audio response, all you need to do is create a new response type here in the researcher screen and set it to type audio. Incredibly simple. If you want to add padding, that's an optional setting that you can have in your response type. Yeah.
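Putting the two pieces mentioned in this demo together, an audio response definition is roughly the following JSON sketch. The response name is made up; type audio and the optional padding in milliseconds are the parts described here, and the exact spelling of the fields should be checked against the API reference at help.findingfive.com.

```json
{
  "spoken_answer": {
    "type": "audio",
    "padding": 500
  }
}
```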
Okay, do we want to answer more questions or go on to the third demo? And maybe we should do some questions about audio specifically.
Okay, um, so
there are some questions about whether you can record voices. So we just established that yes, that works great. Let's see, I'm trying to find... Could you describe how voice responses work a bit more? Can you advance trials based on a voice response, like have their voice trigger a trial event?
no, we cannot have that as of yet. Yeah.
I'm trying to think whether there would be any
technical issues with that. I mean, as long as it's done after they actually press the record button, right, so that we're actually recording them, I suppose it might be doable in theory, but there might be some kind of technical barriers there. One thing I do want to point out, just as a general bit of information for people who aren't used to collecting data over the web: the circumstances your participants are going to be in are not the ones you're used to in the lab. So that might be the kind of study design that is a little more difficult to do over the web. Despite whatever instructions you might give, people will do these things in coffee shops sometimes.
Right. So on that point, a follow-up to that question was someone asking about two- to three-year-old participants. And I'm not sure exactly the question, but they were asking again about the possibility of advancing trials based on voice. But also, is it possible to have general audio recordings? I'm assuming what the question was is, like, can you record throughout the experiment, rather than having a press initiate recording?
So no, that's not something we can do right now. We can't just do a blanket recording throughout the whole thing.
I'm trying to think. You know, that might be the kind of feature, we could certainly talk about it, but it might be the kind of feature that we would be hesitant to make possible in the first place. There are just some issues with how clear it would be to participants, and with the data itself: transmitting it over the servers, that could be a very big audio file, and depending on how many participants you're having in your study, that could cause some real server load issues that might affect your study. We certainly would be open to discussing feasibility and options for a particular study design or a particular goal of a researcher. But yeah, my gut reaction is that there might be a little bit of pushback on our end; we would want to at least talk about other options. Well, that
I think, raises an important question that maybe we should discuss and allocate some time to. I think there are a lot of people, including myself, that are not generally familiar with online experiments, and there are things you're going to run into that are maybe even obvious, but you have to sort of do it to understand, like the fact you mentioned about the controlled setting, and other kinds of issues that might pop up. So maybe, if you guys think about it, maybe like 10 or 15 minutes just at the end to discuss that specifically. I think that's a great way to do it. Yeah. And on that point as well, I think a lot of people probably don't know about Mechanical Turk payment and that sort of thing, so maybe that would be good to include in that as well. We'll hold those discussions together at the end, I think.
Okay, yeah, that's great. Okay. Let me see, someone said, I've got three plus-ones on a question. You mentioned a beta online eye-tracking system; is it possible to integrate other web-based data collection with an experiment in FindingFive? So I think that may be referring to the fact that PCIbex has a beta eye-tracking system. And I knew, I had asked you this question as well before, and you said that you don't currently have online web-based eye tracking, right?
Yeah, we don't currently have it and we don't have any plans to do it in the immediate future.
We were just a little uncertain about the general reliability and generalizability of such a paradigm through the webcam. But I mean, it sounds like if PCIbex is doing it, we would of course be very interested in learning more about how that is going for them. Because if they're getting, you know, reliable enough data for people to do visual world paradigm research, then that could be something useful for us as well.
Right, and the visual world paradigm, you know, requires much less precision, I think, than reading experiments or other kinds of visual psychophysics.
Certainly. Okay. So another question was, can participants see a visual stimulus and record their voice at the same time, like in a picture naming experiment? Yeah. And is it possible to measure response latency from stimulus onset to voice onset?
Yes, well, a little bit of that calculation would have to be your own. So what we can measure is the amount of time until they press the record button. What might be more challenging is trying to, like, have them open the recording and then see the image, and you measure that time. But I think we can do that: we have a property we call barrier that allows us to define whether certain stimuli block other stimuli or responses in the same trial from appearing yet. So for example, it's very useful when you have a video and you want someone to watch the whole video before they respond, right? You make that video a barrier to the responses. If you take that property away, they could in theory press record at any time; you could have some delay on your image stimulus, you know what the delay is, and that measurement is very precise. And then all you have to do is measure the time within the audio recording before they started actually speaking. Of course, it's up to you whether you want to actually use voice onset or the click of the record button, because that click might better represent when they're ready to speak.
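The measurement outlined here can be sketched offline: the platform records the precise time from image onset to the record click, and the leading silence inside the recording can be estimated afterwards, once the OGG file has been decoded to raw samples (say, in Audacity or with ffmpeg). This is a conceptual Python sketch with a naive amplitude threshold; every name in it is ours, not part of FindingFive:

```python
def leading_silence_ms(samples, rate, threshold=500):
    """Return ms of leading 'silence' in a mono PCM sample list:
    the time before the first sample whose magnitude exceeds
    threshold. Naive sketch; real voice-onset detection is more
    involved (noise floors vary across home recording setups)."""
    for i, s in enumerate(samples):
        if abs(s) > threshold:
            return 1000.0 * i / rate
    return 1000.0 * len(samples) / rate  # threshold never exceeded

# Hypothetical trial: image onset to record click took 350 ms
# (from the platform's data), then 125 ms of silence inside the
# recording itself.
rate = 16000
samples = [0] * 2000 + [8000] * 100  # 2000 / 16000 s = 125 ms silence
latency_ms = 350 + leading_silence_ms(samples, rate)
# latency_ms == 475.0
```

The fixed-threshold detector is the weakest link; the arithmetic of adding the click latency to the in-file silence is the same regardless of how onset is detected.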
That's a great thing you mentioned, actually using the click sound. Is there a way to, you know, just incorporate in the experiment, you know, like a tone that would play or something like that, that would be useful for
that. So that's, I mean, that's entirely up to you. If you make an audio stimulus that has a tone in it, you should be able to have that audio stimulus triggered when they press record, for example. But since each recording that they're taking is isolated, usually the tone thing is done for purposes of helping the researcher separate out, okay, this is the stretch of time that I'm interested in. For us, your users are usually going to be clicking record and stop, so that gives you a start and an end time to be interested in. But yeah, certainly if you set up your own audio stimulus, you could do that sort of paradigm.
We don't have such a thing currently in place now. You know, that's an interesting question, though. I mean, depending on the format of it, it might not be that much work to translate it into FindingFive, but probably that would be something we would ask the researcher to do.
Right. Okay. Another question: is there a way to interact with stimuli on the screen, like clicking on images to play different sounds and such? Okay, yes, for those that are listening only. Let's see, I'm not sure how much we should expand here. You can present audiovisual stimuli; we clarified that already. Can you do self-paced audio moving-window presentation?
Oh, No, no. But that's very interesting.
That sounds like a cool feature.
Get in touch with us; we want to talk. So whoever asked the question about self-paced audio moving-window presentation, email these guys, and they will do their best, I assume, to try to implement that. It's not too much of a technical burden, right, to figure that out?
Well, that's something I wouldn't want to comment on until I talk to
the team, right. All right. All right. If someone's a really good programmer, you know, you can join finding five and help them add features to their platform.
Yeah, no, we'll definitely plug ourselves that way. We want more help.
And I'm sure you're going to need the help once everyone piles in here. Okay, and maybe we should move on. There are more questions, but I think they're less specific to this kind of stuff, if you have another demo or more material to cover here. Okay.
so let me share my screen again.
What do we got here?
Oh, by the way, since I accidentally clicked on the wrong window: so this is news.findingfive.com, and I'll just briefly say, we have blog posts with news about our community. We have a mission statement here, which tells you a bit about what we're all about, and I encourage you guys to take a look at that at some point. And we also have here, as part of our blog, some tutorials. For example, we have a tutorial on automatic creation of compensation HITs for Mechanical Turk workers. So that's something we can talk about later when we get into Mechanical Turk, but this is just an example of a tutorial we have. So since I was on this page, I'll take this opportunity to share that.
If I can actually just jump in for a second while you're pulling it up, Noah: we have a crash course on news.findingfive.com which walks through building a simple memory-based study. And actually, if you go to research resources... Yep, exactly. This crash course is pretty straightforward and walks you through, step by step, all of the programming concepts (there are very few) that you would need in order to build your very first simplistic study on FindingFive. So that would be a really good resource
to get started,
This is a little outdated; this is our old interface, before it got prettier. But the information is all the same, it just looks a smidgen different, and we'll be updating those screenshots very soon.
Is it possible for people to try to do that now and then get feedback from you? Sure.
Yeah, that should work. And by the way, if anyone's trying to sign up and encountering problems, let us know.
If you're trying to sign up and getting problems, let them know, and if you're able to get into their system, you can try messing around with this and trying the tutorial. And then if you have questions in real time while you're working on it, we can get those questions to Noah and Patty. We can also potentially set up some video chats if you need help walking through getting started. Anyways, sorry to interrupt.
No, he's gonna show us a slightly more complicated grammar.
Yeah, and, sorry, plugging my computer in because I realized I didn't do that. Okay.
So, what I'm gonna demonstrate now is conditional branching. If that's not a term that you're familiar with, I'll let the demo get it started for you. Conditional branching is essentially assigning participants into different arms, or branches, of a study based on some test condition. Usually this means based on a particular response they gave, or based on whether they passed an accuracy threshold in a training block, or something like that. So essentially, it's defining different versions of your experiment that are conditional on participants' experience or performance. What this amounts to is changing the trials that a participant experiences based on their responses to previous trials. And we have two methods of conditional branching, in the broader sense, that we have
support for at the moment: a match method and an accuracy method. These are the methods used to evaluate the condition upon which participants will be sorted. Using the match method, different response options are matched with different branches of the experiment. So we're going to try it with an alternative forced choice (AFC) trial.
Basically, this is just a total toy example, obviously, but depending on whether I choose the left button or the right button, I'm going to get a different version of this experiment. So if I press left, it tells me: you clicked on the left button, so you're seeing this. Obviously, if I had picked right, it would be a different trial. Right? And so this is the essence of conditional branching and how it works. This same match method can also be applied to, for example, a rating response. So here, participants are sorted into one branch if they select either one or two, a different branch if they select three, or a different branch if they select either four or five. You can see how this might be useful under, say, survey conditions. So, I don't know, I'll pick two. And if you noticed up here, it said "branched into branch A" at the top of my screen — I don't know if you missed it; it was only there briefly. I'm currently in preview mode, and I just want to highlight that this preview mode is for the benefit of the researcher testing their code; participants would not see that. They're not going to be told that they were put into a particular branch. But in this case, I defined something that I called branch A, and that was contingent on that response. Since I picked a value of one or two, this is the message I'm getting. We also have the accuracy method, of course. When we use the accuracy method, participants are sorted into their branches based on whether they pass some accuracy threshold. So what I'm going to do is have five AFC trials in a row and an accuracy threshold of 80%. Right? So I have to get four out of five. In this case, for the purposes of the demonstration, I'm telling you explicitly what the correct answer is. But as you can see, if I, say, choose the wrong answer,
I'm not going to pass that threshold.
So I did not get four out of five; I failed to pass the accuracy threshold. If I had, I'd be getting a different branch of the experiment. Now, one way we can use this — it's not actually in the demo at the moment; I created this demo on short notice, so I didn't have time to do it — one thing that's really useful for training in an experiment is you can set it up so that if somebody doesn't pass the accuracy threshold, they essentially just take the same block, or the same series of blocks, again. So they go through the training again, they do the test again. And you can evaluate them, for example, based on their overall performance across all their attempts, or just on their last attempt. You can set a number of iterations for how many times they can try to do the training and the test, and so on. So that's — oh, I probably shouldn't have done that. So that's conditional branching. Let's turn to questions first, in case people are interested in this, and then I'll show some results files after that.
Okay, so let me see if I can find any questions, particularly about the branching.
I don't see any —
if somebody can find those questions on the Google Doc... I'm trying to see.
So, I mean, it may have just been too quick.
But I mean, I think this is a pretty powerful feature.
It took us a while to develop, but we were pretty excited when we got it working.
Actually, what I should do is show you
how it works in our procedures, so you can see that it's not terribly complicated. This particular design that I made here has a lot of blocks, because conditional branching in FindingFive happens exclusively at the block level. We did this because, looking at Qualtrics for example, they have a bunch of different kinds of conditional branching: one based on an individual response that happens at the trial level, where the test condition is a trial and the branching outcomes are single trials; and then separately, something else that's more like a group of trials, something else at the block level, and so on. And we just thought, it's not that much work to take all of this and do it only at the block level. It's actually easier, I think, for researchers this way, because when I was trying to learn how to do this in Qualtrics, it got very confusing very fast.
So to give you a sense,
the first evaluation condition that we had was a match — using the matching method to evaluate the condition on an AFC trial. So I essentially just made a block with my trial template, consisting of my AFC trial, in it. And what we do is define this branching dictionary here; that's what tells FindingFive that this block of trials is to be treated as the condition for conditional branching. I set the method to match, and I set some triggers — which trials within this block we want to actually use to trigger the conditional branching. In this case, there was only one trial, so that was quite simple. But you can even say: I want all the trials in this trial template, but only the ones that have a certain response in them. So you can have rather complex trials, where you evaluate your condition based on just one of many responses that participants give. And then we just define the actual branches themselves, which are then specified in the actual block sequence of the experiment. So in this case, this is the block that evaluates the condition. When participants complete that block, FindingFive knows what their responses were and what branch to assign them to. And then we have this dictionary here that defines those branches. As long as the names of these branches match the names that I gave in my branching dictionary, FindingFive can handle it, and we get conditional branching. So in this case, they see different blocks depending on which response was given. You can see the same thing for the rating block — we have the same kind of strategy: branch A, branch B, and branch C. And also for the accuracy block, where I made branches that I called pass and fail, which I can show you what that looks like. Right?
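To make the match method concrete, here is a minimal Python sketch of the sorting logic described above. This is not FindingFive's actual study grammar — the dictionary keys, branch names, and function are all illustrative:

```python
# Illustrative sketch of match-method conditional branching
# (not FindingFive's actual study grammar; all names are hypothetical).

# Each response option on the trigger trial maps to a named branch,
# which would correspond to a block sequence defined elsewhere.
branching = {
    "method": "match",
    "branches": {
        "left": "branch_left",    # blocks shown if "left" is chosen
        "right": "branch_right",  # blocks shown if "right" is chosen
    },
}

def next_branch(response, branching):
    """Return the branch a participant is routed to for a given response."""
    return branching["branches"][response]

print(next_branch("left", branching))   # branch_left
print(next_branch("right", branching))  # branch_right
```

The key constraint the speaker mentions — branch names in the dictionary must match the names used in the block sequence — is what makes the lookup above work.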
So in the accuracy condition, it's the same basic structure as the other block, except in my branching dictionary here, my method is accuracy. I have a minimum score that people have to reach — in this case it was 80%, so the 0.8. You define triggers in exactly the same way, and in the case of accuracy, you evaluate your branches based on a true-or-false condition: did they pass the threshold?
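The accuracy method reduces to the same kind of lookup, evaluated against a proportion correct. A sketch of that logic (again hypothetical names, not FindingFive's grammar — with five trials and a 0.8 threshold, four of five must be correct):

```python
# Illustrative sketch of accuracy-method branching (hypothetical names).

def accuracy_branch(responses, correct_answers, min_score=0.8):
    """Route to "pass" or "fail" based on proportion correct."""
    n_correct = sum(r == c for r, c in zip(responses, correct_answers))
    score = n_correct / len(correct_answers)
    return "pass" if score >= min_score else "fail"

# Five AFC trials with an 80% threshold: four out of five must be correct.
print(accuracy_branch(["A", "A", "B", "A", "B"],
                      ["A", "A", "B", "A", "A"]))  # 4/5 = 0.8 -> pass
```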
So this is like a really complex choose-your-own-adventure experience for your subject.
Yeah, although they don't know that they're choosing their own adventure, in most cases. Obviously, whether you want to alert them to that fact is up to you as the researcher, but in most conditions they wouldn't know. Yeah.
Okay, should we do some more questions here? I don't know. Okay, yes. So someone asked: would you use conditional branching to pseudo-randomly send participants to different lists or versions of the experiment in a between-subjects design?
No. So we have a different feature called participant grouping, where basically we will automatically group participants into different lists. Conditional branching is always going to be conditional on a specific response — it's dynamic, from within the actual session. Whereas assigning participants to different groups is something you want to do up front, before the experiment begins. As soon as somebody joins the study, you want to assign them to one of those lists, and that's done through participant grouping, which actually looks very similar. I don't have to do any kind of special blocks or anything. I can just define, you know,
group one, and they get this
list of blocks.
And group two,
gets some different list of blocks. And that would be how I could automatically assign participants to different groups. What FindingFive does behind the scenes is: the first participant who joins the study gets assigned to one of these groups randomly, and then the next participant gets assigned to the other group, and so on.
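The round-robin assignment described here can be sketched in a few lines of Python — this is just an illustration of the alternation the speaker describes, not FindingFive's implementation:

```python
# Sketch of round-robin participant grouping: each new participant is
# assigned to the next group in turn (illustrative, with hypothetical names).
import itertools

def make_group_assigner(groups=("group1", "group2")):
    """Return a function that assigns successive participants to groups."""
    cycle = itertools.cycle(groups)
    return lambda: next(cycle)

assign = make_group_assigner()
print([assign() for _ in range(4)])  # ['group1', 'group2', 'group1', 'group2']
```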
That's not something I made a demo for, but it's pretty straightforward, I think.
I think we have some other questions about other aspects of finding five. I don't know if you want to answer those now or wait.
I don't know. Patty, do you have any thoughts? Um,
I guess, would it be worthwhile — we have Shiloh Drake also here, who has conducted research using FindingFive. I thought she could talk a little bit about what the results look like and how simple it is to actually look at your data, which is why we're going through all of this in the first place.
I actually think that would be wonderful because there are some questions about data storage and access and things like that. So maybe that would be perfect to address those questions as well.
Shiloh, are you ready?
Okay, let me see if I can unmute her.
We will unmute Shiloh if she's ready. And Shiloh, among others, has been very helpful editing the Google Doc online, formatting it nicely, and adding the responses to the questions and things like that. So thank you so much for that as well.
So, let's see. I've been answering questions on the Google Doc, by the way — I'm just someone who uses FindingFive, and I'm not affiliated with the team in any way. Just so you all know.
So I guess my role right now is to
show you what the results end up looking like. So I'll just pull up a CSV file that one of my latest studies on FindingFive generated.
So you should be able to see my Excel file on the screen right now. This is the CSV file before anything else — before I did any cleaning of the responses or of the file. So it's got everything, from the ID of the experiment over in the far-left column, to the type of response participants are giving. In this case, I'm only using key presses — this is a reaction time experiment, so FindingFive recorded their response reaction time, and this is in milliseconds. You can also see the trial template — which trial are these in?
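A file like this can be analyzed straight from the download with the standard library. The sketch below assumes hypothetical column names ("response_type", "reaction_time") — check the header row of your own CSV, since the actual column names may differ:

```python
# Sketch of reading a results CSV of the kind described above.
# Column names here are hypothetical; inspect your own file's header.
import csv
import io
import statistics

# Inline sample standing in for a downloaded file; for a real file,
# replace io.StringIO(sample) with open("results.csv", newline="").
sample = """participant_id,response_type,reaction_time
a1b2,keypress,512
c3d4,keypress,488
e5f6,keypress,530
"""

with io.StringIO(sample) as f:
    rows = [r for r in csv.DictReader(f) if r["response_type"] == "keypress"]

mean_rt = statistics.mean(float(r["reaction_time"]) for r in rows)
print(round(mean_rt, 1))  # 510.0
```

The same file loads just as easily into R with `read.csv()`, which is the workflow mentioned below.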
Are there any questions on the Google Doc for—
For like, what?
Sorry, go ahead.
Are there any questions for, like, what else I should show?
Well, there was a lot of interest in what the data file output looks like, which you're showing now, which is great. And we
should say that the data is sent to you as a CSV. Yes — and audio responses, like I said earlier, are OGG files. But this is exactly what you'll receive, pretty much.
It is convenient that it comes as a CSV if you're going to be importing it into R, for example.
Right, into R, or just manually examining it in Excel, which is great. So there was a question about data storage: in which country are the servers storing the data located, and if there are multiple, can we choose them? This is because some European ethical and privacy regulations could be a concern.
Noah, would you like to speak to that?
Okay, can you hear me? Sorry — I think the setup is such that I can only be unmuted by you. So, okay, I will no longer mute myself when other people are talking.
I'll just be quiet.
Okay, yes. So we have servers on the East Coast and the West Coast of the United States. We do not currently have any server locations in Europe. This has been brought to our attention before, but we had just one single lab for which this was an issue, and so it wasn't really — you know, our team pays for these servers out of pocket, basically, unless we get donations. So
that, unfortunately, is a restriction we have at the moment, although
we are currently in a stage of expansion, and we're trying to get more people involved. So if this platform seems like something that you want to use, and you're willing to donate a little bit of money to help us with server costs, that would certainly facilitate us opening servers in Europe to expand into that market.
Because I think another question was in reference to IRBs. So it might be useful to include, in the other document, some links to publications using FindingFive — maybe people can look at how the IRB was navigated there. But it's a good question. Maybe this is good for Shiloh: what was the situation for navigating the IRB using FindingFive?
I mean — when did I run this study? Oh yeah, this was run while I was
a visiting researcher, shortly after I got my PhD, at the University of Arizona. There, the linguistics department had an in-house IRB. And as someone else mentioned on the Google Doc, this is also the university that Ken Forster's DMDX came from.
So they're used to reviewing a lot of online
and remote data collection studies. So I think our IRB just said: okay, yeah, this is just another remote experiment. And because all the participant IDs — I'll expand this column —
participant IDs look like absolute gibberish,
the confidentiality of the responses is maintained. You can really only find who's done the study by — I think you have an option to collect the emails or names of participants, but they're not associated with individual responses, which usually has satisfied IRBs, in my experience.
That's correct. Yeah. So usually, since we automatically generate participant IDs, from the researcher's point of view it's totally anonymized — unless they ask for emails because they need to do compensation or study credit or something like that. But I understand there's a concern that we have that information on the back end. I mean, we use a third-party resource whose servers are
very secure — it's CloudFlare, and they handle a lot of server storage for a lot of big companies. So in our experience — my experience, I mean, at the U of A, I should say, and in helping other people do studies on FindingFive — I haven't known of anyone who has had an issue. But like Shiloh mentioned, we have an in-house IRB, and they're aware that the actual personal sensitivity of a lot of this data is limited and the servers are secure enough to satisfy them. I should mention
that, you know, we're kind of in uncharted territory now with this pandemic, and more researchers are probably going to be conducting online research at universities and in labs where that hasn't been done before. So I imagine that the FindingFive team would be willing to help if your IRB is having difficulty understanding what online research is all about, or needs more detail about our security features.
Right. I'll point out that even before FindingFive, there's been plenty of online research using Mechanical Turk and other platforms, so there are plenty of publications and precedents if you're worried about those issues. One question with a plus-one is: how do you export the results? I'm not sure if you quite illustrated that. Is that automatic?
Yeah, so one thing we haven't talked about: when you do a preview of a study, you can download your preview results right away.
But that's just for the researcher,
you know, testing it out and piloting for themselves. When you do an actual study, you run what we call a session of that study. So you code up your study, you decide you want to run a session, you can do it on FindingFive or Mechanical Turk, you say how many participants you want to recruit, et cetera. When that session is completed, you download a single CSV like this one — that's why there are participant IDs — with all of the participants' data from that session. And essentially, there's just a button that you click to download that data, and it comes to you in this CSV format.
And I can actually stop sharing my screen with Excel and switch to Firefox, which I've got in the background, where I can get the CSV file. So here, you can see that I ran three separate sessions of this study, and I'm looking at just the ones that are finished. Right, so there's this button that says batch download data. And if I click that button, it'll say: yep, here you go — here's your data in this nice CSV file, in this very large Downloads folder that I never clean out.
Yes, we all struggle with that
So just to talk a little bit through what this screen is actually showing you, the researcher: over on the right-hand side it says platforms, so that's going to tell you if you're on Turk or deploying the study through FindingFive in the lab, and participant statistics — how full is the study? If you wanted 50 participants, are you done collecting 50 participants' worth of data?
I'm sorry, William, go ahead.
No, no, that's perfect. So another question was: can FindingFive generate a link for studies, so that we can post this link on a crowdsourcing website like Mechanical Turk?
Yeah, so if you were to, say, try to send a link to your actual study, the person trying to look at that link would not be able to access it, because you're the only person with read/write authority. But if they have a FindingFive account, you can add collaborators to a study. You can see on Shiloh's screen, on the left, there's a collaboration button or tab. And Shiloh, if you can click on that, you can see she has some collaborators. Tang, I'm assuming, was added as a collaborator to help you troubleshoot something — yeah — and he is an actual research collaborator. But you can add new collaborators who are part of FindingFive to a study, and you can affect whether or not they can edit the study and whether or not they can run sessions. This will allow them to have access to everything — that's as much as we do right now. But we do have plans: as you can see, there's also a findings tab. Right now we don't have this set up, but this is something on our radar that we really want to do, to try to make this a place that facilitates replication and sharing of research, both within our platform and with other platforms such as GitHub or OSF.
So rather than — we were talking about collaboration with other researchers. The question, I think, was also aimed at: could you use a link to the study on FindingFive to recruit participants elsewhere? Oh,
participation through a link is a feature that we have, yeah. So you can, say, email the link to participants. If you're running the study through the FindingFive platform, they have to create a FindingFive account — a participant account, or a researcher one; researchers can participate in studies as well. So they would have to sign up, but they can use that link to participate in your study.
Right. It can't be accessed right now without either a Mechanical Turk worker account or a FindingFive participant account. But you can definitely send the link to your study to whomever you want.
That's great. And then I think another question was: can you redirect to a URL at the end of the experiment, like Qualtrics, for subject credit or compensation? Yeah, like a subject pool.
Yes, you should be able to add a URL into the study. Whether we have it set up to make it clickable — I've never actually done this before personally, and I haven't actually thought about it, but I believe we can do that. At the very least, if we can't, what we can do is: you, as a researcher, can ask for participant emails, and when somebody completes a study, you can just send them that link through email.
Is that what we've done in the past, Shiloh?
In the past, what have I done? In the past, I collected emails and set it up that way. I'm also trying to remember what we did here — I think this one was just for course credit, so they just sent me an email, and the U of A had a system that collected all of the participants. So I could just tick off a box and say: yep, you participated, here's your course credit.
And just to reiterate, that email is not attached to their data,
so they can maintain anonymity. Right? Yeah — I have no idea whose email goes with whose data.
One clarification question: is it the case that participants are required to have either a FindingFive account or an Amazon Turk account? Is that correct? Or must you have a FindingFive account regardless?
Either — it's going to depend on the platform that you use to launch your session. If you do it through FindingFive, they have to have a FindingFive account. And by "do it through FindingFive" I don't mean that you made the study in FindingFive — when you go to launch a session, you can say: I either want to use FindingFive itself as the platform (not to recruit — you still have to recruit participants yourself), or I want to launch it on Mechanical Turk. And if you do the Mechanical Turk option, they do not have to have a FindingFive account; in fact, your participants will probably have no idea that they're using FindingFive on any level.
So I'm trying to update the responses — where was I? Okay, so that's great, thank you. And I think if I don't ask this question, I'm going to get some grief here. In terms of, again, the question about eye tracking or webcam-based data collection: can you at least record webcam data, if not do
online eye tracking? No, we don't have any functionality built into our platform right now to access the webcam at all. That's something we could do, but we've been hesitant so far because it gets a little hairy, and we just weren't sure that the data quality is good enough to be worth it at this point. But like I said, if others are doing it and their data quality is sufficient for our typical researcher users, then we would definitely be interested in pursuing that.
And here's one question from me that I know other people will probably be interested in: in terms of video presentation — stimulus files that are videos —
is that possible? That's perfectly possible.
Yeah — we have some recommendations; this is something we can talk about. I mean, we're running a little short on time, I think — is that right?
Well, yeah, we're slated to take a break at 12:15. We could probably go over a little bit — we've exhausted most of the questions on the Google Doc, I believe. So that's good.
So I don't know if people are already doing this. But we could, we could maybe turn to our discussion of like, online study best practices and mturk integration.
Good idea, yeah. We could do that briefly and maybe have a little time at the end for people to play around with the platform. And you should feel free, I think, to play around with it — we should be able to grab someone to help you afterwards, if possible, during the session. But yeah, I think it's a great idea to try to get through that.
Okay, sure. So just really quickly, to answer questions about whether you can present video or collect auditory data from a participant: I want to refer folks to our API documentation on the FindingFive help desk. There you'll be able to see all the types of stimuli you can use — text stimuli, images, playing audio clips to the participant, playing videos, and tokenized text, which Noah showed earlier. And in terms of responses, you can collect text box responses, you can do the types of responses Noah was showing earlier, like a choice response (left/right, yes/no), a rating scale (one to five, one to ten), or recording the participant's voice with an audio response. So check those out to see some of the features. And as has hopefully become clear, we're really excited to hear about new things that people want — so please tell us your ideas for new features, and we'll see what we can do.
Noah, did you have anything you wanted to add to that?
I guess, while we're talking about all those, I'll just say —
I think Patty is going to jump in and talk mostly about MTurk, but because it's related to this: one thing that a lot of people who are new to online research don't really think about is that the size of your stimulus files, and how many of them you have, is going to affect the quality of the study, because each one of these files has to be loaded individually in the browser for each participant. So we absolutely can do video. We recommend that, if possible, the files be compressed, or that if they're not compressed, you limit their number to some extent. We have some functionality built in to lessen these burdens — for example, we can make it so that when participants take the study, FindingFive will only load, say, the first two blocks first, and then after those, it will separately load the next two blocks, to keep the browser from holding on to too much data and having that slow down the experience. These things are within the researcher's control, but it's something I wanted to put out there while we're talking about stimuli, because we've had instances in the past — like somebody doing a phonetic study who had, I think,
about 800
audio stimulus files for their phonetic perception study, all uncompressed and all being loaded at once. And so they were very curious to find out why all the participants were reporting that their study was crashing. That's the kind of thing that I think a lot of researchers don't think about, because when you have people come into your lab, that stuff's preloaded on your computer, so there's no loading required. But online, every participant is essentially downloading those files when they take your study.
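A rough back-of-the-envelope check before deploying can catch this kind of problem. The sketch below just totals up stimulus file sizes; the 200 MB threshold is an arbitrary example, not a FindingFive limit:

```python
# Rough payload check before deploying an online study: total up stimulus
# file sizes and flag studies that ask browsers to hold too much at once.
# The threshold below is an arbitrary example, not a platform limit.

def total_payload_mb(file_sizes_bytes):
    """Total download burden per participant, in megabytes."""
    return sum(file_sizes_bytes) / (1024 * 1024)

# e.g. 800 uncompressed audio files at ~2 MB each, as in the anecdote above
sizes = [2 * 1024 * 1024] * 800
mb = total_payload_mb(sizes)
print(f"{mb:.0f} MB")  # 1600 MB
print(mb > 200)        # True -> consider compressing or block-by-block loading
```

For real files you would feed in `os.path.getsize(path)` for each stimulus; compressing audio (e.g. to OGG) typically shrinks this total dramatically.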
So that should, of course, be of great relevance to sign language research as well.
Okay, handing it back over here.
Okay, so we do have another tutorial on the steps you'll need to go through. First, if you're going to recruit participants through Turk — rather than sending them a link, having them do the study on FindingFive, and figuring out how to compensate them on your own — you'll need to go through the steps of creating an Amazon Web Services account and creating a new IAM (Identity and Access Management) user. I wish I could show you this process; it's pretty straightforward, and like I said, we do have a tutorial on how to do it, but I'm going to leave that to you to work through. You do need a credit card to set up that IAM user, so keep that in mind. Once you set up your Amazon Web Services account and your IAM user, you're going to get an access key and a secret. On FindingFive, you'll have your own account profile — this is my personal profile here — and when you scroll down to the bottom, you'll see that we ask for your AWS access key and your AWS access secret. So that information you got when setting up Mechanical Turk, you'll just pop right in there, and once you do that, you'll be ready. Here's a study that we have set up showing folks how to use the barrier feature — it's a tutorial you can find on news.findingfive.com. If you go here, like Shiloh was showing us earlier, you'll see a sessions tab. I can click on sessions — we have no active or scheduled sessions for this particular study, but I'm going to create a new one. And rather than doing the study through FindingFive, I'm going to do it on Turk, so that I can recruit participants from around the world. You'll see a pop-up menu here that's going to ask: do I want to start the study in sandbox mode, or do I want to go into full production, where I'm actually collecting data from participants? One best practice is to always, always try your study in sandbox mode
first. It ensures that workers don't get frustrated when there are errors or glitches in your study. Pretend we did that already, and we're now moving on to production. So we are ready to set up a session and collect data from participants through Turk. You want to create a name for your study that is informative and recognizable, and typically a short description that sounds enticing to workers. You can give this particular session a name — I don't remember if Shiloh actually used names for her different studies; okay, you can do that. Otherwise, FindingFive will assign a selection of numbers and letters to give you a session name. But you could call it, you know, "first set of participants." Then there are some details on what the embedded window should look like — the size of it. Next, you'll be asked additional questions about participation restrictions. How many participants do you want to run in this particular session? What's the estimated duration of your study? Do you want the study to time out — so if a participant walks away in the middle of completing your study, should that HIT stop for them? You can also set different features, like blocking participants who have completed a past session of the study, so that they can't keep doing your study over and over again; blocking participants who attempted but failed to complete a past session — or perhaps, for those folks who timed out, maybe you want to make sure they can attempt again; and ensuring that workers are over the age of 18. We can also do geolocation restrictions for Turk. I'll try to go through this really quickly: if you have other study sessions on your user account, you can restrict participation in this particular session based on what participants have done before. So if you have a multi-session study, you can make sure they've done part one before moving on to part two.
Likewise, you can block participants who have completed part one from participating in part two. And then compensation. So in terms of best practices, there are a lot of papers out there, and I can share links to them, about, you know, estimates of how much you should reward participants in order to ensure that you're being fair and ethical.
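The arithmetic behind a fair reward is simple; here is a minimal Python sketch of duration-based compensation. This is an illustration only, not a FindingFive or Mechanical Turk feature, and the $12/hour target rate is an assumed placeholder, not a recommendation from the papers mentioned above.

```python
import math

def fair_reward(estimated_minutes, hourly_rate_usd=12.0):
    """Per-participant reward from an estimated task duration and a
    target hourly rate (both numbers are assumptions you choose)."""
    reward = estimated_minutes / 60.0 * hourly_rate_usd
    # Round up to the nearest cent so participants are never underpaid.
    return math.ceil(reward * 100) / 100

# A 20-minute study at an assumed $12/hour target rate:
print(fair_reward(20))  # 4.0
```

In practice you would also budget for the platform fee MTurk adds on top of the worker reward.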
Going very quickly here, you'll also be asked to select a consent form. These are just the silly consent forms that I've uploaded to my account, but you'll want to include a consent form that you have approved through your IRB, and you'll get to see a preview of it here. And then we have some information about sharing data with us. Once you go through and agree to all of these features, and identify the different restrictions you want to have on your participants, you're ready to go, and your HIT will be active relatively soon.
To be clear, though, you don't have to agree to any of those to launch a session.
Great. Okay, I take it back.
Any of these three things, right? This is stuff that we're asking for, if you don't mind so that we can learn more about how people use the platform, what's successful and what's not successful, especially in terms of Turk integration, right.
So again, in terms of best practices, we really strongly encourage you to try running your study first in sandbox mode. If you're using Mechanical Turk, consider the possible distractions a worker might encounter when they're doing your study. We could have a participant who is walking away from their computer every five minutes or has loud noises in the background. So consider quality assurance when you are preparing your study for online deployment. You'll want to have really clear instructions and some way of ensuring that people are following your instructions. Keep in mind, folks are not doing the study in your lab; you don't have control over the distractions that they may experience. So we have some features in FindingFive that can hopefully help you overcome those issues. We have catch trials, which we strongly encourage you to use if you're deploying your study online. You can identify how often you want these catch trials to occur, but they basically ensure that the participant is not just clicking yes to everything and is actually paying attention to what's occurring. The description of conditional branching earlier also could be a way for you to engage in quality assurance, and ensure that participants are performing with the kind of accuracy you want in your study.
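The screening logic behind catch trials can be sketched in a few lines of Python. This is an illustration, not FindingFive's implementation; it assumes you have exported each participant's catch-trial answers, and the 80% threshold is an arbitrary choice.

```python
def passes_catch_trials(responses, expected, threshold=0.8):
    """Return True if the participant answered enough catch trials
    correctly. `responses` and `expected` are parallel lists."""
    if not expected:
        return True  # no catch trials to check
    correct = sum(r == e for r, e in zip(responses, expected))
    return correct / len(expected) >= threshold

# A participant who clicked "yes" on everything fails mixed catch trials:
print(passes_catch_trials(["yes", "yes", "yes", "yes"],
                          ["yes", "no", "yes", "no"]))  # False
```

The same kind of check works for accuracy-based conditional branching: compute running accuracy and route the participant accordingly.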
And I think, can I jump in with one little thing? In retrospect, I should have asked her if she'd be willing to say something, or prepare something for me to say. But one of Shiloh's close colleagues from her days at the University of Arizona did a study where she specifically wanted to see how participants performed in the lab versus over Mechanical Turk, and she did it in FindingFive. In her case, despite all of these warnings that we give, and they did consider these things when they designed the experiment, the Mechanical Turk workers actually performed better than the undergraduates at the University of Arizona. They just got a little bit cleaner data from them than they did from the undergrads. So you should take these issues seriously, but don't be discouraged: online data can actually be pretty good.
Yeah, there are lots of really nice published studies comparing in-lab data and Mechanical Turk data. Just one last thing we might want to mention is that currently, the only way to automatically pay your participants is if they come through Mechanical Turk. However, we are going to be implementing that feature through FindingFive, hopefully in as little as a couple of months. So there is the possibility for you to not have to have your participants create a worker account on Mechanical Turk in order to be automatically compensated.
Okay, so I think that's all the questions. I mean, there was one random question. I don't know if you've answered it already: does FindingFive use JS libraries or something like that?
No. Yeah, we don't use any sort of third-party packages of that sort for the experimental kind of stuff, no.
Let's see. So one person is forming a question right now, so this is like a real-time question I'm relaying. Any chance of some good example papers for comparison between MTurk and, I'm assuming they meant, conventional data collection?
Didn't you read? Oh,
yeah. Um, let me know.
Let me get some citations. And I'll put them
in the Google Doc.
Exactly. So there's another Google Doc, a separate one, and I'll link to it now, that just has a list of resources. Anyone else, feel free to add to that, but there will be some references there for you if you're interested. Thank you. Let's see: are there tools for embedding ghost links in trials to weed out AMT bots?
That's an interesting question.
No, we have not done that. Although, in our particular case, we've kind of been monitoring; we were expecting bots to be an issue, and it kind of hasn't been for our researchers or users that I'm aware of. It's not something that's really been an issue for us so far. So we haven't really done anything like that.
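For anyone who does want to try the honeypot idea behind the question, a link or form field that humans never see but bots tend to fill, the screening side is simple. A hedged Python sketch: the field names here are hypothetical, and this is not a feature of FindingFive or Mechanical Turk.

```python
def looks_like_bot(submission):
    """Flag a submission if any hidden 'honeypot' field was touched.
    Field names are hypothetical; a real study would hide them with
    CSS so human participants never see or fill them."""
    honeypot_fields = ("ghost_link_clicked", "hidden_email")
    return any(submission.get(field) for field in honeypot_fields)

print(looks_like_bot({"answer": "yes", "ghost_link_clicked": True}))  # True
print(looks_like_bot({"answer": "yes"}))                              # False
```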
Right. Um, that's great. Let's see. I don't know if I missed any questions, but if I missed one, feel free to reiterate that question on the doc so that I make sure to catch it. But one thing I was maybe curious about is if you know a little bit about the demographics of the MTurk community or anything like that.
There are also published papers exploring that each year, so I can include links to those.
Fantastic. Um, so yeah, I guess we're getting close to the end here. If people still have questions, please add them, but I think this might be a good place to stop. So with the end of the session, I'm just curious again: if people are trying this out and they're looking for help or assistance, do they just contact you through the normal, you know, information that's listed on the website?
Is that just the way to do it there?
Okay. Yeah, depending on the particular things that you need, there might be an email address that feels more suited, but at the end of the day, all the different email addresses go to the same people.
So yeah, any of them that you find that you send an email to we will get back to you.
Okay, well, I want to thank you guys tremendously. Thank you so much. Again, this was last minute for me, but even more last minute for you guys, because I had like an extra day on you to get this started. And thank you so much for jumping in at the last minute and doing this. I can imagine this is going to be useful to a lot of people. Again, this recording will be posted online at some point, hopefully very soon. Let's see, what else did I want to say? I just imagine that with this, and with so many people unable to collect data, there's going to be a surge of interest in this. Plus, I imagine also, unfortunately, a surge in people that are using Mechanical Turk because of needing money. What I'm going to do now is try something. I have no idea how this is going to work, but I'm going to try unmuting everybody simultaneously, and then asking them all to give a very big round of applause for our presenters. So if anyone out there is listening, please just get ready. Okay, I'm going to unmute you all, and then once I've unmuted you, I want you to start applauding. I'm just very curious to hear how this is going to actually sound, and my dogs are going to freak out. All right, ready? Three, two, one!
Okay, my dog just barked, so he's upset that this happened. He's mostly upset that I clapped. But thank you all for participating. So I think what we'll do now is just stop for half an hour for lunch and come back at 12:45. We're going to have some presentations on neuroimaging and other sorts of data, like in aphasia and other sorts of atypical populations, in terms of datasets that are available already online. And then additionally, kind of a discussion about how to go about just asking people for data, which maybe is the easiest and quickest way to get data in the short term, which I think is quite possible. Noah and Patty, is that okay? Sorry, and Shiloh, if you want to come back, that's great, but don't feel obliged, you know, and I'm sure you've got plenty of things to work on. We'd love to have you again, but no worries about that. And so I guess, oh, sorry, I think I ended up muting you guys, so I'm going to try unmuting you. Okay? Yes. So, I'll see everyone back at 12:45, and I will be sending out information about where to find the videos once they're finally posted and available. Okay, I will see you guys at 12:45.
Thank you, everybody. Thanks everyone.
And I will stop sharing.
Okay, good. Okay, so it's 12:45. I think people are still just starting to file back in here; it looks like the number is increasing again. That's great. Just to let everyone know that didn't hear before: we actually had a great turnout for the first session, with over 100 people in it. This will all be posted online; there's a recording going automatically through Zoom. And it's a great tool, by the way, for those of you that don't know yet: Zoom will automatically record all of your session, and then it will post it, and then you can download it and edit it and do everything else that you might need to do.
So now, up next: we just had a long workshop on FindingFive, which is this great platform for doing online data collection for psycholinguistic experiments or any other kind of psychophysical measures. And now I'd really love to welcome Florian Schwarz. He gave some great presentations at CUNY and has also kindly agreed to talk about PCIbex, which is based on Ibex. It's an online platform similar to FindingFive, but it's been around for a bit longer, and it has perhaps more flexibility and other features. So, Florian is going to talk about that for about 15 minutes or so. And again, if you have questions, I'm going to direct you to this online Google Doc that we have. I'm going to copy the link
and put it back into the messages, the chat box down here. So if you have any questions, please go to the Google Doc. There should be a section for Florian's presentation there, the PCIbex overview; just go ahead and add your questions there, and we'll do our best to ask those out loud. If you put your name, I can unmute you and you can ask your question in person, but I'm happy to ask it for you. And if we don't get to your question, it will always be answered on the Google Doc by the end of this, or at some point in the day; we'll get the questions all answered. So with that, I'd like to welcome Florian.
And here's sort of an illustration of what this looks like: "This is a tank, which is perfectly square."
So we have some images displayed here, and we have this text, which, just for fancy illustration, is actually unfolding with the audio that's being played back. And you can now select an image either by clicking a word or by using the mouse.
"This is a pen, which is strikingly red."
are possible. I'll stop going through the trials; there are just a few of them, to give you a rough sense of what the code behind this sort of experiment looks like. Here is the core trial template. So in PCIbex, you can work with template trials and then feed in all the information from a CSV file. You basically create elements; the core syntax of Ibex is such that you create elements, like a text element, and then you can modify their properties, and then you can carry out actions like printing it onto the screen. You can set timers; you can use this sort of thing called a selector in order to be able to click on parts of what's displayed on the screen. And this bit here, and I'll show maybe a little bit from the tutorial with more elaborate code, is an extremely useful feature for anybody working with some sort of elaborate visual stimuli: what we call a canvas element, just like a painting canvas. You can very freely control where to put images. So let me just scroll here.
I'm sorry, to the illustration of the canvas element. So you basically create a pixel area on the screen, or on the browser window rather, and within that you can put images and text and anything else visual wherever you like; you can overlay things. And this is really a great tool for visual stimuli. A lot of platforms have limitations, and this is true of the old Ibex as well, in that putting things on the screen where you want them is hard, because everything gets executed sequentially and is basically printed out line by line, underneath each other. This canvas element basically gets beyond that and gives you complete freedom about where to put things. And so that's an extremely useful feature here. Um, let me just maybe highlight a couple of other things about the platform. We have pretty good integration of various recruitment tools. So we use this a lot with Prolific, which is a great online recruitment platform for scientific purposes, an alternative worth considering to Amazon Mechanical Turk. And we also use it with Sona, which many universities use for their internal subject pool, so we get participants for course credit through that internally. And all of this is fully integrated, and it's all described here. There's even a section here on how to extract the data from PCIbex and analyze it in R, so there are a lot of code scripts here. Something else that's worth mentioning is that there's really good documentation and support. So there's a forum on the website, and Jeremy very carefully tends to this; many of your questions may already be answered there if you check. And you can post comments there or send them to us at the email address that we have here. Let me perhaps, in sort of closing, and maybe we'll even have a minute or two for questions otherwise,
also mention that there are some more advanced capacities that are not sort of ready-made boilerplate that you get on PCIbex, but we do have, or can give, access to people's microphones and video cameras. So for linguists, if you're interested in production, that may be something that's useful; you can actually very easily do audio recordings. And ultimately, you can also use the webcam, and in principle the capacities are there, based on that, for allowing something like web-based eye tracking. Now, this is nowhere near ready to go, and we'd actually be very happy to have other people chime in if they want to get involved in working with it. But it's basically working like any other eye tracker: it tries to identify features in the visual input from the camera as you look at different parts of the screen. Now, one shouldn't have any illusions about the accuracy of this; you might be able to tell something like, are you looking at the left half of the screen or the right half. But for many visual-world-type studies in psycholinguistics, for example, that's already really useful. There are issues with that; you can imagine that it involves a lot of data storage, and data storage is an issue for the results as well. So on the PCIbex Farm that we offer, space is limited. You can easily set up your own farm; this is true of the old Ibex and of our PCIbex Farm as well. Everything is open source, and the code is freely accessible, so that way you can get around things. But here we are, of course, looking at much more advanced types of features. So we're really excited to get this out there. We're very happy to answer any questions if we have a couple of minutes. William, maybe I'll stop sharing right now and just get back on video, and I'm happy to answer any questions that come up, either from you or things that you've seen on the Google Doc.
Yes. So one thing that popped up in the Google Doc, which got lost: it's actually from Noah Nelson, who's from FindingFive, and I'm just going to unmute you, Noah, if that's all right, so you can go ahead and ask your question to Florian directly.
Hey Florian, this is great. I hadn't known about PCIbex before this; this is really cool stuff. Um, so I'm actually wondering about the Sona and Prolific integration that you guys have, and what that kind of looks like, how it works, right?
Yeah, sure. So basically, the thing that's needed, and that the other integrations those platforms offer, and basically how it works in PCIbex, is that you have to include, say for Sona, the Sona ID in the link that sends people to PCIbex. So you have a thing in the URL that's called id, you know: question mark, id, equals, and then the Sona ID. And then there are some script add-ons that you need to add in your PCIbex script to save that as the value of a variable, and basically you just keep that stored throughout. At the very end of the experiment, you have a return link to Sona. And it's the same for Prolific, where again, since you have stored that value, you can incorporate it, and then you get linked back to Sona, let's say, with that information. They typically have this sort of return link where you can come back once you've completed, and then the Sona system, let's say, will automatically register that somebody has reached the end of the experiment and therefore should be counted as having completed the study, and get credit that way. Especially for large studies, you don't have to go through manually to approve all the participations if you don't want to.
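The flow Florian describes, reading an id parameter from the incoming link, storing it, and appending it to the return link, can be sketched generically in Python. The URLs and the exact parameter name here are placeholders, not the literal Sona or PCIbex format.

```python
from urllib.parse import urlparse, parse_qs, urlencode

def extract_participant_id(incoming_url):
    """Pull the participant ID out of the link that sent someone here."""
    query = parse_qs(urlparse(incoming_url).query)
    return query.get("id", [None])[0]

def completion_url(base_return_url, participant_id):
    """Build the return link that credits the participant on completion."""
    return base_return_url + "?" + urlencode({"id": participant_id})

pid = extract_participant_id("https://example.org/experiment?id=12345")
print(completion_url("https://example.org/sona-return", pid))
# https://example.org/sona-return?id=12345
```

In PCIbex itself, this happens in JavaScript via the script add-ons Florian mentions; the sketch just shows the underlying idea of carrying the ID through to the return link.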
No, no, no, absolutely not. So all that PCIbex generates is just a web link that people go to. Maybe another worthwhile thing to mention just briefly, and I don't know how you guys handle it, I hadn't heard about FindingFive before, so I'm excited, is that basically all the downloading of stimuli, and this is especially relevant for audio and image files, where a bit more data gets introduced, if you're interested in the timing, you can do before people start the experiment. So they don't have to do anything; automatically, and we usually do it as soon as they start reading the consent document, there's a download in the background. There are different methods of doing it, but you can set it so that trials don't start until all the resources are available. This has two pretty important advantages. One, there is just no experience of lags and so on once the experiment starts for the participant, no matter what their internet speed is, because they only do the experiment once everything is in there. And it also means
that within each trial, the timing accuracy is actually really quite decent. Obviously, we're limited by what people's devices are like in terms of keyboard accuracy, which usually is what, plus or minus 100 milliseconds or something. But we've done a lot of response time studies; let's say, when the effect sizes are big enough, you can easily do this, and the accuracy really is good. And it's not affected by internet speed, because everything gets preloaded. So, a little bit of a tangent beyond what you asked, but they don't have to do anything besides clicking on the link, and then automatically stuff gets downloaded in the background, and then the experiment will run.
Yeah, so that that like script download you mentioned was just on the researcher side.
Yeah, yeah, that's right. Yeah.
So there are a few very small clarification questions. Yeah, so I'll just ask all three, maybe. So: if you have an account on Ibex Farm, do you need to create a new one for PCIbex?
Yes, they're completely separate, and there's no direct transferring either, although everything that works on Ibex should work on PCIbex too. So you can migrate your stuff, but they're completely separate offerings by different people.
Yes. Is PCIbex free?
It is absolutely free. And we intend to keep it that way and
Yeah, right. Okay. And then there's one question that's got a couple of plus-ones, so maybe Noah and Florian will have an impromptu kind of discussion here: could you discuss the sort of pros and cons of these various platforms? This is probably a longer discussion than we have time for, but I don't know if you guys both maybe have some very brief comments, because the person who wrote it says, everybody says, quote, check them out for yourself, but they feel overwhelmed and this choice is sort of taxing. Are there some quick and dirty comments you guys could make about that?
Right, so I guess maybe a good start would be a competition, right? So maybe half the people in the tutorial here can try using PCIbex and half can use FindingFive, and then they could switch, and we can see, you know, whose experience was more successful and who got better, you know, publications and things. Which platform do we post the study on to evaluate performance?
We need randomly assigned participants, though, right?
I see. All right. All right. It's good to have you here. All right. So there are some more questions, but I think those can be answered offline in the Google Doc. We also may have some time at the end of the session to come back, if people are still around and want to discuss this more. But I would like to keep going here, because everyone has these diverse different methods that we're using. For instance, I do neuroimaging and aphasia research, and I hope there are a lot of people as well who are interested in that. So with that, I would like to welcome Brielle Stark. I made sure to use the C, which is important for the authorship. She has a PhD in clinical neuroscience from Cambridge, and she's now an assistant professor at Indiana University, and she's done a lot of work in neuroimaging and aphasia. She is going to talk about a number of datasets and platforms that are online in terms of aphasia and other atypical populations. So with that, please welcome Brie.
William, can you hear me okay?
Yeah, you sound great.
Right, let me share my screen, if it wants to play nicely. Right. So, great. Brian MacWhinney had to step off for another call, but he hopes to jump back in at the end, so hopefully we'll get some insight from him as well. Brian MacWhinney is the person who created TalkBank and who gets all the funding to keep it going. So I hope he jumps back on, because they've created some really cool tools. And as William said, one of the things that I do is figure out how we can make clinical language data a bit bigger, how we can use bigger data to answer some of our core questions. I've always looked at this through the lens of aphasia, as well as typical aging. So this talk will be a little bit more about that, but there are tons of other resources that I have on here too. So I'm hopeful that if you are not just here for aphasia, you'll still find it interesting.
You're showing your Presenter View rather than the slides. Okay, so it was a little bit weird. So just in case. Oh,
no. How about that.
That's much better. It still seems to be kind of strange, a little bit, on my screen; it's kind of cut off on the left. I don't know what that is from, but maybe you can just show... yeah, yeah, that's better. That's good.
I had two monitors plugged in. So it didn't... yeah.
So now I have one monitor. Okay, so just to highlight: talkbank.org is where all of these things currently live, and the goal of that is to bring together a whole bunch of resources from a variety of places and put it all in one place, to help those of us who are interested in big data, specifically language-related data. This is what the website of TalkBank looks like. Hopefully Brian can jump in at the end, because he wants to talk about this new thing that they've just instituted called TalkBank DB, which is essentially a really great way to search the databases without having to go in and manually scroll; you can actually use search terms to figure out what databases to use, as well as some other really cool things. But TalkBank has 14 different research areas that it supports. They're all called banks, and I've highlighted them here. So there are some that are based on conversations, some more dialogue-based: those are CABank, SamtalBank, and then ClassBank. They've got child-specific ones, which you can see there as child language banks; many people are familiar with CHILDES, which is also a part of the system, but there are also PhonBank and HomeBank. Some of those have conversations and some are more monologues. They have multilingual-specific banks, you can see here a bilingual one and a second language one, and then they have clinical banks, which they continue to grow, and which are invaluable resources, because for those of you like me who collect clinical data, it's really quite difficult to find participants who fit your inclusion criteria. So having this available is fantastic. So I'll show you kind of what the things look like in a second. I'll talk a little bit more about AphasiaBank, just because it's kind of the easiest to talk about, since I use it the most.
I do want to mention that even though there are multilingual banks specifically, a lot of these banks, like AphasiaBank, collect demographics that say whether the speaker is monolingual or bilingual. I don't know what proficiency metrics they use, but they do make that type of demographic information available. And AphasiaBank is now available in something like eight or nine different languages: Spanish, Mandarin, Cantonese, etc. English is the biggest at the moment, but there are lots of different ones available, so it's not just English only. And in order to access it, it's all password protected. Each bank comes with what are called ground rules. So all you have to do is read those, and then you email either Brian, whose email is on here, or Davida Fromm, if you're really interested in AphasiaBank; her email is on the AphasiaBank main website, which I'll show you a little screenshot of in a second. Students can join; I make sure that if I have a PhD student or a master's student in my lab who's working on this data, I sponsor them as a faculty member, and then they join as a member if they're interested in using this. And they also have some really great things you can use in class too, if you want to use it as a teaching resource. So this is what the main page of AphasiaBank looks like. Let me see if my little pointer works here. Can you see my laser pointer? Yes. Fantastic. So all of these banks are set up in a very similar way. Right now, my little laser pointer is over the AphasiaBank protocol. If you click on that, it will say exactly what data is collected and what tests are used. So for instance, in AphasiaBank, they have demographics that they collect, and you can see what demographics they're collecting at all these sites. They do a neuropsychological battery, including a naming test, I believe it's the Boston Naming Test, and a really detailed repetition battery as well.
And then they do a spoken discourse protocol, which is all monologue, not dialogue, based on picture descriptions, procedural descriptions, and narratives like story retelling and personal life events, things like that. And for AphasiaBank, all of the spoken discourse stuff is videotaped and already transcribed for you.
For reasons related to PHI, protected health information, I'm not going to show you the whole interface, because it would show an individual's face. So I'll show you in a second what a transcript looks like. But I do want to mention that the neuropsych battery, for instance the naming tests, are not fully transcribed just yet; that might happen in the future, so I've been told. But one way you can do this on the AphasiaBank website, if you're just interested in browsing some of the data that's available, is to hit this browsable database, which will take you to some choices. If you're interested in speakers with aphasia, you would then click the aphasia link, and then it would take you to the languages that you're interested in, right? So if you're interested in English, you continue there. And then it's organized by site. So for instance, Julius Fridriksson from the University of South Carolina collected some data that's now in AphasiaBank, so you'd click the Fridriksson site and it would take you to all of the participants that he collected at South Carolina. There are something like 20 sites at the moment that collect aphasia data, and you can always become one too; just email Brian to join, and you just have to edit your IRB accordingly. They also have things called non-protocol: there are a lot of interesting things, like group therapy sessions, that are recorded, but since they're not part of this giant protocol, they're not transcribed. So you can still search those, and you can transcribe them on your own, but they're not as easily accessible. So, one of the things that I think AphasiaBank is fantastic for is the fact that it's already transcribed and coded for you. It uses CHAT and CLAN, a type of coding and analysis software that Brian and his team have come up with over, I guess, the last 20 years now. I think he published a manual on it in 2000. It's fantastic. It's all free, and it's compatible with almost every operating system.
I think they're working on macOS Catalina right now. And what you get is a transcript that you can download and then manipulate using the CLAN software. So right down here, this is what your CLAN interface window looks like; that's where you tell it the commands that you want. It's a little bit like coding, but it's a very basic thing to learn. The manuals are fantastic; they kind of lead you step by step. And there are some tutorials as well on the AphasiaBank website and TalkBank, called screencasts, that will walk you through these things. And if I can zoom in, and hopefully this works: all of the transcripts are coded with gestures and pauses. They code at the word level as well as at the utterance level. They give you a fantastic amount of information about errors, like paraphasias, morphological errors, and agrammatic or paragrammatic sentences, at the utterance level. They give you a just fantastic amount of data that you can ask it to output later. And the quality and coding of the transcripts are all checked by the team at TalkBank.
Just to give you an example of some of the information you can get: this is called the EVAL tool in CLAN, and it gives you really valuable information like speaking time. You can ask it to do this for all stories. So for instance, if you're interested in the story retelling of the protocol, which happens to be Cinderella, you can ask for information for every speaker from Cinderella, or you can ask for every speaker across all stories. It'll give you things like mean length of utterance, mean length of utterance in words, or you can specify if you want it in morphemes. It'll give you total tokens, things like that, percentage of parts of speech. It gives you an enormous amount of data, and it's a fantastic way to probe language structure. But you could also use it to do language function, right: if you wanted to go in and code things related to story grammar, cohesion, coherence, main concept analysis, core lexicon, all of those, you can use this data to do. They're not coded specifically for those more discourse-level analyses, but they're free for use once you're a member of this team. So then I thought, well, William suggested I talk about how I've used this data, so maybe I can provide some insight or give some examples. This is not official AphasiaBank data, but this is the CHAT/CLAN coding, just to give you an example of how this might be used. So William mentioned that I do have a degree in neuroscience, so I approach everything related to language from the point of view of what's going on in the brain. So what we did, with my colleagues at South Carolina, is that we went in and we coded a whole bunch of picture-description spoken language from individuals who had had a left hemisphere stroke, and we coded a whole bunch of paraphasias, or word-level errors.
And we looked at the brain damage, or the lesions, that were associated with semantically related errors or phonemic errors, for instance. And having the CHAT/CLAN coding — doing our own coding, making sure we were reliable, but using that system — allowed us to get this information really, really easily. AphasiaBank and TalkBank do not come with brain data attached to them. That's a pie-in-the-sky dream that Brian has; maybe someday that will come to fruition. But you are able to probe more deeply into behaviors from a very large data set, which is what I thought I'd give some examples of now: how I've used AphasiaBank. I did publish an article most recently in the American Journal of Speech-Language Pathology, I think it was 2019, which you can see there on the left, and then my colleague Julia and I are currently trying to get this thing on the right published at some point soon — fingers crossed for 2020. So let me give you a little insight into what I was doing there. So in the AJSLP paper, we were trying to figure out, essentially, do speakers with and without aphasia show task-specific differences in linguistic microstructure. So we were really looking at some of these variables you see here, such as mean length of utterance, words per minute, verbs per minute — or verbs, rather — type-token ratio, those types of things: kind of singular measures of discourse that one might be interested in, that tell you something about the quality of the structure of the language, very much at that
structure level, not so much the functional level. But the cool thing we were able to do with AphasiaBank is we had an enormous — for aphasia — sample size: we ended up having 90 individuals with aphasia. We got 84 age-, education- and sex-matched controls, also from AphasiaBank, and we were able to really directly compare how individuals differed on these language outcomes across tasks — so across narrative tasks versus expository tasks versus procedural tasks. What we wanted to do after that is expand a little bit and do a bit more of a sophisticated analysis, which is what Julie and I have been working on for the past year or so. And this is again taking all of the data available in AphasiaBank, which is reaching close to 300 speakers with aphasia at the moment, and over 200 controls of various ages. We wanted to again look at linguistic structure by task. But rather than use singular metrics, like mean length of utterance — right, which everyone kind of disagrees on what that truly represents — why not just compare all of this data in a multivariate way? So what we did is we took all the parts-of-speech data that you get from CLAN, plus tense usage, all these interesting structure variables, and we wanted to model them in a multi-dimensional space and then reduce down our space so we can kind of interpret it. And we wanted to look at the difference between tasks in that linguistic structure. On the left you can see just a comparison of controls, where yellow is a procedural task in which people describe how to make a sandwich. Cinderella and Important Event, there in the middle, are both narrative tasks: one is autobiographical — tell me about your life or an important event — and one is a story retelling, which is the Cinderella story. And then you have expository tasks: one is called Cat Rescue, which is a picture of a cat being rescued.
And then Window, or Broken Window, which is a sequence of pictures that people are asked to describe. We wanted to see if there was a difference by task, and you see it in the linguistic variables of interest. You see it for controls, and you also see it for speakers with aphasia. It's probably easier to look at this bottom graph here, which is just kind of narrowing down by aphasia type, if you're interested in that, and seeing that there's some task similarity in terms of linguistic structure that breaks down there. We actually go into more detail and we look at severity as well — people with aphasia of various severities. So that's something we're also interested in looking at. So that is how I have used AphasiaBank. Many, many publications are out there using AphasiaBank; many of them are hosted on the AphasiaBank website, so you can get a great idea of all of the interesting ways to use this. But I didn't want to just give you things about AphasiaBank; I wanted to leave you with some other resources as well. So you want to screenshot this — this is a great one. I'm sure this will be recorded for later. But OSCAAR is a fantastic resource. It's hosted, I think, at Northwestern; lots of speaker data available there. And there are some resources for data collection, like iPhOD, which is English words and pseudowords. They've also got things like CLiCS, which I believe is cross-linguistic, with others at the bottom. But then there have also been some really fantastic things available in conversation. My personal favorite is the Carolinas Conversations Collection. They have two cohorts of adults here: one cohort kind of talks about their chronic disease, another cohort has cognitive impairment. They rotate who they are speaking with.
So if it's a student versus a known peer. And they also have longitudinal data on some of those speakers, which is really unique; that's not typical. DementiaBank, which is another part of TalkBank — if you go to the Pitt database, it's one of those kind of by-site organizations within DementiaBank — has a lot of longitudinal data on a picture description, a sentence generation task, and I think word fluency as well, for individuals who have various types of neurocognitive disorders, including mild cognitive impairment and Alzheimer's disease, and then I think they have a few with primary progressive aphasia as well. And Johns Hopkins also has a database, thanks to Argye Hillis, which has some primary progressive aphasia, also included on DementiaBank, if you're interested. There's a really great blog I just left here on the bottom. It's a WordPress site. Hopefully it lives there forever, but if it doesn't, go copy it to a Word document somewhere. It has fantastic resources for open-source language — either data like AphasiaBank, or things to use in your own data collection, so you can create your own things. And just a little plug before William makes me stop and answer questions: if you're interested in anything related to spoken
language and aphasia specifically, please join this working group, which is kind of trying to improve the evidence and work together to make these databases even bigger. So that's my little plug. And I guess I'll take questions, unless Brian has joined back. Brian?
Sure. Yeah, I'm here. Yes — turning my video on. All right. Great. So I don't have to say all that.
Obviously, you're the man, the leader.
So you know, once you get to be 74, we're very happy to have these young people who are going to take over pretty soon, I hope. All these types of data — I think eventually we're going to have so many databases like this that we're going to have lots of different input, not just aphasia; a lot of it is child language. I just want to add a few things. So, first off, for the non-clinical data there's no password required, so CHILDES and the other TalkBank databases are totally open. Also, your work obviously depended a lot on the fact that the data are tagged morphologically. You didn't show that in the particular ones you showed, but all of AphasiaBank — all of the English, also Spanish, German, French — is automatically tagged by part of speech and then by grammatical dependency. I think we have 12 languages we do that way. So that's an important thing. In terms of DementiaBank, I think it's very interesting to note that there is now a challenge at Interspeech for the best computer program that will be able to differentiate mild cognitive impairment from normal and from full-on, real dementia. And we have something like 150 computer science labs around the world that are basically using these data to make the most wonderful algorithms they possibly can — so that's a very speech-technology type of world. There are also speech technology things for aphasia that some people are getting into, particularly for apraxia. And then within the child language area, there's a project called PhonBank, which really looks at young children's phonological productions, and it has a program called Phon that has all of Praat integrated within it. So you can run Praat inside Phon and do all these fantastic things — pitch extraction, jitter, vocal fry, or whatever it is you want, or IPA, or whatever you want to do in phonology.
So I think those are important things. Let's see. One thing is — I don't know if I can share my screen here. I can't share my screen, okay. But on the main site, there's a link to a tutorials page that I think would be very helpful to people. The tutorials are all screencasts for all these different facilities inside the programs.
Yeah. And, you know, not to push too hard, but there are two other things. One is that we really do want to move more and more to linking to the brain, just like you're doing. And right now, I think the best approach would be to go to OpenNeuro. We're working with a project over at Pitt, with Julie Fiez, that will be taking AphasiaBank data and also will get scans — we're mostly talking structural here, where, you know, it might be somewhat functional, but mostly structural, including white matter. And I think that is very important. So anyone who has ideas on how we should best do that — my idea would be that we could keep all the language data on TalkBank, but then link to OpenNeuro for a specific data set. So, you know, I think hopefully that would work, but we haven't really done that yet. And then the other is the TalkBank database, TalkBankDB. There's a link from the TalkBank page to TalkBankDB. Right now it only goes to child language data, but really, by the end of the week — we already have pulled in all the other databases — AphasiaBank and DementiaBank will all be searchable in that system. You basically specify what data you want, whether you want types or tokens; you can use CQL, which is corpus query language, to look for sequences of grammatical structures. And then you can get all the matches to those queries in a downloadable CSV, which you could then pull into R, or into Excel, or whatever. We also have a package inside R where you can pull directly using an R query. So that, I think, is going to be a really important thing. We haven't got everything out there on the web yet, so I would say try it next week.
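As an aside on the CSV export Brian describes: once downloaded, a match file can be tallied with a few lines of Python instead of R or Excel. The column names below ("corpus", "speaker", "utterance") are assumptions for illustration only; check the headers of a real TalkBankDB download before reusing this.

```python
import csv
import io
from collections import Counter

# Stand-in for a downloaded TalkBankDB query export (column names assumed).
sample_csv = io.StringIO(
    "corpus,speaker,utterance\n"
    "AphasiaBank,PAR,the girl went to the ball\n"
    "AphasiaBank,PAR,she lost a slipper\n"
    "AphasiaBank,INV,tell me more\n"
)

# Count how many query matches each speaker tier contributed.
matches_per_speaker = Counter(row["speaker"] for row in csv.DictReader(sample_csv))
print(matches_per_speaker)  # Counter({'PAR': 2, 'INV': 1})
```

With a real export you would replace the `StringIO` stand-in with `open("matches.csv", newline="")`.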
Okay, so that's enough to add. I think there are a few questions you can answer better than me.
Oh, I was just gonna read off of your questions. Are there any banks for signed languages or...?
No, sign language is a really tough one. The problem is not the nature of the sign; the problem is that the privacy concerns are just extreme. It's been really difficult. There are some banks, but no — yeah, it's really a problem.
Can students use it if they do not have a faculty sponsor?
First of all, the open databases are open. And, you know, it's really easy to get access. I know it seems like a barrier, but it's really easy. If someone doesn't have a faculty sponsor, I mean, I guess I'll sponsor them, you know. Mostly the reason we want the faculty sponsors is this: we think students have much more energy than their advisors, but we want to get those advisors somewhat involved too, to make them aware of, you know, sort of the history of the whole thing. Yeah.
Yeah. There was a question that kind of disappeared, but just how consistent codes are across lab sites? If you want to speak to that, I can speak to that about aphasia bank.
Well, I mean, the coding system is totally standardized; everything is in CHAT. So that's nothing — there are no other databases of spoken language like that, where everything is in the same transcript format. Yeah. So we try to keep it consistent across a lot of sites. You know, reliability is done at the lab, for the most part — reliability coding and whatnot. Yeah. I mean, obviously, when you talk about coding, there's a difference between transcription and annotation and coding. So I think the transcription is really, really tightly specified by CHAT, but then people want to code additional things. You know, that's project-specific, and there are methods inside the CLAN programs for tracking that. But we don't have a discipline-wide standard; everybody needs their own codes. Yeah.
I mean, we kind of make up a few of our own, based on what we're interested in.
Right, but those are add-ons, as long as you have a basic transcript.
Yeah, right. Exactly. Yeah.
But I would say — and to be clear — we really prefer to have transcripts linked to media. So all the older data don't have that, but we've really been pushing to make it so that the media is there, either audio or video, and a lot of it is video.
So yeah, I didn't show it, just for PHI reasons, but yeah, when you go to browse the database, everything is linked — the transcript is actually linked to the video, which I think is fantastic. So you can go and click anywhere on the transcript, and it goes automatically to that point in the video and you hear that person, which I think is a really great thing that TalkBank does.
That's also true for TalkBankDB. So once you get this output, it still has the link from every utterance that matched back to the transcript, so you can just go right there. Yeah.
Any other questions?
Sorry, my dog is barking just ignore.
Bree got most of those questions in the Google Doc, but there was one: are there any examples of children with TBI or aphasia?
Children don't get aphasia — there's no such thing. Children with TBI — do we have a lot of TBI? No, we don't, thank God. There's very little money. But we do have some young people with TBI, down to, I believe, 12. But really, TBI — typically motorcycle accidents — starts around 30. Yeah.
Okay, so thanks, guys. I'm sure we have plenty more to discuss, but we'd better leave it at that. And I really appreciate you both doing this. I'm going to try what we did in the last session: so I will very soon unmute everybody, and then hopefully we can clap to thank these people who jumped in at the last minute to present this awesome stuff. So okay, get ready, guys. Okay: 3, 2, 1.
Okay, looks like my dog was not as upset about this as the last time, so that's good — he's learning. Now I'd like to welcome Josh Faskowitz. He's a PhD student in neuroscience and psychology at Indiana University, and he does work in the computational cognitive neuroscience lab. And he's got a lot of experience in using completely publicly accessible online neuroimaging datasets. And again, as we mentioned, online data collection is good, but just getting data that's already been collected is better — so this is something that you can do, modulo all sorts of other concerns. But anyways, I'd just like to welcome Josh, and if you can take it away, that would be great.
Cool, yeah. Hey, I'm Josh — Josh Faskowitz. I'm a student here in Indiana. And I'd just like to thank you guys for the opportunity to share my knowledge, share all this data that I've looked at over my years in grad school. So let me set up the screen share thing here — share my screen, share. Okay, oh, this, and then let me try to
Can I get present?
Okay, cool. So, um, yeah, I kind of just made a whirlwind tour of some freely available data across the web. Just some background about me: I do brain networks. I've looked at structural brain networks, functional brain networks; we've merged structural and functional brain networks. And over the course of my studies — there's so much free data online — fortunately, I haven't had to collect a lot of data; free data makes up a large chunk of what I've done. So these are examples of some of the data that I've used, and I'll describe some of this stuff. I guess I would consider myself a research parasite. This is a phrase that has come up recently about people who use other people's data and make new hypotheses with it — and we totally depend on other people collecting the data. It came up as a negative thing to begin with, I think in a New England Journal of Medicine commentary, but we've since owned it, our community of research parasites. So yeah, I'm very appreciative of everyone collecting data, MRI data specifically. I'm not affiliated with any of these resources that I'm about to show you; I'm just going to give you the user-level experience and some practical experience with this data. So one of the places that has a ton of data is figshare — you can google it pretty easily, figshare.com. Sometimes, if I'm bored, I just google neuroimaging dataset key terms. And also, when you're publishing your own study — an MRI study — and they ask you to share data, this is a great resource to put data. They give you a pretty large amount; I don't know exactly the number of gigabytes, but they allow you to put a lot of data here. And there's been some stuff written about open data by Russ Poldrack, one of the big proponents of data sharing in MRI. So we're going to go through this pyramid a little bit.
So the top of the pyramid — in terms of what he calls the potential for reuse — is results, someone's analysis. Neurosynth is a super cool website; I'm hoping I can actually preview some of this stuff. So if I click here — yeah, I'm still screen sharing. Neurosynth is a great website if you want, let's say, an automated meta-analysis of results across neuroimaging studies. Neurosynth has collated coordinates in a machine-learning manner, and they have these maps of activation for you. So here, I just clicked the link for "language" — we can use any search term here — and you can actually download this map for your own purposes. So, I mean, if you're interested in the default mode network, for example, there's a meta-analytic map compiled for you, and you can download it here. So it's a resource, for free. BrainMap is a similar one; it has coordinates, so you can extract the coordinates from this one. NeuroVault is a place where you can download and upload unthresholded statistical maps from your studies. So it's kind of like Neurosynth, but this is not just the
results of the study — sorry, not just the significant clusters, let's say, but the whole unthresholded map. Okay. So if you're a neuroimaging researcher and you just want data, the Human Connectome Project is like a one-stop shop to get a bunch of it. So, the Human Connectome Project: the main study was over 1,000 subjects on a 3T scanner at Wash U in St. Louis, and they've since collected around 180 more subjects up in Minnesota with a 7T scanner. And they really went extensive with their imaging protocol. This part of the huge project — the 1,000 subjects — wrapped up. In terms of age, they're pretty young; I think it's around 25 to 35. There's a twin structure here. It's a really big data set — on the level of terabytes for the raw data. To get access to this data set, you have to sign up, so it's not point-and-click where you get the data automatically; you have to do a little waiver, but it's pretty easy to do. I signed up a while ago and it took a week or two — and I had to be under a PI, of course. But once you have access, you have access to pretty high resolution T1-weighted and T2-weighted images, at 0.7 millimeter isotropic voxels. The resting state is sampled at a TR of 0.7 seconds, and that's for an hour — they have four 15-minute sessions. So that's pretty cool if you're into functional network reconstruction, like I am. They also put people through a number of tasks. The tasks aren't so deep, I would say — I mean, they were kind of just doing a wide survey of tasks, in my opinion: working memory, gambling, motor, language. But still, like the resting-state functional scans, it's pretty highly sampled, so it's pretty good quality data. And then diffusion imaging, so you can run your favorite tractography with these data.
It's a pretty good diffusion acquisition. One of the coolest parts about this project is that they preprocessed the data for you with their minimal preprocessing pipelines. So you don't have to worry about it — they did a lot of technical advancement, and you can just download their preprocessed data. They've already gotten the movement parameters for you, let's say, and they've already normalized it to MNI space. They actually even created their own format, called CIFTI, which has both the surface data, projected to a cortical surface in a standard space, and the subcortical structures. And let's say you're not even interested in all the subject-level data: they actually have preprocessed group-average Cohen's d activation maps for their tasks. So if you just want to look at a map of activation and kind of do your work from home during this quarantine stage, you can also download that. That's pretty useful — I did that in this paper, for example. And the cool thing — HCP is so pervasive, I mean, a lot of us have probably heard of it — is that a lot of people have released their versions of the data. So here are some examples: I've included links to where people have processed 1,000 structural brain networks, for example, which you can download; you can download the 7T tractography data and time series at the bottom here. So there are a lot of options now. So HCP — you have to sign a waiver. INDI, the International Neuroimaging Data-sharing Initiative, is just point-and-click for most of the data: you just go to this link right here, and you can get access to a lot of different studies. I'd say that this is, in my opinion, one of the older data-sharing initiatives in MRI. So there are a lot of different studies — I would estimate over 30 or 50.
And for most of it, you just go and click the data and you start downloading to your computer — it'll take a while. For some of the datasets you need to register with NITRC, which is pretty easy; it's a neuroimaging informatics resource, and you just have to sign up there, just so they have your email. But here's a view of some of the samples that they have. I mean, I could just click, but you can see that they tell you: there are 200 scans here, 28 scans here, 25 scans here. There's a lot, so I don't think anyone is going to struggle for general data — again, it's not so specific, but if you just want to play around with some data processing, INDI is a great resource to look at. I have some highlights from INDI that I like. So NKI Enhanced is a particularly good data set, in my opinion. It's a lifespan data set, so it has data from childhood into older age, and they also have deep phenotypic information — for the deep information you will have to apply for a waiver, to get all the tasks and the physical measurements, but once you get that, it gets pretty deep. I didn't mention this for the HCP: it also has pretty deep phenotypic information. ABIDE is a large autism data set — 1,000 combined autism and typical controls, mostly resting-state data. There's the ADHD-200. SLIM is a longitudinal data set from China. MPI is another data set — the Mind-Brain-Body data set from Leipzig. This also has pretty deep phenotypic information, and there's a Scientific Data article. ATLAS is an openly available data set with lesion segmentations from USC — the Southern California USC. So that's cool to check out: the lesion segmentation is already done, if you want to test your new lesion segmentation software. How about some clinical data? Well, there's ADNI and PPMI — Alzheimer's and Parkinson's data sets.
They're huge — over 1,000 and over 500 subjects each. These you do have to go through a bit of a lengthy application for, but they're out there. PREVENT-AD recently put out a preprint — I don't really know much about it, but it's available. And ABIDE II — they even have more autism data. Movie data: Cam-CAN is over 600 people, collected at Cambridge, across the lifespan, and they have resting and movie fMRI and a good amount of behavioral data. This one you have to apply for as well, but it was pretty easy — signing the form didn't take much, but it took them a while to respond. StudyForrest is around 20 subjects, but they watched Forrest Gump in the scanner, so that's pretty cool if you want that. And for this data set in particular, they did a great job of annotating the Forrest Gump movie — I think a group from Germany presented this — so they have all these annotations, in German, of what was shown at each point in time. And then Jim Haxby at Dartmouth has openly available fMRI data from people watching Raiders of the Lost Ark. The Healthy Brain Network Serial Scanning Initiative also has 10 people who watched Raiders of the Lost Ark, plus 10 sessions of resting state, and also watching movie trailers. The Healthy Brain Network — the serial scanning one was like a test phase for this larger study — has a lot of children in it, and they watched a movie; I think they watched Cloudy with a Chance of Meatballs. I don't know much about the EEG, but they have that available here too. This one, I'm not sure if it's a point-and-click download or if it's a waiver. And then: deep data on single subjects.
So MyConnectome is Russ Poldrack's dataset, where he scanned himself 60 times within a year, and he actually has serum data from his blood. And he has some tasks, resting state — so it's one person, but really densely sampled, if you want that. And they actually have data preprocessed through fMRIPrep available for this one. SIMON is a guy who got scanned at many different locations, around Canada, I think.
This is Todd Constable's dataset, where he got scanned for 30 sessions within 10 months, and he watched a couple of movies and did resting state. That's a direct download. The Midnight Scan Club — this is a data set that has been used a lot recently: 10 people, 10 sessions, and it's pretty high quality data. The Gratton lab actually has, at this link in the slide, data available to download — like time series already available for this data set. So that's helpful too, if you don't even want to do the preprocessing. And then we even have non-human data: monkey data available from the group in New York. And then the Allen Brain Atlas — that's not MRI; well, there's some of that — but the Allen Brain Atlas is a great resource for mouse brain data. OpenNeuro: so, this is all pretty overwhelming, and OpenNeuro might be a little bit more overwhelming — I mean overwhelming in a good way. OpenNeuro is a place where people deposit their datasets, and they have a philosophy of storing the data in the BIDS format, meaning that it's well ordered for you. You can click a dataset here — I just clicked this random data set, and it takes a while to load, but you'll see that these data are available and can be browsed. So here's the study; usually there's a readme, some acknowledgments — if you reuse the data, they ask you to cite it here. And then on the right side here, clicking through each subject, you can just see the data that they have — anatomical data, functional data — and you can download these in bulk, too. So it's really helpful: you can browse this website and see what data people have put up. I just have some pictures from it. And finally — so this is an initiative from the Pestilli lab here at IU.
I can't get too in-depth into it, because it does a lot, but for this platform, brainlife, I just recommend going to the link and looking at what they have to say there. But basically, they have the compute set up for you: you just click the kind of processing — they do FreeSurfer, they do fMRIPrep — at this website. So it's pretty cool; I'd recommend checking out the resources. And then finally, since I'm a network guy, if you go to my GitHub, I have compiled a whole page of openly available network data for you to play around with. So this is just my GitHub — I have non-human animals, I have human animals, network data, so that's available. And then finally, if you want some help, go to Neurostars — I really like this resource. So that's it for me. You can hit me up on Twitter, or I'm going to share this PowerPoint with the links in it. But in this time where we're all at home, there's definitely a lot of data to play around with. We're not alone here; we have a good community of very generous people, and I thank them, as a data parasite myself.
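To make the BIDS idea from Josh's tour concrete: a BIDS dataset is essentially a predictable directory layout, so even the Python standard library can browse one. The miniature tree built here is invented for illustration; real OpenNeuro datasets add JSON sidecars, a participants table, and many more filename entities.

```python
import tempfile
from pathlib import Path

# Build a toy two-subject BIDS-style tree (sub-*/anat and sub-*/func).
root = Path(tempfile.mkdtemp())
for sub in ("sub-01", "sub-02"):
    (root / sub / "anat").mkdir(parents=True)
    (root / sub / "func").mkdir()
    (root / sub / "anat" / f"{sub}_T1w.nii.gz").touch()
    (root / sub / "func" / f"{sub}_task-rest_bold.nii.gz").touch()

# Because the layout is predictable, discovery is just globbing.
subjects = sorted(p.name for p in root.glob("sub-*"))
print(subjects)  # ['sub-01', 'sub-02']
print(sorted(p.name for p in root.rglob("*_bold.nii.gz")))
```

This predictability is exactly why tools like fMRIPrep can run on any BIDS dataset without per-study configuration.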
Thanks, Josh. That's fantastic. And I just want to particularly thank Josh for doing this, because I think, literally, he was asked maybe last night at five or 6 p.m. Central to put this together. So thank you for doing that at the last minute; that fit perfectly. So as Josh said, he'll send the PowerPoint, so we can upload that and you can access all those links and resources. So I'll just ask a couple of questions, if you don't mind, from the Google Doc. The first one is asking for the resources — so yeah, we'll make sure that those get linked. And then the second question is more general. This person said: I'm not familiar with using big data or data sharing from other labs; from your experience, how is publishing with it? Do reviewers criticize it? Is it a pretty welcoming literature?
Sorry — publishing the...?
Yeah. So, you know, if you were to compare publishing a controlled experiment versus data that you got from somebody else, how does that sit with reviewers and journals?
Yeah, yeah. That's a great question.
In terms of — yeah. Since I'm personally not the one collecting the data, I actually do feel compelled to share the derivatives of my studies. For example, I published on that NKI dataset, and I actually uploaded the networks that I generated, so that I'm doing my part too in sharing the data. There is some trickiness, though: in any of these situations, you should consult the documentation from the original data source and make sure you can share derivatives. For the NKI, it's one of these datasets where you can kind of point and click, so it's a pretty open data-sharing
agreement that they have, let's say. But it would be a case-by-case basis with other data sets — for example, ADNI, which is the clinical Alzheimer's data; you would have to check their documentation to make sure you can share the derivatives. But other than that, yeah — I mean, you didn't collect the data, but you're still using it. Obviously, you shouldn't share the raw data on someone else's behalf, but I would say it's a good gesture to share what you've done with the data that's unique on your end.
I think you're muted.
Sorry about that. Yeah. So I think that another part of that question is really just, you know, is it harder to publish using borrowed or shared data than your own data?
Mm hmm. This is just my perspective as a student, but I think it's harder to generate hypotheses, in the sense that you have the restriction of using data already collected, so you might not be able to tailor the data to answer the exact question that you want. In that regard, maybe it could be harder. However, from my perspective as a grad student, I don't think it should be looked down upon by a journal editor, say, or that they'd be reluctant to publish it just because the data isn't your own. I haven't heard any stories about that, though it's hard to know; I'm not an editor or anything.
Right, right. I guess we should all make sure to browbeat people, you know, to make sure that they're okay with doing this sort of thing, because especially now, it's not like we have options anyway. But data are also really expensive and can be difficult to collect. Another question that just came in from the chat was, "I'd like to know to what extent the data are raw data." Right, so I guess the question is really about: if data have already been processed, how hard is that to work with?
Yeah, well, most of the data that I shared here are in NIfTI format; there's no DICOM, so it's a little bit above raw. HCP, for example, is preprocessed. They do have raw data, but I wouldn't actually recommend downloading the raw, because it's terabytes of data. OpenNeuro, that huge website with all the studies (I'd say they have over a hundred studies on there), is NIfTI-level data, and that's not preprocessed. But whenever you're working in this situation, you should do the legwork to make sure the data are what you think they are. I say this because on some individual OpenNeuro datasets, they've skull-stripped the T1, for example, and you wouldn't want to apply SPM normalization to a certain atlas without knowing whether the image was skull-on or skull-off. Those are the kinds of things you have to check against the documentation. All of these resources should be well documented, and from my perspective, if a resource is not well documented, it's probably not worth using, because then you don't know, for example, whether they've already nuisance-regressed the data, and you don't want to redo that. It's on a case-by-case basis, obviously, but generally the resources I have here are well documented; they'll tell you what's raw and what's preprocessed. Most of it is NIfTI format, unprocessed.
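(Not part of the talk, but a minimal sketch of the kind of sanity check Josh describes: before normalizing a downloaded T1, you can heuristically flag whether it has already been skull-stripped, since stripped images have a zeroed background. This assumes the volume is already loaded as a 3D NumPy array, e.g. via nibabel's `img.get_fdata()`; the border heuristic and the 99% threshold are illustrative assumptions, not an official check.)

```python
import numpy as np

def looks_skull_stripped(volume, background_thresh=0.01):
    """Heuristic check: in a skull-stripped T1, voxels on the outer
    faces of the volume are almost all (near-)zero, because everything
    outside the brain mask has been zeroed out.
    `volume`: 3D intensity array (e.g. from nibabel img.get_fdata()).
    Returns True if the border looks empty (likely stripped)."""
    vol = np.asarray(volume, dtype=float)
    cutoff = background_thresh * vol.max()
    # Collect the six outer faces of the 3D array.
    faces = [vol[0, :, :], vol[-1, :, :],
             vol[:, 0, :], vol[:, -1, :],
             vol[:, :, 0], vol[:, :, -1]]
    border = np.concatenate([f.ravel() for f in faces])
    # If >99% of border voxels are background-level, assume stripped.
    return bool(np.mean(border <= cutoff) > 0.99)

# Tiny synthetic example: a bright "brain" blob surrounded by zeros.
vol = np.zeros((32, 32, 32))
vol[8:24, 8:24, 8:24] = 100.0            # stripped: empty border
print(looks_skull_stripped(vol))          # True
print(looks_skull_stripped(vol + 50.0))   # nonzero border: False
```

A check like this complements, rather than replaces, reading the dataset's documentation; if the two disagree, trust neither and inspect the images by eye.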
Cool. So another quick question: is there a convention for citing data? Do you just cite the paper, or is there something else you have to do?
Yeah, that's a fantastic question. OpenNeuro, again, I keep coming back to this: they ask that you cite OpenNeuro, and I think they require everyone, if you're going to post your data on OpenNeuro, to accept their Creative Commons license.
You did last time, right? And that's what they said; they make sure that it's going to be Creative Commons.
Yeah. But ADNI, for example: when you sign up for ADNI, the Alzheimer's Disease Neuroimaging Initiative, you actually, in some cases, have to add them as a last author. So that's a totally different level of acknowledgment.
Um, that's a pretty big jump there.
Yeah, that's a pretty darn big jump. But I would say, if there's no explicit agreement, if it's really open, I think it's still a good gesture to put it in the acknowledgments when you're publishing, just so that your readers are aware, and so that the data providers at least get some sort of metric: when people search for the dataset on Google or PubMed or something like that, it shows up. There's no standard, though. I think people are pushing for standardization of assigning credit for datasets, but I don't think there's consensus. So I would say the best practice is, when in doubt, put it in the acknowledgments at minimum.
Yeah, that seems like a great option. So another question here: this may be a little off topic, but has using shared data opened up collaborations with other researchers or sites? Or do you typically work independently: once the data are available, do you just download them, or do you work more actively with those researchers?
It looks like either I'm frozen or Josh is frozen right now; I'm not sure which it is.
I'm just asking if people can hear me. Okay, so it looks like Josh is frozen.
So, is Josh frozen? Please indicate.
Okay, all right. Well, Josh may or may not be able to hear us, but this is being recorded, so he'll be able to go back to that Google Doc and answer your questions, hopefully. At any rate, whether Josh can hear us now or will come back and view this in his own time, I'd like to do the applause again. I'll do the same thing: I'll count down, and then please give your applause to Josh for putting this together at the last minute and helping us out. So, three, two, one!
Awesome. Okay. So we're at two o'clock, but I would like to stick around, if people are okay with that, for just a few more minutes. Basically, the idea was: we've talked about a lot of interesting things, publicly available datasets, methods for collecting data online. But maybe the easiest way to go about this is, someone published a paper, you read that paper and realize, "I could probably use that data to do X," and you can just ask them for the data. I think it's often the case that people are willing to share. So I just wanted to bring Bree back, and if there's anyone else who has a lot of experience with this, please indicate in the chat if you're interested in talking about it. I thought Bree would be good because she has a lot of experience in asking people for data. So again, like I said, if you are interested in chiming in here, please just go ahead and
indicate that in the chat. Actually, it looks like Josh has thawed, so I might try to bring him back.
Just in case. Yes. Hey,
I am sorry about that.
Josh, maybe you want to chime in on this topic too; I'm not sure if you have experience with directly asking people for data. But at any rate, I think this might be a good place to get a lot of questions. From my own experience in graduate school, I probably could have done this, but I was very shy and did not want to just ask somebody; it felt too brazen, so I didn't do it. In retrospect, though, I think it wouldn't have been a big deal. I don't know if other people have had that sort of experience. If you have questions or concerns about this, please go to the Google Doc and type your questions, or add comments if you want. And again, like I said, if you have some experience, please chime in here. But maybe, Bree, could you comment on this a little bit?
Yeah, I think this is really tricky when you work with clinical data like I do. A lot of the time it's still kind of gray territory whether or not lesioned brains constitute protected health information, so it depends where you are. I know in the UK (I did my PhD at Cambridge, so I was based in the UK), their ethics basically state that you can't share any brain that's not, quote, typical, because it does contain PHI data. Here in the US, IRBs vary depending on where you are: they're more stringent at a hospital or medical university than at a non-medical university. But something that has typically worked for me is just asking for standardized lesion masks, that is, masks that are already standardized to something like MNI space, which you can then do some analyses on, and most people, or many people, are free to do that. But as someone who collects clinical data, I know it takes a lot of money and time to get this information.
And it is difficult to share. But I think it's always worth an ask. My caveat is to let people know why you want it and exactly what analysis you want to do, and in that first email to say, "Let's work out authorship criteria," because I think that puts forward: I respect that you collected this data, and I want to give you whatever you think you need in terms of credit. That has typically worked out the best for me.
Right, and that's a great point you make. I think if you are offering authorship, pretty much everyone is going to want another paper on their CV and citations of their pre-existing paper, so I don't think this is something people will refuse. I mean, I can imagine situations, and I have known some people who are more protective of data, but once they've already published a paper on it, for example, I think that opens things up and makes people a lot more flexible.
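(A hypothetical aside, not code from the workshop: one common thing to do with a shared lesion mask that has already been standardized to a template like MNI space is to compute its overlap with a region of interest, the "lesion load" on that region. The function and the synthetic masks below are illustrative assumptions; real masks would be loaded from NIfTI files and must share the same space and voxel grid.)

```python
import numpy as np

def lesion_roi_overlap(lesion_mask, roi_mask):
    """Given two binary 3D masks in the same standard space (e.g. both
    resampled to the same MNI grid), return the fraction of the ROI
    covered by the lesion (the lesion load for that ROI)."""
    lesion = np.asarray(lesion_mask, dtype=bool)
    roi = np.asarray(roi_mask, dtype=bool)
    if lesion.shape != roi.shape:
        raise ValueError("masks must be on the same grid/shape")
    overlap = np.logical_and(lesion, roi).sum()
    return overlap / roi.sum()

# Tiny synthetic example on an 8x8x8 grid.
lesion = np.zeros((8, 8, 8), dtype=bool)
roi = np.zeros((8, 8, 8), dtype=bool)
lesion[0:4, :, :] = True   # lesion occupies one half of the volume
roi[2:6, :, :] = True      # ROI straddles the lesion boundary
print(lesion_roi_overlap(lesion, roi))  # 0.5
```

Because this only ever touches the standardized binary mask, not the patient's raw scan, it is exactly the kind of derivative analysis that is easier to run on shared clinical data.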
Yeah, I'm still collecting some more thoughts on this. Does anyone else want to chime in here? Bree, go ahead if you want, or if someone else has a question, or, like I said, if anyone has experience with this, please feel free to add something. So, yeah, someone asked a question on the Google Doc with regard to authorship: would the person who collected the data go last, or just somewhere in the middle?
For me, it kind of depends on the person's preference. In neuroscience, typically, the most senior author is either first or last. So a lot of times, people sharing data are fine with being in the middle, because they collected the data, and according to the APA agreement, data collection alone doesn't necessarily constitute senior authorship. It kind of depends on what authorship criteria you're honoring, but I try to be very forthright with that and say, you know, this is what we typically follow in terms of APA.
Do you want to talk about it? But typically I see data sharers, at least in neuro, falling in the middle. Josh, I don't know if you have had a different experience. I know you said ADNI likes to be senior author,
but I don't know if you've had any other different experience with that.
Yeah, so here at IU I've actually helped preprocess data for other labs. As a data preprocessor, where I'd take the raw data and preprocess it, I've been a middle author for that contribution.
But I've never been on the data-collector side of that. The only thing I can think of is at USC in California: Paul Thompson's imaging genetics center runs the ENIGMA project, and they collect thousands of subjects, so they have a 200-person author list. Everyone who collected data and did the basic preprocessing to aggregate the data is in that middle block of 200, and that middle block is alphabetized. Then there's a block of maybe three senior authors at the end, and that last block is not alphabetized. And the first block, the people who wrote the first draft, is a first-author block. That's my only experience with this. In terms of that middle block, it's just a bunch of data providers, though I hesitate to say "just": they're providing an important part of the ENIGMA project in those instances.
Right, right. And I think an interesting point to bring up here is that if your research is federally funded, there is in some sense an obligation to disseminate your research and your data.
Right. So I think this is something for us to think about. It's something I've never really done; I've never publicly posted data. But at the same time, you know, is there some obligation that I have, because my research is NIH funded, to make sure the data are available, so that the maximum gain can be had from the data that were collected? So that's, I think, an interesting point. Does anyone else have any questions, comments, concerns? Yeah, I guess not.
So, if that's it, then I just really want to thank everyone who participated and is still here. Bree, thank you for just responding to my email yesterday; Josh, like I said, for coming online; and Noah from earlier, that was really an amazing presentation; and everyone else who took part as well. We're going to make sure these recordings are posted. I'm not sure exactly when and how that's going to be done, but I'll make sure to disseminate that information. It may be the case that I post all these recordings on the CUNY conference website, maybe along with the recordings from the conference itself; that might be a good solution. But however you found out about today, that's how you'll find out about the posted videos. Please feel free to share those. I think that's pretty much it.
I don't know how to end these things. But anyway, I'm just going to say goodbye, good night, good luck, live long and prosper. And, you know, like I said, feel free to ask us for data. I know that I'm willing to share anything that I've published; if I can still find the data, I'm perfectly happy to share it with you as well.
So, all right, thanks, guys. Goodbye. And thanks to the CUNY organizers for making this happen, really, because if they hadn't followed through and actually held CUNY, then this workshop would not have happened either. Okay. Bye. I guess I'll stop the recording.