Ep. 31, The Social Science of Invented Languages with David J. Peterson & Jessie Sams
1:40PM Nov 14, 2022
Speakers:
Dr. Ian Anson
Campus Connections
Alex Andrews
David J. Peterson
Dr. Jessie Sams
Keywords:
language
created
sound
form
writing
word
consonant
noun
vowel
english
umbc
people
means
kezhwa
character
esperanto
called
dog
human
type
Hello and welcome to Retrieving the Social Sciences, a production of the Center for Social Science Scholarship. I'm your host, Ian Anson, Associate Professor of Political Science here at UMBC. On today's show, as always, we'll be hearing from UMBC faculty, students, visiting speakers, and community partners about the social science research they've been performing in recent times. Qualitative, quantitative, applied, empirical, normative. On Retrieving the Social Sciences we bring the best of UMBC's social science community to you.
[Attempts the introduction in Esperanto.] Sorry, let me try this again. [Attempts the introduction in Klingon.] Alright, that's not it either. Okay, let me give this one more try. [Attempts the introduction in Dothraki.] Okay, let me try something. Hello, and welcome to Retrieving the Social Sciences. All right, that's better. Today we're delving into a fascinating topic that has captivated minds since time immemorial. We're thinking today about the social science of language creation. And how better to start this discussion than by accidentally introducing our podcast in three different, completely made-up languages? You know, I might be a little bit rusty, but those were approximations of Esperanto, Klingon, and Dothraki, from the worlds of 1880s Poland, the Star Trek universe, and HBO's Game of Thrones, respectively. Incredibly, these three languages were all created on purpose by individuals. But how does one make up a language from whole cloth? As it turns out, the social science of language invention is really complicated. But in our recent Social Sciences Forum lecture, co-sponsored by UMBC's Modern Languages, Linguistics, and Intercultural Communication department, the Department of Media and Communication Studies, and CS3, we got to take a close look at this fascinating process. That's because we recently invited two experts in made-up languages to campus. David J. Peterson holds a master's degree in linguistics from UC San Diego and is a co-founder and original board member of the Language Creation Society. He also served as its president from 2011 to 2014, and he works as a language creator and author full time today. In fact, David has created languages for many hit TV shows and movies, including Game of Thrones. Our second guest, Dr. Jessie Sams, holds a PhD in linguistics from the University of Colorado at Boulder and currently works full time with David on language creation for film and television. I'm so thrilled to bring you this fascinating rebroadcast of our Social Sciences Forum lecture. Let's get into it, or should I say, [phrase in a constructed language]?
A conlang is a constructed language, that is, a language that was specifically created. All the languages that we speak or sign have been created by humans; they were just created kind of incidentally. A conlang is something that was created on purpose, usually by one person, but sometimes by two or more people. A very brief history of language creation: we kind of date the history of language creation to Hildegard von Bingen. She was a German abbess from the 12th century, and she created her own language, which she called Lingua Ignota. She used it primarily for writing songs. It was also mainly nouns that were just added to Latin grammar. But that's the first physical evidence that we have of somebody sitting down and creating a language on purpose. Feel free, feel free. So that was in the 1100s. Then around the 1600s, there was kind of a new boom of language construction, the philosophical language movement. The most famous of these was John Wilkins's language, which he didn't name; he just called it a philosophical language. Scientists reasoned that human languages were imperfect, so we needed to create one that we could use for scientific discussion. Their aim was to create words where you could basically tell what a word meant simply by how it was pronounced. And so essentially what it was was a classification system, where everything was classified by sound in a single word. Very cumbersome. There were a lot of projects around this time. None of them were extremely successful. This was the only one that people remember. The late part of the 19th century was the advent of the international auxiliary language movement, where people were creating languages for promoting peace, the idea being that if everybody spoke the same language, everybody would agree and there would be harmony, which is really funny if you look at America. But anyway, this is the most famous of them.
This is L. L. Zamenhof, who created Esperanto. It wasn't the first. The first was Volapük; the two parts of that word apparently came from the English words "world" and "speak." And that was Martin Schleyer's attempt at creating a universal language. Esperanto was much more successful. There were many, many others at that time, and there have been many others since. Esperanto is really the only one that survived and is still spoken today by a few. But really, the 20th century was when we saw people creating languages for kind of the love of it. It probably existed for many centuries, but it's just hard to find this stuff. The first one that we have a record of is J.R.R. Tolkien, who created his languages before he actually wrote any of his books. The books were kind of a way to give his languages a place to breathe, according to him. And so he was one of the first language creators that we know of who just created for the pure joy of it, and many in the 20th century took after him, either directly, because they read The Lord of the Rings and saw the appendices and were inspired, or just because there was lots more time for it. In the mid-to-late '70s, we had the first person ever paid to create a language, which was actually Victoria Fromkin, who created a language for the Land of the Lost television series. But the first one that people really know about is Marc Okrand, who created the Klingon language, and who, by the way, lives about 20 minutes from here; he lives in DC. And this is a photo; we were all at a thing in Amarillo this summer. So there he is. Marc Okrand created the Klingon language for Star Trek III. And that was really the first language not created for international communication that kind of captured people's attention, really, outside of Tolkien's stuff. And he went on to create a couple of others, including Atlantean for the Disney movie Atlantis.
But then in the latter part of the 20th century, once the internet kind of got into homes, that was really where personal language creation took off. Because basically, a lot of people (maybe some people here fall into this category, or maybe you know somebody like this) would try to create a language, or a little bit of a language, as a kid just for fun. But usually it doesn't go anywhere. And that was because you will often find that if you tell other people about it, they're extremely uninterested in it and don't understand why you're doing it. The internet allowed people to find each other. And so in, like, 1991, a new listserv was formed, built from a Usenet group, that turned into the Conlang listserv. This eventually was what gave birth to the Language Creation Conference. And this is a picture of the first Language Creation Conference here, which gave birth to the Language Creation Society. Anyway, so that's kind of where we are now. So what we wanted to discuss at the beginning was authentic language. Because we talk about creating an authentic language, and really, when it comes to that, it depends what you're doing it for. So here are just four different types of languages that are very, very different, and they're different because they're created for very, very different purposes. So looking at one of these, Esperanto, for example. This is English, right? We have these verbs here that have to do with sitting down. And as you can see, there are kind of bizarre interrelationships. So "seat" seems like it's related to "sit," and historically it is, but it sounds a little different. And we see that we have a regular non-past and past tense here, but then we have "sit" and "sat." And that's kind of a bizarre thing where, if you're learning English, you just have to learn this and know that, okay, it's a weird irregularity of English. All languages have these.
But if you're creating Esperanto, which you're hoping a whole bunch of people are going to learn and use in their everyday lives, and you want it to be easy, well, it's like, why do that? Instead, you could do something like this, where there's just one root here, right? You can see just one root that's related to all of these. So remember, this is "seat," "seated," "sit," "sat": same root, and you can see the exact same tenses. So the present tense is always "-as," the past tense is always "-is." And there's even this little bit right here, where to seat somebody means to make them sit, and so you just add this little bit, and that's a very regular thing. And so it's like, this makes sense if that's why you're creating the language. The question is, what about, like, if you have... these are the Dothraki. What are they? They're just humans. So what do you do if you're creating a human language? Because our languages don't look like Esperanto. They're not regular. They're not simple. They're not easy. So what do you do if you're just creating a language for humans? How do you create something that's authentic? Well, I want you to consider, just for a brief moment, humans versus machines. And so I have this anecdote; I love this. There's a whole series of games we used to play on the Apple IIGS computer; this one happens to be King's Quest IV. And what this type of game is, is you can move a character around, but in order to interact with the world, you have to type in commands. And so this is one particular instance where my friend and I were playing this game. The whole thing is, like, fairy tales reenacted, and this is the seven dwarfs' house. You get in there, and it's all messy. This is the character you are. And it's very clear what you need to do: you get into this house, and you're supposed to clean it up. And so then there's this cursor down here.
So it's like, alright, type something to do. So the first thing we try is "clean house," and it says, "You cannot do that right now." It's like, okay, "tidy house." "You cannot do that right now." Okay, "clean kitchen." "You cannot do that right now." We're just sitting here typing these things in, and it's like, "wash dishes." It's like, "Why would you do that?" And then we see it: this thing is a cupboard. So we go over there, and we type "open cupboard," and there's a broom in there. Like, aha! And it's like, "get broom." "You can't do that right now." "Sweep house." "You can't do that right now." "Sweep with broom." "You can't do that right now." And so for, like, 15 minutes, we're typing things in here trying to figure it out, even though we know what to do. And at one point I'm at the keyboard, because we were taking turns, and I type in "clean," because I was going to do "clean house," and my friend said, "We already did that." Like, oh, right. And so then I just hit enter, so I can get a new line, with just the word "clean." Suddenly, the character stops, walks by herself over to the cupboard, opens it, gets the broom out, music starts, and she starts cleaning. You just had to type "clean." It didn't matter if you said "clean house" or "clean kitchen," because you're working with a machine here. It doesn't understand anything; it understands what it's been told. And so "clean" means something; "clean house" meant nothing to this thing. Humans, on the other hand, are quite different. So here's a picture of me, and I want you to imagine, like, this is what I'm doing, right? And I say, "Anything to drink?" And Jessie replies, "Top shelf, right."
This is successful human communication. Think about this. Like, nothing in here suggests anything, and yet we know that it works. Because when I say "Anything to drink?" I know, and I believe that my interlocutor knows, that what I mean is "Do you have anything to drink?" And what that means is "Do you have anything to drink that I can have right now?" All of this is just understood. And of course, by the way, I'm assuming that that something to drink will be in your refrigerator, and that you will know that I have opened your refrigerator and am looking inside of it right this moment. All of this stuff is just assumed in "Anything to drink?" And then in the response, "Top shelf, right," we know that that means something to drink is on the top shelf of the refrigerator, on the right-hand side. And because I'm telling you where this is, we both recognize that means that I'm allowing you to have whatever is there. And of course, the way that the response works, it acknowledges that, you know, basically, yeah, it's okay that you've opened my refrigerator and are asking for this. You can tell just by the way the response goes. Humans are absolutely great at this stuff, right? And so when we're talking about creating an authentic human language, I want you to think about this in terms of linguistic phenomena. Okay? This is the stuff that's attested in actual natural languages, the stuff that actually happens. But this is the stuff that's possible. The stuff that actually occurs in our languages is such a small percentage of the things that are possible. In a human language, we could have verbs that conjugate based on the color of your socks. It would be easy to implement; we just don't do it because it's pointless. And then of course, there are phenomena that are possible in a system but not in a language. There's plenty of that.
So when we're talking about creating an authentic human language, we're aiming for like right here, stuff that happens, maybe a little bit of stuff that could happen in a language that we don't see. But we're not, you know, stretching the boundaries. So that's what we're talking about when we're creating a human language. Alright.
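For contrast with the irregular patterns a naturalistic language aims for, the fully regular Esperanto paradigm described earlier in the talk can be sketched in a few lines of Python. This is only an illustration: the function and constant names are hypothetical, and the root "sid-" ("sit") is supplied here because the slide itself is not captured in the transcript. The endings "-as" (present), "-is" (past), and the causative "-ig-" are standard Esperanto.

```python
# A minimal sketch of Esperanto's regular verb pattern.
# Names below are hypothetical; only the affixes come from Esperanto itself.
TENSE_ENDINGS = {"present": "as", "past": "is"}
CAUSATIVE_SUFFIX = "ig"  # "to make someone do X", e.g. sit -> seat


def conjugate(root: str, tense: str, causative: bool = False) -> str:
    """Build a verb form from one root, with no irregular forms to memorize."""
    stem = root + CAUSATIVE_SUFFIX if causative else root
    return stem + TENSE_ENDINGS[tense]


if __name__ == "__main__":
    # One root, "sid-", covers sit / sat / seat / seated.
    for causative in (False, True):
        for tense in ("present", "past"):
            print(conjugate("sid", tense, causative))
```

Every verb goes through the same function, which is the design goal of an auxiliary language and exactly what a naturalistic conlang avoids.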
Um, alright, so when we create a human language, we're looking at three major systems of creation that we're going to talk about here: sounds, grammar, and orthography. We're going to go through each of these in the presentation, starting with sounds. So when we talk about sounds that appear in human languages, we can turn to something like the International Phonetic Alphabet. This is the pulmonic consonant chart, and you see this a lot. These are the consonant sounds that start in your lungs, and they are produced as air leaves your body. And there's a lot of them. And if a machine were to look at this, it would say, hey, we could pick any set of, let's say, 20 of these, maybe 25 if we're going wild. You could pick any of these; you could just kind of treat it like a dartboard and say, I'm going to take this sound and this sound and this sound, and do it that way. That would be a randomized approach. But we can't do it that way, because that wouldn't be authentic. We need to look at this chart and see that what's actually being represented is the human body. That chart is actually organized to talk about sounds in terms of the manner in which they're produced, but also the location. Sounds, the pulmonic consonants anyway, can be produced anywhere along this tract on the way up, really starting here. When we're looking at the sounds of English, you can group them, and you can see that English completely avoids this section, where sounds are totally producible, but we English speakers just don't use it. And so when we learn languages like Arabic that do use this area, it's difficult; you have to learn how to produce sounds that aren't in your native tongue. But you could organize them to say, here is where all the sounds are produced, which means the sounds have more in common with each other than, say, just picking this sound and this sound and this sound.
And so we have to look at it as a system of patterns, and that's what we're doing when we put together a sound system. In general, even when we're creating languages for alien species, we are really bound to creating a sound system that's pronounceable by human actors, because at the end of the day, a human is creating this. Yeah, I'm looking at you too, Carl. Humans are producing the sounds. No matter how alien the creatures are on the screen, it has to be done by an actor, and David has to be able to send sound files. So we are constrained to those elements. People creating alien languages for their own projects can get really creative and make up their own sounds, and that's great; it just doesn't work when you need it to be produced by humans. We're also going for putting together those sounds in unique ways to create systems where you can hear it and say, yes, that sounds like a language, but it doesn't sound like any one language. That is our goal when we create languages, most of the time. Sometimes we do work on projects where they say things like, when David was working on The 100, "We want this to sound like a future version of English, so it needs to be based on English." At that point, the goal is to make it semi-recognizable as a form of future English. But in most cases, they want it to be a completely separate language. In general, because we are constrained in which sounds we can choose and really the patterns in which they appear, it's the phonotactics that make it distinguishable. The phonotactics of the language are the most noticeable features, the things that you notice when you hear the language. Even if you notice nothing else about the language, like you don't know any of its vocab, you don't know any of its grammar or its structures, you're going to notice how it sounds. And in phonotactics, we're looking at things like: Where is the stress? Is it a tonal language?
You know, how are syllables put together? Do they have a lot of consonant clusters? Are they restricted in where consonants can occur? And so on. And so to demonstrate how phonotactics are actually even more important than the individual sounds chosen, we're going to do a quick comparison of Kezhwa and Inha. Kezhwa is the language David and I created for Paper Girls, which lasted for exactly one season on Amazon, and which featured our language in static radio bursts instead of being spoken as it was supposed to be. But moving on, it's a gorgeous language. Inha is a language that David created for Emerald City, which also only made it one season and should have gone much longer, because it was an amazing, gorgeous retelling of The Wizard of Oz. So check it out if you can find it anywhere. We're going to look at a consonant chart and also a vowel chart, and we're going to look at what sounds the two languages share. If the two languages share a sound, it will be printed in white. If it only appears in Kezhwa, it will be in pink; if it only appears in Inha, it is green. So looking at consonants first, even if you don't understand any of the terms on this chart, just notice how many of these are white. We only have two sounds that appear in Kezhwa but not in Inha. One is this sound, the voiced fricative at the beginning of words like "that." The other is this glottal stop sign, which you can hear if you have a dialect like mine: I say "ki'n"; I don't say "kitten" or anything like that. It's that middle sound in "ki'n." That's the glottal stop. It shows up in Kezhwa but not in Inha. And then the H shows up in Inha, which I'd hope so, because it's in its name, but it shows up in Inha and not Kezhwa. As far as vowels go, they pretty much share all the basic five vowels.
The difference is that Inha has long versions of the "ee" and "oo," and Kezhwa has a schwa, which you don't find in Inha. So in other words, the sound systems of the two languages are really close. And yet when you listen to them, they sound quite different. And so here's where we're going to cross our fingers and hope that our sound is still working. So here is Kezhwa.
[Kezhwa sound sample plays.] That was supposed to be in season one, episode one. But that is a sound sample of Kezhwa. And then here is a sound sample of Inha.
[Inha sound sample plays.] You get the idea? Yes. Alright, so even though these languages almost completely overlap in the actual sound inventory, I don't think anyone would hear these two and say, "Yeah, that's the same language, they can talk to each other." They're really quite distinct. And that's because of the phonotactics: the rhythms being created, the ways the sounds actually come together. And so that's actually quite important in terms of what we do. Just to kind of take you behind the scenes, because these are sound samples actually sent to actors: for scripts, what we get is a copy of the script where it's usually marked somehow. This one is from Emerald City, and Glinda is speaking. It usually has, like, angle brackets, or "translate into language"; it has some sort of marker in the script where we say, okay, we need to be working on this. A lot of times, though, it really just shows up like this, and you just have to search very carefully to find all the lines you're supposed to translate in the script. From there, we put what we're doing in a program called Final Draft. Final Draft is essentially a scriptwriting program, and they use it a lot in movies and TV shows. So when we send these files to them, they're like, "Yeah, we got it, we can put it in our own program and get it distributed to all the people who need it." And what you'll see is we put in the same information that they give us. At the top, we make sure they know what scene it's for: what scene number, what the scene is called. Underneath that you see the MP3 file, which we'll talk more about in a moment, but with each of these translations, we also send MP3 files. Then you'll see the exact line that needs to be translated, including the character; so Glinda is speaking here. Underneath that is the actual translation, and of course, spellcheck thinks we spelled everything under the sun wrong, so there are a lot of red lines there.
Underneath that, you see a phonetic rendering. Now, this is not in IPA, because it's not for linguists. This is for actors, and dialect coaches too, sometimes. And so it needs to be as understandable as possible to people who are not linguists. So what you'll see is some pretty standard phonetic renderings, and it also comes with a code for, like, here's how you read the phonetics. What you'll see too is things like the stressed syllables get all caps, and things like that, to again just try to help them reproduce what they're hearing in the MP3. We also give them a word-by-word sort of translation. So that way you can see, for instance, here it is "We have a new guest," but it's actually "guest new to-us is." And so you do it word by word: "guest new to-us is." The reason we provide this is because as actors are producing these lines, they need to be able to get the phrasing: Where can I take a breath? Where would it be natural in this language to take a breath in a sentence structure? We want them to know where the approximate phrasings are, so they also get a word-for-word. Now, those MP3 files are all recorded in Audacity, and so you can kind of monitor them, and you can actually kind of see the different parts here. First, it's recorded fast, at normal speed; then there's a pause; then it's recorded slow, where it's, like, syllable by syllable for the actor. And then after that it's followed up with the English translation, just so that way they know they're listening to the right recording at the end of the day. And so all that is done in Audacity and sent off to the actors, who then have that file and can work on mimicking what they hear. Going back to the sounds, we talked about phonotactics. Another thing that we consider as we create languages is that languages change. So if you're trying to create an authentic language, you do need to think about this fact: all languages that are living change, and continue to change.
And that is just one of their features. And so their sounds, grammar, and vocabulary shift, often in subtle ways. Over centuries, you see big shifts, but as they're happening, they're usually pretty subtle. And so as conlangers, we use what's called the historical method of conlanging to emulate this process. What we do is we actually create archaic forms. So we start back in, like, the old days of the language, before the stage we are actually using, and then we craft a series of sound changes to run those forms through. To show you an example, this is from Engála. We wanted to create a word for "claw," so we started with this form, *tuk. *tuk here is marked by an asterisk, and in our notation, what that means is it's an archaic form. So this form is not the modern word for "claw"; this is where we're starting. The modern word for "claw" is actually "tugu." As for how we got there, these are just a few of our sound changes (there are more) to run our original form, *tuk, through. One of the rules we're going to highlight today is that if you happen to have two vowels together, they're going to get reduced to one. But that doesn't apply here, so it's still "tuk." So we go on to the next stage. Here, this is a copy-vowel insertion rule. What that means is if a word ends in a sound like K, that's not possible anymore; the language has changed and doesn't want that syllable structure. So you have to insert a vowel after it, and the language looks to the vowel before it and just inserts a copy. And so "tuk" becomes "tuku." And then finally, we also had a rule where voiceless consonants like K become voiced if they're in between vowels, and so "tuku," in its modern form, is actually "tugu." And so you can imagine, as we run words through this, how different they can be, especially as you're putting words together into compounds and things like that.
So another example: we needed a word for "pile." Its archaic form was *tau, which came out as "to" in its modern form, because here's where "au" becomes "o" over time. So its modern form is "to," and it's completely unaffected by these other two rules. Now, we put these two together because "pile" actually became a plural prefix marker for nouns in this language. So to say "claws," we use this form attached to "claw" as a prefix: you say "todugu." And that is affected by all the rules here, because the "au" becomes "o," we still need the copy-vowel insertion, so it becomes "totuku," and then you need to voice not just the K, but the T too, which is now in between two vowels. So it comes out as "todugu." So even though you say one "tugu," you have to say "todugu," with a D, because of the sound changes. And so you get some irregularities in the language that are not there in languages like Esperanto, which are meant to be totally regular, because this is what happens in natural languages: we get the irregularities from the sound changes as they happen. We could do all that by hand: write out all the sound changes and run each form through them every time you create a root. But Lexurgy, which is a website, automates the process and is quite an amazing tool. So for anybody who's dealing with proto-forms of any kind, whether they're from natural languages or you're creating them for conlangs, lexurgy.com, created by Graham Hill, allows you to write up all the sound changes and put in forms, and no matter how many sound changes you write, it will tell you what they're going to be. So here's an example, again from Engála. We've got the input words, including *tau, *tuk, and *tautuk, over on the left, the sound changes are in the middle, and then you get the output forms.
And so again, just to highlight these areas: you put in the beginning forms, you write up all the rules you need, and then it tells you, here's what it's going to be, and not just in IPA; you can even write a romanization rule, so you can see what it would look like written out in the alphabet. He even put in this really, really cool feature that allows you to trace an evolution, so you can see every rule that affected a word in terms of its sound. Here we're tracing, specifically, *tautuk, and looking at how it was affected over time. The output form would just tell you "todugu," but here it says, here are all the things it was affected by at each stage of the language's development. And that's really handy, especially when you get a form that was really unexpected, to be like, "Wait, why does it look like this?", and you can go back and see all the rules that applied and in what order. So it's a very handy thing. So, lexurgy.com. Graham Hill is a genius, and I will be forever grateful to him.
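The three sound changes walked through above (two adjacent vowels reducing to one, copy-vowel insertion, and voicing between vowels) can be sketched in plain Python to show how ordered rules turn archaic forms into modern ones. This is a rough illustration, not Lexurgy's rule syntax and not the speakers' actual rule set. The function names are hypothetical, the spellings *tuk, *tau, and *tautuk are approximated from the audio, and two generalizations are assumptions: that the copy vowel follows any word-final consonant (the talk only mentions K), and that voicing applies to p, t, and k (the talk only mentions K and T).

```python
import re

VOWELS = "aeiou"
VOICED = {"p": "b", "t": "d", "k": "g"}  # assumed voiceless-to-voiced pairs


def reduce_au(word: str) -> str:
    """Two adjacent vowels reduce to one: 'au' becomes 'o'."""
    return word.replace("au", "o")


def insert_copy_vowel(word: str) -> str:
    """A word-final consonant gets a copy of the preceding vowel after it."""
    return re.sub(rf"([{VOWELS}])([^{VOWELS}])$", r"\1\2\1", word)


def voice_between_vowels(word: str) -> str:
    """Voiceless stops become voiced when they sit between two vowels."""
    pattern = rf"(?<=[{VOWELS}])[ptk](?=[{VOWELS}])"
    return re.sub(pattern, lambda m: VOICED[m.group()], word)


# Order matters: each rule sees the output of the one before it.
RULES = [
    ("au > o", reduce_au),
    ("copy-vowel insertion", insert_copy_vowel),
    ("intervocalic voicing", voice_between_vowels),
]


def trace(word: str) -> list[tuple[str, str]]:
    """Return every intermediate form, similar in spirit to an evolution trace."""
    steps = [("archaic form", word)]
    for name, rule in RULES:
        word = rule(word)
        steps.append((name, word))
    return steps


def evolve(word: str) -> str:
    """Return only the final, modern form."""
    return trace(word)[-1][1]


if __name__ == "__main__":
    for archaic in ("tuk", "tau", "tautuk"):
        print(f"*{archaic} -> {evolve(archaic)}")
    for name, form in trace("tautuk"):
        print(f"  {name}: {form}")
```

Running the plural through the trace shows why "claw" and "claws" end up looking irregular: the prefix puts the T between vowels, so it voices along with the K.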
All right, so now we're going to talk about orthography. You saw with the scripts that we did the romanization. That's just a way to write our languages using the Roman alphabet, usually with ASCII, because Final Draft is not very comfortable with characters with diacritics, and that's the main program that the scriptwriters use. The orthography of a language, though, is how you actually write it, and we do get to do that from time to time, not all the time. Before we talk about it, the first thing to know is that an alphabet is not the same thing as a writing system. There are actually many different types of writing systems: we have alphabets, abjads, abugidas, syllabaries, and logographies. And these, by the way, are English, Arabic, Hindi, Japanese, and Chinese. Even within these, there are also subsystems; these are just kind of the major ones. And to show you how these systems work, I thought we'd start with something familiar. So let's work with the word "retriever." Spelling this in an alphabet (this is an alphabet that I created for Hen Linge from The Witcher), you get more or less what you expect with English. That is, there is a unique glyph for each consonant and vowel sound. And of course, with this vowel sound, we see that we spell it with two letters in English. In this case, I kind of respelled it; I didn't want to bother with how the R was going to work, so I just made it "retrieva" in that one. But that works pretty much how you'd expect. An abjad is a little different. An abjad is one where you just write the consonants; the vowels are considered unimportant unless they're special. For example, in Arabic, the long vowels are considered important. The short vowels are, you know, stuff you only see in newspapers, really. And so with this word, it doesn't even write this part at all; there's just no reference to it.
In fact, what it says... By the way, this is a writing system I created for the Sondiv language for Star-Crossed, a show that ran for one season. And so really, what it kind of looks like is this, and you just kind of assume the vowels. In this case, it's an H. And when you spell something with an H at the end of a word, you know that it's kind of like an R sound at the end. So I just threw that in. This is an abugida, an abugida we created for Vampire Academy, which is really cool, because they use it all the time. An abugida has characters just for consonants, but then when a vowel comes after it, it usually combines with the consonant in some way to do the whole thing. So we have "retriever" here, which is actually like this. So this whole thing is one syllable, but this whole thing is just a T. This is a syllable again, and then that is "va", and it's like the whole letter is that way. Syllabaries, on the other hand, are not as permissive. So you saw that with just the T by itself, we could write it and that was no problem. And the last one, this is a syllabary that we created for a language called work with EG. And it requires you to have a consonant followed by a vowel no matter what. So it lines up like this. But there's no R in this, so L is the closest you can get, and what it actually says is "lead to Leafa". And so you have to put this vowel in there. You can't just put the T next to the L slash R; there has to be a vowel, so you just kind of toss one in there. For those who are familiar with Japanese hiragana or katakana, aside from the N, which you can write on its own, this is the type of thing that you have to do. So, like, "strike" for baseball was borrowed from English into Japanese, and when you spell it out, it looks like you're spelling "sutoraiku", like that, even though it's pronounced roughly the same. A logography, on the other hand, is a fun one, because you can do all sorts of different stuff. So this is a writing system I created for High Valyrian.
And the way this one kind of works is that it does have kind of an abjad as part of it. So I just put the R in there. And then that really just means dog. The word for dog is jaos in High Valyrian, and that R actually comes from the word for mouth. So the word for retriever is kind of like a "mouth dog", which doesn't make a ton of sense. But, you know, that would just be how you do it in High Valyrian. So when it comes to representation, there are three different levels of representation. The top, this is the orthography. This is, by the way, the Chakobsa writing system I created for Dune. This would be kind of like a romanized representation. That's English, so it's not really a romanization, it's the English orthography, but whatever. And then this is our IPA form; this is the stuff that we would use. So basically, this is what the art department gets, this is what the actors get, and this is what we work with. When it comes to creating fonts, there is plenty of software for it, but it's kind of unique. So this is the font for Chakobsa, from Dune. But the thing with font software, and this is all font software, is that it assumes, and this is because it kind of goes back to the history of the printing press and typewriting, that you are using an alphabet. And it assumes that you are writing from left to right, top to bottom. And if you're creating a writing system like that, it's so happy. If you're creating anything else, things get very, very difficult. So here's an alphabet, the alphabet created for The Witcher. It mostly lines up, more or less; the H is actually a K, but whatever. And so if you want to type something out in this alphabet, it works very well.
So, like, here's a word. You type B, you get that. You type I, you get that. You type R, you get that. You type K, you get that. You type E, you get that. And there it is, you have the word "birke", and it works very, very well. Now, this is the writing system... go right ahead, go right ahead. This is the writing system for Chakobsa, for Dune, and this is the way the writing system works. It's an abugida, so the top one over here can be just a B, but it could also be "ba". And this, by the way, is B, but only at the end of a word; you kind of do it with a pen, you do a little thing. But otherwise, these are all syllables. And so when you use the writing system, you actually have to change the form of the letter depending on the syllable, right? So if we were to take that same word from Hen Linge and write it, if you were to type "birke", this is the way it should look. However, this is what happens if you try to just type it out without any ornamentation, basically. So you have b, i, r, k, e. And it's not even close to what you saw originally. And you'll notice that the characters aren't even spaced correctly. It's just a mess, right? Because it's not actually an alphabet. And so this is the way it should look, and this is what happens if you just try to type it like an alphabet. And you see this kind of phenomenon in the real world all the time. So earlier, Jessie and I were in Vienna, and there was a butterfly sanctuary there called the Schmetterlinghaus. And they had this wonderful little sign that has the word for "butterfly house" in many, many different languages. Really lovely. And if you look at the Arabic, for anybody who has studied Arabic, at least they got the directionality right. But it doesn't look like Arabic. This is what they have written here. And, by the way, what they're trying to say is "farasha bayt". That's what it looked like on the sign.
This is what it's supposed to look like. Because of course, in Arabic, most of the characters fit together. And by the way, of course, it's not even translated appropriately. It should be "bayt al-farasha". But you know, there's only so much you can expect from a multilingual sign.
Anyway, so yeah, this type of thing happens all the time. And it's again because the darn thing just expects it to be an alphabet and to work just so. So the way that you get around this is using what are called OpenType features, which they are increasing support for. That's all this code down here. So for example, for Chakobsa, the way it works is you write code that looks like this, and it has to be ordered like this because it runs through the list from top to bottom. This "sub" is for "substitute", and so it says, all right, if somebody types in b, a, you substitute that sequence by this special character, and then the semicolon tells it that's the end of the line. And so you go through that for every single character sequence. And then what happens is, once you do this and OpenType features are implemented, this is what happens. So you type B, you get that, which is familiar. But now you type the I and suddenly, look, it dynamically changes. You type the R, you get that. And remember, it can appear as a consonant as long as it's not at the end of the word, so that one's fine. You type K, keep going, you get the initial form, but then you type the E and again, it changes dynamically. And so then you actually have the real thing just by typing in an alphabet. It's almost like you're typing in an alphabet as a code to get the non-alphabet. And that's basically what we have to do to create all of our non-alphabetic writing systems. You might be wondering, well, how do you deal with a logography? Because there isn't really any systematic correspondence between the form of a word and a glyph. You can just look at these words and be like, yeah, there's nothing here that ties the sound to the form at all. And it's just like Chinese. Well, in Chinese, there are phonetic components, sometimes, but even so, they're like compound glyphs. So you might be wondering, how do we do that? This is how. So I came up with a system.
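As an aside on the ordered substitution just described, here is a rough Python model of the idea: a list of "typed sequence, replacement glyph" rules that is scanned from top to bottom, with the first match winning. This is a simplified sketch and not how a real font shaping engine works internally; the glyph names (`BI`, `KE`, and so on) and the `shape` function are hypothetical, and only the typed letters b, i, r, k, e come from the talk.

```python
# Each rule: (sequence of typed keys, name of the glyph that replaces it).
# Longer sequences are listed first so they win over single letters.
RULES = [
    (("b", "i"), "BI"),
    (("k", "e"), "KE"),
    (("b",), "B"),
    (("r",), "R"),
    (("k",), "K"),
]


def shape(typed, rules):
    """Turn a typed string into a list of glyph names using ordered rules."""
    glyphs = []
    i = 0
    while i < len(typed):
        for sequence, glyph in rules:  # scanned top to bottom, first match wins
            if tuple(typed[i:i + len(sequence)]) == sequence:
                glyphs.append(glyph)
                i += len(sequence)
                break
        else:
            glyphs.append(typed[i])  # no rule applies: pass the key through
            i += 1
    return glyphs


good = shape("birke", RULES)

# Same rules, but with the single letters listed before the syllables:
# the syllable rules can never fire, so the word comes out letter by letter.
bad = shape("birke", RULES[2:] + RULES[:2])

print(good)
print(bad)
```

With the syllable rules first, the five keystrokes collapse into three glyphs; with the order reversed, the result is the letter-by-letter mess shown when the script is typed as if it were an alphabet. That is one reason the order of the list matters.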
Alright, so this is what happens if you're typing in High Valyrian. First, you type the number sign, and it returns this little character. This is just for people using it; it says "your word here". Because then you start typing a word: R, and then an R shows up; O, an O shows up; P, a P shows up; E, and an E shows up. Then you type the number sign again, and suddenly, it returns the glyph for "rope". So it's kind of funny, if you go into the High Valyrian font, there's almost like a mini dictionary in there, because it's all of these English words that you type to get the actual characters, and you just need to know which ones. But again, I mentioned that what these font programs really like, what they want, is an alphabet written from left to right. But not all systems are like that. This is a great sign, because Chinese can optionally be written from top to bottom. Mongolian is usually written from top to bottom. And then of course, you know, Arabic is written from right to left, not left to right, and the characters connect. And also, you can do some wild things with writing systems. This is just an example of Arabic in a particular form of art, which is calligraphy, where you can see there are lots of things layered on top of one another. Font programs hate this. And you can see the results of this when people think that they're working; you see stuff like this. For those who read Arabic, this is clearly somebody who wanted to get their name tattooed on their arm. And what we have here is something that's basically the equivalent of M, A, R, and then a Y and then an A, for the name Maria, which is delightful, except that it's going from left to right, not right to left, and none of the characters connect. It's an absolute disaster. If you wanted to write the name Maria in Arabic, it would look like that. So, you know, it would approximately be like this, and that's just one way that you could have written it.
There are a couple of other ways. So it's not even the right character at the end. It shouldn't be that; it really should be a ta marbuta. Anyway, then it's like, alright, OpenType features are great, and we're doing our best with this program the way it is. But even if you change the program, all of this stuff is represented with one of these things. And so it doesn't even necessarily matter if the font is done correctly, if the thing that you're using the font in doesn't work, which was always the case with Microsoft Word for so long. It doesn't even matter if you got it right. It'll still get screwed up. That's why you'll still see these mistakes on productions like New Amsterdam. For a while they had a multilingual sign up that had information on it, and the Arabic was backwards, and, you know, all of the characters were wrong. And it just takes a moment. You don't even need to know what they were trying to say to see when it's wrong. It's that easy. And it was probably because they were using some Adobe product, because I've noticed that certain Adobe products are very sensitive to ligatures and others are not. So, like, with Photoshop, it works, but with InDesign it doesn't. Anyway, so that's the type of thing that we need to deal with when it comes to creating writing systems. Alright, so
we're gonna talk about our final section we talked about sounds orthography. Now we're gonna go into some of what we do for grammar, looking specifically at morphology. So within morphology, what we're really looking at is the fact that words can change Ange to reflect grammatical distinctions and languages vary and how much information speakers must necessarily provide. So that is one way that languages vary. So looking at how these distinctions are made, the first question you have to ask is, what does the language require. And so here, instead of calling this retriever, we're gonna go with a more basic word, dog. And that's just because that's what we translated in other languages. But no, the retriever spirit is in that dog's heart. But we see one of them in English, and we say, dog, but when you see too, you need to say, dogs with it with that plural marker, right? So now we've got dogs, you see three dogs, you see a whole bunch, which means you're in heaven. But still dogs, right? But not all languages work that way. And so this is a very colorful chart to showcase the differences here. So in trigger displaying, which is a language that David created for the 100. Like some natural languages, it doesn't distinguish between singular and plural. So one dog feature two dogs feature, Rita. Ah, it's all the same. Fate. You betcha. I wanted to make it ah, I betcha Gotcha. So in my dialect is different. And so the form is the same regardless of the number of dogs. Now, of course, that doesn't mean that in the language, they couldn't talk about the fact that there's more than one dog. They would just have to use phrases and different descriptions like put many with fecha betcha. Push Facha that's how you would say and trigger this like, so you can definitely like get the idea across what speakers don't have to that's not necessary for the language in the way that it works. In a language like Ash nama tree, this is it does the same thing English does. 
You see one dog you say to SHA, you see more than one dog? Whether it's 235 or 100? You say? Tosha? No, yeah, Tosha Bosch, that's the plural form. In will cookies, you it works a little bit differently. It has a dual, as you see in some languages, so Wookiee means one duck. Okey sua means two and exactly two dogs, if you have three or more, it's Gifu. So it has a different marking. Hi, Valerian is different, even still, you see one dog, and it's jealous. You see two or three dogs, it's a few dogs, and this is called a Palko. Form gel when you use that form. If you see quite a few dogs, then it's your whole saga, which is just it's normal plural form. Now, if you're talking about all the dogs collectively in the world, you have even a different form. Yeah, for. And so that's a completely different form. And so those are some differences that the language requires. And all of these here are marked to showcase how those forms are different. But languages also vary in how the information is encoded. So on that previous slide, you saw that we had a lot of suffixes. That's how a lot of languages mark things like plurality, however, it's not like that in all languages. So going back to Janelle Marie as an example of a suffixing language, one leaf is bull, two or more leaves, we've got Bonanzas. But then it's PAFA. One leaf is SHA, two or more. Now you have to add a prefix Xisha. And so that is marked with a prefix instead of a suffix. Then you get languages like Nautilus, where fill stitch is singular. And notice that you take off that ending to make it plural, so fill stuff. So instead of having a singular and plural form, it has what's called a singular native marker, where the plural form is sort of the base form of the word. Then in languages like a Ruthenians, you have a marker for both forms. So the MA one we've given me two or more leaves, and to kobza, there's an internal change. So Woody is one leaf, but cavalry is two or more leaves. 
And so it's actually the internal change here you get a vowel change on the inside, which we have some examples of an English some app loading forms from like two cities, foot to feet, mouse to mice, and so on. And then in some deve this language keeps the some set consonant. So the en d is the same across but it's got a completely different pattern for singular versus plural, where you have this sort of internal structure change. So NovoEd is one leaf. Dual is two or more leaves. And so these are examples of how language is very, not only in what information must be encoded, but how it gets encoded. And now David is going to talk more about how languages vary in how much information can be shown down to a single word in languages, it gets exciting for
languages vary in how much information is encoded on a single word. And so we wanted to take a look at nouns nouns tend to be easier to kind of follow and understand than verbs. Nouns can encode, among other things. Number that is, you know, one versus two versus whatever case the grammatical function of the noun in the sentence, gender when nouns are divided into classes definiteness, that would be like a versus da, and dependents because I don't know the proper term for this, but it's when a noun indicates if it is possessed or not. So if you ever seen any words borrowed from that, watch, that end with a T L, that is usually the non possessed form. Yes, it indicates that it has not been possessed. So some languages do absolutely nothing. And so this is one that I created for a video game called arena, valor, pockets claw. And it just there's no information on the noun at all. Some indicate just number I think this is familiar for English speakers. Again, this is handling of from The Witcher. So blade is Wolf and blader is wolves. Some indicate just case, this is a cast in language that are created for defiance, it doesn't have any number marking at all. But it does change form for case. So if we always kind of like a doubt, it's also the doubles as the vocative. So if you were to call doubt out doubt, why do you assail me that type of thing. That's the form you would use. Some languages just do gender. This is a language that we're working on right now. Suck Ha, it doesn't have number marking, it doesn't really have case. malman is kitten and malha is puppy. So this part, the white part just means baby, kind of a small animal. And then the May is the cat class. And then that is the dog class because it's language for anthropomorphic cats. Trigger thing just has definiteness so fetishize dog, and Fitch is the dog. We usually do it with a hyphen. But it's it's really just part of the word to be honest. 
Many share, which is the language that we created for motherland fort Salem, does number and gender So either you're gonna is voice and data is voices. And that is also letting you know that this is the human class, the voice happens to be human classes. It's something that humans have high valerian has number case, and gender coldry on is bath, and then cold reality is of baths if you wanted to say that it's also two baths, and in baths. Ah, so like, if you were to talk about like, you know, coldry t Viola? No, I'm sorry, caudry t Valley, that would be men in Bad's. So now you know how to say that in high valerian. As now Maria has number case, and definiteness, and so is a foot and then in as Rumi is to the feet. And it works out very nicely in the spoken language. But since it's an Abu Gita, only, only like halves of the words are relevant. And also, Jessie was very, very clever, because this is actually kind of like the vowel indicator. And that's the part that is actually the case. So this is a Alisha plus, which is the language I created for dominion. And this one actually is a, an off posterior language. It's derived from proto Afro Asiatic, amongst other things. And so in scene is of men, but eats is what's called the construct form. And kind of means like the man of it's followed by a noun in the genitive. And so it's like kind of double marking, as Ron is a language I was super proud of, because it's a future form of Spanish kind of like trigger the same as a future form of English as a future form of Spanish that I created for Into the Badlands for season three on AMC. And so we have Lobel, which is tree and nice. Oh, boy, I'm sorry, these Oh, boy, that's still it's still low tone of my trees. And in case you're wondering, this comes from odd boys in Spanish. And that one comes from out of Willis. And that's why the L appears because the L dropped off for this one, but not for that one because it was the s that dropped off the end. 
Oh, this is a fun language. Speaking of fun languages, so this is a language that I created with Carl buck, Sung Haley for the Halo series and Carl happens to be Right here, he's not from Baltimore. He's from Pennsylvania, but close enough that he comes to town. And so Carl and I created this language together, we are working on it right now. It's actually the busiest thing that I have. And, and Carl knows that like, we should be working on it right now. To give us 50 Some lines to translate for one episode, and that's not even the big one. So 8282 is mercy. And you might have heard that in season one because one of the prophets is named mercy. So 82 is mercy and then at 20 would be like, Have mercy depending on like, we don't have a genitive. This is the this is the IDS if case you use different cases for possession depending on the level of possession, so it also means like on mercy, but that doesn't mean as much anyway, and that's an objective. Q by the way. Lovely, some casual we mentioned this, so we have number indefiniteness. Oh, that's right. So now we get to the place where we have separate words. So G guys actually one word that means a goal. But now we have uneager, the goals and on is a separate word. All the stuff Oh, I'm sorry. No, you know what? This is the first place where we had separate words excuse me, because Ni is a separate word, right? Okay. Same with here. So now we've split off we've done everything that we can do with one word now we have separate words. nolloth is great. No love is one of my favorite languages created this for the Shannara Chronicles. It was better than the show. The language was, but but Thor is a wolf. And Andorra is of the wolves. And this is the definite article right here. The case you can see is actually just the just the first consonant. Anyway, and so then this is me. Okay. So you might wonder, how do we how do we keep track of all of this. 
And especially, we know that there are some computer science people in the room. So, really, it's just... this is the Méníshè language.
And you can see, it's not a Word document, it's a Pages document. But it's the basically the same thing. That's it, it's just a it's just a word processing document. So it's like here, you know, we have tables that we can made by hand, we write up the grammar and everything, we've got more tables that you know, help us refer to things. How do we keep track of words, I created this, this handy table that I've put in all of my different languages. And there's a little formula here that divides this by this in a way. Now this by this, yeah, yeah. And, and these, this will actually total up the columns, which is very nice. But we have to enter all of these by hand. So every time we create a new word, we have to remember to go and add it. And every so often, it's like, we look at it and say, it seems like there's a lot more words there than we have in that. And so then we have to go back and count by hand and update, it happens with all the languages. And so even that total, there is just an approximation, our dictionary looks like this. All of the again, all of this stuff was just added by hand. And you know, we copy the, the formatting from one to the other. So this is the language to English side of the dictionary. This is the English to language side of the dictionary. And really, it's just a reference. This is not where the all the work goes in. But again, like this doesn't auto populate, we have to add all this by hand. So sometimes there will be things that show up in the first dictionary and not in the other because we forgot to add them every so often they show up here and not there because we missed it, which is really a disaster. And for other documents. This is for anomaly sometimes, you know, we write in a little bit, nice little thing about how to use the writing system. And then like this is high valerian. Sometimes, the systems are so large and complex that it's like they span several pages. It's just verb conjugation for one verb. 
Also, for High Valyrian, since it has this bizarre logographic system, there's a separate document where I keep track of every English word you can type, what the symbol is that you get, what the name of the glyph is, and some words that you get from it. Again, all updated by hand. Same with this one, where I have arranged them by shape. So if I kind of know the shape of the glyph, I can go and look it up. And this is all done in Pages on the Mac, so at least we can work on it at the same time in iCloud. So I wanted to give you an example. These are dictionary entries from many different languages. And I want to show you there are a lot of things that these have in common, but there are also a lot of things that are very different. You know, I should have included a Dothraki one, because at the top, we have Méníshè, and you see "na" up there. That stands for a noun of the air class, and air class is something that's only relevant for Méníshè. Dothraki also has "na", but there it means animate noun, which is something that, again, is only relevant to Dothraki, but it's the same kind of function. So you can see there's always IPA in there. Sometimes there's the writing system. In High Valyrian, it's necessary; there's no standard spelling, so it's necessary to have the variations in spelling written up there. And then sometimes, for example, I have a special entry for words that George R. R. Martin created specifically, and they're underlined so that I can keep track of things that he created that I didn't create. I created this little code, by the way. It's a very simple code. 00 means it's a basic word. 01 means it's a potentially insulting word. 10 means that it's either a technical word or an artistic word, or a non-standard variant, something like that. And then 11 would be, I guess, like a really archaic insult. Very few of those.
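The two-digit code just described can be read as two independent yes/no flags. The sketch below decodes it under that assumption: the first digit marks a specialized word (technical, artistic, or non-standard), the second marks a potentially insulting one. The optional trailing asterisk for a favorite word is mentioned elsewhere in the talk; attaching it directly to the code, along with the `WordFlags` class and `parse_flags` function, is a hypothetical layout chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WordFlags:
    specialized: bool  # first digit: technical, artistic, or non-standard
    insulting: bool    # second digit: potentially insulting
    favorite: bool     # trailing asterisk (assumed placement)


def parse_flags(code: str) -> WordFlags:
    """Decode a tag such as '00', '01', '10', '11', or '01*'."""
    favorite = code.endswith("*")
    bits = code.rstrip("*")
    if len(bits) != 2 or any(bit not in "01" for bit in bits):
        raise ValueError(f"unrecognized word code: {code!r}")
    return WordFlags(bits[0] == "1", bits[1] == "1", favorite)


# Hypothetical dictionary entries: headword mapped to its code.
entries = {"word_a": "00", "word_b": "01*", "word_c": "10", "word_d": "11"}

favorites = [w for w, code in entries.items() if parse_flags(code).favorite]
insults = [w for w, code in entries.items() if parse_flags(code).insulting]
print(favorites, insults)
```

Storing the code as structured flags rather than as free text is what would make a question like "list every potentially insulting word" a one-line filter instead of a manual search.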
And then I put a little asterisks in there, like right next to it to let me know if I like the word for some reason, because often we get interviews are like, what are your favorite words, I'm like, I don't care. So I just put those in there. So I can search for them really quickly. So we have here at like, you see that for some of these. It's just like align. For some of them, there is the root, like for that, and many che, and, and for some of these we have, this was cool. Like, there was like a whole, we wrote down the justification that we had for coming up with this word, because this is a word for time. This is a language for time travelers that borrowed words from many different cultures across many different, you know, time spans. And so we wrote down why they chose that word for their language, and stuff we have triggered a sign up there, we can tell whether something is a compound or not. But so this is just an example. It's like they're all the same, but their individualized, are very different.
All right. So we do all that by hand. As David mentioned more than once, it gets messy, it gets wrong. And so we're like, what would we love to be able to do? We would love to be able to answer questions like this: how many intransitive verbs does Dothraki have? We could do a search and find, but then you've got to make sure you go through and type it in in a particular way, and it doesn't always work as well as you want. Questions like: how many words did as nanometer you borrow from Bulgarian? How many air-class nouns are there in Méníshè? How many High Valyrian verbs end in I II? How many compounds are there in Trigedasleng? What is the Swadesh list in Chakobsa? So, looking at lists of words that are unlikely to be borrowed; they're usually native words to the language. What English words have equivalents in Hen Linge but not in Arabia, and vice versa? These are the kinds of questions we would love to be able to answer, but without doing a lot of searches and having 20 documents open at once, it's not something we can easily answer. So then we sat down and said, well, what is our wish list? Since there are hopefully some computer science people here, here's our wish list. Take a picture, make it a life goal. One thing we would love is to have one input interface across multiple device types. So being able to, you know, work on a computer, work on an iPad, iPhone, Android phone, whatever; a PC, throw that in there. We would also love to have multi-project support, so that we'd be able to work on multiple projects through the same app. So we could ask questions like, what about this language versus this language, and have comparisons. It'd be great if we were able to, maybe we should have said, define tags and variables, and then be able to sort everything by those tags and variables.
So things like those noun classes and you know, if intransitive versus transitive verbs matter for this language, being able to say this is a tag we need for it. And now let's be able to search by that tag. Another thing that would be so incredibly helpful is to auto generate noun declensions and verb conjugations. But also with the ability to override for any irregular forms, which would be so helpful for many reasons. And not just for the ability to say okay, what would this verb look like if it's, you know, I am running I ran, I will run versus they ran, they will run etcetera, so not just to be able to see all the forms, but as you're creating a language, having something like this auto generated is immediately going to point out if you need to revisit the phonology where it's like, Oh, I thought these four I'm so at work, they don't play well together now I need to rethink through am I going to add more sound changes am I going to work on this form tweak it to make it. So it works better in this you know declension table or conjugation situation. So that would be amazing. Another thing that would be so cool is to have embeddable audio, video, a video, especially for like David has created sign languages, and so on. So to be able to have the document with embedded video in it would be great. But then audio for the spoken languages to be able to say, how does this sound because as any linguist knows, IPA is great to a point. But vowels especially, are very sneaky. And so be great to be able to hear it. A Unicode font support must, especially if you're working with anything written in a different orthography. And then the ability to just generate and print reports of all kinds. So not just being able to say let's produce the entire dictionary as a document. But again, to be able to say, I want to search for all the verbs in this language. And now can I print that page out? So I just have all these verbs? That would be amazing. Right now, it doesn't exist. So. 
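A few items on the wish list above (user-defined tags, queries across projects, a reverse dictionary that never falls out of sync, and auto-generated forms with irregular overrides) can be sketched in a small amount of code. This is only an illustration of the shape such a tool might take, not an existing app; every name here (`Entry`, `Lexicon`, `decline`, the `demo` language and its words and suffixes) is invented for the example.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Entry:
    language: str
    headword: str
    gloss: str
    tags: frozenset = frozenset()                  # user-defined tags
    overrides: dict = field(default_factory=dict)  # irregular forms by slot


class Lexicon:
    def __init__(self):
        self.entries = []

    def add(self, entry):
        self.entries.append(entry)

    def query(self, language=None, tag=None):
        """Filter entries by language and/or tag; None means 'any'."""
        return [
            e for e in self.entries
            if (language is None or e.language == language)
            and (tag is None or tag in e.tags)
        ]

    def reverse_index(self, language):
        """Build the English-to-language side from the same entries."""
        index = defaultdict(list)
        for e in self.query(language=language):
            index[e.gloss].append(e.headword)
        return dict(index)


def decline(entry, suffixes):
    """Generate regular forms, letting stored irregular forms win."""
    return {
        slot: entry.overrides.get(slot, entry.headword + suffix)
        for slot, suffix in suffixes.items()
    }


lex = Lexicon()
lex.add(Entry("demo", "kata", "dog", frozenset({"noun", "animate"})))
lex.add(Entry("demo", "miru", "run", frozenset({"verb", "intransitive"})))
lex.add(Entry("demo", "solu", "leaf", frozenset({"noun"}), {"plural": "selu"}))

NUMBER_SUFFIXES = {"singular": "", "plural": "ri"}  # hypothetical paradigm

intransitives = lex.query(language="demo", tag="intransitive")
print(len(intransitives))
print(lex.reverse_index("demo"))
print(decline(lex.entries[0], NUMBER_SUFFIXES))
print(decline(lex.entries[2], NUMBER_SUFFIXES))
```

Because the reverse index is computed from the same entries rather than typed in separately, a word cannot appear on one side of the dictionary and be missing from the other, and a count such as "how many intransitive verbs" is always current.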
So, nothing up here does. So students, senior projects, I don't know. You know, think about it. Because in the meantime, think about us and what it looks like when we try to work on multiple projects, and we try to find things, and we have to have all these open. And this is just one of five desktops on David's computer that is open with multiple documents. And so in the meantime, we're stuck doing this. So please help. A plea for help: help your friendly village conlangers. That is what we do. We hope you enjoyed the presentation. Thank you so much for being here. This is "thank you" across a variety of our languages. We appreciate you being here. And if you want to follow us on social media, you can find both of us together on Twitter and Instagram at @langtimestudio. We also run a weekly live stream on YouTube where we create languages live. You can join in on that fun. You can find me at @quothalinguist on both Twitter and Instagram, and you can find David on Instagram at @athdavrazar, which of course means "excellent" in Dothraki.
Now it's time for Campus Connections, the part of the podcast where we connect today's show to other work happening on UMBC's campus. Today, our production assistant Alex is here to translate the complex language of social science research back into plain English for us. What did you come up with this time, Alex?
Thanks, Dr. Anson. This week on Campus Connections, we'll be taking a look at the research of Dr. Renee Lambert-Bretiere, an associate professor of Linguistics and French at UMBC. She played a very big role in hosting the lecture you just heard, so I thought it would be fitting to discuss some of her work. In this case, her work "Relabeling and Word Order: a Construction Grammar Perspective" looks to understand the word order and relabeling of Caribbean Creole, one of six official languages spoken in the Caribbean. Caribbean Creole does not systematically draw from its contributing languages. This makes analyzing its grammar a difficult task. Full disclaimer, the many variations of Creole that are out there are definitely not invented languages. They are languages that changed based on speakers' conventions as they have been repurposed and synthesized over time. However, people who do invent languages can learn a lot from the way Caribbean Creole has evolved because it showcases one of the creative ways that human language has developed. That's all for this installment of Campus Connections. Back to you, Dr. Anson.
Thanks again, Alex, for that excellent summary. You know, I really enjoyed hearing about the linguistics of invented languages, and I hope that today's rebroadcast has stimulated your imagination as much as it has mine. Until next time, par mock, or I mean, keep questioning.
Retrieving the Social Sciences is a production of UMBC's Center for Social Science Scholarship. Our director is Dr. Christine Mallinson, our Associate Director is Dr. Felipe Filomeno, and our production intern is Alex Andrews. Our theme music was composed and recorded by D'Juan Moreland. Find out more about CS3 at socialscience@umbc.edu and make sure to follow us on Twitter, Facebook, Instagram, and YouTube, where you can find full video recordings of recent CS3 events. Until next time, keep questioning.