Trevor: Tonight, we are going to start a new segment on the show that looks at the entrepreneurial world and start-ups working in the digital space, which seems to be producing increasingly successful ideas, and surprisingly we might call the segment ‘start me up’. So, my first guest tonight takes us to a very interesting innovation. With today’s technology it seems almost anything is possible.
Even the most laborious tasks can, it seems, be resolved through technology. So, what if someone came along and developed some software that made one of the most arduous parts of your job virtually non-existent and subsequently made the job more productive by freeing up your time? Now I am not talking about hard labor but something that requires considerable brain power and creativity.
Could a program do that? Well, in the case of the real estate industry someone has done just that. His name is James Mackay and he is the co-founder of Realwords. James, thanks for coming in.
James: Hi Trevor, good to be here.
Trevor: In essence what is Realwords?
James: Essentially it is software that generates text for real estate agents’ property ads. It is that simple.
Trevor: Ok. I made it sound really complex but I think it is more complex than that. So how does it work?
James: You are right! It is much more complex than that. We make it look really easy on the user side but in the back-end we have put lots of time and effort into developing an algorithm which works on a process called natural language generation.
Natural language generation is essentially the process of taking data, putting it through some sort of software and generating what you would call English prose at the other end.
It is very interesting how it started; it has been around since the 1960s and it basically started because there were scientists who had lots of very complex data – you can imagine huge spreadsheets – and they thought “well, how are we going to get this data into a form that other people can understand?”
So, they started playing around figuring out how this software could work. It has evolved and natural language generation now essentially takes data and turns it into sentences that a lay person like me could just read and understand.
Trevor: Ok, I like this idea of NLG being a robot that takes input data and produces English prose. I started conjuring up visions of old sci-fi with robots saying things like “does not compute, does not compute”.
James: Well, exactly – it is complicated software! If you try and think about what goes into turning data into English language, which takes in all the variables and then makes it so that someone can understand it, well, if you don’t have it right, what you are trying to generate just won’t make sense.
It might come out as jumbled words and not actually put in structured sentences. The way NLG works is very “scientific magical stuff” but the most important thing is what the end product is. Natural Language Generation software delivers a solution.
There are lots of different applications for this software.
There are three different levels of complexity of NLG software and each has its own purpose.
There is a very basic level, a template level and then there is the complex level that can generate all sorts of things like news stories. I would love to talk about that later. But what our software does is turn the data into real estate ads.
Trevor: Ok, and your software is the application that we are primarily talking about tonight. So are we talking about using complex algorithms or is it just a case of someone like yourself entering a database of words which the program then works off? Like suggestions that fill in the blanks.
James: Right. Well let’s start with the basic one first. Just so you can understand what I am talking about. And remember, this is different to text-to-speech. For example, if you are on your phone, it might read out what you type. That’s in the same vein as NLG but different – that’s essentially a robot reading something out to you.
What we are doing is creating the text that the robot reads. If we think about the weather for instance, we might look out the window and try to forecast. We say “OK we are in Brisbane, the maximum temperature is 24 degrees, the wind is coming from south east.” We take all that data and we can put the items in columns and say “location is Brisbane, maximum temperature is 24, etc”.
Trevor: Yeah and it comes out sounding like this? “Brisbane forecast, 4:20 p.m. eastern standard time on Wednesday 7th of June 2017 for the period, forecast for rest of Wednesday 7th June”.
James: You can see nearly everything that recording said could have been put into a column. We then snatch from columns, like column a, column b, and so on. We take all the bits of data, and turn it into a sentence of very simple words.
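A minimal sketch of this basic level, assuming the forecast sits in named columns. The column names and the `basic_forecast` function are illustrative only and are not taken from Realwords or any weather service:

```python
# Hypothetical column names; the values echo the Brisbane example above.
forecast = {
    "location": "Brisbane",
    "max_temp_c": 24,
    "wind_direction": "south east",
}


def basic_forecast(row):
    """Join the column values into one plain sentence, always in the same order."""
    return (
        f"{row['location']} forecast: maximum temperature "
        f"{row['max_temp_c']} degrees, wind from the {row['wind_direction']}."
    )


print(basic_forecast(forecast))
# Brisbane forecast: maximum temperature 24 degrees, wind from the south east.
```

Nothing here analyses the data; each value is simply dropped into a fixed slot, which is why the result sounds like the recorded forecast.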
Trevor: Individual words basically put together to make it sound like a sentence? So a very basic form.
James: Yeah, and then the next level of complexity is like a template form of natural language generation. A great example of this would be something that only has a few different options, like cricket or baseball.
Because you know it is going to start, it is going to finish, someone is going to bowl, someone is going to bat, and something might happen. There are definite outcomes. Knowing that, we can draw up a template. You can draw up 100 templates. England played Australia at Lord’s – England obviously got thrashed – and it was rained out, etc.
In this example we know certain things can or may happen, and we can write a template for how that will proceed. Then you could take the data (who is playing, the score, who bowled, who batted) and put that into a template.
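The template level might be sketched like this. The two templates, the field names and the score are made up for illustration; a real system could hold a hundred templates and more elaborate selection rules:

```python
# Hypothetical templates keyed by outcome.
TEMPLATES = {
    "result": "{winner} beat {loser} at {venue}, finishing on {score}.",
    "rained_out": "{home} against {away} at {venue} was rained out.",
}


def match_report(match):
    """Pick the template that fits the outcome, then fill in the blanks."""
    key = "rained_out" if match["rained_out"] else "result"
    # str.format ignores keys the chosen template does not use.
    return TEMPLATES[key].format(**match)


washout = {"home": "England", "away": "Australia", "venue": "Lord's", "rained_out": True}
finished = {
    "winner": "Australia",
    "loser": "England",
    "venue": "Lord's",
    "score": "250 for 6",  # illustrative score, not a real match
    "rained_out": False,
}

print(match_report(washout))   # England against Australia at Lord's was rained out.
print(match_report(finished))  # Australia beat England at Lord's, finishing on 250 for 6.
```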
Trevor: So, the program must work out which is the right template or that’s manually sorted and the program fills in the blanks? How does it work?
James: The good software that does this on a high-level basis will just randomise it all, trying to make the commentary make sense as much as possible.
For example it might say well if England’s playing Australia, we might put it in a different sort of location atmosphere because England vs Australia is a little bit better than local teams. The more data that you can put in, the better the results or the more interesting the result that you get. Also, most importantly, the more accurate result.
And that’s where we get to the most complex level, which takes data and then turns that data into sentences.
This may sound ridiculous, but it analyses the data and tries to form sentences based on what the software thinks is going to make the most sense for the person reading it.
Trevor: So, we are talking about artificial intelligence here?
James: That’s exactly what it is: it is artificial intelligence software to help people, and the original reason natural language generation software was developed was to take complex data and turn it into simple language.
But what we have done is taken data and turned that into something that saves real estate agents time. We take the data and we turn it into English language. We describe homes and properties.
The purpose of NLG (natural language generation) for us is to turn the data into sentences as quickly as possible. It is a productivity thing as opposed to a simplification process.
Trevor: Ok so are you creating real estate ads for print or they are for television or what format? What form are these ads in?
James: Right, that’s a good question.
Generally speaking, these days when someone lists their home, they put it on an internet portal, such as realestate.com.au or domain.com.au or whatever your preference. You might put it on all of them.
The problem is, they all have different character limits and if you put it in the newspaper, there is another format that you need to develop a profile for. So when a real estate agent writes these ads, it could take them up to 90 minutes to write these things.
Of course, these are the property profiles that accompany very expensive photos on those online listings. So what happens is the real estate agent gets the photos taken and they sit there and they either write these property profiles – the ads – or they outsource them to a professional copywriter who would charge about $150 to write a profile.
Trevor: So, would you say that in most cases the agents decide not to write the profiles themselves and they outsource it?
James: I wouldn’t say most cases but there are a lot of cases. It’s interesting, I was blown away when I found this out, but the stats show that at any point in time there are approximately three hundred thousand homes for sale in Australia. That’s more than a quarter of a million of these profiles that would be written in probably the last 90 days.
Trevor: Wow, that’s a lot of writing! And real estate agents are not writers, they are sales people. They are selling property.
James: They have much better things to be doing than writing these profiles and of course, if they are writing profiles, they are missing out on trying to sell the home. It is an opportunity cost and what’s really important for these guys is maximising their time and minimising their cost, so that the person selling their home gets the best result possible.
Trevor: You are listening to ABC radio Brisbane in Queensland.
My guest is James Mackay. He is co-founder of Realwords, which is the first startup we have tackled on the new segment ‘start me up’. It is all about startups and the digital world. So, Realwords is focusing on the real estate industry at the moment.
So when you are talking about input for Realwords, it is not just from the agents, it is from the clients. In other words, could you end up with prose that sounds something like this?
Plays audio clip from John Clarke’s character Fred Dagg:
“There are three types of houses: dream homes majestic that are built on cliff faces, private bush built down on halls, and very affordable solid family residences in areas which of course have gun placements and warnings. A cottage is a caravan with the wheels taken off, and breathtaking views are a clear indication that the house has windows, although of course if the view is unique there is probably only the one window”.
Trevor: The late John Clarke of course. Yes, adding very colourful descriptions in real estate. Is Realwords getting to a level where the program can include colourful language and flowery language to sell a property?
James: Interesting question. I speak to a lot of people about Realwords and about writing property profiles. Most people really hate it when real estate agents write overly descriptive or ridiculous profiles, shall we say.
It is best to sit down and figure out exactly what the reader or the potential buyer needs to read about the property. When we started doing this, we did a lot of research into what makes a good profile.
Because we could not just invent software and come up with a result unless we knew what that result needed to be. We did a lot of research into copywriting and we looked up some old instructions and writing tips from ad guru David Ogilvy. He would say things like “80 cents in your dollar is spent on the headline.”
In other words, if you don’t have a good headline then people won’t keep reading, for a start. We looked at that and we said “what does make a good headline?” It turns out that it needs to be ‘camel text’ which means the first letter of each word needs to be capitalised.
It is a little thing but then you move into the next sentence, which needs to be the lead paragraph and that needs to explain what the best asset or what the vendor/real estate agent thinks is going to sell the house.
The best asset could be the kitchen or it could be the location. From there the ad needs to go into sentences and bullet points. We do this because some people respond to bullet points, some respond to blocks of text.
The last section is the summary paragraph and call to action, which is generally something like “if you would like to inspect this home, call Joe Bloggs on this mobile number.” Something like that would be called a call to action.
To summarise, we did a lot of copywriting for real estate agents before we started the software.
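The profile structure James describes (capitalised headline, lead paragraph on the best asset, bullet points, then a call to action) could be assembled roughly as follows. This is a sketch only: the function name, its arguments, the sample property details and the placeholder phone number are assumptions, not Realwords output:

```python
import string


def build_profile(headline, lead, features, agent, phone):
    """Assemble a profile in the order described: headline, lead, bullets, call to action."""
    lines = [
        string.capwords(headline),  # first letter of each word capitalised
        "",
        lead,
        "",
    ]
    lines += [f"- {feature}" for feature in features]
    lines += ["", f"If you would like to inspect this home, call {agent} on {phone}."]
    return "\n".join(lines)


profile = build_profile(
    headline="coastal views from every room",
    lead="The best asset of this home is its sweeping coastal view.",
    features=["Open plan kitchen", "Polished timber floors"],
    agent="Joe Bloggs",
    phone="0400 000 000",  # placeholder number
)
print(profile)
```

Keeping the layout in one function means every generated profile follows the same order, whatever sentences are fed into it.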
Trevor: I have to say you must have gone through a lot of profiles to really understand and get the feeling for what agents and vendors want.
James: Yeah. Hundreds. We wrote a LOT of profiles and the reason we did it was we had an advertising agency. The real estate agents would come to us and ask us to write their property ads at the market rate of $150. As we were writing them, we were thinking that there had to be a better way of putting these things together.
From there we decided to try to automate the process of writing property advertisements.
It took us 18 months from researching it, with one and a half developers working on this full time, to get it right.
I can tell you that the process of doing this is really frustrating. Developing software is like a boat, and a boat is a hole in the ocean into which you throw money. It goes on and on. It is really hard to get complex software right.
We finally got there and we developed it. It’s designed in such a way that when a real estate agent generates a property profile, it generates the prose formatted the way we think an excellent profile should be; it has got the right headline with the bullet points and all of the other factors.
You can see the research played a really important part. Because it is software, you can generally produce a unique profile for every home that you sell. That means an agent shouldn’t get the same repetitive text that you might if you were writing the ads manually.
Trevor: Right. Well that was the thing I was thinking, too. Are all ads going to have that kind of tone that you and your partner would normally use when writing a spot like that? Does it turn out the way you write, or can the software think outside of that and create its own personality, if I can put it that way?
James: Well, the software can only generate profiles based on the data that is given. It might say, for instance, for a first home buyer, it might focus a little bit more on language that might appeal to a first home buyer as opposed to what a retired couple might expect to read.
But it is software, not a reasoned human being. Maybe in future software might be able to but at the moment it won’t develop personality. It just sticks within the boundaries or parameters that you set for it.
Trevor: But there is more to copy than just words and the words making sense. What about tone, for instance, like you know, the tone of an ad. How does it convey that or can it?
James: We designed this to make sure that the output is written in such a way that it is designed to sell the property. When we say “sell” the property, we highlight the benefits. So, the profile might mention some features but we highlight benefits.
Here’s an example: if the home is close to a train station, that’s a feature. What is the benefit of being close to a train station? Well, you don’t spend so much time commuting to work and so on and so forth.
The tone is generally suggesting benefits to the reader. But what is really interesting is how it takes the really disparate information and turns it into something that is not just understandable or sensible for the property it is writing about.
Let’s explore your place for a minute. Just think about your house. What would you say is the best asset of your home? As in, what would be the best-selling feature?
Trevor: Well serene location, views, maybe not so much about the house’s actual location.
James: Ok, so let’s say the best asset is its views. What would you say the view is?
Trevor: Coastal views.
James: Let’s break down coastal views. We might have surf beach, we might have a lake, we might have marina. All of these different things go into a coastal view.
Other views might include bush view, mountain view, tree view. All of these different aspects of view are providing lots of different data points that we have to take into account.
Now let’s think about your kitchen for a minute. How would you describe your kitchen? I will ask you some questions to make it easier.
Trevor: Ok.
James: A kitchen floor might be tiled or slate or marbled or polished timber.
Trevor: Polished timber.
James: Great, same as mine. And do we say your kitchen is galley or open plan or something else?
Trevor: Open plan.
James: What about the bench tops? They might be Caesarstone, granite, marble…
Trevor: Yeah, I think that would be marble or whatever that stuff is. I am not a builder, hah!
James: Great so now we have one room, and already we’ve got lots of different options that we need to try and describe in a sentence or two. Generally, we only have about 200 words in a well-written ad. And here’s an example of the result: the software produces a sentence that might say “gorgeous open plan kitchen, with warm floor boards and a functional bench top.” Ha! Something like that.
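One simple way to picture how form answers like these could become a sentence is a lookup from each answer to a short phrase. The phrase table and the `describe_kitchen` function below are hypothetical and far simpler than the algorithm James describes:

```python
# Hypothetical phrase table mapping each form answer to a short description.
PHRASES = {
    "layout": {"open plan": "open plan", "galley": "galley-style"},
    "floor": {"polished timber": "warm floor boards", "tiled": "easy-care tiles"},
    "bench": {"marble": "a functional bench top", "granite": "a granite bench top"},
}


def describe_kitchen(answers):
    """Turn the tapped answers for one room into a single sentence."""
    layout = PHRASES["layout"][answers["layout"]]
    floor = PHRASES["floor"][answers["floor"]]
    bench = PHRASES["bench"][answers["bench"]]
    return f"Gorgeous {layout} kitchen, with {floor} and {bench}."


trevor_kitchen = {"layout": "open plan", "floor": "polished timber", "bench": "marble"}
print(describe_kitchen(trevor_kitchen))
# Gorgeous open plan kitchen, with warm floor boards and a functional bench top.
```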
Trevor: Well that’s the thing, how does it prioritise? What is good and what isn’t? Because the bench tops might not be a prominent feature. For me they are not.
James: Right! We have spoken quite a lot about how complex it is and how the software produces the end product.
Importantly, from the user’s point of view, it is actually quite easy. The real estate agent just goes to a webpage, goes through a very simple form. It’s designed to be very easy to use on their tablet, on their laptop, or whatever. It has big buttons and it is not like an old fashioned internet form with radio buttons and check boxes.
The agent just taps some buttons to answer questions such as: how many bedrooms does the property have? Obviously, the agent just clicks the number. Then they click a button to describe the best asset of the property – in your case view.
The agent just goes through and just taps a few little buttons. That’s how they fill out the Realwords form. When they have finished inputting all the data and they are happy, they hit “generate profile”.
And when it generates, it sends them an email which includes a Word doc.
The property agent can open the Word doc and tweak it if they think they need to, and then they copy and paste it straight to the online listing.
The very first time the real estate agent uses this software, it might take them 15 minutes to generate a property advert.
But here is the really cool thing: once they have used the software a few times, it learns, so the more you use it the better it gets.
Here’s an example: if you have sold a house for instance in Kenmore and you are selling another house in Kenmore, then you can take the original profile, re-generate it, and add a few options to personalise it – such as changing the number of rooms the house has.
Trevor: So, it takes into account certain characteristics and something that it understands that’s already there. And when there are attributes that are different, you can make some changes.
James: And you can generate that and that might take you less than a minute. And if you don’t like it, you just keep generating them until you get one that you do like.
Trevor: So, James you are talking about this software Realwords being used at this stage for real estate industry, but what other applications do you see being useful? What other industries? You mentioned even journalism at the start of our chat, in what way?
James: Right, it is probably interesting to know that Associated Press uses natural language generation software to produce nearly all its financial reports now. All sorts of complex financial data from a company report from, say, BHP can be input into the system, and then it generates stories.
Trevor: Yeah, so if you see a financial report that looks like it is written by computer, it probably is! Ha!
James: I’d hope you probably wouldn’t be able to tell the difference. Also, ESPN writes all its baseball reports using NLG software. You probably wouldn’t be able to tell the difference.
And one of the things I do with real estate agents when I’m showing them Realwords is I show them a property profile written by a professional copywriter and one written by our software. Then I ask if they can spot the difference. They might get it right, they might get it wrong. But you can’t tell the difference off hand.
Trevor: What other applications can you see beyond that? How might that work in the future?
James: Well, journalism. The beauty of it for journalism is that it cannot replace journalists but it enhances what they do.
Trevor: Ha, lucky!
James: Let’s think about what is happening in Iran right now. We can set up a natural language generation engine for journalism and keep inputting all the data: What’s happening? Where is it? What time was it? How many people are involved?
A journalist could put in these data points and the software can just continue generating a news story as it is happening.
Of course, we have got Twitter with 140 characters, but that doesn’t really tell you too much of the story. If you can set up natural language generation software properly, it can just continue feeding accurate, up-to-date real stories for people to read.
That’s really exciting for journalism. Then we can speak about other functions that take a lot of process writing things. For instance, think of school teachers or university lecturers. They are marking a lot of things. At the moment there is some basic natural language generation for these guys. But if you read it, “Tom was a good boy”, “Tom behaved well”. Blah, it is all very repetitive. Education is an example of an industry that could use it. Another might be job descriptions; they can whip up that sort of information quick smart.
Trevor: It is a worry though, saying that “Tom is a good boy” or “Tom did well” or whatever it is, because it is really impersonal. If the teacher is not really putting in their own genuine thoughts, then it is not a true assessment of each student.
James: Well at the moment you might read that in a report because if you read some student report cards these days, they don’t sound like they are written by a person. They sound like they are printed from Excel or something.
Trevor: Right. And what about something like applications for the sight impaired? Something you could think of having the software working there?
James: This is really interesting. I didn’t come up with this, I heard about it just recently. But I think this is a fantastic way to use natural language generation. It doesn’t exist yet but I expect soon it will. So, do you know what Google Glass is?
Trevor: Yes.
James: Google Glass is essentially goggles with a camera. The camera is not necessarily recording, it is accepting data. When it sees something, it understands it.
Let’s pretend that we give a pair of Google Glasses to a vision impaired person. They walk down the road and they come to the gutter. They look to the right and they cannot see anything, obviously, but their glasses pick up that there is a bus coming. The glasses automatically and instantaneously take the data, being “bus coming”, turn it into text, then turn the text to voice. That goes straight into the person’s ear-piece as “stop, there is a bus coming”.
I mean that’s a fantastic way to think of natural language generation in the future.
Trevor: Fascinating insights into the world of NLG with James Mackay co-founder of Realwords. Thanks for coming in and being part of ‘start me up’ as a digital entrepreneur.
This is ABC Radio Brisbane in Queensland. Latest news is next.