Entity Disambiguation - the Semantic XRay

Semantic X-Ray LEARNING TO VIEW CONTENT LIKE A SEARCH ENGINE

Transcript of Entity Disambiguation - the Semantic XRay


Overview

In this presentation, we’ll use your example: Maria’s Florist, Monterotondo.

We’d just like to put on record, before we rip it to shreds, that compared to much of the content we view online, the original copy was fair! ☺

By applying our own Web Copywriting guidelines and then subjecting the improved text to further analysis, we’ll build up a template to which your copywriters can adhere for future articles.

Over time, these best practices do become habit rather than the chore they may seem today.

If you find any of the following training ambiguous, please, please ask for clarification.

Remember, the goal is to produce copy that leaves the web’s search indexers in no doubt about the topic of your content, yet present it so that it appears natural and fluent to the untrained eye.

Improbable without training? Yes. Impossible? Absolutely not. Buckle up – it’s a bumpy ride »

Rip it up and start again

DISSECTING MARIA’S FLORIST, MONTEROTONDO

Readability

This is the result of the original analysis, testing your article for NLP.

We have:

» 6 hard-to-read sentences

» 4 adverbs

» 3 overly-complicated phrases

» 4 uses of the passive voice

The Grade 8 readability score owes much to the short sentences (those naming universities, hospitals, funeral homes, etc.).
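To make those counts less mysterious, here is a minimal Python sketch of how such checks might be approximated. It is not the method used by the analysis tool above. The word lists, the 20-word threshold for a "hard" sentence and the function names are all our own assumptions, and the heuristics are deliberately crude.

```python
import re

# Assumption: crude stand-ins for what a readability checker flags.
BE_FORMS = r"(?:am|is|are|was|were|be|been|being)"
# Regular participles end in -ed; a few irregular ones are listed by hand.
PARTICIPLE = r"(?:\w+ed|sent|made|given|written|known|seen|done)"
PASSIVE_RE = re.compile(rf"\b{BE_FORMS}\s+{PARTICIPLE}\b", re.IGNORECASE)

# Words ending in -ly, minus a hand-picked list of common non-adverbs.
ADVERB_RE = re.compile(r"\b\w{3,}ly\b", re.IGNORECASE)
NOT_ADVERBS = {"family", "only", "early", "italy", "reply", "supply",
               "lovely", "friendly"}


def split_sentences(text):
    """Split on whitespace that follows ., ! or ? (no abbreviation handling)."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def readability_report(text, max_words=20):
    """Count long sentences, -ly adverbs and likely passive constructions."""
    sentences = split_sentences(text)
    adverbs = [w for w in ADVERB_RE.findall(text) if w.lower() not in NOT_ADVERBS]
    return {
        "hard_sentences": sum(1 for s in sentences if len(s.split()) > max_words),
        "adverbs": len(adverbs),
        "passive_voice": len(PASSIVE_RE.findall(text)),
    }


if __name__ == "__main__":
    print(readability_report("The flowers were delivered quickly. Maria arranges bouquets."))
```

A heuristic like this will miss some cases and flag others wrongly, so treat the numbers as prompts for a human edit rather than as a verdict.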

From this original content, we can discern the following »

What the document tells Googlebot

From the extracted entities, one could assume that all of the key figures were visible to the search engine. Woo-hoo!

Well, maybe. But that’s only half of the story.

What does your content say about those ‘entities’? »

Concepts – what’s your content about?

Now we’re seeing the bigger picture.

Because of the overwhelming presence of content (by percentage) about universities, the search engine thinks that we’re writing about Graduation.

Flower appears, which is great, although it could do with being more relevant (61% is not great).

However, there are some references that may seem to fit in with the context at first glance, but then we dig deep into DBpedia:

» A Great Way to Care refers to a Hong Kong medical drama;

» Arrangement is musical, rather than a display of flora;

» Pomp and Circumstance Marches is also musical.

Taxonomies – fitting in with a hierarchy

This was the most pleasing aspect of the original document’s content.

When categorising the topics found, the tool’s algorithms identified:

» /shopping/gifts (although not confident)

» /shopping/gifts/flowers

» /travel/tourist destinations/Italy (country identified, but troubled that the content is classified as ‘travel’ – perhaps due to the tone of the ‘flowery’ description ☺)

The Eye of the Beholder

MAKING CONTENT BEAUTIFUL FOR HUMAN AND MACHINE ALIKE

Formatting for scanning on the web

The Internet has given birth to a new type of reader: the scanner.

The way your copy and accompanying images are laid out on the page has never been more critical.

The layout needs to look aesthetically pleasing to the reader, but search engines also read HTML, the code that marks up text, images and hyperlinks and controls how on-page elements are displayed. HTML is also the foundation of the web’s semantic layer.

For the human, there are three key areas of formatting to ensure they, well, read your content:

1. Headings and sub-headings

2. Sentence and paragraph structure

3. Bullet (unordered) or numbered (ordered) lists when summarising numerous benefits and features of a product or service, or its unique elements.
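Because those formatting choices end up as HTML tags, you can check them the way a machine would. The sketch below uses Python's standard `html.parser` module to pull out the headings and count the list items in a page fragment. The class name `OutlineParser`, the helper `outline` and the sample markup are our own illustrative assumptions, not part of any search engine's tooling.

```python
from html.parser import HTMLParser


class OutlineParser(HTMLParser):
    """Collects heading text and counts list items in an HTML fragment."""

    HEADINGS = {"h1", "h2", "h3", "h4", "h5", "h6"}

    def __init__(self):
        super().__init__()
        self.headings = []    # (tag, text) pairs in document order
        self.list_items = 0   # number of <li> elements seen
        self._open = None     # heading tag currently being read, if any
        self._buf = []        # text fragments of the open heading

    def handle_starttag(self, tag, attrs):
        if tag in self.HEADINGS:
            self._open = tag
            self._buf = []
        elif tag == "li":
            self.list_items += 1

    def handle_data(self, data):
        if self._open:
            self._buf.append(data)

    def handle_endtag(self, tag):
        if tag == self._open:
            self.headings.append((tag, "".join(self._buf).strip()))
            self._open = None


def outline(html):
    """Return (headings, list_item_count) for an HTML string."""
    parser = OutlineParser()
    parser.feed(html)
    parser.close()
    return parser.headings, parser.list_items


if __name__ == "__main__":
    # Hypothetical markup, for illustration only.
    sample = "<h1>Maria's Florist</h1><p>Text</p><h2>Wedding flowers</h2><ul><li>a</li><li>b</li></ul>"
    for tag, text in outline(sample)[0]:
        print(tag, text)
```

Reading only the printed headings is a quick test of whether a scanner could grasp the article's point without the body copy.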

Stuffing keywords

Once upon a time, the more often a keyword appeared in your content, the greater the likelihood that it would rank well in SERPs. Add an in-linking template to focus the indexer on those keywords and you could appear in the top 3 results with ease, even in highly competitive niches.

Nowadays, that’s just not so. Old keyword-stuffing and linking practices will see you penalised with an indefinite recovery period.

There are, however, hot-spots, places where keywords are appropriate. The difference is that relevance is based on the human factor, rather than trying to game a search engine:

1. Keyword density: no greater than 2% (max. 14 appearances per 700-word article);

2. Your main keyword should appear in the main heading and one sub-heading;

3. Your keywords should appear naturally, as they would be spoken in conversation.
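The 2% ceiling in point 1 is simple arithmetic: 2% of 700 words is 14 appearances. A minimal Python sketch of that check follows. It assumes density means keyword occurrences divided by total words, which is one common definition but not the only one, and the function names are our own.

```python
import re

WORD_RE = re.compile(r"[\w'’]+")


def keyword_density(text, keyword):
    """Return (occurrences, density), with density = occurrences / total words.

    Matching is case-insensitive and works for multi-word keywords.
    """
    words = WORD_RE.findall(text.lower())
    phrase = WORD_RE.findall(keyword.lower())
    n = len(phrase)
    if not words or n == 0:
        return 0, 0.0
    hits = sum(1 for i in range(len(words) - n + 1) if words[i:i + n] == phrase)
    return hits, hits / len(words)


def max_appearances(word_count, limit_percent=2):
    """Largest whole number of appearances that stays within the limit."""
    return word_count * limit_percent // 100


if __name__ == "__main__":
    print(max_appearances(700))  # 14, matching the guideline above
```

If the count for your main keyword exceeds `max_appearances(len(words))`, that is a cue to rephrase rather than to delete sentences wholesale.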

Composition – handing over the baton

Promotional tone should be used sparingly and only when expressly relevant, for two main reasons:

1. Google has stated categorically that its search engine is an informational highway, not a sales channel for businesses with an Internet presence;

2. Customers do not react well to pressurised sales patter. They need to realise the benefits of using your service/product and feel that they are in control of the decision-making process.

Your copy should also appeal to your entire target audience. This means making it comprehensible for all potential customers, irrespective of their level of education.

Specifically for Maria’s Florist, there are no intellectual barriers to buying flowers. Copy must therefore be accessible to all.

The Flesch Reading Ease scale will help you determine for whom your copy is suitable. With personalised search now so prevalent, it’s possible that Google Search can tailor results to the academic level of the customer if they’re signed into their Google account.

If your copy is deemed too academic, you could lose the part of your audience that is not educated to the level your copy demands.
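For reference, the Flesch Reading Ease score is 206.835 − 1.015 × (words ÷ sentences) − 84.6 × (syllables ÷ words); higher scores mean easier reading. The Python sketch below applies that formula. The syllable counter is our own rough assumption (it counts vowel groups), so expect its scores to differ somewhat from those of dedicated tools.

```python
import re


def count_syllables(word):
    """Rough estimate: count vowel groups, ignoring a trailing silent 'e'."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)


def flesch_reading_ease(text):
    """206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z]+", text)
    if not sentences or not words:
        raise ValueError("text needs at least one sentence and one word")
    syllables = sum(count_syllables(w) for w in words)
    return (206.835
            - 1.015 * (len(words) / len(sentences))
            - 84.6 * (syllables / len(words)))


if __name__ == "__main__":
    print(round(flesch_reading_ease("The cat sat."), 2))
```

Short sentences and short words both push the score up, which is why trimming long sentences is usually the quickest way to make copy accessible to more readers.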

Like your content, but better

USING TOOLS TO BREAK DOWN THE COPY’S COMPONENTS

De-fluffing – keep it relevant

Is all that content necessary?

There’s a reason that fiction writers edit with a hatchet. The reader only wants copy that contributes to the story.

For web copywriting, this practice has become more critical, in the first place because the reader wants the information quickly. But there’s something else.

The more fluff you include, that is, content irrelevant to your message, the more off-topic your article will be in the eyes of the search engine.

In the GIF to the right, we look at how this impacts the original content, along with some other errors sampled from the opening paragraphs.


Comparing apples with core values [1]

[Slide image: YOUR ORIGINAL vs. RE-WRITE]

Comparing apples with core values [2]

[Slide image: YOUR ORIGINAL vs. RE-WRITE]

Let me entity tame you

Nothing pleases us more than a green screen.

Maria’s Florist and Monterotondo are the stand-out entities. They are both also classified correctly.

All other entities are relevant, the nearby locations are all incorporated and, more importantly, all are expressed in a positive light. »

This information is classified

The highest taxonomy is now clearly defined:

/shopping/gifts/flowers

With a score of 0.64 relevance, any doubt about what the content describes has been eliminated.

Also pleasing is the inclusion of:

/marriage/weddings

Although it’s ‘not confident’, a search for wedding flowers Monterotondo (a big money-spinner) will make it so.

Copywriting is a conceptual art

As with the Taxonomy, the Concepts in the article are now crystal clear.

By making the copywriting strong and diversifying with longer-tail keywords, we’ve brought more (relevant) concepts to the table.

No, I’ve got no idea where ‘2005 singles’ fits into the mix, but compare the ‘Relevance’ scores to those of the original document’s concepts: none scores less than 0.52, showing strong alignment with the topic.

Keywords – who gives a stuff? Not us!

Similar to Taxonomies and Concepts, the lowest ‘Relevance’ keyword score is now markedly higher than in the original document.

Even without physically including “Florist Monterotondo”, it’s listed as the top keyword with a 0.94 relevance factor (1.0 being optimum).

We’ve also clarified that ‘arrangements’ refers to flowers, not music.

Plus we see ‘best bouquets’, ‘strong reputation’, ‘skilled florists’ and many more double-barrel keywords becoming highly relevant.

Yes, that’s pleasing.

Summary

Plan of action:

► Look for ways to add value to the topic you’re writing about over and above that which exists online;

►► This will often mean researching the topic and competition before you write your first word;

► Identify your customer’s pain points and provide them with a solution;

► Write the content so that it is readable (accessible) by your entire audience;

► Remove the passive voice, overly-complicated words and adverbs (as per slide 4);

►► “The road to Hell is paved with adverbs”, Stephen King;

► Pick out the main points in the article and craft them into suitable sub-headings;

►► It’s suggested that a reader should be able to grasp the article’s point by scanning the headings only;

► Structure sentences and paragraphs so that readers can scan them;

►► Don’t write huge blocks of text, and use bullet/numbered lists where appropriate;

► Ensure that the subject is always doing something to the object:

►► When you need to send birthday wishes to your wife…, not:

►► When birthday wishes need to be sent by you to your wife…;

► Do avail yourself of SEOWorkers.com Copywriting Guidelines and form them into habits.

So, there we have it. All of the steps you need to disambiguate entities to make Googlebot bow to your will.

It’s not about gaming search engines. It’s not about sell, sell, selling to your human audience.

It is about:

» clarifying your product and service;

» identifying your customer and their pain points;

» and then providing solutions that:

»»» work for the reader, and

»»» Googlebot can associate with their query.

Thank you for your interest.

Get in touch

Thank you so much for seeing this presentation through to the end.

Your host has been Jason Darrell.

On social, you'll find him on:

◦ Google+

◦ LinkedIn

◦ Twitter

◦ Pinterest

You can order your Semantic XRay through this PPH 'Hourlie'

And do check out his F+ daily ezine (free): Freelancer Plus ezine

For more in-depth background to NLP, please see Jason's "Disambiguate Entities" article.