Lullabot

DrupalCon Vienna Preview

Chris gets the skinny from a number of Lullabots and Drupalize.me team members on their upcoming sessions at DrupalCon Vienna.

Behind the Screens with Joe Shindelar

Chris goes behind the screens with Drupalize.me’s lead developer and trainer, Joe Shindelar. Joe and Chris discuss topics such as learning Drupal in order to teach it to others, how to prepare for a live presentation, what it’s like to deliver a keynote, and advice for new community members who want to get involved. Joe also takes us back in time to his first-ever Drupal session, talks about what he would do if the internet went away, and explains why he has a special place in his heart for the Twin Cities Drupal Users Group. Bonus: Can you guess Joe’s spirit module?

An HTML and JavaScript Client for Elasticsearch

In my last article, Indexing Content from Drupal 8 using Elasticsearch, we saw how to configure a Drupal 8 site and an Elasticsearch server so content changes are pushed automatically. In this article, we will discover how to implement a very simple search engine that only requires HTML and JavaScript. It is intentionally as simple as possible so you can grab the key concepts and then adjust it to your project needs.

The demo in action

Here is a screenshot of the search demo, whose source code is available at this repository:

[Screenshot: the search demo form, with a text field, a document type filter, and a Search button]

The above form contains a text input field that searches for a string among all full-text fields (in our case, the title and the body summary) and a filter by document type (articles or pages). When we click Search, the following query is submitted to Elasticsearch:

{ "size": 20, "query": { "bool": { "must": { "multi_match": { "query": "Melior Vereor", "fields": [ "title^2", "summary_processed" ] } }, "filter": { "term": { "type": "article" } } } } }

In the above query, we are searching for the string Melior Vereor in the title and summary_processed fields, with a boost on the title so that documents matching there show up first in the results. We are also filtering the documents by type, keeping only articles.
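For reference, here is a minimal sketch of how a query like that can be sent from the browser with fetch(); the host name is a placeholder, the index name matches the one used later in this article, and the actual demo code in the repository may structure things differently:

// Minimal sketch: send the search query to Elasticsearch from the browser.
// The host is a placeholder; adjust the index name to match your own setup.
function search(keywords, type) {
  const body = {
    size: 20,
    query: {
      bool: {
        must: {
          multi_match: {
            query: keywords,
            fields: ['title^2', 'summary_processed']
          }
        },
        filter: {
          term: { type: type }
        }
      }
    }
  };

  // This only works once the CORS settings described below are in place.
  return fetch('https://elastic.example.com/elasticsearch_index_draco_elastic/_search', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify(body)
  })
    .then(function (response) { return response.json(); })
    .then(function (result) { return result.hits.hits; });
}

// Usage: search('Melior Vereor', 'article').then(function (hits) { console.log(hits); });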

The best way to discover the Query API in Elasticsearch is by installing Kibana, a web UI to browse, analyze, and perform requests. Here is a screenshot taken while I was building the above query before coding it into JavaScript:

[Screenshot: composing the query in Kibana]

Configuring and securing Elasticsearch

While being able to perform client-side requests is great, it requires some configuration and security settings to restrict access to the Elasticsearch server. Here is what you should do so that only allowed applications (in our case, the Drupal site) can push content, while every other application is only permitted to perform search requests.

For additional settings have a look at the Elasticsearch configuration reference, which is organized by module.

Setting up network access and CORS

Elasticsearch binds by default to the local interface, meaning that it can only be reached locally. If we want to allow external access, we need to adjust the following setting at /etc/elasticsearch/elasticsearch.yml:

# Set the bind address to a specific IP (IPv4 or IPv6):
network.host: [_local_, _eth0_]

_local_ is a special keyword that refers to the local host, while _eth0_ refers to the network interface whose identifier is eth0. I found the interface name by running ifconfig on the server where Elasticsearch is installed.

Next, we need to add the CORS settings so client-side applications like the one we saw above can perform requests from the browser. Here is a configuration snippet, to be appended to /etc/elasticsearch/elasticsearch.yml, that only allows the HTTP methods needed for performing search requests:

# CORS settings.
http.cors.enabled: true
http.cors.allow-origin: "*"
http.cors.allow-methods: OPTIONS, HEAD, GET, POST

Locking the Elasticsearch server

There are a few ways to configure which applications are allowed to manage an Elasticsearch server:

  1. Install and configure the ReadOnly REST plugin.
  2. Use a web server that authorizes and proxies requests to Elasticsearch.
  3. Use a server-side application that acts as middleware, like a Node.js application.

This article covers the first two options, while the third one will be explained in an upcoming article.

Installing and configuring ReadOnly REST plugin

ReadOnly REST makes it easy to define a policy for managing an Elasticsearch server, and it is the option we chose for this demo. We started by following the installation instructions in the plugin’s documentation, and then we added the following configuration to /etc/elasticsearch/elasticsearch.yml:

readonlyrest:
  access_control_rules:
    - name: "Accept all requests from localhost and Drupal"
      hosts: [127.0.0.1, the.drupal.IP.address]
    - name: "Everything else can only query the index."
      indices: ["elasticsearch_index_draco_elastic"]
      actions: ["indices:data/read/*"]

There are two policies above:

  1. Allow all requests coming from the local machine where Elasticsearch is installed and from the Drupal server. This lets us manage indexes via the command line and lets Drupal push content changes to the index that we use in the demo.
  2. Everything else can only query the index. There we specify the index identifier and what actions other applications can perform. In this particular case, just searching.

With the above setup, applications trying to alter the Elasticsearch server will only be able to do so if they comply with the rules. Here is an example where I attempted to create an index against the Elasticsearch server via the command line:

$ curl -i -XPUT 'https://elastic.some.site/foo?pretty'
HTTP/1.1 403 Forbidden
content-type: text/plain; charset=UTF-8
content-length: 0

As expected, the ReadOnly REST plugin blocked it.

Using a web server as a proxy to authorize requests

An alternative approach is to put Elasticsearch behind a web server that performs the authorization. If you need more control over the authorization process than the ReadOnly REST plugin provides, then this could be a good option. Here is an example that uses nginx as a proxy.
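For illustration only, a minimal nginx configuration along those lines might look like the sketch below. The server name, upstream address, and index-name pattern are assumptions; a real setup would also need TLS, logging, and rate limiting:

server {
    listen 80;
    server_name elastic.example.com;

    # Allow read-only search requests through to Elasticsearch.
    location ~ ^/[a-z0-9_]+/_search$ {
        limit_except GET POST OPTIONS { deny all; }
        proxy_pass http://127.0.0.1:9200;
    }

    # Reject everything else (index creation, deletion, bulk updates, and so on).
    location / {
        return 403;
    }
}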

Go search!

You have now seen how to query Elasticsearch from the browser using just HTML and JavaScript, and how to configure and secure the Elasticsearch server. In the next article, we will take a look at how to use a Node.js application that presents a search form, processes the form submission by querying Elasticsearch, and prints the results on screen.

Acknowledgements

Matt Oliveira for introducing me to Kibana and for his editorial and technical review.

The State of Media in Drupal Core

Matt and Mike talk with Drupal Media Initiative folks, Janez Urevc, Sean Blommaert, and Lullabot's own Marcos Cano about getting modern media management in Drupal core.

Lullabots Coming to DrupalCon Vienna

Several of our Lullabots and the team from our sister company, Drupalize.me, are about to descend upon the City of Music to present seven kick-ass sessions to the Drupal community in the EU. There will be a cornucopia of topics presented — from softer human-centric topics such as imposter syndrome to more technical topics such as Decoupled Drupal. So, if you're headed to DrupalCon Vienna next week, be sure to eat plenty of Sachertorte, drink lots of Ottakringer, and check out these sessions that will Rock You Like Amadeus:

Contenta - Drupal’s API Distribution Tuesday, September 26, 10:45-11:45

Sally Young, Cristina Chumillas, and Daniel Wehner

Contenta is a decoupled Drupal distribution that has many examples of various front-ends available as best practices guides. Lullabot Senior Technical Architect Sally Young, Cristina Chumillas, and Daniel Wehner will bring you up to speed on the latest Contenta developments, including its current features and roadmap. You will also get a tour of Contenta’s possibilities through its reference applications, which implement the Out of the Box initiative’s cooking recipe.

Automated Testing 101 Tuesday, September 26th, 10:45 - 11:45

Ezequiel “Zequi” Vázquez

Lullabot Developer, Ezequiel “Zequi” Vázquez, will explore the current state of test automation and present the most useful tools that provide testing capabilities for security, accessibility, performance, scaling, and more. Zequi will also give you advice on the best strategies to implement automated testing for your application, and how to cover relevant aspects of your software.

Get Started with Voice User Interfaces Tuesday, September 26th, 15:45 - 16:45

Amber Himes Matz

Drupalize.me Production Manager & Trainer, Amber Himes Matz, will survey the current state of voice and conversational interface APIs with an eye toward global language support. She’ll cover services including Alexa, Google, and Cortana by examining their distinct features and the devices, platforms, interactions, and spoken languages they support. If you’re looking for a better understanding of the voice and conversational interface services landscape, ideas on how to approach the voice UI design process, an understanding of concepts and terminology related to voice interaction, and ways to get started, this is the right session for you - complete with a demo!

Breaking the Myths of the Rockstar Developer Wednesday, September 27th, 10:45 - 11:45

Juan Olalla Olmo & Salvador Molina

Lullabot Developer, Juan Olalla Olmo, and Salvador Molina will share their experiences and explore the areas and attitudes that can help everyone become better professionals by embracing who they are and ultimately empower others to do the same. This inspiring session aims to help you grow professionally and provide more value at work by focusing on fostering human relationships and growing as people.

Juan gave this presentation internally at Lullabot’s recent Design and Development Retreat. It was a highlight that sparked a lively conversation.

Virtual Reality on the Web - Overview and "How-to" Demo Wednesday, September 27th, 13:35 - 14:00

Wes Ruvalcaba

Want to make your own virtual reality experiences? Lullabot Senior Front-end Developer Wes Ruvalcaba will show you how. Starting with an overview of VR (and AR) concepts, technologies, and uses, Wes will demo and share code examples of VR websites we’ve made at Lullabot. You’ll also get an intro to A-Frame, and Wes will explain how you can get started.

Thursday Keynote - Everyone Has Something to Share Thursday, September 28th, 9:00 - 10:15

Joe Shindelar

We’re especially proud of Drupalize.me's Joe Shindelar for being selected to give the Community Keynote. If you’ve been around Drupal for a while, it’s likely you’ve either met or learned from Joe. In this session, Joe will reflect on 10 years of both successfully and unsuccessfully engaging with the community. By doing so he hopes to help others learn about what they have to share, and the benefits of doing so. This is important because sharing:

  • Creates diversity, both of thought and culture
  • Builds people up, helps them realize their potential, and enriches our community
  • Fosters connections, and makes you, as an individual, smarter
  • Creates opportunities for yourself and others
  • Feels all warm and fuzzy

Making Content Editors Happy in Drupal 8 with Entity Browser Thursday, September 28th, 14:15 - 15:15

Marcos Cano

Lullabot Developer Marcos Cano will be presenting on Entity Browser, which is a Drupal 8 contrib module created to upload multiple images/files at once, select and re-use an image/file already present on the server, and more. In this session Marcos will:

  • Explain the basic architecture of the module, and how to take advantage of its plugin-based approach to extend and customize it
  • Show how to configure it from scratch to solve different use-cases, including some pitfalls that often occur in that process
  • Review what can be copied or re-used from other contrib modules
  • Explore some possible integrations with other parts of the media ecosystem

See you next week in Wien!

Drupal Voices is Back!

Drupal Voices is being revived and rebranded. We have a new name, an improved format, and some amazing interviews with members of the community.

Talking Performance with Pantheon's David Strauss and Josh Koenig

Matt & Mike are joined by Pantheon co-founder and CTO David Strauss, Pantheon co-founder and Head of Product Josh Koenig, as well as Lullabot's own performance expert Nate Lampton to talk everything performance. Topics include front-end performance, server-side PHP, CDNs, caching, and more.

Testing Latency with the Flexible Network Tester

I recently rebuilt my home network with a new router. As we all work from home, we use Slack, Hangouts, and other VoIP services all the time. It’s really important that voice and video calls are reliable and glitch-free. As I was researching the right configuration for my router, I noticed that there were lots of really nice graphs in various blog posts.

It turned out that they were all being generated by Flent, the “Flexible Network Tester.” With Flent, you can easily see how your internet connection behaves under load. When we think of testing a home internet connection, we usually just use a speed test like those from speedtest.net that measure download and upload speeds. While these are great for initial testing, they don’t accurately represent what really matters for most people: latency, how long it takes to send a small bit of data and get a response back. If you’ve ever had to drop a Skype or Hangouts call because of glitches, odds are you’re running into latency spikes, and not low bandwidth. Using Flent requires a bit of setup locally, and a remote server to test against. Here’s how I got Flent up and running on my Mac laptop, testing against an Ubuntu server hosted at Linode.

Netperf and fping setup

Flent is a wrapper around other, lower-level network testing tools. On the server side, it requires the netserver daemon to be running, which is included in Netperf. When I was first testing, I was using an Ubuntu 12.04 server where Netperf was quite out of date and didn’t meet the minimum version required by Flent. Since then, I’ve upgraded to Ubuntu 16.04, and in basic testing Flent works as expected. However, the package runs the daemon by default as root, and is still one point release out of date. Here’s how I compiled the latest version and ran it as a regular user. Note that there is also a public test server at netperf.bufferbloat.net, though I haven’t used it myself.

When compiling software, I like to check out code from source control instead of using tarballs, as it makes it easy to track any changes or new files. Netperf is using Subversion, so I installed that along with the basic compiler tools:

$ sudo apt install build-essential subversion
$ svn checkout http://www.netperf.org/svn/netperf2/tags/netperf-2.7.0
$ cd netperf-2.7.0
$ ./configure && make
$ src/netserver -D # Run netserver without actually installing it

On the client, both Netperf and a compatible ping tool are required. On macOS, Netperf is available in Homebrew, which makes it almost easy to install. Unfortunately, Flent requires that Netperf is compiled with “demo mode” enabled, so we have to edit the formula for it. This gist got me pointed in the right direction. It turns out that demo mode in the Netperf 2.7.0 release has been broken on macOS for some time, so we build from trunk instead. Run brew edit netperf, and add support for HEAD and the --enable-demo flag:

diff --git a/Formula/netperf.rb b/Formula/netperf.rb
index 3ba24764a..daaa27e2a 100644
--- a/Formula/netperf.rb
+++ b/Formula/netperf.rb
@@ -3,6 +3,7 @@ class Netperf < Formula
   homepage "http://netperf.org"
   url "ftp://ftp.netperf.org/netperf/netperf-2.7.0.tar.bz2"
   sha256 "842af17655835c8be7203808c3393e6cb327a8067f3ed1f1053eb78b4e40375a"
+  head "http://www.netperf.org/svn/netperf2/trunk", :using => :svn

   bottle do
     cellar :any_skip_relocation
@@ -15,7 +16,8 @@ class Netperf < Formula
   def install
     system "./configure", "--disable-dependency-tracking",
-                          "--prefix=#{prefix}"
+                          "--prefix=#{prefix}",
+                          "--enable-demo"
     system "make", "install"
   end

Finally, install with brew install --HEAD netperf.

As I was using macOS, I needed to install fping with brew install fping.

Flent itself is written in Python and is available via pip. Since we’re already using Homebrew, we may as well avoid the macOS system Python and its sudo requirements by installing Python with brew install python. Now we can install Flent with pip install flent. Verify it’s working by running flent --help.

Running Tests

Let’s run some tests and generate graphs! The most common test is the ‘realtime response under load’, or ‘rrul’, test. This test runs 4 downloads, 4 uploads, and pings, all at the same time.

[Graph: rrul test results on my connection with the traffic shaper enabled]

$ flent rrul -p all_scaled -l 60 -H myserver.example.com -o graph.png && open graph.png

On my network with the Flow Queuing CoDel traffic shaper enabled, we can see I’m getting around 27 Mbit/s down (6.7 Mbit/s times 4 download streams) and around 3.4 Mbit/s up, with latency no worse than 80ms. In this case, my network was in no way “quiet”: several other applications and computers were using bandwidth at the same time. If Flent had been the only thing using the connection, the graphs would show much less deviation.

To see all of the available tests, look at the configuration files in /usr/local/lib/python2.7/site-packages/flent/tests. They are well documented and offer some really interesting scenarios for this sort of performance testing.

Flent has been an invaluable tool in testing my home network. While it’s great for running active tests, it isn’t a replacement for network monitoring. Next, I hope to build a dashboard to capture historical bandwidth and latency on my network connection.

The Hidden Costs of Decoupling

Decoupled Drupal has been well understood at a technical level for many years now. While the implementation details vary, most Drupal teams can handle working on decoupled projects. However, we’ve heard the following from many of our clients:

  1. We want a decoupled site. Why is this web project so expensive compared to sites I worked on in the past?
  2. Why do our decoupled projects seem so unpredictable?
  3. If we decide to invest in decoupled technologies, what can we expect in return?

Let’s dive into these questions.

Why Can Decoupled Sites Cost More?

Before getting too much into the details of decoupled versus full-stack, I like to ask stakeholders:

“What does your website need to do today that it didn’t 5 years ago?”

Often, the answer is quite a lot! Live video, authenticated traffic, multiple mobile apps, and additional advertising deals all add to more requirements, more code, and more complexity. In many cases, the costs that are unique to decoupling are quite small compared to the costs imposed by the real business requirements.

However, I have worked on some projects where the shift to a decoupled architecture is fundamentally a technology shift to enable future improvements, but the initial build is very similar to the existing site. In those cases, there are some very specific costs of decoupled architectures.

Decoupling means forgoing Drupal functionality

Many contributed modules provide pre-built functionality we rely on for Drupal site builds. For example, the Quickedit module enables in-place editing of content. In a decoupled architecture, prepare to rewrite this functionality. Website preview (or even authenticated viewing of content) has to be built into every front end, instead of using the features we get for free with Drupal. Need UI localization? Content translation? Get ready for some custom code. Drupal has solved a lot of problems over the course of its evolution, so you don’t have to—unless you decouple.

Decoupling is shorthand for Service Oriented Architectures

For many organizations, a decoupled website is their first foray into Service Oriented Architectures. Most full-stack Drupal sites are a single application, with constrained integration points. In contrast, a decoupled Drupal site is best conceived of as a “content service,” accessed by many disparate consumers.

I’ve found that the “black-boxing” of a decoupled Drupal site is a common stumbling block for organizations and a driver behind the increased costs of decoupling. To properly abstract a system requires up-front systems design and development that doesn’t always fit within the time and budget constraints of a web project. Instead, internal details end up being encoded into the APIs Drupal exposes, or visual design is reflected in data structures, making future upgrades and redesigns much more expensive. Writing good APIs is hard! To do it well, you need a team who is capable of handling the responsibility—and those developers are harder to find and cost more.

Scalable systems and network effects

Once your team dives into decoupling Drupal, they are going to want to build more than just a single Drupal site and a single JavaScript application. For example, lullabot.com actually consists of five systems in production:

  1. Drupal for content management
  2. A CouchDB application to serve content over an API
  3. A second CouchDB application to support internal content preview
  4. A React app for the site front end
  5. Disqus for commenting

Compared to the sites our clients need, lullabot.com is a simple site. In other words, as you build, expect to be building a web of systems, and not just a “decoupled” website. It’s possible to have a consumer request Drupal content directly, especially in Drupal 8, but expect your tech teams to push for smaller “micro” services as they get used to decoupling.

Building and testing a network of systems requires a lot of focus and discipline. For example, I’ve worked with APIs that expose internal traces of exceptions instead of returning something usable to API consumers. Writing that error handling code on the service is important, but takes time! Is your team going to have the bandwidth to focus on building a robust API, or are they going to be focusing on the front-end features your stakeholders prioritize?

I’ve also seen decoupled systems end up requiring a ton of human intervention in day-to-day use. For example, I’ve worked with systems where not only is an API account created manually, but manual configuration is required on the API end to work properly. The API consumer is supposed to be abstracted from these details, but in the end, simple API calls are tightly coupled to the behind-the-scenes configuration. A manual set up might be OK for small numbers of clients, but try setting up 30 new clients at once, and a bottleneck forms around a few overworked developers.

Another common mistake is not to allow API consumers to test their integrations in “production.” Think about Amazon’s web services—even if your application is working from a QA instance, as far as Amazon is concerned there are only production API calls available. Forcing other teams to use your QA or sandbox instance means that they won’t be testing with production constraints, and they will have production-only bugs. It’s more difficult to think about clients creating test content in production—but if the API doesn’t have a good way to support that (such as with multiple accounts), then you’re missing a key set of functionality.

It’s also important to think about error conditions in a self-serve context. Any error returned by an API must make clear whether the fault lies in the API itself or in the request made of it. Server-side errors should be wired up to reporting and monitoring by the API team. I worked with one team where client-side errors triggered alerts and SMS notifications. This stopped the client-side QA team from doing any testing where users entered bad data, beyond very specific cases. If the API had been built to validate inbound requests (instead of passing untrusted data through its whole application), this wouldn’t have been a problem.
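As a purely hypothetical illustration, the two cases can be separated with status codes and a consistent error body (the field names here follow the JSON API error-object convention), so a consumer immediately knows whether to fix its request or file a bug with the API team.

A client error, which is the consumer’s fault and spells out what to fix:

HTTP/1.1 422 Unprocessable Entity
{
  "errors": [{
    "status": "422",
    "source": { "pointer": "/data/attributes/published_at" },
    "detail": "published_at must be an ISO 8601 date."
  }]
}

A server error, which is the API’s fault, reported to the API team’s monitoring and returned with only an opaque reference:

HTTP/1.1 500 Internal Server Error
{
  "errors": [{
    "status": "500",
    "detail": "Internal error. Reference: 7f3a9c2b."
  }]
}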

There’s a lot to think about when it comes to decoupled Drupal sites, but working through it is the only way to build decoupled architectures that are scalable, and that lead to faster development. Otherwise, decoupling is going to be more expensive and slower, leaving your stakeholders unsatisfied.

Why are decoupled projects unpredictable?

When clients are struggling with decoupled projects, we’ve often found it’s not due to the technology at all. Instead, poor team structure and discipline lead to communication breakdowns that are compounded by decoupled architectures.

The team must be strong developers and testers

Building decoupled sites means teams have to be self-driving in terms of automated testing, documentation, and REST best practices. QA team members need to be familiar with testing outside of the browser if they are going to test APIs. If any of these components are missing, then sprints will start to become unpredictable. The riskiest scenario is where these best practices are known, but ignored due to stakeholders prioritizing “features.” Unlike one-off, full-stack architectures, there is little room to ignore these foundational techniques. If they’re ignored, expect the team to be more and more consumed by technical debt and hacking code instead of solving the actual difficult business problems of your project.

The organizational culture must prioritize reliable systems over human interactions

The real value in decoupled architectures comes not in the technology, but in the effects on how teams interact with each other. Ask yourself: when a new team wants to consume an API, where do they get their information? Is it primarily from project managers and lead developers, or documentation and code examples? Is your team focused on providing “exactly perfect” APIs for individual consumers, or a single reusable API? Are you beholden to a single knowledge holder?

This is often a struggle for teams, as it significantly redefines the role of project managers. Instead of knowing the who of different systems the organization provides, it refocuses on the what - documentation, SDKs, and examples. Contacting a person and scheduling a meeting becomes a last resort, not a first step. Remember, there’s no value in decoupling Drupal if you’ve just coupled yourself to a lead developer on another team.

Hosting complexity

One of the most common technological reasons driving a decoupled project is a desire to use Node.js, React, or other JavaScript technologies. Of course, this brings in an entire parallel stack of infrastructure that a team needs to support, including:

  • HTTP servers
  • Databases
  • Deployment scripts
  • Testing and automation tools
  • Caching and other performance tools
  • Monitoring
  • Local development for all of the above

On the Drupal side, we’ve seen many clients want to host with an application-specific host like Acquia or Pantheon, but neither of those supports running JavaScript server-side. JavaScript-oriented hosts likewise don’t support PHP or Drupal well, if at all. This can lead to some messy and fragile infrastructure setups.

All of this means that it’s very difficult for a team to estimate how long it will take to build out such an infrastructure, and maintenance after a launch can be unpredictable as well. Having strong DevOps expertise on hand (and not outsourced) is critical here.

Decoupled often means "use a bunch of new Node.js / JavaScript frameworks"

While server-side JavaScript seems to be settling down towards maturity nicely, the JavaScript ecosystem for building websites is reinventing itself every six months. React of today is not the same React of 18 months ago, especially when you start considering some of the tertiary libraries that fill in the gaps you need to make a real application. That’s fine, especially if your project is expected to take less than 6 months! However, if your timeline is closer to 12-18 months, it can be frustrating to stakeholders to see rework of components they thought were “done,” simply because some library is no longer supported.

What’s important here is to remember that this instability isn’t due to decoupling—it’s due to front-end architecture decisions. There’s nothing that stops a team from building a decoupled front-end in PHP with Twig, as another Drupal site, or anything else.

If we invest in Decoupled Drupal, what’s the payoff?

It’s not all doom and decoupled gloom. I’ve recommended and enjoyed working on decoupled projects in the past, and I continue to recommend them in discoveries with clients. Before you start decoupling, you need to know what your goals are.

A JavaScript front end?

If your only goal is to decouple Drupal so you can build a completely JavaScript-driven website front end, then simply doing the work will give you what you want. Infrastructure and JavaScript framework churn are the most common stumbling blocks, and not much else. If your team makes mistakes in the content API, it’s not like you have dozens of apps relying on it. Decouple and be happy!

Faster development?

To have faster site development in a decoupled context, a team needs to have enough developers that each can be an expert in an area. Sure, the best JavaScript developers can work with PHP and Drupal, but are they the most efficient at it? If your team is small and made up of "full-stack" developers, decoupling is going to add abstraction that slows everything down. I’ve found teams need to have at least 3 full-time developers to get efficiency improvements from decoupling. If your team is this size or larger, you can significantly reduce the time to launch new features, assuming everyone understands and follows best development practices.

Multichannel publishing?

Many teams I’ve worked with have approached decoupled Drupal not so much to use fancy JavaScript tools, but to “push” the website front end to be equal to all other apps consuming the same content. This is especially important when your CMS is driving not just a website and a single app, but multiple apps such as set-top TV boxes, game consoles, and even apps developed completely externally.

With full-stack Drupal, it’s easy to create and show content that is impossible to view on mobile or set-top apps. Decoupling the Drupal front end and using the same APIs as every other app forces CMS teams to develop with an API-first mentality. It puts all consumers on an equal playing field, simplifying the development effort of adding a new app or platform. That, on its own, might be a win for your organization.

Scaling large teams?

Most large Drupal sites, even enterprise sites, have somewhere between 5-10 active developers at a time. What if your team has the budget to grow to 30 or 50 developers?

In that case, decoupled Drupal is almost the only solution to keep individuals working smoothly. However, decoupled Drupal isn’t enough. Your team will need to completely adopt an SOA approach to building software. Otherwise, you’ll end up paying developers to build a feature that takes them months instead of days.

Decoupling with your eyes open

The most successful decoupled projects are those where everyone is on board—developers, QA, editorial, and stakeholders. It’s the attitude towards decoupling that can really push teams to the next level of capability. Decoupling is a technical architecture that doesn’t work well when the business isn’t buying in as well. It’s worth thinking about your competitors too—because if they are tech companies, odds are they are already investing in their teams and systems to fully embrace decoupling.

Lullabot Featured as Drupal Expert on Clutch

We were recently approached by Clutch, the Washington, DC-based research firm focused on the technology, marketing, and digital industries, for an interview about Drupal. The interview is centered around how the platform addresses our clients’ challenges, what types of organizations Drupal best serves, features and drawbacks, development, and how to choose the right CMS.

Our company’s president, Brian Skowron, was the lucky participant and had the opportunity to share his insights as part of Clutch’s expert interview series. This series highlights web agencies’ technical wisdom with the goal of helping other companies learn more about different website platforms’ strengths and limitations as they look to build new sites.

Here are a few highlights from the interview:

Drupal's strengths

Drupal is one of the most widely known open-source CMS platforms and a powerful tool for managing large amounts of content. In the interview, Brian explains that Drupal’s main strengths are its flexibility and ability to be customized:

It’s a powerful platform, and it can do just about anything. It’s very feature-rich in terms of its ability to model content, in terms of editorial capabilities, and in terms of its abilities to accommodate customized workflows and permissions governance.

Choosing the right platform

With so many existing and rapidly emerging content channels, knowing what CMS platform would be the best fit is often challenging. It's important to know the right questions to ask because of the complexity involved with a CMS and all of the stakeholders who use, maintain, and manage it.

Many times, at least within the CMS landscape, people are drawn towards demoable features, whatever may be interesting and unique, but, what’s really important in choosing a platform, is how well it will match the needs of the organization, and the publishing needs of websites and various other channels.

Drupal security

Clutch also asked about Drupal’s security capabilities, an increasingly important topic in today’s insecure world. Brian explained that the platform is known as one of the most secure among CMSs. Given its open-source nature, Drupal has a large community of developers behind it ensuring that all security-related issues are resolved to avoid website vulnerabilities. However, he also notes that it’s ultimately up to the end user to keep their website updated and secure.

Unlike working with proprietary vendor software, where the client is reliant on them to give notifications on and patch any vulnerabilities, Drupal has a team which is constantly working on this. Some of our team members at Lullabot are part of that security committee. For the end-user of Drupal, the most important thing is to keep their environment up-to-date and patched. These security releases happen on a routine basis, and, as with any software, there is some responsibility on the user side to patch their system. Otherwise, there is a number of security best practices around securely hosting the site, as well as configuring Drupal such that it’s secure.

Read Brian’s full interview and find out how he rated Drupal on functionality, implementation, and more.

Photo by Alejandro Escamilla

Indexing content from Drupal 8 using Elasticsearch

Last week, a client asked me to investigate the state of Elasticsearch support in Drupal 8. They're using a decoupled architecture and wanted to know how—using only core and contrib modules—Drupal data could be exposed to Elasticsearch. Elasticsearch would then index that data and make it available to the site's presentation layer via the Elasticsearch Search API.

During my research, I was impressed by the results. Thanks to the Typed Data API plus a couple of contributed modules, an administrator can browse the structure of the content in Drupal and select what should be indexed by Elasticsearch and how. All of this can be done from Drupal's admin interface.

In this article, we will take a vanilla Drupal 8 installation and configure it so that Elasticsearch receives any content changes. Let’s get started!

Downloading and starting Elasticsearch

We will begin by downloading and starting Elasticsearch 5, the latest stable release at the time of this writing. Open https://www.elastic.co/downloads/elasticsearch and follow the installation instructions. Once Elasticsearch is running, open your browser and enter http://127.0.0.1:9200. You should see something like the following screenshot:

[Screenshot: JSON response from Elasticsearch with the node name, cluster name, and version]
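If you prefer the command line, the same check can be done with curl; the node name, cluster name, and exact version number below are placeholders and will differ on your machine:

$ curl http://127.0.0.1:9200
{
  "name" : "node-1",
  "cluster_name" : "elasticsearch",
  "version" : {
    "number" : "5.6.0",
    ...
  },
  "tagline" : "You Know, for Search"
}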

Now let’s set up our Drupal site so it can talk to Elasticsearch.

Setting up Search API

High five to Thomas Seidl for the Search API module and Nikolay Ignatov for the Elasticsearch Connector module. Thanks to them, pushing content to Elasticsearch is a matter of a few clicks.

At the time of this writing there is no stable release of Elasticsearch Connector, so you will have to clone the repository, check out the 8.x-5.x branch, and follow the installation instructions. As for Search API, just download and install the latest stable version.
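For example, assuming the standard drupal.org Git URL for the project (double-check the clone URL and current branch on the module’s project page), the steps look roughly like this from your site’s modules directory:

$ cd modules/contrib
$ git clone --branch 8.x-5.x https://git.drupalcode.org/project/elasticsearch_connector.git
$ cd ../..
$ drush en -y search_api elasticsearch_connector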

Connecting Drupal to Elasticsearch

Next, let’s connect Drupal to the Elasticsearch server that we configured in the previous section. Navigate to Configuration > Search and Metadata > Elasticsearch Connector and then fill out the form to add a cluster:

[Screenshot: the Add cluster form in Elasticsearch Connector]

Click 'Save' and check that the connection to the server was successful:

[Screenshot: the cluster listing showing a successful connection]

That’s it for Elasticsearch Connector. The rest of the configuration will be done using the Search API module.

Configuring a search index

Search API provides an abstraction layer that allows Drupal to push content changes to different servers, whether that’s Elasticsearch, Apache Solr, or any other provider that has a Search API compatible module. Within each server, Search API can create indexes, which are like buckets where you can push data that can then be searched in different ways. Here is a drawing to illustrate the setup:

[Diagram: Drupal pushing content through Search API to a server and its indexes]

Now navigate to Configuration > Search and Metadata > Search API and click on Add server:

[Screenshot: the Search API overview page with the Add server button]

Fill out the form to let Search API manage the Elasticsearch server:

[Screenshot: the Add server form in Search API]

Click Save, then check that the connection was successful:

[Screenshot: the server page showing a successful connection]

Next, we will create an index in the Elasticsearch server where we will specify that we want to push all of the content in Drupal. Go back to Configuration > Search and Metadata > Search API and click on Add index:

[Screenshot: the Search API overview page with the Add index button]

Fill out the form to create an index where content will be pushed by Drupal:

[Screenshots: the Add index form in Search API]

Click Save and verify that the index creation was successful:

[Screenshot: the index status page after saving]

Verify the index creation at the Elasticsearch server by opening http://127.0.0.1:9200/_cat/indices?v in a new browser tab:

[Screenshot: the Elasticsearch index listing, including the newly created index]

That’s it! We will now test whether Drupal properly updates Elasticsearch when content changes.

Indexing content

Create a node and then run cron. Verify that the node has been pushed to Elasticsearch by opening the URL http://127.0.0.1:9200/elasticsearch_index_draco_elastic_index/_search, where elasticsearch_index_draco_elastic_index is obtained from the above screenshot:

[Screenshot: the _search response containing the node]

Success! The node has been pushed, but only its identifier is there. We need to select which fields we want to push to Elasticsearch via the Search API interface at Configuration > Search and Metadata > Search API > Our Elasticsearch index > Fields:

[Screenshot: the Fields page for the index]

Click on Add fields and select the fields that you want to push to Elasticsearch:

[Screenshot: selecting which fields to index]

Add the fields and click Save. This time we will use Drush to reset the index and index the content again:

[Screenshot: running the Drush commands to re-index the content]
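The exact commands depend on your Drush and Search API versions; with the Drush integration Search API provides, they look roughly like the following, where my_index stands for the machine name of the index created above:

$ drush search-api-clear my_index   # forget what has been indexed so far (alias: sapi-c)
$ drush search-api-index my_index   # push the tracked content to Elasticsearch again (alias: sapi-i)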

After reloading http://127.0.0.1:9200/elasticsearch_index_draco_elastic_index/_search, we can see the added field(s):

[Screenshot: the _search response now including the selected fields]

Processing the data prior to indexing it

As a bonus, Search API provides a list of processors that alter the data before it is indexed in Elasticsearch. Things like transliteration, filtering out unpublished content, and case-insensitive searching are available via the web interface. Here is the list, which you can find by clicking Processors in the Search API admin interface:

[Screenshot: the list of available processors]

When you need more, extend from the APIs

Now that you have an Elasticsearch engine, it’s time to start hooking it up with your front-end applications. We have seen that the web interface of the Search API module saves a ton of development time, but if you ever need to go the extra mile, there are hooks, events, and plugins that you can use in order to fit your requirements. A good place to start is the Search API’s project homepage. Happy searching!

Acknowledgements

Thanks to:

Eliminating Robots and VoIP Glitches with Active Queue Management

You’re in the middle of a standup giving an update on a particularly thorny bug. Suddenly, the PM starts interrupting, but you can’t understand what they’re saying. You both talk over each other for 20 seconds, until finally, you say “I’ll drop the Hangout and join with my phone”. And, just as you go to hang up, the PM’s words come in clearly. The standup continues.

These sorts of temporary, vexing issues with VoIP and video calls are somewhat normal for those who work online, either at home or in an office. Figuring out the root cause of packet loss and latency is difficult and time-consuming. Is it you or the other end? Your WiFi or your ISP? Or, did your phone start downloading a large update while you were on the call, creating congestion over your internet connection? If we can eliminate one of these variables, it will be much easier to fix latency when it does strike.

Other than throwing more money at your ISP (which might not even be possible), there is a solution to congestion: Quality of Service, or “QoS”, is a system where your router manages congestion, instead of relying on the relatively simple TCP congestion algorithms. At the core, the idea is that the router understands what types of traffic have low latency requirements (like video calls or games), and what types of traffic are “bulk”, like that new multi-gigabyte OS update.

QoS is by no means a new idea. The old WRT54G routers with the Tomato firmware have a QoS system that is still available today. Most prior QoS systems require explicit rules to classify traffic. For example:

  • Any HTTP request up to 1 megabyte gets “high” priority
  • Any DNS traffic gets “highest” priority
  • Any UDP traffic on ports 19302 to 19309 (Google Hangouts voice and video) gets placed in “high” priority
  • All other traffic is assigned the “low” priority

This works well when you have a small number of easily identified traffic flows. However, it gets difficult to manage when you have many different services to identify. Also, many services encapsulate their traffic in common protocols to make it difficult to block or control. Skype is a classic example of this: it intentionally does everything it can to prevent firewalls from identifying its traffic. HTTPS traffic is also difficult to manage. For example, some services use HTTPS for video chat, where latency matters. At the same time, many remote backup services also use HTTPS for uploads, where latency doesn’t matter. Untangling these two flows is very difficult with most QoS systems.

Ideally, DSCP would allow applications to self-identify their traffic into different classes, without needing classification rules at the firewall. In practice, most applications don’t do this, and Windows in particular blocks applications from setting DSCP flags without special policy settings.

A new class of QoS algorithms has been researched and developed in the last decade, known as “Active Queue Management” or AQM. The goal of these algorithms is to be “knobless” - meaning, an administrator shouldn’t need to create and continually manage classification rules. I found this episode of the Packet Pushers Podcast to be invaluable in figuring out AQM. Let’s see how to set up AQM using the “FlowQueue CoDel” algorithm on our home router, and find out how much better it makes our internet connection.

The Software

First, you need to have a router that supports AQM. CoDel has been available in the Linux and FreeBSD kernels for some time, but most consumer and ISP-deployed routers ship ancient versions of either. I’m going to show how to set it up with OPNSense, but similar functionality is available in OpenWRT, LEDE, and pfSense, and likely any other well-maintained router software.
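For context, on a plain Linux router the same idea can be expressed directly with tc: cap the interface just below the measured line rate and let fq_codel manage the queue inside that cap. This is only a sketch; the interface name and rate are assumptions you would replace with your own measurements:

# Shape egress on the WAN interface (eth0 here) to ~97% of a measured 5 Mbit/s upload,
# then let fq_codel manage the queue within that limit.
tc qdisc replace dev eth0 root handle 1: htb default 10
tc class add dev eth0 parent 1: classid 1:10 htb rate 4850kbit
tc qdisc add dev eth0 parent 1:10 fq_codel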

Baseline Speed Tests

It’s important to measure your actual internet connection speed. Don’t go by what package your ISP advertises. There can be a pretty significant variance depending on the local conditions of the network infrastructure.

When testing, it’s important to be sure that no other traffic is being transmitted at the same time. It’s easiest to plug a laptop directly into your cable or DSL modem. Quit or pause everything you can, including background services. If your cable modem is in an inconvenient location, you can add firewall rules to restrict traffic. Here’s an example where I used the IP address of my laptop and then blocked all traffic on the LAN interface that didn’t match that IP.

[Screenshot: the firewall rules used to block all other LAN traffic during testing]

I really like the DSLReports Speedtest. It’s completely JavaScript based, so it works on mobile devices, and includes a bufferbloat metric. This is what it shows for my 30/5 connection, along with a failing grade for latency.

[Screenshot: DSLReports speed test results, with a failing bufferbloat grade]

A good CLI alternative is speedtest-cli, but you will have to run a ping alongside it by hand. I also used flent extensively, which provides nice graphs combining bandwidth and latency over time. It requires a server to test with, so it’s not for everyone. Its real-time response under load (RRUL) test is more intense, testing with 4 up and 4 down flows of traffic, combined with a ping test. We can see how badly my connection falls apart, with almost 1000ms of latency and many dropped packets.

$ flent rrul -p all_scaled -l 60 -H remote-host.example.com -o graph.png

[Graph: rrul results before shaping, with latency approaching 1000ms]

Run your chosen test tool a few times, and find the average of the download speed and the upload speed. Keep those numbers, as we’ll use them in the next step.

If your upload speed is less than 5Mbit/s, you might find that FQ CoDel performs poorly. I was originally on a 25/2 connection, and while the performance was improved, VoIP still wasn’t usable under load. Luckily, my ISP upgraded all their packages while I was setting things up, and with a 30/5 connection, everything works as expected. For slower upstream connections, you might find better performance with manual QoS rules, even though it takes much more effort to maintain.

Setting up FQ CoDel

It might not seem intuitive, but at its core, QoS is a firewall function. It’s the firewall that classifies traffic and accepts or blocks it, so the firewall is also the place to mark or modify the same traffic. In OPNSense, the settings are under “Traffic Shaper” inside of the Firewall section. One important note is that while OPNSense uses the pf firewall for rules and NAT, it uses ipfw for traffic shaping.

In OPNSense, the traffic shaper first classifies traffic using rules. Each rule can redirect traffic to a queue, or directly to a pipe. A queue is used to prioritize traffic. Finally, queues feed into pipes, which is where we constrain traffic to a specified limit. While this is the flow of packets through your network, it’s simpler to configure pipes first, then queues, and finally rules.

1. Configure pipes to limit bandwidth

This is where we use the results of our earlier speed tests. We need to configure two pipes: one for upstream traffic, and one for downstream traffic. While other methods of QoS recommend reserving 15-30% of your bandwidth as “overhead”, with fq_codel around 3% generally works well. Here are the settings for my “upstream” pipe. Use the same settings for your downstream pipe, adjusting the bandwidth and description.

[Screenshot: pipe settings for the upstream pipe]

“ECN” stands for Explicit Congestion Notification, which is a way to let other systems know a given connection is experiencing congestion, instead of relying on just TCP drops. Of course, ECN requires enough bandwidth for upstream packets to be sent reliably, and your ISP would need to respect the packets. I’d suggest benchmarking with it both on and off to see what’s best.

2. Queues for normal flows and high priority traffic

We could skip queues and have rules wired up directly to pipes. However, I achieved lower latency by pushing TCP ACKs and DNS lookups above other traffic. To do that, we need to create three queues.

  1. An Upstream queue for regular traffic being transmitted, with a weight of 1.
  2. A “High priority upstream” queue for ACKs and DNS, with a weight of 10. In other words, give high priority traffic 10 times the bandwidth of everything else.
  3. A Downstream queue for inbound traffic, with a weight of 1.

Don’t enable or change any of the CoDel settings, as we’ve handled that in the pipes instead.

[Screenshot: queue settings]

3. Classification rules

Finally, we need to actually route traffic through the queues and pipes. We’ll use four rules here:

  1. Mark DNS as high priority, but only from our caching DNS server on the router itself. We do this by setting the source IP to our router in the rule.

  2. Mark all upstream traffic to the upstream queue.

  3. Mark upstream ACK packets only to the high priority queue.

  4. Finally, mark inbound download packets to the download queue.

Here's an example for the DNS rule:

[Screenshot: the traffic shaper rule for DNS]

Click “Apply” to commit all of the configuration from all three tabs. Time to test!

Look for Improvements

Rerun your speed tests, and see if latency is improved. Ideally, you should see a minimal increase in latency even when your internet connection is fully loaded. While lower round-trip-times are better, as long as you’re under 100ms VoIP and video should be fine. Here’s what the above rules do for my connection. Note the slightly lower transfer speeds, but a dramatic improvement in bufferbloat.

[Screenshot: speed test results after shaping, with a much better bufferbloat grade]

Here are the flent results - I didn’t block other traffic when generating this, but note how the maximum latency is around 60ms:

[Graph: flent results after shaping, with maximum latency around 60ms]

There is one important limitation to fq_codel’s effectiveness: while it handles most scenarios, there can still be significant latency with hundreds or thousands of concurrent connections over a typical home connection.


If you see latency spikes in day-to-day use, check to make sure a system on your network isn’t abusing its connection. A successor to CoDel, Cake, might handle this better, but in my basic tests using LEDE it didn’t.

It’s clear that CoDel is a dramatic improvement over manual QoS or no QoS at all. I look forward to the next generation of AQM algorithms.

Markdown Won’t Solve Your Content Problems

(This article was cross-posted from Medium.)

Every few weeks I hear from a colleague who’s dealing with the tangles of editorial tools on a web CMS project. Inevitably, someone on their team suggests that things will be easier if users can’t enter HTML at all. “We’ll use Markdown,” they say. “It’s simple.”

On most projects, it’s a terrible idea — and I’m going to rant about it. If you don’t care about the nerdy details, though, here’s the long and short of it:

Markdown turns common "plaintext" formatting conventions like asterisks, indentation, and so on into HTML markup. If you need anything more complicated (say, an image with a caption or a link that opens in a new window), you need to mix Markdown and raw HTML. Markdown is easy to remember for simple stuff (blockquotes, italics, headings, etc.) but more complicated structures require extensions to the standard that are just as tweaky as HTML.
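A quick, hypothetical illustration of where that line falls. The everyday cases read naturally as plain text:

    ## A heading

    Some *emphasized* text, a [link](https://example.com), and `inline code`.

    > A blockquote that still reads fine as plain text.

But the moment you need an image with a caption, or a link that opens in a new window, you are back to raw HTML in the middle of the document:

    <figure>
      <img src="chart.png" alt="Quarterly traffic">
      <figcaption>Traffic by quarter</figcaption>
    </figure>

    <a href="https://example.com" target="_blank">Opens in a new window</a>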

It was designed to mirror the ad-hoc conventions of ASCII-only channels like Usenet, email, and IRC. As creator John Gruber said in his original introduction of the project:

The overriding design goal for Markdown’s formatting syntax is to make it as readable as possible. The idea is that a Markdown-formatted document should be publishable as-is, as plain text, without looking like it’s been marked up with tags or formatting instructions.

Markdown’s strength is that it speeds and simplifies the most common text formatting tasks, and does so in a way that looks correct even before the markup is transformed into visual formatting. Markdown accomplishes that by ruthlessly cutting most HTML structures — anything that can’t be turned into a fairly straightforward ASCII-ism is left behind. When it’s pushed beyond that role, things get just as ugly and error-prone as raw HTML: witness the horrors of Markdown Tables and CSS In Markdown.

In many ways, Markdown is less a markup language and more a way to hide basic formatting information in a plain text document. That’s great! I use Markdown for my Jekyll-powered blog. If your project’s body field needs are simple text formatting without complicated embedding, captioning, microformatting, etc? Markdown is probably going to work fine. But — and this is a big one — if that’s all you need, then using a WYSIWYG HTML editor will also work fine.

WYSIWYG editors aren’t a pain because they “hide the code” from content creators. They’re problematic because they’re often configured to give editors access to the full range of HTML’s features, rather than the specific structural elements they really need to do their jobs. I’ve written about this “vocabulary mismatch” problem before, but it’s worth coming back to.

When you decide to use Markdown, you aren’t just choosing markup that’s easier to read; you're choosing a specific restrictive vocabulary. If that vocabulary covers your editors’ real needs, and they’ll be using plaintext to write and revise stories during their editorial workflow, by all means: consider it!

But if what you really need is a way to rein in chaotic, crappy markup, invest the time in figuring out how markup is actually being used in your content, what design requirements are being foisted on your editors, and what transformations are necessary for real-world usage. Modern WYSIWYG editors don’t have to be the “dreamweaver in a div” disasters they used to be; taking the time to configure them carefully can give your team a clean, streamlined semantic editor that doesn’t constrain them unnecessarily.

Photo by Lee Campbell

What to do if you lose your 2FA device?

If you're following best practices for securing your personal devices and cloud accounts, you’ve likely enabled two-factor authentication. Two-factor authentication, or '2FA' for short, comes from the computer science concept of multi-factor authentication, which requires that a would-be user present at least two separate pieces of evidence to prove they are who they claim to be. These can be something you have (like an ATM card), something you know (like a PIN), or something you are, called "inherence," such as a fingerprint or retina scan.

To use a common example, let’s say you've properly set up 2FA on your Gmail account. To access Gmail, you need your username and password (something you know) and a machine-generated temporary token from an app like Google Authenticator or a text message (something you have). Now if you're a tricksy little hobbit, you probably use long, unique passwords for every application, and those passwords aren't necessarily easy to remember. No problem: you store these passwords in a password management app like LastPass or 1Password and it does the remembering for you. The only password you need to remember is a single, master one to access your password store. Now you're heavily reliant on something you have. But what happens if you don't have it? Like that time I left my phone somewhere between the hotel and LAX security.

In my case, I had an Android phone and stored my passwords using LastPass. I'll describe how I handled that situation and leave it to the reader to use this information to figure out the corresponding steps for their own device or password manager. I've provided links to the iOS equivalents of the Android tools that I use below.

Mother Father!

Losing your phone or tablet can be incredibly stressful, but even more so if it’s your 2FA device—that something you need to have in order to access all the things. I recently managed to lose both an iOS and an Android device within the space of a week. I was able to find my iPad at the Lost & Found at my local airport with a simple phone call. I wasn’t so lucky when I lost my phone.

The Android device was my primary phone and 2FA device. I had just passed through the TSA checkpoint at the airport, put clothes back on, admired the renovations, and walked up the stairs into the gate area when I realized that I didn’t have it with me. Traveling stresses me out in general. Losing my phone did not help. After three deep breaths and an extra-large strawberry parfait to take the edge off, I retraced my steps to TSA. After coming up short with the TSA agent, I had the United customer service representative call the departures check-in area, which also yielded nothing. That meant:

  • Someone had grabbed my phone out of a TSA tray,
  • I’d left it in departures and it hadn’t yet been found,
  • I’d left it in the Uber,
  • Or, I’d left it in the hotel.

It can be difficult to relax and mentally retrace your steps, but that’s key. The last time I clearly remembered using my phone was when I’d ordered the Uber at the hotel and after that, I wasn’t sure.

Luckily, my phone was locked and would require either knowing the lock-screen pattern or cutting off my thumb. My mind instantly leaped to an image of sophisticated cyber-thieves viewing the falafel stains on the glass to derive the pattern that I used. Failing that, they could just place the phone in a Faraday cage while working to decrypt the device. Then my mind helpfully suggested that, having lost two devices in the space of a week, I likely had an aggressive form of dementia. Finally, my mind regaled me with a preview of the embarrassment I would surely experience when I explained to my wife that I had lost another device. “No, not my iPad, that was last week…Yes, seriously. ‘You would lose your head if it wasn’t attached to your body!’” Fortunately, I was saved any such embarrassment due to the fact that I'D LOST MY PHONE and couldn’t call anyone.

I made a quick inventory of my assets. I had one hour till my flight departed. I had my laptop. I had two United Club passes. I burned a United Club pass and used my laptop to get online. Phew, I still had access to Slack, Gmail, and Apple Messages on my laptop. But things got tricky right off the bat.

Use software to try to locate your device

I went into Gmail, clicked on My Account and selected "Sign In and Security," which is what you want if you’ve set up your Android device properly.

Ruh roh. I was prompted for my password, which was in LastPass, which required Google Authenticator, which was on my phone. Now, there’s a simple way to handle this, but we didn’t think of it until afterward. The right thing to do is to calmly ask one of the administrators on your Google account to:

  1. Change your password (and provide you with the new password)
  2. Generate backup codes to use until you can get your 2FA device back
  3. Use the new password and backup code to access Google's Find Your Phone page

Side note: You can even generate a physical print-out of these 2FA backup codes (Google, Dropbox, Slack) and carry them in your wallet. Sadly, I stored all my backup codes as encrypted notes in LastPass. Also, LastPass itself doesn’t allow backup codes for LastPass access.

Turning off 2FA on LastPass

I didn’t think of resetting my Gmail password immediately and instead became focused on trying to restore my access to LastPass, which would also give me access to my Google account.

Now obviously people have lost their phones before, and it should be easy to get access to LastPass without your 2FA device. LastPass has a helpful page describing the recovery process.

I clicked the “I’ve lost my device” link below the Google Authenticator prompt and followed steps to send myself an email. I received the email, clicked on the helpful link, it appeared to work, and then I tried to log in. This yielded perhaps the most frustrating moment of the entire ordeal. After entering my master password, I was prompted to set up Multi-factor Authentication to access my account.

I contacted Ben Chavet, our system administrator, on Slack and proceeded to panic.

undefined

We realized that we had set a policy, organization-wide, that 2FA was required. That policy was overriding my individual requests. Ben needed to turn off the policy for the entire organization before I could get access using my master password. Luckily, I had a level-headed companion to help me figure this out.

In retrospect, LastPass was the wrong focus, but it was still important. Assuming a thief was able to unlock your device, there’s a small chance that your LastPass might be specified as a “trusted” device for 30 days and open automatically. Changing the password prevents that potential vulnerability. In LastPass, you can also disable access by targeting the lost device under Account Settings → Mobile Devices.

If you can't locate, go nuclear…

With LastPass access restored, I was able to get my Google password and backup codes. For some reason, my Android phone was pinging home, but Google Find My Phone was unable to locate it. At this point, Ben and I decided to erase it, and I remotely wiped the device. I was lucky: I had a network connection, and I was able to review the requests that had been made from my phone; there was nothing after Uber. (Android has a more robust feature set than iOS in this regard, since you can see information about any sessions on your phone along with a timeline.) I probably could have tried harder to find the phone, but in the end, my immediate peace of mind was worth more to me than the phone. My wife had called the Uber driver, who said that he didn't find it but he'd had other rides. Uber had a nice interface for reporting lost devices and connecting to your driver, but in this case, it was a dead end.

undefined Get control of your SIM Card

Finally, if you’ve erased your phone, you must deactivate the SIM card with your wireless provider. Remember that many 2FA schemes use SMS messages to send authentication tokens. You definitely don’t want someone to take control of your SIM card. You may be able to do this via your cell service provider's website if you have a laptop but lack a phone, as I did. Have someone send you test text messages and voicemails to verify deactivation.

In Short

In retrospect, while limiting your 2FA app to a single device "limits your attack surface," it also means there's a single point of failure. Authy offers a multi-device "inherited trust" model that lets you protect against lost devices and limit new accounts from being created, while also spreading 2FA across multiple devices to avoid that single point of failure.

Here’s a quick summary of the steps I covered in my personal odyssey with a lost 2FA device.

  • Use Find my Phone if you have access to it on another device.
  • If you don't, retrace your steps and try to find it.
  • Can't find it? You’ll need access to your email first to successfully turn off 2FA on LastPass. An admin on your Google account can provide a new password and backup tokens if you don’t have them.
  • If you use LastPass and have 2FA turned on for the organization, you’ll need to have an account admin turn it off before the email reset will work.
  • Once you have access to LastPass and email, you should be able to access your other accounts. This is also a sobering reality.

Create SEO Juice From JSON LD Structured Data in Drupal

TL;DR:
  • Structured data has become an important component of search engine optimization (SEO).
  • Schema.org has become the standard vocabulary for providing machines with an understanding of digital data.
  • Google prefers Schema.org data as JSON LD over the older methods using RDFa and microdata. Also, JSON LD might be a better solution for decoupled sites.
  • Google provides tools to validate structured data to ensure you’re creating the right results.
  • You can use the Schema.org Metatag module to add Schema.org structured data as JSON LD in Drupal and validate it using Google’s tools.
Why does structured data matter to SEO?

Humans can read a web page and understand who the author and publisher are, when it was posted, and what it is about. But machines, like search engine robots, can’t tell any of that automatically or easily. Structured data is a way to provide a summary, or TL;DR (Too long; didn't read), for machines, to ensure they accurately categorize the data that is being represented. Because structured data helps robots do their job, it should be a huge factor in improving SEO.

Google has a Structured Data Testing Tool that can provide a preview of what a page marked up with structured data will look like in search results. These enhanced results can make your page stand out, or at least ensure that the search results accurately represent the page. Pages that have AMP alternatives, as this example does, get extra benefits, but even non-AMP pages with structured data receive enhanced treatment in search results.

undefined Who is Schema.org and why should we care?

Schema.org has become the de-facto standard vocabulary for tagging digital data for machines. It’s used and recognized by Google and the other major search engines.

If you go to the main Schema.org listing page, you’ll see a comprehensive list of all the types of objects that can be described, including articles, videos, recipes, events, people, organizations, and much much more. Schema.org uses an inheritance system for these object types. The basic type of object is a Thing, which is then subdivided into several top-level types of objects:

  • Thing
    • Action
    • CreativeWork
    • Event
    • Intangible
    • Organization
    • Person
    • Place
    • Product

These top-level Things are then further broken down. For example, a CreativeWork can be an Article, Book, Recipe, Review, WebPage, to name just a few options, and an Article can further be identified as a NewsArticle, TechArticle, or SocialMediaPosting.

Each of these object types has its own properties, like ‘name,' ‘description,' and ‘image,' and each inherits the properties of its parents while adding additional properties of its own. For instance, a NewsArticle inherits properties from its parents: Thing, CreativeWork, and Article. It inherits ‘author’ and ‘description’ from those parents and adds a ‘dateline’ property that they don’t have.

undefined
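
To make the inheritance concrete, here is a minimal, hypothetical NewsArticle in JSON LD (all of the values are placeholders): ‘name’, ‘description’, and ‘author’ come from its parent types, while ‘dateline’ belongs to NewsArticle itself.

<script type="application/ld+json">
{
  "@context": "http://schema.org",
  "@type": "NewsArticle",
  "name": "City Council Approves New Park",
  "description": "A short summary of the story.",
  "author": "Jane Reporter",
  "dateline": "Springfield, June 3, 2017"
}
</script>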

Some properties are simple key/value pairs, like description. Other properties are more complex, such as references to other objects. So a CreativeWork object may have a publisher property, which is a reference to a Person or Organization object.

Further complicating matters, an individual web page might be home to multiple, related or unrelated, Schema.org objects. A page might have an article and also a video. There could be other elements on the page that are not part of the article itself, like a breadcrumb or event information. Structured data can include as many objects as necessary to describe the page.

Because there’s no limit to the number of objects that might be described, there's also a property mainEntityOfPage, which can be used to indicate which of these objects is the primary object on the page.
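
As a small, hypothetical illustration (the URL is just a placeholder), a page whose primary object is an Article could mark that relationship like this:

<script type="application/ld+json">
{
  "@context": "http://schema.org",
  "@type": "Article",
  "headline": "How to Tie a Reef Knot",
  "mainEntityOfPage": {
    "@type": "WebPage",
    "@id": "https://www.example.com/how-to-tie-a-reef-knot"
  }
}
</script>

Any other objects described on the same page, like a video or a breadcrumb, would simply omit mainEntityOfPage.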

What are JSON LD, RDFa, and Microdata, where do they go, and which is better?

Once you decide what Schema.org objects and properties you want to use, you have choices about how to represent them on a web page. There are three primary methods: JSON LD, RDFa, and Microdata.

RDFa and Microdata use slightly different methods of accomplishing the same end. They wrap individual items in the page markup with identifying information.

JSON LD takes a different approach. It creates a JSON array with all the Schema.org information and places that in the head of the page. The markup around the actual content of the page is left alone.

Schema.org includes examples of each method. For instance, here’s how the author of an article would be represented in each circumstance:

RDFa

<div vocab="http://schema.org/" typeof="Article">
  <h2 property="name">How to Tie a Reef Knot</h2>
  by <span property="author">John Doe</span>
  The article text.
</div>

Microdata

<div itemscope itemtype="http://schema.org/Article">
  <h2 itemprop="name">How to Tie a Reef Knot</h2>
  by <span itemprop="author">John Doe</span>
  The article text.
</div>

JSON LD

<script type="application/ld+json">
{
  "@context": "http://schema.org",
  "@type": "Article",
  "author": "John Doe",
  "name": "How to Tie a Reef Knot",
  "description": "The article text."
}
</script>

Which is better?

There are advantages and disadvantages to each of these. RDFa and Microdata add some complexity to the page markup and are a little less human-readable, but they avoid data duplication and keep the item's properties close to the item.

JSON LD is much more human-readable, but results in data duplication, since values already displayed in the page are repeated in the JSON LD array.

All of these are valid, and none is really “better” than the other. That said, there is some indication that Google may prefer JSON LD. JSON LD is the only method that validates for AMP pages, and Google indicates a preference for it in its guide to structured data.

From the standpoint of Drupal’s theme engine, the JSON LD method would be the easiest to implement, since there’s no need to inject changes into all the individual markup elements of the page. It also might be a better solution for decoupled sites, since you could theoretically use Drupal to create a JSON LD array that is not directly tied to Drupal’s theme engine, then add it to the page using a front-end framework.
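
Purely as a sketch of that idea, and not something any existing module provides out of the box, a decoupled back end could return the JSON LD array as one more field alongside the content (the endpoint shape and the jsonld key here are hypothetical):

{
  "title": "Example Title",
  "body": "<p>The article text.</p>",
  "jsonld": {
    "@context": "http://schema.org",
    "@type": "Article",
    "headline": "Example Title",
    "author": "John Doe"
  }
}

The front-end framework would then serialize the jsonld value and write it into a <script type="application/ld+json"> tag in the document head.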

What about properties that reference other objects?

As noted above, many properties in structured data are references to other objects. A WebPage has a publisher, which is either an Organization or a Person.

There are several ways to configure those references. You can indicate the author of a CreativeWork either by using a shortcut, the string name or URL of the author, or by embedding a Person or Organization object. That embedded object could include more information about the author than just the name, such as a URL to an image of the person or a web page about them. In the following example, you can see several embedded references: image, author, and publisher.

<script type="application/ld+json">
{
  "@context": "http://schema.org",
  "@graph": [
    {
      "@type": "Article",
      "description": "Example description.",
      "image": {
        "@type": "ImageObject",
        "url": "https://www.example.com/582753085.jpg",
        "width": "2408",
        "height": "1600"
      },
      "headline": "Example Title",
      "author": {
        "@type": "Person",
        "name": "Example Person",
        "sameAs": [
          "https://www.example-person.com"
        ]
      },
      "dateModified": "2017-06-03T21:38:02-0500",
      "datePublished": "2017-03-03T19:14:50-0600",
      "publisher": {
        "@type": "Organization",
        "name": "Example.com",
        "url": "https://www.example.com//",
        "logo": {
          "@type": "ImageObject",
          "url": "https://www.example.com/logo.png",
          "width": "600",
          "height": "60"
        }
      }
    }
  ]
}
</script>

JSON LD provides a third way to reference other objects, called node identifiers. A node identifier is a globally unique identifier, usually an authoritative or canonical URL, and it is represented in JSON LD using @id. In the case of the publisher of a web site, you would provide structured data about the publisher that includes the @id property for that Organization. Then, instead of repeating the publisher data over and over when referencing that publisher elsewhere, you could just provide the @id property that points back to the publisher record. Using @id, the above JSON LD might look like this instead:

<script type="application/ld+json">
{
  "@context": "http://schema.org",
  "@graph": [
    {
      "@type": "Article",
      "description": "Example description.",
      "image": {
        "@type": "ImageObject",
        "@id": "https://www.example.com/582753085.jpg"
      },
      "headline": "Example Title",
      "author": {
        "@type": "Person",
        "@id": "https://www.example-person.com"
      },
      "dateModified": "2017-06-03T21:38:02-0500",
      "datePublished": "2017-03-03T19:14:50-0600",
      "publisher": {
        "@type": "Organization",
        "@id": "https://www.example.com//"
      }
    }
  ]
}
</script>

How can we be sure that Google understands our structured data?

Once you’ve gone to the work of marking up your pages with structured data, you’ll want to be sure that Google and other search engines understand it the way you intended. Google has created a handy tool to validate structured markup. You can either paste the URL of a web page or the markup you want to evaluate into the tool. The second option is handy if you’re working on changes that aren't yet public.

Once you paste your code into the tool, Google provides its interpretation of your structured data. You can see each object, what type of object it is, and all its properties.

If you’re linking to a live page rather than just providing a snippet of code, you will also see a ‘Preview’ button you can click to see what your page will look like in search results. The image at the top of this article is an example of that preview.

Schema.org doesn’t require specific properties to be provided for structured data, but Google has some properties that it considers to be “required” or “recommended.” If those are missing, validation will fail.

You can see what Google expects on different types of objects. Click into the links for each type of content to see what properties Google is looking for.
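
For example, for Article content Google’s guidelines (at the time of writing) call out properties such as headline, image, the publication and modification dates, author, and publisher; treat that exact list as an assumption and check the current documentation. A minimal Article aimed at those recommendations might look something like this, reusing the placeholder values from the earlier example:

<script type="application/ld+json">
{
  "@context": "http://schema.org",
  "@type": "Article",
  "headline": "Example Title",
  "image": "https://www.example.com/582753085.jpg",
  "datePublished": "2017-03-03T19:14:50-0600",
  "dateModified": "2017-06-03T21:38:02-0500",
  "author": {
    "@type": "Person",
    "name": "Example Person"
  },
  "publisher": {
    "@type": "Organization",
    "name": "Example.com",
    "logo": {
      "@type": "ImageObject",
      "url": "https://www.example.com/logo.png"
    }
  },
  "mainEntityOfPage": "https://www.example.com/example-title"
}
</script>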

undefined How and where can we add structured data to Drupal?

The next logical question is what modules are available to accomplish the task of rendering structured data on the page in Drupal 8. Especially tricky is doing it in a way that is extensible enough to support that gigantic list of possible objects and properties instead of being limited to a simple subset of common properties.

Because of the complexity of the standards and the flexibility of Drupal’s entity type and field system, there is no one-size-fits-all solution for Drupal that will automatically map Schema.org properties to every kind of Drupal data.

The RDFa module is included in core and seems like a logical first step. Unfortunately, the core solution doesn’t provide everything needed to create content that fully validates. It marks up some common properties on the page but has no way to indicate what type of object a page represents. Is it an Article? Person? Organization? Event? There is no way to flag that. And there is no way to support anything other than a few simple properties without writing code.

There is a Google Summer of Code project called RDF UI. It adds a way to link a content type to a Schema.org object type and to link fields to Schema.org properties. Though the module pulls the whole list of possible values from Schema.org, some linkages aren’t possible; for instance, there is no way to identify the title or creation date as anything other than standard values. I tried it out, but content created using this module didn’t validate for me in Google’s tool. The module is very interesting, and it is a great starting point, but it still creates RDFa rather than JSON LD.

The architecture of the Schema.org Metatag module

After looking for an existing solution for Drupal 8, I concluded there wasn’t a simple, valid, extensible solution available to create JSON LD, so I created a module to do it, Schema.org Metatag.

Most of the heavy lifting in Schema.org Metatag is done by the Metatag module. Metatag manages the mapping and storage of the data, allowing you to either input hard-coded values or use tokens to define patterns that describe where the data originates. It also has a robust system of overrides, so you can define global patterns, override some of them at the entity type level or the individual content type level, and even override them per individual item if necessary. There was no reason not to build on that framework, and any site that cares about SEO is probably using the Metatag module already. I considered it an ideal starting point for the Schema.org Metatag module.

The Schema.org Metatag module creates Metatag groups for each Schema.org object type and Metatag tags for the Schema.org properties that belong to that object.

The base classes created by the Schema.org Metatag module add a flag to groups and tags that can be used to identify those that belong to Schema.org, so they can be pulled out of the array that would otherwise be rendered as metatags, to be displayed as JSON LD instead.

Some Schema.org properties need more than the simple key/value pairs that Metatag provides, and this module creates a framework for creating complex arrays of values for properties like the Person/Organization relationship. These complex arrays are serialized down into the simple strings that Metatag expects and are unserialized when necessary to render the form elements or create the JSON LD array.

The primary goal was to make it easily and endlessly extensible. The initial module code focuses on the properties that Google notes as “Required” or “Recommended” for some basic object types. Other object types may be added in the future, but could also be added by other modules or in custom code. The module includes an example module as a model of how to add more properties to an existing type, and the existing modules provide examples of how to add other object types.

Also, there is a patch for the Metatag module to refactor it a bit to make it possible for a decoupled Drupal back end to share metatags with a front-end framework. Since this module is built on the Metatag model, hopefully, that change could be exploited to provide JSON LD to a decoupled front end as well.

This approach worked well enough in Drupal 8 that I am in the process of backporting it to Drupal 7 as well.

Enough talk, how do I get JSON LD on the page?

It’s helpful to understand how Schema.org objects and properties are intended to work, which is the reason for going into some detail about that here. It also helps to figure out ahead of time what values you expect to see when you’re done.

Start by scanning the Schema.org lists and Google’s requirements and recommendations to identify which objects and properties you want to define for the content on your site. If you’re doing this for SEO, spend some time reviewing Google's guide to structured data to see what interests Google. Not all content types are of interest to Google, and Google considers some properties to be essential while ignoring others.

Some likely scenarios are that you will have one or more types of Articles, each with images and relationships to the People that author them or the Organization that publishes them. You might have entity types that represent Events, or Organizations, or People or Places, or Products. Events might have connections to Organizations that sponsor them or People that perform in them. You should be able to create a map of the type of content you have and what kind of Schema.org object each represents.

Then install the Schema.org Metatag module and enable the sub-modules you need for the specific content types on your site. Use this module the same way you would use the Metatag module. If you understand how that works, you should find this relatively easy to do. See the detailed instructions for Metatag 8.x or Metatag 7.x. You can set up global default values using tokens, or override individual values on the node edit form.

In Conclusion

Providing JSON LD structured data on your website pages is bound to be good for SEO. But it takes a while to get comfortable with how structured data works and the somewhat confusing Schema.org standards, let alone Google’s unique set of requirements and recommendations.

No solution will automatically configure everything correctly out of the box, and you can’t avoid the need to know a little about structured data. Nevertheless, this article and the Schema.org Metatag module should enable you to generate valid JSON LD data on a Drupal site.