Load, search, and secure data in multiple formats


Window frame.

(source: Robert on Wikimedia Commons)

In this podcast episode, I talk with Dave Cassel, technical community manager at MarkLogic, the creator of a multi-model NoSQL database that aims to integrate data silos into a unified view. We discuss integration patterns for loading and exporting data with ease, an architecture that enables efficient search and queries, and layers of security that follow the data from its original source throughout its lifecycle.

Work on applications, as soon as you load the data

The idea of ‘load as-is’ is that your data already exists in some form, and that form can vary dramatically. It can be Word documents, XML, or JSON data. It can also be data that you’ve already got in relational databases. If we can take that data in whatever form it currently exists and bring it into the database in that form, then we can start exploring it in the context of that database, rather than having to first build up some schema, some representation of it, do a bunch of ETL work, and only then be able to start working with it.

What that means is as soon as we get the data into the database, we can start actually working on our applications. The application is what actually delivers value—business value—to customers. By getting to work on that faster and being able to iterate on that, we’ve found we’ve got a much better time to value, and our customers have told us that repeatedly.
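To make the ‘load as-is’ idea concrete, here is a minimal sketch in Python. It is not MarkLogic's API: the `AsIsStore` class, `detect_format` helper, URIs, and sample documents are all hypothetical, and the store is just an in-memory dictionary. The point it illustrates is that documents go in unchanged, with no schema defined up front, and can be explored immediately.

```python
import json
import xml.etree.ElementTree as ET


def detect_format(raw):
    """Guess the format of a document without changing it (hypothetical helper)."""
    text = raw.strip()
    try:
        json.loads(text)
        return "json"
    except ValueError:
        pass
    try:
        ET.fromstring(text)
        return "xml"
    except ET.ParseError:
        return "text"


class AsIsStore:
    """Hypothetical in-memory store that keeps every document in its original form."""

    def __init__(self):
        self.docs = {}

    def load(self, uri, raw):
        # No schema, no transformation: record the raw content and its detected format.
        self.docs[uri] = {"format": detect_format(raw), "raw": raw}

    def find(self, term):
        # Naive exploration: case-insensitive substring match over the raw content.
        needle = term.lower()
        return sorted(u for u, d in self.docs.items() if needle in d["raw"].lower())


store = AsIsStore()
store.load("/a.json", '{"name": "Pat"}')
store.load("/b.xml", "<memo><to>Pat</to></memo>")
store.load("/c.txt", "Quarterly notes")
print(store.find("pat"))
```

A real database would index the content rather than scan it, but the workflow is the same: load first, then explore and model as the application takes shape.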

Ask questions of your data, without a complex architecture

Let’s think about a common architecture. You’ve got a three-tier application with a user interface, an application layer that holds your business logic, and a database. Most of the time, with that approach, you need to add on a separate search engine, and that’s how you’re going to search the text part of your data. That means—if we think about that application layer in the middle—the source code there is going to have to go to different places to query data or to search text. Then, it’s going to have to take those results and put them together, and synthesize them in some good way before presenting them to the user.

When you’ve got the search engine built into the database, the application layer has one place to go, and that really simplifies the code you have to write. The application layer itself becomes a lot simpler. If we don’t have that, what we end up with is complexity. Complexity usually leads to two things: longer release cycles and more bugs. By having the search engine built in, it allows you a single place to go for your information, it simplifies that application layer, and makes your application more reliable.
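The difference can be sketched in a few lines of Python. This is an illustration under stated assumptions, not MarkLogic's query API: `SearchableStore`, its `query` method, and the sample documents are hypothetical. It shows one store that maintains a word index alongside the documents, so the application layer makes a single call that combines a text search with a structured filter instead of merging results from two systems.

```python
import re
from collections import defaultdict


class SearchableStore:
    """Hypothetical store with a built-in word index (not a real product API)."""

    def __init__(self):
        self.docs = {}
        self.index = defaultdict(set)  # word -> set of document URIs

    def insert(self, uri, fields, body):
        self.docs[uri] = {"fields": fields, "body": body}
        # The text index is updated as part of the same insert.
        for word in re.findall(r"\w+", body.lower()):
            self.index[word].add(uri)

    def query(self, text=None, where=None):
        uris = set(self.docs)
        if text:
            # Every word in the search text must appear in the document body.
            for word in re.findall(r"\w+", text.lower()):
                uris &= self.index.get(word, set())
        if where:
            # Structured filter on field values, applied in the same call.
            uris = {
                u for u in uris
                if all(self.docs[u]["fields"].get(k) == v for k, v in where.items())
            }
        return sorted(uris)


store = SearchableStore()
store.insert("/e1", {"dept": "sales"}, "Quarterly revenue report for the west region")
store.insert("/e2", {"dept": "sales"}, "Travel policy update")
store.insert("/e3", {"dept": "hr"}, "Revenue sharing and benefits report")
print(store.query(text="revenue report", where={"dept": "sales"}))
```

With a separate search engine, the application code would have to run the text search in one place, the field filter in another, and intersect the two result sets itself; here that join disappears from the application layer.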

Implementing role-based security

With a role-based security model, each user has one or more roles assigned. Those roles determine what that person’s allowed to see, what they’re allowed to modify, and what they’re allowed to execute. Those roles are then applied at the document level, typically, which means that, for a given role, you either can see this document or you can’t see it. That actually translates all the way into the indexes. For instance, if you run a SQL query based on data pulled out from a JSON document, the information that’s in that SQL table is based on the security that was applied to the original document itself. That security information follows the data all the way through, no matter what you do with it.
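Here is a small Python sketch of document-level, role-based visibility. The `SecureStore` class, role names, and documents are hypothetical and do not reflect MarkLogic's actual security API. What it tries to show is the "security follows the data" idea: the table-like `rows` view is derived only from documents the caller's roles can read, so the derived rows inherit the permissions of the source documents.

```python
class SecureStore:
    """Hypothetical store where read permissions are attached to each document."""

    def __init__(self):
        self.docs = {}

    def insert(self, uri, content, read_roles):
        self.docs[uri] = {"content": content, "read_roles": frozenset(read_roles)}

    def visible(self, user_roles):
        # A document is visible if the user holds at least one of its read roles.
        roles = set(user_roles)
        return {
            uri: doc["content"]
            for uri, doc in self.docs.items()
            if doc["read_roles"] & roles
        }

    def rows(self, user_roles, columns):
        # Derived, table-like view built only from documents the caller may read.
        readable = self.visible(user_roles)
        return [
            tuple(content.get(col) for col in columns)
            for _, content in sorted(readable.items())
        ]


store = SecureStore()
store.insert("/emp/1.json", {"name": "Pat", "dept": "sales"}, ["staff"])
store.insert("/emp/2.json", {"name": "Sam", "dept": "hr"}, ["hr"])
print(store.rows(["staff"], ["name", "dept"]))
```

A user with only the `staff` role gets rows from the first document alone; the second document never reaches the derived view, so there is no separate permission check to forget at the query layer.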

Beyond that, we have element-level security. For example, imagine you’ve got an XML document describing an employee. In that document, you’ve got information like the person’s name, their position, and what department they work in, but you also might have some more sensitive information, like compensation; the person’s boss needs to know that information and HR needs to know it. You might also have some information about the person’s benefits or maybe even their use of medical insurance; the benefits team might need to know about that. All of this information is in one document, but by using the element-level security feature, we can assign different types of security to different parts of that document. We can say, ‘This section over here? That’s only visible to the benefits team. This part over here? Only visible to the manager.’ The rest of the document can be accessible to anybody.
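The employee example can be sketched in Python with the standard library's XML parser. Everything here is an assumption made for illustration: the element names, the role names, the `ELEMENT_ROLES` mapping, and the `view_for` function are hypothetical, and the sketch only protects top-level child elements. It shows one document producing different views depending on the reader's roles.

```python
import xml.etree.ElementTree as ET

# Hypothetical employee document.
EMPLOYEE_XML = """<employee>
  <name>Pat Example</name>
  <position>Engineer</position>
  <department>Research</department>
  <compensation>100000</compensation>
  <benefits>plan-a</benefits>
</employee>"""

# Hypothetical rules: element name -> roles allowed to read it.
# Elements not listed here are readable by anybody.
ELEMENT_ROLES = {
    "compensation": {"manager", "hr"},
    "benefits": {"benefits-team"},
}


def view_for(xml_text, user_roles):
    """Return a copy of the document with protected elements removed for this reader."""
    root = ET.fromstring(xml_text)
    roles = set(user_roles)
    for child in list(root):  # copy the child list so removal is safe while iterating
        required = ELEMENT_ROLES.get(child.tag)
        if required is not None and not (required & roles):
            root.remove(child)
    return root


for reader in (["staff"], ["manager"], ["benefits-team"]):
    print(reader, [child.tag for child in view_for(EMPLOYEE_XML, reader)])
```

The document is stored once; the manager sees compensation, the benefits team sees benefits, and everyone else sees only the name, position, and department.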

This post is a collaboration between O’Reilly and MarkLogic. See our statement of editorial independence.
