Ethereal Computer Architecture

Saturday 17 December 2016

HBase vs Cassandra: A pointless comparison?

It's easy to find articles comparing HBase and Cassandra and offering an opinion on which is better. There is, not surprisingly, no widespread agreement on this. Let me declare my bias/history up front: I've had extensive experience with Cassandra and some experience with HBase. Most recently, I have been architecting a system that will likely use HBase, but I suppose it could use Cassandra as well.

I won't make you wait until the end of this blog post to get my take on which is better. However, you may not like my conclusion: it really depends on many things, especially on your existing architecture standards and whether you are running one of the major Hadoop distros (such as Cloudera or Hortonworks).

In short, if you are integrating with an existing Hadoop distribution, you probably want to use HBase, given that it is part of the Hadoop ecosystem. If you are not yet convinced of Hadoop's value, and want to experiment with Bigtable-like databases and perhaps distributed frameworks like Spark, you may wish to consider Cassandra instead, because it is still (in my opinion) somewhat easier to get up and running if you haven't already installed Hadoop. You can run Hadoop on top of Cassandra if you want -- however, you should probably look at buying DataStax Enterprise for this. This may not be an option if you are a startup fueled by open source. However, DataStax at one point did have a special program for startups, which could make DataStax Enterprise a viable alternative.

Another consideration is the skill set of your developers. Cassandra is a bit easier for some developers to wrap their minds around because of CQL, an SQL-like language for queries and updates (DML) that is now a standard part of it. CQL also has basic support for collections (lists, sets and maps), which can be useful, as Redis has shown. I have been surprised by the difference that CQL makes to some people. It definitely shortens the learning curve. Personally, I find the HBase put/get Java API to be quite adequate and, to be fair, it is possible to get SQL-like queries for HBase using Phoenix, Impala or Hive, although Hive queries are not quite real time (which may be fine, depending on your application). For some, Cassandra will have a slight edge here because CQL comes with a default Cassandra install.

Neither Cassandra nor HBase has ACID (Atomicity, Consistency, Isolation and Durability) transactions, which is a deal breaker for many developers and architects. Both have some form of row-level atomicity and isolation, which is sometimes enough, because one tends to de-normalize heavily when using them, and often updates that would be spread over multiple tables in a relational database are written to a single row in Cassandra and HBase. Cassandra has something called "lightweight" transactions, which also appear to be on the way for HBase. These haven't been all that useful to me personally, but others may differ. There are various techniques to simulate atomic transactions that span multiple rows, should you need to do so, although I'm not aware of how you would handle isolation in this case.

Both Cassandra and HBase are "closer to the metal" than relational databases. One absolutely needs to work backwards from the desired queries in order to design a proper data model (and knowledge of how they store data in memory and on disk can be extremely helpful). Adding a new query may force you to change your data model or, at the very least, add a new "table" to which you will need to duplicate the data that you are already writing elsewhere. As a result, good, layered design of your code is quite necessary. You will also need to consider the layout of your data on disk and/or how often your data gets flushed to disk. It is often very useful to add more nodes to ensure that your important data stays in memory until it is no longer frequently accessed. Reading data and then immediately updating it is frowned upon (especially in Cassandra), so sometimes you need to work around this need.
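To make the "work backwards from the queries" point a little more concrete, here is a minimal sketch in Python. It does not use Cassandra or HBase at all: the two dictionaries are hypothetical in-memory stand-ins for wide-row tables, and the table names, function names and sample articles are all my own inventions for illustration. The idea it shows is that each query gets its own table, and one write function duplicates the data into every table.

```python
# Hypothetical in-memory stand-ins for two wide-row tables. Each maps a
# row key to a dict of {column key: value}; columns are sorted on read,
# loosely mimicking how wide-row stores keep columns ordered within a row.
articles_by_author = {}   # serves the query "articles by this author"
articles_by_day = {}      # serves the query "articles published on this day"


def save_article(article_id, author, day, title):
    """Write the same article to every table that serves a query.

    Keeping the duplication in one function means that adding a third
    query later only requires a change in this one place.
    """
    articles_by_author.setdefault(author, {})[(day, article_id)] = title
    articles_by_day.setdefault(day, {})[(author, article_id)] = title


def titles_for_author(author):
    """Single-row read: every column of one row, ordered by column key."""
    row = articles_by_author.get(author, {})
    return [row[key] for key in sorted(row)]


def titles_for_day(day):
    """Single-row read against the second, duplicated table."""
    row = articles_by_day.get(day, {})
    return [row[key] for key in sorted(row)]


save_article(1, "ann", "2016-12-16", "Compaction basics")
save_article(2, "bob", "2016-12-16", "Tuning memtables")
save_article(3, "ann", "2016-12-17", "Row keys revisited")

print(titles_for_author("ann"))
print(titles_for_day("2016-12-16"))
```

Neither read function scans or joins anything; each one fetches exactly one row by its key. That is the trade being made: writes do more work so that every supported query is a cheap lookup.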

To some extent, both Cassandra and HBase are data stores that try to maximize horizontal scalability, concurrency and reliability (through enabling redundant servers which have redundant copies of data). Relational database features (such as joins and ACID transactions) that make any of these things hard are generally missing or their use is discouraged. Quite often, your need for these features will be the determining factor in whether you choose Cassandra or HBase as opposed to a relational database such as MySQL, PostgreSQL, Oracle, MSSQL, etc. I would argue that the results of this choice will probably have more impact than the choice between Cassandra and HBase.

Monday 21 October 2013

Using Enterprise Architecture at a Media Company -- Part three: TOGAF

What is TOGAF?

TOGAF stands for "The Open Group Architecture Framework". But I suppose that really doesn't tell you much about what it is. I like to think of TOGAF as analogous to a big box home improvement store, such as Home Depot, Lowe's or Rona (Canada only). You go to a home improvement store when you need tools or materials for building something or for a home improvement project. You might not get everything you need there and you may need to adapt some of the materials and tools to work properly in your home.

Similarly, if you are trying to construct or improve an Enterprise Architecture, you can go to the TOGAF "store" and get at least some of the materials and especially the guidance that you need. You don't need to get everything there, as you may be using other tools such as the Zachman framework. You will also likely need to adapt what you get from TOGAF to your needs, just like you adapt the materials you buy at a home improvement store to your home.

Big box home improvement stores can be somewhat daunting places. It is sometimes hard to find what you need, because the stores contain so much merchandise. In my opinion (and the opinion of others that I have talked to) TOGAF is similar. It contains megabytes of text that makes suggestions and recommendations about how to do an Enterprise Architecture. In my opinion, there are a lot of good ideas/guidance there, but it can be a bit overwhelming, especially at first. From talking to others, some people give up on it because TOGAF seems like far too much effort, especially for a small to medium size business. Also, many Enterprise Architecture groups simply cannot get the necessary corporate buy-in to implement a large part of TOGAF as a single "big bang" type project.

Luckily, just like you don't need to buy everything in your local Home Depot in order to fix your house, you also don't need to use every word of TOGAF to build your enterprise architecture. TOGAF actually mentions this and spends quite a bit of time talking about how it must be integrated into existing initiatives (like project and portfolio management).

In this blog entry, I will give a really high level view of TOGAF, and then, in the next entry, discuss how it was used in "the Sphere", a fictitious medium-sized media company that has a small architecture team: one enterprise architect and five solutions/information architects (some of whom are only part time architects).

A Fairly Quick, High Level View of TOGAF

I like to think of TOGAF as having two main parts: the Architecture Development Method (ADM), and the enterprise continuum. The ADM is a recommended set of phases for the development process of an Enterprise Architecture. The enterprise continuum is divided into two smaller parts: the architecture continuum and the solutions continuum. Each of these contains building blocks (architecture building blocks and solutions building blocks, respectively), ordered from the most generic to the most specific. The four categories, from the most generic to the most specific, are: foundation, common, industry and organization. You can populate the architecture and solutions continuum with your own architecture and solutions building blocks, essentially creating a customized library of re-usable components. This is a great way to encourage consistency between projects. You can also get building blocks from industry reference models, and you can use the TOGAF technical reference model as an extremely generic starting point. TOGAF recommends that you use some sort of repository tool to store all the artifacts/documents/models that you produce.
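If it helps to see the continuum as a data structure, here is a rough Python sketch of a tiny building block library. TOGAF does not prescribe any such code: the class names, the "kind" field and the sample building blocks are all assumptions of mine. Only the four category names and their generic-to-specific ordering come from the description above.

```python
from dataclasses import dataclass
from enum import IntEnum


class Level(IntEnum):
    """The four continuum categories, most generic first."""
    FOUNDATION = 1
    COMMON = 2
    INDUSTRY = 3
    ORGANIZATION = 4


@dataclass(frozen=True)
class BuildingBlock:
    name: str
    kind: str      # "architecture" or "solution" (illustrative field)
    level: Level


def continuum(blocks, kind):
    """Return the blocks of one kind, ordered from generic to specific."""
    selected = [b for b in blocks if b.kind == kind]
    return sorted(selected, key=lambda b: (b.level, b.name))


# Hypothetical library contents, deliberately listed out of order.
library = [
    BuildingBlock("Subscriber billing process", "architecture", Level.ORGANIZATION),
    BuildingBlock("TOGAF TRM", "architecture", Level.FOUNDATION),
    BuildingBlock("Message queue cluster", "solution", Level.COMMON),
    BuildingBlock("Media asset reference model", "architecture", Level.INDUSTRY),
]

for block in continuum(library, "architecture"):
    print(block.level.name, "-", block.name)
```

A real repository tool would hold far more metadata per building block, but the ordering idea is the same: when you look for something to reuse, you start with the most specific blocks that fit and fall back to the more generic ones.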

Keep in mind that there are megabytes and megabytes of documents describing TOGAF, so we really are just scratching the surface in this blog entry. Much of the TOGAF documentation contains suggested ways to use the components of TOGAF. The user of TOGAF is free to use as much of this advice as he or she wishes.

The Architecture Development Method (ADM)

The TOGAF phases (source: Stephen Marley/NASA/Sci)

The ADM is divided into ten phases, each of which is further divided into steps and sub-steps. With the possible exception of the Preliminary phase (at the top of the diagram), TOGAF assumes that we constantly iterate through the phases in order to produce architectures. Although the iteration looks like it starts in phase A and continues sequentially to phase H, new requirements may force us to go back one or more phases and revise our work (which is why there is a "requirements" circle in the middle of the diagram). Realistically, we may also need to revisit a phase because we forgot something. After revisiting a phase, we may need to revisit the subsequent phases to deal with the effects of the changes we made.

Preliminary and Architecture Vision Phases

The preliminary phase is essentially the initiation of the Enterprise Architecture effort within an organization. The organization needs to decide to what extent they are going to use Enterprise Architecture, how it impacts the org chart and how it impacts business planning, operations management (technical systems and development) and project management. The organization needs to create and agree on basic architecture principles (such as "only one authoritative source for each data element", "avoiding overlap and duplication of applications", and "systems should be designed to operate in spite of component failures") as well as a customized version of TOGAF, which will likely evolve over time. Identifying a place to store architecture documents (such as Sharepoint, Confluence, Wikis, etc) is also desirable and some architecture tools/methodologies (ERDs, UML, etc) may be chosen.

The Architecture Vision phase is the start of an iteration of the architecture process. Depending on how one sets up TOGAF, it may be triggered by a "request for architecture", which can be part of the project management/project initiation process in an organization. The architecture vision phase needs to first identify the stakeholders and should ensure that their concerns and requirements will be addressed. With this information, and the architecture principles defined in the previous iteration (realistically, you might need to go back and add some principles), TOGAF recommends that you attempt to define the scope of the required architecture activities and create rough business, data, application and technical architectures. Although this may seem obvious to some, you should also make sure that the project isn't beyond the capabilities of the architecture team and that any necessary business transformations are possible. TOGAF recommends that you attempt to achieve consensus among various stakeholders before proceeding in order to avoid the (not unknown) scenario where you finish the architecture project and it is never accepted. Finally, you should produce an architecture vision document which contains at least the preliminary architectures and requirements, and may contain a resource plan, KPIs, milestones, etc. This should be approved by a sponsor and/or architecture board before proceeding to the next phase.

Architecture Phases

Common steps for all architecture phases

In TOGAF, the three architecture phases are broken down into the same series of steps (which are actually done somewhat iteratively):

Select reference models, viewpoints, and tools
Develop baseline architecture
Develop target architecture
Perform gap analysis
Define candidate roadmap components
Resolve impacts across the architecture
Conduct formal stakeholder review
Formalize the architecture
Create architecture definition document

The first step is basically the preparation step for the architecture phase. It asks us to consider whether there are any well known generalized models for what we are doing. For example, if you are architecting a call centre, you may wish to use the One-VA technical reference model. You should also determine the relevant stakeholders' viewpoints for the architecture you are doing, the models you need to cover them and whether any of the architectural principles apply. You also need to decide what tools you want to use (e.g. Visio, Rational Rose, etc).

The first step is usually broken into the following sub-steps, some of which you may not need to do in detail or may wish to skip:

Determine the overall modelling process, selecting models needed to support all the views/viewpoints.
Identify possible building blocks from the architecture repository.
Identify the matrices you want to build (or are required).
Identify the diagrams that you will need.
Identify the types of requirements that you will need to collect.
Select appropriate services, ideally using combinations of services from the TOGAF TRM.

The second step is to describe the existing architecture, at least to the extent necessary to show how the target architecture will change things. Where possible, existing building blocks should be used for this. It's possible that all or part of the existing architecture was done as part of a previous architectural iteration.

The third step is to build enough of a target architecture to show how things will change in order to accomplish the architecture vision. Sometimes the third step is done before the second. Where possible, existing building blocks should be used for this step, but it is quite possible that you may need to define new building blocks, which will need to be added to the architecture repository and placed in the architecture continuum.

The fourth step is to perform a gap analysis. This means that you look for missing parts in the baseline and target architectures (and presumably fill them in) as well as look for the gaps between the baseline and target. You also look for conflicts between views of the system and resolve them by examining tradeoffs.
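The "gaps between the baseline and target" part of this step can be sketched as a simple set comparison. The Python below is only an illustration under my own assumptions: the function name, the three result categories and the sample building blocks are hypothetical, and a real gap analysis also involves judgment that no set operation captures.

```python
def gap_analysis(baseline, target):
    """Compare the building block names of two architectures.

    Returns the blocks carried over unchanged, the blocks that must be
    built or bought, and the blocks that disappear. Each eliminated
    block should be checked to confirm it was dropped on purpose.
    """
    baseline, target = set(baseline), set(target)
    return {
        "carried_over": sorted(baseline & target),
        "new": sorted(target - baseline),
        "eliminated": sorted(baseline - target),
    }


# Hypothetical baseline and target architectures for a media company.
baseline_blocks = {"Legacy CMS", "Ad server", "Subscriber database"}
target_blocks = {"Headless CMS", "Ad server", "Subscriber database", "Paywall service"}

for category, blocks in gap_analysis(baseline_blocks, target_blocks).items():
    print(category, blocks)
```

Anything that shows up under "eliminated" without a corresponding decision is a candidate for an accidental omission in the target architecture, which is exactly the kind of missing part this step is meant to catch.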

The fifth step is to create a roadmap that takes you from the baseline to the target architecture. The roadmaps built in the architecture phases can often be used in the migration planning phase.

The sixth step is to consider whether the target architecture will impact any other architectures. You may be removing building blocks that are used in other architectures, or perhaps adding capabilities that will be useful elsewhere.

The seventh step is to review the architecture with the stakeholders (possibly the ones whose viewpoints you considered in step 1) and make sure they agree you are accomplishing the architecture vision and they approve of how you are doing it.

The eighth step is to finalize the architecture, essentially filling in any necessary details in architectural deliverables (such as building blocks) and ensuring requirements traceability for what you've proposed.

The ninth and last step is to create the architecture definition document, using the deliverables, and to make sure it goes into the architecture repository.

Business Architecture Phase

Like all architecture phases, the business architecture phase goes through the steps given above. Given that you are working on a business architecture, it is important to keep the business drivers in mind. As with all architectures, you also need to make sure you are considering all stakeholders, including business owners and users.

A key activity in the business architecture phase is modelling the baseline and target architectures. You can use a lot of different models for this, including data flow diagrams (where the processes are business processes), entity-relationship diagrams (or class diagrams in UML -- the entities or classes would represent high level business entities), information exchange matrices (and node connectivity diagrams) showing what information is exchanged between business groups, use cases and structured analysis (which breaks down key business functions and assigns them to organizational units). It's also possible to use UML sequence diagrams to document business processes, so you might want to use them as one of your business architecture models.

One of the activities in the first step in any architecture phase is investigating whether there are any industry reference models for what you are doing. The U.S. Department of Veterans Affairs has many such models available, as do industry groups like the TeleManagement Forum. You should also look internally for reusable building blocks (essentially little bits of business architecture). For example, you might be able to re-use architectures for things like invoice processing and approval.

Information Systems Architecture Phase

The Information Systems Architecture phase includes data architecture and applications architecture. Data architecture is often done first, but not always. In some situations, the two are done almost concurrently because changes in one tend to result in changes in the other. As with the business architecture phase, the basic architecture steps (given above) are followed.

Data Architecture

As with all architecture phases, we need to start by considering the business drivers for architectural change and determining who the stakeholders are (don't forget the operations team or the internal/external auditors). We also need to consider how any new data we introduce will be created, distributed, migrated, secured, and archived. If the gap between the baseline and target data architectures is very large, we will need to make sure we plan a data migration as part of the implementation effort.

As with all architecture phases, we will use models to represent the target and baseline architectures. These models should represent the viewpoints of all the stakeholders. The changes in the target models should ideally be traceable back to the business architecture and/or the business requirements. Where possible, we should start by considering any industry reference architectures that may exist (such as ARTS for retail, and Energistics for the petroleum industry). Probably the most basic data model is the list of data elements. From there, we can model the data using an ERD or UML class diagram. ERDs and class diagrams can be done at various levels of detail, including conceptual, logical and physical. It may make sense to do all three in this phase, although a really high level ERD or class diagram (which may essentially count as the conceptual model) may have been done in the business architecture phase. There are a number of other models which may be useful to cover viewpoints, including data flow diagrams (logical and physical). You may also wish to write documents relating data elements to business entities, business functions and access restrictions, and covering any interoperability requirements for data (such as data formats for interchange between components).

Application Architecture

The application architecture follows the steps given above for all architecture phases, and considers the baseline applications and how they must change, based on the business and usually the data target architectures in order to arrive at a target application architecture.

You can use a UML component diagram to model all applications and their flows (for both baseline and target architectures). You may also wish to create application portfolio catalogs and application migration documentation if the target architecture requires large changes to the set of applications.

Technology Architecture Phase

Technology architecture covers the hardware and physical infrastructure and may also deal with "software infrastructure" such as databases and software which implements services from the TOGAF Technical Reference Model (TRM). Like the other architecture phases, the technology architecture phase follows the steps common to all architecture phases, which are outlined above. The starting reference architecture for this phase is generally the aforementioned TRM.

It's possible to use a UML component diagram or a Visio network diagram to draw the hardware/software infrastructure for the baseline and target technology architectures, being sure to remember to document locations and communications/network requirements between components. It may also be useful to produce documentation that ties the abstract services of the physical and technical infrastructure to those of the TOGAF TRM (see above). When you need to add new technology, you should ideally follow existing technical standards, where they are applicable. For any new technology that you wish to add, quite a few factors need to be considered (and, of course, should ideally be put into the target architecture), including performance, maintainability, location (and resulting network latency) and availability requirements. You will also need to do sizing, costing, capacity planning and migration planning. New technology might also be subject to your organization's technical governance processes (for example, PCI, audit compliance, etc).

Solutions and Opportunities Phase

In this phase, we go through the baseline and target architectures that we previously created, decide what needs to be done to bridge the gaps (and if it can in fact be done), and then create a preliminary plan which is finished in the next phase.

When we examine the gaps in the various architectures, we will need to look at whether business priorities will impose constraints on what we would like to do. For example, if we would like to replace a call centre, but opening new stores has a higher business priority, we may need to come up with an alternative to replacing the call centre. We will take this sort of thing into account as we consolidate the gaps from the architecture phases and then examine each one in order to make decisions on how the gap should be addressed. These decisions will then be documented (possibly in an "implementation factor assessment and deduction matrix") as we go.

As we are examining the gaps, we may notice common requirements that span several business functions. These should be consolidated (or "factored out") to make sure we only address them once. As we determine solutions, we may end up considering applications that will need to interoperate. We need to make sure that we keep track of these interoperability requirements and deal with them by changing input/output specifications or introducing adaptor components.

We next need to look for dependencies between our gap-bridging solutions so that we can determine the order in which the solutions can be built, determine possible deliverable dates, and so that we can start grouping the solutions into work packages. As we continue to look more closely at the solutions, we always need to ask whether it is within the organization's capabilities to implement them. If not, we will need to find other solutions. We also need to decide if a solution is completely new ("greenfield"), directly obsoletes existing systems ("revolutionary"), or gradually changes existing systems ("evolutionary"). It's sometimes also valuable to identify the "quick win" solutions and distinguish them from the ones achievable in the middle and longer term. Quick win solutions can be helpful to show that you are making progress in an implementation, although not everything can be a quick win. Besides classifying solutions, we also need to determine whether the components of our baseline architecture will continue to exist in the new architecture, will be gradually phased out or will be replaced as a result of the current effort.
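Working out a build order from dependencies is a topological sort, so here is a small sketch of it using Python's standard graphlib module (Python 3.9 or later). The function name build_waves and the sample solutions are hypothetical; TOGAF itself says nothing about how you compute the ordering. Each "wave" is a group of solutions that have no unmet dependencies and could, in principle, be grouped into the same work package or built in parallel.

```python
from graphlib import TopologicalSorter


def build_waves(depends_on):
    """Group solutions into waves that can be built in order.

    depends_on maps each solution to the set of solutions it needs
    first. Every solution in a wave depends only on earlier waves.
    prepare() raises graphlib.CycleError if the dependencies are circular.
    """
    sorter = TopologicalSorter(depends_on)
    sorter.prepare()
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())   # sorted for stable output
        waves.append(ready)
        sorter.done(*ready)
    return waves


# Hypothetical gap-bridging solutions and their dependencies.
solutions = {
    "customer portal": {"identity service", "billing API"},
    "billing API": {"identity service"},
    "reporting": {"billing API"},
    "identity service": set(),
}

for number, wave in enumerate(build_waves(solutions), start=1):
    print("wave", number, wave)
```

A circular dependency between solutions is worth knowing about early, since it usually means two solutions need to be merged into one work package or an interface between them needs to be redefined.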

Once we know what we wish to do (aka the work packages) we need to determine whether we will move from baseline to the target architecture in a single step or whether we will need to plan intermediate steps or transition architectures. Ideally, transition architectures should deliver some (business) value or else they may be hard to justify.

Once we know what our transition architectures will be (or if we are going to use them at all), then we create the initial versions of the migration documents that we will refine in the next phase: the architecture roadmap, and the migration and implementation plan. We also update the architecture vision document, the architecture definition document and the architecture requirements as necessary.

Migration Planning Phase

In the migration planning phase, we finalize the migration documents we started in the previous phase. The migration phase essentially completes the architecture activities for the current iteration of the ADM.

The first step of the migration phase is to determine how/if the changes necessary to implement the target architectures will affect project/portfolio management, business planning and operations management. It may make sense to have one of these three management frameworks deal with some of the changes rather than the enterprise architecture process. It's also possible that the people who govern these frameworks may want modifications to the work packages, so it is best to find out.

Next, we try to assign a business value to each work package, considering ROI, strategic fit, or ultimate value to the business (value chain analysis). This analysis should ideally help get the work packages approved and is a good double check on whether the architecture is aligned with business objectives. Critical success factors can also be defined so that the success in implementing the work package can be measured.

The third step is to figure out what resources (such as people) are required to do each work package, how long it will take and determine whether or not the resources required can be made available for the required time.

We then try to prioritize the work packages based on cost-benefit and risk and then get the stakeholders to agree to the prioritization.
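As a starting point for the conversation with stakeholders, the prioritization can be roughed out with a simple score. The sketch below is purely illustrative: the formula (benefit discounted by risk, divided by cost), the field names and the numbers are all my own assumptions, not something TOGAF defines. The actual priorities still need to be agreed with the stakeholders.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WorkPackage:
    name: str
    benefit: float   # estimated business value, in arbitrary units
    cost: float      # estimated cost, in the same units
    risk: float      # 0.0 (safe) to 1.0 (very likely to fail)


def score(package):
    """Risk-adjusted benefit per unit of cost (an illustrative formula)."""
    return package.benefit * (1.0 - package.risk) / package.cost


def prioritize(packages):
    """Return the work packages ordered from highest to lowest score."""
    return sorted(packages, key=score, reverse=True)


# Hypothetical work packages with made-up estimates.
packages = [
    WorkPackage("Replace call centre", benefit=900, cost=600, risk=0.5),
    WorkPackage("Single sign-on", benefit=300, cost=100, risk=0.2),
    WorkPackage("Data warehouse", benefit=500, cost=250, risk=0.25),
]

for package in prioritize(packages):
    print(package.name)
```

The large call centre replacement ends up last here despite having the biggest benefit, because it is both expensive and risky. A scoring scheme like this is mainly useful for making such trade-offs visible before the stakeholders debate them.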

With all of the above information, we can finalize the architecture roadmap, update the architecture definition document (basically the baseline and target architectures) and generate the implementation and migration plan.

At this point, we are done with the architecture activities of the current iteration of the ADM.

Implementation Governance Phase

Often, we move from the baseline to target architectures in a series of intermediate steps, called transition architectures. In this phase, we monitor the implementation work that takes us from baselines to target architectures, possibly monitoring each transition architecture implementation as a separate step. Generally, there is some sort of formal or informal review process (such as a steering committee that meets periodically) to make sure the implementation is proceeding as planned and conforms to the transition or target architectures. During the implementation it is important to prepare any necessary changes to business processes or operations processes and make sure these are in place when the implementation is complete. Once the implementation is complete, it makes sense to do a "lessons learned" session.

Architecture Change Management Phase

Architectures will need to change -- that is a given. Technology is always changing and so are business priorities. The former means that sometimes a better solution comes along, while the latter can mean that the solution you architected is no longer needed or needs to change significantly. Architecture change requests may be received through the architecture governance process and may possibly originate from operations management (possibly because a solution is not performing as it should) or from business process management. We might need to make changes because of the need to reduce costs, a certain technology becoming unsupported or because we have decided to standardize. Sometimes the required change may be small and doable very quickly without significant (or perhaps any) re-architecture work. It's also possible that the requested change is already accounted for in a transition architecture that hasn't yet been implemented. Other times, it may trigger a "request for architecture" and we may need to iterate through the ADM.

Assuming that an enterprise architecture is being used to try and realize value in an organization, we need to monitor how well it is meeting business and operational objectives and make changes where there are problems or gaps between what is desired and what is being delivered. We also need to consider the effects of new technologies, which may make it possible to better meet requirements or meet them more cheaply. We should watch carefully for changes in business strategies to make sure that the enterprise architecture continues to meet the needs of the organization. Otherwise, we risk having an Enterprise Architecture that was perfect for the business strategy two years ago, but not now.

When we or someone else discovers a problem or gap, we or they need to prepare an architectural change request. This request needs to be analyzed to determine the risks of implementing it and to determine how well a solution will fit with the current enterprise architecture. It's also important to determine whether any service level agreements (SLAs), or the business value currently being delivered by the existing system, will be affected. We need to propose changes to the change request if necessary in order to mitigate risks, ensure that SLAs are met and ensure that systems continue to provide business value.

Once all of this is done, we need to hold a meeting with the architectural council (or the appropriate governing body, depending on how the change needs to be handled) to get their feedback, buy-in, and hopefully approval. Assuming we get approval, then we need to initiate the process to implement the change, possibly starting another ADM iteration, if the change warrants it.

Requirements Management

Requirements management sits in the centre of the TOGAF ADM diagram because it operates continuously during all the phases. Essentially, it contains all the activities that collect, organize and get approval for requirements. The changes that result from the requirements are done in the other (outer) phases. Because requirements management runs continuously, it is hard to describe it as a phase. Therefore we will refer to its steps as belonging to the "requirements management activity". We will refer to the outer circles in the ADM diagram as the "outer ADM phases".

The outer phases in the ADM (especially the architecture phases) identify new requirements and then these are conceptually passed to the requirements management activity, where they are prioritized, approved by the necessary stakeholders and then put into a repository. At the same time that this is occurring, the ADM phase that identified the requirement(s) will modify or add to the requirements it is considering, noting the priorities determined in the requirements management activity and possibly changing them again. The requirements management activity then updates its repository as necessary, communicates the changes to stakeholders and architects, gets their buy-in, deals with conflicts with other requirements and prepares a requirements impact assessment. The current outer ADM phase is then responsible for determining the impact of the requirements on its activities and deciding whether to revisit earlier outer phases or to defer the requirements to a later outer ADM phase. As always, if the requirements change, the requirements management activity needs to update the requirements repository and try to get stakeholder and architect buy-in.

The requirements management activity also needs to note any requirements changes that occur during the architectural change management phase. These requirements will often (if significant enough) be resolved in a subsequent iteration of the ADM (i.e. by starting at the preliminary phase and going through all the phases again).

Tuesday 8 October 2013

Enterprise Architecture and Conflict Resolution

It happens to all Enterprise Architects sooner or later. Someone doesn't agree with your architecture and they would rather implement a solution in a way that doesn't match the current technical road map. Sometimes the resulting discussions can be quite civilized and profitable for both parties. However, to make that more likely, and to prevent a disagreement from escalating beyond where it should, here are some ideas that I have found useful.

When you are first told that someone wants to deviate from your carefully thought out architecture, don't get defensive. I think this is quite natural when the objection to your ideas comes from someone more senior than you in the company and it may happen regardless. You don't need to immediately defend your work. It's best to say something like, "That's an interesting idea, let me think about it a bit" and then go somewhere else and do just that.

Likewise, back away if you feel yourself getting angry. Showing anger at someone else's ideas when they happen to conflict with your own is not going to help you.

In some organizations, an Enterprise Architect can "pull rank" and simply inform people that disagree with them that they will follow the Enterprise Architecture. Giving people orders is probably not the best way of dealing with the situation. You may get public compliance but they could be resentful and may look for a way to circumvent you at the first opportunity.

Let the people who disagree with you know that you value their input and that you are glad that they feel strongly enough about enterprise architecture to come talk to you. Set up a time (usually not right away -- the next day or later would be good, unless the matter is of utmost urgency) to discuss their concerns in more detail.

When the meeting starts, let them do most of the talking. You should stick to asking questions, pointing out areas of actual agreement (sometimes this works really well) and admitting to any mistakes that you might have made that actually led to the disagreement. I know the last of these is hard to do, but sometimes when one side in a disagreement admits mistakes, the other side makes concessions as well. The key thing is that you are not trying to "win" the argument. You are trying to bring the sides together.

When they are done presenting their points, don't argue with them. Thank them again for caring enough to come to you and tell them that you will get back to them as soon as you can. Then, go away and think carefully about what they said.

You need to think carefully about the degree of disagreement. Is this an issue where you need to go to the wall (i.e. the CIO/CTO) and insist on getting your way? Keep in mind that could be damaging in the long run, but it is sometimes necessary. Is there any middle ground that is reasonable? Is it possible to use some sort of objective test to decide between the alternatives? Is this an issue that should go to the Architecture Review Board? Normally, if you don't have enough support at the Architecture Review Board on the issue in question, then perhaps you should consider accommodating the request.

If you do have to oppose the request, try to refer to the organization's architecture principles when doing so. You might also want to give the person who disagrees with you a chance to address the architecture review board and make their case. You should find out if it is possible for them to do a proof of concept for their idea. The key thing is that you need to make sure that they do not suffer anything resembling humiliation as a result of disagreeing with your architecture. Many times, the person opposing you will be a valued member of the organization. Keep in mind that they may be more valued than you are :-) and treat them the way you would expect to be treated in their shoes.

Sunday 29 September 2013

My Definition of Enterprise Architecture

Someone recently asked me to define what an Enterprise Architect is. I'm usually pretty good with definitions, but this time I was stuck. I went to Google and found out I wasn't the only one.

After some thought, here is my attempt at a definition:

Enterprise architecture is a practice concerned with:

Business - technology alignment
Disciplined Innovation (innovation where it is needed)
Disciplined delivery (not re-inventing the wheel; consistency with past efforts; repeatability)
Proactive solutions (proposing solutions when systems no longer support needed business capabilities)

I hope to have time to elaborate more on this a bit later.

Thursday 19 September 2013

Using Enterprise Architecture at a Media Company (part two, Zachman framework)

As mentioned in part one, the Zachman framework is a taxonomy for organizing architecture artifacts. In this blog post, we will discuss how we can use the Zachman framework to guide our thinking about what needs to change in the Sphere's (our imaginary media company) architecture to accommodate a metered paywall.

If you are not familiar with the Zachman framework, the Wikipedia entry is a good place to start. This article also provides an interesting take on Zachman. Please note that all the information about the Zachman framework used below was either taken from publicly available sources or from discussions with other enterprise architects. The information about the Sphere's metered paywall system is based on an actual implemented system, with some simplifications.

Please note that this blog entry is a bit of a work in progress. I'm hoping to improve it a bit and perhaps add some diagrams. Hopefully it is not too long for people to read.

Perspectives, Fundamental Questions and Paywalls

The Zachman framework is conceptually a grid, whose cells represent types of architectural artifacts (e.g. written documentation, diagrams, models, etc). You don't need to create artifacts to fill in all the cells if it is not useful. However, I find it is helpful to think about all the cells to try and figure out if we are forgetting some implication of a business requirement on our architecture.

The rows in the grid are the perspectives from which the architecture is viewed, or alternatively the stakeholders involved in getting something planned and built. The generic names for the perspectives are: planner, owner, designer, builder, subcontractor, and enterprise. It is Zachman's assertion that these perspectives/stakeholders exist no matter if you are architecting a company, product, building or a software system. Typically, for software systems, we use the following more specific terms: scope, business model, system model, technology model, detailed implementation, functioning enterprise. It's a bit counter-intuitive, but the final row ("functioning enterprise") represents the completed product or software system and therefore does not contain any architectural artifacts.

The columns in the grid represent fundamental questions that need to be answered for each perspective/stakeholder. The columns are often labelled as what, how, where, who, when, why. Again, for enterprise architecture projects there is a more specific labeling (which, in my opinion, doesn't completely make sense for all perspectives): data, function, network, people, time and motivation. I think it is useful to remember both sets of labels for the fundamental questions, as sometimes the label from one set is more intuitive than the label from the other for a given perspective.

In the sections below, we will run through each of the cells, starting with the top row and considering the columns in the order: what, how, where, who, when, why. Ordering the columns this way is done only to make this blog entry easier to understand. I am not trying to break the Zachman rule that "columns have no order".
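To make the grid a little more concrete, here is a minimal Python sketch of it as a dictionary keyed by (perspective, question). The names PERSPECTIVES, QUESTIONS and walk_cells are my own illustrative choices and are not part of the Zachman framework. The final "functioning enterprise" row is left out because it holds no artifacts.

```python
from itertools import product

# Software-specific row labels from the text; the sixth row
# ("functioning enterprise") is omitted because it holds no artifacts.
PERSPECTIVES = [
    "scope",
    "business model",
    "system model",
    "technology model",
    "detailed implementation",
]

# Column labels in the reading order used in this post.
# The framework itself gives the columns no order.
QUESTIONS = ["what", "how", "where", "who", "when", "why"]


def walk_cells():
    """Yield every (perspective, question) cell, one row at a time."""
    yield from product(PERSPECTIVES, QUESTIONS)


# Artifacts are optional: a cell with nothing useful in it simply stays absent.
artifacts = {("scope", "what"): "list of entities known to the company"}

for cell in walk_cells():
    if cell in artifacts:
        print(cell, "->", artifacts[cell])
```

Walking every cell, even the ones with no artifact, mirrors the way the sections below use the grid as a checklist rather than as a set of mandatory documents.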

The Planner or Scope Perspective

Let's start by considering how our paywall requirement affects the architectural artifacts in the first or top row of the Zachman framework. The top row represents the planner or scope perspective and answers the fundamental questions from the point of view of a business planner or a project manager during project pre-planning. This row might also be useful in agile methodologies to orient the agile team prior to the first sprint. I guess it could also be argued that, in agile, this knowledge would be had by the product owner, or perhaps the leader of all the product owners.

The "What" or "Data" fundamental question

The first cell in the top row is the "what" or "data" cell. Let's assume that the artifact in this cell is the list of entities known to the company. The Sphere has not previously had digital subscriptions to its website. Therefore, we need to add "digital subscriber" to the list of entities known to the company.

The "How" or "Function" fundamental question

The second cell in the top row is the "how" or "function" cell. Let's assume that the artifact in this cell is a list of business processes. We are going to need new business processes to manage the digital subscriptions and do billing. We also need some sort of business rule that determines when non-digital subscribers will be blocked from seeing an article.

The "Where" or "Network" fundamental question

The third cell in the top row is the "where" or "network" cell. Let's assume that the artifact in this cell is a list of places that the Sphere does business. The new paywall probably doesn't change this.

The "Who" or "People" fundamental question

The fourth cell in the top row is the "who" or "people" cell. Let's assume the artifact in this cell is a list of the organization units of the company and the company's business partners. The Sphere has an internal development team to write the paywall software. The Sphere's existing credit card billing service can handle billing the new digital subscriptions, although this should be confirmed. The Sphere's SAP Team within the IT department has also confirmed they can handle the billing through SAP. However, it appears that the Sphere's call centre is not adequately staffed to handle complaints from digital subscribers. Therefore, we must consider whether we are going to expand the call centre or outsource customer care for the new digital subscribers. After the business planner consults the executive sponsor for the paywall project, he/she informs the enterprise architect that a new organization needs to be added to the artifact in this cell: a call centre outsourcer.

The "When" or "Time" fundamental question

The fifth cell in the top row is the "when" or "time" cell. Let's assume that the artifact in this cell is a list of all the cycles (or repeated processes and recurring deadlines) in the Sphere's business. There are two basic cycles implied by business processes defined in the "how" cell: the cycle that controls how frequently non-digital subscribers who get blocked from seeing articles are unblocked and the cycle that controls how often digital subscribers will be billed. Let's discuss the second cycle a bit further. The business planner consults with the SAP architect (because SAP is used for billing at the Sphere) and discovers that the existing print subscriber billing cycle could be used for digital subscriptions but that a print subscriber would need a separate digital subscription if subscribed to both print and digital products. The planner consults the executive sponsor and the executive sponsor is concerned that this will create problems offering bundled subscriptions (i.e. paying a single, discounted, price for both digital and print subscriptions). After much discussion with the SAP architect, they decide a trade-off can be made that combines both subscriptions into one, but which slightly reduces the amount of revenue that the Sphere will collect from bundled subscriptions. The executive sponsor indicates this is OK in the short term, and so the enterprise architect notes in the artifact that the existing print billing cycle will be used initially, but that another billing cycle may need to be added in the future. Although the actual architectural artifact didn't change much, the process of considering the "Time" fundamental question during planning raised an important issue that was resolved before the project got underway.

The "Why" or "Motivation" fundamental question

The sixth cell in the top row is the "why" or "motivation" cell. Let's assume that the artifact in this cell is some sort of list of general business strategies. Adding a metered paywall is a substantial change in business strategy. Something along the lines of the initial few paragraphs of the first part of this series should be added to the list of general business strategies to explain why the Sphere is building a metered paywall.

The Owner or Business Model Perspective

Let's now move on to the second row in the Zachman framework, which represents the "owner", "business owner" or "business model" perspective. Zachman and others have used a construction analogy for this perspective, comparing it to that of the owner of a building being designed. The building owner cares about things such as which way the windows are facing, how the building is partitioned into rooms, etc, but does not necessarily care about where the support columns are or where the water pipes are run. In the same way, a (business or product) owner of a software project does care about what the software does, but not necessarily about whether it uses an Oracle or SQL Server database. In my opinion, people who discuss the Zachman framework often think of a business analyst in the owner role, because the architecture artifacts tend to be things analogous to high level entity-relationship diagrams or data flow diagrams. Also in my opinion (which may not be the opinion of many other practitioners, to be fair), the owner is often a business or product owner who wants to see mockups and sometimes market research rather than data flow diagrams. However, if the owner is, in fact, a business analyst, then high level data-flow diagrams and entity-relationship diagrams may be the correct approach.

The "What" or "Data" fundamental question

We return to the first column in the Zachman framework, but this time for the "owner" or "business model" perspective. In the planner perspective, we dealt with this same fundamental question by adding a new type of entity called a digital subscriber. Assuming the owner is a business analyst, the artifact in this cell may be a document describing entities and some of their high level attributes and perhaps their relationship to other entities. In this case, we probably want to add a digital subscriber entity to the document as well as some of the attributes (information) that the owner has decided should be collected when a digital subscriber signs up. In the modelling exercise that this document is based on, using the Zachman framework resulted in a spirited discussion of how much information should be gathered for a digital subscriber. It was useful to have this discussion early in the design process. The business owner also had a marketing research firm produce profiles of imaginary digital subscribers. In my opinion, these could also be considered architectural artifacts that would fit into the "what" or "data" cell in the owner perspective.

The "How" or "Function" fundamental question

The second cell in the owner or business model perspective's row is for artifacts which define business processes from a high level business perspective. Sometimes the artifacts are something very similar to high level data flow diagrams, showing business processes and how they accept inputs from and pass outputs to each other. If we use this approach, we will need to add details about the processes we defined for the cell above this one in the planner row. These processes would probably include signing up a subscriber, modifying the information for an existing subscriber, cancelling a subscription, performing periodic billing for a subscriber and cancelling a billing. In my opinion, an artifact consisting of UML use cases or agile user stories might work just as well and might be easier for some business owners to deal with.

The "Where" or "Network" fundamental question

The third cell in the owner or business model perspective answers the fundamental question "where" and is usually concerned with the locations a company operates from and the logistics between those locations. Because the business owner has decided that the customer care team for digital subscribers will be outsourced, it is probably wise to add the outsourced customer care team to this artifact. We will assume for now that the logistics will consist of a dedicated network connection between the headquarters of the Sphere and the outsourcer.

The "Who" or "People" fundamental question

The fourth cell in the owner or business model perspective answers the fundamental question "who" and has artifacts which show the interactions between the people involved in the system. The people are usually grouped somehow, possibly into departments or other organizational units, and workflows are shown between the groups. For our metered paywall, digital subscribers will interact with customer care agents, so digital subscribers and customer care agents will need to be added to the artifact, as will the basic workflows that occur between them (modifying a subscription, creating a subscription, stopping a subscription).

The "When" or "Time" fundamental question

The fifth cell in the owner or business model perspective answers the question "when" and has artifacts which show the cycles and (therefore implicitly) the critical recurring deadlines for the company. When we looked at the "when" fundamental question in the planner perspective (the cell immediately above the one we are dealing with now) we determined that we would use the print subscriber billing cycle for digital subscribers. The other cycle that needs to be considered has to do with non-digital subscribers viewing digital content. The metered paywall will block them (and ask them to subscribe) after they have viewed a certain number of articles in a given time period. The business owner needed to decide at this point what this time period would be. The business owner actually threw us a curve ball and said there should be different types of articles, some of which could be viewed without restriction by non-digital subscribers, others that would be subject to a limit and still others that would never be viewable by non-digital subscribers. It was good we discovered this! We had to go back to the "what" cell and add some attributes for articles and go back to the "how" cell and modify our article viewing procedure a bit. Obviously, it would have been better to catch these when we were considering the "how" and "what" fundamental questions, but we still caught them relatively early. Realistically, I think that modifying or creating the artifacts for a single perspective is a bit of an iterative process in which, like in this case, a discovery while working on one cell may affect a cell that you have already been working on.

The "Why" or "Motivation" fundamental question

The final cell in the owner or business model perspective answers the question "why" or explains the motivation from the business owner perspective. The artifact for this cell is often a business plan. Business plans vary in their content, but usually describe what is to be done, why it needs to be done (usually to reduce costs or drive revenue or both) as well as some basic targets along with any necessary strategies to meet these targets. In the case of the metered paywall, most of this information had been prepared prior to us starting the Zachman process and we were mostly able to use existing documents/e-mails to create a business plan, which can be added to the set of business plans that we maintain for this artifact.

The Designer or Logical Model Perspective

If you've done any data modelling or solutions architecture, the perspectives we've just covered may have been what you considered as "requirements" and may have been gathered by a business analyst or part of the knowledge of the product owner. The designer or logical model perspective is typically the point at which architects and data modelers get involved with a project and many (possibly all) of the artifacts in this perspective will be familiar to them.

The "What" or "Data" fundamental question

The "what" or "data" cell in the logical model perspective contains at least one artifact which describes entities and their relationships. Not surprisingly, an entity-relationship diagram might very well be used here. I like to use class diagrams (from UML), but mostly leave out the methods in this cell. This is because we have a tool for easily drawing class diagrams and we hoped it would save us some time when we got to the next cell and the next perspective (but I'm getting ahead of myself here and sort of breaking the Zachman rule that an artifact can only go into one cell). We created classes for digital subscriber and article group (which would define both a set of article groups, as well as the maximum number of articles in that group that a non-digital subscriber could access). We added an attribute to the article class which defined which article group an article was in. We added a class that counts the number of articles seen by a subscriber in a particular article group. We also created an SAP billing document class to hold any billing information that might need to be passed to SAP. We finished by adding a few easy relationships: an article has a many-to-one relationship with an article group, a digital subscriber has a one-to-many relationship with an SAP billing document, etc.
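For readers who prefer code to diagrams, the classes described above could be sketched roughly as follows in Python. This is only an illustration: every field name (such as max_free_articles or reader_id) is an assumption of mine, and the methods are left out, just as they are in the class diagram.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ArticleGroup:
    name: str
    # Maximum number of articles in this group that a non-digital subscriber
    # may access. Using None for "no limit" is my own convention.
    max_free_articles: Optional[int]


@dataclass
class Article:
    title: str
    group: ArticleGroup  # many articles belong to one article group


@dataclass
class SapBillingDocument:
    document_id: str  # hypothetical identifier passed to SAP


@dataclass
class DigitalSubscriber:
    email: str
    # One subscriber has many SAP billing documents.
    billing_documents: List[SapBillingDocument] = field(default_factory=list)


@dataclass
class ArticleViewCount:
    """Counts the articles one reader has seen in one article group."""
    reader_id: str
    group: ArticleGroup
    count: int = 0
```

Holding the group on the article, rather than a list of articles on the group, keeps the many-to-one relationship in one place.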

The "How" or "Function" fundamental question

The "how" or "function" cell in the logical model perspective contains at least one artifact which describes the user visible functionality of our systems. I've seen people use essentially block diagrams of the components of an application with flows indicating the information that a user can access in each component, as well as sometimes the information that the components pass between themselves. Because we have already started to think in terms of classes, a UML sequence diagram can, in my opinion, be an acceptable artifact for this cell. Sequence diagrams in UML show how user tasks or business processes are executed by classes (which correspond somewhat to Zachman's idea of application components). The sequence diagram ideally can borrow from the cell above it, which contains business processes and use these as the processes it is illustrating. In our case, this meant that we had to do a sequence diagram for the subscription sign up process and the process that happens when a user (either digital subscriber or not) views an article. The sequence diagram can also borrow from the "what" cell in the same row and use the classes defined there as the things in the sequence diagram that make method calls to perform a business process. Realistically, when you start building the sequence diagrams, you may find that you are missing classes and that is exactly what happened to me (We realized we needed a class that counts the number of articles accessed in an article group by a particular user, and then we added it to the class diagram in the "what" cell).

The "Where" or "Network" fundamental question

The "where" or "network" cell in the logical model perspective contains at least one diagram that shows how distributed system components (if any) communicate by drawing lines between them. For example, a web server will often communicate with an application server, which may in turn communicate with a database. In our case, we had to add a link to SAP to retrieve billing information. This raised PCI concerns (because our SAP system stores credit card information and must follow PCI requirements) and so we ended up moving some of the billing functionality inside our PCI web server environment and including it on the diagram. We also realized that our existing web cache servers were not going to allow us to block non-digital subscribers from seeing articles, so we had to modify our distributed system diagram to show the use of a third party CDN (Akamai) that had the required capability.

The "Who" or "People" fundamental question

The "who" or "people" cell in the logical model perspective contains at least one artifact that gives a high level (or architectural) view of the user interface. Zachman states that this artifact should model roles which are connected to deliverables. In my experience most user interface/usability professionals don't really know what this means. For the metered paywall project, we used the wireframes (simple UI mockups with minimal design detail) and basic representations or mockups of the interactions that our two types of users (digital subscriber and non-digital subscriber) would have with the system (which can be done as a series of PowerPoint slides if you want). For example, at one point we (or more correctly the product manager and UI team) developed a very simple set of PowerPoint slides showing the rough series of web pages that a user sees to sign up as a subscriber. Similarly, we used the initial mockups of the web page that a non-digital subscriber gets when they try to access an article that would cause them to exceed their free article quota. These artifacts are a lot more visual (and perhaps more concrete) than what Zachman seems to have intended, but they are easier for usability professionals to create and work with.

The "When" or "Time" fundamental question

The "When" or "Time" cell in the logical model perspective contains artifacts that describe the way the business cycles uncovered in the corresponding cell in the business owner perspective will be mapped to the cycles of the Sphere's systems. We first determined that, because of the requirements in the previous perspective, we would need to use the print subscriber billing cycle for digital subscribers. This cycle is largely within an SAP ERP system and therefore we engaged the SAP architect to help us with the logical model. We discovered that we would need to define the typical billing frequency, that is, the time that elapses between billings if a subscriber does not temporarily suspend their newspaper. We also discovered that, because of the requirement to bundle digital and print subscriptions, subscribers with bundled subscriptions would essentially extend their billing date if they suspended their newspaper. Many people thought this was not completely desirable, however, after much discussion, we decided that the alternatives were too costly. Therefore, we added this cycle to our logical cycles artifact (a spreadsheet), noting that it was tied to the print subscription cycle and that it would be affected by newspaper suspensions when a customer had a bundled subscription (in Zachman terms the cycle is controlled by the receipt of recurring payment event and the event generated when the subscriber's total payment is amortized). We also flagged this as something that should be revisited later. The other cycle that was uncovered in the previous perspective was the period for which a non-digital subscriber would be blocked from seeing some articles after they exceeded their allowance of free articles. As discovered in the previous perspective, the allowance should reset back to the maximum at the beginning of each month. Therefore, we added a cycle to our logical cycles artifact to reset the number of free articles at the beginning of each month (we can also say that the reset is triggered by a "beginning of month" event).
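As a rough illustration of the second cycle, here is a minimal Python sketch of an allowance that resets at the beginning of each month. The class name MonthlyQuota and its methods are hypothetical, and the sketch treats the first request seen in a new calendar month as the "beginning of month" event. It is not how the actual system was built.

```python
from datetime import date


class MonthlyQuota:
    """Free-article allowance that resets when a new calendar month starts."""

    def __init__(self, limit):
        self.limit = limit
        self.remaining = limit
        self._period = None  # (year, month) in which the allowance was last reset

    def _roll_over(self, today):
        period = (today.year, today.month)
        if period != self._period:
            # "Beginning of month" event: restore the full allowance.
            self.remaining = self.limit
            self._period = period

    def try_consume(self, today):
        """Use one free article if any are left; return whether it was allowed."""
        self._roll_over(today)
        if self.remaining == 0:
            return False
        self.remaining -= 1
        return True


quota = MonthlyQuota(limit=2)
for day in (date(2013, 9, 29), date(2013, 9, 30), date(2013, 9, 30), date(2013, 10, 1)):
    print(day, quota.try_consume(day))
```

Resetting lazily on the next request avoids needing a scheduled job, at the cost of the counter being stale until the reader comes back.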

The "Why" or "Motivation" fundamental question

The "Why" or "Motivation" cell in the logical model contains business rules which can be implemented in a rules engine or possibly code (but we are getting ahead of ourselves). Formal specifications for business rules do exist, but we generally just try to use concise English statements in the artifact (generally a spreadsheet) for this cell.

Here is a sample of some of the business rules that we discovered by reviewing the business case and getting necessary clarifications from the product owner:

Each article in the system will be assigned one of three colours: green, yellow or red.
Only digital subscribers can view green, yellow and red articles without restriction.
Non-digital subscribers cannot view red articles.
Non-digital subscribers can view green articles without restriction.
Non-digital subscribers can view only n unique yellow articles per month, where n should be a configurable threshold.
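The rules above are simple enough to express as a single function. The following Python sketch is one possible reading of them, not the code we shipped: the function name can_view and its parameters are made up for illustration, and I am assuming that "unique" means re-reading a yellow article already seen this month does not use up any more of the quota.

```python
COLOURS = ("green", "yellow", "red")


def can_view(is_digital_subscriber, colour, article_id, seen_yellow_ids, n):
    """Decide whether a reader may view an article under the colour rules.

    seen_yellow_ids is the set of yellow article ids this reader has already
    viewed in the current month; n is the configurable monthly threshold.
    """
    if colour not in COLOURS:
        raise ValueError("unknown article colour: %r" % (colour,))
    if is_digital_subscriber:
        return True   # digital subscribers see everything without restriction
    if colour == "green":
        return True   # green articles are open to everyone
    if colour == "red":
        return False  # red articles are for digital subscribers only
    # Yellow: allow a re-read (assumption), otherwise check the quota.
    return article_id in seen_yellow_ids or len(seen_yellow_ids) < n


seen = {"a1", "a2"}
print(can_view(False, "yellow", "a3", seen, n=2))  # quota used up
print(can_view(False, "yellow", "a1", seen, n=2))  # re-read of a seen article
```

Note that the function only answers the question; recording a newly viewed yellow article in seen_yellow_ids is left to the caller.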

The Builder or Technology Model Perspective

I've always had some trouble distinguishing this perspective from the logical model perspective above it and the detailed implementation model perspective beneath it. I think part of the problem is that we often skip parts of the technology model perspective when we do actual projects because it is easier to think about something either as part of the logical model or detailed implementation. The trick I use to try and figure out this perspective is to go back to Zachman's construction (of a building) analogy. The technology perspective corresponds to the builder's perspective. A builder needs to know the materials that will be used but not necessarily the exact details of how they will be fitted together to make a building. Extending this to technology projects, the technical leads need to know what technologies will be used, for example, Java classes (possibly with methods defined) and Oracle tables, but not the exact algorithms or the data types and indexes of the Oracle database.

The "What" or "Data" fundamental question

I have a bit of a bias towards UML, so I like to use a class diagram as the artifact for this, but a detailed ERD might be more appropriate if you are a purist. When I use a class diagram, I generally only include the classes that are actually persisted (that is, saved into some sort of database or file) entities, as well as how they will be persisted (e.g. file, Oracle, Solr, Cassandra, HBase, MongoDB, etc). Since the metered paywall is part of a larger web system, the classes or entities for it can be added to the class diagram or ERD for the whole web system (assuming it exists!).

The "How" or "Function" fundamental question

Again, I like to use a class diagram as the artifact for this. However, it should include all known classes, their relationships (cardinality and inheritance) and the classes should be annotated with the language in which they will be written, a description of what the class does and the methods and attributes of the class to the extent that these can be known at this point. Again, since the metered paywall is part of a larger system, its classes would normally be added to the class diagram for that larger system.

The "Where" or "Network" fundamental question

The existing system that we are adding the metered paywall to has a somewhat complex distributed architecture, for which there is an existing network diagram. As previously discussed, the metered paywall will require a new content distribution network (Akamai), which we will need to add to the network diagram. We will need to pass information to Akamai about whether the user is a digital subscriber, and, if not, how many yellow and red articles they have so far viewed this month. We will also need to tell Akamai what type of article (green, yellow, red) they are viewing. These flows were noted on the network diagram (for the larger web system of which the metered paywall is a part), with additional annotations that encrypted cookies will be used to communicate whether the user is a digital subscriber as well as the number of yellow and red articles that they have viewed so far this month. The encryption algorithm will be triple DES and another flow will be added to the network diagram to indicate how the key is exchanged. The colour or group to which an article belongs will be sent using Akamai's edge side includes. This will also be indicated on the network diagram.

We also need to add a network flow between our SAP environment and the PCI environment, which will be used to serve web pages that allow a user to subscribe. We augmented this link with a note that communication will be by sending encrypted files in order to make PCI compliance easier.

The "Who" or "People" fundamental question

We used the detailed screen mockups done by the design group for this artifact. For convenience, these can be sequenced in PowerPoint presentations to show how the system works for various types of users in various scenarios. For example, PowerPoint presentations were built to show the subscription process as well as the user experience when a non-digital subscriber exceeds their monthly quota of yellow articles. I also like to include some web architecture guidelines (such as how to use jQuery, JavaScript, the need for CSS, etc) along with the mockups to round out the set of artifacts for this cell.

The "When" or "Time" fundamental question

The artifact for this cell is a spreadsheet listing the cycles of the larger content delivery system, with some information on what components will be used to implement them. After consulting with the SAP functional analysts, we decided that the digital subscriber billing cycle will be done as part of the subscription monitoring process in SAP. After consulting with the web architects, we decided that resetting the monthly quota of yellow articles for non-digital subscribers will be done as part of the Akamai edge side include logic. These decisions were noted in the spreadsheet.

The "Why" or "Motivation" fundamental question

The artifact for this cell is a spreadsheet that contains business rules along with information on how they will be implemented. We decided not to embed a rules engine in the code for the metered paywall. Once this decision was made, the only way to implement the rules is to modify code, in one of four places: jsp or java on the application server, javascript on the front end, or the Akamai edge side includes. We decided to implement the rules that I discussed in the system model perspective by using edge side includes and java in the application server. It needs to be done in both places because edge side includes run on Akamai, and articles that expire from the Akamai cache sometimes get fetched from the application server.

The Subcontractor or Implementation Model Perspective

The artifacts in this perspective are essentially very detailed models or descriptions of what is to be implemented. In my experience, many of these models may not actually be stored anywhere -- they might exist on a whiteboard for a short period of time, they may be rough drawings on scraps of paper or they may just be conversations between members of a (often agile) team. In some cases, I think that some of the artifacts in the implementation model perspective never leave a technologist's mind. In the metered paywall project, these models were mostly done on whiteboards or in conversations. However, I will discuss ways in which they could have been recorded (and may have been in some cases).

The "What" or "Data" fundamental question

Some of the data that was persisted in the metered paywall project was stored in Oracle, while other data elements were in Cassandra (a NoSQL database). The creation scripts for the Cassandra keyspaces and column families and the Oracle tables are actually a good artifact for this cell, in my opinion. They get preserved as part of the code deployment process, so they remain available after the project is done.

The "How" or "Function" fundamental question

As an architect, I would have preferred if the UML models developed in previous perspectives for this fundamental question had been augmented with comments and possibly pseudo code for the methods and used to produce code. In reality, there were no artifacts produced for this question as the agile methodology used by the development teams favours working code over documentation. I don't think that Zachman envisioned the effects of agile development on his framework :-).

The "Where" or "Network" fundamental question

The primary architectural issue for this question is the connection to Akamai (our content delivery network) and the information that needs to be passed via cookies and contained in Akamai's edge side includes and configuration so that only digital subscribers have unrestricted access to yellow or red articles. Because some of this work required collaboration between the Sphere's developers and Akamai, network diagrams and detailed written descriptions of cookie formats had to be produced, even though the in-house teams were using an agile methodology. In some ways, our relationship with Akamai was essentially what I believe Zachman envisioned when he created the subcontractor perspective.

The "Who" or "People" fundamental question

In the implementation model perspective, this fundamental question intuitively should be something like a specification for the user interface widgets as this follows nicely from the previous perspective (at least in my opinion). Many user interface technologies are built by responding to user events (this includes web technologies and perhaps surprisingly, SAP screens) and creating a diagram or document that shows user interface widgets and how user interface events are processed would seem to be a rational choice of artifact for this fundamental question. In an agile project, like the metered paywall described in this blog entry, it is very likely that these things are discussed but not written down (due to the agile preference for working code and face-to-face discussion over documentation).

I have read Zachman discussions that argue that security architecture artifacts should be placed in this cell. To me, this doesn't quite seem intuitively correct, however security is important and it definitely needs to go somewhere. In the metered paywall project, we dealt with security in the logical and implementation perspectives and in the "network" fundamental question in this perspective. In systems like SAP ERP and Netweaver (which have extensive role-based security), it is possible to separate out the security configuration and include security artifacts as the answer to this fundamental question. In SAP, for example, we typically have a spreadsheet that lists users and the roles that are assigned to them and then another spreadsheet that lists roles and describes the authorization objects. In SAP this is definitely an implementation level document that is often completed only a short time before a system goes live (and sometimes not until after go-live, unfortunately). Therefore, I think that using this cell for security artifacts can make sense, but it requires a fairly sophisticated security sub system that you can configure separately from everything else. This is not always present in web projects and wasn't part of the metered paywall system that this document describes.

The "When" or "Time" fundamental question

The artifact for this fundamental question in most representations of the Zachman framework that I have seen is a fairly low level (almost assembly language) specification of how events/periodic processing should be implemented. The metered paywall project is a combined web and SAP project and so it does not need to deal with concepts at that low a level. Earlier in our analysis, we decided that the billing cycle for digital subscribers would be tied to that of print subscribers and implemented on our SAP system. The implementation model artifacts for doing this in SAP are very well defined: an initiated change control ticket with the necessary SAP configuration objects defined. The required code changes will later be attached to this ticket and the ticket will be retained indefinitely, making it a very suitable artifact.

At the start of every month, we also need to reset to zero the count of yellow articles that non-digital subscribers have viewed, allowing them to view as many yellow articles as the threshold permits in the new month. We decided to do this by modifying java and having Akamai modify configuration and edge side include code. Again, because we involved Akamai, we produced some documents that outlined the rules in a sort of pseudo code that explained how code should be written to detect when a user counter cookie was from a previous month and then reset the count in the cookie. This pseudo code could serve as an artifact for this cell.
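To make the reset rule concrete, here is a minimal Python sketch of the kind of pseudo code described above. It is not the code that was actually written for the project: the cookie is modelled as a plain dictionary, and the field names `month` and `yellow` are assumptions made for illustration. Decryption of the cookie is left out entirely.

```python
from datetime import date


def reset_if_new_month(cookie: dict, today: date) -> dict:
    """Return a counter cookie that is valid for the month containing `today`.

    `cookie` is a hypothetical decoded counter cookie such as
    {"month": "2013-08", "yellow": 7}. The field names are illustrative only.
    """
    current_month = today.strftime("%Y-%m")
    if cookie.get("month") != current_month:
        # The cookie was written in a previous month (or has no month at all),
        # so the reader starts the new month with a count of zero.
        return {"month": current_month, "yellow": 0}
    # Same month: keep the existing count unchanged.
    return dict(cookie)
```

Because the month is stored in the cookie itself, no scheduled job is needed: the count is reset lazily the first time the user shows up in a new month, wherever the check happens to run.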

The "Why" or "Motivation" fundamental question

As with the "How" fundamental question, the agile development process that we use means that there were very few recorded artifacts for this fundamental question. The business rules that we identified in the previous perspectives were simply implemented by developers by modifying the necessary java, javascript, or edge side include code. It would be possible to create some pseudo code showing how the classes, javascript and edge side include code were modified and this could be the artifact for this fundamental question.

Some closing thoughts

This post was a lot longer than I thought it would be when I started. The Zachman framework can produce a lot of documentation, and I guess even trying to describe it at a high level (as I have done above) can take quite a few words. Overall, it probably does save some time by forcing you to think of things up front. However, it can be hard to justify using it, because it adds overhead to the beginning of a project and produces little in terms of demonstrable results. When starting a project, I like to at least mentally run through the perspectives and fundamental questions. Even if there is not time to properly produce each model, I find that it is a useful tool for thinking about a project.

The Zachman framework has been criticized for not really defining a process for enterprise architecture. In the next post in this series, I will talk about my attempts to take ideas from TOGAF to create a process. I also experimented a bit with lean Enterprise Architecture methods, and hope to produce a blog post on that as well.

Tuesday 17 September 2013

Using Enterprise Architecture at a Media Company (part one)

This post gives a fairly small example of how Enterprise Architecture can be used in a media company. This is based on actual work, but, as they say on TV, "the names have been changed to protect the innocent". In order to make the size of this post (and the other parts) manageable, I'm only going to take a single aspect of the Enterprise Architecture for a company which I will refer to as "the Sphere". I'm also only going to deal with it on a fairly high level, in order to make the example clearer, although hopefully it will be straightforward to see how it could be made more detailed.

The Problem

The Sphere runs a newspaper and a website, which has content roughly similar to the newspaper. People are no longer paying for the newspaper because they can read all articles on the website, which generates revenue from digital advertising. Digital advertising, however, brings in only a small percentage (15%) of the revenue necessary to support the company's costs; ads in the newspaper have traditionally brought in about 70% of the Sphere's revenue, with newspaper subscription fees bringing in the remaining 15%. The Sphere's print advertisers have realized that people aren't reading newspapers as much as they once did, and are therefore shifting their advertising dollars elsewhere, seriously impacting the largest source of revenue at the Sphere.

A metered paywall is software that allows website visitors to see a certain number of articles without paying, but requires them to pay to see additional articles.

The Sphere hopes that, by introducing a metered paywall to their website, they will encourage people to continue to buy the newspaper (thus making it more appealing to print advertisers), and get revenue from a new source: subscriptions to the website that allow visitors to see as many articles as they want.

What's an Enterprise Architect to do?

Enterprise architects need to make sure that a company's technology is aligned with its business strategy. Once the decision makers at the Sphere decide to add a metered paywall to the website, the enterprise architect must take this information and determine how to adapt the company's technology to it. It is possible that existing technical architectures may need to change and/or new components or technologies have to be architected and built. Ideally, we do this in a systematic and disciplined way.

Usually, this involves examining and manipulating (that is, changing or adding to) existing architecture "artifacts", which are usually diagrams, written documents, or models constructed using UML or some other methodology. Since the artifacts represent the company's technology, this gives the enterprise architect a way to think about what needs to be done and hopefully not forget anything. The enterprise architect can then begin discussions with business stakeholders, other enterprise architects, solution architects, development managers and developers to decide what must be done. In my opinion, an enterprise architect does not necessarily solve problems, but instead uncovers and frames them (and possibly has a recommended solution or two in mind).

It often helps if there is a sort of architectural change control process that specifies how the above happens, so that it doesn't happen in an ad hoc way every time business strategy changes.

It can be helpful to use two fairly well known approaches to deal with the artifacts and create a sort of architectural change process. The somewhat inappropriately named "Zachman Framework" is actually a taxonomy for organizing architectural artifacts, making sure they are well defined (that is, don't overlap), complete and that business requirements align with the resulting architectural designs (and the technologies that get built). The TOGAF framework is a sort of strategy for building an architectural change process, which might or might not use the Zachman Framework.

We will look at how we can use the Zachman Framework to handle the Sphere's need for a metered paywall in the next part of this series. We will look at how we can add the TOGAF framework to provide a sort of architectural change process in the third part.

Continue on to part two.

Sunday 15 September 2013

Cassandra as distributed cache?

NoSQL was born as a sort of reaction to the architectural design pattern in which you put a cache (such as Ehcache Enterprise, Redis, or memcached) in front of a relational database in order to better scale the database. One of the basic rationales for NoSQL is that, if a cache is sufficient to handle most of your database queries, then you don't really need a relational database. NoSQL then goes one step further and says that, if you can live without some of the relational database features, then you can trade them off for other useful capabilities like replication.

At the moment, the company which I work for is having trouble with the caching solution we are using in front of our relational database. I don't want to name the solution we are using, because we are not using it properly and the problems we are having are therefore more of our own making. However, we are looking at moving much of our infrastructure into an IAAS cloud solution (possibly Amazon AWS, Google Compute Engine or Rackspace). Our existing caching solution is not well suited to multi-datacentre deployment (which is probably one of the big advantages to using IAAS), so we need to look for something else.

Cassandra is really well suited to this type of cloud deployment for a number of reasons. The Cassandra data model can easily support a key-value store (we will talk more about the Cassandra data model later) and it is possible to put time-to-live (ttl) values on Cassandra columns, which means we can have cached values automatically expire. One big advantage of Cassandra over some key-value stores is that it can flexibly shard and replicate the data to multiple nodes and multiple data centres.

The multi data centre support is very useful. Cloud providers generally allow you to deploy to n data centres, where n is larger than two. You can get really good fault tolerance by dividing your infrastructure into n separate and autonomous units (that I like to call "pods"), putting each one into a separate data centre and then doing load balancing between them (most IAAS providers give you a way to do the load balancing fairly painlessly). This is a pretty powerful idea because you can potentially run on cheaper, smaller cloud instances and you don't need to effectively double your infrastructure like you often do when you deploy to two data centres. Assuming you have n pods, you can probably size your instances so that your applications can run using (n-2) pods. Assuming you can get n > 6, you will likely spend less than you would by spreading your infrastructure over 2 data centres which requires that you have enough infrastructure in each data centre to run in the absence of the other data centre.
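The arithmetic behind this can be sketched in a few lines of Python. This is a simplified model, not a costing tool: it assumes that all pods are the same size, that cost scales linearly with capacity, and that per-pod overhead is zero. The function name `overprovision_factor` is made up for this example.

```python
def overprovision_factor(pods: int, tolerated_failures: int) -> float:
    """Capacity deployed divided by capacity needed to carry the full load.

    With `pods` equally sized pods, each pod must be big enough that the
    surviving (pods - tolerated_failures) pods can run everything.
    """
    surviving = pods - tolerated_failures
    if surviving < 1:
        raise ValueError("at least one pod must survive")
    return pods / surviving


# Two data centres, either of which must be able to run alone.
print("2 data centres:", overprovision_factor(2, 1))

# n pods sized so that the applications can run on (n - 2) of them.
for n in (4, 5, 6, 8):
    print(n, "pods:", round(overprovision_factor(n, 2), 2))
```

Under these assumptions the two data centre layout deploys 2.0 times the capacity it needs, while six pods sized for (n-2) deploy 1.5 times and eight pods about 1.33 times. Real deployments carry fixed costs per pod that this model ignores, which is one reason to want a comfortable margin in n.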

As hinted earlier, Cassandra has the concept of data centres, and makes it easy to put at least one complete copy of your data in each. My thinking is that each pod should be configured as a single Cassandra data centre. I'm not sure whether it makes sense to have more than one copy of the cached data in each Cassandra pod, because if you have six pods, you will potentially have six copies of your data, which is plenty. Assuming there is reasonable connectivity between the pods, a Cassandra node failure will cause at least some of the data to be fetched from a different pod, which may be ok.
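As an illustration of "one copy per pod", the small Python helper below builds a CQL `CREATE KEYSPACE` statement that uses `NetworkTopologyStrategy` with a replication factor of one in each data centre. The helper, the keyspace name and the pod names are all hypothetical; in a real cluster the data centre names must match whatever the configured snitch reports.

```python
from typing import Sequence


def keyspace_cql(keyspace: str, pods: Sequence[str], copies_per_pod: int = 1) -> str:
    """Build a CREATE KEYSPACE statement with the same replica count in every pod.

    Each pod is treated as one Cassandra data centre. The names passed in
    are placeholders for illustration only.
    """
    options = ["'class': 'NetworkTopologyStrategy'"]
    options += ["'%s': %d" % (pod, copies_per_pod) for pod in pods]
    return "CREATE KEYSPACE %s WITH replication = {%s};" % (keyspace, ", ".join(options))


print(keyspace_cql("page_cache", ["pod1", "pod2", "pod3"]))
```

Raising `copies_per_pod` to 2 would let a pod survive a single node failure without reading from another pod, at the cost of doubling the storage used for the cache.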

When cached data is updated in Cassandra, it will be replicated within a few milliseconds to the other pods. There is a risk of nodes in other pods getting stale cached data, which needs to be considered. Typically, I suspect that we will want to make user sessions somewhat sticky to the pod that they initially connect to, which should lower the risk a bit.

Another issue I can see, based on my organization's use of distributed database caches, is that we will sometimes need to invalidate a cache (remove all its entries). I can think of quite a few Cassandra data models that would allow you to invalidate a particular cache, but perhaps it is simpler if we keep each cache in its own column family. We could then drop and recreate the column family to clear or invalidate the cache. I guess we could also just truncate the column family, but my experience with the nodetool truncate command is that it does not work really well on multi-node clusters (it works pretty well on single-node clusters though, but I am sure most people don't have those in production).

Most distributed caches also allow you to place an upper limit on the number of items in a cache. This is generally done to conserve memory. In Cassandra, the cache can spill to disk, so memory is less of a concern. However, it might still be desirable to have a limit on the cache size. One way to do this is to have a row (called an "all_keys" row, probably using the row key "all_keys") in each cache's column family whose column keys are a time stamp (representing cache insertion time) concatenated with the cache key for each entry in the cache. These columns would have the same time to live (ttl) as the cached data. We could also define a counter column in each cache's column family which would keep track of the current number of elements in the cache. When this counter exceeds a certain value, we could have a daemon delete the oldest entries from the cache's column family. These could be determined by doing a column slice on the all_keys row. Having the "all_keys" row would allow us to invalidate the cache by doing a column slice to get all the cache keys and then deleting all the rows, instead of dropping and recreating the column family.
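The bookkeeping described above can be modelled in plain Python to check that the idea hangs together. This is an in-memory stand-in, not Cassandra client code: a dictionary plays the role of the column family, a sorted list plays the role of the "all_keys" row, and an integer plays the role of the counter column. Time-to-live expiry is left out, and the class and method names are invented for the example.

```python
import bisect


class BoundedCache:
    """In-memory model of one cache's column family with an all_keys index."""

    def __init__(self, max_entries: int):
        self.max_entries = max_entries
        self.rows = {}      # cache key -> cached value (one row per entry)
        self.all_keys = []  # sorted (insert_time, cache_key), like column names
        self.count = 0      # stands in for the counter column

    def put(self, key, value, now):
        """Store a value; new keys are also recorded in the all_keys index."""
        if key not in self.rows:
            bisect.insort(self.all_keys, (now, key))
            self.count += 1
        self.rows[key] = value

    def evict_oldest(self):
        """What the daemon would do: slice off the oldest entries and delete them."""
        excess = self.count - self.max_entries
        if excess <= 0:
            return []
        oldest = self.all_keys[:excess]
        del self.all_keys[:excess]
        for _, key in oldest:
            del self.rows[key]
        self.count -= len(oldest)
        return [key for _, key in oldest]

    def invalidate(self):
        """Clear the cache by walking the index instead of dropping the table."""
        for _, key in self.all_keys:
            del self.rows[key]
        self.all_keys.clear()
        self.count = 0
```

For example, with `BoundedCache(2)`, putting keys "a", "b" and "c" at times 10, 20 and 30 and then calling `evict_oldest()` removes "a" and leaves "b" and "c". In a real cluster the index, the counter and the data rows are updated by separate writes, so they can drift apart; the daemon would need to tolerate index entries whose rows have already expired.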