Thursday 8 May 2014

Internet of Things Architecture

I've recently been thinking about the Internet of Things, or more specifically home automation and control.

I'm not happy with either of the 2 main methods for accessing Things remotely.  The first method is to poke holes in your firewall and do port forwarding.  This is a really great and simple method for 1 device.  For example, if you have a WiFi thermostat to control your heating, it can connect to your home network and the router can be configured so that incoming connections to the appropriate port are forwarded to the device.

This method is fine if you are technically confident enough to set it up.

There are some downsides to this.
  • If you have more than 1 device, you now need to manage multiple port mappings since the port can't be shared
  • Manufacturers are new to this and the software isn't always flexible enough to allow user-defined mappings
  • You are poking holes in your firewall - are the devices secure from hackers?
  • The home user needs to be technically competent to do the configuration
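
To make the first downside concrete, here's a minimal Python sketch of a home router's port-forwarding table. It assumes, for illustration, a single public IP address and a table keyed on external TCP port; the class name, device addresses and port numbers are all made up. Because each external port can point at only one internal device, a second device that wants the same port has to be given a different one.

```python
# Minimal sketch of a home router's port-forwarding table (illustrative only).
# Assumption: one public IP address, so each external TCP port maps to one device.

class PortForwardingTable:
    def __init__(self):
        # external port -> (internal IP address, internal port)
        self.mappings = {}

    def add(self, external_port, internal_ip, internal_port):
        """Add a mapping, refusing to reuse an external port that is already taken."""
        if external_port in self.mappings:
            owner_ip, owner_port = self.mappings[external_port]
            raise ValueError(
                f"external port {external_port} already forwards to {owner_ip}:{owner_port}")
        self.mappings[external_port] = (internal_ip, internal_port)

    def lookup(self, external_port):
        """Return where an incoming connection on external_port is sent, or None."""
        return self.mappings.get(external_port)


table = PortForwardingTable()
table.add(8080, "192.168.1.50", 80)      # hypothetical WiFi thermostat
try:
    table.add(8080, "192.168.1.51", 80)  # hypothetical second device wanting the same port
except ValueError as err:
    print(err)
table.add(8081, "192.168.1.51", 80)      # so it has to be given a different external port
print(table.lookup(8081))
```

Every extra device means another entry like this to choose, configure and remember, which is exactly the management burden described in the list above.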

The alternative method is for the device to connect to the manufacturer's service, and you access the device via the manufacturer.  This method also has downsides:
  • Some manufacturers charge for the service. I personally distrust connections to the "mothership".
  • You are locked in to the manufacturer. What if they go bankrupt? Do you lose your service?
  • You are reliant on the security of the manufacturer
  • They could be gathering personal data about you

The other problem with the centralised method is scale. Firstly, the service provider needs to understand how to scale these systems.  It would be really annoying not to be able to turn the heating on because they can't cope with the load. OK, maybe that's extreme, but it could happen. Secondly, these systems are proprietary: I end up with one login for my home thermostat, another for my home lighting and maybe another for the dishwasher.  It also means it won't be possible for my LG dishwasher to chat with my Sony TV if it ever needed to.

Given high-profile events such as the Sony PlayStation Network being shut down for days after it was hacked, I generally distrust the central control method.  It's feasible that a hacker could mount a denial of service attack on the electricity grid by turning on all the dishwashers in the world at the same time, causing a huge surge in power demand and triggering brown-outs.

I'm not sure what the solution is, but I can't help thinking Software Defined Networking (SDN) has a role to play here. SDN is good at address abstraction, and both of the scenarios above are basically address abstraction problems.  The challenge is to be able to address the devices in the home in an open, non-proprietary manner.

Wednesday 9 April 2014

The rise of DevOps

The software world has for a while embraced the idea that the programmers who wrote the software are the ones best placed to run the systems.  This mindset is finding its way into the networking arena.  In small enterprise networks the network designers have always also been the network support team, but in large, complex networks there has always been a clear distinction (and mentality) between the designers and operations.

The Operations mindset is clearly different from that of the designer. Operations resist change. Change = Risk.  Risk = Problems.  Operations are invisible and ignored when things are running well, but shouted at when something urgently needs fixing. It's a thankless job.

Designers like change and are likely to make experimental changes on a live network.  Change = risk.

The shift to SDN is an interesting one.  Clearly the extreme Operations mindset of Change = Bad is not a great way to build a responsive business; however, there is clear value in the mindset of preserving quality and minimising risk to the services and revenue that the network enables.

So the shift to DevOps certainly presents some conflicts in behaviours.

So why is the shift happening? I think the key is the word Software: the Software in Software Defined Networking (SDN).  The value and innovation now lie not in dumb networking boxes but in the high-level applications. It's likely, at least for the next few years, that these applications will be written internally by the business; in other words, software developers will be in control of the network.

The application domain is where there's opportunity for innovation, experimentation and potential new business value.

So does this mean that all these developers playing God with the network will create chaos?  Maybe.  There's clearly the opportunity for a new class of bug.  Today's legacy networking issues may become less common, but a new breed of transient, application-specific networking bugs may emerge.

Now for the good news.  For most businesses, building a test network that fully replicates the live production network is either impossible or uneconomic. With the shift to SDN, however, it's possible to cheaply build a virtual replica of the current live network, including its behaviour, because the controller knows the exact state of the network.  Legacy networks follow a Plan, Build, Operate model: someone designs the network, someone builds it and someone operates it.  Changes are often made to the live network, so the original plan bears little resemblance to it, and the built network probably doesn't reflect the design either!  The build engineer might find that a port designated in the design is already connected, so he uses his intelligence, connects to the next available port and doesn't correct the design documentation.

In an SDN network the controller knows the actual "as-built" state.   Since the controller has an accurate picture, planned changes to the network can be simulated, characterised and fully tested. This enables a new way of working.
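
As an illustration of why an accurate "as-built" view helps, here is a minimal Python sketch that compares a design document's port allocations with what a controller reports. Both data structures and the find_drift helper are hypothetical; a real controller would expose its topology through its own API.

```python
# Minimal sketch: compare designed port allocations with the "as-built" state
# reported by a controller. Both dicts are hypothetical examples:
# (switch name, port number) -> name of the device connected to that port.

def find_drift(designed, as_built):
    """Return human-readable differences between the design and the real network."""
    differences = []
    for key in sorted(set(designed) | set(as_built)):
        planned = designed.get(key)
        actual = as_built.get(key)
        if planned != actual:
            switch, port = key
            differences.append(
                f"{switch} port {port}: design says {planned}, network has {actual}")
    return differences


designed = {("sw1", 1): "server-a", ("sw1", 2): "server-b"}
# The build engineer found port 2 already in use and used port 3 instead.
as_built = {("sw1", 1): "server-a", ("sw1", 2): "printer", ("sw1", 3): "server-b"}

for line in find_drift(designed, as_built):
    print(line)
```

The same comparison can be run against a planned change before it is applied, which is the simulate-then-test idea described above.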

Today, changes to a legacy network are made on a piecemeal basis. Engineers are issued work packs and implement the changes network element by network element.  Humans are involved, so there's the opportunity to make mistakes at each stage.  In high-risk networks, these changes are scheduled for night working, where a tired engineer might be more likely to make mistakes or may rush them through in his desire to get to bed.

By testing and simulating these changes off-line in an SDN network, they can be checked to ensure they are low risk. Implementation can then be automated: the changes can be scheduled for implementation at night without humans to make mistakes, and the tests that confirm the changes are successful can be embedded into the process, so that if there's a problem the network is automatically rolled back to the known working state.
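
The automated change-and-backout idea can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions: the network state is modelled as a plain dictionary of routes, and apply_change_safely, the change functions and the post-change test are hypothetical stand-ins for calls to a real controller.

```python
import copy

# Minimal sketch of an automated change with an embedded test and automatic backout.
# The "network" is just a dict of routes; in practice each step would be a call
# to a controller's API (hypothetical here).

def apply_change_safely(network, change, post_change_test):
    """Apply change to network; if post_change_test fails, restore the previous state."""
    known_good = copy.deepcopy(network)   # snapshot of the known working state
    change(network)                       # implement the change
    if post_change_test(network):
        return True                       # change verified, keep it
    network.clear()                       # test failed: back out the change
    network.update(known_good)
    return False


network = {"10.0.0.0/24": "port1"}

def good_change(net):
    net["10.0.1.0/24"] = "port2"

def bad_change(net):
    del net["10.0.0.0/24"]                # accidentally removes an existing route

def existing_route_present(net):
    # Post-change test: the existing route must still be there.
    return "10.0.0.0/24" in net

print(apply_change_safely(network, good_change, existing_route_present))  # True
print(apply_change_safely(network, bad_change, existing_route_present))   # False
print(network)
```

The second change fails its test, so the network is returned to the state it was in after the first, successful change.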

In the data centre environment, tools like Chef and Puppet have revolutionised server provisioning and automation of operational tasks.  These concepts and tools will find their way into the networking space and change the way of working.  Welcome to network DevOps.

Thursday 19 September 2013

Interworking between OpenFlow and legacy IP networks

In this blog post I return to the goal of selling OpenFlow technology. If it is to become accepted, it does need to be sold.

One of the objections raised about OpenFlow is that it doesn't interwork with "legacy" IP networks.

As with most sales objections, there is some truth in this one, along with fear, uncertainty and doubt (FUD) spread by competitors that have something to lose.

So what does the word "interworking" actually mean? In order to answer that question, let's return to basics.

The world is full of IP networking equipment.  In fact, the human race has become dependent on the Internet.  It would be impossible to build a parallel OpenFlow network and do a big-bang switchover on a particular date and time.  Let's also be realistic: legacy IP networking does a reasonable job. The business case for ripping it out and replacing it with OpenFlow doesn't make sense if we apply it across the board.

At least initially, OpenFlow is being deployed in very specific locations to solve particular problems. I am very familiar with mobile network architectures. In the mobile core network (the Gi network, if you are familiar with the terminology), OpenFlow will deliver simplicity, elegance and flexibility for mobile operators and remove the mountain of different boxes that attempt to control user policy. In a mobile feeder network (the Iu network before the SGSN, if you are familiar with the terminology), by contrast, there are few gains to be made, at least today. So the core is OpenFlow's sweet spot. It is likely that OpenFlow adoption will be on the periphery: in places where the legacy technology is struggling or is a square peg in a round hole, in other words where there are lots of boxes doing work-arounds.

Google actively uses OpenFlow in live operations.  If OpenFlow didn't interwork, surely we wouldn't be able to use Google!

In my experience, there are real challenges in getting legacy IP equipment to interwork, and I don't just mean between different vendors.  I've seen network engineers struggle to get a Cisco box to talk to another Cisco box simply because one had a different software version!   Interworking challenges are simply part of the world of networking.

So let's look a little closer at OpenFlow. There are 2 types of interface: data and control.  On a legacy router, the control is usually embedded in the same box as the data. One of the key differences is that OpenFlow separates the data plane from the control plane.

So if we look at the data plane interfaces on an OpenFlow switch, they will probably be Ethernet. They will carry IP and the box will forward packets from one port to another.  So far there is no difference from a legacy switch or router, so no interworking challenges here, at least at first sight.

So what is this control I'm referring to?  Control is the decision-making about how a packet is routed from A to B. In legacy networking there are a few ways to do this:

  1. Broadcast or flooding
  2. Someone manually configures it
  3. It is automatically discovered

Broadcasting or flooding works on very small networks but it doesn't scale. A basic hub does this: a packet arrives, the hub doesn't know what to do with it, so it sends it out of every other port.

Next, someone manually programmes the equipment, saying that this address or address range can be reached down here.  Humans tend to make mistakes, so traffic might be routed into a black hole. The Internet is also constantly changing, so again it's not a scalable approach.
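
A manually configured (static) routing table can be sketched as a list of prefixes and next hops. When more than one prefix matches a destination, routers choose the most specific one (longest prefix match). The sketch below uses Python's standard ipaddress module; the prefixes, next-hop names and the next_hop function are made up for illustration.

```python
import ipaddress

# Hypothetical static routes: "this address range can be reached down here".
STATIC_ROUTES = [
    (ipaddress.ip_network("0.0.0.0/0"), "upstream"),        # default route
    (ipaddress.ip_network("10.0.0.0/8"), "core-link"),
    (ipaddress.ip_network("10.1.2.0/24"), "branch-office"),
]

def next_hop(destination):
    """Return the next hop for destination using longest-prefix match."""
    addr = ipaddress.ip_address(destination)
    matches = [(net, hop) for net, hop in STATIC_ROUTES if addr in net]
    if not matches:
        return None  # no route: the packet is dropped
    # The most specific route (largest prefix length) wins.
    best_net, best_hop = max(matches, key=lambda item: item[0].prefixlen)
    return best_hop

print(next_hop("10.1.2.7"))    # branch-office
print(next_hop("10.9.9.9"))    # core-link
print(next_hop("192.0.2.1"))   # upstream
```

A single typo in one entry (say 10.1.3.0/24 instead of 10.1.2.0/24) would silently send branch traffic down core-link instead, which is the kind of human error described above.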

Finally we have discovery.  Legacy IP equipment uses a variety of protocols to discover and advertise addressing and routing information: ARP (which maps IP addresses to MAC addresses) and routing protocols such as RIP, OSPF and BGP.

These protocols form the control and routers make their own decisions based on this routing information.

OpenFlow, however, relies on a "central" intelligence to make routing decisions. When a packet arrives at an OpenFlow switch that hasn't been programmed to handle it, the switch doesn't know what to do with it.  (A legacy router has the same issue until it has been configured and has gathered routing information.)  The OpenFlow switch therefore sends a message via the control interface to a controller, saying "I've got this packet - what do you want me to do with it?".  Controllers are programmable, so it is possible to write a controller that behaves identically to a legacy router.  The only difference compared with a legacy router is that there is a logical, or perhaps physical, separation between the control and data planes.
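
The packet-in exchange can be modelled in plain Python to show the logic, without any real OpenFlow library. Everything here is a hypothetical, simplified model: the flow table is keyed only on destination address, and the controller both chooses an output port and installs a flow in one step (a real controller would send separate packet-out and flow-mod messages).

```python
# Simplified model of reactive flow setup (no real OpenFlow messages involved).

class Switch:
    def __init__(self, controller):
        self.flow_table = {}          # destination address -> output port
        self.controller = controller
        self.packet_ins = 0           # how many times the switch had to ask the controller

    def receive(self, dst):
        if dst in self.flow_table:    # a matching flow: forward without the controller
            return self.flow_table[dst]
        # Table miss: "I've got this packet - what do you want me to do with it?"
        self.packet_ins += 1
        port = self.controller(dst)
        self.flow_table[dst] = port   # controller installs a flow for next time
        return port


def controller(dst):
    """Hypothetical controller policy: a fixed mapping, like a tiny routing table."""
    routes = {"10.0.0.1": 1, "10.0.0.2": 2}
    return routes.get(dst, 3)         # send anything unknown out of port 3


sw = Switch(controller)
for dst in ["10.0.0.1", "10.0.0.1", "10.0.0.2", "10.0.0.1"]:
    sw.receive(dst)
print(sw.packet_ins)  # 2: only the first packet to each destination reaches the controller
```

Swapping the controller function for one that runs a routing protocol is, in outline, how a controller could be made to behave like a legacy router.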

So if we can develop an OpenFlow switch and controller combination that is conceptually identical to a legacy router, the interworking issues are clearly being overstated!

However, it doesn't make sense to go to all the trouble of creating OpenFlow only to do what a legacy router does; it wouldn't create the shift away from a box-driven networking industry!

Today's legacy IP networks are in fact islands.  There are lots of fancy terms, like routing domains and "autonomous systems", for these islands. The Internet is not a single thing; it is a collection of networks, each with its own controls.

This is an important concept, and it is one that will continue.  To achieve interworking between legacy networks and OpenFlow networks we need islands.  It is at the joining points between islands that the challenges and complexities lie.  For example, at the demarcation point between a legacy network and an OpenFlow network, the gateway router may be talking to the OpenFlow switch using OSPF to advertise which networks it can reach. The OpenFlow switch at the border needs to participate in this discussion.

If we look at the last sentence, it isn't 100% true.  The OpenFlow switch can in fact be pretty dumb. When an OSPF packet arrives from the legacy router, the switch can simply forward it to the controller and ask what to do with it. It is the controller, in software, that decides how to interact with the legacy router, and it does so by programming the OpenFlow switch.
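
To illustrate how "dumb" the border switch can be, here is a Python sketch of the only decision it needs to make in this case: spotting OSPF packets and handing them to the controller. OSPF runs directly over IP as protocol number 89, so the check only needs the protocol field of the IPv4 header. The function names and the "controller"/"flow table" return values are hypothetical; on a real OpenFlow switch this would be a flow rule matching IPv4 with ip_proto 89 and an output-to-controller action, not Python code.

```python
import struct

OSPF_IP_PROTOCOL = 89  # IANA protocol number for OSPF

def border_switch_action(ipv4_header: bytes) -> str:
    """Decide what a minimal border switch does with a packet (illustrative only)."""
    if len(ipv4_header) < 20:
        raise ValueError("IPv4 header must be at least 20 bytes")
    # The protocol field is at byte offset 9 of the IPv4 header.
    protocol = ipv4_header[9]
    if protocol == OSPF_IP_PROTOCOL:
        return "controller"   # let the controller talk OSPF to the legacy router
    return "flow table"       # everything else follows the installed flows


def make_header(protocol: int) -> bytes:
    """Build a minimal 20-byte IPv4 header with the given protocol (checksum left as 0)."""
    return struct.pack(
        "!BBHHHBBH4s4s",
        0x45, 0, 20, 0, 0, 1, protocol, 0,
        bytes([10, 0, 0, 1]), bytes([224, 0, 0, 5]))  # 224.0.0.5 = AllSPFRouters

print(border_switch_action(make_header(89)))  # controller
print(border_switch_action(make_header(6)))   # flow table (TCP)
```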

So interworking with legacy networks is not an insoluble problem. It does require some thought, planning and intelligence but these are skills that you need in legacy networks anyway!


Wednesday 18 September 2013

Don't buy this USB to Ethernet dongle!

In one of my first posts on how to build an OpenFlow switch using a Raspberry Pi, I suggested buying some USB to Ethernet adaptors to overcome the Pi's limitation of having only one Ethernet port; you need more than one Ethernet port to do any switching!

The USB dongle I pointed to is, however, rubbish. Do not buy it. The reason is that the company making these in China gives every unit the same MAC address! How stupid is that?

[UPDATE: I guess I'll eat my words. You may as well buy this dongle. I bought a more expensive dongle to see if it was better.  This time it was a white box with a USB fly lead.  It looks totally different on the outside, but it is identical on the inside to the much cheaper blue one: same chipset and same MAC address. So I've done another search to explore yet more alternatives. I paid £3 for the blue ones and £6 for the white one, and the next clearly different alternative dongle is £20.  I can't see the point in spending £20 on one of these more expensive ones when the £3 ones work, or rather can be made to work. See below for the work-around.]

It took me a while to figure out why my OpenFlow switch wasn't working the way I wanted.  First rule: never make assumptions.  I assumed the MAC addresses would be different. Wrong.

eth1      Link encap:Ethernet  HWaddr 00:e0:4c:53:44:58 
          UP BROADCAST MULTICAST  MTU:1500  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:0 (0.0 B)  TX bytes:0 (0.0 B)

eth2      Link encap:Ethernet  HWaddr 00:e0:4c:53:44:58          <=== SAME!!! Ughhh
          UP BROADCAST MULTICAST  MTU:1500  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:0 (0.0 B)  TX bytes:0 (0.0 B)

Above is a dump from ifconfig.  Note that eth1 and eth2 have the same MAC address, 00:e0:4c:53:44:58.
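
Rather than relying on eyeballing ifconfig output, a short Python sketch can flag interfaces that share a MAC address. On Linux each interface's MAC address is exposed in /sys/class/net/<interface>/address; the read_macs helper assumes that layout, and the find_duplicate_macs check works on any dictionary of interface names to addresses. Both helper names are my own, for illustration.

```python
import os
from collections import defaultdict

def read_macs(sys_net="/sys/class/net"):
    """Read {interface: MAC address} from Linux sysfs (assumes a Linux host)."""
    macs = {}
    for iface in sorted(os.listdir(sys_net)):
        path = os.path.join(sys_net, iface, "address")
        if not os.path.isfile(path):
            continue
        with open(path) as f:
            mac = f.read().strip().lower()
        if mac and mac != "00:00:00:00:00:00":  # skip loopback-style all-zero addresses
            macs[iface] = mac
    return macs

def find_duplicate_macs(macs):
    """Return {mac: [interfaces]} for every MAC address used by more than one interface."""
    by_mac = defaultdict(list)
    for iface, mac in macs.items():
        by_mac[mac].append(iface)
    return {mac: ifaces for mac, ifaces in by_mac.items() if len(ifaces) > 1}

# Example using the addresses from the ifconfig dump above
example = {
    "eth1": "00:e0:4c:53:44:58",
    "eth2": "00:e0:4c:53:44:58",
}
print(find_duplicate_macs(example))  # {'00:e0:4c:53:44:58': ['eth1', 'eth2']}
```

On the Pi itself, print(find_duplicate_macs(read_macs())) should report eth1 and eth2 until one of them has its address changed.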

Several other people on the Internet have made the same discovery!

If you have bought one of these, the good news is that it is possible to "fix" the problem. On the Raspberry Pi, type:

pi@raspberrypi ~/LINC-Switch $ sudo ifconfig eth2 down
pi@raspberrypi ~/LINC-Switch $ sudo ifconfig eth2 hw ether 00:e0:4c:53:44:59   <-Different MAC
pi@raspberrypi ~/LINC-Switch $ sudo ifconfig eth2 up
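
One caveat with the fix above: 00:e0:4c:53:44:59 is still inside the dongle manufacturer's address block, so in principle it could clash with another real device. A safer choice is a locally administered address, which sets the 0x02 bit of the first octet (and keeps the 0x01 bit clear so the address stays unicast). The Python sketch below derives such an address from the original one; the locally_administered helper and the use of an index in the last octet are an alternative suggestion, not part of the original work-around.

```python
def locally_administered(mac: str, index: int) -> str:
    """Derive a locally administered, unicast MAC address from an existing one.

    Sets the "locally administered" bit (0x02) and clears the multicast bit
    (0x01) in the first octet, then puts `index` in the last octet so each
    interface gets a distinct address.
    """
    octets = [int(part, 16) for part in mac.split(":")]
    if len(octets) != 6:
        raise ValueError(f"not a MAC address: {mac}")
    octets[0] = (octets[0] | 0x02) & ~0x01 & 0xFF
    octets[5] = index & 0xFF
    return ":".join(f"{o:02x}" for o in octets)

print(locally_administered("00:e0:4c:53:44:58", 1))  # 02:e0:4c:53:44:01
print(locally_administered("00:e0:4c:53:44:58", 2))  # 02:e0:4c:53:44:02
```

The resulting address can be used in the same ifconfig eth2 hw ether command shown above.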

Also, don't buy the WiFi dongle I recommended based on the RTL8188CUS chipset.  Although this dongle works, it is a pain to get it supporting AP mode on the Raspberry Pi. I wasted an hour and then switched to an RT5370-based one, which I got working in AP mode in under 5 minutes, and they are cheaper too (approx £4).

I've just ordered a different type of USB to Ethernet adaptor to see whether this has the same problem! [UPDATE: Which it did.....]

Thursday 12 September 2013

OpenFlow Switch on Raspberry Pi Part 5: First simple experiment

This is part 5 in the series of building an OpenFlow switch on the Raspberry Pi.

In part 4 we set up Ryu as an L2 switch, and it applied flow rules to the LINC switch. The traffic that triggered the rules came in on port eth0, which is effectively the control port since it connects the Raspberry Pi to my network and ultimately to the Ryu controller. The flows were therefore applied as a result of noise and chatter on the local LAN.

For our first very simple experiment we need to have a more controlled environment so let's modify the LINC config so that eth0 is solely to connect the switch to the controller.

To shut down LINC:

(linc@raspberrypi)2> init:stop().

Let's edit the LINC config file

sudo vi /home/pi/LINC-Switch/rel/linc/releases/1.0/sys.config

       {ports,
        [
         %% - regular hardware interface
         {port, 1, [{interface, "eth1"}]},
         {port, 2, [{interface, "eth2"}]},
         %% {port, 4, [{interface, "eth0"}]}
         {port, 3, [{interface, "wlan0"}]}
         %% - hardware interface with explicit type

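Comment out port 4. (I also moved it above port 3: Erlang doesn't allow a trailing comma after the last element of a list, so commenting out port 4 in its original last position would leave port 3's entry with a trailing comma and the config file would fail to load.)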

Restart the switch

pi@raspberrypi ~/LINC-Switch $ sudo rel/linc/bin/linc console

So I now have 3 ports for traffic on the Pi.  I need a traffic source, so I connected a laptop to eth1 with a cable. There is no DHCP server on this link, so the laptop won't be assigned an IP address; you will need to give it a static IP address. I set mine to 192.168.1.130/24 with a default gateway of 192.168.1.1.

So the controller spots the laptop

>  installing new source mac received from port 1

If we now look at the flow tables on the switch, we can see in more detail what's happening.

Let's view LINC's flow table:

(linc@raspberrypi)1> ets:tab2list(linc:lookup(0, flow_table_0)).
[{flow_entry,{0,#Ref<0.0.0.374>},
             0,
             {ofp_match,[]},
             <<0,0,0,0,0,0,0,0>>,
             [],
             {1370,616692,527826},
             {infinity,0,0},
             {infinity,0,0},
             [{ofp_instruction_write_actions,4,
                                             [{ofp_action_output,16,controller,65535}]}]},
 {flow_entry,{123,#Ref<0.0.0.381>},
             123,
             {ofp_match,[{ofp_field,openflow_basic,in_port,false,
                                    <<0,0,0,1>>,
                                    undefined},
                         {ofp_field,openflow_basic,eth_src,false,
                                    <<32,207,48,0,192,96>>,
                                    undefined}]},
             <<0,0,0,0,0,0,0,0>>,
             [],
             {1370,616842,124920},
             {infinity,0,0},
             {infinity,0,0},
             [{ofp_instruction_goto_table,6,1}]}]

Look at this part of the second flow entry:

{ofp_field,openflow_basic,eth_src,false,
                                    <<32,207,48,0,192,96>>,
                                    undefined}]},

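This is the laptop's MAC address written as decimal bytes; in the usual hex notation it is 20:CF:30:00:C0:60. So the second entry (priority 123) matches packets arriving on port 1 from the laptop and sends them on to flow table 1 (goto_table), while the first entry (priority 0, empty match) is the table-miss entry that sends everything else to the controller.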
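
LINC prints MAC addresses as Erlang binaries with each byte in decimal. A few lines of Python convert that form into the usual colon-separated hex notation; the erlang_binary_to_mac helper is my own name, and it assumes the simple <<n,n,n,...>> form shown in the flow table dump.

```python
def erlang_binary_to_mac(binary_text: str) -> str:
    """Convert an Erlang binary such as '<<32,207,48,0,192,96>>' to hex MAC notation."""
    values = [int(v) for v in binary_text.strip("<> \n").split(",")]
    if len(values) != 6 or not all(0 <= v <= 255 for v in values):
        raise ValueError(f"expected 6 bytes, got {binary_text!r}")
    return ":".join(f"{v:02X}" for v in values)

print(erlang_binary_to_mac("<<32,207,48,0,192,96>>"))  # 20:CF:30:00:C0:60
```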

(linc@raspberrypi)1> ets:tab2list(linc:lookup(0, linc_ports)).
[{linc_port,1,<0.164.0>},
 {linc_port,2,<0.161.0>},
 {linc_port,3,<0.157.0>}]

OK. LINC isn't the most user friendly if you are a network engineer.  There are plans to improve this and adopt a more familiar user interface like Cisco IOS.

Right. Let's stop ryu and install a really simple controller configuration to show how things work.

ryu is written in Python.  I have to admit it's taking me a while to get used to Python syntax, having used C, PHP and other languages that use {} to delimit function bodies.  Python instead uses indentation (spaces or tabs) to mark which lines belong to a function or block! Seems crazy to me, but that's how it's done.

## Simple ryu layer 2 hub
## All packets arriving at the OpenFlow switch are passed to the controller
## The controller simply floods all incoming messages out of all ports on the switch
## You would never do this in reality!
## No flows are installed on the switch to remember how to handle packets
## Note: in OpenFlow 1.3, unmatched packets only reach the controller via a
## table-miss flow entry (like the priority 0 entry in the LINC flow table
## shown earlier); this app relies on that entry already being on the switch

from ryu.base import app_manager
from ryu.controller import ofp_event
from ryu.controller.handler import MAIN_DISPATCHER
from ryu.controller.handler import set_ev_cls
from ryu.ofproto import ofproto_v1_3

class L2Switch(app_manager.RyuApp):
    ## LINC negotiates OpenFlow 1.3 (0x04), so ask ryu to use that version
    OFP_VERSIONS = [ofproto_v1_3.OFP_VERSION]

    def __init__(self, *args, **kwargs):
        super(L2Switch, self).__init__(*args, **kwargs)

    ## The set_ev_cls decorator does all the work. Incoming packets arrive as EventOFPPacketIn
    ## packet_in_handler is called each time the switch sends a packet to the controller
    @set_ev_cls(ofp_event.EventOFPPacketIn, MAIN_DISPATCHER)
    def packet_in_handler(self, ev):
        ## ev.msg represents the packet_in message
        msg = ev.msg
        ## msg.datapath represents the switch that sent it
        dp = msg.datapath
        ## dp.ofproto is the OpenFlow protocol version negotiated with the switch
        ofp = dp.ofproto
        ofp_parser = dp.ofproto_parser
        ## In OpenFlow 1.3 the ingress port is carried in the match
        ## (the OpenFlow 1.0 msg.in_port attribute doesn't exist here)
        in_port = msg.match['in_port']

        ## OFPActionOutput(port) says which port the packet should be sent out of
        ## OFPP_FLOOD means all ports except the one the packet arrived on
        actions = [ofp_parser.OFPActionOutput(ofp.OFPP_FLOOD)]

        ## If the switch didn't buffer the packet, the packet data must be sent back
        data = None
        if msg.buffer_id == ofp.OFP_NO_BUFFER:
            data = msg.data

        ## Build the packet to send using OFPPacketOut
        out = ofp_parser.OFPPacketOut(
            datapath=dp, buffer_id=msg.buffer_id, in_port=in_port,
            actions=actions, data=data)
        ## Send the built packet
        dp.send_msg(out)

You would never actually use OpenFlow like this. Here's what it does.

A packet arrives at the switch.  The switch checks which rules (flows) have been defined for the arriving packet. The controller hasn't installed any flows that match it, so the switch refers the packet to the ryu controller.

ryu receives the packet over the OpenFlow protocol and dissects it, and the programme above tells ryu how to process it.

The function packet_in_handler is called.

The key line here is
actions = [ofp_parser.OFPActionOutput(ofp.OFPP_FLOOD)]

This tells the switch to send the arriving packet out of all interfaces except the one it arrived on.  We are building a hub, and that is exactly what a hub does: it floods arriving packets to all ports. The final line, dp.send_msg(out), sends this instruction back to the switch.

Now, we would never do this in reality, since it is a massive overhead. The switch will copy every single packet to the controller asking what to do with it.  The switch never learns anything!

Given that in a real network the controller may be remote from the switch, you can see this would introduce massive latency and massive traffic duplication!
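
The difference between the hub above and a switch that learns can be shown with a small pure-Python model (no ryu involved). The hub's decision is always "every port except the one the packet came in on"; a learning switch records which port each source MAC address was seen on and only floods when the destination is still unknown. The port numbers, the server's MAC address and the function and class names are made up; this is a sketch of the logic only, not ryu code.

```python
ALL_PORTS = [1, 2, 3]  # hypothetical switch with three ports, as on the Pi

def hub_decide(in_port, src, dst):
    """A hub floods every packet out of every port except the ingress port."""
    return [p for p in ALL_PORTS if p != in_port]

class LearningSwitch:
    def __init__(self):
        self.mac_to_port = {}  # learned source MAC -> port it was seen on

    def decide(self, in_port, src, dst):
        self.mac_to_port[src] = in_port                 # learn where src lives
        if dst in self.mac_to_port:
            return [self.mac_to_port[dst]]              # known destination: one port
        return [p for p in ALL_PORTS if p != in_port]   # unknown: flood like a hub

laptop, server = "20:cf:30:00:c0:60", "aa:bb:cc:dd:ee:ff"
sw = LearningSwitch()
print(hub_decide(1, laptop, server))   # [2, 3]
print(sw.decide(1, laptop, server))    # [2, 3]  server not learned yet, so flood
print(sw.decide(2, server, laptop))    # [1]     laptop was learned on port 1
print(sw.decide(1, laptop, server))    # [2]     server now learned on port 2
```

In OpenFlow terms, the learning step is where a controller would also install a flow on the switch, so later packets no longer need to visit the controller at all.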

The point of this is really to show the logic of how OpenFlow works.

We can run ryu with more verbose logging to see more about what it is doing

ryu-manager --verbose l2hub.py 

Here's what it comes back with:

loading app l2hub.py
loading app ryu.controller.ofp_handler
instantiating app l2hub.py
instantiating app ryu.controller.ofp_handler
BRICK ofp_event
  PROVIDES EventOFPPacketIn TO {'L2Switch': ['main']}
  CONSUMES EventOFPEchoRequest
  CONSUMES EventOFPErrorMsg
  CONSUMES EventOFPHello
  CONSUMES EventOFPSwitchFeatures
BRICK L2Switch
  CONSUMES EventOFPPacketIn
connected socket:<socket fileno=4 sock=192.168.1.4:6633 peer=192.168.1.15:45743> address:('192.168.1.15', 45743)
hello ev <ryu.controller.ofp_event.EventOFPHello object at 0xf7e510>
move onto config mode
switch features ev version: 0x4 msg_type 0x6 xid 0x70f534ec
move onto main mode
EVENT ofp_event->L2Switch EventOFPPacketIn
Ignore the reference to L2Switch; this is a hub. The name comes from the class declaration at the beginning of the example I copied.

You can see ryu initialising; then the Raspberry Pi OpenFlow switch at 192.168.1.15 connects to it.
They negotiate the OF1.3 protocol (version 0x04).

The Raspberry Pi will report it has also connected to the controller.

16:07:49.296 [info] Connected to controller 192.168.1.4:6633/0 using OFP v4

Now, if I start a ping to some address on the laptop connected to the Raspberry Pi, each packet will be forwarded to the controller. In the controller window you'll see an event for each packet in the verbose log:

EVENT ofp_event->L2Switch EventOFPPacketIn
In the next post I'll evolve our simple hub so that it at least doesn't flood out of every port.

Wednesday 4 September 2013

OpenFlow security - new exploits?

The past few weeks have seen several high-profile DNS exploits.  Hackers have altered DNS entries to route traffic elsewhere, either to an unrelated site or to a fake site.  Typically, companies discover this when the traffic to their website disappears and their web servers are sitting there idle.

More sophisticated exploits would leak only some of the traffic to an alternative site, so they are less likely to be detected through traffic anomalies.

So what has this got to do with OpenFlow?  Well, OpenFlow has the potential to abstract routing so that IP addresses are mobile and traffic can be routed programmatically. This is not a million miles from the DNS hack: it would be possible to move traffic destined for a particular valid IP address to another location. In other words, the network itself could act as the man in the middle and divert traffic to another server.

This idea isn't new, since the same can happen in today's IP networks through route injection, but OpenFlow concepts make it a simpler task.

So how do we prevent this? OpenFlow has some basic protections in place, such as secure connections between the controller and the switch; however, the logic of how a network behaves is set at the application level on the controller. The challenge, as OpenFlow networks become more widespread, is to ensure that applications sitting on the controller can be trusted and are doing what we expect.  Imagine a world where the applications installed on the controller have a virus or are simply malicious and are taking rogue actions. How can we detect this? How can we prevent this?
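
As one illustration of the "how can we detect this?" question, here is a Python sketch of an audit that looks through installed flow rules for actions that rewrite the destination IP address, and flags any that aren't on an approved list. The flow representation (plain dictionaries with match and actions), the approved list and the suspicious_flows function are all hypothetical; a real audit would pull flows from the controller or switches and would need a far richer policy.

```python
# Hypothetical, simplified flow representation: each flow has a match and a list
# of actions; ("set_field", field, value) rewrites a header field.
FLOWS = [
    {"id": 1, "match": {"ipv4_dst": "203.0.113.10"},
     "actions": [("output", 2)]},
    {"id": 2, "match": {"ipv4_dst": "203.0.113.10"},
     "actions": [("set_field", "ipv4_dst", "198.51.100.66"), ("output", 4)]},
]

# Destination rewrites the administrators have explicitly approved
# (original destination, rewritten destination), e.g. for a load balancer.
APPROVED_REWRITES = {("203.0.113.20", "10.0.0.20")}

def suspicious_flows(flows, approved_rewrites):
    """Return ids of flows that rewrite the IPv4 destination without approval."""
    flagged = []
    for flow in flows:
        original_dst = flow["match"].get("ipv4_dst")
        for action in flow["actions"]:
            if action[0] == "set_field" and action[1] == "ipv4_dst":
                if (original_dst, action[2]) not in approved_rewrites:
                    flagged.append(flow["id"])
                    break
    return flagged

print(suspicious_flows(FLOWS, APPROVED_REWRITES))  # [2]
```

This only catches one obvious pattern; an application that leaks a fraction of traffic through a more subtle combination of rules would need a correspondingly more careful check.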

With the controller exposing north bound interfaces elsewhere, the need for trust from "controllers of controllers" needs to be established.

These are not real risks today, since any OpenFlow network is likely to be closed, secure and tightly controlled by the network administrators, but this is definitely something that could emerge as a real threat within the next 5 years.

Tuesday 30 July 2013

OpenFlow in the Optical Domain

There's a lot of noise about using OpenFlow in the optical domain. For example Carrier SDN – The ONF’s View.

I'm not 100% convinced about the merits of OpenFlow in the optical space.

Here's a video where Sten Nordell, Transmode's CTO, is talking about SDN in the optical space.


I think he does a reasonable job of weighing up the options and getting back to basics.

I guess the reasons I am at least posing the question of why OpenFlow might not work in the optical space are:

1/ There are already intelligent networking protocols for optical networks, e.g. G.ASON. I worked for Sycamore, which pioneered this and standardised it, yet few optical networking companies embraced it for "optical dial-tone".  If optical companies didn't embrace it historically, what is different this time round?

2/ Optical pipes are pipes. There is no in-band signalling channel anyway, and they don't naturally lend themselves to "MAC inspection".

I'm open to a debate on this at least on the transport side. For the optical switching layer, particularly within data centres, I can see how OpenFlow & SDN concepts can work.