Setting Up a Ceph Filesystem with Kubernetes on DigitalOcean

Recently my company created an application for managing 3D printing projects, profiles, and slices. Check it out at layerkeep.com.

We wanted users to be able to keep track of all their file revisions and also be able to manage the files without having to go through the browser. To accomplish this, we decided to use Git which meant we needed a scalable filesystem.

The first thing we did is setup a Kubernetes cluster on DigitalOcean.

Currently DigitalOcean only provides Volumes that are ReadWriteOnce. Since we have multiple services that need access to the files (api, nginx, slicers), we needed to be able to mount the same volume with ReadWriteMany.

I decided to try s3fs with DigitalOcean Spaces since they are S3-compatible object stores.  I setup the CSI from https://github.com/ctrox/csi-s3. I tried both the s3fs and goofys mounter. Both worked and both were way too slow.  Most of our APIs require accessing the filesystem multiple times and each access took between 3-15 seconds so I moved on to Ceph.

Ceph Preparation:

There is a great storage manager called Rook (https://rook.github.io/) that can be used to deploy many different storage providers to Kubernetes.
** Kubernetes on DigitalOcean doesn’t support FlexVolumes so you need to use CSI instead.

Hardware Requirements.
You can check the ceph docs to see what you might need. http://docs.ceph.com/docs/jewel/start/hardware-recommendations/#minimum-hardware-recommendations

Create the Kubernetes Cluster

Follow the directions here to create the cluster: https://www.digitalocean.com/docs/kubernetes/how-to/create-clusters/

** Initially I tried a 3 node pool each with 1 CPU and 2GB of memory but it wasn’t enough. It needed more CPU on startup. I changed each node to have 2 CPUs and 2 GBs of memory which worked.

We’ll make sure to keep all ceph services constrained to this pool by naming it “storage-pool” (or whatever name you want) and adding a node affinity using that name later.

Cluster Access

Make sure you followed DO directions to accessing the Cluster with kubectl. (https://www.digitalocean.com/docs/kubernetes/how-to/connect-to-cluster/)

You also might want to add a Kubernetes Dashboard. (https://github.com/kubernetes/dashboard)

SSH:
Right now it doesn’t look like you can ssh into the droplets that DigitalOcean creates when you create a node pool. I wanted to have access just in case so I went to the droplets section and reset the root password for each of them. I was then able to add my ssh key and disable root login. I recommend doing this before adding any services.

Create Volumes
Go to the Volumes section in DigitalOcean dashboard. We want to create a volume for each node in the node pool we just created. Don’t format it. Then attach it to the correct droplet. Remember that volumes can only be increased in size (not decreased) without having to create a new one.

Create the Ceph Cluster

Clone down the Rook repository or just copy down the ceph directory from: https://github.com/rook/rook/tree/release-1.0/cluster/examples/kubernetes/ceph

cd cluster/examples/kubernetes/ceph

Modify the cluster.yaml file.

This is where we’ll add the node affinity to run the ceph cluster only on nodes with the “storage-pool” name.

placement:
  all:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
        - key: doks.digitalocean.com/node-pool
          operator: In
          values:
          - storage-pool
    podAffinity:
    podAntiAffinity:
    tolerations:
    - key: storage-pool
      operator: Exists

 

There are also other configs that are commented out that you might need to change. For example, if your disks are smaller than 100 GB you’ll need to uncomment the ‘databaseSizeMB: “1024”‘.

Modify the filesystem.yaml file if you want. (Filesystem Design)
Once you’re done configuring you can run:


kubectl apply -f ceph/common.yaml
kubectl apply -f ceph/csi/rbac/cephfs/
kubectl apply -f ceph/filesystem.yaml
kubectl apply -f ceph/operator-with-csi.yaml
kubectl apply -f ceph/cluster.yaml

If you want the ceph dashboard you can run:
kubectl apply -f ceph/dashboard-external-https.yaml

Your operator should create your cluster. You should see 3 managers, 3 monitors, and 3 osds. Check here for issues: https://rook.github.io/docs/rook/master/ceph-common-issues.html

Deploy the CSI

https://rook.github.io/docs/rook/master/ceph-csi-drivers.html

We need to create a secret to give the provisioner permission to create the volumes.

To get the adminKey we need to exec into the operator pod. We can print it out in one line with:

POD_NAME=$(kubectl get pods -n rook-ceph | grep rook-ceph-operator | awk '{print $1;}'); kubectl exec -it $POD_NAME -n rook-ceph ceph auth get-key client.admin

Create a secret.yaml file:

apiVersion: v1
kind: Secret
metadata:
  name: csi-cephfs-secret
  namespace: default
data:
  # Required if provisionVolume is set to true
  adminID: admin
  adminKey: {{ PUT THE RESULT FROM LAST COMMAND }}

Create the CephFS StorageClass.

We’ll need to modify the example storageclass in ceph/csi/example/cephfs/storageclass.yaml.

The storageclass.yaml file should look like:

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: csi-cephfs
provisioner: cephfs.csi.ceph.com
parameters:
  # Comma separated list of Ceph monitors
  # if using FQDN, make sure csi plugin's dns policy is appropriate.
  monitors: rook-ceph-mon-a.rook-ceph:6789,rook-ceph-mon-b.rook-ceph:6789,rook-ceph-mon-c.rook-ceph:6789

  # For provisionVolume: "true":
  # A new volume will be created along with a new Ceph user.
  # Requires admin credentials (adminID, adminKey).
  # For provisionVolume: "false":
  # It is assumed the volume already exists and the user is expected
  # to provide path to that volume (rootPath) and user credentials (userID, userKey).
  provisionVolume: "true"

  # Ceph pool into which the volume shall be created
  # Required for provisionVolume: "true"
  pool: myfs-data0

  # The secrets have to contain user and/or Ceph admin credentials.
  csi.storage.k8s.io/provisioner-secret-name: csi-cephfs-secret
  csi.storage.k8s.io/provisioner-secret-namespace: default
  csi.storage.k8s.io/node-stage-secret-name: csi-cephfs-secret
  csi.storage.k8s.io/node-stage-secret-namespace: default

reclaimPolicy: Retain
allowVolumeExpansion: true

Change the storage class name to whatever you want.

*If you changed metadata.name in filesystem.yaml to something other than “myfs” then make sure you update the pool name here.

Create the PVC:

Remember that Persistent Volume Claims are accessible only from within the same namespace.

kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: my-pv-claim
spec:
  storageClassName: csi-cephfs
  accessModes:
  - ReadWriteMany
  resources:
    requests:
      storage: 10Gi

Use the Storage

Now you can mount your volume using the persistent volume claim you just created in your Kubernetes resource.  An example Deployment is:

apiVersion: apps/v1
kind: Deployment
metadata:
 name: webserver
 namespace: default
 labels:
   k8s-app: webserver
spec:
 replicas: 2
 selector:
   matchLabels:
     k8s-app: webserver
 template:
   metadata:
     labels:
       k8s-app: webserver
   spec:
     containers:
     - name: web-server
       image: nginx
       volumeMounts:
       - name: my-persistent-storage
         mountPath: /var/www/assets
     volumes:
     - name: my-persistent-storage
       persistentVolumeClaim:
         claimName: my-pv-claim

 

Both deployment replicas will have access to the same data inside /var/www/assets.

ADDITIONAL TOOLS

You can also test and debug the filesystem using the Rook toolbox.  (https://rook.io/docs/rook/v1.0/ceph-toolbox.html).

First start the toolbox with:  kubectl apply -f ceph/toolbox.yaml

Shell into the pod.

TOOL_POD=$(kubectl get pods -n rook-ceph | grep tools | head -n 1 | awk '{print $1;}'); kubectl exec -it $TOOL_POD -n rook-ceph /bin/bash

Run Ceph commands:  http://docs.ceph.com/docs/giant/rados/operations/control/

Validate the filesystem is working by mounting it directly into the toolbox pod.
From: https://rook.io/docs/rook/v1.0/direct-tools.html

 

# Create the directory
mkdir /tmp/registry

# Detect the mon endpoints and the user secret for the connection
mon_endpoints=$(grep mon_host /etc/ceph/ceph.conf | awk '{print $3}')
my_secret=$(grep key /etc/ceph/keyring | awk '{print $3}')

# Mount the file system
mount -t ceph -o mds_namespace=myfs,name=admin,secret=$my_secret $mon_endpoints:/ /tmp/registry

# See your mounted file system
df -h

Try writing and reading a file to the shared file system.

echo "Hello Rook" > /tmp/registry/hello
cat /tmp/registry/hello

# delete the file when you're done
rm -f /tmp/registry/hello

Unmount the Filesystem

To unmount the shared file system from the toolbox pod:

umount /tmp/registry
rmdir /tmp/registry

No data will be deleted by unmounting the file system.

Monitoring

Now that everything is working you should add monitoring and alerts.

You can add the Ceph dashboard and/or Prometheus/Grafana to monitor your filesystem.
http://docs.ceph.com/docs/master/mgr/dashboard/
https://github.com/rook/rook/blob/master/Documentation/ceph-monitoring.md

Include System Libraries Using Swift Package Manager Or CocoaPods

I’m currently using swift package manager to build a framework for an iOS project.  Why? Because I like the clean, modular approach, no need to have an Xcode project and I find it faster for CI testing.

Unfortunately swift package manager doesn’t work for the iOS project.  And since my team is already familiar with CocoaPods that is what the iOS project is using.

Now I will explain how I included a system library inside a Swift framework that is then used in an iOS project with CocoaPods.

I’ll show you how I setup CommonCrypto to use Swift Package Manager and CocoaPods.

For Swift Package Manager:

  • A. Create a git repo for the swift package wrapper around the system library.   
    1. Add a module.modulemap file to the repo and add the system header you are wrapping.
      module CCommonCrypto [system] {
      header "/usr/include/CommonCrypto/CommonCrypto.h"
      export *
      }
    2.  Add Package.swift file to repo
      import PackageDescription
      let package = Package(
      name: "CCommonCrypto"
      )
    3.  Commit the files and add a tag to them (Swift Packages need a tag to be used)
      git tag 0.0.1
      git push origin 0.0.1
      (you can overwrite previous tag by using the "-f" flag on both of those commands)
  • B. Create another repo for the Swift Package that will use the system package wrapper.
    1. Create the Package.swift file and add the repo created in step 1 as a dependency.

      import PackageDescription
      let package = Package(
      name: "CommonCrypto",
      targets: [
      Target(name: "CommonCrypto"),
      ],
      dependencies: [
      .Package(url: "https://github.com/kmussel/ccommoncrypto.git", "0.0.1"),
      ]
      )
    2. Add the “Sources” folder and a subfolder named the same thing you named your target in the Package.swift file
      Sources -> CommonCrypto
    3. Inside the subfolder (CommonCrypto in this case) add all your swift files that you want to use.
      1.  In each file that you want to use the dependency package import that package using the name from its Package.swift file
        import CCommonCrypto

Now you can create another Swift Package and include Repo 2 as a dependency and when you run “swift build” it will download and include the dependencies

You can also run “swift package generate-xcodeproj” if you want to use the Xcode project.

Getting the Swift Package to Work with CocoaPods:

  • C.  Repo 1 is not needed.  We just need to update Repo 2 with necessary CocoaPod files.
    1.  Add another subfolder under the Subfolder we created in Step B.2. The name of the subfolder should match the name of the Package from step A.2.  This is so CocoaPods will recognize the import statement from step B.3.1.
    2. Copy the module.modulemap file from step A.1. into this subfolder.
    3. Create another subfolder in the root directory.  It can be named anything you want but I called it “CocoaPods” since it is only used for that.

    4. Inside the “CocoaPods” directory you will create multiple subfolders for each target you want this package to target.
      ie. iphoneos, iphonesimulator, macosx, etc

    5. Create the files module.map under each of the subfolders you just created pointing to the correct system headers
      CocoaPods -> iphoneos -> module.map
      module CCommonCrypto [system] {
      header "/Applications/Xcode.app/Contents/Developer/Platforms/iPhoneOS.platform/Developer/SDKs/iPhoneOS10.1.sdk/usr/include/CommonCrypto/CommonCrypto.h"
      export *
      }
    6. Create the Podspec file for it
      1. Our source_files will just be the .swift files inside the Sources directory.  These files are added to the Xcode project.
        s.source_files = "Sources/**/*.swift"
      2. Normally any file not in the source_files will be removed but we need the project to be able to access our “CocoaPods” directory to know which files to include.  To keep the “CocoaPods” directory without adding it to the project we use the “preserve_paths” command to keep the “CocoaPods” directory.
        s.preserve_paths = 'CocoaPods/**/*'
      3. We then tell Xcode where the include paths are for each sdk.  CocoaPods installs it in the PODS_ROOT directory and under the subdirectory named the same as the name of this Podspec.
        s.pod_target_xcconfig = {
        'SWIFT_INCLUDE_PATHS[sdk=iphoneos*]' => '$(PODS_ROOT)/CommonCrypto/CocoaPods/iphoneos',
        'SWIFT_INCLUDE_PATHS[sdk=iphonesimulator*]' => '$(PODS_ROOT)/CommonCrypto/CocoaPods/iphonesimulator',
        'SWIFT_INCLUDE_PATHS[sdk=macosx*]' => '$(PODS_ROOT)/CommonCrypto/CocoaPods/macosx'
        }
      4. The next thing is that the header file in each of the module.map files probably won’t be the same for each user.  We need to change it for the user when they install via pod install.  We create a script to do this and execute it using the prepare_command in cocoa pods.  For example, my path is “/Applications/Xcode-beta.app/Contents/Developer/” so this script replaces the default to that.  I grabbed the script from this site. But should probably should modify the script to handle sdk versions too.
        s.prepare_command = <<-CMD
        ./CocoaPods/injectXcodePath.sh
        CMD

 Now you can include Repo 2 in the Podfile of your iOS project.

The entire Podspec:

You can see the entire CommonCrypto example here:

https://github.com/kmussel/commoncrypto
https://github.com/kmussel/ccommoncrypto

Google Universal Analytics and Tag Manager with Enhanced Ecommerce

Google Analytics! It used to be a simple add this snippet of code to your page and you’re finished. Now depending on the options you want there could be a lot more work to do. You could be using classic google analytics or universal analytics.  You could be using the ecommerce plugin or enhanced ecommerce or none.  You could be using the data layer or macros. You could be using any combination of those with google tag manager.  And depending on which combination you use you will have to code it differently.

I’ll show you how I setup analytics using Universal Analytics and Tag Manger with Enhanced Ecommerce and the Data Layer.

Finding the correct documentation for the analytics combination I’m using was frustrating.  Here is a list of the docs that were helpful to me.

First off if you haven’t used Google Tag Manager I would read:
Getting Started and
How It Works

Basically once the javascript snippet is deployed to the site, it allows non-developers to manage what data they want to collect from the site without involving the developers.

For example.   Let’s say a website has a button to login.  This button has an ID of “login-btn”.  Now a user can use the tag manager to add a tag called “Log In” with a rule that when a user clicks on an element with an ID of “login-btn” it will fire the tag.  The rule would look like this: {{element id}} equals login-btn.  Depending on the type of tag you use you might also need to add to the rule {{event}} equals gtm.click.

Now your site will start collecting data every time a user clicks on the login button without your developer having to make any changes to the code.

** Once you cross over to the Ecommerce world, a developer is going to be required.

The Basic Steps:

Google Tag Manager

The data that is used by the Tag Manager and then sent to google analytics is retrieved from the Data Layer or Macros.   The recommended approach is to use the Data Layer.

I recommend going over the development docs if you haven’t already. https://developers.google.com/tag-manager/devguide

In their docs they mention two ways to populate the data layer and fire the tags:

  1. Declare all needed information in the data layer above the container snippet
  2. Use HTML Event Handlers

If you have the traditional Multi Page Application you most likely will use option 1 .  If you have a Single Page Application (SPA) you will need to use option 2.   Personally I’ve been using this on a SPA with Angular so I never used option 1.

If you are using option 1, you would create a tag in Google Tag Manager with these attributes:

Tag type : Universal Analytics
Track type : Pageview
Enable Enhanced Ecommerce Features: true
Use Data Layer: true
Basic Settings – Document Path: {{url path}}
Firing Rule: {{event}} equals gtm.js

In this case you would make sure the all the data was in the dataLayer before the snippet.

dataLayer = [{
 'ecommerce': {
  'impressions': [{
   'name': productObj.name,                      // Name or ID is required.
   'id': productObj.id,
   'price': productObj.price,
   'brand': productObj.brand,
   'category': productObj.cat,
   'variant': productObj.variant,
   'list': 'Search Results',
   'position': 1
  }]
 }
}];

For option 2 you would create a tag like this:

Tag type : Universal Analytics
Track type : Event
Event Category: Ecommerce
Event Action: Product Click
Enable Enhanced Ecommerce Features: true
Use Data Layer: true
Basic Settings – Document Path: {{url path}}
Firing Rule: {{event}} equals productClick

Then in your javascript you would send the data by pushing it to the dataLayer object like this:

dataLayer.push({
'event': 'productClick',
'ecommerce': {
  'click': {
    'actionField': {'list': 'Search Results'},      // Optional list property.
    'products': [{
      'name': productObj.name,                      // Name or ID is required.
      'id': productObj.id,
      'price': productObj.price,
      'brand': productObj.brand,
      'category': productObj.cat,
      'variant': productObj.variant
    }]
  }
}
});


I recommend using a generic event to help keep things manageable. I’m using the angular javascript framework with angulartics (http://luisfarzati.github.io/angulartics/).  The readme here (https://github.com/luisfarzati/angulartics#for-google-tag-manager)  explains what tags, rules and macros need to be setup.  Even if you aren’t using angular the setup is the same for a generic event.

Also for both of those tags, make sure you check both “Enable Enhanced Ecommerce Features”  and  “Use Data Layer”.

** If you are using angulartics you’ll need to make sure it handles ecommerce. You just need to make sure one of the top level keys is ‘ecommerce’.   I just over wrote the module and did this:

$analyticsProvider.registerEventTrack(function(action, properties){
  var dataLayer = window.dataLayer = window.dataLayer || [];
  var data = {
    'event': 'interaction',
    'target': properties.category,
    'action': action,
    'target-properties': properties.label,
    'value': properties.value,
    'interaction-type': properties.noninteraction
  };
  if(properties.ecommerce)
    data['ecommerce'] = properties.ecommerce;

  dataLayer.push(data);
});

Make sure you look at the docs for google tag enhanced ecommerce.  It will show you what attributes to use when you push your data to the data layer.  https://developers.google.com/tag-manager/enhanced-ecommerce

Viewing the data you sent in google analytics.

Go to Reporting and go to the Conversions section.  Under there you’ll see the Ecommerce reports.

* When you push your products to the dataLayer the id field maps to the Product SKU.

Dynamic Remarketing

In Google Tag Manager for each tag you want to use this feature for check the “Enable Display Advertising Features” box.
In Google Analytics, go to the Admin section. Select the property you want and then click on “Dynamic Attributes”. Then for Step 2 for Product ID select “Product SKU”. https://support.google.com/analytics/answer/6002231

Debugging Locally

Click on each tag you created to edit it. Under the “More settings” click on “Cookie Configuration”. For Cookie Domain type “none”.
You can also setup another view in Google Analytics and in Tag Manager you can create a macro for the Tag’s Tracking Code like here.

A couple useful articles about Google Tag Manager Macros:
http://www.simoahava.com/analytics/macro-guide-google-tag-manager/
http://www.simoahava.com/analytics/macro-magic-google-tag-manager/

I know this can be confusing and there is a lot of questions that I didn’t answer. The biggest thing for me is to know where I can find the answers so hopefully this article along with the links I posted will help you forge ahead through the world of google analytics.

Inter-Service Communication using Client Certificate Authentication

I love the Service Oriented Architecture. But like all things, security is needed. In this case to make sure that one service has permission to talk to another service. There are a few different ways to obtain this security but I really like using SSL certificates. It’s very simple to add other services and your webserver (apache, nginx) will handle the validation for you.

I wrote an article on getting this setup here: http://www.sitepoint.com/inter-service-communication-using-client-certificate-authentication/

Asynchronous Request within UITableViewCell

The Problem:
I have an image that needs to be loaded for each table cell. Unfortunately this causes the UI to become unresponsive for a couple seconds when you click to go to that table view.

The Solution:
Make the call to get the image data asynchronous.

First in our tableview delegate method, cellForRowAtIndexPath, we’ll use NSURLConnection to create the connection.


NSURLRequest *req = [[NSURLRequest alloc] initWithURL:[NSURL URLWithString:@"http://www.test.com/image_url"]];
NSURLConnection *imgCon = [NSURLConnection alloc];
..... /** Code added later  **/
[imgCon initWithRequest:req delegate:self startImmediately:YES];

There are 2 delegate methods that you need to have to make this work. (There are others that you’ll probably want to implement as well. Apple Docs)


-(void)connection:(NSURLConnection *)connection didReceiveData:(NSData *)data;
-(void)connectionDidFinishLoading:(NSURLConnection *)connection;

The didReceiveData method will probably be called multiple times. So we need to create a variable to keep on appending the data to it. Since we are creating multiple connections we need to map each connection to the data. But we also need to know which table cell we need to update once we have all the data.
So we create 2 NSMutableDictionary objects. One to map the table cells indexPath to the data and one to map the connection to the indexPath.


NSMutableDictionary *indexPathImgData;
NSMutableDictionary *connectionIndexPath;

The problem that arrises from trying to map the connection to the indexPath is that NSURLConnection doesn’t conform to the NSCopying Protocol. To get around this we need to use CFMutableDictionary. When adding values to a CFMutableDictionary “the keys and values are not copied—they are retained” (Apple Docs on CFMutableDictionary)

So to add the values to our NSMutableDictionary we will use the CFDictionaryAddValue function.

Now our cellForRowAtIndexPath method will look like this:


NSURLRequest *req = [[NSURLRequest alloc] initWithURL:[NSURL URLWithString:@"http://www.test.com/image_url"]];
NSURLConnection *imageCon = [NSURLConnection alloc];

CFDictionaryAddValue((CFMutableDictionaryRef)self.connectionIndexPath, imageCon, indexPath);
NSMutableData *imageData = [NSMutableData dataWithCapacity:0];
CFDictionaryAddValue((CFMutableDictionaryRef)self.indexPathImgData, indexPath, imageData);

[imageCon initWithRequest:req delegate:self startImmediately:YES];

And our NSURLConnection Delegate methods will look something like this:


- (void)connection:(NSURLConnection *)connection didReceiveData:(NSData *)data {

    [[self.indexPathImgData objectForKey:[self.connectionIndexPath objectForKey:connection]] appendData:data];
}

-(void)connectionDidFinishLoading:(NSURLConnection *)connection
{
    NSIndexPath *temp = [self.connectionIndexPath objectForKey:connection];
    [self.tableView reloadRowsAtIndexPaths:[NSArray arrayWithObject:temp] withRowAnimation:UITableViewRowAnimationNone];

}

Now in your cellForRowAtIndexPath method you’ll want to do a check if the dictionary object has data for that indexPath and if so display it otherwise you’ll want to create a new connection.

Crawling Web Pages and Creating Sitemaps

Creating a Sitemap Based on all the Links within a Website

I built this web crawler because I wanted a way to create a sitemap of this website I was building. I know there are a few websites out there that will do this for you but I didn’t want to rely on someone else and I wanted to change a few things. So in order to do this I used php and cURL.
I started out creating a class for the crawler. When I create a new crawler class I pass in the url of the website I want to start with. This also uses cURL to access the webpage and get the content and headers. Inside this class are also methods to get all the links of a page, the page title, the entire content, just the body content, and the headers. But you could easily add more to say grab all the images on a page.

The Crawler Class

  

class Crawler {
  protected $markup='';
  protected $httpinfo='';

  public function __construct($uri, $justheaders=0){
    $output = $this->getMarkup($uri, $justheaders);
    $this->markup = $output['output'];
    $this->httpinfo = $output['code'];
  }

  public function getMarkup($uri, $justheaders) {
    $ch = curl_init($uri);
    curl_setopt($ch, CURLOPT_HEADER, 1);
    if($justheaders){
      curl_setopt($ch, CURLOPT_NOBODY, 1);
      curl_setopt($ch, CURLOPT_BINARYTRANSFER, 1);
      curl_setopt($ch, CURLOPT_FAILONERROR, 1);
    }
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
    curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 30);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
    curl_setopt($ch, CURLOPT_MAXREDIRS, 5);

    $output['output'] = curl_exec($ch);
    $output['code'] = curl_getinfo($ch);
    curl_close($ch);
    return $output;
  }

  public function get($type){
    $method = "_get_{$type}";
    if (method_exists($this, $method)){
      return call_user_method($method, $this);
    }
  }

  protected function _get_info(){
    return $this->httpinfo;
  }

  protected function _get_links(){
    if(!empty($this->markup)){
      preg_match_all('/<a(?:.*?)href=(["|\'].*?["|\'])(.*?)>(.*?)\<\/a\>/i',
                               $this->markup, $links);
      return !empty($links[1]) ? array_flip(array_flip($links[1])) : FALSE;
    }
  }

  protected function _get_body(){
    if(!empty($this->markup)){
      preg_match('/\<body\>(.*?)\<\/body\>/msU', $this->markup, $body);
      return $body[1];
    }
  }
  protected function _get_content(){
    if(!empty($this->markup)){
      return $this->markup;
    }
  }

  protected function _get_pagetitle() {
    if (!empty($this->markup)){
     preg_match_all('/<title>(.*?)\<\/title\>/si', $this->markup, $pagetitles);
     return !empty($pagetitles[1]) ? $pagetitles[1] : FALSE;
    }
  }
}


After this I create a recursive function that will follow each of the links. Each time I call this function I create a new instance of the Crawler class. If the url isn’t valid I just return. If the url is redirected curl has an option to follow links, the CURLOPT_FOLLOWLOCATION option. Since this is set to on, you need to get the actual url which is contained in the header information. After this I call the get links function. This will return all the unique links on a page. (Calling the array_flip twice makes them unique).

I then get the title tag for each page. This is used when creating the sitemap. I remove all the script tags and all the html tags. The next thing I do is get a base url. Since I’m creating a sitemap of just one website I don’t need any links to external pages. So if any of the links don’t contain this base url I wont follow it.

Now I begin looping through all the links on the page. If its an external link I return. If it is an absolute link and it contains a “/” I replace the whole thing with a slash. If it doesn’t have the slash but is an absolute link then the new val is “”. I do this because I am prepending the base url that we got earlier to it.

I then explode on the “/”. If the first element is empty then I put the base url in there. Otherwise I prepend the url to it. This is done to get the correct link no matter if it is a relative link, root relative, or absolute.
The complete link is formed and checked to see if it already exist in the globallinkarr. If it doesn’t I add it and then begin getting the different levels. Basically everytime there is a “/” in the url then that is a different level. This is used when I am creating the html sitemap. Also to create this array of levels, I have to call array_merge_recursive. Well the regular php function didn’t quite work. If a url has numbers as one of its levels for example blog/2009/12/post then that function would turn the 2009 to its own key. So I needed it to keep the keys the same so I just got another function off of php.net.

The Function to Get all the Links


$dontfollow = array('pdf', 'jpg', 'png', 'jpeg','zip', 'gz', 'tar', 'txt');

function findAllLinks($url){
  global $globallinkarr;
  global $depthlinks;
  global $dontfollow;
  global $pagetitles;
  global $contentarr;

  $crawl = new Crawler($url);
  $info = $crawl->get('info');

  $validcodes = array(200,301,302);
  if(!in_array($info['http_code'], $validcodes))
    return;
  $url = $info['url'];
  $links = $crawl->get('links');
  $title = $crawl->get('pagetitle');
  $title = $title[0];
  $body = $crawl->get('body');
  $count++;

  $content = strip_tags(preg_replace('//msU', '', $body));

  if(!array_key_exists($url, $contentarr))
    $contentarr[$url] = array('title'=>"$title", 'pagecontent'=>"$content");

  if(!count($links) || !is_array($links)) return;
  else{
    if(preg_match('/http(?:s)?:\/\/(.*?)\/(.*)/', $url, $pattern)){
      $baseurl = $pattern[1];
    }else{
      $baseurl = $url;
    }

    foreach($links as $val){
      if(preg_match('/.*?javascript:void\(0\)/', $val) || ereg('#', $val)){
        continue;
      }
      if(!preg_match('/[0-9a-zA-Z]/', $val)) continue;
      $val = trim($val, '"\'');

      /**
       * CHECK IF LINK IS GOING TO ANOTHER DOMAIN.  IF SO DONT FOLLOW IT.
      */

      if(preg_match('/^http(s)?:\/\//', $val) &&
               !strpos($val, preg_replace('/http(s)?:\/\//', '', $baseurl))){
        continue;
      }

      if(ereg('http', $val) && preg_match('/^http(s)?:\/\/.*?\//', $val)){
        $val = preg_replace('/^http(s)?:\/\/.*?\//', '/', $val);
      }else if(ereg('http', $val)){
        $val = '';
      }

      $sl = explode('/', $val);

      if(!preg_match('/[0-9a-zA-Z]/', $sl[0])){
        $sl[0] = preg_replace('/^http(s)?:\/\//', '', $baseurl);
        $complink = implode('/', $sl);
        $sl = explode('/', $complink);

      }else{
        $prepend = explode('/', preg_replace('/^http(s)?:\/\//', '', $url));
        if(count($prepend)>1){
          array_pop($prepend);
          $prep = implode('/', $prepend);
        }else $prep = $prepend[0];
        $sl[0] = $prep.'/'.$sl[0];
        $complink = implode('/', $sl);

        $sl = explode('/', $complink);

      }
      if(!end($sl)) array_pop($sl);

      if(!in_array($complink, $globallinkarr)){
        $globallinkarr[] = $complink;
        $pagetitles[$complink] = $title;

        $depth = count($sl);
        $templinks = array();
        $newlinks = array();
        if($depth > 1){
           if(!$sl[$depth-1]) $sl[$depth-1] = 'index';
           $templinks[$sl[$depth-2]][] = $sl[$depth-1];

           if($depth > 2){
	     for($i=$depth-2; $i>0; $i--){
               $hold = $templinks;
               $templinks = array();
               $templinks[$sl[$i-1]] = $hold;

             }
          }

          $temp = $templinks[$sl[0]];
          $newlinks[$sl[0]] = $temp;

        }
        $depthlinks = array_merge_recursive2($newlinks,$depthlinks);
        $end = strtolower(end(explode(".", $complink)));

        if(!preg_match('/^http(s)?:\/\//', $complink))
          $complink = 'http://'.$complink;
          if(!in_array($end, $dontfollow) && !ereg("sitemap", $complink)){
            findAllLinks($complink);
          }
       }
    }
    return;
  }
}

The Array Merge Recursive Function I Used From php.net

function array_merge_recursive2($array1, $array2){
  $arrays = func_get_args();
  $narrays = count($arrays);

  // check arguments
  // comment out if more performance is necessary
  //   (in this case the foreach loop will trigger a warning if the argument is not an array)
  for ($i = 0; $i < $narrays; $i ++) {
   if (!is_array($arrays[$i])) {
   // also array_merge_recursive returns nothing in this case
     trigger_error('Argument #' . ($i+1) . ' is not an array - trying to merge array with scalar! Returning null!', E_USER_WARNING);
     return;
    }
  }

    // the first array is in the output set in every case
  $ret = $arrays[0];

  // merege $ret with the remaining arrays
  for ($i = 1; $i < $narrays; $i ++) {
    foreach ($arrays[$i] as $key => $value) {
     /***  KEEP THIS COMMENTED OUT TO KEEP THE ORIGINAL KEYS
     //if (((string) $key) === ((string) intval($key))) { // integer or string as integer key - append
     //   $ret[] = $value;
    // }
    // else { // string key - merge
      if (is_array($value) && isset($ret[$key])) {
        // if $ret[$key] is not an array you try to merge an scalar
        // value with an array - the result is not defined (incompatible arrays)
        // in this case the call will trigger an E_USER_WARNING and the $ret[$key] will be null.
        $ret[$key] = array_merge_recursive2($ret[$key], $value);
      }
      else {
        $ret[$key] = $value;
      }
           // }
    }
  }
  return $ret;
}

So after I created the arrays with all the links I create the sitemaps. The first one here is an xml sitemap used for the robots.txt file.
Its really simple and used the globallinkarr array.

XML Sitemap

function createXMLSiteMap($globallinkarr){
  $xml = '<?xml version="1.0" encoding="UTF-8"?>
    <urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">';
   if(count($globallinkarr)){
   foreach($globallinkarr as $val){
     if(!preg_match('/http(s)?:\/\//', $val)){
       $val = 'http://'.$val;
     }
     $xml .= '
       <url>
          <loc>'.str_replace('&', '&amp',$val).'</loc>
       </url>';
    }
  }
  $xml .= '</urlset>';
  return $xml;
}

The html sitemap is created using a recursive function that goes through depthlinkarr. Also it checks if the link is valid if it isn’t in the globallinkarr since all those are already checked. When it does check it only needs the headers so to speed things up I set the curl option CURLOPT_NOBODY to true. The page was timing out on me a lot but with this option set and checking to see if it already existed in the globallinkarr helped stop it from timing out. But if you have a whole lot of links there is a good chance this will cause your page to timeout.

The HTML Sitemap

function createSiteMap($depthlinks, $before = ''){
  global $globallinkarr;
  global $pagetitles;
  $validcodes = array(200,301,302);
  if(count($depthlinks)){
  $sitetree = '<ul style="padding:5px; margin:5px;">';
  foreach($depthlinks as $key=>$val){
    if(is_array($val)){
      if($before) $newbefore = $before.'/';
      $newbefore .= $key;

      $newkey = preg_replace('/^http(s)?:\/\//', '', $newbefore);

    $title = ($pagetitles[$newkey] != "") ? $pagetitles[$newkey] : $newbefore;
      if(!preg_match('/^http(s)?:\/\//', $newbefore))
         $newbefore = 'http://'.$newbefore;
      $exist = 0;
      if(in_array($newbefore, $globallinkarr)){
        $exist = 1;
      }else{
        $test = new Crawler($newbefore, 1);
        $info = $test->get('info');
        if(in_array($info['http_code'], $validcodes))
          $exist = 1;
        }
        if($exist){
          $sitetree .= '
           <li><a style="display:block;" href="'.$newbefore.'"
           target="_blank" title="'.$title.'" />'.$title.'</a></li>';
        }else{
          $sitetree .= '
             <li>'.$title;
        }
        $temp = createSiteMap($val, $newbefore);
        if($temp){
          $sitetree .= $temp;
          $sitetree .= '</li>';
        }
      }else{
        if($before != '') $newval = $before.'/'.$val;
        else $newval = $val;
        $newkey = preg_replace('/^http(s)?:\/\//', '',$newval);
        $title = $pagetitles[$newkey] ? $pagetitles[$newkey] : $newval;
        if(!preg_match('/^http(s)?:\/\//', $newval))
           $newval = 'http://'.$newval;

        $exist = 0;
        if(in_array($newval, $globallinkarr)){
            $exist = 1;
        }else{
          $test = new Crawler($newval, 1);
          $info = $test->get('info');
          if(in_array($info['http_code'], $validcodes))
            $exist = 1;
        }

        if($exist){
          $sitetree .= '
            <li><a href="'.$newval.'" title="'.$title.'"
                target="_blank">'.$title.'</a></li>';
        }
      }
    }
  $sitetree .= '</ul>';
  return $sitetree;
  }
}

Execute Code After Browser Finishes Loading

How to execute a script after the browser stops loading

Recently I needed to execute a bit of code but the browser was timing out. I was actually using the php exec function to tar up some files. The browser kept showing a time out page. So in order to make the browser think it was done loading and then execute this bit of script, I put everything in the buffer with ob_start(). Then when I’m ready to stop the browser I get the size of the buffer. I then set the header content-length equal to this size and flush out the buffer. Now the browser will no longer show that its loading, since it has received all the content it is going to receive. So now you can execute any script.

Here is the code


<?php
ob_end_clean();
header("Connection: close");
ignore_user_abort();  // optional
ob_start();

echo "some content";

$size = ob_get_length();
 header("Content-Length: $size");
 ob_end_flush();
 flush();
//Browser done loading

//put your script here

?>

Round Robin Algorithm

Generating a Schedule Automatically

I need to create a way for every team to play each other in each round of games. I first started out creating a chart of who each team will play. After I created a chart that worked I built an algorithm to build that chart.

The Game Schedule

The Teams: Team 1 Team 2 Team 3 Team 4 Team 5 Team 6
Rounds: Round 1 Team 6 Team 5 Team 4 Team 3 Team 2 Team 1
Round 2 Team 4 Team 6 Team 5 Team 1 Team 3 Team 2
Round 3 Team 2 Team 1 Team 6 Team 5 Team 4 Team 3
Round 4 Team 5 Team 3 Team 2 Team 6 Team 1 Team 4
Round 5 Team 3 Team 4 Team 1 Team 2 Team 6 Team 5

Now you just need to figure out the pattern and we can code it up. I was told there is a name for this pattern but I didn’t know it, so if you know it let me know.

The Variables

$gamearr AND $tempgamearr are both an array of all the teams. The array would look like this: $gamearr[0] = “firstteam”, $gamearr[1] = “secondteam” and so on.

$numteamspadded is the number of teams. If the number was odd I add another team to the end of $gamearr and $tempgamearr as a “byteam”.

$gamenumperteam is however many number of games you want each team to play.

The Logic

As you can see at the beginning I set the variable $firstteam. You realize why I did this if you figured out the pattern above. The pattern above being that I have an array of teams playing the array of teams in reverse order. After each round I decrement the reverse order array and move the last one to the front. So we would have 654321 and then 165432 and then 216543. This would cause teams to play themselves though which is where the $firstteam comes in. Every time a team would play itself we switch it out with the firstteam.

I start out by looping through the number of games and then looping through just half of the teams since I’m setting both the home and away team in there. If I looped through all the teams I would have 1 vs 6 and then 6 vs 1 but I just want the unique games.

I then loop through all the games and set the $gamearr to the values that it would be had I been just decrementing the array and moving the last item to the beginning. Meaning the gamearr will look like this after each round 234561, 345612, 456123, etc. I use the $tempgamearr because I need to know the original order of the teams since we have to switch the first team out when there is a game playing itself.
After this I remove the first team from the array. I then loop through the game to find if one of the teams will be playing itself and get its value and position. This is one of the reasons why I remove the first team from the array. If I left it in there then 1 would play itself and overwrite the fact that another team was playing itself.
I then switch the team that is playing itself with the $firstteam.

After this you have an array of the games for each round.
I then create an array of all the games and check to see if the “byweek” is playing. If it is I don’t add it to the array cause we don’t want a byweek taking up a game slot.

The Round Robin Algorithm


$bottom = $numteamspadded-1;
$half = $numteamspadded/2;

$firsteam = $gamearr[0];
for($i=0; $i<$gamenumperteam; $i++){
    for($j=0; $j<$half; $j++){
         if($numteamspadded==4)
           $rrarr[$i][$j]['home'] = $tempgamearr[$j];
        else $rrarr[$i][$j]['home'] = $gamearr[$j];
        $rrarr[$i][$j]['away'] = $gamearr[$bottom-$j];
    }

    for($j=1; $j<=$bottom+1; $j++){
        $start = ($i+$j)%$numteamspadded;
        $gamearr[$j-1] = $tempgamearr[$start];
    }

    array_splice($gamearr, array_search($firsteam, $gamearr), 1);
    foreach($gamearr as $key=>$val){
        if($tempgamearr[$bottom-$key] == $val){
              $TempGameValue = $val;
              $switch = $key;
          }
    }

    $gamearr[$bottom] = $TempGameValue;
    $gamearr[$switch] = $firsteam;
}
foreach($rrarr as $key=>$val){
  foreach($val as $gkey=>$games){
    if($games['home'] != 'byweek' && $games['away'] != 'byweek'){
      $allgames[] = $games;
    }
  }
}

UPDATE: I came across another good implementation of the round robin here at http://phpbuilder.com/board/showthread.php?p=10935200

Sorting with Umlauts

Recently I needed to have a drop down box for countries. In english I just simply order by name but when you look at the german translation, ordering by name puts all the countries that have an umlaut character at the bottom. For example, I know the German translation for Austria is Österreich. So when I sort the translated names I now want this name to appear after the country named Oman. I know that the german umlaut html entity for ö is &ouml. I use that to get the character to sort the countries with.
Here is the code using PHP.

foreach($country as $val){
  $charset = htmlentities(utf8_decode($val['country_name']));
  if(substr($charset, 0,1)== '&'){
    $newcharset = str_replace(substr((html_entity_decode($charset)), 0,1),
                substr($charset, 1,1), html_entity_decode($charset));
    $origcountry[$val['country_id']] = $val['country_name'];
    $testarr[$val['country_id']]= $newcharset;
    $swap[$val['country_id']] = $newcharset;
  }else{
    $testarr[$val['country_id']] = $val['country_name'];
  }
}

asort($testarr);
foreach($testarr as $ckey=>$val){
  if($key= array_search($val, $swap)){
    $newcountryarr[$ckey] = $origcountry[$key];
  }else{
    $newcountryarr[$ckey] = $val;
  }
}

Scrolling Text using jQuery


Animating Text using jQuery Examples

Scroll Text Up
+++First Lorem ipsum dolor sit amet, consectetur adipiscing elit.
+++Second Lorem ipsum dolor sit amet, consectetur adipiscing elit.

Scroll Text to Left mouseover to stop the animation
+++Lorem ipsum dolor sit amet, consectetur adipiscing elit.     

A client of mine recently wanted the option to have scrolling text for breaking news type items. For the most part I think that scrolling text isn’t the greatest thing to use but there are certain situations where its good. The real simple and fast way to do it is to simply use a <marquee> tag like this <marquee width=”100%;” behavior=”SCROLL” direction=”left” scrollamount=”10″> Lorem ipsum doler </marquee>

The problem with this is it doesn’t do a continuous loop. What I mean by this is that it doesn’t start over till the text has completely finished scrolling. This means you see a lot of white space. Also jQuery seems to be a smoother animation. I knew jQuery had the animate function, so I used that. I found some plugins and examples where they had images being repeated but said they couldn’t do text cause they didn’t know the width. I guess they didn’t know css. So lets start with the css/html part of it.

Scrolling Text to the Left

The first thing we need to do is to get the width of the text that is being scrolled. So we need to create a div wrapper and set the width to be really big. This way the text wont be wrapped because of parent elements. We also set the display to none and visibility to hidden. This is done so that we don’t ever see this text. The display none keeps the browser from rendering the div in the browser, otherwise with just the visibility set to hidden we would see a big white space. Then we add another element inside with the text.


<div id="textwrapper" style="width:5000px; display:none; visibility:hidden;">
  <span id="textwidth" style="disply:none;">
    +++Lorem ipsum dolor sit amet,  consectetur adipiscing elit.
    Sed magna  ligula, tempus feugiat pellentesque et,pulvinar
    eu tellus. &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;
  </span>
</div>
;

Now the html for the actual text being scrolled. We have a scrollwrapper element which contains the width of the the text being scrolled. We also need to set the position to relative so that the text being scrolled will be absolutely positioned relative to this element and not the page. The next element is used to make sure the text doesn’t wrap just like we did previously, we set the width to an arbitrarily large number. Then we have the scrollcontent element. This is the element that is going to be moving. This is positioned absolutely and we set the left position to the width of the scrollwrapper element to make sure it starts on the far right. Then we need one more wrapper inside.


<div id="scrollwrapper" style="overflow:hidden; border:1px solid #004F72;
      position:relative; width:500px; height:20px;">
  <div style="width:5000px;">
     <span id="scrollcontent" style="position:absolute; left:500px;">
       <span id="firstcontent" style="float:left; text-align:left;">
             +++Lorem ipsum dolor sit amet, consectetur adipiscing elit.
               Sed magna ligula, tempus feugiat pellentesque et, pulvinar
             eu tellus.&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;
       </span>
    </span>
  </div>
</div>

Here is the javascript but first let me explain how I am using the jquery animation function. The first object passed to it is what I’m animating. I am changing the property “left” to go to the negative of the scrollwidth. The next thing I pass to it is an object of some of the properties. The first property I am setting is “step”. This is a function that will get called everytime the animation runs. The next property is “duration” which is the amount of time it will take to finish the animation in milliseconds. The next is “easing” which is the equation used to do the animation. jQuery comes with 2 equations: linear and swing. Default is swing but we’ll set it to linear. The last property we’ll set is “complete” which is a function that’ll be called when the animation is done.


$("#textwrapper").css({"display":"block"});
var scrollwidth = $("#textwidth").width();
$("#textwrapper").remove();

var scrollwrapperwidth = $("#scrollwrapper").width();
if(scrollwidth < scrollwrapperwidth) scrollwidth = scrollwrapperwidth;
$("#scrollcontent").css({"width":scrollwidth});
$("#firstcontent").css({"width":scrollwidth});

var appending = scrollwrapperwidth-scrollwidth;
var noappend = 0;

function animatetext(rate){
  var dist = scrollwidth+parseFloat($("#scrollcontent").css("left"));
  if(!rate)rate = 1;
  var duration = Math.abs(dist/(rate/100));
  $("#scrollcontent").animate({"left":"-"+scrollwidth},{
    step: function() {
      if(parseFloat($("#scrollcontent").css("left"))< parseFloat(appending)
          && !noappend){
          noappend = 1;
          $("#scrollcontent").css({"width":scrollwidth+scrollwidth});
          $("#scrollcontent").append("<span style='float:left; text-align:left;
               width:"+scrollwidth+"px;'>"
               +$("#scrollcontent").children(":first").html()+"</span>");
      }
    },
    duration: duration,
    easing: "linear",
    complete:function(){
      $("#scrollcontent").children(":first").remove();
      $("#scrollcontent").css({"left":0});
      $("#scrollcontent").css({"width":scrollwidth});
      noappend=0;
      animatetext(6);
    }
  });
}
$("#scrollcontent").mouseover(function(e){
	$("#scrollcontent").stop();
});
$("#scrollcontent").mouseout(function(e){
	animatetext(6);
});

$(document).ready(function(){
	animatetext(6);
});

First in the javascript we will get the width of the text. We first need to set the css of the textwrapper to "block" so that it will render the child elements width. Then we will get the width of the inner element "textwidth" and then delete the textwrapper element.
If the javascript variable "scrollwidth" is less than the width that we want it to scroll we need to set it. Then we set 2 variables, appending and noappend. These variables are used in the "step" function. So when the position of the scrolling text is less than the position "appending" and we haven't already appended text (the noappend flag), then we'll set the width of the scrollcontent to double the width since we are appending the same text in there again. Then when the animation is completed we will remove the first element, change the left position to 0, change the width of the scrolling content back to the scrollwidth and then call this function again. Also if we want to stop the animation on mouseover as I have it, we simply call the .stop().

NOTE: since the .animate function takes a duration and not a speed, to keep the same speed we need to calculate the duration. That way if you are dynamically changing the text the speed wont change just because you have more or less text. To do this it is pretty straight forward. Since we are using linear as our easing property, the equation is simply d=rt, where d is the distance we need to travel and r is the rate and t is the time. So we get the distance by taking the scrollwidth and adding it to the current left position of the scrollcontent. Then we simple pass in a rate of 1- whatever and calcuate the time in milliseconds.

Scrolling Text Up

Scolling text up is even easier. The html looks like this. We need a wrapper with position relative and height equal to whatever you want. Then we need the child element which will be the element being moved. Then we need at least one element inside that to contain the text but can have more if you want. Also the line-height needs to be set to whatever the height is of the outerwrapper.


<div id="scrolltextup" style="border:1px ridge #004F72; overflow:hidden;
     position:relative; width:500px; height:20px;">
  <div id="textup" style="text-align:center; position:absolute; top:0;
      left:0; width:500px;">
    <div style="line-height:20px;">
      +++First Lorem ipsum dolor sit amet, consectetur adipiscing elit.
      +++Second Lorem ipsum dolor sit amet, consectetur adipiscing elit.
    </div>
  </div>
</div>

Here is the javascript to do it. We get height of the entire thing. This time we are change the "top" property. Since the distance to animate will always be the same we don't need to calculate duration and can just pass in the duration.


var scrollheight = $("#textup").height();

function scrolltextup(dur){
  $("#textup").animate({"top":"-=20"},{
    duration: dur,
    easing: "linear",
    complete:function(){
      $("#textup").children(":last").after("<div style='line-height:20px;'>"+
          $("#textup").children(":first").html()+"</div>");
      if($("#textup").children(":first").height() <=
           (parseInt($("#textup").css("top"))*-1)){
        $("#textup").children(":first").remove();
        $("#textup").css({"top":0});
      }
      setTimeout("scrolltextup(3000)", 500);
    }
  });
}

And thats it!