
Sunday, August 19, 2012

Why JSON Won ... And Is Good As It Is

I keep seeing developers complaining about different aspects of the JSON protocol and, don't get me wrong, I've been the first one trying to implement alternatives, starting with JSOMON and many others ... OK?

Well, after so many years of client/server development, it's not that I've given up on thinking "something could be better or different"; it's just that I have learned on my own skin all the reasons JSON is damn good as it is, and here are just a few of them.

Reliable Serialization?

No, 'cause YAGNI. There are few serialization processes I know of that work as expected, and have done since forever; PHP's serialize is a good example.
Recursion is not a problem there: it's part of the serialization process to solve it, as are classes together with protected and private properties. You can save almost any object with its state, even if the restored object won't be, as a reference, the same one you serialized .. and I would say: of course!
There are also two handy methods, __sleep and __wakeup, that let you save an object's state in a meaningful way, retrieve it back, and perform some action during deserialization.

Are these things available in JSON? Thanks gosh, NO! JSON should not take care of recursive objects ... or better, it's freaking OK that it's not compatible with them, 'cause recursion is a developer matter, not a protocol one!
All JSON can do is provide a way to intercept serialization, so that any object with a .toJSON() method can return its own state, and any time JSON.parse() is performed, it can bring back, if truly necessary, its recursive properties.

So, at the end of the day, JSON implementations already provide something similar to __sleep and __wakeup, but it should be the JSON string owner, the service, the developer, who takes care of these problems, and simply because ....

Universal Compatibility

JSON is a protocol and, as a protocol, it should be as compatible as possible with all languages, not only C-like ones or others with similar comment syntax ... there won't ever be comments in JSON, 'cause the moment you need comments, you don't need a transport protocol: programming languages have always ignored developer comments ... and also, for compatibility reasons, not all programming languages would like to have // or /* */ or even # as inline or multiline comments ... why would they?

Especially in the .NET world, most documentation is written in a pseudo XML: can you imagine bothering to write such a redundant markup language for something usually ignored by developers? Would you like to have that "crap" as part of the data you are sending or receiving via JSON? I personally wouldn't ... thanks! I believe a transport protocol should be as compact and trouble-free as possible.
Here JSON wins once again, 'cause with its few universal rules it's compatible with basically everything.
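
To make the point concrete, every compliant parser rejects comments, and that uniformity is exactly the feature:

// comments are simply not JSON, and every compliant parser agrees
JSON.parse('{"a": 1}');            // {a: 1}
JSON.parse('{"a": 1 /* note */}'); // throws a SyntaxError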

Different Environments

This is the best goal ever reached by a protocol: every programming language can somehow represent what JSON transports.
Lists, Arrays, Dictionaries, Objects, Maps, Hashes, call them what you want, these are the most used cross-language entities we all deal with on a daily basis, together with booleans, strings, and numbers.

OK, OK, numbers especially are quite generic, but you might admit the world is still OK with a generic Int32 or Float32 number; on 64-bit-compatible environments these numbers could be of a wider type, but only if you never have to deal with 32-bit environments ... make your choice ... you want a truly big number? Go for it, and lose the ability to "talk" with any 32-bit env ... not a big deal if you own your data, kinda pointless memory and CPU consumption if you deserialize everything as 64 bits ... but I am pretty sure you know what you are doing, so ... JSON is good in that case too.
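
JavaScript itself is a fine example of the trade-off: its numbers are IEEE 754 doubles, so integers above 2^53 silently lose precision through a JSON round trip, even though the wire format happily carries them.

// doubles cannot represent every 64 bit integer
JSON.parse('{"n": 9007199254740993}').n; // 9007199254740992
// the protocol allowed the number, the environment did not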

No Classes

And again, thanks gosh! You don't want a protocol that deals with classes, trust me, 'cause you cannot write a class in all possible programming languages, can you? A "class" is simply an abstract concept represented by the word "class" in some languages, but representable in a billion other ways elsewhere (e.g. via plain objects in JavaScript).
Classes and namespace issues, if you want them, are there in any case.
The good part of JSON, once again, is the ability to intercept the serialize and unserialize process so that, if you like to send instances rather than just objects, you can use the tools provided by the implementation. Here is a JavaScript example:

function MyClass() {
    // doesn't matter what we do here
    // for post purpose, we do something
    this.initialized = true;
}
MyClass.prototype.toJSON = function () {
    this.__class__ = "window.MyClass";
    return this;
};

var myClassObject = JSON.stringify(new MyClass);
// "{"initialized":true,"__class__":"window.MyClass"}"

Once we send this serialized version of our instance to any other client, the .__class__ property can be ignored, or simply used to understand what kind of object it was.

Still in JavaScript, we can easily deserialize the string this way:

function myReviver(key, value) {
    if (!key) {
        var instance = myReviver.instance;
        delete instance.__class__;
        delete myReviver.instance;
        return instance;
    }
    if (key == "__class__") {
        myReviver.instance = myReviver.createInstance(
            this, this.__class__
        );
    }
    return value;
}

myReviver.createInstance = "__proto__" in {} ?
    function (obj, className) {
        obj.__proto__ = myReviver.getPrototype(className);
        return obj;
    } :
    function (Bridge) {
        return function (obj, className) {
            Bridge.prototype = myReviver.getPrototype(className);
            return new Bridge(obj);
        };
    }(function (obj) {
        for (var key in obj) this[key] = obj[key];
    })
;

myReviver.getPrototype = function (global) {
    return function (className) {
        for (var
            Class = global,
            nmsp = className.split("."),
            i = 0; i < nmsp.length; i++
        ) {
            // simply throws if the namespace path does not exist
            Class = Class[nmsp[i]];
        }
        return Class.prototype;
    };
}(this);

JSON.parse(myClassObject, myReviver) instanceof MyClass;
// true

Just imagine that __class__ could be any property name, prefixed like @class, or carrying your own namespace value, @my.name.Space ... so no conflicts if more than one JSON user is performing the same operation, right?

Simulating The __wakeup Call

Since the last example is about __sleep, easily implemented in JavaScript through the .toJSON() method, you might decide to implement a __wakeup mechanism too, and here is what you could add to the proposed reviver:

function myReviver(key, value) {
    if (!key) {
        var instance = myReviver.instance;
        delete instance.__class__;
        delete myReviver.instance;
        // this is basically last call before the return
        // if __wakeup was set during serialization
        if (instance.__wakeup) {
            // we can remove the prototype shadowing
            delete instance.__wakeup;
            // and invoke it
            instance.__wakeup();
        }
        return instance;
    }
    if (key == "__class__") {
        myReviver.instance = myReviver.createInstance(
            this, this.__class__
        );
    }
    return value;
}

Confused? Oh well, it's easier than it looks ...

// JSON cannot bring functions
// a prototype can have methods, of course!
MyClass.prototype.__wakeup = function () {
    // do what you need to do here
    alert("Good Morning!");
};

// slightly modified toJSON method
MyClass.prototype.toJSON = function () {
    this.__class__ = "window.MyClass";
    // add __wakeup own property
    this.__wakeup = true;
    return this;
};

Once again, any other environment can understand what's traveling in terms of data, but we can recreate a proper instance whenever we want.
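
Putting the pieces together, a quick sanity check of the whole round trip, nothing new here, just the snippets above combined:

var revived = JSON.parse(JSON.stringify(new MyClass), myReviver);
// alerts "Good Morning!" while parsing
revived instanceof MyClass; // true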

How To Serialize

This is a good question you should ask yourself. Do you want to obtain exactly the same object once unserialized? Is that important for the purpose of your application? Yes? Follow my examples ... no? Don't bother: the less you preprocess while serializing and unserializing objects, the faster, easier, and slimmer the data will be.

If you use weird objects and you expect your own thing to happen ... just use the tools you have to intercept before and after JSON serialization and put everything you want in there; otherwise just deal with things that any other language can understand, or you risk thinking JSON is your own protocol that's missing this or that, while you are probably, and simply, overcomplicating whatever you are doing.

You Own Your Logic

The last chapter simply demonstrates that with a tiny effort we can achieve basically everything we want ... and the cool part is that JSON, as it is, does not limit us: we can create more complex structures to pass once stringified, or recreate them once parsed, and this is the beauty of this protocol. So please, if you think there's something missing, think twice before proposing yet another JSON alternative: it works, everywhere, properly, and it's a protocol, not a JS protocol, not an X language protocol ... just, a bloody, protocol!

Thanks for your patience

Tuesday, February 14, 2012

JSON.stringify Recursion + Max Execution Stack Exceeded

I believe this is a common problem, and we had a similar one today while debugging.
JSON methods do not support recursion ... which is the only thing I really miss from the PHP serialize days.

Recursion Is Bad

Well, I would say cyclic references are never that good, but sometimes they happen and, especially while testing and debugging, it's more than useful to understand what happened there.
If you have cyclic/cross references in your code, I suggest you use approaches whose aim is to avoid these kinds of direct links.
Harmony Collections, especially Map and WeakMap, are indeed good helpers to reference objects indirectly, without creating, hopefully, first level links and/or recursion.
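
A minimal sketch of the idea, assuming a WeakMap capable environment: the cross link lives beside the object, rather than inside it, so JSON never sees it.

// keep the back-reference out of the serialized object
var links = new WeakMap(),
    parent = {name: "parent"},
    child = {name: "child"};
links.set(child, parent);     // instead of child.parent = parent
JSON.stringify(child);        // '{"name":"child"}' ... no recursion
links.get(child) === parent;  // true, the link is still reachable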

How To Serialize Anyway

JSON.stringify() accepts a second argument called replacer.
I won't explain its potential better than MDN does, but it can be really handy to avoid recursion.
A simple way to do it is to store already parsed objects in a stack, including the root object itself.
Some extra operations may be handy too, so the debug output will be as complete as possible.

var replacer = function (stack, undefined, r, i) {
    // a WebReflection hint to avoid recursion
    return function replacer(key, value) {
        // this happens only on the first iteration:
        // key is empty, and value is the root object
        if (key === "") {
            // put the value in the stack
            stack = [value];
            // and reset the counter
            r = 0;
            return value;
        }
        switch (typeof value) {
            case "function":
                // not allowed in the JSON protocol
                // let's return some info in any case
                return "".concat(
                    "function ",
                    value.name || "anonymous",
                    "(",
                    Array(value.length + 1).join(",arg").slice(1),
                    "){}"
                );
            // is this a primitive value ?
            case "boolean":
            case "number":
            case "string":
                // primitives cannot have properties
                // so these are safe to parse
                return value;
            default:
                // only null does not need to be stored
                // for all objects check recursion first
                // hopefully 255 calls are enough ...
                if (!value || !replacer.filter(value) || 255 < ++r) return undefined;
                i = stack.indexOf(value);
                // all objects not already parsed
                if (i < 0) return stack.push(value) && value;
                // all others are duplicated or cyclic:
                // mark them with their index
                return "*R" + i;
        }
    };
}();

// reusable to filter out undesired objects,
// e.g. HTML nodes
replacer.filter = function (value) {
    // i.e. return !(value instanceof Node)
    // to ignore nodes
    return value;
};

A simple example of the above function could be this one:

// how to test it
var o = {a:[], b:123, c:{}, e:function test(a,b){}};
o.d = o;
o.a.push(o);
o.c.o = o;
o.c.a = o.a;
o.c.c = o.c;
o.a.push(o.c);
alert(JSON.stringify(o, replacer));

The above alert will produce this kind of output:
{"a":["*R0",{"o":"*R0","a":"*R1","c":"*R2"}],"b":123,"c":"*R2","e":"function test(arg,arg){}","d":"*R0"}
which is surely better than an exception, isn't it?
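
As a bonus, nothing stops us from rebuilding those links after JSON.parse. Here is a hypothetical companion function, not part of the original snippet: it walks the parsed result in the same depth first order the replacer pushed objects onto its stack, so every "*Ri" marker can be resolved back to a real reference (assuming no legitimate string looks like "*R0").

function resolve(parsed) {
    var stack = [parsed], reference = /^\*R(\d+)$/;
    (function walk(obj) {
        for (var key in obj) {
            var value = obj[key], match;
            if (typeof value == "string" && (match = reference.exec(value))) {
                // restore the duplicated or cyclic link
                obj[key] = stack[match[1]];
            } else if (value && typeof value == "object") {
                // same push order used by the replacer
                stack.push(value);
                walk(value);
            }
        }
    }(parsed));
    return parsed;
}

var back = resolve(JSON.parse(JSON.stringify(o, replacer)));
back.d === back;     // true again
back.c.a === back.a; // true again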

The Max Execution Stack Problem

Even using a stack variable to avoid duplicated entries, the 255 < ++r guard is necessary because the generic object may reference, in one or more properties, DOM nodes.
Especially in big applications, the number of nodes, all unique, can be large enough to reach the engine's call stack limit.
A tricky way to know this limit, which is browser and engine dependent, could be this one:

(function (Function, MAX_EXECUTION_STACK) {
    if (MAX_EXECUTION_STACK in Function) return;
    Function[MAX_EXECUTION_STACK] = function (i) {
        try {
            (function max() {
                ++i && max();
            }());
        } catch (o_O) {
            return i;
        }
    }(0);
}(Function, "MAX_EXECUTION_STACK"));

// browser dependent
alert(Function.MAX_EXECUTION_STACK);

Unfortunately we cannot use this number directly in the replacer, because we don't know how many other times the function itself will be called, but a good compromise, able to handle objects almost impossible to debug, would be Function.MAX_EXECUTION_STACK / 100, so the limit scales accordingly.
In all other situations where we still have recursion and max execution stack problems, but we are the ones calling our own function, this limit can be more than handy, i.e.

var
    i = 0,
    fn = function (obj) {
        for (var key in obj) {
            if (++i < Function.MAX_EXECUTION_STACK) {
                // parse() here stands for whatever
                // per value operation we need
                parse(obj[key]);
                fn(obj[key]);
            }
        }
    }
;

... so now you know ...

Tuesday, December 06, 2011

On JSON Comments

Another active exchange with @getify about JSON comments, and here is my take, because tweets are cool but sometimes it's hard to tell everything you think in 140 bytes ...

Kyle Facts

JSON is used on a daily basis for billions of things, and configuration files are one, surely common, way to use it (just think about npm packages for node.js).
His frustration about the fact that JSON does not allow comments is understandable, and he even created an online petition about allowing comments in the JSON specs ... but are comments really what we need?

Just A Side Effect

JSON is extremely attractive as a standard, first of all because it's available and widely adopted by basically any programming language in this world, even those that never had to deal with a single JavaScript interlocutor, and secondly because it's both simple to parse and easy for humans to read.
After all, what can be so wrong about comments inside such a common serialization standard?
Aren't comments just as easy to parse as white spaces?
The problem, in my opinion, is that we are mixing up an easy process to serialize data, much easier compared to what PHP's serialize and unserialize functions do, with the possibility to describe it.
I have always seen JSON as a protocol, rather than a YAML substitute, and a protocol I expect to be as compact and as cross platform as possible.
Especially about the latter point:

that's annoying to port to every single language

Precisely. What we should understand is that JSON became so popular without needing comments; maybe the fact that today we would like to use it as a "descriptive markup" simply does not reflect the success, the adoption, and the possibilities this standard brought to all these languages.

Improve The Standard

Thanks gosh software is not always stuck behind immutable standards or patents ... and neither is JSON.
If the need for comments is such a big topic, define another standard able to combine the good old one with a new one.
If this new standard is truly what developers need, every JSON implementor will spend a few hours to test and optimize the well defined standard in order to accept comments, won't they?
I mean ... RFC 4627 was not meant to be the final solution; it was a Crockford proposal universally adopted.
Creating a new standard able to extend RFC 4627 should not be a big problem ... or maybe ...

JSON Is Not A JavaScript Thing

The JSON serialization looks just like JavaScript ... but not JavaScript only.
Other programming languages use curly brackets and squared brackets to define lists and objects (Python and others) ... the fact JSON has been accepted so well is probably because the design of the format was indeed already widely adopted: it was not a JS thing, and it never should be.
What's at stake here is defining a standard for JSON comments.
Let me better explain this point ...

I am a JS developer and I edit my JSON files via my JS editor ... fair enough ... I want to communicate my data to a server side service, let's say Python.
Python would like to be able to parse my data and produce a file compatible with ... Python, of course.
Does it mean that Python, at that point, should keep comments in a JavaScript style? And why, since the format used to exchange data was already somehow evaluable via Python, and right now not anymore, due to some double slash in the file?

# this is PYTHOOOOOOOOOON
o = {"test": "data"}
o.get("test") # data

Will the renewed JSON work as well?


# still python
o = {"test": "data"} // and some WTF
o.get("test")

>> SyntaxError: invalid syntax

Well done ... by making JSON comments a JS thing, Python, like every other language, needs extra effort to parse the data.
What will happen once Python uses its json library?

import json

json.loads(theReceivedData)

Should it print data with Python compatible comments, or with JS compatible comments, so that the data is not directly usable anymore by any Python application storing it, as example, in a file.py or a database?
And once the renewed JSON has been transformed into Python valid syntax, wouldn't this file be unusable from all other programming languages, due to possible syntax errors?

Not That Easy

I like JSON as it is, except a few broken implementations quite common even in browsers, because it's about creating a bloody piece of text many other languages can understand basically on the fly.
No need to remember which comment style has been saved with that version of JSON, no need to parse it back and, finally, no need to ask every single programming language that is using JSON as a protocol to update its legacy ... it just worked, and it will always work for what it was meant for: transferring data, not transferring JS-like code without functions and/or function calls.
What we are doing, we as JavaScript developers, is abusing JSON as if it were a piece of our JS code, polluting it today with comments, and who knows what else tomorrow.
The right way to go, still in my opinion, would be, once again, to enrich, propose, create a new standard that allows comments and, why not, other features.
As an example, what I have always found annoying is that in PHP we can unserialize preserving the class, so that we can serialize object states ... where is this in JSON?
Nowhere. Indeed I have spent a few hours of my past trying to enrich this protocol ... did it work? Was it interesting? Probably not, except for my last attempt, which is 100% based on current JSON and is about optimizing bandwidth and performance ... that one worked better, according to the JSONH github status; still, I was not expecting everyone that never had this problem to adopt that approach ... you know what I mean?

Still Valid JSON

If it's about writing comments, the next snippet is a perfectly valid JSON string. All we need to do is use the reviver in a proper way:

{
    "@description": "something meaningful",
    "property": "value",

    "@description": "something else",
    "other": "other value"
}

// parse above text via JSON
console.log(JSON.parse(aboveText, function (key, value) {
    if (key.charAt(0) != "@")
        return value;
}));

Here we are with our object, comments manually written in the original JSON, and every language able to parse it.
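
The same trick works in the other direction too: a replacer (the real replacer this time, since now we are stringifying) can drop the "@" keys before the text leaves our hands. The objectWithComments name is just a placeholder, of course:

// strip "@" annotations while producing the JSON string
JSON.stringify(objectWithComments, function (key, value) {
    if (key.charAt(0) != "@")
        return value;
});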

Saturday, November 26, 2011

JSONH New schema Argument

The freaking fast and bandwidth saving JSONH Project finally has a new schema argument, added at the end of every method, in order to make nested Homogeneous Collections automatically "packable". Here is an example:

var
    // the nested objects' b property
    // holds the same homogeneous collections
    // in properties c and d
    schema = ["b.c", "b.d"],

    // test case
    test = [
        { // homogeneous collections in c and d
            b: {
                c: [
                    {a: 1},
                    {a: 2}
                ],
                d: [
                    {a: 3},
                    {a: 4}
                ]
            }
        }, {
            a: 1,
            // same homogeneous collections in c and d
            b: {
                c: [
                    {a: 5},
                    {a: 6}
                ],
                d: [
                    {a: 7},
                    {a: 8}
                ]
            }
        }
    ]
;

The JSONH.pack(test, schema) output will be the equivalent of this string:

[{"b":{"c":[1,"a",1,2],"d":[1,"a",3,4]}},{"a":1,"b":{"c":[1,"a",5,6],"d":[1,"a",7,8]}}]


How Schema Works

It does not matter if the input is an object or a list of objects, and it does not matter if it has nested properties.
As soon as there is a homogeneous collection somewhere deep in the nested chain, common to all items, the schema is able to reach that property and optimize it directly.
Objects inside objects do not need to be homogeneous themselves: they can simply have one property in common across all items, and this is enough to take advantage of the schema argument, which can be a single string or an array of strings.
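
To give an idea of what happens behind the scenes, here is an illustrative sketch, and I do mean illustrative, it is not the JSONH source: each "dotted.path" entry of the schema is walked down to the owner of the homogeneous collection, so that collection can be swapped with its packed version in place.

// hypothetical helper: resolve "b.c" inside an item
function reachCollection(item, path) {
    for (var keys = path.split("."), i = 0; i < keys.length - 1; i++)
        item = item[keys[i]];
    // returns the owner and the last key, so the caller
    // can do owner[key] = JSONH.pack(owner[key])
    return {owner: item, key: keys[i]};
}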

Experimental

It's not that it does not work, I have indeed added tests; it's simply that I am not 100% sure this implementation covers all possible cases, and I would rather keep it simple and let developers deal with more complex scenarios via manual parsing through JSONH.pack/unpack without the schema ... this is still possible, as it has always been.
Let me know what you think about the schema; if accepted, I will implement it in Python and PHP too, thanks.

Wednesday, August 17, 2011

JSONH And Hybrid JS Objects

I have already described JSONH and now I also have the proof that it's as safe as native JSON and on average 2X faster than native JSON operations with small (10 objects), medium (100 objects), and massive (5000 objects, not a real world case, just a stress test to see how well JSONH scales) homogeneous collections.
Wherever it's not faster it's just "as fast", but the best part is that it seems to be consistently faster on slower (mobile) machines.
Moreover, the 5000 objects stress example shows that JSONH.stringify() produces a string 54% the size of the original JSON.stringify() output, so here is the summary: JSONH is faster at both compression and decompression, plus it produces smaller output.


Yeah But ... What About Hybrid Objects?

To start with, if you don't recognize/understand what a homogeneous collection is and ask me "what about nested objects?", all I can do is point out that Peter Michaux explained this years before me.
Have a look there and please come back after the "aaaaahh, got it: right!"

Hybrid Objects

Nowadays JSON is used everywhere, but not everywhere with homogeneous collections. A simple example able to screw up the JSONH possibility is an object like this:

// result of a RESTful service, Ajax, query ...
// once again about generic articles: books!
var result = {
    category: "books",
    subcategory: "fantasy",
    description: [
        {
            title: "The Lord Of The Rings",
            description: "Learn about the darkness"
        }, {
            title: "The Holy Bible",
            description: "Learn about both light and darkness"
        }
        // all other results out of this list
    ]
};

If we receive an object where one or more properties contain a homogeneous collection, as description does in the above example, we can already decide to take advantage of JSONH.

JSONH On Hybrid Objects

It's that easy!

// before we send/store/write data on output
result.description = JSONH.pack(result.description);
print(JSON.stringify(result));

If the client is aware that one or more specific properties are homogeneous collections, we can obtain the original object back like this:

// stringifiedResult as XHR responseText
var obj = JSON.parse(stringifiedResult);
obj.description = JSONH.unpack(obj.description);

// or simply via JSONP callback
data.description = JSONH.unpack(data.description);

For the same reason JSONH is faster than JSON, this operation will grant us less bandwidth to send or receive objects, and faster conversion performance.

As Summary

I am willing to think soon about a possible schema able to describe homogeneous collection properties out of an object ... a sort of JSONH "mapper" to automate the procedure on both the server side and the client side, and any suggestion will be more than welcome.
At least so far we already know how to adopt this solution :)

Tuesday, August 16, 2011

Last Version Of JSON Hpack

Update

Created a github repository with (currently) the JavaScript, PHP5 and Python versions.

Update: after a quick chat on twitter with @devongovett, who pointed out there is a similar standard called JSONDB, I have created a JSONH(Flat) version. It looks slightly faster on mobile, so I may opt for this one rather than the Array of keys at index 0.
The whole array is flat and it changes from [{a:"A"},{a:"B"}] to [1,"a","A","B"], where the empty collection would be [0] rather than [[]].
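
To visualize the flat format, here is a minimal sketch of the packing idea, assuming an ES5 Object.keys and a truly homogeneous input; it is just an illustration of the layout, not the JSONH source. The first entry is the number of keys, then the keys once, then all values row by row.

function packFlat(list) {
    if (!list.length) return [0];     // the empty collection case
    var keys = Object.keys(list[0]),  // assumes every item has these keys
        result = [keys.length].concat(keys),
        i, j;
    for (i = 0; i < list.length; i++)
        for (j = 0; j < keys.length; j++)
            result.push(list[i][keys[j]]);
    return result;
}

packFlat([{a: "A"}, {a: "B"}]); // [1, "a", "A", "B"]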

Also, more details here on how to apply JSONH to Hybrid JS Objects.



A while ago I proposed a homogeneous collections optimizer nicknamed JSON.hpack.

What I wasn't expecting is that different projects and developers actually adopted this technique to shrink down JSON size.

Basic Advantage Of JSON.hpack

Gzip and deflate work really well with repeated chunks of strings, and this is why homogeneous collections reach a really good compression ratio there.
However, gzip and deflate compression does not come for free!
If we compress everything on the server side we can easily measure the CPU overhead compared with serving uncompressed data.
Via JSON.hpack we can still serve small or huge amounts of dynamic and static data without necessarily using realtime compression.

Basic Notions On Compressors

There is no ideal compression algorithm yet; each one has pros and cons.
A really good compression ratio may cost a lot, and the algorithm is efficient if at least decompression is fast. 7-Zip is one example: it takes longer than normal zip to create a single file, but the final ratio is usually much better and decompression is extremely fast.
An incremental compressor such as GIF is both fast to encode and fast to decode. However, its average compression ratio is really poor compared with PNG, which again is not as fast as GIF to encode, but almost as fast to decode, and capable of carrying much more complex data inside.
On the client side we may like a truly fast compressor in order to send data to the server, where more horsepower can decompress it in a reasonable time. Still, servers do not have unlimited resources.

My Latest Improvements Over JSON.hpack


On the web it's all about network layer latency, completely unpredictable, especially in these smartphone/pad days.
We also need to consider high traffic, if things go really well, and most importantly mobile platform computation power, basically the equivalent of a Pentium 3 with a GeForce card from 2001.

Which Is The Best Compromise?

The original version of JSON.hpack is able to understand which compression level is the best one for the current collection of objects. Unfortunately this is slow on the server side, and even slower on the client side.
In my opinion an intermediate layer such as JSON.hpack should bring advantages as fast as possible on both client and server.
I probably failed the first time because I was more focused on coolness than on efficiency.
As an example, if it takes 3X the CPU load to save 5% of the bytes compared with the most basic compression ratio, something is wrong, because it's simply not worth it.
As a summary, the best compromise for the latest version of this compressor is to be freaking fast, with a small overhead, while providing a good average compression ratio.

Welcome JSONH

In a single call this object is able to pack and unpack homogeneous collections faster than native JSON, especially on mobile platforms.

How Is It Possible?

To be honest, I have no idea, and I was surprised as well. All I can think of is the fact that JSONH makes data flat, which means no recursion per each object in the original list.
This seems to boost performance while packing, and to make JSON.parse's life easier while unpacking.
The extreme simplification of the algorithm may have helped a lot as well.

JSONH Source Code

Now on github!
I have had no time yet to create the equivalent C#, PHP, and Python versions.
In any case you can see how simple the logic is, and I bet anybody can easily reproduce that couple of loops in whatever programming language.
The minzipped size is 323 bytes, but the advantages over network calls can be massive. As an example, if we check the console and the converted size in the test page, we can see the JSONH version of the same collection is 54% smaller ... and in exchange for a faster stringify and parse? ... it cannot be that good, can it :)

JSONH Is Suitable For


  • any RESTful API that returns homogeneous collections

  • any case where gzip on the fly costs too much due to high traffic

  • map applications and routes: [{"latitude":1.23,"longitude":5.67},{"latitude":2.23,"longitude":6.67}]
    will be [["latitude","longitude"],1.23,5.67,2.23,6.67]

  • any other case I am not thinking about right now



As Summary

It is good to take old projects created a while ago and think about what could be done better today. It's about re-thinking real world cases with different skills and experience. I am not sure I made everybody happy with this latest version, but I am pretty sure I won't ask the client or server side to be slower than native JSON + native gzip compression, since at that point all advantages would simply be lost.
This revisited version of JSONH is surprisingly faster, smaller, and easier to implement/maintain than the precedent one, so ... enjoy it, if you need it ;)

Wednesday, February 16, 2011

All You Need for JSONP

I have just uploaded a truly simple, still robust, function able to do generic JSONP without pretending too much magic.

The concept is simple: we pass the url, including the parameter name used to communicate the callback name, and a callback as second argument ... that's it, 216 bytes minified and gzipped (many thanks @jdalton for the catch).
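
The uploaded source is the one to grab; just to fix the idea, a function with that signature can be sketched more or less like this. It is a minimal sketch, not the actual implementation: the unique global name, the cleanup, and the url + "=" + name convention are my assumptions here.

var JSONP = (function (document, id) {
    return function JSONP(url, callback) {
        var name = "_jsonp" + id++,
            script = document.createElement("script");
        // expose a unique global callback the response will invoke
        window[name] = function () {
            // clean up before passing data along
            window[name] = null;
            script.parentNode.removeChild(script);
            callback.apply(window, arguments);
        };
        // "test.php?callback" becomes "test.php?callback=_jsonp0"
        script.src = url + "=" + name;
        (document.head || document.documentElement).appendChild(script);
    };
}(document, 0));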

Here is an example:

<!-- the generic HTML page -->
<script src="JSONP.js"></script>
<script>
this.onload = function () {
    var many = 0;
    JSONP("test.php?callback", function (a, b, c) {
        this.document.body.innerHTML += [
            a, b, ++many, c
        ].join(" ") + "<br />";
    });
    JSONP("test.php?callback", function (a, b, c) {
        this.document.body.innerHTML += [
            a, b, ++many, c
        ].join(" ") + "<br />";
    });
};
</script>

And here is the testable demo result that should work with every bloody damned browser.

What's on the server? Nothing more than this:

<?php
if (isset($_GET['callback'])) {
    header('Content-Type: application/javascript');
    exit($_GET['callback'].'.call(window, "Hello", "JSONP", "!!!")');
}
?>


Enjoy ;)

Sunday, April 25, 2010

JSON __sleep, __wakeup, serialize and unserialize

The JSON protocol is a de facto standard used in many different environments to transport objects, including arrays and Dates, plus primitives such as strings, numbers, and booleans ... so far, so good!
Since this protocol is widely adopted but lacks the power of a well known function such as PHP's serialize, we are often forced to remember how data has been stored, what this data represents, and eventually convert data back if we are dealing with instances rather than native hashes.
If we consider the ExtJS library architecture, we can basically take a snapshot of whatever component simply by storing its configuration object. The type will eventually tell us how to get a copy of the original component back, without any effort at all.
Since JSON is the preferred protocol to store data in WebStorages, databases, files, etc., and since we can basically save only hashes, losing every other property, I have decided to try this experiment, able to bring some magic into the protocol itself.

JSONSerialize

The absolutely alpha version of this experiment has been stored here, in my little repository. If more than one developer, e.g. me, is interested, I may consider putting it on google code or github with better documentation; what I can do right now is show you what JSONSerialize can do for us, step by step.

Normal JSON.stringify Behavior

This is what happens if we use JSON.stringify, or JSON.serialize, when an object is simply ... well, an object.

// let's test in console or web
if (typeof alert === "undefined") alert = print;

// our "class"
function A() {}

// our instance
var a = new A;

// a couple of dynamic properties
a.a = 123;
a.b = 456;

// our JSON call
var json = JSON.serialize(a);
alert(json); // {"a":123,"b":456}

Nothing new: serialize acts exactly as JSON.stringify, with or without extra arguments ... but things become a bit more interesting now ...

The _sleep Method

In the PHP world we all know there are several methods able to bring some magic into our classes, and __sleep is one of them. Since the double underscore is usually considered bad practice, being reserved for private core magic functionality (e.g. __noSuchMethod__, __defineGetter/Setter__, others), I have decided to call it simply _sleep, considering it a sort of magic protected method, whatever that means in JavaScript :-)

A.prototype._sleep = function () {
    return ["a"];
};

json = JSON.serialize(a);
alert(json); // {"a":123}

The aim of _sleep is to return an array with zero, one, or more public properties we would like to export. As we can see, JSON.serialize will take care of this operation, returning only what is present in the list.
In a few words, no need to send properties we don't need, cool?
Another advantage of a sleep method is the chance to notify, close, disconnect, or do whatever other operation we need to mark the instance as serialized. In other words, sleep can be useful every time we deal with a variable that may be updated elsewhere using JSON as the transport protocol (postMessage, others).

The serializer Property

_sleep is already a good starting point but, if we don't know anything else about that hash, how can we understand what kind of instance the a variable was?

alert(JSON.unserialize(json).constructor);
// Object

Too bad: we have been able to define what we would like to export, but we have no way to understand what we actually exported.
This is where the serializer property becomes handy, letting JSON.serialize behave differently.

// define the serializer
A.prototype.serializer = "A";

// re-assign the string and check it out
json = JSON.serialize(a);

alert(json); // {"a":123,"_wakeup":"A"}

// what's new? just this:

alert(JSON.unserialize(json).constructor);
// the function A() {} ... oooh yeah!!!

In a few words, with or without a _sleep method, we can bring serialized objects back to their initial status ... but why that _wakeup property? Thanks for asking!

The _wakeup Method

As in PHP, there is a __wakeup method too, invoked as soon as the string is unserialized! Being this method somehow protected, I thought it was the best one to put into the JSONSerialize logic.

// let's define a _wakeup method
A.prototype._wakeup = function () {
    alert(this instanceof A);
    // will be true !!!
};

// let's try again
a = JSON.unserialize(json);

// ... oooh yeah!

As the PHP page shows in some examples, the _wakeup function can become really useful when we are saving a database connection, or a WebStorage wrapper, or whatever cannot be persistent and requires initialization, so ... at least we can save some info rather than asking for it every time, isn't it?
The moment _wakeup is invoked, the instance will already have every exported property assigned, exactly as in PHP ... wanna something more?
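
Before moving on, here is a rough sketch of the logic JSON.serialize may apply behind the scenes, just to fix the ideas; this is my simplification, the real experiment lives in the linked repository:

// a minimal sketch, not the actual JSONSerialize source
JSON.serialize = function (obj) {
    var data, keys, i;
    if (obj && typeof obj.serialize == "function") {
        // highest priority: the object decides its whole payload
        data = JSON.parse(obj.serialize());
    } else if (obj && typeof obj._sleep == "function") {
        // export only the properties listed by _sleep
        for (data = {}, keys = obj._sleep(), i = 0; i < keys.length; i++)
            data[keys[i]] = obj[keys[i]];
    } else {
        data = obj;
    }
    // the serializer property marks which "class" to wake up
    if (obj && obj.serializer)
        data._wakeup = obj.serializer;
    return JSON.stringify(data);
};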

serialize And unserialize Methods

The PHP Serializable interface brings some other magic via the SPL: we decide what we want to export, and we receive it back when unserialize is invoked. The same goes for JSONSerialize, with higher priority over _sleep and _wakeup, but not better performance (right now; I may consider avoiding some extra operations to follow the current PHP status, though ...)

// introduce the serialize method
A.prototype.serialize = function () {
    return JSON.stringify({c: 789});
};

// try this at home
json = JSON.serialize(a);
alert(json); // {"c":789,"_wakeup":"A"}

// this will call the _wakeup since
// unserialize has not been defined yet
// please note the a property won't be there anymore
// cause serialize returned a c instead
a = JSON.unserialize(json);

// let's have priority via unserialize
A.prototype.unserialize = function (data) {
    this.b = JSON.parse(data).c;
};

// let's try again, _wakeup won't be invoked
a = JSON.unserialize(json);

// a won't be there, but b will
alert([a.a, a.b]); // undefined,789

And that's all folks!

JSONSerialize Pros

The serializer property accepts namespaces and does not use evaluation. The whole little script does not use evaluation at all, and I think this is good, especially for security reasons.
Nested objects and arrays are supported as well, which means we can serialize complex hierarchies and have them back, without effort and already initialized.

JSONSerialize Cons

Nested means loops and, as with the pure JavaScript JSON implementation, performance is surely slower than a native one. At the same time we should consider when we actually need this, 'cause our own implementation to bring instances back could cost more. Finally, if we all like this, we may push some browser vendor for a core implementation, no?
Another performance problem is with serialize and unserialize, since these require double parsing in order to respect the behavior.

As Summary

It's not the first time I have tried to enhance the JSON protocol to fit my requirements, and this is probably the least obtrusive and most secure way I could come up with. I hope somebody will appreciate at least the idea, and I am up for all your thoughts ;-)

Wednesday, July 15, 2009

The Fastest Date toJSON and fromJSON?

I know yesterday's post was supposed to be the last one 'till the end of the month, but I could not resist. Dustin Diaz twitted:

anyone trying to parse our wonky rails dates in JS. do this: var date = Date.parse(str.replace(/( \+)/, ' UTC$1')); seems to fix it.

Again, I could not resist creating a toUTCString based function to produce JSON strings and parse them back. This is the result and apparently it is both the fastest implementation I know of and cross browser (Chrome, Firefox, Internet Explorer, Safari).
This is a quick post, so if you find some problem please let me know, thanks.

(function () {
    // WebReflection Fast Date.prototype.toJSON and Date.fromJSON Suggestion
    var rd = /^.{5}(\d{1,2}).(.{3}).(.{4}).(.{2}).(.{2}).(.{2}).+$/,
        rs = /^(.{4}).(.{2}).(.{2}).(.{2}).(.{2}).(.{2}).+$/,
        d = new Date,
        f = /(GMT|UTC)$/.exec(d.toUTCString())[1], // cheers gazheyes
        m = {},
        i = 0,
        s = ""
    ;
    d.setUTCDate(1);
    while (i < 12) {
        d.setUTCMonth(i);
        m[s = /^.{5}(\d{1,2}).(.{3})/.exec(d.toUTCString())[2]] = ++i < 10 ? "0" + i : i;
        m[m[s]] = s;
    }
    Date.prototype.toJSON = function () {
        var e = rd.exec(this.toUTCString());
        return e[3].concat("-", m[e[2]], "-", e[1], "T", e[4], ":", e[5], ":", e[6], "Z");
    };
    try {
        Date.fromJSON(d.toJSON());
        Date.fromJSON = function (s) {
            var e = rs.exec(s);
            return new Date(Date.parse(e[2].concat(" ", e[3], " ", e[1], " ", e[4], ":", e[5], ":", e[6], " ", f)));
        };
    } catch (e) {
        Date.fromJSON = function (s) {
            var e = rs.exec(s);
            return new Date(Date.parse(e[3].concat(" ", m[e[2]], " ", e[1], " ", e[4], ":", e[5], ":", e[6], " ", f)));
        };
    }
})();

Basically everything is a shortcut and I am using only native methods ... is there anything faster, except probably a substr based one?
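
For the record, such a substr based variant could look like the following sketch, placed inside the same closure so it can reuse the m months map. Note it blindly assumes the common "Day, DD Mon YYYY HH:MM:SS GMT" toUTCString() layout, which is exactly the implementation dependent detail the regular expressions above avoid relying on:

// inside the closure above, next to the regexp based version
Date.prototype.toJSON = function () {
    var s = this.toUTCString(); // e.g. "Wed, 15 Jul 2009 12:34:56 GMT"
    return s.substr(12, 4).concat(
        "-", m[s.substr(8, 3)],
        "-", s.substr(5, 2),
        "T", s.substr(17, 8), "Z"
    );
};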

Wednesday, May 20, 2009

re-introduction to JSON.hpack

Let me try again, this time with an official project page and a nearly complete wiki page.

JSON.hpack, the Fat-Free Alternative to JSON


JSON.hpack is a lossless, cross language, performance focused, data set compressor. It is able to reduce by up to 70% the number of characters used to represent a generic homogeneous collection.

To understand how to use it via JavaScript, PHP, or C# (Python coming soon and hopefully other languages as well), please have a look at the wiki page or, if you want, try out the demo page!

I am waiting for comments :-)

Friday, May 15, 2009

Ajax better than Flash AMF? Optimized homogeneous collections

I found this dojo related post extremely interesting, and I instantly thought that in our application we have a similar JSON structure, which is effectively redundant and, for thousands of database rows, slows down visualization responsiveness (not that much on an intranet, enough via the internet).

JSON.hpack


I am developing in my spare time an "all web languages" homogeneous collection packer, in order to speed up client/server interaction, especially for database results, where the column name can be considered a key and each field in the same column a row.
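
Roughly, the idea looks like this; take it as an illustration of the basic layout, with the keys collapsed into a single header row, while the real JSON.hpack can push this further with extra compression levels:

// a plain homogeneous collection: keys repeated per row
[{"id": 1, "name": "a"}, {"id": 2, "name": "b"}, {"id": 3, "name": "c"}]

// the packed equivalent: keys once, then values only
[["id", "name"], 1, "a", 2, "b", 3, "c"]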

So far I have created the JavaScript version, which seems to work perfectly, and the results are quite impressive. For example, the total gzipped size of the 5000 items used in that dojo article went from 37.23Kb down to 26.27Kb, while the number of characters to send/retrieve is 776133 against 99575.

As an example, the host where I put the initial test does not allow more than 50Kb of uploaded data; this means that without JSON.hpack it is not even possible to consider moving those 5000 rows.

The reason it can be faster is that JSON.stringify, the good old Douglas Crockford version, is not as fast as we think (not comparable with the native IE8 JSON.stringify) and, if the given object is smaller, there is obviously less data to parse as well.

Here you can test the first JSON.hpack test page.

The project is not hosted yet, so to know more about the simple API please read the incomplete (code speaking, stable though) source code.

I wonder how many people would be interested in their favorite server side language implementation. I can create the PHP, Python, and C# ones, but I am looking for Java, Ruby, and if possible other languages as well. The logic is truly simple, so it is just a matter of time ... anyone fancy contributing?

Sunday, June 08, 2008

JsonTV - JsonML concept to create TreeView components

For those who do not know JsonML, here is a little summary.
JsonML's aim is to use JSON to transport layouts, instead of simple data.
Unfortunately, for this purpose the size of the code could be even bigger than the regular layout, considering that with a good usage of classes, internal styles as attributes are often superfluous.
At the same time, there are still several problems with the DOM, for its non-standard nature, thanks to those browsers that do not respect W3C or generic predefined guidelines.
For these reasons, and probably others, JsonML has not been as widely adopted as its daddy JSON.

An alternative purpose, based on both the same concept and grammar

A common daily task in web development is the creation of menus, file and directory views, or trees, that allow us, as users, to organize data in a better, friendlier, and easier to manage way.
Removing some intermediate steps, and thinking about standard data presentation, with useful information and nothing else, it is possible to use the same JsonML grammar to automatically create a DOM based tree.
The schema can be transformed in this way:

element
    = '[' name ',' attributes ',' element-list ']'
    | '[' name ',' attributes ']'
    | '[' name ',' element-list ']'
    | '[' name ']'
    | json-string
    ;
name
    = json-string
    ;
attributes
    = '{' attribute-list '}'
    | '{' '}'
    ;
attribute-list
    = attribute ',' attribute-list
    | attribute
    ;
attribute
    = attribute-name ':' attribute-value
    ;
attribute-name
    = json-string
    ;
attribute-value
    = json-string
    ;
element-list
    = element ',' element-list
    | element
    ;

As you can observe, the only difference is name, instead of tag-name.
The element-list, if present, will be the nested list of information inside a branch, while name will be the real name of that branch, or leaf if no element-list is present inside.

The attribute-list will be a dedicated object that will contain, if present, every bit of information we need for that leaf, or branch.
At this point, with a simple piece of code like this one, it is possible to create a menu:

["root",
["demo.txt", {size:12345, date:"2008/06/07"}],
["temp",
["file1.tmp"],
["file2.tmp"],
["file3.tmp"]
],
["sound.wav", {size:567, date:"2008/05/07"}]
]

Above code will automatically create a list like this one:

  • root
    • demo.txt
    • temp
      • file1.tmp
      • file2.tmp
      • file3.tmp
    • sound.wav

Every leaf, or branch, will at the same time carry the passed information, as an object.
In this way, as I wrote before, we are not sending an xhtml or xml structure, but simply a schema of our menu, folder navigator, or whatever we need.

The JsonTV object


This is my JsonTV object implementation. It is truly simple to use and, if you want, raw, but it is only the core, whose aim is to transport, transform, create, or parse from the DOM every kind of schema that respects the shown grammar.
For example, this is a piece of code that will automatically create the above tree, and will hide, or show, internal leafs, if present, starting from the root.

onload = function () {
    var directory = JsonTV.parseArray(
        ["root",
            ["demo.txt", {size: 12345, date: "2008/06/07"}],
            ["temp", ["file1.tmp"], ["file2.tmp"], ["file3.tmp"]],
            ["sound.wav", {size: 456, date: "2008/05/07"}]
        ]
    );
    document.body.appendChild(
        directory
    ).onclick = function (e) {
        var target = (e ? e.target : event.srcElement).parentNode,
            ul = target.getElementsByTagName("ul")[0];
        if (/li/i.test(target.nodeName)) {
            if (ul)
                ul.style.display = (target.clicked = !target.clicked) ? "none" : "";
            else {
                var __JSON__ = [];
                for (var key in target.__JSON__)
                    __JSON__.push(key + " = " + target.__JSON__[key]);
                if (__JSON__.length)
                    alert(__JSON__.join("\n"));
            }
        }
    };
    directory.className = "directory";

    // this is only an assertion like check
    setTimeout(function () {
        var clone = document.body.appendChild(JsonTV.parseArray(JsonTV.parseDOM(directory)));
        if (directory.outerHTML === clone.outerHTML)
            document.body.removeChild(clone);
    }, 1000);
};


Simplified API


As with the JSON object, JsonTV contains two main public methods: parse and stringify.
The first one, parse, will convert a JSON string containing a valid schema Array, or a schema Array itself, into a DOM based Tree.
There is one exception: if you pass a DOM based Tree to the parse function, it will convert it into an Array that respects the JsonTV schema.
On the other hand, the stringify method will convert a string, a DOM based Tree, or an Array into a JsonTV string, using the official JSON.stringify method if present, or the non-standard Gecko toSource one if there is no JSON parser.
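
In short, a hypothetical session with the described API; the exact output is illustrative:

// schema Array (or its JSON string) in, DOM based tree out
var tree = JsonTV.parse('["root", ["leaf.txt"]]');

// DOM based tree in, schema Array out (the documented exception)
var schema = JsonTV.parse(tree);

// string, tree, or Array in, JsonTV string out
var string = JsonTV.stringify(schema); // '["root",["leaf.txt"]]'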

Conclusion


This is only my first step into the JsonTV technique, but I am sure that with a good usage of CSS, and a few lines of JavaScript, using third party libraries or not, a common intermediate protocol to manage, send, save, and show tree views could let us develop truly interesting tree based applications, making ports to different languages simpler than ever, as it has been for the JSON de-facto standard.
Stay tuned for the next examples ;)

Tuesday, April 03, 2007

Are 130 bytes enough to solve JavaScript JSON Hijacking problems? (Unlikely, not this time)

This is the second time I open this post, because the first time I didn't do a good debug; still, my solution, a sort of personal brainstorming, was good enough to open this post one more time, after a few changes :)

This is my proposal:

(function(m){function $(c,t){t=c[m];delete c[m];try{eval(""+c)}catch(e){c[m]=t;return 1}};return $(Array)&&$(Object)})("toString")

exactly 130 bytes to solve (I suppose) the modified Array and Object constructor problems.



(function(m){function $(c,t){t=c[m];delete c[m];try{new(function(){}).constructor("",""+c)}catch(e){c[m]=t;return 1}};return $(Array)&&$(Object)})("toString")

exactly 158 bytes to solve (I suppose) the modified Array and Object constructor problems.

Let me explain the concept behind this function.

Step 1 - There's no way to know if a constructor is native
This is the first problem: we can't trust a generic object constructor for, at least, these two reasons:

  1. a constructor, as I wrote on MDC too, is not read only

  2. there isn't any official method to know if a piece of code is native or not and, even if one were implemented, it could be changed by malicious arbitrary code


Since a constructor can be redefined in some browsers (first of all, FireFox)
function Array(){ /*doStuff*/ };

it is really hard to know whether this new constructor is native code or not.
You could create a new, original, empty scope (using for example an iframe), then check whether a variable's constructor in this new scope is equal to the original one, for example a generic Array in a generic window.
The only way to know if a constructor was inherited from the JavaScript core is its string representation, usually something like:

function Array() {
    [native code]
}

This is the common native code string representation but, as you know, JavaScript is object oriented and every object has a native toString method that can be overridden/overwritten like every other public method.

function Array(){};
Array.toString = function () {
    return "function Array(){\n\t[native code]\n}";
};

alert(Array);

This means that we can't trust the string representation either ... and that's why I thought about the next step.



Step 2 - native code cannot be evaluated
The only way to know if a method or a constructor is inherited from the JavaScript core is the "[native code]" string inside its string value.
At the same time, you can't evaluate native code, because it isn't replicable JavaScript source, so a try...catch like this one can be a trick to know if code is native or not, since in the native case the code will not be executable:

function isNativeCode(methodOrConstructor) {
    var result = false;
    try { eval("" + methodOrConstructor) }
    catch (e) { result = true }
    return result;
}

alert([
    isNativeCode(Array),           // true
    isNativeCode(String.replace),  // true
    isNativeCode((1).constructor), // true
    isNativeCode(function(){})     // false
]);

As I said, overwriting the toString method can fake native code simply and quickly:

function Mine(){};
Mine.toString = function () {
    return "[native code]";
};

alert(isNativeCode(Mine)); // true

This is the reason I thought about the next, final step.


Step 3 - enumerable methods should be deleted!
At this point, the biggest problem is the ability to fake the native code string using, for example, a custom toString method.
Is it possible to solve this? Likely sure, using the delete keyword!

function Array(){};
Array.toString = function () {
    return "I'm an Array";
};

alert(Array); // I'm an Array

delete Array.toString;

alert(Array); // function Array(){\n}

Amazing: I can remove an arbitrary toString method to get the original value back ... but does delete work without problems with native constructors too? Of course, it's in the core ;)

alert(Array.toString());
// function Array(){\n\t[native code]\n}

delete Array.toString;

alert(Array.toString());
// function Array(){\n\t[native code]\n}

Well done ... but this could be quite obtrusive, because some library (or some developer) may have redefined the toString method for its own scope ... that's why I decided to assign the toString method back one more time, after deleting it, obviously ... and this is my first proposal, with details:

// anonymous private scope function, accepts one argument
(function (m) {

    // private useful dollar function
    // accepts 2 arguments, one for the constructor
    // and one for lazy programming
    function $(c, t) {

        // second argument is used to save the old toString method
        t = c[m];

        // delete removes the constructor's own toString
        delete c[m];

        // check if the constructor contains native code
        // (not usable, not evaluable)
        try { eval("" + c) }

        // if eval fails, the constructor contains
        // [native code]
        catch (e) {

            // so I can safely put the old toString method back
            // (no problem whether it is native or not)
            c[m] = t;

            // and return a "true" value
            return 1
        }
    };

    // at this point I just need to call
    // my dollar function to know
    // if Array and Object are native constructors
    return $(Array) && $(Object)
})
// for size reasons, I send the toString method name
// using the "m" var multiple times inside the function
("toString")


How do we use this runtime private scope function?
This is just an example that should be run before every JSON string to value conversion:

if ((function(m){function $(c,t){t=c[m];delete c[m];try{eval(""+c)}catch(e){c[m]=t;return 1}};return $(Array)&&$(Object)})("toString"))
    alert("I can decode a JSON string");
else
    alert("JSON decoding is corrupted");


And that's all folks. You can view a basic example in this page and a failed hack example in this one.

Finally, this inline function should work with every Ajax ready browser; successfully tested on IE 5, 5.5, 6, 7, FireFox 1, 1.5, 2 and Opera 8, 9.

I'm waiting for Safari 1 or 2 behaviour, as well as Konqueror and other browsers (but I'm quite sure, this time, this proposal should work correctly).

If you find a way to crack this solution, please tell me (us) :D


Update - 04/04/2007
This is a revisited version dedicated to the native XMLHttpRequest object:

if ((function(c,m,t){t=c[m];delete c[m];if(/^\[XMLHttpRequest\]$/.test(c)){c[m]=t;return 1}})(XMLHttpRequest,"toString"))
    alert("Valid XMLHttpRequest");
else
    alert("XMLHttpRequest is corrupted");


Same concept, same logic ... and it should be as secure as the first proposal for the Array and Object constructors.

Try them:

function XMLHttpRequest(){};
XMLHttpRequest.toString = function () {
    return "[XMLHttpRequest]";
};
XMLHttpRequest.constructor = Object;

if ((function(c,m,t){t=c[m];delete c[m];if(/^\[XMLHttpRequest\]$/.test(c)){c[m]=t;return 1}})(XMLHttpRequest,"toString"))
    alert("Valid XMLHttpRequest");
else
    alert("XMLHttpRequest is corrupted");



Update - 04/04/2007
Do you worry about eval?

if (eval === (new function(){}).eval)
    alert("OK eval");
else
    alert("eval is corrupted");

And now we just need to pray that the deprecated object.eval will survive for some years :D



Instant Update - 04/04/2007
damn ... Object.prototype.eval can redefine the generic object eval method ... uhm, please let me think about a solution, and sorry for the precedent update.


Update - 05/04/2007
kentaromiura gave me a nice idea ... he wrote about it in his next post.
Function can eval() code in a safe way.
You can redefine Function itself, but in that case the native behaviour will be lost, so I suppose this should be the better way to be sure that code evaluation and the Array/Object constructors are not modified.

var result, code = '[1,2,3]';
if ((function(m){function $(c,t){t=c[m];delete c[m];try{new Function("",c)}catch(e){c[m]=t;return 1}};return $(Array)&&$(Object)})("toString"))
    result = new Function("","return "+code)();

Is this the securest way, for JavaScript, to know if code has been cracked?


Instant Update
no man ...
function Function(a, b) {
    // Array = function(){} ...
    eval("function f(){" + b + "}");
    return f;
}

alert(new Function("","return [1,2,3]")());

:(


Important Update - 05/04/2007
I found a way to solve the eval problems, look here to know more.
This is the last version; it should be the best way to know if Array and/or Object were cracked.

/*
function Array(){};
Array.toString = function(){
    return "function Array(){\n\t[native code]\n}"
};
//*/

if (
    (function(m){function $(c,t){t=c[m];delete c[m];try{new(function(){}).constructor("",""+c)}catch(e){c[m]=t;return 1}};return $(Array)&&$(Object)})("toString")
)
    alert("Array and Object constructors are OK");
else
    alert("Corrupted Array and Object constructors");


With this concept I should be sure that the XMLHttpRequest object is native and not re-defined.

Here you can read my last proposal for the XMLHttpRequest check:

if ((function(x,c,m,t){t=x[m];delete x[m];if(new(function(){})[c]("c","return ("+x+")[c]!==(function(){})[c]")(c)){x[m]=t;return 1}})(XMLHttpRequest,"constructor","toString"))
    alert("XMLHttpRequest is OK");
else
    alert("XMLHttpRequest is Corrupted");


It seems to be as safe as function(){} code evaluation is.
If XMLHttpRequest is not native, it's a function; removing its toString method, using delete, it should still be a constructor, and a constructor has a function(){} constructor, isn't that right?