JSON.parse() On A Large Array Of Objects Is Using Way More Memory Than It Should
Solution 1:
A few points to note:
- You've found that, for whatever reason, it's much more efficient to do individual JSON.parse() calls on each element of your array instead of one big JSON.parse().
- The data format you're generating is under your control. Unless I misunderstood, the data file as a whole does not have to be valid JSON, as long as you can parse it.
- It sounds like the only issue with your second, more efficient method is the fragility of splitting the original generated JSON.
This suggests a simple solution: instead of generating one giant JSON array, generate an individual JSON string for each element of your array, with no newlines in the JSON string, i.e. just use JSON.stringify(item) with no space argument. Then join those JSON strings with newlines (or any character that you know will never appear in your data) and write that data file.
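A minimal sketch of that writing step might look like this (assuming your elements are already in an array called arr, and using the same file name as the reading code below):

var fs = require('fs');

// Each item becomes one compact JSON string (no 'space' argument,
// so no newlines inside it); the strings are joined with newlines.
var lines = arr.map( function( item ) {
    return JSON.stringify( item );
});

fs.writeFileSync( 'JMdict-all.json', lines.join('\n') + '\n', { encoding: 'utf8' } );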
When you read this data, split the incoming data on the newlines, then do the JSON.parse() on each of those lines individually. In other words, this step is just like your second solution, but with a straightforward string split instead of having to fiddle with the character counts and curly braces.
Your code might look something like this (really just a simplified version of what you posted):
var fs = require('fs');
var arr2 = fs.readFileSync(
'JMdict-all.json',
{ encoding: 'utf8' }
).trim().split('\n').map( function( line ) {
return JSON.parse( line );
});
As you noted in an edit, you could simplify this code to:
var fs = require('fs');
var arr2 = fs.readFileSync(
'JMdict-all.json',
{ encoding: 'utf8' }
).trim().split('\n').map( JSON.parse );
But I would be careful about this. It does work in this particular case, but there is a potential danger in the more general case.
The JSON.parse function takes two arguments: the JSON text and an optional "reviver" function.

The [].map() function passes three arguments to the function it calls: the item value, the array index, and the entire array.

So if you pass JSON.parse directly, it is called with the JSON text as the first argument (as expected), but it is also passed a number where the "reviver" function would go. JSON.parse() ignores that second argument because it is not a function reference, so you're OK here. But you can probably imagine other cases where you could get into trouble, so it's always a good idea to triple-check this when you pass an arbitrary function that you didn't write into [].map().
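The classic illustration of this pitfall (not from your code, just an example) is passing parseInt directly to map, where the array index is silently used as the radix:

// parseInt takes (string, radix), and map supplies (value, index, array),
// so each element after the first gets its index as the radix.
['1', '2', '3'].map( parseInt );                  // [1, NaN, NaN]

// Wrapping the call keeps the extra arguments from leaking through.
['1', '2', '3'].map( function( s ) {
    return parseInt( s, 10 );
});                                               // [1, 2, 3]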
Solution 2:
I think a comment hinted at the answer to this question, but I'll expand on it a little. The 1 GB of memory being used presumably includes a large number of allocations of data that is actually 'dead' (in that it has become unreachable and is therefore not really being used by the program any more) but has not yet been collected by the Garbage Collector.
Almost any algorithm processing a large data set is likely to produce a very large amount of detritus in this manner when the programming language/technology used is a typical modern one (e.g. Java/JVM, C#/.NET, JavaScript). Eventually the GC removes it.
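If you want to check how much of that 1 GB is actually live, one way (not part of the original question, just a sketch) is to run Node with --expose-gc, force a collection, and compare process.memoryUsage() before and after:

// Run with: node --expose-gc measure.js
var fs = require('fs');

function heapMB() {
    return Math.round( process.memoryUsage().heapUsed / 1024 / 1024 );
}

var arr = fs.readFileSync( 'JMdict-all.json', { encoding: 'utf8' } )
    .trim().split('\n').map( function( line ) { return JSON.parse( line ); } );

console.log( 'heap after parse:', heapMB(), 'MB' );

global.gc();  // force a full collection (only available with --expose-gc)

console.log( 'heap after GC:   ', heapMB(), 'MB' );

The difference between the two numbers is roughly the dead-but-not-yet-collected allocation described above.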
It is interesting to note that there are techniques that can dramatically reduce the amount of ephemeral memory allocation certain algorithms incur (for example, by keeping pointers into the middle of a large string rather than copying out substrings), but I think these techniques are hard or impossible to employ in JavaScript.