10 August 2020

How I created a JSON file data filterer

Welcome, fellow learners! In this blog post, I'll write about how I created a JSON file data filterer and how you can code one yourself. The program lets you filter unwanted values out of a JSON file and generates a new file with the result. It works mostly thanks to the recursive functions I used. You can access the source code through this GitHub link. This post will be quite long, as it also serves as the documentation for the program.


Well, let's first talk about why I coded this program. I was working on improving the user experience of my weather app (weather-witness) by showing a drop-down of search suggestions whenever the user types a new character into the search field. To do that, I needed a list of city names, and OpenWeatherMap provides a JSON file with exactly that. I took the file, and when I opened it up, the list was great: every entry was listed correctly. But it also contained some data that I didn't need, and it would have been annoying to delete it by hand, one entry at a time. So I decided to write a simple Node.js script to automate the data removal.


The weather app


Why Node.js and not Python, you ask? Well, my answer is, why not? I'd been wanting to create something with Node.js, and I thought this might be a good chance for it.





How does the program work?


So, now let's talk about what the flow of the program looks like. When we execute the script in our terminal, the program first asks the user for the location of the JSON file. Then, the program checks whether the data is in a valid JSON format. If the file contains malformed JSON data, the program produces an error that tells the user the file isn't a valid JSON file. If it's a valid JSON file, the program continues with the next operation.


After getting and checking the JSON file, the program stores its content in a variable and finds all the available keys in the JSON data. The program lists the keys for every level of the data in the file. The keys are stored in an array as strings. A key that is a descendant of another key has a "." between its own name and its parent's name. The same applies at every deeper level: the name for each level is separated with a "." character, for example, "ancestor_name.parent_name.child_name". I'll explain the technical part when we get there; for now, let's understand the flow of the program first.


After the program has found all the keys available in the JSON file, it asks the user which of the keys they want to remove from the data. After the user has entered one or more keys (separated by ","), the program asks for confirmation first. Once the user confirms, the program loops through the keys given by the user and checks whether they are valid.


Example of the script running


Keys selection confirmation


After that, the program filters the chosen keys out of the variable that stores the JSON data. Then it generates a new file for the filtered JSON data. The file is saved in the user's desktop directory.


Program finishing its operation by showing the example JSON data
and generating a new file to store the data



The technical part


Now, let's get started with the technical part of the project. First, we'll need to import some libraries. We'll be using a third-party library called "prompts" for reading user input, as well as Node.js's built-in "fs" (file system) and "path" modules for reading files, writing files and working with file locations. We'll also need to import another file called "utility.js", which contains most of the code that we'll need: for example, the function that lets us read files, the function that lets us create a new file, and the function that filters out the specified JSON keys.

Importing libraries


The main function & getting user input


After that comes our main function, which is an asynchronous anonymous function. I wrapped the prompts operation in an async function called getUserInput() so that I could reuse the code whenever I need to. Since getUserInput() is asynchronous, we need to use await when we call it, so that the user input operation completes before we move on to the rest of the code.


getUserInput() function to read user input in terminal.



Our main async anonymous function.
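Here's a minimal sketch of how such a wrapper could look. The exact code in the repository may differ: the question text is made up, and the promptFn parameter is an assumption I added so that the function can be tried without a terminal.

```javascript
// Minimal sketch of a reusable input helper (names are illustrative).
// promptFn defaults to the third-party "prompts" package, which is only
// loaded when no replacement function is passed in.
async function getUserInput(message, promptFn = require("prompts")) {
  const response = await promptFn({
    type: "text",  // a plain text question
    name: "value", // key under which the answer is returned
    message,
  });
  return response.value;
}

// Usage inside an async anonymous main function:
// (async () => {
//   const getLocation = await getUserInput("Where is the JSON file?");
//   console.log(getLocation);
// })();
```

Because getUserInput() returns a promise, forgetting the await would hand the rest of the code a pending promise instead of the text the user typed.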



Getting file content of the JSON file


After getting the file location from the user, we find out the filename by calling path.basename(getLocation) (I stored the user input in the getLocation variable). After that, we need to read the content of the file specified by the user. We can do that by calling the getFileData() function in the utility.js file. Because we imported the file into the utils constant, we can call the function as utils.getFileData(). The getFileData() function requires two arguments: the file location and a callback function.

Getting file content from the location provided by user


In the getFileData() function, we pass the file location, the encoding type and a callback function to the file system library's readFile() function. If the read is successful, the content of the file is passed as an argument to the callback function. In the callback function, we first check whether the content is valid JSON. If it isn't (the content has some malformed JSON data), a notification message is printed to the terminal. If it is valid, we parse the content we got from the read operation. The read operation gives us the file content as a string, which is why we need to parse it into JSON data.
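The read-then-validate step can be sketched roughly like this. It is a simplified version under my own assumptions: tryParseJson() is a hypothetical helper name, the error messages are made up, and the file path in the usage comment is only a placeholder.

```javascript
const fs = require("fs");

// Returns { ok: true, data } for valid JSON text and { ok: false } otherwise.
// JSON.parse throws on malformed input, so try/catch doubles as the validity check.
function tryParseJson(text) {
  try {
    return { ok: true, data: JSON.parse(text) };
  } catch (err) {
    return { ok: false };
  }
}

// Reads the file as UTF-8 text and hands the raw string to the callback.
function getFileData(location, callback) {
  fs.readFile(location, "utf8", (err, content) => {
    if (err) {
      console.error("Could not read the file:", err.message);
      return;
    }
    callback(content);
  });
}

// Usage: validate first, then continue with the parsed data.
// getFileData("./city.list.json", (content) => {
//   const result = tryParseJson(content);
//   if (!result.ok) return console.log("This isn't a valid JSON file.");
//   finishOperation(result.data);
// });
```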



The finishOperation() function


finishOperation() function, start by getting available keys


After validating and parsing the JSON file data, we can pass the JSON data into the finishOperation() function. In this function, we first get the list of the available keys for the JSON data by calling the utils.getAvObjKeys(JSON) function. The avObjKeys variable stores the result returned by the function.



Getting the JSON data's available keys


getAvObjKeys(), used to get all the available keys in the JSON data.


Inside our getAvObjKeys() function, we first check whether the JSON data we're using is an object and not a plain value. If the data is not an object, we output an error message on the terminal to notify the user. If the JSON data is an object, we then check whether it is an array. If it is, we use a forEach loop to iterate through the array. While iterating, we get the keys for the properties of each element by using Object.keys(). Then we pass the array element, each of its property keys and an (initially empty) avKeys array to the getInnerKey() function.


Things work almost the same way when the JSON data is an object instead of an array. Since there is no array to iterate through, we can get the available property keys straight away and pass them to the getInnerKey() function.



getInnerKey() function, used to search for inner keys of an Object.


Now, the getInnerKey() function actually takes four arguments. The first argument is the JSON object. The second is the current key, for which we want to check whether the property's value contains another object. The third is the array that contains the previously saved key names. The fourth is the string that contains the key name of the previous level. You'll understand why we need it when I explain the recursion I used.


We then store the content of the property with the supplied key in a variable (this reduces the code we need to write later on). If the content of the property is indeed an object, we need to get all the available keys in it, not to mention the keys on the levels below it. Now, you might be wondering whether it's even possible to get all the keys. Well, this is where recursion comes in handy.


Since we will be dealing with multiple levels of property keys, we need a way to record the name that was constructed at the previous level. This is why we have a fourth argument for the getInnerKey() function, "head". When head is empty, it means that the JSON object provided is from the first level of the JSON data. We then store the key name in the nameTemp variable and push it into the avKeys array if it's not in there yet. Next, we check whether the inner object is an array (of objects or values) or a plain object. If it is an array, we loop through the elements in the array, get all the available keys from each element, and then loop through those keys.


When looping through the keys, we pass each key to another function called generateInnerKey(), where we generate the key names according to their level. The arguments that we pass are the array element, the current property's key name, the inner key of the array element, the array of the available keys for the JSON data and, lastly, the name of the previous level's key. The values passed are slightly different when the inner object is a plain object rather than an array. In that case we get all of its keys straight away and pass them one by one to the generateInnerKey() function, and the first argument is the inner object itself instead of an array element. If the inner value of the current JSON property is not an object, we check whether the key for that property is already in the avKeys array. If it's not, we push it into the array; otherwise we ignore it.


Generating names and using recursion to loop through lower-level keys.


In the generateInnerKey() function, we have a total of five parameters: theEle, currEle, innerKey, avKeys and head. theEle is the current JSON object that we got from the previous level, while currEle is the key name of the current JSON property. innerKey is an available key of the current JSON object, while avKeys is the array of all the JSON keys that we have stored so far. Lastly, the head parameter is the string that stores the name from the previous level, which we use to construct the current key's name.


In this function, we first construct the name of the key by combining the previous level's property key name with the current level's key name. If the head parameter is not empty, the current operation comes from a lower level. In that case, we combine head with the current key name instead (ignoring the previous level's key name, since it is already included in head). After that, we check whether the constructed key name is already present in the avKeys array and push it into the array if it's not.


After all of that, we check again whether the value of the inner property is an object. If the value turns out to be an object (an array also counts as an object), we call the getInnerKey() function again (recursion). When all the available keys have been found, control returns to the first getInnerKey() call. Then the avKeys array is returned to the finishOperation() function.
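To make the recursion easier to picture, here is a compact sketch of the same idea. It is not the code from the repository: it folds getInnerKey() and generateInnerKey() into one hypothetical function, collectKeys(), but it builds the dotted names the same way, by carrying the name from the previous level in head. The sample data is made up.

```javascript
// Compact sketch of recursive key discovery with dotted names.
function collectKeys(node, head = "", avKeys = []) {
  if (Array.isArray(node)) {
    // Array elements share their parent's name, so the head stays the same.
    node.forEach((element) => collectKeys(element, head, avKeys));
  } else if (node !== null && typeof node === "object") {
    for (const key of Object.keys(node)) {
      // Prefix the key with the name built at the previous level, if any.
      const name = head ? `${head}.${key}` : key;
      if (!avKeys.includes(name)) avKeys.push(name);
      collectKeys(node[key], name, avKeys); // recurse into the value
    }
  }
  return avKeys;
}

// Made-up sample in the shape of a city list.
const sample = [
  { id: 1, name: "City A", coord: { lon: 101.5, lat: 3.25 } },
  { id: 2, name: "City B", state: "State B", coord: { lon: 101.0, lat: 4.5 } },
];
console.log(collectKeys(sample));
// keys found, in order: id, name, coord, coord.lon, coord.lat, state
```

The recursion stops by itself: when a value is neither an array nor an object, neither branch runs and the function simply returns the keys collected so far.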



Getting user input on which properties to be deleted


Now we need to print the available keys onto the terminal screen, so that the user knows what the available options are, and then ask the user which of the keys they want to filter out. Since the user might enter some invalid key names, we need a loop to repeat the operation when that happens. After getting the keys from the user, we ask for confirmation that the options they provided are the ones they want.


Going back to the finishOperation() function, we now need to print out the available options.


When asking the user for confirmation, we also need to check whether the inputted value (converted to lowercase) is "y", "n" or something else, and we wrap this check in an inner while loop. If the first character of the user input is neither "y" nor "n", the input is invalid. A message is then printed to the terminal, and since we are in a while loop, reaching the end of the loop's code block takes us back to the top of the inner loop. If the user provides "n" as their input, we break out of the inner while loop and repeat the code in the outer while loop, which lets the user input the options again, as the "done" boolean value is still true.


Ask confirmation for the options selected by user.
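The y/n check can be sketched as follows. This is a simplified take with hypothetical names (classifyConfirmation(), confirmSelection()) and a made-up question text; ask stands for any async function that returns what the user typed, such as the input helper from earlier.

```javascript
// Classifies a confirmation answer by its first character, ignoring case.
function classifyConfirmation(answer) {
  const first = String(answer).trim().charAt(0).toLowerCase();
  if (first === "y") return "yes";
  if (first === "n") return "no";
  return "invalid";
}

// Keeps asking until the answer starts with "y" or "n".
async function confirmSelection(ask) {
  while (true) {
    const answer = await ask("Are these the keys you want to remove? (y/n)");
    const verdict = classifyConfirmation(answer);
    if (verdict === "yes") return true;
    if (verdict === "no") return false; // the caller lets the user re-enter the keys
    console.log('Please answer with "y" or "n".');
  }
}
```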


If the first character of the inputted value is "y", we check the options the user entered. We create an array of options (as strings) by splitting the input using a comma (,) as the separator. Then we trim() each string in the array using map() to remove the leading and trailing whitespace. After we have generated the array of options, we pass it into a function called checkOptions() to check whether the options provided by the user are valid.


If the user has confirmed it, we continue by checking the value provided.
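The split-and-trim step is small enough to show in full. parseOptions() is a hypothetical name for illustration; the technique (split on commas, then trim each entry with map()) is the one described above.

```javascript
// Turns "id, coord.lon ,state" into ["id", "coord.lon", "state"].
function parseOptions(input) {
  return input
    .split(",")                       // one entry per comma-separated key
    .map((option) => option.trim());  // drop leading and trailing whitespace
}

console.log(parseOptions(" id , coord.lon,state "));
```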


The checkOptions() function takes two arguments: the array of options generated from the user input and the array of all the available keys for the JSON data. This function basically just loops through the options array and searches for each string among the available keys. If it fails to find a string, the function returns false to finishOperation(). But if the loop completes without the function having returned false, the options provided by the user are indeed valid, and the function returns true to its caller, finishOperation().


checkOptions() function to check the options provided by the user with the available keys.
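A minimal sketch of that validity check, assuming both arguments are plain arrays of strings:

```javascript
// Returns false as soon as one option is not among the available keys,
// and true only if the loop finishes without finding an unknown option.
function checkOptions(options, avKeys) {
  for (const option of options) {
    if (!avKeys.includes(option)) return false;
  }
  return true;
}

console.log(checkOptions(["id", "coord.lon"], ["id", "name", "coord", "coord.lon"])); // true
console.log(checkOptions(["id", "country"], ["id", "name", "coord", "coord.lon"]));   // false
```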


If the options are valid, we can continue with the data filtering and new file generation. To filter the selected keys out of the JSON data, we call the filterJSON() function with two arguments: the JSON data and the options selected by the user. The function returns the filtered JSON data to the finishOperation() function. In this function, we first check whether the first level of the JSON data is an array. If it is, we loop through the array and, for each element, loop through the options. When looping through the options array, we need to determine whether each key includes a dot (.) in its string. Keys that contain a dot refer to properties at a lower level, so we call a function called filterInnerData() to deal with them. For an option that does not contain a dot, we can delete the property directly by using delete json_Object[key_name].


Checking the validity of the options, show error and break the loop if an error occurred.



Deleting the selected keys of the properties


filterJSON() function to remove the unwanted properties from the JSON data


If the first level of the JSON data is not an array but an object, we loop through the options array and check which options contain a dot (.). The options that contain a dot trigger filterInnerData(). The options that don't are removed directly by using delete json[ele].
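The first-level part of this can be sketched on its own. In the sketch below, removeTopLevelKeys() is a hypothetical name, and dotted keys are simply skipped because they belong to the lower-level handling; treating a single object as a one-element list is my own shortcut to cover the array and object cases with one loop.

```javascript
// Removes first-level keys (keys without a dot) from an object,
// or from every element when the first level is an array.
function removeTopLevelKeys(json, options) {
  const targets = Array.isArray(json) ? json : [json];
  for (const item of targets) {
    if (item === null || typeof item !== "object") continue; // plain values have no keys
    for (const key of options) {
      if (key.includes(".")) continue; // dotted keys need the lower-level handling
      delete item[key];
    }
  }
  return json;
}

console.log(removeTopLevelKeys([{ id: 1, name: "a" }, { id: 2, name: "b" }], ["id"]));
```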


The filterInnerData() function takes two arguments: the JSON object (parent) and the key that contains the dot. In this function, we also use recursion to filter out the desired properties completely. First, we split the key name using the dot as a separator. Then we store the length of the resulting array in the variable arrLen for future use. Next, we assign a reference to the parent object to a variable called obj. This "obj" variable is used to step down, level by level, to the object that holds the property we will remove from the JSON data (referencing the object dynamically).


Our filterInnerData() function, used to filter properties at the lower levels.




After that, we need to determine whether the content of the property is an object or a value. I used tempKey[0] as the key for the property because it's the first-level key for the object. We remove it when we call this function again recursively, so don't worry much about it. If the content is an object-type value (object or array), we need to determine which of the two it is. If it is an array, we loop through its elements and call filterInnerData() again for each one, passing the element together with the key name (as we're not yet at the last level of the option provided). We also need to cut the first section off the key string before we pass it to filterInnerData() again.


If the content is an object type value.


If the value of the property is an object, we need a way to refer to the object that we want. To do this, we stack up the references using a for loop. In the loop, we assign the key name for the current level to the ele variable and then step into the property with that key. In every iteration, if the value of a property is an array, we loop through its elements and call filterInnerData() to filter out the lower-level property. When calling the function, we also cut the used key sections off the array and pass a string built from the remaining sections, joined with dots. Once the object references are stacked up, we can delete the property using the last part of the key section array (tempKey) as the key name.


If the content is an object-typed value and not an array.


If, in turn, the value of the property is not an object, it means that we're already at the last section of our destination. Then we can just delete the property using the key and the JSON object provided in the parameters.


If the content is a value (number, string, character, etc).
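Here is a condensed sketch of deleting a property by its dotted key. It is not the filterInnerData() from the repository, which stacks up object references in a for loop; this hypothetical removeByPath() recurses at every level instead, but it follows the same rules: arrays are handled element by element, the used key section is cut off before going deeper, and the delete happens at the last section.

```javascript
// Deletes the property addressed by a dotted key such as "coord.lon".
function removeByPath(node, dottedKey) {
  if (Array.isArray(node)) {
    // Apply the same key to every element of the array.
    node.forEach((element) => removeByPath(element, dottedKey));
    return;
  }
  if (node === null || typeof node !== "object") return; // nothing to delete from

  const [first, ...rest] = dottedKey.split(".");
  if (rest.length === 0) {
    delete node[first]; // last section reached: remove the property itself
  } else {
    removeByPath(node[first], rest.join(".")); // drop the used section and go deeper
  }
}

// Made-up sample data.
const cities = [{ id: 1, coord: { lon: 1, lat: 2 }, tags: [{ a: 1, b: 2 }] }];
removeByPath(cities, "coord.lon");
removeByPath(cities, "tags.a");
console.log(JSON.stringify(cities));
```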


Generating a new file for the filtered JSON data


These steps repeat until the JSON data has been entirely filtered by the options chosen by the user. When the filtering is complete (and the filtered JSON data has been returned to the finishOperation() function), we can generate a new file to store it. Before we generate the file, it is wise to print out a sample of the filtered JSON data so that the user can check whether they've correctly filtered out the unwanted properties.


After filtering the data, the new JSON data is returned to finishOperation().
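Printing a sample can be as simple as the sketch below. The choice of "first element of the array, or the whole object" as the sample is my assumption, and both function names are hypothetical.

```javascript
// Picks a small sample: the first element of an array, or the object itself.
function sampleOf(json) {
  return Array.isArray(json) ? json[0] : json;
}

// Pretty-prints the sample with two-space indentation.
function formatSample(json) {
  return JSON.stringify(sampleOf(json), null, 2);
}

console.log(formatSample([{ name: "City A" }, { name: "City B" }]));
```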


Now, to generate a new file, we call a function called generateJsonFile() with two arguments: the JSON data and the basename of the file that we read the JSON data from. In the function, we create a file in the user's desktop directory, named after the original file with the text "filtered_" added in front of the filename.


The generateJsonFile() function.
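A rough sketch of the file generation step follows. Two things here are assumptions on my part: the desktop is taken to be the "Desktop" folder inside the home directory (which is not true on every system), and the optional targetDir parameter was added so that the function can be tried in another folder. The sketch also uses the synchronous fs.writeFileSync() for brevity.

```javascript
const fs = require("fs");
const os = require("os");
const path = require("path");

// Writes the filtered data to "<targetDir>/filtered_<basename>" and returns that path.
function generateJsonFile(json, basename, targetDir = path.join(os.homedir(), "Desktop")) {
  const target = path.join(targetDir, `filtered_${basename}`);
  fs.writeFileSync(target, JSON.stringify(json, null, 2), "utf8");
  return target;
}

// Usage:
// const savedAt = generateJsonFile(filteredData, "city.list.json");
// console.log("New file generated at", savedAt);
```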



Conclusion


Well, there are really many things that could be improved in this script. For example:

  • Letting users delete properties at a specific index of an array in the JSON file.
  • A better user interface.
  • Options for users to save the file at a specific location instead of their desktop.

Ideas aside, I really learned a lot about async and the JSON data format by creating this project, and I improved my knowledge of controlling a program's function flow. For me, the most exciting things I learned while doing this project were finding all the available property keys and deleting properties on levels lower than the first level of the JSON data.

As always, if you have any criticism, suggestions or opinions, feel free to drop a comment in the comment section below. Also, if you find this article useful, please share it with someone else who needs it or will find it interesting. Thanks for reading, and stay tuned for more interesting articles.
