I've been thinking about R and how it can be used by developers, DBAs, and other SQL Server professionals who aren't data scientists per se. A recent article about generating a data set of fake transactional data got me thinking about this again, and I wondered: can R be used to obfuscate PII data?

As with anything R-related, there are probably multiple packages that are useful for any given task. For this one, I'll focus on the "generator" package. (More on this in a bit.) It includes functions, whose names should be fairly self-explanatory, for generating fake data. All of the functions have a required n parameter for the number of observations to generate. For dates of birth and phone numbers, there are some additional optional parameters for customization.

Here's an example (using the functions I find most useful) via SQL Server's Machine Learning Services and sp_execute_external_script:

    df <- as.data.frame(r_full_names(num_observations))
    df$phone <- r_phone_numbers(num_observations, use_hyphens = TRUE)
    df$DOB <- as.character(r_date_of_births(num_observations))
    df$email <- r_email_addresses(num_observations)
    df$SSN <- r_national_identification_numbers(num_observations)

This presents a really interesting option, although there are many questions I haven't broached that are outside the scope of this post. Generating random dates and digits is fairly trivial with T-SQL. Generating random (and meaningful) strings is not. Those fake names are pretty cool, although if you are outside of the US, the diversity of those names (or lack thereof) may be a concern. The email addresses are not realistic, in my opinion. They look too random, as you can see by looking at a few of the top-level domain name suffixes.

There probably aren't enough functions in the generator package to roll your own complete PII obfuscation solution. The documentation notes that functions for home address and birthplace (among others) are "To be added". If those functions arrive any time soon, I'll consider using "generator" for some non-production environments.

Other tools take a similar approach. Postman uses the faker library to generate sample data, including random names, addresses, email addresses, and much more. In this video we will look into a (simple) way to generate 1 million fake, random email addresses using SQL (SQL Server 2019 as the tool of choice, but it ca...

As for plain T-SQL, the following example produces four different random numbers generated by the RAND() function.

    DECLARE @counter SMALLINT;
    SET @counter = 1;
    WHILE @counter < 5
    BEGIN
        SELECT RAND() AS RandomNumber;
        SET @counter = @counter + 1;
    END;
    GO

Generate Random Number Between Specific Numbers. Use the following formula to generate a random integer value between two numbers:

    SELECT FLOOR(RAND() * (b - a + 1)) + a;

In the above formula, a is the smallest number and b is the largest number in the range in which you want to generate a random number (inclusive of a and b).

Suppose the requirement is to generate some random number between 1 and 250. The first statement below fills a table T with such values, and the second counts how often each value occurs:

    SELECT (1 + CAST(CRYPT_GEN_RANDOM(1) AS TINYINT) % 250) AS X INTO T FROM master..spt_values V1, master..spt_values;
    SELECT COUNT(*), X FROM T GROUP BY X ORDER BY X;

Another method uses a combination of the DATEPART() and GETDATE() time functions and the RAND() function to generate the random number. The resulting random number will be rounded to six digits of precision. The function uses a CASE statement that queries a pre-defined view that generates all possible random numbers as described in the table above.
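To make the range formula FLOOR(RAND() * (b - a + 1)) + a concrete, here is a minimal T-SQL sketch. The variables @a and @b and the column aliases are hypothetical names chosen for illustration, and sys.all_objects is used only as a convenient source of rows. The second query is not from the original text: it swaps in the common CHECKSUM(NEWID()) idiom, because RAND() without a seed is evaluated once per query and would return the same value for every row.

```sql
-- Hypothetical bounds for illustration (not from the original post).
DECLARE @a INT = 1, @b INT = 250;

-- A single random integer in [@a, @b], using the formula from the text.
SELECT FLOOR(RAND() * (@b - @a + 1)) + @a AS RandomInt;

-- RAND() is evaluated once per query, so a set-based SELECT would repeat
-- the same value on every row. NEWID() is evaluated per row, so this
-- variant yields a different value per row. The modulo is taken before
-- ABS so that the most negative INT cannot cause an overflow in ABS.
SELECT TOP (10)
       ABS(CHECKSUM(NEWID()) % (@b - @a + 1)) + @a AS RandomIntPerRow
FROM sys.all_objects;
```

Neither variant is suitable where cryptographic randomness matters; CRYPT_GEN_RANDOM, used in the example above, is the better starting point in that case.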