Want to Learn R

Tom Brown (Past Member) $organization over 7 years ago

Just attended a nice intro to R and RStudio webinar.

Here is a link to the article.

https://thomasmock.netlify.com/post/a-gentle-guide-to-tidy-statistics-in-r/

I'll try to add a link to the video in 3-4 days when it is posted.

In the meantime, here is a good place to start for videos:

https://resources.rstudio.com/webinars

You can get free access to RStudio in the cloud for learning at https://rstudio.cloud.

Former Member $organization over 7 years ago

I'm an intermediate level R user with machine learning and data visualization experience if anyone ever wants to use me as an additional resource! Really happy to see more people using R and look forward to hearing any success stories. Thanks Tom!

Tom Brown (Past Member) $organization over 7 years ago in reply to Former Member

How have you been using R at the Aquarium?

Former Member $organization over 7 years ago in reply to Tom Brown (Past Member)

I started my position in April and have been working to clean up our data before I get to have any real fun with it, so nothing yet, unfortunately. But, I am looking forward to using R as a way to quickly summarize data and make visualizations of more complex datasets. You have exponentially more power in R for exploratory data analysis than you do in, say, Excel. It's amazing if you want advanced statistics, such as a cluster analysis. It can be especially useful when your data becomes too large to read in Excel, too.

Tom Brown (Past Member) $organization over 7 years ago in reply to Former Member
Are you using R for any of your clean up work? Over on the Developers Group here on TessituraNetwork.com we have started a conversation about "Machine learning model for identifying duplicates". Have you tried anything like this with R? There are a few libraries out there for Record Linkage.

https://cran.r-project.org/web/packages/RecordLinkage/index.html

https://github.com/kosukeimai/fastLink

However, then there is the problem of what you do about the linked records once you have found them. Which one do you keep? Which do you delete? And there is a host of other questions.

cc: Nick Reilingh

Tom Brown (Past Member) $organization over 7 years ago in reply to Tom Brown (Past Member)

Here is a YouTube video showing the use of the RecordLinkage package.

https://www.youtube.com/watch?v=Msl1Q5Yv8Ow

Former Member $organization over 7 years ago in reply to Tom Brown (Past Member)

What an exciting question! I'm going to consolidate all of the info and experiences I've had on this and get something fleshed out by Monday. My initial thoughts are that machine learning models will always have some margin of error associated with them, so unless we are willing to have some percentage of our data be merged by mistake, or not merged when it should have been, then a human will need to clean up any residuals that the cleanup model might have missed. Machine learning can really help with optimization but it rarely gives a product that has 100% accuracy. That being said, that doesn't mean a machine learning model for deduplication wouldn't be useful, it just might become exceedingly complex to produce. This ultimately depends on what information you have about your constituents and how high their error rates are/how inconsistent responses are (such as how people choose to enter their phone number). If you have an additional database to compare with that is ideal. For instance, I use the California Directory of Schools to find out which schools are the correct ones, and what should be merged.
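A minimal sketch of the two ideas above, written in Python for illustration: normalizing inconsistently entered phone numbers before comparison, and leaving a middle band of match scores for a human to review instead of merging automatically. The function names, the 10-digit US phone assumption, and both thresholds are hypothetical and would need tuning against real data.

```python
import re


def normalize_phone(raw: str) -> str:
    """Keep digits only and drop a leading US country code.

    Assumption: numbers are 10-digit US numbers, optionally prefixed with 1.
    """
    digits = re.sub(r"\D", "", raw)
    if len(digits) == 11 and digits.startswith("1"):
        digits = digits[1:]
    return digits


def triage(score: float, merge_at: float = 0.95, review_at: float = 0.75) -> str:
    """Map a match score in [0, 1] to an action.

    The thresholds are illustrative, not recommended values.
    """
    if score >= merge_at:
        return "merge"
    if score >= review_at:
        return "review"  # a human decides
    return "keep separate"
```

Raising `merge_at` trades fewer mistaken merges for a larger human review queue.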

I know that may have been a mouthful but I would love to continue this discussion. I hope to make the meeting on the 28th my first Analytic Coffee! session. Have a great weekend!

Tom Brown (Past Member) $organization over 7 years ago in reply to Former Member

I agree that an ML model will always have some error. This is why I wish there were a standard way to un-merge accounts in Tessitura. However, we don't have one at this point.

Regardless, there are some sites that are automatically scheduling merges when there is very high certainty of record linkage. (To date I think that most groups are looking at things like exact email address matches.) I've also thought of using additional features in the record linkage process like partial credit card numbers as part of match criteria.
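As a rough illustration of the exact-email approach, here is a short Python sketch that groups account IDs by a normalized email address. The input shape (pairs of account ID and email) and the function name are assumptions for the example, not anything from Tessitura.

```python
from collections import defaultdict


def exact_email_groups(accounts):
    """Group account IDs that share the same email after trimming and lowercasing.

    accounts: iterable of (account_id, email) pairs; email may be None.
    Returns a list of sorted ID lists, one per email shared by 2+ accounts.
    """
    by_email = defaultdict(list)
    for account_id, email in accounts:
        key = (email or "").strip().lower()
        if key:  # skip accounts with no email on file
            by_email[key].append(account_id)
    return [sorted(ids) for ids in by_email.values() if len(ids) > 1]
```

Each returned group is only a merge candidate; shared family or office addresses can still put different people in the same group.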

Another challenge comes when an account has multiple phones, email addresses, postal codes, and past credit cards. How do we do feature pairing without blowing up the computational complexity too badly?
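One common way to limit the number of comparisons is blocking: only compare accounts that share at least one exact value in some field, rather than every account against every other. The Python sketch below assumes each account is a dict of field name to a set of values; the field names and the function name are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def candidate_pairs(accounts):
    """Return the ID pairs that share at least one (field, value) blocking key.

    accounts: dict of account_id -> dict of field name -> set of values,
    e.g. {"emails": {...}, "phones": {...}}.
    """
    blocks = defaultdict(set)
    for account_id, fields in accounts.items():
        for field, values in fields.items():
            for value in values:
                blocks[(field, value)].add(account_id)

    pairs = set()
    for ids in blocks.values():
        # Only accounts inside the same block are compared with each other.
        pairs.update(combinations(sorted(ids), 2))
    return pairs
```

Very common values, such as a large postal code, create large blocks, so fields like that are probably better used for scoring than for blocking.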

That said, I'm inclined to try to work on an MVP and see how far we get.

For me, the first step will be to extract customer records from Tessitura and get the data into an analytics database outside Tessitura. I'm likely to use the List and Output set method through the REST API to get my data, because I'm on RAMP and don't have direct database access from my data science environment, which includes PostgreSQL, Jupyter Notebooks, and R.

cc: Nick Reilingh

Former Member $organization over 7 years ago in reply to Tom Brown (Past Member)

One thing I can say is that we have had specific use cases that have presented some trouble. One of them is siblings that are in our summer programs. They often have the same address, use a family email, same home phone number, etc. Sometimes we even have trouble with siblings with names like 'Roberto' and 'Roberta'. Fringe cases like these have often slipped under the radar for my department and I have only recently discovered some of these issues. Would be interested to hear if you have dealt with similar issues as well.
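To illustrate why a pair like 'Roberto' and 'Roberta' is hard, the Python sketch below scores name similarity with the standard library's `difflib` and adds a simple guard: when contact details match but first names differ at all, the pair goes to review instead of being merged. The record fields and function names are assumptions for the example.

```python
from difflib import SequenceMatcher


def name_similarity(a: str, b: str) -> float:
    """Similarity ratio in [0, 1] between two names, ignoring case and padding."""
    return SequenceMatcher(None, a.strip().lower(), b.strip().lower()).ratio()


def needs_review(rec_a: dict, rec_b: dict) -> bool:
    """True when contact details match exactly but first names are not identical.

    Assumes each record has 'first_name', 'email', 'phone', and 'address' keys.
    """
    same_contact = all(rec_a[k] == rec_b[k] for k in ("email", "phone", "address"))
    same_first = rec_a["first_name"].strip().lower() == rec_b["first_name"].strip().lower()
    return same_contact and not same_first
```

A fuzzy name score alone would rate the two siblings as a strong match, which is why the guard compares first names exactly. The cost is that genuine typos in a first name also land in the review queue.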

Good luck on your journey!

Tom Brown (Past Member) $organization over 7 years ago in reply to Former Member

The layered complexities of householding I understand. However, I don't have good ways to sort out these issues. I did a bit of playing with the problems over the weekend.